So, as we move toward Blacklight implementation, I needed some way to expose item/holdings details from my Horizon ILS so they could be consumed for display (and/or indexing) in Blacklight.
I figured, as long as I’m doing this, I might as well do it in some kind of standard (rather than custom ad-hoc) format, so the consumer on the Blacklight end could be standard-ish. It could possibly be re-used by others, or re-used by ourselves if we switch ILSs: we’d just need to write the provider end for the new ILS, and could keep the same consumer.
So, thankfully, Horizon actually keeps all its info in a fairly well normalized RDBMS that you can access directly, making this not too hard a task. On top of that, those fine folks at uchicago already had a little extension to Horizon to provide the item information in their own custom ad-hoc format, which they kindly shared with me. So I took that, and modified it to produce output in ils-di format.
Now, the thing about the ils-di format. It gives you a sort of skeleton to hang your info on. You can list items. You can list what ils-di calls ‘holdingsets’ (and Horizon confusingly calls ‘copies’, and I don’t know what your ILS calls them — a group of related items, like all the bound volumes of a particular bib; or the multiple volumes of a multi-volume set). You can express which items are in which holdingsets.
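To make that skeleton concrete, here is a minimal sketch of the item/holdingset relationship as plain Python data structures. The class and field names (`Item`, `HoldingSet`, `BibRecord`, `items_in`) are made up for illustration and are not the element names from the ils-di schema; the only point is that items are listed once and holdingsets just point at them by identifier.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Item:
    item_id: str
    label: str = ""  # e.g. "v.3 (1992)"; purely illustrative


@dataclass
class HoldingSet:
    holdingset_id: str
    item_ids: List[str] = field(default_factory=list)  # references, not copies


@dataclass
class BibRecord:
    bib_id: str
    items: Dict[str, Item] = field(default_factory=dict)
    holdingsets: Dict[str, HoldingSet] = field(default_factory=dict)

    def add_item(self, item: Item, holdingset_id: Optional[str] = None) -> None:
        """List the item, and optionally record which holdingset it belongs to."""
        self.items[item.item_id] = item
        if holdingset_id is not None:
            hs = self.holdingsets.setdefault(holdingset_id, HoldingSet(holdingset_id))
            hs.item_ids.append(item.item_id)

    def items_in(self, holdingset_id: str) -> List[Item]:
        """Return the items that belong to one holdingset, in insertion order."""
        hs = self.holdingsets.get(holdingset_id)
        return [self.items[i] for i in hs.item_ids] if hs else []
```

An item added without a holdingset id is still listed on the bib; it just isn't grouped with anything.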
This is all great, because there wasn’t a simple standard format to do that in before. But when you actually want to say something about the holdingsets or items, ils-di just gives you a slot to put some other (hopefully standard) metadata format in. (With one exception: ils-di gives you “SimpleAvailability”, which carries a human-displayable label plus one of four coded statuses to describe item availability/status. This was wise of them, because there was no good way to provide status from a standard vocabulary without this.)
Now, I think ils-di is exactly right to do things this way. Break the problem into manageable chunks, solve one chunk with a solution meant to do one thing well, and make sure your solution can be ‘loosely coupled’ with other solutions meant to solve the other parts. Fine, good show.
But that still leaves me to figure out how to actually describe what I want to describe, using what XML schemas, standardized if possible. (And leaves the community to arrive at a standard set of these extra schemas at a later date, if we want to write software that really is ‘plug and play’ with each other. Oh well, that’s how it goes, better to try some things and define ‘best practices’ and standards off of what works well, than to try and ‘standardize’ before trying in the wild.)
All my stuff
So what are all the data elements I have that I want to describe somehow, in these extra metadata packages embedded in dlf ils-di?
Well, you can see them right here in uchicago’s custom ad hoc format, what their servlet did out of the box, with this example of a moderately complex serials record:
So, okay, where to put it? Well, bibIDs and itemIDs are already in the dlf ils-di schema itself. So what else do we have? MARC Format for Holdings Data (MFHD) in MarcXML seems likely. Maybe ISO Holdings? Maybe NCIP?
I started with MFHD in marcxml, because NCIP confuses me (and everyone else), and for ISO Holdings you need to pay a couple hundred bucks to look at the standard (although you can see the .xsd schema alone for free).
So in MFHD you can put a lot of stuff actually. Although it’s somewhat confusing to look at, since it uses those obscure marc tag codes and such. But you can put in there:
- user-displayable ‘location’ and ‘collection’ in tag 852
- ‘holding’ (ie ‘holdingset’ ie ‘copy’) identifier in tag 001.
- shelfmark (ie call number/copy information) also in 852.
- A coded value of whether that call number is LCC, NLM, Dewey, Sudocs, a couple others, or ‘other’ or ‘unknown’. 852 indicator 1.
- For ‘holdingsets’, user-presentable coverage statements (for main run, supplements, or indexes), in 866-868.
- (Note, if my ILS actually had machine-understandable coverage statements, which it does not, you theoretically maybe could put them in MFHD, but I’d much prefer ONIX Serial Coverage, which I think does it much more elegantly and clearly. But I don’t have that data available anyway.)
- I think you can provide an un-coded user-presentable item status/availability string somewhere, but SimpleAvailability takes care of that better so I didn’t worry about it.
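As a rough illustration of the list above, here is a sketch that builds a bare-bones MFHD record in MARCXML with Python’s standard library. The function name, the sample values, and the choice of 852 subfields ($b and $c for location and collection, $h for the call number) are my assumptions for the example, not necessarily what the Horizon servlet emits; the leader and most other required fields are left out for brevity.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
ET.register_namespace("marc", MARC_NS)

# 852 first indicator = shelving scheme (a blank means no information provided).
SHELVING_SCHEME_IND1 = {"lcc": "0", "ddc": "1", "nlm": "2", "sudocs": "3", "other": "8"}


def _q(name):
    """Qualify a local name with the MARCXML namespace."""
    return "{%s}%s" % (MARC_NS, name)


def build_holdings_record(copy_id, location, collection, call_number, scheme, coverage):
    """Build a minimal (leader-less) MFHD record for one holdingset/copy."""
    record = ET.Element(_q("record"))
    ET.SubElement(record, _q("controlfield"), {"tag": "001"}).text = copy_id

    ind1 = SHELVING_SCHEME_IND1.get(scheme, " ")
    f852 = ET.SubElement(record, _q("datafield"), {"tag": "852", "ind1": ind1, "ind2": " "})
    for code, value in (("b", location), ("c", collection), ("h", call_number)):
        ET.SubElement(f852, _q("subfield"), {"code": code}).text = value

    # One 866 per user-presentable coverage statement for the main run.
    for statement in coverage:
        f866 = ET.SubElement(record, _q("datafield"), {"tag": "866", "ind1": " ", "ind2": " "})
        ET.SubElement(f866, _q("subfield"), {"code": "a"}).text = statement
    return record


if __name__ == "__main__":
    rec = build_holdings_record("c42", "Main Library", "Periodicals",
                                "QA76 .J68", "lcc", ["v.1 (1990)-v.20 (2009)"])
    print(ET.tostring(rec, encoding="unicode"))
```

Attributes are passed as a dict on purpose: `ET.SubElement` already has a positional parameter called `tag`, so a `tag="852"` keyword argument would collide with it.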
Meanwhile, dlf:SimpleAvailability is handling my need for both a coded and user-displayable item status/availability string, great, one thing done well. (Although I needed to create a mapping from my 109 internal ‘item status’ codes to the four dlf:SimpleAvailability values!)
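That mapping is basically a lookup table with a safe fallback. A minimal sketch, assuming the four coded values are "unknown", "available", "not available" and "possibly available" (which is how I read the ils-di schema); the local status codes in `STATUS_MAP` are invented for the example and are not real Horizon codes.

```python
# The four coded values, as I understand the ils-di schema (an assumption).
SIMPLE_AVAILABILITY_VALUES = (
    "unknown",
    "available",
    "not available",
    "possibly available",
)

# Hypothetical local item status codes; a real table would have one entry
# per internal status code.
STATUS_MAP = {
    "i": "available",           # checked in
    "o": "not available",       # checked out
    "l": "not available",       # lost
    "h": "possibly available",  # on hold shelf
}


def to_simple_availability(local_code, mapping=STATUS_MAP):
    """Map a local status code to a coded value, falling back to 'unknown'."""
    value = mapping.get(local_code, "unknown")
    if value not in SIMPLE_AVAILABILITY_VALUES:
        raise ValueError("bad mapping for %r: %r" % (local_code, value))
    return value
```

Falling back to "unknown" for unmapped codes means a newly added local status never breaks the response, and the check catches typos in the mapping table itself.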
But that still left me with some things I wanted to include. Well, MFHD gives me user-displayable labels for location and collection. But I really wanted to include my ILS’s internal codes for location and collection and item status. Why would I want purely local internal codes? Well, because applications I’m using to consume this can be configured to make use of them even though they are purely local identifiers (especially if I’m writing the apps myself!). I also wanted to include ‘item type’ as both an internal code and a user-displayable label, and strangely MFHD has no spot for even a user-displayable label for that. Similarly, I wanted to expose my internal system “call number type” id, which is not always mappable to a standard type in MFHD like LCC or DDC or whatever.
I looked over what documentation I could find for NCIP, as well as the NCIP xml schema, but it didn’t seem to have the fields I needed either. I even looked at the ISO Holdings schema without any documentation (my skills at reading raw XML schemas have improved muchly through this project). Nope, not there.
Ross Singer had an idea that you could do this purely with DublinCore (including refinements in ‘dcterms’) and RDF. That might be possible, but I just couldn’t figure out how to do it. But really, I don’t think there are sufficient elements in dcterms to cover all of those data elements, although Ross found some clever ways to try and express a few of them. (Ross was trying to do a bit MORE than I really needed, since he didn’t want to depend on the dlf ils-di schema, but I’m just trying to get some metadata I can embed in dlf ils-di for now, that’s my use case.)
So I guess there’s theoretically some way to express your own refinements to dcterms? But I got lost trying to figure that out.
So one way or another, I figured I was going to define my own vocabulary. I could do it as an RDF Vocabulary alone, but I got confused trying to think about that, and once you go to trying to express that in RDF-XML… got confused again. Or I could do it in a custom XML Schema. If I’m going to have to create my own vocabulary anyway, XML Schema just seemed simpler, both to produce and to consume. (And it would be easy for me or someone else to convert this to RDF at a later date, starting from a schema. RDF-XML even lets any defined XML namespace pretty much be RDF out of the box, just add a few RDF attributes here or there!)
So custom schema it was. I created (or am in the middle of creating) an awfully simple XML schema for these elements I needed, mostly internal ILS values, and for each one the schema says you can supply one or more (internal or external) identifiers using a child dc:identifier, a user-displayable label using a child dc:title, and if you like a longer-format user-displayable description. (Didn’t re-use dc:description for this because I really wanted a couple extra attributes, and there seemed to be no way to add them to a dc:description.)
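To show the shape of one of those elements, here is a small sketch that builds an instance with Python’s standard library. The local namespace URI, the element names (`itemType`, `description`), and the helper `local_value` are placeholders I made up, since the schema is still unnamed; only the Dublin Core namespace is real.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
LOCAL_NS = "http://example.org/ns/ils-local"  # placeholder, not the real namespace

ET.register_namespace("dc", DC_NS)
ET.register_namespace("local", LOCAL_NS)


def local_value(name, identifiers, label, description=None):
    """Build one local-vocabulary element: identifier(s), label, optional description."""
    el = ET.Element("{%s}%s" % (LOCAL_NS, name))
    for ident in identifiers:  # one or more internal or external identifiers
        ET.SubElement(el, "{%s}identifier" % DC_NS).text = ident
    ET.SubElement(el, "{%s}title" % DC_NS).text = label  # user-displayable label
    if description:
        # A local element rather than dc:description, so it could carry
        # extra attributes of its own.
        ET.SubElement(el, "{%s}description" % LOCAL_NS).text = description
    return el


if __name__ == "__main__":
    item_type = local_value("itemType", ["jnl"], "Journal", "Bound periodical volumes")
    print(ET.tostring(item_type, encoding="unicode"))
```

A consumer that only cares about display can read dc:title and ignore the rest, while a locally configured app can key off the dc:identifier values.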
Here it is, work in progress. (Not even sure if this validates yet).
The (not so) final product
So here it is, the current version of a dlf ils-di document produced live from my (development box) Horizon, including in its metadata payload MFHD in marcxml, dlf:SimpleAvailability, and my custom as yet un-named schema.
See for example this same moderately complicated serials record:
Where to next?
Well, I’ve got to finish polishing it off, make sure all the XML validates against the schemas, make sure the new schema I created is really valid, etc. Polish off a few more things.
Then, I’d like to put this code (derived from uchicago’s code, with their permission) on Google Code, so that other Horizon institutions can use it to provide dlf ils-di responses from their catalogs, woo. (I tried to keep the code as generalizable as possible — for instance, the mapping from your local item status codes to the four dlf:SimpleAvailability values is configurable in a properties file).
I’ve also got my eyes on DAIA as another metadata schema to include in the dlf ils-di response eventually. DAIA is focused on doing what SimpleAvailability does, but with more detail: What services are available, and what are the URL access points for each service? I need to figure out how to correctly extend DAIA to include services that aren’t in DAIA’s built-in four. (I specifically need the service ‘get a photocopy of a portion of this item’, and ‘place an ILS request/hold for pickup at circ desk’, two services we offer that DAIA doesn’t specify right now.)
And Ross tells me what I’ve done so far has gotten me a lot of the way to a jangle implementation. Great, that was part of the goal, so apparently it succeeded. I’ll finish off the rest of jangle when I have a use case that demands it, which could be sooner or later! (And first I’ll need to understand jangle better!)