exposing holdings in dlf ils-di standard format web service

So, as we move toward Blacklight implementation, I needed some way to expose item/holdings details from my Horizon ILS so they could be consumed for display (and/or indexing) in Blacklight.

I figured, as long as I’m doing this, I might as well do it in some kind of standard (rather than custom ad-hoc) format, so the consumer on the Blacklight end could be standard-ish. It could then possibly be re-used by others, or re-used by ourselves if we switch ILSs: we’d just need to write the provider on the ILS end, and could keep the same consumer.

So looking around for standard formats, the DLF ILS-di format (xsd schema) seemed pretty suitable, designed for just this task.

So, thankfully, Horizon actually keeps all its info in a fairly well normalized rdbms that you can access directly, making this not too hard a task. On top of that, those fine folks at uchicago already had a little extension to Horizon to provide the item information in their own custom ad-hoc format, which they kindly shared with me. So I took that, and modified it to produce output in ils-di format.

Metadata formats

Now, the thing about the ils-di format.  It gives you a sort of skeleton to hang your info on. You can list items. You can list what ils-di calls ‘holdingsets’ (and Horizon confusingly calls ‘copies’, and I don’t know what your ILS calls them — a group of related items, like all the bound volumes of a particular bib; or the multiple volumes of a multi-volume set).  You can express which items are in which holdingsets.
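To make that skeleton a bit more concrete, here is a rough Python sketch of building one: a bib record, its holdingsets, and the items hanging off each holdingset. The namespace URI, the element names, the nesting and the `id` attributes are all assumptions for illustration and should be checked against the ils-di xsd; the holdingset and item ids are made up.

```python
import xml.etree.ElementTree as ET

# Assumed namespace for the ILS-DI schema; verify against the xsd.
DLF_NS = "http://diglib.org/ilsdi/1.1"
ET.register_namespace("dlf", DLF_NS)


def dlf(name):
    """Return a namespace-qualified tag name for ElementTree."""
    return "{%s}%s" % (DLF_NS, name)


def build_record(bib_id, holdingsets):
    """Build a skeleton record.

    holdingsets maps a holdingset id to the list of item ids it contains.
    Element names and nesting here are illustrative assumptions.
    """
    record = ET.Element(dlf("record"))
    ET.SubElement(record, dlf("bibliographic"), {"id": bib_id})
    holdings = ET.SubElement(record, dlf("holdings"))
    for holdingset_id, item_ids in holdingsets.items():
        holdingset = ET.SubElement(holdings, dlf("holdingset"), {"id": holdingset_id})
        for item_id in item_ids:
            # Metadata payloads (MFHD, availability, ...) would go inside each item.
            ET.SubElement(holdingset, dlf("item"), {"id": item_id})
    return record


# Hypothetical ids, except the bib id used as the example in this post.
record = build_record(
    "418855",
    {"copy-1": ["item-101", "item-102"], "copy-2": ["item-201"]},
)
xml_text = ET.tostring(record, encoding="unicode")
```

The point is only the shape: ils-di says which items belong to which holdingsets, and leaves the description of each to whatever metadata you embed.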

This is all great, because there wasn’t a simple standard format to do that in before.  But when you actually want to say something about the holdingsets or items, ils-di just gives you a slot to put some other (hopefully standard) metadata format in.  (With one exception: ils-di gives you “SimpleAvailability” to describe item availability/status, using a human-displayable label and one of four coded SimpleAvailability statuses.  This was wise of them, because there was no good way to provide status from a standard vocabulary without this.)

Now, I think ils-di is exactly right to do things this way. Break the problem into manageable chunks, solve one chunk with a solution meant to do one thing well, and make sure your solution can be ‘loosely coupled’ with other solutions meant to solve the other parts. Fine, good show.

But that still leaves me to figure out how to actually describe what I want to describe, using what XML schemas, standardized if possible. (And leaves the community to arrive at a standard set of these extra schemas at a later date, if we want to write software that really is ‘plug and play’ with each other. Oh well, that’s how it goes, better to try some things and define ‘best practices’ and standards off of what works well, than to try and ‘standardize’ before trying in the wild.)

All my stuff

So what are all the data elements I have that I want to describe somehow, in these extra metadata packages embedded in dlf ils-di?

Well, you can see them right here in uchicago’s custom ad hoc format, what their servlet did out of the box, with this example of a moderately complex serials record:

http://hip-dev.mse.jhu.edu/items/bib/418855.uchicago

So, okay, where to put it?  Well, bibIDs and itemIDs are already in the dlf-schema itself.  So what else do we have?  Marc Format for Holdings Data in MarcXML seems likely.  Maybe ISO Holdings?  Maybe NCIP?

I started with MFHD in marcxml, because NCIP confuses me (and everyone else), and ISO Holdings you need to pay a couple hundred bucks to look at the standard (although you can see the .xsd schema alone for free).

So in MFHD you can put a lot of stuff actually.  Although it’s somewhat confusing to look at, since it uses those obscure marc tag codes and such. But you can put in there:

  • user-displayable ‘location’ and ‘collection’ in tag 852
  • ‘holding’ (ie ‘holdingset’ ie ‘copy’) identifier in tag 001.
  • shelfmark (ie call number/copy information) also in 852.
  • A coded value of whether that call number is LCC, NLM, Dewey, Sudocs, a couple others, or ‘other’ or ‘unknown’. 852 indicator 1.
  • For ‘holdingsets’ user-presentable coverage statements (for main run, supplements, or indexes), in 866-868.
    • ( Note, if my ILS actually had machine-understandable coverage statements, which it does not, you theoretically maybe could put them in MFHD, but I’d much prefer ONIX Serial Coverage, which I think does it much more elegantly and clearly. But I don’t have that data available  anyway.)
  • I think you can provide an un-coded user-presentable item status/availability string somewhere, but SimpleAvailability takes care of that better so I didn’t worry about it.
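As a rough illustration of the list above, here is a Python sketch that builds a small MFHD record in MarcXML. The 852 first-indicator values for shelving scheme and the use of 866 $a for a textual coverage statement follow the MARC 21 holdings format as I understand it; which 852 subfields carry ‘location’ and ‘collection’ ($b and $c here) is an assumption, the sample values are made up, and a real record would also need a leader.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
ET.register_namespace("marc", MARC_NS)

# 852 first indicator: shelving scheme of the call number.
SHELVING_SCHEME_IND1 = {
    "lcc": "0", "dewey": "1", "nlm": "2", "sudocs": "3",
    "other": "8", "unknown": " ",
}


def marc(name):
    """Return a namespace-qualified MarcXML tag name."""
    return "{%s}%s" % (MARC_NS, name)


def add_datafield(record, tag, ind1, subfields):
    """Append a datafield with the given (code, value) subfields."""
    field = ET.SubElement(record, marc("datafield"),
                          {"tag": tag, "ind1": ind1, "ind2": " "})
    for code, value in subfields:
        ET.SubElement(field, marc("subfield"), {"code": code}).text = value
    return field


def build_mfhd(copy_id, location, collection, call_number, scheme, coverage):
    """Build a minimal holdings record (leader omitted in this sketch)."""
    record = ET.Element(marc("record"), {"type": "Holdings"})
    ET.SubElement(record, marc("controlfield"), {"tag": "001"}).text = copy_id
    # Assumption: $b = location, $c = collection, $h = call number.
    add_datafield(record, "852", SHELVING_SCHEME_IND1.get(scheme, " "),
                  [("b", location), ("c", collection), ("h", call_number)])
    for statement in coverage:
        add_datafield(record, "866", " ", [("a", statement)])
    return record


# Hypothetical sample values.
holdings_record = build_mfhd(
    "copy-1", "Main Library", "Periodicals", "QA76 .J68", "lcc",
    ["v.1 (1990)-v.20 (2009)"],
)
```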

Meanwhile, dlf:SimpleAvailability is handling my need for both a coded and user-displayable item status/availability string, great, one thing done well. (Although I needed to create a mapping from my 109 internal ‘item status’ codes to the four dlf:SimpleAvailability values!).
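That mapping is nothing fancy. A minimal Python sketch of the idea, assuming the four coded values are "unknown", "available", "not available" and "possibly available" (check the ils-di spec for the exact strings); the local status codes shown are invented, not real Horizon codes:

```python
from enum import Enum


class SimpleAvailability(Enum):
    # Assumed value strings; verify against the ils-di schema.
    UNKNOWN = "unknown"
    AVAILABLE = "available"
    NOT_AVAILABLE = "not available"
    POSSIBLY_AVAILABLE = "possibly available"


# Hypothetical local item status codes; a real table would cover all of them.
STATUS_MAP = {
    "i": SimpleAvailability.AVAILABLE,            # checked in
    "o": SimpleAvailability.NOT_AVAILABLE,        # checked out
    "l": SimpleAvailability.NOT_AVAILABLE,        # lost
    "tr": SimpleAvailability.POSSIBLY_AVAILABLE,  # in transit
}


def simple_availability(local_code):
    """Map a local status code to a coded value, defaulting to UNKNOWN."""
    return STATUS_MAP.get(local_code, SimpleAvailability.UNKNOWN)
```

Falling back to "unknown" for any unmapped code means a newly added local status never breaks the response; it just shows up as less informative until someone maps it.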

But that still left me with some things I wanted to include.  Well, MFHD gives me user-displayable labels for location and collection. But I really wanted to include my ILS’s internal codes for location and collection and item status. Why would I want purely local internal codes? Well, because applications I’m using to consume this can possibly be configured to make use of them even though they are purely local identifiers (especially if I’m writing the apps myself!).  I also wanted to include ‘item type’ as both an internal code and a user-displayable label, and strangely MFHD has no spot for even user-displayable label for that.  Also similarly wanted to expose my internal system “call number type” id, which is not always mappable to a standard type in MFHD like LCC or DDC or whatever.

I looked over what documentation I could find for NCIP, as well as the NCIP xml schema, didn’t seem to have the fields I needed either. I even looked at the ISO Holdings schema without any documentation (my skills at reading raw XML schemas have improved muchly through this project). Nope, not there.

So, what?

Ross Singer had an idea that you could do this purely with DublinCore (including refinements in ‘dcterms’) and RDF. That might be possible, but I just couldn’t figure out how to do it. But really, I don’t think there are sufficient elements in dc:terms to cover all of those data elements, although Ross found some clever ways to try and express a few of them (Ross trying to do a bit MORE than I really needed, since he didn’t want to depend on the dlf-di schema but I’m just trying to get some metadata I can embed in dlf-di for now, that’s my use case).

So I guess there’s theoretically some way to express your own refinements to dcterms?  But I got lost trying to figure that out.

So one way or another,  I figured I was going to define my own vocabulary. I could do it as an RDF Vocabulary alone, but I got confused trying to think about that, and once you go to trying to express that in RDF-XML… got confused again.  Or I could do it in a custom XML Schema.  If I’m going to have to create my own vocabulary anyway, XML Schema just seemed simpler, both to produce and to consume. (And it would be easy for me or someone else to convert this to RDF at a later date, starting from a schema.  RDF-XML even lets any defined XML namespace pretty much be RDF out of the box, just add a few RDF attributes here or there!)

So custom schema it was. I created (or am in the middle of creating) an awfully simple XML schema for these elements I needed, mostly internal ILS values, and for each one the schema says you can supply one or more (internal or external) identifiers using a child dc:identifier, a user-displayable label using a child dc:title, and if you like a longer-format user-displayable description. (Didn’t re-use dc:description for this because I really wanted a couple extra attributes there seemed to be no way to add to a dc:description).
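To give a feel for the shape of such an element, here is a small Python sketch that builds one: a local element with one or more dc:identifier children, a dc:title label, and an optional longer description in the local namespace. The local namespace URI, the element names (`itemType`, `description`) and the sample values are placeholders made up for illustration, not the names in my actual schema.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
# Hypothetical placeholder namespace for the as-yet-unnamed local schema.
LOCAL_NS = "http://example.org/ils-local/"
ET.register_namespace("dc", DC_NS)
ET.register_namespace("local", LOCAL_NS)


def coded_value(name, identifiers, label, description=None):
    """Build a local element carrying identifiers, a label, and a description."""
    element = ET.Element("{%s}%s" % (LOCAL_NS, name))
    for identifier in identifiers:
        ET.SubElement(element, "{%s}identifier" % DC_NS).text = identifier
    ET.SubElement(element, "{%s}title" % DC_NS).text = label
    if description is not None:
        # A local description element rather than dc:description,
        # so that extra attributes could be added to it.
        ET.SubElement(element, "{%s}description" % LOCAL_NS).text = description
    return element


# Hypothetical item type with an internal code and an external identifier.
item_type = coded_value(
    "itemType",
    ["bk", "http://example.org/itemtypes/bk"],
    "Book",
    "Circulating monograph",
)
item_type_xml = ET.tostring(item_type, encoding="unicode")
```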

Here it is, work in progress. (Not even sure if this validates yet).

The (not so) final product

So here it is, the current version of a dlf ils-di document produced live from my (development box) Horizon, including in its metadata payload MFHD in marcxml, dlf:SimpleAvailability, and my custom as yet un-named schema.

See for example this same moderately complicated serials record:

http://hip-dev.mse.jhu.edu/items/bib/418855

Where to next?

Well, I’ve got to finish polishing it off, make sure all the XML validates against the schemas, make sure the new schema I created is really valid, etc.  Polish off a few more things.

Then, I’d like to put this code (derived from uchicago’s code, with their permission) on Google Code, so that other Horizon institutions can use it to provide dlf ils-di responses from their catalogs, woo.  (I tried to keep the code as generalizable as possible — for instance, the mapping from your local item status codes to the four dlf:SimpleAvailability values is configurable in a properties file).

I’ve also got my eyes on DAIA as another metadata schema to include in the dlf ils-di response eventually.  DAIA is focused on doing what SimpleAvailability does, but with more detail: What services are available, and what are the URL access points for each service? I need to figure out how to correctly extend DAIA to include services that aren’t in DAIA’s built-in four. (I specifically need the service ‘get a photocopy of a portion of this item’, and ‘place an ILS request/hold for pickup at circ desk’, two services we offer that DAIA doesn’t specify right now).

And Ross tells me what I’ve done so far has gotten me a lot of the way to a jangle implementation. Great, that was part of the goal, so apparently it succeeded. I’ll finish off the rest of jangle when I have a use case that demands it, which could be sooner or later! (And first I’ll need to understand jangle better!)


7 Responses to exposing holdings in dlf ils-di standard format web service

  1. Jakob says:

    About your arguments whether RDF or XML: It does not matter as long as you define a common data model and use URIs as identifiers – it’s more important which entities exist and what in the real world they refer to than how they are encoded in a specific format. This is the way I chose with DAIA, which can be expressed in XML, JSON, and RDF (I am still working on the OWL Ontology for DAIA/RDF but it will come). The missing DAIA services could be added – I only wonder whether a taxonomy of services makes sense: ‘place an ILS request/hold for pickup at circ desk’ is probably a special case of ‘loan’ and ‘get a photocopy of a portion of this item’ is a special case of ‘openaccess’.

  2. jrochkind says:

    Thanks Jakob.

    I agree that at a sort of basic conceptual level, the vocabulary/ontology matters more than how you serialize it. But at a practical level, at least for relative beginners like me, how you serialize it matters too, figuring out how to do it in a legal way etc. It’s great that you provided standard ways to encode DAIA in multiple ways, but I don’t necessarily have time to do that right up front for something I’m working on (of course if it catches on, other standard serializations can always be defined; conveniently, if you have an XML Schema, you kind of get a RDF-XML serialization defined for you, since the RDF XML spec seems to provide generalized instructions for turning an XML Schema into RDF).

    Anyway, back to DAIA. Is there any way for me as an implementer to add my _own_ extension services to DAIA? I’m not sure if it makes sense to expand the DAIA taxonomy itself — it probably requires more thought and/or experimentation to see what will be needed generally. But I _know_ I need ‘place ILS request’ and ‘make photocopy’ myself in my app — they might be special cases of ‘loan’ and ‘openaccess’, but I still need to distinguish them from those general cases.

    Is there any way for me to do that as my own custom extension, without needing to wait for/convince you to expand DAIA itself? Was that contemplated in the DAIA spec? Does it make sense to maybe plan for it, and explicitly allow local extensions of the service taxonomy?

    Cause one way or another, I’m going to need it. If DAIA doesn’t allow it, then either I’ll have to figure out a way to do it ‘illegally’ with DAIA anyway — or I won’t be able to use DAIA, which would be terrible since DAIA is pretty much designed just for what I need.

  3. jrochkind says:

    PS: The DAIA wiki page says of ‘openaccess’: “openaccess – an item can be used imediately without any restrictions by the institution, you don’t even have to give it back. This applies for Open Access publications and free copies ”

    How “immediately” is “immediately”? The photocopy case is one where you put in a request, and then might have to wait several business days to get a photocopy either physically delivered to you or emailed to you. Is this a subset of ‘openaccess’?

    But whether it is or not, I’m still going to need to distinguish ‘digital copy available immediately’ from ‘photocopy available on request’ in my DAIA response, because I’m going to need my client apps to be able to tell the difference. Likewise with “available for loan in general” and “available for an ILS ‘request’ function”, I have some things that are the former but not the latter, and need my client apps to be able to distinguish.

  4. Jakob says:

    We thought about allowing custom services like “photocopy” in DAIA but wanted to first keep it strict before everyone starts to add their own extension. I think the best solution is to extend DAIA to allow any URI in addition to “presentation”, “loan”, “openaccess”, “interloan”, so just define a URI for ‘photocopy available on request’ and put it in the DAIA service field. I will update the DAIA XML Schema and the format description and write that you SHOULD use the four basic services but you MAY use other services – but general clients will ignore them.

  5. jrochkind says:

    That sounds like a great solution to me, thanks Jakob.

    Per your first idea, I guess it _might_ be good if it were possible to allow a service to say it’s a “loan”, but SPECIFICALLY also a [custom extended type, such as ‘request place’]. That way standard clients could still treat it like ‘loan’, but clients ‘in the know’ could know something more specific about it.

    But I’m not sure how to account for that in the schema. As a first step, what you suggest seems sufficient to me; you can see how it goes and how people use it, to see if something more sophisticated is required.

    As it is, with the ability to use a custom service type like this, I definitely will be using it, producing and consuming.

    For some uses I’m contemplating, the client will simply display ALL of the service types to the user, without the software needing to care what they are. For those cases, I suppose even a client ‘not in the know’ could simply display these ‘extended’ services in the list too, without needing to know what they are. I forget, does DAIA have a field for a user-displayable description or prompt about the service, in addition to the URI class?

  6. jrochkind says:

    And, while I’m bugging you, one more thing I thought of: DAIA does allow me to include a “limitation” as a textual description for the user. Is there any way to include a coded value from a vocabulary for the limitation? Ah, is this what the href and/or id fields are for? Using my own custom vocabulary, not one included in DAIA of course. When would I use the id as opposed to the href?

    In a similar use case area, if I want to write my service such that the client CAN provide a particular user ID in a query parameter in the query requesting the DAIA response, and if provided, my software will return a DAIA response including ONLY services available to THAT specific user… does that seem appropriate?

  7. Jakob says:

    limitation.id can hold a coded value for the limitation type and limitation.href can hold a hyperlink to provide more information about the limitation. IFF your URLs are 100% stable and return RDF or HTML depending on the client, you can use the same URI/URL for both.

    The second use case can be implemented in a DAIA service with an additional HTTP parameter. I recommend to use URL rewriting to support query URLs like “http://example.com/daia/USERID/?id=DOCUMENTID”, but you can also use http://example.com/daia/?id=DOCUMENTID&user=USERID
