Why a known item service infrastructure?

It occurred to me a while ago that Umlaut isn’t just a ‘link resolver front end’ or an ‘improved link resolver’. It is those things, but when you improve a link resolver enough, and pay attention to all forms and genres (not just journals), what you get is what I’m clunkily calling a Known Item Service Provider: an additional piece of library infrastructure.

I’ve come to think that this is in fact an essential tool that most library digital infrastructure is missing. As an infrastructural tool, it’s not necessarily designed to answer just one question for one very particular use case; it’s designed to answer the general question (for both people and machine access): “What can you tell me or do for me about item X?”

Andy Powell brings up a specific question/use case that’s a subset of this: If I know a print book I’m interested in, and likely even know its ISBN, does the library have a licensed ebook version? And secondarily, is there an ebook version in existence, whether or not the library licenses it?

This is definitely something in Umlaut’s domain. How well does Umlaut do at answering it? On the second question, ‘does an ebook exist whether or not we license it’, currently not very well. But if external sources of data (with APIs) could be identified to answer it (as Andy begins doing), plugins to Umlaut could be written to grab that data and make Umlaut’s answer better for this specific use (and perhaps improve other unexpected uses too, since you’ve improved the infrastructural tool).
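
To make the plugin idea concrete, here is a rough sketch in Python. It is not Umlaut's actual plugin API; the `resolve` function, the dict-shaped citation and responses, and the `toy_catalog_plugin` data are all hypothetical, invented purely for illustration. The only point is the shape: each source of data is a small independent piece, and the infrastructure pools whatever they return.

```python
from typing import Callable, Dict, List

# Hypothetical shape: a plugin takes the citation being asked about and
# returns zero or more service responses (plain dicts here).
Plugin = Callable[[Dict[str, str]], List[Dict[str, str]]]


def resolve(citation: Dict[str, str], plugins: List[Plugin]) -> List[Dict[str, str]]:
    """Ask every plugin about the item and pool whatever they return."""
    responses: List[Dict[str, str]] = []
    for plugin in plugins:
        try:
            responses.extend(plugin(citation))
        except Exception:
            # One failing external source should not sink the whole answer.
            continue
    return responses


def toy_catalog_plugin(citation: Dict[str, str]) -> List[Dict[str, str]]:
    """Stand-in for a real lookup; the data is hard-coded and made up."""
    if citation.get("isbn") == "0000000000":
        return [{"type": "fulltext", "note": "licensed ebook"}]
    return []
```

Adding a new external data source then means writing one more function and adding it to the list passed to `resolve`, without touching the others.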

The first one, does the library have an ebook version, Umlaut does better at, at least at our library.

This works because our library has endeavored to list most of the ebooks we have in our catalog, and Umlaut tries searching the catalog. But its success depends on:

  • We have a record in the catalog OR in our link resolver knowledge base for the ebook. (Umlaut tries to combine both sources of information).
  • Umlaut successfully finds it, which is somewhat trickier than it sounds, since Umlaut uses some heuristic algorithms to try and balance precision (minimize false positives) with recall (minimize false negatives), as well as avoiding duplicate information when data exists in both the catalog and the link resolver.
    • Sometimes the ebook record in our catalog has the print ISBN on it too. This makes Umlaut’s job easier. I’m not sure if the SFX knowledge base puts print ISBNs on ebook records.
    • Sometimes Umlaut will do a title-author search of our catalog, but whether it does or not is related to complicated heuristics, which could be tuned for this use case and our data if we put some time into it.
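
The matching steps in the list above can be sketched roughly as follows. This is a minimal illustration in Python, not Umlaut's actual code or heuristics: the `Record` shape, the `find_ebooks` function, the exact-match title-author fallback, and deduplicating by URL are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Record:
    """Hypothetical, simplified ebook record from the catalog or the knowledge base."""
    source: str   # "catalog" or "kb"
    title: str
    author: str
    url: str
    isbns: frozenset = field(default_factory=frozenset)


def normalize(text):
    """Lowercase and keep only letters/digits so trivial differences don't block a match."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


def find_ebooks(isbn, title, author, records, allow_title_author=True):
    """Return ebook records matching the request, one per URL.

    Step 1 (favors precision): match on ISBN, which works when the ebook
    record also carries the print ISBN.
    Step 2 (favors recall, more risk of false positives): only if step 1
    found nothing, fall back to a normalized title + author match.
    """
    wanted = isbn.replace("-", "")
    hits = [r for r in records if wanted in r.isbns]
    if not hits and allow_title_author:
        key = (normalize(title), normalize(author))
        hits = [r for r in records
                if (normalize(r.title), normalize(r.author)) == key]

    # The same ebook may be listed in both sources; keep one entry per URL.
    seen, unique = set(), []
    for r in hits:
        if r.url not in seen:
            seen.add(r.url)
            unique.append(r)
    return unique
```

The `allow_title_author` switch is the kind of knob the tuning mentioned above would turn: leaving the fallback off avoids false positives at the cost of missing ebooks whose records lack the print ISBN.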

But in fact, it does a reasonably good job anyway. Here are some example Umlaut URLs which take a print ISBN and tell you “what can the library do or provide for this item”, with results that include licensed ebooks. I’ll include a few title-author inputs as well, to show that’s feasible too.

It’s definitely far from perfect. I showed you some successful positives; finding false negatives would take more time, but I’m sure they’re in there. (We generally tune Umlaut to avoid false positives, so those are less likely, but there are surely a few.)

Umlaut doesn’t use xISBN or any other “work set expander” service right now. That would be one obvious improvement I’d hope to make sometime, although ideally not before collecting some kind of evidence on how often Umlaut fails for certain tasks in ways that a “work set expander” would fix. There are other data sources, and other tunings to Umlaut’s heuristics, that could be tried too.
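
One way that kind of evidence could be collected, sketched in Python under some assumptions: you have a list of print ISBNs that you already know have a licensed ebook, a `lookup` callable that says whether the ebook was found for a given ISBN, and an `expand` callable standing in for a work set expander. All three are hypothetical placeholders supplied by the caller, not real Umlaut or xISBN calls.

```python
def recall(known_positive_isbns, lookup, expand=None):
    """Fraction of ISBNs, each known to have a licensed ebook, for which one is found.

    lookup(isbn) -> bool and expand(isbn) -> iterable of related ISBNs are
    hypothetical callables supplied by the caller.
    """
    if not known_positive_isbns:
        raise ValueError("need at least one known-positive ISBN")
    found = 0
    for isbn in known_positive_isbns:
        candidates = [isbn]
        if expand is not None:
            # Also try every other ISBN in the same work set.
            candidates.extend(expand(isbn))
        if any(lookup(c) for c in candidates):
            found += 1
    return found / len(known_positive_isbns)
```

Comparing `recall(sample, lookup)` against `recall(sample, lookup, expand)` on the same sample gives a rough number for how many false negatives expansion would actually fix. It says nothing about any false positives expansion might introduce, which would need a separate check.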

But I think it acquits itself pretty admirably anyway. The point is that Umlaut, as an attempted platform serving as a “Known Item Service Provider”, is a general-purpose tool that can handle this specific use case among many others. The beauty of a general-purpose tool is that when you improve it for a certain use case, you get unintended benefits to other use cases you hadn’t yet considered, instead of just having a very specific tool for a very specific use case. I propose that a Known Item Service Provider like Umlaut ought in fact to be a key part of an academic library’s infrastructure.


6 Responses to Why a known item service infrastructure?

  1. Pingback: Why a known item service infrastructure? « Bibliographic Wilderness « Builder

  2. I have to differ about the coverage of ebooks in the catalog – we have thousands if not tens of thousands of ebooks that are not in the catalog (and I’m not only talking about the 4-5 collections we just licensed this month). The catalog is in no way a reliable indication if we have an ebook. With that said, this could be *the* most important and useful improvement to Umlaut. The xISBN expansion might help some.
    For some publishers in the sciences, you know where to find it if there’s an ebook (Springer, CRC, O’Reilly). But figuring out if there’s an ebook version of most books is difficult and time consuming – we try to do that a lot for our collection development. So anyway – any work in this area is greatly appreciated!

  3. jrochkind says:

    Hmm Christina, well your estimation of ebook coverage in the catalog differs from what I think (unless I’ve misunderstood) is that of our TS department; I won’t take a side, I don’t have any data or experience. At any rate, it’s quite a bit higher than it used to be. Even thousands may be only a few percent; I don’t know our total ebook collection these days.

  4. My understanding is that electronic items tend to be cataloged differently depending on how they are acquired. Ebooks acquired in bulk, as part of large packages, could sometimes be handled the way that “big deal” serials are: only accessible certain ways, and not necessarily fully cataloged.

    Anyway, I came to comment since I like the “known item service infrastructure” conceptualization of Umlaut. It seems truer than anything I’ve heard you describe it as yet!

  5. jrochkind says:

    Heh, Jodi, I’m pretty sure this is the same concept I used to describe Umlaut in the last blog post that you told me was basically still confusing and not there yet. Perhaps I’m getting better at describing it, or perhaps you’ve been hearing it enough that it’s starting to make sense.

    ebooks definitely typically have a different cataloger workflow at most libraries — if they are included in cataloger workflow at all. (Same with e-journals; as more and more of our content is ‘e’, this is perhaps somewhat troubling). But at our library our cataloging/technical services department has done a pretty good job of getting a LOT of ebook content into our catalog as MARC. (Whether it really needs to/ought to be MARC is a different question). Better than many libraries I’ve seen, so kudos to them on that, if still not as complete coverage as some (Christina!) might like.

  6. dakvid says:

    Perhaps this functionality will become a reality alternatively through the new discovery layer products – e.g. Summon, Primo (with Primo Central) and EBSCO Discovery Service.

    As I understand it, each of these products has deduplication processes, so if you search for a book and there are print and electronic versions you’ll be shown a single combined result showing you both options.

    It may not be the full implemented reality at present for some or all of them, but I think that’s what they’re aiming for at least.
