Interesting (lack of metadata) problem

So, my new features look for “search inside” functionality in GBS (Google Book Search), Amazon, and HathiTrust.

In GBS and HathiTrust, we can look up by OCLC number. This is ordinarily very useful, allowing us to catch ‘hits’ we wouldn’t have caught otherwise, for instance for books without an ISBN.

But here’s a case where… well, it almost works too well. Or to put it better, it doesn’t work well because of lack of sufficient metadata about what’s really going on.

My link resolver gets an article-level citation for an article in the journal Educational Studies. Somehow, Umlaut gets an OCLC number for the journal too. I’m not sure whether it was provided with the original citation (quite possible) or “enhanced” by Umlaut. Either way, having the OCLC number is good: it allows us to look up print runs in our local catalog, for instance. That it’s title-level rather than article-level metadata just makes it analogous to the ISSN, which is usually present in an OpenURL.

But then we go to HathiTrust and GBS and we look up the OCLCnum: 1567620

And we find hits! Because University of Michigan has digitized one volume of that journal, and represented it on a record identified with that oclc number. Which identifies the entire journal.
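For concreteness, here is a minimal Python sketch of that lookup. It assumes the HathiTrust Bibliographic API's "brief" JSON endpoint keyed by OCLC number, and that each entry under "items" carries an "htid" and an "enumcron" (the volume description). The helper names are hypothetical, and the sample payload is invented to show the shape; it is not real HathiTrust data.

```python
import json

# Assumed URL pattern for the HathiTrust Bibliographic API (brief record, by OCLC number).
HATHI_BRIEF_URL = "https://catalog.hathitrust.org/api/volumes/brief/oclc/{oclc}.json"


def hathi_lookup_url(oclc):
    """Build the lookup URL for a title-level OCLC number."""
    return HATHI_BRIEF_URL.format(oclc=oclc)


def summarize_items(payload):
    """Return (htid, enumcron) pairs for every digitized item in a response.

    A missing or false enumcron is normalized to None.
    """
    return [
        (item.get("htid"), item.get("enumcron") or None)
        for item in payload.get("items", [])
    ]


# Invented placeholder payload: one digitized volume hanging off a journal-level record.
sample = json.loads("""
{
  "records": {"000000000": {"titles": ["Educational studies."]}},
  "items": [
    {"htid": "mdp.00000000000000", "enumcron": "v.1", "fromRecord": "000000000"}
  ]
}
""")

print(hathi_lookup_url("1567620"))
print(summarize_items(sample))
```

The point of the sketch is that the match happens on the journal's identifier, while the only clue about which volume was digitized is a free-text string on the item.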

So Umlaut decides that there is “search inside” functionality at HathiTrust, and provides the link. But Umlaut has no way to warn the user that “it’s only search inside of a particular volume”, or even better (but completely impossible) to check whether that particular volume includes the article being cited, and so whether the link is useful for that citation at all. No way to do it. Our library metadata practices just aren’t good enough.
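What can be detected is much coarser: whether the incoming citation looks like a journal or article while the hit came from a title-level identifier. A minimal sketch of that check in Python follows. The keys genre, issn, and atitle are the usual OpenURL journal-format keys; the function names, the set of identifier types, and the label wording are all hypothetical, and this is not how Umlaut actually does it.

```python
# Identifier types that describe a whole title rather than one volume or article.
# (Hypothetical grouping for illustration.)
TITLE_LEVEL_IDS = {"oclc", "issn", "lccn"}

SERIAL_GENRES = {"article", "journal", "issue"}


def is_serial_citation(citation):
    """Guess whether an OpenURL-style citation refers to a journal or article."""
    genre = (citation.get("genre") or "").lower()
    return (
        genre in SERIAL_GENRES
        or bool(citation.get("atitle"))
        or bool(citation.get("issn"))
    )


def search_inside_label(citation, matched_on):
    """Pick a link label, hedging when a serial was matched on a title-level ID."""
    if is_serial_citation(citation) and matched_on in TITLE_LEVEL_IDS:
        return "Search inside (may cover only some volumes of this journal)"
    return "Search inside"
```

This only tells the user that coverage might be partial. It still cannot say which volumes are covered, or whether the cited article is in one of them.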

Umlaut also finds a hit at GBS (another copy of the same digitized manifestation, which U of M digitized as part of their HathiTrust partnership). But while it is searchable in GBS, the GBS API doesn’t advertise that fact (GBS for some reason chooses not to tell you about some things that really do have search-inside and/or limited previews), so Umlaut doesn’t provide a GBS search-inside link, just a “Book Information at” link. Yeah, it’s calling it a book when it really shouldn’t be, but GBS’s interface does too!
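The decision being described can be sketched roughly like this in Python. It assumes the response entry has a "preview" field with values such as "noview", "partial", or "full", as in Google's Dynamic Links viewability responses. The function name and the sample entry are made up for illustration, and the "preview" value shown is simply what the situation above implies, not a recorded API response.

```python
# Preview levels that we treat as evidence of search-inside (an assumption).
SEARCHABLE_PREVIEWS = {"partial", "full"}


def gbs_link_label(entry):
    """Choose a link label based only on what the API response advertises."""
    if entry.get("preview") in SEARCHABLE_PREVIEWS:
        return "Search inside"
    # Nothing advertised: fall back to a plain information link.
    return "Book Information"


# Invented entry: the volume is searchable on the site, but the API says "noview".
entry = {"bib_key": "OCLC:1567620", "preview": "noview", "embeddable": False}
print(gbs_link_label(entry))
```

Since the label depends only on what the API reports, a volume that is searchable but reported as "noview" ends up with the weaker link.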


4 thoughts on “Interesting (lack of metadata) problem”

  1. The HathiTrust API is going to be updated to try to deal with this in some way, but the item-level description we have about what volumes/issues are bound together is, of course, dicey and probably impossible to reliably parse with a machine. Time will tell how useful it is…

  2. Thanks Bill!

    I am unfortunately all too acquainted with the misery of trying to do useful machine processing on our collective serial holdings data. I’ve blogged about it before, and really wish that the cataloging community would understand what severe limitations present practices place on software. Or maybe they do realize it, but just have other priorities? I don’t know.

    But anyway, whatever you can do in this area, I’ll undoubtedly find some way to take advantage of it.

  3. We’ve been seeing that too. I’m almost at the point of checking for an ISSN or genre that identifies a journal/article and turning google/hathi/etc off for those. It’s quite misleading.

  4. Interesting abrin, I didn’t know too many other people were doing this with both google and hathi, and for journals/articles too! Are you doing it in your link resolver? Can you show us some demos? Would love to see how you are doing it.

    I think there is potentially some usefulness to being able to search over even just a portion of a journal. The trick is making sure the user knows what they are doing.
