Of ISBN reliability, and the importance of metadata

I write a lot of software that tries to match bibliographic records from one system to another, to try and identify availability of a desired item in various systems.

For instance, if you’re in Google Scholar, you might click on an OpenURL link to our university. My software then wants to figure out whether the item you clicked on is available from any of our various systems: primarily the catalog, or the BorrowDirect consortium, or maybe a free digital version from HathiTrust, or a few other places.

You could instead be coming from EBSCOHost, or even WorldCat, or any other third-party platform that uses OpenURL to hand off a citation to a particular institution.

The best way to find these matches is with an identifier, like ISBN, OCLCnum, or LCCN. When searching by just author/title, it’s a lot harder for software to be sure it’s found the right thing, and only the right thing. This is why we use identifiers, right? And ISBN is by far the most popular one, the one most likely to be given to my system by a third-party system like Google Scholar. Even though there are many titles that don’t have ISBNs, Google Scholar won’t send me an OCLCnum or LCCN. We have to work with what we’ve got.

One problem that can arise is when an ISBN seems “wrong” somewhere.

Recently, I was looking for a copy of W.E.B. DuBois’ Darkwater, after I learned that DuBois had written a speculative fiction short story that appears in that collection! (You can actually read it for free online at Project Gutenberg, but I wanted the print version.)

When I followed an OpenURL link from WorldCat, WorldCat gave my system this ISBN for that title: 9780743460606. That 13-digit ISBN translates into the 10-digit ISBN 074346060X. (Every 10-digit ISBN has an equivalent 13-digit ISBN which represents the same assigned ISBN, just in 13-digit form.)
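To make that translation concrete, here is a minimal Python sketch of the conversion in both directions. The function names are my own for illustration (this is not Umlaut’s code), and it assumes the standard check-digit rules: ISBN-10 uses weights 10 down to 2 modulo 11, with “X” standing for 10, and ISBN-13 uses alternating weights 1 and 3 modulo 10. Only 13-digit ISBNs with the 978 prefix have a 10-digit form.

```python
from typing import Optional


def normalize(isbn: str) -> str:
    """Strip hyphens and spaces, and uppercase a trailing 'x'."""
    return isbn.replace("-", "").replace(" ", "").upper()


def isbn10_check_digit(first9: str) -> str:
    """Check character for the first nine digits of an ISBN-10."""
    total = sum(int(d) * w for d, w in zip(first9, range(10, 1, -1)))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def isbn13_check_digit(first12: str) -> str:
    """Check digit for the first twelve digits of an ISBN-13."""
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(first12))
    return str((10 - total % 10) % 10)


def isbn13_to_isbn10(isbn13: str) -> Optional[str]:
    """Return the ISBN-10 form, or None if there is no such form."""
    digits = normalize(isbn13)
    if len(digits) != 13 or not digits.isdigit() or not digits.startswith("978"):
        return None
    core = digits[3:12]  # drop the 978 prefix and the old check digit
    return core + isbn10_check_digit(core)


def isbn10_to_isbn13(isbn10: str) -> str:
    """Return the 978-prefixed ISBN-13 form of an ISBN-10."""
    chars = normalize(isbn10)
    if len(chars) != 10 or not chars[:9].isdigit():
        raise ValueError("not a 10-character ISBN: %r" % isbn10)
    first12 = "978" + chars[:9]
    return first12 + isbn13_check_digit(first12)


print(isbn13_to_isbn10("9780743460606"))  # 074346060X
print(isbn10_to_isbn13("0-7434-6060-X"))  # 9780743460606
```

Normalizing both forms to one canonical shape before searching is what lets a lookup on the 13-digit ISBN hit a catalog record that only carries the 10-digit one.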

When my local system powered by Umlaut searched our local catalog for that ISBN — it came up with an entirely different book!  Asphalt, by Carl Rux, Atria Books, 2004.  Because my system assumes if it gets an ISBN match it’s the right book, it offered that copy to me, and I requested it for delivery at the circ desk — and only when it gave me a request confirmation did I realize, hey, that’s not the book I wanted!
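One cheap guard against this kind of false match would be to compare titles after an ISBN hit and flag the result when they clearly disagree. The sketch below is an assumption about how such a check could look, not what Umlaut actually does; the function names and the 0.6 threshold are hypothetical and would need tuning against real records.

```python
import re
from difflib import SequenceMatcher


def normalize_title(title: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", title.lower())
    return " ".join(cleaned.split())


def titles_agree(requested: str, candidate: str, threshold: float = 0.6) -> bool:
    """Return True if two titles plausibly name the same work."""
    a, b = normalize_title(requested), normalize_title(candidate)
    if not a or not b:
        return False  # nothing to compare, so do not vouch for the match
    # One record may carry a subtitle the other lacks, so accept a
    # whole-word prefix match in either direction.
    if a == b or a.startswith(b + " ") or b.startswith(a + " "):
        return True
    # Otherwise fall back to a rough character-level similarity score.
    return SequenceMatcher(None, a, b).ratio() >= threshold


# The ISBN matched, but the titles do not: treat the hit as suspect.
print(titles_agree("Darkwater", "Asphalt"))
# A subtitle on one side should not break the match.
print(titles_agree("Darkwater", "Darkwater: Voices from Within the Veil"))
```

A check like this would not tell you which record is wrong, only that the ISBN alone should not be trusted for this item, which is enough to avoid offering the wrong book for request.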

It turns out that not only OCLC but the LC Catalog itself lists that same 10-digit version of the ISBN on two different bibliographic records for two entirely different titles. (I don’t know if catalog.loc.gov URLs are persistent, but try this one.) They are LCCN 2003069067, for an edition of DuBois’ Darkwater, Washington Square Press, 2004; and LCCN 2003069638, for a 2004 Atria Press edition of Rux’s Asphalt. In both records in catalog.loc.gov, that same ISBN 074346060X appears in a MARC 020$a as an ISBN.

So what’s going on? Is there an error in the LC cataloging, which made it into WorldCat and many, many libraries’ catalogs? Or did a publisher illegally re-use the same ISBN twice? (The publisher names appear different, but perhaps they are two different imprints of the same publisher? How else would they wind up with the same ISBN prefix?)

I actually don’t know. I did go and get a copy of the 2004 Atria Press Asphalt by Rux from our stacks. It’s a hardcover and no longer has its dust jacket, as is typical. But on the verso, in the LC Cataloging-in-Publication data, it lists a different ISBN: “ISBN 0-7434-7400-7 (alk. paper)”. It does not list the 074346060X ISBN. I think the 074346060X may really belong to the DuBois 2004 Washington Square Press edition? In some cataloging records for the Asphalt/Rux, both ISBNs appear, as repeated 020s.

It took me quite a bit of time, a couple of hours at least, to get to the bottom of this (which I still haven’t actually done). I did it because I was curious, and I wanted to make sure there wasn’t an error in my software. We can’t really “afford” to do this with every mistake or odd thing in our data. But this is a reminder that our software systems can only be as good as our data. And data can be very expensive to fix. Let’s say this is an error in LC, and LC fixes it: I have no idea how long it would take to make it to WorldCat, or to individual libraries. There are many libraries that don’t routinely download updates/changes from WorldCat, and the correction would probably never make it to them. (If you have a way to report this to LC and feel like it, feel free to do so and update us in comments!)

It’s also a reminder that periodically downloading updates from WorldCat, to sync your catalog to any changes in the central system, is a really good idea. It’s time-consuming enough for one person to notice an error like this (if it is an error) and figure out how to report it, and for someone to fix it. That work should result in updated records for everyone, not just individual libraries that happen to notice the issue and manually download new copy or fix it.

It may not be a cataloging error. Publishers have sometimes assigned the same ISBN to more than one title, whether due to a software error on their part or from not understanding how the ISBN system works. Sometimes a publisher figures that if the ISBN was previously used in an edition that’s been out of print for 20 years, why not re-use it? This is not allowed by the ISBN system, and it causes havoc in computer systems when a publisher does so. But the ISBN registrars could probably be doing a better job of educating publishers about this (it’s not mentioned in Bowker’s FAQ, maybe they think it’s obvious?). Or even applying some kind of financial penalty if a publisher does this, to make sure there’s a disincentive?

At any rate, as the programmers say: Garbage In, Garbage Out. Our systems can only work with the (meta)data they’ve got, and our catalogers’ and metadata professionals’ work with our data is crucial to our systems’ ultimate performance.


2 Responses to Of ISBN reliability, and the importance of metadata

  1. mikemonaco says:

    As a cataloger, and particularly as a cataloger in a system that uses ISBN as the preferred match point for records, I can tell you ISBNs are much less unique than they were intended to be. Re-use of old ISBNs by a publisher is one problem; some publishers keep using the same ISBN for all editions of frequently updated exam-prep books; some publishers just seem to make up any old 10 or 13 digit number. Self-publishing has contributed to the problem a little too. Vendors provide catalog records with ISBNs that are not correct too. I guess from a publisher’s point of view, all that really matters is being able to use an ISBN on invoices, so if an out-of-print item used it, there is no real consequence for them.
    Of course cataloging errors contribute to the problem too, and some cataloging agencies like the Library of Congress will enter all ISBNs appearing on an item’s copyright page into their record, although they are for different editions or versions (ebooks, large print, etc.). Sometimes it’s a typo, or accidentally cut & pasted from another record that has data we wanted to carry over to the new record. Sometimes multiple ISBNs are used for various printings and bindings that are bibliographically identical (same imprint, dimensions, etc.), and that is a headache for other reasons.
    In the past 10 years or so, major bibliographic utilities like OCLC have accepted an avalanche of brief records harvested from ONIX data in publishers’ sales catalogs, contributing further to the duplication of records and mis-application of ISBNs, since they are at best entered by clerks, and at worst OCR scans of title pages. The MARC 020 tag for ISBNs can use a subfield z to indicate a number recorded there is invalid, and I see it used both when the number itself is invalid (wrong check digit, etc.) and when the number is non-unique (so invalid in the sense that it is not unique to the item in hand).
    Anyway, the way to report an error to LC is to pull up the record in their OPAC (http://catalog.loc.gov) and use the link on the bottom of the left-hand sidebar (“Report record errors”). Not having either item in hand, I can’t confidently report it as an error; it may very well appear in both books, so it is correctly transcribed.

  2. Pingback: Latest Library Links 24th April 2015 | Latest Library Links
