I write a lot of software that tries to match bibliographic records from one system to another, to try and identify availability of a desired item in various systems.
For instance, if I’m in Google Scholar, I might click on an OpenURL link to our university. My software then wants to figure out if the item I clicked on is available from any of our various systems: primarily the catalog, or the BorrowDirect consortium, or maybe a free digital version from HathiTrust, or a few other places.
You could instead be coming from EBSCOHost, or even WorldCat, or any other third-party platform that uses OpenURL to hand off a citation to a particular institution.
The best way to find these matches is with an identifier, like ISBN, OCLCnum, or LCCN. When searching by just author/title, it’s a lot harder for software to be sure it’s found the right thing, and only the right thing. This is why we use identifiers, right? And ISBN is by far the most popular one, the one most likely to be given to my system by a third-party system like Google Scholar. Even though there are many titles that don’t have ISBNs, Google Scholar won’t send me an OCLCnum or LCCN. We have to work with what we’ve got.
One problem that can arise is when an ISBN seems “wrong” somewhere.
Recently, I was looking for a copy of W.E.B. DuBois’ Darkwater, after I learned that DuBois had written a speculative fiction short story that appears in that collection! (You can actually read it for free on Project Gutenberg online, but I wanted the print version).
Coming from an OpenURL link from WorldCat, WorldCat gave my system this ISBN for that title: 9780743460606. That 13-digit ISBN translates into a 10-digit ISBN 074346060X. (Every 10-digit ISBN has an equivalent 13-digit ISBN which represents the same assigned ISBN, just in 13-digit form).
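The conversion between the two forms is mechanical, so it can be sketched in a few lines of Python. This is only an illustration, not the code Umlaut uses, and the function names are made up for the example. It relies on the standard check-digit rules: ISBN-10 uses weights 10 down to 2 modulo 11 (with "X" standing for 10), and ISBN-13 uses alternating weights 1 and 3 modulo 10. The reverse direction only works for 13-digit ISBNs with the 978 prefix.

```python
def isbn10_check_digit(first9):
    """Check character for the first 9 digits of an ISBN-10 (weights 10..2, mod 11)."""
    total = sum(int(d) * w for d, w in zip(first9, range(10, 1, -1)))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def isbn13_check_digit(first12):
    """Check digit for the first 12 digits of an ISBN-13 (weights 1,3,1,3..., mod 10)."""
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    return str((10 - total % 10) % 10)


def isbn13_to_isbn10(isbn13):
    """Return the 10-digit form, or None if there is none (only 978-prefixed ISBNs have one)."""
    digits = isbn13.replace("-", "").replace(" ", "")
    if len(digits) != 13 or not digits.isdigit() or not digits.startswith("978"):
        return None
    core = digits[3:12]  # drop the 978 prefix and the old check digit
    return core + isbn10_check_digit(core)


def isbn10_to_isbn13(isbn10):
    """Return the 13-digit form of a 10-digit ISBN, or None if the input is malformed."""
    chars = isbn10.replace("-", "").replace(" ", "").upper()
    if len(chars) != 10 or not chars[:9].isdigit():
        return None
    first12 = "978" + chars[:9]  # drop the old check character, add the prefix
    return first12 + isbn13_check_digit(first12)


print(isbn13_to_isbn10("9780743460606"))  # 074346060X
print(isbn10_to_isbn13("0-7434-6060-X"))  # 9780743460606
```

Note that this sketch recomputes the check digit rather than validating the one it was given, so a mistyped ISBN would be silently "corrected" instead of rejected.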
When my local system powered by Umlaut searched our local catalog for that ISBN — it came up with an entirely different book! Asphalt, by Carl Rux, Atria Books, 2004. Because my system assumes if it gets an ISBN match it’s the right book, it offered that copy to me, and I requested it for delivery at the circ desk — and only when it gave me a request confirmation did I realize, hey, that’s not the book I wanted!
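One possible guard against this kind of false match is to compare the title in the incoming citation with the title on the matched record before trusting an ISBN hit. The following is a rough Python sketch of that idea, not something Umlaut does as described here. The function names and the 0.6 threshold are assumptions chosen for illustration, and a real system would need tuning against real catalog data.

```python
import re
from difflib import SequenceMatcher


def normalize_title(title):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = re.sub(r"[^\w\s]", " ", title.lower())
    return " ".join(cleaned.split())


def titles_plausibly_match(requested, candidate, threshold=0.6):
    """Crude check that two titles could describe the same work."""
    a = normalize_title(requested).split()
    b = normalize_title(candidate).split()
    if not a or not b:
        return False
    # A citation often carries only the main title, while the catalog
    # record adds a subtitle, so accept a word-for-word prefix.
    shorter, longer = (a, b) if len(a) <= len(b) else (b, a)
    if longer[:len(shorter)] == shorter:
        return True
    # Otherwise fall back to a fuzzy character-level similarity score.
    ratio = SequenceMatcher(None, " ".join(a), " ".join(b)).ratio()
    return ratio >= threshold


print(titles_plausibly_match("Darkwater", "Darkwater: voices from within the veil"))  # True
print(titles_plausibly_match("Darkwater", "Asphalt"))  # False
```

A check like this would not say which record is right, only that an ISBN match deserves a second look before the item is offered for request.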
It turns out not only OCLC, but the LC Catalog itself lists that same 10-digit version of the ISBN on two different bibliographic records for two entirely different titles. (I don’t know if catalog.loc.gov URLs are persistent, but try this one). LCCN 2003069067 is for an edition of DuBois’ Darkwater, Washington Square Press, 2004; LCCN 2003069638 is for a 2004 Atria Books edition of Rux’s Asphalt. In both records in catalog.loc.gov, that same ISBN 074346060X appears in a MARC 020$a.
So what’s going on? Is there an error in the LC cataloging, which made it into WorldCat and many, many libraries’ cataloging? Or did a publisher improperly re-use the same ISBN twice? (The publisher names appear different, but perhaps they are two different imprints of the same publisher? How else would they wind up with the same ISBN prefix?)
I actually don’t know. I did go and get a copy of the 2004 Atria Books Asphalt by Rux from our stacks. It’s a hardcover and no longer has its dust jacket, as is typical. But on the verso in the LC Cataloging-in-publication data, it lists a different ISBN: “ISBN 0-7434-7400-7 (alk. paper)”. It does not list the 074346060X ISBN. I think the 074346060X may really belong to the DuBois 2004 Washington Square Press edition? In some cataloging records for the Asphalt/Rux, both ISBNs appear, as repeated 020s.
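Collisions like this one could in principle be flagged ahead of time by grouping records on a normalized ISBN and reporting any ISBN that turns up on more than one record. Here is a minimal Python sketch of that approach. The record layout (plain dicts with `id`, `title`, and `isbns` keys) and the function names are hypothetical, and the sample data just mirrors the two records discussed above rather than being pulled from a real catalog export.

```python
from collections import defaultdict


def normalize_isbn(raw):
    """Reduce an 020$a-style value to a 13-digit string, or None if it doesn't look like an ISBN."""
    parts = raw.split()
    if not parts:
        return None
    # Keep only the first token, which drops qualifiers like "(alk. paper)".
    chars = parts[0].replace("-", "").upper()
    if len(chars) == 10 and chars[:9].isdigit():
        first12 = "978" + chars[:9]
        total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
        return first12 + str((10 - total % 10) % 10)
    if len(chars) == 13 and chars.isdigit():
        return chars
    return None


def find_shared_isbns(records):
    """Map each normalized ISBN to the record ids carrying it, keeping only ISBNs on 2+ records."""
    seen = defaultdict(set)
    for record in records:
        for raw in record["isbns"]:
            isbn = normalize_isbn(raw)
            if isbn:
                seen[isbn].add(record["id"])
    return {isbn: sorted(ids) for isbn, ids in seen.items() if len(ids) > 1}


# Illustrative data only, keyed by the LCCNs mentioned above.
records = [
    {"id": "2003069067", "title": "Darkwater", "isbns": ["074346060X"]},
    {"id": "2003069638", "title": "Asphalt",
     "isbns": ["074346060X", "0-7434-7400-7 (alk. paper)"]},
]
print(find_shared_isbns(records))
# {'9780743460606': ['2003069067', '2003069638']}
```

A shared ISBN is not always an error, since different records can legitimately describe the same edition, so a report like this would still need a human (or a title comparison) to sort out which hits are real problems.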
It took me quite a bit of time to get to the bottom of this (which I still haven’t done, actually), a couple of hours at least. I did it because I was curious, and I wanted to make sure there wasn’t an error in my software. We can’t really “afford” to do this with every mistake or odd thing in our data. But this is a reminder that our software systems can only be as good as our data. And data can be very expensive to fix. Let’s say this is an error in LC, and LC fixes it: I have no idea how long it would take to make it to WorldCat, or to individual libraries. There are many libraries that don’t routinely download updates/changes from WorldCat, and the correction would probably never make it to them. (If you have a way to report this to LC and feel like it, feel free to do so and update us in comments!)
It’s also a reminder that periodically downloading updates from WorldCat, to sync your catalog to any changes in the central system, is a really good idea. It’s time-consuming enough for one person to notice an error like this (if it is an error) and figure out how to report it, and for someone else to fix it. That work should result in updated records for everyone, not just for the individual libraries that happen to notice the issue and manually download new copy or fix it.
It may not be a cataloging error; publishers have sometimes assigned the same ISBN to more than one title. That can happen because of a software error on their part, or because they don’t understand how the ISBN system works. Sometimes a publisher figures that if the ISBN was previously used in an edition that’s been out of print for 20 years, why not re-use it? This is not allowed by the ISBN system, and it causes havoc in computer systems if a publisher does so. But the ISBN registrars could probably be doing a better job of educating publishers about this (it’s not mentioned in Bowker’s FAQ, maybe they think it’s obvious?). Or even applying some kind of financial penalty if a publisher does this, to make sure there’s a disincentive?
At any rate, as the programmers say, Garbage In, Garbage Out: our systems can only work with the (meta)data they’ve got, and our catalogers’ and metadata professionals’ work with our data is crucial to our systems’ ultimate performance.