OCLC numbers as manifestation identifiers

In writing software to tie together disparate databases of bibliographic information, having un-ambiguous identifiers to represent a manifestation or edition (this isn’t about the specificities of FRBR, use whatever term you are comfortable with) is crucial for making things work simply and reliably.

I know about a particular edition of a particular book, and I want to see if it’s available at Amazon, or Google, or HathiTrust, or WorldCat.  How do I know if a record in one of these foreign databases represents the same thing as a record I have in front of me?

In practice, ISBN, LCCN, and OCLC number are all incredibly valuable here.

We’re used to thinking of an OCLC number as identifying a particular WorldCat record. But that’s not the way I’m using them at all. For instance, Google Books will allow you to query on OCLC number to see if Google Books has a record matching that OCLC number.  I don’t need to have a WorldCat record in front of me; all I need to do is know the OCLC number of the edition I’m interested in, and I can ask Google Books if they have it.
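To make that concrete, here is a minimal sketch of such a lookup. It assumes the current Google Books API (v1), whose `q` parameter accepts the `oclc:`, `lccn:`, and `isbn:` search keywords. The function names and the `id_type` labels are made up for illustration, and nothing here touches the network: the first helper only builds the URL you would fetch, the second inspects a response you have already parsed.

```python
from urllib.parse import urlencode

# Volumes search endpoint of the Google Books API (v1).
GOOGLE_BOOKS_VOLUMES = "https://www.googleapis.com/books/v1/volumes"

# Identifier types this sketch supports, mapped to the search keyword
# Google Books uses for each of them.
KEYWORDS = {"oclc": "oclc", "lccn": "lccn", "isbn": "isbn"}


def identifier_query_url(id_type, value):
    """Build a Google Books search URL for a single identifier.

    id_type is one of "oclc", "lccn", "isbn" (labels chosen for this
    sketch); value is the identifier as a string.
    """
    keyword = KEYWORDS[id_type.lower()]
    query = f"{keyword}:{value.strip()}"
    return f"{GOOGLE_BOOKS_VOLUMES}?{urlencode({'q': query})}"


def has_match(response_json):
    """Report whether a parsed search response contains any volume."""
    return response_json.get("totalItems", 0) > 0
```

Fetching `identifier_query_url("oclc", "12345")` and passing the decoded JSON to `has_match` answers the question “do you have the edition represented by this OCLC number?” without a WorldCat record ever being involved.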

This is incredibly valuable. Of ISBN, LCCN, and OCLC number, the identifiers generally found in our library-sector bibliographic data, the OCLC number has the greatest coverage.

Re-conceptualizing OCLC number

While the OCLC number officially represents a record, because our library traditions are to create a new record for each edition or manifestation, it can be effectively used to represent an edition/manifestation instead. That’s really what I’m doing when I query Google Books (or HathiTrust) on an OCLC number: using it as a useful un-ambiguous way to reference a particular edition. “Do you have the edition represented by this OCLC number?”  I don’t care about the WorldCat record, really. And this ends up being awfully useful.

Since it’s so useful, the more places we have it, the better.  But our catalog, like many catalogs, has lots of records that do not have OCLC numbers, generally because they are not WorldCat records.

It would be theoretically possible for a non-WorldCat record to still have an OCLC number in it, recording “This record represents the same edition as OCLC number X.”  When I bring this up to catalogers, it often creates some kind of cognitive dissonance: “But the record couldn’t have an OCLC number, it’s not a WorldCat record!”  Sure, it’s not a WorldCat record, but it still is a record of some edition of some work; a WorldCat record probably exists describing the same edition. There is no theoretical reason that the record couldn’t say this. And if it did, it would be incredibly useful: it would allow software to more easily identify what’s what.  We need to change our mental models of how an OCLC number is useful.

Perhaps we’d need to record this kind of OCLC number with special coding in MARC to make sure we can tell the difference between a record that has an OCLC number because it actually is a WorldCat record, and a record that has an informational OCLC number merely saying that it represents the same manifestation as the WorldCat record with that number.  Any ideas of a simple way to do this in MARC?

Practical Issues

Of course, just because there’s no theoretical reason this couldn’t happen doesn’t necessarily mean it’s practical.  If you buy a few hundred thousand records from a third party vendor, those records aren’t going to be WorldCat records, and aren’t going to have OCLC numbers. Sure, the majority of them probably represent the same edition as some WorldCat record, and could have an OCLC number indicating that, but, well, they don’t.  Manually looking up each record individually to add this information is clearly not feasible.

Interestingly, if the record has an LCCN or ISBN, then either the free Google Books API, or the WorldCat xID API (free to OCLC members) can be used to ‘translate’ from an LCCN or ISBN to an OCLCnum. But if the record already has an LCCN or ISBN, it’s at least already got some global identifier, so it gives us less additional benefit to put an OCLC number on there. I’d suggest this should still be done where feasible, but this is the low-hanging fruit, not nearly as tasty as what’s at the top of the tree.

Interestingly, I’ve been told that OCLC has a ‘reclamation’ service, where you can give them lots of records that are not WorldCat records, and they will use internal proprietary algorithms to match these records to WorldCat records.  Traditionally, my understanding is that you’d get back the actual WorldCat records to replace the non-WorldCat records in your catalog.  But this isn’t what we want to do with, for instance, vendor purchased records. We chose to get those records from a non-OCLC source for some reason. But first of all, this indicates that it’s theoretically possible to write heuristic algorithms to match non-OCLC records to OCLC records (with OCLC numbers), and OCLC has solved that reasonably well.

But secondly and even more interestingly, Deborah Fritz on NGC4Lib informs us that OCLC’s reclamation service now offers you the ability to match your non-WorldCat records to WorldCat records, add an OCLCnum in the MARC 035 to your non-Worldcat records and return them to you.

So, note:

1) I find it interesting that OCLC didn’t try to find a way to encode in MARC the distinction between an OCLC number indicating an OCLC record, and an OCLC number just provided informationally meaning “this record represents the same thing.”  They’re returning non-OCLC records to you with OCLC numbers in the 035.

2) This also means that FAQ #6 in the OCLC “Policy for Use and Transfer of WorldCat® Records Frequently Asked Questions: Attribution of WorldCat” document is completely unreliable. OCLC suggests there that if a record has an 035 with an OCLC number in it, that means it’s a WorldCat record. Nope, OCLC itself is now helping people add 035 OCLCnums to records that are not WorldCat records. Not everything people tell you is true.
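To illustrate what “adding an OCLCnum in the 035” amounts to, here is a small sketch. It is not how OCLC’s service works internally (that is proprietary); it only shows the end result on a record. The record structure is a deliberately simplified stand-in for MARC (a dict of tag to list of values), the function names are invented for this example, and the `(OCoLC)` prefix is the usual MARC organization code used for OCLC in an 035. The input is assumed to be a list of matches from local control number to OCLC number.

```python
def add_oclc_reference(record, oclc_number):
    """Return a copy of `record` with an 035 carrying the OCLC number.

    `record` is a simplified stand-in for a MARC record: a dict mapping
    tags to lists of field values. Real code would use a MARC library.
    """
    value = f"(OCoLC){int(oclc_number)}"  # int() drops any leading zeros
    updated = {tag: list(values) for tag, values in record.items()}
    if value not in updated.setdefault("035", []):
        updated["035"].append(value)
    return updated


def apply_matches(records, matches):
    """Apply a {local control number: OCLC number} mapping to records.

    Records without a match are passed through unchanged.
    """
    result = []
    for record in records:
        local_id = record["001"][0]
        if local_id in matches:
            record = add_oclc_reference(record, matches[local_id])
        result.append(record)
    return result
```

Note that nothing in the resulting 035 says whether the record came from WorldCat or merely refers to a WorldCat record, which is exactly the ambiguity point 1 describes.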

Misconceptions about Vendor Contracts

Some catalogers and administrators seem to be under the impression that it would somehow violate our license agreement to add OCLC numbers to vendor-purchased records. I don’t see how this can be. You can’t share those records with other people, you can’t let other people use those records without paying for them themselves. Fair enough.  But if your license when you purchased those records really prevents you from adding your own extra useful information to them — you didn’t get a very good deal, and should insist on a different license in the future. But I seriously doubt it does, I think it’s a misconception.  But a widespread one.

And just as we can send vendor-purchased records to authorities vendors for authorities processing, there shouldn’t be any reason we can’t use OCLC as a vendor and send them to them for “WorldCat matching” processing — so long as those records aren’t put into WorldCat for sharing. Yes, we can’t put them into WorldCat for sharing. But there should be no reason we can’t use OCLC as a vendor to add additional information to them, like an OCLC number, and then return them to us without retaining them.  I would be shocked if any contract when you purchased those records prohibits this.  And if it does, it’s a bad contract. But this is a widespread misconception, born of not realizing the usefulness of an OCLC number as a manifestation identifier.

This applies equally to attaching your holding to an OCLC record, for ILL support and worldcat.org display of holdings,  without actually using a WorldCat record, or sharing the vendor record you do use with WorldCat.

OCLC business and community interest

Now, it’s been proposed that one of the reasons that OCLC modified their reclamation service to allow adding 035’s to non-WorldCat records is to better support the new OCLC Local ILS-replacement aspirations.  This is almost certainly true.

And I’ve got no problem with it. This kind of value-added service is exactly what OCLC should be focusing on to rebuild a sustainable business without trying to exercise monopoly ownership of our collective bibliographic patrimony.

In fact, back to the present topic, it serves OCLC’s business interests to have people using OCLC numbers as manifestation identifiers de-coupled from WorldCat records. Because the more data that does that, the easier a time people will have integrating with OCLC WorldCat Grid services, integrating with WorldCat Local, etc.; the lower the total-cost-of-ownership for libraries to purchase FirstSearch Worldcat and use those services, etc.  That’s why they’ve apparently created a ‘reclamation’ service that lets you attach OCLC numbers to non-Worldcat records, it’s good for their business.

Is it in the interest of the generalized library community to use OCLC numbers as a generalized manifestation identifier de-coupled from actual WorldCat records? I say yes. Because it’s not easy to create such a thing, and it’s there, and we’ve got to use everything we’ve got.  But will we have to make a ‘deal with the devil’ to do that? Well, I’m no lawyer, but under U.S. law, I’m pretty sure nobody needs OCLC’s permission to use an OCLC number however they want. Bender v. West established that West Publishing did not have copyright over their page numbers, and third parties did not require permission from West to cite those page numbers. OCLC numbers, incrementally assigned in order to records as they are added to WorldCat, sound awfully analogous to page numbers to me.  And mentioning it in a record as a reference sounds awfully analogous to citing a page number.

Still, I’m less sure of law in non-US jurisdictions, and regardless of the law, OCLC could do a lot to discourage this use, make it difficult, threaten people.  But remember before, I argued that it’s actually in OCLC’s interest to have people doing this — it makes it easier for customers to use their other services.

Now, there is maybe going to be confusion inside OCLC about what’s in their interests. On the one side, you’ve got supporters of the new grid services, including worldcat-as-ils, saying, sure, the more people using OCLC numbers — whether or not they are using WorldCat records — the better, the easier it is for them to use our services.   On another side, you might have the traditional OCLC-as-monopoly-data-provider folks saying, no way, we can’t do anything to make it easier for people to not use WorldCat records, we need to be exerting as much pressure as possible to make sure everyone buys all their data from WorldCat.

It’s those former people who are betting on a horse that will win though. The data monopoly business model is simply not a sustainable business model, no matter how much they fight to keep it. And is not in the interests of the library community which OCLC ostensibly represents.   No matter how much OCLC fights for it, there is going to be more and more data out there in library databases that does not come from Worldcat.  They can try to make it as challenging as possible for libraries with such data, harming their own left hand, or they can embrace a business model based on services rather than data monopoly.

New Business Models

If I were OCLC, I’d be encouraging and facilitating the use of OCLC numbers as manifestation identifiers decoupled from WorldCat records. All those vendors selling proprietary records to libraries, I’d be offering to run their records through the ‘reclamation service’ to add OCLC numbers to them for free so that when customers got the records, they had OCLC numbers in them even though they are proprietary non-WorldCat records.  Customers are going to buy and get records from non-WorldCat sources anyway, but this way when they get those records, the path is lubricated for them to still participate in OCLC ILL infrastructure, to use OCLC Grid services, to expose their holdings via worldcat.org, to use the new worldcat-as-ils services.  OCLC wins; third party vendors and libraries also win because they’ve got more useful valuable records.

Does it give OCLC an advantage over competitors to be the maintainer of one of the most useful manifestation identifier systems we’ve got, and have a well-developed infrastructure for dealing with records identified suchly?  Sure it does. So use it.  I don’t begrudge them the advantage based on actual value, rather than attempt to legally enforce monopoly control of data.  We all win, because OCLC numbers as manifestation identifiers are vitally useful.

So give up an attempt to own the data. Focus on the services, and the fact that you do have a privileged position to offer those services. Is this a guarantee of success?  No.  Other competitors will arise — as they have already (as evidenced by the fact that so many OCLC member libraries are already purchasing proprietary records from third party vendors — which at the moment generally means those holdings aren’t registered in WorldCat, which is no good for us or OCLC). OCLC has got to efficiently provide value to compete, one way or the other. There are no magic bullets.  We libraries are also dealing with wrenching changes to our business in an effort to remain sustainable and relevant. Welcome to the 21st century.

But OCLC can try to find a sustainable business model that also serves the interests of libraries. Or OCLC can try to stick to a business model that in fact opposes the long-term interests of libraries.  While no business can afford to drive its customers out of business, OCLC additionally is a non-profit cooperative beholden by its mission to serve our interests. Grid Services, worldcat-as-ils, etc, provide a way to stay relevant and sustainable while serving our interests.  And, in a nice coincidence, that particular model is positively affected by opening up the data, encouraging the continued use of OCLC numbers as identifiers regardless of where you buy your data, etc.   One way or another, OCLC will change: they can take us all down with them, or they can switch gears.  The new services tell us that at least some parts of OCLC are trying to switch gears.

13 thoughts on “OCLC numbers as manifestation identifiers”

  1. I am not very familiar with OCLC numbers, but I assume that every record in worldcat has such a number. So if special libraries put their records of say, a private library of XY they’re holding, into worldcat, the OCLC numbers refer to items, not manifestations. And what about electronic or microfilmed copies? So we do have the conceptual flaw that the numbering system can refer to manifestations AND items. What we need is to be able to tell the difference and to link. Enter systems with better abilities to describe on the item level, and FRBR.

  2. I’m not familiar enough with rare book cataloging and how it intersects with WorldCat to know for sure, so maybe yeah sometimes it’s an item rather than a manifestation, but in practice... it works out. It’s not perfect, no, but we have so few systems of un-ambiguous identifiers for our bibliographic entities at all, that we take what we can get. It works out. I actually don’t know if rare-books-cataloging means individual items can get their own records in worldcat, I’ve never seen that, but it may be.

    An electronic or microform version is considered a different manifestation. Which is why they get their own records. Or you could explain this in the other direction, since FRBR is a formalized model of more or less what we’ve been doing — since traditional library cataloging considers electronic or microform versions distinct enough to require their own records, FRBR considers them different manifestations.

  3. One problem I see right now is that OCLC numbers are not unique identifiers of manifestations. There are in fact lots of duplicate records in OCLC for the same manifestation today. Plus records that look like duplicates but aren’t really, since they are “parallel” records from non-English cataloging agencies. It just makes the picture a little more complex. (Though maybe the ideas you are talking about will encourage OCLC to do some de-duping cleanup on the true dups.)

    Also, when OCLC does a batch project matching vendor records to their database, they can return a file with just a list of your local system control numbers with the corresponding OCLC numbers they have found. We have a way to load these into our system and place the OCLC numbers into any field we want (we use the 001). So the vendor records remain untouched. (Though OCLC needs to see copies to do the matching.) OCLC is definitely out there trying to negotiate with the 3rd party vendors to do this. I think vendors just need to be reassured (in writing, by lawyers) that the data isn’t going to be resold or put into the cataloging database.

  4. Of course WorldCat is not perfect, but it’s _good enough_ to be VERY useful. I have made much use of it!

    I believe that OCLC considers it a mistake when two records exist representing the same manifestation. Sure, the database has mistakes. All our sources of data do. But it’s as good as anything else out there. It not being perfect does not make it useless, far from it.

    Thanks for confirming that it IS possible to add OCLC numbers to vendor-supplied records without adding them to WorldCat.

    I don’t see why there’d really need be any negotiation or re-assurance of these record vendors. When we send the records we’ve purchased from a vendor to a different vendor for authorities processing, do we need special permission? Why should this be any different, it’s quite the same thing, we’re sending their records to a vendor for additional information to be added to them. You’ve apparently actually done this, Diana? Did you need to get special permission from all of your record vendors?

  5. Actually, we do get permission from 3rd party vendors to send their records for authority processing! It’s not a big deal.

    We (U of Washington) have completed a couple of projects to add OCLC numbers to our vendor records. Here’s the web site associated with that service:

    http://www.oclc.org/worldcatlocal/support/vendor.htm

    One benefit of negotiation: If a title does NOT have an OCLC record at all, OCLC likes to get permission to add the vendor record to the database. (Not all the manifestations in the universe are in OCLC yet!) And OCLC negotiates on the behalf of all member libraries, we don’t actually get special permission from the vendors ourselves for OCLC# projects.

    And I didn’t mean to say that OCLC#s are useless because of the dups! But for instance the algorithm for matching WorldCat records against local catalog records for WorldCat Local (largely based on OCLC#s) has to jump through a lot more hoops to account for record mergers, dups, etc. Like I said, dirty data just makes life more complicated. There are definitely times when I think it would be simpler to clean up dirty data than have to program around it.

  6. Interesting, I didn’t realize that libraries generally had to get specific permission to send records for authorities processing too. What a pain! I wonder if we should try negotiating better contracts that make it clear we can hire third parties to help us enhance purchased records, so long as those third parties contractually commit not to retain copies.

    And I definitely agree with you about cleaning up the data. Both in terms of WorldCat, and in terms of our local stores, it’s always seemed to me that we’re being penny wise and pound foolish not expending more resources to try and clean up legacy data. It’ll never be perfect, but we could make it a lot better, and I think a moderate amount of resources spent could pay dividends in the medium and long term.

  7. I’m wondering how this works exactly – given that OCLC numbers aren’t unique, (and I can’t find enough detail on the Google Books API page), do you know whether Google matches on a related OCLC number?

  8. I’m not sure what you mean by saying “OCLC numbers aren’t unique”, I don’t understand your question. Aren’t unique with regards to what?

    You can query Google Books for an OCLC number, and get back the Google Books record that has that OCLC number (if any). It’s quite possible that Google Books will have a record matching the same book but not actually know its OCLC number, although recent OCLC/Google cooperation has, in my observations, made that less likely. Most OCLCnums find a match in Google. While it’s theoretically possible that Google could have two records with the same OCLC number, I haven’t ever actually seen that; perhaps the algorithms they use for indexing OCLC numbers cannot in fact produce that, I don’t know.

    But Google’s de-duplication and workset-grouping algorithms are pretty imperfect too, actually in my anecdotal observations not significantly better than anyone else’s good-enough-but-imperfect algorithms. I’m sure there will be times when Google has not unified records representing the same edition with the same oclcnum; I don’t know if it’s more likely that they’d both have the same oclcnum, or that one would be missing an oclcnum, or what would determine that. The specifics of Google’s algorithms are of course proprietary and not known to us.

    Not sure if that answers your question. Still wondering what you mean by ‘oclc numbers are not unique’ –of course they are, in the uninteresting sense that any integer is a unique integer!

  9. I agree that it would be useful to have the OCLC number in non-OCLC records to facilitate the collocation and linking that could be done with them. However, it would definitely be desirable to have some way to distinguish between such numbers and numbers in records that really came from OCLC. At least under the current nineteen-eighty-whatever guidelines, OCLC is apparently mostly assessing whether or not something is an OCLC record based on the presence of an OCLC number in 001/035/994. So it surprises me that they’re putting OCLC numbers in the 035 of non-OCLC records as that seems to defeat the purpose of distinguishing records that came from OCLC. Perhaps everyone could just agree to use a different prefix to signify a reference to an OCLC number on a non-OCLC record?

    So far as I know, OCLC numbers are unique, but they do have a lot of problems with duplicate records for the same manifestation. When a record is merged into another, the old OCLC number is kept in the 019 field of the preferred record. So we can always get from our catalog to the OCLC record. However, at least for us, it doesn’t seem to work the other way around. I find that when our record links to an OCLC number in the 019, then a search in Open WorldCat will say that we have holdings, but will fail to find the record in our catalog because Open WorldCat only searches for the preferred OCLC number, which isn’t in our record. This could maybe be fixed with an OR search, but some of them would be pretty long (and our system anyway has a max number of characters allowed per search).

  10. Yeah, that’s something that really is best fixed on your local catalog end, Kelley.

    Your local catalog could have its data updated to use the right OCLC numbers! Which can be done by paying OCLC a lot of money, I think, or if you have local programmers could probably be done with the OCLC WorldCat API services and some local programming.

    Or your local catalog could itself expand an OCLC number search at the search-time using those WorldCat API services. But that assumes a local catalog that isn’t the crappy software most of us are stuck with.

    But that’s something that really needs to be fixed at your end. Our terribly inefficient metadata maintenance practices and terribly unpowerful and inefficient local catalog systems are an unavoidable handicap to us.

  11. Check this out: http://www.loc.gov/marc/bibliographic/bd003.html All OCLC records that one acquires from OCLC come with the OCLC number in the 001 tag with an accompanying 003 tag. Generally the OCLC number is moved to the 035 tag when the record is loaded into a local system since the local system record number then resides in the 001 tag. I would imagine that most local systems retain the 003 tag even if the catalogers can’t see it. So there is a marker in the record to indicate OCLC as the source of the record. Also, many local systems allow for the addition of a unique prefix to differentiate numeric strings – i.e. ocm12345 for an OCLC number rather than just 12345. For matching purposes the prefix identifies the specific source numeric and then can be stripped off for the actual matching function.
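Two of the practical points raised in the comments, stripping source prefixes before matching and treating merged numbers kept in the 019 as equivalent to the preferred number, can be sketched roughly as follows. This is only an illustration: the function names are invented, the prefixes handled are the common `(OCoLC)`, `ocm`, `ocn`, and `on` forms, and the merged numbers are assumed to have been pulled out of the 019 already.

```python
import re

# Prefixes commonly seen on OCLC numbers in local systems: the MARC
# organization code "(OCoLC)" and the "ocm" / "ocn" / "on" forms.
_OCLC_PATTERN = re.compile(
    r"^\s*(?:\(ocolc\))?\s*(?:ocm|ocn|on)?\s*(\d+)\s*$", re.IGNORECASE
)


def normalize_oclc(raw):
    """Return the bare number as a string, or None if `raw` does not
    look like an OCLC number."""
    match = _OCLC_PATTERN.match(raw)
    if match is None:
        return None
    return str(int(match.group(1)))  # int() drops leading zeros


def same_manifestation(local_number, preferred_number, merged_numbers=()):
    """True if the local number equals the preferred OCLC number or any
    of the merged numbers (for example, those recorded in an 019)."""
    local = normalize_oclc(local_number)
    if local is None:
        return False
    candidates = {normalize_oclc(preferred_number)}
    candidates.update(normalize_oclc(n) for n in merged_numbers)
    return local in candidates
```

With this, a local record still carrying `ocm00000111` would match a WorldCat record whose preferred number is 222 as long as 111 appears among its merged numbers, which is the direction of lookup that fails in the scenario described in comment 9.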
