In writing software to tie together disparate databases of bibliographic information, having unambiguous identifiers to represent a manifestation or edition (this isn’t about the specificities of FRBR; use whatever term you are comfortable with) is crucial for making things work simply and reliably.
I know about a particular edition of a particular book, and I want to see if it’s available at Amazon, or Google, or HathiTrust, or WorldCat. How do I know if a record in one of these foreign databases represents the same thing as a record I have in front of me?
In practice, ISBN, LCCN, and OCLC number are all incredibly valuable here.
We’re used to thinking of an OCLC number as identifying a particular WorldCat record. But that’s not the way I’m using them at all. For instance, Google Books will allow you to query on OCLC number to see if Google Books has a record matching that OCLC number. I don’t need to have a WorldCat record in front of me; all I need to do is know the OCLC number of the edition I’m interested in, and I can ask Google Books if they have it.
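To make that concrete, here is a minimal sketch of what "asking Google Books by OCLC number" can look like. It assumes the Google Books API v1 volumes endpoint and its `oclc:`, `isbn:` and `lccn:` search keywords; the function name `build_lookup_url` is my own invention for illustration, and the sketch only builds the request URL rather than fetching or interpreting a response.

```python
from urllib.parse import urlencode

# Assumption: the Google Books API v1 "volumes" search endpoint.
GOOGLE_BOOKS_VOLUMES = "https://www.googleapis.com/books/v1/volumes"

# Search keywords assumed to be accepted in the q= parameter.
SUPPORTED_ID_TYPES = {"oclc", "isbn", "lccn"}


def build_lookup_url(id_type, value):
    """Build a query URL asking whether Google Books knows this identifier.

    id_type is one of "oclc", "isbn" or "lccn" (case-insensitive);
    value is the identifier itself, e.g. an OCLC number as a string.
    """
    keyword = id_type.lower()
    if keyword not in SUPPORTED_ID_TYPES:
        raise ValueError(f"unsupported identifier type: {id_type!r}")
    # urlencode percent-escapes the colon between keyword and value.
    return GOOGLE_BOOKS_VOLUMES + "?" + urlencode({"q": f"{keyword}:{value}"})


if __name__ == "__main__":
    # Hypothetical OCLC number, used only to show the shape of the URL.
    print(build_lookup_url("oclc", "12345"))
```

Notice that nothing here needs a WorldCat record: the OCLC number alone is the whole question being asked.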
This is incredibly valuable. Of ISBN, LCCN, and OCLC number, the identifiers generally found in our library-sector bibliographic data, the OCLC number has the greatest coverage.
Re-conceptualizing OCLC number
While the OCLC number officially represents a record, because our library traditions are to create a new record for each edition or manifestation, it can be effectively used to represent an edition/manifestation instead. That’s really what I’m doing when I query Google Books (or HathiTrust) on an OCLC number: using it as a useful, unambiguous way to reference a particular edition. “Do you have the edition represented by this OCLC number?” I don’t care about the WorldCat record, really. And this ends up being awfully useful.
Since it’s so useful, the more places we have it, the better. But our catalog, like many catalogs, has lots of records that do not have OCLC numbers, generally because they are not WorldCat records.
It would be theoretically possible for a non-WorldCat record to still have an OCLC number in it, recording “This record represents the same edition as OCLC number X.” When I bring this up to catalogers, it often creates some kind of cognitive dissonance: “But the record couldn’t have an OCLC number, it’s not a WorldCat record!” Sure, it’s not a WorldCat record, but it still is a record of some edition of some work; a WorldCat record probably exists describing the same edition. There is no theoretical reason that the record couldn’t say this. And if it did, it would be incredibly useful, because it would allow software to more easily identify what’s what. We need to change our mental models of how an OCLC number is useful.
Perhaps we’d need to record this kind of OCLC number with special coding in MARC to make sure we can tell the difference between a record that has an OCLC number because it actually is a WorldCat record, and a record that has an informational OCLC number merely saying that it represents the same manifestation as the WorldCat record with that number. Any ideas of a simple way to do this in MARC?
Of course, just because there’s no theoretical reason this couldn’t happen doesn’t necessarily mean it’s practical. If you buy a few hundred thousand records from a third party vendor, those records aren’t going to be WorldCat records, and aren’t going to have OCLC numbers. Sure, the majority of them probably represent the same edition as some WorldCat record, and could have an OCLC number indicating that, but, well, they don’t. Manually looking up each record individually to add this information is clearly not feasible.
If the record has an LCCN or ISBN, then either the free Google Books API, or the WorldCat xID API (free to OCLC members) can be used to ‘translate’ from an LCCN or ISBN to an OCLCnum. But if the record already has an LCCN or ISBN, it’s at least already got some global identifier, so putting an OCLC number on there gives us less additional benefit. I’d suggest this should still be done where feasible, but this is the low-hanging fruit, not nearly as tasty as what’s at the top of the tree.
Interestingly, I’ve been told that OCLC has a ‘reclamation’ service, where you can give them lots of records that are not WorldCat records, and they will use internal proprietary algorithms to match these records to WorldCat records. Traditionally, my understanding is that you’d get back the actual WorldCat records to replace the non-WorldCat records in your catalog. But this isn’t what we want to do with, for instance, vendor purchased records. We chose to get those records from a non-OCLC source for some reason. But first of all, this indicates that it’s theoretically possible to write heuristic algorithms to match non-OCLC records to OCLC records (with OCLC numbers), and OCLC has solved that reasonably well.
But secondly and even more interestingly, Deborah Fritz on NGC4Lib informs us that OCLC’s reclamation service now offers to match your non-WorldCat records to WorldCat records, add an OCLCnum in the MARC 035 of each of those non-WorldCat records, and return them to you.
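For software consuming such records, the practical step is pulling a clean OCLC number out of the 035 regardless of where the record came from. The following is a rough sketch, not anything OCLC publishes: it assumes the 035 $a value carries the usual `(OCoLC)` organization prefix, optionally followed by an `ocm`, `ocn` or `on` prefix and leading zeros, all of which it strips. The function name `oclc_number_from_035` is hypothetical.

```python
import re

# Assumed shape of an OCLC number in 035 $a: "(OCoLC)" then an optional
# "ocm" / "ocn" / "on" prefix, optional leading zeros, then the digits.
_OCLC_035 = re.compile(r"^\(OCoLC\)\s*(?:ocm|ocn|on)?0*(\d+)\s*$", re.IGNORECASE)


def oclc_number_from_035(subfield_a):
    """Return the normalized OCLC number from an 035 $a string, or None.

    Normalized here means digits only, with prefixes and leading zeros
    removed, so the same edition compares equal across data sources.
    """
    match = _OCLC_035.match(subfield_a.strip())
    return match.group(1) if match else None


if __name__ == "__main__":
    # Hypothetical 035 $a values, for illustration only.
    for raw in ["(OCoLC)ocm00012345", "(OCoLC)12345678", "(DLC)   2001012345"]:
        print(repr(raw), "->", oclc_number_from_035(raw))
```

The function says nothing about whether the record itself is a WorldCat record. It only reports which edition the record claims to describe, which is exactly the decoupled use of the number argued for here.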
1) I find it interesting that OCLC didn’t try to find a way to encode in MARC the distinction between an OCLC number indicating an OCLC record, and an OCLC number just provided informationally meaning “this record represents the same thing.” They’re returning non-OCLC records to you with OCLC numbers in the 035.
2) This also means that FAQ #6 in the OCLC “Policy for Use and Transfer of WorldCat® Records Frequently Asked Questions: Attribution of WorldCat” document is completely unreliable. OCLC suggests there that if a record has an 035 with an OCLC number in it, that means it’s a WorldCat record. Nope, OCLC itself is now helping people add 035 OCLCnums to records that are not WorldCat records. Not everything people tell you is true.
Misconceptions about Vendor Contracts
Some catalogers and administrators seem to be under the impression that it would somehow violate our license agreement to add OCLC numbers to vendor-purchased records. I don’t see how this can be. You can’t share those records with other people, you can’t let other people use those records without paying for them themselves. Fair enough. But if your license when you purchased those records really prevents you from adding your own extra useful information to them — you didn’t get a very good deal, and should insist on a different license in the future. But I seriously doubt it does, I think it’s a misconception. But a widespread one.
And just as we can send vendor-purchased records to authorities vendors for authorities processing, there shouldn’t be any reason we can’t use OCLC as a vendor and send them to them for “WorldCat matching” processing — so long as those records aren’t put into WorldCat for sharing. Yes, we can’t put them into WorldCat for sharing. But there should be no reason we can’t use OCLC as a vendor to add additional information to them, like an OCLC number, and then return them to us without retaining them. I would be shocked if any contract when you purchased those records prohibits this. And if it does, it’s a bad contract. But this is a widespread misconception, born of not realizing the usefulness of an OCLC number as a manifestation identifier.
This applies equally to attaching your holding to an OCLC record, for ILL support and worldcat.org display of holdings, without actually using a WorldCat record, or sharing the vendor record you do use with WorldCat.
OCLC business and community interest
Now, it’s been proposed that one of the reasons that OCLC modified their reclamation service to allow adding 035’s to non-WorldCat records is to better support the new OCLC Local ILS-replacement aspirations. This is almost certainly true.
And I’ve got no problem with it. This kind of value-added service is exactly what OCLC should be focusing on to rebuild a sustainable business without trying to exercise monopoly ownership of our collective bibliographic patrimony.
In fact, back to the present topic, it serves OCLC’s business interests to have people using OCLC numbers as manifestation identifiers de-coupled from WorldCat records. The more data that carries them, the easier a time people will have integrating with OCLC WorldCat Grid services, integrating with WorldCat Local, etc., and the lower the total cost of ownership for libraries to purchase FirstSearch WorldCat and use those services. That’s why they’ve apparently created a ‘reclamation’ service that lets you attach OCLC numbers to non-WorldCat records: it’s good for their business.
Is it in the interest of the generalized library community to use OCLC numbers as a generalized manifestation identifier de-coupled from actual WorldCat records? I say yes. Because it’s not easy to create such a thing, and it’s there, and we’ve got to use everything we’ve got. But will we have to make a ‘deal with the devil’ to do that? Well, I’m no lawyer, but under U.S. law, I’m pretty sure nobody needs OCLC’s permission to use an OCLC number however they want. Bender v. West established that West Publishing did not have copyright over their page numbers, and third parties did not require permission from West to cite those page numbers. OCLC numbers, incrementally assigned in order to records as they are added to WorldCat, sound awfully analogous to page numbers to me. And mentioning one in a record as a reference sounds awfully analogous to citing a page number.
Still, I’m less sure of law in non-US jurisdictions, and regardless of the law, OCLC could do a lot to discourage this use, make it difficult, threaten people. But remember before, I argued that it’s actually in OCLC’s interest to have people doing this — it makes it easier for customers to use their other services.
Now, there is maybe going to be confusion inside OCLC about what’s in their interests. On the one side, you’ve got supporters of the new grid services, including worldcat-as-ils, saying, sure, the more people using OCLC numbers — whether or not they are using WorldCat records — the better, the easier it is for them to use our services. On another side, you might have the traditional OCLC-as-monopoly-data-provider folks saying, no way, we can’t do anything to make it easier for people to not use WorldCat records, we need to be exerting as much pressure as possible to make sure everyone buys all their data from WorldCat.
It’s those former people who are betting on a horse that will win though. The data monopoly business model is simply not a sustainable business model, no matter how much they fight to keep it. Nor is it in the interests of the library community which OCLC ostensibly represents. No matter how much OCLC fights for it, there is going to be more and more data out there in library databases that does not come from WorldCat. They can try to make it as challenging as possible for libraries with such data, harming their own left hand, or they can embrace a business model based on services rather than data monopoly.
New Business Models
If I were OCLC, I’d be encouraging and facilitating the use of OCLC numbers as manifestation identifiers decoupled from WorldCat records. All those vendors selling proprietary records to libraries, I’d be offering to run their records through the ‘reclamation service’ to add OCLC numbers to them for free so that when customers got the records, they had OCLC numbers in them even though they are proprietary non-WorldCat records. Customers are going to buy and get records from non-WorldCat sources anyway, but this way when they get those records, the path is lubricated for them to still participate in OCLC ILL infrastructure, to use OCLC Grid services, to expose their holdings via worldcat.org, to use the new worldcat-as-ils services. OCLC wins; third party vendors and libraries also win because they’ve got more useful valuable records.
Does it give OCLC an advantage over competitors to be the maintainer of one of the most useful manifestation identifier systems we’ve got, and have a well-developed infrastructure for dealing with records identified suchly? Sure it does. So use it. I don’t begrudge them the advantage based on actual value, rather than attempt to legally enforce monopoly control of data. We all win, because OCLC numbers as manifestation identifiers are vitally useful.
So give up an attempt to own the data. Focus on the services, and the fact that you do have a privileged position to offer those services. Is this a guarantee of success? No. Other competitors will arise — as they have already (as evidenced by the fact that so many OCLC member libraries are already purchasing proprietary records from third party vendors — which at the moment generally means those holdings aren’t registered in WorldCat, which is no good for us or OCLC). OCLC has got to efficiently provide value to compete, one way or the other. There are no magic bullets. We libraries are also dealing with wrenching changes to our business in an effort to remain sustainable and relevant. Welcome to the 21st century.
But OCLC can try to find a sustainable business model that also serves the interests of libraries. Or OCLC can try to stick with a business model that in fact opposes the long-term interests of libraries. While no business can afford to drive its customers out of business, OCLC additionally is a non-profit cooperative beholden by its mission to serve our interests. Grid Services, worldcat-as-ils, etc., provide a way to stay relevant and sustainable while serving our interests. And, in a nice coincidence, that particular model is positively affected by opening up the data, encouraging the continued use of OCLC numbers as identifiers regardless of where you buy your data, etc. One way or another, OCLC will change: they can take us all down with them, or they can switch gears. The new services tell us that at least some parts of OCLC are trying to switch gears.