OCLC in the Guardian

jrochkind General January 22, 2009

Interesting article in the Guardian about OCLC. It is harshly critical of OCLC, specifically over OCLC’s record sharing policy and its attempt to monopolize access to the shared corpus.

Very harsh. OCLC is not going to be pleased.

While in general I think it is in fact an accurate analysis and accounting of the record distribution policy kerfuffle, I think they go a bit too far on its current implications. The headline is “Why you can’t find a library book in your search engine.” I don’t think the record sharing policy or record control attempts are responsible for keeping library book records out of search engines, really. This gives the rest of the library world too much credit for what they would be able to do without worldcat! And Google is already crawling many library catalogs; they’re there (thanks dbs).

The article implies that the reason “OCLC shares only 3m of its 125m records with Google Books; none of them show up in an ordinary search” is because OCLC is unwilling to share more records for more uses. My understanding was that OCLC was happy to share all its records with Google (but not with you and me, for free), and would like them to show up in ordinary searches, but Google says it’s too much data and they don’t know what to do with it! (Contrary to popular belief, even Google is not omnipotent against the genuinely hard problem of dealing with our gigantic legacy corpus.)

OCLC in fact does a lot to try and put library books on the web, with worldcat.org, and their agreement with Google.

But they do it in a way meant to maintain their monopoly over our shared corpus. While I’m not sure that, as of today, this is responsible for a lack of access, more entities are starting to be in a position to do things with the corpus to improve access, and it is definitely handicapping our future. I’ve questioned before the way the OCLC/Google agreement limits library benefit to the benefit of OCLC and Google. Ed Summers has more arguments to that effect.

Copyright law

Meanwhile, Karen Calhoun and OCLC seem to still not understand US copyright law.

Calhoun says OCLC’s legal department is still researching the copyright question, explaining that courts have in the past considered “sweat of the brow”: creating a bibliographic record, she says, requires intellectual effort and judgments by trained personnel.

Yes, Karen, US courts in the past. Before Feist v. Rural (1991). Feist v. Rural specifically and explicitly abandoned the ‘sweat of the brow’ doctrine.

Either this is just a PR thing while OCLC stalls figuring out what it’s going to do, or OCLC ought to get itself a lot better lawyers, who actually specialize in intellectual property law, and know that it’s not the same now as it was 20 years ago.

The result of Feist v. Rural is that how much effort or work or money you put into assembling something is irrelevant (in the US) to whether you can have copyright over it. The amount of creativity that went into it (not generic ‘intellectual effort’) is what is relevant. Creative works can be copyrighted. Things that are merely facts assembled without creativity cannot be. If OCLC wanted to argue in a US court that the database was copyrightable, they would need to argue that either the creation of cataloging records or the choice of what records to include in worldcat was a creative act, not simply a mechanical following of rules or policies. If only the selection were a creative act (and I don’t know how you’d argue that), then individual records would still not be copyrightable, but just the aggregate database. And if OCLC wanted the copyright on any of this to be held by them, then they’d need to show that it was their employees who exercised this creativity, unless the actual copyright holders (i.e., the actual libraries doing original cataloging) assigned the (putative) copyright to OCLC.

Maybe OCLC thinks it has such a novel case that it can change case law and go all the way to the Supreme Court.

I doubt it.

PS:

The article doesn’t quote any OCLC members actually in favor of the policy. While it’s possible they didn’t try that hard, I suspect it’s pretty much impossible to find an OCLC member administration who both can speak knowledgeably about it and will speak in favor of it.

I wonder what this will mean for the outcome of the review commission OCLC has established.

18 thoughts on “OCLC in the Guardian”

Ryan Shaw says:

January 22, 2009 at 3:25 pm

“I don’t think the record sharing policy or record control attempts are responsible for keeping library book records out of search engines…”

OCLC’s robots.txt says otherwise.
Bryan says:

January 22, 2009 at 3:56 pm

Aaron Swartz is quoted as saying that “Since the beginning of Open Library, OCLC has been threatening funders, pressuring libraries not to work with us, and using tricks to try to shut us down. It didn’t work – and so now this.”

All that may be true, but readers are just left with that statement without any elaboration. Which funders have been threatened and how? Which libraries have been pressured not to work with Open Library, and how have they been pressured? What tricks is OCLC using to shut down Open Library?

Does anybody know what he is talking about, and can they give specific, concrete examples?
doihaveto says:

January 22, 2009 at 4:21 pm

@Bryan
Sure, I can. But I won’t. I’m being pressured not to.
Kidding aside: organizations that, while negotiating new contracts with OCLC, suddenly fear that the terms of the deal might deteriorate, and so state that it’s not “the right time” to give records to Open Library.
But that could be self-censorship, mind you…
Dorothea says:

January 22, 2009 at 6:45 pm

Thank you, Jonathan. Yours is the best coverage of the OCLC landgrab available.
Tim says:

January 23, 2009 at 12:33 am

Thank you, Jonathan. Yours is the best coverage of the OCLC landgrab available.

Agreed.

The amount of creativity that went into it (not generic ‘intellectual effort’) is what is relevant. … And if OCLC wanted the copyright on any of this to be held by them, then they’d need to show that it was their employees who exercised this creativity, unless the actual copyright holders (ie, the actual libraries doing original cataloging) assigned the (putative) copyright to OCLC.

I think you could make a good case that certain elements of a cataloging record could be copyrighted. LCSH assignment could be, I think. Obviously the title, publisher or size could not. But you make a good point: the creativity does not reside in putting the records onto a hard drive in Dublin, OH. It was done by the cataloger. Anyone with a passing understanding of copyright law knows that it is hard to assert copyright over things you didn’t create. “Work for hire” is very narrowly drawn, and actually transferring copyright requires an explicit written agreement to that effect. Neither holds here.

It’s clear to me that OCLC doesn’t want to give up rights they could assert. Their claim in the US is worthless, but continuing to claim it means they can shake up people who don’t know better. Overseas, though, the claim is not worthless. Most European countries have a “database copyright” that would protect them. They would be fools to give that up, but arguing it would draw considerable ire. The nationalistic tone of the Guardian article shouldn’t be ignored–there is something creepy about people in Dublin, Ohio competing to shut out a British-based company from helping British libraries share books!
nathan says:

January 23, 2009 at 9:44 am

“LCSH assignment could be, I think.”

I think this is right on, Tim. Maybe LCC and Dewey numbers too; but otherwise I think the “phone book” analogy holds for most other data in the bibliographic record…
Ed Summers says:

January 23, 2009 at 10:39 am

I’ll let the experts talk about the licensing and copyright issues. However this I question:

My understanding was that OCLC was happy to share all its records with Google (but not with you and I, for free), and would like them to show up in ordinary searches, but Google says it’s too much data and they don’t know what to do with it! (Contrary to popular belief, even Google is not omnipotent against the genuinely hard problem of dealing with our gigantic legacy corpus.)

I’m wondering where you got that understanding from. The idea of Worldcat links searches showing up in all search results is pretty messed up. I could well imagine why Google and other search engine companies would balk at this for business reasons, not technical ones.

The point I was trying to make was: why don’t they just let worldcat get crawled like the rest of the web? Why is OCLC so different?
Mark says:

January 23, 2009 at 7:00 pm

Jonathan, I’m in agreement here, mostly. Much of cataloging is rules following, especially when it comes to standard larger publisher books.

But. Anyone who thinks even deciding what the title is of much of the old serials I work with isn’t creative has clearly *never* done any cataloging, or only very simple cataloging.

There is much in a cataloging record that is simply factual but about the only thing I can think of that is always and only factual for all (print) resources is size. I guarantee that I can find materials that will creatively challenge anyone as to who/what is the responsible body, what the title is, who the publisher is, what form of name, etc. “Rules” my rearend!

But, yes, I do this work and not OCLC.
jrochkind says:

January 23, 2009 at 8:01 pm

I know what you’re saying Mark, and some aspects of cataloging may be ‘creative’, but keep in mind that ‘creative’ is different than ‘it takes a lot of work’. ‘Creative’ in the context of copyright means ‘originality’ and ‘creativity’.

For most books, figuring out the ‘title proper’ is very straightforward. For some, it is more complicated and time consuming. But in all cases, it’s done following rules. If the cataloger is being ‘original’ in choosing the ‘title proper’, I don’t think they’re doing it right, are they? And doesn’t this apply to the vast majority of the cataloging record? Sure, it’s work, but it’s not originality, is it?

Which is not to say it can’t be a lot of work. The whole point of Feist v. Rural is that creativity/originality is different than ‘a lot of work’, and it’s only the former — a creative act, not just an intellectually strenuous act — that gets you a copyright.

Maybe some day it will be argued in court. I suspect that very little, but perhaps some, of the process of creating a cataloging record is sufficiently creative/original to create a copyright.

But that it may not be considered ‘creative’ for the copyright act doesn’t mean that it’s not intellectually strenuous skilled work which catalogers can take pride in. Two different things.

And I’d add that if any or all of a cataloging record does possess copyright, then it is of course held by the institution doing the cataloging, unless they assign it elsewhere. In the case of LC, as part of the federal government nothing they do has copyright in the US. In the case of large libraries doing original cataloging, I think it is in all of their collective interests to release this putatively copyrighted data into the public domain, in an exercise in mutual reciprocity.

Isn’t that kind of mutual reciprocity exactly why libraries do cooperative cataloging in the first place, the very endeavor that gave rise to the regional ‘bibliographic utility’ of which OCLC is the only one left standing (in the US anyway)? Libraries aren’t submitting their original cataloging to OCLC for the paltry sums OCLC ‘pays’ (ie credits) them for it; and they aren’t submitting it to OCLC in an attempt to help OCLC gain a monopoly on this information for their own business purposes. They’re submitting it, as they always have been, in order to freely share with other libraries in a collective endeavor of mutual reciprocity.
Alex says:

January 26, 2009 at 8:51 am

Ed: That’s probably a story coming from me and from my time when our library gave MARC data to Google. It was a deal to bump our results up the ladder because, um, we’re libraries, and as such should be considered experts in bibliographic data. And, like others try to point out, library infrastructure is so rubbish it can’t handle proper spidering and must hand-feed the indexing monsters with MARC. Except no one wants or needs MARC to do interesting stuff with it, so some extremely cut-down version (title, author, publisher, I think) in their own cut-down simple XML format.
Mark says:

January 26, 2009 at 2:41 pm

Oops, sorry for not getting back here sooner, Jonathan. As you’ll see, I fully agree with your comments as put into your newer post.

I also primarily agree with your commentary re cataloging as creative vs. a lot of intellectual work. But. Some of our “rules” aren’t much in the way of being rules. As a philosopher, and even as a native speaker of my language, I could easily show how they are not “rules” as most of us would understand them. But, given, most are. Also agreed that most is just work, some of it intellectual, some not. So I generally agree.

Part (much?) of my problem relating to many discussions going on is the nature of the material I work with. I do (mostly original) cataloging of old serials, in assorted languages, to include government documents (serials). Very few people do this work and the rules for serials only cover a new item in hand. Toss in the reality of the world of OCLC editing/upgrading rules, which records for what parts of the title changes exist, etc. and one gets a frequent nightmare that, to me, seems somewhat, at least, creative. Certainly not writing a symphony relative, though.

Bottom line, I 99% agree with you. I just like to try and nuance things sometimes, especially when I feel I have a viewpoint or experience that most are unaware of.
Bryan says:

January 26, 2009 at 3:28 pm

doihaveto,

But you did not say which organizations specifically. Aaron Swartz made a claim without elaborating on it. Nor did the columnist press him for more details.

Which funders have been threatened and how? Which libraries have been pressured not to work with Open Library, and how have they been pressured? What tricks is OCLC using to shut down Open Library? I have no way of evaluating his statement because he doesn’t name names.

Can anybody else answer these questions with specific examples?
jrochkind says:

January 26, 2009 at 6:42 pm

I agree with Bryan that making claims like that without giving specifics is a bit suspect.
doihaveto says:

January 27, 2009 at 3:50 am

@Bryan and @jrochkind.
And suspect it is.
But if you refer to my comment #3, you’ll notice that I don’t claim OCLC directly put pressure on this institution not to give its records to OL and I clearly say that it might be self-censorship on the part of the institution.
But that’s just the point: even assuming OCLC did not pressure the institution, the situation is such, and the mistrust is such, that this institution felt it could not give its records away. That’s got to be wrong.
Bryan says:

January 27, 2009 at 9:44 am

doihaveto-

I did not realize that you were talking about a specific institution or organization, based on my initial reading.
I thought you were just speculating in a vague, general way about how organizations or institutions might react in the future as a result of OCLC’s change in policy.
doihaveto says:

January 27, 2009 at 9:59 am

@Bryan,
Going back to it, I realize my comment #3 was not well worded and, yes, rather too vague. Yes, I did talk about a specific institution, and no, I don’t claim OCLC pressured that institution. Only that:
#1 it was negotiating new contracts with OCLC
#2 it decided against giving records to the OL
#3 it explicitly made a link between #1 and #2
But again maybe OCLC has nothing to do with this; maybe it’s just the institution itself being more fearful than it should. It’s just that it seems to me to be telling of a certain climate, and to be giving at least some (just some) plausibility to A. Swartz’s claims.
Aaron Swartz says:

January 28, 2009 at 11:31 am

My understanding was that OCLC was happy to share all its records with Google (but not with you and I, for free), and would like them to show up in ordinary searches, but Google says it’s too much data and they don’t know what to do with it!

Open Library had over five million pages in Google, so presumably if OCLC wanted to open up their data they could do even better.

Bryan: Newspaper stories are limited for space; I don’t think any conspiracy is needed to explain why the article didn’t detail a list of organizations and incidents. I’m not sure I’m allowed to name names, but the overview is basically this: OCLC told several large foundations that the work we were doing was in violation of our legal rights and that funding us could subject them to legal scrutiny. Several large libraries said that they couldn’t work with us if OCLC didn’t approve because they used many OCLC services, e.g. WorldCat Local, and didn’t want to jeopardize their relationship with OCLC.

If you have a particular reason for wanting to know the details, let me know and I can talk to people to see if they want to go on the record with their stories. But, presumably if they didn’t want to anger OCLC by working with us, they won’t want to anger OCLC by saying OCLC pressured them not to work with us. From the perspective of proving what really happened, it’s obviously a problematic state of affairs, but I’m not sure what else I can do.
jrochkind says:

January 28, 2009 at 12:31 pm

Those extra details are helpful even without naming names, thanks Aaron.

That’s actually rather infuriating, that OCLC is apparently telling people the work you are doing is illegal. Ridiculous.

Has OCLC or its lawyers actually approached the Archive, or are they just threatening your partners?
