google research database from GBS

http://www.nytimes.com/2010/12/17/books/17words.html

The digital storehouse, which comprises words and short phrases as well as a year-by-year count of how often they appear, represents the first time a data set of this magnitude and searching tools are at the disposal of Ph.D.’s, middle school students and anyone else who likes to spend time in front of a small screen. It consists of the 500 billion words that are contained in books published between 1800 and 2000 in English, French, Spanish, German, Chinese, Russian and Hebrew.
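To make concrete what "words and short phrases as well as a year-by-year count" could look like, here is a minimal sketch. It is not Google's actual pipeline. The toy corpus, the function names (`ngrams`, `count_by_year`) and the whitespace tokenization are all assumptions made for illustration. Note that the counting step has to read every word of every book, even though only the counts are kept.

```python
from collections import Counter, defaultdict

def ngrams(tokens, n):
    """Yield every run of n consecutive tokens as a tuple."""
    return zip(*(tokens[i:] for i in range(n)))

def count_by_year(books, max_n=2):
    """Map each n-gram (joined into a string) to a Counter of {year: occurrences}.

    `books` is an iterable of (publication_year, full_text) pairs.
    Tokenization is naive lowercase whitespace splitting (an assumption).
    """
    counts = defaultdict(Counter)
    for year, text in books:
        tokens = text.lower().split()
        for n in range(1, max_n + 1):
            for gram in ngrams(tokens, n):
                counts[" ".join(gram)][year] += 1
    return counts

# Hypothetical two-book corpus, purely for illustration.
books = [
    (1850, "the whale and the sea"),
    (1950, "the sea and the sky"),
]
counts = count_by_year(books)
```

Looking up `counts["the sea"]` gives a per-year tally for that phrase. The word order of the original books is not stored beyond the short phrases themselves.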

A side issue

from that same article:

Google says the culturomics project raises no copyright issue because the books themselves or even sections of them cannot be read.

That would kind of make sense if it were so, but the publishers suing google would probably beg to differ. Copyright says you can’t make a copy without the rights-holder’s permission. It doesn’t say you can make a copy as long as you don’t share it publicly (although the penalties might be worse if you do).

If it weren’t for this, the mere couple-sentence excerpts that google showed for arbitrary books in Google Book Search would almost certainly, in each individual case, be fair use. What gave the publishers a case was that Google had to copy the whole thing to provide that service, as they do for the ‘culturomics’ database.


5 Responses to google research database from GBS

  1. Bryan says:

    Have you ever wondered why the publishers are not also suing the Google partner libraries for basically aiding and abetting Google’s wholesale copying of in-copyright works? I mean, it’s not like the libraries were ignorant of Google’s intentions. Do you think partner libraries ever thought that they would also be at risk in the lawsuit?

  2. Hugh says:

    Presumably they would argue that this is a transformative use (see http://en.wikipedia.org/wiki/Transformation_(law) ). Bryan: Libraries have special protections under copyright law that probably apply here. See http://www.law.cornell.edu/uscode/17/108.html

  3. jrochkind says:

    Well, the partner libraries ARE allowed to loan the books out to whomever, including Google. I don’t think the partner libraries are liable even if they ought to know the borrower plans to copy it. Plus, libraries are still well liked enough that suing them tends to look bad.

  4. I think Google is relying on the part of fair use that asks whether a use is transformative. Google will say that it has completely transformed the originals and that the final product does not try to substitute for the originals in any way. For example, this is how parody manages to qualify as fair use: it is transformative.

    It would be very difficult for the copyright holder of any particular book to say that Google’s new tool has superseded their book and left it no longer valuable, because Google has created something entirely new in value and purpose.

    Making the full-text scans generally available is, of course, a completely different matter.

  5. jrochkind says:

    Quite possibly, James.

    Fair use, perhaps sadly for us users, doesn’t have any simple measurements. It is not just about transformation — in fact, the four standard measures, which are balanced amongst themselves, for determining fair use don’t even include transformation as such, although they include “purpose and character of the use”, with transformation being part of the character. And they certainly also include “the effect of the use upon the potential market for or value of the copyrighted work.”

    That one gets really tricky with the newfangled internet and its newfangled markets. There is/was no real market for n-grams created from texts. There ALSO is/was no market for searches over texts revealing excerpts, the use that the publishers sued google for. That use certainly didn’t impact the market for actually buying texts, but I’d guess the publishers suing google argued that they thought they could create such a market: that they could charge SOMEONE for including their texts for searching purposes even though they weren’t yet. It doesn’t really help the case for a market that most of the big publishers are willingly giving free licenses to copy texts for full text searching and excerpts to Amazon, though.

    Many people thought Google could probably prevail on a fair use case for their searching-with-excerpts, and were disappointed they chose to settle instead. In part, the legal precedent saying that image thumbnails on the internet, used as previews, are a fair use would seem to be pretty parallel to three-sentence excerpts on the internet used as previews. As Wikipedia summarizes:

    First, it found the purpose of creating the thumbnail images as previews to be sufficiently transformative, noting that they were not meant to be viewed at high resolution like the original artwork was. Second, the fact that the photographs had already been published diminished the significance of their nature as creative works. Third, although normally making a “full” replication of a copyrighted work may appear to violate copyright, here it was found to be reasonable and necessary in light of the intended use. Lastly, the court found that the market for the original photographs would not be substantially diminished by the creation of the thumbnails. To the contrary, the thumbnail searches could increase exposure of the originals.

    http://en.wikipedia.org/wiki/Fair_use#Fair_use_on_the_Internet

    Sounds a lot like using three-sentence excerpts as previews in search results, right? They can’t be used as a substitute for the actual book (even MORE so than a low-res preview of an ENTIRE image); the book has already been published; yes, you have to make a ‘full’ replication in order to supply the keywords-in-context three-sentence excerpts, but it’s just what you have to do in order to display the appropriate three-sentence excerpts for a search. And the market for buying the books themselves (in print or even digitally) does not seem to be affected by displaying three-sentence in-context excerpts, which may even increase exposure of the books available for purchase.

    At any rate, it’s clearly all a mess, but I definitely question the google person’s suggestion that, oh, of COURSE there’s no copyright issue with the n-grams because “the books themselves or even sections of them cannot be read.” It is hardly that simple.
