Google Books research corpus–acceptable uses?

So, there are various algorithms that take text and compute an ‘audience level’ or ‘reading level’ (9th grade, specialized technical expert, whatever).
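To make that concrete, here is a minimal Python sketch of one well-known formula, the Flesch-Kincaid grade level: 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. The post doesn't name a particular algorithm, so picking Flesch-Kincaid is an assumption on my part. The helper names (`count_syllables`, `flesch_kincaid_grade`) are hypothetical, and the sentence and syllable counting are crude heuristics, not what any commercial vendor actually uses.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: count runs of consecutive vowels (treating 'y' as a vowel)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Drop a silent trailing 'e' ("make"), but keep "-le" endings ("table").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level, using naive sentence and word splitting."""
    # Sentences: split on runs of terminal punctuation (abbreviations like
    # "e.g." will be miscounted; this is only a sketch).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words: letters, optionally joined by apostrophes ("don't").
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)*", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)
```

For example, `flesch_kincaid_grade("The cat sat on the mat.")` comes out to approximately -1.45; very short, one-syllable text can score below zero, which is a known quirk of the formula rather than a bug.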

So let’s say I were a researcher granted access to the research center, with its GBS digitized corpus, that will hypothetically exist under the settlement.

Let’s say I wanted to compute the reading level of every book in the corpus, and then actually publish these computations, open access (the Open Data Commons Public Domain Dedication and License would be useful), for the greater good of anyone who wanted to use them.

Would that be an allowable use? It seems ‘non-consumptive’ under the terms of the proposed settlement.

I think it probably would be allowed. Anyone else have an opinion?

This seems counter-intuitive, because doing so would harm the business interests of those currently in the market of selling such reading-level computations… but then I remembered that those vendors are probably different from the publishers, who don’t make much money themselves from such services, and thus those sorts of vendors were probably not represented in the settlement. Or nobody thought of it.


2 thoughts on “Google Books research corpus–acceptable uses?”

  1. It would seem to be okay. Paragraph 7.2(d)(vii) [“Fully Participating Library Uses — Research Corpus — Publication of Results”] on page 81 of the settlement says:

    Qualified Users are permitted to report the results of their Non-Consumptive Research in scholarly publications, which may constitute indirect commercial use (e.g., reporting results in journal articles or in books sold to the academic community or to the public).

    Paragraph (viii) of the same section [“No Commercial Use”] goes on to say:

    Except with the express permission of the [Books Rights] Registry and Google, direct, for profit, commercial use of information extracted from Books within the Research Corpus is prohibited.

    … and paragraph (ix) [“Use of Data”] starts this way:

    Use of data extracted from specific Books within the Research Corpus to provide services to the public or a third party that compete with services offered by the Rightsholder of those Books or by Google is prohibited….

    So it would seem that as long as the Rightsholder (or Google) is not offering a reading-level service (to use your example), and the data is published in conjunction with a research paper, it would be within bounds.

    But, then again, I’m not a lawyer… ;-)

  2. Aha, yeah, “as long as it’s not competing with a service offered by a Rightsholder” is the key part. A rightsholder can always _start_ offering a service, even if they don’t offer one now. DRAT. Never mind.

    I suspect, from those excerpts, that it wouldn’t in fact be necessary to publish in conjunction with a research paper, as long as it didn’t compete. Not competing is the sticking point. Never mind. It’s “non-consumptive”, but it still at least potentially competes. Especially with all the conglomeration going on in the industry, if it doesn’t compete with a rightsholder’s service yet, it will soon enough, once a rightsholder buys some other company that offers such a service.
