Internet Archive full text

I have a plug-in in Umlaut which tries to look up available full text on Internet Archive.

It’s just an author/title search, so it does produce false positives sometimes, but not too often in my experience. Most of my Umlaut plugins use ISBN/LCCN/OCLC number lookup instead, but for a variety of reasons this isn’t practical for IA, in part because they’ve got a whole lot of full text without any such standard identifiers (Project Gutenberg, for instance).
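To give a rough idea of what an author/title lookup against IA can look like, here is a minimal sketch in Python that only builds a search URL for the Internet Archive advanced search endpoint (`advancedsearch.php`). This is not the actual Umlaut plugin code (Umlaut is a Ruby application). The function names, the exact query shape, and the choice to phrase-quote both terms are my assumptions for illustration.

```python
from urllib.parse import urlencode

# Internet Archive advanced search endpoint; returns JSON when output=json.
IA_SEARCH_ENDPOINT = "https://archive.org/advancedsearch.php"


def _quote(term: str) -> str:
    """Wrap a term in double quotes, replacing any embedded quotes with spaces."""
    return '"' + term.replace('"', " ").strip() + '"'


def build_ia_query(title: str, author: str) -> str:
    """Build a Lucene-style query on title and creator, limited to texts.

    Hypothetical helper: the real plugin may normalize or weight terms differently.
    """
    return (
        f"title:({_quote(title)}) AND creator:({_quote(author)}) "
        "AND mediatype:texts"
    )


def build_ia_search_url(title: str, author: str, rows: int = 5) -> str:
    """Return a search URL asking for a few candidate matches as JSON."""
    params = [
        ("q", build_ia_query(title, author)),
        ("fl[]", "identifier"),  # fields to return for each hit
        ("fl[]", "title"),
        ("rows", str(rows)),
        ("output", "json"),
    ]
    return IA_SEARCH_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_ia_search_url("The frog", "Rugh"))
```

Because there is no standard identifier to match on, anything a search like this returns is only a candidate, which is where the occasional false positive comes from.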

So anyway, this is in Umlaut, and then the Umlaut services are used in my OPAC to expose it. While it seems to work, it’s very rare that Umlaut finds something in IA that wasn’t also in Google Books/Hathi Trust.

But today I happened to come across one such item, so I’ll put it here, for general interest, and so I myself can find it again when I need such an example!

The frog : its reproduction and development, Philadelphia: Blakiston 1951, [ Rugh, Roberts, 1903- ]

I have no idea how/why/if a book from 1951 is legally available in free full text (which is why it’s not in GBS/HT). This one comes from the Biodiversity Heritage Library; perhaps they got a license from the copyright holder.

[PS: For whatever reason my brainless test search is often ‘frogs’, which is how I came across it.]


6 thoughts on “Internet Archive full text”

  1. In case it’s useful, here’s another example for you:

    This one is linked up in OpenLibrary, though there was an initial issue with the OpenLibrary record having the incorrect language (German — now fixed).

    Thanks for your work on Umlaut. Wish I could get the powers that be to let us experiment with using it at my institution.



  2. Thanks Shirley. Some time in the next couple months I hope to upgrade Umlaut and rewrite some parts of it with the goal being making it much easier to install and run. I hope to get some more institutions interested at that point.

  3. Go to:

    which is the Stanford copyright renewal database. The database is made up from the printed indexes that were issued by the copyright office. (Scanned then OCR’d, I believe). If you don’t find it in there… then there is a chance that it didn’t get renewed. In fact, I suspect that the copyright office’s files are so garbled (as many of the entries in this database also are) that you can never prove the negative. That’s why we have so many orphan books. Note, however, that some folks have done tests on tens of thousands of titles that are in that renewal range (1923-19somethingelse) and have found that a very large number were not renewed. Cutting off your full text at 1923 (in the US) means that a lot of books that are in the public domain are not made available.

    This is, in part, the problem that Google and the publishers were trying to solve with the Books Rights Registry.

  4. Works published in the United States between 1923 and 1963 required a renewal of the copyright. The MBLWHOI Library conducted a due-diligence search of the Copyright Renewal Database by Stanford University, the GoogleBooks scans of the US Copyright Office Catalog of Copyright Entries, and the US Copyright Office post-1978 records. No evidence of copyright renewal for this book was found in any of the research.

    Diane M. Rielinger
    MBLWHOI Library
