hathitrust vs google availability weirdness

Okay, figure this one out for me. Check out this Umlaut page:

http://findit.library.jhu.edu/link_router/index/16793853

Umlaut contacts Google Book Search to ask about that citation, and Google says they don’t have full text, but do have a metadata record. You can click on the “book information from google book search” link to see that very GBS page.

HathiTrust, on the other hand, says they DO have full text, and provides that full text via a GBS preview widget, as you can see here.

Huh?

And also EZProxy problems

If you try to click on the HathiTrust link on the Find It page, you’ll see another, unrelated problem. HathiTrust URLs are now expressed as hdl.handle.net URLs, and hdl.handle.net is included in our EZProxy config for other reasons. So Umlaut thinks it needs to send HathiTrust URLs through EZProxy, and will ask you to log in before following the link, even though the content is free.

Next time I hear someone saying we don’t need an actual non-hacky authentication solution for third-party licensed content (like OpenID or Shibboleth), because EZProxy works just fine, I’m going to remember this example.


8 Responses to hathitrust vs google availability weirdness

  1. Perry Willett says:

    Jonathan,

    A correction–the HathiTrust copy is available from the HathiTrust site, not a “GBS preview widget.” It’s an important distinction, since they are doing their own rights determination separate from Google. In this case, they’ve determined that the book is in the public domain, probably after checking the copyright renewal records.

  2. jrochkind says:

    Ah, you’re absolutely right. I mis-saw it. HathiTrust has a new preview interface, or maybe I just forgot what it was.

    I guess that’s why Umlaut checks both HathiTrust and GBS. I wasn’t sure how often it would find a full text hit at Hathi but not GBS that anyone was interested in, but in this case this particular example was brought to my attention because a patron indeed needed it (and got confused by the EZProxy issues I mentioned).

    Good on HathiTrust.

  3. YS says:

    Jonathan,

    Google will never enable full view for a book published after 1923. HathiTrust has different rules.

  4. Perry Willett says:

    You’ll find quite a few differences between access restrictions on materials in HathiTrust and Google. As I mentioned, Michigan is conducting its own review of copyright status. They’ve opened access to tens of thousands of volumes published in the U.S. between 1923 and 1963. They have also made different determinations on U.S. federal documents than Google has. And finally, in a small number of cases, they’ve received permission from copyright holders to make works publicly available.

  5. jrochkind says:

    Perry or anyone else, do you know if there’s a way to download the complete text of a public domain book (PDF, text, or other) from HathiTrust? I can’t find one, I don’t think. This is something GBS offers; it would be nice to have it from HathiTrust for books that HT calls public domain but GBS does not.

  6. Ross says:

    So I have been thinking about this EZProxy thing a bit. It seems pretty likely that purl.org and dx.doi.org might be proxied, as well.

    DOI, well, whatever — open access content would have to be picked up in SFX; the likelihood of a DOI pointing at a known OA target is too small.

    Handles and PURLs on the other hand… In order to avoid proxying when it’s not necessary, it might be worthwhile to send a HEAD request to handle and PURL URLs and then parse the URL returned in the Location header of the 30x response.

    So, for handles:
    $ curl -I http://hdl.handle.net/1721.1/39443
    HTTP/1.1 302 Moved Temporarily
    Location: http://dspace.mit.edu/handle/1721.1/39443

    which then can be checked against the OpenDOAR API:
    http://opendoar.org/api13.php?kwd=dspace.mit.edu

    The signal-to-noise ratio with PURLs is probably too low to bother with, but all three behave the same way, so it might not be too hard to just do some simple checking: the OpenDOAR API, plus a whitelist of a handful of major sites like Citeseer that don’t appear in OpenDOAR.
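
A rough Python sketch of the check described in comment 6, for illustration only. It is not Umlaut code. The names `RESOLVER_HOSTS`, `KNOWN_FREE_HOSTS`, `redirect_target`, and `can_skip_proxy` are all made up here, and the host lists are assumptions; the static `KNOWN_FREE_HOSTS` set stands in for both the hand-kept whitelist and an OpenDOAR lookup, whose response format is not shown in this thread.

```python
import http.client
from urllib.parse import urlparse

# Hypothetical lists, for illustration only.
# Resolver hosts that EZProxy might be configured to proxy wholesale.
RESOLVER_HOSTS = {"hdl.handle.net", "purl.org"}
# Stand-in for "whitelist + OpenDOAR": hosts assumed to serve free content.
KNOWN_FREE_HOSTS = {"dspace.mit.edu", "babel.hathitrust.org"}


def redirect_target(url, timeout=5):
    """Send a single HEAD request (no redirect following) and return the
    Location header if the response is a 30x, otherwise None."""
    parts = urlparse(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        if 300 <= resp.status < 400:
            return resp.getheader("Location")
        return None
    finally:
        conn.close()


def can_skip_proxy(url, resolve=redirect_target):
    """Return True only when url is a resolver URL whose redirect target
    is on a host assumed to be free. Anything uncertain returns False,
    so the caller falls back to its existing proxying behavior."""
    if urlparse(url).hostname not in RESOLVER_HOSTS:
        return False
    try:
        target = resolve(url)
    except (OSError, http.client.HTTPException):
        return False  # network trouble: stay with the existing behavior
    if not target:
        return False
    return urlparse(target).hostname in KNOWN_FREE_HOSTS
```

The `resolve` parameter exists so the decision logic can be exercised without network access, e.g. `can_skip_proxy("http://hdl.handle.net/1721.1/39443", resolve=lambda u: "http://dspace.mit.edu/handle/1721.1/39443")`. Defaulting to "keep proxying" on any failure keeps the extra check from blocking access to licensed content.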

  7. jrochkind says:

    It’s do-able, I just worry about all the code getting increasingly complicated.

  8. Perry Willett says:

    HathiTrust doesn’t have a feature allowing an end user to download an entire volume as a single file.
