Okay, figure this one out for me. Check out this Umlaut page:
Umlaut contacts Google Book Search (GBS) to ask about that citation, and Google says it doesn’t have full text but does have a metadata record. You can click on the “book information from google book search” link to see that very GBS page.
HathiTrust, on the other hand, says it DOES have full text, and provides it via a GBS preview widget, as you can see here.
And also EZProxy problems
If you try to click on the HathiTrust link on the Find It page, you’ll see another, unrelated problem. HathiTrust URLs are now expressed as hdl.handle.net URLs, and hdl.handle.net is included in our EZProxy config for other reasons. So Umlaut thinks it needs to send HathiTrust URLs through EZProxy, and will ask you to log in before following the link. Even though it’s free.
Next time I hear someone say we don’t need an actual, non-hacky authentication solution for third-party licensed content (like OpenID or Shibboleth) because EZProxy works just fine, I’m going to remember this example.
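The failure is easy to reproduce whenever the proxy decision looks only at the link’s host name. The sketch below is a hypothetical Python illustration, not Umlaut’s actual code: it assumes links are matched against a set of domains copied from the EZProxy config (PROXIED_DOMAINS is made up), so a free HathiTrust handle gets proxied simply because it lives on hdl.handle.net.

```python
from urllib.parse import urlsplit

# Hypothetical domains copied from an EZProxy config; a real config differs.
PROXIED_DOMAINS = {"hdl.handle.net", "jstor.org"}


def needs_proxy(url: str, proxied_domains=PROXIED_DOMAINS) -> bool:
    """Return True if the URL's host is a proxied domain or a subdomain of one.

    Only the host is inspected, so a handle URL is proxied no matter
    where the handle eventually resolves to.
    """
    host = (urlsplit(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in proxied_domains)


if __name__ == "__main__":
    # A free resource behind a handle still triggers the EZProxy login.
    print(needs_proxy("http://hdl.handle.net/1721.1/39443"))  # True
    print(needs_proxy("http://example.org/free-ebook"))       # False
```

Matching on the host alone is cheap and needs no network access, which is presumably why it is tempting; the price is that any resolver host in the config drags every resource behind it through the proxy.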
8 thoughts on “hathitrust vs google availability weirdness”
A correction: the HathiTrust copy is available from the HathiTrust site, not through a “GBS preview widget.” It’s an important distinction, since HathiTrust does its own rights determination, separate from Google’s. In this case, they’ve determined that the book is in the public domain, probably after checking the copyright renewal records.
Ah, you’re absolutely right. I misread it. Either HathiTrust has a new preview interface, or I just forgot what it looked like.
I guess that’s why Umlaut checks both HathiTrust and GBS. I wasn’t sure how often it would find a full-text hit at HathiTrust but not at GBS for something anyone actually wanted, but this particular example was brought to my attention precisely because a patron needed it (and got confused by the EZProxy issue I mentioned).
Good on HathiTrust.
Google will never open full view for a book published after 1923. HathiTrust has different rules.
You’ll find quite a few differences between access restrictions on materials in HathiTrust and Google. As I mentioned, Michigan is conducting its own review of copyright status. They’ve opened access to tens of thousands of volumes published in the U.S. between 1923 and 1963. They have also made different determinations on U.S. federal documents than Google has. And finally, in a small number of cases, they’ve received permission from copyright holders to make works publicly available.
Perry or anyone else, do you know if there’s a way to download the complete text of a public-domain book (PDF, text, or another format) from HathiTrust? I can’t find one. This is something GBS offers; it would be nice to have it from HathiTrust for books that HT considers public domain but GBS does not.
So I have been thinking about this EZProxy thing a bit. It seems pretty likely that purl.org and dx.doi.org are proxied as well.
DOIs, well, whatever: open access content would have to be picked up in SFX; the likelihood of a DOI pointing at a known OA target is negligible.
Handles and PURLs, on the other hand, are a different story. To avoid proxying when it isn’t necessary, it might be worthwhile to send a HEAD request to handle and PURL URLs and then parse the target URL returned in the Location header of the 30x response.
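A minimal sketch of that HEAD-request idea in Python, using only the standard library. The function names are hypothetical. head_redirect_target sends one HEAD request, does not follow the redirect, and returns the absolute target from the Location header of a 30x response (Location may be relative, hence urljoin). Timeouts and error handling are deliberately minimal.

```python
import http.client
from typing import Optional
from urllib.parse import urljoin, urlsplit


def redirect_target(request_url: str, status: int, location: Optional[str]) -> Optional[str]:
    """Return the absolute redirect target for a 30x response, else None."""
    if 300 <= status < 400 and location:
        # Location may be relative; resolve it against the requested URL.
        return urljoin(request_url, location)
    return None


def head_redirect_target(url: str, timeout: float = 5.0) -> Optional[str]:
    """Send a single HEAD request (no redirect following) and report where it points."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return redirect_target(url, resp.status, resp.getheader("Location"))
    finally:
        conn.close()
```

Calling head_redirect_target("http://hdl.handle.net/1721.1/39443") needs network access and returns whatever the handle server puts in its Location header. Keeping redirect_target separate lets the parsing be tested without a network.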
So, for handles:
$ curl -I http://hdl.handle.net/1721.1/39443
HTTP/1.1 302 Moved Temporarily
which can then be checked against the OpenDOAR API:
The signal-to-noise ratio with PURLs is probably too low to bother with, but all three resolvers behave the same way (a redirect to the real location), so it might not be too hard to do some simple checking: first a whitelist of a handful of major sites, like CiteSeer, that don’t appear in OpenDOAR, and then the OpenDOAR API.
It’s doable; I just worry about the code getting increasingly complicated.
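One way the whitelist-then-OpenDOAR check could feed the proxy decision, sketched in Python. Everything here is hypothetical: the whitelist entries are only examples, and is_open_repository is a stand-in callback for an OpenDOAR lookup whose real API is not shown. The resolved URL is assumed to come from a HEAD request like the one described in this comment, and an unresolvable or unknown target falls back to proxying, as happens today.

```python
from typing import Callable, Iterable, Optional
from urllib.parse import urlsplit

# Example whitelist (assumption): free hosts that may be missing from OpenDOAR.
FREE_HOST_WHITELIST = {"babel.hathitrust.org", "citeseerx.ist.psu.edu"}


def _host(url: str) -> str:
    return (urlsplit(url).hostname or "").lower()


def _matches(host: str, domains: Iterable[str]) -> bool:
    # Exact host match, or a subdomain of a listed domain.
    return any(host == d or host.endswith("." + d) for d in domains)


def should_proxy(
    link_url: str,
    resolved_url: Optional[str],
    proxied_domains: Iterable[str],
    is_open_repository: Callable[[str], bool],
    whitelist: Iterable[str] = FREE_HOST_WHITELIST,
) -> bool:
    """Decide whether to send a link through EZProxy.

    is_open_repository is a hypothetical callback (for example, wrapping
    an OpenDOAR lookup) that takes a host name and returns True if open.
    """
    if not _matches(_host(link_url), proxied_domains):
        return False  # not a proxied host at all
    if resolved_url is None:
        return True  # couldn't resolve: keep the current behavior
    target = _host(resolved_url)
    if _matches(target, whitelist):
        return False  # known free site; cheap local check first
    if is_open_repository(target):
        return False  # listed as an open repository
    return True  # unknown target: stay on the safe side and proxy


if __name__ == "__main__":
    def no_oa(host: str) -> bool:
        return False  # stub: pretend the OpenDOAR lookup found nothing

    print(should_proxy("http://hdl.handle.net/1721.1/39443",
                       "https://babel.hathitrust.org/cgi/pt?id=example",
                       {"hdl.handle.net"}, no_oa))  # False
```

Checking the local whitelist before the remote lookup keeps the common free cases fast and means the OpenDOAR call only happens for targets nobody has vetted yet.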
HathiTrust doesn’t have a feature that lets an end user download an entire volume as a single file.