For a while now I have been using the “Google Book Search Data API” to search Google Books, in order to provide links in Umlaut.
There is now a new API from Google for this purpose, called the “Books API”. This InfoQ posting where I learned of the new API claims that the old “GBS Data API” is now deprecated, but I’m not sure where they got that information; the Google docs on the old API don’t say that. And the Google docs on the new API do say “This version of the Books API is in Labs, and its features might change unexpectedly until it graduates.” It seems safe to assume the old one will eventually go away, but hopefully they’ll give us some time after the new one “graduates” before it does.
The new API is basically similar to the old one and clearly based on it, but with some new features, at least one new limitation, and some mysteries.
json for xml
They’ve switched from XML to JSON for the response container, at least as a default. Which is fine with me, I don’t really care either way. I’m not sure if there’s still a way to get Atom XML by requesting it specifically; I couldn’t get it to work the way the documentation seemed to suggest. (With Google switching away from Atom, is Atom’s adoption curve on the downslope?)
api key now required, rate limiting?
The old API lets you search public information without an API key. The new API at first appears to let you do the same, but allows only a pretty small number of requests before it rate limits you and tells you to get an API key. I’m not sure what the limit is, but I ran into it easily while testing manually.
Once you have an API key, Google can keep track of the number of requests for that key; it’s not clear to me if they rate limit you, and if so at what rate.
(I think you need an API key to access individual account data in both APIs, which in the new API is now done with OAuth as opposed to something weirder in the old API; but I don’t do that.)
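For what it’s worth, here is a minimal sketch of how I’d build a search URL so the key is only attached when I have one. The function name `build_search_url` and the placeholder key are my own inventions for illustration; the `key` query parameter is the usual way Google’s web APIs take an API key, and the base URL is the one used in the examples later in this post.

```python
from urllib.parse import urlencode

# Base URL of the new Books API volumes search, as used elsewhere in this post.
BOOKS_API_BASE = "https://www.googleapis.com/books/v1/volumes"


def build_search_url(query, api_key=None):
    """Return a volumes search URL; the API key is added only if one is given.

    `build_search_url` is a hypothetical helper, not part of any Google library.
    """
    params = {"q": query}
    if api_key:
        # Keyless requests seem to work but hit the rate limit quickly.
        params["key"] = api_key
    return BOOKS_API_BASE + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_search_url("OCLC6476771"))
    print(build_search_url("OCLC6476771", api_key="MY_KEY"))  # placeholder key
```

This only builds the URL; actually fetching it and handling whatever error the API returns when you’re rate limited is left out, since I don’t know exactly what that response looks like.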
Additional information in response
There is a bit more access information in the new response, including, usefully, direct links to PDF or ePub if available, along with information on whether the document will include full text or not. (I think; it’s not entirely straightforward to interpret the access/availability data, but there is more of it than there was before.)
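Here’s a rough sketch of pulling that access information out of one volume record. The field names (`accessInfo`, `viewability`, `pdf`/`epub` with `isAvailable` and `downloadLink`) are how I understand the current response and could change while the API is in Labs; the sample record is made up for illustration, and `summarize_access` is just my own helper name.

```python
def summarize_access(volume):
    """Summarize the access section of one volume record (a parsed JSON dict).

    Assumes the layout described above; missing fields come back as None.
    """
    access = volume.get("accessInfo", {})
    pdf = access.get("pdf", {})
    epub = access.get("epub", {})
    return {
        "viewability": access.get("viewability"),
        # Only report a link when the format is flagged as available.
        "pdf_link": pdf.get("downloadLink") if pdf.get("isAvailable") else None,
        "epub_link": epub.get("downloadLink") if epub.get("isAvailable") else None,
    }


# Invented sample record, NOT a real API response.
SAMPLE_VOLUME = {
    "accessInfo": {
        "viewability": "ALL_PAGES",
        "pdf": {"isAvailable": True, "downloadLink": "https://example.org/sample.pdf"},
        "epub": {"isAvailable": False},
    }
}

if __name__ == "__main__":
    print(summarize_access(SAMPLE_VOLUME))
```

Treating a missing section the same as “not available” is a deliberate choice here, since I’m not sure which fields are guaranteed to be present.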
magazines vs books
The new API includes books and magazines, and lets you limit to just one or the other if you want. (The documentation for the old API, written before magazines were in Google Book Search, doesn’t mention magazines at all. I forget if the old API never includes magazines, or always includes them with no way to exclude them. I think the latter).
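As a small sketch of the books/magazines limit: as far as I can tell from the docs it’s done with a `printType` query parameter taking `all`, `books`, or `magazines`. The helper name `build_print_type_url` is mine, and the query term is just an example.

```python
from urllib.parse import urlencode

BOOKS_API_BASE = "https://www.googleapis.com/books/v1/volumes"

# Values I believe the printType parameter accepts.
PRINT_TYPES = ("all", "books", "magazines")


def build_print_type_url(query, print_type="all"):
    """Return a search URL restricted to books, magazines, or both."""
    if print_type not in PRINT_TYPES:
        raise ValueError("print_type must be one of %r" % (PRINT_TYPES,))
    return BOOKS_API_BASE + "?" + urlencode({"q": query, "printType": print_type})


if __name__ == "__main__":
    print(build_print_type_url("railroads", "magazines"))
```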
Searching on third party standard identifiers
My use case depends on being able to search for ISBN, OCLCnum, and LCCN in Google Book Search. The old API docs explained how to search for ISBN, sort of, but were silent on OCLCnum/LCCN, even though there was a way to do it.
The new API is silent on all of them, but, very fortunately for me, they still work similarly. For ISBN, you can specify a “fielded search”, ISBN:X.
This isn’t documented anywhere, but the fact that it’s an actual fielded search makes it seem likely to keep working.
For OCLCnum and LCCN, there is no fielded index, but Google is indexing tokens “OCLCx” or “LCCNx” which you can search on. This was an undocumented hack shown to me by a Google engineer years ago, and it does still work. But the fact that it’s not documented anywhere and is so hacky worries me. Okay, some Google engineer made sure that would work years ago to satisfy some libraries (back when Google was trying to impress libraries to get GBS partners), but that engineer surely doesn’t work on that project anymore, and does anyone else at Google know about it or plan to maintain it? Who knows. I sure don’t know how to find out from Google.
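To make the three identifier searches concrete, here’s a sketch of how the query strings get put together: the fielded “ISBN:X” form for ISBN, and the glued-together “OCLCx” / “LCCNx” tokens for the other two. The helper names are hypothetical, the ISBN is just a placeholder value, and since none of this is documented, treat it as a record of what works for me right now rather than anything Google promises.

```python
from urllib.parse import urlencode

BOOKS_API_BASE = "https://www.googleapis.com/books/v1/volumes"


def identifier_query(kind, value):
    """Build the q= value for an ISBN, OCLC number, or LCCN lookup."""
    kind = kind.upper()
    if kind == "ISBN":
        # ISBN is a real fielded search: ISBN:X
        return "ISBN:" + value
    if kind in ("OCLC", "LCCN"):
        # No fielded index; search on the undocumented token OCLCx / LCCNx.
        return kind + value
    raise ValueError("unsupported identifier type: " + kind)


def identifier_search_url(kind, value):
    """Return the full search URL for one identifier."""
    return BOOKS_API_BASE + "?" + urlencode({"q": identifier_query(kind, value)})


if __name__ == "__main__":
    print(identifier_search_url("oclc", "6476771"))
    print(identifier_search_url("isbn", "9780140449136"))  # placeholder ISBN
```

Note that `urlencode` percent-encodes the colon in the ISBN query, which should be equivalent to sending it literally.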
(Note that LCCN normalization issues may make LCCN matches fail; I don’t know if Google is normalizing LCCNs before indexing them. Again, I don’t really know what Google is doing.)
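For reference, this is a sketch of the Library of Congress’s LCCN normalization as I understand it: remove blanks, drop a forward slash and everything after it, and if there’s a hyphen, remove it and left-pad the digits after it to six places. Whether Google does anything like this before indexing is exactly what I don’t know, so normalizing on my end is a guess at improving match rates, not a known fix.

```python
def normalize_lccn(raw):
    """Normalize an LCCN following the LC rules as summarized above."""
    # 1. Remove all blanks (this also strips any other whitespace).
    lccn = "".join(raw.split())
    # 2. Drop a forward slash and everything to its right.
    lccn = lccn.split("/", 1)[0]
    # 3. Remove a hyphen, zero-padding the serial part to six digits.
    if "-" in lccn:
        prefix, serial = lccn.split("-", 1)
        if not serial.isdigit() or len(serial) > 6:
            raise ValueError("cannot normalize LCCN: %r" % raw)
        lccn = prefix + serial.zfill(6)
    return lccn


if __name__ == "__main__":
    for example in ("n78-89035", "85-2 ", "75-425165//r75", " 79139101 /AC/r932"):
        print(repr(example), "->", normalize_lccn(example))
```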
I suppose if they took away everything but ISBN, I could still make use of it, but it is very nice that OCLC and LCCN matching can currently (sort of) be done too.
OCLC, you’ve got some relationship with Google — can you possibly represent us libraries by pushing for them to include at least OCLCnum (which obviously is in OCLC’s corporate interest too!) as a documented supported access point?
third party identifiers in response
The response does not include OCLC number or LCCN even when Google clearly does know about them. (Another way we know Google knows the OCLCnum for most records is that it’s able to provide “find in a library” links to WorldCat based on OCLCnum.) The old API was documented to say OCLCnum and LCCN would be included, even though they never were. The new API docs are silent on the issue. (And this would be another thing OCLC might want to try to get Google to do, in OCLC’s interests as well as all of ours.)
The new API does sometimes include some weird additional identifiers, tagged as industryIdentifier/OTHER. Check out “HARVARD:32044004331278” in the record in this response: https://www.googleapis.com/books/v1/volumes?q=OCLC6476771
Very weirdly, if I ask for that hit by Google ID directly, the industryIdentifier is not included. Huh? https://www.googleapis.com/books/v1/volumes/CLGo1EU8ZlgC I don’t know if this is a bug or what’s going on. It actually looks like LOTS of info that’s present in the search results is missing from the specific volume request. I don’t get it.
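Here’s a small sketch of how I’d pull the identifiers out of a volume record, written so a record with no identifiers at all (like the direct volume lookup) just yields an empty list. I’m assuming the identifiers live in an `industryIdentifiers` list under `volumeInfo`, each with a `type` and an `identifier`; the two sample records are trimmed-down stand-ins modeled on what I described above, not full API responses, and `industry_identifiers` is my own helper name.

```python
def industry_identifiers(volume):
    """Return (type, identifier) pairs from one volume record, or [] if none."""
    ids = volume.get("volumeInfo", {}).get("industryIdentifiers", [])
    return [(entry.get("type"), entry.get("identifier")) for entry in ids]


# Trimmed stand-in for the hit as it appears in the search results.
SEARCH_HIT = {
    "id": "CLGo1EU8ZlgC",
    "volumeInfo": {
        "industryIdentifiers": [
            {"type": "OTHER", "identifier": "HARVARD:32044004331278"},
        ]
    },
}

# Trimmed stand-in for the same volume fetched directly by ID, with no identifiers.
DIRECT_LOOKUP = {"id": "CLGo1EU8ZlgC", "volumeInfo": {}}

if __name__ == "__main__":
    print(industry_identifiers(SEARCH_HIT))
    print(industry_identifiers(DIRECT_LOOKUP))
```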
Anyway, my first guess was that the “HARVARD:X” was Harvard’s internal accession #, and this was a volume digitized from Harvard as a GBS partner. But maybe not: searching for that number in Harvard’s catalog’s advanced search as “Hollis #” gets 0 hits, and 32044004331278 is rather too big to be a bib/accession # in a typical ILS anyway. That “HARVARD:X” identifier presumably has something to do with Harvard as a scanning partner, but who knows if it has any useful meaning at all to the rest of us. (Anyone from Harvard or another partner library reading who has any idea?)