Google Scholar does not allow meta-search

Neither does Google’s ‘ordinary’ web search, for that matter.

Metalib, the Ex Libris federated (broadcast) search product, has for a long time supported a Google Scholar target, which I believe was accomplished by screen-scraping G.Scholar (because I can’t figure out any other way it could have worked).

As long as two years ago, I think, some Metalib customers had problems with this G.Scholar target, and when they contacted Google, Google basically told them “Well, that’s because we don’t allow federated search of Google Scholar. You are violating our terms of service. Our automated rate controls probably noticed all the traffic from your IP and cut you off as a bot, which is what we wanted them to do.”

So from that moment on, I expected that Metalib would eventually stop supporting G.Scholar. I’m surprised it took this long!

Finally today, Ex Libris sent out an email that says, among other things:

In brief, Google rejected the use of these services via MetaSearch, since their policy is to reject automated queries. We understand that Google’s technology may be identifying MetaSearch as an automated query, which is identified (and blocked) by IP addresses. We contacted Google to ask about one of these services (Google Scholar) and were informed that Google does not permit use of these services via MetaSearch.

We always respect the decisions and policies of the search engines that act as MetaLib resources. We will therefore de-activate resources that are not supported when we become aware of it, and ensure that all new resources are supported before we add them.

Ex Libris customers are welcome to ask Google to change its policy. If Google decides to support MetaSearch engines for a particular service at any time, we will be happy to re-activate the relevant resource.

We will also endeavor to reach a positive outcome with Google on this issue.

I wouldn’t hold my breath for a “positive outcome”, myself. I think Google is pretty clear amongst themselves that it’s not in their business interests to allow someone else to present Google (Scholar) results ‘intermingled’ with other people’s results, which is exactly what meta-search does. In most cases, they’ve decided that it’s not even in their business interests to allow external software to search their resources, even if it presents them in a unitary and non-commingled-with-others way. Note that they have very specifically not provided any API for Google Scholar, although many people have asked for one.

One exception is the Google Books service. Now, at first Google provided a ‘javascript’ Google Books API which would allow you to sort of kind of embed Google results in your application, but only if the actual requests to Google were coming from the individual browser via AJAX.  If you tried using the javascript API server-side, their rate limiting software would soon notice and cut you off. (They have a similar javascript API for ordinary Google web search results too, I think.)

However, later they provided a “Data API” for Google Books, which you are explicitly allowed to call from server-side applications. This is actually awesome: it lets us do a lot more (I’m using it in Umlaut), and I’m so pleased they did this, especially because they have NOT done this for any other Google search. Note though that even the Google Books Data API terms of service (to my reading) prevent you from intermingling Google Books result lists with other services’ results, so Metalib _still_ couldn’t do its thing on GBS.

Google Scholar is a great resource, and I wish we could include it in our meta-search tools, but Google has decided it’s not in their business interests to allow this. It’s a good reminder that, yes, Google does have business interests, and, yes, they act in them, even when it results in things we don’t like. It’s refreshing that for once, if local staff asks “why can’t Google Scholar be in our metasearch product”, I don’t have to blame the metasearch vendor. I can say “Because Google will not allow it”, and in the process help teach that Google is not some utopian charity after all.

12 thoughts on “Google Scholar does not allow meta-search”

  1. Serials Solutions 360 Search stopped supporting Google searches for the same reasons. From Serials Solutions Support Center:
    “Google’s Terms of Use state that any federated search engine, such as 360 Search or WebFeat, is not allowed to display results from Google properties. In order to satisfy Google’s terms, Serials Solutions terminated connections to Google content in both 360 Search and WebFeat in mid-2008.”

  2. I’m honestly surprised folks have been able to query Scholar for as long as they have. Because of LibraryFind, I’ve been talking to the Scholar folks on and off for 2 years about supporting an API, and have been told a number of times that this just isn’t something that they have an interest in.

    –TR

  3. Andrew, I’m curious if you know _when_ SerSol removed Google/Google Scholar, compared to when EL did (just now, or not quite yet even).

    Oh wait, I see your quote said mid 2008. Okay then.

  4. Well, I do not think that withholding a Scholar API is Google’s own decision. Google depends on the goodwill of commercial journal publishers to get their “raw” data, and those publishers have their own commercial interests (Chemical Abstracts, Scopus …) which would be endangered by a free Scholar API. So to get the data, Google simply has to respect their conditions. I am afraid the Google Scholar API is still far away.

  5. I was wondering whether it is possible to use the Mendeley search API for citation data. You would need to access the reference list of all the papers in their database to do this. But I think that journal copyright issues could prevent this. Any ideas on this?

  6. Thanks Tom. The OpenURL interface is not the kind of API I’m talking about here, which would allow software to do fielded keyword searches and get back structured machine-interpretable data.

    Web of Science/Knowledge offers several other APIs, which in combination can sometimes be used for the sorts of use cases we’re talking about here. Sadly, they exist in a state of under-documentation, under-awareness, and under-support/maintenance. Still, I have used them effectively in the past for certain use cases with Umlaut.
