So, I’ve found out about a couple new things from Google I hadn’t known about. (Google is such a prominent player in our space, we need to keep up with what’s going on there so we know how to exploit it to maximum effect. I need to remember to go explore google’s interfaces and documentation more regularly to see changes). 1. Google search API now allows server-side access. 2. Google search allows limit on usage license. And both these things got me started about open access discoverability again.
1. Google API allows server-side access!
Thanks to Kent Fitch for alerting us on the code4lib listserv.
“An area to pay special attention to relates to correctly identifying yourself in your requests. Applications MUST always include a valid and accurate http referer header in their requests. In addition, we ask, but do not require, that each request contains a valid API Key.”
Now if only they’d do the same thing for the Google Books Discoverability api. That’s where I really need it; it’s still not clear to me how I might usefully incorporate automated general google search (including google scholar) into my library applications dealing with scholarly materials, because of the high chance that what Google returns will be for-pay and not available to my users: I don’t want to show them that.
So it was with interest I noticed a new feature:
2. Google search supports usage rights limit
Take a look at the Google advanced search page. Click on “
Unfortunately, some initial test searches revealed that this is a tiny piece of the actual open access pie. Many scholarly materials that ARE available online open access are not in fact in Google’s indexes. Probably because they don’t advertise it properly in a machine-readable way? Still, this is a great step by Google, and indicates that Google recognizes users are increasingly having trouble with getting too much restricted content in their google search results.
But my frustration remains with the scholarly open access community. If the problem is that open access repositories aren’t advertising CC licenses properly–why aren’t these software packages (many of them open source) being fixed? Why isn’t there general concerted funded effort from the open access repository community to solve this general problem: And the general problem is there’s no good place to search aggregated open access content and ONLY open access content. To use in software that wants to answer the question “Is there an open access version of the article with this title and author available?” No good way to do it. And this lack of discoverability is a huge problem with the utility of the existing open access repository domain. I don’t understand why there isn’t more concerted effort to solve it.
Although, in fairness, I did recently become aware of a European initiative, that’s apparently actually funded, to address at least part of this issue. Registering in machine readable format whether content is open access is the first step to building aggregated indexes. (It’s a dirty secret of the ‘open access repository’ domain that much of the content in so-called “open access repositories” is not in fact open access at all, it’s behind IP and password based restrictions. A cursory sample of items in repositories listed in the OpenDOAR–whose collection policies say that a reason for EXCLUSION from OpenDOAR is “Site requires login to access any material (gated access) – even if freely offered”–will reveal that that collection policy is quite often honored in the breach. Although I guess DOAJ has less of a problem with that, and that SPARC/DOAJ initiative is just about DOAJ, so it’s not clear to me that the SPARC project will really address my problem. I guess the SPARC project is about people not being sure if they can re-use material in DOAJ journals—my problem is being able to do a meta-search limited to publically available open access content in the first place, and I don’t care if it’s licensed for re-use, I just want to find only stuff that is actually viewable online for free!
Hmph. What can we do to get the open repositories communities to take note of this problem and address resources toward it?