OAISter -> points to plenty of non open access stuff

So I had been operating under the incorrect assumption that OAIster only aggregated feeds that claimed to contain open access materials.

After embarrassingly sending them a letter (and cc’ing code4lib) asking for clarification, I noticed their collection development policy page. (Embarrassing because I should have checked first.) It says:


  • We harvest and retain all records that point to digital resources.
  • This includes freely-available and restricted-access digital resources.

While this has apparently always been their policy, until recently the vast majority of what they held seemed to be open access, so many of us didn’t notice the restricted, for-pay material in there. In the past six months to a year, though, they seem to have added a bunch of feeds with large amounts of restricted, for-pay material, so it’s now not uncommon to run into it.

It still took me until just now to realize what was going on, and that I can’t in fact use OAIster as a reliable search engine for public access content. I bet many of you reading this haven’t realized it yet either, which is why I’m pointing it out.

But this is highly unfortunate. I thought I could use OAIster as a search engine covering a large swath of the public-access scholarly web. What I really need is a source I can search with a known title and author to find out, reliably, whether the article is available public access somewhere. (Yeah, you could just search Google, but I’m using this in software that wants to decide what to do with a result based on whether it’s public access! I don’t want to present my users with links they can’t access!) I thought that was OAIster, but it’s not.
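To make that decision concrete, here is a minimal Python sketch of the check a link resolver like Umlaut might run, assuming a hypothetical aggregator whose hits carry a trustworthy open/restricted flag (the very thing OAIster can’t promise). The `Candidate` class, the `normalize` and `open_links` helpers, and the URLs are made up for illustration. The one deliberate design choice: a hit whose access status is unknown is treated exactly like a restricted one, so users never see a link they might not be able to follow.

```python
import re
import unicodedata
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    """One search hit from a hypothetical open-access aggregator."""
    title: str
    authors: List[str]
    url: str
    open_access: Optional[bool]  # None: the source never said


def normalize(text: str) -> str:
    """Strip accents, lowercase, turn punctuation into spaces, collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def open_links(title: str, author_surname: str, hits: List[Candidate]) -> List[str]:
    """URLs that are safe to show: same title, same author, explicitly open."""
    want_title = normalize(title)
    want_author = normalize(author_surname)
    links = []
    for hit in hits:
        if normalize(hit.title) != want_title:
            continue
        if not any(want_author in normalize(a).split() for a in hit.authors):
            continue
        if hit.open_access is True:  # unknown (None) is treated like restricted
            links.append(hit.url)
    return links
```

With a trustworthy flag the resolver logic really is this simple; the hard part, as above, is finding a source that can supply the flag at all.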

So is there anything else that will do that for me? I don’t think so. It’s a gaping hole. We need someone to create an OAI-PMH aggregator that, unlike OAIster, will only take feeds of public access content.
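As a rough illustration of what such an aggregator might do differently, here is a Python sketch of the harvesting side, assuming it keeps a whitelist of repositories (or individual OAI-PMH sets) whose operators have declared everything in them open access. `verb=ListRecords`, `metadataPrefix=oai_dc`, `set`, and the exclusive `resumptionToken` argument come from the OAI-PMH protocol; the base URLs, the `open_access` set name, and the function names are hypothetical. It only builds request URLs and leaves out the HTTP fetching and XML parsing.

```python
from typing import Dict, List, Optional, Tuple
from urllib.parse import urlencode

# Hypothetical whitelist: only sources whose operators say everything exposed
# there is open access. None means "harvest the whole repository".
OPEN_ACCESS_SOURCES: List[Tuple[str, Optional[str]]] = [
    ("https://repo.example.edu/oai", None),
    ("https://ir.example.org/cgi/oai2", "open_access"),
]


def list_records_url(base_url: str, set_spec: Optional[str] = None,
                     resumption_token: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request for Dublin Core records."""
    if resumption_token is not None:
        # resumptionToken is an exclusive argument: send only it and the verb.
        params: Dict[str, str] = {"verb": "ListRecords",
                                  "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
        if set_spec is not None:
            params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def initial_requests() -> List[str]:
    """First request for each whitelisted source; nothing else is harvested."""
    return [list_records_url(base, set_spec) for base, set_spec in OPEN_ACCESS_SOURCES]
```

A real harvester would send each initial request, then keep calling `list_records_url(base, resumption_token=...)` with the token from each response until the repository stops returning one. The point of the design is that restricted material never enters the index, rather than being filtered out afterwards.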

OAIster responds

To my letter, sent without appropriate research:

Hi Jonathan,

We have always included more than open access repositories. You can find our
collection development policy at:
We developed that policy about a year ago because of questions just like

We fully understand the need for an aggregator to only OA materials.
However, we are currently in the midst of infrastructure and hosting changes
for OAIster, and are not able to undertake that at the moment. We would hope
we can achieve this at some point. In the meantime, have you checked out
OpenDOAR? They have a service akin to what you're looking for, and they do
provide full-text searching.

Please let me know if you have further questions.
 -Kat [Hagedorn]

To which I say:

Glad they are considering it, and thanks to them for the pointer to DOAJ; it looks like that’s what I _really_ needed when I was using OAIster instead. DOAJ claims to offer an XML API (I wish it were just SRU instead). Guess I’m off to investigate it and try to implement a DOAJ query in Umlaut.


Their API only covers repositories, not article-level metadata. They have a custom Google search of article-level metadata, but we all know Google has no APIs anymore. Drat drat drat. So I’m back to having NO available option for searching a large swath of the public access scholarly internet via an API. This is a big problem.

Hmm, I am feeling awfully stymied.


5 thoughts on “OAISter -> points to plenty of non open access stuff”

  1. Jonathan –

    As someone who’s worked with OAI-PMH-based aggregations a fair amount, I can say that one of the main problems with trying to weed restricted material out of an OAIster-style aggregation is that metadata providers either don’t use standard language to indicate that access is restricted, or don’t indicate it at all.

    You could try to weed this out on a repository-by-repository or set-by-set basis, but that’s not always possible (it depends on how the OAI data provider has been set up), and in some cases the restricted material is mixed in with the open material (as in the IR I’m responsible for at UIUC: we don’t have much restricted-access material, but there’s some, and it isn’t necessarily separated off in a specific collection).

    It is very frustrating – we’d love to do fancy things with the metadata that we got via OAI PMH on some of our projects at UIUC, but the metadata just doesn’t let you do it without an awful lot of fussing.

    Sarah Shreeves

  2. I guess I’m imagining an aggregator that only took feeds/sources that could say they’d be all (or even, say, 90%) open access. You’d definitely be leaving out a lot of content in sources that couldn’t provide such a feed, but it would be worth it.

    But yeah, the lack of anything beyond rudimentary, unreliable metadata from most sources is a barrier to doing anything too fancy with it. But if I just had a feed that I could be pretty confident was almost all open access, I could do some incredibly useful things for discovery. I think it’s totally doable to build something useful; it just needs some resources behind it.

    Hopefully, as time went on, more sources would be able to at least separate their OA from their non-OA content in the feed and provide an open access OAI-PMH collection. After all, the source already has to _know_ whether a given item will be displayed to the public; that information is recorded somehow, so building your OAI-PMH collection(s) from it is not a technical challenge for the software developers.

  3. Hi

    Good post. I agree we need an OAIster that only searches across open access content. I am probably one of many who were not aware of their policy and presumed it was a service for searching open content.

    On a different note, I’ve heard people on the jisc-repository mailing list talk about the disadvantages of OAI-PMH. My (basic) understanding is that OAI-PMH concentrates on describing records, not the actual full-text content.

    I can see what they mean. Looking at this example

    (you may need to use ‘view source’ to see the actual OAI-PMH XML)

    You can see that there are two relations. One happens to be a copy of the item (full text) within the repository that, for whatever reason, is only accessible to the repository admin (which happens to be me). The other relation is a DOI URL that will take you to the publisher’s copy of the article.

    Now, I know nothing about OAI-PMH, but I can see nothing in the code that states these facts, i.e. that one is an external link and one is not publicly available. That doesn’t seem like a good thing to me.

    Chris Keene
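The metadata problems raised in these comments can be sketched in Python. This is a deliberately naive classifier, assuming access status can only be guessed from `dc:rights` text in `oai_dc` records: the keyword patterns, the 0.9 threshold (borrowed from the “say, 90%” figure above, here checked against a feed’s own records rather than taken on the source’s word), and all names are hypothetical, since, as Sarah notes, there is no standard vocabulary to match against. The sample record is modeled loosely on the one Chris describes: two `dc:relation` links and no rights statement, so nothing in it says which link is public.

```python
import re
import xml.etree.ElementTree as ET
from typing import List

# Namespace URIs used by oai_dc records.
NS = {
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical patterns: repositories phrase rights statements however they like.
# Deliberately naive; e.g. "not open access" would still be read as open.
RESTRICTED = re.compile(r"\brestricted\b|\bsubscription\b|\bcampus only\b|\bembargo(ed)?\b")
OPEN = re.compile(r"\bopen access\b|creativecommons\.org|\bcc[ -]by\b|\bpublic domain\b")


def classify(oai_dc_xml: str) -> str:
    """Guess 'open', 'restricted' or 'unknown' from dc:rights text only."""
    root = ET.fromstring(oai_dc_xml)
    rights = " ".join((el.text or "") for el in root.findall("dc:rights", NS)).lower()
    if RESTRICTED.search(rights):
        return "restricted"
    if OPEN.search(rights):
        return "open"
    # dc:relation links are ignored: nothing in them says whether the target
    # is a public copy, an admin-only copy, or a publisher DOI.
    return "unknown"


def accept_source(labels: List[str], threshold: float = 0.9) -> bool:
    """Accept a feed only if at least `threshold` of its records look open."""
    if not labels:
        return False
    return labels.count("open") / len(labels) >= threshold


# Sample record with two relations and no rights statement (URLs are made up).
RECORD = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>An example article</dc:title>
  <dc:relation>https://repository.example.ac.uk/1234/1/article.pdf</dc:relation>
  <dc:relation>https://doi.org/10.1234/example</dc:relation>
</oai_dc:dc>"""

if __name__ == "__main__":
    print(classify(RECORD))  # unknown
```

Running it prints `unknown` for the sample record, and `accept_source` rejects any feed where fewer than 90% of records carry a recognizably open rights statement. A feed whose rights go unstated fails no matter how open it actually is, which is the practical cost of the missing metadata.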
