I had previously known about TicTocs, a nice aggregator of journals’ latest-article RSS feeds. The aggregated feeds themselves are public access, but the articles they link to may not be. This put a damper on my thoughts of using it to power a service for my users: I don’t want to provide a service unless I can send users only to licensed copies (or public access ones, but there’s no way to tell from the RSS feed whether an article will be public access). To make matters worse, sometimes the publisher-provided copy linked to in the RSS won’t be licensed by our institution, but our institution will have access to the article from an aggregator or other alternate platform. So not only would I be sending users to something they couldn’t access, but in those cases there’s a link where they could get the article, which I wouldn’t be sending them to!
So I was reading this blog entry from Dave Pattern where he mentions a “JournalTocs”. At first I thought this was the old TicTocs, but with a new name. But it would seem to be a different service. TicTocs is, I think, hosted by JISC, while “JournalTOCs is an initiative of the ICBL at Heriot-Watt University and is being managed by Santy Chumbe.”
JournalTocs often seems to get actual structured citation information out of the RSS feeds, including year, volume, issue, start page, end page, and DOI. Different feeds have different structured data available. Sometimes there’s a DOI, sometimes there isn’t (obviously sometimes the article may not have a DOI; surely other times it does but JournalTocs doesn’t succeed in sniffing it from the RSS feed). Sometimes there’s vol/issue/page, sometimes only a subset of those, sometimes nothing.
In at least some cases, JournalTocs would seem to be taking structured information from the original publisher feed, which included structured citation information using DC or prism namespaced elements. (I am not familiar with ‘prism’ or where it came from.) I am not sure whether in other cases JournalTocs is ‘sniffing’ the data in other ways.
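To make the DC/prism idea concrete, here is a rough sketch of pulling citation fields out of namespaced elements in a feed. The sample feed is invented for illustration and is not real JournalTocs output. I am also assuming an RSS 2.0 layout, a `doi:` prefix in `dc:identifier`, and the PRISM 2.0 namespace URI; real feeds may use RSS 1.0 or a different PRISM version, and the element names would need adjusting accordingly.

```python
import xml.etree.ElementTree as ET

# Namespace URIs. The DC one is standard; the PRISM one varies by version,
# so treating every feed as PRISM 2.0 is an assumption.
NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "prism": "http://prismstandard.org/namespaces/basic/2.0/",
}

# Invented sample data, NOT real JournalTocs output.
SAMPLE_FEED = """<rss version="2.0"
     xmlns:dc="http://purl.org/dc/elements/1.1/"
     xmlns:prism="http://prismstandard.org/namespaces/basic/2.0/">
  <channel>
    <title>Example Journal</title>
    <item>
      <title>An example article</title>
      <link>http://example.org/article/1</link>
      <dc:identifier>doi:10.1234/example.1</dc:identifier>
      <prism:volume>12</prism:volume>
      <prism:number>3</prism:number>
      <prism:startingPage>45</prism:startingPage>
      <prism:endingPage>67</prism:endingPage>
      <prism:publicationDate>2010-02-01</prism:publicationDate>
    </item>
    <item>
      <title>An article with no structured data</title>
      <link>http://example.org/article/2</link>
    </item>
  </channel>
</rss>"""


def extract_citation(item):
    """Return a dict of citation fields for one <item>; missing fields are None."""
    def get(path):
        value = item.findtext(path, default=None, namespaces=NS)
        return value.strip() if value and value.strip() else None

    identifier = get("dc:identifier")
    doi = None
    if identifier and identifier.lower().startswith("doi:"):
        doi = identifier[len("doi:"):]

    date = get("prism:publicationDate") or get("dc:date")
    return {
        "title": get("title"),
        "doi": doi,
        "date": date,
        "year": date[:4] if date else None,
        "volume": get("prism:volume"),
        "issue": get("prism:number"),
        "spage": get("prism:startingPage"),
        "epage": get("prism:endingPage"),
    }


def parse_feed(xml_text):
    """Parse feed XML (assumed RSS 2.0) into a list of citation dicts."""
    root = ET.fromstring(xml_text)
    return [extract_citation(item) for item in root.iter("item")]


if __name__ == "__main__":
    for citation in parse_feed(SAMPLE_FEED):
        print(citation)
```

The second sample item shows the "sometimes nothing" case: every structured field comes back as `None`, which is what downstream code would have to cope with.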
JournalTocs has some basic APIs (returning RSS feeds), including the ability to get an RSS feed by ISSN from JournalTocs itself, instead of from the original publisher. I like this, to the extent that JournalTocs may be sniffing non-structured data and then structuring it, or otherwise normalizing the publishers’ feeds. Here’s the example on their documentation page.
What I’m not sure of is how many of the feeds from JournalTocs are going to have the structured data minimally necessary to create a good OpenURL link to my link resolver: either a DOI, or year/volume/issue/start-page. Because that’s really my goal here: to be able to use this service in my own services, but sending users to my own link resolver for a locally licensed copy, or, barring that, an ILL form.
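As a sketch of what “minimally necessary” could mean in code, the following checks for either a DOI or the full year/volume/issue/start-page set, and only then builds an OpenURL 1.0 (Z39.88-2004) link in key/encoded-value form. The resolver base URL, the dict keys, and the function names are all hypothetical; a real link resolver may want different or additional parameters.

```python
from urllib.parse import urlencode

# Hypothetical link resolver base URL; substitute your institution's own.
RESOLVER_BASE = "http://resolver.example.edu/openurl"


def has_minimum_metadata(citation):
    """True if there is a DOI, or all of year/volume/issue/start page."""
    if citation.get("doi"):
        return True
    return all(citation.get(k) for k in ("year", "volume", "issue", "spage"))


def build_openurl(citation, issn, base=RESOLVER_BASE):
    """Return an OpenURL string, or None if the citation is too sparse."""
    if not has_minimum_metadata(citation):
        return None

    params = [
        ("url_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
        ("rft.genre", "article"),
        ("rft.issn", issn),
    ]
    if citation.get("doi"):
        params.append(("rft_id", "info:doi/" + citation["doi"]))

    # Map the (assumed) citation dict keys onto OpenURL journal-format keys.
    key_map = (
        ("title", "rft.atitle"),
        ("year", "rft.date"),
        ("volume", "rft.volume"),
        ("issue", "rft.issue"),
        ("spage", "rft.spage"),
        ("epage", "rft.epage"),
    )
    for key, kev_name in key_map:
        if citation.get(key):
            params.append((kev_name, citation[key]))

    return base + "?" + urlencode(params)


if __name__ == "__main__":
    # Invented example citations.
    print(build_openurl({"doi": "10.1234/example.1"}, "1234-5678"))
    print(build_openurl({"year": "2010", "volume": "12"}, "1234-5678"))
```

Returning `None` for sparse citations leaves it to the caller to decide whether to drop the article, fall back to the publisher link, or show it without a link at all.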
If I could do that, then I could do some cool stuff. Put a list of recent articles on my catalog detail pages, or Find It link resolver pages. (In fact, I’d probably do the former by making some kind of service in Find It and vending it to the catalog). Or give the users a way to have RSS/Atom feeds whose links took them through our institutional link resolver; or email notification of new articles from a journal, etc.
Some day I’ll have time to work on that; it seems like a pretty good project. When/if I do, I’ll email the JournalTocs folks to find out more about what they’re doing, and how often I can expect to find sufficient structured data to create an OpenURL. I’m also curious what level of institutional support this project has, and how sustainable it is likely to be, or whether it might disappear soon. If anyone has any more info to share, though, please do.
One oddity of the JournalTocs recent article API feed (or at least the one in their example?) is that it returns a feed sorted alphabetically. I’d really want a feed sorted by publication date, and ideally by page number within the same publication date. But if publication date and/or vol/issue/page are in the structured data, my own software could always re-sort the JournalTocs feed itself before doing anything else with it.
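A minimal sketch of that client-side re-sort, assuming each entry has already been parsed into a dict with an ISO 8601 `date` string and a `spage` string (both key names are my own invention). It puts the newest dates first, orders by start page within a date, and pushes entries with missing or non-numeric values to the end of their group.

```python
def page_key(entry):
    """Numeric start pages sort first, in order; anything else sorts last."""
    spage = entry.get("spage") or ""
    if spage.isdigit():
        return (0, int(spage))
    return (1, 0)


def sort_entries(entries):
    """Newest publication date first, then ascending start page."""
    # Python's sort is stable, so sort by the secondary key first...
    by_page = sorted(entries, key=page_key)
    # ...then by date, descending. ISO dates compare correctly as strings;
    # entries with no date get "" and therefore end up last.
    return sorted(by_page, key=lambda e: e.get("date") or "", reverse=True)


if __name__ == "__main__":
    # Invented entries, as if they had arrived in alphabetical order.
    entries = [
        {"title": "A", "date": "2010-02-01", "spage": "10"},
        {"title": "B", "date": "2010-01-01", "spage": "50"},
        {"title": "C", "date": "2010-02-01", "spage": "2"},
    ]
    for entry in sort_entries(entries):
        print(entry["title"], entry["date"], entry["spage"])
```

Converting the page to an integer matters: sorted as strings, page "10" would come before page "2".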
Possibly useful client code to write
- A Ruby gem that deals with JournalTocs with some ‘value added’. Give it an ISSN and it’ll look up the JournalTocs feed; provide facilities to translate RSS to Atom if you want; optionally add in an OpenURL context object (not sure of the best way to embed a context object in Atom or RSS; as an <html:span> element using COinS, maybe?); or optionally add in a complete HTTP OpenURL to a local link resolver (embed that in Atom as a <link rel=z3988> or something).
- A Rails engine-type plugin that puts some controller and view wrappers around that gem, so you can easily create a web app (returning RSS, Atom, HTML, etc) for those functions, or include such functionality in your own app.
- Have Umlaut use that plugin, and then write an Umlaut source adapter to add recent article information to the Umlaut responses (HTML and APIs), so it can be consumed by my catalog etc.
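For the COinS idea in the first item above, here is a small sketch of generating the span. COinS puts an OpenURL context object, in key/encoded-value form, into the `title` attribute of an empty `<span class="Z3988">`. The function name and the citation dict keys are hypothetical, and whether feed readers would preserve such a span inside Atom or RSS content is exactly the open question.

```python
from html import escape
from urllib.parse import urlencode


def coins_span(citation, issn):
    """Return a COinS <span> for a journal article citation dict."""
    params = [
        ("ctx_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:journal"),
        ("rft.genre", "article"),
        ("rft.issn", issn),
    ]
    if citation.get("doi"):
        params.append(("rft_id", "info:doi/" + citation["doi"]))
    # Assumed dict keys mapped to OpenURL journal-format keys.
    for key, kev_name in (("year", "rft.date"), ("volume", "rft.volume"),
                          ("issue", "rft.issue"), ("spage", "rft.spage")):
        if citation.get(key):
            params.append((kev_name, citation[key]))

    # The query string goes inside an HTML attribute, so "&" must become "&amp;".
    title = escape(urlencode(params), quote=True)
    return '<span class="Z3988" title="%s"></span>' % title


if __name__ == "__main__":
    # Invented citation data.
    print(coins_span({"doi": "10.1234/example.1", "volume": "12"}, "1234-5678"))
```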