Bibliographic Wilderness

improving citation export in a sufia/hyrax app


Our app is based on Sufia 7, but I believe the relevant parts are mostly true for Hyrax as well; if I know they aren’t, I’ll try to make a note of it.

The out-of-the-box Sufia app offers three citation export links on an individual item (i.e. Work) page, for: Endnote, Zotero, and Mendeley.

The Zotero and Mendeley links just take you to a page that says:

Exporting to Zotero[Mendeley] is supported via embedded metadata. If Zotero[Mendeley] does not automatically pick up metadata for deposited files, please report the issue via the <%= link_to 'Contact Form', sufia.contact_form_index_path %>.

I believe the automatic metadata pickup is supposed to be via COinS.  Putting aside that this is a bit of a weird UX, Zotero’s “Save to Zotero” button did do something with “Embedded Metadata”, but didn’t pick up all the metadata we’d want. I think this is because we hadn’t properly configured all our local custom metadata fields to work with COinS, which I believe is done via Rails i18n in Sufia, and by a different mechanism in Hyrax.

I didn’t get to the bottom of this, because either way, COinS isn’t really granular/specific enough to represent all of our metadata as well as a reference management application could use it: there’s no way to say type “Manuscript”, or to provide archival arrangement/location (box/folder).  I’m not sure if there’s a way to send abstract or subject/keywords (which users appreciate having included in their export to a reference manager, even though they aren’t part of a citation), and the link I used to use to check what fields are available in standard OpenURL metadata (on which COinS is based) is giving me 404 errors from OCLC today.  Oh, and did I mention that COinS (if not OpenURL itself) is kind of an abandonware standard? The site that documents the standard is currently only available in the Internet Archive Wayback Machine.
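For context, a COinS record is just an OpenURL key/value string tucked into the title attribute of an empty span. Here is a minimal sketch of what that looks like, assuming the Dublin Core key/value profile; the helper name `coins_span` and the sample values are made up for illustration, and this is not how Sufia itself generates it:

```ruby
require "uri"
require "cgi"

# Hypothetical helper: build a COinS <span> from a hash of Dublin Core fields.
# Keys become "rft.<key>" OpenURL parameters; array values repeat the key.
def coins_span(fields)
  pairs = [
    ["ctx_ver", "Z39.88-2004"],                  # OpenURL version marker
    ["rft_val_fmt", "info:ofi/fmt:kev:mtx:dc"]   # Dublin Core metadata format
  ]
  fields.each do |key, values|
    Array(values).each { |value| pairs << ["rft.#{key}", value] }
  end
  # URL-encode the pairs, then HTML-escape the result for use in an attribute.
  title = CGI.escapeHTML(URI.encode_www_form(pairs))
  %(<span class="Z3988" title="#{title}"></span>)
end

puts coins_span(title: "Lab notebook", creator: ["Doe, Jane"], type: "Text")
```

Everything has to fit into that flat list of generic keys, which is why there is nowhere obvious to put something like a box/folder location.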

The EndNote export was also not including all of our possible metadata as well as it could. I’m not sure where I’d customize this for our local fields; perhaps I need to override the Sufia::SolrDocument::Export class, but I’m not really sure what’s going on there. Looking at that class suggests that the format it’s calling “EndNote” is this one, which I think is now more commonly called “Endnote Tagged Format” (although I can’t find a reference for that), as distinct from Endnote XML, which I’m also having trouble finding documentation for.

Rather than trying to get each of these existing logic paths working, we decided to initially replace them with…

Replace with RIS for everyone

RIS is the closest thing to a “lingua franca” among reference management software. While it is also an abandoned standard (wikipedia links to this capture), pretty much every reference management software can handle it, and in fairly compatible/standard ways — I think mainly due to every new reference management software trying to be compatible with the current market leader at the point it was introduced, all the way back to the no-longer-existing software that originated RIS.

For the same reasons, it seems to be relatively close to the internal data models of most reference management software.  It’s annoying in some ways, including (did we mention) that it’s an unmaintained abandonware standard, there are (undocumented) minor differences between how different software handles it on import, and the same ‘tag’ in RIS can be interpreted differently depending on the ‘type’ of the reference. (Oh, and there’s a limited number of ‘types’, not suitable to the full diversity of the modern digital archive, or even for all types found in modern reference management software!).

But it’s way more expressive than COinS, and close to as expressive as Endnote Tagged Format (probably just as good for the actual metadata we have), and there’s not much better.

And it’s super convenient to be able to write one export which will work with all reference management software, rather than spend extra time (we can’t necessarily afford) to do a custom export for every possible software (and over the past decade the “popular” software has changed several times, and may vary in different disciplines — but they all do RIS).
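To make the format concrete: an RIS record is a list of lines, each a two-character tag, two spaces, a hyphen, a space, and a value, opened by a `TY` (type) line and closed by an `ER` line. A minimal sketch of a writer follows; the method names and sample data are invented for illustration, and this is not the serializer from our app:

```ruby
# Format one RIS line: two-character tag, two spaces, hyphen, space, value.
def ris_line(tag, value)
  "#{tag}  - #{value}"
end

# fields: array of [tag, value-or-array-of-values] pairs, in output order.
def to_ris(type, fields)
  lines = [ris_line("TY", type)]            # TY (reference type) comes first
  fields.each do |tag, values|
    Array(values).each do |value|           # repeatable tags get one line each
      next if value.to_s.strip.empty?       # skip nil and blank values
      lines << ris_line(tag, value)
    end
  end
  lines << ris_line("ER", "")               # ER closes the record
  lines.join("\r\n") + "\r\n"
end

record = to_ris("MANSCPT", [
  ["TI", "Letter to a colleague"],
  ["AU", ["Doe, Jane"]],
  ["PY", "1923"],
  ["KW", ["chemistry", "correspondence"]],
  ["AV", "Box 4, Folder 12"],               # AV: location in archives
  ["AB", nil]                               # no abstract: line is omitted
])
puts record
```

Note that this gives us exactly the things COinS could not: a “Manuscript” type and an archival location, plus keywords.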

When I asked in the Zotero forums (the Zotero people are great and tend to understand the ecosystem way beyond just their own software, as domain experts in a way many of us aren’t) if there was a better format to use for a ‘generic’ import to multiple reference management systems, or even a better format to use just for Zotero, @adamsmith replied:

There is indeed no useful bibliographic exchange format. It’s a fairly ridiculous situation. You’ll get the best import into Zotero using Zotero RDF, but a) that isn’t well documented and b) it’ll probably be replaced with a JSON-LD/ based schema in the not-too-distant future, so I wouldn’t invest heavily in implementing it. Endnote XML is marginally better documented and, by virtue of being XML, more robust, so that might be worth it. BibLaTeX is very precise and exceedingly well documented, but I don’t think many tools other than Zotero do very well importing it (and I don’t know _how_ well Zotero does — most people use this the other way from Zotero to BibLaTeX).

(EndNote XML didn’t look to me significantly more powerful or convenient than RIS for the sorts of data we have, although it’s more straightforward in some ways. Not sure if it has as universal adoption).

In general, if you download an RIS file and double-click on it, it will open in your installed reference software of choice (or, depending on your browser and browser preferences, as in Firefox, the download may open immediately in your reference manager without you having to find the file and double-click on it). If you have the Zotero Chrome extension installed, it will (at first ask to) “intercept” an RIS download (with the proper MIME/IANA content-type header) and immediately send it to Zotero, even though Chrome doesn’t ordinarily do that.

So, rather than figure out how the current Sufia citation export stuff worked to make it work better for us and/or try to improve or expand it, we decided to try replacing the built-in stuff with our own RIS implementation.

Our implementation

I basically just created a ruby class that can take one of our Sufia ‘work’ models, and translate it to RIS — not really all that hard.  Thinking of working towards something shareable, I did split my implementation into a base class that sets up some tools for defining mappings, and a concrete sub-class that defines the mappings.
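The base-class/sub-class split can be sketched roughly like this. All names here (`RisSerializerBase`, `WorkRisSerializer`, `ris_tag`, and the `Work` struct standing in for a Sufia work model) are hypothetical stand-ins for illustration, not the actual classes in our app:

```ruby
# Base class: provides a small class-level DSL for declaring tag mappings.
class RisSerializerBase
  # Each subclass keeps its own list of [tag, extractor] mappings.
  def self.mappings
    @mappings ||= []
  end

  # Map an RIS tag to a model attribute name, or to a block for custom logic.
  def self.ris_tag(tag, attribute = nil, &block)
    extractor = block || ->(model) { model.public_send(attribute) }
    mappings << [tag, extractor]
  end

  def initialize(model)
    @model = model
  end

  def ris_type
    "GEN" # generic type; subclasses can override
  end

  def to_ris
    lines = ["TY  - #{ris_type}"]
    self.class.mappings.each do |tag, extractor|
      Array(extractor.call(@model)).each do |value|
        lines << "#{tag}  - #{value}" unless value.to_s.strip.empty?
      end
    end
    lines << "ER  - "
    lines.join("\r\n") + "\r\n"
  end
end

# Stand-in for a work model (assumption: real models expose similar readers).
Work = Struct.new(:title, :creators, :date, :genre, keyword_init: true)

# Concrete subclass: only declares the mappings.
class WorkRisSerializer < RisSerializerBase
  ris_tag "TI", :title
  ris_tag "AU", :creators
  ris_tag("PY") { |work| work.date.to_s[0, 4] } # year only

  def ris_type
    @model.genre == "Manuscripts" ? "MANSCPT" : "GEN"
  end
end

work = Work.new(title: "Lab notebook", creators: ["Doe, Jane", "Roe, Richard"],
                date: "1923-05-01", genre: "Manuscripts")
puts WorkRisSerializer.new(work).to_ris
```

Keeping the mappings declarative in the sub-class is what would, in theory, make the base class shareable across projects with different local metadata.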

I originally intended to allow the mappings to look up attributes based on RDF predicates, which might theoretically make it possible to share mappings with a better chance of them working across projects. But I see now I never actually implemented that feature, oops. (And it’s unclear how/if this kind of rdf-predicate-to-model-attribute lookup would work in a valkyrie-based app like the planned Hyrax 3.0, or if it would be possible to make it work in a standard way.)

Then just register the RIS mime type; hook into a CurationConcerns method to have the work show method deliver the RIS using our serializer; generate an on-page link to that action in our already customized view; and that’s pretty much it.
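The part that matters to the browser and to Zotero is the response itself. In Rails the type can be registered in an initializer with `Mime::Type.register`; the sketch below instead shows the resulting response in plain Ruby, as a Rack-style status/headers/body triple, so it runs without Rails. The method name `ris_download_response` is made up for illustration, and `application/x-research-info-systems` is the type commonly used for RIS:

```ruby
RIS_MIME_TYPE = "application/x-research-info-systems"

# Hypothetical helper: wrap RIS text in a Rack-style response for download.
def ris_download_response(ris_text, basename)
  # Keep the filename to safe characters so it cannot break the header.
  safe_name = basename.gsub(/[^\w.\-]/, "_")
  headers = {
    "content-type" => RIS_MIME_TYPE,
    "content-disposition" => %(attachment; filename="#{safe_name}.ris"),
    "content-length" => ris_text.bytesize.to_s
  }
  [200, headers, [ris_text]]
end

status, headers, _body = ris_download_response("TY  - GEN\r\nER  - \r\n", "my work/1")
puts status
puts headers["content-disposition"]
```

The content-type header is what lets a browser extension recognize the download as a reference record, and the `.ris` filename is what lets the operating system hand a saved file to the right application.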

Some interesting parts:

You can take a look at the PR with initial implementation of this feature in our app if you like.  Reviewing it now, looks like that PR accidentally ended up with a new file that is unrelated, and really from a different feature, at `app/views/application/_query_constraint_as_form.html.erb`, oops sorry.  I see now too there is only a limited “smoke test” spec for “converts without raising any exceptions”, so it goes.

How did it turn out? Future improvement?

As I write this, we have just deployed to production, but earlier we did some user testing with several users in a feature demo.  In general, we found that users have pretty low expectations when it comes to citation export: they are used to it not working perfectly, and most users we asked found our system worked at least as well as they expect an automated reference export to work, and often better.  I feel good about the RIS direction as an efficient use of developer time to get a pretty decent citation export feature.

There are a couple of outstanding issues:

Child works

We have some things that are ‘works’ in Sufia, but are really excerpts from the “work” that should be cited in the reference.  At the moment, we sometimes have that ‘parent’ work stored in our Sufia repo and sometimes don’t.  Our RIS export feature never takes it into account, though: it always exports the citation as if it’s a standalone thing based on the title of the ‘work’ in Sufia, even if there’s really a parent ‘container’ work that the reference should be based on.  This is a bit hard to get right for both metadata reasons (we might not have sufficient machine-readable metadata in all cases to determine the correct citation) and technical reasons (Sufia doesn’t make it super easy to get access to parent information in an efficient/performant way).

Zotero toolbar button

If you actually click on the “Export citation” button, it generally gets into Zotero fine. (On Chrome, you need the Zotero plugin installed; on Firefox, you either need the plugin or need to tell Firefox the first time to open .ris files with Zotero.) But if you have the Zotero browser plugin installed, you also have a “Save to Zotero” button in the toolbar.  Using that one imports into Zotero as a “web page” (rather than the correct citation type for the reference; our users generally wanted reference types based on the original item, not ‘web page’), and with stunted/limited metadata.  (In our case Zotero is picking up “Embedded Metadata” from somewhere, I’m not sure in what format, and it was not intentional on my part; but without it the metadata would be no better.)

One of our test users tried this, and was disappointed.

Zotero supports a couple of generic options for getting the “Save to Zotero” button to pick up embedded metadata.  COinS, as mentioned, isn’t really expressive enough for our metadata. I’m not sure what they mean by “META tags”, but possibly that only applies to RDF? (And I would not be thrilled about figuring out the right RDF vocab for Zotero to pick up, and doing the translation.) That seems to leave unAPI, through which we could actually expose/re-use our now-existing RIS, great. unAPI is another kind-of-abandoned standard, and based on kind of a mis-use of HTML too, with possible accessibility concerns. ☹️  It wouldn’t be that hard to implement, but even easier would be if Zotero would just pick up HTML <link rel="alternate" type="type"> tags for Zotero-recognized types. Zotero doesn’t do that at present, but when I asked, there seemed to be some support for the idea of it doing so, with some details (as well as implementation!) to be worked out. (Also, can I say again I love how responsive Zotero devs are on the Zotero forums?)
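For what it’s worth, the tag I have in mind would be trivial to emit on each item page. A minimal sketch, where the helper name `alternate_link_tag` and the example path are hypothetical, and with the caveat already noted that Zotero does not act on this tag today:

```ruby
require "cgi"

# Hypothetical helper: build a <link rel="alternate"> tag advertising
# another representation of the current page (here, an RIS export).
def alternate_link_tag(href, type)
  %(<link rel="alternate" type="#{CGI.escapeHTML(type)}" href="#{CGI.escapeHTML(href)}">)
end

# The path below is an invented example, not a real route in our app.
puts alternate_link_tag("/works/abc123.ris", "application/x-research-info-systems")
```

The appeal is that this is ordinary, valid HTML in the document head, rather than metadata smuggled into attributes of empty elements in the body.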

Of course, if we had no “Export citation” button deployed, the “Save to Zotero” button provided by Zotero plugin would still be there, and still behave unsatisfactorily.

But Deployed

Based on consultation with potential users, we didn’t consider either of these problems severe enough to delay release of our RIS export button, although we’ve made a note of them as possible future improvements to prioritize.  You can see the RIS export feature in action in our current production system, on any individual item page, such as this one; look for the “Export citation” button.