some notes on the DOI environment

It took me a while to figure out how DOIs work in the ways that matter for my applications, both for assigning DOIs and for using DOIs out there in the world. I wrote the following in a thread on the Code4Lib listserv, and it seems worth sharing here too so it’s googlable. The overall DOI environment makes it a bit hard to see exactly what the options and trade-offs are, and what’s going on.

CrossRef

So there are various top-level DOI registrars that can register DOIs. (I don’t know if “registrar” is actually the term DOI uses for them; it’s what I’m calling them. Can any readers supply the correct terminology as used by DOI?)

For scholarly publications, CrossRef is the one typically used. But to register a DOI through CrossRef, a “publisher” needs a membership relationship with CrossRef, which comes with non-trivial expenses and obligations (such as being _required_ by CrossRef to look up DOIs for every work cited in a published work, and to include those DOIs in its citations). And I’m not entirely sure what tools CrossRef gives you for managing your DOIs.
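
To give a sense of what that citation lookup involves when it is automated rather than done by hand, here is a minimal Python sketch. It assumes CrossRef’s public REST API at https://api.crossref.org/works with a `query.bibliographic` parameter, none of which is mentioned in this post; the endpoint, the response shape (`message.items[].DOI` and `score`), and the `min_score` cutoff are all assumptions to check against CrossRef’s current documentation.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Assumed CrossRef REST API endpoint for searching works.
API = "https://api.crossref.org/works"


def lookup_url(citation: str) -> str:
    """Build a query asking for the single best match for a free-text citation."""
    return API + "?" + urllib.parse.urlencode({"query.bibliographic": citation, "rows": 1})


def best_doi(response: dict, min_score: float = 0.0) -> Optional[str]:
    """Return the top hit's DOI, or None if there is no hit or its score is below min_score."""
    items = response.get("message", {}).get("items", [])
    if not items or items[0].get("score", 0.0) < min_score:
        return None
    return items[0].get("DOI")


def find_doi(citation: str, min_score: float = 0.0) -> Optional[str]:
    """Look up one cited work (needs network access)."""
    with urllib.request.urlopen(lookup_url(citation)) as resp:
        return best_doi(json.load(resp), min_score)
```

As far as I know the score is a search relevance score rather than a probability, so any cutoff is a guess, and it is probably still wise to review matches by hand.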

I am not sure how an individual dude who wants a few DOIs would best go about getting them through CrossRef; it might be worth asking CrossRef if there is any feasible way to do this.

Any DOI registered through any registrar gets DOI forwarding resolution. That is, for instance, an HTTP request to http://dx.doi.org/something gets redirected to your destination URL.

Metadata lookup

When you register a DOI through CrossRef, you also get a ‘metadata lookup’ service, where a person or machine armed with a DOI can look up article metadata such as author, title, and journal from a CrossRef service. This is not part of the central DOI architecture, but something provided by CrossRef. Some library-sector software (such as SFX) is set up to do these metadata lookups from CrossRef for any DOI; if the DOI wasn’t registered through CrossRef, they won’t get any info back.
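
One way a client can do this kind of lookup, which CrossRef was starting to offer on the DOI proxy itself around the time of this post, is HTTP content negotiation: request the DOI URL with an `Accept` header naming a metadata format. A minimal Python sketch; the CSL JSON media type and the `author`, `title`, and `container-title` field names are assumptions about the current service, not something this post describes.

```python
import json
import urllib.request

# Assumed media type for CSL JSON, one of several formats the DOI proxy can negotiate.
CSL_JSON = "application/vnd.citationstyles.csl+json"


def metadata_request(doi: str, accept: str = CSL_JSON) -> urllib.request.Request:
    """Build (but do not send) a content-negotiation request against the DOI proxy."""
    return urllib.request.Request("https://doi.org/" + doi, headers={"Accept": accept})


def summarize(csl: dict) -> dict:
    """Pull title, author names, and journal out of a CSL JSON record."""
    authors = [
        " ".join(part for part in (a.get("given"), a.get("family")) if part)
        for a in csl.get("author", [])
    ]
    return {"title": csl.get("title"), "authors": authors, "journal": csl.get("container-title")}


def lookup(doi: str) -> dict:
    """Fetch and summarize metadata for a DOI (needs network access)."""
    # urlopen follows the proxy's redirect; urllib keeps the Accept header on the new request.
    with urllib.request.urlopen(metadata_request(doi)) as resp:
        return summarize(json.load(resp))
```

The parsing step is where registrar differences bite: `summarize` only works if every agency answers with the same vocabulary.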

DataCite/EZID

Now, in addition to CrossRef, there are some other registrars. One I know about is DataCite. And as Keven mentions, the EZID service is a pretty slick front-end that lets you register and maintain DataCite DOIs. While DataCite was set up to assign DOIs to data sets, as far as I know nothing in any policies (of the DOI Foundation, DataCite, or EZID) prevents you from using DataCite DOIs in general, or EZID-managed DataCite DOIs in particular, for any kind of information resource you want, including scholarly articles.

EZID may still be free at the moment in its testing phase, but EZID tells me that eventually there WILL be a charge. However, I think the cost model, tools provided, and obligatory requirements (none) of EZID/DataCite, compared to CrossRef, may be more amenable and feasible for a small project. (The EZID service is not limited to DOIs; it also supports several other types of globally unique identifiers, which I am even less familiar with.) EZID is a sort of intermediary between you and DataCite. I don’t know if there are any similar value-added intermediaries for CrossRef; I believe it’s typical for publishers to deal with CrossRef directly, without an intermediary.
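
For a feel of what working through EZID involves, here is a minimal Python sketch that builds a minting request without sending it. The specifics are assumptions drawn from EZID’s published API documentation, not from this post: that EZID takes metadata as ANVL (`name: value` lines with `%`, `:`, CR and LF percent-encoded), that identifiers are minted by POSTing to `https://ezid.cdlib.org/shoulder/<shoulder>` with HTTP Basic auth, and that fields are named `_target`, `datacite.title`, and so on. The shoulder, credentials, and metadata in the usage lines are hypothetical.

```python
import base64
import re
import urllib.request

# Assumed base URL for EZID's API; check EZID's documentation before use.
EZID = "https://ezid.cdlib.org"


def escape(s: str) -> str:
    """Percent-encode the characters ANVL treats specially: %, :, CR and LF."""
    return re.sub("[%:\r\n]", lambda m: "%%%02X" % ord(m.group(0)), s)


def to_anvl(metadata: dict) -> str:
    """Serialize name/value pairs as ANVL, one 'name: value' per line."""
    return "\n".join("%s: %s" % (escape(k), escape(v)) for k, v in metadata.items())


def mint_request(shoulder: str, metadata: dict, user: str, password: str) -> urllib.request.Request:
    """Build (but do not send) a POST asking EZID to mint an identifier under `shoulder`."""
    req = urllib.request.Request(
        EZID + "/shoulder/" + shoulder,
        data=to_anvl(metadata).encode("utf-8"),
        headers={"Content-Type": "text/plain; charset=UTF-8"},
        method="POST",
    )
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    req.add_header("Authorization", "Basic " + token)
    return req


# Hypothetical usage: test shoulder, credentials, and metadata are placeholders.
example = mint_request(
    "doi:10.5072/FK2",
    {"_target": "http://example.org/article/1", "datacite.title": "An example article"},
    "someuser",
    "secret",
)
# urllib.request.urlopen(example) would actually send it.
```

Note that `_target` is the destination URL the DOI proxy will redirect to, so keeping it current is the ongoing maintenance job.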

Metadata lookup

So EZID sounds good for a small project. However, with EZID/DataCite you won’t get the CrossRef metadata lookup that some library-sector applications (like SFX) use.

DataCite is, I hear, planning to add its own metadata lookup service. But library applications will still only use the CrossRef one unless they are updated, and the DataCite one may use an entirely different data schema/vocabulary than CrossRef’s. This metadata lookup isn’t actually standardized for DOI in general, in terms of either API or response schema, and there is no standard way to discover which metadata lookup service to use for a given DOI. (Personally, I tend to think DOI ought to work on some standardization and auto-discovery here; I don’t know if they are.)
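
The discovery problem can be made concrete: a client first has to learn which registration agency issued a DOI, then pick agency-specific code to interpret whatever metadata comes back. A minimal Python sketch; it assumes a DOI proxy endpoint at https://doi.org/ra/<doi> that answers with a JSON list like `[{"DOI": "...", "RA": "Crossref"}]`, which is not described in this post, and the per-agency parsers are hypothetical placeholders supplied by the caller.

```python
import json
import urllib.request
from typing import Callable, Optional

# Assumed endpoint that reports which registration agency (RA) issued a DOI.
RA_ENDPOINT = "https://doi.org/ra/"


def agency_from_payload(payload: list) -> Optional[str]:
    """Return the RA name from the proxy's JSON list, or None if the DOI is unknown."""
    return payload[0].get("RA") if payload else None


def registration_agency(doi: str) -> Optional[str]:
    """Ask the proxy which RA a DOI belongs to (needs network access)."""
    with urllib.request.urlopen(RA_ENDPOINT + doi) as resp:
        return agency_from_payload(json.load(resp))


def parse_for_agency(ra: Optional[str], record: dict, parsers: dict) -> dict:
    """Dispatch to an agency-specific parser; with no shared vocabulary, each RA needs its own."""
    parser: Optional[Callable[[dict], dict]] = parsers.get(ra)
    if parser is None:
        raise LookupError(f"no metadata parser for registration agency {ra!r}")
    return parser(record)
```

The `parsers` table is the expensive part: every new agency, or every change in an agency’s vocabulary, means another parser to write and maintain.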

This entry was posted in General.

7 Responses to some notes on the DOI environment

  1. jrochkind says:

    Hmm, it looks like maybe DOI is working on a standardized architectural way of getting additional metadata: http://www.crossref.org/CrossTech/2011/04/content_negotiation_for_crossr.html

    I can’t completely wrap my head around how well that will work for standardization — it looks like metadata for CrossRef DOIs is already available from the dx.doi.org endpoint, without needing to know it’s a CrossRef DOI. If you ask for it in atom+xml format, you get PRISM standard metadata identifying the article — great.

    The blog post mentions that DataCite is NOT yet participating in this system. If they were, would they return data using the same vocabularies (in atom or RDF), or different ones?

    The benefits of standardization really only come if DataCite DOIs return metadata using the same vocabularies, so that software can request metadata for a DOI and interpret the response the same way, without knowing the registrar or having to predict what vocabularies will be used in what ways. It’s not clear to me whether DOI is standardizing this too, or just providing a service that amounts to “you’ll get some kind of atom+xml (or rdf+xml or turtle), but there’s really no way to predict what vocabularies will be used how; it depends on the choices of the registration agencies, which could change anytime.” That sort of situation would make writing a reliable software client a pretty expensive proposition, requiring continual investigation and reverse engineering of whatever you happen to get.

  2. Carol Meyer says:

    This is a very useful post, and is quite accurate about the metadata services that CrossRef offers.

    There are a couple of reasons that there are costs and obligations to assign DOIs through CrossRef, the most important of which is that DOIs are supposed to be persistent, like forever persistent. Requiring a level of financial and contractual commitment is one way for us to make sure that “an individual dude who wants a few DOIs” doesn’t assign DOIs and then just abandon them. One of the obligations you allude to is the obligation to keep the metadata (especially the URLs) current for anything a DOI is assigned to, so that it doesn’t ever “break” or fail to resolve. Another of our obligations is that members make reasonable business efforts to archive their content (for example with CLOCKSS, Portico, or Koninklijke Bibliotheek) so that CrossRef DOIs continue to resolve even if the individual or organization that assigned the DOI goes out of business or for other reasons ceases to make the content available.

    As members of the International DOI Foundation, which authorizes DOI Registration Agencies, both DataCite and CrossRef are committed to the interoperability of the DOI system. At CrossRef we are delighted that DataCite is working on metadata for making data sets discoverable.

    To answer another of your questions, CrossRef has a number of Authorized Affiliates who are qualified to interact with CrossRef on behalf of our member publishers. You can find a list of them here: http://www.crossref.org/01company/08affiliates.html#Affiliates

    We are also working with organizations such as the International Network for the Availability of Scientific Publications (INASP) and the Open Access Scholarly Publishers Association (OASPA) who may be able to assign DOIs for organizations that might not otherwise be able to afford CrossRef fees.

  3. jrochkind says:

    Thanks Carol. Are such arrangements, with the OASPA for instance, already in place, or are you still in discussions?

    One of the obligations I was thinking of was actually the one that says the publisher must look up ALL works cited in any paper assigned a DOI, and list the DOI for each of those works, where one exists, in the works cited. I certainly understand why it’s _good_ for everyone if this is done, but it’s also an expense that in some cases (esp. small publishers) makes CrossRef DOIs infeasible.

  4. Carol Meyer says:

    The arrangements with OASPA and INASP are in place, though I believe those organizations may still be ramping up their ability to perform this service.

    There is a very simple reason for the outbound linking requirement. Organizations who assign DOIs to their content benefit from the inbound traffic provided by links FROM other publishers’ content. If a publisher only receives traffic from these links but doesn’t contribute traffic to others’ content, it violates one of our primary tenets of fairness and reciprocity.

    We do have a number of tools available for small volume publishers to do these lookups manually. See for example our Guest Query and Simple Text Query Forms.

  5. jrochkind says:

    Thanks Carol. I understand the rationale, but it’s a barrier for small, free, or volunteer publishers regardless, even with the manual lookup (still a time-consuming operation).

    The fairness rationale makes sense when you’re dealing with large commercial publishers. But if, for example, DataCite or another registration agency would allow a DOI to be assigned to an article without the outbound linking requirement, it’s not clear to me who that would be harming. I do understand it’s _preferable_ for readers and the world in general if DOIs are provided on every single citation where a DOI is available; this is what makes DOIs useful. But it’s not immediately obvious to me how it’s unfair to anyone in particular if a small publisher gets a DOI from some arbitrary DOI registration agency, even though they don’t have the resources to do DOI lookup for all works cited.

  6. Carol Meyer says:

    I’m certainly not saying that organizations shouldn’t get DOIs from other RAs if they don’t need our services, just trying to clarify the reasons for our policies.

  7. Tom Pasley says:

    Just to clarify, it’s DataCite – but I’m being picky. Yes, it’s good to see that CrossRef are improving on their openness by enabling content negotiation, and numerous formats for the metadata. I find that CrossRef’s metadata is great, though PubMed works well for DOIs and PIIs (as well as PMIDs).
    Have you had a look at openbiblio.net ? Some interesting stuff going on there too!
