SRU and OpenSearch

Some thoughts on SRU and OpenSearch in response to some off-hand comments Eric Lease Morgan made comparing the two. Disclaimer: I am not actually that familiar with SRU.

I have been spending some time with the OpenSearch spec lately, in anticipation of using it in a future project I have in mind: a generic ‘saved searches’ store as part of Xerxes.

I’ve been quite impressed with it. I think the idea of merging SRU into OpenSearch is a good one.

It’s not really true to say that “OpenSearch returns an RSS-like data stream”. Rather, an OpenSearch description can specify exactly what format is returned using a MIME type, and can even specify different URLs for retrieving different formats. So I think it’s rather compatible with SRU here. There’s no reason an OpenSearch description can’t declare results returned in MARCXML or anything else. (I hope MARCXML, MODS, etc., have declared MIME types? If not, they need to ASAP, for a billion reasons!)
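
As a rough illustration of that point, here is a minimal Python sketch that reads a description document and lists the URL template offered for each declared result format. The description document, the example.org host, and the "application/marcxml+xml" type string are all assumptions made up for this example, not taken from any real service.

```python
import xml.etree.ElementTree as ET

# Namespace of OpenSearch 1.1 description documents.
OS_NS = "http://a9.com/-/spec/opensearch/1.1/"

# Hypothetical description document: the host, the URLs, and the
# MARCXML type string are placeholders for illustration only.
DESCRIPTION = """<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">
  <ShortName>Example catalog</ShortName>
  <Description>Illustrative catalog search</Description>
  <Url type="application/atom+xml"
       template="http://example.org/search?q={searchTerms}&amp;format=atom"/>
  <Url type="application/marcxml+xml"
       template="http://example.org/search?q={searchTerms}&amp;format=marcxml"/>
</OpenSearchDescription>
"""


def templates_by_type(description_xml):
    """Map each declared MIME type to its URL template."""
    root = ET.fromstring(description_xml)
    return {
        url.get("type"): url.get("template")
        for url in root.findall(f"{{{OS_NS}}}Url")
    }


if __name__ == "__main__":
    for mime_type, template in templates_by_type(DESCRIPTION).items():
        print(mime_type, "->", template)
```

A client that wants MARCXML rather than Atom would simply pick the template whose type it understands.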

Also, while an OpenSearch description document doesn’t declare any particular syntax for its query, there’s no reason an OpenSearch query _couldn’t_ be in CQL.
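
To make that concrete, a client could drop a URL-encoded CQL string into the {searchTerms} slot of a template. The sketch below assumes a made-up template on example.org and handles only the {searchTerms} placeholder. Real templates can carry other parameters that this ignores.

```python
from urllib.parse import quote


def fill_template(template, cql_query):
    """Substitute a percent-encoded CQL query for {searchTerms}.

    Only {searchTerms} is handled; any other template parameters
    are left untouched in this simplified sketch.
    """
    return template.replace("{searchTerms}", quote(cql_query, safe=""))


if __name__ == "__main__":
    # Hypothetical template; the host and parameter name are placeholders.
    template = "http://example.org/search?query={searchTerms}"
    print(fill_template(template, "dc.title=Baltic"))
```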

It would seem useful to me to extend OpenSearch to allow a description document to specify that CQL is supported, and specify what search indexes the server provides, or any other CQL-related metadata.
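
Purely as a thought experiment, such an extension might look something like the sketch below. The "cql" namespace URI and the "support" and "index" element names are invented here for illustration and are not part of any existing OpenSearch or SRU specification.

```python
import xml.etree.ElementTree as ET

# Hypothetical extension namespace, invented for this sketch.
CQL_NS = "http://example.org/ns/opensearch-cql"

# Hypothetical description document using the made-up extension.
DESCRIPTION = """<OpenSearchDescription
    xmlns="http://a9.com/-/spec/opensearch/1.1/"
    xmlns:cql="http://example.org/ns/opensearch-cql">
  <ShortName>Example catalog</ShortName>
  <Description>Illustrative catalog search</Description>
  <Url type="application/atom+xml"
       template="http://example.org/search?query={searchTerms}"/>
  <cql:support>
    <cql:index>dc.title</cql:index>
    <cql:index>dc.creator</cql:index>
  </cql:support>
</OpenSearchDescription>
"""


def cql_indexes(description_xml):
    """Return the declared CQL index names, or None if CQL is not declared."""
    root = ET.fromstring(description_xml)
    support = root.find(f"{{{CQL_NS}}}support")
    if support is None:
        return None
    return [el.text.strip() for el in support.findall(f"{{{CQL_NS}}}index")]


if __name__ == "__main__":
    print(cql_indexes(DESCRIPTION))
```

A consumer that knows nothing about the extension would skip the unknown namespaced elements and still be able to use the Url template as usual.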

One of the most useful parts of OpenSearch is how it allows easy extensibility using custom namespaces. I am actually not that familiar with SRU, and haven’t read that OASIS document Eric mentions yet, but I hope they take the approach of trying to fit SRU into the existing OpenSearch standard, rather than creating a new ‘umbrella’ standard on top of it. I think the OpenSearch standard would be quite suited to this.  Perhaps a CQL or SRU extension to OpenSearch.

I perused the OpenSearch listserv recently, and I didn’t see any mention of OASIS or SRU, but I wasn’t looking for it either.  I hope the OASIS folks are actually talking to the OpenSearch folks about this, rather than just doing things in their own silo. The OpenSearch folks seem to me to be quite on top of things, and interested in making sure OpenSearch supports new use cases in the ‘right’ way.


7 Responses to SRU and OpenSearch

  1. Matthias says:

    Jonathan,

    I fully support the sentiment of this post. Recently, I’ve added OpenSearch capabilities to refbase, and I was also amazed at how flexible OpenSearch was. Besides Atom, RSS & HTML, refbase supports MODS & DC XML output via OpenSearch, and allows for search suggestions and CQL querying.

    http://beta.refbase.net/opensearch.php
    http://beta.refbase.net/opensearch.php?operation=explain

    And yes, I also wish there were dedicated MIME types for MODS & DC XML! I don’t think there are. Also, I agree that it would be very nice if the supported CQL search syntax could be indicated in the OpenSearch description document.

    From an application developer perspective who’s interested in integrating different bibliographic services, this is a dream come true – well, at least in theory. Only issue is that, as you mention, everyone is doing his own thing… Sigh.

    Speaking of the developer perspective, I’ve become a big fan of Atom XML, which (for each record) can include direct links to other bibliographic metadata formats (such as MODS), formatted citation formats (HTML, RTF, PDF, LaTeX, etc), or OpenURL & unAPI links. The Atom XML record can also directly include Dublin Core metadata as well as the formatted citation in HTML and plain text. See e.g. the source of this query output (which uses CQL, btw):

    http://beta.refbase.net/opensearch.php?query=dc.title%3DBaltic&maximumRecords=20

    IMHO, this can be quite useful for mashups, especially since there are already many libraries that facilitate parsing of OpenSearch Atom results. The Atom format can also be displayed out of the box by any good feed reader.

    I’ve written more about the refbase OpenSearch implementation at http://opensearch.refbase.net/

    I really wish other services (publisher sites, PubMed, etc) would support OpenSearch with rich data output & support for CQL querying. This would mean a huge step forward for developers & users of bibliographic applications.

  2. Ross says:

    So, yeah, we talked about this a bit, already, but Jangle does pretty much what you mention here, too.

    For searches (although, let me point out early, Jangle has no ‘search requirement’, per se, it just lays out how search should work if the implementer so decides) it uses CQL for the search syntax and OpenSearch+Atom for the result format.

    To meet the functionality of SRU’s ‘explain’ method, Jangle namespaces in a ZeeRex explain document into the OpenSearch Description Document. It’s optional, but would go a long way towards autodiscovery.

    For the MODS/MARCXML thing… sadly there is no specific mime-type, no. They are just application/xml. Because this isn’t unique to library standards (there aren’t mime-types for specific RDF vocabularies, for example), Jangle extends Atom by coining URIs for the ‘format’ of the payload. So binary MARC21 is http://jangle.org/vocab/formats#application/marc and MARCXML is http://jangle.org/vocab/formats#http://www.loc.gov/MARC21/slim. This gives the client a fighting chance at knowing what might be getting transported in the Atom feed.

    Is this more what you’re looking for?

  3. jrochkind says:

    Yeah Ross, that’s quite a bit like what I was imagining. One thing I’m confused about: the sample OpenSearch description uses a “zr:” prefix on XML elements, but doesn’t seem to declare that namespace. Is that an error? Oh wait, never mind, now I see it on the ‘explain’ element itself.

    So in general, would an OpenSearch desc consumer properly take the presence of a “http://explain.z3950.org/dtd/2.1/:explain” element as evidence that the OpenSearch described can take CQL queries?

    Would there be any point to declaring that you take CQL queries _without_ an ‘explain’ document? Maybe not, so maybe no such element would be needed: the presence of zr:explain is itself a declaration that you can take CQL according to the details of the ‘explain’.

    I guess it would apply to all of the possibly multiple Url elements contained in an OpenSearch desc? I wonder if there would be need to have a given ‘explain’ apply to one Url but not to others? Maybe the zr:explain should be able to appear inside a certain Url element (applying just to that element), or outside, applying to all of them?

    Hey Ross, do you want to help me (or me help you) work up an actual specification for OpenSearch+SRU (not jangle specific, but could be used for jangle), run it by the OpenSearch listserv, and maybe even get them to include it on the OpenSearch website under the non-standard extensions, with a particular URI?

  4. jrochkind says:

    Also, something I’ve wondered about: are bare keywords alone a valid (simple) CQL statement, one that means more or less what you’d want it to?

    If they were, that would make things a LOT more convenient for backwards compatibility of an OpenSearch desc with CQL. If a particular OpenSearch desc consumer didn’t know anything about CQL, it could ignore it, and put ordinary keyword terms in the {searchTerms}.

    If ordinary keyword terms _aren’t_ valid CQL, then it probably makes sense to use something other than {searchTerms} as the placeholder in the URL template. Use a custom “cql:query” or something instead, so consumers that don’t know CQL won’t try to put ordinary search terms in there.

    But I’m hoping just a sequence of search terms is a valid CQL statement that means something reasonable?

  5. Ross says:

    So, to answer your questions… yeah, I’d love to work on something like this outside of Jangle. I never wanted it to be confined to Jangle, but it was a good place to introduce the idea into the wild, I thought. I’m not sure exactly what would need to be introduced to the OpenSearch crowd. To “mix in” CQL wouldn’t even require extending OpenSearch, since it would all just be done with the {searchTerms} parameter. So no real change would be needed, just some sort of agreement. I suppose some change *would* be needed in the Description Document, so that’s a possibility. Plus the existence of Explain in the Description Document could alert the client that the server supports CQL (and the absence could lead one to assume otherwise).

    As far as different explain docs for different kinds of resources, yes. Each Jangle ‘entity’ has its own OpenSearch Description Document/Explain (assuming it is a searchable entity). So a search for a patron would go to a completely different search silo than a search for a book.

    As far as Jangle is concerned, if you were to just send a keyword search, it defaults to “cql.serverChoice” to handle the request. This would be illegal in SRU, so in the Jangle implementations, it takes the query, tries to parse it with the CQL parser, catches the exception, prepends ‘cql.serverChoice=’, and then tries to parse it again.

  6. jrochkind says:

    Matthias, I really like what you’ve done with Atom feeds there. I think you, me, and Ross should talk about writing down a standardized way to use SRU/CQL in OpenSearch. I think it seems mostly pretty straightforward, but I see a couple of possible choices that it would be good to standardize.

  7. Ralph LeVan says:

    I think SRU is about as ready as it’s going to be for OpenSearch. I’ve got an OpenSearch description document for our VIAF database. (Prose here: http://outgoing.typepad.com/outgoing/2009/08/viaf-and-opensearch.html and Description Document here: http://viaf.org/allFieldsSearch.xml) Tony Hammond has just submitted a draft of an OpenSearch extension for SRU. If you don’t mind using my VIAF implementation as a starting place, let’s see what we can do to make it meet your expectations.
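
The keyword fallback Ross describes in comment 5 (try the CQL parser, catch the failure, prepend ‘cql.serverChoice=’, and parse again) could be sketched roughly as follows. This is not Jangle’s actual code. The parser is passed in as a function, and the toy_parse stand-in below is a deliberately crude placeholder that only understands "index=term", so it should not be mistaken for real CQL parsing.

```python
def parse_with_fallback(query, parse):
    """Try to parse the query as given; on failure, retry as a server-choice search.

    `parse` is any callable that returns a parsed query or raises ValueError.
    """
    try:
        return parse(query)
    except ValueError:
        # Treat the input as bare keywords against the server's default index.
        return parse("cql.serverChoice=" + query)


def toy_parse(query):
    """Toy stand-in for a CQL parser (NOT real CQL): accepts only 'index=term'."""
    index, sep, term = query.partition("=")
    if not sep or not index.strip() or not term.strip():
        raise ValueError(f"cannot parse: {query!r}")
    return (index.strip(), term.strip())


if __name__ == "__main__":
    print(parse_with_fallback("dc.title=Baltic", toy_parse))
    print(parse_with_fallback("baltic sea", toy_parse))
```

With a real CQL parser in place of toy_parse, the second attempt could still fail (for example on unquoted multi-word input), in which case the error would propagate to the caller.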
