Pretty URIs in Horizon Information Portal 3.x

So, my install of HIP already supported showing you bibs in a variety of formats, courtesy of an open source add-on originally by Casey Durfee.

But I got to thinking, gee, wouldn’t it be nice if I had a ‘cool’ URI for my bib records that I could use as an identifier in, say, Umlaut? And hey, as long as I’m doing that, how about prettier, ‘cool’ URIs for those various formats too? And, hey, how about some automatic content negotiation on the original main URI for those other formats?

With just a little bit of apache mod_rewrite hacking on top, done and done.

This isn’t deployed in my production catalog yet, mainly because I deploy nothing in production at 5pm on Friday. So, in my test catalog:

A cool URI for a bib record, which in your browser will redirect you to HTML:

Or in MODS, via this URL or by sending an Accept header of application/x-mods+xml or (unregistered) application/mods+xml:

Or how about marcxml?

Getting less useful: oai_dc, no problem!

Or, the lamest thing that could technically be called ‘rdf’ imaginable:

(16 April 09: fixed using Apache [PT]!) You’ll notice that all of those URLs give you an HTTP 302 redirect to an ugly URL. At first I tried having them just plain deliver you the content without a redirect, but there was some weird interaction I didn’t understand between my attempt at an internal redirect, cookies, and JBoss (which HIP runs under). It all worked fine with a redirect, so I gave up and made it a redirect.

(oops, I just noticed those examples include an identifier for the ISBN that is illegal in at least two different ways. Bah. But those formats were all pre-existing, and improving the actual output is a project for another day. Today, it was just about making new URIs to point to them, with content negotiation.)
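If you want to poke at the content negotiation from a script rather than a browser, something like the following should do. It is only a sketch: the host `catalog.example.edu` and the bib number `12345` are made up for illustration, so substitute your own. It builds the two equivalent ways of asking for MODS (an Accept header on the main URI, or the format-specific URI) without actually sending anything.

```python
from urllib.request import Request

# Hypothetical host; replace with the hostname of your own HIP install.
BASE = "http://catalog.example.edu"


def bib_request(bibnum, accept=None, suffix=""):
    """Build (but do not send) a request for /bib/<bibnum>[/<suffix>]."""
    url = f"{BASE}/bib/{bibnum}"
    if suffix:
        url += f"/{suffix}"
    # Only set an Accept header when the caller asks for one.
    headers = {"Accept": accept} if accept else {}
    return Request(url, headers=headers)


# Ask for MODS through content negotiation on the main URI ...
mods_by_header = bib_request(12345, accept="application/mods+xml")
# ... or through the format-specific URI, no Accept header needed.
mods_by_url = bib_request(12345, suffix="mods")
```

Passing either object to `urllib.request.urlopen` would send it to a real server.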

Here’s my apache code: (updated 16 April 09 to avoid unnecessary redirects using Apache PT. Man, apache mod_rewrite hacking is like black magic sometimes.)

# Cool URIs and Content Negotiation for HIP. 
# Support a shortcut/identifier uri for a bib
#   HTTP content-negotiation also respected, for non-official MIME types.
#   For just text/xml or application/xml we give marcxml. 
# marcxml, default for content negotiation for xml. 
RewriteCond %{HTTP_ACCEPT} !text/html
RewriteCond %{HTTP_ACCEPT} !application/xhtml\+xml 
RewriteCond %{HTTP_ACCEPT} !text/plain
RewriteCond %{HTTP_ACCEPT} application/marcxml\+xml [OR]
RewriteCond %{HTTP_ACCEPT} application/x\-marc\+xml [OR]
RewriteCond %{HTTP_ACCEPT} application/xml [OR]
RewriteCond %{HTTP_ACCEPT} text/xml
RewriteCond %{REQUEST_URI} ^/bib/(\d+)$
RewriteRule .* /bib/%1/marcxml [R=302,L]

RewriteCond %{REQUEST_URI} ^/bib/(\d+)/marcxml$
RewriteRule .* /mods/?format=marcxml&bib=%1 [PT]

RewriteCond %{HTTP_ACCEPT} application/mods\+xml [OR]
RewriteCond %{HTTP_ACCEPT} application/x\-mods\+xml
RewriteCond %{REQUEST_URI} ^/bib/(\d+)$
RewriteRule .* /mods/?format=mods&bib=%1 [PT]

RewriteCond %{REQUEST_URI} ^/bib/(\d+)/mods$ 
RewriteRule .* /mods/?format=mods&bib=%1 [PT]

# rdf -- pretty lame rdf, this is more like just DC. but oh well, all we got
# right now.
RewriteCond %{HTTP_ACCEPT} application/rdf\+xml
RewriteCond %{REQUEST_URI} ^/bib/(\d+)$
RewriteRule .* /mods/?format=rdf&bib=%1 [PT]

RewriteCond %{REQUEST_URI} ^/bib/(\d+)/rdf$
RewriteRule .* /mods/?format=rdf&bib=%1 [PT]

# In OAI-DC. This one we'll deliver if someone asks for /oai_dc.
# No Internet Content Type (MIME Type) for you, sorry.
RewriteCond %{REQUEST_URI} ^/bib/(\d+)/oai_dc$
RewriteRule .* /mods/?format=oai&bib=%1 [PT]

# html
RewriteRule ^/bib/(\d+)(/html)?$ /ipac20/ipac.jsp?profile=default&index=BIB&term=$1 [PT]
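Since mod_rewrite rules are hard to read back, here is a rough Python model of what the rules above are meant to do, handy for checking which format a given Accept header ends up with. It is an approximation, not what Apache actually executes: the function and constant names are mine, it treats the 302 as the end of processing, and it ignores everything outside /bib/.

```python
import re

BIB = re.compile(r"^/bib/(\d+)$")
BIB_FORMAT = re.compile(r"^/bib/(\d+)/(marcxml|mods|rdf|oai_dc|html)$")

# Media types as they appear in the RewriteCond lines.
BROWSER_TYPES = ("text/html", "application/xhtml+xml", "text/plain")
MARCXML_TYPES = ("application/marcxml+xml", "application/x-marc+xml",
                 "application/xml", "text/xml")
MODS_TYPES = ("application/mods+xml", "application/x-mods+xml")
RDF_TYPES = ("application/rdf+xml",)


def mentions(accept, types):
    """Unanchored substring test, like an unanchored RewriteCond pattern."""
    return any(t in accept for t in types)


def mods_target(fmt, bib):
    return f"/mods/?format={fmt}&bib={bib}"


def html_target(bib):
    return f"/ipac20/ipac.jsp?profile=default&index=BIB&term={bib}"


def resolve(path, accept=""):
    """Return (action, target), or None if no rule applies to this path."""
    # Format-specific URIs ignore the Accept header entirely.
    m = BIB_FORMAT.match(path)
    if m:
        bib, fmt = m.groups()
        if fmt == "html":
            return ("PT", html_target(bib))
        return ("PT", mods_target("oai" if fmt == "oai_dc" else fmt, bib))

    m = BIB.match(path)
    if m is None:
        return None
    bib = m.group(1)

    # Generic XML gets marcxml, but only if nothing browser-like was asked for.
    if not mentions(accept, BROWSER_TYPES) and mentions(accept, MARCXML_TYPES):
        return ("302", f"/bib/{bib}/marcxml")
    if mentions(accept, MODS_TYPES):
        return ("PT", mods_target("mods", bib))
    if mentions(accept, RDF_TYPES):
        return ("PT", mods_target("rdf", bib))
    # Everything else, including ordinary browsers, falls through to HTML.
    return ("PT", html_target(bib))
```

Note that, like the rules themselves, this only looks for media type names anywhere in the Accept header. It does not parse q-values, so it is not full HTTP content negotiation.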

9 thoughts on “Pretty URIs in Horizon Information Portal 3.x”

  1. That’s really cool! :-)

    Apache redirection was never my strongest point, but does it work if you replace the “[R=302]” with “[L,QSA]” or is that what you tried and didn’t work?

  2. I tried a few things, and then gave up and said, hey, what’s so bad about redirection anyway.

    I find mod_rewrite to be a frustrating partner in crime. I have a love/hate relationship with it. I tried [L], but I didn’t try [L,QSA]. I don’t think QSA will do anything here though, but wait, maybe. You can try if you want, since I know you already have your HIP fronted by apache, making this possible.

    The weird thing was, when I tried [L] it worked fine if you had no cookies, but broke if you had cookies. Or maybe it was vice versa. Either way — huh?

    I forget if a bibnum index comes by default with HIP, but everyone ought to have one, so you should make one if you don’t, that’s the other thing you need for this of course. Oh, and you need Durfee’s mods extension for the alternate formats, of course.

    Some day I want to re-write Durfee’s mods extension to do this by default, take a few other bug fixes we’ve done to it, combine with Durfee’s RSS extension with a few bugfixes, and UChicago’s items-in-xml extension, and package them all together on google code. But Java development is not my strong point.

  3. Wait, to keep posting more and more: I think I realize the problem. It involves the content negotiation. In the content negotiation scenario, two RewriteRules are being applied, which may be a problem. But who knows, mod_rewrite hacking is often painful for me.

  4. At first I thought you were suggesting I rewrite my OPAC in ruby, heh.

    I’m trying to keep the complexity of my software stack commensurate with how important the extra features are, just an apache conf is about right for this right now.

    Sadly, getting complete jangle functionality will be somewhat more complicated, I think. There’s no easy way to get a feed of all bibs, and especially no good way to get a feed of bibs that have changed since a certain date. Also, while I’m not sure if jangle covers it or not, my potential jangle-using apps will need item-level information, which isn’t available right now. But Tod at UChicago sent me some java code to add on to HIP which makes item level info available, haven’t had time to take a look at it yet.
