Hacking the Horizon OPAC (HIP) with Apache

This post is mainly a service to Google searchers. It describes several hacks at the Apache layer that add functionality to the Horizon OPAC, called the Horizon Information Portal, or HIP.

1. Front HIP with Apache, including https

HIP is actually a JBoss application. The standard install is to have browsers connecting directly to that JBoss app. However, it’s useful and convenient to have Apache serve as a front-end, proxying to HIP. This makes possible the subsequent hacks, using apache mod_rewrite.

Dave Pattern provides instructions for using Apache mod_proxy to provide an Apache reverse proxy front-end to HIP.

I found it was unnecessary to use his perl output filter hack in order to get the right hostname in absolute URLs generated by HIP. Merely including “ProxyPreserveHost On” in the Apache conf was sufficient. That causes Apache to send the original Host: header with the proxied request to HIP, and HIP is smart enough to see this and use it for its generated absolute URLs.
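A minimal sketch of such a front-end virtual host, for orientation only. The server name and the backend address (localhost:8080) are assumptions made up for this example; substitute wherever your HIP JBoss instance actually listens.

```apache
# Reverse-proxy sketch. catalog.example.edu and localhost:8080 are
# placeholders, not values from a real install.
<VirtualHost *:80>
    ServerName catalog.example.edu

    # Reverse proxy only; never act as an open forward proxy
    ProxyRequests Off

    # Pass the browser's original Host: header through to HIP, so the
    # absolute URLs HIP generates carry the public hostname
    ProxyPreserveHost On

    # Hand every request to the HIP JBoss instance
    ProxyPass        / http://localhost:8080/
    ProxyPassReverse / http://localhost:8080/
</VirtualHost>
```

This needs mod_proxy and mod_proxy_http loaded.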

Here at MPOW, we actually used to use mod_jk instead of simple mod_proxy to provide the Apache front-end to HIP. This worked, but it was implemented by a predecessor, and I never could wrap my head around mod_jk; it’s awfully complicated. So when upgrading the HIP server, I switched to Pattern’s simpler mod_proxy reverse proxy setup.

https connections

However, I did need to use the perl output filter hack in order to get https connections working.

We wanted to allow both http and https connections to HIP. The initial naive solution is simply to make sure the apache virtual host listening on 443 for https also includes those same mod_proxy reverse proxy directives, so it too will reverse proxy to HIP.

And this works fine, except for one issue: HIP is in the habit of sometimes generating absolute URLs, with hostname and protocol. The ProxyPreserveHost line was enough to make sure that these absolute URLs included the ‘right’ host, but they still end up including http://, even if the browser’s connection was really https://.

This is because the Apache reverse proxy connection to HIP is http, and HIP notices that. One approach would be to make the Apache reverse proxy connection https as well, which would require setting up keys in the HIP JBoss instance, etc. That might have been better, but, man, that got complicated fast to try and understand. (It’s also worth noting that this problem did not occur under the previous mod_jk solution, which I didn’t understand.)

Instead, I took advantage of Pattern’s perl output filter hack after all, to make sure all absolute urls pointing back to HIP were https urls.

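# Excerpt from the output filter handler. $1 and $2 are set by a match
# earlier in Pattern's filter, which is not shown here.
my $hostname = $f->r->hostname();
$f->print( replace($hostname, scalar($1) ), $2);

sub replace {
    my $hostname = shift;
    my $str = shift;

    # Replace either http:// or the escaped version http%3A%2F%2F.
    # \Q...\E keeps the dots in the hostname from acting as wildcards.
    $str =~ s#http(://|%3A%2F%2F)\Q$hostname\E#https$1$hostname#g;

    return $str;
}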

This is only implemented in the ssl apache virtual host.
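A rough sketch of what the SSL virtual host could look like with the filter attached. This is an illustration under stated assumptions, not our exact conf. The package name MyApache::HipHttpsFilter, the hostname, the certificate paths and the backend address are all made up for the example, and it assumes mod_perl 2 and mod_ssl are loaded.

```apache
# SSL virtual host sketch. Every name and path below is a placeholder.
<VirtualHost *:443>
    ServerName catalog.example.edu

    SSLEngine on
    SSLCertificateFile    /path/to/server.crt
    SSLCertificateKeyFile /path/to/server.key

    # Same reverse proxy directives as the plain http virtual host
    ProxyRequests Off
    ProxyPreserveHost On
    ProxyPass        / http://localhost:8080/
    ProxyPassReverse / http://localhost:8080/

    # Run HIP's responses through the perl output filter that rewrites
    # http:// URLs for this host to https:// (hypothetical package name;
    # the module must be findable in mod_perl's @INC)
    PerlModule MyApache::HipHttpsFilter
    PerlOutputFilterHandler MyApache::HipHttpsFilter
</VirtualHost>
```

The plain http virtual host stays as it was, with no filter, so http users keep getting http URLs.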

2. Store Session in Cookies

WARNING to Google Searchers. I believe this solution caused problems for my HIP installation. I believe it was related to accidentally re-using a session ID from one HIP profile in another, resulting in weird errors. I could try to fix this, but the experience kind of put me off trying to hack HIP in this particular way, and I think I’ve abandoned it for now.  –jrochkind 30 Jan 09.

HIP passes a session token in the URL, and not in cookies.

This means that when a user clicks back and forth between HIP and other apps in our environment, they can often lose their session, and wind up creating a new one. This is an inconvenience to the user, who has to log in again, and puts extra load on HIP (extra resources are needed for each session created).

Since we’re fronting HIP with Apache, we can use Apache mod_rewrite and mod_headers to hack this, without touching the HIP code. We eavesdrop on HIP in apache.conf.

When we see a session token in a URL, we store it in a cookie. When we see a HIP URL with no session token, but we have a session token in a cookie, we add it back into the URL with mod_rewrite.

We also need to do some hacking of the Referer header using mod_headers. It turns out that HIP has some code built in such that if it gets a session token in a URL, and the HTTP Referer does NOT match HIP, it will log the user out (keeps the same session, just logs the user out in it). This is apparently intended so that if you copy and paste a HIP URL and email it to someone else, they won’t accidentally get a session logged in as you. [Thanks to Lare Mischo and others on the HIP listserv for alerting me to this. I never would have figured out what was going on on my own, I would have just given up.]

This check gets in the way of our session cookie storing: a session token restored from a cookie would just get the user logged out. So we again use Apache mod_headers to ‘fake’ the referer to be our own host when restoring a session token from a cookie. However, we want to preserve this protection against accidental session sharing, so we make sure to only set the cookie in the first place if the referer matches HIP.

Note that this kind of referer checking (whether in HIP, or our hacks in apache) is not true security. An attacker can easily fake a referer. It is only protection against accidental session sharing.

# If we have a session in the URL, store it in a cookie.
# ONLY if referer matches HIP, to
# prevent an emailed or stored link with sessionID from giving
# a third party (like Aunt Millie) someone else's login.
# Session cookie only, will disappear on browser-close
# (but watch out for Firefox saved session feature, which will
# keep session cookie around for tabs that exist on quit)
  RewriteCond %{HTTP_REFERER} /ipac20/ipac.jsp
  RewriteCond %{QUERY_STRING} (^|;|&)session=([^;&]+)
  RewriteRule (.*) - [CO=hip_session:%2:%{HTTP_HOST}]

# If we have a session in a cookie but NOT in the URL,
# put it in the URL.
#  1) ONLY for GET requests, to avoid interfering with
#     HIP's own login procedure.
#  2) In addition to setting it in the query string, set an
#     env variable that we use later with mod_headers to
#     fake the HTTP referer, to get around HIP's own
#     referer checking, that will cause an auto-logout
#     if referer isn't HIP.
  RewriteCond %{REQUEST_METHOD} ^GET$
  RewriteCond %{QUERY_STRING} !(^|;|&)session=([^;&]+)
  RewriteCond %{HTTP_COOKIE} "(^| )hip_session=([^;]+)"
  RewriteRule (.*) $1?session=%2 [QSA,PT,E=SESSION_FROM_COOKIE:1]
# Fake the referer
  RequestHeader set Referer "http://%{HTTP_HOST}e/ipac20/ipac.jsp?faked_referer=true&original_referer=%{HTTP_REFERER}e" env=SESSION_FROM_COOKIE

3. Support Shibboleth Login to HIP

This one got crazy. The basic outline:

Provide a button in the HIP login screen that says “login with your enterprise ID” or whatever. When pressed, this submits a form to a separate CGI script, and tells it the original HIP URL the user was trying to look at, and the HIP session token (which can sometimes be inferred from the original HIP URL, but sometimes can’t).  This is done by hacking HIP XSLT. It actually turns out to be tricky to figure out the current URI in the HIP XSLT. It’s not provided for you, so you need to reconstruct a good approximation of it yourself.

This CGI script is protected by apache Shibboleth module, so the user is prompted to authenticate with Shib when accessing it (if they don’t already have a Shib session).  So when the script executes, it knows the user’s enterprise ID.
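A minimal sketch of that protection, assuming the script lives at /cgi-bin/hip_shib_login (a made-up path) and a Shibboleth SP of the 1.3/2.x generation. Which variable ends up carrying the enterprise ID (often REMOTE_USER) depends on your SP attribute configuration, so treat that part as an assumption too.

```apache
# Require a Shibboleth session before the login CGI is allowed to run.
# The path is a placeholder for wherever the script is installed.
<Location /cgi-bin/hip_shib_login>
    AuthType shibboleth
    ShibRequireSession On
    require valid-user
</Location>
```

If the reverse proxy sends everything to HIP, this path also has to be excluded from proxying so Apache serves the script itself, for example with a "ProxyPass /cgi-bin/ !" line placed before the catch-all ProxyPass.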

The script then needs to look up the user’s borrower account based on their enterprise ID.  We store the user’s enterprise ID in the “other_id” column that Horizon generously provides. The script accesses the Horizon Sybase database directly, looking up the user’s barcode and PIN based on that other_id.  (Adding an index to the Horizon Sybase for the other_id column is a good idea, since it doesn’t have one by default).

Then, the script actually accesses HIP behind the scenes with its own HTTP requests, to get a login form, and submit it with the barcode and PIN it looked up.  Now HIP thinks the user has logged into their session.

And finally, the script redirects the user back to the HIP URL they were originally looking at. And since HIP now thinks they are logged in, it doesn’t throw a login screen at them, but lets them look at whatever it was they were trying to look at. (Basically, either a request form, or their account information).

But recall HIP’s annoying behavior above, where it automatically logs a user out if the referer doesn’t match.  Because this Shib login process involves a bunch of redirects, including to forms, the Referer will no longer match HIP when the user comes back. So, we again had to add something to the Apache conf to fake the Referer header when it’s coming from the Shib login process.
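A minimal sketch of what such a rule could look like, not the actual production config. It assumes the CGI script appends a marker parameter to the URL it redirects back to (shib_return=1, a name invented for this example), that HIP ignores the extra parameter, and that the public hostname is catalog.example.edu.

```apache
RewriteEngine On

# Flag requests that come back from the Shib login script.
# shib_return is a hypothetical marker added by the script's redirect.
RewriteCond %{QUERY_STRING} (^|;|&)shib_return=1
RewriteRule ^/ipac20/ - [E=FROM_SHIB_LOGIN:1]

# For flagged requests only, present a Referer that matches HIP
RequestHeader set Referer "http://catalog.example.edu/ipac20/ipac.jsp" env=FROM_SHIB_LOGIN
```

As with the cookie hack, this is only about getting past HIP’s accidental-sharing check. Anyone can add the marker to a URL by hand, so it is not security.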

Phew. But it works. Code available on request.

Update:  Versions

A reader asked about platforms/versions I am using.

HIP 3.08.  (Although any HIP 3.x should probably work the same. Not the abomination that is HIP 4.x).

Red Hat Enterprise Linux 4.  (Although Dave Pattern does similar things on Windows with Apache, I have no Windows server experience whatsoever myself. And I’d rather be using RHEL5 but SirsiDynix says they’ll only support RHEL4. Although it’d probably work anyway, I stick to RHEL4).

Apache 2.0.52 (Any Apache 2.x after the first few versions should work, I think there were some bugs in the first couple 2.x’s in mod_headers).
