crazy use of encryption to protect refworks callback urls

Dealing with export to Refworks is relatively straightforward; here are their docs on how to support it.  I’ve been using the “RIS Format” when I can as the format to send to Refworks, and I use their ‘callback’ method. The ‘form post’ method would require you to include (e.g.) RIS in every web page with an ‘export to refworks’ link on it, just in case the user wants to click ‘export’, and that’s no good.

The problem comes in when the things you are exporting come from a tool that is licensed only for your affiliated users, so you aren’t supposed to provide public access to it.  For instance, I’m implementing a service that provides an embedded ‘article search’ service in our library platform, powered by the EBSCOHost api.  Only logged in users can use this service, per our EBSCOHost license, and the app protects access by requiring authentication first. Great.

We’re still clearly allowed to provide for an export of individual user-selected records  to Refworks, that isn’t the problem. (After all, the EBSCO native interface provides for this kind of export to Refworks, among other places, too).

The problem is that we need to give a callback URL to Refworks, which delivers RIS-formatted records from our system for records originally from EBSCO.  And we can’t auth-protect that callback URL, or Refworks can’t get in!

So first thought, who cares?  What does this expose that it shouldn’t, if we just leave it open? Well, I’d have a callback url, where if you know an EBSCO database code, and an EBSCO document accession number, you can get the metadata for that document in RIS, without authenticating (but no fulltext ever). At first I figured, eh, who cares, is someone really going to notice this?  In a way that I or EBSCO notice them noticing?  Then someone on #code4lib reminded me that if there’s any opening at all, someone from overseas will try to systematically download the whole thing. And I figured, yeah, I better protect this.

What are the current standard practices?

I’m not the only one that has to deal with this; anyone that does a Refworks export has to.

So, with the help of LiveHTTPHeaders in Firefox, I went to see what EBSCO native HTML interface did for Refworks exports. (Still haven’t figured out if there’s a way to use Chrome debugging tools to track HTTP transaction flow over multiple redirects, or found another chrome plugin that does this. This is pretty much the only thing I still open up FF for).

EBSCO gives a callback URL to Refworks that includes a session token, like:
http://web.ebscohost.com/refworks?sid=347b5b92-1154-45b8-841f-6198df2b14d7@sessionmgr4&vid=4&hid=18

So even though these URLs aren’t protected, there’s no way for an ‘attacker’ to construct URLs themselves to download arbitrary known items, because presumably only documents that were associated with a particular session are even available, and you’d have to know the sessionID to get them, which you wouldn’t ordinarily. On the other hand, this method may actually EXPOSE sessionID’s by sending them in cleartext over the wire.

But more importantly, this would be a pain to implement in my own app design. When a user clicked on ‘export to refworks’, I’d have to first have that request go to my app, where it saved what document they were asking for and associated it with some session-specific opaque token, and then redirect them to refworks with a callback url using that opaque token.  That’s a lot of extra work, plus some session-specific state that has to be stored on my server and cleaned up later.

I knew Bill Dueber at umich had done a whole bunch of work with Refworks exports, and I knew Bill Dueber writes code that works correctly and considers security vulnerabilities, so I asked him what he did. He has a variation on that same pattern: opaque tokens that are associated with the ‘real’ document (or set of documents), and which are only valid for a short duration. These opaque tokens can be used in refworks callback urls. But again it means you need logic to store these tokens and their mapping to the ‘real’ documents or document ID’s in a local persistent store (like a DB), then you need logic to ‘expire’ them or clean them out to keep your data from growing forever, etc.
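For comparison, here is roughly what that stored-token pattern involves. This is only a sketch under my own assumptions: the class name TokenStore, the five-minute lifetime, and the in-memory Hash standing in for a real database table are all made up for illustration, and it is not Bill’s or EBSCO’s actual code.

```ruby
require "securerandom"

# Hypothetical in-memory stand-in for the persistent store described above.
# A real app would use a database table instead of a Hash.
class TokenStore
  def initialize(ttl_seconds: 300)
    @ttl = ttl_seconds
    @entries = {} # token => [document_ids, expires_at]
  end

  # Create a random opaque token that maps to one or more document IDs.
  def issue(document_ids, now: Time.now)
    token = SecureRandom.urlsafe_base64(24)
    @entries[token] = [Array(document_ids), now + @ttl]
    token
  end

  # Return the document IDs for a token, or nil if unknown or expired.
  def resolve(token, now: Time.now)
    ids, expires_at = @entries[token]
    return nil if ids.nil? || now >= expires_at
    ids
  end

  # The clean-up job: drop expired entries so the store does not grow forever.
  def sweep(now: Time.now)
    @entries.delete_if { |_token, (_ids, expires_at)| now >= expires_at }
  end
end
```

Even in toy form you can see the three moving parts: issuing, resolving, and sweeping, plus the state that has to live somewhere between requests.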

I didn’t want to do that.

Weirdest use of encryption ever?

So I was brainstorming.

  • Okay, we want to construct an opaque token of some kind….
  • Which my app can translate back to a document unique id….
  • But using some method such that people who aren’t my app can’t construct their own token that will translate to a document ID of their choice….
  • But without having to store the token-to-real-ID mapping in a database, having it instead somehow be entirely algorithmic and not depend on any persistent state.

Then I realized, duh, this is pretty much what encryption is, right?  I construct and give to refworks a URL with an encrypted version of the real ID.  In the URL that delivers the RIS, I take the input token, and decrypt it to get the real ID, which I use to look up the record, convert to RIS, and deliver that RIS.
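To make the idea concrete outside of Rails, here is a minimal sketch in plain Ruby using only the standard library. The module name OpaqueToken, the sample ID format, and the choice of AES-256-GCM (which encrypts and authenticates in one step) are my assumptions for illustration; this is not what the Rails class I actually use does internally.

```ruby
require "openssl"
require "base64"

# Hypothetical module: turns a record ID into an opaque, tamper-evident
# token and back, with no server-side state beyond the secret key.
module OpaqueToken
  CIPHER = "aes-256-gcm"
  IV_LEN = 12  # bytes, the default IV length for GCM
  TAG_LEN = 16 # bytes, the default authentication tag length

  # key must be exactly 32 bytes for AES-256.
  def self.encode(id, key)
    cipher = OpenSSL::Cipher.new(CIPHER)
    cipher.encrypt
    cipher.key = key
    iv = cipher.random_iv
    cipher.auth_data = ""
    ciphertext = cipher.update(id.to_s) + cipher.final
    Base64.urlsafe_encode64(iv + cipher.auth_tag + ciphertext)
  end

  # Returns the original ID, or nil if the token is malformed, was made
  # with a different key, or has been altered.
  def self.decode(token, key)
    raw = Base64.urlsafe_decode64(token)
    return nil if raw.bytesize <= IV_LEN + TAG_LEN
    cipher = OpenSSL::Cipher.new(CIPHER)
    cipher.decrypt
    cipher.key = key
    cipher.iv = raw[0, IV_LEN]
    cipher.auth_tag = raw[IV_LEN, TAG_LEN]
    cipher.auth_data = ""
    plain = cipher.update(raw[(IV_LEN + TAG_LEN)..-1]) + cipher.final
    plain.force_encoding(Encoding::UTF_8)
  rescue OpenSSL::Cipher::CipherError, ArgumentError
    nil
  end
end
```

Usage would look something like token = OpaqueToken.encode("a9h:12345678", key) when building the callback URL, and OpaqueToken.decode(params_token, key) in the action that serves the RIS, where key is 32 random bytes kept in config and "a9h:12345678" is a made-up database-code-plus-accession-number ID.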

That’s a crazy idea… but Rails gives us the tools that it can be implemented in about 10 lines and 20 minutes… and it works. So why not?

I’m using the Rails ActiveSupport::MessageEncryptor class, which is associated with the code used for Rails cookies. Rails ordinarily only cryptographically signs cookies and does not encrypt them (you need a third party gem, not just configuration, to get it to encrypt), but the ActiveSupport library still includes the functionality to encrypt too.

For Rails’ built-in cryptographic cookie signing function, every Rails app has a secret key, usually in ./config/initializers/secret_token.rb (and lately there’s been some attention to the fact that some rails apps accidentally publish their secret key in a public repo! Don’t do that.).

I don’t re-use this existing secret key, because just in case what I’m doing is insecure in a way that allows the key to be guessed, I don’t want to make Rails session security vulnerable too. But no problem, just run `rake secret` in a Rails app to get a new random secret key, and put it in a new config variable in secret_token.rb:

SampleMegasearch::Application.config.refworks_callback_secret_token = '98acd.......'

Then it’s as easy as providing these two methods in my ApplicationController, for encrypting and decrypting record ID’s:


  # The encrypt one we make a helper method, so that views can generate.
  def encrypt_callback_record_id(id)
    encrypter = ActiveSupport::MessageEncryptor.new( 
      Rails.application.config.refworks_callback_secret_token 
    )
    return encrypter.encrypt_and_sign(id)
  end
  helper_method :encrypt_callback_record_id

  def decrypt_callback_record_id(encrypted_id)
    encrypter = ActiveSupport::MessageEncryptor.new( 
      Rails.application.config.refworks_callback_secret_token 
    )
    return encrypter.decrypt_and_verify(encrypted_id)
  end

When generating a URL, just encrypt the ID, say some_route_path( encrypt_callback_record_id( unique_id )). And in the action that is going to deliver the RIS, just decrypt it: real_id = decrypt_callback_record_id( params[:encrypted_id] )

Why am I using encrypt_and_sign/decrypt_and_verify instead of just encrypt/decrypt? Well, ActiveSupport::MessageEncryptor deprecates the non-signed variants, because for the expected cookie use of MessageEncryptor anyway, encrypting without signing makes you vulnerable to a certain kind of attack. Is my use vulnerable to that attack? Do I even care if it is, since this ain’t fort knox? I don’t know, but I don’t want deprecation warnings from ActiveSupport, so I just sign/verify, why not.

It does make the encrypted-and-signed opaque token 50 bytes or so longer. Yeah, the encrypted-and-signed form of a 10-20 char real ID is 150 chars of gobbledygook base64-encoded whatever, looking something like:

http://example.com/refworks_callback/NVVWcHFqTy9nRmlDMXgva2FXRWZkVEkwbnJtZHNFMnBMMFB2ZVdDb2lLWT0tLU5VUVFPVGg5bXBrL3FWUGdtQkt0VFE9PQ==e5753e47b559822a1120e7b36488ecf829671eca.ris (Yeah, the rails .ris suffix to trigger RIS params[:format], why not).

But nobody but Refworks is ever going to see this, what do I care that it’s ugly. 200 or even 500 char URLs are still well within the limits of what’s safe for common systems to deal with.

If the encrypted base64-encoded version could have a period in it, it would mess up rails routing… it doesn’t seem to, but since it’s not clearly documented what Base64-encoding variant Rails is using here, it would be safer to use URLs that look more like /refworks_callback?encrypted_id=NVWcHF.... instead, if you want to actually be careful.
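If you’d rather keep the token in the path, another option is to wrap whatever the encryptor gives you in URL-safe Base64 yourself, so you know exactly which characters can show up. A small sketch using Ruby’s standard Base64 module; the helper names url_safe_token and token_from_url are made up for illustration:

```ruby
require "base64"

# Hypothetical helpers: wrap an already-encrypted token so the result only
# contains characters from the URL-safe Base64 alphabet
# (A-Z, a-z, 0-9, "-" and "_"), with no periods, slashes, or "=" padding.
def url_safe_token(encrypted_token)
  Base64.urlsafe_encode64(encrypted_token, padding: false)
end

# Reverse the wrapping; returns nil if the parameter is not valid Base64.
def token_from_url(param)
  Base64.urlsafe_decode64(param)
rescue ArgumentError
  nil
end
```

The cost is that the already-long token gets about a third longer, since it ends up Base64-encoded twice.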

And anyway, it all seems to work. What do you think, is this genius or horribly too clever?

More sundry notes on Refworks Export

One other random Refworks gotcha. Once you direct the user to the Refworks expressimport.asp url with your callback as a param… Refworks will use Javascript to rename the window to ‘RefWorksMain’. It’s best if you put a target="RefWorksMain" on your hyperlink tag yourself to keep things consistent and avoid confusion. (If you put a target="somethingElse" instead, you might think that each time a user clicks on such a hyperlink, they’ll re-use the same new window. But they won’t; it seemed to end up working as if it was just target="_blank" instead, because of whatever the Refworks JS was doing.)


One Response to crazy use of encryption to protect refworks callback urls

  1. Wait, my only choices are “genius” and “horribly too clever”???

    I really like this for the one-record solution. My use case involves users sending one-or-more records to RefWorks, so I used something a little heavier, but this made me wonder if I could just have a comma-delimited list of internal IDs and send those.

    A quick test of generating a list of nine-digit IDs, gzipping the list, then encrypting it, then base64 encoding it (whew!) shows that I could squeeze at least 250 nine-digit ids into a URL with plenty of room for the rest of the URL while staying under the 2000-byte URL length limit. More complex ids don’t gzip as well (random nine-character alphanumerics only gives me room for about 175 ids), but it’s not a bad solution at all if you’re dealing with reasonable demands.

    I threw my test code in a gist: https://gist.github.com/4484735
