Special guest post by Eddie Rubeiz
I’m Eddie Rubeiz. Along with the owner of this blog, Jonathan Rochkind, and our system administrator, Dan, I work on the Science History Institute’s digital collections website, where you will find, among other marvels, this picture of the inventor of Styrofoam posing with a Santa “sculpture”, which predates the invention of the term “Styrofoam”:
Our work, unlike the development of polystyrene, is not shrouded in secret. That is as it should be: we are a nonprofit, and the files we store are mostly in the public domain. Our goal is to remove as many barriers to access as we can, to make our public collection as public as it can be. Most of our materials are open to the public and don’t require us to collect much personal information. So what use could we have for encryption?
Sensitive Data
Well, once in a while, a patron will approach our staff asking that a particular physical item in our collections be photographed. The patron is often a researcher who’s already working with our physical materials. In some of those cases, we determine that the item (a rare book or a scientific instrument, for instance) is also a good fit with the rest of our digital collections, and we add it to our queue so it can be ingested and made available not just to the researcher, but to the general public.
In many cases, by the time we determined an item was a good fit, we had already done much of the work of cataloging it. The resulting pile of metadata, stored in a Google spreadsheet, then had to be copied and pasted from our request spreadsheet to our digitization queue. To save time over the long run, we decided last December to track these requests inside our Rails-based digital collections web app, so that the entire pipeline lives in one place, from the moment a patron asks us to photograph an item until the point it is presented, fully described and indexed, to the public.
Accepting patrons’ names and addresses into our database is problematic. As librarians, we’re inclined to encrypt this information; as software developers, we’re wary of the added complexity of encryption, and all the ways we might get it wrong. On the one hand, you don’t want private information to be seen by an attacker. On the other hand, you don’t want to throw out the only copy of your encryption key, out of an excess of caution, and find yourself locked out of your own vault. Encryption tends to be difficult to understand, explain, install, and maintain.
Possible Security Solutions
This post on Securing Sensitive Data in Rails offers a pretty good overview of data security options in a Ruby/Rails context, and was very helpful in getting us started thinking about it.
Here are the solutions we considered:
0) Don’t store the names or emails at all. Instead, we could use arbitrary IDs to allow everyone involved to keep track of the request. (Think of those pager buzzers some restaurants hand out, which buzz when your table is ready. They allow the restaurant greeters to avoid keeping track of your name and number in much the same way.) The person who handled the initial conversation with the patron, not our database, would thus be in charge of keeping track of which ID goes with which patron.
1) Disk-level encryption: simply encrypt the drives the database is stored on. If those drives are stolen, an attacker needs the encryption key to decipher anything on the drives — not just the database. Backup copies of the database stored in other unsecured locations remain vulnerable.
2) Database-level encryption: the database encrypts and decrypts data using a key that is sent (with every query) by the database adapter on the webserver. (See e.g. PGCrypto for ActiveRecord, and the PostgreSQL documentation on encryption options.) One challenge with this approach, since the encryption key is sent with many db queries, is keeping it out of any logs.
3) Encrypt just the names and emails (per-column encryption) at the application logic level. When the app reads them from the database, they arrive encrypted. The app is in charge of decrypting them as it reads them, and re-encrypting them before writing them to the database. If an attacker gets hold of the database, they get all of our collection info (which is public anyway), plus two columns of encrypted gobbledygook. To read these columns, the attacker would need the key. In the simplest case, they could obtain it by breaking into one of our web/application servers, which run on a different machine from the database. But at least our DB backups alone are secure and don’t need to be treated as if they had confidential info.
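To make option 3) concrete, here is a minimal sketch of per-column encryption using only Ruby’s standard OpenSSL bindings. It is an illustration of the idea, not the code we run and not how any particular gem works internally. The class name PatronRequest, the attribute patron_name, and the choice of AES-256-GCM are all assumptions made for the example.

```ruby
require "openssl"

# Hypothetical stand-in for a database row. Only patron_name_ciphertext
# would ever be written to the database; the key stays on the app server.
class PatronRequest
  attr_reader :patron_name_ciphertext

  def initialize(key)
    @key = key # 32 random bytes (assumption: loaded from app server config)
  end

  # Encrypt on write: a fresh random nonce is generated for every value.
  def patron_name=(plaintext)
    cipher = OpenSSL::Cipher.new("aes-256-gcm").encrypt
    cipher.key = @key
    nonce = cipher.random_iv
    body = cipher.update(plaintext) + cipher.final
    # Store nonce (12 bytes) + auth tag (16 bytes) + ciphertext, Base64-encoded.
    @patron_name_ciphertext = [nonce + cipher.auth_tag + body].pack("m0")
  end

  # Decrypt on read: raises OpenSSL::Cipher::CipherError if the key is wrong
  # or the stored value has been tampered with.
  def patron_name
    raw = @patron_name_ciphertext.unpack1("m0")
    cipher = OpenSSL::Cipher.new("aes-256-gcm").decrypt
    cipher.key = @key
    cipher.iv = raw[0, 12]
    cipher.auth_tag = raw[12, 16]
    (cipher.update(raw[28..-1]) + cipher.final).force_encoding(Encoding::UTF_8)
  end
end
```

Someone holding only a database dump sees the Base64 string; without the 32-byte key from the application server, reading the name back fails.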
Our solution: per-column encryption with the lockbox gem
We weighed our options: 0) and 1) were too bureaucratic and not particularly secure either. The relative merits of 2) and 3) are debated at length in this post and others like it. We eventually settled on 3) as the path that affords us the best security given that our web server and DB are on separate servers.
Within 3), and given that our site is a Ruby on Rails site, we gave two tools a test drive: attr_encrypted and lockbox. The Securing Sensitive Data in Rails post mentioned above was written by lockbox’s author, ankane, which raised our confidence that he had the background to implement encryption correctly. After tinkering with each, it appeared that both lockbox and attr_encrypted worked as advertised, but lockbox seemed better designed: it comes with fewer initial settings for us to agonize over, yet offers a variety of ways to customize it later on should we be unsatisfied with the defaults. Furthermore:
- lockbox works with blind indexing, whereas in attr_encrypted searches and joins on the encrypted data are not available. We do not currently need to search on the columns, and these requests are fairly infrequent (perhaps a hundred in any given year, with only a few active at a time.) But it’s good to know we won’t have to switch encryption libraries in the future if we did need that functionality.
- lockbox offers better support for key management services such as Vault, AWS KMS, and Google Cloud KMS, which we consider the logical next step in securing data. For now we’re just leaving keys on the disk of the servers that need them, but we may take this next step eventually. If we were storing birth dates or social security numbers, we would probably up the priority of this.
- attr_encrypted has not been updated for over a year, whereas lockbox is under active development.
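For readers unfamiliar with blind indexing, the general idea is to store a keyed hash of the value next to the encrypted column, so that exact-match lookups remain possible without decrypting anything. The sketch below uses HMAC-SHA256 purely for simplicity; the function name blind_index, the normalization rules, and the separate index key are assumptions for illustration, and this is not a description of how any particular gem computes its index.

```ruby
require "openssl"

# Hypothetical helper: a keyed, deterministic fingerprint of a value.
# The index key should be different from the encryption key.
def blind_index(value, index_key)
  normalized = value.strip.downcase # so "Ada@Example.org " matches "ada@example.org"
  OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new("SHA256"), index_key, normalized)
end
```

To search, the app would compute the same fingerprint for the search term and compare it against the stored fingerprint column. Because the hash is deterministic, equal values produce equal fingerprints, which is what makes lookups work and is also the trade-off to keep in mind.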
We had a proof of concept up and running on our development server within an afternoon, and it only took a few days to get things working in production, with some basic tests.
An important part of deciding to use lockbox was figuring out what to do if someone did gain access to our encryption key. The existing documentation for Lockbox key rotation was a bit sparse, but this was quickly remedied by Andrew Kane, the developer of Lockbox, once we reached out to him. The key realization (pardon the pun) was that Lockbox uses both a master key and a series of secondary keys for each encrypted column. The secondary keys are the product of a recipe that includes the master key and the names of the tables and columns to be encrypted.
If someone gets access to your key, you currently need to:
- figure out what all your secondary keys are
- use them to decrypt all your stuff
- generate a new master key
- re-encrypt everything using your new keys
- burn all the old keys.
However, Andrew, within hours of us reaching out via a GitHub Issue, added some code to Lockbox that drastically simplifies this process; this will be available in the next release.
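To show what the manual steps listed above involve, here is a rough sketch in plain Ruby. It is emphatically not Lockbox’s code: the derivation in column_key (an HMAC over the table and column names) is a made-up stand-in for Lockbox’s actual recipe, and seal, unseal, and rotate! are hypothetical helpers. Rows are modelled as an array of hashes rather than database records.

```ruby
require "openssl"

# Hypothetical per-column key derivation (NOT Lockbox's actual recipe):
# a secondary key computed from the master key plus table and column names.
def column_key(master_key, table, column)
  OpenSSL::HMAC.digest(OpenSSL::Digest.new("SHA256"), master_key, "#{table}/#{column}")
end

# AES-256-GCM encryption; returns nonce + auth tag + ciphertext as raw bytes.
def seal(plaintext, key)
  cipher = OpenSSL::Cipher.new("aes-256-gcm").encrypt
  cipher.key = key
  nonce = cipher.random_iv
  body = cipher.update(plaintext) + cipher.final
  nonce + cipher.auth_tag + body
end

# Reverse of seal; raises OpenSSL::Cipher::CipherError on a wrong key.
def unseal(blob, key)
  cipher = OpenSSL::Cipher.new("aes-256-gcm").decrypt
  cipher.key = key
  cipher.iv = blob[0, 12]
  cipher.auth_tag = blob[12, 16]
  cipher.update(blob[28..-1]) + cipher.final
end

# Work out the old and new secondary keys for each column, then decrypt
# every value with the old key and re-encrypt it with the new one.
def rotate!(rows, table:, columns:, old_master:, new_master:)
  columns.each do |column|
    old_key = column_key(old_master, table, column)
    new_key = column_key(new_master, table, column)
    rows.each { |row| row[column] = seal(unseal(row[column], old_key), new_key) }
  end
  rows
end
```

Once every row has been rewritten, the old master key can be destroyed. In a real application the loop would run against the database, probably in batches, and the old key should only be discarded after verifying that everything decrypts with the new one.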
It’s worth noting in retrospect how many choices were available to us, and thus how much research was needed to narrow them down. The time-consuming part was figuring out what to do; once we had made up our minds, the actual work of implementing our chosen solution took only a few hours, some of which were spent being confused by parts of the lockbox documentation that have since been improved. Lockbox is a great piece of software, and our pull request to implement it in our app is notably concise.
If you have been thinking you maybe should be treating patron data more securely in your Rails app, but thought you didn’t have time to deal with it, we recommend considering lockbox. It may be easier and quicker than you think!
Another byproduct of our investigations was a heightened awareness of technological security in the rest of our organization, which is of course a never-ending project. Where else might this same data be stored that is even less secure than our Rails app? In a nonprofit with over a hundred employees, there are always some data stores that are guarded more securely than others, and focusing so carefully on a particular tool naturally leads one to notice other areas where we will want to do more. One day at a time!