Encrypting patron data (in Rails): why and how

Special guest post by Eddie Rubeiz

I’m Eddie Rubeiz. Along with the owner of this blog, Jonathan Rochkind, and our system administrator, Dan, I work on the Science History Institute’s digital collections website, where you will find, among other marvels, this picture of the inventor of Styrofoam posing with a Santa “sculpture”, which predates the invention of the term “Styrofoam”:

Ray McIntire posed with Styrofoam Santa Claus

Our work, unlike the development of polystyrene, is not shrouded in secret. That is as it should be: we are a nonprofit, and the files we store are mostly in the public domain. Our goal is to remove as many barriers to access as we can, to make our public collection as public as it can be. Most of our materials are open to the public and don’t require us to collect much personal information. So what use could we have for encryption?

Sensitive Data

Well, once in a while, a patron will approach our staff asking that a particular physical item in our collections be photographed. The patron is often a researcher who’s already working with our physical materials. In some of those cases, we determine the item — a rare book, or a scientific instrument, for instance — is also a good fit with the rest of our digital collections, and we add it in to our queue so it can be ingested and made available not just to the researcher, but to the general public.

In many cases, by the time we determined an item was a good fit, we had already done much of the work of cataloging it. The resulting pile of metadata, stored in a Google spreadsheet, then had to be copied and pasted from our request spreadsheet to our digitization queue. To save time over the long run, we decided last December to track these requests inside our Rails-based digital collections web app, allowing us to manage the entire pipeline in one place, from the moment a patron asks us to photograph an item all the way to the point where it is presented, fully described and indexed, to the public.

Accepting patrons’ names and addresses into our database is problematic. As librarians, we’re inclined to encrypt this information; as software developers, we’re wary of the added complexity of encryption, and all the ways we might get it wrong. On the one hand, you don’t want private information to be seen by an attacker. On the other hand, you don’t want to throw out the only copy of your encryption key, out of an excess of caution, and find yourself locked out of your own vault. Encryption tends to be difficult to understand, explain, install, and maintain.

Possible Security Solutions

This post on Securing Sensitive Data in Rails offers a pretty good overview of data security options in a Ruby/Rails context, and was very helpful in getting us started thinking about it.

Here are the solutions we considered:

0) Don’t store the names or emails at all. Instead, we could use arbitrary IDs to allow everyone involved to keep track of the request. (Think of those pager buzzers some restaurants hand out, which buzz when your table is ready. They allow the restaurant greeters to avoid keeping track of your name and number in much the same way.) The person who handled the initial conversation with the patron, not our database, would thus be in charge of keeping track of which ID goes with which patron.

1) Disk-level encryption: simply encrypt the drives the database is stored on. If those drives are stolen, an attacker needs the encryption key to decipher anything on the drives — not just the database. Backup copies of the database stored in other unsecured locations remain vulnerable.

2) Database-level encryption: the database encrypts and decrypts data using a key that is sent (with every query) by the database adapter on the webserver. (See e.g. PGCrypto for ActiveRecord). See also the postgres documentation on encryption options. One challenge with this approach, since the encryption key is sent with many db queries, is keeping it out of any logs.

3) Encrypt just the names and emails — per-column encryption — at the application level. The data is stored encrypted; the app is in charge of decrypting it as it reads it, and re-encrypting it before writing it back to the database. If an attacker gets hold of the database, they get all of our collection info (which is public anyway), plus two columns of encrypted gobbledygook. To read these columns, the attacker would need the key. In the simplest case, they could obtain this by breaking into one of our web/application servers (on a different machine). But at least our DB backups alone are secure and don’t need to be treated as if they held confidential info.

Our solution: per-column encryption with the lockbox gem

We weighed our options: 0) and 1) were too bureaucratic and not particularly secure either. The relative merits of 2) and 3) are debated at length in this post and others like it. We eventually settled on 3) as the path that affords us the best security given that our web server and DB are on separate servers.

Within 3), and given that our site is a Ruby on Rails site, we gave two tools a test drive: attr_encrypted and lockbox. That post I mentioned before, Securing Sensitive Data in Rails, was by lockbox’s author, ankane, which raised our confidence that the lockbox author had the background to implement encryption correctly. After tinkering with each, it appeared that both lockbox and attr_encrypted worked as advertised, but lockbox seemed better designed, coming with fewer initial settings for us to agonize over, while offering a variety of ways to customize it later on should we be unsatisfied with the defaults. Furthermore:

  • lockbox works with blind indexing, whereas attr_encrypted does not offer searches or joins on the encrypted data. We do not currently need to search on the columns, and these requests are fairly infrequent (perhaps a hundred in any given year, with only a few active at a time), but it’s good to know we won’t have to switch encryption libraries in the future if we do need that functionality.
  • lockbox offers better support for key management services such as Vault, AWS KMS, and Google Cloud KMS, which we consider the logical next step in securing data. For now we’re just leaving keys on the disk of servers that need them, but we may take this next step eventually — if we were storing birth dates or social security numbers, we would probably up the priority of this.
  • attr_encrypted has not been updated for over a year, whereas lockbox is under active development.

We had a proof of concept up and running on our development server within an afternoon, and it only took a few days to get things working in production, with some basic tests.
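
For anyone curious what that looks like, here is a minimal sketch of the lockbox setup (the model and column names are hypothetical, not our actual schema, and depending on your lockbox version the macro is `encrypts` or `has_encrypted`):

# Gemfile
gem "lockbox"

# config/initializers/lockbox.rb
Lockbox.master_key = ENV["LOCKBOX_MASTER_KEY"] # generate one with Lockbox.generate_key

# migration: lockbox stores encrypted values in *_ciphertext columns
add_column :digitization_requests, :name_ciphertext, :text
add_column :digitization_requests, :email_ciphertext, :text

# app/models/digitization_request.rb
class DigitizationRequest < ApplicationRecord
  encrypts :name, :email # `has_encrypted` in newer lockbox versions
end

# Encryption and decryption are transparent to calling code:
request = DigitizationRequest.create!(name: "A. Patron", email: "patron@example.com")
request.name # => "A. Patron", decrypted on read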

An important part of deciding to use lockbox was figuring out what to do if someone did gain access to our encryption key. The existing documentation for Lockbox key rotation was a bit sparse, but this was quickly remedied by Andrew Kane, the developer of Lockbox, once we reached out to him. The key realization (pardon the pun) was that Lockbox uses both a master key and a series of secondary keys, one for each encrypted column. The secondary keys are the product of a recipe that includes the master key and the names of the tables and columns to be encrypted.
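
Lockbox exposes that recipe, which is handy when you’re reasoning about rotation. Something like this (a sketch; the table and column names are again hypothetical):

# A fresh master key, e.g. for rotation:
Lockbox.generate_key

# The per-column "secondary" key, derived from the master key
# plus the table and column names:
Lockbox.attribute_key(table: "digitization_requests", attribute: "name_ciphertext")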

If someone gets access to your key, you currently need to:

  • figure out what all your secondary keys are
  • use them to decrypt all your stuff
  • generate a new master key
  • re-encrypt everything using your new keys
  • burn all the old keys.

However, Andrew, within hours of our reaching out via a GitHub issue, added some code to Lockbox that drastically simplifies this process; this will be available in the next release.

It’s worth noting in retrospect how many choices were available to us, and thus how much research was needed to narrow them down. The time-consuming part was figuring out what to do; once we had made up our minds, the actual work of implementing our chosen solution took only a few hours, some of which were spent being confused by parts of the lockbox documentation that have since been improved. Lockbox is a great piece of software, and our pull request to implement it in our app is notably concise.

If you have been thinking you maybe should be treating patron data more securely in your Rails app, but thought you didn’t have time to deal with it, we recommend considering lockbox. It may be easier and quicker than you think!

Another byproduct of our investigations was a heightened awareness of technological security in the rest of our organization, which is of course a never-ending project. Where else might this same data be stored that is even less secure than our Rails app? In a nonprofit with over a hundred employees, there are always some data stores that are guarded more securely than others, and focusing so carefully on a particular tool naturally leads one to notice other areas where we will want to do more. One day at a time!

Intentionally considering fixity checking

In our digital collections app rewrite at the Science History Institute, we took a moment to step back and be intentional about how we approach “fixity checking” features and UI, to make sure they actually support the needs they’re meant to serve. I think we do a good job of providing UI that lets repository managers and technical staff get a handle on a reliable fixity checking service, and others may be interested in seeing it as an example to consider. Much of our code was implemented by my colleague Eddie Rubeiz.

What is “fixity checking”?

In the field of digital preservation, “fixity” and “fixity checking” basically just means:

  • Having a calculated checksum/digest value for a file
  • Periodically recalculating that value and making sure it matches the recorded expected value, to make sure there has been no file corruption.

See more at the Digital Preservation Coalition’s Digital Preservation Handbook.
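
In Ruby terms, the core of it is just this (a sketch using the stdlib Digest API; the file name is made up):

require "digest"

# At ingest time, record a checksum for the file...
expected_sha512 = Digest::SHA512.file("original_scan.tiff").hexdigest

# ...and later, periodically, recompute it and compare:
actual_sha512 = Digest::SHA512.file("original_scan.tiff").hexdigest
if actual_sha512 == expected_sha512
  puts "fixity check passed"
else
  puts "fixity check FAILED: file may be corrupt"
end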

Do we really need fixity checking?

I have been part of some conversations with peers wondering if we really need to be doing all this fixity checking. Modern file/storage systems are pretty good at preventing byte corruption, whether on-premises or cloud PaaS, and many have their own low-level “fixity checking” with built-in recovery happening anyway. And it can get kind of expensive doing all that fixity checking, whether in cloud platform fees or local hardware, or just time spent on the systems. Reports of actual fixity check failures (that are not false positives) happening in production are rare to possibly nonexistent.

However, I think everyone I’ve heard questioning it is still doing it. We’re not sure we don’t need it, industry/field best practices still mostly suggest doing it, and we’re a conservative/cautious bunch.

Myself, I was skeptical of whether we needed to do fixity checking — but when we did our data migration to a new system, it was super helpful to have the feature available to help ensure all data was migrated properly. Now I think it’s probably worthwhile to have the feature in a digital preservation system, but it’s probably good enough to “fixity check” files way less often than many of us do, maybe as infrequently as once a year?

But, if we’re gonna do fixity checking, we might as well do it right, and reliably.

Pitfalls of Fixity Check Features, and Requirements

Fixity checks are something you need for reliability, but might rarely use or even look at — and that means it’s easy to have them not working and have nobody notice. It’s a “requirements checklist” thing: institutions want to be able to say the app supports it, but some may not actually prioritize spending much time to make sure it’s working, or that the exposed UI is good enough to accomplish its purpose.

And in fact, when we were implementing the first version of our app on sufia (the predecessor to hyrax), we realized that the UI in sufia for reporting fixity check results on a given file object seemed to be broken, and we weren’t totally sure it was recording/keeping the results of its checks. (If a fixity check fails in a forest, and…) This may have been affecting other institutions who hadn’t noticed either, not sure. It’s sort of like thinking you have backups but never testing them; it’s a pitfall of “just in case” reliability features. (I did spend a chunk of time understanding what was going on and submitting code to hyrax to fix it up a bit.)

If you have an app that does regular fixity checking, it’s worth considering: Are you sure it’s happening, instead of failing to run (properly or at all) due to an error? How would you check that? Do you have the data and/or UX you need to be confident fixity checking is working as intended, in the absence of any fixity failures?

A fixity check system might send a “push” alert in case of a fixity check failure — but that will usually be rare to nonexistent.  We decided that in addition to being able to look at current fixity check status on an individual File/Asset — we need some kind of “Fixity Health Summary” dashboard, that tells you how many fixity checks have been done, which Files (if any) lack fixity checks, if any haven’t gotten a fixity check in longer than expected, total count of any failing fixity check, etc.

This still relies on someone to look at it, but at least there is some way in the UI to answer the question “Are fixity checks happening as expected?”

Fixity Check record data model

Basically following the lead set by sufia/hyrax, we keep a history of multiple past fixity checks.

In our new app, which uses ordinary ActiveRecord with a postgres rdbms, it’s just a one-to-many association between Asset (our file model class) and a FixityCheck model.
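
Roughly like this (a simplified sketch; the `passed` column and other names are illustrative, not necessarily our exact schema):

class Asset < ApplicationRecord
  has_many :fixity_checks, dependent: :destroy
end

class FixityCheck < ApplicationRecord
  belongs_to :asset

  scope :failed, -> { where(passed: false) }
end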

Having many fixity statuses on record instead of one did end up significantly complicating the code compared to keeping only the latest fixity check result, because you often want to do SQL queries based on the date and/or status of the latest fixity check, and getting “the record from the set of associated FixityChecks with the latest date” can be kind of tricky to do in SQL, especially when fetching or reporting over many/all of your Assets.
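
For what it’s worth, postgres’s DISTINCT ON is one way to get at “the latest check per asset”; something like this sketch (table and column names as in the simplified models above):

# One row per asset: its most recent fixity check
latest_checks = FixityCheck.find_by_sql(<<~SQL)
  SELECT DISTINCT ON (asset_id) *
  FROM fixity_checks
  ORDER BY asset_id, created_at DESC
SQL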

Still, it might be a good idea/requirement? I’m not really sure, or sure what I’d do if I had it to do over, but it ended up this way in our implementation.

We also don’t want to keep every past fixity check on record — it would eventually fill up our database if we’re doing regular fixity checks. So what do we want to keep? If a record keeps passing fixity every day, there’s no new info from keeping them all, so we decided to mostly just keep the fixity checks that establish windows on status changes. (I think Hyrax does something similar at present).

  • The first fixity check
  • The N most recent fixity checks (where N may be 1)
  • Any failed checks.
  • The check right before or right after any failed check, to establish the maximum window that the item may have been failing fixity, as a sort of digital provenance context. (The idea is that maybe something failed, and then you restored it from backup, and then it passed again).

We have some code that looks through all fixity checks for a given work and deletes any checks not spec’d as keepable above, which we normally call after recording any additional fixity check.
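
A sketch of those pruning rules (the method and attribute names here are hypothetical, and our real code differs in details):

def prune_fixity_checks(asset, keep_n_latest: 5)
  checks = asset.fixity_checks.order(created_at: :asc).to_a

  keep = [checks.first]               # the first check ever recorded
  keep += checks.last(keep_n_latest)  # the N most recent checks

  checks.each_with_index do |check, i|
    next if check.passed?
    keep << check                           # any failed check...
    keep << checks[i - 1] if i > 0          # ...plus the check just before it
    keep << checks[i + 1] if checks[i + 1]  # ...and the one just after it
  end

  (checks - keep).each(&:destroy)
end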

Our “FixityCheck” database table includes a bunch of detail about exactly what happened: the date of the fixity check, status (success or failure), expected and actual digest values, the location of the file checked (i.e. S3 bucket and path), as well as of course the foreign key to the Asset “model” object that the file corresponds to.

We also store the digest algorithm used. We use SHA512, due to the general/growing understanding that MD5 and SHA1 are outdated and should not be used, and SHA512 is a good choice. But we want to record this in the database for record-keeping purposes, and to accommodate any future change of digest algorithm, which may require historical data points using different algorithms to coexist in the database.
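
Put together, the table looks roughly like this (a sketch; our actual column names differ a bit):

create_table :fixity_checks do |t|
  t.references :asset, null: false, foreign_key: true
  t.boolean :passed, null: false   # status: success or failure
  t.string  :expected_result       # the digest we have on record
  t.string  :actual_result         # the digest we just calculated
  t.string  :hash_function         # e.g. "SHA-512"
  t.string  :checked_uri           # e.g. the S3 bucket/key that was checked
  t.timestamps                     # created_at doubles as the date of the check
end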

The Check: Use shrine API to calculate via streaming bytes from cloud storage

The process of doing a fixity check is pretty straightforward — you just have to compute a checksum!

Because we’re going to be doing this a lot, on some fairly big files (generally we store ~100MB TIFFs, but we have some even larger ones), we want the code that does the check to be as efficient as possible.

Our files are stored in S3, and we figured doing it as efficiently as possible means calculating the SHA512 from a stream of bytes being read from S3, without ever storing them to disk. Reading/writing from disk is actually a pretty slow thing for a process to do, and also risks clogging up disk IO pipelines if lots of processes are doing it at once. And by streaming, calculating iteratively from the bytes as we fetch them over the network (which the SHA512 algorithm and most other modern digest algorithms support), we get the result faster.

We are careful to use the proper shrine API to get a stream from our remote storage (avoiding shrine caching the read bytes to disk), and to pass it to the proper ruby OpenSSL::Digest API to calculate the SHA512 from the streamed bytes. Here is our implementation. (Shrine 3.0 may make this easier).
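
The gist of it is something like this sketch (the real, linked implementation handles more edge cases; `asset.file` stands in for the shrine UploadedFile, and `expected` for the digest we have on record):

require "openssl"

digest = OpenSSL::Digest.new("SHA512")

# shrine streams the bytes from S3; rewindable: false avoids buffering
# the whole thing to a tempfile on disk.
asset.file.open(rewindable: false) do |io|
  while (chunk = io.read(64 * 1024))
    digest << chunk
  end
end

passed = (digest.hexdigest == expected)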

Calculate for 1/Nth of all assets per night

If our goal is to fixity check every file once every 7 days, then we want to spread that out by checking 1/7th of our assets every night. In fact, we wanted to parameterize that as N: although N==7 for us at present, we want the freedom to make it a lot higher without a code rewrite. To keep it less confusing, I’ll keep writing as if N is 7.

At first, we considered just taking an arbitrary 1/7th of all Assets: take the Asset PK, turn it into an integer with a random distribution (say, MD5 it, I dunno, whatever), and modulo 7.

But we decided that instead taking the 1/7th of Assets that have been least recently checked (or never checked; sort nulls first) has some nice properties. You always check the things most in need of being checked, including recently created assets without a check yet. If some error keeps something from being checked or having a check recorded, it’ll still be first in line for the next nightly check.

It’s a little bit tricky to find that list of things to check in SQL because of our data model, but a little “group by” will do it; here’s our code. We use ActiveRecord find_each to make sure we’re being efficient with memory use when iterating through thousands+ of records.
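
The selection ends up looking roughly like this (a sketch, not our exact code; FixityChecker here is a hypothetical service object):

n = 7
batch_size = (Asset.count / n.to_f).ceil

# The 1/Nth of assets whose latest check is oldest, never-checked first:
asset_ids = Asset.left_outer_joins(:fixity_checks).
  group("assets.id").
  order(Arel.sql("max(fixity_checks.created_at) ASC NULLS FIRST")).
  limit(batch_size).
  pluck("assets.id")

Asset.where(id: asset_ids).find_each do |asset|
  FixityChecker.new(asset).check
end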

We also batch the writes of results in postgres transactions to try to speed things up yet further (not actually sure how well that works). Here’s our rake task for doing nightly fixity checking, which can show a nice progress bar when run interactively. We think it’s important to have good “developer UI” for all this stuff if you actually want it to be used regularly — the more frustrating it is to use, the less it will get used; developers are users too!

It ends up taking somewhere around 1-2s per File to check fixity and record the check, for files that are typically 100MB or so each. The time it takes to fixity check a file mainly scales with the size of the file, and we think it’s mostly spent waiting on streaming the bytes from S3 (even more than the CPU time of actually calculating the digest). So it should be pretty parallelizable, although we haven’t really tried parallelizing it, cause this is fast enough for us at our scale. (We have around ~25K Files, 1.5TB of total original content).

Notification UI

If a fixity check fails, we want a “push” notification to actually contact someone and tell them it failed. Currently we do that by both sending an email and registering an error with the Honeybadger error reporting service we already use. (Since we already have honeybadger errors reported to a Slack channel via a honeybadger integration, this means it goes to our Slack too.)
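
Conceptually it’s just this (column names follow the earlier sketches; FixityMailer is a hypothetical mailer; Honeybadger.notify is the standard honeybadger-ruby API):

unless check.passed?
  FixityMailer.failure_email(check).deliver_later
  Honeybadger.notify("Fixity check failure",
    context: { asset_id: check.asset_id, expected: check.expected_result, actual: check.actual_result })
end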

Admin UI for individual asset fixity status

In the administration page for an individual Asset, you want to be able to confirm the fixity check status, and when the last time a check happened was. Also, you might want to see when the earliest fixity check on record is, and look at the complete recorded history of fixity checks (what’s the point of keeping them around if you aren’t going to show them in any admin UI?)

[Screenshot: fixity check status and history on an individual Asset admin page]

That “Fixity check history” link is a little expand/contract collapsible control, the history underneath it does not start out expanded. Note it also confirms the digest algorithm used (sha512), and what the actual recorded digest checksum at that timestamp was.

As you can see, we also give a “Schedule a check now” button — this actually queues up a fixity check as a background ActiveJob, which usually completes within 10 or 20 seconds. This “schedule now” button is useful if you have any concerns, or are trying to diagnose or debug something.
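
The button itself just enqueues something like this hypothetical ActiveJob:

class FixityCheckJob < ApplicationJob
  queue_as :default

  def perform(asset)
    FixityChecker.new(asset).check
  end
end

# in the controller action behind the button:
FixityCheckJob.perform_later(asset)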

If there’s a failure, you might need a bit more information:

[Screenshot: Asset admin page showing a failed fixity check with additional details]

The actual as well as expected digest value; the postgres PK for the row recording this logged info, for a developer to really get into it; and a reverse-engineered AWS S3 Console URL that will (after you log in to your AWS account with privs) take you to the S3 console view of the key, so you can investigate the file directly in S3, download it, whatever.

(Yeah, all our files are in S3).

Fixity Health Dashboard Admin UI

As discussed, we decided it’s important not just to be able to see fixity check info for a specified known item, but to get a sense of general “fixity health”.

So we provide a dashboard that most importantly will tell you:

  • If any assets have a currently failing fixity check
  • If any assets haven’t been fixity checked in longer than expected (for us at present, the last 7 days).
    • But there may be db records for Assets that are still in the process of ingesting; these aren’t expected to have fixity checks (although if they are days old and still not fully ingested, it may indicate a problem in our ingest pipeline!)
    • And if an Asset was ingested only in the past 24 hours, maybe it just hasn’t gotten its first nightly check, so that’s also okay.

It gives some large red or green thumbs-up or thumbs-down icons based on these values, so a repository/collections manager who may not look at this page very often or be familiar with the details of what everything means can immediately know if fixity check health is good or bad.

[Screenshot: fixity health dashboard showing a green/healthy summary status]

In addition to the big green/red health summary info at the top, there’s some additional “Asset and Fixity Descriptive Statistics” that will help an administrator, especially a more technical staff member, get more of a sense of what’s going on with our assets and their fixity checks in general/summary, perhaps especially useful for diagnosing a ‘red’ condition.

Here’s another example from a very unhealthy development instance. You can see the list of assets failing fixity check is hyperlinked, so you can go to the administrative page for that asset to get more info, as above.

[Screenshot: fixity health dashboard on a development instance showing a red/failing summary status]

The nature of our schema (a one-to-many asset-to-fixity-checks instead of a single fixity check on record) makes it a bit tricky to write the SQL for this, involving GROUP BYs and inner subqueries and such.

The SQL is also a bit expensive, despite trying to index what can be indexed — I think whole-table aggregate statistics are just inherently expensive (at least in postgres). Our fixity health summary report page can take ~2 seconds to return in production, which is not terrible by some standards, but not great — and we have a much smaller corpus than some; it will presumably get slower roughly linearly with the number of Assets. One approach to dealing with that I can think of is caching (possibly with calculation in a bg job), but it’s not bad enough for us to require that attention at present.

Current mysteries/bugs

So, we’re pretty happy with this feature set — although fixity check features are something we don’t actually use or look at that much, so I’m not sure what being happy with it means — but if we’re going to do fixity checking, we might as well do our best to make it reliable and give collection/repository managers the info they need to know it’s being done reliably. We think we’ve done pretty well here, and better than a lot of things we’ve seen.

There are a couple outstanding mysteries in our code.

  1. While we thought we wrote things and set things up to fixity check 1/7th of the collection every night… it seems to be checking 100% of the collection every night instead. We haven’t spent the time to get to the bottom of that and find the bug.
  2. For a while, we were getting fixity check failures that seemed to be false positives. After a failure, if we went to the Asset detail page for the failed asset and clicked “schedule fixity check now” — it would pass. (This is one reason that “fixity check now” button is a useful feature!) Not sure if there’s a race condition or some other kind of bug in our code (or shrine code) that fetches bytes. Or it could just have been a byproduct of some of our syncing/migration logic that was in operation before we went fully live with the new site — I don’t believe we’ve gotten any fixity failures since we actually cut over to the newly launched site, so possibly we won’t ever again and won’t have to worry about it. But in the interests of full disclosure, we wanted to admit it.

Sprockets 4 and your Rails app

Sprockets 4.0 was released on October 8th 2019, after several years of beta, congratulations and hooray.

There are a couple of confusing things that may give you trouble trying to upgrade to sprockets 4 that aren’t covered very well in the CHANGELOG or upgrade notes, although now that I’ve taken some time to understand them, I may try to PR an addition to the upgrade notes. The short version:

  1. If your Gemfile has `gem 'sass-rails', '~> 5.0'` in it (or just '~> 5'), that will prevent you from upgrading to sprockets 4. Change it to `gem 'sass-rails', '~> 6.0'` to get a sass-rails that will allow sprockets 4 (and, bonus, will use the newer sassc gem instead of the deprecated, end-of-lifed pure-ruby sass gem).
  2. Sprockets 4 changes the way it decides what files to compile as top-level aggregated compiled assets. And Rails (in 5.2 and 6) is generating a sprockets 4 ‘config’ file that configures something that is probably inadvisable and likely to do the wrong thing with your existing app.
      • If you are seeing an error like Undefined variable: $something, this is probably affecting you, but it may be doing something non-optimal even without an error. (relevant GH Issue)
      • You probably want to go look at your ./app/assets/config/manifest.js file and change //= link_directory ../stylesheets .css to //= link application.css.
      • If you are not yet on Rails 6, you probably also have a //= link_directory ../javascripts .js; change this to //= link application.js.
      • This still might not get you all the way to compatibility with your existing setup, especially if you had additional top-level target files. See details below.

The Gory Details

I spent some hours trying to make sure I understood everything that was going on. I explored both a newly generated Rails 5.2.3 and 6.0.0 app; I didn’t look at anything earlier. I’m known for writing long blog posts, cause I want to explain it all! This is one of them.

Default generated Rails 5.2 or 6.0 Gemfile will not allow sprockets 4

Rails 5.2.3 will generate a Gemfile that includes the line:

gem 'sass-rails', '~> 5.0'

sass-rails 5.x expresses a dependency on sprockets < 4, so it won’t allow sprockets 4.0.0.

This means that a newly generated default rails app will never use sprockets 4.0. And you can’t get to sprockets 4.0 by running any invocation of bundle update, because your Gemfile links to a dependency requirement tree that won’t allow it.

The other problem with sass-rails 5.x is that it depends on the deprecated and end-of-lifed pure-ruby sass gem. So if you’re still using it (say with a default generated Rails app), you may be seeing its “please don’t use this” deprecation messages too.

So some people may have already updated their Gemfile. There are a couple ways you can do that:

  • You can change the dependency to gem 'sass-rails', '~> 6.0' (or '>= 5.0', which is what an upcoming Rails release will probably do).
  • But sass-rails 6.0 is actually a tiny little wrapper over a different gem, sassc-rails (which itself depends on the non-deprecated sassc instead of the deprecated pure-ruby sass). So you can also just change your dependency to gem 'sassc-rails', '~> 2.0',
  • which you may have already done if you wanted to get rid of ruby-sass deprecation warnings before sass-rails 6 was released. (Not sure why they decided to release sass-rails 6 as a very thin wrapper around sassc-rails 2.x, and to have Rails still generate a Gemfile with sass-rails.)
  • Either way, you will then have a dependency requirement tree which allows any sprockets `> 3.0` (which is still an odd dependency spec; 3.0.0 isn’t allowed, but 3.0.1 and higher are? It probably meant `>= 3.0`? Which is still kind of dangerous for allowing future sprockets 5, 6, or 7 too…) — anyway, it allows sprockets 3 or 4.

Once you’ve done that, if you do a bundle update now that sprockets 4 is out, you may find yourself using it even if you didn’t realize you were about to do a major version upgrade. Same if you do bundle update somegem: if somegem or something in its dependency tree depends on sprockets-rails or sprockets, you may find it upgraded sprockets when you weren’t quite ready to.

Now, it turns out Rails 6.0.0 apps are in exactly the same spot; all of the above applies to them too. Rails intended to have 6.0 generate a Gemfile which would end up allowing sass-rails 5.x or 6.x, and thus sprockets 3 or 4.

It did this by generating a Gemfile with a dependency that looks like ~> 5, which they thought meant `>= 5` (I would have thought so too), but it turns out it doesn’t; it seems to mean the same thing as ~> 5.0, so basically Rails 6 is still in the same boat. That was fixed in a later commit, but not in time for the Rails 6.0.0 release — Rails 6.1 will clearly generate a Gemfile that allows sass-rails 5/6+ and sprockets 3/4+; not sure about a future 6.0.x.

So, Rails 5.2 won’t allow you to upgrade to sprockets 4 without a manual change, and it turns out that, accidentally, Rails 6 won’t either. That might be confusing if you are trying to update to sprockets 4, but it actually (accidentally) saves you from unintentionally upgrading to sprockets 4 and finding a problem with how top-level targets are determined. (Although if, even before sprockets 4 came out, you were allowing sass-rails 6.x to avoid deprecated ruby-sass… you will be able to get sprockets 4 with bundle update, accidentally or on purpose).

Rails-Sprockets built-in logic for determining top-level compile targets CHANGES depending on Sprockets 3 or 4

The sprockets-rails gem actually has a conditional for applying different logic depending on whether you are using Sprockets 3 or 4.  Rails 5.2 or 6 won’t matter; but in either Rails 5.2 or 6, changing from Sprockets 3 to 4 will change the default logic for determining top-level compile targets (the files that can actually be delivered to the browser, and will be generated in your public/assets directory as a result of rake assets:precompile).

This code has been in sprockets-rails since sprockets-rails 3.0, released in December 2015(!). The preparations for sprockets 4 are a long time coming.

This means that switching from Sprockets 3 to 4 can mean that some files you wanted to be delivered as top-level targets no longer are, and other files that you did not intend to be are. In some cases, when sprockets tries to compile a file as a top-level target that was not intended as such, the file actually can’t be compiled without an error, and that’s when you get an error like Undefined variable: $something — it was meant as a sass “partial” to be compiled in a context where that variable was defined, but sprockets is trying to compile it as a top-level target.

rails-sprockets logic for Sprockets 3

If you are using sprockets 3, the sprockets-rails logic supplies a regexp basically saying the files `application.css` and `application.js` should be compiled as top-level targets. (That might apply to such files found in an engine gem dependency too? Not sure).

And it supplies a proc object that says any file that is in your local ./app/assets (or a subdir), and has a file extension, but that file extension is not `.js` or `.css`  => should be compiled as a top-level asset.

  • Actually not just .js and .css are excluded, but anything sprockets recognizes as compiling to .js or .css, so .scss is excluded too.

That is maybe meant to get everything in ./app/assets/images, but in fact it can get a lot of other things, if you happened to have put them there. Say ./app/assets/html/something.html or ./app/assets/stylesheets/images/something.png.

rails-sprockets logic for Sprockets 4

If you are using sprockets 4, sprockets-rails won’t supply that proc or regexp (and in fact proc and regexp args are not supported in sprockets 4, see below), but will tell sprockets to start with one file: manifest.js.

This actually means any file named manifest.js in any subdir of app/assets (maybe files from rails engine gems too?), but the intention is that this refers to app/assets/config/manifest.js.

The idea is that the manifest.js will include the sprockets link, link_directory, and link_tree methods to specify files to treat as top-level targets.

And, possibly surprising you, you probably already have that file there, because Rails has been generating it for new apps for some time. (I am not sure for how long, because I haven’t managed to find what code generates it. Can anyone find it? But I know that if you generate a new rails 5.2.3 or rails 6 app, you get this file even though you are using sprockets 3).

If you are using sprockets 3, this file was generated but not used, due to the code in sprockets-rails that does not set it up for use with sprockets 3. (I suppose you could have added it to Rails.application.config.assets.precompile yourself in config/initializers/assets.rb or wherever). But it was there, waiting to be used as soon as you switched to sprockets 4.

What is in the initial Rails-generated app/assets/config/manifest.js?

In Rails 5.2.3:

//= link_tree ../images
//= link_directory ../javascripts .js
//= link_directory ../stylesheets .css

This means:

  • Anything in your ./app/assets/images, including subdirectories
  • Anything directly in your `./app/assets/javascripts` (not including subdirs) that ends in `.js`.
  • Anything directly in your `./app/assets/stylesheets` (not including subdirs) that ends in `.css`.
    • So here’s the weird thing: it actually seems to mean “any file recognized as a CSS file” — files ending in `.scss` get included too. I can’t figure out how this works or is meant to work; can anyone find better docs for what the second arg to `link_directory` or `link_tree` does, or figure it out from the code, and want to share?

Some significant differences between sprockets 3 and sprockets 4 logic

An initially generated Rails 5.2.3 app has a file at ./app/assets/javascripts/cable.js. It is referenced with a sprockets require from the generated application.js; it is not intended to be a top-level target compiled separately. But a default generated Rails 5.2.3 app, once using sprockets 4, will compile the cable.js file as a top-level target, putting it in `public/assets` when you do rake assets:precompile. Which you probably don’t want.

It also means it will take any CSS file (including .scss) directly (not in a subdir) in ./app/assets/stylesheets and try to compile it as a top-level target. If you put some files here that were only intended to be `imported` by sass elsewhere (say, _mixins.scss), sprockets may try to compile them on their own and raise an error. Which can be a bit confusing, but it isn’t really a “load order problem”; it’s about trying to compile a file as a top-level target that wasn’t intended as such.

Even if it doesn’t raise an error, it’s spending time compiling them, and putting them in your public/assets, when you didn’t need/want them there.

Perhaps it was always considered bad practice to put something at the top-level `./app/assets/stylesheets` (or ./app/assets/javascripts?)  that wasn’t intended as a top-level target… but clearly this stuff is confusing enough that I would forgive anyone for not knowing that.

Note that the sprockets-rails code activated for sprockets3 will never choose any file ending in .js or .css as a top-level target, they are excluded. While they are specifically included in the sprockets4 code.

(Rails 6 is in an identical situation to the above, except it doesn’t generate a `link_directory` referencing assets/javascripts, because Rails 6 does not expect you to use sprockets for JS, but webpacker instead).

I am inclined to say the generated Rails code is a mistake, and it probably should be simply (the application.js line applies only in Rails 5.2, where sprockets still handles JS):

//= link_tree ../images
//= link application.js
//= link application.css

You may want to change it to that. If you have any additional things that should be compiled as top-level targets, you will have to configure them separately…

Options for configuring additional top-level targets

If you are using Sprockets 3, you are used to configuring additional top-level targets by setting the array at Rails.application.config.assets.precompile. (Rails 6 even still generates a comment suggesting you do this at ./config/initializers/assets.rb).

The array at config.assets.precompile can include filenames (not including paths), a regexp, or a proc that can look at every potential file (including files in engines I think?) and return true or false.
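
For illustration (this is a made-up config/initializers/assets.rb, not from a real app):

# Works under both sprockets 3 and 4: plain filenames
Rails.application.config.assets.precompile += %w( admin.js print.css )

# Works under sprockets 3 only: a regexp (sprockets 4 will raise on this)
Rails.application.config.assets.precompile << /\A.+\.svg\z/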

If you are using sprockets4, you can still include filenames in this array. But you can not include regexps or procs. If you try to include a regexp or proc, you’ll get an error that looks something like this:

`NoMethodError: undefined method `start_with?' for #`
...sprockets-4.0.0/lib/sprockets/uri_utils.rb:78:in `valid_asset_uri?'

While you can still include individual filenames, for anything more complicated you need to use sprockets methods in `./app/assets/config/manifest.js` (and sprockets really wants you to do this even for individual filenames).

The methods available are `link`, `link_directory`, and `link_tree`. The documentation isn’t extensive, but there’s some in the sprockets README, and a bit more in source code in a somewhat unexpected spot.

I find the docs a bit light, but from experimentation it seems to me that the first argument to link_directory and link_tree is a file path relative to the manifest.js itself (it does not use the “asset load path”), while the first argument to link is a file path relative to some dir in the “asset load path”; it will be looked up in all asset load paths (including rails engine gems) and the first one found will be used.

  • For instance, if you have a file at ./app/assets/images/foo/bar.jpg, you’d want //= link foo/bar.jpg, since all subdirs of ./app/assets/ end up in your “asset load path”.
  • I’m not sure where what I’m calling the “asset load path” is configured/set, but if you include a //= link for some non-existent file, you’ll conveniently get the “asset load path” printed out in the error message!

The new techniques are not as flexible/powerful as the old ones that allowed arbitrary proc logic and regexps (and I think the proc logic could be used for assets in dependent engine gems too). So you may have to move some of your intended-as-top-level-target source files to new locations so you can specify them with the available link/link_tree/link_directory functions, and/or refactor how you are dividing things between asset files generally.

What went wrong here? What should be fixed?

Due to conditional logic in sprockets 3/4, very different logic for determining top-level targets will be used when you update to sprockets 4. This has affected a lot of people I know, but it may affect very few people generally and not be disruptive? I’m not sure.

But it does seem like kind of a failure in QA/release management, making the upgrade to sprockets 4 not as backwards compatible as intended. While this roadblock was reported to sprockets during a 4.0 beta release back in January, and reported to Rails too in May, sadly neither issue received any comments or attention from any maintainers before or after the sprockets 4.0 release; the sprockets one is still open, and the rails one was closed as “stale” by rails-bot in August.

This all seems unfortunate, but the answer is probably just that sprockets continues to not really have enough maintainers/supporters/contributors working on it, even after schneems’s amazing rescue attempt.

If it had gotten attention (or if it does, as it still could) and resources for a fix… what, if anything, should be done? I think that Rails ought to be generating the ./app/assets/config/manifest.js with e.g. //= link application.css instead of //= link_directory ../stylesheets .css.

  • I think that would be closer to the previous sprockets 3 behavior, and would not do the ‘wrong’ thing with the Rails 5.2.3 cable.js file. (In Rails 6, sprockets doesn’t handle JS by default, so cable.js is not an issue for sprockets).
  • This would be consistent with the examples in the sprockets upgrading guide.

I think/guess it’s basically a mistake, from inconsistent visions for what/how sprockets/rails integration should or would work over many years with various cooks.

Since (by accident) no Rails version has yet been released which will use Sprockets 4 (and the generated manifest.js file) without a manual change to the Gemfile, it might be a very good time to fix this before an upcoming Rails release that does, because it will get even more confusing to change at a later date after that point.

The difficulties in making this so now:

  • I have been unable to find what code is generating this to even make a PR. Anyone?
  • Finding what code is generating it would also help us find commit messages from when it was added, to figure out what they were intending, why they thought this made sense.
  • But maybe this is just my opinion that the generated manifest.js should look this way. Am I wrong? Should (and will) a committer actually merge a PR if I made one for this? Or is there some other plan behind it? Is there anyone who understands the big picture? (As schneems himself wrote up in the Saving Sprockets post, losing the context brought by maintainers-as-historians is painful, and we still haven’t really recovered).
  • Would I even be able to get the attention of anyone with commit privs to possibly merge a PR, when the issues already filed didn’t get anyone’s attention? Maybe. My experience is that when nobody is really sure what the “correct” behavior is, and nobody’s really taking responsibility for the subsystem, it’s very hard to get committers to review/merge your PR; they are (rightly!) kind of scared of it and of risking “you broke it you own it” responsibility.

Help us schneems, you’re our only hope?

My other conclusion is that a lot of this complexity came from trying to make sprockets decoupled from Rails, so it can be used with non-Rails projects. The confusion and complexity here is all about the Rails/sprockets integration, with sprockets as a separate and decoupled project that doesn’t assume Rails, so it needs to be configured by Rails, etc. The benefits of this may have been large, it may have been worth it — but one should never underestimate the complexity and added maintenance burden of trying to make an independent decoupled tool, compared to something that can assume a lot more about its context; it significantly added to the difficulty of making sprockets predictable, comprehensible, and polished. We’re definitely paying the cost here; I think a new user to Rails is going to be really confused and overwhelmed trying to figure out what’s going on if they run into trouble.

 

open source, engineering professional ethics, complicity, and chef

So an open topic of controversy in open source philosophy/ideology/practice (/theology), among those involved in controversing on such things, has been “field of endeavor” restrictions. If I release software I own the copyright to as (quasi-)open source, but I try to say that legally you can’t use it for certain things, or the license suggests I have the legal right to withdraw permission for certain entities to be named later… is this truly “open source”? Is it practical at all, can we as developers get what we want out of shared collaborative gift-economy-esque software if everyone starts doing that? GPL/rms says it’s not workable to try it,  and the Open Source Initiative says it’s not “open source” if you try it. Both the GPL/”viral”/free-as-in-libre and the Apache/MIT-style/unencumbered/”corporate” sides of open source theology seem to agree on this one, so maybe the controversy hasn’t been all that open, but it comes up in internet arguments.

I’m honestly not sure how to work it all out in legal/licensing or social/practice-of-engineering systems, I don’t think there’s a pat answer, but I know I wouldn’t be happy about software I wrote and shared open source with “gift economy” intentions, to find it was being used — with no interaction with me personally — by, say, the Nazis in Nazi Germany, or, just another of course unrelated example, ICE/CBP. It would lead me to question how I had directed my labor, based on the results.

But that basic situation is NOT, in fact, quite what’s going on here, or at least all that’s going on here, in this article from Vice’s Motherboard, ‘Everyone Should Have a Moral Code’ Says Developer Who Deleted Code Sold to ICE, by Joseph Cox.

Rather than releasing open source software and discovering that someone had chosen to use it for unpleasant purposes on their own, Chef, Inc. instead seems to have a $100,000 contract with ICE of some kind, where Chef makes money helping or providing software to help ICE manage their information systems in some way (using the chef software).

And Seth Vargo used to work for Chef, Inc., but no longer does… but apparently still had admin permissions to code repos and release artifacts for some open source parts of chef. And maybe kept making open source code writing/reviewing/releasing contributions after he was no longer an employee? Not sure. The Motherboard article is short on the details we curious software engineers would want on the social/business/licensing aspects, and I haven’t done the research to track it all down yet, sorry; I don’t believe the specific nature of Chef Inc’s business with ICE is publicly known.

Personally, I was aware of chef-the-software, but my own experience with it has not gone beyond skimming docs to get a basic idea of what it does. I had been under the (mistaken?) impression the whole thing was open source, which left me confused about what code Chef Inc “sold” to ICE (per the Motherboard headline) and how… but I googled and discovered it had been “open core”, and that in April 2019 all the code was released with an apache license… I’m still a bit confused about what’s going on.

At any rate, Seth Vargo apparently was kinda furious that code he wrote was being used to help ICE manage their information systems, for organizing, you know, concentration camps and child abuse and fundamental violations of human rights and dignity and stuff like that. (And if it were me, I’d be especially enraged that someone was making money off doing that with the code I wrote; not sure how that reaction fits into a moral philosophy, but I know I’d have it.) And Vargo did some things he could to disrupt it, at least a bit (basically deleting and misconfiguring things that can, ultimately, still be fairly easily/quickly restored). I think he deserves support for doing so, and for bringing more attention to the case in part by doing so.

Meanwhile, these quotes from Chef CEO Barry Crist are just ridiculous. 

“While I understand that many of you and many of our community members would prefer we had no business relationship with DHS-ICE, I have made a principled decision, with the support of the Chef executive team, to work with the institutions of our government, regardless of whether or not we personally agree with their various policies,” wrote Crist, who added that Chef’s work with ICE started during the previous administration.

“My goal is to continue growing Chef as a company that transcends numerous U.S. presidential administrations. And to be clear: I also find policies such as separating families and detaining children wrong and contrary to the best interests of our country,” he wrote.

This is the statement of a moral coward. He does not seem to realize he’s essentially telling us “I want you to know, I have values, I’m not a monster! It’s just that I’m willing to sacrifice all of them for the right price, like anyone would be, right?”

He even suggests there is something “principled” about the decision “to work with the institutions of our government, regardless of whether or not we personally agree with their various policies.” While 1930s IBM agreed with the “principle” of aiding efforts of any government whose money was good, say, maybe in Germany, “whether or not anyone personally agreed” with the efforts they were aiding… this is a self-serving sociopathic Ayn Rand-ian “principle”.

These comments kept burning me up, I couldn’t get them out of my head… and then I realized this is basically the conversation in Boots Riley’s batshit political parody(?) 2018 film Sorry to Bother You (SPOILERS AHEAD), in what was for me the most genius, gut-punchingly horribly hilarious moment in a movie that has plenty of them. A scene which doesn’t come across nearly as well in a text transcript, without the context and body language/tone and exceptional delivery of the actors, but I’m gonna give it to you anyway.

So at one point in Sorry To Bother You, just after Cash has discovered the results of the rich CEO’s secret plan for engineering horse-human hybrids out of kidnapped conscripts, the CEO has shown the terrified and confused Cash an in-house promotional video explaining the, uh, well-thought-out business model for horse-human slave labor. The video ends, and:

CEO: See? It’s all just a big misunderstanding.

Cash: This ain’t no fucking ‘misunderstanding’, man.
So, you making half-human half-horse fucking things so you can make more money?

CEO: Yeah, basically. I just didn’t want you to think I was crazy. That I was doing this for no reason. Because this isn’t irrational.

Cash: Oh…. Cool. Alright. Cool…. No, I understand. I just, I just got to leave now, man. So, please get the fuck out of my way.

Of course we don’t agree with what ICE is doing, we don’t want you to think we’re crazy… it’s just that the principle of being able to “grow Chef as a company” by helping them do those things wins out, right?

With what I know now, I would never work for Chef Inc. or contribute any code to any chef projects, and I will be using any power or sway I have to dissuade anyone I work for or with from using chef. (I don’t think any do at present, so it’s not much of a sacrifice/risk for me at present, to be sure.) Engineering ethics matter. These are not good times; it’s not always clear to me either what to do about it, but anyone who sees somewhere they can afford to intervene should take the opportunity. We can’t afford to skip any, large or small.

Incidentally, I found out about this story by seeing this cryptic post on the rubygems blog, noticing it via the Rubyland News aggregator I run, and then googling to figure out what weirdness was going on with chef to prompt that, which led me to the Motherboard article by Joseph Cox. Also credit to @shanley for, apparently, discovering and publicizing the Chef/ICE contract. And to rubygems/Evan Phoenix for transparently posting evidence that they had forcibly changed gem ownership, rather than doing it silently.

I probably wouldn’t have noticed at all if Vargo hadn’t made it a story by engaging in some relatively easy, low-risk direct action, which is really the least any of us should do in such a situation. Vargo deserves credit and support, because so many of us engineers maybe wouldn’t have done even that; it’s time for us to figure out how to step up.

In some more welcome news from here in Baltimore, Johns Hopkins University/Medical Institutions is reported to have recently declined to renew some ICE contracts — including one for “tactical medical training” for agents in the Homeland Security Investigations unit of ICE, which carries out workplace raids — after a Hopkins student-led but community-wide coalition campaign of public pressure on Hopkins to stop profiting from supporting and facilitating ICE/CBP human rights violations. While the ~$1.7 million ICE contracts were relatively small money in Hopkins terms, Hopkins as an institution has previously shown itself to be quite dedicated to that same “principle” of never, ever turning down a buck; may this breach of profiteering “principle” lead to many more.

Card Catalogs: “Paper Machines”

A book I just became aware of that I am very excited about (thanks to Jessamyn West for posting a screenshot of her ‘summer reading’ on facebook, bringing it to my attention!)

Paper Machines: About Cards & Catalogs, 1548-1929
by Markus Krajewski, translated by Peter Krapp

Why the card catalog―a “paper machine” with rearrangeable elements―can be regarded as a precursor of the computer.

Today on almost every desk in every office sits a computer. Eighty years ago, desktops were equipped with a nonelectronic data processing machine: a card file. In Paper Machines, Markus Krajewski traces the evolution of this proto-computer of rearrangeable parts (file cards) that became ubiquitous in offices between the world wars.

The story begins with Konrad Gessner, a sixteenth-century Swiss polymath who described a new method of processing data: to cut up a sheet of handwritten notes into slips of paper, with one fact or topic per slip, and arrange as desired. In the late eighteenth century, the card catalog became the librarian’s answer to the threat of information overload. Then, at the turn of the twentieth century, business adopted the technology of the card catalog as a bookkeeping tool. Krajewski explores this conceptual development and casts the card file as a “universal paper machine” that accomplishes the basic operations of Turing’s universal discrete machine: storing, processing, and transferring data. In telling his story, Krajewski takes the reader on a number of illuminating detours, telling us, for example, that the card catalog and the numbered street address emerged at the same time in the same city (Vienna), and that Harvard University’s home-grown cataloging system grew out of a librarian’s laziness; and that Melvil Dewey (originator of the Dewey Decimal System) helped bring about the technology transfer of card files to business.

I haven’t read it yet myself.

But I’ve thought for a while about how card catalogs were pre-computer information processing systems (with some nostalgia-for-a-time-i-didn’t-experience-myself of when library science was at the forefront of practically-focused information processing system theory and practice).

And I’ve realized for a while that most of our legacy data was designed for these pre-computer information processing systems. And by “legacy” data, I mean the bulk of data we have :) MARC, AACR2, LCSH, even call number systems like DDC or LCC.

If you want to understand this data, you have to understand the systems it was designed for — their affordances and constraints, how they evolved over time — and thinking of them as information processing machines is the best way to understand it, and understand how to make use of it in the present digital environment, or how to change it to get the most benefit from the different constraints and affordances of a computerized environment.

So I can’t quite recommend the book, cause I haven’t read it myself yet — but I recommend it anyway. :)

Dealing with legacy and externally loaded code in webpack(er)

I’ve been mostly a ruby and Rails dev for a while now, and I’ve been a ‘full-stack web dev’ since that was the only kind of web dev. I’ve always been just comfortable enough in Javascript to get by — well, until recently.

The, I don’t know what you call it, “modern JS” (?) advances and (especially) tooling have left me a bit bewildered. (And I know I’m not alone there).  But lately I’ve been pulled (maybe a bit kicking and screaming) into Webpacker with Rails, because you really need modern npm-based tooling to get some JS dependencies you want — and Webpacker will be the default JS toolchain in Rails 6 (and I think you won’t have access to sprockets for JS at all by default).

If the tooling wasn’t already confusing enough — and Webpacker makes webpack a little bit easier to use with Rails-friendly conventions over configuration, but also adds another layer of indirection on understanding your tools — I frequently have to deal with projects where not all the code is managed with webpacker.

I might have JS dependencies provided only via Rails engine gems (no npm package). I might have legacy projects where not all the code that could be transitioned to webpack(er) control has been yet. And other reasons. So I might have some code being included via a webpack pack and javascript_pack_tag, but some code being included in a separate compiled JS via sprockets and javascript_include_tag, and maybe other code doing other odd things.

  • Might need webpacker-packed code that uses dependencies loaded via external mechanisms (sprockets, or a raw <script> tag to a CDN host).
  • Might need non-webpacker-packed code (i.e., usually sprockets-managed) that uses a dependency that is loaded by webpacker (because npm/yarn is the best way to get a lot of JS dependencies).
  • Might have “vendored” third-party code that is old and doesn’t play well with ES6 import/export.

So I decided to take some time and understand the webpack(er) patterns and features relevant here. Some webpack documentation calls these techniques “shimming”, but I think they are relevant for cases beyond what I would consider “shimming”. These techniques are generally available in webpack, but my configuration examples will be Webpacker, cause lack of webpacker examples was a barrier to newbie me in figuring this out.

I am not an expert in this stuff and appreciate any corrections!

“Externals” — webpack code depending on a library loaded via other means

Let’s say we load Uppy via a manual script tag to CDN (so it’s available old-style via window.Uppy after load), but have webpacker-packed code that needs to refer to it. (Why are we loading Uppy that way? I dunno, we might have Reasons, or it might just be legacy code midway through being migrated to webpacker).

You want to use the webpack externals feature.

In your config/webpack/environment.js: (after “const { environment } = require(‘@rails/webpacker’)” and before “module.exports = environment”)

environment.config.externals = {
  uppy: 'Uppy'
}

And now you can import Uppy from 'uppy'; in a webpacker source just like you would if Uppy was a local yarn/npm dependency.

The typical examples do this with jQuery:

  externals: {
    jquery: 'jQuery'
  }
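
As a sketch of what that looks like from the consuming side (assuming the jquery externals entry above is in place, and jQuery is already loaded globally via a script tag): a webpacker source can import it like any other dependency, and nothing from the jquery package gets compiled into the pack.

// app/javascript/packs/application.js (or any webpacker-managed source)
// With externals: { jquery: 'jQuery' }, this import just hands us the
// already-loaded window.jQuery at runtime; no second copy is bundled.
import $ from 'jquery';

$(function() {
  // ordinary jQuery usage, running on DOM ready
  $('body').addClass('js-enabled');
});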

Note: In my experimentations, I found that I can apparently just use Uppy (when it’s loaded in window.Uppy by non-webpacker sources) in my webpacker sources without doing the externals setup and the import (presumably because webpack leaves a bare, undeclared reference alone, so it falls through to the global window.Uppy at runtime). I’m not sure if this is expected, but externals seems like better practice.

Note: Every time the pack is loaded, if window.Uppy is not available when you have that external, you’ll get a complaint in console “Uncaught ReferenceError: Uppy is not defined”, and your whole JS pack won’t be loaded due to aborting on the error — this tripped me up when I was trying to conditionally load Uppy from CDN only on pages that needed it, but other pages had the pack loaded. I guess the right way to do this would be having separate pack files, and only register the externals with the pack file that actually uses Uppy.

Note: I don’t have Uppy in my package.json/yarn.lock at ALL, so I know webpacker isn’t compiling it into the pack. If I did, but for some reason still wanted to rely on it from an ‘external’ instead of compiling it into the pack, I’d want to do more investigative work to make sure it wasn’t in my pack too, resulting in a double-load in the browser since it was already being loaded via CDN.

“Expose” — make a webpack(er) loaded dependency available to external-to-webpack JS

Let’s say you have openseadragon being controlled by webpacker and included in your pack. (Because how else are you going to get the dependency? The old method of creating a rails engine gem with a vendored asset, and keeping it up to date with third-party releases, is a REAL DRAG).

But let’s say the code that uses openseadragon is not controlled by webpacker or included in your pack. It’s still being managed and delivered with sprockets. (Why? Maybe it’s just one step along a migration to webpacker, in which you want to keep everything working step by step.)

So even though OpenSeadragon is being included in your pack, you want it available at window.OpenSeadragon “old-style”, so the other code that expects it there old-style can access it. This is a task for the webpack expose-loader.

You’ll need to yarn add expose-loader — it doesn’t come with webpack/webpacker by default. (You don’t seem to need any configuration to make it available to webpack, once you’ve added it to your package).

So you’ve already yarn add openseadragon-ed. Now in your config/webpack/environment.js: (after “const { environment } = require(‘@rails/webpacker’)” and before “module.exports = environment”)

environment.loaders.append('expose', {
  test: require.resolve('openseadragon'),
  use: [{
          loader: 'expose-loader',
          options: 'OpenSeadragon'
  }]
})

Now window.OpenSeadragon will be set, and available to JS sources that came from somewhere else, like sprockets (where it can also just be accessed as OpenSeadragon).

That is, as long as openseadragon is included in your pack. The “expose” loader directive alone won’t put it in your pack, and if it’s not in your pack, it can’t be exposed at window. (and webpacker won’t complain).

So if you aren’t already including it in your pack, over in (eg) your app/javascript/packs/application.js, add one of these:

// You don't need all of these lines, any one will do:
import 'openseadragon'
import OpenSeadragon from 'openseadragon'
require('openseadragon')

Now OpenSeadragon is included in your pack file, and exposed at window.OpenSeadragon for non-packed JS (say in a sprockets-compiled file) to access.
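
For example, a sprockets-managed file can now use the global just as it did before webpacker entered the picture. A sketch; the element id and paths here are made up for illustration:

// app/assets/javascripts/viewer.js -- sprockets-managed, NOT in the pack.
// window.OpenSeadragon exists because the expose-loader config above sets it
// when the webpacker pack loads (so the pack must load before this runs).
var viewer = OpenSeadragon({
  id: "osd-viewer",                      // hypothetical element id
  prefixUrl: "/openseadragon/images/",   // hypothetical path to OSD button images
  tileSources: "/tiles/example.dzi"      // hypothetical tile source
});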

If you are loading jQuery in your pack, and want to make it available to “external” JS at both jQuery and $ (so either can be used as with ordinary jQuery), you want:

environment.loaders.append('expose', {
  test: require.resolve('jquery'),
  use: [{
    loader: 'expose-loader',
    options: 'jQuery'
  }, {
    loader: 'expose-loader',
    options: '$'
  }]
})

“Provide” — automatic “import” for legacy code that doesn’t

Let’s say you are including jQuery with webpacker, in your pack. Great!

And you have some legacy code in sprockets you want to move over to webpacker. This legacy code, as legacy code does, just refers to $ in it, expecting it to be available in window.$ . Or maybe it refers to jQuery. Or a little bit of both.

The “right” way to handle this would be to add import jQuery from 'jquery' at the top of every file as you move it into webpacker. Or maybe import $ from 'jquery'. Or if you want both… do you do two imports? I’m not totally sure.

Or, you can use the webpack ProvidePlugin to avoid having to add import statements, and have $ and jQuery still available (and their use triggering an implicit ‘import’ so jQuery is included in your pack).

In the middle of your config/webpack/environment.js:

const webpack = require('webpack');
environment.plugins.append('Provide', new webpack.ProvidePlugin({
  $: 'jquery',
  jQuery: 'jquery'
}));

Now you can just refer to $ and jQuery in a webpacker source, and it’ll just magically be as if you had imported it.
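
So, as a sketch, a legacy file moved into the pack unchanged can look like this and still work (the selector is made up for illustration):

// A file moved into the webpacker pack without edits -- no import statements.
// Because of the ProvidePlugin config above, webpack rewrites the bare $ and
// jQuery references below into imports of the 'jquery' module at build time.
$(document).ready(function() {
  jQuery('.alert').fadeOut();   // hypothetical selector, just to show both names work
});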

For code you control, this may be just a convenience; there ought to be a way to get it working without ProvidePlugin, with the proper import statements in every file. But maybe it’s vendored third-party code that was written for a “pre-modern” JS world, and you don’t want to be editing it. ProvidePlugin magically makes it automatically “import” what it needs without having to go adding the right ‘import’ statements everywhere.

Other WebPack plugins of note for legacy code

The ImportsLoader works very much like the ProvidePlugin above. But while the ProvidePlugin makes its magic happen globally — any file in the pack that references an “auto-imported” constant will trigger an import — the ImportsLoader lets you scope that behavior to only specific files.

That seems better overall — avoid accidentally using automatic import in some non-legacy code where you intend to be doing things “right” — but for whatever reason ImportsLoader is discussed a lot less on the web than ProvidePlugin, and I didn’t discover it until later, and I haven’t tried it out.
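
An untested sketch of what that might look like in Webpacker, based on the webpack shimming docs (you’d need to yarn add imports-loader first; the file path is hypothetical, and the query-string option syntax is for the imports-loader versions current as I write this):

// config/webpack/environment.js -- scope automatic jQuery imports to one legacy file
environment.loaders.append('imports-jquery', {
  test: require.resolve('../../app/javascript/src/vendor/legacy_widget.js'), // hypothetical path
  use: 'imports-loader?$=jquery,jQuery=jquery'
})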

The ExportsLoader seems to be a way of magically getting legacy code to do ES6 exports (so they can be imported by other webpack sources), without actually having to edit the code to have exports statements. I haven’t played with it.

More Sources

In addition to the docs linked to above for each feature, there are a couple WebPack guides on ‘shimming’ that try to cover this material. I’m not sure why there are two of them; they don’t quite match in their recommendations, and I’m not sure which is more up to date. 1) Shimming in Webpack docs. 2) “Shimming modules” in webpack github wiki

My favorite blog post covering converting a sprockets-based Rails app to be webpacker-based instead is “Goodbye Sprockets. Welcome Webpacker” by Alessandro Rodi, although there are other blog posts covering the same goal written since I discovered Rodi’s.

In Rails 6, you may not have sprockets available at all for managing Javascript, unless you hack (or politely just “configure”) it back in. This reddit comment claims to have instructions for doing so in Rails 6, although I haven’t tried it yet (nor have I confirmed that in RC2 you indeed need it to get sprockets to handle JS; leaving this in part for myself, when I get to it). See also this diff.

Bootstrap 3 to 4: Changes in how font size, line-height, and spacing are done. Or “what happened to $line-height-computed.”

Bootstrap 4 (I am writing this in the age of 4.3.0) changes some significant things about how it handles font-size, line-height, and spacer variables in SASS.

In particular, it changes font-size calculations from px units to rem units, with some implications for how bootstrap handles line-heights, and it changes how whitespace is calculated so that it is in terms of font-size.

I have a custom stylesheet built on top of Bootstrap 3, and am migrating it to Bootstrap 4, and I was getting confused about what’s going on. And googling, some things are written about “Bootstrap 4” that are really about a Bootstrap 4 alpha, and in some cases things changed majorly before the final.

So I decided to just figure it out looking at the code and what docs I could find, and write it up as a learning exercise for myself, perhaps useful to others.

Bootstrap 3

In Bootstrap 3, the variable $font-size-base is the basic default font size. It defaults to 14px, and is expected to be expressed in pixel units.

CSS line-height is given to the browser as a unit-less number. MDN says “Desktop browsers (including Firefox) use a default value of roughly 1.2, depending on the element’s font-family.” Bootstrap sets the CSS line-height to a larger than ‘typical’ browser default value, having decided that is better typography at least for the default Bootstrap fonts.

In Bootstrap 3, the unit-less $line-height-base variable defaults to the unusual value of 1.428571429. This is to make it equivalent to a nice round value of “20px” for a font-size-base of 14px, when the unit-less line-height is multiplied by the font-size-base. And there is a $line-height-computed value that’s defined as exactly that by default; it’s defined in terms of $line-height-base. So line-height-base is a unit-less value you can supply to the CSS line-height property (which _scaffolding does on body), and line-height-computed is a value in pixels that should be the same size, just converted to pixels.
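
Roughly, from bootstrap-sass’s Bootstrap 3 variables (paraphrasing the source rather than quoting it exactly):

$font-size-base:       14px !default;
$line-height-base:     1.428571429 !default; // 20/14
// ~20px: the unit-less line-height converted into pixels
$line-height-computed: floor(($font-size-base * $line-height-base)) !default;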

 

As a whitespace measure in Bootstrap 3

Bootstrap wants to make everything scale depending on font-size, so it tries to define various paddings and margins based on your selected line height in pixels.

For instance, alerts, breadcrumbs, and tables all have a margin-bottom of $line-height-computed (default 20px, with the default 14px font size and default unit-less line-height). h1, h2, and h3 all have a margin-top of $line-height-computed.

h1, h2, and h3 all have a margin-bottom of $line-height-computed/2 (half a line height in pixels; 10px by default). And ($line-height-computed / 2) is both margin-bottom and margin-top for a p tag.

You can redefine the size of your font or line-height in variables, but bootstrap 3 tries to express lots of whitespace values in terms of “the height of a line on the page in pixels” (or half of one) — which is line-height-computed, which is by default 20px.

On the other hand, other kinds of whitespace are expressed in hard-coded values, unrelated to the font-size, and only sometimes changeable by bootstrap variables at all. Often using the specific fixed values 30px and 15px.

$grid-gutter-width is set to 30px. So is $jumbotron-padding. You can change these variables yourself, but they don’t automatically change “responsively” if you change the base font-size in $font-size-base. They aren’t expressed in terms of font-size.

A .list-group has a margin-bottom set to 20px, and a .list-group-item has a padding of 10px 15px, and there’s no way to change either of these with a bootstrap variable; they are truly hard-coded into the SCSS. (You could of course try to override them with additional CSS).

So some white-space in Bootstrap 3 does not scale proportionately when you change $font-size-base and/or $line-height-base.

Bootstrap 4

In Bootstrap 4, the fundamental starting font-size variable is still $font-size-base, but it’s now defined in terms of rem; by default it is defined as 1rem.

You can’t set $font-size-base to a value in px units without bootstrap’s sass complaining, as it tries to do things with it that are dimensionally incompatible with px. You can change it to something other than 1rem, but bootstrap 4 wants $font-size-base in rem units.
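
So, sketching the sort of override Bootstrap 4 expects (the 1.125rem value is arbitrary, just for illustration):

// In your own SCSS, before importing Bootstrap 4.
// Bootstrap's own default is: $font-size-base: 1rem
$font-size-base: 1.125rem; // in rem, not px; see below for what 1rem means here

@import "bootstrap";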

1rem means “same as the font-size value on the html element.”  Most browsers (at least most desktop browsers?) default to 16px, so it will usually by default mean 16px. But this isn’t required, and some browsers may choose other defaults.

Some users may set their browser default to something other than 16px, perhaps because they want ‘large print’. (Although you can also set a default ‘zoom level’ instead in a browser; what a browser offers and how it affects rendering can differ between browsers). This is, I think, the main justification for Bootstrap changing to rem: accessibility improvements respecting browser default stylesheets.

The Bootstrap docs don’t say much to explain the change, but I did find this:

No base font-size is declared on the <html>, but 16px is assumed (the browser default). font-size: 1rem is applied on the <body> for easy responsive type-scaling via media queries while respecting user preferences and ensuring a more accessible approach.

https://getbootstrap.com/docs/4.3/content/reboot/#page-defaults

Perhaps for these reasons of accessibility, Bootstrap itself does not define a font-size on the html element, it just takes the browser default. But in your custom stylesheet, you could insist html { font-size: 16px } to get consistent 1rem=16px regardless of browser (and possibly with accessibility concerns — although you can find a lot of people debating this if you google, and I haven’t found much that goes into detail and is actually informed by user-testing or communication with relevant communities/experts).  If you don’t do this, your bootstrap default font-size will usually be 16px, but may depend on browser, although the big ones seem to default to 16px.

(So note, Bootstrap 3 defaulted to 14px base-font-size, Bootstrap 4 defaults to what will usually be 16px). 

Likewise, when they say “responsive type-scaling via media queries”, I guess they mean that based on media queries, you could set font-size on html to something like 180% (or 1.8em), meaning “1.8 times as large as the ordinary browser default font-size.” Bootstrap itself doesn’t seem to supply any examples of this, but I think it’s what it’s meant to support. (You wouldn’t want to set the font-size in px based on a media-query, if you believe respecting default browser font-size is good for accessibility).
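
For instance, something like this (my own sketch, not from the Bootstrap docs; the breakpoint and scale factor are arbitrary):

// Scale all rem-based sizing up on wide screens, while still respecting the
// user's browser default font size (125% of whatever that default is).
@media (min-width: 1200px) {
  html {
    font-size: 125%;
  }
}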

Line-height in Bootstrap 4

The variable line-height-base is still in Bootstrap 4, and defaults to 1.5.  So in the same ballpark as Bootstrap 3’s 1.428571429, although slightly larger — Bootstrap is no longer worried about making it a round number in pixels when multiplied against a pixel-unit font-size-base.  line-height-base is still set as default line-height for body, now in _reboot.scss (_scaffolding.scss no longer exists).

$line-height-computed, which in Bootstrap 3 was “height in pixel units”, no longer exists in Bootstrap 4. In part because at CSS-writing/compile time, we can’t be sure what it will be in pixels, because it’s up to the browser’s default size.

If we assume a browser default size of 16px, the “computed” line-height is now 24px (1.5 × 16px), which is still a nice round number after all.

But by doing everything in terms of rem, it can also change based on media query of course. So while the point of Bootstrap 3’s line-height-computed was often to drive whitespace and other page-size calculations based on the base font-size, if we want to let the base font-size fluctuate based on a media query, we can’t know its value in terms of pixels at CSS-writing time.

Bootstrap docs say:

For easier scaling across device sizes, block elements should use rems for margins.

https://getbootstrap.com/docs/4.3/content/reboot/#approach

Font-size dependent whitespace in Bootstrap 4

In Bootstrap 3, line-height-computed (20px for a 14px base font; one line height) was often used for a margin-bottom.

In Bootstrap 4, we have a new variable $spacer that is often used. For instance, table now uses $spacer as margin-bottom. And $spacer defaults to… 1rem. (Just like $font-size-base, but it’s not defined in terms of it; if you want them to match and you change one, you’d have to change the other to match).

alert and breadcrumbs both have their own new variables for margin-bottom, which also both default to 1rem. Again not in terms of font-size-base; they just happen to default to the same thing.

So one notable thing is that Bootstrap 4, relative to the base font size, is putting less whitespace in margin-bottom on these elements. In Bootstrap 3, they got the line-height as margin (roughly 1.43 times the font size; 20px for a 14px font-size). In Bootstrap 4, they get 1rem, which is the same as the default font-size, so in pixels that’s 16px for the default 16px font-size. Not sure why Bootstrap 4 decided to slightly reduce the separator whitespace here.

All h1-h6 have a margin-bottom of $headings-margin-bottom, which defaults to half a $spacer (0.5rem with the default 1rem $spacer). (Bootstrap 3 gave h1-h3 a ‘double’ margin on top: a full $line-height-computed of margin-top, twice their margin-bottom.)

p uses $paragraph-margin-bottom, now in _reboot.scss. Which defaults to, you guessed it, 1rem.  (note that paragraph spacing in bootstrap 3 was ($line-height-computed / 2), half of a lot of other block element spacing. Now it’s 1rem, same as the rest).

grid-gutter-width is still in pixels, and still 30px; it is not responsive to font size.

list-groups look like they use padding rather than margin now, but it is defined in terms of rem: .75rem in the vertical direction.

So a bunch of white-space separator values that used to be ‘size of line-height’ are now the (smaller) ‘size of font’ (and now expressed in rems).

If you wanted to make them bigger, with the same relation to font/line-height they had in bootstrap 3, you might want to set them to 1rem * $line-height-base, or, to actually respond properly to any resets of font-size-base, to $font-size-base * $line-height-base. You’d have a whole bunch of variables to reset this way, as every component uses its own variable, and they aren’t defined in terms of each other.
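
A sketch of what that might look like (I believe these are the Bootstrap 4.3 variable names, but check your version's _variables.scss; the overrides have to come before the bootstrap import):

// Restore (roughly) the Bootstrap 3 "one full line-height" of bottom margin,
// component by component, since each component now has its own variable.
$font-size-base:           1rem;
$line-height-base:         1.5;
$spacer:                   $font-size-base * $line-height-base; // 1.5rem; tables etc. pick this up
$paragraph-margin-bottom:  $font-size-base * $line-height-base;
$alert-margin-bottom:      $font-size-base * $line-height-base;
$breadcrumb-margin-bottom: $font-size-base * $line-height-base;

@import "bootstrap";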

The only thing in Bootstrap 4 that still uses $font-size-base * $line-height-base (actual line height expressed in units, in this case rem units) seems to be in _custom-forms.scss, for custom checkbox/radio button styling.

For your own stuff? $spacer and associated multiples

$spacer is probably a good variable to use where before you might have used $line-height-computed, for “standard vertical whitespace used most other places” — but beware it’s now equal to font-size-base, not (the larger) line-height-base.

There are additional spacing utilities, to let you get standard spaces of various sizes as margin or padding, whose values are by default defined as multiples of $spacer. I don’t believe these $spacers values are used internally by bootstrap though, even if the comments suggest they will be. Internally, bootstrap sometimes manually does things like $spacer / 2, ignoring your settings for the $spacers map.

If you need to do arithmetic with something expressed in rem (like $spacer) and a value expressed in pixels, you can let the browser do it with calc. A calc() expression actually delivered to the browser should work in any recent browser; in SCSS you’ll want interpolation, like calc(#{$spacer} - 15px), so Sass passes the variable’s value through.
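
For example (a sketch; the class name and the 15px offset are arbitrary):

// Mix a rem-based spacer with a fixed pixel offset, resolved by the browser at
// render time. With the default $spacer this compiles to: calc(1rem - 15px)
.sidebar-inner {
  margin-bottom: calc(#{$spacer} - 15px);
}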

One more weird thing: Responsive font-sizes?

While off by default, Bootstrap gives you an option to enable “responsive font sizes”, which change themselves based on the viewport size. Not totally sure of the implications of this on whitespace defined in terms of font-size (will that end up responsive too?), it’s enough to make the head spin.

What happened to $grid-float-breakpoint in Bootstrap 4. And screen size breakpoint shift from 3 -> 4.

I have an app that customizes Bootstrap 3 stylesheets, by re-using Bootstrap variables and mixins.

My app used the Bootstrap 3 $grid-float-breakpoint and $grid-float-breakpoint-max variables in @media queries, to have ‘complex’ layout ‘collapse’ to something compact and small on a small screen.

This variable isn’t available in bootstrap 4 anymore.  This post is about Bootstrap 4.3.0, and probably applies to Bootstrap 4.0.0 final too. But googling to try to figure out changes between Bootstrap 3 and 4, I find a lot of things written for one of the Bootstrap 4 alphas, sometimes just calling it “Bootstrap 4” — and in some cases things changed pretty substantially between alphas and final. So it’s confusing, although I’m not sure if this is one of those cases. I don’t think people writing “what’s changed in Bootstrap 4” blogs about an alpha release were expecting as many changes as there were before final.

Quick answer

If in Bootstrap 3 you were doing:

// Bootstrap 3
@media(max-width: $grid-float-breakpoint-max) {
  // CSS rules
}

Then in Bootstrap 4, you want to use this mixin instead:

// Bootstrap 4
@include media-breakpoint-down(sm) {
  // CSS rules
}

If in Bootstrap 3 you were doing:

// Bootstrap 3
@media (min-width: $grid-float-breakpoint) {
  // CSS rules
}  

Then in Bootstrap 4, you want to do:

@include media-breakpoint-up(md) {
  // CSS rules
}

(Note those are down(sm) and up(md), not down(md) and up(lg): the Bootstrap 4 “down”/“up” mixins are inclusive of the size you name, and the old 768px boundary is now the bottom of “md”. More on the breakpoint name shift below.)

If you were doing anything else in Bootstrap 3 with media queries and $grid-float-breakpoint, like doing (min-width: $grid-float-breakpoint-max) or (max-width: $grid-float-breakpoint), or doing any + 1 or - 1 yourself — you probably didn’t mean to be doing that, were doing the wrong thing, and meant to be doing one of these things.

One of the advantages of the new mix-in style is that it makes it a little bit more clear what you are doing: how to apply a style to “just when it’s collapsed” vs “just when it’s not collapsed”.

What’s going on

Bootstrap 3

In Bootstrap 3, there is a variable `$grid-float-breakpoint`, documented in comments as “Point at which the navbar becomes uncollapsed.” It is by default set to equal the Bootstrap 3 variable `$screen-sm-min` — so we have an uncollapsed navbar at “sm” screen size and above, and a collapsed navbar at smaller than ‘sm’ screen size. screen-sm-min in Bootstrap 3 defaults to 768px.

For convenience, there was also a $grid-float-breakpoint-max, documented as “Point at which the navbar begins collapsing” — which is a bit confusing to my programmer brain; it’s more accurate to say it’s the largest size at which the navbar is uncollapsed. (I would say it begins collapsing at $grid-float-breakpoint, one higher than $grid-float-breakpoint-max).

$grid-float-breakpoint-max is defined as ($grid-float-breakpoint - 1) to make that so. So, yeah, $grid-float-breakpoint-max is confusingly one pixel less than $grid-float-breakpoint — kind of easy to get confused.

While documented as applying to the navbar, it was also used in default Bootstrap 3 styles in at least one other place, dropdown.scss, where I don’t totally understand what it’s doing, but it is somehow changing alignment to something suitable for ‘small screen’ at the same place navbars break — smaller than ‘screen-sm’.

If you wanted to change the point of ‘breakdown’ for navbars, dropdowns, and anything else you may have re-used this variable for — you could just reset the $grid-float-breakpoint variable, and it would now be unrelated to $screen-sm size. Or you could reset the $screen-sm size. In either case, the change is global to all navbars, dropdowns, etc.
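
For instance, a sketch of the first option with bootstrap-sass (the override has to come before the bootstrap import, so I use the literal 992px rather than $screen-md-min, which isn't defined yet at that point):

// Bootstrap 3: collapse navbars (and dropdowns) below the "md" boundary
// instead of below the default "sm" (768px) boundary.
$grid-float-breakpoint: 992px;

@import "bootstrap";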

Bootstrap 4

In Bootstrap 4, instead of just one breakpoint for navbar collapsing, hard-coded at the screen-sm boundary, you can choose to have your navbar break at any of bootstrap’s screen size boundaries, using classes ‘.navbar-expand-sm’, ‘.navbar-expand-lg’, etc. You can now choose different breakpoints for different navbars using the same stylesheet, so long as they correspond to one of the bootstrap defined breakpoints.

‘.navbar-expand-sm’ means “be expanded at size ‘sm’ and above, collapsed below that.”

If you don’t put any ‘.navbar-expand-*’ class on your navbar — it will always be collapsed, always have the ‘hamburger’ button, no matter how large the screen size.

And instead of all dropdowns breaking at the same point as all navbars at $grid-float-breakpoint, there are similar differently-sized responsive classes for dropdowns. (I still don’t entirely understand how dropdowns change at their breakpoint, have to experiment).

In support of bootstrap’s own code creating all these breakpoints for navbars and dropdowns, there is a new set of breakpoint utility mixins.  These also handily make explicit in their names “do you want this size and smaller” or “do you want this size and larger”, to try to avoid the easy “off by one” errors using Bootstrap 3 variables, where a variable name sometimes left it confusing whether it was the high-end of (eg) md or the low-end of md.

You can also use these utility mixins yourself of course!  breakpoint-min(md) will be the lowest value in pixels that is still “md” size. breakpoint-min(xs) will return sass null value (which often converts to an empty string), because “xs” goes all the way to 0.

breakpoint-max(md) will return a value with px units, that is the largest pixel value that’s within “md” size. breakpoint-max(xl) will return null/””, because “xl” has no max value, it goes all the way up to infinity.

Or you can use the mixins that generate the actual media queries you want, like media-breakpoint-up(sm) (size “sm” and up), or media-breakpoint-down(md) (size ‘md’ and down). Or even the handy media-breakpoint-between(sm, lg) (small to large, inclusive; does not include xs or xl.)
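
For example, a sketch using the ‘between’ mixin (the class and rule are arbitrary; the pixel values come from the default Bootstrap 4.3 breakpoint map):

.sidebar {
  // compiles to: @media (min-width: 576px) and (max-width: 1199.98px) { ... }
  @include media-breakpoint-between(sm, lg) {
    float: right;
  }
}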

Some Bootstrap 4 components still have breakpoints hard-coded to a certain responsive size, rather than the flexible array of responsive breakpoint classes. For instance a card has a collapse breakpoint at the bottom of ‘sm’ size, and there’s no built-in way to choose a different collapse breakpoint.  Note how the Bootstrap source uses the media-breakpoint-up utility to style the ‘card’ collapse breakpoint.

Bootstrap 4 responsive sizes shift by one from Bootstrap 3!

To make things more confusing, ‘sm’ in bootstrap 3 is actually ‘md’ in bootstrap 4.

  • Added a new sm grid tier below 768px for more granular control. We now have xs, sm, md, lg, and xl. This also means every tier has been bumped up one level (so .col-md-6 in v3 is now .col-lg-6 in v4)

https://getbootstrap.com/docs/4.0/migration/

In Bootstrap 3, ‘sm’ began at 768px. In Bootstrap 4, it’s md that by default begins at 768px. And there’s a new ‘sm’ inserted below 768 — in Bootstrap 4 sm by default begins at 576px. 

So that’s why, to get the equivalent of Bootstrap 3’s (max-width: $grid-float-breakpoint-max), where $grid-float-breakpoint was defined based on “screen-sm-min” (that is, “collapse when smaller than 768px”): in Bootstrap 4 the 768px boundary is the bottom of md, and since the mixins are inclusive of the size you name, “smaller than md” is written media-breakpoint-down(sm) (“sm and everything below”), while “768px and up” is media-breakpoint-up(md).

Customizing breakpoints in Bootstrap 4

The responsive size breakpoints in bootstrap 4 are defined in a SASS ‘map’ variable called grid-breakpoints. You can change these breakpoints, taking some care to mutate the map without removing default values, if that is your goal.

If you change them there, you will change all the relevant breakpoints, including the grid utility classes like col-lg-2, as well as the collapse points for responsive classes for navbars and dropdowns. If you change the sm breakpoint, you’ll change the collapse breakpoint for card for instance too.
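
A sketch of what overriding the map might look like (I haven't needed to do this myself; the 800px value is arbitrary, and the override has to come before the bootstrap import):

// Keep all five keys; moving "md" here moves the grid classes, navbar-expand-md,
// the card collapse point, etc., all at once.
$grid-breakpoints: (
  xs: 0,
  sm: 576px,
  md: 800px,   // moved from the default 768px
  lg: 992px,
  xl: 1200px
);

@import "bootstrap";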

There’s no way to only change the navbar/dropdown collapse breakpoint, as you could in Bootstrap 3 with $grid-float-breakpoint. On the other hand, you can at least hypothetically (I haven’t tried it or seen it documented) add additional breakpoints if you want; maybe you want something in between md and lg, called, uh, I don’t know what you’d call it that wouldn’t be confusing. But in theory all the responsive utilities should work with it; the various built-in *-md-* etc classes should be joined by classes for your new one (since the built-in ones are generated dynamically), etc. I don’t know if this is really a good idea.

Blacklight 7: current_user or other request context in SearchBuilder solr query builder

In Blacklight, the “SearchBuilder” is an object responsible for creating a Solr query. A template is generated into your app for customization, and you can write a kind of “plugin” to customize how the query is generated.

You might need some “request context” to do this. One common example is the current_user, for various kinds of access control. For instance, to hide certain objects from being returned in a Solr query depending on the user’s permissions, or perhaps to keep certain Solr fields from being searched (in qf or pf params) unless a user is authorized to see/search them.

The way you can do this changed between Blacklight 6 and Blacklight 7. The way to do it in Blacklight 7.1 is relatively straightforward, but I’m not sure if it’s documented, so I’ll explain it here. (Anyone wanting to try to update the blacklight-access_controls or hydra-access-controls gems to work with Blacklight 7 will need to know this).

I was going to start by describing how this worked in Blacklight 6… but I realized I didn’t understand it, and got lost figuring it out. So we’ll skip that. But I believe that in BL 6, controllers interacted directly with a SearchBuilder. I can also say that the way a SearchBuilder got “context” like a current_user in BL6 and previous was a bit ad hoc and messy, without a clear API, and had evolved over time in a kind of “legacy” way.

Blacklight 7 introduces a new abstraction, the somewhat generically named “search service”, normally an instance of Blacklight::SearchService. (I don’t think this is mentioned in the BL 7 Release Notes, but is a somewhat significant architectural change that can break things trying to hook into BL).

Now, controllers don’t interact with the SearchBuilder, but with a “search service”, which itself instantiates and uses a SearchBuilder “under the hood”. In Blacklight 7.0, there was no good way to get “context” to the SearchBuilder, but 7.1.0.alpha has a feature that’s pretty easy to use.

In your CatalogController, define a search_service_context method which returns a hash of whatever context you need available:

class CatalogController < ApplicationController
  include Blacklight::Catalog

  def search_service_context
    { current_user: current_user }
  end

# ...
end

OK, now the Blacklight code will automatically add that to the "search service" context. But how does your SearchBuilder get it?

Turns out, in Blacklight 7, the somewhat confusingly named scope attribute in a SearchBuilder will hold the acting SearchService instance, so in a search builder or mix-in to a search_builder…

def some_search_builder_method
  if scope.context[:current_user]
    # we have a current_user!
  end
end

And that’s pretty much it.
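
To make that concrete, here's a sketch of a SearchBuilder extension that uses the context for access control; the module name, Solr field ("visibility_ssi"), and value are all made up for illustration.

# e.g. app/models/concerns/hide_private_records.rb (hypothetical)
# Only let signed-in users see non-public records.
module HidePrivateRecords
  def hide_private_records(solr_parameters)
    # scope is the Blacklight::SearchService; its context comes from
    # search_service_context in the controller, as above.
    return if scope.context[:current_user]

    solr_parameters[:fq] ||= []
    solr_parameters[:fq] << "visibility_ssi:public"
  end
end

# app/models/search_builder.rb (the one generated into your app)
class SearchBuilder < Blacklight::SearchBuilder
  include Blacklight::Solr::SearchBuilderBehavior
  include HidePrivateRecords

  self.default_processor_chain += [:hide_private_records]
end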

I believe in BL 7, the scope attribute in a SearchBuilder will always be a “search service”, perhaps it would make sense to alias it as “search_service”. To avoid the somewhat ugly scope.context[:current_user], you could put a method in your SearchBuilder that covers that as current_user, but that would introduce some coupling between that method existing in SearchBuilder, and a SearchBuilder extension that needs to use it, so I didn’t go that route.

For a PR in our local app that supplies a very simple local SearchBuilder extension, puts it into use, and makes the current_user available in a context, see this PR. 

A terrible Github UI — accidentally shadow a tag with a branch

So we generally like to tag our releases in git, like v1.0.0 or what have you.

Github Web UI has a “tag/branch switcher” widget, which lets you look at a particular branch or tag in the Web UI.

[Screenshot: Github’s branch/tag switcher widget, with separate “Branches” and “Tags” tabs]

You can see it has separate tabs for “branches” and “tags”. Let’s say you get confused, and type “v1.0.0” (a tag) while the “branches” tab is selected (under the text box).

[Screenshot: typing “v1.0.0” with the “Branches” tab selected; Github offers “Create branch: v1.0.0 (from master)”]

It found no auto-complete for “v1.0.0” in “branches” (although there is a tag with that name it would have found if “tags” tab had been selected), and it “helpfully” offers to create a branch with that name.

Now, if you do that, you’re going to have a new branch, created off master, with the same name as a tag. Which is going to be really confusing. And not what you wanted.

Maybe your muscle memory makes your fingers hit “enter” and you wind up there — but at least it is very clearly identified: it says in fairly big and bold text “Create branch: v1.0.0 (from master)”. It warned you, although it’d be easy to miss if you’re in a hurry and your muscle memory thinks it knows what you’re doing.

That’s not the really evil UI yet.

Now let’s go to Github’s “compare” UI, at https://github.com/someorg/someproject/compare

A fairly common thing I at least want to do is look at the compare between two releases, or from last release to master. But the ‘compare’ UI doesn’t have the tabs, it will only list or auto-complete from branches.

[Screenshot: the compare UI’s branch chooser, which only lists branches]

In a hurry, going from muscle memory, you type in “v1.0.0” anyway.

[Screenshot: typing “v1.0.0” into the compare chooser; it says “Nothing to show” but still lists “v1.0.0” with an unfamiliar icon]

It does say “nothing to show”. But “v1.0.0” shows up in the list anyway. With a pretty obscure icon I’ve never seen before. Do you know what that icon means? It turns out, apparently, it means “Create branch: v1.0.0 (from master)”.

If confused, or in a hurry, or with your muscle memory outpacing your brain, you click on that line — that’s what happens.

Now you’ve got a branch called “v1.0.0”, created off current master, along with a tag “v1.0.0” pointing at a different SHA.  Because many UI’s treat branches and tags somewhat interchangeably, this is confusing. If you do a git checkout v1.0.0, are you going to get the branch or the tag?

It turns out if you go to a github compare UI, like `https://github.com/someorg/someproject/compare/v1.0.0..master`, Github is going to compare the new branch you accidentally made, not the existing tag (showing nothing in the diff, if master hasn’t changed yet). There is now no way to get Github to compare the tag. If you didn’t realize exactly what you did, you’re going to be awfully confused about what the heck is going on.

You’re going to need to figure it out, and delete the branch you just made, which it turns out you can do from the command line with the confusing and dangerous command: `git push origin :refs/heads/v1.0.0`
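
(Or, equivalently, a couple of sketches that may read a little less cryptically; the fully-qualified refs/heads/ form ensures you can’t touch the tag of the same name:)

# same deletion, with the more readable --delete flag:
git push origin --delete refs/heads/v1.0.0

# and if you also created the branch locally, remove that too:
git branch -d v1.0.0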

And that’s how I lost a couple hours to figuring out “what the heck is going on here?”

What should you do if you want the github ‘compare’ web UI for a tag rather than a branch? Turns out, as far as I know, you just need to manually enter the URL https://github.com/org/project/compare/v1.0.0..v1.0.1 or what have you. The actual UI widgets will not get you there. They’ll just get you to a mess.

Am I missing something? That seems like github web UI is not only not providing for what I would think is a pretty common use (comparing tags), but leading you down a path to disaster when you look for it, no?