Escaping/encoding URI components in ruby 3.2

Thanks to zverok_kha’s awesome writeup of Ruby changes, I noticed a new method released in ruby 3.2: CGI.escapeURIComponent

This is the right thing to use if you have an arbitrary string that might include characters not legal in a URI/URL, and you want to include it as a path component or part of the query string:

require 'cgi'

url = "https://example.com/some/#{ CGI.escapeURIComponent path_component }" + 
  "?#{CGI.escapeURIComponent my_key}=#{CGI.escapeURIComponent my_value}"
  • The docs helpfully refer us to RFC3986, a rare citation in the wild world of confusing and vaguely-described implementations of escaping (to various different standards and mistakes) for URLs and/or HTML
  • This will escape / as %2F, meaning you can use it to embed a string with / in it inside a path component, for better or worse
  • This will escape a space ( ) as %20, which is correct and legal in either a query string or a path component
  • There is also a reversing method available CGI.unescapeURIComponent

What if I am running on a ruby previous to 3.2?

Two things in standard library probably do the equivalent thing. First:

require 'cgi'
CGI.escape(input).gsub("+", "%20")

CGI escape but take the +s it encodes space characters into, and gsub them into the more correct %20. This will not be as performant because of the gsub, but it works.

This, I noticed once a while ago, is what ruby aws-sdk does… well, except it also unescapes %7E back to ~, which does not need to be escaped in a URI. But… generally… it is fine to percent-encode ~ as %7E. Or copy what aws-sdk does, hoping they actually got it right to be equivalent?

Or you can use:

require 'erb'
ERB::Util.url_encode(input)

But it’s kind of weird to have to require the ERB templating library just for URI escaping. (and would I be shocked if ruby team moves erb from “default gem” to “bundled gem”, or further? Causing you more headache down the road? I would not). (btw, ERB::Util.url_encode leaves ~ alone!)

Do both of these things do exactly the same thing as CGI.escapeURIComponent? I can’t say for sure, see discussion of CGI.escape and ~ above. Sure is confusing. (there would be a way to figure it out, take all the chars in various relevant classes in the RFC spec and test them against these different methods. I haven’t done it yet).

What about URI.escape?

In old code I encounter, I often see places using URI.escape to prepare URI query string values…

# don't do this, don't use URI.escape
url = "https://example.com?key=#{ URI.escape value }"

# not this either, don't use URI.escape
url = "https://example.com?" + 
   query_hash.collect { |k, v| "#{URI.escape k}=#{URI.escape v}"}.join("&")

This was never quite right, in that URI.escape was a huge mess… intending to let you pass in whole URLs that were not legal URLs in that they had some illegal characters that needed escaping, and it would somehow parse them and then escape the parts that needed escaping… this is a fool’s errand and not something it’s possible to do in a clear consistent and correct way.

But… it worked out okay because the output of URI.escape overlapped enough with (the new RFC 3986-based) CGI.escapeURIComponent that it mostly (or maybe even always?) worked out. URI.escape did not escape a /… but it turns out / is probably actually legal in a query string value anyway, it’s optional to escape it to %2F in a query string? I think?

And people used it in this scenario, I’d guess, because it’s name made it sound like the right thing? Hey, I want to escape something to put it in a URI, right? And then other people copied from code they say, etc.

But URI.escape was an unpredictable bad idea from the start, and was deprecated by ruby, then removed entirely in ruby 3.0!

When it went away, it was a bit confusing to figure out what to replace it with. Because if you asked, sometimes people would say “it was broken and wrong, there is nothing to replace it”, which is technically true… but the code escaping things for inclusion in, eg, query strings, still had to do that… and then the “correct” behavior for this actually only existed in the ruby stdlib in the erb module (?!?) (where few had noticed it before URI.escape went away)… and CGI.escapeURIComponent which is really what you wanted didn’t exist yet?

Why is this so confusing and weird?

Why was this functionality in ruby stdlib non-existent/tucked away? Why are there so many slightly different implementations of “uri escaping”?

Escaping is always a confusing topic in my experience — and a very very confusing thing to debug when it goes wrong.

The long history of escaping in URLs and HTML is even more confusing. Like, turning a space into a + was specified for application/x-www-form-urlencoded format (for encoding an HTML form as a string for use as a POST body)… and people then started using it in url query strings… but I think possibly that was never legal, or perhaps the specifications were incomplete/inconsistent on it.

But it was so commonly done that most things receiving URLs would treat a literal + as an encode space… and then some standards were retroactively changed to allow it for compatibility with common practice…. maybe. I’m not even sure I have this right.

And then, as with the history of the web in general, there have been a progression of standards slightly altering this behavior, leapfrogging with actual common practice, where technically illegal things became common and accepted, and then standards tried to cope… and real world developers had trouble underestanding there might be different rules for legal characters/escaping in HTML vs URIs vs application/x-www-form-urlencoded strings vs HTTP headers…. and then language stdlib implementers (including but not limited to ruby) implemented things with various understandings acccording to various RFCs (or none, or buggy), documented only with words like “Escapes the string, replacing all unsafe characters with codes.” (unsafe according to what standard? For what purpose?)

PHEW.

It being so confusing, lots of people haven’t gotten it right — I swear that AWS S3 uses different rules for how to refer to spaces in filenames than AWS MediaConvert does, such that I couldn’t figure out how to get AWS MediaConvert to actually input files stored on S3 with spaces in them, and had to just make sure to not use spaces in filenames on S3 destined for MediaConvert. But maybe I was confused! But honestly I’ve found it’s best to avoid spaces in filenames on S3 in general, because S3 docs and implementation can get so confusing and maybe inconsistent/buggy on how/when/where they are escaped. Because like we’re saying…

Escaping is always confusing, and URI escaping is really confusing.

Which is I guess why the ruby stdlib didn’t actually have a clearly labelled provided-with-this-intention way to escape things for use as a URI component until ruby 3.2?

Just use CGI.escapeURIComponent in ruby 3.2+, please.

What about using the Addressable gem?

When the horrible URI.escape disappeared and people that had been wrongly using it to escape strings for use as URI components needed some replacement and the ruby stdlib was confusing (maybe they hadn’t noticed ERB::Util.url_encode or weren’t confident it did the right thing and gee I wonder why not), some people turned to the addressable gem.

This gem for dealing with URLs does provide ways to escape strings for use in URLs… it actually provides two different algorithms depending on whether you want to use something in a path component or a query component.

require 'addressable'

Addressable::URI.encode_component(query_param_value, Addressable::URI::CharacterClasses::QUERY)

Addressable::URI.encode_component(path_component, Addressable::URI::CharacterClasses::PATH)

Note Addressable::URI::CharacterClasses::QUERY vs Addressable::URI::CharacterClasses::PATH? Two different routines? (Both by the way escape a space to %20 not +).

I think that while some things need to be escaped in (eg) a path component and don’t need to be in a query component, the specs also allow some things that don’t need to be escaped to be escaped in both places, such that you can write an algorithm that produces legally escaped strings for both places, which I think is what CGI.escapeURIComponentis. Hopefully we’re in good hands.

On Addressable, neither the QUERY nor PATH variant escapes /, but CGI.escapeURIComponent does escape it to %2F. PHEW.

You can also call Addressable::URI.encode_component with no second arg, in which case it seems to escape CharacterClasses::RESERVED + CharacterClasses::UNRESERVED from this list. Whereas PATH is, it looks like there, equivalent to UNRESERVED with SOME of RESERVED (SUB_DELIMS but only some of GENERAL_DELIMS), and QUERY is just path plus ? as needing escaping…. (CGI.escapeURIComponent btw WILL escape ? to %3F).

PHEW, right?

Anyhow

Anyhow, just use CGI.escapeURIComponent to… escape your URI components, just like it says on the lid.

Thanks to /u/f9ae8221b for writing it and answering some of my probably annoying questions on reddit and github.

attr_json 2.0 release: ActiveRecord attributes backed by JSON column

attr_json is a gem to provide attributes in ActiveRecord that are serialized to a JSON column, usually postgres jsonb, multiple attributes in a json hash. In a way that can be treated as much as possible like any other “ordinary” (database column) ActiveRecord.

It supports arrays and nested models as hashes, and the embedded nested models can also be treated much as an ordinary “associated” record — for instance CI build tests with cocoon , and I’ve had a report that it works well with stimulus nested forms, but I don’t currently know how to use those. (PR welcome for a test in build?)

An example:

# An embedded model, if desired
class LangAndValue
  include AttrJson::Model

  attr_json :lang, :string, default: "en"
  attr_json :value, :string
end

class MyModel < ActiveRecord::Base
   include AttrJson::Record

   # use any ActiveModel::Type types: string, integer, decimal (BigDecimal),
   # float, datetime, boolean.
   attr_json :my_int_array, :integer, array: true
   attr_json :my_datetime, :datetime

   attr_json :embedded_lang_and_val, LangAndValue.to_type
end

model = MyModel.create!(
  my_int_array: ["101", 2], # it'll cast like ActiveRecord
  my_datetime: DateTime.new(2001,2,3,4,5,6),
  embedded_lang_and_val: LangAndValue.new(value: "a sentence in default language english")
)

By default it will serialize attr_json attributes to a json_attributes column (this can also be specified differently), and the above would be serialized like so:

{
  "my_int_array": [101, 2],
  "my_datetime": "2001-02-03T04:05:06Z",
  "embedded_lang_and_val": {
    "lang": "en",
    "value": "a sentence in default language english"
  }
}

Oh, attr_json also supports some built-in construction of postgres jsonb contains (“@>“) queries, with proper rails type-casting, through embedded models with keypaths:

MyModel.jsonb_contains(
  my_datetime: Date.today,
  "embedded_lang_and_val.lang" => "de"
) # an ActiveRelation, you can chain on whatever as usual

And it supports in-place mutations of the nested models, which I believe is important for them to work “naturally” as ruby objects.

my_model.embedded_lang_and_val.lang = "de"
my_model.embedded_lang_and_val_change 
# => will correctly return changes in terms of models themselves
my_model.save!

There are some other gems in this “space” of ActiveRecord attribute json serialization, with different fits for different use cases, created either before or after I created attr_json — but none provide quite this combination of features — or, I think, have architectures that make this combination feasible (I could be wrong!). Some to compare are jsonb_accessor, store_attribute, and store_model.

One use case where I think attr_json really excels is when using Rails Single-Table Inheritance, where different sub-classes may have different attributes.

And especially for a “content management system” type of use case, where on top of that single-table inheritance polymorphism, you can have complex hierarchical data structures, in an inheritance hierarchichy, where you don’t actually want or need the complexity of an actual normalized rdbms schema for the data that has both some polymorphism and some hetereogeneity. We get some aspects of a schema-less json-document-store, but embedded in postgres, without giving up rdbms features or ordinary ActiveRecord affordances.

Slow cadence, stability and maintainability

While the 2.0 release includes a few backwards incompats, it really should be an easy upgrade for most if not everyone. And it comes three and a half years after the 1.0 release. That’s a pretty good run.

Generally, I try to really prioritize backwards compatibility and maintainability, doing my best to avoid anything that could provide backwards incompat between major releases, and trying to keep major releases infrequent. I think that’s done well here.

I know that management of rails “plugin” dependencies can end up a nightmare, and I feel good about avoiding this with attr_json.

attr_json was actually originally developed for Rails 4.2 (!!), and has kept working all the way to Rails 7. The last attr_json 1.x release actually supported (in same codebase) Rails 5.0 through Rails 7.0 (!), and attr_json 2.0 supports 6.0 through 7.0. (also grateful to the quality and stability of the rails attributes API originally created by sgrif).

I think this succesfully makes maintenance easier for downstream users of attr_json, while also demonstrating success at prioritizing maintainability of attr_json itself — it hasn’t needed a whole lot of work on my end to keep working across Rails releases. Occasionally changes to the test harness are needed when a new Rails version comes out, but I actually can’t think of any changes needed to implementation itself for new Rails versions, although there may have been a few.

Because, yeah, it is true that this is still basically a one-maintainer project. But I’m pleased it has successfully gotten some traction from other users — 390 github “stars” is respectable if not huge, with occasional Issues and PR’s from third parties. I think this is a testament to it’s stability and reliability, rather than to any (almost non-existent) marketing I’ve done.

“Slow code”?

In working on this and other projects, I’ve come to think of a way of working on software that might be called “slow code”. To really get stability and backwards compatibility over time, one needs to be very careful about what one introduces into the codebase in the first place. And very careful about getting the fundamental architectural design of the code solid in the first place — coming up with something that is parsimonious (few architectural “concepts”) and consistent and coherent, but can handle what you will want to throw at it.

This sometimes leads me to holding back on satisfying feature requests, even if they come with pull requests, even if it seems like “not that much code” — if I’m not confident it can fit into the architecture in a consistent way. It’s a trade-off.

I realize that in many contemporary software development environments, it’s not always possible to work this way. I think it’s a kind of software craftsmanship for shared “library” code (mostly open source) that… I’m not sure how much our field/industry accomnodates development with (and the development of) this kind of craftsmanship these days. I appreciate working for a non-profit academic institute that lets me develop open source code in a context where I am given the space to attend to it with this kind of care.

The 2.0 Release

There aren’t actually any huge changes in the 2.0 release, mostly it just keeps on keeping on.

Mostly, 2.0 tries to make things adhere even closer and more consistently to what is expected of Rails attributes.

The “Attributes” API was still brand new in Rails 4.2 when this project started, but now that it has shown itself solid and mature, we can always create a “cover” Rails attribute in the ActiveRecord model, instead of making it “optional” as attr_json originally did. Which provides for some code simplification.

Some rough edges were sanded involved making Time/Date attributes timezone-aware in the way Rails usually does transparently. And with some underlying Rails bugs/inconsistencies having been long-fixed in Rails, they can now store miliseconds in JSON serialization rather than just whole seconds too.

I try to keep a good CHANGELOG, which you can consult for more.

The 2.0 release is expected to be a very easy migration for anyone on 1.x. If anyone on 1.x finds it challenging, please get in touch in a github issue or discussion, I’d like to make it easier for you if I can.

For my Library-Archives-Museums Rails people….

The original motivation from this came from trying to move off samvera (nee hydra) sufia/hyrax to an architecutre that was more “Rails-like”. But realizing that the way we wanted to model our data in a digital collections app along the lines of sufia/hyrax, would be rather too complicated to do with a reasonably normalized rdbms schema.

So… can we model things in the database in JSON — similar to how valkyrie-postgres would actually model things in postgres — but while maintaining an otherwise “Rails-like” development architecture? The answer: attr_json.

So, you could say the main original use case for attr_json was to persist a “PCDM“-ish data model ala sufia/hyrax, those kinds of use cases, in an rdbms, in a way that supported performant SQL queries (minimal queries per page, avoiding n+1 queries), in a Rails app using standard Rails tools and conventions, without an enormously complex expansive normalized rdbms schema.

While the effort to base hyrax on valkyrie is still ongoing, in order to allow postgres vs fedora (vs other possible future stores) to be a swappable choice in the same architecture — I know at least some institutions (like those of the original valkyrie authors) are using valkyrie in homegrown app directly, as the main persistence API (instead of ActiveRecord).

In some sense, valkyrie-postgres (in a custom app) vs attr-json (in a custom app) are two paths to “step off” the hyrax-fedora architecture. They both result in similar things actually stored in your rdbms (and we both chose postgres, for similar reasons, including I think good support for json(b)). They have both have advantages and disadvantages. Valkyrie-postgres kind of intentionally chooses not to use ActiveRecord (at least not in controllers/views etc, not in your business logic), one advantage of such is to get around some of the known widely-commented upon deficiencies and complaints with Rails standard ActiveRecord architecture.

Whereas I followed a different path with attr_json — how can we store things in postgres similarly, but while still using ActiveRecord in a very standard Rails way — how can we make it as standard a Rails way as possible? This maintains the disadvantages people sometimes complain about Rails architecture, but with the benefit of sticking to the standard Rails ecosystem, having less “custom community” stuff to maintain or figure out (including fewer lines of code in attr-json), being more familiar or accessible to Rails-experienced or trained developers.

At least that’s the idea, and several years later, I think it’s still working out pretty well.

In addition to attr_json, I wrote a layer on top to provide some parts on top of attr_json, that I thought would be both common and somewhat tricky in writing a pcdm/hyrax-ish digital collections app as “standard Rails as much as it makes sense”. This is kithe and it hasn’t had very much uptake. The only other user I’m aware of (who is using only a portion of what kithe provides; but kithe means to provide for that as a use case) is Eric Larson at https://github.com/geobtaa/geomg.

However, meanwhile, attr_json itself has gotten quite a bit more uptake — from wider Rails developer community, not our library-museum-archives community. attr_json’s 390 github stars isn’t that big in the wider world of things, but it’s pretty big for our corner of the world. (Compare to 160 for hyrax or 721 for blacklight). That the people using attr_json, and submitting Issues or Pull Requests largely aren’t library-museum-archives developers, I consider positive and encouraging, that it’s escaped the cultural-heritage-rails bubble, and is meeting a more domain-independent or domain-neutral need, at a lower level of architecture, with a broader potential community.

A tiny donation to rubyland.news would mean a lot

I started rubyland.news in 2016 because it was a thing I wanted to see for the ruby community. I had been feeling a shrinking of the ruby open source collaborative community, it felt like the room was emptying out.

If you find value in Rubyland News, just a few dollars contribution on my Github Sponsors page would be so appreciated.

I wanted to make people writing about ruby and what they were doing with it visible to each other and to the community, in order to try to (re)build/preserve/strengthen a self-conception as a community, connect people to each other, provide entry to newcomers, and just make it easier to find ruby news.

I’ve been solely responsible for its development, and editorial and technical operations. I think it’s been a success. I don’t have analytics, but it seems to be somewhat known and used.

Rubyland.news has never been a commercial project. I have never tried to “monetize” it. I don’t even really highlight my personal involvement much. I have in the past occasionally had modest paid sponsorship barely enough to cover expenses, but decided it wasn’t worth the effort.

I have and would never provide any kind of paid content placement, because I think that would be counter to my aims and values — I have had offers, specifically asking for paid placement not labelled as such, because apparently this is how the world works now, but I would consider that an unethical violation of trust.

It’s purely a labor or love, in attempted service to the ruby community, building what I want to see in the world as an offering of mutual aid.

So why am I asking for money?

The operations of Rubyland News don’t cost much, but they do cost something. A bit more since Heroku eliminated free dynos.

I currently pay for it out of my pocket, and mostly always have modulo occasional periods of tiny sponsorship. My pockets are doing just fine, but I do work for an academic non-profit, so despite being a software engineer the modest expenses are noticeable.

Sure, I could run it somewhere cheaper than heroku (and eventually might have to) — but I’m doing all this in my spare time, I don’t want to spend an iota more time or psychic energy on (to me) boring operational concerns than I need to. (But if you want to volunteer to take care of setting up, managing, and paying for deployment and operations on another platform, get in touch! Or if you are another platform that wants to host rubyland news for free!)

It would be nice to not have to pay for Rubyland News out of my pocket. But also, some donations would, as much as be monetarily helpful, also help motivate me to keep putting energy into this, showing me that the project really does have value to the community.

I’m not looking to make serious cash here. If I were able to get just $20-$40/month in donations, that would about pay my expenses (after taxes, cause I’d declare if i were getting that much), I’d be overjoyed. Even 5 monthly sustainers at just $1 would really mean a lot to me, as a demonstration of support.

You can donate one-time or monthly on my Github Sponsors page. The suggested levels are $1 and $5.

(If you don’t want to donate or can’t spare the cash, but do want to send me an email telling me about your use of rubyland news, I would love that too! I really don’t get much feedback! jonathan at rubyland.news)

Thanks

  • Thanks to anyone who donates anything at all
  • also to anyone who sends me a note to tell me that they value Rubyland News (seriously, I get virtually no feedback — telling me things you’d like to be better/different is seriously appreciated too! Or things you like about how it is now. I do this to serve the community, and appreciate feedback and suggestions!)
  • To anyone who reads Rubyland News at all
  • To anyone who blogs about ruby, especially if you have an RSS feed, especially if you are doing it as a hobbyist/community-member for purposes other than business leads!
  • To my current single monthly github sponsor, for $1, who shall remain unnamed because they listed their sponsorship as private
  • To anyone contributing in their own way to any part of open source communities for reasons other than profit, sometimes without much recognition, to help create free culture that isn’t just about exploiting each other!

vite-ruby for JS/CSS asset management in Rails

I recently switched to vite and vite-ruby for managing my JS and CSS assets in Rails. I was switching from a combination of Webpacker and sprockets — I moved all of my Webpacker and most of my sprockets to vite.

  • Note that vite-ruby has smooth ready-made integrations for Padrino, Hanami, and jekyll too, and possibly hook points for integrations with arbitrary ruby, plus could always just use vite without vite-ruby — but I’m using vite-ruby with Rails.

I am finding it generally pretty agreeble, so I thought I’d write up some of the things I like about it for others. And a few other notes.

I am definitely definitely not an expert in Javascript build systems (or JS generally), which both defines me as an audience for build tools, but also means I don’t always know how these things might compare with other options. The main other option I was considering was jsbundling-rails with esbuild and cssbundling-rails with SASS, but I didn’t get very far into the weeds of checking those out.

I moved almost all my JS and (S)CSS into being managed/built by vite.

My context

I work on a monolith “full stack” Rails application, with a small two-developer team.

I do not do any very fancy Javascript — this is not React or Vue or anything like that. It’s honestly pretty much “JQuery-style” (although increasingly I try to do it without jquery itself using just native browser API, it’s still pretty much that style).

Nonetheless, I have accumulated non-trivial Javascript/NPM dependencies, including things like video.js , @shoppify/draggable, fontawesome (v4), openseadragon. I need package management and I need building.

I also need something dirt simple. I don’t really know what I’m doing with JS, my stack may seem really old-fashioned, but here it is. Webpacker had always been a pain, I started using it to have something to manage and build NPM packages, but was still mid-stream in trying to switch all my sprockets JS over to webpacker when it was announced webpacker was no longer recommended/maintained by Rails. My CSS was still in sprockets all along.

Vite

One thing to know about vite is that it’s based on the idea of using different methods in dev vs production to build/serve your JS (and other managed assets). In “dev”, you ordinarily run a “vite server” which serves individual JS files, whereas for production you “build” more combined files.

Vite is basically an integration that puts together tools like esbuild and (in production) rollup, as well as integrating optional components like sass — making them all just work. It intends to be simple and provide a really good developer experience where doing simple best practice things is simple and needs little configuration.

vite-ruby tries to make that “just works” developer experience as good as Rubyists expect when used with ruby too — it intends to integrate with Rails as well as webpacker did, just doing the right thing for Rails.

Things I am enjoying with vite-ruby and Rails

  • You don’t need to run a dev server (like you do with jsbundling-rails and css-bundling rails)
    • If you don’t run the vite dev server, you’ll wind up with auto-built vite on-demand as needed, same as webpacker basically did.
    • This can be slow, but it works and is awesome for things like CI without having to configure or set up anything. If there have been no changes to your source, it is not slow, as it doesn’t need to re-build.
    • If you do want to run the dev server for much faster build times, hot module reload, better error messages, etc, vite-ruby makes it easy, just run ./bin/vite dev in a terminal.
  • If you DO run the dev server — you have only ONE dev-server to run, that will handle both JS and CSS
    • I’m honestly really trying to avoid the foreman approach taken by jsbundling-rails/cssbundling-rails, because of how it makes accessing the interactive debugger at a breakpoint much more complicated. Maybe with only one dev server (that is optional), I can handle running it manually without a procfile.
  • Handling SASS and other CSS with the same tool as JS is pretty great generally — you can even @import CSS from a javascript file, and also @import plain CSS too to aggregate into a single file server-side (without sass). With no non-default configuration, it just works, and will spit out stylesheet <link> tags, and it means your css/sass is going through the same processing whether you import it from .js or .css.
    • I handle fontawesome 4 this way. Include "font-awesome": "^4.7.0" in my package.json, then @import "font-awesome/css/font-awesome.css"; just works, and from either a .js or a .css file. It actually spits out not only the fontawesome CSS file, but also all the font files referenced from it and included in the npm package, in a way that just works. Amazing!!
    • Note how you can reference things from NPM packages with just package name. On google for some tools you find people doing contortions involving specifically referencing node-modules, I’m not sure if you really have to do this with latest versions of other tools but you def don’t with vite, it just works.
  • in general, I really appreciate vite’s clear opinionated guidance and focus on developer experience. Understanding all the options from the docs is not as hard because there are fewer options, but it does everything I need it to. vite-ruby succesfully carries this into ruby/Rails, it’s documentation is really good, without being enormous. In Rails, it just does what you want, automatically.
  • Vite supports source maps for SASS!
    • Not currently on by default, you have to add a simple config.
    • Unfortunately sass sourcemaps are NOT supported in production build mode, only in dev server mode. (I think I found a ticket for this, but can’t find it now)
    • But that’s still better than the official Rails options? I don’t understand how anyone develops SCSS without sourcemaps!
      • But even though sprockets 4.x finally supported JS sourcemaps, it does not work for SCSS! Even though there is an 18-month-old PR to fix it, it goes unreviewed by Rails core and unmerged.
      • Possibly even more suprisingly, SASS sourcemaps doesn’t seem to work for the newer cssbundling-rails=>sass solution either. https://github.com/rails/cssbundling-rails/issues/68
      • Previous to this switch, I was still using sprockets old-style “comments injected into CSS built files with original source file/line number” — that worked. But to give that up, and not get working scss sourcemaps in return? I think that would have been a blocker for me against cssbundling-rails/sass anyway… I feel like there’s something I’m missing, because I don’t understand how anyone is developing sass that way.

  • If you want to split up your js into several built files (“chunks), I love how easy it is. It just works. Vite/rollup will do it for you automatically for any dynamic runtime imports, which it also supports, just write import with parens, inside a callback or whatever, just works.

Things to be aware of

  • vite and vite-ruby by default will not create .gz variants of built JS and CSS
    • Depending on your deploy environment, this may not matter, maybe you have a CDN or nginx that will automatically create a gzip and cache it.
    • But in eg default heroku Rails deploy, it really really does. Default Heroku deploy uses the Rails app itself to deliver your assets. The Rails app will deliver content-encoding gzip if it’s there. If it’s not… when you switch to vite from webpacker/sprockets, you may now delivering uncommpressed JS and CSS with no other changes to your environment, with non-trivial performance implications but ones you may not notice.
    • Yeah, you could probably configure your CDN you hopefully have in front of your heroku app static assets to gzip for you, but you may not have noticed.
    • Fortunately it’s pretty easy to configure
  • There are some vite NPM packages involved (vite itself as well as some vite-ruby plugins), as well as the vite-ruby gem, and you have to keep them up to date in sync. You don’t want to be using a new version of vite NPM packages with too-old gem, or vice versa. (This is kind of a challenge in general with ruby gems with accompanying npm packages)
    • But vite_ruby actually includes a utility to check this on boot and complain if they’ve gotten out of sync! As well as tools for syncing them! Sweet!
    • But that can be a bit confusing sometimes if you’re running CI after an accidentally-out-of-sync upgrade, and all your tests are now failing with the failed sync check. But no big deal.

Things I like less

  • vite-ruby itself doesn’t seem to have a CHANGELOG or release notes, which I don’t love.
  • Vite is a newer tool written for modern JS, it mostly does not support CommonJS/node require, preferring modern import. In some cases that I can’t totally explain require in dependencies seems to work anyway… but something related to this stuff made it apparently impossible for me to import an old not-very-maintained dependency I had been importing fine in Webpacker. (I don’t know how it would have done with jsbundling-rails/esbuild). So all is not roses.

Am I worried that this is a third-party integration not blessed by Rails?

The vite-ruby maintainer ElMassimo is doing an amazing job. It is currently very well-maintained software, with frequent releases, quick turnaround from bug report to release, and ElMassimo is very repsonsive in github discussions.

But it looks like it is just one person maintaining. We know how open source goes. Am I worried that in the future some release of Rails might break vite-ruby in some way, and there won’t be a maintainer to fix it?

I mean… a bit? But let’s face it… Rails officially blessed solutions haven’t seemed very well-maintained for years now either! The three year gap of abandonware between the first sprockets 4.x beta and final release, followed by more radio silence? The fact that for a couple years before webpacker was officially retired it seemed to be getting no maintainance, including requiring dependency versions with CVE’s that just stayed that way? Not much documentation (ie Rails Guide) support for webpacker ever, or jsbundling-rails still?

One would think it might be a new leaf with css/jsbundling-rails… but I am still baffled by there being no support for sass sourcemaps in cssbundling-rails and sass! Official rails support doesn’t necessarily get you much “just works” DX when it comes to asset handling for years now.

Let’s face it, this has been an area where being in the Rails github org and/or being blessed by Rails docs has been no particular reason to expect maintenance or expect you won’t have problems down the line anyway. it’s open source, nobody owes you anything, maintainers spend time on what they have interest to spend time on (including time to review/merge/maintain other’s PR’s — which is def non-trivial time!) — it just is what it is.

While the vite-ruby code provides a pretty great integrated into Rails DX, its also actually mostly pretty simple code, especially when it comes to the Rails touch points most at risk of Rails breaking — it’s not doing anything too convoluted.

So, you know, you take your chances, I feel good about my chances compared to a css/jsbundling-rails solution. And if someday I have to switch things over again, oh well — Rails just pulled webpacker out from under us quicker than expected too, so you take your chances regardless!


(thanks to colleague Anna Headley for first suggesting we take a look at vite in Rails!)

Rails7 connection.select_all is stricter about it’s arguments in backwards incompat way: TypeError: Can’t Cast Array

I have code that wanted to execute some raw SQL against an ActiveRecord database. It is complicated and weird multi-table SQL (involving a postgres recursive CTE), so none of the specific-model-based API for specifying SQL seemed appropriate. It also needed to take some parameters, that needed to be properly escaped/sanitized.

At some point I decided that the right way to do this was with Model.connection.select_all , which would create a parameterized prepared statement.

Was I right? Is there a better way to do this? The method is briefly mentioned in the Rails Guide (demonstrating it is public API!), but without many details about the arguments. It has very limited API docs, just doc’d as: select_all(arel, name = nil, binds = [], preparable: nil, async: false), “Returns an ActiveRecord::Result instance.” No explanation of the type or semantics of the arguments.

In my code working on Rails previous to 7, the call looked like:

MyModel.connection.select_all(
  "select complicated_stuff WHERE something = $1",
  "my_complicated_stuff_name",
  [[nil, value_for_dollar_one_sub]],
  preparable: true
)
  • yeah that value for the binds is weird, a duple-array within an array, where the first value of the duple-array is just nil? This isn’t documented anywhere, I probably got that from somewhere… maybe one of the several StackOverflow answers.
  • I honestly don’t know what preparable: true does, or what difference it makes.

In Rails 7.0, this started failing with the error: TypeError: can’t cast Array.

I couldn’t find any documentation of that select_all all method at all, or other discussion of this; I couldn’t find any select_all change mentioned in the Rails Changelog. I tried looking at actual code history but got lost. I’m guessing “can’t cast Array” referes to that weird binds value… but what is it supposed to be?

Eventually I thought to look for Rails tests of this method that used the binds argument, and managed to eventually find one!

So… okay, rewrote that with new binds argument like so:

bind = ActiveRecord::Relation::QueryAttribute.new(
  "something", 
  value_for_dollar_one_sub, 
  ActiveRecord::Type::Value.new
)

MyModel.connection.select_all(
  "select complicated_stuff WHERE something = $1",
  "my_complicated_stuff_name",
  [bind],
  preparable: true
)
  • Confirmed this worked not only in Rails 7, but all the way back to Rails 5.2 no problem.
  • I guess that way I was doing it previously was some legacy way of passing args that was finally removed in Rails 7?
  • I still don’t really understand what I’m doing. The first arg to ActiveRecord::Relation::QueryAttribute.new I made match the SQL column it was going to be compared against, but I don’t know if it matters or if it’s used for anything. The third argument appears to be an ActiveRecord Type… I just left it the generic ActiveRecord::Type::Value.new, which seemed to work fine for both integer or string values, not sure in what cases you’d want to use a specific type value here, or what it would do.
  • In general, I wonder if there’s a better way for me to be doing what I’m doing here? It’s odd to me that nobody else findable on the internet has run into this… even though there are stackoverflow answers suggesting this approach… maybe i’m doing it wrong?

But anyways, since this was pretty hard to debug, hard to find in docs or explanations on google, and I found no mention at all of this changing/breaking in Rails 7… I figured I’d write it up so someone else had the chance of hitting on this answer.

Github Action setup-ruby needs to quote ‘3.0’ or will end up with ruby 3.1

You may be running builds in Github Actions using the setup-ruby action to install a chosen version of ruby, looking something like this:

    - name: Set up Ruby
      uses: ruby/setup-ruby@v1
      with:
        ruby-version: 3.0

A week ago, that would have installed the latest ruby 3.0.x. But as of the christmas release of ruby 3.1, it will install the latest ruby 3.1.x.

The workaround and/or correction is to quote the ruby version number. If you actually want to get latest ruby 3.0.x, say:

      with:
        ruby-version: '3.0'

This is reported here, with reference to this issue on the Github Actions runner itself. It is not clear to me that this is any kind of a bug in the github actions runner, rather than just an unanticipated consequence of using a numeric value in YAML here. 3.0 is of course the same number as 3, it’s not obvious to me it’s a bug that the YAML parser treats them as such.

Perhaps it’s a bug or mis-design in the setup-ruby action. But in lieu of any developers deciding it’s a bug… quote your 3.0 version number, or perhaps just quote all ruby version numbers with the setup-ruby task?

If your 3.0 builds started failing and you have no idea why — this could be it. It can be a bit confusing to diagnose, because I’m not sure anything in the Github Actions output will normally echo the ruby version in use? I guess there’s a clue in the “Installing Bundler” sub-head of the “Setup Ruby” task:

Of course it’s possible your build will succeed anyway on ruby 3.1 even if you meant to run it on ruby 3.0! Mine failed with LoadError: cannot load such file -- net/smtp, so if yours happened to do the same, maybe you got here from google. :) (Clearly net/smtp has been moved to a different status of standard gem in ruby 3.1, I’m not dealing with this further becuase I wasn’t intentionally supporting ruby 3.1 yet).

Note that if you are building with a Github actions matrix for ruby version, the same issue applies. Maybe something like:

matrix:
        include:
          - ruby: '3.0' 
 steps:
    - uses: actions/checkout@v2

    - name: Set up Ruby
      uses: ruby/setup-ruby@v1
      with:
        ruby-version: ${{ matrix.ruby }}

Notes on retrying all jobs with ActiveJob retry_on

I would like to configure all my ActiveJobs to retry on failure, and I’d like to do so with the ActiveJob retry_on method.

So I’m going to configure it in my ApplicationJob class, in order to retry on any error, maybe something like:

class ApplicationJob < ActiveJob::Base
  retry_on StandardError # other args to be discussed
end

Why use ActiveJob retry_on for this? Why StandardError?

Many people use backend-specific logic for retries, especially with Sidekiq. That’s fine!

I like the idea of using the ActiveJob functionality:

  • I currently use resque (more on challenges with retry here later), but plan to switch to something else at some point medium-term. Maybe sideqkiq, but maybe delayed_job or good_job. (Just using the DB and not having a redis is attractive to me, as is open source). I like the idea of not having to redo this setup when I switch back-ends, or am trying out different ones.
  • In general, I like the promise of ActiveJob as swappable commoditized backends
  • I like what I see as good_job’s philosophy here, why have every back-end reinvent the wheel when a feature can be done at the ActiveJob level? That can help keep the individual back-end smaller, and less “expensive” to maintain. good_job encourages you to use ActiveJob retries I think.

Note, dhh is on record from 2018 saying he thinks setting up retries for all StandardError is a bad idea. But I don’t really understand why! He says “You should know why you’d want to retry, and the code should document that knowledge.” — but the fact that so many ActiveJob back-ends provide “retry all jobs” functionality makes it seem to me an established common need and best practice, and why shouldn’t you be able to do it with ActiveJob alone?

dhh thinks ActiveJob retry is for specific targetted retries maybe, and the backend retry should be used for generic universal ones? Honestly I don’t see myself doing much specific targetted retries, making all your jobs idempotent (important! Best practice for ActiveJob always!), and just having them all retry on any error seems to me to be the way to go, a more efficient use of developer time and sufficient for at least a relatively simple app.

One situation I have where a retry is crucial, is when I have a fairly long-running job (say it takes more than 60 seconds to run; I have some unavoidably!), and the machine running the jobs needs to restart. It might interrupt the job. It is convenient if it is just automatically retried — put back in the queue to be run again by restarted or other job worker hosts! Otherwise it’s just sitting there failed, never to run again, requiring manual action. An automatic retry will take care of it almost invisibly.

Resque and Resque Scheduler

Resque by default doens’t supprot future-scheduled jobs. You can add them with the resque-scheduler plugin. But I had a perhaps irrational desire to avoid this — resque and it’s ecosystem have at different times had different amounts of maintenance/abandonment, and I’m (perhaps irrationally) reluctant to complexify my resque stack.

And do I need future scheduling for retries? For my most important use cases, it’s totally fine if I retry just once, immediately, with a wait: 0. Sure, that won’t take care of all potential use cases, but it’s a good start.

I thought even without resque supporting future-scheduling, i could get away with:

retry_on StandardError, wait: 0

Alas, this won’t actually work, it still ends up being converted to a future-schedule call, which gets rejected by the resque_adapter bundled with Rails unless you have resque-scheduler installed.

But of course, resque can handle wait:0 semantically, if the code was willing to do it by queing an ordinary resque job…. I don’t know if it’s a good idea, but this simple patch to Rails-bundled resque_adapter will make it willing to accept “scheduled” jobs when the time to be scheduled is actually “now”, just scheduling them normally, while still raising on attempts to future schedule. For me, it makes retry_on.... wait: 0 work with just plain resque.

Note: retry_on attempts count includes first run

So wanting to retry just once, I tried something like this:

# Will never actually retry
retry_on StandardError, attempts: 1

My job was never actually retried this way! It looks like the attempts count includes the first non-error run, the total number of times job will be run, including the very first one before any “retries”! So attempts 1 means “never retry” and does nothing. Oops. If you actually want to retry only once, in my Rails 6.1 app this is what did it for me:

# will actually retry once
retry_on StandardError, attempts: 2

(I think this means the default, attempts: 5 actually means your job can be run a total of 5 times– one original time and 4 retries. I guess that’s what was intended?)

Note: job_id stays the same through retries, hooray

By the way, I checked, and at least in Rails 6.1, the ActiveJob#job_id stays the same on retries. If the job runs once and is retried twice more, it’ll have the same job_id each time, you’ll see three Performing lines in your logs, with the same job_id.

Phew! I think that’s the right thing to do, so we can easily correlate these as retries of the same jobs in our logs. And if we’re keeping the job_id somewhere to check back and see if it succeeded or failed or whatever, it stays consistent on retry.

Glad this is what ActiveJob is doing!

Logging isn’t great, but can be customized

Rails will automatically log retries with a line that looks like this:

Retrying TestFailureJob in 0 seconds, due to a RuntimeError.
# logged at `info` level

Eventually when it decides it’s attempts are exhausted, it’ll say something like:

Stopped retrying TestFailureJob due to a RuntimeError, which reoccurred on 2 attempts.
# logged at `error` level

This does not include the job-id though, which makes it harder than it should be to correlate with other log lines about this job, and follow the job’s whole course through your log file.

It’s also inconsistent with other default ActiveJob log lines, which include:

  • the Job ID in text
  • tags (Rails tagged logging system) with the job id and the string "[ActiveJob]". Because of the way the Rails code applies these only around perform/enqueue, retry/discard related log lines apparently end up not included.
  • The Exception message not just the class when there’s a class.

You can see all the built-in ActiveJob logging in the nicely compact ActiveJob::LogSubscriber class. And you can see how the log line for retry is kind of inconsistent with eg perform.

Maybe this inconsistency has persisted so long in part because few people actually use ActiveJob retry, they’re all still using their backends backend-specific functionality? I did try a PR to Rails for at least consistent formatting (my PR doesn’t do tagging), not sure if it will go anywhere, I think blind PR’s to Rails usually do not.

In the meantime, after trying a bunch of different things, I think I figured out the reasonable way to use the ActiveSupport::Notifications/LogSubscriber API to customize logging for the retry-related events while leaving it untouched from Rails for the others? See my solution here.

(Thanks to BigBinary blog for showing up in google and giving me a head start into figuring out how ActiveJob retry logging was working.)

(note: There’s also this: https://github.com/armandmgt/lograge_active_job But I’m not sure how working/maintained it is. It seems to only customize activejob exception reports, not retry and other events. It would be an interesting project to make an up-to-date activejob-lograge that applied to ALL ActiveJob logging, expressing every event as key/values and using lograge formatter settings to output. I think we see exactly how we’d do that, with a custom log subscriber as we’ve done above!)

Warning: ApplicationJob configuration won’t work for emails

You might think since we configured retry_on on ApplicationJob, all our bg jobs are now set up for retrying.

Oops! Not deliver_later emails.

Good_job README explains that ActiveJob mailers don’t descend from ApplicationMailer. (I am curious if there’s any good reason for this, it seems like it would be nice if they did!)

The good_job README provides one way to configure the built-in Rails mailer superclass for retries.

You could maybe also try setting delivery_job on that mailer superclass to use a custom delivery job (thanks again BigBinary for the pointer)… maybe one that subclasses the default class to deliver emails as normal, but let you set some custom options like retry_on? Not sure if this would be preferable in any way.

logging URI query params with lograge

The lograge gem for taming Rails logs by default will lot the path component of the URI, but leave out the query string/query params.

For instance, perhaps you have a URL to your app /search?q=libraries.

lograge will log something like:

method=GET path=/search format=html

The q=libraries part is completely left out of the log. I kinda want that part, it’s important.

The lograge README provides instructions for “logging request parameters”, by way of the params hash.

I’m going to modify them a bit slightly to:

  • use the more recent custom_payload config instead of custom_options. (I’m not certain why there are both, but I think mostly for legacy reasons and newer custom_payload? is what you should read for?)
  • If we just put params in there, then a bunch of ugly <ActionController::Parameters show up in the log if you have nested hash params. We could fix that with params.to_unsafe_h, but…
  • We should really use request.filtered_parameters instead to make sure we’re not logging anything that’s been filtered out with Rails 6 config.filter_parameters. (Thanks /u/ezekg on reddit). This also converts to an ordinary hash that isn’t ActionController::Parameters, taking care of previous bullet point.
  • (It kind of seems like lograge README could use a PR updating it?)
  config.lograge.custom_payload do |controller|
    exceptions = %w(controller action format id)
    params: controller.request.filtered_parameters.except(*exceptions)
  end

That gets us a log line that might look something like this:

method=GET path=/search format=html controller=SearchController action=index status=200 duration=107.66 view=87.32 db=29.00 params={"q"=>"foo"}

OK. The params hash isn’t exactly the same as the query string, it can include things not in the URL query string (like controller and action, that we have to strip above, among others), and it can in some cases omit things that are in the query string. It just depends on your routing and other configuration and logic.

The params hash itself is what default rails logs… but what if we just log the actual URL query string instead? Benefits:

  • it’s easier to search the logs for actually an exact specific known URL (which can get more complicated like /search?q=foo&range%5Byear_facet_isim%5D%5Bbegin%5D=4&source=foo or something). Which is something I sometimes want to do, say I got a URL reported from an error tracking service and now I want to find that exact line in the log.
  • I actually like having the exact actual URL (well, starting from path) in the logs.
  • It’s a lot simpler, we don’t need to filter out controller/action/format/id etc.
  • It’s actually a bit more concise? And part of what I’m dealing with in general using lograge is trying to reduce my bytes of logfile for papertrail!

Drawbacks?

  • if you had some kind of structured log search (I don’t at present, but I guess could with papertrail features by switching to json format?), it might be easier to do something like “find a /search with q=foo and source=ef without worrying about other params)
  • To the extent that params hash can include things not in the actual url, is that important to log like that?
  • ….?

Curious what other people think… am I crazy for wanting the actual URL in there, not the params hash?

At any rate, it’s pretty easy to do. Note we use filtered_path rather than fullpath to again take account of Rails 6 parameter filtering, and thanks again /u/ezekg:

  config.lograge.custom_payload do |controller|
    {
      path: controller.request.filtered_path
    }
  end

This is actually overwriting the default path to be one that has the query string too:

method=GET path=/search?q=libraries format=html ...

You could of course add a different key fullpath instead, if you wanted to keep path as it is, perhaps for easier collation in some kind of log analyzing system that wants to group things by same path invariant of query string.

I’m gonna try this out!

Meanwhile, on lograge…

As long as we’re talking about lograge…. based on commit history, history of Issues and Pull Requests… the fact that CI isn’t currently running (travis.org grr) and doesn’t even try to test on Rails 6.0+ (although lograge seems to work fine)… one might worry that lograge is currently un/under-maintained…. No comment on a GH issue filed in May asking about project status.

It still seems to be one of the more popular solutions to trying to tame Rails kind of out of control logs. It’s mentioned for instance in docs from papertrail and honeybadger, and many many other blog posts.

What will it’s future be?

Looking around for other possibilties, I found semantic_logger (rails_semantic_logger). It’s got similar features. It seems to be much more maintained. It’s got a respectable number of github stars, although not nearly as many as lograge, and it’s not featured in blogs and third-party platform docs nearly as much.

It’s also a bit more sophisticated and featureful. For better or worse. For instance mainly I’m thinking of how it tries to improve app performance by moving logging to a background thread. This is neat… and also can lead to a whole new class of bug, mysterious warning, or configuration burden.

For now I’m sticking to the more popular lograge, but I wish it had CI up that was testing with Rails 6.1, at least!

Incidentally, trying to get Rails to log more compactly like both lograge and rails_semantic_logger do… is somewhat more complicated than you might expect, as demonstrated by the code in both projects that does it! Especially semantic_logger is hundreds of lines of somewhat baroque code split accross several files. A refactor of logging around Rails 5 (I think?) to use ActiveSupport::LogSubscriber made it possible to customize Rails logging like this (although I think both lograge and rails_semantic_logger still do some monkey-patching too!), but in the end didn’t make it all that easy or obvious or future-proof. This may discourage too many other alternatives for the initial primary use case of both lograge and rails_semantic_logger — turn a rails action into one log line, with a structured format.

Notes on Cloudfront in front of Rails Assets on Heroku, with CORS

Heroku really recommends using a CDN in front of your Rails app static assets — which, unlike in non-heroku circumstances where a web server like nginx might be taking care of it, otherwise on heroku static assets will be served directly by your Rails app, consuming limited/expensive dyno resources.

After evaluating a variety of options (including some heroku add-ons), I decided AWS Cloudfront made the most sense for us — simple enough, cheap, and we are already using other direct AWS services (including S3 and SES).

While heroku has an article on using Cloudfront, which even covers Rails specifically, and even CORS issues specifically, I found it a bit too vague to get me all the way there. And while there are lots of blog posts you can find on this topic, I found many of them outdated (Rails has introduced new API; Cloudfront has also changed it’s configuration options!), or otherwise spotty/thin.

So while I’m not an expert on this stuff, i’m going to tell you what I was able to discover, and what I did to set up Cloudfront as a CDN in front of Rails static assets running on heroku — although there’s really nothing at all specific to heroku here, if you have any other context where Rails is directly serving assets in production.

First how I set up Rails, then Cloudfront, then some notes and concerns. Btw, you might not need to care about CORS here, but one reason you might is if you are serving any fonts (including font-awesome or other icon fonts!) from Rails static assets.

Rails setup

In config/environments/production.rb

# set heroku config var RAILS_ASSET_HOST to your cloudfront
# hostname, will look like `xxxxxxxx.cloudfront.net`
config.asset_host = ENV['RAILS_ASSET_HOST']

config.public_file_server.headers = {
  # CORS:
  'Access-Control-Allow-Origin' => "*", 
  # tell Cloudfront to cache a long time:
  'Cache-Control' => 'public, max-age=31536000' 
}

Cloudfront Setup

I changed some things from default. The only one that absolutely necessary — if you want CORS to work — seemed to be changing Allowed HTTP Methods to include OPTIONS.

Click on “Create Distribution”. All defaults except:

  • Origin Domain Name: your heroku app host like app-name.herokuapp.com
  • Origin protocol policy: Switch to “HTTPS Only”. Seems like a good idea to ensure secure traffic between cloudfront and origin, no?
  • Allowed HTTP Methods: Switch to GET, HEAD, OPTIONS. In my experimentation, necessary for CORS from a browser to work — which AWS docs also suggest.
  • Cached HTTP Methods: Click “OPTIONS” too now that we’re allowing it, I don’t see any reason not to?
  • Compress objects automatically: yes
    • Sprockets is creating .gz versions of all your assets, but they’re going to be completely ignored in a Cloudfront setup either way. ☹️ (Is there a way to tell Sprockets to stop doing it? WHO KNOWS not me, it’s so hard to figure out how to reliably talk to Sprockets). But we can get what it was trying to do by having Cloudfront encrypt stuff for us, seems like a good idea, Google PageSpeed will like it, etc?
    • I noticed by experimentation that Cloudfront will compress CSS and JS (sometimes with brotli sometimes gz, even with the same browser, don’t know how it decides, don’t care), but is smart enough not to bother trying to compress a .jpg or .png (which already has internal compression).
  • Comment field: If there’s a way to edit it after you create the distribution, I haven’t found it, so pick a good one!

Notes on CORS

AWS docs here and here suggest for CORS support you also need to configure the Cloudfront distribution to forward additional headers — Origin, and possibly Access-Control-Request-Headers and Access-Control-Request-Method. Which you can do by setting up a custom “cache policy”. Or maybe instead by by setting the “Origin Request Policy”. Or maybe instead by setting custom cache header settings differently using the Use legacy cache settings option. It got confusing — and none of these settings seemed to be necessary to me for CORS to be working fine, nor could I see any of these settings making any difference in CloudFront behavior or what headers were included in responses.

Maybe they would matter more if I were trying to use a more specific Access-Control-Allow-Origin than just setting it to *? But about that….

If you set Access-Control-Allow-Origin to a single host, MDN docs say you have to also return a Vary: Origin header. Easy enough to add that to your Rails config.public_file_server.headers. But I couldn’t get Cloudfront to forward/return this Vary header with it’s responses. Trying all manner of cache policy settings, referring to AWS’s quite confusing documentation on the Vary header in Cloudfront and trying to do what it said — couldn’t get it to happen.

And what if you actually need more than one allowed origin? Per spec Access-Control-Allow-Origin as again explained by MDN, you can’t just include more than one in the header, the header is only allowed one: ” If the server supports clients from multiple origins, it must return the origin for the specific client making the request.” And you can’t do that with Rails static/global config.public_file_server.headers, we’d need to use and setup rack-cors instead, or something else.

So I just said, eh, * is probably just fine. I don’t think it actually involves any security issues for rails static assets to do this? I think it’s probably what everyone else is doing?

The only setup I needed for this to work was setting Cloudfront to allow OPTIONS HTTP method, and setting Rails config.public_file_server.headers to include 'Cache-Control' => 'public, max-age=31536000'.

Notes on Cache-Control max-age

A lot of the existing guides don’t have you setting config.public_file_server.headers to include 'Cache-Control' => 'public, max-age=31536000'.

But without this, will Cloudfront actually be caching at all? If with every single request to cloudfront, cloudfront makes a request to the Rails app for the asset and just proxies it — we’re not really getting much of the point of using Cloudfront in the first place, to avoid the traffic to our app!

Well, it turns out yes, Cloudfront will cache anyway. Maybe because of the Cloudfront Default TTL setting? My Default TTL was left at the Cloudfront default, 86400 seconds (one day). So I’d think that maybe Cloudfront would be caching resources for a day when I’m not supplying any Cache-Control or Expires headers?

In my observation, it was actually caching for less than this though. Maybe an hour? (Want to know if it’s caching or not? Look at headers returned by Cloudfront. One easy way to do this? curl -IXGET https://whatever.cloudfront.net/my/asset.jpg, you’ll see a header either x-cache: Miss from cloudfront or x-cache: Hit from cloudfront).

Of course, Cloudfront doesn’t promise to cache for as long as it’s allowed to, it can evict things for it’s own reasons/policies before then, so maybe that’s all that’s going on.

Still, Rails assets are fingerprinted, so they are cacheable forever, so why not tell Cloudfront that? Maybe more importantly, if Rails isn’t returning a Cache-Cobntrol header, then Cloudfront isn’t either to actual user-agents, which means they won’t know they can cache the response in their own caches, and they’ll keep requesting/checking it on every reload too, which is not great for your far too large CSS and JS application files!

So, I think it’s probably a great idea to set the far-future Cache-Control header with config.public_file_server.headers as I’ve done above. We tell Cloudfront it can cache for the max-allowed-by-spec one year, and this also (I checked) gets Cloudfront to forward the header on to user-agents who will also know they can cache.

Note on limiting Cloudfront Distribution to just static assets?

The CloudFront distribution created above will actually proxy/cache our entire Rails app, you could access dynamic actions through it too. That’s not what we intend it for, our app won’t generate any URLs to it that way, but someone could.

Is that a problem?

I don’t know?

Some blog posts try to suggest limiting it only being willing to proxy/cache static assets instead, but this is actually a pain to do for a couple reasons:

  1. Cloudfront has changed their configuration for “path patterns” since many blog posts were written (unless you are using “legacy cache settings” options), such that I’m confused about how to do it at all, if there’s a way to get a distribution to stop caching/proxying/serving anything but a given path pattern anymore?
  2. Modern Rails with webpacker has static assets at both /assets and /packs, so you’d need two path patterns, making it even more confusing. (Why Rails why? Why aren’t packs just at public/assets/packs so all static assets are still under /assets?)

I just gave up on figuring this out and figured it isn’t really a problem that Cloudfront is willing to proxy/cache/serve things I am not intending for it? Is it? I hope?

Note on Rails asset_path helper and asset_host

You may have realized that Rails has both asset_path and asset_url helpers for linking to an asset. (And similar helpers with dashes instead of underscores in sass, and probably different implementations, via sass-rails)

Normally asset_path returns a relative URL without a host, and asset_url returns a URL with a hostname in it. Since using an external asset_host requires we include the host with all URLs for assets to properly target CDN… you might think you have to stop using asset_path anywhere and just use asset_urlYou would be wrong.

It turns out if config.asset_host is set, asset_path starts including the host too. So everything is fine using asset_path. Not sure if at that point it’s a synonym for asset_url? I think not entirely, because I think in fact once I set config.asset_host, some of my uses of asset_url actually started erroring and failing tests? And I had to actually only use asset_path? In ways I don’t really understand what’s going on and can’t explain it?

Ah, Rails.