vite-ruby for JS/CSS asset management in Rails

I recently switched to vite and vite-ruby for managing my JS and CSS assets in Rails, coming from a combination of Webpacker and sprockets — I moved all of my Webpacker and most of my sprockets assets to vite.

  • Note that vite-ruby has smooth ready-made integrations for Padrino, Hanami, and Jekyll too, and possibly hook points for integrations with arbitrary ruby projects — and you could always just use vite without vite-ruby — but I’m using vite-ruby with Rails.

I am finding it generally pretty agreeable, so I thought I’d write up some of the things I like about it for others. And a few other notes.

I am definitely not an expert in Javascript build systems (or JS generally), which both makes me part of the audience for build tools, and also means I don’t always know how these things might compare with other options. The main other option I was considering was jsbundling-rails with esbuild and cssbundling-rails with SASS, but I didn’t get very far into the weeds of checking those out.

I moved almost all my JS and (S)CSS into being managed/built by vite.

My context

I work on a monolith “full stack” Rails application, with a small two-developer team.

I do not do any very fancy Javascript — this is not React or Vue or anything like that. It’s honestly pretty much “jQuery-style” (although increasingly I try to do it without jQuery itself, using just native browser APIs, it’s still pretty much that style).

Nonetheless, I have accumulated non-trivial Javascript/NPM dependencies, including things like video.js, @shopify/draggable, fontawesome (v4), and openseadragon. I need package management and I need building.

I also need something dirt simple. I don’t really know what I’m doing with JS; my stack may seem really old-fashioned, but here it is. Webpacker had always been a pain. I started using it to have something to manage and build NPM packages, but was still mid-stream in switching all my sprockets JS over to webpacker when it was announced that webpacker was no longer recommended/maintained by Rails. My CSS was still in sprockets all along.

Vite

One thing to know about vite is that it’s based on the idea of using different methods in dev vs production to build/serve your JS (and other managed assets). In “dev”, you ordinarily run a “vite server” which serves individual JS files, whereas for production you “build” more combined files.

Vite is basically an integration that puts together tools like esbuild and (in production) rollup, as well as integrating optional components like sass — making them all just work. It intends to be simple and provide a really good developer experience where doing simple best practice things is simple and needs little configuration.

vite-ruby tries to make that “just works” developer experience as good as Rubyists expect when used with ruby too — it intends to integrate with Rails as well as webpacker did, just doing the right thing for Rails.

Things I am enjoying with vite-ruby and Rails

  • You don’t need to run a dev server (like you do with jsbundling-rails and cssbundling-rails)
    • If you don’t run the vite dev server, you’ll wind up with assets auto-built by vite on demand, same as webpacker basically did.
    • This can be slow, but it works and is awesome for things like CI without having to configure or set up anything. If there have been no changes to your source, it is not slow, as it doesn’t need to re-build.
    • If you do want to run the dev server for much faster build times, hot module reload, better error messages, etc., vite-ruby makes it easy — just run ./bin/vite dev in a terminal.
  • If you DO run the dev server — you have only ONE dev-server to run, that will handle both JS and CSS
    • I’m honestly really trying to avoid the foreman approach taken by jsbundling-rails/cssbundling-rails, because of how it makes accessing the interactive debugger at a breakpoint much more complicated. Maybe with only one dev server (that is optional), I can handle running it manually without a procfile.
  • Handling SASS and other CSS with the same tool as JS is pretty great generally — you can even import CSS from a javascript file, and @import plain CSS too, to aggregate into a single built file (without sass). With no non-default configuration it just works, and will spit out stylesheet <link> tags, and it means your css/sass is going through the same processing whether you import it from .js or .css. (A tag-helper sketch follows this list.)
    • I handle fontawesome 4 this way. Include "font-awesome": "^4.7.0" in my package.json, then @import "font-awesome/css/font-awesome.css"; just works, and from either a .js or a .css file. It actually spits out not only the fontawesome CSS file, but also all the font files referenced from it and included in the npm package, in a way that just works. Amazing!!
    • Note how you can reference things from NPM packages with just the package name. On google for some tools you find people doing contortions involving specifically referencing node_modules. I’m not sure if you really have to do this with the latest versions of other tools, but you definitely don’t with vite — it just works.
  • In general, I really appreciate vite’s clear opinionated guidance and focus on developer experience. Understanding all the options from the docs is not as hard because there are fewer options, but it does everything I need it to. vite-ruby successfully carries this into ruby/Rails; its documentation is really good, without being enormous. In Rails, it just does what you want, automatically.
  • Vite supports source maps for SASS!
    • Not currently on by default, you have to add a simple config.
    • Unfortunately sass sourcemaps are NOT supported in production build mode, only in dev server mode. (I think I found a ticket for this, but can’t find it now)
    • But that’s still better than the official Rails options? I don’t understand how anyone develops SCSS without sourcemaps!
      • But even though sprockets 4.x finally supported JS sourcemaps, that support does not work for SCSS! Even though there is an 18-month-old PR to fix it, it goes unreviewed by Rails core and unmerged.
      • Possibly even more surprisingly, SASS sourcemaps don’t seem to work for the newer cssbundling-rails=>sass solution either. https://github.com/rails/cssbundling-rails/issues/68
      • Prior to this switch, I was still using sprockets’ old-style “comments injected into built CSS with original source file/line number” — that worked. But to give that up, and not get working scss sourcemaps in return? I think that would have been a blocker for me against cssbundling-rails/sass anyway… I feel like there’s something I’m missing, because I don’t understand how anyone is developing sass that way.

  • If you want to split up your js into several built files (“chunks”), I love how easy it is. It just works. Vite/rollup will do it for you automatically for any dynamic runtime imports, which it also supports — just write import() with parens, inside a callback or whatever. Just works.
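As a concrete taste of that Rails integration: in your layout, vite-ruby gives you tag helpers that point at the dev server or the built assets as appropriate. A minimal sketch (helper names per my reading of the vite-ruby docs — double-check there for current usage):

<%# app/views/layouts/application.html.erb %>
<%= vite_client_tag %>                   <%# enables hot module reload when the dev server runs %>
<%= vite_javascript_tag 'application' %> <%# JS entrypoint; also emits <link> tags for CSS it imports %>
<%= vite_stylesheet_tag 'application' %> <%# a standalone CSS/SCSS entrypoint %>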

Things to be aware of

  • vite and vite-ruby by default will not create .gz variants of built JS and CSS
    • Depending on your deploy environment, this may not matter, maybe you have a CDN or nginx that will automatically create a gzip and cache it.
    • But in eg a default heroku Rails deploy, it really really does. A default Heroku deploy uses the Rails app itself to deliver your assets. The Rails app will deliver content-encoding gzip if it’s there. If it’s not… when you switch to vite from webpacker/sprockets, you may now be delivering uncompressed JS and CSS with no other changes to your environment, with non-trivial performance implications you may not notice.
    • Yeah, you could probably configure the CDN you hopefully have in front of your heroku app’s static assets to gzip for you — but you may not have noticed the problem in the first place.
    • Fortunately it’s pretty easy to configure (one possible workaround sketched after this list).
  • There are some vite NPM packages involved (vite itself as well as some vite-ruby plugins), as well as the vite-ruby gem, and you have to keep them in sync as you upgrade. You don’t want to be using a new version of the vite NPM packages with a too-old gem, or vice versa. (This is kind of a challenge in general with ruby gems that have accompanying npm packages.)
    • But vite_ruby actually includes a utility to check this on boot and complain if they’ve gotten out of sync! As well as tools for syncing them! Sweet!
    • But that can be a bit confusing sometimes if you’re running CI after an accidentally-out-of-sync upgrade, and all your tests are now failing with the failed sync check. But no big deal.
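On the gzip point — one possible workaround (my own sketch, not something vite or vite-ruby provides; there are also compression plugins on the vite/rollup side) is to gzip the built files yourself after precompile. It assumes vite-ruby’s default output location of public/vite/assets; adjust for your setup:

# lib/tasks/gzip_vite_assets.rake
# Sketch: create .gz variants of built assets so Rails/heroku static
# file serving can deliver content-encoding gzip. Paths are assumptions.
require "zlib"

namespace :assets do
  task :gzip_vite do
    Dir[Rails.root.join("public", "vite", "assets", "*.{js,css,svg}")].each do |path|
      Zlib::GzipWriter.open("#{path}.gz") do |gz|
        gz.mtime = File.mtime(path).to_i
        gz.write File.binread(path)
      end
    end
  end
end

# Run it automatically after the standard precompile:
Rake::Task["assets:precompile"].enhance do
  Rake::Task["assets:gzip_vite"].invoke
end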

Things I like less

  • vite-ruby itself doesn’t seem to have a CHANGELOG or release notes, which I don’t love.
  • Vite is a newer tool written for modern JS; it mostly does not support CommonJS/node require, preferring modern import. In some cases that I can’t totally explain, require in dependencies seems to work anyway… but something related to this stuff made it apparently impossible for me to import an old not-very-maintained dependency I had been importing fine in Webpacker. (I don’t know how it would have done with jsbundling-rails/esbuild.) So all is not roses.

Am I worried that this is a third-party integration not blessed by Rails?

The vite-ruby maintainer ElMassimo is doing an amazing job. It is currently very well-maintained software, with frequent releases, quick turnaround from bug report to release, and ElMassimo is very responsive in github discussions.

But it looks like it is just one person maintaining. We know how open source goes. Am I worried that in the future some release of Rails might break vite-ruby in some way, and there won’t be a maintainer to fix it?

I mean… a bit? But let’s face it… Rails officially blessed solutions haven’t seemed very well-maintained for years now either! The three-year gap of abandonware between the first sprockets 4.x beta and final release, followed by more radio silence? The fact that for a couple years before webpacker was officially retired it seemed to be getting no maintenance, including requiring dependency versions with CVEs that just stayed that way? Not much documentation (ie Rails Guide) support for webpacker ever, or jsbundling-rails still?

One would think it might be a new leaf with css/jsbundling-rails… but I am still baffled by there being no support for sass sourcemaps in cssbundling-rails with sass! Official Rails support hasn’t necessarily gotten you much “just works” DX when it comes to asset handling for years now.

Let’s face it, this has been an area where being in the Rails github org and/or being blessed by Rails docs has been no particular reason to expect maintenance, or to expect you won’t have problems down the line anyway. It’s open source, nobody owes you anything, maintainers spend time on what they have interest to spend time on (including time to review/merge/maintain others’ PRs — which is definitely non-trivial time!) — it just is what it is.

While the vite-ruby code provides a pretty great integrated-into-Rails DX, it’s also actually mostly pretty simple code, especially when it comes to the Rails touch points most at risk of Rails breaking — it’s not doing anything too convoluted.

So, you know, you take your chances, I feel good about my chances compared to a css/jsbundling-rails solution. And if someday I have to switch things over again, oh well — Rails just pulled webpacker out from under us quicker than expected too, so you take your chances regardless!


(thanks to colleague Anna Headley for first suggesting we take a look at vite in Rails!)


Rails 7 connection.select_all is stricter about its arguments in a backwards-incompatible way: TypeError: Can’t Cast Array

I have code that wanted to execute some raw SQL against an ActiveRecord database. It is complicated and weird multi-table SQL (involving a postgres recursive CTE), so none of the specific-model-based API for specifying SQL seemed appropriate. It also needed to take some parameters, that needed to be properly escaped/sanitized.

At some point I decided that the right way to do this was with Model.connection.select_all, which would create a parameterized prepared statement.

Was I right? Is there a better way to do this? The method is briefly mentioned in the Rails Guide (demonstrating it is public API!), but without many details about the arguments. It has very limited API docs, just doc’d as: select_all(arel, name = nil, binds = [], preparable: nil, async: false), “Returns an ActiveRecord::Result instance.” No explanation of the type or semantics of the arguments.

In my code working on Rails previous to 7, the call looked like:

MyModel.connection.select_all(
  "select complicated_stuff WHERE something = $1",
  "my_complicated_stuff_name",
  [[nil, value_for_dollar_one_sub]],
  preparable: true
)
  • Yeah, that value for the binds is weird — a two-element array within an array, where the first element of the inner array is just nil? This isn’t documented anywhere; I probably got that from somewhere… maybe one of the several StackOverflow answers.
  • I honestly don’t know what preparable: true does, or what difference it makes.

In Rails 7.0, this started failing with the error: TypeError: can’t cast Array.

I couldn’t find any documentation of that select_all method at all, or other discussion of this; I couldn’t find any select_all change mentioned in the Rails Changelog. I tried looking at actual code history but got lost. I’m guessing “can’t cast Array” refers to that weird binds value… but what is it supposed to be?

Eventually I thought to look for Rails tests of this method that used the binds argument, and managed to eventually find one!

So… okay, rewrote that with new binds argument like so:

bind = ActiveRecord::Relation::QueryAttribute.new(
  "something", 
  value_for_dollar_one_sub, 
  ActiveRecord::Type::Value.new
)

MyModel.connection.select_all(
  "select complicated_stuff WHERE something = $1",
  "my_complicated_stuff_name",
  [bind],
  preparable: true
)
  • Confirmed this worked not only in Rails 7, but all the way back to Rails 5.2 no problem.
  • I guess the way I was doing it previously was some legacy way of passing args that was finally removed in Rails 7?
  • I still don’t really understand what I’m doing. The first arg to ActiveRecord::Relation::QueryAttribute.new I made match the SQL column it was going to be compared against, but I don’t know if it matters or if it’s used for anything. The third argument appears to be an ActiveRecord Type… I just left it the generic ActiveRecord::Type::Value.new, which seemed to work fine for both integer and string values; not sure in what cases you’d want to use a specific type value here, or what it would do (a guessed example after this list).
  • In general, I wonder if there’s a better way for me to be doing what I’m doing here? It’s odd to me that nobody else findable on the internet has run into this… even though there are stackoverflow answers suggesting this approach… maybe i’m doing it wrong?
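For illustration, here’s my guess at what a more specific type buys you — casting. This is hypothetical; I haven’t needed it myself:

# Hypothetical: a typed bind. The type object casts/serializes the
# value, so the string "42" would be bound as the integer 42.
typed_bind = ActiveRecord::Relation::QueryAttribute.new(
  "something",
  "42",
  ActiveRecord::Type::Integer.new
)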

But anyways, since this was pretty hard to debug, hard to find in docs or explanations on google, and since I found no mention at all of this changing/breaking in Rails 7… I figured I’d write it up so someone else has the chance of hitting on this answer.

Github Action setup-ruby needs to quote ‘3.0’ or will end up with ruby 3.1

You may be running builds in Github Actions using the setup-ruby action to install a chosen version of ruby, looking something like this:

    - name: Set up Ruby
      uses: ruby/setup-ruby@v1
      with:
        ruby-version: 3.0

A week ago, that would have installed the latest ruby 3.0.x. But as of the Christmas release of ruby 3.1, it will install the latest ruby 3.1.x.

The workaround and/or correction is to quote the ruby version number. If you actually want to get latest ruby 3.0.x, say:

      with:
        ruby-version: '3.0'

This is reported here, with reference to this issue on the Github Actions runner itself. It is not clear to me that this is any kind of a bug in the github actions runner, rather than just an unanticipated consequence of using a numeric value in YAML here. 3.0 is of course the same number as 3, it’s not obvious to me it’s a bug that the YAML parser treats them as such.

Perhaps it’s a bug or mis-design in the setup-ruby action. But in lieu of any developers deciding it’s a bug… quote your 3.0 version number, or perhaps just quote all ruby version numbers with the setup-ruby task?

If your 3.0 builds started failing and you have no idea why — this could be it. It can be a bit confusing to diagnose, because I’m not sure anything in the Github Actions output will normally echo the ruby version in use? I guess there’s a clue in the “Installing Bundler” sub-head of the “Setup Ruby” task.

Of course it’s possible your build will succeed anyway on ruby 3.1 even if you meant to run it on ruby 3.0! Mine failed with LoadError: cannot load such file -- net/smtp, so if yours happened to do the same, maybe you got here from google. :) (Clearly net/smtp has been moved to a different status of standard gem in ruby 3.1; I’m not dealing with this further because I wasn’t intentionally supporting ruby 3.1 yet.)

Note that if you are building with a Github actions matrix for ruby version, the same issue applies. Maybe something like:

    strategy:
      matrix:
        include:
          - ruby: '3.0'

    steps:
      - uses: actions/checkout@v2

      - name: Set up Ruby
        uses: ruby/setup-ruby@v1
        with:
          ruby-version: ${{ matrix.ruby }}

Notes on retrying all jobs with ActiveJob retry_on

I would like to configure all my ActiveJobs to retry on failure, and I’d like to do so with the ActiveJob retry_on method.

So I’m going to configure it in my ApplicationJob class, in order to retry on any error, maybe something like:

class ApplicationJob < ActiveJob::Base
  retry_on StandardError # other args to be discussed
end

Why use ActiveJob retry_on for this? Why StandardError?

Many people use backend-specific logic for retries, especially with Sidekiq. That’s fine!

I like the idea of using the ActiveJob functionality:

  • I currently use resque (more on challenges with retry here later), but plan to switch to something else at some point medium-term. Maybe sidekiq, but maybe delayed_job or good_job. (Just using the DB and not having a redis is attractive to me, as is open source). I like the idea of not having to redo this setup when I switch back-ends, or am trying out different ones.
  • In general, I like the promise of ActiveJob as swappable commoditized backends
  • I like what I see as good_job’s philosophy here, why have every back-end reinvent the wheel when a feature can be done at the ActiveJob level? That can help keep the individual back-end smaller, and less “expensive” to maintain. good_job encourages you to use ActiveJob retries I think.

Note, dhh is on record from 2018 saying he thinks setting up retries for all StandardError is a bad idea. But I don’t really understand why! He says “You should know why you’d want to retry, and the code should document that knowledge.” — but the fact that so many ActiveJob back-ends provide “retry all jobs” functionality makes it seem to me an established common need and best practice, and why shouldn’t you be able to do it with ActiveJob alone?

dhh thinks ActiveJob retry is for specific targeted retries maybe, and the backend retry should be used for generic universal ones? Honestly I don’t see myself doing many specific targeted retries. Making all your jobs idempotent (important! Best practice for ActiveJob always!) and just having them all retry on any error seems to me to be the way to go — a more efficient use of developer time, and sufficient for at least a relatively simple app.

One situation I have where a retry is crucial, is when I have a fairly long-running job (say it takes more than 60 seconds to run; I have some unavoidably!), and the machine running the jobs needs to restart. It might interrupt the job. It is convenient if it is just automatically retried — put back in the queue to be run again by restarted or other job worker hosts! Otherwise it’s just sitting there failed, never to run again, requiring manual action. An automatic retry will take care of it almost invisibly.

Resque and Resque Scheduler

Resque by default doesn’t support future-scheduled jobs. You can add them with the resque-scheduler plugin. But I had a perhaps irrational desire to avoid this — resque and its ecosystem have at different times had different amounts of maintenance/abandonment, and I’m (perhaps irrationally) reluctant to complexify my resque stack.

And do I need future scheduling for retries? For my most important use cases, it’s totally fine if I retry just once, immediately, with a wait: 0. Sure, that won’t take care of all potential use cases, but it’s a good start.

I thought even without resque supporting future-scheduling, I could get away with:

retry_on StandardError, wait: 0

Alas, this won’t actually work, it still ends up being converted to a future-schedule call, which gets rejected by the resque_adapter bundled with Rails unless you have resque-scheduler installed.

But of course resque can handle wait: 0 semantically, if the code is willing to do it by queueing an ordinary resque job… I don’t know if it’s a good idea, but this simple patch to the Rails-bundled resque_adapter will make it willing to accept “scheduled” jobs when the time to be scheduled is actually “now”, just queueing them normally, while still raising on attempts to actually future-schedule. For me, it makes retry_on ... wait: 0 work with just plain resque. (A rough sketch of the idea below; the linked patch is the real thing.)
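From memory, the shape of that idea is roughly this (a sketch, not the exact code — ActiveJob hands the adapter the scheduled time as epoch seconds):

# config/initializers/resque_adapter_allow_wait_zero.rb
# Sketch: accept "scheduled" jobs whose time is now or past by just
# enqueuing them normally; still raise for actual future scheduling.
module ResqueAdapterAllowWaitZero
  def enqueue_at(job, timestamp)
    if timestamp.to_f <= Time.now.to_f
      enqueue(job)
    else
      super # raises NotImplementedError without resque-scheduler
    end
  end
end

ActiveJob::QueueAdapters::ResqueAdapter.prepend(ResqueAdapterAllowWaitZero)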

Note: retry_on attempts count includes first run

So wanting to retry just once, I tried something like this:

# Will never actually retry
retry_on StandardError, attempts: 1

My job was never actually retried this way! It looks like the attempts count includes the first non-error run — it’s the total number of times the job will be run, including the very first one before any “retries”! So attempts: 1 means “never retry” and does nothing. Oops. If you actually want to retry only once, in my Rails 6.1 app this is what did it for me:

# will actually retry once
retry_on StandardError, attempts: 2

(I think this means the default, attempts: 5, actually means your job can be run a total of 5 times — one original time and 4 retries. I guess that’s what was intended?)
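So, putting the pieces of this section together, a “retry once, immediately” setup looks like:

class ApplicationJob < ActiveJob::Base
  # attempts: 2 == the original run plus one retry; wait: 0 retries
  # immediately (workable on plain resque with the adapter patch above)
  retry_on StandardError, wait: 0, attempts: 2
end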

Note: job_id stays the same through retries, hooray

By the way, I checked, and at least in Rails 6.1, the ActiveJob#job_id stays the same on retries. If the job runs once and is retried twice more, it’ll have the same job_id each time, you’ll see three Performing lines in your logs, with the same job_id.

Phew! I think that’s the right thing to do, so we can easily correlate these as retries of the same jobs in our logs. And if we’re keeping the job_id somewhere to check back and see if it succeeded or failed or whatever, it stays consistent on retry.

Glad this is what ActiveJob is doing!

Logging isn’t great, but can be customized

Rails will automatically log retries with a line that looks like this:

Retrying TestFailureJob in 0 seconds, due to a RuntimeError.
# logged at `info` level

Eventually, when it decides its attempts are exhausted, it’ll say something like:

Stopped retrying TestFailureJob due to a RuntimeError, which reoccurred on 2 attempts.
# logged at `error` level

This does not include the job-id though, which makes it harder than it should be to correlate with other log lines about this job, and follow the job’s whole course through your log file.

It’s also inconsistent with other default ActiveJob log lines, which include:

  • the Job ID in text
  • tags (Rails tagged logging system) with the job id and the string "[ActiveJob]". Because of the way the Rails code applies these only around perform/enqueue, retry/discard related log lines apparently end up not included.
  • The exception message, not just the class, when there’s an exception.

You can see all the built-in ActiveJob logging in the nicely compact ActiveJob::LogSubscriber class. And you can see how the log line for retry is kind of inconsistent with eg perform.

Maybe this inconsistency has persisted so long in part because few people actually use ActiveJob retry — they’re all still using their backend’s own retry functionality? I did try a PR to Rails for at least consistent formatting (my PR doesn’t do tagging); not sure if it will go anywhere, I think blind PRs to Rails usually do not.

In the meantime, after trying a bunch of different things, I think I figured out the reasonable way to use the ActiveSupport::Notifications/LogSubscriber API to customize logging for the retry-related events while leaving it untouched from Rails for the others? See my solution here.
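The linked solution has the details, but the shape of it is something like this (a sketch; event names and payload keys as I understand them from Rails 6.1’s ActiveJob::LogSubscriber — double-check against your Rails version):

# Subclass the built-in subscriber, override just the retry events,
# then swap it in for the original.
class CustomActiveJobLogSubscriber < ActiveJob::LogSubscriber
  def enqueue_retry(event)
    job   = event.payload[:job]
    error = event.payload[:error]
    wait  = event.payload[:wait]

    info do
      "Retrying #{job.class} (Job ID: #{job.job_id}) in #{wait.to_i} seconds, due to #{error&.class}: #{error&.message}"
    end
  end

  # retry_stopped(event) etc. can be overridden the same way
end

ActiveJob::LogSubscriber.detach_from :active_job
CustomActiveJobLogSubscriber.attach_to :active_job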

(Thanks to BigBinary blog for showing up in google and giving me a head start into figuring out how ActiveJob retry logging was working.)

(note: There’s also this: https://github.com/armandmgt/lograge_active_job But I’m not sure how working/maintained it is. It seems to only customize activejob exception reports, not retry and other events. It would be an interesting project to make an up-to-date activejob-lograge that applied to ALL ActiveJob logging, expressing every event as key/values and using lograge formatter settings to output. I think we see exactly how we’d do that, with a custom log subscriber as we’ve done above!)

Warning: ApplicationJob configuration won’t work for emails

You might think since we configured retry_on on ApplicationJob, all our bg jobs are now set up for retrying.

Oops! Not deliver_later emails.

The good_job README explains that ActiveJob mailer delivery jobs don’t descend from ApplicationJob. (I am curious if there’s any good reason for this; it seems like it would be nice if they did!)

The good_job README provides one way to configure the built-in Rails mailer superclass for retries.
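Which comes down to something like this (a hedged sketch; on Rails 6.0+ the class is ActionMailer::MailDeliveryJob, older Rails used ActionMailer::DeliveryJob — check the good_job README for its current recommendation):

# config/initializers/mailer_retries.rb
Rails.application.config.to_prepare do
  # Give deliver_later jobs the same retry behavior as ApplicationJob
  ActionMailer::MailDeliveryJob.retry_on StandardError, wait: 0, attempts: 2
end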

You could maybe also try setting delivery_job on that mailer superclass to use a custom delivery job (thanks again BigBinary for the pointer)… maybe one that subclasses the default class to deliver emails as normal, but lets you set some custom options like retry_on? Not sure if this would be preferable in any way.

logging URI query params with lograge

The lograge gem for taming Rails logs by default will log the path component of the URI, but leave out the query string/query params.

For instance, perhaps you have a URL to your app /search?q=libraries.

lograge will log something like:

method=GET path=/search format=html

The q=libraries part is completely left out of the log. I kinda want that part, it’s important.

The lograge README provides instructions for “logging request parameters”, by way of the params hash.

I’m going to modify them slightly to:

  • use the more recent custom_payload config instead of custom_options. (I’m not certain why there are both, but I think it’s mostly for legacy reasons, and the newer custom_payload is the one to prefer?)
  • If we just put params in there, then a bunch of ugly <ActionController::Parameters ...> output shows up in the log if you have nested hash params. We could fix that with params.to_unsafe_h, but…
  • We should really use request.filtered_parameters instead to make sure we’re not logging anything that’s been filtered out with Rails 6 config.filter_parameters. (Thanks /u/ezekg on reddit). This also converts to an ordinary hash that isn’t ActionController::Parameters, taking care of previous bullet point.
  • (It kind of seems like lograge README could use a PR updating it?)
  config.lograge.custom_payload do |controller|
    exceptions = %w(controller action format id)
    { params: controller.request.filtered_parameters.except(*exceptions) }
  end

That gets us a log line that might look something like this:

method=GET path=/search format=html controller=SearchController action=index status=200 duration=107.66 view=87.32 db=29.00 params={"q"=>"foo"}

OK. The params hash isn’t exactly the same as the query string, it can include things not in the URL query string (like controller and action, that we have to strip above, among others), and it can in some cases omit things that are in the query string. It just depends on your routing and other configuration and logic.

The params hash itself is what default rails logs… but what if we just log the actual URL query string instead? Benefits:

  • It’s easier to search the logs for an exact, specific, known URL (which can get more complicated, like /search?q=foo&range%5Byear_facet_isim%5D%5Bbegin%5D=4&source=foo or something). Which is something I sometimes want to do: say I got a URL reported from an error tracking service and now I want to find that exact line in the log.
  • I actually like having the exact actual URL (well, starting from path) in the logs.
  • It’s a lot simpler, we don’t need to filter out controller/action/format/id etc.
  • It’s actually a bit more concise? And part of what I’m dealing with in general using lograge is trying to reduce my bytes of logfile for papertrail!

Drawbacks?

  • if you had some kind of structured log search (I don’t at present, but I guess could with papertrail features by switching to json format?), it might be easier to do something like “find a /search with q=foo and source=ef” without worrying about other params
  • To the extent that the params hash can include things not in the actual url, is it important to log those like that?
  • ….?

Curious what other people think… am I crazy for wanting the actual URL in there, not the params hash?

At any rate, it’s pretty easy to do. Note we use filtered_path rather than fullpath to again take account of Rails 6 parameter filtering, and thanks again /u/ezekg:

  config.lograge.custom_payload do |controller|
    {
      path: controller.request.filtered_path
    }
  end

This is actually overwriting the default path to be one that has the query string too:

method=GET path=/search?q=libraries format=html ...

You could of course add a different key fullpath instead, if you wanted to keep path as it is, perhaps for easier collation in some kind of log analyzing system that wants to group things by same path invariant of query string.
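That variant is just this (fullpath here is an arbitrary key name of my choosing):

  config.lograge.custom_payload do |controller|
    {
      fullpath: controller.request.filtered_path
    }
  end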

I’m gonna try this out!

Meanwhile, on lograge…

As long as we’re talking about lograge… based on commit history, the history of Issues and Pull Requests, and the fact that CI isn’t currently running (travis.org grr) and doesn’t even try to test on Rails 6.0+ (although lograge seems to work fine), one might worry that lograge is currently un/under-maintained… No comment on a GH issue filed in May asking about project status.

It still seems to be one of the more popular solutions to trying to tame Rails kind of out of control logs. It’s mentioned for instance in docs from papertrail and honeybadger, and many many other blog posts.

What will its future be?

Looking around for other possibilities, I found semantic_logger (rails_semantic_logger). It’s got similar features. It seems to be much more maintained. It’s got a respectable number of github stars, although not nearly as many as lograge, and it’s not featured in blogs and third-party platform docs nearly as much.

It’s also a bit more sophisticated and featureful. For better or worse. For instance mainly I’m thinking of how it tries to improve app performance by moving logging to a background thread. This is neat… and also can lead to a whole new class of bug, mysterious warning, or configuration burden.

For now I’m sticking to the more popular lograge, but I wish it had CI up that was testing with Rails 6.1, at least!

Incidentally, trying to get Rails to log more compactly like both lograge and rails_semantic_logger do… is somewhat more complicated than you might expect, as demonstrated by the code in both projects that does it! semantic_logger especially is hundreds of lines of somewhat baroque code split across several files. A refactor of logging around Rails 5 (I think?) to use ActiveSupport::LogSubscriber made it possible to customize Rails logging like this (although I think both lograge and rails_semantic_logger still do some monkey-patching too!), but in the end didn’t make it all that easy, obvious, or future-proof. This may discourage other alternatives for the initial primary use case of both lograge and rails_semantic_logger — turn a rails action into one log line, with a structured format.

Notes on Cloudfront in front of Rails Assets on Heroku, with CORS

Heroku really recommends using a CDN in front of your Rails app static assets — because on heroku, unlike in non-heroku circumstances where a web server like nginx might be taking care of it, static assets will otherwise be served directly by your Rails app, consuming limited/expensive dyno resources.

After evaluating a variety of options (including some heroku add-ons), I decided AWS Cloudfront made the most sense for us — simple enough, cheap, and we are already using other direct AWS services (including S3 and SES).

While heroku has an article on using Cloudfront, which even covers Rails specifically, and even CORS issues specifically, I found it a bit too vague to get me all the way there. And while there are lots of blog posts you can find on this topic, I found many of them outdated (Rails has introduced new API; Cloudfront has also changed its configuration options!), or otherwise spotty/thin.

So while I’m not an expert on this stuff, I’m going to tell you what I was able to discover, and what I did to set up Cloudfront as a CDN in front of Rails static assets running on heroku — although there’s really nothing at all specific to heroku here, it applies to any other context where Rails is directly serving assets in production.

First how I set up Rails, then Cloudfront, then some notes and concerns. Btw, you might not need to care about CORS here, but one reason you might is if you are serving any fonts (including font-awesome or other icon fonts!) from Rails static assets.

Rails setup

In config/environments/production.rb

# set heroku config var RAILS_ASSET_HOST to your cloudfront
# hostname, will look like `xxxxxxxx.cloudfront.net`
config.asset_host = ENV['RAILS_ASSET_HOST']

config.public_file_server.headers = {
  # CORS:
  'Access-Control-Allow-Origin' => "*", 
  # tell Cloudfront to cache a long time:
  'Cache-Control' => 'public, max-age=31536000' 
}

Cloudfront Setup

I changed some things from default. The only one that’s absolutely necessary — if you want CORS to work — seemed to be changing Allowed HTTP Methods to include OPTIONS.

Click on “Create Distribution”. All defaults except:

  • Origin Domain Name: your heroku app host like app-name.herokuapp.com
  • Origin protocol policy: Switch to “HTTPS Only”. Seems like a good idea to ensure secure traffic between cloudfront and origin, no?
  • Allowed HTTP Methods: Switch to GET, HEAD, OPTIONS. In my experimentation, necessary for CORS from a browser to work — which AWS docs also suggest.
  • Cached HTTP Methods: Click “OPTIONS” too now that we’re allowing it, I don’t see any reason not to?
  • Compress objects automatically: yes
    • Sprockets is creating .gz versions of all your assets, but they’re going to be completely ignored in a Cloudfront setup either way. ☹️ (Is there a way to tell Sprockets to stop doing it? WHO KNOWS not me, it’s so hard to figure out how to reliably talk to Sprockets). But we can get what it was trying to do by having Cloudfront compress stuff for us — seems like a good idea, Google PageSpeed will like it, etc?
    • I noticed by experimentation that Cloudfront will compress CSS and JS (sometimes with brotli sometimes gz, even with the same browser, don’t know how it decides, don’t care), but is smart enough not to bother trying to compress a .jpg or .png (which already has internal compression).
  • Comment field: If there’s a way to edit it after you create the distribution, I haven’t found it, so pick a good one!

Notes on CORS

AWS docs here and here suggest that for CORS support you also need to configure the Cloudfront distribution to forward additional headers — Origin, and possibly Access-Control-Request-Headers and Access-Control-Request-Method. Which you can do by setting up a custom “cache policy”. Or maybe instead by setting the “Origin Request Policy”. Or maybe instead by setting custom cache header settings differently using the Use legacy cache settings option. It got confusing — and none of these settings seemed to be necessary for CORS to work for me, nor could I see any of them making any difference in CloudFront behavior or what headers were included in responses.

Maybe they would matter more if I were trying to use a more specific Access-Control-Allow-Origin than just setting it to *? But about that….

If you set Access-Control-Allow-Origin to a single host, MDN docs say you have to also return a Vary: Origin header. Easy enough to add that to your Rails config.public_file_server.headers. But I couldn’t get Cloudfront to forward/return this Vary header with its responses. Trying all manner of cache policy settings, referring to AWS’s quite confusing documentation on the Vary header in Cloudfront and trying to do what it said — I couldn’t get it to happen.
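For reference, the single-origin variant I was attempting looked something like this (with https://example.org standing in for your real front-end origin) — the Vary header here is the part I could not get Cloudfront to pass through:

# config/environments/production.rb — hypothetical single-origin CORS
config.public_file_server.headers = {
  'Access-Control-Allow-Origin' => 'https://example.org',
  'Vary' => 'Origin',
  'Cache-Control' => 'public, max-age=31536000'
}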

And what if you actually need more than one allowed origin? Per the spec, as again explained by MDN, you can’t just include more than one origin in the Access-Control-Allow-Origin header: “If the server supports clients from multiple origins, it must return the origin for the specific client making the request.” And you can’t do that with Rails’ static/global config.public_file_server.headers — we’d need to use and set up rack-cors instead, or something else.

So I just said, eh, * is probably just fine. I don’t think it actually involves any security issues for rails static assets to do this? I think it’s probably what everyone else is doing?

The only setup I needed for CORS to work was setting Cloudfront to allow the OPTIONS HTTP method, and setting Rails config.public_file_server.headers to include 'Access-Control-Allow-Origin' => '*'.

Notes on Cache-Control max-age

A lot of the existing guides don’t have you setting config.public_file_server.headers to include 'Cache-Control' => 'public, max-age=31536000'.

But without this, will Cloudfront actually be caching at all? If with every single request to cloudfront, cloudfront makes a request to the Rails app for the asset and just proxies it — we’re not really getting much of the point of using Cloudfront in the first place, to avoid the traffic to our app!

Well, it turns out yes, Cloudfront will cache anyway. Maybe because of the Cloudfront Default TTL setting? My Default TTL was left at the Cloudfront default, 86400 seconds (one day). So I’d think that maybe Cloudfront would be caching resources for a day when I’m not supplying any Cache-Control or Expires headers?

In my observation, it was actually caching for less than this though. Maybe an hour? (Want to know if it’s caching or not? Look at headers returned by Cloudfront. One easy way to do this? curl -IXGET https://whatever.cloudfront.net/my/asset.jpg, you’ll see a header either x-cache: Miss from cloudfront or x-cache: Hit from cloudfront).

Of course, Cloudfront doesn’t promise to cache for as long as it’s allowed to; it can evict things for its own reasons/policies before then, so maybe that’s all that’s going on.

Still, Rails assets are fingerprinted, so they are cacheable forever — why not tell Cloudfront that? Maybe more importantly, if Rails isn’t returning a Cache-Control header, then Cloudfront isn’t returning one to actual user-agents either, which means they won’t know they can cache the response in their own caches, and they’ll keep requesting/checking it on every reload too — which is not great for your far too large CSS and JS application files!

So, I think it’s probably a great idea to set the far-future Cache-Control header with config.public_file_server.headers as I’ve done above. We tell Cloudfront it can cache for the max-allowed-by-spec one year, and this also (I checked) gets Cloudfront to forward the header on to user-agents who will also know they can cache.

Note on limiting Cloudfront Distribution to just static assets?

The CloudFront distribution created above will actually proxy/cache our entire Rails app, you could access dynamic actions through it too. That’s not what we intend it for, our app won’t generate any URLs to it that way, but someone could.

Is that a problem?

I don’t know?

Some blog posts try to suggest limiting it only being willing to proxy/cache static assets instead, but this is actually a pain to do for a couple reasons:

  1. Cloudfront has changed its configuration for “path patterns” since many blog posts were written (unless you are using “legacy cache settings” options), such that I’m confused about how to do it at all — if there’s even still a way to get a distribution to stop caching/proxying/serving anything but a given path pattern?
  2. Modern Rails with webpacker has static assets at both /assets and /packs, so you’d need two path patterns, making it even more confusing. (Why Rails why? Why aren’t packs just at public/assets/packs so all static assets are still under /assets?)

I just gave up on figuring this out and figured it isn’t really a problem that Cloudfront is willing to proxy/cache/serve things I am not intending for it? Is it? I hope?

Note on Rails asset_path helper and asset_host

You may have realized that Rails has both asset_path and asset_url helpers for linking to an asset. (And similar helpers with dashes instead of underscores in sass, and probably different implementations, via sass-rails)

Normally asset_path returns a relative URL without a host, and asset_url returns a URL with a hostname in it. Since using an external asset_host requires we include the host with all URLs for assets to properly target the CDN… you might think you have to stop using asset_path anywhere and just use asset_url. You would be wrong.

It turns out if config.asset_host is set, asset_path starts including the host too. So everything is fine using asset_path. Not sure if at that point it’s a synonym for asset_url? I think not entirely, because once I set config.asset_host, some of my uses of asset_url actually started erroring and failing tests? And I had to actually only use asset_path? In ways where I don’t really understand what’s going on and can’t explain it?

Ah, Rails.

Heroku release phase, rails db:migrate, and command failure

If you use capistrano to deploy a Rails app, it will typically run a rails db:migrate with every deploy, to apply any database schema changes.

If you are deploying to heroku you might want to do the same thing. The heroku “release phase” feature makes this possible. (Introduced in 2017, the release phase feature is one of heroku’s more recent major features, as heroku dev has seemed to really stabilize and/or stagnate).

The release phase docs mention “running database schema migrations” as a use case, and there are a few ((1), (2), (3)) blog posts on the web suggesting doing exactly that with Rails. Basically as simple as adding release: bundle exec rake db:migrate to your Procfile.

While some of the blog posts do remind you that “If the Release Phase fails the app will not be deployed”, I have found the implications of this to be more confusing in practice than one would originally assume. Particularly because on heroku changing a config var triggers a release; and it can be confusing to notice when such a release has failed.

It pays to consider the details a bit so you understand what’s going on, and possibly consider somewhat more complicated release logic than simply calling out to rake db:migrate.

1) What if a config var change makes your Rails app unable to boot?

I don’t know how unusual this is, but I actually had a real-world bug like this when in the process of setting up our heroku app. Without confusing things with the details, we can simulate such a bug simply by putting this in, say, config/application.rb:

if ENV['FAIL_TO_BOOT']
  raise "I am refusing to boot"
end

Obviously my real bug was weirder, but the result was the same — with some settings of one or more heroku configuration variables, the app would raise an exception during boot. And we hadn’t noticed this in testing, before deploying to heroku.

Now, on heroku, using CLI or web dashboard, set the config var FAIL_TO_BOOT to “true”.

Without a release phase, what happens?

  • The release is successful! If you look at the release in the dashboard (“Activity” tab) or heroku releases, it shows up as successful. Which means heroku brings up new dynos and shuts down the previous ones, that’s what a release is.
  • The app crashes when heroku tries to start it in the new dynos.
  • The dynos will be in “crashed” state when looked at in heroku ps or dashboard.
  • If a user tries to access the web app, they will get the generic heroku-level “could not start app” error screen (unless you’ve customized your heroku error screens, as usual).
  • You can look in your heroku logs to see the error and stack trace that prevented app boot.

Downside: your app is down.

Upside: It is pretty obvious that your app is down, and (relatively) why.

With a db:migrate release phase, what happens?

The Rails db:migrate rake task has a dependency on the rails :environment task, meaning it boots the Rails app before executing. You just changed your config variable FAIL_TO_BOOT: true such that the Rails app can’t boot. Changing the config variable triggered a release.

As part of the release, the db:migrate release phase is run… which fails.

  • The release is not successful, it failed.
  • You don’t get any immediate feedback to that effect in response to your heroku config:add command or on the dashboard GUI in the “settings” tab. You may go about your business assuming it succeeded.
  • If you look at the release in heroku releases or dashboard “activity” tab you will see it failed.
  • You do get an email that it failed. Maybe you notice it right away, or maybe you notice it later, and have to figure out “wait, which release failed? And what were the effects of that? Should I be worried?”
  • The effects are:
    • The config variable appears changed in heroku’s dashboard or in response to heroku config:get etc.
    • The old dynos without the config variable change are still running. They don’t have the change. If you open a one-off dyno, it will be using the old release, and have the old ENV['FAIL_TO_BOOT'] value.
    • ANY subsequent attempts at a release will keep failing, so long as the app is in a state (based on the current config variables) where it can’t boot.

Again, this really happened to me! It is a fairly confusing situation.

Upside: Your app is actually still up — even though you broke it, the old release is still running. That’s good?

Downside: It’s really confusing what happened. You might not notice at first. Things remain in a messed up inconsistent and confusing state until you notice, figure out what’s going on, what release caused it, and how to fix it.

It’s a bit terrifying that any config variable change could do this. But I guess most people don’t run into it like I did, since I haven’t seen it mentioned?

2) A heroku pg:promote is a config variable change, that will create a release in which db:migrate release phase fails.

heroku pg:promote is a command that will change which of multiple attached heroku postgreses are attached as the “primary” database, pointed to by the DATABASE_URL config variable.

For a typical app with only one database, you still might use pg:promote for a database upgrade process; for setting up or changing a postgres high-availability leader/follower; or, for what I was experimenting with it for, using heroku’s postgres-log-based rollback feature.

I had assumed that pg:promote was a zero-downtime operation. But, in debugging its interaction with my release phase, I noticed that pg:promote actually creates TWO heroku releases.

  1. First it creates a release labelled Detach DATABASE, in which there is no DATABASE_URL configuration variable at all.
  2. Then it creates another release labelled Attach DATABASE in which the DATABASE_URL configuration variable is defined to its new value.

Why does it do this instead of one release that just changes the DATABASE_URL? I don’t know. My app (like most Rails and probably other apps) can’t actually function without DATABASE_URL set, so if that first release ever actually runs, it will just error out. Does this mean there’s an instant with a “bad” release deployed, that pg:promote isn’t actually zero-downtime? I am not sure; it doesn’t seem right (I did file a heroku support ticket asking…).

But under normal circumstances, either it’s not a problem, or most people(?) don’t notice.

But what if you have a db:migrate release phase?

When it tries to do release (1) above, that release will fail: it tries to run db:migrate, which can’t work without a DATABASE_URL set, so it raises, the release phase exits in an error condition, and the release fails.

Actually, what happens is that without DATABASE_URL set, the Rails app will assume a postgres URL in a “default” location, try to connect to it, and fail, with an error message (hello googlers?) like:

ActiveRecord::ConnectionNotEstablished: could not connect to server: No such file or directory
	Is the server running locally and accepting
	connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?

Now, release (2) is coming down the pike seconds later, this is actually fine, and will be zero outage. We had a release that failed (so never was deployed), and seconds later the next correct release succeeds. Great!

The only problem is that we got an email notifying us that release 1 failed, and it’s also visible as failing in the heroku release list, etc.

A “background” release (not in response to a git push or other code push to heroku) failing is already a confusing situation — and a “false positive” that actually means “nothing unexpected or problematic happened, just ignore this and carry on” is… really not something I want. (I call this the “error notification crying wolf”, right? I try to make sure my error notifications never do it, because it takes your time away from flow unnecessarily, and/or makes it much harder to stay vigilant to real errors.)

Now, there is a fairly simple solution to this particular problem. Here’s what I did. I changed my heroku release phase from rake db:migrate to a custom rake task, say release: bundle exec rake my_custom_heroku_release_phase, defined like so:

task :my_custom_heroku_release_phase do
  if ENV['DATABASE_URL']
    Rake::Task["db:migrate"].invoke
  else
    $stderr.puts "\n!!! WARNING, no ENV['DATABASE_URL'], not running rake db:migrate as part of heroku release !!!\n\n"
  end
end

Now that release (1) above at least won’t fail, it has the same behavior as a “traditional” heroku app without a release phase.

Swallow-and-report all errors?

When a release fails because a release phase has failed as result of a git push to heroku, that’s quite clear and fine!

But the confusion of the “background” release failure, triggered by a config var change, is high enough that part of me wants to just rescue StandardError in there, and prevent a failed release phase from ever exiting with a failure code, so heroku will never use a db:migrate release phase to abort a release.

Just return the behavior to the pre-release-phase heroku behavior — you can put your app in a situation where it will be crashed and not work, but maybe that’s better than a mysterious inconsistent heroku app state that happens in the background, which you find out about only through asynchronous email notifications from heroku that are difficult to understand/diagnose. It’s all much more obvious.

On the other hand, if a db:migrate has failed not because of some unrelated boot process problem (which would keep the app from launching even if it were released), but because the db:migrate itself actually failed… you kind of want the release to fail? That’s good? Keep the old release running, not a new release with code that expects a db migration that didn’t happen?

So I’m not really sure.

If you did want to rescue-swallow-and-notify, the custom rake task for your heroku release logic — instead of just telling heroku to run a standard thing like db:migrate on release — is certainly convenient.
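If you went that route, it might look something like this (ErrorReporter standing in for whatever error-tracking client you actually use):

task :my_custom_heroku_release_phase do
  begin
    if ENV['DATABASE_URL']
      Rake::Task["db:migrate"].invoke
    else
      $stderr.puts "\n!!! WARNING, no ENV['DATABASE_URL'], not running rake db:migrate as part of heroku release !!!\n\n"
    end
  rescue StandardError => e
    # Swallow and report: exit 0 so heroku never aborts the release,
    # but make sure a human hears about it.
    ErrorReporter.notify(e) # hypothetical error-tracking client
    $stderr.puts "!!! WARNING, error swallowed in release phase: #{e.class}: #{e.message}"
  end
end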

Also, do you really always want to db:migrate anyway? What about db:schema:load?

Another alternative… if you are deploying an app with an empty database, standard Rails convention is to run rails db:schema:load instead of db:migrate. The db:migrate will probably work anyway, but will be slower, and somewhat more error-prone.

I guess this could come up on heroku with an initial deploy or (for some reason) a database that’s been nuked and restarted, or perhaps a Heroku “Review app”? (I don’t use those yet)

stevenharman has a solution that actually checks the database, and runs the appropriate rails task depending on state, here in this gist.

I’d probably do it as a rake task instead of a bash file if I were going to do that. I’m not doing it at all yet.

Note that stevenharman’s solution will actually catch a non-existing or non-connectable database and not try to run migrations… but it will print an error message and exit 1 in that case, failing the release — meaning that you will get a failed release in the pg:promote case mentioned above!

Rails auto-scaling on Heroku

We are investigating moving our medium-small-ish Rails app to heroku.

We looked at both the Rails Autoscale add-on available on the heroku marketplace, and the hirefire.io service, which is not listed on the heroku marketplace and which I almost didn’t realize existed.

I guess hirefire.io doesn’t have any kind of a partnership with heroku, but still uses the heroku API to provide an autoscale service. hirefire.io ended up looking more fully-featured and lower-priced than Rails Autoscale; so the main service of this post is just trying to increase the visibility of hirefire.io, and therefore competition in the field, which benefits us consumers.

Background: Interest in auto-scaling Rails background jobs

At first I didn’t realize there was such a thing as “auto-scaling” on heroku, but once I did, I realized it could indeed save us lots of money.

I am more interested in scaling Rails background workers than I am in web workers, though — our background workers are busiest when we are doing “ingests” into our digital collections/digital asset management system, so the work is highly variable. Auto-scaling up when there is ingest work piling up can give us really nice ingest throughput while keeping costs low.

On the other hand, our web traffic is fairly low and probably isn’t going to go up by an order of magnitude (non-profit cultural institution here). And after discovering that a “standard” dyno is just too slow, we will likely be running a performance-m or performance-l anyway — which likely can handle all anticipated traffic on its own. If we have an auto-scaling solution, we might configure it for web dynos, but we are especially interested in good features for background scaling.

There is a heroku built-in autoscale feature, but it only works for performance dynos, and won’t do anything for Rails background job dynos, so that was right out.

What could work for Rails bg jobs was the Rails Autoscale add-on on the heroku marketplace; and then we found hirefire.io.

Pricing: Pretty different

hirefire

As of now, January 2021, hirefire.io has pretty simple and affordable pricing: $15/month/heroku application, auto-scaling as many dynos and process types as you like.

hirefire.io by default can only check your app’s metrics to decide if a scaling event should occur once per minute. If you want more frequent than that (up to once every 15 seconds), you have to pay an additional $10/month, for $25/month/heroku application.

Even though it is not a heroku add-on, hirefire does advertise that they bill pro-rated to the second, just like heroku and heroku add-ons.

Rails autoscale

Rails autoscale has a more tiered approach to pricing that is based on number and type of dynos you are scaling. Starting at $9/month for 1-3 standard dynos, the next tier up is $39 for up to 9 standard dynos, all the way up to $279 (!) for 1 to 99 dynos. If you have performance dynos involved, from $39/month for 1-3 performance dynos, up to $599/month for up to 99 performance dynos.

For our anticipated uses… if we only scale bg dynos, I might want to scale from (low) 1 or 2 to (high) 5 or 6 standard dynos, so we’d be at $39/month. Our web dynos are likely to be performance and I wouldn’t want/need to scale more than probably 2, but that puts us into performance dyno tier, so we’re looking at $99/month.

This is of course significantly more expensive than hirefire.io’s flat rate.

Metric Resolution

Since hirefire has an additional charge for finer than 1-minute resolution on its autoscaling checks, we’ll discuss resolution here in this section too. Rails Autoscale has the same resolution for all tiers, and I think it’s generally 10 seconds — so approximately the same as hirefire, if you pay hirefire the extra $10 for increased resolution.

Configuration

Let’s walk through the configuration options of each to get a sense of their feature-sets.

Rails Autoscale

web dynos

To configure web dynos, you get a fairly small set of options, pre-filled with default values.

The metric Rails Autoscale uses for scaling web dynos is time in heroku routing queue, which seems right to me — when things are spending longer in the heroku routing queue before getting to a dyno, it means it’s time to scale up.
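(As far as I understand, queue-time metrics like this are derived from the X-Request-Start header heroku’s router stamps on each request. A rough sketch of the idea as rack middleware; this is not either product’s actual code:)

# Illustrative only: derive heroku routing queue time in rack middleware.
# X-Request-Start is the router's timestamp, in milliseconds since epoch.
class QueueTimeMetric
  def initialize(app)
    @app = app
  end

  def call(env)
    if (stamp = env["HTTP_X_REQUEST_START"])
      queue_ms = (Time.now.to_f * 1000).round - stamp.to_i
      # a real implementation would aggregate these into percentiles
      Rails.logger.info("queue_time=#{queue_ms}ms") if queue_ms >= 0
    end
    @app.call(env)
  end
end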

worker dynos

For scaling worker dynos, Rails Autoscale can scale the dyno type named “worker” — it can understand the ruby queuing libraries Sidekiq, Resque, Delayed Job, or Que. I’m not certain if there are options for writing custom adapter code for other backends.

There are a handful of configuration options — sorry, I can’t show you the defaults; I’ve already customized mine and lost track of what the defaults are.

Worker dynos are scaled based on the metric “number of jobs queued”, and you can tell it to only pay attention to certain queues if you want.
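(For a queue backend like resque, “number of jobs queued” boils down to something like the following. Illustrative only; the add-on gathers this for you, and the queue names here are made up:)

require "resque"

# total jobs waiting across the queues you care about
queued_jobs = ["default", "ingest"].sum { |q| Resque.size(q) }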

Hirefire

Hirefire has far more options for customization than Rails Autoscale, which can make it a bit overwhelming, but also potentially more powerful.

web dynos

You can actually configure autoscaling for as many Heroku process types as you have, not just the ones named “web” and “worker”. And for each, you have your choice of several metrics to be used as scaling triggers.
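(By “process types” I mean the entries in your Procfile; hirefire can scale any of them. A hypothetical example with an extra worker type:)

web: bundle exec puma -C config/puma.rb
worker: bundle exec resque-pool
slow_worker: QUEUE=slow bundle exec rake resque:work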

For web, I think Queue Time (percentile, average), configured to percentile 95, matches what Rails Autoscale does, and is probably the best to use unless you have a reason to use another. (“Rails Autoscale tracks the 95th percentile queue time, which for most applications will hover well below the default threshold of 100ms.“)

Hirefire gives you quite a few knobs if you are scaling on “queue time” like Rails Autoscale; the configuration may vary for other metrics.

I think if you fill in the right numbers, you can configure it to work equivalently to Rails Autoscale.

worker dynos

If you have more than one heroku process type for workers — say, working on different queues — Hirefire can scale them independently, with entirely separate configuration. This is pretty handy, and I didn’t think Rails Autoscale offered this. (Update: I may be wrong; Rails Autoscale says they do support this, so check on it yourself if it matters to you.)

For worker dynos, you could choose to scale based on actual “dyno load”, but I think this is probably mostly for types of processes where there isn’t the ability to look at “number of jobs”. A “number of jobs in queue” metric like Rails Autoscale uses makes a lot more sense to me as an effective metric for scaling queue-based bg workers.

Hirefire’s metric is slightly different than Rails Autoscale’s “jobs in queue”. For recognized ruby queue systems (a larger list than Rails Autoscale’s; and you can write your own custom adapter for whatever you like), it actually measures jobs in queue plus workers currently busy — so queued plus in-progress, rather than Rails Autoscale’s queued only. I had a bit of trouble wrapping my head around the implications of this, but basically, Hirefire’s “job queue” metric strategy is intended to scale all the way to emptying your queue, or reaching your max scale limit, whichever comes first. I think this may make sense, and work out at least as well as or perhaps better than Rails Autoscale’s approach.

Hirefire makes several configuration options available for worker dynos scaling on the “job queue” metric.

Since the metric isn’t the same as Rails Autoscale’s, we can’t configure this to work identically. But there are a whole bunch of configuration options, some similar to Rails Autoscale’s.

The most important thing here is that “Ratio” configuration. It may not be obvious, but with the way the hirefire metric works, you are basically meant to set it to the number of workers/threads you have on each dyno. I have it configured to 3 because my heroku worker processes use resque, with resque-pool configured to run 3 resque workers on each dyno. If you use sidekiq, set ratio to your configured concurrency — or if you are running more than one sidekiq process per dyno, processes times concurrency. Basically, how many jobs your dyno can concurrently work is what you should normally set for “ratio”.
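(A back-of-envelope illustration of how I understand the math to work; the numbers here are made up:)

queued = 40  # jobs waiting in queues
busy   = 6   # jobs currently being worked
ratio  = 3   # jobs one dyno can work concurrently (e.g. resque-pool running 3 workers)

metric        = queued + busy               # hirefire's "job queue" value: 46
desired_dynos = (metric.to_f / ratio).ceil  # 16 dynos, subject to your configured max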

Hirefire not a heroku plugin

Hirefire isn’t actually a heroku plugin. In addition to that meaning separate invoicing, there can be some other inconveniences.

Since hirefire can only interact with the heroku API, for some metrics (including the “queue time” metric that is probably optimal for web dyno scaling) you have to configure your app to log regular statistics to heroku’s “Logplex” system. This can add a lot of noise to your log, and for heroku logging add-ons that are tiered based on number of log lines or bytes, it can push you up to higher pricing tiers.
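(If I’m reading hirefire’s docs right, their hirefire-resource gem handles both that logging and reporting worker queue sizes; something like the following in an initializer, but consult their docs for the real thing:)

# config/initializers/hirefire.rb
HireFire::Resource.configure do |config|
  # injects middleware that logs queue-time stats to stdout, i.e. Logplex
  config.log_queue_metrics = true

  # report resque queue depth for the "worker" process type
  config.dyno(:worker) do
    HireFire::Macro::Resque.queue
  end
end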

If you use Papertrail, I think you should be able to use its log filtering feature to solve this — keep that noise out of your logs and avoid impacting your log transfer limits. However, if you ever have cause to look at heroku’s raw logs, that noise will still be there.

Support and Docs

I asked a couple questions of both Hirefire and Rails Autoscale as part of my evaluation, and got back well-informed and easy-to-understand answers quickly from both. Support for both seems to be great.

I would say the documentation is decent-but-not-exhaustive for both products. Hirefire may have slightly more complete documentation.

Other Features?

There are other things you might want to compare: various kinds of observability (bar charts or graphs of dynos or observed metrics) and notifications. I don’t have time to get into the details (and didn’t actually spend much time exploring them to evaluate), but the two seem to offer roughly similar features.

Conclusion

Rails Autoscale is quite a bit more expensive than hirefire.io’s flat rate, once you get past Rails Autoscale’s most basic tier (scaling no more than 3 standard dynos).

It’s true that autoscaling saves you money over not autoscaling, so even an expensive price could be considered a “cut” of those savings, and for many ecommerce sites even $99 a month might be a drop in the bucket (!)… but this price difference is so significant with hirefire (which has a flat rate regardless of dynos) that it seems to me it would take a lot of additional features/value to justify.

And it’s not clear that Rails Autoscale has any feature advantage. In general, hirefire.io seems to have more features and flexibility.

Until 2021, hirefire.io could only analyze metrics at 1-minute resolution, so perhaps Rails Autoscale’s finer resolution was a “killer feature” then?

Honestly, I wonder if this price difference is sustained by Rails Autoscale only because most customers aren’t aware of hirefire.io, it not being listed on the heroku marketplace. Single-invoice billing is handy, but probably not worth $80+ a month. I guess hirefire’s logplex noise is a bit inconvenient?

Or is there something else I’m missing? Pricing competition is good for the consumer.

And are there any other heroku autoscale solutions, that can handle Rails bg job dynos, that I still don’t know about?

Update, a day after writing: djcp on a reddit thread writes:

I used to be a principal engineer for the heroku add-ons program.

One issue with hirefire is they request account level oauth tokens that essentially give them ability to do anything with your apps, where Rails Autoscaling worked with us to create a partnership and integrate with our “official” add-on APIs that limits security concerns and are scoped to the application that’s being scaled.

Part of the reason for hirefire working the way it does is historical, but we’ve supported the endpoints they need to scale for “official” partners for years now.

A lot of heroku customers use hirefire so please don’t think I’m spreading FUD, but you should be aware you’re giving a third party very broad rights to do things to your apps. They probably won’t, of course, but what if there’s a compromise?

“Official” add-on providers are given limited scoped tokens to (mostly) only the actions / endpoints they need, minimizing blast radius if they do get compromised.

You can read some more discussion at that thread.

Gem authors, check your release sizes

Most gems should probably be a couple hundred kb at most. I’m talking about the package actually stored in and downloaded from rubygems by an app using the gem.

After all, source code is just text, and it doesn’t take up much space. OK, maybe some gems have a couple images in there.

But if you look at your gem on rubygems and realize that it’s 10MB or bigger… and that it seems to be getting bigger with every release… something is probably wrong and worth looking into.

One way to look into it is to examine the actual gem package. If you use the handy bundler rake task to release your gem (and I recommend it), you have a ./pkg directory in the source tree you last released from. Inside it are “.gem” files for each release you’ve made from there, unless you’ve cleaned it up recently.

.gem files, it turns out, are just tar files — which have more tar and gz files inside them, etc. We can extract the contents, and use the handy unix utility du -sh to see what is taking up all the space.
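(As an aside: if you’d rather not un-tar by hand, rubygems’ own API can list a package’s contents. A quick sketch, with the package path assumed:)

require "rubygems/package"

# print every file path included in the release package
Gem::Package.new("pkg/kithe-2.0.0.gem").contents.each do |path|
  puts path
end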

How I found the bytes

jrochkind-chf kithe (master ?) $ cd pkg

jrochkind-chf pkg (master ?) $ ls
kithe-2.0.0.beta1.gem        kithe-2.0.0.pre.rc1.gem
kithe-2.0.0.gem            kithe-2.0.1.gem
kithe-2.0.0.pre.beta1.gem    kithe-2.0.2.gem

jrochkind-chf pkg (master ?) $ mkdir exploded

jrochkind-chf pkg (master ?) $ cp kithe-2.0.0.gem exploded/kithe-2.0.0.tar

jrochkind-chf pkg (master ?) $ cd exploded

jrochkind-chf exploded (master ?) $ tar -xvf kithe-2.0.0.tar
 x metadata.gz
 x data.tar.gz
 x checksums.yaml.gz

jrochkind-chf exploded (master ?) $  mkdir unpacked_data_tar

jrochkind-chf exploded (master ?) $ tar -xvf data.tar.gz -C unpacked_data_tar/

jrochkind-chf exploded (master ?) $ cd unpacked_data_tar/
/Users/jrochkind/code/kithe/pkg/exploded/unpacked_data_tar

jrochkind-chf unpacked_data_tar (master ?) $ du -sh *
 4.0K    MIT-LICENSE
  12K    README.md
 4.0K    Rakefile
 160K    app
 8.0K    config
  32K    db
 100K    lib
 300M    spec

jrochkind-chf unpacked_data_tar (master ?) $ cd spec

jrochkind-chf spec (master ?) $ du -sh *
 8.0K    derivative_transformers
 300M    dummy
  12K    factories
  24K    indexing
  72K    models
 4.0K    rails_helper.rb
  44K    shrine
  12K    simple_form_enhancements
 8.0K    spec_helper.rb
 188K    test_support
 4.0K    validators

jrochkind-chf spec (master ?) $ cd dummy/

jrochkind-chf dummy (master ?) $ du -sh *
 4.0K    Rakefile
  56K    app
  24K    bin
 124K    config
 4.0K    config.ru
 8.0K    db
 300M    log
 4.0K    package.json
  12K    public
 4.0K    tmp

Doh! In this particular gem, I have a dummy rails app, and it has 300MB of logs (because I haven’t bothered trimming them in a while) that are winding up included in the gem release package distributed to rubygems and downloaded by all consumers! Even if they were small, I don’t want these in the released gem package at all!

That’s not good! It only turns into 12MB instead of 300MB because log files are so compressible, and there is compression involved in assembling the rubygems package. But I have no idea how much space it’s actually taking up on consuming applications’ machines. This is very irresponsible!

What controls what files are included in the gem package?

Your .gemspec file, of course. The line s.files = sets an array of every file to include in the gem package. Well, plus s.test_files, another array of files that aren’t supposed to be necessary to run the gem, but are there to test it.

(Rubygems was set up to allow automated testing of gems after download, which is why test files are included in the release package. I am not sure how useful this is, or who, if anyone, does it; although I believe that some linux distro packagers try to make use of it, for better or worse.)

But nobody wants to list every file in their gem individually, manually editing the array every time they add, remove, or move one. Fortunately, gemspec files are executable ruby code, so you can use ruby as a shortcut.

I have seen two main ways of doing this, with different “gem skeleton generators” taking one of two approaches.

Sometimes a shell-out to git is used — the idea being that everything you have checked into git should be in the gem release package, no more and no less. For instance, one of my gems has this in it; I’m not sure where it came from or who/what generated it:

# include every file tracked by git, except test directories
spec.files = `git ls-files -z`.split("\x0").reject do |f|
  f.match(%r{^(test|spec|features)/})
end

In that case, the reject means it wouldn’t have included anything in ./spec in the first place — so this obviously isn’t actually the gem we were looking at before.

But the nice thing about this approach is that, in addition to letting you use ruby logic to manipulate the results, nothing excluded by your .gitignore file will end up in your gem package. Great!

In kithe, the gem we were looking at before, those log files were in the .gitignore (they weren’t in my repo!), so if I had been using that git-shellout technique, they wouldn’t have been included in the gem release package in the first place.

But… I wasn’t. Instead this gem has a gemspec that looks like:

s.test_files = Dir["spec/**/*"]

Just include every single file inside ./spec in the test_files list. Oops! Then I get all those log files!

One way to fix

I don’t really know which is to be preferred, the git-shellout approach or the dir-glob approach. I suspect it is the subject of historical religious wars in rubydom, from when there were still more people around to argue about such things. Any opinions? Or another approach?

Without being in the mood to restructure this gemspec in any way, I just did the simplest thing to keep those log files out…

Dir["spec/*/"].delete_if {|a| a =~ %r{/dummy/log/}}

Build the package without releasing, with the handy bundler-supplied rake build task… and my gem release package size goes from 12MB to 64K. (Which actually kind of sounds like a minimum block size or something, right?)
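(A quick sanity check of built package sizes, run from the gem source root; just a sketch:)

# list the size of each built package in ./pkg
Dir["pkg/*.gem"].each do |f|
  puts format("%8d KB  %s", File.size(f) / 1024, f)
end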

Phew! That’s a big difference! Sorry to anyone using previous versions and winding up downloading all that cruft! (Actually, this particular gem is mostly a proof of concept at this point, and I don’t think anyone else is using it.)

Check your gem sizes!

I’d be willing to bet there are lots of released gems with heavily bloated release packages like this. This isn’t the first one I’ve realized was my fault. Because who pays attention to gem sizes anyway? Apparently not many of us!

But rubygems does list them, so it’s pretty easy to check. Are your gem release packages multiple megs when there’s no good reason for them to be? Do they get bigger with every release, by far more than the lines of code you think were added? At some point in the gem’s history, was there a big jump from hundreds of KB to multiple MB, when nothing much actually happened to the gem’s logic?

All hints that you might be including things you didn’t mean to include, possibly things that grow each release.

You don’t need to have a dummy rails app in your repo to accidentally do this (I accidentally did it once with a gem that had nothing to do with rails). There could be other kinds of log files, or test coverage or performance metric files, or any other artifacts of your build or your development — especially ones that grow over time — that aren’t actually meant to be, or needed as, part of the gem release package!
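(One defensive pattern, purely illustrative with example directory names, is an explicit allowlist glob, so stray artifacts never qualify in the first place:)

# only ship what the gem actually needs at runtime, plus docs
s.files = Dir["{app,config,db,lib}/**/*"] + ["MIT-LICENSE", "README.md", "Rakefile"]

# and exclude known artifact directories from test files
s.test_files = Dir["spec/**/*"].reject do |f|
  f.start_with?("spec/dummy/log/", "spec/dummy/tmp/")
end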

It’s good to sanity-check your gem release packages now and then. In most cases, your gem release package should be hundreds of KB at most, not MBs. Help keep your users’ installs and builds faster and slimmer!