Heroku release phase, rails db:migrate, and command failure

If you use capistrano to deploy a Rails app, it will typically run a rails db:migrate with every deploy, to apply any database schema changes.

If you are deploying to heroku you might want to do the same thing. The heroku “release phase” feature makes this possible. (Introduced in 2017, the release phase feature is one of heroku’s more recent major features, as heroku dev has seemed to really stabilize and/or stagnate).

The release phase docs mention “running database schema migrations” as a use case, and there are a few ((1), (2), (3)) blog posts on the web suggesting doing exactly that with Rails. Basically as simple as adding release: bundle exec rake db:migrate to your Procfile.

While some of the blog posts do remind you that “If the Release Phase fails the app will not be deployed”, I have found the implications of this to be more confusing in practice than one would originally assume. Particularly because on heroku changing a config var triggers a release; and it can be confusing to notice when such a release has failed.

It pays to consider the details a bit so you understand what’s going on, and possibly consider somewhat more complicated release logic than simply calling out to rake db:migrate.

1) What if a config var change makes your Rails app unable to boot?

I don’t know how unusual this is, but I actually had a real-world bug like this when in the process of setting up our heroku app. Without confusing things with the details, we can simulate such a bug simply by putting this in, say, config/application.rb:

if ENV['FAIL_TO_BOOT']
  raise "I am refusing to boot"
end

Obviously my real bug was weirder, but the result was the same — with some settings of one or more heroku configuration variables, the app would raise an exception during boot. And we hadn’t noticed this in testing, before deploying to heroku.

Now, on heroku, using CLI or web dashboard, set the config var FAIL_TO_BOOT to “true”.

Without a release phase, what happens?

  • The release is successful! If you look at the release in the dashboard (“Activity” tab) or heroku releases, it shows up as successful. Which means heroku brings up new dynos and shuts down the previous ones, that’s what a release is.
  • The app crashes when heroku tries to start it in the new dynos.
  • The dynos will be in “crashed” state when looked at in heroku ps or dashboard.
  • If a user tries to access the web app, they will get the generic heroku-level “could not start app” error screen (unless you’ve customized your heroku error screens, as usual).
  • You can look in your heroku logs to see the error and stack trace that prevented app boot.

Downside: your app is down.

Upside: It is pretty obvious that your app is down, and (relatively) why.

With a db:migrate release phase, what happens?

The Rails db:migrate rake task has a dependency on the rails :environment task, meaning it boots the Rails app before executing. You just changed your config variable FAIL_TO_BOOT: true such that the Rails app can’t boot. Changing the config variable triggered a release.

As part of the release, the db:migrate release phase is run… which fails.

  • The release is not succesful, it failed.
  • You don’t get any immediate feedback to that effect in response to your heroku config:add command or on the dashboard GUI in the “settings” tab. You may go about your business assuming it succeeded.
  • If you look at the release in heroku releases or dashboard “activity” tab you will see it failed.
  • You do get an email that it failed. Maybe you notice it right away, or maybe you notice it later, and have to figure out “wait, which release failed? And what were the effects of that? Should I be worried?”
  • The effects are:
    • The config variable appears changed in heroku’s dashboard or in response to heroku config:get etc.
    • The old dynos without the config variable change are still running. They don’t have the change. If you open a one-off dyno, it will be using the old release, and have the old (eg) ENV[‘FAIL_TO_BOOT’] value.
    • ANY subsequent attempts at a releases will keep fail, so long as the app is in a state (based on teh current config variables) that it can’t boot.

Again, this really happened to me! It is a fairly confusing situation.

Upside: Your app is actually still up, even though you broke it, the old release that is running is still running, that’s good?

Downside: It’s really confusing what happened. You might not notice at first. Things remain in a messed up inconsistent and confusing state until you notice, figure out what’s going on, what release caused it, and how to fix it.

It’s a bit terrifying that any config variable change could do this. But I guess most people don’t run into it like I did, since I haven’t seen it mentioned?

2) A heroku pg:promote is a config variable change, that will create a release in which db:migrate release phase fails.

heroku pg:promote is a command that will change which of multiple attached heroku postgreses are attached as the “primary” database, pointed to by the DATABASE_URL config variable.

For a typical app with only one database, you still might use pg:promote for a database upgrade process; for setting up or changing a postgres high-availability leader/follower; or, for what I was experimenting with it for, using heroku’s postgres-log-based rollback feature.

I had assumed that pg:promote was a zero-downtime operation. But, in debugging it’s interaction with my release phase, I noticed that pg:promote actually creates TWO heroku releases.

  1. First it creates a release labelled Detach DATABASE , in which there is no DATABASE_URL configuration variable at all.
  2. Then it creates another release labelled Attach DATABASE in which the DATABASE_URL configuration variable is defined to it’s new value.

Why does it do this instead of one release that just changes the DATABASE_URL? I don’t know. My app (like most Rails and probably other apps) can’t actually function without DATABASE_URL set, so if that first release ever actually runs, it will just error out. Does this mean there’s an instant with a “bad” release deployed, that pg:promote isn’t actually zero-downtime? I am not sure, it doens’t seem right (I did file a heroku support ticket asking….).

But under normal circumstances, either it’s not a problem, or most people(?) don’t notice.

But what if you have a db:migrate release phase?

When it tries to do release (1) above, that release will fail. Because it tries to run db:migrate, and it can’t do that without a DATABASE_URL set, so it raises, the release phase exits in an error condition, the release fails.

Actually what happens is without DATABASE_URL set, the Rails app will assume a postgres URL in a “default” location, try to connect to, and fail, with an error message (hello googlers?), like:

ActiveRecord::ConnectionNotEstablished: could not connect to server: No such file or directory
	Is the server running locally and accepting
	connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?

Now, release (2) is coming down the pike seconds later, this is actually fine, and will be zero outage. We had a release that failed (so never was deployed), and seconds later the next correct release succeeds. Great!

The only problem is that we got an email notifying us that release 1 failed, and it’s also visible as failing in the heroku release list, etc.

A “background” (not in response to a git push or other code push to heroku) release failing is already a confusing situation — a”false positives” that actually mean “nothing unexpected or problematic happened, just ignore this and carry on.” is… really not something I want. (I call this the “error notification crying wolf”, right? I try to make sure my error notifications never do it, because it takes your time away from flow unecessarily, and/or makes it much harder to stay vigilant to real errors).

Now, there is a fairly simple solution to this particular problem. Here’s what I did. I changed my heroku release phase from rake db:migrate to a custom rake task, say release: bundle exec rake my_custom_heroku_release_phase, defined like so:

task :my_custom_heroku_release_phase do
if ENV['DATABASE_URL']
Rake::Task["db:migrate"].invoke
else
$stderr.puts "\n!!! WARNING, no ENV['DATABASE_URL'], not running rake db:migrate as part of heroku release !!!\n\n"
end
end
view raw custom.rake hosted with ❤ by GitHub

Now that release (1) above at least won’t fail, it has the same behavior as a “traditional” heroku app without a release phase.

Swallow-and-report all errors?

When a release fails because a release phase has failed as result of a git push to heroku, that’s quite clear and fine!

But the confusion of the “background” release failure, triggered by a config var change, is high enough that part of me wants to just rescue StandardError in there, and prevent a failed release phase from ever exiting with a failure code, so heroku will never use a db:migrate release phase to abort a release.

Just return the behavior to the pre-release-phase heroku behavior — you can put your app in a situation where it will be crashed and not work, but maybe that’s better not a mysterious inconsistent heroku app state that happens in the background and you find out about only through asynchronous email notifications from heroku that are difficult to understand/diagnose. It’s all much more obvious.

On the other hand, if a db:migrate has failed not becuase of some unrelated boot process problem that is going to keep the app from launching too even if it were released, but simply because the db:migrate itself actually failed… you kind of want the release to fail? That’s good? Keep the old release running, not a new release with code that expects a db migration that didn’t happen?

So I’m not really sure.

If you did want to rescue-swallow-and-notify, the custom rake task for your heroku release logic — instead of just telling heroku to run a standard thing like db:migrate on release — is certainly convenient.

Also, do you really always want to db:migrate anyway? What about db:schema:load?

Another alternative… if you are deploying an app with an empty database, standard Rails convention is to run rails db:schema:load instead of db:migrate. The db:migrate will probably work anyway, but will be slower, and somewhat more error-prone.

I guess this could come up on heroku with an initial deploy or (for some reason) a database that’s been nuked and restarted, or perhaps a Heroku “Review app”? (I don’t use those yet)

stevenharman has a solution that actually checks the database, and runs the appropriate rails task depending on state, here in this gist.

I’d probably do it as a rake task instead of a bash file if I were going to do that. I’m not doing it at all yet.

Note that stevenharman’s solution will actually catch a non-existing or non-connectable database and not try to run migrations… but it will print an error message and exit 1 in that case, failing the release — meaning that you will get a failed release in the pg:promote case mentioned above!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s