Note: This post is out of date as of Rails 3.1 and the asset pipeline. The asset pipeline does a better job of taking care of this for you. You may still need to do some work in apache to get the headers set the right way (I’m not sure), but it would not be the exact solution described here.
tldr intro: While Rails adds a ?timestamp query string to the end of asset URLs, under a passenger+apache deployment (and I suspect most other deployments), these timestamps actually have no effect whatsoever on browser caching. In some cases this might not matter much to you: if you are able to use the Rails helper :cache=>’filename’ argument, or if you don’t have very many asset files, or if you aren’t trying to wring every last bit of performance out of your app. But sometimes you really do want your Rails assets to be cached for real by the browser, without even an if-modified check. Below is a somewhat hacky way to set up your apache conf to do so. [btw, wouldn’t it be nice if Passenger would do this automatically for Rails, avoiding the need for a hack?]
The basic issue: why do we care?
So one approach is to try to have fewer CSS or JS files, perhaps using the Rails asset helper methods’ :cache=>’filename’ argument. There are some cases where this isn’t convenient, and it has some issues of its own. And even if you use it, you think, gee, wouldn’t it be nice to have the browser use its cached copy on the second and later requests, if it has one?
That’s certainly less of an issue if you’ve consolidated your JS and CSS down to one file each. Still, you may have noticed that Rails adds timestamp query strings to asset URLs. Aren’t those there for a reason?
Rails timestamp in query string: Does nothing at all in typical setup
The idea here is that the exact URL can in fact be cached forever by the client, because if the content ever changes, the timestamp on the end will change, and the browser will be sent a different URL.
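To make the mechanism concrete, here is a minimal Ruby sketch of how such a timestamped URL can be built. It is not the actual Rails helper code; the method name timestamped_asset_url and the directory layout are assumptions made up for illustration. It only shows the principle: the query string is the file’s modification time in epoch seconds, so it changes whenever the file does.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper (NOT the real Rails implementation): append the file's
# modification time, in seconds since the epoch, as a bare query string.
def timestamped_asset_url(public_dir, asset_path)
  mtime = File.mtime(File.join(public_dir, asset_path)).to_i
  "#{asset_path}?#{mtime}"
end

# Demo against a throwaway "public" directory.
Dir.mktmpdir do |public_dir|
  FileUtils.mkdir_p(File.join(public_dir, "stylesheets"))
  css = File.join(public_dir, "stylesheets", "app.css")
  File.write(css, "body { color: black; }")

  # Pin the mtime so the result is predictable.
  File.utime(Time.at(1_300_000_000), Time.at(1_300_000_000), css)
  puts timestamped_asset_url(public_dir, "/stylesheets/app.css")
  # prints "/stylesheets/app.css?1300000000"

  # "Editing" the file later yields a different URL for the same path.
  File.utime(Time.at(1_300_000_500), Time.at(1_300_000_500), css)
  puts timestamped_asset_url(public_dir, "/stylesheets/app.css")
  # prints "/stylesheets/app.css?1300000500"
end
```

Since the URL changes whenever the file changes, any single URL always refers to the same content, which is what would make it safe to cache indefinitely.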
Browsers all cache by the full URL, including the query string. Apache, however, ignores the query string on such a URL when it’s serving static resources (and a typical passenger deployment has apache serving static resources directly); it just serves up the same resource as if the query string weren’t there.
That’s the idea of these timestamp query strings, but how is the browser to know it’s allowed to cache that resource forever (or for a year, which is the furthest into the future the HTTP standard recommends setting an expiry)? It only knows that if the web server sends appropriate caching headers (such as “Expires:”). The browser can’t know, just because a URL ends in a query string with ten digits, that this is Rails’ way of saying it’s cacheable forever. That’s what HTTP headers are for.
But under a standard apache+passenger deployment (and I’m betting any other common deployment), the web server doesn’t send any special headers for these URLs, because the web server also doesn’t have any way to know these are cache-forever-able URLs. In fact, by the time apache gets to returning the response which could have expires headers, apache has already ignored the fact that the url even has a query string.
So what will the browser actually do, the second time it needs one of these asset files it already requested? Well, it does have them in its cache. So it sends a conditional GET to the server, saying “Only send me this file if it’s changed since the last one I had.” (Apache, in most typical setups, will also send an “ETag” with the original asset, which provides another way for the browser to ask the server if the content has changed.) And it won’t download the file a second time if it hasn’t changed.
But recall that in most applications, it’s not really the size of the asset downloads that causes a slowdown, it’s the number of them, because of the overhead of making all those HTTP requests. So even when sending a conditional GET, the browser is still sending an HTTP request for each asset. And if you’ve got a bunch of assets, that’s still going to slow things down. In my application, it was slowing things down by up to a second, just for all the conditional GETs!
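As a rough illustration of the difference, here is a heavily simplified Ruby model of what a browser does the second time it needs a resource it already holds. This is a sketch under stated assumptions, not how any real browser is implemented: the method name second_request_action is made up, and real caches also look at Cache-Control, apply freshness heuristics, and more.

```ruby
require "time"

# Hypothetical, simplified model of a browser cache decision for a resource
# that is already in the cache. cached_headers are the response headers that
# came with the stored copy.
def second_request_action(cached_headers, now = Time.now)
  expires = cached_headers["Expires"]
  if expires && Time.httpdate(expires) > now
    # Still fresh: no HTTP request is made at all.
    :use_cache_without_request
  elsif cached_headers["Last-Modified"] || cached_headers["ETag"]
    # A validator is available: ask the server "has this changed?"
    :conditional_get
  else
    :full_get
  end
end

now = Time.utc(2011, 1, 1)
one_year = 365 * 24 * 60 * 60

# What a typical static-file response carries: validators only.
plain = { "Last-Modified" => "Tue, 01 Jun 2010 12:00:00 GMT", "ETag" => "\"abc\"" }
puts second_request_action(plain, now)
# prints "conditional_get"

# The same response with a far-future Expires header added.
far_future = plain.merge("Expires" => (now + one_year).httpdate)
puts second_request_action(far_future, now)
# prints "use_cache_without_request"
```

Only the second case avoids the per-asset round trip, and nothing in it depends on whether the URL has a timestamp query string.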
Yes, one solution would be trying to reduce my number of assets. But what’s that Rails-generated timestamp for, anyway? In a typical setup, the timestamp is doing absolutely nothing to affect browser caching behavior: the conditional GET described above happens with or without the special Rails timestamp.
Googling around for this, I found a few people mentioning it, but not as much discussion as I expected. I’d think it would be more noteworthy that Rails is spending time calculating and attaching these timestamps, which then have no effect at all. Maybe my googling skills are just poor. There were a few comments on approaches to address this, which I built upon to arrive at a solution satisfactory to me.
To be fair, there is some discussion of this in the latest Rails documentation. It recommends that you tell apache to send Expires “access plus 1 year” for all your resources. Which is an approach I found googling too.
The problem with this is that it will send expires-one-year headers for ALL assets, those with the timestamp/versioning query string on the end, and those without. But it’s the versioning/timestamp query string on the end that really makes the URL cacheable “forever”; that’s what keeps the browser from using a cached version of something that has since changed.
Really, we want the expires-one-year header to be sent ONLY for asset URLs that have the versioning query string appended on the end, and not for those that don’t. Otherwise, if you send it for all of them, and there’s one single place in your app that causes the browser to request a non-version-query-stringed asset URL, the browser will end up caching it forever and not pick up changes you make to that asset (unless the user does a hard refresh), which is pretty unfortunate. So what the Rails docs recommend is actually kind of dangerous unless you’re really careful.
But the Apache mod_expires module doesn’t accept any conditions; you can’t tell it to only set an expires header for certain URLs. Well, you sort of can, by placing it inside a specific virtual host, or a <Location>, <LocationMatch>, <Directory>, <FilesMatch> etc. directive. But none of those directives let you constrain their applicability based on the query string; they all just check the path part of the URL.
mod_rewrite will let you check the query string, but mod_rewrite doesn’t give you any good way to set the expires header to “one year in the future”. It’ll let you hard-code an expires to a certain date, but then you’ve always got to be going back and pushing that hard-coded date into the future, as time marches on.
So I was pointed to the real solution by this blog post by Stephen Sykes. You first use mod_rewrite, kind of hackily, to change the actual path of any asset with a Rails versioning/timestamp query string. You create a symlink in your file system so that the new path actually points to exactly the same files as the old path. But since it’s a new path, you can use apache directives to set far-future expires only for resources in that path. Bingo, it’s kind of hacky, but now you have far-future expires headers only on Rails assets with the versioning/timestamp query string, and not on those without.
I made a few changes to Stephen’s suggestions though, because:
- I didn’t want to have to specify a file system <Directory> in my apache conf. I use passenger, and the exact directory something lives in is determined simply by a symlink in web root to the real application. If I ever changed this, I didn’t want to have to remember to go sync it in the apache conf too. I’d rather use an apache directive like <Location> based on the URI, not <Directory> based on the file system location.
- My Rails apps are sometimes deployed not at web server root, but at a certain path on the web server, so I needed to take account of that, and make sure I was only doing the path rewriting for things actually in my Rails apps.
- I prefer to be more conservative with the conditions on the RewriteRule, to only apply this logic to things in one of the actual asset directories, /images, /stylesheets, /css. (Plus /plugin_assets for Rails2 engines.) If I’m too conservative, and realize there’s something else I wanted to cache, I can always expand the condition later. But if I’m too generous, by the time I realize it there might be clients out there caching forever something I don’t want them to, with no way to fix it except somehow getting every such user to do a hard refresh in their browser.
So here’s what I end up with. First you still need to go into your Rails app’s public directory, and do a:
ln -s . add_expires_header
(Symlinks go into git fine, it looks like, if you have your public directory in git for capistrano deployment or anything else.)
Then comes the apache conf, one per application. In this case my application is mounted at /demo, matching the passenger directive: RailsBaseURI /demo
Yeah, you need one of these per application, and you need to remember to go and fix the /demo path prefix if you change your mount point. But at least you don’t need to change anything just because you change the file system location, as long as you don’t change the URL prefix.
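The decision the rewrite has to make can be sketched in Ruby. To be clear, this is not the apache conf itself, just a model of the logic described above under some assumptions: the method name expires_rewrite is made up, the asset directory list is an assumed default you would adjust to your own app, and a “versioned” URL is taken to be one whose entire query string is exactly ten digits.

```ruby
# Assumed list of asset directories; adjust to whatever your app really uses.
ASSET_DIRS = %w[images javascripts stylesheets plugin_assets].freeze

# Hypothetical model of the rewrite decision. Returns the rewritten path for
# a versioned asset URL under the app's mount point, or nil to leave the
# request alone.
def expires_rewrite(request_path, query_string, mount_point: "/demo", asset_dirs: ASSET_DIRS)
  # Only URLs whose whole query string is a ten-digit timestamp qualify.
  return nil unless query_string.to_s.match?(/\A\d{10}\z/)

  dirs = asset_dirs.map { |d| Regexp.escape(d) }.join("|")
  pattern = %r{\A#{Regexp.escape(mount_point)}/((?:#{dirs})/.+)\z}
  match = request_path.match(pattern)
  return nil unless match

  # Same file on disk, reached through the add_expires_header symlink.
  "#{mount_point}/add_expires_header/#{match[1]}"
end

puts expires_rewrite("/demo/stylesheets/app.css", "1300000000")
# prints "/demo/add_expires_header/stylesheets/app.css"

p expires_rewrite("/demo/stylesheets/app.css", "")
# prints nil (no versioning query string, so no far-future expires)

p expires_rewrite("/other_app/stylesheets/app.css", "1300000000")
# prints nil (not under this application's mount point)
```

Note that an already-rewritten path does not match again, since add_expires_header is not one of the asset directories, so the rule cannot loop.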
Wouldn’t it be nice if…
I kind of wish Passenger would just do this for us. With a config directive, perhaps even turned on by default. Would go with the general “just do the right thing” goals of Passenger. Rails is creating these timestamps, isn’t the whole point that they be cached on the browser? I’d like it if Passenger just made it so, which it likely could, since it’s already an apache module. Although it might be a bit of a pain since usually the middleware stays out of the way of apache serving of static resources entirely.
Alternately, there is some talk that appending a “cache busting” string in the query section of the URI is the wrong approach for other reasons (along with other complaints about Rails’ general strategy, although this issue isn’t mentioned), because not all proxies and other caches pay attention to the query string when caching. If Rails were putting the “cache buster” in the path portion alone, it would be easier to write the apache directives in a much shorter and simpler way, just using mod_expires without the need to bring mod_rewrite into it. If you use that Asset Fingerprint plugin, you might find it easier to keep the apache conf simple. (Although putting the fingerprint in the path instead of the query requires either making an extra copy of the asset on the server, as the Asset Fingerprint plugin does, or intervening in the server processing with mod_rewrite again.)
My own complex situation, if you’re curious
In order to provide for modular code, where extra add-ons or local customizations can provide their own CSS or JS that integrates cleanly with the core BL, BL builds up its list of JS or CSS assets dynamically, per-action. That makes it quite hard to use the standard :cache=>’key’ Rails asset helper argument without accidentally caching two different collections of assets under the same key, with disastrous consequences.
Idea for even better solution
So the above solution with apache conf is definitely good enough for now, improving page loading times (when assets are cached) by as much as a second in my application. But on the first request (or the first request after assets have changed on the server), the client will still need to download a bunch of separate assets using a bunch of HTTP calls, which is unfortunate.
Taking the principles of the Asset Fingerprint plugin, but combining them with the goals of the standard Rails helper methods :cache=>key argument to aggregate resources, here’s my idea for a caching strategy fit for my situation.
The list of assets to render is actually a list of array-arguments to the rails helper methods, which sometimes include the Rails2 Engines :plugin argument too.
So first take this list, and normalize it to a string in a sorted way (basically, sort all the elements, and make sure the serialized hashes have sorted keys and any nested arrays are sorted too, etc.). This should hopefully be a fairly quick thing to do, because we’re going to do it on every page render.
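A minimal Ruby sketch of that normalization might look like the following. The method name normalize_assets and the serialization format are assumptions for illustration; the only property that matters is that the same set of arguments always produces the same string, regardless of array order or hash key order.

```ruby
# Hypothetical normalizer: turns a nested structure of arrays, hashes,
# strings and symbols into a canonical string. Arrays are sorted by the
# serialized form of their elements; hashes are emitted with sorted keys.
def normalize_assets(value)
  case value
  when Hash
    pairs = value.map { |k, v| [k.to_s, normalize_assets(v)] }.sort_by(&:first)
    "{" + pairs.map { |k, v| "#{k}=>#{v}" }.join(",") + "}"
  when Array
    "[" + value.map { |v| normalize_assets(v) }.sort.join(",") + "]"
  else
    # Strings and symbols serialize identically, quoted.
    value.to_s.inspect
  end
end

# Each entry is an argument list for a Rails asset helper call; the second
# one uses the Rails2 Engines :plugin argument.
a = [["application.css"], ["widget.css", { :plugin => "my_engine", :media => "all" }]]
b = [[{ :media => "all", :plugin => "my_engine" }, "widget.css"], ["application.css"]]

puts normalize_assets(a) == normalize_assets(b)
# prints "true"
```

One thing to keep in mind with this approach: because order is thrown away, two pages that load the same files in a different order end up with the same key. If load order matters for your CSS or JS, you would want to leave the outermost list unsorted.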
Now check an application-wide hash, with that normalized serialization as a key, to see if we already have an aggregated asset filename generated for this unique combination of asset files. If so, we just generate the proper html tag to include it.
If not, we combine all these assets into one file (with internal comments as to where each subsidiary file starts, to aid in debugging if necessary). We MD5 hash the whole file, and construct a filename that includes the MD5 hash (meaning the filename is unique to its exact contents). (These steps are done only once per unique combination of assets per application instance run, so it’s okay if they are expensive.) We put the file in a public/aggregated_assets directory (not bothering if there’s a file with the exact same name, and thus necessarily the same content, already there; maybe another instance already created it, no big deal), and save the filename in the app-wide Hash mentioned above. And we set up apache with a simple location/expires directive to expire-far-future anything in the /aggregated_assets directory.
If I ever have time to work on it, or if it ever becomes a priority to squeeze out even a bit more performance from my application, that’s what I’ll try. Get the benefits of consolidating multiple assets into one; get the benefits of a far-future expires date on a filename which will change if its contents change; and as an added bonus do caching the way the Asset Fingerprint plugin argues is better, based on MD5 fingerprint instead of timestamp and in the path instead of the query string.
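Here is a rough Ruby sketch of that aggregation step, to show the idea holds together. Nothing here is existing Rails or plugin API: the class name AssetAggregator, its method names, and the simplified key (just the sorted list of paths, standing in for the normalized serialization) are all assumptions for illustration.

```ruby
require "digest/md5"
require "fileutils"
require "tmpdir"

# Hypothetical aggregator: combines assets into one file named after the MD5
# of its contents, and remembers the result in an app-wide hash.
class AssetAggregator
  def initialize(public_dir)
    @public_dir = public_dir
    @filenames = {} # normalized key => aggregated URL path
    @lock = Mutex.new
  end

  # asset_paths are relative to public_dir; ext is "css" or "js".
  def aggregated_path(asset_paths, ext)
    # Simplified stand-in for the normalized serialization of the asset list.
    key = ([ext] + asset_paths.sort).join("\n")
    @lock.synchronize { @filenames[key] ||= build(asset_paths, ext) }
  end

  private

  # Expensive part, run once per unique combination per application instance.
  def build(asset_paths, ext)
    combined = asset_paths.map do |path|
      # /* ... */ comments are valid in both CSS and JS.
      "/* begin #{path} */\n" + File.read(File.join(@public_dir, path)) + "\n"
    end.join

    name = "#{Digest::MD5.hexdigest(combined)}.#{ext}"
    dir = File.join(@public_dir, "aggregated_assets")
    FileUtils.mkdir_p(dir)
    target = File.join(dir, name)
    # Same name implies same content, so an existing file can be left alone.
    File.write(target, combined) unless File.exist?(target)
    "/aggregated_assets/#{name}"
  end
end

# Demo against a throwaway "public" directory.
Dir.mktmpdir do |public_dir|
  File.write(File.join(public_dir, "a.css"), "a { color: red; }")
  File.write(File.join(public_dir, "b.css"), "b { color: blue; }")

  aggregator = AssetAggregator.new(public_dir)
  path = aggregator.aggregated_path(["a.css", "b.css"], "css")
  puts path
  puts File.exist?(File.join(public_dir, path))
end
```

Because the filename is derived from the contents, a changed asset produces a new filename after the next application restart, and the old far-future-cached file is simply never requested again.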