rails3 unbearably slow view rendering: use REE with GC tuning

I have a fairly large/complex Rails application that has been chugging along in Rails2, and I am now at the final stages of migrating it to Rails3.

Now, the Rails2 version wasn’t all that speedy. I’m embaressed to admit in public that it’s average response time for the main action was 1.5-2s.   But that was (barely) good enough, and better than the proprietary application it replaced, so I hadn’t had time to get to profiling and figuring out how to do something about that.

(Some of the reason for it’s slowness is that it currently needs to talk to three different external data sources for each request in this primary action. A Solr index to resolve a search; then an external proprietary database to get some volatile state information about each document retrieved; then a local rdbms to get some user/session-specific info. So there will be a limit to how fast it can be, based on how fast those external data sources are. but I think there’s actually probably something else undesirable and fixable going on that I haven’t discovered yet.)

But okay, so I’m near the end of my Rails3 migration, got all the code working and tests passing… but I’m realizing that the Rails3 migrated version of the app is unbearably slow. 1.5-2 seconds is one thing, but it’s looking more like 4-12 seconds, obviously completely unacceptable.

Yes this occurred even in ‘production’ mode, and yes, I was using a recent 3.0.x version. (The most recent at this time, 3.09, but also tried with 3.06 same behavior). Using MRI ruby 1.8.7. In fact, for reasons not clear to me, the problem was WORSE in production mode, with rendering times going over 10 seconds!

So I start trying to figure out why, basically ‘profiling’ with my kludgey inexpert means of looking at the logs (which helpfully give rendering time for each view template, which ends up helpful), and manually inserting RubyProf statements in different parts of my code, looking at GraphHTML output for a given request.

The slowdown was definitely happening in the view rendering part. And did not appear to be due to ActiveRecord database access pushed into the view rendering part by ActiveRelation. It’s looking like it’s actual rendering… but I couldn’t figure out what view template where was causing slowness, in fact it was looking kind of non-deterministic. There might be a partial that rendered 20 times, and 19 of em render in 3ms each, and then one render takes 3000ms.  What?  Start futzing with my code and take out all calls to that partial, now there’s a different partial that was previously rendering perfectly quickly but now is rendering pathologically.

This was driving me crazy trying to figure out what was going on.  This kind of unpredictable behavior reminds me of concurrency issues, but as far as I could tell I wasn’t doing any weird ruby threading or anything.

Ruby Garbage Collection

Then I found this post which had the answer.  They were seeing symptoms much like mine, partial views taking pathological amounts of time to render (although there’s weren’t nearly as crazy bad as mine). And they discovered the problem was Ruby’s Garbage Collector. (Aha! Why didn’t that remind me of very similar problems I’ve had in the past when I worked with Java, with GC?  Probably because I’ve blocked it out of my memory to preserve my sanity).

Apparently (?) Rails3 (and possibly various other gem dependencies I use updated from Rails2 to Rails3) creates so many more objects than Rails2 that an app with the same logic can run into  GC issues with Rails3 and not Rails2. I guess?

Now, you can’t do much about this in MRI, but fortunately there exists Ruby Enterprise Edition,  which allows you to set just a few GC strategy params.  Following the advice of Bill Harding in that blog post mentioned above, I installed REE using rvm, set the environment variables with GC params with the ‘twitter’ example… and bingo, the problem I had spent a few days banging my head against the wall about was gone, the app now performs about the same as it did in it’s Rails2 incarnation (maybe even a bit better, maybe a bit worse, I haven’t done the formal benchmarking, you know, it’s good enough).

I actually had to increase the HEAP_SLOTS related params to double the ‘twitter’ example to get acceptable performance again. (Oddly, moving the RUBY_GC_MALLOC_LIMIT up or down seemed to be deleterious, the ‘twitter’ value seemed to be right). 

RUBY_HEAP_MIN_SLOTS=1000000
RUBY_HEAP_SLOTS_INCREMENT=500000
RUBY_HEAP_SLOTS_GROWTH_FACTOR=1
RUBY_GC_MALLOC_LIMIT=50000000

Thanks so much to Bill Harding for blogging this, if I hadn’t eventually found his blog on google, who knows how long it would have taken me to discover this.

So why haven’t more people noticed this publicly?

The mysterious thing to me is why there hasn’t been more talk of this.

So, okay, some people have known that ruby’s Garbage Collection is sometimes a problem (in fact one of the motivations for the phusion folks writing REE). But I was used to thinking of it as an issue only people with very high traffic apps (not mine) had to worry about.

If Rails3 makes it an issue for many more people, why is that one blog post by Bill Harding the only mention of it I can find?  Rails 3.0 has been out for quite a long time, Bill’s blog post is only from March.

  • Maybe it only effects a minority of Rails apps doing very specific things, of a kind that both me and Bill but not most people run into. (I do use an awful lot of partials. Which used to be a problem way back in Rails1 days, but hadn’t been a problem in Rails2 for a long time. I also use a bunch of ‘engine’ type gems. Wonder if Bill’s app does either of those things.
  • Or maybe something’s actually mis-configured in my app (and Bill’s) possibly as a result of migrating from Rails2, that triggers this? (I did make sure to update my environments/production.rb)
  • Maybe all the Rails bloggers/developers/other-cool-kids were already using GC tuned REE anyway before Rails3 and continued to do so.
  • Maybe all those folks are already using Ruby 1.9, which maybe doesn’t exhibit this problem with Rails3?  (I have no idea if Ruby 1.9 does GC different/better. There’s no REE for 1.9, right? So if MRI 1.9 triggers the same problems, it could be an even WORSE situation.)
  • Or maybe it’s gotten worse in more recent versions of Rails 3.0.x?
  • Or maybe it’s some other gem dependency I’m using, updated for rails3 (i’ve got a bunch) which is triggering it, and not Rails3 all by itself, so it bites only people using that gem in the right/wrong context?
  • What else? Anyone got any ideas?  Why isn’t this more discusssed/considered a bigger deal?  It seems potentially disastrous, especially if it’s a problem in 1.9 too but the REE fix isn’t available.

Are ruby/rails getting harder?

The buzz in rails blogs/podcasts these days is Steve Coast’s blog post “What the hell is happening to Rails?”, and Yehuda Katz’s response. 

The thing is, I actually don’t agree with any of Steve’s particular complaints. The way Rails used to do javascript helpers was Just Plain Wrong, and the Rails3 way is the right way to do it, it had to be done, and it’s done pretty slickly.  The Rails 3.1 asset pipeline likewise does things right that almost every app needed to do for performance, but was tricky to do with no clear ‘best practice’ standard way to do it in previous versions. Coffeecode, well, okay, I don’t see the appeal and don’t like it and don’t understand why it’s been made a default, but it’s so easy to not use it in Rails 3.1 that it doesn’t bother me much. (To not use it, you, well, just don’t use it. Just don’t suffix your js files in .coffee and don’t write coffeecode. Couldn’t be easier).  Etc.

(On the other hand, I seem to be alone in blasphemously still not liking Rails resourceful routing. It’s hard to build up via composition from various pieces of code. It’s too much magic. Have you tried to read that source code? But, nicely, Rails3 makes “old style” routing a lot better than it used to be too, with things like optional path components etc. Ironically, I think if “old style” routing had been as good originally as it is now in Rails3, there would have been a lot less motiviation/push for ‘resourceful’ routing!).

So all those particular things I think were the right (or not too wrong) choices, but still I think I’m not alone in having the general feeling that things are  getting harder to deal with, although I can’t put my finger on exactly what’s doing it or how it could be better.

In this case, I’m definitely not happy to have to be using REE, honestly. I’m basically the only Rails guy here at my shop, so I need to leave clear instructions for others on how to set up a production environment for our Rails apps. Now these instructions need to include REE (and really RVM, as the only sane way to install REE), setting up Passenger to use REE with proper GC tuning (which requires a ruby wrapper script!) etc. Instructions to set up a production environment have become quite a bit more complex.

And now I’m worried about what happens when I want to upgrade to ruby 1.9, is this problem going to be back without an REE fix?

But is this new? Well, sort of, because Rails3 seems to make the GC issues more likely to appear as a problem, but it was also a problem waiting to bite you all along if you managed to trigger it.  I think one portion of the perceived added complexity is a more complex toolchain, with some of the tools trying to deal with insufficiencies of more core utilities.  REE is dealing with problems with MRI, that really ought to be fixed in MRI. (And maybe are in 1.9?).   bundler makes dependency management so much easier than it used to be, I’d never want to go back. But it’s another part of the toolchain neccesary in large part because of insufficiencies in rubygems in the first place. (And the dreaded “Fetching source index for http://rubygems.org/” wait sucks).

Some of this is just the nature of under-resourced collaborative open source. ruby and rails seem to have significantly less full time company-supported developers working on it than many other succesful open source projects. That they are as good as they are with the development resources they’ve got is pretty amazing to begin with. But this lack of developer time is I think what leads to debacles like the recent issues with rubygems/bundler/rake/rails all stepping on each others toes — and such dependency issues I think have a lot to do with impressions of “damn, it’s hard to use.”  Those dependency issues in core utilities were really confusing and generally sucky to deal with. (And do not, as far as I can tell, all seem to be the fault of rubygems. At the same time, rake was having trouble with other core utilities not related to rubygems, I think? Why this had to happen during the same period that rails or bundler were having problems with rubygems, I don’t know).   But when you’re trying to get as much done as the ruby/rails community does with the limited developers doing it, combining seperate projects like that to make a larger whole is the only way to go, and dependency conflicts are a looming risk then. (Thankfully we at least have bundler now, making it not quite as disastrous as those same sort of problems would have been without bundler).

I think some of the “it seems harder to use” feeling in Rails also comes from the inevitable conflict between flexibility and simplicity.  Rails has gotten a lot more powerful. It’s also gotten a lot better factored for customizing parts of  Rails or adding extensions to Rails or using parts of Rails in isolation. These are all good things. But accomplishing that inevitably means more abstraction and complexity in code, which has an inevitable tension with simplicity and ease of use. Writing good software is hard, and the more powerful/abstract/flexible the software is, the more so.  It takes expertise and it takes time, and it’s never as good as we’d like, and it’s always hard to power and flexibility while keeping the simplicity.  Such is software.  I’d rather have the power and flexibility and developers with (absolutely, undoubtedly) the right ease-of-use goals, but sometimes falling short of the ideal — then be stuck with PHP.

Also it may be that ruby/rails has never been as easy to use as we’d like, and it’s still not?

17 thoughts on “rails3 unbearably slow view rendering: use REE with GC tuning

  1. For your situation of having to leave instructions for server setup, sounds like you should _really_ be using puppet or chef.

  2. yeah, I think you’re right on chef, haven’t gotten around to it. There’s definitely a sort of tipping point balancing the pain of setting up chef, and teaching current and successor coworkers (and documenting) how to use chef; with the pain of just just documenting and following narrative instructions. I think I’ve probably crossed that tipping point, certainly with the addition of ree-with-gc-tuning if not earlier. (You probably cross if it your instructions are more than a screenful on a small monitor).

    But haven’t gotten around to/found time to set up, learn, and implement recipes for chef yet.

  3. Far out, great blog Jonathan. I was starting to think that I was going crazy. We’ve got a huge Rails app which we’ve just ported from Rails 2.3.11 to Rails 3.1.rc4, MySQL to Postgres and we’re benchmarking going from REE to 1.9.2. The performance issue has been a huge one.

    We also use templates a lot, although personally I fail to see how anyone can be DRY and not use templates, unless they do naughty things like rendering lots of markup from helpers? We also have a large number of dependencies and we have also upgraded a legacy 2.x app. However a number of our devs have personal projects in Rails 3.x and note the same thing in any project with lots of rendering.

    By the way, don’t even think about going to 1.9.2, it’s a complete dog with this kind of app. We jumped from around 0.8sec load time to almost 4secs in 1.9.2 consistently across many different views. I’m about to try ruby-1.9.2-head with RVM and some other Rubies but won’t be wasting too much time as we will host on Heroku soon.

    I really hope they get the issue sorted in 1.9.3 + 3.1, it’s a big black eye for RoR IMHO, and I’m a RoR fanboy. One could create one’s first Rails app, do the right thing and keep one’s views DRY and think that RoR is as slow as a wet week.

    I’m going to try your GC tuning tips but as we are looking at moving hosting to Heroku we’ll have to rely on them hitting the “twitter numbers”.

  4. Hmm, that’s really concerning that you didn’t have any better luck with 1.9.2. ruby 1.9.x is definitely the future, and we’re going to have to move their eventually. (I think i saw in release notes somewhere that rails 3.2 will only support ruby 1.9?).

    Jruby (1.8 mode) seems to be working okay for me, in the same ballpark as GC tuned REE.

    I’m not even positive it’s the “many partials” thing that’s doing it. When I try to experimentally take things out until the performance behaves reasonably again…. I don’t get reasonable (or deterministically predicctable) performance even when I take out most of my partials. I’m not entirely sure WHAT it is. To make matters even more mysterious, I have colleagues at other organizations who are using very similar apps to me (based on the same codebase) who are NOT experiencing the problem — I almost wonder if it might be somehow OS or other environment related (we use RHEL 5 on a xen VM).

  5. Hey Jonathan, just wanted to say that we too experience this, with a rather straight forward rails app that we’re porting from rails2 to 3. We haven’t profiled yet, will do after we first get all tests working again, will let you know what we find out. One question: do you use haml/sass for your views?

    I’ll bet it has something to do with some resources being reloaded and/or recompiled after each request. In development mode for example it reloads all i18n resources for each request. For us on my machine that means every request already takes about 700ms to a full second, with the effect of loads and loads of String object creations each time, which would need the GC, etc.

  6. Btw, loading those exact same i18n resources in rails2 with i18n v0.4.2 takes about 70ms to 100ms per request on my machine. So it seems there’s a 10x performance degradation with i18n v0.5 which I’m using with rails3.0.9. I’m measuring only the time it takes to load the yml resources, nothing else.

  7. Thanks Lawrence. I’m not using haml or sass. And the problem exhibits for me in production too, which is the really problematic thing. I’m also not using much i18n, although I am using it here or there. But the problem exhibits for me even in production, when presumably i18n yml files (for example) are only loaded once and load time would not be an issue. It is incredibly frustrating to me right now, I’ve lost several days to this — I’m having trouble getting acceptable performance even with REE attempted GC tuned. And my app is not deployable in this state, but I need to get it on rails3 to keep up with some dependencies.

  8. As said, I haven’t profiled yet, not there yet to look into those details. I’m not convinced the solution is tuning the GC, that’s seems more like a workaround to me. The question is: why does the GC suddenly need to work so much more with rails3? I had expected it would be leaner and meaner. Got a gut feel that it has something to do String operations, especially those of the new ActiveSupport SafeBuffer kind. Have you tried profiling your app and see what kind of objects are created most that requires the GC to work?

  9. I’m not too experienced with profiling, I’ve tried to play around a bit with the available tools, but haven’t yet gotten anything I understand out of it, and haven’t had the time it would take to understand how to get something useful that I understand out.

    But I didn’t expect Rails3 to be leaner and meaner, really. It’s got a lot more abstraction and layers in it, which means more objects, which means more GC, right? Looking around on the net, this seems to be what everyone else thinks too.

  10. What really frustrates me, is that the only “constant” I can see in most people posting this problem is that they upgraded from Rails2. When I say we upgraded, we created a “clean” Rails 3.1 project and copy our models/controllers/views and re-wrote the routes from scratch. There’s probably some Rails2 config smell hanging around but I don’t see anything that smells suspicious.

    @lawrence we are using SASS heavily, but I don’t think that’s the issue. I can tail the logfiles and watch it chug through all the view rendering whereas with REE1.8.7 on Rails 3.1 it whips through it (and is easily 3 times faster)

  11. @Matt Powell
    It’s not confined to updated Rails2 apps. My app is a brand new Rails 3.1.0 app and I’m seeing the same symptoms:

    Completed 200 OK in 1658ms (Views: 1331.2ms | ActiveRecord: 13.3ms | Solr: 0.0ms)

    My app uses Haml exclusively and all my css files have the scss extension on them, though none of them use Sass.

    Has anyone tried using Ruby 1.9.3-rc1?

Leave a comment