I have a fairly large/complex Rails application that has been chugging along in Rails2, and I am now at the final stages of migrating it to Rails3.
Now, the Rails2 version wasn’t all that speedy. I’m embarrassed to admit in public that its average response time for the main action was 1.5-2s. But that was (barely) good enough, and better than the proprietary application it replaced, so I hadn’t had time to get to profiling and figuring out how to do something about that.
(Some of the reason for its slowness is that it currently needs to talk to three different external data sources for each request in this primary action. A Solr index to resolve a search; then an external proprietary database to get some volatile state information about each document retrieved; then a local rdbms to get some user/session-specific info. So there will be a limit to how fast it can be, based on how fast those external data sources are. But I think there’s actually probably something else undesirable and fixable going on that I haven’t discovered yet.)
But okay, so I’m near the end of my Rails3 migration, got all the code working and tests passing… but I’m realizing that the Rails3 migrated version of the app is unbearably slow. 1.5-2 seconds is one thing, but it’s looking more like 4-12 seconds, obviously completely unacceptable.
Yes, this occurred even in ‘production’ mode, and yes, I was using a recent 3.0.x version. (The most recent at this time, 3.0.9, but I also tried 3.0.6 and saw the same behavior.) Using MRI ruby 1.8.7. In fact, for reasons not clear to me, the problem was WORSE in production mode, with rendering times going over 10 seconds!
So I start trying to figure out why, basically ‘profiling’ with my kludgey inexpert means of looking at the logs (which helpfully give the rendering time for each view template), and manually inserting RubyProf statements in different parts of my code, looking at GraphHTML output for a given request.
The slowdown was definitely happening in the view rendering part. And did not appear to be due to ActiveRecord database access pushed into the view rendering part by ActiveRelation. It’s looking like it’s actual rendering… but I couldn’t figure out which view template was causing the slowness, or where; in fact it was looking kind of non-deterministic. There might be a partial that rendered 20 times, and 19 of em render in 3ms each, and then one render takes 3000ms. What? Start futzing with my code and take out all calls to that partial, now there’s a different partial that was previously rendering perfectly quickly but now is rendering pathologically.
This was driving me crazy trying to figure out what was going on. This kind of unpredictable behavior reminds me of concurrency issues, but as far as I could tell I wasn’t doing any weird ruby threading or anything.
Ruby Garbage Collection
Then I found this post which had the answer. They were seeing symptoms much like mine, partial views taking pathological amounts of time to render (although theirs weren’t nearly as crazy bad as mine). And they discovered the problem was Ruby’s Garbage Collector. (Aha! Why didn’t that remind me of very similar problems I’ve had in the past when I worked with Java, with GC? Probably because I’ve blocked it out of my memory to preserve my sanity).
Apparently (?) Rails3 (and possibly various other gem dependencies I use updated from Rails2 to Rails3) creates so many more objects than Rails2 that an app with the same logic can run into GC issues with Rails3 and not Rails2. I guess?
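One rough way to check whether the garbage collector is eating the time is to count collector runs, and the time spent in them, around a suspect chunk of code. What follows is only a sketch under assumptions, not what was used for the measurements in this post: `measure_gc` is a made-up helper name, the string-building loop is a stand-in for rendering a partial, and `GC.count` and `GC::Profiler` only exist in Ruby 1.9 and later, so it will not run as-is on MRI 1.8.7.

```ruby
require 'benchmark'

# Hypothetical helper (not from the original post): runs the given block and
# reports wall-clock time, number of GC runs, and time spent inside GC.
# Relies on GC.count and GC::Profiler, which need Ruby 1.9 or later.
def measure_gc
  GC::Profiler.enable
  GC::Profiler.clear
  runs_before = GC.count

  wall = Benchmark.realtime { yield }

  stats = {
    :wall_seconds => wall,
    :gc_runs      => GC.count - runs_before,
    :gc_seconds   => GC::Profiler.total_time
  }
  GC::Profiler.disable
  stats
end

# Stand-in workload: allocate a lot of short-lived strings, roughly the way
# rendering many partials does.
stats = measure_gc do
  200_000.times.map { |i| "row #{i}" }
end

puts format("wall: %.3fs, GC runs: %d, GC time: %.3fs",
            stats[:wall_seconds], stats[:gc_runs], stats[:gc_seconds])
```

If the GC time comes out as a large share of the wall time for an otherwise cheap block, that would be consistent with the "one render in twenty takes 3000ms" symptom: whichever partial happens to be rendering when a collection kicks in gets the blame.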
Now, you can’t do much about this in MRI, but fortunately there exists Ruby Enterprise Edition, which allows you to set just a few GC strategy params. Following the advice of Bill Harding in that blog post mentioned above, I installed REE using rvm, set the environment variables with GC params with the ‘twitter’ example… and bingo, the problem I had spent a few days banging my head against the wall about was gone, the app now performs about the same as it did in its Rails2 incarnation (maybe even a bit better, maybe a bit worse, I haven’t done the formal benchmarking, you know, it’s good enough).
I actually had to increase the HEAP_SLOTS related params to double the ‘twitter’ example to get acceptable performance again. (Oddly, moving the RUBY_GC_MALLOC_LIMIT up or down seemed to be deleterious, the ‘twitter’ value seemed to be right).
RUBY_HEAP_MIN_SLOTS=1000000 RUBY_HEAP_SLOTS_INCREMENT=500000 RUBY_HEAP_SLOTS_GROWTH_FACTOR=1 RUBY_GC_MALLOC_LIMIT=50000000
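Since these settings only take effect if they are actually present in the environment of the ruby process serving the app, a small startup check can catch a deployment where they were forgotten. This is a minimal sketch, assuming you want to compare against the exact values above; `EXPECTED_GC_ENV` and `gc_env_problems` are hypothetical names, and where you call this from (an initializer, say) is up to you.

```ruby
# Values copied from the settings quoted above. The constant and helper
# names are made up for illustration.
EXPECTED_GC_ENV = {
  'RUBY_HEAP_MIN_SLOTS'           => '1000000',
  'RUBY_HEAP_SLOTS_INCREMENT'     => '500000',
  'RUBY_HEAP_SLOTS_GROWTH_FACTOR' => '1',
  'RUBY_GC_MALLOC_LIMIT'          => '50000000'
}

# Returns a list of human-readable problems: variables that are missing
# from the environment or that differ from the expected value.
def gc_env_problems(env = ENV)
  problems = []
  EXPECTED_GC_ENV.each do |name, expected|
    actual = env[name]
    if actual.nil?
      problems << "#{name} is not set"
    elsif actual != expected
      problems << "#{name} is #{actual}, expected #{expected}"
    end
  end
  problems
end

# Print a warning for each problem; do nothing if everything matches.
gc_env_problems.each { |problem| warn "GC tuning: #{problem}" }
```

Note this only checks that the variables are visible to the process. It says nothing about whether the interpreter honors them, which in this setup depends on running under REE rather than stock MRI 1.8.7.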
Thanks so much to Bill Harding for blogging this. If I hadn’t eventually found his blog on Google, who knows how long it would have taken me to discover this.
So why haven’t more people noticed this publicly?
The mysterious thing to me is why there hasn’t been more talk of this.
So, okay, some people have known that ruby’s Garbage Collection is sometimes a problem (in fact it was one of the motivations for the Phusion folks writing REE). But I was used to thinking of it as an issue only people with very high traffic apps (not mine) had to worry about.
If Rails3 makes it an issue for many more people, why is that one blog post by Bill Harding the only mention of it I can find? Rails 3.0 has been out for quite a long time, Bill’s blog post is only from March.
- Maybe it only affects a minority of Rails apps doing very specific things, of a kind that both Bill and I, but not most people, run into. (I do use an awful lot of partials. Which used to be a problem way back in Rails1 days, but hadn’t been a problem in Rails2 for a long time. I also use a bunch of ‘engine’ type gems. Wonder if Bill’s app does either of those things.)
- Or maybe something’s actually mis-configured in my app (and Bill’s) possibly as a result of migrating from Rails2, that triggers this? (I did make sure to update my environments/production.rb)
- Maybe all the Rails bloggers/developers/other-cool-kids were already using GC tuned REE anyway before Rails3 and continued to do so.
- Maybe all those folks are already using Ruby 1.9, which maybe doesn’t exhibit this problem with Rails3? (I have no idea if Ruby 1.9 does GC differently/better. There’s no REE for 1.9, right? So if MRI 1.9 triggers the same problems, it could be an even WORSE situation.)
- Or maybe it’s gotten worse in more recent versions of Rails 3.0.x?
- Or maybe it’s some other gem dependency I’m using, updated for Rails3 (I’ve got a bunch) which is triggering it, and not Rails3 all by itself, so it bites only people using that gem in the right/wrong context?
- What else? Anyone got any ideas? Why isn’t this more discussed/considered a bigger deal? It seems potentially disastrous, especially if it’s a problem in 1.9 too but the REE fix isn’t available.
Are ruby/rails getting harder?
(On the other hand, I seem to be alone in blasphemously still not liking Rails resourceful routing. It’s hard to build up via composition from various pieces of code. It’s too much magic. Have you tried to read that source code? But, nicely, Rails3 makes “old style” routing a lot better than it used to be too, with things like optional path components etc. Ironically, I think if “old style” routing had been as good originally as it is now in Rails3, there would have been a lot less motivation/push for ‘resourceful’ routing!).
So all those particular things I think were the right (or not too wrong) choices, but still I think I’m not alone in having the general feeling that things are getting harder to deal with, although I can’t put my finger on exactly what’s doing it or how it could be better.
In this case, I’m definitely not happy to have to be using REE, honestly. I’m basically the only Rails guy here at my shop, so I need to leave clear instructions for others on how to set up a production environment for our Rails apps. Now these instructions need to include REE (and really RVM, as the only sane way to install REE), setting up Passenger to use REE with proper GC tuning (which requires a ruby wrapper script!) etc. Instructions to set up a production environment have become quite a bit more complex.
And now I’m worried about what happens when I want to upgrade to ruby 1.9: is this problem going to be back without an REE fix?
But is this new? Well, sort of, because Rails3 seems to make the GC issues more likely to appear as a problem, but it was also a problem waiting to bite you all along if you managed to trigger it. I think one portion of the perceived added complexity is a more complex toolchain, with some of the tools trying to deal with insufficiencies of more core utilities. REE is dealing with problems with MRI, that really ought to be fixed in MRI. (And maybe are in 1.9?). Bundler makes dependency management so much easier than it used to be, I’d never want to go back. But it’s another part of the toolchain necessary in large part because of insufficiencies in rubygems in the first place. (And the dreaded “Fetching source index for http://rubygems.org/” wait sucks).
Some of this is just the nature of under-resourced collaborative open source. ruby and rails seem to have significantly fewer full-time company-supported developers working on them than many other successful open source projects. That they are as good as they are with the development resources they’ve got is pretty amazing to begin with. But this lack of developer time is I think what leads to debacles like the recent issues with rubygems/bundler/rake/rails all stepping on each other’s toes, and such dependency issues I think have a lot to do with impressions of “damn, it’s hard to use.” Those dependency issues in core utilities were really confusing and generally sucky to deal with. (And do not, as far as I can tell, all seem to be the fault of rubygems. At the same time, rake was having trouble with other core utilities not related to rubygems, I think? Why this had to happen during the same period that rails or bundler were having problems with rubygems, I don’t know). But when you’re trying to get as much done as the ruby/rails community does with the limited developers doing it, combining separate projects like that to make a larger whole is the only way to go, and dependency conflicts are a looming risk then. (Thankfully we at least have bundler now, making it not quite as disastrous as those same sort of problems would have been without bundler).
I think some of the “it seems harder to use” feeling in Rails also comes from the inevitable conflict between flexibility and simplicity. Rails has gotten a lot more powerful. It’s also gotten a lot better factored for customizing parts of Rails or adding extensions to Rails or using parts of Rails in isolation. These are all good things. But accomplishing that inevitably means more abstraction and complexity in code, which has an inevitable tension with simplicity and ease of use. Writing good software is hard, and the more powerful/abstract/flexible the software is, the more so. It takes expertise and it takes time, and it’s never as good as we’d like, and it’s always hard to add power and flexibility while keeping the simplicity. Such is software. I’d rather have the power and flexibility and developers with (absolutely, undoubtedly) the right ease-of-use goals, but sometimes falling short of the ideal, than be stuck with PHP.
Also it may be that ruby/rails has never been as easy to use as we’d like, and it’s still not?