more thoughts on unbearably slow rails3

So as mentioned before, my specific app converted to Rails3 is horribly slow (300% to 400% slower than it was before conversion in Rails2), and the slowness seems to be attributable to ruby Garbage Collection.  And in fact, contrary to my initial report, even using REE and messing with GC tuning hasn’t been enough to reliably make things right again.

Continuing to look around on the web for anyone with anything to say about this, I find this post mentioning similar Rails performance problems — in a comment on that post, Jared Mehle points us to Aaron Paterson’s keynote at the recent RailsConf.   (thanks very much to Luke, Jared, and the rails blogosophere in general for sharing their notes and links making this info more findable than it might be otherwise).

Starting at about 34:10, Aaron talks about Rails3 performance loss compared to Rails2.  In whatever sample application used, he’s talking about ~30% performance loss, not the more like 300% I’m seeing.  But he also attributes it to ruby garbage collection.

And to what does he attribute the extra ruby garbage collection? Increased size of the call stack.  That is, if you were to raise an exception and look at the call stack backtrace from a controller action method, in Rails2.3, according to Aaron,  you’d get a call stack of 51 lines, and in Rails3.0 60 lines, and in Rails 3.1 (which my app isn’t actually on yet), 67 lines.

Aaron says “The way that ruby’s garbage collector works, it has to scan the stack for objects to collect, so obviously as our stack gets larger, the garbage collector has to do more work.”  This isn’t entirely ‘obvious’ to me, I don’t really understand why deeper call stack (as opposed to the more straightforward ‘creating more objects’) leads to more GC (something about closures or similar?), but I trust Aaron knows what he’s talking about, he’s both a ruby and Rails core team member.

Now if I was just getting a 30% or even 40% slowdown, like Aaron’s benchmarked case, that would be one thing. But I’m getting a 300, 400, 500% slowdown, incapacitating my app. So I’d been wondering, okay, what is my app specifically doing to put itself in this pathological case, what makes my app different than anyone elses?  And I hadn’t come up with any hypothesis that ended up verified by experiment. (Ie, cutting out the part of my app I suspected to see if that did it).

But here’s a scary thought: Maybe my app has a deeper call stack than other peoples?  Maybe it always did, but then when you add my own code’s deeper call stack onto Rails3’s deeper call stack, you get a really big call stack which crosses some line to make the ruby GC really pathological?  That I might have a deeper call stack than everyone else seems plausible to me — I am in the habit of creating some fairly abstract multi-level OO designs to make things reusable and seperated, which would lead to increased depth of call stack.

But this is a really disconcerting thought, because I (like most of us) am really not used to thinking in terms of how deep the call stack is, that is not a kind of analysis that is intuitive for me or anything but very challenging to analyze, or take account of in my architecture and programming.  The way we (or at least I) have learned to program, all about encapsulation and abstraction and seperation of concerns and testing in isolation, the size of the call stack is an implementation detail at a lower level of abstraction than I generally ever think about.  And it’s very unclear what I could possibly do to reduce the size of the call stack in my own code, aside from ripping out all my elegant OO abstraction and turning it into a mess of unshareable non-DRY unique-to-the-project spaghetti instead. Nor do I really have any idea how to test this hypothesis that a deep call stack is giving me pathological performance. (Patterson has some ideas for reducing the size of the call stack in Rails3 itself, having identified a particular part of the architecture that has led to the stack size increase, and some ideas for refactoring to reduce it — the middleware stack).

I am awfully frustrated with ruby right now.

I think that ruby 1.9.3, predicted in final release in a couple months, has some new GC strategies, which if I’m lucky might address some of these issues. I haven’t gotten my particular app to run under 1.9.x yet, but perhaps that’s what I’m going to have to do sooner rather than later. Very frustrated with ruby/rails right now, the amount of time I am spending on under-the-hood infrastructure and migration and remediation is leaving not enough time for actually developing the features that make the customers happy.

23 thoughts on “more thoughts on unbearably slow rails3

  1. Did you try running on JRuby? The JVM garbage collector might be your best option in this case if your app can run on JRuby and you can use it now instead of waiting for 1.9.3 to arrive.

  2. I have tried JRuby, yeah, and it’s still slow, not quite as slow as straight MRI, but actually slower than GC-tuned REE (and even GC tuned REE is still too slow). Now, I could try GC-tuning the JVM under JRuby…. but man, I thought there was a reason I wrote ruby instead of Java, and nightmares about magical GC tuning of the JVM is part of it.

  3. They want people to leave Ruby 1.8/Rails 2, when Ruby 1.9/Rails3.0 still have issues. You don’t ask people to abandon perfecly working apps for the sake of moving forward. Fix it, make it better.. then and only then can they demand that we move.

  4. Initial testing of my app with Ruby 1.9.2-p180 and Rails 3.0.9, it’s still unbearably slow.

    This isn’t really my complete app — complete app has encoding errors which are nightmarish to fix, although I’d do it if I thought there was light at the end of the tunnel. So I can only test certain click paths and with some functionality disabled — but still unbearably slow.

  5. Just a thought, but you may have latched onto the GC-as-culprit too quickly. With an order of magnitude difference in the slowdown you’re seeing vs the folks talking about GC problems, that connection is starting to look tenuous. Have you done more thorough profiling of the whole app to see where it’s spending all its time under different rails versions? There are code changes between the rails 2 and rails 3 versions of *your* code, yes? Is it possible some of them are to blame?

  6. The reason I’ve latched onto GC-as-culprit is because:

    * REE with GC tuning supplied provides better performance than any
    other option (including jruby (without JVM GC tuning) and ruby 1.9). (It’s still not
    reliably great, but it’s an order of magnitude or so better than all other options)

    * When trying to profile to find the slowdown, it was weird and
    non-deterministic. One method would take 800ms to execute for no
    apparent reason; but if I removed calls to this method, then ANOTHER
    method would start taking 800ms to execute, where previously that same
    method call took only 20ms and there had been no changes to the
    implementation of that method. This kind of weird results seems
    consistent with GC being the real culprit, those 800ms weren’t in that
    method implementation, but ‘between the lines’ in the interpreter.

    * Then the other people who reported GC as the culprit, who had more
    skills to verify that, and had similar symptoms.

    Still, I certainly could be wrong. But yes, I’ve definitely tried
    profiling, it just was giving me results that didn’t make any sense,
    but sort of make sense if you assume GC as the culprit.

    I am figuring it’s likely that some changes to my code, or it’s
    dependencies (other than Rails) are to blame — but it’s looking like
    whatever those changes are, they are to blame because of their effect
    on ruby GC.

  7. A few quick things:

    1) http://support.newrelic.com/kb/ruby/ruby-gc-instrumentation

    Enable GC Instrumentation if you haven’t. Will make determining the slow downs MUCH more transparent: http://engineering.gomiso.com/2011/06/06/low-hanging-fruit-for-ruby-performance-optimization-in-rails/

    2) https://github.com/newrelic/rpm_contrib/blame/master/README.md

    Enable contrib which adds additional time monitors to make measuring performance easier

    I have experienced similar problems with a Rails 2 -> Rails 3 conversion in terms of almost 150% slower on Rails 3. I am still using Rails 2 for the application but ported several of the functions to Sinatra/Padrino web services since I am decided to try for a more modular approach to web development.

  8. Hmm, GC.enable_stats, as directed in that helpfile, gives me ” undefined method `enable_stats’ for GC:Module (NoMethodError)” on an RVM-installed ruby-1.8.7-p352. Is this only available for ruby 1.9? Ah, looks like it’s only available for REE, or MRI with custom patches. Thing is, REE is the only ruby that’s close to performing adequately already, and I don’t really have time to figure out how to patch ruby.

    This whole process I keep getting side-tracked into things ending up monstrous; I could easily spend a day or two just trying to get new_relic GC instrumentation to work, and it’s a day or two I don’t really have at this point. I’m way over time budget at this point. (Even the 2-3 minute wait each time I do a ‘bundle install’, trying out yet another env/dependency variation, is starting to seriously irk me).

    Still, just for the heck of it, okay, REE, I suppose I could try it without any of the GC tuning, duh, yeah… but I’m having trouble getting my stats to show up in the new relic console, or maybe I don’t know where to look for GC instrumentation results. I’m at the end of my rope here.

  9. If you check our blog post you can see an example: http://engineering.gomiso.com/2011/06/06/low-hanging-fruit-for-ruby-performance-optimization-in-rails/ It was very helpful for us. And I would recommend you remove the tuning you did on REE and enable GC stats. This will either confirm or deny your hypothesis on the GC issue. The stats after some time will appear right on your primary response time graph (not real-user monitoring), the one that breaks down ruby/db/etc

    Sorry you are experiencing these issues and I can imagine it is frustrating going through all these hoops. Best of luck.

  10. Okay, got a bit of new_relic to work. Using REE but without GC tuning options (stock), one of the really pathological examples takes 3500ms to return, of which new_relic says 1413ms was GC. Odd thing is, with GC tuning, that same action takes LESS than 3500-1413ms to return, it often returns in under 1s.

    Guess I’d have to spend time investigating the new_relic_contrib stuff to get more info on the GC beyond the total time count. Add it to the list of things I’m investigating.

  11. If the response takes 3500ms – 1413ms in that particular recorded transaction, what is taking up the remaining 2+ seconds? I would install rpm_contrib when you can? Is the bulk of the remaining time DB? Rendering? External Calls?

  12. About 1s of the remaining time is external calls, yeah. The rest, according to new_relic, is just “ruby”. new_relic doesn’t break out rendering, does it? I don’t trust Rails report on rendering time, as of course it can include anything that happened during the rendering even if it was external calls and such.

    If I was writing an app from scratch and started running into this stuff, it woudlnt’ be that bad — the annoying thing is that basically this same code worked fine in rails2. The only code I changed was code I had to change to work in Rails3. I end up feeling really betrayed by ruby/rails.

  13. Yeah I believe you, as I said I have experienced similar problems before to a lesser degree, I was just curious at a lower level what in particular made it slow. I have a hard time believing it is the GC collection exclusively when you are seeing such slow response times…there’s got to be more to it in terms of slower Rails 3 implementations for other aspects. Such serious slowdowns must have another source.

  14. Thanks a lot for the tips, nesquena, that blog post is useful, definitely gonna try to find time to do more with new_relic. (It’s not clear to me if you can get GC times and other stats per-action? There’s ONE action that is my really problematic one? And it looks like stack traces, which would be really useful here, are only available with paying accounts. But I’ve only glanced at your blog post so far).

    I think I’ve figured out ONE piece of the puzzle, that maybe I’ll write another blog post on: ActiveSupport::Memoizable. I was using it in my own code. That seems to be an AWFUL idea, esp in Rails3, no idea why it wasn’t so horrible in the equivalent rails2 app (or if it may be related to GC, perhaps by creating lots of extra objects). No longer using Memoizable hasn’t eliminated the problem in my actual app, but it DID eliminate the problem in a slice of my app I was using to see “how much can I cut out and still have the problem exhibit” — got the problem to exhibit, went away when I took out Memoizable — sadly taking out Memoizable of full app doesn’t eliminate the problem there, so, okay, back to more analysis and creation of subsets of my code that still demonstrate problem.

  15. You’re probably using a butt load of inheritance and anti patterns. Sounds complex and worrying when I’ve upgraded countless apps from 1.1.6 and 2.3.5 to brand new 3.1 apps, and I’ve never had an issue. Of course I also don’t use ActiveRecord, use fat clients and heavy JavaScript and only abstract/modularize things when there’s an existing reason to. If I have parts of my apps that need a lot of backend Ruby I tend to write those in Sinatra and mount them as middleware. 9 times out of 10 when a developer is blaming the software they are riding it instead of driving it.

  16. I am exactly in the same condition as Jonathan…
    Our app performs 300-400% slower after upgrade.

    Here is what I get on my rails 3 windows environment:

    Processing by AppsController#index as HTML
    Rendered shared/_search_box.html.haml (93.0ms)
    Rendered shared/_alphabetical_pagination.html.haml (78.0ms)
    Rendered shared/_alphabetical_pagination.html.haml (78.0ms)
    Rendered apps/_index.html.erb (2422.0ms)
    Rendered apps/index.html.erb within layouts/application (2625.0ms)
    Rendered plans/_top_tabs.html.haml (47.0ms)
    Rendered reports/_tabs.html.haml (47.0ms)
    Rendered account/_tabs.html.haml (47.0ms)
    Rendered users/_tabs.html.erb (78.0ms)  Converted to erb – (before converting to erb, it would take 125 ms to load)
    Rendered account/_environment_tabs.html.haml (47.0ms)
    Rendered shared/_privilege_tabs.html.erb (188.0ms)  Converted to erb – (before converting to erb, it would take like 250 ms to load)
    Rendered shared/_header.html.erb (375.0ms)  Converted to erb – (before converting to erb, it would take like 462 ms to load)
    Rendered shared/_footer.html.erb (15.0ms)  Converted to erb – (before converting to erb, it would take like 40 ms to load)
    Completed 200 OK in 3140ms (Views: 2953.0ms | ActiveRecord: 125.0ms)

    Now, let us look at the corresponding rails 2 on windows:

    Rendered shared/_alphabetical_pagination (1.0ms)
    Rendered apps/_index (462.0ms)
    Rendered reports/_tabs (3.0ms)
    Rendered account/_tabs (9.0ms)
    Rendered users/_tabs (3.0ms)
    Rendered account/_environment_tabs (4.0ms)
    Rendered shared/_privilege_tabs (22.0ms)
    Rendered shared/_header (39.0ms)
    Rendered shared/_footer (2.0ms)
    Completed in 611ms (View: 460, DB: 121) | 200 OK [http://localhost/applications]

    If you look, there is not much of a difference in the db/activerecord time. The huge difference is in the rendering of views.
    The numbers are like 7-8 times more than rails 2 on Windows!

  17. I’m actually no longer having the problem with recent Rails 3.2.x, ruby 1.9.3, and updates to my various dependencies. It’s hard to say exactly what fixed/changed it (_part_ of it was problems with using kaminari with a very high result count, don’t know if you use kaminari)

    MRI 1.9.3 _does_ support env variables for turning GC parameters as 1.8 REE did (the same env vars I think), so if it is GC related you could try tuning these parameters.

  18. Jonathan, thanks for your reply.
    I am running on recent rails 3.2.6 and ruby 1.9.2 stack. I am using JRUBY though.

    After some analysis it turns out that it is a problem while rendering views.

    Almost all of our app has haml views. I converted a few high frequency ones to old school erbs.
    And there was almost a 50% increase in performance. Not quite close to our old rails 2 app, but appreciable.

    I turned on verbose garbage collection options in JRUBY and noticed that In the rails 3.2.x app, it runs GC almost 5-6 times as compared to the rails 2 app.
    Will look for some JVM performance tuning options too…

    Thanks,
    Rajesh

  19. Jruby 1.7.1 + JAVA 7 == WOW!

    Awesome performance! They seem to have fixed major performance problems with rendering from earlier version 1.6.7.2!

    I would recommend anyone using 1.6.x to use 1.6.8 or 1.7.x ASAP!
    We see a 100% improvement in our app’s performance and scale!

Leave a comment