14 Nov 2011. This is all very outdated, written for Rails 2.1. Please see my updated post on multi-threaded ActiveRecord access.
5 February 2009: I note that this post is still getting a lot of hits, so I’m posting this update. Beware, this information is OLD. The state of threading in Rails has changed quite a bit (for the better) in ActiveRecord 2.2. You can still use this post as some general background, and the techniques and warnings MAY still be relevant, but AR 2.2’s introduction of connection pooling, and Rails 2.2’s generally taking threading a bit more seriously, change things. If anyone has some good links to suggest I include here, let me know, but I’m afraid I don’t have any off the top of my head.
Update 11 Feb 09 My lengthy further findings on concurrency in Rails, and why I no longer have confidence in the approach outlined here, can be found here.
So, I’m back from my vacation. Still working out a piece on proper conceptual modelling of authority work as a critique of FRAD. But it’s tricky. In the meantime, here’s a more technical piece on Ruby on Rails and threading (Well, FRAD is ‘technical’ too. So let’s say a piece about programming). For whatever reasons, I get non-trivial web traffic from web searches leading to my fairly useless ramblings about Rails, so here’s a more useful piece instead.
You can’t thread in Rails? Sure you can!
But you may or may not want to.
People often say “Rails doesn’t do threading/concurrency”. This isn’t really true. What they really mean is that Rails synchronizes access to the “request-response loop”: from when a request comes in to when the response is returned to the browser, only one of those request-response actions can run at a time in a given Rails instance. If you want to handle more traffic, add instances. And this is true, and fine, so far as it goes. But there are still some cases where you really could use concurrency/threading in a Rails app, and adding more instances won’t help.
The standard Rails community answer to these cases seems to be to use BackgroundRB. I will confess that I find BackgroundRB intimidating. I am not encouraged that its website says it currently “is to be considered experimental, in-complete and in many respect untested.” I am not encouraged that such basic things as rake tasks for control have been broken for months (or at least that’s what the web page says). I did not like the idea of making installation even more difficult for this app, which I plan to distribute to others as easily as possible. And I was not encouraged that some colleagues told me they found it a bear to work with.
So I decided to buck the community conventional wisdom and try to use Ruby Threads in a Rails app. After all, perusal of what documentation I could find, discussion with colleagues, and help on the rails listserv led me to believe that officially this ought to work in ActiveRecord and in Rails. Contrary to popular belief, as long as you aren’t trying to do concurrent request-response loops, officially Rails says it should support concurrency using Ruby threads.
Of course, there’s ‘ought to’ and then there’s reality, and when doing something that most of the community doesn’t do, in a product at the maturity level of Rails, you are bound to run into difficulties. I did, but they were fairly minor and easily solved. Some left inconveniences that could not be gotten around, but there were no major problems. At least that’s how it looks to me now: my app is still in development, and it is not live at large scale yet.
[update, May 2008: App has been live in production since January, at around 1.5 requests/second, and seems to be having no problems due to concurrency/threading at all. My mongrels do seem to occasionally unexpectedly die without leaving any error log. I don’t believe this is related to concurrency, especially since Mongrel theoretically is okay with threads, but who knows. It seems to be something that happens to other people’s mongrels too. I just run a cluster big enough to handle some drop-outs, and periodically have a script restarting any dropped out mongrels. Annoying in its lameness, but not a big deal.]
But I didn’t find instructions on how to do this successfully in any one place, so here’s me documenting what I’ve learned.
Concurrency within a request-response loop
The first type of concurrency we might be interested in is within a request-response loop. We might have several operations we want to perform in parallel while handling a given request. The example web developers run into most often is querying foreign web services. For instance, the app I was working on needed to query both a Google and a Yahoo API (and several others too) in handling a given request; it doesn’t really make sense to have to wait for the Google API to return before sending out the request to the Yahoo API. Response time to the user can be improved greatly by carrying these out concurrently, but all threads should complete before the response is returned to the browser. No threads will actually be left running between request-response loops. Because of this, this is in my opinion really quite in line with Rails design philosophy, and there’s little reason not to do it.
So you just create some threads. You let them do their thing. You make sure to join on all of them before returning from your action method. As with any concurrent programming, you are very careful with any data structures/objects that might be used by more than one thread, making sure those objects are written in a thread-safe manner. Since the Ruby and Rails documentation is fairly silent on this topic, I generally opted for extreme paranoid safety: I assumed that any Ruby or Rails classes weren’t thread-safe, and made a copy for each thread instead of allowing threads to share.
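Here is a minimal sketch of that create-then-join pattern in plain Ruby, outside of Rails. The SERVICES hash and the query_all method are hypothetical names made up for illustration, and the lambdas just sleep to stand in for slow calls to foreign web services. Each thread gets its own name and callable passed in as arguments, so nothing is shared between threads.

```ruby
# Hypothetical stand-ins for slow external API calls (not from the real app).
SERVICES = {
  :google => lambda { sleep 0.2; "google results" },
  :yahoo  => lambda { sleep 0.2; "yahoo results" }
}

# Start one thread per service, then wait for all of them before returning.
def query_all(services)
  threads = services.map do |name, call|
    # Pass name and call in as thread arguments so each thread works
    # only with its own pair.
    Thread.new(name, call) do |n, c|
      [n, c.call]
    end
  end

  results = {}
  threads.each do |t|
    # Thread#value joins the thread and returns the value of its block.
    n, r = t.value
    results[n] = r
  end
  results
end

started = Time.now
results = query_all(SERVICES)
puts results.inspect
puts "elapsed: %.2fs" % (Time.now - started)
```

In an action method, the call to query_all would sit where the sequential API calls used to be. Because it only returns once every thread has finished, no thread outlives the request.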
So the only trick is with ActiveRecord. First you have to (step 1) tell ActiveRecord that it will be operating in a concurrent environment. Just put this in your environment.rb:
ActiveRecord::Base.allow_concurrency = true
Ah, but what this actually does is make it so every Thread gets its own database connection; threads won’t share connections. That’s how AR handles concurrency safety. The main potential problem here is that AR never actually closes any of these DB connections (at least not automatically), even once the Thread has stopped, even once the Thread object has gone out of scope. This is obviously a problem, and you’ll eventually use up all your available DB connections. One solution would be to create some kind of Thread pooling system. Overkill. Better yet, there’s a completely undocumented (argh!!) method that someone on the listserv alerted me to.
(step 2) After threads complete, call:
ActiveRecord::Base.verify_active_connections!()
This nice method does exactly what we want, it checks through all of AR’s db connections, and if any are associated with a Thread that no longer exists, it closes them. Since I’m waiting for all my threads to join anyway before returning from my action method, I just call verify_active_connections!() after they’ve all returned. Seems to work. Sadly, completely undocumented, but someone on the Rails listserv shared it with me, and now I share it again with the internet.
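To make the idea concrete, here is a toy model in plain Ruby of one connection per thread plus a cleanup pass for threads that have finished. This is NOT ActiveRecord’s code: FakeConnection, ConnectionRegistry and reap_dead_threads! are hypothetical names I made up, and the real method may differ in its details. It only illustrates why a cleanup call after joining is enough.

```ruby
# Toy model only: these are hypothetical classes, not ActiveRecord's.
class FakeConnection
  attr_reader :open

  def initialize
    @open = true
  end

  def disconnect!
    @open = false
  end
end

class ConnectionRegistry
  def initialize
    @lock = Mutex.new
    @by_thread = {} # Thread object => FakeConnection
  end

  # Hand the calling thread its own connection, creating it on first use.
  def connection
    @lock.synchronize { @by_thread[Thread.current] ||= FakeConnection.new }
  end

  # Close and forget every connection whose owning thread has finished.
  # Returns how many connections were closed.
  def reap_dead_threads!
    @lock.synchronize do
      dead = @by_thread.keys.reject { |t| t.alive? }
      dead.each { |t| @by_thread.delete(t).disconnect! }
      dead.size
    end
  end

  def size
    @lock.synchronize { @by_thread.size }
  end
end

registry = ConnectionRegistry.new
registry.connection # the main thread's own connection
workers = 3.times.map { Thread.new { registry.connection } }
workers.each { |t| t.join }

puts "before reaping: #{registry.size}"
puts "reaped: #{registry.reap_dead_threads!}"
puts "after reaping: #{registry.size}"
```

The connection belonging to the still-running main thread survives the cleanup, while the ones left behind by the joined worker threads are closed and dropped.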
But then I ran into an AR/Rails bug of some kind. Horrors! Fortunately, I found it documented on the Rails ticketing system, with an easy fix.
So I took the diff patch from that ticket, and turned it into a (step 3) monkey patch to ActiveRecord::Base to fix this bug that exhibits only under concurrency conditions in Rails. environment.rb might be the best place for it, but for my own reasons I didn’t want to modify environment.rb in this way, so I stuck this code at the end of application_controller.rb instead, where it did the trick. Here’s the code to put in one of those places, to redefine the clear_reloadable_connections! method, to fix the bug I ran into, detailed in this ticket:
module ActiveRecord
  class Base
    class << self
      def clear_reloadable_connections!
        if @@allow_concurrency
          # Hash keyed by thread_id in @@active_connections. Hash of hashes.
          @@active_connections.each do |thread_id, conns|
            conns.each do |name, conn|
              if conn.requires_reloading?
                conn.disconnect!
                @@active_connections[thread_id].delete(name)
              end
            end
          end
        else
          # Just one level hash, no concurrency.
          @@active_connections.each do |name, conn|
            if conn.requires_reloading?
              conn.disconnect!
              @@active_connections.delete(name)
            end
          end
        end
      end
    end
  end
end
And with those three things, using threads to have concurrent tasks within a given request-response loop appears to work. Although according to the community, Backgroundrb is the ‘preferred’ solution to even this I believe–but to me, it’s not really at all outside of the design philosophy of Rails to do it this way, and appears to me to work fine if you do it as outlined above.
Concurrency that goes beyond a request-response
Okay, but let’s say you want a thread that keeps going beyond the request-response loop. A long running task which takes so long, you don’t want to wait for it to complete before returning a response. You want to return a response, and then have the browser ‘poll’ at some point on status (AJAX, meta refresh, user-initiated click, there are a bunch of possible ways to do this) to get the finished ‘results’ at a later time. Again, the Rails community recommended way to do this is Backgroundrb, and again I’m (perhaps irrationally) scared of Backgroundrb, so I decided to see if I could do it with Threads in Rails. Start a Thread, do NOT wait for it to finish (with join) before returning the response. Leave the Thread running in the background: the thread adds things to the database as ‘results’, and the ‘poll’ can check the db for results to display. (All communication between thread and everything else is through db, to avoid other headaches.)
And I found this worked, with one significant inconvenience. Rails in development mode, to allow ‘rapid development’ with automatic reloading of changed classes and such, forces an unloading of all class definitions at the end of every request-response loop. Ordinarily, they’ll then be magically lazily reloaded next request without you even noticing. This works fine assuming that no code is executing in between requests. But once I had a Thread that I left running after the response was returned, Rails went and unloaded the class definitions while there were still live instances of those classes! Which caused them to complain with an exception about their class definitions going missing.
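A rough sketch of that start-then-poll flow in plain Ruby follows. ResultStore is a hypothetical in-memory stand-in for the database table (in the real app this would be ActiveRecord models), and start_job and poll are made-up names playing the roles of the two controller actions.

```ruby
# Hypothetical in-memory stand-in for the database table the thread writes to.
class ResultStore
  def initialize
    @lock = Mutex.new
    @rows = {}
  end

  def save(job_id, row)
    @lock.synchronize { @rows[job_id] = row }
  end

  def find(job_id)
    @lock.synchronize { @rows[job_id] }
  end
end

STORE = ResultStore.new

# Plays the role of the action that kicks off the work. It does NOT join
# the thread; it returns right away so a response can be sent.
def start_job(job_id, store = STORE)
  store.save(job_id, { :status => "running" })
  Thread.new do
    sleep 0.1 # pretend this is the long running task
    store.save(job_id, { :status => "done", :result => "42 records" })
  end
  job_id
end

# Plays the role of the action the browser polls for status.
def poll(job_id, store = STORE)
  store.find(job_id)
end

id = start_job(1)
puts poll(id).inspect # status right after the job was started
sleep 0.05 until poll(id)[:status] == "done"
puts poll(id).inspect # status once the background thread has finished
```

The thread and the polling code never touch each other directly; they only read and write rows in the store, which is the same discipline as routing everything through the db.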
So I turned off this development mode reloading by putting this in config/environments/development.rb:
config.cache_classes = true
That seemed to work, but it’s a pain to turn off rapid reloading of classes in development. I need to restart my app more often when developing to see changes. Inconvenient, but it works.
This concurrency between requests might violate the Rails design philosophy a bit more though. Ideally, you should be able to stop and start Rails instances at any time, seamlessly. But here you can’t: if there are still threads running in the background, restarting the process will interrupt them, and that’s bad. (But isn’t this still a problem with Backgroundrb, just with the backgroundrb process instead of the Rails process? How does backgroundrb handle this?) So, given my lower confidence about the ‘railsness’ of this, and the whole cache_classes thing, if you need concurrency between requests, you might be better off using Backgroundrb after all. If you do, let me know how it goes. :) But my app right now is still using Threads, and setting cache_classes = true even in development, and it SEEMS to be working okay.
So that’s my report.