Threading in Rails

jrochkind | Practice, programming | August 28, 2007; November 14, 2011

14 Nov 2011. This is all very outdated, written for Rails 2.1. Please see my updated post on multi-threaded ActiveRecord access.

5 February 2009: I note that this post is still getting a lot of hits, so I’m posting this update. Beware, this information is OLD. The state of threading in Rails has changed quite a bit (for the better) in ActiveRecord 2.2. You can still use this post as some general background, and the techniques and warnings MAY still be relevant, but AR 2.2’s introduction of thread pooling, and Rails 2.2’s generally taking threading a bit more seriously, changes things. If anyone has some good links to suggest I include here, let me know, but I’m afraid I don’t have any off the top of my head.

Update 11 Feb 09: My lengthy further findings on concurrency in Rails, and why I no longer have confidence in the approach outlined here, can be found here.

So, I’m back from my vacation. Still working out a piece on proper conceptual modelling of authority work as a critique of FRAD. But it’s tricky. In the meantime, here’s a more technical piece on Ruby on Rails and threading (Well, FRAD is ‘technical’ too. So let’s say a piece about programming). For whatever reasons, I get non-trivial web traffic from web searches leading to my fairly useless ramblings about Rails, so here’s a more useful piece instead.

You can’t thread in Rails? Sure you can!

But you may or may not want to.

People often say “Rails doesn’t do threading/concurrency”. This isn’t really true. What they really mean is that Rails synchronizes access to the “request-response loop”: from when a request comes in to when the response is returned to the browser, only one of those request-response actions can be in progress at a time in a given Rails instance. If you want to handle more traffic, add instances. And this is true, and fine, so far as it goes. But there are still some cases where you really could use concurrency/threading in a Rails app, and adding more instances won’t help.
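To make the “synchronized request-response loop” idea concrete, here is a minimal sketch in plain Ruby. It is a toy model, not Rails source code: ToyDispatcher and its methods are hypothetical names, and the single Mutex is only assumed to resemble what Rails does internally.

```ruby
# Toy model, not Rails source: a single lock around request handling means
# requests run one at a time even when several threads deliver them.
class ToyDispatcher
  attr_reader :max_in_flight

  def initialize
    @guard = Mutex.new
    @in_flight = 0
    @max_in_flight = 0
  end

  # Handle one "request" while holding the lock.
  def dispatch(request)
    @guard.synchronize do
      @in_flight += 1
      @max_in_flight = [@max_in_flight, @in_flight].max
      sleep 0.05 # pretend to run the controller action
      @in_flight -= 1
      "response to #{request}"
    end
  end
end

dispatcher = ToyDispatcher.new
threads = 4.times.map { |i| Thread.new { dispatcher.dispatch("request #{i}") } }
threads.each(&:join)
puts dispatcher.max_in_flight # never more than one request inside the lock
```

Threads started inside the action itself are not subject to this lock, which is why the approaches below are possible at all.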

The standard Rails community answer to these cases seems to be to use BackgroundRB. I will confess that I find BackgroundRB intimidating. I am not encouraged that its website says it currently “is to be considered experimental, in-complete and in many respect untested.” I am not encouraged that such basic things as rake tasks for control have been broken for months (or at least that’s what the web page says). I did not like the idea of making installation even more difficult for this app, which I plan to distribute to others as easily as possible. And I was not encouraged that some colleagues told me they found it a bear to work with.

So I decided to buck the community conventional wisdom and try to use Ruby Threads in a Rails app. After all, perusal of what documentation I could find, discussion with colleagues, and help on the rails listserv led me to believe that officially this ought to work in ActiveRecord and in Rails. Contrary to popular belief, as long as you aren’t trying to do concurrent request-response loops, officially Rails says it should support concurrency using Ruby threads.

Of course, there’s ‘ought to’ and then there’s reality, and when doing something that most of the community doesn’t do, in a product at the maturity level of Rails, you are bound to run into difficulties. I did, but they were fairly minor and easily solved. A few left inconveniences that could not be gotten around, but there were no major problems. At least that’s how it looks to me now: my app is still in development, and is not live at large scale yet.

[update, May 2008: App has been live in production since January, at around 1.5 requests/second, and seems to be having no problems due to concurrency/threading at all. My mongrels do seem to occasionally unexpectedly die without leaving any error log. I don’t believe this is related to concurrency, especially since Mongrel theoretically is okay with threads, but who knows. It seems to be something that happens to other people’s mongrels too. I just run a cluster big enough to handle some drop-outs, and periodically have a script restarting any dropped out mongrels. Annoying in its lameness, but not a big deal.]

But I didn’t find instructions on how to do this successfully in any one place, so here’s me documenting what I’ve learned.

Concurrency within a request-response loop

The first type of concurrency we might be interested in is within a request-response loop. We might have several operations we want to perform in parallel while handling a given request. The example web developers run into most often is querying foreign web services. For instance, the app I was working on needed to query both a Google and a Yahoo API (and several others too) in handling a given request. It doesn’t really make sense to wait for the Google API to return before sending out the request to the Yahoo API. Response time to the user can be improved greatly by carrying these out concurrently, but all threads should complete before the response is returned to the browser. No threads will actually be left running between request-response loops. Because of this, this is in my opinion really quite in line with Rails design philosophy, and there’s little reason not to do it.

So you just create some threads. You let them do their thing. You make sure to join on all of them before returning from your action method. As with any concurrent programming, you are very careful with any data structures/objects that might be used by more than one thread, making sure those objects are written in a thread-safe manner. Since the Ruby and Rails documentation is fairly silent on this topic, I generally opted for extreme paranoid safety: I assumed that any Ruby or Rails class wasn’t thread-safe, and made one copy per thread instead of allowing threads to share.
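As a rough illustration of that pattern, here is a minimal sketch in plain Ruby. The helper fetch_results and the method search_all are hypothetical names, and the sleep merely stands in for a slow call to a foreign web service; none of this is taken from the actual app.

```ruby
# Stand-in for a slow call to a foreign web service (hypothetical helper;
# a real app would make an HTTP request here).
def fetch_results(service, query)
  sleep 0.2 # simulate network latency
  "#{service} results for #{query}"
end

# Query every service concurrently and wait for all of them before returning.
def search_all(services, query)
  threads = services.map do |service|
    # Hand each thread its own copy of the query so no object is shared.
    Thread.new(service, query.dup) do |svc, q|
      fetch_results(svc, q)
    end
  end
  # Thread#value joins the thread and returns the value of its block.
  threads.map(&:value)
end

started = Time.now
results = search_all(%w[google yahoo], "rails threading")
elapsed = Time.now - started
puts results
puts format("took %.2fs", elapsed)
```

Because the simulated calls overlap, the elapsed time should be close to the latency of one call rather than the sum of both. Thread#value is also a simple way to hand each thread’s result back to the main thread.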

So the only trick is with ActiveRecord. First you have to (step 1) tell ActiveRecord that it will be operating in a concurrent environment. Just put this in your environment.rb:

ActiveRecord::Base.allow_concurrency = true

Ah, but what this actually does is make it so every Thread gets its own database connection; threads won’t share connections. That’s how AR handles concurrency safety. The main potential problem here is that AR never actually closes any of these DB connections (at least not automatically), even once the Thread has stopped, even once the Thread object has gone out of scope. This is obviously a problem, and you’ll eventually use up all your available DB connections. One solution would be to create some kind of Thread pooling system. Overkill. Better yet, there’s a completely undocumented (argh!!) method that someone on the listserv alerted me to.

(step 2) After threads complete, call:

ActiveRecord::Base.verify_active_connections!()

This nice method does exactly what we want: it checks through all of AR’s db connections, and if any are associated with a Thread that no longer exists, it closes them. Since I’m waiting for all my threads to join anyway before returning from my action method, I just call verify_active_connections!() after they’ve all returned. Seems to work. Sadly, completely undocumented, but someone on the Rails listserv shared it with me, and now I share it again with the internet.

But then I ran into an AR/Rails bug of some kind. Horrors! Fortunately, I found it documented on the Rails ticketing system, with an easy fix.
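To picture what “one connection per thread, reaped after the thread dies” means, here is a toy model in plain Ruby. It is emphatically not ActiveRecord’s implementation: ToyConnectionRegistry, connection, reap_dead_threads! and open_count are all hypothetical names, and the “connections” are just strings.

```ruby
# Toy model only: NOT ActiveRecord's implementation. It mimics the behaviour
# described above: one connection per thread, reaped once the thread is dead.
class ToyConnectionRegistry
  def initialize
    @lock = Mutex.new
    @connections = {} # Thread => connection (here just a String)
  end

  # Return the calling thread's connection, opening one on first use.
  def connection
    @lock.synchronize do
      @connections[Thread.current] ||= "conn-#{@connections.size + 1}"
    end
  end

  # Drop connections whose owning thread has finished
  # (the role verify_active_connections! plays in the text).
  def reap_dead_threads!
    @lock.synchronize do
      @connections.delete_if { |thread, _conn| !thread.alive? }
    end
  end

  def open_count
    @lock.synchronize { @connections.size }
  end
end

registry = ToyConnectionRegistry.new
registry.connection # the main thread's connection
workers = 3.times.map { Thread.new { registry.connection } }
workers.each(&:join)

puts registry.open_count # dead threads' connections are still registered
registry.reap_dead_threads!
puts registry.open_count # only the main thread's connection remains
```

The point of the sketch is the ordering: join every thread first, then reap, so nothing still in use gets closed.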

So I took the diff patch from that ticket, and turned it into a (step 3) monkey patch to ActiveRecord::Base to fix this bug that exhibits only under concurrency conditions in Rails. environment.rb might be the best place for it, but for my own reasons I didn’t want to modify environment.rb in this way, so I stuck this code at the end of application_controller.rb instead, where it did the trick. Here’s the code to put in one of those places, to redefine the clear_reloadable_connections! method, to fix the bug I ran into, detailed in this ticket:

module ActiveRecord
  class Base
    class << self
      def clear_reloadable_connections!
        if @@allow_concurrency
          # Hash keyed by thread_id in @@active_connections. Hash of hashes.
          @@active_connections.each do |thread_id, conns|
            conns.each do |name, conn|
              if conn.requires_reloading?
                conn.disconnect!
                @@active_connections[thread_id].delete(name)
              end
            end
          end
        else
          # Just one level hash, no concurrency.
          @@active_connections.each do |name, conn|
            if conn.requires_reloading?
              conn.disconnect!
              @@active_connections.delete(name)
            end
          end
        end
      end
    end
  end
end

And with those three things, using threads to have concurrent tasks within a given request-response loop appears to work. According to the community, I believe Backgroundrb is the ‘preferred’ solution even for this. But to me, doing it this way is not really at all outside of the design philosophy of Rails, and it appears to work fine if you do it as outlined above.

Concurrency that goes beyond a request-response

Okay, but let’s say you want a thread that keeps going beyond the request-response loop: a long-running task which takes so long that you don’t want to wait for it to complete before returning a response. You want to return a response, and then have the browser ‘poll’ at some point on status (AJAX, meta refresh, user-initiated click, there are a bunch of possible ways to do this) to get the finished ‘results’ at a later time. Again, the Rails community’s recommended way to do this is Backgroundrb, and again I’m (perhaps irrationally) scared of Backgroundrb, so I decided to see if I could do it with Threads in Rails. Start a Thread, and do NOT wait for it to finish (with join) before returning the response. Leave the Thread running in the background: the thread adds things to the database as ‘results’, and the ‘poll’ can check the db for results to display. (All communication between the thread and everything else is through the db, to avoid other headaches.)
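The shape of that start-then-poll arrangement can be sketched in plain Ruby. Everything here is a stand-in: ResultStore is a hypothetical in-memory substitute for the database table, and start_job and poll are hypothetical names for the two controller actions.

```ruby
# Toy stand-in for the database table the thread writes results to
# (hypothetical; the approach in the text uses real database records).
class ResultStore
  def initialize
    @lock = Mutex.new
    @rows = {}
  end

  def write(job_id, value)
    @lock.synchronize { @rows[job_id] = value }
  end

  def read(job_id)
    @lock.synchronize { @rows[job_id] }
  end
end

# "Action" that starts the work and returns immediately, without joining.
def start_job(store, job_id)
  Thread.new do
    sleep 0.2 # the long-running task
    store.write(job_id, "finished")
  end
  "accepted" # the response goes back to the browser straight away
end

# "Poll action": reports whatever the store holds right now.
def poll(store, job_id)
  store.read(job_id) || "pending"
end

store = ResultStore.new
puts start_job(store, 1)
puts poll(store, 1) # most likely still pending at this point
sleep 0.5
puts poll(store, 1) # the thread has had time to write its result
```

Keeping all communication in the store means the poll never touches the thread object itself, which matches the “everything goes through the db” rule above.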

And I found this worked, with one significant inconvenience. Rails in development mode, to allow ‘rapid development’ with automatic reloading of changed classes and such, forces an unloading of all class definitions at the end of every request-response loop. Ordinarily, they’ll then be magically lazily reloaded on the next request without you even noticing. This works fine assuming that no code is executing in between requests. But once I had a Thread that I left running after the response was returned, Rails went and unloaded the class definitions while there were still live instances of those classes! Which caused them to complain with an exception about their class definitions going missing.
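A rough way to reproduce this kind of failure outside Rails is to remove a class constant while a thread still needs to look it up. This is only an approximation under stated assumptions: ReportJob and lookup_after_unload are hypothetical names, and remove_const is used here as a stand-in for whatever development-mode unloading actually does.

```ruby
require "thread" # Queue; a no-op on modern Rubies where it is built in

# Runs a thread that looks up a class by name after the class was "unloaded".
def lookup_after_unload
  # Define a throwaway class (hypothetical name) the way app code would.
  Object.const_set(:ReportJob, Class.new { def self.finish; "done"; end })

  gate = Queue.new
  worker = Thread.new do
    gate.pop # wait until the "request" has ended
    begin
      Object.const_get(:ReportJob).finish
    rescue NameError => e
      "failed: #{e.class}"
    end
  end

  # Stand-in for end-of-request unloading in development mode.
  Object.send(:remove_const, :ReportJob)
  gate.push(:go)
  worker.value
end

puts lookup_after_unload
```

The thread itself is fine; what breaks is any later lookup of a definition that has been taken away underneath it.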

So I turned off this development mode reloading by putting this in config/environments/development.rb:

config.cache_classes = true

That seemed to work, but it’s a pain to turn off rapid reloading of classes in development. I need to restart my app more often when developing to see changes. Inconvenient, but it works.

This concurrency between requests might violate the Rails design philosophy a bit more though. Ideally, you should be able to stop and start Rails instances at any time, seamlessly. But here you can’t: if there are still threads running in the background, restarting the process will interrupt them, and that’s bad. (But isn’t this still a problem with Backgroundrb, just with the backgroundrb process instead of the Rails process? How does backgroundrb handle this?) So, given less confidence about the ‘railsness’ of this, and the whole cache_classes thing, if you need concurrency between requests, you might be better off using Backgroundrb after all. If you do, let me know how it goes. :) But my app right now is still using Threads, with cache_classes = true even in development, and it SEEMS to be working okay.

So that’s my report.


23 thoughts on “Threading in Rails”

jrochkind says:

September 4, 2007 at 4:12 pm

I forgot one more interesting piece of data.

Once I did this, my console was littered with weird “RangeError” report output. I haven’t been able to figure out what this means, but it doesn’t seem to cause any problem other than polluting my console output. It does not interrupt any execution anywhere.
phi says:

September 11, 2007 at 4:49 pm

Thanks Jonathan.

I’m just into the design of a rails app that will have concurrency beyond the r-r cycle. And your experience is very welcome.

I was thinking of running a separate process (launched manually at the same time as the server) to do all the background tasks (putting the results in the DB). But then I thunk, why have an additional process while there’s already one running. In my app, the server will not have a high page hit frequency.

I’m a bit like you, I’ll prefer going with ruby’s threads instead of Backgroundrb.

So, thanx again, I’ll let you know if I have other issues using threads.
jonathan rochkind says:

September 11, 2007 at 8:08 pm

Cool. If you WERE going to ‘run a separate process launched at the time of server launch’ to run your background stuff, then you should probably definitely use backgroundrb. Since that’s EXACTLY what backgroundrb is.

Alternately, there’s of course the rails crontab integration design, built into rails.

I think often there is a better solution than what I’ve done. But despite its problems, I’m still happiest with my solution in my particular circumstances.
jrochkind says:

September 19, 2007 at 10:40 pm

I really should add two more things I forgot:

Use Mongrel, not WEBrick. WEBrick started seizing up when I did what I describe above, but Mongrel had no problems. [For the newbies: if you gem install mongrel, then $RAILS_ROOT/script/server will launch with Mongrel by default from now on.]

Forget SQLite too. Well, apparently you just need to make sure to compile SQLite the right way, but the one that my package manager supplied by default apparently wasn’t compiled the right way, and I said forget it, just went right to MySQL, and never looked back.
Tom says:

September 27, 2007 at 2:28 pm

Because of the inherent problems with threading in Rails, I have found that forking seems to be a more reliable way to get this behavior. I released a very simple plugin, so check it out on my blog:

http://hearmesqueak.blogspot.com/2007/09/spawn-background-processing-in-rails.html
phi says:

October 12, 2007 at 11:24 am

I reached beta stage on my project. Everything seems to be ok for now.

I still use Webrick now. I didn’t have problems with it, but maybe they’ll appear in production.

For the DB I also stick with MySQL.

I didn’t check, but is fork supported on Windows? Because my client has a production machine under Windows, so I may not have alternatives. Note that ruby green threads seem to work for now (for me ;-)
Tom says:

October 16, 2007 at 3:26 am

FYI, version 0.3 of the spawn plugin now does threading in addition to forking.

http://rubyforge.org/forum/forum.php?forum_id=18237
Michael Koziarski says:

October 25, 2007 at 4:11 am

I realise I’m late to the party, but thanks for the interesting article.

I’ve applied the dev-mode threading patch you mentioned here, so hopefully you and the spawn plugin can avoid some of the monkeypatching.
Craig Caraway says:

April 4, 2008 at 10:29 pm

Jrochkind – It was awesomely kind of you to document your experience. I believe you have probably saved me and others who must implement under windows countless hours. I am going to try some of the solutions you specified and will respond to my findings.

Thanks for your contribution!
José Valim says:

May 22, 2008 at 5:57 am

Hi, I was just wondering: if your model that searches the Google API and Yahoo API doesn’t touch ActiveRecord at all, would you still have to make those changes?

What I have in mind is to set ActionMailer to send e-mails in threads (or forks) so the client wouldn’t have to wait for it to finish (and I will probably release it as a plugin).

What do You think?

And thanks! Your tutorial and spawn plugin are helping a lot in that sense! =)
jrochkind says:

May 22, 2008 at 11:38 am

Hey José, glad this essay has been of use to you.

I’m not sure what you’re asking. For what reason isn’t your search within the Google API and Yahoo API touching your ActiveRecord at all? Are you doing the searches in ruby code, or in client-side javascript? (Google recently opened up their API to server-side REST requests, so you can do it straight from the server now, hooray.) You’d like to do those searches server-side but in separate concurrent threads? Yes, you can use the method above to save any outcome of the Google or Yahoo search to your ActiveRecords in separate threads, no problem. The primary thread that is returning the HTTP response may need to reload its ActiveRecord objects to see the changes. See the ActiveRecord reset and refresh methods for uncaching to-many relationships, if that’s what you need. Not sure what your problem is.

I’m not familiar with ActionMailer at all, not sure if it’s advertised as being concurrency-safe in what ways, etc. Sorry.
José Valim says:

May 22, 2008 at 4:09 pm

Thanks for the reply! =)

Reformulating the problem: let’s suppose I do something like this in the middle of an action:

tid = Thread.new do
# some code
end

But that “some code” doesn’t touch the ActiveRecord at all.

So do You think that I have to set allow_concurrency to true in ActiveRecord?

I will do some tests, but not having to worry about verify_active_connections! would make things very straightforward! =)
jrochkind says:

May 22, 2008 at 5:28 pm

Hi José. No, if that code doesn’t touch ActiveRecord at all, I don’t see any need to set allow_concurrency.
Pingback: skwpspace – Long running Threads in Rails and metaprogramming fun
junebug says:

June 12, 2008 at 8:08 pm

i tried the approach above, but it fails to work for me. my rails app does fire off the threads in parallel, but the threads seem to share the same activerecord connection so they end up waiting and firing off db queries serially (which is exactly what i’m trying to parallelize!)

how do you work around this? i have the concurrency variable set to true and i’m using mongrel…

my app is fairly simple. i’m running 10 queries. i want to run them in parallel using threads and join the results for display. thanks in advance for any help you can provide
Jonathan Rochkind says:

June 12, 2008 at 8:40 pm

What database are you using? Is the database itself capable of handling concurrent connections?

I’m not really expert in this topic, but I’m not sure you can get an actual speed-up by doing what you’re doing. You only have so much CPU and disk I/O available. I’m not sure that, even if you had a DB that could handle concurrent queries, the total time to completion of all the queries run in parallel would be any less than the total time of them all run serially, potentially barring multiple CPUs and a database that knows how to use them. But this is not my area of expertise, sorry.
junebug says:

June 13, 2008 at 12:43 pm

im using mysql. i think mysql can handle queries in parallel. i replaced my thread code to use the spawn plugin instead and that seems to work. it fires off the queries in parallel and mysql has no problem with it. the problem is i want to return results from the forked off child processes back to the parent process.

i cant think of a good way to do that. perhaps i could create a tmp table in sql for each child process to update and read it back via the parent process. or have each child process write their result to the local /tmp disk and have the parent process read it all back. both of these sound bad to me.

thats why i liked threads because it looked like there was an easy way to pass results from the threads back to the main thread. oh well…i’ll keep looking…thanks for the help though. i know i can probably easily do this with backgroundrb, but i dont want to deal with maintaining/monitoring another server.
Pingback: More threading in Rails « Bibliographic Wilderness
Myron says:

July 31, 2008 at 12:31 am

I’m working on using some threading for a long-running process as well, so thanks for these tips. There’s one thing I haven’t been able to figure out yet (that I’m hoping you have!): when an exception occurs in my background thread, how do I pass it to rails’ exception handling mechanism? I’d like for it to get logged to my log file and emailed to me using the exception notification plugin.
jrochkind says:

July 31, 2008 at 1:18 am

I’m not sure about the exception notification plugin. But you can catch the exception yourself (wrap your whole thread body in begin/rescue), and then you can write it to the log file using the rails logger, which is accessible from a constant, something like this:

RAILS_DEFAULT_LOGGER.error("Fatal exception: " + exception.message)
# To include the backtrace too... this includes the entire backtrace; I'm not sure how to duplicate Rails' behavior of only including 'interesting' parts of the backtrace...
RAILS_DEFAULT_LOGGER.error(exception.backtrace.join("\n"))

Not sure if the exception notification plugin watches the rails logger, maybe this is enough to get the exception mailed to you, not familiar with that plugin. For more information on the logger, check out the rdoc:

http://ruby-doc.org/core/classes/Logger.html

Except Rails does some weird things with the logger that aren’t covered in any good documentation I’ve found. But you can ask about the logger in the ruby forums, there’s nothing really special about threads here actually.
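A slightly fuller sketch of the begin/rescue idea, written against Ruby’s standard Logger rather than the Rails logger so it can run on its own. The helper name logged_thread is hypothetical, and the StringIO target is only there to make the output easy to inspect.

```ruby
require "logger"
require "stringio"

# Runs the block in a thread; any exception is logged rather than lost.
# `logger` stands in for the Rails logger (RAILS_DEFAULT_LOGGER above).
def logged_thread(logger, &work)
  Thread.new do
    begin
      work.call
    rescue StandardError => e
      logger.error("Fatal exception: #{e.message}")
      logger.error(e.backtrace.join("\n")) if e.backtrace
      nil # the thread's value when the work failed
    end
  end
end

log_output = StringIO.new
logger = Logger.new(log_output)
logged_thread(logger) { raise "boom" }.join
puts log_output.string
```

Rescuing inside the thread body matters because an exception raised in a background thread is otherwise only seen by whoever later joins that thread, if anyone does.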
Pingback: Rails threading nightmare « Bibliographic Wilderness
Pingback: Multi-threaded use of Rails ActiveRecord 3.0-3.1 | Bibliographic Wilderness
Pingback: ActiveRecord Concurrency Currently: Good News and Bad | Bibliographic Wilderness