I have a rails app (ruby 1.8) that makes database connections using the ‘mysql’ adapter. (It’s an old app written before ‘mysql2’ even existed, currently running on ruby 1.8.6 where ‘mysql2’ isn’t supported, on rails2).
I am not sure if ‘mysql’ supports a ‘timeout’ param in connection setup at all. But the default Rails generator way back then didn’t put one in, and I don’t have one set. (Now I need to go investigate if it supports it, I hope so — it looks to me like it does NOT, uh oh. GOT to get this app migrated to rails2/mysql2/ruby 1.9. Which is non-trivial work).
So last night, the server running a database this app connects to suffered a catastrophic failure: it ran out of RAM (and swap). The machine was still up, it just couldn’t do anything.
So the rails app waited… forever on its database connections. I had a rake task that ran at 4:30am last night and made a database connection to perform some nightly maintenance. The rake task emails me if it fails. I got the email… at 12:30pm, just now, telling me “Lost connection to MySQL server at ‘reading initial communication packet’”.
The intervening eight hours, it was waiting on the socket, presumably. (The machine was rebooted about an hour ago, not sure why it took an hour after that for the app to actually give up on the socket. Why it took us 8 hours to notice and reboot the machine is another story).
Now this was a nightly rake task, what happened to the actual app, which also connected to this same database? (This isn’t the app’s main database, but a subsidiary ‘enterprisey’ thing on another server). All those database connections also hung forever waiting, completely incapacitating the app. I didn’t wait for those to close on their own after bringing up the db server again, I manually restarted the app. (I guess it could have taken it an hour to notice like the rake task did? It was the eventual email from the rake task that made it perfectly clear what was going on).
Now, even if there had been a timeout, the app still wouldn’t have worked, true. But it would have returned an error quickly, instead of a browser timeout on any page trying to contact that database server. And when the database came back up, it would have just started working again without requiring a hard restart of the app instance (to kill all those eternally blocked socket connections).
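To make the difference concrete, here is a rough sketch of the "up but not answering" situation. It is not the app's database code: since I'm not sure what the ‘mysql’ adapter supports, it uses Net::HTTP against a local socket that accepts connections and then says nothing, as a stand-in for the wedged server. The names start_silent_server and fetch_or_give_up are made up for this illustration.

```ruby
require 'net/http'
require 'socket'
require 'timeout'

# Stand-in for a server that is up but wedged: the OS completes the TCP
# handshake, but nothing ever reads the request or sends a reply.
def start_silent_server
  TCPServer.new('127.0.0.1', 0)  # port 0 asks the OS for any free port
end

# Returns the response body, or :timed_out if no answer arrives in time.
def fetch_or_give_up(host, port, seconds)
  http = Net::HTTP.new(host, port, nil)  # nil: no proxy, connect directly
  http.open_timeout = seconds            # limit on opening the connection
  http.read_timeout = seconds            # limit on waiting for each read
  http.get('/').body
rescue Timeout::Error
  # Newer rubies raise Net::ReadTimeout here, which is a Timeout::Error.
  :timed_out
end

server = start_silent_server
started = Time.now
result = fetch_or_give_up('127.0.0.1', server.addr[1], 0.5)
puts "#{result.inspect} after #{Time.now - started} seconds"
server.close
```

Without the two timeout lines, the read would have nothing to cut it short as long as the silent server stayed up. With them, the caller gets an error it can handle and can simply try again later. Newer rubies may retry a GET once before giving up, so the total wait can be roughly twice the timeout.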
This is a pretty disastrous failure mode.
Note that this applies not just to databases, but also to, for instance, your use of Net::HTTP. ruby 1.8 Net::HTTP has no default timeout (not sure if they gave it one in 1.9), and worse, ruby 1.8 open-uri has _no way_ to set a timeout (they did fix that in 1.9, I think). And if you’re using one of the many, many wrappers or alternatives for Net::HTTP… who knows. But it’s pretty darn important to make sure that any network call you make has a timeout set… which unfortunately means ruby 1.8 open-uri is fundamentally broken.
And since it looks like the ‘mysql’ (as opposed to ‘mysql2’) connector does not support a timeout either, it’s also pretty fundamentally broken.
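For Net::HTTP itself, a minimal sketch of what "set a timeout" looks like is below. It sets both open_timeout and read_timeout explicitly rather than relying on whatever defaults a given ruby version has. The helper names (http_with_timeouts, with_deadline) and the 5 and 10 second values are placeholders I made up, not recommendations. The second helper wraps the standard library's Timeout.timeout as a blunt fallback for calls that give you no timeout option of their own.

```ruby
require 'net/http'
require 'timeout'

# Hypothetical values; pick ones that suit the service being called.
OPEN_TIMEOUT_SECONDS = 5
READ_TIMEOUT_SECONDS = 10

# Build a Net::HTTP object with both timeouts set explicitly.
# Nothing goes over the network until a request is actually made.
def http_with_timeouts(host, port = 80)
  http = Net::HTTP.new(host, port)
  http.open_timeout = OPEN_TIMEOUT_SECONDS  # max seconds to open the connection
  http.read_timeout = READ_TIMEOUT_SECONDS  # max seconds to wait on each read
  http
end

# Blunt fallback for calls with no timeout option of their own:
# run the block, and raise Timeout::Error if it is not done in time.
def with_deadline(seconds)
  Timeout.timeout(seconds) { yield }
end
```

Usage would be something like http_with_timeouts('example.com').get('/'), or with_deadline(10) { ... } around an open-uri call. One caveat, as far as I know: on ruby 1.8 Timeout relies on the interpreter's own threads, so it cannot interrupt code that is blocked inside a C extension. That means with_deadline may well not rescue a hung ‘mysql’ connection.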