I work at a small non-profit research institute, on a Rails app that is a “digital collections” or “digital asset management” app. Basically it manages and provides access (public as well as internal) to lots of files and descriptions of those files, mostly images.
It’s currently deployed on some self-managed Amazon EC2 instances (one for web, one for bg workers, one on which postgres is installed, etc). It gets pretty low traffic in general web/ecommerce/Rails terms. The app is definitely not very optimized: we know it’s kind of a RAM hog, and we know it has many actions whose response time is undesirable. But it works “good enough” on its current infrastructure for current use, such that optimizing it hasn’t been the highest priority.
We are considering moving it from self-managed EC2 to heroku, largely because we don’t really have the capacity to manage the infrastructure we currently have, especially after some recent layoffs.
Our Rails app is currently served by passenger on an EC2 t2.medium (4GB of RAM).
I expected the performance characteristics after moving to heroku “standard” dynos to be about the same as on our current infrastructure, but I was surprised to see some degradation:
- Responses seem much slower to come back when deployed on heroku, mainly for our slowest actions. Quick actions are just as quick, but slower ones (or perhaps actions that involve more memory allocations?) are much slower.
- The application instances seem to take more RAM running on heroku dynos than they do on our EC2 (this one in particular mystifies me).
I am curious if anyone with more heroku experience has any insight into what’s going on here. I know how to do profiling and performance optimization (I’m more comfortable profiling CPU time with ruby-prof than profiling memory allocations with, say, derailed_benchmarks). But it’s difficult work, and I wasn’t expecting to have to do more of it as part of a migration to heroku, when performance characteristics were acceptable on our current infrastructure.
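(If it helps anyone offer concrete suggestions: the derailed_benchmarks workflow I’d reach for looks roughly like this. The path here is just a placeholder.)

```
# Gemfile: gem "derailed_benchmarks", group: :development

# Memory used just requiring the app's gems:
bundle exec derailed bundle:mem

# Boot the app, hit one heavy action, and report object allocations:
PATH_TO_HIT=/some/heavy/page bundle exec derailed exec perf:objects
```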
## Response Times (CPU)
Again, yep, I know these are fairly slow response times. But they are “good enough” on current infrastructure (EC2 t2.medium); I wasn’t expecting them to get worse on heroku (standard-1x dyno, backed by heroku pg standard-0).
Fast pages are about the same, but slow pages (that create a lot of objects in memory?) are a lot slower.
This is not load testing; I am not testing under high traffic or concurrent requests. This is just accessing demo versions of the app manually, one page at a time, to see response times when the app is only handling one request at a time. So it’s not about how many web workers are running or fit into RAM or anything; one is sufficient.
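(Concretely, “manually one page at a time” means single requests like the following, repeated a few times to eyeball the spread; the URL is a stand-in.)

```
curl -s -o /dev/null -w "%{time_total}s\n" https://demo.example.org/some/slow/page
```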
| Action | Existing EC2 t2.medium | Heroku standard-1x dyno |
| --- | --- | --- |
| Slow reporting page that does a few very expensive SQL queries, but they do not return a lot of objects. Rails logging reports: Allocations: 8704 | ~3800ms | ~3200ms (faster pg?) |
| Fast page with a few AR/SQL queries returning just a few objects each, a few partials, etc. Rails logging reports: Allocations: 8205 | 81-120ms | ~120ms |
| A fairly small “item” page. Rails logging reports: Allocations: 40210 | ~200ms | ~300ms |
| A medium-size item page that loads a lot more AR models and has a larger byte-size page response. Allocations: 361292 | ~430ms | 600-700ms |
| One of our largest pages: fetches a lot of AR instances, does a lot of allocations, returns a very large page response. Allocations: 1983733 | 3000-4000ms | 5000-7000ms |
Fast-ish responses (and, from this limited sample, even slow responses with few allocations that are mostly waiting on IO?) are about the same. But our slowest/highest-allocating actions are ~50% slower on heroku? Again, I know these allocation counts and response times are not great even on our existing infrastructure; but why do they get so much worse on heroku? (No, there were no heroku memory errors or swapping happening.)
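(The Allocations figures quoted above are what Rails 6+ prints in its per-request log line. If you want the same count around an arbitrary chunk of code, a minimal sketch; `do_work` is a hypothetical stand-in for whatever you’re measuring:)

```ruby
# Count Ruby object allocations across a block of code; this is the same
# counter Rails' per-request "Allocations:" log entry is based on.
before = GC.stat(:total_allocated_objects)
do_work # hypothetical stand-in for the code under measurement
puts "Allocations: #{GC.stat(:total_allocated_objects) - before}"
```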
## RAM use of an app instance
We currently deploy with passenger (free), running 10 workers on our 4GB t2.medium.
To compare apples to apples, I deployed using passenger on a heroku standard-1x, with just one worker instance (because that’s actually all I can fit on a standard-1x!), to compare the size of a single worker from one infrastructure to the other.
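(For anyone reproducing this: passenger on heroku is just a Procfile entry along these lines; the `--max-pool-size` flag is what caps it at a single worker.)

```
web: bundle exec passenger start -p $PORT --max-pool-size 1
```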
On our legacy infrastructure, on a server that’s been up for 8 days of production traffic,
passenger-status looks something like this:
```
Requests in queue: 0
* PID: 18187  Sessions: 0  Processed: 1074398  Uptime: 8d 23h 32m 12s  CPU: 7%  Memory: 340M  Last used: 1s
* PID: 18206  Sessions: 0  Processed: 78200    Uptime: 8d 23h 32m 12s  CPU: 0%  Memory: 281M  Last used: 22s
* PID: 18225  Sessions: 0  Processed: 2951     Uptime: 8d 23h 32m 12s  CPU: 0%  Memory: 197M  Last used: 8m 8
* PID: 18244  Sessions: 0  Processed: 258      Uptime: 8d 23h 32m 11s  CPU: 0%  Memory: 161M  Last used: 1h 2
* PID: 18261  Sessions: 0  Processed: 127      Uptime: 8d 23h 32m 11s  CPU: 0%  Memory: 158M  Last used: 1h 2
* PID: 18278  Sessions: 0  Processed: 105      Uptime: 8d 23h 32m 11s  CPU: 0%  Memory: 169M  Last used: 3h 2
* PID: 18295  Sessions: 0  Processed: 96       Uptime: 8d 23h 32m 11s  CPU: 0%  Memory: 163M  Last used: 3h 2
* PID: 18312  Sessions: 0  Processed: 91       Uptime: 8d 23h 32m 11s  CPU: 0%  Memory: 169M  Last used: 13h
* PID: 18329  Sessions: 0  Processed: 92       Uptime: 8d 23h 32m 11s  CPU: 0%  Memory: 163M  Last used: 13h
* PID: 18346  Sessions: 0  Processed: 80       Uptime: 8d 23h 32m 11s  CPU: 0%  Memory: 162M  Last used: 13h
```
We can see that, yeah, this app is low-traffic; most of those workers don’t see a lot of use. The first worker, which has handled by far the most traffic, has a private RSS of 340M. (The other workers, having handled fewer requests, are much slimmer.) Kind of overweight, and I’m not sure where all that RAM is going, but it is what it is. If these sizes were the same on heroku, I could maybe hope to barely fit 3 workers on a heroku standard-2x (1024MB) dyno.
And that is after a week of production use. If I restart passenger on a staging server and manually access some of my largest, hungriest, most-allocating pages a few times, I see private RSS use of only around 270MB.
However, on the heroku standard-1x, with one passenger worker, I used the heroku log-runtime-metrics feature to look at memory. (Its memory_rss is, I believe, what should correspond to passenger’s report, and it’s what heroku uses for memory-capacity limiting.) Immediately after restarting my app, it’s at:

`sample#memory_total=184.57MB sample#memory_rss=126.04MB`

After manually accessing a few of my “hungriest” actions, I see:

`sample#memory_total=511.36MB sample#memory_rss=453.24MB`

That’s just a few manual requests, not a week of production traffic, and already 33% more RAM than on my legacy EC2 infrastructure after a week of production traffic. A single worker is actually approaching the limit of what fits in a standard-1x (512MB) dyno.
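(log-runtime-metrics is a heroku labs feature; for anyone following along, enabling it looks like this, with my-app as a placeholder app name.)

```
heroku labs:enable log-runtime-metrics -a my-app
heroku restart -a my-app
```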
Now, is heroku measuring memory differently than passenger-status does? Possibly. It would be nice to compare apples to apples, and passenger hypothetically has a service that would let you access passenger-status results from heroku… but unfortunately I have been unable to get it to work. (Ideas welcome.)
## Other variations tried on heroku
Trying the gaffneyc/jemalloc heroku buildpack with `heroku config:set JEMALLOC_ENABLED=true` (still with passenger, one worker instance) doesn’t seem to have made any significant difference: maybe 5% RAM savings, or maybe that’s just a fluke.
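(Setup, for reference, is roughly what the buildpack’s README describes; my-app is a placeholder, and the buildpack takes effect on the next deploy.)

```
heroku buildpacks:add --index 1 https://github.com/gaffneyc/heroku-buildpack-jemalloc.git -a my-app
heroku config:set JEMALLOC_ENABLED=true -a my-app
```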
Switching to puma (puma 5 with the experimental, possibly memory-saving features turned on; just one worker with one thread) doesn’t make any difference in response-time performance (none expected), but… maybe it does reduce RAM usage somehow? After a few sample requests to some of my hungriest pages, I see:

`sample#memory_total=428.11MB sample#memory_rss=371.88MB`

Still more than my baseline, but not drastically so. (With or without the jemalloc buildpack seems to make no difference.) Odd.
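(By “experimental possibly memory-saving features” I mean roughly this config; these are puma 5’s option names, and note nakayoshi_fork was removed again in puma 6.)

```ruby
# config/puma.rb -- one worker, one thread, to mirror the single passenger worker
workers 1
threads 1, 1
preload_app!

# puma 5's experimental options that may reduce memory use
nakayoshi_fork              # GC (and heap compaction) before forking workers
wait_for_less_busy_worker   # busy workers briefly hold off accepting new requests
```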
## So what should I conclude?
I know this app could use a fitness regime; but it performs acceptably on current infrastructure.
We are exploring heroku because of staffing-capacity issues, hoping not to have to do so much ops. But if we trade ops for having to spend a lot of time on challenging performance optimization (not really suitable for a junior dev)… that’s not what we were hoping for!
But perhaps I don’t know what I’m doing, and this haphazard anecdotal comparison is not actually data, and I shouldn’t conclude much from it? Let me know, ideally with advice on how to do it better.
Or… are there reasons to expect different performance characteristics from heroku? Might it be running on underlying AWS infrastructure with fewer resources than my t2.medium?
Or, starting to make guess hypotheses: maybe the fact that heroku’s standard tier does not run on “dedicated” compute resources means I should expect a lot more variance compared to my own t2.medium, and as a result, when deploying on heroku, you need to optimize more (so the worst case of the variance isn’t so bad) than when running on your own EC2? Maybe that’s just part of what you get with heroku: unless you pay for performance dynos, it is even more important to have a well-performing app? (Yeah, I know I could use more caching, but that of course brings its own complexities, and I wasn’t expecting to have to add it as part of a heroku migration.)
Or… I find it odd that the slower (or more-allocating?) actions seem to be the ones that get worse. Is there any reason memory allocation would be even more expensive on a heroku standard dyno than on my own EC2 t2.medium?
And why would the app workers seem to use so much more RAM on heroku than on my own EC2 anyway?
Any feedback or ideas welcome!