interesting fact: googlebot is pretty good at using nearly all your CPU but no more

So on a new server we set up for our webapp, Googlebot was hitting it pretty hard, lots of traffic, and the server was consistently running at about 80% CPU. Both actual Googlebot and a local Google Search Appliance (run by another department; it probably shouldn't be trying to scrape our entire library catalog, but that's another story).

We added another CPU core to the VM (not really just because of googlebot, for other reasons).

Interestingly, after doing so, Googlebot seems to have roughly doubled its request volume, and the CPUs are still running at about 80% utilization.

Which of course is pretty much fine. The interesting thing is that Googlebot is smart enough to figure this out solely, I'm guessing, from response times: response times are getting slower, better slow down; response times are getting faster, it can make more requests. They of course don't have access to my actual CPU load figures. They have some algorithm that pretty cleverly manages to put as-much-as-you-can-stand-but-no-more load on you, adjusting solely based on response times and whatever other data is available to them. Which is pretty clever.
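Google hasn't published how this works, but the behavior described above is consistent with a classic additive-increase/multiplicative-decrease (AIMD) controller keyed on response time. Purely as an illustration (this is my sketch, not Google's actual algorithm, and the thresholds and rates are made-up), it might look something like:

```python
# Illustrative sketch only: Google's real crawl-rate algorithm is not public.
# AIMD (additive-increase / multiplicative-decrease) throttle keyed on
# observed response times -- the only signal a remote crawler really has.

class CrawlThrottle:
    def __init__(self, baseline_latency, min_rate=0.1, max_rate=100.0):
        self.baseline = baseline_latency  # seconds considered "healthy"
        self.rate = min_rate              # current requests per second
        self.min_rate = min_rate
        self.max_rate = max_rate

    def observe(self, response_time):
        """Adjust the request rate after each response."""
        if response_time > 2 * self.baseline:
            # Server looks stressed: back off sharply (multiplicative decrease).
            self.rate = max(self.min_rate, self.rate * 0.5)
        else:
            # Server is keeping up: probe a little faster (additive increase).
            self.rate = min(self.max_rate, self.rate + 0.1)
        return self.rate
```

The nice property of AIMD (it's the same idea TCP congestion control uses) is that it converges on roughly the maximum load the server can sustain, then backs off quickly when things slow down, which matches what I observed when the extra core was added.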

Of course, it's not always perfect. Prior to setting up our new server for the web app, the web app was sharing a server with some other processes, including a big indexer. Googlebot's traffic managed to push the live web app's CPU usage high enough that it was starving out the indexer and causing problems. Which is one more reason that separating your components/tiers onto different servers/VMs is just plain a good idea.

(There is a way to tell Googlebot to throttle its requests to a certain limit, if you log into Google Site Admin. This is what I was doing before our new server was ready. (No idea if the Google Search Appliance admin interface can do the same for GSA.) The settings only last a few months before you need to reset them. They do tell you there: "Google figures out for itself how fast to make requests to your server without causing you problems and is usually right; only set this if you need to." I didn't really believe them until I saw what appears to be exactly that on the new server!)
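The other throttling mechanism people mention is the `Crawl-delay` directive in robots.txt (see the comments below). Whether Googlebot itself honors it is disputed, though some other crawlers do document support for it. For reference, the directive looks like this (the 3-second value is just an example):

```
# robots.txt at the site root
User-agent: *
Crawl-delay: 3
```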


3 Responses to interesting fact: googlebot is pretty good at using nearly all your CPU but no more

  1. You can also slow down Googlebot by adding “Crawl-delay: 3.0” (1 request per 3 seconds) to your robots.txt file.

  2. jrochkind says:

    Wow, thanks Ryan, that’s great to know and I had not found it looking around. Is that documented anywhere you know of?

  3. jrochkind says:

    Huh, googling around I get mixed info on whether googlebot actually supports crawl-delay in robots.txt (can’t find a mention of it on actual google docs, but that doesn’t mean much; some people on the web think it does not support it); do you have experience that it does?
