So HackerNews is currently down. It happens; we’ll probably find out why when it returns.
For a while yesterday, all HN URLs were returning error messages from CloudFlare, which appears to be their CDN. But today, all HN URLs are returning an apparently intentional outage message: “Sorry for the downtime. We hope to be back soon.”
But they are returning an HTTP 200 “OK” status code with this message. From all URLs.
This seems like a big mistake. You are telling any interested software (such as, say, Google), “All is well, and this is the proper content for the URL you requested.” For every single URL. Google might index this content. Google might decide that, since a bazillion URLs at your hostname all have the same content, relevancy/PageRank decisions should be made on that basis. That would probably hurt your visibility: why would a search engine give good visibility to a big website whose million URLs all say “Sorry for the downtime. We hope to be back soon”? Etc.
I don’t know whether Google in particular really does this; perhaps Google is smart enough to handle improper 200s on error pages, using heuristics to guess that it’s really a temporary error that should be ignored. I’m not sure how that would work, but Google is often clever, I dunno.
But it’s still a bad idea. Don’t return 200s for error pages. Use an appropriate response code for temporary outages. Google itself seems to suggest using 503 Service Unavailable, which makes a lot of sense. If you can’t do this for some reason, perhaps you could use a 307 Temporary Redirect to redirect to an outage message; you’re saying it’s a ‘temporary’ redirect, which indexers and the like shouldn’t treat as long-term content. (A 301 Moved Permanently, or a 404 Not Found, seems just as bad as a 200.)
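Here is a minimal sketch of the 503 approach, assuming a Python WSGI stack purely for illustration (this is not what HN or CloudFlare actually run). Every URL gets the outage page with a 503 status, plus a Retry-After header (an optional hint HTTP defines for 503 responses; the 3600-second value is an arbitrary assumption) and Cache-Control: no-store so caches don’t hang on to the outage page. The names maintenance_app, OUTAGE_BODY, RETRY_AFTER_SECONDS and simulate_request are hypothetical.

```python
from wsgiref.util import setup_testing_defaults

# Hypothetical outage page; the markup is illustrative only.
OUTAGE_BODY = b"<html><body>Sorry for the downtime. We hope to be back soon.</body></html>"

# Assumed retry hint in seconds; pick whatever matches the expected outage.
RETRY_AFTER_SECONDS = 3600


def maintenance_app(environ, start_response):
    """WSGI app that answers every URL with a 503 while the site is down."""
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(OUTAGE_BODY))),
        # Hint to clients and crawlers about when to come back.
        ("Retry-After", str(RETRY_AFTER_SECONDS)),
        # Ask caches not to store the outage page at all.
        ("Cache-Control", "no-store"),
    ]
    start_response("503 Service Unavailable", headers)
    return [OUTAGE_BODY]


def simulate_request(path):
    """Call the app in-process with a minimal WSGI environ for the given path."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(maintenance_app(environ, start_response))
    return captured["status"], captured["headers"], body


status, headers, body = simulate_request("/news")
print(status, headers["Retry-After"])  # prints "503 Service Unavailable 3600"
```

To try it against a real client, something like wsgiref.simple_server.make_server("127.0.0.1", 8000, maintenance_app).serve_forever() followed by curl -i should show a 503 for any path. The 307 alternative would instead return a Location header pointing at a single outage URL, and that outage page itself would still ideally be served with a 503.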
In HackerNews’s case, it may actually be CloudFlare returning the 200, through a misconfiguration or a poorly thought-out CloudFlare feature. Either way, it seems like a bad idea.
Use HTTP response codes responsibly, and software agents consuming your web pages will be happier! And some software agents, like Google, you really want to keep happy.
Actually, now that I look at the response headers, the cache-control headers seem unwise too. Am I wrong, or is the response (Cache-Control: max-age=315352364, Expires: Thu, 04 Jan 2024) telling agents they can cache the “Sorry for the downtime” message for 10 years? That doesn’t seem wise either, does it?
$ curl -i https://news.ycombinator.com
HTTP/1.1 200 OK
Server: cloudflare-nginx
Date: Mon, 06 Jan 2014 15:22:04 GMT
Content-Type: text/html
Transfer-Encoding: chunked
Connection: keep-alive
Set-Cookie: __cfduid=[omitted]; expires=Mon, 23-Dec-2019 23:50:00 GMT; path=/; domain=.ycombinator.com; HttpOnly
Last-Modified: Mon, 06 Jan 2014 13:14:48 GMT
Vary: Accept-Encoding
Expires: Thu, 04 Jan 2024 13:14:48 GMT
Cache-Control: max-age=315352364
Cache-Control: public
CF-RAY: [omitted]

<html>
  <head>
    <link rel="stylesheet" type="text/css" href="/news.css">
    <link rel="shortcut icon" href="/favicon.ico">
    <title>Hacker News</title>
  </head>
  <body>
    <center>
      <table border="0" cellpadding="0" cellspacing="0" width="85%" bgcolor="#f6f6ef">
        <tr>
          <td bgcolor="#ff6600">
            <table border="0" cellpadding="0" cellspacing="0" width="100%" style="padding:2px">
              <tr>
                <td style="width:18px;padding-right:4px">
                  <a href="http://ycombinator.com">
                    <img src="/y18.gif" width="18" height="18" style="border:1px #ffffff solid;" />
                  </a>
                </td>
                <td style="line-height:12pt; height:10px;">
                  <span class="pagetop">
                    <b><a href="/news">Hacker News</a></b>
                  </span>
                </td>
              </tr>
            </table>
          </td>
        </tr>
        <tr style="height:10px"></tr>
        <tr>
          <td>
            Sorry for the downtime. We hope to be back soon.
          </td>
        </tr>
        <tr>
          <td>
            <img src="s.gif" height="10" width="0" />
            <table width="100%" cellspacing="0" cellpadding="1">
              <tr>
                <td bgcolor="#ff6600"></td>
              </tr>
            </table>
            <br />
          </td>
        </tr>
      </table>
    </center>
  </body>
</html>
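A quick check of the 10-year arithmetic, using the standard library’s email.utils.parsedate_to_datetime to parse the HTTP dates copied from the curl output above. The max-age value turns out to be exactly the number of seconds between the Date and Expires headers, just under 10 years, so the two headers agree with each other. The helper name freshness_seconds is just for illustration.

```python
from email.utils import parsedate_to_datetime

# Header values copied from the curl output above.
DATE = "Mon, 06 Jan 2014 15:22:04 GMT"
EXPIRES = "Thu, 04 Jan 2024 13:14:48 GMT"
MAX_AGE = 315352364


def freshness_seconds(date_header, expires_header):
    """Seconds between the Date and Expires header values."""
    delta = parsedate_to_datetime(expires_header) - parsedate_to_datetime(date_header)
    return int(delta.total_seconds())


seconds = freshness_seconds(DATE, EXPIRES)
# Convert max-age to years using an average year of 365.25 days.
years = MAX_AGE / (365.25 * 24 * 3600)
print(seconds, seconds == MAX_AGE, round(years, 2))  # prints "315352364 True 9.99"
```

In HTTP/1.1, max-age takes precedence over Expires when both are present, but here it makes no difference: either way, a cache is told the outage page stays fresh for roughly a decade.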