amazon ‘reader’ interface

At work, on my desktop computer (Windows, Firefox), I still get the reader interface I have been used to seeing for the past year.


At home (OS X, Firefox), I get a really annoying JavaScript ‘lightbox’ interface instead. Among other problems, it gives me one of those infuriating inside-the-page scroll bars that extend off the edge of my browser window. It also lets me see much less of the book without being logged into Amazon; the first version of the reader lets me see more.


Then there is the Umlaut server, which tries to access the Amazon page to screen-scrape it and see whether a look-inside/search-inside function exists (Unix, manual HTTP request, although it sends headers pretending to be Firefox). Sometimes it gets the first version, and sometimes it gets the second. When it gets the second, JavaScript-heavy one, the HTML source doesn’t actually include enough information for its screen-scraping to work, because that content is apparently loaded by an AJAX call after the initial page load. (I’ve scanned the source of the Amazon page to try to figure out what’s going on, but it’s not simple.)
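For anyone trying to reproduce this, here is a minimal Python sketch of that kind of manual request: it sends a Firefox-like User-Agent and refuses to follow redirects, so you can see which response the server actually hands back. This is not Umlaut’s code. The User-Agent string, the function names, and the `NoRedirect` class are all made up for illustration.

```python
import urllib.error
import urllib.request

# Hypothetical Firefox-like User-Agent; any similar string would do for this sketch.
FIREFOX_UA = "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0"


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Redirect handler that declines to follow 3xx responses."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None makes urllib raise HTTPError for the 3xx response.
        return None


def build_request(url):
    """Build a request that presents itself as Firefox."""
    return urllib.request.Request(url, headers={"User-Agent": FIREFOX_UA})


def fetch_once(url, timeout=10):
    """Fetch url without following redirects.

    Returns (status, location, body); location is the Location header
    when the server answered with an error or redirect status.
    """
    opener = urllib.request.build_opener(NoRedirect)
    try:
        with opener.open(build_request(url), timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return resp.status, None, body
    except urllib.error.HTTPError as err:
        return err.code, err.headers.get("Location"), ""
```

Calling `fetch_once` on a reader URL a few times in a row and comparing the status and Location values is one way to check whether the server really is alternating between the two versions.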

What the heck is going on? Hey readers, when you look at a ‘look inside’ or ‘search inside’ view from Amazon, which do you get?

If I knew who to contact at Amazon, I might ask them about it. But I’m not sure whether they’d consider my Umlaut screen-scraping welcome, since it’s driving traffic to Amazon, or something they’d want to prevent, so I’d be hesitant to do so even if I knew who to talk to.

It seems to have something to do with whether the ‘reader’ URL has /gp/ in it or not. => The ‘good’ reader page when I can get it, but when I can’t, it’s because that URL is just issuing a redirect to: => The ‘bad’ too-much-javascripty reader page. Except, weirdly, if I try accessing this link manually from my work desktop, which lets me have access to the ‘good’ version, this URL to the ‘bad’ version just gives me an ordinary Amazon item page, with no reader JavaScript popup at all.
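To keep track of that /gp/ hunch while experimenting, a small helper like the following Python sketch could record, for each response, whether the requested URL and any redirect target contain a /gp/ path segment. The function names and the example.com URLs are hypothetical placeholders, not real Amazon reader URLs, and the sketch makes no assumption about which form is the ‘good’ one.

```python
from urllib.parse import urlsplit


def has_gp_segment(url):
    """True if the URL's path contains a /gp/ segment."""
    return "gp" in urlsplit(url).path.split("/")


def describe_response(url, status, location=None):
    """Summarize one response: was it a redirect, and where does /gp/ appear?

    status and location are whatever the HTTP client reported for url;
    location is the Location header, if any.
    """
    is_redirect = 300 <= status < 400 and location is not None
    return {
        "requested_has_gp": has_gp_segment(url),
        "redirected": is_redirect,
        "target_has_gp": has_gp_segment(location) if is_redirect else None,
    }


# Placeholder URLs only, to show the shape of the result.
print(describe_response("http://example.com/reader/X", 302, "/gp/reader/X"))
```

Logging these summaries from each machine would at least show whether the redirect pattern differs by client, which is the part that remains unexplained here.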

Confused. This is what you get when you try to screen scrape, alas.

Update: Okay, I can’t even tell you how I figured this out, because I don’t really know. It took a couple of hours of Googling anything I could find on the Amazon Online Reader and its URLs. I didn’t find any actual explanations, but I did find some interesting-looking URLs mentioned in four-year-old pages…

I think if you prefix the url like this:

You get the page I consider the ‘good’ reader, and it doesn’t insist on redirecting you to the ‘bad’ one. At least that’s what it looks like to me now; further experimentation is required to see how reliable this is. I still have no idea why the other URLs I started with give me one interface (the v3?) on some computers, but another (v4? v5?) on others. And there’s really no telling how long this “v3” one will stay around. I still wish I knew what was going on, but I think I can get the page I need to scrape with ‘sitbv3’.

