A spell-suggest service from Yahoo

Many of us want spell-suggest services in our search interfaces.

Spell suggest options

Google’s is exceptionally good these days, often seeming to read my mind. It is clearly based on fancy algorithms that have all of Google search history as one input. We probably aren’t going to get one that works as well as Google.

However, we need one that’s “good enough” — user tolerance for very poor spell suggest routines is pretty low. In the distant past, with our out-of-the-box OPAC, we at one point hacked in a spell-suggest based on aspell — it ended up working so poorly that we eventually decided to turn it off.

If you use Solr, there are various ways to implement spell-suggest in Solr, based on your actual corpus of data (rather than a fixed dictionary). At first, a spellcheck based on your actual corpus sounds like a great idea. However, results are mixed with a data set like a library catalog, or other library types of data. I have heard mixed reviews of people’s experience with Solr spellcheck on a local Solr they have complete control over, as well as mixed reviews of the Summon spellcheck feature, which I might hazard a guess is based on Solr spellcheck features, since Summon is clearly based on Solr.

For instance, if you search for ‘chicano’ and it suggests ‘chicago’ to you (or vice versa), that’s kind of undesirable.

One could spend a lot of time fine-tuning a local Solr spellcheck, at least for searches based on a local Solr. But I’d also like to offer spell-check for my article search function, which is based on a third-party API (which may or may not offer its own spell suggest that we’re happy with). And is there a third-party service that could work better at a reasonable price, compared to many hours spent trying to fine-tune local Solr?

Google used to offer a spell suggest API you could incorporate in your own apps. But they don’t any more. Yahoo used to offer a free one, then started charging for it. Bing used to offer a free one, then started charging for it.

So both Yahoo (via the “YBoss” suite) and Bing offer a for-fee spell suggest API. Yahoo’s pricing is a lot more reasonable than Bing’s. My back-of-the-envelope estimate is that my usage might be on the order of 100k requests per month. For the Yahoo service, that’s $10 a month. For Bing, it’s at least $200 a month, and it’s unclear whether you have to move up to the $500 a month level once you hit 100,001. (Yahoo charges a different rate for spell queries than for search queries; Bing does not. But even aside from this, there is at least an order of magnitude difference in pricing.)

YBoss Spell API

The YBoss spell suggest API seems to work pretty well.

It does not suggest ‘chicago’ for ‘chicano’ or vice versa. It generally seems to do the ‘right thing’. It leaves punctuation (or at least double quotes) in input alone, so if you give it input involving, say, a phrase quote — your results leave the phrase quote intact.

  • elisabeth cady stanton sufragets => elizabeth cady stanton suffragettes
  • winton marsallis => wynton marsalis
  • noam chompskey “manufacturing consent” => noam chomsky “manufacturing consent”
  • Samuel Delaney => # sadly, no correction here, but if you include his most popular work:
  • Samuel Delaney Dhalgren => Samuel Delany Dhalgren

Not bad. Overall, in testing it seems to perform quite decently. (I have not tested it on non-English input; I’m curious.)

Technically, there were a couple of implementation challenges, but I still managed to get a working demo implemented in about a day.

Auth is a bit tricky

For authentication, it requires a somewhat complex algorithm based on some OAuth 1.0 alternative. It’s not really enough just for their documentation to say “It’s OAuth, use an OAuth library” — there are too many different kinds of OAuth, and you still can’t figure out what you actually need to do.  (Man, OAuth, don’t get me started).

Fortunately, there was a very recently released ruby library for working with YBoss. (There were also some much older ones, but I wasn’t sure if they were being maintained, or if the YBoss API had changed in the past couple of years, so I went with the recent one.)

I wasn’t super happy with the API offered by the yboss gem, or with some choices it made around error handling (plus I like using HTTPClient for HTTP interactions these days)… so I just copied and pasted its code dealing with auth, adjusted it to have the API I wanted, and used my own code for the rest of the API interaction. That actually worked fine as far as auth was concerned. (YBoss auth seems similar to (the same as?) AWS REST API auth, for what it’s worth. AWS docs are a lot better though, heh.)
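For reference, generic OAuth 1.0 HMAC-SHA1 request signing (the two-legged kind, with only a consumer key and secret) can be sketched in plain Ruby roughly as below. This is an illustration of the standard scheme, not the yboss gem’s code, and not necessarily exactly what YBoss expects. The helper names `oauth_encode` and `signed_url` are made up for this sketch, and the parameter names are the ones visible in the working request URL shown later in this post.

```ruby
require "openssl"
require "securerandom"

# Percent-encode the way OAuth 1.0 expects: only A-Z a-z 0-9 - . _ ~ stay literal.
# (Hypothetical helper name, not part of any library.)
def oauth_encode(value)
  value.to_s.b.gsub(/[^A-Za-z0-9\-._~]/) { |byte| format("%%%02X", byte.ord) }
end

# Build a signed GET URL using only a consumer key and secret (no access token).
# nonce and timestamp are keyword arguments so a run can be reproduced.
def signed_url(base_url, params, consumer_key, consumer_secret,
               nonce: SecureRandom.hex(16), timestamp: Time.now.to_i)
  all = params.merge(
    "oauth_consumer_key"     => consumer_key,
    "oauth_nonce"            => nonce,
    "oauth_signature_method" => "HMAC-SHA1",
    "oauth_timestamp"        => timestamp.to_s,
    "oauth_version"          => "1.0"
  )

  # Normalized parameter string: encode keys and values, sort, join as k=v pairs.
  normalized = all.map { |k, v| [oauth_encode(k), oauth_encode(v)] }
                  .sort
                  .map { |k, v| "#{k}=#{v}" }
                  .join("&")

  # Signature base string: METHOD & encoded URL & encoded parameter string.
  base_string = ["GET", oauth_encode(base_url), oauth_encode(normalized)].join("&")

  # Signing key is "consumer_secret&token_secret"; the token secret is empty here.
  signing_key = "#{oauth_encode(consumer_secret)}&"
  digest = OpenSSL::HMAC.digest(OpenSSL::Digest.new("SHA1"), signing_key, base_string)
  signature = [digest].pack("m0") # strict Base64, no newlines

  "#{base_url}?#{normalized}&oauth_signature=#{oauth_encode(signature)}"
end
```

Note that whatever is passed as `q` gets percent-encoded once on its way into the URL, which matters for the escaping problem described in the next section.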

Weird double-escaping required

But then I still had trouble with punctuation in my queries, and escaping. Sending a query with an apostrophe, or other punctuation, would sometimes result in a 400 ‘bad request’ error from the YBoss API… and other times, as I tried various ways to do this and various kinds of escaping, would result in spell suggestion output that had HTML entities in it like “&#xx;”, which is very inconvenient for turning into a hyperlink for the user to accept the suggestion.

(It’s possible this was a bug in the OAuth code I copied from the yboss gem… but I don’t think so, I think it’s actually a bug (or incomplete/incorrect docs) in YBoss api. )

Turns out I needed to first URI-encode (percent-escape) all punctuation in this list, plus apostrophes (‘), which are not in that list. THEN pass it to the OAuth signing routine.

In ruby, I first do:

URI.escape(query, "/?&;:@,$=%\"#*<>{}|[]^\\`()'")

Then, I pass that escaped query to the OAuth preparation routine, which will escape it again (while also calculating the signature etc). Yes, this ends up with double-URI-escaped data. Once I do this, everything seems to work right. Yeah, it seems like a bug to me, but at this point they’d break my code if they fixed it. shrug.
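One caveat for anyone trying this today: `URI.escape` was deprecated for a long time and removed in Ruby 3.0, so the one-liner above only runs on older Rubies. A rough equivalent of that first escaping pass, using the same character list, might look like the sketch below. The names `YBOSS_PRE_ESCAPE_CHARS` and `yboss_pre_escape` are invented for illustration.

```ruby
# Same characters as the second argument to URI.escape above.
YBOSS_PRE_ESCAPE_CHARS = "/?&;:@,$=%\"#*<>{}|[]^\\`()'".freeze

# Regexp matching any single character from that list.
YBOSS_PRE_ESCAPE_RE = Regexp.union(YBOSS_PRE_ESCAPE_CHARS.chars)

# First pass only: percent-escape the listed punctuation, leave everything else
# (letters, spaces, non-ASCII characters) untouched for the OAuth step to handle.
def yboss_pre_escape(query)
  query.gsub(YBOSS_PRE_ESCAPE_RE) { |ch| format("%%%02X", ch.ord) }
end

puts yboss_pre_escape("women's rights") # prints: women%27s rights
```

The space is deliberately left alone here, since it is not in the list; it only becomes `%20` in the second pass.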

That is, for a user query for “women’s rights”, the actual literal GET url I send to the api looks like this (correct, working)

http://yboss.yahooapis.com/ysearch/spelling?format=json&oauth_consumer_key=somekey&oauth_nonce=something&oauth_signature=M7a%2BaO2MH6tDW2tBxHufthST3vM%3D&oauth_signature_method=HMAC-SHA1&oauth_timestamp=1362693737&oauth_version=1.0&q=women%2527s%20rights

Note the double escaping of the apostrophe as %2527 instead of just %27.

Not like this:

....&q=women%27s%20rights
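To make the difference concrete, here is a small sketch of what the second pass does to each input. It assumes the OAuth routine percent-encodes everything except unreserved characters, which is what standard OAuth 1.0 encoding does; `oauth_percent_encode` is a stand-in name, not the actual method in the yboss gem.

```ruby
# Stand-in for the escaping done inside the OAuth step:
# everything except A-Z a-z 0-9 - . _ ~ is percent-encoded, byte by byte.
def oauth_percent_encode(value)
  value.to_s.b.gsub(/[^A-Za-z0-9\-._~]/) { |byte| format("%%%02X", byte.ord) }
end

pre_escaped = "women%27s rights" # query after the first, manual escaping pass
raw         = "women's rights"   # query with no manual escaping

puts oauth_percent_encode(pre_escaped) # prints: women%2527s%20rights  (works)
puts oauth_percent_encode(raw)         # prints: women%27s%20rights    (does not)
```

The `%` from the first pass becomes `%25` in the second, which is where `%2527` comes from.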

(I don’t even want to think about what if a user actually wants to search for the literal string “%25” or “%27” for some reason.)

 

Update 11 March 2013: EXCEPT that this double-escaping, necessary to get the punctuation handled right… is causing challenges for my handling of diacritics and non-Latin characters, in ways I haven’t quite figured out yet. Man, character encoding and escaping issues are hard enough already, without having to reverse engineer an API that gets it wrong and doesn’t document it.

Branding requirements?

I could swear when I signed up for the account, it gave me some ‘terms of service/use’ which required me to include a pretty large, chunky YBoss icon on the “about” page, or equivalent. As I don’t really have an ‘about’ page, or anything else suitable, I thought this was going to be a problem.

But the only Terms of Service/Use I can find don’t include any such requirement.

Ah, then I found this PDF, which suggests the branding requirements again. But they aren’t actually included in the Terms of Use? So confused. Am I really bound by this? If I am, I’m still not quite sure how I’m going to handle it.

But in general

YBoss Spell Suggest seems like a pretty decent spell suggest service, at a pretty decent price.


4 Responses to A spell-suggest service from Yahoo

  1. Pingback: Yahoo Boss spell suggestion API percent encodes | Bibliographic Wilderness

  2. Pingback: More yspell weird: extra spaces after double quotes | Bibliographic Wilderness

  3. jrochkind says:

    I recently realized we were paying for a bit more spellchecking than we needed to, because we were issuing spellcheck queries for atom-formatted responses too, where we never offered the spell suggestion anyway.

  4. Use this:

    $xt = "Token Generated From http://xspell.tk";
    $xs = "Word Or Sentence To Check Spelling";
    $xu = "http://xspell.tk";
    // URL-encode the values so spaces and punctuation survive the POST body
    $xp = "api=spell&token=" . urlencode($xt) . "&check=" . urlencode($xs);
    $x = curl_init();
    curl_setopt($x, CURLOPT_POST, 1);
    curl_setopt($x, CURLOPT_POSTFIELDS, $xp);
    curl_setopt($x, CURLOPT_URL, $xu);
    curl_setopt($x, CURLOPT_RETURNTRANSFER, 1);
    $output = curl_exec($x);
    curl_close($x);
    Buttar
