Yahoo Boss spell suggestion API percent encodes

We’ve been using the Yahoo Boss spell suggestion API to provide spelling suggestions in our discovery interface for some months now.

It continues to work pretty well.

But I just discovered that the API seems to return suggestions with most punctuation percent-encoded.

This isn’t mentioned in the documentation at all, as far as I can tell. (The docs do say something, in a confusing and unclear way, about the need to escape strings in request arguments; they say nothing about escaping in responses.)

There’s no reason that strings in an XML payload would have to be percent-encoded. I guess maybe they figured a common use case was taking the suggestion and making a URI variable out of it, so they might as well URI-escape it for you in advance. That doesn’t make a lot of sense though, because it causes problems when you already have a good framework like Rails that handles escaping for you; and at any rate it ought to be documented.

Nonetheless, lesson learned: you’ve got to decode it first if you want to, say, display it to the user in a Rails view.

Here’s an example. Trigger a spell suggestion with an intentionally misspelled word I know the service likes to correct, and include a bunch of punctuation so we can see how it outputs the punctuation in its subsequent suggestion:

chmsky's & "herman" 40%

The suggestion you get back is:

chomsky%27s %26 %22 herman%22 40%25

So, yeah, it seems to pretty reliably percent-encode punctuation, including the “%” mark itself. Okay, fine, so we’ll decode it ourselves.

In Ruby, `URI.decode()` can be used to turn it back into a human-readable string.
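A minimal sketch of the decoding step follows. One caveat: `URI.decode` was removed in Ruby 3.0, so this sketch uses `URI.decode_www_form_component` from the standard library instead. That is a substitution, not what the app originally used, and it also turns a literal “+” into a space. The helper name `decode_suggestion` is made up for illustration.

```ruby
require "uri"

# Hypothetical helper (name is illustrative): percent-decode a suggestion
# string returned by the spell suggestion API before displaying it.
# Note: decode_www_form_component also converts "+" to a space, and it
# raises ArgumentError if a "%" is not followed by two hex digits.
def decode_suggestion(raw)
  URI.decode_www_form_component(raw)
end

puts decode_suggestion("chomsky%27s %26 %22 herman%22 40%25")
```

In a Rails view the decoded string still goes through the usual HTML escaping, so it can be output like any other untrusted text.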

I’m not sure if this is a recent change to the API, or (more likely) I just didn’t notice until now, when a user reported the bug in my app.

2 thoughts on “Yahoo Boss spell suggestion API percent encodes”

  1. Maybe you’ve written about it somewhere else, but I’m getting spaces encoded as simply “%” rather than “%20”. Which means that blindly URL-unescaping (and then replacing “%” with “ ”) /usually/ works. But it doesn’t when the two characters after the % are valid hexadecimal. In that case, unescaping results in a weird Unicode character. I’m having to do:
    =~ s/%(\D)/ $1/g;
    before the URL-unescaping step.
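
A rough Ruby take on the same pre-processing idea, as a sketch only. It assumes, as the comment describes, that a bare “%” stands for a space. It differs from the substitution above in that it treats any “%” not followed by two hex digits as a space. The helper name is hypothetical. It does not resolve the ambiguous case the comment mentions: a bare “%” that happens to be followed by two hex characters is still decoded as an escape.

```ruby
require "uri"

# Hypothetical helper (name is illustrative): treat a "%" that is NOT
# followed by two hex digits as a space, then percent-decode the rest.
# Assumption: a bare "%" means a space, as reported in the comment above.
def decode_with_bare_percent_spaces(raw)
  spaced = raw.gsub(/%(?!\h\h)/, " ")
  URI.decode_www_form_component(spaced)
end

puts decode_with_bare_percent_spaces("noam%chomsky%27s")
```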
