last.fm has a really nice API.
I’d really like to write a plug-in for last.fm for Umlaut.
But first I’ve got to figure out how Umlaut is going to determine (and track in its internal schema) whether a given OpenURL citation actually represents a musical album, or at least a sound resource that is more likely than a random citation to be a musical album. Otherwise there are going to be too many false positives, since keyword search on artist/album is going to be the only way to search last.fm. This is a bit tricky, since the commonly used OpenURL referent formats don’t really provide for this.
Such a plug-in could provide free cover images, an artist biography, and a link to the really useful last.fm artist page. The API even reveals whether the album has streamable samples on last.fm, so Umlaut could tell the user that before they click on the link. Would be cool.
Anyone know any algorithms for string similarity?
Something occurs to me that would be useful for all sorts of Umlaut plug-ins that rely on keyword searches of external databases like this. Are there any good standard algorithms for computing some measure of ‘similarity’ between two strings? It’s quite likely that the album name, book title, or author/artist name in the incoming citation won’t match _exactly_ with a record found in an external database, but will still in fact represent the same real-world thing.
An algorithm for computing some measure of similarity between two strings would be helpful in improving Umlaut’s matching heuristics in these cases. If one string is a strict subset of the other, or close to a strict subset, or the same except for some punctuation (or a strict subset except for some punctuation), etc., odds are better they’re the same. But rather than try to write a bunch of rule-based heuristics like this, I’m thinking a computational similarity approach already worked out by someone else is the way to go.
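To make the idea concrete, here is a rough sketch of the kind of thing I have in mind. It is in Python purely for illustration, it is not Umlaut code, and the function names (normalize, similarity, is_token_subset) are made up for this example. It leans on the standard library’s difflib.SequenceMatcher, which is just one off-the-shelf similarity measure; edit-distance measures such as Levenshtein would be another candidate. Any cutoff for what counts as "close enough" would have to be tuned against real citations.

```python
import difflib
import unicodedata


def normalize(text):
    """Lowercase, drop accents, turn punctuation into spaces, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    cleaned = []
    for ch in decomposed:
        if unicodedata.combining(ch):
            continue  # drop the accent mark, keep the base letter
        if unicodedata.category(ch).startswith("P"):
            cleaned.append(" ")  # punctuation becomes a word break
        else:
            cleaned.append(ch.lower())
    return " ".join("".join(cleaned).split())


def similarity(a, b):
    """Score from 0.0 (nothing in common) to 1.0 (identical after normalizing)."""
    return difflib.SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def is_token_subset(a, b):
    """True if every word of one string also appears in the other."""
    words_a = set(normalize(a).split())
    words_b = set(normalize(b).split())
    if not words_a or not words_b:
        return False
    return words_a <= words_b or words_b <= words_a
```

A plug-in might, for instance, accept a candidate record when is_token_subset is true or when similarity is above some cutoff, and otherwise treat it as a probable false positive. Normalizing first covers the "same except for some punctuation" case without a separate rule for it.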
Why a link resolver really isn’t federated search
I think this is a good example of why the ‘link resolver’ domain really is qualitatively different than the ‘federated search’ domain.
Sure, much like a broadcast search application, Umlaut goes out and searches several different external databases using APIs, or screen scraping, etc.
But a link resolver is focused like a laser on a very particular application: receiving a known-item citation and trying to find versions, services, and descriptions of that known item in external databases. That special focus leads to code that you’d never put in a general-purpose federated search application, and it also allows one to leave out all sorts of code that would be basic requirements in a federated search application.
Similar, but not quite the same thing.