A year or so ago I wrote a Library Journal article about the inherent technical limitations of federated broadcast search compared with metasearch supported by local indexing of records.
One catch is getting metadata from our licensed search/content vendors so we can index it locally. Maybe they're starting to come around? EBSCO, at least, seems to be.
Enterprise Search Integration (ESI): ESI makes EBSCOhost database metadata content available to an Enterprise Search Application (ESA) search index. The end result is that an ESA user can search EBSCOhost content within an integrated result list in combination with other databases and local content, under one common interface. ESAs can index millions of structured and unstructured documents from several content sources, and can utilize internal/external taxonomies to present users with a rich search experience – far exceeding today’s Federated Search products in usability. Some of the most popular ESAs include: Endeca, FAST Search, Autonomy, Northern Light, Lucene/Solr, and Google Search Appliance.
EBSCOhost can push article metadata with links back to the full text. Databases are accessible as XML via different mediums, including FTP. Incremental updates to the content are available in new XML files. This allows ESAs to index the content on their own time, and avoids the resource intensive processes and duplicate HTTP hits associated with web crawling.
http://www.ebscohost.com/thisMarket.php?marketID=35 [I think that's the right URL for this clipping from my files, although the page is down for me when I try to check it!]
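To make the incremental-update model in that quote a bit more concrete, here is a minimal sketch of how a local index might consume such files. The clipping doesn't describe EBSCO's actual XML schema, so the `<record id=...>` layout, the `action="delete"` attribute, the sample records and the `LocalIndex` class are all hypothetical, and the toy inverted index stands in for a real engine like Lucene/Solr. The point is just that each new file is applied as upserts and deletes against what's already indexed, on the library's own schedule, with no crawling.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical file layout: EBSCO's real ESI schema is not shown in the post.
INITIAL_LOAD = """<records>
  <record id="a1"><title>Federated search limits</title><link>http://example.org/a1</link></record>
  <record id="a2"><title>Local indexing of article metadata</title><link>http://example.org/a2</link></record>
</records>"""

INCREMENTAL_UPDATE = """<records>
  <record id="a2"><title>Local indexing of licensed article metadata</title><link>http://example.org/a2</link></record>
  <record id="a1" action="delete"/>
</records>"""


class LocalIndex:
    """Toy inverted index over record titles (stand-in for Solr or similar)."""

    def __init__(self):
        self.docs = {}                    # record id -> {"title": ..., "link": ...}
        self.postings = defaultdict(set)  # lowercased term -> set of record ids

    @staticmethod
    def _terms(text):
        return {t.lower() for t in text.split()}

    def _remove(self, rid):
        # Drop any earlier version of this record from docs and postings.
        doc = self.docs.pop(rid, None)
        if doc is not None:
            for term in self._terms(doc["title"]):
                self.postings[term].discard(rid)

    def apply_file(self, xml_text):
        """Apply one XML file: each record is an upsert unless marked as a delete."""
        root = ET.fromstring(xml_text)
        for rec in root.iter("record"):
            rid = rec.get("id")
            self._remove(rid)
            if rec.get("action") == "delete":
                continue
            # Keep the link back to the vendor's full text alongside the metadata.
            doc = {"title": rec.findtext("title", ""), "link": rec.findtext("link", "")}
            self.docs[rid] = doc
            for term in self._terms(doc["title"]):
                self.postings[term].add(rid)

    def search(self, query):
        """Return ids of records whose titles contain every query term."""
        terms = self._terms(query)
        if not terms:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(hits)


if __name__ == "__main__":
    index = LocalIndex()
    index.apply_file(INITIAL_LOAD)         # e.g. the first full file fetched over FTP
    index.apply_file(INCREMENTAL_UPDATE)   # a later file containing only the changes
    print(index.search("local indexing"))  # ['a2']
    print(index.search("federated"))       # []
```

Because an update simply removes the old version before re-adding the new one, files can be applied in the order they arrive without rebuilding the whole index.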
Interesting that they list Lucene/Solr as an "ESA". Really, of course, Lucene/Solr is just a building block, not a finished end product like some of the things they compare it to (but then, so is Endeca). Still, it almost makes it seem like they've got a clue, huh?
Of course, if you want to take advantage of this, you’ve got to actually have some infrastructure for local indexing. I bet Terry Reese and LibraryFind are excited about this; my understanding is that while LibraryFind will do a broadcast search, it’s really happiest with locally indexed data.
I don't know how many libraries have a flexible local index search application up and running. (I suppose you could translate their XML to MARC and load it into your traditional ILS/catalog software, but I don't think you want to.) I think you can buy an add-on to Ex Libris Metalib for locally indexed content, but I suspect few libraries have.
On the other hand
Meanwhile, other vendors remain suspicious not only of providing metadata for local indexing, but even of broadcast federated search. When a customer recently encouraged Scopus to improve the Scopus APIs that Metalib uses for broadcast search, so that Metalib could do a better job, a Scopus representative replied by email:
On the “returning of the abstract text” [so abstracts could be included in federated search results] we see a more important role for Scopus as the main interface for search and retrieval of articles, than just a content source for Metalib. Would it be possible to make Scopus more visible on your website?
Aka, “You want us to make federated search work better? Can’t you just have your users use Scopus instead of federated search?” Well, no, we can’t.
Federated search not going away, Scopus
Here at MPOW, we certainly don't hide Scopus or any other native interfaces. But we believe, and have evidence to show, that offering federated search in addition is a valuable service for our users. David Walker's usage statistics at CSU seem to suggest that the presence of federated search doesn't reduce use of native interfaces; it just increases database use overall, as people use federated search on top of them.
(Interestingly, Scopus actually offers pretty good APIs compared with its peers, which usually offer no APIs to customers at all. That doesn't seem to jibe with their apparent resistance to anyone using Scopus through anything but the Scopus native interface. Go figure.)
So it’ll probably be quite some time until all the major vendors (let alone all of our content/search providers) are willing to provide metadata for local indexing.
This creates interface problems of its own. It's tricky to offer a setup that amounts to: you can use this search interface (local index) to search some of our licensed scholarly databases, and this other one (broadcast federation) to search more of them, but you can't really get all of them except by searching the native interfaces one by one. Sigh.
One obvious solution might initially seem to be an interface that searches both locally indexed and remote broadcast targets and presents one unified, merged result list. Sadly, for a variety of reasons, that doesn't work very well: it tends to drag even the locally indexed content down to the lowest common denominator of slow, poorly ranked, poorly faceted broadcast search. Ex Libris Primo theoretically allows a merged result set combining local and broadcast results, but as far as I know no customers actually choose to use that feature.
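Here's a toy sketch of two of the ways merging can hurt, under entirely made-up assumptions: the target names, latencies and records are invented, network delay is simulated with `asyncio.sleep`, and this is not how Primo or any other product actually merges. First, the combined list can't be displayed until the slowest target answers or a timeout fires, even though the local index answered almost at once. Second, broadcast targets return records in the vendor's own order with no score comparable to the local index's, so this sketch falls back to round-robin interleaving by rank, throwing away the local relevance scores.

```python
import asyncio
import time

# Hypothetical targets: names, latencies and records are invented for illustration.
LOCAL_HITS = [("local:1", 0.92), ("local:2", 0.41)]  # (record id, relevance score)
BROADCAST_TARGETS = {"vendorA": 0.05, "vendorB": 0.3, "vendorC": 2.0}  # seconds


async def search_local_index(query):
    await asyncio.sleep(0.01)  # local index answers almost immediately
    return LOCAL_HITS


async def search_broadcast_target(name, latency, query):
    await asyncio.sleep(latency)  # simulated round trip to a remote vendor
    # Records arrive in the vendor's own order with no score comparable
    # to the local index's, so only the rank position is known.
    return [(f"{name}:{rank}", None) for rank in (1, 2)]


async def merged_search(query, timeout=0.5):
    tasks = [asyncio.create_task(search_local_index(query))]
    tasks += [
        asyncio.create_task(search_broadcast_target(name, latency, query))
        for name, latency in BROADCAST_TARGETS.items()
    ]
    start = time.monotonic()
    # Nothing can be shown until every target answers or the timeout fires.
    done, pending = await asyncio.wait(tasks, timeout=timeout)
    elapsed = time.monotonic() - start
    for task in pending:
        task.cancel()  # targets that are too slow just drop out of the list
    result_lists = [task.result() for task in tasks if task in done]
    # No common scores, so interleave round-robin by rank; the local
    # relevance scores are effectively thrown away.
    merged = []
    for rank in range(max((len(r) for r in result_lists), default=0)):
        for results in result_lists:
            if rank < len(results):
                merged.append(results[rank][0])
    return merged, elapsed, len(pending)


if __name__ == "__main__":
    merged, elapsed, dropped = asyncio.run(merged_search("federated search"))
    print(merged)
    print(f"waited {elapsed:.2f}s; {dropped} target(s) timed out")
```

In this setup the user waits the full half-second timeout for a list whose local portion was ready after about ten milliseconds, and the slowest target is missing anyway. A real system could merge more cleverly, but it still has to wait on, and rank against, whatever the broadcast targets hand back.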