I wrote in a Library Journal article a year or so ago about how federated broadcast search had some inherent technical limitations compared to metasearch supported by local indexing of records.
One trick is getting metadata from our licensed search/content vendors to locally index. Looks like maybe they’re starting to come around? At least EBSCO is.
Enterprise Search Integration (ESI): ESI makes EBSCOhost database metadata content available to an Enterprise Search Application (ESA) search index. The end result is that an ESA user can search EBSCOhost content within an integrated result list in combination with other databases and local content, under one common interface. ESAs can index millions of structured and unstructured documents from several content sources, and can utilize internal/external taxonomies to present users with a rich search experience – far exceeding today’s Federated Search products in usability. Some of the most popular ESAs include: Endeca, FAST Search, Autonomy, Northern Light, Lucene/Solr, and Google Search Appliance.
EBSCOhost can push article metadata with links back to the full text. Databases are accessible as XML via different mediums, including FTP. Incremental updates to the content are available in new XML files. This allows ESAs to index the content on their own time, and avoids the resource intensive processes and duplicate HTTP hits associated with web crawling.
http://www.ebscohost.com/thisMarket.php?marketID=35 [I think that was the right url cite for this clipping from my files, although it’s currently down for me when I try to check !]
Interesting they mention Lucene/Solr there as an “ESA”–really, of course, Lucene/Solr is just a building block, not a finished end product like some of the things they compare it to. (But then, so is Endeca). But it almost makes it seem like they’ve got a clue, huh?
Of course, if you want to take advantage of this, you’ve got to actually have some infrastructure for local indexing. I bet Terry Reese and LibraryFind are excited about this; my understanding is that while LibraryFind will do a broadcast search, it’s really happiest with locally indexed data.
I don’t know how many libraries have a flexible local index search application up and running. (I suppose you could translate their XML to MARC and load it in your traditional ILS/catalog software, but I don’t think you want to.) I think you can buy an add-on to Ex Libris Metalib for locally indexed content. I suspect few do.
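To make the workflow in that clipping a bit more concrete, here is a minimal sketch of what consuming a full XML dump followed by an incremental update file might look like on the library side. Everything specific in it is an assumption for illustration: the element names (`record`, `title`, `abstract`, `link`), the `id` attribute, and the `LocalIndex` class are invented, since the clipping doesn’t describe EBSCO’s actual XML schema, and a real setup would feed something like Solr rather than a Python dict.

```python
import re
import xml.etree.ElementTree as ET
from collections import defaultdict


class LocalIndex:
    """Toy in-memory inverted index keyed by a (hypothetical) vendor record id."""

    def __init__(self):
        self.records = {}                 # record id -> dict of fields
        self.postings = defaultdict(set)  # token -> set of record ids

    @staticmethod
    def _tokens(text):
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def delete(self, rec_id):
        # Remove a record and all of its postings, if it exists.
        old = self.records.pop(rec_id, None)
        if old:
            for tok in self._tokens(old["title"] + " " + old["abstract"]):
                self.postings[tok].discard(rec_id)

    def upsert(self, rec):
        # Drop any older version first, so an incremental update
        # replaces the record instead of leaving stale postings behind.
        self.delete(rec["id"])
        self.records[rec["id"]] = rec
        for tok in self._tokens(rec["title"] + " " + rec["abstract"]):
            self.postings[tok].add(rec["id"])

    def search(self, query):
        # Simple AND search: every query token must appear in the record.
        toks = self._tokens(query)
        if not toks:
            return []
        ids = set.intersection(*(self.postings.get(t, set()) for t in toks))
        return sorted(ids)


def load_batch(index, xml_text):
    """Index every <record> in one XML file; returns how many were loaded."""
    root = ET.fromstring(xml_text)
    count = 0
    for el in root.iter("record"):
        index.upsert({
            "id": el.get("id"),
            "title": el.findtext("title", default=""),
            "abstract": el.findtext("abstract", default=""),
            "fulltext_url": el.findtext("link", default=""),
        })
        count += 1
    return count


# Made-up sample data: a full dump, then a later incremental file that
# revises record a1 and adds record a3.
FULL_DUMP = """<records>
  <record id="a1"><title>Limits of federated search</title>
    <abstract>Broadcast search is slow.</abstract>
    <link>http://example.org/a1</link></record>
  <record id="a2"><title>Local indexing with Solr</title>
    <abstract>Index vendor metadata locally.</abstract>
    <link>http://example.org/a2</link></record>
</records>"""

UPDATE = """<records>
  <record id="a1"><title>Limits of metasearch</title>
    <abstract>Broadcast queries are slow.</abstract>
    <link>http://example.org/a1</link></record>
  <record id="a3"><title>Federated discovery</title>
    <abstract>A new record.</abstract>
    <link>http://example.org/a3</link></record>
</records>"""
```

The idea is to call `load_batch` once on the full dump and then again on each incremental file in the order it arrives. Because `upsert` is keyed on the record id, re-delivered records overwrite their earlier versions, and the indexing runs on the library’s schedule rather than the vendor’s.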
On the other hand
Meanwhile, other vendors are still suspicious not only of giving you metadata for local indexing, but even of broadcast federated search. When a customer recently approached Scopus encouraging them to improve the Scopus APIs used by Metalib for broadcast search, to let Metalib do a better job, a Scopus representative said in email:
On the “returning of the abstract text” [so abstracts could be included in federated search results] we see a more important role for Scopus as the main interface for search and retrieval of articles, than just a content source for Metalib. Would it be possible to make Scopus more visible on your website?
Aka, “You want us to make federated search work better? Can’t you just have your users use Scopus instead of federated search?” Well, no, we can’t.
Federated search not going away, Scopus
Here at MPOW, we certainly don’t hide Scopus or any other native interfaces. But we believe, and have evidence to support, that offering federated search in addition is a valuable service for our users. David Walker’s usage statistics at CSU seem to suggest that the presence of federated search doesn’t reduce use of native interfaces; it just increases database use overall, as people add federated search on top of that.
(Interestingly, Scopus actually offers pretty good APIs compared with their peers, who usually offer no APIs to customers. This doesn’t seem to jibe with their apparent resistance to using Scopus through anything but the Scopus native interface. Go figure.)
So it’ll probably be quite some time until all the major vendors (let alone all of our content/search providers) are willing to provide metadata for local indexing.
This creates issues of its own in figuring out interfaces. It’s tricky to have a setup that says: you can use this search interface (local index) to search some of our licensed scholarly databases, and this other one (broadcast federation) to search more of them, but you can’t really get all of them except by searching the native interfaces one by one. Sigh.
One obvious solution might initially seem to be offering interfaces that search both locally indexed and remote broadcast content with one unified, merged result list. But sadly, for a variety of reasons, that doesn’t work too well: it kind of ends up bringing down even the locally indexed content to the lowest common denominator of slow, poorly ranked, poorly faceted broadcast search. In Ex Libris Primo, which theoretically allows a merged result set combining local and broadcast results, as far as I know no customers actually choose to take advantage of this feature.
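A toy sketch of two of those effects, the slowness and the poor ranking, under assumptions of my own: the source names, latencies, and the round-robin merge are all made up for illustration, and this is not how Primo or any particular product does it. The point is only that a single merged list can’t be shown until the slowest remote target answers or times out, and that when relevance scores from different engines aren’t comparable, about the best you can do is interleave each source’s own ranking.

```python
from dataclasses import dataclass
from itertools import zip_longest


@dataclass
class SourceResult:
    name: str
    latency_s: float  # simulated response time of this source
    hits: list        # titles, already ranked by the source itself


def merge(results, timeout_s):
    """Return (merged_hits, seconds_waited, names_of_timed_out_sources)."""
    arrived = [r for r in results if r.latency_s <= timeout_s]
    timed_out = [r.name for r in results if r.latency_s > timeout_s]

    # The merged list is held up by the slowest source, or by the
    # timeout if any source never answered in time.
    if timed_out:
        waited = timeout_s
    else:
        waited = max((r.latency_s for r in arrived), default=0.0)

    # Scores from different engines are not comparable, so fall back to
    # round-robin interleaving of each source's own ranking.
    merged = []
    for row in zip_longest(*(r.hits for r in arrived)):
        merged.extend(h for h in row if h is not None)
    return merged, waited, timed_out


# Hypothetical sources: one fast local index and two remote targets.
LOCAL = SourceResult("local index", 0.05, ["L1", "L2", "L3"])
REMOTE_A = SourceResult("vendor A", 2.0, ["A1", "A2"])
REMOTE_B = SourceResult("vendor B", 9.0, ["B1"])
```

With only `LOCAL` in the list, `merge` returns the local ranking untouched after 0.05 simulated seconds. Add the two remote targets with a five-second timeout and the same local hits are held back for the full five seconds, get interleaved with vendor A’s hits regardless of relevance, and vendor B’s results are missing altogether.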
8 thoughts on “local indexing coming?”
You might be interested in the Universal Search Solution that the New England Law Library Consortium (NELLCO) and IndexData are co-developing. They are planning to locally index harvested library catalog records and law-related articles. I believe they are going to roll it out in early 2009.
There is lots of activity here. OhioLINK has a project going on along these lines, the Rochester XC project seeks to integrate many forms of metadata, and there are others. Preliminary word about Summon from Serials Solutions sounds like a commercial effort in this area.
Is Serials Solutions’ Summon also an example of this? I got a response from their Senior Product Manager to my post (http://distlib.blogs.com/distlib/2009/01/serialssolution.html) that says:
I saw your post regarding the Serials Solutions Summon™ service, and I assure you that it is a completely new service that is independent of WebFeat or 360 Search. It is not a federated search—we are pre-harvesting content on a large-scale (really large scale) from publishers, aggregators, and database providers, and then allowing searching for libraries using their knowledge bases to identify the content to which they subscribe (plus their OPAC content and open access content). Furthermore, we use the library’s OpenURL resolver to provide access to the appropriate (for them) copies.
Sounds like the SerSol product is an example of this–and SerSol might have better luck getting that metadata than actual customers would, sadly.
Much like OCLC gives to Google data it won’t give to customers.
I’m not sure if the Scopus person who made that comment was trying to imply that we don’t want people to use Scopus outside of its native interface – because I don’t think that’s true. It seems this comment was pulled out of the context of a particular e-mail exchange with a customer, so I don’t think it’s fully representative of what our position is, really.
Scopus is indeed pretty “open” in terms of offering APIs (such as http://www.scopus.com/scsearchapi/), and we applaud integration of our content into other websites. We do need to strike a balance, though, between how much of our content and functionality we make available for display in other interfaces and under what conditions, and what we show exclusively on scopus.com. The decisions we make about that are partly dictated by marketing considerations, partly by obligations towards the parties who supply us with our data, and partly by practical limitations. There’s a lot that comes into play here, and different use cases lead to different decisions.
I’d like to think, though, that in these decisions we predominantly choose openness over closedness – I believe that there are a lot of places (outside of the native Scopus interface) where Scopus content and functionality work very well in concert with other content and functionality, and we really try to enable these integrations in a good way (which can be challenging at times). The better we can get them to work, the better off our users and customers are, and with that, so are we.
Hi Alex, thanks for the feedback. Apologies if I unfairly maligned Scopus quoting comments out of context; I will be more careful in the future.
I am confused though, as a customer, about the balance you are talking about. Whether my patrons access Scopus through the scopus.com interface or through federated search, my institution pays you all the same. I’m happy for my institution to pay for valuable Scopus content regardless of which method my users use. In fact, the more completely I can integrate Scopus into federated search and other integrative technologies, the more of a value Scopus is, and the happier I am to have my institution pay for it. It’s not clear to me how you lose if my users choose federated search instead of scopus.com. I don’t understand why this is a balancing act.
More and more, the ability to fully integrate in federated search and other integrative technologies is really key to the value we get from the platforms and content we purchase. The better your platform does this, the more business advantage you will have.
Or wait, are the APIs being used for federated search of Scopus the very same ones that are available for free to non-customers? I certainly understand that there are limits on what you can give away for free. I have no quarrel there. But as a paying customer, the functionality available to me shouldn’t be limited by a lowest common denominator of what you can afford to give away for free. You’re not giving it away for free to me! Perhaps you need additional functionality in your APIs, or additional APIs, that are only available to customers and licensees.
As I’ve said, in the future, I expect that customers’ continued willingness to BE customers is going to really be affected by how well a platform can be integrated with federated search, and other integrative technologies which can combine resources from several vendors, under the control of the customer.
Hi Jonathan – thanks for these thoughts.
Our APIs are indeed also (partly) available to non-paying customers, and indeed, there’s a limitation to what a non-paying user can get. So – that’s a “marketing” decision: we need to distinguish between what a paying customer gets and what a non-paying customer gets. I don’t think that there are a lot of surprises there.
There are also some other, more practical, considerations:
– We try to keep the behavior of our APIs aligned with the behavior on Scopus as much as makes sense – easier to manage, easier to communicate. We don’t offer abstracts in federated search results for Scopus – but as you can see on Scopus itself, the abstracts aren’t in the search results list there either!
– It’s not very simple to just make everything we have on Scopus available through an API. As a product development person, I’d LOVE to be able to do that, but there are economics involved: developing and maintaining both a GUI ánd an API for a piece of functionality or a set of content is, frankly, not trivial. So – before we develop an API for something, we want to make sure it’s popular enough.
– APIs are not only a way of getting the same content or functionality to a user in a different way. They can also be used for harvesting data, for hooking up systems, for matching records – which is really a very different type of usage than what you seem to be alluding to, which is offering navigational tools for users. Often, this different type of activity is also associated with higher usage volumes. So – if we provide an API, we need to be prepared that our customers are going to be using them for very different things than just federating search – which is OK, but as you might understand, that notion also influences the decisions on how we design and deploy our APIs.