Federated Search: Users might actually like it

There is LOTS of skepticism toward federated search from librarians and library staff.  And indeed I agree that even the best library-oriented federated search solutions I’ve seen are awfully kludgey in many ways. (By “library-oriented” I mean oriented toward finding citations to (generally scholarly) publications, mostly articles.)

However, I believe that some form of inter-mediated meta-search is necessary to meet certain patron needs we have, and I’ll explain why. But first, some anecdotal support for my belief.

We’ve deployed Xerxes here at my place of work, a much better interface on top of the Metalib broadcast federated search engine.  The actual Metalib search engine is unchanged: you still get the same results you would from Metalib, no better. But they are presented in a much more usable interface.

Despite these improvements, many librarians here are still highly skeptical of our JHSearch federated search service, and reluctant to show it to users.

But Christina Pikas, despite her reservations, decided to at least mention its existence at a recent library orientation she did for a particular disciplinary unit of researchers.

And shockingly, a few users tried it out, and liked it enough that they, without prompting or solicitation, sent her rave reviews.  One user went so far as to send Christina a screenshot demonstrating how JHSearch found the article he wanted, and got him to fulltext. Another user said, get ready for it, “Much better than google.”

Much better than google? I don’t know about that, but in some contexts, depending on what you’re looking for and what you want to do with it, sure, definitely.

An important point here is that, while librarians might want users to use only native platform interfaces from our licensed databases, they are not going to. They are not going to learn dozens of different (often clunky and confusing) vendor interfaces, and perform multiple searches (on multiple platforms) for every query.  Even sophisticated faculty searchers.  They might learn one (or maybe two) native vendor platforms, that’s typically about it.

So they’re going to go to Google.  Which often works, but has some problems as well. Google (and even Google Scholar) aren’t that great at getting users to licensed fulltext, even when your library does license fulltext for an article the user finds on Google (or Scholar).  Google Scholar (and especially Google) are kind of grab bags of content; they have a lot, but for scholarly research, depending on your search and needs, the kinds of things you’re actually looking for may be drowned out by noise, and there may be lots of content (much of which the library has licensed fulltext for) which is not in there at all. It’s hard to say exactly what’s in there; we have no control over, or really much information about, what’s in Google, and no service agreements with them.

But Google is fast, and easy to use. Then we have our licensed vendor platforms which in some cases are fast and easy to use, in some cases aren’t, but typically offer more powerful searching tools than Google (or broadcast federated search like JHSearch).  But they are also multitudinous, requiring a researcher to do multiple searches in multiple interfaces if they want to take full advantage, and they aren’t going to do that.

Then we have the library-provided broadcast federated search. It’s (even in the best implementations I’ve seen) slower and clunkier than Google, but (if you make the interface as good as you can, like with Xerxes), easier to use than the aggregate collection of our multitudinous vendor platforms.  It probably doesn’t cover as much content as if you were to search every single licensed vendor platform (I have not seen any academic federated search deployment that does), but for many (not necessarily all) searches it will offer a better, both more complete and more focused, collection than Google.  And it’s the only option of these three that the library actually has control over, to improve the interface to try to meet local user needs.
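
For readers who haven’t seen what “broadcast” means in practice, here is a rough Python sketch of the general fan-out-and-merge idea. To be clear, this is not how Metalib or Xerxes is actually implemented; the source functions (`fast_db`, `slow_db`), the record fields, and the DOI-based de-duplication are all hypothetical stand-ins I made up for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait

# Hypothetical stand-ins for vendor databases: each maps a query to citations.
def fast_db(query):
    return [{"title": "Article A", "doi": "10.1000/a"}]

def slow_db(query):
    time.sleep(0.3)  # simulate a sluggish vendor platform
    return [{"title": "Article A (dup)", "doi": "10.1000/a"},
            {"title": "Article B", "doi": "10.1000/b"}]

def broadcast_search(query, sources, timeout=5.0):
    """Send one query to every source in parallel and merge what comes back."""
    pool = ThreadPoolExecutor(max_workers=len(sources))
    futures = [pool.submit(source, query) for source in sources]
    # Wait until every source has answered or the timeout has passed.
    done, _not_done = wait(futures, timeout=timeout)
    pool.shutdown(wait=False)  # do not block on sources that are still running
    merged, seen = [], set()
    for future in futures:  # walk in source order so the output is stable
        if future not in done or future.exception() is not None:
            continue  # skip sources that timed out or failed
        for record in future.result():
            if record["doi"] not in seen:  # assumed de-duplication key
                seen.add(record["doi"])
                merged.append(record)
    return merged

results = broadcast_search("federated search", [fast_db, slow_db])
```

The sketch shows where the slowness comes from: the user either waits for the slowest source or, with a shorter `timeout`, gets results with that source left out. It also shows why the merged list can only be as good as what each source chooses to return.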

Each of these options has pros and cons for the user. I wish we didn’t have to present the user with so many options, and could just give the user a tool that would work in a variety of contexts and needs, but the technological and business environment just doesn’t make that possible right now.  I continue to be of the opinion that some form of library-provided “multi-vendor content search” like broadcast federated search is a crucial tool for us to supply for our users’ search toolboxes.

Now, I continue to be very interested in the “aggregated index” solutions like SerialSolutions Summon and Ex Libris PrimoCentral that are appearing in the academic/scholarly research market.  I think they have a lot of promise to hit most of the benefits of broadcast federated search solutions while reducing a lot of the problems with broadcast federated search solutions.

These aggregated index solutions could very well become a better option than broadcast federated search for meeting this space in the middle of licensed vendor platforms and Google:  An interface under library control, crossing publisher and aggregator vendor boundaries in a single search, but more focused/targeted content for scholarly search than Google, and with better connections to licensed fulltext and other library services (like ILL).

I haven’t had a chance to investigate either of these aggregated index solutions exhaustively, and I’m not sure how they’d realistically stack up against broadcast federated search for an academic institution, but the concept definitely has promise. They are still not going to be able to offer search tools as sophisticated as those of the licensed vendor platforms. Nevertheless, one way or another we need to meet this “middle ground” need, and they have the promise to meet it while improving on the user experience of broadcast federated search like Metalib. We will see.


15 thoughts on “Federated Search: Users might actually like it”

  1. I suppose we should not read too much into those who try to compare Federated searching with Google. It is a ridiculous comparison! There is something called “Controlled vocabulary” in the licensed databases that makes the hits relevant – you get lots of rubbish in Google searches, no good for sensible research.

  2. You don’t get much use of controlled vocabulary in broadcast federated search though, because a single search will typically cross the boundaries of different databases with different (or in some cases no) controlled vocabularies. If I’m misunderstanding and you believe that controlled vocabularies come into play in broadcast federated search, I’d be interested to hear how.

  3. My experience with our own Metalib install is that librarians dismiss it because it makes it hard to find the very best results. But there’s a whole class of students that *love* it because it gives them “good enough” results incredibly quickly. I find that to be the fundamental disconnect between librarians and (many/most) patrons, and no software product is going to be able to address it.

  4. Bill, I know what you mean about not expecting software to change staff attitudes; I’m more concerned with meeting patron needs (I originally said ‘student’, but faculty and research staff DO use federated search and like it, as in the story in this post) than staff attitudes, but it’s still important to get organizational commitment to your user services, like federated search, if you want them to actually be promoted and used, this is true.

    I do think that putting Xerxes on top of our Metalib has helped make some librarians more open to considering it. I’m guessing this is for two reasons:

    1) It’s just plain a better interface, it’s no longer so ridiculously bad as Metalib out of the box, which makes people, even librarians, more open to considering it.

    2) Our Xerxes installation allows librarians to easily embed lists of subject-appropriate databases on their subject guides, with the URLs and names (and even descriptions, if so chosen) being managed centrally, so if they change then all the subject guides get the changes automatically. Reference/research librarians tend to like this, since it helps them make a good subject page easily, but what they get is a list of databases that comes along with a federated search box. (Actually, they CAN omit the search box if they want, but we successfully encourage them not to; it’s half an inch on the screen, what can it hurt?) Users can choose to click on individual databases for native interfaces, or choose to use the federated search. I think through this, librarians end up hearing from their users that they do indeed like it, and/or end up trying it out for themselves now and then since it’s right there, and seeing that it’s sometimes useful (see #1 above again).

    Here’s an example subject guide with embedded-via-Xerxes database-list-with-fed-search-box: http://guides.library.jhu.edu/content.php?pid=16418

  5. Hi Jonathan

    We at Canterbury Christ Church, UK have MetaLib and we are just at the start of a review of our Search and Discovery services. We have found the same resistance by some staff and some students to using federated search that you have suggested. The awful MetaLib interface doesn’t help matters, which is why we were interested in Xerxes and I would imagine this is why ExLibris have chosen to overlay the Primo interface on to MetaLib for a future release.

    We are not a huge subscriber of electronic resources, but do have quite a few packages. We were interested to find that Summon has a coverage of around 96% of our SFX holdings (and I’m sure that Primo Central is equivalent as well). If we did purchase one of these indexes (actually we get Primo Central for free as a MetaLib/SFX customer) it kind of makes the MetaLib search redundant. Why bother to do a federated search when an index is available to search in a lightning quick time?

    I believe that most of our students and staff would be happy with a single interface (be it Primo, Aquabrowser, Summon, VuFind, etc) even if it did only bring back 90% of the holdings. I believe that users would be prepared to then dig deeper into vendor native interfaces to research that remaining 10%.

    I’m really interested to see what others feel about this, which is why I started a twtpoll…hope you don’t mind me posting a link here: http://twtpoll.com/mzxq23

  6. Andy, yeah, I agree with your opinion on the potential usefulness of aggregated indexes like PrimoCentral or SerSol Summon. Certain federated search vendors are insisting that you can never give up federated search for aggregated index because you’d be missing too much content, a claim I’ve been suspicious of. So I’m very interested to hear your finding that Summon covers 96% of… well, you say of what you have activated in SFX, is it fair to say this is a good proxy for saying it covers 96% of your licensed content? Do you have any idea how much coverage Metalib has, in comparison? I’m guessing significantly less, in fact (even if you assume you can search all your activated ML resources at once in Metalib, which you can’t, but some competing fed search products may support this).

    On this particular issue, I’m less interested in “opinions” than actual numbers/facts/evidence of some kind, which is why your 96% figure is interesting, thanks!

    If you wanted to go further, and analyze coverage in PrimoCentral too, and in Metalib too (and do a bit of examination to make sure the % numbers you get from the vendors’ marketing and documentation really do represent reality), that would be awfully useful information to share. I think it would make a pretty good short Code4Lib Journal article too, if you’re interested.

  7. Jonathan,

    I have no problem with generating some statistics regarding the coverage of Summon and Primo Central against our SFX/MetaLib holdings. What I’ll need to do is contact SerSol and ExLib to get an up-to-date coverage list and to request permission to make my findings public (I don’t see why they wouldn’t give permission). I’ll also need to check with our Library management team that it’s OK to publish these stats (again I don’t see why they would mind).

    If all the above is OK then sure why not put it in an article :)

    Leave it with me and I’ll get back to you.

  8. wrt comment #1 – yeah, I’ll go out on a limb. Controlled vocabulary can introduce terms – make them available for search – when they don’t appear in the text. That is, of course, if the fields metalib searches include de, id, etc.

  9. I agree with Bill, it depends on expectations and needs.

    I helped develop a pilot with Metalib aimed at graduate students within a specific discipline. We created a custom interface for the pilot that was better for the task than the native Metalib (probably not as polished as Xerxes). However, the show-stopping complaints we got from the graduate student evaluators had to do with the quality of the results and the precision of advanced search. We also experienced issues with the granularity differences between the sources. Good search terms for general sources were too restrictive for niche sources, and good search terms for niche resources were too expansive for general sources.

    At the end of our pilot it was clear that building several discipline-specific federated search applications on top of Metalib would not work across the board. Perhaps some disciplines might use a set of resources better suited to federated search, but this was not the case for our pilot discipline. We came to the conclusion that graduate students are better off learning the native interfaces for the relevant databases, especially if they expect their literature searches to be exhaustive.

    We did recognize undergraduate students would likely be a better target for any future pilots. As Bill pointed out, “good enough” results are often good enough to meet their needs. They are also taking classes all over the discipline spectrum, and learning the native interfaces relevant for each class/discipline is too much to ask.

  10. Jonathan,

    I did contact ExLibris and Serials Solutions asking for permission to make public the findings of a match between our holdings and their indexes. I’m still waiting for an answer from both…so I’m not sure if I’ll ever get an answer :(

    Unofficially I have calculated a coverage of our SFX full-text holdings to be:

    Summon – 96%
    Primo Central – 76%


  11. Jonathan,

    Sorry me again :)

    Just had a reply from ExLibris this morning and they are calculating our match of SFX holdings on Primo Central to be more like around 90%. I thought it only fair to post this because I had given a figure for Primo Central in my last comment that was much lower.


  12. Awesome, thanks so much for doing that and sharing, Andy, so interesting to see. I doubt you’ll get permission from either of them, I wonder if you need permission though? But that’s your call, thanks for sharing here anyway.

    Do you have any sense of % coverage by searchable resources in Metalib?

