While libraries typically spend most of their resources on ‘catalog’ search, a significant portion of academic library users probably spend more of their time looking for articles instead. (Anyone have a cite to any research showing this?)
So many of us are trying to spend more resources on supporting article search in a way that is integrated into our web infrastructure, with a good user interface, rather than as a hacky afterthought.
I had been assuming that the way to do this was to, somehow, provide a single search interface that would search both over the traditional catalog/ILS database, and over vendor content including scholarly articles, in one merged result list. But this is a very tricky thing to do, and most of the feasible ways to do it seriously constrain our infrastructure choices to dependency on a single vendor’s stack. However, some interesting research from UVa suggests that users may not want actual merged search results, may indeed specifically prefer not to have it. This would actually open up our options some.
Catalog Discovery Options
As I’ve mentioned before, we undertook a survey of “next generation” “discovery service” options around two years ago, focused on replacing our traditional OPAC over the ILS/catalog database.
At that time, looking at our options, we decided that the various proprietary options available would take pretty much as much local programmer work to set up as an open source option; and they still wouldn’t give us a lot of the features we wanted, including some sophisticated search features, as well as seamless UI integration with ILS features like account functions (items out, making requests) and live shelf status (checked out, etc.).
So we decided to go with an open source solution, which would (we thought) actually have a cheaper TCO, and give us the ability, if we could afford the development time, to do things just how we wanted. In retrospect, the open source solution took more time than we expected, but I do think we ended up with a better interface than we would have gotten from even today’s (two years later) proprietary products, with better UI integration with ILS functions. While we don’t have ‘browse search’ yet (and it’s non-trivial to add to Solr-based software), if we can afford the time to spend on it, I am confident I could code it up, instead of just waiting/hoping for a vendor to do so in a proprietary product. We have more control, and more options.
In the context of the options we evaluated two years ago, I think we made the right choice.
But the context has changed somewhat, which initially caused me to second guess a bit.
The Rise of Aggregated Index Offerings
Two years ago when we did our survey, it was before the rise of vendor-provided aggregated indexes. These are indexes of third party content, initially (and still in the majority) scholarly articles indexed at the article level, combined from various third party sources. Examples are Serials Solutions Summon, Ex Libris Primo with Primo Central, and OCLC WorldCat Local (which existed two years ago, but had not yet added significant third party article-level content, as it since has).
This class of product marks a new option for providing “article searching” support to our patrons. Historically, there have been several ways libraries tried to provide this support. One way is simply giving the user a list of licensed databases, and letting them go to a variety of third party vendor platforms to do these searches. While OpenURL allowed such an approach to still, hackily, integrate with library services and alternate full text destinations, it still required users to deal with a bunch of very different interfaces, fairly poorly integrated into the library web infrastructure.
The next approach, at least 15 years old, was broadcast federated search, where products — such as Ex Libris Metalib or IndexData MasterKey — actually try to search a handful (or dozens or even hundreds) of third party databases every time the user enters a query, and blend the results. This gives us a single interface for searching multiple vendors’ databases, and allows us to integrate that interface (to some extent) into a unified web presence… but it just doesn’t work very well, for somewhat insurmountable technical reasons.
I wrote about this four-and-a-half years ago, and suggested the only way to get a good “meta-search” user experience was to harvest/collect metadata from these disparate third party databases, and combine them in a single aggregated index. (What I called a ‘local index’ in that article I really should have called an ‘aggregated index’ — the word ‘local’ is unclear, local to what?).
I still believe this to be true — and I think the rise of the aggregated index products bears this out — but I’ve come to realize it takes an enormous effort to do this well (including keeping your data up-to-date on an ongoing basis), and it’s infeasible for an individual library to do this on their own. It would take a large consortium, or a large vendor, to have the resources and economy of scale to pull this off. Thus the rise of the vendor-provided aggregated index products.
So Summon, Primo(Central), and WorldCat Local all include such an aggregated index, hosted on the vendor’s platform. They also all will give you a product that includes your local metadata (ie, what you control locally through your ‘catalog’, including your own physical holdings). This allows what many of us thought was the “holy grail” of library search — combining ‘catalog’ search with ‘article’ search in a single interface allowing you to search both at once and returning a single merged, relevancy ranked, result set. (That part in italics is key, we’ll return to it shortly).
But it’s completely incompatible with an open source we-control-the-software-and-can-add-features approach. Sure, some of these vendors give you limited APIs, allowing you to (at potentially great cost in development time) put your own ‘skin’ on their search, potentially integrating with local systems (for instance, patron account screens) better than the out of the box product. (and some of them don’t give you sufficient APIs for that). But they’re all based on indexes that reside on a vendor’s server, and you don’t have access to change the underlying indexing routines, strategies, fields, etc. Nor will any of these vendors share their aggregated data with you. (Perhaps because of licensing agreements with the providers they harvest from; perhaps just because it would be too much trouble and expense for the limited number of libraries interested in such a service and what those customers would be willing to pay.)
There’d be no way to add a browse search, or a novel timeline or map search results display, or a novel authorities browse, etc. You’re stuck with what the vendor gives you, and stuck paying for it, and by combining your catalog search and your article search in one vendor product, you’re putting all your eggs in one basket, limiting future flexibility, increasing cost of switching, etc.
But what you get is an aggregated index. This seems like a trade-off without a best-of-both-worlds option. But in fact that’s only true if we assume that users want/need to be able to enter a query, and get back both ‘catalog’ and ‘article’ results, in a single merged and relevancy ranked result list.
Recent communications from librarians at UVa give us cause to question this assumption.
Users actually may not want single search
Julie Meloni from the University of Virginia writes on the blacklight listserv:
- Process: A/B testing (really A/B/C/D testing) of four interfaces that offered some sort of aggregated search (Stanford, Michigan, Villanova, and University of Central Florida (who is doing the blended results/relevancy rankings if anyone remembers that conversation from NGC4Lib ) if you’re wondering). From those results we determined two critical pieces of data (among several others): patrons come to Virgo knowing the _type_ of item they’re looking for (e.g. book or article), and too much info in search results is not desired.
- Really important point that came out in user testing here (of our patrons and their needs, with all due respect to others) is that patrons _did not_ want blended results. At all. Across the board dissatisfaction with that approach. This was awesome for us to hear because it meant that we _didn’t_ have to come up with some intricate/ tricky/very fragile way of maintaining article metadata (that legally we couldn’t hold anyway) in our own Solr index such that everything could have our own relevancy rankings applied and so on.
This is in fact a hugely important re-evaluation of our assumptions about what users want. As Julie says, it opens up our options a whole lot.
If we don’t need to provide blended search results, then it may indeed be possible to use a vendor-provided aggregated index combined with a different product, such as an open source Solr-based product, to provide searches.
You’d still want to provide an integrated look-and-feel that makes it look to the user like it’s one “product”. But if you don’t need to blend the results, behind the scenes it could be consulting your own Solr index for ‘local’ (ie catalog) content, and the remote vendor provided database for ‘article’ content. Which is exactly what UVa plans to do.
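The mechanics of that two-pane approach can be sketched in a few lines. Everything here is a hypothetical stand-in: in a real system `search_catalog` would query your local Solr index and `search_articles` would call the vendor’s search API, but the shape of the dispatch — two parallel queries, two result sets that are never merged or re-ranked — is the point.

```python
import concurrent.futures

# Hypothetical stand-ins: a real search_catalog would query the local
# Solr index, and search_articles would call the vendor's remote API.
def search_catalog(query):
    return [{"title": "Catalog record for " + query, "source": "catalog"}]

def search_articles(query):
    return [{"title": "Article hit for " + query, "source": "articles"}]

def two_pane_search(query):
    """Run both searches in parallel, but keep the two result sets
    separate (no blending or re-ranking) for side-by-side display."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
        catalog = pool.submit(search_catalog, query)
        articles = pool.submit(search_articles, query)
        return {"catalog": catalog.result(), "articles": articles.result()}

results = two_pane_search("civil war")
```

Because neither result list waits on the other’s relevance scores, each backend can use whatever ranking it is best at, and the UI simply presents the two panes (or tabs) under one look-and-feel.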
It’s not necessarily easy; some libraries will, at the moment, still find it easier to just pay a vendor for a single solution that does it all. Certainly it’s possible for these all-in-one vendor aggregated index products to provide separate “local” vs “non-local” content searches, if that’s what the user really wants. But so long as you don’t have to blend the results, it’s feasible to combine a local open source ‘catalog’ index with a remote vendor-supplied ‘articles’ index, maintaining more control over your local search functions, and it can become even easier and cheaper as libraries with the interest and resources to work on it provide more open source tools.
What do you call these two types of search?
So UVa says that users didn’t want blended results. They wanted to keep these ‘two types’ of searches, catalog and ‘article’, separate. But this makes me wonder: what words or concepts did users use to describe these ‘two types’, and how can the system describe them in a way that makes sense?
I don’t know if users really know what ‘catalog’ means, especially as contrasted to ‘articles’. Many users expect our existing ‘catalogs’ to include article-level citations and are confused they don’t. The ‘catalog’ isn’t just ‘books’ — from a back-end point of view, it’s our ‘locally controlled metadata’, but this obviously is not something the user cares about at all.
In fact the ‘catalog’ contains all of our ‘physical’ holdings — including books, videos, CDs, pamphlets, assorted ‘realia’, etc. And it includes journals we hold physically, but only at the journal or volume/issue level, not article-level metadata. Oh, and then it includes some electronic content, like ebooks we’ve licensed, and electronic access to journals. But still not article-level metadata. Which, again, is what many of our users spend much or most of their searching time trying to find.
And then there’s this ‘other stuff’, aggregated metadata from a variety of third party vendors. Which is mostly scholarly article citations (which often we’ll be able to provide electronic full text for), but may also contain some book citations, some audio/video citations, who knows what, whatever EBSCO or ISI or Scopus decided to index.
So apparently we know that users (or at least UVa users) want to keep these things separate, but what is it they think they’re keeping separate? (In fact, our inability to distinguish these two categories by any actual user-centered characteristic is what led me to assume they should be ‘blended’.)
I think some user-centered research on that would be interesting; perhaps if we learn more about what UVa did, we’ll find they explored that too. Joseph Gilbert from UVa says:
It’s still a bit earlier in our usability testing to know for sure, but our delineation of “Catalog + Articles”, “Catalog only”, and “Articles only” seems to resonate with our user population.
Villanova uses “Combined results” [“combined” is a side-by-side listing of two result sets, not a merged result set –jrochkind], “Books and more” and “Articles and more”, which our users found a bit confusing, partly for the reasons you mention (especially “books” as a stand in for everything in the local collection) and partly for UI reasons. Catalog seems to be a reasonable catch-all for all our videos, bound journals, books, etc., though we also highlight specialized sub collections like our video search and music search. We’ve found that sometimes users are unclear if “article search” means only online sources or not, but we have a significant design space dedicated to making this distinction clearer. I think any single-word label without contextual help is likely to be confusing in one way or another.
Cheaper source of aggregated index?
It occurred to me recently that in addition to the newfangled ‘next generation’ ‘discovery layer’ aggregated indexes, there are actually some aggregated-index-type products we’ve been paying for for quite a while.
Examples are Scopus and ISI Web of Knowledge. Both Scopus and ISI try to get as many scholarly citations as possible, by aggregating from a variety of sources, similar to the newer aggregated index products. Many of our libraries are already paying for Scopus and/or ISI, and at significantly cheaper (is my impression) prices than vendors are charging for new-fangled aggregated index products.
So what is the difference between these ‘old’ aggregated index products and the newfangled ones?
Well, the new ones allow you to blend your local metadata (‘catalog’) with their aggregated index in a single relevance ranked hit list. You know, the thing we’re saying maybe our users don’t actually want, and maybe we can’t feasibly do while maintaining control over our software stacks either. The new ones also give somewhat fancier, slicker interfaces than Scopus or ISI did last time I looked — interfaces that the UVa-type approach probably won’t be using anyway, instead using APIs to get results from aggregated index products and present them in local open source interfaces.
So I wonder if we could profitably use Scopus or ISI Web of Knowledge as our ‘article search’ source in this strategy, at significant cost savings over buying one of the new ‘discovery layer’ aggregated index products only for the use of its API.
It depends on the quality of the APIs and the quality of the search results from an ‘old school’ aggregated index product. I seem to recall Scopus has a pretty good API, but I haven’t looked at either one in a while.
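If one of these products did turn out to have a usable API, the local interface could treat ‘article search’ as a pluggable backend behind a common result shape. The sketch below is illustrative only — the class names and fields are my assumptions, not any vendor’s actual API — but it shows why this architecture keeps switching costs low.

```python
# Illustrative sketch: 'article search' as a pluggable backend behind a
# common result shape. Names and fields are assumptions, not any
# vendor's actual API.

class ArticleBackend:
    """Interface each vendor adapter would implement."""
    def search(self, query):
        raise NotImplementedError

class StubBackend(ArticleBackend):
    """Stands in for a real adapter (Summon, Scopus, etc.) that would
    translate the vendor's response into the common dict shape."""
    def __init__(self, name):
        self.name = name

    def search(self, query):
        # A real adapter would issue an HTTP request here and map the
        # vendor's response fields into this common shape.
        return [{"title": "Result for " + query, "backend": self.name}]

def render_articles(backend, query):
    # The UI layer depends only on the common shape, so swapping
    # vendors becomes a configuration change rather than a rewrite.
    return [hit["title"] for hit in backend.search(query)]
```

With the vendor hidden behind an adapter like this, the A/B comparison I describe below is also cheap to run: point the same local UI at two different backends and compare the results.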
I have a fantasy now of doing A/B(/C/D) style testing with users comparing our various ‘meta-search’ options. One or more of Summon, PrimoCentral, or WorldCat Local; compared to Scopus or ISI Web of Knowledge; perhaps compared to Metalib too. See how the actual results stack up, see if Scopus or ISI can provide ‘article search’ services just as well, at lesser cost.
(I can’t keep track of vendor ownership these days. Are either or both of Scopus or ISI owned by a company that is also trying to sell you a more expensive ‘discovery layer’ service? That might affect how much their owners would want to meet such a use case.)
Hope to see more from UVa
So, I think UVa’s research in this area is really important and really enlightening, challenging some assumptions many of us have about user desires and needs and how they should be translated into software to support them.
I hope we soon see more complete write-ups from UVa on what they did and what they found, as they continue to learn more through research and trial development.