Article Search Improvement Strategy

A position paper prepared for internal use, but why not share it with you all? If you prefer a PowerPoint presentation, I have that too.

(Experimenting with putting phrases in bold for readability in a short-attention-span, Twitterfied world. Don’t know if it really improves readability, or just makes it look like an infomercial.)

Summary of Arguments

  • We should prioritize improving article search for our users
  • A “bento style” search interface implemented in Catalyst is, right now, the right strategy for us to pursue in terms of:
    • our overall strategic directions
    • Cost/benefit: compared to other options, we think it can provide significant user benefit with high feasibility and low development time.

Article Search Is Important to Our Users

We know that many of our users spend as much or more time looking for articles (both known-item and topic searches) as they do looking for Catalog materials. While this may vary between users in different disciplines and at different stages of their academic careers, we know it’s important to a great many of our users. This is why we spend so much money on article fulltext and A&I databases, of course.

North Carolina State University (NCSU) provides a search tool that searches both Catalog and Articles, putting results in different areas of the screen (we’ll look more at this style of interface later). NCSU found that 45% of their user clicks on results were in the ‘Articles’ results, vs 35% in the ‘Catalog’ results. (http://crl.acrl.org/content/early/2012/01/09/crl-321.full.pdf+html).

Selected comments from 2012 LibQual indicating importance of article search and/or licensed databases:

  • “Having full-length scholarly articles available online (that can be accessed off-premises) is the #1 useful thing I use the library for! “
  • “While I have not physically been to the library, I have had occasion to use the electronic access and find it very useful obtaining articles for research.”
  • “I find the online resources very useful. I don’t usually spend a lot of time at the library itself, but I very often use the search engine to find journal articles, and have so far always been happy with the results.”
  • “I love the access to the online databases it is very useful for research”
  • “I mainly use the medical databases offered through Welch. These websites cost a fortune to access without the JHU pass.”

And more LibQual comments indicating some problems (we’ll see more later):

  • “I’ve always had really great experiences, and the main thing I’d suggest improvement for would be the library website’s search function for articles & databases”
  • “It is also sometimes difficult for me to find peer-reviewed articles on a particular topic if I don’t already have a citation.”

We know article search is important to our users. And, as the following will show, we know that our existing solutions are not satisfying them. But we have not traditionally spent as much time on improving article search options as we have on improving Catalog search options — for a variety of reasons including historical predilections and lack of options. However, it’s time we spent some effort improving our article search services.

Existing Article Search Options Are Not Sufficient For Our Users

Existing article search functionality that is integrated into library services includes:

  • Licensed databases, provided in the JHSearch directory, available for individual access and searching.
  • The Metalib-powered JHSearch federated search tool
  • Google Scholar

Individual Licensed Databases

The most ‘traditional’ way of supporting article search is with our licensed databases. We have hundreds of licensed databases. Each database may have its own particular sophisticated search.

This does not satisfy users who do not want to learn and deal with multiple separate vendor interfaces, but who do not have a single database that meets all their needs.

It also requires users to take extra steps to realize there is a list of databases, to choose from this list, and to deal with an unfamiliar interface. In the current environment, many users much of the time just want to be able to ‘search for articles’ without these extra steps.

Even when we provide a by-subject directory of databases, users have trouble picking databases to use (assuming they found the directory in the first place). In a Bowling Green State University study, librarians found:

…most students at BGSU choose to use databases whose names they recognize, and, students who do not know of a named database to use have a great deal of difficulty otherwise identifying one appropriate for their search topic, even when using library-provided subject lists and descriptions. Three students specifically mentioned that they would probably just go to Google or Google Scholar.

Amy Fry, Linda Rich (2011) Usability Testing for e-Resource Discovery: How Students Find and Choose e-Resources Using Library Web Sites. The Journal of Academic Librarianship 37(5), pp. 386–401

From our own 2012 LibQual:

  • ” The electronic resources are generally good, but sometimes it’s hard to figure out which database to look in to find a specific journal or resource.”

We suspect that some users end up using only a single licensed database in their work (JSTOR is a popular one), not necessarily because it includes all the content they are interested in (until recently JSTOR had no recent content, but many users relied on it exclusively anyway), but because it has a simple interface and they don’t want to deal with multiple databases.

The “first pick a database” approach is also particularly poor when a user has citation information for a known article and wants to look it up for full text or other library delivery options; they don’t want to have to guess which database might include the article.

Our directory of individual licensed databases will not be going away — these databases are powerful tools providing sophisticated and focused searching for those who need it and are willing to invest the time to learn them. But we believe this method of offering article search is not sufficient for our users’ needs.

Metalib-powered JHSearch federated search

History, how we got here

Metalib is a product we license which falls into the category of “broadcast federated search”. When a user enters a query, Metalib goes out and searches multiple databases at once, and then tries to combine the results from these databases in a blended result set.

This class of product started appearing in the library market well over a decade ago, and was meant to provide a simpler search environment to deal with some of the downsides of directories of databases as avenues for article search.

We’ve licensed Metalib for at least 8 years, and approximately 5 years ago we revamped its user interface using the open source Xerxes front-end, to try to ameliorate some significant UI problems in Metalib’s own interface.

Metalib was intended precisely to deal with the problems outlined above of sending users to individual databases, providing a simpler, more consistent, and integrated search service.

Not entirely happy with Metalib

However, we have never been very happy with Metalib’s search results. The technology used to do ‘broadcast federated search’ is inherently flawed.

  • The product is very slow to return results.
  • Relevancy ranking is poor when blending results from different databases.
  • A broadcast federated search offering can only offer limited and inconsistent faceting, limiting, or fielded search, because of inconsistencies between databases.

The nature of the broadcast search technology is such that JHSearch can only search 5-10 databases simultaneously — we could push that a bit higher, but not to all of our licensed databases — so users still need to make a choice of databases to search (or find a librarian-selected subject-specific set, which is what our tools actually do). It also means that Metalib-powered JHSearch performs very poorly for specific ‘known item’ searches, as the article you are looking for has to happen to be in those 5-10 databases.

These are challenges for the ‘broadcast federated search’ technological approach, and they have been known for some time: I first wrote about them in 2007 in Library Journal ((Meta)search like Google, Library Journal 2/15/2007, http://www.libraryjournal.com/article/CA6413442.html).

Largely because of these problems, SAIS has never chosen to direct their users to our JHSearch federated search product. Welch does use Metalib in their own custom way, although not via our JHSearch service (I do not have usage or satisfaction information from Welch’s Metalib service, although it would be interesting to see). MSE, however, has highlighted JHSearch in our offerings to users, and MSE’s subject guide pages include search boxes which send users to JHSearch.

And yet, users use it

Despite these problems, use of JHSearch for article searching is huge — presumably mostly by MSE users, since MSE is the only Hopkins library that promotes JHSearch. Our statistics showed 45,000 searches using JHSearch between Dec 04 2011 and Feb 14 2012 (the last time we checked). (These numbers are so high it makes me actually doubt the validity of our measurements and wonder if we’re over-measuring somehow. But it seems safe to say this search service is getting used a LOT.)

This extensive use, despite the problems of Metalib-powered search, shows that there is great demand among our users for the kind of search service we aim to provide with JHSearch — one very simple to use, integrated with web pages our users are already at (such as the subject guides in this case), requiring no pre-requisite choices from the user before doing a search.

But we want to, and can, do better than we can with Metalib.

From 2012 LibQual:

  • “My biggest complaint about the library is that while it does offer the Communications and Journalism “Research by Subject” tool, it does not allow you to filter for peer reviewed articles only. As a result, I need to go in and search in each individual database to filter for peer reviewed materials.”
  • “JHSearch is fantastic, but the electronic thesis database is difficult to find and tends to time out. “

Google Scholar

Google has a ‘Google Scholar’ product which tries to aggregate article citations and other scholarly citations — in a single aggregated index that Google creates, rather than by ‘broadcast federated searching’, avoiding many of the problems of broadcast federated search.

Google Scholar does work well for many of our users — and many of our librarians direct users there. This again shows the demand for simple “google like” article search.

Google Scholar is a great service for our users and we will continue to direct them there as appropriate. However, there are some serious limitations to relying on Google Scholar to serve our users’ needs:

  • Anecdotal experience shows that it works better in some disciplinary areas than others (works especially well in the sciences), and in general better for ‘known item’ searches than topical searches.
  • While Google Scholar does provide limited integration with Find It for getting licensed library full text or other library delivery services, its user interface can often misleadingly send users to vendors asking for payment for fulltext, when the library actually licenses that fulltext from another vendor.
  • We have no contract with Google, no support from Google, no way to find out more about the internals of how it works or make feature requests. Google has their own interests at play, and is not necessarily interested in optimizing their service for the needs and interests of our library or our patrons. Google could decide to take away Google Scholar — or end its integration with Find It — at any time.

In general, while we are glad Google Scholar is useful to our users, article search is too important and core a research service for us to give up on a library-provided solution and simply outsource to a free service we have no contract or relationship with. We need to try to meet users’ needs with a service we can control, optimize for their needs, and crucially: integrate with our existing web pages and web services that our patrons are already visiting, to lower the barrier of discovery and use.

Selected comments from 2012 LibQual:

  • “I start with google on the internet, but 2/3 of the time it is not free, then I login here to find the article. I may need to look into how to do this better but it has not been entirely intuitive. I usually find what I need but seems like it could be much smoother.”
  • ” I typically access from Google Scholar and I have my setting set to JHU library. I have found this the easiest way to have access to online articles.”

Focused use case: Simple search

We believe many of our users much of the time want a simple article search option, integrated into our existing website/Catalyst, which requires them to make as few decisions or clicks as possible in order to get to useful search results.

  • JHSearch gets a huge amount of use despite its flaws and insufficiencies — this shows demand for a simple decision-free search.
  • Selected comments from 2012 LibQual:
    • “There is way too much nagivation required for the online resources. It just seems like an antiquated interface, and more times than I care to experience, I have come across articles or journals that are not available in full text unless I purchase them.”
    • ” I love that we have access to so many journals, but they are hard to get to from off campus (ie connectng to the VPN first, then going through the library’s search, then choosing a service to view the journal, then looking for my specific article…)”
    • “However, my complaint would be the website for the library. It doesn’t feel intuitive and I often have a very hard time finding even basic things I need like newspaper articles.”
    • “What is Catalyst? It doesn’t provide articles or reports. What do the databases under your menus provide, so that I know whether to use them for my needs?”
    • “[Suggestion:] 2) use google search for library search portal instead of current search engine, which is kinda sucky…. What currently happens is you do an easy google search and produce a bunch of hits, but the same search terms produce jack squat in the library portal search. So then you have to go article by article through the google results, looking up the journal in the library portal…”
    • “I wish the library homepage was a little easier to navigate…I usually use Google to find articles first and use the library database as a last resort. “
  • Our logs show users entering article titles into just about any search box we offer, even though most of those search boxes cannot find articles this way. Sean Hannan has comments from users clicking on the Catalyst ‘feedback’ link which also show that users expect Catalyst to search articles, and are not aware of, or not finding, our other article search options (especially for known-item searching):
    • “Even something as simple as searching for an article, with known volume number and page numbers, in the American Journal of Public Health took me numerous attempts and many frustrated clicks. Someone, please make this process easy.”
    • “[It is] difficult to find full text journal article online by name of journal. I am trying to locate a specific article by Robin Newhouse written in 2007.”
    • “Hello I am a student in the Discovery Hopkins Program and am taking a 2 week course called Mind Brain and Beauty instructed by Dr Monica Lopez Gonzalez. She has sent s a reading list and I am having trouble finding the articles in the on-line library.”
    • Sean writes: “As well as countless people just pasting in a citation into the feedback box with a “Get this for me.” message.”

For these reasons, we suggest focusing on the following use pattern/usage style:

  • Users who, due to level of experience or just being in a hurry…
  • …Do not want to have to make decisions about what databases to search, or have to go out of their way to find the search tool — they want an article search function seamlessly integrated into our tools….
  • …Which will be more or less “google style”: enter some search terms, get back results. This use case focuses on simple basic search, not fielded/advanced search or faceting.

This does, however, include both use cases of:

  • Searching for articles on a topic
  • Searching for a specific known article by title/author, for purposes of finding full text or other delivery options.

This sort of ‘simple single search’ is not the only way all our users will want to search all the time. But we believe it is a significant usage style that many users will want much of the time; it is a usage style supported least well by our existing services; and, significantly, is one we have the most power to intervene in solving.

In the post-google world, many of our users much of the time want a search as simple as Google. We would want this search service to be closely integrated with our Catalog search, because we know users do not want to have to choose different places to search for different materials, and often are not aware of these different places.

This is essentially the usage style/pattern that Metalib-powered JHSearch aims at now. We would be looking for an improved service to replace the current Metalib-powered JHSearch, rather than to add an additional option — the users engaging in the use patterns we are focusing on do not want multiple options to choose from, and we do not have the resources to support multiple solutions. And it should be integrated with Catalyst.

New Options for Improving Article Search

Historically there wasn’t a lot we could do to improve article search options: we could license lots of databases and show them to users; we could try to use federated search products to provide simple integrated searches. So this is what we did. However, as above, we believe these solutions are not sufficient for our users at present, and new products give us some additional options.

Over the past several years, library industry vendors have come out with ‘discovery services’, which include aggregated indexes of scholarly citations. This is in part a response to the known insufficiencies of the ‘broadcast federated search’ technology used by Metalib, and evidence that an aggregated index could do better. Many of the companies offering these new discovery services based on aggregated indexes also offer older ‘broadcast federated search’ tools, and in some cases seem to be working to phase out the older federated search tool. Ex Libris seems to see Primo as the eventual migration path for current Metalib customers, and does not seem to be doing significant development on current Metalib.

These products include Serials Solutions Summon, Ex Libris Primo, and EBSCO Discovery Service (as well as OCLC’s attempts to include more article content in WorldCat and WorldCat Local).

‘Blended’ results vs. ‘bento style’ results

These new-generation ‘discovery’ products all include aggregated indexes of articles and scholarly content — but they are also all designed to include your catalog and other local metadata in one single ‘blended’ search results list. For instance, Summon as implemented by NCSU: http://ncsu.summon.serialssolutions.com/search?s.q=noam+chomsky+manufacturing+consent Note that the fourth result is a book in the library.

However, there is another way these discovery layers can be used: using them only for their article/scholarly citation searching, and leaving your catalog in another product — but still with an integrated feel. Tito Sierra, who was working at NCSU at the time, called this other option “bento style”, as it’s reminiscent of a Japanese bento lunch box (http://en.wikipedia.org/wiki/Bento).

Bento style examples

And in fact, while NCSU does have a Summon interface that includes their catalog, their library website’s primary search landing point is a bento style interface instead: http://www.lib.ncsu.edu/search/index.php?q=globalization+indonesia&x=0&y=0

Note that catalog results (labelled ‘Books & Media’) are in a different area of the screen than Article results. Other individual boxes include ‘library website’ and ‘databases’. NCSU’s article search is powered by Summon, but their main catalog access is via another system, with results integrated together on one page, but in separate sections.

This bento-style interface has been adopted by several of our peer institutions, for reasons we’ll consider in a moment, including:

  • Columbia, with a ‘Catalog’ section, an ‘Articles’ section, an Institutional Repository section, and a library website section. Their ‘Articles’ area is powered by Summon.
  • University of Virginia, which offers only Catalog and Articles sections, although other kinds of searches are available as additional links. Their ‘Articles’ area is powered by Primo.
  • Stanford recently got a lot of attention for making the main search on their library website result in a ‘bento style’ page as well. Their page actually doesn’t (yet?) include an ‘articles’ section, but includes ‘Books & Media’ (Catalog), Library Web Site, and Databases, as well as a couple blocks of static content.

While the ‘blended’ style initially seems attractive — it is the closest to the ‘google style single search’ we think our users want — there are some significant reasons to prefer ‘bento style’, at least for an initial implementation. These reasons include both users’ UI preferences and our own strategic considerations.

Do users really prefer ‘blended’ Article and Catalog results?

There is somewhat mixed evidence.

Julie Meloni from the University of Virginia reported some findings from user testing at their institution on the Blacklight listserv on Aug 3 2011. They found that their users actually expressed a distaste for ‘blended’ results, leading UVa to implement the ‘bento style’.

  • Process: A/B testing (really A/B/C/D testing) of four interfaces that offered some sort of aggregated search (Stanford, Michigan, Villanova, and University of Central Florida (who is doing the blended results/relevancy rankings if anyone remembers that conversation from NGC4Lib ) if you’re wondering). From those results we determined two critical pieces of data (among several others): patrons come to Virgo knowing the type of item they’re looking for (e.g. book or article), and too much info in search results is not desired.
  • Really important point that came out in user testing here (of our patrons and their needs, with all due respect to others) is that patrons did not want blended results. At all. Across the board dissatisfaction with that approach. This was awesome for us to hear because it meant that we didn’t have to come up with some intricate/ tricky/very fragile way of maintaining article metadata (that legally we couldn’t hold anyway) in our own Solr index such that everything could have our own relevancy rankings applied and so on

A literature survey included in an article by Sue Fahey et al. in Partnership: The Canadian Journal of Library and Information Practice and Research found mixed evidence of user preference for or against ‘blended’ search.

The inclusion of journal content in WCL, viewed favorably by participants in some studies, has conversely proved confusing to participants in other studies. Thomas and Buck observed that “this merger of format types within a single set of results caused confusion as the participants did not easily distinguish between books and articles” (669). The York St John University JISC LMS Project also noted that, “alarmingly, some users seem to be unaware of the difference between books, journals and articles” and had “trouble differentiating between the different types of material returned in the results”

Fahey, Sue; Gordon, Shannon; Rose, Crystal (2011). Seeing Double at Memorial University: Two WorldCat Local Usability Studies. Partnership: The Canadian Journal of Library and Information Practice and Research 6(2), pp. 1-14.

In a recent NISO presentation on Discovery Services (http://www.niso.org/news/events/2012/nisowebinars/discovery_and_delivery/), David Bietila of the University of Chicago provides some quoted user comments on a discovery system with ‘blended’ results they were evaluating, which I think are illuminating in their ambivalence:

  • “Record included books, needed a way to filter this out.”
  • “It’s wonderful to have ONE place where you can search for both articles and books! However, it seems like more books should show up because some books relevant to my search showed up in lens but not in Articles Plus. If you don’t choose this search tool, please do adopt some search tool that allwos comprenehsive searching of books and articles!”

The second user comment here reveals that the user valued having one place to search both books and articles — but that in the ‘blended’ search, books became somewhat lost among the articles. Meanwhile, in the first comment, the user wanted to “filter out” the books! (The interface actually did include such a feature, but the user did not find it, further emphasizing the need for a simple search that just works for many use cases, without needing to apply additional filters or facets.)

My hypothesis to explain these ambivalent results is that:

  • Users DO want ‘one place’ to search both articles and books; they do not want to have to find another web page to visit to search alternate things.
  • However, the nature of library content and current technology makes it difficult to create usable ‘blended’ results; articles may crowd out catalog materials, or vice versa.

A ‘bento style’ presentation may be the best way to deal with this contradiction at present.

Additionally, it is difficult to provide consistent faceting, limits, or advanced ‘fielded’ search over a combined corpus of both catalog materials and articles — the metadata is too different in these different collections.

We don’t know for sure that users don’t want blended search results, but we also don’t know for sure that they do, and we have some reason to think they may even dislike it. In the presence of this ambivalence, there are some important strategic reasons to prefer a ‘bento style’ approach.

Strategic reasons to prefer ‘bento style’ approach

We don’t want to be forced to change the Catalog at the same time we improve Articles

Adopting a discovery layer with ‘blended’ results for our Catalog is incompatible with our current Blacklight implementation, and would require abandoning it.

We are not locked into our Blacklight implementation forever; at some point we may want to evaluate a change in direction there. But we don’t want to be forced to deal with this now in order to improve article search.

Our current attempt at meeting basic article search usage patterns (Metalib-based JHSearch) is so bad that it’s not hard to find an improvement. But it would be much more controversial and time-consuming to evaluate whether a change to the public catalog is an improvement.

And implementing a change to the public-facing catalog as well would be much more time-consuming than implementing an improved article search alone.

The Article Search Study, serving as a ‘proof of concept’, leads us to believe that we can bring up a bento-style interface with article search, in our existing infrastructure, relatively quickly.

In general, in the Systems department, we try to keep different components as independently implemented as possible, so that each one leaves our options open for the others, and we can focus on improving each area independently.

More control of our interface, more options, less vendor lock-in

The ‘bento style’ also leads us to realize we can consider not only new (and expensive) “discovery services” as article search providers, but also more traditional A&I or other comprehensive databases we may already license and be able to use at significantly less cost — so long as they have suitable APIs we can make use of to create a ‘bento style’ interface.

The ‘bento style’ approach is necessarily implemented by writing local code that uses the API of a search product to integrate its functionality into our application(s) (i.e., Catalyst). This approach leaves us in control of our UI, rather than outsourcing our UI to a vendor’s product. While outsourcing the UI to a vendor’s product is less work for us (as it involves little to no local development), current systems strategy is to maintain control of our UI to better serve our users and avoid vendor lock-in, where feasible. The ‘proof of concept’ of the Article Search Study leads us to think it is feasible here.
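
To make this concrete, here is a minimal sketch of the kind of local ‘glue’ code involved. The endpoint URL, parameters, and response field names are hypothetical placeholders, not any particular vendor’s actual API: our code calls the provider over HTTP and maps the response into a small result structure of our own, which our Catalyst-side templates can then render however we choose.

```python
# Minimal sketch of the 'bento style' integration approach described above.
# The endpoint URL, query parameters, and JSON field names are hypothetical
# placeholders, not any specific vendor's real API.

from dataclasses import dataclass, field
import requests  # assumes the 'requests' library is available


@dataclass
class ArticleResult:
    """Normalized article record: the only shape our UI code ever sees."""
    title: str
    authors: list = field(default_factory=list)
    year: str = ""
    link_url: str = ""  # e.g. a link into our own fulltext/delivery options


class HypotheticalArticleProvider:
    """Adapter around one article search provider's HTTP API."""

    def __init__(self, api_key, base_url="https://api.example-provider.com/search"):
        self.api_key = api_key
        self.base_url = base_url

    def search(self, query, per_page=10):
        # Call the provider's (hypothetical) search endpoint.
        resp = requests.get(
            self.base_url,
            params={"q": query, "limit": per_page, "key": self.api_key},
            timeout=5,
        )
        resp.raise_for_status()
        # Map the vendor-specific response into our own normalized records,
        # so our templates never see vendor-specific fields.
        return [
            ArticleResult(
                title=item.get("title", ""),
                authors=item.get("authors", []),
                year=item.get("year", ""),
                link_url=item.get("link", ""),
            )
            for item in resp.json().get("results", [])
        ]
```

Because our templates only ever see these normalized records, swapping in a different provider later means writing another small adapter like this one, not reworking the UI.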

Additionally, the ‘bento style’ approach based on API use will let us fairly easily switch our article search provider at a later date, with minimized development time and disruption of our users’ familiar interfaces. The Article Search Study — which required us to present results from different search providers in an identical format — served as proof of concept confirming this.

The ‘bento style’ also allows us to add sections of other search content, such as a library web site search or an embedded WorldCat search. We do not need to wait until a discovery vendor makes an agreement with OCLC or implements WorldCat search — the ‘bento style’ approach lets us put the article search vendor in one ‘section’, and ourselves add other content from other sources in other sections, as sketched below. (Internal proof-of-concept prototypes suggest WorldCat and Library Web Site search are both feasible.)
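
As a sketch of that composition (the section names and provider classes here are stand-ins, assuming each exposes a search method like the adapter sketched earlier), a bento page can fan one query out to several independent backends in parallel, and a slow or failing section simply renders empty rather than breaking the whole page:

```python
# Minimal sketch of composing independent 'bento' sections from one query.
# The provider objects are stand-ins; each just needs a .search(query) method
# returning a list of normalized results (as in the adapter sketch above).

from concurrent.futures import ThreadPoolExecutor


def bento_search(query, sections, timeout=8):
    """Run every section's search in parallel and return {section: results}."""
    results = {}
    with ThreadPoolExecutor(max_workers=max(1, len(sections))) as pool:
        futures = {name: pool.submit(provider.search, query)
                   for name, provider in sections.items()}
        for name, future in futures.items():
            try:
                results[name] = future.result(timeout=timeout)
            except Exception:
                # A failing or slow backend degrades to an empty section
                # instead of taking down the whole results page.
                results[name] = []
    return results


# Hypothetical usage: one box per backend, each independently replaceable.
# sections = {
#     "Articles": HypotheticalArticleProvider(api_key="..."),
#     "Books & Media": CatalogSearchClient(),      # e.g. our existing catalog index
#     "Library Website": WebsiteSearchClient(),
# }
# page_data = bento_search("globalization indonesia", sections)
```

Adding a WorldCat or library website section is then just another entry in that mapping, independent of whichever vendor powers the articles box.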

Conclusion: reiterating our introduction

We believe that the time is right to continue with improving article search as a priority.

We believe that ‘bento style’ search provides, at present, the best cost/benefit way to improve article search relatively quickly with relatively large user benefit. We will continue with this as a goal.

18 thoughts on “Article Search Improvement Strategy”

  1. Hi Jonathan. Wow. Maybe for the first time I’m glad I don’t have the resources and skills you have at your disposal. I feel like Africa jumping to cell phones and skipping landlines. Summon seems like such a no-brainer for us. Not only is the vendor responsive to suggestions for improvement – the ‘users’ are professionals scattered over the globe channeling their client needs, wants and disappointments. And if you’re really at a loose end there’s the API.

    It’s not utopia but if you’re not at Johns Hopkins (or equivalent) it’s a heck of a way of pooling your purchasing power.

    Thanks for the insights in this piece.

    Alan.

  2. Alan, thanks for the response. Are you using Summon for your local user-facing catalog as well, do you send users to it as your primary catalog interface? To a ‘blended’ search that includes articles and catalog materials in the same search results? If so, I am interested what the response to this has been.

    It would certainly be possible to use Summon as an article search provider (or even both an article-search and catalog-search provider) in a ‘bento style’ interface. I do believe for the reasons argued above that a bento style search may often serve the users better than a ‘blended’ search — but certainly if you have no local development capacity whatsoever, it can make a lot of sense to essentially ‘outsource’ your UI to a vendor’s product instead.

    In my evaluation of various discovery/article search APIs, I was quite happy with Summon’s, in terms of API ease of use, functional completeness, and response times. There will be more on that in the nearish future. But its API is indeed good, although that won’t matter if you are just using their default interface and making no use of the API.

    One thing that may keep Summon from being a “no-brainer” for many is its price.

  3. Alan, PS: I will reveal that part of the reason such a lengthy analysis is useful for us here is because we are a fairly large, and somewhat decentralized organization — getting all the various decision-makers on the same page with regard to what we’re going to do or how we’re going to spend our resources (staff or $)…. takes some work. Such as analysis like this. Even if it was ‘no-brainer’ to me personally to do something or other (it is not), we’d still need to do lots of research and analysis to get the organization to move in that direction, heh. So, to the extent that having more resources often comes along with having large, ponderous, decentralized organizations, I suppose you are right to be glad you don’t have it. :) At least half of the analysis here is just to get us locally on the same page that we should currently, for the moment, be spending our time on improving article search options rather than worrying about the catalog.

  4. Hi Jonathan. We are using Summon as our default search box on our home page. A catalogue result in Summon will show holdings/availability but does link through to the item in the catalogue. The response has been muted. We had a few ‘where’s the catalogue’ enquiries, mainly from academics which we’ve tried to accommodate. We even had some people missing broadcast federated search early on. Use continues to grow somewhere between linearly and exponentially. We have campuses across two countries and we invest heavily in electronic resources, making access to the physical collection housed mainly at one campus, less of an equity issue.

    The primary aim was to knock down barriers for those new to lit searching caused by the data silos you mentioned. Your point about searchers needing to know a database title, and then limiting themselves to the one they know is confirmed by my experience.

    After two years a significant part of the student body has been refreshed and never known any different. I sometimes ponder what the response would be if we reinstated the catalogue search on the home page and provided a link to 400 database titles, libguides, institutional repository and a fed search tool, and then I start whistling ‘Big Yellow Taxi’ ;-)

    As for price… at face value it may appear expensive but if you add up the cost of developers, system maintainers, hardware, backup, data imports, indexing, and weight your total outlay on resources and the value you get from exposing them through a discovery layer it still seems like a bargain on my abacus. I don’t want to sound like a proselytizer. It really is horses for courses.

    Cheers, Alan.

  5. Jonathan–nice analysis; it matches many of our own findings. Over the past year or so we (University of North Texas) have been working on implementing a resource-discovery interface improvement plan that we developed after doing a very similar analysis. Our plan has included a catalog redesign and a website redesign as well as acquisition/implementation of Summon, among other things. So–while right now I think we’re shooting for the bento-box (or a bento-box-like) approach too, it’s going to take us a little bit longer to get there.

    We launched our Summon instance in February and just released our new website in August. We’re using Summon to power articles searches and our catalog to power a books & media search (with no intermediate bento-style interface at this point). We haven’t indexed our catalog in Summon, and I think we’re going to leave it that way for the foreseeable future. Although initially we were open to the idea of a single search box and blended results, evidence keeps stacking up that seems–at least, to us–to support a “users may not want to mix articles and books” hypothesis.

    The early stages of our website redesign included a series of participatory design studies where, among other things, we wanted to get an idea about what sort of initial discovery search our users would prefer on the library homepage. The first part of the study involved showing them print-outs of several different, strategically-selected homepages from various academic libraries and getting them to talk about what they thought would be useful and not useful to them, and the second part involved asking them to design their ideal library homepage. For the second task, we gave them a pencil and some cutouts of various components from the library homepages they looked at during the first task. The homepages that we gave them included interfaces with just one simple, unadorned search box; tabbed search boxes that included a combined search along with a separate books search and articles search; and tabbed search boxes that didn’t include a combined search. Almost universally the users we tested preferred the search boxes that presented multiple search options over the ones that consisted of just a single text input element. Several people said explicitly that 1. they liked seeing what their options were, 2. they liked knowing what they were searching, and 3. the plain search box interfaces didn’t give them enough information about what they were doing to make them feel comfortable. And the designs that they came up with reflected those preferences. We tested a good variety of users–several undergraduates (most of whom were not regular library patrons), a few grad students, some faculty members, and a couple of librarians. Students were from a wide variety of disciplines. Oddly, the one and only person we tested that preferred the simple search box was a librarian.

    I don’t think our study is exactly conclusive (it measured preference rather than performance and certainly didn’t control for every variable), and it didn’t address the question of combined vs separate results. It didn’t address anything beyond the homepage. The search box example that users preferred most did in fact include a combined search tab along with a separate books and a separate articles tab, and in fact a couple of users told us that they liked and would probably use the combined search option. But the near universal rejection of the simple search box really surprised us. It seems to suggest that users–our users, at least–do make some sort of distinction in their mind about types of library resources and that at some level they expect different types/pools of resources to be kept or treated separately in some way. Combined with some of the findings you referenced (and some other things that we’ve found), I definitely think there’s plenty of reason to question the prevailing notion that came out of the last decade suggesting that the majority of users just want the library’s search to emulate Google. It doesn’t seem to be nearly that simple. This is an area that needs further study.

  6. Thanks Jason! Super useful. Keep your eye out for more I’ll be publishing, here and elsewhere, on our investigations in this area. One thing I am too impatient to wait to start putting out there is: once you’ve decided on bento style with catalog and articles kept in separate areas, you _potentially_ have the option of using some more ‘traditional’ sources for the article search, traditional A&I type sources of various kinds, so long as they have a sufficient API.

  7. There’s also the question of what users SAY they want versus what their behavior actually gravitates to. I’d want to do some A/B testing or similar (especially against a search-everything-then-facet system) before I relied unhesitatingly on results from ask-the-user studies.

    These studies are interesting stuff, though, so thanks for sharing, Jason!

  8. Great post! Just on the article search aspect, google itself might be a possible source. For example, this google CSE (http://www.google.com/cse/home?cx=007573061199770941539:tq8w6o8sbb4) has these sites as its sources:

    http://bmj.com
    http://eric.ed.gov
    http://ingentaconnect.com
    http://linkinghub.elsevier.com
    http://nature.com
    http://ncbi.nlm.nih.gov
    http://onlinelibrary.wiley.com
    http://psycnet.apa.org
    http://sciencemag.org
    http://tde.sagepub.com
    http://www.routledge-ny.com
    http://www.springerlink.com

    I haven’t looked into this for a while but my sense is that a google API with a high volume CSE is cheaper than paying for an API from a discovery vendor, plus google’s indexing seems more up to date for many of the same sources in my admittedly limited testing. You can build a much more formidable list than this but you get the idea.

  9. Huh, that is an interesting idea, that I hadn’t actually considered even though I’ve been working with the google site API lately too…. very intriguing idea thanks Art!

    You would have a very basic search though, mostly just keyword search, maybe a few fields. No limiting to ‘peer reviewed only’ (something our users ask for), probably no ‘publication date’ range search, no facets, etc. Might be good enough, at the price. Very interesting idea.

  10. Art, ah, you know what the problem of using google CSE API to power article search is…. the ‘appropriate copy problem’. Once my users find an article they are interested in, I need to get them to a licensed fulltext copy (if available), or an ILL form, etc. Google API does not give me sufficient semantics to do either of those things. All I can do is send them to the URL of the harvested google content, whereas we may have a licensed copy from another source (or a print copy, or want to fill out an ILL form for them).
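
    To make that concrete (a rough sketch only, and the resolver base URL below is just a placeholder): getting a user from a citation to licensed fulltext or an ILL form generally means building an OpenURL from article-level fields like journal title, volume, start page, and date. A harvested web URL from a general search API gives us none of those fields to build from.

```python
# Sketch: building an OpenURL (Z39.88 key/value format) for a link resolver
# from citation-level metadata. The resolver base URL is a hypothetical placeholder.
from urllib.parse import urlencode


def openurl_for_article(atitle, jtitle, volume, spage, date, issn=None):
    """Return a link-resolver URL for one article citation."""
    params = {
        "ctx_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        "rft.genre": "article",
        "rft.atitle": atitle,
        "rft.jtitle": jtitle,
        "rft.volume": volume,
        "rft.spage": spage,
        "rft.date": date,
    }
    if issn:
        params["rft.issn"] = issn
    return "https://resolver.example.edu/findit?" + urlencode(params)

# A search API that only hands back a title and a harvested URL cannot supply
# jtitle/volume/spage/date, so we cannot construct this link for the user.
```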

  11. I should confess that my idea of a bento box is here: http://winspace.uwindsor.ca/jamun/mockup.jpg, or at least, that’s how I perceived it in 2010 and it’s not the slickest mockup in the world. I was greatly influenced by one of my kids’ (I have 3 in university) investigations into water quality at the time, and I was hung up on which source had the most up to date indexing. My mockup punted to Google Scholar for the more intricate stuff, and I purposely wanted to fold the process into general web site searching, so I was probably approaching heretical status on several fronts. You can glean some semantics through the google API through the classes, e.g. the value of the node marked with “gs-title” and so on, but I didn’t burrow very far into that rabbithole and I wasn’t thinking far beyond ezproxy hostnames. The costing aspect is definitely of interest but the key is probably which source has the most relevant and up-to-date content, and surfaces that content at the top. Google excelled at currency for the sources I looked at but I will send you a table of what I unearthed. I suspect my bento box is more of an appetizer delivery tool than the full meal.

  12. Bento style is simply more evolved than blended search. What I like about either of these approaches is that students who are disinclined to even try our catalog of books, turning instead to purely article-style databases, will be presented with viable book options whether they intended this or not. In other words, the combined searching you describe (way preferably bento) is the best possible advertisement for our still-potentially-useful, gigantic print collection.

  13. Art, in general, I am not sure how many use cases we serve where “must have up to date indexing of content from last 6 months” is a high priority. Certainly sometimes it is (probably including medical/clinical!), but I think we probably serve an awful lot of use cases where it is not. Although this may change, as scholarly publishing continues to get more immediate in the age of the web.

    Regardless, we only have so many options for an integrated article search, and they probably all have about the same currency. (Google is not one of them, for reasons partially delved into above, although we will certainly continue to ALSO direct users to Google/Google Scholar as appropriate).
