A position paper prepared for internal use, but why not share it with you all? If you prefer a PowerPoint presentation, I have that too.
(Experimenting with putting phrases in bold for readability in a short-attention-span, twitterfied world. Don’t know if it really improves readability, or just makes it look like an infomercial.)
Summary of Arguments
- We should prioritize improving article search for our users
- A “bento style” search interface implemented in Catalyst is, right now, the right strategy for us to pursue in terms of:
  - our overall strategic directions
  - cost/benefit: we think it can provide significant user benefit with high feasibility and low development time, compared to other options.
We know that many of our users spend as much or more time looking for articles (both known-item and topic searches) as they do looking for Catalog materials. While this may vary between users in different disciplines and at different stages of their academic careers, we know it’s important to a great many of our users. This is why we spend so much money on article fulltext and A&I databases, of course.
North Carolina State University (NCSU) provides a search tool that searches both Catalog and Articles, putting results in different areas of the screen (we’ll look more at this style of interface later). NCSU found that 45% of their user clicks on results were in the ‘Articles’ results, vs 35% in the ‘Catalog’ results. (http://crl.acrl.org/content/early/2012/01/09/crl-321.full.pdf+html).
Selected comments from 2012 LibQual indicating importance of article search and/or licensed databases:
- “Having full-length scholarly articles available online (that can be accessed off-premises) is the #1 useful thing I use the library for!”
- “While I have not physically been to the library, I have had occasion to use the electronic access and find it very useful obtaining articles for research.”
- “I find the online resources very useful. I don’t usually spend a lot of time at the library itself, but I very often use the search engine to find journal articles, and have so far always been happy with the results.”
- “I love the access to the online databases it is very useful for research”
- “I mainly use the medical databases offered through Welch. These websites cost a fortune to access without the JHU pass.”
And more LibQual comments indicating some problems (we’ll see more later):
- “I’ve always had really great experiences, and the main thing I’d suggest improvement for would be the library website’s search function for articles & databases”
- “It is also sometimes difficult for me to find peer-reviewed articles on a particular topic if I don’t already have a citation.”
We know article search is important to our users. And, as the following will show, we know that our existing solutions are not satisfying users. But we have not traditionally spent as much time improving article search options as we have improving Catalog search options, for a variety of reasons including historical predilections and lack of options. It’s time we invested in improving our article search services.
Existing article search functionality that is integrated into library services includes:
- Licensed databases, provided in the JHSearch directory, available for individual access and searching.
- The Metalib-powered JHSearch federated search tool
- Google Scholar
The most ‘traditional’ way of supporting article search is with our licensed databases. We have hundreds of licensed databases, and each may have its own particular, sophisticated search interface.
This does not satisfy users who do not want to learn and deal with multiple separate vendor interfaces, but do not have a single database that meets all their needs.
It also requires users to take extra steps to realize there is a list of databases, to choose from this list, and to deal with an unfamiliar interface. In the current environment, many users much of the time just want to be able to ‘search for articles’ without these extra steps.
Even when we provide a by-subject directory of databases, users have trouble picking databases to use (assuming they found the directory in the first place). In a Bowling Green State University study, librarians found:
…most students at BGSU choose to use databases whose names they recognize, and, students who do not know of a named database to use have a great deal of difficulty otherwise identifying one appropriate for their search topic, even when using library-provided subject lists and descriptions. Three students specifically mentioned that they would probably just go to Google or Google Scholar.
Amy Fry, Linda Rich (2011) Usability Testing for e-Resource Discovery: How Students Find and Choose e-Resources Using Library Web Sites. The Journal of Academic Librarianship 37(5), pp. 386–401
From our own 2012 LibQual:
- “The electronic resources are generally good, but sometimes it’s hard to figure out which database to look in to find a specific journal or resource.”
We suspect that some users end up using only a single licensed database in their work (JStor is a popular one), not necessarily because it includes all the content they are interested in (until recently JStor had no recent content, but many users relied on it exclusively anyway), but because it has a simple interface and they don’t want to deal with multiple databases.
The “first pick a database” approach is also particularly poor when a user has citation information for a known article and wants to look it up for full text or other library delivery options; they don’t want to first guess which database might include this article.
Our directory of individual licensed databases will not be going away: these databases are powerful tools providing sophisticated and focused searching for those who need it and are willing to invest the time to learn them. But we believe this method of offering article search is not sufficient for our users’ needs.
Metalib is a product we license which falls into the category of “broadcast federated search”. When a user enters a query, Metalib goes out and searches multiple databases at once, and then tries to combine the results from these databases in a blended result set.
This class of product started appearing in the library market well over a decade ago, and was meant to provide a simpler search environment to deal with some of the downsides of directories of databases as avenues for article search.
We’ve licensed Metalib for at least 8 years, and approximately 5 years ago we revamped its user interface using the open source Xerxes front-end, to try to ameliorate some significant problems with Metalib’s native UI.
Metalib was intended precisely to deal with the problems outlined above of sending users to individual databases, providing a simpler, more consistent, and integrated search service.
However, we have never been very happy with Metalib’s search results. The technology used to do ‘broadcast federated search’ is inherently flawed.
- The product is very slow to return results.
- Relevancy ranking is poor when blending results from different databases.
- A broadcast federated search offering can only offer limited and inconsistent faceting, limiting, or fielded search, because of inconsistencies between databases.
The nature of the broadcast search technology is such that JHSearch can only search 5-10 databases simultaneously. We could push that a bit higher, but not to all of our licensed databases, so users still need to choose which databases to search (or find a librarian-selected, subject-specific set, which is what our tools actually offer). It also means that Metalib-powered JHSearch performs very poorly for specific ‘known item’ searches, since the article you are looking for has to happen to be in the handful of databases being searched.
These are challenges for the ‘broadcast federated search’ technological approach, and they have been known for some time: I first wrote about them in 2007 in Library Journal ((Meta)search like Google, Library Journal 2/15/2007, http://www.libraryjournal.com/article/CA6413442.html).
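To make these limitations concrete, here is a minimal Python sketch of how a broadcast federated search works in principle. It is not Metalib’s actual implementation: the two database functions, their latencies, and their score scales are all hypothetical. It illustrates two structural problems: the user waits for the slowest database (or for a timeout, which silently drops that database), and blending raw relevance scores from databases that score on different scales produces a poor ranking.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait

# Hypothetical stand-ins for remote vendor databases. Each returns
# (title, score) pairs, but each vendor scores relevance on its own scale.
def search_db_a(query):
    time.sleep(0.05)  # simulated network round trip
    return [(f"A: {query} review", 0.92), (f"A: {query} survey", 0.40)]

def search_db_b(query):
    time.sleep(0.6)  # one slow database holds up the whole result page
    return [(f"B: {query} study", 17.0), (f"B: {query} note", 3.5)]

def broadcast_search(query, sources, timeout=1.0):
    """Send one query to every source at once, then blend whatever comes back."""
    pool = ThreadPoolExecutor(max_workers=len(sources))
    futures = [pool.submit(source, query) for source in sources]
    # The user waits until every source answers or the timeout expires.
    done, _not_done = wait(futures, timeout=timeout)
    pool.shutdown(wait=False)  # stop waiting on sources that missed the deadline
    results = []
    for future in done:
        results.extend(future.result())
    # Naive blending: sort on raw vendor scores. Because the scales differ,
    # database B's hits always outrank database A's, whatever their merit.
    return sorted(results, key=lambda hit: hit[1], reverse=True)

if __name__ == "__main__":
    for title, score in broadcast_search("dengue", [search_db_a, search_db_b]):
        print(f"{score:6.2f}  {title}")
```

An aggregated index avoids both problems by searching one pre-built index with one consistent scoring scheme, rather than querying many live databases per search.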
Largely because of these problems, SAIS has never chosen to direct their users to our JHSearch federated search product. Welch does use Metalib in their own custom way, although not via our JHSearch service (I do not have usage or satisfaction information for Welch’s Metalib service, although it would be interesting to see). MSE, however, has highlighted JHSearch in our offerings to users, and MSE’s subject guide pages include search boxes which send users to JHSearch.
Despite its problems, JHSearch gets huge use for article searching, presumably mostly by MSE users, since MSE is the only Hopkins library that promotes JHSearch. Our statistics showed 45,000 searches using JHSearch between Dec 04 2011 and Feb 14 2012 (the last time we checked). (These numbers are so high that I actually doubt the validity of our measurements and wonder if we’re somehow over-counting. But it seems safe to say this search service is getting used a LOT.)
This extensive use, despite the problems of Metalib-powered search, shows that there is great demand among our users for the kind of search service we aim to provide with JHSearch: one that is very simple to use, integrated with web pages our users are already at (such as the subject guides in this case), and requires no prerequisite choices from the user before doing a search.
But we want to, and can, do better than Metalib.
From 2012 LibQual:
- “My biggest complaint about the library is that while it does offer the Communications and Journalism “Research by Subject” tool, it does not allow you to filter for peer reviewed articles only. As a result, I need to go in and search in each individual database to filter for peer reviewed materials.”
- “JHSearch is fantastic, but the electronic thesis database is difficult to find and tends to time out. “
Google has a ‘Google Scholar’ product which tries to aggregate article citations and other scholarly citations — in a single aggregated index that Google creates, rather than by ‘broadcast federated searching’, avoiding many of the problems of broadcast federated search.
Google Scholar does work well for many of our users — and many of our librarians direct users there. This again shows the demand for simple “google like” article search.
Google Scholar is a great service for our users and we will continue to direct them there as appropriate. However, there are some serious limitations to relying on Google Scholar to serve our users’ needs:
- Anecdotal experience shows that it works better in some disciplinary areas than others (works especially well in the sciences), and in general better for ‘known item’ searches than topical searches.
- While Google Scholar does provide limited integration with Find It for getting licensed library full text or other library delivery services, its user interface can often misleadingly send users to vendors asking for payment for fulltext when the library actually licenses that fulltext from another vendor.
- We have no contract with Google, no support from Google, no way to find out more about the internals of how it works or make feature requests. Google has their own interests at play, and is not necessarily interested in optimizing their service for the needs and interests of our library or our patrons. Google could decide to take away Google Scholar, or end its integration with Find It, at any time.
In general, while we are glad Google Scholar is useful to our users, article search is too important and core a research service for us to give up on a library-provided solution and simply outsource it to a free service we have no contract or relationship with. We need to try to meet users’ needs with a service we can control and optimize for their needs, and, crucially, integrate with the existing web pages and web services our patrons already visit, to lower the barrier to discovery and use.
Selected comments from 2012 LibQual:
- “I start with google on the internet, but 2/3 of the time it is not free, then I login here to find the article. I may need to look into how to do this better but it has not been entirely intuitive. I usually find what I need but seems like it could be much smoother.”
- “I typically access from Google Scholar and I have my setting set to JHU library. I have found this the easiest way to have access to online articles.”
We believe many of our users much of the time want a simple article search option, integrated into our existing website/Catalyst, which requires them to make as few decisions or clicks as possible in order to get to useful search results.
- JHSearch gets a huge amount of use despite its flaws and insufficiencies, which shows demand for a simple, decision-free search.
- Selected comments from 2012 LibQual:
- “There is way too much nagivation required for the online resources. It just seems like an antiquated interface, and more times than I care to experience, I have come across articles or journals that are not available in full text unless I purchase them.”
- “I love that we have access to so many journals, but they are hard to get to from off campus (ie connectng to the VPN first, then going through the library’s search, then choosing a service to view the journal, then looking for my specific article…)”
- “However, my complaint would be the website for the library. It doesn’t feel intuitive and I often have a very hard time finding even basic things I need like newspaper articles.”
- “What is Catalyst? It doesn’t provide articles or reports. What do the databases under your menus provide, so that I know whether to use them for my needs?”
- “[Suggestion:] 2) use google search for library search portal instead of current search engine, which is kinda sucky…. What currently happens is you do an easy google search and produce a bunch of hits, but the same search terms produce jack squat in the library portal search. So then you have to go article by article through the google results, looking up the journal in the library portal…”
- “I wish the library homepage was a little easier to navigate…I usually use Google to find articles first and use the library database as a last resort.”
- Our logs show users entering article titles into just about any search box we offer, even though most of those boxes cannot find articles this way. Sean Hannan has comments from users clicking on the Catalyst ‘feedback’ link which also show that users expect Catalyst to search articles, and are not aware of, or not successfully finding, our other article search options (especially for known-item searching):
- “Even something as simple as searching for an article, with known volume number and page numbers, in the American Journal of Public Health took me numerous attempts and many frustrated clicks. Someone, please make this process easy.”
- “[It is] difficult to find full text journal article online by name of journal. I am trying to locate a specific article by Robin Newhouse written in 2007.”
- “Hello I am a student in the Discovery Hopkins Program and am taking a 2 week course called Mind Brain and Beauty instructed by Dr Monica Lopez Gonzalez. She has sent s a reading list and I am having trouble finding the articles in the on-line library.”
- Sean writes: “As well as countless people just pasting in a citation into the feedback box with a “Get this for me.” message.”
- Users who, due to level of experience or just being in a hurry…
- …Do not want to have to make decisions about what databases to search, or have to go out of their way to find the search tool — they want an article search function seamlessly integrated into our tools….
- …Which will be more or less “google style”, enter some search terms get back results. This use case focuses on simple basic search, not fielded/advanced or faceting.
This does, however, include both use cases:
- Searching for articles on a topic
- Searching for a specific known article by title/author, for purposes of finding full text or other delivery options.
This sort of ‘simple single search’ is not the only way all our users will want to search all the time. But we believe it is a significant usage style that many users will want much of the time; it is the usage style our existing services support least well; and, significantly, it is the one where we have the most power to intervene.
In the post-google world, many of our users much of the time want a search as simple as Google. We would want this search service to be closely integrated with our Catalog search, because we know users do not want to have to choose different places to search for different materials, and often are not aware of these different places.
This is essentially the usage style/pattern that Metalib-powered JHSearch aims at now. We would be looking for an improved service to replace the current Metalib-powered JHSearch rather than add another option: the users engaging in the use patterns we are focusing on do not want multiple options to choose from, and we do not have the resources to support multiple solutions. And it should be integrated with Catalyst.
Historically there wasn’t a lot we could do to improve article search options: we could license lots of databases and show them to users, and we could try to use federated search products to provide simple integrated searches. So this is what we did. However, as argued above, we believe these solutions are not sufficient for our users at present, and new products give us some additional options.
Over the past several years, library industry vendors have come out with ‘discovery services’, which include aggregated indexes of scholarly citations. This is in part a response to the known insufficiencies of the ‘broadcast federated search’ technology used by Metalib, and to evidence that an aggregated index could do better. Many of the companies offering these new discovery services also offer older ‘broadcast federated search’ tools, and in some cases seem to be working to phase out the older tool: Ex Libris seems to see Primo as the eventual migration path for current Metalib customers, and does not seem to be doing significant development on Metalib.
These products include Serials Solutions Summon, Ex Libris Primo, and EBSCO Discovery Service (as well as OCLC’s attempts to include more article content in WorldCat and WorldCat Local).
These new-generation ‘discovery’ products all include aggregated indexes of articles and scholarly content, but they are also all designed to include your catalog and other local metadata in one single ‘blended’ search results list. For instance, Summon as implemented by NCSU: http://ncsu.summon.serialssolutions.com/search?s.q=noam+chomsky+manufacturing+consent Note that the fourth result is a book in the library.
However, these discovery layers can also be used another way: only for their article/scholarly citation searching, leaving your catalog in another product, but still with an integrated feel. Tito Sierra, who was working at NCSU at the time, called this other option “bento style”, as it’s reminiscent of a Japanese bento lunch box (http://en.wikipedia.org/wiki/Bento).
And in fact, while NCSU does have a Summon interface that includes their catalog, their library website primary search landing point is a bento style interface instead: http://www.lib.ncsu.edu/search/index.php?q=globalization+indonesia&x=0&y=0
Note that catalog results (labelled ‘Books & Media’) are in a different area of the screen than Article results. Other individual boxes include ‘library website’ and ‘databases’. NCSU’s article search is powered by Summon, but their main catalog access is via another system, with results integrated together on one page, but in separate sections.
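A bento page, by contrast, never has to merge result lists. The following is a minimal Python sketch under hypothetical assumptions: the three search functions stand in for real back ends (the local catalog, an article search API, a website search), and the section names simply mirror NCSU’s labels. Each section is queried on its own, and its results stay in their own box.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical back ends. In a real bento page each would call a different
# system: the local catalog, an article search API, a website search, etc.
def catalog_search(query, limit):
    return [f"Book: {query} ({i})" for i in range(1, limit + 1)]

def article_search(query, limit):
    return [f"Article: {query} ({i})" for i in range(1, limit + 1)]

def website_search(query, limit):
    return []  # no hits: the box still appears, it is just empty

# One entry per box on the results page; dicts keep insertion order,
# so this also fixes the order in which the boxes are shown.
SECTIONS = {
    "Books & Media": catalog_search,
    "Articles": article_search,
    "Library Website": website_search,
}

def bento_search(query, limit=3):
    """Query every section's back end, keeping each result list separate."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn, query, limit) for name, fn in SECTIONS.items()}
        return {name: future.result() for name, future in futures.items()}

def render(page):
    for name, hits in page.items():
        print(f"== {name} ==")
        for hit in hits or ["(no results)"]:
            print(f"  {hit}")

if __name__ == "__main__":
    render(bento_search("globalization indonesia", limit=2))
```

Because no section’s results are ever ranked against another section’s, this design needs no cross-collection relevancy ranking, which sidesteps the blended-relevancy problem described above for Metalib.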
This bento-style interface has been adopted by others of our peer institutions, for reasons we’ll think about in a moment, including:
- Columbia, with a ‘Catalog’ section, an ‘Articles’ section, and also Institutional Repository and library website sections. Their ‘articles’ area is powered by Summon.
- University of Virginia, which offers only Catalog and Article sections, although other kinds of searches are available as additional links. Their ‘articles’ area is powered by Primo.
- Stanford recently got a lot of attention for making the main search on their library website result in a ‘bento style’ page as well. Their page actually doesn’t (yet?) include an ‘articles’ section, but includes ‘Books & Media’ (Catalog), Library Web Site, and Databases, as well as a couple blocks of static content.
While the ‘blended’ style initially seems attractive (it is the closest to the ‘google style single search’ we think our users want), there are some significant reasons to prefer ‘bento style’, at least for an initial implementation. These reasons include both users’ UI preferences and our own strategic reasons.
There is somewhat mixed evidence.
Julie Meloni from the University of Virginia reported some findings from user testing at their institution on the Blacklight listserv on Aug 3 2011. They found that their users actually expressed a distaste for ‘blended’ results, leading UVa to implement the ‘bento style’.
- Process: A/B testing (really A/B/C/D testing) of four interfaces that offered some sort of aggregated search (Stanford, Michigan, Villanova, and University of Central Florida (who is doing the blended results/relevancy rankings if anyone remembers that conversation from NGC4Lib ) if you’re wondering). From those results we determined two critical pieces of data (among several others): patrons come to Virgo knowing the type of item they’re looking for (e.g. book or article), and too much info in search results is not desired.
- Really important point that came out in user testing here (of our patrons and their needs, with all due respect to others) is that patrons did not want blended results. At all. Across the board dissatisfaction with that approach. This was awesome for us to hear because it meant that we didn’t have to come up with some intricate/ tricky/very fragile way of maintaining article metadata (that legally we couldn’t hold anyway) in our own Solr index such that everything could have our own relevancy rankings applied and so on
A literature survey included in an article by Sue Fahey et al in the “Partnership: the Canadian Journal of Library and Information Practice & Research” found mixed evidence of user preference for or against ‘blended’ search.
The inclusion of journal content in WCL, viewed favorably by participants in some studies, has conversely proved confusing to participants in other studies. Thomas and Buck observed that “this merger of format types within a single set of results caused confusion as the participants did not easily distinguish between books and articles” (669). The York St John University JISC LMS Project also noted that, “alarmingly, some users seem to be unaware of the difference between books, journals and articles” and had “trouble differentiating between the different types of material returned in the results”
Fahey, Sue; Gordon, Shannon; Rose, Crystal (2011). Seeing Double at Memorial University: Two WorldCat Local Usability Studies. Partnership : the Canadian Journal of Library and Information Practice and Research 6(2) pp 1-14.
In a recent NISO presentation on Discovery Services (http://www.niso.org/news/events/2012/nisowebinars/discovery_and_delivery/), David Bietila of the University of Chicago provides some quoted user comments on a discovery system with ‘blended’ results they were evaluating, which I think are illuminating in their ambivalence:
- “Record included books, needed a way to filter this out.”
- “It’s wonderful to have ONE place where you can search for both articles and books! However, it seems like more books should show up because some books relevant to my search showed up in lens but not in Articles Plus. If you don’t choose this search tool, please do adopt some search tool that allwos comprenehsive searching of books and articles!”
The second user comment here reveals both that the user valued having one place to search both books and articles, and that in the ‘blended’ search, books became somewhat lost amongst the articles. In the first comment, by contrast, the user wanted to “filter out” the books! (The interface actually did include such a feature, but the user did not find it, further emphasizing the need for a simple search that just works for many use cases without needing to apply additional filters or facets.)
My hypothesis to explain these ambivalent results is that:
- Users DO want ‘one place’ to search both articles and books, they do not want to have to find another web page to visit to search alternate things.
- However, the nature of library content and current technology makes it difficult to create usable ‘blended’ results: articles may crowd out catalog materials such as books, or vice versa.
A ‘bento style’ presentation may be the best way to deal with this contradiction at present.
Additionally, it is difficult to provide consistent faceting, limits, or advanced ‘fielded’ search over a combined corpus of both catalog materials and articles — the metadata is too different in these different collections.
We don’t know for sure that users don’t want blended search results, but we also don’t know for sure that they do, and have some reason to think they may even dislike them. In the presence of this ambivalence, there are some important strategic reasons to prefer a ‘bento style’ approach.
Adopting a discovery layer with ‘blended’ results for our Catalog is incompatible with our current Blacklight implementation, and would require abandoning it.
We are not locked into our Blacklight implementation forever; at some point we may want to evaluate a change in direction there. But we don’t want to be forced to deal with this now in order to improve article search.
Our current attempt at meeting basic article search usage patterns (Metalib-based JHSearch) is so poor that it’s not hard to find an improvement. But it would be much more controversial and time-consuming to evaluate whether a change to the public catalog is an improvement.
And implementing a change to the public-facing catalog as well would be much more time-consuming than implementing an improved article search alone.
The Article Search Study, as a ‘proof of concept’, leads us to believe that we can bring up a bento-style interface with article search, in our existing infrastructure, relatively quickly.
In general, in the Systems department, we try to keep the implementations of different components as independent as possible, so that each one leaves our options open for the others and we can focus on improving each area independently.
The ‘bento style’ also leads us to realize we can consider not only new (and expensive) “discovery services” as article search providers, but also more traditional A&I or other comprehensive databases we may already license and be able to use at significantly less cost, so long as they have suitable APIs we can make use of to create a ‘bento style’ interface.
The ‘bento style’ approach is necessarily implemented by writing local code that uses the API of a search product to integrate its functionality into our application(s) (i.e., Catalyst). This approach leaves us in control of our UI, rather than outsourcing our UI to a vendor’s product. While outsourcing the UI to a vendor’s product is less work for us (as it involves little to no local development), current systems strategy is to maintain control of our UI to better serve our users and avoid vendor lock-in, where feasible. The ‘proof of concept’ of the Article Search Study leads us to think it is feasible here.
Additionally, the ‘bento style’ approach based on API use will let us fairly easily switch our article search provider at a later date, with minimal development time and disruption of our users’ familiar interfaces. The Article Search Study, which required us to present results from different search providers in an identical format, served as proof of concept confirming this.
The ‘bento style’ also allows us to add sections of other search content, such as a library web site search or embedded WorldCat search. We do not need to wait until a discovery vendor makes an agreement with OCLC or implements WorldCat search: the ‘bento style’ approach lets us put the article search vendor in one ‘section’ and add content from other vendors in other sections ourselves. (Internal proof-of-concept prototypes suggest WorldCat and Library Web Site search are both feasible.)
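The ease of switching providers comes from a simple pattern: each provider’s API response is translated into one local record format, and the interface only ever displays that format. Here is a minimal Python sketch, assuming two hypothetical vendors whose response shapes are invented purely for illustration (real vendor APIs would need to be mapped from their own documentation):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ArticleHit:
    """The single record format the 'Articles' box knows how to display."""
    title: str
    journal: str
    year: int

# Hypothetical response shapes for two article-search vendors. These field
# names are invented for illustration and do not describe any real API.
def from_vendor_x(raw):
    return [
        ArticleHit(doc["Title"], doc["Journal"], int(doc["Date"][:4]))
        for doc in raw["documents"]
    ]

def from_vendor_y(raw):
    return [
        ArticleHit(rec["title"], rec["source"]["name"], rec["year"])
        for rec in raw["records"]
    ]

# One adapter per provider; the display code never sees raw responses.
ADAPTERS = {"vendor_x": from_vendor_x, "vendor_y": from_vendor_y}

def articles_section(raw_response, provider):
    """Normalize a provider's raw response for display in the bento box."""
    return ADAPTERS[provider](raw_response)

if __name__ == "__main__":
    x = {"documents": [{"Title": "Bento boxes", "Journal": "J. Lib. Tech.", "Date": "2011-05-01"}]}
    print(articles_section(x, "vendor_x"))
```

Under this design, switching article search providers means writing one new adapter function; the code that renders the ‘Articles’ box does not change.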
We believe that the time is right to continue with improving article search as a priority.
We believe that ‘bento style’ search provides, at present, the best cost/benefit way to improve article search relatively quickly with relatively large user benefit. We will continue with this as a goal.