Virtual Shelf Browse is a hit?

With the semester starting back up here, we’re getting lots of positive feedback about the new Virtual Shelf Browse feature.

I don’t have usage statistics or anything at the moment, but it seems to be a hit, allowing people to do something like a physical browse of the shelves, from their device screen.

Positive feedback has come from underclassmen as well as professors. I am still assuming it is disciplinarily specific (some disciplines/departments simply don’t use monographs much), but appreciation and use do seem to cut across academic status/level.

Here’s an example of our Virtual Shelf Browse.

Here’s a blog post from last month where I discuss the feature in more detail.

Posted in General | Leave a comment

blacklight_cql plugin

I’ve updated the blacklight_cql plugin to run without deprecation warnings on Blacklight 5.14.

I wrote this plugin way back in the BL 2.x days, but I think many don’t know about it, and I don’t think anyone but me is using it, so I thought I’d take the opportunity, having updated it, to advertise it.

blacklight_cql gives your BL app the ability to take CQL queries as input. CQL is a query language for writing boolean expressions; I don’t personally consider it suitable for end-users to enter manually, and don’t expose it that way in my BL app.

But I do use it as an API for other internal software to make complex boolean queries against my BL app, like “format = ‘Journal’ AND (ISSN = X OR ISSN = Y OR ISBN = Z)”. Paired with the BL Atom response, it’s a pretty powerful query API against a BL app.
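For a sense of what that looks like in practice, here’s a hedged sketch of constructing such a request; the host, path, and parameter names here are invented for illustration, and blacklight_cql’s actual configuration may differ:

```ruby
require "uri"

# Hypothetical sketch: construct a catalog search URL that passes a CQL
# boolean query as the search input. The host, path, and parameter names
# are made up; blacklight_cql's actual configuration may differ.
cql = %{format = "Journal" AND (issn = "1234-5678" OR isbn = "0-123-45678-9")}

query_params = {
  "search_field" => "cql",   # tell the app to interpret q as CQL
  "q"            => cql,
  "format"       => "atom"   # pair with the Blacklight Atom response
}

url = "https://catalog.example.edu/catalog?" + URI.encode_www_form(query_params)
```

The calling software then just fetches that URL and parses the Atom feed it gets back.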

Both direct Solr fields, and search_fields you’ve configured in Blacklight are available in CQL; they can even be mixed and matched in a single query.

The blacklight_cql plug-in also provides an SRU/ZeeRex EXPLAIN handler, for a machine-readable description of what search fields are supported via CQL.  Here’s “EXPLAIN” on my server:

The plug-in does NOT provide a full SRU/SRW implementation — but as it does provide some of the hardest parts of an SRW implementation, it would probably not be too hard to write a bit more glue code to get a full implementation.  I considered doing that to make my BL app a target of various federated search products that speak SRW, but never wound up having a business case for it here.  (Also, it may or may not actually work out, as SRW tends to vary enough that even if it’s a legal-to-spec SRW implementation, that’s no guarantee it will work with a given client).

Even though the blacklight_cql plugin has been around for a while, it’s perhaps still somewhat immature software (or maybe it’s that it’s “legacy” software now?). It’s worked out quite well for me, but I’m not sure anyone else has used it, so it may have edge case bugs I’m not running into, or bugs that are triggered by use cases other than mine. It’s also, I’m afraid, not very well covered by automated tests. But I think what it does is pretty cool, and if you have a use for what it does, starting with blacklight_cql should be a lot easier than starting from scratch.

Feel free to let me know if you have questions or run into problems.

Posted in General | Leave a comment

Blacklight Community Survey

I’ve created a Google Docs survey targeted at organizations who have Blacklight installations (or vendor-hosted BL installations on their behalf? Is that a thing?).

Including Blacklight-based stacks like Hydra.

The goal of the survey is to learn more about how Blacklight is being used in “the wild”, specifically, but not limited to, the software stacks people are using BL with.

If you host (or have, or plan to host) a Blacklight-based application, it would be great if you filled out the survey!

Posted in General | Leave a comment

“Registered clinical trials make positive findings vanish”

via: Registered clinical trials make positive findings vanish

The launch of the registry in 2000 seems to have had a striking impact on reported trial results, according to a PLoS ONE study that many researchers have been talking about online in the past week.

A 1997 US law mandated the registry’s creation, requiring researchers from 2000 to record their trial methods and outcome measures before collecting data. The study found that in a sample of 55 large trials testing heart-disease treatments, 57% of those published before 2000 reported positive effects from the treatments. But that figure plunged to just 8% in studies that were conducted after 2000….

…Irvin says that by having to state their methods and measurements before starting their trial, researchers cannot then cherry-pick data to find an effect once the study is over. “It’s more difficult for investigators to selectively report some outcomes and exclude others,” she says….

“Loose scientific methods are leading to a massive false positive bias in the literature,”

Posted in General | Leave a comment


ILS Vendor III has released a report based on a survey of patrons at 7 UK academic libraries:

“WE LOVE THE LIBRARY, BUT WE LIVE ON THE WEB.” Findings around how academic library users view online resources and services (You have to register to download)

Some of the summary of findings from the report:

  • “User behaviours are increasingly pervasive, cutting across age, experience, and subject areas”
  • “Online anywhere, on any device, is the default access setting”
  • “Almost without exception, users are selecting different discovery tools to meet different requirements, ranging from known item searches to broad investigation of a new topic. Perhaps with some credit due to recent ‘discovery layer’ developments, the specialist library search is very much of interest in this bag of tools, alongside global search engines and more particular entry points such as Google Scholar and Wikipedia.”
  • Library Search is under informed scrutiny. Given a user base that is increasingly aware of the possibilities for discovery and subsequent access, there are frustrations regarding a lack of unified coverage of the library content, the failure to deliver core purposes well (notably, known item searches and uninterrupted flow-through to access), and unfavourable comparisons with global search engines in general and Google Scholar in particular. We note:
    • Global Search Engines – Whilst specialised tools are valued, the global search engines (and especially Google) are the benchmark.
    • Unified Search – Local collection search needs to be unified, not only across print and electronic, but also across curatorial silos (archives, museums, special collections, repositories, and research data stores).
    • Search Confidence – As well as finding known items reliably and ordering results accordingly, library search needs to be flexible and intelligent, not obstructively fussy and inexplicably random.

I think this supports some of the directions we’ve been trying to take here. We’ve tried to make our system play well with Google Scholar (both directing users to Google Scholar as an option where appropriate, and using Umlaut to provide as good a landing page as possible when users come from Google Scholar and want access to licensed copies, physically held copies, or ILL services for items discovered).  We’ve tried to move toward a unified search in our homegrown-from-open-source-components catalog.

And most especially we’ve tried to focus on “uninterrupted flow-through to access”, again with the Umlaut tool.

We definitely have a ways to go in all these areas, and it’s an uphill struggle in many ways, as discussed in my previous comments on the Ithaka report on Streamlining Access to Scholarly Resources.

But I think we’ve at least been chasing the right goals.

Another thing noted in the report:

  • “Electronic course readings are crucial (Sections 8, 12) Clearly, the greatest single issue raised in qualitative feedback is the plea for mandated / recommended course readings— and, ideally, textbooks—to be universally available as digital downloads,”

We’ve done less work locally in this direction, on course reserves in general, and I think we probably ought to. This is one area where I’d especially wonder if UK users may not be representative of U.S. users — but I still have no doubt that our undergraduate patrons spend enough time with course readings to justify more of our time than we’ve been spending on analyzing what they need in electronic systems and improving them.

The report makes a few recommendations:

  • “The local collection needs to be surfaced in the wider ecosystem.”
  • “Libraries should consider how to encompass non-text resources.”
  • “Electronic resources demand electronic workflows.”
  • “Libraries should empower users like any modern digital service. Increasing expectations exist across all user categories—likely derived from experiences with other services—that the library should provide ‘Apps’ geared to just-in-time support on the fly (ranging from paying a fine to finding a shelf) and should also support interactions for registered returning users with transaction histories, saved items, and profile-enabled automated recommendations.”
  • “Social is becoming the norm”

Other findings suggest that ‘known item searches’ are still the most popular use of the “general Library search”, although “carry out an initial subject search” is still present as well.  And that when it comes to ebooks, “There is notably strong support to be able to download content to use on any device at any time.”  (Something we are largely failing at, although we can blame our vendors).

Posted in General | Leave a comment

Virtual Shelf Browse

We know that some patrons like walking the physical stacks, to find books on a topic of interest to them through that kind of browsing of adjacently shelved items.

I like wandering stacks full of books too, and hope we can all continue to do so.

But in an effort to see if we can provide an online experience that fulfills some of the utility of this kind of browsing, we’ve introduced a Virtual Shelf Browse that lets you page through books online, in the order of their call numbers.

An online shelf browse can do a number of things you can’t do physically walking around the stacks:

  • You can do it from home, or anywhere you have a computer (or mobile device!)
  • It brings together books from various separate physical locations in one virtual stack. Including multiple libraries, locations within libraries, and our off-site storage.
  • It includes even checked out books, and in some cases even ebooks (if we have a call number on record for them)
  • It can place one item at multiple locations in the virtual shelf, if we have more than one call number on record for it. There’s always more than one way you could classify or characterize a work; a physical item can only be in one place at a time, but not so in a virtual display.

The UI is based on the open source stackview code released by the Harvard Library Innovation Lab. Thanks to Harvard for sharing their code, and to @anniejocaine for helping me understand the code, and accepting my pull requests with some bug fixes and tweaks.

This is to some extent an experiment, but we hope it opens up new avenues for browsing and serendipitous discovery for our patrons.

You can drop into one example place in the virtual shelf browse here, or drop into our catalog to do your own searches — the Virtual Shelf Browse is accessed by navigating to an individual item detail page, and then clicking the Virtual Shelf Browse button in the right sidebar.  It seemed like the best way to enter the Virtual Shelf was from an item of interest to you, to see what other items are shelved nearby.

Screenshot 2015-07-23 15.09.12

Our Shelf Browse is based on ordering by Library of Congress Call Numbers. Not all of our items have LC call numbers, so not every item appears in the virtual shelf, or has a “Virtual Shelf Browse” button to provide an entry point to it. Some of our local collections are shelved locally with LC call numbers, and these are entirely present. For other collections —  which might be shelved under other systems or in closed stacks and not assigned local shelving call numbers — we can still place them in the virtual shelf if we can find a cataloger-suggested call number in the MARC bib 050 or similar fields. So for those collections, some items might appear in the Virtual Shelf, others not.

On Call Numbers, and Sorting

Library call number systems — from LC, to Dewey, to Sudocs, or even UDC — are a rather ingenious 19th century technology for organizing books in a constantly growing collection such that similar items are shelved nearby. Rather ingenious for the 19th century anyway.

It was fun to try bringing this technology — and the many hours of cataloger work that’s gone into constructing call numbers — into the 21st century to continue providing value in an online display.

It was also challenging in some ways. It turns out that the ordering of Library of Congress call numbers in particular is difficult to implement in computer software: there are a bunch of odd cases where the proper ordering might be clear to a human (at least to a properly trained human? and different libraries might even order differently!), but it’s difficult to encode all the cases into software.

The newly released Lcsort ruby gem does a pretty marvelous job of sorting LC call numbers properly — I won’t say it gets every valid call number, let alone every local practice variation, right, but it gets a lot of stuff right, including such crowd-pleasing oddities as:

  • `KF 4558 15th .G6` sorts after `KF 4558 2nd .I6`
  • `Q11 .P6 vol. 12 no. 1` sorts after `Q11 .P6 vol. 4 no. 4`
  • It can handle suffixes after cutters, as in popular local practice (and NLM call numbers), e.g. `R 179 .C79ab`
  • Variations in spacing or punctuation that should not matter for sorting are handled consistently: `R 169.B59.C39` vs `R169 B59C39 1990` vs `R169 .B59 .C39 1990`, etc.
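To give a feel for why plain string comparison fails, here’s a drastically simplified sketch of the sort-key idea. This is nothing like Lcsort’s real normalization (which also handles cutters, dates, and suffixes); it just shows that zero-padding numeric runs makes a plain string sort order `vol. 4` before `vol. 12`:

```ruby
# Drastically simplified illustration of call number sort-key normalization:
# zero-pad every run of digits so lexicographic comparison matches numeric order.
# (Lcsort's real normalization handles cutters, dates, suffixes, and much more.)
def naive_sort_key(call_number)
  call_number.upcase.gsub(/\d+/) { |digits| digits.rjust(8, "0") }
end

calls = ["Q11 .P6 vol. 12 no. 1", "Q11 .P6 vol. 4 no. 4"]
sorted = calls.sort_by { |c| naive_sort_key(c) }
# "vol. 4" now sorts before "vol. 12", unlike with a raw string sort
```

Decimal cutters are one of the cases this naive version gets wrong (`.P59` should file before `.P6`), which is exactly the kind of thing Lcsort exists for.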

Lcsort is based on the cumulative knowledge of years of library programmer attempts to sort LC call numbers, including an original implementation based on much trial and error by Bill Dueber of the University of Michigan, a port to ruby by Nikitas Tampakis of Princeton University Library, advice and test cases based on much trial and error from Naomi Dushay of Stanford, and a bunch more code wrangling by me.

I do encourage you to check out Lcsort for any LC call number ordering needs, if you can do it in ruby — or even port it to another language if you can’t. I think it works as well as or better than anything our community of library technologists has yet done in the open.

Check out my code — rails_stackview

This project was possible only because of the work of so many that had gone before, and been willing to share their work, from Harvard’s stackview to all the work that went into figuring out how to sort LC call numbers.

So it only makes sense to try to share what I’ve done too, to integrate a stackview call number shelf browse in a Blacklight Rails app.  I have shared some components in a Rails engine at rails_stackview.

In this case, I did not do what I’d have done in the past, and try to make a rock-solid, general-purpose, highly flexible and configurable tool that integrated as brainlessly as possible out of the box with a Blacklight app. I’ve had mixed success trying to do that before, and came to think it might have been over-engineering and YAGNI to try. Additionally, there are just too many ways to try to do this integration — and too many versions of Blacklight changes to keep track of — I just wasn’t really sure what was best and didn’t have the capacity for it.

So this is just the components I had to write for the way I chose to do it in the end, and for my use cases. I did try to make those components well-designed for reasonable flexibility, or at least future extension to more flexibility.

But it’s still just pieces that you’d have to assemble yourself into a solution, and integrate into your Rails app (no real Blacklight expectations, they’re just tools for a Rails app) with quite a bit of your own code.  The hardest part might be indexing your call numbers for retrieval suitable to this UI.

I’m curious to see if this approach of sharing my pieces instead of a fully designed flexible solution might still end up being useful to anyone, and perhaps encourage some more virtual shelf browse implementations.

On Indexing

Being a Blacklight app, all of our data was already in Solr. It would have been nice to use the existing Solr index as the back-end for the virtual shelf browse, especially if it allowed us to do things like a virtual shelf browse limited by existing Solr facets. But I did not end up doing so.

To support this kind of call-number-ordered virtual shelf browse, you need your data in a store of some kind that supports some basic retrieval operations: Give me N items in order by some field, starting at value X, either ascending or descending.
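That retrieval operation can be sketched in a few lines. Here it’s simulated over an in-memory sorted array standing in for an indexed table (in SQL it would be roughly `WHERE sort_key >= ? ORDER BY sort_key LIMIT n`); the method name and data shape are invented for illustration:

```ruby
# Hedged sketch of the core retrieval operation, simulated over an in-memory
# sorted list standing in for an indexed rdbms table (one entry per call number).
# Each entry is [sort_key, item]. window(entries, start_key, n, :asc) returns
# the next n entries at or after start_key; :desc returns the n entries before it.
def window(sorted_entries, start_key, n, direction = :asc)
  if direction == :asc
    sorted_entries.select { |key, _item| key >= start_key }.first(n)
  else
    sorted_entries.select { |key, _item| key < start_key }.last(n).reverse
  end
end
```

Paging "forward" and "backward" through the virtual shelf is then just repeated calls with the last key you displayed.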

This seems simple enough; but the fact that we want a given single item in our existing index to be able to have multiple call numbers makes it a bit tricky. In fact, a Solr index isn’t really easily capable of doing what’s needed. There are various ways to work around it and get what you need from Solr: Naomi Dushay at Stanford has engaged in some truly heroic hacks to do it, involving creating a duplicate mirror indexing field where all the call numbers are reversed to sort backwards. And Naomi’s solution still doesn’t really allow you to limit by existing Solr facets or anything.

That’s not the solution I ended up using. Instead, I just de-normalize to another ‘index’ in a table in our existing application rdbms, with one row per call number instead of one row per item.  After talking to the Princeton folks at a library meet-up in New Haven, and hearing this was their back-end store plan for supporting ‘browse’ functions, I realized — sure, why not, that’ll work.

So how do I get the call numbers indexed in an rdbms table? We use traject for indexing to Solr here, for Blacklight.  Traject is pretty flexible, and it wasn’t too hard to modify our indexing configuration so that as the indexer goes through each input record, creating a Solr document for each one, it also, in the same stream, creates 0 to many rows in an rdbms for each call number encountered.
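Stripped of all traject specifics (this is not our actual config, just the shape of the idea, with invented field names), the bifurcated pass looks like this: each record yields one Solr document plus zero-to-many call number rows:

```ruby
# Illustrative sketch only: one pass over catalog records producing both a
# Solr document and rdbms rows, one row per call number on the record.
# Field names are invented; our real traject configuration is more involved.
def index_record(record)
  solr_doc = { "id" => record[:id], "title_t" => record[:title] }
  call_number_rows = Array(record[:call_numbers]).map do |cn|
    { record_id: record[:id], call_number: cn }
  end
  [solr_doc, call_number_rows]
end
```

In the real pipeline the Solr document goes to Solr as usual, while the rows get bulk-inserted into the call numbers table in the same stream.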

We don’t do any “incremental” indexing to Solr in the first place, we just do a bulk/mass index every night recreating everything from the current state of the canonical catalog. So the same strategy applies to building the call numbers table, it’s just recreated from scratch nightly.  After racking my brain to figure out how to do this without disturbing performance or data integrity in the rdbms table — I realized, hey, no problem, just index to a temporary table first, then when done swap it into place and delete the former one.
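The swap itself can be a simple rename dance. A hypothetical version, expressed as the SQL statements issued in order (table names invented, and the exact DDL syntax varies by rdbms):

```ruby
# Hypothetical sketch of the "index to a temp table, then swap" strategy,
# as an ordered list of SQL statements. Table names are invented, and the
# CREATE TABLE ... LIKE syntax shown here is Postgres-flavored; adjust per rdbms.
def call_number_swap_sql(table = "call_numbers")
  [
    "CREATE TABLE #{table}_new (LIKE #{table} INCLUDING ALL)",
    # ... the nightly index bulk-inserts rows into #{table}_new here ...
    "ALTER TABLE #{table} RENAME TO #{table}_old",
    "ALTER TABLE #{table}_new RENAME TO #{table}",
    "DROP TABLE #{table}_old"
  ]
end
```

The live table is never in a half-built state; readers see either the old complete index or the new one.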

I included a snapshotted, completely unsupported, example of how we do our indexing with traject in the rails_stackview documentation.  It ends up a bit hacky, and makes me wish traject let me re-use some of its code a little more concisely to do this kind of bifurcated indexing operation — but it still worked out pretty well, and leaves me pretty satisfied with traject as our indexing solution over past tools we had used.

I had hoped that adding the call number indexing to our existing traject mass index process would not slow down the indexing at all. I think this hope was based on some poorly-conceived thought process like “Traject is parallel multi-core already, so, you know, magic!”  It didn’t quite work out that way: the additional call number indexing adds about a 10% penalty to our indexing time, taking our slow mass indexing from a ~10 hour to an ~11 hour process.  We run our indexing on a fairly slow VM with 3 cores assigned to it. It’s difficult to profile a parallel multi-threaded pipeline process like traject, and I can’t completely wrap my head around it, but I think it’s possible that on a faster machine the bottlenecks would fall in different parts of the pipeline, and the penalty would be smaller.

On call numbers designed for local adjustment, used universally instead

Another notable feature of the 19th century technology of call numbers that I didn’t truly appreciate until this project — call number systems often, and LC certainly, are designed to require a certain amount of manual hand-fitting to a particular local collection.  The end of the call number has ‘cutter numbers’, typically based on the author’s name, but meant to be hand-fitted by local catalogers to put the book in just the right spot in the context of what’s already been shelved in a particular local collection.

That ends up requiring a lot more hours of cataloger labor than if a book simply had one true call number, but it’s kind of how the system was designed. I wonder if it’s tenable in the modern era to put that much work into call number assignment, though, especially as print (unfortunately) gets less attention.

However, this project sort of serves as an experiment of what happens if you don’t do that local easing. To begin with, we’re combining call numbers that were originally assigned in entirely different local collections (different physical library locations), some of which were assigned before these different libraries even shared the same catalog, and were not assigned with regard to each other as context.  On top of that, we take ‘generic’ call numbers without local adjustment from MARC 050 for books that don’t have locally assigned call numbers (including ebooks where available), so these also haven’t been hand-fit into any local collection.

It does result in occasional oddities, such as different authors with similar last names writing on a subject being interfiled together. Which offends my sensibilities, since I know the system used as designed doesn’t do that. But… I think it will probably not be noticed by most people; it works out pretty well after all.

Posted in General | 3 Comments

Long-standing bug in Chrome (WebKit?) on page not being drawn, scroll:auto, retina

In a project I’m recently working on, I ran into a very odd bug in Chrome (may reproduce in other WebKit browsers, not sure).

My project loads some content via AJAX into a portion of the page. In some cases, the content loaded is not properly displayed; it’s not actually painted by the browser. There is space taken up by it on the page, and it’s kind of as if it had `display:none` set, although not quite like that, because sometimes _some_ of the content is displayed but not the rest.

Various user interactions will force the content to paint, including resizing the browser window.

Googling around, there are various people who have been talking about this bug, or possibly similar bugs, for literally years. Including here and maybe this is the same thing or related, hard to say.

I think the conditions that trigger the bug in my case may include:

  • A Mac “retina” screen, the bug may not trigger on ordinary resolutions.
  • Adding/changing content via Javascript in a block on the page that has been set to `overflow: auto` (or just overflow-x or overflow-y auto).

I think both of these conditions are required, and it’s got something to do with Chrome/WebKit getting confused calculating whether a scrollbar is necessary (and whether space has to be reserved for it) on a high-resolution “retina” screen, when dynamically loading content.

It’s difficult to google around for this, because nobody seems to quite understand the bug. It’s a bit dismaying, though, that it seems likely this bug — or at least related bugs with retina screens, scrollbar calculation, dynamic content, etc. — has existed in Chrome/WebKit for possibly many years.  I am not certain if any tickets are filed in the Chrome/WebKit bug tracker on this (or if anyone’s figured out exactly what causes it from Chrome’s point of view).  (This ticket is not quite the same thing, but is also about overflow calculations and retina screens, so could be caused by a common underlying bug.)

There are a variety of workarounds suggested online for bugs with Chrome not properly painting dynamically loaded content. Some of them didn’t seem to work for me; others cause a white flash even in browsers that wouldn’t otherwise be affected by the bug; others were inconvenient to apply in my context, or required a really unpleasant `timeout` in JS code to tell Chrome to do something a few dozen/hundred ms after the dynamic content was loaded. (I think Chrome/WebKit may be smart enough to ignore changes that you immediately undo in some cases, so they don’t trigger any rendering redraw; but here we want to trick Chrome into doing a rendering redraw without actually changing the layout, so, yeah.)

Here’s the hacky lesser-evil workaround which seems to work for me. Immediately after dynamically loading the content, do this to its parent div:

$("#parentDiv").css("opacity", 0.99999).css("opacity", 1.0);

It does leave a `style` attribute setting opacity to 1.0 sitting around on your parent container after you’re done, oh well.

I haven’t actually tried the solution suggested here, to a problem which may or may not be the same one I have — of simply adding `-webkit-transform: translate3d(0,0,0)` to relevant elements.

One of the most distressing things about this bug is that if you aren’t testing on a retina screen (and why/how would you, unless your workstation happens to have one?), you may never notice the bug or be able to reproduce it, but you may be ruining the interface for users on retina screens. If they do report it, you may find their bug report completely unintelligible and unreproducible, whether or not they mention they have a retina screen when they file it (which they probably won’t; they may not even know what that is, let alone guess it’s a pertinent detail).

Also distressing: the workarounds are so hacky that I am not confident they won’t stop working in some future version of Chrome that still exhibits the bug.

Oh well, so it goes. I really wish Chrome/WebKit would notice and fix though. Probably won’t happen until someone who works on Chrome/WebKit gets a retina screen and happens to run into the bug themselves.

Posted in General | 1 Comment

“Dutch universities start their Elsevier boycott plan”

“We are entering a new era in publications”, said Koen Becking, chairman of the Executive Board of Tilburg University in October. On behalf of the Dutch universities, he and his colleague Gerard Meijer negotiate with scientific publishers about an open access policy. They managed to achieve agreements with some publishers, but not with the biggest one, Elsevier. Today, they start their plan to boycott Elsevier.

Dutch universities start their Elsevier boycott plan

Posted in General | 2 Comments

“First Rule of Usability? Don’t Listen to Users”

An interesting brief column, now 15 years old, from noted usability expert Jakob Nielsen, which I saw posted today on reddit:  First Rule of Usability? Don’t Listen to Users

Summary: To design the best UX, pay attention to what users do, not what they say. Self-reported claims are unreliable, as are user speculations about future behavior. Users do not know what they want.

I’m reposting here, even though it’s 15 years old, because I think many of us haven’t assimilated this message yet, especially in libraries, and it’s worth reviewing.

An even worse version of trusting users’ self-reported claims, I think, is trusting user-facing librarians’ self-reported claims about what they have generally noticed users self-reporting.  It’s like taking the first problem and adding a game of ‘telephone’ to it.

Nielsen’s suggested solution?

To discover which designs work best, watch users as they attempt to perform tasks with the user interface. This method is so simple that many people overlook it, assuming that there must be something more to usability testing. Of course, there are many ways to watch and many tricks to running an optimal user test or field study. But ultimately, the way to get user data boils down to the basic rules of usability:

  • Watch what people actually do.
  • Do not believe what people say they do.
  • Definitely don’t believe what people predict they may do in the future.

Yep. If you’re not doing this, start. If you are doing it, you probably need to do it more.  Easier said than done in a typical bureaucratic, inertial, dysfunctional library organization, I realize.

It also means we have a professional obligation to watch what the users do — and determine how to make things better for them. And then watch again to see if it worked. That’s what makes us professionals. We cannot simply do what the users say; that is an abrogation of our professional responsibility, and does not actually produce good outcomes for our patrons. Again, yes, this means we need library organizations that allow us to exercise our professional responsibilities and give us the resources to do so.

For real, go read the very short article. And consider what it would mean to develop in libraries taking this into account.

Posted in General | 3 Comments

Yahoo YBoss spell suggest API significantly increases pricing

For a year or two, we’ve been using the Yahoo/YBoss/YDN Spelling Service API to provide spell suggestions for queries in our homegrown discovery layer. (Which provides UI to search the catalog via Blacklight/Solr, as well as an article search powered by the EBSCOHost API.)

It worked… well enough, despite doing a lot of odd and wrong things. But mainly it was cheap: $0.10 per 1000 spell suggest queries, according to this cached price sheet from April 24, 2015.

However, I got an email today saying they are ‘simplifying’ their pricing by charging for all “BOSS Search API” services at $1.80 per 1000 queries, starting June 1.

That’s an 18x increase. Previously we paid about $170 a year for spell suggestions from Yahoo, peanuts, worth it even if it didn’t work perfectly. That’s 1.7 million queries for $170, pretty good.  (Honestly, I’m not sure if that’s still making queries it shouldn’t be, in response to something other than user input. For instance, we try to suppress spell check queries on paging through an existing result set, but perhaps don’t do it fully).

But 18x $170 is $3060.  That’s a pretty different value proposition.

Anyone know of any decent, cheap spell suggest APIs? It looks like maybe Microsoft Bing has a poorly documented one.  Not sure.

Yeah, we could roll our own in-house spell suggestion based on a local dictionary or corpus of some kind: aspell, or Solr’s built-in spell suggest service based on our catalog corpus.  But we don’t only use this for searching the catalog, and even for the catalog I previously found that these web-search-based APIs provided better results than a local-corpus-based solution.  The local solutions seemed to false-positive (provide a suggestion when the original query was ‘right’) and false-negative (refrain from providing a suggestion when it was needed) more often than the web-based APIs. As well, of course, as being more work for us to set up and maintain.
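For anyone who does want to try the Solr route, the spellcheck component is queried with ordinary request parameters. Here’s a hedged sketch of building such a request; the host, core, and handler names are invented, and while the `spellcheck.*` params are Solr’s standard ones, what’s honored depends on how your spellcheck handler is configured:

```ruby
require "uri"

# Hedged sketch: building a request to Solr's built-in spellcheck component.
# Host, core, and handler path are invented; the spellcheck.* params are
# Solr's standard ones, but depend on your handler's configuration.
params = {
  "q"                => "recieve",   # the (possibly misspelled) user query
  "spellcheck"       => "true",
  "spellcheck.count" => "5",         # max suggestions per term
  "wt"               => "json"
}

solr_url = "http://localhost:8983/solr/catalog/spell?" + URI.encode_www_form(params)
```

You’d then fetch that URL and read suggestions out of the `spellcheck` section of the JSON response.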

Posted in General | 5 Comments