Blacklight Community Survey Results

On August 20th I announced a Blacklight Community Survey to the blacklight and code4lib listservs, and it was also forwarded on to the hydra listserv by a member.

Between August 20th and September 2nd, I received 18 responses. After another week of no responses, I shut off the survey. It’s taken me until now to report the results, sorry!

The Survey was implemented using Google Docs. You can see the survey instrument here, access the summary of results from Google Docs here, and the complete spreadsheet of responses here.  The survey was intentionally anonymous.

My own summary with limited discussion follows below. 

Note: The summary of results incorrectly reports 24 responses rather than 18; I accidentally didn’t delete some test data before releasing the survey, and had no way to update the summary count. However, the spreadsheet is accurate; and the summaries for individual questions are accurate (you’ll see they each add up to 18 responses or fewer), except for the Blacklight version questions which have a couple test answers in the summary version. Sorry!

I am not sure if 18 responses should be considered a lot or a little, or what percentage of Blacklight implementations it represents. It should definitely not be considered a valid statistical sample; I think of it more like getting together people who happen to be at a conference to talk about their experiences with Blacklight, but I think such a view into Blacklight experiences is still useful.

I do suspect that Hydra gets more use than these results would indicate, and Hydra users of Blacklight are under-represented. I’m not sure why, but some guesses might be that Hydra implementations of blacklight are disproportionately done by vendors/contractors, or are more likely to be “release and forget about it” implementations — in either case meaning the host institutions are less likely to maintain a relationship to the Blacklight community, and find out about or care to respond to the survey.

Institutional Demographics and Applications

The majority of respondents (12 of 18) are Academic Libraries, along with one public library, one museum, one vendor/contractor, two national libraries or consortiums, and one ‘other’.

I was unsurprised to see that the majority of use of Blacklight is for “special collection” or “institutional repository” type use. Only 1/3rd of respondents use Blacklight for a “Library catalog/discovery” application, with the rest “A Single special-purpose collection” (5 of 18), “Institutional/Digital collections repository (multiple collections)” (11, the majority of 18 respondents), or “Other” (4).

At my place of work, when we first adopted Blacklight, the primary use case for existing implementations and developers was library catalog/discovery, but I had seen development efforts mostly focusing on other use cases lately, so it makes sense to see the uses shift along with that toward majority “repository” or “special-purpose collection” applications.

A majority (10 of 18) of respondents run more than one Blacklight application, which I did find a bit surprising, but it may go along with “repository” type use, where each repo or collection gets its own BL app?  6 respondents run only one BL app, and 2 respondents are working on BL app(s) in development, not yet in production.

Only 3 respondents (including myself) use Blacklight to host “No digital content, just metadata records”; 3 more just digital content, and the remaining 12 (the majority) some of each.

A full 8 of 18 include at least some MARC-origin metadata in their apps, 2 more than the number reporting using their app for “Library catalog/discovery”. Not quite a majority, but it seems MARC is definitely not dead in BL-land. “Dublin Core” and “Content from a Fedora Repository”, at 9 respondents each, only barely beat out MARC.

With 9 respondents reporting using “Content from a Fedora Repo”, and 11 reporting “Institutional/Digital collections repository”, I expected this would mean lots of Hydra use. But in a question we’ll examine in more detail later, only 4 respondents reported using “hydra-head (hydra framework)” in their app, which I find surprising. I don’t know if this is accurate, or whether respondents missed or didn’t understand the checkbox at that later question.

Versions of Blacklight in Use, and Experience with Upgrading

Two respondents are actually still deploying an app with Blacklight 3.x.

Two more are still on Blacklight 4.x — one of those runs multiple apps with some of them already on 5.x but at least one not yet upgraded; the other runs only one app on BL 4.30.

The rest of the respondents are all on Blacklight 5.x, but across diverse 5.x releases, from 5.5 to 5.14.  At the time the survey data was collected, only four of 18 respondents had crossed the BL 5.12 boundary, where lots of deprecations and refactorings were introduced. 5.12 had been released for about 5 months at that point.  That is, many months after a given BL version was released, most BL implementations (at least in this sample) still had not upgraded to it.

Just over half of respondents, 10 of 18, have never actually upgraded a Blacklight app across a major version (eg 3.x to 4.x, or 4.x to 5.x); the other 8 have.

Oddly, the two respondents reporting themselves to be still running at least one BL 3.x app also said they did have experience upgrading a BL app across major versions. Makes me wonder why some of their apps are still on 3.x. None of the respondents still deploying 4.x said they had experience upgrading a BL app across a major version.

It seems that BL apps are in general not being quickly upgraded to keep up with BL releases. Live production BL deployments in the wild use a variety of BL versions, even across major versions, and some may have never been upgraded since install.

Solr Versions In Use

Only 16 of 18 respondents reported the version of Solr they are using (actually we asked for the lowest version of Solr they were using, if they had multiple Solrs used with BL).

A full 14 of these 16 are using some variety of Solr 4.x, with a large variety of 4.x Solrs in use from 4.0 to 4.10.

No respondents were still running Solr 3.x, but one poor soul is still running Solr 1.4. And only one respondent was running a Solr 5.x. It sounds like it may be possible for BL to drop support for Solr 3.x (or has that already happened?), but requiring Solr 5.x would probably be premature.

I’m curious how many people have upgraded their Solr, and how often; it may be that the preponderance of Solr 4.x indicates that most installations were first deployed when Solr was in 4.x.

Rails Versions in Use

Four of 18 respondents are still using Rails 3.x, the rest have upgraded to 4.x — although not all to 4.2.

Those using Rails 3.x also tended to be the ones still reporting old BL versions in use, including BL 3.x.  I suspect this means that a lot of installations get deployed and never have any dependencies upgraded. Recall that 10 of 18 respondents have never upgraded BL across a major version.  Although many of the people reporting old Rails and old BL in use have upgraded BL across a major version at some point (I don’t know if this means they used to be running even older versions of BL, or that they’ve upgraded some apps but not others).

“If it ain’t broke, don’t fix it” might sometimes work, for a “deploy and done” project that never receives any new features or development. But I suspect a lot of these institutions are going to find themselves in trouble when they eventually realize they are running old unsupported versions of Rails, ruby, or BL, especially if a security vulnerability is discovered.  Even if a backported security patch is released for the old unsupported Rails or ruby version they are using (no guarantee), they may lack the local expertise to actually apply those upgrades; or upgrading Rails may require upgrading BL as well to work with the later Rails, which can be a very challenging task.

Local Blacklight Development Practices and Dependencies

A full 16 of 18 respondents report apps that include locally-developed custom features. 1 more respondent didn’t answer; only 1 said their app(s) did not.

I was surprised to see that only 2 respondents said they had hired a third-party vendor or contractor to install, configure, or develop a BL app. 2 more had hired a contractor; and 2 more said they were vendors/contractors for others.

I know there are people doing a healthy business in Blacklight consulting, especially Hydra; I am guessing that most of their clients are not enough involved in the BL community to see and/or want to answer this survey. (And I’m guessing many of those installations, unless the vendor/contractor has a maintenance contract, were also “deploy and ignore” installations which have not been upgraded since release).

So almost everyone is implementing local features, and mostly not by hiring a vendor/contractor; they’re actually doing the work in-house.

I tried to list every Blacklight plugin gem I could find distributed, and ask respondents which they used. The leaders were blacklight_advanced_search (53%) and blacklight_range_limit (39%).  Next were geoblacklight and hydra-head, each with 4 respondents (31%) claiming use. Again, I’m mystified how so few respondents can be using hydra-head when so many report IR/fedora uses. No other plugin got more than 3 respondents claiming use. I was surprised that only one respondent claimed sufia use.

Blacklight Satisfaction and Evaluation

Asked how satisfied they are with Blacklight, on a scale of 1 (least) to 5 (most), respondents gave a median score of 4, which is pretty respectable.

Now let’s look at the free-form answers for what people like, don’t like, or want from Blacklight.

A major trend in what people like is Blacklight’s flexibility, customizability, and extensibility:

  • “The easily extendable and overridable features make developing on top of Blacklight a pleasure.”
  • “…Easy to configure faceting and fields.”
  • “…the ability to reuse other community plugins.”
  • “The large number of plugins that enhance the search experience…”
  • “We have MARC plus lots of completely randomly-organized bespoke cataloging systems. Blacklight committed from the start to be agnostic as to the source of records, and that was exactly what we needed. The ability to grow with Blacklight’s feature set from back when I started using it, that was great…”
  • “Easily configurable, Easily customizable, Ability to tap into the search params logic, Format specific partial rendering”

The major trend in what people don’t like or find most challenging about Blacklight is the difficulty of upgrading BL:

  • “When we have heavily customized Blacklight applications, upgrading across major versions is a significant stumbling block.”
  • “Being bound together with a specific Bootstrap causes enormous headaches with updating”
  • “Upgrades and breaking of backwards compatibility. Porting changes back into overridden partials because much customization relies on overriding partials. Building custom, complicated, special purpose searches using Blacklight-provided methods [is a challenge].”
  • “Upgrading is obviously a pain-point; although many of the features in newer versions of Blacklight are desirable, we haven’t prioritized upgrading our internal applications to use the latest and greatest.”
  • “Varied support for plugins over versions [is a challenge].”
  • “And doing blacklight upgrades, which usually means rewriting everything.”
  • “Rapid pace of development. New versions are released very quickly, and staying up to date with the latest version is challenging at times. Also, sometimes it seems that major changes to Blacklight (for example, move from Bootstrap 2 to Bootstrap 3) are quasi-dictated by needs of one (or a handful) of particular institutions, rather than by consensus of a wider group of adopters/implementors. Also, certain Blacklight plugins get neglected and start to become less and less compatible with newer versions of Blacklight, or don’t use the latest methods/patterns, which makes it more of a challenge to maintain one’s app.”
  • “Getting ready for the upgrade to 6.0. We’ve done a lot of local customizations and overrides to Blacklight and some plugins that are deprecated.”

As well as difficulty in understanding the Blacklight codebase:

  • “Steep learning curve coming from vanilla rails MVC. Issues well expressed by B Armintor here: http://github.com/barmintor/ansr.”
  • “Code churn in technical approach (often I knew how something was done but find out it has changed since the last time I looked). Can sometimes be difficult to debug the source given the layers of abstraction (probably a necessary evil however).”
  • “Too much dinking around and mucking through lengthy instructions and config files is required to do simple things. BL requires someone with substantial systems skills to spend a lot of time to use — a luxury most organizations don’t have. Skinning BL is much more painful than it needs to be as is making modifications to BL behaviors. BL requires far more time to get running and has more technical/skill dependencies than other things we maintain. In all honesty, what people here seem to like best about BL is actually functionality delivered by solr.”
  • “Figuring out how to alter blacklight to do our custom development.”
  • “Understanding and comprehension of how it fits together and how to customise initially.”
  • “Less. Simplicity instead of more indirection and magic. While the easy things have stayed easy anything more has seemed to be getting harder and more complicated. Search inside indexing patterns and plugin. Better, updated, maintained analytics plugin.”
  • “A more active and transparent Blacklight development process. We would be happy to contribute more, but it’s difficult to know a longer-term vision of the community.”

What does it mean?

I’ve separated my own lengthy interpretation, analysis, and evaluation based on my own personal judgement into a subsequent blog post. 


“Agile Failure Patterns In Organizations”

An interesting essay showed up on Hacker News, called “Agile Failure Patterns In Organizations”.

Where I am, we’ve made some efforts to move to a more small-a agile iterative and incremental development approach in different ways, and I think it’s been successful in some ways and less successful in others. (Really, I would say we’ve been trying to do this before we’d even heard the word “agile”).

Parts of the essay seem a bit too scrum-focused to me (I’m sold on the general principle of agile development, I’m less sold on Scrum(tm)), and I’m not sure about the list of “Agile Failures at a Team Level”, but the list of “Agile Failures at Organizational Level”… ring some bells for me, loudly.

Agile Failure At Organizational Level:

  • Not having a (product) vision in the first place: If you don’t know, where you are going, any road will take you there.
  • The fallacy of “We know what we need to build”. There is no need for product discovery or hypotheses testing, the senior management can define what is relevant for the product backlog.
  • A perceived loss of control at management level leads to micro-management.
  • The organization is not transparent with regard to vision and strategy hence the teams are hindered to become self-organizing.
  • There is no culture of failure: Teams therefore do not move out of their comfort zones, but instead play safe.
  • The organization is not optimized for a rapid build-test-learn culture and thus departments are moving at different speed levels. The resulting friction caused is likely to equalize previous Agile gains.
  • Senior management is not participating in Agile processes, e.g. sprint demos, despite being a role model. But they do expect a different form of (push) reporting.
  • Not making organizational flaws visible: The good thing about Agile is that it will identify all organizational problems sooner or later. „When you put problem in a computer, box hide answer. Problem must be visible!“ Hideshi Yokoi, former President of the Toyota Production System Support Center in Erlanger, Kentucky, USA
  • Product management is not perceived as the “problem solver and domain expert” within the organization, but as the guys who turn requirements into deliverables, aka “Jira monkeys”.
  • Other departments fail to involve product management from the start. A typical behavior in larger organizations is a kind of silo thinking, featured by local optimization efforts without regard to the overall company strategy, often driven by individual incentives, e.g. bonuses. (Personal agendas are not always aligned with the company strategy.)
  • Core responsibilities of product management are covered by other departments, e.g. tracking, thus leaving product dependent on others for data-driven decisions.
  • Product managers w/o a dedicated team can be problem, if the product management team is oversized by comparison to the size of the engineering team.

How about you? Do some of those ring so true that you wonder if the author has been studying your organization?


Carl Grant on the Proquest acquisition of Ex Libris

Worth reading, links to a couple other posts worth reading. I don’t have any comments of my own to add at present, but may in a future blog post.

http://thoughts.care-affiliates.com/2015/10/another-perspective-on-proquest-buying.html


Just curious: Do you think there is a market for additional Rails contractors for libraries?

Fellow library tech people and other library people who read this blog, what do you think?

Are there libraries who would be interested in hiring a Rails contractor/consultant to do work for them, of any kind?

I know Data Curation Experts does a great job with what they do — do you think there is work for more than just them, whether on Blacklight/Hydra or other Rails?

Any sense of it, from where you work or what you’ve heard?

I’m just curious, thinking about some things.


DOAJ API in bento_search 1.5

bento_search is a gem I wrote that lets you search third party search engine APIs with a standardized, simple, natural ruby API. It’s focused on ‘scholarly’ sources and use cases.

In the just-released version 1.5, a search engine adapter is included for the Directory of Open Access Journals (DOAJ) article search api.

While there certainly might be circumstances where you want to provide end-users with interactive DOAJ searches, embedded in your application, my main thoughts of use cases are different, and involve back-end known-item lookup in DOAJ.

It’s not a coincidence that bento_search introduced multi-field querying in this same 1.5 release.

The SFX link resolver  is particularly bad at getting users to direct article-level links for open access articles. (Are products from competitors like SerSol or OCLC any better here?). At best, you are usually directed to a journal-level URL for the journal title the article appears in.

But what if the link resolver knew it was probably an open access journal based on ISSN (or at the Umlaut level, based on SFX returning a DOAJ_DIRECTORY_OPEN_ACCESS_JOURNALS_FREE target as valid)?  You could take the citation details, look them up in DOAJ to see if you get a match, and if so take the URL returned by DOAJ and return it to the user, knowing it’s going to be open access and not paywalled.

searcher = BentoSearch::DoajArticlesEngine.new
results = searcher.search(:query => {
    :issn       => "0102-3772",
    :volume     => "17",
    :issue      => "3",
    :start_page => "207"
})
if results.count > 0
   url = results.first.link
   # => "http://www.scielo.br/scielo.php?script=sci_arttext&pid=S0102-37722001000300002"
   # hey, maybe we got a doi too. 
   doi = results.first.doi
   # => "10.1590/S0102-37722001000300002"
end

Or if an application isn’t sure whether an article citation is available open access or not, it could check DOAJ to see if the article is listed there.

Perhaps such a feature will be added to Umlaut at some point.

As more and more is published open access, DOAJ might also be useful as a general large aggregator for metadata enhancement or DOI reverse lookup, for citations in its database.

Another known-item-lookup use of DOAJ might be to fetch an abstract for an article in its database.

Neat!

DOAJ API tips

For anyone interested in using the DOAJ Article Search API (some of whom might arrive here from Google), I found the DOAJ API to be pretty easy to work with and straightforward, but I did encounter a couple tricky parts that are worth sharing.

URI Escaping in a Path component

The DOAJ Search API has the query in the path component of the URL, not in a query param: /api/v1/search/articles/{search_query}

In the path component of a URI, spaces are not escaped as “+” — “+” just means “+”, and will indeed be interpreted that way by the DOAJ servers.  (Thanks, DOAJ API designers, for echoing back the query in the response, which made my bug there a bit more discoverable!) Spaces are escaped as “%20”.  (Really, escaping spaces as “+” even in a query param is an odd legacy practice of unclear standards compliance, but most systems accept it in the query params after the ? in a URL.)

At first I just reached for my trusty ruby stdlib method `CGI.escape`, but that escapes spaces as `+`, resulting in faulty input to the API.  Then I figured maybe I should be using ruby `URI.escape` — that does turn spaces into “%20”, but leaves some things like “/” alone entirely. True, “/” is legal in a URI, but as a path component separator! If I actually wanted it inside the last path component as part of the query, it should be escaped as “%2F”. (I don’t know if that would ever be a useful thing to include in a query to this API, but I strive for completeness.)

So I settled for ruby `CGI.escape(input).gsub("+", "%20")` — ugly, but okay.
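
To make the differences concrete, here’s a quick sketch of how the various ruby escaping methods treat a multi-word query with a slash in it (note that `URI.escape` has since been deprecated, and removed entirely in recent rubies):

require 'cgi'
require 'uri'

input = 'strawberry banana/split'

CGI.escape(input)
# => "strawberry+banana%2Fsplit"    escapes "/", but uses "+" for space

URI.escape(input)
# => "strawberry%20banana/split"    uses "%20" for space, but leaves "/" alone

CGI.escape(input).gsub("+", "%20")
# => "strawberry%20banana%2Fsplit"  what we actually want in a path component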

Really, for designing APIs like this, I’d suggest always leaving a query like this in a URI query param where it belongs (“http://example.org/search?q=query”).  It might initially seem nice to have URLs for search results like “https://doaj.org/api/v1/search/articles/baltimore”, but when you start having multi-word input, or worse complex expressions (see next section), it gets less nice quickly: “https://doaj.org/api/v1/search/articles/%2Bbibjson.title%3A%28%2Bsomething%29%20%2Bbibjson.author.name%3A%28%2Bsmith%29”.

Escaping is confusing enough already; stick to the convention. There’s a reason the query component of the URI (after the question mark) is called the query component of the URI!

ElasticSearch as used by DOAJ API defaults to OR operator

At first I was confused by the results I was getting from the API, which seemed very low precision, including results whose relevance I couldn’t account for.

The DOAJ Search API docs helpfully tell us that it’s backed by ElasticSearch, and the query string can be most any ElasticSearch query string. 

I realized that for multi-word queries, it was sending them to ElasticSearch with the default `default_operator` of “OR”, meaning all terms were ‘optional’. And apparently with a very low (1?) `minimum_should_match`.

Meaning results included documents with just any one of the search terms. Which didn’t generally produce intuitive or useful results for this corpus and use case — note that DOAJ’s own end-user-facing search uses an “AND” default_operator producing much more precise results.

Well, okay, I can send it any ElasticSearch query, so I’ve just got to prepend a “+” operator to all terms, to make them mandatory. Which gets a bit trickier when you want to support phrases too, as I do; you need to do a bit of tokenization of your own. But doable.

Instead of sending the query, which the user may have entered, as:  apple orange "strawberry banana"

Send the query as:  +apple +orange +"strawberry banana"

Or for a fielded search:  bibjson.title:(+apple +orange +"strawberry banana")

Or for a multi-fielded search where everything is still supposed to be mandatory/AND-ed together, the somewhat imposing:  +bibjson.title:(+apple +orange +"strawberry banana") +bibjson.author.name:(+jonathan +rochkind)

Convoluted, but it works out.
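
Here’s a minimal sketch of that kind of tokenization (not bento_search’s actual internal implementation, just an illustration of the idea): split off quoted phrases, then prefix each term or phrase with the ElasticSearch “+” (must-match) operator.

# illustration only: split a user query into phrases and bare terms,
# then mark every one of them as mandatory for ElasticSearch
def require_all_terms(query)
  tokens = query.scan(/"[^"]*"|\S+/)    # quoted phrases stay intact
  tokens.map { |t| t.start_with?("+") ? t : "+#{t}" }.join(" ")
end

require_all_terms(%q{apple orange "strawberry banana"})
# => '+apple +orange +"strawberry banana"'

"bibjson.title:(" + require_all_terms("apple orange") + ")"
# => 'bibjson.title:(+apple +orange)'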

I really like that they allow the API client to send a complete ElasticSearch query; it let me do what I wanted even if it wasn’t what they had anticipated. I’d encourage this pattern for other query APIs — but if you are allowing the client to send in an ElasticSearch (or Solr) query, it would be much more convenient if you also let the client choose the default_operator (Solr `q.op`) and `minimum_should_match` (Solr `mm`).

So, yeah, bento_search

The beauty of bento_search is that one developer figures out these confusing idiosyncrasies once (and most of the bento_search targets have such things) and encodes them in the bento_search logic — and you, the bento_search client, can be blissfully ignorant of them. You just call methods on a BentoSearch::DoajArticlesEngine the same as on any other bento_search engine (eg engine.search('apple orange "strawberry banana"')), and it takes care of the under-the-hood API-specific idiosyncrasies, workarounds, or weirdness.

Notes on ElasticSearch

I haven’t looked much at ElasticSearch before, although I’m pretty familiar with its cousin Solr.

I started looking at the ElasticSearch docs since the DOAJ API told me I could send it any valid ElasticSearch query. I found it familiar from my Solr work; they are both based on Lucene, after all.

I started checking out documentation beyond what I needed (or could make use of) for the DOAJ use too, out of curiosity. I was quite impressed with ElasticSearch’s feature set, and its straightforward and consistent API.

One thing to note is ElasticSearch’s really neat query DSL that lets you specify queries as a JSON representation of a query abstract syntax tree — rather than just try to specify what you mean in a textual string query.  For machine-generated queries, this is a great feature, and can make it easier to specify complicated queries than in a textual string query — or make certain things possible that are not even possible at all in the textual string query language.
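
For example, a fielded query like the one from the DOAJ section above could be expressed as a structured bool query rather than a query string. A rough sketch (the bibjson field names just follow the earlier DOAJ examples; the DOAJ API itself only accepts a query string, so this is what you’d send to an ElasticSearch server directly):

require 'json'

# roughly equivalent to: +bibjson.title:(+apple +orange) +bibjson.author.name:(+smith)
es_query = {
  query: {
    bool: {
      must: [
        { match: { "bibjson.title"       => { query: "apple orange", operator: "and" } } },
        { match: { "bibjson.author.name" => "smith" } }
      ]
    }
  }
}

puts JSON.pretty_generate(es_query)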

I recall Erik Hatcher telling me several years ago — possibly before ElasticSearch even existed — that a similar feature was being contemplated for Solr (but taking XML input instead of JSON, naturally).   I’m sure the hypothetical Solr feature would be more powerful than the one in ElasticSearch, but years later it still hasn’t landed in Solr so far as I know, but there it is in ElasticSearch….

I’m going to try to keep my eye on ElasticSearch.


bento_search 1.5, with multi-field queries

bento_search is a gem I wrote that lets you search third party search engine APIs with a standardized, simple, natural ruby API. It’s focused on ‘scholarly’ sources and use cases.

Version 1.5, just released, includes support for multi-field searching:

searcher = BentoSearch::ScopusEngine.new(api_key: ENV['SCOPUS_API_KEY'])
results = searcher.search(:query => {
    :title  => '"Mystical Anarchism"',
    :author => "Critchley",
    :issn   => "14409917" 
})

Multi-field searches are always AND’d together (title=X AND author=Y), because that was the only use case I had, and it seems like mostly what you’d want. (On our existing Blacklight-powered Catalog, we eliminated “All” or “Any” choices for multi-field searches, because our research showed nobody ever wanted “Any”.)

As with everything in bento_search, you can use the same API across search engines, whether you are searching Scopus or Google Books or Summon or EBSCOHost, you use the same ruby code to query and get back results of the same classes.

Except, well, multi-field search is not yet supported for Summon or Primo, because I do not have access to those proprietary products or their documentation to make sure I have the implementation right and test it. I’m pretty sure the feature could be added pretty easily to both, by someone who has access (or who wants to share it with me, as an unpaid ‘contractor’, so I can add it for you).

What for multi-field querying?

You certainly could expose this feature to end-users in an application using a bento_search powered interactive search. And I have gotten some requests for supporting multi-field search in our bento_search powered ‘articles’ search in our discovery layer; it might be implemented at some point based on this feature.

(I confess I’m still confused why users want to enter text in separate ‘author’ and ‘title’ fields, instead of just entering the author’s name and title in one ‘all fields’ search box, Google-style. As far as I can tell, all bento_search engines perform pretty well with author and title words entered in the general search box. Are users finding differently? Do they just assume it won’t, and want the security, along with the more work, of entering in multiple fields? I dunno).

But I’m actually more interested in this feature for uses other than directly exposed interactive search.

It opens up a bunch of possibilities for under-the-hood known-item identification in various external databases.

Let’s say you have an institutional repository with pre-prints of articles, but it’s only got author and title metadata, and maybe the name of the publication it was eventually published in, but not volume/issue/start-page, which you really want for better citation display and export, analytics, or generation of a more useful OpenURL.

So you take the metadata you do have, and search a large aggregating database to see if you can find a good match, and enhance the metadata with what that external database knows about the article.
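
A hypothetical sketch of that flow, using the Scopus engine from above (the title, author, and `our_record` here are made up for illustration, and I’m assuming the matched result item exposes volume/issue/start_page accessors corresponding to the query fields):

engine = BentoSearch::ScopusEngine.new(api_key: ENV['SCOPUS_API_KEY'])

# all we have from the IR is an author and a title
results = engine.search(:query => {
    :title  => '"Some Preprint Title"',
    :author => "Smith"
})

if results.count > 0
   match = results.first
   # enhance our local record with what the external database knows
   our_record.update(
     volume:     match.volume,
     issue:      match.issue,
     start_page: match.start_page
   )
end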

Similarly, citations sometimes come into my OpenURL resolver (powered by Umlaut) that lack sufficient metadata for good coverage analysis and outgoing link generation, for which we generally need year/volume/issue/start-page too. Same deal.

Or in the other direction, maybe you have an ISSN/volume/issue/start-page, but don’t have an author and title. Which happens occasionally at the OpenURL link resolver, maybe other places. Again, search a large aggregating database to enhance the metadata, no problem:

results = engine.search(:query => {
    :issn       => "14409917",
    :volume     => "10",
    :issue      => "2",
    :start_page => "272"
})

Or maybe you have a bunch of metadata, but not a DOI — you could use a large citation aggregating database that has DOI information as a reverse-DOI lookup. (Which makes me wonder if CrossRef or another part of the DOI infrastructure might have an API I should write a BentoSearch engine for…)

Or you want to look up an abstract. Or you want to see if a particular citation exists in a particular database for value-added services that database might offer (look inside from Google Books; citation chaining from Scopus, etc).

With multi-field search in bento_search 1.5, you can do a known-item ‘reverse’ lookup in any database supported by bento_search, for these sorts of enhancements and more.

In my next post, I’ll discuss this in terms of DOAJ, a new search engine added to bento_search in 1.5.


Oyster commercial ebook lending library shutting down

I’ve written a couple times about Oyster, the commercial ebook lending library, and what it might mean for the future of the book marketplace. (Dec 2013, April 2015).

So it seems right to add the coda — Oyster is going out of business.

One of the challenges that Oyster faced was having to constantly placate publishers concerns.  The vast majority of them are very apprehensive about going the same route music or movies went.

In a recent interview with the Bookseller, Arnaud Nourry, the CEO of Hachette, said, “We now have an ecosystem that works. This is why I have resisted the subscription system, which is a flawed idea even though it proliferates in the music business. Offering subscriptions at a monthly fee that is lower than the price of one book is absurd. For the consumer, it makes no sense. People who read two or three books a month represent an infinitesimal minority.”

Penguin Random House’s CEO Tom Weldon echoed Arnaud’s sentiments at the Futurebook conference a little awhile ago in the UK. “We have two problems with subscription. We are not convinced it is what readers want. ‘Eat everything you can’ isn’t a reader’s mindset. In music or film you might want 10,000 songs or films, but I don’t think you want 10,000 books.”

–– Oyster is Shutting Down their e-Book Subscription Service by Michael Kozlowski 

The closure of Oyster comes two months after Entitle, another e-book subscription service, closed. With Entitle and now Oyster gone there is one remaining standalone e-book service, Scribd, as well Amazon’s Amazon Unlimited service.

–– Oyster Is Shutting Down Operations, Publishers Weekly

What could have done Oyster in? Oh, I don’t know, perhaps another company with a subscription e-book service and significantly more resources and consumers. Like, say, Amazon? It was pretty clear back when Amazon debuted “Kindle Unlimited” in July 2014 that the service could spell trouble for Oyster. The price was comparable ($9.99 a month) as was the collection of titles (600,000 on Kindle Unlimited as compared to about 500,000 at the time on Oyster). Not to mention that Amazon Prime customers already had complimentary access to one book a month from the company’s Kindle Owner’s Lending Library (selection that summer: more than 500,000). In theory, Oyster’s online e-book store was partly created to strengthen its bid against Amazon, but even here the startup was fighting a losing battle, with many titles priced significantly higher there than on Jeff Bezos’ platform.

Where Oyster failed to take Amazon on, however, it’s conceivable that Google plus a solid portion of Oyster’s staff could succeed. The Oyster team has the experience, while Google has the user base and largely bottomless pockets. By itself, Oyster wasn’t able to bring “every book in the world” into its system. But with Google, who knows? The Google Books project, a sort of complement to the Google Play Store, is already well on its way to becoming a digital Alexandria. Reincarnated under the auspices of that effort, Van Lancker’s dream may happen yet.


Optional gem dependencies

I’ve sometimes wanted to release a gem with an optional dependency. In one case I can recall, it was an optional dependency on Celluloid (although I’d update that to use concurrent-ruby if I did it over again now) in bento_search.

I didn’t want to include (eg) Celluloid in the gemspec, because not all or even most uses of bento_search use Celluloid. If I included it in the gemspec, bundler/rubygems would insist on installing Celluloid for all users of my gem — and in some setups the app would also actually require Celluloid on boot. Requiring Celluloid on boot will also run some somewhat expensive setup code, start some background threads, and possibly give you strange warnings on app exit (all of those things at least in some versions of Celluloid, like the one I was developing against at the time; not sure if it’s still true). I didn’t want any of those things to happen for most people, who didn’t need Celluloid with bento_search.

But rubygems/bundler has no way to specify an optional gem dependency. So I resorted to not including the desired optional dependency in my gemspec, but just providing documentation saying “If you want to use feature X, which uses Celluloid, you must add Celluloid to your Gemfile yourself.”

What I didn’t like was there was no way, other than documentation, to include a version specification for what versions of Celluloid my own gem demanded, as an optional dependency. You don’t need to use Celluloid at all, but if you do, then it must be a certain version we know we work with. Not too old (lacking features or having bugs), but also not too new (may have backwards breaking changes; assuming the optional dependency uses semver so that’s predictable on version number).

I thought there was no way to include a version specification for this kind of optional dependency. To be sure, an optional gem dependency is not a good idea. Don’t do it unless you really have to, it complicates things. But I think sometimes it really does make sense to do so, and if you have to, it turns out there is one way to deal with specifying version requirements too.

Because it turns out Rails agrees with me that sometimes an optional dependency really is the lesser evil. The ActiveRecord database adapters are included with Rails, but they often depend on a lower-level database-specific gem, which is not included as an actual gemspec dependency.

They provide a best-of-dealing-with-a-bad-situation pattern to specify version constraints too: use the runtime (not Bundler) `gem` method that rubygems provides, as at:

https://github.com/rails/rails/blob/661731c4c83f7d60f6b97c77f008e2f08441e1a1/activerecord/lib/active_record/connection_adapters/mysql2_adapter.rb#L3
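
The relevant bit at the top of that adapter file looks roughly like this (paraphrased, not an exact quote of the Rails source):

# active_record/connection_adapters/mysql2_adapter.rb
gem 'mysql2', '~> 0.3.13'
require 'mysql2'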

This will get executed only if you are requiring the mysql2_adapter. If you are, it’ll try to load the `mysql2` gem with that version spec, which in that version of the mysql2_adapter is `~> 0.3.13`. If the “optional dependency” is not loaded at all (because you didn’t include it in your Gemfile), or a version is loaded that doesn’t match those version requirements, it’ll raise a Gem::LoadError, which Rails catches and re-raises with a somewhat better message:

https://github.com/rails/rails/blob/dfb89c9ff2628da9edda7d95fba8657d2fc16d3b/activerecord/lib/active_record/connection_adapters/connection_specification.rb#L175

Of course, this leads to a problem many of us have run into over the past two days, since mysql2 0.4.0 was released.  The generated (or recommended) Gemfile for a Rails app using mysql2 includes an unconstrained `gem "mysql2"`. So Bundler is willing to install and use the newly released mysql2 0.4.0.  But then the mysql2_adapter is not willing to use 0.4.0, and ends up raising the somewhat confusing error message:

Specified 'mysql2' for database adapter, but the gem is not loaded. Add `gem 'mysql2'` to your Gemfile (and ensure its version is at the minimum required by ActiveRecord).

In this case, mysql2 0.4.0 would in fact work fine, but the mysql2_adapter isn’t willing to use it. (As an aside, why the heck isn’t the mysql2 gem at 1.0 yet and using semver?)  As another aside, if you run into this, until Rails fixes things up, you need to modify your app Gemfile to say `gem 'mysql2', '< 0.4.0'`, since Rails 4.2.4 won’t use 0.4.0.

The error message is confusing, because the problem was not a minimum specified by ActiveRecord, but a maximum.  And why not have the error message more clearly tell you exactly what you need?

Leaving aside the complexities of what Rails is trying to do and the right fix on Rails’ end, if I need an optional dependency in the future, I think I’d follow Rails’ lead, but improve upon the error message:

begin
   gem 'some_gem', "~> 1.4.5"
rescue Gem::LoadError => e
   raise Gem::LoadError, "You are using functionality requiring 
     the optional gem dependency `#{e.name}`, but the gem is not
     loaded, or is not using an acceptable version. Add 
     `gem '#{e.name}'` to your Gemfile. Version #{MyGem::VERSION}
     of my_gem_name requires #{e.name} that matches #{e.requirement}"
end

Note that the Gem::LoadError includes a requirement attribute that tells you exactly what the version requirements were that failed. Why not include this in the message too, to make it somewhat less confusing?

Except I realize we’re still creating a new Gem::LoadError, without those super useful `name` and `requirement` fields filled out. Our newly raised exception probably ought to copy those over properly too. Left as an exercise to the reader.
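
Something like this, perhaps (a sketch; `Gem::LoadError` exposes `name` and `requirement` as writable attributes, so we can copy them onto the re-raised error):

begin
   gem 'some_gem', "~> 1.4.5"
rescue Gem::LoadError => e
   better = Gem::LoadError.new(
     "You are using functionality requiring the optional gem dependency " \
     "`#{e.name}`, but it is not loaded or does not match #{e.requirement}. " \
     "Add `gem '#{e.name}', '#{e.requirement}'` to your Gemfile."
   )
   better.name        = e.name
   better.requirement = e.requirement
   raise better
end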

I may try to submit a PR to Rails to include a better error message here.

Optional dependencies are still not a good idea. They lead to confusingness like Rails ran into here. But sometimes you really do want to do it anyway, it’s not as bad as the alternatives. Doing what Rails does seems like the least worst pattern available for this kind of optional dependency: Use the runtime `gem` method to specify version constraints for the optional dependency; catch the `Gem::LoadError`;  and provide a better error message for it (either re-raising or writing to log or other place developer will see an error).


Memories of my discovery of the internet

As I approach 40 years old, I find myself getting nostalgic and otherwise engaged in memories of my youth.

I began high school in 1989. I was already a computer nerd, beginning from when my parents sent me to a Logo class for kids sometime in middle school; I think we had an Apple IIGS at home then, with a 14.4 kbps modem. (Thanks Mom and Dad!).  Somewhere around the beginning of high school, maybe the year before, I discovered some local dial-up multi-user BBSs.

Probably from information on a BBS, somewhere probably around 1994, me and a friend discovered Michnet, a network of dial-up access points throughout the state of Michigan, funded, I believe, by the state department of education. Dialing up Michnet, without any authentication, gave you access to a gopher menu. It didn’t give you unfettered access to the internet, but just to what was on the menu — which included several options that would require Michigan higher ed logins to proceed, which I didn’t have. But also links to other gophers which would take you to yet other places without authentication. Including a public access unix system (which did not have outgoing network connectivity, but was a place you could learn unix and unix programming on your own), and ISCABBS. Over the next few years I spent quite a bit of time on ISCABBS, a bulletin board system with asynchronous message boards and a synchronous person-to-person chat system, which at that time routinely had several hundred simultaneous users online.

So I had discovered The Internet. I recall trying to explain it to my parents, and that it was going to be big; they didn’t entirely understand what I was explaining.

When visiting colleges to decide on one in my senior year, planning on majoring in CS, I recall asking at every college what the internet access was like there, if they had internet in dorm rooms, etc. Depending on who I was talking to, they may or may not have known what I was talking about. I do distinctly recall the chair of the CS department at the University of Chicago telling me “Internet in dorm rooms? Bah! The internet is nothing but a waste of time and a distraction of students from their studies, they’re talking about adding internet in dorm rooms but I don’t think they should! Stay away from it.” Ha. I did not enroll at the U of Chicago, although I don’t think that conversation was a major influence.

Entering college in 1993, in my freshman year in the CS computer lab, I recall looking over someone’s shoulder and seeing them looking at a museum web page in Mozilla NCSA Mosaic — the workstations in the lab were unix X-windows systems of some kind, I forget what variety of unix. I had never heard of the web before. I was amazed, I interrupted them and asked “What is that?!?”. They said “it’s the World Wide Web, duh.”  I said “Wait, it’s got text AND graphics?!?”  I knew this was going to be big. (I can’t recall the name of the fellow student a year or two ahead who first showed me the WWW, but I can recall her face. I do recall Karl Fogel, who was a couple years ahead of me and also in CS, kindly showing me things about the internet on other occasions. Karl has some memories of the CS computer lab culture at our college at the time here, I caught the tail end of that).

Around 1995, the college IT department hired me as a student worker to create the first-ever experimental/prototype web site for the college. The IT director had also just realized that the web was going to be big, and while the rest of the university hadn’t caught on yet, he figured they should do some initial efforts in that direction. I don’t think CSS or JS existed yet then, or at any rate I didn’t use them for that website. I did learn SQL on that job.  I don’t recall much about the website I developed, but I do recall one of the main features was an interactive campus map (probably using image maps).  A year or two or three later, when they realized how important it was, the college Communications unit (ie, advertising for the college)  took over the website, and I think an easily accessible campus map disappeared not to return for many years.

So I’ve been developing for the web for 20 years!

Ironically (or not), some of my deepest nostalgia these days is for the pre-internet, pre-cell-phone society; even most of my university career pre-dated cell phones, and if you wanted to get in touch with someone you called their dorm room, or maybe left a message on their answering machine.  The internet, and then cell phones, eventually combining into smart phones, have changed our social existence truly immensely, and I often wonder these days if it’s been mostly for the better or not.


bento_search 1.4 released

bento_search is a ruby gem that provides a standardized ruby API and other support for querying external search engines with HTTP APIs, retrieving results, and displaying them in Rails. It’s focused on search engines that return scholarly articles or citations.

I just released version 1.4.

The main new feature is a round-trippable JSON serialization of any BentoSearch::Results or Items. This serialization captures internal state, suitable for a round-trip, such that if you’ve changed configuration related to an engine between dump and load, you get the new configuration after load.  Its main use case is a consumer that is also ruby software using bento_search. It is not really suitable for use as an API for external clients, since it doesn’t capture full semantics, but just internal state sufficient to restore to a ruby object with full semantics. (bento_search does already provide a tool that supports an Atom serialization intended for external client API use.)

It’s interesting that once you start getting into serialization, you realize there’s no one true serialization, it depends on the use cases of the serialization. I needed a serialization that really was just of internal state, for a round trip back to ruby.

bento_search 1.4 also includes some improvements to make the specialty JournalTocsForJournal adapter a bit more robust. I am working on an implementation of JournalTocs fetching that needed the JSON round-trippable serialization too, for an Umlaut plug-in. Stay tuned.
