Ex Libris URM — post-ILS ILS?

I am at the ELUNA (Ex Libris users' group) conference, and just saw a presentation on Ex Libris strategy from Oren Beit-Arie, chief strategy officer of Ex Libris, and Catherine [someone], URM project manager.

I was quite impressed. The URM strategy is basically Ex Libris’ vision for a new kind of ILS (post-ILS ILS?).  [I talked about this before after last year’s ELUNA] I hope they make the slides and such from this presentation public, because I can’t quite cover what they did. They showed basically a structural diagram of the software (fairly consistent with what I wrote on future directions of library systems), and some mock-ups of staff user interfaces for workflow.

But my basic summary is:  Yes! This is indeed a vision for an ILS that makes sense. They get it. I imagine most code4libbers if they saw this presentation would agree, wow, yeah, that’s the way an ILS should actually work:  supporting staff workflow in an integrated way that actually makes sense, modular and componentized, full of APIs and opportunities for customer-written plugins, talking to various third-party software (from vendors (of various classes) to ERP software etc.), etc etc.

The vision was good. What’s the execution going to be?  So enough praise, let’s move on to questions, concerns, and some interesting implications.


The timelines given were pretty ambitious.  Did they really mean to say that those mock-ups of staff interfaces Catherine showed are planned to be reality by the end of 2010?  The software is ambitious enough to begin with, but on top of that the mock-ups shown were heavily dependent on being able to talk to external software via web services, and getting other people’s software that may or may not have those services available to do what’s needed in that time too… I’m a bit skeptical. [And with the staffing required to pull this off and based on pricing for other products… I would have to predict that end pricing is going to be in the mid six figures].

On top of that, when Oren supplied his timeline there seemed to be a bit of sleight of hand going on that confused me. He said that the next version of SFX was expected at the end of 2009, and with that lit up a bunch of boxes on his structural diagram that he said this release of SFX would fulfill.  If SFX is really going to fill those boxes for the integrated and modular architectural vision he outlined (with rationalized workflow and data management not based on the existing borders of silos that exist for historical reasons, borders which SFX definitely exhibits—and SFX is not known for a staff interface that supports rational workflow)…

…then either SFX is going to become quite different software than it is now (by the end of 2009?)—or the execution is going to be significantly less than the vision offered.

Network-level metadata control?

Part of the vision involved a network-level metadata store, with individual libraries linking holdings to truly shared metadata records. (Catherine at one point said "MARC… er… descriptive records." That was an actual slip of the tongue, not one made intentionally for effect. I suspect she intended to avoid saying "MARC" at all, so as not to bring up that issue. Hm.)  The example used was that 4200 libraries use essentially the same record for Freakonomics, and they all manage it separately, and when they enhance it, their enhancements are seldom shared–and this makes no sense.

We all know this vision makes sense. We also all know that this is Ex Libris challenging OCLC's bailiwick.  And, I will say with some sadness, the fact that this vision sounds so enticing and is so different from what we've got is kind of an indictment of OCLC's stewardship of what they do. This is how OCLC should be working, we all know. So Ex Libris has a vision to make it work.

How is this going to interfere with OCLC?  Where are they going to get all these records from?  In the flowchart of where these records would come from, libraries were identified. Can libraries legally share these records with an Ex Libris central metadata store? Will OCLC let them? (Not if they can help it!)   The screenshots of staff workflow imply that when a new acquisition comes in (or really, even at the suggestion/selection level), a match will immediately be made to a metadata record in the central system—this implies the central system will have metadata records for virtually everything (i.e., like Worldcat).  Can they pull this off?

If they DO pull it off, it'll be a great system—and it will divide library metadata into several worlds: some libraries using a central metadata system provided by a for-profit vendor, others using the OCLC cooperative, others using… what?  It'll be sad to me to fragment the shared cataloging corpus like this, divide it along library 'class' lines, and surrender what had been owned by the collective library community through a non-profit cooperative to a for-profit vendor.

On the other hand, lest we forget, the shared metadata world really is already divided on "class" lines–many libraries cannot afford to live in the OCLC world (although I suspect those same libraries will not be able to afford to live in the Ex Libris shared metadata world either).   And if OCLC actually acted more like the non-profit cooperative representing the collective interests of the library sector, as it claims to be… it would be even more distressing that it is not succeeding in supplying the kind of shared metadata environment that a for-profit vendor is envisioning challenging them with.

True Modularity?

Oren suggested that their vision was openness, integration, and modularity. The implication to me is that I should be able to mix and match components of this envisioned URM architecture with other non-Ex Libris (proprietary or open source) components.

Is this really going to be?  As I in fact said last year after the introduction of the URM strategy, this URM strategy is technically ambitious even without this kind of true mix-and-match modularity.  To add that in a realistic way makes it even more ambitious.  And to what extent does it really meet Ex Libris' business interests? (I'm not suggesting they are misleading us about the goal, but when your plan is too ambitious to meet the timelines you need to stay in business… what's going to be the first thing to drop?)

For instance, to go back to our Metadata discussion, if Worldcat (or LibLime, or anyone else) did provide a central metadata repository with the kind of functionality envisioned there, and further provided a full set of web services (both read and write) for this functionality… could I use Worldcat in this URM vision instead of the Ex Libris centralized metadata repository?  (by 2010?).   Really?

For another example, Primo is in some ways written quite satisfactorily to be inter-operable with third-party products. But what if I want to buy just the "eShelf" function of Primo (because it's pretty good), and use someone else's discovery layer with it?  What if I want to buy Primo without the eShelf and use some other product for personal collection/saved record/eShelf functionality?  Right now I can't. How truly "modular mix-and-match" something is depends on where you draw the lines between modules, doesn't it?

[If Ex Libris really were interested in prioritizing this kind of mix-and-match modularity, I’d encourage them to explore using the Evergreen OpenSRF architecture in an Evergreen-compatible way. But, yes, to do this in a real way would take additional development resources in an already ambitious plan.]

[And in Primo’s defense, if I wanted to use Primo with a third-party “eShelf”, or use the Primo eShelf with a third party discovery layer, Primo’s configuration and customizability would _probably_ support this.  The former might even make financial sense with no discount—buying Primo just for the eShelf obviously would not.  As more and more complex features are added, however, will they be consistently modularized to even allow this technically? It’s definitely not a trivial technological project.]

May you live in interesting times

If nothing else, it's disturbingly novel to be presented with a vendor who seems to really get it. They are talking the talk (and mocking up the interface mock-ups) that matches where library software really does need to go.

Will they pull it off?  If they do, will it really be open in the modular mix-and-match way we all know we need to avoid vendor lock-in and continue to innovate?  Will we be able to afford it? We will see.

I would think the strength of this vision would light a fire under their competitors (including OCLC, and maybe the open source supporters too), spurring more rational vision-making from others in the library industry, making it more clear (if it wasn't already) that keeping on going the way you can keep on going is not an option.  I hope so. (On the other hand, Ex Libris is clearly targeting a library customer field that can afford it, in several definitions of 'afford'. Do other vendors think they can keep on keeping on by targeting different markets?)


Free Covers? From Google?

Tim Spalding writes about Google Book Search API, cover availability, and terms of service:

In NGC4Lib:

Basically, I’ve been told [I can’t help but wonder: Told by whom? -JR] that I was wrong to promote the use of GBS for covers in an OPAC, that the covers were licensed from a cover supplier-ten guesses!-and should not have allowed this, and that the new GBS API terms of service put the kibosh on the use I promoted.

And on his blog:

The back story is an interesting one. Soon after I wrote and spoke about the covers opportunity, a major cover supplier contacted me. They were miffed at me, and at Google. Apparently a large percentage of the Google covers were, in fact, licensed to Google by them. They never intended this to be a "back door" to their covers, undermining their core business. It was up to Google to protect their content appropriately, something they did not do. For starters, the GBS API appears to have gone live without any Terms of Service beyond the site-wide ones. The new Terms of Service is, I gather, the fruit of this situation.

Now, I am not surprised. As soon as I heard the Google staff person on the Talis interview implying that Google had no problem with use of cover images in a library catalog application, I knew that something would come through the pipeline to put the kibosh on that. Not least because I too had had backchannel communications with a certain Large Library Vendor, about Amazon, where they revealed (accidentally, I think) that they had had words with Amazon about Amazon's lack of terms of service for their cover images. Even then, I wondered exactly what the legal situation was, in the opinion of this Large Vendor, of Amazon, or of any other interested parties.

More questions than answers

But here’s the thing. When I read GBS’s new Terms of Service looking for something to prevent library catalog cover image use… I don’t see anything. And if there WAS going to be something, what the heck would it even look like anyway?

Amazon tried to put the kibosh on this by having their terms of service say “You can only use our whole API if the primary purpose of your website is to sell books from Amazon.” Making not just cover usage, but really any use of the Amazon API by libraries pretty much a violation. If Google goes _that_ route, it’d be disastrous for us.

But I doubt they would–not even just trying to limit cover image usage by those terms–because, first of all, they intended from the start for this service to be used by libraries, and had development partners from the start that included libraries and library vendors. Secondly, what would the equivalent be for Google? You can only use this service if your primary business is sending users to Google full text? Ha! That's nobody's primary business!

What terms could possibly restrict us anyway?

Without restricting what Google was trying to do in the first place.

And besides, the whole point of the GBS API having cover images was to let people put a cover image next to a link to GBS. The utility of this is obvious.

But isn’t that exactly what we library catalog users are doing, no more and no less? So what could their terms say?

“you are allowed to use these covers next to a link to GBS, but ONLY if you are not a library or someone else who is the target market for Large Library Market Vendors. You can only use it if Large Library Market Vendor is NOT trying to sell you a cover image service.”

Or, “you can only use it if you’re not a library.”

Can they really get away with that? Just in terms of PR, especially since, unlike Amazon, they get most of their content from library partners?

I know the Major Library Vendors want to keep us on the hook for paying them Big Money for this service. And they’re the same ones selling these images to Google. But it’s unclear to me what terms could possibly prevent us from using the covers, while allowing the purposes that Google licensed them for in the first place.

And what’s the law, anyway?

Then we get to the actual legal issues. To begin with, I'm not sure a "terms of service" that you do NOT in fact need to "click through" to use the service—and thus never had to have READ to use the service—is enforceable at all. But they could fix that by requiring an API key for the GBS API, and making you click through to get the key.

But the larger issue is that the legal situation around cover image usage is entirely unclear to begin with.

I remain very curious what the Large Library Vendor's agreements with the publishers (who actually own the intellectual property for cover images, generally) are, and what makes them think they have an exclusive right to provide libraries with this content. It also remains an unanswered question exactly what "fair use" rights we have with cover images. Of course, that's all moot if you have a license agreement with your source of cover images, which trumps fair use (thus the "terms of service"). But again, I don't see anything in the terms of service to prevent cover image use by libraries.

Search hints/related search?

So Google and Yahoo both sometimes offer "related" searches, in a nice AJAXy popup.

I don't have time to find an example to show you, but I think most of you have seen it with Google at least. The Firefox Google OpenSearch toolbar, for instance: I put in "library" and in a popup it suggests "library of congress; librarything; library thing; library journal" etc. Maybe that wasn't the best example, but sometimes this is useful.

It strikes me that it would be really nice to have a similar feature in our various library search functions (including catalog and federated search?). First thought is, gee, can I just use the Yahoo and/or Google APIs to do this? But I seriously doubt that would be consistent with either of their Terms of Service: to use this service for something that has nothing to do with Google/Yahoo and isn't going to lead to a search of Google/Yahoo, but instead use these suggestions for a search of our own content.

So, that gets me thinking, how do you do this? Obviously Google and Yahoo are coming up with these suggestions by analyzing their own data—either their corpus of indexed stuff, their query logs, or likely a combination of both. Anyone know if there are any public basic algorithms for doing this kind of thing? Anyone have enough "information retrieval" knowledge to hazard a guess as to what sorts of algorithms are used for this? How would we go about adding this to our own apps?
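Lacking any published recipe, here's a minimal sketch (Ruby, for illustration) of about the simplest thing that could possibly work: mine your own query logs for queries that co-occur within a session, and suggest the most frequent co-occurring ones. The QUERY_LOG structure and the session grouping are just stand-ins for whatever your logs actually look like.

# A crude "related search" suggester built on query-log co-occurrence.
# QUERY_LOG is a placeholder for your own search logs.
QUERY_LOG = [
  {:session => 1, :query => "library of congress"},
  {:session => 1, :query => "library of congress classification"},
  {:session => 2, :query => "librarything"},
  {:session => 2, :query => "library thing api"}
]

def related_queries(log)
  by_session = Hash.new { |h, k| h[k] = [] }
  log.each { |row| by_session[row[:session]] << row[:query] }

  # Count how often each pair of queries shows up in the same session.
  cooccur = Hash.new { |h, k| h[k] = Hash.new(0) }
  by_session.each_value do |queries|
    queries.each do |q|
      (queries - [q]).each { |other| cooccur[q][other] += 1 }
    end
  end
  cooccur
end

def suggest(query, cooccur, limit = 5)
  cooccur[query].sort_by { |other, count| -count }.first(limit).map { |other, count| other }
end

cooccur = related_queries(QUERY_LOG)
puts suggest("librarything", cooccur).inspect   # => ["library thing api"]

The real services presumably also mine the corpus itself and do much smarter normalization and spell-correction, but even something this crude gets you "people who searched X also searched Y."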

Update: It also occurs to me that this would be ANOTHER natural service for OCLC to provide. To provide "related search" suggestions well, you need a good corpus and some data mining. OCLC has a giant corpus of not only book metadata, but search query history from their database offerings.  An OCLC "search suggestion" API where you give it a query, and it gives you search suggestions, which you are licensed to use in any search your library has? I'd recommend my library pay for that, if the price was right.  A natural service from OCLC.


Tagging and motivation in library catalogs?

Eh, this comment was long enough I might as well post it here too, revised and expanded a bit. (I've been flagging on the blogging lately.) Karen Schneider thinks about "tagging in a workflow context":

Tagging in library catalogs hasn’t worked yet for a number of reasons…

Karen goes on to discuss much of the ‘when’ of tagging, but I still think the ‘why’ of tagging is more relevant. Why would a user spend their valuable time adding tags to books in your library catalog?

I think the vast majority of successful tagging happens when users tag to aid their OWN workflow. Generally to keep track of things. You tag on Delicious to keep track of your bookmarks. You tag on LibraryThing to organize your collections. The most successful tagging isn't done to help _other_ people find things, but to keep track of things yourself–at least not at first, not the tagging that builds the successful tag ecology. In most cases of a successful tagging community where people do tag to help others find things, I'd suggest it's because it somehow benefits them personally to help people find things. Such as, maybe, tagging your blog posts on wordpress.com because you want others to find your blog posts–still a personal benefit.

A successful tag ecology is generally built on tagging actions that serve very personal interests which do not need the successful tagging ecology on top of it: interests served even if you are the only one who is tagging. The successful tagging ecology which builds out of it–and which goes on to provide collective benefit that was not the original intent of the taggers–is an epiphenomenon.

Amazon might be a notable exception to this hypothesis, perhaps because it was already such a universally used service before tagging (unlike our library catalogs).  I would be interested to understand what motivates users to tag in Amazon. Anyone know of anyone who's looked into this? It's also possible that if Amazon's tags are less useful, it is in fact because of this lack of personal benefit from tagging.

So what personal benefit can a user get from tagging in a library catalog? If we provided better 'saved records' features, perhaps: keep track of books you've checked out, books you might want to check out, etc. But I'm not sure our users actually USE our catalogs enough to find this useful, no matter how good a 'saved records' feature we provide. In an academic setting, items from the catalog no longer necessarily make up a majority of a user's research space.

To me, that suggests: can we capture tags from somewhere else? My users export items to RefWorks. Does RefWorks allow tagging yet? If it did, is there a way to export (really, re-import) these tags BACK to the catalog when a user tags something? But even if so, it would be better if RefWorks somehow magically aggregated tags from _different_ catalogs for the same work. But that relies on identifier issues we haven't solved yet. If our catalogs provided persistent URLs (which they usually don't, which is a tragedy), users COULD tag in Delicious if they wanted to. Is there a way to scan Delicious for any tags including your catalog's URLs, and import those back in?
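To make that last idea concrete, here's a rough sketch of what "scan a bookmarking service for tags on your catalog's URLs and pull them back in" might look like. It assumes you can get (url, tags) pairs out of the service somehow (an export file, a feed, whatever); the permalink pattern and the persistence call here are hypothetical.

# Match bookmarked URLs against our catalog's (hypothetical) permalink pattern
# and import the attached tags. `bookmarks` stands in for data pulled from a
# bookmarking service export or feed.
PERMALINK_PATTERN = %r{https?://catalog\.example\.edu/record/(\d+)}

def import_bookmark_tags(bookmarks)
  bookmarks.each do |bm|
    match = PERMALINK_PATTERN.match(bm[:url])
    next unless match                 # not one of our catalog permalinks
    record_id = match[1]
    bm[:tags].each do |tag|
      # Hypothetical persistence step; in Rails this might be something like
      # CatalogRecord.find(record_id).tags.find_or_create_by_name(tag)
      puts "record #{record_id}: add tag #{tag.inspect}"
    end
  end
end

import_bookmark_tags([
  {:url => "http://catalog.example.edu/record/12345", :tags => ["freakonomics", "econ201"]},
  {:url => "http://example.com/unrelated",            :tags => ["ignore me"]}
])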

In addition to organizing one's research and books/items of interest, are there other reasons it would serve a patron's interest to tag, other things they could get out of it?  A professor might tag books of interest for their students, perhaps (not that most professors are looking for more technological things to spend time on to help students, but some are).   And librarians themselves might tag things with non-controlled-vocabulary topic areas they know would be of use to a particular class or program or department, using terms meaningful to those classes or programs or departments.  Can anyone think of any other reasons tagging could be of benefit to a user (not whether a successful tagging ecology would be of collective benefit, but benefits an individual user can get from assigning tags in a library catalog)?

Worldcat covers a much larger share of my academic users' research universe than my own catalog. And Worldcat has solved the "aggregating different copies of this work from different libraries" problem to some extent. Which is why it would make so much sense for Worldcat to offer a tagging service–which can be easily incorporated into your own local catalog for both assigning and displaying tags (if not for searching), à la LibraryThing. It is astounding to me that OCLC hasn't provided this yet. It seems to be very 'low hanging fruit' (a tagging interface on worldcat.org with a good API is not rocket science) that is worth a try.


More on open access discoverability

This is worth pulling out into a post of its own. Thanks to Dorothea Salo for the comments on the post where I broached this issue sort of in passing. Good to know that I'm indeed not alone in worrying about this stuff.

But there are actually a few different (but related) issues Dorothea has identified here, some of which aren’t a problem for my projects at all, others of which are. Let’s analyze them out:

1. Some faculty are unwilling to publish open access.

This might be a problem, but despite it there's plenty of free-web publicly accessible scholarly content available. (I use this phrase because the specific licensing might be unclear, but an unauthenticated user can get it on the web.) I'm thinking specifically about so-called preprint/postprint publicly accessible versions of articles that also appear in not-open-access journals. There's lots of it. This is in fact what motivates my desires in the first place.

2. Some repository software doesn’t allow control of access to the level desired by repository managers.

This might be a problem too, but despite it, most supposedly "open access" repositories do contain material that the repository does not in fact make available to the general unauthenticated public! So the software might not be flexible enough, but repositories are often restricting access to some of their contents anyway, and including metadata for those restricted items in the general OAI-PMH feed, without any predictable machine-readable way to tell that it is in fact restricted content.

So it’s in fact the ability of many repositories now to restrict content that brings me to my issue:

3. I have no way to identify the universe of actually publically accessible ‘open access’ scholarly content.

Even if I created an aggregate index of OAI-PMH feeds from all "open access" repositories—it would include content which is not viewable by an unauthenticated user! What I want to do in my software is: I have a known-item citation, and I want to tell the user if there's a publicly viewable copy of this citation online. I have no way to find/identify such a copy, though! I have no way to weed out the stuff that isn't really publicly accessible. I don't want to send the user to something they can't access—some repositories listed in DOAR actually have the majority of their items (in the OAI-PMH feed) not available to the unauthenticated off-campus user!

So 1 and 2 might be issues in general, but aren't what's providing the roadblock for me. 3 is. There are a couple of other issues worth noting: one that is an inconvenience (but not a roadblock) for my project, and one that isn't really an issue for my project at all.

4. Difficulty of identifying articles in repositories matching a citation.

When I experimentally tried doing a search against OAIster (before I realized that OAIster didn't even limit itself to so-called open access repositories; and before I realized that even open access repositories weren't)—I had to do a search based just on title and author keywords. It would be better if I could search based on an identifier (DOI or pmid) when present—or based on structured publication data for the actual publication of the pre/postprint: ISSN, vol, issue, page number. But these things aren't available in the OAI-PMH feed, and in fact probably aren't even in most repositories' metadata. Most repository metadata doesn't try to connect a pre- or postprint to the actual published version in any way.

This is annoying, but I found that author/title keyword search worked well enough to be useful even without this, so it wasn't a roadblock.
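For the curious, the matching I'm talking about amounts to something like the sketch below: normalize the title and author strings and score the overlap. The record hashes stand in for whatever a harvest or OAIster-style search actually returns, and the scoring weights are arbitrary illustration values.

# Crude known-item matching against harvested metadata, by normalized
# title-word overlap plus an author-name bonus.
def normalize(text)
  text.to_s.downcase.gsub(/[^a-z0-9\s]/, " ").split
end

def match_score(citation, record)
  title_words = normalize(citation[:title])
  overlap = (title_words & normalize(record[:title])).size
  score = title_words.empty? ? 0.0 : overlap.to_f / title_words.size

  cite_names = citation[:authors].map { |a| normalize(a) }.flatten
  rec_names  = record[:creators].map  { |a| normalize(a) }.flatten
  score += 0.25 unless (cite_names & rec_names).empty?   # any author-name word in common
  score
end

citation = {:title => "The economics of open access publishing",
            :authors => ["Jane Q. Scholar"]}
candidate = {:title => "Economics of open-access publishing (preprint)",
             :creators => ["Scholar, Jane Q."]}

puts match_score(citation, candidate)   # comfortably above, say, a 0.7 cutoff => probable match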

5. Might be publicly accessible, but is it open access?

This gets at what the SPARC/DOAJ initiative is trying to solve. Okay, I'm a reader, I can look at this article online on the free web, but what am I allowed to do with it? Am I allowed to reproduce it? This matters to readers and is a real issue, but doesn't in fact matter to my project. All I care about is whether I can show them the full text on the public web—once I can do that, I can worry about helping them understand the license and their access rights, but first I need to help them discover the article in the first place!


Google feature changes; open access discoverability

So, I've found out about a couple of new things from Google I hadn't known about. (Google is such a prominent player in our space, we need to keep up with what's going on there so we know how to exploit it to maximum effect. I need to remember to go explore Google's interfaces and documentation more regularly to see changes.)  1. The Google search API now allows server-side access. 2. Google search allows a limit on usage license.  And both of these things got me started on open access discoverability again.

1. Google API allows server-side access!

Thanks to Kent Fitch for alerting us on the code4lib listserv.


“For Flash developers, and those developers that have a need to access the AJAX Search API from other Non-Javascript environments, the API exposes a simple RESTful interface….

“An area to pay special attention to relates to correctly identifying yourself in your requests. Applications MUST always include a valid and accurate http referer header in their requests. In addition, we ask, but do not require, that each request contains a valid API Key.”

This is huge. I've complained before about how it was difficult to incorporate Google features into my own service-oriented software in a maintainable way when only client-side JavaScript AJAX functions were allowed.
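To show what I mean, here's a minimal sketch of calling that RESTful interface from server-side Ruby. The endpoint and the responseData/results shape are my reading of the AJAX Search API documentation (double-check it yourself), and note the requirement quoted above about sending an accurate referer; the referer URL and key below are placeholders.

# Server-side call to the AJAX Search API's REST interface (sketch).
require 'rubygems'
require 'net/http'
require 'uri'
require 'cgi'
require 'json'

def google_web_search(query, referer, api_key = nil)
  params = "v=1.0&q=#{CGI.escape(query)}"
  params += "&key=#{api_key}" if api_key
  uri = URI.parse("http://ajax.googleapis.com/ajax/services/search/web?#{params}")

  response = Net::HTTP.start(uri.host, uri.port) do |http|
    # They require a valid, accurate Referer header on every request.
    http.get("#{uri.path}?#{uri.query}", "Referer" => referer)
  end
  data = JSON.parse(response.body)
  (data["responseData"] || {})["results"] || []
end

results = google_web_search("umlaut link resolver", "http://blog.example.edu/")
results.each { |r| puts "#{r['titleNoFormatting']} -- #{r['url']}" }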

Now if only they'd do the same thing for the Google Books Discoverability API. That's where I really need it; it's still not clear to me how I might usefully incorporate automated general Google search (including Google Scholar) into my library applications dealing with scholarly materials, because of the high chance that what Google returns will be for-pay and not available to my users: I don't want to show them that.

So it was with interest I noticed a new feature:

2. Google search supports usage rights limit

Take a look at the Google advanced search page. Click on “Date, usage rights, numeric range, and more”. Look, there’s a “usage rights” limit which filters by CC licenses. When did that show up?  Of course, it can only include things in the filter that advertise a CC license in a way that Google’s bots can recognize. (Not sure how this is done, Google doesn’t say; I think I recall there’s a standard CC-endorsed way to do this?).

Unfortunately, some initial test searches revealed that this is a tiny piece of the actual open access pie.  Many scholarly materials that ARE available online open access are not in fact in Google's indexes. Probably because they don't advertise it properly in a machine-readable way? Still, this is a great step by Google, and indicates that Google recognizes users are increasingly having trouble with getting too much restricted content in their Google search results.

But my frustration remains with the scholarly open access community. If the problem is that open access repositories aren't advertising CC licenses properly, why aren't these software packages (many of them open source) being fixed? Why isn't there a general, concerted, funded effort from the open access repository community to solve the general problem? And the general problem is that there's no good place to search aggregated open access content and ONLY open access content, to use in software that wants to answer the question "Is there an open access version of the article with this title and author available?" There's no good way to do it. And this lack of discoverability is a huge problem for the utility of the existing open access repository domain. I don't understand why there isn't more concerted effort to solve it.

Although, in fairness, I did recently become aware of a European initiative, apparently actually funded, to address at least part of this issue.  Registering in machine-readable format whether content is open access is the first step to building aggregated indexes. (It's a dirty secret of the 'open access repository' domain that much of the content in so-called "open access repositories" is not in fact open access at all; it's behind IP- and password-based restrictions. A cursory sample of items in repositories listed in OpenDOAR–whose collection policies say that a reason for EXCLUSION from OpenDOAR is "Site requires login to access any material (gated access) – even if freely offered"–will reveal that that collection policy is quite often honored in the breach.) I guess DOAJ has less of a problem with that, and the SPARC/DOAJ initiative is just about DOAJ, so it's not clear to me that the SPARC project will really address my problem.  I guess the SPARC project is about people not being sure whether they can re-use material in DOAJ journals—my problem is being able to do a meta-search limited to publicly available open access content in the first place, and I don't care if it's licensed for re-use, I just want to find only the stuff that is actually viewable online for free!

Hmph.   What can we do to get the open repositories communities to take note of this problem and address resources toward it?


Rails debugging

I know other Rails devs read this blog. I LOVE ruby-prof.  It rocks. You have to use the 'graph' profile to really get its power; in default mode it doesn't do much.  I haven't even tried it yet with KCachegrind visualization; I haven't had the energy to go over THAT learning curve. Like everything else in the Rails world, there's a bit of a learning curve to figure it out–for me, mainly in finding the right documentation, which was that excellent blog post referenced above. After that, it flowed smoothly.
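For anyone who hasn't tried it, here's the minimal shape of getting graph output, as I understand the ruby-prof interface (the profiled block is just a placeholder):

# Profile a suspect chunk of code and print the call-graph report.
require 'rubygems'
require 'ruby-prof'

result = RubyProf.profile do
  # ... the code you suspect, e.g. the guts of your slow action ...
  1000.times { "umlaut".reverse }
end

# The default flat report tells you little; the GraphPrinter shows
# callers/callees, which is where the payoff is.
printer = RubyProf::GraphPrinter.new(result)
printer.print(STDOUT)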

It's really helping me figure out where the bottlenecks are in Umlaut's resolve action.

The query_trace plugin is pretty great too.

And, in that vein, I still don’t understand how some of my fellow coders get along without ruby-debug. But if I were better at conscientiously writing the unit tests I should be writing, maybe I wouldn’t be using ruby-debug so much.
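For anyone who hasn't used ruby-debug, the minimal version is just: drop a debugger call where you want to stop and run with the gem loaded (in Rails at the time, starting the server with the --debugger flag, if I recall correctly). The resolve method below is only a stand-in.

require 'rubygems'
require 'ruby-debug'

def resolve(request)
  debugger   # execution pauses here; inspect `request`, step, continue, etc.
  # ... the code you're puzzling over ...
end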


Rails gotcha — assigning relationships

You’ve got Employees and Departments. Each Employee has one Department, each Department has many Employees. Very many.  Let’s say thousands, or even tens of thousands.

So you want to create a new Employee and assign it to a Department.

dept = Department.find_existing_dept_somehow  # existing one, fetched from the db

employee = Employee.new(some_attributes)      # newly created, not yet saved

Now you have two choices:

1: dept.employees << employee
2: employee.department = dept

Either way you end with: employee.save!

Those might look equivalent, but I think the first ends up being a huge performance problem. I believe that is because the first call will end up requiring a fetch of all the department's employees (thousands or more) before adding the new employee to it–and possibly doing an implicit save of one or more objects too.  The second never forces the potentially expensive fetch of all the department's employees. But I'm just guessing here. All I know is that when I changed the #1 style to the #2 style, I erased one mysterious performance hit in my app.


Think you can use Amazon api for library service book covers?

Update 19 May 2008: See also Alternatives To Amazon API, including prudent planning for if Amazon changes its mind.

Update: 17 Dec 2008: This old blog post is getting a LOT of traffic, so I thought it important to update it with my current thoughts, which have kind of changed.

Lots of library applications out there are using Amazon cover images, despite the ambiguity (to be generous; or you can say prohibition if you like) in the Amazon ToS.  Amazon is unlikely to care (it doesn’t hurt their business model at all). The publishers who generally own copyright on covers are unlikely to care (in fact, they generally encourage it).

So who does care, why does Amazon’s ToS say you can’t do it?  Probably the existing vendors of bulk cover image to libraries. And, from what I know, my guess is that one of them had a sufficient relationship with Amazon to get them to change their terms as below. (After all, while Amazon’s business model isn’t hurt by you using cover images for your catalog, they also probably don’t care too much about whether you can or not).

Is Amazon ever going to notice and tell you to stop? I doubt it. If that hypothetical existing vendor notices, do they even have standing to tell you to stop? Could they get Amazon to tell you to stop? Who knows.  I figure I’ll cross that bridge when we come to it.

Lots of library apps are using Amazon cover images, and nobody has formally told them to stop yet. Same for other Amazon Web Services other than covers (the ToS doesn’t just apply to covers).

But if you are looking for a source of cover images without any terms-of-service restrictions on using them in your catalog, a couple of good ones have come into existence lately.  Take a look at CoverThing (with its own restrictive ToS, but not quite the same restrictions) and OpenLibrary (with very few restrictions). Also, the Google Books API allows you to find cover images too, but you're on your own trying to figure out what uses of them are allowed by their confusing ToS.
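Of those, the OpenLibrary option is about as simple as it gets: covers are fetched by a URL pattern keyed on an identifier. The pattern below is from memory, so check the OpenLibrary covers documentation before relying on it.

# Build an OpenLibrary cover URL for an ISBN (sketch; pattern from memory).
def openlibrary_cover_url(isbn, size = "M")   # sizes are S, M, or L, as I recall
  "http://covers.openlibrary.org/b/isbn/#{isbn}-#{size}.jpg"
end

puts openlibrary_cover_url("0316017922")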

And now, to the historically accurate post originally from March 19 2008….

Think again.


Jesse Haro of the Phoenix Public Library writes:

Following the release of the Customer Service Agreement from Amazon this past December, we requested clarification from Amazon regarding the use of AWS for library catalogs and received the following response:

“Thank you for contacting Amazon Web Services. Unfortunately your application does not comply with section 5.1.3 of the AWS Customer Agreement. We do not allow Amazon Associates Web Service to be used for library catalogs. Driving traffic back to Amazon must be the primary purpose for all applications using Amazon Associates Web

There are actually a bunch of reasons library software might be interested in AWS. But the hot topic is cover images. If libraries could get cover images for free from AWS, why pay for the expensive (and more technically cumbersome!) Bowker Syndetics service to do the same? One wonders what went on behind the scenes to make Amazon change their license terms in 2007 to result in the above. I am very curious as to where Amazon gets their cover images and under what, if any, licensing terms. I am curious as to where Bowker Syndetics gets their cover images and on what licensing terms–I am curious as to whether Bowker has an exclusive license/contract with publishers to sell cover images to libraries (or to anyone else other than libraries? I’m curious what contracts Bowker has with whom). All of this I will probably never know unless I go work for one of these companies.

I am also curious about the copyright status of cover images and cover image thumbnails in general. Who owns copyright on covers? The publisher, I guess? Is using a thumbnail of a cover image in a library catalog (or online store) possibly fair use that would not need copyright holder permission? What do copyright holders think about this? This we may all learn more about soon. There is buzz afoot about other cover image services various entities are trying to create with an open access model, without any license agreements with publishers whatsoever.


Google Book Search API

So Google has announced a much-awaited API for pre-checking availability of full text in Google Books. Here is one post with more detail than other announcements I've found.

I note that the API is described as a javascript api, and examples are provided where the request to the API is made on the client-side with javascript.

However, there’s no technical reason why you couldn’t do this server-side as well. It’s just an HTTP GET request with certain query parameters which returns JSON. I can certainly parse JSON server-side.
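To make that concrete, here's roughly what such a server-side request might look like in Ruby. The bibkeys/jscmd parameters and the callback-wrapped JSON response are my reading of the announced examples, not a documented server-side interface; and, as the update below makes clear, Google says they don't support this use, so treat it strictly as illustration.

# Server-side check of Google Book Search viewability for an ISBN (sketch).
require 'rubygems'
require 'net/http'
require 'uri'
require 'json'

def gbs_viewability(isbn)
  url = "http://books.google.com/books?jscmd=viewapi&bibkeys=ISBN:#{isbn}&callback=cb"
  body = Net::HTTP.get(URI.parse(url))          # e.g. cb({"ISBN:...":{"preview":"partial",...}});
  json = body[/cb\((.*)\)/m, 1] or return nil   # strip the callback wrapper
  info = JSON.parse(json)["ISBN:#{isbn}"]
  info && info["preview"]                       # "noview", "partial", or "full"
end

puts gbs_viewability("0316017922").inspect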

It makes a big difference to me whether I can do this server-side or not. Why? One example is because my software wants to query multiple sources of digital text (including our own licensed e-text from our catalog), and do something different depending on whether there is any available text or none. In some contexts, the user may even get an entirely different page depending on the answer to that. It’s difficult or impossible to implement that kind of logic only on the client-side (plus it would only work for those with javascript).

So there's no technical reason I can't do it server-side. But Google may certainly stop me with policy. They could rate-limit requests to the API from any given IP (and it sounds like they DO: "Because developers often issue an atypical quantity of requests, you may accidentially tip the security precautions found in Google Book Search."). Google certainly has its own business reasons to want to aggregate as much individual data as possible, and not let my application be an intermediate proxy. (Google is in fact in the business of collecting usage data, not of providing search. Think about how they make their money.) So hmm, time will tell.

Interestingly, a couple of the examples on the announcement are Google Books pre-check integrated into SFX! It sounds as if this was done by Ex Libris, not by the individual customer. And when I attempt to reverse-engineer the HTML to see what's going on–it looks to me like SFX is indeed making a server-side pre-check, not doing it in JavaScript on the client side. Which would be encouraging. Unless Ex Libris somehow has special permission. Hmm.

Eagerly awaiting more information about this. Not quite sure how to get it.

updates (14 Mar):


Got a reply from Google:

You can do something similar to this on the client side, just add some logic in the JavaScript on whether or not to show the div with the books dependent on the viewability information.

Unfortunately, we don't support server side querying of the API, because viewability is based on local rights limitations (different countries consider different books to be public domain), and we think it hurts the user experience to provide incorrect viewability information.

Doh! This doesn’t really answer my concern I’m afraid, I really can’t do what I need to do client side, at least not without extreme difficulty or loss of functionality. But I guess that’s how it’ll be!


Had the idea of asking the ksu SFX example for an XML response, to see what that tells us. Of course, everything in the XML response is necessarily generated server-side. The Google Books section looks like this:

<target_public_name>Google Book Search</target_public_name>


Hmm. It's hard to say. The XML response does not include what's in the HTML response telling you for sure what kind of access is available. But it does include a Google Books URL that sure looks like it required talking to Google on the server side to generate–it doesn't have an ISSN in it, it has a Google unique ID. How would SFX know that Google unique ID without talking to Google on the server side? Which Google tells me they don't allow. Hmm. Curiouser.

Another update: If I turn off javascript and look at the ksu SFX page, I still get the Google Books link. It would definitely appear to be server side. Is Ex Libris SFX allowed to do something that I am not?