digital media in dissertations November 1, 2009

Posted by jrochkind in General.
9 comments

For personal (rather than work) reasons, I was interested in the dissertation written by an acquaintance.

Harmony in Bulgarian music
by Kirilov, Kalin Stanchev, Ph.D., University of Oregon, 2007, 531 pages; AAT 3294000

I found it in Proquest Dissertations & Theses no problem, and shortly had the PDF. Ain’t the 21st century grand?

But wait, reading the text, it turns out that the dissertation has accompanying CDs. Listed in the table of contents as “POCKET MATERIAL: Three Compact Discs…. Inside Back Cover.”

But of course I can’t get the CDs from Proquest. That got me started thinking, what if Proquest accepted digital attachments with dissertations? But then I realized they’d have to get into the muck of digital archiving, deciding what formats they accept and developing a plan to maintain them as readable. (This might be unreasonable to expect from a business that currently doesn’t seem to even bother OCR’ing its digital dissertation PDFs, at least this one wasn’t.)

Then I wondered if maybe the University of Oregon had a dissertation repository that might actually have those CDs online. I mean, they’re already digital materials, no need for ‘scanning’, just an easy CD rip. But the likelihood of this existing didn’t seem high enough to overcome my laziness and send me on an investigation to see if it existed. (Shouldn’t that be easier too?)
I wonder if any universities are making available digital attachments (‘pocket material’) that go with their dissertations.

The technical issues aren’t much, but the legal issues are probably more of a barrier: it’s probably fair use to attach a CD to a single copy of a dissertation regardless of copyright, but not necessarily to put the same recordings in an online archive, or to make them publicly available.

I probably couldn’t even ILL the dissertation in question; most universities won’t send their physical dissertations through ILL, will they? I guess I’d have to go there and listen to it, or track down the author of the dissertation and ask him for a copy (that will get a lot harder in 100 years, naturally).

interests… October 14, 2009

Posted by jrochkind in General.
3 comments

…Of readers, publishers, authors, and libraries all more or less came to a compromise in the inherited publishing market. But the digital age is upsetting that compromise, that’s for sure.

As digital collections grow, Mr. Sargent said he feared a world in which “pretty soon you’re not paying for anything.” In part because of such concerns, Macmillan does not allow its e-books to be offered in public libraries. The company publishes authors like Janet Evanovich, Augusten Burroughs and Jeffrey Eugenides.

http://www.nytimes.com/2009/10/15/books/15libraries.html?_r=1&hp

I’d note that (at least in the US) the first sale doctrine would make it impossible for publishers to prohibit libraries from buying and lending physical books — we legally have that right.  But electronic books, covered by licensing agreements and not covered by the first sale doctrine? Apparently they can tell us we aren’t allowed to buy them and lend them.

(Could they tell an individual that once purchased (or ‘licensed’), she couldn’t let anyone else read it on her e-reader, they had to buy their own copy? Maybe.)

Although according to Wikipedia, this is actually something of a legal gray area, not entirely decided. Maybe the first sale doctrine does apply to software in general, and e-books in particular. I bet e-books would make a better test case (for those who want to see that it does apply) than software, since they are so analogous to the print books the first sale doctrine was actually intended for. It would be nice if some library was willing to push it, buy an e-book and lend it out, insisting that the first sale doctrine gave them that right, even if a publisher insisted they weren’t allowed to do so.

That it’s libraries involved makes things even more confusing, because according to Wikipedia the provisions that specifically exclude computer software from loan or rental under the first sale doctrine contain special exemptions for libraries.

Opening for dept head at MPOW October 14, 2009

Posted by jrochkind in General.

A position has been posted for the head of the department in which I work.

user behavior October 8, 2009

Posted by jrochkind in General.

Thanks to Lorcan Dempsey for pointing out this very interesting report on “Discoverability” from the University of Minnesota.

The report basically analyzes the research-acquiring behavior of their users (and academic library users in general via the literature), and comes up with some trends and suggestions for library strategic directions to meet their needs. I recommend it highly, it’s got a nice executive summary (which is pretty much all I’ve made it through so far, but I plan to read more). (Incidentally, why does cut and paste from the PDF result in gibberish? Very annoying. Speaking of usability.)

Somewhere recently (I forget where) I read a piece of science journalism that had a quote from a scientist along these lines: “There are two kinds of interesting research findings. Sometimes you discover something you did not expect at all, and sometimes you verify something you suspected but didn’t yet have sufficient evidence for.”

Most (but not all) of what’s in the report is in the second category for me, but of course that doesn’t impede its usefulness; evidence-based findings are important.

Much of what’s in the report resonates with existing ideas I had for intermediate term development of library digital services, with implementation ideas often made possible by Umlaut.

Use of non-library discovery interfaces

Trend 1. Users are discovering relevant resources outside traditional library systems.

[...]

[Suggestion:] …We need to ensure that items in our collection and licensed resources are discoverable in non-library environments.

I’d add on to this: “ensure that library services for making use of items are accessible even when the user starts from a non-library environment.” (Eric Lease Morgan has talked a bit about this).

One of my long-standing goals for Umlaut I think resonates with this. Umlaut is designed from the start to be a ‘landing page’ for library services for a known item. No matter where you find an item, if you want to find out how you can get it from the library or what library services exist for it, Umlaut will do that for you.

Umlaut, like any link resolver, traditionally does this by working with licensed vendor discovery services that send an OpenURL link to Umlaut. But the problem is that users are using many discovery interfaces that don’t do this, and are unlikely to do this in the near term (for various reasons, including business interests of the operators of those services). So what can we do?

Well, one thing I really want to do is customize LibX to work optimally with Umlaut, adding links to the Umlaut page to the third party discovery interface. (or even Umlaut-provided services directly on the third party page).  Find a book in Amazon, Google Books, or a variety of other places? No problem, we can still connect you to Library availability and services in one click (or zero clicks if the info is inserted directly on the page!).
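To make the "one click" idea concrete, here is a minimal sketch of the link a LibX-style tool might add next to a book it spots on a third-party page. It builds an OpenURL 1.0 key/value query for an ISBN. The resolver address and the function name are made up for illustration, and I'm assuming the resolver accepts a plain `rft.isbn` book request; this is not LibX's or Umlaut's actual code.

```python
from urllib.parse import urlencode

# Hypothetical resolver address; substitute your own Umlaut base URL.
RESOLVER_BASE = "https://umlaut.example.edu/resolve"


def openurl_for_isbn(isbn: str, base: str = RESOLVER_BASE) -> str:
    """Build an OpenURL 1.0 (KEV) link asking the resolver about a book ISBN."""
    params = {
        "url_ver": "Z39.88-2004",                    # OpenURL 1.0 version tag
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:book",  # referent is described as a book
        "rft.isbn": isbn.replace("-", "").strip(),   # hyphens dropped before sending
    }
    return f"{base}?{urlencode(params)}"


print(openurl_for_isbn("0-306-40615-2"))
```

The same idea would serve the server-side 'fallback': scrape identifiers from the page the user points at, then hand each one to a function like this.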

It’s unfortunate that a browser plugin (which works only with IE or FF, not Safari, not custom smartphone web browsers, etc) is required for this, but I can’t think of much of a way around it. Another possible ‘fallback’ interface could be providing the URL of the page you are looking at to a server-side application, which does LibX-style processing on the server, and then tells you what the library can do for you for items found on that page. This might be the best ‘fallback’ option for users who can’t install a LibX style plugin.

Delivery

Trend 2: Users expect discovery and delivery to coincide. Searchers do not distinguish between discovery and delivery in their web searches…

[Suggestion] …systems, data, and information should be optimized for fulfillment.

This gets to another long-standing desire I’ve had: to unify my library’s various delivery mechanisms in one simple interface with as few clicks as possible.

We offer a variety of pretty useful delivery mechanisms. Sure, sometimes there’s immediately available electronic text. When there’s not, there’s ILL. For some combinations of user type and material type, we’ll also deliver physical copies in our stacks directly to your office; or make a scan of a chapter or article from a volume in our stacks and email it to you. But other materials are in-library use only: some of these you can pull off the stacks yourself, and others you need to request a pull and view in a special office (eg special collections; also some AV materials).

Pretty darn useful, but we have a variety of different forms and interfaces (at least 6 different ones, if not more) to make a variety of different requests.  You’ve got to know they exist, find em, fill em out.

Instead, I’m imagining a ‘delivery menu’ (brand it like a Chinese takeaway menu if you want to be cute) that figures out, based on who you are, whether the item is in our stacks or not, and what type of material it is, exactly what you can do with this item, and tells you. You can view this in the library. You can check this out. You can have it pulled for you and waiting at the circ desk to check out. You can have it delivered to your office. You can get a photocopy of a specific article emailed to you. Etc. And present all this information (and provide actions to choose an option) in as few clicks as possible.
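Here is a rough sketch of what the decision logic behind such a menu might look like. Every rule, user type and material type in it is invented for illustration (real policies would come out of that business-process work), so read it as the shape of the idea rather than our actual rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    material_type: str  # hypothetical values: "book", "journal_volume", "special_collections"
    in_stacks: bool     # True if we hold it locally
    circulates: bool    # False for in-library-use-only material


def delivery_menu(user_type: str, item: Item) -> list[str]:
    """Return the delivery options to show this user for this item (made-up rules)."""
    if not item.in_stacks:
        return ["Request via ILL"]
    if item.material_type == "special_collections":
        return ["Request a pull and view it in the reading room"]
    options = ["View this in the library"]
    if item.circulates:
        options.append("Check this out")
        options.append("Have it pulled and held at the circulation desk")
        if user_type == "faculty":  # assumed: office delivery is a faculty-only service
            options.append("Have it delivered to your office")
    if item.material_type in {"book", "journal_volume"}:
        options.append("Get a scan of an article or chapter emailed to you")
    return options


for line in delivery_menu("faculty", Item("book", in_stacks=True, circulates=True)):
    print(line)
```

The point of keeping it as one function is that every interface (catalog, Umlaut landing page, SMS) could ask the same question and get the same answer.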

Combined with the prior note, we can imagine that a user finds an item of interest on Amazon, and then in as few clicks as possible (0-3) finds out what delivery options are available to her, and chooses one of them, knowing how long we estimate it will take to get to her depending on her choice. Of course, this ‘delivery menu’ would be available in library discovery interfaces too, but the real power is in combining this with acknowledgement of the “using non library discovery services” trend.

This is totally do-able, especially on the Umlaut platform. The real challenge is on the business process end, not the technical end. (Consolidating and rationalizing all our delivery options, potentially requiring changes in staff workflow or policies to make everything make sense).

Mobile

Trend 3. Usage of portable Internet-capable devices is expanding.  Rather than just supplementing the desktop computer, mobile devices are poised to become the primary means of Internet access for a critical mass of users.

This one is trickier to figure out how to address, especially when combined with this recommendation from the report:

…we should strive to be end-user device/platform agnostic.

Taking account of that I wouldn’t actually move to “develop an iPhone app” for it, as seems to be a popular trend. We don’t have the resources to develop and maintain custom apps for every possible advanced mobile device in use.

Instead, I’d develop special stylesheets for our core services that divide and format pages appropriately for an iPhone or similar next generation smartphone with a “high resolution mobile display, significantly smaller than a laptop or desktop.”  [Look for an upcoming article in Code4Lib journal addressing how you start doing this.]

Additionally, I’ve thought before about developing some SMS (aka ‘txt message’) interfaces to meet the lowest common denominator of cellphone mobile net access.  I would like it if you could text an ISBN (or ISSN, or even DOI) to Umlaut, and Umlaut would text you back with whether the library has it and what you can do. “Reply with the numeral 1 to place an ILL request for this item.” (Or other appropriate options.) Also, if you happen to have a camera cell phone, why type in an ISBN when you can snap a picture of a barcode, and MMS it to Umlaut instead?
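The first step in such an SMS service would be working out what kind of identifier the user texted. A minimal sketch, assuming the message body is nothing but the identifier: the function names are hypothetical, the DOI pattern is a loose approximation, and the ISBN/ISSN checks are the standard check-digit rules.

```python
import re


def _weighted_mod11(chars: str) -> bool:
    """ISBN-10 / ISSN check: weights count down to 1, 'X' stands for 10."""
    n = len(chars)
    total = sum((n - i) * (10 if ch == "X" else int(ch)) for i, ch in enumerate(chars))
    return total % 11 == 0


def is_isbn10(s: str) -> bool:
    return bool(re.fullmatch(r"\d{9}[\dX]", s)) and _weighted_mod11(s)


def is_isbn13(s: str) -> bool:
    if not re.fullmatch(r"\d{13}", s):
        return False
    # Digits are weighted 1, 3, 1, 3, ... and must sum to a multiple of 10.
    return sum(int(ch) * (1 if i % 2 == 0 else 3) for i, ch in enumerate(s)) % 10 == 0


def is_issn(s: str) -> bool:
    return bool(re.fullmatch(r"\d{7}[\dX]", s)) and _weighted_mod11(s)


def classify_sms(body: str) -> tuple[str, str]:
    """Return (kind, normalized value), where kind is isbn, issn, doi or unknown."""
    text = body.strip()
    if text.lower().startswith("doi:"):
        text = text[4:].strip()
    if re.fullmatch(r"10\.\d{4,9}/\S+", text):  # rough DOI shape, not a full validator
        return ("doi", text)
    compact = re.sub(r"[\s-]", "", text).upper()
    if is_isbn13(compact) or is_isbn10(compact):
        return ("isbn", compact)
    if is_issn(compact):
        return ("issn", compact)
    return ("unknown", text)


print(classify_sms("978-0-306-40615-7"))
```

Checking the check digit before doing any lookup means a mistyped number can get an immediate "that doesn't look like a valid ISBN" reply instead of a misleading "we don't have it".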

Again, totally do-able, especially with Umlaut as a platform.

Recommendation Systems

Trend 4: Discovery increasingly happens through recommending. Facilitating discovery requires us to develop and implement systems that push relevant content to users and allows users to share content with others.

This one is harder.

The study recommends:

We should capture the data necessary to provide targetted suggestions to users and defer to network-level systems where a critical mass already exists.

Umlaut was in fact originally intended by Ross Singer to capture that data to provide those systems, but those features were never really matured, and are not currently present in Umlaut. Umlaut as a platform is still potentially a key point to capture data — but the study’s point that we need to move this data to aggregated systems with a critical mass is key. Just data from my institution is not going to cut it to algorithmically provide useful recommendations.

Perhaps an Umlaut that captures data and then sends it to a cross-institution SOPAC installation? (I’m not sure if the SOPAC infrastructure can handle article, rather than book/title, citation data or not).

Ex Libris’s bX service is designed to do this too,  although currently can only take source data from a stock SFX installation (not from Umlaut); it could still be used to provide recommendations in Umlaut, if we wanted to pay for it.  I expect more vendors to start adding such services.

As a stop-gap, Umlaut currently does provide links on its “landing page” to the recommendation services we were already paying for: Scopus and ISI Web of Knowledge. On a landing page for an item, Umlaut gives you one-click access to Scopus or ISI’s “similar items” (which are not based on usage, but based on reference and metadata similarity).

Non-traditional objects

Trend 5. Our users increasingly rely on emerging nontraditional information objects. The format of useful and discoverable information is much broader than those traditionally offered through the libraries; users increasingly rely upon multimedia objects, data sets, blogs, and other “grey” objects to meet their information needs.

Okay, this one is a stumper. Nothing in my pre-existing idea bag meets it, I’ve got nothing.

The issue is that my services, such as Umlaut, really rely on pre-existing databases/knowledge bases the library has of items and what we can do with them. Including both very traditional databases (the catalog) and more recent ones (the link resolver knowledge base).

But almost all of these databases can really only ‘control’ pretty traditional information.  If a user comes to my service with a dataset she’s interested in, my software doesn’t really have any good way to figure out what we can do for her with that dataset, if we have it in the library, if she can get it ILL, where it is on the internet. I’m pretty much at a loss. (If the dataset has a DOI, and that DOI can be provided, I’m in a bit better shape).

I’d note that the study’s recommendations don’t provide much actionable advice on this one either. It’s a toughy, and requires rethinking larger swaths of library operation to address. I can’t identify many intermediate-term ways to address it, although maybe it just needs some more clever thinking.

Umlaut

I remain pleased at how well-positioned Umlaut is as a platform to address most of the trends identified.  Umlaut exists as a flexible platform for “known item services” — for making a ‘landing page’ (or inserting services on a foreign page via javascript) for giving the user delivery/access options and other library services for a known item — regardless of where the user found the known item, through a library search service, a licensed vendor search service, or a third party web discovery service.

I am more and more convinced that this is a key piece of library infrastructure for the foreseeable future, and the investments we’ve made in it so far are very worthwhile.

But while the platform is there as an appropriate place to add features, as identified above, actually adding the features takes time and resources. I hope I get the time to work on some of them in the intermediate future. Perhaps this report will help make it more clear to some resource allocators the necessity of some of these directions.

cataloging and ‘citations’ September 30, 2009

Posted by jrochkind in General.
14 comments

So my understanding is that many ‘entries’ in a cataloging record are meant to be ‘citations’. They are meant to unambiguously identify the work cited.   In the age when cataloging rules were created, what you’d do with that unambiguous citation was simply look it up in a printed or card catalog.

But the very precise rules involving ‘main entry’ and ‘uniform title’ should, I believe, allow software to unambiguously find the target of the citation in a database, if it’s there.

I am at the very beginning stages of figuring out how to do this exactly, it’s not exactly simple.

If it turns out that you can’t even do this, I’m really going to think that much of the very complicated and time-consuming cataloging rules are irrelevant in the post-card-catalog age. But we’re not there yet.

Initial signs, however, aren’t very good. Take this example from OCLC docs on 76x-78x linking fields.

The first choice for identification is the uniform title. If available, use the entire uniform title (e.g., title and qualifier) to identify the related publication. If the uniform title is unavailable, use the main entry and title proper. For example, if OCLC record number 6597310 has the following uniform title:

130 0 Monthly digest of statistics (Zimbabwe. Central Statistical Office)

It would be linked to the related publication in field 780.

780 0 0 $t Monthly digest of statistics (Zimbabwe. Central Statistical Office) $w (OCoLC)6597310

Okay, fair enough. And a referenced uniform title should indeed allow us to unambiguously identify records belonging to the cited work.  But wait. That title is clearly a uniform title, it’s given in a 130.

But in the 780 example then… shouldn’t that title be in subfield ‘s’, not ‘t’? 780 subfield s is clearly documented as “uniform title”, right?

But wait, $t says: it is indeed used for title elements from a 245 or a 130.  Subfield ‘s’ is only used for field 240 entered uniform titles.

So wait, when citing a work in a 780, you put a uniform title in subfield s if it’s author main entry (a 240), but you put it in subfield t if it’s title main entry (a 130)? And when you find a title in t, there’s no way to know if it’s a uniform (controlled) title, or a transcribed (245) title?

Um. So, um.  I am kinda speechless. If you’re going to spend all these expensive cataloger hours following very precise rules, wouldn’t it be sensible to make the rules result in data that can actually be interpreted to do what it’s supposed to do?

More MARC issues: 700 September 28, 2009

Posted by jrochkind in General.
10 comments

So, okay, here’s another puzzle for the catalogers.

A 700 (or 7xx in general) could be an ‘analytic’, representing one element that’s the contents of the item cataloged. OR could just represent a contributor (who isn’t ‘main entry’) to the work. An ‘analytic’ will mention the particular part of the work contained, generally in controlled form.

Now, I want to treat this differently depending on if it’s an analytic or not. For instance, just plain contributor names should be listed as ‘contributors’, along with links to collocate on controlled form of name. But if it’s an analytic, I STILL want to separate out the person’s actual name as ‘contributor’ (and let you collocate in general just by their name).  But I ALSO want to say what part of the work they contributed, and give a link to look up other records for that analytic entry (the part).

So 7xx fields have a second indicator, which oddly gives you only two possibilities. You can note that it definitely is an analytic entry (value 2). Or you can note that you don’t know either way. Very strangely there is no way to even note that you definitely know it’s not! Second indicator blank just means “no information.” So it might still be an analytic.

Of course, even if the indicators gave you a way to record that it definitely wasn’t, no doubt we’d still have plenty of records whose second indicator gave no information.

So….   how can I tell if a 7xx is an ‘analytic’ or not?  Can I assume that it’s an analytic if and only if subfield t is present? Are there any cases where it is an analytic but there’s no subfield t, or where it’s not an analytic but there is a subfield t?
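For what it’s worth, here is how that guess would look as code. This is only a sketch of the heuristic I’m asking about, not a confirmed rule: the `MarcField` class is a stand-in I made up rather than any real MARC library’s API, and treating "subfield t present" as "assumed analytic" is exactly the unverified assumption in question.

```python
from dataclasses import dataclass, field


@dataclass
class MarcField:
    """Stand-in for a parsed MARC field (hypothetical, not a real library class)."""
    tag: str
    ind2: str
    subfields: list[tuple[str, str]] = field(default_factory=list)


def analytic_status(f: MarcField) -> str:
    """Classify a 7xx added entry as 'analytic', 'assumed_analytic' or 'unknown'."""
    if f.ind2 == "2":
        return "analytic"  # the record says so explicitly
    has_title = any(code == "t" for code, _ in f.subfields)
    if f.tag in {"700", "710", "711"} and has_title:
        return "assumed_analytic"  # name/title entry: the unconfirmed guess
    return "unknown"  # blank indicator only means "no information"


entry = MarcField("700", " ", [("a", "Shakespeare, William,"), ("t", "Hamlet.")])
print(analytic_status(entry))
```

Keeping the guessed case separate from the explicit one means the display code can choose to trust it or not, and it’s easy to change once somebody tells me whether the assumption holds.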

Addendum:

The 730 field specifically is even worse. I don’t know if there’s any way for me to tell if it’s an analytic or not?  I mean, if second indicator is 2, it is. And if second indicator is blank… absolutely no way to tell.

What the heck could a 730 be other than an analytic? Anyone have examples?

Principle of avoiding “false promises” in interfaces September 24, 2009

Posted by jrochkind in General.
3 comments

So lately I keep thinking about this idea I think of as a “false promise” in a user interface.  Not sure if other people already recognize this and refer to the concept by some other label, let me know if you know they do.

But the idea is that your software shouldn’t suggest by its input that it can do something that it really can’t do at all.   This becomes especially tricky when we’re dealing with our library data and systems that in fact can’t do a lot of things.  Some examples will help.

SFX ‘citation linker’ input screen

SFX by default has a screen that lets you input an article citation, and then SFX will try to find links or other information for it.  (I don’t want to put a link to mine here cause I don’t want to attract the robots).

Now, to begin with, this is both an annoying process for the user, and an error-prone process for SFX. But I want to draw your attention to two particular fields on that screen: “Author” and “Article Title”.

The default input screen asks you to input an “Author” and an “Article Title”. However, in (I’d estimate) 95%-99% of cases, SFX can’t actually do anything with the author or title you’ve input at all. It doesn’t help SFX find a match, and it doesn’t affect SFX’s functionality at all.

So our interface implies that the user ought to enter author and title — a painful and annoying process for the user.  The “false promise” here, in my opinion, is that this will do anything at all. Now, granted, in a tiny minority of cases it will, which is why SFX puts the field there. But that means we’re making a “false promise” in the vast majority of cases, in my opinion. We’re “leading the user on.”

MARC relator codes

This might be a better example. So MARC fields for listing controlled authors or other contributors (100 and 700) theoretically allow the data to say particularly what relationship the contributor has to the work at hand. (Author? Editor? Illustrator? Performer on a musical composition? Composer? Wrote a preface?).

Most OPAC interfaces don’t do much with this. But if you start thinking of what you might want to do, an initial naive approach might be to allow the user to limit a search by these relator codes. Don’t just give me any record that has Noam Chomsky in any 100 or 700 (that’s what our traditional interfaces do, but for prolific people it might give me too much). I really only want books where Noam Chomsky wrote a preface.

So, okay, maybe you go ahead and provide this limit in your search interface.  The problem is that the vast majority of our data doesn’t have these relator codes. So if you just do a search for Noam Chomsky with relator code for ‘wrote a preface’, you’re going to miss most of the books that Noam Chomsky really did write a preface for.

You might miss it because Noam Chomsky is in a 700 field with no relator code. Or you might miss it because we don’t often record people who wrote prefaces at all.

In either case though, I think the interface was making a ‘false promise’, it suggested you could search limiting by role of the contributor, but our data doesn’t really support that at all. The results are going to be misleading if the user assumes the interface really can do what it suggests it can.

So?

So what do you think? Any other examples you can think of of ‘false promises’ that our interfaces make?

Identifying the ‘false promises’ is easier than fixing them. Usually they are there because of limitations in our software or data that are not easy or cheap to resolve.  If you really get rid of all of the false promises, you have to get rid of much of your functionality!  Or pepper it with disclaimers and limitations that most users won’t read anyway, and just make us look kind of incompetent if they do. (“WARNING: You can TRY to search on relator code, but your results will only include a tiny percentage of things that really matched your search.”)

A reasonable display for series data in MARC? September 24, 2009

Posted by jrochkind in General.
16 comments

So I know plenty of catalogers read my blog  (or used to).  Appreciate any feedback or advice you have on this.

Basically, I’m trying to figure out how to actually do a useful user-friendly display of ‘series’ information from MARC records.

My assumptions

So we have 440, 490, and 8xx.  There’s a distinction between “transcribed” series, and controlled (aka “traced” or “access point”).  I know that the controlled data is meant to be used for collocation.  I am assuming that the “transcribed” data is better for user display though.  Is this right?   (I’ll refer to these two concepts as “displayable” and “controlled”).

So if we’ve got a 440, then that is both displayable and controlled.

But current practice going forward is not to use 440, but instead to use a 490 for displayable, and a 8xx for controlled.

So what should the interface do?

So thinking about an individual record display. I can’t just list all 440, 490, and 8xx fields under “Series”, because in the case of 490/8xx, that’ll lead to me displaying the same series twice. Once in transcribed form, and once in controlled form. This is confusing and doesn’t make sense.

So what I’m thinking is that for a 490/8xx pair, I actually display the 490 on the screen, since it’s the value meant for user-display.  But it’s clickable, and when you click on it, the search that will be executed is actually on the corresponding 8xx, because that’s the field meant for collocation.

This is assuming there is a corresponding 8xx. If there’s not, it’s somewhat simpler. We display the 490, and either it’s not click-searchable at all, or if it is, it searches an uncontrolled series index of all 490s, it doesn’t actually try to collocate on a controlled field, cause we don’t have one.

Does this make sense?  Am I missing something?

But the problem

But there’s still a problem here. A record can theoretically belong to multiple series.  Meaning it could have multiple 490s.  Each of which may or may not have a controlled 8xx corresponding to it.

As far as I can tell, there’s no way to tell which 8xx goes with which 490. Especially since a 490 may or may not have a corresponding 8xx.

This might not affect very many records (only those that have multiple series), but it still annoys me to have a known ‘bug’, a known case where things won’t work right at all.  I’m not really sure what the heck my code should do if there are multiple 490s.  Am I missing something?
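To make the question concrete, here is a sketch of the display logic as I currently imagine it. The inputs are plain strings already pulled out of the 440, 490 and 8xx fields; the names are made up, and the matching step for the multiple-490 case (compare normalized titles, ignoring anything after a semicolon) is a guess at a workaround, not something the cataloging rules promise will work.

```python
import re
from dataclasses import dataclass


@dataclass
class SeriesLink:
    display: str       # text shown to the user
    search_index: str  # "controlled" or "uncontrolled"
    search_value: str  # what a click on the link searches for


def _norm(title: str) -> str:
    """Crude comparison key: drop the volume part after ';', punctuation and case."""
    head = title.split(";")[0].lower()
    return " ".join(re.sub(r"[^\w\s]", "", head).split())


def series_links(f440: list[str], f490: list[str], f8xx: list[str]) -> list[SeriesLink]:
    # A 440 is both displayable and controlled, so it links to itself.
    links = [SeriesLink(t, "controlled", t) for t in f440]
    unused = list(f8xx)
    for shown in f490:
        if len(f490) == 1 and len(f8xx) == 1:
            match = unused[0]  # the unambiguous one-to-one case
        else:
            # Heuristic guess: pair by normalized title, skipping empty keys.
            match = next((c for c in unused if _norm(c) and _norm(c) == _norm(shown)), None)
        if match is not None:
            unused.remove(match)
            links.append(SeriesLink(shown, "controlled", match))
        else:
            links.append(SeriesLink(shown, "uncontrolled", shown))
    # Controlled headings that matched no 490 are still worth showing.
    links.extend(SeriesLink(c, "controlled", c) for c in unused)
    return links


for link in series_links([], ["Series A ; 1", "My odd series"], ["Series A ; 1"]):
    print(link)
```

When the guess fails, the worst case under this sketch is the duplicate display I was trying to avoid: the 490 shown with an uncontrolled search, plus the unmatched 8xx listed separately.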

By the way

This is one good example of how it’s sometimes difficult or even impossible to get meaningful information out of our AACR2/MARC, despite some people’s belief to the contrary that it’s always simple and straightforward.

So… what the heck should be done with this 440/490/8xx stew?

Amazon Windowshop: Serendipitous Browsing Online September 18, 2009

Posted by jrochkind in General.

Fiacre O’Duinn alerts us to a kind of interesting interface Amazon provides, which I hadn’t been aware of before: Amazon Windowshop.

Fiacre asks if this is what the library catalog should look like.

I wouldn’t want the WHOLE library catalog to look ONLY like that — but I think it could be VERY useful and interesting to provide a “serendipitous browsing” interface to the catalog (on top of a more traditional type-in-search-get-result list interface) that is along the lines of Amazon windowshop.

Try to replicate the experience of browsing the shelves, but online you get the benefit that you can arrange books in more than one dimension (as amazon windowshop does in two), re-arrange them in different orders (for instance LCC OR DDC OR something else entirely, don’t have to pick just one), and additionally be able to allow unified browsing of a corpus that may be in several different physical locations (including off-site storage) or may be currently checked out but maybe you want to include them in the ‘browse’ anyway.

I’ve been thinking for a while about how to provide such an online serendipitous browse experience, like a physical shelf browse but taking advantage of the unique affordances offered by the online environment. And I definitely thought (cover) images were a necessary component — I had been thinking of iTunes coverflow as a model. Amazon Windowshop provides another VERY interesting model to try and steal the best parts of — whenever I or anyone else can find the time to try and work on it!  Too many cool projects, not enough time. (And replicating Amazon windowshop would take some fancy coding).

Sophisticated item services from Umlaut in Xerxes federated search interface September 15, 2009

Posted by jrochkind in General.

So, if you try to architect your applications solidly and flexibly, and build in features for integration, and it all works out okay, one of the benefits you get is it’s pretty easy to combine them.

I’ve added a feature to the Xerxes federated search tool that brings the sophisticated item-level information and services already being compiled by our Umlaut installation into Xerxes record-detail pages.

I think this is pretty neat from a sort of ‘single business’ perspective of providing consistent services regardless of what tool the user happens to be using.

So now, when you look at an item detail page in Xerxes, you can, right on that page,  see:

  • call numbers and availability
  • Full text links from SFX, right on the page
  • Links to “similar items” content from Web of Knowledge and Scopus.
  • links to pre-filled ILL forms, as appropriate.
  • For monographic content, full text, preview, and ‘search inside’ functionality from Amazon, Google, and others.
  • Other stuff — whatever happens to be configured in Umlaut, when new stuff is added to Umlaut, it’ll automatically show up in Xerxes too. (Well, new services of the existing types; if a whole new type/section is added to Umlaut, will take a couple lines of code in Xerxes to add it).

This is live in production here now, but you can’t really see it without a local login. So here’s some screenshots of Xerxes item detail pages, content from Umlaut circled in red.

book

article

It’s worth noting that this content is inserted on the page by javascript after page load. It can take 1-3 seconds or so to come in (depending on how fast Umlaut can do its thing), which you can’t see in the screenshots. While waiting, you get a spinner and status message. If a user doesn’t have javascript enabled, this feature won’t affect their page view at all.