provider-neutral ebook records, help!

jrochkind General August 26, 2010August 26, 2010

So any catalogers reading this, would appreciate some ideas or background information if you’ve got em.

So recently, someone (PCC, the Program for Cooperative Cataloging, I guess?), came up with this “Provider-Neutral” policy for “e-monograph” records. Which apparently is now being implemented, picking up steam.

Previously, if I understand right, if there was an e-book published on several different platforms, the cooperative cataloging corpus (meaning basically OCLC, and perhaps also LC) would have a seperate bib record for each one. Although they were largely identical, they had different URLs, among other things. (Not sure how this applies to the increasingly popular case of an ebook that is downloadable, not on the web, but that’s not what this post is about).

Now, instead, all the e-versions will share a bib record.

On it’s face, this made a lot of sense to me when I heard about it. More efficient, why create duplicate records that are pretty much the same, sure, why not, consolidate them, why spend time describing the unique aspects of a partiuclar provider’s representaiton that nobody really cares about anyway, “provider-neutral”, why not.

But when combined with our standard (truly insane) actual real world practices, this seems to result in some big problems.

So we buy a new ebook package, and we get a bunch of recorsd for ebooks. Sometimes we get them for free from the content vendor. Sometimes we get them from OCLC (it’s not entirely clear to me how we bulk download the right OCLC records for a several thousand book package, but we definitely don’t pick em out manually one by one). Sometimes we might get em from yet another party, not sure. Previous to “provider-neutral”, we’d get the record(s) for our licensed provider(s), which would individually have the URLs (marc 856) to access the content from those providers in em. We’d load in em our catalog, which would display the 856 urls, users would see them and click on them, great.

Now, due to the gradual adoption of ‘provider-neutral’, when we get those same records (from any of those sources), if I understand things correctly, an increasingly large portion of them (eventually to be all of them when ‘provider-neutral’ is fully adopted) have 856 URLs for every known provider of the e-book in each record. Half a dozen, more, who knows.

If we just load these all in our catalog, then for our patrons it’s like a game of scratch lotto to actually get to the content. Click on a URL, maybe you’ll hit a paywall and a solicitation to pay them $40 for access, or maybe you’ll actually pick the one(s) we’ve actually paid for as a library, who knows!

This obviously is not acceptable. But there’s also no clear way for us to filter out only the marc 856 urls that we license from the marc bibs. How do you even know which url goes with what platform? They aren’t even identified from any kind of controlled vocubulary. Sometimes the platform seems to be stuffed in a $3 subfield, which is odd since marc defines $3 as “Materials specified”, like “first four chapters only”, or “table of contents only” or what have you. “SpringerLink” is not a materials specified. And on top of that, they’re just thrown into the $3 in narrative form, along with english sentences that may or may not actually describe the ‘materials specified’, as far as I can tell using whatever language the cataloger felt like to specify the provider and/or platform. Different catalogers could use different words to identify the same provider and/or platform (sometimes what is referenced seems to be a provider, other times a platform, two subtly different things). This is not suitable data for machine processing to determine if the platform/provider specified is one we license.

So, um, what the heck are we supposed to do? What were those behind the “provider-neutral e-monograph policy” expecting that we’d do? What are other libraries doing? Having to give users a lotto scratch card of providers and hoping they “get lucky” is a big step backwards in our user experience.

Can anyone shed any light on this? Am I misunderstanding what’s actually going on? Is what’s actually going on different than the “provider neutral” drafters expected to happen? Is anyone in the cataloging world alarmed about this?

There’s an FAQ in the Provider-neutral document that provides some hand-waving about how libraries “might” handle these records in the future. Well, the future is now, what is being done? The answer talks about what libraries “using WorldCat local”might do; surely cooperative cataloging decisions, I hope, haven’t been taken to try to lock people into WorldCat Local, we don’t use it. It says “it is very likely that libraries, vendors, and OCLC will work together to provide the URLs, OCLC numbers, and vendor specific information on MARC records using the provider-neutral OCLC record as the base record.” — has that happened? Even if it has, it sounds like that means “You can pay a lot of money to a vendor for a new service you didn’t used to have to pay for at all, to wind up with basically what you used to have without paying for it.” That’s extra money we don’t have.

Also, on MARC changes

Reading the policy document, it suggests there are some corresponding changes in Marc values, but I’m confused about what they are.

Therefore, we have written a Discussion Paper to MARBI to add two new values in the fixed field byte for “Form of item” across all formats, for online access and for direct access. Currently byte 008/23 “s” is used for records in the “Books” format and in most of the other formats as well; byte 008/29 “s” is used for records in the “Maps” and “Visual materials” formats. If our recommendations in the form of a proposal to MARBI are successful, then code “s” for electronic will be replaced by the two new values, and code “s” will be made obsolete

Did this happen? If so, what are the two new values, and are they documented anywhere?

Inevitable funeral?

I can’t but worry that this is one more nail in the coffin of our cooperative cataloging enterprise. It was a noble endeavor, but it’s just so so broken. Almost all libraries pay for expensive vendor processing services on top of our theoretically cooperatively cataloged records (additional processing the results of which can not be shared cooperatively), and we still end up with (very expensive) data that is not actually sufficient to power the systems that serve our users. And things seem to be getting worse, not better. For a while people have been saying about cooperative cataloging (and cataloging in general) “if we don’t fix this soon, it’ll be too late.” I worry we may already have passed the “too late”, whether we noticed or not, this stuff is a mess, it’s not efficient, it doesn’t serve our users, it is not going well.

Published by jrochkind

View all posts by jrochkind

Published August 26, 2010August 26, 2010

13 thoughts on “provider-neutral ebook records, help!”

Laura says:

August 26, 2010 at 8:27 pm

Jonathan – it’s not uncommon to have to do some processing on records prior to batch loading them into the ILS, no matter the source. On a good day, this manipulation doesn’t take too much work. Some basic regular expression skills and a text editor are required. MarcEdit is a popular choice of tool. I agree, it’s a bit of a pain. But it’s still far less painful than dealing with records 1-by-1.
jrochkind says:

August 26, 2010 at 8:34 pm

Can you give me an example of the sort of processing you do to ensure only licensed providers are included on your 856’s, when they come from a ‘content neutral’ record? Some tips as to how to do this sort of processing with MarcEdit?
Bernadette Houghton says:

August 26, 2010 at 9:25 pm

No problem from my library’s POV; we’re an III library with ERM. Condensing the process to its bare bones, we strip 856s out of bibs before loading (most of them anyway; we keep LOC TOC 856s). Then use ERM to create checkins/links etc for those ebook packages we require.
Laura says:

August 26, 2010 at 9:29 pm

Not offhand, I haven’t personally encountered a provider neutral record with more than 2-3 856 fields. Typically we just remove the extraneous 856 fields – usually there’s some “hook” like the subfield codes being different or unique text string for the unwanted provider. That doesn’t even require regex, just a find and replace-with-nothing.

I recall a recent discussion on the MarcEdit email list about removing multiple 856 fields. A search of the archives would pull it up. I recommend subscribing, it’s quite a useful list. I
jrochkind says:

August 26, 2010 at 9:39 pm

Thanks Laura. Maybe that’s the difference between loading a few of these at once, and loading hundreds or thousands of them at once, as we do. Maybe most people don’t load thousands? With thousands, it doesn’t matter if there’s only 2-3 per record, if we only want 1-2 of them, manually editing them is out of the question. I guess they need to be examined and analyzed for a regexp, for each load? That seems… a lot of work. But I guess I can’t think of any other option.

Perhaps we could maintain a list of all known good URLs (or hostnames, or regexps describing URLs) for all the providers we license, and then just run every batch through software that strips anything not in the known good list. That’s a lot of work too, not sure which is more.

I still don’t get what the provider-neutral drafters intended here. If we’re unusual in loading batches of hundreds or thousands of records at a time, I don’t think we’ll be for long, as everyone gets more and more ebooks.
Lynn says:

August 27, 2010 at 12:42 pm

Our consortium loads thousands of records at a time. What we do with provider-neutral records depends on the source:

Some sources actually shield us from the “provider-neutral” aspect by pre-processing the records for us and inserting our custom link. Costs for such records are higher than standard but not that much higher.

Other sources send us the provider-neutral record and we run them through MarcEdit as the others have described. We search for links that don’t contain certain specific codes that match our member libraries and delete them. It doesn’t take long. Mind you, we only have a few sources of records (2 to be exact) so that makes it easier.

We do automated pre-processing on every record load coming into our system – not just for e-materials. So this is not an addition to our work, it is expected.
Laura says:

August 27, 2010 at 3:26 pm

We process batches numbering from 100s to 1000s of records. As Lynn says, the work is an expected part of production processing.

Jonathan you say you don’t get what’s intended with the provider neutral record. I think provider neutral records are intended to help catalogers with the selection of records vs the editing of records. The provider neutral record gives us a single record for an electronic manifestation of a work rather than multiple records for each publisher/provider. We got into the situation of having multiple records for each publisher’s issuance of a work because of Library of Congress Rule Interpretation (LCRI) 1.0. LCRI 1.0 considers the same work, issued by a different publisher, to be a new manifestation requiring a new record.

E-books don’t really fit that mold, especially when the work has the same content and the same form but merely made available in a different way. If we continued to follow LCRI 1.0 practice of multiple records, then a cataloger searching for copy would get multiple hits and would have to spend time evaluating records to select the best record for their local ILS. Multiple records for the same work also confound automatic batch searching for copy when you’re dealing with thousands of records in a given package. How can you tell a machine to pick the best record when there are multiple choices? The provider neutral record is a sort of assurance that you’re getting the best possible record for the electronic version of a book.

The provider neutral record is not meant to save the time of the cataloger in terms of providing a “clean” record that needs minimal editing. Even when catalogers deal with one record at a time there are local edits to be made prior to ingesting in the local ILS. The only difference with large batches of records is that one is doing global edits vs. individual edits. We still need to spend time editing.

Like others have said, it does take time to set up your regexp to manipulate a large batch. It’s complicated so humans have to do it. It isn’t easily automated. Even so, it’s still much better than manually editing individual records.

That’s my understanding of the advantages of the provider neutral record. I’m a relatively new cataloger, so I hope others chime in if I’ve got it wrong. I hope that clarifies things somewhat for you.
Mark says:

August 27, 2010 at 9:31 pm

Jonathan:

The MARC change you’re referring to went into effect with this past February’s MARC update. The new codes are o and q, summarized on the following page just under Update 11:

“Form of Item” for each format gives further details:

Note that OCLC is not implementing “Form of Item” for the Computer File format until the next version of Connexion Client comes out later this year or early 2011:

“OCLC will add the Form of Item fixed field (008/23 & 006/06), with codes o, q, and [blank] to the Computer File format in a future version of the Connexion client.”

Scroll down to “Form” for the original text:

OCLC fixed field “Form” revisions have made their way into Bib Formats & Standards:
Mark says:

August 27, 2010 at 9:34 pm

Damn, the URLs went missing. Here they are in order again:

Update 11:
http://www.loc.gov/marc/bibliographic/bdapndxg.html

Further details:
http://www.loc.gov/marc/bibliographic/bd008.html

Original text:
http://www.oclc.org/support/documentation/worldcat/tb/258/default.htm#bibchanges

Bib Formats & Standards:
http://www.oclc.org/bibformats/en/fixedfield/form.shtm
John says:

August 30, 2010 at 2:22 pm

We have a vendor script (Aleph) that we can edit for loading e-books. I have had to change the script to find the 856 field that has vendor specific wording (e.g. a URL), then copy that 856 to a 999 field. I then delete the 856 fields, and copy the 999 back to the 856. Actually, in this way, I can also re-create the indicators, which is good since some vendors will leave those out. Some of our software is sensitive to these indicators, so this is an extra plus to using this script.
Jonathan Rochkind says:

August 30, 2010 at 5:51 pm

Mark, thanks, that’s helpful.

Lynn (or anyone else), can you give me any more information on “We search for links that don’t contain certain specific codes that match our member libraries and delete them.”

Provider neutral records aren’t going to have library holdings codes in the bibs. So are you filtering on 856$u, the url itself, against a known list of urls? Or somehow filtering on 856$3? Or doing something I’m not thinking of, because your workflow is different than ours (or perhaps I don’t even understand our own completely accurately!).

If anyone can give me more information on what you do in MarcEdit for the “we just use MarcEdit” folks, that would be really helpful. Go through a typical load, and tell me what field/subfields you filter on in MarcEdit how. Thanks if anyone can!
Laura says:

August 30, 2010 at 7:04 pm

Jonathan – I haven’t done a regex to remove extraneous 856 but I imagine it would be a somewhat similar process.

This is an example of what we did in MarcEdit with e-book records from the publisher (Morgan&Claypool for the Synthesis Digital Library). Each record had only 1 URL but there were variations in how the 856 was treated. We wanted to customize the 856$u so the URL goes through our proxy and the 856$z has our SFX button. The 856$z controls how the link displays in our ILS.

Obviously, the DOI varied in every record. We wanted to retain that unique identifier while we edited.

EXAMPLE OF ORIGINAL 856s IN THE FILE

=856 42$3Abstract with links to resource$uhttp://dx.doi.org/10.2200/S00053ED1V01Y200710BME017

=856 42$3Abstract with links to full text$uhttp://dx.doi.org/10.2200/S00082ED1V01Y200612ENG003

=856 42$3Abstract with links to full text$uhttp://dx.doi.org/abs/10.2200/S00114ED1V01Y200804CEM022

=856 42$3Abstract with links to full text$uhttp://www.morganclaypool.com/doi/abs/10.2200/S00186ED1V01Y200903TIS001

MarcEdit Find (with regular expression):

(?:=856 42\$3Abstract with links to resource|=856 42\$3Abstract with links to full text)\$uhttp://dx\.doi\.org/10\.2200/(\w*)

Replace (with regular expression):

=856 40$uhttps://clsproxy.library.caltech.edu/login?url=http://dx.doi.org/10.2200/$1$z
Pingback: Further adventures in provider-neutral e-book bibs « Bibliographic Wilderness

Also, on MARC changes

Inevitable funeral?

Share this:

Published by jrochkind

13 thoughts on “provider-neutral ebook records, help!”

Leave a comment