MARC 856, I don’t like you

If there’s one MARC field that needs an overhaul, it’s the 856. Roy has already talked about how it’s pretty much impossible to tell what the URL in an 856 represents in relation to the item cataloged.

But here’s a question for you: let’s say you have an 856 URL to full text for a serial, and you know what date ranges it covers. What subfield would you put that in? $3 or $z? I see it in both. $3 seems somewhat more appropriate, but $3 also often contains various other kinds of information, such as the name of the provider, or the chapters of a monograph that are covered. Or even a human-readable description of whether the link represents full text at all, or just tables of contents, etc. Most 856 fields don’t have this at all, but when they do it’s usually in the $3.

So basically, if you want software to be able to tell the user what range of dates is covered in full text, from the MARC, forget about it.  And that’s not even talking about trying to get a machine processable range of dates, so software could actually calculate whether a particular date/issue is included.  Even finding a simple display string representing dates of coverage is pretty much infeasible.
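To make the problem concrete, here is a minimal Python sketch of the kind of guesswork software is reduced to. It assumes an 856 field has already been parsed into (subfield code, value) pairs; the sample values, the `guess_coverage` name, and the regular expression are all invented for illustration and are not part of any MARC library.

```python
import re

# Hypothetical 856 field, modeled as (subfield code, value) pairs.
# The values are invented for illustration.
field_856 = [
    ("3", "Full text from ExampleProvider: 1995-2004"),
    ("u", "http://example.org/journal"),
    ("z", "Access restricted to campus users"),
]

# Loose pattern for "YYYY-YYYY" ranges or open-ended "YYYY-" ranges.
YEAR_RANGE = re.compile(r"\b((?:18|19|20)\d{2})\s*-\s*((?:18|19|20)\d{2})?")


def guess_coverage(subfields):
    """Return (code, start_year, end_year) for the first year range
    found in $3 or $z, or None if nothing looks like a range."""
    for code, value in subfields:
        if code not in ("3", "z"):
            continue
        match = YEAR_RANGE.search(value)
        if match:
            start = int(match.group(1))
            end = int(match.group(2)) if match.group(2) else None
            return code, start, end
    return None
```

Even this toy version shows the fragility: a note like "Updated 2001-02-15" gets misread as an open-ended range starting in 2001, and "Chapters 3-5" is not recognized as coverage at all.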

For how significant links to electronic versions (or supplementary information) have become, I’m kind of amazed that the 856 spec and practice haven’t really changed since the early days of the WWW. Probably because most of our ILSs wouldn’t be able to handle it anyway. A vicious circle, as we generally run into with MARC.

What’s needed

Just for a start,

  • A way to encode in a machine-recognizable way whether the link represents full text, or just excerpts, or something else entirely (like a review). Right now full text and excerpts are coded the same (when they’re coded at all), as Roy’s paper discusses.
  • A field for extent of coverage that is actually used only for extent of coverage. The $3 is theoretically this, but people throw all sorts of stuff into it. And possibly different fields for date range extent of coverage, vs. other extent of coverage (like for a monograph, chapter 3 only, or volume 1 only, etc).  [Yes, it now occurs to me that the way this interacts with the first bullet point needs to be thought through].
  • A field for displayable provider/platform.

And that’s just the low-hanging fruit. Then we start wanting the provider/platform to not just be human readable, but controlled in some way (URIs?), so we can collocate by provider/platform.

And then we start thinking about how we really want the dates of coverage for a serial to be machine actionable so software can compute if a particular date/issue is included. Which gets us thinking about the MARC Format for Holdings, which is its own entirely gigantic mess. I don’t think this kind of ‘holdings’ data is usually used at all with records with an 856 representing electronic full text. But even if it were, typical MFHD use makes it pretty infeasible for software to actually answer this question.
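As a rough illustration of what "machine actionable" could mean here, the following Python sketch models a link with an explicit type, a provider, and structured coverage ranges. None of this is existing MARC or MFHD structure; the class names, fields, and sample values are all hypothetical, and real coverage would also need to handle volume/issue enumeration and embargoes.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum
from typing import List, Optional


class LinkType(Enum):
    FULL_TEXT = "full_text"
    EXCERPT = "excerpt"   # e.g. table of contents only
    RELATED = "related"   # e.g. a review or supplementary site


@dataclass
class Coverage:
    start: date
    end: Optional[date] = None  # None means coverage continues to the present

    def includes(self, when: date) -> bool:
        if when < self.start:
            return False
        return self.end is None or when <= self.end


@dataclass
class ElectronicLink:
    url: str
    link_type: LinkType
    provider: str
    coverage: List[Coverage] = field(default_factory=list)

    def has_full_text_for(self, when: date) -> bool:
        """True only if this is a full-text link and some range covers the date."""
        return self.link_type is LinkType.FULL_TEXT and any(
            c.includes(when) for c in self.coverage
        )


# Invented example: full text for 1995-2004, a gap, then 2008 onward.
link = ElectronicLink(
    url="http://example.org/journal",
    link_type=LinkType.FULL_TEXT,
    provider="ExampleProvider",
    coverage=[
        Coverage(date(1995, 1, 1), date(2004, 12, 31)),
        Coverage(date(2008, 1, 1)),
    ],
)
```

With data shaped like this, `link.has_full_text_for(date(2006, 6, 1))` is a simple computation rather than a guess from a display string.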


21 thoughts on “MARC 856, I don’t like you”

  1. From the practical standpoint, whether you need to use 856 $3 or $z right now will depend on which subfield your OPAC displays. Library of Congress currently uses $3. That was my first hint several years ago that I needed to be alert regarding a need to change subfields. At that time, however, we were still using Follett’s older Circ/Cat software, and it would only display text in $z. When we changed software suppliers last year, I globally moved content of all affected records from 856 $z to 856 $3 BEFORE importing the MARC records to the new system. Had I left the description in $z, the new catalog would have displayed the URL, not the description.

    So, regardless of what needs to be overhauled about the 856, it’s essential that you understand how this field is currently being translated by both your present OPAC and by any new ILS you’re considering.

  2. See, the problem is that people writing MARC to meet the needs of their specific local OPAC means that it becomes mostly impossible to use MARC data in novel and innovative services outside the bounds of the traditional OPAC. Not to mention that it’ll cause you problems next time you switch OPACs, but that’s not really my focus here.

    I realize people have no choice but to deal with the crappy OPACs we’ve got.

    But our cataloging data is incredibly valuable. It’s valuable for a lot more things than just providing an OPAC search. We can use this data to power all kinds of services to help users find and access materials of interest to them. But it becomes awfully hard to do that when our data is unpredictable and idiosyncratic, rather than standard. MARC (or even AACR2 in MARC), despite all the effort at standardizing it, despite catalogers sometimes telling me they can’t possibly do something because it would violate AACR2 or MARC… is in practice not very standard at all. It’s all over the place. And that makes it very hard to write and share software that uses MARC data to do new things.

  3. Perhaps this is too simplistic to meet your needs, but the second indicator of the 856 CAN be used to indicate whether the link points to the resource itself, a “version” of the resource, or a “related” resource. It’s not as specific as you (or I) would want, and it isn’t always utilized appropriately (as usual when it comes to MARC records), but in WorldCat and at least some vendor record sets I’ve seen, a second indicator of “0” is often used when the 856 contains a link to the full-text of the resource described in the record. Also, 856 is a “holdings” field, included in the MARC 21 Format for Holdings Data (MFHD), so it is possible to incorporate an 856 into MFHD records along with other fielded data (such as chronology and enumeration). The ILS software my library uses supports this reasonably well (I just can’t export the holdings data out of my system until I pay them a bunch of money for a new export profile). I agree that MFHD and Z39.71 leave a lot to be desired, but they are far more machine actionable than the alternative (free-text holdings statements).

  4. Sorry, one more quick comment. I, for one, wouldn’t be sorry to see the 856 go away entirely, replaced by mechanisms for looking up whether a particular resource is available online in other applications that are optimized for keeping track of digitized files and their location on the Web. For example, I love the way Google Book Search works; I implement some JavaScript code in my catalog that searches for online full-text for a particular book based on criteria I specify and displays a link if it finds one. That way, I don’t have to code and maintain all of these links in 856 fields stored in my local records.

  5. A second indicator is often set to “1” for both full text, and table of contents only or other excerpts from the work itself. Roy’s study showed this is true in fact; and I’m not sure it’s wrong according to the way the instructions are written.

    Perhaps “0” is supposed to mean only full text, but I think I’ve even seen 0 used for table of contents.

    As always, MARC is flexible enough that IF it were used consistently, the format itself is capable of representing what we need. But it is not used consistently, which means there’s no predicting what you’ll find there.

  6. I do not like 856s either. Date ranges may vary all the time, depending on your institution’s subscription etc.

    I find it very annoying to have to add such volatile information in a bib record, anyway. MARC was created to manage relatively static information.

    I suppose link resolvers should take up such tasks, should they not?

  7. *not* a cataloger! but I really don’t think it’s a good use of time to put the date ranges of coverage in the 856 field. This stuff changes so fast, and catalogers everywhere are so far behind, there’s no way it could ever be reliable. I heartily agree, though, that there should be a consistent way to indicate that the link goes to the ToC at the LoC or something instead of the full text.

  8. I asked about this sometime back on the AUTOCAT list. There is no controlled vocabulary for subfield 3, so folks just use whatever makes sense at the time. Since I have a lot of libraries in Texas that want to catalog their historic photos, maps, etc. and include links to digitized content, we’re promoting a best practice of using multiple 856s with $3 for “Thumbnail image” “Access copy” “Archival copy (TIF)” and so on. A true controlled vocabulary or additional indicators would be an easy way to fix the 856 problem, but I don’t see that happening.

  9. I agree there needs to be some clarity for coverage in the context of an 856, particularly when it is applied to bibliographic records. But I think that data is more easily understood, manipulated, and indexed if it is parsed.

    Since the 856 is part of the MARC format for holdings, when it is used in conjunction with the other 8XX fields that make up a holdings record, the coverage can be made very clear and is more easily updated because it’s parsed.

    In fact, one library I formerly worked at years ago was getting so many different electronic versions of the NYT that we created separate holdings records for each electronic copy, applied the 856 to the holdings record, and used other 8XX fields to describe the coverage of each.

  10. Our catalog is a good tool for managing an inventory of things we control. We don’t control the holdings in our online resources. So I agree with Christina that trying to maintain that information in a catalog is probably not a good use of our time. The knowledgebases of our link resolvers should be a better place to keep it but right now they are pretty sloppy.

    The KBART Group’s work is relevant here: It’s a UKSG/NISO effort at this point. One of their goals is:

    “Develop and publish guidelines for best practice **to effect smoother interaction between members of the knowledge base supply chain.** Knowledge base providers and their customers (primarily academic libraries) will benefit from provision of higher quality data by content providers. Publishers will benefit from accurate linking to their content and subsequently the possibility of increased usage.”

  11. I’m also not a cataloguer, (interested, but RDA turns me off!). I also think that the 856 needs to be represented elsewhere, *except* for when used to export/import records.

    This I see as the power of OpenURL-based things such as Umlaut, providing material for the OPAC to consume as JSON/XML/whatever. This separates the catalogue from needing to know everything, but does complicate the whole picture…!

    I wonder how the SerialSolutions interest in the eXtensible Catalog project relates to this?

  12. But, Tom, the implication of what you said is you should store information about URLs that you’re incapable of importing/exporting to/from other systems. That doesn’t seem right. We need to figure out how to store the info we actually need — but then we need to be able to import/export that info, don’t we?

  13. Yeah, it doesn’t sound perfect, but the idea I had was that import of MaRC data would be through the OpenURL system, which “knows” about the local library system too, (through Z39.50), and enriches the MaRC exported through that system.

    I know this sounds needlessly complex, but it’s technologically feasible, I believe! It would be the next logical step for a next-gen library system, but possible today, esp. for SFX & Voyager. There are packages which do MaRC read-write: File_MARC for PHP, and probably something in Perl, Ruby (not sure).

  14. BTW, this would be optional, of course!

    // Enrich the record only when enrichment is available; otherwise pass it through unchanged.
    if ($metadata_enrichment == "possible") {
        return enrich_with_marc($rec);
    } else {
        return $rec;
    }

  15. I’ve done an awful lot of work with OpenURL, and I don’t see how it’s capable of doing what you’re suggesting. I don’t get it; I’m confused.

    We quite likely could use something other than MARC for exchanging bibliographic data. OpenURL isn’t it, for like twenty reasons.

  16. To be clear, Tom, the perspective I’m coming from is from WRITING an OpenURL link resolver, that wants to get information from the catalog on various things including holdings and links to full text and links to supporting information — and _can’t_ do so consistently/reliably because the data is not recorded there clearly enough! Involving another system in things isn’t a solution, it’s just making the problem more complex, and in fact more clear — the lack of clear data becomes a salient problem exactly when you want to expose it to other systems. But you still need the data, and you need the data in a way it can be communicated in a standardly understandable way. Just saying ‘the OpenURL link resolver will handle that’ is not a solution, in fact it’s the problem statement in the first place!

  17. [penny drops]
    Aren’t there nicer places to get this data than [your?] catalogue? And where does your catalogue get the data from in the first place?

    I think a lot of cataloguing is copy/enhanced cataloguing from another source… but I think you’ve probably already looked at this[?] I’m thinking mostly WorldCat (not sure about other sources).

    Not being that familiar with what exactly is under the Umlaut’s hood, which catalogues are you looking at?

    I notice that although you’re looking at using the 856 for getting data *from* a catalogue, replies from Sue & Jordi also seem to be talking about this information not belonging in the catalogue anyway.

    I think my confusion is that you’re talking about MaRC as if it’s the way the catalogue is stored, while MaRC is, to my mind, (again, not a cataloguer!), an *exchange* format. This is why I was thinking about an OpenURL resolver consuming the MaRC records (for example, Safari eBooks), and using your OpenURL resolver to put this data in your catalogue.

    Okay, I’m rambling… I know I’m talking about using OpenURL to fix the OPAC/catalogue, but I agree with this information not really belonging in the catalogue. At least, not while MaRC is used for cataloguing, and it will be interesting to see how well RDA will do here.

  18. OK, I’m not a cataloguer either, but it seems to me (perhaps naively) that the problem with the 856 field is equally applicable to 852: both fields indicate the location of a holding, not what is being held. This works reasonably well for monographs, where the bibliographic record describes what is being held to a useful level of accuracy, but not for serials, where to be useful you need the details of the dates/range held. Is the 852 $3 any different/more useful than the 856 $3?

    To go further, as we see a separation of the ‘discovery’ systems from the traditional LMS, the separation between holdings and bibliographic description becomes more of an issue, and I wonder if the concept of ‘embedded’ holdings statements (852 and 856) becomes completely redundant. What systems these end up being stored in (so called ‘OpenURL Resolvers’ or ‘traditional LMS’) isn’t so much the issue; I think the separation of ‘holdings’ from bibliographic description is necessary.

    If we separate ‘holdings’ out, and make it possible to deliver that data in machine calculable formats, and via machine to machine interfaces (functions OpenURL resolvers tend to support), then it makes it easier to embed information about the library holdings into different environments.

    This is important in being able to answer the user question ‘where can I get this’ rather than the question ‘what is in this library’.

  19. I realize that this may not be the right forum to bring up the topic, seeing as most people here are justly bemoaning the inadequacies of MARC 856, but I’d be interested to hear opinions on including supplemental material in the 856 field. It seems there is more and more relevant, useful or interesting information being provided online by authors/publishers that would be worth displaying in an online catalog, but I haven’t been able to find any precedent. As an example, see the related site for Superfreakonomics –

    What do you think? Is there a place for this type of material on an OPAC?

  20. There is most _definitely_ a place for that kind of material in the display to the user, I think there’s no doubt.

    I just worry about our inadequate tools for actually managing this info. If you add it to MARC 856s, is there any way for the software to tell which links are full text and which are just supplementary? If you add it to 856, do you have any procedures in place to periodically check if the link is still good, and remove it if not? (A standard ‘link checker’ often doesn’t work, because many web sites incorrectly return a 200 HTTP status code even when there’s nothing at the URL anymore, or the wrong thing. Not that most library systems are set up even to do this level of automated link-checking.) Without that, our systems quickly become over-filled with no-longer-working URL links, and users learn not to bother clicking on any of them, because odds are the click won’t work.

    For this latter thing especially, this is something that we really ought to be doing cooperatively, so we can share the work of discovering supplementary URLs, as well as discovering when they no longer work anymore. But our cooperative cataloging infrastructure isn’t set up to support that either.

    So it’s not that it doesn’t serve the users, or that it’s not the proper role of an OPAC or other library discovery interface. It definitely is. But instead, the problem is that we don’t actually have adequate infrastructure to support the maintenance of this kind of data. A sad thing.
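
The link-checking problem raised in the last comment (sites that return a 200 status for pages that are gone) is sometimes called a "soft 404". A rough Python sketch of one possible heuristic follows. It works on an already-fetched status code and body, so no particular HTTP library is assumed; the function name and the phrase list are illustrative guesses, not an established standard, and a real checker would need much more than this.

```python
# Phrases that often appear on "not found" pages served with a 200 status.
# This list is an illustrative assumption and would need tuning in practice.
SOFT_404_PHRASES = ("page not found", "no longer available", "does not exist")


def classify_link(status_code, body):
    """Classify an already-fetched response as 'ok', 'broken', or 'suspect'."""
    if status_code >= 400:
        return "broken"      # the server itself reports an error
    text = body.lower()
    if any(phrase in text for phrase in SOFT_404_PHRASES):
        return "suspect"     # success status, but the page reads like an error
    return "ok"
```

A "suspect" result is only a flag for human review. It cannot catch the case where the URL now serves the wrong thing entirely, which is part of why sharing this work cooperatively matters.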
