de-coupling the metadata system

B. Eversburg on the RDA listserv writes:

Meanwhile, most of us will agree that RDA is still off the mark in that regard [getting us to “new rules and new formats for a new information era” in the words of Peter Murray that Eversburg was responding to]. MARC is a mere carrier, a wrapper, and it might carry something very different in 300 than what we are used to – if only we wanted. Nonetheless, MARC is regarded as even more backward than RDA.

So, now what? From where can salvation come? What workable alternatives are there? I mean, ways and means that exist and that can fully replace more or less the entirety of what we have now, not just this or that aspect of it because we need a coherent whole.

I am not sure that most of us will agree, actually. I think RDA IS a useful first step, and I can’t think of any first step other than the approach RDA claims to be pursuing.

In our current environment, we (at least in the US) pretty much catalog FOR MARC. MARC serves as our ‘data vocabulary’, and even our rules for entry in many cases either come from MARC itself or are formulated in terms of MARC fields. This makes it VERY hard to switch to something else, because our entire practice is based on MARC: it’s not just a data format, it’s our data vocabulary and a large part of our entry guidelines too. (AACR2 in theory uses ISBD as a data vocabulary, but this ends up being more or less irrelevant to our actual practice. ISBD is way too basic for our actual data, and it doesn’t match MARC, where we actually put the stuff, very well at all; it matches even less well than RDA’s data vocabulary does.)

RDA, by attempting to separate out the data vocabulary and entry guidelines, making them completely free-standing and unrelated to any specific record encoding format, is a CRUCIAL step toward swapping out MARC for something else. Because if you can move to thinking in terms of RDA, with MARC just being one possible (very imperfect) way to encode your RDA data, then switching to something else is _just_ switching your encoding, not switching your data vocabulary and entry guidance at the same time. And, if this can be done well, data can even be automatically converted between MARC and that other encoding format (or those others), although certainly not perfectly, because MARC encodes RDA so imperfectly at present (for instance, mushing together various RDA elements into one MARC subfield in the 300 and other places). (Incremental changes/enhancements to MARC to match the RDA data vocabulary better would be one way forward.)
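To make the "switching is _just_ switching your encoding" idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `Description` class and its three fields are not official RDA element names, the "MARC-like" structure is a simplified dictionary rather than real MARC (no indicators, no ISBD punctuation), and the JSON encoding is a made-up stand-in for "some other format".

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Description:
    """Encoding-neutral record. Field names are illustrative, not official RDA elements."""
    title: str
    extent: str
    dimensions: str


def to_marc_like(d: Description) -> dict:
    # Hypothetical, simplified MARC-like shape: tag -> {subfield code: value}.
    return {"245": {"a": d.title}, "300": {"a": d.extent, "c": d.dimensions}}


def from_marc_like(m: dict) -> Description:
    return Description(
        title=m["245"]["a"],
        extent=m["300"]["a"],
        dimensions=m["300"]["c"],
    )


def to_json(d: Description) -> str:
    # A made-up second encoding that maps one-to-one onto the same vocabulary.
    return json.dumps(asdict(d), sort_keys=True)


def from_json(s: str) -> Description:
    return Description(**json.loads(s))


def convert_marc_like_to_json(m: dict) -> str:
    # Conversion between encodings always goes through the shared vocabulary.
    return to_json(from_marc_like(m))


record = Description(title="An example title", extent="240 pages", dimensions="23 cm")
```

The point of the sketch is that neither codec knows about the other. Adding a third encoding means writing one more pair of functions against `Description`, not touching the existing ones.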

If a software system’s rules can be written in terms of RDA instead of MARC, with a separate layer that simply reads and writes MARC as an encoding, translating to and from an internal representation based on the RDA data vocabulary, then the same software system could even deal with MARC and some non-MARC RDA encoding simultaneously. The challenges here are again the mismatch between MARC and RDA: not only the loss of ‘granularity’ when putting RDA in MARC as above, but possibly some crucial data elements that are in MARC but aren’t yet included in the RDA data vocabulary at all. (And incremental changes to the RDA data vocabulary to cover anything ‘important’ in MARC that is currently left out would be one way forward on that side.) But a future encoding format based on RDA won’t have this problem, precisely because the data vocabulary of RDA has been formally described (thanks a lot to the DCMI/RDA group for much of this), so it is straightforward to make sure your encoding format matches it properly, and that multiple encoding formats that match it properly can all be converted to and from one another.
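Here is a rough sketch of that layering, and of where the granularity problem bites. It is only an illustration under stated assumptions: the `Physical` class, the key names of the "granular" encoding, and the `has_illustrations` rule are all hypothetical, and the MARC 300 reader is deliberately simplified (it only looks at $a and $b, treating $b as free text that mixes several elements).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Physical:
    """Internal representation. Field names are illustrative, not official RDA elements."""
    extent: Optional[str] = None
    illustrative_content: Optional[str] = None
    colour_content: Optional[str] = None
    # Holds text the reader could not reliably split into separate elements.
    other_details_unparsed: Optional[str] = None


def read_granular(rec: dict) -> Physical:
    # Reader for a hypothetical non-MARC encoding that keeps each element separate.
    return Physical(
        extent=rec.get("extent"),
        illustrative_content=rec.get("illustrativeContent"),
        colour_content=rec.get("colourContent"),
    )


def read_marc_300(subfields: dict) -> Physical:
    # Reader for a simplified MARC 300: $b mixes several elements in one string,
    # so this sketch keeps it unparsed instead of guessing at a split.
    return Physical(
        extent=subfields.get("a"),
        other_details_unparsed=subfields.get("b"),
    )


def has_illustrations(p: Physical) -> Optional[bool]:
    """Application rule written against the internal model, not against MARC.

    Returns None when the source encoding was too coarse to say for sure.
    """
    if p.illustrative_content is not None:
        return True
    if p.other_details_unparsed is not None:
        return None
    return False


from_granular = read_granular(
    {"extent": "240 pages", "illustrativeContent": "illustrations", "colourContent": "color"}
)
from_marc = read_marc_300({"a": "240 pages", "b": "color illustrations"})
```

The rule `has_illustrations` works on records from either reader, which is the "deal with both simultaneously" part. The `None` answer for the MARC-derived record is the granularity loss made visible: the information is probably in there, but the encoding does not let software recover it reliably.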

So it’s a challenge, for sure. But I can’t think of any way to approach it _except_ trying to separate the encoding from the specification of data vocabulary and entry rules (those latter two should themselves ideally be separated as well). Which is what FRBR/RDA is trying to do. (I say FRBR/RDA because FRBR was sort of the first basic sketch of a data vocabulary; RDA takes FRBR’s work and refines it a bit in the process of creating entry rules and making sure they align with the specified data vocabulary. At least in theory that’s what it does.) If it hasn’t succeeded completely, we can keep working to refine it. Starting over from scratch… if FRBR/RDA couldn’t pull it off, what would make anyone think starting over from scratch they could have any better luck? Giving up on the attempt altogether as infeasible would basically doom us to MARC forever.

It very well -may- be infeasible, but that would just doom us to MARC forever. Well, not really forever: it would doom us to basically ceasing to exist as a professional metadata-creating community, because MARC is simply too difficult to work with. This last point is controversial on this list; some people think MARC is just fine and that it would be fine to stick with it forever. I don’t personally know of any computer programmers trying to make powerful interfaces (instead of a clone of basically the same interface we’ve had for 20 years) who think that.

How do you replace the entirety of what we have now? Only by doing it one step at a time, decomposing the entirety of what we have now into separate sub-components that relate to each other in specified, defined ways, so each one can be enhanced or replaced on its own. This is systems engineering, and our entire cooperative cataloging practice is one big system.

By “system” here I don’t just mean software. I don’t even mean software plus record formats plus standards. I also mean the individual people and organizational actors that play a role here, and how we deal with each other (or don’t) in sharing metadata. The whole big cooperative cataloging endeavor is a big system of inter-related moving parts, some of those parts being human actors, others being (written and unwritten) standard practices, others being rules enforced by shared databases like WorldCat, others being our various inter-related specification and standards documents, our data formats, our software. A system this large becomes very hard to change when it’s not “designed” carefully, when any change to one part of it has unexpected implications for all the other parts. The only way to deal with that is to try to design individual parts to be isolated, so they interact with the other parts in clearly defined ways and changes to one part have predictable implications for the other parts. That’s engineering, whether you’re engineering a building, engineering a piece of software, or engineering the complicated multi-piece system that is our collective cooperative cataloging.



4 Responses to de-coupling the metadata system

  1. Céline says:

    Some very interesting observations about what RDA might achieve in the steps away from MARC. I particularly liked your last paragraph about the interdependent, interconnected system that is cooperative cataloguing.

  2. Jenn Miller says:

    Yes! ::fist pump::

  3. Esther Arens says:

    Thanks for this very practical approach: “[…] if FRBR/RDA couldn’t pull it off, what would make anyone think starting over from scratch they could have any better luck?” Yes, the idea of starting with a clean sheet is nice. But it is as nice as it is unrealistic – our ‘sheets’ are already filled with MARC and its derivatives a million times over. So, one step at a time and let’s get on with things.
