In a conversation that no doubt continues to be frustrating for all involved, because we all think we’ve had the exact same conversation a dozen times before, Bernhard Eversberg wrote:
No, to learn MARC does not consist of learning all the numbers and codes, that’s rather trivial. You have to learn the precise meanings and the concept, and that’s the same with verbal tags.
So, okay, here’s what this means to me: You are saying that MARC serves as our metadata _schema_ or _vocabulary_. It is NOT just a serialization or exchange format; it is in fact our schema — it defines what elements are available and what they mean.
Now, to me, THAT is in fact the biggest problem with MARC. We’ve taken what was originally designed as simply a transport format and turned it into a schema. By having ONE standard serve as BOTH our metadata schema and our serialization format — by entangling these two concepts — we’ve made any kind of movement or inter-operability much more complicated. It becomes nearly impossible to serialize our data in some _other_ format in a ‘lossless’ way, because the serialization format and the schema are so entangled.
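To make the distinction concrete, here is a minimal sketch in Python. The element names are invented for illustration (a real schema would define them formally); the point is that one schema-level record can be rendered into two different serializations:

```python
import json

# A minimal sketch of the schema/serialization distinction. The element
# names ("title_proper", "statement_of_responsibility") are hypothetical;
# a real metadata schema would define them formally.
record = {
    "title_proper": "Moby Dick",
    "statement_of_responsibility": "Herman Melville",
}

def to_marc_like(rec):
    # One serialization: a MARC-style 245 field with $a and $c subfields.
    return "245 10 $a{title_proper} /$c{statement_of_responsibility}".format(**rec)

def to_json(rec):
    # A second serialization of the *same* schema-level data.
    return json.dumps(rec, indent=2)
```

When the schema is defined independently of MARC, either rendering can carry the same information losslessly; when the MARC tags themselves *are* the schema, the second rendering has nothing precise to point back to.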
It makes our ‘content guidance’ like AACR2 _very_ difficult to understand in practice, because the only reasonable way to write content guidance like AACR2 is to refer to a metadata schema. AACR2 refers to ISBD — which “officially” was designed as a metadata schema (we/they didn’t use that term way back then, but that’s what it was; library people were actually doing ‘metadata engineering’ FIRST). But in _fact_, in most or all AACR2-using countries, it’s MARC21 that BECAME the true metadata schema. AACR2 keeping up the fiction that its schema is ISBD makes the various parts of our metadata control regime mesh like “broken gears”, making everything _much_ harder to understand for both library people and non-library people (who might want to inter-operate with our data).
It makes it insanely complicated to make any changes to ANY of the parts of the metadata regime, because the parts inter-relate in ill-defined ways. If what you really need is a change in our ‘metadata schema’, does that mean you need a change to ISBD, MARC, or AACR2? Or all of the above?
RDA _theoretically_ uses FRBR (rather than ISBD) as the referenced ‘metadata schema’. This, to my mind, is actually the _most important_ part of RDA. The problem is that the RDA effort didn’t really realize how important and how challenging this was; they didn’t really grasp what it entailed, and didn’t take it seriously — perhaps until fairly recently. FRBR needs (or needed) some work to do the job, and that work needs to affect the whole of how RDA is structured. Diane Hillmann is waging an epic struggle to make RDA take seriously the idea that (a further formalization/specification of) the FRBR model is the metadata schema to which RDA applies guidance. If she and RDA are successful, that will be the biggest contribution of RDA, and will make possible alternate serialization formats that are still “high fidelity”.
Jim Weinheimer makes a followup post that, I think, is to him about why you can never move your data out of MARC “losslessly”, but to me is instead evidence of exactly the kind of problems you run into when you aren’t clear about your metadata schema/vocabulary as distinct from your serialization format. Jim says:
A few points.
Here is an example from the MARC21-to-MODS mapping for uniform titles, from http://www.loc.gov/standards/mods/mods-mapping.html:
130, 240 $a$d$f$k$l$m$o$r$s
730 $a$d$f$k$l$m$o$r (if ind2 is not 2)
→ <title> within <titleInfo type="uniform">, and:
130, 240, 730 $n (and other subfields following, as above)
130, 240, 730 $p (and other subfields following, as above)
130, 240, 730 $0 (add xlink="contents of $0", as URI)
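The crosswalk quoted above can be sketched as data plus one condition. The field and subfield lists come from the LC mapping table; the variable and function names below are my own invention:

```python
# A sketch of the MARC21 -> MODS uniform-title crosswalk quoted above.
# Field/subfield lists are from the LC mapping table; names are invented.
SUBFIELDS_130_240 = set("adfklmors")   # 130, 240: $a$d$f$k$l$m$o$r$s
SUBFIELDS_730 = set("adfklmor")        # 730: $a$d$f$k$l$m$o$r (no $s)

def maps_to_uniform_title(tag, indicator2):
    """True if this field maps to <titleInfo type="uniform"> in MODS."""
    # 130 and 240 always map; 730 maps only when ind2 is not 2.
    if tag in ("130", "240"):
        return True
    return tag == "730" and indicator2 != "2"
```

Even this toy version shows how much interpretive logic ("if ind2 is not 2") hides inside a mapping table that looks like a simple list of equivalences.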
Now, compare this to the MARC Guidelines for the 240 field:
and the LC Rule Interpretations (these are the additions to AACR2, not AACR2 itself) for uniform titles:
Here is the Uniform title in UNIMARC:
A non-cataloger will probably ask: What is a uniform title? And that would be the right response, because there are different types of uniform titles for different purposes, and they are terribly complex. It should also be accepted that there are legitimate reasons for this complexity. The above rules work together intimately to ensure standards both for the coding and for the information. (Except for UNIMARC, which has different standards.)
So here is where the rubber meets the road when it comes to talking about “computational thinking” and cataloging. When we create standards for “bibliographic control” or “metadata engineering”, we are doing data modelling for a computer environment. And there are 50 years of practice, experience, and theory on how to do data modelling for a computer environment. If you ignore all that… well, you’re trying to re-invent the wheel, and you’re probably not going to come up with a very good wheel.
Now, I think there ARE some things that aren’t entirely solved in data modelling practice. Trying to do things on the web raises new issues that communities are still trying to solve; RDF, and Entity-Attribute-Value modelling in general, is one approach to some of these issues, and it itself raises some questions that (in my opinion) are not entirely solved. But these approaches are built upon 50 years of practice in data modelling for the computer environment.
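For readers who haven’t met it, Entity-Attribute-Value (triple) modelling — of which RDF is the web-facing instance — can be sketched in a few lines. The identifiers below are invented for illustration:

```python
# Entity-Attribute-Value modelling in miniature: every statement is a
# (subject, attribute, value) triple. The identifiers are invented.
triples = [
    ("ex:work1", "dc:title", "Moby Dick"),
    ("ex:work1", "dc:creator", "ex:melville"),
    ("ex:melville", "foaf:name", "Herman Melville"),
]

def values_for(subject, attribute, data):
    # Collect every value asserted for one subject/attribute pair.
    return [v for s, a, v in data if s == subject and a == attribute]
```

The appeal is that the "schema" reduces to a shared vocabulary of attributes, independent of any record layout; the open questions are about what happens when that vocabulary is less precise than the rules that produced the data.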
At one point, library cataloging was ahead of everyone else in structured data modelling; we were kind of the only game in town. That point ended around 50 years ago. And we’re still data modelling as if computers don’t exist, to say nothing of data modelling for the web in particular. There are still challenges and unanswered questions; I don’t think (some on code4lib might disagree) that these are all answered questions. But there ARE answered questions, and you can’t engage with this without understanding the lessons of 50 years of data modelling for the computer environment — which is what discussions on NGC4Lib and RDA-L often seem to me to be ignoring.
4 thoughts on “serialization vs metadata schema/vocabulary”
Jonathan, your input on this thread has been invaluable — and it *has* been frustrating for exactly the reasons you detail.
One thing that keeps coming to my mind is how nebulous the specific functional requirements are among the cataloguing community. Getting data out of MARC *losslessly* is implied, but nowhere is it explicitly said that this is necessary or, if so, why. In fact, LoC is very clear that MARC-to-MODS is lossy.
What would be the purpose of a lossless crosswalk? So that we can convert back to MARC at some point in the future? Hopefully not.
Thank you for this post (and others). Your comments help clarify the issues around cataloging and metadata engineering.
I wonder what you think would be the likeliest path, or paths, to actually creating an alternate serialization to replace MARC.
There is good work going on. RDA is problematic in many ways, but it is a step in the right direction. Diane Hillmann is doing good work with RDA vocabularies, and Karen Coyle is doing good work thinking about a way to serialize RDA.
But it strikes me that a non-MARC serialization format might be better based directly on FRBR than on RDA. And that serialization would have to interpret FRBR and extend it some.
I know of no one who is doing that work. Just my ignorance, I suspect.
Again, I thank you for writing clearly about an often confusing set of ideas and issues.
Matthew, I think the work Diane and Karen are doing with RDA _is_ meant to be that “interpret FRBR and extend it some”. They are doing that work not as a ‘serialization’ but as a ‘schema’ or ‘vocabulary’ (which is approximately what FRBR is meant to be, too) — so it’s part of “the RDA effort”, but that work is really “FRBR interpreted, extended, and further formalized for the needs of RDA”, as distinct from “RDA the set of content guidelines for filling in the blanks.”
So in this case, I think “starting with RDA”, if it means Diane and Karen’s work, may be exactly what meets your suggestion after all.
Thanks for the kind words.
Actually, I would love to have started with FRBR/FRAD + attributes, then applied RDA to that… but they weren’t available in that order, and I have no idea how closely RDA tried to follow FR attributes (I’m working on a comparison of RDA data elements and FR attributes, but FRAD is only now being added to the registry and isn’t complete). In my mind (and I think in Diane’s as well) RDA should be an Application Profile based on a declaration of the model (entities, relationships) and guidance rules. If, as I suspect, RDA and FR elements will not be the same, then it’s going to be hard to reconcile. It would be easier to reconcile if we had FR -> RDA/AP, because we could add RDA extensions to FR in the RDA/AP. Trying to meld RDA and FR without recognizing that RDA must be based on FR, not applied to it after the fact, is going to create a mess.
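The layering Karen describes — a declared model underneath, with an application profile constraining it on top — might look something like the following sketch. The classes and the profile contents are purely illustrative; neither FRBR nor RDA defines exactly these names:

```python
from dataclasses import dataclass, field
from typing import List

# A hypothetical sketch of "application profile over a declared model".
# The entities echo FRBR's Work/Expression layering, but the attribute
# names and the profile contents are invented for illustration.
@dataclass
class Work:
    title: str
    expressions: List["Expression"] = field(default_factory=list)

@dataclass
class Expression:
    language: str

# The profile layer: which model attributes this application requires.
RDA_LIKE_PROFILE = {
    "Work": {"required": ["title"]},
    "Expression": {"required": ["language"]},
}

def conforms(entity, profile):
    # Check an entity instance against its profile entry.
    rules = profile[type(entity).__name__]
    return all(getattr(entity, attr, None) for attr in rules["required"])
```

In this arrangement the model (the classes) can be shared and extended, while the profile carries the application-specific rules — which is why applying RDA *to* an FR base, rather than after the fact, matters.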
We are also hindered by the fact that we have no access to the RDA or FR process other than reading documents that are issued and commenting on them (usually too late to have any real effect on fundamentals). The process has built into it some of the same assumptions that have made our data problematic: that cataloging is a thought exercise that doesn’t need to adhere to data management principles. Or that data management can be applied after the fact.
When I do get things like comparison tables done, I post them at http://kcoyle.net/rda/ — partly for my own convenience, but feel free to check in and see if you find anything useful for your own analyses.