In a conversation that no doubt continues to be frustrating for all involved, because we all think we’ve had the exact same conversation a dozen times before, Bernhard Eversberg wrote:
No, to learn MARC does not consist of learning all the numbers and codes, that’s rather trivial. You have to learn the precise meanings and the concept, and that’s the same with verbal tags.
So, okay, here’s what this means to me: You are saying that MARC serves as our metadata _schema_ or _vocabulary_. It is NOT just a serialization format or an exchange format; it is in fact our schema: it defines what elements are available and what they mean.
Now, to me, THAT is in fact the biggest problem with MARC. We’ve taken what was originally designed as simply a transport format and turned it into a schema. By having ONE standard serve as BOTH our metadata schema and our serialization format, entangling these two concepts, we make any kind of migration or inter-operability much more complicated. It becomes nearly impossible to serialize our data in some _other_ serialization format in a ‘lossless’ way, because the serialization format and the schema are so entangled.
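To make the entanglement concrete, here’s a minimal sketch in Python (the field data and element name are invented for illustration, not taken from any real record) of the same logical assertion expressed two ways: with its identity welded to MARC serialization details, versus defined at the schema level and serializable however we like:

```python
import json

# The same logical assertion -- "this work has uniform title 'Beowulf'" --
# expressed two ways.

# 1. Entangled: the element is *identified by* its MARC serialization
#    details (tag 240, indicators, subfield codes). Any other format
#    must reproduce those details or lose information.
marc_field = "240 10 $a Beowulf"

# 2. Decoupled: a schema-level element name, serializable to anything.
record = {"uniform_title": "Beowulf"}

# One possible serialization; XML or YAML of the same dict loses nothing,
# because the element's meaning lives in the schema, not in the syntax.
print(json.dumps(record))  # prints {"uniform_title": "Beowulf"}
```

The point of the sketch is only that in the second version the serialization can be swapped out without touching the schema; in the first, the two cannot be separated.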
It makes our ‘content guidance’ like AACR2 _very_ difficult to understand in practice, because the only reasonable way to write content guidance like AACR2 is to refer to a metadata schema. AACR2 refers to ISBD — which “officially” was designed as a metadata schema (although we/they didn’t use that term way back then, that’s what it was; library people were actually doing ‘metadata engineering’ FIRST). But in _fact_, in most/all AACR2-using countries, it’s MARC21 that BECAME the true metadata schema. AACR2 keeping up the fiction that it’s ISBD makes these various parts of our metadata control regime mesh with “broken gears”, making everything _much_ harder to understand for both library and non-library sector people (who might want to inter-operate with our data).
It makes it insanely complicated to make any changes to ANY of the parts of the metadata regime, because the parts inter-relate in ill-defined ways. If what you really need is a change in our ‘metadata schema’, does that mean you need a change to ISBD, MARC, or AACR2? Or all of the above?
RDA _theoretically_ uses FRBR (rather than ISBD) as the referenced ‘metadata schema’. This, to my mind, is actually the _most important_ part of RDA. The problem is that the RDA effort didn’t really realize how important and how challenging this was; they didn’t really realize what it entailed, and didn’t take it seriously — perhaps until fairly recently. FRBR needs/needed some work to do the job, and it needs to affect the whole of how RDA is structured. Diane Hillmann is waging an epic struggle to make RDA take seriously the idea that (a further formalization/specification of) the FRBR model is the metadata schema which RDA applies guidance to. If she and RDA are successful, that will be the biggest contribution of RDA, and will make possible alternate serialization formats that are still “high fidelity”.
Jim Weinheimer makes a followup post that, to him I think, is about why you can never move your data out of MARC “losslessly”, but to me is instead evidence of exactly the kind of problems you run into when you aren’t clear about your metadata schema/vocabulary as distinct from your serialization format. Jim says:
A few points.
Here is an example in the mapping from MARC21 to MODS for uniform titles from http://www.loc.gov/standards/mods/mods-mapping.html:
130, 240 $a$d$f$k$l$m$o$r$s
730 $a$d$f$k$l$m$o$r if ind2 is not 2
<title> with <titleInfo> type="uniform" and
130, 240, 730 $n (and other subfields following as above)
130, 240, 730 $p (and other subfields following as above)
130, 240, 730 $0 add xlink="contents of $0" (as URI)
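As a rough illustration of the kind of transformation that mapping describes, here is a hedged Python sketch. The sample field data, the helper name, and the subfield routing are my own simplification of the excerpt above, not an authoritative implementation of the MODS mapping:

```python
import xml.etree.ElementTree as ET

# Hypothetical parsed MARC 240 field: (subfield code, value) pairs.
field_240 = [("a", "Beowulf."), ("l", "English")]

# Subfields the mapping routes into <title> for 130/240.
TITLE_SUBFIELDS = set("adfklmors")

def uniform_title_to_mods(subfields):
    """Build a MODS-style <titleInfo type="uniform"> from 240 subfields."""
    title_info = ET.Element("titleInfo", {"type": "uniform"})
    title = ET.SubElement(title_info, "title")
    title.text = " ".join(v for c, v in subfields if c in TITLE_SUBFIELDS)
    return title_info

print(ET.tostring(uniform_title_to_mods(field_240), encoding="unicode"))
# prints <titleInfo type="uniform"><title>Beowulf. English</title></titleInfo>
```

Even this toy version shows the issue: the code has to know MARC tag/subfield conventions AND MODS element names, because the “schema” knowledge is baked into both serializations.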
Now, compare this to the MARC Guidelines for the 240 field:
and the LC Rule Interpretations (these are the additions to AACR2, not AACR2 itself) for uniform titles:
Here is the Uniform title in UNIMARC:
A non-cataloger will probably ask: What is a uniform title? And that would be the correct response because there are different types of uniform titles for different purposes and they are terribly complex. It should also be accepted that there are legitimate reasons for this complexity. The above rules work together intimately to ensure standards for both the coding and standards for the information. (Except for the UNIMARC, which has different standards)
So here is where the rubber meets the road when it comes to talking about “computational thinking” and cataloging. What we’re doing when we’re creating standards for “bibliographic control” or “metadata engineering” — we’re doing data modelling for a computer environment. And there are 50 years of practice, experience, and theory on how to do data modelling for a computer environment. And if you ignore all that…. well, you’re trying to re-invent the wheel, and you’re probably not going to come up with a very good wheel.
Now, I think there ARE some things that aren’t entirely solved in data modelling practice; trying to do things on the web raises new issues that communities are trying to solve. RDF, and Entity-Attribute-Value modelling in general, is one approach to some of these issues, and it itself raises some questions that (in my opinion) are not entirely solved. But these are things built upon 50 years of practice in data modelling for the computer environment.
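For readers unfamiliar with it, here is a one-fact-per-tuple illustration of the Entity-Attribute-Value approach (the identifiers and property names are invented for illustration, not from any real vocabulary): each assertion stands alone, and the ‘schema’ is just the set of predicates, independent of any record layout:

```python
# Entity-Attribute-Value / RDF-style modelling: every fact is a
# (subject, predicate, object) triple. There is no fixed record layout;
# the schema is the vocabulary of predicates.
triples = [
    ("work:1", "hasUniformTitle", "Beowulf"),
    ("work:1", "hasLanguage", "English"),
    ("work:1", "realizedThrough", "expression:7"),
]

def values(subject, predicate):
    """All objects asserted for a given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]

print(values("work:1", "hasUniformTitle"))  # prints ['Beowulf']
```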
At one point, library cataloging was ahead of everyone else in structured data modelling; we were kind of the only game in town. That point ended around 50 years ago. And we’re still data modelling like computers don’t exist, never mind data modelling for the web in particular. There are still challenges and unanswered questions; I don’t think (some on code4lib might disagree) that every question is answered. But many questions ARE answered, and you can’t engage with this without understanding the lessons of 50 years of data modelling for the computer environment. Ignoring those lessons is what discussions on NGC4Lib and RDA-L often seem, to me, to be doing.