Of MARC, serialization formats, and element schemata

This is a topic I’ve ranted about before, and I was inspired to rant about it again by an NGC4Lib thread, where Cory Rockliff wrote:

MARC, as is often pointed out, was conceived of as a data exchange format; and while our library systems, bizarrely, continue to provide the MARC tags view as the default input mode for catalogers, they almost universally store that data internally in an RDBMS. Do we no longer need a record-like data exchange format if our systems use, e.g., triplestores internally (a prospect which is a ways off, I think)?

We definitely still need to exchange data. We need to do more, not less, sharing of data, cooperative cataloging, etc., and you can’t do that without a standard exchange format.

We just need an exchange format which _really is_ just an exchange format, and does not become our standard element vocabulary schema too, or our rules for entering data. MARC has become all of this. That is one of the reasons I’ve thought for a while that we desperately need to get away from MARC. It is not because MARC is incapable of expressing what we need (it can express MOST but not all of it, although not always easily or prettily), but because moving away from MARC is the only realistic way to conceptually disentangle our data transmission format from our data vocabulary schema(ta), and from our rules and guidelines for entering data.

What will that standard exchange format look like? Do we need a “record model”, or can we just get by with free-floating RDF atomic assertions? I don’t know. I am less confident hitching our boat to RDF than some. And certainly our data practices should not REQUIRE RDF among all technologies, but be somewhat agnostic toward semantic web technologies, IMO. If it’s well-designed data, though, it should _support_ serialization as RDF. If we DO end up with RDF-compatible data, then the standard transmission format COULD be expressed as RDF, and probably other ways too. (Note that it’s not enough to simply say “RDF”: to get to a standard exchange format, you’d need to constrain a whole bunch more things, including the RDF vocabularies used and the RDF serialization formats recognized.)
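To make that parenthetical concrete, here is a rough sketch of my own (not anything proposed in the thread). It assumes we have agreed on one vocabulary (the Dublin Core 1.1 elements namespace, picked purely for illustration) and one serialization (N-Triples, handling plain string literals only). The item URI and the function names are hypothetical.

```python
# Assumption: the agreed vocabulary is Dublin Core elements 1.1 (illustrative choice).
DC = "http://purl.org/dc/elements/1.1/"


def escape_literal(text):
    """Escape the characters N-Triples requires escaping inside a string literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def to_ntriples(triples):
    """Serialize (subject URI, predicate URI, literal) tuples, one statement per line."""
    lines = []
    for subject, predicate, literal in triples:
        lines.append(f'<{subject}> <{predicate}> "{escape_literal(literal)}" .')
    return "\n".join(lines) + "\n"


# Hypothetical item URI and values, invented for the example.
triples = [
    ("http://example.org/item/1", DC + "title", "An Example Title"),
    ("http://example.org/item/1", DC + "creator", "Author, First"),
]

print(to_ntriples(triples), end="")
```

Two parties who both say “we use RDF” but differ on either of those two choices still could not read each other’s data without a mapping step, which is the point of the note above.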

But I agree it’s just as likely that we’ll want “packages” of metadata in the form of “records” for some time to come, not JUST bundles of atomic RDF assertions.

But I’m still not sure MODS is very helpful. My reasons for not being enthused with MODS are NOT, as Cory suggests, “because it embodies the hierarchical document model of XML.” I’ve got nothing against a hierarchical document model, and I’ve got nothing against a “record” package based exchange format. I am suspicious of MODS because it STILL holds too closely to MARC; it’s basically just a slightly prettified MARC. It doesn’t allow one to do _very_ much more than MARC does, and it still makes it harder for us to conceptually separate our _transmission_ format from our data schemata and rules for entering data.

So the key thing here is that our _transmission format_ ought not to matter very much. If we can get down our element schemata in a formal and clear and flexible way, and we can provide our guidelines for entering data in a transmission-format-independent way, then the transmission format(s) are _easy_ after that. The schemata and the guidelines are the hard parts. Get those ducks in a row, and it’s no longer a very hard problem to create one, two, many transmission/exchange serializations that all work fine.
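A toy sketch of what I mean, with everything in it assumed for illustration: the element names, the “repeatable” rule, and the two wire formats are all made up, and a real element schema would be far richer. The schema and its rules are defined once, independent of any format, and each serialization is then just a few lines on top.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical element schema: names and repeatability rules are invented.
SCHEMA = {
    "title": {"repeatable": False},
    "creator": {"repeatable": True},
    "subject": {"repeatable": True},
}


def validate(record):
    """Check a record (element name -> list of strings) against SCHEMA."""
    for name, values in record.items():
        if name not in SCHEMA:
            raise ValueError(f"unknown element: {name}")
        if len(values) > 1 and not SCHEMA[name]["repeatable"]:
            raise ValueError(f"element not repeatable: {name}")
    return record


# Serialization 1: JSON.
def to_json(record):
    return json.dumps(validate(record), sort_keys=True)


def from_json(text):
    return validate(json.loads(text))


# Serialization 2: a flat XML record.
def to_xml(record):
    root = ET.Element("record")
    for name in sorted(validate(record)):
        for value in record[name]:
            ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="unicode")


def from_xml(text):
    record = {}
    for child in ET.fromstring(text):
        record.setdefault(child.tag, []).append(child.text or "")
    return validate(record)


example = {
    "title": ["An Example Title"],
    "creator": ["Author, First", "Author, Second"],
}

# Both formats round-trip to the same record, because neither one defines it.
assert from_json(to_json(example)) == example
assert from_xml(to_xml(example)) == example
```

All of the actual decisions live in `SCHEMA` and `validate`. Adding a third serialization would not touch either of them.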

Getting those ducks in a row is (at least from one perspective) the goal (or one of the goals) of RDA. I am not certain how well it has succeeded; I have some trepidation.

Thinking that the mere exchange/serialization format is important, that it will determine how we build metadata, is a symptom of getting confused about the role of a serialization format vs the role of a formally defined element vocabulary/schema. It’s the latter that’s hard; it’s the latter that needs to be flexible enough to handle the various realistic possibilities we see for how we manage metadata. The exchange/serialization format is not so important or hard. That MARC seems so important and central to ALL our metadata management is testament to how out-of-control MARC has grown, to be WAY more than just a data exchange format, which is a problem for data interoperability and machine use in our contemporary environment.
