Inventing how to think and talk about metadata: DCAM and RDF

So, there’s some discussion going on about DCAM and RDF, based on Stuart Weibel’s blog posts here, here, and here, with some supplementary commentary by Peter Murray.

One fundamental question: Do both RDF and DCAM do the same thing? (I’m not sure if I mean RDF alone here, or RDF plus a ‘suite’ of things specifically designed by the RDF development community to supplement RDF, like OWL, which I know nothing about!). Do they do different things? What is each ‘framework’ intended to do anyway? [Okay, I guess that wasn't just one question].

Stu Weibel suggests that they do not both do the same thing: there may be some overlap, but they are generally compatible and complementary. PeteJ (Pete Johnston, I think) seems to disagree, and thinks that RDF and DCAM have very similar roles, although he thinks that each does something the other doesn’t. But I’m still confused as to what that something is in his analysis, and whether that something matters.

It has occurred to me before that there’s an astounding amount of difficulty in putting this stuff into words and knowing what we’re talking about: in our mental models of metadata and metadata control, and in the words we use to talk about them. This makes sense. Metadata, and especially the frameworks we use to define metadata, is incredibly abstract, existing nowhere in the concrete world. The only way to talk about such a thing with precision is to have agreed-upon shared mental models and terms of art to describe them. And what people are trying to do is new enough that we’re still working that out, and aren’t all on the same page yet. In fact, I think 90% of the challenge in developing a metadata framework or control regime lies in developing a shared analysis and language for talking about the components of metadata control.

So, one thing to take from this is that this stuff is complicated to think and talk about, it is new and still being worked out, and reasonable people can disagree. So don’t feel stupid because you are confused, and don’t trust anyone who tells you they have the one true “right” answer. We’re still working on it. (Keep that in mind in debates over RDA and its DCAM integration; I am as skeptical of people who claim to be absolutely certain of the right way to do this as I am of people who think engagement with these questions is not necessary at all.)

Talking about, or talking around?

But it also strikes me that in that conversation on those blogs, the discussion was frustratingly short on specifics. Everyone was talking of DCAM and RDF without actually talking about them much at all. Like, mentioning actual properties or aspects of DCAM and RDF. To me the central question is, again: What things does DCAM intend to accomplish? What things does RDF intend to accomplish?

That is the prerequisite to answering: Are there things in one set that aren’t in the other? Even if so, are the two sets (of things intended to accomplish) actually rationally separated (‘complementary’ says Stu), or are they separate just as an accident of history, so that they should be merged into one framework that does all of it (in a well-designed modular way, of course)? If not (and the intended ‘things to accomplish’ of each are the same), are there reasons nonetheless for two ‘parallel’ frameworks (or, as I’d like to call them, ‘metadata control regimes’) to exist? I have to say this seems like a tough argument to make; on its face, I am skeptical of any argument based on “Well, DCAM represents one community’s outlook, while RDF is more general” (which maybe is what PeteJ was saying), which to me does not seem sufficient justification for maintaining two parallel regimes (with the hit on interoperability as well as inefficiency of labor).

Then, with what mechanisms does each regime accomplish these things? How well does each regime accomplish what it intends to accomplish? If they both do the same things but in different ways, what are the advantages and disadvantages of each way? (That might lead to justification for keeping them both, if both have advantages in different circumstances, I suppose. I still think that argument is an uphill battle.)

Developing a shared analysis and vocabulary for talking about metadata control

So how do we talk about these things in actual specifics, talk about DCAM and RDF instead of just around them? I think the start is being specific about what I’m calling “things to accomplish”. What this really means is defining a conceptual model for thinking about metadata and metadata frameworks (aka ‘control regimes’). What are the pieces? How do they relate?

Stu admirably begins to do that in an accessible way, but I think that very basic metadata model falls short. The “structure” category smells to me like the kind of “grab bag miscellaneous” category that sneaks into all of our schematic analyses, and when it gets too large means that there’s still analysis left to do to get rid of it. But it’s a start: you’ve got to have something to critique in order to critique. Another much more in-depth (and thus complicated and at the moment less accessible) attempt at a schematic for analyzing metadata and metadata control regimes into pieces can be found in “Towards an Interoperability Framework for Metadata Standards”, two of the authors of which are participating in this blog dialog (Mikael Nilsson and Pete Johnston). Look at Figure 3.7 in particular for a schematic overview. I confess I don’t entirely understand what’s there, and find some of the choices of terms of art for the various components troubling, although I found it useful when I first found it (at a now dead URL) for understanding metadata control, and developing a vocabulary to talk about it.

So, okay, it’s time to start using the vocabulary we are developing. Mikael and Pete, you have a sophisticated vocabulary you have developed in that paper for talking about components of metadata control: Are all of those components accomplished by RDF (and associated RDF pieces)? Or, as Stu suggests, are some of those components unique to DCAM and not RDF? How does that model compare to Stu’s model? I think many of those components need to be rescued from Stu’s grab bag of ‘structure’, but I can’t tell where they fit in the Nilsson et al model.

It is only by actually working out a shared analysis and vocabulary for the components of metadata control that we can actually talk about these things, instead of just around them.

Elements of a Model: A beginning, naive attempt at synthesis

My own jumble of components of metadata control (as I understand it; with my own understanding often confused and limited, and with all of this needing to be put into a coherent model of metadata control, right now they are just a confusing jumble) include, in no particular order:

[Two things which are both called 'data models' sometimes, even though they are very different things, so I'll try to avoid the word 'data model' at all.]

1. A domain information model: What are the elements of data in our domain? The DC 15 element set is a simple example. FRBR, which has multiple entities each with their own elements and relationships between them, is a more complex example. This is what Stu subsumes under “semantics” (I think there is probably more to semantics than this), and what I think Nilsson et al call an “element vocabulary”.

2. A metadata expression model. For instance, I would include the RDF idea of the subject-predicate-object atomic metadata statement. I think this is what Nilsson et al call an ‘abstract model’. (Not sure if I’m doing RDF’s ‘abstract model’ justice, and don’t even know how to begin with DCAM’s). I suppose it is also another kind of ‘semantic’ framework.
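
To make the subject-predicate-object idea a little more concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions, not RDF’s formal definition and not the API of any RDF library: the `Statement` type, the `values_of` helper, and the `http://example.org/book/1` identifier are all made up for the example. Only the Dublin Core element namespace is real.

```python
from typing import NamedTuple

# Dublin Core element namespace; the resource URI further down is made up.
DC = "http://purl.org/dc/elements/1.1/"


class Statement(NamedTuple):
    """One atomic metadata statement: subject, predicate, object."""
    subject: str    # the thing being described, named by a URI
    predicate: str  # the property being asserted, named by a URI
    obj: str        # the value: a literal string or another URI


# A "description" here is nothing more than a set of statements
# that happen to share a subject.
book = "http://example.org/book/1"  # hypothetical identifier
description = {
    Statement(book, DC + "title", "Moby-Dick"),
    Statement(book, DC + "creator", "Melville, Herman"),
    Statement(book, DC + "date", "1851"),
}


def values_of(statements, subject, predicate):
    """Return every object asserted for a subject/predicate pair."""
    return sorted(s.obj for s in statements
                  if s.subject == subject and s.predicate == predicate)


print(values_of(description, book, DC + "creator"))
```

Note what the sketch does not decide: how the statements get written to a file, where one record ends, or how many creators are allowed. Those belong to the other items in this list.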

Various kinds of serialization (what Weibel calls ‘syntax’) including but probably not limited to:

3. A record serialization format. Such as MARC, or MARCXML, or RDF-in-XML, or DC-in-XML (?). Weibel’s “syntax” (but I think we need to sub-divide syntax), and I’m not sure how this fits into Nilsson.

But then there are really sub-parts of this:

4. Content-value serialization. How is an individual data value fixed in a record? A date can be 20070219 or Feb 19, 2007, both representing the same content. An element from a controlled vocabulary can be recorded by a URI, a control number, or a human-readable ‘heading’. Weibel puts this under “structure”; I’m not sure how or if it fits into Nilsson et al. In a given overall record serialization, there can be any number of ways to represent a date, or any other class of value element, so I think this is a different level than above.

5. Numerous other aspects of serialization which should be broken out of the overall record serialization. A useful question to ask: Once we’ve decided RDF-XML, what _remains_ to decide in order for our applications to make sense of metadata? Weibel provides some useful examples under “structure”: “The boundaries of a set of assertions (what constitutes a record)”. “How is nesting managed?”. “Are metadata values specified by reference (URI) or by value (literal strings)?”. I’m not sure where this fits into Nilsson other than “application profile”, which is a confusing “grab bag” type category to me still. I’m not sure what facilities the RDF ‘suite’ has for controlling these things. I think Weibel is suggesting that it does not, but DCAM does. Most of these are essentially syntactic, even though Weibel puts them in ‘structure’.

6. Some of those things in Weibel’s structure are really semantic, not syntactic/serialized though. Whether nested information is possible is semantic (and perhaps part of what I’m calling the domain information model with regard to specific entities; or the metadata expression model in general), and a prior question to how to serialize it. Also included in this semantic “structure” category is specification of things such as “Cardinality – Can an element be repeated, and if so, is there a limit on the number?”–that’s really a semantic (domain information model and/or metadata expression model) question even before you get to serializing it in fixed medium.

7. Then related to all this are formal descriptions that allow _validation_ of a serialized fixed metadata expression. This can be both validation of syntax and semantics.

8. Content Guidance. From the library world, your instructions for how to determine a given value from the domain information model. Library cataloging has always been big on this, but the modern metadata world often ignores it, apparently assuming this kind of guidance is unnecessary: metadata creators can do whatever they please and it will work out. In some cases that’s certainly true. But in the library world, we have traditionally found it highly desirable to try to have some consistency here. Okay, our “domain information model” says we have an “author’s name”. From an item in hand, how do you determine the author’s name if it’s unclear, to make it likely two people will do it the same way? How do you decide which LCSH to apply, to make it predictable? This is part of what AACR2 does, and RDA intends to do all of this(?) and only(!) this.

9. Value vocabularies, in Nilsson et al’s unobjectionable formulation. I.e., controlled vocabularies. LCSH, LCC, Dewey, AAT thesaurus, a formalized gazetteer, the AACR2 “General Materials Designation” allowed values, MARC relator codes, etc. etc. Use of a value vocabulary can come with ‘content guidance’, or contextual semantic restrictions (no more than N values may be applied), as well as a whole bunch of contextual serialization/syntactic instructions/requirements for a given application.
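
Two of the items above, content-value serialization (4) and cardinality checked by validation (6 and 7), can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not how RDF, DCAM, or any existing validator actually does it: the format names, the `CARDINALITY` rules, and the sample record are all hypothetical.

```python
from datetime import datetime

# Two hypothetical content-value serializations of the same date.
DATE_FORMATS = {
    "basic": "%Y%m%d",       # e.g. 20070219
    "display": "%b %d, %Y",  # e.g. Feb 19, 2007 (English month names assumed)
}


def parse_date(text, style):
    """Turn one serialized form back into the underlying date value."""
    return datetime.strptime(text, DATE_FORMATS[style]).date()


# Hypothetical cardinality rules: element -> (minimum, maximum).
# A maximum of None means "repeatable without limit".
CARDINALITY = {
    "title": (1, 1),
    "creator": (0, None),
    "date": (0, 1),
}


def check_cardinality(record, rules):
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for element, (lowest, highest) in rules.items():
        count = len(record.get(element, []))
        if count < lowest:
            problems.append(
                f"{element}: expected at least {lowest}, found {count}")
        if highest is not None and count > highest:
            problems.append(
                f"{element}: expected at most {highest}, found {count}")
    return problems


# Made-up record with one date too many.
record = {
    "title": ["Moby-Dick"],
    "creator": ["Melville, Herman"],
    "date": ["1851", "1852"],
}

same_content = parse_date("20070219", "basic") == parse_date("Feb 19, 2007", "display")
print(same_content)
print(check_cardinality(record, CARDINALITY))
```

The point of keeping the two pieces separate is that the cardinality rule says nothing about how a date is written down, and the date formats say nothing about how many dates are allowed, which is the distinction between the semantic and the serialized levels that the list is trying to draw.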

There’s probably still more I’m missing.

Back to RDF and DCAM

Which of these things do RDF and its related suite provide for? (And, to be clear, some of these things are things you expect the metadata framework itself to provide for; others are provided externally and the metadata framework needs only to specify what external tool is being used; but a big part of a metadata control regime is a standard way to specify that).

I have no idea! I don’t understand either RDF (and suite) or DCAM enough to say! Ordinarily, I’d read documentation to figure these things out (books, web pages), or ask others who are experts in these things to tell me. But I’ve been unsuccessful figuring this out on my own (finding insufficient or confusing and contradictory documentation), and when the experts start engaging with these questions, to me as a naive observer it’s almost as if they are all using the same words to mean entirely different things. I’m never sure exactly what they’re talking about.

This entry was posted in cataloging, Theory.

3 Responses to Inventing how to think and talk about metadata: DCAM and RDF

  1. jaf says:

    Jonathan –

    It seems strange that you discount the “syntax, structure, semantics” breakdown – to me, that’s pretty much metadata 101. Stu’s not the only (nor first) person to talk about metadata in that framework, as can be found at this guy’s website -http://www.md3.org/ (looks like the project is somewhat old and not up to date….oh wait, that’s my old project……) :-)

  2. You seem to be going ‘back to basics’, Jonathan, so here’s an even more basic question: what is a data model and why should we want one?

    Stuart Weibel says “structure is the specification of the details necessary to layout and declare metadata assertions so they can be embedded unambiguously in a syntax. A data model is the specification of this structure.” Do you understand this? I’m afraid I don’t.

  3. Thomas Baker says:

    One short answer to why one needs a data model at all:

    “Metadata is (a form of) language.”

    If one accepts that, then in order to be linguistically meaningful, metadata needs a grammar. As data models, the RDF abstract syntax and DCAM provide that sort of grammar by specifying how meaningful “statements” can be constructed – statements that are “meaningful” in an environment where statements are increasingly being parsed and combined automatically.
