On form/format/media/content/carrier and RDA

In response to a long-running thread on RDA-L.

It turns out form/format/media/content/carrier are really complicated to understand.

It turns out that most people’s internal mental models for these things in fact aren’t internally consistent at all (even librarians’). Which is kind of why AACR2/MARC turned into the mess it is around this stuff. All those slashes in “form/format/content/carrier” are there because the naive mental model isn’t even clear about what the distinct qualities/aspects/axes under consideration are; it’s a big overlapping muddle.

Which is fine if all you want to do is display exactly what’s entered in the record, but a huge challenge if you want software to take action based on form/format/content/carrier, including action like localized display labels or filters.

So RDA (and Karen Coyle’s efforts) try to make an internally consistent taxonomy for these qualities (content type; media type; carrier type), that can be actionable by software, including mapping to display labels that make sense for a given collection/user group at a given point in history.

A lot of work actually went into the taxonomy used by RDA. I think there was even a report by a working group, that did lots of analysis and evaluation of prior work, but I can’t find it now (was it FRBR-related?). If we could find it, it might shed some light on things for people trying to figure out what’s going on.

Which doesn’t mean what they came up with is perfect, but probably means you can’t easily do better without a LOT more work.

Still, what’s there is confusing. If there is an ‘impedance mismatch’ between what’s in RDA instructions and what ends up in MARC, as I think Karen is suggesting, that certainly makes it even more confusing.

The rationalization of taxonomies for these qualities is potentially one of the biggest benefits that RDA can bring — but it’s also one of the most confusing, and misunderstood. The work that Thomas Brenndorfer and Karen Coyle have done to understand and explain it is invaluable.

But this is potentially something the JSC should prioritize, by actually funding (or finding a grant for) someone (Karen or Thomas?) to work on improving the situation. It’s a big enough issue that it might take some concerted and funded effort to fix, and it has a large impact on the ultimate success of RDA.

Efforts that might help

Here are things that I think can make RDA’s approach to these issues more understandable; more likely to be implemented; and implementations more likely to be successful:

1) Provide a good overview explanation of what the point and principles of RDA’s content/carrier/form vocabularies are. Provide a link to the research paper that established them (if I’m not misremembering the existence of such a paper), and a summary of it. Provide some examples of things that we’d want software to do but that were infeasible under the AACR2/MARC muddle, justifying the rationalization of the vocabularies that RDA does.

2) Make small tweaks to the RDA vocabularies, if deemed necessary based on the actual experience of trying to work with them post publication of RDA. Doing a complete overhaul (for instance changing the taxonomic categories) is not going to be feasible on any reasonable timeline, but small tweaks based on experience since publication may be called for. (Either for errors in the original, or just things we’ve learned by experimentation since publication).

3) Analyze the serialization of the content/carrier/form vocabularies in MARC, and make recommendations to MARBI for improvements on what’s already there. Again, MARC21 has been changed to accommodate these, but the changes may not have been quite right or sufficient; there may be additional tweaks that are now needed based on further analysis/experience. Lessen the ‘impedance mismatch’, so you don’t need to translate from RDA to MARC when entering in MARC; make sure no important information is lost.

(In particular, I worry that when there are multiple 336/337/338 sets in MARC currently, there is no good way to tell which 336 goes with which 337 goes with which 338. This is a huge loss of information, which reduces the actual benefit of this rationalized encoding while still leaving a large cost-of-switch!)

(I also worry that the liberal allowance for free-text entry for content-type/media-type/carrier-type will ALSO limit the benefit of this switch while still leaving a large cost-of-switch. The point of rationalizing these vocabularies is to allow software actionability, but you don’t get that with arbitrary non-controlled free text entries.)

4) Provide a recommended basic starting algorithm for translation from RDA’s content-type/media-type/carrier-type to user-display strings, that, as a base level starting point, is more-or-less consistent with AACR2. This is a non-trivial thing to figure out for oneself. If RDA can provide a basic suggested algorithm, this will make it more likely for people (such as ILS vendors) to implement appropriately, and make it easier for everyone to understand what’s going on with the content-type/media-type/carrier-type vocabularies and their intent. It should be clear that this algorithm is not a requirement or even a standard, it’s just one starting point RDA provides to ease implementation.
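
To make #4 a bit more concrete, here is a rough sketch of what such a starting algorithm might look like. It is only an illustration, not anything RDA or the JSC has published: the rule table, the display labels, and the names `Categorization`, `RULES`, and `display_label` are all made up for this example. The content/media/carrier terms are RDA vocabulary terms, and the labels are only loosely in the spirit of AACR2-era displays. Note that a free-text value that isn't in the controlled vocabulary simply falls through to the default, which is exactly the loss of actionability worried about above.

```python
from typing import NamedTuple, Optional

class Categorization(NamedTuple):
    """One content-type/media-type/carrier-type set (hypothetical structure)."""
    content: str
    media: str
    carrier: str

# Hypothetical rule table, most specific rules first.
# None acts as a wildcard for that position. The labels are illustrative
# local display choices, not part of RDA or AACR2.
RULES = [
    (("performed music", "audio", None), "Music recording"),
    (("spoken word", "audio", None), "Audiobook"),
    (("two-dimensional moving image", "video", None), "Video"),
    (("text", "computer", "online resource"), "E-book"),
    (("text", "unmediated", "volume"), "Book"),
    (("text", "microform", None), "Microform"),
    ((None, "computer", None), "Electronic resource"),
]

def display_label(cat: Categorization, default: str = "Other") -> str:
    """Return the label of the first rule matching all three terms."""
    key = tuple(term.strip().lower() for term in cat)
    for pattern, label in RULES:
        if all(p is None or p == k for p, k in zip(pattern, key)):
            return label
    # Uncontrolled or unanticipated terms end up here.
    return default

print(display_label(Categorization("text", "unmediated", "volume")))
print(display_label(Categorization("Performed music", "audio", "audio disc")))
print(display_label(Categorization("book-ish thing", "paper", "codex")))
```

A library could swap in its own table without touching the matching logic, which is the sense in which the labels can be localized for a given collection or user group.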


7 Responses to On form/format/media/content/carrier and RDA

  1. Dan Scott says:

    In response to a concern you expressed in proposed effort #3, fields 336/337/338 can be reliably linked through the standard mechanism in MARC of subfield $8 – the “field link and sequence number” (http://www.loc.gov/marc/bibliographic/ecbdcntf.html), if I understand the mechanism correctly.

    Simple example; in the following fields, the first pair of 336 & 337 are link pair #1 (expressed by $81), and the second pair of 336 & 337 are link pair #2 (expressed by $82).

    336 ## $aperformed music $2rdacontent $81
    337 ## $aaudio $2rdamedia $81
    336 ## $atwo-dimensional moving image $btdi $2rdacontent $82
    337 ## $avideo $bv $2rdamedia $82

  2. Esther Arens says:

    Good summary of what’s needed. Totally agree!

  3. Kathleen F. Lamantia, Technical Services Librarian says:

    This is excellent. Thank you very much for summarizing and for making concrete, coherent suggestions.

    We plan to just delete the 336-338 fields. They will do nothing but confuse our patrons. We also plan to just continue to use 245|h and to enter those gmds which are long-established at our library. The Mat Type field (30) in Millennium produces icons in our catalog, so that our patrons can easily tell which type of material they are viewing. The 245|h does that for our staff.

    Even if our materials are searched in the wider net universe (interoperability and all that) we feel that the descriptive terms (245|h) and the Mat type indicators will provide any information needed to those who might come upon our catalog entries.

    So we are all set, don’t need the 336-338s and don’t plan to import them.

  4. jrochkind says:

    I think that’s unfortunate, Kathleen. Providing an internally consistent and coherent approach to form/format/media/content is one of the biggest advances RDA can help us make. The current AACR2-based approach you are sticking with makes it infeasible to have much software action based on form/format/media/content, because the data is so muddled. It’s fine for simply displaying strings to users (“provide information needed to those who come across the record”), but not for actually taking software action, like filtering or partitioning or grouping or ranking. And in fact, form/format/media/content sorts of attributes are exactly the qualities software needs to act on in order to provide features users commonly want.

    Of course, if your software isn’t going to provide those features anyway, I guess it doesn’t matter. And the bigger problem is when your software is incapable of doing anything with 336-338 _but_ displaying them “as is” to users. Indeed this is not the appropriate thing to do with them, which is why my #4 recommendation above is perhaps the most important.

    But if everyone ignores 336-338, then the data doesn’t exist for software to be written to take advantage of. It’s a vicious circle that we are trapped in with our cooperative cataloging. Software can’t do sophisticated things users want because the data doesn’t support those things; nobody will write data that supports those things, because the software can’t deal with it appropriately.

    I would like to propose a solution to this conundrum, but I think we may simply be out of time, out of luck, and the cooperative cataloging endeavor doomed to pretty much become irrelevant and fail, never able to provide records appropriate to 21st century needs. It is a sad thing.

  5. Irvin Flack says:

    Hi Jonathan
    Is this the paper you were looking for?
    RDA/ONIX Framework for Resource Categorization
    From the Background section: ‘In the course of discussions held in October 2005 between the Joint Steering Committee for Revision of AACR and representatives of the publishing industry in the UK, the two groups identified resource categorization as an area of mutual interest and one in which there is substantial potential benefit to be gained through cooperation. A proposal for a joint initiative was subsequently approved and funded by the organizations sponsoring the development of RDA and ONIX, with additional support from the British Library.’

  6. Kelley McGrath says:

    Belated comment on subfield 8. This is more of a theoretical solution than a practical one since I know of no system that makes them easy to enter in bib records (unlike holdings) nor any system that is capable of making use of them once they’re entered. In fact, someone told me that OCLC’s Connexion won’t even take $8 (won’t validate). With the coming disaggregation of music “subject headings,” there’s interest again in $8 and maybe it will actually get to the point of implementation.

  7. LadyJane Hickey, Coordinator of Bibliographic Services Librarian says:

    I’ve seen subfield 8 used many times at the beginning of a line, as the first subfield. It is used that way in MARC Holdings and also in linking entries for foreign language fields that are being linked to related fields with a transliterated text. But, I’ve always seen them as the first subfield, before subfield a.
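
For readers following the subfield $8 discussion in the responses above, here is a minimal sketch of how software might regroup 336/337/338 fields by their $8 linking number. It assumes a made-up in-memory representation (a list of `(tag, subfields)` tuples) rather than any real MARC library, and the names `FIELDS` and `group_by_link` are hypothetical. The sample data is the example from the first response. The sketch reads only the leading linking number of $8 (the part before any `.` sequence number or `\` link type), and it does not care whether $8 comes first or last in the field.

```python
from collections import defaultdict

# The four fields from the example in the first response, as
# (tag, [(subfield code, value), ...]) tuples. Hypothetical representation.
FIELDS = [
    ("336", [("a", "performed music"), ("2", "rdacontent"), ("8", "1")]),
    ("337", [("a", "audio"), ("2", "rdamedia"), ("8", "1")]),
    ("336", [("a", "two-dimensional moving image"), ("b", "tdi"),
             ("2", "rdacontent"), ("8", "2")]),
    ("337", [("a", "video"), ("b", "v"), ("2", "rdamedia"), ("8", "2")]),
]

def group_by_link(fields):
    """Group 336/337/338 $a terms by the linking number found in $8."""
    groups = defaultdict(dict)
    for tag, subfields in fields:
        if tag not in ("336", "337", "338"):
            continue
        # First $a term in the field, or None if there is none.
        term = next((v for c, v in subfields if c == "a"), None)
        for code, value in subfields:
            if code == "8":
                # Keep only the leading linking number of the $8 value.
                link = value.split("\\")[0].split(".")[0]
                groups[link][tag] = term
    # Fields with no $8 are skipped in this sketch: without a link,
    # there is no reliable way to say which set they belong to.
    return dict(groups)

for link, members in sorted(group_by_link(FIELDS).items()):
    print(link, members)
```

Whether cataloging clients and ILS software actually accept and act on $8 is a separate question, as the later responses point out.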
