two kinds of data

The discussion on the RDA listserv regarding ‘kits’ and the RDA content/carrier/genre (i.e., form/format) vocabularies is hitting on some interesting questions, both pragmatic and theoretical.

I suggested in one post that the RDA-intended MARC 336, 337, and 338 should not be treated as fields to be displayed (or indexed) verbatim for the user. They should instead be treated as a controlled vocabulary that the system transforms before display, and that it draws on when deciding what limits or collocating filters to provide.

I think if we consider them such, it makes a lot more sense. And the reason they’ve ended up such is that actual ordinary people’s conceptions of format/form/genre/type-of-thing turn out to be inherently sloppy and inconsistent, even at times mutually contradictory. So you can try to encode form/format/carrier/etc. in a way that matches users’ conceptions, and end up with something internally inconsistent, in which case your only choice ends up being displaying/indexing the values verbatim. Or you can come up with your own internally consistent ontology for this stuff, one that captures what you think the significant dimensions of form/format/carrier are in a consistent and rational way, and then make it possible for the system to transform those values in various ways for display and indexing in different contexts, different collections, different user communities, different times.

I think the RDA-intended 336, 337, 338 are best understood as the latter, and I think they’ve actually done a pretty good job at creating an internally consistent ontology to capture format/form categories.
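To make the "transform before display" idea concrete, here is a minimal sketch in Python. The 336/337/338 terms are RDA vocabulary terms, but the display labels, the context names, and the `display_label` function are all hypothetical, invented for illustration; a real system would have its own rules.

```python
# Hypothetical display rules: (content, media, carrier) -> user-facing label.
# The labels and context names are assumptions, not part of RDA or MARC.
DISPLAY_LABELS = {
    "public": {
        ("text", "unmediated", "volume"): "Book",
        ("performed music", "audio", "audio disc"): "Music CD",
        ("text", "computer", "online resource"): "E-book",
    },
    "music_library": {
        ("performed music", "audio", "audio disc"): "Sound recording (disc)",
    },
}


def display_label(record, context="public"):
    """Turn the controlled 336/337/338 terms into a label for one context."""
    key = (record["336"], record["337"], record["338"])
    # Prefer a context-specific label, then the general one.
    for ctx in (context, "public"):
        label = DISPLAY_LABELS.get(ctx, {}).get(key)
        if label is not None:
            return label
    # Last resort: fall back to the raw carrier term.
    return record["338"]


cd = {"336": "performed music", "337": "audio", "338": "audio disc"}
print(display_label(cd))                   # general-audience label
print(display_label(cd, "music_library"))  # label for a specialist collection
```

The same stored values yield different labels in different collections, which is only possible because the stored values are consistent and were never meant to be echoed as-is.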

J. Mac Elrod responded by saying, if I understand him right, that it makes no sense to consider the 336, 337, 338 in that way, because they are not “fixed fields”, and “Isn’t this why MARC has fixed fields and variable fields? So that the fixed fields make sense to the computer, and variable ones to people?”

I disagree that this makes sense as a distinction in our data.

There’s no reason to restrict coded/controlled values to ‘fixed fields’. I’m not entirely sure what we consider ‘fixed fields’: are the coded values in, e.g., an 041 considered a fixed field? Those aren’t of ‘fixed’ byte size like, say, the 006 or 008, but they are controlled values from a finite vocabulary and not meant for verbatim echoing to the user.

I suggest we are best served by considering the new 3xx’s to be in the same category.

Now, instead of saying ‘fixed fields are meant for the computer, variable fields for direct display and/or indexing’, I think what probably does make sense is to distinguish between transcribed fields and fields whose values are taken from a finite controlled vocabulary.

Transcribed fields are meant for direct display and/or indexing; fields whose values are taken from a finite controlled vocabulary are generally intended for computer use. Sometimes the finite controlled vocabulary is created to make sense when shown directly to the user, for instance LCSH; other times it is not. Even in those cases where it is intended to make sense shown directly to the user (LCSH), it may make sense to show the user an alternate representation (for instance a non-English language translation of LCSH), and that’s what fields whose values are taken from a finite controlled vocabulary make possible, if done right. They also make possible collocation of sets of items under certain controlled terms, and more complicated set-arithmetical combinations of such sets, in ways transcribed fields do not.
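As a rough illustration of that collocation and set arithmetic, here is a small Python sketch. The record ids, the simplified field names, and the `build_index` helper are hypothetical; the point is only that controlled terms can serve as exact keys for sets of records.

```python
from collections import defaultdict

# Toy records with controlled terms (ids and structure are made up).
records = {
    "r1": {"content": "text", "carrier": "volume"},
    "r2": {"content": "text", "carrier": "online resource"},
    "r3": {"content": "performed music", "carrier": "audio disc"},
    "r4": {"content": "performed music", "carrier": "online resource"},
}


def build_index(records):
    """Collocate record ids under each (field, controlled term) pair."""
    index = defaultdict(set)
    for record_id, fields in records.items():
        for field, term in fields.items():
            index[(field, term)].add(record_id)
    return index


index = build_index(records)
online = index[("carrier", "online resource")]
text = index[("content", "text")]

online_text = online & text       # intersection: online AND text
online_non_text = online - text   # difference: online but NOT text
text_or_online = online | text    # union: either one

print(sorted(online_text), sorted(online_non_text), sorted(text_or_online))
```

With transcribed values, small variations in wording would scatter the same kind of item across many keys, and these combinations would stop being reliable.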

Now, there seems to be some confusion about those new 3xx’s, even in the minds of their inventors, as to which category they fall in. I suggest that if we consider them to fall in the category of values from a finite controlled vocabulary not meant for direct display to the user, then everything ends up making a LOT more sense. However, that is contradicted by the idea of allowing free text entry in those fields when none of the controlled terms are deemed suitable, as is apparently allowed, and is something that’s always smelled fishy to me precisely because it ruins the use of these fields as finite controlled values for computer use.

However, this contradiction could be mitigated if you did NOT use a $2 rdamedia for such free text entries. Ideally, you’d provide an $a with the best-fit value from the established RDA vocabularies with a $2 rdamedia, AND you’d provide your free text invented “better fit” with a different or no $2. But again here we are fighting with MARC, as MARC gives us no way to enter two values like this, and what avenues it does offer will make the unsolved problem of relating 336-337-338 even harder.
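If the data did carry both values, a system could tell them apart by the $2 alone. The following Python sketch assumes, purely for illustration, that the two values arrive as separate field entries represented as plain dicts; that representation, the `split_values` function, and the free text value "boxed kit" are all hypothetical, and this does not solve the problem of relating the 336, 337, and 338 to one another.

```python
# Source codes expected in $2 for each RDA vocabulary.
RDA_SOURCES = {"336": "rdacontent", "337": "rdamedia", "338": "rdacarrier"}


def split_values(fields):
    """Separate controlled RDA terms from free text, using only $2."""
    controlled, free_text = [], []
    for field in fields:
        expected_source = RDA_SOURCES.get(field["tag"])
        if expected_source is None:
            continue  # not a 336/337/338, ignore it
        if field.get("2") == expected_source:
            controlled.append((field["tag"], field["a"]))
        else:
            # A different $2, or none at all: treat as free text.
            free_text.append((field["tag"], field["a"]))
    return controlled, free_text


fields = [
    {"tag": "337", "a": "unmediated", "2": "rdamedia"},  # best-fit controlled term
    {"tag": "337", "a": "boxed kit"},                    # hypothetical free text, no $2
]
controlled, free_text = split_values(fields)
print(controlled)  # usable for limits, filters, and collocation
print(free_text)   # usable only for display or keyword indexing
```

The controlled list stays clean for computer use, while the free text is kept out of it instead of polluting the vocabulary.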

