The Purposes of ‘Subject’ Vocabularies

LCSH, LCC, DDC, Ulrich’s subject headings, BISAC, Ranganathan’s Colon Classification, Bliss Classification (2), Amazon’s subject headings: All are examples of ‘subject’ controlled vocabulary.

I put ‘subject’ in quotes because in reality most, if not all, of these examples include terms to capture ‘aboutness’ as well as terms to capture discipline (i.e., perspective) and genre (and in some cases form, format, and intended audience). (Yes, Dewey sometimes captures ‘aboutness’ and LCSH sometimes captures disciplinary perspective. Take a look.)

I have been interested for a while in exploring the purposes of these types of vocabulary. I think they are not as clear and simple as we might be used to assuming. I wrote a (too) long paper about it in library school, which I’ll attach here. I actually wrote this before I had seen NCSU’s Endeca implementation; I’d have written it differently after, but I think this discussion is very relevant to understanding effective use of controlled vocabularies in faceted navigation. Recent discussion on NGC4Lib regarding these types of vocabularies further emphasizes, to me, the importance of considering the functions.

In my paper, I argue that when we look at these vocabularies from the perspective of function or purpose, the traditional line between ‘classification’ and ‘subject vocabulary’ isn’t actually that clear. Instead we have a number of purposes (not just two), each of which a given vocabulary may serve better or worse.

The paper is awfully long, so I’ll also now summarize my suggestion as to an initial draft taxonomy of functions. (These functions admittedly overlap in some ways, but I still think ) (The next step, determining which features of a vocabulary fit which functions or purposes, is only touched upon in the paper.)

1. Class Retrieval: Identifying a class (or term) that matches your interest, and then assembling all documents that have this term assigned.

2. Browsing: A somewhat vague notion, but some sort of exploratory or investigatory, probably iterative, interaction with a corpus.

3. Relationship Navigation: Over 100 years ago, Cutter wrote that “one can move up or down a set of records that are displayed in call number order, to broaden or narrow a search.” Of course, using call number order is just one way of accomplishing this function, and hierarchical relationships of broadening and narrowing are also just one kind of relationship that can be followed.

4. Identification: The Identification function is served by listing assigned class or term information on a record so that the user knows more about the nature of the document indicated. For instance, traditionally, subject tracings on library cards served this function.

5. Locating: This is the traditional function of traditional classification as a ‘shelf location device’: A means to identify exactly where to find a given known document.

6. Ordering: Providing one or more useful arranged sequences of documents. This may mean the ordering of a retrieval set online, or of a catalog, and we also include here the traditional physical shelf ordering function of classification.

7. Surveying: Knowledge organization systems can be used to allow the user to get a general overview of what exists in a corpus.

8. Dealing with a large result set: See the way recent ‘faceted’ interfaces allow large result sets to become more manageable by faceting on ‘subject’ vocabulary.

9. Keyword match enhancement: When a user executes a simple keyword search, English words from classification schedules or thesauri attached to bibliographic records will increase the recall of the result set, if those words were not otherwise found in metadata or other indexed content.

10. Negotiation: “to give aid to the searcher in his choice of search terms” (Vickery). To help the user better understand what she is looking for, and to put her information need into words by seeing what the options are.
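
To make a few of these functions concrete, here is a minimal sketch of Class Retrieval (1), faceting a result set (8), and Keyword match enhancement (9) over a toy set of records. Everything in it is an assumption made for illustration: the record layout, the titles, the assigned terms, and the function names are all hypothetical, and real catalog software will do each of these things quite differently.

```python
from collections import Counter

# Hypothetical toy records; titles and assigned terms are invented for illustration.
RECORDS = [
    {"id": 1, "title": "Raising Backyard Chickens", "subjects": ["Poultry", "Agriculture"]},
    {"id": 2, "title": "The Egg Industry", "subjects": ["Poultry", "Economics"]},
    {"id": 3, "title": "Farm Subsidies Explained", "subjects": ["Agriculture", "Economics"]},
]


def class_retrieval(records, term):
    """Function 1: assemble every record that has the given term assigned."""
    return [r for r in records if term in r["subjects"]]


def facet_counts(result_set):
    """Function 8: count the terms assigned across a result set, to offer as facets."""
    return Counter(t for r in result_set for t in r["subjects"])


def keyword_search(records, word, include_subjects=True):
    """Function 9: whole-word keyword match on titles, optionally also on assigned terms."""
    word = word.lower()
    hits = []
    for r in records:
        text = r["title"].lower()
        if include_subjects:
            # Indexing the vocabulary terms alongside the title is what raises recall.
            text += " " + " ".join(r["subjects"]).lower()
        if word in text.split():
            hits.append(r["id"])
    return hits
```

In this toy data the word "poultry" appears in no title, so a search with include_subjects=False finds nothing, while the same search with the assigned terms indexed finds records 1 and 2. That is the recall gain described in function 9, in miniature.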

[Questions for the reader: Which of these can ‘folksonomy’ tagging conceivably do? Only a certain kind of tagging? Which of these require a compact notation? Which of these require hierarchical, associative or other relations between terms/classes? How do our traditional library vocabularies do in practice for each of these? What would make them work better? What features make them work as well as they do? Which of these functions are more important to our users? How do we know? Which of these functions do our traditional OPACs serve? The addition of faceted navigation?]

  1. Irvin Flack says:

    My hero in this area is AC Foskett (Subject Approach to Information). He doesn’t draw any sharp distinction between subject headings and classification, rather between pre- and post-coordinate ‘indexing languages’ (it’s all indexing, dude!). LCSH and DDC both fall under the former category but differ in their arrangement: alphabetical vs classified. Also, LCSH can be seen as a glorified index to the LCC (cf Dewey’s ‘relative index’).

    One point that came home to me strongly on re-reading him recently was that the indexing languages that survived in the 20th c were not necessarily the best (the race is not to the swift!) but the ones with the strongest maintenance systems behind them. So intellectually impressive efforts like the UDC and Ranganathan’s CC withered while the well-backed LCC, LCSH and DDC triumphed.

    But there has been a dramatic climate change in this “post-meteor strike” digital age. I think the time of the big lumbering indexing languages might be passing and those little scurrying furry taxonomies and folksonomies at our feet may inherit the earth.

