Two meanings of “Identifier”

Two meanings of identifier = Two Functions of ‘Identifiers’ => Fulfilled with only one mechanism in traditional Anglo-American cataloging.

And the harm this causes.

This is an essay I recently posted to the RDA list and the FRBR list, in response to an essay posted by Martha Yee. I think my essay can stand on its own to some extent. I think it starts to get at some of the failure to communicate that’s been occurring in certain cataloging discussions….

Karen beat me to it, but I’d been drafting this response to Martha Yee for a couple days. Sorry it’s so long, but I’ve learned not to assume that readers share the same assumptions as me, so explicit clarity is required.

Reading some of Martha Yee’s articles in library school was invaluable to developing what understanding of bibliographic control I have. I think Martha has, particularly for this discussion, made a very valuable contribution by frequently reminding us that our ‘primary access points’ are best thought of as ‘work identifiers’ and ‘author identifiers’.

I think Martha’s recent comments on FRAD are a good start to important dialog, but I entirely disagree with some of her conclusions.

I think the best way to get at this disagreement is to talk about what I see as two functions of our traditional primary access points (I think Martha rightly refers to these ‘primary access points’ as ‘identifiers’, as I’ve discussed in a different way in a post on the RDA-L list, 14 Feb 2007).

Two Functions/Meanings of ‘Identifiers’

identifier(1): To serve as a unique ‘key’ pointing to a foreign entity/record. The fact that the exact string ‘Burroughs, William S., 1914-1997.’ (with no variation whatsoever, or else the system is harmed) appears in a certain place in multiple records allows our systems to unequivocally tie these records to a particular authority record, and thus to know also that these are all written by the same particular person. If every record written by this person uses this exact uniform string, all can be correctly associated. Use of identifier(1) is mainly as a means of recording/encoding a _relationship between things_.

[When the metadata community says ‘identifier’, they almost exclusively mean this identifier(1) only, but when the library community talks about ‘identifier’, they often focus more on….]

identifier(2): To serve as a _label_ to provide on an interface to the user (a card served as an interface to our systems just as software on a monitor now does), to allow the person reading the interface to recognize and identify that person, or a work by that person, etc.

Our ‘primary access point’ identifiers served both these purposes—in the print world by being a label on a card (or other surrogate or item) [identifier(2)], as well as by providing collocation when items or surrogates were filed by that ‘primary access point’ [identifier(1)]. In the computer world—as a label, as a means of filing in a ‘browse’ display, or occasionally by actually collocating for the user behind-the-scenes by matching on that exact string only, once a ‘primary access point’ textual identifier is chosen (this latter becomes the most useful and important mechanism in the digital environment, although our systems don’t always act like it currently).

But these are really two distinct functions that we tried to serve with one device/mechanism, and these functions or purposes don’t always live well together. What in the pre-computer world was a useful economy of serving two purposes with one mechanism is a liability in the digital world. One simple example to illustrate the harm and limitation of tying these two functions into one device is our international world, with its multiple scripts.

Any record by Burroughs throughout the world ought to have the same identifier(1), so that these records can exist in as large a shared cataloging environment as possible. But in a locale with non-roman script (say China), the _label_ shown to the user to represent this person, the identifier(2), ought to be shown in the local non-roman ‘vernacular’ script when the data is available. So different displays in different contexts need different identifier(2)s, but it is highly desirable for our overall system if we all share the same identifier(1) regardless. There is no need for an identifier(1) to _mean_ anything; it can be a meaningless number, so long as it is used consistently to establish a relationship.
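To make the separation concrete, here is a minimal sketch in Python of what keeping the two functions apart might look like. Everything in it is an assumption made for illustration: the `Person` and `Work` classes, the key `person:0001`, the script codes used as dictionary keys, and the Chinese label (an illustrative transliteration, not an authorized form). It is not a description of how any existing library system stores its data.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Person:
    key: str  # identifier(1): opaque, means nothing, never needs to be shown
    labels: Dict[str, str] = field(default_factory=dict)  # identifier(2)s by script

    def label_for(self, script: str, fallback: str = "Latn") -> str:
        """Return the display label for a script, falling back if it is missing."""
        return self.labels.get(script, self.labels[fallback])


@dataclass
class Work:
    title: str
    author_key: str  # the relationship is recorded with identifier(1) only


# Hypothetical key and sample labels, for illustration only.
burroughs = Person(
    key="person:0001",
    labels={
        "Latn": "Burroughs, William S., 1914-1997.",
        "Hans": "威廉·巴勒斯",  # illustrative transliteration, not an authorized form
    },
)
people = {burroughs.key: burroughs}
naked_lunch = Work(title="Naked Lunch", author_key="person:0001")


def display_author(work: Work, script: str) -> str:
    """Follow identifier(1) to the Person, then pick an identifier(2) to show."""
    return people[work.author_key].label_for(script)
```

Calling `display_author(naked_lunch, "Hans")` and `display_author(naked_lunch, "Latn")` returns two different labels for the same Person, while the Work record itself only ever carries the one shared key.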

The particular example of different scripts is just a useful example meant to illustrate a principle that is important generally: separating the identifier(1) and identifier(2) functions. These two different kinds of ‘identifier’ can inform the following discussion.

Author as Important Property of Work

So as to Martha’s point 1: The author of a work is indeed an important property of a work. An important property of ‘Naked Lunch’ is that it was written by a particular person, the famous William S. Burroughs. That FRAD treats this as a relationship does not mean that FRAD says it is not an important property.

But the fact that his first name is written “William”, middle ‘S.’ and his last name “Burroughs” is not an attribute (or property at all) of the work; it is a property of the historical person. As is the fact that he was born in 1914 and died in 1997. The fact that the ‘traditional library system’ uses a textual author identifier[1 and 2] representing this person that’s written exactly as “Burroughs, William S., 1914-1997.” is in fact also an attribute of the person himself, the instance of the Person entity. As is the fact that his name is written however it might be written in Chinese script. And some other hypothetical string used as an author identifier in some other country’s code/practice is also an attribute of that Person.

That the Work representing Burroughs’s _Naked Lunch_ is written by that particular Person must be encoded in our records somewhere using an identifier(1). That is in fact the function of identifier(1)s: to record _relationships_ in an unambiguous way.

That doesn’t mean that knowing the author of a work isn’t important to properly understanding what work one is looking at. And, of course, in order to give the reader this understanding, a label must be provided displaying this information, by way of an identifier(2). If that string is stored in the record for the Person, that doesn’t mean it can’t be displayed in a listing of works too, so long as the works are properly related to their authors. Does that string serving as an identifier(2) always need to be exactly the same in all contexts? Well, we have at least one obvious ‘no’ answer: for non-roman scripts.

Preferred vs. Non-preferred: in Identifier(1) and Identifier(2)

Everything that really is written by Burroughs needs to have the same identifier(1) (or at least an identifier(1) that can be unambiguously tied to the _same_ Person instance in our systems). This is how the system can know they are all by the same person, and prevent the same person from appearing in a list “twice or three times under two or three different names”. This much is certain, and FRAD and any other guidelines or standards we adopt must make this clear.

But that doesn’t mean that only one identifier(2) can ever be shown to any user anywhere. This is I think the point of what FRAD is trying to get across–not that the same author might show up in a list more than once, but that the model will allow environments where no one canonical identifier(2) label must be chosen, eg. where the English script and Chinese script representation of an author’s name might both be equally acceptable as an identifier(2). [That doesn’t mean that both will be displayed to the same user on the same screen in different positions, indicating different people! That would be a mistake. ]

Whether FRAD gets the point across about what it’s doing is another question. (And keep in mind that FRAD is just a model; it’s not meant to be a code, or rules, or guidance, but instead a language for expressing those things. A particular code can certainly constrain more specifically than everything the model, meant to work with various possible codes, allows.)

I think part of the confusion is that our models and standards need to start being clear about the two functions of identifier(1) and identifier(2), and that our traditional ‘access points’ have served both functions–but may be replaced, going forward. I think Martha is absolutely right that these documents should refer to these strings as ‘identifiers’ (of works, and of authors/agents). That would clear things up immensely.

So then, What Is An Identifier?

But I think Martha is wrong that the term “identifier” should be _reserved_ for these traditional ‘access points’. In fact, it is quite likely that new devices such as URIs applied properly will function much better to serve the purpose of identifier(1)s. There are reasons that nobody constructing a new system in the age of computers (from ISBN to ‘metadata’) has used meaningful strings like this to serve the purpose of identifier(1), instead choosing numbers or URIs etc.

Another problem with the current conflating of identifier(1) and identifier(2), in addition to the non-roman script issue, is exemplified by the fact that my catalog has records under “Burroughs, William S., 1914-1997.” as well as under “Burroughs, William S., 1914-”, and these are the same person. The end year might be usefully added to the _display_ for the user for human disambiguation (identifier(2)), but the fact that the actual identifier(1) needs to be changed when a person dies is an undesirable design. My system fails to meet the unnecessary challenge posed, and fails to collocate all those works properly.
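The collocation failure is easy to reproduce in a few lines. The sketch below uses two made-up sample records (the field names `heading` and `author_key`, and the key `person:0001`, are hypothetical) and groups them first by the exact heading string, as a system relying on the traditional access point would, and then by a separate key.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical sample records: one cataloged before the author's death,
# one after, so the heading strings differ although the person is the same.
records = [
    {"title": "Junky",
     "heading": "Burroughs, William S., 1914-",
     "author_key": "person:0001"},
    {"title": "Naked Lunch",
     "heading": "Burroughs, William S., 1914-1997.",
     "author_key": "person:0001"},
]


def collocate(recs: List[dict], field: str) -> Dict[str, List[str]]:
    """Group titles by exact match on the given field."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for rec in recs:
        groups[rec[field]].append(rec["title"])
    return dict(groups)


by_heading = collocate(records, "heading")   # string doubles as identifier(1)
by_key = collocate(records, "author_key")    # separate identifier(1)
```

Grouping on `heading` splits the author into two groups, one per string, while grouping on `author_key` yields a single group containing both titles. The headings can then differ, or be corrected later, without touching the link.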

Certainly we need all those things Martha mentions to serve the purpose of identifier(2). Some are now arguing that identifier(2)s don’t need to be ‘pre-coordinated’ by the cataloger, but can be composed by the system from smaller elements recorded separately (first name, last name, year of birth, etc.) and still serve those purposes sufficiently. Others, more traditional, would argue that catalogers certainly need to create a ‘canonical’ identifier(2) according to carefully constructed rules applied judiciously, and used canonically everywhere (or at least in a certain language/script area).

REGARDLESS of the hypothetical resolution of this debate about identifier(2), about what is or isn’t necessary to show the user a label for a Person or Work sufficient to allow them to distinguish, good design for the digital age requires separating identifier(1) and identifier(2) into two separate mechanisms. And identifier(1)s will IN EITHER CASE still need to be _assigned_ at the moment of cataloging, to unambiguously link instances of entities (such as authors to works), even if identifier(2)s are composed by the system from smaller elements.
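As a rough illustration of the ‘composed by the system’ option, the sketch below builds a label from separately recorded elements while the key is assigned once and left alone. The `PersonRecord` class, its field names, the key value, and the label format (which omits the trailing period of the traditional heading) are all assumptions for the example, not anything prescribed by FRAD or RDA.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class PersonRecord:
    key: str                          # identifier(1): assigned once by the cataloger
    surname: str
    forenames: str
    birth_year: Optional[int] = None
    death_year: Optional[int] = None


def compose_label(person: PersonRecord) -> str:
    """Build an identifier(2) for display from the separately recorded elements."""
    label = f"{person.surname}, {person.forenames}"
    if person.birth_year is not None:
        death = "" if person.death_year is None else str(person.death_year)
        label = f"{label}, {person.birth_year}-{death}"
    return label


# Hypothetical record as it might have looked while the author was alive.
living = PersonRecord(key="person:0001", surname="Burroughs",
                      forenames="William S.", birth_year=1914)

# Recording the death year changes the composed label but not the key.
deceased = replace(living, death_year=1997)
```

Here `compose_label(living)` and `compose_label(deceased)` differ, yet `living.key` and `deceased.key` are identical, so every work linked by that key stays collocated whichever label is shown.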

I’m sorry I couldn’t manage to make this essay shorter, I hope it helps to illuminate what I believe are some important concepts.

2 thoughts on “Two meanings of “Identifier”

  1. In further discussion in RDA-L, I/we sort of settled on the term “linking identifier” for what I just call “identifier(1)” above, and “name/label identifier” for “identifier(2)”. Not entirely satisfied with those terms, but they’re probably better than numbers!

    Perhaps I should edit this essay like that, which might make it less confusing.
