cataloging theory really is useful

As much as I’m sometimes frustrated by our common inherited legacy cataloging practices, I actually do think the cataloging theory developed by Lubetzky, Svenonius, Cutter, and others is still useful — sometimes you just need to ‘translate’ it to the modern environment.

I’ve been thinking about how having persistent unique identifiers (bib IDs) for our records is really important — but not generally prioritized in some of our legacy cataloging practice. There are a bunch of ways to explain why this is important (and it’s kind of obvious to the CS-perspective-inclined).

But I realized another way goes back to some language used in my cataloging class. A cataloging record is called a ‘surrogate’ for the physical item described. That’s exactly what it is, even more so in the digital age: it allows the physical item to be ‘projected’ into the digital environment as a digital object which is a ‘surrogate’ for the physical object (or set of objects, depending on the context you consider it in) it represents.

Perhaps this helps explain why a persistent bib ID is important using cataloging theory language. Because the record is a surrogate for the physical object in the digital environment, we want to be able to link to the surrogate in different ways, from simply bookmarking it to building more complicated ‘semantic’ relationships based upon it. All of that depends on having a persistent identifier (a persistent bib ID) for the surrogate. Changing the bib ID of the surrogate in the digital environment in unpredictable ways would be analogous to periodically changing where the physical item is physically shelved in unpredictable ways! The internal unique identifier for the surrogate is essentially its digital “location”.

[That’s a bit of an oversimplification: giving the digital surrogate a reliable digital ‘location’ requires some layering on top of the unique internal ID, to give it a unique persistent URI too. But the prerequisite for that is a persistent unique internal ID.]
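To make that layering concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the `Catalog` class, the `rekey` method, the `b1000001`-style IDs, and the `catalog.example.org` base URI are all assumptions, not any real ILS’s API. It only shows the dependency: the public URI is built on the internal bib ID, so a bookmark keeps resolving only as long as that ID never changes.

```python
from dataclasses import dataclass

# Hypothetical base URI; example.org is a placeholder, not a real catalog.
BASE_URI = "https://catalog.example.org/bib/"


@dataclass
class Record:
    bib_id: str  # internal identifier; the whole point is that it should not change
    title: str


class Catalog:
    """Toy in-memory catalog keyed by bib ID (illustrative only)."""

    def __init__(self):
        self._records = {}

    def add(self, record):
        self._records[record.bib_id] = record

    def uri_for(self, bib_id):
        # The persistent URI is just a layer on top of the internal ID.
        return BASE_URI + bib_id

    def resolve(self, uri):
        # Map a URI back to a record, or None if nothing lives there.
        if not uri.startswith(BASE_URI):
            return None
        return self._records.get(uri[len(BASE_URI):])

    def rekey(self, old_id, new_id):
        # Simulates the bad practice: reassigning a record's bib ID.
        record = self._records.pop(old_id)
        record.bib_id = new_id
        self._records[new_id] = record


catalog = Catalog()
catalog.add(Record("b1000001", "Example title"))

bookmark = catalog.uri_for("b1000001")   # someone links to the surrogate
found_before = catalog.resolve(bookmark)  # the link works

catalog.rekey("b1000001", "b2000002")     # the bib ID changes underneath it
found_after = catalog.resolve(bookmark)   # the old link now points at nothing
```

After the `rekey` call the record still exists, but the bookmark made earlier no longer finds it. That’s the “reshelved without telling anyone” problem in miniature.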

[And, incidentally, for the semantic web geeks reading, this gets at some of my dissatisfaction with this focus on ‘real world objects’ vs ‘documents’ or whatever they’re currently calling the second class. I don’t think it’s at all a clear distinction, and can often get confusing right quick, and I think it’s probably a mistake to rely on such a confusing distinction for crucial parts of your ‘specs’.  A cataloging record is a ‘web document’, surely, but it’s also a surrogate (not JUST a ‘description’) for a real world object.  Sure, we can split hairs and talk about how to handle that. But the fact that it gets so confusing and abstract and hair-splitting and subject to debate worries me and makes me suspicious of relying on such a distinction for describing how to ‘do business’ in the sem web.]


6 thoughts on “cataloging theory really is useful”

  1. I too have been struggling with this distinction between “real world objects” and “documents” (*about* real world objects). It is quite simple if you refer to a person (real world object) and a web page (document) about that person.
    But as you say, a web page about a book for instance can both be a “document” (about something, the book), and also a “real world object” of its own (an article written by someone about something else). It just depends on how it is used…
    In Semantic Web and URI theory there is a three-way distinction: the “resource identifier URI”, which can redirect to either a human readable HTML “document” about the resource, or to a machine readable RDF document defining the resource and its relationships.
    See my post where I have tried to make some sense of all of this.

  2. To me that httpRange-14 stuff is, like, this intellectual compromise which manages to split the difference and meet the warring parties’ _theoretical_ concerns… but I’m not sure it’s actually practically USEFUL.

    Even if someone can explain how there’s a ‘theoretically right’ answer to a particular confusing example… if it’s so abstract and confusing that it takes some kind of sem web genius to tell you what the ‘right’ answer is… how likely is that to catch on and be implemented consistently? And confusing examples are not just rare ‘edge cases’, they’re scattered well throughout common real cases.

    Plus I hate hash URIs.

  3. OK, you don’t like hash URIs, but that’s just one way of taking care of the resource identifier stuff.

    What SemWeb or Linked Data is about, is “linking data” that is available on the web in different systems, databases, etc. And for that you need persistent identifiers, just like you argue.

  4. Thanks for pointing me to Ed Summers’ article. Looks interesting indeed. I will read it more closely soon. The question is: “how to decide what is the entity/object?” Definitely deserves more thought by me.
