As much as I’m sometimes frustrated by the legacy cataloging practices we’ve inherited, I do think the cataloging theory developed by Lubetzky, Svenonius, Cutter, and others is still useful; sometimes you just need to ‘translate’ it to the modern environment.
I’ve been thinking about how important it is to have persistent unique identifiers (bib IDs) for our records, and how that hasn’t generally been prioritized in some of our legacy cataloging practice. There are a bunch of ways to explain why this matters (and it’s fairly obvious to anyone inclined toward a CS perspective).
But I realized there’s another way to explain it, one that goes back to some language from my cataloging class. A cataloging record is called a ‘surrogate’ for the physical item it describes. That’s exactly what it is, even more so in the digital age: it allows the physical item to be ‘projected’ into the digital environment as a digital object, a ‘surrogate’ for the physical object (or set of objects, depending on the context in which you consider it) that it represents.
Perhaps this helps explain, in the language of cataloging theory, why a persistent bib ID is important. As a surrogate for the physical object in the digital environment, we want to be able to link to it in different ways, from simply bookmarking it to building more complicated ‘semantic’ relationships on top of it. All of that depends on having a persistent identifier, a persistent bib ID, for the surrogate. Changing the bib ID of the surrogate in unpredictable ways would be analogous to periodically moving the physical item to a different shelf in unpredictable ways! The internal unique identifier for the surrogate is essentially its digital ‘location’.
[That’s a bit of an oversimplification: giving the digital surrogate a reliable digital ‘location’ requires some layering on top of the unique internal ID, to give it a unique persistent URI too. But the prerequisite for that is a persistent unique internal ID.]
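To make the ‘surrogate location’ idea a bit more concrete, here is a minimal Python sketch. It assumes a hypothetical base URI (`https://catalog.example.org/bib/`) and hypothetical bib IDs; none of these names come from any real catalog system. It layers a URI on top of an internal ID, then shows what happens to an existing bookmark when the same records are reloaded under fresh IDs.

```python
from typing import Optional

# Hypothetical base URI, for illustration only; not a real service.
BASE_URI = "https://catalog.example.org/bib/"


def record_uri(bib_id: str) -> str:
    """Layer a URI on top of the internal bib ID."""
    return BASE_URI + bib_id


def resolve(catalog: dict[str, str], uri: str) -> Optional[str]:
    """Return the record a URI points to, or None if the link is broken."""
    if not uri.startswith(BASE_URI):
        return None
    return catalog.get(uri[len(BASE_URI):])


# A tiny catalog keyed by (hypothetical) internal bib IDs.
catalog = {"b1001": "An example bibliographic record"}

# Someone bookmarks the surrogate, or another system links to it.
bookmark = record_uri("b1001")
print(resolve(catalog, bookmark))  # prints "An example bibliographic record"

# A reload that assigns fresh IDs keeps the same record but breaks the link,
# much like reshelving the physical book without telling anyone.
reloaded = {"b2001": catalog["b1001"]}
print(resolve(reloaded, bookmark))  # prints "None"
```

The URI is only as stable as the ID underneath it: the record’s content survived the reload, but every link pointing at it did not.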
[And, incidentally, for the semantic web geeks reading, this gets at some of my dissatisfaction with the focus on ‘real world objects’ vs ‘documents’ (or whatever they’re currently calling the second class). I don’t think it’s at all a clear distinction, it can get confusing right quick, and I think it’s probably a mistake to rely on such a confusing distinction for crucial parts of your ‘specs’. A cataloging record is a ‘web document’, surely, but it’s also a surrogate (not JUST a ‘description’) for a real world object. Sure, we can split hairs and talk about how to handle that. But the fact that it gets so confusing, abstract, hair-splitting, and subject to debate worries me, and makes me suspicious of relying on such a distinction to describe how to ‘do business’ in the sem web.]