Xiaoming Liu talks about what the OCLC xID service does to translate from the WorldCat MARC to some simple metadata fields:
As of now, we make a rather subjective “educated guess” when we implement xISBN system. For the record, I would like to list the mapping we are doing in xISBN right now:
Form->complex logic of pulling marc header, 008, 245#h, and applying a Bayesian trainer
Url->context related (e.g. Ebook, wikipedia, hathitrust may have different URL)
(From the wc-devnet-l listserv, which doesn’t seem to have publicly accessible archives. royt, you want another mission?)
Okay, who can guess what I’m going to rant about now?
He needs a freakin’ Bayesian trainer to figure out a best guess of form (book, audio, video, journal, etc.) from MARC? A Bayesian trainer is a method of computer ‘learning’ (what people might once have called ‘artificial intelligence’) for making a statistical guess from examples it has already seen. That’s what our MARC requires of us?
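To make concrete what a ‘statistical guess’ of form looks like, here is a minimal sketch of a naive Bayes classifier over a few MARC values. This is not Liu’s code, and I have no idea what xISBN actually does internally. The feature choice (Leader/06, Leader/07, and the 245$h text), the helper names, and the tiny training set are all assumptions made up for illustration; the 008 is left out to keep it short.

```python
import math
from collections import Counter, defaultdict


def make_leader(record_type, bib_level):
    """Build a fake 24-character MARC leader (hypothetical helper for the demo)."""
    return "00000n" + record_type + bib_level + " a2200000 a 4500"


def features(leader, gmd):
    """Turn a few MARC values into categorical features.

    leader: the 24-character MARC leader
    gmd: the 245$h text, or "" when the subfield is absent
    """
    return [
        "type=" + leader[6],   # Leader/06, type of record
        "level=" + leader[7],  # Leader/07, bibliographic level
        "gmd=" + gmd.strip("[] ./:").lower(),
    ]


class NaiveBayes:
    """Categorical naive Bayes with add-one smoothing."""

    def __init__(self):
        self.label_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocabulary = set()

    def train(self, feats, label):
        self.label_counts[label] += 1
        for f in feats:
            self.feature_counts[label][f] += 1
            self.vocabulary.add(f)

    def classify(self, feats):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            # log prior plus smoothed log likelihood of each feature
            score = math.log(count / total)
            denom = sum(self.feature_counts[label].values()) + len(self.vocabulary)
            for f in feats:
                score += math.log((self.feature_counts[label][f] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented training examples: (Leader/06, Leader/07, 245$h, form).
TRAINING = [
    ("a", "m", "", "book"),
    ("a", "m", "", "book"),
    ("a", "s", "", "journal"),
    ("a", "s", "", "journal"),
    ("i", "m", "[sound recording]", "audio"),
    ("j", "m", "[sound recording]", "audio"),
    ("g", "m", "[videorecording]", "video"),
]

nb = NaiveBayes()
for record_type, bib_level, gmd, form in TRAINING:
    nb.train(features(make_leader(record_type, bib_level), gmd), form)

print(nb.classify(features(make_leader("j", "m"), "[sound recording] /")))
```

The point of the sketch is the shape of the thing: nothing in the record says ‘this is an audio book’ outright, so you count how often each coded value shows up with each form and pick the most probable one.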
And from the other examples there, Liu obviously wasn’t making things any more complicated than necessary for a good enough approximation. This topic came up when some people took issue with him just looking at 245$c instead of the several dozen MARC fields, in complicated combinations, that an author might be in (and what you find there might not be an author at all). If he’s using computer learning there, I’d guess it’s because he couldn’t avoid it.
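To show the trade-off being argued about, here is a rough sketch of the two approaches: take the 245$c text as-is, or go collecting names from the main and added entry fields. The record layout (a dict of tag to list of subfield dicts), the function names, and the short field list are assumptions for illustration only; a real record has many more places a name can hide, and this is not what xISBN does.

```python
# A handful of the name-bearing fields: main entries (1XX) and added entries (7XX).
NAME_FIELDS = ["100", "110", "111", "700", "710", "711"]


def statement_of_responsibility(record):
    """The simple approach: return 245$c as free text, or None if absent."""
    for field in record.get("245", []):
        if "c" in field:
            return field["c"].rstrip(" ./")
    return None


def controlled_names(record):
    """The thorough approach: collect $a from each name field, in tag order."""
    names = []
    for tag in NAME_FIELDS:
        for field in record.get(tag, []):
            if "a" in field:
                # Crude cleanup of trailing punctuation; would also clip a final initial.
                names.append(field["a"].rstrip(" ,."))
    return names


# A made-up record in the hypothetical dict layout described above.
record = {
    "100": [{"a": "Melville, Herman,"}],
    "245": [{"a": "Moby Dick /",
             "c": "Herman Melville ; illustrated by Rockwell Kent."}],
    "700": [{"a": "Kent, Rockwell,"}],
}

print(statement_of_responsibility(record))
print(controlled_names(record))
```

Even in this toy case the second function can’t tell you that Kent is an illustrator rather than an author without looking at still more subfields, which is the kind of complication the 245$c shortcut sidesteps.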
What’s wrong with us? And it’s not just legacy data, we’re still creating cataloging records that require this. How can anyone not see a problem here?
Nice quote, scary quote
Bryan, in a comment to the previous post, pointed out some remarks from a nice tribute to Lubetzky.
This paragraph from Martha Yee, written five years ago, struck me with the terror of truth:
Call me Cassandra, but the fact that we can’t carry out the objectives of the catalog so eloquently described and urged upon us by Lubetzky does not bode well for our future as a profession. The rest of the world has become enamored of Google. Google cannot carry out the objectives of the catalog either. But if our choice is between online public access catalogs that are expensive but cannot carry out the objectives of the catalog, and Google that is cheap and cannot carry out the objectives of the catalog, I know what the choice is likely to be. And when we try to argue for the continuing existence of our profession on the basis of our expertise in the organization of information, what scholar in the humanities is going to stand up for us, after spending a career trying to navigate the chaos we have created in our catalogs for searchers of known prolific works?
It’s certainly not just ‘searchers of known prolific works.’