Oops, I just discovered that LCCNs don’t work the way I thought, which means a bunch of software I’ve written is wrong. (Apparently I’m not alone: many other vendors make this mistake too.)
An LCCN is not necessarily just a numeric string. It may have a 1-3 letter alphabetic prefix, and that prefix is needed for the LCCN to uniquely refer to an LC record. This came up in an email conversation with Ardie Bausenbach from LC:
We have some information on the LCCN Permalink FAQ at http://lccn.loc.gov, which explains the info:lccn normalization specs. In essence, LCCNs assigned before 12/31/2000 can have 3-character prefixes, and LCCNs assigned after 1/1/2001 can have 2-character prefixes. A large number of CONSER records have prefixes (sn, sf, sc, ce, cf, cn). The prefix makes the number unique, so sn 83031896 is a completely different record from 83031896.
Oops. I have written software that assumes an LCCN is purely numeric, which I now realize is broken. In fact, an LCCN may include a two-letter (or, historically, three-letter) alphabetic prefix, which is an integral part of the LCCN. Other LCCNs have no such prefix (i.e., they have a ‘null’ prefix).
I wonder what Google Book Search does with this? I wonder if this confusion is what’s responsible for the false hits I’m getting from Google? Perhaps either the prefix was missing from my original citation data (generally taken from my catalog’s MARC records), or my software was erroneously stripping it out before sending the LCCN to Google. (Update: I checked a few examples of bad matches I had on hand, and I don’t think they are due to prefix errors. Compare: http://lccn.loc.gov/01000500 to http://books.google.com/books?q=LCCN01000500. The prefix-less “01000500” does seem to be the proper LCCN for Slave insurrections in Virginia, and not, as GBS thinks, for Leaders Helping Leaders: A Practical Guide to Administrative Mentoring.)
The specs, such as they are (see here and here), seem to suggest that the ‘normalized’ form of an LCCN is to have no space between the alphabetic prefix and the number: “sn83031896”, not “sn 83031896”. Jon Gorman says he thinks GBS does handle LCCNs with prefix properly, and indeed thinks GBS expects there to be no space.
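To make the “no space” form concrete, here is a minimal Python sketch of that one step. The function name `normalize_lccn` is my own invention, and this only removes whitespace; it does not attempt anything else the specs may require.

```python
def normalize_lccn(raw: str) -> str:
    """Remove all whitespace, so 'sn 83031896' becomes 'sn83031896'.

    Hypothetical helper: it covers only the space-removal step and
    leaves the prefix and digits otherwise untouched.
    """
    # str.split() with no arguments splits on any run of whitespace,
    # so joining the pieces drops leading, trailing, and internal spaces.
    return "".join(raw.split())


print(normalize_lccn("sn 83031896"))
print(normalize_lccn("83031896"))
```

Note that the prefix survives normalization, so the prefixed and prefix-less forms still come out as different strings, as they should.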
Note that the documents there describe some information which may appear as a “suffix” too, but I think it’s safe to ignore those. “Suffixes and alphabetic identifiers do not affect the uniqueness of the control number… Suffixes have not been assigned since 1969 and they will be deleted from Library of Congress files in 1999.”
So I think that an optional 1-3 letter alphabetic prefix followed by a numeric string gives you a unique LC record identifier, if I’m understanding this right.
Thanks to Jon Gorman and Ed Summers for explaining some of this to me in channel.
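Here is a rough Python sketch of that shape check, assuming the input already has no space between prefix and number. The names `LCCN_SHAPE` and `split_lccn` are hypothetical, and the pattern encodes only my understanding above (zero to three letters, then digits); it does not check the length of the numeric part.

```python
import re
from typing import Optional, Tuple

# Assumed shape: an optional prefix of up to three ASCII letters,
# followed by one or more ASCII digits, and nothing else.
LCCN_SHAPE = re.compile(r"([A-Za-z]{0,3})([0-9]+)")


def split_lccn(lccn: str) -> Optional[Tuple[str, str]]:
    """Return (prefix, number), or None if the string doesn't fit the shape.

    The number stays a string so that leading zeros, as in '01000500',
    are preserved. A missing prefix comes back as the empty string.
    """
    match = LCCN_SHAPE.fullmatch(lccn)
    if match is None:
        return None
    return match.group(1), match.group(2)


print(split_lccn("sn83031896"))
print(split_lccn("83031896"))
```

Keeping the prefix as part of the key is the point: `("sn", "83031896")` and `("", "83031896")` are different records, which is exactly what my purely numeric code was getting wrong.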
I meant to mention yesterday that another source of information on how to normalize LCCNs can be found in the info-uri registry. I believe it was put together by Ray Denenberg when he registered the lccn info-uri namespace.
My apologies yesterday for getting all rtfm-in-yo-face-about this stuff. Your questions were measured and spot-on as always. I admit it is confusing, but we are all kinda doing the best we can. Chalk it up to another edsu-low-blood-sugar event …
Thanks Ed. That info-uri document is really useful: it lays out a very clear algorithmic method for validating and normalizing LCCNs, which I can use in my code now. Going from the LC docs I had before, there was still a lot of work left to figure out how to validate or normalize algorithmically.