FRBR imperfect? So then?

We hear all the time “FRBR is untested, FRBR is incomplete, FRBR needs work.” One version of this is in Karen Coyle’s summary of the Working Group on the Future of Bibliographic Control report. [I’m waiting for the official written report before really responding to these recommendations, but for now I’ll respond just to the informal comments here, as one point of view, regardless of whether it accurately represents the working group’s.]

“The framework known as FRBR has great potential but so far is untested.”

Now, as it happens, I agree with this completely, as far as it goes. But that doesn’t change the fact that we desperately need what FRBR is trying to do: a formal and explicit schematic of how we model the ‘bibliographic’ (or ‘information resource’) universe. Some agree that we desperately need this; some don’t, and think it’s all a bunch of hot air. I’ve made my case for why we need it before, and probably ought to do so again in more polished form.
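
Since it can be hard to picture what such a “formal and explicit schematic” amounts to, here’s a minimal sketch of FRBR’s Group 1 entities (work, expression, manifestation, item) rendered as Python classes. The attribute names and sample data are my own illustrative shorthand, not anything normative from the FRBR report.

```python
# A minimal, illustrative sketch of FRBR's Group 1 entities.
# Attribute names and sample data are illustrative shorthand only.
from dataclasses import dataclass, field

@dataclass
class Item:
    """A single exemplar: the physical copy on the shelf."""
    barcode: str

@dataclass
class Manifestation:
    """A physical embodiment: roughly, an edition. The level an ISBN identifies."""
    publisher: str
    year: int
    items: list = field(default_factory=list)

@dataclass
class Expression:
    """A realization of a work: a particular text, translation, etc."""
    language: str
    manifestations: list = field(default_factory=list)

@dataclass
class Work:
    """A distinct intellectual creation."""
    title: str
    expressions: list = field(default_factory=list)

# One work, realized in one (English) expression, embodied in two editions:
moby = Work("Moby Dick", expressions=[
    Expression("en", manifestations=[
        Manifestation("Harper & Brothers", 1851, items=[Item("31151000001")]),
        Manifestation("Penguin Classics", 2002),
    ]),
])
```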

But those of us who agree that we desperately need this, AND that FRBR is an untested and imperfect attempt to do this—then what? Either we: Continue reading “FRBR imperfect? So then?”

DLF Forum, Bowker presentation

At the DLF Forum, I saw a presentation from someone from Bowker whose name I forget, where I learned two simple things that were very interesting to me.

I learned about the plans for an International Standard Text Code (ISTC), sort of like an ISBN, but applying at the FRBR “expression” level, grouping a set of ISBNs. (Although Bowker seems to call what it applies to a ‘work’, it is in fact meant to apply only to things that are ‘textually identical’, which is what we call the expression level. It is also only meant to apply to textual material, not audio, video, etc. The presenter claimed that audio and video already had something similar, although it wasn’t widely adopted; I know nothing about that.) It would of course be quite useful to have an expression-level identifier from the ISBN people. It also makes me wonder how it might or might not harmonize with the library world’s plans for ‘FRBRization’, since it’s being done for the needs of the publishing and sales industry, like ISBN. Of course, right now there’s not much for it to harmonize with in the library world, just talk.
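
To make the grouping concrete, here’s a toy sketch in Python of an expression-level identifier collecting manifestation-level ISBNs. All the identifier values are made up; I don’t know what actual ISTC syntax will look like.

```python
# Hypothetical illustration: an expression-level identifier (an "ISTC")
# groups the ISBNs of textually identical publications. A different
# translation is a different expression, so it gets a different ISTC.
# All identifier values below are invented.
istc_to_isbns = {
    "ISTC-0000001": {"9780000000011", "9780000000028"},  # original English text
    "ISTC-0000002": {"9780000000035"},                   # a Spanish translation
}

def same_text(isbn_a: str, isbn_b: str) -> bool:
    """True if two ISBNs belong to the same expression (share an ISTC)."""
    return any(isbn_a in isbns and isbn_b in isbns
               for isbns in istc_to_isbns.values())

print(same_text("9780000000011", "9780000000028"))  # True: same text, two editions
print(same_text("9780000000011", "9780000000035"))  # False: different expression
```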

I’ve been having writer’s block on the essay I intend to write about identifiers and ‘access points’. I think I need to stop trying to write the perfect essay and just put my unfinished, sketchy notes on the blog, because it is an important topic, especially for talking about how what we’re doing relates to what everyone else is doing. We need to use compatible language and compatible mental models.

Anyway, the second interesting thing I learned is that a few months ago Bowker released a web service for accessing ISBN metadata. It’s packaged with Books in Print, meaning it’s free to anyone who already has an online Books in Print subscription (many if not most of us). I don’t know the details, but I’m eager to find them out and play with the service.

I like that pricing idea: hey, libraries are already paying lots of money for BiP, so they shouldn’t need to pay even more to get 21st-century methods of access to the same content they’re already paying for; the vendor should instead improve the service to 2007 levels. If only more of our library vendors worked like that. Instead, we are often in the situation where the software we pay a fortune for is stuck in 1985, and if you want modern software you’ve got to purchase, and continue to pay licensing and support for, an additional add-on “product”. Ridiculous, and almost all of our vendors do it. (Perhaps the market requires them to do it that way to stay in business; if so, the market is doomed and their time in business is limited anyway. Hopefully not along with ours.)

The Purpose of Authority Control

So I haven’t been blogging for a while: I managed to arrange my job so I could devote myself to some serious software development, and have been reminded of both what I like AND what I don’t about how obsessive I can get about coding. I get really caught up in it. Hopefully I’ll have some news on what I’ve coded soon.

But meanwhile, I’ve been wanting for a while to write about authority control and identifiers, a topic I have written about before. There are some points I’ve really wanted to make, but I’ve had a bit of writer’s block, because it is so hard to talk clearly on this subject; it’s hard even to think clearly about it. But I think it’s crucial, and I think there are some important things to be said that I’m getting a bit clearer at thinking and saying.

After trying to figure out how to say these things clearly, I decided that first we need to establish some basic agreement about the purpose of authority control. I’m sorry that this ends up being so lengthy, but I think it’s necessary to be clear.

Continue reading “The Purpose of Authority Control”

ONIX For Serials Coverage

The ONIX For Serials Coverage standard is out.

While it was mainly designed to be used within the ONIX SOH and SPS formats, they wisely decided to publish it as a free-standing schema too: “The Coverage Statement may also be used to express holdings or coverage in XML structures other than those specified in ONIX for Serials.”

I think this is a great idea, along the lines of the ‘mix and match’ incipient semantic web we find ourselves in. If you look at the standard, it is really a very nice way to describe serial holdings coverage, in ways very amenable to machine calculation. For instance, to answer the question “Is this particular issue X within the holdings?”, or to combine various holdings statements into a contiguous human-displayable statement. This is something our current systems have trouble doing, because we don’t store the necessary data in machine-actionable ways.
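
To give a flavor of what “machine-actionable” buys you here, below is a deliberately simplified model in Python of coverage as enumerated ranges. This is my own illustration of the underlying idea, not the actual ONIX XML structure.

```python
# Simplified model: a coverage statement as ranges of (volume, issue).
# This illustrates the idea only; it is not the ONIX XML structure.
from dataclasses import dataclass

@dataclass
class CoverageRange:
    start: tuple          # (volume, issue) of the first held issue
    end: tuple = None     # (volume, issue) of the last, or None for "to present"

    def contains(self, volume: int, issue: int) -> bool:
        point = (volume, issue)  # tuples compare lexicographically
        return point >= self.start and (self.end is None or point <= self.end)

holdings = [
    CoverageRange(start=(1, 1), end=(10, 4)),
    CoverageRange(start=(15, 1), end=None),   # a gap: v.11 through v.14 not held
]

def held(volume: int, issue: int) -> bool:
    """'Is this particular issue within the holdings?' becomes trivial."""
    return any(r.contains(volume, issue) for r in holdings)

print(held(12, 2))  # False: falls in the gap
print(held(20, 1))  # True: covered by the open-ended range
```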

While the standard says it’s “designed to convey information about online serial resources from suppliers – such as hosting services, publication access management services, agents or publishers – to end customers in subscribing libraries,” there’s really nothing about it that’s limited to that context.

If anyone is writing software that needs to store or exchange serials coverage data, I’d encourage them to check out ONIX For Serials Coverage. It’s very elegant; it seems to me to be just the right level of complexity and flexibility to do what it needs to do, without being overly abstract, complex, or flexible. It should be quite easy to work with. Hats off to the standards writers here.

“Computational thinking”

Yeah, I hate it when people just “me too” something someone else blogged, but I’m doing it anyway, bringing it into a slightly different context.

Jon Udell talks about Jeannette Wing’s concept of “computational thinking”, and points to a podcast on it (which I haven’t listened to, no; but that’s another topic).

This idea of a “computer science perspective”, based in large part on the foundational idea of ‘abstraction’ (from which, I think, come ‘refactoring’ and ‘separation of concerns’), is one I’ve been thinking about for a while. I’m pleased to see that Wing has put a name on it too, and is exploring what exactly it means.

I learned this way of looking at things through a computer science degree and some years of programming experience, but I don’t think that’s the only way to learn it; I think there’s a way to learn it without actually learning how to program or becoming a programmer or computer scientist.

And it’s precisely this kind of perspective (“computational thinking”) that I think the 21st-century cataloger or metadata librarian absolutely needs, to be able to understand how what they do does and can fit into the digital landscape. I’ve wondered before whether it would be possible to design some kind of curriculum in what I thought of as the ‘computer science perspective’ that wasn’t in fact particularly technical and wasn’t about teaching programming or computer science itself. I wonder if Wing is exploring that idea with ‘computational thinking’, since she too seems to think it’s a way of thinking that is useful to more than just computer scientists.

Looks like this article is a good place to start on Wing’s “computational thinking” idea. I think if you read it, you will immediately be convinced: “Yes! Catalogers do need to be able to think that way in the 21st century!” [Indeed, I’d make clear that many catalogers already do, or can, or almost can think that way; it’s not TOO different from a certain kind of ‘cataloging thinking’.]

“Computational thinking is a way of solving problems, designing systems, and understanding human behavior that draws on concepts fundamental to computer science. Computational thinking is thinking in terms of abstractions, invariably multiple layers of abstraction at once. Computational thinking is about the automation of these abstractions. The automaton could be an algorithm, a Turing machine, a tangible device, a software system –or the human brain. Recursively, it could be a network of these. Computational thinking gives us the power to scale beyond our imagination.”

The Purposes of ‘Subject’ Vocabularies

LCSH, LCC, DDC, Ulrich’s subject headings, BISAC, Ranganathan’s Colon Classification, Bliss Classification (2), Amazon’s subject headings: all are examples of ‘subject’ controlled vocabularies.

I put ‘subject’ in quotes because in reality most, if not all, of these examples include terms to capture ‘aboutness’ as well as terms to capture discipline (i.e., perspective) and genre (and in some cases form, format, and intended audience). (Yes, Dewey sometimes captures ‘aboutness’, and LCSH sometimes captures disciplinary perspective. Take a look.)

I have been interested for a while in exploring the purposes of these types of vocabulary. I think they are not as clear and simple as we might be used to assuming. I wrote a (too) long paper about it in library school, which I’ll attach here. I actually wrote it before I had seen NCSU’s Endeca implementation, and I’d have written it differently afterward; but I think this discussion is very relevant to understanding effective use of controlled vocabularies in faceted navigation. Recent discussion on NGC4Lib regarding these types of vocabularies further emphasizes, to me, the importance of considering their functions.
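
As a toy sketch of what faceted navigation does once ‘aboutness’, discipline, and genre are kept as distinct facets, consider the following (the records and terms are invented for illustration, not drawn from any real vocabulary):

```python
# Toy faceted navigation over controlled-vocabulary terms, with
# 'aboutness', discipline, and genre as separate facets.
# Records and terms are invented for illustration.
records = [
    {"title": "A", "about": {"Whales"}, "discipline": {"Zoology"},    "genre": {"Field guide"}},
    {"title": "B", "about": {"Whales"}, "discipline": {"Literature"}, "genre": {"Criticism"}},
    {"title": "C", "about": {"Ships"},  "discipline": {"History"},    "genre": {"Monograph"}},
]

def narrow(results, facet, term):
    """Keep only records carrying the given term in the given facet."""
    return [r for r in results if term in r[facet]]

hits = narrow(records, "about", "Whales")     # A and B are both 'about' whales...
hits = narrow(hits, "discipline", "Zoology")  # ...but only A approaches them zoologically
print([r["title"] for r in hits])             # ['A']
```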

In my paper, I argue that when looking at these vocabularies from the perspective of functions or purposes, the traditional line between ‘classification’ and ‘subject vocabulary’ isn’t actually that clear; instead we have a number of purposes (not just two) which a given vocabulary may serve better or worse.

The paper is awfully long, so I’ll also summarize here my suggestion for an initial draft taxonomy of functions. (These functions admittedly overlap in some ways, but I still think they are worth distinguishing.) (The next step, determining what features of a vocabulary fit what functions or purposes, is only touched upon in the paper.) Continue reading “The Purposes of ‘Subject’ Vocabularies”

CCQ 43:3/4 on Semantic Web

I’m finally getting around to looking at the Cataloging and Classification Quarterly vol. 43 iss. 3/4. It’s a special issue on semantic web technologies for libraries. I think it could be really good background reading for the discussions some of us are trying to have.

It looks like it’s got some great stuff in it! I recommend everyone take a look. I am particularly excited to read “Library Cards for the 21st Century” by Charles McCathieNevile and Eva Méndez:

“This paper presents several reflections on the traditional card catalogues and RDF (Resource Description Framework), which is “the” standard for creating the Semantic Web… The central theme of the discussion is resource description as a discipline that could be based on RDF. RDF is explained as a very simple grammar, using metadata and ontologies for semantic search and access. RDF has the ability to enhance 21st century libraries and support metadata interoperability in digital libraries, while maintaining the expressive power that was available to librarians when catalogues were physical artefacts.”

Haven’t read it yet, but I’m looking forward to it. (I still say that almost all libraries in 2007 are ‘digital libraries’.)
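
Since the abstract calls RDF “a very simple grammar,” here is roughly what a card-catalogue-style description looks like as triples. This is a minimal sketch using Python’s rdflib with Dublin Core properties, and the record itself is invented:

```python
# Resource description as RDF triples, via rdflib and Dublin Core.
# The record below is invented for illustration.
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import DC

g = Graph()
book = URIRef("http://example.org/book/moby-dick")

# Every statement is just a (subject, predicate, object) triple;
# that's the entire grammar.
g.add((book, DC.title, Literal("Moby Dick")))
g.add((book, DC.creator, Literal("Melville, Herman, 1819-1891")))
g.add((book, DC.subject, Literal("Whaling -- Fiction")))

print(g.serialize(format="turtle"))
```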

My library has online access to CCQ via Haworth Press.

“Broken”, huh?

Irvin Flack asks:

Jonathan, you say “our current metadata environment is seriously and fundamentally broken in several ways”. What are the ways in which it is broken? I would say the cataloguing community have just been overtaken by a tsunami of change in the last ten years (mainly the shift to digital information) and is still working out how best to respond and adapt.

I suspected someone would ask me that after the last post. A definitive argument for why, and in what ways, our current environment is broken has yet to be written, and it is not an easy thing to write. All I can do is provide a sketch of some notes toward that thesis, which I’ll try to do here.

Continue reading ““Broken”, huh?”

My Cataloging/Metadata Credo

I think our current metadata environment is seriously and fundamentally broken in several ways.

I do NOT think the solution lies in getting rid of everything we’ve got, or in nothing but machine analysis of full text. I think the solution requires the continual engagement of metadata professionals, and always will. We will always need catalogers: that is, metadata professionals involved in the generation and maintenance of metadata. Because that’s what catalogers are and have always been. Continue reading “My Cataloging/Metadata Credo”