“Broken”, huh?

Irvin Flack asks:

Jonathan, you say “our current metadata environment is seriously and fundamentally broken in several ways”. What are the ways in which it is broken? I would say the cataloguing community have just been overtaken by a tsunami of change in the last ten years (mainly the shift to digital information) and is still working out how best to respond and adapt.

I suspected someone would ask that of me after the last post. A definitive argument/explanation for why/what is broken in our current environment has yet to be written, and is not an easy thing to do. All I can do is provide a sketch of some notes toward that thesis, which I’ll try to do here.

1. The issues brought up in the LC working group’s Users and Uses meeting are one good place to start for some overall background. Karen Coyle provides a good summary that includes some of the issues.

2. There is far too much duplication of labor among working catalogers. We lack a good technical and customary infrastructure for efficiently sharing corrections and improvements made in one location with the larger community. We fail to take advantage of as much work as we could for the larger good, and have significant resources being spent on duplicating data.

3. There are very basic questions of high interest to our users that our data set is unable to answer, even though we are spending time recording information that ought to be available to answer these questions. One very good example–and it’s just one example–is Roy Tennant’s analysis of the inability to say whether full content is available online even though we are already spending time recording URL information.

  • We do not spend nearly enough time investigating and identifying and working to solve these sorts of problems. Why did Roy have people on AUTOCAT telling him this problem was clearly imaginary, and didn’t exist?

4. We have drawn a wall around what is and what is not of interest to ‘cataloging’ that is not necessarily backed up by any good rationale. Many things that we decide are not of interest (like the above issue?) are in fact of high significance to the success and ease our users will have in carrying out the tasks we mean to support. We do this even within the data found in a MARC record, and also according to type of material and source of data. I don’t mean that “Catalogers” need to apply the exact same standards to journal articles, institutional repository metadata, or data from Lorcan’s other three sources of metadata (thanks Peter). But we do need to consider it our responsibility to figure out how all these things can fit together. Catalogers need to be metadata professionals stepping up to figure out the overall control regime that can fit these things together.

  • We need to think seriously about how we will share our metadata with other communities and vice versa.
  • As an aside, one “pet peeve” of mine that isn’t really a “peeve” at all, but a serious problem, is the MARC-8 character encoding.

5. Related, we have too many different standards, controlled vocabularies, standards bodies, organizations, and sub-communities with overlapping domains, which produce un-harmonized data without enough coordination. One example of the problems this causes is form/genre information. Form/genre is of high interest to our users. And it is found in at least half a dozen places in the MARC record, from at least three different controlled vocabularies from three different places: LCSH $v information; GMD/SMD; and MARC allowed coded values and guidance from MARC itself (which does count as a controlled vocabulary!). How can we help users find what they need and understand what they’ve found (see faceted browsing) in terms of form/genre from this mish-mash?

  • To be clear, form/genre is a very difficult problem. Although it may seem simple to the users (“I just want to find videos(/biographies/science fiction)! What’s the problem?”), we all know that it’s a thorny set of concepts that are difficult to deal with systematically. That’s no excuse for not working on it though, and the apparatus we have in place instead binds us in inertia.
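To make the form/genre scatter from point 5 concrete, here is a minimal sketch in Python. The record layout (a dict holding a leader string and a list of tag/subfield pairs), the sample values, and the function name collect_form_genre are all assumptions invented for illustration; this is not any real MARC library’s API. The places it looks (type of record in Leader/06, GMD in 245 $h, SMD in 300 $a, form subdivisions in 6XX $v, genre/form terms in 655 $a) reflect common MARC 21 practice and are not exhaustive.

```python
# Hypothetical, simplified stand-in for a MARC bibliographic record:
# a 24-character leader plus a list of (tag, [(subfield code, value), ...]).
SAMPLE_RECORD = {
    "leader": "00000ngm a2200000 a 4500",
    "fields": [
        ("245", [("a", "Apollo 13"), ("h", "[videorecording]")]),
        ("300", [("a", "1 videodisc (140 min.)")]),
        ("650", [("a", "Space flight to the moon"), ("v", "Drama")]),
        ("655", [("a", "Feature films")]),
    ],
}


def collect_form_genre(record):
    """Return (source, value) pairs for every place that hints at form/genre."""
    # Leader position 06 is the coded "type of record" (e.g. "g" = projected medium).
    found = [("Leader/06 type of record", record["leader"][6])]
    for tag, subfields in record["fields"]:
        for code, value in subfields:
            if tag == "245" and code == "h":
                found.append(("245 $h (GMD)", value))
            elif tag == "300" and code == "a":
                found.append(("300 $a (extent, carries the SMD)", value))
            elif tag == "655" and code == "a":
                found.append(("655 $a (genre/form term)", value))
            elif tag.startswith("6") and code == "v":
                found.append((f"{tag} $v (form subdivision)", value))
    return found


if __name__ == "__main__":
    for source, value in collect_form_genre(SAMPLE_RECORD):
        print(f"{source}: {value}")
```

Even in this toy record, one conceptual fact (“this is a dramatic feature film on video”) comes back as five differently shaped values: a single coded letter, a bracketed term, a free-text extent statement, and two headings from different vocabularies. Any interface that wants to offer a form/genre facet has to reconcile all of them.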

That’ll do for a start. Something deserves to be said more generally about creating data that’s of use to machine processing (for the end goal of presenting things to users in better ways, naturally! We don’t care about the machines for the sake of machines) as well as for direct human consumption (Human finds record somehow->what we record has to be intelligible to human once found). But I’m still working out how to say/justify that clearly for an audience that doesn’t already agree with it.

Now, these are some very difficult problems. That we have them is not an indication that 100 years of cataloging practice has “failed”. In fact, the metadata system/environment we have now was very intelligently optimized for the social, economic, and technical context of the mid 20th century. It is arguably the best that could be done in that context. But that’s not the context we are in anymore. We have new demands and new possibilities and new challenges. Yes, “the cataloguing community have just been overtaken by a tsunami of change in the last ten years” (although I’d say it’s not just about the fact that information resources are increasingly in digital form. That’s in fact less significant to me than the change from card catalog to online environment, which I think we still haven’t made successfully–and that’s going on 20, 25 years.) The result is a broken system.

In the 21st century, our library metadata environment (by which I mean the interacting system composed of people, institutions, organizations, rules, standards, data sets, and computer software; “system” in the sense of General Systems Theory, not just “system” in the sense of “Systems Department”) is in fact, I still argue, broken.

It is the role of a professional and strong community of catalogers to work on fixing it. Don’t forget that Lubetzky [1], Cutter, and Panizzi were all in fact “cataloging radicals” challenging and rethinking how things had always been done for new social, economic, and technical contexts. Where is our Lubetzky for the 21st century?

[1] “Unfortunately, standard rules had become too much of a good thing. An undue proliferation of rules was the topic of “Crisis in Cataloging” as identified by the Librarian’s Committee of 1940 at the Library of Congress and immortalized by Andrew Osborn, one of the members of the Librarian’s Committee, in 1941.

“The Library of Congress together with ALA took the lead to examine the rules, and Seymour Lubetzky was hired to discover ‘Is this rule necessary?’ usually answering, ‘no’. Catalogers had become too focused on creating the perfect record according to LC standards, which they also complained not even LC had achieved.”

From “Cooperative Cataloging: past, present and future”, by Barry B. Baker. “Has also been published as Cataloging & classification quarterly, volume 17, number 3/4 1993”–T.p. verso. Found by me via a Google search.


5 thoughts on ““Broken”, huh?”

  1. I believe you are right, Jonathan. Many trite phrases come to mind, ranging from “you need to break some eggs to bake a cake” to “science advances one funeral at a time” (commonly attributed, perhaps apocryphally, to Max Planck). There are key individuals present in the environment who, for reasons such as fear, fatigue, tenure, and ignorance, hold too tightly to the status quo when it is clear to others that the status quo leads down a path of irrelevancy.

    These are stark terms, but it seems like factions of the profession are increasingly coming to opposing viewpoints. Granted, I’m a ones-and-zeros kind of guy, but I am hard pressed to conceive of a way to reconcile these positions.

    I know your post dealt quite well with why the act of cataloging needs to be fractured and reformed for the benefit of our users, but just as important I think is the need to bust apart the computer systems we currently rely on and remake them in something approaching 21st century software design theory. (You touch on this briefly towards the end, but I felt the need to emphasize it.)

  2. Yes, quite true, Peter, and thanks for emphasizing. Wouldn’t want anyone to get the idea that I’m saying “The computer systems are fine, it’s the data’s fault!”.

    It’s not a question of whether “the data” or “the computer system” are to “blame”. I certainly could write another post about the ways that our software is broken (both the software to support cataloger workflow, and the software to provide an end-user interface).

    But it’s also a vicious circle: both of these components are so inter-related. Both need to be reformulated–along with the social mechanism we call the ‘shared cataloging environment’–as part of an intentional rebuilding of the System that incorporates all of these things as components. As I keep repeating like a broken record, I believe that bibliographic systems design and cataloging/’metadata management’ are just two parts of what ought to be considered one unified discipline.

  3. Good post! We need to break down the requirements and processes of cooperative cataloging and rebuild it using modern tools.

    I’ve long thought that the best example of online collaboration is how open source programs come together using versioning systems, then populate out using repositories and semi-automated package management tools.

  4. Thanks Jonathan, they’re all good points. I can’t help thinking all this will take a lot of time, even if everyone is pulling in the same direction. A bit like turning an aircraft carrier.

    Further to the points about the new demands/challenges: it came as a jolt yesterday to read the following quote in a blog post by Wayne Hodgins on libraries and librarians (this is Wayne quoting Erik Duval, who is in turn quoting Wayne):

    ‘I never go there [the library]. Neither do my colleagues. Nor my students. Why would we? All our material is available on-line. If it isn’t, it kind of doesn’t exist…

    This is an area that is very much in flux: the conservative reflex with many librarians is easy to understand but they really risk “perfecting the irrelevant”, as my friend Wayne Hodgins would say.’

    I’m not saying I agree with this POV — but we have to accept it’s out there.

  5. Well, a new doctor happened to be making small talk with me and when I mentioned I was a librarian he said the same thing. “Yeah, I don’t ever use the library anymore, I get all my journal articles online.”

    I said, “Well, the people that make sure those journal articles are THERE for you to get online? And who try to make sure you can find em when you’re looking for them? And make it easier to find them? That’s librarians and the library doing that, man! You ARE using the library when you get those journal articles online! In fact, my job is all about those computer systems that get you those journal articles!” (I work with link resolvers and federated search).

    Libraries indeed have to figure out what services to provide to meet the actual information needs of our actual users. And there is no shortage of need, or of services! Some of these services will indeed not require the patron to set foot in the library. [For our own preservation, we better try to make sure the patron knows our role in them anyway, true.]

    I have no doubt that many of these services, whether or not the user sets foot in a library and whether or not they are interested only in online materials, will continue to require the contribution of metadata professionals. That is, catalogers.

