Code4Lib Journal December 28, 2007

Posted by jrochkind in General.

The first issue of the Code4Lib Journal is out. I have nothing more to say about it than what I said in my editorial introduction, except to reiterate that this project ended up taking quite a bit more time than I naively thought it would!

Update Dec 28. It occurs to me that the ‘qualification’ for an article to get into a standard scholarly journal might be “Is this reporting significant research?” In contrast, I hope the ‘qualification’ for Code4Lib Journal is “Is this article going to be helpful to others trying to improve library services?” You can have an article about really good research, and the article might accurately report that research, but it might not be very good at explaining to someone else what they can actually _do_ with it (to repeat it, or to act upon what they’ve found). This could be because of the way it’s written, or because of what’s left out. Personally (and I only speak for myself), I think that article would need more work before going in c4lj. On the other hand, there can be an article that isn’t about original research _at all_, but is incredibly helpful to others in innovating in their library, and that would be a shoo-in for c4lj, but probably wouldn’t qualify for a journal with a more traditionally scholarly mission.

WoGroFuBiCo December 14, 2007

Posted by jrochkind in Theory, cataloging.

Comments on the report of the LC Working Group on the Future of Bibliographic Control (WoGroFuBiCo in William Denton’s amusing coinage) are due. I still haven’t had time to read the document thoroughly, but I guess I’d better prepare what thoughts I have to send in. (Two weeks for review seems rather skimpy to me, especially at this time of year, which is a busy end of a semester for those of us in academic environments.)

On my first reading of the report, I was frankly filled with a kind of elation. I think it does a very good job of analyzing the present environment and pointing the direction we need to go–if not the final destination. The basic perspective on the library metadata control environment evidenced here is not necessarily the consensus in the library field. But I think they’ve gotten it exactly right, and I hope the report coming from such an influential cataloging player (LC) can help to build that consensus around how we understand what’s going on and where to go. The headings of the report, “Increase the efficiency of bibliographic production”; “Position our technology/community for the future (in particular with regard to recognizing that software is sometimes the ‘audience’ of our efforts)”–these are exactly the right way to frame our challenges.

Then I read Diane Hillmann’s comments and found myself agreeing wholeheartedly with them too. While Hillmann is, I think, largely sympathetic to the aims of the working group, the devil is in the details. I realized that the report is pretty short on specific actionable recommendations. More detail would be welcome–or at least the recognition that more detail is needed and a recommendation that certain task forces need to develop that detail. One example would be Martha Yee’s comments, I think on the RDA-L list, that reducing duplication of authority work requires a better technical infrastructure for automated sharing of metadata, not just a vague “promoting wider participation” or what have you. I agree wholeheartedly, and would extend that to cover ‘bib’ data too, not just ‘authority’ data. I’d quote her on that if the RDA-L list archives didn’t stop in August.

There is only one place in the report that raises serious red flags for me, and that’s the recommendations on RDA and FRBR, especially around 3.2.1. To explain my concerns, let’s start with a great framing of the metadata situation from Stu Weibel (quoted a bit out of order to approach this the way I want):

I am asserting that embedding the library in the open Web demands:

  1. A coherent [content] model of what we are describing and the relationships among those entities, and in which each entity is identified with a URI…
  2. A carrier syntax that lives comfortably on the Web (the DC Abstract Model is my candidate)
  3. Rules for populating agreed structures (that at which RDA seems to be failing so earnestly).

I suspect that the Working Group would agree with this too (although it would be nice if it were more explicitly highlighted in the report)–but the view that applying this sort of metadata control is both necessary and vital is not necessarily shared across the library community. Whether an audience agrees with this or not will color how that audience reads the Working Group’s recommendation, in potentially dangerous ways the Working Group should take care to avoid.

Stu goes on to say (or actually says first):

There is exactly one candidate for a content model that captures the relations among salient bibliographic entities that are needed to anchor library assets in the larger information sphere: FRBR. It feels roughly right to most, though it would be unwise to underestimate the time we can (ill-afford) to spend on thrashing around in the details.

I again agree entirely, and would add that there is exactly one serious candidate for that third pillar, the rules or guidelines for how we take the real world of concrete objects and capture it in recorded descriptions according to our content model: RDA.

To be sure, neither FRBR nor RDA is perfect. But they are what we have: they are the only serious candidates we have for filling these vital roles. And they have both come out of significant community work. The drafters of RDA especially have had to deal with very difficult competing goals and priorities from various constituencies and funders. If we do not think they have been as successful as we would like–what makes us think we can do better by starting over? Because I think this is the implied sub-text of this recommendation–that both RDA and FRBR may not be worth continuing with. I don’t think we have any other choice though: Again, these are the only serious candidates we have. And if this is not the sub-text the Working Group meant to imply, they should be clear about what they mean, because if I read this sub-text into it even though I find it a disastrous course of action, I can guarantee that others who find it a welcome suggestion (because they do not believe a formal content vocabulary or rules for applying it other than AACR2 with tweaks are necessary) will also read it as unspoken sub-text.

We do not have the luxury of time to start over. Nor, in “internet time”, do we have the luxury of waiting until one of these three “pillars” of contemporary metadata control is perfected before continuing with the others. Work on all of them must go on in parallel. These are the serious candidates we have–we may need to change the processes through which they are developed (e.g., back-room closed-session dealings are no longer the way standards are best done in the internet world; the reluctance to create implementations before standards are “finished” needs to be replaced by creating prototype implementations to inform the finishing of the standards)–but what we need to do is continue to develop FRBR and RDA, in parallel, with all deliberate speed.

So when the Working Group recommends that we “suspend” RDA work, I wonder what they are proposing as an alternative, and worry that this recommendation will be heard by an audience that does not agree with our premises–that we need a content model, and rules/guidance for applying it–as evidence that we do not need these things at all. Or that we have the luxury of time to make them perfect before we seriously engage with them. If the Working Group does not want to send this message, I urge them to reword this recommendation carefully.

In fact, I think that the RDA/DCAM task force is just about the best hope we have to “specify…, model and represent a Bibliographic Description Vocabulary, drawing on the work of FRBR and RDA, the DCAM and appropriate semantic Web technologies.” This is the already existing group of people prepared and qualified to do this work–it is, I would say, the best candidate we have for starting this work as soon as we need to. I would rather see the Working Group recommend that this group be funded so they can get cracking than recommend that RDA work be “suspended”–which, again, some audiences that do not agree on the importance of rationalizing our work to a contemporary metadata control regime (something I think the Working Group does see) are going to hear as evidence that this is not important or urgent after all, that it’s being “suspended”.

The evidence that I see is that the RDA JSC is trying mightily to accomplish exactly what the Working Group calls for, so calling to suspend the one influential endeavor doing that seems just disastrous to me. If the Working Group believes the way they are going about it is wrong, then the recommendation should be about how that way of going about it should be changed, not that the work be “suspended”. We have no time for suspension.

In particular, I share a concern that certain parts of the operational/business model for RDA are problematic for achieving a good solution–specifically, that the committee of principals which is funding this work sees it as a publishing project which must produce a return on investment (and quickly), rather than as a metadata control standards-making project with different business-case requirements. The consequences of this for the RDA process may indeed be a fatal flaw in what is produced; but on the other hand, the existing cataloging community constituencies do need to have buy-in to the process and product, and this is theoretically what the committee of principals represents. So a solution to this dilemma is not clear to me. But if this is the Working Group’s concern, they again need to make it explicit and suggest alternative approaches, not simply suggest a suspension while we wait for vague research work to be done by parties that have not yet stepped up to do it.

We do not have time for that.

When the Working Group lists their “consequences of maintaining the status quo” for this overall section including these parts, I think they are not nearly dire enough. The consequences of not proceeding with this effort to, in the Working Group’s words, “model and represent a Bibliographic Description Vocabulary” (which I think means accomplishing Stu’s three pillars) are dire: our metadata efforts will become increasingly irrelevant to our actual information environment.

podcast on library software market December 10, 2007

Posted by jrochkind in General.

Even though I never really listen to podcasts, I still participated in a Talis podcast that ended up being a sort of free discussion on the state of the library software market.

Why FRBR entity model matters: FRBR considered as set relationships December 7, 2007

Posted by jrochkind in Theory, cataloging.

In response to some recent debate on the lists over whether the FRBR entity model really matters or is useful or is acceptable, I’m going to throw out another way of looking at the FRBR Group 1 entities. My aim is to clarify some issues (maybe! Or stir the pot yet more!), and to make it clear why I think the FRBR model is a reasonably good approximation to serve as a ‘skeleton’ for our metadata, and why the Group 1 relationships are especially important to the information landscape. (I believe this is just an explanation of what’s in FRBR, not a change; just another way of explaining what’s already there.)

=> An item is a concrete physical thing in your hand, naturally. That’s straightforward, yes?

=> Two items belong to the same manifestation if they are physically identical. Or, in the case of digital items that have no physicality, if they are bitwise identical, which I guess is a good analog.

  • To be sure, some physical features are more important to us than others. A dogeared page makes something no longer physically identical, but we still consider it the same manifestation. So we really sort of mean ‘physically identical at the point of production’
  • Also, as Jim Weinheimer usefully reminds us, we do not go to the lengths needed to ensure with 100% confidence that two items are physically identical; we just approximate, and decide that they can be treated as physically identical for our users’ needs, trying to optimize user benefit per staff time put in.

=> Two items belong to the same expression if they are textually identical. Or more generally for non-textual materials, if their information content is identical (not revised or amended–and certainly not entirely different!).

  • This is even fuzzier than physically identical, but still of importance to users! The FRBR report itself makes clear that realistically, this is more like “can be considered textually/informationally identical, for the purposes of a user community, balanced with the resources available to make this determination.” Many users might consider two items textually identical despite some trivial differences, whereas a rare books scholar might consider the tiniest difference vital.
  • And yes, it’s clearer how this applies to textual materials than non-textual, but I think it still matters for non-textual materials. Is this print the very same picture as this other print, or did the artist ‘revise’ the picture before making the other print? With music it’s even trickier, but it still matters to the user whether the information content has changed or not (though with music there is less clear understanding of when the information content has changed; the same ensemble playing the same arrangement on a different day may be an information difference that matters to the music community).

=> Two items belong to the same work if… well, they belong to the same work. This one is entirely culturally determined, and there’s no good way to say it in any more basic language (although FRBR tries), but while the concept of ‘work’ is entirely cultural, in Western culture at least it is a very important concept, which matters quite a bit to users. Basically, we know it when we see it. Is this thing an edition of Shakespeare’s Hamlet, or is it not? This is something that matters quite a bit to the user, who may be looking for an edition of Hamlet, where any one will do. Or may be looking for all editions of Hamlet.

  • To be sure, it’s a judgment that is subjective, contextual, and which reasonable people can sometimes disagree on–especially in edge cases. But it’s still an important one to users! Especially in strange edge cases, one person may disagree with another about whether an item is an edition of Hamlet or not, but the notion of “Hamlet”, and the idea that various expressions and manifestations (as defined above as sets of items sharing physical and textual identity) may embody “Hamlet”, is key to (at least Western) readers’/users’ conception of the bibliographic/information landscape. Whether the naive user uses the word ‘work’ or not, it’s still a key concept.

So, physical identity, textual/information identity, and embodying the same work—I think these are fundamental divisions of the bibliographic/information universe, which are very important to our efforts to give the user a somewhat organized approach to that universe instead of just chaos (which all too often our OPACs give them now!). I think this means they do serve as a good basic model for the general purpose bibliographic/information universe, a skeleton on which more specific things can be built, but which still gives us a way to compare and explain anatomies, as it were. I think these characteristics of relationships between items are also especially important and basic ones to ‘bibliographic control’, and therefore justified in having a central place in the FRBR model–but that doesn’t mean that other relationships aren’t also present and important, and in certain ‘edge cases’ other relationships may even be more important.
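
To make the set view above concrete, here is a minimal sketch in Python. It is only an illustration, not anything taken from the FRBR report: the `Item` class, its field names, and the sample data are all hypothetical, and each of the three judgments (physically identical, textually identical, same work) is assumed to have already been made by a cataloger and recorded as a simple label. The fuzzy, community-dependent part of those judgments is exactly what the code leaves out.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """One concrete copy in hand. All field names are hypothetical."""
    barcode: str        # identifies this particular copy
    production_id: str  # stand-in for "physically identical at the point of production"
    content_id: str     # stand-in for "textually/informationally identical"
    work_label: str     # stand-in for the cultural judgment "this is Hamlet"


def group_by(items, key):
    """Partition items into sets of barcodes sharing the same value of `key`."""
    groups = defaultdict(set)
    for item in items:
        groups[key(item)].add(item.barcode)
    return dict(groups)


# Invented sample data: four copies, three printings, two texts, one work.
items = [
    Item("b1", "printing-1", "text-A", "Hamlet"),
    Item("b2", "printing-1", "text-A", "Hamlet"),
    Item("b3", "printing-2", "text-A", "Hamlet"),
    Item("b4", "printing-3", "text-B", "Hamlet"),
]

# Each Group 1 entity above the item is a set of items under one relationship.
manifestations = group_by(items, lambda i: i.production_id)
expressions = group_by(items, lambda i: i.content_id)
works = group_by(items, lambda i: i.work_label)

print(manifestations)
print(expressions)
print(works)
```

In this toy data every manifestation set sits inside one expression set, which sits inside the single work set, which is the nesting the FRBR Group 1 entities describe. Real collections are messier, and the labels themselves are where the cataloger's judgment lives.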