
Tagging and motivation in library catalogs? May 10, 2008

Posted by jrochkind in Practice, Theory, business, cataloging.
6 comments

Eh, this comment was long enough I might as well post it here too, revised and expanded a bit. (I’ve been flagging on the blogging lately.) Karen Schneider thinks about “tagging in a workflow context”:

Tagging in library catalogs hasn’t worked yet for a number of reasons…

Karen goes on to discuss much of the ‘when’ of tagging, but I still think the ‘why’ of tagging is more relevant. Why would a user spend their valuable time adding tags to books in your library catalog?

I think the vast majority of successful tagging happens when users tag to aid their OWN workflow, generally to keep track of things. You tag on delicious to keep track of your bookmarks. You tag on librarything to organize your collections. The most successful tagging isn’t done to help _other_ people find things, but to keep track of things yourself–at least not at first, not in the tagging that builds the successful tag ecology. In most cases of a successful tagging community where people do tag to help others find things, I’d suggest it is because it somehow benefits them personally to help people find things. Such as, maybe, tagging your blog posts on wordpress.com because you want others to find your blog posts–still a personal benefit.

A successful tag ecology is generally built on tagging actions that serve very personal interests, interests that do not need the successful tagging ecology on top of them and that are served even if you are the only one who is tagging. The successful tagging ecology which builds out of it–and which goes on to provide collective benefit that was not the original intent of the taggers–is an epiphenomenon.

Amazon might be a notable exception to this hypothesis, perhaps because it was already such a universally used service before it added tagging (unlike our library catalogs). I would be interested to understand what motivates users to tag in Amazon. Anyone know of anyone who’s looked into this? It’s also possible that if Amazon’s tags are less useful, it is in fact because of this lack of personal benefit from tagging.

So what personal benefit can a user get from tagging in a library catalog? If we provided better ‘saved records’ features, perhaps: keeping track of books you’ve checked out, books you might want to check out, etc. But I’m not sure if our users actually USE our catalogs enough to find this useful, no matter how good a ‘saved records’ feature we provide. In an academic setting, items from the catalog no longer necessarily make up a majority of a user’s research space.

To me, that suggests a question: can we capture tags from somewhere else? My users export items to refworks. Does refworks allow tagging yet? If it did, is there a way to export (really re-import) these tags BACK to the catalog, when a user tags something? But even if so, it would be better if Refworks somehow magically aggregated tags from _different_ catalogs, of the same work. But that relies on identifier issues we haven’t solved yet. If our catalogs provide persistent URLs (which they don’t usually, which is a tragedy), users COULD tag in delicious if they wanted to. Is there a way to scan delicious for any bookmarks that include your catalog’s URL, and import their tags back in?
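The "scan and re-import" idea can be sketched roughly as follows. This assumes the catalog has persistent record URLs under one prefix, and that bookmarks have already been harvested somehow into (url, tags) pairs. The prefix, the record id format and the sample data are all hypothetical, and no real delicious API is used here.

```python
from collections import Counter, defaultdict

# Hypothetical persistent-URL prefix for catalog records (an assumption).
CATALOG_PREFIX = "http://catalog.example.edu/record/"


def tags_by_record(bookmarks, prefix=CATALOG_PREFIX):
    """Aggregate tags from harvested bookmarks that point at catalog records.

    bookmarks: iterable of (url, [tag, ...]) pairs.
    Returns {record_id: Counter({tag: count})}.
    """
    result = defaultdict(Counter)
    for url, tags in bookmarks:
        if not url.startswith(prefix):
            continue  # not a bookmark of one of our records
        # Everything after the prefix, minus any query string, is the record id.
        record_id = url[len(prefix):].split("?")[0].strip("/")
        if not record_id:
            continue
        # Fold case so "FRBR" and "frbr" count as the same tag.
        result[record_id].update(tag.lower() for tag in tags)
    return dict(result)


# Made-up sample data, for illustration only.
sample = [
    ("http://catalog.example.edu/record/b1234", ["FRBR", "cataloging"]),
    ("http://catalog.example.edu/record/b1234", ["frbr"]),
    ("http://elsewhere.example.org/page", ["frbr"]),
]
print(tags_by_record(sample))
```

Without persistent URLs there is nothing stable to match on, which is the identifier problem again.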

In addition to organizing one’s research and books/items of interest, are there other reasons it would serve a patron’s interest to tag, other things they could get out of it? A professor might tag books of interest for their students, perhaps (not that most professors are looking for more technological things to spend time on helping students, but some are). And librarians themselves might tag things with non-controlled-vocabulary topic areas they know would be of use to a particular class or program or department, with terms of use to those classes or programs or departments. Can anyone think of any other reasons tagging could be of benefit to a user (not whether a successful tagging ecology would be of collective benefit–but benefits an individual user can get from assigning tags in a library catalog)?

Worldcat covers a much larger share of my academic users’ research universe than my own catalog. And worldcat has solved the “aggregating different copies of this work from different libraries” problem to some extent. Which is why it would make so much sense for worldcat to offer a tagging service–one which can be easily incorporated into your own local catalog for both assigning and displaying tags (if not for searching), à la LibraryThing. It is astounding to me that OCLC hasn’t provided this yet. It seems to be very ‘low hanging fruit’ (a tagging interface on worldcat.org with a good API is not rocket science) that is worth a try.

State of FRBR March 12, 2008

Posted by jrochkind in Theory, cataloging.
2 comments

Thoughts on FRBR sparked by a discussion on RDA-L (archives for month in progress not online, sadly) about FRBR modelling of moving pictures (thanks Martha Yee), which itself was sparked by Diane Hillmann’s excellent stab at creating some RDA cataloging scenarios (which I can’t find the url for, having no archive of RDA-L to search. You know, I should go and subscribe a google list to it and make my own archive henceforth). (more…)

Inventing how to think and talk about metadata: DCAM and RDF February 19, 2008

Posted by jrochkind in Theory, cataloging.
3 comments

So, there’s some discussion going on about DCAM and RDF, based off of Stuart Weibel’s blog here, here, and here. With some supplementary commentary by Peter Murray.

One fundamental question: Do both RDF and DCAM do the same thing? (I’m not sure if I mean RDF alone here, or RDF plus a ‘suite’ of things specifically designed by the RDF development community to supplement RDF, like OWL, which I know nothing about!). Do they do different things? What is each ‘framework’ intended to do anyway? [Okay, I guess that wasn’t just one question].

Stu Weibel suggests that they do not both do the same thing, although there may be some overlap, but they are generally compatible and complementary. And PeteJ (Pete Johnston I think) seems to disagree, and thinks that both RDF and DCAM have very similar roles, although he thinks that each does something the other doesn’t, but I’m still confused as to what that something is in his analysis–and if that something matters.

It has occurred to me before that there’s an astounding amount of difficulty over being able to put this stuff into words and know what we’re talking about: Over our mental models of metadata and metadata control, and the words we use to talk about it. (more…)

Salivating over authority data February 15, 2008

Posted by jrochkind in cataloging, identifiers.
3 comments

From the LCCN permalink FAQ:

Are LCCN Permalinks available for Library of Congress authority records?
Not at this time, but the Library is exploring options for adding this functionality.

Now that would actually be huge. If my software can look up an authority LCCN, and get back MARCXML for the complete MARC authority, it would make it so much easier to do so much more with authorities in software that handles existing bib records.

Now, provide me a machine-actionable interface where I can keyword search the authorities too (including LCSH and NAF), and I’d be able to do so much.

LCCN permalink February 14, 2008

Posted by jrochkind in business, cataloging, identifiers.

LCCN permalink. A pretty simple thing, but one that makes so much sense. That LC is providing it encourages one to hope such sensical but not quite as simple things may be in the pipeline too.

Also interesting to note that, as far as I’m aware, this means that LC has beaten OCLC to having a persistent URI for an (approximated) manifestation that resolves to a publicly accessible, structured, machine-actionable bib record (in a choice of formats). That LC did it first surely has as much or more to do with business models than it does with tech. It’s actually gratifying and surprising that LC has done it. Stuart Weibel has surely taken note.
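To make "persistent URI from an LCCN" concrete, here is a rough sketch of building such a link from an LCCN as it appears in a record. The normalization steps follow my understanding of the published LCCN normalization rules (drop blanks, drop anything from a slash onward, zero-pad the serial part after a hyphen to six digits). The base URL and the format suffix are assumptions to be checked against LC's own documentation before relying on them.

```python
# Assumed permalink base URL; verify against the LCCN Permalink FAQ.
PERMALINK_BASE = "https://lccn.loc.gov/"


def normalize_lccn(raw):
    """Normalize an LCCN string, e.g. 'n78-890351' -> 'n78890351'."""
    s = "".join(raw.split())      # remove all blanks
    s = s.split("/", 1)[0]        # drop a slash and everything after it
    if "-" in s:
        left, right = s.split("-", 1)
        if not right.isdigit() or len(right) > 6:
            raise ValueError("unexpected serial part in LCCN: %r" % raw)
        s = left + right.zfill(6)  # left-pad the serial part to six digits
    return s


def permalink(raw, fmt=None):
    """Build a permalink; fmt (e.g. 'marcxml') is an assumed format suffix."""
    url = PERMALINK_BASE + normalize_lccn(raw)
    return url + "/" + fmt if fmt else url


print(permalink("n78-890351"))
print(permalink("85-2", fmt="marcxml"))
```

The point is only that the identifier in hand is enough to compute the URL, with no search step in between.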

issn metadata access February 12, 2008

Posted by jrochkind in Practice, business, cataloging.

Did you guys know that issn.org sold Z39.50 access to the ISSN registry/portal? I didn’t.

What might you want to use this for? Well, if the “linking ISSN” is deployed successfully, and the information is successfully included in the information available from the ‘issn portal’, then this is a machine-actionable source of correspondences between ISSNs that really represent the same title in different formats. I trust that many of my readers can think of all sorts of uses they could make of being able to embed that information in their various discovery applications.

OCLC xISSN also can potentially provide some of this data in machine actionable form. (Haven’t explored it yet myself). I assume that xISSN correspondences are currently algorithmically/heuristically generated from what information is available in a cataloging record, as opposed to the “linking ISSN” based metadata, which presumably will be manually controlled? But then an interesting question is the cost comparison of these two services licensed for the uses we’d want to put them to. Would be nice to have two competing metadata web services available for a change, instead of usually having NONE that do what we need.
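As a rough illustration of what a discovery application might do with such correspondences, the sketch below assumes the data has already been obtained as simple (ISSN, linking ISSN, medium) rows. The rows are made-up placeholders, not real ISSNs, and nothing here reflects the actual interface of the ISSN portal or xISSN.

```python
from collections import defaultdict

# Placeholder rows: (ISSN, linking ISSN, medium). Invented for illustration.
ROWS = [
    ("1111-1111", "1111-1111", "print"),
    ("2222-2222", "1111-1111", "online"),
    ("3333-3333", "3333-3333", "print"),
]


def build_index(rows):
    """Index rows both ways: ISSN -> linking ISSN, linking ISSN -> members."""
    by_issn = {}
    by_link = defaultdict(list)
    for issn, issn_l, medium in rows:
        by_issn[issn] = issn_l
        by_link[issn_l].append((issn, medium))
    return by_issn, by_link


def other_formats(issn, by_issn, by_link):
    """Return (ISSN, medium) pairs for the same title in other formats."""
    issn_l = by_issn.get(issn)
    if issn_l is None:
        return []  # unknown ISSN: no correspondences to offer
    return [(i, m) for i, m in by_link[issn_l] if i != issn]


by_issn, by_link = build_index(ROWS)
print(other_formats("1111-1111", by_issn, by_link))
```

A link resolver could use a lookup like this to offer the online version when the citation carries only the print ISSN.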

Identifiers and Display Labels again January 17, 2008

Posted by jrochkind in Theory, cataloging.

An RDA-L post, more yet again on a topic I’ve been circling for some time, after Ed Jones usefully approaches it too.

Ed Jones wrote:

On a related question, as long as this collocating function is satisfied, I don’t know that it matters anymore whether a name is unique in its display form.

Yeah, this is something I’ve been talking about for a while too and trying to get people to think about.

If the collocating function is satisfied, then we can group works and show related works without any ambiguity, EVEN IF we have no unique display form. Now, it might result in a confusing display.

Ed Jones

  • Work 1
  • Work 2
  • Work 3

Ed Jones

  • Work A

Two different Ed Joneses, correctly grouped separately, but both showing up as “Ed Jones”. So it might be going too far to say “it doesn’t matter”. It matters to some extent. We might prefer to have a display title that can disambiguate. On the other hand, we might not need one. Maybe the above display is sufficient. Maybe the works listed under each Ed Jones are alone enough to tell the user which is which! (A birth date isn’t necessarily what the user needs). Maybe a calculation of the most common LCSH strings assigned to Ed Jones(1)’s works would be helpful.

But regardless of how important you think display uniqueness is, or how you think it’s best accomplished, the important point is this: If we can just get collocation uniqueness (which we could call ‘identifier uniqueness’ too), is that better than nothing? Yeah it is! A lot better than nothing. Display uniqueness may just be icing on the cake.
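The Ed Jones display above can be reproduced in a few lines, which may make the distinction easier to see. This is only a sketch: the `auth:1` style identifiers and the data are hypothetical stand-ins for whatever 'dumb' identifier an authority file provides.

```python
from collections import defaultdict

# Hypothetical authority identifiers; the display label is deliberately NOT unique.
LABELS = {"auth:1": "Ed Jones", "auth:2": "Ed Jones"}

# Each work points at its creator by identifier, never by display label.
WORKS = [
    ("Work 1", "auth:1"),
    ("Work 2", "auth:1"),
    ("Work 3", "auth:1"),
    ("Work A", "auth:2"),
]


def collocate(works):
    """Group work titles by creator identifier (the collocating function)."""
    groups = defaultdict(list)
    for title, creator_id in works:
        groups[creator_id].append(title)
    return groups


def render(works, labels):
    """Render the groups using the (possibly ambiguous) display label."""
    lines = []
    for creator_id, titles in collocate(works).items():
        lines.append(labels[creator_id])
        lines.extend("  • " + title for title in titles)
    return "\n".join(lines)


print(render(WORKS, LABELS))
```

Grouping touches only the identifier and rendering touches only the label, so either one can change without disturbing the other.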

This gets at the concepts I’ve been harping on for a while of conceptually separating the ideas of identifiers and display labels, because it makes it easier to think and talk about what we’re doing. Thanks Ed for explaining it very nicely.

(See my previous posts, The Purpose of Authority Control, Access Points as Identifiers, and Two Meanings of ‘Identifier’.)

Our current AACR2 headings serve both purposes. They serve the “identifier” purpose when they provide for collocation–but as Ed notes, this function could instead be served by a ‘dumb’ identifier. And they serve the “display label” purpose when they are used, well, as a display label–and this display label purpose could be served by something that is not unique, but could not be served by a ‘dumb’ (say strictly numeric, like OCLC num; or a URI) identifier. Two different purposes.

One reason this matters regardless of how we change our own systems of control is that we will increasingly be interfacing with other people’s systems that DO separate these functions, or maybe provide only one but not both of them. To get these things to mesh with our systems, we’ve got to understand that they are two separate functions. But I do think our systems of control can usefully be improved by separating these purposes to some extent too.

I keep meaning to write more about how the FRAD document totally misses the boat on this, but keep running into writer’s block. It’s unfortunate, because FRAD is the right place to try to get these things straight conceptually, but it totally confuses them instead.

(The collocating function will be satisfied if its underlying identifier is unique.) In some ways, simply using an unadorned name in the display–even if this results in a series of identical unadorned names–frees the catalog search engine for more interesting manipulations. At present, selecting my “John Smith” from pages and pages of the same name arranged and differentiated by birth year–not particularly useful–is a time-consuming task that will ultimately defeat me. If I can select my “John Smith” on a variety of useful criteria that I may be more likely to know–prolific author, nationality, profession, affiliation–I can quickly reduce the mountain of “John Smiths” to a manageable few. And these useful differentiating criteria may have been added to the authority record from a variety of data sources.

Ed Jones
National University (San Diego, Calif.)

WoGroFuBiCo December 14, 2007

Posted by jrochkind in Theory, cataloging.
1 comment so far

Comments on the report of the LC Working Group on the Future of Bibliographic Control (WoGroFuBiCo in William Denton’s amusing coinage) are due. I still haven’t had time to read the document thoroughly, but I guess I better prepare what thoughts I have to send them in. (Two weeks for review seems rather skimpy to me, especially at this time of year, which is a busy end of a semester for those of us in academic environments).

On my first reading of the report, I was frankly filled with a kind of elation. I think it does a very good job of analyzing the present environment and pointing the direction we need to go–if not the final destination. The basic perspective on the library metadata control environment evidenced here is not necessarily the consensus in the library field. But I think they’ve gotten it exactly right, and I hope the report coming from such an influential cataloging player (LC) can help to build that consensus around how we understand what’s going on and where to go. The headings of the report, “Increase the efficiency of bibliographic production”; “Position our technology/community for the future (in particular with regard to recognizing that software is sometimes the ‘audience’ of our efforts)”–these are exactly the right way to frame our challenges.

Then I read Diane Hillmann’s comments and found myself agreeing wholeheartedly with them too. While Hillmann is, I think, largely sympathetic to the aims of the working group, the devil is in the details. I realized that the report is pretty short on specific actionable recommendations. More detail would be welcome–or at least the recognition that more detail is needed and a recommendation that certain task forces need to develop that detail. One example would be Martha Yee’s comments, I think on the RDA-L list, about reducing duplication of authority work requiring a better technical infrastructure for automated sharing of metadata, not just a vague “promoting wider participation” or what have you. I agree wholeheartedly, and would extend that to cover ‘bib’ data too, not just ‘authority’ data. I’d quote her on that if the RDA-L list archives didn’t stop in August.

There is only one place in the report that raises serious red flags for me, and that’s the recommendations on RDA and FRBR, especially around 3.2.1. To explain my concerns, let’s start with a great framing of the metadata situation from Stu Weibel (quoted a bit out of order to approach this the way I want):

I am asserting that embedding the library in the open Web demands:

  1. A coherent [content] model of what we are describing and the relationships among those entities, and in which each entity is identified with a URI…
  2. A carrier syntax that lives comfortably on the Web (the DC Abstract Model is my candidate)
  3. Rules for populating agreed structures (that at which RDA seems to be failing so earnestly).

I suspect that the perspective of the Working Group would agree with this too (although it would be nice if it were more explicitly highlighted in the report)–but it is not necessarily one shared across the library community–that applying this sort of metadata control is both necessary and vital. Whether an audience agrees with this or not will color how that audience reads the working group recommendation, in potentially dangerous ways the Working Group should take care to avoid.

Stu goes on to say (or actually says first):

There is exactly one candidate for a content model that captures the relations among salient bibliographic entities that are needed to anchor library assets in the larger information sphere: FRBR. It feels roughly right to most, though it would be unwise to underestimate the time we can (ill-afford) to spend on thrashing around in the details.

I again agree entirely, and would add that there is exactly one serious candidate for that third pillar, the rules or guidelines for how we take the real world of concrete objects and capture it in recorded descriptions according to our content model: And that is RDA.

To be sure, neither FRBR nor RDA are perfect. But they are what we have: they are the only serious candidates we have for filling these vital roles. And they have both come out of significant community work. The drafters of RDA especially have had to deal with very difficult competing goals and priorities from various constituencies and funders. If we do not think they have been as successful as we would like–what makes us think we can do better from starting over? Because I think this is the implied sub-text of this recommendation–that both RDA and FRBR may not be worth continuing with. I don’t think we have any other choice though: Again, these are the only serious candidates we have. And if this is not the sub-text the Working Group meant to imply, they should be clear about what they mean, because if I read this sub-text into it when I find it to in fact be a disastrous course of action, I can guarantee that others who find it a welcome suggestion (because they do not believe a formal content vocabulary or rules for applying it other than AACR2 with tweaks are necessary) will also read that as unspoken subtext.

We do not have the luxury of time to start over. Nor, in “internet time”, do we have the luxury of waiting until one of these three “pillars” of contemporary metadata control is perfected before continuing with the others. Work on all of them must go on in parallel. These are the serious candidates we have–we may need to change the processes through which they are developed (eg., back-room closed session dealings are no longer the way standards are best done in the internet world; the reluctance to create implementations before standards are “finished” needs to be replaced by creating prototype implementations to inform the finishing of the standards)–but what we need to do is continue to develop FRBR and RDA, in parallel, with all deliberate speed.

So when the Working Group recommends that we “suspend” RDA work, I wonder what they are proposing as an alternative, and worry that this recommendation will be heard by an audience that does not agree with our premises–that we need a content model; and rules/guidance for applying it–as evidence that we do not need these things at all. Or that we have the luxury of time to make them perfect before we seriously engage with them. If the Working Group does not want to send this message, I urge them to reword this recommendation carefully.

In fact, I think that the RDA/DCAM task force is just about the best hope we have to “specify…, model and represent a Bibliographic Description Vocabulary, drawing on the work of FRBR and RDA, the DCAM and appropriate semantic Web technologies.” This is the already existing group of people prepared and qualified to do this work–it is, I would say, the best candidate we have for starting this work as soon as we need to. I would rather see the Working Group recommend that this group be funded so they can get cracking, rather than recommend that RDA work be “suspended”–which again, some audiences which do not agree on the importance of rationalizing our work to a contemporary metadata control regime (something I think the Working Group does see), are going to hear as evidence that this is not important or urgent after all, that it’s being “suspended”.

The evidence that I see is that the RDA JSC is trying mightily to accomplish exactly what the Working Group calls for, so calling to suspend the one influential endeavor doing that seems just disastrous to me. If the Working Group believes the way they are going about it is wrong, then the recommendation should be as to how that way of going about it should be changed, not recommend that the work be “suspended”. We have no time for suspension.

In particular, I share a concern that certain parts of the operational/business model for RDA are problematic for achieving a good solution–specifically, that the committee of principals which is funding this work sees it as a publishing project which must produce a return on investment (and quickly), rather than as a metadata control standards-making project with different business-case requirements. The consequences of this for the RDA process may indeed be a fatal flaw in what is produced; but on the other hand, the existing actual cataloging community constituencies do need to have buy-in to the process and product, and this is theoretically what the committee of principals represents. So the solution to this dilemma is not clear to me. But again, if this is the Working Group’s concern, they need to make it explicit and suggest alternative approaches, not simply suggest a suspension while we wait for vague research work to be done by parties that have not yet stepped up to do it.

We do not have time for that.

When the Working Group lists their “consequences of maintaining the status quo” for this overall section including these parts, I think they are not nearly dire enough. The consequences of not proceeding with this effort to, in the working group’s words, “model and represent a Bibliographic Description Vocabulary” (which I think means accomplishing Stu’s three pillars) are dire: our metadata efforts will become increasingly irrelevant to our actual information environment.

Why FRBR entity model matters: FRBR considered as set relationships December 7, 2007

Posted by jrochkind in Theory, cataloging.
1 comment so far

In response to some recent debate on the lists over whether the FRBR entity model really matters or is useful or is acceptable, to clarify some issues (maybe! Or stir the pot yet more!), and make it clear why I think the FRBR model is a reasonably good approximation to serve as a ‘skeleton’ for our metadata, and why the Group 1 relationships are especially important to the information landscape, I’m going to throw out another way of looking at the FRBR Group 1 entities. (I believe this is just an explanation of what’s in FRBR, not a change; just another way of explaining what’s already there).

=> An item is a concrete physical thing in your hand, naturally. That’s straightforward, yes?

=> Two items belong to the same manifestation if they are physically identical. Or in the case of digital items that have no physicality, if they are bitwise identical, I guess is the good analog.

  • To be sure, some physical features are more important to us than others. A dogeared page makes something no longer physically identical, but we still consider it the same manifestation. So we really sort of mean ‘physically identical at the point of production’
  • Also, as Jim Weinheimer usefully reminds us, we do not go to the lengths needed to really ensure with 100% confidence that two items are physically identical; we just approximate, and decide that they can be treated as physically identical for our users’ needs, trying to optimize user benefit per staff time put in.

=> Two items belong to the same expression if they are textually identical. Or more generally for non-textual materials, if their information content is identical (not revised or amended–and certainly not entirely different!).

  • This is even fuzzier than physically identical, but still of importance to users! The FRBR report itself makes clear that realistically, this is more like “can be considered textually/information identical, for the purposes of a user community, balanced with the resources available to make this determination.” Many users might consider two items textually identical despite some minor trivial differences; whereas a rare books scholar might consider the tiniest difference vital.
  • And yes, it’s more clear how this applies to textual materials than non-textual, but I think it still matters for non-textual materials. Is this print the very same picture as this other print, or did the artist ‘revise’ the picture before making the other print? With music it is even trickier, but it still matters to the user if the information content has changed or not (but with music there is less clear understanding of when the information content has changed; the same ensemble playing the same arrangement on a different day may be an information difference that matters to the music community).

=> Two items belong to the same work if… well, they belong to the same work. This one is entirely culturally determined, and there’s no good way to say it in any more basic language (although FRBR tries), but while the concept of ‘work’ is entirely cultural, in Western culture at least it is a very important concept, which matters quite a bit to users. Basically, we know it when we see it. Is this thing an edition of Shakespeare’s Hamlet, or is it not? This is something that matters quite a bit to the user, who may be looking for an edition of Hamlet, any one will do. Or may be looking for all editions of Hamlet.

  • To be sure, it’s a judgment that is subjective, contextual, and which reasonable people can sometimes disagree on–especially in edge cases. But it’s still an important one to users! Especially in strange edge cases, one person may disagree with another about whether an item is an edition of Hamlet or not, but the notion of “Hamlet”, and that various expressions and manifestations (as defined above as sets of items sharing physical and textual identity) may embody “Hamlet”–is a key thing to (at least Western) readers’/users’ conception of the bibliographic/information landscape. Whether the naive user uses the word ‘work’ or not, it’s still a key concept.

So, physical identity, textual/information identity, and embodying the same work—I think these are fundamental divisions of the bibliographic/information universe, which are very important to our efforts to give the user a somewhat organized approach to that universe instead of just chaos (which all too often our OPACs give them now!). I think this means they do serve as a good basic model for the general purpose bibliographic/information universe, a skeleton on which more specific things can be built, but which still gives us a way to compare and explain anatomies, as it were. I think these characteristics of relationships between items are also especially important and basic ones to ‘bibliographic control’, and therefore justified in having a central place in the FRBR model–but that doesn’t mean that other relationships aren’t also present and important, and in certain ‘edge cases’ other relationships may even be more important.
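Read as set relationships, the three tests above amount to partitioning the same pool of items by progressively coarser notions of "the same". The sketch below is one way to picture that. The `Item` fields and the sample values are invented for illustration, and in practice each field stands for a cataloger's judgment rather than a computed value.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    barcode: str
    physical_form: str  # stands in for "physically identical at the point of production"
    content: str        # stands in for "textually/informationally identical"
    work: str           # the culturally determined judgment


def partition(items, key):
    """Group item barcodes into sets of items that share the same key."""
    classes = defaultdict(set)
    for item in items:
        classes[key(item)].add(item.barcode)
    return list(classes.values())


# Invented sample data.
ITEMS = [
    Item("i1", "form-X", "text-A", "Hamlet"),
    Item("i2", "form-X", "text-A", "Hamlet"),
    Item("i3", "form-Y", "text-A", "Hamlet"),
    Item("i4", "form-Z", "text-B", "Hamlet"),
    Item("i5", "form-W", "text-C", "King Lear"),
]

manifestations = partition(ITEMS, lambda i: (i.work, i.content, i.physical_form))
expressions = partition(ITEMS, lambda i: (i.work, i.content))
works = partition(ITEMS, lambda i: i.work)

print(len(manifestations), len(expressions), len(works))
```

Because each key extends the one above it, every manifestation set falls inside exactly one expression set, and every expression set inside exactly one work set, which is the nesting the Group 1 entities describe.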

FRBR imperfect? So then? November 28, 2007

Posted by jrochkind in Theory, cataloging.
6 comments

We hear all the time “FRBR is untested, FRBR is incomplete, FRBR needs work.” One version of this is in Karen Coyle’s summary of the Working Group on the Future of Bibliographic Control report. [I’m waiting for the official written report before really responding to these recommendations, but I’ll respond now just to the informal comments here as one point of view, regardless of whether it accurately represents the working group’s.]

“The framework known as FRBR has great potential but so far is untested.”

Now, as it happens, I in fact agree with this completely so far as it goes. However, that doesn’t change the fact that we desperately need what FRBR is trying to do–a formal and explicit schematic of how we model the ‘bibliographic’ (or ‘information resource’) universe. Some agree that we desperately need this, some don’t and think it’s all a bunch of hot air. I’ve made my case for why we need it before, and probably ought to do so again in more polished form.

But those of us who agree that we desperately need this, AND that FRBR is an untested and imperfect attempt to do this—then what? Either we: (more…)