Tagging and motivation in library catalogs?

Eh, this comment was long enough that I might as well post it here too, revised and expanded a bit. (I’ve been flagging on the blogging lately.) Karen Schneider thinks about “tagging in a workflow context”:

Tagging in library catalogs hasn’t worked yet for a number of reasons…

Karen goes on to discuss much of the ‘when’ of tagging, but I still think the ‘why’ of tagging is more relevant. Why would a user spend their valuable time adding tags to books in your library catalog?

I think the vast majority of successful tagging happens when users tag to aid their OWN workflow, generally to keep track of things. You tag on delicious to keep track of your bookmarks. You tag on LibraryThing to organize your collections. The most successful tagging, or at least the early tagging that builds a successful tag ecology, isn’t done to help _other_ people find things, but to keep track of things yourself. Where a successful tagging community does tag to help others find things, I’d suggest it is because helping people find things somehow benefits the taggers personally. Say, tagging your blog posts on wordpress.com because you want others to find your blog posts: still a personal benefit.

A successful tag ecology is generally built on tagging actions that serve very personal interests, interests that do not need the successful tag ecology on top of them and are served even if you are the only one tagging. The successful tag ecology that grows out of them, and goes on to provide a collective benefit that was not the original intent of the taggers, is an epiphenomenon.

Amazon might be a notable exception to this hypothesis, perhaps because it was already such a universally used service before it added tagging (unlike our library catalogs). I would be interested to understand what motivates users to tag in Amazon. Anyone know of anyone who’s looked into this? It’s also possible that, if Amazon’s tags are less useful, it is precisely because of this lack of personal benefit from tagging.

So what personal benefit can a user get from tagging in a library catalog? Perhaps, if we provided better ‘saved records’ features, a way to keep track of books you’ve checked out, books you might want to check out, etc. But I’m not sure our users actually USE our catalogs enough to find this useful, no matter how good a ‘saved records’ feature we provide. In an academic setting, items from the catalog no longer necessarily make up a majority of a user’s research space.

To me, that raises the question: can we capture tags from somewhere else? My users export items to RefWorks. Does RefWorks allow tagging yet? If it did, is there a way to export (really, re-import) those tags BACK to the catalog when a user tags something? Even so, it would be better if RefWorks somehow magically aggregated tags from _different_ catalogs for the same work, but that relies on identifier issues we haven’t solved yet. If our catalogs provided persistent URLs (which they usually don’t, which is a tragedy), users COULD tag in delicious if they wanted to. Is there a way to scan delicious for any bookmarks of your catalog’s URLs, and import their tags back in?
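If catalogs did offer persistent URLs, the delicious idea above could be sketched roughly as follows. This is a minimal sketch, not a real integration: the permalink pattern (`catalog.example.edu` with a `record` query parameter) is hypothetical, and the bookmark list stands in for whatever export a bookmarking service might provide. No real delicious API is used.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical permalink pattern: https://catalog.example.edu/view?record=<id>
CATALOG_HOST = "catalog.example.edu"
RECORD_PARAM = "record"


def record_id_from_url(url):
    """Return the local record ID if url is one of our catalog permalinks, else None."""
    parsed = urlparse(url)
    if parsed.netloc != CATALOG_HOST:
        return None
    ids = parse_qs(parsed.query).get(RECORD_PARAM)
    return ids[0] if ids else None


def harvest_tags(bookmarks):
    """Collect tags from (url, tags) bookmark pairs, keyed by local record ID.

    Bookmarks pointing anywhere other than our catalog are ignored.
    Tags are lowercased so "Hamlet" and "hamlet" count as one tag.
    """
    tags_by_record = {}
    for url, tags in bookmarks:
        record_id = record_id_from_url(url)
        if record_id is None:
            continue
        tags_by_record.setdefault(record_id, set()).update(t.lower() for t in tags)
    return tags_by_record


bookmarks = [
    ("https://catalog.example.edu/view?record=123", ["Hamlet", "thesis"]),
    ("https://example.com/some-blog-post", ["misc"]),
    ("http://catalog.example.edu/view?record=123", ["revenge-tragedy"]),
]
print(harvest_tags(bookmarks))
```

Matching on the hostname and a query parameter is only as reliable as the permalinks themselves, which is exactly why persistent URLs matter here.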

In addition to organizing one’s research and books/items of interest, are there other reasons it would serve a patron’s interest to tag, other things they could get out of it? A professor might tag books of interest for their students, perhaps (not that most professors are looking for more technological things to spend time on to help students, but some are). And librarians themselves might tag things with non-controlled-vocabulary topic terms they know would be useful to a particular class, program, or department. Can anyone think of any other reasons tagging could benefit a user? I mean benefits an individual user can get from assigning tags in a library catalog, not whether a successful tagging ecology would be of collective benefit.

WorldCat covers a much larger share of my academic users’ research universe than my own catalog does. And WorldCat has solved the “aggregating different copies of this work from different libraries” problem to some extent. That is why it would make so much sense for WorldCat to offer a tagging service, one that could easily be incorporated into your own local catalog for both assigning and displaying tags (if not for searching), à la LibraryThing. It is astounding to me that OCLC hasn’t provided this yet. It seems to be very ‘low hanging fruit’ (a tagging interface on worldcat.org with a good API is not rocket science) that is worth a try.
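To make the aggregation idea concrete: once copies held by different libraries are mapped to a shared work identifier (the problem WorldCat has partly solved), pooling tags is simple bookkeeping. A rough sketch, assuming a hypothetical `work_of` lookup from (catalog, local record ID) to a work identifier. It does not use any real OCLC API, and the sample data is invented.

```python
from collections import Counter, defaultdict


def aggregate_tags(assignments, work_of):
    """Pool tag counts from many catalogs under a shared work identifier.

    assignments: iterable of (catalog, local_record_id, tag) triples.
    work_of: dict mapping (catalog, local_record_id) -> work identifier
             (a hypothetical stand-in for a work-clustering service).
    Assignments whose record has no known work identifier are skipped.
    """
    pooled = defaultdict(Counter)
    for catalog, local_id, tag in assignments:
        work = work_of.get((catalog, local_id))
        if work is not None:
            pooled[work][tag.lower()] += 1
    return pooled


def tags_for_local_record(catalog, local_id, pooled, work_of, top=5):
    """Most common pooled tags for the work a local record belongs to."""
    work = work_of.get((catalog, local_id))
    if work is None or work not in pooled:
        return []
    return [tag for tag, _count in pooled[work].most_common(top)]


work_of = {
    ("libA", "b1001"): "work:hamlet",
    ("libB", "9987"): "work:hamlet",
}
assignments = [
    ("libA", "b1001", "Tragedy"),
    ("libB", "9987", "tragedy"),
    ("libB", "9987", "revenge"),
    ("libC", "x1", "unmatched"),  # no work identifier, so ignored
]
pooled = aggregate_tags(assignments, work_of)
print(tags_for_local_record("libA", "b1001", pooled, work_of))  # ['tragedy', 'revenge']
```

In this arrangement a local catalog would only display `tags_for_local_record(...)` next to its own record, while the tags themselves are assigned and stored centrally.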

State of FRBR

Thoughts on FRBR sparked by a discussion on RDA-L (archives for the month in progress are not online, sadly) about FRBR modelling of moving pictures (thanks, Martha Yee), which itself was sparked by Diane Hillmann’s excellent stab at creating some RDA cataloging scenarios (which I can’t find the URL for, having no archive of RDA-L to search; you know, I should go and subscribe a Google list to it and make my own archive henceforth).

Inventing how to think and talk about metadata: DCAM and RDF

So, there’s some discussion going on about DCAM and RDF, prompted by Stuart Weibel’s blog posts here, here, and here, with some supplementary commentary by Peter Murray.

One fundamental question: Do both RDF and DCAM do the same thing? (I’m not sure if I mean RDF alone here, or RDF plus a ‘suite’ of things specifically designed by the RDF development community to supplement RDF, like OWL, which I know nothing about!). Do they do different things? What is each ‘framework’ intended to do anyway? [Okay, I guess that wasn’t just one question].

Stu Weibel suggests that they do not do the same thing: there may be some overlap, but they are generally compatible and complementary. PeteJ (Pete Johnston, I think) seems to disagree, and thinks that RDF and DCAM have very similar roles, although he thinks each does something the other doesn’t. I’m still confused as to what that something is in his analysis, and whether that something matters.

It has occurred to me before that we have an astounding amount of difficulty putting this stuff into words and knowing what we’re talking about: difficulty with our mental models of metadata and metadata control, and with the words we use to talk about them.

Identifiers and Display Labels again

An RDA-L post, yet again on a topic I’ve been circling for some time, prompted by Ed Jones usefully approaching it too.

Ed Jones wrote:

On a related question, as long as this collocating function is satisfied, I don’t know that it matters anymore whether a name is unique in its display form.

Yeah, this is something I’ve been talking about for a while too and trying to get people to think about.

If the collocating function is satisfied, then we can group works and show related works without any ambiguity, EVEN IF we have no unique display form. It might, however, result in a confusing display:

Ed Jones

  • Work 1
  • Work 2
  • Work 3

Ed Jones

  • Work A

Two different Ed Joneses, correctly grouped separately, but both showing up as “Ed Jones”. So it might be going too far to say “it doesn’t matter”. It matters to some extent. We might prefer to have a display form that can disambiguate. On the other hand, we might not need one. Maybe the above display is sufficient. Maybe the works listed under each Ed Jones are enough to tell the user which is which! (A birth date isn’t necessarily what the user needs.) Maybe showing the most common LCSH strings assigned to the first Ed Jones’s works would be helpful.
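The separation between collocation and display can be sketched in a few lines: works are grouped by a unique (possibly ‘dumb’) identifier, while the label shown to the user is free to collide, with the most common subject strings offered as one possible disambiguating hint. The class and function names here are illustrative only, not from any existing system, and the sample data is invented.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Agent:
    identifier: str     # unique; could be 'dumb' (a number or a URI)
    display_label: str  # what the user sees; NOT required to be unique


@dataclass
class Work:
    title: str
    creator_id: str     # links to Agent.identifier, never to the label
    subjects: list = field(default_factory=list)


def collocate(works):
    """Group works by creator identifier (the collocation function)."""
    groups = {}
    for work in works:
        groups.setdefault(work.creator_id, []).append(work)
    return groups


def top_subjects(works, n=2):
    """A possible disambiguation hint: the most common subject strings across the works."""
    counts = Counter(s for work in works for s in work.subjects)
    return [s for s, _count in counts.most_common(n)]


def render(agents, works):
    """Display each group under its (possibly duplicated) label plus a subject hint."""
    lines = []
    for agent_id, group in collocate(works).items():
        label = agents[agent_id].display_label
        hint = "; ".join(top_subjects(group))
        lines.append(f"{label} ({hint})" if hint else label)
        lines.extend(f"  • {work.title}" for work in group)
    return "\n".join(lines)


agents = {"n1": Agent("n1", "Ed Jones"), "n2": Agent("n2", "Ed Jones")}
works = [
    Work("Work 1", "n1", ["Cataloging", "Authority files"]),
    Work("Work 2", "n1", ["Cataloging"]),
    Work("Work 3", "n1", ["Authority files", "Cataloging"]),
    Work("Work A", "n2", ["Geology"]),
]
print(render(agents, works))
```

Nothing in `collocate` ever looks at `display_label`, so the two identical labels cannot cause a mis-grouping; only the display is affected.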

But regardless of how important you think display uniqueness is, or how you think it’s best accomplished, the important point is this: If we can just get collocation uniqueness (which we could call ‘identifier uniqueness’ too), is that better than nothing? Yeah it is! A lot better than nothing. Display uniqueness may just be icing on the cake.

This gets at something I’ve been harping on for a while: conceptually separating identifiers from display labels, because doing so makes it easier to think and talk about what we’re doing. Thanks, Ed, for explaining it very nicely.

(See my previous posts, The Purpose of Authority Control, Access Points as Identifiers, and Two Meanings of ‘Identifier’.)

Our current AACR2 headings serve both purposes. They serve the “identifier” purpose when they provide for collocation, but as Ed notes, this function could instead be served by a ‘dumb’ identifier. And they serve the “display label” purpose when they are used, well, as a display label. This display label purpose could be served by something that is not unique, but could not be served by a ‘dumb’ identifier (say, a strictly numeric one like an OCLC number, or a URI). Two different purposes.

One reason this matters, regardless of how we change our own systems of control, is that we will increasingly be interfacing with other people’s systems that DO separate these functions, or that provide only one of them but not both. To get these things to mesh with our systems, we’ve got to understand that they are two separate functions. But I do think our own systems of control could usefully be improved by separating these purposes to some extent too.

I keep meaning to write more about how the FRAD document totally misses the boat on this, but keep running into writer’s block. It’s unfortunate, because FRAD is the right place to try to get these things straight conceptually, but it totally confuses them instead.

(The collocating function will be satisfied if its underlying identifier is unique.) In some ways, simply using an unadorned name in the display–even if this results in a series of identical unadorned names–frees the catalog search engine for more interesting manipulations. At present, selecting my “John Smith” from pages and pages of the same name arranged and differentiated by birth year–not particularly useful–is a time-consuming task that will ultimately defeat me. If I can select my “John Smith” on a variety of useful criteria that I may be more likely to know–prolific author, nationality, profession, affiliation–I can quickly reduce the mountain of “John Smiths” to a manageable few. And these useful differentiating criteria may have been added to the authority record from a variety of data sources.

Ed Jones
National University (San Diego, Calif.)
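Ed’s approach to picking out “my John Smith” by criteria the user actually knows resembles faceted narrowing, which might look something like the following. This is a sketch under stated assumptions: the records and attribute names (`nationality`, `profession`, `affiliation`, `works`) are hypothetical stand-ins for whatever differentiating data an authority record might actually carry.

```python
from collections import Counter

# Hypothetical authority records for several different people named "Smith, John".
RECORDS = [
    {"id": "a1", "name": "Smith, John", "nationality": "British",
     "profession": "Chemist", "affiliation": "Oxford", "works": 40},
    {"id": "a2", "name": "Smith, John", "nationality": "American",
     "profession": "Historian", "affiliation": "Yale", "works": 3},
    {"id": "a3", "name": "Smith, John", "nationality": "American",
     "profession": "Chemist", "affiliation": "MIT", "works": 12},
    {"id": "a4", "name": "Smith, John", "nationality": "Canadian",
     "profession": "Novelist", "affiliation": None, "works": 7},
]


def facet_counts(records, attribute):
    """Count the values of one attribute, as a facet sidebar would show them."""
    return Counter(r[attribute] for r in records if r.get(attribute) is not None)


def narrow(records, **selected):
    """Keep only records matching every selected facet value."""
    return [r for r in records if all(r.get(k) == v for k, v in selected.items())]


def most_prolific_first(records):
    """Order candidates by number of works, for the 'prolific author' criterion."""
    return sorted(records, key=lambda r: r["works"], reverse=True)


print(facet_counts(RECORDS, "profession"))
chemists = narrow(RECORDS, profession="Chemist")
print([r["id"] for r in most_prolific_first(chemists)])  # ['a1', 'a3']
print([r["id"] for r in narrow(RECORDS, nationality="American", profession="Chemist")])  # ['a3']
```

Note that the display name is identical in every record; all the narrowing happens on other data, which is the point Ed makes about freeing the search engine from the heading.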


Comments on the report of the LC Working Group on the Future of Bibliographic Control (WoGroFuBiCo, in William Denton’s amusing coinage) are due. I still haven’t had time to read the document thoroughly, but I guess I’d better prepare what thoughts I have and send them in. (Two weeks for review seems rather skimpy to me, especially at this time of year, the busy end of the semester for those of us in academic environments.)

On my first reading of the report, I was frankly filled with a kind of elation. I think it does a very good job of analyzing the present environment and pointing the direction we need to go, if not the final destination. The basic perspective on the library metadata control environment evidenced here is not necessarily the consensus in the library field. But I think they’ve gotten it exactly right, and I hope that a report coming from such an influential cataloging player (LC) can help build consensus around how we understand what’s going on and where to go. The headings of the report, “Increase the efficiency of bibliographic production” and “Position our technology/community for the future” (in particular with regard to recognizing that software is sometimes the ‘audience’ of our efforts), are exactly the right way to frame our challenges.

Then I read Diane Hillmann’s comments and found myself agreeing wholeheartedly with them too. While Hillmann is, I think, largely sympathetic to the aims of the working group, the devil is in the details. I realized that the report is pretty short on specific actionable recommendations. More detail would be welcome, or at least the recognition that more detail is needed and a recommendation that certain task forces develop that detail. One example would be Martha Yee’s comments, I think on the RDA-L list, that reducing duplication of authority work requires a better technical infrastructure for automated sharing of metadata, not just a vague “promoting wider participation” or what have you. I agree wholeheartedly, and would extend that to cover ‘bib’ data too, not just ‘authority’ data. I’d quote her on that if the RDA-L list archives didn’t stop in August.

There is only one place in the report that raises serious red flags for me, and that’s the recommendations on RDA and FRBR, especially around 3.2.1. To explain my concerns, let’s start with a great framing of the metadata situation from Stu Weibel (quoted a bit out of order to approach this the way I want):

I am asserting that embedding the library in the open Web demands:

  1. A coherent [content] model of what we are describing and the relationships among those entities, and in which each entity is identified with a URI…
  2. A carrier syntax that lives comfortably on the Web (the DC Abstract Model is my candidate)
  3. Rules for populating agreed structures (that at which RDA seems to be failing so earnestly).

I suspect that the Working Group would agree with this perspective too (although it would be nice if it were more explicitly highlighted in the report), but it is not necessarily shared across the library community: the view that applying this sort of metadata control is both necessary and vital. Whether an audience agrees with this or not will color how that audience reads the Working Group’s recommendation, in potentially dangerous ways the Working Group should take care to avoid.

Stu goes on to say (or actually says first):

There is exactly one candidate for a content model that captures the relations among salient bibliographic entities that are needed to anchor library assets in the larger information sphere: FRBR. It feels roughly right to most, though it would be unwise to underestimate the time we can (ill-afford) to spend on thrashing around in the details.

I again agree entirely, and would add that there is exactly one serious candidate for that third pillar, the rules or guidelines for how we take the real world of concrete objects and capture it in recorded descriptions according to our content model: And that is RDA.

To be sure, neither FRBR nor RDA is perfect. But they are what we have: they are the only serious candidates we have for filling these vital roles. And they have both come out of significant community work. The drafters of RDA especially have had to deal with very difficult competing goals and priorities from various constituencies and funders. If we do not think they have been as successful as we would like, what makes us think we can do better by starting over? Because I think this is the implied sub-text of this recommendation: that both RDA and FRBR may not be worth continuing with. I don’t think we have any other choice, though. Again, these are the only serious candidates we have. And if this is not the sub-text the Working Group meant to imply, they should be clear about what they mean. If I read this sub-text into it, finding it a disastrous course of action, I can guarantee that others who find it a welcome suggestion (because they do not believe a formal content vocabulary, or rules for applying it other than AACR2 with tweaks, are necessary) will read it as unspoken sub-text too.

We do not have the luxury of time to start over. Nor, in “internet time”, do we have the luxury of waiting until one of these three “pillars” of contemporary metadata control is perfected before continuing with the others. Work on all of them must go on in parallel. These are the serious candidates we have–we may need to change the processes through which they are developed (e.g., back-room closed-session dealings are no longer the way standards are best done in the internet world; the reluctance to create implementations before standards are “finished” needs to be replaced by creating prototype implementations to inform the finishing of the standards)–but what we need to do is continue to develop FRBR and RDA, in parallel, with all deliberate speed.

So when the Working Group recommends that we “suspend” RDA work, I wonder what they are proposing as an alternative, and I worry that this recommendation will be heard by an audience that does not agree with our premises (that we need a content model, and rules/guidance for applying it) as evidence that we do not need these things at all. Or that we have the luxury of time to make them perfect before we seriously engage with them. If the Working Group does not want to send this message, I urge them to reword this recommendation carefully.

In fact, I think that the RDA/DCAM task force is just about the best hope we have to “specify…, model and represent a Bibliographic Description Vocabulary, drawing on the work of FRBR and RDA, the DCAM and appropriate semantic Web technologies.” This is the already existing group of people prepared and qualified to do this work; it is, I would say, the best candidate we have for starting this work as soon as we need to. I would rather see the Working Group recommend that this group be funded so they can get cracking, than recommend that RDA work be “suspended”. Again, some audiences that do not agree on the importance of rationalizing our work to a contemporary metadata control regime (something I think the Working Group does see) are going to hear that “suspension” as evidence that this is not important or urgent after all.

The evidence that I see is that the RDA JSC is trying mightily to accomplish exactly what the Working Group calls for, so calling to suspend the one influential endeavor doing that seems just disastrous to me. If the Working Group believes the way they are going about it is wrong, then the recommendation should be as to how that way of going about it should be changed, not recommend that the work be “suspended”. We have no time for suspension.

In particular, I share a concern that certain parts of the operational/business model for RDA are problematic for achieving a good solution: namely, that the committee of principals funding this work sees it as a publishing project that must produce a return on investment (and quickly), rather than as a metadata control standards-making project with different business-case requirements. The consequences of this for the RDA process may indeed be a fatal flaw in what is produced. On the other hand, the existing cataloging community constituencies do need to have buy-in to the process and product, and this is theoretically what the committee of principals represents. So the solution to this dilemma is not clear to me. But again, if this is the Working Group’s concern, they need to make it explicit and suggest alternative approaches, not simply suggest a suspension while we wait for vague research work to be done by parties that have not yet stepped up to do it.

We do not have time for that.

When the Working Group lists the “consequences of maintaining the status quo” for this overall section, including these parts, I think they are not nearly dire enough. The consequences of not proceeding with this effort to, in the Working Group’s words, “model and represent a Bibliographic Description Vocabulary” (which I think means accomplishing Stu’s three pillars) are dire: our metadata efforts will become increasingly irrelevant to our actual information environment.

Why FRBR entity model matters: FRBR considered as set relationships

In response to some recent debate on the lists over whether the FRBR entity model really matters or is useful or is acceptable, to clarify some issues (maybe! Or stir the pot yet more!), and make it clear why I think the FRBR model is a reasonably good approximation to serve as a ‘skeleton’ for our metadata, and why the Group 1 relationships are especially important to the information landscape, I’m going to throw out another way of looking at the FRBR Group 1 entities. (I believe this is just an explanation of what’s in FRBR, not a change; just another way of explaining what’s already there).

=> An item is a concrete physical thing in your hand, naturally. That’s straightforward, yes?

=> Two items belong to the same manifestation if they are physically identical. For digital items, which have no physicality, being bitwise identical is, I guess, the good analog.

  • To be sure, some physical features are more important to us than others. A dog-eared page makes something no longer physically identical, but we still consider it the same manifestation. So we really sort of mean ‘physically identical at the point of production’.
  • Also, as Jim Weinheimer usefully reminds us, we do not go to the lengths of ensuring with 100% confidence that two items are physically identical; we just approximate, and decide that they can be treated as physically identical for our users’ needs, trying to optimize user benefit per unit of staff time put in.

=> Two items belong to the same expression if they are textually identical. Or more generally for non-textual materials, if their information content is identical (not revised or amended–and certainly not entirely different!).

  • This is even fuzzier than physically identical, but still of importance to users! The FRBR report itself makes clear that realistically, this is more like “can be considered textually/information identical, for the purposes of a user community, balanced with the resources available to make this determination.” Many users might consider two items textually identical despite some minor trivial differences; whereas a rare books scholar might consider the tiniest difference vital.
  • And yes, it’s clearer how this applies to textual materials than non-textual, but I think it still matters for non-textual materials. Is this print the very same picture as this other print, or did the artist ‘revise’ the picture before making the other print? With music it’s even trickier, but it still matters to the user whether the information content has changed (though with music there is less clear understanding of when the information content has changed; the same ensemble playing the same arrangement on a different day may be an information difference that matters to the music community).

=> Two items belong to the same work if… well, they belong to the same work. This one is entirely culturally determined, and there’s no good way to say it in any more basic language (although FRBR tries). But while the concept of ‘work’ is entirely cultural, in Western culture at least it is a very important concept, which matters quite a bit to users. Basically, we know it when we see it. Is this thing an edition of Shakespeare’s Hamlet, or is it not? This matters quite a bit to the user, who may be looking for an edition of Hamlet, any one will do. Or may be looking for all editions of Hamlet.

  • To be sure, it’s a judgment that is subjective, contextual, and which reasonable people can sometimes disagree on, especially in edge cases. But it’s still an important one to users! Especially in strange edge cases, one person may disagree with another about whether an item is an edition of Hamlet or not, but the notion of “Hamlet”, and the idea that various expressions and manifestations (defined above as sets of items sharing textual or physical identity, respectively) may embody “Hamlet”, is key to (at least Western) readers’ and users’ conception of the bibliographic/information landscape. Whether the naive user uses the word ‘work’ or not, it’s still a key concept.

So, physical identity, textual/information identity, and embodying the same work—I think these are fundamental divisions of the bibliographic/information universe, which are very important to our efforts to give the user a somewhat organized approach to that universe instead of just chaos (which all too often our OPACs give them now!). I think this means they do serve as a good basic model for the general purpose bibliographic/information universe, a skeleton on which more specific things can be built, but which still gives us a way to compare and explain anatomies, as it were. I think these characteristics of relationships between items are also especially important and basic ones to ‘bibliographic control’, and therefore justified in having a central place in the FRBR model–but that doesn’t mean that other relationships aren’t also present and important, and in certain ‘edge cases’ other relationships may even be more important.
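One way to see the Group 1 entities as set relationships is to treat each level as an equivalence relation that partitions the items. Here is a toy sketch, assuming digital items (so ‘physically identical’ becomes bitwise identical, checked via a hash), a deliberately crude notion of ‘textually identical’ (ignoring case and whitespace unless `strict=True`), and work membership supplied as a recorded human judgment rather than computed. The item data is invented for illustration.

```python
import hashlib
from collections import defaultdict


def manifestation_key(item):
    """Bitwise identity: the digital stand-in for 'physically identical at production'."""
    return hashlib.sha256(item["bytes"]).hexdigest()


def expression_key(item, strict=False):
    """Textual identity. The lenient default ignores case and whitespace differences;
    strict=True treats any difference as significant (say, for a rare-books community)."""
    text = item["text"]
    if not strict:
        text = " ".join(text.lower().split())
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def work_key(item):
    """No algorithm here: work membership is a cultural judgment recorded by a person."""
    return item["work"]


def partition(items, key):
    """Split items into equivalence classes (sets of item ids) that share a key."""
    classes = defaultdict(set)
    for item in items:
        classes[key(item)].add(item["id"])
    return list(classes.values())


def refines(finer, coarser):
    """True if every class in `finer` sits inside some class of `coarser`."""
    return all(any(f <= c for c in coarser) for f in finer)


items = [  # invented sample data
    {"id": "i1", "bytes": b"setting-A", "text": "To be, or not to be", "work": "Hamlet"},
    {"id": "i2", "bytes": b"setting-A", "text": "To be, or not to be", "work": "Hamlet"},
    {"id": "i3", "bytes": b"setting-B", "text": "To be,  or not to be", "work": "Hamlet"},
    {"id": "i4", "bytes": b"revised", "text": "To be, or not to be, that is the question", "work": "Hamlet"},
    {"id": "i5", "bytes": b"macbeth", "text": "When shall we three meet again", "work": "Macbeth"},
]

manifestations = partition(items, manifestation_key)
expressions = partition(items, expression_key)
works = partition(items, work_key)
print(len(manifestations), len(expressions), len(works))  # 4 3 2
print(refines(manifestations, expressions), refines(expressions, works))  # True True
```

The `refines` check expresses the nesting: every manifestation class falls inside one expression class, and every expression class inside one work class. Switching to `strict=True` splits an expression class apart, much as a rare-books scholar might.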

FRBR imperfect? So then?

We hear all the time: “FRBR is untested, FRBR is incomplete, FRBR needs work.” One version of this is in Karen Coyle’s summary of the Working Group on the Future of Bibliographic Control report. [I’m waiting for the official written report before really responding to its recommendations, but I’ll respond now to the informal comments here as one point of view, regardless of whether they accurately represent the Working Group’s.]

“The framework known as FRBR has great potential but so far is untested.”

Now, as it happens, I in fact agree with this completely, so far as it goes. However, that doesn’t change the fact that we desperately need what FRBR is trying to do: a formal and explicit schematic of how we model the ‘bibliographic’ (or ‘information resource’) universe. Some agree that we desperately need this; some don’t and think it’s all a bunch of hot air. I’ve made my case for why we need it before, and probably ought to do so again in more polished form.

But those of us who agree that we desperately need this, AND that FRBR is an untested and imperfect attempt to do this—then what? Either we: Continue reading “FRBR imperfect? So then?”

DLF forum, bowker presentation

At the DLF forum, I saw a presentation from someone at Bowker whose name I forget, where I learned two simple things that were very interesting to me.

I learned about the plans for an International Standard Text Code (ISTC), sort of like an ISBN but applying at the FRBR “expression” level, grouping a set of ISBNs. (Although Bowker seems to call what it applies to a ‘work’, it is in fact meant to apply only to things that are ‘textually identical’, which is what we call the expression level. It is also only meant to apply to textual material, not audio, video, etc. She claimed that audio and video already had something similar, although it wasn’t widely adopted; I know nothing about this.) It would of course be quite useful to have an expression-level identifier from the ISBN people. It also makes me wonder how it might or might not harmonize with the library world’s plans for ‘frbrization’, since it’s being done for the needs of the publishing and sales industry, like ISBN. Of course, right now there’s not much for it to harmonize with in the library world, just talk.
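As a sketch of what an expression-level identifier could buy a catalog: given an ISBN, find every ISBN for a textually identical edition. The ISBN and ISTC values below are placeholders, not real registrations, and the lookup table stands in for whatever data source would actually carry ISTC assignments.

```python
from collections import defaultdict

# Placeholder ISBN -> ISTC assignments; these are NOT real registrations.
ISTC_OF = {
    "ISBN-PLACEHOLDER-1": "ISTC-PLACEHOLDER-A",  # hardcover
    "ISBN-PLACEHOLDER-2": "ISTC-PLACEHOLDER-A",  # paperback with the same text
    "ISBN-PLACEHOLDER-3": "ISTC-PLACEHOLDER-B",  # revised text, so a different ISTC
}


def isbns_by_istc(istc_of):
    """Invert the ISBN -> ISTC map: each ISTC groups a set of ISBNs."""
    groups = defaultdict(set)
    for isbn, istc in istc_of.items():
        groups[istc].add(isbn)
    return groups


def textually_identical_isbns(isbn, istc_of):
    """All ISBNs sharing this ISBN's ISTC (itself included); just {isbn} if it has none."""
    istc = istc_of.get(isbn)
    if istc is None:
        return {isbn}
    return isbns_by_istc(istc_of)[istc]


print(sorted(textually_identical_isbns("ISBN-PLACEHOLDER-1", ISTC_OF)))
# ['ISBN-PLACEHOLDER-1', 'ISBN-PLACEHOLDER-2']
```

A catalog could use a lookup like this to offer a patron any textually identical edition that happens to be on the shelf. Rebuilding the inverted index on every call is wasteful, but it keeps the sketch short.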

I’ve been having writer’s block on the essay I intend to write about identifiers and ‘access points’. I think I need to stop trying to write the perfect essay and just put my unfinished, sketchy notes on the blog, because I think it is an important topic, especially for talking about how what we’re doing relates to what everyone else is doing. We need to use compatible language and compatible mental models.

Anyway, the second interesting thing I learned is that a few months ago Bowker released a web service for accessing ISBN metadata. It is packaged with Books in Print, meaning it’s free to anyone who already has an online Books in Print subscription (many, if not most, of us). I don’t know the details, but am eager to find them out and play with the service.

I like that pricing idea. Hey, they’re already paying lots of money for BiP; they shouldn’t need to pay even more to get 21st-century methods of access to the same content they are already paying for. The service should instead simply be brought up to 2007 levels. If only more of our library vendors worked like that. Too often, we are in the situation where the software we pay a fortune for is stuck in 1985, and if you want modern software you’ve got to purchase, and keep paying licensing and support for, an additional add-on “product”. Ridiculous, and almost all of our vendors do it. (Perhaps the market requires them to do it that way to stay in business; if so, the market is doomed and their time in business is limited anyway. Hopefully not along with ours.)

The Purpose of Authority Control

So I’ve been not blogging for a while–I managed to arrange my job so I could devote myself to some serious software development, and have been reminded of both what I liked AND what I didn’t about how obsessive I can get about coding. I get really caught up in it. Hopefully some news on what I’ve coded soon.

But meanwhile, I’ve been wanting for a while to write about authority control and identifiers, a topic I have written about before. There are some points I’ve really wanted to make, but I’ve had a bit of writer’s block on it, because it is so hard to talk clearly on this subject—it’s hard to even think clearly on this subject. But I think it’s crucial, and I think there are some important things to be said, that I’m getting a bit clearer thinking about and saying.

After trying to figure out how to say these things clearly, I decided that first we need to establish some basic agreement about the purpose of authority control. I’m sorry that this ends up being so lengthy, but I think it’s necessary to be clear.


The Purposes of ‘Subject’ Vocabularies

LCSH, LCC, DDC, Ulrich’s subject headings, BISAC, Ranganathan’s Colon Classification, Bliss Classification (2), Amazon’s subject headings: all are examples of ‘subject’ controlled vocabularies.

I put ‘subject’ in quotes because in reality most, if not all, of these examples include terms to capture ‘aboutness’ as well as terms to capture discipline (ie, perspective), and genre (and in some cases form, format, and intended audience). (Yes, Dewey sometimes captures ‘aboutness’ and LCSH sometimes captures disciplinary perspective. Take a look.)

I have been interested for a while in exploring the purposes of these types of vocabulary. I think they are not as clear and simple as we might be used to assuming. I wrote a (too) long paper about it in library school, which I’ll attach here. I actually wrote it before I had seen NCSU’s Endeca implementation, and I’d have written it differently after; but I think this discussion is very relevant to understanding the effective use of controlled vocabularies in faceted navigation. Recent discussion on NGC4Lib regarding these types of vocabularies further emphasizes, to me, the importance of considering their functions.

In my paper, I argue that when we look at these vocabularies from the perspective of functions or purposes, the traditional line between ‘classification’ and ‘subject vocabulary’ isn’t actually that clear. Instead, we have a number of purposes (not just two), which a given vocabulary may serve better or worse.

The paper is awfully long, so I’ll also now summarize my suggestion for an initial draft taxonomy of functions. (These functions admittedly overlap in some ways, but I still think ) (The next step, determining which features of a vocabulary fit which functions or purposes, is only touched upon in the paper.)