cataloging and ‘citations’

So my understanding is that many ‘entries’ in a cataloging record are meant to be ‘citations’. They are meant to unambiguously identify the work cited.   In the age when cataloging rules were created, what you’d do with that unambiguous citation was simply look it up in a printed or card catalog.

But the very precise rules involving ‘main entry’ and ‘uniform title’ should, I believe, allow software to unambiguously find the target of the citation in a database, if it’s there.

I am at the very beginning stages of figuring out exactly how to do this, and it’s not simple.

If it turns out that you can’t even do this, I’m really going to think that much of the very complicated and time-consuming cataloging rules are irrelevant in the post-card-catalog age. But we’re not there yet.

Initial signs, however, aren’t very good. Take this example from OCLC docs on 76x-78x linking fields.

The first choice for identification is the uniform title. If available, use the entire uniform title (e.g., title and qualifier) to identify the related publication. If the uniform title is unavailable, use the main entry and title proper. For example, if OCLC record number 6597310 has the following uniform title:

130 0 Monthly digest of statistics (Zimbabwe. Central Statistical Office)

It would be linked to the related publication in field 780.

780 0 0 $t Monthly digest of statistics (Zimbabwe. Central Statistical Office) $w (OCoLC)6597310

Okay, fair enough. And a referenced uniform title should indeed allow us to unambiguously identify records belonging to the cited work. But wait. That title is clearly a uniform title; it’s given in a 130.

But in the 780 example then… shouldn’t that title be in subfield ‘s’, not ‘t’? 780 subfield s is clearly documented as “uniform title”, right?

But wait, the documentation for $t says it is indeed used for title elements from a 245 or a 130. Subfield ‘s’ is only used for uniform titles entered in field 240.

So wait, when citing a work in a 780, you put a uniform title in subfield s if it’s author main entry (a 240), but you put it in subfield t if it’s title main entry (a 130)? And when you find a title in t, there’s no way to know if it’s a uniform (controlled) title, or a transcribed (245) title?
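
To make that concrete, here is a rough Python sketch of what software might have to do to resolve a 780 against a local set of records. Everything in it is an assumption for illustration: records are plain dicts rather than real MARC, `normalize` and `resolve_link` are made-up names, and the second sample record is invented. It trusts a control number in $w when there is one, and otherwise has to try $t against both the 130 and the 245, because nothing in the field says which kind of title it holds.

```python
import re

def normalize(title):
    """Lowercase and collapse punctuation/whitespace for a loose comparison."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()

def resolve_link(link, records):
    """Return (record, how_matched) pairs that a 780-style link might point to.

    `link` maps subfield codes to values, e.g. {"t": "...", "w": "(OCoLC)123"}.
    Each record is a dict with optional keys "oclc", "130", "240", "245".
    """
    # A control number in $w is the only truly unambiguous pointer.
    if "w" in link:
        number = link["w"].replace("(OCoLC)", "").strip()
        hits = [(r, "control number") for r in records if r.get("oclc") == number]
        if hits:
            return hits

    hits = []
    # $s is documented as a uniform title, so compare it only to a 240.
    if "s" in link:
        wanted = normalize(link["s"])
        hits += [(r, "240") for r in records
                 if "240" in r and normalize(r["240"]) == wanted]
    # $t may hold either a 130 or a 245, so both have to be tried.
    if "t" in link:
        wanted = normalize(link["t"])
        for r in records:
            for tag in ("130", "245"):
                if tag in r and normalize(r[tag]) == wanted:
                    hits.append((r, tag))
    return hits

records = [
    {"oclc": "6597310",
     "130": "Monthly digest of statistics (Zimbabwe. Central Statistical Office)",
     "245": "Monthly digest of statistics"},
    {"oclc": "111", "245": "Monthly digest of statistics"},  # invented record
]

# A transcribed title in $t matches more than one record.
by_title = resolve_link({"t": "Monthly digest of statistics"}, records)
print([(r["oclc"], tag) for r, tag in by_title])
```

With the full uniform title in $t only the first record matches, via its 130. With just the title proper, both records come back, and the software has no basis for choosing between them.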

Um. So, um. I am kinda speechless. If you’re going to spend all these expensive cataloger hours following very precise rules, wouldn’t it be sensible to make the rules result in data that can actually be interpreted to do what it’s supposed to do?


14 Responses to cataloging and ‘citations’

  1. Bryan says:

    Have you talked with anybody at the MARC standards office (ndmso@loc.gov) or OCLC about what you are trying to do? Maybe they can clear some things up for you.

  2. melanie says:

    This is one where I don’t understand it either. I understand uniform titles pretty well, but not the 780 field.

  3. Shawne Miksa says:

    Can you give the actual rule sequence from AACR2 that corresponds to the problem you are describing? I’m just seeing MARC via OCLC’s Bibliographic Format and Standards–which I use all the time in my cataloging class but with the caveat that MARC documentation (whether LC or OCLC) does not refer to the actual rules. There is at times some disconnect between what the cataloging rule may say and how that is transcribed into an encoding standard such as MARC.

    Thanks, just curious.

  4. jrochkind says:

    No, I can not. After I spend several hours (days? weeks?) with AACR2, I’ll let you know if I can.

    In the end, what my software has to work with is MARC. So, yeah, I’ve got to understand AACR2, and I’ve got to understand MARC, and I’ve got to understand how they relate. This is no mean feat. Just understanding the AACR2 rules won’t do anything for me; in the end, I need to know what to do with the MARC records I get.

    We generally think of AACR2 rules as _input rules_, because that’s what they are. But in the end, software designers need to understand _rules for understanding what is in the record_, which kind of don’t exist. You’ve got to reverse engineer them from AACR2 combined with MARC combined with unwritten cataloger practice.

    So, yes, that’s exactly what I’m trying to figure out.

  5. Shawne Miksa says:

    Interesting–I don’t entirely agree with you, but then I’m not a software designer. From my standpoint AACR2 is very similar to a programming language in that it is really just a series of “if…then” statements. MARC, in and of itself, has no meaning without the input rules. As well, MARC is not the sum total of cataloging. (Bad analogy–focusing just on MARC is putting the cart before the horse.)

    When you say that rules for understanding what is in the record don’t exist—what do you mean exactly? What about the MARC documentation that gives specifications for the data values to be entered? Again, I’m not a system designer or programmer but it seems to me that with any system the conceptual and logical model comes first, followed by the actual implementation. Understanding AACR2 will give you insight into those models.

    It is very possible to learn AACR2 in a few hours: the basic patterns of rules that occur in Chapters 1-12, for instance. These are all based around the eight ISBD elements. In fact, once you understand Chapter 1, which is the master chapter, Chapters 2-12 are very interpretable.

    Perhaps the problem with cataloging systems these days is that they were developed around MARC, instead of how the cataloging flow works as outlined in rules such as AACR2.

    Just some random thoughts.

  6. jrochkind says:

    It might be easy for some, and not for others; I can’t speak for everyone!

    But if it’s easy for you, and you have time, I would love to have a conversation with you to help me better understand what’s going on.

    Indeed part of the problem is that systems _have_ to work off of MARC, because that’s all they have. But to work off of MARC well, you indeed need to understand what the AACR2 rules are, and how they end up being translated to MARC. Which is what I’m trying to do, get my system to have a display that actually makes sense based on the underlying data — but my software doesn’t have access to the minds of catalogers, all it’s got to work with is what’s in the MARC.

    So in particular in this case, we can take a specific example of a 700. How, from the MARC, can my software be unambiguously sure if it’s a name entry or a name-title entry? If it is a name-title entry, what’s the right way to find the record (if it exists in the local catalog) that the 700 name-title refers to? What subfields in the 700 name-title should be mapped to what MARC tags in the “target” (or “cited”) record? I know I’m talking in MARC, because that’s what the software has to work with, but the challenge is in ‘reverse engineering’ to the AACR2 to then go back and understand what to do with the MARC.
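
    Here is roughly the kind of thing I mean, as a Python sketch. It rests on guesses I haven’t verified against AACR2: that the presence of a $t is what marks a 700 as a name-title entry, and that the cited record would carry the name in 100 $a and the title in 240 $a or, failing that, 245 $a. The function names, the dict keys like "100a", and the sample data are all made up for illustration.

```python
def classify_700(subfields):
    """Guess whether a 700 is a plain name entry or a name-title entry.

    `subfields` is a list of (code, value) pairs in field order.
    Assumption: a $t (title of a work) marks a name-title entry.
    """
    codes = [code for code, _ in subfields]
    return "name-title" if "t" in codes else "name"

def clean(value):
    """Strip trailing ISBD-style punctuation and lowercase for comparison."""
    return value.strip().rstrip(" .,/:;").lower()

def find_cited(subfields, records):
    """Look for records whose 100 $a and 240 $a (or 245 $a) match a name-title 700."""
    if classify_700(subfields) != "name-title":
        return []
    entry = dict(subfields)  # if a code repeats, the last value wins; fine for a sketch
    name, title = clean(entry.get("a", "")), clean(entry["t"])
    matches = []
    for rec in records:
        if clean(rec.get("100a", "")) != name:
            continue
        # Prefer the controlled uniform title; fall back to the transcribed title.
        rec_title = rec.get("240a") or rec.get("245a", "")
        if clean(rec_title) == title:
            matches.append(rec)
    return matches

# Made-up sample data for illustration.
catalog = [
    {"id": 1, "100a": "Shakespeare, William,", "240a": "Hamlet",
     "245a": "The tragedy of Hamlet /"},
    {"id": 2, "100a": "Shakespeare, William,", "245a": "Macbeth /"},
]
added_entry = [("a", "Shakespeare, William,"), ("d", "1564-1616."), ("t", "Hamlet.")]

print(classify_700(added_entry))
print([rec["id"] for rec in find_cited(added_entry, catalog)])
```

    Even in this toy version the software only finds record 1 because that record happens to have a 240. If the 700 $t were meant to match a transcribed 245 instead, the fallback would have to do the work, and whether that is the right fallback is exactly what I can’t tell from the MARC alone.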

    If you think you’d have some insight there, I’m completely serious that I’d be thrilled to have a phone or email conversation with you, to hopefully help me understand it!

    Once I do that with the 700 example, then I need to go on to do a very similar thing with, for example, the 76x-78x fields. Which might be a very similar solution, or might be different. Depending on both the MARC and AACR2 rules involved. That’s what I’m trying to figure out, and kinda getting stumped.

  7. jrochkind says:

    To be honest, I can’t even figure out what part(s) of AACR2 govern making name-title entries in a 700, for the various reasons you might do so.

  8. Irvin Flack says:

    Jonathan, I feel your pain and I suspect there will be other cases where MARC will leave you in ambiguity. I don’t agree at all with Shawne’s comment about learning AACR in a few hours — but I do agree strongly with these two statements she made:

    “There is at times some disconnect between what the cataloging rule may say and how that is transcribed into an encoding standard such as MARC.”

    “Perhaps the problem with cataloging systems these days is that they were developed around MARC, instead of how the cataloging flow works as outlined in rules such as AACR2.”

    A cataloguer must learn not only AACR, but also a set of complicated rules around subject headings (LCSH) and a classification system like DDC or LCC. These are not integrated with each other: reading AACR you could be forgiven for not knowing a catalogue record would also include subject headings. On top of these the cataloguer must learn the intricacies of MARC, which conforms with the rules of the other cataloguing standards — except when it doesn’t — and also adds some functionality of its own, like fixed fields, which may not relate to instructions in AACR.

    And this all assumes that cataloguers will know all the rules (which they often don’t, given the sheer number of rules and number of places to find them), apply them correctly and consistently (which they don’t), and that library systems will interpret the MARC fields accurately (they don’t).

    It’s an unholy mess, but cataloguers learn to live with it and even love it. It would be tempting to bulldoze the lot and start again from first principles, which RDA half-did but then lost heart and ended up propping up the remnants.

  9. Jason Thomale says:

    One of the things in the library world that has always interested/concerned me is this apparent gulf between catalogers and programmers. The more I hear the two groups talk past one another, the more I believe it’s a fundamental difference in worldviews–something like the difference between engineers and people that study the humanities. Until we bridge that gulf and build a truly shared understanding between (at least) those two camps, I think any attempts to update our cataloging rules and data formats are going to end up in the kinds of compromises that have ruined RDA. We don’t need compromise. We need unification.

    Toward that end, this series of posts and the resulting discussions have been excellent. The library community desperately needs more of this. I think this is an excellent opportunity to build some of this shared understanding.

    This morning I pulled out my Concise AACR2, which I normally just use for reference, and actually began reading through it. There were a number of things that struck me. Before I go on, please keep in mind that I don’t do cataloging–people like Shawne understand cataloging more thoroughly than I probably ever will–so please take what I have to say with the requisite grain of salt.

    The main thing that hits me is that cataloging rules really are built for human interpretation and understanding. Of course they are–the foundations of cataloging were formed before computers even existed. In a recent NGC4LIB post I compared MARC to natural language; really I should have compared *cataloging rules* to natural language. To be fair, cataloging is definitely clearer and more structured than natural language, but I don’t think it’s unambiguous enough to be comparable to a programming language. And, to be precise–there’s a difference between the program (i.e., the rules that tell you how to do something) and the resulting *data* (i.e., the recorded results of what you’ve done). You can write a program that creates data that would be difficult for another person’s program to read. So, what we’re talking about here is interpreting the data that results from following the rules.

    Anyway, AACR2 may indeed be relatively simple for a person to learn and use. But people have a base of knowledge that helps them interpret. Looking through the examples given in The Concise AACR2, it’s readily apparent to me how to read them. Title, statement of responsibility, publisher–yeah, I can see the patterns, and they make sense to me. There are a number of things in the data that help me recognize the patterns. The punctuation is one of those things. But another is my existing knowledge of–well, for example, what does a publisher’s name look like? What does a title look like? What do personal names look like? Even without the punctuation, I can usually tell these things apart. A computer can’t–not without complex “learning” algorithms, and never with 100% accuracy. So the computer needs something explicit and unambiguous–like the punctuation–so that it can recognize the patterns, too. But, given the AACR2 cataloging rules as they are, *is* the punctuation always enough? Are the patterns present in the data (resulting from the cataloging rules) interpretable by something like a computer, which sees the data as opaque and doesn’t actually understand the informational content?

    Well, here’s an example of an ambiguity that’s very easy for a person to resolve but would be very difficult for a computer–right there within the first couple of pages.

    In The Concise AACR2, rules 1B2, 1B3, and 1F1 indicate that you don’t repeat a name in the statement of responsibility if it’s part of the title (e.g., “The Rolling Stones’ greatest hits”). Right off the bat, that’s a problem. A person looking at such a description can easily see that the title contains the name of the entity responsible for the resource–just based on their existing knowledge of such things. But a computer can’t do that–again, not without the machine learning. Now, let’s combine that with rule 1F4: if there’s no statement of responsibility on the piece in hand, then don’t supply one. Okay, so, if a computer runs across a description with a title but no statement of responsibility, what does it do? It can’t just assume that “no statement of responsibility” means that the responsible party is given in the title. Nor can it just assume that “no statement of responsibility” means that there was no statement of responsibility given on the piece. It can’t easily look at the title (like a person can) to determine that information. For a person, it’s a no-brainer. For a computer, it’s too ambiguous.
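
    As a tiny Python sketch of that dead end (the function name and its string arguments are hypothetical stand-ins for the title and statement of responsibility of a description):

```python
def responsibility_status(title, statement_of_responsibility):
    """Report what software can safely conclude about who is responsible.

    Both arguments are plain strings (or None). `title` is accepted but
    deliberately unused: without outside knowledge, code cannot tell
    whether a name is embedded in it.
    """
    if statement_of_responsibility:
        return "stated"
    # With no statement, two different rules lead to the same data:
    #   - the name is already part of the title, so it was not repeated
    #   - the piece carried no statement, so none was supplied
    # Nothing in the data says which one applied.
    return "unknown: in title, or absent from the piece"

print(responsibility_status("The Rolling Stones' greatest hits", None))
print(responsibility_status("Greatest hits", None))
```

    Both calls give the same answer, even though a person reading the two titles would treat them differently.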

    Now–I realize, in a real catalog record, you’ll have additional things such as main entries and added entries that will probably supply that information. But this is just one concrete example showing that the cataloging rules don’t produce data that is necessarily machine-interpretable. If I kept going through the AACR2, I’m sure I would find many more examples.

    Clearly MARC is an attempt to deal with this problem by providing a machine-readable container for data produced via the cataloging rules. But–for one thing, machine-“readable” and machine-“interpretable” are two different things. And, for another thing, as others have often noted, MARC was devised in the era of the card catalog. From what I understand, it was meant to be nothing more than a way to store card catalog data so that cards could be shared among libraries and printed. Until recently, computers have never needed to “understand” cataloging data. That, combined with many other reasons, means that MARC is rife with the same sorts of ambiguities that are in AACR2. Jonathan’s recent posts prove this. Sure–MARC *is* machine-interpretable, but really only up to a point. There’s probably a good reason that OPAC displays of MARC data haven’t moved too much beyond just showing a digital catalog card.

    And, of course, Irvin’s comment offers more pieces of the puzzle.

    To most catalogers I know it probably appears that programmers are just being whiny, anal, and intentionally obtuse. But good programmers have to think like computers, and computers are incredibly stupid. Computers need to have things spelled out for them in exacting, unambiguous detail.

  10. Shawne Miksa says:

    MARC was developed to help with the automating of catalog cards, yes. The essence of this process, as it was with manually produced cards written by hand, was to save space on the card. Very obviously this has been carried over to how MARC is used in electronic systems. Oh, that ideal so needs to be changed: enter RDA. RDA is based IN PART on FRBR, which is essentially an interpretation of the Entity-Relationship model, a database model created by Peter Chen in 1976 (I believe). To me this signals the cataloging enterprise’s understanding that catalogers and programmers need to come together. Right now.

    Let’s not be disingenuous about how RDA is developing. “Lost heart” is a bit harsh. Anyone who expects all of this change of conceptual fundamentals to take place in one shot is being completely unrealistic, and I strongly advocate for a review of the history of how cataloging rules develop by committee work, especially international committee work. This is not going to happen with the tap of a key on the keyboard. It seems the faster and better our technology becomes, the more patience we lose. Patience is a concrete virtue. (All said jokingly–I’m the most impatient person I know.)

    The point of disconnect is “cataloger’s judgment”—which several of you have commented on. This is a concept we discuss often. The interpretation of the rules, the human response, is not yet programmable—would that be a fair statement?

    Catalogers’ work has to take into account many variables–the system’s needs being just one. Catalogers function as information translators; their work falls at the intersection of the user, the system, the information objects, and the creators/authors of those objects. All need representation (to borrow from the work of my colleague, Brian O’Connor). Yes, the system needs unambiguous detail, but that is very challenging when everything is ambiguous and open to wide interpretation.

    Just for the record—the fundamentals of AACR2 can be learned in a few hours, not the entire set of rules. That takes a while–I’m still working on it.

    Email me about setting up a time to talk about all this. I’m more than happy to discuss and exchange ideas.

  11. Pingback: HotStuff 2.0 » Blog Archive » Word of the Day: “240”

  12. Melanie says:

    Be willing to share, please. I find this discussion quite interesting.

  13. Given that the RDA vocabularies will be in final registered form when RDA is released, it seems to me that there’s plenty for programmers to play with already. Take a look at http://metadataregistry.org/rdabrowse.htm for the current element sets and vocabularies (still being updated, but almost there).

    The group of us that’s working on the RDA registrations are preparing an article which we hope to have published online by the end of the year, which will talk about how all this was done, and perhaps most importantly, WHY.

  14. Irvin Flack says:

    Yes, I didn’t mean to sound that harsh re RDA: I think what has been achieved is fantastic.

    One of RDA’s major difficulties is that, like AACR, it’s a set of content guidelines: just one layer of the stack, and not the bottom layer. Other layers, particularly the domain model and the vocabularies, should have been laid down _before_ the guidelines. Instead, we’ve taken the content guidelines and then extrapolated what the vocabularies and domain model should be. It’s like building a house from the top down. But it’s happening (see the RDA vocabularies), which is the main thing.

    Shawne, I agree about the fundamentals of AACR being quick to learn: but I would say the best way to learn them is not to read AACR. ;-) A good teacher or mentor is the best, or at least a good textbook.

    Having said that, after just using AACR as a tool for years, I actually once read it from front to back like a novel when I had a spare few weeks (long story!). By the end I was left feeling awed by the intellectual achievement it embodies and the years of collective cataloguing wisdom captured in it. But I probably wouldn’t recommend non-cataloguers doing it, unless they were insomniacs.
