More struggles with MARC: Indexing 505 contents notes as title/author?

So our catalog, like most academic library catalogs, has a number of edited collections: books that are collections of scholarly essays by various people.

For instance: Global critical race feminism : an international reader / edited by Adrien Katherine Wing ; foreword by Angela Y. Davis.

One of the essays in there, chosen just as an example, is “Holding Up More Than Half the Sky: Marketization and the Status of Women in China / Anna M. Han”

The MARC record we have for this book has the table of contents transcribed in a 505 field (not guaranteed, but most of the edited books in our catalog seem to have one), and in addition has the title and author components of each essay in the table of contents broken out into separate, targeted subfields (of our 505s, some do this and some don’t; I’m not sure which is more common).

For instance:

505 [...] g| 26. t| Holding Up More Than Half the Sky: Marketization and the Status of Women in China / r| Anna M. Han -- [...]

The chapter title, “Holding Up…”, is in subfield “t” for “title”, followed by the chapter’s author in subfield “r” (officially “statement of responsibility”). (The chapter numbers transcribed from the table of contents page are in subfield g, “miscellaneous information”.)
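To make that subfield structure concrete, here’s a minimal Python sketch (not SolrMarc, and not our actual indexing code) that groups an enhanced 505’s subfields into one record per contents entry. It assumes the field has already been parsed into an ordered list of (code, value) pairs, and that each entry ends with the ISBD " --" separator; both that representation and the `split_505_entries` name are hypothetical.

```python
from typing import Dict, List, Tuple

# One MARC field's subfields as ordered (code, value) pairs (assumed shape).
Subfields = List[Tuple[str, str]]


def _strip_isbd(text: str) -> str:
    """Drop the trailing " --" entry separator and the " /" that
    separates a title from its statement of responsibility."""
    text = text.strip()
    if text.endswith("--"):
        text = text[:-2].rstrip()
    if text.endswith("/"):
        text = text[:-1].rstrip()
    return text


def split_505_entries(subfields: Subfields) -> List[Dict[str, str]]:
    """Group an enhanced 505 into one dict per contents entry, keyed by
    subfield code ("g", "t", "r"). An entry ends at a value ending in "--"."""
    entries: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for code, value in subfields:
        raw = value.strip()
        ends_entry = raw.endswith("--")
        cleaned = _strip_isbd(raw)
        # Repeated codes within one entry (e.g. two $r's) are joined.
        current[code] = f"{current[code]} ; {cleaned}" if code in current else cleaned
        if ends_entry:
            entries.append(current)
            current = {}
    if current:  # the last entry usually has no trailing "--"
        entries.append(current)
    return entries


essay_505 = [
    ("g", "26."),
    ("t", "Holding Up More Than Half the Sky: Marketization and the Status of Women in China /"),
    ("r", "Anna M. Han --"),
]
print(split_505_entries(essay_505))
```

Run against the single entry shown above, this yields one dict with keys `g`, `t`, and `r`, with the trailing " /" and " --" punctuation removed.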

Now, in MARC, a cataloger could add a controlled/authorized author and/or title heading for each individual chapter ‘analytic’ in a MARC 7xx, but this is very rarely done. More common is just the transcribed table of contents in a 505, often using these $t and $r subfields to mark up the titles and authors semantically.

Our typical legacy catalogs would index an appropriate 7xx in the title and/or author indexes, but usually not these 505s.

But when I was setting up our Solr indexes, I thought: if someone searches our catalog with an ‘author’ fielded search for everything by ‘Anna Han’, it would be optimal if they found this book, since it contains an essay by Anna M. Han. And since that’s conveniently marked up as an author in a specific 505$r subfield, hey, it would be easy to do.

Same if someone searches for title: "Holding up more than half the sky", right? They’re looking for the essay, a known title, and we’ve got a book that has that essay in it; wouldn’t it be optimal if their query successfully matched?

So I initially included 505$r’s in my author index, although ‘boosted’ lower than 1xx/7xx controlled creator headings (just like other transcribed authors/responsible parties from 245$c). And I included 505$t’s in my title index, findable by fielded ‘title’ searches too, although again boosted fairly low.
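One way to get that ‘lower boost’ behavior in Solr is to keep the 505 values in their own fields and weight them at query time with the edismax `qf` parameter, which takes space-separated `field^boost` pairs. The sketch below assumes the MARC values have already been extracted into a plain dict; the field names (`title_505`, `author_505`, etc.), the dict keys, and the boost numbers are all hypothetical, and giving 505$r the same weight as 245$c is just one plausible reading of “boosted lower, like 245$c”.

```python
from typing import Dict, List

# Hypothetical Solr field names and boosts. Keeping 505 values in their own
# fields lets them be weighted separately at query time.
TITLE_QF = "title_main^100 title_505^5"
AUTHOR_QF = "author_main^100 author_245c^20 author_505^20"


def solr_fields(extracted: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map pre-extracted MARC values (assumed keys like "245a", "505t")
    to the hypothetical Solr fields named in TITLE_QF and AUTHOR_QF."""
    return {
        "title_main": extracted.get("245a", []),
        "title_505": extracted.get("505t", []),
        "author_main": extracted.get("1xx", []) + extracted.get("7xx", []),
        "author_245c": extracted.get("245c", []),
        "author_505": extracted.get("505r", []),
    }


def fielded_search_params(query: str, field: str) -> Dict[str, str]:
    """Build edismax request params for a fielded 'title' or 'author' search."""
    qf = {"title": TITLE_QF, "author": AUTHOR_QF}[field]
    return {"defType": "edismax", "q": query, "qf": qf}


doc = solr_fields({
    "245a": ["Global critical race feminism :"],
    "505t": ["Holding Up More Than Half the Sky: Marketization and the Status of Women in China"],
    "505r": ["Anna M. Han"],
})
params = fielded_search_params("Anna Han", "author")
print(doc["author_505"], params["qf"])
```

Because the boosts live in the query parameters rather than in the index, they can be retuned without reindexing.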

However, this causes a problem: for books that are not edited collections but also have chapter titles transcribed in a 505, 505$t is sometimes used for those chapter titles too.

For instance, check out We the people : an introduction to American politics, with its doozy of a transcribed table of contents, each individual chapter title in a separate 505$t.

505 [...] t| Equality -- t| Democracy -- t| Liberty, Equality, and Democracy in Practice -- [...]

No $r’s, but lots of $t’s. It makes less sense for a query for title: "Liberty, Equality, and Democracy" to match this book just because it matches the name of one of its chapters; that isn’t what we want out of a title search. Even worse, a query for title: labor african nation-building matches that book, because the words “labor”, “african”, “nation”, and “building” all appear in one of the voluminous chapter names transcribed in 505$t’s.
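Why does a query like that match? Roughly: a non-phrase query that requires all of its terms, run against a multi-valued text field, only asks whether each term occurs somewhere in the field, not whether the terms form a title. The toy Python sketch below approximates that with plain sets (no stemming, stopwords, or scoring, ASCII-only tokenizing, and an assumed all-terms-required query; `tokens` and `all_terms_match` are hypothetical names), using the chapter headings quoted above.

```python
import re
from typing import Iterable, Set


def tokens(text: str) -> Set[str]:
    """Lowercase and split on anything that isn't an ASCII letter or digit,
    so "nation-building" becomes {"nation", "building"}."""
    return {t for t in re.split(r"[^0-9a-z]+", text.lower()) if t}


def all_terms_match(query: str, field_values: Iterable[str]) -> bool:
    """Toy model of an all-terms-required, non-phrase query against a
    multi-valued field: each query term only has to occur in *some* value."""
    pooled: Set[str] = set()
    for value in field_values:
        pooled |= tokens(value)
    return tokens(query) <= pooled


# The 505$t chapter headings quoted above for "We the people".
chapters_505t = ["Equality", "Democracy",
                 "Liberty, Equality, and Democracy in Practice"]

print(all_terms_match("Liberty, Equality, and Democracy", chapters_505t))  # True
print(all_terms_match("equality democracy", ["Equality", "Democracy"]))    # True
print(sorted(tokens("nation-building")))  # ['building', 'nation']
```

Note the second check: the terms can even come from different 505$t values. Phrase queries behave differently; Solr text field types usually set a `positionIncrementGap` so that a phrase can’t span two values of a multi-valued field.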

And in fact, this kind of false positive for title searches, matching on chapter headings in 505$t’s, is one of the main categories of “why the heck is this matching?” questions I get from user-facing librarians.

Sometimes, even for general ‘keyword anywhere’ searches, treating 505$t’s as ‘title’ results in weird relevancy ranking, since it boosts those chapter headings, potentially wrongly.

Other times, the questionable case reported to me is a known-item fielded title search where, invariably, the ‘correct’ hit was first on the list, but some other item that didn’t seem like a title match at all was lower down, again because of spurious “title” matches on chapter headings transcribed in a 505. It’s unclear to me how much this actually bothers users: the wanted item was first, and I’m not sure how often users use fielded searching anyway. But it clearly disconcerts user-facing librarians; it was a very popular category of relevancy-ranking question.

I think the ideal solution would actually be to index 505$t as ‘title’ only when it is followed by its own 505$r, or, more loosely, only in 505s that contain at least one $r. The 505$t’s used for individual essays in an edited collection almost always have corresponding $r’s, and the 505$t’s used for chapter headings in a monograph almost never do.
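Here’s a minimal Python sketch of both variants, again assuming the 505 has been parsed into ordered (code, value) pairs (a hypothetical representation, not SolrMarc’s API). The stricter function reads “a subsequent 505$r” as an $r that appears after the $t and before the next $t; that interpretation is an assumption. ISBD punctuation cleanup is left out.

```python
from typing import List, Optional, Tuple

# Ordered (subfield code, value) pairs for one 505 field (assumed shape).
Subfields = List[Tuple[str, str]]


def titles_followed_by_r(subfields: Subfields) -> List[str]:
    """Stricter variant: keep a $t only if an $r follows it before the next $t."""
    kept: List[str] = []
    pending: Optional[str] = None  # most recent $t still waiting for an $r
    for code, value in subfields:
        if code == "t":
            pending = value  # an earlier $t that never got an $r is dropped
        elif code == "r" and pending is not None:
            kept.append(pending)
            pending = None
    return kept


def titles_if_any_r(subfields: Subfields) -> List[str]:
    """Looser variant: keep every $t, but only in a 505 with at least one $r."""
    if not any(code == "r" for code, _ in subfields):
        return []
    return [value for code, value in subfields if code == "t"]


essay_505 = [
    ("g", "26."),
    ("t", "Holding Up More Than Half the Sky: Marketization and the Status of Women in China /"),
    ("r", "Anna M. Han --"),
]
chapters_505 = [
    ("t", "Equality --"),
    ("t", "Democracy --"),
    ("t", "Liberty, Equality, and Democracy in Practice --"),
]

print(titles_followed_by_r(essay_505))
print(titles_followed_by_r(chapters_505))  # []
print(titles_if_any_r(chapters_505))       # []
```

The looser variant is simpler, but it would also index any unattributed $t’s that happen to share a 505 with an attributed one.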

Even if we did this ‘ideal’ solution, we might miss some things we wanted to index. For instance, with a trilogy by one author combined in one book, we’d want to index the individual titles as titles, but they probably won’t have separate 505$r authors listed. In such cases, though, there will often be 7xx ‘analytic’ titles; perhaps catalogers are more likely to add 7xx analytics in such cases. But not always: a title search for “fellowship of the ring” wouldn’t find this record for the Lord of the Rings trilogy, which has none. Oh well, we can’t have everything. (Although actually, in this case the 505 doesn’t even use $t, so it wouldn’t have been indexed even with our unrestricted 505$t indexing; that one is just left out due to insufficient cataloging either way.)

However, with SolrMarc, the tool we use for indexing MARC into Solr, it’s a bit painful to implement that kind of custom logic. I’d have to take my whole title indexing configuration, currently expressed purely as configuration, and move the whole thing to custom Java (or BeanShell) code, which tends to be harder to understand. (My experience with SolrMarc sure tells me what I’d do differently if I were to work on a next-generation replacement for it. Maybe some day.)

So I’m not completely sure yet if I’m going to implement that kind of custom logic, or instead just remove 505$t from my title index entirely. If I remove it entirely, it will unfortunately mean that a search for title:"Holding Up More Than Half the Sky" author:Han will no longer find the “Global critical race feminism” reader, even though there is an essay with that title/author in the work. We’ll see.

This entry was posted in General.
