Yet Another Defense of FRBR in a Linked Data World

There is periodically some discussion on the nets, including recently on the RDA listserv, where people attack the FRBR Work-Expression-Manifestation-Item model, from what they believe is a forwards-looking linked-data-oriented perspective.

I still think the WEMI model (or ‘ontology’: the model of the ‘things’ to be dealt with in the system or data ecology) is in fact crucial for linked data applications, rather than problematic. (My earlier post on the FRBR model as set theory may be useful for reacquainting yourself with the model, or for seeing how I think it’s most productive to think of it.)

Linked data applications rely on taking data from multiple sources, and being able to tell when it’s about the same ‘thing’.

But what is a ‘thing’, in the ‘bibliographic universe’? 

Different editions/versions of a work may matter.

  • For instance, if there are multiple editions/versions of a book, they may have different pagination. If you care about what’s on “page 123” of a book, it’s crucial to know what version/edition is being talked about. In fact, the FRBR “manifestation” entity is exactly the set of things where (e.g.) word number 4 on page 123 will be identical.
  • Different versions/editions may contain different revised text. For a “close reading” criticism of a text, it may matter exactly what revision of the text it was based on, but not exactly what the pagination was. This set of all printings that share the exact same text is exactly what an ‘expression’ is.
  • Other data may be about the work as a whole, regardless of edition/version/revision, or we may not know exactly what edition/version/revision it is about. We may have a user review of “Hamlet” that doesn’t indicate exactly what edition of Hamlet is being reviewed; it is appropriate to link such a thing to the ‘work’ as a whole, for lack of more precise information if nothing else.
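To make the three cases above concrete, here is a minimal sketch in Python of attaching each kind of statement at the level it actually belongs to. Everything in it is an assumption for illustration: the `Entity` class, the `ex:` identifiers, and the statement strings are made up, and this is not any real FRBR or RDA vocabulary. It only shows that data attached precisely (to a manifestation or expression) can still be rolled up to the work, while data attached only to the work cannot be pushed back down.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """A hypothetical WEMI entity: a level, an identifier, and an optional parent."""
    level: str                       # "work", "expression", or "manifestation"
    ident: str                       # made-up identifier, for illustration only
    parent: "Entity | None" = None   # expression -> work, manifestation -> expression
    statements: list = field(default_factory=list)

    def work(self):
        """Walk up the parent chain to the work this entity belongs to."""
        node = self
        while node.parent is not None:
            node = node.parent
        return node


def statements_about_work(work, entities):
    """Collect every statement attached to the work or to anything beneath it."""
    return [s for e in entities if e.work() is work for s in e.statements]


hamlet = Entity("work", "ex:work/hamlet")
revised_text = Entity("expression", "ex:expr/hamlet-text-a", parent=hamlet)
paperback = Entity("manifestation", "ex:manif/hamlet-paperback", parent=revised_text)

# Each statement goes on the most specific level we actually know about.
hamlet.statements.append("user review, edition unknown")
revised_text.statements.append("close reading of this revision of the text")
paperback.statements.append("citation to page 123")

everything = statements_about_work(hamlet, [hamlet, revised_text, paperback])
```

Rolling up works because every expression and manifestation knows its parent. The page citation stays on the manifestation, so nothing is lost by also being able to ask about “Hamlet” as a whole.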

In a linked data world without the WEMI ontology, we have a mishmash of data and no way to know what ‘thing’ the data is really about. If there’s a citation to page 123 of Hamlet, but it’s just attached to an identifier for “Hamlet” as a whole (the “work”), we have no way of knowing, or even of later establishing, which edition of Hamlet that page 123 belongs to. If it’s instead attached to an identifier for a ‘manifestation’, that identifier can later be linked to the edition in Amazon or a local library catalog or online fulltext etc., even if it wasn’t initially. That establishes what edition everything talking about that particular manifestation identifier is talking about, and allows a user to track down that page 123 citation.
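The “linked later” step can be sketched in a few lines of Python. This is only an illustration under assumptions: the triples are plain tuples rather than real RDF, and the identifiers (`local:manif/X`, `catalog:manif/Y`), the predicate names, and the helper functions are all hypothetical. A small union-find records which identifiers have been declared to name the same thing, and the link is only allowed when both identifiers are known to be at the same WEMI level.

```python
class SameAs:
    """Tiny union-find recording 'identifier A names the same thing as B'."""

    def __init__(self):
        self.parent = {}

    def find(self, ident):
        self.parent.setdefault(ident, ident)
        while self.parent[ident] != ident:
            # Path halving keeps later lookups short.
            self.parent[ident] = self.parent[self.parent[ident]]
            ident = self.parent[ident]
        return ident

    def link(self, a, b):
        self.parent[self.find(a)] = self.find(b)

    def same(self, a, b):
        return self.find(a) == self.find(b)


# Hypothetical statements from two independent sources.
triples = [
    ("local:manif/X", "type", "Manifestation"),
    ("local:manif/X", "citedPage", "123"),
    ("catalog:manif/Y", "type", "Manifestation"),
    ("catalog:manif/Y", "heldBy", "local library"),
    ("ex:work/hamlet", "type", "Work"),
]
links = SameAs()


def level_of(ident):
    """Return the declared WEMI level of an identifier, or None if unknown."""
    for s, p, o in triples:
        if s == ident and p == "type":
            return o
    return None


def link_same_level(a, b):
    """Declare a and b identical, but only if both are known to be the same level."""
    if level_of(a) is None or level_of(a) != level_of(b):
        raise ValueError(f"cannot equate {a} and {b}: levels differ or are unknown")
    links.link(a, b)


def facts_about(ident):
    """All (predicate, object) pairs about ident or anything declared identical to it."""
    return [(p, o) for s, p, o in triples if links.same(s, ident)]


# Asserted later, possibly by someone other than whoever recorded the citation.
link_same_level("local:manif/X", "catalog:manif/Y")
merged = facts_about("catalog:manif/Y")
```

After the link, the page 123 citation and the library holding show up together under either identifier. Trying to equate the manifestation with `ex:work/hamlet` raises an error, which is the point: knowing what level an identifier names is what makes the later reconciliation safe.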

What matters: users’ mental models, or meeting user needs?

It doesn’t in fact matter whether WEMI matches users’ mental models. The bibliographic universe is complex in an abstract way, and users are not used to thinking about it analytically. Different users may have different mental models, and users’ mental models may not be internally consistent or complete or practical for basing actual answers to user questions on.

WE, and our community and tradition, are the experts at thinking analytically and logically about the bibliographic universe. That WEMI is the formalization of a library tradition of modelling the bibliographic universe is not a flaw; it’s to its credit.

Interfaces do not need to present things using the WEMI language or even the structure, and maybe shouldn’t — but it’s that structure that lets us answer user questions and meet user needs ‘under the hood’, and that common language that lets us talk about things ourselves in setting up common systems, using ‘terms of art’ understood by us.

Now, what would matter is if the WEMI ontology is useless or counter-productive for actually serving user needs and answering user questions. But, while it’s surely not perfect, it is clear to me that it is in fact useful and productive and better than anything else we have or are likely to come up with anytime soon for serving user needs and answering user questions.  The linked data world absolutely demands a common shared ontology making sense of the ‘bibliographic universe’, not just a mishmash of mental models where it’s unclear what the ‘things’ being talked about are, and the FRBR WEMI is the best we’ve got, and pretty darn good.

From ontology to formal vocabularies

Now, it may be that the particular vocabulary formalization of FRBR represented by RDA or the RDA formal vocabularies is not right, or has serious problems, or is counter-productive. I am not experienced enough in using those vocabularies or in the actual on-the-ground techniques of linked data to have an opinion one way or another, but it may be. But that would be a criticism of the particular way FRBR was fleshed out and further formalized in RDA and the corresponding vocabularies, not of the WEMI ontology.

Forget the FRBR user tasks if you like

FRBR presents itself as being based on ‘user tasks’ and ‘requirements’. It’s even there in the name, ‘functional requirements’.

It made sense to try to base everything we’re doing on user tasks and ‘functional requirements’; it was a noble effort. But the bibliographic universe and our users’ contexts are enormously complex, and that part may well have failed.

But while it may be ideal to approach things that way (starting from “user tasks”), we can’t always pull it off, and it doesn’t mean we throw up our hands and do nothing, or spend decades trying to get ‘user tasks’ right. We go forward anyway.

I don’t write to defend the FRBR user tasks or ‘functional requirements’; I am agnostic on them.

But the FRBR WEMI ontology is useful anyway. Perhaps the idea that it’s based on user tasks is a fiction that should be dismissed. We can accept that the FRBR WEMI model really comes out of a rigorous analysis and formalization of traditional cataloging mental models of the bibliographic universe as expressed in our metadata. And that’s just fine: for the reasons I try to justify above, it is useful as that, and useful in a linked data universe going forwards as that.

So, if you like, give up on the actual user tasks or functional requirements from FRBR — heck, everyone else has, who actually pays them any attention anyway?

The most useful part of FRBR is actually the WEMI ontology, and its use value is in fact not dependent on the user tasks or ‘functional requirements’, even if the FRBR report itself would have you believe differently.

Or, maybe the user tasks and ‘functional requirements’ are actually useful. I dunno, I’m agnostic. I’m just asserting that the value of the WEMI ontology is clear either way.


It is quite clear to me that the WEMI ontology is not only useful but crucial for a useful linked data environment, and especially for one that preserves the hard-earned and useful semantics in our present data (which DO make a distinction between ‘manifestation’ and ‘work’, although we’ve generally not analyzed ‘expression’, no big deal).  And it continues to dismay and frustrate me that people are so negative towards the FRBR WEMI ontology, thinking they are being linked-data forward-looking.


4 thoughts on “Yet Another Defense of FRBR in a Linked Data World”

  1. I like your willingness to throw out the user tasks but keep the model. I was in the camp of dismissing FRBR because of the shortcomings I see in the user tasks. I still think that focusing on those user tasks was A Bad Thing for libraries going forward because it assumed (or I interpreted it to assume) that future use of library data could be extrapolated from what we’ve done over the last century.

    But I can see where linking data from a review of Hamlet to a given manifestation (say, a recent version starring Ethan Hawke) would be useful in separating it from a review of a Kenneth Branagh production. And it would be more helpful to attach critical works about the text of the play itself to an expression of that work. On a different level, it could also be useful to pull all of those reviews and critical works together in order to say something about the impact of Hamlet overall.

    I can’t get my mind around how this would be set up in linked data (I’m just starting to think along those lines) or how it would be presented to the end user. But thanks for this post (and the earlier one you reference) – very useful and accessible.

  2. Joe, I think it’s just that a manifestation needs an identifier. Then when making a linked data triple, if you want to say something about a manifestation, you use an identifier for a manifestation. If you want to say something about an Expression or a Work, you use an identifier for that instead, and so on.

    Some people might use ‘canonical’ centralized manifestation identifiers, say from OCLC. (OCLC, I have heard, DOES have M, E, and W identifiers internally, although they aren’t distributed much.)

    Other entities might only be able to say “Well, this is about SOME manifestation which we give identifier X to. And we can say that X was published in a certain year from a certain publisher.” And then later, other people (or the same ones) could say that manifestation identifier X is the same thing as manifestation identifier Y from OCLC. (Which is just one of the reasons it’s important to know that X and Y _do_ identify manifestations, so you can later figure out if they identify the same manifestation or not.)

  3. You make an excellent point that linked-data applications stand to benefit enormously from identifiers pointing unambiguously to work-level descriptions, and even more so to descriptions of expressions and manifestations. Linked data could unpack those identifications within a single record to serve a number of distinct purposes specific to either expressions or manifestations, but only if the identifiers clearly link the statements about the resource. If you’ll indulge a little devil’s advocacy, I might take issue with your statement that “Interfaces do not need to present things using the WEMI language or even the structure, and maybe shouldn’t.” Since linked data provides syntax that could be useful for building out the tree of FRBR entity relationships, as you note, I wonder what would be wrong with exposing that structure for users to scan rather than hiding it under the hood. That user who comes looking for any exemplar of the work _Hamlet_ might well benefit from learning what other expressions have been created, leading to finer-scale consideration of which would best suit a particular need. Couldn’t someone who only wants the most easily accessible copy look past all the finer distinctions and grab whatever is closest to hand?
