RDFa :: HTML5 microdata

RDFa and HTML5 microdata are, I think, basically interchangeable.

RDF and microdata both use the same fundamental triple data model. Please note that schema.org is just a specific set of vocabularies that can be used with HTML5 microdata; HTML5 microdata goes beyond this. schema.org is a pretty good microdata tutorial though, if you remember you don’t have to use its vocabularies.  Here’s the actual microdata spec. Here’s a good microdata tutorial that pre-dates schema.org and is not schema.org-specific.

You can take pretty much anything that’s RDF, from any vocabularies, and use an RDFa-style approach to express (basically) the same semantics in HTML5 microdata instead.

This is a good thing for RDF, because there’s no good way to do RDFa in HTML (or anything but XHTML, which is basically an abandoned approach; RDFa needs XML namespaces).  You can go from (any) HTML5 microdata to RDF too, although there are a couple of gaps I’ll discuss at the end.

First, let’s show how you’d express RDFa-style RDF semantics in HTML5 microdata. Let’s take the complete example from the RDFa Wikipedia article, as it’s small but makes us actually use a pretty complete complement of microdata features. There are in fact a couple of weird details I’m not sure about.

<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml"
    xmlns:foaf="http://xmlns.com/foaf/0.1/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    version="XHTML+RDFa 1.0" xml:lang="en">
  <head>
    <title>John's Home Page</title>
    <base href="http://example.org/john-d/" />
    <meta property="dc:creator" content="Jonathan Doe" />
    <link rel="foaf:primaryTopic" href="http://example.org/john-d/#me" />
  </head>
  <body about="http://example.org/john-d/#me">
    <h1>John's Home Page</h1>
    <p>My name is <span property="foaf:nick">John D</span> and I like
      <a href="http://www.neubauten.org/" rel="foaf:interest"
        xml:lang="de">Einstürzende Neubauten</a>.
      My <span rel="foaf:interest" resource="urn:ISBN:0752820907">favorite
      book is the inspiring <span about="urn:ISBN:0752820907"><cite
      property="dc:title">Weaving the Web</cite> by
      <span property="dc:creator">Tim Berners-Lee</span></span></span>.
    </p>
  </body>
</html>

Here’s the same thing, using the same vocabularies, with HTML5 microdata. (Yes, contrary to some belief, you can mix and match more than one vocabulary in microdata too, although you’ve got to spell out the complete URI for all but one in any given scope.)

<html lang="en">
  <head>
    <title>John's Home Page</title>
    <base href="http://example.org/john-d/" />
    <link rel="http://xmlns.com/foaf/0.1/primaryTopic" href="http://example.org/john-d/#me" />
  </head>
  <body itemscope itemtype="http://purl.org/dc/elements/1.1/" itemid="http://example.org/john-d/#me">
    <h1>John's Home Page</h1>
    <p>My name is <span itemprop="http://xmlns.com/foaf/0.1/nick">John D</span> and I like
      <a href="http://www.neubauten.org/" itemprop="http://xmlns.com/foaf/0.1/interest"
        lang="de">Einstürzende Neubauten</a>.
      <span itemscope itemtype="http://purl.org/dc/elements/1.1/" itemprop="http://xmlns.com/foaf/0.1/interest" itemid="urn:ISBN:0752820907">
      My favorite
      book is the inspiring <cite
      itemprop="title">Weaving the Web</cite> by
      <span itemprop="creator">Tim Berners-Lee</span></span>.
    </p>
  </body>
</html>
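
To sanity-check that the microdata markup really carries the same triples as the RDFa, here is a rough Python sketch that walks microdata and prints subject/predicate/object triples. It is not a spec-compliant microdata parser. The names `MicrodataTriples` and `extract_triples` are made up for illustration, it assumes well-formed markup with explicit closing tags, it treats a bare itemprop as itemtype + name (the convention used in this post, not something the microdata spec mandates), and it labels any item without an itemid as a blank node (`_:b0`, `_:b1`, ...).

```python
from html.parser import HTMLParser
from itertools import count
from urllib.parse import urlparse

# Elements that never get a closing tag, so they must not affect nesting depth.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class MicrodataTriples(HTMLParser):
    """Hypothetical helper: collect (subject, predicate, object) triples."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.triples = []
        self.items = []     # stack of currently open itemscope elements
        self.pending = []   # text-valued properties still being read
        self.depth = 0
        self.blank_ids = count()

    def predicate(self, name):
        # Assumed convention: an absolute URL is used as-is, a bare name is
        # appended to the itemtype of the enclosing item.
        if urlparse(name).scheme or not self.items:
            return name
        return self.items[-1]["vocab"] + name

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag not in VOID:
            self.depth += 1
        prop = attrs.get("itemprop")
        pred = self.predicate(prop) if prop else None
        parent = self.items[-1]["subject"] if self.items else None
        if "itemscope" in attrs:
            # No itemid means a blank node; give it a local label.
            subject = (attrs.get("itemid") or "").strip() or f"_:b{next(self.blank_ids)}"
            if pred and parent:
                self.triples.append((parent, pred, subject))
            self.items.append({"subject": subject,
                               "vocab": attrs.get("itemtype") or "",
                               "depth": self.depth})
        elif pred and parent:
            if tag in ("a", "area", "link") and attrs.get("href"):
                self.triples.append((parent, pred, attrs["href"]))
            elif tag == "meta":
                self.triples.append((parent, pred, attrs.get("content") or ""))
            elif tag not in VOID:
                self.pending.append({"subject": parent, "pred": pred,
                                     "depth": self.depth, "text": []})

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if self.pending and self.pending[-1]["depth"] == self.depth:
            done = self.pending.pop()
            text = " ".join("".join(done["text"]).split())
            self.triples.append((done["subject"], done["pred"], text))
        if self.items and self.items[-1]["depth"] == self.depth:
            self.items.pop()
        self.depth -= 1

    def handle_data(self, data):
        for prop in self.pending:
            prop["text"].append(data)


def extract_triples(markup):
    parser = MicrodataTriples()
    parser.feed(markup)
    parser.close()
    return parser.triples


# The <body> of the microdata example, with closing tags spelled out.
SAMPLE = """
<body itemscope itemtype="http://purl.org/dc/elements/1.1/" itemid="http://example.org/john-d/#me">
  <p>My name is <span itemprop="http://xmlns.com/foaf/0.1/nick">John D</span> and I like
    <a href="http://www.neubauten.org/" itemprop="http://xmlns.com/foaf/0.1/interest"
       lang="de">Einst&uuml;rzende Neubauten</a>.
    <span itemscope itemtype="http://purl.org/dc/elements/1.1/"
          itemprop="http://xmlns.com/foaf/0.1/interest" itemid="urn:ISBN:0752820907">
      My favorite book is the inspiring <cite itemprop="title">Weaving the Web</cite> by
      <span itemprop="creator">Tim Berners-Lee</span></span>.</p>
</body>
"""

if __name__ == "__main__":
    for triple in extract_triples(SAMPLE):
        print(triple)
```

Note that this only sees triples inside an itemscope, so the `<link rel=...>` in the `<head>` is not picked up at all, which is one of the gaps discussed below.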

Mismatches and missing semantics

While the fundamental approach is compatible, there are a few mismatches, and some semantics are lost or less clear in html5 microdata. Here are some I feel like noting; there may be others.

  • I’m not sure I did the right thing with the <link> in the <head> section. html5 has kind of an odd fork in it between maintaining more or less backwards compatibility with old-style <link> and <meta> (in <head>, using ‘rel’), and microdata style (in <body>, using ‘itemprop’).  There are weird things with ‘rel’ only being allowed in certain places and ‘itemprop’ in others; you also are never supposed to have both ‘rel’ and ‘itemprop’.  So anyway, I’m not sure what the proper way is, in microdata, to express a relationship with the document itself as the subject, and I may have gotten something wrong here.
  • RDFa uses XML’s namespaces to express vocabularies. RDFa’s namespace+name is analogous to microdata’s itemtype+itemprop.  But:
    • In microdata, you can do something dear to RDF’s heart, and express the predicate URL as a literal absolute URL  — which is what you have to do to mix namespaces/vocabularies, and that’s really just fine. You can also do the equivalent of a namespace (in an itemtype) and a non-URI bare name belonging to that namespace (in an itemprop), but you only get one namespace at a time like this.
    • But also, RDFa, via XML, is quite clear that you concatenate a namespace and a bare name to get the complete URI. We used this same convention when putting our RDFa into microdata, which works because itemtypes are always URIs too. But it’s just a convention; microdata isn’t clear about that, and microdata examples often use itemtype URIs that clearly weren’t intended to be used like this. Take schema.org: itemtype="http://schema.org/Book" + itemprop="bookFormat" concatenated == "http://schema.org/BookbookFormat". Um, that’s not quite sensible, not what anyone’s looking for… although it is a legal URI….
  • microdata makes it a lot ‘easier’ to use what the RDFistas call ‘blank nodes’: nodes whose ‘subject’ lacks a specified URI. Idiomatic microdata actually generally has a bunch of those, including the top-level one(s).   The microdata spec tries to tell you that you can only use an `itemid` for certain vocabularies that establish its use; ideally, I think this would be opened up, and even encouraged. The semantics should be made more clearly compatible with RDF: it should be made clear that the itemid is an identifier for the ‘itemscope’d thing, that is, the ‘subject’ URI of any itemprops in that itemscope.
    • I personally think allowing idiomatic blank nodes is a good thing for microdata, making it more usable, letting people get started with the minimal semantics for their use cases, not making them spend time on metadata design/control they don’t need yet.  Even if RDFistas disagree, I suggest they focus on making it easier to avoid blank nodes — more idiomatic, more encouraged by docs, more generally legal — and give up on making it hard or impossible to have blank nodes in html5 microdata.
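
The itemtype+itemprop concatenation issue is easy to see in a few lines of Python. This is only a sketch of the convention used in this post, not anything the microdata spec defines, and `predicate_uri` is a made-up name:

```python
from urllib.parse import urlparse


def predicate_uri(itemtype, itemprop):
    """Resolve an itemprop to a full predicate URI (assumed convention).

    An itemprop that is already an absolute URL is used as-is; a bare name
    is appended to the itemtype, the way RDFa appends a name to a namespace.
    """
    if urlparse(itemprop).scheme:
        return itemprop
    return itemtype + itemprop


if __name__ == "__main__":
    # Works when the itemtype was chosen to act like a namespace.
    print(predicate_uri("http://purl.org/dc/elements/1.1/", "title"))
    # Full URLs let you mix in a second vocabulary.
    print(predicate_uri("http://purl.org/dc/elements/1.1/",
                        "http://xmlns.com/foaf/0.1/nick"))
    # Not so sensible when the itemtype names a class, as on schema.org.
    print(predicate_uri("http://schema.org/Book", "bookFormat"))
```

The first two calls give the Dublin Core and FOAF predicates you would expect; the last one gives the legal but not very useful "http://schema.org/BookbookFormat".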

Whither RDF/RDFa

(That’s “whither”, not “wither”. Hopefully).

There are probably other rough spots than the ones I’ve identified. And the ones I mentioned include some tough ones (the itemtype+itemprop==URI issue).

But by and large, HTML5 microdata’s fundamental model is RDF compatible.  Hopefully the RDFistas are focused on figuring out how to lessen the impedance mismatches, if necessary by lobbying the html5 working group to make minimal interventions.  Hopefully they’re not still stuck on an xhtml/rdfa/why-didn’t-they-do-things-our-way train, because that train isn’t leaving the station.  Instead though, they can contribute to sanding off a few rough spots in microdata to make it quite capable of doing what they want (and, if they’re right, what everyone else will eventually realize they want too). Work on tools to turn microdata into RDF, too, hopefully.

microdata could actually be a great thing for RDF.  If handled correctly, it should be possible to express full RDF semantics in microdata; microdata can be the RDF-in-HTML-markup standard that RDFa wanted to be. (microdata’s designers clearly knew about RDF/RDFa and were influenced by it.) It’s also possible to leave a lot of semantics out when writing microdata, but often in ways you could do with RDF/RDFa too (lots of blank nodes, etc.); RDF/RDFa just tries to make it inconvenient and non-idiomatic.

While the RDFistas may be ruing that microdata makes it so easy to not have completely specified triples with no blank nodes everywhere, I think the flip side of this is actually what will allow it to possibly get more uptake, and be an easy start on the road to RDF, if RDF plays its cards right.   That’s because you have to think through the complete vocabularies and semantics less: you can get started with just the semantics you need, and not be forced to do more up-front metadata design than you need for your identified use cases, or more than you can afford or have the skills to do. That, and some of the immediate use cases in ‘Google will use it!’ of course. But if Google had tried to say they used RDFa (didn’t they once, maybe, sort of?), I don’t think it would have gone anywhere; RDFa is just too overwhelming.

