Mapping Microdata to RDF

From W3C Wiki

This page describes how microdata content can be consumed by a consumer whose back-end systems are based on an RDF (or RDF-like) model, as part of the work of the HTML Data TF.

Transformation description moved to ReSpec document

Property URI generation

Microdata allows properties to be specified as simple names, which then have a URI generation rule applied to them. As different vocabularies have different requirements for property URIs, the idea is to provide a way to inform the processor of how to generate URIs, and have the processor fall back to a specific URI generation strategy if no other information is available.

There are different strategies for generating property URIs from names:

hashSlash
Infer the vocabulary from the @itemtype, and append the name to the resulting vocabulary URI. This takes advantage of the typical RDF practice of keeping classes and properties in one flat namespace: removing the class name from the @itemtype URI leaves the vocabulary URI, to which the name can be appended. For example, if the type were http://schema.org/Thing, the property 'name' would be http://schema.org/name. Types are inherited by items without an @itemtype. For an item with no type at all (explicit or inherited), the name is appended to the document base URI, separated by a '#'. For example, if the document had a base of http://example.com/doc, the property 'name' would yield http://example.com/doc#name.
fragID
Append the name to the @itemtype URI, separated by a '#'. For example, given the URI http://microformats.org/profile/hcard as the type, the property 'fn' would result in the following URI: http://microformats.org/profile/hcard#fn. Note that this is only possible if the type does not already include a '#'; a type that contains a '#' would result in an error and/or no generated property URI.
contextual
Append the name to a combination of @itemtype and the property path, and ensure that property URIs generated from names are distinct from explicit property URIs. For example, given the type http://microformats.org/profile/hcard, the property 'fn' would result in http://www.w3.org/1999/xhtml/microdata#http://microformats.org/profile/hcard#:%23fn. However, if there is an intervening item without a type, it would construct a different URI. Assuming an intervening property 'foo', the resulting URI would be http://www.w3.org/1999/xhtml/microdata#http://microformats.org/profile/hcard#:%23foo%20fn.
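
The two simpler strategies can be sketched in a few lines of Python. This is a minimal illustration rather than a normative algorithm: the function names are hypothetical, "removing the class name" is assumed to mean cutting the type URI after its last '/' or '#', and the contextual strategy is left out because its path encoding needs more context than the description above provides.

```python
from typing import Optional


def vocabulary_of(item_type: str) -> str:
    """Drop the class name: keep the type URI up to and including its last '#' or '/'.

    Assumption: the class name is whatever follows the last '/' or '#'.
    """
    cut = max(item_type.rfind("#"), item_type.rfind("/"))
    return item_type[: cut + 1]


def hash_slash(name: str, item_type: Optional[str], base: str) -> str:
    """hashSlash: append the name to the vocabulary URI inferred from the type.

    item_type is the explicit or inherited type, or None if the item has neither,
    in which case the name is appended to the document base URI after a '#'.
    """
    if item_type is None:
        return f"{base}#{name}"
    return vocabulary_of(item_type) + name


def frag_id(name: str, item_type: str) -> Optional[str]:
    """fragID: append '#name' to the type URI.

    Returns None (no property URI) when the type already contains a '#'.
    """
    if "#" in item_type:
        return None
    return f"{item_type}#{name}"


print(hash_slash("name", "http://schema.org/Thing", "http://example.com/doc"))
print(hash_slash("name", None, "http://example.com/doc"))
print(frag_id("fn", "http://microformats.org/profile/hcard"))
```

With the inputs used in the descriptions above, the three calls reproduce http://schema.org/name, http://example.com/doc#name and http://microformats.org/profile/hcard#fn. Returning None for a type that already contains a '#' is one possible reading of "an error and/or no generated property URI"; a processor could equally raise an exception.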

These strategies can be the value of a _propertyURIGeneration_ parameter added to the initial evaluation context.

Vocabulary-specific URI generation

A registry may associate different vocabularies with property URI generation schemes, for example:

<http://schema.org/> a :Vocabulary; :propertyURIscheme :hashSlash .
<http://microformats.org/profile/hcard> a :Vocabulary; :propertyURIscheme :contextual .

A vocabulary-aware processor could then change URI generation schemes when encountering @itemtype URIs contained in the registry, and fall back to a default setting otherwise.
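
A registry lookup with a fallback might look like the following Python sketch. The dictionary layout, the function name and the prefix-matching rule are assumptions made for illustration: the text above only says that a type is "contained in" the registry, so matching on the longest registered prefix is one plausible reading. Scheme names are plain strings taken from the strategy list, and the default is passed in by the caller because no default is fixed here.

```python
from typing import Dict

# Hypothetical in-memory registry: vocabulary URI -> property URI generation scheme.
REGISTRY: Dict[str, str] = {
    "http://schema.org/": "hashSlash",
    "http://microformats.org/profile/hcard": "contextual",
}


def scheme_for(item_type: str, default: str, registry: Dict[str, str] = REGISTRY) -> str:
    """Return the scheme registered for the vocabulary that item_type belongs to.

    Assumption: a type belongs to a vocabulary when the vocabulary URI is a
    prefix of the type URI; the longest matching prefix wins. If nothing
    matches, the caller-supplied default scheme is returned.
    """
    matches = [vocab for vocab in registry if item_type.startswith(vocab)]
    if not matches:
        return default
    return registry[max(matches, key=len)]


print(scheme_for("http://schema.org/NewsArticle", default="fragID"))
print(scheme_for("http://example.org/Widget", default="fragID"))
```

The first call resolves to the scheme registered for http://schema.org/, while the second has no registry entry and returns the supplied default.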

Multiple types for an item

TBD.

Examples

Additional examples can be added here.

An example of a http://schema.org/Organization that is the provider, publisher and copyrightHolder of a http://schema.org/NewsArticle. When converting this sample to RDF, note that the "itemid" of the Organization item identifies the same site as the value of a property that expects a URL (the "url" property of http://schema.org/Thing in this case) on the same item; in the markup below the two differ only by the "www." prefix. The "url" property of http://schema.org/Thing is meant to take a URL as its value, not a http://schema.org/Organization.

<body itemscope="itemscope" itemtype="http://schema.org/NewsArticle"
  itemid="http://www.businesswire.com/news/home/20110106006854/en">
...
<span itemprop="provider publisher copyrightHolder" itemscope="itemscope"
          itemtype="http://schema.org/Organization" itemid="http://businesswire.com">
  <meta itemprop="name" content="Business Wire"/>
  <a itemprop="url" href="http://www.businesswire.com">
     <img itemprop="image"
              src="http://www.businesswire.com/images/Powered-by-Business-Wire.gif"
              title="Business Wire is the leading source for full-text breaking news and press releases, 
              multimedia and regulatory filings for companies and groups throughout the world"
              alt="Powered by Business Wire"/>
  </a>
</span>
...
</body>


The resulting RDF from this example is:

@prefix schema: <http://schema.org/> .

<http://www.businesswire.com/news/home/20110106006854/en> a schema:NewsArticle;
   schema:provider <http://businesswire.com>;
   schema:publisher <http://businesswire.com>;
   schema:copyrightHolder <http://businesswire.com> .

<http://businesswire.com> a schema:Organization;
   schema:image <http://www.businesswire.com/images/Powered-by-Business-Wire.gif>;
   schema:name "Business Wire";
   schema:url <http://www.businesswire.com> .
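
The itemprop attribute on the span in this example lists three space-separated names, so each name maps to its own predicate on the NewsArticle. The Python sketch below illustrates that expansion under the hashSlash strategy; the function name is hypothetical, and the vocabulary URI is assumed to be the type URI cut after its last '/' or '#'.

```python
from typing import List


def expand_itemprop(itemprop: str, item_type: str) -> List[str]:
    """Map each space-separated name in an itemprop value to a property URI.

    Assumption: hashSlash generation, with the vocabulary URI taken as the
    type URI up to and including its last '/' or '#'.
    """
    cut = max(item_type.rfind("#"), item_type.rfind("/"))
    vocabulary = item_type[: cut + 1]
    return [vocabulary + name for name in itemprop.split()]


for uri in expand_itemprop(
    "provider publisher copyrightHolder", "http://schema.org/NewsArticle"
):
    print(uri)
```

Each resulting URI would serve as the predicate of one triple whose subject is the NewsArticle and whose object is the Organization item.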