Mapping Microdata to RDF
From W3C Wiki
This page describes how microdata content can be consumed by a consumer whose back-end systems are based on an RDF (or RDF-like) model, as part of the work of the HTML Data TF.
Transformation description moved to ReSpec document
Property URI generation
Microdata allows properties to be specified as simple names, which then have a URI generation rule applied to them. As different vocabularies have different requirements for property URIs, the idea is to provide a way to inform the processor of how to generate URIs, and have the processor fall back to a specific URI generation strategy if no other information is available.
There are different strategies for generating property URIs from names:
- Infer the vocabulary from the @itemtype, and append the name to the resulting vocabulary URI. This takes advantage of the typical RDF practice of using a flat namespace for classes and properties: the class name is removed from the @itemtype URI, and the property name is appended to what remains. For example, if the type were
http://schema.org/Thing, the property 'name' would be
http://schema.org/name. Types are inherited by items without an @itemtype. Items with no type (explicit or inherited) append the name to the document base URI. For example, if the document had a base of
http://example.com/doc, name could be appended along with a '#', yielding
- Append the name to the @itemtype URI. For example, given the URI
http://microformats.org/profile/hcard as the type, the property 'fn' would result in the following URI:
http://microformats.org/profile/hcard#fn. Note that this is only possible if the type does not already include a '#'; a type that does would result in an error and/or no generated property URI.
- Append the name to a combination of @itemtype and the property path, and ensure that property URIs generated from names are distinct from explicit property URIs. For example, given the type
http://microformats.org/profile/hcard, the property 'fn' would result in
http://www.w3.org/1999/xhtml/microdata#http://microformats.org/profile/hcard#:%23fn. However, if there is an intervening item without a type, it would construct a different URI. Assuming an intervening property 'foo', the resulting URI would be
Any one of these strategies can be given as the value of a _propertyURIGeneration_ parameter added to the initial evaluation context.
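The first two strategies can be sketched in a few lines of Python. This is only an illustration, not the processing algorithm itself: the function names, the strategy labels "vocabulary" and "type_append", and the choice to apply the document-base fallback to untyped items under both strategies are assumptions made for the example.

```python
from typing import Optional


def vocabulary_uri(itemtype: str) -> str:
    """Remove the class name from a type URI, keeping the trailing '/' or '#'."""
    cut = max(itemtype.rfind("/"), itemtype.rfind("#"))
    return itemtype[: cut + 1]


def property_uri(
    name: str,
    itemtype: Optional[str] = None,
    base: str = "",
    strategy: str = "vocabulary",  # hypothetical label for the first strategy
) -> Optional[str]:
    """Generate a property URI from a simple name, or None if none can be made."""
    if itemtype is None:
        # Untyped item: append '#' and the name to the document base URI.
        # (Assumption: this fallback is used regardless of strategy.)
        return f"{base}#{name}"
    if strategy == "vocabulary":
        # Strategy 1: strip the class name, then append the property name.
        return vocabulary_uri(itemtype) + name
    if strategy == "type_append":
        # Strategy 2: append '#name' to the type, unless it already has a '#'.
        if "#" in itemtype:
            return None
        return f"{itemtype}#{name}"
    raise ValueError(f"unknown strategy: {strategy}")


print(property_uri("name", "http://schema.org/Thing"))
# http://schema.org/name
print(property_uri("fn", "http://microformats.org/profile/hcard", strategy="type_append"))
# http://microformats.org/profile/hcard#fn
```

Returning None for a type that already contains a '#' is one way to model "no generated property URI"; a processor could equally raise an error at that point.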
Vocabulary-specific URI generation
A registry may associate different vocabularies with property URI generation schemes, for example:
<http://schema.org/> a :Vocabulary;
  :propertyURIscheme :slashHash .

<http://microformats.org/profile/hcard> a :Vocabulary;
  :propertyURIscheme :contextual .
A vocabulary-aware processor could then change URI generation schemes when encountering @itemtype URIs contained in the registry, and fall back to a default setting otherwise.
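A registry lookup of this kind might look like the following minimal sketch. The scheme names slashHash and contextual come from the registry entries above; the dictionary representation, the longest-prefix matching rule, and the default scheme name "vocabulary" are assumptions made for illustration.

```python
# Hypothetical in-memory form of the registry: vocabulary URI -> scheme name.
REGISTRY = {
    "http://schema.org/": "slashHash",
    "http://microformats.org/profile/hcard": "contextual",
}

DEFAULT_SCHEME = "vocabulary"  # assumed name for the processor's default setting


def scheme_for(itemtype: str, registry=REGISTRY, default: str = DEFAULT_SCHEME) -> str:
    """Return the URI generation scheme registered for an @itemtype URI."""
    # Assumption: a type belongs to a vocabulary when the vocabulary URI is a
    # prefix of the type URI; the longest matching prefix wins.
    matches = [vocab for vocab in registry if itemtype.startswith(vocab)]
    if not matches:
        return default
    return registry[max(matches, key=len)]


print(scheme_for("http://schema.org/NewsArticle"))          # slashHash
print(scheme_for("http://microformats.org/profile/hcard"))  # contextual
print(scheme_for("http://example.org/Unknown"))             # vocabulary
```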
Multiple types for an item
Additional examples can be added here.
An example of a http://schema.org/Organization that is the provider, publisher and copyrightHolder of a http://schema.org/NewsArticle. When converting this sample to RDF, it is worth noting that the "itemid" of the Organization object happens to be the same URL that is used as the value of a property expecting a URL (the "url" property of http://schema.org/Thing in this case) on the same object. The "url" property of http://schema.org/Thing is not meant to take a http://schema.org/Organization as a value, but a URL.
<body itemscope="itemscope" itemtype="http://schema.org/NewsArticle" itemid="http://www.businesswire.com/news/home/20110106006854/en">
  ...
  <span itemprop="provider publisher copyrightHolder" itemscope="itemscope" itemtype="http://schema.org/Organization" itemid="http://businesswire.com">
    <meta itemprop="name" content="Business Wire"/>
    <a itemprop="url" href="http://www.businesswire.com">
      <img itemprop="image" src="http://www.businesswire.com/images/Powered-by-Business-Wire.gif" title="Business Wire is the leading source for full-text breaking news and press releases, multimedia and regulatory filings for companies and groups throughout the world" alt="Powered by Business Wire"/>
    </a>
  </span>
  ...
</body>
The resulting RDF from this example is:
<http://www.businesswire.com/news/home/20110106006854/en> a schema:NewsArticle;
  schema:copyrightHolder <http://www.businesswire.com> .

<http://businesswire.com> a schema:Organization;
  schema:image <http://www.businesswire.com/images/Powered-by-Business-Wire.gif>;
  schema:name "Business Wire";
  schema:url <http://www.businesswire.com> .
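The itemprop attribute on the Organization item in the markup above carries three space-separated names. As a rough sketch, a processor using the vocabulary-based strategy might expand each name into its own property URI as follows; the function name and the hard-coded vocabulary URI are assumptions for illustration only.

```python
def expand_itemprop(itemprop: str, vocabulary: str) -> list:
    """Split an itemprop value on whitespace and build one URI per name."""
    return [vocabulary + name for name in itemprop.split()]


for uri in expand_itemprop("provider publisher copyrightHolder", "http://schema.org/"):
    print(uri)
# http://schema.org/provider
# http://schema.org/publisher
# http://schema.org/copyrightHolder
```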