Mapping Microdata to RDF
This page describes how microdata content can be consumed by a consumer whose back-end systems are based on an RDF (or RDF-like) model, as part of the work of the HTML Data TF.
Transformation description moved to ReSpec document
Property URI generation
Microdata allows properties to be specified as simple names, which then have a URI generation rule applied to them. As different vocabularies have different requirements for property URIs, the idea is to provide a way to inform the processor of how to generate URIs, and have the processor fall back to a specific URI generation strategy if no other information is available.
There are different strategies for generating property URIs from names:
- hashSlash
- Infer the vocabulary from the @itemtype, and append the name to the resulting vocabulary URI. This takes advantage of the typical RDF practice of using a flat namespace for classes and properties: the class name is removed from the @itemtype URI, and the property name is appended in its place. For example, if the type were http://schema.org/Thing, the property 'name' would be http://schema.org/name. Types are inherited by items without an @itemtype. For an item with no type (explicit or inherited), the name is appended to the document base URI instead. For example, if the document had a base of http://example.com/doc, name could be appended along with a '#', yielding http://example.com/doc#name.
- fragID
- Append the name to the @itemtype URI, separated by a '#'. For example, given the URI http://microformats.org/profile/hcard as the type, the property 'fn' would result in the following URI: http://microformats.org/profile/hcard#fn. Note this is only possible if the type does not already include a '#'; a type that does would result in an error and/or no generated property URI.
- contextual
- Append the name to a combination of @itemtype and the property path, and ensure that property URIs generated from names are distinct from explicit property URIs. For example, given the type http://microformats.org/profile/hcard, the property 'fn' would result in http://www.w3.org/1999/xhtml/microdata#http://microformats.org/profile/hcard#:%23fn. However, if the property is reached through an intervening item without a type, a different URI is constructed. Assuming an intervening property 'foo', the resulting URI would be http://www.w3.org/1999/xhtml/microdata#http://microformats.org/profile/hcard#:%23foo%20fn.
These strategies can be the value of a _propertyURIGeneration_ parameter added to the initial evaluation context.
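The three strategies above can be sketched in Python as follows. This is only an illustration inferred from the examples on this page, not a normative algorithm: the function names, the `base` parameter, and the choice to percent-encode the '#' and the space-separated property path in the contextual case are assumptions made to reproduce the example URIs.

```python
from urllib.parse import quote

# Prefix taken from the contextual examples on this page.
MICRODATA_PREFIX = "http://www.w3.org/1999/xhtml/microdata#"


def hash_slash(name, item_type=None, base=""):
    """Drop the class name from the type URI and append the property name.

    Untyped items fall back to the document base URI plus '#'.
    (Hypothetical helper; `base` stands for the document base URI.)
    """
    if item_type is None:
        return f"{base}#{name}"
    # Keep everything up to and including the last '/' or '#'.
    cut = max(item_type.rfind("/"), item_type.rfind("#"))
    return item_type[:cut + 1] + name


def frag_id(name, item_type):
    """Append the name to the type URI as a fragment identifier."""
    if "#" in item_type:
        # The page allows "an error and/or no generated property URI";
        # this sketch chooses to raise.
        raise ValueError("type URI already contains '#'")
    return f"{item_type}#{name}"


def contextual(path, item_type):
    """Combine the type with the property path (a list of names).

    Assumption: '#' plus the space-joined path is percent-encoded,
    which matches the two example URIs above.
    """
    encoded = quote("#" + " ".join(path), safe="")
    return f"{MICRODATA_PREFIX}{item_type}#:{encoded}"


HCARD = "http://microformats.org/profile/hcard"
print(hash_slash("name", "http://schema.org/Thing"))
print(hash_slash("name", base="http://example.com/doc"))
print(frag_id("fn", HCARD))
print(contextual(["fn"], HCARD))
print(contextual(["foo", "fn"], HCARD))
```

A processor could select one of these functions based on the _propertyURIGeneration_ value in its evaluation context.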
Vocabulary-specific URI generation
A registry may associate different vocabularies with property URI generation schemes, for example:
<http://schema.org/> a :Vocabulary;
  :propertyURIscheme :slashHash .
<http://microformats.org/profile/hcard> a :Vocabulary;
  :propertyURIscheme :contextual .
A vocabulary-aware processor could then switch URI generation schemes when it encounters an @itemtype URI contained in the registry, and fall back to a default setting otherwise.
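A minimal sketch of such a lookup, in Python, might look like the following. The in-memory dictionary, the `scheme_for` function, the longest-prefix matching rule, and the `fragID` default are all assumptions made for illustration; this page does not define how a registry is stored or matched.

```python
# Hypothetical in-memory registry mirroring the two entries above.
REGISTRY = {
    "http://schema.org/": "slashHash",
    "http://microformats.org/profile/hcard": "contextual",
}


def scheme_for(item_type, default="fragID"):
    """Return the URI generation scheme for an @itemtype URI.

    Assumption: a vocabulary matches when its URI is a prefix of the
    type URI, and the longest matching vocabulary wins. Types with no
    registered vocabulary use the caller-supplied default.
    """
    matches = [vocab for vocab in REGISTRY if item_type.startswith(vocab)]
    if not matches:
        return default
    return REGISTRY[max(matches, key=len)]


print(scheme_for("http://schema.org/NewsArticle"))
print(scheme_for("http://microformats.org/profile/hcard"))
print(scheme_for("http://example.com/UnknownType"))
```

Prefix matching is used here so that every type under http://schema.org/ resolves to the same vocabulary entry; an exact-match registry would need one entry per type.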
Multiple types for an item
TBD.
Examples
Additional examples can be added here.
An example of a http://schema.org/Organization that is the provider, publisher and copyrightHolder of a http://schema.org/NewsArticle. When converting this sample to RDF, it is worth noting that the "itemid" of the Organization item happens to be the same URL that the same item uses as the value of a property expecting a URL (the "url" property of http://schema.org/Thing in this case). The "url" property of http://schema.org/Thing is meant to take a URL as its value, not a http://schema.org/Organization.
<body itemscope="itemscope" itemtype="http://schema.org/NewsArticle" itemid="http://www.businesswire.com/news/home/20110106006854/en">
  ...
  <span itemprop="provider publisher copyrightHolder" itemscope="itemscope" itemtype="http://schema.org/Organization" itemid="http://businesswire.com">
    <meta itemprop="name" content="Business Wire"/>
    <a itemprop="url" href="http://www.businesswire.com">
      <img itemprop="image" src="http://www.businesswire.com/images/Powered-by-Business-Wire.gif" title="Business Wire is the leading source for full-text breaking news and press releases, multimedia and regulatory filings for companies and groups throughout the world" alt="Powered by Business Wire"/>
    </a>
  </span>
  ...
</body>
The resulting RDF from this example is:
<http://www.businesswire.com/news/home/20110106006854/en> a schema:NewsArticle;
  schema:copyrightHolder <http://www.businesswire.com> .
<http://businesswire.com> a schema:Organization;
  schema:image <http://www.businesswire.com/images/Powered-by-Business-Wire.gif>;
  schema:name "Business Wire";
  schema:url <http://www.businesswire.com> .