HTML Microdata

Abstract

This specification defines new HTML attributes to embed machine-readable data in HTML documents in a style similar to RDFa. It is compatible with JSON, and can be written in a style which is convertible to RDF, although two-way conversion is not lossless.

4.1 Overview

This section is non-normative.

Sometimes, it is desirable to annotate content with specific machine-readable labels. For example, search engines can better identify page content using schema.org annotations, and content management systems can find and use information from documents, if it is marked up in a known way.

Microdata provides a simple mechanism to label content in a document, so it can be processed as a set of items described by name-value pairs.

Each name-value pair identifies a property of the item, and a value of that property.

Figure 1 A common way to represent items, properties and values graphically

The value of a property may be an item.

4.2 The basic syntax

This section is non-normative.

Items and properties are generally represented by regular elements.

The itemscope attribute creates an item.

The itemprop attribute on a descendant of the element that creates an item identifies a property of that item. Typically, the text content of the element with the itemprop attribute is the value of that property.

Here there are two items, each of which has the property "name":

<div itemscope>
 <p>My name is
  <span itemprop="name">Elizabeth</span>.</p>
</div>

<div itemscope>
 <p>My name is 
  <span itemprop="name">Daniel</span>.</p>
</div>

Figure 2 The example represented graphically: two items, each with a value for the property name

Markup other than microdata attributes has no effect on microdata.

These two examples are exactly equivalent, at a microdata level, to the previous two examples respectively:

<div itemscope>
 <p>My <em>name</em> is
  <span itemprop="name">E<strong>liz</strong>abeth</span>.</p>
</div>

<section>
 <div itemscope>
  <aside>
   <p>My name is
    <span itemprop="name"><a href="/?user=daniel">Daniel</a></span>.</p>
  </aside>
 </div>
</section>

Warning

Note that this means any information recorded in markup for purposes such as internationalisation or accessibility will be lost in the conversion to data.

Properties generally have values that are strings.

Here the item has three properties:

<div itemscope>
 <p>My name is <span itemprop="name">Neil</span>.</p>
 <p>My band is called <span itemprop="band">Four Parts Water</span>.</p>
 <p>I am <span itemprop="nationality">British</span>.</p>
</div>

If the text that would normally be the value of a property, such as the element content, is unsuitable for recording the property value, it can be expressed using the content attribute of the element.

Here, the visible content may be added by a script. A microdata processor can extract the content from the content attribute without running scripts. The value of the product-id property for this item is 9678AOU879:

<li itemscope>
 <span itemprop="product-id" content="9678AOU879"
     class="reference--id_code-autoinsert"></span>
</li>

When a string value is in some machine-readable format unsuitable to present as the content of an element, it can be expressed using the value attribute of the data element, as long as there is no content attribute.

Here, there is an item with a property whose value is a product identifier. The identifier is not human-friendly, so instead it is encoded for microdata using the value attribute of the data element, and the product's name is used as the text content of the element that is rendered on the page.

<h1 itemscope>
 <data itemprop="product-id" value="9678AOU879">The Instigator 2000</data>
</h1>

Warning

This will not work if there is a content attribute as well. In the following example, the value of the product-id property is taken from the content attribute, so it will be This one rocks!:

<h1 itemscope>
 <data itemprop="product-id" value="9678AOU879" 
   content="This one rocks!">The Instigator 2000</data>
</h1>

When an itemprop is used on an element that can have a src or href attribute, such as links and media elements, that does not have a content attribute, the value of the name-value pair is an absolute URL based on the src or href attribute (or the empty string if they are missing or there is an error).

In this example, the item has one property, logo, whose value is a URL based on the location of the page, and ending with our-logo.png:

<div itemscope itemtype="https://schema.org/LocalBusiness">
 <img itemprop="logo" src="our-logo.png" alt="Our Company">
</div>
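The resolution of a relative src or href value can be sketched as follows. This is only an illustration: the page location used here is an assumption (the example above does not state one), and resolve_url_property is a hypothetical helper name, not something this specification defines.

```python
from urllib.parse import urljoin


def resolve_url_property(page_url: str, attr_value: str) -> str:
    """Resolve a src/href attribute value against the page URL (hypothetical helper)."""
    return urljoin(page_url, attr_value)


# Assumed page location, chosen only for illustration.
page = "http://example.net/some/dataexample"
logo = resolve_url_property(page, "our-logo.png")
print(logo)  # http://example.net/some/our-logo.png
```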

Warning

Note that accessibility information, such as the alt attribute in the previous example, is ignored. To provide that as a value, repeat it in a content attribute. In the following example, the value of the name property is The Company:

<div itemscope itemtype="https://schema.org/LocalBusiness">
 <img itemprop="name" src="our-logo.png"
     content="The Company" alt="Our Company">
</div>

For numeric data, the meter element and its value attribute can be used instead, as long as there is no content attribute.

Here a rating of 3.5 is given using a meter element.

<div itemscope itemtype="https://schema.org/Product">
 <span itemprop="name">Panasonic White 60L Refrigerator</span>
 <img src="panasonic-fridge-60l-white.jpg" alt="">
  <div itemprop="aggregateRating"
       itemscope itemtype="https://schema.org/AggregateRating">
   <meter itemprop="ratingValue" min=0 value=3.5 max=5>Rated 3.5/5</meter>
   (based on <span itemprop="reviewCount">11</span> customer reviews)
  </div>
</div>

Similarly, for date- and time-related data, the time element and its datetime attribute can be used to specify a specifically formatted date or time, as long as there is no content attribute.

In this example, the item has one property, "birthday", whose value is a date:

<div itemscope>
 I was born on <time itemprop="birthday" datetime="2009-05-10">May 10th 2009</time>.
</div>

Properties can also themselves be groups of name-value pairs, by putting the itemscope attribute on the element that declares the property.

Items that are not part of others are called top-level microdata items.

In this example, the outer item represents a person, and the inner one represents a band:

<div itemscope>
 <p>Name: <span itemprop="name">Amanda</span></p>
 <p>Band: <span itemprop="band" itemscope> <span itemprop="name">Jazz Band</span> (<span itemprop="size">12</span> players)</span></p>
</div>

The outer item here has two properties, "name" and "band". The "name" is "Amanda", and the "band" is an item in its own right, with two properties, "name" and "size". The "name" of the band is "Jazz Band", and the "size" is "12".

The outer item in this example is a top-level microdata item.

Properties that are not descendants of the element with the itemscope attribute can be associated with the item using the itemref attribute. This attribute takes a list of IDs of elements to crawl in addition to crawling the children of the element with the itemscope attribute.

This example is the same as the previous one, but all the properties are separated from their items:

<div itemscope id="amanda" itemref="a b"></div>
<p id="a">Name: <span itemprop="name">Amanda</span></p>
<div id="b" itemprop="band" itemscope itemref="c"></div>
<div id="c">
 <p>Band: <span itemprop="name">Jazz Band</span></p>
 <p>Size: <span itemprop="size">12</span> players</p>
</div>

This gives the same result as the previous example. The first item has two properties, "name", set to "Amanda", and "band", set to another item. That second item has two further properties, "name", set to "Jazz Band", and "size", set to "12".

An item can have multiple properties with the same name and different values.

This example describes an ice cream, with two flavors:

<div itemscope>
 <p>Flavors in my favorite ice cream:</p>
 <ul>
  <li itemprop="flavor">Lemon sorbet</li>
  <li itemprop="flavor">Apricot sorbet</li>
 </ul>
</div>

This thus results in an item with two properties, both "flavor", having the values "Lemon sorbet" and "Apricot sorbet".

An element introducing a property can also introduce multiple properties at once, to avoid duplication when some of the properties have the same value.

Here we see an item with two properties, "favorite-color" and "favorite-fruit", both set to the value "orange":

<div itemscope>
 <span itemprop="favorite-color favorite-fruit">orange</span>
</div>

It's important to note that there is no relationship between the microdata and the content of the document where the microdata is marked up.

The following two examples are exactly the same microdata, because they produce exactly the same information when processed:

<figure>
 <img src="castle.jpeg">
 <figcaption><span itemscope><span
  itemprop="name">The Castle</span></span> (1986)</figcaption>
</figure>

<span itemscope><meta itemprop="name"
 content="The Castle"></span>
<figure>
 <img src="castle.jpeg">
 <figcaption>The Castle (1986)</figcaption>
</figure>

Both have a figure with a caption, and both, completely unrelated to the figure, have an item with a name-value pair with the name "name" and the value "The Castle". In neither case is the image in any way associated with the item.

4.3 Typed items

This section is non-normative.

The examples in the previous section show how information could be marked up on a page that doesn't expect its microdata to be re-used. Microdata is most useful, though, when it is used in contexts where other authors and readers are able to cooperate to make new uses of the markup.

For this purpose, it is necessary to give each item a type, such as "http://example.com/person", or "http://example.org/cat", or "http://band.example.net/". Types are identified as URLs.

The type for an item is given as the value of an itemtype attribute on the same element as the itemscope attribute. The value is a URL, which determines the vocabulary identifier for properties.

Assuming a page at http://example.net/some/dataexample contains the following code:

<section itemscope itemtype="http://example.org/animals#cat">
 <h1 itemprop="name">Hedral</h1>
 <p itemprop="desc">Hedral is a male american domestic
 shorthair, with a fluffy black fur with white paws and belly.</p>
 <img itemprop="img" src="hedral.jpeg" alt="" title="Hedral, age 18 months">
</section>

The item's type is "http://example.org/animals#cat".

In this example the "http://example.org/animals#cat" item has three properties:

http://example.org/animals#name: Hedral
http://example.org/animals#desc: Hedral is a male american domestic shorthair, with a fluffy black fur with white paws and belly.
http://example.org/animals#img: http://example.net/some/hedral.jpeg

The type gives the context for the properties, thus selecting a vocabulary: a property named "class" given for an item with the type "http://census.example/person" might refer to the economic class of an individual, while a property named "class" given for an item with the type "http://example.com/school/teacher" might refer to the classroom a teacher has been assigned.

A vocabulary may define several types. For example, the types "http://example.org/people/teacher" and "http://example.org/people/engineer" could be defined in the same vocabulary. Some properties might not be especially useful in both cases: the "classroom" property might not be meaningful with the "http://example.org/people/engineer" type.

Multiple types from the same vocabulary can be given for a single item by listing the URLs, separated by spaces, in the attribute's value. An item cannot be given two types if they do not use the same vocabulary, however.

4.4 Global identifiers for items

This section is non-normative.

Sometimes, an item gives information about a topic that has a global identifier. For example, books can be identified by their ISBN number, or concepts can be identified by a URL as in [rdf-primer].

The itemid attribute associates an item with a global identifier in the form of a URL.

Here, an item is talking about a particular book:

<dl itemscope
    itemtype="http://vocab.example.net/book"
    itemid="urn:isbn:0-330-34032-8">
 <dt>Title
 <dd itemprop="title">The Reality Dysfunction
 <dt>Author
 <dd itemprop="author">Peter F. Hamilton
 <dt>Publication date
 <dd><time itemprop="pubdate" datetime="1996-01-26">26 January 1996</time>
</dl>

4.5 Selecting names when defining vocabularies

This section is non-normative.

Using microdata means using a vocabulary. For some purposes an ad-hoc vocabulary is adequate, but authors are encouraged to re-use existing vocabularies to make content re-use easier.

When designing new vocabularies, identifiers can be created either using URLs, or, for properties, as plain words (with no dots or colons). For URLs, conflicts with other vocabularies can be avoided by only using identifiers that correspond to pages that the author has control over.

For instance, if Jon and Adam both write content at example.com, at http://example.com/~jon/... and http://example.com/~adam/... respectively, then they could select identifiers of the form "http://example.com/~jon/name" and "http://example.com/~adam/name" respectively.

Properties whose names are just plain words can only be used within the context of the types for which they are intended; properties named using URLs can be reused in items of any type. If an item has no type, and is not part of another item, then if its properties have names that are just plain words, they are not intended to be globally unique, and are instead only intended for limited use. Generally speaking, authors are encouraged to use either properties with globally unique names (URLs) or ensure that their items are typed.

Here, an item in the page http://example.net/some/dataexample is an "http://myvocab.example.org/animals/cat", and most of the properties have names defined in the context of that type. There are also a few additional properties whose names come from other vocabularies.

<section itemscope itemtype="http://myvocab.example.org/animals/cat">
 <h1 itemprop="name http://example.com/fn">Hedral</h1>
 <p itemprop="desc">Hedral is a male american domestic
 shorthair, with a fluffy <span
 itemprop="http://example.com/color">black</span> fur with <span
 itemprop="http://example.com/color">white</span> paws and belly.</p>
 <img itemprop="img" src="hedral.jpeg" alt="" title="Hedral, age 18 months">
</section>

This example has one item with the type "http://myvocab.example.org/animals/cat" and the following properties:

http://myvocab.example.org/animals/name: Hedral
http://example.com/fn: Hedral
http://myvocab.example.org/animals/desc: Hedral is a male american domestic shorthair, with a fluffy black fur with white paws and belly.
http://example.com/color: black
http://example.com/color: white
http://myvocab.example.org/animals/img: http://example.net/some/hedral.jpeg

5. Encoding microdata

5.1 The microdata model

The microdata model consists of groups of name-value pairs known as items.

Each item can have zero or more item types, an optional global identifier, and associated name-value pairs. Each name in the name-value pair is known as a property, and each property has one or more values. Each value is either a string or itself a group of name-value pairs (an item). The names are unordered relative to each other, but if a particular name has multiple values, they do have a relative order.
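One possible in-memory representation of this model is sketched below. The Item class and its add method are hypothetical names introduced for illustration only. Values are kept in a list per name so that multiple values of the same property retain their relative order.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union


@dataclass
class Item:
    """Hypothetical container for one microdata item."""
    types: List[str] = field(default_factory=list)   # item types, possibly empty
    id: Optional[str] = None                         # global identifier, if any
    # Each property name maps to an ordered list of values;
    # a value is either a string or another Item.
    properties: Dict[str, List[Union[str, "Item"]]] = field(default_factory=dict)

    def add(self, name: str, value: Union[str, "Item"]) -> None:
        """Append a value to the named property, preserving value order."""
        self.properties.setdefault(name, []).append(value)


# The person-and-band example from the overview, built by hand.
band = Item()
band.add("name", "Jazz Band")
band.add("size", "12")

person = Item()
person.add("name", "Amanda")
person.add("band", band)
```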

5.2 Items: `itemscope`, `itemtype`, and `itemid`

Every HTML element may have an itemscope attribute specified. The itemscope attribute is a boolean attribute.

An element with the itemscope attribute specified creates a new item, a group of name-value pairs that describe properties, and their values, of the thing represented by that element.

Elements with an itemscope attribute may have an itemtype attribute specified, to give the item types of the item.

The itemtype attribute, if specified, must have a value that is an unordered set of unique space-separated tokens that are case-sensitive, each of which is a valid absolute URL, and all of which are in the same vocabulary. The attribute's value must have at least one token.

The item types of an item are the tokens obtained by splitting the element's itemtype attribute's value on spaces. If the itemtype attribute is missing or parsing it in this way finds no tokens, the item is said to have no item types.

The item types determine the vocabulary identifier. This is a URL that is prepended to property names, which identifies them as part of their vocabulary. The value of the vocabulary identifier for an item is determined as follows:

Let potential values be an empty array of URLs.
Let tokens be the value of the itemtype attribute, split on spaces.
For each value of tokens:

If there is a NUMBER SIGN U+0023 ("#") in the value

Append the substring of the value from the beginning to the first NUMBER SIGN U+0023 ("#") to potential values

Otherwise, if there is a SOLIDUS U+002F ("/") in the value

Append the substring of the value from the beginning to the last SOLIDUS U+002F ("/") to potential values

Otherwise

Append a SOLIDUS U+002F ("/") to the value, and append the resulting string to potential values
If there is only one unique value in potential values return that value. Otherwise return the first item in potential values.
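The steps above could be sketched as follows. Two assumptions are made here and are not stated by the algorithm: the substring is taken to include the "#" or "/" delimiter (this matches expanded names such as http://example.org/animals#name in the earlier examples), and an attribute with no tokens yields None. The function name vocabulary_identifier is hypothetical, and Python's str.split() only approximates splitting on space characters.

```python
from typing import Optional


def vocabulary_identifier(itemtype: str) -> Optional[str]:
    """Sketch of the vocabulary identifier steps for an itemtype value."""
    potential = []
    for token in itemtype.split():
        if "#" in token:
            # Assumption: keep everything up to and including the first "#".
            potential.append(token[: token.index("#") + 1])
        elif "/" in token:
            # Assumption: keep everything up to and including the last "/".
            potential.append(token[: token.rindex("/") + 1])
        else:
            potential.append(token + "/")
    if not potential:
        return None  # assumption: the steps do not cover an empty token list
    # Whether or not all entries agree, the steps return the first entry.
    return potential[0]


print(vocabulary_identifier("http://example.org/animals#cat"))
print(vocabulary_identifier("http://md.example.com/loco http://md.example.com/lighting"))
```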

User agents must not automatically dereference unknown URLs given as item types and property names. These URLs are a priori opaque identifiers.

Note

A specification could define that its item types can be dereferenced to provide the user with help information. Vocabulary authors are encouraged to provide useful information at the given URL, either in prose or a formal language such as RDF. User agents are

The itemtype attribute must not be specified on elements that do not have an itemscope attribute specified.

An item is said to be a typed item when either it has an item type, or it is the value of a property of a typed item. The relevant types for a typed item are the item's item types, if it has any, or else the relevant types of the item for which it is a property's value.

Elements with both itemscope and itemtype attributes may also have an itemid attribute specified, to give a global identifier for the item, so that it can be related to other items elsewhere on the Web, or with concepts beyond the Web such as ISBN numbers for published books.

The itemid attribute, if specified, must have a value that is a valid URL potentially surrounded by space characters.

The global identifier of an item is the value of its element's itemid attribute, if it has one, resolved relative to the element on which the attribute is specified. If the itemid attribute is missing or if resolving it fails, it is said to have no global identifier.

The itemid attribute must not be specified on elements that do not have both an itemscope attribute and an itemtype attribute specified.

This example shows a simple vocabulary used to describe the products of a model railway manufacturer. The vocabulary has just five property names:

product-code: A number that identifies the product in the manufacturer's catalog.
name: A brief description of the product.
scale: One of "HO", "1", or "Z" (potentially with leading or trailing whitespace), indicating the scale of the product.
digital: If present, one of "Digital", "Delta", or "Systems" (potentially with leading or trailing whitespace) indicating that the product has a digital decoder of the given type.
track-type: For track-specific products, one of "K", "M", "C" (potentially with leading or trailing whitespace) indicating the type of track for which the product is intended.

This vocabulary has four defined item types:

http://md.example.com/loco: Rolling stock with an engine
http://md.example.com/passengers: Passenger rolling stock
http://md.example.com/track: Track pieces
http://md.example.com/lighting: Equipment with lighting

Each item that uses this vocabulary can be given one or more of these types, depending on what the product is.

Thus, a locomotive might be marked up as:

<dl itemscope itemtype="http://md.example.com/loco
                        http://md.example.com/lighting">
 <dt>Name:
 <dd itemprop="name">Tank Locomotive (DB 80)
 <dt>Product code:
 <dd itemprop="product-code">33041
 <dt>Scale:
 <dd itemprop="scale">HO
 <dt>Digital:
 <dd itemprop="digital">Delta
</dl>

A turnout lantern retrofit kit might be marked up as:

<dl itemscope itemtype="http://md.example.com/track
                       http://md.example.com/lighting">
 <dt>Name:
 <dd itemprop="name">Turnout Lantern Kit
 <dt>Product code:
 <dd itemprop="product-code">74470
 <dt>Purpose:
 <dd>For retrofitting 2 <span itemprop="track-type">C</span> Track
 turnouts. <meta itemprop="scale" content="HO">
</dl>

A passenger car with no lighting might be marked up as:

<dl itemscope itemtype="http://md.example.com/passengers">
 <dt>Name:
 <dd itemprop="name">Express Train Passenger Car (DB Am 203)
 <dt>Product code:
 <dd itemprop="product-code">8710
 <dt>Scale:
 <dd itemprop="scale">Z
</dl>

Great care is necessary when creating new vocabularies. Often, a hierarchical approach to types can be taken that results in a vocabulary where each item only ever has a single type, which is generally much simpler to manage.

5.3 Properties: the `itemprop` and `itemref` attributes

The itemprop attribute, when added to any HTML element that is part of an item, identifies a property of that item. The attribute must be an unordered set of unique space-separated tokens, representing the case-sensitive names of the properties that it adds. The attribute must contain at least one token.

Each token must be either a valid absolute URL or a string that contains no "." (U+002E) characters and no ":" (U+003A) characters.

Vocabulary specifications must not define property names for Microdata that contain "." (U+002E) characters, ":" (U+003A) characters, nor space characters (defined in [HTML52] as U+0020, U+0009, U+000A, U+000C, and U+000D).

The property names of an element are determined as follows:

Let tokens be the value of the itemprop attribute, split on spaces.
Let properties be an empty array of strings.
For each value of tokens, in order:

If the value is a repeated occurrence of an earlier value

discard it and process the next value

If the value is an absolute URL

append it to properties, then process the next value

Otherwise, if the element is a typed item:

Append the value to the vocabulary identifier for the item. If the resulting value does not match any value in properties, then append it to properties, and process the next value.

Otherwise

append the value to properties and process the next value.
If properties is not empty, return properties.
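A minimal sketch of these steps follows. It assumes the vocabulary identifier has already been determined and is passed in as vocab_id (None for an untyped item). It also approximates "is an absolute URL" by checking for a ":" character, relying on the rule that non-URL property names contain no colons. The function name property_names is hypothetical.

```python
from typing import List, Optional


def property_names(itemprop: str, vocab_id: Optional[str] = None) -> List[str]:
    """Sketch of the property-name steps for one itemprop attribute value."""
    properties: List[str] = []
    seen = set()
    for token in itemprop.split():
        if token in seen:
            continue  # repeated occurrence of an earlier value: discard it
        seen.add(token)
        if ":" in token:
            # Approximation: any token containing ":" is treated as an absolute URL.
            properties.append(token)
        elif vocab_id is not None:
            expanded = vocab_id + token
            if expanded not in properties:
                properties.append(expanded)
        else:
            properties.append(token)
    return properties


print(property_names("name http://example.com/fn",
                     "http://myvocab.example.org/animals/"))
```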

Within an item, the properties are unordered with respect to each other, except for properties with the same name, which are ordered in the order they are given by the algorithm that defines the properties of an item.

In the following example, the "a" property has the values "1" and "2", in that order, but whether the "a" property comes before the "b" property or not is not important:

<div itemscope>
 <p itemprop="a">1</p>
 <p itemprop="a">2</p>
 <p itemprop="b">test</p>
</div>

Thus, the following is equivalent:

<div itemscope>
 <p itemprop="b">test</p>
 <p itemprop="a">1</p>
 <p itemprop="a">2</p>
</div>

As is the following:

<div itemscope>
 <p itemprop="a">1</p>
 <p itemprop="b">test</p>
 <p itemprop="a">2</p>
</div>

Elements with an itemscope attribute may have an itemref attribute specified, to give a list of additional elements to crawl to find the name-value pairs of the item.

The itemref attribute, if specified, must have a value that is an unordered set of unique space-separated tokens that are case-sensitive, consisting of IDs of elements in the same document.

The itemref attribute must not be specified on elements that do not have an itemscope attribute specified.

The preceding example:

<div itemscope>
 <p itemprop="a">1</p>
 <p itemprop="a">2</p>
 <p itemprop="b">test</p>
</div>

Could also be written as follows:

<div id="x">
 <p itemprop="a">1</p>
</div>
<div itemscope itemref="x">
 <p itemprop="b">test</p>
 <p itemprop="a">2</p>
</div>

When an element with an itemprop attribute adds a property to multiple items, the requirement above regarding the tokens applies for each item individually.

For the following code:

<div itemscope itemtype="http://example.com/a" itemref="x"></div>
<div itemscope itemtype="http://example.com/b" itemref="x"></div>
<meta id="x" itemprop="z" content="">

The author should be certain that z is a valid property name for both the http://example.com/a and http://example.com/b vocabularies.

5.4 Values: the `content` attribute

The algorithm to determine the value for a name-value pair is given by applying the first matching case in the following list:

If the element also has an itemscope attribute: The value is the item created by the element.
If the element has a content attribute: The value is the textContent of the element's content attribute.

Note

HTML only allows the content attribute on the meta element. This specification changes the content model to allow it on any element, as a global attribute.

If the element is an audio, embed, iframe, img, source, track, or video element: If the element has a src attribute, let proposed value be the result of resolving that attribute's textContent. If proposed value is a valid absolute URL, the value is proposed value. Otherwise, the value is the empty string.
If the element is an a, area, or link element: If the element has an href attribute, let proposed value be the result of resolving that attribute's textContent. If proposed value is a valid absolute URL, the value is proposed value. Otherwise, the value is the empty string.
If the element is an object element: If the element has a data attribute, let proposed value be the result of resolving that attribute's textContent. If proposed value is a valid absolute URL, the value is proposed value. Otherwise, the value is the empty string.
If the element is a data or meter element: If the element has a value attribute, the value is that attribute's textContent.
If the element is a time element: If the element has a datetime attribute, the value is that attribute's textContent.
Otherwise: The value is the element's textContent.
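The string-valued cases of this list could be sketched as below. The element is modeled by its tag name, an attribute dictionary, its text content and a base URL, all of which are simplifications for illustration. The function name property_value is hypothetical. Two assumptions go beyond the list: a data, meter or time element that lacks its value or datetime attribute falls through to the text content, and "valid absolute URL" is approximated by the presence of a URL scheme. Nested items (the itemscope case) are left to the caller.

```python
from typing import Dict
from urllib.parse import urljoin, urlparse

SRC_ELEMENTS = {"audio", "embed", "iframe", "img", "source", "track", "video"}
HREF_ELEMENTS = {"a", "area", "link"}


def property_value(tag: str, attrs: Dict[str, str], text: str, base_url: str) -> str:
    """Sketch: pick the string value of a property, first matching case wins."""
    if "itemscope" in attrs:
        raise ValueError("value is a nested item; handle it separately")
    if "content" in attrs:
        return attrs["content"]

    # URL property elements: resolve the relevant attribute against the base URL.
    url_attr = None
    if tag in SRC_ELEMENTS:
        url_attr = "src"
    elif tag in HREF_ELEMENTS:
        url_attr = "href"
    elif tag == "object":
        url_attr = "data"
    if url_attr is not None:
        if url_attr not in attrs:
            return ""
        proposed = urljoin(base_url, attrs[url_attr])
        # Approximation of "valid absolute URL": the result has a scheme.
        return proposed if urlparse(proposed).scheme else ""

    if tag in ("data", "meter") and "value" in attrs:
        return attrs["value"]
    if tag == "time" and "datetime" in attrs:
        return attrs["datetime"]
    return text  # default, and assumed fallback for data/meter/time


base = "http://example.net/some/dataexample"  # assumed page location
print(property_value("data", {"value": "9678AOU879"}, "The Instigator 2000", base))
```

Checking the content attribute before any element-specific attribute mirrors the ordering of the list, which is why a data element carrying both value and content yields the content attribute's text.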

The URL property elements are the a, area, audio, embed, iframe, img, link, object, source, track, and video elements.

If a property's value, as defined by the property's definition, is an absolute URL, the property must be specified using a URL property element.

Note

These requirements do not apply just because a property value happens to match the syntax for a URL. They only apply if the property is explicitly defined as taking such a value.

For example, a book about the first moon landing could be called "mission:moon". A "title" property from a vocabulary that defines a title as being a string would not expect the title to be given in an a element, even though it looks like a URL. On the other hand, if there was a (rather narrowly scoped!) vocabulary for "books whose titles look like URLs" which had a "title" property whose content was defined as a URL, then the property would expect the title to be given in an a element (or one of the other URL property elements), because of the requirement above.

5.5 Associating names with items

To find the properties of an item defined by the element root, the user agent must run the following steps. These steps are also used to flag microdata errors.

Let results, memory, and pending be empty lists of elements.
Add the element root to memory.
Add the child elements of root, if any, to pending.
If root has an itemref attribute, split the value of that itemref attribute on spaces. For each resulting token ID, if there is an element in the document whose ID is ID, then add the first such element to pending.
Loop: If pending is empty, jump to the step labeled end of loop.
Remove an element from pending and let current be that element.
If current is already in memory, there is a microdata error; return to the step labeled loop.
Add current to memory.
If current does not have an itemscope attribute, then: add all the child elements of current to pending.
If current has an itemprop attribute specified and has one or more property names, then add current to results.
Return to the step labeled loop.
End of loop: Sort results in tree order.
Return results.
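The crawl above might be sketched as follows, using Python's xml.etree.ElementTree as a stand-in for a real HTML parser. That choice is an assumption made to keep the example self-contained: the markup has to be well-formed XML, so the boolean itemscope attribute is written as itemscope="". The function name find_properties is hypothetical, and microdata errors are only counted rather than reported in detail.

```python
import xml.etree.ElementTree as ET


def find_properties(root, document):
    """Sketch: return (property elements in tree order, microdata error count)."""
    results, memory, pending = [], [root], list(root)
    errors = 0

    # Map each ID to the first element carrying it.
    by_id = {}
    for el in document.iter():
        el_id = el.get("id")
        if el_id is not None and el_id not in by_id:
            by_id[el_id] = el
    for token in root.get("itemref", "").split():
        if token in by_id:
            pending.append(by_id[token])

    while pending:
        current = pending.pop()
        if any(current is seen for seen in memory):
            errors += 1  # element reached twice: microdata error
            continue
        memory.append(current)
        if current.get("itemscope") is None:
            pending.extend(list(current))  # do not descend into nested items
        if current.get("itemprop", "").split():
            results.append(current)

    # Sort results in tree (document) order.
    order = {id(el): i for i, el in enumerate(document.iter())}
    results.sort(key=lambda el: order[id(el)])
    return results, errors


doc = ET.fromstring(
    '<body>'
    '<div id="x"><p itemprop="a">1</p></div>'
    '<div itemscope="" itemref="x">'
    '<p itemprop="b">test</p><p itemprop="a">2</p>'
    '</div>'
    '</body>'
)
item = doc.find(".//div[@itemscope]")
props, errors = find_properties(item, doc)
print([(p.get("itemprop"), p.text) for p in props])
```

The sample document is the itemref example from section 5.3 wrapped in a body element, so the property reached through itemref sorts ahead of the item's own children.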

A document must not contain any items for which the algorithm to find the properties of an item finds any microdata error.

An item is a top-level microdata item if its element does not have an itemprop attribute.

All itemref attributes in a Document must be such that there are no cycles in the graph formed from representing each item in the Document as a node in the graph and each property of an item whose value is another item as an edge in the graph connecting those two items.

A document must not contain an itemprop attribute that would not be a property of any item in that document were the properties all to be determined.

In this example, a single license statement is applied to two works, using itemref from the items representing the works:

<!DOCTYPE HTML>
<html>
 <head>
  <title>Photo gallery</title>
 </head>
 <body>
  <h1>My photos</h1>
  <figure itemscope itemtype="http://n.whatwg.org/work" itemref="licenses">
   <img itemprop="work" src="images/house.jpeg" alt="A white house, boarded up, sits in a forest.">
   <figcaption itemprop="title">The house I found.</figcaption>
  </figure>
  <figure itemscope itemtype="http://n.whatwg.org/work" itemref="licenses">
   <img itemprop="work" src="images/mailbox.jpeg" alt="Outside the house is a mailbox. It has a leaflet inside.">
   <figcaption itemprop="title">The mailbox.</figcaption>
  </figure>
  <footer>
   <p id="licenses">All images licensed under the <a itemprop="license"
   href="http://www.opensource.org/licenses/mit-license.php">MIT
   license</a>.</p>
  </footer>
 </body>
</html>

The above results in two items with the type "http://n.whatwg.org/work", one with:

work: images/house.jpeg
title: The house I found.
license: http://www.opensource.org/licenses/mit-license.php

...and one with:

work: images/mailbox.jpeg
title: The mailbox.
license: http://www.opensource.org/licenses/mit-license.php

6. Converting Microdata to other formats

6.1 JSON

Given a list of nodes nodes in a Document, a user agent must run the following algorithm to extract the microdata from those nodes into a JSON form:

Let result be an empty object.
Let items be an empty array.
For each node in nodes, check if the element is a top-level microdata item, and if it is then get the object for that element and add it to items.
Add an entry to result called "items" whose value is the array items.
Return the result of serializing result to JSON in the shortest possible way (meaning no whitespace between tokens, no unnecessary zero digits in numbers, and only using Unicode escapes in strings for characters that do not have a dedicated escape sequence), and with a lowercase "e" used, when appropriate, in the representation of any numbers. [JSON]

Note

This algorithm returns an object with a single property that is an array, instead of just returning an array, so that it is possible to extend the algorithm in the future if necessary.

When the user agent is to get the object for an item item, potentially together with a list of elements memory, it must run the following substeps:

Let result be an empty object.
If no memory was passed to the algorithm, let memory be an empty list.
Add item to memory.
If the item has any item types, add an entry to result called "type" whose value is an array listing the item types of item, in the order they were specified on the itemtype attribute.
If the item has a global identifier, add an entry to result called "id" whose value is the global identifier of item.
Let properties be an empty object.
For each element element that has one or more property names and is one of the properties of the item item, in the order those elements are given by the algorithm that returns the properties of an item, run the following substeps:
1. Let value be the property value of element.
2. If value is an item, then: If value is in memory, then let value be the string "ERROR". Otherwise, get the object for value, passing a copy of memory, and then replace value with the object returned from those steps.
3. For each name name in element's property names, run the following substeps:
  1. If there is no entry named name in properties, then add an entry named name to properties whose value is an empty array.
  2. Append value to the entry named name in properties.
Add an entry to result called "properties" whose value is the object properties.
Return result.
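
The following sketch is non-normative. It mirrors the substeps above over a simplified in-memory model rather than a real DOM. The Item class, its field names, and the assumption that props is already in the order given by the properties-of-an-item algorithm are all hypothetical conveniences for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Item:
    """Hypothetical stand-in for an element that has an itemscope attribute."""
    types: list = field(default_factory=list)   # item types, in itemtype order
    item_id: Optional[str] = None               # global identifier, if any
    # Each entry is (property names, value); value is a str or a nested Item.
    props: list = field(default_factory=list)


def get_object(item, memory=None):
    result = {}
    if memory is None:
        memory = []
    memory.append(item)
    if item.types:
        result["type"] = list(item.types)
    if item.item_id is not None:
        result["id"] = item.item_id
    properties = {}
    for names, value in item.props:
        if isinstance(value, Item):
            if any(value is seen for seen in memory):
                # The item is already being processed: a cycle.
                value = "ERROR"
            else:
                # Pass a copy so sibling branches do not share memory.
                value = get_object(value, list(memory))
        for name in names:
            properties.setdefault(name, []).append(value)
    result["properties"] = properties
    return result
```

Passing a copy of memory to each nested call means that the same item may legitimately appear in two sibling properties; only an item that contains itself, directly or indirectly, is replaced by the string "ERROR".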

For example, take this markup:

<!DOCTYPE HTML>
<title>My Blog</title>
<article itemscope itemtype="https://schema.org/BlogPosting">
 <header>
  <h1 itemprop="headline">Progress report</h1>
  <p><time itemprop="datePublished" datetime="2013-08-29">today</time></p>
  <link itemprop="url" href="?comments=0">
 </header>
 <p>All in all, he's doing well with his swim lessons. The biggest thing was he had trouble
 putting his head in, but we got it down.</p>
 <section>
  <h1>Comments</h1>
  <article itemprop="comment" itemscope itemtype="https://schema.org/UserComments" id="c1">
   <link itemprop="url" href="#c1">
   <footer>
    <p>Posted by: <span itemprop="creator" itemscope itemtype="https://schema.org/Person">
     <span itemprop="name">Greg</span>
    </span></p>
    <p><time itemprop="commentTime" datetime="2013-08-29">15 minutes ago</time></p>
   </footer>
   <p>Ha!</p>
  </article>
  <article itemprop="comment" itemscope itemtype="https://schema.org/UserComments" id="c2">
   <link itemprop="url" href="#c2">
   <footer>
    <p>Posted by: <span itemprop="creator" itemscope itemtype="https://schema.org/Person">
     <span itemprop="name">Charlotte</span>
    </span></p>
    <p><time itemprop="commentTime" datetime="2013-08-29">5 minutes ago</time></p>
   </footer>
   <p>When you say "we got it down"...</p>
  </article>
 </section>
</article>

The algorithm above would turn it into the following JSON, shown here with whitespace added for readability (supposing that the page's URL was http://blog.example.com/progress-report):

{
  "items": [
    {
      "type": [ "https://schema.org/BlogPosting" ],
      "properties": {
        "headline": [ "Progress report" ],
        "datePublished": [ "2013-08-29" ],
        "url": [ "http://blog.example.com/progress-report?comments=0" ],
        "comment": [
          {
            "type": [ "https://schema.org/UserComments" ],
            "properties": {
              "url": [ "http://blog.example.com/progress-report#c1" ],
              "creator": [
                {
                  "type": [ "https://schema.org/Person" ],
                  "properties": {
                    "name": [ "Greg" ]
                  }
                }
              ],
              "commentTime": [ "2013-08-29" ]
            }
          },
          {
            "type": [ "https://schema.org/UserComments" ],
            "properties": {
              "url": [ "http://blog.example.com/progress-report#c2" ],
              "creator": [
                {
                  "type": [ "https://schema.org/Person" ],
                  "properties": {
                    "name": [ "Charlotte" ]
                  }
                }
              ],
              "commentTime": [ "2013-08-29" ]
            }
          }
        ]
      }
    }
  ]
}
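
The following sketch is non-normative. It shows one way a consumer might walk JSON of this shape to gather every string value of a given property, at any nesting depth. The function name collect and the abbreviated input document are assumptions made for illustration.

```python
import json


def collect(obj, name):
    """Yield every string value of property `name`, at any nesting depth."""
    for prop, values in obj.get("properties", {}).items():
        for value in values:
            if isinstance(value, dict):
                # A nested item: search its own properties.
                yield from collect(value, name)
            elif prop == name:
                yield value


# An abbreviated form of the output above, keeping only a few properties.
DOC = json.loads("""
{"items": [{"type": ["https://schema.org/BlogPosting"],
  "properties": {
    "headline": ["Progress report"],
    "comment": [
      {"properties": {"creator": [{"properties": {"name": ["Greg"]}}]}},
      {"properties": {"creator": [{"properties": {"name": ["Charlotte"]}}]}}
    ]}}]}
""")

if __name__ == "__main__":
    for item in DOC["items"]:
        for commenter in collect(item, "name"):
            print(commenter)
```

Every property value is an array, even when it holds a single entry, so a consumer always iterates rather than testing for a scalar.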

7. Changes to HTML

7.1 New attributes

This specification adds the following global attributes and associated validity constraints to HTML:

itemscope: This is a boolean attribute. When present on an element, it identifies that element as the container for an item.
itemtype: This is a list of absolute URLs that identify an item within a particular vocabulary. The itemtype attribute must not be specified on elements that do not have an itemscope attribute.

Note

This attribute performs a function similar to the combination of vocab and typeof attributes in [rdfa-core].

itemprop: When present on an element, it identifies that the element provides the property value of the item in which it appears, and the attribute's value defines the property name.

Note

This attribute is equivalent to the property attribute in [rdfa-core].

itemid: This is an absolute URL that provides a global identifier for an item. The itemid attribute must not be specified on elements that do not have both an itemscope attribute and an itemtype attribute specified.

Note

This is approximately equivalent to declaring that an item is owl:sameAs the value of the attribute. [owl-ref]

itemref: This is a space-separated list of IDs of elements which are not descendants of the element on which it appears. It identifies each element whose ID it includes as defining a property of the item on which it is present. The itemref attribute must not be specified on elements that do not have an itemscope attribute.
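
The following sketch is non-normative. It checks only the co-occurrence constraints stated above for a single element, given that element's attributes as a mapping. The function name attribute_errors and the wording of the messages are hypothetical; it does not check URL syntax or whether itemref IDs point at descendants.

```python
def attribute_errors(attrs):
    """Return the co-occurrence constraints violated by one element's attributes."""
    present = set(attrs)  # only the attribute names matter here
    errors = []
    if "itemtype" in present and "itemscope" not in present:
        errors.append("itemtype requires itemscope")
    if "itemid" in present and not {"itemscope", "itemtype"} <= present:
        errors.append("itemid requires itemscope and itemtype")
    if "itemref" in present and "itemscope" not in present:
        errors.append("itemref requires itemscope")
    return errors


if __name__ == "__main__":
    print(attribute_errors({"itemid": "urn:isbn:0-330-34032-8", "itemscope": ""}))
```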

7.2 Content models

This section changes the content models defined by HTML in the following ways:

The content attribute is redefined by this specification as a global attribute that may be present on any element.

This is consistent with [HTML-RDFA], which uses the attribute for the same purpose.

If the itemprop attribute is present on a link or meta element, that element is flow content and phrasing content, and may be used where phrasing content is expected.

If a link element has an itemprop attribute, the rel attribute may be omitted.

If a meta element has an itemprop attribute, the name, http-equiv, and charset attributes must be omitted, and the content attribute must be present.

If the itemprop attribute is specified on an a or area element, then the href attribute must also be specified.

If the itemprop attribute is specified on an audio, embed, iframe, img, source, track, or video element, then the src attribute must also be specified.

If the itemprop attribute is specified on an object element, then the data attribute must also be specified.
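
The following sketch is non-normative. It encodes the companion-attribute requirements listed above as a lookup table and reports which ones an element with itemprop fails to meet. The names REQUIRED_WITH_ITEMPROP, FORBIDDEN_ON_META, and itemprop_errors are hypothetical, and the sketch does not cover the flow and phrasing content rules for link and meta.

```python
# Attribute that must accompany itemprop, keyed by element name.
REQUIRED_WITH_ITEMPROP = {
    "a": "href", "area": "href",
    "audio": "src", "embed": "src", "iframe": "src", "img": "src",
    "source": "src", "track": "src", "video": "src",
    "object": "data",
    "meta": "content",
}

# Attributes that must be omitted from a meta element that has itemprop.
FORBIDDEN_ON_META = ("name", "http-equiv", "charset")


def itemprop_errors(tag, attrs):
    """Return the violated requirements for one element; [] if none apply."""
    if "itemprop" not in attrs:
        return []
    errors = []
    required = REQUIRED_WITH_ITEMPROP.get(tag)
    if required is not None and required not in attrs:
        errors.append(f"<{tag} itemprop> requires {required}")
    if tag == "meta":
        errors.extend(
            f"<meta itemprop> must omit {name}"
            for name in FORBIDDEN_ON_META
            if name in attrs
        )
    return errors


if __name__ == "__main__":
    print(itemprop_errors("img", {"itemprop": "image"}))
```

A link element does not appear in the table because the text above only relaxes its rel requirement and adds no new one.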

HTML Microdata

W3C Working Draft 26 June 2017

Abstract

Status of This Document

1. Dependencies

2. Terminology

3. Conformance

4. Introduction

4.1 Overview

4.2 The basic syntax

4.3 Typed items

4.4 Global identifiers for items

4.5 Selecting names when defining vocabularies

5. Encoding microdata

5.1 The microdata model

5.2 Items: `itemscope`, `itemtype`, and `itemid`.

5.3 Properties: the `itemprop` and `itemref` attributes

5.4 Values: the `content` attribute.

5.5 Associating names with items

6. Converting Microdata to other formats

6.1 JSON

7. Changes to HTML

7.1 New attributes

7.2 Content models

8. Accessibility and microdata

9. Internationalisation and localisation

10. Privacy Considerations

11. Security Considerations

12. IANA considerations

`application/microdata+json`

13. Changes

14. Acknowledgements

A. References

A.1 Normative references

A.2 Informative references
