WebSchemas/ExternalEnumerations

From W3C Wiki
  • This is an evolving draft of a document for inclusion on schema.org as /docs/external.html
  • When it is finalised, the Wiki version will be historical only; for now it is a discussion draft only.
  • As a potential schema.org page, it is written in a tone suited to that site, rather than more typical Wiki-speak; please don't edit this away.

See blog post for high-level overview.


External Enumerations

Goals

  • show how to make direct use of externally-maintained vocabularies and datasets
  • encourage common representation of controlled values, without adding hundreds of details into schema.org core
  • balance decentralization with markup simplicity

Change history

  • The originally announced version added a layer of indirection, giving ext.schema.org URLs to items from external vocabularies.
  • After some discussion, this will now be revised: "The canonical urls that Schema.org recommends for use will be the urls for the entities on the reference sites (wikipedia, freebase, nist, etc.) When these reference sites add new entities (such as South Sudan as a new country), webmasters can immediately start using them." and "In addition, to make the common use case much easier, Schema.org will provide documentation pages that list the entities (and their external urls), along with the caveat that the external entity is the primary source." This version (April 25) has been updated to match this decision but may retain some traces of the old design.
  • This revision (May 2012) clarifies that external enumerations are always via schema.org types, and that we don't consider external type systems to be part of schema.org.

Overview

Schema.org markup can be used alongside various existing descriptive standards and datasets. For example, country codes, units and measures, or job classifications - in each of these areas there are existing standards and authorities. This document describes how to reference such authorities using schema.org vocabulary.

Schema.org is about finding a balance between having a single well-integrated vocabulary and the full diversity of the Web. To achieve this, we define here some specific integration points through which selected externally maintained vocabulary can be published as part of schema.org markup. This is in addition to the existing extension mechanisms we support, and the general ability to include whatever markup you like in your pages. The focus here is on external vocabularies which can be thought of as 'supported' (or anticipated) in some sense by schema.org.

Initiatives such as DBpedia and Wikidata provide community maintained sources of structured data and well-known identifiers. Alongside professionally maintained vocabularies (e.g. the Library of Congress Subject Headings for topics, or the UN FAO Country Profiles for countries), there are a great many externally enumerated lists that are useful to include in schema.org descriptions. There are many cases in which a more precise description can be given by drawing upon some standard list or vocabulary.

This page describes how we do this, and how to link to these enumerations in a way that schema.org-aware systems can understand. For example:

  • Countries
  • Topics and categories
  • Specific named individuals (people, events, ...)
  • Units and Measures (future work)

Using only schema.org vocabulary, you can describe many things directly. However often it is useful to reference things that have been described more authoritatively elsewhere, and to structure your description so the link with external data can be exploited. We start with some common examples, and show (using Microdata) how this looks in practice. Using common controlled values for properties can also help with data integration and merging.

The external enumerations mechanism in one sentence:

  • Schema.org markup uses links into well-known authority lists to clarify which particular instance of a schema.org type (e.g. Country) is being mentioned.

Questions and Answers

This section anticipates likely questions regarding this mechanism. As we start to mix externally-managed systems with schema.org markup, it is important to remain clear on what the schema.org partnership is committing to.

Q: What can you reasonably expect the schema.org search engines to support or understand?

A: The schema.org collaboration is focused on the types and properties from schema.org. Publishers are of course free to add whatever additional information they like into their pages, but the schema.org project doesn't include any search engine commitment to make sense of such added extras.


Q: How do 'external enumerations' fit in to this picture?

A: From a schema.org perspective, these are essentially controlled lists. We do not treat them as full types - we don't expect to see them used for defining expected values within schema.org-based systems, and we don't treat them as defining their own attributes. Others may do these things, but publishers who expect their markup to be understood as schema.org markup should always use the relevant schema.org types and properties. If you add extras for other reasons, that's great - in general, the more structured data out there, the better!


Q: Why this approach?

A: It is important to allow different authorities to provide 'external enumeration' details, and often there are several quite different approaches to describing the same topic. Our approach is to say, "when describing a Country, please use identifiers from (for example) Geonames, Wikipedia, DBpedia, UN FAO GeoPolitical Ontology and others...", while also saying "don't expect us to understand the detailed type and property structures associated with each of those". Similar issues arise with controlled values for 'genre', product identifiers, etc. It is much easier to handle these datasets as provider-controlled lists than as part of a gigantic distributed type system.


Q: How do you decide which external vocabularies to support in this manner?

A: We will maintain a simple directory of useful vocabularies, based on suggestions from the community, publishers and consumers of schema.org data. We prefer vocabularies that address common practical needs, and that show either broad grassroots support (e.g. Wikipedia) or relevant professional input (e.g. LCSH, UN FAO). Inclusion does not represent any kind of formal endorsement of the vocabulary content by the schema.org partners; rather it is an indication that we expect schema.org markup will be richer and more informative if the dataset is used. So, widely-valued vocabularies / datasets that are available under liberal license and through open standards at stable URIs are strong candidates for inclusion.

In general, it is important that publishers and content authors have freedom of expression. Schema.org's role is to help document a collection of useful vocabularies to support that; if something seems to be missing, please ask in the WebSchemas group.


Q: Where can we find out the identifiers used within the naming scope defined by each authority? For example, that we write "United_States" in the Wikipedia dataset (and hence DBpedia too), yet "USA" within the UN FAO dataset?

A: At launch, this involves visiting each site independently. Since these datasets are available in open and standard formats, better tool support is likely.


Controlled property values

There are several ways in which external enumerations can show up in schema.org markup. We begin with the simplest.

The simplest case is an enumeration that can be used as the value of a schema.org property. For example, some schema.org properties have a range of 'Country', which means they can be used to indicate some specific country. In addition, schema.org has the notion of a class of things called 'Country', which is a kind of 'AdministrativeArea'. However, at schema.org we do not attempt to maintain a detailed list of countries, their identifiers or other properties; instead, we offer this external enumerations mechanism.

There are several kinds of controlled property values that might want to use external enumerations:

  • 1. properties (e.g. genre, inLanguage, occupationalCategory, servesCuisine, recipeCuisine) whose value is formally expressed using type 'Text', but which is ideally drawn from an externally managed list, preferably one that supplies both well-known URLs and controlled textual labels.
  • 2. properties (e.g. nationality, addressCountry) whose values have a full schema.org type (Country, in this case), and which are drawn from a finite and manageable list.
  • 3. properties (e.g. actor, publisher etc.) whose values have a full schema.org type (e.g. Person, Organization), but where the list of values is much larger, and more concerned with entity identification. For example, millions of organizations, billions of people. External sites can still provide well-known identifiers for these, e.g. bibliographic sites listing authors, or various ways of identifying companies. The 'url' property is also relevant here.

At this stage, we focus on the first two: cases where a property currently points to controlled strings, and cases where schema.org already has a name for the type of thing, and so we are identifying an entity of that type. For the former case to be externally enumerable, the property definitions will need to be updated to add a more relevant, externally enumerable type. Although the entity identification usage (identifying a specific Person, Organization etc.) has some structural similarity, the open-ended nature of the task makes it harder for schema.org to add value. Where the list of values is more restricted, common tools and datasets can be shared in the schema.org community, to make it easier to publish and share data using these enumerations.

Note: A schema.org type is required for grounding external enumerations in our type system. This means that whenever a property's values are to be externally enumerated, the schema.org project should make sure there is some relevant type added. This converts our situation (1.) into situation (2.), i.e. we should add a type such as CuisineCategory, JobCategory or Cuisine. The specifics will vary from case to case; sometimes they will be organized using Enumeration or Intangible; other times they will be other ordinary types.

Note: the 'about' property of CreativeWork (expected type: Thing) is unusual here, in that it is a hybrid design. Sometimes its value is a thing, e.g. some Movie or Person; other times it may be expressed textually, or indirectly via a link to a subject classification code. This pattern follows similar practice in the Dublin Core vocabulary.

Sub-Types as Enumerations

There are a few places in schema.org's vocabulary where we can find long lists of kinds of thing, organized into a class hierarchy. So for example we have a class 'HairSalon' which is, alongside 'DaySpa' and 'HealthClub', a subclass of 'HealthAndBeautyBusiness', itself a subclass of 'LocalBusiness'. While schema.org can usefully include some common cases like this, we don't want to attempt to catalog all of human experience and activity in our schema. Another example is 'PlaceOfWorship', where schema.org has a few sub-classes defined but doesn't attempt to cover all religions.

Future versions of this document may express ways of using externally managed lists to augment schema.org's core classes. However, as noted above there are no plans for schema.org to ever directly treat external definitions as full types, since this conflicts with our goal of having a single unified schema while drawing on diverse additional datasets.

Practical Examples

The rest of this document shows some examples of external enumerations in practical use.

Country example

Schema.org has some properties such as 'nationality' (of a Person) or 'addressCountry' (of a PostalAddress) whose values are countries. Other properties may be added in the future.

Let's take a cut-down version of our existing Movie example, and add in nationality properties for some of the people mentioned. Perhaps this information might help fine tune a movie recommender, for example.

First, the basic example with no nationality information:

<div itemscope itemtype="http://schema.org/Movie">
  <h1 itemprop="name">Pirates of the Caribbean: On Stranger Tides (2011)</h1>
  <span itemprop="description">Jack Sparrow and Barbossa embark on a quest to find the elusive
           fountain of youth, only to discover that Blackbeard and his daughter are after it too.</span>
  <div itemprop="actor" itemscope itemtype="http://schema.org/Person">
    <span itemprop="name">Johnny Depp</span> [...]
  </div>
</div>

Now, let's start adding to this. According to Wikipedia, Johnny Depp is a US citizen. How can we express this clearly?

First, we need to find a Web link that represents 'United States of America'. Schema.org will make a directory of such lists, for various topics, although you can use any authority you like (or suggest new ones to schema.org). Sources of Country identifiers include Wikipedia and the UN FAO GeoPolitical Ontology, both listed in the Initial Dataset Directory later in this document.

When we are dealing with external enumerations for controlled property values, we handle this using links. In this case, each specific "Country" has an associated URL that we can use.

Basic Example: Links

This is the simplest way to indicate Johnny Depp's nationality. When we are in some markup describing Johnny Depp, we make a property link marked 'nationality' to the appropriate Country link. If we want the link to be human-visible, we use '<a>', otherwise we can use '<link>' (or div, span etc., see below). For example:

<div itemscope itemtype="http://schema.org/Movie">
  <h1 itemprop="name">Pirates of the Caribbean: On Stranger Tides (2011)</h1>
  <span itemprop="description">Jack Sparrow and Barbossa embark on a quest to find the elusive
           fountain of youth, only to discover that Blackbeard and his daughter are after it too.</span>
  <div itemprop="actor" itemscope itemtype="http://schema.org/Person">
    <span itemprop="name">Johnny Depp</span>,
    <link itemprop="nationality" href="http://en.wikipedia.org/wiki/United_States"/>
  </div>
</div>
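
As a rough sketch of the human-visible variant mentioned above, the same 'nationality' link could be written with '<a>' instead of '<link>'. The visible label ("United States") and the surrounding parenthetical wording are illustrative assumptions; in Microdata the property value is still taken from the 'href' attribute, not from the link text.

```html
<div itemscope itemtype="http://schema.org/Movie">
  <h1 itemprop="name">Pirates of the Caribbean: On Stranger Tides (2011)</h1>
  <div itemprop="actor" itemscope itemtype="http://schema.org/Person">
    <span itemprop="name">Johnny Depp</span>
    <!-- Visible link: readers see "United States", consumers read the href URL -->
    (nationality:
    <a itemprop="nationality" href="http://en.wikipedia.org/wiki/United_States">United States</a>)
  </div>
</div>
```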

Note: instead of the Wikipedia United_States link, we could use this one into the UN FAO geopolitical ontology dataset: http://www.fao.org/countryprofiles/index.asp?lang=en&iso3=USA . If we instead used both Wikipedia and FAO URLs in the same description, there is some risk that some consumers would treat this as a claim about dual nationality. It is likely that over time, consumers will become smarter in understanding the links between values from different authorities, e.g. in this case realizing that the two expressions both describe the same country. However this cannot be guaranteed while at the same time remaining open and extensible.

Note: the current definition for addressCountry is "The country. For example, USA. You can also provide the two-letter ISO 3166-1 alpha-2 country code." This uses the common schema.org idiom of allowing descriptions to contain either strings or detailed descriptions. Sometimes a controlled property value will be written as a simple code, like 'fr'. Sometimes it will be a particular identified thing, such as the country France, identified in markup via URL.
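
To illustrate the two forms of 'addressCountry' described in the note above, here is a minimal sketch. The address itself (Paris, France) and the choice of the Wikipedia URL as the authority are assumptions made purely for illustration.

```html
<!-- Form 1: the country given as a plain string (an ISO 3166-1 alpha-2 code) -->
<div itemscope itemtype="http://schema.org/PostalAddress">
  <span itemprop="addressLocality">Paris</span>,
  <span itemprop="addressCountry">FR</span>
</div>

<!-- Form 2: the country identified by a link into an external authority -->
<div itemscope itemtype="http://schema.org/PostalAddress">
  <span itemprop="addressLocality">Paris</span>, France
  <link itemprop="addressCountry" href="http://en.wikipedia.org/wiki/France"/>
</div>
```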

Advanced Example: Using inline description and 'itemid'

In some cases, your content will contain more information about some value of an enumerated property. This can also be represented using schema.org markup.

The Microdata syntax includes a mechanism for indicating a URI identifier for typed items. Here we show these same examples, but using an inline description of the relevant Country, using schema.org's existing vocabulary for this. Instead of 'nationality' being used with a simple link, we use it to wrap a block of markup, and put that same URL in the item description using an 'itemid' attribute.

In this example, we use a Wikipedia URL. It happens to use 'div' rather than 'a', but in either case the attributes would be the same:

<div itemscope itemtype="http://schema.org/Movie">
  <h1 itemprop="name">Pirates of the Caribbean: On Stranger Tides (2011)</h1>
  <span itemprop="description">Jack Sparrow and Barbossa embark on a quest to find the elusive
           fountain of youth, only to discover that Blackbeard and his daughter are after it too.</span>
  <div itemprop="actor" itemscope itemtype="http://schema.org/Person">
    <span itemprop="name">Johnny Depp</span>,
    <div itemprop="nationality" itemscope itemid="http://en.wikipedia.org/wiki/United_States" itemtype="http://schema.org/Country"></div>
  </div>
</div>

However, adding all this extra markup doesn't make a lot of sense on its own. So here is how to extend the description of the country, adding in extra details. In this case we are using UN FAO instead as our authority, and show that we can give further properties (e.g. 'name') of the Country inline:

<div itemscope itemtype="http://schema.org/Movie">
  <h1 itemprop="name">Pirates of the Caribbean: On Stranger Tides (2011)</h1>
  <span itemprop="description">Jack Sparrow and Barbossa embark on a quest to find the elusive
           fountain of youth, only to discover that Blackbeard and his daughter are after it too.</span>
  <div itemprop="actor" itemscope itemtype="http://schema.org/Person">
    <span itemprop="name">Johnny Depp</span>,
    <div itemprop="nationality" itemscope itemid="http://www.fao.org/countryprofiles/index.asp?lang=en&iso3=USA" itemtype="http://schema.org/Country">
        <span itemprop="name">United States of America</span>
    </div>
  </div>
</div>

Q: Which style should I use?

A: Often a simple link is enough, but sometimes you'll want to describe an item within your own content as well. Sometimes you'll want the link to be human-visible (so use 'a'), other times that would be distracting (use the 'link' or 'itemid'-based examples instead).

Cuisine Example

Schema.org has some properties, 'servesCuisine' and 'recipeCuisine', whose expected type is a simple 'Text' value, but which would benefit from using externally managed lists of cuisines. At this time schema.org does not have a type for 'Cuisine', 'CuisineCategory' or similar. The proper process for making 'servesCuisine' externally enumerable would be for such a class to be added as an option for the expected type on the relevant properties.

For the sake of this example, we will use the fictional type 'CuisineCategory' here. Future revisions may add more detailed advice on general types to use.

Here is a simplified Microdata description of a Restaurant, based on the FoodEstablishment example:

<div itemscope itemtype="http://schema.org/Restaurant">
  <span itemprop="name">GreatFood</span>
  Hours:
  <meta itemprop="openingHours" content="Mo-Sa 11:00-14:30">Mon-Sat 11am - 2:30pm
  <meta itemprop="openingHours" content="Mo-Th 17:00-21:30">Mon-Thu 5pm - 9:30pm
  <meta itemprop="openingHours" content="Fr-Sa 17:00-22:00">Fri-Sat 5pm - 10:00pm
  Categories:
  <span itemprop="servesCuisine">Middle Eastern</span>,
  <span itemprop="servesCuisine">Mediterranean</span>
  Price Range: <span itemprop="priceRange">$$</span>
  Takes Reservations: Yes
</div>

Remembering that the 'servesCuisine' property has 'Text' values, how can we improve this markup, to point to controlled lists of cuisines?

Here's a simplified version, showing just the part we're studying, alongside some controlled links from Wikipedia that capture relevant notions of Mediterranean cuisine and Middle Eastern cuisine.

<div itemscope itemtype="http://schema.org/Restaurant">
  <span itemprop="name">GreatFood</span>
  <span itemprop="servesCuisine" itemscope itemtype="http://schema.org/CuisineCategory" itemid="http://en.wikipedia.org/wiki/Middle_Eastern_cuisine">Middle Eastern</span>,
  <span itemprop="servesCuisine" itemscope itemtype="http://schema.org/CuisineCategory" itemid="http://en.wikipedia.org/wiki/Mediterranean_cuisine">Mediterranean</span>
</div>

Community suggestions on improvements to support external enumerations in this way are always welcome on the public-vocabs@w3.org list.

Document topic example

A very common scenario is to want to describe the subject (also often called 'topic') of a document, by using a controlled vocabulary rather than arbitrary keywords. The library and bibliographic community have created many such systems over the years, and they are increasingly available as open structured data. Often such vocabularies are expressed using W3C's SKOS system, which models them as a hierarchy of linked 'concepts', each with textual (possibly multilingual) labels of various kinds. See the final report of the W3C Linked Library Data group for more details.

For schema.org, the CreativeWork class has a property 'about' whose values can come from controlled value systems. For example, the Library of Congress uses LCSH, and assigns Web identifiers to each concept there. Many other vocabularies do likewise.

Here is an example using a direct link to a controlled value page, with the 'about' property:

<div itemscope itemtype="http://schema.org/Book">
  <span itemprop="name">Introduction to Linear Algebra</span> - <link itemprop="bookFormat" href="http://schema.org/Paperback" />Paperback
  by <a itemprop="author" href="/author/g_strang.html">Gilbert Strang</a>
  <link itemprop="about" href="http://id.loc.gov/authorities/sh85003441#concept"/>
</div>
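
Because 'about' is a hybrid property, its value could also be given as plain text, or as an inline description in the style of the 'itemid' examples earlier. The following is a rough sketch only: the label text "Linear algebra" and the use of the generic type 'Thing' (the expected type of 'about') are assumptions for illustration, not a schema.org recommendation.

```html
<!-- Alternative 1: the subject expressed textually -->
<div itemscope itemtype="http://schema.org/Book">
  <span itemprop="name">Introduction to Linear Algebra</span>
  <meta itemprop="about" content="Linear algebra">
</div>

<!-- Alternative 2: the subject described inline and tied to the LCSH identifier via itemid -->
<div itemscope itemtype="http://schema.org/Book">
  <span itemprop="name">Introduction to Linear Algebra</span>
  <span itemprop="about" itemscope itemtype="http://schema.org/Thing"
        itemid="http://id.loc.gov/authorities/sh85003441#concept">
    <span itemprop="name">Linear algebra</span>
  </span>
</div>
```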


Initial Dataset Directory

Our starter list of authorities is given here. Suggestions for others are strongly encouraged.

Each entry gives the authority, its authority_id, a summary, topics, and URL pattern(s).

  • UN FAO GeoPolitical Ontology
      Authority_id: faogeo
      Summary: United Nations Food and Agriculture Organization's GeoPolitical Ontology. Dataset providing in-depth descriptions of countries and related geopolitical entities, as well as various formal codes for these. See homepage and Wikipedia entry, countrylist.
      Topics: Example term: 'USA'. Related schema.org term(s): Country.
      URL pattern(s): http://www.fao.org/countryprofiles/index.asp?lang=en&iso3={term} (test case: term=USA)
  • Wikipedia
      Authority_id: wikipedia
      Summary: Wikipedia, the famous encyclopedia.
      Topics: Everything you can imagine.
      URL pattern(s): http://en.wikipedia.org/wiki/{term} (test case: term=United_States)
  • US Library of Congress
      Authority_id: loc
      Summary: The Library of Congress, in the United States.
      Topics: Everything you can imagine. Example term: 'sh85003441' (Linear Algebra)
      URL pattern(s): http://id.loc.gov/authorities/{term}#concept (test case: term=sh85003441)
  • The Universal Decimal Classification
      Authority_id: udc
      Summary: The Universal Decimal Classification (UDC) is a bibliographic and library classification developed by the Belgian bibliographers Paul Otlet and Henri La Fontaine at the end of the 19th century, and is widely used in library catalogues internationally. The UDC is managed by the UDC Consortium. See also blog. The more general levels of the classification are publicly available; the full detailed schedules are subscription only. Multilingual translations are available.
      Topics: Everything you can imagine.
      URL pattern(s): http://udcdata.example.org/TODO/{term}#TODO (test case: term=TODO)


Possible Additions (TODO and for discussion):

See the IETF URI Template spec for a more formal notation for such patterns.

See also

Examples in RDFa

These are written in the Lite flavour of RDFa.

Country example

<div vocab="http://schema.org/" typeof="Movie">
  <h1 property="name">Pirates of the Caribbean: On Stranger Tides (2011)</h1>
  <span property="description">Jack Sparrow and Barbossa embark on a quest to find the elusive
           fountain of youth, only to discover that Blackbeard and his daughter are after it too.</span>
  <div property="actor" typeof="Person">
    <span property="name">Johnny Depp</span>,
  </div>
  <div property="actor" typeof="Person">
    <span property="name">Penelope Cruz</span>,
  </div>
  <div property="actor" typeof="Person">
    <span property="name">Ian McShane</span>
  </div>
</div>

Here are two ways of using external authorities to indicate Johnny Depp's nationality:

<div vocab="http://schema.org/" typeof="Movie">
  <h1 property="name">Pirates of the Caribbean: On Stranger Tides (2011)</h1>
  <span property="description">Jack Sparrow and Barbossa embark on a quest to find the elusive
           fountain of youth, only to discover that Blackbeard and his daughter are after it too.</span>
  <div property="actor" typeof="Person">
    <span property="name">Johnny Depp</span>,
    <link property="nationality" href="http://en.wikipedia.org/wiki/United_States"/>
  </div>
</div>

or

<div vocab="http://schema.org/" typeof="Movie">
  <h1 property="name">Pirates of the Caribbean: On Stranger Tides (2011)</h1>
  <span property="description">Jack Sparrow and Barbossa embark on a quest to find the elusive
           fountain of youth, only to discover that Blackbeard and his daughter are after it too.</span>
  <div property="actor" typeof="Person">
    <span property="name">Johnny Depp</span>,
    <link property="nationality" href="http://www.fao.org/countryprofiles/index.asp?lang=en&iso3=USA"/>
  </div>
</div>

Example using inline description and 'resource'

<div vocab="http://schema.org/" typeof="Movie">
  <h1 property="name">Pirates of the Caribbean: On Stranger Tides (2011)</h1>
  <span property="description">Jack Sparrow and Barbossa embark on a quest to find the elusive
           fountain of youth, only to discover that Blackbeard and his daughter are after it too.</span>
  <div property="actor" typeof="Person">
    <span property="name">Johnny Depp</span>,
    <div property="nationality" resource="http://en.wikipedia.org/wiki/United_States" typeof="Country"></div>
  </div>
</div>

A similar example, using UN FAO instead as our authority:

<div vocab="http://schema.org/" typeof="Movie">
  <h1 property="name">Pirates of the Caribbean: On Stranger Tides (2011)</h1>
  <span property="description">Jack Sparrow and Barbossa embark on a quest to find the elusive
           fountain of youth, only to discover that Blackbeard and his daughter are after it too.</span>
  <div property="actor" typeof="Person">
    <span property="name">Johnny Depp</span>,
    <div property="nationality" resource="http://www.fao.org/countryprofiles/index.asp?lang=en&iso3=USA" typeof="Country">
        <span property="name">United States of America</span>
    </div>
  </div>
</div>

Cuisine Example

<div vocab="http://schema.org/" typeof="Restaurant">
  <span property="name">GreatFood</span>
  Hours:
  <meta property="openingHours" content="Mo-Sa 11:00-14:30">Mon-Sat 11am - 2:30pm
  <meta property="openingHours" content="Mo-Th 17:00-21:30">Mon-Thu 5pm - 9:30pm
  <meta property="openingHours" content="Fr-Sa 17:00-22:00">Fri-Sat 5pm - 10:00pm
  Categories:
  <span property="servesCuisine">Middle Eastern</span>,
  <span property="servesCuisine">Mediterranean</span>
  Price Range: <span property="priceRange">$$</span>
  Takes Reservations: Yes
</div>

The same restaurant, with controlled links from Wikipedia:

<div vocab="http://schema.org/" typeof="Restaurant">
  <span property="name">GreatFood</span>
  <span property="servesCuisine" typeof="CuisineCategory" resource="http://en.wikipedia.org/wiki/Middle_Eastern_cuisine">Middle Eastern</span>,
  <span property="servesCuisine" typeof="CuisineCategory" resource="http://en.wikipedia.org/wiki/Mediterranean_cuisine">Mediterranean</span>
</div>

Document examples, direct link to a controlled value page

<div vocab="http://schema.org/" typeof="Book">
  <span property="name">Introduction to Linear Algebra</span> - <link property="bookFormat" href="http://schema.org/Paperback" />Paperback
  by <a property="author" href="/author/g_strang.html">Gilbert Strang</a>
  <link property="about" href="http://id.loc.gov/authorities/sh85003441#concept"/>
</div>