23. XHTML Metainformation Module

This section is normative.

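The Metainformation Module defines elements that allow the definition of relationships. These may relate to the document itself, to sections within it, to resources external to the document, or to other items of metadata, as the sections below describe.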

Note that this module is dependent upon the Metainformation Attributes module. The attributes defined therein are available on the elements defined in this module, and their semantics are the essential part of how these elements behave.

Elements and attributes in this module are:

Elements   Attributes   Content Model
link       Common       ( link | meta )*
meta       Common       ( PCDATA | Text )*

Implementation: RELAX NG

23.1. The link element

Attributes

The Common collection
A collection of other attribute collections, including: Bi-directional, Core, Edit, Embedding, Events, Forms, Hypertext, I18N, Map, and Metainformation.

This element defines a link. Link conveys relationship information that may be rendered by user agents in a variety of ways (e.g., a tool-bar with a drop-down menu of links). User agents should enable activation of links and the retrieval of link targets. Since link elements may have no content, information from the rel and title attributes should be used when labelling links.

This example illustrates how several link definitions may appear in the head section of a document. The current document is "Chapter2.html". The rel attribute specifies the relationship of the linked document to the current document. The values "index", "next", and "prev" are explained in the section on the rel attribute.

<head>
  <title>Chapter 2</title>
  <link rel="index" href="../index.html"/>
  <link rel="next"  href="Chapter3.html"/>
  <link rel="prev"  href="Chapter1.html"/>
</head>

23.1.1. Forward and reverse links

While the rel attribute specifies a relationship from this document to another resource, the rev attribute specifies the reverse relationship.

Consider two documents A and B.

Document A:       <link href="docB" rel="index"/>

has exactly the same meaning as:

Document B:       <link href="docA" rev="index"/>

namely that document B is the index for document A.

Both the rel and rev attributes may be specified simultaneously.
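
The equivalence between rel and rev can be sketched in code. The following Python fragment is illustrative only: the function name link_triples, the (subject, relationship, object) tuple layout, and the treatment of rel and rev as space-separated lists are assumptions of this sketch, not definitions made by this module.

```python
import xml.etree.ElementTree as ET

def link_triples(document_uri, link_markup):
    """Return (subject, relationship, object) triples for one link element."""
    link = ET.fromstring(link_markup)
    target = link.get("href")
    triples = []
    # rel: the relationship runs from this document to the target.
    for value in (link.get("rel") or "").split():
        triples.append((document_uri, value, target))
    # rev: the same kind of relationship, read in the opposite direction.
    for value in (link.get("rev") or "").split():
        triples.append((target, value, document_uri))
    return triples

forward = link_triples("docA", '<link href="docB" rel="index"/>')
reverse = link_triples("docB", '<link href="docA" rev="index"/>')
both = link_triples("docB", '<link href="docC" rel="next" rev="prev"/>')
```

Under these assumptions the link in document A and the link in document B produce the same statement, and a link that carries both attributes produces one statement in each direction.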

23.1.2. Links and search engines

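Authors may use the link element to provide a variety of information to search engines, including links to alternate versions of a document written in another human language, links to alternate versions designed for different media (for instance, a version suited to printing), and links to the starting page of a collection of documents.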

The examples below illustrate how language information, media types, and link types may be combined to improve document handling by search engines.

The following example shows how to use the hreflang attribute to indicate to a search engine where to find other language versions of a document. Note that for the sake of the example the xml:lang attribute has been used to indicate that the value of the title attribute for the link element designating the French manual is in French.

<html ... xml:lang="en">
<head> 
<title>The manual in English</title>
<link title="The manual in Dutch"
      rel="alternate"
      hreflang="nl" 
      href="http://example.com/manual/dutch.html"/>
<link title="La documentation en Français"
      rel="alternate"
      hreflang="fr" xml:lang="fr"
      href="http://example.com/manual/french.html"/>
</head>

In the following example, we tell search engines where to find the printed version of a manual.

<head>
<title>Reference manual</title>
<link media="print" 
      title="The manual in PostScript"
      hreftype="application/postscript"
      rel="alternate"
      href="http://example.com/manual/postscript.ps"/>
</head>

In the following example, we tell search engines where to find the front page of a collection of documents.

<head>
<title>Reference manual -- Chapter 5</title>
<link rel="start" title="The first chapter of the manual"
      hreftype="application/xhtml+xml"
      href="http://example.com/manual/start.html"/>
</head>

23.2. The meta element

Attributes

The Common collection
A collection of other attribute collections, including: Bi-directional, Core, Edit, Embedding, Events, Forms, Hypertext, I18N, Map, and Metainformation.

The meta element can be used to identify properties of a document (e.g., author, expiration date, a list of key words, etc.) and assign values to those properties. This specification defines a small normative set of properties, but users may extend this set as described for the property attribute.

Each meta element specifies a property/value pair. The property attribute identifies the property; either the content of the element or the value of the content attribute specifies the property's value.

For example, the following declaration sets a value for the dc:creator property, which names the document's author:

Example

<meta property="dc:creator">Steven Pemberton</meta>
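
How a consumer might read such a property/value pair can be sketched as follows. This Python fragment is a minimal illustration: the function name meta_pair is hypothetical, and both the choice to prefer the content attribute over the element content when both are present and the trimming of surrounding white space are assumptions of the sketch rather than rules stated in this section.

```python
import xml.etree.ElementTree as ET

def meta_pair(meta_markup):
    """Return the (property, value) pair expressed by one meta element."""
    meta = ET.fromstring(meta_markup)
    value = meta.get("content")
    if value is None:
        # No content attribute: fall back to the element's text content.
        value = (meta.text or "").strip()
    return meta.get("property"), value

from_text = meta_pair('<meta property="dc:creator">Steven Pemberton</meta>')
from_attribute = meta_pair('<meta property="dc:creator" content="Steven Pemberton"/>')
```

Both spellings yield the same pair in this sketch.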

Note. The meta element is a generic mechanism for specifying metadata. However, some XHTML elements and attributes already handle certain pieces of metadata and may be used by authors instead of meta to specify those pieces: the title element, the address element, the edit and related attributes, the title attribute, and the cite attribute.

Note. When a property specified by a meta element takes a value that is a URI, some authors prefer to specify the metadata via the link element. Thus, the following metadata declaration:

Example

<meta property="dc:identifier">
      http://www.rfc-editor.org/rfc/rfc3236.txt
</meta>

might also be written:

Example

<link rel="dc:identifier"
      href="http://www.rfc-editor.org/rfc/rfc3236.txt" />

23.2.1. meta and search engines

A common use for meta is to specify keywords that a search engine may use to improve the quality of search results. When several meta elements provide language-dependent information about a document, search engines may filter on the xml:lang attribute to display search results using the language preferences of the user. For example,

Example

<!-- For speakers of US English -->
<meta property="keywords" 
      xml:lang="en-us">vacation, Greece, sunshine</meta>
<!-- For speakers of British English -->
<meta property="keywords" 
      xml:lang="en">holiday, Greece, sunshine</meta>
<!-- For speakers of French -->
<meta property="keywords" 
      xml:lang="fr">vacances, Grèce, soleil</meta>
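
One way a search engine might apply this filtering is sketched below in Python. The function name keywords_for, the comma-splitting of keyword lists, and the fallback from a regional tag such as "en-gb" to the bare language "en" are all assumptions made for illustration; this section does not prescribe a matching algorithm.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

HEAD = """<head>
  <meta property="keywords" xml:lang="en-us">vacation, Greece, sunshine</meta>
  <meta property="keywords" xml:lang="en">holiday, Greece, sunshine</meta>
  <meta property="keywords" xml:lang="fr">vacances, Grèce, soleil</meta>
</head>"""

def keywords_for(head_markup, preferred_languages):
    """Return the keyword list that best matches the user's language order."""
    head = ET.fromstring(head_markup)
    by_language = {}
    for meta in head.iter("meta"):
        if meta.get("property") == "keywords":
            language = (meta.get(XML_LANG) or "").lower()
            words = [word.strip() for word in (meta.text or "").split(",")]
            by_language[language] = words
    for language in preferred_languages:
        tag = language.lower()
        # Drop trailing subtags one at a time: "en-gb" falls back to "en".
        while tag:
            if tag in by_language:
                return by_language[tag]
            tag = tag.rpartition("-")[0]
    return []

british = keywords_for(HEAD, ["en-GB", "fr"])
american = keywords_for(HEAD, ["en-US"])
french = keywords_for(HEAD, ["de", "fr"])
```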

The effectiveness of search engines can also be increased by using the link element to specify links to translations of the document in other languages, links to versions of the document in other media (e.g., PDF), and, when the document is part of a collection, links to an appropriate starting point for browsing the collection.

23.3. Literals and Resources

There are two types of properties that some item can have. The first is a simple string value, which is useful for specifying properties such as dates, names, numbers and so on:

Example

this document was written on "March 21st, 2004"

This is not so useful though when trying to uniquely identify items that could occur in other places. Take the example of the document's author being "Mark Birbeck":

Example

this document was written by "Mark Birbeck"

Since there are other people called Mark Birbeck, we cannot tell which of them wrote the document. We get round this problem by allowing the value to be a URI. For example:

Example

this document was written by
<http://example.com/people/MarkBirbeck/654>

We distinguish these two types of properties by calling the first a 'string literal' and the second a 'resource'.

NOTE: Of course there is nothing to stop two people from using this URI to identify two completely different people. But in general URIs are accepted as a convenient way to identify a specific item.

23.4. Document Properties

23.4.1. Literals

23.4.1.1. String Literals

The simplest piece of metadata is a string literal attached to the containing document. This can be specified using meta. For example:

Example

  <head>
    <meta property="dc:creator">Mark Birbeck</meta>
    <meta property="dc:created" content="2004-03-20" />
  </head>

which states that:

Example

  this document has an 'author' property of "Mark Birbeck";
  this document has a 'created' property of "2004-03-20".

23.4.1.2. XML Literals

It is also possible to include mark-up in the string. This will always be part of the string's value - in other words, no matter what the mark-up is, it will never be processed as if it were anything other than the value of the property:

Example

  <head>
    <meta property="dc:creator" content="Albert Einstein" />
    <meta property="dc:title">E = mc<sup>2</sup>: The Most Urgent Problem
of Our Time</meta>
  </head>

states that:

Example

  this document has an 'author' property of "Albert Einstein";
  this document has a 'title' property of 
      "E = mc<sup>2</sup>: The Most Urgent Problem of Our Time".

Note, however, that although the mark-up is not processed as mark-up, it must still be well-formed and valid if the processor requires it.
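
The following Python sketch shows one way a processor could recover such an XML literal. It is illustrative only: the function name xml_literal is hypothetical, and the serialization produced by ElementTree is only one possible form of the literal. Because the parser used here rejects input that is not well-formed, the sketch also illustrates why the embedded mark-up still has to be well-formed.

```python
import xml.etree.ElementTree as ET

def xml_literal(meta_markup):
    """Return (property, literal) with any child mark-up kept in the literal."""
    meta = ET.fromstring(meta_markup)
    parts = [meta.text or ""]
    for child in meta:
        # tostring() also emits the text that follows the child (its tail).
        parts.append(ET.tostring(child, encoding="unicode"))
    return meta.get("property"), "".join(parts)

TITLE = ('<meta property="dc:title">E = mc<sup>2</sup>: '
         'The Most Urgent Problem of Our Time</meta>')
title_pair = xml_literal(TITLE)
```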

23.4.1.3. Typed Literals

In some situations the value of a property is not sufficiently specified by a simple literal. For example, properties such as height or weight would require more than a string to fully specify them:

Example

  <head>
    <meta property="height">87</meta>
  </head>

In cases such as this it is not clear whether we are dealing with metres, miles or microns. Whilst it's certainly possible to add the units to the literal itself, there will be situations where this is not possible, and so the unit should be specified with datatype. In this example we use the XML Schema type for date:

Example

  <head>
    <meta property="created" datatype="xsd:date">2004-03-22</meta>
  </head>
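
A consumer could use datatype to decide how to interpret the literal, as in the Python sketch below. The CONVERTERS table and the function name typed_value are hypothetical, the xsd prefix is assumed to be bound to the XML Schema datatypes namespace, and the prefix is compared as plain text rather than resolved. A literal without a datatype is left as a string.

```python
import datetime
import xml.etree.ElementTree as ET

# Hypothetical mapping from datatype names to Python conversions.
CONVERTERS = {
    "xsd:date": datetime.date.fromisoformat,
    "xsd:integer": int,
}

def typed_value(meta_markup):
    """Return (property, value), converting the literal when a datatype is given."""
    meta = ET.fromstring(meta_markup)
    text = (meta.text or "").strip()
    # Unknown or absent datatypes leave the literal as a plain string.
    convert = CONVERTERS.get(meta.get("datatype"), str)
    return meta.get("property"), convert(text)

created = typed_value('<meta property="created" datatype="xsd:date">2004-03-22</meta>')
height = typed_value('<meta property="height">87</meta>')
```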

23.4.2. Resources

There will be situations when a string literal is not suitable as the value of a property. In the earlier author example there would be no way to know which 'Mark Birbeck' we are referring to. This might not be a problem when documents are only used within one company, but it becomes a significant problem when documents are used across the Internet.

When we need to provide a unique identifier for the value of a property we use link. link identifies a relationship between one resource and another, and uses rel to indicate the nature of this relationship. In addition href contains the URI that is being used to uniquely identify the item being related to. For example:

Example

  <head>
    <link rel="author"
          href="http://example.com/people/MarkBirbeck/654" />
  </head>

Note that just because we are using URIs as unique identifiers doesn't mean that navigating to this URI with a web browser would yield anything useful. This is perhaps easier to see with the following example:

Example

  <head>
    <link rel="source" href="urn:isbn:0140449132" />
  </head>

23.4.3. Making Use of External Lists of Properties

Best practice for specifying metadata is to try as much as possible to make use of common property names. This can often be achieved by using lists in use by other document authors within a similar field. There are many such lists for different sectors and industries, but for our examples here we will use Dublin Core [DCORE].

To replace the term 'author' with the more widely used Dublin Core term 'creator', we would need to not only substitute 'creator' for 'author', but also to indicate which list we are using. We achieve the latter by using XML namespaces:

Example

  <head xmlns:dc="http://purl.org/dc/elements/1.1/">
    <meta property="dc:creator">Mark Birbeck</meta>
  </head>

Now we have stated that:

Example

  this document has a property called 'creator' (which comes
  from a library of properties called the Dublin Core) and the
  value of that property is the literal "Mark Birbeck".
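
The role of the namespace declaration can be made concrete with a short Python sketch. It assumes that a prefixed property name is expanded by appending the local name to the namespace URI, a common convention that this section does not itself state, and it ignores prefix scoping for brevity. The names expand and expanded_properties are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

HEAD = """<head xmlns:dc="http://purl.org/dc/elements/1.1/">
  <meta property="dc:creator">Mark Birbeck</meta>
</head>"""

def expand(name, prefixes):
    """Replace a declared prefix with the URI it is bound to."""
    prefix, colon, local = name.partition(":")
    if colon and prefix in prefixes:
        return prefixes[prefix] + local
    return name

def expanded_properties(markup):
    """Return (expanded property name, value) for every meta element."""
    prefixes = {}
    found = []
    source = io.BytesIO(markup.encode("utf-8"))
    for event, payload in ET.iterparse(source, events=("start-ns", "end")):
        if event == "start-ns":
            # Reported before the element that carries the declaration starts.
            prefix, uri = payload
            prefixes[prefix] = uri
        elif payload.tag == "meta":
            value = (payload.text or "").strip()
            found.append((expand(payload.get("property", ""), prefixes), value))
    return found

properties = expanded_properties(HEAD)
```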

23.5. Properties of Other Resources

While it is common to create properties and values that say something about the document that contains them, there is often a need to add metadata that refers only to a section of the document, or to some external resource. This is achieved by using about, which can be present on meta and link.

23.5.1. Resources Within the Containing Document

A quote might be attributed as follows:

Example

  <html xmlns:dc="http://purl.org/dc/elements/1.1/">
    <head>
      <link about="#q1" rel="dc:source" href="urn:isbn:0140449132" />
    </head>
    <body>
      <blockquote id="q1">
        <p>
          'Rodion Romanovitch! My dear friend! If you go on in this way
          you will go mad, I am positive! Drink, pray, if only a few drops!'
        </p>
      </blockquote>
    </body>
  </html>

Note that the absence of about does not always mean that the metadata refers to the containing document. If the element containing metadata is a child of head, then it does relate to the document, and so the following mark-up:

Example

  <head xmlns:dc="http://purl.org/dc/elements/1.1/">
    <meta property="dc:creator">Mark Birbeck</meta>
  </head>

can be regarded as a shorthand for this:

Example

  <head xmlns:dc="http://purl.org/dc/elements/1.1/">
    <meta about="" property="dc:creator">Mark Birbeck</meta>
  </head>

23.5.2. External Resources

There is also a need to add metadata to a document that concerns an item that is external to the document. As before we use about, but this time we should provide an absolute or relative URI, rather than just a fragment identifier.

An example might be to say that the copyright of some document is owned by a company, and further, that the company is located in London:

Example

  <head xmlns:dc="http://purl.org/dc/elements/1.1/">
    <link rel="dc:copyright"
          href="http://example.com/company/BBC/6" />
    <meta about="http://example.com/company/BBC/6"
          property="dc:location">London</meta>
  </head>
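
The three forms of about used in this section (absent or empty, a fragment identifier, and a full URI) can all be resolved against the URI of the containing document. The Python sketch below assumes a document located at the hypothetical address http://example.com/docs/chapter.html and treats an absent about on a child of head as equivalent to about="", as described above; the function name subject_of is likewise hypothetical.

```python
from urllib.parse import urljoin

DOCUMENT_URI = "http://example.com/docs/chapter.html"  # hypothetical location

def subject_of(about, document_uri=DOCUMENT_URI):
    """Return the URI that a meta or link element makes a statement about."""
    if about is None:
        # Absent about on a child of head: shorthand for about="".
        about = ""
    return urljoin(document_uri, about)

whole_document = subject_of(None)
quotation = subject_of("#q1")
company = subject_of("http://example.com/company/BBC/6")
```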

23.6. Chaining Metadata

Metadata that is relevant to a resource referred to by a link can be placed inside the link element with no about. Our previous example could be re-written as follows:

Example

  <head xmlns:dc="http://purl.org/dc/elements/1.1/">
    <link rel="dc:copyright"
          href="http://example.com/company/BBC/6">
      <meta property="dc:location">London</meta>
    </link>
  </head>

There is no limit to the depth of this nesting.

If the target resource is omitted from a link (in the example below, the inner link has no href) then the nested metadata is still legitimate; it simply relates to an anonymous resource. For example, we might want to say that the 'mother tongue' of the author of Crime and Punishment is Russian, without saying anything further about the author:

Example

  <html xmlns:dc="http://purl.org/dc/elements/1.1/"
        xmlns:con="http://example.org/terms/" >
    <head />
    <body>
      <blockquote id="q1">
        <link rel="dc:source" href="urn:isbn:0140449132">
          <link rel="dc:creator">
            <meta property="con:motherTongue">rus</meta>
          </link>
        </link>
        <p>
          'Rodion Romanovitch! My dear friend! If you go on in this way
          you will go mad, I am positive! Drink, pray, if only a few drops!'
        </p>
      </blockquote>
    </body>
  </html>

When reading this metadata, the anonymous resource can be thought of simply as 'something'. This mark-up means:

  1. The quote has a source of Crime and Punishment.
  2. Crime and Punishment has a property of 'creator' (from the Dublin Core taxonomy), and the value of that property is something.
  3. The something that is the author of Crime and Punishment has a property of 'mother tongue' (from the SWAP contacts taxonomy), and the value of that property is "Russian".
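
The three statements above can be derived mechanically by walking the nested elements. The Python sketch below is one possible reading, not a normative algorithm: the function name chained_statements and the "_:anon" labels for anonymous resources are hypothetical, the quote is identified by the fragment "#q1" taken from its id, and prefixed names are left unexpanded.

```python
import itertools
import xml.etree.ElementTree as ET

QUOTE = """<blockquote id="q1">
  <link rel="dc:source" href="urn:isbn:0140449132">
    <link rel="dc:creator">
      <meta property="con:motherTongue">rus</meta>
    </link>
  </link>
  <p>...</p>
</blockquote>"""

def chained_statements(element, subject, counter=None):
    """Return (subject, property, value) statements for nested link/meta."""
    if counter is None:
        counter = itertools.count(1)
    statements = []
    for child in element:
        if child.tag == "link":
            # A link without href points at an anonymous resource.
            target = child.get("href") or "_:anon%d" % next(counter)
            statements.append((subject, child.get("rel"), target))
            # Metadata nested inside the link describes the link's target.
            statements.extend(chained_statements(child, target, counter))
        elif child.tag == "meta":
            value = (child.text or "").strip()
            statements.append((subject, child.get("property"), value))
    return statements

statements = chained_statements(ET.fromstring(QUOTE), "#q1")
```

Each nested link shifts the subject of the statements it contains to its own target, which is why the nesting can continue to any depth.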

Note however that while placing further elements inside meta is structurally valid, it does not mean the same thing as the example we have just given, since the content of meta is an XML literal. The following:

Example

  <blockquote id="q1">
    <link about="#q1" rel="dc:source" href="urn:isbn:0140449132">
      <meta property="dc:creator">
        <meta property="con:motherTongue">rus</meta>
      </meta>
    </link>
    <p>...</p>
  </blockquote>

means that:

  1. the quote has a source of Crime and Punishment.
  2. Crime and Punishment has a property of 'creator' (from the Dublin Core taxonomy), and the value of that property is the XML literal "<meta property="con:motherTongue">rus</meta>".

23.7. Issues

rebuild link element: chapter, section / subsection PR #7869
State: Open
Resolution: None
User: None

Notes: