20. XHTML Metainformation Module

This section is normative.

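The Metainformation Module defines elements that allow the definition of relationships. These may relate to the document itself, to items external to the document, or to information already provided within the document.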

Note that this module is dependent upon the Metainformation Attributes module. The attributes defined therein are available on the elements defined in this module, and their semantics are an important part of understanding how these elements behave.

For example, some metadata about the document itself might be who wrote it:

this document was written by Mark Birbeck

Some metadata about an item external to the document might be an author of a book:

Crime and Punishment was written by Dostoevsky

And some metadata about information already provided within the document may be that the author of this document lives in London:

this document was written by Mark Birbeck
    and Mark Birbeck lives in London

Elements  Attributes  Minimal Content Model
link      Common      ( link | meta )*
meta      Common      ( ( PCDATA | Text )* | meta+ )

When this module is selected, the link and meta elements are added to the Structural and Text content sets of the Structural and Text Modules. In addition, the elements are added to the content model of the head element defined in the Document Module. Finally, when this module is selected, the associated Metainformation Attributes module must also be selected.

Implementation: RELAX NG

20.1. Literals and Resources

There are two types of properties that some item can have. The first is a simple string value, which is useful for specifying properties such as dates, names, numbers and so on:

this document was written on "March 21st, 2004"

This is not so useful though when trying to uniquely identify items that could occur in other places. Take the example of the document's author being "Mark Birbeck":

this document was written by "Mark Birbeck"

Since there are other people called Mark Birbeck, we cannot tell which of them wrote the document. We get round this problem by allowing the value to be a URI. For example:

this document was written by
<http://example.com/people/MarkBirbeck/654>

We distinguish these two types of properties by calling the first a 'string literal' and the second a 'resource'.

NOTE: Of course there is nothing to stop two people from using this URI to identify two completely different people. But in general URIs are accepted as a convenient way to identify a specific item.

20.2. Document Properties

20.2.1. Literals

20.2.1.1. String Literals

The simplest piece of metadata is a string literal attached to the containing document. This can be specified using meta. For example:

  <head>
    <meta property="author">Mark Birbeck</meta>
    <meta property="created" content="2004-03-20" />
  </head>

which states that:

  this document has an 'author' property of "Mark Birbeck";
  this document has a 'created' property of "2004-03-20".

20.2.1.2. XML Literals

It is also possible to include mark-up in the string. This will always be part of the string's value - in other words, no matter what the mark-up is, it will never be processed as if it were anything other than the value of the property:

  <head>
    <meta property="author" content="Albert Einstein" />
    <meta property="title">E = mc<sup>2</sup>: The Most Urgent Problem
of Our Time</meta>
  </head>

states that:

  this document has an 'author' property of "Albert Einstein";
  this document has a 'title' property of 
      "E = mc<sup>2</sup>: The Most Urgent Problem of Our Time".

However, although the mark-up is not processed as mark-up, it must still be well-formed, and valid where the processor requires it.

20.2.1.3. Typed Literals

In some situations the value of a property is not sufficiently specified by a simple literal. For example, properties such as height or weight would require more than a string to fully specify them:

  <head>
    <meta property="height">87</meta>
  </head>

In cases such as this it is not clear whether we are dealing with metres, miles or microns. Whilst it is certainly possible to add the units to the literal itself, there will be situations where this is not possible, and so the unit should be specified with datatype. In this example we use the XML Schema type for date:

  <head>
    <meta property="created" datatype="xsd:date">2004-03-22</meta>
  </head>
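
The datatype value in the example above is a qualified name, so the xsd prefix has to be bound to a namespace somewhere in scope. The following is a minimal sketch, assuming the prefix is bound to the XML Schema namespace on the head element. The 'height' property is carried over from the earlier example for illustration only, and its datatype records the kind of number rather than its unit:

```xml
<head xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <!-- typed as a calendar date -->
  <meta property="created" datatype="xsd:date">2004-03-22</meta>
  <!-- typed as a decimal number; the unit is still not stated -->
  <meta property="height" datatype="xsd:decimal">87</meta>
</head>
```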

20.2.2. Resources

There will be situations when a string literal is not suitable as the value of a property. In the earlier 'author' example there would be no way to know which 'Mark Birbeck' we are referring to. This might not be a problem when documents are only used within one company, but it becomes a serious problem when documents are used across the internet.

When we need to provide a unique identifier for the value of a property we use link. link identifies a relationship between one resource and another, and uses rel to indicate the nature of this relationship. In addition, resource contains the URI that is being used to uniquely identify the item being related to. For example:

  <head>
    <link rel="author"
          resource="http://example.com/people/MarkBirbeck/654" />
  </head>

Note that just because we are using URIs as unique identifiers doesn't mean that navigating to this URI with a web browser would yield anything useful. This is perhaps easier to see with the following example:

  <head>
    <link rel="source" resource="urn:isbn:0140449132" />
  </head>

20.2.3. Making Use of External Lists of Properties

Best practice for specifying metadata is to try as much as possible to make use of common property names. This can often be achieved by using lists in use by other document authors within a similar field. There are many such lists for different sectors and industries, but for our examples here we will use Dublin Core [DCORE].

To replace the term 'author' with the more widely used Dublin Core term 'creator', we would need to not only substitute 'creator' for 'author', but also to indicate which list we are using. We achieve the latter by using XML namespaces:

  <head xmlns:dc="http://purl.org/dc/elements/1.1/">
    <meta property="dc:creator">Mark Birbeck</meta>
  </head>

Now we have stated that:

  this document has a property called 'creator' (which comes
  from a library of properties called the Dublin Core) and the
  value of that property is the literal "Mark Birbeck".
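
Several properties from the same list can share a single namespace declaration. As an illustrative sketch (the values are invented for this example, and 'title' and 'date' are assumed to be taken from the same Dublin Core element set as 'creator'):

```xml
<head xmlns:dc="http://purl.org/dc/elements/1.1/">
  <!-- each property name is resolved against the dc namespace -->
  <meta property="dc:title">XHTML Metainformation Module</meta>
  <meta property="dc:creator">Mark Birbeck</meta>
  <meta property="dc:date">2004-03-20</meta>
</head>
```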

20.3. Properties of Other Resources

While it is common to create properties and values that say something about the document that contains them, there is often a need to add metadata that refers only to a section of the document, or to some external resource. This is achieved by using about, which can be present on meta and link.

20.3.1. Resources Within the Containing Document

A quote might be attributed as follows:

  <html xmlns:dc="http://purl.org/dc/elements/1.1/">
    <head>
      <link about="#q1" rel="dc:source" resource="urn:isbn:0140449132" />
    </head>
    <body>
      <blockquote id="q1">
        <p>
          'Rodion Romanovitch! My dear friend! If you go on in this way
          you will go mad, I am positive! Drink, pray, if only a few drops!'
        </p>
      </blockquote>
    </body>
  </html>

Note that the absence of about does not always mean that the metadata refers to the containing document. If the element containing metadata is a child of head, then it does relate to the document, and so the following mark-up:

  <head xmlns:dc="http://purl.org/dc/elements/1.1/">
    <meta property="dc:creator">Mark Birbeck</meta>
  </head>

can be regarded as a shorthand for this:

  <head xmlns:dc="http://purl.org/dc/elements/1.1/">
    <meta about="" property="dc:creator">Mark Birbeck</meta>
  </head>

20.3.2. External Resources

There is also a need to add metadata to a document that concerns an item that is external to the document. As before we use about, but this time we should provide an absolute or relative URI, rather than just a fragment identifier.

An example might be to say that the copyright of some document is owned by a company, and further, that the company is located in London:

  <head xmlns:dc="http://purl.org/dc/elements/1.1/">
    <link rel="dc:copyright"
          resource="http://example.com/company/BBC/6" />
    <meta about="http://example.com/company/BBC/6"
          property="dc:location">London</meta>
  </head>

20.4. Chaining Metadata

Metadata that is relevant to a resource referred to by a link can be placed inside the link element with no about. Our previous example could be re-written as follows:

  <head xmlns:dc="http://purl.org/dc/elements/1.1/">
    <link rel="dc:copyright"
          resource="http://example.com/company/BBC/6">
      <meta property="dc:location">London</meta>
    </link>
  </head>

There is no limit to the depth of this nesting.
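
As a sketch of deeper nesting, the following chains three statements. The publisher URI is hypothetical, and 'dc:location' is used only because the previous examples use it:

```xml
<head xmlns:dc="http://purl.org/dc/elements/1.1/">
  <!-- this document has a source: the book -->
  <link rel="dc:source" resource="urn:isbn:0140449132">
    <!-- the book has a publisher (hypothetical URI) -->
    <link rel="dc:publisher"
          resource="http://example.com/company/Publisher/1">
      <!-- the publisher has a location of "London" -->
      <meta property="dc:location">London</meta>
    </link>
  </link>
</head>
```

Each nested element takes the resource of its parent link as the item it describes, so no about is needed at any level.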

If resource is omitted from a link then the nested metadata is still legitimate; it simply relates to an anonymous resource. For example, we might want to say that the 'mother tongue' of the author of Crime and Punishment is Russian, without saying anything further about the author:

  <html xmlns:dc="http://purl.org/dc/elements/1.1/"
        xmlns:con="http://www.w3.org/2000/10/swap/pim/contact#">
    <head />
    <body>
      <blockquote id="q1">
        <link rel="dc:source" resource="urn:isbn:0140449132">
          <link rel="dc:creator">
            <meta property="con:motherTongue">rus</meta>
          </link>
        </link>
        <p>
          'Rodion Romanovitch! My dear friend! If you go on in this way
          you will go mad, I am positive! Drink, pray, if only a few drops!'
        </p>
      </blockquote>
    </body>
  </html>

When reading this metadata, the anonymous resource can be thought of simply as 'something'. This mark-up means:

  1. The quote has a source of Crime and Punishment.
  2. Crime and Punishment has a property of 'creator' (from the Dublin Core taxonomy), and the value of that property is something.
  3. The something that is the author of Crime and Punishment has a property of 'mother tongue' (from the SWAP contacts taxonomy), and the value of that property is "rus" (Russian).

Note however that while placing further elements inside meta is structurally valid, it does not mean the same thing as the example we have just given, since the content of meta is an XML literal. The following:

  <blockquote id="q1">
    <link about="#q1" rel="dc:source" resource="urn:isbn:0140449132">
      <meta property="dc:creator">
        <meta property="con:motherTongue">rus</meta>
      </meta>
    </link>
    <p>...</p>
  </blockquote>

means that:

  1. The quote has a source of Crime and Punishment.
  2. Crime and Punishment has a property of 'creator' (from the Dublin Core taxonomy), and the value of that property is the XML literal '<meta property="con:motherTongue">rus</meta>'.

20.5. The link element

Attributes

The Common collection
A collection of other attribute collections, including: Core, Events, I18N, Bi-directional, Edit, Embedding, Map, Metainformation, and Hypertext

This element defines a link. The link element conveys relationship information that may be rendered by user agents in a variety of ways (e.g., a tool-bar with a drop-down menu of links).

This example illustrates how several link definitions may appear in the head section of a document. The current document is "Chapter2.html". The rel attribute specifies the relationship of the linked document with the current document. The values "index", "next", and "prev" are explained in the section on the attribute rel.

<head>
  <title>Chapter 2</title>
  <link rel="index" resource="../index.html"/>
  <link rel="next"  resource="Chapter3.html"/>
  <link rel="prev"  resource="Chapter1.html"/>
</head>

20.5.1. Forward and reverse links

While the rel attribute specifies a relationship from this document to another resource, the rev attribute specifies the reverse relationship.

Consider two documents A and B.

Document A:       <link resource="docB" rel="index"/>

has exactly the same meaning as:

Document B:       <link resource="docA" rev="index"/>

namely that document B is the index for document A.

Both the rel and rev attributes may be specified simultaneously.
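
For instance, in a sketch that reuses the chapter files from the earlier example, a single link element in Chapter2.html might state both directions of a relationship at once:

```xml
<!-- In Chapter2.html: Chapter3.html is the 'next' document for this one,
     and this document is the 'prev' document for Chapter3.html -->
<link resource="Chapter3.html" rel="next" rev="prev"/>
```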

20.5.2. Links and search engines

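Authors may use the link element to provide a variety of information to search engines, including links to versions of a document in other languages, links to versions of a document in other media (for instance, a version suited for printing), and links to the starting page of a collection of documents.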

The examples below illustrate how language information, media types, and link types may be combined to improve document handling by search engines.

The following example shows how to use the xml:lang attribute to indicate to a search engine where to find Dutch, Portuguese, Arabic, and French versions of a document. Note that xml:lang also indicates that the value of the title attribute for the link element designating the French manual is in French.

<head>
<title>The manual in English</title>
<link title="The manual in Dutch"
      rel="alternate"
      xml:lang="nl" 
      resource="http://example.com/manual/dutch.html"/>
<link title="The manual in Portuguese"
      rel="alternate"
      xml:lang="pt" 
      resource="http://example.com/manual/portuguese.html"/>
<link title="The manual in Arabic"
      rel="alternate"
      xml:lang="ar" 
      resource="http://example.com/manual/arabic.html"/>
<link title="La documentation en Fran&ccedil;ais"
      rel="alternate"
      xml:lang="fr"
      resource="http://example.com/manual/french.html"/>
</head>

In the following example, we tell search engines where to find the printed version of a manual.

<head>
<title>Reference manual</title>
<link media="print" 
      title="The manual in PostScript"
      restype="application/postscript"
      rel="alternate"
      resource="http://example.com/manual/postscript.ps"/>
</head>

In the following example, we tell search engines where to find the front page of a collection of documents.

<head>
<title>Reference manual -- Chapter 5</title>
<link rel="start" title="The first chapter of the manual"
      restype="application/xhtml+xml"
      resource="http://example.com/manual/start.html"/>
</head>

20.6. The meta element

For the following attributes, the permitted values and their interpretation are profile dependent:

Attributes

The Common collection
A collection of other attribute collections, including: Core, Events, I18N, Bi-directional, Edit, Embedding, Map, Metainformation, and Hypertext

The meta element can be used to identify properties of a document (e.g., author, expiration date, a list of key words, etc.) and assign values to those properties. This specification does not define a normative set of properties.

Each meta element specifies a property/value pair. The property attribute identifies the property and the content of the element or the value of the content attribute specifies the property's value.

For example, the following declaration sets a value for the Author property:

<meta property="Author">Steven Pemberton</meta>

Note. The meta element is a generic mechanism for specifying meta data. However, some XHTML elements and attributes already handle certain pieces of meta data and may be used by authors instead of meta to specify those pieces: the title element, the address element, the edit and related attributes, the title attribute, and the cite attribute.

Note. When a property specified by a meta element takes a value that is a URI, some authors prefer to specify the meta data via the link element. Thus, the following meta data declaration:

<meta property="DC.identifier">http://www.rfc-editor.org/rfc/rfc3236.txt</meta>

might also be written:

<link rel="DC.identifier"
      restype="text/plain"
      resource="http://www.rfc-editor.org/rfc/rfc3236.txt"/>

20.6.1. meta and search engines

A common use for meta is to specify keywords that a search engine may use to improve the quality of search results. When several meta elements provide language-dependent information about a document, search engines may filter on the xml:lang attribute to display search results using the language preferences of the user. For example,

<!-- For speakers of US English -->
<meta property="keywords" xml:lang="en-us">vacation, Greece, sunshine</meta>
<!-- For speakers of British English -->
<meta property="keywords" xml:lang="en">holiday, Greece, sunshine</meta>
<!-- For speakers of French -->
<meta property="keywords" xml:lang="fr">vacances, Gr&egrave;ce, soleil</meta>

The effectiveness of search engines can also be increased by using the link element to specify links to translations of the document in other languages, links to versions of the document in other media (e.g., PDF), and, when the document is part of a collection, links to an appropriate starting point for browsing the collection.