Bert Bos (W3C) bert@w3.org
26 November 2011
This is a proposal for a new microformat for HTML, to help software automatically find bibliographic metadata about documents, in particular the metadata that conforms to the set known as the Dublin Core.
That set contains the typical properties found in a bibliography or library catalog, such as title, author, date, publisher and abstract. The DCMI (Dublin Core Metadata Initiative, the organization behind the Dublin Core) itself already defines a way to embed such metadata in HTML documents in a machine-readable way, viz., with LINK and META elements, but that method suffers from some of the well-known problems for which microformats were invented: error-prone editing, duplication of data, and invisible data.
The result of extracting the dcmi-coded data from an HTML document is thus a bibliographic record about that document.
This section is not normative.
The dcmi microformat is designed for a specific use-case (but can, of course, be used for other cases). That use-case is the case of a scientist who is researching the literature before writing a scientific paper. Whenever he reads an (electronic) article that he thinks he might cite in his paper, he records the bibliographic data of the article: author, title, publisher, date, etc. He probably has some reference manager (e.g., Mendeley, CiteULike or Zotero) or at least a database of references (BibTeX, Refer, etc.) and, rather than copying the bibliographic data by hand, he prefers to copy-and-paste or, better still, to press a single button.
Some reference managers have tools that can parse certain kinds of online articles automatically. They may have built-in parsers for certain ACM publications or they read the DCMI metadata from META and LINK tags, as defined by the DCMI. As long as the article's publisher or author has provided the machine-readable metadata, the scientist indeed only has to press the button.
The dcmi microformat, once the various reference managers have learned to parse it, should make it easier for authors to add the required metadata and should also reduce the number of errors in such metadata. The result is that the scientist looking for articles to cite can more often copy the bibliographic data with just a single click.
The dcmi microformat resembles other microformats, such as hcard and hcalendar, but also has some unique characteristics.
The syntax uses the well-known microformat patterns: CLASS attributes, REL attributes, ABBR elements, etc. Like rel-license, but unlike hcard, the dcmi microformat defines metadata of which the subject is, implicitly, the HTML document itself. Like hcard, dcmi uses a special class value, called the root class name (“vcard” for one, “dcmi” for the other) as a kind of sentinel: keywords are only recognized if they are inside an element with the root class name. But where each occurrence of the root class name “vcard” indicates the start of a separate vcard, the root class name “dcmi” only serves as sentinel and the keywords below it all contribute metadata to the same bibliographic record.
The reference for the dcmi keywords is the official specification of the DCMI Metadata Terms. Every term (more precisely, the part that the specification calls the “name” of the term) can be used as a value of a CLASS attribute, following the microformat syntax known as the class design pattern. The exceptions are the terms explained further down.
If the [TBD] URL is present in the PROFILE attribute of the HEAD element, then a UA that reads the dcmi microformat MUST look for DCMI terms on any element with a class value of “dcmi” and on its descendants. If the [TBD] URL is not present, a UA that reads the dcmi microformat MAY still look for DCMI terms on any element with a class value of “dcmi” and its descendants. A UA that reads the dcmi microformat MUST NOT look for DCMI terms on other elements.
Note: This means that a UA that writes markup conforming to the dcmi microformat MUST make sure that all elements that represent DCMI terms either have the class “dcmi” or are descendants of an element with that class. Such a UA SHOULD add the [TBD] profile.
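The profile rule above can be sketched in a few lines of Python. This is only an illustration, not part of the proposal: the function name `scan_policy` and the parameter `dcmi_profile_url` are hypothetical, and because the profile URL is still [TBD], the caller has to supply it. The sketch relies on the HTML 4.01 rule that PROFILE holds a white-space-separated list of URLs.

```python
from html.parser import HTMLParser


class _HeadProfile(HTMLParser):
    """Records the URLs listed in the PROFILE attribute of the first HEAD."""

    def __init__(self):
        super().__init__()
        self.profiles = None  # stays None until a HEAD start tag is seen

    def handle_starttag(self, tag, attrs):
        if tag == "head" and self.profiles is None:
            # PROFILE is a white-space-separated list of URLs (HTML 4.01).
            self.profiles = (dict(attrs).get("profile") or "").split()


def scan_policy(html, dcmi_profile_url):
    """Return "MUST" if the dcmi profile is declared, otherwise "MAY".

    dcmi_profile_url is a stand-in for the [TBD] URL (an assumption).
    """
    parser = _HeadProfile()
    parser.feed(html)
    parser.close()
    declared = parser.profiles or []
    return "MUST" if dcmi_profile_url in declared else "MAY"
```

In both cases the UA only ever looks at elements with class “dcmi” and their descendants; the profile merely decides whether looking there is required or optional.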
Note: The class “dcmi” can occur multiple times and does not need to be on a common ancestor of all the DCMI terms. E.g., the following two HTML fragments represent the same data:
<body class=dcmi> <p><span class=creator>P. Maple</span>, <abbr class=date title=2011-12-15>15 Dec. 2011</abbr>
and:
<body> <p><span class="dcmi creator">P. Maple</span>, <abbr class="date dcmi" title=2011-12-15>15 Dec. 2011</abbr>
If the same dcmi term occurs multiple times, the value corresponding to that term is the concatenation of the values of all occurrences.
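The sentinel rule and the concatenation rule can be illustrated with a small Python sketch built on the standard `html.parser` module. It is a sketch under several assumptions, none of which come from this proposal: `TERMS` is only a small subset of the DCMI term names, the value of an ABBR element is taken from its TITLE attribute (the abbr design pattern), the value of any other element is its white-space-normalized text, repeated values are joined with a single space, and the only implied end tag handled is that of P. The names `DcmiExtractor` and `extract` are hypothetical.

```python
from html.parser import HTMLParser

# Illustrative subset of term names; the full list is in DCMI Metadata Terms.
TERMS = {"title", "creator", "date", "abstract", "publisher", "subject"}
VOID = {"br", "hr", "img", "input", "link", "meta"}  # elements without content


class DcmiExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []   # open elements: [tag, inside_dcmi, terms, title, text]
        self.record = {}  # term -> list of values, in the order elements close

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._close("p")  # a new P implicitly ends an open P
        if tag in VOID:
            return
        attrs = dict(attrs)
        classes = set((attrs.get("class") or "").split())
        # Sentinel: the element itself or an ancestor carries class "dcmi".
        inside = "dcmi" in classes or (bool(self.stack) and self.stack[-1][1])
        terms = sorted(classes & TERMS) if inside else []
        title = attrs.get("title") if tag == "abbr" else None
        self.stack.append([tag, inside, terms, title, []])

    def handle_data(self, data):
        for entry in self.stack:
            if entry[2]:  # every open term element collects descendant text
                entry[4].append(data)

    def handle_endtag(self, tag):
        self._close(tag)

    def _close(self, tag):
        if not any(entry[0] == tag for entry in self.stack):
            return  # stray end tag: ignore
        while self.stack:
            entry = self.stack.pop()
            self._emit(entry)
            if entry[0] == tag:
                break

    def _emit(self, entry):
        _, _, terms, title, text = entry
        if not terms:
            return
        value = title if title is not None else " ".join("".join(text).split())
        for term in terms:
            self.record.setdefault(term, []).append(value)

    def close(self):
        super().close()  # flush buffered text first
        while self.stack:  # then close elements whose end tags were omitted
            self._emit(self.stack.pop())


def extract(html):
    """Return one bibliographic record: term -> concatenated value."""
    parser = DcmiExtractor()
    parser.feed(html)
    parser.close()
    return {term: " ".join(values) for term, values in parser.record.items()}


if __name__ == "__main__":
    print(extract("<body class=dcmi> <p><span class=creator>P. Maple</span>, "
                  "<abbr class=date title=2011-12-15>15 Dec. 2011</abbr>"))
```

Run on either of the two fragments above, the sketch yields the same record: creator “P. Maple” and date “2011-12-15”. A class such as “creator” on an element outside any “dcmi” element contributes nothing.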
The following terms are handled specially:
<a rel=hasVersion href="document-B">previous</a>, although sometimes the class design pattern may be used:
<a class=hasVersion href="http://example.org/doc-B"> http://example.org/doc-B</a>.
<a href="license.html">Copyright</a>
This section is not normative.
A document marked-up like this:
<!doctype html public '-//W3C//DTD HTML 4.01//EN'>
<html lang=en>
<head profile= "http://microformats.org/profile/hcard [the-URL-for-the DC-microformat]">
<title>dcmi: The Dublin Core microformat</title>
<body class=dcmi>
<h1>dcmi: The Dublin Core microformat</h1>
<p class="creator vcard"><span class=fn>Bert Bos</span> (<span class=org>W3C</span>) <a class=email href="mailto:bert@w3.org">bert@w3.org</a>
<p class=date><abbr class=date title=2011-11-26>26 November 2011</abbr>
<p class=abstract>This is a proposal for...
<p class=abstract>That set contains...
[...]
encodes the following Dublin Core metadata:
| Term | Value |
|---|---|
| language | en |
| title | dcmi: The Dublin Core microformat |
| creator | Bert Bos (W3C) bert@w3.org |
| date | 2011-11-26 |
| abstract | This is a proposal… That set contains… |
| type | text |
| identifier | [the URL of this document] |
| format | text/html |
Note that the “creator” uses the hcard microformat to provide extra structure for the author in the form of a vcard (not shown in the table above).
This section is not normative.
The DCMI has written a document (called the Singapore Framework) listing five components that each specification for an “application profile” of the Dublin Core should define. For the dcmi microformat, those components are as follows: