Metadata on the Web: A survey

Jonathan Rees took an action (ACTION-227) on 2/12/2009: "Summarize TAG work on metadata, with Larry" due 2/24.

I decided to cast the net wider, so that we could better think about what we might do. I got a bit carried away, and this turned into a bird's-eye outline of this immense field. My research was done in the most lowbrow way: using a search engine, and using an online encyclopedia.

Relevant recent TAG email

Related TAG issues

Previous TAG documents

Uniform Access to Metadata
Jonathan Rees and Phil Archer, 2/2008, revised 2/2009. Contains use cases, solutions summary, and issues list.
Uniform Access to Information About
JAR, Nov 2008, prepared for TAG F2F. Gives one particular protocol (superseded by Hammer-Lahav's).
Uniform Access to Links and Properties
JAR's wiki page on the subject
Authoritative Metadata (TAG finding)

Work on metadata inside W3C

PICS (likely to be superseded by POWDER)
Media Annotations Working Group
Charter, August 2008
(a W3C rec that explicitly claims to be useful for obtaining metadata. Note especially DESCRIBE)
Understanding Metadata (part of WCAG 2.0)
WCAG WG note, 2008
Test Metadata
QA WG note, 2005
Metadata for Content Adaptation
Workshop, 2004
SVG Metadata
W3C recommendation 2003
W3C Metadata Activity Statement
Technology and Society Domain, 1998-2000. Superseded by Semantic Web Activity
Metadata Architecture
TimBL 1997

Definition of "metadata"

The Wikipedia article is worth reading.

It's very important to understand the definition of "metadata": Metadata is data about data, or information about information. It is not arbitrary kinds of information, even if it is about something, unless that something is data. Although the word is widely abused to go beyond this scope, I (JAR) do not approve, since we already have perfectly good words for other kinds of information - data, description, information, etc.

In addition, the W3C SVG recommendation defines "metadata" consistently with general use, and it would be a bad idea for other W3C documents to diverge from usage established in a recommendation.

(The etymology of "metadata" is a circus. The "meta" in "metadata" is a back-formation from "metaphysics", which originally just meant "the volume that comes after the physics volume" but because of that volume's content came to mean something much more... well, metaphysical.)

If we want to broaden the scope of the investigation - which already seems much too broad - beyond data about data, we'll need to choose a different word - "description", "information about", etc.
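
As a rough illustration of the distinction drawn in this section, the following Python sketch labels a statement as "metadata" only when its subject is itself data, and as "description" otherwise. The names SubjectKind, Statement, is_metadata and label, and the example subjects, are hypothetical and chosen only for illustration; they do not come from any W3C specification.

```python
from dataclasses import dataclass
from enum import Enum


class SubjectKind(Enum):
    """Hypothetical two-way split of what a statement can be about."""
    DATA = "data"          # e.g. a document, an image, a message
    NON_DATA = "non-data"  # e.g. a service, a planet, a person


@dataclass(frozen=True)
class Statement:
    """A single piece of information about some subject."""
    subject: str
    subject_kind: SubjectKind
    prop: str


def is_metadata(statement: Statement) -> bool:
    # Under the definition used here, information counts as metadata
    # only when the thing it describes is itself data.
    return statement.subject_kind is SubjectKind.DATA


def label(statement: Statement) -> str:
    # "description" stands in for the broader words suggested above.
    return "metadata" if is_metadata(statement) else "description"


image_creator = Statement("an SVG image", SubjectKind.DATA, "creator")
planet_mass = Statement("a planet", SubjectKind.NON_DATA, "mass")
service_policy = Statement("a web service", SubjectKind.NON_DATA, "policy")

for s in (image_creator, planet_mass, service_policy):
    print(f"{s.prop} of {s.subject}: {label(s)}")
```

The only design choice here is that the label depends on the kind of subject and not on the property or its value, which is the point of the definition.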

Applications for metadata

Some kinds of metadata

Kinds of data that might have metadata

Obviously any data can have metadata, but the following are specific domains in which people talk about metadata.

Formats suitable for use in representing metadata


Syntactic bases...

Embedded metadata / markup...

See Metadata standards, crosswalks, and standard organizations from QE II Library for a long list of links about metadata related to the library world. (W3C is on its list of 5 standards organizations in this area.)

Neither here nor there: Google search results are metadata; the books and media sections of Amazon are metadata; link lists and webliography such as LSRN are metadata.

Relevant protocols

Related but not on topic

The following are not on topic (data about data), but come up often in discussions of metadata on the Web.

Web Services Resource Access

TAG Resolution endorsing W3C Team Comment on the identifications of WS Transfer resources, WSRA WG charter, WS Metadata Exchange, WS Policy, and so on. All documents use the word "metadata" as describing "what other endpoints need to know to interact with" a service. This is definitely outside the accepted definition of the word "metadata", as a service is not data. You would not say that a planet's mass was "metadata" - it is just data. "Metadata" can be about the messages sent when using the service, but not the service itself. I would urge the WG to change their terminology.

What a URI identifies

In contexts where what a URI is supposed to "identify" matters, it becomes important to ask a URI owner what they want to say about this (AWWW). You could consider this to be "metadata" if you took the position that what you recover is data about the URI, where the URI is data. JAR does not favor this usage.

Change log


Larry Masinter, Phil Archer, Dan Connolly, Eran Hammer-Lahav, Ray Denenberg