Jonathan Rees and Michael Hausenblas, 14 January 2010
Previous version: metameta-20091202.html
Metadata is a broad and amorphous topic, so as a first step we have tried to identify the relevant problem areas. In this memo we focus on two metadata sub-areas: (i) data about documents, and (ii) data about core Web functions.
For each technology mentioned in the respective sections below, we provide a concrete, real-world example that illustrates its usage.
As in Metadata on the Web: A survey, we consider metadata in the strict sense. For the sake of argument and within the scope of this note, we define metadata in the traditional sense used in information science: descriptive information about document-like things. We intentionally construe the word 'document' very broadly, so that it covers not just books and articles but also static Web pages, audio recordings, images, videos, and other similar resources. Metadata architecture then applies to the creation, maintenance, transmission, and application of this particular kind of information. This is of interest to W3C because so much of the value of the Web is tied up in this sort of object.
Core Web functions, in the scope of this document, are all functions of the Internet-based REST ecosystem, that is, the Web. This includes, but is not limited to, data about transfer, access, provenance, HTTP, authentication, Web services, and discovery.
In more or less chronological order:
What follows is the entire collection of 'old' material, including JAR's draft from 2010-01-07 for the road ahead. It will be moved step by step into the sections above.
"Metadata" is a huge, amorphous topic and it is important that we attempt to focus on a single problem or problem area. I have identified the following possibilities:
For the next draft (in progress), I propose looking at both of the latter two, eventually either dropping one or, if necessary, splitting into two documents.
An architectural document is most successful when it starts from problems and applications, not technologies. If we look at the protocols and formats that are often discussed in the context of metadata (XML, OWL, POWDER, Link: header, and so on), we find that many of them are applicable to any of the above concentration areas (data-generally, data-about-documents, data-about-web-plumbing), and this fact helps explain why discussions can get so confusing. I propose that, for the next draft, discussion of any technologies or other considerations that seem to be independent of application area be relegated to a late section or appendix, to help make that independence clear.
TBD: (1) Organize this document into three parts (a) data-about-documents (b) data-about-web-plumbing (c) common technological base. (2) For each format or technology mentioned in "What deployments / use cases are inspirational?" below, and in the earlier survey, provide a link or snippet that concretely illustrates the format, and locate one or more user-facing applications that make use of that format. E.g. for Dublin Core, find some actual "wild type" metadata that uses the DC vocabulary, and find at least one application that puts DC metadata to some interesting use (indexing, browsing, etc.).
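As a placeholder for the kind of snippet item (2) asks for, here is a minimal sketch of how an application might pull Dublin Core fields out of an HTML page. It assumes the common convention of carrying DC elements in `<meta name="DC.xxx" content="...">` tags. The sample page is invented for illustration and is not "wild type" metadata; the names `DublinCoreExtractor`, `extract_dc`, and `SAMPLE` are hypothetical.

```python
from html.parser import HTMLParser


class DublinCoreExtractor(HTMLParser):
    """Collects <meta name="DC.*" content="..."> pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        # Maps a lowercased DC element name (e.g. "creator") to its values.
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = attr.get("name") or ""
        content = attr.get("content")
        # Skip meta tags that are not DC-prefixed or carry no content.
        if content is None or not name.lower().startswith("dc."):
            return
        element = name[3:].lower()
        # An element such as DC.creator may legitimately repeat.
        self.fields.setdefault(element, []).append(content)


def extract_dc(html_text):
    """Return a dict of DC element name -> list of values."""
    parser = DublinCoreExtractor()
    parser.feed(html_text)
    return parser.fields


# Invented sample page (an assumption for illustration, not real data).
SAMPLE = """<html><head>
<meta name="DC.title" content="An Invented Example Page">
<meta name="DC.creator" content="A. Author">
<meta name="DC.creator" content="B. Author">
<meta name="DC.date" content="2010-01-14">
<meta name="viewport" content="width=device-width">
</head><body></body></html>"""

if __name__ == "__main__":
    for element, values in extract_dc(SAMPLE).items():
        print(element, values)
```

A real survey entry would replace `SAMPLE` with a page found in the wild and pair it with an application that indexes or browses by the extracted fields.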
Why should the TAG care about metadata?
What is the division of responsibility between TAG and others (e.g. W3C working groups)?
What could we possibly do? We're not a WG.
For further research:
Yes, metadata makes the data (and therefore the Web) more valuable in particular ways that both promote interoperability and benefit from it.
Not enough of it (example?) - poor incentives for creating it
Difficult to deploy
Hard to validate
A lot of what's there is closed, e.g.
Difficult to use at scale
Being a master consumer of metadata is complicated (XMP, GIF, <link>, LRDD, 303, RDFa, ...)
Doubt and uncertainty regarding data identity
Unclear lines of authority, thus difficult to evaluate for trustworthiness (but how is this different from any other content on the web?)
How to consistently identify people, organizations, places (organize bookmarks by author, photos by place)
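To make the "master consumer" point above concrete, the following sketch looks for a `describedby` link in just two of the places listed there: an HTTP Link: header and HTML `<link>` elements. It is a simplified illustration, not a complete parser: the header handling ignores edge cases such as commas inside quoted parameters, and the function names and example URLs in the usage note are assumptions. Covering XMP, LRDD, 303 redirects, or RDFa would each need separate code paths, which is the complexity the list item refers to.

```python
import re
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects (rel, href) pairs from <link> elements."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if not href:
            return
        # rel may hold several space-separated relation types.
        for rel in (attr.get("rel") or "").lower().split():
            self.links.append((rel, href))


def links_from_html(html_text):
    """Return (rel, href) pairs found in <link> elements."""
    collector = LinkCollector()
    collector.feed(html_text)
    return collector.links


# Simplified patterns: a <target> followed by its parameters, up to the
# next "<". This does not handle every legal Link header value.
_LINK_VALUE = re.compile(r'<([^>]*)>([^<]*)')
_REL_PARAM = re.compile(r'\brel\s*=\s*"?([^";,]*)"?', re.IGNORECASE)


def links_from_header(header_value):
    """Return (rel, target) pairs from the value of a Link: header."""
    links = []
    for target, params in _LINK_VALUE.findall(header_value):
        match = _REL_PARAM.search(params)
        if match is None:
            continue
        for rel in match.group(1).lower().split():
            links.append((rel, target))
    return links


def describedby_candidates(html_text, header_value):
    """List every describedby target, header links first."""
    found = links_from_header(header_value) + links_from_html(html_text)
    return [target for rel, target in found if rel == "describedby"]
```

For example, given the header value `<http://example.org/meta.rdf>; rel="describedby"` and a page containing `<link rel="describedby" href="/about.ttl">`, `describedby_candidates` returns both targets, and the consumer still has to decide which one to trust.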
Is there such a thing as data that is not metadata? Metadata that is not data? Can non-data have metadata? If X is about Y, does that mean X is in scope of this project, or is X only in scope when Y is data?
Are OpenID and XRD a metadata use case, or just application-related data?
Is RDF nose-following (linked data) a metadata use case, a world unto itself, or an intersecting world?
Should we focus our attention on particular metadata profiles (e.g. Dublin Core), or on the meta-metadata problem (bibliographic would be a special case, offers to sell might be another, audio another, etc.)?
Is "metadata" even the right word to cover this project?
Both private and public, that is. Figure out business models, especially for the more public sources.
Things someone might do in RDF-land to advance web metadata:
Elsewhere:
Does "metadata architecture" make sense? Is it something to be discovered (empirical), designed (invented), or some of each?
The Metadata Activity Statement (1998) is worth a look.