ACTION-282 = Draft a finding on metadata architecture

Jonathan Rees, 2 December 2009

Not ready to draft a finding because:

I need a clearer idea of what needs to be said and why
I need to do more research (tracking down examples and analyzing them) - in particular, nearly every bullet point below needs some thought and at least one example

Here are some thoughts on organizing these two tasks.

Why should the TAG get into this?

Why should the TAG care about metadata?

it's the TAG's job to encourage a connected, open, inclusive Web
there's a vague feeling that there might be something to be learned by looking at metadata globally...
in particular, something that might help us understand and evaluate some of the many puzzles that come our way...
- RDFa vs. microdata
- LRDD / Link: / .well-known / <link>
- XRD
- HTTP semantics (including redirections)
- using URIs to "refer" vs. to "locate"
- <link rel="canonical">
- multimedia bookmarking
- what is "authoritative"
and also might help us identify opportunities for advocacy and/or standardization

What is the division of responsibility between TAG and others (e.g. W3C working groups)?

WGs have looked at various pieces (RDF, SKOS) but not at the big picture. That's as it should be.
interests of WGs may not be aligned with those of the TAG (for which see above)
heavy existing interest in many communities (e.g. digital repositories)
IETF mostly takes care of relevant protocols

What could we possibly do? We're not a WG.

Descriptive text: Here is the current state of the art, presented in an organized, thoughtful way; with gap analysis.
Prescriptive text: Here are some ways in which, hypothetically, things might be made better through the actions of WGs, Web publishers, and others.
Facilitation (?): ask so-and-so to talk to so-and-so.

Background

In more or less chronological order:

Metadata on the Web: A survey Feb 2009
Discussion of first-party-provided metadata (as opposed to metadata from other sources). This has mostly fallen under ISSUE-62 (303, LRDD).
Larry Masinter email on July 21 announcing ISSUE-63: Metadata Architecture for the Web, closing ACTION-254. This frames the issue, saying what the components of any metadata architecture ought to be.
Steve Rowat's writings:
1. US patent 6782394 "Representing object metadata in a relational database system"
2. email to www-tag 21 August "Goals of a W3C-mediated Global Metadata System"
3. email to www-tag 21 Sep "Ten Use-Cases of Individual Content Authors Requiring Rights/Commerce Metadata"
I found these interesting because they raise the issue of attaching metadata to parts of resources, such as sections, passages, tracks, and time intervals. This relates to RDFa, which (among other things) supports metadata referring to parts of XHTML documents (I believe!? - verify this), and to our previous discussion of URIs for video segments.
Metadata-as-deployed thread started by Dan

Questions

For further research:

Does metadata have any special role on the web (as compared to other kinds of content)?

Yes, metadata makes the data (and therefore the Web) more valuable in particular ways that both promote interoperability and benefit from it.

makes things more discoverable both locally and globally
enables management of collections of things
opens up communication channels (e.g. provenance, licensing)

Is there a problem? In what ways is metadata on the web less webby (connected, open, inclusive) than it ought to be?

Not enough of it (example?) - poor incentives for creating it

most web pages don't have it
<author> was a failure, value not realized
why bother with <link> or RDFa ?
level of reuse of metadata is inadequate to justify investment
authors not usually competent or motivated to curate their own stuff
creator / curator / consumer interests not aligned

Difficult to deploy

well, maybe no easier or harder than web publishing generally
structured formats are always more painful than free text
LRDD will only be used when someone really needs to use it
RDFa is early days, relatively untested, poor tool support

Hard to validate

A lot of what's there is closed, e.g.

OpenURL DOI metadata sits behind a login
PubMed can be accessed piecemeal, RESTfully, but bulk loading requires a license
iTunes ? (research this)

Difficult to use at scale

many, many gardens: indexes, repositories, databases, catalogs (PubMed, CiteULike, IMDB, Flickr, Amazon, ...)
nothing web-like (other than big search engines)

Being a master consumer of metadata is complicated (XMP, GIF, <link>, LRDD, 303, RDFa, ...)

not everyone can do this
too many "standard" reference formats
too many protocols (examples?)
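
To make the consumer's burden concrete, here is a minimal sketch (not a definitive implementation) of a client that looks for metadata links in just two of the places listed above: the HTTP Link: header and HTML <link> elements. The function and class names (parse_link_header, LinkCollector, find_metadata_links) and the example.org URLs are hypothetical, chosen only for illustration. The header parsing is deliberately simplified and assumes no commas or semicolons inside URIs or quoted parameter values. XMP, LRDD/.well-known, 303 redirects, and RDFa would each need further code of their own.

```python
import re
from html.parser import HTMLParser


def parse_link_header(value):
    """Return (target, rel) pairs from an HTTP Link header value.

    Simplified: splits on commas, so it assumes no commas or
    semicolons inside the URI or inside quoted parameter values.
    """
    links = []
    for part in value.split(","):
        match = re.match(r'\s*<([^>]*)>(.*)', part)
        if not match:
            continue
        target, params = match.groups()
        rel = re.search(r';\s*rel\s*=\s*"?([^";]+)"?', params)
        if rel:
            # A rel parameter may carry several space-separated types.
            for rel_type in rel.group(1).split():
                links.append((target, rel_type.lower()))
    return links


class LinkCollector(HTMLParser):
    """Collect (href, rel) pairs from <link> elements in an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        rel = attributes.get("rel")
        if href and rel:
            for rel_type in rel.split():
                self.links.append((href, rel_type.lower()))


def find_metadata_links(headers, html):
    """Gather links from both sources, recording where each was found."""
    found = []
    for target, rel in parse_link_header(headers.get("Link", "")):
        found.append(("http-header", rel, target))
    collector = LinkCollector()
    collector.feed(html)
    for target, rel in collector.links:
        found.append(("html-link", rel, target))
    return found


# Hypothetical response data, for illustration only.
example_headers = {"Link": '<http://example.org/doc.meta>; rel="describedby"'}
example_html = (
    '<html><head>'
    '<link rel="canonical" href="http://example.org/doc">'
    '</head><body></body></html>'
)
for source, rel, target in find_metadata_links(example_headers, example_html):
    print(source, rel, target)
```

Even this two-source sketch needs two different parsers and still leaves open how to reconcile links that disagree, which is part of the point of this bullet.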

Doubt and uncertainty regarding data identity

what exactly is it about - what's the subject?
is this about that, or not?
when are two pieces of metadata about the same data?

Unclear lines of authority, thus difficult to evaluate for trustworthiness (but how is this different from any other content on the web?)

How to consistently identify people, organizations, places (organize bookmarks by author, photos by place)

What metadata do we care about?

Is there such a thing as data that is not metadata? Metadata that is not data? Can non-data have metadata? If X is about Y, does that mean X is in scope of this project, or is X only in scope when Y is data?

Are OpenID and XRD a metadata use case, or just application-related data?

Is RDF nose-following (linked data) a metadata use case, a world unto itself, or an intersecting world?

Should we focus our attention on particular metadata profiles (e.g. Dublin Core), or on the meta-metadata problem (bibliographic would be a special case, offers to sell might be another, audio another, etc.)?

standardizing particular things is always warm and fuzzy, and there's a huge need for agreed-upon formats...
probably that should be done in a WG, not by the TAG

Is "metadata" even the right word to cover this project?

What deployments / use cases are inspirational?

Both private and public, that is. Figure out business models, especially for the more public sources.

CiteSeer
Delicious
iTunes
Papers
Flickr ?
Creative Commons licensing
how is XMP metadata actually used?
Google Scholar ??
Tabulator ???
any cool SKOS clients? (check recent thread)
Gene Ontology annotations

What potential technical opportunities are there?

Things someone might do in RDF-land to advance web metadata:

applicability statements - what to use, when, how
common representations - particular vocabularies, ontologies, schemas
style recommendations
metadata deployment guide(s)
richer logical models, with relations between entities (e.g. part/whole, versions, quality)
ontologies of metadata subjects (class hierarchy)
subject headings (SKOS)
how to serialize using an RDF serialization
how to import RDF into non-RDF settings (example please?)
why is RDF, which was designed for this purpose, not getting uptake in this application area? Is there something someone can do to make it a better fit to community needs?

Elsewhere:

what would we have to say to someone who is not touching RDF?

Does "metadata architecture" make sense? Is it something to be discovered (empirical), designed (invented), or some of each?

Is anything different now compared to 10 years ago when RDF and Dublin Core were published?

The Metadata Activity Statement (1998) is worth a look.

more experience, more frustration
W3C recommendations haven't met with universal acclaim
How do they fall short? Are they fixable?
Does any "market" (community) care enough to do anything? Is there will?