<?xml version='1.0'?>

<!DOCTYPE spec PUBLIC "-//W3C//DTD Specification V2.6//EN" "http://www.w3.org/2002/xmlspec/dtd/2.6/xmlspec.dtd"
[
  <!-- ================================================================ -->
  <!ENTITY draft.day "24">
  <!ENTITY draft.month "05">
  <!ENTITY draft.monthname "May">
  <!ENTITY draft.year "2007">
  <!ENTITY iso6.doc.date "&draft.year;-&draft.month;-&draft.day;">
  <!ENTITY http-ident "http://www.w3.org/2001/tag/doc/selfDescribingDocuments">
]>



<spec w3c-doctype='wd' role='editors-copy'>
<header>
<title>The Self-Describing Web</title>
<w3c-designation>&http-ident;-&iso6.doc.date;</w3c-designation>
<w3c-doctype>Draft TAG Finding</w3c-doctype>
<pubdate><day>&draft.day;</day>
<month>&draft.monthname;</month>
<year>&draft.year;</year>
</pubdate>
<publoc>
<loc href='&http-ident;-&iso6.doc.date;.html'>&http-ident;-&iso6.doc.date;</loc>
</publoc>
<altlocs>
<loc href='&http-ident;-&iso6.doc.date;.xml'>XML</loc>
</altlocs>
<latestloc>
<loc href='&http-ident;.html'>&http-ident;</loc>
</latestloc>
<prevlocs>
<loc href="http://www.w3.org/2001/tag/doc/selfDescribingDocuments-2007-02-25.html">http://www.w3.org/2001/tag/doc/selfDescribingDocuments-2007-02-25</loc>
</prevlocs>
<authlist>
<author><name>Noah Mendelsohn</name>
<affiliation>IBM Corp.</affiliation>
<email href='mailto:Noah_Mendelsohn@us.ibm.com'>Noah_Mendelsohn@us.ibm.com</email></author>
</authlist>
<copyright>
<p>
<loc href='http://www.w3.org/Consortium/Legal/ipr-notice-20000612#Copyright'>Copyright</loc> &#xA9; 2006, 2007
<loc href='http://www.w3.org/'>W3C</loc><sup>&#xAE;</sup>
(<loc href='http://www.lcs.mit.edu/'>MIT</loc>,
<loc href='http://www.inria.fr/'>INRIA</loc>,
<loc href='http://www.keio.ac.jp/'>Keio</loc>),
All Rights Reserved. W3C
<loc href='http://www.w3.org/Consortium/Legal/ipr-notice-20000612#Legal_Disclaimer'>liability</loc>,
<loc href='http://www.w3.org/Consortium/Legal/ipr-notice-20000612#W3C_Trademarks'>trademark</loc>,
<loc href='http://www.w3.org/Consortium/Legal/copyright-documents-19990405'>document use</loc>, and
<loc href='http://www.w3.org/Consortium/Legal/copyright-software-19980720'>software licensing</loc>
rules apply.
</p></copyright>

<abstract>
<p>
The Web is designed to support flexible exploration of information, by human users and by automated agents.
For such exploration to be productive, 
information published by many different sources and for a wide variety of
purposes must be comprehensible to a wide variety of Web client software.
This finding suggests that there are three strategies that, used in combination, can ensure
such flexible interoperability:  1) where practical, resource representations should be encoded using widely deployed standards; 2) where such widely deployed standards are not sufficient, the encodings used should themselves be described in machine-readable form on the Web, using RDF, RDDL, or other standard description systems; and 3) in all cases, each representation should carry information (such as media types, character encoding labels, RDFa, and links to specifications) sufficient to support automatic determination of the standards and other specifications necessary for correct interpretation.
To the extent that these guidelines are observed, individual documents become self-describing, in the sense that only widely available information is necessary for understanding them.
Furthermore, when such documents are linked together, the Web as a whole can support reliable,
ad hoc discovery of information.
This finding discusses in more detail the techniques needed to create such a <emph>self-describing Web</emph>.</p>
</abstract>

<status>


<p>This document has been produced by the <loc href='/2001/tag/'>W3C
Technical Architecture Group (TAG)</loc>.
This finding addresses TAG issue XXXX (to be opened).
</p>
<p>This version is an editor's draft and has not been approved by the TAG.  It has been
prepared for discussion at the <loc href="http://www.w3.org/2001/tag/2007/05/29-agenda">June 2007 Face to Face Meeting of the TAG</loc>,
and it is intended in part to address comments made at the <loc href="http://www.w3.org/2001/tag/2007/03/06-minutes#item08">March 2007 Face to Face Meeting of the TAG</loc>.</p>

<p><loc href='/2001/tag/findings'>Additional TAG findings</loc>, both
accepted and in draft state, may also be available. The TAG may 
incorporate this and other findings into 
future versions of the  <bibref ref='AWWW'/>.</p>

<p>The terms <rfc2119>MUST</rfc2119>, <rfc2119>SHOULD</rfc2119>, and
<rfc2119>SHOULD NOT</rfc2119> are used in this document
in accordance with <bibref ref='rfc2119'/>.</p>

<p>Please send comments on this finding to the publicly archived TAG
mailing list <loc href='mailto:www-tag@w3.org'>www-tag@w3.org</loc>
(<loc href='http://lists.w3.org/Archives/Public/www-tag/'>archive</loc>).</p>

</status>
<pubstmt>
<p>World-Wide Web Consortium,
Draft TAG Finding, 2005.</p>
</pubstmt>
<sourcedesc>
<p>Created in electronic form.</p>
</sourcedesc>
<langusage>
<language id='EN'>English</language>
</langusage>
<revisiondesc>
<slist>
<sitem>2002-04-30: Published draft</sitem>
</slist>
</revisiondesc>
</header>
<body>

<div1 id='Introduction'>
<head>Introduction</head>
<p>
The World Wide Web has at least three characteristics that distinguish it from many other shared information spaces:
<olist>
<item><p>The Web is global: the documents on the Web are contributed by and accessed by a very large number of users.</p></item>
<item><p>Supporting ad hoc exploration is a goal of the Web.  Users must therefore be able to get
useful information from documents prepared by people whom they don't know, and with whom they have not coordinated in advance.</p></item>
<item><p>Web architecture dictates that <emph>any</emph> user agent may at any time issue a GET and attempt to interpret representations for <emph>any</emph> HTTP resource.</p></item>
</olist>
It seems fairly obvious that documents intended for a broad audience should be encoded using
standard formats, because user agents, such as Web browsers, can provide built-in
support for such standards.
What may be less clear are the importance of having each resource representation unambiguously
indicate the conventions used to encode it, the possibilities for extending fixed
representation standards by providing machine-readable specifications on the Web,
and the importance
of using such approaches even for documents that are primarily targeted to
a limited audience.
Applying these approaches results in documents that 
are self-describing, in the sense that only widely available information is necessary for understanding them.
Furthermore, when such documents are linked together, the Web as a whole can support reliable,
ad hoc discovery of information.
This finding discusses in more detail the techniques needed to create such a <emph>self-describing Web</emph>.
</p>
</div1>
  
<div1 id='standards'>
<head>Use of widely deployed standards and formats</head>
<p>
Electronic documents are used on the World Wide Web as a means of communication. 
Successful communication depends on the supplier and the consumer(s) of a document having a shared understanding of the information conveyed, and that in turn requires at least some shared assumptions about the form in which the information is represented.
The simplest way to achieve this is if the document is encoded using widely deployed standards and conventions.
</p>
<p>
As an example, consider the document you are reading now.  If you have a printed copy, then you and the author have implicitly agreed to communicate in English.  You have agreed that the English is set down using traditional typographical conventions, with the usual 26-letter alphabet and other symbols used to represent the words, punctuation, and so on.
You are also depending on some shared assumptions about document structure, such as the use of a title to set an overall theme for the document, hierarchical sections used to reflect semantic structure, white space to set off paragraphs and so on.
In other respects, the document is self-describing.  Given the simple and widely shared assumptions about alphabet, typography and so on, it is possible for a reader with no additional knowledge to discover essentially
the full intended content of this finding.
</p>
<p>
If you are reading this document online using a Web browser, then you are benefiting from the
fact that its electronic representation is also based on widely deployed standards:
it is written in HTML, using the UTF-8 Unicode encoding, is served using the widely deployed HTTP protocol, and so on.
Because so many agents on the Web are compatible with that representation, this document
can be viewed in Web browsers, both on desktop machines and on mobile devices, it can be
parsed and decoded by search engine crawlers, and so on.
(See also the TAG Finding "The Rule of Least Power" <bibref ref="LeastPower"/> for a discussion of some other document characteristics that facilitate use of the information in this document.)
</p>
<p>
More compact encodings of this document are possible,
but they might well depend on assumptions that are less widely shared.
For example, instead of all the detailed information on the title page above, one might have written:
"Usual title stuff for TAG finding on self-description written by Noah in May."
For another member of the TAG, this sentence might have sufficed
to convey most of the information in the title page. 
He or she might have known that only one person named Noah has ever served on the TAG, and correctly guessed him to be the author.
The copyright
might have been inferred, the links to various W3C sites are well-known,
and the overall structure of title pages is common to most TAG findings.
The resulting encoding would indeed be much more compact.
Unfortunately, it would not reliably convey the full intended information to most readers on the Web, only to those with very specialized information.
Thus, the compact form is not sufficiently self-describing to be widely useful;  its correct interpretation depends on assumptions that are not broadly shared.
</p>
<p role="practice"><a name="GPNWidelyDep" id="GPNWidelyDep"></a>
<em>Good Practice:</em> 
Web resource representations SHOULD, to the extent practical, be encoded using widely deployed standards.
</p>
<p><!-- empty paragraph to keep good practice box from messing up the indentation of the heading to follow --></p>
</div1>
<div1 id='agreeing'>
<head>Determining the format of a representation</head>
<p>Just as certain shared assumptions were required for a reader to correctly understand
the markings comprising the printed form of this finding, 
the sender and receiver of a Web document must share some assumptions
if the bit streams representing the document are to be correctly interpreted.
It's not enough for the sender to know that standard formats or encodings
were used;
the receiver must be able to reliably discover which ones were chosen.
The HTTP protocol and associated standards are designed to facilitate
discovery of the encodings that have been used for each Web resource representation.
</p>
<p>
Again using this finding as an example: it is usually served on the 
Web as a sequence of bits (octets) using the HTTP protocol,
labeled with
the media type text/html and the associated character set (UTF-8).
Indeed, if you're reading this document online,
you may wish to use your browser's View Source or View Page Information
(or similar) feature to examine
some of these declarations.
Here is a representative portion of the HTTP response returned for one of
the early drafts of this document (a few headers not pertinent to this discussion have been removed, and carriage returns have been added to the HTML to 
make it easier to read):
</p>
<pre id="HTTPdump">
HTTP/1.1 200 OK
Date: Mon, 21 May 2007 22:55:45 GMT
Server: Apache/1.3.37 (Unix) PHP/4.4.7
Last-Modified: Mon, 26 Feb 2007 14:44:58 GMT
Content-Type: text/html; charset=utf-8

&lt;!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
&lt;html lang="EN">
&lt;head>
&lt;META http-equiv="Content-Type" content="text/html; charset=utf-8">
&lt;title>The Self-Describing Web&lt;/title>
...
</pre>
<p>

Typically, such encoding or format information is applied in a layered manner
after the representation is received.
So, for example, the knowledge that UTF-8 has been used is necessary to
interpret the
octet stream as characters, and the discovery that media type text/html
is used gives the receiver license to interpret the first of those characters
as an HTML DOCTYPE declaration.
(Note that if the same entity-body were served as text/plain, a user agent
would be guessing if it "sniffed" the document to determine
that the content could be processed as HTML, or if it
tried to infer the type of such a resource from an ".html" suffix
in the URI; as discussed in TAG Findings <bibref ref="AuthoritativeMetadata"/> and 
<bibref ref="MetadataInURI"/> such guessing is contrary to Web Architecture and is
strongly discouraged.
The Content-Type header is generally the appropriate means to determine
the character encoding and media type of a representation retrieved using
HTTP.) 
With knowledge that UTF-8 and media type text/html have been used,
a receiving user
agent can inspect the DOCTYPE to determine that in fact the document
uses the 4.01 Transitional variant of HTML, and can
parse the rest of the document to determine its tag structure.
From the <code>lang="EN"</code> attribute on the <code>HTML</code> element it can reliably determine that the text was intended to be
read as English, from the various HTML heading tags (e.g. <code>&lt;h1></code> and <code>&lt;h2></code>) it can determine the structuring of the document into sections,
and from the <code>&lt;a></code> tags it can discover the links and the anchors in the document,
and so on.
</p>
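<p>
The layering described above can be sketched in a few lines of Python.
This is only an illustration under stated assumptions, not part of any specification:
the function name <code>describe_representation</code> is hypothetical, and the choice to
refuse to decode when no charset label is present is an assumption of the sketch rather than
a rule of HTTP.
The standard library's <code>email.message.Message</code> class is used here simply as a
convenient parser for the Content-Type header value.
</p>
<pre id="pyLayeredExample"><![CDATA[
```python
from email.message import Message


def describe_representation(content_type_header, body_bytes):
    """Apply the labels in layers: header, then charset, then media type."""
    msg = Message()
    msg["Content-Type"] = content_type_header
    media_type = msg.get_content_type()     # e.g. "text/html"
    charset = msg.get_content_charset()     # e.g. "utf-8", or None if unlabeled
    if charset is None:
        # Assumption of this sketch: without a label, do not sniff the octets.
        raise ValueError("no charset label; not guessing")

    # Layer 1: the charset label turns octets into characters.
    text = body_bytes.decode(charset)

    # Layer 2: only the media type licenses reading the first
    # characters as an HTML DOCTYPE declaration.
    lines = text.lstrip().splitlines()
    first_line = lines[0] if lines else ""
    has_doctype = (media_type == "text/html"
                   and first_line.upper().startswith("<!DOCTYPE HTML"))
    return media_type, charset, has_doctype
```
]]></pre>
<p>
Given the Content-Type header and the opening of the entity-body shown in the HTTP example above,
this sketch reports media type text/html, character encoding utf-8, and a recognized DOCTYPE.
If the same octets were labeled text/plain, the sketch would not treat them as HTML,
which mirrors the point that the label, and not the content, is authoritative.
</p>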
<p>
In short, a user agent can work step by step, starting with knowledge
of the HTTP protocol and its headers, to determine the full intended
interpretation of this example representation.
This representation not only conforms to standards, it advertises the standards
it uses so that a receiver can discover them.
The Web representation of this document is in that sense <emph>self-describing</emph>.
For the reasons discussed above, providing self-describing resource
representations is essential if the Web is to be an information
space that users and software agents can freely explore.
</p>
<p role="practice"><a name="GPNSelfDesc" id="GPNSelfDesc"></a>
<em>Good Practice:</em> 
Web resource representations SHOULD, to the extent practical, be <emph>self-describing</emph>.
</p>
<p>
Note that the above wording takes account of the fact that HTTP headers,
such as Content-Type, are considered to be part of the resource
representation, even though they are not part of the HTML entity-body content;
indeed, 
because the information needed to get started on finding the encodings
used is found in common ways for <emph>all</emph> representations returned using HTTP, i.e., in standard headers such as Content-Type, HTTP facilitates the
creation and deployment of self-describing resources.
</p>
<p>
In many cases, such as in the example above, a small, bounded set of such standards is sufficient for representing the information in a document, but in others, more extensible conventions are needed.
The following sections discuss how technologies such as XML, RDF, RDDL,
GRDDL and others can be used to support the use of extensible, application-specific representation formats, and how user agents can dynamically discover information about the formats that have been used.
</p>


</div1>
<div1 id='dynamic'>
<head>Dynamic discovery using extensible specifications</head>
<p>
Dynamic discovery of specifications is necessary because of the ever changing nature of the
information on the Web.  
Indeed,
many documents, particularly those that convey machine-readable data or messages, encode detailed
information using specifications that may be specialized to particular purposes.
These may cover details of particular data formats
such as lists of customers or inventory records, results of scientific
experiments, listings for television shows,
descriptions of universities or their course offerings, information about
molecular structures or drug tests, etc.
They may also provide new ways of representing document structure,
graphical images or message control structures such as SOAP headers.
Because of the great variety and number of such formats and their specifications,
and because new versions of such specifications are deployed
often, it's not practical
to assume that even most of them will be directly implemented by typical Web user agents.
</p>
<p>
A variety of Web technologies are available that allow for unambiguous labeling of the
specifications being used.
Furthermore, when such labels are URIs (or when, as with many XML Qualified Names, they can be mapped to URIs),
it may be possible to dynamically discover on the Web the logic or code needed
to understand, or at least to do partial processing of the content in question.
So, just as the Web may be used to dynamically discover a great wealth of resources, it can
also be used to dynamically discover the specifications, ontologies, or programs
needed to interpret the representations of those resources.
Web representations that use such domain- or application-specific formats should link to the information
needed to interpret them.
</p> 
<p role="practice"><a name="GPNDynamicDesc" id="GPNDynamicDesc"></a>
<em>Good Practice:</em> 
Representations that use application- or domain-specific formats SHOULD link to 
the information needed to support automatic processing of those formats.  [Need to find a less clumsy wording for this one...Noah]
</p>
<p>
Of course, when the standards used in a representation are widely deployed, as with HTTP, ASCII,
Unicode, XML and so on, there may be no need for a client to dynamically
integrate support for those standards;  as described above, most Web user agents come with
built-in support for widely used protocols and formats, including HTTP and Unicode,
and for media types such as text/plain, text/html, image/jpeg, and so on.
Indeed, even when extensibility is desired,
it is generally necessary that each user agent provide built-in support for
at least <emph>some</emph> standards, which should in turn be usable to discover information
about others.
The following sections explain how a number of Web technologies can be applied to achieve such
dynamic integration of new Web representation formats.  First, we consider the automatic
discovery of information needed to process
namespace-qualified XML markup. 
</p>
<div2 id="XMLSpecs">
<head>Self-describing XML documents</head>
<p>
XML documents with namespace-qualified elements are a widely used means of creating self-describing
Web documents.
Given that a Web document is of media type <code>application/xml</code>, or in the family of
media types <code>application/____+xml</code>, recursive processing from the root element down may be applied to
determine not just the overall nature of the document, but also the meaning in context
of its sub-elements.
Doing this, however, requires understanding of the semantics of each named element.
Here we discuss one specific aspect of creating self-describing XML:  the use of namespace documents
that can be discovered automatically from the tag names used in the markup.
Later sections of this finding describe some additional techniques for creating self-describing XML.
</p>
<p>
When XML namespaces are used <bibref ref="XMLNamespaces"/>, each XML element is named with what is called a "Qualified Name", which consists of an optional prefix and a local name.  For example:
</p>
<pre id="xmlex1">
   &lt;inventory:itemNumber>87354&lt;/inventory:itemNumber>
   &lt;inventory:quantityAvailable>152&lt;/inventory:quantityAvailable>
</pre>
<p>
Here <code>inventory</code> is a prefix, and we see that it is used in the names of two elements, both of which
presumably have to do with describing items in some business's inventory.
The first element name has a local name <code>itemNumber</code> and the second has local name <code>quantityAvailable</code>.
Not shown above, but necessary for these to be namespace-well-formed XML, is that each prefix be bound to a URI, for which it is a shorthand.
These bindings can be repeated on each element, or more conveniently, declared on a shared ancestor element such as the document's root:
</p>
<pre id="xmlex2">
   &lt;inventory:inventoryItem 
        xmlns:inventory="http://example.org/inventoryNamespace">
     &lt;inventory:itemNumber>
         87354
     &lt;/inventory:itemNumber>
     &lt;inventory:quantityAvailable>
         152
     &lt;/inventory:quantityAvailable>
   &lt;/inventory:inventoryItem>
</pre>
<p>
Although the element names are written using the prefix shorthand, the logical name of each
element is a pair consisting of the namespace name URI, and the local name.
The Namespaces in XML
recommendation calls these pairs <emph>expanded names</emph>;
for the example elements above, the namespace name is <code>http://example.org/inventoryNamespace</code> and the expanded names are <code>{http://example.org/inventoryNamespace,inventoryItem}</code>, <code>{http://example.org/inventoryNamespace,itemNumber}</code> and <code>{http://example.org/inventoryNamespace,quantityAvailable}</code>.
</p>
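<p>
As a hedged illustration of the mapping from qualified names to expanded names, the following
Python sketch parses the inventory example with the standard library's
<code>xml.etree.ElementTree</code> module, which reports each element name in the form
<code>{namespace-name}local-name</code>.
The function name <code>expanded_names</code> is hypothetical, and the whitespace of the example
has been condensed; nothing here is prescribed by the Namespaces in XML recommendation.
</p>
<pre id="pyExpandedNames"><![CDATA[
```python
import xml.etree.ElementTree as ET

DOC = """\
<inventory:inventoryItem
     xmlns:inventory="http://example.org/inventoryNamespace">
  <inventory:itemNumber>87354</inventory:itemNumber>
  <inventory:quantityAvailable>152</inventory:quantityAvailable>
</inventory:inventoryItem>
"""


def expanded_names(xml_text):
    """Return (namespace name, local name) pairs in document order."""
    pairs = []
    for element in ET.fromstring(xml_text).iter():
        # ElementTree writes an expanded name as "{namespace-uri}local".
        if element.tag.startswith("{"):
            uri, local = element.tag[1:].split("}", 1)
        else:
            uri, local = None, element.tag   # element in no namespace
        pairs.append((uri, local))
    return pairs


for uri, local in expanded_names(DOC):
    print(uri, local)
```
]]></pre>
<p>
Note that the prefix does not appear in the result: a document that bound a different prefix
to the same namespace name would yield exactly the same pairs.
</p>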
<p>
The namespace name URI serves at least two roles:  the most obvious and the most widely understood is that it serves to distinguish expanded names in one namespace from those in another;  the other role, and the one that's most important for purposes of this finding, is that it provides Web identification for the namespace itself.
The namespace is a Web resource, and like any other resource, it can and should provide representations of itself using HTTP.
<emph>A user agent processing an XML document can retrieve representations of the namespaces used in that document, and
can use that retrieved information to determine how to correctly process the XML markup.</emph>
The W3C TAG is currently working on a finding that will describe best practices for creating such representations of namespaces.
Drafts of the finding are available at <bibref ref="NamespaceDocuments"/>.
Most likely, the finding will recommend the use of <bibref ref="RDDL"/> as a preferred means of providing machine-readable documentation
of namespaces.
RDDL is itself extensible, but it is commonly used to suggest XML Schemas (in any of several languages including the W3C XML Schema Language [Refs to be supplied]), XSLT Stylesheets, etc. that are usable with markup from the namespace being described.
</p>
<p>
Using the example above, let's assume that user Bob is browsing the Web, and that he follows a link to a resource that returns the XML above as its representation, using media type application/xml.
Of course, it's very unlikely that Bob's browser has built-in knowledge of the inventory XML language, but his browser probably
can parse XML, and we assume that it also is aware of RDDL.
When the inventory description comes back, 
the browser uses the techniques already described to determine the character encoding, 
the media type application/xml, and it discovers
that the root element tag is from namespace <code>http://example.org/inventoryNamespace</code>.
That namespace is identified by an http-scheme URI,
so the browser does an HTTP GET and retrieves from the namespace resource an RDDL
document.
</p>
<pre id="RDDLexample">
...Need to put sample fragment of RDDL document here...
</pre>
<p>
The RDDL document in turn suggests a stylesheet that can be applied to format the inventory XML as HTML;
the browser automatically retrieves and applies the stylesheet, producing HTML that is
rendered on the screen.
Without any manual intervention from Bob, his browser automatically displays the inventory record in a format that's convenient to read and print.
Bob's browser may also be enabled for XML validation, in which case it can look in the RDDL for a link to a schema to be used for validating inventory markup, and can use it to check the document that Bob has received.
Bob's browser has, in an important sense, automatically extended itself for processing
of the inventory markup language.
</p>
<p>
Unless the RDDL provides a link to one or more executable programs that process inventory records, it's unlikely that Bob's browser can automatically discover <emph>everything</emph> that one might reasonably want to know about processing inventory
markup.
Still, even the limited automatic function described above is very useful, and RDDL is an extensible framework that can
be easily adapted to provide new kinds of information about namespaces.
The document Bob retrieved was self-describing: even information needed to correctly process markup specific to inventory management was available by following links that were provided in the document itself.
</p>
<p>
Typically a TAG finding would at this point include a good practice note, suggesting the use
of RDDL or similar technologies to make XML documents on the Web self-describing;
in this case, the details of such recommendations are likely to be provided in the 
TAG finding <bibref ref="NamespaceDocuments"/>, and so they are not formally restated here.
Note also that the TAG has opened an issue <loc href="http://www.w3.org/2001/tag/issues.html?type=1#xmlFunctions-34" xlink:actuate="onRequest" xlink:type="simple" xlink:show="replace">xmlFunctions-34</loc> and
is preparing an associated finding on the recursive interpretation of XML documents.</p>
</div2>

<div2 id="RDF">
<head>RDF and the Self-Describing Web</head>
<p>
RDF [ref to RDF] plays an important and distinguished role as the preferred technology for
creating self-describing Web data resources, and for integrating representations rendered using
other technologies.
The result is a single, global self-describing Semantic Web that integrates not only resources
that are themselves built or represented using RDF, but also the other Web resources to which
that RDF links.
Readers unfamiliar with RDF should consult the [ref to RDF primer] as a prerequisite to understanding the discussion below.
</p>
<p>
Each RDF statement is a triple consisting of a subject, a predicate (typically the identifier for a property, or for a relationship between two Web resources), and an object (typically
the value of the property or the referent of the relationship).
Crucially, the subject and the predicate
are themselves identified by URIs, 
enabling the same sort of dynamic discovery that we've already seen with namespace names &#8212;
if a user agent has no built-in knowledge of some particular RDF subject or relationship
(or object if it's a URI),
it can often use the URI to retrieve the information necessary for processing.
</p>
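<p>
The discovery step described above can be sketched as follows.
This is a minimal illustration and not an RDF implementation: the triples, the
<code>http://example.org/terms#</code> vocabulary URI and the function name
<code>documents_to_fetch</code> are all hypothetical.
The sketch only computes which documents an agent might retrieve; it strips the fragment
identifier from each predicate URI, since the fragment is not sent to the server in an HTTP request.
</p>
<pre id="pyRDFDiscovery"><![CDATA[
```python
from urllib.parse import urldefrag

# Hypothetical triples, written as (subject, predicate, object) tuples.
TRIPLES = [
    ("http://example.org/items/87354",
     "http://example.org/terms#itemNumber", "87354"),
    ("http://example.org/items/87354",
     "http://example.org/terms#quantityAvailable", "152"),
]


def documents_to_fetch(triples, known_predicates):
    """Return URIs to GET for predicates the agent does not already understand."""
    to_fetch = []
    for _subject, predicate, _obj in triples:
        if predicate in known_predicates:
            continue   # built-in knowledge: nothing to discover
        # Strip the fragment to obtain the URI of the describing document.
        document_uri = urldefrag(predicate).url
        if document_uri not in to_fetch:
            to_fetch.append(document_uri)
    return to_fetch


print(documents_to_fetch(TRIPLES, known_predicates=set()))
```
]]></pre>
<p>
Both predicates in this hypothetical example share one describing document, so a single
retrieval would suffice; what the agent then does with the retrieved description is outside
the scope of the sketch.
</p>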
<p>
Indeed, RDF's Schema [ref to RDF schema] and OWL Ontology technologies [ref to OWL ontology]
together offer
a standard, machine-processable means of describing particular uses of RDF.
Just as RDDL allowed Bob's browser to automatically discover the information needed
to process the XML inventory vocabulary, RDF and OWL provide the standard means by which
software can discover the relationships between RDF statements (e.g. that two seemingly
differing predicates are "owl:sameAs" each other), or other information needed for
processing the RDF.
</p>
<p>RDF and its companion Semantic Web technologies ultimately provide much richer
facilities for self-description than the combination of XML and RDDL.
Because its model is uniform, because all of its self-description is provided in the
same model as the data itself, and because all RDF information is linked
into the Web as a whole, RDF provides uniquely powerful facilities for dynamic
integration of a self-describing Web.</p>
<p>[[Need to add a Dirk/Nadia 
example here of why RDF is cooler than anything anyone's ever seen :-) ]] </p>

<p role="practice"><a name="RDFGPN" id="RDFGPN"></a>
<em>Good Practice:</em> 
Information provided directly in RDF, or information for which automated means can be used to
discover corresponding RDF, contributes to the self-describing Semantic Web.
</p>
<p>Because of RDF's unique role as the glue that binds the Web into a single, global self-describing framework, it's particularly important that information not originally supplied in RDF can
be selectively made available in RDF.  
The two sections below discuss two examples:  the first
shows how RDFa can integrate HTML documents into the Semantic Web,
and the second illustrates the use of GRDDL to extract
RDF from XML documents.
</p>
</div2>
<div2 id="RDFa">
<head>Using RDFa to produce self-describing HTML</head>
<p>
<bibref ref="RDFa"/> is a W3C draft Recommendation for embedding Semantic Web statements into ordinary HTML Web pages.
This example illustrates how RDFa can integrate HTML into the self-describing Semantic Web:
</p>
<p>
Mary is exploring the Web using a browser that has been
enhanced with capabilities for interpreting RDFa.
Her browser knows to look through each Web page that she browses, picking out useful information
from the RDFa, and helping her to use it.  For example, the page might contain the following,
which represents a vCard-style contact listing.  (This example is adapted from one
in <bibref ref="RDFa"/>):
</p>
<pre id="vCardExample">
    &lt;p class="contactinfo" 
          xmlns:contact="http://www.w3.org/2001/vcard-rdf/3.0#"
          about="http://example.org/staff/joseph">
        My name is
        &lt;span property="contact:fn">
            Joseph Smith
        &lt;/span>
        I'm a
        &lt;span property="contact:title">
            distinguished web engineer
        &lt;/span>
        at
        &lt;a rel="contact:org" href="http://example.org">
            Example.org
        &lt;/a>.
        You can contact me
        &lt;a rel="contact:email" href="mailto:joe@example.org">
            via email
        &lt;/a>.
    &lt;/p>
</pre>

<p>
Even though this document is of media type application/xhtml+xml,
which is not a member of the RDF family
of media types, an RDFa-enabled user agent can extract RDF from this document.
This document conveys as RDF a set of Semantic Web statements about the Web resource
<code>http://example.org/staff/joseph</code>.  The predicates are all named with the
same base URI <code>http://www.w3.org/2001/vcard-rdf/3.0#</code>, for which the
shorthand prefix <code>contact</code> is established in the HTML.
Using this syntax, the RDFa carries triples for relationships such as the
full name of the contact
(<code>http://www.w3.org/2001/vcard-rdf/3.0#fn</code>), which is <code>Joseph Smith</code>,
the e-mail address (<code>http://www.w3.org/2001/vcard-rdf/3.0#email</code>) which is
<code>mailto:joe@example.org</code>,
and so on.
</p>
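<p>
To make the extraction concrete, here is a deliberately simplified Python sketch that pulls
two of the triples out of a condensed copy of the markup above.
It is not a conforming RDFa processor: it assumes a single subject given by the
<code>about</code> attribute on the outermost element, treats <code>property</code> values as
literals taken from element text and <code>rel</code> values as links taken from
<code>href</code>, and ignores every other RDFa rule.
The function name <code>extract_triples</code> is hypothetical.
</p>
<pre id="pyRDFaSketch"><![CDATA[
```python
import io
import xml.etree.ElementTree as ET

FRAGMENT = """\
<p class="contactinfo"
   xmlns:contact="http://www.w3.org/2001/vcard-rdf/3.0#"
   about="http://example.org/staff/joseph">
  My name is <span property="contact:fn">Joseph Smith</span>.
  You can contact me
  <a rel="contact:email" href="mailto:joe@example.org">via email</a>.
</p>
"""


def extract_triples(xhtml_text):
    """Simplified extraction: one subject, prefixes collected while parsing."""
    prefixes = {}
    source = io.BytesIO(xhtml_text.encode("utf-8"))
    parser = ET.iterparse(source, events=("start-ns",))
    for _event, (prefix, uri) in parser:
        prefixes[prefix] = uri          # e.g. "contact" -> vCard vocabulary URI
    root = parser.root                  # available once parsing has finished
    subject = root.get("about")

    def expand(name):
        prefix, local = name.split(":", 1)
        return prefixes[prefix] + local

    triples = []
    for element in root.iter():
        if element.get("property"):     # literal object from the element text
            literal = " ".join((element.text or "").split())
            triples.append((subject, expand(element.get("property")), literal))
        if element.get("rel"):          # URI object from the href attribute
            triples.append((subject, expand(element.get("rel")),
                            element.get("href")))
    return triples


for triple in extract_triples(FRAGMENT):
    print(triple)
```
]]></pre>
<p>
The point of the sketch is only that the shorthand names in the markup expand to full URIs,
so that each extracted predicate is itself something a user agent can look up on the Web.
</p>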
<p>
An RDFa-enabled user agent can extract these triples and integrate them with other
Semantic Web information.  As discussed above in <specref ref="RDF"/>, such
Semantic Web triples are inherently self-describing.
If the user agent needs more information about the processing of
the email triple, for example, it can do an HTTP GET to
<code>http://www.w3.org/2001/vcard-rdf/3.0#email</code> and use the results to
get more information.
With luck, that information will lead it to automatically discover that, for example,
<code>mailto:joe@example.org</code> can indeed be used to send mail to the person
named <code>Joseph Smith</code>.
The browser can then offer Mary the option to send e-mail to Joe, or to add Joe to
her address book.
</p>
<p role="practice"><a name="RDFaGPN" id="RDFaGPN"></a>
<em>Good Practice:</em> 
RDFa SHOULD be used to make information conveyed in HTML self-describing.
</p>
<!-- empty para helps formatting after GPN -->
<p/>
</div2>
<div2 id="GRDDLchap">
<head>Using GRDDL to bridge from XML to RDF</head>
<p>To be supplied in next version of this finding:  just as RDFa lets us get triples from HTML,
GRDDL lets us get triples from XML variants.
</p>
<p role="practice"><a name="GRDDLGPN" id="GRDDLGPN"></a>
<em>Good Practice:</em> 
GRDDL SHOULD be used to make information conveyed in XML self-describing.
</p>
<!-- empty para helps formatting after GPN -->
<p/>
</div2>
</div1>
<div1 id="conclusion">
<head>Conclusion</head>
<p>
The next draft of the finding will include a brief conclusion section summarizing the
highlights of the points made above.</p>
</div1>


<div1 id="ChangeLog">
<head>Change Log</head>
<div2 id="ChangeMay242007">
<head>Changes in 24 May 2007 Edition</head>
<ulist>
<item><p>Changed title to "Self-describing Web"</p></item>
<item><p>New discussion of discovery of specs, role of RDF, etc.</p></item>
<item><p>Extensive editorial work.</p></item>
</ulist>
</div2>
</div1>


<div1 id='references'>
<head>References</head>

<blist>
<bibl id="AuthoritativeMetadata" href="http://www.w3.org/2001/tag/doc/mime-respect">R. Fielding, I. Jacobs, <titleref>Authoritative Metadata</titleref>. W3C Technical Architecture Group Finding, April, 2006.</bibl>
<bibl id='AWWW' href='http://www.w3.org/TR/webarch/'>I. Jacobs,
N. Walsh, <titleref>Architecture of the World Wide Web</titleref>.
W3C. December, 2004.</bibl>
<bibl id="GRDDL" href="http://www.w3.org/TR/grddl/">D. Connolly, <titleref>Gleaning Resource Descriptions from Dialects of Languages (GRDDL)</titleref>. W3C Candidate Recommendation, May, 2007.</bibl>
<bibl id='LeastPower' href='http://www.w3.org/2001/tag/doc/leastPower'>T. Berners-Lee, N. Mendelsohn, <titleref>The Rule of Least Power</titleref>.
W3C Technical Architecture Group Finding, February, 2006.</bibl>
<bibl id="MetadataInURI" href="http://www.w3.org/2001/tag/doc/metaDataInURI-31">N. Mendelsohn, S. Williams, <titleref>The use of Metadata in URIs</titleref>. W3C Technical Architecture Group Finding, January, 2007.</bibl>
<bibl id="NamespaceDocuments" href="http://www.w3.org/2001/tag/doc/nsDocuments/">N. Walsh, <titleref>Associating Resources with Namespaces</titleref>. W3C Technical Architecture Group Draft Finding, December, 2005.</bibl>
<bibl id='RDDL' href='http://www.rddl.org/'>J. Borden, T. Bray, <titleref>Resource Directory Description Language (RDDL)</titleref>.
W3C. February, 2002.</bibl>
<bibl id='RDFa' href='http://www.w3.org/TR/xhtml-rdfa-primer/'>B. Adida, M. Birbeck, <titleref>RDFa Primer 1.0: Embedding RDF in XHTML</titleref>.
W3C. (working draft) March, 2007.</bibl>
<bibl id="XMLNamespaces" href="http://www.w3.org/TR/xml-names11/">T. Bray, D. Hollander, A. Layman, R. Tobin,  <titleref>Namespaces in XML 1.1</titleref>. W3C, August, 2006 (2nd Edition).</bibl>

</blist>
</div1>
</body>

 <back>
    <div1>
      <head>Change log</head>
      <slist>
         <sitem>6-Dec-2005 [NRM]: initial version</sitem>
      </slist>
      <slist>
         <sitem>25-Feb-2007 [NRM]: trying to get it good enough to circulate</sitem>
      </slist>
    </div1>
 </back>

</spec>
