<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE spec PUBLIC "-//W3C//DTD Specification V2.1//EN"
               "http://www.w3.org/XML/1998/06/xmlspec-v21.dtd" [
  <!-- ================================================================ -->
  <!ENTITY draft.day "25">
  <!ENTITY draft.month "03">
  <!ENTITY draft.monthname "Mar">
  <!ENTITY draft.year "2002">
  <!ENTITY iso6.doc.date "&draft.year;-&draft.month;-&draft.day;">
  <!ENTITY http-ident "http://www.w3.org/2001/tag/doc/docmeaning">
]>
<spec w3c-doctype='other'>
<?CVS $Id: docmeaning.xml,v 1.1 2002/03/25 15:54:05 NormanWalsh Exp $?>
<header>
<title>What Does a Document Mean?</title>
<w3c-designation>&http-ident;-&iso6.doc.date;</w3c-designation>
<w3c-doctype>TAG Draft</w3c-doctype>
<pubdate><day>&draft.day;</day>
<month>&draft.monthname;</month>
<year>&draft.year;</year>
</pubdate>
<publoc>
<loc href="&http-ident;">&http-ident;</loc>
(<loc href="&http-ident;.html">HTML</loc>,
<loc href="&http-ident;.xml">XML</loc>)</publoc>
<!--
<latestloc><loc href="&http-ident;">&http-ident;</loc>
</latestloc>
<prevlocs></prevlocs>
-->
<authlist>
<author><name>Paul Cotton</name>
<affiliation>Microsoft, Inc.</affiliation>
<email href="mailto:pcotton@microsoft.com">pcotton@microsoft.com</email></author>
<author><name>Norman Walsh</name>
<affiliation>Sun Microsystems, Inc.</affiliation>
<email href="mailto:Norman.Walsh@Sun.COM">Norman.Walsh@Sun.COM</email></author>
</authlist>
<copyright>
<p>
<loc href="http://www.w3.org/Consortium/Legal/ipr-notice-20000612#Copyright">Copyright</loc> &#xA9; 2002
<loc href="http://www.w3.org/">W3C</loc><sup>&#xAE;</sup>
(<loc href="http://www.lcs.mit.edu/">MIT</loc>,
<loc href="http://www.inria.fr/">INRIA</loc>,
<loc href="http://www.keio.ac.jp/">Keio</loc>),
All Rights Reserved. W3C
<loc href="http://www.w3.org/Consortium/Legal/ipr-notice-20000612#Legal_Disclaimer">liability</loc>,
<loc href="http://www.w3.org/Consortium/Legal/ipr-notice-20000612#W3C_Trademarks">trademark</loc>,
<loc href="http://www.w3.org/Consortium/Legal/copyright-documents-19990405">document use</loc>, and
<loc href="http://www.w3.org/Consortium/Legal/copyright-software-19980720">software licensing</loc>
rules apply.
</p></copyright>

<abstract>
<p>A one-page answer to the question "what does a document mean?"
@@say more</p>
</abstract>

<status>
<p>This document has been developed for discussion by the
W3C Technical Architecture Group.</p>

<p>This document is the work of the editors. It is a draft
with no official standing. It does not necessarily represent the
consensus opinion of the TAG and it may not even represent the
consensus opinion of the editors.</p>

<p>Comments may be directed to the W3C TAG mailing list <loc
href="mailto:www-tag@w3.org">www-tag@w3.org</loc>
(<loc
href="http://lists.w3.org/Archives/Public/www-tag/">archive</loc>).</p>

<p>Publication of this document by W3C indicates
no endorsement by W3C or the W3C Team, or any W3C Members.
</p>

</status>
<pubstmt>
<p>Chicago, Vancouver, Mountain View, et al.: World-Wide Web Consortium,
TAG Note, 2002.</p>
</pubstmt>
<sourcedesc>
<p>Created in electronic form.</p>
</sourcedesc>
<langusage>
<language id="EN">English</language>
</langusage>
<revisiondesc>
<slist>
<sitem>2002-03-14: Published draft</sitem>
<sitem>2002-03-07: Initial draft</sitem>
</slist>
</revisiondesc>
</header>
<body>

<div1 id="sec-intro">
<head>Introduction</head>

<p>In human communication, mutual understanding of meaning is often
achieved through an iterative process. Our common experiences provide
a platform on which we can express our thoughts and opinions. On
matters of fact, we can exchange evidence until we reach consensus.
On matters of opinion, we can exchange views until we reach
understanding.</p>

<p>Communication with and between machines does not generally allow
the luxury of partial understanding. Machines have not yet
demonstrated any ability to proceed in the face of ambiguity. On the
Web, where communication is achieved by the exchange of resources (most
often <quote>documents</quote> in a broad sense, but generally any resource
that is identified by a URI can be exchanged), we need a clear means by
which meaning can be associated unambiguously with a document.</p>

</div1>

<div1 id="sec-what-is-a-doc">
<head>What is a Document?</head>

<p>A document on the Web is a stream of bits identified with a specific MIME
type. The MIME type indicates to the processor how it may interpret the stream
of bits to decompose it into a sequence of characters, for example, or a
specific bitmap image.</p>

</div1>

<div1 id="sec-how-used">
<head>How Can a Document Be Used?</head>

<p>One of the fundamental benefits of an open, extensible language
like XML is that it makes information reuse very cheap. A number of
mechanisms exist that allow us to quickly and efficiently repurpose
information <quote>on demand</quote>.</p>

<p>Some forms of reuse are easy to predict: publishing information for
multiple platforms and multiple hardware architectures, converting
between application formats. Others are harder to predict:
transforming legacy information for new systems, mining existing data
resources for new information, and applications not yet imagined.</p>

<p>It is important to preserve this cheap, flexible reuse policy as we
develop the Web architecture.</p>

</div1>

<div1 id="sec-what-does-it-mean">
<head>What Does a Document Mean?</head>

<p>On the Web, every document has a MIME type in addition to the
stream of bits that comprises its content. In the absence of a MIME
type, a stream of bits is essentially meaningless. It's the MIME type that
tells a process how it can interpret the stream of bits. Although good
engineering often encourages us to design self-describing data
formats, it's quite possible for the same stream of bits to have
different semantics depending on the MIME type associated with it.</p>

<p>Consider an XHTML document. If it is served as
<code>application/xhtml+xml</code>, the recipient is obligated to process it as
XML, performing well-formedness and perhaps validity checking,
namespace processing, and a small number of other tasks. If the same
document is served as <code>text/html</code>, the interpretation of those
bits is governed by the HTML Recommendation, which has no provision for
XML namespace processing or well-formedness.</p>
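
<p>To make the contrast concrete, here is a minimal sketch of such a
document; the title and text are invented for illustration. Labeled
with an XML MIME type, these bits must be parsed as XML, so the
namespace declaration is significant and a missing end tag would be a
fatal error. Labeled <code>text/html</code>, the same bits are read
under HTML rules, where the <code>xmlns</code> attribute carries no
namespace meaning.</p>

<eg><![CDATA[<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <title>Example page</title>
  </head>
  <body>
    <p>The same bits, two possible readings.</p>
  </body>
</html>]]></eg>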

<p>So, a first-level approximation of a document's meaning is determined
solely by its MIME type.</p>

<div2 id="sec-specific-meaning">
<head>More Specific Meaning</head>

<p>Some MIME types, such as <code>application/xml</code>, describe
whole classes of documents that may have much more specific,
application-level meaning.</p>

<p>Consider, for example, an XHTML home page, a purchase order, and a remote
procedure call; all three of these documents might be transmitted (for better
or worse) with the MIME type <code>application/xml</code>. Few of us, however,
would assert that they have the same <quote>meaning</quote> except in the
most general sense.</p>
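
<p>As a rough sketch, the three documents might begin as shown below.
The purchase order and procedure call vocabularies, their namespace
names, and all values are hypothetical and exist only for this
example. Each is a separate document, and each could plausibly be
labeled <code>application/xml</code>.</p>

<eg><![CDATA[<!-- Document 1: an XHTML home page -->
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Home</title></head>
  <body><p>Welcome.</p></body>
</html>

<!-- Document 2: a purchase order (hypothetical vocabulary) -->
<po:purchaseOrder xmlns:po="http://example.com/ns/purchase-order"
                  orderDate="2002-03-25">
  <po:item sku="1234" quantity="2"/>
</po:purchaseOrder>

<!-- Document 3: a remote procedure call (hypothetical vocabulary) -->
<rpc:call xmlns:rpc="http://example.org/ns/rpc" method="getQuote">
  <rpc:param name="symbol">EXMP</rpc:param>
</rpc:call>]]></eg>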

<p>So, how can we determine a more specific meaning in an unambiguous way?
What additional information can we rely upon?</p>

<olist>
<item><p>The document might have an associated document type or
schema. That would provide additional information about the document,
if we are capable of performing the required validity assessment, and
assuming that it is in fact valid.</p></item>

<item><p>The document has a (possibly anonymous) fully qualified root
element name. That name provides additional information about the type
of document. As a general rule, XML allows an author to mix different
namespaces with considerable freedom, so the fully qualified name of
the root element may not tell us everything about the
document.</p></item>

<item><p>The document has a (possibly empty) set of top-level
namespace declarations. These namespaces tell us what the author
declared at the top level, but there's nothing that requires the
author to have used all of those namespaces nor anything that prevents
the author from using more at lower levels of the tree.</p>
</item>

<item><p>The document has a (possibly empty) set of attribute
name/value pairs on the root element. These may provide additional details.</p>
</item>

<item><p>The document forms a complete information set. Although expensive in
the general case, it's not entirely unreasonable to imagine applications that
examine an entire information set.</p></item>

<item><p>The document may have arrived as part of a larger context.
That context may identify definitively what the document means, or is
intended to mean. An RPC application might listen for a set of parameters on
a particular port and it might always interpret any document so received as
a set of parameters. Hopefully the application would perform enough error-checking
to recognize when it had been sent something inappropriate, but there's
no reason to imagine that this application can handle anything else.</p></item>

<item><p>MIME provides for the transmission of messages composed of several
parts. How is meaning determined if the recipient is handed such a
<quote>package</quote>?</p></item>

<item><p>Some documents may be transformed in ways that may
(encryption) or may not (digital signatures) require the recipient to reverse
the transformation before continuing. Does such a process confer any meaning?
How is that meaning related to the meaning that would have been expressed if
the document had not been so transformed before transmission?</p>
</item>

</olist>
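
<p>The first four of these signals can be seen side by side in a small
sketch. Apart from the XLink namespace name, every name, URI, and
value below is invented for illustration. Note that one declared
namespace is never used, and that another namespace first appears
below the root, so neither the root element name nor the top-level
declarations describe the whole document.</p>

<eg><![CDATA[<?xml version="1.0"?>
<!-- (1) associated document type -->
<!DOCTYPE inv:invoice SYSTEM "http://example.com/dtd/invoice.dtd">
<!-- (2) qualified root name: {http://example.com/ns/invoice}invoice -->
<!-- (3) top-level declarations: inv and xlink (xlink is never used) -->
<!-- (4) root attributes: version and status -->
<inv:invoice xmlns:inv="http://example.com/ns/invoice"
             xmlns:xlink="http://www.w3.org/1999/xlink"
             version="1.0" status="draft">
  <inv:total currency="USD">42.00</inv:total>
  <!-- a namespace first declared below the root -->
  <note xmlns="http://example.org/ns/annotation">Checked by hand.</note>
</inv:invoice>]]></eg>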

<p>Which of these, if any, is an appropriate answer in any and all
circumstances is an open question.</p>

</div2>
</div1>

</body>
</spec>

