This has ended up reading like a W3C spec, which is not something the TAG normally produces, but that is how it turned out. We will have to discuss what to do about that.
This is a TAG working document; no decision has yet been taken on its eventual disposition.
The main change in this version is a substantial expansion of the discussion of quotation (see section Quoting). This also involved a high-level re-ordering of sections. The rhetoric was also changed to eliminate references to elaborating namespaces.
Still needs a section on the overall model: the relation of the elaborated infoset to document interpretation, overall control flow, and the role of the application.
TAG issue xmlFunctions-34 represents the TAG's commitment to consider the question of whether there is a 'default' XML processing model, and if so what it looks like. That is, aside from the obligations imposed by the XML (and XML Namespace) recommendations themselves, what, if anything, ought to be done with a document whose media type tells you it's an XML document, before any application-specific processing is attempted? Or, to put it another way, if an author takes responsibility for the information in an XML document, exactly what is s/he taking responsibility for?
The XML Information Set specification defines a vocabulary for referring to the information content of an XML document, in the form of an abstract data model. It identifies XML parsers as the most likely source of such information, but acknowledges that other sources are possible, and several subsequent W3C specs (e.g. XInclude, XML Schema) are defined in terms of mappings from infosets to infosets.
The default processing model question can be rephrased as "Is there an infoset other than the one produced by a conformant XML parser which can and should be defined?" Indeed, the infoset of an XML document is already somewhat under-determined: a well-formed XML document processed by a conformant processor may yield either of two distinct infosets, depending on whether that processor processes all the external parameter entities in the document's DTD.
Just as applications today can express the requirement that certain minimal processing has been done and/or that certain information must be available from the XML documents they take as input, by simply referring to the Infoset, we propose to define a more extended form of processing whose results, in information terms, can then be simply identified as the starting point for applications. Since the specification of XML and the XML information set, a number of generic XML applications have been specified, in terms of functions from infosets to infosets, which arguably should (almost) always be implemented before any more specific processing is attempted. By 'generic' I mean that their elements and/or attributes may usefully appear in almost any XML document, and are coherently interpretable without reference to the syntax or semantics of the surrounding XML (but see quoting below). Furthermore, the resulting infoset is consistent with the media type of the original XML document.
The inventory of such 'generic' applications is small, and identifying its membership correctly is likely to be one of the hard parts of this project, but here are three candidates: XInclude, XML Encryption and XML Signature.
There are three different ways in which the process of elaboration can be avoided, so that the unelaborated infoset is preserved: opting out, implicit quotation and explicit quotation. Opting out is trivial: nothing in the definition of elaborated infosets requires a specification or processor to use it. So, for example, the next edition of XSLT probably should not mandate the elaboration of stylesheets, since on balance the presence therein of e.g. an xi:include element is most likely to be specifying a literal result element, and should not be elaborated.
In the context of an application which does call for elaboration of (some parts of) its input, two distinct kinds of quotation may be needed:
Implicit quotation provides for quotation of some parts of all documents in a particular namespace. The semantics of some parts of a particular application namespace may be best handled by blocking elaboration. Even different kinds of processing of a particular namespace may require different choices with respect to elaboration. Consider SOAP, for example. SOAP intermediaries might best be specified as elaborating down as far as the SOAP body, but no further, whereas SOAP recipients would elaborate the body. Constructors of SOAP messages might take yet a different approach. This means that both specifications and implementations may need to go into considerable detail with respect to what parts of an infoset are not elaborated. This in turn means that implementations of elaboration must provide controls which allow applications to specify which domains (subtrees) are to be treated as quoted.
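As a rough illustration of the kind of control described above, the following Python sketch lets an application choose a set of quoted element names according to its role and then finds the subtrees that would be left unelaborated. The role names, the ROLE_IQNS table and the quoted_roots function are hypothetical, introduced only for this example; the SOAP 1.2 envelope namespace is used for concreteness.

```python
import xml.etree.ElementTree as ET

# SOAP 1.2 envelope namespace, written in ElementTree's {namespace}local form.
SOAP_BODY = "{http://www.w3.org/2003/05/soap-envelope}Body"

# Hypothetical per-role choices of implicitly quoted element names.
ROLE_IQNS = {
    "intermediary": frozenset({SOAP_BODY}),  # elaborate down to the body only
    "recipient": frozenset(),                # elaborate the body as well
}

def quoted_roots(root, iqns):
    """Yield the topmost elements whose names are in iqns.

    Each yielded element is the root of a subtree that an elaborator
    honouring implicit quotation would leave untouched.
    """
    if root.tag in iqns:
        yield root
        return  # do not look inside a quoted subtree
    for child in root:
        yield from quoted_roots(child, iqns)

envelope = ET.fromstring(
    '<env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope">'
    '<env:Header/><env:Body><order/></env:Body></env:Envelope>'
)
intermediary_quoted = [e.tag for e in quoted_roots(envelope, ROLE_IQNS["intermediary"])]
recipient_quoted = list(quoted_roots(envelope, ROLE_IQNS["recipient"]))
```

Keying the choice on the processing role rather than on the namespace alone reflects the point that different kinds of processing of the same namespace may need different quotation domains.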
Explicit quotation provides for quotation of parts of individual documents. In special circumstances, the author of a document may wish to prevent the operation of elaboration within certain sub-trees of a document. Accordingly, we define http://www.example.org/quote as an elaborating namespace, specified for use only on an eq:quote attribute, which quotes any subtree it appears at the root of.
The elaboration of an element II with this attribute is defined to be an otherwise identical element II with the attribute removed, which has the special property that it short-circuits further applications of E in search of a fixed point.
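The rule for explicit quotation might be sketched in Python as follows. This is only an illustration under stated assumptions: the function name elaborate_quoted, the boolean "stop" flag used to model the short-circuit, and the attribute value "true" are all hypothetical, since the text specifies only the attribute's name and namespace.

```python
import copy
import xml.etree.ElementTree as ET

EQ_NS = "http://www.example.org/quote"   # namespace given in the text
EQ_QUOTE = "{" + EQ_NS + "}quote"        # ElementTree's {namespace}local form

def elaborate_quoted(elem):
    """Return (result, stop) for an element carrying eq:quote, else None.

    result is a copy of elem with the eq:quote attribute removed; stop is
    True to tell the caller not to elaborate the result or descend into it.
    """
    if EQ_QUOTE not in elem.attrib:
        return None
    result = copy.deepcopy(elem)
    del result.attrib[EQ_QUOTE]
    return result, True

# The attribute value "true" is an assumption made for this example.
doc = ET.fromstring(
    '<doc xmlns:eq="http://www.example.org/quote" '
    'xmlns:xi="http://www.w3.org/2001/XInclude">'
    '<sample eq:quote="true"><xi:include href="chapter.xml"/></sample>'
    '</doc>'
)
quoted, stop = elaborate_quoted(doc[0])
```

Here the xi:include inside the quoted sample element survives untouched, and the original document is not modified because the function works on a copy.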
We need to establish just what the elaboration signals are, that is, what specs define one or more generic processes which it's useful to include in the definition of elaboration as a whole. Just what fits that description (which itself begs a question with the word 'useful') is an open question, but as suggested above we start with three candidates:
An include EII in the http://www.w3.org/2001/XInclude namespace is an elaboration signal, and it should be elaborated by reference to the XInclude specification.
An EncryptedData EII in the http://www.w3.org/2001/04/xmlenc# namespace is an elaboration signal, and it should be elaborated by reference to the XML Encryption specification. It is always an error if a decryption fails because a key is supplied but is not accepted. There are roughly three non-error cases:
EncryptedData element II itself;
A Signature EII in the http://www.w3.org/2000/09/xmldsig# namespace is an elaboration signal, and it should be elaborated by reference to the XML Signature specification. This is not a clear or simple case, as XML Signature provides for at least three distinct kinds of signing (enveloped, enveloping and detached), and supports signing of multiple objects. As a starting point, elaboration of signing should always fail if the signature is not valid, and its value when the signature is valid should be as follows:
Signature element II itself;
This spec identifies three elaboration signals. It should be possible for W3C specs published subsequently to identify one or more additional elaboration signals, by specifying what elaboration means for them.
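One way to picture the set of elaboration signals is as a lookup table keyed on namespace name and local name, which later specs could extend. The Python sketch below assumes exactly that representation; the names SIGNALS, split_name and governing_spec are hypothetical, and the table values are simply labels for the governing spec, not implementations of it.

```python
import xml.etree.ElementTree as ET

# The three candidate signals named in the text: (namespace, local name).
SIGNALS = {
    ("http://www.w3.org/2001/XInclude", "include"): "XInclude",
    ("http://www.w3.org/2001/04/xmlenc#", "EncryptedData"): "XML Encryption",
    ("http://www.w3.org/2000/09/xmldsig#", "Signature"): "XML Signature",
}

def split_name(tag):
    """Split ElementTree's '{namespace}local' tag into (namespace, local)."""
    if tag.startswith("{"):
        namespace, local = tag[1:].split("}", 1)
        return namespace, local
    return "", tag  # element in no namespace

def governing_spec(elem):
    """Return the label of the spec governing elem's elaboration, or None."""
    return SIGNALS.get(split_name(elem.tag))

include = ET.fromstring(
    '<xi:include xmlns:xi="http://www.w3.org/2001/XInclude" href="a.xml"/>'
)
plain = ET.fromstring('<include href="a.xml"/>')
```

Because the lookup uses the namespace name as well as the local name, an element that merely happens to be called include, as in the second example, is not treated as a signal.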
The basic idea is that the elaborated infoset is constructed by a top-down traversal of the original infoset, replacing with its elaboration each element information item which signals that it is an elaborating element, either by itself being an elaboration signal, or by being the owner of an attribute II which is an elaboration signal. For example, an EII whose name is include in the XInclude namespace is an elaborating element, with its elaboration as determined by the XInclude spec. The elaboration process applies to its own output: for example, if the result of XInclude processing of an element is a sequence of elements, one of which is itself named EncryptedData in the XML Encryption namespace, that element will in turn be elaborated.
More formally, the elaborated infoset of an information item is defined by a function E from information items ('II' for short) and a set of implicit quotation element names (IQNs) to (sequences of) information items, by cases over the kind of information item. In each case we refer to the original information item as o and to the result of a single elaboration, that is E(o,IQNs), as e, and we refer to the values of properties of information items using a '.' and the property name, e.g. o.local name.
The elaboration of an II o is F(E(o,IQNs)), where F is defined in Infoset fixup below and E is defined as follows:
quote in the elaboration quotation namespace, then eq:quote attribute is removed
The elaboration process as a whole fails if any individual elaboration fails with an error.
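The overall shape of E for element items might be sketched as below. This is a simplified illustration, not the definition: it handles only element-valued signals (not attribute signals), ignores the fixup function F, does not preserve text that follows a replaced element, and will not terminate if a handler keeps producing new signals. The handlers toy_include and toy_decrypt and the FRAGMENTS table are hypothetical stand-ins that perform no real XInclude processing or cryptography.

```python
import copy
import xml.etree.ElementTree as ET

XI_INCLUDE = "{http://www.w3.org/2001/XInclude}include"
ENC_DATA = "{http://www.w3.org/2001/04/xmlenc#}EncryptedData"

def elaborate(o, handlers, iqns=frozenset()):
    """Sketch of E(o, IQNs) for an element o; returns a list of elements."""
    if o.tag in iqns:
        return [copy.deepcopy(o)]           # implicitly quoted: left as is
    handler = handlers.get(o.tag)
    if handler is not None:
        out = []
        for item in handler(o):             # a handler error propagates,
            out.extend(elaborate(item, handlers, iqns))  # failing the whole run
        return out                          # output was itself elaborated
    e = ET.Element(o.tag, dict(o.attrib))   # ordinary element: copy it and
    e.text = o.text                         # elaborate its children in order
    for child in o:
        e.extend(elaborate(child, handlers, iqns))
    return [e]

# Hypothetical in-memory "resources" standing in for included documents.
FRAGMENTS = {
    "a": '<enc:EncryptedData xmlns:enc="http://www.w3.org/2001/04/xmlenc#">'
         '<secret/></enc:EncryptedData>',
}

def toy_include(o):
    return [ET.fromstring(FRAGMENTS[o.get("href")])]

def toy_decrypt(o):
    return [copy.deepcopy(child) for child in o]

HANDLERS = {XI_INCLUDE: toy_include, ENC_DATA: toy_decrypt}

doc = ET.fromstring(
    '<doc xmlns:xi="http://www.w3.org/2001/XInclude">'
    '<xi:include href="a"/><body><xi:include href="a"/></body></doc>'
)
full = elaborate(doc, HANDLERS)[0]
partial = elaborate(doc, HANDLERS, iqns=frozenset({"body"}))[0]
```

In the first call the included EncryptedData element is elaborated in turn, so both inclusions end up as secret elements; in the second, body is treated as an implicitly quoted name, so the xi:include inside it is preserved.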
The infoset as defined in the Infoset spec. has several properties whose values are non-local, that is, they cannot be determined or checked for consistency solely by reference to the subtree rooted at their host II. These are
IDREF or IDREFS is the set of referenced element IIs, which may be anywhere in the surrounding document;
As recognized by the XInclude spec. (see references Property Fixup and subsequent sections), it follows that some fixup may be required after constructing an infoset by replacing some subtrees within an original infoset with subtrees from elsewhere. In some cases fixup means adding new attribute information items, in others a combination of that and changing the values of some infoset properties. It is conjectured that fixup can be done once, on the entire result infoset, after all elaborations have been carried out.
eq:quote to wrap quoted elements?