This specification provides a set of definitions for use in other specifications that need to refer to the information in an XML document.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. The latest status of this document series is maintained at the W3C.
This document is a W3C Proposed Recommendation. Several specifications at W3C already use the definitions contained in this document. Therefore the XML Core WG considers that this specification has been "implemented" successfully. Because the purpose of this document is to provide definitions for use in other specifications, implementation of this specification consists primarily of its use in those specifications.
This document enters a Proposed Recommendation review period. W3C Advisory Committee Members are invited to send formal review comments until 10 September 2001 to firstname.lastname@example.org, visible only to the W3C Team.
The public is invited to send comments on this document to the public mailing list email@example.com. An archive is available at http://lists.w3.org/Archives/Public/www-xml-infoset-comments/.
After the review, the Director will announce the document's disposition: it may become a W3C Recommendation (possibly with minor changes). This announcement should not be expected sooner than 14 days after the end of the review.
This document has been produced as part of the XML Activity by the XML Core Working Group (members only). See the XML Information Set Requirements for the specific requirements that informed development of this specification.
Publication as a Proposed Recommendation does not imply endorsement by the W3C membership. This is still a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite W3C Proposed Recommendations as other than "work in progress." A list of current W3C Recommendations and other technical documents can be found at http://www.w3.org/TR/.
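1. Introduction
1.1 Namespaces
1.2 Entities
1.3 End-of-Line Handling
1.4 Base URIs
1.5 Unknown and No Value
1.6 Synthetic Infosets
2. Information Items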
2.1 The Document Information Item
2.2 Element Information Items
2.3 Attribute Information Items
2.4 Processing Instruction Information Items
2.5 Unexpanded Entity Reference Information Items
2.6 Character Information Items
2.7 Comment Information Items
2.8 The Document Type Declaration Information Item
2.9 Unparsed Entity Information Items
2.10 Notation Information Items
2.11 Namespace Information Items
3. Conformance
Appendix A: References
Appendix B: XML 1.0 Reporting Requirements (informative)
Appendix C: Example
Appendix D: What is not in the Information Set
Appendix E: RDF Schema (informative)
1. Introduction
This specification defines an abstract data set called the XML Information Set (Infoset). Its purpose is to provide a consistent set of definitions for use in other specifications that need to refer to the information in a well-formed XML document [XML].
It does not attempt to be exhaustive; the primary criterion for inclusion of an information item or property has been that of expected usefulness in future specifications. Nor does it constitute a minimum set of information that must be returned by an XML processor.
An XML document has an information set if it is well-formed and satisfies the namespace constraints described below. There is no requirement for an XML document to be valid in order to have an information set.
Information sets may be created by methods (not described in this specification) other than parsing an XML document. See Synthetic Infosets below.
An XML document's information set consists of a number of information items; the information set for any well-formed XML document will contain at least a document information item and several others. An information item is an abstract description of some part of an XML document: each information item has a set of associated named properties. In this specification, the property names are shown in square brackets, [thus]. The types of information item are listed in section 2.
The XML Information Set does not require or favor a specific interface or class of interfaces. This specification presents the information set as a modified tree for the sake of clarity and simplicity, but there is no requirement that the XML Information Set be made available through a tree structure; other types of interfaces, including (but not limited to) event-based and query-based interfaces are also capable of providing information conforming to the XML Information Set.
The terms "information set" and "information item" are similar in meaning to the generic terms "tree" and "node", as they are used in computing. However, the latter terms were avoided in this specification to reduce possible confusion with other specific data models. Information items do not map one-to-one with the nodes of the DOM or the "tree" and "nodes" of the XPath data model.
In this specification, the words "must", "should", and "may" assume the meanings specified in [RFC2119], except that the words do not appear in uppercase.
1.1 Namespaces
XML 1.0 documents that do not conform to [Namespaces], though technically well-formed, are not considered to have meaningful information sets. That is, this specification does not define an information set for documents that have element or attribute names containing colons that are used in other ways than as prescribed by [Namespaces].
Furthermore, this specification does not define an information set for documents which use relative URI references in namespace declarations. This is in accordance with the decision of the W3C XML Plenary Interest Group described in [Relative Namespace URI References]. Thus the value of a [namespace name] property is always an absolute URI with an optional fragment identifier.
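As a rough illustration of these two constraints, the Python sketch below uses the standard-library expat binding in namespace-aware mode, which rejects names with an undeclared prefix, plus a simple scheme test as a stand-in for "absolute URI". The function names are hypothetical; the scheme test is a heuristic rather than a full URI validator, and expat itself does not check whether namespace names are relative.

```python
import xml.parsers.expat
from urllib.parse import urlsplit


def parses_with_namespaces(xml_text: str) -> bool:
    """Return True if expat, in namespace-aware mode, accepts xml_text.

    A False result may come from an unbound prefix such as <p:a/>, but also
    from any other well-formedness error; this sketch does not tell them apart.
    """
    parser = xml.parsers.expat.ParserCreate(namespace_separator=" ")
    try:
        parser.Parse(xml_text, True)
    except xml.parsers.expat.ExpatError:
        return False
    return True


def looks_absolute(namespace_name: str) -> bool:
    """Heuristic: treat a URI reference as absolute if it has a scheme.

    A fragment identifier is allowed, matching the [namespace name] rule above.
    """
    return bool(urlsplit(namespace_name).scheme)


print(parses_with_namespaces('<p:a xmlns:p="http://example.org/ns"/>'))  # True
print(parses_with_namespaces("<p:a/>"))  # False
print(looks_absolute("http://example.org/ns#part"), looks_absolute("../ns"))  # True False
```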
1.2 Entities
An information set describes its XML document with entity references already expanded, that is, represented by the information items corresponding to their replacement text. However, there are various circumstances in which a processor may not perform this expansion. An entity may not be declared, or may not be retrievable. A non-validating processor may choose not to read all declarations, and even if it does, may not expand all external entities. In these cases an unexpanded entity reference information item is used to represent the entity reference.
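The following Python sketch models that rule for element content only. It is a simplification under stated assumptions: the hypothetical `available` mapping stands for the entities the processor has declared and was able to read, replacement text is treated as plain character data (no nested markup or references), and character references are not resolved.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass(frozen=True)
class UnexpandedEntityReference:
    """Placeholder item recording the name of an entity that was not expanded."""
    name: str


# Each content item is either one character or a placeholder.
Item = Union[str, UnexpandedEntityReference]

PREDEFINED = {"lt": "<", "gt": ">", "amp": "&", "apos": "'", "quot": '"'}
ENTITY_REF = re.compile(r"&([A-Za-z_][\w.\-]*);")


def content_items(text: str, available: Dict[str, str]) -> List[Item]:
    """Expand general entity references whose replacement text is known.

    available: hypothetical map from entity name to replacement text. Any other
    reference becomes an UnexpandedEntityReference placeholder.
    """
    known = {**PREDEFINED, **available}
    items: List[Item] = []
    pos = 0
    for match in ENTITY_REF.finditer(text):
        items.extend(text[pos:match.start()])  # one item per literal character
        name = match.group(1)
        if name in known:
            items.extend(known[name])  # expanded: items for the replacement text
        else:
            items.append(UnexpandedEntityReference(name))
        pos = match.end()
    items.extend(text[pos:])
    return items


print(content_items("a&x;b&chap2;", {"x": "XY"}))
# ['a', 'X', 'Y', 'b', UnexpandedEntityReference(name='chap2')]
```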
1.3 End-of-Line Handling
The values of all properties in the Infoset take account of the end-of-line normalization described in [XML], 2.11 "End-of-Line Handling".
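XML 1.0 section 2.11 has a processor translate both the two-character sequence CR LF and any CR not followed by LF into a single LF before further processing. A minimal Python version of that rule, applied to raw entity text (the function name is hypothetical), might look like this:

```python
import re


def normalize_line_ends(raw: str) -> str:
    """Apply XML 1.0 end-of-line handling: CR LF -> LF, lone CR -> LF."""
    return re.sub(r"\r\n?", "\n", raw)


print(repr(normalize_line_ends("a\r\nb\rc\nd")))  # 'a\nb\nc\nd'
```

Because this runs on the raw text before parsing, a character reference such as &#13; still yields a carriage return character in the Infoset.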
1.4 Base URIs
Several information items have a [base URI] or [declaration base URI] property. This is computed according to [XML Base]. Note that retrieval of a resource may involve redirection at the parser level (for example, in an entity resolver) or below; in this case the base URI is the final URI used to retrieve the resource after all redirection.
In some cases (such as a document read from a string or a pipe) the rules in [XML Base] may result in a base URI being application dependent. In these cases this specification does not define the value of the [base URI] or [declaration base URI] property.
When resolving relative URIs the [base URI] property should be used in preference to the values of xml:base attributes; they may be inconsistent in the case of Synthetic Infosets.
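Under [XML Base], an element's base URI comes from resolving each xml:base value, outermost first, against the base in effect around it, starting from the base URI of the containing entity. The sketch below uses Python's `urllib.parse.urljoin`, which follows RFC 3986 rather than the older RFC 2396 rules, so edge cases may differ; the function and its argument layout are hypothetical.

```python
from typing import List, Optional
from urllib.parse import urljoin


def element_base_uri(entity_base: str, xml_base_values: List[Optional[str]]) -> str:
    """Compute an element's base URI.

    entity_base: final retrieval URI of the entity (after any redirection).
    xml_base_values: the xml:base attribute of each element from the outermost
    ancestor in the entity down to the element itself; None where absent.
    """
    base = entity_base
    for value in xml_base_values:
        if value is not None:
            base = urljoin(base, value)  # resolve against the enclosing base
    return base


print(element_base_uri("http://example.org/a/doc.xml", [None, "sub/", "x/"]))
# http://example.org/a/sub/x/
```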
1.5 Unknown and No Value
Some properties may sometimes have the value unknown or no value, and it is said that a property value is unknown or that a property has no value respectively. These values are distinct from each other and from all other values. In particular they are distinct from the empty string, the empty set, and the empty list, each of which simply has no members. This specification does not use the term null since in some communities it has particular connotations which may not match those intended here.
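A language binding has to keep these two markers apart from each other and from empty collections. One possible Python approach, assumed here rather than prescribed by this specification, is a pair of dedicated sentinel objects instead of None. The example property values describe a hypothetical xmlns="" attribute, whose [prefix] has no value while its normalized value is the empty string:

```python
class _Marker:
    """A named singleton that compares equal only to itself (identity equality)."""

    def __init__(self, label: str) -> None:
        self.label = label

    def __repr__(self) -> str:
        return self.label


UNKNOWN = _Marker("unknown")
NO_VALUE = _Marker("no value")

# Hypothetical property values for an xmlns="" attribute information item.
props = {"prefix": NO_VALUE, "normalized value": ""}

for name, value in props.items():
    if value is UNKNOWN:
        kind = "value is unknown"
    elif value is NO_VALUE:
        kind = "property has no value"
    else:
        kind = f"value {value!r}"
    print(f"[{name}]: {kind}")
```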
1.6 Synthetic Infosets
This specification describes the information set resulting from parsing an XML document. Information sets may be constructed by other means, for example by use of an API such as the DOM or by transforming an existing information set.
An information set corresponding to a real document will necessarily be consistent in various ways; for example the [in-scope namespaces] property of an element will be consistent with the [namespace attributes] properties of the element and its ancestors. This may not be true of an information set constructed by other means; in such a case there will be no XML document corresponding to the information set, and to serialize it will require resolution of the inconsistencies (for example, by outputting namespace declarations that correspond to the namespaces in scope).
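One way a serializer might resolve the namespace inconsistency just described is to compare each element's intended in-scope namespaces with its parent's and emit declarations for whatever differs. This Python sketch assumes scopes are given as hypothetical prefix-to-namespace dictionaries (None standing for the default namespace). Namespaces in XML 1.0 can undeclare only the default namespace, so a prefix that disappears from scope cannot be expressed and is ignored here.

```python
from typing import Dict, List, Optional
from xml.sax.saxutils import quoteattr

Scope = Dict[Optional[str], str]  # prefix (None = default namespace) -> namespace name


def declarations_for(parent: Scope, child: Scope) -> List[str]:
    """Namespace-declaration attributes a serializer would add to the child element."""
    decls: List[str] = []
    for prefix in sorted(child, key=lambda p: p or ""):
        if prefix == "xml":
            continue  # bound implicitly; never declared
        if parent.get(prefix) != child[prefix]:
            name = "xmlns" if prefix is None else f"xmlns:{prefix}"
            decls.append(f"{name}={quoteattr(child[prefix])}")
    if None in parent and None not in child:
        decls.append('xmlns=""')  # undeclare an inherited default namespace
    return decls


XML_NS = "http://www.w3.org/XML/1998/namespace"
print(declarations_for({"xml": XML_NS, "a": "urn:a"},
                       {"xml": XML_NS, "a": "urn:a", None: "urn:d", "b": "urn:b"}))
```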
2. Information Items
An information set can contain up to eleven different types of information item, as explained in the following sections. Every information item has properties. For ease of reference, each property is given a name, indicated [thus]. Links to a definition and/or syntax in the XML 1.0 Recommendation [XML] are given for each information item.
2.1 The Document Information Item
XML Definition: document (Section 2, Documents)
XML Syntax: Document (Section 2.1, Well-Formed XML Documents)
There is exactly one document information item in the information set, and all other information items are accessible from the properties of the document information item, either directly or indirectly through the properties of other information items.
The document information item has the following properties:
2.2 Element Information Items
XML Definition: element (Section 3, Logical Structures)
XML Syntax: Element (Section 3, Logical Structures)
There is an element information item for each element appearing in the XML document. One of the element information items is the value of the [document element] property of the document information item, corresponding to the root of the element tree, and all other element information items are accessible by recursively following its [children] property.
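The recursive walk described above can be sketched with Python's `xml.dom.minidom`. DOM nodes do not map one-to-one onto information items, so this only illustrates the traversal pattern: start from the document, yield each element child, and recurse into its children. The helper name is hypothetical.

```python
from typing import Iterator

from xml.dom import Node, minidom


def iter_elements(node: Node) -> Iterator[Node]:
    """Yield element nodes below `node` in document order, depth first."""
    for child in node.childNodes:
        if child.nodeType == Node.ELEMENT_NODE:
            yield child
            yield from iter_elements(child)  # follow the children recursively


doc = minidom.parseString("<a><b><c/></b><d/></a>")
print([element.tagName for element in iter_elements(doc)])  # ['a', 'b', 'c', 'd']
```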
An element information item has the following properties:
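[namespace attributes] An unordered set of attribute information items, one for each of the namespace declarations (specified or defaulted from the DTD) of this element. A declaration of the form xmlns="", which undeclares the default namespace, counts as a namespace declaration. By definition, all namespace attributes (including those named xmlns, whose [prefix] property has no value) have a namespace URI of http://www.w3.org/2000/xmlns/. If the element has no namespace declarations, this set has no members.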
[in-scope namespaces] An unordered set of namespace information items, one for each of the namespaces in effect for this element. This set always contains an item with the prefix xml, which is implicitly bound to the namespace name http://www.w3.org/XML/1998/namespace. It does not contain an item with the prefix xmlns (used for declaring namespaces), since an application can never encounter an element or attribute with that prefix. The set will include namespace items corresponding to all of the members of [namespace attributes], except for any representing a declaration of the form xmlns="", which does not declare a namespace but rather undeclares the default namespace. When resolving the prefixes of qualified names, this property should be used in preference to the [namespace attributes] property; they may be inconsistent in the case of Synthetic Infosets.
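For a parsed document, [in-scope namespaces] can be derived from the [namespace attributes] of the element and its ancestors by a simple fold. In this Python sketch the input format (a list of name-to-value dictionaries, outermost element first) is a hypothetical simplification, and None stands for the default namespace, whose [prefix] has no value:

```python
from typing import Dict, List, Optional

XML_NS = "http://www.w3.org/XML/1998/namespace"


def in_scope_namespaces(chain: List[Dict[str, str]]) -> Dict[Optional[str], str]:
    """Compute in-scope namespaces from namespace attributes.

    chain: one dict per element, outermost ancestor first and the target
    element last, mapping attribute names ("xmlns" or "xmlns:p") to values.
    """
    scope: Dict[Optional[str], str] = {"xml": XML_NS}  # always in scope
    for namespace_attributes in chain:
        for name, value in namespace_attributes.items():
            prefix = None if name == "xmlns" else name.split(":", 1)[1]
            if prefix is None and value == "":
                scope.pop(None, None)  # xmlns="" undeclares; it adds no item
            else:
                scope[prefix] = value
    return scope


print(in_scope_namespaces([{"xmlns": "urn:d", "xmlns:p": "urn:p"}, {"xmlns": ""}]))
```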
2.3 Attribute Information Items
XML Definition: attribute (Section 3.1, Start-Tags, End-Tags, and Empty-Element Tags)
XML Syntax: Attribute (Section 3.1, Start-Tags, End-Tags, and Empty-Element Tags)
There is an attribute information item for each attribute (specified or defaulted) of each element in the document, including those which are namespace declarations. The latter however appear as members of an element's [namespace attributes] property rather than its [attributes] property.
Attributes declared in the DTD with no default value and not specified in the element's start tag are not represented by attribute information items.
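The two rules above can be combined in a short Python sketch: DTD defaults fill in attributes the start tag omits, declarations with no default contribute nothing, and namespace declarations are routed to [namespace attributes] rather than [attributes]. The `dtd_defaults` mapping (None meaning no default value, as with #IMPLIED) is a hypothetical input format.

```python
from typing import Dict, Optional, Tuple


def split_attributes(specified: Dict[str, str],
                     dtd_defaults: Dict[str, Optional[str]]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return ([attributes], [namespace attributes]) as name -> value maps."""
    effective = dict(specified)
    for name, default in dtd_defaults.items():
        if name not in effective and default is not None:
            effective[name] = default  # defaulted attribute: still an information item
    attributes: Dict[str, str] = {}
    namespace_attributes: Dict[str, str] = {}
    for name, value in effective.items():
        if name == "xmlns" or name.startswith("xmlns:"):
            namespace_attributes[name] = value
        else:
            attributes[name] = value
    return attributes, namespace_attributes


print(split_attributes({"id": "x", "xmlns:p": "urn:p"}, {"lang": "en", "note": None}))
```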
An attribute information item has the following properties:
2.4 Processing Instruction Information Items
XML Definition: processing instruction (Section 2.6, Processing Instructions)
XML Syntax: PI (Section 2.6, Processing Instructions)
There is a processing instruction information item for each processing instruction in the document. The XML declaration and text declarations for external parsed entities are not considered processing instructions.
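Python's `xml.dom.minidom`, for example, keeps processing instructions as nodes (including those outside the document element) but does not turn the XML declaration into a processing-instruction node. The following sketch collects PI targets in document order; the helper name is hypothetical.

```python
from typing import List

from xml.dom import Node, minidom

doc = minidom.parseString(
    '<?xml version="1.0"?>'
    '<?xml-stylesheet href="style.css" type="text/css"?>'
    "<root><?app-hint fast?></root>"
)


def pi_targets(node: Node) -> List[str]:
    """Collect processing-instruction targets below `node` in document order."""
    targets: List[str] = []
    for child in node.childNodes:
        if child.nodeType == Node.PROCESSING_INSTRUCTION_NODE:
            targets.append(child.target)
        targets.extend(pi_targets(child))
    return targets


print(pi_targets(doc))  # ['xml-stylesheet', 'app-hint']
```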
A processing instruction information item has the following properties:
xml:base attribute on elements.
2.5 Unexpanded Entity Reference Information Items
XML Definition: Section 4.4.3, Included If Validating
An unexpanded entity reference information item serves as a placeholder by which an XML processor can indicate that it has not expanded an external parsed entity. There is such an information item for each unexpanded reference to an external general entity within the content of an element. A validating XML processor, or a non-validating processor that reads all external general entities, will never generate unexpanded entity reference information items for a valid document.
An unexpanded entity reference information item has the following properties:
2.6 Character Information Items
XML Syntax: Char (Section 2.2, Characters)
There is a character information item for each data character that appears in the document, whether literally, as a character reference, or within a CDATA section.
Each character is a logically separate information item, but XML applications are free to chunk characters into larger groups as necessary or desirable.
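A SAX content handler shows both points: a character reference and a CDATA section arrive as ordinary character data, and the parser may deliver text in chunks of whatever size it chooses. This Python sketch joins the chunks and counts one information item per character; how many `characters()` calls occur is parser-dependent and is not relied on. The class name is hypothetical.

```python
import xml.sax
from typing import List


class TextCollector(xml.sax.ContentHandler):
    """Accumulate character data in whatever chunks the parser delivers."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: List[str] = []

    def characters(self, content: str) -> None:
        self.chunks.append(content)


handler = TextCollector()
# &#66; is a character reference for "B"; the CDATA section contributes "<c>".
xml.sax.parseString(b"<r>A&#66;<![CDATA[<c>]]></r>", handler)
text = "".join(handler.chunks)
print(text, len(text))  # AB<c> 5
```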
A character information item has the following properties:
2.7 Comment Information Items
XML Definition: comment (Section 2.5, Comments)
XML Syntax: Comment (Section 2.5, Comments)
There is a comment information item for each XML comment in the original document, except for those appearing in the DTD (which are not represented).
A comment information item has the following properties:
2.8 The Document Type Declaration Information Item
XML Definition: document type declaration (Section 2.8, Prolog and Document Type Declaration)
XML Syntax: doctypedecl (Section 2.8, Prolog and Document Type Declaration)
If the XML document has a document type declaration, then the information set contains a single document type declaration information item. Note that entities and notations are provided as properties of the document information item, not the document type declaration information item.
A document type declaration information item has the following properties:
2.9 Unparsed Entity Information Items
XML Definition: entity (Section 4, Physical Structures)
XML Syntax: GEDecl (Section 4.2, Entities)
There is an unparsed entity information item for each unparsed general entity declared in the DTD.
An unparsed entity information item has the following properties:
2.10 Notation Information Items
XML Definition: notation (Section 4.7, Notations)
XML Syntax: NotationDecl (Section 4.7, Notations)
There is a notation information item for each notation declared in the DTD.
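Unparsed entities and notations both come from declarations in the DTD. As a sketch of where a processor might pick them up, Python's expat binding reports them through `EntityDeclHandler` (whose last argument is the notation name, None for parsed entities) and `NotationDeclHandler`. The sample DTD and the tuple layout collected here are illustrative assumptions.

```python
import xml.parsers.expat

DOC = """<!DOCTYPE r [
<!NOTATION gif SYSTEM "image/gif">
<!ENTITY logo SYSTEM "logo.gif" NDATA gif>
<!ENTITY greeting "hello">
]>
<r/>"""

unparsed = []   # (name, system identifier, notation name)
notations = []  # (name, system identifier, public identifier)


def on_entity(name, is_parameter_entity, value, base, system_id, public_id, notation_name):
    if notation_name is not None:  # NDATA marks an unparsed general entity
        unparsed.append((name, system_id, notation_name))


def on_notation(name, base, system_id, public_id):
    notations.append((name, system_id, public_id))


parser = xml.parsers.expat.ParserCreate()
parser.EntityDeclHandler = on_entity
parser.NotationDeclHandler = on_notation
parser.Parse(DOC, True)
print(unparsed, notations)
```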
A notation information item has the following properties:
2.11 Namespace Information Items
Each element in the document has a namespace information item for each namespace that is in scope for that element.
A namespace information item has the following properties:
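[prefix] The prefix whose binding this item describes. Syntactically, this is the part of the attribute name following the xmlns: prefix. If the attribute name is simply xmlns, so that the declaration is of the default namespace, this property has no value.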
3. Conformance
Since the purpose of the Information Set is to provide a set of definitions, conformance is a property of specifications that use those definitions, rather than of implementations.
Specifications referring to the Infoset must:
If a specification allows the construction of an infoset that has inconsistencies as described above under Synthetic Infosets it may describe how those inconsistencies are to be resolved, and should do so if it provides for serialization of the infoset.
Appendix B: XML 1.0 Reporting Requirements (informative)
Although the XML 1.0 Recommendation [XML] is primarily concerned with XML syntax, it also includes some specific reporting requirements for XML processors.
The reporting requirements include errors, which are outside the scope of this specification, and document information. All of the XML 1.0 requirements for document information reporting have been integrated into the XML Information Set; numbers in parentheses refer to sections of the XML Recommendation:
Appendix C: Example
Consider the following example XML document:
    <?xml version="1.0"?>
    <msg:message doc:date="19990421"
        xmlns:doc="http://doc.example.org/namespaces/doc"
        xmlns:msg="http://message.example.org/"
    >Phone home!</msg:message>
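The information set for this XML document contains the following information items:
A document information item.
An element information item with namespace name "http://message.example.org/", local part "message", and prefix "msg".
An attribute information item with namespace name "http://doc.example.org/namespaces/doc", local part "date", prefix "doc", and normalized value "19990421".
Three namespace information items, for the http://www.w3.org/XML/1998/namespace, http://doc.example.org/namespaces/doc, and http://message.example.org/ namespaces.
Two attribute information items, for the namespace attributes.
Eleven character information items, for the character data.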
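To cross-check the element and attribute items listed for this example, the document can be parsed with Python's namespace-aware `xml.dom.minidom`. DOM properties are not Infoset properties, so this is only an approximate correspondence: `namespaceURI`, `localName` and `prefix` play the roles of namespace name, local part and prefix here.

```python
from xml.dom import minidom

SOURCE = """<?xml version="1.0"?>
<msg:message doc:date="19990421"
    xmlns:doc="http://doc.example.org/namespaces/doc"
    xmlns:msg="http://message.example.org/"
>Phone home!</msg:message>"""

doc = minidom.parseString(SOURCE)
message = doc.documentElement
date = message.getAttributeNodeNS("http://doc.example.org/namespaces/doc", "date")

element_props = (message.namespaceURI, message.localName, message.prefix)
attribute_props = (date.namespaceURI, date.localName, date.prefix, date.value)
text = message.firstChild.data  # character data; one information item per character

print(element_props)
print(attribute_props)
print(len(text))  # 11
```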
Appendix D: What is not in the Information Set
The following information is not represented in the current version of the XML Information Set (this list is not intended to be exhaustive):
Appendix E: RDF Schema (informative)
See RDF Schema for the XML Information Set for a formal characterization of the Infoset.