Copyright ©1999, 2000 W3C® (MIT, INRIA, Keio), All Rights Reserved. W3C liability, trademark, document use and software licensing rules apply.
This specification describes an abstract data set which contains the useful information available from an XML document.
Though this specification has already had a Last Call review on an earlier version, the XML Core Working Group has decided to publish this working draft of its latest version (member only) and invites public comment on it.
Comments on this document are invited and are to be sent to the public mailing list www-xml-infoset-comments@w3.org. An archive is available at http://lists.w3.org/Archives/Public/www-xml-infoset-comments/.
For background on this work, please see the XML Activity Statement. This specification is a product of the XML Core Working Group.
See XML Information Set Requirements for the specific requirements that informed development of this specification.
It is inappropriate to use W3C Working Drafts as reference material or to cite them as other than "work in progress". A list of current W3C Recommendations and other technical documents can be found at http://www.w3.org/TR/ .
1. Introduction
2. Information Items
2.1 The Document Information Item
2.2 Element Information Items
2.3 Attribute Information Items
2.4 Processing Instruction Information Items
2.5 Reference to Skipped Entity Information Items
2.6 Character Information Items
2.7 Comment Information Items
2.8 The Document Type Declaration Information Item
2.9 Entity Declaration Information Items
2.10 Notation Information Items
2.11 Entity Start Marker Information Items
2.12 Entity End Marker Information Items
2.13 CDATA Start Marker Information Items
2.14 CDATA End Marker Information Items
2.15 Namespace Declaration Information Items
3. Conformance
3.1 Core Conformance
4. References
4.1 Normative References
4.2 Informative References
Appendix A: XML 1.0 Reporting Requirements (informative)
Appendix B: What is not in the Information Set
Appendix D: RDF Schema (informative)
This document specifies an abstract data set called the XML Information Set (Infoset), a description of the information available in a well-formed XML document [XML].
XML 1.0 documents that do not conform to [Namespaces], though technically well-formed, are not considered to have meaningful information sets as defined by this specification. That is, this specification does not define an information set for documents that have element or attribute names containing colons that are used in other ways than as prescribed by [Namespaces]. There is no requirement for an XML document to be valid in order to have an information set.
An XML document's information set consists of two or more information items (the information set for any well-formed XML document will contain at least the document information item and one element information item). An information item is an abstract representation of some part of an XML document: each information item has a set of associated properties.
An information set describes its XML document with all entity references already expanded, that is, represented by the information items corresponding to their replacement text. Explicit provision is made in the information set for representing an entity reference that has not been or cannot be expanded (i.e., because an XML processor has not read its declaration or its value).
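As an informal illustration of entity expansion, the following sketch uses Python's standard xml.etree.ElementTree module, which is only one possible interface and is not prescribed by this specification. The document, the entity name "who", and the variable names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the internal DTD subset declares one general entity.
DOC = """<!DOCTYPE greeting [
  <!ENTITY who "world">
]>
<greeting>Hello, &who;!</greeting>"""

root = ET.fromstring(DOC)

# The application sees the replacement text, not the reference "&who;".
expanded_text = root.text
print(expanded_text)
```

In this sketch the reference itself leaves no trace in the data handed to the application; only the characters of the replacement text are reported.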
The XML Information Set does not require or favor a specific interface or class of interfaces. This specification presents the information set as a modified tree for the sake of clarity and simplicity, but there is no requirement that the XML Information Set be made available through a tree structure; other types of interfaces, including (but not limited to) event-based and query-based interfaces are also capable of providing information conforming to the XML Information Set. As long as the information in the information set is made available to XML applications in one way or another, the requirements of this document are satisfied.
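The point that the same information can reach an application through different kinds of interface can be sketched with Python's standard library, assuming xml.sax as a representative event-based interface and xml.etree.ElementTree as a representative tree interface. The document and the NameCollector class are hypothetical.

```python
import xml.etree.ElementTree as ET
import xml.sax

# Hypothetical document used for both interfaces.
DOC = b"<a><b>x</b><c/></a>"


class NameCollector(xml.sax.ContentHandler):
    """Records element names in the order start-element events arrive."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)


# Event-based access: the parser pushes events to the handler.
collector = NameCollector()
xml.sax.parseString(DOC, collector)
event_names = collector.names

# Tree-based access: the application walks a tree in document order.
tree_names = [element.tag for element in ET.fromstring(DOC).iter()]

print(event_names == tree_names)
```

Both routes report the same elements in the same order; only the shape of the interface differs.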
The terms "information set" and "information item" are similar in meaning to the generic terms "tree" and "node", as they are used in computing. However, these terms were avoided in this document to reduce possible confusion with other specific data models. Information items do not map one-to-one with the Nodes of the DOM or the "tree" and "nodes" of the XPath data model.
The Infoset provides a definition of one specific body of information, not an exhaustive or unified inventory of the various kinds of information which specialized XML processors (e.g. schema-aware processors) may provide to downstream applications.
Note: In this document, the words "must", "should", and "may" assume the meanings specified in RFC 2119 [RFC2119], except that the words do not appear in upper case.
Note: To the best of the editor's knowledge and belief, the information set scheme described in this document satisfies the requirements of the XPointer-Information Set Liaison Statement [XPointer-Liaison].
Note: To the best of the editor's knowledge and belief, the interface specified by the Document Object Model, Level 1 Core Recommendation [DOM] conforms to the XML Information Set as currently specified.
An information set can contain up to fifteen different types of information items, as explained in the following sections. Every information item has properties. For ease of reference, each property is given a name, indicated [thus].
XML Definition: document (Section 2, Documents)
XML Syntax: [1] Document (Section 2.1, Well-Formed XML Documents)
There is exactly one document information item in the information set, and all other information items are accessible from the properties of the document information item, either directly or indirectly through the properties of other information items.
The document information item has the following properties:
XML Definition: element (Section 3, Logical Structures)
XML Syntax: [39] Element (Section 3, Logical Structures)
There is an element information item for each element appearing in the XML document. One of the element information items corresponds to the document element (the root of the element tree), and all other element information items are children of the document element, either directly or indirectly.
An element information item has the following properties:
xmlns="", which does not declare a namespace but rather undeclares the default namespace.
XML Definition: attribute (Section 3.1, Start-Tags, End-Tags, and Empty-Element Tags)
XML Syntax: [41] Attribute (Section 3.1, Start-Tags, End-Tags, and Empty-Element Tags)
There is an attribute information item for each attribute (specified or defaulted) for each element in the document, except that attributes which are namespace declarations are represented using namespace declaration information items, not attribute information items.
Attributes declared in the DTD with a default value of #IMPLIED and not specified in the element's start tag are not represented by attribute information items.
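The difference between a defaulted attribute and an unspecified #IMPLIED attribute can be sketched as follows, assuming Python's standard xml.etree.ElementTree module (a non-validating processor that reads the internal DTD subset). The element and attribute names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: "status" has a default value, "note" is #IMPLIED,
# and neither is specified in the start tag.
DOC = """<!DOCTYPE item [
  <!ATTLIST item
    status CDATA "draft"
    note   CDATA #IMPLIED>
]>
<item/>"""

attrs = dict(ET.fromstring(DOC).attrib)

# The defaulted attribute is reported; the #IMPLIED one is absent.
print(attrs)
```

Note that this interface does not tell the application whether "status" was specified or defaulted; that distinction is additional information an interface may or may not expose.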
An attribute information item has the following properties:
XML Definition: processing instruction (Section 2.6, Processing Instructions)
XML Syntax: [16] PI (Section 2.6, Processing Instructions)
There is one processing instruction information item for every processing instruction in the document. The XML declaration and text declarations for external parsed entities are not considered processing instructions.
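As a sketch of the distinction drawn above, the following uses Python's standard xml.dom.minidom module, chosen here only because it exposes processing instructions as nodes. The processing instruction target "render" and its content are hypothetical.

```python
from xml.dom import minidom, Node

# Hypothetical document: an XML declaration followed by one processing
# instruction and an empty document element.
DOC = '<?xml version="1.0"?><?render mode="fast"?><doc/>'

document = minidom.parseString(DOC)

# Collect (target, content) for every processing instruction node that
# is a direct child of the document.
pi_summary = [
    (node.target, node.data)
    for node in document.childNodes
    if node.nodeType == Node.PROCESSING_INSTRUCTION_NODE
]
print(pi_summary)
```

Only the "render" instruction appears in the result; the XML declaration is not reported as a processing instruction.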
A processing instruction information item has the following properties:
XML Definition: Section 4.4.3, Included If Validating
A reference to skipped entity information item serves as a place-holder in the information set provided by a non-validating XML processor that does not read external parsed entities. There is one such information item for each reference to such an external general entity within the content of an element. A validating XML processor, or a non-validating processor that reads all external general entities, will never generate reference to skipped entity information items for a valid document.
A reference to skipped entity information item has the following properties:
XML Syntax: [2] Char (Section 2.2, Characters)
There is one character information item for each character that appears within the document element, either literally, as a character reference, or within a CDATA section. There is also one character information item for each character that appears in a normalized attribute value.
Note, however, that a CR (#xD) character that is followed by an LF (#xA) character is not represented by any information item. Furthermore, a CR character that is not followed by an LF character is represented by an LF character information item. These rules do not apply to CR characters created by character references such as &#xD; or &#13;.
Each character is a logically separate information item, but XML applications are free to chunk characters into larger groups as necessary or desirable.
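The line-end rules above can be observed with a short sketch, assuming Python's standard xml.etree.ElementTree module, which reports character data as chunked strings rather than one item per character. The document content is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical content: a literal CR LF pair, a lone literal CR, and a CR
# written as a character reference.
DOC = b"<t>a\r\nb\rc&#13;d</t>"

text = ET.fromstring(DOC).text

# One entry per reported character, shown as code points.
code_points = [ord(ch) for ch in text]
print(code_points)
```

The literal CR LF pair and the lone literal CR each come through as a single LF (code 10), while the CR produced by the character reference survives as code 13.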
A character information item has the following properties:
XML Definition: comment (Section 2.5, Comments)
XML Syntax: [15] Comment (Section 2.5, Comments)
A comment information item corresponds to each XML comment in the original document.
A comment information item has the following properties:
XML Definition: document type declaration (Section 2.8, Prolog and Document Type Declaration)
XML Syntax: [28] doctypedecl (Section 2.8, Prolog and Document Type Declaration)
If the XML document has a document type declaration, then the information set contains a single document type declaration information item. Note that although entities and notations are logically part of the document type declaration, they are provided as properties of the document information item, not the document type declaration information item.
A document type declaration information item has the following properties:
XML Definition: entity (Section 4, Physical Structures)
XML Syntax: [70] EntityDecl (Section 4.2, Entity Declarations)
There is an entity declaration information item for each general entity, internal or external, declared in the DTD. When the same entity is declared more than once, only the first declaration is represented in the information set. (Declarations in the internal subset are understood to precede those in the external subset.) Parameter entities are not represented by entity declaration information items. There is also an entity declaration information item for the document entity, and one for the DTD external subset, if it exists, even though these entities are not actually declared anywhere.
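The rule that only the first declaration of an entity counts can be sketched by observing which replacement text a processor uses. The example assumes Python's standard xml.etree.ElementTree module; the entity name "e" and its values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the same general entity is declared twice in
# the internal subset.
DOC = """<!DOCTYPE d [
  <!ENTITY e "first">
  <!ENTITY e "second">
]>
<d>&e;</d>"""

# The reference is expanded using the first declaration only.
replacement = ET.fromstring(DOC).text
print(replacement)
```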
An entity declaration information item has the following properties:
XML Definition: notation (Section 4.7, Notation Declarations)
XML Syntax: [82] NotationDecl (Section 4.7, Notation Declarations)
There is one notation information item for each notation declared in the DTD.
A notation information item has the following properties:
XML Definition: entity reference (Section 4.1, Character and Entity References)
XML Syntax: [68] EntityRef (Section 4.1, Character and Entity References)
Entity start marker information items are inserted just before the point where the information items resulting from the inclusion of a general entity as a consequence of an entity reference begin.
Entity start marker information items are not used in connection with parameter entity references in the DTD.
An entity start marker information item has the following properties:
XML Definition: entity reference (Section 4.1, Character and Entity References)
XML Syntax: [68] EntityRef (Section 4.1, Character and Entity References)
Entity end marker information items are inserted just after the point where the information items resulting from the inclusion of a general entity as a consequence of an entity reference conclude.
Entity end marker information items are not used in connection with parameter entity references in the DTD.
An entity end marker information item has the following properties:
XML Definition: CDATA sections (Section 2.7, CDATA Sections)
XML Syntax: [18] CDSect (Section 2.7, CDATA Sections)
CDATA start marker information items are inserted just before the place where text embedded in a CDATA section begins.
A CDATA start marker information item has the following properties:
XML Definition: CDATA sections (Section 2.7, CDATA Sections)
XML Syntax: [18] CDSect (Section 2.7, CDATA Sections)
CDATA end marker information items are inserted just after the place where text embedded in a CDATA section concludes.
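As an informal sketch of what the CDATA markers preserve, the following compares two interfaces from Python's standard library: xml.dom.minidom, which is assumed here to keep CDATA section boundaries as separate nodes, and xml.etree.ElementTree, which reports only the characters. The document is hypothetical.

```python
from xml.dom import minidom, Node
import xml.etree.ElementTree as ET

# Hypothetical document: character data, a CDATA section, more character data.
DOC = "<d>a<![CDATA[<b>]]>c</d>"

# Interface that keeps the boundaries: each child is flagged as CDATA or not.
dom_children = [
    (node.nodeType == Node.CDATA_SECTION_NODE, node.data)
    for node in minidom.parseString(DOC).documentElement.childNodes
]

# Interface that drops the boundaries: only the characters remain.
flat_text = ET.fromstring(DOC).text

print(dom_children)
print(flat_text)
```

The characters are identical in both cases; the first interface additionally reports where the CDATA section started and ended, which is the role the start and end markers play in the information set.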
A CDATA end marker information item has the following properties:
XML Definition: attribute (Section 3.1, Start-Tags, End-Tags, and Empty-Element Tags)
XML Syntax: [41] Attribute (Section 3.1, Start-Tags, End-Tags, and Empty-Element Tags)
There is one namespace declaration information item for each namespace declaration (specified or defaulted) for each element in the document. Namespace declarations are syntactically like attribute declarations of attributes that have names beginning with the string xmlns.
Namespace declarations declared in the DTD with a default value of #IMPLIED and not specified in the element's start tag are not represented by information items.
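One way an interface might report namespace declarations separately from ordinary attributes is sketched below, assuming the "start-ns" events of iterparse in Python's standard xml.etree.ElementTree module. The namespace names and the prefix "p" are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical document: one default namespace declaration and one
# declaration that binds the prefix "p".
DOC = (
    b'<root xmlns="http://default.example/" '
    b'xmlns:p="http://p.example/"><p:child/></root>'
)

# Each "start-ns" event carries a (prefix, namespace name) pair.
declared = dict(
    payload
    for _event, payload in ET.iterparse(io.BytesIO(DOC), events=("start-ns",))
)

# The namespace declarations do not appear among the ordinary attributes.
plain_attributes = dict(ET.fromstring(DOC).attrib)

print(declared)
print(plain_attributes)
```

In this interface the default namespace declaration is reported with an empty string as its prefix.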
A namespace declaration information item has the following properties:
xmlns: prefix. If the attribute name is simply xmlns, this property is an empty string.
An XML processor conforms to the XML Information Set if it documents the information items and properties that it provides. Processors may provide only certain information items, and of the information items provided, only certain properties. In addition, properties that are lists or sets may be provided only in part.
XML processors may also provide information not found in the XML Information Set. For instance, the XML Information Set excludes whitespace that occurs between attributes, but an XML processor that provides this information can still conform to the XML Information Set as long as it provides information from the document's information set.
An XML processor conforms to the XML Information Set core if it provides at least the following information items and properties:
The document information item, including the following properties:
All element information items, including the following properties:
All processing instruction information items, including all their properties.
All reference to skipped entity information items, including all their properties.
All character information items, including the following properties:
All attribute information items, including the following properties:
All entity declaration information items corresponding to unparsed entities, including the following properties:
All notation information items, including all their properties.
All namespace declaration information items, including the following properties:
Conformance to the core is not a requirement for conformance to the Infoset. The notion of core conformance is introduced solely for the convenience of other specifications and recommendations, so that they do not have to detail every information item and property which they support.
http://www.w3.org/TR/REC-xml-names/
http://www.isi.edu/in-notes/rfc2119.txt
http://www.w3.org/TR/REC-xml
http://www.w3.org/TR/xmlbase
http://www.w3.org/TR/REC-DOM-Level-1/
http://www.w3.org/TR/NOTE-xptr-infoset-liaison
Although the XML 1.0 Recommendation [XML] is primarily concerned with XML syntax, it also includes some specific reporting requirements for XML processors.
The reporting requirements include errors, which are outside the scope of this specification, and document information. All of the XML 1.0 requirements for document information reporting have been integrated into the XML Information Set (numbers in parentheses refer to sections of the XML Recommendation):
Consider the following example XML document:
<?xml version="1.0"?>
<msg:message doc:date="19990421"
    xmlns:doc="http://www.doc.example/namespaces/doc"
    xmlns:msg="http://www.message.example/"
>Phone home!</msg:message>
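The example document can be inspected with a short sketch, assuming Python's standard xml.etree.ElementTree module, which reports each qualified name as a namespace name and a local part. The variable names are hypothetical, and the module exposes only part of the information set.

```python
import xml.etree.ElementTree as ET

DOC = """<?xml version="1.0"?>
<msg:message doc:date="19990421"
    xmlns:doc="http://www.doc.example/namespaces/doc"
    xmlns:msg="http://www.message.example/"
>Phone home!</msg:message>"""

root = ET.fromstring(DOC)

# ElementTree writes names as "{namespace name}local part".
element_ns, element_local = root.tag[1:].split("}")

# There is exactly one attribute; split its name the same way.
(attr_name, attr_value), = root.attrib.items()
attr_ns, attr_local = attr_name[1:].split("}")

# One character is reported for each character of the element content.
character_count = len(root.text)

print(element_ns, element_local)
print(attr_ns, attr_local, attr_value)
print(character_count)
```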
The information set for this XML document contains the following information items:
"http://www.message.example/" and the local part "message".
"http://www.doc.example/namespaces/doc" and the local part "date".
http://www.doc.example/namespaces/doc and http://www.message.example/ namespaces.
The following information is not represented in the current version of the XML Information Set:
<foo/> and <foo></foo>.
The following RDF Schema provides a formal characterization of the Infoset. In case of disagreement between this schema and the prose in this document, the prose is normative.
<?xml version='1.0' encoding='utf-8' standalone='yes'?>
<!-- this can be decoded as US-ASCII or iso-8859-1 as well,
     since it contains no characters outside the US-ASCII repertoire -->
<!-- $Id: infoset.rdf,v 1.1 2000/07/26 19:17:39 connolly Exp $ -->
<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
         xmlns:rdfs='http://www.w3.org/2000/01/rdf-schema#'
         xmlns='http://www.w3.org/2000/07/infoset#'>

  <!--Enumeration classes and their members-->
  <rdfs:Class ID='AttributeType'/>
  <AttributeType ID='AttributeType.ID'/>
  <AttributeType ID='AttributeType.IDREF'/>
  <AttributeType ID='AttributeType.IDREFS'/>
  <AttributeType ID='AttributeType.ENTITY'/>
  <AttributeType ID='AttributeType.ENTITIES'/>
  <AttributeType ID='AttributeType.NMTOKEN'/>
  <AttributeType ID='AttributeType.NMTOKENS'/>
  <AttributeType ID='AttributeType.NOTATION'/>
  <AttributeType ID='AttributeType.CDATA'/>
  <AttributeType ID='AttributeType.ENUMERATED'/>
  <rdfs:Class ID='Boolean'/>
  <Boolean ID='Boolean.true'/>
  <Boolean ID='Boolean.false'/>
  <rdfs:Class ID='EntityType'/>
  <EntityType ID='EntityType.InternalGeneral'/>
  <EntityType ID='EntityType.ExternalGeneral'/>
  <EntityType ID='EntityType.Unparsed'/>
  <EntityType ID='EntityType.DocumentEntity'/>
  <EntityType ID='EntityType.ExternalDTDSubset'/>
  <rdfs:Class ID='Integer' rdfs:subClassOf='http://www.w3.org/2000/01/rdf-schema#Literal'/>
  <rdfs:Class ID='StandaloneType'/>
  <StandaloneType ID='StandaloneType.yes'/>
  <StandaloneType ID='StandaloneType.no'/>
  <StandaloneType ID='StandaloneType.notSpecified'/>

  <!--Info item classes in document order-->
  <rdfs:Class ID='InfoItem'/>
  <rdfs:Class ID='Document' rdfs:subClassOf='#InfoItem'/>
  <rdfs:Class ID='Element' rdfs:subClassOf='#InfoItem'/>
  <rdfs:Class ID='Attribute' rdfs:subClassOf='#InfoItem'/>
  <rdfs:Class ID='ProcessingInstruction' rdfs:subClassOf='#InfoItem'/>
  <rdfs:Class ID='Character' rdfs:subClassOf='#InfoItem'/>
  <rdfs:Class ID='ReferenceToSkippedEntity' rdfs:subClassOf='#InfoItem'/>
  <rdfs:Class ID='Comment' rdfs:subClassOf='#InfoItem'/>
  <rdfs:Class ID='DocumentTypeDeclaration' rdfs:subClassOf='#InfoItem'/>
  <rdfs:Class ID='EntityDeclaration' rdfs:subClassOf='#InfoItem'/>
  <rdfs:Class ID='Notation' rdfs:subClassOf='#InfoItem'/>
  <rdfs:Class ID='EntityStartMarker' rdfs:subClassOf='#InfoItem'/>
  <rdfs:Class ID='EntityEndMarker' rdfs:subClassOf='#InfoItem'/>
  <rdfs:Class ID='CDATAStartMarker' rdfs:subClassOf='#InfoItem'/>
  <rdfs:Class ID='CDATAEndMarker' rdfs:subClassOf='#InfoItem'/>
  <rdfs:Class ID='Namespace' rdfs:subClassOf='#InfoItem'/>

  <!--Set containers-->
  <rdfs:Class ID='InfoItemSet' rdfs:subClassOf='http://www.w3.org/1999/02/22-rdf-syntax-ns#Bag'/>
  <rdfs:Class ID='AttributeSet' rdfs:subClassOf='#InfoItemSet'/>
  <rdfs:Class ID='EntitySet' rdfs:subClassOf='#InfoItemSet'/>
  <rdfs:Class ID='NamespaceSet' rdfs:subClassOf='#InfoItemSet'/>
  <rdfs:Class ID='NotationSet' rdfs:subClassOf='#InfoItemSet'/>

  <!--Sequence container-->
  <rdfs:Class ID='InfoItemSeq' rdfs:subClassOf='http://www.w3.org/1999/02/22-rdf-syntax-ns#Seq'/>

  <!--Info item properties-->
  <rdfs:Property ID='attributes'>
    <rdfs:domain resource='#Element'/>
    <rdfs:range resource='#AttributeSet'/>
  </rdfs:Property>
  <rdfs:Property ID='attributeType'>
    <rdfs:domain resource='#Attribute'/>
    <rdfs:range resource='#AttributeType'/>
  </rdfs:Property>
  <rdfs:Property ID='baseURI'>
    <rdfs:domain resource='#Document'/>
    <rdfs:domain resource='#Element'/>
    <rdfs:domain resource='#ProcessingInstruction'/>
    <rdfs:domain resource='#EntityDeclaration'/>
    <rdfs:domain resource='#Notation'/>
    <rdfs:range resource='http://www.w3.org/TR/1999/PR-rdf-schema-19990303#Literal'/>
  </rdfs:Property>
  <rdfs:Property ID='characterCode'>
    <rdfs:domain resource='#Character'/>
    <rdfs:range resource='#Integer'/>
  </rdfs:Property>
  <rdfs:Property ID='charset'>
    <rdfs:domain resource='#EntityDeclaration'/>
    <rdfs:range resource='http://www.w3.org/2000/01/rdf-schema#Literal'/>
  </rdfs:Property>
  <rdfs:Property ID='children'>
    <rdfs:domain resource='#Document'/>
    <rdfs:domain resource='#Element'/>
    <rdfs:domain resource='#Attribute'/>
    <rdfs:domain resource='#DocumentTypeDeclaration'/>
    <rdfs:domain resource='#Namespace'/>
    <rdfs:range resource='#InfoItemSeq'/>
  </rdfs:Property>
  <rdfs:Property ID='content'>
    <rdfs:domain resource='#ProcessingInstruction'/>
    <rdfs:domain resource='#Comment'/>
    <rdfs:domain resource='#EntityDeclaration'/>
    <rdfs:range resource='http://www.w3.org/2000/01/rdf-schema#Literal'/>
  </rdfs:Property>
  <rdfs:Property ID='declaredNamespaces'>
    <rdfs:domain resource='#Element'/>
    <rdfs:range resource='#NamespaceSet'/>
  </rdfs:Property>
  <rdfs:Property ID='default'>
    <rdfs:domain resource='#Attribute'/>
    <rdfs:range resource='#Boolean'/>
  </rdfs:Property>
  <rdfs:Property ID='elementContentWhitespace'>
    <rdfs:domain resource='#Character'/>
    <rdfs:range resource='#Boolean'/>
  </rdfs:Property>
  <rdfs:Property ID='entity'>
    <rdfs:domain resource='#ReferenceToSkippedEntity'/>
    <rdfs:domain resource='#EntityStartMarker'/>
    <rdfs:domain resource='#EntityEndMarker'/>
    <rdfs:range resource='#EntityDeclaration'/>
  </rdfs:Property>
  <rdfs:Property ID='entities'>
    <rdfs:domain resource='#Document'/>
    <rdfs:range resource='#EntitySet'/>
  </rdfs:Property>
  <rdfs:Property ID='entityType'>
    <rdfs:domain resource='#Attribute'/>
    <rdfs:range resource='#EntityType'/>
  </rdfs:Property>
  <rdfs:Property ID='externalDTD'>
    <rdfs:domain resource='#DocumentTypeDeclaration'/>
    <rdfs:range resource='#EntityDeclaration'/>
  </rdfs:Property>
  <rdfs:Property ID='inScopeNamespaces'>
    <rdfs:domain resource='#Element'/>
    <rdfs:range resource='#NamespaceSet'/>
  </rdfs:Property>
  <rdfs:Property ID='localName'>
    <rdfs:domain resource='#Element'/>
    <rdfs:domain resource='#Attribute'/>
    <rdfs:range resource='http://www.w3.org/2000/01/rdf-schema#Literal'/>
  </rdfs:Property>
  <rdfs:Property ID='name'>
    <rdfs:domain resource='#ReferenceToSkippedEntity'/>
    <rdfs:domain resource='#EntityDeclaration'/>
    <rdfs:domain resource='#Notation'/>
    <rdfs:range resource='http://www.w3.org/2000/01/rdf-schema#Literal'/>
  </rdfs:Property>
  <rdfs:Property ID='namespaceName'>
    <rdfs:domain resource='#Element'/>
    <rdfs:domain resource='#Attribute'/>
    <rdfs:domain resource='#Namespace'/>
    <rdfs:range resource='http://www.w3.org/2000/01/rdf-schema#Literal'/>
  </rdfs:Property>
  <rdfs:Property ID='normalizedValue'>
    <rdfs:domain resource='#Attribute'/>
    <rdfs:range resource='http://www.w3.org/2000/01/rdf-schema#Literal'/>
  </rdfs:Property>
  <rdfs:Property ID='notation'>
    <rdfs:domain resource='#EntityDeclaration'/>
    <rdfs:range resource='#Notation'/>
  </rdfs:Property>
  <rdfs:Property ID='notations'>
    <rdfs:domain resource='#Document'/>
    <rdfs:range resource='#NotationSet'/>
  </rdfs:Property>
  <rdfs:Property ID='ownerElement'>
    <rdfs:domain resource='#Attribute'/>
    <rdfs:domain resource='#Namespace'/>
    <rdfs:range resource='#Element'/>
  </rdfs:Property>
  <rdfs:Property ID='parent'>
    <rdfs:domain resource='#Element'/>
    <rdfs:domain resource='#ProcessingInstruction'/>
    <rdfs:domain resource='#Character'/>
    <rdfs:domain resource='#ReferenceToSkippedEntity'/>
    <rdfs:domain resource='#Comment'/>
    <rdfs:domain resource='#DocumentTypeDeclaration'/>
    <rdfs:domain resource='#EntityStartMarker'/>
    <rdfs:domain resource='#EntityEndMarker'/>
    <rdfs:domain resource='#CDATAStartMarker'/>
    <rdfs:domain resource='#CDATAEndMarker'/>
    <rdfs:range resource='#InfoItem'/>
  </rdfs:Property>
  <rdfs:Property ID='prefix'>
    <rdfs:domain resource='#Namespace'/>
    <rdfs:range resource='http://www.w3.org/2000/01/rdf-schema#Literal'/>
  </rdfs:Property>
  <rdfs:Property ID='publicIdentifier'>
    <rdfs:domain resource='#EntityDeclaration'/>
    <rdfs:domain resource='#Notation'/>
    <rdfs:range resource='http://www.w3.org/2000/01/rdf-schema#Literal'/>
  </rdfs:Property>
  <rdfs:Property ID='specified'>
    <rdfs:domain resource='#Attribute'/>
    <rdfs:range resource='#Boolean'/>
  </rdfs:Property>
  <rdfs:Property ID='standalone'>
    <rdfs:domain resource='#Document'/>
    <rdfs:range resource='#StandaloneType'/>
  </rdfs:Property>
  <rdfs:Property ID='systemIdentifier'>
    <rdfs:domain resource='#EntityDeclaration'/>
    <rdfs:domain resource='#Notation'/>
    <rdfs:range resource='http://www.w3.org/2000/01/rdf-schema#Literal'/>
  </rdfs:Property>
  <rdfs:Property ID='target'>
    <rdfs:domain resource='#ProcessingInstruction'/>
    <rdfs:range resource='http://www.w3.org/2000/01/rdf-schema#Literal'/>
  </rdfs:Property>
  <rdfs:Property ID='version'>
    <rdfs:domain resource='#Document'/>
    <rdfs:range resource='http://www.w3.org/2000/01/rdf-schema#Literal'/>
  </rdfs:Property>
</rdf:RDF>