W3C

XML Information Set

W3C Working Draft 20-December-1999

This version:
http://www.w3.org/TR/1999/WD-xml-infoset-19991220
Latest version:
http://www.w3.org/TR/xml-infoset
Previous versions:
http://www.w3.org/TR/1999/WD-xml-infoset-19990517
Editors:
John Cowan
David Megginson

Abstract

This specification describes an abstract data set containing the information available from an XML document.

Status of this Document

The XML Core Working Group, with this 1999 December 20 Infoset Last Call working draft, invites comment on this specification. The Last Call period begins 20 December 1999 and ends 31 January 2000.

The W3C Membership and other interested parties are invited to review the specification and report implementation experience. Please send comments to www-xml-infoset-comments@w3.org (archive).

For background on this work, please see the XML Activity Statement. While we welcome implementation experience reports, the XML Core Working Group will not allow early implementation to constrain its ability to make changes to this specification prior to final release.

See XML Information Set Requirements for the specific requirements that informed development of this specification.

A list of current W3C Recommendations and other technical documents can be found at http://www.w3.org/TR.

Contents

  1. Introduction
  2. Information Items
  3. Example
  4. Conformance
  5. What is not in the Information Set
  6. References
Appendix A: XML 1.0 Reporting Requirements
Appendix B: RDF Schema

1. Introduction

This document specifies an abstract data set called the XML information set (Infoset), a description of the information available in a well-formed XML document [XML].

Documents that do not conform to [Namespaces], although technically well-formed XML 1.0, are not considered to have meaningful information sets. This essentially bars documents that have element or attribute names containing colons that are used in other ways than as prescribed by [Namespaces]. There is no requirement for an XML document to be valid in order to have an information set.

An XML document's information set consists of two or more information items (the information set for any well-formed XML document will contain at least the document information item and one element information item). An information item is an abstract representation of some component of an XML document: each information item has a set of associated properties, some of which are core, and some of which are peripheral.

In earlier drafts, the term "required" was used rather than "core", and the term "optional" rather than "peripheral". The editors have made this change because "required" and "optional" suggest the behavior of an application rather than the status of part of a data structure.

For any given XML document, there are a number of corresponding information sets: a unique minimal information set consisting of the core properties of the core items and nothing else, a unique maximal information set consisting of all the core and all the peripheral items with all the peripheral properties, and one for every combination of present/absent peripheral items and properties in between. The in-between information sets must be fully consistent with the maximal information set.

All information sets are understood to describe the XML document with all entity references already expanded; that is, represented by the information items corresponding to their replacement text. In the case that an entity reference cannot be expanded, because an XML processor has not read its declaration or its value, explicit provision is made for representing such a reference in the information set.
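As an informal illustration of entity expansion, the following sketch uses Python's standard xml.etree.ElementTree module, which is not part of this specification. The document and the entity name "greeting" are hypothetical. The application sees the characters of the replacement text, not the reference itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: "greeting" is an internal general entity
# declared in the internal DTD subset.
DOC = """<!DOCTYPE note [
  <!ENTITY greeting "Hello, world">
]>
<note>&greeting;!</note>"""

root = ET.fromstring(DOC)

# The reference &greeting; is not reported as a reference; the character
# data of the element already contains its replacement text.
expanded_text = root.text
print(expanded_text)
```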

The XML information set does not require or favor a specific interface or class of interfaces. This specification presents the information set as a tree for the sake of clarity and simplicity, but there is no requirement that the XML information set be made available through a tree structure; other types of interfaces, including (but not limited to) event-based and query-based interfaces are also capable of providing information conforming to the information set. As long as the information in the information set is made available to XML applications in one way or another, the requirements of this document are satisfied.
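To make the point about interfaces concrete, the sketch below reads the same hypothetical document through a tree interface (xml.dom.minidom) and through an event interface (xml.sax), both from the Python standard library. Neither is mandated by this specification; the example only suggests that both styles can expose the same element information.

```python
import xml.sax
from xml.dom import minidom

DOC = b"<a><b>x</b><c/></a>"  # hypothetical sample document

def names_from_tree(data):
    """Tree-style access: walk the DOM and list element names in document order."""
    names = []

    def walk(node):
        if node.nodeType == node.ELEMENT_NODE:
            names.append(node.tagName)
        for child in node.childNodes:
            walk(child)

    walk(minidom.parseString(data))
    return names

class NameCollector(xml.sax.ContentHandler):
    """Event-style access: record element names as start-element events arrive."""

    def __init__(self):
        super().__init__()
        self.names = []

    def startElement(self, name, attrs):
        self.names.append(name)

def names_from_events(data):
    handler = NameCollector()
    xml.sax.parseString(data, handler)
    return handler.names

tree_names = names_from_tree(DOC)
event_names = names_from_events(DOC)
print(tree_names, event_names)
```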

Note: In this document, the words "must", "should", and "may" assume the meanings specified in RFC 2119 [RFC2119], except that the words do not appear in upper case.

Note: To the best of the editors' knowledge and belief, the information set scheme described in this document satisfies the requirements of the XPointer-Information Set Liaison Statement [XPointer-Liaison].

Note: To the best of the editors' knowledge and belief, the interface specified by the Document Object Model, Level 1 Core Recommendation [DOM] conforms to the XML Information Set as currently specified.

2. Information Items

The XML information set can contain fifteen different types of information items:

  1. a document information item (core)
  2. element information items (core)
  3. attribute information items (core)
  4. processing instruction information items (core)
  5. reference to skipped entity information items (core)
  6. character information items (core)
  7. comment information items (peripheral)
  8. a document type declaration information item (peripheral)
  9. entity information items (core for unparsed entities, peripheral for others)
  10. notation information items (core)
  11. entity start marker information items (peripheral)
  12. entity end marker information items (peripheral)
  13. CDATA start marker information items (peripheral)
  14. CDATA end marker information items (peripheral)
  15. namespace declaration information items (core)

Every information item has properties, some of which are core and some of which are peripheral. Note that peripheral information items can, and do, have core properties. For ease of reference, each property is given a name, indicated [thus].

2.1. The Document Information Item

XML Definition: document (Section 2, Documents)

XML Syntax: [1] Document (Section 2.1, Well-Formed XML Documents)

There is always one document information item in the information set, and all other information items are related to the document information item, either directly or indirectly.

2.1.1. Document: Core Properties

The document information item must have the following properties available in some form:

  1. [children] An ordered list of references to child information items, in the original document order. The list must contain exactly one reference to an element information item, together with a reference to one processing instruction information item for each processing instruction preceding the document element (either in the document entity or in a lower-level entity) or following the document element. The list may contain references to other information items as well (see below).
  2. [notations] An unordered set of references to notation information items, one for each notation declaration in the DTD.
  3. [entities] An unordered set of references to entity information items, one for each unparsed entity declaration in the DTD.
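As a rough sketch of how the [notations] and [entities] properties might be gathered, the following uses the declaration handlers of Python's xml.parsers.expat module. The document, the notation "gif", and the entity "logo" are hypothetical, and the dictionary layout is an assumption made for illustration only.

```python
import xml.parsers.expat

# Hypothetical document with one notation and one unparsed entity.
DOC = """<!DOCTYPE report [
  <!NOTATION gif SYSTEM "image/gif">
  <!ENTITY logo SYSTEM "logo.gif" NDATA gif>
]>
<report/>"""

def collect_declarations(data):
    """Return (notations, unparsed_entities) as dictionaries keyed by name."""
    notations = {}
    unparsed = {}

    def on_notation(name, base, system_id, public_id):
        notations[name] = {"system": system_id, "public": public_id}

    def on_unparsed_entity(name, base, system_id, public_id, notation_name):
        unparsed[name] = {
            "system": system_id,
            "public": public_id,
            "notation": notation_name,
        }

    parser = xml.parsers.expat.ParserCreate()
    parser.NotationDeclHandler = on_notation
    parser.UnparsedEntityDeclHandler = on_unparsed_entity
    parser.Parse(data, True)
    return notations, unparsed

notations, unparsed_entities = collect_declarations(DOC)
print(notations)
print(unparsed_entities)
```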

2.1.2. Document: Peripheral Properties

The document information item may also have the following properties available in some form:

  1. [base URI] The absolute URI of the document entity, as computed by the method of RFC 2396 [RFC2396], if that is known.
  2. [children - comments] One reference to a comment information item for each comment outside the document element, added to the ordered list of child information items. The relative position of each comment information item in the list must reflect its position in the original document.
  3. [children - doctype] A reference to exactly one document type declaration information item, added to the ordered list of child information items. The relative position of the document type declaration information item in the list must reflect its position in the original document.
  4. [entities - other] One reference to an entity information item for each parsed general entity declaration in the DTD, added to the unordered set of entities. There can also be an entity information item for the document entity and for the external DTD subset.

2.2. Element Information Items

XML Definition: element (Section 3, Logical Structures)

XML Syntax: [39] Element (Section 3, Logical Structures)

There is one element information item for each element appearing in the XML document. Exactly one of the element information items corresponds to the document element (the root of the element tree), and all other element information items are contained within the document element, either directly or indirectly.

2.2.1. Elements: Core Properties

An element information item must have the following properties available in some form:

  1. [namespace URI] The URI part, if any, of the element's name.
  2. [local name] The local part of the element's name. This does not include any namespace prefix or following colon.
  3. [children] An ordered list of references to element, processing instruction, reference to skipped entity and character information items, one for each element, processing instruction, reference to an unprocessed external entity, and character appearing immediately within the current element, in the original document order. If the element is empty, this list will have no members.
  4. [attributes] An unordered set of references to attribute information items, one for each of the attributes (specified or defaulted) for this element. Namespace declarations are not represented as attribute information items. If there are no non-#IMPLIED attributes specified or defaulted for the element, this set will be empty.
  5. [declared namespaces] An unordered set of references to namespace declaration information items, one for each of the namespaces declared in this element. If there are no non-#IMPLIED namespace declarations specified or defaulted for the element, this set will be empty.

2.2.2. Elements: Peripheral Properties

An element information item may also have the following properties available in some form:

  1. [children - comments] A reference to a comment information item for each comment appearing immediately within the current element, added to the ordered list of children of the current element. The relative position of each comment information item in the list must reflect its position in the original document.
  2. [children - entity markers] An ordered set of pairs of references to entity start marker information items and their corresponding entity end marker information items, one pair for each entity reference in the content of the element, added to the ordered list of children of the current element. The relative position of each marker information item in the list must reflect its position in the original document. If an entity start marker is present, the corresponding entity end marker must also be present, and vice versa.
  3. [children - CDATA markers] An ordered set of pairs of references to CDATA start marker information items and their corresponding CDATA end marker information items, one pair for each CDATA section in the content of the element, added to the ordered list of children of the current element. The relative position of each marker information item in the list must reflect its position in the original document. If a CDATA start marker is present, the corresponding CDATA end marker must also be present, and vice versa.
  4. [base URI] The absolute URI of the external entity in which this element appears, as computed by the method of RFC 2396 [RFC2396]. If the element appears directly in the document entity, the URI is the absolute URI of the document entity, if that is known.
  5. [in-scope namespaces] An unordered set, distinct from that previously mentioned, of references to namespace declaration information items, one for each of the namespaces in effect for this element. If there are no namespaces in effect for the element, this set will be empty. This set will include all of the members of the preceding set, except for any information item representing a declaration in the form xmlns="", which does not declare a namespace but rather undeclares the default namespace.
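The relationship between [declared namespaces] and [in-scope namespaces] can be sketched as follows. The sketch assumes, following [Namespaces], that declarations are inherited from ancestor elements and that the nearest declaration wins. The representation (one prefix-to-URI dictionary per element, with the empty string standing for the default prefix) and the URIs are hypothetical.

```python
def in_scope_namespaces(declared_chain):
    """Compute the namespaces in effect for an element.

    declared_chain lists the declared namespaces of each element on the
    path from the document element down to the element of interest, as
    dictionaries mapping prefix to URI ("" stands for the default prefix).
    """
    scope = {}
    for declared in declared_chain:
        for prefix, uri in declared.items():
            if prefix == "" and uri == "":
                # xmlns="" undeclares the default namespace rather than
                # declaring one, so it never appears in the result.
                scope.pop("", None)
            else:
                scope[prefix] = uri
    return scope

# Hypothetical outer element declaring a default namespace and prefix "p",
# and an inner element undeclaring the default and declaring prefix "q".
outer = {"": "urn:example:default", "p": "urn:example:p"}
inner = {"": "", "q": "urn:example:q"}

outer_scope = in_scope_namespaces([outer])
inner_scope = in_scope_namespaces([outer, inner])
print(outer_scope)
print(inner_scope)
```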

2.3. Attribute Information Items

XML Definition: attribute (Section 3.1, Start-Tags, End-Tags, and Empty-Element Tags)

XML Syntax: [41] Attribute (Section 3.1, Start-Tags, End-Tags, and Empty-Element Tags)

There is one attribute information item for each attribute (specified or defaulted) for each element in the document instance. Namespace declarations are represented using namespace declaration information items, not attribute information items.

Attributes declared in the DTD with a default value of #IMPLIED and not specified in the element's start tag are not represented by attribute information items.
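The difference between defaulted and #IMPLIED attributes can be observed with a small sketch using Python's xml.etree.ElementTree. The element and attribute names are hypothetical. The attribute with a declared default value is reported even though the start tag omits it, while the unspecified #IMPLIED attribute is not reported at all.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: "status" has a default value, "owner" is #IMPLIED,
# and the start tag specifies neither.
DOC = """<!DOCTYPE item [
  <!ATTLIST item
      status CDATA "draft"
      owner  CDATA #IMPLIED>
]>
<item/>"""

attributes = ET.fromstring(DOC).attrib
print(attributes)
```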

2.3.1. Attributes: Core Properties

An attribute information item must have the following properties available in some form:

  1. [namespace URI] The URI part, if any, of the attribute's name.
  2. [local name] The local part of the attribute's name. This does not include any namespace prefix or following colon.
  3. [children] An ordered list of references to character information items, one for each character appearing in the normalized attribute value.

2.3.2. Attributes: Peripheral Properties

In addition, for each attribute information item, the following properties may be available in some form:

  1. [specified] A flag indicating whether this attribute was actually specified in the document instance, or was defaulted from the DTD.
  2. [default] An ordered list of references to character information items, one for each character appearing in the default value specified for this attribute in the DTD, if any. A #FIXED value is considered a default value.
  3. [attribute type] An indication of the type declared for this attribute in the DTD. Legitimate values are ID, IDREF, IDREFS, ENTITY, ENTITIES, NMTOKEN, NMTOKENS, NOTATION, CDATA, and ENUMERATED.
  4. [children - entity markers] One reference to an entity start marker and a reference to its corresponding entity end marker information item for each entity reference in the attribute, added to the ordered list of children of the current attribute. The relative position of each marker information item in the list must reflect the beginning and ending of the entity in the original document. If an entity start marker is present, the corresponding entity end marker must also be present, and vice versa.

2.4. Processing Instruction Information Items

XML Definition: processing instruction (Section 2.6, Processing Instructions)

XML Syntax: [16] PI (Section 2.6, Processing Instructions)

There is one processing instruction information item for every processing instruction in the document. The XML declaration and text declarations for external parsed entities are not considered processing instructions.

2.4.1. Processing Instructions: Core Properties

A processing instruction information item must have the following properties available in some form:

  1. [target] The target part of the processing instruction's content (an XML name).
  2. [content] A string representing the content of the processing instruction, excluding the target and any whitespace immediately following it. The content may be the empty string.
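As an illustration of [target] and [content], the sketch below lists the processing instructions of a hypothetical document using Python's xml.dom.minidom, whose "target" and "data" attributes are assumed here to correspond to the two properties. The XML declaration at the start of the document is not reported as a processing instruction.

```python
from xml.dom import minidom

# Hypothetical document: an XML declaration, a processing instruction with
# extra whitespace after its target, and one with empty content.
DOC = (
    '<?xml version="1.0"?>'
    '<?xml-stylesheet   href="style.xsl" type="text/xsl"?>'
    '<?ping?>'
    '<doc/>'
)

document = minidom.parseString(DOC)

# Collect (target, content) pairs for the processing instructions that
# appear as children of the document.
processing_instructions = [
    (node.target, node.data)
    for node in document.childNodes
    if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
]
print(processing_instructions)
```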

2.4.2. Processing Instructions: Peripheral Properties

A processing instruction information item may also have the following properties available in some form:

  1. [base URI] The absolute URI of the external entity in which this PI appears, as computed by the method of RFC 2396 [RFC2396]. If the PI appears directly in the document entity, the URI is the absolute URI of the document entity, if that is known.

2.5. Reference to Skipped Entity Information Items

XML Definition: Section 4.4.3, Included If Validating

There is one reference to skipped entity information item for each reference to an entity not included by a non-validating XML processor because the XML processor does not include external parsed entities.

A validating XML processor will never generate reference to skipped entity information items for a valid XML document.

2.5.1. Reference to Skipped Entity: Core Properties

A reference to skipped entity information item must have the following information available in some form:

  1. [name] The name of the entity referenced.

2.5.2. Reference to Skipped Entity: Peripheral Properties

A reference to skipped entity information item may also have the following properties available in some form:

  1. [referent] A reference to the entity information item for the skipped entity (if the XML processor has read the declaration).

2.6. Character Information Items

XML Definition: characters (Section 2.2, Characters)

XML Syntax: [2] Char (Section 2.2, Characters)

There is one character information item for each non-markup character that appears within the document element, either literally, as a character reference, or within a CDATA section. There is also one character information item for each character that appears in a normalized attribute value.

Note, however, that a CR (#xD) character that is followed by a LF (#xA) character is not represented by any information item. Furthermore, a CR character that is not followed by a LF character is treated as a LF character. These rules do not apply to CR characters created by character references such as &#xD; or &#13;.
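The line-end rules above can be sketched in a few lines of Python. The helper name normalize_line_ends is hypothetical; the second half of the example uses xml.etree.ElementTree to suggest that a CR produced by a character reference is left alone.

```python
import xml.etree.ElementTree as ET

def normalize_line_ends(text):
    """Apply the rules to literal text: CR LF becomes LF, a lone CR becomes LF."""
    return text.replace("\r\n", "\n").replace("\r", "\n")

normalized = normalize_line_ends("one\r\ntwo\rthree")

# In a parsed document, literal line ends are normalized, but the CR
# created by the character reference &#13; is kept as a CR.
parsed = ET.fromstring("<a>one\r\ntwo\rthree&#13;</a>").text

print(repr(normalized))
print(repr(parsed))
```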

Each character is a logically-separate information item, but XML applications are free to chunk characters into larger groups as necessary or desirable.

2.6.1. Characters: Core Properties

A character information item must have the following properties available in some form:

  1. [character code] The ISO 10646 character code (in the range 0 to #x10FFFF, but not every value in this range is a legal XML character code) of the character.
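The remark that not every value in the range is a legal XML character code can be made concrete with the Char production ([2]) of [XML]. The following sketch encodes that production; the function names are hypothetical.

```python
def is_xml_char(code):
    """Return True if code is a legal XML 1.0 character code (production [2] Char)."""
    return (
        code in (0x9, 0xA, 0xD)
        or 0x20 <= code <= 0xD7FF
        or 0xE000 <= code <= 0xFFFD
        or 0x10000 <= code <= 0x10FFFF
    )

def character_codes(text):
    """Return one character code per character, in document order."""
    return [ord(ch) for ch in text]

codes = character_codes("A\n")
print(codes, all(is_xml_char(c) for c in codes))
```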

2.6.2. Characters: Peripheral Properties

A character information item may also have the following properties available in some form:

  1. [element content whitespace] A flag indicating whether the character is whitespace appearing within element content (see [XML], 2.10 "White Space Handling"). Note that validating XML processors are required by XML 1.0 to provide this information.
  2. [predefined entity] A flag indicating whether the character was included through one of the predefined XML entities.

2.7. Comment Information Items

XML Definition: comment (Section 2.5, Comments)

XML Syntax: [15] Comment (Section 2.5, Comments)

Each comment information item, which is peripheral, corresponds to a single XML comment in the original document.

2.7.1. Comments: Core Properties

If a comment information item is included, the following properties must be available:

  1. [content] A string representing the content of the comment.

2.8. The Document Type Declaration Information Item

XML Definition: document type declaration (Section 2.8, Prolog and Document Type Declaration)

XML Syntax: [28] doctypedecl (Section 2.8, Prolog and Document Type Declaration)

If the XML document has a document type declaration, then the information set may contain a single document type declaration information item. Note that although entities and notations are logically part of the document type declaration, they are provided as properties of the document information item, because XML processors must provide information on them.

2.8.1. Document Type Declaration: Peripheral Properties

A document type declaration information item may have the following properties available in some form:

  1. [external DTD] A reference to the entity information item for the external DTD subset, if such an information item exists. The public and system identifiers for the external DTD subset are available through this information item.
  2. [children] An ordered list of references to comment information items and processing instruction information items representing comments and processing instructions appearing in the DTD, in the original document order. Items from the internal DTD subset appear before those in the external subset.

2.9. Entity Information Items

XML Definition: entity (Section 4, Physical Structures)

XML Syntax: [70] EntityDecl (Section 4.2, Entity Declarations)

Entity information items are peripheral, except for information items representing unparsed external entities, which are core information items.

There is at most one entity information item for each general entity, internal or external, declared in the DTD: when the same entity is declared more than once, only the first declaration is used. Parameter entities are not represented by entity information items. There is also at most one entity information item for the document entity, and at most one for the DTD external subset (if there is one). It is perfectly all right for an XML processor to report some entities and not others.

2.9.1. Entities: Core Properties

The entity information item, if included, must have the following information available in some form:

  1. [entity type] An indication of the type of the entity (internal general entity, external general entity, unparsed entity, document entity, or external DTD subset).
  2. [name] The name of the entity. If the information item represents the document entity or the external DTD subset, the name is null.
  3. [system identifier] The system identifier of the entity. If the information item represents an internal entity, the value of this property is always null, and if it represents the document entity, the value may be null; otherwise, it must have a non-null value.
  4. [public identifier] The public identifier of the entity, if one is available. For internal entities, the value is always null.
  5. [base URI] The absolute URI corresponding to the entity. If the information item represents an internal entity, the value of this property is always null, and if it represents the document entity, the value may be null; otherwise, it must have a non-null value.
  6. [notation] A reference to the notation information item associated with the entity, if the entity is an unparsed (NDATA) entity. For entities other than unparsed entities, the value must be null.

2.9.2. Entities: Peripheral Properties

An entity information item may also have the following information available in some form:

  1. [content] The replacement text of the entity, if it is an internal entity.
  2. [charset] The name of the character encoding in which the entity is expressed. This property is derived from the XML or text declaration optionally present at the beginning of the document entity or an external entity respectively.
  3. [standalone] An indication of the standalone status of the entity (which must be the document entity in this case), either "yes", "no", or "not present". This property is derived from the XML declaration optionally present at the beginning of the document entity.

2.10. Notation Information Items

XML Definition: notation (Section 4.7, Notation Declarations)

XML Syntax: [82] NotationDecl (Section 4.7, Notation Declarations)

There is one notation information item for each notation declared in the DTD.

2.10.1. Notations: Core Properties

A notation information item must have the following properties available:

  1. [name] The name of the notation.
  2. [system identifier] The system identifier of the notation, if one was specified.
  3. [public identifier] The public identifier of the notation, if one was specified.
  4. [base URI] The absolute URI corresponding to the notation.

2.11. Entity Start Marker Information Items

XML Definition: entity reference (Section 4.1, Character and Entity References)

XML Syntax: [68] EntityRef (Section 4.1, Character and Entity References)

Entity start marker information items are a peripheral part of the information set. They are inserted to mark the place where text included from a general entity (as a consequence of an entity reference) begins. They appear as children of an element or attribute information item.

Entity start marker information items are not used in connection with parameter entity references in the DTD.

2.11.1. Entity Start Markers: Core Properties

An entity start marker information item, if present, must have the following properties available in some form:

  1. [entity] A reference to the entity information item referred to by the entity reference which triggered the insertion of this information item.

2.12. Entity End Marker Information Items

XML Definition: entity reference (Section 4.1, Character and Entity References)

XML Syntax: [68] EntityRef (Section 4.1, Character and Entity References)

Entity end marker information items are a peripheral part of the information set. They are inserted to mark the place where text included from a general entity (as a consequence of an entity reference) concludes. They appear as children of an element or attribute information item.

Entity end marker information items are not used in connection with parameter entity references in the DTD.

2.12.1. Entity End Markers: Core Properties

An entity end marker information item, if present, must have the following properties available in some form:

  1. [entity] A reference to the entity information item referred to by the entity reference which triggered the insertion of this information item.

2.13. CDATA Start Marker Information Items

XML Definition: CDATA sections (Section 2.7, CDATA Sections)

XML Syntax: [18] CDSect (Section 2.7, CDATA Sections)

CDATA start marker information items are a peripheral part of the information set. They are inserted to mark the place where text embedded in a CDATA section begins. They appear as children of an element information item.

CDATA start marker information items have no properties.

2.14. CDATA End Marker Information Items

XML Definition: CDATA sections (Section 2.7, CDATA Sections)

XML Syntax: [18] CDSect (Section 2.7, CDATA Sections)

CDATA end marker information items are a peripheral part of the information set. They are inserted to mark the place where text embedded in a CDATA section concludes. They appear as children of an element information item.

CDATA end marker information items have no properties.

2.15. Namespace Declaration Information Items

XML Definition: attribute (Section 3.1, Start-Tags, End-Tags, and Empty-Element Tags)

XML Syntax: [41] Attribute (Section 3.1, Start-Tags, End-Tags, and Empty-Element Tags)

There is one namespace declaration information item for each namespace declaration (specified or defaulted) for each element in the document instance. Namespace declarations are syntactically like attribute declarations of attributes whose names begin with the string xmlns.

Namespace declarations declared in the DTD with a default value of #IMPLIED and not specified in the element's start tag are not represented by information items.

Note that the last two core properties listed below, [namespace URI] and [children], present the same underlying information in overlapping ways. XML processors may report either one or both, but must report at least one.

2.15.1. Namespace Declarations: Core Properties

A namespace declaration information item must have the following properties available in some form:

  1. [prefix] The prefix being declared. Syntactically, this is the part of the attribute name following the xmlns: prefix. If the attribute name is simply xmlns, this property is a null string.
  2. [namespace URI] The absolute URI (plus optional fragment identifier) of the namespace being declared. It may be a null string. This property is considered a core property if and only if the following property is not present.
  3. [children] An ordered list of references to character information items, one for each character appearing in the normalized attribute value. There may also be a reference to an entity start marker and a reference to its corresponding entity end marker information item for each entity reference in the attribute. The relative position of each marker information item in the list must reflect its position in the original document. If an entity start marker is present, the corresponding entity end marker must also be present, and vice versa. This property is considered a core property if and only if the preceding property is not present.

3. Example

Consider the following example XML document:

<?xml version="1.0"?>

<msg:message dc:date="19990421"
             xmlns:dc="http://purl.org/metadata/dublin_core#"
             xmlns:msg="http://www.message.net/"
>Phone home!</msg:message>

The Information Set for this XML document will contain at least the following items in some form:
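As an informal aid, and not as a normative listing, the sketch below enumerates some of the information items of the example document using Python's xml.dom.minidom. The tuple layout and the function name list_items are assumptions made for illustration; the mapping from DOM nodes to information items is approximate.

```python
from xml.dom import minidom

DOC = """<?xml version="1.0"?>

<msg:message dc:date="19990421"
             xmlns:dc="http://purl.org/metadata/dublin_core#"
             xmlns:msg="http://www.message.net/"
>Phone home!</msg:message>"""

def list_items(data):
    """Return a flat list of tuples, one per information item found."""
    items = [("document",)]
    elem = minidom.parseString(data).documentElement
    items.append(("element", elem.namespaceURI, elem.localName))

    attrs = elem.attributes
    for i in range(attrs.length):
        attr = attrs.item(i)
        if attr.name == "xmlns":
            # Declaration of the default namespace: the prefix is empty.
            items.append(("namespace declaration", "", attr.value))
        elif attr.name.startswith("xmlns:"):
            prefix = attr.name[len("xmlns:"):]
            items.append(("namespace declaration", prefix, attr.value))
        else:
            items.append(
                ("attribute", attr.namespaceURI, attr.localName, attr.value)
            )

    # One character item per character of the element's content.
    for node in elem.childNodes:
        if node.nodeType == node.TEXT_NODE:
            items.extend(("character", ord(ch)) for ch in node.data)
    return items

items = list_items(DOC)
for item in items:
    print(item)
```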

4. Conformance

An XML processor conforms to the XML Information Set if it provides all the core information items and all their core properties corresponding to that part of the document that the processor has actually read. For instance, attributes are core information items; therefore, an XML processor that does not report the existence of attributes, as well as their names and values (which are core properties of attributes), does not conform to the XML Information Set.

Some information items are peripheral, and some core information items have peripheral information associated with them. If an XML processor reports an information item, then it must supply at least the core properties defined by the XML Information Set in order to conform. For instance, if an XML processor chooses to supply entity information items, which are peripheral, then it is also required to supply names for the entities, since the XML Information Set specifies that the name of an entity information item is a core property. However, since entity information items are peripheral, an XML processor which does not supply them at all also conforms to the XML Information Set.
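The rule that a reported item must carry its core properties can be sketched as a simple check. The table below copies the core property names of a few item types from Section 2. The function name and the idea of representing a reported item as a set of property names are assumptions. Namespace declarations are left out because their core properties follow an either/or rule.

```python
# Core property names for a few item types, taken from Section 2.
CORE_PROPERTIES = {
    "element": {
        "namespace URI", "local name", "children",
        "attributes", "declared namespaces",
    },
    "attribute": {"namespace URI", "local name", "children"},
    "processing instruction": {"target", "content"},
    "comment": {"content"},
    "entity": {
        "entity type", "name", "system identifier",
        "public identifier", "base URI", "notation",
    },
}

def missing_core_properties(item_type, reported):
    """Return the core properties of item_type that are absent from reported."""
    return CORE_PROPERTIES[item_type] - set(reported)

# A processor that reports entity items with only a name falls short,
# while a comment item reported with its content is complete.
entity_gaps = missing_core_properties("entity", {"name"})
comment_gaps = missing_core_properties("comment", {"content"})
print(sorted(entity_gaps), sorted(comment_gaps))
```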

The XML 1.0 Recommendation [XML] explicitly allows non-validating XML processors to omit parsing the external DTD subset and external entities (both parsed general entities and parameter entities). As a result, it is possible that a non-validating XML processor will omit reading attribute and entity declarations or actual markup that will affect the quantity and quality of information included in the information set. Validating XML processors must report all core information; non-validating XML processors may omit core information that appears outside of the top-level document entity (either in the external DTD subset or in an external text entity) if they do not read the other entities.

XML processors may optionally provide additional information not found in the XML Information Set. For instance, the XML Information Set excludes whitespace that occurs between attributes, but an XML processor that provides this information still conforms, as long as it also provides the information that is required for conformance to the XML Information Set.

5. What is not in the Information Set

The following information is not represented in the current version of the XML Information Set:

  1. The XML version number.
  2. The content models of elements, from ELEMENT declarations in the DTD.
  3. The grouping and ordering of attribute declarations in ATTLIST declarations.
  4. Whitespace outside the document element.
  5. The difference between the two forms of an empty element: <foo/> and <foo></foo>.
  6. Whitespace within start-tags (other than significant whitespace in attribute values) and end-tags.
  7. The difference between CR, CR-LF, and LF line termination.
  8. The unnormalized form of attribute values (see 3.3.3 Attribute-Value Normalization [XML]).
  9. The order of attributes within a start-tag.
  10. The order of declarations within the DTD.
  11. The boundaries of conditional sections in the DTD.
  12. Any ignored declarations, including those within an IGNORE conditional section, as well as entity and attribute declarations ignored because previous entity declarations override them.

Furthermore, the XML Infoset does not provide any method of assigning a single series of numbers to all child nodes of an element or of the document that is guaranteed to be reliable regardless of the underlying XML processor. Although such a method would be desirable, it is considered unachievable for XML, due to the difficulties produced by references to skipped entities, non-validating processors, and peripheral information items.

In other words, there is no reliable way to specify something like "the second child of this element" without restricting both the type of XML processor and the types of children being counted.
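The following sketch illustrates how the answer to "the second child" changes with the kinds of children being counted. It assumes Python's standard xml.dom.minidom module, an implementation in the style of [DOM] that is not part of this specification. The document and the helper name second_child are hypothetical.

```python
from xml.dom import minidom

# Children of <list>: whitespace text, <a/>, a comment, <b/>.
DOC = "<list> <a/><!-- note --><b/></list>"

def second_child(text, elements_only):
    """Return the node name of the 'second child' of the document element."""
    children = list(minidom.parseString(text).documentElement.childNodes)
    if elements_only:
        # Restrict the count to element children only.
        children = [n for n in children if n.nodeType == n.ELEMENT_NODE]
    return children[1].nodeName

print(second_child(DOC, elements_only=False))
print(second_child(DOC, elements_only=True))
```

Counting every child selects the element a, while counting only elements selects b. A processor that did not report the whitespace or the comment would shift the unrestricted count again.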

6. References

DOM
Document Object Model (DOM) Level 1 Specification, eds. Vidur Apparao, Steve Byrne, Mike Champion, et alii. 1 October 1998. Available at http://www.w3.org/TR/REC-DOM-Level-1/.
Namespaces
Namespaces in XML, eds. Tim Bray, Dave Hollander, and Andrew Layman. 14 January 1999. Available at http://www.w3.org/TR/REC-xml-names.
RFC2119
Key words for use in RFCs to Indicate Requirement Levels, ed. S. Bradner. March 1997. Available at http://www.isi.edu/in-notes/rfc2119.txt.
RFC2396
Uniform Resource Identifiers (URI): Generic Syntax, T. Berners-Lee, R. Fielding, L. Masinter. August 1998. Available at http://www.isi.edu/in-notes/rfc2396.txt.
XML
Extensible Markup Language (XML) 1.0, eds. Tim Bray, Jean Paoli, and Michael Sperberg-McQueen. 10 February 1998. Available at http://www.w3.org/TR/REC-xml.
XPointer-Liaison
XPointer-Information Set Liaison Statement, ed. Steven J. DeRose. 24 February 1999. Available at http://www.w3.org/TR/NOTE-xptr-infoset-liaison.

Appendix A: XML 1.0 Reporting Requirements (informative)

Although the XML 1.0 Recommendation [XML] is primarily concerned with XML syntax, it also includes some specific reporting requirements for XML processors.

The reporting requirements include errors, which are outside the scope of this specification, and document information; all of the XML 1.0 requirements for document information reporting have been integrated into the XML information set specification (numbers in parentheses refer to sections of the Recommendation):

  1. An XML processor must always provide all characters in a document that are not part of markup to the application (2.10). We have interpreted this requirement to refer only to characters within the document element.
  2. A validating XML processor must inform the application which of the character data in a document is whitespace appearing within element content (2.10).
  3. An XML processor must pass a single LF character in place of CR or CR-LF characters appearing in its input (2.5).
  4. An XML processor must normalize attribute values according to the rules in clause 3.3 before passing them to the application. This implies that attribute values are passed to the application in their normalized form (3.3).
  5. An XML processor must pass the names and external identifiers (system identifiers, public identifiers or both) of declared notations to the application (4.7).
  6. When the name of an unparsed entity appears as the explicit or default value of an ENTITY or ENTITIES attribute, an XML processor must provide the names, system identifiers, and (if present) public identifiers of both the entity and its notation to the application (4.6, 4.7).
  7. An XML processor must pass processing instructions to the application (2.6).
  8. An XML processor (necessarily a non-validating one) that does not include the replacement text of an external parsed entity in place of an entity reference must notify the application that it recognized but did not read the entity (4.4.3).
  9. A validating XML processor must include the replacement text of an entity in place of an entity reference (5.2).
  10. A validating XML processor must supply the default value of attributes declared in the DTD for a given element type but not appearing in the element's start tag (5.2).
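A few of the reporting requirements above can be observed directly through a parser's callback interface. The sketch below assumes Python's standard xml.parsers.expat module, a non-validating processor that nevertheless reads the internal DTD subset. Neither the module nor the helper name collect_events is part of this specification, and the sample document is hypothetical.

```python
import xml.parsers.expat

DOC = (
    '<!DOCTYPE doc [\n'
    '  <!NOTATION gif SYSTEM "image/gif">\n'
    '  <!ATTLIST doc kind CDATA "memo">\n'
    ']>\n'
    '<doc><?target some data?>line1\r\nline2</doc>'
)

def collect_events(text):
    """Parse text and return the list of events reported to the application."""
    events, chars = [], []
    parser = xml.parsers.expat.ParserCreate()

    def on_notation(name, base, system_id, public_id):
        events.append(("notation", name, system_id, public_id))  # item 5

    def on_pi(target, data):
        events.append(("pi", target, data))  # item 7

    def on_start(name, attrs):
        events.append(("start", name, dict(attrs)))  # item 10 (defaults)

    parser.NotationDeclHandler = on_notation
    parser.ProcessingInstructionHandler = on_pi
    parser.StartElementHandler = on_start
    parser.CharacterDataHandler = chars.append  # items 1 and 3
    parser.Parse(text, True)
    events.append(("chars", "".join(chars)))
    return events

for event in collect_events(DOC):
    print(event)
```

The character data is collected into a single string because a parser may deliver it in several pieces. In this sketch, the CR-LF pair in the input is expected to arrive as a single LF, and the start-tag event is expected to carry the defaulted kind attribute even though it does not appear in the document instance.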

Appendix B: RDF Schema (informative)

The following RDF Schema provides a formal characterization of the Infoset. In case of disagreement between this schema and the prose in this document, the prose should be taken as normative.

<?xml version='1.0' standalone='yes'?>
<rdf:RDF xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
 xmlns:rdfs='http://www.w3.org/TR/1999/PR-rdf-schema-19990303#'
 xmlns='http://www.w3.org/1999/WD-infoset-19991201#'>

<!--Enumeration classes and their members-->

<rdfs:Class id='AttrType'/>
<AttrType id='AttrType.ID'/>
<AttrType id='AttrType.IDREF'/>
<AttrType id='AttrType.IDREFS'/>
<AttrType id='AttrType.ENTITY'/>
<AttrType id='AttrType.ENTITIES'/>
<AttrType id='AttrType.NMTOKEN'/>
<AttrType id='AttrType.NMTOKENS'/>
<AttrType id='AttrType.NOTATION'/>
<AttrType id='AttrType.CDATA'/>
<AttrType id='AttrType.ENUMERATED'/>

<rdfs:Class id='Boolean'/>
<Boolean id='Boolean.true'/>
<Boolean id='Boolean.false'/>

<rdfs:Class id='EntityType'/>
<EntityType id='EntityType.InternalGeneral'/>
<EntityType id='EntityType.ExternalGeneral'/>
<EntityType id='EntityType.Unparsed'/>
<EntityType id='EntityType.DocumentEntity'/>
<EntityType id='EntityType.ExternalDTDSubset'/>

<rdfs:Class id='Integer'
 rdfs:subClassOf='http://www.w3.org/TR/1999/PR-rdf-schema-19990303#Literal'/>

<rdfs:Class id='StandaloneType'/>
<StandaloneType id='StandaloneType.yes'/>
<StandaloneType id='StandaloneType.no'/>
<StandaloneType id='StandaloneType.notSpecified'/>


<!--Info item classes in document order-->

<rdfs:Class id='InfoItem'/>

<rdfs:Class id='Document' rdfs:subClassOf='#InfoItem'/>

<rdfs:Class id='Element' rdfs:subClassOf='#InfoItem'/>

<rdfs:Class id='Attribute' rdfs:subClassOf='#InfoItem'/>

<rdfs:Class id='ProcessingInstruction' rdfs:subClassOf='#InfoItem'/>

<rdfs:Class id='Character' rdfs:subClassOf='#InfoItem'/>

<rdfs:Class id='ReferenceToSkippedEntity' rdfs:subClassOf='#InfoItem'/>

<rdfs:Class id='Comment' rdfs:subClassOf='#InfoItem'/>

<rdfs:Class id='DocumentTypeDeclaration' rdfs:subClassOf='#InfoItem'/>

<rdfs:Class id='Entity' rdfs:subClassOf='#InfoItem'/>

<rdfs:Class id='Notation' rdfs:subClassOf='#InfoItem'/>

<rdfs:Class id='EntityStartMarker' rdfs:subClassOf='#InfoItem'/>

<rdfs:Class id='EntityEndMarker' rdfs:subClassOf='#InfoItem'/>

<rdfs:Class id='CDATAStartMarker' rdfs:subClassOf='#InfoItem'/>

<rdfs:Class id='CDATAEndMarker' rdfs:subClassOf='#InfoItem'/>

<rdfs:Class id='Namespace' rdfs:subClassOf='#InfoItem'/>


<!--Set containers-->

<rdfs:Class id='InfoItemSet'
  rdfs:subClassOf='http://www.w3.org/1999/02/22-rdf-syntax-ns#Bag'/>

<rdfs:Class id='AttributeSet' rdfs:subClassOf='#InfoItemSet'/>

<rdfs:Class id='EntitySet' rdfs:subClassOf='#InfoItemSet'/>

<rdfs:Class id='NamespaceSet' rdfs:subClassOf='#InfoItemSet'/>

<rdfs:Class id='NotationSet' rdfs:subClassOf='#InfoItemSet'/>


<!--Sequence container-->

<rdfs:Class id='InfoItemSeq'
 rdfs:subClassOf='http://www.w3.org/1999/02/22-rdf-syntax-ns#Seq'/>


<!--Info item properties-->

<rdfs:Property id='attributes'>
  <rdfs:domain resource='#Element'/>
  <rdfs:range resource='#AttributeSet'/>
</rdfs:Property>

<rdfs:Property id='attributeType'>
  <rdfs:domain resource='#Attribute'/>
  <rdfs:range resource='#AttrType'/>
</rdfs:Property>

<rdfs:Property id='baseURI'>
  <rdfs:domain resource='#Document'/>
  <rdfs:domain resource='#Element'/>
  <rdfs:domain resource='#ProcessingInstruction'/>
  <rdfs:domain resource='#Entity'/>
  <rdfs:domain resource='#Notation'/>
  <rdfs:range resource='http://www.w3.org/TR/1999/PR-rdf-schema-19990303#Literal'/>
</rdfs:Property>

<rdfs:Property id='characterCode'>
  <rdfs:domain resource='#Character'/>
  <rdfs:range resource='#Integer'/>
</rdfs:Property>

<rdfs:Property id='charset'>
  <rdfs:domain resource='#Entity'/>
  <rdfs:range resource='http://www.w3.org/TR/1999/PR-rdf-schema-19990303#Literal'/>
</rdfs:Property>

<rdfs:Property id='children'>
  <rdfs:domain resource='#Document'/>
  <rdfs:domain resource='#Element'/>
  <rdfs:domain resource='#Attribute'/>
  <rdfs:domain resource='#DocumentTypeDeclaration'/>
  <rdfs:domain resource='#Namespace'/>
  <rdfs:range resource='#InfoItemSeq'/>
</rdfs:Property>

<rdfs:Property id='content'>
  <rdfs:domain resource='#ProcessingInstruction'/>
  <rdfs:domain resource='#Comment'/>
  <rdfs:domain resource='#Entity'/>
  <rdfs:range resource='http://www.w3.org/TR/1999/PR-rdf-schema-19990303#Literal'/>
</rdfs:Property>

<rdfs:Property id='declaredNamespaces'>
  <rdfs:domain resource='#Element'/>
  <rdfs:range resource='#NamespaceSet'/>
</rdfs:Property>

<rdfs:Property id='default'>
  <rdfs:domain resource='#Attribute'/>
  <rdfs:range resource='#Boolean'/>
</rdfs:Property>

<rdfs:Property id='elementContentWhitespace'>
  <rdfs:domain resource='#Character'/>
  <rdfs:range resource='#Boolean'/>
</rdfs:Property>

<rdfs:Property id='entity'>
  <rdfs:domain resource='#EntityStartMarker'/>
  <rdfs:domain resource='#EntityEndMarker'/>
  <rdfs:range resource='#Entity'/>
</rdfs:Property>

<rdfs:Property id='entities'>
  <rdfs:domain resource='#Document'/>
  <rdfs:range resource='#EntitySet'/>
</rdfs:Property>

<rdfs:Property id='entityType'>
  <rdfs:domain resource='#Entity'/>
  <rdfs:range resource='#EntityType'/>
</rdfs:Property>

<rdfs:Property id='externalDTD'>
  <rdfs:domain resource='#DocumentTypeDeclaration'/>
  <rdfs:range resource='#Entity'/>
</rdfs:Property>

<rdfs:Property id='inScopeNamespaces'>
  <rdfs:domain resource='#Element'/>
  <rdfs:range resource='#NamespaceSet'/>
</rdfs:Property>

<rdfs:Property id='localName'>
  <rdfs:domain resource='#Element'/>
  <rdfs:domain resource='#Attribute'/>
  <rdfs:range resource='http://www.w3.org/TR/1999/PR-rdf-schema-19990303#Literal'/>
</rdfs:Property>

<rdfs:Property id='name'>
  <rdfs:domain resource='#ReferenceToSkippedEntity'/>
  <rdfs:domain resource='#Entity'/>
  <rdfs:domain resource='#Notation'/>
  <rdfs:range resource='http://www.w3.org/TR/1999/PR-rdf-schema-19990303#Literal'/>
</rdfs:Property>

<rdfs:Property id='namespaceURI'>
  <rdfs:domain resource='#Element'/>
  <rdfs:domain resource='#Attribute'/>
  <rdfs:domain resource='#Namespace'/>
  <rdfs:range resource='http://www.w3.org/TR/1999/PR-rdf-schema-19990303#Literal'/>
</rdfs:Property>

<rdfs:Property id='notation'>
  <rdfs:domain resource='#Entity'/>
  <rdfs:range resource='#Notation'/>
</rdfs:Property>

<rdfs:Property id='notations'>
  <rdfs:domain resource='#Document'/>
  <rdfs:range resource='#NotationSet'/>
</rdfs:Property>

<rdfs:Property id='predefinedEntity'>
  <rdfs:domain resource='#Character'/>
  <rdfs:range resource='#Boolean'/>
</rdfs:Property>

<rdfs:Property id='prefix'>
  <rdfs:domain resource='#Namespace'/>
  <rdfs:range resource='http://www.w3.org/TR/1999/PR-rdf-schema-19990303#Literal'/>
</rdfs:Property>

<rdfs:Property id='publicIdentifier'>
  <rdfs:domain resource='#Entity'/>
  <rdfs:domain resource='#Notation'/>
  <rdfs:range resource='http://www.w3.org/TR/1999/PR-rdf-schema-19990303#Literal'/>
</rdfs:Property>

<rdfs:Property id='referent'>
  <rdfs:domain resource='#ReferenceToSkippedEntity'/>
  <rdfs:range resource='#Entity'/>
</rdfs:Property>

<rdfs:Property id='specified'>
  <rdfs:domain resource='#Attribute'/>
  <rdfs:range resource='#Boolean'/>
</rdfs:Property>

<rdfs:Property id='standalone'>
  <rdfs:domain resource='#Entity'/>
  <rdfs:range resource='#StandaloneType'/>
</rdfs:Property>

<rdfs:Property id='systemIdentifier'>
  <rdfs:domain resource='#Entity'/>
  <rdfs:domain resource='#Notation'/>
  <rdfs:range resource='http://www.w3.org/TR/1999/PR-rdf-schema-19990303#Literal'/>
</rdfs:Property>

<rdfs:Property id='target'>
  <rdfs:domain resource='#ProcessingInstruction'/>
  <rdfs:range resource='http://www.w3.org/TR/1999/PR-rdf-schema-19990303#Literal'/>
</rdfs:Property>


</rdf:RDF>