W3C

Efficient XML Interchange (EXI) Format 1.0 Errata

27 June 2013


Abstract

This document records all known errors in the Efficient XML Interchange (EXI) Format 1.0 (hereinafter, "the specification" or "the spec").

If you find errors in the specification that are not listed in this document, please report them to public-exi-comments@w3.org. Archives of the mailing list are publicly available.

Table of Contents

1. Substantive Errata
2. Editorial Errata
3. Clarifications
A. Errata Changes


1. Substantive Errata

Section 7

19 September 2012 (1)

Below is a paragraph excerpted from section 7 Representing Event Content.

Schemas can provide one or more enumerated values for datatypes. When the Preserve.lexicalValues option is false, EXI exploits those pre-defined values when they are available to represent values of such datatypes in a more efficient manner than would have done otherwise without using pre-defined values. The encoding rule for representing enumerated values is described in 7.2 Enumerations. Datatypes that are derived from another by union and their subtypes are always represented as String regardless of the availability of enumerated values. Representation of values of which the datatype is one of QName, Notation or a datatype derived therefrom by restriction are also not affected by enumerated values if any.

Make the above paragraph the one shown below. The modified part is highlighted in color for distinction purposes only.

Schemas can provide one or more enumerated values for datatypes. When the Preserve.lexicalValues option is false, EXI exploits those pre-defined values when they are available to represent values of such datatypes in a more efficient manner than would have done otherwise without using pre-defined values. The encoding rule for representing enumerated values is described in 7.2 Enumerations. Datatypes that are derived from another by union and their subtypes are always represented as String regardless of the availability of enumerated values. Representation of values of which the datatype is either a list datatypeXS2, or one of QName, Notation or a datatype derived therefrom by restriction are also not affected by enumerated values if any.

Section 7.2

19 September 2012 (2)

Below is a paragraph excerpted from section 7.2 Enumerations.

Exceptions are for schema types derived from others by union and their subtypes, QName or Notation and types derived therefrom by restriction. The values of such types are processed by their respective built-in EXI datatype representations instead of being represented as enumerations.

Make the above paragraph the one shown below. The modified part is highlighted in color for distinction purposes only.

Exceptions are for schema union datatypesXS2, list datatypesXS2, as well as QName or Notation and types derived therefrom by restriction. The values of such types are processed by their respective built-in EXI datatype representations instead of being represented as enumerations.

Section 8.4.3

08 May 2012

Change the semantics section that currently reads as follows

All productions in the built-in element grammar of the form LeftHandSide: AT (*) RightHandSide are evaluated as follows:

  1. Let qname be the qname of the attribute matched by AT (*)
  2. Create a production of the form LeftHandSide : AT (qname) RightHandSide with an event code 0 and increment the first part of the event code of each production in the current grammar with the non-terminal LeftHandSide on the left-hand side. Add this production to the grammar.
  3. If qname is xsi:type, let target-type be the value of the xsi:type attribute and assign it the QName datatype representation (see 7.1.7 QName). If there is no namespace in scope for the specified qname prefix, set the uri of target-type to empty ("") and the localName to the full lexical value of the QName, including the prefix. Encode target-type according to section 7. Representing Event Content. If a grammar can be found for the target-type type using the encoded target-type representation, evaluate the element contents using the grammar for target-type type instead of RightHandSide.

to

All productions in the built-in element grammar of the form LeftHandSide: AT (*) RightHandSide are evaluated as follows:

  1. Let qname be the qname of the attribute matched by AT (*)
  2. If qname is not xsi:type or if a production of the form LeftHandSide : AT(xsi:type) with an event code of length 1 does not exist in the current element grammar, create a production of the form LeftHandSide : AT (qname) RightHandSide with an event code 0 and increment the first part of the event code of each production in the current grammar with the non-terminal LeftHandSide on the left-hand side. Add this production to the grammar.
  3. If qname is xsi:type, let target-type be the value of the xsi:type attribute and assign it the QName datatype representation (see 7.1.7 QName). If there is no namespace in scope for the specified qname prefix, set the uri of target-type to empty ("") and the localName to the full lexical value of the QName, including the prefix. Encode target-type according to section 7. Representing Event Content. If a grammar can be found for the target-type type using the encoded target-type representation, evaluate the element contents using the grammar for target-type type instead of RightHandSide.

Section 8.5.4.1.3

29 March 2013 (1)

Change the fourth paragraph in Section 8.5.4.1.3 Type Grammars from

Sections 8.5.4.1.3.1 Simple Type Grammars and 8.5.4.1.3.2 Complex Type Grammars describe the processes for creating Type i  and TypeEmpty i  from XML Schema simple type definitionsXS1 and complex type definitionsXS1 defined in schemas as well as built-in primitive typesXS2, built-in derived typesXS2 and simple ur-typeXS2 defined by XML Schema specification [XML Schema Datatypes]. Section 8.5.4.1.3.3 Complex Ur-Type Grammar defines the grammar used for processing instances of element contents of type xsd:anyTypeXS1.

to

Sections 8.5.4.1.3.1 Simple Type Grammars and 8.5.4.1.3.2 Complex Type Grammars describe the processes for creating Type i  and TypeEmpty i  from XML Schema simple type definitionsXS1 and complex type definitionsXS1 defined in schemas as well as built-in primitive typesXS2, built-in derived typesXS2, simple ur-typeXS2 and complex ur-typeXS1 defined by XML Schema specification [XML Schema Datatypes].

Section 8.5.4.1.3.2

29 March 2013 (2)

Change the grammar that reads as follows

G n−1, 0  :
EE

to the following form

G n−1, 0  :
EE

G n−1, 1  :
EE

and add the following rule just before the first note in the section:

If there is neither an attribute use nor an {attribute wildcard}, G 0  of the following form is used as an attribute use grammar.

G 0, 0  :
EE

Section 8.5.4.1.3.3

29 March 2013 (3)

Given that the EXI specification is already clear in Section 8.5.4.1.3 Type Grammars about how grammars are built, Section 8.5.4.1.3.3 Complex Ur-Type Grammar and all references to it are entirely removed.

Appendix A.1

26 June 2013

Add the following paragraph below the Namespaces in XML reference:

Namespaces in XML 1.1

Namespaces in XML 1.1 (Second Edition), T. Bray, D. Hollander, A. Layman, and R. Tobin, Editors. World Wide Web Consortium, 4 February 2004, revised 16 August 2006. This version is http://www.w3.org/TR/2006/REC-xml-names11-20060816. The latest version is available at http://www.w3.org/TR/xml-names11/.

2. Editorial Errata

To be added upon receipt of errors.

3. Clarifications

Section 7

30 May 2011

Append the following text as 2nd paragraph right after Table 7-2.

The restricted character set for a value that would be represented as an EXI enumeration is the restricted character set of the EXI datatype representation of the enumeration base type.

Section 7.1.2

22 February 2012

Below are two paragraphs excerpted from section 7.1.2 Boolean.

In the absence of pattern facets in the schema datatype, the Boolean datatype representation is a n-bit unsigned integer (7.1.9 n-bit Unsigned Integer), where n is one (1). The value zero (0) represents false and the value one (1) represents true.
Otherwise, when pattern facets are available in the schema datatype, the Boolean datatype representation is a n-bit unsigned integer (7.1.9 n-bit Unsigned Integer), where n is two (2) and the values zero (0), one (1), two (2) and three (3) represent the values "false", "0", "true" and "1" respectively.

Change the excerpted text to the one shown below.

When the associated schema datatype is derived from xsd:boolean and pattern facets are available in the schema datatype, the Boolean datatype representation is a n-bit unsigned integer (7.1.9 n-bit Unsigned Integer), where n is two (2) and the values zero (0), one (1), two (2) and three (3) represent the values "false", "0", "true" and "1" respectively.
Otherwise, the Boolean datatype representation is a n-bit unsigned integer (7.1.9 n-bit Unsigned Integer), where n is one (1). The value zero (0) represents false and the value one (1) represents true.

The primary change is in the order of the two paragraphs. In the revised text, the special case is described first, followed by the default case. A clause clarifying the condition is added, highlighted in color above for distinction.
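
As a non-normative illustration, the two cases of the revised text might be sketched in Python as follows. The function name `encode_boolean` and the `has_pattern_facets` flag are hypothetical and do not come from the specification; the sketch returns the value together with its bit width n instead of writing to a bit stream.

```python
# Hypothetical helper; the names are illustrative, not taken from the specification.
PATTERN_CODES = {"false": 0, "0": 1, "true": 2, "1": 3}


def encode_boolean(lexical, has_pattern_facets):
    """Return (value, n) for an n-bit unsigned integer representation."""
    if lexical not in PATTERN_CODES:
        raise ValueError("not a valid xsd:boolean lexical value: %r" % lexical)
    if has_pattern_facets:
        # Special case: datatype derived from xsd:boolean with pattern facets,
        # so all four lexical forms are kept apart using two bits.
        return PATTERN_CODES[lexical], 2
    # Default case: one bit, 0 for false and 1 for true.
    return (1 if lexical in ("true", "1") else 0), 1
```

With pattern facets the lexical forms "true" and "1" remain distinguishable after decoding, which is presumably why two bits are spent in that case.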

Section 7.4

05 October 2011

Below is a paragraph excerpted from section 7.4 Datatype Representation Map.

EXI processors that support Datatype Representation Maps MAY provide implementation specific means to define and install user-defined datatype representations. EXI processors MAY also provide implementation specific means for applications or users to specify alternate built-in EXI datatype representations or user-defined datatype representations for representing specific schema datatypes. As with the default EXI datatype representations, alternate datatype representations are used for the associated XML Schema types specified in the Datatype Representation Map and XML Schema datatypes derived from those datatypes. When there are built-in or user-defined datatype representations associated with more than one XML Schema datatype in the type hierarchy of a particular datatype, the closest ancestor with an associated datatype representation is used to determine the EXI datatype representation.

Make the above paragraph the one shown below by appending text. The appended part is highlighted in color for distinction purposes only.

EXI processors that support Datatype Representation Maps MAY provide implementation specific means to define and install user-defined datatype representations. EXI processors MAY also provide implementation specific means for applications or users to specify alternate built-in EXI datatype representations or user-defined datatype representations for representing specific schema datatypes. As with the default EXI datatype representations, alternate datatype representations are used for the associated XML Schema types specified in the Datatype Representation Map and XML Schema datatypes derived from those datatypes. When there are built-in or user-defined datatype representations associated with more than one XML Schema datatype in the type hierarchy of a particular datatype, the closest ancestor with an associated datatype representation is used to determine the EXI datatype representation. For XML Schema datatypes with enumerated values, the encoding rule described in 7.2 Enumerations is used as the representation when the closest ancestor datatype with an associated datatype representation has no enumerated values.
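
The following Python sketch shows one possible reading of the closest-ancestor rule together with the appended sentence. It is illustrative only. The function `choose_representation` and its arguments (`parent_of`, `rep_map`, `enumerated`) are hypothetical, the datatype itself is assumed to count as its own closest ancestor, and `enumerated` is assumed to list every datatype that has enumerated values, including inherited ones.

```python
def choose_representation(datatype, parent_of, rep_map, enumerated):
    """Pick a representation name for `datatype` (illustrative sketch).

    parent_of  -- dict mapping each datatype to its base type (None at the root)
    rep_map    -- dict mapping datatypes to an associated representation name
    enumerated -- set of datatypes that have enumerated values
    """
    t = datatype
    while t is not None:
        if t in rep_map:
            # t is the closest ancestor with an associated representation.
            if datatype in enumerated and t not in enumerated:
                # The mapped ancestor has no enumerated values, so the
                # encoding rule of 7.2 Enumerations applies.
                return "enumeration"
            return rep_map[t]
        t = parent_of.get(t)
    # No mapped ancestor found (assumption: only reachable in this toy model).
    return "enumeration" if datatype in enumerated else None
```

For example, with `my:shade` derived from the enumerated type `my:color`, which in turn derives from `xsd:string`, the result depends on whether the map addresses `my:color` directly or only `xsd:string`.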

Section 8.5.4.1.5

03 April 2012

Below is a paragraph excerpted from section 8.5.4.1.5 Particles.

Otherwise, if {max occurs} is unbounded, generate one additional copy of Term 0 , G {min occurs} and replace all productions of the form:

G {min occurs}, k :
EE

with productions of the form:

G {min occurs}, k :
G {min occurs}, 0

indicating this term may be repeated indefinitely. Then if there is no production of the form:

G {min occurs}, 0 :
EE

add one after the other productions with the non-terminal G {min occurs}, 0 on the left-hand side, indicating this term may be omitted from the content model. Then, create the grammar for Particle i using the grammar concatenation operator defined in section 8.5.4.1.1 Grammar Concatenation Operator as follows:

Particle i = G 0 ⊕ G 1 ⊕ … ⊕ G {min occurs}

Make the above text the one shown below. The modified part is highlighted in color for distinction purposes only.

Otherwise, if {max occurs} is unbounded, generate one additional copy of Term 0 , G {min occurs} and replace all productions of the form:

G {min occurs}, k :
EE

with productions of the form:

G {min occurs}, k :
G {min occurs}, 0

indicating this term may be repeated indefinitely. Then, when there is no more production of the form:

G {min occurs}, 0 :
EE

add one after the other productions with the non-terminal G {min occurs}, 0 on the left-hand side, indicating this term may be omitted from the content model. Then, create the grammar for Particle i using the grammar concatenation operator defined in section 8.5.4.1.1 Grammar Concatenation Operator as follows:

Particle i = G 0 ⊕ G 1 ⊕ … ⊕ G {min occurs}

Section 4

06 May 2013

Append the following text as 4th paragraph after Table 4-1:

The namespace of elements and attributes is specified as part of SE and AT events and hence namespace declarations can be omitted from the EXI stream if preservation of prefixes is not required by the applications. As prescribed by Table B-2 and Table B-11, [namespace attributes] representing namespace declarations are mapped to NS events and SHOULD NOT be represented by AT events. This also implies that the following AT events SHOULD NOT occur in EXI streams: (1) AT events with a qname whose uri is "http://www.w3.org/2000/xmlns/"; (2) AT events with a qname that has an empty uri ("") and a local name of the form "xmlns" or "xmlns:*", where "*" represents a string of 0 or more characters.
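
As a non-normative illustration, an encoder could detect the two kinds of AT events listed above with a check along these lines. The function name `is_namespace_declaration_attribute` is hypothetical and not part of the specification.

```python
XMLNS_URI = "http://www.w3.org/2000/xmlns/"


def is_namespace_declaration_attribute(uri, local_name):
    """Return True for AT events that SHOULD NOT occur in an EXI stream."""
    # Case (1): the qname's uri is the xmlns namespace.
    if uri == XMLNS_URI:
        return True
    # Case (2): empty uri and a local name of the form "xmlns" or "xmlns:*".
    return uri == "" and (local_name == "xmlns" or local_name.startswith("xmlns:"))
```

An encoder that finds such an attribute would typically emit an NS event when prefixes are preserved, or drop the declaration otherwise.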

Section 7.1.8

13 June 2013

Below is the first paragraph and Table 7-3 excerpted from section 7.1.8 Date-Time:

The Date-Time datatype representation is a sequence of values representing the individual components of the Date-Time. The following table specifies each of the possible date-time components along with how they are encoded.

Table 7-3. Date-Time components
Component | Value | Type
Year | Offset from 2000 | Integer (7.1.5 Integer)
MonthDay | Month * 32 + Day | 9-bit Unsigned Integer (7.1.9 n-bit Unsigned Integer) where day is a value in the range 1-31 and month is a value in the range 1-12.
Time | ((Hour * 64) + Minutes) * 64 + seconds | 17-bit Unsigned Integer (7.1.9 n-bit Unsigned Integer)
FractionalSecs | Fractional seconds | Unsigned Integer (7.1.6 Unsigned Integer) representing the fractional part of the seconds with digits in reverse order to preserve leading zeros
TimeZone | TZHours * 64 + TZMinutes | 11-bit Unsigned Integer (7.1.9 n-bit Unsigned Integer) representing a signed integer offset by 896 ( = 14 * 64 )
presence | Boolean presence indicator | Boolean (7.1.2 Boolean)

Change the content of the paragraph and the table to the one shown below by appending the highlighted text:

The Date-Time datatype representation is a sequence of values representing the individual components of the Date-Time. The following table specifies each of the possible date-time components along with how they are encoded. The value ranges of the date-time components follow the definitions of the XML Schema specification [XML Schema Datatypes] which for example prescribes the value range of the seconds to be between 0 and 60 to account for leap second representation and hour between 0 and 24 among others.

Table 7-3. Date-Time components
Component | Value | Type
Year | Offset from 2000 | Integer (7.1.5 Integer)
MonthDay | Month * 32 + Day | 9-bit Unsigned Integer (7.1.9 n-bit Unsigned Integer) where day is a value in the range 1-31 and month is a value in the range 1-12.
Time | ((Hour * 64) + Minutes) * 64 + seconds | 17-bit Unsigned Integer (7.1.9 n-bit Unsigned Integer) where Hour is a value in the range 0-24, Minutes is a value in the range 0-59 and seconds is a value in the range 0-60
FractionalSecs | Fractional seconds | Unsigned Integer (7.1.6 Unsigned Integer) representing the fractional part of the seconds with digits in reverse order to preserve leading zeros
TimeZone | TZHours * 64 + TZMinutes | 11-bit Unsigned Integer (7.1.9 n-bit Unsigned Integer) representing a signed integer offset by 896 ( = 14 * 64 ) where TZHours is a value in the range [-14 .. 14] and TZMinutes is a value in the range [-59 .. 59]
presence | Boolean presence indicator | Boolean (7.1.2 Boolean)
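
The component formulas in Table 7-3 can be illustrated with the following non-normative Python sketch. All function names are hypothetical. The functions return the integer value of each component and do not write it to a bit stream. The extra check that the offset TimeZone value fits into 11 bits is an assumption of this sketch.

```python
def encode_month_day(month, day):
    """MonthDay component: Month * 32 + Day (fits into 9 bits)."""
    if not (1 <= month <= 12 and 1 <= day <= 31):
        raise ValueError("month or day out of range")
    return month * 32 + day


def encode_time(hour, minutes, seconds):
    """Time component: ((Hour * 64) + Minutes) * 64 + seconds (fits into 17 bits)."""
    if not (0 <= hour <= 24 and 0 <= minutes <= 59 and 0 <= seconds <= 60):
        raise ValueError("time component out of range")
    return (hour * 64 + minutes) * 64 + seconds


def encode_time_zone(tz_hours, tz_minutes):
    """TimeZone component: TZHours * 64 + TZMinutes, offset by 896."""
    if not (-14 <= tz_hours <= 14 and -59 <= tz_minutes <= 59):
        raise ValueError("time zone component out of range")
    value = tz_hours * 64 + tz_minutes + 896
    if not 0 <= value < 2 ** 11:
        # Assumption of this sketch: reject values that do not fit into 11 bits.
        raise ValueError("time zone does not fit into 11 bits")
    return value


def encode_fractional_secs(digits):
    """FractionalSecs component: fractional digits in reverse order."""
    if not digits.isdigit():
        raise ValueError("expected a non-empty string of decimal digits")
    # Reversing the digits preserves leading zeros: "025" becomes 520.
    return int(digits[::-1])
```

For instance, a fractional part of ".025" is carried as the unsigned integer 520, so the decoder can recover the leading zero by reversing the digits again.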

Section 7.1.5

27 June 2013

Below is the second paragraph of section 7.1.5 Integer:

If the associated schema datatype is derived from xsd:integer and the bounded range determined by its minInclusiveXS2, minExclusiveXS2, maxInclusiveXS2 and maxExclusiveXS2 facets has 4096 or fewer values, the value is represented as an n-bit Unsigned Integer where n is ⌈ log2 m ⌉ and m is the bounded range of the schema datatype.

Change the paragraph to the one shown below by appending the highlighted text:

If the associated schema datatype is derived from xsd:integer and the bounded range determined by its minInclusiveXS2, minExclusiveXS2, maxInclusiveXS2 and maxExclusiveXS2 facets has 4096 or fewer values, the value is represented as an n-bit Unsigned Integer offset from the minimum value in the range where n is ⌈ log2 m ⌉ and m is the bounded range of the schema datatype.
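
The offset rule can be illustrated with a short, non-normative Python sketch. The function name `encode_bounded_integer` is hypothetical, and the bounds are assumed to be already resolved to inclusive minimum and maximum values.

```python
def encode_bounded_integer(value, min_value, max_value):
    """Return (offset, n): the n-bit unsigned integer for a bounded xsd:integer."""
    m = max_value - min_value + 1  # number of values in the bounded range
    if m > 4096:
        raise ValueError("bounded range has more than 4096 values")
    if not min_value <= value <= max_value:
        raise ValueError("value outside the bounded range")
    # (m - 1).bit_length() equals ceil(log2(m)) for every m >= 1.
    n = (m - 1).bit_length()
    # The value is written as its offset from the minimum of the range.
    return value - min_value, n
```

For a range of 10 to 17 there are 8 possible values, so the value 15 would be written as the 3-bit unsigned integer 5.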

Section 8.5.3

19 August 2013 (1)

Remove the last sentence from Section "8.5.3 Schema-informed Element Fragment Grammar" that reads:

The content index of grammars ElementFragment and ElementFragmentTypeEmpty are both 1 (one).

Section 8.5.4.1.3

19 August 2013 (2)

Remove the third paragraph from Section "8.5.4.1.3 Type Grammars" that reads:

[Definition:]  For each type grammar Type i , an unique index number content is determined such that all non-terminal symbols of indices smaller than content have at least one AT terminal symbol and the rest of the non-terminal symbols in Type i  do not have AT terminal symbols on their right-hand side, where indices are assigned to non-terminal symbols in ascending order with the entry non-terminal symbol of Type i  being assigned index 0 (zero). There is also a content index associated with each TypeEmpty i  where its value is determined in the same manner as for Type i .

Section 8.5.4.1.3.1

19 August 2013 (3)

Remove the last sentence from Section "8.5.4.1.3.1 Simple Type Grammars" that reads:

The content index of grammar Type_i and TypeEmpty_i created from an XML Schema simple type definition is always 0 (zero).

Section 8.5.4.1.3.2

19 August 2013 (4)

An excerpt from Section "8.5.4.1.3.2 Complex Type Grammars" is given below:

The grammar TypeEmpty i is created by combining the sequence of attribute use grammars terminated by an empty {content type} grammar as follows:

TypeEmpty i = G 0 ⊕ G 1 ⊕ … ⊕ G n−1 ⊕ Content i

where the grammar Content i is created as follows:

Content i, 0 :
EE

The content index of grammar TypeEmpty i  is the index of its last non-terminal symbol.

Remove the last sentence from this excerpt that reads:

The content index of grammar TypeEmpty i  is the index of its last non-terminal symbol.

Also remove the last sentence from Section "8.5.4.1.3.2 Complex Type Grammars" that reads:

The content index of grammar Type i  created from an XML Schema complex type definition is the index of the first non-terminal symbol of Content i within the context of Type i .

Section 8.5.4.2.2

19 August 2013 (5)

Remove the paragraph from Section "8.5.4.2.2 Eliminating Duplicate Terminal Symbols" that reads:

When G i  is a type grammar, if both k and l are smaller than content index of G i , k ⊔ l is also considered to be smaller than content for the purpose of index comparison purposes. Otherwise, if either k or l is not smaller than content, k ⊔ l is considered to be larger than content.

Section 8.5.4.4.1

19 August 2013 (6)

In Section "8.5.4.4.1 Adding Productions when Strict is False", right after the first sentence that reads:

This section describes the process for augmenting the normalized grammars when the value of the strict option is false.

insert the following paragraph:

[Definition:] For each normalized element grammar Element_i , a unique index number content is determined such that: for each set of grammar productions with left-hand side non-terminal symbol of index smaller than content there is at least one production with AT terminal symbol and the rest of the productions in Element_i with left-hand side non-terminal symbols of indices equal to or greater than content do not have AT terminal symbols. The left-hand side non-terminal symbols indices are assigned in ascending order with the entry non-terminal symbol of Element_i being assigned index 0 (zero). If there are no productions in Element_i that have AT terminal symbols on their right-hand side, the content index is 0.

Modify the second sentence from Section "8.5.4.4.1 Adding Productions when Strict is False" that reads:

For each normalized element grammar Element_i , create a copy Element_i,content2 of Element_i,content where the index "content" is the content of the type of the element from which Element_i was created.

Change it to:

For each normalized element grammar Element_i , create a copy Element_i,content2 of Element_i,content where the index "content" is the content of the Element_i grammar.
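
As a non-normative illustration of the content index definition given in this entry, the following Python sketch computes the index for a toy grammar model. The function name `content_index` and the representation of a grammar as a list of terminal-name lists (one list per non-terminal, in index order) are assumptions of this sketch and not part of the specification.

```python
def content_index(grammar):
    """Return the content index of a normalized element grammar.

    `grammar` is a list indexed by non-terminal; each entry lists the terminal
    symbols of that non-terminal's productions, e.g. ["AT(a)", "SE(x)", "EE"].
    """
    has_at = [any(t.startswith("AT") for t in prods) for prods in grammar]
    content = 0
    # Count the leading non-terminals that have at least one AT production.
    while content < len(has_at) and has_at[content]:
        content += 1
    if any(has_at[content:]):
        # The definition presupposes that AT productions are confined to a
        # prefix of the non-terminals; otherwise no such index exists.
        raise ValueError("AT productions found at or after the content index")
    return content
```

A grammar without any AT terminal symbols yields 0, matching the last sentence of the definition.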

A. Errata Changes (in reverse chronological order)