This document is also available in these non-normative formats: XML .
Copyright © 2009 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply.
This document is the specification of the Efficient XML Interchange (EXI) format. EXI is a very compact representation for the Extensible Markup Language (XML) Information Set that is intended to simultaneously optimize performance and the utilization of computational resources. The EXI format uses a hybrid approach drawn from the information and formal language theories, plus practical techniques verified by measurements, for entropy encoding XML information. Using a relatively simple algorithm, which is amenable to fast and compact implementation, and a small set of datatype representations, it reliably produces efficient encodings of XML event streams. The event grammar production system and format definition of EXI are presented.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This is the Candidate Recommendation of the Efficient XML Interchange (EXI) Format 1.0. It is made available for review by W3C members and other interested parties. It has been produced by the Efficient XML Interchange (EXI) Working Group, which is part of the Extensible Markup Language (XML) Activity.
W3C publishes a Candidate Recommendation to indicate that the document is believed to be stable and to encourage implementation by the developer community. Publication as a Candidate Recommendation does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
The list of changes since the last publication is exhibited in the Change Log . A diff-marked version against the previous version of this document is also available.
The EXI Working Group plans to submit this specification for consideration as a W3C Proposed Recommendation when the following exit criteria have been met:
A test suite is available that tests each identified EXI 1.0 feature, both required and optional.
At least two implementations, at least one of which is publicly available, have demonstrated interoperability of each feature. The working group will create an implementation report based on the gathered evidence and make it available on its group web page.
The Working Group has responded formally to all issues raised against this document during the Candidate Recommendation review period.
There is no formal implementation report available at the present time.
The following feature is considered to be at risk:
Deriving Set of Characters from XML Schema Regular Expressions
The above feature may be removed in subsequent revisions of this specification if implementations are found to lack interoperability with regard to the feature during the execution of the interoperability tests, and the problem cannot be resolved by simply clarifying the existing descriptions of the feature but would instead require semantic changes to it.
Implementers are encouraged to provide feedback by 01 March 2010. Please send comments about this document by email to public-exi-comments@w3.org, a mailing list with a public archive. When preparing comments to send in, please provide a separate email message for each distinct issue to the extent possible. This mailing list is reserved for comments; it is inappropriate to send discussion email to this address. Discussion should take place on the public-exi@w3.org mailing list (public archive).
This document was produced by a group operating under the 5 February 2004 W3C Patent Policy . W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy .
1. Introduction
1.1 History and Design
1.2 Notational Conventions and Terminology
2. Design Principles
3. Basic Concepts
4. EXI Streams
5. EXI Header
5.1 EXI Cookie
5.2 Distinguishing Bits
5.3 EXI Format Version
5.4 EXI Options
6. Encoding EXI Streams
6.1 Determining Event Codes
6.2 Representing Event Codes
6.3 Fidelity Options
7. Representing Event Content
7.1 Built-in EXI Datatype Representations
7.1.1 Binary
7.1.2 Boolean
7.1.3 Decimal
7.1.4 Float
7.1.5 Integer
7.1.6 Unsigned Integer
7.1.7 QName
7.1.8 Date-Time
7.1.9 n-bit Unsigned Integer
7.1.10 String
7.1.10.1 Restricted Character Sets
7.1.11 List
7.2 Enumerations
7.3 String Table
7.3.1 String Table Partitions
7.3.2 Partitions Optimized for Frequent use of Compact Identifiers
7.3.3 Partitions Optimized for Frequent use of String Literals
7.4 Datatype Representation Map
8. EXI Grammars
8.1 Grammar Notation
8.1.1 Fixed Event Codes
8.1.2 Variable Event Codes
8.2 Grammar Event Codes
8.3 Pruning Unneeded Productions
8.4 Built-in XML Grammars
8.4.1 Built-in Document Grammar
8.4.2 Built-in Fragment Grammar
8.4.3 Built-in Element Grammar
8.5 Schema-informed Grammars
8.5.1 Schema-informed Document Grammar
8.5.2 Schema-informed Fragment Grammar
8.5.3 Schema-informed Element Fragment Grammar
8.5.4 Schema-informed Element and Type Grammars
8.5.4.1 EXI Proto-Grammars
8.5.4.1.1 Grammar Concatenation Operator
8.5.4.1.2 Element Grammars
8.5.4.1.3 Type Grammars
8.5.4.1.3.1 Simple Type Grammars
8.5.4.1.3.2 Complex Type Grammars
8.5.4.1.3.3 Complex Ur-Type Grammar
8.5.4.1.4 Attribute Uses
8.5.4.1.5 Particles
8.5.4.1.6 Element Terms
8.5.4.1.7 Wildcard Terms
8.5.4.1.8 Model Group Terms
8.5.4.1.8.1 Sequence Model Groups
8.5.4.1.8.2 Choice Model Groups
8.5.4.1.8.3 All Model Groups
8.5.4.2 EXI Normalized Grammars
8.5.4.2.1 Eliminating Productions with no Terminal Symbol
8.5.4.2.2 Eliminating Duplicate Terminal Symbols
8.5.4.3 Event Code Assignment
8.5.4.4 Undeclared Productions
8.5.4.4.1 Adding Productions when Strict is False
8.5.4.4.2 Adding Productions when Strict is True
9. EXI Compression
9.1 Blocks
9.2 Channels
9.2.1 Structure Channel
9.2.2 Value Channels
9.3 Compressed Streams
10. Conformance
10.1 EXI Stream Conformance
10.2 EXI Processor Conformance
A References
A.1 Normative References
A.2 Other References
B Infoset Mapping
B.1 Document Information Item
B.2 Element Information Items
B.3 Attribute Information Item
B.4 Processing Instruction Information Item
B.5 Unexpanded Entity Reference Information item
B.6 Character Information item
B.7 Comment Information item
B.8 Document Type Declaration Information item
B.9 Unparsed Entity Information Item
B.10 Notation Information Item
B.11 Namespace Information Item
C XML Schema for EXI Options Document
D Initial Entries in String Table Partitions
D.1 Initial Entries in Uri Partition
D.2 Initial Entries in Prefix Partitions
D.3 Initial Entries in Local-Name Partitions
E Deriving Set of Characters from XML Schema Regular Expressions
F Content Coding and Internet Media Type
F.1 Content Coding
F.2 Internet Media Type
G Example Encoding (Non-Normative)
H Schema-informed Grammar Examples (Non-Normative)
H.1 Proto-Grammar Examples
H.2 Normalized Grammar Examples
H.3 Complete Grammar Examples
I Recent Specification Changes (Non-Normative)
I.1 Changes from Last Call Working Draft
I.2 Changes from Fourth Public Working Draft
I.3 Changes from Third Public Working Draft
I.4 Changes from Second Public Working Draft
I.5 Changes from First Public Working Draft
J Acknowledgements (Non-Normative)
The Efficient XML Interchange (EXI) format is a very compact, high performance XML representation that was designed to work well for a broad range of applications. It simultaneously improves performance and significantly reduces bandwidth requirements without compromising efficient use of other resources such as battery life, code size, processing power, and memory.
EXI uses a grammar-driven approach that achieves very efficient encodings using a straightforward encoding algorithm and a small set of datatype representations. Consequently, EXI processors are relatively simple and can be implemented on devices with limited capacity.
EXI is schema "informed", meaning that it can utilize available schema information to improve compactness and performance, but does not depend on accurate, complete or current schemas to work. It supports arbitrary schema extensions and deviations and also works very effectively with partial schemas or in the absence of any schema. The format itself also does not depend on any particular schema language, or format, for schema information.
[Definition:] A program module called an EXI processor, whether it is implemented in software or hardware, is used by application programs to encode their structured data into EXI streams and/or to decode EXI streams to make the structured data accessible. The former and the latter of these roles of EXI processors are called [Definition:] EXI stream encoder and [Definition:] EXI stream decoder, respectively. This document not only specifies the EXI format, but also defines errors that EXI processors are required to detect and act upon.
The primary goal of this document is to define the EXI format completely without leaving ambiguity so as to make it feasible for implementations to interoperate. As such, the document lends itself to describing the design and features of the format in a systematic manner, often declaratively with relatively few prosaic annotations and examples. Those readers who prefer a step-by-step introduction to the EXI format design and features are suggested to start with the non-normative [EXI Primer] .
EXI is the result of extensive work carried out by the W3C's XML Binary Characterization (XBC) and Efficient XML Interchange (EXI) Working Groups. XBC was chartered to investigate the costs and benefits of an alternative form of XML, and formulate a way to objectively evaluate the potential of a substitute format for XML. Based on XBC's recommendations, EXI was chartered, first to measure, evaluate, and compare the performance of various XML technologies (using metrics developed by XBC [XBC Measurement Methodologies] ), and then, if it appeared suitable, to formulate a recommendation for a W3C format specification. The measurement results and analyses are presented elsewhere [EXI Measurements Note] . The format described in this document is the specification so recommended.
The functional requirements of the EXI format are those that were prepared by the XBC WG in their analysis of the desirable properties of a high performance representation for XML [XBC Properties] . Those properties were derived from a very broad set of use cases also identified by the XBC working group [XBC Use Cases] .
The design of the format presented here is largely based on the results of the measurements carried out by the group to evaluate the performance characteristics (mainly of processing efficiency and compactness) of various existing formats. The EXI format is based on Efficient XML [Efficient XML] , including for example the basic heuristic grammar approach, compression algorithm, and resulting entropy encoding.
EXI is compatible with XML at the XML Information Set [XML Information Set] level, rather than at the XML syntax level. This permits it to encapsulate an efficient alternative syntax and grammar for XML, while facilitating at least the potential for minimizing the impact on XML application interoperability.
The key words MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL, when they appear EMPHASIZED in this document, are to be interpreted as described in RFC 2119 [IETF RFC 2119] . Other terminology used to describe the EXI format is defined in the body of this specification.
The terms event and stream are used throughout this document to denote EXI event and EXI stream respectively, unless they are explicitly qualified to mean otherwise.
This document specifies an abstract grammar for EXI. In grammar notation, all terminal symbols are represented in plain text and all non-terminal symbols are represented in italics . Grammar productions are represented as follows:
LeftHandSide : Terminal NonTerminal
A set of one or more grammar productions that share the same left-hand side non-terminal symbol are often presented together, annotated with event codes that specify how events matching the terminal symbols of the associated productions are represented in the EXI stream, as follows:
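LeftHandSide : | | |
Terminal 1 | NonTerminal 1 | EventCode 1 |
Terminal 2 | NonTerminal 2 | EventCode 2 |
Terminal 3 | NonTerminal 3 | EventCode 3 |
... | | |
Terminal n | NonTerminal n | EventCode n |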
Section 8.1 Grammar Notation introduces additional notations for describing productions and event codes in grammars. Those additional notations facilitate concise representation of the EXI grammar system.
[Definition:] In this document, the term qname is used to denote a QName XS2 . QName values are composed of a uri, a local-name and an optional prefix. Two qnames are considered equal if they have the same uri and local-name, regardless of their prefix values. In cases where prefixes are not relevant, such as in the grammar notation, they are not specified by this document.
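The equality rule for qnames can be sketched in Python as follows. The class name `QName` and its field names are illustrative assumptions and are not defined by the EXI format; the sketch only shows that a prefix can be carried along with a qname while being excluded from comparison.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class QName:
    """Hypothetical qname value: uri and local-name, plus an optional prefix."""
    uri: str
    local_name: str
    # compare=False keeps the prefix out of the generated __eq__ and __hash__.
    prefix: Optional[str] = field(default=None, compare=False)


a = QName("http://www.w3.org/2001/XMLSchema", "string", prefix="xsd")
b = QName("http://www.w3.org/2001/XMLSchema", "string", prefix="xs")
c = QName("http://www.w3.org/2001/XMLSchema", "integer", prefix="xsd")

same = (a == b)       # prefixes differ, uri and local-name agree
different = (a == c)  # local-names differ
```

Because the prefix is also left out of the hash, two qnames that differ only in prefix land on the same key when used in a dictionary or set.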
Terminal symbols that are qualified with a qname permit the use of a wildcard symbol (*) in place of or as part of a qname. The forms of terminal symbols involving qname wildcards used in grammars and their definitions are described in the table below.
Wildcard | Definition |
---|---|
SE (*) | The terminal symbol that matches a start element (SE) event with any qname. |
SE ( uri : *) | The terminal symbol that matches a start element (SE) event with any local-name in namespace uri . |
AT (*) | The terminal symbol that matches an attribute (AT) event with any qname. |
AT ( uri : *) | The terminal symbol that matches an attribute (AT) event with any local-name in namespace uri . |
Several prefixes are used throughout this document to designate certain namespaces. The bindings shown below are assumed; however, any prefixes can be used in practice if they are properly bound to the namespaces.
Prefix | Namespace Name |
---|---|
exi |
|
xsd | http://www.w3.org/2001/XMLSchema |
xsi | http://www.w3.org/2001/XMLSchema-instance |
In describing the layout of an EXI format construct, a pair of square brackets [ ] are used to surround the name of a field to denote that the occurrence of the field is optional in the structure of the part or component that contains the field.
In arithmetic expressions, the notation ⌈ x ⌉ where x represents a real number denotes the ceiling of x , that is, the smallest integer greater than or equal to x .
When it is stated that strings are sorted in lexicographical order, the comparison is done character by character, and the order among characters is determined by comparing their Unicode code points.
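As a small illustration of code point ordering, the Python sketch below sorts strings with the built-in comparison, which compares strings character by character by Unicode code point. The helper names are hypothetical. The second helper sorts by UTF-16 code units instead, only to show that the two orders can differ for characters outside the Basic Multilingual Plane.

```python
def sort_by_code_point(strings):
    """Sort strings character by character using Unicode code points."""
    # Python compares str values by code point, so sorted() is sufficient.
    return sorted(strings)


def sort_by_utf16_code_unit(strings):
    """Contrast case (not the order described above): sort by UTF-16 code units."""
    return sorted(strings, key=lambda s: s.encode("utf-16-be"))


ascii_sorted = sort_by_code_point(["b", "B", "a", "Z"])

# U+FF5E precedes U+10000 by code point, but U+10000 is encoded with
# surrogates (D800 DC00) that sort before FF5E as UTF-16 code units.
by_code_point = sort_by_code_point(["\U00010000", "\uFF5E"])
by_code_unit = sort_by_utf16_code_unit(["\uFF5E", "\U00010000"])
```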
The following design principles were used to guide the development of EXI and encourage consistent design decisions. They are listed here to provide insight into the EXI design rationale and to anchor discussions on desirable EXI traits.
One of the primary objectives of EXI is to maximize the number of systems, devices and applications that can communicate using XML data. Specialized approaches optimized for specific use cases should be avoided.
To reach the broadest set of small, mobile and embedded applications, simple, elegant approaches are preferred to large, analytical or complex ones.
EXI must be competitive with hand-optimized binary formats so it can be used by applications that require this level of efficiency.
EXI must deal flexibly and efficiently with documents that contain arbitrary schema extensions or deviate from their schema. Documents that contain schema deviations should not cause encoding to fail.
EXI must integrate well with existing XML technologies, minimizing the changes required to those technologies. It must be compatible with the XML Information Set [XML Information Set] , without significant subsetting or supersetting, in order to maintain interoperability with existing and prospective XML specifications.
EXI achieves broad generality, flexibility, and performance, by unifying concepts from formal language theory and information theory into a single, relatively simple algorithm. The algorithm uses a grammar to determine what is likely to occur at any given point in an XML document and encodes the most likely alternatives in fewer bits. The fully generalized algorithm works for any language that can be described by a grammar (e.g., XML, Java, HTTP, etc.); however, EXI is optimized specifically for XML languages.
The built-in EXI grammars accept any XML document or fragment and may be augmented with productions derived from XML Schemas [XML Schema Structures] [XML Schema Datatypes] , RELAX NG schemas [ISO/IEC 19757-2:2003] , DTDs [XML 1.0] [XML 1.1] or other sources of information about what is likely to occur in a set of XML documents. The EXI stream encoder uses the grammar to map a stream of XML information items onto a smaller, lower entropy, stream of events .
The EXI stream encoder then represents the stream of events using a set of simple variable length codes called event codes . Event codes are similar to Huffman codes [Huffman Coding] , but are much simpler to compute and maintain. They are encoded directly as a sequence of values, or if additional compression is desired, they are passed to the EXI compression algorithm, which replaces frequently occurring event patterns to further reduce size.
When schemas are used, EXI also supports a user-customizable set of Datatype Representations for efficiently representing typed values.
[Definition:] An EXI stream is an EXI header followed by an EXI body. [Definition:] The EXI body carries the content of the document, while the EXI header, among its roles, communicates the options that were used for encoding the EXI body. Section 5. EXI Header describes the EXI header . Values in an EXI stream are packed into bytes most significant bit first.
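The most-significant-bit-first packing rule can be sketched with a small bit writer. The class `BitWriter` and its methods are hypothetical names chosen for this illustration, and filling the last partial byte with zero bits is an assumption of the sketch rather than a rule stated here. The sample values mirror the header fields described in section 5: the Distinguishing Bits 1 0, a presence bit of 0 and the 5-bit version field 0 0000.

```python
class BitWriter:
    """Accumulates n-bit unsigned values into bytes, most significant bit first."""

    def __init__(self):
        self._bits = []  # individual bits, in stream order

    def write(self, value, nbits):
        if value < 0 or value >= (1 << nbits):
            raise ValueError("value does not fit in nbits")
        # Emit the highest-order bit of the value first.
        for shift in range(nbits - 1, -1, -1):
            self._bits.append((value >> shift) & 1)

    def to_bytes(self):
        # Fill the final partial byte with zero bits (a choice of this sketch).
        padded = self._bits + [0] * (-len(self._bits) % 8)
        out = bytearray()
        for i in range(0, len(padded), 8):
            byte = 0
            for bit in padded[i:i + 8]:
                byte = (byte << 1) | bit
            out.append(byte)
        return bytes(out)


w = BitWriter()
w.write(0b10, 2)     # Distinguishing Bits
w.write(0, 1)        # presence bit for EXI Options (absent)
w.write(0b00000, 5)  # version field: final version 1
packed = w.to_bytes()
```

With these values the eight bits 1 0 0 0 0 0 0 0 fill exactly one byte, so the first value written occupies the high-order end of that byte.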
[Definition:] The building block of an EXI body is an EXI event . An EXI body consists of a sequence of EXI events representing an EXI document or an EXI fragment .
The EXI events permitted at any given position in an EXI stream are determined by the EXI grammar. As is the case with XML, the events occur with nesting pairs of matching start element and end element events where any pair does not intersect with another except when it is fully contained in the other. The EXI grammar incorporates knowledge of the XML grammar and may be augmented and refined using schema information and fidelity options. The EXI grammar is formally specified in section 8. EXI Grammars .
An EXI body can represent an EXI document with a single root element or an EXI fragment with zero or more root elements.
[Definition:] EXI documents are EXI bodies with a single root element that conform to the Built-in Document Grammar (See 8.4.1 Built-in Document Grammar ) or Schema-informed Document Grammar (See 8.5.1 Schema-informed Document Grammar ).
[Definition:] EXI fragments are EXI bodies with zero or more root elements that conform to the Built-in Fragment Grammar (See 8.4.2 Built-in Fragment Grammar ) or Schema-informed Fragment Grammar (See 8.5.2 Schema-informed Fragment Grammar ).
[Definition:] When schema information is available to describe the contents of an EXI body, such an EXI stream is a schema-informed EXI stream , and the EXI body is interpreted according to the Schema-informed Document Grammar (See 8.5.1 Schema-informed Document Grammar ) or Schema-informed Fragment Grammar (See 8.5.2 Schema-informed Fragment Grammar ).
[Definition:] Otherwise, an EXI stream is a schema-less EXI stream , and the EXI body is interpreted according to the Built-in Document Grammar (See 8.4.1 Built-in Document Grammar ) or Built-in Fragment Grammar (See 8.4.2 Built-in Fragment Grammar ).
The following table summarizes the EXI event types and associated event content that occur in an EXI stream. [Definition:] The content of an event consists of content items , and the content items appear in an EXI stream in the order they are shown in the table, following the event code that marks the start of each event . In addition, the table includes the grammar notation used to represent each event in this specification. Each event in an EXI stream participates in a mapping system that relates events to XML Information Items so that an EXI document or an EXI fragment as a whole serves to represent an XML Information Set. The table shows XML Information Items relevant to each EXI event type. Appendix B Infoset Mapping describes the mapping system in detail.
EXI Event Type | Event Content (Content Items) | Grammar Notation (Terminal Symbols) | Information Item |
---|---|---|---|
Start Document | | SD | B.1 Document Information Item |
End Document | | ED | |
Start Element | qname | SE ( qname ) | B.2 Element Information Items |
 | | SE (*) | |
 | | SE ( uri : *) | |
End Element | | EE | |
Attribute | qname, value | AT ( qname ) | B.3 Attribute Information Item |
 | | AT (*) | |
 | | AT ( uri : *) | |
Characters | value | CH | B.6 Character Information item |
Namespace Declaration | uri , prefix , local-element-ns | NS | B.11 Namespace Information Item |
Comment | text | CM | B.7 Comment Information item |
Processing Instruction | name, text | PI | B.4 Processing Instruction Information Item |
DOCTYPE | name, public, system, text | DT | B.8 Document Type Declaration Information item |
Entity Reference | name | ER | B.5 Unexpanded Entity Reference Information item |
Self Contained | | SC | |
Section 6. Encoding EXI Streams describes the algorithm used to encode events in the EXI stream. As indicated in the table above, there are some event types that carry content with their event instances while other event types function as markers without content.
SE events may be followed by a series of NS events. Each NS event either associates a prefix with a URI, assigns a default namespace, or, in the case of a namespace declaration with an empty URI, rescinds one such association in effect at the point of its occurrence. The association or disassociation caused by an NS event stays in effect until the corresponding EE event occurs.
As in XML, the namespace of a particular element may be specified by a namespace declaration preceding the element or a local namespace declaration following the element name. When the namespace is specified by a local namespace declaration, the local-element-ns flag of the associated NS event is set to true and the prefix of the element is set to the prefix of that NS event. When the namespace is specified by a previous namespace declaration, the local-element-ns flag of all local NS events is false and the prefix of the element is set according to the prefix component of the element qname . The series of NS events associated with a particular element may include at most one NS event with its local-element-ns flag set to true. The uri of an NS event with its local-element-ns flag set to true MUST match the uri of the associated SE event.
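The two constraints on local-element-ns stated above can be checked mechanically. The following Python sketch uses a hypothetical `NSEvent` record and a hypothetical `check_ns_events` helper; neither name comes from this specification, and the sketch covers only these two constraints, not namespace scoping in general.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NSEvent:
    """Hypothetical in-memory form of an NS event's content items."""
    uri: str
    prefix: str
    local_element_ns: bool


def check_ns_events(se_uri: str, ns_events: List[NSEvent]) -> List[str]:
    """Return the violated constraints; an empty list means none were found."""
    problems = []
    local = [ns for ns in ns_events if ns.local_element_ns]
    # At most one NS event per element may have local-element-ns set to true.
    if len(local) > 1:
        problems.append("more than one NS event has local-element-ns set to true")
    # The uri of such an NS event must match the uri of the SE event.
    for ns in local:
        if ns.uri != se_uri:
            problems.append(
                "NS uri %r does not match SE uri %r" % (ns.uri, se_uri)
            )
    return problems


ok = check_ns_events(
    "urn:example:a",
    [NSEvent("urn:example:a", "a", True), NSEvent("urn:example:b", "b", False)],
)
mismatch = check_ns_events(
    "urn:example:a", [NSEvent("urn:example:b", "b", True)]
)
```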
An SE event may be followed by an SC event, indicating the element is self-contained and can be read independently from the rest of the EXI body. Applications may use self-contained elements to index portions of the EXI body for random access.
The representation of event codes, which identify the event type and start each event, is described in 6.2 Representing Event Codes . Each item in the event content has a datatype representation associated with it as shown in the following table. The content of each event , if any, is encoded as a sequence of items, each encoded according to its datatype representation, in order starting with the first item followed by subsequent items.
Content item | Used in | Datatype representation |
---|---|---|
name | PI, DT, ER | 7.1.10 String |
prefix | NS | 7.1.10 String |
local-element-ns | NS | 7.1.2 Boolean |
public | DT | 7.1.10 String |
qname | SE, AT | 7.1.7 QName |
system | DT | 7.1.10 String |
text | CM, PI | 7.1.10 String |
uri | NS | 7.1.10 String |
value | CH, AT | According to the schema |
Content items other than value have inherent, fixed datatype representations independent of their uses. The datatype representation used for each occurrence of the value content item depends on the schema datatype XS2 , if any, that is in effect for that value . The String datatype representation (see 7.1.10 String ) is used for values that do not have an associated schema datatype, are schema-invalid, are chosen not to be represented by their associated datatype representations, or occur in mixed content. Section 7. Representing Event Content describes how each of the types listed above is encoded in an EXI stream.
Each EXI stream begins with an EXI header. [Definition:] The EXI header can identify EXI streams, distinguish EXI streams from text XML documents, identify the version of the EXI format being used, and specify the options used to process the body of the EXI stream. The EXI header has the following structure:
[ EXI Cookie ] | Distinguishing Bits | Presence Bit for EXI Options | EXI Format Version | [ EXI Options ] | [Padding Bits] |
The EXI Options field within an EXI header is optional. Its presence is indicated by the value of the presence bit that follows Distinguishing Bits . Presence and absence are indicated by the values 1 and 0, respectively.
When the compression option is true, or the alignment option is one of byte-alignment or pre-compression , padding bits of the minimum length required to make the whole header byte-aligned are added at the end of the header. On the other hand, there are no padding bits when the alignment in use is bit-packed . The padding bits field, if present, can contain any bit values as its contents.
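The padding rule amounts to rounding the header length up to the next multiple of eight bits. A minimal Python sketch follows, assuming the header length in bits (before padding) is already known; the function name and parameter names are hypothetical, while the option values are the ones named above.

```python
def header_padding_bits(header_bit_length, compression=False, alignment="bit-packed"):
    """Number of padding bits that end the header under the rule above."""
    if compression or alignment in ("byte-alignment", "pre-compression"):
        # Smallest non-negative count that makes the total a multiple of 8.
        return -header_bit_length % 8
    # Bit-packed streams without compression carry no padding bits.
    return 0


pad_aligned = header_padding_bits(8, alignment="byte-alignment")
pad_compressed = header_padding_bits(13, compression=True)
pad_bit_packed = header_padding_bits(13)
```

For a 13-bit header with compression enabled, three padding bits bring the header to 16 bits; the same header in a bit-packed stream is followed immediately by the EXI body.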
The details of the EXI Cookie , Distinguishing Bits , EXI Format Version and EXI Options are described in the following sections.
[Definition:] An EXI header MAY start with an EXI Cookie , which is a four byte field that serves to indicate that the stream of which it is a part is an EXI stream. The four byte field consists of four characters " $ " , " E ", " X " and " I " in that order, each represented as an ASCII octet, as follows.
' $ ' | ' E ' | ' X ' | ' I ' |
This four byte sequence is particular to EXI and specific enough to distinguish EXI streams from a broad range of data types currently used on the Web. While the EXI cookie is optional, its use is RECOMMENDED in the EXI header when the EXI stream is exchanged in a context where a longer, more specific content-based datatype identifier is desired than that provided by the Distinguishing Bits , whose role is more narrowly focused on distinguishing EXI streams from XML documents.
[Definition:] The second part in the EXI header is the Distinguishing Bits , which is a two bit field of which the first bit contains the value 1 and the second bit contains the value 0, as follows.
1 | 0 |
Unlike the optional EXI cookie that MAY precede this field, the presence of Distinguishing Bits is REQUIRED in the EXI header. It is used to distinguish EXI streams from text XML documents in the absence of an EXI cookie . This two bit sequence is the minimum that suffices to distinguish EXI streams from XML documents since it is the minimum length bit pattern that cannot occur as the first two bits of a well-formed XML document represented in any one of the conventional character encodings, such as UTF-8, UTF-16, UCS-2, UCS-4, EBCDIC, ISO 8859, Shift-JIS and EUC, according to XML 1.0 [XML 1.0] [XML 1.1] . Therefore, XML Processors that do not support EXI are expected to reject an EXI stream as soon as they read and process the first byte from the stream.
Systems that use EXI streams as well as XML documents can reliably look at the Distinguishing Bits to determine whether to interpret a particular stream as XML or EXI.
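A system that accepts both kinds of input might apply this check roughly as sketched below in Python. The function name `looks_like_exi` is hypothetical. The sketch assumes the header bits are packed most significant bit first, so that the Distinguishing Bits are the two high-order bits of the first byte after any EXI Cookie, and it inspects nothing beyond those bits.

```python
EXI_COOKIE = b"$EXI"  # the four ASCII octets '$', 'E', 'X', 'I'


def looks_like_exi(data: bytes) -> bool:
    """Check for an optional EXI Cookie followed by the Distinguishing Bits 1 0."""
    if data.startswith(EXI_COOKIE):
        data = data[len(EXI_COOKIE):]
    if not data:
        return False
    # The Distinguishing Bits are the two most significant bits of the byte.
    return (data[0] >> 6) == 0b10


exi_without_cookie = looks_like_exi(b"\x80")
exi_with_cookie = looks_like_exi(b"$EXI\x80")
text_xml = looks_like_exi(b"<?xml version='1.0'?><a/>")
utf8_bom_xml = looks_like_exi(b"\xef\xbb\xbf<a/>")
```

A positive result only says that the input is not text XML in one of the conventional encodings; it does not validate the rest of the header.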
[Definition:] The fourth part in the EXI header is the EXI Format Version , which identifies the version of the EXI format being used. EXI format version numbers are integers. Each version of the EXI Format Specification specifies the corresponding EXI format version number to be used by conforming implementations. The EXI format version number that corresponds with this version of the EXI format specification is 1 (one).
The first bit of the version field indicates whether the version is a preview or final version of the EXI format. A value of 0 indicates this is a final version and a value of 1 indicates this is a preview version. Final versions correspond to final, approved versions of the EXI format specification. An EXI processor that implements a final version of the EXI format specification is REQUIRED to process EXI streams that have a version field with its first bit set to 0 followed by a version number that corresponds to the version of the EXI specification the processor implements. The behavior of an EXI processor on an EXI stream with its first bit set to 0 followed by a version not corresponding to a version implemented by the processor is not constrained by this specification. For example, the EXI processor MAY reject such a stream outright or it MAY attempt to process the EXI body. Preview versions of the EXI format are useful for gaining implementation and deployment experience prior to finalizing a particular version of the EXI format. While preview versions may match drafts of this specification, they are not governed by this specification and the behaviour of EXI processors encountering preview versions of the EXI format is implementation dependent. Implementers are free to coordinate to achieve interoperability between different preview versions of the EXI format.
Following the first bit of the version is a sequence of one or more 4-bit unsigned integers representing the version number. The version number is determined by summing this sequence of 4-bit unsigned values and adding 1 (one). The sequence is terminated by any 4-bit unsigned integer with a value in the range 0-14. As such, the first 15 version numbers are represented by 4 bits, the next 15 are represented by 8 bits, etc.
Given an EXI stream with its stream cursor positioned just past the first bit of the EXI format version field, the EXI format version number can be computed by going through the following steps with version number initially set to 1.
The following are example EXI format version numbers.
EXI Format Version Field | Description |
---|---|
1 0000 | Preview version 1 |
0 0000 | Final version 1 |
0 1110 | Final version 15 |
0 1111 0000 | Final version 16 |
0 1111 0001 | Final version 17 |
EXI processors conforming with the final version of this specification MUST use the 5-bit value 0 0000 as the version number.
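The version computation described above can be sketched as follows. This is a minimal, non-normative illustration: the function name `decode_exi_version` and the use of a string of '0'/'1' characters as input are assumptions made for readability, not part of the EXI format.

```python
def decode_exi_version(bits: str):
    """Decode an EXI format version field given as a string of '0'/'1'.

    Spaces are ignored so the examples from the table can be pasted as-is.
    Returns (is_preview, version_number, bits_consumed).
    """
    bits = bits.replace(" ", "")
    if not bits:
        raise ValueError("empty version field")
    is_preview = bits[0] == "1"   # first bit: 1 = preview, 0 = final
    version = 1                   # the version number is initially set to 1
    pos = 1                       # cursor just past the first bit
    while True:
        chunk = bits[pos:pos + 4]
        if len(chunk) < 4:
            raise ValueError("truncated version field")
        value = int(chunk, 2)     # next 4-bit unsigned integer
        version += value
        pos += 4
        if value != 15:           # any value in the range 0-14 terminates
            break
    return is_preview, version, pos
```

Applied to the example fields in the table above, `decode_exi_version("0 1111 0001")` yields a final version numbered 17 after consuming 9 bits.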
[Definition:] The fifth part of the EXI header is the EXI Options , which provides a way to specify the options used to encode the body of the EXI stream . [Definition:] The EXI Options are represented as an EXI Options document , which is an XML document encoded using the EXI format described in this specification. This results in a very compact header format that can be read and written with very little additional software.
The presence of EXI Options in its entirety is optional in the EXI header, and it is predicated on the value of the presence bit that follows the Distinguishing Bits. When EXI Options are present in the header, an EXI Processor MUST observe the specified options to process the EXI stream that follows. Otherwise, an EXI Processor may obtain the EXI options using another mechanism. There are no fallback option values provided by this specification for use in the absence of the whole EXI Options part.
EXI processors MAY provide external means for applications or users to specify EXI Options when the EXI Options document is absent. Such EXI processors are typically used in controlled systems where the knowledge about the effective EXI Options is shared prior to the exchange of EXI streams. The mechanisms to communicate out-of-band EXI Options and their representation used in such systems are implementation dependent.
The following table describes the EXI options that may be specified in the EXI Options document.
EXI Option | Description | Default Value |
---|---|---|
alignment | Alignment of event codes and content items | bit-packed |
compression | EXI compression is used to achieve better compactness | false |
strict | Strict interpretation of schemas is used to achieve better compactness | false |
fragment | Body is encoded as an EXI fragment instead of an EXI document | false |
preserve | Specifies whether comments, pis, etc. are preserved | all false |
selfContained | Enables self-contained elements | false |
schemaID | Identify the schema information, if any, used to encode the body | |
datatypeRepresentationMap | Specifies an alternate set of datatype representations for typed values in the EXI body | |
blockSize | Specifies the block size used for EXI compression | 1,000,000 |
valueMaxLength | Specifies the maximum string length of value content items to be considered for addition to the string table. | unbounded |
valuePartitionCapacity | Specifies the total capacity of value partitions in a string table | unbounded |
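[user defined meta-data] | User defined meta-data that applications may use to facilitate interpretation of the EXI stream | |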
Appendix C XML Schema for EXI Options Document provides an XML Schema describing the EXI Options document. This schema is designed to produce smaller headers for option combinations used when compactness is critical. The EXI Options document is represented as an EXI body informed by the above mentioned schema using the default options specified by the following XML document. An EXI Options document consists only of an EXI body, and MUST NOT start with an EXI header.
<header xmlns="http://www.w3.org/2009/exi"> <strict/> </header>
Note that this specification does not require EXI processors to read and process the schema prescribed for EXI options document ( C XML Schema for EXI Options Document ), in order to process EXI options documents. EXI processors MUST use the schema-informed grammars that stem from the schema in processing EXI options documents, beyond which there is no requirement as to the use of the schema, and implementations are free to use any methods to retrieve the instructions that observe the grammars for processing EXI options documents. Section 8.5 Schema-informed Grammars describes the system to derive schema-informed grammars from XML Schemas.
Below is a brief description of each EXI option.
[Definition:] The alignment option is used to control the alignment of event codes and content items. The value is one of bit-packed, byte-alignment or pre-compression, of which bit-packed is the default value assumed when the "alignment" element is absent in the EXI Options document. When the value of the compression option is set to true, alignment of the EXI body is governed by the rules specified in 9. EXI Compression instead of the alignment option value; the compression option value "true" thus effectively rescinds the effect of an alignment option value. The "alignment" element MUST NOT appear in an EXI options document when the "compression" element is present.
[Definition:] The alignment option value bit-packed indicates that the event codes and associated content are packed in bits without any padding in-between.
[Definition:] The alignment option value byte-alignment indicates that the event codes and associated content are aligned on byte boundaries. While byte-alignment generally results in EXI streams of larger sizes compared with their bit-packed equivalents, byte-alignment may help in some use cases that involve frequent copying of large arrays of scalar data directly out of the stream. It can also make it possible to work with data in-place and can make it easier to debug encoded data by allowing items on aligned boundaries to be easily located in the stream.
[Definition:] The alignment option value pre-compression indicates that all steps involved in compression (see section 9. EXI Compression) are to be done with the exception of the final step of applying the DEFLATE algorithm. The primary use case of pre-compression is to avoid a duplicate compression step when compression capability is built into the transport protocol. In this case, pre-compression just prepares the stream for later compression.
[Definition:] The compression option is a Boolean used to increase compactness using additional computational resources. The default value "false" is assumed when the "compression" element is absent in the EXI Options document . When set to true, the event codes and associated content are compressed according to 9. EXI Compression regardless of the alignment option value. As mentioned above, the "compression" element MUST NOT appear in an EXI options document when the "alignment" element is present.
[Definition:] The strict option is a Boolean used to increase compactness by using a strict interpretation of the schemas and omitting preservation of certain items, such as comments, processing instructions and namespace prefixes. The default value "false" is assumed when the "strict" element is absent in the EXI Options document. When set to true, those productions that have NS, CM, PI, ER, and SC terminal symbols are omitted from the EXI grammars, and schema-informed element and type grammars are restricted to only permit items declared in the schemas. Consequently, when the strict option is set to true, xsi:schemaLocation and xsi:noNamespaceSchemaLocation attributes are only permitted to appear in a schema-informed element where they match specific schema declarations (i.e., wildcards or ur-types). The "strict" element MUST NOT appear in an EXI options document when one of the "dtd", "prefixes", "comments", "pis" or "selfContained" elements is present in the same options document.
[Definition:] The fragment option is a Boolean that indicates whether the EXI body is an EXI document or an EXI fragment. When set to true, the EXI body is an EXI fragment. Otherwise, the EXI body is an EXI document. The default value "false" is assumed when the "fragment" element is absent in the EXI Options document. Unlike EXI documents, EXI fragments are capable of representing multiple elements at the root level. They are analogous in concept to external general parsed entities in XML in that they consist of a sequence of elements, processing instructions and comments in containers of their own that are physically separate from the documents in which they are to be used. An EXI fragment is formally defined in terms of its grammar in Sections 8.4.2 Built-in Fragment Grammar and 8.5.2 Schema-informed Fragment Grammar. The XML Information Set that an EXI stream is mapped onto contains a document information item if the stream represents an EXI document; it does not have a document information item if the stream represents an EXI fragment. The order among elements, processing instructions and comments that appear at the root in an EXI fragment is deemed significant and MUST be preserved by EXI processors.
[Definition:] The preserve option is a set of Booleans that can be set independently to control whether or how certain information items are preserved in the EXI stream. Section 6.3 Fidelity Options describes the set of information items affected by the preserve option. The elements "dtd", "prefixes", "comments" and "pis" MUST NOT appear in an EXI options document when the "strict" element is present in the same options document. The element "lexicalValues", on the other hand, is permitted to occur in the presence of the "strict" element.
[Definition:] The selfContained option is a Boolean used to enable the use of self-contained elements in the EXI stream. Self-contained elements may be read independently from the rest of the EXI body, allowing them to be indexed for random access. The "selfContained" element MUST NOT appear in an EXI options document when one of the "compression", "pre-compression" or "strict" elements is present in the same options document. The default value "false" is assumed when the "selfContained" element is absent from the EXI Options document.
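The mutual-exclusion constraints stated in the option descriptions above can be checked mechanically. The following is a non-normative sketch: the function name `find_option_conflicts` and the representation of an options document as a plain set of element names are assumptions made for illustration; only the constraints themselves come from the text.

```python
def find_option_conflicts(elements):
    """Return the pairs of element names that MUST NOT appear together.

    `elements` is the set of element names present in one EXI options
    document, e.g. {"strict", "lexicalValues"}.
    """
    conflicts = set()

    def forbid(name, others):
        # Record each offending pair once, in alphabetical order.
        if name in elements:
            for other in others:
                if other in elements:
                    conflicts.add(tuple(sorted((name, other))))

    # "alignment" and "compression" exclude each other.
    forbid("alignment", ["compression"])
    # "strict" excludes these preserve elements and "selfContained".
    forbid("strict", ["dtd", "prefixes", "comments", "pis", "selfContained"])
    # "selfContained" excludes compression, pre-compression and strict.
    forbid("selfContained", ["compression", "pre-compression", "strict"])
    return sorted(conflicts)
```

An empty result means none of the listed constraints is violated; "lexicalValues" is deliberately absent from the checks because it is permitted alongside "strict".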
[Definition:] The schemaID option may be used to identify the schema information used for processing the EXI body. When the "schemaID" element in the EXI options document contains the xsi:nil attribute with its value set to true, no schema information is used for processing the EXI body. When the value of the "schemaID" element is empty, no user defined schema information is used for processing the EXI body; however, the built-in XML schema types are available for use in the EXI body. When the schemaID option is absent (i.e., undefined), no statement is made about the schema information used to encode the EXI body and this information MUST be communicated out of band. This specification does not dictate the syntax or semantics of other values specified in this field. An example schemaID scheme is the use of a URI that is apt for globally identifying schema resources on the Web. The parties involved in the exchange are free to agree on the scheme of the schemaID field that is appropriate for their use to uniquely identify the schema information.
[Definition:] The datatypeRepresentationMap option, represented by a "datatypeRepresentationMap" element, specifies an alternate set of datatype representations for typed values in the EXI body as described in 7.4 Datatype Representation Map. When there are no "datatypeRepresentationMap" elements in the EXI Options document, no Datatype Representation Map is used for processing the EXI body. This option does not take effect when the value of the preserve.lexicalValues fidelity option is true (see 6.3 Fidelity Options), or when the EXI stream is a schema-less EXI stream.
[Definition:] The blockSize option specifies the block size used for EXI compression. When the "blockSize" element is absent in the EXI Options document, the default blockSize of 1,000,000 is used. The default blockSize is intentionally large but can be reduced for processing large documents on devices with limited memory.
[Definition:] The valueMaxLength option specifies the maximum length of value content items to be considered for addition to the string table. The default value "unbounded" is assumed when the "valueMaxLength" element is absent in the EXI Options document. Value content items whose length is larger than the valueMaxLength option value are not considered for addition to the string table.
[Definition:] The valuePartitionCapacity option specifies the maximum number of value content items in the string table at any given time. The default value "unbounded" is assumed when the "valuePartitionCapacity" element is absent in the EXI Options document. Section 7.3.3 Partitions Optimized for Frequent use of String Literals specifies the behavior of the string table when this capacity is reached.
[Definition:] The user defined meta-data conveys auxiliary information that applications may use to facilitate interpretation of the EXI stream. The user defined meta-data MUST NOT be interpreted in a way that alters or extends the EXI data format defined in this specification. User defined meta-data may be added to an EXI Options document just prior to the alignment option.
The rules for encoding a series of events as an EXI stream are very simple and are driven by a declarative set of grammars that describes the structure of an EXI stream. Every event in the stream is encoded using the same set of encoding rules, which are summarized as follows:
Self-contained (SC), namespace (NS) and attribute (AT) events associated with a given element occur directly after the start element (SE) event in the following order:
SC | NS | NS | ... | NS | AT (xsi:type) | AT (xsi:nil) | AT | AT | ... | AT |
Namespace (NS) events occur in document order. The xsi:type and xsi:nil attributes occur before all other AT events. When the grammar currently in effect for the element is either a built-in element grammar (see 8.4.3 Built-in Element Grammar) or a schema-informed element fragment grammar (see 8.5.3 Schema-informed Element Fragment Grammar), the remaining attributes can occur in any order. Otherwise, when the grammar in effect is a schema-informed element grammar or a schema-informed type grammar (see 8.5.4 Schema-informed Element and Type Grammars), the remaining attributes can occur in any order that is permitted by the grammar, though in practice they SHOULD occur in lexicographical order sorted first by the qname's local-name then by the qname's uri for achieving better compactness, where a qname is the qname of an attribute.
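The recommended attribute ordering can be illustrated with a short sketch. It is non-normative: the function name `order_attributes` and the modelling of each attribute qname as a `(uri, local_name)` pair are assumptions made for the example, and the sketch does not check whether a given grammar actually permits the resulting order.

```python
# Namespace URI of the XML Schema instance attributes xsi:type and xsi:nil.
XSI = "http://www.w3.org/2001/XMLSchema-instance"


def order_attributes(qnames):
    """Order attribute qnames given as (uri, local_name) pairs.

    xsi:type comes first, xsi:nil second, and the remaining attributes
    are sorted lexicographically by local-name, then by uri.
    """
    def sort_key(qname):
        uri, local_name = qname
        if uri == XSI and local_name == "type":
            return (0, "", "")
        if uri == XSI and local_name == "nil":
            return (1, "", "")
        return (2, local_name, uri)

    return sorted(qnames, key=sort_key)
```

Sorting on the local-name before the uri means that two attributes named "b" in different namespaces end up adjacent, ahead of an attribute named "id".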
EXI uses the same simple procedure described above, to encode well-formed documents, document fragments, schema-valid information items, schema-invalid information items, information items partially described by schemas and information items with no schema at all. Only the grammars that describe these items differ. For example, an element with no schema information is encoded according to the XML grammar defined by the XML specification, while an element with schema information is encoded according to the more specific grammar defined by that schema.
[Definition:] An event code is a sequence of 1 to 3 non-negative integers called parts used to identify each event in an EXI stream. The EXI grammars describe which events may occur at each point in an EXI stream and associate an event code with each one. (See 8.2 Grammar Event Codes for more description of event codes.) Section 6.1 Determining Event Codes describes in detail how the grammar is used to determine the event code of an event. Section 6.2 Representing Event Codes describes in detail how event codes are represented as bits. Section 6.3 Fidelity Options describes available fidelity options and how they affect the EXI stream. Section 7. Representing Event Content describes how the typed event contents are represented as bits.
The structure of an EXI stream is described by the EXI grammars, which are formally specified in section 8. EXI Grammars. Each grammar defines which events are permitted to occur at any given point in the EXI stream and provides a pre-assigned event code for each one. For example, the grammar productions below describe the events that can occur in a schema-informed EXI stream after the Start-Document (SD) event provided there are four global elements defined in the schema and assign an event code for each one. See 8.5.1 Schema-informed Document Grammar for the process used for generating the grammar productions below from the schema.
| Event Code | ||
---|---|---|---|
DocContent | |||
SE ("A") DocEnd | 0 | ||
SE ("B") DocEnd | 1 | ||
SE ("C") DocEnd | 2 | ||
SE ("D") DocEnd | 3 | ||
SE (*) DocEnd | 4 | ||
DT DocContent | 5 | 0 | |
CM DocContent | 5 | 1 | 0 |
PI DocContent | 5 | 1 | 1 |
At the point in an EXI stream where the above grammar productions are in effect, the event code of Start Element "A" (i.e. SE("A")) is 0. The event code of a DOCTYPE (DT) event at this point in the stream is 5.0, and so on.
Each event code is represented by a sequence of 1 to 3 parts that uniquely identify an event. Event code parts are encoded in order starting with the first part followed by subsequent parts.
The i-th part of an event code is encoded as an n-bit unsigned integer (7.1.9 n-bit Unsigned Integer), where $n = \lceil \log_2 m \rceil$ and m is the number of distinct values used as the i-th part of its own and all its sibling event codes in the current grammar. Two event codes are siblings at the i-th part if and only if they share the same values in all preceding parts. All event codes are siblings at the first part.
If there is only one distinct value for a given part, the part is omitted (i.e., encoded in $\log_2 1 = 0$ bits = 0 bytes).
For example, the eight event codes shown in the DocContent grammar above have values ranging from 0 to 5 for the first part. Six distinct values are needed to identify the first part of these event codes. Therefore, the first part can be encoded as an n-bit unsigned integer where $n = \lceil \log_2 6 \rceil = 3$. In the same fashion, the second and third part (if present) are represented as n-bit unsigned integers where n is $\lceil \log_2 2 \rceil = 1$ and $\lceil \log_2 2 \rceil = 1$ respectively.
When the value of the compression option is false and bit-packed alignment is used, n-bit unsigned integers are represented using n bits. The first table below illustrates how the event codes of each event matched by the DocContent grammar above are represented in this case.
When the value of the compression option is true, or either the byte-alignment or pre-compression alignment option is used, n-bit unsigned integers are represented using the minimum number of bytes required to store n bits. The second table below illustrates how the event codes of each event matched by the DocContent grammar above are represented in this case.
Event | Part values | Event Code Encoding | # bits |
---|---|---|---|
SE ("A") | 0 | 000 | 3 |
SE ("B") | 1 | 001 | 3 |
SE ("C") | 2 | 010 | 3 |
SE ("D") | 3 | 011 | 3 |
SE (*) | 4 | 100 | 3 |
DT | 5 0 | 101 0 | 4 |
CM | 5 1 0 | 101 1 0 | 5 |
PI | 5 1 1 | 101 1 1 | 5 |
# distinct values ( m ) per part | 6 2 2 | | |
# bits per part | 3 1 1 | | |
Event | Part values | Event Code Encoding | # bytes |
---|---|---|---|
SE ("A") | 0 | 00000000 | 1 |
SE ("B") | 1 | 00000001 | 1 |
SE ("C") | 2 | 00000010 | 1 |
SE ("D") | 3 | 00000011 | 1 |
SE (*) | 4 | 00000100 | 1 |
DT | 5 0 | 00000101 00000000 | 2 |
CM | 5 1 0 | 00000101 00000001 00000000 | 3 |
PI | 5 1 1 | 00000101 00000001 00000001 | 3 |
# distinct values ( m ) per part | 6 2 2 | | |
# bytes per part | 1 1 1 | | |
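The width of each event code part follows directly from the sibling rule, so it can be computed from the list of event codes in a grammar. The sketch below is non-normative; the function names and the modelling of an event code as a tuple of integers are assumptions made for illustration.

```python
def part_widths(event_codes):
    """Map each event code (a tuple of 1 to 3 parts) to its part widths.

    The width n of the i-th part is ceil(log2(m)), where m is the number
    of distinct i-th part values among the codes sharing all preceding
    parts. (m - 1).bit_length() computes ceil(log2(m)) exactly for m >= 1.
    """
    widths = {}
    for code in event_codes:
        per_part = []
        for i in range(len(code)):
            prefix = code[:i]
            siblings = {c[i] for c in event_codes
                        if len(c) > i and c[:i] == prefix}
            per_part.append((len(siblings) - 1).bit_length())
        widths[code] = per_part
    return widths


def encoded_size(per_part, byte_aligned=False):
    """Total size: bits when bit-packed, bytes when byte-aligned."""
    if byte_aligned:
        return sum((n + 7) // 8 for n in per_part)  # ceil(n / 8) per part
    return sum(per_part)


# Event codes of the DocContent example: 0, 1, 2, 3, 4, 5.0, 5.1.0, 5.1.1
DOC_CONTENT = [(0,), (1,), (2,), (3,), (4,), (5, 0), (5, 1, 0), (5, 1, 1)]
```

For the DocContent example this gives widths of 3, 1 and 1 bits for the three parts of event code 5.1.1, that is 5 bits when bit-packed and 3 bytes when byte-aligned. A part with a single distinct value gets width 0 and is therefore omitted.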
Some XML applications do not require the entire XML feature set and would prefer to eliminate the overhead associated with unused features. For example, the SOAP 1.2 specification [SOAP 1.2] prohibits the use of XML processing instructions. In addition, there are many data-exchange use cases that do not require XML comments or DTDs. The preserve option in EXI Options comprises a set of fidelity options, each of which independently controls the preservation (or preservation level) of a certain type of information item. Applications can use the preserve option to specify the set of fidelity options they require. As specified in section 8.3 Pruning Unneeded Productions, EXI processors MUST use these fidelity options to prune from the grammars the productions that match events that are not required, improving compactness and processing efficiency.
The table below lists the fidelity options supported by this version of the EXI specification and describes the effect setting these options has on the EXI stream.
Fidelity option | Effect |
---|---|
Preserve.comments | CM events are preserved |
Preserve.pis | PI events are preserved |
Preserve.dtd | DOCTYPE and ER events are preserved |
Preserve.prefixes | NS events and namespace prefixes are preserved |
Preserve.lexicalValues | Lexical form of element and attribute values is preserved in value content items |
When qualified names [NS] are used in the values of AT or CH events in an EXI Stream, the Preserve.prefixes fidelity option SHOULD be turned on to preserve the NS prefix declarations used by these values. See section 4. EXI Streams for the definition of EXI event types and their associated content items.
The event code of each event in an EXI body is represented as a sequence of n-bit unsigned integers (7.1.9 n-bit Unsigned Integer). See section 6.2 Representing Event Codes for the description of the event code representation. The content items of an event are represented according to their datatype representations (see Table 4-2). In the absence of an associated datatype representation, attribute and character values are represented as String (7.1.10 String).
[Definition:] EXI defines a minimal set of datatype representations called Built-in EXI datatype representations that define how content items as well as the parts of an event code are represented in EXI streams. When the preserve.lexicalValues option is false, values are represented using built-in EXI datatype representations (see 7.1 Built-in EXI Datatype Representations) or user-defined datatype representations (see 7.4 Datatype Representation Map) associated with the schema datatypes [XS2]. Otherwise, values are represented as Strings with restricted character sets (see Table 7-2 below).
The following table lists the built-in EXI datatype representations, associated EXI datatype identifiers and the XML Schema built-in datatypes [XS2] each EXI datatype representation is used to represent by default.
Built-in EXI Datatype Representation | EXI Datatype ID | XML Schema Datatypes |
---|---|---|
Binary | | base64Binary |
 | | hexBinary |
Boolean | | boolean |
Date-Time | | dateTime |
 | exi:time | time |
 | exi:date | date |
 | exi:gYearMonth | gYearMonth |
 | exi:gYear | gYear |
 | exi:gMonthDay | gMonthDay |
 | exi:gDay | gDay |
 | exi:gMonth | gMonth |
Decimal | exi:decimal | decimal |
Float | | float , double |
Integer | exi:integer | integer |
String | exi:string | |
n-bit Unsigned Integer | | Used by Integer |
Unsigned Integer | | Used by Integer |
List | | All types derived by list , including IDREFS and ENTITIES |
QName | | QName but only for the value of xsi:type attributes |
By default, datatypes derived from the XML Schema datatypes above are also represented according to the associated built-in EXI datatype representation. When there is more than one XML Schema datatype above from which a datatype is derived directly or indirectly, the closest ancestor is used to determine the built-in EXI datatype representation. For example, a value of XML Schema datatype xsd:int is represented according to the same built-in EXI datatype representation as a value of XML Schema datatype xsd:integer. Although xsd:int is derived indirectly from xsd:integer and also further from xsd:decimal, a value of xsd:int is processed as an instance of xsd:integer because xsd:integer is closer to xsd:int than xsd:decimal is in the datatype inheritance hierarchy.
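The closest-ancestor rule amounts to walking up the derivation chain until a datatype with a default representation is found. The sketch below is non-normative: the names `BASE_TYPE`, `DEFAULT_REPRESENTATION` and `representation_for` are hypothetical, and both tables are small illustrative excerpts rather than the full XML Schema type hierarchy or the full table above.

```python
# Illustrative excerpt of XML Schema derivations (derived type -> base type).
BASE_TYPE = {
    "byte": "short",
    "short": "int",
    "int": "long",
    "long": "integer",
    "integer": "decimal",
}

# Illustrative excerpt of the default associations in the table above.
DEFAULT_REPRESENTATION = {
    "integer": "Integer",
    "decimal": "Decimal",
    "boolean": "Boolean",
}


def representation_for(datatype):
    """Return the representation of the closest ancestor, or None.

    None means this excerpt holds no ancestor with a default association;
    it says nothing about what a full implementation would return.
    """
    current = datatype
    while current is not None:
        if current in DEFAULT_REPRESENTATION:
            return DEFAULT_REPRESENTATION[current]
        current = BASE_TYPE.get(current)  # step to the base type, if known
    return None
```

Because the walk stops at the first match, xsd:int resolves to Integer via xsd:integer and never reaches xsd:decimal.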
Each EXI datatype identifier above is a qname that uniquely identifies one of the built-in EXI datatype representations. EXI datatype identifiers are used in Datatype Representation Maps to change the datatype representations used for specific schema datatypes [XS2] and their sub-types. Only built-in EXI datatype representations that have been assigned identifiers are usable in Datatype Representation Maps.
When the preserve.lexicalValues option is true, all values are represented as Strings. The table below defines restricted character sets for several of the built-in EXI datatypes. Each value that would have otherwise been represented by one of these datatypes is instead represented as a String with the associated restricted character set, regardless of the actual pattern facets, if any, specified in the definitions of the schema datatypes.
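EXI Datatype ID | Restricted Character Set |
---|---|
| { #x9, #xA, #xD, #x20, +, /, [0-9], =, [A-Z], [a-z] } |
| { #x9, #xA, #xD, #x20, [0-9], [A-F], [a-f] } |
| { #x9, #xA, #xD, #x20, 0, 1, a, e, f, l, r, s, t, u } |
| { #x9, #xA, #xD, #x20, +, -, ., [0-9], :, T, Z } |
exi:time | { #x9, #xA, #xD, #x20, +, -, ., [0-9], :, T, Z } |
exi:date | { #x9, #xA, #xD, #x20, +, -, ., [0-9], :, T, Z } |
exi:gYearMonth | { #x9, #xA, #xD, #x20, +, -, ., [0-9], :, T, Z } |
exi:gYear | { #x9, #xA, #xD, #x20, +, -, ., [0-9], :, T, Z } |
exi:gMonthDay | { #x9, #xA, #xD, #x20, +, -, ., [0-9], :, T, Z } |
exi:gDay | { #x9, #xA, #xD, #x20, +, -, ., [0-9], :, T, Z } |
exi:gMonth | { #x9, #xA, #xD, #x20, +, -, ., [0-9], :, T, Z } |
exi:decimal | { #x9, #xA, #xD, #x20, +, -, ., [0-9] } |
| { #x9, #xA, #xD, #x20, +, -, ., [0-9], E, F, I, N, a, e } |
exi:integer | { #x9, #xA, #xD, #x20, +, -, [0-9] } |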
The restricted character set for the EXI List datatype representation is determined by the restricted character set of the EXI datatype representation of the List item type.
The rules used to represent values of String depend on the content items to which the values belong. There are certain content items whose value representation involves the use of string tables, while other content items are represented using the encoding rule described in 7.1.10 String without involvement of string tables. The content items that use string tables, and how each of them uses string tables to represent its values, are described in 7.3 String Table.
Schemas can provide one or more enumerated values for datatypes. When the preserve.lexicalValues option is false, EXI exploits those pre-defined values when they are available to represent values of such datatypes in a more efficient manner than it would have done otherwise without using pre-defined values. The encoding rule for representing enumerated values is described in 7.2 Enumerations. Datatypes that are derived from another by union and their subtypes are always represented as String regardless of the availability of enumerated values. Representation of values of which the schema datatype is one of QName, Notation or a datatype derived therefrom by restriction is also not affected by enumerated values, if any.
The following sections describe the encoding rules of built-in EXI datatype representations used for representing event codes and content items in EXI streams. Unless otherwise stated, individual items in an EXI stream are packed into bytes most significant bit first.
The Binary datatype representation is a length-prefixed sequence of octets representing the binary content. The length is represented as an Unsigned Integer (see 7.1.6 Unsigned Integer).
In the absence of pattern facets in the schema datatype, the Boolean datatype representation is an n-bit unsigned integer (7.1.9 n-bit Unsigned Integer), where n is one (1). The value zero (0) represents false and the value one (1) represents true. Otherwise, when pattern facets are available in the schema datatype, the Boolean datatype representation is able to distinguish values not only arithmetically (0 or 1) but also between lexical variances ("0", "1", "false" and "true"), and values typed as Boolean are represented as an n-bit unsigned integer (7.1.9 n-bit Unsigned Integer), where n is two (2) and the values zero (0), one (1), two (2) and three (3) represent the values "false", "0", "true" and "1" respectively.
The Decimal datatype representation is a Boolean sign (see 7.1.2 Boolean ) followed by two Unsigned Integers (see 7.1.6 Unsigned Integer ). A sign value of zero (0) is used to represent positive Decimal values and a sign value of one (1) is used to represent negative Decimal values. The first Unsigned Integer represents the integral portion of the Decimal value. The second Unsigned Integer represents the fractional portion of the Decimal value with the digits in reverse order to preserve leading zeros.
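As an illustration only, the following Python sketch splits a decimal lexical value into the three parts named in this section (sign, integral portion, reversed fractional portion). The function name `decimal_parts` and the use of a lexical string as input are assumptions made for the example; the sketch does not serialize the parts into an EXI stream.

```python
def decimal_parts(lexical):
    """Return (sign, integral, reversed_fraction) for a decimal lexical value.

    Hypothetical helper: sign is 0 for positive and 1 for negative values,
    and the fractional digits are reversed so that leading zeros survive
    the conversion to an unsigned integer.
    """
    sign = 0
    if lexical[0] in "+-":
        sign = 1 if lexical[0] == "-" else 0
        lexical = lexical[1:]
    integral, _, fraction = lexical.partition(".")
    integral_value = int(integral) if integral else 0
    # "034" reversed is "430", so the two leading zeros of ".034" are kept.
    reversed_fraction = int(fraction[::-1]) if fraction else 0
    return sign, integral_value, reversed_fraction
```

For instance, the fractional part ".034" becomes the unsigned integer 430, which reverses back to "034" on decoding.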
The Float datatype representation is two consecutive Integers (see 7.1.5 Integer ). The first Integer represents the mantissa of the floating point number and the second Integer represents the base-10 exponent of the floating point number. The range of the mantissa is -(2⁶³) to 2⁶³-1 and the range of the exponent is -(2¹⁴-1) to 2¹⁴-1. Values typed as Float with a mantissa or exponent outside the accepted range are represented as untyped values.
The exponent value -(2¹⁴) is used to indicate one of the special values: infinity, negative infinity and not-a-number (NaN). An exponent value -(2¹⁴) with mantissa values 1 and -1 represents positive infinity (INF) and negative infinity (-INF) respectively. An exponent value -(2¹⁴) with any other mantissa value represents NaN.
The Float datatype representation can be decoded by going through the following steps.
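The interpretation of a decoded mantissa and exponent pair described in this section can be sketched in Python as follows. This is a minimal illustration, not a normative procedure: the function name `decode_float` is hypothetical, the two Integers are assumed to have been read from the stream already, and `decimal.Decimal` is used only to avoid binary rounding in the example.

```python
from decimal import Decimal

SPECIAL_EXPONENT = -(2 ** 14)  # exponent value reserved for INF, -INF and NaN


def decode_float(mantissa, exponent):
    """Combine an already-decoded mantissa and base-10 exponent."""
    if exponent == SPECIAL_EXPONENT:
        if mantissa == 1:
            return Decimal("Infinity")
        if mantissa == -1:
            return Decimal("-Infinity")
        return Decimal("NaN")
    # Ordinary value: mantissa * 10 ** exponent.
    return Decimal(mantissa).scaleb(exponent)
```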
The Integer datatype representation supports signed integer numbers of arbitrary magnitude. The specific representation used depends on the facet XS2 values of the associated schema datatype as follows.
If the bounded range of the associated schema datatype is 4096 or smaller as determined by the values of minInclusive XS2 , minExclusive XS2 , maxInclusive XS2 and maxExclusive XS2 facets, the value is represented as an n-bit Unsigned Integer where n is ⌈log₂ m⌉ and m is the bounded range of the schema datatype.
Otherwise, if the minInclusive XS2 or minExclusive XS2 facets specify a lower bound greater than or equal to zero (0), the value is represented as an Unsigned Integer .
Otherwise, the value is represented as a Boolean sign (see 7.1.2 Boolean ) followed by an Unsigned Integer (see 7.1.6 Unsigned Integer ). A sign value of zero (0) is used to represent positive integers and a sign value of one (1) is used to represent negative integers. For non-negative values, the Unsigned Integer holds the magnitude of the value. For negative values, the Unsigned Integer holds the magnitude of the value minus 1.
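The selection among the three cases above can be sketched as follows. The helper names are hypothetical, the bounds are assumed to be inclusive (exclusive facets would need to be converted first), and the bounded range is assumed to be max − min + 1; none of these details are prescribed by this section.

```python
def integer_representation(min_value=None, max_value=None):
    """Pick the representation implied by optional inclusive bounds."""
    if min_value is not None and max_value is not None:
        m = max_value - min_value + 1  # assumed definition of bounded range
        if m <= 4096:
            # (m - 1).bit_length() equals ceil(log2(m)) for m >= 1.
            return ("n-bit", (m - 1).bit_length())
    if min_value is not None and min_value >= 0:
        return ("unsigned", None)
    return ("signed", None)


def signed_parts(value):
    """Return (sign, unsigned part) for the sign-plus-magnitude case."""
    if value >= 0:
        return (0, value)
    return (1, -value - 1)  # magnitude minus 1 for negative values
```

Storing the magnitude minus 1 for negative values means that -1 is carried as sign 1 with an unsigned part of 0, so no code is spent on a "negative zero".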
The Unsigned Integer datatype representation supports unsigned integer numbers of arbitrary magnitude. It is represented as a sequence of octets terminated by an octet with its most significant bit set to 0. The value of the unsigned integer is stored in the least significant 7 bits of the octets as a sequence of 7-bit bytes, with the least significant byte first.
EXI processors SHOULD support arbitrarily large Unsigned Integer values. EXI processors MUST support Unsigned Integer values less than 2147483648.
The Unsigned Integer datatype representation can be decoded by going through the following steps.
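A minimal Python sketch of the octet sequence described in this section is given below. The function names are hypothetical, and the sketch assumes the octets are available as a byte string; in a bit-packed stream the same octets need not start on byte boundaries.

```python
from typing import Tuple


def encode_unsigned_integer(value: int) -> bytes:
    """Emit 7-bit groups, least significant group first."""
    if value < 0:
        raise ValueError("Unsigned Integer cannot be negative")
    out = bytearray()
    while True:
        group = value & 0x7F
        value >>= 7
        if value:
            out.append(group | 0x80)  # most significant bit 1: more octets follow
        else:
            out.append(group)  # most significant bit 0: last octet
            return bytes(out)


def decode_unsigned_integer(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Return (value, position of the first octet after the integer)."""
    result = 0
    shift = 0
    while True:
        octet = data[pos]
        pos += 1
        result |= (octet & 0x7F) << shift
        if (octet & 0x80) == 0:
            return result, pos
        shift += 7
```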
The QName datatype representation is a sequence of values representing the URI, local-name and prefix components of the QName in that order, where the prefix component is present only when the preserve.prefixes option is set to true.
When the QName value is specified by a schema-informed grammar using the SE (qname) or AT (qname) terminal symbols, URI and local-name are implicit and are omitted. Similarly, when the URI of the QName value is derived from a schema-informed grammar using SE (uri:*) or AT (uri:*) terminal symbols, URI is implicit thus omitted in the representation, and only the local-name component is encoded as a String (see 7.1.10 String ). Otherwise, URI and local-name components are encoded as Strings. If the QName is in no namespace, the URI is represented by a zero length String.
When present, prefixes are represented as n-bit unsigned integers ( 7.1.9 n-bit Unsigned Integer ), where n is ⌈log₂(N)⌉ and N is the number of prefixes in the prefix string table partition associated with the URI of the QName or one (1) if there are no prefixes in this partition. If the given prefix exists in the associated prefix string table partition, it is represented using the compact identifier assigned by the partition. If the given prefix does not exist in the associated partition, the QName MUST be part of an SE event and the prefix MUST be resolved by one of the NS events immediately following the SE event (see resolution rules below). In this case, the unresolved prefix representation is not used and can be zero (0) or the compact identifier of any prefix in the associated partition.
Note:
When N is one, the prefix is represented using zero bits
Given an n-bit unsigned integer m that represents either the prefix value or an unresolved prefix value, the effective prefix value is determined by following the rules described below in order. A QName is in error if its prefix cannot be resolved by the rules below.
The Date-Time datatype representation is a sequence of values representing the individual components of the Date-Time. The following table specifies each of the possible date-time components along with how they are encoded.
Component | Value | Type |
---|---|---|
Year | Offset from 2000 | Integer ( 7.1.5 Integer ) |
MonthDay | Month * 32 + Day | 9-bit Unsigned Integer ( 7.1.9 n-bit Unsigned Integer ) where day is a value in the range 1-31 and month is a value in the range 1-12. |
Time | ((Hour * 64) + Minutes) * 64 + seconds | 17-bit Unsigned Integer ( 7.1.9 n-bit Unsigned Integer ) |
FractionalSecs | Fractional seconds | Unsigned Integer ( 7.1.6 Unsigned Integer ) representing the fractional part of the seconds with digits in reverse order to preserve leading zeros |
TimeZone | TZHours * 60 + TZMinutes | 11-bit Unsigned Integer ( 7.1.9 n-bit Unsigned Integer ) representing a signed integer offset by 840 ( = 14 * 60 ) |
presence | Boolean presence indicator | Boolean ( 7.1.2 Boolean ) |
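The arithmetic for three of the components in the table above can be sketched as follows. The function names are hypothetical, and the treatment of negative time zones (hours and minutes carrying the same sign) is an assumption made for this example only.

```python
def year_component(year):
    """Year is carried as an offset from 2000."""
    return year - 2000


def month_day_component(month, day):
    """MonthDay = Month * 32 + Day, which fits in 9 bits."""
    if not (1 <= month <= 12 and 1 <= day <= 31):
        raise ValueError("month or day out of range")
    return month * 32 + day


def time_zone_component(tz_hours, tz_minutes):
    """TZHours * 60 + TZMinutes, offset by 840 (= 14 * 60) to be unsigned.

    Assumes tz_hours and tz_minutes share the sign of the zone offset.
    """
    return tz_hours * 60 + tz_minutes + 840
```

The largest MonthDay value is 12 * 32 + 31 = 415, which is below 2⁹ = 512, and a time zone of +14:00 maps to 1680, which is below 2¹¹ = 2048.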
The variety of components that constitute a value and their appearance order depend on the XML Schema type associated with the value. The following table shows which components are included in a value of each XML Schema type that is relevant to Date-Time datatype. Items listed in square brackets are included if and only if the value of its preceding presence indicator (specified above) is set to true.
XML Schema Datatype | Included Components |
---|---|
gYear XS2 | Year, presence, [TimeZone] |
gYearMonth XS2 | Year, MonthDay, presence, [TimeZone] |
date XS2 | Year, MonthDay, presence, [TimeZone] |
dateTime XS2 | Year, MonthDay, Time, presence, [FractionalSecs], presence, [TimeZone] |
gMonth XS2 | MonthDay, presence, [TimeZone] |
gMonthDay XS2 | MonthDay, presence, [TimeZone] |
gDay XS2 | MonthDay, presence, [TimeZone] |
time XS2 | Time, presence, [FractionalSecs], presence, [TimeZone] |
When the value of the compression option is false and the bit-packed alignment option is used, the n-bit Unsigned Integer datatype representation is an unsigned binary integer using n bits. Otherwise, it is an unsigned integer using the minimum number of bytes required to store n bits. Bytes are ordered with the least significant byte first.
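The two layouts of an n-bit Unsigned Integer can be sketched as follows. The helper names are hypothetical; the bit-packed form is shown as a string of '0' and '1' characters purely for readability, not as an actual stream writer.

```python
def n_bit_bit_packed(value, n):
    """Bit-packed form: exactly n bits, most significant bit first."""
    if value < 0 or value >= (1 << n):
        raise ValueError("value does not fit in n bits")
    return format(value, "0{}b".format(n)) if n > 0 else ""


def n_bit_byte_aligned(value, n):
    """Other alignments: the minimum number of whole bytes that hold n bits,
    least significant byte first."""
    if value < 0 or value >= (1 << n):
        raise ValueError("value does not fit in n bits")
    return value.to_bytes((n + 7) // 8, "little")
```

When n is zero, both forms are empty: the value occupies no space in the stream.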
The n-bit unsigned integer is used for representing event codes , the prefix component of QNames (see 7.1.7 QName ) and certain value content items, as described in relevant parts of this document.
As described in section 7.1.5 Integer , integers with a bounded range size m equal to 4096 or smaller are represented as n-bit unsigned integers with n being ⌈log₂ m⌉, as an offset from the minimum value in the range.
The String datatype representation is a length prefixed sequence of characters. The length indicates the number of characters in the string and is represented as an Unsigned Integer (see 7.1.6 Unsigned Integer ). If a restricted character set is defined for the string (see 7.1.10.1 Restricted Character Sets ), each character is represented as an n-bit Unsigned Integer (see 7.1.9 n-bit Unsigned Integer ). Otherwise, each character is represented by its UCS [ISO/IEC 10646] code point encoded as an Unsigned Integer (see 7.1.6 Unsigned Integer ).
EXI uses a string table to represent certain content items more efficiently. Section 7.3 String Table describes the string table and how it is applied to different content items.
If a string value is associated with a schema datatype XS2 derived from xsd:string and one or more of the datatypes in its datatype hierarchy has one or more pattern facets, there may be a restricted character set defined for the string value. The following steps are used to determine the restricted character set, if any, defined for a given string value associated with such a schema datatype.
Given the schema datatype, let the target datatype definition be the definition of the most-derived datatype that has one or more pattern facets immediately specified in its definition in the schema among those in the datatype inheritance hierarchy that traces backwards toward primitive datatypes XS2 starting from the datatype. If the target datatype definition is a definition for a built-in datatype XS2 , there is no restricted character set for the string value. Otherwise, determine the set of characters for each immediate pattern facet of the target datatype definition according to section E Deriving Set of Characters from XML Schema Regular Expressions .
Then, compute the restricted character set for the string value as the union of all the sets of characters computed in the previous step. If the resulting set of characters contains fewer than 256 characters and contains only BMP characters, the string value has a restricted character set and each character is represented using an n-bit Unsigned Integer (see 7.1.9 n-bit Unsigned Integer ), where n is ⌈log₂(N + 1)⌉ and N is the number of characters in the restricted character set.
The characters in the restricted character set are sorted by UCS [ISO/IEC 10646] code point and represented by integer values in the range (0 ... N−1) according to their ordinal position in the set. Characters that are not in this set are represented by the n-bit Unsigned Integer N followed by the UCS code point of the character represented as an Unsigned Integer.
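The mapping from characters to codes for a restricted character set can be sketched as follows. The function name and the tuple-based output are assumptions made for illustration; the sketch starts from an already-derived set of characters and does not derive one from pattern facets.

```python
def restricted_charset_codes(text, charset):
    """Return (n, items) for text under a given restricted character set.

    Each item is (index, None) for a character in the set, or
    (N, code_point) for a character outside it, where the code point
    would then be written as an Unsigned Integer.
    """
    ordered = sorted(set(charset))  # Python sorts str by code point
    big_n = len(ordered)
    n = big_n.bit_length()  # equals ceil(log2(N + 1)) for N >= 0
    index = {ch: i for i, ch in enumerate(ordered)}
    items = []
    for ch in text:
        if ch in index:
            items.append((index[ch], None))
        else:
            items.append((big_n, ord(ch)))  # escape value N, then code point
    return n, items
```

With the ten decimal digits as the restricted set, N is 10 and n is 4, so the values 0 to 9 identify digits and the value 10 marks a character outside the set.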
The figure below illustrates an overview of the process for determining and using restricted character sets described in this section.
Figure 7-1. String Processing Model
Values of type List are encoded as a length prefixed sequence of values. The length is encoded as an Unsigned Integer (see 7.1.6 Unsigned Integer ) and each value is encoded according to its type (see 7. Representing Event Content ).
When the preserve.lexicalValues option is false, enumerated values are encoded as n-bit Unsigned Integers ( 7.1.9 n-bit Unsigned Integer ) where n = ⌈log₂ m⌉ and m is the number of items in the enumerated type. The value assigned to each item corresponds to its ordinal position in the enumeration in schema-order starting with position zero (0).
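As a small worked illustration, the code width and ordinal value for an enumerated item can be computed as follows. The function name and the sample enumeration are hypothetical.

```python
def enumeration_code(items, value):
    """Return (n, ordinal) for value within items listed in schema order."""
    m = len(items)
    n = (m - 1).bit_length()  # equals ceil(log2(m)) for m >= 1
    return n, items.index(value)
```

For an enumeration of three items, n is 2; for an enumeration with a single item, n is 0 and the value occupies no bits.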
Exceptions are for schema types derived from others by union and their subtypes, QName or Notation and types derived therefrom by restriction. The values of such types are processed by their respective built-in EXI datatype representations instead of being represented as enumerations.
EXI uses a string table to assign "compact identifiers" to some string values. Occurrences of string values found in the string table are represented using the associated compact identifier rather than encoding the entire "string literal". The string table is initially pre-populated with string values that are likely to occur in certain contexts and is dynamically expanded to include additional string values encountered in the document. The following content items are encoded using a string table:
When a string value is found in the string table, the value is encoded using the compact identifier and no changes are made to the string table as a result. When a string value is not found in the string table, its string literal is encoded as a String without using a compact identifier, only after which the string table is augmented by including the string value with an assigned compact identifier unless the string value represents a value content item and fails to satisfy the criteria in effect by virtue of valuePartitionCapacity and valueMaxLength options .
The string table is divided into partitions and each partition is optimized for more frequent use of either compact identifiers or string literals depending on the purpose of the partition. Section 7.3.1 String Table Partitions describes how EXI string table is partitioned. Section 7.3.2 Partitions Optimized for Frequent use of Compact Identifiers describes how string values are encoded when the associated partition is optimized for more frequent use of compact identifiers. Section 7.3.3 Partitions Optimized for Frequent use of String Literals describes how string values are encoded when the associated partition is optimized for more frequent use of string literals.
The life cycle of a string table spans the processing of a single EXI stream. String tables are not represented in an EXI stream or exchanged between EXI processors. A string table cannot be reused across multiple EXI streams; therefore, EXI processors MUST use a string table that is equivalent to the one that would have been newly created and pre-populated with initial values for processing each EXI stream.
The string table is organized into partitions so that the indices assigned to compact identifiers can stay relatively small. Smaller number of indices results in improved average compactness and the efficiency of table operations. Each partition has a separate set of compact identifiers and content items are assigned to specific partitions as described below.
Uri content items and the URI portion of qname content items are assigned to the uri partition. The uri partition is optimized for frequent use of compact identifiers and is pre-populated with initial entries as described in D.1 Initial Entries in Uri Partition . When a schema is provided, the uri partition is also pre-populated with the name of each target namespace URI declared in the schema, plus some of the namespace URIs used in wildcard terms and attribute wildcards (see section 8.5.4.1.7 Wildcard Terms and 8.5.4.1.3.2 Complex Type Grammars , respectively for the condition), appended in lexicographical order.
Prefix content items are assigned to partitions based on their associated namespace URI. Partitions containing prefix content items are optimized for frequent use of compact identifiers and the string table is pre-populated with entries as described in D.2 Initial Entries in Prefix Partitions .
The local-name portion of qname content items are assigned to partitions based on the namespace URI of the qname content item of which the local-name is a part. Partitions containing local-names are optimized for frequent use of string literals and the string table is pre-populated with entries as described in D.3 Initial Entries in Local-Name Partitions . When a schema is provided, the string table is also pre-populated with the local name of each attribute, element and type declared in the schema, partitioned by namespace URI and sorted lexicographically.
Each value content item is assigned to both the global value partition and a "local" value partition based on the qname of the attribute or element in context at the time when the string value is added to the value partitions. Partitions containing value content items are optimized for frequent use of string literals and are initially empty.
[Definition:] The variable globalID is a non-negative integer representing the compact identifier of the next item added to the global value partition. Its value is initially set to 0 (zero) and changes while processing an EXI stream per the rule described in 7.3.3 Partitions Optimized for Frequent use of String Literals .
String table partitions that are expected to contain a relatively small number of entries used repeatedly throughout the document are optimized for the frequent use of compact identifiers. This includes the uri partition and all partitions containing prefix content items.
When a string value is found in a partition optimized for frequent use of compact identifiers, the string value is represented as the value (i+1) encoded as an n-bit Unsigned Integer ( 7.1.9 n-bit Unsigned Integer ), where i is the value of the compact identifier, n is ⌈log₂(m+1)⌉ and m is the number of entries in the string table partition at the time of the operation.
When a string value is not found in a partition optimized for frequent use of compact identifiers, the String value is represented as zero (0) encoded as an n -bit Unsigned Integer, followed by the string literal encoded as a String ( 7.1.10 String ). After encoding the String value, it is added to the string table partition and assigned the next available compact identifier m .
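The hit and miss rules for partitions optimized for frequent use of compact identifiers can be sketched as follows. The class name and the tuple returned by `encode` are assumptions made for illustration; the sketch reports what would be written rather than producing stream bits.

```python
class CompactIdPartition:
    """Toy model of a uri or prefix partition (hypothetical helper)."""

    def __init__(self, initial=()):
        self.entries = list(initial)

    def encode(self, value):
        """Return (n, code, literal).

        code is i + 1 on a hit and 0 on a miss; literal is the string to
        be written as a String on a miss, otherwise None.
        """
        m = len(self.entries)
        n = m.bit_length()  # equals ceil(log2(m + 1))
        if value in self.entries:
            return (n, self.entries.index(value) + 1, None)
        # Miss: the value is added after encoding and receives identifier m.
        self.entries.append(value)
        return (n, 0, value)
```

Because the width n is computed from the partition size at the time of the operation, the same string can be written with a wider code later in the stream once the partition has grown.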
The remaining string table partitions are optimized for the frequent use of string literals. This includes all string table partitions containing local-names and all string table partitions containing value content items.
When a string value is found in the partitions containing local-names, the string value is represented as zero (0) encoded as an Unsigned Integer (see 7.1.6 Unsigned Integer ) followed by the compact identifier of the string value. The compact identifier of the string value is encoded as an n-bit unsigned integer ( 7.1.9 n-bit Unsigned Integer ), where n is ⌈log₂ m⌉ and m is the number of entries in the string table partition at the time of the operation.
When a string value is not found in the partitions containing local-names, its string literal is encoded as a String (see 7.1.10 String ) with the length of the string incremented by one. After encoding the string value, it is added to the string table partition and assigned the next available compact identifier m .
As described above, each value content item is assigned to two partitions, a "local" value partition and the global value partition. When a string value is found in the "local" value partition, the string value may be represented as zero (0) encoded as an Unsigned Integer (see 7.1.6 Unsigned Integer ) followed by the compact identifier of the string value in the "local" value partition. When a string value is found in the global value partition, but not in the "local" value partition, the String value may be represented as one (1) encoded as an Unsigned Integer (see 7.1.6 Unsigned Integer ) followed by the compact identifier of the String value in the global value partition. The compact identifier is encoded as an n-bit unsigned integer ( 7.1.9 n-bit Unsigned Integer ), where n is ⌈log₂ m⌉ and m is the number of entries in the associated partition at the time of the operation.
When a string value S is not found in the global or "local" value partition, its string literal is encoded as a String (see 7.1.10 String ) with the length L + 2 (incremented by two) where L is the number of characters in the string value. If valuePartitionCapacity is not zero, and L is greater than zero and no more than valueMaxLength , the string S is added to the associated "local" value partition using the next available compact identifier m and added to the global value partition using the compact identifier globalID . When S is added to the global value partition and there was already a string V in the global value partition associated with the compact identifier globalID , the string S replaces the string V in the global table, and the string V is removed from its associated local value partition by rendering its compact identifier permanently unassigned. When the string value is added to the global value partition, the value of globalID is incremented by one (1). If the resulting value of globalID is equal to valuePartitionCapacity , its value is reset to zero (0).
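The bookkeeping performed when a value is not found in either value partition can be sketched as follows. The class and attribute names are hypothetical, both options are modelled as plain integers, and an unassigned local identifier is modelled as `None`; the sketch covers only the table update, not the encoding of the literal.

```python
class ValuePartitions:
    """Toy model of the global and "local" value partitions."""

    def __init__(self, capacity, max_length):
        self.capacity = capacity      # stands in for valuePartitionCapacity
        self.max_length = max_length  # stands in for valueMaxLength
        self.global_id = 0            # stands in for globalID
        self.global_entries = {}      # compact identifier -> (string, qname)
        self.local = {}               # qname -> list of strings or None

    def add(self, qname, s):
        """Update the tables after S was encoded as a string literal."""
        if self.capacity == 0 or not (0 < len(s) <= self.max_length):
            return  # S is not added to any value partition
        self.local.setdefault(qname, []).append(s)  # next local identifier m
        evicted = self.global_entries.get(self.global_id)
        if evicted is not None:
            old_string, old_qname = evicted
            old_local = self.local[old_qname]
            # The replaced string's local identifier stays unassigned.
            old_local[old_local.index(old_string)] = None
        self.global_entries[self.global_id] = (s, qname)
        self.global_id += 1
        if self.global_id == self.capacity:
            self.global_id = 0  # wrap around and start replacing entries
```

With a capacity of 2, the third distinct value added replaces the first one in the global partition, and the first value's slot in its local partition is left unassigned.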
By default, each typed value in an EXI stream is represented using its default built-in EXI datatype representation (see Table 7-1 ). However, [Definition:] EXI processors MAY provide the capability to specify alternate built-in EXI datatype representations or user-defined datatype representations for representing specific schema datatypes XS2 . This capability is called a Datatype Representation Map .
Note:
This feature is relevant only to simple types in the schema. EXI does not provide a way for applications to infuse custom representations of structured data bound to complex types into the format.
EXI processors that support Datatype Representation Maps MAY provide implementation specific means to define and install user-defined datatype representations. EXI processors MAY also provide implementation specific means for applications or users to specify alternate built-in EXI datatype representations or user-defined datatype representations for representing specific schema datatypes. As with the default EXI datatype representations, alternate datatype representations are used for the associated XML Schema types specified in the Datatype Representation Map and XML Schema datatypes derived from those datatypes. When there are built-in or user-defined datatype representations associated with more than one XML Schema datatype in the type hierarchy of a particular datatype, the closest ancestor with an associated datatype representation is used to determine the EXI datatype representation.
When an EXI processor encodes an EXI stream using a Datatype Representation Map and the EXI Options part of the header is present, the EXI options part MUST specify all alternate datatype representations used in the EXI stream. An EXI processor that attempts to decode an EXI stream that specifies a user-defined datatype representation in the EXI header that it does not recognize MAY report a warning, but this is not an error. However, when an EXI processor encounters a typed value that was encoded by a user-defined datatype representation that it does not support, it MUST report an error.
The EXI options header, when it appears in an EXI stream, MUST include a "datatypeRepresentationMap" element for each schema datatype of which the descendant datatypes derived by restriction as well as itself are not represented using the default built-in EXI datatype representation. The "datatypeRepresentationMap" element includes two child elements. The qname of the first child element identifies the schema datatype that is not represented using the default built-in EXI datatype representation and the qname of the second child element identifies the alternate built-in EXI datatype representation or user-defined datatype representation used to represent that type. Built-in EXI datatype representations are identified by the type identifiers in Table 7-1 .
For example, the following "datatypeRepresentationMap" element indicates all values of type xsd:decimal and the datatypes derived from it by restriction in the EXI stream are represented using the built-in String datatype representation, which has the type ID exi:string :
<exi:datatypeRepresentationMap xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:decimal/>
  <exi:string/>
</exi:datatypeRepresentationMap>
It is the responsibility of an EXI processor to interface with a particular implementation of built-in EXI datatype representations or user-defined datatype representations properly. In the example above, an EXI processor may need to provide a string value of the data being processed that is typed as xsd:decimal in order to interface with an implementation of built-in String datatype representation. In such a case, some EXI processors may have started with a decimal value and such processors may well translate the value into a string before passing the data to the implementation of built-in String datatype representation while other EXI processors may already have a string value of the data so that it can pass the value directly to the implementation of built-in String datatype representation without any translation.
As another example, the following "datatypeRepresentationMap" element indicates all values of the user-defined simple type geo:geometricSurface and the datatypes derived from it by restriction are represented using the user-defined datatype representation geo:geometricInterpolator:
<exi:datatypeRepresentationMap xmlns:geo="http://example.com/Geometry">
  <geo:geometricSurface/>
  <geo:geometricInterpolator/>
</exi:datatypeRepresentationMap>
Note:
EXI only defines a way to indicate the use of user-defined datatype representations for representing values of specific datatypes. Datatype representations
EXI is a knowledge-based encoding that uses a set of grammars to determine which events are most likely to occur at any given point in an EXI stream and encodes the most likely alternatives in fewer bits. It does this by mapping the stream of events to a lower entropy set of representative values and encoding those values using a set of simple variable length codes or an EXI compression algorithm.
The result is a very simple, small algorithm that uniformly handles schema-less encoding, schema-informed encoding, schema deviations, and any combination thereof in EXI streams. These variations do not require different algorithms or different parsers, they are simply informed by different combinations of grammars.
The following sections describe the grammars used to inform the EXI encoding.
Note:
The grammar semantics in this specification are written for clarity and generality. They do not prescribe a particular implementation approach.
Each grammar production has an event code , which is represented by a sequence of one to three parts separated by periods ("."). Each part is an unsigned integer. The following are examples of grammar productions with event codes as they appear in this specification.
Productions | Event Codes | ||
---|---|---|---|
LeftHandSide
| |||
| 0 | ||
| 1 | ||
| 2.0 | ||
| 2.1 | ||
| 2.2.0 | ||
| 2.2.1 | ||
LeftHandSide
| |||
| 0 | ||
| 1.0 | ||
| 1.1 |
The number of parts in a given event code is called the event code's length. No two productions with the same non-terminal symbol on the left-hand side are permitted to have the same event code .
Some non-terminal symbols are used on the right-hand side in a production without a terminal symbol prefixed to them, but with a parenthesized event code affixed instead. Such non-terminal symbols are macros and they are used to capture some recurring set of productions as symbols so that a symbol can be used in the grammar representation instead of including all the productions the macro represents in place every time it is used.
ABigProduction
| |||
| 0 | ||
| 1 | ||
LEFTHANDSIDE 1 (2.0) | 2.0 | ||
ABigProduction
| |||
| 0 | ||
LEFTHANDSIDE 1 (1.1) | 1.1 | ||
| 1.2 |
Because non-terminal macros are injected into the right-hand side of more than one production, the event codes of productions with these macro non-terminals on the left-hand side are not fixed, but will have different event code values depending on the context in which the macro non-terminal appears. This specification calls these variable event codes and uses variables in place of individual event code parts to indicate the event code parts are determined by the context. Below are some examples of variable event codes:
LEFTHANDSIDE
| |||
| n .0 | ||
| n .1 | ||
| n . m +2 | ||
| n . m +3 | ||
| n . m +4.0 | ||
| n . m +4.1 |
Unless otherwise specified, the variable n evaluates to the first part of the event code of the production in which the macro non-terminal LEFTHANDSIDE 1 appears on the right-hand side. Similarly, the expression n . m represents the first two parts of the event code of the production in which the macro non-terminal LEFTHANDSIDE 1 appears on the right-hand side.
Non-terminal macros are used in this specification for notational convenience only. They are not non-terminals, even though they are used in place of non-terminals. Productions that use non-terminal macros on the right-hand side need to be expanded by macro substitution before such productions are interpreted. Therefore, ABigProduction 1 and ABigProduction 2 shown in the preceding example are equivalent to the following set of productions obtained by expanding the non-terminal macro symbol LEFTHANDSIDE 1 and evaluating the variable event codes.
ABigProduction
| ||||
| 0 | |||
| 1 | |||
| 2.0 | |||
| 2.1 | |||
| 2.2 | |||
| 2.3 | |||
| 2.4.0 | |||
| 2.4.1 | |||
ABigProduction
| ||||
| 0 | |||
| 1.0 | |||
| 1.1 | |||
| 1.2 | |||
| 1.3 | |||
| 1.4 | |||
| 1.5.0 | |||
| 1.5.1 |
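The evaluation of variable event codes can be illustrated with a minimal Python sketch. The function name `expand_macro` and the list `LEFTHANDSIDE_1` are hypothetical; the macro body is assumed to hold six productions whose codes are n.m, n.(m+1), n.(m+2), n.(m+3), n.(m+4).0 and n.(m+4).1, which is the shape implied by the expanded event codes above.

```python
def expand_macro(n, m, body):
    """Evaluate the variable event codes of a macro used at event code n.m.

    Each entry of `body` is (offset, tail): the resulting event code is
    n.(m + offset), followed by the fixed parts listed in `tail`.
    """
    codes = []
    for offset, tail in body:
        parts = [n, m + offset, *tail]
        codes.append(".".join(str(part) for part in parts))
    return codes


# Hypothetical macro body: six productions, the last two sharing a
# second-level code and distinguished by a third part.
LEFTHANDSIDE_1 = [(0, ()), (1, ()), (2, ()), (3, ()), (4, (0,)), (4, (1,))]

print(expand_macro(2, 0, LEFTHANDSIDE_1))  # macro used as LEFTHANDSIDE 1 (2.0)
print(expand_macro(1, 1, LEFTHANDSIDE_1))  # macro used as LEFTHANDSIDE 1 (1.1)
```

Only the event codes are modelled here; the terminal and non-terminal symbols of the macro productions are left out.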
Each production rule in the EXI grammar includes an event code value that approximates the likelihood the associated production rule will be matched over the other productions with the same left-hand-side non-terminal symbol. Ultimately, the event codes determine the value(s) by which each non-terminal symbol will be represented in the EXI stream.
To understand how a given event code approximates the likelihood a given production will match, it is useful to visualize the event codes for a set of production rules that have the same non-terminal symbol on the left-hand side as a tree. For example, the following set of productions:
ElementContent : | ||||
EE | 0 | |||
| 1.0 | |||
| 1.1 | |||
| 1.2 | |||
| 1.3.0 | |||
| 1.3.1 |
represents a set of information items that might occur as element content after the start tag. Using the production event codes, we can visualize this set of productions as follows:
Figure 8-1. Event code tree for ElementContent grammar
where the terminal symbols are represented by the leaf nodes of the tree, and the event code of each production rule defines a path from the root of the tree to the node that represents the terminal symbol on the right-hand side of the production. We call this the event code tree for a given set of productions.
An event code tree is similar to a Huffman tree [Huffman Coding] in that shorter paths are generally used for symbols that are considered more likely. However, event code trees are far simpler and less costly to compute and maintain. Event code trees are shallow and contain at most three levels. In addition, the length of each event code in the event code tree is assigned statically without analyzing the data. This classification provides some of the benefits of a Huffman tree without the cost.
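As a rough illustration of why shorter paths cost fewer bits, the following Python sketch derives a bit width for each part of each event code in the ElementContent example. It assumes that a part with k distinct sibling values is written in ceil(log2(k)) bits; that rule is not stated in this section and is an assumption of the sketch, as are the names `part_widths` and `ELEMENT_CONTENT`.

```python
from math import ceil, log2


def part_widths(codes):
    """Map each event code (a tuple of ints) to the bit width of each part."""
    # Collect the distinct values that follow every prefix in the tree.
    children = {}
    for code in codes:
        for depth in range(len(code)):
            children.setdefault(code[:depth], set()).add(code[depth])
    # A part with k sibling values is assumed to take ceil(log2(k)) bits.
    return {
        code: [ceil(log2(len(children[code[:depth]]))) for depth in range(len(code))]
        for code in codes
    }


# Event codes 0, 1.0, 1.1, 1.2, 1.3.0 and 1.3.1 from the example above.
ELEMENT_CONTENT = [(0,), (1, 0), (1, 1), (1, 2), (1, 3, 0), (1, 3, 1)]

for code, widths in part_widths(ELEMENT_CONTENT).items():
    print(code, widths, sum(widths))
```

Under that assumption the most likely production costs a single bit, while the two deepest productions cost four.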
As discussed in section 6.3 Fidelity Options , applications MAY provide a set of fidelity options to specify the XML features they require. EXI processors MUST use these fidelity options to prune from the grammars the productions whose terminal symbols represent events that are not required, improving compactness and processing efficiency.
For example, the following set of productions represents the set of information items that might occur as element content after the start tag.
ElementContent : | |||
EE | 0 | ||
| 1.0 | ||
| 1.1 | ||
| 1.2 | ||
| 1.3.0 | ||
| 1.3.1 |
If an application sets the fidelity options preserve.comments, preserve.pis and preserve.dtd to false, the productions matching comment (CM), processing instruction (PI) and entity reference (ER) events are pruned from the grammar, producing the following set of productions:
ElementContent : | ||||
EE | 0 | |||
| 1.0 | |||
| 1.1 |
Removing these productions from the grammar tells EXI processors that comments and processing instructions will never occur in the EXI stream, which reduces the entropy of the stream allowing it to be encoded in fewer bits.
Each time a production is removed from a grammar, the event codes of the other productions with the same non-terminal symbol on the left-hand side MUST be adjusted to keep them contiguous if its removal has left the remaining productions with non-contiguous event codes.
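The pruning and renumbering described above can be sketched in Python as follows. This is a minimal illustration, not a normative algorithm: the helper names are hypothetical, the terminals SE(*) and CH in the sample grammar are assumptions (the tables above only name EE, ER, CM and PI), and the sketch keeps the nesting depth of every surviving production unchanged.

```python
def renumber(productions):
    """Reassign contiguous event codes, preserving order and nesting depth."""
    # Group consecutive productions that share the same first code part.
    groups = []
    for terminal, code in productions:
        if groups and groups[-1][0] == code[0]:
            groups[-1][1].append((terminal, code[1:]))
        else:
            groups.append((code[0], [(terminal, code[1:])]))
    result = []
    for new_index, (_, members) in enumerate(groups):
        if all(rest == () for _, rest in members):
            result.extend((terminal, (new_index,)) for terminal, _ in members)
        else:
            # Renumber the deeper parts of this group recursively.
            for terminal, rest in renumber(members):
                result.append((terminal, (new_index,) + rest))
    return result


def prune(productions, removed_terminals):
    """Drop productions for unwanted events, then close the gaps."""
    kept = [(t, c) for t, c in productions if t not in removed_terminals]
    return renumber(kept)


ELEMENT_CONTENT = [
    ("EE", (0,)), ("SE(*)", (1, 0)), ("CH", (1, 1)),
    ("ER", (1, 2)), ("CM", (1, 3, 0)), ("PI", (1, 3, 1)),
]

print(prune(ELEMENT_CONTENT, {"ER", "CM", "PI"}))
print(prune(ELEMENT_CONTENT, {"ER"}))
```

Removing only ER shows the adjustment at work: the CM and PI productions move from 1.3.0 and 1.3.1 to 1.2.0 and 1.2.1.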
This section describes the built-in XML grammars used by EXI when no schema information is available or when available schema information describes only portions of the EXI stream. The built-in XML grammars are dynamic and continuously evolve to reflect knowledge learned while processing an EXI stream. New built-in element grammars are created to describe the content of newly encountered elements and new grammar productions are added to refine existing built-in grammars. Newly learned grammars and productions are used to more efficiently represent subsequent events in the EXI stream. All newly created built-in element grammars are global element grammars .
[Definition:] A global element grammar is a grammar describing the content of an element that has global scope (i.e. a global element). At the onset of processing an EXI stream, the set of global element grammars is the set of all schema-informed element grammars derived from element declarations that have a {scope} property of global . Each built-in element grammar created while processing an EXI stream is added to the set of global element grammars. Each global element grammar has a unique qname .
In the absence of schema information describing the content of the EXI stream, the following grammar describes the events that will occur in an EXI document .
Syntax | Event Code | ||
---|---|---|---|
Document : | |||
SD DocContent | 0 | ||
DocContent : | |||
| 0 | ||
DT DocContent | 1.0 | ||
CM DocContent | 1.1.0 | ||
PI DocContent | 1.1.1 | ||
DocEnd : | |||
ED | 0 | ||
CM DocEnd | 1.0 | ||
PI DocEnd | 1.1 |
Semantics: | |
---|---|
All
productions
in
the
built-in
|
In the absence of schema information describing the contents of an EXI stream, the following grammar describes the events that may occur in an EXI fragment . The grammar shown below represents the initial set of productions in the built-in fragment grammar at the start of EXI stream processing. The associated semantics explain how the built-in fragment grammar evolves to more efficiently represent subsequent events in the EXI stream.
Syntax | Event Code | ||
---|---|---|---|
Fragment : | |||
SD FragmentContent | 0 | ||
FragmentContent : | |||
| 0 | ||
ED | 1 | ||
CM FragmentContent | 2.0 | ||
PI FragmentContent | 2.1 |
Semantics: | |
---|---|
All
productions
in
the
built-in
All productions of the form LeftHandSide : SE ( qname ) RightHandSide that were previously added to the grammar upon the first occurrence of the element that has the qname qname are evaluated as follows when they are matched:
|
[Definition:] When no grammar exists for an element occurring in an EXI stream, a built-in element grammar is created for that element. Built-in element grammars are initially generic and are progressively refined as the specific content for the associated element is learned. All built-in element grammars are global element grammars and can be uniquely identified by the qname of the global element they describe. At the outset of processing an EXI stream, the set of built-in element grammars is empty. Below is the initial set of productions used for all newly created built-in element grammars. The semantics describe how productions are added to each built-in element grammar as the content of the associated element is learned.
Syntax | Event Code | ||
---|---|---|---|
StartTagContent : | |||
EE | 0.0 | ||
| 0.1 | ||
NS StartTagContent | 0.2 | ||
SC Fragment | 0.3 | ||
ChildContentItems (0.4) | |||
ElementContent : | |||
EE | 0 | ||
ChildContentItems (1.0) | |||
ChildContentItems (n.m) : | |||
| n . m | ||
CH ElementContent | n .( m +1) | ||
ER ElementContent | n .( m +2) | ||
CM ElementContent | n .( m +3).0 | ||
PI ElementContent | n .( m +3).1 |
Note: |
---|
|
|
Semantics: | |
---|---|
All
productions
in
the
built-in
All productions of the form LeftHandSide : SC Fragment are evaluated as follows:
All
productions
in
the
built-in
All productions of the form LeftHandSide : SE ( qname ) RightHandSide that were previously added to the grammar upon the first occurrence of the element that has the qname qname are evaluated as follows when they are matched:
All
productions
in
the
built-in
All productions in the built-in element grammar of the form LeftHandSide : EE are evaluated as follows:
|
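The way a built-in grammar refines itself can be pictured with a small Python sketch. The individual evaluation steps are not reproduced in this section, so the sketch rests on a stated assumption: when an element is first matched, a production SE( qname ) is added with event code 0 and the first part of every existing event code for the same left-hand side is incremented by one. The function name and the sample productions are hypothetical.

```python
def learn_start_element(productions, qname, right_hand_side):
    """Add SE(qname) with event code 0 and shift the existing codes by one.

    `productions` is a list of (terminal, right-hand-side name, event code)
    for a single left-hand-side non-terminal.
    """
    shifted = [
        (terminal, rhs, (code[0] + 1,) + code[1:])
        for terminal, rhs, code in productions
    ]
    return [(f"SE({qname})", right_hand_side, (0,))] + shifted


# Assumed starting point: a small subset of an element-content grammar.
element_content = [
    ("EE", None, (0,)),
    ("SE(*)", "ElementContent", (1, 0)),
    ("CH", "ElementContent", (1, 1)),
]

for production in learn_start_element(element_content, "item", "ElementContent"):
    print(production)
```

After learning, a repeated `item` element would match the short code 0 instead of the longer code of the generic SE (*) production.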
This section describes the schema-informed grammars used by EXI when schema information is available to describe the contents of the EXI stream . Schema information used for processing an EXI stream is either indicated by the header option schemaID , or communicated out-of-band in the absence of schemaID . Schema-informed grammars are independent of any particular schema language and can be derived from W3C XML Schemas [XML Schema Structures] [XML Schema Datatypes] , RELAX NG schemas [ISO/IEC 19757-2:2003] , DTDs [XML 1.0] [XML 1.1] or other schema languages for describing what is likely to occur in an EXI stream.
Schema-informed grammars accept all XML documents and fragments regardless of whether and how closely they match the schema. The EXI stream encoder encodes individual events using schema-informed grammars where they are available and falls back to the built-in XML grammars where they are not. In general, events for which a schema-informed grammar exists will be encoded more efficiently.
Unlike built-in XML grammars, schema-informed grammars are static and do not evolve, which permits the reuse of schema-informed grammars across the processing of multiple EXI streams. This is a single outstanding difference between the two grammar systems.
It is important to note that schema-informed and built-in grammars are often used together within the context of a single EXI stream . While processing a schema-informed grammar, built-in grammars may be created to represent schema deviations or elements that match wildcards declared in the schema. Even though these built-in grammars occur in the context of a schema-informed stream, they are still dynamic and evolve to represent content learned while processing the EXI stream as described in 8.4 Built-in XML Grammars .
When schema information is available to describe the contents of an EXI stream , the following grammar describes the events that will occur in an EXI document .
Syntax | Event Code | ||||
---|---|---|---|---|---|
Document : | |||||
SD DocContent | 0 | ||||
DocContent : | |||||
SE (G 0 ) DocEnd | 0 | ||||
SE (G 1 ) DocEnd | 1 | ||||
⋮ | ⋮ | ||||
SE
(G
n
|
n
| ||||
|
n
| ||||
DT DocContent |
(
n
| ||||
CM DocContent |
(
n
| ||||
PI DocContent |
(
n
| ||||
| |||||
| |||||
DocEnd : | |||||
ED | 0 | ||||
CM DocEnd | 1.0 | ||||
PI DocEnd | 1.1 |
Semantics: | |
---|---|
|
When schema information is available to describe the contents of an EXI stream , the following grammar describes the events that will occur in an EXI fragment .
Syntax | Event Code | ||||
---|---|---|---|---|---|
Fragment : | |||||
SD FragmentContent | 0 | ||||
FragmentContent : | |||||
SE (F 0 ) FragmentContent | 0 | ||||
SE (F 1 ) FragmentContent | 1 | ||||
⋮ | ⋮ | ||||
SE
(F
n
|
n
| ||||
| n | ||||
|
| ||||
CM FragmentContent |
(
n
| ||||
PI FragmentContent |
(
n
| ||||
| |||||
|
Semantics: | |
---|---|
|
[Definition:] When schema information is available to describe the contents of an EXI stream and more than one element is declared with the same qname , but not all such elements have the same type name and {nillable} property value, the Schema-informed Element Fragment Grammar is used for processing the events that may occur in such elements when they occur inside an EXI fragment or EXI Element Fragment. The schema-informed element fragment grammar consists of ElementFragment and ElementFragmentTypeEmpty which are defined below. ElementFragment is a grammar that accounts for both element declarations and attribute declarations in the schemas, whereas ElementFragmentTypeEmpty is a grammar that regards only attribute declarations.
Syntax | Event Code | ||||
---|---|---|---|---|---|
| |||||
| 0 | ||||
| 1 | ||||
⋮ | ⋮ | ||||
| n −1 | ||||
AT (*) ElementFragment 0 |
n
| ||||
SE
(F
0
)
| n +1 | ||||
SE
(F
1
)
|
n
| ||||
⋮ | ⋮ | ||||
SE
(F
m
-1
)
|
n
+
m
| ||||
| n + m +1 | ||||
EE | n + m +2 | ||||
CH
[untyped value]
|
n
+
m
| ||||
| |||||
SE
(F
0
)
| 0 | ||||
SE
(F
1
)
| 1 | ||||
⋮ | ⋮ | ||||
SE
(F
m
-1
)
| m -1 | ||||
| m | ||||
EE | m +1 | ||||
CH
[untyped value]
| m +2 | ||||
ElementFragmentTypeEmpty 0 : | |||||
AT (A 0 ) [schema-valid value] ElementFragmentTypeEmpty 0 | 0 | ||||
AT (A 1 ) [schema-valid value] ElementFragmentTypeEmpty 0 | 1 | ||||
⋮ | ⋮ | ||||
AT (A n −1 ) [schema-valid value] ElementFragmentTypeEmpty 0 | n −1 | ||||
AT (*) ElementFragmentTypeEmpty 0 | n | ||||
EE | n +1 | ||||
ElementFragmentTypeEmpty 1 : | |||||
EE | 0 | ||||
| |||||
|
Semantics: | |
---|---|
In a schema-informed grammar, all productions of the form LeftHandSide : SE (*) RightHandSide are evaluated as follows:
All productions in the schema-informed element fragment grammar of the form LeftHandSide : AT (*) RightHandSide are evaluated as follows:
|
As with all schema-informed element grammars, the schema-informed element fragment grammar is augmented with additional productions that describe events that may occur in an EXI stream, but are not explicitly declared in the schema. The process for augmenting the grammar is described in 8.5.4.4 Undeclared Productions . For the purposes of this process, the schema-informed element fragment grammar is treated as though it is created from an element declaration with a {nillable} property value of true and a type declaration that has named sub-types, and ElementFragmentTypeEmpty is used to serve as the TypeEmpty of the type in the process.
The content index of each of the grammars ElementFragment and ElementFragmentTypeEmpty is 1 (one).
[Definition:] When one or more XML Schema is available to describe the contents of an EXI stream, a schema-informed element grammar Element i is derived for each element declaration E i described by the schemas, where 0 ≤ i < n and n is the number of element declarations in the schema.
[Definition:] When one or more XML Schema is available to describe the contents of an EXI stream, a schema-informed type grammar Type i is derived for each named type declaration T i described by the schemas as well as for each of the built-in primitive types XS2 and built-in derived types XS2 , the complex ur-type XS1 and the simple ur-type XS2 defined by XML Schema specification [XML Schema Structures] [XML Schema Datatypes] , where 0 ≤ i < n and n is the total number of such available types.
Each schema-informed element grammar and type grammar is constructed according to the following four steps:
Each element grammar Element i includes a sequence of n non-terminals Element i, j , where 0 ≤ j < n . The content of the entire element is described by the first non-terminal Element i, 0 . The remaining non-terminals describe portions of the element content. Likewise, each type grammar Type i includes a sequence of n non-terminals Type i, j and the content of the entire type is described by the first non-terminal Type i, 0 .
The algorithms expressed in this section provide a concise and formal description of the EXI grammars for a given set of XML Schema definitions. More efficient algorithms likely exist for generating these EXI grammars and EXI implementations are free to use any algorithm that produces grammars and event codes that generate EXI encodings that match those produced by the grammars described here.
An example is provided in the appendix (see H Schema-informed Grammar Examples ) that demonstrates the process described in this section to generate a complete schema-informed element grammar from an element declaration in a schema.
This section describes the process for creating the EXI proto-grammars from XML Schema declarations and definitions. EXI proto-grammars differ from normalized EXI grammars in that they may contain productions of the form:
LeftHandSide : | ||
|
where LeftHandSide and RightHandSide are both non-terminals. In contrast, all productions in a normalized EXI grammar contain exactly one terminal symbol and at most one non-terminal symbol on the right-hand side. This is a restricted form of Greibach normal form [Greibach Normal Form] .
EXI proto-grammars are derived from XML Schema in a straightforward manner and can easily be normalized with a simple algorithm (see 8.5.4.2 EXI Normalized Grammars ).
Proto-grammars are specified in a modular, constructive fashion. XML Schema components such as terms, particles and attribute uses are each transformed into a distinct proto-grammar, leveraging the proto-grammars of their sub-components. At various stages of proto-grammar construction, two or more proto-grammars are concatenated one after another to form more composite grammars.
The grammar concatenation operator ⊕ is a binary, associative operator that creates a new grammar from its left and right grammar operands. The new grammar accepts any set of symbols accepted by its left operand followed by any set of symbols accepted by its right operand.
Given a left operand Grammar L and a right operand Grammar R , the following operation
Grammar L ⊕ Grammar R |
creates a combined grammar by replacing each production of the form
Grammar L k : | ||
EE |
where 0 ≤ k < n and n is the number of non-terminals that occur on the left-hand side of productions in Grammar L , with a production of the form
Grammar L k : | ||
Grammar R 0 |
connecting each accept state of Grammar L with the start state of Grammar R .
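The concatenation operator can be sketched in Python under a simple, assumed representation: a grammar is a list of non-terminals, each non-terminal is a list of `(terminal, target)` productions, `target` is the index of the next non-terminal or `None`, an accept production is `("EE", None)`, and a proto-grammar production without a terminal is written `(None, target)`. The function name `concatenate` is hypothetical.

```python
def concatenate(left, right):
    """Return the proto-grammar left ⊕ right."""
    offset = len(left)  # right's non-terminals are appended after left's
    combined = []
    for productions in left:
        rewritten = []
        for terminal, target in productions:
            if terminal == "EE":
                # Replace each accept production of the left operand with a
                # terminal-free production leading to right's start state.
                rewritten.append((None, offset))
            else:
                rewritten.append((terminal, target))
        combined.append(rewritten)
    for productions in right:
        combined.append([
            (terminal, None if target is None else target + offset)
            for terminal, target in productions
        ])
    return combined


left = [[("AT(a)", 1)], [("EE", None)]]
right = [[("CH", 1)], [("EE", None)]]
print(concatenate(left, right))
```

The result still contains terminal-free productions, which is why a separate normalization step is needed before the grammar is used.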
This section describes the process for creating an EXI element grammar from an XML Schema element declaration XS1 .
Given an element declaration E i , with properties {name}, {target namespace}, {type definition}, {scope} and {nillable}, create a corresponding EXI grammar Element i for evaluating the contents of elements in the specified {scope} with qname local-name = {name} and qname uri = {target namespace} where qname is the qname of the elements. Let T j be the {type definition} of E i and Type j be the type grammar created from T j . The grammar Element i describing the content model of E i is created as follows.
Syntax: | ||
---|---|---|
Element
| ||
Type
| ||
Given an XML Schema type definition T i , with properties {name} and {target namespace}, two type grammars are created, which are denoted by Type i and TypeEmpty i . [Definition:] Type i is a grammar that fully reflects the type definition of T i , whereas [Definition:] TypeEmpty i is a grammar that regards only the attribute uses and attribute wildcards of T i , if any .
The grammar Type i is used for evaluating the content of elements that are defined to be of type T i in the schema. [Definition:] Type i is a global type grammar when T i is a named type. Type i , when it is a global type grammar, can additionally be used as the effective grammar designated by a xsi:type attribute with the attribute value that is a qname with local-name = {name} and uri = {target namespace}. TypeEmpty i is used in place of Type i when the element instance that is being evaluated has a xsi:nil attribute with the value true .
[Definition:] For each type grammar Type i , a unique index number content is determined such that all non-terminal symbols of indices smaller than content have at least one AT terminal symbol and the rest of the non-terminal symbols in Type i do not have AT terminal symbols on their right-hand side, where indices are assigned to non-terminal symbols in ascending order with the entry non-terminal symbol of Type i being assigned index 0 (zero). There is also a content index associated with each TypeEmpty i where its value is determined in the same manner as for Type i .
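The content index definition can be read as a simple scan over the non-terminals of a grammar. The Python sketch below assumes a grammar is a list of non-terminals, each a list of `(terminal, target)` productions, and that attribute terminals are spelled with an `AT` prefix; both the representation and the name `content_index` are illustrative assumptions.

```python
def content_index(grammar):
    """Return the index of the first non-terminal without any AT terminal."""
    for index, productions in enumerate(grammar):
        if not any(terminal.startswith("AT") for terminal, _ in productions):
            return index
    return len(grammar)


# A type with one optional attribute followed by character content.
with_attribute = [
    [("AT(id)", 1), ("CH", 2)],
    [("CH", 2)],
    [("EE", None)],
]
# A grammar with no attribute productions at all.
without_attribute = [[("CH", 1)], [("EE", None)]]

print(content_index(with_attribute), content_index(without_attribute))
```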
Sections 8.5.4.1.3.1 Simple Type Grammars and 8.5.4.1.3.2 Complex Type Grammars describe the processes for creating Type i and TypeEmpty i from XML Schema simple type definitions XS1 and complex type definitions XS1 defined in schemas as well as built-in primitive types XS2 , built-in derived types XS2 and simple ur-type XS2 defined by XML Schema specification [XML Schema Datatypes] . Section 8.5.4.1.3.3 Complex Ur-Type Grammar defines the grammar used for processing instances of element contents of type xsd:anyType XS1 .
This section describes the process for creating an EXI type grammar from an XML Schema simple type definition XS1 .
Given a simple type definition T i , with properties {name} and {target namespace}, create two new EXI grammars Type i and TypeEmpty i following the procedure described below. Add the following grammar productions to Type i and TypeEmpty i :
Syntax: | ||
---|---|---|
Type i, 0 : | ||
CH
| ||
Type i, 1 : | ||
EE | ||
TypeEmpty i, 0 : | ||
EE | ||
Note: | |
---|---|
Productions
of
the
form
LeftHandSide
:
CH
|
The content index of grammar Type i and TypeEmpty i created from an XML Schema simple type definition is always 0 (zero).
This section describes the process for creating an EXI type grammar from an XML Schema complex type definition XS1 .
Given a complex type definition T i , with properties {name}, {target namespace}, {attribute uses}, {attribute wildcard} and {content type}, create two EXI grammars Type i and TypeEmpty i following the procedure described below.
Generate a grammar Attribute i , for each attribute use A i in {attribute uses} according to section 8.5.4.1.4 Attribute Uses .
Sort the attribute use grammars first by qname local-name, then by qname uri to form a sequence of grammars G 0 , G 1 , …, G n−1 , where n is the number of attribute uses in {attribute uses}. If an {attribute wildcard} is specified, increment n and generate an additional attribute use grammar G n−1 as follows:
G
| ||
EE |
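The ordering of the attribute use grammars can be sketched as an ordinary two-key sort. In the Python sketch below a qname is assumed to be a `(uri, local_name)` tuple and strings are compared by code point; the function name and the sample values are hypothetical.

```python
def sort_attribute_uses(qnames):
    """Order attribute qnames by local-name first, then by uri."""
    return sorted(qnames, key=lambda qname: (qname[1], qname[0]))


attribute_qnames = [
    ("urn:b", "id"),
    ("", "type"),
    ("urn:a", "id"),
    ("", "class"),
]

for position, qname in enumerate(sort_attribute_uses(attribute_qnames)):
    print(f"G {position}:", qname)
```

Because both encoder and decoder derive the same order from the schema, the position of each attribute grammar in the sequence never has to be transmitted.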
When the {attribute wildcard}'s {namespace constraint} is any , or a pair of not and either a namespace name or the special value absent indicating no namespace, add the following production to each grammar G i generated above, where 0 ≤ i < n :
G i, 0 : | ||
|
Otherwise, that is, when {namespace constraint} is a set of values whose members are namespace names or the special value absent indicating no namespace, add the following production to each grammar G i generated above where 0 ≤ i < n :
G i, 0 : | |||||
AT( uri x : *) G i, 0 | |||||
| |||||
| |||||
|
The grammar TypeEmpty i is created by combining the sequence of attribute use grammars terminated by an empty {content type} grammar as follows:
TypeEmpty i = G 0 ⊕ G 1 ⊕ … ⊕ G n−1 ⊕ Content i |
where the grammar Content i is created as follows:
Content i, 0 : | ||
EE |
The content index of grammar TypeEmpty i is the index of its last non-terminal symbol.
The grammar Type i is generated as follows.
If {content type} is a simple type definition T j , generate a grammar Content i as Type j according to section 8.5.4.1.3.1 Simple Type Grammars . If {content type} has a content model particle, generate a grammar Content i according to section 8.5.4.1.5 Particles . Otherwise, if {content type} is empty , create a grammar Content i as follows:
Content i : | ||
EE |
If {content type} is a content model particle with mixed content, add a production for each non-terminal Content i , j in Content i as follows:
Content i, j : | ||
CH [untyped value] Content i, j |
Note: | |
---|---|
The value of each Characters event that has an [untyped value] is represented as a String (see 7.1.10 String ). |
Then, create a copy H i of each attribute use grammar G i and create the grammar Type i by combining this sequence of attribute use grammars and the Content i grammar using the grammar concatenation operator defined in section 8.5.4.1.1 Grammar Concatenation Operator as follows:
Type i = H 0 ⊕ H 1 ⊕ … ⊕ H n−1 ⊕ Content i |
The content index of grammar Type i created from an XML Schema complex type definition is the index of the first non-terminal symbol of Content i within the context of Type i .
XML Schema [XML Schema Structures] defines a complex ur-type XS1 called xsd:anyType XS1 , which is the default type for declared elements when no type is specified in the declaration. The type xsd:anyType can be used as the type of declared elements in schemas, or as the explicit type given to elements by means of xsi:type attribute in schema-informed EXI streams .
When schemas are available to describe the body of an EXI stream, create the ur-type grammars Type ur-type and TypeEmpty ur-type that are used to process the element contents of type xsd:anyType XS1 as shown below.
| ||
| ||
SE(*)
| ||
EE | ||
CH
| ||
| ||
SE(*)
| ||
EE | ||
CH
| ||
TypeEmpty ur-type, 0 : | ||
AT (*) TypeEmpty ur-type, 0 | ||
EE | ||
TypeEmpty ur-type, 1 : | ||
EE | ||
Semantics: | |
---|---|
In a schema-informed grammar, all productions of the form LeftHandSide : AT (*) RightHandSide are evaluated as follows:
In a schema-informed grammar, all productions of the form LeftHandSide : SE (*) RightHandSide are evaluated as follows:
|
The content index of each of the grammars Type ur-type and TypeEmpty ur-type is 1 (one).
Given an attribute use A i with properties {required} and {attribute declaration}, where {attribute declaration} has properties {name}, {target namespace}, {type definition} and {scope}, generate a new EXI grammar Attribute i for evaluating attributes in the specified {scope} with qname local-name = {name} and qname uri = {target namespace} where qname is the qname of the attributes. Add the following grammar productions to Attribute i :
Attribute i, 0 : | ||
AT( qname ) [schema-typed value] Attribute i, 1 | ||
Attribute i, 1 : | ||
EE |
If the {required} property of A i is false, add the following grammar production to indicate this attribute occurrence may be omitted from the content model.
Attribute i, 0 : | ||
EE |
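A minimal Python sketch of the attribute use grammar follows. It assumes a grammar is a list of non-terminals, each a list of `(terminal, target)` productions with `("EE", None)` as the accept production; the value annotation [schema-typed value] is left out, and the function name is hypothetical.

```python
def attribute_grammar(qname, required):
    """Build Attribute i as a two-state grammar for one attribute use."""
    entry = [(f"AT({qname})", 1)]      # Attribute i, 0 : AT(qname) Attribute i, 1
    if not required:
        entry.append(("EE", None))     # the attribute may be omitted
    return [entry, [("EE", None)]]     # Attribute i, 1 : EE


print(attribute_grammar("id", required=True))
print(attribute_grammar("lang", required=False))
```

The only difference between a required and an optional attribute is the extra EE production on the entry non-terminal.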
Note: | |
---|---|
Productions of the form LeftHandSide : AT( qname ) [schema-typed value] RightHandSide represent typed attributes that occur in schema-valid contexts with values that can be represented using the EXI datatype representation associated with the attribute's {type definition} (see 7. Representing Event Content ). Attributes that occur in schema-valid contexts and that can be represented using the EXI datatype representation associated with the attribute's {type definition} SHOULD be represented this way. Attributes that are not represented this way are represented using the alternate forms of AT events described in section 8.5.4.4 Undeclared Productions . |
Given an XML Schema particle XS1 P i with {min occurs}, {max occurs} and {term} properties, generate a grammar Particle i for evaluating instances of P i as follows.
If {term} is an element declaration, generate the grammar Term 0 according to section 8.5.4.1.6 Element Terms . If {term} is a wildcard, generate the grammar Term 0 according to section 8.5.4.1.7 Wildcard Terms . If {term} is a model group, generate the grammar Term 0 according to section 8.5.4.1.8 Model Group Terms .
Create {min occurs} copies of Term 0 .
G
0
,
G
1
,
…,
G
|
If {max occurs} is not unbounded, create {max occurs} – {min occurs} additional copies of Term 0 ,
G
|
Add the following productions to each of the grammars that do not already have a production of this form.
G i, 0 : | ||
EE
where
|
indicating these instances of Term 0 may be omitted from the content model. Then, create the grammar for Particle i using the grammar concatenation operator defined in section 8.5.4.1.1 Grammar Concatenation Operator as follows:
Particle
i
=
G
0
⊕
G
1
⊕
…
⊕
G
|
Otherwise, if {max occurs} is unbounded, generate one additional copy of Term 0 , G {min occurs} and replace all productions of the form:
G
| ||
EE |
with productions of the form:
G
| ||
G
|
indicating this term may be repeated indefinitely. Then if there is no production of the form:
G
| ||
EE |
add one after the other productions with the non-terminal G {min occurs}, 0 on the left-hand side, indicating this term may be omitted from the content model. Then, create the grammar for Particle i using the grammar concatenation operator defined in section 8.5.4.1.1 Grammar Concatenation Operator as follows:
Particle
i
=
G
0
⊕
G
1
⊕
…
⊕
G
|
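The number and role of the copies of Term 0 that make up a particle grammar can be summarized with a short Python sketch. It does not build the productions themselves; it only records, for each copy, whether this step makes it omissible and whether it loops back on itself. The function name, the dictionary keys and the use of `None` for an unbounded {max occurs} are assumptions of the sketch.

```python
def particle_copies(min_occurs, max_occurs):
    """Describe the copies G 0, G 1, ... of Term 0 used to build Particle i.

    `max_occurs=None` stands for an unbounded {max occurs}.
    """
    # The first {min occurs} copies must each be matched once.
    copies = [{"optional": False, "repeats": False} for _ in range(min_occurs)]
    if max_occurs is None:
        # One extra copy that may be skipped or repeated indefinitely.
        copies.append({"optional": True, "repeats": True})
    else:
        # {max occurs} - {min occurs} extra copies, each of which may be skipped.
        copies.extend(
            {"optional": True, "repeats": False}
            for _ in range(max_occurs - min_occurs)
        )
    return copies


print(particle_copies(1, 3))
print(particle_copies(2, None))
```

A particle with {min occurs} 1 and {max occurs} 3 therefore yields three concatenated copies, while an unbounded particle needs only {min occurs} + 1 copies.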
Given a particle {term} PT i that is an XML Schema element declaration XS1 with properties {name} and {target namespace}, let S be the set of element declarations that directly or indirectly reach the element declaration PT i through the chain of {substitution group affiliation} property of the elements, plus PT i itself if it was not in the set. Sort the element declarations in S lexicographically first by {name} then by {target namespace}, which makes a sorted list of element declarations E 0 , E 1 , … E n−1 where n is the cardinality of S . Then create the grammar ParticleTerm i with the following grammar productions:
Syntax: | ||
---|---|---|
ParticleTerm i, 0 : | ||
SE( qname 0 ) ParticleTerm i, 1 | ||
SE( qname 1 ) ParticleTerm i, 1 | ||
⋮ | ||
SE(
qname
| ||
ParticleTerm i, 1 : | ||
EE | ||
Note: | ||
---|---|---|
In
the
productions
above,
qname
x
(where
0
| ||
Semantics: | ||
---|---|---|
In a schema-informed grammar, all productions of the form LeftHandSide : SE( qname ) RightHandSide are evaluated as follows:
|
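The construction of the sorted element list for an element term can be sketched in Python as a transitive closure followed by a sort. Element declarations are assumed to be `(name, target_namespace)` tuples, and `affiliation` is a hypothetical mapping from each declaration to the head named by its {substitution group affiliation} property.

```python
def element_term_members(head, affiliation):
    """Return the sorted declarations that can substitute for `head`, plus `head`."""
    members = {head}
    changed = True
    while changed:
        changed = False
        for element, target in affiliation.items():
            # Follow the affiliation chain directly or indirectly to `head`.
            if target in members and element not in members:
                members.add(element)
                changed = True
    # Tuples sort by name first, then by target namespace.
    return sorted(members)


affiliation = {
    ("circle", "urn:g"): ("shape", "urn:g"),
    ("disc", "urn:g"): ("circle", "urn:g"),
    ("line", "urn:x"): ("path", "urn:x"),
}

for index, element in enumerate(element_term_members(("shape", "urn:g"), affiliation)):
    print(f"E {index}:", element)
```

Each member of the resulting list contributes one SE( qname ) production to ParticleTerm i, 0.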
Given a particle {term} PT i that is an XML Schema wildcard XS1 with property {namespace constraint}, a grammar that reflects the wildcard definition is created as follows.
Create a grammar ParticleTerm i containing the following grammar production:
ParticleTerm i, 1 : | ||
EE |
When the wildcard's {namespace constraint} is any , or a pair of not and either a namespace name or the special value absent indicating no namespace, add the following production to ParticleTerm i .
ParticleTerm i, 0 : | ||
SE(*) ParticleTerm i, 1 |
Otherwise, that is, when {namespace constraint} is a set of values whose members are namespace names or the special value absent indicating no namespace, add the following production to ParticleTerm i :
ParticleTerm i, 0 : | ||
SE( uri x : *) ParticleTerm i, 1 | ||
for each member value uri x in {namespace constraint}, where 0 ≤ x < n and n is the number of members, provided that it is the empty string (i.e. "") that is used as uri x when the member value is the special value absent . Each uri x is used to augment the uri partition of the String table. Section 7.3.1 String Table Partitions describes how these uri strings are put into the String table for pre-population.
Note that productions of which the right-hand side starts with terminal SE(*) or SE( uri x : *) are only matched if a more specific match for the current event does not exist in the grammar. Terminals SE( uri x : *) are matched before falling back to SE(*) among the productions of the same LeftHandSide , if any.
Semantics: | |
---|---|
In a schema-informed grammar, all productions of the form LeftHandSide : Terminal RightHandSide where Terminal is one of
|
Given a particle {term} PT i that is a model group with {compositor} equal to "sequence" and a list of n {particles} P 0 , P 1 , …, P n−1 , create a grammar ParticleTerm i as follows:
If the value of n is 0, add the following productions to the grammar ParticleTerm i .
ParticleTerm i, 0 : | ||
EE |
Otherwise, generate a sequence of grammars Particle 0 , Particle 1 , …, Particle n−1 corresponding to the list of particles P 0 , P 1 , …, P n−1 according to section 8.5.4.1.5 Particles . Then combine the sequence of grammars using the grammar concatenation operator defined in section 8.5.4.1.1 Grammar Concatenation Operator as follows:
ParticleTerm
|
Given a particle {term} PT i that is a model group with {compositor} equal to "choice" and a list of n {particles} P 0 , P 1 , …, P n−1 , create a grammar ParticleTerm i as follows:
If the value of n is 0, add the following productions to the grammar ParticleTerm i .
ParticleTerm i, 0 : | ||
EE |
Otherwise, generate a sequence of grammar productions Particle 0 , Particle 1 , …, Particle n−1 corresponding to the list of particles P 0 , P 1 , …, P n−1 according to section 8.5.4.1.5 Particles . Then create the grammar ParticleTerm i with the following grammar productions:
ParticleTerm i, 0 : | ||
Particle 0, 0 | ||
Particle 1, 0 | ||
⋮ | ||
Particle
|
indicating the grammar for the term may accept any one of the given {particles}.
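As a rough illustration of the "choice" rule, the sketch below lists the productions of ParticleTerm i, 0 for a given number of particles. The function name and the string naming of non-terminals are assumptions made for this example only; the productions it returns still contain non-terminal-only right-hand sides, which the normalization step removes later.

```python
def choice_productions(n, i=0):
    """Return the productions of ParticleTerm i,0 for a "choice" group of n particles."""
    start = f"ParticleTerm{i},0"
    if n == 0:
        # An empty choice accepts only the end of the element.
        return [(start, "EE")]
    # One production per alternative, each jumping to that particle's first non-terminal.
    return [(start, f"Particle{j},0") for j in range(n)]
```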
Given a particle {term} PT i that is a model group with {compositor} equal to "all" and a list of n {particles} P 0 , P 1 , …, P n−1 , create a grammar ParticleTerm i as follows:
Add the following production to the grammar ParticleTerm i .
ParticleTerm i, 0 : | ||
EE |
If the value of n is not 0, generate a sequence of grammar productions Particle 0 , Particle 1 , …, Particle n−1 corresponding to the list of particles P 0 , P 1 , …, P n−1 according to section 8.5.4.1.5 Particles . Replace all productions of the form:
|
| |
| EE |
with productions of the form:
| ||
|
where 0 ≤ j < n and 0 ≤ k < m , with m denoting the number of non-terminals in the grammar Particle j . Add the following productions to the grammar ParticleTerm i .
| ||
| ||
| ||
⋮ | ||
|
Note:
This section describes the process for converting an EXI proto-grammar generated from an XML Schema in accordance with section 8.5.4.1 EXI Proto-Grammars into an EXI normalized grammar. Each production in an EXI normalized grammar has exactly one non-terminal symbol on the left-hand side and one terminal symbol on the right-hand side followed by at most one non-terminal symbol on the right-hand side. In addition, EXI normalized grammars contain no two grammar productions with the same non-terminal on the left-hand side and the same terminal symbol on the right-hand side. This is a restricted form of Greibach normal form [Greibach Normal Form] .
EXI proto-grammars differ from normalized EXI grammars in that they may contain productions of the form:
LeftHandSide : | ||
|
where LeftHandSide and RightHandSide are both non-terminals. Therefore, the first step of the normalization process focuses on replacing productions in this form with productions that conform to the EXI normalized grammar rules. This process can produce a grammar that has more than one production with the same non-terminal on the left-hand side and the same terminal symbol on the right-hand side. Therefore, the second step focuses on eliminating such productions.
The first step of the normalization process is described in Section 8.5.4.2.1 Eliminating Productions with no Terminal Symbol . The second step is described in section 8.5.4.2.2 Eliminating Duplicate Terminal Symbols . Once these two steps are completed, the grammar will be an EXI normalized grammar.
Given an EXI proto-grammar G i , with non-terminals G i, 0 , G i, 1 , …, G i, n−1 , replace each production of the form:
G i, j : | ||
G
i, k
where
|
with a set of productions:
G i, j : | ||
RHS ( G i, k ) 0 | ||
RHS ( G i, k ) 1 | ||
⋮ | ||
RHS ( G i, k ) m-1 |
where RHS ( G i, k ) 0 , RHS ( G i, k ) 1 , …, RHS ( G i, k ) m-1 represent the right-hand side of each production in G i that has the non-terminal G i, k on the left-hand side and m is the number of such productions.
Among the productions G i, j : RHS ( G i, k ) h where 0 ≤ h < m , remove any whose right-hand side either is identical to the left-hand side, or has previously been replaced while applying the process described in this section to productions with G i, j on the left-hand side.
Repeat this process until there are no more productions of the form:
G i, j : | ||
G
i, k
where
|
in the grammar G i .
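The first normalization step can be sketched as follows. The data model is an assumption made for illustration: a grammar is a dictionary from non-terminal names to lists of right-hand sides, where a plain string stands for a non-terminal-only right-hand side and a tuple `(terminal, next_non_terminal)` stands for a right-hand side that starts with a terminal. The function name is likewise hypothetical.

```python
def eliminate_nonterminal_only(grammar):
    """Replace every production 'G_j : G_k' by the right-hand sides of G_k."""
    result = {}
    for lhs, rhs_list in grammar.items():
        pending = list(rhs_list)
        replaced = set()  # non-terminals already expanded for this left-hand side
        kept = []
        while pending:
            rhs = pending.pop(0)
            if isinstance(rhs, str):
                # Drop right-hand sides identical to the left-hand side, and
                # right-hand sides that were already replaced for this lhs.
                if rhs == lhs or rhs in replaced:
                    continue
                replaced.add(rhs)
                # Substitute the productions of G_k in place of 'G_j : G_k'.
                pending = list(grammar[rhs]) + pending
            else:
                kept.append(rhs)
        result[lhs] = kept
    return result
```

The `replaced` set plays the role of the removal rule above and guarantees termination even when non-terminals refer to each other cyclically. The output may still contain two productions with the same terminal, which is what the second step addresses.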
Given an EXI proto-grammar G i , with non-terminals G i, 0 , G i, 1 , …, G i, n−1 , identify all pairs of productions that have the same non-terminal on the left-hand side and the same terminal symbol on the right-hand side of the form:
G i, j : | ||
Terminal G i, k | ||
Terminal G i, l |
where k ≠ l and Terminal represents a particular terminal symbol and replace them with a single production:
G i, j : | ||
Terminal G i, k ⊔ l |
where G i, k ⊔ l is a distinct non-terminal that accepts the inputs accepted by G i, k and the inputs accepted by G i, l . Here the notation " k ⊔ l " denotes a union set of integers and is used to uniquely identify the index of such a non-terminal.
When G i is a type grammar, if both k and l are smaller than the content index of G i , k ⊔ l is also considered to be smaller than content for index comparison purposes. Otherwise, if either k or l is not smaller than content , k ⊔ l is considered to be larger than content .
If the non-terminal G i, k ⊔ l does not exist, create it as follows:
G i, k ⊔ l : | ||
RHS ( G i, k ) 0 | ||
RHS ( G i, k ) 1 | ||
⋮ | ||
RHS ( G i, k ) m-1 | ||
RHS ( G i, l ) 0 | ||
RHS ( G i, l ) 1 | ||
⋮ | ||
RHS
(
G
i, l
)
|
where RHS ( G i, k ) 0 , RHS ( G i, k ) 1 , …, RHS ( G i, k ) m-1 and RHS ( G i, l ) 0 , RHS ( G i, l ) 1 , …, RHS ( G i, l ) n−1 represent the right-hand side of each production in the grammar G i that has the non-terminals G i, k and G i, l on the left-hand side respectively and m and n are the number of such productions.
Repeat this process until there are no more productions in the grammar G i of the form:
G i, j : | ||
Terminal G i, k | ||
Terminal G i, l |
Then, identify any identical productions of the following form:
G i, j : | ||
Terminal G i, k | ||
Terminal G i, k |
where 0 ≤ k < n , n is the number of productions in G i and Terminal represents a specific terminal symbol, then remove one of them until there are no more productions remaining in the grammar G i of this form.
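A minimal sketch of the second normalization step is given below. It assumes a hypothetical data model in which each non-terminal is identified by a `frozenset` of integer indices (so that k ⊔ l is simply a set union), each right-hand side is a tuple `(terminal, target)`, and `target` is `None` only for productions such as EE that have no following non-terminal. Merging more than two targets at once is a shortcut that is assumed to give the same result as merging pairs repeatedly.

```python
def eliminate_duplicate_terminals(grammar):
    """Merge productions sharing a left-hand side and a terminal symbol."""
    g = {lhs: list(rhs) for lhs, rhs in grammar.items()}
    changed = True
    while changed:
        changed = False
        for lhs in list(g):
            # Group distinct targets by terminal; identical productions collapse here.
            by_terminal = {}
            for terminal, target in g[lhs]:
                targets = by_terminal.setdefault(terminal, [])
                if target not in targets:
                    targets.append(target)
            new_rhs = []
            for terminal, targets in by_terminal.items():
                if len(targets) > 1:
                    # Assumes every target here is a frozenset, never None.
                    union = frozenset().union(*targets)
                    if union not in g:
                        # The union non-terminal accepts what any merged target accepts.
                        g[union] = [p for t in targets for p in g[t]]
                    targets = [union]
                new_rhs.append((terminal, targets[0]))
            if new_rhs != g[lhs]:
                g[lhs] = new_rhs
                changed = True
    return g
```

Newly created union non-terminals are themselves revisited on the next pass, so duplicates introduced by combining right-hand sides are also eliminated.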
This section describes the process for assigning unique event codes to each production in a normalized EXI grammar.
Given a normalized EXI grammar G i , apply the following process to each unique non-terminal G i, j that occurs on the left-hand side of the productions in G i where 0 ≤ j < n and n is the number of such non-terminals in G i .
Sort all productions with G i, j on the left-hand side in the following order:
In step 4 and step 5, the schema order of productions with SE( qname ) and SE( uri x : *) on the right-hand side is determined by the order of the corresponding particles in the schema after any references to named model groups in the schema are expanded in place with the group definitions themselves. A content model of a complex type can be seen as a tree that consists of particles where particles of either element declaration terms or wildcard terms appear as leaves, and the order is assigned to those leaf particles by traversing the tree by depth-first method.
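The schema order described above amounts to a depth-first walk that collects leaf particles. The following sketch assumes a hypothetical `Particle` class in which named model group references have already been expanded in place; none of these names come from this specification.

```python
from dataclasses import dataclass, field


@dataclass
class Particle:
    kind: str                      # "element", "wildcard", or "group"
    label: str = ""                # qname for elements, e.g. "*" for wildcards
    children: list = field(default_factory=list)


def schema_order(root):
    """Return the leaf particles of a content model in depth-first order."""
    order = []

    def visit(particle):
        if particle.kind in ("element", "wildcard"):
            order.append(particle.label)   # leaves receive the next position
        else:
            for child in particle.children:
                visit(child)               # model groups are traversed in order

    visit(root)
    return order
```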
Given the sorted list of productions P 0 , P 1 , … P n with the non-terminal G i, j on the left-hand side, assign event codes to each of the productions as follows:
Productions | Event Code | |
---|---|---|
P 0 | 0 | |
P 1 | 1 | |
⋮ | ⋮ | |
P n | n | |
The normalized element and type grammars generated from a schema describe the sequences of child elements, attributes and character events that may occur in a particular EXI stream. However, there are additional events that may occur in an EXI stream that are not described by the schema, for example events representing comments, processing-instructions, schema deviations, etc.
This section first describes the process for, in cases with strict option value set to false, augmenting the normalized element and type grammars with productions that describe events that may occur in the EXI stream, but are not explicitly declared in the schema. It then describes the way, in cases with strict option value set to true, normalized element and type grammars are supplemented with productions to be prepared for the occurrences of xsi:type and xsi:nil attributes that are permitted by the schema.
In the normalized element and type grammars, terminal symbols AT and CH represent attribute and character events that can be represented by the EXI datatype representations associated with their schema datatypes (see 7. Representing Event Content ). When the strict option value is set to false, additional untyped AT and CH terminal symbols are added that can be used for representing attributes and character events that cannot be represented by the associated EXI datatype representations (e.g., schema-invalid values). The following table shows the notation used for such AT and CH terminals along with their definitions.
| Definition |
---|---|
AT (
qname
|
Terminal
symbol
that
matches
an
attribute
|
|
Terminal
symbol
that
matches
an
attribute
|
CH
|
Terminal
symbol
that
matches
|
This section describes the process for augmenting the normalized grammars when the value of the strict option is false. For each normalized element grammar Element i , create a copy Element i, content2 of Element i, content where the index "content" is the content of the type of the element from which Element i was created. Then, apply the following procedures.
Add the following production to each non-terminal Element i, j that does not already include a production of the form Element i, j : EE, such that 0 ≤ j ≤ content.
Syntax | Event Code | ||
---|---|---|---|
Element i, j : | |||
EE | n . m | ||
where n . m represents the next available event code with length 2. | |||
Let E i be the element declaration from which Element i was created and T k be the {type definition} of E i . Let Type k and TypeEmpty k be the type grammars created from T k (see section 8.5.4.1.3 Type Grammars ). Add the following productions to Element i .
Syntax | Event Code | ||
---|---|---|---|
Element i, 0 : | |||
AT(xsi:type) Element i, 0 | n . m | ||
AT(xsi:nil) Element i, 0 | n .( m +1) | ||
where n . m represents the next available event code with length 2. | |||
Note: | |
---|---|
When xsi:type and/or xsi:nil attributes appear in an element where schema-informed grammars are in effect, they MUST occur before any other attribute events of the same element, with xsi:type placed before xsi:nil when they both occur. | |
Semantics: | |
---|---|
When using schemas, all productions of the form LeftHandSide : AT (xsi:type) RightHandSide are evaluated as follows:
| |
When
using
schemas,
productions
of
the
form
LeftHandSide
:
|
For each non-terminal Element i, j , such that 0 ≤ j ≤ content , with zero or more productions of the following form:
Element i, j : | ||
AT ( qname 0 ) [schema-typed value] NonTerminal 0 | ||
AT ( qname 1 ) [schema-typed value] NonTerminal 1 | ||
⋮ | ||
AT ( qname x -1 ) [schema-typed value] NonTerminal x-1 |
where x represents the number of attributes declared in the schema for this context, add the following productions:
Syntax | Event Code | ||
---|---|---|---|
Element i, j : | |||
| n . m | ||
| n .( m +1).0 | ||
| n .( m +1).1 | ||
⋮ | ⋮ | ||
| n .( m +1).( x -1) | ||
| n .( m +1).( x ) | ||
where
n
.
m
represents
the
next
available
event
code
with
length
| |||
Note: | |||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| |||||||||||||||||
| |||||||||||||||||
|
Semantics: | |
---|---|
|
In
a
schema-informed
grammar,
all
productions
of
the
form
LeftHandSide
:
|
Add the following production to Element i .
Syntax | Event Code | ||
---|---|---|---|
Element i, 0 : | |||
NS Element i, 0 | n . m | ||
where n . m represents the next available event code with length 2. | |||
When the value of the selfContained option is true, add the following production to Element i .
Syntax | Event Code | ||
---|---|---|---|
Element i, 0 : | |||
SC Fragment | n . m | ||
where n . m represents the next available event code with length 2. |
Semantics: | |
---|---|
All productions of the form LeftHandSide : SC Fragment are evaluated as follows:
|
Add the following productions to each non-terminal Element i, j , such that 0 ≤ j ≤ content .
Syntax | Event Code | ||
---|---|---|---|
Element i, j : | |||
| n . m | ||
CH
| n .( m +1) | ||
ER Element i, content2 | n .( m +2) | ||
CM Element i, content2 | n .( m +3).0 | ||
PI Element i, content2 | n .( m +3).1 | ||
where n . m represents the next available event code with length 2. | |||
Note: |
---|
|
Semantics: | |
---|---|
In a schema-informed grammar, all productions of the form LeftHandSide : SE (*) RightHandSide are evaluated as follows:
|
Add the following production to Element i, content2 and to each non-terminal Element i, j that does not already include a production of the form Element i, j : EE, such that content < j < n , where n is the number of non-terminals in Element i .
Syntax | Event Code | ||
---|---|---|---|
Element i, j : | |||
EE | n . m | ||
where n . m represents the next available event code with length 2. | |||
Add the following productions to Element i, content2 and to each non-terminal Element i, j , such that content < j < n , where n is the number of non-terminals in Element i .
Syntax | Event Code | ||
---|---|---|---|
Element i, j : | |||
| n . m | ||
CH
| n .( m +1) | ||
ER Element i, j | n .( m +2) | ||
CM Element i, j | n .( m +3).0 | ||
PI Element i, j | n .( m +3).1 | ||
where n . m represents the next available event code with length 2. | |||
Semantics: | |
---|---|
In a schema-informed grammar, all productions of the form LeftHandSide : SE (*) RightHandSide are evaluated as follows:
|
Apply the process described above for element grammars to each normalized type grammar Type i and TypeEmpty i .
This section describes the process for augmenting the normalized grammars when the value of the strict option is true. For each normalized element grammar Element i , apply the following procedures.
Let E i be the element declaration from which Element i was created and T k be the {type definition} of E i . If T k either has named sub-types or is a simple type definition of which {variety} is union , add the following production to Element i .
Syntax | Event Code | ||
---|---|---|---|
Element i, 0 : | |||
AT(xsi:type) Element i, 0 | n . m | ||
where n . m represents the next available event code with length 2. | |||
Semantics: | |
---|---|
When using schemas, productions of the form LeftHandSide : AT (xsi:type) RightHandSide are evaluated as follows: | |
|
Let Type k and TypeEmpty k be the type grammars created from T k (see section 8.5.4.1.3 Type Grammars ). If the {nillable} property of E i is true, add the following production to Element i .
Syntax | Event Code | ||
---|---|---|---|
Element i, 0 : | |||
AT(xsi:nil) Element i, 0 | n . m | ||
where n . m represents the next available event code with length 2. | |||
Semantics: | |
---|---|
When
using
schemas,
productions
of
the
form
LeftHandSide
:
| |
|
Note: |
---|
|
|
The use of EXI compression increases compactness utilizing additional computational resources. EXI compression combines knowledge of XML with a widely adopted, standard compression algorithm to achieve higher compression ratios than would be achievable by applying compression to the entire stream.
EXI compression is applied when compression is turned on or when alignment is set to pre-compression . Byte-aligned representations of event codes and content items are more amenable to compression algorithms compared to unaligned representations because most compression algorithms operate on series of bytes to identify redundancies in the octets. Therefore, when EXI compression is used, event codes and content items of EXI events are encoded as aligned bytes in accordance with 6.2 Representing Event Codes and 7. Representing Event Content .
EXI compression splits a sequence of EXI events into a number of contiguous blocks of events. Events that belong to the same block are transformed into lower entropy groups of similar values called channels , which are individually well suited for standard compression algorithms. To reduce compression overhead, smaller channels are combined before compressing them, while larger channels are compressed independently. The criteria EXI compression uses to define and combine channels are intentionally simple to facilitate implementation, reduce processing overhead, and avoid the need to encode channel ordering or grouping information in the format. The figure below presents a schematic view of the steps involved in EXI compression.
Figure 9-1. EXI Compression Overview
In the following sections, 9.1 Blocks defines blocks and explains how EXI events are partitioned into blocks. Section 9.2 Channels defines channels, their organization, and how a group of channels correlates to its corresponding block of events. Section 9.3 Compressed Streams describes how some channels are combined as needed in preparation for applying compression algorithms on channels.
EXI compression partitions the sequence of EXI events into a sequence of one or more non-overlapping blocks. Each block preceding the final block contains the minimum set of consecutive events that result in exactly blockSize Attribute (AT) and Character (CH) values in its value channels (see 9.2.2 Value Channels ), where blockSize is the block size of the EXI stream (see 5.4 EXI Options ). The final block contains no more than blockSize Attribute (AT) and Character (CH) values in its value channels.
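The partitioning rule can be sketched as below. The event representation (tuples whose first item is the event type) and the function name are assumptions for illustration. As a simplification, every AT and CH event is counted as contributing one value to a value channel; attribute values that are kept in the structure channel would have to be excluded by the caller.

```python
def partition_into_blocks(events, block_size):
    """Split events into blocks that each close on the block_size-th AT/CH value."""
    blocks, current, count = [], [], 0
    for event in events:
        current.append(event)
        if event[0] in ("AT", "CH"):
            count += 1
            if count == block_size:
                # Minimum set of events reaching block_size values: close the block here.
                blocks.append(current)
                current, count = [], 0
    if current or not blocks:
        blocks.append(current)  # final block holds at most block_size values
    return blocks
```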
Events inside each block are multiplexed into channels. The first channel of each block is the structure channel described in Section 9.2.1 Structure Channel . The remaining channels in each block are value channels described in Section 9.2.2 Value Channels . The diagram below presents an exemplary view of the transformation in which events within a block are multiplexed into channels in one way and channels are demultiplexed into events in the other way.
Figure 9-2. Multiplexing EXI events into channels
The structure channel of each block defines the overall order and structure of the events in that block. It contains the event codes and associated content for each event in the block, except for Attribute (AT) and Character (CH) values , which are stored in the value channels. The value of each xsi:type attribute is stored in the structure channel. The value of each xsi:nil attribute that matches a schema-informed grammar production and has a schema-valid value is also stored in the structure channel. These attribute events are intrinsic to the grammar system and thus are essential in processing the structure channel because their values affect the grammar to be used for processing the rest of the elements on which they appear. All event codes and content in the structure stream occur in the same order as they occur in the EXI event sequence.
The values of the Attribute (AT) and Character (CH) events in each block are organized into separate channels based on the qname of the associated attribute or element. Specifically, the value of each Attribute (AT) event is placed in the channel identified by the qname of the Attribute and the value of each Character (CH) event is placed in the channel identified by the qname of its parent Start Element (SE) event. Each block contains exactly one channel for each distinct element or attribute qname that occurs in the block. The values in each channel occur in the order they occur in the EXI event sequence.
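The multiplexing of a block into a structure channel and value channels can be illustrated as follows. The event tuples and the function name are hypothetical. The sketch assumes the block starts with no open elements (a real processor would carry the open-element stack across blocks) and ignores the xsi:type and xsi:nil cases whose values stay in the structure channel.

```python
def multiplex(block):
    """Split a block of events into a structure channel and per-qname value channels."""
    structure, channels, open_elements = [], {}, []
    for event in block:
        kind = event[0]
        if kind == "SE":                      # ("SE", qname)
            open_elements.append(event[1])
            structure.append(event)
        elif kind == "EE":                    # ("EE",)
            open_elements.pop()
            structure.append(event)
        elif kind == "AT":                    # ("AT", qname, value)
            structure.append(("AT", event[1]))
            channels.setdefault(event[1], []).append(event[2])
        elif kind == "CH":                    # ("CH", value)
            structure.append(("CH",))
            # Character values go to the channel of the parent element's qname.
            channels.setdefault(open_elements[-1], []).append(event[1])
        else:
            structure.append(event)
    return structure, channels
```

Because Python dictionaries preserve insertion order, iterating over `channels` yields the channels in the order in which their first value occurred.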
The channels in a block are further organized into compressed streams. Smaller channels are combined into the same compressed stream, while others are each compressed separately. Below are the rules applied within the scope of a block used to determine the channels to be combined together, the order of the compressed streams and the order amongst the channels that are combined into the same compressed stream.
If the block contains at most 100 values , the block will contain only 1 compressed stream containing the structure channel followed by all of the value channels. The order of the value channels within the compressed stream is defined by the order in which the first value in each channel occurs in the EXI event sequence.
If the block contains more than 100 values , the first compressed stream contains only the structure channel. The second compressed stream contains all value channels that contain at most 100 values . And the remaining compressed streams each contain only one channel, each having more than 100 values . The order of the value channels within the second compressed stream is defined by the order in which the first value in each channel occurs in the EXI event sequence. Similarly, the order of the compressed streams following the second compressed stream in the block is defined by the order in which the first value of the channel inside each compressed stream occurs in the EXI event sequence.
Note:
EXI compression changes the order in which event codes and values are read and written to and from an EXI stream.
When the value of the compression option is set to true, each compressed stream in a block is stored using the standard DEFLATE Compressed Data Format defined by RFC 1951 [IETF RFC 1951] . Otherwise, when the value of the alignment option is set to pre-compression , each compressed stream in a block is stored directly without the DEFLATE algorithm.
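The grouping rules for compressed streams can be sketched with Python's standard `zlib` module, which produces raw RFC 1951 DEFLATE data when given a negative `wbits` value. The function names and the input shape are assumptions: `structure` is the already encoded structure channel and `channels` maps each qname to its list of encoded values, ordered by first occurrence. Omitting the second stream when no small channel exists is also an assumption of this sketch, not something stated above.

```python
import zlib

MAX_SMALL = 100  # threshold on the number of values, as in the rules above


def build_streams(structure, channels):
    """Group a block's channels into the byte streams that will be compressed."""
    joined = {q: b"".join(values) for q, values in channels.items()}
    total = sum(len(values) for values in channels.values())
    if total <= MAX_SMALL:
        # Single stream: structure channel followed by all value channels.
        return [structure + b"".join(joined.values())]
    small = [q for q, values in channels.items() if len(values) <= MAX_SMALL]
    large = [q for q, values in channels.items() if len(values) > MAX_SMALL]
    streams = [structure]
    if small:
        streams.append(b"".join(joined[q] for q in small))
    streams.extend(joined[q] for q in large)  # one stream per large channel
    return streams


def deflate(stream):
    """Compress one stream as raw DEFLATE data (no zlib or gzip wrapper)."""
    compressor = zlib.compressobj(wbits=-15)
    return compressor.compress(stream) + compressor.flush()
```

With the alignment option set to pre-compression, the output of `build_streams` would be written as is, without calling `deflate`.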
[Definition:] A conformant EXI stream consists of a sequence of octets that follows the syntax of EXI stream that is defined in this document.
[Definition:] EXI format provides a way to involve user-defined datatype representations in EXI streams processing, which is an extension point that, when used in conjunction with relevant datatype representations specifications external to this document, leads to the formulation of Extended EXI streams . Conformance of extended EXI streams is relative to the syntax defined by the relevant user-defined datatype representations specifications. The definitions of user-defined datatype representations syntax are out of the scope of this document.
[Definition:] An extended EXI stream is a conformant extended EXI stream if replacing value items represented using user-defined datatype representations with their intrinsic representations would make the stream a conformant EXI stream .
An extended EXI stream described as "EXI stream with regards to datatype representations S " where S is the set of datatype representations can be processed by an EXI stream decoder only if the processor has the shared knowledge about each one of the datatype representations in the set S .
EXI stream decoders MAY fail with an error when they receive an extended EXI stream that uses a user-defined datatype representation that they do not understand.
The structural syntax of EXI streams and extended EXI streams is described by the abstract EXI grammar system defined in this document. Although this document specifies the normative way in which XML Schema schemas are mapped into the EXI grammar system to make schema-informed grammars, EXI allows the use of other schema languages to process EXI streams or extended EXI streams so far as there is a well known EXI grammar binding of the schema language and the binding preserves the semantics part of the EXI grammar system. EXI streams or extended EXI streams generated using schemas of such schema language are still conformant. The definitions of grammar bindings for schema languages other than XML Schema are out of the scope of this document, and each schema language community is encouraged to define its own binding in order to make it possible to harness the utmost efficiency out of EXI when schemas of the language are available.
The conformance of EXI Processors is defined separately for each of the two processor roles, EXI stream encoders and EXI stream decoders ; the conformance of the former is described in terms of the conformance of the EXI streams or extended EXI streams that they produce, while that of the latter is based on the set of format features that EXI stream decoders are prepared for in the processing of conformant EXI streams or conformant extended EXI streams .
An EXI stream encoder is conformant if and only if it is capable of generating conformant EXI streams or conformant extended EXI streams given any input structured data it is made to work on.
On the other hand, EXI stream decoders MUST support all format features described in this document as they are explained, except for the capability of handling Datatype Representation Map which is an optional feature. EXI stream decoders that do not implement the Datatype Representation Map feature MUST report an error with a meaningful message upon encountering a "datatypeRepresentationMap" element while processing EXI options documents in EXI headers .
Both an EXI stream encoder and an EXI stream decoder MAY support only a certain range of values for the EXI header option blockSize . For interoperability between processors, every EXI processor SHOULD at least support the blockSize option value of 1,000,000.
This appendix contains the mappings between the XML Information Set [XML Information Set] model and the EXI format. Starting from the document information item, each information item definition is mapped to its respective unordered set of EXI event types (see Table 4-1 ). The actual order amongst information set items when it is relevant reflects the occurrence order of EXI events or their references in an EXI stream that correlate to the infoset items. As used in the XML Information Set specification, the Infoset property names are shown in square brackets, [thus] .
Note:
As has been prescribed in section 2. Design Principles , EXI is designed to be compatible with the XML Information Set. While this approach is both legitimate and practical for designing a succinct format interoperable with XML family of specifications and technologies, it entails that some lexical constructs of XML not recognized by the XML Information Set are not represented by EXI, either. Examples of such unrepresented lexical constructs of XML include white space outside the document element, white space within tags, the kind of quotation marks (single or double) used to quote attribute values, and the boundaries of CDATA marked sections.
No constructs in EXI format facilitate the representation of [character encoding scheme] , [standalone] and [version] properties which are available in the definition of Document Information Item of XML Information Set (see B.1 Document Information Item ). EXI is made agnostic about [character encoding scheme] and [version] properties as they are in XML Information Set, and considers them to be the properties of XML serializers in use. EXI forgoes [standalone] property because simply having no references to any external markup declarations practically serves the purpose with less complexity.
A document information item maps to a pair of Start Document (SD) and End Document (ED) events with each of its properties subject to further mapping as shown in the following table.
Property | EXI event types |
---|---|
[children] | CM* PI* DT? [SE, EE] |
[document element] | [SE, EE] |
[notations] | Computed based on text content item of DT to which each notation information set item |
[unparsed entities] | Computed based on text content item of DT to which each unparsed entity information set item |
[base URI] | The base URI of the EXI stream |
[character encoding scheme] | N/A |
[standalone] | Not available |
[version] | Not available |
[all declarations processed] | True if all declarations contained directly or indirectly in DT are processed, otherwise false, which is the processor quality as opposed to the information provided by the format. |
An element information item maps to a pair of a Start Element (SE) event and the corresponding End Element (EE) event with each of its properties subject to further mapping as shown in the following table.
Property | EXI event types |
---|---|
[namespace name] | SE |
[local name] | SE |
[prefix] | SE |
[children] | [SE, EE]* PI* CM* CH* ER* |
[attributes] | AT* |
[namespace attributes] | NS* |
[in-scope namespaces] | The namespace information items computed using the [namespace attributes] properties of this information item and its ancestors |
[base URI] | The base URI of the element information item |
[parent] | Computed based on the last SE event encountered that did not get a matching EE event if any, or computed based on the SD event |
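The [parent] computation in the table above can be illustrated with a stack of open elements. The following is a minimal sketch, assuming events are given as hypothetical `(kind, name)` tuples; the tuple layout and the function name are illustrative and not part of the EXI format.

```python
def parents_from_events(events):
    """Return (element name, parent) pairs for every SE event.

    The parent is the last SE event that has not yet received a matching
    EE event, or "SD" when no element is open.
    """
    open_elements = []  # SE events still waiting for their matching EE
    result = []
    for kind, name in events:
        if kind == "SE":
            parent = open_elements[-1] if open_elements else "SD"
            result.append((name, parent))
            open_elements.append(name)
        elif kind == "EE":
            open_elements.pop()
    return result


events = [
    ("SD", None),
    ("SE", "order"),
    ("SE", "product"), ("EE", None),
    ("SE", "product"), ("EE", None),
    ("EE", None),
    ("ED", None),
]
print(parents_from_events(events))
```

The same stack serves the [parent] and [owner element] rows of the other tables in this appendix, since they all refer to the last SE event without a matching EE event.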
An attribute information item maps to an Attribute (AT) event with each of its properties subject to further mapping as shown in the following table.
Property | EXI event types |
---|---|
[namespace name] | AT |
[local name] | AT |
[prefix] | AT |
[normalized value] | The value of AT |
[specified] | True if the item maps to AT, otherwise false |
[attribute type] | Computed based on AT and DT |
[references] | Computed based on [attribute type] and value of AT |
[owner element] | Computed based on the last SE event encountered that did not get a matching EE event |
A processing instruction information item maps to a Processing Instruction (PI) event with each of its properties subject to further mapping as shown in the following table.
Property | EXI event types |
---|---|
[target] | PI |
[content] | PI |
[base URI] | The base URI of the processing instruction information item |
[notation] | Computed based on the availability of the internal DTD subset |
[parent] | Computed based on the last SE event encountered that did not get a matching EE event type |
An unexpanded entity reference information item maps to an Entity Reference (ER) event with each of its properties subject to further mapping as shown in the following table.
Property | EXI event types |
---|---|
[name] | ER |
[system identifier] | Based on the availability of the internal DTD subset |
[public identifier] | Based on the availability of the internal DTD subset |
[declaration base URI] | The base URI of the unexpanded entity reference information item |
[parent] | Computed based on the last SE event encountered that did not get a matching EE event type |
A character information item maps to the individual characters contained in a Characters (CH) event following a SE event that did not get a matching EE event.
Property | EXI event types |
---|---|
[character code] | Each character in CH |
[element content whitespace] | Computed based on [parent] and DT |
[parent] | Computed based on the last SE event encountered that did not get a matching EE event |
A comment information item maps to a Comment (CM) event with each of its properties subject to further mapping as shown in the following table.
Property | EXI event types |
---|---|
[content] | text content item of CM |
[parent] | Computed based on the last SE event encountered that did not get a matching EE event, or the SD event |
A document type declaration information item maps to a DOCTYPE (DT) event with each of its properties subject to further mapping as shown in the following table.
Property | EXI event types |
---|---|
[system identifier] | DT |
[public identifier] | DT |
[children] | Computed based on text content item of DT |
[parent] | Computed based on the SD event |
An unparsed entity information item maps to part of the text content item of a DOCTYPE (DT) event with each of its properties subject to further mapping as shown in the following table.
Property | EXI event types |
---|---|
[name] | Computed based on text content item of DT |
[system identifier] | Computed based on text content item of DT |
[public identifier] | Computed based on text content item of DT |
[declaration base URI] | The base URI of the unparsed entity information item |
[notation name] | Computed based on text content item of DT |
[notation] | Computed based on text content item of DT |
A notation information item maps to part of the text content item of a DOCTYPE (DT) event with each of its properties subject to further mapping as shown in the following table.
Property | EXI event types |
---|---|
[name] | Computed based on text content item of DT |
[system identifier] | Computed based on text content item of DT |
[public identifier] | Computed based on text content item of DT |
[declaration base URI] | The base URI of the notation information item |
A namespace information item maps to a Namespace Declaration (NS) event with each of its properties subject to further mapping as shown in the following table.
Property | EXI event types |
---|---|
[prefix] | NS |
[namespace name] | NS |
The following schema describes the EXI options header. It is designed to produce smaller headers for option combinations used when compactness is critical.
<xsd:schema targetNamespace="http://www.w3.org/2009/exi"
            xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            elementFormDefault="qualified">
  <xsd:element name="header">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="lesscommon" minOccurs="0">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name="uncommon" minOccurs="0">
                <xsd:complexType>
                  <xsd:sequence>
                    <xsd:any namespace="##other" minOccurs="0" maxOccurs="unbounded"
                             processContents="skip" />
                    <xsd:element name="alignment" minOccurs="0">
                      <xsd:complexType>
                        <xsd:choice>
                          <xsd:element name="byte">
                            <xsd:complexType />
                          </xsd:element>
                          <xsd:element name="pre-compress">
                            <xsd:complexType />
                          </xsd:element>
                        </xsd:choice>
                      </xsd:complexType>
                    </xsd:element>
                    <xsd:element name="selfContained" minOccurs="0">
                      <xsd:complexType />
                    </xsd:element>
                    <xsd:element name="valueMaxLength" minOccurs="0">
                      <xsd:simpleType>
                        <xsd:restriction base="xsd:unsignedInt" />
                      </xsd:simpleType>
                    </xsd:element>
                    <xsd:element name="valuePartitionCapacity" minOccurs="0">
                      <xsd:simpleType>
                        <xsd:restriction base="xsd:unsignedInt" />
                      </xsd:simpleType>
                    </xsd:element>
                    <xsd:element name="datatypeRepresentationMap"
                                 minOccurs="0" maxOccurs="unbounded">
                      <xsd:complexType>
                        <xsd:sequence>
                          <!-- schema datatype -->
                          <xsd:any namespace="##other" processContents="skip" />
                          <!-- datatype representation -->
                          <xsd:any processContents="skip" />
                        </xsd:sequence>
                      </xsd:complexType>
                    </xsd:element>
                  </xsd:sequence>
                </xsd:complexType>
              </xsd:element>
              <xsd:element name="preserve" minOccurs="0">
                <xsd:complexType>
                  <xsd:sequence>
                    <xsd:element name="dtd" minOccurs="0">
                      <xsd:complexType />
                    </xsd:element>
                    <xsd:element name="prefixes" minOccurs="0">
                      <xsd:complexType />
                    </xsd:element>
                    <xsd:element name="lexicalValues" minOccurs="0">
                      <xsd:complexType />
                    </xsd:element>
                    <xsd:element name="comments" minOccurs="0">
                      <xsd:complexType />
                    </xsd:element>
                    <xsd:element name="pis" minOccurs="0">
                      <xsd:complexType />
                    </xsd:element>
                  </xsd:sequence>
                </xsd:complexType>
              </xsd:element>
              <xsd:element name="blockSize" minOccurs="0">
                <xsd:simpleType>
                  <xsd:restriction base="xsd:unsignedInt" />
                </xsd:simpleType>
              </xsd:element>
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
        <xsd:element name="common" minOccurs="0">
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name="compression" minOccurs="0">
                <xsd:complexType />
              </xsd:element>
              <xsd:element name="fragment" minOccurs="0">
                <xsd:complexType />
              </xsd:element>
              <xsd:element name="schemaId" minOccurs="0" nillable="true">
                <xsd:simpleType>
                  <xsd:restriction base="xsd:string" />
                </xsd:simpleType>
              </xsd:element>
            </xsd:sequence>
          </xsd:complexType>
        </xsd:element>
        <xsd:element name="strict" minOccurs="0">
          <xsd:complexType />
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
  <!-- Built-in EXI Datatype IDs for use in datatype representation maps -->
  <xsd:simpleType name="base64Binary">
    <xsd:restriction base="xsd:base64Binary"/>
  </xsd:simpleType>
  <xsd:simpleType name="hexBinary">
    <xsd:restriction base="xsd:hexBinary"/>
  </xsd:simpleType>
  <xsd:simpleType name="boolean">
    <xsd:restriction base="xsd:boolean"/>
  </xsd:simpleType>
  <xsd:simpleType name="decimal">
    <xsd:restriction base="xsd:decimal"/>
  </xsd:simpleType>
  <xsd:simpleType name="double">
    <xsd:restriction base="xsd:double"/>
  </xsd:simpleType>
  <xsd:simpleType name="integer">
    <xsd:restriction base="xsd:integer"/>
  </xsd:simpleType>
  <xsd:simpleType name="string">
    <xsd:restriction base="xsd:string"/>
  </xsd:simpleType>
  <xsd:simpleType name="dateTime">
    <xsd:restriction base="xsd:dateTime"/>
  </xsd:simpleType>
  <xsd:simpleType name="date">
    <xsd:restriction base="xsd:date"/>
  </xsd:simpleType>
  <xsd:simpleType name="time">
    <xsd:restriction base="xsd:time"/>
  </xsd:simpleType>
  <xsd:simpleType name="gYearMonth">
    <xsd:restriction base="xsd:gYearMonth"/>
  </xsd:simpleType>
  <xsd:simpleType name="gMonthDay">
    <xsd:restriction base="xsd:gMonthDay"/>
  </xsd:simpleType>
  <xsd:simpleType name="gYear">
    <xsd:restriction base="xsd:gYear"/>
  </xsd:simpleType>
  <xsd:simpleType name="gMonth">
    <xsd:restriction base="xsd:gMonth"/>
  </xsd:simpleType>
  <xsd:simpleType name="gDay">
    <xsd:restriction base="xsd:gDay"/>
  </xsd:simpleType>
  <!-- Qnames reserved for future use in datatype representation maps -->
  <xsd:simpleType name="ieeeBinary32">
    <xsd:restriction base="xsd:float"/>
  </xsd:simpleType>
  <xsd:simpleType name="ieeeBinary64">
    <xsd:restriction base="xsd:double"/>
  </xsd:simpleType>
</xsd:schema>
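As an illustration of the element ordering the schema above imposes, the following sketch builds a small options document with Python's standard `xml.etree.ElementTree`. The helper name `build_options` and its parameters are hypothetical, and the namespace is assumed to be the `http://www.w3.org/2009/exi` target namespace. The sketch only builds an XML tree; it does not produce the EXI-encoded header itself.

```python
import xml.etree.ElementTree as ET

# Assumption: the newer target namespace of the options schema.
EXI_NS = "http://www.w3.org/2009/exi"


def build_options(compression=False, strict=False, schema_id=None):
    """Build a <header> element following the sequence order of the schema."""
    header = ET.Element(f"{{{EXI_NS}}}header")
    # <common> precedes <strict> in the header sequence.
    if compression or schema_id is not None:
        common = ET.SubElement(header, f"{{{EXI_NS}}}common")
        # Inside <common>, <compression> precedes <schemaId>.
        if compression:
            ET.SubElement(common, f"{{{EXI_NS}}}compression")
        if schema_id is not None:
            ET.SubElement(common, f"{{{EXI_NS}}}schemaId").text = schema_id
    if strict:
        ET.SubElement(header, f"{{{EXI_NS}}}strict")
    return header


options = build_options(compression=True, strict=True)
print(ET.tostring(options, encoding="unicode"))
```

Options left at their defaults contribute no elements, which is consistent with the schema's use of `minOccurs="0"` throughout.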
The following table lists the entries that are initially populated in uri partitions, where partition name URI denotes that they are entries in the uri partition.
Partition | String ID | String Value |
---|---|---|
URI | 0 | "" [empty string] |
URI | 1 | "http://www.w3.org/XML/1998/namespace" |
URI | 2 | "http://www.w3.org/2001/XMLSchema-instance" |
When XML Schemas are used to inform the grammars for processing EXI body, there is an additional entry that is appended to the uri partition.
Partition | String ID | String Value |
---|---|---|
URI | 3 | "http://www.w3.org/2001/XMLSchema" |
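The initial uri partition described by the two tables above can be sketched as a simple mapping from string value to string ID. The function name `initial_uri_partition` and the `schema_informed` flag are illustrative assumptions, not names defined by the format.

```python
# Entries that are always present, in string ID order (IDs 0, 1, 2).
INITIAL_URIS = [
    "",
    "http://www.w3.org/XML/1998/namespace",
    "http://www.w3.org/2001/XMLSchema-instance",
]
# Entry appended when XML Schemas inform the grammars (ID 3).
XSD_URI = "http://www.w3.org/2001/XMLSchema"


def initial_uri_partition(schema_informed):
    """Return a dict mapping each initial uri string to its string ID."""
    entries = list(INITIAL_URIS)
    if schema_informed:
        entries.append(XSD_URI)
    return {value: string_id for string_id, value in enumerate(entries)}


print(initial_uri_partition(schema_informed=True))
```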
The following table lists the entries that are initially populated in prefix partitions, where XML-PF represents the partition for prefixes in the "http://www.w3.org/XML/1998/namespace" namespace and XSI-PF represents the partition for prefixes in the "http://www.w3.org/2001/XMLSchema-instance" namespace.
Partition | String ID | String Value |
---|---|---|
"" | 0 | "" [empty string] |
XML-PF | 0 | "xml" |
XSI-PF | 0 | "xsi" |
The following tables list the entries that are initially populated in local-name partitions, where XML-NS represents the partition for local-names in the "http://www.w3.org/XML/1998/namespace" namespace, XSI-NS represents the partition for local-names in the "http://www.w3.org/2001/XMLSchema-instance" namespace, and XSD-NS represents the partition for local-names in the "http://www.w3.org/2001/XMLSchema" namespace.
Partition | String ID | String Value |
---|---|---|
XML-NS | 0 | |
XML-NS | 1 | |
XML-NS | 2 | |
XML-NS | 3 | |
XSI-NS | 0 | |
XSI-NS | 1 | |
When XML Schemas are used to inform the grammars for processing EXI body, there is an additional partition that is appended to the local-name partitions.
Partition | String ID | String Value |
---|---|---|
XSD-NS | 0 | |
XSD-NS | 1 | |
XSD-NS | 2 | |
XSD-NS | 3 | |
XSD-NS | 4 | |
XSD-NS | 5 | |
XSD-NS | 6 | |
XSD-NS | 7 | |
XSD-NS | 8 | |
XSD-NS | 9 | |
XSD-NS | 10 | |
XSD-NS | 11 | |
XSD-NS | 12 | |
XSD-NS | 13 | |
XSD-NS | 14 | |
XSD-NS | 15 | |
XSD-NS | 16 | |
XSD-NS | 17 | |
XSD-NS | 18 | |
XSD-NS | 19 | |
XSD-NS | 20 | |
XSD-NS | 21 | |
XSD-NS | 22 | |
XSD-NS | 23 | |
XSD-NS | 24 | |
XSD-NS | 25 | |
XSD-NS | 26 | |
XSD-NS | 27 | |
XSD-NS | 28 | |
XSD-NS | 29 | |
XSD-NS | 30 | |
XSD-NS | 31 | |
XSD-NS | 32 | |
XSD-NS | 33 | |
XSD-NS | 34 | |
XSD-NS | 35 | |
XSD-NS | 36 | |
XSD-NS | 37 | |
XSD-NS | 38 | |
XSD-NS | 39 | |
XSD-NS | 40 | |
XSD-NS | 41 | |
XSD-NS | 42 | |
XSD-NS | 43 | |
XSD-NS | 44 | |
XSD-NS | 45 | |
XML Schema datatypes specification [XML Schema Datatypes] defines a regular expression syntax for use in pattern facets of simple type definitions. Pattern facets are applied to values literally to constrain the set of valid values to those that lexically match the specified regular expression. This section describes the rules for deriving the set of characters allowed in a string value that conforms to a given regular expression in XML Schema. In the following description, the term "set-of-chars" is used as the shorthand form of "set of characters".
At the top level, the XML Schema regular expression syntax is summarized by the following production excerpted here from [XML Schema Datatypes]. Note the notation used for the numbers that tag the productions. "XSD:" is prefixed to the original numeric tags to make it easier to discern them as belonging to the XML Schema specification.
[XSD:1] regExp ::= branch ( '|' branch )*
The set-of-chars for a regex that conforms to the syntax above is the union of the set-of-chars defined for each branch. Each branch of a regex is described by the following production:
[XSD:2] branch ::= piece*
The set-of-chars for each branch of a regex is the union of the set-of-chars for each piece of the branch. Each piece of a branch is described by the following production:
[XSD:3] piece ::= atom quantifier?
The set-of-chars for each piece of a branch is the set-of-chars for the atom portion of the piece. The atom portion of the piece is described by the following production:
[XSD:9] atom ::= Char | charClass | ( '(' regExp ')' )
The set-of-chars for an atom is the set-of-chars for the Char, charClass or regExp that constitutes the atom. The set-of-chars for a Char that constitutes an atom contains the single character that matches the Char expression. The set-of-chars for a charClass that constitutes an atom is the set of characters specified by the charClass expression. The set-of-chars for a sub-expression enclosed in parenthesis that constitutes an atom is the set-of-chars for the regExp itself derived by recursively applying the rule defined above.
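The recursive derivation described in this section can be sketched for a small subset of the regular expression syntax. The sketch below is an illustration under stated assumptions, not the normative algorithm: it handles only literal characters, alternation, parenthesized sub-expressions, quantifiers, and bracketed groups made of single characters and ranges. Escapes, the wildcard, negated groups and subtraction are rejected with an error. All function names are hypothetical.

```python
def set_of_chars(regex):
    """Return the set of characters that a string matching `regex` may contain."""
    chars, pos = _regexp(regex, 0)
    if pos != len(regex):
        raise ValueError(f"unexpected {regex[pos]!r} at offset {pos}")
    return chars


def _regexp(s, i):
    # regExp ::= branch ( '|' branch )*  -> union over all branches
    chars, i = _branch(s, i)
    while i < len(s) and s[i] == "|":
        more, i = _branch(s, i + 1)
        chars |= more
    return chars, i


def _branch(s, i):
    # branch ::= piece*  -> union over all pieces
    chars = set()
    while i < len(s) and s[i] not in "|)":
        atom_chars, i = _atom(s, i)
        chars |= atom_chars
        i = _skip_quantifier(s, i)  # piece ::= atom quantifier?
    return chars, i


def _atom(s, i):
    c = s[i]
    if c == "(":  # parenthesized sub-expression: apply the rules recursively
        chars, i = _regexp(s, i + 1)
        if i >= len(s) or s[i] != ")":
            raise ValueError("unbalanced parenthesis")
        return chars, i + 1
    if c == "[":
        return _char_group(s, i + 1)
    if c in "\\.?*+{}]":
        raise ValueError(f"construct {c!r} is outside the scope of this sketch")
    return {c}, i + 1  # a Char contributes the single character it matches


def _char_group(s, i):
    if i < len(s) and s[i] == "^":
        raise ValueError("negated groups are outside the scope of this sketch")
    chars = set()
    while i < len(s) and s[i] != "]":
        lo = s[i]
        if lo in "\\[":
            raise ValueError(f"construct {lo!r} is outside the scope of this sketch")
        if i + 2 < len(s) and s[i + 1] == "-" and s[i + 2] != "]":
            # A range such as a-z contributes every character in between.
            chars.update(chr(cp) for cp in range(ord(lo), ord(s[i + 2]) + 1))
            i += 3
        else:
            chars.add(lo)
            i += 1
    if i >= len(s):
        raise ValueError("unterminated character group")
    return chars, i + 1


def _skip_quantifier(s, i):
    # Quantifiers do not change which characters may appear.
    if i < len(s) and s[i] in "?*+":
        return i + 1
    if i < len(s) and s[i] == "{":
        return s.index("}", i) + 1
    return i


print(sorted(set_of_chars("(ab|c)+[x-z]{2}")))
```

Quantifiers are skipped rather than interpreted because the derivation only asks which characters may occur, not how often.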
Two labels are defined for use in identifying and negotiating the use of EXI for representing XML information in higher-level protocols. They serve two distinct roles. One is for content coding and the other is for internet media type.
In such protocols that support a mechanism to indicate the encoding transformation of the data being exchanged, the label "exi" is used as a content coding (see section F.1 Content Coding) in an occurrence of or a request of an XML Information Set interchange of which the document body is encoded as an EXI stream.
For other protocols that lack the capability of indicating the encoding transformation of the data being transferred, the other label "application/exi" is defined as an internet media type (see section F.2 Internet Media Type) in order to identify that the data being retrieved or sent is an XML Information Set represented as an EXI stream.
The content-coding value "exi" is registered with the Internet Assigned Numbers Authority (IANA) for use with EXI. Protocols that can identify and negotiate the content coding of XML information independent of its media type, SHOULD use the content coding "exi" (case-insensitive) to convey the acceptance or actual use of EXI encoding for XML information.
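The case-insensitive comparison of the "exi" content coding can be illustrated with a small helper that inspects an HTTP-style `Accept-Encoding` header value. This is a minimal sketch: the function name `accepts_exi` is hypothetical, and quality values such as `q=0` are deliberately ignored, so it is not a complete content negotiation implementation.

```python
def accepts_exi(accept_encoding):
    """Return True if a comma-separated list of content codings names "exi"."""
    for item in accept_encoding.split(","):
        # Drop any parameters such as ";q=0.8" and surrounding white space.
        coding = item.split(";", 1)[0].strip()
        # The coding name is compared case-insensitively.
        if coding.lower() == "exi":
            return True
    return False


print(accepts_exi("gzip, EXI;q=0.8"))
print(accepts_exi("gzip, deflate"))
```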
A new media type registration "application/exi" described below is being proposed for community review, with the intent to eventually submit it to the IESG for review, approval, and registration with IANA.
Type name: application
Subtype name: exi
Required parameters: none
Optional parameters: none
Encoding considerations: binary
When used as an XML replacement in an application, EXI shares the same security concerns as XML, described in IETF RFC 3023 [IETF RFC 3023] , section 10.
In addition to concerns shared with XML, the schema identifier refers to information external to the EXI document itself. If an attacker is able to substitute another schema in place of the intended one, the semantics of the EXI document could be changed in some ways. As an example, EXI is sensitive to the order of the values in an enumeration. It is not known whether such an attack is possible on the actual structure of the document.
Also, EXI supports user-defined datatype representations, and such representations, if present in a document and purportedly understood by a processor, can be a security weakness. Definitions of these representations are expected to be external, often application- or industry-specific, so any definition needs to be analyzed carefully from the security perspective before being adopted.
The datatype representation map feature of EXI requires coordination between the producer and consumer of an EXI document, and is not recommended except in controlled environments or using standardized datatype representations potentially defined in the future.
EXI permits information necessary to decode a document to be omitted with the expectation that such information has been communicated out of band. Such omissions hinder interoperability in uncontrolled environments.
Efficient XML Interchange (EXI) Format 1.0, World Wide Web Consortium
No known applications currently use this media type.
Field | Value |
---|---|
Magic number(s) | The first four octets may be hexadecimal 24 45 58 49 ("$EXI"). The first octet after these, or the first octet of the whole content if they are not present, has its high two bits set to values 1 and 0 in that order. |
File extension(s) | .exi |
Macintosh file type code(s) | APPL |
Consideration of alternatives | When transferring EXI streams over a protocol that can identify and negotiate the content coding of XML information independent of its media-type, the content-coding should be used to identify and negotiate how the XML information is encoded and the media-type should be used to negotiate and identify what type of information is transferred. |
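The magic number description above translates into a short check on the leading octets of a stream. The following is a sketch; the function name `looks_like_exi` is hypothetical, and a positive result is only a hint, since other binary data may begin with the same bit pattern.

```python
EXI_COOKIE = b"$EXI"  # hexadecimal 24 45 58 49


def looks_like_exi(data: bytes) -> bool:
    """Heuristic check based on the magic number description."""
    # Skip the optional four-octet cookie when it is present.
    rest = data[len(EXI_COOKIE):] if data.startswith(EXI_COOKIE) else data
    if not rest:
        return False
    # The next octet has its two high bits set to 1 and 0, in that order.
    return rest[0] >> 6 == 0b10


print(looks_like_exi(b"$EXI\x80\x00"))
print(looks_like_exi(b"<?xml version='1.0'?>"))
```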
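Person and email address to contact for further information: World Wide Web Consortium <web-human@w3.org>
Intended usage: COMMON
Restrictions on usage: none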
The EXI specification is the product of the World Wide Web Consortium's Efficient XML Interchange Working Group. The W3C has change control over this specification.
EXI Primer [EXI Primer] contains a section that explains the workings of EXI format using simple example documents. Those examples are intended to serve as a tool to confirm the understanding of the EXI format in action by going through encoding and decoding processes step by step.
As an example to exercise the process to produce schema-informed element grammars, consider the following XML Schema fragment declaring two complex-typed elements, <product> and <order>:
<xs:element name="product">
  <xs:complexType>
    <xs:sequence maxOccurs="2">
      <xs:element name="description" type="xs:string" minOccurs="0"/>
      <xs:element name="quantity" type="xs:integer" />
      <xs:element name="price" type="xs:float" />
    </xs:sequence>
    <xs:attribute name="sku" type="xs:string" use="required" />
    <xs:attribute name="color" type="xs:string" use="optional" />
  </xs:complexType>
</xs:element>
<xs:element name="order">
  <xs:complexType>
    <xs:sequence>
      <xs:element ref="product" maxOccurs="unbounded" />
    </xs:sequence>
  </xs:complexType>
</xs:element>
Section H.1 Proto-Grammar Examples guides you through the process of generating EXI proto-grammars from the schema components available in the example schema above. EXI grammars in the normalized form that correspond to the proto-grammars are shown in section H.2 Normalized Grammar Examples. Section H.3 Complete Grammar Examples shows the complete EXI grammars for elements <product> and <order>.
Grammars for element declaration terms "description", "quantity" and "price" are as follows. See section 8.5.4.1.6 Element Terms for the rules used to generate grammars for element terms.
Term_description | ||
---|---|---|
Term_description 0 : | ||
SE( "description" ) Term_description 1 | ||
Term_description 1 : | ||
EE | ||
Term_quantity | ||
---|---|---|
Term_quantity 0 : | ||
SE( "quantity" ) Term_quantity 1 | ||
Term_quantity 1 : | ||
EE | ||
Term_price | ||
---|---|---|
Term_price 0 : | ||
SE( "price" ) Term_price 1 | ||
Term_price 1 : | ||
EE | ||
The grammar for element particle "description" is created based on Term_description given { minOccurs } value of 0 and { maxOccurs } value of 1. See section 8.5.4.1.5 Particles for the rules used to generate grammars for particles.
Particle_description | ||
---|---|---|
Term_description 0 : | ||
SE( "description" ) Term_description 1 | ||
EE | ||
Term_description 1 : | ||
EE | ||
Grammars for element particles "quantity" and "price" are the same as those of their terms ( Term_quantity and Term_price , respectively) because { minOccurs } and { maxOccurs } are both 1.
Particle_quantity | ||
---|---|---|
Term_quantity 0 : | ||
SE( "quantity" ) Term_quantity 1 | ||
Term_quantity 1 : | ||
EE | ||
Particle_price | ||
---|---|---|
Term_price 0 : | ||
SE( "price" ) Term_price 1 | ||
Term_price 1 : | ||
EE | ||
The grammar for the sequence group term in <product> element declaration is created based on the grammars of subordinate particles as follows. See section 8.5.4.1.8.1 Sequence Model Groups for the rules used to generate grammars for sequence groups.
Term_sequence = Particle_description ⊕ Particle_quantity ⊕ Particle_price |
which yields the following grammars for Term_sequence .
Term_sequence | ||
---|---|---|
Term_description 0 : | ||
SE("description") Term_description 1 | ||
Term_quantity 0 | ||
Term_description 1 : | ||
Term_quantity 0 | ||
Term_quantity 0 : | ||
SE("quantity") Term_quantity 1 | ||
Term_quantity 1 : | ||
Term_price 0 | ||
Term_price 0 : | ||
SE("price") Term_price 1 | ||
Term_price 1 : | ||
EE | ||
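The effect of the ⊕ operator seen in the Term_sequence table can be reproduced with a small sketch. It is inferred from the example tables in this section only: every production of the left grammar whose right-hand side is just EE is replaced by a reference to the first non-terminal of the right grammar. The data layout (a start name plus a dict of productions stored as tuples) and the function name `concat` are assumptions made for illustration. The normative definition of the operator is given elsewhere in the specification.

```python
def concat(left, right):
    """Concatenate two grammars given as (start, {non-terminal: [productions]})."""
    left_start, left_rules = left
    right_start, right_rules = right
    merged = {}
    for nonterminal, productions in left_rules.items():
        # A bare EE production in the left grammar continues into the right grammar.
        merged[nonterminal] = [
            (right_start,) if production == ("EE",) else production
            for production in productions
        ]
    merged.update(right_rules)
    return left_start, merged


particle_description = ("Term_description0", {
    "Term_description0": [('SE("description")', "Term_description1"), ("EE",)],
    "Term_description1": [("EE",)],
})
particle_quantity = ("Term_quantity0", {
    "Term_quantity0": [('SE("quantity")', "Term_quantity1")],
    "Term_quantity1": [("EE",)],
})
particle_price = ("Term_price0", {
    "Term_price0": [('SE("price")', "Term_price1")],
    "Term_price1": [("EE",)],
})

term_sequence = concat(concat(particle_description, particle_quantity), particle_price)
for nonterminal, productions in term_sequence[1].items():
    print(nonterminal, productions)
```

Only the last grammar in the chain keeps its EE production, which matches the single EE under Term_price 1 in the table above.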
The grammar for the particle that is the content model of element <product> is created based on Term_sequence (shown above) given { minOccurs } value of 1 and { maxOccurs } value of 2. See section 8.5.4.1.5 Particles for the rules used to generate grammars for particles.
Particle_sequence | ||
---|---|---|
Term_description 0,0 : | ||
SE("description") Term_description 0,1 | ||
Term_quantity 0,0 | ||
Term_description 0,1 : | ||
Term_quantity 0,0 | ||
Term_quantity 0,0 : | ||
SE("quantity") Term_quantity 0,1 | ||
Term_quantity 0,1 : | ||
Term_price 0,0 | ||
Term_price 0,0 : | ||
SE("price") Term_price 0,1 | ||
Term_price 0,1 : | ||
Term_description 1,0 | ||
Term_description 1,0 : | ||
SE("description") Term_description 1,1 | ||
Term_quantity 1,0 | ||
EE | ||
Term_description 1,1 : | ||
Term_quantity 1,0 | ||
Term_quantity 1,0 : | ||
SE("quantity") Term_quantity 1,1 | ||
Term_quantity 1,1 : | ||
Term_price 1,0 | ||
Term_price 1,0 : | ||
SE("price") Term_price 1,1 | ||
Term_price 1,1 : | ||
EE | ||
Grammars for attribute uses of attributes "sku" and "color" are as follows. See section 8.5.4.1.4 Attribute Uses for the rules used to generate grammars for attribute uses.
Use_sku | ||
---|---|---|
Use_sku 0 : | ||
AT("sku") [schema-typed value] Use_sku 1 | ||
Use_sku 1 : | ||
EE | ||
Use_color | ||
---|---|---|
Use_color 0 : | ||
AT("color") [schema-typed value] Use_color 1 | ||
EE | ||
Use_color 1 : | ||
EE | ||
Note the subtle difference between the forms of the two grammars Use_sku and Use_color. At the outset of the grammars, only Use_color contains a production of which the right-hand side starts with EE, which is the result of the difference in their occurrence requirement defined in the schema.
Finally, the grammar for the element <product> is created based on the grammars of its attribute uses and content model particle as follows. See section 8.5.4.1.3.2 Complex Type Grammars for the rules used to generate grammars for complex types.
ProtoG_ProductElement = Use_color ⊕ Use_sku ⊕ Particle_sequence |
which yields the following grammar for element <product>.
ProtoG_ProductElement | ||
---|---|---|
Use_color 0 : | ||
AT("color") [schema-typed value] Use_color 1 | ||
Use_sku 0 | ||
Use_color 1 : | ||
Use_sku 0 | ||
Use_sku 0 : | ||
AT("sku") [schema-typed value] Use_sku 1 | ||
Use_sku 1 : | ||
Term_description 0,0 | ||
Term_description 0,0 : | ||
SE("description") Term_description 0,1 | ||
Term_quantity 0,0 | ||
Term_description 0,1 : | ||
Term_quantity 0,0 | ||
Term_quantity 0,0 : | ||
SE("quantity") Term_quantity 0,1 | ||
Term_quantity 0,1 : | ||
Term_price 0,0 | ||
Term_price 0,0 : | ||
SE("price") Term_price 0,1 | ||
Term_price 0,1 : | ||
Term_description 1,0 | ||
Term_description 1,0 : | ||
SE("description") Term_description 1,1 | ||
Term_quantity 1,0 | ||
EE | ||
Term_description 1,1 : | ||
Term_quantity 1,0 | ||
Term_quantity 1,0 : | ||
SE("quantity") Term_quantity 1,1 | ||
Term_quantity 1,1 : | ||
Term_price 1,0 | ||
Term_price 1,0 : | ||
SE("price") Term_price 1,1 | ||
Term_price 1,1 : | ||
EE | ||
The other element declaration <order> can be processed to generate its proto-grammar in a similar fashion, as follows.
The grammar for element particle "product" is created based on Term_product given { minOccurs } value of 1 and { maxOccurs } value of unbounded. See section 8.5.4.1.5 Particles for the rules used to generate grammars for particles.
Particle_product (before simplification) | ||
---|---|---|
Term_product 0,0 : | ||
SE("product") Term_product 0,1 | ||
Term_product 0,1 : | ||
Term_product 1,0 | ||
Term_product 1,0 : | ||
SE("product") Term_product 1,1 | ||
EE | ||
Term_product 1,1 : | ||
Term_product 1,0 | ||
In the above grammars, the two grammars Term_product 0,1 and Term_product 1,1 are redundant because they serve no other purpose than simply relaying one non-terminal to another. Though it is not required, the uses of non-terminals Term_product 0,1 and Term_product 1,1 are each replaced by Term_product 1,0, which produces the following simplified proto-grammars.
Particle_product (after simplification) | ||
---|---|---|
Term_product 0,0 : | ||
SE("product") Term_product 1,0 | ||
Term_product 1,0 : | ||
SE("product") Term_product 1,0 | ||
EE | ||
The proto-grammar of the element <order> equates to Particle_product because the type definition of element <order> has no attribute uses, and its content model has both { minOccurs } and { maxOccurs } property values of 1 where the element particle "product" is the sole member of the content model.
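The optional simplification applied to Particle_product can be sketched as follows. The sketch treats a non-terminal as a relay when its only production carries no terminal symbol, and redirects every reference to a relay to the relay's target. The dictionary representation and the function name `remove_relays` are illustrative assumptions, not part of the format definition.

```python
# A grammar maps each non-terminal to an ordered list of productions.
# A production is (terminal, next_non_terminal); a terminal of None marks a
# production that only relays to another non-terminal.

def remove_relays(grammar, start):
    """Drop relay non-terminals and point their users at the relay target."""
    relays = {
        lhs: prods[0][1]
        for lhs, prods in grammar.items()
        if lhs != start and len(prods) == 1 and prods[0][0] is None
    }

    def resolve(non_terminal):
        # Follow chains of relays; `seen` guards against a relay cycle.
        seen = set()
        while non_terminal in relays and non_terminal not in seen:
            seen.add(non_terminal)
            non_terminal = relays[non_terminal]
        return non_terminal

    return {
        lhs: [(t, None if nxt is None else resolve(nxt)) for t, nxt in prods]
        for lhs, prods in grammar.items()
        if lhs not in relays
    }

# Particle_product before simplification, as tabulated above.
particle_product = {
    "Term_product 0,0": [('SE("product")', "Term_product 0,1")],
    "Term_product 0,1": [(None, "Term_product 1,0")],
    "Term_product 1,0": [('SE("product")', "Term_product 1,1"), ("EE", None)],
    "Term_product 1,1": [(None, "Term_product 1,0")],
}

simplified = remove_relays(particle_product, "Term_product 0,0")
for lhs, prods in simplified.items():
    print(lhs, ":", prods)
```

Only Term_product 0,0 and Term_product 1,0 remain, and both SE("product") productions lead to Term_product 1,0.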
The element proto-grammars ProtoG_ProductElement and ProtoG_OrderElement produced in the previous section can be turned into their normalized forms which are shown below with an event code assigned to each production. See section 8.5.4.2 EXI Normalized Grammars for the process that converts proto-grammars into normalized grammars, and section 8.5.4.3 Event Code Assignment for the rules that determine the event codes of productions in normalized grammars.
NormG_ProductElement | |||
---|---|---|---|
Event Code | |||
Use_color 0 : | |||
AT("color") [schema-typed value] Use_color 1 | 0 | ||
AT("sku") [schema-typed value] Use_sku 1 | 1 | ||
Use_color 1 : | |||
AT("sku") [schema-typed value] Use_sku 1 | 0 | ||
Use_sku 1 : | |||
SE("description") Term_description 0,1 | 0 | ||
SE("quantity") Term_quantity 0,1 | 1 | ||
Term_description 0,1 : | |||
SE("quantity") Term_quantity 0,1 | 0 | ||
Term_quantity 0,1 : | |||
SE("price") Term_price 0,1 | 0 | ||
Term_price 0,1 : | |||
SE("description") Term_description 1,1 | 0 | ||
SE("quantity") Term_quantity 1,1 | 1 | ||
EE | 2 | ||
Term_description 1,1 : | |||
SE("quantity") Term_quantity 1,1 | 0 | ||
Term_quantity 1,1 : | |||
SE("price") Term_price 1,1 | 0 | ||
Term_price 1,1 : | |||
EE | 0 | ||
NormG_OrderElement | |||
---|---|---|---|
Event Code | |||
Term_product 0,0 : | |||
SE("product") Term_product 1,0 | 0 | ||
Term_product 1,0 : | |||
SE("product") Term_product 1,0 | 0 | ||
EE | 1 | ||
Note that some productions that were present in the proto-grammars have been removed from the normalized grammars. Those productions were culled upon completion of grammar normalization because their left-hand-side non-terminals are not referenced from the right-hand side of any available production, and those non-terminals are not the first non-terminal of the grammar they belong to.
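The effect of normalization and culling on ProtoG_ProductElement can be reproduced with a small sketch. It makes three simplifying assumptions: productions without a terminal symbol are eliminated by substituting the productions of the non-terminal they point to, no cycle of such productions exists, and event codes follow the resulting production order. The last assumption happens to match this example, whereas section 8.5.4.3 defines the actual ordering rules. Duplicate terminal symbols are not handled. The function name `normalize` and the data layout are illustrative.

```python
# A production is (terminal, next_non_terminal); terminal None means the
# production has no terminal symbol. ("EE", None) ends the element.
EE = ("EE", None)

proto_product = {
    "Use_color 0": [('AT("color")', "Use_color 1"), (None, "Use_sku 0")],
    "Use_color 1": [(None, "Use_sku 0")],
    "Use_sku 0": [('AT("sku")', "Use_sku 1")],
    "Use_sku 1": [(None, "Term_description 0,0")],
    "Term_description 0,0": [('SE("description")', "Term_description 0,1"),
                             (None, "Term_quantity 0,0")],
    "Term_description 0,1": [(None, "Term_quantity 0,0")],
    "Term_quantity 0,0": [('SE("quantity")', "Term_quantity 0,1")],
    "Term_quantity 0,1": [(None, "Term_price 0,0")],
    "Term_price 0,0": [('SE("price")', "Term_price 0,1")],
    "Term_price 0,1": [(None, "Term_description 1,0")],
    "Term_description 1,0": [('SE("description")', "Term_description 1,1"),
                             (None, "Term_quantity 1,0"), EE],
    "Term_description 1,1": [(None, "Term_quantity 1,0")],
    "Term_quantity 1,0": [('SE("quantity")', "Term_quantity 1,1")],
    "Term_quantity 1,1": [(None, "Term_price 1,0")],
    "Term_price 1,0": [('SE("price")', "Term_price 1,1")],
    "Term_price 1,1": [EE],
}

def normalize(proto, start):
    def expand(non_terminal):
        # Replace terminal-free productions by the productions they lead to.
        out = []
        for terminal, target in proto[non_terminal]:
            if terminal is None:
                out.extend(expand(target))
            else:
                out.append((terminal, target))
        return out

    expanded = {nt: expand(nt) for nt in proto}

    # Keep only non-terminals reachable from the first non-terminal.
    reachable, todo = set(), [start]
    while todo:
        nt = todo.pop()
        if nt not in reachable:
            reachable.add(nt)
            todo.extend(t for _, t in expanded[nt] if t is not None)

    # Number the surviving productions in order (assumed event codes).
    return {nt: list(enumerate(expanded[nt])) for nt in proto if nt in reachable}

norm_product = normalize(proto_product, "Use_color 0")
for lhs, prods in norm_product.items():
    print(lhs, ":", prods)
```

With this input, seven non-terminals (such as Use_sku 0 and Term_description 1,0) are culled because nothing refers to them once the terminal-free productions are gone, leaving the nine non-terminals listed in NormG_ProductElement.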
The normalized grammars NormG_ProductElement and NormG_OrderElement are augmented with undeclared productions to become complete grammars. See section 8.5.4.4 Undeclared Productions for the process to augment normalized grammars with productions for accepting terminal symbols not declared in schemas. The complete grammars for elements <product> and <order> are shown below. Note that the default grammar settings (i.e. the settings that can be described by an empty header options document <exi:header/>) are used for the sake of this augmentation process, and those productions that accept ER, NS, CM and PI have been pruned according to the rules described in section 8.3 Pruning Unneeded Productions, since those terminal symbols are not preserved in the default grammar settings.
Complete grammar for element <product> | |||
---|---|---|---|
Event Code | |||
Use_color 0 : | |||
AT("color") [schema-typed value] Use_color 1 | 0 | ||
AT("sku") [schema-typed value] Use_sku 1 | 1 | ||
EE | 2.0 | ||
AT(xsi:type) Use_color 0 | 2.1 | ||
AT(xsi:nil) Use_color 0 | 2.2 | ||
AT(*) Use_color 0 | 2.3 | ||
AT("color") [untyped value] Use_color 1 | 2.4.0 | ||
AT("sku") [untyped value] Use_sku 1 | 2.4.1 | ||
AT(*) [untyped value] Use_color 0 | 2.4.2 | ||
SE(*) Use_sku 1_copied | 2.5 | ||
CH [untyped value] Use_sku 1_copied | 2.6 | ||
Use_color 1 : | |||
AT("sku") [schema-typed value] Use_sku 1 | 0 | ||
EE | 1.0 | ||
AT(*) Use_color 1 | 1.1 | ||
AT("sku") [untyped value] Use_sku 1 | 1.2.0 | ||
AT(*) [untyped value] Use_color 1 | 1.2.1 | ||
SE(*) Use_sku 1_copied | 1.3 | ||
CH [untyped value] Use_sku 1_copied | 1.4 | ||
Use_sku 1 : | |||
SE("description") Term_description 0,1 | 0 | ||
SE("quantity") Term_quantity 0,1 | 1 | ||
EE | 2.0 | ||
AT(*) Use_sku 1 | 2.1 | ||
AT(*) [untyped value] Use_sku 1 | 2.2.0 | ||
SE(*) Use_sku 1_copied | 2.3 | ||
CH [untyped value] Use_sku 1_copied | 2.4 | ||
Use_sku 1_copied : | |||
SE("description") Term_description 0,1 | 0 | ||
SE("quantity") Term_quantity 0,1 | 1 | ||
EE | 2.0 | ||
SE(*) Use_sku 1_copied | 2.1 | ||
CH [untyped value] Use_sku 1_copied | 2.2 | ||
Term_description 0,1 : | |||
SE("quantity") Term_quantity 0,1 | 0 | ||
EE | 1 | ||
SE(*) Term_description 0,1 | 2.0 | ||
CH [untyped value] Term_description 0,1 | 2.1 | ||
Term_quantity 0,1 : | |||
SE("price") Term_price 0,1 | 0 | ||
EE | 1 | ||
SE(*) Term_quantity 0,1 | 2.0 | ||
CH [untyped value] Term_quantity 0,1 | 2.1 | ||
Term_price 0,1 : | |||
SE("description") Term_description 1,1 | 0 | ||
SE("quantity") Term_quantity 1,1 | 1 | ||
EE | 2 | ||
SE(*) Term_price 0,1 | 3.0 | ||
CH [untyped value] Term_price 0,1 | 3.1 | ||
Term_description 1,1 : | |||
SE("quantity") Term_quantity 1,1 | 0 | ||
EE | 1 | ||
SE(*) Term_description 1,1 | 2.0 | ||
CH [untyped value] Term_description 1,1 | 2.1 | ||
Term_quantity 1,1 : | |||
SE("price") Term_price 1,1 | 0 | ||
EE | 1 | ||
SE(*) Term_quantity 1,1 | 2.0 | ||
CH [untyped value] Term_quantity 1,1 | 2.1 | ||
Term_price 1,1 : | |||
EE | 0 | ||
SE(*) Term_price 1,1 | 1.0 | ||
CH [untyped value] Term_price 1,1 | 1.1 | ||
Complete grammar for element <order> | |||
---|---|---|---|
Event Code | |||
Term_product 0,0 : | |||
SE("product") Term_product 1,0 | 0 | ||
EE | 1.0 | ||
AT(xsi:type) Term_product 0,0 | 1.1 | ||
AT(xsi:nil) Term_product 0,0 | 1.2 | ||
AT(*) Term_product 0,0 | 1.3 | ||
AT(*) [untyped value] Term_product 0,0 | 1.4.0 | ||
SE(*) Term_product 0,0_copied | 1.5 | ||
CH [untyped value] Term_product 0,0_copied | 1.6 | ||
Term_product 0,0_copied : | |||
SE("product") Term_product 1,0 | 0 | ||
EE | 1.0 | ||
SE(*) Term_product 0,0_copied | 1.1 | ||
CH [untyped value] Term_product 0,0_copied | 1.2 | ||
Term_product 1,0 : | |||
SE("product") Term_product 1,0 | 0 | ||
EE | 1 | ||
SE(*) Term_product 1,0 | 2.0 | ||
CH [untyped value] Term_product 1,0 | 2.1 | ||
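The pattern followed by the later content non-terminals in the complete grammars above (for example Term_description 0,1, Term_price 0,1 and Term_product 1,0) can be sketched as follows. The sketch assumes the default grammar settings, keeps the declared productions with their one-part event codes, adds EE with the next one-part code when it is absent, and then appends SE(*) and CH [untyped value] under a shared two-part prefix. It does not cover the first non-terminals of an element or the attribute states, which receive additional productions as described in section 8.5.4.4. The function name `augment_content` and the tuple layout are illustrative assumptions.

```python
def augment_content(name, productions):
    """Sketch: add undeclared productions to a content non-terminal.

    `productions` lists (terminal, next_non_terminal) in event-code order.
    Returns (terminal, next_non_terminal, event_code) triples.
    """
    # Declared productions keep their one-part event codes 0..n-1.
    result = [(t, nxt, str(code)) for code, (t, nxt) in enumerate(productions)]
    next_code = len(productions)

    # EE is added with the next one-part code if the schema does not allow it.
    if not any(t == "EE" for t, _ in productions):
        result.append(("EE", None, str(next_code)))
        next_code += 1

    # Undeclared content loops back to the same non-terminal.
    result.append(("SE(*)", name, f"{next_code}.0"))
    result.append(("CH [untyped value]", name, f"{next_code}.1"))
    return result

# Term_description 0,1 from NormG_ProductElement: EE is not declared.
term_description = augment_content(
    "Term_description 0,1",
    [('SE("quantity")', "Term_quantity 0,1")],
)

# Term_price 1,1 from NormG_ProductElement: EE is already declared.
term_price = augment_content("Term_price 1,1", [("EE", None)])

for row in term_description + term_price:
    print(row)
```

For Term_description 0,1 this yields the codes 0, 1, 2.0 and 2.1, and for Term_price 1,1 the codes 0, 1.0 and 1.1, in agreement with the corresponding rows of the complete grammar for element <product>.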
This document is the work of the Efficient XML Interchange (EXI) WG.
Members of the Working Group are (at the time of writing, sorted alphabetically by last name):
The EXI Working Group would like to acknowledge the following former members of the group for their leadership, guidance and expertise they provided throughout their individual tenure in the WG. (sorted alphabetically by last name)
The EXI working group owes so much to our distinguished colleague from Nokia, Kimmo Raatikainen (1955-2008), on the progress of our work, who succumbed to an ailment on March 13, 2008. His breadth of knowledge, depth of insight, ingenuity and courage to speak up constantly shed a light onto us whenever the group seemed to stray into a futile path of disagreements during the course. We shall never forget and will always appreciate his presence in us, and great contribution that is omnipresent in every aspect of our work throughout.