W3C

Disposition of comments for the Efficient Extensible Interchange Working Group

Not all comments have been marked as replied to. The disposition of comments is not complete.

In the table below, red in the WG decision column indicates that the Working Group didn't agree with the comment, green indicates that it agreed with it, and yellow reflects an in-between situation.

In the "Commentor reply" column, red indicates the commenter objected to the WG resolution, green indicates approval, and yellow means the commenter didn't respond to the request for feedback.

Commentor | Comment | Working Group decision | Commentor reply
LC-3069 John Schneider <john.schneider@agiledelta.com> (archived comment)
1. Architecture & Design: The specification defines canonical EXI with
respect to an input EXI stream. This limits one’s ability to use canonical
EXI with traditional XML or other XML Infoset representations and creates
a poor architectural fit with the rest of the XML stack of technologies
that are defined with respect to the XML Infoset. The strict dependency
on an EXI input stream, the EXI options document and the EXI schemaId
creates intrinsic incompatibilities with XML, which does not support
these EXI-specific artifacts. This leads to practical implementation
problems, such as the inability for canonical EXI to support digital
signatures through XML intermediary nodes, which you identified at the
end of section A.1.

To be useful in all XML contexts and with all XML technologies, EXI
canonicalization must be defined with respect to the XML Infoset. We
recommend you update the specification to define canonical EXI with
respect to a given XML Infoset, a given XML Schema and a given set of
EXI options. The schema and EXI options may be provided any number of
ways, as you describe well in section C.2. As with EXI, the user should
be allowed to embed these in the EXI header when it is advantageous,
but should not be required to do so when it is not. Mandating the
inclusion of the EXI options and a schemaID in every message is at
odds with EXI’s efficiency objectives and makes it onerous to use
canonical EXI as a transmission format. As you point out in section
C.1., using canonical EXI as a transmission format can eliminate
the need to perform [redundant] canonicalization at the receiver —
further increasing efficiency. We have users that currently employ
canonical EXI this way and it is very advantageous to them. However,
requiring the EXI options and schemaId in every message would quickly
overwhelm the benefits of using canonical EXI as a transmission format.

5. Section 4: As stated above, to be useful in all XML contexts and with
all XML technologies, EXI canonicalization must be defined with respect to
a given XML Infoset rather than a given EXI stream. The semantics of the
specification should be specified with respect to a given XML Infoset,
a given XML Schema and a given set of EXI options (independent on how
these are acquired).

15. Section A.1: The second paragraph states that Canonical EXI deals with
EXI documents. As alluded in the third paragraph of this section, this is
not strictly true. Canonical EXI should be usable with and provide benefits
to XML, EXI or any other XML Infoset representation. However, as stated
earlier in these comments, canonical EXI must be defined with respect to
the XML Infoset rather than an EXI input document to achieve this. Defining
EXI canonicalization with respect to only EXI is limiting and fails to
realize the full potential of the technology.

The last sentence in this section also states that it is not possible to
use XML on intermediary nodes when Canonical EXI has been used for signing.
This is a limitation of the current specification and not of canonical EXI
in general. If you define canonical EXI with regard to a given XML Infoset,
XML Schema and given set of EXI options and ensure all EXI nodes use the
same XML Schema and EXI options, this limitation goes away. As stated earlier,
there are more reliable and efficient ways to ensure cooperating nodes use
the same XML Schemas and EXI options than including the EXI options document
and schemaId in every message. And these methods do not fail when transcoding
to XML because they do not depend on the XML/EXI message for the schema and
EXI options. The reason the current specification fails in this regard is
because it depends strictly on the EXI document to carry the options and
schemaId and transcoding to XML loses this information. As discussed earlier,
this is a design flaw that should be fixed.
We agree with your comment that Canonical EXI should be based on XML Infoset and changed the specification accordingly.
yes
LC-3070 John Schneider <john.schneider@agiledelta.com> (archived comment)
4. Section 3: As mentioned above, making the EXI options document and
the EXI schemaId mandatory in every canonical EXI document is at odds
with the efficiency objectives of EXI. In many or perhaps even most
use cases that require efficiency, these can be (and are) provided
out of band or specified by a higher-level protocol. As such, including
them in every canonical EXI message introduces unnecessary overhead
and provides no value since all cooperating nodes already have this
information.

Furthermore, forcing the inclusion of a schemaId in every message does
not actually solve the problem of ensuring the sender and receiver use
the same schemas. The EXI schemaId is not guaranteed to be unique and
would be easy for a sender and receiver to end up using the same schemaId
for two different versions of the same schema or even two completely
different schemas (breaking any signature that depends on schemaId).
There are more reliable ways to ensure senders and receivers are using
the same schemas for encoding/decoding EXI documents. This problem is
not unique to EXI canonicalization and the EXI canonicalization
specification should not force a specific, sub-optimal solution on EXI
users. As with EXI, users should be allowed to use the EXI options
document and schemaId to address this issue, but they should not be
forced to do so if they have a better, more efficient solution that
is already working.
The WG acknowledges there is a desire and need to allow applications
to choose whether they use header options and schemaId in the canonical form.
yes
LC-3073 John Schneider <john.schneider@agiledelta.com> (archived comment)
8. Section 4.2.2: The meaning of this section is not entirely clear.
Presumably, it is not possible with the current EXI specification to
use a production that is not capable of representing the content value
(by definition). Are there circumstances that this section is attempting
to prohibit that are currently allowed by the EXI 1.0 specification?
This section has been revised. The intent is to prefer
schema-valid events over untyped events. To do so, Sections 4.2.2
and 4.2.3 have been combined into one. In this spirit, an EXI
processor MUST use the event that matches most precisely first.
yes
LC-3074 John Schneider <john.schneider@agiledelta.com> (archived comment)
10. Section 4.4: The last sentence of this section indicates that
Canonical EXI processors SHOULD be able to convert an untyped value
to each datatype representation defined in EXI 1.0. This special
language would not be required if EXI canonicalization were defined
more generally with respect to the XML Infoset rather than an input
EXI stream
Correct. This is no longer required, given that we base our algorithm on the XML Infoset.
yes
LC-3075 John Schneider <john.schneider@agiledelta.com> (archived comment)
11. Section 4.4.1: The last sentence of this section specifies that
all canonical EXI processors MUST support arbitrarily large integer
values. This means there will be some canonical EXI documents that
devices without support for arbitrarily large integers cannot process.
Recommend you consider updating this definition so it is possible to
generate a canonical representation for any EXI document that any
device that meets the minimum EXI processing requirements can handle.
In particular, recommend you consider changing this definition such
that canonical EXI processors MUST represent all Unsigned Integer
values using the Unsigned Integer datatype representation when strict
is true. However, when strict is false canonical EXI processors must
represent Unsigned Integer values greater than 2147483647 using the
String datatype representation. This would enable devices with limited
capabilities to at least read, display and retransmit arbitrarily large
values — even if they don’t have the capability to process them.
We think that retransmitting arbitrarily large values is doable even if the device is not capable of representing them properly.

Note: a limited intermediary device can fall back to string. The only device that has to use integer encoding is the one that checks the signature and requires a canonicalized document.
yes
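The point about retransmission can be illustrated with a minimal Python sketch. It assumes the EXI 1.0 Unsigned Integer layout (a sequence of octets carrying 7 value bits each, least significant group first, with the high bit flagging that more octets follow) and, for simplicity, a byte-aligned stream. The function names are hypothetical and not taken from any specification.

```python
from typing import Tuple


def encode_unsigned(value: int) -> bytes:
    """Encode a non-negative integer as 7-bit groups, least significant first."""
    if value < 0:
        raise ValueError("Unsigned Integer values must be non-negative")
    out = bytearray()
    while True:
        group = value & 0x7F
        value >>= 7
        if value:
            out.append(group | 0x80)  # high bit set: more octets follow
        else:
            out.append(group)         # high bit clear: last octet
            return bytes(out)


def copy_unsigned(stream: bytes, offset: int = 0) -> Tuple[bytes, int]:
    """Return the raw octets of one Unsigned Integer and the next offset.

    Only the continuation bit is inspected, so no big-integer arithmetic
    is needed to pass the value on unchanged.
    """
    end = offset
    while stream[end] & 0x80:
        end += 1
    end += 1
    return stream[offset:end], end
```

An intermediary built along the lines of copy_unsigned never has to hold the numeric value itself, which is why limited devices can still forward values larger than 2147483647.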
LC-3077 John Schneider <john.schneider@agiledelta.com> (archived comment)
I’ve been following the discussion regarding Canonical EXI’s treatment of
empty elements and would like to offer a suggestion to simplify the wording
and improve the efficiency of the proposed solution.

Here is what I would propose:

“When strict is false or the current element grammar contains a production
of the form LeftHandSide : EE with event code of length 1, EXI can represent
the content of an empty element explicitly as an empty CH event or implicitly
as a SE event immediately followed by an EE event. In these circumstances,
Canonical EXI MUST represent an empty element by a SE event followed by an
EE event.”

I think this description states the issue and the alternate solution simply
and clearly. The alternate solution improves compactness by prescribing
the most efficient representation of an empty character event when it is
available (i.e., by omitting the CH event). It improves processing efficiency
by requiring only 1-2 checks (strict & available EE) and does not require
knowledge or checking against DTR types. These checks occur in a relatively
hot code path, so minimizing overhead is important for efficiency. Because
the alternate approach does not depend on DTR knowledge, it also avoids the
need to describe how to handle user defined DTRs that can also encode empty
strings (which the current proposal does not address).
There are two approaches proposed on how to define rules regarding
the encoding of empty elements in schema-informed context.

Please provide any opinions as to which of those approaches you
consider more appropriate to have as part of Canonical EXI.

The behavior of each approach is described below.

Approach A: This approach always first tries to encode empty elements
(i.e. SE followed by EE, optionally AT, etc. in between) as a sequence of
SE CH EE (optionally AT etc. between SE and CH) where CH is used for
representing empty string, for elements defined to have simple-content,
as long as doing so is possible (i.e. unless the codec in effect does *not*
permit to encode empty string "").

Approach B: This approach encodes empty elements (i.e. SE followed by EE,
optionally AT, etc. in between) as a sequence of SE EE (optionally AT etc.
in between). As an exception, for elements defined to have simple-content,
it is allowed to insert CH that represents empty string "" between SE and EE
only when doing so is necessary for representing an empty element there.

Note that approach B provides better efficiency, while approach A generates
the same sequence of events in both strict and non-strict mode.

---------------------------------------------------------------------

After considering several opinions that were discussed here on this issue [1],
the group agreed to take approach B. The editor's draft [2] will soon
reflect this decision.

[1] https://www.w3.org/2005/06/tracker/exi/issues/112
[2] https://www.w3.org/XML/EXI/docs/canonical/canonical-exi.html#emptyElementContent
yes
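As an illustration only, approach B, in the wording proposed in the comment, might be sketched in Python as follows. The function and parameter names are hypothetical. In particular, ee_available stands for "the current element grammar contains a production LeftHandSide : EE with event code of length 1", and the fallback branch is an assumption about the remaining case rather than text from the editor's draft.

```python
from typing import List


def empty_element_events(strict: bool, ee_available: bool) -> List[str]:
    """Pick the event sequence for an element with empty content."""
    if not strict or ee_available:
        # EE can directly follow SE, so the empty CH event is omitted.
        return ["SE", "EE"]
    # Assumed remaining case: strict mode and no suitable EE production,
    # so an empty string has to be carried by a CH event.
    return ["SE", "CH", "EE"]
```

Only two conditions are checked, and neither requires knowledge of the datatype representation in effect.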
LC-3078 John Schneider <john.schneider@agiledelta.com> (archived comment)
16. Section C.2: It is interesting and encouraging to see a good description
of best practices for sharing EXI options without the EXI options document.
This is the flexibility the specification should allow rather than mandating
that the EXI options and schemaId be specified inside every canonical EXI stream.
The WG uses the following form to communicate EXI-C14N options.

<exi-c14n:options xmlns:exi="http://www.w3.org/2009/exi"
                  xmlns:exi-c14n="http://www.w3.org/2016/exi-c14n">
  <exi-c14n:omitOptionsDocument/>
  <exi-c14n:utcTime/>
  <exi:header>
    <exi:common>
      <exi:compression/>
    </exi:common>
  </exi:header>
</exi-c14n:options>

The WG agrees that it should be possible to have one way to
represent every set of Canonical EXI options.
yes
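A receiver that obtains such an options document out of band could read it with a few lines of Python. This is a sketch under stated assumptions: the read_options helper and the keys of the returned dictionary are hypothetical, and only the three settings that appear in the example are looked up.

```python
import xml.etree.ElementTree as ET

NS = {
    "exi": "http://www.w3.org/2009/exi",
    "c14n": "http://www.w3.org/2016/exi-c14n",
}

OPTIONS_XML = """
<exi-c14n:options xmlns:exi="http://www.w3.org/2009/exi"
                  xmlns:exi-c14n="http://www.w3.org/2016/exi-c14n">
  <exi-c14n:omitOptionsDocument/>
  <exi-c14n:utcTime/>
  <exi:header>
    <exi:common>
      <exi:compression/>
    </exi:common>
  </exi:header>
</exi-c14n:options>
"""


def read_options(xml_text: str) -> dict:
    """Report which of the example's settings are present in the document."""
    root = ET.fromstring(xml_text)
    return {
        "omitOptionsDocument": root.find("c14n:omitOptionsDocument", NS) is not None,
        "utcTime": root.find("c14n:utcTime", NS) is not None,
        "compression": root.find("exi:header/exi:common/exi:compression", NS) is not None,
    }
```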
LC-3044 timeless <timeless@gmail.com> (archived comment)
> EXI can be used in such use cases and offers benefits w.r.t. compact data exchange and fast processing.

> To ensure that relevant Infoset items are available the following
> EXI Fidelity Options must be always enabled:
> Preserve.pis, Preserve.prefixes, and Preserve.lexicalValues.
> When the XML canonicalization algorithm preserves comments
> the EXI fidelity option Preserve.comments must be also enabled.

//This almost feels like normative instruction, and I don't recall
similar instructions in the main document.
//If similar instructions do exist in the main document, a pointer
would be appreciated.

I've decided the following block could benefit from emendation:

> Canonical XML is designed to be useful to applications that test whether an XML document has been changed (e.g., XML signature).

I read the "is" here as indicating it was something defined in this
document. I think this text is actually referring to something beyond
this document, in which case, I'd suggest:

is => was

Alternatively you could prefix the sentence with "While" or something
(but that would involve rewriting the end of the sentence)....

> Canonical EXI, in contrast to Canonical XML, deals with EXI documents and does not use plain-text XML data and the associated overhead.

the => its
A pointer to the Best Practices document was added to address the first point.
(see http://www.w3.org/TR/exi-best-practices/#signature)

The whole paragraph in question now reads as follows.

"The Canonical EXI documents does not want to tackle XML. Instead it deals
with EXI only and this section is just a recap of information shared in
our best practice document. A pointer to this document is added
(see http://www.w3.org/TR/exi-best-practices/#signature)"
tocheck
LC-3071 John Schneider <john.schneider@agiledelta.com> (archived comment)
2. Section 1, last sentence: Change “… whether two documents are identical …” to
“… whether two documents are equivalent …”

3. Section 1.2: We agree EXI canonicalization is important for EXI
environments that cannot afford to revert to traditional XML
canonicalization methods. In addition, we recommend you mention some
of the ways EXI canonicalization is useful for traditional XML users.
For example, EXI canonicalization provides the first type-aware
canonicalization scheme that can discern that +1, 1, 1.0, 1e0 and 1E0
are equivalent representations of the same floating-point value. This
allows intermediaries to use binding-models and/or type-aware processing
without breaking signatures. In addition, with a fast EXI processor,
EXI canonicalization can be much faster than traditional XML
canonicalization and can help cure some of the well-known XML security bottlenecks.

6. Section 4.2.1: Change “Prune productions” to “Select productions” in heading.
Pruning productions will remove them from the grammars, changing the event codes
of the following events and causing incompatibility with the EXI 1.0 specification.
I expect the specification intends to specify which productions must be selected
rather than removing productions from the grammars.

7. Section 4.2.2: Change “Prune productions” to “Select productions” in heading.
The word “prune” should also be replaced in the body of this section. See above rationale.

9. Section 4.2.3: Change heading “Use the event with the most accurate event”
to “Use the event that matches most precisely” or something similar. Current
wording is unclear.

14. Section 4.4.6: The last sentence in the second paragraph states that EXI
processors must first try to represent the string value as a local hit and
when this is not successful as a global hit. It might be useful to clarify
that one of the reasons the attempt to represent the string value as a local
hit may fail is because the string has already been used as a local hit
previously. EXI supports only one local table hit per value.
Agreed and implemented.
yes
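The type-aware equivalence mentioned in item 3 can be demonstrated with Python's decimal module. This is only a sketch of the idea, not the Canonical EXI algorithm: canonical_key is a hypothetical helper, and the special values INF, -INF and NaN are deliberately left out.

```python
from decimal import Decimal


def canonical_key(lexical: str):
    """Map a lexical float to a (sign, digits, exponent) key.

    Decimal.normalize() strips trailing zeros from the coefficient, so
    spellings that differ only in sign prefix, trailing zeros or exponent
    notation end up with the same key.
    """
    return Decimal(lexical).normalize().as_tuple()


forms = ["+1", "1", "1.0", "1e0", "1E0"]
keys = {canonical_key(form) for form in forms}
```

All five spellings from the comment collapse to a single key, whereas a text-based canonicalization would treat them as five different values.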
LC-3072 John Schneider <john.schneider@agiledelta.com> (archived comment)
13. Section 4.4.6: The W3C is standardizing on Unicode Normalization Form C
and recommending all web data be stored and transmitted in this form.
It may be useful to state this and reference the relevant W3C specification here:
http://www.w3.org/TR/charmod-norm/.
A reference has been added.

However, the Canonical XML spec also excludes Unicode normalization, and the working group decided to follow this path.
yes
LC-3042 timeless <timeless@gmail.com> (archived comment)
> On the contrary, values that match the default value (i.e. <blockSize>1000000</blockSize>) MUST be omitted.

On the contrary => conversely

> When the alignment option compression is set, pre-compress MUST be used instead.

instead of ?

> Moreover, the EXI event sequence of each nested element MUST be SE followed by EE

would it hurt you to link "SE" and "EE" to some definition of "Start
Element" / "End Element" for readers less familiar w/ the jargon used
herein?

> The user defined meta-data MUST NOT be used unless it conveys a convention used by the application.

"user defined meta-data" is italicized, but it isn't linked, and the
use of "The" doesn't help me. If you dropped "The", I could almost
understand what you're saying. If the "The" is important, then this
italicized text SHOULD link to something defining it.

> The user defined meta-data conveys auxiliary information and does not alter or extend the EXI data format.
> Hence it deemed acceptable to omit this information.

it => it is | it was

> Elements that are necessary to structure the EXI options document according to the XML schema
> (i.e. lesscommon, uncommon, alignment, datatypeRepresentationMap, preserve and common)
> MUST be omitted unless there is at least one nested element according to the previous steps.

Ideally steps are in numbered form, or somehow called out beyond "by
the way, I hid steps somewhere before this point".
> On the contrary => conversely

Agree.

> instead of ?

The bullet list item has been changed to "When the alignment option compression is set, pre-compress MUST be used instead of compression."
Further, references to all EXI options have been added.

> would it hurt you to link "SE" and "EE" to some definition of "Start
> Element" / "End Element" for readers less familiar w/ the jargon used
> herein?

Start Element (SE) and End Element (EE) are now used instead of the bare abbreviations.
A link to the EXI event types has also been added.

> "user defined meta-data" is italicized, but it isn't linked, and the
> use of "The" doesn't help me. If you dropped "The", I could almost
> understand what you're saying. If the "The" is important, then this
> italicized text SHOULD link to something defining it.

"user defined meta-data" now links to the EXI specification.

> it => it is | it was

Changed to "it is".

> Ideally steps are in numbered form, or somehow called out beyond "by
> the way, I hid steps somewhere before this point".

The bullet list has been changed to a numbered list.
tocheck
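Two of the rules quoted in this comment (omit values that equal the default, and omit structuring elements that end up with no nested element) can be sketched over a nested dictionary. The representation and the prune name are assumptions made for illustration. The list of structuring elements and the blockSize default of 1000000 are taken from the quoted text, and the rule that replaces compression with pre-compress is not modelled.

```python
STRUCTURAL = {
    "lesscommon", "uncommon", "alignment",
    "datatypeRepresentationMap", "preserve", "common",
}
DEFAULTS = {"blockSize": 1000000}


def prune(node: dict) -> dict:
    """Drop default-valued options, then structuring elements left empty."""
    result = {}
    for name, value in node.items():
        if isinstance(value, dict):
            child = prune(value)
            if name in STRUCTURAL and not child:
                continue  # nothing nested remains, so omit the wrapper
            result[name] = child
        elif name in DEFAULTS and DEFAULTS[name] == value:
            continue      # value matches the default, so omit it
        else:
            result[name] = value
    return result
```

For example, an options tree whose only less-common setting is the default blockSize loses the whole lesscommon branch, while a non-default blockSize keeps it.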
LC-3043 timeless <timeless@gmail.com> (archived comment)
> Further, a String value MUST be represented as string value hit if possible.

`hit` is used three times, only locally. It should either be defined
or linked to something.
The terminology has been aligned with the EXI specification that uses "when a string value is found in the global or local value partition" and a reference has been added.
http://www.w3.org/TR/exi/#encodingOptimizedForMisses
tocheck
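The lookup order behind this wording (local value partition first, then the global one, otherwise a miss) might look as follows in Python. It is a simplified sketch: the class and method names are hypothetical, and the EXI options valueMaxLength and valuePartitionCapacity are ignored.

```python
from typing import Dict, List, Tuple, Union


class ValueTable:
    """Toy model of the local (per qname) and global value partitions."""

    def __init__(self) -> None:
        self.global_values: List[str] = []
        self.local_values: Dict[str, List[str]] = {}

    def encode(self, qname: str, value: str) -> Tuple[str, Union[int, str]]:
        local = self.local_values.setdefault(qname, [])
        if value in local:
            return ("local-hit", local.index(value))
        if value in self.global_values:
            return ("global-hit", self.global_values.index(value))
        # Miss: the literal is written and added to both partitions.
        local.append(value)
        self.global_values.append(value)
        return ("miss", value)
```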
LC-3065 timeless <timeless@gmail.com> (archived comment)
> The canonical representation dictates that characters from the restricted character set MUST use
> the according n-bit Unsigned Integer.

"according n-bit Unsigned Integer" sounds weird. If it's defined
elsewhere, please link. If not, please explain. (Or "according" could
be the wrong word.)
The link to http://www.w3.org/TR/exi/#encodingBoundedUnsigned has been added.
tocheck
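For readers unfamiliar with the n-bit Unsigned Integer in question, here is a small Python sketch. It assumes the EXI 1.0 rule for restricted character sets: with N characters in the set, sorted by code point, a member is written as its index in n = ceil(log2(N + 1)) bits, and the spare code N escapes a character outside the set, followed by its code point. The function names and the tuple layout are hypothetical.

```python
from typing import Optional, Tuple


def code_width(charset: str) -> int:
    """Bits per character code: ceil(log2(N + 1)) for N distinct characters."""
    return len(set(charset)).bit_length()


def encode_char(ch: str, charset: str) -> Tuple[int, int, Optional[int]]:
    """Return (code, width in bits, escaped code point or None)."""
    members = sorted(set(charset))
    width = code_width(charset)
    if ch in members:
        # A member always uses its own index, never the escape code.
        return (members.index(ch), width, None)
    return (len(members), width, ord(ch))
```

With the ten decimal digits as the restricted set, each digit takes 4 bits, and any other character is escaped with code 10.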
LC-3045 timeless <timeless@gmail.com> (archived comment)
> The canonicalization process of EXI

> bases upon

awkward

> the knowledge of the used EXI options which is an optional part of the EXI header.
> These options communicate the various EXI options that have been used to encode the actual XML information with EXI and
> are crucial to be known.

awkward

> This sections

sections => section | This => These

> provides some best practices - so that for example it can be successfully used as part of the digital signature framework or in other use-cases.
> Currently different options are discussed.

"discussed" or "under discussion" or ??

i.e. awkward
Proposed updates agreed.

Appendix C.2 "Exchange EXI Options" has been updated to:

"The canonicalization process of EXI is based on the knowledge of the used EXI options. The EXI options communicate the various options that have been used to encode the actual XML information with EXI and are essential for any EXI processor. Given that the presence of EXI options in its entirety is optional in the EXI header, the following subsections provide and discuss best practices how to exchange them - so that for example it can be successfully used as part of the digital signature framework or in other use-cases. "
tocheck
LC-3068 timeless <timeless@gmail.com> (archived comment)
> Optimizations such as pruning insignificant xsi:type values (e.g., xsi:type="xsd:string" for string values)
> or insignificant xsi:nil values (e.g., xsi:nil="false")
> is prohibited for a Canonical EXI processor.

I think:
is => are

> where the rules of determining equivalence is described below.
is => are (?)

> A rationale for each decision is given as well as background information is provided.

as well as => and


> Example B-3. Example algorithm for converting float values to the canonical form

Example..Example?

> Initialize the exponent with the value 0 (zero) and jump to step 2.

s/. and j/. J/

> If the value after the decimal point can be represented as 0 (zero)
> without losing precision jump to step 4, otherwise to step 3.

s/precision jump/precision, [then] jump/
s/otherwise/otherwise jump/

> If the signed mantissa is unequal 0 (zero), unequal -0 (negative zero), and contains a trailing
> zero jump to 6, otherwise to step 7.

s/zero jump/zero, [then] jump/
s/otherwise/otherwise jump/
Agree.

As to Example B-3, it now reads as follows:
"Example B-3. An algorithm for converting float values to the canonical form"
tocheck
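The steps quoted in this comment revolve around removing trailing zeros from the mantissa. A minimal Python sketch of that idea on an integer (mantissa, exponent) pair is given below. It is not the text of Example B-3: the function name is hypothetical, mapping a zero mantissa to exponent 0 is an assumption, and negative zero cannot be expressed with a plain Python int, so it is not covered.

```python
from typing import Tuple


def canonicalize_float(mantissa: int, exponent: int) -> Tuple[int, int]:
    """Strip trailing decimal zeros from the mantissa, adjusting the exponent."""
    if mantissa == 0:
        return (0, 0)  # assumption: zero is always written with exponent 0
    while mantissa % 10 == 0:
        mantissa //= 10  # exact division, also for negative mantissas
        exponent += 1
    return (mantissa, exponent)
```

For example, the pair (1200, -2) becomes (12, 0); both denote the value 12.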

Developed and maintained by Dominique Hazaël-Massieux (dom@w3.org).
$Id: index.html,v 1.1 2017/08/11 06:44:14 dom Exp $
Please send bug reports and requests for enhancements to w3t-sys.org