W3C

List of comments on “Canonical EXI” (dated 21 May 2015)

There are 19 comments (sorted by their types, and the section they are about).

substantive comments

Comment LC-3041: Normative vs. Non-Normative
Commenter: timeless <timeless@gmail.com> (archived message)
Context: Document as a whole
assigned to Daniel Peintner
Resolution status:

http://www.w3.org/TR/2015/WD-exi-c14n-20150521/

This document doesn't explain what is normative/informative.
I'd guess that Notes and A/B/C aren't normative, but there doesn't
seem to be text to that effect.

Comment LC-3070
Commenter: John Schneider <john.schneider@agiledelta.com> (archived message)
Context: 3. Canonical EXI Header
assigned to Daniel Peintner
Resolution status:

4. Section 3: As mentioned above, making the EXI options document and
the EXI schemaId mandatory in every canonical EXI document is at odds
with the efficiency objectives of EXI. In many or perhaps even most
use cases that require efficiency, these can be (and are) provided
out of band or specified by a higher-level protocol. As such, including
them in every canonical EXI message introduces unnecessary overhead
and provides no value since all cooperating nodes already have this
information.

Furthermore, forcing the inclusion of a schemaId in every message does
not actually solve the problem of ensuring the sender and receiver use
the same schemas. The EXI schemaId is not guaranteed to be unique and it
would be easy for a sender and receiver to end up using the same schemaId
for two different versions of the same schema or even two completely
different schemas (breaking any signature that depends on schemaId).
There are more reliable ways to ensure senders and receivers are using
the same schemas for encoding/decoding EXI documents. This problem is
not unique to EXI canonicalization and the EXI canonicalization
specification should not force a specific, sub-optimal solution on EXI
users. As with EXI, users should be allowed to use the EXI options
document and schemaId to address this issue, but they should not be
forced to do so if they have a better, more efficient solution that
is already working.

Comment LC-3077
Commenter: John Schneider <john.schneider@agiledelta.com> (archived message)
Context: 4. Canonical EXI Body
assigned to Daniel Peintner
Resolution status:

I’ve been following the discussion regarding Canonical EXI’s treatment of
empty elements and would like to offer a suggestion to simplify the wording
and improve the efficiency of the proposed solution.

Here is what I would propose:

“When strict is false or the current element grammar contains a production
of the form LeftHandSide : EE with event code of length 1, EXI can represent
the content of an empty element explicitly as an empty CH event or implicitly
as a SE event immediately followed by an EE event. In these circumstances,
Canonical EXI MUST represent an empty element by a SE event followed by an
EE event.”

I think this description states the issue and the alternate solution simply
and clearly. The alternate solution improves compactness by prescribing
the most efficient representation of an empty character event when it is
available (i.e., by omitting the CH event). It improves processing efficiency
by requiring only 1-2 checks (strict & available EE) and does not require
knowledge or checking against DTR types. These checks occur in a relatively
hot code path, so minimizing overhead is important for efficiency. Because
the alternate approach does not depend on DTR knowledge, it also avoids the
need to describe how to handle user defined DTRs that can also encode empty
strings (which the current proposal does not address).
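
The proposed rule can be sketched in Python as follows. The function and parameter names are hypothetical. The fallback to an empty CH event when neither condition holds is an assumption, since the proposal only covers the case where both representations are available.

```python
def empty_element_events(strict, ee_code_length_is_1):
    """Return the event sequence for an element with empty content.

    strict              -- value of the EXI 'strict' option
    ee_code_length_is_1 -- True if the current element grammar has a
                           production 'LeftHandSide : EE' whose event
                           code has length 1
    """
    if not strict or ee_code_length_is_1:
        # Both representations are possible; the proposal mandates SE, EE.
        return ["SE", "EE"]
    # Assumption: without an available EE production, the empty content
    # can only be carried by an empty CH event.
    return ["SE", "CH", "EE"]
```

Only two boolean checks are needed, and no knowledge of datatype representations (DTRs) is involved, which is the efficiency argument made in the comment.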

Comment LC-3069
Commenter: John Schneider <john.schneider@agiledelta.com> (archived message)
Context: 4. Canonical EXI Body
assigned to Daniel Peintner
Resolution status:

1. Architecture & Design: The specification defines canonical EXI with
respect to an input EXI stream. This limits one’s ability to use canonical
EXI with traditional XML or other XML Infoset representations and creates
a poor architectural fit with the rest of the XML stack of technologies
that are defined with respect to the XML Infoset. The strict dependency
on an EXI input stream, the EXI options document and the EXI schemaId
creates intrinsic incompatibilities with XML, which does not support
these EXI-specific artifacts. This leads to practical implementation
problems, such as the inability for canonical EXI to support digital
signatures through XML intermediary nodes, which you identified at the
end of section A.1.

To be useful in all XML contexts and with all XML technologies, EXI
canonicalization must be defined with respect to the XML Infoset. We
recommend you update the specification to define canonical EXI with
respect to a given XML Infoset, a given XML Schema and a given set of
EXI options. The schema and EXI options may be provided any number of
ways, as you describe well in section C.2. As with EXI, the user should
be allowed to embed these in the EXI header when it is advantageous,
but should not be required to do so when it is not. Mandating the
inclusion of the EXI options and a schemaID in every message is at
odds with EXI’s efficiency objectives and makes it onerous to use
canonical EXI as a transmission format. As you point out in section
C.1., using canonical EXI as a transmission format can eliminate
the need to perform [redundant] canonicalization at the receiver —
further increasing efficiency. We have users that currently employ
canonical EXI this way and it is very advantageous to them. However,
requiring the EXI options and schemaId in every message would quickly
overwhelm the benefits of using canonical EXI as a transmission format.

5. Section 4: As stated above, to be useful in all XML contexts and with
all XML technologies, EXI canonicalization must be defined with respect to
a given XML Infoset rather than a given EXI stream. The semantics of the
specification should be specified with respect to a given XML Infoset,
a given XML Schema and a given set of EXI options (independent of how
these are acquired).

15. Section A.1: The second paragraph states that Canonical EXI deals with
EXI documents. As alluded to in the third paragraph of this section, this is
not strictly true. Canonical EXI should be usable with and provide benefits
to XML, EXI or any other XML Infoset representation. However, as stated
earlier in these comments, canonical EXI must be defined with respect to
the XML Infoset rather than an EXI input document to achieve this. Defining
EXI canonicalization with respect to only EXI is limiting and fails to
realize the full potential of the technology.

The last sentence in this section also states that it is not possible to
use XML on intermediary nodes when Canonical EXI has been used for signing.
This is a limitation of the current specification and not of canonical EXI
in general. If you define canonical EXI with regard to a given XML Infoset,
XML Schema and given set of EXI options and ensure all EXI nodes use the
same XML Schema and EXI options, this limitation goes away. As stated earlier,
there are more reliable and efficient ways to ensure cooperating nodes use
the same XML Schemas and EXI options than including the EXI options document
and schemaId in every message. And these methods do not fail when transcoding
to XML because they do not depend on the XML/EXI message for the schema and
EXI options. The reason the current specification fails in this regard is
because it depends strictly on the EXI document to carry the options and
schemaId and transcoding to XML loses this information. As discussed earlier,
this is a design flaw that should be fixed.

Comment LC-3073
Commenter: John Schneider <john.schneider@agiledelta.com> (archived message)
Context: 4.2.2 Prune productions for non applicable events with content values
assigned to Daniel Peintner
Resolution status:

8. Section 4.2.2: The meaning of this section is not entirely clear.
Presumably, it is not possible with the current EXI specification to
use a production that is not capable of representing the content value
(by definition). Are there circumstances that this section is attempting
to prohibit that are currently allowed by the EXI 1.0 specification?

Comment LC-3074
Commenter: John Schneider <john.schneider@agiledelta.com> (archived message)
Context: 4.4 EXI Datatypes
assigned to Daniel Peintner
Resolution status:

10. Section 4.4: The last sentence of this section indicates that
Canonical EXI processors SHOULD be able to convert an untyped value
to each datatype representation defined in EXI 1.0. This special
language would not be required if EXI canonicalization were defined
more generally with respect to the XML Infoset rather than an input
EXI stream.

Comment LC-3075
Commenter: John Schneider <john.schneider@agiledelta.com> (archived message)
Context: 4.4.1 Unsigned Integer
assigned to Daniel Peintner
Resolution status:

11. Section 4.4.1: The last sentence of this section specifies that
all canonical EXI processors MUST support arbitrarily large integer
values. This means there will be some canonical EXI documents that
devices without support for arbitrarily large integers cannot process.
Recommend you consider updating this definition so it is possible to
generate a canonical representation for any EXI document that any
device that meets the minimum EXI processing requirements can handle.
In particular, recommend you consider changing this definition such
that canonical EXI processors MUST represent all Unsigned Integer
values using the Unsigned Integer datatype representation when strict
is true. However, when strict is false canonical EXI processors must
represent Unsigned Integer values greater than 2147483647 using the
String datatype representation. This would enable devices with limited
capabilities to at least read, display and retransmit arbitrarily large
values — even if they don’t have the capability to process them.
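
A minimal sketch of the recommended rule, in Python. The function name and the returned labels are hypothetical; the threshold 2147483647 is taken from the comment. Representing small values with Unsigned Integer when strict is false is an assumption, because the comment only states what happens above the threshold.

```python
MAX_SMALL_UINT = 2147483647  # threshold named in the comment (2**31 - 1)


def canonical_uint_representation(value, strict):
    """Pick the datatype representation for an Unsigned Integer value."""
    if value < 0:
        raise ValueError("Unsigned Integer values cannot be negative")
    if strict:
        # strict = true: always the Unsigned Integer representation.
        return "UnsignedInteger"
    if value > MAX_SMALL_UINT:
        # strict = false: large values fall back to String so that
        # constrained devices can still read and retransmit them.
        return "String"
    # Assumption: values within the threshold keep the typed form.
    return "UnsignedInteger"
```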

Comment LC-3076
Commenter: John Schneider <john.schneider@agiledelta.com> (archived message)
Context: 4.4.5 Date-Time
assigned to Daniel Peintner
Resolution status:

12. Section 4.4.5: This section states that EXI Date-Time values
MUST be canonicalized according to the XML Schema dateTime canonical
representation. While this definition might be convenient, it is not
entirely appropriate for canonicalization and will lead to surprising
results for some. The canonical form for XML Schema dateTime values is
defined to make it easy to determine whether two Date-Time values refer
to the same instant, regardless of the timezone used. However, for many
applications, the Date-Time timezone is an important piece of information
that should be preserved. As such, it will be surprising if the digital
signature is not able to detect changes to this information. In addition,
those using canonical EXI as a transmission format will be surprised if
the canonical EXI format loses all their timezone information and changes
all Date-Time values to GMT. Recommend this section be updated to exclude
canonicalization of timezones in Date-Time values.
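
The information loss described above can be illustrated with Python's standard `datetime` module. This is only a sketch: `to_utc_lexical` is a hypothetical helper that approximates the XML Schema canonical form for timezoned values without fractional seconds, and the sample dates are made up.

```python
from datetime import datetime, timedelta, timezone


def to_utc_lexical(value):
    """Normalize a timezoned dateTime to UTC and render it with 'Z'."""
    return value.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# The same instant, written in two different timezones.
berlin = datetime(2015, 5, 21, 12, 0, 0, tzinfo=timezone(timedelta(hours=2)))
new_york = datetime(2015, 5, 21, 6, 0, 0, tzinfo=timezone(timedelta(hours=-4)))

canonical_berlin = to_utc_lexical(berlin)
canonical_new_york = to_utc_lexical(new_york)
```

Both values normalize to the same string, so a signature computed over the canonical form cannot detect that the original timezone offset was changed.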

Comment LC-3044: Non-normative Appendix seems to feel normative
Commenter: timeless <timeless@gmail.com> (archived message)
Context: A.1 Relationship to XML Security
assigned to Daniel Peintner
Resolution status:

> EXI can be used in such use cases and offers benefits w.r.t. compact data exchange and fast processing.

> To ensure that relevant Infoset items are available the following
> EXI Fidelity Options must be always enabled:
> Preserve.pis, Preserve.prefixes, and Preserve.lexicalValues.
> When the XML canonicalization algorithm preserves comments
> the EXI fidelity option Preserve.comments must be also enabled.

//This almost feels like normative instruction, and I don't recall
similar instructions in the main document.
//If similar instructions do exist in the main document, a pointer
would be appreciated.

I've decided the following block could benefit from emendation:

> Canonical XML is designed to be useful to applications that test whether an XML document has been changed (e.g., XML signature).

I read the "is" here as indicating it was something defined in this
document. I think this text is actually referring to something beyond
this document, in which case, I'd suggest:

is => was

Alternatively you could prefix the sentence with "While" or something
(but that would involve rewriting the end of the sentence)....

> Canonical EXI, in contrast to Canonical XML, deals with EXI documents and does not use plain-text XML data and the associated overhead.

the => its

Comment LC-3078
Commenter: John Schneider <john.schneider@agiledelta.com> (archived message)
Context: C.2 Exchange EXI Options (Best Practices "Work in Progress")
assigned to Daniel Peintner
Resolution status:

16. Section C.2: It is interesting and encouraging to see a good description
of best practices for sharing EXI options without the EXI options document.
This is the flexibility the specification should allow rather than mandating
that the EXI options and schemaId be specified inside every canonical EXI stream.

editorial comments

Comment LC-3068
Commenter: timeless <timeless@gmail.com> (archived message)
Context: Document as a whole
assigned to Daniel Peintner
Resolution status:

> Optimizations such as pruning insignificant xsi:type values (e.g., xsi:type="xsd:string" for string values)
> or insignificant xsi:nil values (e.g., xsi:nil="false")
> is prohibited for a Canonical EXI processor.

I think:
is => are

> where the rules of determining equivalence is described below.
is => are (?)

> A rationale for each decision is given as well as background information is provided.

as well as => and


> Example B-3. Example algorithm for converting float values to the canonical form

Example..Example?

> Initialize the exponent with the value 0 (zero) and jump to step 2.

s/. and j/. J/

> If the value after the decimal point can be represented as 0 (zero)
> without losing precision jump to step 4, otherwise to step 3.

s/precision jump/precision, [then] jump/
s/otherwise/otherwise jump/

> If the signed mantissa is unequal 0 (zero), unequal -0 (negative zero), and contains a trailing
> zero jump to 6, otherwise to step 7.

s/zero jump/zero, [then] jump/
s/otherwise/otherwise jump/
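
The quoted steps are only fragments of Example B-3, so the following Python sketch shows one plausible reading of the trailing-zero step, not the specification's algorithm. It assumes that each trailing decimal zero removed from the mantissa is compensated by incrementing the exponent; the function name is hypothetical.

```python
def strip_trailing_zeros(mantissa, exponent):
    """Remove trailing decimal zeros from a non-zero integer mantissa.

    The value mantissa * 10**exponent is unchanged by each step.
    """
    while mantissa != 0 and mantissa % 10 == 0:
        mantissa //= 10   # drop one trailing zero (exact division)
        exponent += 1     # compensate so the represented value stays equal
    return mantissa, exponent
```

A zero mantissa is returned unchanged, which mirrors the quoted condition that the mantissa be unequal to 0 before the trailing-zero step applies.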

Comment LC-3071
Commenter: John Schneider <john.schneider@agiledelta.com> (archived message)
Context: Document as a whole
assigned to Daniel Peintner
Resolution status:

2. Section 1, last sentence: Change “… whether two documents are identical …” to
“… whether two documents are equivalent …”

3. Section 1.2: We agree EXI canonicalization is important for EXI
environments that cannot afford to revert to traditional XML
canonicalization methods. In addition, we recommend you mention some
of the ways EXI canonicalization is useful for traditional XML users.
For example, EXI canonicalization provides the first type-aware
canonicalization scheme that can discern that +1, 1, 1.0, 1e0 and 1E0
are equivalent representations of the same floating-point value. This
allows intermediaries to use binding-models and/or type-aware processing
without breaking signatures. In addition, with a fast EXI processor,
EXI canonicalization can be much faster than traditional XML
canonicalization and can help cure some of the well-known XML security bottlenecks.

6. Section 4.2.1: Change “Prune productions” to “Select productions” in heading.
Pruning productions will remove them from the grammars, changing the event codes
of the following events and causing incompatibility with the EXI 1.0 specification.
I expect the specification intends to specify which productions must be selected
rather than removing productions from the grammars.

7. Section 4.2.2: Change “Prune productions” to “Select productions” in heading.
The word “prune” should also be replaced in the body of this section. See above rationale.

9. Section 4.2.3: Change heading “Use the event with the most accurate event”
to “Use the event that matches most precisely” or something similar. Current
wording is unclear.

14. Section 4.4.6: The last sentence in the second paragraph states that EXI
processors must first try to represent the string value as a local hit and
when this is not successful as a global hit. It might be useful to clarify
that one of the reasons the attempt to represent the string value as a local
hit may fail is because the string has already been used as a local hit
previously. EXI supports only one local table hit per value.
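
Item 3 above notes that +1, 1, 1.0, 1e0 and 1E0 are equivalent representations of the same floating-point value. A short Python illustration of why a text-based canonicalization sees five different strings while a type-aware one sees a single value (variable names are hypothetical):

```python
# The five lexical forms listed in item 3.
lexical_forms = ["+1", "1", "1.0", "1e0", "1E0"]

# Text-level comparison: every form is a different character sequence.
distinct_texts = len(set(lexical_forms))

# Type-aware comparison: all forms parse to the same float value.
distinct_values = len({float(form) for form in lexical_forms})
```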

Comment LC-3042: Canonical EXI options in the EXI header unclear / underspecified
Commenter: timeless <timeless@gmail.com> (archived message)
Context: 3. Canonical EXI Header
assigned to Daniel Peintner
Resolution status:

> On the contrary, values that match the default value (i.e. <blockSize>1000000</blockSize>) MUST be omitted.

On the contrary => conversely

> When the alignment option compression is set, pre-compress MUST be used instead.

instead of ?

> Moreover, the EXI event sequence of each nested element MUST be SE followed by EE

would it hurt you to link "SE" and "EE" to some definition of "Start
Element" / "End Element" for readers less familiar w/ the jargon used
herein?

> The user defined meta-data MUST NOT be used unless it conveys a convention used by the application.

"user defined meta-data" is italicized, but it isn't linked, and the
use of "The" doesn't help me. If you dropped "The", I could almost
understand what you're saying. If the "The" is important, then this
italicized text SHOULD link to something defining it.

> The user defined meta-data conveys auxiliary information and does not alter or extend the EXI data format.
> Hence it deemed acceptable to omit this information.

it => it is | it was

> Elements that are necessary to structure the EXI options document according to the XML schema
> (i.e. lesscommon, uncommon, alignment, datatypeRepresentationMap, preserve and common)
> MUST be omitted unless there is at least one nested element according to the previous steps.

Ideally steps are in numbered form, or somehow called out beyond "by
the way, I hid steps somewhere before this point".
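
The first quoted rule (values equal to the default MUST be omitted) can be sketched in Python. The dictionary model of the options document and the function name are hypothetical; only the `blockSize` default of 1000000 comes from the quoted text.

```python
# Only blockSize is taken from the quoted text; a real processor would
# list the defaults of every EXI option here.
DEFAULTS = {"blockSize": 1000000}

_MISSING = object()  # sentinel for options that have no listed default


def omit_default_values(options, defaults=DEFAULTS):
    """Drop every option whose value equals its default."""
    return {
        name: value
        for name, value in options.items()
        if defaults.get(name, _MISSING) != value
    }
```

For example, `omit_default_values({"blockSize": 1000000, "strict": True})` keeps only the `strict` entry, while a non-default `blockSize` is preserved.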

Comment LC-3067
Commenter: timeless <timeless@gmail.com> (archived message)
Context: 4.2.3 Use the event with the most accurate event
assigned to Daniel Peintner
Resolution status:

> For Start Element events the order is as follows:
> SE( qname )
> SE ( uri : * )
> SE ( * )
> For Attribute events the order is as follows:
> AT( qname )
> AT ( uri : * )
> AT ( * )

Is there a reason that there's no space before the `(` for qname, but
there is a space for the other two `(`s?
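
The precedence order quoted above can be sketched in Python. Modelling productions as `(kind, uri, local_name)` tuples with `"*"` as the wildcard is an assumption made for illustration, as is the function name.

```python
def select_most_precise(kind, uri, local_name, available):
    """Return the most precise matching production, or None.

    kind      -- "SE" or "AT"
    available -- set of (kind, uri, local_name) tuples in the current grammar
    """
    candidates = [
        (kind, uri, local_name),  # SE(qname)   / AT(qname)
        (kind, uri, "*"),         # SE(uri : *) / AT(uri : *)
        (kind, "*", "*"),         # SE(*)       / AT(*)
    ]
    for candidate in candidates:
        if candidate in available:
            return candidate
    return None
```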

Comment LC-3072
Commenter: John Schneider <john.schneider@agiledelta.com> (archived message)
Context: 4.4.6 String and String Table
assigned to Daniel Peintner
Resolution status:

13. Section 4.4.6: The W3C is standardizing on Unicode Normalization Form C
and recommending all web data be stored and transmitted in this form.
It may be useful to state this and reference the relevant W3C specification here:
http://www.w3.org/TR/charmod-norm/.
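
Why the normalization form matters for canonicalization can be shown with Python's standard `unicodedata` module. This is an illustrative sketch; the sample strings are made up.

```python
import unicodedata

# "é" written as one precomposed code point and as "e" + combining accent.
precomposed = "caf\u00e9"
decomposed = "cafe\u0301"

# Without normalization the two strings differ code point by code point.
equal_before = precomposed == decomposed

# After Normalization Form C both become the same code point sequence.
equal_after = (
    unicodedata.normalize("NFC", precomposed)
    == unicodedata.normalize("NFC", decomposed)
)
```

Two strings that render identically therefore produce different encodings, and different signatures, unless both sides agree on a normalization form.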

Comment LC-3043: String table wording lacks terminology and/or reference
Commenter: timeless <timeless@gmail.com> (archived message)
Context: 4.4.6 String and String Table
assigned to Daniel Peintner
Resolution status:

> Further, a String value MUST be represented as string value hit if possible.

`hit` is used three times, only locally. It should either be defined
or linked to something.

Comment LC-3065: RCS wording lacks terminology and/or reference
Commenter: timeless <timeless@gmail.com> (archived message)
Context: 4.4.7 Restricted Character Sets
assigned to Daniel Peintner
Resolution status:

> The canonical representation dictates that characters from the restricted character set MUST use
> the according n-bit Unsigned Integer.

"according n-bit Unsigned Integer" sounds weird. If it's defined
elsewhere, please link. If not, please explain. (Or "according" could
be the wrong word.)

Comment LC-3045: Language issues and typos
Commenter: timeless <timeless@gmail.com> (archived message)
Context: C.2 Exchange EXI Options (Best Practices "Work in Progress")
assigned to Daniel Peintner
Resolution status:

> The canonicalization process of EXI

> bases upon

awkward

> the knowledge of the used EXI options which is an optional part of the EXI header.
> These options communicate the various EXI options that have been used to encode the actual XML information with EXI and
> are crucial to be known.

awkward

> This sections

sections => section | This => These

> provides some best practices - so that for example it can be successfully used as part of the digital signature framework or in other use-cases.
> Currently different options are discussed.

"discussed" or "under discussion" or ??

i.e. awkward

Comment LC-3066
Commenter: timeless <timeless@gmail.com> (archived message)
Context: C.2.2 Option 2 - Uri scheme fragment identifier
assigned to Daniel Peintner
Resolution status:

> C.2.2 Option 2 - Uri scheme fragment identifier

URI is usually written as such.



Developed and maintained by Dominique Hazaël-Massieux (dom@w3.org).
$Id: index.html,v 1.1 2017/08/11 06:44:14 dom Exp $