Disposition of comments for the Efficient Extensible Interchange Working Group

Not all comments have been marked as replied to. The disposition of comments is not complete.

In the table below, red in the WG decision column indicates that the Working Group didn't agree with the comment, green indicates that it agreed with it, and yellow reflects an in-between situation.

In the "Commentor reply" column, red indicates the commenter objected to the WG resolution, green indicates approval, and yellow means the commenter didn't respond to the request for feedback.

Commentor	Comment	Working Group decision	Commentor reply
LC-3069 John Schneider `<john.schneider@agiledelta.com>` (archived comment)	1. Architecture & Design: The specification defines canonical EXI with respect to an input EXI stream. This limits one’s ability to use canonical EXI with traditional XML or other XML Infoset representations and creates a poor architectural fit with the rest of the XML stack of technologies that are defined with respect to the XML Infoset. The strict dependency on an EXI input stream, the EXI options document and the EXI schemaId creates intrinsic incompatibilities with XML, which does not support these EXI-specific artifacts. This leads to practical implementation problems, such as the inability for canonical EXI to support digital signatures through XML intermediary nodes, which you identified at the end of section A.1. To be useful in all XML contexts and with all XML technologies, EXI canonicalization must be defined with respect to the XML Infoset. We recommend you update the specification to define canonical EXI with respect to a given XML Infoset, a given XML Schema and a given set of EXI options. The schema and EXI options may be provided any number of ways, as you describe well in section C.2. As with EXI, the user should be allowed to embed these in the EXI header when it is advantageous, but should not be required to do so when it is not. Mandating the inclusion of the EXI options and a schemaID in every message is at odds with EXI’s efficiency objectives and makes it onerous to use canonical EXI as a transmission format. As you point out in section C.1., using canonical EXI as a transmission format can eliminate the need to perform [redundant] canonicalization at the receiver — further increasing efficiency. We have users that currently employ canonical EXI this way and it is very advantageous to them. However, requiring the EXI options and schemaId in every message would quickly overwhelm the benefits of using canonical EXI as a transmission format. 5. 
Section 4: As stated above, to be useful in all XML contexts and with all XML technologies, EXI canonicalization must be defined with respect to a given XML Infoset rather than a given EXI stream. The semantics of the specification should be specified with respect to a given XML Infoset, a given XML Schema and a given set of EXI options (independent on how these are acquired). 15. Section A.1: The second paragraph states that Canonical EXI deals with EXI documents. As alluded in the third paragraph of this section, this is not strictly true. Canonical EXI should be usable with and provide benefits to XML, EXI or any other XML Infoset representation. However, as stated earlier in these comments, canonical EXI must be defined with respect to the XML Infoset rather than an EXI input document to achieve this. Defining EXI canonicalization with respect to only EXI is limiting and fails to realize the full potential of the technology. The last sentence in this section also states that it is not possible to use XML on intermediary nodes when Canonical EXI has been used for signing. This is a limitation of the current specification and not of canonical EXI in general. If you define canonical EXI with regard to a given XML Infoset, XML Schema and given set of EXI options and ensure all EXI nodes use the same XML Schema and EXI options, this limitation goes away. As stated earlier, there are more reliable and efficient ways to ensure cooperating nodes use the same XML Schemas and EXI options than including the EXI options document and schemaId in every message. And these methods do not fail when transcoding to XML because they do not depend on the XML/EXI message for the schema and EXI options. The reason the current specification fails in this regard is because it depends strictly on the EXI document to carry the options and schemaId and transcoding to XML loses this information. As discussed earlier, this is a design flaw that should be fixed.	
We agree with your comment that Canonical EXI should be based on the XML Infoset and have changed the specification accordingly.	yes
LC-3070 John Schneider `<john.schneider@agiledelta.com>` (archived comment)	4. Section 3: As mentioned above, making the EXI options document and the EXI schemaId mandatory in every canonical EXI document is at odds with the efficiency objectives of EXI. In many or perhaps even most use cases that require efficiency, these can be (and are) provided out of band or specified by a higher-level protocol. As such, including them in every canonical EXI message introduces unnecessary overhead and provides no value since all cooperating nodes already have this information. Furthermore, forcing the inclusion of a schemaId in every message does not actually solve the problem of ensuring the sender and receiver use the same schemas. The EXI schemaId is not guaranteed to be unique and would be easy for a sender and receiver to end up using the same schemaId for two different versions of the same schema or even two completely different schemas (breaking any signature that depends on schemaId). There are more reliable ways to ensure senders and receivers are using the same schemas for encoding/decoding EXI documents. This problem is not unique to EXI canonicalization and the EXI canonicalization specification should not force a specific, sub-optimal solution on EXI users. As with EXI, users should be allowed to use the EXI options document and schemaId to address this issue, but they should not be forced to do so if they have a better, more efficient solution that is already working.	The WG acknowledges there is a desire and need to allow applications to choose whether they use header options and schemaId in the canonical form.	yes
LC-3073 John Schneider `<john.schneider@agiledelta.com>` (archived comment)	8. Section 4.2.2: The meaning of this section is not entirely clear. Presumably, it is not possible with the current EXI specification to use a production that is not capable of representing the content value (by definition). Are there circumstances that this section is attempting to prohibit that are currently allowed by the EXI 1.0 specification?	This section has been revised. The intent is to prefer schema-valid events over untyped events. To do so, Sections 4.2.2 and 4.2.3 have been combined into one. In this spirit, an EXI processor MUST use the event that matches most precisely first.	yes
LC-3074 John Schneider `<john.schneider@agiledelta.com>` (archived comment)	10. Section 4.4: The last sentences of this section indicates that Canonical EXI processors SHOULD be able to convert an untyped value to each datatype representation defined in EXI 1.0. This special language would not be required if EXI canonicalization were defined more generally with respect to the XML Infoset rather than an input EXI stream	Correct. This is no longer required, given that we base our algorithm on the XML Infoset.	yes
LC-3075 John Schneider `<john.schneider@agiledelta.com>` (archived comment)	11. Section 4.4.1: The last sentence of this section specifies that all canonical EXI processors MUST support arbitrarily large integer values. This means there will be some canonical EXI documents that devices without support for arbitrarily large integers cannot process. Recommend you consider updating this definition so it is possible to generate a canonical representation for any EXI document that any device that meets the minimum EXI processing requirements can handle. In particular, recommend you consider changing this definition such that canonical EXI processors MUST represent all Unsigned Integer values using the Unsigned Integer datatype representation when strict is true. However, when strict is false canonical EXI processors must represent Unsigned Integer values greater than 2147483647 using the String datatype representation. This would enable devices with limited capabilities to at least read, display and retransmit arbitrarily large values — even if they don’t have the capability to process them.	We think that retransmitting arbitrarily large values is feasible even if the device is not capable of representing them properly. Note: a limited intermediary device can fall back to string. The only device that has to use integer encoding is the one that checks the signature and requires a canonicalized document.	yes
LC-3077 John Schneider `<john.schneider@agiledelta.com>` (archived comment)	I’ve been following the discussion regarding Canonical EXI’s treatment of empty elements and would like to offer a suggestion to simplify the wording and improve the efficiency of the proposed solution. Here is what I would propose: “When strict is false or the current element grammar contains a production of the form LeftHandSide : EE with event code of length 1, EXI can represent the content of an empty element explicitly as an empty CH event or implicitly as a SE event immediately followed by an EE event. In these circumstances, Canonical EXI MUST represent an empty element by a SE event followed by an EE event.” I think this description states the issue and the alternate solution simply and clearly. The alternate solution improves compactness by prescribing the most efficient representation of an empty character event when it is available (i.e., by omitting the CH event). It improves processing efficiency by requiring only 1-2 checks (strict & available EE) and does not require knowledge or checking against DTR types. These checks occur in a relatively hot code path, so minimizing overhead is important for efficiency. Because the alternate approach does not depend on DTR knowledge, it also avoids the need to describe how to handle user defined DTRs that can also encode empty strings (which the current proposal does not address).	There are two approaches proposed on how to define rules regarding the encoding of empty elements in schema-informed context. Please provide any opinions as to which of those approaches you consider more appropriate to have as part of Canonical EXI. The behavior of each approach is described below. Approach A: This approach always first tries to encode empty elements (i.e. SE followed by EE, optionally AT, etc. in between) as a sequence of SE CH EE (optionally AT etc. 
between SE and CH) where CH is used for representing the empty string, for elements defined to have simple content, as long as doing so is possible (i.e. unless the codec in effect does not permit encoding the empty string ""). Approach B: This approach encodes empty elements (i.e. SE followed by EE, optionally AT, etc. in between) as a sequence of SE EE (optionally AT etc. in between). As an exception, for elements defined to have simple content, it is allowed to insert a CH that represents the empty string "" between SE and EE only when doing so is necessary for representing an empty element there. Note that approach B provides better efficiency, while approach A generates the same sequence of events in both strict and non-strict mode. --------------------------------------------------------------------- After considering several opinions that were discussed here on this issue [1], the group agreed to take approach B. The editor's draft [2] will soon reflect this decision. [1] https://www.w3.org/2005/06/tracker/exi/issues/112 [2] https://www.w3.org/XML/EXI/docs/canonical/canonical-exi.html#emptyElementContent	yes
LC-3078 John Schneider `<john.schneider@agiledelta.com>` (archived comment)	16. Section C.2: It is interesting and encouraging to see a good description of best practices for sharing EXI options without the EXI options document. This is the flexibility the specification should allow rather than mandating that the EXI options and schemaId be specified inside every canonical EXI stream.	The WG uses the following form to communicate EXI-C14N options. <exi-c14n:options xmlns:exi="http://www.w3.org/2009/exi" xmlns:exi-c14n="http://www.w3.org/2016/exi-c14n"> <exi-c14n:omitOptionsDocument/> <exi-c14n:utcTime/> <exi:header> <exi:common> <exi:compression/> </exi:common> </exi:header> </exi-c14n:options> The WG agrees that it should be possible to have one way to represent every set of Canonical EXI options.	yes
LC-3044 timeless `<timeless@gmail.com>` (archived comment)	> EXI can be used in such use cases and offers benefits w.r.t. compact data exchange and fast processing. > To ensure that relevant Infoset items are available the following > EXI Fidelity Options must be always enabled: > Preserve.pis, Preserve.prefixes, and Preserve.lexicalValues. > When the XML canonicalization algorithm preserves comments > the EXI fidelity option Preserve.comments must be also enabled. //This almost feels like normative instruction, and I don't recall similar instructions in the main document. //If similar instructions do exist in the main document, a pointer would be appreciated. I've decided the following is the block could benefit from emendation: > Canonical XML is designed to be useful to applications that test whether an XML document has been changed (e.g., XML signature). I read the "is" here as indicating it was something defined in this document. I think this text is actually referring to something beyond this document, in which case, I'd suggest: is => was Alternatively you could prefix the sentence with "While" or something (but that would involve rewriting the end of the sentence).... > Canonical EXI, in contrast to Canonical XML, deals with EXI documents and does not use plain-text XML data and the associated overhead. the => its	A pointer to Best Practices document was added to address the first point. (see http://www.w3.org/TR/exi-best-practices/#signature) The whole paragraph in question now reads as follows. "The Canonical EXI documents does not want to tackle XML. Instead it deals with EXI only and this section is just a recap of information shared in our best practice document. A pointer to this document is added (see http://www.w3.org/TR/exi-best-practices/#signature)"	tocheck
LC-3071 John Schneider `<john.schneider@agiledelta.com>` (archived comment)	2. Section 1, last sentence: Change “… whether two documents are identical …” to “… whether two documents are equivalent …” 3. Section 1.2: We agree EXI canonicalization is important for EXI environments that cannot afford to revert to traditional XML canonicalization methods. In addition, we recommend you mention some of the ways EXI canonicalization is useful for traditional XML users. For example, EXI canonicalization provides the first type-aware canonicalization scheme that can discern that +1, 1, 1.0, 1e0 and 1E0 are equivalent representations of the same floating-point value. This allows intermediaries to use binding-models and/or type-aware processing without breaking signatures. In addition, with a fast EXI processor, EXI canonicalization can be much faster than traditional XML canonicalization and can help cure some of the well-known XML security bottlenecks. 6. Section 4.2.1: Change “Prune productions” to “Select productions” in heading. Pruning productions will remove them from the grammars, changing the event codes of the following events and causing incompatibility with the EXI 1.0 specification. I expect the specification intends to specify which productions must be selected rather than removing productions from the grammars. 7. Section 4.2.2: Change “Prune productions” to “Select productions” in heading. The word “prune” should also be replaced in the body of this section. See above rationale. 9. Section 4.2.3: Change heading “Use the event with the most accurate event” to “Use the event that matches most precisely” or something similar. Current wording is unclear. 14. Section 4.4.6: The last sentence in the second paragraph states that EXI processors must first try to represent the string value as a local hit and when this is not successful as a global hit. 
It might be useful to clarify that one of the reasons the attempt to represent the string value as a local hit may fail is because the string has already been used as a local hit previously. EXI supports only one local table hit per value.	Agreed and implemented.	yes
LC-3072 John Schneider `<john.schneider@agiledelta.com>` (archived comment)	13. Section 4.4.6: The W3C is standardizing on Unicode Normalization Form C and recommending all web data be stored and transmitted in this form. It may be useful to state this and reference the relevant W3C specification here: http://www.w3.org/TR/charmod-norm/.	A reference has been added. However, the Canonical XML specification also excluded Unicode normalization, and the Working Group decided to follow that path.	yes
LC-3042 timeless `<timeless@gmail.com>` (archived comment)	> On the contrary, values that match the default value (i.e. <blockSize>1000000</blockSize>) MUST be omitted. On the contrary => conversely > When the alignment option compression is set, pre-compress MUST be used instead. instead of ? > Moreover, the EXI event sequence of each nested element MUST be SE followed by EE would it hurt you to link "SE" and "EE" to some definition of "Start Element" / "End Element" for readers less familiar w/ the jargon used herein? > The user defined meta-data MUST NOT be used unless it conveys a convention used by the application. "user defined meta-data" is italicized, but it isn't linked, and the use of "The" doesn't help me. If you dropped "The", I could almost understand what you're saying. If the "The" is important, then this italicized text SHOULD link to something defining it. > The user defined meta-data conveys auxiliary information and does not alter or extend the EXI data format. > Hence it deemed acceptable to omit this information. it => it is | it was > Elements that are necessary to structure the EXI options document according to the XML schema > (i.e. lesscommon, uncommon, alignment, datatypeRepresentationMap, preserve and common) > MUST be omitted unless there is at least one nested element according to the previous steps. Ideally steps are in numbered form, or somehow called out beyond "by the way, I hid steps somewhere before this point".	> On the contrary => conversely Agree. > instead of ? The bullet list item has been changed to "When the alignment option compression is set, pre-compress MUST be used instead of compression." Further, references to all EXI options have been added. > would it hurt you to link "SE" and "EE" to some definition of "Start > Element" / "End Element" for readers less familiar w/ the jargon used > herein? Start Element (SE) and End Element (EE), respectively, are used instead of the abbreviations.
A link to EXI event types has also been added. > "user defined meta-data" is italicized, but it isn't linked, and the > use of "The" doesn't help me. If you dropped "The", I could almost > understand what you're saying. If the "The" is important, then this > italicized text SHOULD link to something defining it. "user defined meta-data" now links to the EXI specification. > it => it is | it was Changed to "it is". > Ideally steps are in numbered form, or somehow called out beyond "by > the way, I hid steps somewhere before this point". The bullet list has been changed to a numbered list.	tocheck
LC-3043 timeless `<timeless@gmail.com>` (archived comment)	> Further, a String value MUST be represented as string value hit if possible. `hit` is used three times, only locally. It should either be defined or linked to something.	The terminology has been aligned with the EXI specification that uses "when a string value is found in the global or local value partition" and a reference has been added. http://www.w3.org/TR/exi/#encodingOptimizedForMisses	tocheck
LC-3065 timeless `<timeless@gmail.com>` (archived comment)	> The canonical representation dictates that characters from the restricted character set MUST use > the according n-bit Unsigned Integer. "according n-bit Unsigned Integer" sounds weird. If it's defined elsewhere, please link. If not, please explain. (Or "according" could be the wrong word.)	The link to http://www.w3.org/TR/exi/#encodingBoundedUnsigned has been added.	tocheck
LC-3045 timeless `<timeless@gmail.com>` (archived comment)	> The canonicalization process of EXI > bases upon awkward > the knowledge of the used EXI options which is an optional part of the EXI header. > These options communicate the various EXI options that have been used to encode the actual XML information with EXI and > are crucial to be known. awkward > This sections section => section | This => These > provides some best practices - so that for example it can be successfully used as part of the digital signature framework or in other use-cases. > Currently different options are discussed. "discussed" or "under discussion" or ?? i.e. awkward	Proposed updates agreed. Appendix C.2 "Exchange EXI Options" has been updated to: "The canonicalization process of EXI is based on the knowledge of the used EXI options. The EXI options communicate the various options that have been used to encode the actual XML information with EXI and are essential for any EXI processor. Given that the presence of EXI options in its entirety is optional in the EXI header, the following subsections provide and discuss best practices how to exchange them - so that for example it can be successfully used as part of the digital signature framework or in other use-cases."	tocheck
LC-3068 timeless `<timeless@gmail.com>` (archived comment)	> Optimizations such as pruning insignificant xsi:type values (e.g., xsi:type="xsd:string" for string values) > or insignificant xsi:nil values (e.g., xsi:nil="false") > is prohibited for a Canonical EXI processor. I think: is => are > where the rules of determining equivalence is described below. is => are (?) > A rationale for each decision is given as well as background information is provided. as well as => and > Example B-3. Example algorithm for converting float values to the canonical form Example..Example? > Initialize the exponent with the value 0 (zero) and jump to step 2. s/. and j/. J/ > If the value after the decimal point can be represented as 0 (zero) > without losing precision jump to step 4, otherwise to step 3. s/precision jump/precision, [then] jump/ s/otherwise/otherwise jump/ > If the signed mantissa is unequal 0 (zero), unequal -0 (negative zero), and contains a trailing > zero jump to 6, otherwise to step 7. s/zero jump/zero, [then] jump/ s/otherwise/otherwise jump/	Agree. As to Example B-3, it now reads as follows: "Example B-3. An algorithm for converting float values to the canonical form"	tocheck
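
Illustration (not part of the Working Group's resolutions): comments LC-3071 and LC-3068 both touch on float canonicalization, where lexical forms such as +1, 1, 1.0, 1e0 and 1E0 must map to one representation and trailing zeros in the mantissa are moved into the exponent. The following is a minimal Python sketch of that idea only. It is not the algorithm of Example B-3 in the specification; the function name `canonical_float` is hypothetical, and special values (INF, NaN, negative zero) are deliberately left out of scope.

```python
from decimal import Decimal


def canonical_float(lexical: str) -> tuple[int, int]:
    """Return (mantissa, exponent) such that value == mantissa * 10**exponent.

    Hypothetical helper for illustration: the mantissa carries no trailing
    zeros, so equivalent lexical forms collapse to a single pair.
    """
    value = Decimal(lexical)
    if not value.is_finite():
        # INF and NaN need dedicated handling that this sketch does not cover.
        raise ValueError("special float values are out of scope for this sketch")

    sign, digits, exponent = value.as_tuple()
    mantissa = int("".join(str(d) for d in digits))

    # Move trailing zeros of the mantissa into the exponent.
    while mantissa != 0 and mantissa % 10 == 0:
        mantissa //= 10
        exponent += 1

    # Assumption of this sketch: zero is normalized to exponent 0.
    if mantissa == 0:
        exponent = 0

    return (-mantissa if sign else mantissa, exponent)


if __name__ == "__main__":
    for text in ("+1", "1", "1.0", "1e0", "1E0", "1500", "-2.50"):
        print(text, "->", canonical_float(text))
```

Because the pair is computed from the value rather than from the lexical form, an intermediary that rewrites 1.0 as 1e0 would still yield the same canonical pair, which is the property the commenter highlights for signature stability.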