Canonical XML Version 1.0 Last Call Comments

Editor: John Boyer, PureEdge Solutions Inc., jboyer@PureEdge.com

This page links to comments/issues raised on the XML Signature list during (and after) the Last Call. This page also describes how these issues were resolved.

Originator	Issue	Resolution

Doug Bunting Doug@ariba.com cXML Standards Manager, Ariba, Inc.	In Section 2, the paragraph reading "An element has attribute nodes to represent the non-namespace attribute declarations appearing in its start tag as well as nodes to represent default attributes that were not specified and not declared as #implied." implies the legality of an attribute declaration such as <!ATTLIST form method CDATA #IMPLIED "POST">. This is not legal according to section 3.3.2 of the XML Recommendation. "Default attributes" must be either declared as #FIXED or simply with an attribute value. [Bunting-1].	In [C14N-DataModel], the text is changed to the description you suggest in [Bunting-2]. Also, the other editorial tweaks given in [Bunting-1] have been addressed in [C14N-20000907] (the typo in the status section was corrected, the theorem/proof style was replaced by prose, the UTF-8 encoding was clarified, the required and recommended status of canonical forms without comments and with comments was specified, and numerous examples were added to clarify entity replacement as well as all other types of changes that C14N can make).
TAMURA Kent kent@trl.ibm.co.jp Tokyo Research Laboratory, IBM	It needs to be made clear that UTF-8 encoding is without a byte order mark. [Tamura] [Martin-1]	This is now addressed in [C14N-DataModel] based on the discussion in Section 3.2 of [UTF-16].
MURATA Makoto muraw3c@attglobal.net International University of Japan, Research Institute	Need to preserve the XML declaration, or at least the version number within it. [Murata] [Martin-1] [Cowan-Version-2]	Firstly, focus should be restricted to version, since maintaining encoding and standalone status may be quite incorrect. Secondly, the absence of version unambiguously identifies the canonical form as version 1.0. The purpose of this specification is to canonicalize XML 1.0, not future versions of XML. Hopefully few modifications will be required to support a future version of XML, but I expect that a change to XML that affects XML processors should percolate through InfoSet to XPath and finally to C14N, at which point the XML declaration should be available (which it currently is not). See prior acceptance of this decision [Cowan-Version-1].
Anli Shundi anli.shundi@nue.et-inf.uni-siegen.de Institute for Data Communications Systems, University of Siegen	The new c14n drafts since June 1st have no samples at all unlike the previous ones. I think clarification samples are needed. I hope this won't pose any delays in the publication track. [Shundi]	Added comprehensive set. See [C14N-Examples]
Susan Lesch lesch@w3.org W3C	Is "Canonical" a proper noun? It is lowercase in "canonical form" and uppercase in "Canonical XML." I would make it lowercase globally, except when referring to the title of this spec. [Lesch]	Canonical is not a proper noun because it is not a noun. It is an adjective that means "officially approved; authoritative; orthodox." [AHD] The phrase 'Canonical XML' is capitalized in the title, and that capitalization was carried through the document as an implicit reference to 'the officially approved form of XML defined by this document'. This has been fixed by lowercasing the word 'canonical' when not specifically referring to the document title (as suggested), then formally defining 'canonical XML' (see [C14NTerms]).
Susan Lesch lesch@w3.org W3C	"REQUIRED" and "MUST" seem to be used in the RFC 2119 sense. You might say so, and add http://www.ietf.org/rfc/rfc2119.txt to References. [Lesch]	Done. [C14NTerms] [C14NRefs]
Susan Lesch lesch@w3.org W3C	Below, a clause and paragraph number are followed by a quote and then a suggestion. Comments are in brackets []. 2. par. 7 [Define QName here, rather than wait until the last par. of clause 4.] 2. par. 8 URI [You might add a [URI] link, and add http://www.ietf.org/rfc/rfc2396.txt to References.] 2. par. 10, list item 4 XPath recommendation XPath Recommendation Clause 3, "Document Order for Canonical XML," needs to be numbered "3". 4. par. 4, list item 3 reference reference 5. last par. NOTE:XML NOTE: XML A.1. par. 1 rational rationale A.2. "...However, the statement in Namespaces in XML that "the prefix functions only as a placeholder for a namespace name" is incorrect." [Is it a known errata item? A negative comment inside a specification about another spec. seems somehow out of place without what you term "resolution." Sorry I didn't find it in a quick search of xml-names-editor, or in http://www.w3.org/XML/xml-names-19990114-errata. Maybe in place of "is incorrect" say "needs clarification" or "needs modification"?] A.2. Proof:Let [twice] Proof: Let Remark:The Remark: The Theorem 2:With Theorem 2: With Remark:Since Remark: Since conclusion to be draw conclusion to be drawn A.3. nodes (they are bound...). [I would make that two, shorter sentences:] nodes. (They are bound....)	All good. See [C14N-20000907]. Note that the theorems and proofs were replaced by prose now that they are better accepted.
Susan Lesch lesch@w3.org W3C	XML-related recommendations, working drafts XML-related Recommendations, Working Drafts	Didn't change this because I am not talking about a specific recommendation or working draft. My current understanding is that these should be capitalized in titles.
Martin J. Dürst duerst@w3.org W3C	These issues are I18N feedback from [Martin-1]. Canonical XML should include an XML declaration with the version information. This is important because there is some chance that a future version of XML will be defined to allow newly encoded characters in Unicode to be used in element/attribute names (e.g. Mongolian, Khmer,...). UTF-8 doesn't have a BOM. For conversion (to UTF-8), please be careful (see Japanese XML profile, http://www.w3.org/TR/japanese-xml/). 'a UTF-8 encoding': There is only one UTF-8. Change e.g. to 'encoded as UTF-8'. Appendix A.1 should not use the word 'recommended' for the [CharModel] document, as that document is not (yet, I hope) a recommendation. Appendix A.1 should specifically say that security concerns were responsible for removing character normalization. The special provisions for xml:lang (and xml:space) on document subsets are appreciated.	The first and second points are addressed separately above. The third point I did not address because it seems obvious that implementers will have to be careful to correctly implement their programs. Anyone who actually has this problem should be diligent enough to make themselves sufficiently well-informed before writing code. The fourth point has been corrected. I use 'UTF-8 encoding' but not 'a UTF-8 encoding' in [C14N-20000907]. On the fifth point, I changed 'recommended' to 'is working on'. The sixth point was addressed, including a reference to a good example by Cowan [Cowan-Example]. On the seventh point, note that the modification was made most specifically to address xml:lang, however it seemed prudent to apply the propagation to all xml attributes, including those not yet defined.
Martin J. Dürst duerst@w3.org W3C	These issues are feedback from [Martin-2]. The issue of relative URIs in external entities (aka xml:base) needs very careful consideration. Please see other mails on this topic. For example, 'logically equivalent by the XML 1.0 Recommendation' touches on the xml:base issue. The first sentence of the Abstract is too long. Also, "equivalent for the purpose of many applications" I would suggest to change 'many' to 'most', or even be more strict. Ideally (but see the xml:base issue), it should be possible to say something like: No XML application can assume that the changes due to Canonicalization won't be done by some other application. 1., second paragraph: 'it is not a goal' ... 'such a method is unachievable'. Align so that there is no change from goal to method. 'The root also has a single element node': This is confusing, because this element node is also one of the children of the root, but the text gives a different impression. 'The XPath data model expects the XML processor to convert relative URIs to absolute URIs.': I'm very sure that this expectation is limited, most probably to namespace URIs. This should be expressed explicitly. 'then evaluating...': change to 'and then...'. The ordering of 'first namespaces, then (other/real) attributes' is expressed twice, both as 'imposing additional document order rules' and in the generation rules for element nodes. One of them, most probably the former, is redundant and should be removed. 'lexicographically least' -> 'lexicographically first'. In 'Comment Nodes-': add a note saying that by default, comment nodes are not rendered. Bullet list in general: Make sure there is a space before the hyphens, e.g. 'Root Node -' instead of 'Root Node-'. 'be the same one' -> 'be the same as the one'. 'canonical form generator': This term appears for the first time here. On the web, the link helps to understand what it is supposed to mean, but on paper, this doesn't work.
rational -> rationale. for changes -> for the changes. A.2: This too much gives the impression that canonical XML and the namespace Rec were/are wrong, and XPath is right. I would prefer this to be rewritten to say things in a more neutral way, and shorter (without the theorems,...). A.3: Does this mean that in a plain document using no prefixes and no namespaces, every element will get xmlns="" added? Looks quite wasteful. References: Remove 'Avaliable at' in several instances.	The first point about xml:base has been the subject of quite a bit of scrutiny on my part and the part of collaborators. The problem exists without xml:base and can actually be mostly (but not completely) fixed by appropriate use of xml:base. The issue is discussed at length in [C14N-Limitations]. The abstract was rewritten to address the second point. For the third point, the terminology was cleaned up slightly. However, the meaning is pretty clear. It is not a goal to achieve the method. The method is unachievable and therefore so is the goal. For the fourth point, this is the XPath terminology. Typically, the top-level element is called the document element or the root document element since it is the root of the subtree that contains all elements. I can't change the terminology of other recommendations. Added 'in namespace declarations' to address point five. For point six, note that 'then' is a grammatically valid way to start a dependent clause. 'and then' is not grammatically correct. So, I changed to just using 'and' plus the implicit assumption that the reader understands that 'and' is non-commutative in this case. If this redundancy were not included, there would be email telling me to clear up the ambiguity at this point. There are some small amounts of repetition throughout the document because people don't always get all the words, so if we say something in multiple ways and places, there is far less chance that implementers will miss the point.
The term 'lexicographically least' is common and preferable to me because I'm trying to tell you which comes first, so telling you that the one which comes first is 'lexicographically first' seems like an incomplete definition (even though it is technically OK). Use of 'least' avoids confusion. The dash is like a colon, except I didn't use a colon to avoid confusion with namespaces. However, one typically doesn't put a space before a colon nor before a single dash. A terminology section has been added [C14NTerms] to address concerns such as (and including) point 12. Points 13 and 14 are corrected. Replaced theorems with prose. However, namespace rewriting did destroy certain aspects of the document, which is contrary to the intent of canonicalization. Though we do still have limitations, they were all undocumented limitations of the prior algorithm. The language pertaining to namespaces was softened (also requested by Susan Lesch) to indicate that the offending statement was only correct in a limited context that does not allow one to conclude the correctness of namespace rewriting. On the last point, it is not only wasteful, but I later realized it could break content models if the DTD is reattached to the canonical form. So, C14N was modified to eliminate unnecessary namespace declarations. See [C14N-SuperfluousNSDecl] for more info.
John Boyer jboyer@PureEdge.com PureEdge Solutions Inc.	PIs don't seem to need special encoding rule for #xD because it can never occur in PI data (entities aren't allowed in PI data).	Fixed.
John Boyer jboyer@PureEdge.com PureEdge Solutions Inc.	It is necessary to change the input specification in order to sync up with transform processing model changes in the digital signature specification.	Fixed. The result is more generally applicable to other future uses of C14N. C14N should not evaluate a document subsetting expression; it should let the calling application do so, then pass the resultant node-set to C14N (ensuring that C14N's requirements of a node-set are observed).
Petteri Stenius Petteri.Stenius@done360.com Done Information, Ltd.	Can anything be done about insignificant whitespace? [Stenius]	Although whitespace normalization can be quite important, it is often too application-specific. However, most of the whitespace that would be deemed insignificant in all applications can be eliminated by sending as input an XPath node-set in which all whitespace-only text nodes have been removed, e.g. using the expression 'not(self::text()[normalize-space()=""])' on every node. The expression can also be enhanced by looking for xml:space='preserve' in ancestors of such text nodes. Our reasoning against adding this into the core C14N algorithm is that it is difficult to capture all of the variations and also difficult to get most tools to provide enough information to do a good job of this. [Tamura-2]
Lauren Wood lauren@sqwest.bc.ca	Soften the language in Section 4.4 of [C14N-20000907], which claims that the Namespaces spec is incorrect in asserting that namespace prefixes have no information value. [Wood]	It is true that Namespaces came out before the other W3C recommendations (e.g. XPath and XSLT) that reference namespace prefixes. So, language has been changed to "there now exist a number of contexts in which namespace prefixes can impart information value in an XML document." [C14N-20001011].
Doug Bunting Doug@ariba.com cXML Standards Manager, Ariba, Inc.	In Section 2.1 of [C14N-20000907], clarify that turning the comment flag on doesn't result in retention of comments from the DTD. Also, the meaning of lexicographic order should be clarified as ascending order, and in the processing of comment children of the root node in Section 2.3, it should be made clear that 'root node' means the node above the document element [Bunting-3].	To Section 2.1, I added the following "Note that the XPath data model does not create comment nodes for comments appearing within the document type declaration (DTD)." I added comments about lexicographic meaning ascending order, both at the end of Section 2.2 and directly in the processing statements in Section 2.3. Finally, I added text to the 'root node' processing as well as to the comment node processing in Section 2.3 to ensure that it is known that 'root node' means the parent of the top-level document element and that comment children of the root are outside of the top-level document element (and the document type declaration).
Gregor Karlinger gregor.karlinger@iaik.at IAIK TU Graz	In [C14N-20000907], the example in section 3.6 needs a version number in the input document [Karlinger-1], and the example in section 3.4 needs to be corrected by removing the leading and trailing whitespace from the id attribute in the normId element. Also, the example is incorrect because it violates a validity constraint [Karlinger-2].	Fixed example 3.6 and the whitespace in example 3.4. The validity constraint was intentionally violated, so I left it in, but made it clearer that it was there, and I also added another item that demonstrates the other issues demonstrated by that line using a valid attribute [C14N-20001011].
Merlin Hughes merlin@baltimore.ie Baltimore Technologies, plc	In the example of section 3.5 of [C14N-20000907], the NOTATION needs a SYSTEM, and the example in 3.7 should not have an xmlns="" in element E3. As well, condition 2 for including xmlns="" appears to be redundant with condition 3. [Hughes]	Added SYSTEM in NOTATION of example 3.5. Corrected error in example 3.7 as follows: element e1 was supposed to be surrounding e2. Modified XPath expression to properly include e1 and exclude e2 despite the fact that e1 is now the parent of e2. Removed condition 2, which was redundant, and replaced it with a phrase in the prose that clarified that xmlns="" would not appear in [C14N-20001011].
Susan Lesch lesch@w3.org W3C	Typos and clarifications identified in [Lesch-2].	Made changes exactly as suggested with two exceptions. First, the occurrence of 'xml namespace' was not changed to capital XML, but rather I put a code tag around the xml. Second, example 3.3 was not changed to use example.org because there is no technical fault in the example, so it did not seem prudent to throw off implementation tests at this time.
Kevin Regan kevinr@valicert.com ValiCert	Many people will not get what they expect because c14n does not eliminate insignificant whitespace. [Regan]	This is a good point, but it has been considered numerous times in the past by the group, and interoperability with non-validating processors as well as correct operation in the absence of the DTD have been given as reasons for retaining insignificant whitespace, though applications are free to strip it before applying a signature, for example. For the final decision, please see [Reagle].
Gregor Karlinger gregor.karlinger@iaik.at IAIK TU Graz	Examples 3.4 and 3.7 use attributes of type ID. The examples do not have all DTD declarations necessary to be processable by a validating parser, yet the ID attributes seem to require a validating processor. [Karlinger-3]	This was considered by the group and the decision was that the maker of the non-validating processor being used had a bug to fix in terms of not handling ID attributes properly. [Reagle]
Jeff Cochran JCochran@docutouch.com DocuTouch	The conversion to UCS character domain should be done using Normalization Form C if converting from a non-UCS encoding. Currently, the document says 'if converting from a non-Unicode encoding.' There is a difference, specifically the character planes outside of the 17 planes representable by UTF-16.	Fixed with wording change provided by Martin J. Dürst [Martin-3]
Martin J. Dürst duerst@w3.org W3C	On Example 3.6, there needs to be a much better note to make very clear that (different to the other examples), this example is not really intended to be XML and cannot be used directly in a test. It would also be advisable to provide an actual file that contains the real bytes, or to point to it if that's already around. [Martin-4]	The actual text for all examples is provided in the comments of the specification, which you can get at by using view source. As to the specific example, there is a note indicating the difference, and it is the only note of the example.
Jonathan Marsh jmarsh@microsoft.com Microsoft Corporation	The method in Section 2.3 by which superfluous namespace declarations are omitted does not seem to match the following sentence from Section 4.6: "...omits a declaration if it determines that the immediate parent element in the canonical form has an equivalent declaration." [Marsh]	Added the words 'in scope' to the end of the sentence, as suggested by Karlinger [Karlinger-4].
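
The [Tamura] resolution above (UTF-8 output without a byte order mark) can be illustrated with a minimal Python sketch. The function names decode_input and to_canonical_bytes are hypothetical and do not come from the C14N specification or from any library. The sketch only shows that a leading BOM on input is dropped and that none is written on output.

```python
import codecs


def decode_input(data: bytes) -> str:
    """Decode UTF-8 input; the 'utf-8-sig' codec drops a leading BOM if present."""
    return data.decode("utf-8-sig")


def to_canonical_bytes(text: str) -> bytes:
    """Encode as UTF-8; Python's plain 'utf-8' codec never writes a BOM."""
    return text.encode("utf-8")


# Hypothetical input: a small document that arrives with a UTF-8 BOM in front.
with_bom = codecs.BOM_UTF8 + "<doc>\u00e9</doc>".encode("utf-8")

text = decode_input(with_bom)
output = to_canonical_bytes(text)
```

A real canonicalizer would do this while serializing the node-set. The point here is only that the three BOM bytes (EF BB BF) are absent from the start of the output.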

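The suggestion given in response to [Stenius] above, removing whitespace-only text nodes before the node-set is passed to C14N, might look roughly like the following Python sketch. It assumes a DOM built with the standard xml.dom.minidom module rather than a true XPath node-set, and the helper names are hypothetical. It also applies the xml:space='preserve' refinement mentioned in the resolution.

```python
from xml.dom import Node, minidom

XML_NS = "http://www.w3.org/XML/1998/namespace"
XML_WHITESPACE = " \t\r\n"  # the characters XPath's normalize-space() treats as whitespace


def preserves_space(node):
    """Return True if the nearest xml:space declared on an ancestor is 'preserve'."""
    current = node.parentNode
    while current is not None and current.nodeType == Node.ELEMENT_NODE:
        value = current.getAttributeNS(XML_NS, "space")  # '' when not declared
        if value == "preserve":
            return True
        if value == "default":
            return False
        current = current.parentNode
    return False


def strip_whitespace_only_text(element):
    """Remove whitespace-only text nodes below element, honoring xml:space."""
    for child in list(element.childNodes):  # copy, since the list is modified
        if child.nodeType == Node.TEXT_NODE:
            if child.data.strip(XML_WHITESPACE) == "" and not preserves_space(child):
                element.removeChild(child)
        elif child.nodeType == Node.ELEMENT_NODE:
            strip_whitespace_only_text(child)


# Hypothetical input with indentation and one element that preserves space.
doc = minidom.parseString(
    '<a>\n  <b>text</b>\n  <c xml:space="preserve"> </c>\n</a>'
)
strip_whitespace_only_text(doc.documentElement)
```

This stripping is done by the calling application, not by C14N itself, which is consistent with the resolutions above that keep whitespace normalization out of the core algorithm.
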
References

AHD: American Heritage Dictionary, 2nd Ed. Houghton Mifflin Company, Boston. 1985.
See also www.m-w.com ('canonical' and 'canonical form')
Bunting-1: Typos and Implications in xml-c14n-20000613. Doug Bunting. July 2, 2000.
http://lists.w3.org/Archives/Public/w3c-ietf-xmldsig/2000JulSep/0006.html
Bunting-2: Regarding Response to Typos and Implications in xml-c14n-20000613. Doug Bunting. July 3, 2000.
http://lists.w3.org/Archives/Public/w3c-ietf-xmldsig/2000JulSep/0007.html
Bunting-3: RE: Last Call. Doug Bunting. September 26, 2000.
http://lists.w3.org/Archives/Public/w3c-ietf-xmldsig/2000JulSep/0532.html
C14N-20000907: Canonical XML Version 1.0. W3C Working Draft. John Boyer. September 7, 2000.
http://www.w3.org/TR/2000/WD-xml-c14n-20000907
C14N-20001011: Canonical XML Version 1.0. W3C Working Draft. John Boyer. October 11, 2000.
http://www.w3.org/TR/2000/WD-xml-c14n-20001011.html
C14N-20001026: Canonical XML Version 1.0. W3C Candidate Recommendation. John Boyer. October 26, 2000.
http://www.w3.org/TR/2000/CR-xml-c14n-20001026
C14N-DataModel: Canonical XML Version 1.0. W3C Candidate Recommendation. John Boyer. October 26, 2000.
http://www.w3.org/TR/2000/CR-xml-c14n-20001026#DataModel
C14N-Examples: Canonical XML Version 1.0. W3C Candidate Recommendation. John Boyer. October 26, 2000.
http://www.w3.org/TR/2000/CR-xml-c14n-20001026#Examples
C14N-Limitations: Canonical XML Version 1.0. W3C Candidate Recommendation. John Boyer. October 26, 2000.
http://www.w3.org/TR/2000/CR-xml-c14n-20001026#Limitations
C14NRefs: Canonical XML Version 1.0. W3C Candidate Recommendation. John Boyer. October 26, 2000.
http://www.w3.org/TR/2000/CR-xml-c14n-20001026#bibliography
C14N-SuperfluousNSDecl: Canonical XML Version 1.0. W3C Candidate Recommendation. John Boyer. October 26, 2000.
http://www.w3.org/TR/2000/CR-xml-c14n-20001026#SuperfluousNSDecl
C14NTerms: Canonical XML Version 1.0. W3C Candidate Recommendation. John Boyer. October 26, 2000.
http://www.w3.org/TR/2000/CR-xml-c14n-20001026#Terminology
C14N-20000710: Canonical XML Version 1.0. W3C Working Draft. John Boyer. July 10, 2000.
http://www.w3.org/TR/2000/WD-xml-c14n-20000710
Cowan-Example: Example of Harmful Effect of Character Model Normalization. John Cowan. July 7, 2000.
http://lists.w3.org/Archives/Public/w3c-ietf-xmldsig/2000JulSep/0038.html
Cowan-Version-1: First Letter about XML Version in Canonical Forms. John Cowan. June 6, 2000.
http://lists.w3.org/Archives/Public/w3c-ietf-xmldsig/2000AprJun/0222.html
Cowan-Version-2: Second Letter about XML Version in Canonical Forms. John Cowan. August 9, 2000.
http://lists.w3.org/Archives/Public/w3c-ietf-xmldsig/2000JulSep/0231.html
Hughes: Tentative Signature Over C14N Examples. Merlin Hughes. October 6, 2000.
Archived Response at http://lists.w3.org/Archives/Public/w3c-ietf-xmldsig/2000OctDec/0032.html
Karlinger-1: Canonical XML Comment (Example 3.6). Gregor Karlinger. September 13, 2000.
http://lists.w3.org/Archives/Public/w3c-ietf-xmldsig/2000JulSep/0481.html
Karlinger-2: Canonical XML Comment (Example 3.4). Gregor Karlinger. September 14, 2000.
http://lists.w3.org/Archives/Public/w3c-ietf-xmldsig/2000JulSep/0489.html
Karlinger-3: Questions about ID Attributes. Gregor Karlinger. November 3, 2000.
http://lists.w3.org/Archives/Public/w3c-ietf-xmldsig/2000OctDec/0101.html
Karlinger-4: AW: Canonical XML Typo. Gregor Karlinger. December 1, 2000.
http://lists.w3.org/Archives/Public/w3c-ietf-xmldsig/2000OctDec/0202.html
Lesch: Minor Comments for C14N-20000710. Susan Lesch. July 13, 2000.
http://lists.w3.org/Archives/Public/w3c-ietf-xmldsig/2000JulSep/0099.html
Lesch-2: Minor Typos in cr-xml-c14n-20001026. Susan Lesch. October 29, 2000.
http://lists.w3.org/Archives/Public/w3c-ietf-xmldsig/2000OctDec/0094.html
Marsh: Canonical XML Typo. Jonathan Marsh. November 30, 2000.
http://lists.w3.org/Archives/Public/w3c-ietf-xmldsig/2000OctDec/0200.html
Martin-1: I18N Feedback on C14N-20000710. Martin J. Dürst. August 1, 2000.
http://lists.w3.org/Archives/Public/w3c-ietf-xmldsig/2000JulSep/0211.html
Martin-2: Personal Feedback on C14N-20000710. Martin J. Dürst. August 1, 2000.
http://lists.w3.org/Archives/Public/w3c-ietf-xmldsig/2000JulSep/0212.html
Martin-3: Re: Character Encoding Question. Martin Dürst. November 30, 2000.
http://lists.w3.org/Archives/Public/w3c-ietf-xmldsig/2000OctDec/0225.html
Martin-4: Fwd: I18N problem in XML canonicalisation. Martin Dürst. November 24, 2000.
http://lists.w3.org/Archives/Public/w3c-ietf-xmldsig/2000OctDec/0171.html
Murata: Letter about XML Version in Canonical Forms. MURATA Makoto. July 31, 2000.
http://lists.w3.org/Archives/Public/w3c-ietf-xmldsig/2000JulSep/0216.html
Reagle: Closing on Whitespace and Examples 3.4 and 3.7. Joseph Reagle. November 22, 2000.
http://lists.w3.org/Archives/Public/w3c-ietf-xmldsig/2000OctDec/0163.html
Regan: Question on Latest Spec. Kevin Regan. November 12, 2000.
http://lists.w3.org/Archives/Public/w3c-ietf-xmldsig/2000OctDec/0126.html
Shundi: Letter to DSig WG about C14N Examples. Anli Shundi. July 7, 2000.
http://lists.w3.org/Archives/Public/w3c-ietf-xmldsig/2000JulSep/0025.html
Stenius: Issue with Insignificant Whitespace. Petteri Stenius. August 29, 2000.
http://lists.w3.org/Archives/Public/w3c-ietf-xmldsig/2000JulSep/0365.html
Tamura: Clarify UTF-8. TAMURA Kent. July 10, 2000.
http://lists.w3.org/Archives/Public/w3c-ietf-xmldsig/2000JulSep/0045.html
Tamura-2: Message against Whitespace Normalization in C14N. TAMURA Kent. September 6, 2000.
http://lists.w3.org/Archives/Public/w3c-ietf-xmldsig/2000JulSep/0425.html
UTF-16: UTF-16, an encoding of ISO 10646. IETF RFC 2781. P. Hoffman, F. Yergeau. February 2000.
http://www.ietf.org/rfc/rfc2781.txt
Wood: Re: comments on the XML Canonical specification. Lauren Wood. September 11, 2000.
http://lists.w3.org/Archives/Public/w3c-ietf-xmldsig/2000JulSep/0464.html