RE: Followup on I18N Last Call comments and disposition from John Boyer on 2000-06-28 (w3c-ietf-xmldsig@w3.org from April to June 2000)

From: John Boyer <jboyer@PureEdge.com>
Date: Wed, 28 Jun 2000 09:15:55 -0700
To: "Martin J. Duerst" <duerst@w3.org>, "Joseph M. Reagle Jr." <reagle@w3.org>
Cc: <w3c-ietf-xmldsig@w3.org>
Message-ID: <BFEDKCINEPLBDLODCODKCEIECDAA.jboyer@PureEdge.com>
Hi Martin and Joseph,

Regarding the treatment of xml:lang, I was concerned about the possible loss
of this information by the XPath transform.  I was also concerned about the
loss of xml:space and future xml: attributes (e.g. xml:base).

The XPath transform is now really just a call-through to the new
canonicalization algorithm, which can take an XPath expression other than
the default (the default expression renders the whole document except
comments, which means it can be implemented without actually using XPath).

However, Section 5 of the new c14n addresses document subsets, which is the
only place where the loss of xml:lang and other xml related attributes can
affect the meaning of information that is retained by the XPath expression.
In that section, I added the requirement to obtain copies of attributes in
the xml namespace from ancestors of an element E if E's parent is excluded
from the node-set.

Though a less onerous requirement than namespace propagation, it is
identical in intent.  The loss of namespace information from ancestors is
clearly a security risk, and it would be very difficult for an XPath
expression author to account for this.  Likewise, the unintentional loss of
xml:lang and similar attributes in the xml namespace is a security risk that
would be too difficult to account for in the XPath expression.

To be honest, most of the applications for document subsetting that I have
in mind involve the elimination of a leaf or subtree of the parse tree, so
the problem would not come up.  Nonetheless, XPath is a more generic
feature, and to be 'in for a pound', we accounted for the problem.

***************************************
John Boyer,
Software Development Manager

PureEdge Solutions (formerly UWI.Com)
Creating Binding E-Commerce

v:250-479-8334, ext. 143 f:250-479-3772
1-888-517-2675  http://www.PureEdge.com
***************************************



-----Original Message-----
From: Martin J. Duerst [mailto:duerst@w3.org]
Sent: Tuesday, June 27, 2000 7:50 PM
To: Joseph M. Reagle Jr.
Cc: w3c-ietf-xmldsig@w3.org; John Boyer
Subject: Re: Followup on I18N Last Call comments and disposition


Hello Joseph,

Many thanks for your comments. I think we will
proceed rapidly, but some things need to be cleared
up first.

At 00/06/27 15:44 -0400, Joseph M. Reagle Jr. wrote:
>At 16:32 6/27/00 +0900, Martin J. Duerst wrote:
>  >Hello Joseph
>
>Hi Martin, again, thank you for the comments.
>
>[
>BTW: back in [a], what did you mean by:
> >- General
> >   The treatment of xml:lang, eg during transforms, is unclear.
>[a] http://lists.w3.org/Archives/Public/w3c-ietf-xmldsig/2000JanMar/0254.html
>]

What we meant is that because xml:lang is an attribute
that inherits, but transforms (e.g. XPath, XSLT) don't
take care of such attributes, there is a risk that the
result has xml:lang in the wrong places, unless the
transform is carefully designed.

If the only aim of the transform is to sign, and loss/
misplacement of xml:lang does not produce security holes,
then this may be moot, otherwise we have to consider this.


>  >      [35] http://www.w3.org/TR/xmldsig-core/
>  >
>  >    It is not clear what "signature application" means. If this means an
>  >    application that produces signatures, we do not understand the
>  >    sentence in the last paragraph of 7.0 which recommends that such
>  >    applications produce normalized XML.
>
>A Signature application is an application that is conformant with the XML
>Signature specification.
>
>  >**** It has to be crystal-clear that no actual normalization should
>  >      occur in connection with any signing calculation. The current
>  >      text is not clear enough.
>
>Presently, the specification states:
>
>         We RECOMMEND that signature applications produce XML
>         content in Normalized Form C [NFC] and check that any XML
>         being consumed is in that form as well (if not, signatures may
>         consequently fail to validate).
>
>http://www.w3.org/TR/2000/WD-xmldsig-core-20000601/#sec-XML-Canonicalization
>
>Consequently, any application that states it is conformant with the XML
>Signature specification SHOULD do the above.

I can imagine two different kinds of signature applications:

1) Applications that create their content and sign it.
2) Applications that take content created elsewhere and
   sign it.

For 1), the above is the right recommendation.
For 2), the above is wrong.

There may not be a need to make a distinction between
1) and 2) in the rest of the spec, but here this is
needed.


>That's what it means, I think
>your comment is introducing the next issue:
>
>  >    A note should be added explaining the security problem mentioned in
>  >    our Last Call comments.
>  >
>  >**** E.g. in Section 8, at a convenient location (e.g. 8.1), add
>  >      something like: Using character normalization (Normalization
>  >      Form C of UTR #15) as a transform or as part of a transform
>  >      can remove differences that are treated as relevant by most
>  >      if not all XML processors. Character normalization should
>  >      therefore be done at the origin of a document, and only
>  >      checked, but never be done during signature processing.
>
>I propose text and re-orged the first two points of section 8 to deal with
>this:
>
>   8.1.3 Transforms Can Aid Birthday Attacks
>
>   In addition to the semantic concerns of transforms removing or
>   including data from a source document prior to signing, there is
>   potential for syntactical collision attacks. For instance, consider a
>   signature which includes a transform that changes the character
>   normalization of the source document to Normalized Form C [NFC]. This
>   transform dramatically increases the number of documents that when
>   transformed and digested yield the same hash value. Consequently, an
>   attacker could include a substantive syntactical and semantic change to
>   the document by varying other inconsequential syntactical values that
>   are normalized prior to digesting such that the tampered signature
>   document is considered valid. Consequently, while we RECOMMEND all
>   documents operated upon and generated by signature applications be in
>   [NFC] (otherwise intermediate processors can unintentionally break the
>   signature) encoding normalizations SHOULD NOT be done as part of a
>   signature transform.
>   http://www.w3.org/Signature/2000/06/section-8-I18N.html

I think this puts two different kinds of concerns into the
same pot (but I'm not exactly sure, because I'm not really
familiar with the security language).
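
The effect described in the proposed 8.1.3 text can be illustrated with a
small sketch. It assumes SHA-1 as the digest and uses two made-up sample
strings; the function names are hypothetical. The point is only that a
normalizing transform makes distinct inputs digest to the same value.

```python
import hashlib
import unicodedata

def digest_raw(text):
    """Digest the UTF-8 bytes exactly as given (no transform)."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

def digest_after_nfc(text):
    """Digest after a hypothetical NFC-normalizing transform."""
    normalized = unicodedata.normalize("NFC", text)
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()

# Same visible text, different code point sequences.
precomposed = "<name>caf\u00e9</name>"   # U+00E9
decomposed = "<name>cafe\u0301</name>"   # "e" followed by U+0301

same_without_transform = digest_raw(precomposed) == digest_raw(decomposed)
same_with_transform = (
    digest_after_nfc(precomposed) == digest_after_nfc(decomposed)
)
```

Without the transform the two documents have different digests; with it they
collide, even though XML processors treat them as different.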



>  >    It should be mandated that, when a document is transcoded from a
>  >    non-Unicode encoding to Unicode as part of C14N, normalization must be
>  >    performed (At the bottom of section 2 and also in A.4 in the June 1st
>  >    2000 draft).
>  >
>  >**** Unicode-based encodings are e.g. UTF-8 and UTF-16. In most cases,
>  >      the transcoding result is normalized just by design of Normalization
>  >      Form C, but there are some exceptions.
>  >
>  >**** The above also applies to other transcodings, e.g. done as part
>  >      of the evaluation of an XPath transform or the minimal
>  >      canonicalization.
>
>Is this precluded by the security concern raised above?

No, this is something different. It just helps to make
transcodings as deterministic as possible. If some transcoder
e.g. would convert a precomposed character in iso-8859-1 to
a decomposed representation (non-normalized), checking
would just not work.
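
As a rough illustration of this point, the sketch below transcodes one
iso-8859-1 byte sequence and checks the result against Normalization Form C.
The decomposed string stands in for the output of a hypothetical transcoder
that does not produce precomposed characters; the is_nfc helper is likewise
made up for the example.

```python
import unicodedata

def is_nfc(text):
    """Check only; the text itself is never modified."""
    return unicodedata.normalize("NFC", text) == text

latin1_bytes = b"caf\xe9"

# Python's decoder maps 0xE9 to the precomposed U+00E9.
transcoded = latin1_bytes.decode("iso-8859-1")

# What a hypothetical transcoder emitting decomposed output would produce.
decomposed_output = "cafe\u0301"

transcoded_ok = is_nfc(transcoded)
decomposed_ok = is_nfc(decomposed_output)
```

The first result passes the check and the second does not, so two
transcoders fed the same bytes would disagree on whether the document is
acceptable.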



>  ><<<<
>  >Further comments after careful reading of June-1-Core:
>  >
>  >In 4.3.3, the text on URI-Reference and "non-ASCII" characters
>  >should be aligned with that in XPointer
>  >http://www.w3.org/TR/xptr#uri-escaping to make sure all the
>  >details are correct.
>
>I believe the present text is sufficient:
>
>         (Non-ASCII characters in a URI should be represented in
>         UTF-8 [UTF-8] as one or more bytes, and then escaping these
>         bytes with the URI escaping mechanism. [XML])
>         http://www.w3.org/TR/2000/WD-xmldsig-core-20000601/#sec-Reference
>
>To state that all fragment identifiers (even from other MIME types) must be
>escaped as defined by XPTr would be inappropriate. If someone uses XPTr,
>they should follow the XPtr spec of course.

I didn't mean to state 'as defined by XPTr'. I was asking to take
the text from XPTr. It would have been better to ask you to take the
text from XLink (currently W3C member-only pre-published version);
this might have avoided confusion.

In XPointer, the conversion is described for the case of constructing
an XPointer and making a legal URI fragment out of it. This may indeed
be different for different fragment identifiers.

You are on the other side: what you have to describe is how to take
URI References appearing in DSig syntax (which may contain characters
outside ASCII) and convert them to URI References formally conforming
to URI syntax. In the case of XPointer (and some other cases), the fact
that both procedures are identical will allow users to put XPointers
(and other stuff) containing e.g. non-ASCII characters into DSig syntax
as characters (without constant %HH escaping). It's two sides of the
same coin, so to speak.
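
A minimal sketch of that conversion, assuming the rule quoted from the
draft (represent non-ASCII characters in UTF-8, then %HH-escape the bytes).
The function name and the example URI are made up, and the choice of which
ASCII characters to leave alone is an assumption of this sketch, not text
from any of the specs.

```python
from urllib.parse import quote

# ASCII characters left untouched: URI delimiters, unreserved marks, and
# "%" so that escapes already present are not escaped a second time.
SAFE_ASCII = "/:?#[]@!$&'()*+,;=%-._~"

def escape_uri_reference(uri_ref):
    """Encode non-ASCII characters as UTF-8 and %HH-escape those bytes."""
    return quote(uri_ref, safe=SAFE_ASCII, encoding="utf-8")

escaped = escape_uri_reference("http://example.org/r\u00e9sum\u00e9.xml#na\u00efve")
# escaped == "http://example.org/r%C3%A9sum%C3%A9.xml#na%C3%AFve"
```

A reference that is already pure ASCII passes through unchanged, which is
what lets authors write the characters directly in DSig syntax.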



>  >4.5 The Encoding attribute of <Object> is not described. Is that
>  >something like 'charset', or something like base64, or what?
>  >This needs to be clearly described or removed.
>
>I propose, "The Object's Encoding attribute may be used to provide a URI
>that identifies the method by which the object is encoded."

What would that be? Any such URI examples? Any examples
where this is needed?


>  >6.5 "Canonicalization algorithms take one implicit parameter:"
>  >Wrong, the charset is also an implicit parameter, and is very
>  >important. The spec must say that this parameter is derived
>  >according to the rules for the relevant protocols and formats,
>  >and that in particular for XML, the rules defined in RFC 2376
>  >or its successor apply.
>  >The spec should also say that in order to be able to correctly
>  >sign and verify web documents, it is important that this information
>  >is delivered correctly, and that this may require settings
>  >on the server side.
>  >The spec should also say that for some 'charset's, there may
>  >be differences in which Unicode character a given character
>  >is converted to, and should point to
>  >the XML Japanese Profile (http://www.w3.org/TR/japanese-xml/)
>  >submission for an example and some advice. In particular,
>  >documents intended for digital signing should preferably
>  >be created using UTF-8 or UTF-16 from the start.
>  >The following sentence should also be added to 6.5:
>  >Various canonicalization algorithms require conversion to
>  >UTF-8. Where any such algorithm is REQUIRED or RECOMMENDED,
>  >this means that that algorithm has to understand at least
>  >UTF-8 and UTF-16 as input encodings. Knowledge of other
>  >encodings is OPTIONAL.
>  >
>  >[The disposition of comments says "(these issues can be
>  >better addressed in the C14N spec)", but because this
>  >also affects minimal canonicalization, XPath transform,...,
>  >this is not true.]
>
>Proposed text:
>
>Canonicalization algorithms take two implicit parameters when they appear as
>a CanonicalizationMethod within the SignedInfo element: the content and its
>charset. (Note, there may be ambiguities in converting existing charsets to
>Unicode, see XML Japanese Profile [XML-Japanese] for more information.) The
               ^ for example


>charset is derived according to the rules of the transport protocols and
>media formats (e.g., RFC2376 [XML-MT] defines the media types for XML). This
>information is necessary to correctly sign and verify documents and often
>requires careful server side configuration. Various canonicalization
>algorithms require conversion to [UTF-8]. Where any such algorithm is
>REQUIRED or RECOMMENDED the algorithm MUST understand at least [UTF-8] and
>[UTF-16] as input encodings. Knowledge of other encodings is OPTIONAL

Looks good (except for a period at the end :-).
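
The two implicit parameters can be pictured with a short sketch. It assumes
the charset has already been derived from the transport protocol or media
type, treats UTF-8 and UTF-16 as the encodings that must be understood, and
makes any other encoding opt-in; the function name and the optional_charsets
parameter are hypothetical.

```python
import codecs

# Encodings the proposed text says MUST be understood as input.
REQUIRED_CHARSETS = {"utf-8", "utf-16"}

def to_utf8(content, charset, optional_charsets=()):
    """Convert (content, charset) to UTF-8 bytes before canonicalization.

    codecs.lookup() normalizes spelling, e.g. "UTF-16" becomes "utf-16";
    it raises LookupError for names Python does not know at all.
    """
    name = codecs.lookup(charset).name
    supported = REQUIRED_CHARSETS | {
        codecs.lookup(c).name for c in optional_charsets
    }
    if name not in supported:
        raise ValueError("unsupported input encoding: " + charset)
    return content.decode(name).encode("utf-8")

utf16_input = "caf\u00e9".encode("utf-16")      # includes a byte order mark
from_utf16 = to_utf8(utf16_input, "UTF-16")
from_latin1 = to_utf8(b"caf\xe9", "iso-8859-1",
                      optional_charsets=["iso-8859-1"])
```

Both calls yield the same UTF-8 bytes. The same bytes passed with a wrong
charset would decode to different characters, which is why the charset has
to be delivered correctly for signing and verification to agree.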




>I leave John to address the following:

Ok.


Regards,   Martin.
Received on Wednesday, 28 June 2000 12:16:10 UTC