Last Call Comment: Canonicalization

Hi,

There has been some discussion about the canonicalization alternatives in
the current XML-signature draft. Basically the comments have been that
c14n is good but too complex for some applications, and that there are
problems with the minimal canonicalization. It has been argued that the
minimal one should be removed from the draft. Someone also pointed out
that there is a potential security problem in the minimal
canonicalization.

I have a case for which neither canonicalization method is suitable. This
is a real problem in my draft about XML encoding of SPKI certificates, but
it can manifest itself in other applications as well. In my draft the
problem concerns calculating a hash of a public-key. (SPKI defines that
issuer and subject can be identified by a hash of a public-key, among
other alternatives, so this has to be supported for compatibility.)

The hash of a public-key must be calculated over the XML encoded form of
the key being hashed. In this case, the receiver of a certificate does
not get the XML encoded version of the public-key; it is supposed to
already know the public-key and merely to identify the correct key by its
hash. So a receiver calculates the hash of a public-key by first forming
the XML encoded form of the key (an XML element defined as part of the
DTD for XML encoded SPKI certificates), then canonicalizing it, and
finally calculating a hash over the canonicalized version of the element.
The key issue here is the canonicalization algorithm, which is supposed to
remove all possibilities for alternative presentations of the XML encoding
of a public-key before the data is passed on to a hash algorithm.
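The intended pipeline can be sketched as follows (Python purely for
illustration; `canonicalize` here is a crude stand-in I made up, not any
algorithm from the draft, and SHA-1 is assumed as the hash function):

```python
import hashlib
import re

def canonicalize(xml_text: str) -> bytes:
    # Stand-in canonicalization, for illustration only: strip whitespace
    # between tags and encode as UTF-8.  The real algorithm would be
    # whatever the signature draft ends up specifying.
    return re.sub(r">\s+<", "><", xml_text.strip()).encode("utf-8")

def key_hash(public_key_xml: str) -> bytes:
    # Hash of a public-key, computed over the canonicalized XML encoding
    # of the key rather than over any bytes actually transferred.
    return hashlib.sha1(canonicalize(public_key_xml)).digest()
```

With a canonicalization of this kind, issuer and receiver obtain the same
hash from any equivalent XML encoding of the same key.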

With the current options for canonicalization in the XML-signature draft,
this canonicalization of the public-key element is not possible without
leaving room for alternative presentations. The full-fledged XML
canonicalization cannot be used because the data to be canonicalized is an
element, not a whole document. The minimal canonicalization, on the other
hand, is not powerful enough for this case, because the XML encoded
element to be canonicalized is never transferred to recipients: since
minimal canonicalization does not remove line breaks or other white space,
it fails to eliminate the possibility of alternative presentations.

This problem is best understood with the following example:

To calculate a hash of his/her public-key, the issuer of a certificate
must form an XML encoded version of the key. There are a number of
options for encoding a public-key element; here are just two (the
base64 encoded contents are truncated in this example):

XML ENCODING OPTION 1:
<public-key>
  <dsa-pubkey>
    <dsa-p>AP1/U4EddRIpUt9KnC7s5Of2EbdSPO9EAMMeP4C2USZpRV...</dsa-p>
    <dsa-q>AJdgUI8VIwvMspK5gqLrhAvwWBz1</dsa-q>
    <dsa-g>APfhoIXWmz3ey7yrXDa4V7l5lK+7+jrqgvlXTAs9B4JnUV...</dsa-g>
    <dsa-y>e45XbCIKlnly8lWIBJi3uX46+fzbYjt6jiApSqoFvvZVtT...</dsa-y>
  </dsa-pubkey>
</public-key>

XML ENCODING OPTION 2:
<public-key><dsa-pubkey><dsa-p>AP1/U4...</dsa-p><dsa-q>AJdgUI...</dsa-q>
<dsa-g>APfhoI...</dsa-g><dsa-y>e45XbC...</dsa-y></dsa-pubkey></public-key>

For these two presentations of the same public-key, minimal
canonicalization leaves many differences. The recipient of a certificate
cannot know which encoding the certificate issuer used prior to
canonicalization and hash calculation, and thus cannot reliably match the
hash of a public-key even when the correct public-key is used.
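To make the failure concrete, here is a rough model of the problem
(Python for illustration; `minimal_c14n` below is my own simplification,
capturing only the point that inter-element whitespace is left untouched):

```python
import hashlib

def minimal_c14n(text: str) -> bytes:
    # Rough model of minimal canonicalization: normalize line endings
    # and encode as UTF-8; whitespace between elements is left as-is.
    return text.replace("\r\n", "\n").replace("\r", "\n").encode("utf-8")

# Two encodings of the same (truncated) public-key, as in the example:
option1 = """<public-key>
  <dsa-pubkey>
    <dsa-p>AP1/U4...</dsa-p>
  </dsa-pubkey>
</public-key>"""

option2 = ("<public-key><dsa-pubkey><dsa-p>AP1/U4...</dsa-p>"
           "</dsa-pubkey></public-key>")

h1 = hashlib.sha1(minimal_c14n(option1)).hexdigest()
h2 = hashlib.sha1(minimal_c14n(option2)).hexdigest()
# h1 and h2 differ, so the recipient cannot reproduce the issuer's hash
# without knowing the exact byte layout the issuer happened to choose.
```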

There might be applications where a signature is presented for XML data
that is not available directly, but must be created from some raw data
prior to checking the signature, and where XML c14n cannot be used (or is
not desired). This is yet another reason, in addition to those mentioned
by other WG members, why minimal canonicalization should be taken out of
the draft. But I think that there should be a lightweight alternative to
XML c14n, because c14n is limited to complete documents, needs a DTD or a
schema, and is unnecessarily complicated for many applications.

To conclude: I think it would be beneficial to replace the minimal
canonicalization with a lightweight canonicalization that has the
following properties:
 -Can be applied to elements as well as whole documents
 -Does not require a DTD or schema for processing
 -Removes the most common sources of alternative presentations in XML
  documents
 -Can be performed on a DOM tree and on SAX events

The sources of alternative presentations that should be removed are at
least the following:
 -Character encoding normalization (to UTF-8, I guess)
 -White space (spaces, tabs and line breaks)
 -Possibly attribute order (for example, convert to alphabetical order)
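A sketch of what such a lightweight canonicalization might look like
(Python's xml.etree used here as a stand-in parser; `light_c14n` is my
own hypothetical name, not anything from the draft):

```python
import xml.etree.ElementTree as ET

def light_c14n(xml_text: str) -> bytes:
    # Lightweight canonical form of a single element: UTF-8 output,
    # whitespace between elements dropped, attributes alphabetized.
    # Needs no DTD or schema; a DOM- or SAX-based version would apply
    # the same rules while walking the tree or the event stream.
    return _serialize(ET.fromstring(xml_text)).encode("utf-8")

def _serialize(el: ET.Element) -> str:
    # Emit attributes in alphabetical order.
    attrs = "".join(f' {k}="{el.attrib[k]}"' for k in sorted(el.attrib))
    # Keep element content but strip surrounding whitespace; whitespace
    # between child elements (text and tails) is discarded.
    text = (el.text or "").strip()
    children = "".join(_serialize(child) + (child.tail or "").strip()
                       for child in el)
    return f"<{el.tag}{attrs}>{text}{children}</{el.tag}>"
```

Under these rules, both encoding options from the example above serialize
to the same byte string, so issuer and receiver compute the same hash.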

Any comments on these canonicalization requirements are welcome. I have
not thought the requirements through thoroughly, so it is quite possible
that I have missed something.

Regards,
	  Juha


--
j u h a    p ä ä j ä r v i
     [R&D Engineer]                                      First Hop Ltd.
                                                         Tekniikantie 12
work   +358-9-2517 2332    juha.paajarvi@firsthop.com    FIN-02150 Espoo
mobile +358-40-560 2733    www.firsthop.com              Finland

Received on Sunday, 26 March 2000 09:09:11 UTC