8.0 Security Considerations

The XML Signature specification provides a very flexible digital signature mechanism. Implementors must give consideration to their application threat models and to the following factors.

8.1 Transforms

A requirement of this specification is to permit signatures to "apply to a part or totality of a XML document." (See section 3.1.3 of [XML-Signature-RD].) The Transforms mechanism meets this requirement by permitting one to sign data derived from processing the content of the identified resource. For instance, applications that wish to sign a form, but permit users to enter limited field data without invalidating a previous signature on the form, might use XPath [XPath] to exclude those portions the user needs to change. Transforms may be arbitrarily specified and may include encoding transforms, canonicalization instructions or even XSLT transformations. The following sections raise three cautions with respect to this feature.

Note that core validation behavior does not confirm that the signed data was obtained by applying each step of the indicated transforms. (It does, however, check that the digest of the resulting content matches that specified in the signature.) For example, some applications may be satisfied with verifying an XML signature over a cached copy of already transformed data. Other applications might require that content be freshly dereferenced and transformed.
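The digest check described above can be sketched as follows. This is a minimal illustration, not the normative procedure: the function name `digest_matches`, the sample data and the choice of SHA-1 with a base64-encoded digest value are assumptions made for the example.

```python
import base64
import hashlib


def digest_matches(transformed: bytes, digest_value_b64: str) -> bool:
    """Return True when the SHA-1 digest of the already-transformed octets
    equals the base64-encoded digest value recorded in the signature."""
    actual = hashlib.sha1(transformed).digest()
    expected = base64.b64decode(digest_value_b64)
    return actual == expected


# Hypothetical cached copy of data that has already been transformed.
cached = b"<form><name>Alice</name></form>"
recorded = base64.b64encode(hashlib.sha1(cached).digest()).decode("ascii")

print(digest_matches(cached, recorded))         # the cached copy verifies
print(digest_matches(cached + b" ", recorded))  # any changed octet breaks the match
```

Nothing in this check shows how the octets were produced. An application that needs assurance that the transforms were actually applied has to dereference and transform the content itself before digesting.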

8.1.1 Only What is Signed is Secure

First, obviously, signatures over a transformed document do not secure any information discarded by the Transforms: only what is signed is secure.
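A small sketch may make this concrete. It assumes a hypothetical transform, `digest_without`, that drops a named child element before digesting (a stand-in for an XPath exclusion, not an XPath implementation), with SHA-1 as the assumed digest.

```python
import hashlib
import xml.etree.ElementTree as ET


def digest_without(xml_text: str, excluded_tag: str) -> str:
    """Remove every child of the root named excluded_tag, then digest the rest."""
    root = ET.fromstring(xml_text)
    for child in list(root):
        if child.tag == excluded_tag:
            root.remove(child)
    return hashlib.sha1(ET.tostring(root)).hexdigest()


original = "<form><amount>10</amount><comment>ok</comment></form>"
tampered = "<form><amount>10</amount><comment>pay Mallory</comment></form>"

# The excluded element is not covered by the digest, so changing it goes unnoticed.
print(digest_without(original, "comment") == digest_without(tampered, "comment"))

# Without the exclusion, the same change alters the digest.
print(digest_without(original, "none") == digest_without(tampered, "none"))
```

The first comparison succeeds even though the comment text differs, which is exactly the sense in which discarded information is not secured.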

Note that the use of Canonical XML [XML-C14N] ensures that all internal entities and XML namespaces are expanded within the content being signed. All entities are replaced with their definitions and the canonical form explicitly represents the namespace that an element would otherwise inherit. Applications that do not canonicalize XML content (especially the SignedInfo element) SHOULD NOT use internal entities and SHOULD represent the namespace explicitly within the content being signed, since they cannot rely upon canonicalization to do this for them.

8.1.2 Only What is "Seen" Should be Signed

Additionally, the signature secures any information introduced by the transform: only what is "seen" should be signed. If signing is intended to convey the judgment or consent of an automated mechanism or person, then it is normally necessary to secure as exactly as practical the information that was presented to that mechanism or person. Note that this can be accomplished by literally signing what was presented, such as the screen images shown to a user. However, this may result in data which is difficult for subsequent software to manipulate. Instead, one can sign the data along with whatever filters, style sheets, client profile or other information affects its presentation.

8.1.3 Transforms Can Aid Birthday Attacks

In addition to the semantic concerns of transforms removing or including data from a source document prior to signing, there is potential for syntactical collision attacks. For instance, consider a signature which includes a transform that changes the character normalization of the source document to Normalized Form C [NFC]. This transform dramatically increases the number of documents that, when transformed and digested, yield the same hash value. An attacker could therefore make a substantive syntactical and semantic change to the document and then vary other, inconsequential syntactical values that are normalized prior to digesting, such that the tampered document is still considered valid. Consequently, while we RECOMMEND that all documents operated upon and generated by signature applications be in [NFC] (otherwise intermediate processors can unintentionally break the signature), encoding normalizations SHOULD NOT be done as part of a signature transform.
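The way normalization enlarges the set of documents sharing one digest can be illustrated with a short sketch. The helper name `digest`, the sample strings and the use of SHA-1 are assumptions chosen for the example.

```python
import hashlib
import unicodedata

composed = "caf\u00e9"     # e-acute as a single code point (U+00E9)
decomposed = "cafe\u0301"  # 'e' followed by a combining acute accent (U+0301)


def digest(text: str, normalize: bool) -> str:
    """Digest the UTF-8 octets of text, optionally normalizing to NFC first."""
    if normalize:
        text = unicodedata.normalize("NFC", text)
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


# Different octet sequences give different digests when digested as-is.
print(digest(composed, False) == digest(decomposed, False))

# With normalization inside the transform, both inputs share one digest.
print(digest(composed, True) == digest(decomposed, True))
```

Every position where a document admits such an alternative spelling gives an attacker another free variation that leaves the digest input unchanged, which is why normalization is better performed before signing than inside a signature transform.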

Other Duerst comments:

4.3.3 The `Reference` Element

...

The URI attribute identifies a data object using a URI-Reference, as specified by RFC2396 [URI]. (Non-ASCII characters in a URI should be represented in UTF-8 [UTF-8] as one or more bytes, and these bytes should then be escaped with the URI escaping mechanism [XML]; when they appear within a fragment identifier, they should be escaped as an XPointer fragment identifier [Section 4.1.1 URI Reference Encoding and Escaping, XPtr].)
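The UTF-8 and %HH escaping step can be sketched as below. The function name `escape_non_ascii` and the example URI are hypothetical, and the sketch covers only the generic byte escaping; it does not implement the additional XPointer escaping rules for fragment identifiers.

```python
from urllib.parse import quote


def escape_non_ascii(uri_reference: str) -> str:
    """Encode non-ASCII characters as UTF-8 and escape each byte as %HH,
    leaving URI delimiters and existing %HH escapes untouched."""
    return quote(uri_reference, safe=":/?#[]@!$&'()*+,;=%", encoding="utf-8")


# U+00E9 becomes the two UTF-8 bytes C3 A9, each written as an escape.
print(escape_non_ascii("http://example.org/r\u00e9sum\u00e9.xml"))
```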

Applications should be cognizant of the fact that protocol parameter and state information (such as HTTP cookies, HTML device profiles or content negotiation) may affect the content yielded by dereferencing a URI.

If a resource is identified by more than one URI, the most specific should be used (e.g. http://www.w3.org/2000/06/interop-pressrelease.html.en instead of http://www.w3.org/2000/06/interop-pressrelease).

4.3.3.1 The `Transforms` Element

.. Some Transforms may require an explicit MimeType, Charset (IANA registered "character set"), or other such information concerning the data they are receiving from an earlier Transform or the source data, although no Transform algorithm specified in this document needs such explicit information. Such data characteristics are provided as parameters to the Transform algorithm and should be described in the specification for the algorithm.

4.5 The `Object` Element

...

The Object's Encoding attribute may be used to provide a URI that identifies the method by which the object is encoded.

6.5 Canonicalization Algorithms

Canonicalization algorithms take two implicit parameters when they appear as a CanonicalizationMethod within the SignedInfo element: the content and its charset. (Note, there may be ambiguities in converting existing charsets to Unicode; for an example see the XML Japanese Profile [XML-Japanese] NOTE.) The charset is derived according to the rules of the transport protocols and media formats (e.g., RFC2376 [XML-MT] defines the media types for XML). This information is necessary to correctly sign and verify documents and often requires careful server side configuration. Various canonicalization algorithms require conversion to [UTF-8]. Where any such algorithm is REQUIRED or RECOMMENDED, the algorithm MUST understand at least [UTF-8] and [UTF-16] as input encodings. Knowledge of other encodings is OPTIONAL.
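The role of the charset parameter can be shown with a brief sketch. The helper `to_utf8` and the sample document are assumptions for illustration; a real canonicalization algorithm does considerably more than transcode.

```python
def to_utf8(octets: bytes, charset: str) -> bytes:
    """Decode octets using the externally supplied charset, then re-encode as UTF-8."""
    return octets.decode(charset).encode("utf-8")


doc = "<a>\u00e9</a>"

# The same document arriving as UTF-8 or UTF-16 yields identical UTF-8 octets.
from_utf8 = to_utf8(doc.encode("utf-8"), "utf-8")
from_utf16 = to_utf8(doc.encode("utf-16"), "utf-16")
print(from_utf8 == from_utf16)

# A wrongly configured charset silently produces different octets to digest.
mislabeled = to_utf8(doc.encode("utf-8"), "iso-8859-1")
print(mislabeled == from_utf8)
```

The second comparison fails without raising any error, which is why a misconfigured server can cause signature validation to fail even though the document itself is unchanged.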

[UTF-16]: RFC 2781. UTF-16, an encoding of ISO 10646. P. Hoffman, F. Yergeau. February 2000.
[XML-MT]: RFC 2376. XML Media Types. E. Whitehead, M. Murata. July 1998.
[XML-Japanese]: XML Japanese Profile. W3C NOTE. M. Murata. April 2000. http://www.w3.org/TR/2000/NOTE-japanese-xml-20000414