Signed XML: Experiences from the Creation of XFDL

A Position Paper for The W3C Signed XML Workshop


By John Boyer (jboyer@uwi.com)
UWI.Com -- The Internet Forms Company
400-1095 McKenzie Ave.
Victoria, BC V8P 2L5
Canada


0. Introduction

The Extensible Forms Description Language [3] is an XML language for the digital representation of complex forms. The primary goal of XFDL is to be an enabling technology for electronic commerce. It is reasonable to assert that XFDL satisfactorily fulfills this goal based on the number and value of deployed XFDL applications. Central to achieving this goal is the ability to digitally sign XFDL (and hence XML). It is especially true in e-commerce that the "usefulness of structured information is dependent on how trustworthy it is" [9]. Therefore, the official position of UWI.Com is that signed XML is as important to electronic commerce as XML itself. Without a standard for signed XML, e-commerce applications will not be interoperable, which is an implicit requirement of XML design goals 1, 2 and 4 [4].

UWI.Com intends to contribute to the creation of a signed XML standard and to support the standard in its implementations. To that end, this paper provides a generalization to signed XML of the digital signature features of XFDL. This discussion is by no means exhaustive as there are many good ideas appearing in other works, but the value of the features used by XFDL is derived from the successes of and refinements inspired by numerous application deployments.


1. Binding Additional Information to a Signature

The textbook formulation [5] of the digital signature algorithm states that a signature S for a message M is formed by the function Encrypt(hash(M), PrivateKey), and that verification of signature S is performed by testing the equality of hash(M) and Decrypt(S, PublicKey). The hash() function must be a mathematically sound method of measuring change in M.
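The formulation above can be illustrated with a minimal Python sketch. It assumes textbook RSA with no padding, chosen here purely for illustration; the primes, the exponent, and the function names message_hash, sign and verify are hypothetical and far too weak for real use. The hash is reduced modulo N only so that it fits the toy key.

```python
import hashlib

# Toy RSA key built from two Mersenne primes (illustration only, NOT secure).
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q
E = 65537                          # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (needs Python 3.8+)


def message_hash(message: bytes) -> int:
    """hash(M): SHA-256 digest as an integer, reduced to fit the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """S = Encrypt(hash(M), PrivateKey)."""
    return pow(message_hash(message), D, N)


def verify(signature: int, message: bytes) -> bool:
    """Test the equality of hash(M) and Decrypt(S, PublicKey)."""
    return pow(signature, E, N) == message_hash(message)


if __name__ == "__main__":
    m = b"pay 100 dollars"
    s = sign(m)
    print(verify(s, m))                   # same message
    print(verify(s, b"pay 900 dollars"))  # altered message
```

Note that nothing in this sketch records why M was signed or how it was presented to the signer; it authenticates only the bytes of M.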

The well-known shortcoming in this formulation is that security is directly proportional to the reliability of the method for delivering the public key to the verification phase. The PKI industry provides solutions to this problem. There is another problem, equally important, at the opposite end of the formulation above: the signing step. Consider this question: why is the signer signing M? The signer affixes a signature only when it is necessary to give M to another party, the verifier. Since M has a source and a destination, it is essentially a transaction record. The nature of the transaction must be important to the parties or there would be no need for a signature. We expect the digital signature to offer transaction non-repudiation, but this is only satisfied by non-repudiation of M to the extent that M completely represents the transaction [1]. The digital signature authenticates both the message content and the message signer, but what if the message does not contain sufficient information to capture the nature of the transaction?

In the electronic forms context, the transaction signer is typically a person on a client machine whose digital signature implies authorization of the input as interpreted in the context of the screen layout. The input values are the answers, and the screen layout includes the questions as well as the fine print, colors, font information, images, and locations of all visible information.

XFDL binds the input (data layer) and the screen layout (presentation layer) by construction. XML application designers could choose to delay binding until signing time. For example, the user data and stylesheet could be concatenated into a single message M. This is a bit more problematic than XFDL solutions since applications must guarantee that the stylesheet bound to a valid signature is actually the one being used to render the document.

Nonetheless, in applications where a convincing argument can be made for late binding, an XML element representing a signature would need to allow for subelements containing additional information beyond what can be immediately regenerated from the root XML element. Furthermore, it seems prudent to allow an arbitrary number of these additional elements so that each can represent a distinct Internet resource. Finally, an encoding attribute should be added to this subelement to control whether the content is raw or base 64 encoded [6], the latter of which would allow binding of non-readable/binary data such as images representing company logos, method of credit card payment, and so forth.
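One possible shape for such a signature element is sketched below in Python. The element name signature, the subelement name bound, and the attributes href and encoding are hypothetical names chosen for this illustration; they are not taken from XFDL or from any proposed standard.

```python
import base64
import xml.etree.ElementTree as ET


def build_signature_element(resources):
    """Build a <signature> element with one <bound> child per extra resource.

    `resources` is a list of (href, content) pairs. Text (str) content is
    stored raw; binary (bytes) content is stored base 64 encoded.
    """
    signature = ET.Element("signature")
    for href, content in resources:
        bound = ET.SubElement(signature, "bound", {"href": href})
        if isinstance(content, bytes):
            bound.set("encoding", "base64")
            bound.text = base64.b64encode(content).decode("ascii")
        else:
            bound.set("encoding", "raw")
            bound.text = content
    return signature


def bound_content(bound):
    """Recover the original content of a <bound> element."""
    if bound.get("encoding") == "base64":
        return base64.b64decode(bound.text)
    return bound.text


if __name__ == "__main__":
    sig = build_signature_element([
        ("style.xsl", "body { color: black }"),  # readable resource
        ("logo.gif", b"GIF89a\x00\x01"),          # binary resource
    ])
    print(ET.tostring(sig, encoding="unicode"))
```

Each bound subelement stands for one distinct resource, so a verifier can compare the bound copy against the resource actually used to render the document.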


2. Signature Filters

Although it is by no means required for transaction non-repudiation, XFDL forms often carry server-side business logic elements such as database or workflow application instructions. Simple tasks can be encoded in XML processing instructions, but XFDL processing instructions derive substantial advantages from the XFDL computation engine. This allows XFDL forms to be intelligent agents with behaviors that vary based on the information collected by the form. This considerably simplifies many object-oriented system designs, but it places an additional onus on signed XML. The reason is that digitally signed elements cannot change (without breaking the signature), so computations are 'locked' once the element is signed by one or more signatures. Therefore, if server-side processing elements are to function computationally once the form arrives on the server, there must exist a signature filter mechanism that allows certain elements to be excluded from a digital signature.

Another simple example is the need to offer multiple overlapping signatures on the same form, which is necessary to reproduce typical paper form operations such as the 'for office use only' section. These examples establish the need for signature filters. The XFDL signature filter has several features whose generalization should be carried forward to a signed XML protocol.

XFDL uses element depth in the following way: Level 1 is for form global options and pages, Level 2 is for page global options and items such as fields, checks, popups, data, and so on, Level 3 is for options that configure an item, and Level 4 and beyond are for subelements in an option array (for example, the fontinfo option requires a typeface, point size and indicators for bold, italic, and underline). XFDL identifies elements uniquely using a relative scope mechanism based on an attribute named sid (scope identifier) above Level 3, or on tag name at Level 3 and below. At Level 2, the item tag is referred to as the type of the element.

An XFDL signature filter decides whether to keep or omit an XML element based on references to the unique identifier or to the type of element. Generalizing to XML, a signature filter should be able to refer to an element based on tag name or attribute value. In addition to XFDL's relative reference method, the XPointer model [8] (and hence XLink [7]) provides sufficient referencing capabilities for this task.

One of the aspects of signature filters that will require a great deal of consideration is deciding how to resolve conflicts. For example, a signature filter may indicate that an element should be included in the signature hash based on its tag, but a different part of the filter may indicate that the element should be omitted based on an attribute value. XFDL resolves these conflicts by giving precedence to explicit keeping or omitting over implicit keeping or omitting, and by giving precedence to direct reference over reference by type. However, these rules are based on the particulars of XFDL and will be difficult to generalize. For example, attributes can be used for purposes other than unique relative scope identifiers, and tag names can have different meanings based on element depth.

Furthermore, it can be easier to write some signature filters if one can specify both keep and omit clauses. For example, suppose an XFDL form has 100 fields and 100 checkboxes. Further, suppose we want a signature filter that keeps 98 of the fields and two of the checkboxes. One way to do this is to 'keep' all item elements having the field tag, which implicitly omits all other items, including the 100 checkboxes. The item reference filter then takes over to make the necessary minor exceptions. By reference one could omit the two undesired fields as well as keep the two checkboxes. Although this is useful in this case, XFDL does not support both keep and omit because it is difficult to specify precedence rules when a part of the filter is self-contradictory.
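The 100-field example can be sketched in Python as follows. This illustrates the generalized filter discussed here, with both keep and omit references, which XFDL itself does not support. The function build_filter, its parameters, and the sid values are hypothetical. The sketch assumes that a direct reference takes precedence over a reference by type, and that an element named in both the keep and omit reference lists is rejected as self-contradictory.

```python
def build_filter(keep_types=(), keep_refs=(), omit_refs=()):
    """Return a predicate deciding whether an item is kept in the hash text."""
    keep_types = set(keep_types)
    keep_refs = set(keep_refs)
    omit_refs = set(omit_refs)

    # Assumption: a filter that both keeps and omits the same element by
    # direct reference is self-contradictory, so it is rejected outright.
    conflict = keep_refs & omit_refs
    if conflict:
        raise ValueError(f"contradictory references: {sorted(conflict)}")

    def is_kept(item_type, sid):
        # Direct references take precedence over references by type.
        if sid in omit_refs:
            return False
        if sid in keep_refs:
            return True
        # Keeping a type implicitly omits every other type.
        return item_type in keep_types

    return is_kept


if __name__ == "__main__":
    items = [("field", f"field{i}") for i in range(100)]
    items += [("check", f"check{i}") for i in range(100)]

    is_kept = build_filter(
        keep_types=["field"],
        omit_refs=["field3", "field7"],   # the two undesired fields
        keep_refs=["check0", "check1"],   # the two desired checkboxes
    )
    kept = [sid for item_type, sid in items if is_kept(item_type, sid)]
    print(len(kept))
```

The filter names only four elements explicitly, rather than listing all 100 kept elements by reference.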

Finally, the ability to specify elements to keep or omit by wildcard pattern should also be considered (XFDL does this indirectly with group and datagroup filtering). When an XML document is expected to dynamically add new elements (such as duplicating a row in a purchase order), it can be easier to write a signature filter if it has the power to 'keep' elements based on matching the common part of the identifiers of all added elements.
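A minimal sketch of wildcard keeping follows, using shell-style patterns from Python's standard fnmatch module. The function name, the pattern syntax, and the identifiers such as row_1_qty are assumptions made for illustration; XFDL's group and datagroup filtering works differently.

```python
from fnmatch import fnmatchcase


def kept_by_pattern(sids, keep_patterns):
    """Return the identifiers that match at least one 'keep' pattern."""
    return [
        sid for sid in sids
        if any(fnmatchcase(sid, pattern) for pattern in keep_patterns)
    ]


if __name__ == "__main__":
    # Rows of a purchase order; row_3_* was added dynamically by the user.
    sids = ["row_1_qty", "row_1_price", "row_2_qty", "row_2_price",
            "row_3_qty", "row_3_price", "office_use_notes"]
    print(kept_by_pattern(sids, ["row_*"]))
```

Because the pattern matches the common part of the identifiers, rows added after the filter was written are still covered by the signature.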


3. Marking Signed Elements

As mentioned above, XFDL locks the computations on digitally signed elements to prevent their values from changing. This is a particular example of a more general need. It is important to know, at a minimum, whether some signature signs an element. The XFDL computation system can use this flag to lock computations, but other important examples include the need to provide some GUI feedback to show whether a particular element is digitally signed and the need to test whether an element is signed before allowing an API to delete it, change its character data, add subelements, and so on.

It might be useful to have an attribute that carries this flag. XFDL does not currently do this, in part because it is actually easiest to keep a signature count that tells how many signatures have been applied to an element. When a signature is deleted, the counts of the elements it signs are reduced. Only when the element's signature count is zero is the element unsigned. However, this count can only be maintained by an API and not actually in the markup. The reason is that the second signature applied to an element will break the first by changing the attribute value (unless the rule is made that signature count attributes are omitted from the text to be hashed).
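The bookkeeping described above can be sketched in Python as an API-side registry kept outside the markup. The class and method names are hypothetical and do not reflect the actual XFDL API.

```python
from collections import Counter


class SignatureRegistry:
    """Tracks how many signatures currently cover each element."""

    def __init__(self):
        self._signed_by = {}      # signature id -> set of element ids
        self._counts = Counter()  # element id -> number of signatures

    def add_signature(self, signature_id, element_ids):
        """Record a signature and increment the count of each element it signs."""
        if signature_id in self._signed_by:
            raise ValueError(f"signature {signature_id!r} already recorded")
        elements = set(element_ids)
        self._signed_by[signature_id] = elements
        self._counts.update(elements)

    def delete_signature(self, signature_id):
        """Remove a signature and reduce the counts of the elements it signed."""
        for element_id in self._signed_by.pop(signature_id):
            self._counts[element_id] -= 1

    def is_signed(self, element_id):
        """An element is unsigned only when its signature count is zero."""
        return self._counts[element_id] > 0


if __name__ == "__main__":
    registry = SignatureRegistry()
    registry.add_signature("applicant", ["name", "amount"])
    registry.add_signature("officer", ["amount", "approval"])
    registry.delete_signature("applicant")
    print(registry.is_signed("name"), registry.is_signed("amount"))
```

An editing API could consult is_signed before deleting an element, changing its character data, or adding subelements.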

There is also the question of whether or not an invalid signature should increment the signature counts of elements that appear in its hash text. Currently, XFDL only increments the signature counts for valid signatures, but there are good arguments for either approach. For example, counting invalid signatures may be one way to handle the difficult problem of signature longevity (of course, a better way would be to improve current cryptographic standards to acknowledge the difference between expiring the signing versus the verifying capabilities of a certificate).


4. Signature Mechanics

In XFDL, a digital signature is represented by a signature item, which makes a copy of the specific parameters of the signature. Message M includes this signature item. The resulting digital signature is a binary data block, which is base 64 encoded and put in the signature item in a subelement called mimedata. This would seem to break the signature, but XFDL specifies that the mimedata should be omitted when regenerating the hash text during verification. The reason for precreating the signature item and adding it to the hash text is to guarantee that the signature parameters are signed. For example, a signature filter may omit the signature's actuating item (a signature button). As such, signature parameters such as the cryptographic service provider could be changed without breaking the signature, which is an obvious security hole.
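The following Python sketch illustrates why omitting mimedata keeps the hash text stable while the signature parameters stay covered. The element layout (form, field, provider) is hypothetical, not actual XFDL markup. A SHA-256 digest stands in for the real binary signature block, and ElementTree serialization stands in for whatever canonical hash text a real implementation would define.

```python
import base64
import copy
import hashlib
import xml.etree.ElementTree as ET


def hash_text(root, signature_sid):
    """Serialize the form, omitting the mimedata of the named signature item."""
    clone = copy.deepcopy(root)
    for sig in list(clone.iter("signature")):
        if sig.get("sid") == signature_sid:
            for mimedata in sig.findall("mimedata"):
                sig.remove(mimedata)
    return ET.tostring(clone, encoding="utf-8")


def build_form():
    """Build a tiny form with a precreated signature item (no mimedata yet)."""
    form = ET.Element("form")
    ET.SubElement(form, "field", {"sid": "amount"}).text = "100"
    sig = ET.SubElement(form, "signature", {"sid": "sig1"})
    ET.SubElement(sig, "provider").text = "ExampleCSP"  # a signature parameter
    return form, sig


def attach_signature(root, sig):
    """Compute the hash text, then store the (stand-in) block as mimedata."""
    text = hash_text(root, sig.get("sid"))
    block = hashlib.sha256(text).digest()  # stand-in for Encrypt(hash(M), key)
    ET.SubElement(sig, "mimedata").text = base64.b64encode(block).decode("ascii")
    return text


if __name__ == "__main__":
    form, sig = build_form()
    signed_text = attach_signature(form, sig)
    print(hash_text(form, "sig1") == signed_text)  # mimedata does not disturb it
    sig.find("provider").text = "OtherCSP"
    print(hash_text(form, "sig1") == signed_text)  # a changed parameter does
```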

A signature should include the signer's certificate as part of the binary signature block. Many systems include the whole certificate chain. If the signature contained only the encrypted hash, then building a PKI would be more difficult since each verifier would need a trustworthy method of obtaining every signer's certificate. However, it is essential that the verification include a test of the certifying authority's signature on the signer's certificate in order to prevent substitution attacks [2]. Including the whole certificate chain might be considered, but it should be optional since it will increase the signature size with no added value in many deployments.

Finally, signed XML should use detached signatures, which contain the encrypted hash (and the signer certificate), but not a copy of the message signed. XML is regenerative, and it is a container language, so the duplicated data is unwarranted. This may seem like a minor point, but there are deployed XFDL applications requiring over 50 signatures on a single document.


5. References

[1] B. Blair & J. Boyer. XFDL: Creating Electronic Transactions Records Using XML. In the Refereed Proceedings of Eighth Annual World Wide Web Conference, Toronto, May 1999.
[2] J. Boyer. Digital Signatures with the Microsoft CryptoAPI. Dr. Dobb's Journal. Vol. 286, June, 1998.
[3] J. Boyer, T. Bray, & M. Gordon (Editors). Extensible Forms Description Language (XFDL) 4.0. W3C Note, available at: http://www.w3.org/TR/NOTE-XFDL.
[4] T. Bray, J. Paoli, & C.M. Sperberg-McQueen (Editors). Extensible Markup Language (XML) 1.0. W3C Recommendation, February 1998. Available at http://www.w3.org/TR/REC-xml.
[5] T. Cormen, C. Leiserson, & R. Rivest. Introduction to Algorithms. The MIT Press. Cambridge, Mass., 1990.
[6] N. Freed & N. Borenstein. Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies. Innosoft, First Virtual, Nov. 1996. Available at: http://src.doc.ic.ac.uk/computing/internet/rfc/rfc2045.txt.
[7] E. Maler & S. DeRose. XML Linking Language (XLink). Available at: http://www.w3.org/TR/WD-xlink.
[8] E. Maler & S. DeRose. XML Pointer Language (XPointer). Available at: http://www.w3.org/TR/WD-xptr.
[9] J. Reagle. XML-DSig '99 Workshop Scope Statement. Available at: http://www.w3.org/1999/02/ds-xml-cfp-19990218.html.