W3C NOTE-SDML-19980619

SDML - Signed Document Markup Language

W3C Note 19-June-1998

This version:
Latest version:
Jeff Kravitz

Copyright © 1996, 1997, 1998 Financial Services Technology Consortium. All rights reserved.

Status of this document

This document is a submission to the World Wide Web Consortium.  It is the initial draft of the specification of SDML.  It is intended for review and comment by W3C members and is subject to change. There are W3C Staff comments on this submission.

This document is a NOTE made available by the W3 Consortium for discussion only. This indicates no endorsement of its content, nor that the Consortium has, is, or will be allocating any resources to the issues addressed by the NOTE.


1 Introduction

A child of five would understand this. Send someone to fetch a child of five.

- Groucho Marx

Anyone who considers arithmetical methods of producing random digits is, of course, in a state of sin.

- John von Neumann

Make everything as simple as possible, but not simpler.

- Albert Einstein

Research is what I'm doing when I don't know what I'm doing.

- Wernher Von Braun

PROGRAM: n. A magic spell cast over a computer allowing it to turn one's input into error messages. tr.v. To engage in a pastime similar to banging one's head against a wall, but with fewer opportunities for reward.

Then anyone who leaves behind him a written manual, and likewise anyone who receives it, in the belief that such writing will be clear and certain, must be exceedingly simple-minded.

- Plato, Phaedrus 275d

Read over your compositions, and where ever you meet with a passage which you think is particularly fine, strike it out.

- Samuel Johnson, quoting a college tutor, 1773

The knowledge of Cyphering, hath drawn on with it a knowledge relative unto it, which is the knowledge of Discyphering, or of Discreting Cyphers ... Certainly it is an Art which requires great pains and a good wit, and is (as the other wits) consecrate to the Counsels of Princes.

- Sir Francis Bacon, 1623

1.1 Background

The Signed Document Markup Language (SDML) was developed by the Financial Services Technology Consortium (FSTC) as part of the Electronic Check Project. SDML is designed to:

The signatures become part of the SDML document and can be verified by subsequent recipients as the document travels through the business process. SDML does not define encryption, since encryption is between each sender and receiver in the business process and can differ for each link depending on the transport used.

SDML is the generic document structuring and signing part of the Financial Services Markup Language (FSML). FSML defines the specific document parts needed for electronic checks, the tags which identify check-specific data items, the semantics of the data items, and processing requirements for electronic checks. FSTC will be releasing the FSML specification implemented in the U.S. Treasury pilot along with a proposal for version 2.0 of FSML in the near future.

When development of FSML began in 1995, HTML was in its early stages of widespread deployment. SGML had been standardized some years earlier, software tools were readily available, and the use of tagged, readable text was attractive for its simplicity, ease of understanding, operational support, and ease of development and use. FSML and SDML were designed so that they could be defined using an SGML Document Type Definition (see Appendix A). FSML also defined document formatting rules so that readable-text electronic checks could be sent via electronic mail systems without the risk that the mail systems would modify the electronic check in ways that invalidate the signatures.

SDML is being published now to inform industry associations and standards bodies about the FSTC's experience, to show how cryptographic signatures can be embedded in structured, tagged-text documents, and to show how the business requirements of a typical application can be met. Given the standardization of XML and the widespread use of MIME attachments for sending documents with 8-bit transparency, FSTC wants to engage in discussions around how SDML and XML inter-relate and how these two approaches, which had different initial objectives, can be brought closer together and made compatible and as consistent as possible.

FSML has been implemented by the Electronic Check Project. A pilot implementation is in operation using payer and payee software to send, receive, and deposit electronic checks over the Internet. Cryptographic hardware, in the form of smart cards, has been developed to contain the private signing keys, perform the hashing and signing operations, and perform other "electronic checkbook" functions, such as automatically numbering and logging checks written or deposited. Advices of payment are attached cryptographically to the checks when sent between payer and payee, and they are removed by the payee. Similarly, checks are attached cryptographically to deposit slips when they are sent to the payee's bank. Bank server systems have been developed to process the electronic checks, to interface with existing check processing systems in the banks, and to clear and settle electronic checks between banks. A Certificate Authority hierarchy has been established, and certificates have been issued to banks and checking account holders.

The Electronic Check project, from its inception, sought to develop a general solution to the issues of authentication and integrity associated with creating electronic financial instruments. The technical and business problems of implementing electronic check payments between payers and payees over the Internet provided a practical context for developing the solution.

Paper checks have a rich tradition, and support numerous options, check types, attached information, and sophisticated processing. However, paper checks are fundamentally a "signed writing directing a bank to pay money, after a date, from the payer's account." The Electronic Check project determined that the essence of the problem to be solved was to develop a generalized structure for creating, processing, and displaying electronic "signed writings," where cryptographic signatures would substitute for manual signatures and where an electronic message would take the place of the paper medium. The structure would need to support the same business operations as signed paper checks, such as signing, co-signing, and witnessing of signatures, and attaching and removing associated documents such as remittance slips, invoices, advice of payment, and deposit slips.

Since checks are a form of negotiable instruments, and negotiable instruments are a form of contracts, it is believed that SDML may be used to create signed documents suitable for a wide variety of purposes. For example, they may be used as messages to initiate electronic funds transfer, as orders and invoices needed for electronic commerce, or for other forms of signed contracts or agreements.

SDML documents, which are hashed and cryptographically signed using public key signature algorithms, can have the following security attributes:

The SDML signature mechanism allows documents to be combined, or added to, without loss of these attributes with respect to prior signatures or the pre-existing parts of the documents.

The Financial Services Technology Consortium (FSTC) is a not-for-profit organization whose goal is to enhance the competitiveness of the United States financial services industry. Members of the consortium include banks, financial services providers, research laboratories, universities, technology companies, and government agencies. 

1.2 Business Objectives

Some of the business objectives that were instrumental in the design of SDML were...

1.3 Technical Decisions

The above business objectives led to a number of technical decisions, based on much thought and discussion amongst the team members. Some of these technical decisions were:

1.4 Structure

Below is an example of a signed electronic document:

<sdml-doc docname="doc87" type="sample">


<adata encoding="text">
This is a sample attachment

<hash alg="sha">278B7F348EECE3822A48C4D197FD5B920001C2E8
<hash alg="sha">BC59D2FE5566F506910C5020B628E4136E1C6B39


<certissuer>/C=US/O=FSTC/OU=FSTC CA/


The above sample consists of a simple attachment (containing merely the character string "This is a sample attachment"), which is then signed. It also contains an <action> block which indicates the action to be performed when the document is received, a <signature> block, which contains the digital signature of the relevant blocks in the document, and two <cert> blocks, which contain the certificates (and thus the public keys) of the document signer, and the issuer of the public key for that signer. These blocks will be described in detail below.

An SDML electronic document comprises a number of blocks, as defined in the SDML definitions below. Each block contains some common fields, or elements in SGML terminology, and also contains fields that are specific to the type of block. Blocks are not nested; however, they are contained in <sdml-doc> elements, which can be nested.

All blocks that must be protected from tampering and all blocks that must be authenticated are signed using a digital signature, which is contained in a signature block. The digital signature uses one of the standard digital signature algorithms, such as MD5/RSA or SHA/DSS, although the use of MD5 is deprecated. Each signature requires a public key, which also requires a certificate. Certificates are currently distributed as X.509 certificates.

Blocks may also be "bound" together by the signature block, which contains the block names of the blocks being bound, the digital hashes of these blocks, and a digital signature on these hashes along with the other contents of the signature block. This binding allows the receiving software to verify that all blocks that were bound are present and have not been tampered with.

The concept of the SDML electronic document is that it is a flexible structure. Separating signatures, certificates, actual data, etc., into separate blocks allows a rich, complex document to be built from these "primitives," while retaining a standard format which can be parsed and verified according to a standard syntax definition. 

2 Notation

In the pseudo-SGML definitions below, examples of the SGML format for SDML electronic documents are shown in place of formal meta-linguistic notation. A more precise definition, using more formal notations (Extended BNF and an SGML DTD), appears later in the document.

In these definitions, the following simple notations are used to indicate the type of value being used for a particular field:

ccccccccc  This is used to represent a character string. The number of c's in the definition does not indicate the allowed size of the string. Character strings may contain any legal SGML character, except the tag delimiters (< or >) and the other SGML formatting characters. These characters may be inserted into the string using the standard SGML escape sequences. Country- or language-specific characters may also be used, again using the standard SGML escape sequences. Quote symbols have no special significance and, if contained in a value string, are considered part of the value; e.g., "John Smith" is not the same value as John Smith. All self-defining constants, such as ISO country and currency names, SGML tag parameters that are choices from among a list, or items specified as choices in the definitions below, must be specified in lower case. Mixed case may be used for variable strings, such as names, addresses, document names, block names, etc. In all strings, case is significant, i.e., the string John Smith is not equal to the string john smith.
nnnnn  An ASCII character string used to denote an integer number, containing only the digits 0 - 9. The number of n's does not indicate the size of the number string. This is described elsewhere. 
nnnnn.nn  An ASCII character string used to denote a decimal (real) number, containing only the digits 0 - 9 and a single, optional decimal point. 
hhhhhh  An ASCII character string used to denote the hexadecimal encoding of a binary string of octets. It may only contain the ASCII characters 0-9, A-F, and a-f. The number of h's does not indicate the size of the string. All legal hexadecimal strings must consist of an even number of hex digits. In certain cases, described when used below, the field is split into two portions using a colon ":", e.g. 123456789:abcdef
other  A string depicted in boldface as a value represents itself. All such self-defining-constants must be in lowercase in the SDML document. 
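The numeric and hexadecimal notations above can be checked mechanically. The following is a minimal sketch in Python, not part of SDML itself. The function names are hypothetical, and two points are assumptions: that a decimal number needs at least one digit on each side of its decimal point, and that only the unsplit form of a hexadecimal string is checked (the colon-separated form is left out).

```python
import re

# Patterns for the value notations; names are illustrative only.
_INTEGER = re.compile(r"[0-9]+")                 # nnnnn
_DECIMAL = re.compile(r"[0-9]+(?:\.[0-9]+)?")    # nnnnn.nn (assumed digit placement)
_HEX = re.compile(r"(?:[0-9A-Fa-f]{2})+")        # hhhhhh, even number of digits


def is_integer(value: str) -> bool:
    """True if value contains only the digits 0-9."""
    return _INTEGER.fullmatch(value) is not None


def is_decimal(value: str) -> bool:
    """True if value is digits with a single, optional decimal point."""
    return _DECIMAL.fullmatch(value) is not None


def is_hex(value: str) -> bool:
    """True if value is an even-length string of hexadecimal digits."""
    return _HEX.fullmatch(value) is not None
```

A receiver could run such checks on each field before attempting any signature verification, so that malformed values are rejected early.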

3 Document Formatting Rules

In order for the SDML electronic document to be easily transmitted by a variety of methods (e-mail, file transfer, storage media, etc.) it was designed to be a plain ASCII document. However, certain formatting rules must be adhered to in order to ensure that most of the usual transport mechanisms, in particular e-mail systems, will successfully transport the SDML electronic document unchanged.

Note that these rules supersede the white space and line end rules for SGML, in order to ensure that signed documents can be successfully verified. Therefore, the following document formatting rules are considered mandatory for document generators and receivers:

3.1 Character Encoding

3.2 Line Formatting

3.3 Space Handling

3.4 Tags

In order to allow SDML processing using limited resources (for example, in a smart card used for signing), SDML requires that certain SGML features for tag handling not be allowed.

4 SDML Document Definition

4.1 Electronic Document Definition

The definition begins with an SDML electronic document.

Every SDML electronic document consists of one or more enclosed documents. These documents are nested, with the nesting done by enclosing earlier forms of a document inside later additions to the document. Each enclosed document is built inside a <sdml-doc> tag structure. Inside a document are one or more blocks. Blocks may appear in any order, except that the <action> block (defined below) must be the first block in the document.

  <sdml-doc docname="cccccccc" type="cccccccc">
     a sequence of one or more blocks and/or nested <sdml-doc> documents
Figure 4.1 Document element definition

The docname= attribute parameter is a document name, assigned by the software creating the document. This name will be used when combining documents. (See Combining Documents, below). If multiple SDML documents are being created at one time, as part of one file or transmission, the creating software should ensure that the document names are unique, within the file or transmission. This name should contain a maximum of 64 characters. Note: Attribute parameters must be enclosed in quotes.

The type= attribute parameter is a document type, used to specify the type of document. This type is used by the receiving software to ensure that it has received the correct type of document, i.e., one that it knows how to process. The document types are chosen from a list of pre-defined types, or may be types agreed upon by the sending and receiving parties, except that the latter agreed-upon types may not conflict with any pre-defined types. Note: Attribute parameters must be enclosed in quotes.

To prevent such conflict between pre-defined, standardized document types, and privately agreed-upon types, all privately agreed-upon document types should be prefixed with the characters "p-" (meaning private). For example, a document type used for auto loan applications, agreed to be used by a pair or small group of cooperating banks, could be written as type="p-autoloan". All pre-defined document types will be guaranteed not to start with the characters "p-".
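The rules for the docname= and type= attributes can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function names are hypothetical, the 64-character limit is treated as a hard check even though the text says "should", and the sets of pre-defined and privately agreed types are supplied by the caller because this document does not list them.

```python
MAX_DOCNAME_LENGTH = 64  # recommended maximum from the docname= description


def docname_problems(docnames):
    """Return a list of problems found in the document names of one transmission."""
    problems = []
    seen = set()
    for name in docnames:
        if len(name) > MAX_DOCNAME_LENGTH:
            problems.append(f"docname too long: {name!r}")
        if name in seen:
            problems.append(f"duplicate docname: {name!r}")
        seen.add(name)
    return problems


def type_is_acceptable(doc_type, predefined, private_agreed):
    """Check a type= value against the pre-defined and privately agreed types.

    Types starting with "p-" are private; pre-defined types never start
    with "p-", so the two sets are consulted separately.
    """
    if doc_type.startswith("p-"):
        return doc_type in private_agreed
    return doc_type in predefined
```

For example, type_is_acceptable("p-autoloan", {"sample"}, {"p-autoloan"}) accepts the private auto loan type from the paragraph above only because it appears in the agreed-upon set.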

4.2 Block Common Field Definitions

A block contains some common fields, along with other fields specific to the type of block. Except in a few cases and unless otherwise specified, the order of fields within a block is not predefined. Once created, however, fields may not be moved or rearranged inside a block, to permit the digital signatures and hashes to be valid.

Common Block Field Definitions

Each of the blocks contains some field definitions which are common to all block types, as follows:

Figure 4.2 Elements common to all blocks


blkname  (required) This is a character string which must contain a block name assigned at the time the document is created. The creating software must ensure that the block names are unique within a document. The names are used to refer to the block from other blocks. 

Generally, the name chosen for the block may be any unique character string. For certain blocks, a convention or rule applies when creating block names. The rules or conventions are described in the individual block descriptions. 

crit  (optional) A boolean (true/false) flag used to determine if a block is critical. If a block is critical, then the receiving software must be able to process the block. If the software cannot process a critical block, it must abort processing the entire document, or otherwise determine how to handle the document as an exceptional case. This flag is used to allow for expansion of the block types, to allow software to "ignore" block types that it doesn't recognize, providing that they are marked non-critical by the software that created them. Certain types of blocks, such as informational messages, etc. might always be considered non-critical. Other types, such as signatures, might always be considered critical. The criticality flag is assumed to have a default of true unless otherwise specified as false. Thus, it is not required to be specified in every block. 
vers  (optional) A number which indicates the version of the block. New versions may be introduced, and this number is used by receiving software to determine if it is capable of parsing/processing a block. If the version number is larger than the one understood by the receiving software, it must assume that it cannot process the block, and must use the criticality flag to determine if it can continue to process the document. If the version number is not specified, it is assumed to be 1.0. 
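The interaction between the crit and vers fields can be sketched as a small decision function. This is a hedged illustration in Python, not normative text. The function name, the result labels "process", "ignore", and "abort", and the `supported` mapping are hypothetical; the sketch also assumes that the flag is written as the lowercase strings true and false and that version numbers compare as decimal numbers.

```python
def block_disposition(block_type, crit, vers, supported):
    """Decide what to do with a block.

    block_type -- tag name of the block, e.g. "signature"
    crit       -- value of the crit field, or None if absent
    vers       -- value of the vers field, or None if absent
    supported  -- maps each block type the software understands to the
                  highest version it can process
    """
    # Criticality defaults to true unless explicitly given as false.
    is_critical = crit != "false"
    # Version defaults to 1.0 when not specified.
    version = 1.0 if vers is None else float(vers)

    max_version = supported.get(block_type)
    if max_version is not None and version <= max_version:
        return "process"
    # Unrecognized type or newer version: the criticality flag decides.
    # "abort" stands for aborting or treating the document as an exception.
    return "abort" if is_critical else "ignore"
```

With supported = {"signature": 1.0}, a non-critical block of an unknown type is ignored, while the same block without a crit field stops processing of the document.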

4.3 Block Definitions

Each SDML block starts and ends with one of the following sets of block tags:

Start Tag  End Tag 
<action>  </action> 
<signature>  </signature> 
<cert>  </cert> 
<attachment>  </attachment> 
<message>  </message> 
Figure 4.3 List of SDML Block Elements

The block types are defined as follows:

action  A block describing the action to be performed by the recipient 
signature  A block with the signatures and hashes of other blocks 
cert  A public key certificate 
attachment  An associated document attached to an SDML document 
message  An informational message, such as an error report 

4.3.1 Action Block Definition

This block contains information about the action to be performed by the recipient of the electronic document.

Figure 4.4 Action block element definition

Action Block Field Definitions

function  (required) The function field contains a character string chosen from a set of commands or verbs specific to the application or type of document being sent. Each application or type of document will have a unique set of allowable functions that are supported. 
reason  (required) The reason field indicates the reason that the document is being transmitted to the recipient. It must be one of the following character strings. 
process  This indicates that the document is an original being sent to the recipient for normal processing. 
resend  This indicates that the document is a possible duplicate being resent to the recipient. It should only be processed if it is not a duplicate at the receiver. 
test  This indicates that the document is being sent as a test, and should not be fully processed (e.g. it should not transfer funds). 
info  This indicates that the document is being sent for informational purposes only (e.g., as part of the text of an e-mail message) and is not to be processed. 
return  This indicates that the document is being sent back to the originator as a returned item. The document will usually contain a <message> block indicating the reason for the return. 
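The reason values above lend themselves to a simple dispatch in receiving software. The Python sketch below is one possible reading, not a prescribed implementation: the function name and the returned labels are hypothetical, and how a receiver detects a duplicate is left to the caller through the `already_seen` argument.

```python
REASONS = {"process", "resend", "test", "info", "return"}


def plan_for_reason(reason, already_seen):
    """Map the <action> reason field to a handling plan (labels are illustrative)."""
    if reason not in REASONS:
        raise ValueError(f"unknown reason: {reason!r}")
    if reason == "process":
        return "process"
    if reason == "resend":
        # Only process a resent document if it is not a duplicate here.
        return "skip-duplicate" if already_seen else "process"
    if reason == "test":
        return "validate-only"   # must not, for example, transfer funds
    if reason == "info":
        return "display-only"    # informational, not to be processed
    return "handle-return"       # returned item, usually with a <message> block
```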


4.3.2 Signature Block Definition

This block contains a digital signature for another block, or set of blocks. It is required whenever a block must be authenticated, or tamper-proofed. It also contains the reference to the certificate block containing the public key used to verify the signature. It is also used to "bind" multiple blocks together, so that the resulting compound document can be verified.

Unless otherwise specified, the data being signed consists of the entire contents of the subject block, which is defined to be everything between the start and end tags for the block. The signature must include the blockname, criticality, and version fields, if present, as well as the contents of the block.

The actual hashes of the signed blocks are included to allow verification of the binding even if the actual contents of the bound blocks are not available.

      <hash alg="sha">hhhhhh
      <hash alg="sha">hhhhhh
      <hash alg="sha">hhhhhh
Figure 4.5 Signature block element definition

Signature Block Field Definitions

blockref  (required) The signature block contains one or more <blockref> fields, each of which contains the unique block name of the associated block being signed. All block references must appear immediately before their respective hashes. (See below.) The <blockref> and <hash> pairs may be repeated multiple times to sign multiple blocks. 
hash  (required) This field contains the actual hash of the respective block. Each <hash> start tag must have an attribute parameter which specifies the algorithm used to perform the hash. The currently allowed parameters are md5 or sha. The alg= attribute parameter is required. The use of md5 is deprecated. Other hash algorithms may be supported in the future. It is not required that the same hash algorithm be used for each of the blockrefs in a signature block. All hashes are encoded in "network byte order," which means that the most significant bytes are leftmost (first). Note: Attribute parameters must be enclosed in quotes. 
nonce  (required) This is a nonce, or one-time random number, used to "salt" the hashed data to discourage cryptanalysis attacks. See the section below on signature calculation. The nonce value can be any string of random ASCII characters from within the set of allowed SDML characters (see Character Encoding above), not including white space. It is therefore possible for the value to be represented as an integer (containing only the digits from 0-9), a floating point number, a hexadecimal string, or a base64-encoded string. 
sigref  (optional) This is the block name of the <cert> block which contains the public key that can be used to verify the signature. This field, although optional, is only optional when an agreement is in place indicating that the recipient of the document does not need the certificate in order to process the document. 
certissuer  (optional) This field contains the unique distinguished name of the issuer of the certificate. It should only be specified if the <cert> blocks are not being sent with this document. See the description of the <certissuer> field in the <cert> block for the syntax used to specify this field. 
certserial  (optional) This field contains the unique certificate serial number assigned by the issuer of the certificate. It should only be specified if the <cert> blocks are not being sent with this document. 
algorithm  (required) This string indicates the algorithm used to sign the signature block. It may be md5/rsa or sha/dsa or sha/rsa. Note: Implementors of code that is used to sign SDML electronic documents may choose to support only one of the above three possible signing algorithms. Implementors of code that is used to verify SDML electronic documents must support all three algorithms. This ensures interoperability. The use of md5 is deprecated. 
timestamp  (optional) This field specifies the time that the document was signed. It must be in Universal time (i.e. GMT) specified as CCYYMMDDThhmmssZ, where the T and Z are literal characters, and where "CC" is the century (currently 19, soon 20), "YY" is the year, "MM" is the month, "DD" is the day, "hh" is the hour, "mm" is the minute and "ss" is the second. 
username  (optional) This is an identification string containing the certificate user's name. It is optionally inserted into the document by the electronic hardware token. 

This field, and the five following fields are optional identification data. This data is supplied by the electronic token owner to the token issuer at the time the token is initialized, but it is not certified to be correct or accurate by the token issuer. The data is inserted into the electronic token when the token is initialized, and may also be corrected or updated later by the issuer using administrative token functions and passwords. 

This data is then inserted, under control of the user, into the document by the electronic token, however the data cannot be changed or deleted by the user once the document is created. The user may select, when writing a document, which of the six identification fields are to be inserted into the document, in any combination, or may select none of them. 

useraddr  (optional) This is an identification string containing the certificate user's address. It is optionally inserted into the document by the electronic hardware token. 
userphone  (optional) This is an identification string containing the certificate user's phone number. It is optionally inserted into the document by the electronic hardware token. 
useremail  (optional) This is an identification string containing the certificate user's e-mail address. It is optionally inserted into the document by the electronic hardware token. 
useridnum  (optional) This is an identification string containing the certificate user's identification number. It is optionally inserted into the document by the electronic hardware token. 
userotherid  (optional) This is an identification string containing any user identification the user wishes (e.g., company name). It is optionally inserted into the document by the electronic hardware token. 
location  (optional) This field specifies location/country where the document was signed. 
sig  (required) This is a hexadecimal encoding of the actual signature data. For certain algorithms, the field is split into two portions using a colon ":". For DSA, the field contains the two portions of a DSA signature as r:s, where r and s are long hexadecimal strings. For RSA, only a single hex number is specified, with no colon separator. All signatures are encoded in "network byte order," which means that the most significant bytes are leftmost (first).

Signature Calculation

Calculation of the signature is performed as follows. If an electronic token is being used, then all of the following steps must be performed by that token.

1.  The <nonce> value is created as a random number. The nonce value can be any string of random ASCII characters from within the set of allowed SDML characters (see Character Encoding above) not including white space. It is therefore possible for the value to be represented as an integer (containing only the digits from 0-9), a floating point number, a hexadecimal string, or a base64-encoded string. 
2.  The <nonce> value is logically prepended to the subject block contents before hashing. This includes the tag string "<nonce>," e.g., if the nonce value is 12345, the characters <nonce>12345 are logically prepended to the subject block before hashing. 
3.  The hash is calculated using the contents of the subject block, (with the <nonce> prepended) excluding the block start tag and block end tag, but including all characters in between, with the exception of all carriage returns, line feeds, and trailing spaces on a line. Leading and embedded spaces in a line are included in the hash. SGML entities (i.e., character names enclosed between an ampersand and a semicolon) are left untranslated when hashing. 
4.  The resulting hash value is inserted into the <hash> entry (as Hex ASCII) in the signature block. 
5.  Steps 2 through 4 are repeated for each block to be signed. 
6.  A second hash calculation is performed on the contents of the <sigdata> sub-block, which contains the previously calculated hashes, their block references, and the <nonce>. This should include all characters between the <sigdata> tag and the </sigdata> tag, again omitting all carriage returns, line feeds, and trailing spaces. This second hash is then encrypted using the private key. The result is the signature which is inserted (as Hex ASCII) into the signature block as the value for the <sig> tag. 
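The hashing portion of the steps above can be sketched in Python. This is a minimal illustration under stated assumptions rather than a reference implementation: "sha" is taken to mean SHA-1 as provided by the standard hashlib module, the function names are hypothetical, hex output is written in upper case, and the private-key operation of step 6 is left out because it depends on the token and key type.

```python
import hashlib


def normalize(text: str) -> str:
    """Drop carriage returns, line feeds, and trailing spaces on each line.

    Leading and embedded spaces are kept; SGML entities are left as-is.
    """
    lines = text.replace("\r\n", "\n").replace("\r", "\n").split("\n")
    return "".join(line.rstrip(" ") for line in lines)


def _sha_hex(text: str) -> str:
    # Assumes "sha" means SHA-1; SDML documents are plain ASCII.
    return hashlib.sha1(text.encode("ascii")).hexdigest().upper()


def block_hash(contents: str, nonce: str) -> str:
    """Steps 2-4: hash the block contents with <nonce>value prepended.

    `contents` is everything between the block start and end tags.
    """
    return _sha_hex(normalize("<nonce>" + nonce + contents))


def sigdata_digest(sigdata_contents: str) -> str:
    """Step 6, hashing only: digest of everything between <sigdata> and </sigdata>."""
    return _sha_hex(normalize(sigdata_contents))
```

Because line ends and trailing spaces are removed before hashing, a block that a mail system re-wraps with different line terminators still produces the same hash, while a change to leading or embedded spaces does not.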

4.3.3 Certificate Block Definition

This block contains an encoded certificate.

Figure 4.6 Certificate block element definition

Certificate Block Field Definitions

blkname  (required) The <blkname> field in a <cert> is slightly different than the "generic" <blkname>. Since the <cert> block is signed by the authority issuing the electronic token, and is probably stored in the token, it is not changeable at runtime by SDML-generating software. Thus the <blkname> chosen must be guaranteed to be unique for all subsequent documents. It is recommended (but not required) that a block naming convention be used to allow this. The recommended convention is that the name be suffixed with information that is unique to the certificate, so that the same name would never be used by other certificates in the same SDML document. As an example, a certificate issued by a bank whose Bank Routing Code is 123456789, for a customer whose account number is 987654321 might have a blockname of cert-123456789-987654321. If the certificate were for the bank itself, the blockname would be cert-123456789
certtype  (required) This field indicates the type of certificate contained in the block. The current possible values are x509v1 or possibly x509v3 (to be determined). 
certissuer  (required) When the <certtype> is x509v1 or x509v3, this field contains the unique distinguished name of the issuer of the certificate. The certificate issuer string uses the fields from the distinguished name in the ASN.1 X.509 certificate, separated by slashes, and using a TAG= identification of the name field type. The different name fields use the following identification tags: 
     Country       C=
     Commonname    CN=
     Locality      L=
     Orgname       O=
     Orgunit       OU=
     State         ST=
     Streetaddress SA=
     Title         T=

Thus, an example of an issuer string would be... 

certserial  (required) This field contains the unique certificate serial number assigned by the issuer of the certificate. 
certdata  (required) When the <certtype> is x509v1 or x509v3, this contains the hexadecimal-encoded binary value of the ASN.1 DER encoded X.509 certificate. When using DSA signatures and keys, the p,g,q values for DSA are to be contained in the ASN.1 for the certificate of the issuer of the signer's certificate (which is stored in the token). The signer's certificate will *not* contain p,g,q values. Verification software will use the p,g,q values from the issuer's certificate when verifying a signer's signature. This implies that ASN.1 parsing software will have to deal with two varieties of certificates. 
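The slash-separated <certissuer> syntax can be split into its name fields with a few lines of Python. The sketch below is illustrative only. The function name is hypothetical, and it assumes that field values never contain a slash, since no escaping rule is given here. The test string is the issuer value from the sample document in section 1.4.

```python
# Identification tags listed for the <certissuer> field.
KNOWN_TAGS = {"C", "CN", "L", "O", "OU", "ST", "SA", "T"}


def parse_issuer(issuer: str):
    """Split an issuer string such as /C=US/O=FSTC/ into (tag, value) pairs."""
    pairs = []
    for part in issuer.split("/"):
        if not part:
            continue  # skip the empty pieces around leading/trailing slashes
        tag, sep, value = part.partition("=")
        if not sep or tag not in KNOWN_TAGS:
            raise ValueError(f"malformed issuer field: {part!r}")
        pairs.append((tag, value))
    return pairs
```

Keeping the result as an ordered list of pairs, not a dictionary, preserves the field order of the distinguished name, which matters when comparing it with the certificate contents.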

4.3.4 Attachment Block Definition

This block contains any document that is to be attached to the SDML electronic document (e.g., a remittance notice, contract, etc.).

    <adata encoding="text">
Figure 4.7 Attachment block element definition

Attachment Block Field Definitions

astatus  (optional) This field indicates whether the attachment is temporary or permanent. A temporary attachment is intended to be transmitted from the originator of the document to the receiver of the document. It is stripped off the document before transmission to any third party. A permanent attachment is intended to be kept with the document permanently, including transmission to any third parties in the transaction. The contents of the field may be the word temporary or the word permanent. If the field is omitted, it defaults to temporary. Note: An SDML document is not considered invalid if it is received by a third party containing a temporary attachment; however, the document may be invalid if a recipient strips off a permanent attachment. 
adata  (required) Any data may be contained in the Attachment block, between the <adata> and </adata> tags. 
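
The <astatus> rules above (default of temporary, and removal of temporary attachments before transmission to a third party) can be sketched as follows. This is only an illustration: the Attachment class and both function names are hypothetical, and the sketch assumes the blocks have already been parsed into objects.

```python
from dataclasses import dataclass
from typing import List, Optional

ALLOWED_ASTATUS = ("temporary", "permanent")

@dataclass
class Attachment:
    blkname: str
    adata: str
    astatus: Optional[str] = None  # None models an omitted <astatus> field

def effective_astatus(att: Attachment) -> str:
    """Return the attachment status, applying the default for an omitted field."""
    if att.astatus is None:
        return "temporary"
    value = att.astatus.strip()  # assumption: surrounding whitespace is not significant
    if value not in ALLOWED_ASTATUS:
        raise ValueError(f"astatus must be one of {ALLOWED_ASTATUS}, got {value!r}")
    return value

def attachments_for_third_party(attachments: List[Attachment]) -> List[Attachment]:
    """Keep only permanent attachments, as done before forwarding to a third party."""
    return [a for a in attachments if effective_astatus(a) == "permanent"]

received = [
    Attachment("note", "remittance notice"),            # astatus omitted
    Attachment("contract", "contract text", "permanent"),
    Attachment("memo", "cover memo", "temporary"),
]
print([a.blkname for a in attachments_for_third_party(received)])  # ['contract']
```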

The encoding= parameter for the <adata> tag is used to specify the encoding method for the data in the sub-block. It can have the following values:

mime  If the mime encoding value is selected, then the following three MIME headers must appear on the first three lines of the <adata> sub-block, immediately followed by a blank line: 
Mime-Version: 1.0
Content-Type: aaaaaa/bbbbbbbb
Content-Transfer-Encoding: xxxxx

Any legal MIME header values may be used for aaaaaa, bbbbbbbb, or xxxxx. In particular, if the contents of the attached document cannot be encoded using the SDML document formatting rules described earlier, then a Content-Transfer-Encoding of base64, uuencode, or quoted-printable should be used to "armor" the document against e-mail systems. 

In addition, the encoded document must not contain the ASCII string </adata>, so that the SDML parser will not interpret any portion of the attached document as the ending SGML tag for the <adata> sub-block. 

The actual encoded data follows the three MIME headers, separated by a blank line (i.e. one or more spaces followed by a new line sequence, with no other non-space characters). 

An example of a MIME-encoded <attachment> block: 

<adata encoding="mime">
Mime-Version: 1.0
Content-Type: application/octet-stream
Content-Transfer-Encoding: base64



text  This allows a simple ASCII document to be inserted as an attached document without need for MIME headers or encoding/decoding software. This parameter value can only be used if the attached document inside the <adata> sub-block conforms to the SDML document formatting rules. 
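
The two encoding values described above can be illustrated with a short sketch that builds an <adata> sub-block. The function names are hypothetical, base64 is chosen only as an example transfer encoding, the Mime-Version header is written with the colon that MIME header syntax uses, and the check for the end tag is made case-insensitive as a conservative assumption. The sketch does not check the SDML document formatting rules that the text encoding requires.

```python
import base64

END_TAG = "</adata>"

def wrap_adata(body: str, encoding: str) -> str:
    """Wrap already-prepared body text in an <adata> sub-block."""
    if encoding not in ("mime", "text"):
        raise ValueError("encoding must be 'mime' or 'text'")
    # The attached data must not contain the end tag of the sub-block.
    if END_TAG in body.lower():
        raise ValueError("attached data contains the string </adata>")
    if not body.endswith("\n"):
        body += "\n"
    return f'<adata encoding="{encoding}">\n{body}</adata>\n'

def build_mime_adata(payload: bytes,
                     content_type: str = "application/octet-stream") -> str:
    """Build a MIME-encoded sub-block: three headers, a blank line, then the data."""
    encoded = base64.encodebytes(payload).decode("ascii")  # wrapped at 76 columns
    headers = [
        "Mime-Version: 1.0",
        f"Content-Type: {content_type}",
        "Content-Transfer-Encoding: base64",
    ]
    body = "\n".join(headers) + "\n\n" + encoded
    return wrap_adata(body, "mime")

print(build_mime_adata(b"hello"))
print(wrap_adata("Plain ASCII remittance notice.", "text"))
```

Because the base64 alphabet contains no "<" character, base64-encoded data can never contain the end tag; the check matters mainly for the text encoding.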

4.3.5 Message Block Definition

This block contains error messages and return information indicating why the attached SDML document was not processed successfully; it may also contain other information about the attached document.

Figure 4.8 Message block element definition

Message Block Field Definitions

retcode  (required) This field contains a return code indicating the reason why the attached document was returned. 
msgtext  (required) This field contains a textual message explaining why the document was returned. 
msgdata  (optional) This field contains any other data that may be associated with the message, e.g., a report or bank statement. 

5 Document Structure

5.1 BNF Structure of SDML electronic documents

The following is an Extended BNF description of the global block structure of an SDML electronic document.

5.1.1 BNF Meta-Notation

The meta-symbols of BNF are:

::=  meaning "is defined as" 
|  meaning "or" 
[ ]  used to enclose optional items 
{ }  used to enclose repeated items (repeated zero or more times) 
< >  used to enclose specific SDML tags. 
<( )>  used to specify SDML blocks. 

Names not enclosed in any of the above bracket symbols are called nonterminals and are used to define symbols internal to the BNF specification only.

Note: Blocks are not required to be in the exact order specified below, except that the <action> block must always appear as the first block in any <sdml-doc>.

5.1.2 BNF Definition of non-terminals

cert-chain ::=        all certificates in the hierarchy
                      leading up to, but not including
                      the Root certificate.

data-block ::=        <(attachment)> | <(message)> | user-defined-block

5.1.3 BNF Definition of A Signed SDML Document

signed_doc ::=
                    { data-block }
                      <(signature)> <(cert)>
A signed SDML document consists of...

5.1.4 BNF Definition of a Multiply-Signed SDML Document

multiply_signed_doc ::=
                    { data-block }
                      signed_doc | multiply_signed_doc
                      <(signature)> <(cert)>
Thus, a multiply signed SDML document (i.e. a document which contains a
nested, inner document) consists of...
This nesting of documents may be continued indefinitely as new information
is added and signed.

5.1.5 Document Structure Diagram

SDML Document Structure Diagram

6 Combining Documents

As an SDML electronic document passes through the various steps and institutions that are part of the entire system that processes the document, new information may be added to the document. To allow the new information to be added, while still allowing the original information to be protected and verified using digital signatures, a document combining mechanism is defined.

To add new information to a document, the existing document is enclosed in a new <sdml-doc> tag structure, which may also enclose new blocks containing the new information. New <signature> blocks may also be contained in the new information and may sign blocks in the inner nested documents. Each new, surrounding <sdml-doc> must also have a new <action> block and TYPE parameter. The receiving system uses the <action> block and TYPE belonging to the outermost <sdml-doc> to determine how to process the modified document.

When combining original SDML documents into a larger, compound document, the names of the original blocks may not be unique. A document combining process must be used to handle naming conflicts when a number of documents are being combined (i.e., embedded) into a new document.

The document combining process is as follows:

1.  All the original <sdml-doc> elements are enclosed in a single new <sdml-doc> element. The original docname attribute values are kept unchanged unless the names of the combined documents are not all unique. If they are not unique, new, unique names should be assigned by the combining software. 
2.  Whenever a block name reference refers to a block that is not in the same <sdml-doc> as the one containing the reference (i.e. an inter-document reference), the reference consists of the DOCNAME of the <sdml-doc> element, followed by a period ".", followed by the <blkname> of the inner block being referred to. 

This is extended if the nesting is continued to more than two levels, e.g. 'outerdoc.innerdoc.block'

3.  Any block references inside a given <sdml-doc> must use the block name without any qualifying document name, to ensure that future document combining will not be prevented. 
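
Step 1 of the combining process can be sketched as follows. This is a simplified model, not a parser: the SdmlDoc class is hypothetical, the new <action> block is not modelled, and the "-2", "-3" renaming scheme is an assumption, since the specification only requires that the assigned names be unique.

```python
from dataclasses import dataclass, field, replace
from typing import List, Sequence

@dataclass
class SdmlDoc:
    docname: str
    doctype: str  # models the TYPE parameter of <sdml-doc>
    children: List["SdmlDoc"] = field(default_factory=list)

def combine(docs: Sequence[SdmlDoc], new_docname: str, new_doctype: str) -> SdmlDoc:
    """Enclose the given documents in one new outer document.

    Original names are kept; only a document whose name repeats an earlier
    one receives a new name (hypothetical scheme: append -2, -3, ...).
    """
    original_names = {doc.docname for doc in docs}
    taken = set()
    children = []
    for doc in docs:
        name = doc.docname
        counter = 1
        while name in taken:
            counter += 1
            candidate = f"{doc.docname}-{counter}"
            # Never steal a name that another original document already uses.
            if candidate not in original_names:
                name = candidate
        taken.add(name)
        children.append(replace(doc, docname=name))  # copy, leave input untouched
    return SdmlDoc(new_docname, new_doctype, children)

combined = combine(
    [SdmlDoc("doc1", "t"), SdmlDoc("doc1", "t"), SdmlDoc("doc2", "t")],
    "newdoc", "t",
)
print([child.docname for child in combined.children])  # ['doc1', 'doc1-2', 'doc2']
```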

As an example:

If there are two original documents:

    <sdml-doc docname="doc1">
    <sdml-doc docname="doc2">

When they are combined, the result is:

  <sdml-doc docname="newdoc">
    <sdml-doc docname="doc1">
    <sdml-doc docname="doc2">


Any external references to the <attachment> block in the first document would be 'doc1.block1', and the <attachment> block in the second document would be 'doc2.block1'. References inside doc1 to any blocks in doc1 must still use the original, single-level names. The same applies to internal references inside doc2.
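
Resolution of such period-qualified references can be sketched as shown below. The SdmlDoc class and the resolve function are hypothetical, blocks are reduced to a name-to-content mapping, and the sketch assumes that neither document names nor block names contain a period.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class SdmlDoc:
    docname: str
    blocks: Dict[str, str] = field(default_factory=dict)  # blkname -> content
    children: List["SdmlDoc"] = field(default_factory=list)

def resolve(doc: SdmlDoc, reference: str) -> str:
    """Resolve a block reference relative to the document that contains it.

    'block1' names a block in `doc` itself; 'doc1.block1' names a block in
    the nested document doc1; deeper nesting adds further qualifiers.
    """
    *doc_path, blkname = reference.split(".")
    current = doc
    for name in doc_path:
        matches = [child for child in current.children if child.docname == name]
        if len(matches) != 1:
            raise KeyError(f"no unique nested document named {name!r}")
        current = matches[0]
    if blkname not in current.blocks:
        raise KeyError(f"no block named {blkname!r} in {current.docname!r}")
    return current.blocks[blkname]

doc1 = SdmlDoc("doc1", {"block1": "attachment of doc1"})
doc2 = SdmlDoc("doc2", {"block1": "attachment of doc2"})
newdoc = SdmlDoc("newdoc", children=[doc1, doc2])

print(resolve(newdoc, "doc1.block1"))  # attachment of doc1
print(resolve(newdoc, "doc2.block1"))  # attachment of doc2
print(resolve(doc1, "block1"))         # attachment of doc1
```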


7 ASN.1 Definition of X.509 Version 1 Certificates

The ASN.1 definition of an X.509 Version 1 certificate is as follows:

Certificate     ::= SIGNED SEQUENCE{
             version                [0] Version DEFAULT 1988,
             serialNumber           SerialNumber,
             signature              AlgorithmIdentifier,
             issuer                 Name,
             validity               Validity,
             subject                Name,
             subjectPublicKeyInfo   SubjectPublicKeyInfo}

Version         ::= INTEGER { 1988(0) }

SerialNumber    ::= INTEGER

Validity        ::= SEQUENCE{
              notBefore              UTCTime,
              notAfter               UTCTime}

SubjectPublicKeyInfo   ::= SEQUENCE{
              algorithm         AlgorithmIdentifier,
              subjectPublicKey  BIT STRING}

AlgorithmIdentifier    ::= SEQUENCE{
              algorithm       OBJECT IDENTIFIER,
              parameters      ANY DEFINED BY algorithm OPTIONAL}

The descriptions of the fields are as follows:

version  Indicates the version of X.509 which is being used. Signer certificates may be version 1 through version 3. 
serialNumber  A unique serial number assigned by the issuer. 
signature  An object identifier which indicates the algorithm used to sign the certificate. The location and format of the actual signature bits are defined by the SIGNED SEQUENCE data type. 
issuer  The distinguished name of the issuer of the certificate. 
validity  The Coordinated Universal Time (UTC) values before which and after which the certificate is invalid. The document must have been signed during the validity interval of the certificate. 
subject  The distinguished name of the signer. 
subjectPublicKeyInfo  The algorithm identifier of the subject's public key followed by the bits of the subject's public key. 
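
For illustration, the fields above can be mirrored in a small data structure together with the validity check. This is a sketch of data that an ASN.1 parser might hand back, not a parser; the class names, the treatment of the interval bounds as inclusive, and all sample values are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class Validity:
    not_before: datetime
    not_after: datetime

@dataclass(frozen=True)
class CertificateV1:
    version: int                  # 0 is the encoded value for the 1988 version
    serial_number: int
    signature_algorithm: str      # object identifier in dotted form
    issuer: str                   # distinguished name
    validity: Validity
    subject: str                  # distinguished name
    public_key_algorithm: str     # object identifier in dotted form
    public_key: bytes             # contents of the BIT STRING

def signed_during_validity(cert: CertificateV1, signing_time: datetime) -> bool:
    """True if the signing time lies inside the validity interval.

    Assumption: both bounds are treated as inclusive.
    """
    return cert.validity.not_before <= signing_time <= cert.validity.not_after

# Hypothetical sample values, all times in UTC.
cert = CertificateV1(
    version=0,
    serial_number=1234,
    signature_algorithm="1.2.840.113549.1.1.4",   # md5WithRSAEncryption
    issuer="Example Issuer",
    validity=Validity(
        datetime(1998, 1, 1, tzinfo=timezone.utc),
        datetime(1999, 1, 1, tzinfo=timezone.utc),
    ),
    subject="Example Signer",
    public_key_algorithm="1.2.840.113549.1.1.1",  # rsaEncryption
    public_key=b"\x00\x01",
)
print(signed_during_validity(cert, datetime(1998, 6, 19, tzinfo=timezone.utc)))  # True
```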

8 Field Summary

Below is a summary of the attributes of each of the entities or fields allowed in an SDML electronic document.

8.1 Field Attributes Table

Field Attribute Summary
Field Name    Containing Blocks     Min Size  Max Size  Optional
blkname       all                   1         64
crit          all                   4         5         Yes
vers          all                   1         8         Yes
adata         <attachment>          1         N/A       Yes
algorithm     <signature>           7         7
astatus       <attachment>          9         9         Yes
blockref      <signature>           1         76
certdata      <cert>                1         N/A
certissuer    <signature> <cert>    1         256       Yes/No
certserial    <signature> <cert>    1         16        Yes/No
certtype      <cert>                6         6
function      <action>              1         16
hash          <signature>           1         256
location      <signature>           1         76        Yes
msgtext       <message>             1         76
msgdata       <message>             1         N/A       Yes
nonce         <signature>           8         16
reason        <action>              1         16
retcode       <message>             1         8
sig           <signature>           80        256
sigref        <signature>           1         76        Yes
timestamp     <signature>           16        16        Yes
useraddr      <signature>           1         76        Yes
useremail     <signature>           1         76        Yes
useridnum     <signature>           1         76        Yes
username      <signature>           1         76        Yes
userotherid   <signature>           1         76        Yes
userphone     <signature>           1         76        Yes
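
The size limits in the table lend themselves to a simple lookup-based check, sketched below. The names FIELD_LIMITS and field_size_ok are hypothetical, and the sketch assumes that sizes are counted in characters of the field contents; None stands for the N/A (unbounded) entries.

```python
from typing import Dict, Optional, Tuple

# (minimum size, maximum size); None means no maximum (N/A in the table).
FIELD_LIMITS: Dict[str, Tuple[int, Optional[int]]] = {
    "blkname": (1, 64), "crit": (4, 5), "vers": (1, 8),
    "adata": (1, None), "algorithm": (7, 7), "astatus": (9, 9),
    "blockref": (1, 76), "certdata": (1, None), "certissuer": (1, 256),
    "certserial": (1, 16), "certtype": (6, 6), "function": (1, 16),
    "hash": (1, 256), "location": (1, 76), "msgtext": (1, 76),
    "msgdata": (1, None), "nonce": (8, 16), "reason": (1, 16),
    "retcode": (1, 8), "sig": (80, 256), "sigref": (1, 76),
    "timestamp": (16, 16), "useraddr": (1, 76), "useremail": (1, 76),
    "useridnum": (1, 76), "username": (1, 76), "userotherid": (1, 76),
    "userphone": (1, 76),
}

def field_size_ok(name: str, value: str) -> bool:
    """Check the length of a field's contents against the table.

    Raises KeyError for a field name that is not in the table.
    """
    minimum, maximum = FIELD_LIMITS[name]
    if len(value) < minimum:
        return False
    return maximum is None or len(value) <= maximum

print(field_size_ok("astatus", "temporary"))  # True
print(field_size_ok("nonce", "1234567"))      # False
```

Both permitted <astatus> values, temporary and permanent, are nine characters long, which is consistent with the fixed size of 9 in the table.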

9 Verifying Certificates

The rules for verifying certificates are:

1.  Any certificates omitted (with prior agreement) must be obtained from a local database or by other means, using the <certissuer> and <certserial> fields in the referring <signature> block. 
2.  Certificates may be verified by a byte-wise comparison against copies kept in the recipient's database, cryptographically using the public key of the root, or by both methods. 
3.  The signature date of the document must fall between the not-before and not-after dates in the X.509 certificate. 
4.  Issuer certificates must be checked against a certificate revocation list. 

The cryptographic verification process for X.509 certificates in an SDML document is as follows. For each sub-document in the SDML document, perform the following steps:

1.  Locate all the <cert> blocks in the sub-document. Extract the <certdata> field contents, convert the hex string to binary, and parse and extract the X.509 contents. 
2.  For each certificate in the sub-document, perform the following steps: 
(a)  If the <certissuer> field contents in the <cert> block to be verified is the name of the root, skip step 2b and use the root public key as the public key in step 2e. 
(b)  Find another certificate in the same sub-document whose X.509 "subject" distinguished name matches the "issuer" name in the certificate being verified (the <certissuer> field contents in the SDML must be the same as the issuer name in the X.509 certificate, so either one may be used). If it cannot be found in the same sub-document, a local cache or database of certificates may be used. Extract its subject public key for use in step 2e. 
(c)  Extract the signed certificate data from the X.509 certificate to be verified. 
(d)  Calculate the hash of the data obtained in step 2c, using the hash algorithm specified in the X.509 algorithm field. 
(e)  Verify, using the public key obtained in step 2a or 2b, that the claimed signature from the X.509 certificate matches the hash obtained in step 2d. 
(f)  Verify that the signature date falls between the notBefore and notAfter dates in the X.509 certificate. 
(g)  Verify that the certificate is not in a certificate revocation list (if appropriate). 
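
The control flow of steps 2a through 2e can be sketched as follows. This is not a cryptographic implementation: ParsedCert is a hypothetical stand-in for the output of an X.509 parser, the signature check is passed in as a function, and the toy_sign and toy_check helpers exist only to exercise the loop (they are not a real signature algorithm). Steps 2f and 2g are omitted.

```python
import hashlib
from dataclasses import dataclass, replace
from typing import Callable, Mapping, Optional, Sequence

@dataclass(frozen=True)
class ParsedCert:
    subject: str
    issuer: str
    tbs_data: bytes     # the signed certificate data (step 2c)
    hash_alg: str       # a name accepted by hashlib.new, e.g. "sha1"
    signature: bytes
    public_key: bytes

# (issuer public key, digest, claimed signature) -> valid or not
SignatureCheck = Callable[[bytes, bytes, bytes], bool]

def verify_certs(certs: Sequence[ParsedCert], root_name: str, root_key: bytes,
                 check: SignatureCheck,
                 cache: Optional[Mapping[str, ParsedCert]] = None) -> bool:
    by_subject = {c.subject: c for c in certs}
    for cert in certs:
        if cert.issuer == root_name:                                  # step 2a
            issuer_key = root_key
        else:                                                         # step 2b
            issuer = by_subject.get(cert.issuer) or (cache or {}).get(cert.issuer)
            if issuer is None:
                return False
            issuer_key = issuer.public_key
        digest = hashlib.new(cert.hash_alg, cert.tbs_data).digest()  # steps 2c, 2d
        if not check(issuer_key, digest, cert.signature):            # step 2e
            return False
    return True

# Toy stand-ins, NOT real cryptography: the "signature" is a keyed hash.
def toy_sign(key: bytes, digest: bytes) -> bytes:
    return hashlib.sha256(key + digest).digest()

def toy_check(key: bytes, digest: bytes, signature: bytes) -> bool:
    return toy_sign(key, digest) == signature

def make_cert(subject: str, issuer: str, issuer_key: bytes, key: bytes) -> ParsedCert:
    tbs = f"{subject} issued by {issuer}".encode("ascii")
    sig = toy_sign(issuer_key, hashlib.sha1(tbs).digest())
    return ParsedCert(subject, issuer, tbs, "sha1", sig, key)

root_key = b"root-key"
ca = make_cert("CA", "Root", root_key, b"ca-key")
signer = make_cert("Signer", "CA", b"ca-key", b"signer-key")
print(verify_certs([signer, ca], "Root", root_key, toy_check))  # True
```

In a real system the check function would wrap an RSA or DSA verification routine from a cryptographic library.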



10 Bibliography

[1] ISO, International Organization for Standardization, Case Postale 56, CH-1211, Geneva 20, Switzerland. ISO 8879 Information Processing Systems - Text and Office Systems - Standard Generalized Markup Language (SGML), 1986.

[2] R. Rivest. RFC 1321 The MD5 Message-Digest Algorithm. IETF, April 1992.

[3] R.L. Rivest, A. Shamir, and L.M. Adleman. A method for obtaining digital signatures and public-key cryptosystems. Communications of the ACM, 21(2):120-126, February 1978.

[4] U.S. Department of Commerce / National Institute of Standards and Technology. FIPS Pub 180 - Secure Hash Standard, May 1993.

[5] U.S. Department of Commerce / National Institute of Standards and Technology. FIPS Pub 186 - Digital Signature Standard, May 1994.

[6] CCITT, International Telecommunication Union, General Secretariat, Place Des Nations, CH-1211, Geneva 20, Switzerland. CCITT X.509 The Directory - Authentication Framework, January 1995.

[7] Niklaus Wirth. What can we do about the unnecessary diversity of notation for syntactic definitions. Communications of the ACM, 20(11):822-823, November 1977.

[8] John Backus and Peter Naur. The syntax and semantics of the proposed international algebraic language of the Zurich ACM-GAMM conference. Proceedings of the International Conference on Information Processing, June 1959.

[9] N. Borenstein and N. Freed. RFC 1521 MIME (Multipurpose Internet Mail Extensions) - Part One: Mechanisms for Specifying and Describing the Format of Internet Message Bodies. IETF, September 1993.

[10] ISO, International Organization for Standardization, Case Postale 56, CH-1211, Geneva 20, Switzerland. ISO 8601 Data elements and interchange formats - Information interchange - Representation of dates and times, 1988.

[11] ISO, International Organization for Standardization, Case Postale 56, CH-1211, Geneva 20, Switzerland. ISO 8824 Information Processing Systems - Abstract Syntax Notation One (ASN.1), 1995.

[12] ISO, International Organization for Standardization, Case Postale 56, CH-1211, Geneva 20, Switzerland. ISO 8825 Information Processing Systems - Abstract Syntax Notation One - Specification of Basic Encoding Rules (BER), Canonical Encoding Rules (CER), and Distinguished Encoding Rules (DER), 1995.

11 Issues and Directions

Certain issues are still being discussed and need further investigation. These include:

Appendix A - SGML Document Type Definition (DTD)

<!SGML  "ISO 8879:1986"
--                                      --
--  DTD for SDML electronic documents   --
--  First Draft  27 Feb 1996            --
--  Written by J. Kravitz IBM Research  --
--  Last Revision 21 Jan 1998           --
--  Version  1.00                       --
--                                      --

     BASESET  "ISO 646:1983//CHARSET
               International Reference Version (IRV)//ESC 2/5 4/0"
     DESCSET  0   9   UNUSED
              9   2   9
              11  2   UNUSED
              13  1   13
              14  18  UNUSED
              32  95  32
              127 1   UNUSED
     BASESET  "ISO Registration Number 100//CHARSET
               ECMA-94 Right Part of Latin Alphabet Nr. 1//ESC 2/13 4/1"
     DESCSET  128 32  UNUSED
              160 95  32
              255  1  UNUSED

                TOTALCAP        150000
                GRPCAP          150000
         SHUNCHAR CONTROLS 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
                           19 20 21 22 23 24 25 26 27 28 29 30 31 127 255
         BASESET  "ISO 646:1983//CHARSET
                   International Reference Version (IRV)//ESC 2/5 4/0"
         DESCSET  0 128 0
         FUNCTION RE          13
                  RS          10
                  SPACE       32
                  TAB SEPCHAR  9
         NAMING   LCNMSTRT ""
                  UCNMSTRT ""
                  LCNMCHAR "-"
                  UCNMCHAR "-"
                  NAMECASE GENERAL YES
                           ENTITY  NO
                  SHORTREF SGMLREF
         NAMES    SGMLREF
                  NAMELEN  34
                  TAGLVL   100
                  LITLEN   1024
                  GRPGTCNT 150
                  GRPCNT   64                   

    RANK     NO

<!DOCTYPE sdml [

<!ELEMENT sdml o o ( sdml-doc )>
<!ELEMENT sdml-doc - -      (
                             action        ,
                              sdml-doc     |
                              signature    |
                              cert         |
                              attachment   |
<!ATTLIST sdml-doc docname CDATA #REQUIRED
                   type    CDATA #REQUIRED >
<!ELEMENT action    - -     (
                             blkname     ,
                             crit?       ,
                             function    &
<!ELEMENT signature - -     (
                             blkname     ,
                             crit?       ,
                             sigdata     ,
<!ELEMENT sigdata   - -     (
                             ( blockref , hash )+ &
                             nonce       &
                             sigref?     &
                             (certissuer , certserial)?  &
                             algorithm   &
                             timestamp?  &
                             location?   &
                             username?   &
                             useraddr?   &
                             userphone?  &
                             useremail?  &
                             useridnum?  &

<!ELEMENT cert      - -     (
                             blkname     ,
                             crit?       ,
                             certtype    &
                             (certissuer , certserial)  &
<!ELEMENT attachment - -    (
                             blkname     ,
                             crit?       ,
                             astatus?    ,
<!ELEMENT message    - -    (
                             blkname     ,
                             crit?       ,
                             retcode     &
                             msgtext     &

<!ELEMENT blkname       - O (#PCDATA)>
<!ELEMENT crit          - O (#PCDATA)>
<!ELEMENT vers          - O (#PCDATA)>

<!ELEMENT adata         - - (#CDATA)>
<!ATTLIST adata encoding (mime | text)  text >
<!ELEMENT algorithm     - O (#PCDATA)>
<!ELEMENT astatus       - O (#PCDATA)>
<!ELEMENT blockref      - O (#PCDATA)>
<!ELEMENT certdata      - O (#PCDATA)>
<!ELEMENT certissuer    - O (#PCDATA)>
<!ELEMENT certserial    - O (#PCDATA)>
<!ELEMENT certtype      - O (#PCDATA)>
<!ELEMENT function      - O (#PCDATA)>
<!ELEMENT hash          - O (#PCDATA)>
<!ATTLIST hash alg (md5 | sha) #REQUIRED >
<!ELEMENT location      - O (#PCDATA)>
<!ELEMENT msgtext       - O (#PCDATA)>
<!ELEMENT msgdata       - - (#PCDATA)>
<!ELEMENT nonce         - O (#PCDATA)>
<!ELEMENT reason        - O (#PCDATA)>
<!ELEMENT retcode       - O (#PCDATA)>
<!ELEMENT sig           - O (#PCDATA)>
<!ELEMENT sigref        - O (#PCDATA)>
<!ELEMENT timestamp     - O (#PCDATA)>
<!ELEMENT useraddr      - O (#PCDATA)>
<!ELEMENT useremail     - O (#PCDATA)>
<!ELEMENT useridnum     - O (#PCDATA)>
<!ELEMENT username      - O (#PCDATA)>
<!ELEMENT userotherid   - O (#PCDATA)>
<!ELEMENT userphone     - O (#PCDATA)>


Appendix B - Definitions

Certificate  This is a piece of data containing the Public Key of a person or organization that is issued by a certificate issuer who is authenticating that the Public Key is in fact the one owned by the named person or organization. The certificate is usually digitally signed to authenticate it and prevent tampering. The certificate may be thought of as a binding between a public key, and the identification of the owner of that public key. 
Certificate Authority  This is either a piece of software, or the organization that uses the software, that issues certificates. Certificate Authorities (usually abbreviated as CAs) may issue certificates to end users, or to other CAs, allowing them, in turn, to issue certificates. The purpose of a Certificate Authority is to act as the assurance that a particular public key belongs to the person or organization identified with that public key. 
Certificate Hierarchy  This is the "chain" of certificates, each one pointing to the issuer of that certificate, that can be followed to authenticate that the certificate owner and its issuer are bona fide. 
Digital Signature  A cryptographic mechanism applied to a file, document, or other piece of data that allows the document to be authenticated as to its creator and contents. Most Digital Signature algorithms involve a Cryptographic Hash combined with a Public Key Encryption. 
Electronic Token  This is an electronic, tamper-resistant device used to perform the digital signing operation, and to hold any secret information, such as private keys, in a secure manner. Examples of this token would be Smart Cards, PCMCIA Cards, PC-bus (e.g. PCI) boards, etc. Some applications may require the use of such a token, as opposed to allowing such signing and secret keeping to be performed on a regular PC. It is not strictly required by the SDML definition that such a token be used; however, SDML is defined in such a way as to support such tokens. 
Public Key  A Public Key is a number (usually a very large number) that is mathematically related to its associated Private Key and is used in Public Key Cryptography. A Public Key can be freely published without loss of security, and, in fact, for digital signature purposes, must be widely distributed. 
Private Key  A Private Key is a number (usually a very large number) that is mathematically related to its associated Public Key and is used in Public Key Cryptography. A Private Key must be very securely hidden and not be made accessible to anyone other than the key owner. Very often, electronic means (and sometimes even explosives!) are used to protect the security of Private Keys. 
Root CA  This is the most-authoritative Certificate Authority in a Certificate Hierarchy. (This is a simplification, and assumes a tree-structured certificate hierarchy. Other, more complex structures, with no root or multiple roots, are possible.) All certificates in the hierarchy can be traced back, possibly through multiple levels of CAs, to the Root CA. 

Appendix C - Acknowledgements

The creation and continued enhancement of this document was greatly assisted by many people, including the following contributors from the FSTC E-Check Technical Team:

Jim Akister  RDM 
Milt Anderson  Bellcore 
Sheueling Chang  Sun Microsystems 
Greg Dunne  Telequip 
Mark Feldman  CommerceNet 
Nikki Fischer  Huntington Banks 
John Fricke  Chase Bank 
Michael Halperin  BBN Planet 
Chris Hibbert  Agorics 
Eric Hill  Agorics 
Frank Jaffe  BankBoston 
David Lant  RDM 
An Le  National Semiconductor 
Stuart Marks  Sun Microsystems 
Cyndi Mills  BBN 
Elaine Palmer  IBM Research 
Brian Risman  Bank of Montreal 
Robert Rocchetti  Sun Microsystems 
Jim Seck  Unisys 
Mark Smith  Oak Ridge National Lab 
Sean Smith  IBM Research 
Tony Smith  Intranet 
Dave Solo  BBN 
Kurt Thams  Agorics 
Gene Tsudik  USC-ISI 
Paridhi Verma  IBM Research 
Jyri Virkki  Bellcore 
Gary Werner  Unisys