This document is a NOTE made available by the W3 Consortium for discussion only. This indicates no endorsement of its content, nor that the Consortium has, is, or will be allocating any resources to the issues addressed by the NOTE.
This document is a submission to W3C. Please see Acknowledged Submissions to W3C <http://www.w3.org/Submission/> regarding its disposition.
This document describes conventions for using HTML in email. As messages go back and forth between participants in a discussion, it is interesting to be able to track properties of the text in the message and properties of the message itself, such as who wrote what or what message a quoted excerpt is originally from. This proposal defines a mechanism for embedding this information within an email message in a manner that degrades gracefully to downlevel mail clients.
HTML Threading enables user agents (UAs) to easily identify the source message and author for arbitrary runs of text and additionally defines conventions to apply a distinct visual style to text written by various authors.
Separate IETF documents ([MHTML], [MID], and [REL]) discuss how to encapsulate HTML into MIME messages.
Email messages today frequently represent an entire history of a conversation: a single message generates a series of replies, or it may be forwarded to new people who might then enter the conversation. As conversations progress, it becomes difficult to determine who wrote what and in what message a run of text first appeared.
Today, in plain-text mail messages, this information is typically conveyed by prepending a "quoting" character such as ">" before each line in a message that is quoted during a reply. Each time a message is quoted, it gets one more ">". However, this is an imperfect solution for several reasons:
The overall goals for HTML Threading are:
Some applications of HTML Threading include (but are not limited to):
This proposal comprises four components:
Each group of properties contains information about a single author or a single message. Text within the message is tagged with a URL that points to an author or source message, and an associated group of properties references the same URL.
Text may also be tagged with a style to provide default presentation information, if the UA supports stylesheets.
UAs may need to glean information about the message from which the text originated, and information about the author of the message. Schemas are defined for each.
This specification recommends that certain properties be provided when describing messages and authors. Properties that are recommended SHOULD be provided by the sender. Properties that are considered optional MAY be provided. No properties are absolutely required.
One unique property grouping SHOULD be provided for each author of a message or quoted message, and one SHOULD be provided for each quoted message embedded within the message and for the message itself.
Each block has the following attributes:
Properties that describe a message adhere to the schema defined in http://www.w3.org/schemas/Message (Hereafter simply "message"). The ABOUT attribute at the root of the property block is a URL, which provides a unique reference to the message.
Properties defined by this schema include:
NAME | VALUE | Recommended? |
Date | Date the message was originally sent, in ISO-8601 format | Yes |
In-Reply-To | The RFC822 message id of the message that this is in response to. | Yes, for responses only. |
Message-ID | The RFC822 "Message-ID" header of the message, in the form of an "MID:" URL. | Yes. |
AuthorURL | A URL to information about the author. Typically, this is a mailto: URL, but could be any valid URL. This is used as the reference for the block of properties that describe an author. | Yes |
AuthorEmail | A valid RFC822 email address for the author | Yes |
AuthorName | A human-readable representation of the author's name | Yes. |
Subject | The RFC822 "Subject" header | Optional |
From | The RFC822 "From" header | Optional |
Received | Date the message was originally received, in ISO-8601 format | Optional |
To | The RFC822 "To" header | Optional |
Cc | The RFC822 "Cc" header | Optional |
Additionally, properties from other schemas can be included in an XML block describing a message, using the XML schema namespace mechanism.
All of the VALUE elements are string values. Note, however, that special characters in values must be escaped using XML escaping mechanisms within the XML stream.
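As an illustration of the escaping note above, the following minimal sketch formats a date and escapes a From value before placing each in a property element. The helper names (format_date, property_element) are hypothetical, and the compact date form is an assumption taken from the sample message later in this document, which uses values such as 19970108T073008Z.

```python
from datetime import datetime, timezone
from xml.sax.saxutils import escape


def format_date(moment: datetime) -> str:
    """Render a timezone-aware datetime in compact ISO-8601 form, in UTC.

    The compact form is an assumption based on this document's sample message.
    """
    return moment.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")


def property_element(name: str, value: str) -> str:
    """Build one property element, escaping &, < and > in the string value."""
    return f"<{name}>{escape(value)}</{name}>"


sent = datetime(1997, 1, 8, 7, 30, 8, tzinfo=timezone.utc)
print(property_element("Date", format_date(sent)))
print(property_element("From", "Yogi Beera <yogi@picnic.com>"))
```

Without the escaping step, the angle brackets around the address would be read as markup by an XML parser.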
Note that the AuthorURL property is a unique identifier for the author. The XML identifying the author can be discovered by finding the XML that has an ABOUT attribute pointing to the same URL. A UA can thus distinguish excerpts from multiple messages that are written by the same author without storing information about each author multiple times.
XML that describes an author adheres to the schema defined in http://www.w3.org/schemas/Person (Hereafter simply "person"). The ABOUT attribute for the block is a URL that identifies the author being described.
Properties defined by this schema include:
NAME | VALUE | Recommended? |
AuthorURL | A URL to information about the author. Typically, this is a mailto: URL, but could be any valid URL. This is used as the reference for the block of properties that describe an author. | Optional (part of the ABOUT attribute for the XML block) |
CN | Display name (Common name) of the author (string) | Yes |
O | Organization of the author (string) | Yes |
OU | Organizational unit for the author (string) | Optional |
telephoneNumber | Primary phone number for the user (string) | Optional |
title | Occupational title | Optional |
sn | Surname | Optional |
givenName | Given name | Optional |
PreferredFormat | "text/html" if this author prefers HTML mail, else "text/plain" or any other MIME content type. | Optional |
StylesheetClassname | The name of a CSS1 class name that is used for text authored by this person. | Optional |
Additionally, properties from other schemas can be included in an XML block describing an author, using the XML schema namespace mechanism.
The UA that creates a message is responsible for ensuring that the message contains:
Information in the XML for the "current" message is typically incomplete, if present at all. This is because many details (such as Received date) cannot be known by the UA that originated the message before the message is submitted for delivery. However, the UA that subsequently forwards or replies to that message does have access to this information. Therefore, the UA that forwards or replies to a message SHOULD add appropriate properties for the message being forwarded/replied-to at the time that it quotes the message.
The XML for a message may be embedded in the HTML of the message, may be stored in a separate MIME body part according to the MHTML specification, or may be external to the message. For the purposes of email, the XML SHOULD be included with the message if possible.
When the XML is not in the HTML itself, a LINK element in the HEAD block of the HTML points to the XML. The REL attribute for the LINK element is "HTMLAttrib". If the URL points to another body part of a message, then the MIME-type for the body part containing the XML is "application/xml".
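A receiving UA might locate the external XML along the following lines. This is only a sketch using Python's standard html.parser module; the class and function names are hypothetical, and comparing the REL value case-insensitively is an assumption rather than something this document states.

```python
from html.parser import HTMLParser


class AttribLinkFinder(HTMLParser):
    """Collects the HREF of every LINK element whose REL is HTMLAttrib."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower()  # assumption: case-insensitive
        if tag == "link" and rel == "htmlattrib" and attributes.get("href"):
            self.hrefs.append(attributes["href"])


def find_property_links(html_text):
    """Return the URLs of the XML property blocks referenced by the HTML."""
    finder = AttribLinkFinder()
    finder.feed(html_text)
    finder.close()
    return finder.hrefs


sample = '<HTML><HEAD><LINK REL=HTMLAttrib HREF="cid:abodkelsa"></HEAD><BODY></BODY></HTML>'
print(find_property_links(sample))
```

A "cid:" result would then be resolved against the Content-ID headers of the other body parts in the same MIME message.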
A UA may discover which properties describe a particular message or author by finding a block of XML with the appropriate ABOUT attribute. This is done by matching URLs. URLs MUST match fully to be considered a match. Due to the matching complexity they introduce, relative URLs are not permitted.
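The matching rule can be sketched as follows. The property blocks are modelled here as plain dictionaries, which is a hypothetical representation chosen for illustration, and "match fully" is read as exact string equality, which is an assumption.

```python
from urllib.parse import urlsplit


def is_absolute(url):
    """A URL with no scheme is relative and therefore not permitted."""
    return bool(urlsplit(url).scheme)


def find_block(blocks, target_url):
    """Return the property block whose ABOUT value equals target_url exactly."""
    if not is_absolute(target_url):
        return None
    for block in blocks:
        if block.get("ABOUT") == target_url:  # assumption: exact string match
            return block
    return None


blocks = [
    {"ABOUT": "mid:8ah35k32l11@38943k.2313243",
     "AuthorURL": "mailto:ericbe@microsoft.com"},
    {"ABOUT": "mailto:ericbe@microsoft.com", "CN": "Eric Berman"},
]

# Follow the chain: message block, then its AuthorURL, then the author block.
message = find_block(blocks, "mid:8ah35k32l11@38943k.2313243")
author = find_block(blocks, message["AuthorURL"])
print(author["CN"])
```

Because the author block is found through the AuthorURL value, several message blocks can share one author block.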
When quoting a message, BLOCKQUOTE or DIV elements are used to surround the quoted message in a block fashion. Q (new in HTML 4.0, and the preferred element to use) or SPAN (for UAs that do not yet conform to HTML 4.0) is used to denote the range of an in-line run of text from a particular source.
The source of text (either author or source message) is identified through the use of the HTML 4.0 CITE attribute on the surrounding DIV, BLOCKQUOTE, SPAN, or Q elements. Typically, this URL points to another message, in which case the MID URL defined in [MID] is used, although it can be any URL type for other sources (e.g., a web page from which a section derives).
The UA can discover information about the message that is quoted by looking at the CITE target of the innermost, surrounding BLOCKQUOTE, DIV, Q, or SPAN element that has a CITE attribute, and matching it to the XML block that references the same URL. That XML block contains data describing the text. Typically, this data is from the message schema.
If no matching data can be found, and the CITE references the current message, then information from the current message can be used.
If no matching data from the message or author schemas is found, and the CITE does not reference the current message, then the properties of the text should be considered unknown, although other mechanisms could be used, such as scanning a message store or looking at the current message to try to find a corresponding message, and then looking at that message.
If no enclosing elements can be found, the UA should assume that the text is new to the current message. It is recommended that this text be enclosed in a DIV or BLOCKQUOTE block with appropriate CITE attributes when it is replied to or forwarded.
Text that is new to the current message can thus be determined in one of three ways:
Properties of the current message may be included in the XML included with the message, or they can be gleaned from the context in which the message was received (e.g., the surrounding MIME message). The latter is likely to be more accurate and more complete (for example, the received time of the message is not known by the sender and hence cannot be present in the XML). When the same property can be determined from both the XML and from the context of the message, the context is preferred.
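The precedence rule can be pictured as a dictionary merge in which values from the context overwrite values from the XML. The function name and the property values below are illustrative only.

```python
def effective_properties(xml_props, context_props):
    """Combine both sources; a property present in both takes the context value."""
    merged = dict(xml_props)
    merged.update(context_props)
    return merged


# Illustrative values, not taken from a real message.
from_xml = {"Subject": "Joke", "AuthorName": "Yogi Beera"}
from_context = {"Subject": "RE: Joke", "Received": "19970114T142305Z"}
print(effective_properties(from_xml, from_context))
```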
Once it determines the message from which a block of text originated, a UA can discover information about an author as follows:
For example, consider the following email exchange:
From: Eric Berman <ericbe@microsoft.com>
To: Dave Raggett <dsr@w3.org>
MIME-Version: 1.0
...
<P>Text from Eric in response to a message from Dave in response to a message from Eric</p>
<BLOCKQUOTE CITE="mid:198d893921432@skdr83.23415h1">
  <P>Text from Dave in response to a message from Eric</p>
  <BLOCKQUOTE CITE="mid:8ah35k32l11@38943k.2313243">
    <P>Original text from Eric</p>
  </blockquote>
</blockquote>
...
<XML>
<?xml:namespace href="http://www.w3.org/schemas/Message" AS "M"/>
<M:MESSAGE M:ABOUT="mid:8ah35k32l11@38943k.2313243">
  <AuthorURL>mailto:ericbe@microsoft.com</AuthorURL>
  ...
</MESSAGE>
<M:MESSAGE M:ABOUT="mid:198d893921432@skdr83.23415h1">
  <AuthorURL>mailto:dsr@w3.org</AuthorURL>
  ...
</MESSAGE>
</XML>
The UA in this case can determine that the first paragraph is new to the current message, the second paragraph came from the message with ID "198d893921432@skdr83.23415h1" (because of the CITE attribute on the BLOCKQUOTE), and the third came from the message with ID "8ah35k32l11@38943k.2313243". In this example, the message with ID "8ah35k32l11@38943k.2313243" and the current message are both by the same author.
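The lookup of the innermost enclosing CITE can be sketched with Python's standard html.parser module. The names CiteTracker and attribute_runs are hypothetical, and the sketch assumes the quoting elements are properly nested; real mail would need more defensive handling.

```python
from html.parser import HTMLParser

QUOTING_TAGS = {"blockquote", "div", "q", "span"}


class CiteTracker(HTMLParser):
    """Pairs each run of text with the CITE of its innermost quoting element."""

    def __init__(self):
        super().__init__()
        self.open_cites = []  # one entry per open quoting element; None if no CITE
        self.runs = []        # (text, cite or None) in document order

    def handle_starttag(self, tag, attrs):
        if tag in QUOTING_TAGS:
            self.open_cites.append(dict(attrs).get("cite"))

    def handle_endtag(self, tag):
        if tag in QUOTING_TAGS and self.open_cites:
            self.open_cites.pop()

    def handle_data(self, data):
        text = " ".join(data.split())
        if not text:
            return
        # Innermost element that actually carries a CITE; None means new text.
        innermost = next((c for c in reversed(self.open_cites) if c), None)
        self.runs.append((text, innermost))


def attribute_runs(html_text):
    tracker = CiteTracker()
    tracker.feed(html_text)
    tracker.close()
    return tracker.runs


body = """<P>Text from Eric</P>
<BLOCKQUOTE CITE="mid:198d893921432@skdr83.23415h1">
<P>Text from Dave</P>
<BLOCKQUOTE CITE="mid:8ah35k32l11@38943k.2313243">
<P>Original text from Eric</P>
</BLOCKQUOTE>
</BLOCKQUOTE>"""

for text, cite in attribute_runs(body):
    print(cite or "(current message)", "|", text)
```

A run paired with None has no enclosing CITE and is therefore treated as new to the current message.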
Default presentation for the excerpted text may be provided through the CLASS attribute on the surrounding elements; this identifies a CSS1 conformant class name. A stylesheet definition for that class MAY be defined, which would then provide default presentation information for all text from that class. The UA may choose to provide this presentation information on a per-author basis (directly applied to the text written by that author) or on a per-message basis (applied to the BLOCKQUOTE or DIV that surrounds the quoted message).
For example, quoted text in a BLOCKQUOTE may have a style applied which displays a vertical bar to the left of the quoted section, as a more robust (and wrappable) mechanism than putting "|" or ">" to the left of each line.
This also provides a default presentation for clients that do not interpret the XML information. UAs with CSS1-conformant HTML display thus automatically get the default presentation for the message. Note that this does not preclude the use of additional styles or formatting within the ranges of quoted information.
Classes can be applied to a wide range of HTML elements. In addition, the receiving UA can alter the appearance of all text authored by a particular individual on the fly (for example, to highlight all text written by that individual).
When quoting a message, the original message is typically wrapped in a BLOCKQUOTE block, with a CITE attribute to reference the source message, and a CLASS attribute for default presentation for the quote. In most HTML display implementations, the BLOCKQUOTE element implies a degree of indentation (which can be overridden by the UA), which helps to preserve the flow of conversation in a message in downlevel clients. For this reason, the use of BLOCKQUOTE is preferred to the use of DIV.
This specification describes a method for embedding properties of people and messages in email. This information should be encrypted if it is considered sensitive, and conforming applications are encouraged to avoid including sensitive information altogether if not required.
This specification also describes a method for attributing text to particular people. UAs MUST NOT assign any degree of trust that this attribution is valid unless the original message can be found, is signed, and the signature can be verified. Even then, however, there is no guarantee that the text has not been modified since being quoted without those changes being tracked according to this specification.
This specification does not describe mechanisms to ensure the authenticity and integrity of XML data associated with a message. In particular, it describes no way to determine whether the XML data has been tampered with. However, existing mechanisms for signing messages can be used and applied to the XML as well as the rest of the message. Additionally, down-level clients may fail to mark text appropriately, resulting in text that can be misattributed.
It may be desirable for UAs to "fix up" the data in XML that arrives with a message. UAs should take care when applying this to signed messages as this can invalidate the signature.
Since data is associated with text as a message is forwarded, or possibly when it is cut/pasted between documents, it is possible for an author or UA to include data with text when that is not the intended behavior. UAs should make it clear to users that text may have associated data, and SHOULD enable the user to remove all of the associated data.
Similarly, if class names used to associate presentation information with a run of text by an author are derived from that author's name and/or email address, then the author of a run of text may be derived by examining the HTML source. If this is not the desired behavior, then other algorithms for generating a class name should be used, or the author should have the option to remove all of the associated classes.
User agents that claim conformance to this specification MUST:
Conforming user agents SHOULD:
This section describes a suggested implementation model for UAs. These are guidelines only and do not constitute a requirement for compliance.
When quoting a message during a reply/forward, it is recommended that the text be encapsulated with BLOCKQUOTE elements, with a CITE attribute identifying the message being quoted, and optionally a CLASS attribute defining default style information. (BLOCKQUOTEs explicitly authored by the user should not have a CITE, or should have a CITE pointing to the current message, so that they can be distinguished from message excerpts.)
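A replying UA might wrap the quoted body as in the sketch below. The function name quote_for_reply is hypothetical, and the body is assumed to be an HTML fragment that is already well formed.

```python
from html import escape


def quote_for_reply(body_html, source_url, class_name=None):
    """Wrap a quoted HTML fragment in a BLOCKQUOTE that cites its source message."""
    attributes = f' CITE="{escape(source_url, quote=True)}"'
    if class_name:
        # CLASS is optional; it only supplies default presentation.
        attributes = f' CLASS="{escape(class_name, quote=True)}"' + attributes
    return f"<BLOCKQUOTE{attributes}>\n{body_html}\n</BLOCKQUOTE>"


print(quote_for_reply(
    "<P>Konck Knock</P>",
    "mid:FD1D3K3KASDDFD17C00805FD459C86BB2BF@POPDOG",
    "yogi--picnic-com",
))
```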
As a message makes several roundtrips, it may acquire several nested BLOCKQUOTE blocks. Because most HTML implementations indent BLOCKQUOTE text, the use of BLOCKQUOTE ensures that recipients that support neither this recommendation nor CSS1 can still display an appropriate level of indentation to the user. However, a mail UA that does not wish to use BLOCKQUOTE may choose other elements, such as DIV, to achieve the same grouping of text within a message (for example, when forwarding a message, where indenting may not be desired).
An application MAY choose to enclose all new message text in a DIV block so that all text is unambiguously tagged. This is not strictly necessary (since the lack of any explicit element or context indicates that the text is new). UAs MUST respect this element. When replying or forwarding the message, UAs MAY change the DIV to a BLOCKQUOTE to avoid needlessly nesting a DIV within a BLOCKQUOTE.
It is common when responding to a message to edit within the quoted block of text. For example, questions asked within the original message may be answered one at a time in the response. These edits can fall into two categories: "block level", where the division between quoted and new text is a block-level boundary, and "inline", where there is no such division.
If the user starts a new block in the middle of an existing one (by hitting enter, for example), the UA must close the enclosing BLOCKQUOTE before the new block and restart the existing style and CITE block after it.
For example, if the original text was:
<BLOCKQUOTE CLASS="dsr--w3-org" CITE="mid:014328a83@389ak3j21h4">
The quick black fox
</BLOCKQUOTE>
and the replying author hits enter before the word "fox", then the resulting HTML would be:
<BLOCKQUOTE CLASS="dsr--w3-org" CITE="mid:014328a83@389ak3j21h4">
The quick black
</BLOCKQUOTE>
<P CLASS="ericbe--microsoft-com">You mean brown!</p>
<BLOCKQUOTE CLASS="dsr--w3-org" CITE="mid:014328a83@389ak3j21h4">
fox
</BLOCKQUOTE>
The text "You mean brown" will thus appear on its own line and in the style defined for "ericbe--microsoft-com". Since the new text is not enclosed in any DIV or BLOCKQUOTE elements, it is considered part of the current message.
Note that this may nest arbitrarily deep, so it may be necessary to close multiple BLOCKQUOTE elements and then reopen each one, with the right CLASS and CITE attributes, and in the right order.
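The close-and-reopen step can be sketched as follows. The function split_for_new_block is hypothetical; it takes the currently open quotes, outermost first, and returns the markup to emit at the insertion point.

```python
from html import escape


def open_tag(class_name, cite):
    """Serialize one BLOCKQUOTE start tag with its CLASS and CITE attributes."""
    return f'<BLOCKQUOTE CLASS="{escape(class_name)}" CITE="{escape(cite)}">'


def split_for_new_block(open_quotes, new_block_html):
    """Close every open quote, emit the new block, then reopen the quotes.

    open_quotes is a list of (class_name, cite) pairs, outermost first, so
    reopening in list order restores the original nesting.
    """
    closing = ["</BLOCKQUOTE>"] * len(open_quotes)
    reopening = [open_tag(name, cite) for name, cite in open_quotes]
    return "\n".join(closing + [new_block_html] + reopening)


print(split_for_new_block(
    [("dsr--w3-org", "mid:014328a83@389ak3j21h4")],
    '<P CLASS="ericbe--microsoft-com">You mean brown!</P>',
))
```

The output is inserted at the cursor position, between "The quick black" and "fox" in the example above.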
If the user types text in-line within a block of text that is not tagged as having been written by that user, then a Q range identifying the current message should be created to encompass this. This allows comments to be inserted in-line. For better compatibility with UAs that do not support the <Q> element, <SPAN> MAY be used, although Q is preferred.
For example, if Pete sends:
...I have a new car...
And Eric replies and adds new text in the middle of this line, the resulting HTML should be:
...I have a<Q CITE="mid:014328a83@389a09fkkh3usj" CLASS="ericbe--microsoft-com">n insanely great</q> new car...
If Pete's text is black sans-serif, and a stylesheet definition exists for .ericbe--microsoft-com defined as maroon boldface sans-serif, then a CSS1 conformant UA would display this as:
I have an insanely great new car
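An in-line insertion of this kind could be produced as in the sketch below. The function insert_inline is hypothetical, and it assumes the insertion offset does not fall inside a tag of the original markup.

```python
from html import escape


def insert_inline(original, offset, new_text, current_message_url, class_name, tag="Q"):
    """Insert new_text at offset, wrapped in a Q (or SPAN) citing the current message."""
    wrapped = (
        f'<{tag} CITE="{escape(current_message_url)}" CLASS="{escape(class_name)}">'
        f"{escape(new_text, quote=False)}</{tag}>"
    )
    return original[:offset] + wrapped + original[offset:]


line = "...I have a new car..."
print(insert_inline(
    line,
    line.index(" new car"),  # insertion point chosen by the user
    "n insanely great",
    "mid:014328a83@389a09fkkh3usj",
    "ericbe--microsoft-com",
))
```

Passing tag="SPAN" yields the fallback form for UAs that do not support Q.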
When pasting text that has no block-level formatting (no <P>, <DIV>, etc. if HTML, or CRLF if plain-text), it should be treated as if it were an in-line edit and a Q range should be created as described above to delimit the text. If the target location for the paste is written by the same author and part of the same message, then the range need not be created.
When pasting text that has block level formatting, it should be pasted at the quoting level of the insertion point. If the clipboard text further contains quoted text (e.g., BLOCKQUOTE with CITE attribute), that block effectively becomes quoted one or more levels deeper than the surrounding text.
When copying selected text from a message, it is important for the UA to identify all of the XML and styles that correspond to the text being moved so that they can be included on the clipboard and merged into the target message's XML and stylesheets at paste time.
When pasted XML refers to the same message or author as XML in the target, the XML in the target message should not be overwritten. When there is a conflict in the style name, the style in the target should not be overwritten.
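The same "target wins" rule applies to both property blocks and styles, so one helper can serve both. In this sketch the XML blocks are keyed by their ABOUT URL and the styles by class name; that representation and the name merge_on_paste are assumptions made for illustration.

```python
def merge_on_paste(target, pasted):
    """Add pasted entries to the target without overwriting existing ones."""
    merged = dict(target)
    for key, value in pasted.items():
        merged.setdefault(key, value)
    return merged


target_styles = {"yogi--picnic-com": "color: black"}
pasted_styles = {"yogi--picnic-com": "color: red",
                 "booboo--jellystone-org": "color: blue"}
print(merge_on_paste(target_styles, pasted_styles))
```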
The UA for the source of the copied text should ensure that the text being copied is enclosed in DIV or BLOCKQUOTE elements that have appropriate attributes (and corresponding XML, and optional style) corresponding to the source message. The first text in the copied block should also be tagged with appropriate author identification, if it isn't already. This ensures that when it is pasted into the target document, it will be distinguishable from the surrounding text.
UAs SHOULD use styles to provide default presentation formatting on a per-author basis. UAs MAY choose to use styles to provide default presentation formatting on a per-message basis. If a style is defined for a range of text, then all text within that range inherits the default formatting for that message.
It is recommended that stylesheet definitions for messages not include block-level formatting such as borders or indentation so that this can be provided by the surrounding BLOCKQUOTE or DIV elements. Thus block-level formatting can be used to help preserve the conversation thread (e.g., indenting each time the message is replied to), while inline characteristics such as color can be used to display author information (inheriting and building off of the indentation, etc., from the surrounding BLOCKQUOTE block). The UA is, of course, free to define and apply additional styles if it requires specific additional formatting.
Unspecified stylesheet properties inherit from surrounding text. For example, a text style that is defined simply as black may be displayed bold if it is surrounded by bold text. For this reason, UAs may choose to ensure that stylesheet definitions for authors be as complete as possible.
Note that the use of formatting at the BLOCKQUOTE and/or DIV level provides default formatting for the text within that block. For example, if a style corresponding to a particular message is defined as red text, then all text from that message will be red unless explicitly formatted otherwise.
UAs may also choose to define styles for authors and apply these styles directly to the text. If so, they should use style names as defined below (so that they may be applied and modified in a consistent way by different UAs).
When creating, replying to, or forwarding a message, it is important to consistently re-use a style class name that corresponds to the author if one already exists, or else to create a new one in a consistent and non-conflicting manner that will be re-used by other UAs as the message thread continues.
Conforming applications SHOULD use the class name stored in the StylesheetClassname property for an author, if present. Otherwise, an algorithm such as the one here may be employed to derive a class name for the author.
Since internet email names more or less uniquely map to a single person or entity, and since they are largely invariant over the lifetime of a message, it is recommended that these provide the basis for class names as follows:
For example, "jim@floober.com" would become the name .jim--floober-com. This results in a name that is highly likely to be unique, is reproducible, and still fairly human legible.
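The derivation can be sketched as below. The substitution rules are inferred only from the class names that appear in this document (such as jim--floober-com and dsr--w3-org), so they are an assumption and may omit steps; the function names are hypothetical.

```python
def derive_class_name(email_address):
    """Derive a class name from an email address.

    Assumed rules, inferred from this document's examples:
    "@" becomes "--" and "." becomes "-". Other characters are left as-is.
    """
    return email_address.replace("@", "--").replace(".", "-")


def class_name_for(person, email_address):
    """Prefer a stored StylesheetClassname; otherwise derive one."""
    return person.get("StylesheetClassname") or derive_class_name(email_address)


print(derive_class_name("jim@floober.com"))
print(class_name_for({"StylesheetClassname": "author-7"}, "jim@floober.com"))
```

The leading "." in .jim--floober-com belongs to the CSS selector, not to the class name itself.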
One feature of this approach is that an arbitrary number of authors can be tracked throughout a conversation, and this information can be preserved through round-trips. As each new person joins the conversation, the number of classes and XML blocks defining authors grows by one; it does not grow otherwise.
Note that this naming convention is just that. It is not a requirement for conformance with this recommendation, nor does it preclude the use of other naming conventions or other styles. For example, for privacy reasons an author may not want to use a derivative of their email name as a class name. In this case an alternative algorithm to generate a unique classname may be used.
The class name that is ultimately derived for an author, by this or by other algorithms, SHOULD be stored in the "StylesheetClassname" property for that author so that all clients can re-use the same classname in the future. This is particularly true if the algorithm described here is not used--otherwise, a single author's text would be displayed with styles from multiple classes. In particular, it may be desirable to use class names that are not related to the author's email name in order to protect their privacy, in which case the StylesheetClassname property is used to ensure consistent class name usage. (Note, of course, that the author's email name is likely present in the XML that includes the StylesheetClassname property, and hence the name can be exposed that way. However, if the XML is removed, then the attributed text automatically becomes "anonymized".)
Note that this algorithm is one-way: given an email name, it generates a stylesheet class name in a predictable manner, but the class name cannot reliably yield an email name. UAs should not assume that any given class name is meaningful; they should use the XML data to determine author info and treat the class name as opaque.
When generating the plain-text stream from HTML, anything within BLOCKQUOTE or DIV that has a CITE attribute should be quoted with an appropriate quote character (e.g., ">"). Nested BLOCKQUOTEs with CITE attributes need to have an appropriate number of quote characters prepended to each line.
Q or SPAN ranges within a BLOCKQUOTE range should also be delimited so that the recipient can tell that somebody added something new to a block. The actual implementation of this delimitation is UA defined, but suggested methods include prepending the user's initials in brackets or surrounding the text with distinctive characters such as asterisks.
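One possible plain-text conversion is sketched below using Python's standard html.parser module. The class and function names are hypothetical. The sketch assumes properly nested markup, uses one ">" per enclosing cited block, and marks cited in-line ranges with asterisks, which is only one of the delimiting methods suggested above.

```python
from html.parser import HTMLParser

BLOCK_QUOTES = {"blockquote", "div"}
INLINE_QUOTES = {"q", "span"}
LINE_BREAKERS = BLOCK_QUOTES | {"p"}


class PlainTextQuoter(HTMLParser):
    def __init__(self):
        super().__init__()
        self.block_cited = []   # per open BLOCKQUOTE/DIV: does it carry a CITE?
        self.inline_cited = []  # per open Q/SPAN: does it carry a CITE?
        self.words = []         # pieces of the line being assembled
        self.lines = []

    def flush(self):
        """Emit the pending line, prefixed by one '>' per cited enclosing block."""
        if self.words:
            depth = sum(self.block_cited)
            prefix = ">" * depth + " " if depth else ""
            self.lines.append(prefix + " ".join(self.words))
            self.words = []

    def handle_starttag(self, tag, attrs):
        has_cite = bool(dict(attrs).get("cite"))
        if tag in LINE_BREAKERS:
            self.flush()
        if tag in BLOCK_QUOTES:
            self.block_cited.append(has_cite)
        elif tag in INLINE_QUOTES:
            self.inline_cited.append(has_cite)

    def handle_endtag(self, tag):
        if tag in LINE_BREAKERS:
            self.flush()
        if tag in BLOCK_QUOTES and self.block_cited:
            self.block_cited.pop()
        elif tag in INLINE_QUOTES and self.inline_cited:
            self.inline_cited.pop()

    def handle_data(self, data):
        text = " ".join(data.split())
        if not text:
            return
        # Text added in-line inside a quoted block is set off with asterisks.
        if any(self.block_cited) and self.inline_cited and self.inline_cited[-1]:
            text = f"*{text}*"
        self.words.append(text)


def to_plain_text(html_text):
    quoter = PlainTextQuoter()
    quoter.feed(html_text)
    quoter.close()
    quoter.flush()
    return "\n".join(quoter.lines)


sample = ('<BLOCKQUOTE CITE="mid:a@x"><P>Konck <SPAN CITE="mid:b@y">you misspelled '
          '"knock"...</SPAN> Knock</P></BLOCKQUOTE><P>Who\'s there?</P>')
print(to_plain_text(sample))
```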
Here is a sample MHTML message that uses HTML Threading:
An original message might go out in a very simple form, like this:
From: Yogi Beera <yogi@picnic.com>
To: Boo Booz <booboo@jellystone.org>
Message-ID: FD1D3K3KASDDFD17C00805FD459C86BB2BF@POPDOG
Subject: Joke
MIME-Version: 1.0
Content-Type: text/html;

<HTML>
<HEAD>
<STYLE TYPE="text/css">
<!--
.yogi--picnic-com {
  color: black;
  font-weight: bold;
  font-style: normal;
  text-decoration: none;
}
-->
</STYLE>
</head>
<BODY bgcolor="#FFFFFF">
<P CLASS=yogi--picnic-com>Konck Knock</p>
</body>
</html>
The response would quote this message and might result in something like this:
From: Boo Booz <booboo@jellystone.org>
To: Yogi Beera <yogi@picnic.com>
Message-ID: A3F9D8E1A3FD23DF3SDF3S321FD459C86BB2BF@POPDOG
Subject: RE: Joke
MIME-Version: 1.0
Content-Type: multipart/related; boundary="---- =_NextPart_000_01BC180B.74A74F40"

------ =_NextPart_000_01BC180B.74A74F40
Content-Type: text/html; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit

<HTML>
<HEAD>
<LINK REL=HTMLAttrib HREF="cid:abodkelsa">
<STYLE TYPE="text/css">
<!--
.yogi--picnic-com {
  color: black;
  font-weight: bold;
  font-style: normal;
  text-decoration: none;
}
.booboo--jellystone-org {
  color: blue;
  font-weight: normal;
  font-style: normal;
  text-decoration: none;
}
.attribution {
  font-style: italic;
  text-decoration: underline;
  margin-bottom: 0em;
}
-->
</STYLE>
</head>
<body bgcolor="#FFFFFF">
<BLOCKQUOTE CITE="mid:FD1D3K3KASDDFD17C00805FD459C86BB2BF@POPDOG">
<SPAN CITE="mid:A3F9D8E1A3FD23DF3SDF3S321FD459C86BB2BF@POPDOG">
<P CLASS="attribution">On Tuesday, 1/14, Yogi Beera wrote:</p>
</span>
<P CLASS=yogi--picnic-com>Konck
<SPAN CITE="mid:A3F9D8E1A3FD23DF3SDF3S321FD459C86BB2BF@POPDOG" CLASS="booboo--jellystone-org">
you misspelled "knock"...
</span>Knock</p>
</blockquote>
<P CLASS="booboo--jellystone-org">Who's there?</p>
</body>
</html>

------ =_NextPart_000_01BC180B.74A74F40
Content-Type: application/xml
Content-Transfer-Encoding: 7bit
Content-ID: abodkelsa

<XML>
<?xml:namespace HREF="http://www.w3.org/schemas/Message" AS "M">
<?xml:namespace HREF="http://www.w3.org/schemas/Person" AS "P">
<M:MESSAGE M:ABOUT="mid:FD1D3K3KASDDFD17C00805FD459C86BB2BF@POPDOG">
  <Date>19970108T073008Z</Date>
  <From>Yogi Beera &lt;yogi@picnic.com&gt;</From>
  <AuthorURL>mailto:yogi@picnic.com</AuthorURL>
</MESSAGE>
<M:MESSAGE M:ABOUT="mid:A3F9D8E1A3FD23DF3SDF3S321FD459C86BB2BF@POPDOG">
  <Date>19970114T142300Z</Date>
  <From>Boo Booz &lt;booboo@jellystone.org&gt;</From>
  <AuthorURL>mailto:booboo@jellystone.org</AuthorURL>
  <In-Reply-To>FD1D3K3KASDDFD17C00805FD459C86BB2BF@POPDOG</In-Reply-To>
</MESSAGE>
<P:PERSON P:ABOUT="mailto:yogi@picnic.com">
  <CN>Yogi Beera</CN>
  <O>Picnic Corp.</O>
</PERSON>
<P:PERSON P:ABOUT="mailto:booboo@jellystone.org">
  <CN>Boo Booz</CN>
  <O>Jellystone National Park</O>
</PERSON>
</XML>

------ =_NextPart_000_01BC180B.74A74F40--
Dave Raggett, Dan Connolly, Brett Marl, Alex Hopmann, Ralph Swick, Darren Apfel, Thomas Reardon, Chris Wilson, Yaron Goland, Scott Isaacs, George Hatoun, Andrew Layman, others.....
2/3/97:
2/6/97
2/20/97
2/28/97
3/11/97
3/28/97
4/3/97
4/6/97
4/8/97
5/9/97
8/8/97
9/17/97
9/30/97
10/23/97
10/27/97