    To: "Joe & J. Harvey" <ddd @Org>, JJV @ BBN

can be represented as:

    To: "Joe & J. Harvey" <ddd @ Org>,
            JJV@BBN

and

    To: "Joe & J. Harvey"
                    <ddd@ Org>, JJV
     @BBN

and

    To: "Joe
     & J. Harvey" <ddd @ Org>, JJV @ BBN

The process of moving from this folded multiple-line representation of a header field to its single line representation is called "unfolding". Unfolding is accomplished by regarding CRLF immediately followed by a LWSP-char as equivalent to the LWSP-char.
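The unfolding rule can be sketched in a few lines of Python. This is a minimal illustration, not part of the standard; the function name `unfold` is hypothetical, and the input is assumed to be one complete header field that still carries its internal CRLFs.

```python
import re

# A fold point is a CRLF immediately followed by a LWSP-char (SPACE or HTAB).
# The lookahead keeps the LWSP-char and drops only the CRLF.
_FOLD = re.compile(r"\r\n(?=[ \t])")


def unfold(raw_field):
    """Return the single-line representation of a folded header field."""
    return _FOLD.sub("", raw_field)


folded = 'To: "Joe & J. Harvey" <ddd @ Org>,\r\n        JJV@BBN'
print(unfold(folded))
```

A CRLF that is not followed by a SPACE or HTAB is left alone, since it ends the field rather than folding it.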
Certain field-bodies of headers may be interpreted according to an internal syntax that some systems may wish to parse. These fields are called "structured fields". Examples include fields containing dates and addresses. Other fields, such as "Subject" and "Comments", are regarded simply as strings of text.
Field-names, unstructured field bodies and structured field bodies each are scanned by their own, independent "lexical" analyzers.
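The lexical symbols used in structured field bodies are:

    - individual special characters
    - quoted-strings
    - domain-literals
    - comments
    - atoms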
So, for example, the folded body of an address field

    ":sysmail"@ Some-Group. Some-Org,
    Muhammed.(I am the greatest) Ali @(the)Vegas.WBA

is analyzed into the following lexical symbols and types:

    :sysmail              quoted string
    @                     special
    Some-Group            atom
    .                     special
    Some-Org              atom
    ,                     special
    Muhammed              atom
    .                     special
    (I am the greatest)   comment
    Ali                   atom
    @                     special
    (the)                 comment
    Vegas                 atom
    .                     special
    WBA                   atom

The canonical representations for the data in these addresses are the following strings:

    ":sysmail"@Some-Group.Some-Org

and

    Muhammed.Ali@Vegas.WBA
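The lexical analysis illustrated by this example can be sketched as a small scanner. This is only an illustration under stated assumptions: the name `lex` and the type labels are hypothetical, the body is assumed to be already unfolded, quoted-pairs inside a quoted-string are left as written, and every "@" is classified as a special, as the specials rule requires.

```python
SPECIALS = set('()<>@,;:\\".[]')


def lex(body):
    """Split an unfolded structured field body into (text, type) pairs."""
    tokens = []
    i, n = 0, len(body)
    while i < n:
        ch = body[i]
        if ch in " \t":                      # LWSP-char only separates symbols
            i += 1
        elif ch == '"':                      # quoted-string
            j = i + 1
            while j < n and body[j] != '"':
                j += 2 if body[j] == "\\" else 1   # step over a quoted-pair
            if j >= n:
                raise ValueError("unterminated quoted-string")
            tokens.append((body[i + 1:j], "quoted-string"))
            i = j + 1
        elif ch == "(":                      # comment; comments may nest
            depth, j = 1, i + 1
            while j < n and depth:
                if body[j] == "\\":
                    j += 1                   # step over the quoted character
                elif body[j] == "(":
                    depth += 1
                elif body[j] == ")":
                    depth -= 1
                j += 1
            if depth:
                raise ValueError("unterminated comment")
            tokens.append((body[i:j], "comment"))
            i = j
        elif ch == "[":                      # domain-literal; no nesting
            j = i + 1
            while j < n and body[j] != "]":
                j += 2 if body[j] == "\\" else 1
            if j >= n:
                raise ValueError("unterminated domain-literal")
            tokens.append((body[i:j + 1], "domain-literal"))
            i = j + 1
        elif ch in SPECIALS:                 # any other special stands alone
            tokens.append((ch, "special"))
            i += 1
        else:                                # atom: no specials, SPACE or CTLs
            j = i
            while j < n and body[j] not in SPECIALS and " " < body[j] < "\x7f":
                j += 1
            if j == i:
                raise ValueError("character not allowed in an atom")
            tokens.append((body[i:j], "atom"))
            i = j
    return tokens


example = ('":sysmail"@ Some-Group. Some-Org, '
           'Muhammed.(I am the greatest) Ali @(the)Vegas.WBA')
for text, kind in lex(example):
    print(text.ljust(22), kind)
```

The quoted-string is reported without its enclosing quotation marks, while a comment keeps its parentheses, matching the presentation used in the example.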
    field       =  field-name ":" [ field-body ] CRLF

    field-name  =  1*<any CHAR, excluding CTLs, SPACE, and ":">

    field-body  =  field-body-contents
                   [CRLF LWSP-char field-body]

    field-body-contents =
                  <the ASCII characters making up the field-body, as
                   defined in the following sections, and consisting
                   of combinations of atom, quoted-string, and
                   specials tokens, or else consisting of texts>
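As an illustration of the field rule, the following sketch separates a field-name from its field-body. The name `parse_field` is hypothetical; the input is assumed to be a single unfolded field with its terminating CRLF already removed, and the body is trimmed of surrounding SPACE and HTAB purely for convenience.

```python
def parse_field(line):
    """Split an unfolded header field into (field-name, field-body)."""
    name, colon, body = line.partition(":")
    if not colon:
        raise ValueError("missing ':' after field-name")
    # field-name = 1*<any CHAR, excluding CTLs, SPACE, and ":">
    # "!" through "~" is exactly the printable ASCII range without SPACE.
    if not name or any(not ("!" <= c <= "~") for c in name):
        raise ValueError("illegal field-name: %r" % name)
    return name, body.strip(" \t")


print(parse_field("Subject: Re: lunch"))
```

Only the first colon ends the field-name; later colons belong to the field-body.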
                                                ; (  Octal, Decimal.)
    CHAR        =  <any ASCII character>        ; (  0-177,  0.-127.)
    ALPHA       =  <any ASCII alphabetic character>
                                                ; (101-132, 65.- 90.)
                                                ; (141-172, 97.-122.)
    DIGIT       =  <any ASCII decimal digit>    ; ( 60- 71, 48.- 57.)
    CTL         =  <any ASCII control           ; (  0- 37,  0.- 31.)
                    character and DEL>          ; (    177,     127.)
    CR          =  <ASCII CR, carriage return>  ; (     15,      13.)
    LF          =  <ASCII LF, linefeed>         ; (     12,      10.)
    SPACE       =  <ASCII SP, space>            ; (     40,      32.)
    HTAB        =  <ASCII HT, horizontal-tab>   ; (     11,       9.)
    <">         =  <ASCII quote mark>           ; (     42,      34.)
    CRLF        =  CR LF

    LWSP-char   =  SPACE / HTAB                 ; semantics = SPACE

    linear-white-space =  1*([CRLF] LWSP-char)  ; semantics = SPACE
                                                ; CRLF => folding

    specials    =  "(" / ")" / "<" / ">" / "@"  ; Must be in quoted-
                /  "," / ";" / ":" / "\" / <">  ;  string, to use
                /  "." / "[" / "]"              ;  within a word.

    delimiters  =  specials / linear-white-space / comment

    text        =  <any CHAR, including bare    ; => atoms, specials,
                    CR & bare LF, but NOT       ;  comments and
                    including CRLF>             ;  quoted-strings are
                                                ;  NOT recognized.

    atom        =  1*<any CHAR except specials, SPACE and CTLs>

    quoted-string = <"> *(qtext/quoted-pair) <">; Regular qtext or
                                                ;   quoted chars.

    qtext       =  <any CHAR excepting <">,     ; => may be folded
                    "\" & CR, and including
                    linear-white-space>

    domain-literal =  "[" *(dtext / quoted-pair) "]"

    dtext       =  <any CHAR excluding "[",     ; => may be folded
                    "]", "\" & CR, & including
                    linear-white-space>

    comment     =  "(" *(ctext / quoted-pair / comment) ")"

    ctext       =  <any CHAR excluding "(",     ; => may be folded
                    ")", "\" & CR, & including
                    linear-white-space>

    quoted-pair =  "\" CHAR                     ; may quote any char

    phrase      =  1*word                       ; Sequence of words

    word        =  atom / quoted-string
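This mechanism is not fully general. Characters may be quoted only within a subset of the lexical constructs. In particular, quoting is limited to use within:

    - quoted-string
    - domain-literal
    - comment

Quoting is not permitted within atoms. For example, a specification such as:

    Full\ Name@Domain

is not legal and must be specified as:

    "Full Name"@Domain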
Note: In structured field bodies, multiple linear space ASCII characters (namely HTABs and SPACEs) are treated as single spaces and may freely surround any symbol. In all header fields, the only place in which at least one LWSP-char is REQUIRED is at the beginning of continuation lines in a folded field.

When passing text to processes that do not interpret text according to this standard (e.g., mail protocol servers), then NO linear-white-space characters should occur between a period (".") or at-sign ("@") and a <word>. Exactly ONE SPACE should be used in place of arbitrary linear-white-space and comment sequences.

Note: Within systems conforming to this standard, wherever a member of the list of delimiters is allowed, LWSP-chars may also occur before and/or after it.

Writers of mail-sending (i.e., header-generating) programs should realize that there is no network-wide definition of the effect of ASCII HT (horizontal-tab) characters on the appearance of text at another network host; therefore, the use of tabs in message headers, though permitted, is discouraged.
A comment is a set of ASCII characters, which is enclosed in matching parentheses and which is not within a quoted-string. The comment construct permits message originators to add text which will be useful for human readers, but which will be ignored by the formal semantics. Comments should be retained while the message is subject to interpretation according to this standard. However, comments must NOT be included in other cases, such as during protocol exchanges with mail servers.

Comments nest, so that if an unquoted left parenthesis occurs in a comment string, there must also be a matching right parenthesis. When a comment acts as the delimiter between a sequence of two lexical symbols, such as two atoms, it is lexically equivalent with a single SPACE, for the purposes of regenerating the sequence, such as when passing the sequence onto a mail protocol server. Comments are detected as such only within field-bodies of structured fields.

If a comment is to be "folded" onto multiple lines, then the syntax for folding must be adhered to. (See the "Lexical Analysis of Messages" section on "Folding Long Header Fields" above, and the section on "Case Independence" below.) Note that the official semantics therefore do not "see" any unquoted CRLFs that are in comments, although particular parsing programs may wish to note their presence. For these programs, it would be reasonable to interpret a "CRLF LWSP-char" as being a CRLF that is part of the comment; i.e., the CRLF is kept and the LWSP-char is discarded. Quoted CRLFs (i.e., a backslash followed by a CR followed by a LF) still must be followed by at least one LWSP-char.
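The treatment of a comment as a single SPACE can be sketched as follows. This is a hedged illustration only: the name `strip_comments` is hypothetical, the body is assumed to be an unfolded structured field body, and runs of white space in the result are not collapsed further.

```python
def strip_comments(body):
    """Replace each top-level comment in a structured body by one SPACE."""
    out = []
    depth = 0            # current comment nesting level
    in_quotes = False    # True while inside a quoted-string
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body):
            # A quoted-pair never opens or closes anything; copy it
            # unless it sits inside a comment being discarded.
            if depth == 0:
                out.append(body[i:i + 2])
            i += 2
            continue
        if depth == 0 and ch == '"':
            in_quotes = not in_quotes
            out.append(ch)
        elif not in_quotes and ch == "(":
            depth += 1
        elif not in_quotes and ch == ")" and depth > 0:
            depth -= 1
            if depth == 0:
                out.append(" ")   # the whole comment counts as one SPACE
        elif depth == 0:
            out.append(ch)
        i += 1
    if depth:
        raise ValueError("unbalanced parenthesis in comment")
    return "".join(out)


print(strip_comments("Muhammed.(I am the greatest) Ali @(the)Vegas.WBA"))
```

Parentheses inside a quoted-string are copied unchanged, since a comment is only recognized outside quoted-strings.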
The quote character (backslash) and characters that delimit syntactic units are not, generally, to be taken as data that are part of the delimited or quoted unit(s). In particular, the quotation-marks that define a quoted-string, the parentheses that define a comment and the backslash that quotes a following character are NOT part of the quoted-string, comment or quoted character. A quotation-mark that is to be part of a quoted-string, a parenthesis that is to be part of a comment and a backslash that is to be part of either must each be preceded by the quote-character backslash ("\").

Note that the syntax allows any character to be quoted within a quoted-string or comment; however only certain characters MUST be quoted to be included as data. These characters are the ones that are not part of the alternate text group (i.e., ctext or qtext).

The one exception to this rule is that a single SPACE is assumed to exist between contiguous words in a phrase, and this interpretation is independent of the actual number of LWSP-chars that the creator places between the words. To include more than one SPACE, the creator must make the LWSP-chars be part of a quoted-string.

Quotation marks that delimit a quoted string and backslashes that quote the following character should NOT accompany the quoted-string when the string is passed to processes that do not interpret data according to this specification (e.g., mail protocol servers).
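The separation between delimiters and data can be illustrated with a pair of helper functions. Both names, `unquote` and `quote`, are hypothetical, and the sketch assumes an unfolded quoted-string; `quote` escapes only the characters excluded from qtext (quotation mark, backslash and CR).

```python
def unquote(quoted):
    """Return the data carried by a quoted-string, without its delimiters."""
    if len(quoted) < 2 or quoted[0] != '"' or quoted[-1] != '"':
        raise ValueError("not a quoted-string")
    inner = quoted[1:-1]
    out = []
    i = 0
    while i < len(inner):
        if inner[i] == "\\":          # quoted-pair: keep only the next char
            i += 1
            if i == len(inner):
                raise ValueError("dangling backslash")
        out.append(inner[i])
        i += 1
    return "".join(out)


def quote(data):
    """Wrap data as a quoted-string, quoting the characters that require it."""
    escaped = "".join("\\" + c if c in '"\\\r' else c for c in data)
    return '"' + escaped + '"'


print(unquote(r'"Joe \"J\" Harvey"'))
```

Applying `unquote` to the result of `quote` returns the original data, which is the property a process needs before handing the string to something that does not interpret this syntax.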
Where permitted (i.e., in words in structured fields) quoted-strings are treated as a single symbol. That is, a quoted-string is equivalent to an atom, syntactically. If a quoted-string is to be "folded" onto multiple lines, then the syntax for folding must be adhered to. (See the "Lexical Analysis of Messages" section on "Folding Long Header Fields" above, and the section on "Case Independence" below.) Therefore, the official semantics do not "see" any bare CRLFs that are in quoted-strings; however particular parsing programs may wish to note their presence. For such programs, it would be reasonable to interpret a "CRLF LWSP-char" as being a CRLF which is part of the quoted-string; i.e., the CRLF is kept and the LWSP-char is discarded. Quoted CRLFs (i.e., a backslash followed by a CR followed by a LF) are also subject to rules of folding, but the presence of the quoting character (backslash) explicitly indicates that the CRLF is data to the quoted string. Stripping off the first following LWSP-char is also appropriate when parsing quoted CRLFs.
There is one type of bracket which must occur in matched pairs and may have pairs nested within each other:

    o   Parentheses ("(" and ")") are used to indicate comments.

There are three types of brackets which must occur in matched pairs, and which may NOT be nested:

    o   Colon/semi-colon (":" and ";") are used in address specifications to indicate that the included list of addresses is to be treated as a group.

    o   Angle brackets ("<" and ">") are generally used to indicate the presence of one machine-usable reference (e.g., delimiting mailboxes), possibly including source-routing to the machine.

    o   Square brackets ("[" and "]") are used to indicate the presence of a domain-literal, which the appropriate name-domain is to use directly, bypassing normal name-resolution mechanisms.
The only syntactic units which require preservation of case information are:

    - text
    - qtext
    - dtext
    - ctext
    - quoted-pair
    - local-part, except "Postmaster"

When matching any other syntactic unit, case is to be ignored. For example, the field-names "From", "FROM", "from", and even "FroM" are semantically equal and should all be treated identically.

When generating these units, any mix of upper and lower case alphabetic characters may be used. The case shown in this specification is suggested for message-creating processes.

Note: The reserved local-part address unit, "Postmaster", is an exception. When the value "Postmaster" is being interpreted, it must be accepted in any mixture of case, including "POSTMASTER", and "postmaster".
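The case rules can be sketched as two comparison helpers. The names `field_names_match` and `local_parts_match` are hypothetical, and the sketch assumes plain ASCII input, so a simple lower-casing is sufficient.

```python
def field_names_match(a, b):
    """Field-names are matched without regard to case."""
    return a.lower() == b.lower()


def local_parts_match(a, b):
    """Local-parts preserve case, except the reserved name "Postmaster"."""
    if a.lower() == "postmaster" or b.lower() == "postmaster":
        return a.lower() == b.lower()
    return a == b


print(field_names_match("From", "FroM"))
print(local_parts_match("POSTMASTER", "postmaster"))
print(local_parts_match("Ali", "ali"))
```

The three calls show a case-insensitive field-name match, the "Postmaster" exception, and an ordinary local-part whose case is significant.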
During transmission through heterogeneous networks, it may be necessary to force data to conform to a network's local conventions. For example, it may be required that a CR be followed either by LF, making a CRLF, or by <null> (if the CR is to stand alone). Such transformations are reversed when the message exits that network.

When crossing network boundaries, the message should be treated as passing through two modules. It will enter the first module containing whatever network-specific transformations were necessary to permit migration through the "current" network. It then passes through the modules:

    o   Transformation Reversal

        The "current" network's idiosyncrasies are removed and the message is returned to the canonical form specified in this standard.

    o   Transformation

        The "next" network's local idiosyncrasies are imposed on the message.

                 ------------------
     From   ==> | Remove Net-A     |
     Net-A      | idiosyncrasies   |
                 ------------------
                        ||
                        \/
                   Conformance
                  with standard
                        ||
                        \/
                 ------------------
                | Impose Net-B     | ==> To
                | idiosyncrasies   |     Net-B
                 ------------------
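The two-module model can be sketched with the CR convention mentioned above as the example idiosyncrasy. Everything here is an assumption made for illustration: the function names are hypothetical, Net-A is taken to require a NUL after every bare CR, and Net-B is taken to impose nothing.

```python
import re


def impose_cr_nul(data):
    """Impose the convention: a CR not followed by LF is followed by NUL."""
    return re.sub(rb"\r(?!\n)", b"\r\x00", data)


def remove_cr_nul(data):
    """Reverse the convention: drop the NUL that follows a CR."""
    return data.replace(b"\r\x00", b"\r")


def gateway(data, remove_current, impose_next):
    """Return the message to canonical form, then adapt it to the next net."""
    canonical = remove_current(data)
    return impose_next(canonical)


def no_change(data):
    """Stand-in for a network with no local conventions of its own."""
    return data


from_net_a = b"Subject: bare\r\x00CR\r\n"
print(gateway(from_net_a, remove_cr_nul, no_change))
```

Keeping the reversal and the imposition as separate steps means each network only needs to know its own conventions and the canonical form, never the conventions of its neighbors.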