LC315 serialization escaping

Hi all,

This is regarding the serialization of whttp:header into HTTP.

Apparently, in HTTP/1.1 (RFC 2616) header values follow RFC 822 (Internet
text messages) section 3.1 (or 2.2 in the updated RFC 2822), which allows
any ASCII character except CR and LF in the header body. At any LWSP
character (space or horizontal tab) the value can be split by inserting a
CRLF before that LWSP character. For example:

Header: foo bar

can be represented as 

Header: foo
 bar
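
To make the equivalence concrete, here is a minimal sketch in Python of
the reverse operation (unfolding). The function name unfold is just
something I made up for illustration, and it assumes the input uses CRLF
line endings:

```python
import re

def unfold(raw: str) -> str:
    """Remove every CRLF that is immediately followed by a space or tab."""
    # The lookahead keeps the LWSP character itself in the value.
    return re.sub(r"\r\n(?=[ \t])", "", raw)

folded = "Header: foo\r\n bar"
print(unfold(folded))  # Header: foo bar
```

A CRLF that is not followed by LWSP is left alone, since it ends the
header rather than continuing it.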

It seems that when serializing the HTTP header value, we need to handle
all non-ASCII characters (and CR and LF) somehow. I'd propose the normal
XML character reference escaping, i.e. &#num;, but & is a perfectly normal
character in header fields, so we wouldn't want it to become &amp; or
a numeric character reference. Because the RFCs don't seem to define
escaping for characters that are not allowed, I suggest that we simply
restrict, in prose, the content to ASCII characters other than CR and LF,
i.e. ASCII codes 1 to 127 except 10 and 13.
Restricting it in the schema would mean people couldn't use xs:int etc.,
because they would have to extend our type, which would probably be
based on xs:string.
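
As an illustration of what the prose restriction would mean for an
implementation, here is a small sketch in Python. The function name
is_valid_header_value is hypothetical and not part of any proposal; it
simply checks the range suggested above:

```python
def is_valid_header_value(value: str) -> bool:
    """True if every character is ASCII 1..127 and is neither LF (10) nor CR (13)."""
    return all(
        1 <= ord(ch) <= 127 and ord(ch) not in (10, 13)
        for ch in value
    )

print(is_valid_header_value("foo & bar"))   # True
print(is_valid_header_value("foo\r\nbar"))  # False
```

Note that & passes unchanged, which is the point of not using XML-style
escaping here.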

When serializing, we should suggest that, due to line length
considerations (also in the RFCs), values be split across multiple
lines whenever a line would otherwise exceed 78 characters (excluding
the ending CRLF). This limit is taken from RFC 2822; RFC 2616 doesn't
mention such limits and RFC 822 suggests less than 65 or 72. RFC 2822
also has a hard limit (MUST) of 998 characters per line, again excluding
the CRLF, and we might want to include that one too.
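
A rough sketch of such splitting in Python follows. The name fold and
the limit parameter are my own; I am assuming the limit counts the
characters of a line without its CRLF, which is how I read RFC 2822
section 2.1.1. It is not meant as a definitive algorithm:

```python
def fold(line: str, limit: int = 78) -> str:
    """Fold a header line at spaces/tabs so that lines stay within `limit`
    characters (not counting CRLF) wherever a fold point exists."""
    out = []
    rest = line
    while len(rest) > limit:
        # Latest space or tab at index 1..limit; index 0 is skipped so
        # that we never emit an empty line.
        cut = max(rest.rfind(" ", 1, limit + 1), rest.rfind("\t", 1, limit + 1))
        if cut == -1:
            # No fold point early enough: use the first later one, if any.
            later = [i for i in (rest.find(" ", limit + 1),
                                 rest.find("\t", limit + 1)) if i != -1]
            if not later:
                break  # nothing to fold at; the line stays over-long
            cut = min(later)
        out.append(rest[:cut])
        rest = rest[cut:]  # continuation starts with the LWSP character
    out.append(rest)
    return "\r\n".join(out)

print(fold("Header: foo bar baz qux", limit=10).split("\r\n"))
# ['Header:', ' foo bar', ' baz qux']
```

A value without any space or tab cannot be folded at all, so the 998
character hard limit would still have to be checked separately.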

Further, the header name (we should soon have a proposal to add an
attribute specifying it) can consist only of printable ASCII characters
except the colon, i.e. codes 33 to 57 and 59 to 126. We should similarly
restrict the value of that attribute, and here we can safely create an
XML Schema type to enforce the restriction.
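
For illustration, the name restriction can be written as a single
character class. The sketch below is Python; HEADER_NAME and
is_valid_header_name are names I invented, and I am assuming that an
empty name should be rejected:

```python
import re

# Codes 33-57 (0x21-0x39) and 59-126 (0x3B-0x7E): printable ASCII minus ':'.
HEADER_NAME = re.compile(r"[\x21-\x39\x3B-\x7E]+")

def is_valid_header_name(name: str) -> bool:
    """True if `name` is non-empty and uses only the allowed characters."""
    return HEADER_NAME.fullmatch(name) is not None

print(is_valid_header_name("Content-Type"))  # True
print(is_valid_header_name("Bad:Name"))      # False
```

I would expect the same character class to carry over to a pattern
facet on an xs:string restriction, but I have not tried that.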

This seems to cover my action item. 8-)

Best regards,

Jacek

Received on Thursday, 15 September 2005 16:42:32 UTC