WF2: application/x-www-form-urlencoded encoding ill-defined

Dear Web Application Formats Working Group,

  http://www.w3.org/TR/2006/WD-web-forms-2-20060821/ section 5.3, item 4
reads:

  Control names and values are escaped. Space characters are replaced by
  "+" (U+002B), and other non-alphanumeric characters are encoded in the
  submission character encoding and each resulting byte is replaced by
  "%HH", a percent sign (U+0025) and two uppercase hexadecimal digits
  representing the value of the byte.

This text is unclear and incorrect. It does not define what
non-alphanumeric characters are (and whatever it means, it's incorrect);
the character encoding is applied to the whole string, not just to
non-alphanumeric characters; and %hh encoding is applied based on what
the bytes are, not on what the characters were.

Consider the following cases:

  * If the encoding is UTF-8 and the value is "_", implementations
    should not apply %hh encoding to it, even though it is not
    alphanumeric.

  * If the encoding is UTF-7 and the value is "ö", the byte sequence is
    +APY-, and implementations should apply %hh escaping only to the
    "+", not to the whole sequence or to nothing (depending on whether
    "ö" is considered alphanumeric).
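
To make the two cases concrete, here is a rough Python sketch of the
order of operations I have in mind: encode the whole string first, then
decide about escaping byte by byte. The function name form_escape and
the SAFE_BYTES set are made up for illustration; in particular the safe
set (ASCII alphanumerics plus "-", "_", "." and "*") is only an
assumption, not something the draft or this message establishes.

```python
# Hypothetical set of bytes left unescaped. This is an assumption for
# illustration only; the exact set is the open question in this message.
SAFE_BYTES = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    b"abcdefghijklmnopqrstuvwxyz"
    b"0123456789"
    b"-_.*"
)


def form_escape(text: str, encoding: str) -> str:
    """Encode the whole string, then escape per byte (illustrative only)."""
    # Step 1: the character encoding is applied to the entire string,
    # not only to the non-alphanumeric characters.
    raw = text.encode(encoding)

    # Step 2: the escaping decision looks at each resulting byte,
    # not at the character it came from.
    out = []
    for byte in raw:
        if byte == 0x20:
            out.append("+")                    # space byte becomes "+"
        elif byte in SAFE_BYTES:
            out.append(chr(byte))              # safe byte passes through
        else:
            out.append("%{:02X}".format(byte)) # everything else is %HH
    return "".join(out)


if __name__ == "__main__":
    print(form_escape("_", "utf-8"))   # _
    print(form_escape("ö", "utf-7"))   # %2BAPY-
    print(form_escape("ö", "utf-8"))   # %C3%B6
```

Under that assumed safe set, "_" in UTF-8 stays as it is, and "ö" in
UTF-7 becomes %2BAPY-, i.e. only the "+" byte is escaped.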

Please change the draft in a way that properly reflects the above and
current implementations. I don't know the exact set of bytes that need
to have %hh encoding applied, but I suspect the set is similar to that
of characters considered reserved in the query string as per RFC 3986.

regards,
-- 
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 

Received on Wednesday, 6 September 2006 01:54:07 UTC