This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Bug 12605 - section 4.10.22.5, step 4.4: simplify x-www-form-urlencoded encoding algorithm
Status: CLOSED FIXED
Alias: None
Product: HTML WG
Classification: Unclassified
Component: LC1 HTML5 spec
Version: unspecified
Hardware: Other other
Importance: P3 normal
Target Milestone: ---
Assignee: Ian 'Hixie' Hickson
QA Contact: HTML WG Bugzilla archive list
URL: http://www.whatwg.org/specs/web-apps/...
Reported: 2011-05-05 11:28 UTC by contributor
Modified: 2011-08-04 05:02 UTC

Description contributor 2011-05-05 11:28:13 UTC
Specification: http://www.whatwg.org/specs/web-apps/current-work/multipage/association-of-controls-and-forms.html
Section: http://www.whatwg.org/specs/web-apps/current-work/#url-encoded-form-data

Comment:
section 4.10.22.5, step 4: first sub-step deals with U+0020 space, so U+0020
and 0x20 can be removed from the subsequent points

Posted from: 218.120.54.137
User agent: Opera/9.80 (Windows NT 5.1; U; ja) Presto/2.8.131 Version/11.10
Comment 1 Hallvord R. M. Steen 2011-05-05 12:18:35 UTC
I think part of the algorithm in step 4.4 is superfluous: one step says "if the character isn't in the range", and inside that if-block another step says "if the character IS in the range", giving the exact same range of character codes.

Wouldn't this:

       <!-- * - . _ 0-9 a-z A-Z -->

       <dt>If the character isn't in the range U+0020, U+002A,
       U+002D, U+002E, U+0030 to U+0039, U+0041 to U+005A, U+005F,
       U+0061 to U+007A</dt>

       <dd>

        <p>Replace the character with a string formed as follows:</p>

        <ol><li><p>Let <var title="">s</var> be an empty string.</li>

         <li>

          <p>For each byte <var title="">b</var> of the character when
          expressed in the selected character encoding in turn, run
          the appropriate subsubsubstep from the list below:</p>

          <dl class="switch"><dt>If the byte is in the range 0x20, 0x2A, 0x2D, 0x2E,
           0x30 to 0x39, 0x41 to 0x5A, 0x5F, 0x61 to 0x7A</dt>

           <dd><p>Append to <var title="">s</var> the Unicode
           character with the code point equal to the byte.</dd>

           <dt>Otherwise</dt>

           <dd><p>Append to the string a U+0025 PERCENT SIGN character
           (%) followed by two characters in the ranges U+0030 DIGIT
           ZERO (0) to U+0039 DIGIT NINE (9) and U+0041 LATIN CAPITAL
           LETTER A to U+0046 LATIN CAPITAL LETTER F representing the
           hexadecimal value of the byte (zero-padded if
           necessary).</dd>

          </dl></li>

        </ol></dd>

       <dt>Otherwise</dt>

       <dd><p>Leave the character as is.</dd>

      </dl></li>

be better written as the following?


       <!-- * - . _ 0-9 a-z A-Z -->

       <dt>If the character is in the range U+002A,
       U+002D, U+002E, U+0030 to U+0039, U+0041 to U+005A, U+005F,
       U+0061 to U+007A</dt>
       
       <dd><p>Leave the character as is.</dd>

       <dt>Otherwise</dt>

       <dd>

        <p>Replace the character with a string formed as follows:</p>

        <ol><li><p>Let <var title="">s</var> be an empty string.</li>

         <li>

          <p>For each byte <var title="">b</var> of the character when
          expressed in the selected character encoding in turn,  append 
          to the string a U+0025 PERCENT SIGN character
           (%) followed by two characters in the ranges U+0030 DIGIT
           ZERO (0) to U+0039 DIGIT NINE (9) and U+0041 LATIN CAPITAL
           LETTER A to U+0046 LATIN CAPITAL LETTER F representing the
           hexadecimal value of the byte (zero-padded if
           necessary).
          </li>

        </ol></dd>



      </dl></li>
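
For illustration only, the rewrite proposed above can be sketched in Python roughly as follows. The helper name `encode_proposed` and the constant `SAFE` are hypothetical and do not come from the spec. The sketch assumes U+0020 SPACE has already been handled by the earlier sub-step mentioned in the Description, and that every character is representable in the selected encoding.

```python
# Characters the proposed text leaves as is: * - . _ 0-9 A-Z a-z
SAFE = set("*-._0123456789"
           "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
           "abcdefghijklmnopqrstuvwxyz")


def encode_proposed(text: str, encoding: str = "utf-8") -> str:
    """Hypothetical helper mirroring the proposed rewrite of step 4.4."""
    out = []
    for ch in text:
        if ch in SAFE:
            # "Leave the character as is."
            out.append(ch)
        else:
            # Percent-escape every byte of the character in the
            # selected encoding, as two uppercase hex digits.
            out.append("".join(f"%{b:02X}" for b in ch.encode(encoding)))
    return "".join(out)
```

With UTF-8 as the selected encoding, `encode_proposed("a&b")` gives `a%26b`, and characters in the safe set pass through unchanged.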
Comment 2 Ian 'Hixie' Hickson 2011-08-03 00:07:19 UTC
(Please do not suggest new text, instead, say what is wrong with the current text. Just proposing new text makes it impossible for the editor to determine if the problem is endemic (requiring more changes than you realise), or whether what the editor thinks of as mistakes in the new proposed text are intentional or not (and should be fixed or not), or whether stylistic differences are intentional or not, etc.)

EDITOR'S RESPONSE: This is an Editor's Response to your comment. If you are satisfied with this response, please change the state of this bug to CLOSED. If you have additional information and would like the editor to reconsider, please reopen this bug. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the tracker issue; or you may create a tracker issue yourself, if you are able to do so. For more details, see this document:
   http://dev.w3.org/html5/decision-policy/decision-policy.html

Status: Partially Accepted
Change Description: see diff given below
Rationale: I've tried to improve the text here.

Note that your proposed text breaks the case where the encoding used maps to bytes that themselves map to ASCII characters that don't need escaping.
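
A minimal Python sketch of the case described in that note, assuming Shift_JIS as the selected encoding. The function names `escape_bytewise` and `escape_all` are hypothetical. U+30A2 KATAKANA LETTER A is outside the unescaped range, but its second Shift_JIS byte is 0x41, the ASCII code for "A", so the per-byte test in the original text and the escape-everything rule in the proposed text give different results.

```python
# Byte values the original text appends unescaped: * - . _ 0-9 A-Z a-z
# (0x20 is omitted here on the assumption that space is handled earlier.)
SAFE_BYTES = set(b"*-._0123456789"
                 b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                 b"abcdefghijklmnopqrstuvwxyz")


def escape_bytewise(ch: str, encoding: str) -> str:
    """Original text: test each byte of the character separately."""
    return "".join(
        chr(b) if b in SAFE_BYTES else f"%{b:02X}"
        for b in ch.encode(encoding)
    )


def escape_all(ch: str, encoding: str) -> str:
    """Proposed text: percent-escape every byte of the character."""
    return "".join(f"%{b:02X}" for b in ch.encode(encoding))


ch = "\u30a2"  # KATAKANA LETTER A; Shift_JIS bytes 0x83 0x41
print(escape_bytewise(ch, "shift_jis"))  # %83A
print(escape_all(ch, "shift_jis"))       # %83%41
```

The two range tests in the original text therefore are not redundant: the outer one applies to characters, the inner one to the bytes those characters encode to.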
Comment 3 contributor 2011-08-03 00:08:46 UTC
Checked in as WHATWG revision r6355.
Check-in comment: Try to make this text more readable.
http://html5.org/tools/web-apps-tracker?from=6354&to=6355
Comment 4 Hallvord R. M. Steen 2011-08-03 10:32:28 UTC
OK. I thought suggesting replacement text was the preferred way, but it's faster and simpler for me not to :)
Comment 5 Michael[tm] Smith 2011-08-04 05:02:38 UTC
mass-moved component to LC1