This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Bug 20923 - Need a crazy mode for the %-decoder
Summary: Need a crazy mode for the %-decoder
Status: RESOLVED FIXED
Alias: None
Product: WHATWG
Classification: Unclassified
Component: HTML
Version: unspecified
Hardware: PC All
Importance: P2 normal
Target Milestone: Unsorted
Assignee: Ian 'Hixie' Hickson
QA Contact: contributor
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-02-09 01:26 UTC by Ian 'Hixie' Hickson
Modified: 2013-04-09 19:24 UTC
CC List: 2 users

See Also:


Attachments

Description Ian 'Hixie' Hickson 2013-02-09 01:26:28 UTC
Fragment identifier parsing in the HTML spec has this crazy thing:

  Let decoded fragid be the result of expanding any sequences of percent-encoded 
  octets in fragid that are valid UTF-8 sequences into Unicode characters as 
  defined by UTF-8. If any percent-encoded octets in that string are not valid 
  UTF-8 sequences (e.g. they expand to surrogate code points), then skip this step 
  and the next one.

Any chance of getting an algorithm for this somehow?

http://www.whatwg.org/specs/web-apps/current-work/#the-indicated-part-of-the-document
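To illustrate the validity test in the quoted step, here is a small Python sketch that checks whether a byte sequence is well-formed UTF-8. The helper name `is_valid_utf8` is hypothetical, and Python's strict `utf-8` codec is assumed as a stand-in for the UTF-8 definition the spec refers to. The second example is the surrogate case from the quote: the octets `ED A0 80` would encode U+D800, which UTF-8 does not allow.

```python
def is_valid_utf8(octets: bytes) -> bool:
    """Hypothetical helper: True if the octets are well-formed UTF-8."""
    try:
        # Python's "utf-8" codec is strict by default: it rejects truncated
        # sequences, overlong forms, and encoded surrogate code points.
        octets.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

# C3 A9 is U+00E9 (e with acute accent): valid UTF-8.
print(is_valid_utf8(b"\xc3\xa9"))      # True
# ED A0 80 would encode the surrogate U+D800: not valid UTF-8.
print(is_valid_utf8(b"\xed\xa0\x80"))  # False
```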
Comment 1 Anne 2013-02-12 11:52:15 UTC
So the algorithm you want is:

1. Percent decode /input/ into /bytes/.

2. Run utf-8's decoder on /bytes/. If that emitted a decoder error, return /input/; otherwise, return the result of running utf-8's decoder on /bytes/.

Isn't that simple enough to just put in the HTML specification?
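A minimal Python sketch of the two steps above, assuming `urllib.parse.unquote_to_bytes` as a stand-in for percent-decoding and Python's strict `utf-8` codec in place of utf-8's decoder, with a raised `UnicodeDecodeError` playing the role of a decoder error. The function name `decode_fragid` is hypothetical; this is an illustration, not the text that went into the HTML spec.

```python
from urllib.parse import unquote_to_bytes

def decode_fragid(fragid: str) -> str:
    """Hypothetical sketch: percent-decode, then strictly decode as UTF-8.

    Returns the input unchanged if the decoded bytes are not valid UTF-8.
    """
    # Step 1: UTF-8-encode the string, then turn each "%XX" into one byte.
    # A "%" not followed by two hex digits is kept as a literal "%".
    octets = unquote_to_bytes(fragid)
    try:
        # Step 2: strict decoding raises on any malformed sequence instead
        # of substituting U+FFFD, standing in for "emitted a decoder error".
        return octets.decode("utf-8")
    except UnicodeDecodeError:
        return fragid

print(decode_fragid("caf%C3%A9"))  # café
print(decode_fragid("%ED%A0%80"))  # %ED%A0%80 (surrogate, so input is returned)
print(decode_fragid("100%"))       # 100%
```

Returning the original string on failure, rather than a partially decoded one, means a fragment such as `%ED%A0%80` is left exactly as written.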
Comment 2 Ian 'Hixie' Hickson 2013-02-13 00:25:57 UTC
Sure, I can do it in the HTML spec if you like.
Comment 3 contributor 2013-04-09 19:24:07 UTC
Checked in as WHATWG revision r7796.
Check-in comment: Update integration with URL spec and Encoding spec.
http://html5.org/tools/web-apps-tracker?from=7795&to=7796