This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 7380 - Suggest heuristic detection of UTF-8
Summary: Suggest heuristic detection of UTF-8
Alias: None
Product: HTML WG
Classification: Unclassified
Component: pre-LC1 HTML5 spec (editor: Ian Hickson)
Version: unspecified
Hardware: All All
Importance: P2 normal
Target Milestone: ---
Assignee: Ian 'Hixie' Hickson
QA Contact: HTML WG Bugzilla archive list
Keywords: NE
Depends on:
Reported: 2009-08-20 07:29 UTC by Maciej Stachowiak
Modified: 2010-10-04 14:32 UTC
CC List: 3 users

Description Maciej Stachowiak 2009-08-20 07:29:12 UTC
Step 6 of the encoding detection algorithm should specifically suggest the possibility of algorithmically detecting UTF-8. Here is some suggested wording from the I18N WG:

"Note: The UTF-8 encoding has a highly detectable bit pattern. Documents that contain bytes > 0x7F which match the UTF-8 pattern are very likely to be UTF-8, while documents that do not match it definitely are not. While not full autodetection, it may be appropriate for a user-agent to search for this common encoding."
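
For illustration only, here is a minimal Python sketch of the kind of check the suggested note describes. The function name `sniff_utf8` and its three-way return value (True / False / None) are assumptions made for this example and are not part of the proposed spec wording. It also treats a sequence cut off at the end of the input as malformed, which a user agent sniffing only a prefix of a document might prefer to handle differently.

```python
from typing import Optional


def sniff_utf8(data: bytes) -> Optional[bool]:
    """Heuristically classify a byte string (hypothetical helper).

    Returns True if the data has bytes > 0x7F and all of them form
    well-formed UTF-8 sequences, False if any sequence is malformed,
    and None if the data is pure ASCII (the check is inconclusive).
    """
    saw_non_ascii = False
    i, n = 0, len(data)
    while i < n:
        lead = data[i]
        if lead < 0x80:  # ASCII byte: says nothing about the encoding
            i += 1
            continue
        saw_non_ascii = True
        # The lead byte fixes the sequence length and the allowed range of
        # the first continuation byte. The narrowed ranges exclude overlong
        # forms, surrogates, and code points above U+10FFFF.
        if 0xC2 <= lead <= 0xDF:
            length, low, high = 2, 0x80, 0xBF
        elif lead == 0xE0:
            length, low, high = 3, 0xA0, 0xBF
        elif lead == 0xED:
            length, low, high = 3, 0x80, 0x9F
        elif 0xE1 <= lead <= 0xEF:
            length, low, high = 3, 0x80, 0xBF
        elif lead == 0xF0:
            length, low, high = 4, 0x90, 0xBF
        elif 0xF1 <= lead <= 0xF3:
            length, low, high = 4, 0x80, 0xBF
        elif lead == 0xF4:
            length, low, high = 4, 0x80, 0x8F
        else:
            return False  # stray continuation byte or invalid lead byte
        if i + length > n:
            return False  # sequence truncated at end of input
        if not low <= data[i + 1] <= high:
            return False
        # Any remaining continuation bytes must match the 10xxxxxx pattern.
        for k in range(i + 2, i + length):
            if not 0x80 <= data[k] <= 0xBF:
                return False
        i += length
    return True if saw_non_ascii else None
```

A caller could treat True as "very likely UTF-8", False as "definitely not UTF-8", and None as "no evidence either way", falling back to whatever default the rest of the encoding detection algorithm would pick in the last two cases.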