This is an archived snapshot of W3C's public Bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.
Step 6 of the encoding detection algorithm should specifically suggest the possibility of algorithmically detecting UTF-8. Here is some suggested wording from the I18N WG: "Note: The UTF-8 encoding has a highly detectable bit pattern. Documents that contain bytes > 0x7F which match the UTF-8 pattern are very likely to be UTF-8, while documents that do not match it definitely are not. While not full autodetection, it may be appropriate for a user-agent to search for this common encoding."
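For illustration, the kind of check the suggested note describes could be sketched roughly as below. This is only a minimal sketch in Python, not proposed spec text: the function name `looks_like_utf8` and the `truncated` parameter are hypothetical, and it assumes the user agent has already buffered some prefix of the document as raw bytes. It relies on Python's strict UTF-8 decoder to test the bit pattern rather than walking the bytes by hand.

```python
import codecs


def looks_like_utf8(data: bytes, truncated: bool = False) -> bool:
    """Heuristic: True if `data` has a byte > 0x7F and is well-formed UTF-8.

    `truncated` (hypothetical parameter) should be True when `data` is only
    a prefix of the document, so that a multi-byte sequence cut off at the
    very end is not counted as an error.
    """
    # Pure ASCII carries no evidence either way, so do not claim UTF-8.
    if all(b <= 0x7F for b in data):
        return False

    decoder = codecs.getincrementaldecoder("utf-8")(errors="strict")
    try:
        # final=False lets the decoder hold back an incomplete trailing
        # sequence; final=True treats it as malformed input.
        decoder.decode(data, final=not truncated)
    except UnicodeDecodeError:
        # Bytes that break the UTF-8 pattern: definitely not UTF-8.
        return False
    return True
```

A caller might run this over the buffered prefix with `truncated=True` and, on a positive result, treat UTF-8 as the likely encoding; a negative result on non-ASCII input rules UTF-8 out, while pure-ASCII input leaves the question open for the later steps of the algorithm.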