Bugzilla – Bug 15076
Make UAs use UTF-8 as fallback encoding if the page has a HTML5 doctype
Last modified: 2011-12-07 08:19:06 UTC
Quoting Kornel Lesiński:
> Could <!DOCTYPE html> be an opt-in to default UTF-8 encoding?
> It would be nice to minimize number of declarations a page needs to include.
Such a UA behaviour would, presumably, involve formalizing a new step in the encoding sniffing algorithm, between the current step 5 and step 6. In essence, the UA would default to UTF-8 if the other metadata fails to yield an encoding - see: http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2011-December/034069.html
Presumably, UAs would need to change before the spec could officially allow authors to rely on the DOCTYPE.
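For illustration only, the proposed fallback step could be sketched roughly as below. This is not spec text or any browser's code: the function name, the `locale_default` parameter, and the windows-1252 default are assumptions made for the example, and the sketch assumes the earlier sniffing steps (BOM, transport-level charset, <meta> prescan) have already failed.

```python
import re

# Hypothetical pattern: optional leading whitespace, then exactly <!DOCTYPE html>
# (case-insensitive). Longer legacy doctypes with PUBLIC/SYSTEM ids do not match.
_HTML5_DOCTYPE = re.compile(rb"\s*<!DOCTYPE\s+html\s*>", re.IGNORECASE)


def fallback_encoding(head: bytes, locale_default: str = "windows-1252") -> str:
    """Choose a fallback encoding once all earlier sniffing steps have failed.

    `head` is the first chunk of the undecoded byte stream. `locale_default`
    stands in for whatever legacy default the UA would otherwise use.
    """
    if _HTML5_DOCTYPE.match(head):
        # Proposed new step: the HTML5 doctype acts as a UTF-8 opt-in.
        return "utf-8"
    # Existing behaviour: fall through to the legacy default.
    return locale_default
```

Note that the check has to run on raw bytes before tokenization, which is why the sketch uses a byte-level pattern instead of consulting a parsed doctype token.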
Try to get a browser vendor to ship it. If they can, I'd be happy to change the spec.
(In reply to comment #1)
> Try to get a browser vendor to ship it. If they can, I'd be happy to change the
> spec.
My editorial assistant hat asked my HTML parser module owner hat. Therefore:
EDITOR'S RESPONSE: This is an Editor's Response to your comment. If you are
satisfied with this response, please change the state of this bug to CLOSED. If
you have additional information and would like the editor to reconsider, please
reopen this bug. If you would like to escalate the issue to the full HTML
Working Group, please add the TrackerRequest keyword to this bug, and suggest
title and text for the tracker issue; or you may create a tracker issue
yourself, if you are able to do so. For more details, see this document:
Change Description: no spec change
We already have *three* backwards-compatible ways to opt into UTF-8. <!DOCTYPE html> isn't one of them. Making the change proposed here would violate the Don't Reinvent the Wheel design principle.
Moreover, I think it's a mistake to bundle a lot of unrelated things into one mode switch instead of having legacy-compatible defaults and having granular ways to opt into legacy-incompatible behaviors. (That is, I think, in retrospect, it's bad that we have a doctype-triggered standards mode with legacy-incompatible CSS defaults instead of having legacy-compatible CSS defaults and CSS properties for opting into different behaviors.)
Making this change would make the encoding selection behavior even more confusing to authors than it is now, since using <!DOCTYPE html> would lead to radically different behavior in old and new browsers.
Furthermore, doctype mode processing currently happens at the tree builder level, but efficient encoding sniffing needs to happen before tokenization. It would be *even* more confusing to have cases where encoding sniffing of <!DOCTYPE html> and CSS mode sniffing of <!DOCTYPE html> get out of sync.
If the author wishes to minimize declarations, (s)he can put the UTF-8 BOM followed immediately by <!DOCTYPE html> at the start of the file.
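As a rough sketch of that last suggestion, the following shows one way an author's tooling might emit a page that begins with the UTF-8 BOM followed immediately by <!DOCTYPE html>. The helper names are hypothetical and exist only for this example.

```python
import codecs


def with_bom_and_doctype(body: str) -> bytes:
    """Return page bytes: UTF-8 BOM, then <!DOCTYPE html>, then the body."""
    # codecs.BOM_UTF8 is the three-byte sequence EF BB BF.
    return codecs.BOM_UTF8 + ("<!DOCTYPE html>" + body).encode("utf-8")


def starts_with_utf8_bom(data: bytes) -> bool:
    """Check whether a byte stream opens with the UTF-8 BOM."""
    return data.startswith(codecs.BOM_UTF8)


page = with_bom_and_doctype("<title>Caf\u00e9</title>")
```

With this layout the BOM is the only encoding declaration in the file, and the doctype keeps its usual role of selecting the rendering mode.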