Reported: 2011-12-06
Modified: 2011-12-07

Leif Halvard Silli 2011-12-06:
Quoting Kornel Lesiński:

> Could <!DOCTYPE html> be an opt-in to default UTF-8 encoding?
> It would be nice to minimize number of declarations a page needs to include.


Such a UA behaviour would, presumably, involve formalizing a new step in the encoding sniffing algorithm, between the current step 5 and step 6. In essence, the UA would default to UTF-8 if the other metadata fails - see: http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2011-December/034069.html
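
To make the proposal concrete, here is a rough sketch of where such a fallback might sit. It is not the spec's algorithm: the ordering is simplified, the step numbers are not reproduced, and the names `sniff_encoding`, `doctype_implies_utf8` and `legacy_default` are hypothetical, as is the choice of windows-1252 as the legacy default.

```python
import codecs
import re

# Simplified stand-ins for the real prescan; assumptions for illustration only.
META_CHARSET = re.compile(
    rb'<meta[^>]+charset\s*=\s*["\']?\s*([A-Za-z0-9_:.-]+)', re.IGNORECASE)
DOCTYPE_HTML = re.compile(rb'\s*<!DOCTYPE\s+html\s*>', re.IGNORECASE)


def sniff_encoding(data, transport_charset=None,
                   doctype_implies_utf8=False,
                   legacy_default="windows-1252"):
    """Return an encoding label for the byte string `data`."""
    # Existing opt-ins: BOM, transport-layer charset, <meta> declaration.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if transport_charset:
        return transport_charset.lower()
    m = META_CHARSET.search(data[:1024])
    if m:
        return m.group(1).decode("ascii").lower()
    # Proposed (hypothetical) new step: only reached when the other
    # metadata fails, and only for documents with <!DOCTYPE html>.
    if doctype_implies_utf8 and DOCTYPE_HTML.match(data):
        return "utf-8"
    # Otherwise fall back to the legacy default, as today.
    return legacy_default
```

With `doctype_implies_utf8=False` the function models current behaviour, so the same bare `<!DOCTYPE html>` page gets the legacy default in an old UA and UTF-8 in a new one.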

Presumably, UAs would need to change before the spec could officially allow authors to rely on the DOCTYPE.
Ian 'Hixie' Hickson 2011-12-06:
Try to get a browser vendor to ship it. If they can, I'd be happy to change the spec.
Henri Sivonen 2011-12-07:
(In reply to comment #1)
> Try to get a browser vendor to ship it. If they can, I'd be happy to change the
> spec.

My editorial assistant hat asked my HTML parser module owner hat. Therefore:

We already have *three* backwards-compatible ways to opt into UTF-8. <!DOCTYPE html> isn't one of them. Making the change proposed here would violate the Don't Reinvent the Wheel design principle.

Moreover, I think it's a mistake to bundle a lot of unrelated things into one mode switch instead of having legacy-compatible defaults and having granular ways to opt into legacy-incompatible behaviors. (That is, I think, in retrospect, it's bad that we have a doctype-triggered standards mode with legacy-incompatible CSS defaults instead of having legacy-compatible CSS defaults and CSS properties for opting into different behaviors.)

Making this change would make the encoding selection behavior even more confusing to authors than it is now, since using <!DOCTYPE html> would lead to radically different behavior in old and new browsers.

Furthermore, currently the doctype mode processing happens on the tree builder level but efficient encoding sniffing needs to happen before tokenization. It would be *even* more confusing to have cases where encoding sniffing <!DOCTYPE html> and CSS mode sniffing <!DOCTYPE html> get out of sync.
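
A toy illustration of how the two views could diverge, assuming a hypothetical byte-level check that only looks at the start of the stream. Neither function is real parser code; `prescan_sees_doctype` and `tree_builder_sees_doctype` are made-up names, and the second is only a crude regex stand-in for a tree builder that skips leading whitespace and comments and matches the doctype name case-insensitively.

```python
import re


def prescan_sees_doctype(data):
    # Hypothetical pre-tokenization check: exact bytes at offset 0.
    return data.startswith(b"<!DOCTYPE html>")


# Leading whitespace and comments that may precede the doctype.
LEADING = re.compile(rb'(?:\s+|<!--.*?-->)*', re.DOTALL)
DOCTYPE = re.compile(rb'<!doctype\s+html\s*>', re.IGNORECASE)


def tree_builder_sees_doctype(data):
    # Crude stand-in for what the tree builder would conclude
    # after tokenization.
    rest = data[LEADING.match(data).end():]
    return DOCTYPE.match(rest) is not None


doc = b"<!-- generated --><!doctype html><p>hello"
out_of_sync = prescan_sees_doctype(doc) != tree_builder_sees_doctype(doc)
```

For `doc`, the byte-level check misses the doctype while the tree-builder stand-in finds it, so the page would get the standards CSS mode without the UTF-8 default.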

If the author wishes to minimize declarations, (s)he can put the UTF-8 BOM followed immediately by <!DOCTYPE html> at the start of the file.
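
For instance, a file laid out that way could be produced as follows. This is only a sketch; the helper name `minimal_utf8_page` is made up for illustration.

```python
import codecs


def minimal_utf8_page(body):
    """Return page bytes: UTF-8 BOM, then <!DOCTYPE html>, then the body."""
    # The BOM (EF BB BF) must be the very first bytes of the file,
    # with the doctype following immediately.
    return codecs.BOM_UTF8 + ("<!DOCTYPE html>" + body).encode("utf-8")


page = minimal_utf8_page("<title>caf\u00e9</title>")
```

No `<meta charset>` or HTTP charset parameter is needed for such a file.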