Re: [HTML5] 2.8 Character encodings

On Mon, 6 Jul 2009, Dr. Olaf Hoffmann wrote:
> 
> Section 2.8 of the current draft
> http://www.w3.org/TR/2009/WD-html5-20090423/infrastructure.html#character-encodings-0
> mentions some 'willful' misinterpretations of encoding information, for 
> example interpreting a string like 'ISO-8859-1' as 'Windows-1252'.
>
> 1. Which string must an author use if he really wants to indicate that
> the encoding is, for example, 'ISO-8859-1' and not 'Windows-1252'?

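"ISO-8859-1". If the author has really used that encoding, then there is 
no difference between them. The two differ only in bytes 0x80-0x9F, which 
ISO-8859-1 assigns to C1 control characters that have no place in HTML 
content, so in practice 1252 is a superset.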
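
To make the byte-level claim concrete, here is a small Python sketch (not 
part of the spec). It assumes Python's built-in "iso-8859-1" and "cp1252" 
codecs are close enough stand-ins for the two encodings. Note that Python's 
cp1252 rejects five unassigned bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D), so the 
sample deliberately avoids them.

```python
# Bytes outside 0x80-0x9F decode identically under both encodings.
shared = bytes(range(0x00, 0x80)) + bytes(range(0xA0, 0x100))
same_outside_c1 = shared.decode("iso-8859-1") == shared.decode("cp1252")

# Inside 0x80-0x9F they differ: ISO-8859-1 yields C1 control characters,
# while Windows-1252 yields printable characters such as curly quotes.
raw = b"\x93caf\xe9\x94 costs 5\x80"
as_latin1 = raw.decode("iso-8859-1")  # contains U+0093, U+0094, U+0080
as_cp1252 = raw.decode("cp1252")      # contains U+201C, U+201D, U+20AC

print(same_outside_c1)
print(repr(as_latin1))
print(repr(as_cp1252))
```

A document that is genuinely ISO-8859-1 and contains no C1 controls 
therefore decodes to the same text under either label.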


> 2. As far as I have seen, HTML5 has no version indicator like previous
> versions of HTML had and other popular formats like SVG have.
> How can a browser identify that a document is really intended as
> 'HTML5', with the implied 'willful' misinterpretations of encoding
> information, and not as another HTML version?

It doesn't matter; all versions of HTML are in practice processed with 
these mappings. It is indeed why HTML5 has these mappings -- because 
browsers already did this. We wouldn't add these mappings if we didn't 
need them to handle legacy content (content in previous versions of HTML).
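
As a rough illustration of what such a mapping amounts to, the following 
Python sketch remaps a declared label before decoding. The names 
LEGACY_OVERRIDES, effective_label and decode_page are hypothetical, and the 
table lists only the one override discussed in this thread, not the draft's 
full list.

```python
# Hypothetical override table; only the mapping named in this thread is shown.
LEGACY_OVERRIDES = {"iso-8859-1": "windows-1252"}


def effective_label(declared: str) -> str:
    """Return the encoding label actually used for a declared label."""
    key = declared.strip().lower()
    return LEGACY_OVERRIDES.get(key, key)


def decode_page(raw: bytes, declared: str) -> str:
    """Decode bytes using the remapped label, regardless of HTML version."""
    return raw.decode(effective_label(declared))


print(effective_label("ISO-8859-1"))
print(decode_page(b"5\x80", "ISO-8859-1"))
```

The remapping depends only on the declared label, never on which version 
of HTML the document claims to be, which is why no version indicator is 
needed.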


> Assuming that a viewer is somehow able to identify a document as an 
> HTML5 document after looking into the content, and a server, for example, 
> sent 'ISO-8859-1' before, does this mean that the viewer switches to 
> 'Windows-1252' or reparses the document with it?

I don't understand the question.


> Obviously it would be better to avoid such misinterpretation by using an 
> encoding like UTF-8 that the current HTML5 draft does not remap. However, 
> due to the history of older projects or server configurations, it might 
> still be convenient for many authors to continue to use 'ISO-8859-1' 
> instead of other encodings, even if they switch, for example, from HTML4 
> to HTML5 for some documents.

Hopefully my answers above will reassure you that this is not in fact a 
problem that authors will face.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Monday, 20 July 2009 08:57:25 UTC