This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.
I'm trying to validate (by URL) an HTML5 page that is encoded in iso-8859-1 and declared as iso-8859-1. The encoding is declared properly in the HTTP Content-Type header, and it corresponds both to the encoding actually used and to the one declared in the HTML.

If I choose "detect automatically" for both the character encoding and the doctype, I get this warning:

Using windows-1252 instead of the declared encoding iso-8859-1

First of all, the message is unclear: is it telling me that the page is encoded with a different encoding than the one declared, or is it telling me that the validator is using a different encoding to decode it? Neither makes sense. In the first case, it is plain wrong, because the page IS encoded with iso-8859-1. In the second case, the automatic detection doesn't work properly. At the top of the validation page, it says "encoding: iso-8859-1" and "doctype: html5", so it looks like the first hypothesis applies, and then the error message is bogus.

If I manually select iso-8859-1 instead of "detect automatically", the warning disappears and everything looks correct (I do get errors, but that's OK because the page does have errors).

I can't provide a link to the page, but here's the first part of the content:

<!DOCTYPE html>
<html>
<head>
<meta charset="iso-8859-1">
<title>XXXX: Est
I have this exact situation. The sample web page (HTML5) is http://www.lovatasinhala.com/iso-8859-1-l.htm

I went ahead and included windows-1252 as the character encoding in the HTTP header (via the .htaccess file). However, the point is that this page does not have any characters outside iso-8859-1, and yet the validator says it is windows-1252 when it is only iso-8859-1.

As far as I can see, the difference between iso-8859-1 and windows-1252 is that windows-1252 has the following characters in addition to iso-8859-1. The first column is the single-byte code and the third is the corresponding Unicode codepoint.

80  €  20AC  EURO SIGN
82  ‚  201A  SINGLE LOW-9 QUOTATION MARK
83  ƒ  0192  LATIN SMALL LETTER F WITH HOOK
84  „  201E  DOUBLE LOW-9 QUOTATION MARK
85  …  2026  HORIZONTAL ELLIPSIS
86  †  2020  DAGGER
87  ‡  2021  DOUBLE DAGGER
88  ˆ  02C6  MODIFIER LETTER CIRCUMFLEX ACCENT
89  ‰  2030  PER MILLE SIGN
8A  Š  0160  LATIN CAPITAL LETTER S WITH CARON
8B  ‹  2039  SINGLE LEFT-POINTING ANGLE QUOTATION MARK
8C  Œ  0152  LATIN CAPITAL LIGATURE OE
8E  Ž  017D  LATIN CAPITAL LETTER Z WITH CARON
91  ‘  2018  LEFT SINGLE QUOTATION MARK
92  ’  2019  RIGHT SINGLE QUOTATION MARK
93  “  201C  LEFT DOUBLE QUOTATION MARK
94  ”  201D  RIGHT DOUBLE QUOTATION MARK
95  •  2022  BULLET
96  –  2013  EN DASH
97  —  2014  EM DASH
98  ˜  02DC  SMALL TILDE
99  ™  2122  TRADE MARK SIGN
9A  š  0161  LATIN SMALL LETTER S WITH CARON
9B  ›  203A  SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
9C  œ  0153  LATIN SMALL LIGATURE OE
9E  ž  017E  LATIN SMALL LETTER Z WITH CARON
9F  Ÿ  0178  LATIN CAPITAL LETTER Y WITH DIAERESIS
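The comparison above can be checked with a short script. This is only a rough sketch using Python's built-in "latin-1" and "cp1252" codecs as stand-ins for the two encodings; the function names differing_bytes and decodes_identically are made up for illustration. One caveat: Python's cp1252 codec treats bytes 81, 8D, 8F, 90 and 9D as undefined, whereas the WHATWG Encoding Standard maps them to C1 control characters.

```python
def differing_bytes():
    """Return (byte, latin-1 char, cp1252 char or None) for every byte
    that the two codecs decode differently."""
    diffs = []
    for b in range(256):
        raw = bytes([b])
        latin1 = raw.decode("latin-1")
        try:
            cp1252 = raw.decode("cp1252")
        except UnicodeDecodeError:
            # Undefined in Python's cp1252 codec (assumption noted above:
            # the WHATWG index maps these bytes to C1 controls instead).
            cp1252 = None
        if cp1252 != latin1:
            diffs.append((b, latin1, cp1252))
    return diffs


def decodes_identically(data: bytes) -> bool:
    """True if the bytes decode to the same text under both codecs,
    i.e. no byte falls in the 0x80-0x9F range where they disagree."""
    return not any(0x80 <= b <= 0x9F for b in data)


if __name__ == "__main__":
    for b, latin1, cp1252 in differing_bytes():
        shown = "undefined" if cp1252 is None else f"U+{ord(cp1252):04X}"
        print(f"{b:02X}  latin-1: U+{ord(latin1):04X}  cp1252: {shown}")
```

If decodes_identically returns True for a page's bytes, decoding it as windows-1252 gives exactly the same text as decoding it as iso-8859-1, which would be the case for a page like the sample one that uses no bytes in the 80 to 9F range.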
I believe that the validator behavior here conforms to the requirements in the HTML5 spec. So if you want to discuss those requirements and/or suggest that they be changed, the best places to have that discussion are public-html@w3.org and whatwg@whatwg.org
@Michael[tm] Smith I don't think you read the report carefully. The page is declared as ISO-8859-1 both in the headers and in the meta tag. Why on earth should it be detected as another encoding?!?!?!?!? Where does the html5 standard say such an absurdity?
Also note the part about the unclearness of the message.
matteo, see http://encoding.spec.whatwg.org/ for context
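For context, the Encoding Standard treats "iso-8859-1" as one of several labels for the windows-1252 encoding, so a conforming decoder that sees that label uses windows-1252. The lookup can be sketched roughly as follows. This is not the validator's code: the LABELS table is a small hand-picked subset of the spec's label list, and resolve_label is a hypothetical helper name.

```python
# Partial label table: a hand-picked subset of the Encoding Standard's
# labels, for illustration only (the real table is much longer).
LABELS = {
    "iso-8859-1": "windows-1252",
    "iso8859-1": "windows-1252",
    "iso_8859-1": "windows-1252",
    "latin1": "windows-1252",
    "l1": "windows-1252",
    "ascii": "windows-1252",
    "us-ascii": "windows-1252",
    "cp1252": "windows-1252",
    "windows-1252": "windows-1252",
    "utf-8": "utf-8",
    "utf8": "utf-8",
}

# ASCII whitespace: tab, line feed, form feed, carriage return, space.
ASCII_WHITESPACE = "\t\n\f\r "


def resolve_label(label: str):
    """Map a declared charset label to an encoding name, or None if the
    label is not in this (partial) table."""
    # Strip surrounding ASCII whitespace and compare case-insensitively.
    key = label.strip(ASCII_WHITESPACE).lower()
    return LABELS.get(key)


if __name__ == "__main__":
    declared = "ISO-8859-1"
    print(f"Using {resolve_label(declared)} for the declared encoding {declared}")
```

Read this way, the warning appears to describe what the validator uses to decode the page, not a claim that the page was saved in a different encoding.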