Bug 14680: Using windows-1252 instead of the declared encoding iso-8859-1

Reported: 2011-11-02 16:26:18 +0000
Last modified: 2015-08-23 07:07:26 +0000
Classification: Unclassified
Product: HTML Checker
Component: General
Version: unspecified
Platform: PC, Windows XP
Status: RESOLVED WORKSFORME
URL: http://www.lovatasinhala.com/iso-8859-1-l.htm
Priority: P3
Severity: normal
Reporter: matteosistisette
Assignee: mike+validator
CC: ian, mike, transoral
QA contact: www-validator-cvs

Comment 0, matteosistisette, 2011-11-02 16:26:18 +0000 (id 59504)

I'm trying to validate (by URL) a page that is encoded in iso-8859-1 and declared as iso-8859-1. It is HTML5. The encoding is declared properly in the HTTP Content-Type header and corresponds both to the encoding actually used and to the one declared in the HTML.

If I choose "detect automatically" for both the character encoding and the doctype, I get this warning:

Using windows-1252 instead of the declared encoding iso-8859-1

First of all, it is unclear: is it telling me that the page is encoded with a different encoding than the one declared, or that the validator is using a different encoding to decode it?

Either way it doesn't make sense. In the first case, it is plain wrong, because the page IS encoded with iso-8859-1. In the second case, the automatic detection doesn't work properly.

At the top of the validation page, it says "encoding: iso-8859-1" and "doctype: html5", so it looks like the first hypothesis, in which case the error message is bogus.

If I manually select iso-8859-1 instead of "detect automatically", the warning disappears and everything looks correct (I do get errors, but that's OK because the page does have errors).

I can't provide a link to the page, but here's the first part of the content:

<!DOCTYPE html> <html> <head> <meta charset="iso-8859-1"> <title>XXXX: Est

Comment 1, transoral, 2012-08-03 18:32:41 +0000 (id 71826)

I have this exact situation.
The sample web page (HTML5) is http://www.lovatasinhala.com/iso-8859-1-l.htm

I went ahead and included windows-1252 as the character encoding in the HTTP header (via the .htaccess file). However, the point is that this page does not have any characters outside iso-8859-1, and yet the validator says it is windows-1252 when it is only iso-8859-1.

As far as I can see, the difference between iso-8859-1 and windows-1252 is that windows-1252 has the following characters in addition to iso-8859-1. The first column is the single-byte code and the third is the corresponding Unicode code point.

80  €  20AC  EURO SIGN
82  ‚  201A  SINGLE LOW-9 QUOTATION MARK
83  ƒ  0192  LATIN SMALL LETTER F WITH HOOK
84  „  201E  DOUBLE LOW-9 QUOTATION MARK
85  …  2026  HORIZONTAL ELLIPSIS
86  †  2020  DAGGER
87  ‡  2021  DOUBLE DAGGER
88  ˆ  02C6  MODIFIER LETTER CIRCUMFLEX ACCENT
89  ‰  2030  PER MILLE SIGN
8A  Š  0160  LATIN CAPITAL LETTER S WITH CARON
8B  ‹  2039  SINGLE LEFT-POINTING ANGLE QUOTATION MARK
8C  Œ  0152  LATIN CAPITAL LIGATURE OE
8E  Ž  017D  LATIN CAPITAL LETTER Z WITH CARON
91  ‘  2018  LEFT SINGLE QUOTATION MARK
92  ’  2019  RIGHT SINGLE QUOTATION MARK
93  “  201C  LEFT DOUBLE QUOTATION MARK
94  ”  201D  RIGHT DOUBLE QUOTATION MARK
95  •  2022  BULLET
96  –  2013  EN DASH
97  —  2014  EM DASH
98  ˜  02DC  SMALL TILDE
99  ™  2122  TRADE MARK SIGN
9A  š  0161  LATIN SMALL LETTER S WITH CARON
9B  ›  203A  SINGLE RIGHT-POINTING ANGLE QUOTATION MARK
9C  œ  0153  LATIN SMALL LIGATURE OE
9E  ž  017E  LATIN SMALL LETTER Z WITH CARON
9F  Ÿ  0178  LATIN CAPITAL LETTER Y WITH DIAERESIS

Comment 2, mike, 2013-04-21 01:34:48 +0000 (id 86436)

I believe that the validator behavior here conforms to the requirements in the HTML5 spec. So if you want to discuss those requirements and/or suggest that they be changed, the best places to have that discussion are public-html@w3.org and whatwg@whatwg.org

Comment 3, matteosistisette, 2013-04-21 13:10:29 +0000 (id 86453)

@Michael[tm] Smith I don't think you read the report carefully. The page is declared as ISO-8859-1 both in the headers and in the meta tag.
Why on earth should it be detected as another encoding?!?!?!?!? Where does the HTML5 standard say such an absurdity?

Comment 4, matteosistisette, 2013-04-21 13:11:07 +0000 (id 86454)

Also note the part about the message being unclear.

Comment 5, ian, 2013-04-26 23:21:33 +0000 (id 86908)

matteo, see http://encoding.spec.whatwg.org/ for context
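
Note on the last comment's pointer: the Encoding Standard lists "iso-8859-1" as a label for windows-1252, which is likely why the checker reports using windows-1252 even when the declaration is correct. The two encodings can only disagree on bytes 0x80 to 0x9F, so a page with no bytes in that range decodes to the same text either way. The following is a minimal Python sketch of that check, not the validator's actual code. The helper names `differing_bytes` and `decodes_identically` are hypothetical, and Python's `cp1252` codec is used as a stand-in for windows-1252 (it leaves a few bytes such as 0x81 undefined, which the Encoding Standard does not).

```python
# Bytes on which ISO-8859-1 and windows-1252 can disagree: 0x80 through 0x9F.
C1_RANGE = range(0x80, 0xA0)


def differing_bytes(data: bytes) -> list:
    """Return the offsets of bytes that fall in the 0x80-0x9F range.

    Hypothetical helper: these are the only positions where the choice
    between the two encodings can change the decoded text.
    """
    return [offset for offset, byte in enumerate(data) if byte in C1_RANGE]


def decodes_identically(data: bytes) -> bool:
    """True when the data contains no byte in the 0x80-0x9F range."""
    return not differing_bytes(data)


if __name__ == "__main__":
    # Typical Latin-1 text: accented letters live at 0xA0 and above.
    sample = "café déjà vu".encode("iso-8859-1")
    print(decodes_identically(sample))
    print(sample.decode("iso-8859-1") == sample.decode("cp1252"))

    # Byte 0x80 is where the two encodings part ways.
    euro = b"\x80"
    print(repr(euro.decode("cp1252")))      # euro sign in windows-1252
    print(repr(euro.decode("iso-8859-1")))  # C1 control character in ISO-8859-1
```

Under these assumptions, a page like the ones described in this report (no bytes in 0x80 to 0x9F) would produce identical text under either encoding, so the warning would describe which decoder was used rather than a problem with the page.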