HTML+CSS Internationalization Tests
Character encodings
This page groups together pages being developed by the Internationalization Core Working Group to assess internationalization support of user agents. These tests are still under development and should not be taken as final.
Note that Internationalization WG tests do not only test conformance with W3C standards. In some cases the tests also allow for exploration of the behavior of user agents in ways not described by the standards.
HTML
Basic tests [results]
- HTTP charset declaration
Setting the HTTP header charset declaration will affect the encoding of a page.
HTML4 HTML5 XHTML1.0html XHTML1.0xml XHTML5 XHTML1.1
- UTF-8 BOM
A page with no encoding declarations, but with a UTF-8 signature will be recognized as UTF-8.
HTML4 HTML5 XHTML1.0html XHTML1.0xml XHTML5 XHTML1.1
- UTF-16LE BOM
A page with no encoding declarations, but with a UTF-16 little-endian BOM will be recognized as UTF-16.
HTML5 XHTML5
- UTF-16BE BOM
A page with no encoding declarations, but with a UTF-16 big-endian BOM will be recognized as UTF-16.
HTML5 XHTML5
- XML declaration
Setting the encoding in the XML declaration will affect the encoding of a page served as XML, but will not affect pages served as text/html.
HTML4 HTML5 XHTML1.0html XHTML1.0xml XHTML5 XHTML1.1
- meta Content-Type charset declaration
Setting the encoding in the meta Content-Type element will affect the encoding of a page served as text/html, but will not affect pages served as XML.
HTML4 HTML5 XHTML1.0html XHTML1.0xml XHTML5 XHTML1.1
- HTML5 meta charset declaration
Setting the encoding in the HTML5 meta charset element will affect the encoding of a page served as text/html, but will not affect pages served as XML.
HTML4 HTML5 XHTML1.0html XHTML1.0xml XHTML5 XHTML1.1
- charset on an a element
When a page with no other encoding information is reached through an a element that has a charset attribute, the page will be rendered using the encoding named in that attribute.
HTML4 HTML5 XHTML1.0html XHTML1.0xml XHTML5 XHTML1.1
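As a rough illustration of what the BOM tests above look for, the following sketch checks the first bytes of a file for a UTF-8 or UTF-16 signature. The function name `sniff_bom` and the returned labels are assumptions made for this example; it is not taken from any user agent.

```python
import codecs

# Signatures checked in order; each pairs a BOM byte sequence with a label.
_BOMS = (
    (codecs.BOM_UTF8, "utf-8"),         # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16le"),  # FF FE
    (codecs.BOM_UTF16_BE, "utf-16be"),  # FE FF
)


def sniff_bom(data: bytes):
    """Return the encoding implied by a leading BOM, or None if there is none."""
    for signature, label in _BOMS:
        if data.startswith(signature):
            return label
    return None
```

A file with none of these signatures yields `None`, in which case the other declarations listed above would have to supply the encoding.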
Precedence [results]
- HTTP vs UTF-8 BOM
If the HTTP header of a page is not set to UTF-8, a UTF-8 BOM will not cause a file to be treated as UTF-8.
HTML4 HTML5 XHTML1.0html XHTML1.0xml XHTML5 XHTML1.1
- HTTP vs UTF-16LE BOM
If the HTTP header of a page is not set to UTF-16, a UTF-16 little-endian BOM will not cause a file to be recognized as UTF-16.
HTML5 XHTML5
- HTTP vs UTF-16BE BOM
If the HTTP header of a page is not set to UTF-16, a UTF-16 big-endian BOM will not cause a file to be recognized as UTF-16.
HTML5 XHTML5
- HTTP vs XML declaration
The HTTP header has a higher precedence than an XML declaration.
HTML4 HTML5 XHTML1.0html XHTML1.0xml XHTML5 XHTML1.1
- HTTP vs meta Content-Type
The HTTP header has a higher precedence than a meta Content-Type encoding declaration.
HTML4 HTML5 XHTML1.0html XHTML1.0xml XHTML5 XHTML1.1
- HTTP vs HTML5 meta
The HTTP header has a higher precedence than an HTML5 meta encoding declaration.
HTML4 HTML5 XHTML1.0html XHTML1.0xml XHTML5 XHTML1.1
- UTF-8 BOM vs meta Content-Type
A page with a UTF-8 BOM will be recognized as UTF-8 even if the meta Content-Type declares a different encoding.
HTML4 HTML5 XHTML1.0html XHTML1.0xml XHTML5 XHTML1.1
- UTF-8 BOM vs HTML5 meta charset
A page with a UTF-8 BOM will be recognized as UTF-8 even if the HTML5 meta charset declares a different encoding.
HTML4 HTML5 XHTML1.0html XHTML1.0xml XHTML5 XHTML1.1
- UTF-16LE BOM vs XML declaration
A page with a UTF-16 little-endian BOM will be recognized as UTF-16 even if the XML declaration declares a different encoding.
HTML4 HTML5 XHTML1.0html XHTML1.0xml XHTML5 XHTML1.1
- UTF-16LE BOM vs meta Content-Type
A page with a UTF-16 little-endian BOM will be recognized as UTF-16 even if the meta Content-Type element declares a different encoding.
HTML5 XHTML5
- UTF-16LE BOM vs HTML5 meta charset
A page with a UTF-16 little-endian BOM will be recognized as UTF-16 even if the meta charset element declares a different encoding.
HTML5 XHTML5
- XML declaration vs meta Content-Type
The XML declaration has a higher precedence than a meta Content-Type encoding declaration for pages served as XML, but not for pages served as text/html.
HTML5 XHTML5
- XML declaration vs HTML5 meta
The XML declaration has a higher precedence than an HTML5 meta encoding declaration for pages served as XML, but not for pages served as text/html.
HTML4 HTML5 XHTML1.0html XHTML1.0xml XHTML5 XHTML1.1
- meta Content-Type, then HTML5 meta
A meta Content-Type encoding declaration has a higher precedence than a following HTML5 meta encoding declaration.
HTML4 HTML5 XHTML1.0html XHTML1.0xml XHTML5 XHTML1.1
- HTML5 meta, then meta Content-Type
An HTML5 meta encoding declaration has a higher precedence than a following meta Content-Type encoding declaration.
HTML4 HTML5 XHTML1.0html XHTML1.0xml XHTML5 XHTML1.1
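The precedence expectations listed above can be summarized in a small model. This is only a sketch of the outcomes the tests assert, not a description of how any user agent is implemented; the function `page_encoding` and its parameters are hypothetical names chosen for this example.

```python
def page_encoding(http=None, bom=None, xml_decl=None, metas=(),
                  served_as_xml=False):
    """Pick the encoding the precedence tests above would expect.

    http          -- charset from the HTTP header, if any
    bom           -- encoding implied by a byte order mark, if any
    xml_decl      -- encoding named in the XML declaration, if any
    metas         -- encodings from meta declarations, in document order
    served_as_xml -- True for XML media types, False for text/html
    """
    if http:
        return http.lower()   # HTTP outranks everything else
    if bom:
        return bom.lower()    # a BOM outranks in-document declarations
    if served_as_xml:
        # Served as XML: the XML declaration counts, meta elements do not.
        return xml_decl.lower() if xml_decl else None
    # Served as text/html: the first meta declaration wins,
    # and the XML declaration is not consulted.
    return metas[0].lower() if metas else None
```

For example, `page_encoding(bom="utf-8", metas=["iso-8859-15"])` returns `"utf-8"`, matching the "UTF-8 BOM vs meta Content-Type" test. A return value of `None` means no declaration applied, so the default of the user agent would be used.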
Escapes [results]
- hex ncr
A hexadecimal numeric character reference produces the intended character.
HTML4 HTML5 XHTML1.0html XHTML1.0xml XHTML5 XHTML1.1
- decimal ncr
A decimal numeric character reference produces the intended character.
HTML4 HTML5 XHTML1.0html XHTML1.0xml XHTML5 XHTML1.1
- lower-case entity
A lower case character entity reference produces the intended character.
HTML4 HTML5 XHTML1.0html XHTML1.0xml XHTML5 XHTML1.1
- upper-case entity
An upper case character entity reference produces the intended character.
HTML4 HTML5 XHTML1.0html XHTML1.0xml XHTML5 XHTML1.1
- supplementary character
A hexadecimal numeric character reference containing the Unicode code point of a supplementary character produces the intended character.
HTML4 HTML5 XHTML1.0html XHTML1.0xml XHTML5 XHTML1.1
- hex ncr outside range of charset
A hexadecimal numeric character reference containing the Unicode code point of a character which is not supported by the current character encoding still produces the intended character.
HTML4 HTML5 XHTML1.0html XHTML1.0xml XHTML5 XHTML1.1
- decimal ncr outside range of charset
A decimal numeric character reference containing the Unicode code point of a character which is not supported by the current character encoding still produces the intended character.
HTML4 HTML5 XHTML1.0html XHTML1.0xml XHTML5 XHTML1.1
- character entity outside range of charset
A character entity reference for a Unicode code point which is not supported by the current character encoding still produces the intended character.
HTML4 HTML5 XHTML1.0html XHTML1.0xml XHTML5 XHTML1.1
- hex ncr for C1 position of euro sign
A hexadecimal numeric character reference containing the code point for the euro in the Windows 1252 code page should not produce a euro sign.
HTML4 HTML5 XHTML1.0html XHTML1.0xml XHTML5 XHTML1.1
- decimal ncr for C1 position of euro sign
A decimal numeric character reference containing the code point for the euro in the Windows 1252 code page should not produce a euro sign.
HTML4 HTML5 XHTML1.0html XHTML1.0xml XHTML5 XHTML1.1
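The idea behind the numeric character reference tests is that the number always names a Unicode code point, whatever the encoding of the page. A minimal sketch of that reading follows; `resolve_ncr` is a hypothetical helper written for this example and handles only a single, well-formed reference.

```python
import re

# Matches one hexadecimal (&#x...;) or decimal (&#...;) reference.
_NCR = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def resolve_ncr(reference: str) -> str:
    """Map a numeric character reference straight to its Unicode code point."""
    match = _NCR.fullmatch(reference)
    if match is None:
        raise ValueError(f"not a numeric character reference: {reference!r}")
    hex_digits, dec_digits = match.groups()
    code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
    return chr(code_point)
```

Under this reading, `&#x20AC;` and `&#8364;` both give the euro sign, while `&#x80;` and `&#128;` give the C1 control character U+0080 rather than the euro sign found at byte 0x80 in Windows 1252, which is the outcome the last two tests above expect. Python's own `html.unescape` is deliberately not used here, because it follows HTML5 parsing rules and remaps that reference to the euro sign.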
C1 controls [results]
- C1 code points in ISO 8859-1
When an ISO 8859-1 encoded document contains code points in its C1 control range that correspond to graphic characters in the Windows 1252 encoding, these are not displayed as Windows 1252 characters, and the user agent continues to otherwise handle this document as ISO 8859-1.
HTML4 HTML5 XHTML1.0html XHTML1.0xml XHTML5 XHTML1.1
- C1 code points in ISO 8859-15
When an ISO 8859-15 encoded document contains code points in its C1 control range that correspond to graphic characters in the Windows 1252 encoding, these are not displayed as Windows 1252 characters, and the user agent continues to otherwise handle this document as ISO 8859-15.
HTML4 HTML5 XHTML1.0html XHTML1.0xml XHTML5 XHTML1.1
- C1 code points in cp1256
When a Windows cp1256 encoded document contains code points in its C1 control range that correspond to graphic characters in the Windows 1252 encoding, these are not displayed as Windows 1252 characters, and the user agent continues to otherwise handle this document as Windows cp1256.
HTML4 HTML5 XHTML1.0html XHTML1.0xml XHTML5 XHTML1.1
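To see which bytes these tests are concerned with, the sketch below compares how each byte in the range 0x80 to 0x9F decodes under a declared encoding and under Windows 1252, using Python's built-in codecs. The helper names are assumptions for this example, and Python's codec tables are used only as a convenient reference, not as a statement of what a user agent does.

```python
def decode_or_none(value: int, encoding: str):
    """Decode a single byte, returning None where the encoding leaves it undefined."""
    try:
        return bytes([value]).decode(encoding)
    except UnicodeDecodeError:
        return None


def c1_conflicts(declared: str) -> dict:
    """Map each byte in 0x80-0x9F to (declared reading, cp1252 reading) where they differ."""
    conflicts = {}
    for value in range(0x80, 0xA0):
        as_cp1252 = decode_or_none(value, "cp1252")
        as_declared = decode_or_none(value, declared)
        if as_cp1252 is not None and as_cp1252 != as_declared:
            conflicts[value] = (as_declared, as_cp1252)
    return conflicts
```

For instance, `c1_conflicts("iso-8859-1")[0x80]` pairs the control character U+0080 with the euro sign; a user agent that showed the euro sign for that byte would be treating the document as Windows 1252.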
CSS
HTTP declarations [results]
- html utf8, css http iso1
The user agent respects the encoding of a CSS stylesheet declared in the HTTP header.
HTML4 HTML5 XHTML1.0html XHTML1.0xml XHTML5 XHTML1.1
- html utf8, css http iso15
The user agent respects the encoding of a CSS stylesheet declared in the HTTP header.
HTML4 HTML5 XHTML1.0html XHTML1.0xml XHTML5 XHTML1.1
- html iso1, css http utf8
The user agent respects the encoding of a CSS stylesheet declared in the HTTP header.
HTML4 HTML5 XHTML1.0html XHTML1.0xml XHTML5 XHTML1.1
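The HTTP declaration these tests refer to is the charset parameter of the Content-Type header. As a small illustration, the sketch below extracts that parameter with Python's standard `email.message.Message` class; the wrapper name `charset_from_content_type` is an assumption made for this example.

```python
from email.message import Message


def charset_from_content_type(header_value: str):
    """Return the lower-cased charset parameter of a Content-Type value, or None."""
    message = Message()
    message["Content-Type"] = header_value
    return message.get_content_charset()
```

Given `"text/css; charset=ISO-8859-15"` this returns `"iso-8859-15"`, and given plain `"text/css"` it returns `None`.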
@charset declarations [results]
- html utf8, css @charset iso1
The user agent respects the encoding of a CSS stylesheet declared in an @charset rule.
HTML4 HTML5 XHTML1.0html XHTML1.0xml XHTML5 XHTML1.1
- html utf8, css @charset iso15
The user agent respects the encoding of a CSS stylesheet declared in an @charset rule.
HTML4 HTML5 XHTML1.0html XHTML1.0xml XHTML5 XHTML1.1
- html iso1, css @charset utf8
The user agent respects the encoding of a CSS stylesheet declared in an @charset rule.
HTML4 HTML5 XHTML1.0html XHTML1.0xml XHTML5 XHTML1.1
link charset [results]
- html utf8, link charset iso1
The user agent respects the encoding of a CSS stylesheet declared in a charset attribute on the HTML link element.
HTML4 HTML5 XHTML1.0html XHTML1.0xml XHTML5 XHTML1.1
- html utf8, link charset iso15
The user agent respects the encoding of a CSS stylesheet declared in a charset attribute on the HTML link element.
HTML4 HTML5 XHTML1.0html XHTML1.0xml XHTML5 XHTML1.1
- html iso-8859-1, link charset utf8
The user agent respects the encoding of a CSS stylesheet declared in a charset attribute on the HTML link element.
HTML4 HTML5 XHTML1.0html XHTML1.0xml XHTML5 XHTML1.1
Inheritance from HTML encoding [results]
- html iso1
The user agent applies the encoding of the HTML file to a CSS stylesheet whose encoding is not otherwise declared.
HTML4 HTML5 XHTML1.0html XHTML1.0xml XHTML5 XHTML1.1
- html iso15
The user agent applies the encoding of the HTML file to a CSS stylesheet whose encoding is not otherwise declared.
HTML4 HTML5 XHTML1.0html XHTML1.0xml XHTML5 XHTML1.1
- html utf8
The user agent applies the encoding of the HTML file to a CSS stylesheet whose encoding is not otherwise declared.
HTML4 HTML5 XHTML1.0html XHTML1.0xml XHTML5 XHTML1.1
UTF-8 signature [results]
- html iso-8859-1, css bom
The user agent treats a CSS stylesheet that begins with a UTF-8 signature, and has no @charset rule, as UTF-8.
HTML4 HTML5 XHTML1.0html XHTML1.0xml XHTML5 XHTML1.1
- html iso-8859-15, css bom
The user agent treats a CSS stylesheet that begins with a UTF-8 signature, and has no @charset rule, as UTF-8.
HTML4 HTML5 XHTML1.0html XHTML1.0xml XHTML5 XHTML1.1
@charset and BOM [results]
- html iso-8859-1, css bom and @charset utf8 (in lowercase)
When a stylesheet has a BOM and an @charset declaration that is for the right Unicode encoding, the stylesheet works.
HTML4 HTML5 XHTML1.0html XHTML1.0xml XHTML5 XHTML1.1
- html iso-8859-1, css bom and @charset utf8 (in uppercase)
When a stylesheet has a BOM and an @charset declaration that is for the right Unicode encoding, the stylesheet works.
HTML4 HTML5 XHTML1.0html XHTML1.0xml XHTML5 XHTML1.1
- html iso-8859-15, css bom and @charset iso15
When a stylesheet has a BOM and an @charset declaration that is not for a Unicode encoding, the stylesheet fails.
HTML4 HTML5 XHTML1.0html XHTML1.0xml XHTML5 XHTML1.1
Unknown encoding [results]
- html utf-8, css @charset unknown
When a stylesheet has an @charset declaration with an unknown value, the stylesheet should be ignored.
HTML4 HTML5 XHTML1.0html XHTML1.0xml XHTML5 XHTML1.1
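What counts as an unknown value depends on the set of encoding labels a given implementation supports. As a loose analogy only, the sketch below asks Python's codec registry whether it knows a label; the helper `is_known_encoding` is hypothetical, and Python's registry is not the same as the label set of any browser.

```python
import codecs


def is_known_encoding(label: str) -> bool:
    """Report whether Python's codec registry recognizes the given label."""
    try:
        codecs.lookup(label)
    except LookupError:
        return False
    return True
```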
Typos [results]
- no semicolon at end of charset rule
If an @charset declaration is missing a final semicolon, the encoding declaration will not be recognized.
HTML4 HTML5 XHTML1.0html XHTML1.0xml XHTML5 XHTML1.1
- extra spaces after @charset
If an @charset declaration has more than one space after '@charset', the encoding declaration will not be recognized.
HTML4 HTML5 XHTML1.0html XHTML1.0xml XHTML5 XHTML1.1
- extra spaces before semicolon in charset rule
If an @charset declaration has spaces just before the semicolon, the encoding declaration will not be recognized.
HTML4 HTML5 XHTML1.0html XHTML1.0xml XHTML5 XHTML1.1
- linebreak in middle of charset rule
If an @charset declaration has a line break in the middle, the encoding declaration will not be recognized.
HTML4 HTML5 XHTML1.0html XHTML1.0xml XHTML5 XHTML1.1
- single quotes around charset name
If an @charset declaration value has single, rather than double, quotes around it, the encoding declaration will not be recognized.
HTML4 HTML5 XHTML1.0html XHTML1.0xml XHTML5 XHTML1.1
- blank line before @charset
If an @charset declaration is not on the first line of the file, the encoding declaration will not be recognized.
HTML4 HTML5 XHTML1.0html XHTML1.0xml XHTML5 XHTML1.1
- blank spaces before @charset
If an @charset declaration does not start at the beginning of the first line of the file (when there is no BOM), the encoding declaration will not be recognized.
HTML4 HTML5 XHTML1.0html XHTML1.0xml XHTML5 XHTML1.1
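The typo tests above all follow from reading the @charset rule as an exact byte pattern at the very start of the file. A minimal sketch of such a strict check follows. The name `sniff_at_charset` is an assumption for this example, BOM handling is left out, and the encoding name is restricted to printable ASCII without spaces as a simplification.

```python
import re

# Exact form: @charset, one space, a double-quoted name, then a semicolon.
_AT_CHARSET = re.compile(rb'@charset "([!#-~]+)";')


def sniff_at_charset(stylesheet: bytes):
    """Return the lower-cased name from a strictly formed leading @charset rule, or None."""
    match = _AT_CHARSET.match(stylesheet)  # match() anchors at the first byte
    if match is None:
        return None
    return match.group(1).decode("ascii").lower()
```

With this pattern, `b'@charset "utf-8";'` yields `"utf-8"`, while a missing semicolon, a second space, single quotes, a line break inside the rule, or anything before the `@` all yield `None`.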
Precedence [results]
- http vs. @charset
An HTTP encoding declaration for a stylesheet takes precedence over an @charset declaration.
HTML4 HTML5 XHTML1.0html XHTML1.0xml XHTML5 XHTML1.1
- http vs. charset link
An HTTP encoding declaration for a stylesheet takes precedence over a charset attribute on the link element.
HTML4 HTML5 XHTML1.0html XHTML1.0xml XHTML5 XHTML1.1
- http vs. bom
An HTTP encoding declaration for a stylesheet takes precedence over a UTF-8 signature.
HTML4 HTML5 XHTML1.0html XHTML1.0xml XHTML5 XHTML1.1
- @charset vs. link charset
An @charset declaration in a stylesheet takes precedence over a charset attribute on the link element.
HTML4 HTML5 XHTML1.0html XHTML1.0xml XHTML5 XHTML1.1
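Taken together, the CSS tests imply an order in which the sources of encoding information are consulted. The sketch below models that order as the tests describe it. The function `stylesheet_encoding` and its parameters are hypothetical, and the handling of a BOM that disagrees with @charset is a simplification of the "@charset and BOM" tests (any mismatch is treated as failure).

```python
def stylesheet_encoding(http=None, bom=None, at_charset=None,
                        link_charset=None, document=None):
    """Pick the stylesheet encoding the tests above would expect.

    Returns None when the stylesheet is expected to fail, or when no
    source of encoding information is available at all.
    """
    if http:
        return http.lower()  # HTTP outranks @charset, link charset and BOM
    if bom:
        # Assumption: a BOM combined with a different @charset means failure.
        if at_charset and at_charset.lower() != bom.lower():
            return None
        return bom.lower()
    # Otherwise: @charset, then the link charset attribute, then the
    # encoding inherited from the HTML document.
    for candidate in (at_charset, link_charset, document):
        if candidate:
            return candidate.lower()
    return None
```

For example, `stylesheet_encoding(at_charset="utf-8", link_charset="iso-8859-1")` returns `"utf-8"`, and `stylesheet_encoding(document="iso-8859-1")` returns `"iso-8859-1"`, matching the inheritance tests.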