character sets RE: simple language testable thing

The default character set for HTML is undefined, and the HTML 4 spec says "user
agents must not assume any default":
http://www.w3.org/TR/html4/charset.html#h-5.2.2

HTTP claims that the default is ISO-8859-1, which is a pain, because for XML
the default is Unicode (UTF-8 or UTF-16). This sets up a conflict :(
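
As a rough illustration of the "no default" rule, here is a minimal Python
sketch that reads the charset parameter from a Content-Type value and reports
None when it is absent, rather than falling back to ISO-8859-1 or anything
else. The function name declared_charset is made up for this example; the
parsing is borrowed from the standard library's email package, which is an
assumption about tooling and not something the specs prescribe.

```python
from email.message import Message


def declared_charset(content_type):
    """Return the charset parameter of a Content-Type value, or None if absent.

    Hypothetical helper: it deliberately supplies no default, so the caller
    has to decide what to do when nothing was declared.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() lowercases the name and returns None when the
    # charset parameter is missing.
    return msg.get_content_charset()


print(declared_charset("text/html; charset=ISO-8859-1"))  # iso-8859-1
print(declared_charset("text/html"))                      # None
```

A caller that gets None back still has to guess or look elsewhere (for
example a META element or an XML declaration), which is exactly where the
conflict shows up.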

This is a slightly messy issue, but it is possible to detect when characters
outside the range normally used for a particular language are being used. HTML
allows user agents to guess the character set, without specifying any set of
reasonable guesses. Unfortunately most people get it wrong, asserting for
example that American English can be written in ASCII (which is false), and
the tools are still not that great either.
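
To make the "characters outside the range" check concrete, here is a small
Python sketch. It lists the characters in a piece of text that a given
character set cannot represent. The function name chars_outside and the
sample sentence are invented for illustration, and this only tests against an
encoding's repertoire; it says nothing about which language the text is in.

```python
def chars_outside(text, encoding):
    """Return the distinct characters in text that encoding cannot represent,
    in order of first appearance."""
    outside = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            if ch not in outside:
                outside.append(ch)
    return outside


# Ordinary American English text (made-up sample) that does not fit in ASCII.
sample = "She sent her résumé for the café job: “urgent”, €20 an hour."

print(chars_outside(sample, "ascii"))       # ['é', '“', '”', '€']
print(chars_outside(sample, "iso-8859-1"))  # ['“', '”', '€']
```

Even ISO-8859-1 fails on the typographic quotes and the euro sign, so a
wrong guess about the character set can go unnoticed until such characters
turn up.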

I hope Richard Ishida or someone else well versed in these issues can help
clarify.

Cheers

Chaals

On Wed, 4 Feb 2004, Jens Meiert wrote:

>
>> In Hebrew (for once) this is easy.
>> A foreign word is written in a different character set.
>
>CMIIW, but since the UCS (Universal Character Set, often referred to as
>Unicode) is the document character set for HTML/XML, they (foreign words) ain't
>written in a different character set.
>
>Again referring to John (see my last post [1]) I claim this is an issue
>where unimpaired users are affected as well. Also, I don't see any need for
>ruling language use by the WAI WG (there already was such a discussion a few
>months ago [2] ;).
>
>
>All the best,
> Jens.
>
>
>[1] http://lists.w3.org/Archives/Public/w3c-wai-gl/2004JanMar/0169.html
>[2] http://lists.w3.org/Archives/Public/w3c-wai-gl/2003OctDec/0411.html
>
>
>--
>Jens Meiert
>Interface Architect
>
>http://meiert.com/
>

Charles McCathieNevile  http://www.w3.org/People/Charles  tel: +61 409 134 136
SWAD-E http://www.w3.org/2001/sw/Europe         fax(france): +33 4 92 38 78 22
 Post:   21 Mitchell street, FOOTSCRAY Vic 3011, Australia    or
 W3C, 2004 Route des Lucioles, 06902 Sophia Antipolis Cedex, France

Received on Wednesday, 4 February 2004 05:36:05 UTC