RE: [selectors-api] Selectors API I18N Review... from Richard Ishida on 2009-01-30 (public-i18n-core@w3.org from January to March 2009)

From: Richard Ishida <ishida@w3.org>
Date: Fri, 30 Jan 2009 13:53:53 -0000
To: "'Martin Duerst'" <duerst@it.aoyama.ac.jp>, "'Phillips, Addison'" <addison@amazon.com>, <public-i18n-core@w3.org>
Cc: "'fantasai'" <fantasai.lists@inkedblade.net>, "'Lachlan Hunt'" <lachlan.hunt@lachy.id.au>
Message-ID: <003701c982e2$30898b30$919ca190$@org>

Hi Martin,

> -----Original Message-----
> From: Martin Duerst [mailto:duerst@it.aoyama.ac.jp]
> Sent: 30 January 2009 09:11
...
> At 03:32 09/01/30, Richard Ishida wrote:
> >
> >Following on from our discussion at yesterday's telecon, I did some
> >research into whether major browsers actually do normalise selector and
> >class names for matching.  The answer is that they don't.
> 
> I could have told you. The visibility of this issue is extremely low.
> It only applies to languages such as Vietnamese, where both
> precomposed and (half-)decomposed forms are widely used,
> and only if element or attribute names use these characters,
> which by itself is very rare
> (there is also the attribute value case, but that's still
> not very widely supported in browsers as far as I understand).

To be honest, I wasn't particularly concerned with the case where element and
attribute names are involved.  The case for class names and ids is much more
pressing (and that's what the tests revolve around), and in my mind those
are very likely to be written in the author's own language, and will not be
rare at all as the Web rolls out internationally.  And the 83 million
inhabitants of Vietnam are not the only people who face this issue.  There
are many languages that use combining characters, including the Latin-script
languages of Africa and aboriginal North America, most scripts of Asia,
etc., and one can't guarantee that the input methods used for those
languages will always create text in one given form vis-à-vis normalization.
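To make this concrete: the Vietnamese word "Việt" can be typed either with a single precomposed letter or as a base letter followed by combining marks. The minimal Python sketch below uses only the standard unicodedata module (the variable names are illustrative). It shows that the two spellings differ code point by code point but become identical once both are normalized to NFC:

```python
import unicodedata

# Two spellings of "Việt" that different input methods might produce.
precomposed = "Vi\u1ec7t"        # U+1EC7 LATIN SMALL LETTER E WITH CIRCUMFLEX AND DOT BELOW
decomposed = "Vie\u0323\u0302t"  # e + U+0323 COMBINING DOT BELOW + U+0302 COMBINING CIRCUMFLEX ACCENT

print(len(precomposed), len(decomposed))  # 4 6
print(precomposed == decomposed)          # False: raw code-point comparison
print(unicodedata.normalize("NFC", precomposed)
      == unicodedata.normalize("NFC", decomposed))  # True: canonically equivalent
```

A matcher that compares code points without normalizing, as the browsers tested here do, treats these as two different class names.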

...
> The 'different people working on the CSS and the markup' may indeed
> be a possible scenario, and things could go wrong in particular if
> e.g. the CSS designers work on a Mac and the text is prepared on
> Windows, but then developers in Vietnam should be aware of this
> issue, they will bump into it much earlier, e.g. when doing text
> searching in editors,... My guess is that information on this
> is also available rather easily in Vietnamese, for English,
> see e.g. http://vietunicode.sourceforge.net/main.html.

It's one thing to be aware of the problem, but another to be able to deal
with it.  First, if someone else is writing the CSS and you are writing the
HTML, you have to know something about normalization.  Then you have to
work out what approach the CSS guy used for class names (which could even
vary from name to name depending on the input method), and finally you have
to match his method.

That means that if you're using a Mac and he was using Windows, you'll need
to convert your names to a partially normalized form.  That requires an
additional level of fiddling, assuming you can find a way to actually
achieve it: you might need a different input method, or need to change the
settings of your editor, and if the CSS text isn't consistent in the way
combining characters are used or ordered, this could be much more
problematic.
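The "partially normalized" form is harder still, because none of the standard Unicode normalization forms (NFC, NFD, NFKC, NFKD) produces it.  Some older Vietnamese conventions, such as the Windows Vietnamese code page (windows-1258), broadly keep vowel marks precomposed and encode tone marks as separate combining characters.  The Python sketch below is a simplified, hypothetical model of that convention.  The helper to_half_decomposed and the TONE_MARKS set are assumptions for illustration, not an exact reproduction of any code page.  It gives an idea of the conversion an HTML author would otherwise have to perform by hand:

```python
import unicodedata

# Assumed set of Vietnamese tone marks as combining characters:
# grave, acute, tilde, hook above, dot below.
TONE_MARKS = {"\u0300", "\u0301", "\u0303", "\u0309", "\u0323"}

def to_half_decomposed(text: str) -> str:
    """Hypothetical helper: keep vowel marks precomposed, emit tone marks as combining characters."""
    out = []
    for ch in unicodedata.normalize("NFC", text):
        parts = unicodedata.normalize("NFD", ch)
        tones = "".join(c for c in parts if c in TONE_MARKS)
        base = "".join(c for c in parts if c not in TONE_MARKS)
        # Recompose what is left (e.g. e + U+0302 -> ê), then append the tone mark(s).
        out.append(unicodedata.normalize("NFC", base) + tones)
    return "".join(out)

half = to_half_decomposed("Vi\u1ec7t")
print([f"U+{ord(c):04X}" for c in half])
# ['U+0056', 'U+0069', 'U+00EA', 'U+0323', 'U+0074']
print(unicodedata.normalize("NFC", half) == "Vi\u1ec7t")  # True
```

The result is canonically equivalent to the precomposed spelling, yet it differs from both the NFC and the NFD forms.  An author matching such names without normalization has to reproduce exactly this sequence.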

Highly technical people like you may be able to figure all this out, but CSS
and HTML aren't designed to be used just by highly technical people.
Secondly, you shouldn't have to examine bytes to write CSS; this is just a
nuisance when all you want is to type the same word as the other guy did.
That's what normalization is for: recognising that canonically equivalent
text is actually the same.  Normalizing the data before lookup would remove
all those issues.
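As a minimal sketch of "normalizing before lookup", here is a toy Python matcher for a class selector.  The function matches_class and its normalize flag are hypothetical, and the sketch assumes NFC as the target form.  It also ignores quirks-mode case-insensitivity and everything else a real selector engine does:

```python
import re
import unicodedata

# HTML splits class attributes on ASCII whitespace (space, tab, LF, FF, CR).
_ASCII_WS = re.compile(r"[ \t\n\f\r]+")

def matches_class(selector_class: str, class_attr: str, normalize: bool = True) -> bool:
    """Hypothetical matcher: does `.selector_class` match an element with this class attribute?"""
    tokens = [t for t in _ASCII_WS.split(class_attr) if t]
    if not normalize:
        # Plain code-point comparison, the behaviour the browsers tested actually showed.
        return selector_class in tokens
    # Normalize both sides to the same form (NFC here) before comparing.
    wanted = unicodedata.normalize("NFC", selector_class)
    return any(unicodedata.normalize("NFC", t) == wanted for t in tokens)

css_name = "Vi\u1ec7t"              # typed precomposed in the stylesheet
html_attr = "menu Vi\u00ea\u0323t"  # typed half-decomposed in the markup
print(matches_class(css_name, html_attr, normalize=False))  # False
print(matches_class(css_name, html_attr))                   # True
```

Which form is chosen matters less than applying the same one to both the selector and the document.  Either NFC or NFD would serve for canonical equivalence, provided both sides get the same treatment.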

Cheers,
RI

Received on Friday, 30 January 2009 13:53:57 UTC