RE: CSS2.1 :lang

At 11:40 03/10/17 +0200, Bert Bos wrote:

>If I understand Richard correctly, he is suggesting that the CSS
>':lang()' selector is treated semantically rather than syntactically.
>In other words, ':lang(en)' means "English," not "a string starting
>with 'en'". That's interesting, but I think it will be too complex.
>Consider this XML-based language, that allows text either in French
>(0) or English (1):
>
>     <MYLITTLELANGUAGE>
>       <WORD LANG="0">arbre</WORD>
>       <WORD LANG="1">tree</WORD>
>     </MYLITTLELANGUAGE>
>
>Then this style rule would turn the word "tree" green:
>
>     WORD:lang(en) { color: green }

I agree that this would be too complex. But you seem to say
that in the above example,

      WORD:lang(0) { color: green }

would turn the word 'tree' green. The current wording could indeed
be interpreted in that way, but it would be a rather bad idea.

Regarding the following text:

"For example, in HTML [HTML40], the language is determined by a combination 
of the "lang" attribute, the META element, and possibly by information from 
the protocol (such as HTTP headers). XML uses an attribute called xml:lang, 
and there may be other document language-specific methods for determining 
the language."

Rather than leaving the association with HTML and XML as examples in
an introductory/explanatory paragraph, the spec should say very clearly
that :lang is matched against the lang attribute in HTML, and against
the xml:lang attribute in XML. It may also say that other document
formats may use :lang, but that they have to define how it applies to
their format. This should be worded so that it does not give the
impression that this includes XML-based languages (where we have
xml:lang and don't need something else on top of it).

With regard to HTML, the current text mentions META, but I think
this should be removed. The META http-equiv mechanism was designed to
generate HTTP headers, but has never been used that way. The only
place I know of where it is used these days is for 'charset'.

The above example should of course be rewritten to read:

     <MYLITTLELANGUAGE>
       <WORD xml:lang='fr'>arbre</WORD>
       <WORD xml:lang='en'>tree</WORD>
     </MYLITTLELANGUAGE>

This makes the example pointless for the point we are discussing,
but actually useful for styling.
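
To illustrate how a processor could find the language of each WORD in
the xml:lang version of the example, here is a minimal Python sketch. It
is only an illustration, not how any CSS implementation is known to work.
The helper name words_in_language is hypothetical, the comparison is a
plain case-insensitive equality test (no prefix matching), and I assume
that an element without xml:lang takes the value of its nearest ancestor.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

DOC = """<MYLITTLELANGUAGE>
  <WORD xml:lang='fr'>arbre</WORD>
  <WORD xml:lang='en'>tree</WORD>
</MYLITTLELANGUAGE>"""


def words_in_language(element, wanted, inherited=""):
    """Return the text of WORD elements whose effective xml:lang equals `wanted`.

    Assumption: an element without xml:lang takes the language of its
    nearest ancestor that has one (passed down as `inherited`).
    """
    lang = element.get(XML_LANG, inherited)
    hits = []
    if element.tag == "WORD" and lang.lower() == wanted.lower():
        hits.append(element.text)
    for child in element:
        hits.extend(words_in_language(child, wanted, lang))
    return hits


matches = words_in_language(ET.fromstring(DOC), "en")
```

With the document above, only the second WORD element is selected for
'en', which is what WORD:lang(en) is meant to express.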

>Wouldn't it be better to simply *recommend* that developers use codes
>as per RFC 3066, even if they only need two languages?

I think the best thing is to say that both HTML (although the spec
hasn't been updated yet) and XML use RFC 3066.
I do not think that the CSS spec should give recommendations to
developers of new document languages.


>How about the text I proposed earlier, but with an additional note
>(i.e., not normative):
>
>     The pseudo-class ':lang(C)' matches if the element is in language
>     C. CSS doesn't define what are valid language names and the string
>     C doesn't have to be a valid language name in the source document.
>     It is matched the same way as for the '|=' operator.

Actively suggesting "doesn't have to be a valid language name"
will lead people to bad ideas that we don't want them to have.
Better: "CSS doesn't define what are valid language names; this
is defined by the host language."
(I hope 'host language' is the right term here.)
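
For readers unfamiliar with the '|=' matching referred to in the quoted
text, it can be sketched roughly as follows: the value matches if it is
exactly C, or if it starts with C immediately followed by a hyphen. The
function name lang_matches is hypothetical, and comparing case-
insensitively is my assumption here, based on RFC 3066 tags being
case-insensitive.

```python
def lang_matches(element_lang, selector_arg):
    """Sketch of '|='-style matching for :lang(C).

    True if element_lang equals selector_arg, or begins with
    selector_arg followed immediately by '-'.
    Assumption: comparison is case-insensitive, as RFC 3066 tags are.
    """
    value = element_lang.lower()
    wanted = selector_arg.lower()
    return value == wanted or value.startswith(wanted + "-")
```

Under this sketch, lang_matches("en-US", "en") holds, while
lang_matches("eng", "en") does not, because "eng" merely starts with
"en" without a hyphen following it.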


>     Note: It is recommended, however, that documents and protocols
>     indicate language using codes from RFC 3066 [RFC3066] or its
>     successor, and by means of "xml:lang" attributes in the case of
>     XML-based documents [XML].

This sounds good, but is dangerous. XML defines that xml:lang is
used for XML documents. CSS should just say that when styling
XML documents, :lang applies to xml:lang, which uses RFC 3066.
There are occasionally cases where other attributes are used to
indicate language in an XML format, but CSS implementations
just don't know about them.


Regards,    Martin.

Received on Friday, 17 October 2003 14:39:10 UTC