RE: CSS2.1 :lang

Richard Ishida writes:
> Bert Bos writes:
> > Tex Texin writes:
> > > For the purposes of matching, I wonder if it makes sense to reference
> > > the RFCs at all. Isn't it really string matching based on strings
> > > formatted with hyphen separators? Does any software verify that the
> > > language tag contains appropriately registered codes or uses ISO
> > > codes? Should it be an error, or perhaps the rule ignored, if a CSS
> > > document specifies :lang(k9), since k9 is not an official language
> > > code or a properly formatted private code?
> > 
> > I like that suggestion: it removes a dependency.
> > 
> > The definition of the "|=" operator is already generic. It 
> > only requires a UA to split a string value at every "-" and 
> > doesn't require the string to be a valid language. The 
> > ':lang()' refers to that definition and could be made generic 
> > as well, e.g.:
> > 
> > Current text in 5.11.4:
> > 
> >     The pseudo-class ':lang(C)' matches if the element is in language
> >     C. Here C is a language code as specified in HTML 4.0 [HTML40] and
> >     RFC 1766 [RFC1766]. It is matched the same way as for the '|='
> >     operator.
> > 
> > Proposed:
> > 
> >     The pseudo-class ':lang(C)' matches if the element is in language
> >     C. CSS doesn't define what are valid language names and the string
> >     C doesn't have to be a valid language name in the source document.
> >     It is matched the same way as for the '|=' operator.
> 
> 
> I disagree with this proposed para.  I think you are throwing out the
> baby with the bath water.  
> 
> I see the value of referring to RFC3066 is to ensure maximum
> standardisation/interoperability in the way language codes are used.
> For example, 3066 requires the use of 2-letter codes rather than
> 3-letter codes wherever they exist.  This is important advice for
> interoperability. 3066 also says that you should use ISO codes rather
> than some arbitrary label where it exists. Etc.
> 
> I think the original text was defining how one should label languages in
> CSS, not just how the matching should work.  And I think it is important
> to retain the former, though the text could certainly be reworded so as
> to separate the two ideas, remove the HTML reference and refer to
> RFC3066.

If I understand Richard correctly, he is suggesting that the CSS
':lang()' selector be treated semantically rather than syntactically.
In other words, ':lang(en)' means "English," not "a string starting
with 'en'". That's interesting, but I think it will be too complex.
Consider this XML-based language, which allows text in either French
(0) or English (1):

    <MYLITTLELANGUAGE>
      <WORD LANG="0">arbre</WORD>
      <WORD LANG="1">tree</WORD>
    </MYLITTLELANGUAGE>

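Under the semantic reading, this style rule would have to turn the
word "tree" green, which means the UA must somehow know that LANG="1"
stands for English: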

    WORD:lang(en) { color: green }

Wouldn't it be better to simply *recommend* that developers use codes
as per RFC 3066, even if they only need two languages?
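
To make the syntactic reading concrete, here is a minimal Python sketch
of it. It is only an illustration, not text from any spec or code from
any UA. It assumes that LANG is the attribute carrying the language in
the made-up format above and that comparison is case-sensitive; the
names dash_match and words_matching are hypothetical helpers.

```python
import xml.etree.ElementTree as ET

# The made-up document from the example above.
DOC = """
<MYLITTLELANGUAGE>
  <WORD LANG="0">arbre</WORD>
  <WORD LANG="1">tree</WORD>
</MYLITTLELANGUAGE>
"""


def dash_match(value, c):
    """'|='-style match: value equals c, or starts with c plus a hyphen."""
    return value == c or value.startswith(c + "-")


def words_matching(doc, c):
    """Return the text of every WORD whose LANG value dash-matches c.

    Assumption: LANG is where this format stores the language.
    """
    root = ET.fromstring(doc)
    return [w.text for w in root.iter("WORD")
            if dash_match(w.get("LANG", ""), c)]


print(words_matching(DOC, "en"))   # no WORD is labelled "en"
print(words_matching(DOC, "1"))    # only the WORD labelled "1"
print(dash_match("en-GB", "en"))   # a subtag after a hyphen still matches
```

With plain string matching, ':lang(en)' selects nothing in this
document and ':lang(1)' selects "tree". Getting ':lang(en)' to select
"tree" would need an extra table mapping "1" to English, and CSS has
no place to put one.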

How about the text I proposed earlier, but with an additional note
(i.e., not normative):

    The pseudo-class ':lang(C)' matches if the element is in language
    C. CSS doesn't define what are valid language names and the string
    C doesn't have to be a valid language name in the source document.
    It is matched the same way as for the '|=' operator.

    Note: It is recommended, however, that documents and protocols
    indicate language using codes from RFC 3066 [RFC3066] or its
    successor, and by means of "xml:lang" attributes in the case of
    XML-based documents [XML]. See "FAQ: Two-letter or three-letter
    language codes."[1]

    [1] http://www.w3.org/International/questions/qa-lang-2or3.html

(replaces the 2nd para in http://www.w3.org/TR/CSS21/selector.html#lang)



Bert
-- 
  Bert Bos                                ( W 3 C ) http://www.w3.org/
  http://www.w3.org/people/bos/                              W3C/ERCIM
  bert@w3.org                             2004 Rt des Lucioles / BP 93
  +33 (0)4 92 38 76 92            06902 Sophia Antipolis Cedex, France

Received on Friday, 17 October 2003 05:40:37 UTC