Re: CSS2.1 :lang from Jukka K. Korpela on 2003-10-17 (www-style@w3.org from October 2003)

From: Jukka K. Korpela <jkorpela@cs.tut.fi>
Date: Fri, 17 Oct 2003 11:48:41 +0300 (EEST)
To: Chris Lilley <chris@w3.org>
Cc: Bert Bos <bert@w3.org>, Tex Texin <tex@i18nguy.com>, www-style@w3.org, W3c I18n Group <w3c-i18n-ig@w3.org>
Message-ID: <Pine.GSO.4.58.0310171106140.15855@korppi.cs.tut.fi>
On Thu, 16 Oct 2003, Chris Lilley wrote:

> JKK> Anyway, what the XML specification says about the xml:lang attribute is
> JKK> that "The values of the attribute are language identifiers as defined by
> JKK> [IETF RFC 1766], Tags for the Identification of Languages, or its
> JKK> successor on the IETF Standards Track."
>
> Please also look at the XML 1.0 errata, and the XML 1.1 specification.

Good grief. I thought that it was unique to CSS specifications to make
changes in an "Errata", but the XML 1.0 "Errata" is apparently similar.
We have been given a _specification_ that is officially approved by the
W3C, containing a reference to an Errata, which says:
"This document records all known errors in - -"
but actually contains substantial _changes_ to the content of the
specification. It is left to readers to distinguish between typo fixes,
wording clarifications, and material changes.

So people who naively think they are reading the official specification
will be misled. The specification may change at any moment, just by a
change to the "Errata", with no announcement before or after. And we don't
even have a copy of the specification as changed by the "Errata".
Yet the specification claims:
"It is a stable document and may be used as reference material or cited as
a normative reference from another document."

And there is no XML 1.1 specification. (There is a candidate dated
15 October 2002; it says: "It is inappropriate to cite this document as
other than 'work in progress.'")

> JKK> I see no way how an empty string
> JKK> could be interpreted as an accepted value for the attribute.
>
> I do, but then I am reading later specs than you seem to be.

I was reading the document that is announced by the W3C as a
specification.

> JKK> By the HTML 4.* specification,
>
> (who cares!) it's being phased out in favour of the one that the rest
> of xml uses.

I do care. HTML 4 is the only specification for the semantics of HTML
elements and attributes; XHTML 1.0 is just what it says (though the hype says
otherwise): a reformulation in XML or, rather, a reformulation of the
_syntax_ of HTML 4.

> JKK> the default value of the lang attribute is unknown. This is
> JKK> really mystical, but it seems to postulate that there _is_ a
> JKK> default value.
>
> One which was not possible to put in the serialisation, so yes
> previously rather mystical. In particular, once it was set on some
> element, it could not be unset on any children. That's what xml:lang=""
> does.

Why would it need to be unset? You can use either an appropriate language
code, or one of the indicators "und" and "mul". The argumentation in the
XML 1.0 "errata" is very obscure - it looks like they decided on "" and
then tried to explain why it was needed. If there was a need for yet
another special code, it should have been formulated and proposed in the
appropriate process. But there wasn't; "und" is perhaps not optimally
clearly defined in ISO 639-2, but it's there for uses just like this.
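
To make the two alternatives concrete, here is a minimal Python sketch of
how a processor might compute the language in effect for each element. It
assumes that xml:lang is inherited by child elements and that the empty
value cancels the inherited language, as described above; "und" is treated
as an ordinary code. The function name effective_languages and the sample
document are hypothetical, invented for illustration only.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def effective_languages(root):
    """Return (tag, language) pairs; language is None when none applies."""
    def walk(elem, inherited):
        declared = elem.get(XML_LANG)
        if declared is None:
            lang = inherited      # no attribute: inherit from the parent
        elif declared == "":
            lang = None           # empty value: cancel the inherited language
        else:
            lang = declared       # any other value, including "und"
        yield elem.tag, lang
        for child in elem:
            yield from walk(child, lang)
    return list(walk(root, None))


# Hypothetical sample: Finnish text containing a CSS property name.
DOC = """<doc xml:lang="fi">
  <p>suomea</p>
  <code xml:lang="">vertical-align</code>
  <q xml:lang="und">???</q>
</doc>"""

result = effective_languages(ET.fromstring(DOC))
```

Under these assumptions the only observable difference is that the element
with xml:lang="" ends up with no language at all, while the one with
xml:lang="und" carries an explicit "undetermined" code.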

> JKK> In practical terms, :lang is pointless until support for language markup
> JKK> in browsers becomes worth mentioning.
>
> I don't follow your point, unless you think that xml:lang is solely something
> to do with styling.

I was referring to :lang selectors in CSS. Sorry for not being clear
enough here.
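
For reference, the matching that :lang(C) performs can be sketched roughly
as follows. This is a minimal Python illustration, assuming the same prefix
rule as the CSS 2 attribute selector [att|=val] (the element's language
either equals C or begins with C followed by a hyphen) and case-insensitive
comparison of language tags. lang_matches is a hypothetical helper, not
part of any browser or library.

```python
def lang_matches(element_lang, selector_lang):
    """Return True if an element in element_lang would match :lang(selector_lang)."""
    if not element_lang:
        return False              # no known language: nothing can match
    value = element_lang.lower()
    wanted = selector_lang.lower()
    # Exact match, or the wanted tag followed by a hyphen and a subtag.
    return value == wanted or value.startswith(wanted + "-")
```

So :lang(fr) would apply to content marked fr or fr-CA, but not to fra, and
not to anything whose language is unknown.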

> It's not; it's also of use for searching, spell
> checking, speech synthesis, and so forth.

I know the arguments. Yet, actual use of lang and xml:lang attributes is
very limited, and partly _wrong_. Try using lang="ru" for transliterated
Russian text and view the page on IE and you will probably see what I mean.
(It is a fundamental flaw in language markup that there is no way to
indicate the writing system. But language does not change when the letters
are transliterated, does it?)

> JKK>  Since the whole point in CSS 2.1
> JKK> is to define a practical subset of CSS 2.0, I don't see why :lang is kept
> JKK> there at all.
>
> Possibly because, at least in theory, CSS2.1 is not restricted to
> buggy HTML browsers that have not changed much over the last 4 years.
> Instead, it's all CSS implementations.

Really? So what is the point of CSS 2.1 then? Why have so many CSS 2.0
features been removed from it?

> JKK> Besides, the actual meaning of language markup is still obscure.
> JKK> The whole thing is vaguely defined, little used, and little
> JKK> supported,
>
> I invite you to back up those claims.

OK, see http://www.cs.tut.fi/~jkorpela/kielimerkkaus/
It's in Finnish, so it might not be optimally accessible to you.
Just to summarize a few points:
- the writing system problem I mentioned above
- the conflicts between the various meanings and purposes of language
  markup; example: if a document (in a language other than English)
  discusses CSS and mentions, say, the property name vertical-align,
  should it be marked up as being in English (thereby making suitable
  pronunciation possible, but confusing spelling and grammar checkers,
  since it does not really obey normal English rules)?
- how do you deal with words and expressions that are commonly
  used in other languages - is "fiancé", when used in English text,
  a French word? what about "status quo"?
  (such problems don't exist when language codes are used e.g.
  for bibliographic purposes; but as you get down to individual
  words and even morphemes, marking up _all_ language changes as
  WCAG 1.0 requires, it's a huge conceptual problem, in addition
  to being quite some work in practice)
- what do you do with words that contain parts from different
  languages?
- how do you declare the language of data in attributes (e.g.
  title="..." attributes), as required by WCAG 1.0?
- by W3C example, names are not marked up as being in their
  respective languages; what might justify this, in the light
  of reasons presented for language markup in general?

-- 
Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/
Received on Friday, 17 October 2003 04:52:30 UTC