xs:language and XML 1.03e/RFC 3066bis

We need to open an issue for 2e and 1.1 regarding (possible) changes to xs:language necessary to align with XML 1.03e [1] and RFC 3066bis [2].

The easy one is that XML 1.03e allows the empty string as a legal value for xml:lang and interprets it as "forget any info you knew about language from an ancestor" (note: 2e and before never explicitly allowed it but gave that impression).  The definition we give for xs:language is:

<xs:simpleType name="language" id="language">
   <xs:restriction base="xs:token">
      <xs:pattern value="[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*"/>
   </xs:restriction>
</xs:simpleType>

So, to allow for the empty string we at least need to change this to:

<xs:simpleType name="language" id="language">
   <xs:restriction base="xs:token">
      <xs:pattern value="([a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*)?"/>
   </xs:restriction>
</xs:simpleType>
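
As a quick sanity check of the two patterns above, here is a minimal sketch in Python. It assumes that Python's re.fullmatch is a close enough stand-in for XSD pattern matching for these two expressions (XSD patterns are implicitly anchored at both ends); the helper name and the sample values are purely illustrative and not part of any spec.

```python
import re

# The two xs:pattern values from above, copied verbatim.
# Assumption: re.fullmatch approximates XSD's implicitly anchored matching.
OLD_PATTERN = re.compile(r"[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*")
NEW_PATTERN = re.compile(r"([a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*)?")


def matches(pattern, value):
    """Return True if the whole value matches the pattern (hypothetical helper)."""
    return pattern.fullmatch(value) is not None


if __name__ == "__main__":
    # Illustrative sample values only.
    for sample in ["", "en", "en-US", "x-klingon", "en_US", "abcdefghi"]:
        print(repr(sample), matches(OLD_PATTERN, sample), matches(NEW_PATTERN, sample))
```

Under that assumption, the only sample on which the two patterns differ is the empty string: the current pattern rejects it and the revised one accepts it.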

The second issue is that RFC 3066 is being revised as we speak; the current draft of 3066bis is at [2].  3066bis puts a number of constraints on the subtags.  There has been some discussion with i18n regarding how exact the lexical/value space of xs:language should be made relative to the EBNF in 3066bis [3,4].  At the very least they would like us to change our reference to "3066 or successors".

pvb

[1] http://www.w3.org/TR/2004/REC-xml-20040204/#sec-lang-tag
[2] http://www.ietf.org/internet-drafts/draft-phillips-langtags-05.txt
[3] http://lists.w3.org/Archives/Member/w3c-xml-schema-ig/2004Jul/0057.html
[4] http://lists.w3.org/Archives/Member/w3c-xml-schema-ig/2004Jul/0096.html

Received on Monday, 16 August 2004 22:35:41 UTC