This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.
Section 3.4.3, 1st para: "language ... as defined by [RFC 3066] or its successor(s) in the IETF Standards Track." RFC 3066 is *not* on the IETF Standards Track; it is BCP 47. Its successor is now approved but not yet published (it has no RFC number yet and is known as 3066bis). We recommend changing the reference to BCP 47 (http://www.rfc-editor.org/rfc/bcp/bcp47.txt), which will in due time point to the new RFC. This would require no other change to that section, but would ensure that the enriched semantics provided by 3066bis and the new language subtag registry (which is already live) are taken into account.
Thank you for the assistance. It looks as if the reference to BCP 47 will do exactly what we need. One question does arise, though, as I look at BCP 47 / RFC 4646. I note that it provides an ABNF definition for the syntax of well-formed language tags which is significantly more restrictive than that of RFC 3066, which is repeated in the draft under review. Should XML Schema 1.1 define the lexical space of xsd:language using the grammar of RFC 3066, or that of RFC 4646? Or something else? Does RFC 4646 allow itself to be more restrictive than RFC 3066 because there is some evidence that the extra restrictions will not actually invalidate existing data? Or would we risk inconveniencing users of XML Schema, and of schemas written with XML Schema, if we shifted to the more restrictive grammar? Speaking for myself, I think the XML Schema WG would benefit from knowing your views and those of the i18n WG on this matter. Thanks.
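To make the difference between the two grammars concrete, here is a rough Python sketch that transcribes each ABNF into a regular expression. The RFC 3066 expression (Primary-subtag = 1*8ALPHA, Subtag = 1*8(ALPHA / DIGIT)) has the same shape as the simple pattern in the draft. The RFC 4646 expression is a hand transcription of its langtag, privateuse, and grandfathered productions, offered as an illustrative assumption rather than a normative rendering; the constant and function names are hypothetical, and neither check consults the registry.

```python
import re

# RFC 3066: Primary-subtag = 1*8ALPHA; Subtag = 1*8(ALPHA / DIGIT).
# Same shape as the simple xsd:language pattern in the draft.
RFC3066_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")

# Hand transcription (assumption, not normative) of the RFC 4646 ABNF.
LANGUAGE = r"(?:[A-Za-z]{2,3}(?:-[A-Za-z]{3}){0,3}|[A-Za-z]{4}|[A-Za-z]{5,8})"
SCRIPT = r"[A-Za-z]{4}"
REGION = r"(?:[A-Za-z]{2}|[0-9]{3})"
VARIANT = r"(?:[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3})"
# Extension singletons: any letter or digit except x/X (reserved for private use).
EXTENSION = r"(?:[0-9A-WYZa-wyz](?:-[A-Za-z0-9]{2,8})+)"
PRIVATEUSE = r"(?:[xX](?:-[A-Za-z0-9]{1,8})+)"
GRANDFATHERED = r"(?:[A-Za-z]{1,3}(?:-[A-Za-z0-9]{2,8}){1,2})"
LANGTAG = LANGUAGE + (
    rf"(?:-{SCRIPT})?(?:-{REGION})?(?:-{VARIANT})*"
    rf"(?:-{EXTENSION})*(?:-{PRIVATEUSE})?"
)
RFC4646_TAG = re.compile(rf"{LANGTAG}|{PRIVATEUSE}|{GRANDFATHERED}")


def well_formed_3066(tag: str) -> bool:
    """True if the whole string matches the RFC 3066 grammar."""
    return RFC3066_TAG.fullmatch(tag) is not None


def well_formed_4646(tag: str) -> bool:
    """True if the whole string matches the transcribed RFC 4646 grammar."""
    return RFC4646_TAG.fullmatch(tag) is not None


for tag in ["en", "en-US", "zh-Hant-TW", "x-private", "a-b", "en-a"]:
    print(f"{tag:12} RFC 3066: {well_formed_3066(tag)!s:5}  RFC 4646: {well_formed_4646(tag)}")
```

Tags such as "a-b" and "en-a" are well-formed under RFC 3066 but are rejected by the RFC 4646 grammar; that is the kind of existing data the question about inconveniencing users is concerned with.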
Hello Michael,

We discussed this issue at http://www.w3.org/2007/09/18-core-minutes#item07 . We would like to propose that you use the ABNF defined in RFC 4646. This ABNF is stable. The updates of BCP 47 (which will lead to a new RFC obsoleting RFC 4646) concern only the adoption of certain values for the extlang subtag; see http://tools.ietf.org/html/rfc4646#section-2.2.2 and the charter of the LTRU WG at http://www.ietf.org/html.charters/ltru-charter.html .

Mainly in terms of references, I would propose the following changes in sec. 3.4.3:

/START proposal sec. 3.4.3/

[Definition:] language represents formal natural language identifiers, as defined by [BCP 47]. The value space and lexical space of language are the set of all strings that conform to the ABNF (RFC 4646 grammar to be inserted here). This is the set of strings accepted by the grammar given in [RFC 4646], the RFC which currently represents [BCP 47]. The base type of language is token.

Note: The regular expression above provides the only normative constraint on the lexical and value spaces of this type. The additional constraints imposed on language identifiers by [BCP 47], and in particular their requirement that language codes be registered with IANA or ISO if not given in ISO 639, are not part of this datatype as defined here.

Note: [BCP 47] specifies that language tags and subtags "are to be treated as case insensitive: there exist conventions for the capitalization of some of the subtags, but these MUST NOT be taken to carry meaning." For instance, [ISO 3166] recommends that country codes are capitalized (MN Mongolia), while [ISO 639] recommends that language codes are written in lower case (mn Mongolian). Since the language datatype is derived from string, it inherits from string a one-to-one mapping from lexical representations to values. The literals 'MN' and 'mn' therefore correspond to distinct values and have distinct canonical forms.
Users of this specification should be aware of this fact, the consequence of which is that the case-insensitive treatment of language values prescribed by [BCP 47] does not follow from the definition of this datatype given here; applications which require case-insensitivity should make appropriate adjustments.

/END proposal sec. 3.4.3/

Since the RFC 3066 ABNF was rather lax and users were not punished for producing useless language tags (like "English-England"), we see a danger that the more restrictive grammar of RFC 4646 will lead to results that are more useful, but unexpected. To make people aware of this situation, I would propose the following note as a health warning, with a non-normative reference to RFC 3066:

"The ABNF defined in the predecessor of RFC 4646, RFC 3066, was rather lax. Users were not punished for producing ABNF-compliant, but otherwise useless language tags. In contrast, the more restrictive grammar in RFC 4646 is more appropriate for creating language tags. However, users need to be warned that, due to the lax ABNF of RFC 3066, they might get unexpected results when processing legacy data."

HTH,
Felix
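The case-insensitivity note in the proposal can be illustrated with a minimal Python sketch. It models the datatype, as that note describes it, as mapping each literal to itself, so 'MN' and 'mn' are distinct values, and it shows the kind of "appropriate adjustment" an application might make to get the case-insensitive comparison BCP 47 prescribes. The function names and the choice of ASCII lowercasing as the comparison key are assumptions for illustration, not part of XML Schema or BCP 47.

```python
def xsd_language_value(literal: str) -> str:
    # Model of the datatype as the note describes it: the lexical-to-value
    # mapping is one-to-one (inherited from string), so the value is the literal.
    return literal


def bcp47_key(tag: str) -> str:
    # Application-level adjustment (assumption): language tags are ASCII and
    # case-insensitive, so a lowercased copy serves as a comparison key.
    return tag.lower()


def bcp47_equal(a: str, b: str) -> bool:
    """Compare two tags the way BCP 47 asks: ignoring case."""
    return bcp47_key(a) == bcp47_key(b)


literals = ["MN", "mn", "en-US", "EN-us"]
print(len({xsd_language_value(t) for t in literals}))  # 4 distinct datatype values
print(len({bcp47_key(t) for t in literals}))  # 2 distinct tags once case is ignored
print(bcp47_equal("MN", "mn"))  # True
```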
The XML Schema Working Group discussed this issue at its teleconference of 14 December 2007.

The current reference to RFC 3066 is non-normative: the xsd:language datatype is intended to hold language codes as defined by RFC 3066 or its successor(s), but type validity is defined solely by a simple regular expression, and a note points out that full checking of language codes requires additional work beyond checking for type validity. We chose not to change that basic pattern; implicitly, the WG's answer to the question in comment #2 and the suggestion in comment #3 was: no, we will retain the current regular expression, which is very simple, and not attempt to model the more restrictive grammar of RFC 4646. (This means there is some gap between the strings which are type-valid against xsd:language and the set of strings accepted by the grammar in RFC 4646, but there is already a gap between the type-valid strings and the set of correct language identifiers, and changing to the grammar of RFC 4646 would not close that gap.) I note that the WG's decision not to reproduce the grammar from RFC 4646 also helps insulate XSDL, through a form of loose coupling, from changes to the definition of correct language codes in revisions of the relevant IETF specs.

We did agree to refer to BCP 47 instead of RFC 3066 as appropriate; the editors were so instructed. (I'm marking this 'decided' as well as needsDrafting, as a reminder that the WG does not want to see the wording before it's integrated into the status quo.) In view of the purpose of BCP 47 and other documents in the BCP series, it may seem unnecessary to retain the words "or its successor(s)", but I expect we'll keep them just in case. (But we'll delete the reference to the standards track.)

François, as originator of the issue, please indicate your acceptance of this disposition by changing the status of the bug to RESOLVED.
(Or if François is unavailable, perhaps Felix Sasaki can act in his stead on behalf of the i18n WG -- assuming this issue was raised on their behalf.) If the WG doesn't hear from either of you in a month, we'll assume you're happy.
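The split the WG describes, where schema validation checks only the simple pattern and fuller checking of language codes is left to additional work, might look roughly like this in an application. This is a sketch under stated assumptions: the pattern is the xsd:language pattern from the datatypes draft, [a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*; KNOWN_LANGUAGE_SUBTAGS is a tiny hypothetical stand-in for the IANA language subtag registry; and only the primary subtag is looked up.

```python
import re

# Pattern facet of xsd:language as given in the XML Schema datatypes draft.
XSD_LANGUAGE_PATTERN = re.compile(r"[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*")

# Hypothetical, tiny stand-in for the IANA language subtag registry;
# a real application would load the full registry instead.
KNOWN_LANGUAGE_SUBTAGS = {"en", "fr", "de", "mn"}


def type_valid(literal: str) -> bool:
    """Stage 1: what a schema validator checks (type validity only)."""
    return XSD_LANGUAGE_PATTERN.fullmatch(literal) is not None


def primary_subtag_known(literal: str) -> bool:
    """Stage 2 (application's job): look up the primary subtag, ignoring case."""
    return literal.split("-", 1)[0].lower() in KNOWN_LANGUAGE_SUBTAGS


for literal in ["en-GB", "English-England", "en_GB"]:
    valid = type_valid(literal)
    known = valid and primary_subtag_known(literal)
    print(f"{literal}: type-valid={valid}, primary subtag known={known}")
```

In this sketch, "English-England" passes the type-validity check but fails the lookup, which illustrates the gap between type-valid strings and correct language identifiers described in the WG's reply.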
The changes described in comment #3 have been made, as described in a comment on bug 4850. So I'm marking this issue as resolved.
Hello XML Schema Working Group,

The Internationalization Core Working Group discussed your resolution; see http://www.w3.org/2008/06/18-core-minutes#item05 . We are concerned that you have a non-normative reference to RFC 3066, but require the facet of the language datatype to follow RFC 3066. We think this means that the reference to RFC 3066 should be normative.

An additional comment: you have the XML Schema document at http://www.w3.org/2001/xml.xsd, which is also mentioned in the XML Schema 1.1 datatypes draft. It would be great if you could change

<xs:documentation>
  See RFC 3066 at http://www.ietf.org/rfc/rfc3066.txt and the IANA registry
  at http://www.iana.org/assignments/lang-tag-apps.htm for
  further information.

to

<xs:documentation>
  See BCP 47 at http://www.rfc-editor.org/rfc/bcp/bcp47.txt and the IANA
  language subtag registry at
  http://www.iana.org/assignments/language-subtag-registry for
  further information.

Thank you,
Felix
ACTION 2008-07-18.01: MSM to check whether we have write authority on xml.xsd and update it to add the documentation change as described in bug 3079 if we do and pass the buck to Core if we don't.
Several drafts of the relevant changes to xml.xsd have been prepared, which vary in style and wording, and are now being reviewed by members of the XML Schema WG. Those with W3C member access to the server can find pointers to the drafts, and discussion, at http://lists.w3.org/Archives/Member/w3c-xml-schema-ig/2008Dec/0010.html Once the WG is satisfied that the changes are correct, we will install the new version of xml.xsd in the usual way. Felix, if you or the i18n WG wish to review the drafts and express a view on which form of presentation you prefer, please feel free to do so. If you're busy, on the other hand, please feel free to wait until we tell you we think we are done. Apart from the changes to xml.xsd, we do think we are done with this issue.
(In reply to comment #7)
> Several drafts of the relevant changes to xml.xsd have been prepared,
> which vary in style and wording, and are now being reviewed by members
> of the XML Schema WG. Those with W3C member access to the server can
> find pointers to the drafts, and discussion, at
>
> http://lists.w3.org/Archives/Member/w3c-xml-schema-ig/2008Dec/0010.html
>
> Once the WG is satisfied that the changes are correct, we will install
> the new version of xml.xsd in the usual way.
>
> Felix, if you or the i18n WG wish to review the drafts and express a
> view on which form of presentation you prefer, please feel free to do so.
> If you're busy, on the other hand, please feel free to wait until we
> tell you we think we are done.

Thanks. I think I would prefer to wait until you come back to me.

>
> Apart from the changes to xml.xsd, we do think we are done with this issue.
>

I agree, many thanks!

Felix
The XML Schema WG has reviewed and approved the draft changes to the schema document for the XML namespace mentioned in comment #7, and they are now in place both at the URI for the current (mutable) schema document for that namespace, namely http://www.w3.org/2009/01/xml.xsd and at the dated immutable location for the now-current version, namely http://www.w3.org/2009/01/xml.xsd . With these changes, I believe this issue has been successfully resolved; I am updating it accordingly. Francois, or Felix, as the representatives of the I18n Core WG (which I understand to be the actual originator of the issue), it would be helpful if you would convey this information to the I18n Core WG, so that they can review the new form of the schema document and confirm that the documentation has been successfully updated. Close the issue to signal that I18n is happy with the resolution; reopen it if there is a problem. If we don't hear from you in the next two weeks, we will assume that I18n is happy with the resolution of the issue.
There's a typo in comment 9; the mutable form of the schema document is at http://www.w3.org/2001/xml.xsd