This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 3079 - RFC3066 ref
Summary: RFC3066 ref
Status: RESOLVED FIXED
Alias: None
Product: XML Schema
Classification: Unclassified
Component: Datatypes: XSD Part 2
Version: 1.1 only
Hardware: PC Windows XP
Importance: P2 normal
Target Milestone: ---
Assignee: C. M. Sperberg-McQueen
QA Contact: XML Schema comments list
URL:
Whiteboard: important, work, i18n cluster
Keywords: resolved
Depends on:
Blocks:
 
Reported: 2006-04-04 17:25 UTC by Fran
Modified: 2009-01-23 16:39 UTC

Description Fran 2006-04-04 17:25:41 UTC
Section 3.4.3, 1st para: "language ... as defined by [RFC 3066] or its
successor(s) in the IETF Standards Track."  RFC 3066 is *not* on the
IETF Standards Track; it is BCP 47.  Its successor is now approved but
not yet published (no RFC number yet, known as 3066bis).  We recommend
changing the reference to BCP 47
(http://www.rfc-editor.org/rfc/bcp/bcp47.txt), which will in due time
point to the new RFC.  This would require no other change to that
section, but would ensure that the enriched semantics provided by 3066bis
and the new language subtag registry (which is already live) are taken
into account.
Comment 1 C. M. Sperberg-McQueen 2006-09-09 00:57:39 UTC
Thank you for the assistance.  It looks as if the reference to
BCP 47 will do exactly what we need.

One question does arise, though, as I look at BCP 47 / RFC 4646.
I note that it provides an ABNF definition for the syntax of
well-formed language tags which is significantly more 
restrictive than that of RFC 3066, which is repeated in the
draft under review.

Should XML Schema 1.1 define the lexical space of xsd:language
using the grammar of RFC 3066, or of RFC 4646? Or what?

Does RFC 4646 allow itself to be more restrictive than
RFC 3066 because there is some evidence that the extra restrictions
will not actually invalidate existing data?  Or would we risk
inconveniencing users of XML Schema and of schemas written with
XML Schema, if we shifted to the more restrictive grammar?
Speaking for myself, I think the XML Schema WG would benefit from
knowing your views and those of the i18n WG on this matter.

Thanks.
Comment 2 Felix Sasaki 2007-09-19 05:50:24 UTC
Hello Michael,
We discussed this issue at http://www.w3.org/2007/09/18-core-minutes#item07 . We would like to propose that you use the ABNF defined in RFC 4646. This ABNF is stable. The updates of BCP 47 (which will lead to a new RFC obsoleting RFC 4646) are only about adoption of certain values for the extlang subtag, see http://tools.ietf.org/html/rfc4646#section-2.2.2 and the charter of the LTRU WG at http://www.ietf.org/html.charters/ltru-charter.html .
Mainly in terms of references, I would propose the following changes in sec. 3.4.3:
/START proposal sec. 3.4.3/
[Definition:]   language represents formal natural language identifiers, as defined by [BCP 47]. The value space and lexical space of language are the set of all strings that conform to the ABNF

        (here RFC 4646 grammar)

This is the set of strings accepted by the grammar given in [RFC 4646], the RFC which currently represents [BCP 47]. The base type of language is token.
Note: The regular expression above provides the only normative constraint on the lexical and value spaces of this type. The additional constraints imposed on language identifiers by [BCP 47], and in particular their requirement that language codes be registered with IANA or ISO if not given in ISO 639, are not part of this datatype as defined here.
Note: [BCP 47] specifies that language tags and subtags "are to be treated as case insensitive: there exist conventions for the capitalization of some of the subtags, but these MUST NOT be taken to carry meaning." For instance, [ISO 3166] recommends that country codes are capitalized (MN Mongolia), while [ISO 639] recommends that language codes are written in lower case (mn Mongolian). Since the language datatype is derived from string, it inherits from string a one-to-one mapping from lexical representations to values. The literals 'MN' and 'mn' therefore correspond to distinct values and have distinct canonical forms. Users of this specification should be aware of this fact, the consequence of which is that the case-insensitive treatment of language values prescribed by [BCP 47] does not follow from the definition of this datatype given here; applications which require case-insensitivity should make appropriate adjustments.
/END proposal sec. 3.4.3/
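To illustrate the case note in the proposal, here is a minimal Python sketch of the kind of adjustment an application might make when it needs the case-insensitive comparison described by [BCP 47]. The helper name same_language_tag is hypothetical and is not defined by any of the specifications discussed here; the sketch assumes the tags contain only ASCII letters, digits and hyphens.

```python
def same_language_tag(a: str, b: str) -> bool:
    """Hypothetical helper: compare two language tags ignoring case."""
    # Assumption: tags are ASCII-only, so plain lowercasing is sufficient.
    return a.lower() == b.lower()


# As plain string values, 'MN' and 'mn' are distinct ...
print("MN" == "mn")
# ... but an application following BCP 47 would treat them as equivalent.
print(same_language_tag("MN", "mn"))
```

The comparison is done at the application level; the datatype itself is left untouched, which matches the proposal's statement that case-insensitivity does not follow from the datatype definition.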
Since the RFC 3066 ABNF was rather lax and users were not punished for producing useless language tags (like "English-England"), we see the danger that the more restrictive grammar of RFC 4646 leads to more useful, but unexpected results. To make people aware of this situation, I would propose the following note as a health warning, with a non-normative reference to RFC 3066:
"The ABNF defined in the predecessor of RFC 4646, RFC 3066, was rather lax. Users were not punished for producing ABNF-compliant, but otherwise useless language tags. In contrast, the more restrictive grammar in RFC 4646 is more appropriate for creating language tags. However, users need to be warned that due to the lax ABNF of RFC 3066, they might get unexpected results when processing legacy data."
HTH,

Felix
Comment 3 C. M. Sperberg-McQueen 2007-12-14 19:47:26 UTC
The XML Schema Working Group discussed this issue at its teleconference of
14 December 2007.  The current reference to RFC 3066 is non-normative: the
xsd:language datatype is intended to hold language codes as defined by RFC 3066
or its successor(s), but type validity is defined solely by a simple regular
expression, and a note points out that for the full checking of language codes,
additional work is required beyond checking for type validity.

We did not choose to change that basic pattern; implicitly, the WG's answer
to the question in comment #1 and the suggestion in comment #2 was:  no,
we will retain the current regular expression, which is very simple, and not
attempt to model the more restrictive grammar of RFC 4646.  (This means there
is some gap between the strings which are type-valid against xsd:language
and the set of strings accepted by the grammar in RFC 4646, but there is
already a gap between the type-valid strings and the set of correct language
identifiers, and changing to the grammar of RFC 4646 will not close that gap.)
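
As a rough illustration of what "type validity is defined solely by a simple regular expression" means in practice, here is a minimal Python sketch. The pattern below is the one published for xsd:language in XML Schema 1.0 Part 2; it is not quoted in this thread, so its use here is an assumption about "the current regular expression". The function name is hypothetical, and the sketch ignores whitespace normalization.

```python
import re

# Assumed pattern: the pattern facet of xsd:language as published in
# XML Schema 1.0 Part 2 (not quoted in this bug report).
XSD_LANGUAGE = re.compile(r"[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*")


def is_type_valid_language(value: str) -> bool:
    """Hypothetical helper: check a string against the simple pattern only."""
    # fullmatch is used because XSD patterns must match the whole literal.
    return XSD_LANGUAGE.fullmatch(value) is not None


for tag in ["en", "en-GB", "English-England", "en_GB", "internationalization"]:
    print(tag, is_type_valid_language(tag))
```

Under this pattern a string such as "English-England" is type-valid even though it is not a useful language identifier, which is the gap the paragraph above describes; closing it needs checks beyond type validity.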

I note that the WG's decision not to reproduce the grammar from RFC 4646
also helps insulate XSDL from changes to the definition of correct language
codes in revisions of the relevant IETF specs, by a form of loose coupling.

We did agree to refer to BCP 47 instead of RFC 3066 as appropriate; the
editors were so instructed.  (I'm marking this 'decided' as well as
needsDrafting, as a reminder that the WG does not want to see the wording
before it's integrated into the status quo.)

In view of the purpose of BCP 47 and other documents in the BCP series, it
may seem unnecessary to retain the words "or its successor(s)", but I expect
we'll keep them just in case.  (But we'll delete the reference to the standards
track.)

François, as originator of the issue, please indicate your acceptance of this
disposition by changing the status of the bug to RESOLVED.  (Or if François
is unavailable, perhaps Felix Sasaki can act in his stead on behalf of the
i18n WG -- assuming this issue was raised on their behalf.)  If the WG
doesn't hear from either of you in a month, we'll assume you're happy.
Comment 4 C. M. Sperberg-McQueen 2008-01-30 15:53:28 UTC
The changes described in comment #3 have been made, as described in
a comment on bug 4850.

So I'm marking this issue as resolved.
Comment 5 Felix Sasaki 2008-06-26 12:44:49 UTC
Hello XML Schema Working Group,

The Internationalization Core Working Group discussed your resolution, see
http://www.w3.org/2008/06/18-core-minutes#item05
We are concerned that you have a non-normative reference to RFC 3066,
but require the facet of the language data type to follow RFC 3066. We think this means that the reference to RFC 3066 should be normative.

An additional comment: You have the XML Schema document at
http://www.w3.org/2001/xml.xsd
which is also mentioned in the XML Schema 1.1 datatypes draft. It
would be great if you could make the following change:

<xs:documentation>
See RFC 3066 at http://www.ietf.org/rfc/rfc3066.txt
and the IANA registry at
http://www.iana.org/assignments/lang-tag-apps.htm for
further information.

to

<xs:documentation>
See BCP 47 at http://www.rfc-editor.org/rfc/bcp/bcp47.txt
and the IANA language subtag registry at
http://www.iana.org/assignments/language-subtag-registry for
further information.

Thank you,

Felix
Comment 6 David Ezell 2008-08-08 15:49:16 UTC
ACTION 2008-07-18.01: MSM to check whether we have write authority on xml.xsd and update it to add the documentation change as described in bug 3079 if we do and pass the buck to Core if we don't.
Comment 7 C. M. Sperberg-McQueen 2008-12-22 15:29:34 UTC
Several drafts of the relevant changes to xml.xsd have been prepared,
which vary in style and wording, and are now being reviewed by members
of the XML Schema WG.  Those with W3C member access to the server can
find pointers to the drafts, and discussion, at 

  http://lists.w3.org/Archives/Member/w3c-xml-schema-ig/2008Dec/0010.html

Once the WG is satisfied that the changes are correct, we will install
the new version of xml.xsd in the usual way.  

Felix, if you or the i18n WG wish to review the drafts and express a 
view on which form of presentation you prefer, please feel free to do so.
If you're busy, on the other hand, please feel free to wait until we
tell you we think we are done.

Apart from the changes to xml.xsd, we do think we are done with this issue.
Comment 8 Felix Sasaki 2008-12-22 22:21:31 UTC
(In reply to comment #7)
> Several drafts of the relevant changes to xml.xsd have been prepared,
> which vary in style and wording, and are now being reviewed by members
> of the XML Schema WG.  Those with W3C member access to the server can
> find pointers to the drafts, and discussion, at 
> 
>   http://lists.w3.org/Archives/Member/w3c-xml-schema-ig/2008Dec/0010.html
> 
> Once the WG is satisfied that the changes are correct, we will install
> the new version of xml.xsd in the usual way.  
> 
> Felix, if you or the i18n WG wish to review the drafts and express a 
> view on which form of presentation you prefer, please feel free to do so.
> If you're busy, on the other hand, please feel free to wait until we
> tell you we think we are done.

Thanks. I would prefer to wait until you come back to me.

> 
> Apart from the changes to xml.xsd, we do think we are done with this issue.
> 

I agree, many thanks!

Felix

Comment 9 C. M. Sperberg-McQueen 2009-01-21 21:31:25 UTC
The XML Schema WG has reviewed and approved the draft changes to the 
schema document for the XML namespace mentioned in comment #7, and
they are now in place both at the URI for the current (mutable) schema 
document for that namespace, namely

  http://www.w3.org/2009/01/xml.xsd

and at the dated immutable location for the now-current version,
namely

  http://www.w3.org/2009/01/xml.xsd

With these changes, I believe this issue has been successfully resolved;
I am updating it accordingly.

Francois, or Felix, as the representatives of the I18n Core WG (which
I understand to be the actual originator of the issue), if you would
convey this information to the I18n Core WG, so that they can review
the new form of the schema document and confirm that the documentation
has been successfully updated, it would be helpful.  Close the issue to
signal that I18n is happy with the resolution; reopen it if there is a
problem.  

If we don't hear from you in the next two weeks, we will assume that
I18n is happy with the resolution of the issue.
Comment 10 C. M. Sperberg-McQueen 2009-01-23 16:39:35 UTC
There's a typo in comment 9; the mutable form of the schema document is
at

  http://www.w3.org/2001/xml.xsd