[Bug 14227] New: Full Text language option should address synonymy

http://www.w3.org/Bugs/Public/show_bug.cgi?id=14227

           Summary: Full Text language option should address synonymy
           Product: XPath / XQuery / XSLT
           Version: Working drafts
          Platform: PC
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Full Text 3.0
        AssignedTo: holstege@mathling.com
        ReportedBy: cmsmcq@blackmesatech.com
         QAContact: public-qt-comments@w3.org


In the joint call of 20 September I was asked to raise a bug against Full
Text's description of the language option.  Specifically, the section on the
language option needs to say which code should be used when a language has both
a two-letter and a three-letter code.  Any description of the feature names
used for language support, as sketched in Mary Holstege's mail at

  http://lists.w3.org/Archives/Member/w3c-xsl-query/2011Sep/0224.html

may also need to address this question -- at the very least it should be
consistent with the language option.

The value of the language option is required to be castable to xs:language,
which means that its semantics are ultimately based on RFC 3066 (in XSD 1.0) or
its successor BCP 47 (in XSD 1.1).
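
To illustrate why castability alone does not settle the question, here is a
minimal sketch of a lexical check against the pattern XSD gives for
xs:language.  The function name is my own invention, and the check covers only
the lexical form, not whether the subtags are actually registered.

```python
import re

# Lexical pattern for xs:language: one to eight letters, followed by zero or
# more hyphen-separated subtags of one to eight letters or digits.
XS_LANGUAGE = re.compile(r"[a-zA-Z]{1,8}(-[a-zA-Z0-9]{1,8})*")


def is_lexically_xs_language(value: str) -> bool:
    """Return True if value matches the xs:language lexical pattern.

    Hypothetical helper: it does not consult the IANA registry, so it
    cannot tell a preferred code from a non-preferred one.
    """
    return XS_LANGUAGE.fullmatch(value) is not None
```

Both 'de' and 'deu' pass this check, so the datatype by itself expresses no
preference between them.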

BCP 47 already addresses the question of preferring the two- or three-letter
codes; it describes rules for a Preferred-Value field in the IANA Language
Subtag Registry.  So in some sense, if we assume that the recommendations of
BCP 47 are binding on the formulation of values for the language option and
features, we may infer that FT already addresses the topic and there is not
really any bug here.  

Empirically, however, today's call provides some evidence for the claim that
the FT spec does not make its position on the matter adequately clear.  So
perhaps it would be a good idea if the description of the language option, and
the description of the class of feature names based on the language option,
were to mention explicitly that where the relevant RFCs define more than one
code for a language or language-locale combination, the provisions of BCP 47
regarding preferred values SHOULD be followed.  It would be nice if we could
then say "For example, prefer 'deu' to 'de'", or "For example, prefer 'de' to
'deu'", but that would require that someone actually wade through the details
of BCP 47 and come out the other side with an answer to that question.
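
As a rough sketch of what following such a rule could look like in an
implementation: my reading of RFC 5646 (the current half of BCP 47), section
2.2.1, is that when a language has both an ISO 639-1 two-letter code and a
three-letter code, only the two-letter code is registered, which would make
'de' the form to prefer.  That reading should be checked against the registry
before it goes into the spec.  The table and function below are hypothetical
and cover only a handful of languages.

```python
# Hypothetical excerpt: ISO 639-2 three-letter codes mapped to the ISO 639-1
# two-letter code for the same language.  A real implementation would derive
# this from the ISO 639 data instead of hard-coding it.
THREE_TO_TWO = {
    "deu": "de",
    "ger": "de",
    "fra": "fr",
    "fre": "fr",
    "eng": "en",
}


def prefer_short_code(tag: str) -> str:
    """Replace a three-letter primary language subtag with its two-letter form.

    Only the primary subtag is touched; region and other subtags are kept
    exactly as given.  Unknown primary subtags are lowercased and returned.
    """
    primary, sep, rest = tag.partition("-")
    primary = primary.lower()
    primary = THREE_TO_TWO.get(primary, primary)
    return primary + sep + rest
```

Under these assumptions, prefer_short_code("deu") gives "de" and
prefer_short_code("deu-AT") gives "de-AT".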

It might also be helpful to remind readers (with an example, or in a note) that
the values of the language option might include codes like 'en-US', 'en-CA',
and 'en-GB' for a hypothetical implementation with three different tokenizers
for U.S. English, Canadian English, and British English.  Note:  I think
'en-GB' is the right way to say 'British English' but if it's not, please
substitute the correct way to say it.
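
A note along these lines could be accompanied by something like the following
sketch of how the hypothetical implementation might match the language option
against its supported tags.  The set of tags and the function name are
assumptions for illustration; the one thing taken from BCP 47 is that tags are
compared case-insensitively.  What happens for an unsupported value is left to
the caller here, since that is for the spec to say.

```python
from typing import Optional

# Hypothetical implementation with three English tokenizers.
SUPPORTED_TAGS = {"en-US", "en-CA", "en-GB"}


def resolve_language(option_value: str) -> Optional[str]:
    """Return the supported tag equal to option_value, ignoring case.

    Returns None when the value names no supported tokenizer.  No fallback
    from 'en-GB' to 'en' (or the reverse) is attempted in this sketch.
    """
    by_lowercase = {tag.lower(): tag for tag in SUPPORTED_TAGS}
    return by_lowercase.get(option_value.lower())
```

With this table, 'en-gb' and 'EN-GB' both resolve to 'en-GB', while a bare
'en' resolves to nothing.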


Received on Tuesday, 20 September 2011 17:48:42 UTC