Re: SPARQL: language tag issues

On Mon, Nov 21, 2005 at 07:27:55PM -0600, Dan Connolly wrote:
> 
> On Fri, 2005-11-11 at 12:19 +0100, Bjoern Hoehrmann wrote:
> > * Eric Prud'hommeaux wrote:
> > >  Returns true if language-range (first argument) matches language-tag
> > >  (second argument) per Tags for the Identification of Languages
> > >  [RFC3066] section 2.5. RFC3066 defines a case-insensitive,
> > >  hierarchical matching algorithm which operates on ISO-defined
> > >  subtags for language and country codes, and user defined subtags. In
> > >  SPARQL, a language-range of "*" matches any non-empty language-tag
> > >  string.
> > 
> > http://lists.w3.org/Archives/Public/spec-prod/2005OctDec/0007.html would
> > suggest to change the reference to the more generic BCP 0047, XML 1.0
> > uses "The values of the attribute are language identifiers as defined by
> > [IETF RFC 3066], Tags for the Identification of Languages, or its
> > successor" -- either is fine with me as long as it is clear that SPARQL
> > does not need to be revised in order to consider the successor of RFC
> > 3066 as the normative reference. The links also lack ".txt" which is
> > included for other RFC references.
> > 
> > Do I understand correctly that for some RDF with xml:lang="" or no in-
> > scope language information "*" would not match?
> 
> Good question... that's how I read "non-empty language-tag string" too.

That's how it was intended.

> Eric, let's be sure there's a test case or three for this to be sure
> we know what the answer is.

Yes, both positive and negative:
  http://www.w3.org/2001/sw/DataAccess/tests/#langmatches-3
  http://www.w3.org/2001/sw/DataAccess/tests/#LangMatches-4

> >  That would be different
> > from e.g. how the Accept-Language:* header would be interpreted, is
> > there a specific reason for this difference?
> 
> One possible reason is that xml:lang="" doesn't produce a literal
> whose lang is the empty string; it produces a literal with no lang.

True, there is no "" language in XML, just <no language tag>. However,
you do point out an inconsistency between HTTP's use of '*'
[[
   The special range "*", if present in the Accept-Language field,
   matches every tag not matched by any other range present in the
   Accept-Language field.
]]
and SPARQL's
[[
In SPARQL, a language-range of "*" matches any non-empty language-tag
string.
]]

RFC 3066 invites protocols to define their own interpretation of '*':
[[
   The special range "*" matches any tag.  A protocol which uses
   language ranges may specify additional rules about the semantics of
   "*"; for instance, HTTP/1.1 specifies that the range "*" matches
   only languages not matched by any other range within an
   "Accept-Language:" header.
]]
However, consistency is always nice. I'll propose
[[
In SPARQL, a language-range of "*" matches any language-tag string
and matches "" (maybe "the empty string"?)
]]
to the DAWG.
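
To make the two readings of "*" concrete, here is a rough Python sketch of
RFC 3066 section 2.5 matching. It is only an illustration, not text from
the spec: the function name lang_matches, the star_matches_empty switch,
and the choice to model a literal with no language as the empty string are
all assumptions. The argument order follows the draft wording quoted
above (language-range first, language-tag second).

```python
def lang_matches(language_range, language_tag, star_matches_empty=False):
    """Illustrative sketch of RFC 3066 section 2.5 language-range matching.

    star_matches_empty is a hypothetical switch:
      False -> current wording: "*" matches any non-empty language-tag
      True  -> proposed wording: "*" also matches the empty string
    """
    # Assumption: a literal with no language is modelled as "" (or None).
    tag = (language_tag or "").lower()
    rng = language_range.lower()

    if rng == "*":
        return star_matches_empty or tag != ""

    # RFC 3066: the range matches a tag that equals it, or a tag that
    # begins with the range followed immediately by "-".
    # Comparison is case-insensitive (both sides lowercased above).
    return tag == rng or tag.startswith(rng + "-")
```

With the default, lang_matches("*", "") is false and lang_matches("*", "fr")
is true; passing star_matches_empty=True makes both true. In either case
lang_matches("en", "en-US") holds while lang_matches("en", "eng") does not,
since the character after the prefix has to be "-".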
-- 
-eric

office: +81.466.49.1170 W3C, Keio Research Institute at SFC,
                        Shonan Fujisawa Campus, Keio University,
                        5322 Endo, Fujisawa, Kanagawa 252-8520
                        JAPAN
        +1.617.258.5741 NE43-344, MIT, Cambridge, MA 02144 USA
cell:   +81.90.6533.3882

(eric@w3.org)
Feel free to forward this message to any list for any purpose other than
email address distribution.

Received on Tuesday, 22 November 2005 10:39:38 UTC