Re: SPARQL and Unicode versions from Dan Connolly on 2006-01-08 (public-rdf-dawg-comments@w3.org from January 2006)

From: Dan Connolly <connolly@w3.org>
Date: Sun, 08 Jan 2006 08:54:15 -0600
To: Dave Beckett <dave@dajobe.org>
Cc: public-rdf-dawg-comments@w3.org
Message-Id: <1136732055.20839.586.camel@dirk.w3.org>

On Sat, 2006-01-07 at 20:01 -0800, Dave Beckett wrote:
> Dan Connolly wrote:
> > On Sat, 2006-01-07 at 12:38 -0800, Dave Beckett wrote:
> > 
> >>SPARQL refers to:
> >>
> >>[[
> >>  [UNICODE]
> >>    The Unicode Standard, Version 4. ISBN 0-321-18578-1, as updated from
> >>  time to time by the publication of new versions. The latest version of
> >>  Unicode and additional information on versions of the standard and of
> >>  the Unicode Character Database is available at
> >>  http://www.unicode.org/unicode/standard/versions/.
> >>
> >>]]
> >>
> >>which cites a moving target.  Please define SPARQL in terms of a
> >>particular version of Unicode only, and no other.  Otherwise if or when
> >>the Unicode Consortium makes some incompatible changes, all existing
> >>implementations become invalid.
> > 
> > 
> > How so? How is conformance to SPARQL sensitive to changes in Unicode?
> 
> The SPARQL query syntax is defined on Unicode characters:
> 
> [[
> A. SPARQL Grammar
> 
> A SPARQL query string is a Unicode character string (c.f. section 6.1
> String concepts of [CHARMOD])
> ...
> ]]
> 
> although the grammar defines precise ranges of codepoints for particular
> things such as names of variables (based on XML 1.1 I think).
> 
> If the definition of a Unicode character string changes in some future
> Unicode revision, for example by allowing additional codepoints,
> then there will be additional codepoints allowed in a SPARQL query
> string, following the sentence above.

I believe that's by design, following...

"C063  [S]  A generic reference to the Unicode Standard MUST be made if
it is desired that characters allocated after a specification is
published are usable with that specification".
  http://www.w3.org/TR/2005/REC-charmod-20050215/#C063

I suppose I should check with the WG.

> Any part of the grammar that uses a negated range such as '[^...]'
> will allow such codepoints.  Examples include:
>   http://www.w3.org/TR/rdf-sparql-query/#rQ_IRI_REF
> and all string literals.
> 
> These codepoints may be refused by something implementing Unicode 4.0
> and no more.

I suppose we need a test case that uses a codepoint that isn't currently
allocated in Unicode 4.0.
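
A rough sketch of what such a test input might look like, in Python. The
query text and the names LATE_CODEPOINT and make_test_query are purely
illustrative assumptions, not taken from any DAWG test suite. U+20B9 is
used because it was first assigned in Unicode 6.0, so it is unallocated
in Unicode 4.0.

```python
# Hypothetical test-case sketch; the query text and all names here are
# illustrative assumptions, not part of any existing test suite.

# U+20B9 was first assigned in Unicode 6.0, so it is unallocated in 4.0.
LATE_CODEPOINT = "\u20b9"


def make_test_query(ch: str) -> str:
    """Build a small query whose string literal contains the given character."""
    return 'SELECT ?s WHERE { ?s ?p "' + ch + '" }'


query = make_test_query(LATE_CODEPOINT)

# The query is an ordinary Unicode string; serializing it as UTF-8 does
# not depend on whether the codepoint is allocated in a given version.
encoded = query.encode("utf-8")
```

A processor that passes this kind of test accepts the codepoint without
knowing anything about its allocation status.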

I still can't think of any reason why changes in Unicode specs would
make any difference to SPARQL producers/consumers. It's not like
they need to reference the Unicode tables to check the grammar or
anything.
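
To illustrate the point, here is a minimal sketch in Python of a check
for a negated-range production. The character class is a simplified
stand-in that I am assuming for illustration, not the exact production
from the draft, and is_iri_ref is a hypothetical helper name. The check
only compares codepoint values against fixed ranges and never consults
the Unicode Character Database.

```python
import re

# Simplified, assumed stand-in for a negated-range production such as
# IRI_REF: anything between '<' and '>' except a few ASCII delimiters
# and the codepoints U+0000 through U+0020.
IRI_REF = re.compile(r'<[^<>"{}|^`\\\x00-\x20]*>')


def is_iri_ref(token: str) -> bool:
    """Return True if the whole token matches the simplified production."""
    return IRI_REF.fullmatch(token) is not None


# U+20B9 was first assigned in Unicode 6.0 (unallocated in 4.0). The
# check accepts it purely because it is outside the excluded set.
accepted = is_iri_ref("<http://example.org/\u20b9>")

# A space (U+0020) is inside the excluded range, so this is rejected.
rejected = is_iri_ref("<http://example.org/a b>")
```

Under this sketch, allocating new characters in a later Unicode version
does not change which strings match.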

> Dave
-- 
Dan Connolly, W3C http://www.w3.org/People/Connolly/
D3C2 887B 0F92 6005 C541  0875 0F91 96DE 6E52 C29E

Received on Sunday, 8 January 2006 14:54:20 UTC