Re: RDF-ISSUE-74: Prefixed names and slashes [RDF Turtle]

The argument for the \u-escape proposal is the ability to put characters 
that occur in some sources of existing data (e.g. life sciences) into 
prefixed names.

Gavin has proposed another issue with "Prefixed names and slashes" which 
is related but not identical.

People have expressed a desire for maximum compatibility between Turtle 
and SPARQL.  We have to start from where we are, not take a clean-slate 
approach and discount all switching costs.  See ISSUE-1.

For the proposal:

1/ For the existing data, why does the original character have to be 
used and not "_", "." or "-"?

2/ Prefixed names are about readability.  \u003D is not a readable form of "=".

3/ There has been no analysis of alternatives,
    e.g. expand the range of chars allowed as Gavin suggests.
    e.g. reuse delimiting by <> and overload scheme/prefix, which has 
deployed experience.  It's even a recurring user expectation, albeit not 
a common one.

4/ The current proposal also introduces \u-escapes into the prefix part 
and blank node labels but the argument only applies to the local part of 
prefixed names.


What we need is a design principle and a set of possible ways to achieve 
the objective.  \u-escapes are not the only option.

Comments on certain assertions about the current situation are inline.

 Andy

On 15/08/11 13:11, Eric Prud'hommeaux wrote:
> * Richard Cyganiak<richard@cyganiak.de>  [2011-08-15 11:24+0100]
...
>> I think there is consensus in the group that we should not add extensions to Turtle at this point. We should just standardize it as it is already implemented (modulo SPARQL alignment).
>>
>> Personally I am very strongly opposed to extending Turtle in ways that are incompatible with SPARQL.
>
> I agree with keeping SPARQL and Turtle compatible so I'll address
> SPARQL:
>
> SPARQL has already changed from processing escape sequences before
> lexing to after lexing. Previously legal SPARQL strings like
>    PREFIX<http://example.org/>  ex:
>    ASK { ?s ex:ab\u0063d ?o }
> become illegal if SPARQL doesn't accept escaping in prefixed names.
> Do such strings exist in the wild? Probably not as they weren't useful
> to utter (because they were unescaped before lexing). But the argument
> about backward compatibility swings in favor of SPARQL allowing them
> in prefixed names.

I don't follow this argument - there is no backwards compatibility 
issue here for data or queries.

Turtle, up to and including the start point for this WG, did not allow 
\u escapes in qnames at all.  Existing software and data do not use the 
feature if they follow that doc.

SPARQL, up to and including 1.1 LC, allowed escaping of hard-to-type 
characters like α (Unicode codepoint 03B1, Greek small letter alpha). 
SPARQL-WG has an issue box in the LC to signal the possible change.  The 
LC period got no feedback on the matter.

Every Turtle and SPARQL parser has to change if these changes become 
permanent.  That is a backwards compatibility argument.

> Another argument for escaping is that identifier names (e.g. in
> biology) have things like ':' and '$' in them. Prefixes add a huge
> amount to the readability of SPARQL and Turtle. Forcing a query or
> data writer to abandon the logical prefix because there's an illegal
> localname character is an equally huge impediment to usability.
>
> Escape sequences in strings and IRIs are of limited use as one can
> embed the all legal IRI and string chars in those productions with the
> help of the specialized escapes \\[nrtb].

Surely the use case which led to the current deployed systems is 
foreign language characters and that is well supported currently.

> Disallowing escape sequences
> in SPARQL

Who is proposing that?  SPARQL-WG isn't.

SPARQL-LC published with the existing escaping (done on the input stream 
before parsing).
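
To make "done on the input stream before parsing" concrete, here is a 
rough sketch in Python.  The function name and regex are hypothetical 
and not taken from any actual parser; it only assumes the \uXXXX and 
\UXXXXXXXX codepoint escape forms.  The whole input is unescaped first, 
so the tokenizer never sees the escape at all:

```python
import re

# Hypothetical helper, for illustration only: matches \uXXXX (4 hex digits)
# and \UXXXXXXXX (8 hex digits) anywhere in the raw input text.
_CODEPOINT_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")


def unescape_input(text):
    """Replace codepoint escapes over the whole input, before any tokenizing."""
    def decode(match):
        hex_digits = match.group(1) or match.group(2)
        return chr(int(hex_digits, 16))
    return _CODEPOINT_ESCAPE.sub(decode, text)


query = r"ASK { ?s ex:ab\u0063d ?o }"
# The tokenizer would then see the plain prefixed name ex:abcd.
print(unescape_input(query))
```

With this ordering an escape can only stand for a character that was 
already legal at that point in the grammar, which is why it helps with 
hard-to-type characters but not with characters the grammar excludes.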

> (and maintaining the status quo in Turtle) means we have to
> justify to the users why the escaping rules they can apply to strings
> and IRIs aren't applicable to prefixed names where they'd be most useful.

The use case which led to the current deployed systems is foreign 
language characters and that is well supported currently.

...

Received on Monday, 15 August 2011 17:15:43 UTC