ISSUE-75: Valid plain literals containing #x0 are ill-typed in RDF 1.1

#x0

State:
CLOSED
Product:
RDF Concepts
Raised by:
Richard Cyganiak
Opened on:
2011-08-19
Description:
The lexical space of xsd:string doesn't cover all Unicode strings.

I assume we will end up referring to XSD 1.1 for the definition of xsd:string [1]. That document leaves it up to implementations whether they support XML 1.0 or XML 1.1; accordingly, the set of characters allowed in an xsd:string is defined by either [2] or [3].

The more permissive of the two is the one from XML 1.1:

Char ::= [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]

This excludes #x0, Unicode codepoint U+0000. XML 1.0 also excludes a number of other control codes in the #x0-#x1F range.

The definition of “lexical form” in RDF 2004 [4] says “Unicode string”, which according to [5] includes *all* codepoints including the control codes.

So, any string that includes #x0 was a valid untagged plain literal in RDF 2004. In RDF 1.1, it will be typed as an xsd:string, and thus will be an ill-typed literal.
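As an illustration only, the check described above can be sketched in Python. The XML 1.1 ranges are copied from the Char production quoted above; the XML 1.0 ranges are taken from the Char production in [2]. The function names and the xml_version parameter are hypothetical and not part of any RDF or XSD specification or library.

```python
def is_xml11_char(cp: int) -> bool:
    # XML 1.1: Char ::= [#x1-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    return (0x1 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def is_xml10_char(cp: int) -> bool:
    # XML 1.0: Char ::= #x9 | #xA | #xD | [#x20-#xD7FF]
    #                 | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)


def in_xsd_string_lexical_space(s: str, xml_version: str = "1.1") -> bool:
    # Hypothetical helper: True if every code point of s is an allowed Char
    # under the chosen XML version, i.e. s would not be an ill-typed
    # xsd:string literal on character grounds.
    check = is_xml11_char if xml_version == "1.1" else is_xml10_char
    return all(check(ord(c)) for c in s)


# A string containing U+0000 falls outside the lexical space either way.
print(in_xsd_string_lexical_space("a\u0000b", "1.1"))
print(in_xsd_string_lexical_space("a\u0000b", "1.0"))
# U+0001 is allowed by XML 1.1 but not by XML 1.0.
print(in_xsd_string_lexical_space("a\u0001b", "1.1"))
print(in_xsd_string_lexical_space("a\u0001b", "1.0"))
```

Under this sketch, "a\u0000b" is rejected for both XML versions, while "a\u0001b" is accepted only when the XML 1.1 production is used.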

(On the other hand, such strings could never be serialized in RDF/XML or XHTML+RDFa; they were serializable only in N-Triples and Turtle.)

Is this a problem? Can we go ahead with the new literal design despite this restriction? Should we acknowledge it in the RDF Concepts spec?

[1] http://www.w3.org/TR/2005/WD-xmlschema11-2-20050224/datatypes.html#string
[2] http://www.w3.org/TR/REC-xml/#dt-character
[3] http://www.w3.org/TR/xml11/#NT-Char
[4] http://www.w3.org/TR/rdf-concepts/#dfn-lexical-form
[5] http://www.unicode.org/versions/Unicode6.0.0/UnicodeStandard-6.0.pdf
Related Action Items:
No related actions
Related emails:
  1. RE: Agenda: JSON-LD Telecon - Tuesday, July 2nd 2013 (from markus.lanthaler@gmx.net on 2013-07-02)
  2. RE: sandro's review of json-ld-api (from markus.lanthaler@gmx.net on 2013-03-29)
  3. Re: Status update on LC comments and post-LC changes to R2RML (from richard@cyganiak.de on 2011-11-07)
  4. Status update on LC comments and post-LC changes to R2RML (from richard@cyganiak.de on 2011-11-07)
  5. Re: RDF-ISSUE-75 (#x0): Valid plain literals containing #x0 are no longer valid in RDF 1.1 (from richard@cyganiak.de on 2011-08-21)
  6. Re: RDF-ISSUE-75 (#x0): Valid plain literals containing #x0 are no longer valid in RDF 1.1 (from ivan@w3.org on 2011-08-20)
  7. RDF-ISSUE-75 (#x0): Valid plain literals containing #x0 are no longer valid in RDF 1.1 (from sysbot+tracker@w3.org on 2011-08-19)

Related notes:

Proposed to resolve as a sentence in RDF Concepts Section 5 and possibly in the RDF Primer or another document motivating changes from RDF 1.0.

David Wood, 13 Oct 2011, 18:01:49

Richard will put a statement about this into rdf concepts.

Also:

<Scott_Bauer> sandro: we put it in rdf concepts now?
<AlexHall> how many implementors validate xsd:strings right now?
<iand> we could write a negative test case: :x :y "\u0000" .
<iand> ask implementors to try that test and see if they handle it
<Scott_Bauer> letting cygri create the action item?
<cygri> ACTION: cygri to add a note to RDF Concepts re ISSUE-75
* trackbot noticed an ACTION. Trying to create it.
* RRSAgent records action 10
<trackbot> Created ACTION-107 - Add a note to RDF Concepts re ISSUE-75 [on Richard Cyganiak - due 2011-10-20].
<sandro> gavin_: This wasn't a problem pre-turtle because no syntax could express it.
<Scott_Bauer> davidwood: Ian's says it should be a test case
<Scott_Bauer> gavinc: it can't be expressed in n-triples
<Scott_Bauer> sandro: it's a syntax error -- you expect it to fail
<iand> it can be expressed in ntriples (as above) but it is just datatype invalid
<Scott_Bauer> topic: issue 76
<cygri> sandro++
<Scott_Bauer> sandro: close issue 75 first
<iand> If i can write "x"^^xsd:int then I can write "\u0000"^^xsd:string

Sandro Hawke, 13 Oct 2011, 18:07:35


Guus Schreiber <guus.schreiber@vu.nl>, Chair, Ivan Herman <ivan@w3.org>, Sandro Hawke <sandro@w3.org>, Staff Contacts
$Id: 75.html,v 1.1 2014-07-09 12:18:03 carine Exp $