Rdf text LC WG comment
SPARQL Working Group response to the request for review of:
rdf:text: A Datatype for Internationalized Text
W3C Working Draft 21 April 2009
SPARQL queries act on the graph, not on the serialized form. Thus, we suggest that the editors state the interactions with SPARQL with respect to:
- the restriction to rdf:text not appearing in RDF graphs should be extended such that rdf:text MUST NOT appear in SPARQL XML results. This extends the existing coverage of RDF graph exchange to include SPARQL results from SELECT, in the same way that CONSTRUCT and DESCRIBE queries are already covered.
- the use of "semantic equivalence" shall be clarified and it should be noted that rdf:text is a D-entailment and is accessed by SPARQL via a BGP entailment regime extension.
- that the functions STR/DATATYPE/LANG act on the lexical representations and will be affected depending on the way an rdf:text aware entailment regime manifests its results.
In addition it should be noted that rdf:text relates to the assumption in RDF that a literal has a datatype or a language tag but not both. Existing, deployed code relies on this invariant.
There are some SPARQL-specific issues that are not addressed in the document. The rdf:text document refers only to "graph exchange" when saying that rdf:text must not appear in RDF graph serializations, but that does not apply to SPARQL directly.
The rdf:text document says nothing about SPARQL operations, and it is not clear whether changes to existing SPARQL queries are being assumed. At one time, they were.
Since SPARQL is defined over simple entailment, NOT datatype entailment, the notion of "semantic equivalence" (mentioned but not defined in the rdf:text document) does not make sense and this spec appears to require changes to SPARQL behaviour. This would be undesirable since it affects:
1. SPARQL Query Result XML Format
2. Interactions with simple entailment matching of BGPs, and extension of SPARQL via BGPs.
3. Effects on DATATYPE, LANG and STR
Note: In RDF, a literal has either a language tag or a datatype but not both. rdf:text changes this assumption so deployed code or SPARQL implementations that rely on this invariant may break.
We believe that these concerns can be remedied if rdf:text talks about D-entailment specifically, instead of "semantic equivalence" in general, and thus does not affect simple entailment.
SPARQL XML Results Format
This is not "graph exchange", so the prohibition on the use of rdf:text in a serialization does not apply. It could be applied, but might not help systems that do want to see rdf:text literals, for example, SPARQL/OWL2.
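To make the concern concrete, the following minimal sketch builds the same variable binding in two ways: with the established xml:lang form, and with an rdf:text typed literal. It is an illustration only. The helper name literal_binding and the variable name "title" are hypothetical, the SPARQL results namespace is omitted for brevity, and the datatype IRI is assumed to be the rdf: namespace followed by "text".

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"
# Assumed IRI for the rdf:text datatype (rdf: namespace + "text").
RDF_TEXT = "http://www.w3.org/1999/02/22-rdf-syntax-ns#text"


def literal_binding(name, text, lang=None, datatype=None):
    """Hypothetical helper: serialize one <binding> holding a literal."""
    binding = ET.Element("binding", {"name": name})
    literal = ET.SubElement(binding, "literal")
    if lang is not None:
        # ElementTree writes this namespaced attribute as xml:lang.
        literal.set("{%s}lang" % XML_NS, lang)
    if datatype is not None:
        literal.set("datatype", datatype)
    literal.text = text
    return ET.tostring(binding, encoding="unicode")


# Form that existing consumers of SPARQL XML results expect.
established = literal_binding("title", "Padre de familia", lang="es")
# Form that would appear if rdf:text were allowed in SELECT results.
with_rdf_text = literal_binding("title", "Padre de familia@es", datatype=RDF_TEXT)

print(established)
print(with_rdf_text)
```

A client written against the first form would see the second as a typed literal of an unknown datatype, with the language tag hidden inside the lexical form.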
The problem here, again, is that the semantic implications of rdf:text are not forward-compatible with existing RDF. This concern would be remedied by defining the semantic implications of rdf:text in terms of D-entailment only, as suggested above. In fact, we think that this fix makes the restrictions on the usage of rdf:text in RDF graphs redundant.
What happens if a datatype property is restricted to rdf:text? What does the RDF serialization look like? Does it include rdf:text?
The SPARQL standard defines SPARQL with respect to simple entailment and provides a mechanism for extension to other entailment regimes. See the section "12.6 Extending SPARQL Basic Graph Matching".
Since SPARQL is defined over simple entailment, NOT datatype entailment, the notion of "semantic equivalence" (mentioned but not defined in the rdf:text document) does not make sense. SPARQL is not acting on the serialization of an RDF graph. It acts on the value space of literals.
Simple entailment does not cover the RDF-MT entailments xsd1a and xsd1b, which are the rules for plain literals without language tag being the same value as XSD strings. So these are not required of a SPARQL processor using simple entailment.
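The difference between term-level matching and value-level matching can be sketched as follows. This is a simplified illustration, not a definition of either entailment regime: the Literal class and both comparison functions are hypothetical, and language tag case normalization is ignored.

```python
from typing import NamedTuple, Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


class Literal(NamedTuple):
    """Hypothetical model of an RDF literal term."""
    lexical: str
    lang: Optional[str] = None
    datatype: Optional[str] = None


def same_term(a: Literal, b: Literal) -> bool:
    """Term-level comparison, as in matching under simple entailment."""
    return a == b


def same_string_value(a: Literal, b: Literal) -> bool:
    """Value-level comparison in the style of xsd1a/xsd1b: a plain literal
    without language tag and an xsd:string literal with the same lexical
    form are treated as the same value."""
    def value(lit):
        if lit.lang is None and lit.datatype in (None, XSD_STRING):
            return lit.lexical  # compare as a bare string
        return lit              # otherwise compare as a term
    return value(a) == value(b)


plain = Literal("abc")
typed = Literal("abc", datatype=XSD_STRING)

print(same_term(plain, typed))          # False: different terms
print(same_string_value(plain, typed))  # True: same string value
```

A processor using simple entailment only needs the first comparison; the second is what a datatype-aware regime would add.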
Additional semantic equivalences implied by rdf:text should only affect D-entailment (where rdf:text is part of the datatype map D following ) but not simple entailment. Thus, the document should not talk about "semantic equivalence" in general terms but just in terms of D-entailment. This should fix the main problem raised and would only affect SPARQL engines that follow a (yet to be defined).
We suggest that it is explicitly noted that access to rdf:text aware entailment regimes by a SPARQL query is via the extension mechanism.
Effects on DATATYPE, LANG and STR
Note that this SPARQL WG should maintain compatibility with SPARQL as published in January 2008.
These functions are accessors to the components of a literal term. Different ways of manifesting a value from BGP matching will lead to different results from these functions.
For these examples, the serialized form using rdf:text is used, although in an RDF graph it exists as a value and rdf:text does not appear when the graph is serialised. The examples relate to a variable bound to such a value and how the literal accessor functions of SPARQL (DATATYPE, LANG and STR) can be affected.
The rdf:text document does define some functions on rdf:text literals.
DATATYPE is defined so that the type of a plain literal without language tag is xsd:string. There is no datatype for a literal with a language tag.
SPARQL has the concept of a "simple literal" for a plain literal without language tag.
These functions are applied as part of the algebra, not as part of BGP matching - the entailment extension mechanism does not modify these functions. There may be different entailment regimes, maybe on different graphs, in the same query.
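As a reference point for the examples that follow, here is a minimal sketch of the three accessors as we read them in SPARQL as published in January 2008, applied to the term that is bound to a variable. The Literal class, the function names and SparqlTypeError are hypothetical, and the rdf:text IRI is assumed to be the rdf: namespace followed by "text".

```python
from typing import NamedTuple, Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"
# Assumed IRI for the rdf:text datatype.
RDF_TEXT = "http://www.w3.org/1999/02/22-rdf-syntax-ns#text"


class Literal(NamedTuple):
    """Hypothetical model of an RDF literal term."""
    lexical: str
    lang: Optional[str] = None
    datatype: Optional[str] = None


class SparqlTypeError(Exception):
    """Stands in for a SPARQL type error raised during evaluation."""


def sparql_str(lit: Literal) -> str:
    # STR returns the lexical form of the term.
    return lit.lexical


def sparql_lang(lit: Literal) -> str:
    # LANG returns the language tag, or "" if the term has none.
    return lit.lang or ""


def sparql_datatype(lit: Literal) -> str:
    # A literal with a language tag has no datatype: type error.
    if lit.lang is not None:
        raise SparqlTypeError("DATATYPE of a literal with language tag")
    # A simple literal is reported as xsd:string.
    return lit.datatype or XSD_STRING


tagged = Literal("Padre de familia", lang="es")
as_rdf_text = Literal("Padre de familia@es", datatype=RDF_TEXT)

print(sparql_lang(tagged), sparql_str(tagged))
print(sparql_datatype(as_rdf_text), sparql_lang(as_rdf_text))
```

Under this term-level reading, the two manifestations of the same value give different answers from every accessor, which is the concern of this section.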
DATATYPE of a literal with language tag
DATATYPE ("Padre de familia"@es) ==> error
When a literal is bound to a variable and subsequently used in a call to DATATYPE, what return value is expected? Is it true that if instead it is presented as below, a different result is obtained?
DATATYPE("Padre de familia@es"^^rdf:text) ==> rdf:text
DATATYPE ("Padre de familia") ==> xsd:string
but what is:
DATATYPE ("Padre de familia") ==> rdf:text ?? xsd:string ??
because one value space is a subset of the other.
The reason for rdf:text is the uniform treatment of literals, so the result of a query that finds all the untyped literals ("untyped" as per the current SPARQL REC: without a type, that is, a simple literal or a literal with a language tag) might change.
In RDF, a literal has either a language tag or a datatype but not both. So:
LANG("Padre de familia"@es) ==> "es"
LANG("Padre de familia@es"^^rdf:text) ==> ""
LANG("Padre de familia@es"^^rdf:text) ==> ??
cf. rtfn:lang-from-text("Padre de familia@es"^^rdf:text) ==> "es"
rdf:text is a datatype with lexical space including the language tag
STR("Padre de familia@es"^^rdf:text) ==> "Padre de familia@es"
STR("Padre de familia"@es) ==> "Padre de familia"
STR("Padre de familia@es"^^rdf:text) ==> "Padre de familia" ??
because STR returns the lexical form.
The lexical space of literals with language tags is changed by rdf:text.
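The two candidate behaviours for STR and LANG on an rdf:text literal can be sketched as below. The sketch assumes that an rdf:text lexical form is a string followed by "@" and a possibly empty language tag, and that a language tag never contains "@", so the split is made at the last "@". The function names are hypothetical.

```python
def split_rdf_text(lexical: str):
    """Split an assumed rdf:text lexical form into (string, language tag)."""
    text, sep, lang = lexical.rpartition("@")
    if not sep:
        raise ValueError("no '@' in rdf:text lexical form: %r" % lexical)
    return text, lang


def str_term_level(lexical: str) -> str:
    # Candidate 1: STR returns the lexical form unchanged.
    return lexical


def str_value_level(lexical: str) -> str:
    # Candidate 2: STR returns only the string part of the value.
    return split_rdf_text(lexical)[0]


def lang_value_level(lexical: str) -> str:
    # Candidate 2 for LANG: the tag recovered from the lexical form.
    return split_rdf_text(lexical)[1]


lexical = "Padre de familia@es"
print(str_term_level(lexical))    # Padre de familia@es
print(str_value_level(lexical))   # Padre de familia
print(lang_value_level(lexical))  # es
```

Only the second candidate agrees with the results for "Padre de familia"@es, but it requires STR and LANG to know about rdf:text.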
The EBV of a string is false if the string is of length zero else true.
Do any rdf:text literals have an EBV of false?
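The question matters because the answer depends on which string the rule is applied to. The sketch below assumes, as an illustration only, that the rdf:text lexical form for the empty string without language tag is "@"; the function name ebv_of_string is hypothetical and implements only the string rule quoted above.

```python
def ebv_of_string(s: str) -> bool:
    """String rule for EBV: false for length zero, else true."""
    return len(s) > 0


# Assumed lexical form of the empty string with no language tag.
lexical = "@"
# String part of the value, taken as everything before the last "@".
text_part = lexical.rpartition("@")[0]

ebv_on_lexical_form = ebv_of_string(lexical)   # rule applied to "@"
ebv_on_string_part = ebv_of_string(text_part)  # rule applied to ""

print(ebv_on_lexical_form, ebv_on_string_part)  # True False
```

If the rule is applied to the lexical form, no rdf:text literal can have an EBV of false under this assumption, since every lexical form contains at least the "@".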
IRIs vs. URIs
"This specification uses Uniform Resource Identifiers (URIs) for naming datatypes and their components" indicates that language tags in RDF are URIs, where SPARQL Query interpreted them as IRIs. Using URIs would imply that
<X> <p> <http://xn--9oqp94l.example/?user=%D8%A3%D9%83%D8%B1%D9%85&channel=R%26D> .
would be matched by the SPARQL graph pattern
<X> <p> <http://伝言.example/?user=أكرم&channel=R&D> .
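A small sketch of why the two forms are different terms unless a mapping is applied: compared character by character they are not equal, while percent-encoding the UTF-8 octets of the non-ASCII user value yields the escaped form seen in the first triple. Only that one step of an IRI-to-URI mapping is shown; the host name conversion is not attempted here.

```python
from urllib.parse import quote

uri_form = "http://xn--9oqp94l.example/?user=%D8%A3%D9%83%D8%B1%D9%85&channel=R%26D"
iri_form = "http://伝言.example/?user=أكرم&channel=R&D"

# Character-by-character comparison of the two identifiers.
same_as_strings = uri_form == iri_form

# Percent-encode the UTF-8 octets of the non-ASCII query value.
escaped_user = quote("أكرم", safe="")

print(same_as_strings)  # False
print(escaped_user)     # %D8%A3%D9%83%D8%B1%D9%85
```

Whether a SPARQL processor is expected to perform such a mapping before matching is exactly the point that the choice between URIs and IRIs leaves open.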