Re: RDF-ISSUE-13 (RDF XMLLiterals): Review RDF XML Literals [Cleanup tasks][Turtle][JSON] from Jeremy Carroll on 2011-03-09 (public-rdf-wg@w3.org from March 2011)

From: Jeremy Carroll <jeremy@topquadrant.com>
Date: Wed, 09 Mar 2011 12:40:25 -0800
To: Ivan Herman <ivan@w3.org>
CC: RDF Working Group WG <public-rdf-wg@w3.org>
Message-ID: <4D77E5B9.2070605@topquadrant.com>
The 2004 design is to canonicalize on input. A non-canonicalized XMLLiteral 
is ill-formed, and hence goes into a semantic black hole that I don't 
understand.

In fact the 2004 design is more to canonicalize during XML processing. Thus 
an XMLLiteral in a Turtle or an N-Triples doc must already be 
canonicalized. Otherwise it is equivalent to "1 syntax error.0"^^xsd:decimal
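
A minimal sketch of that check, in Python, for illustration only. Assumptions: the standard library's xml.etree.ElementTree.canonicalize implements Canonical XML 2.0, not the Exclusive XML Canonicalization that the 2004 specs require, so it is only a stand-in here; the helper name is_canonical_form is hypothetical; and the lexical form is assumed to be a single element, whereas a real XMLLiteral may also be mixed content.

```python
import xml.etree.ElementTree as ET


def is_canonical_form(lexical: str) -> bool:
    """Return True if the string is unchanged by canonicalization.

    Stand-in check: uses C14N 2.0 from the standard library rather than
    Exclusive XML Canonicalization, and assumes a single root element.
    """
    try:
        return ET.canonicalize(lexical) == lexical
    except ET.ParseError:
        # Not even well-formed XML, so certainly not a canonical form.
        return False
```

Under these assumptions, a Turtle consumer that cared could treat any lexical form for which this returns False as an ill-typed literal, without ever having to canonicalize the content itself.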

Thus, if an RDFa processor wishes to convert HTML into Turtle, it 
should (IMO):
a) convert HTML into XHTML
b) process any XML => XMLLiteral by applying the XC14N (the actual 
details are not so very difficult)
c) serialize the canonicalized XMLLiteral in TTL

If the XMLLiteral is already embedded within the HTML as say an 
attribute value, then this canonicalization is already required.
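
Steps b) and c) above could be sketched roughly as follows, in Python. This is a hedged illustration, not a conforming implementation. Assumptions: step a) has already produced a well-formed XHTML fragment with a single root element; xml.etree.ElementTree.canonicalize (Canonical XML 2.0) stands in for XC14N; the function name to_turtle_xml_literal is hypothetical; and the rdf: prefix is assumed to be declared in the enclosing Turtle document.

```python
import xml.etree.ElementTree as ET


def to_turtle_xml_literal(xhtml_fragment: str) -> str:
    """Canonicalize a fragment and emit it as a Turtle typed literal."""
    # Step b): canonicalize (C14N 2.0 here, as a stand-in for XC14N).
    canonical = ET.canonicalize(xhtml_fragment)
    # Step c): escape for a Turtle short string and attach the datatype.
    escaped = (
        canonical.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"^^rdf:XMLLiteral'


if __name__ == "__main__":
    print(to_turtle_xml_literal('<em class="x"/>'))
```

The point of doing it in this order is that the Turtle document then carries an already canonical lexical form, so a Turtle parser downstream needs no XML subsystem.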

Jeremy


On 3/9/2011 12:27 AM, Ivan Herman wrote:
> On Mar 9, 2011, at 02:38 , Jeremy Carroll wrote:
>
>> I realized I should have given a simple answer.
> :-)
>
>> The decision was based on not wanting to require an XML subsystem within an RDF reasoner. Therefore the XML processing is confined to the RDF/XML parser that has to handle XML anyway.
>>
> Ok. What is not clear from the text, though (and that may be my fault) is whether it is allowed in a particular syntax to have a non-canonicalized XML or not. Ie, whether the canonicalization is to be done by a parser or not. My instinct says that the canonicalization should be done by the parser, but I remember having had huge discussions in the RDFa Working Group on whether the code generated by RDFa (eg, the Turtle output) should be canonicalized or not.
>
> I realize that this is largely an issue that the particular serialization syntaxes should make clear, hence also labeling this mail with the Turtle and JSON sub-groups' subject heading. I think this is certainly a cleanup operation that we have to do (although it would put an extra burden on parsers and I am not sure many of those do that...)
>
> Thanks Jeremy
>
> Ivan
>
>
>> Jeremy
>>
>>
>> On 3/8/2011 5:27 PM, Jeremy Carroll wrote:
>>> Hi Ivan
>>> http://www.w3.org/TR/2003/WD-rdf-concepts-20031010/#section-substantive-Revisions
>>> Under "XMLLiteral simplification" gives the blow by blow account.
>>>
>>> There is text embedded in
>>> http://lists.w3.org/Archives/Public/www-rdf-comments/2003JanMar/0170.html
>>>
>>> Joe Reagle says:
>>> [[
>>>
>>> I presume that the reason you even care how the xml-literal is represented
>>> is that you will want to compare RDF instances (which might contain
>>> xml-literals) to see if they are identical at some point?
>>> ]]
>>>
>>> The current design is intended to make that easy, and put the burden of XML processing within the RDF/XML parser.
>>> A Turtle or N3 parser is not required to have an XML subsystem, whereas the older design, which canonicalized as part of the lex2value mapping, required all RDF implementations to be able to do that.
>>>
>>> Notice also point i in
>>> http://lists.w3.org/Archives/Public/www-rdf-comments/2003JanMar/0335.html
>>> [[
>>>
>>> An example fix would be
>>> to require an RDF/XML parser to use a specific canonicalization on
>>> input.
>>>
>>> ]]
>>> A proposal that was accepted in full.
>>>
>>> Jeremy
>>>
>>>
>>> On 3/8/2011 12:41 AM, Ivan Herman wrote:
>>>> Jeremy,
>>>>
>>>> just want to understand... what was the reason xc14n was required on the lexical space? I would expect that xc14n is important to be able to compare xml literals but that is a value space issue. Just like 123.456 is identical, in value space, to 123.4560000
>>>>
>>>> Thanks
>>>>
>>>> Ivan
>>>>
>>>> On Mar 7, 2011, at 20:28 , Jeremy Carroll wrote:
>>>>
>>>>> The motivation in the 1999 M&S spec, and the 2004 Recs, for XML Literals was to do with I18N use cases involving HTML (and in some of them Ruby)
>>>>>
>>>>> I believe that for at least some of these use cases we would now recommend RDFa.
>>>>>
>>>>> I think there are some use cases that are not addressed by RDFa.
>>>>>
>>>>> Once you take the use cases seriously, then you end up somewhere not a million miles away from the current specs, with all their problems.
>>>>>
>>>>> I suspect an underlying error in the 2002-2004 work was the following incorrect reasoning:
>>>>> - it is important for RDF to carry rich text literals (e.g. involving Ruby markup)
>>>>> - it is important to be able to tell if two RDF fragments are the same
>>>>> Hence:
>>>>> - it is important to be able to compare two rich text literals in RDF [It is this that leads to the XC14N dance]
>>>>>
>>>>> Jeremy
>>>>>
>>>>>
>>>>> On 3/7/2011 5:35 AM, Ivan Herman wrote:
>>>>>> On Mar 7, 2011, at 14:25 , RDF Working Group Issue Tracker wrote:
>>>>>>
>>>>>>> RDF-ISSUE-13 (RDF XMLLiterals): Review RDF XML Literals [Cleanup tasks]
>>>>>>>
>>>>>>> http://www.w3.org/2011/rdf-wg/track/issues/13
>>>>>>>
>>>>>>> Raised by: Andy Seaborne
>>>>>>> On product: Cleanup tasks
>>>>>>>
>>>>>>> RDF Concepts:
>>>>>>> http://www.w3.org/TR/rdf-concepts/#section-XMLLiteral
>>>>>>>
>>>>>>> RDF Syntax:
>>>>>>> http://www.w3.org/TR/REC-rdf-syntax/#section-Syntax-XML-literals
>>>>>>>
>>>>>>> The lexical space of RDF XML Literals is XML fragments which are required to be "exclusive canonical XML".  The lexical space and the value space are in 1-1 correspondence. The rules are quite complicated. These rules for canonicalization apply to the lexical form; equality testing can be done using string compare.
>>>>>>>
>>>>>>> Canonicalization rules include no use of <tag/> and that attributes must be in sorted order (this is not an exhaustive list).
>>>>>>>
>>>>>>> A consequence of this is that many correct XML fragments are not legal as XML Literals because they do not correspond to exclusive canonicalization.
>>>>>>>
>>>>>>> Possible cleanup includes partially relaxing the lexical space restrictions while retaining the value space so that fragments can be used as XML literals without complex processing.
>>>>>>>
>>>>>> +10^infinite
>>>>>>
>>>>>> I know of no RDF serializers around that would produce correct XML Literals in this sense. They all produce valid XML, with hopefully the right namespace declarations (though that does not always happen either) but they certainly do not necessarily go the extra mile of canonicalization. And there is no reason for that either: canonicalization comes into play when two XML fragments must be compared as strings; but this should be done in value space and not in lexical space...
>>>>>>
>>>>>> Ivan
>>>>>>
>>>>>>> RDF XML Literals are the only datatype hard wired into RDF.
>>>>>>>
>>>>>>> If a Turtle document is to be validated, will that require access to an XML parser and canonicalization engine?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>> ----
>>>>>> Ivan Herman, W3C Semantic Web Activity Lead
>>>>>> Home: http://www.w3.org/People/Ivan/
>>>>>> mobile: +31-641044153
>>>>>> PGP Key: http://www.ivan-herman.net/pgpkey.html
>>>>>> FOAF: http://www.ivan-herman.net/foaf.rdf
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>
Received on Wednesday, 9 March 2011 20:40:49 UTC