Re: ISSUE-26: We don't need any RDFS vocabulary for error triples! from Ivan Herman on 2010-07-21 (public-rdfa-wg@w3.org from July 2010)

From: Ivan Herman <ivan@w3.org>
Date: Wed, 21 Jul 2010 15:14:44 +0200
To: benjamin.adrian@dfki.de
Cc: RDFa WG <public-rdfa-wg@w3.org>
Message-Id: <54F6D8C7-C45A-4294-A4DC-3629D045E8A2@w3.org>
On Jul 21, 2010, at 14:23 , Benjamin Adrian wrote:

> Hi Ivan,
> 
> My main concern about an RDFS vocabulary for error triples about RDFa Parser errors is,
> that (if we want to do a good job) it has to be extensible for all kind of RDF parser errors.

You mean RDF/XML or Turtle parser errors, right?

This was indeed Toby's concern and, acknowledging it, we have slightly relaxed the requirements: the processor graph mechanism is no longer *required*. What we say is that *if* an implementation uses it, then this is the way to use it.

The comparison is also a little bit misleading. As Shane pointed out, there is a difference between RDFa and the others: there are situations (I think it was a @profile file that could not be dereferenced that triggered this) where a large part of the triples are not generated. I always felt that a service should inform the caller somehow if this occurs.

(Yes, I can see the possible answer that if, say, somebody mistypes the rdf: namespace declaration in an RDF/XML file then the file does not generate any triples at all... I.e., there is at least some analogy.)

> And this a real big issue and means a lot of communication efforts!
> 
>> What you propose, if my understanding is correct, is to have an error vocabulary in XML and return that to the caller as an XML Literal. If I am an RDF application and use a remote RDFa service to extract RDF from an RDFa file, that means that I would have to include in my application an XML parser (even if it is a simple one) just to understand the error message, whereas if I get the results in the form of triples then, well, I use whatever I use for my application already. I just do not believe that would be acceptable.
>>   
> I don't see the real use case here. I never wrote any program logic on error messages or info messages.
> I just use grep to see what went wrong. Inside an application logic I also catch exceptions and check what type they are.

Hm. Grep may not be an option if I run a (remote) distiller on a (remote) HTML file that refers to a (remote) @profile file... And, in the case of a remote service, there is no such thing as an exception.

So... what would you expect a remote service like a distiller to return if a @profile file is not reachable? Just return whatever triples that are generated and leave it at that?

(I do not mean to be provocative: this is a genuine question of what you as a user would expect...)


>> I am less qualified on the API level but... the user of an API surely has to be prepared to handle RDF graphs. That should include the handling of a processor graph. If the order of the statements is a problem for the API user then we have much bigger problems on our hands, because the order of RDF triples extracted from the RDFa content is also random! I would hope that is not a real issue...
>> 
>>   
> Well it's not a problem, but an issue the application developer has to be aware of.
> The ordering of RDF triples is random. That means using the RDFTripleIterator
> for complex queries is nearly impossible on a large dataset.
> 
> It also means that filtering error triples with triple-events is very difficult and error-prone.

I do not understand that. In rough RDFLib parlance

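from rdflib import Namespace, RDF
RDFA = Namespace("http://www.w3.org/ns/rdfa#")
# subjects typed as profile errors, then the rdfa:context of each
for s in processor_graph.subjects(RDF.type, RDFA.ProfileReferenceError):
    for c in processor_graph.objects(s, RDFA.context):
        print(c)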

will print out all the @profile values that cannot be dereferenced. I do not see that to be any more difficult than managing triples in general...
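
For concreteness, here is a minimal, self-contained sketch of the same lookup that does not need RDFLib at all. The `triples` helper, the prefixed string identifiers and the sample data are my assumptions for illustration, not part of any spec or library. The only point is that a pattern lookup gives the same answer whatever order the triples arrived in.

```python
# Triples are plain (subject, predicate, object) tuples; None is a wildcard.
# All names below are illustrative assumptions, not RDFLib or spec API.
RDF_TYPE = "rdf:type"
PROFILE_ERROR = "rdfa:ProfileReferenceError"
CONTEXT = "rdfa:context"


def triples(graph, pattern):
    """Yield every triple in `graph` matching `pattern` (None matches anything)."""
    for triple in graph:
        if all(want is None or want == got for want, got in zip(pattern, triple)):
            yield triple


def unreachable_profiles(graph):
    """Return the rdfa:context values of all profile reference errors."""
    found = set()
    for s, _, _ in triples(graph, (None, RDF_TYPE, PROFILE_ERROR)):
        for _, _, c in triples(graph, (s, CONTEXT, None)):
            found.add(c)
    return found


# The same three statements in two different orders.
ordered = [
    ("_:1", RDF_TYPE, PROFILE_ERROR),
    ("_:1", CONTEXT, "http://www.example.org/profile"),
    ("_:1", "dc:date", "2010-06-30T13:40"),
]
shuffled = list(reversed(ordered))

print(unreachable_profiles(ordered))
print(unreachable_profiles(shuffled))
```

Both calls report the same @profile value, because nothing in the lookup depends on where a triple sits in the collection.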

> Multiple randomly ordered RDF triples about errors are completely useless when using the
> event based mechanism.
> 
> That's why I recommend describing each error in a single RDF triple.
> Other solutions might look like:
> - error events contain a property group instead of a single triple.

We have not specified the error mechanism on the API. But isn't it possible to ask for a property group using the rdf:type of the error? I would then get hold of the subjects for errors and then I can get hold of the error descriptions for each of those. I really do not see why this is much more complicated than processing RDFa triples in general.

Note that, I presume, the order in a property group is not fixed either...
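
To illustrate, here is a rough sketch of an event-based consumer that copes with arbitrary ordering by grouping the processor graph triples per subject and looking at the groups only at the end of the stream. Since we have not specified the API, the callback names `on_triple` and `on_end` are purely hypothetical, as are the `collect` helper and the sample data.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"
PROFILE_ERROR = "rdfa:ProfileReferenceError"


class ErrorCollector:
    """Groups processor graph triples by subject as they are reported."""

    def __init__(self):
        # subject -> {predicate: [objects]}
        self._by_subject = defaultdict(lambda: defaultdict(list))

    def on_triple(self, s, p, o):
        # Hypothetical per-triple callback; arrival order does not matter.
        self._by_subject[s][p].append(o)

    def on_end(self):
        # Hypothetical end-of-stream callback: keep only the subjects
        # that turned out to be profile reference errors.
        return {
            s: {p: list(objs) for p, objs in props.items()}
            for s, props in self._by_subject.items()
            if PROFILE_ERROR in props.get(RDF_TYPE, [])
        }


def collect(stream):
    """Feed a stream of triples through the collector, one event per triple."""
    collector = ErrorCollector()
    for s, p, o in stream:
        collector.on_triple(s, p, o)
    return collector.on_end()


# The "awkward" ordering: the rdf:type statement arrives last.
stream = [
    ("_:1", "dc:date", "2010-06-30T13:40"),
    ("_:1", "dc:description", "The @profile value could not be dereferenced"),
    ("_:1", RDF_TYPE, PROFILE_ERROR),
]

print(collect(stream) == collect(reversed(stream)))
```

In this sketch only the triples reported for the processor graph are buffered, and I would presume those to be few compared with the triples of the default graph.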

> - the returned triple order of the processor graph is fixed

I do not think this is feasible. What this means is that an implementation cannot use an underlying triple store or environment to generate and store error triples. That seems to be prohibitive to me...

I guess the real issue, which came up in the past and comes up again now, is whether an error mechanism is necessary at all. We seemed to have a working group consensus on this, and I begin to wonder whether this is still true...

Ivan


> 
> Cheers,
> 
> Benjamin
>> On Jul 19, 2010, at 13:38 , Benjamin Adrian wrote:
>> 
>>   
>>> Hi,
>>> 
>>> I say we don't need any RDFS vocabulary for error triples!
>>> 
>>> Read why:
>>> 
>>> The spec says:
>>> 
>>> "SAX-based processors or processors that utilize function or method callbacks
>>> to report the generation of triples are classified as event-based RDFa Processors."
>>> 
>>> That means, the callback function is called for every generated RDF triple.
>>> Parsing error triples with these callbacks can be extremely difficult when
>>> the generated triples inside the processor graph are unordered
>>> (as may occur -- it's RDF, not XML!).
>>> 
>>> So searching the stream for triples with patterns like:
>>> rdf:type rdfa:ProfileReferenceError
>>> 
>>> 
>>> is nice when the generated triples' ordering is like this:
>>> 
>>>      _:1 a rdfa:ProfileReferenceError .
>>>      _:1 dc:description "The @profile value could not be dereferenced" .
>>>      _:1 dc:date "2010-06-30T13:40"^^xsd:dateTime .
>>> 
>>> But what if they are generated like this?
>>>      _:1 dc:date "2010-06-30T13:40"^^xsd:dateTime .
>>>      _:1 dc:description "The @profile value could not be dereferenced" .
>>>      _:1 a rdfa:ProfileReferenceError .
>>> 
>>> 
>>> Then you have to buffer and search the whole stream, which means you had better use the
>>> model-based approaches to error reporting.
>>> 
>>> -->  NEITHER EARL NOR ANOTHER RDFS VOCABULARY: it should be really simple.
>>> 
>>> I don't think that the intention of EARL matches the use case of our error vocabulary.
>>> The used RDF vocabulary must be as simple as possible.
>>> That means it should use as few properties as possible.
>>> Nobody will ever reason on an error graph. So why not
>>> summarize all information about a single error in a single triple describing a stack trace?
>>> 
>>> [] c:description "ProfileReferenceError: The @profile value<http://www.example.org/profile>  could not be dereferenced. \n
>>>                             Line<http://www.example.org>: 564 \n
>>>                             HTTP GET: ....\n
>>>                             HTTP RESPONSE ".
>>> 
>>> If you say, well, a string is not enough, try an XMLLiteral:
>>> 
>>> [] c:description "<ProfileReferenceError>: The @profile value<http://www.example.org/profile>  could not be dereferenced. \n
>>>                            <POSITION>
>>>                            <URL>http://www.example.org</URL>
>>>                            <LINE>564</LINE>
>>>                            </POSITION>
>>>                            <REQUEST>  GET: ....</REQUEST>
>>>                             <RESPONSE>HTTP RESPONSE ...</RESPONSE>
>>>                             </ProfileReferenceError>".
>>> 
>>> 
>>> That's it :) I'm fine with XML or plain Literals as objects for error triples.
>>> 
>>> Best regards,
>>> 
>>> Benjamin
>>> 
>>> 
>>> 
>>> 
>>> -- 
>>> __________________________________________
>>> Benjamin Adrian
>>> Email :
>>> benjamin.adrian@dfki.de
>>> 
>>> WWW :
>>> http://www.dfki.uni-kl.de/~adrian/
>>> 
>>> Tel.: +49631 20575 145
>>> __________________________________________
>>> Deutsches Forschungszentrum für Künstliche Intelligenz GmbH
>>> Firmensitz: Trippstadter Straße 122, D-67663 Kaiserslautern
>>> Geschäftsführung:
>>> Prof. Dr. Dr. h.c. mult. Wolfgang Wahlster (Vorsitzender) Dr. Walter Olthoff
>>> Vorsitzender des Aufsichtsrats:
>>> Prof. Dr. h.c. Hans A. Aukes
>>> Amtsgericht Kaiserslautern, HRB 2313
>>> __________________________________________
>>> 
>>>     
>> 
>> ----
>> Ivan Herman, W3C Semantic Web Activity Lead
>> Home: http://www.w3.org/People/Ivan/
>> mobile: +31-641044153
>> PGP Key: http://www.ivan-herman.net/pgpkey.html
>> FOAF: http://www.ivan-herman.net/foaf.rdf
>> 
>> 
>> 
>> 
>> 
>>   
> 
> 
> 


Received on Wednesday, 21 July 2010 13:14:26 UTC