Re: RDF-ISSUE-25 (Deprecate Reification): Should we deprecate (RDF 2004) reification? [Cleanup tasks]

On Fri, 2011-04-08 at 20:12 +0100, Richard Cyganiak wrote:
> Sandro,
> 
> I am sorry but I will not respond to the substance of this message.
> 
> There was a survey whose results you know well. A charter was written up. It says that the WG must standardize the multigraph stuff, and that it must deprecate (whatever that means) reification.

The charter only gives reification as an example of something we might
want to deprecate.  I do not think it pre-judges ISSUE-25.   I do not,
however, expect anyone to actually support keeping it.

> Now you are suggesting that reification could be the way to address the multigraph stuff. I find this perverse and strongly object to the proposal. I think discussing it at all is a waste of working group resources.

I'm not claiming that it's the best way to address the multigraph stuff,
but I do think it's important that we understand where it fails and why.

In particular, I think it would be good if, at the end of this, we can
provide the community a clear answer as to why reification was not good
and we had to use something else.    Also, people are likely to reinvent
it unless we can clearly explain why it's a bad idea -- or accurately
describe the problems it causes.

Honestly, I was surprised (and a bit charmed) at the strength of my
arguments taking the devil's advocate position here.  I expected to
quickly learn how reification was indefensible.

I'm sorry for not explaining more clearly what I was doing.  I see now
you thought I was seriously arguing for this, and so you ended up
putting energy into this which you wouldn't have otherwise.  I'll try to
be more careful about that in the future.

     -- Sandro

> Richard
> 
> 
> On 8 Apr 2011, at 19:04, Sandro Hawke wrote:
> 
> > On Fri, 2011-04-08 at 11:02 +0100, Richard Cyganiak wrote:
> >> On 8 Apr 2011, at 05:42, Sandro Hawke wrote:
> >>> <u1> { <a> <b> 1, 2 }
> >>> <u2> { <a> <c> 3, 4 }
> >>> 
> >>> would be:
> >>> 
> >>> <u1> eg:hasTriple [ rdf:subject <a>; rdf:predicate <b>; rdf:object 1 ],
> >>>                 [ rdf:subject <a>; rdf:predicate <b>; rdf:object 2 ].
> >>> 
> >>> <u2> eg:hasTriple [ rdf:subject <a>; rdf:predicate <c>; rdf:object 3 ],
> >>>                 [ rdf:subject <a>; rdf:predicate <c>; rdf:object 4 ].
> >>> 
> >>> So, why do SPARQL folks prefer TriG and N-Quads to these forms?  I don't
> >>> know.    
> >> 
> >> The second is about five times more verbose. 
> > 
> > I'm not sure what you mean by "the second", but yes, these forms *feel*
> > verbose.    But are they?
> > 
> > Compared to TriG, these forms use about 10 bytes more per graph and
> > about 45 bytes more per triple.   I expect when gzip'd those numbers
> > would drop to about 2 bytes and 5 bytes.   Probably not a big deal.
> > 
> > If you're storing the reified graphs in a g-box, then yes, you have a
> > ~5x expansion of the original graphs, but is that a fair comparison?
> > With the other approaches, you CAN'T store the result in a g-box.  So
> > this is comparing apples to ... empty space.
> > 
> >> It is unsuitable for hand-writing.
> > 
> > Agreed, much like N-Triples or N-Quads.   Or RDFa or RDF/XML. 
> > 
> >> To be even remotely readable and efficiently processable, it relies on something that is not significant in RDF: order of statements.
> > 
> > I don't agree with your characterization here.   This only depends on
> > the ordering of statements the way lots of RDF data which serializes
> > objects does -- if the parser hands you the triples with good locality,
> > the data can be streamed; otherwise it must be buffered or queried.
> > 
> >> It is brittle because it raises the question of what to do with incomplete reified triples. 
> > 
> > It seems to me this is just like with any other error or omission in
> > the inputs.  You can ignore it, ask a human for help, or whatever, as
> > appropriate to your situation.    The semantics are clear enough: some
> > graph has some triples, but you haven't been told exactly what they are.
> > 
> >> Its verbosity explodes exponentially when one wants to say that Alice said that Bob said that Charlie said something.
> > 
> > I don't think so.   
> > 
> > How would you do this in TriG, which also doesn't support nesting?
> > Something like this:
> > 
> > <Alice> said <AlicesClaim>.
> > <AlicesClaim> { <Bob> said <BobsClaim> }
> > <BobsClaim>  {<Charlie> said <CharliesClaim> } 
> > 
> > That structure translates exactly to what I'm describing above, with
> > just the constant-factor expansion.
> > 
> >>> If you put that into N-Triples and sort it by predicate, performing the import is going to
> >>> require holding the entire structure in memory.  But a valid response might be, "don't do that".
> >> 
> >> "Don't do that" is not a practical response. The order of statements is not significant in RDF, and not maintained by many systems.
> > 
> > I disagree that it's not a practical response.   The point here is that
> > if you want to convey a multi-graph dataset in a format which allows it
> > to be parsed with constant memory and linear time, you will need to pay
> > attention to the order in which triples occur in that serialization.
> > And what you have to do is exactly what you have to do in serializing
> > > Turtle to avoid extraneous node ids -- using [] and () whenever possible
> > -- it's not a new or special or obscure thing.
> > 
> > This approach has the feature that you *can* store your multi-graph
> > dataset in a single graph, but that doesn't mean you ever have to do
> > that.  Just as you parse TriG with a special parser and load the data
> > into a quad-store, if you're doing serious multi-graph work, you would
> > presumably keep using a special parsing system and special store, for
> > efficiency, even though with this approach you don't really have to.
> > 
> >    -- Sandro (who can't believe he's defending RDF reification...)
> > 
> > 
> 
> 
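
For readers following the quoted exchange, here is a minimal sketch of the
mapping discussed above: each triple in a named graph becomes a node attached
to the graph name via eg:hasTriple, carrying rdf:subject, rdf:predicate and
rdf:object.  The function names, the dict-of-lists dataset shape, and the
http://example.org/hasTriple IRI are assumptions made for illustration; none
of them come from the thread or from any RDF library.  The reverse direction
buffers every reified node, which is the memory cost mentioned when the
triples arrive in an unhelpful order, and it simply ignores incomplete
reified triples, which is only one of the possible policies.

```python
from collections import defaultdict
from itertools import count

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Hypothetical IRI standing in for eg:hasTriple from the quoted example.
HAS_TRIPLE = "http://example.org/hasTriple"


def reify(dataset):
    """Flatten {graph_name: [(s, p, o), ...]} into one list of plain triples."""
    ids = count()
    out = []
    for graph, triples in dataset.items():
        for s, p, o in triples:
            node = f"_:t{next(ids)}"  # fresh blank-node label per triple
            out.append((graph, HAS_TRIPLE, node))
            out.append((node, RDF + "subject", s))
            out.append((node, RDF + "predicate", p))
            out.append((node, RDF + "object", o))
    return out


def unreify(triples):
    """Rebuild {graph_name: set of (s, p, o)} from reified triples.

    Every reified node is buffered before anything is emitted, so the
    input may arrive in any order, at the cost of holding it in memory.
    """
    owner = {}                 # reified node -> graph name
    parts = defaultdict(dict)  # reified node -> {"subject": ..., ...}
    for s, p, o in triples:
        if p == HAS_TRIPLE:
            owner[o] = s
        elif p.startswith(RDF):
            parts[s][p[len(RDF):]] = o
    dataset = defaultdict(set)
    for node, graph in owner.items():
        part = parts.get(node, {})
        # Assumed policy: silently skip incomplete reified triples.
        if {"subject", "predicate", "object"} <= part.keys():
            dataset[graph].add(
                (part["subject"], part["predicate"], part["object"])
            )
    return dict(dataset)
```

In this sketch each original triple turns into four triples in the flattened
graph, and sorting the output by predicate (the worst case raised in the
thread) still round-trips, because unreify() never relies on locality.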

Received on Friday, 8 April 2011 19:29:55 UTC