Re: RDF-ISSUE-5 (Graph Literals): Should we define Graph Literal datatypes? [RDF Graphs] from Steve Harris on 2011-03-05 (public-rdf-wg@w3.org from March 2011)

From: Steve Harris <steve.harris@garlik.com>
Date: Sat, 5 Mar 2011 09:03:17 +0000
To: Sandro Hawke <sandro@w3.org>
Cc: Pat Hayes <phayes@ihmc.us>, RDF Working Group WG <public-rdf-wg@w3.org>
Message-Id: <FFFCD071-C425-42DC-B55D-29E9E6F06AD9@garlik.com>
On 2011-03-05, at 05:17, Sandro Hawke wrote:
> On Fri, 2011-03-04 at 19:26 -0600, Pat Hayes wrote:
>> On Mar 4, 2011, at 3:59 PM, RDF Working Group Issue Tracker wrote:

snip

>> 1. This would allow such 'metaRDF' descriptions only for the case where the object graph - the one being described - is completely specified by its full textual representation. This would make such metaRDF almost unusable for large object graphs, and exceedingly awkward, at best, for all but toy object graphs. For any graph, the g-text is a much more verbose way to refer to it than a URI would be. 
> 
> I think there are lots of real apps that use many tiny (but not "toy")
> graphs of 1-50 triples, often where the graph represents an 'object' or
> a simple claim.    The data store might have a billion triples, but the
> granularity of the metadata is often per-triple or close to it.

That's true, and in fact Garlik has some stores like this, but SPARQL-style named graphs work fine for this use case.

I would say that if existing stores handle named graphs inefficiently, and I don't believe they do, that's an implementation issue.
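
To make the named-graph approach concrete, here is a rough sketch in plain Python of how per-graph metadata can sit alongside many tiny graphs in a quad layout. It is only an illustration: the prefixed names (ex:g1, ex:meta, dc:source and so on) and the helper functions are invented for this example, and it makes no claim about how 4store or any other store actually represents quads.

```python
from collections import defaultdict

# A quad is (subject, predicate, object, graph_name); terms are plain strings.
# Every name below is a hypothetical example, not data from a real store.
quads = [
    ("ex:alice", "foaf:name", '"Alice"', "ex:g1"),
    ("ex:alice", "foaf:knows", "ex:bob", "ex:g1"),
    ("ex:bob", "foaf:name", '"Bob"', "ex:g2"),
    # Metadata about the small graphs, stated by using their names as subjects.
    ("ex:g1", "dc:source", "ex:crawl-2011-03-01", "ex:meta"),
    ("ex:g2", "dc:source", "ex:user-upload", "ex:meta"),
]


def triples_by_graph(quads):
    """Group triples under the name of the graph that contains them."""
    graphs = defaultdict(set)
    for s, p, o, g in quads:
        graphs[g].add((s, p, o))
    return graphs


def describe(quads, graph_name, meta_graph="ex:meta"):
    """Return the metadata properties recorded about one named graph."""
    return {p: o for s, p, o, g in quads if g == meta_graph and s == graph_name}


graphs = triples_by_graph(quads)
```

Here the metadata refers to each small graph by its name, so the cost of describing a graph does not grow with the number of triples in it, which is the point about the URI being the more compact reference.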

> Yes, the URI is probably still smaller, but it presents its own
> problems, like worrying about the box contents changing.

You'll have to explain why that's an issue, I'm afraid.

> I don't think apps like this weigh in heavily for either design.   
> 
> (I certainly agree situations with very large graphs get 'exceedingly
> awkward' for graph literals.)

Well, with some support from query languages and syntaxes that is "just" an implementation issue, but I just don't see any advantage.

>> 2. The full textual representation of a graph does not, ironically, serve to "identify" it in the sense required. Suppose I publish some RDF in a box with a URI. The URI identifies the box, but it does not identify the graph. The very same graph might be a snapshot of a different box with a different provenance and history and authority claiming it to be true. It is the box, not the graph, which will be asserted or will have a history or be deprecated, etc. But a graph literal of a snap of a box does not identify the box. Even if we say that such a literal identifies any box whose snap is equivalent to the literal, the task of checking such equivalence is NP-complete (an old result of Jeremy's) so we have hamstrung our implementations ahead of time. And this is probably not a good rule to adopt, in any case, even if it were computationally cheap.
> 
> I think there are different use cases here.   Sometimes we want to
> reason about the box, sometimes we want to reason about a graph that
> might have been in certain boxes at certain times.   It's nice to be
> able to talk about both the boxes and the graphs.
> 
> I think you're proposing that whenever we want to talk about a known
> graph, we first put it in a box, and then talk about that box.   That's
> the technique, as I mentioned, that I've been using in my own code for
> many years.  It doesn't really need any new specs.  So, yes, it works,
> but I think it'd be somewhat clearer to be able to talk about the graphs
> themselves, sometimes.
> 
> (I can live with either answer to ISSUE-5; I just think it's something
> we'll need to think about and decide.)
> 
>> 3. It is completely unnecessary, if we have named graphs. A named graph has a name which refers to it and identifies its box. Most descriptive languages, including RDF, use names in this way to make assertions about the things named. AFAIKS, nothing is gained by making such a graph into a literal instead of simply using its name to refer to it. And this use of graph names requires no changes to any RDF syntax (or indeed semantics.)
> 
> Again, I'm all in favor of being explicit about boxes, sometimes giving
> them URI names, sometimes maybe referring to them using blank nodes.
> But I disagree that "nothing is gained by making such a graph into a
> literal instead of simply using its name to refer to it."   You mean
> instead of simply using the name of a box that currently happens to
> contain it?  What if that box content changes?  How do you find out what
> URI names which graph?   These issues can be addressed, but in some
> cases, particularly when you're working with boxes whose contents change
> rapidly (current stock price) or continually (current temperature), I
> think it's probably better to also have an explicit notion of the graphs
> themselves.

bGraph identifiers would perhaps be convenient where someone doesn't want to go to the effort of assigning a name to the graph, but bNodes don't seem exactly popular with some portion of the community.

3store supported bNodes-as-graph-identifiers, but neither 4store nor 5store does. No one has complained about that, as far as I can recall.

- Steve

-- 
Steve Harris, CTO, Garlik Limited
1-3 Halford Road, Richmond, TW10 6AW, UK
+44 20 8439 8203  http://www.garlik.com/
Registered in England and Wales 535 7233 VAT # 849 0517 11
Registered office: Thames House, Portsmouth Road, Esher, Surrey, KT10 9AD
Received on Saturday, 5 March 2011 09:03:54 UTC