Re: A rant about the terminology debate from Charles Greer on 2012-08-24 (public-rdf-wg@w3.org from August 2012)

From: Charles Greer <cgreer@marklogic.com>
Date: Fri, 24 Aug 2012 10:06:44 -0700
To: Sandro Hawke <sandro@w3.org>
CC: Kingsley Idehen <kidehen@openlinksw.com>, "public-rdf-wg@w3.org" <public-rdf-wg@w3.org>
Message-ID: <5037B4A4.1010303@marklogic.com>
On 08/24/2012 09:09 AM, Sandro Hawke wrote:
> On 08/24/2012 07:32 AM, Kingsley Idehen wrote:
>> On 8/24/12 7:10 AM, Sandro Hawke wrote:
>>> On 08/24/2012 05:31 AM, Richard Cyganiak wrote:
>>>> Sandro,
>>>>
>>>> On 24 Aug 2012, at 03:53, Sandro Hawke wrote:
>>>>> People will need some way linguistically to distinguish X from Y
>>>>> People will instinctively say X to clarify they mean Y
>>>>> People rarely mean to only be talking about X, and when they do,
>>>>> they should put the word Y in there
>>>>> People should be able to use X when they want to clarify they're
>>>>> talking about Y
>>>> That kind of argument is irrelevant. People talk the way they talk.
>>>> They won't pay attention to what we write anyways for the most part.
>>>> We're not in the business of helping them to express themselves.
>>>>
>>>> We are in the business of extending an existing technical
>>>> specification with clear definitions of a few additional concepts in
>>>> order to promote consistency between W3C specifications and to
>>>> promote interoperability between implementations that already use
>>>> these concepts in some form.
>>>>
>>>> We'll never get anywhere with this discussion if we entertain the
>>>> crazy notion that we can magically solve all uncertainty and
>>>> enlighten the RDF community with a Great Global Renaming.
>>> That's a strawman.  I'm not saying names will magically solve
>>> anything.  It sounds like our disagreement is a matter of degree. I
>>> believe our choice of terms will affect the quality & uptake of our
>>> specs by X%, while you believe it is a smaller amount Y%. Perhaps
>>> X=30 and Y=10.
>>>
>>> This is complicated by the fact that names interact with mental
>>> models, so sometimes when we're discussing names, I think it turns
>>> out to be a proxy for mental models.     And as we change our models,
>>> our opinions on the best names are likely to change.
>>>
>>> So, let's put this back in the box for now, and hopefully when we've
>>> got the model solved, we can spend one telecon talking over proposals
>>> and make a final decision.   (I suppose we should give it an ISSUE
>>> number, or broaden ISSUE-14 to include this.)
>>>
>>> I updated http://www.w3.org/2011/rdf-wg/wiki/Graph_Terminology and made
>>> http://www.w3.org/2011/rdf-wg/wiki/Graph_Terminology/Options and I'm
>>> quite happy to let the matter drop for as long as possible.
>>>
>>>        -- Sandro
>> Sandro,
>>
>> How would you map g-box, g-snap, and g-text in formal relational DBMS
>> terminology? Such a mapping would help many. Basically, mapping to
>> relations, sets of tuples, and notation.
>>
> I'm not really fluent in RDBMS theory terminology.   I do know the
> terminology database app developers use, though, I think -- the kind of
> stuff you find in the Oracle or MySQL manuals (talking about "tables"
> instead of "relations").   In that terminology, I'd say:
>
>     g-box: table (or view)
>     g-text: dump of a table (or view)
>     g-snap: not something one normally deals with; either:
>         - a state of a table; or
>         - a value which is the set of all the rows in a table.
>
> This is more of an analogy than a real correspondence, since a table row
> is not the same thing as an RDF triple, in general.    (You could make a
> Subject/Property/Value table, but the data typing of the value wouldn't
> work right, in general.)
>
>       -- Sandro
Analogies do help.  Using CJ Date's terminology for relational things 
you might say

g-box: relation variable (relvar)
g-snap: relation value

This led me to consider -
g-snap -- a value
name + g-snap is a variable with state
name + g-box is a named lambda expression.

The two kinds of naming look as though they depend on the context 
you're working with (assigning state to variables, vs. building 
expressions for evaluation).
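
To make the analogy concrete, here is a rough sketch in Python. This is only my reading of the mapping, not anything from the specs: the names Triple, GSnap, GBox, named_snaps and named_boxes are made up for illustration, and a zero-argument callable stands in for the "named lambda expression".

```python
from typing import Callable, Dict, FrozenSet, Tuple

Triple = Tuple[str, str, str]
GSnap = FrozenSet[Triple]  # g-snap: an immutable set of triples, i.e. a value


class GBox:
    """g-box (hypothetical model): a container whose current g-snap can change."""

    def __init__(self, state: GSnap = frozenset()) -> None:
        self._state = state

    def snapshot(self) -> GSnap:
        # Evaluating the box yields its current value.
        return self._state

    def add(self, triple: Triple) -> None:
        # Rebind to a new value; the previous g-snap is never mutated.
        self._state = self._state | {triple}


t1 = ("ex:a", "ex:knows", "ex:b")
t2 = ("ex:b", "ex:knows", "ex:c")

# name + g-snap: a name bound directly to a value.
named_snaps: Dict[str, GSnap] = {"ex:g1": frozenset({t1})}

# name + g-box: a name bound to something you evaluate to get a value.
box = GBox(frozenset({t1}))
named_boxes: Dict[str, Callable[[], GSnap]] = {"ex:g1": box.snapshot}

before = named_boxes["ex:g1"]()  # value at first evaluation
box.add(t2)
after = named_boxes["ex:g1"]()   # same name, different value later
```

The value held in `before` does not change after the update; only what the name evaluates to changes. That is the distinction I am trying to get at.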

I wonder if there's a distinction we could leverage between "RDF Graph" 
and "Linked Data Graph"?

Charles

>> Kingsley
>>>> Best,
>>>> Richard
>>>>
>>>>
>>>> On 24 Aug 2012, at 03:53, Sandro Hawke wrote:
>>>>
>>>>> On 08/23/2012 11:22 AM, Richard Cyganiak wrote:
>>>>>> On 23 Aug 2012, at 16:00, Sandro Hawke wrote:
>>>>>>>> You proposed to redefine "graph" by splitting it into two
>>>>>>>> separate concepts, a mutable and an immutable one.
>>>>>>>>
>>>>>>>> I propose to instead redefine "named graph" in the same way, by
>>>>>>>> splitting it into two separate concepts, a mutable and immutable
>>>>>>>> one.
>>>>>>> You lost me here, sorry.   What's the use case for an immutable
>>>>>>> named graph?
>>>>>> I guess I should have said "abstract named graph", sorry if that
>>>>>> caused confusion. Abstract IRI-graph-pairs. The thing that SPARQL
>>>>>> queries operate over.
>>>>>>
>>>>>>> And it sounds like you're suggesting "mutable named graph" as the
>>>>>>> official term for g-box.  Is that right?
>>>>>> Almost. My definition of "mutable named graph" would be:
>>>>>>
>>>>>> "A *mutable named graph* is a resource, denoted by an IRI, that
>>>>>> has a mutable association with an (abstract, immutable) RDF graph.
>>>>>> The RDF graph is also known as the *state* of the mutable named
>>>>>> graph."
>>>>>>
>>>>>> The key points are:
>>>>>>
>>>>>> 1) we insist that it is a resource, so the kind of thing denoted
>>>>>> by IRIs
>>>>>> 2) we insist that it is actually denoted by some IRI
>>>>>> 3) it essentially has a mutable slot that contains an RDF graph
>>>>>>
>>>>>> This means it can cover both the terms "RDF space/g-box" and the
>>>>>> term "(name, slot) pair" from the diagram in [1].
>>>>>>
>>>>>> I repeat my assertion that there is no need to ever talk about
>>>>>> unnamed g-boxes.
>>>>> Yeah, this makes sense, but it's not my first or second choice in
>>>>> naming proposals.   Probably wouldn't help to go into why/why not
>>>>> at this point.
>>>>>
>>>>>>>>> I think the key elements are : (1) we stop using "RDF Graph" as
>>>>>>>>> the
>>>>>>>>> canonical, precise term for a g-snap;
>>>>>>>> I disagree; "RDF graph" is a perfectly fine term.
>>>>>>> I wish.   I can live with it, but I think it's hardly "fine".
>>>>>>> People use it wrong all the time; they say "RDF graph" and mean a
>>>>>>> mutable and/or distinct set of RDF triples.
>>>>>> I think by actually defining proper terms for these other things,
>>>>>> and by clarifying that "graph" can mean "any of the above", we
>>>>>> make a solid step towards improving the situation.
>>>>> I think you're saying "RDF graph"==g-snap; "graph"=g-snap/or/g-box.
>>>>>
>>>>> I have a problem with this.  I think people will need some way
>>>>> linguistically to distinguish "graph" in the RDF world from "graph"
>>>>> in the wider world, and the natural way to do that is to add the
>>>>> modifier, "RDF".   So people will instinctively say "RDF graph" to
>>>>> clarify they mean "graph" in the RDF sense (not a bar chart or
>>>>> something).    But with your proposal, they've now accidentally
>>>>> changed to talking about g-snaps.
>>>>>
>>>>> I think people rarely mean to only be talking about g-snaps, and
>>>>> when they do, they can/should put the word "abstract" in there.   I
>>>>> also think the presence or absence of the modifier "RDF" shouldn't
>>>>> affect the semantics of the term -- people should be able to use it
>>>>> when they want to clarify they're talking about RDF, without it
>>>>> otherwise affecting the meaning.
>>>>>
>>>>>     -- Sandro
>>>>>
>>>>>> Best,
>>>>>> Richard
>>>>>>
>>>>>> [1] http://www.w3.org/2012/08/RDFNG.html#fig1
>>>>>>
>>>>>>
>>>>>>
>>>>>>> I'm not saying we have to solve this problem, or that we can, but
>>>>>>> I think it would be helpful if we could and I think this proposal
>>>>>>> is our best bet.
>>>>>>>
>>>>>>>        -- Sandro
>>>>>>>
>>>>>>>> But we can stress that "RDF graph" is an abstract, unnamed,
>>>>>>>> immutable graph, and that when we talk about "graphs" in general
>>>>>>>> then we may sometimes mean named ones that may or may not be
>>>>>>>> mutable.
>>>>>>>>
>>>>>>>>> (2) we pick terms for g-box and g-snap that convey the idea
>>>>>>>>> that they are two different kinds of "graphs";
>>>>>>>> I disagree; I believe that there is never any need to talk about
>>>>>>>> *unnamed* g-boxes; all the g-boxes we want to talk about are
>>>>>>>> named. Therefore, a term like "mutable named graph" is
>>>>>>>> sufficient to say all that needs to be said about g-boxes.
>>>>>>>>
>>>>>>>>> (3) we use "graph" if/when we don't mind being ambiguous about
>>>>>>>>> g-box/g-snap.
>>>>>>>> I'd rephrase that: We can use "graph" if/when we don't mind
>>>>>>>> being ambiguous about
>>>>>>>> g-snap/abstract-named-graph/mutable-named-graph. For example
>>>>>>>> when we say, "SPARQL Update can be used to copy data from one
>>>>>>>> graph to another". In that case we mean mutable-named-graph.
>>>>>>>>
>>>>>>>>> On your details....  let me start with:  to you, can you have a
>>>>>>>>> named
>>>>>>>>> graph that's not in a dataset (or graph store)?
>>>>>>>> As defined in SPARQL (named graph == IRI-graph-pair), no.
>>>>>>>>
>>>>>>>> But if we allow a term such as "mutable named graph", then yes.
>>>>>>>> A Turtle document on the Web is a "mutable named graph", in that
>>>>>>>> sense. It doesn't have to be in any particular dataset. Well,
>>>>>>>> it's in the Web, and for me it makes sense to speak of the
>>>>>>>> entire web as a "mutable RDF dataset".
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Richard
>>>>>>>>
>>>>>>>>
>>>>>>>>> I don't usually hear the term used outside SPARQL, so I don't
>>>>>>>>> have much of an ear for that usage.
>>>>>>>>        -- Sandro
>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>> Richard
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 22 Aug 2012, at 18:07, Sandro Hawke wrote:
>>>>>>>>>
>>>>>>>>>> On 08/21/2012 03:33 AM, Andy Seaborne wrote:
>>>>>>>>>>> On 20/08/12 16:30, Sandro Hawke wrote:
>>>>>>>>>>>> If it wouldn't cause SPARQL too many problems, I'd suggest
>>>>>>>>>>>> we should do
>>>>>>>>>>>> the same with dataset, and even allow a dataset to be a kind
>>>>>>>>>>>> of graph, I
>>>>>>>>>>>> think, so that the world at large can use the term "RDF
>>>>>>>>>>>> dataset"
>>>>>>>>>>>> for any collection of RDF data (whether or not it's
>>>>>>>>>>>> segmented into named
>>>>>>>>>>>> graphs).
>>>>>>>>>>> That would be problematic.  "RDF Dataset" is a specifically
>>>>>>>>>>> defined term.  "Dataset" we can be loose about (c.f. VoiD) ;
>>>>>>>>>>> "RDF Dataset" is stressing the tie to a particular
>>>>>>>>>>> definition. You might as well mix properties and triples if
>>>>>>>>>>> you're going to mix things of different "shape".
>>>>>>>>>> In the telecon, I mentioned on irc the term "bacronym" but
>>>>>>>>>> what I meant was "retronym". These are terms like "cow milk"
>>>>>>>>>> that arise once some term ("milk") becomes ambiguous (eg
>>>>>>>>>> because of soy milk, almond milk, rice milk, etc).  See
>>>>>>>>>>
>>>>>>>>>> I take the "radical proposal" to be the recognition that some
>>>>>>>>>> terms are ambiguous and we need to make retronyms to
>>>>>>>>>> disambiguate them.
>>>>>>>>>>
>>>>>>>>>> Here's a revised proposal:
>>>>>>>>>>
>>>>>>>>>>     - We pick terms like "Abstract RDF Graph" (gsnap) and
>>>>>>>>>> "Maintained RDF Graph" (gbox) that fit the retronym model.
>>>>>>>>>> It makes it easy, when someone says "graph" or "RDF Graph", to
>>>>>>>>>> think/ask, "do you mean abstract or maintained?"     (I don't
>>>>>>>>>> find these terms quite as ontologically comfortable as g-snap
>>>>>>>>>> and g-box/space/data-source, because it makes them both be
>>>>>>>>>> subclasses of "graph", but I think this approach  works better
>>>>>>>>>> for the community.)
>>>>>>>>>>
>>>>>>>>>>     - We clarify that in all W3C specs to date, "RDF Graph"
>>>>>>>>>> means "Abstract RDF Graph"
>>>>>>>>>>
>>>>>>>>>>     - Going forward, we avoid using the term "RDF Graph", using
>>>>>>>>>> either Abstract Graph or Maintained Graph  (with or without
>>>>>>>>>> "RDF" in there).   Or just "graph" when we don't care which kind.
>>>>>>>>>>
>>>>>>>>>> I think that much of the confusion around the term "named
>>>>>>>>>> graph" comes from a lack of clarity around whether what is
>>>>>>>>>> meant is a "named abstract graph" or a "named maintained
>>>>>>>>>> graph".   I think the latter is much more common; the
>>>>>>>>>> difference doesn't manifest in SPARQL 1.0 because it doesn't
>>>>>>>>>> consider the idea of data changing. In my mind, this proposal
>>>>>>>>>> is our best chance for being able to coherently keep using the
>>>>>>>>>> term "named graph", which seems to be very popular.
>>>>>>>>>>
>>>>>>>>>> BTW, I think we might also want to define "Frozen" graph,
>>>>>>>>>> which is a maintained graph in the sense that it exists in a
>>>>>>>>>> computer's storage, but which is required to never change.
>>>>>>>>>> This is, I think, mostly what PROV wants to use.
>>>>>>>>>>
>>>>>>>>>>       -- Sandro
>>>>>>>>>>
>>>
>>>
>>>
>>
>


-- 
Charles Greer
Senior Engineer
MarkLogic Corporation
charles.greer@marklogic.com
Phone: +1 707 408 3277
www.marklogic.com

Received on Friday, 24 August 2012 17:07:15 UTC