Re: naming dataset syntax from Kingsley Idehen on 2012-09-26 (public-rdf-wg@w3.org from September 2012)

From: Kingsley Idehen <kidehen@openlinksw.com>
Date: Wed, 26 Sep 2012 18:34:10 -0400
To: public-rdf-wg@w3.org
Message-ID: <506382E2.3060407@openlinksw.com>
On 9/26/12 6:03 PM, David Wood wrote:
> Hi Arnaud,
>
> On Sep 26, 2012, at 17:13, Arnaud Le Hors <lehors@us.ibm.com> wrote:
>
>> I hear you David, and I'm not saying we should try and do everything 
>> with one format either but we should try and find the right balance. 
>
> Yes, I agree of course.
>
>> The current trend of creating a new format for every use case worries 
>> me. Even if you want to see it as a feature rather than a bug there 
>> is something to be said about feature bloat. :-)
>>
>> I think we could do without RDF/XML and hope it eventually goes away 
>> once we have Turtle. I think it has done more harm than good.
>
> Yes, I used to think that way too, until I spent some time with 
> several large enterprises recently.  They exclusively use RDF/XML 
> because they can create it via XSLT from their many systems that 
> produce XML output.  We are now using it in Callimachus for exactly 
> the same reason.
>
> Now, I personally only write Turtle whenever I can (which is 
> generally), but I doubt that RDF/XML will be going away soon.  XML is 
> the more mature technology base and is widely deployed.

-1

RDF/XML has done more harm than good, and that's from someone who has 
overseen the design, development, and implementation of 100+ XSLT-based 
data converters using RDF/XML. It should never have held a marquee 
position re. RDF, and keeping it there for 12 years is something I would 
desperately like to forget actually happened -- bearing in mind how long 
Turtle has been around.

Let's leave RDF/XML and its baggage in the backroom where it belongs :-)

>
>> I think there is a good reason for having RDFa and JSON-LD which are 
>> very different from Turtle. I can't say the same about TriG vs Turtle.
>
> Perhaps it would be better to think about our TriG efforts as 
> experimental, or more leading edge.

TriG is just an optional syntax that will mostly be used by those who use 
Turtle. It does address the vital issue of partitioning and packaging 
triples scoped to specific named graphs into a single RDF document. As 
folks start using Turtle en masse (which is already in full swing), they 
will naturally desire and then seek the aforementioned capability.
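
As a rough illustration of that partitioning (a minimal sketch only, not
taken from any WG draft): the Python below groups quads by graph name and
writes each named graph as a "<name> { ... }" block, with default-graph
triples left as bare Turtle statements. The function name to_trig, the
(s, p, o, g) tuple layout, and the choice to leave the default graph
unbraced are assumptions made for the example; the WG has not settled
those syntax questions.

```python
from collections import defaultdict


def to_trig(quads):
    """Group (s, p, o, g) tuples by graph name and emit TriG-like text.

    g=None stands for the default graph. Terms are assumed to be
    already serialized, e.g. "<http://example.org/a>".
    """
    graphs = defaultdict(list)
    for s, p, o, g in quads:
        graphs[g].append((s, p, o))

    lines = []
    # Default graph first, written as bare Turtle triples (an assumption).
    for s, p, o in graphs.pop(None, []):
        lines.append(f"{s} {p} {o} .")
    # Then each named graph as a "<name> { ... }" block, sorted by name.
    for name in sorted(graphs):
        lines.append(f"{name} {{")
        for s, p, o in graphs[name]:
            lines.append(f"    {s} {p} {o} .")
        lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_trig([
        ("<s1>", "<p>", "<o1>", None),
        ("<s2>", "<p>", "<o2>", "<g1>"),
    ]))
```

A plain Turtle document is just the case where every quad has g=None, so
simple usage stays simple.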


> We have so many different ways to deal with groupings of RDF 
> statements and they are all implementation-specific.  It is useful to 
> look back at our charter, which says:
> [[
> The RDF Community has used the term “named graphs” for a number of 
> years in various settings, but this term is ambiguous, and often 
> refers to what could rather be referred as quoted graphs, graph 
> literals, URIs for graphs, knowledge bases, graph stores, etc. The 
> term “Support for Multiple Graphs and Graph Stores” is used as a 
> neutral term in this charter; this term is not and should not be 
> considered as definitive. The Working Group will have to define the 
> right term(s).
> ]]
> …and requires us to "Standardize a model and semantics for multiple 
> graphs and graphs stores".
>
> We chose last year to split the problem into two so we could at least 
> make progress on standardizing Turtle while we arg^H^H^Hdiscussed 
> named graphs.  Thus, we ended up with a separate format.
>
> There are also many who wish to keep Turtle small and simple for the 
> sake of standardizing something that already works well.  I sympathize 
> with them, but that is beside the point.  I think you would have few 
> backers if you wanted to fold multiple graphs into Turtle.  Go ahead 
> and ask if you like.  That might be quicker than finding the right 
> threads in our already bloated email archive :/

The important thing here is to keep RDF "horses for courses" compliant 
re. syntaxes. We can't eradicate audience diversity, so multiple 
syntaxes for a common data model align well with the said model's dexterity :-)


Kingsley
>
> Regards,
> Dave
>
>>
>> When users ask where to start, it's never best to have to answer 
>> "it depends" or something equivalent.
>> --
>> Arnaud  Le Hors - Software Standards Architect - IBM Software Group
>>
>>
>>
>>
>> From: David Wood <david@3roundstones.com>
>> To: Arnaud Le Hors/Cupertino/IBM@IBMUS,
>> Cc: W3C RDF WG <public-rdf-wg@w3.org>
>> Date: 09/26/2012 12:22 PM
>> Subject: Re: naming dataset syntax
>> ------------------------------------------------------------------------
>>
>>
>>
>> Hi Arnaud,
>>
>> I appreciate your need for marketing simplicity.  However, please 
>> consider this:
>>
>> RDF used to have one standard format (RDF/XML) which was, as you say, 
>> overly complicated for many potential users.  Now we have two 
>> standard formats (RDF/XML and RDFa).  Those serve very different 
>> communities (enterprise XML developers and some Web developers).  We 
>> are now in the process of defining either two additional standard 
>> formats (Turtle and JSON-LD) or three (if we add TriG).  Again, the 
>> potential users of those formats are different, but in each case we 
>> can parse the formats as RDF.
>>
>> To my mind, that is a feature, not a bug.  We do not need to explain 
>> each format to all users.  Instead, we need to figure out which kind 
>> of user is in front of us and tell them about the format that most 
>> closely suits their needs.
>>
>> Regards,
>> Dave
>>
>>
>>
>>
>> On Sep 26, 2012, at 15:12, Arnaud Le Hors <lehors@us.ibm.com> wrote:
>>
>> I realize this group is more interested in technical purity than 
>> marketing and that from a technical point of view using two different 
>> formats and names can be totally justified but I'd like to ask 
>> everyone to think about the bigger picture here.
>>
>> RDF is already plagued with the image of being an overly complicated 
>> technology and this is hindering its uptake in the industry. We 
>> really don't want to make things worse by introducing a bunch of new 
>> formats and names.
>>
>> In a private email Andy wrote to me:
>>
>> > A collection of graphs isn't itself a graph.
>> >
>> > A syntax for a collection of graphs isn't a syntax for a graph.
>>
>> This certainly makes perfect sense and is very simply put. As an 
>> engineer I can certainly appreciate the difference but as someone 
>> interested in helping adoption of RDF in the industry I just don't 
>> think this is worth introducing a whole new format and name.
>>
>> Turtle is providing us with something everyone can understand (unlike 
>> RDF/XML) and the name has been out there for a while now. We should 
>> try to build on that rather than start confusing things (again) with 
>> the introduction of multiple formats.
>>
>> Could we not simply have two different versions of Turtle with a way 
>> for programs to differentiate the two so that we can still only talk 
>> about Turtle?
>>
>> Regards.
>> --
>> Arnaud  Le Hors - Software Standards Architect - IBM Software Group
>>
>>
>> Sandro Hawke <sandro@w3.org> wrote on 09/26/2012 11:18:34 AM:
>>
>> > From: Sandro Hawke <sandro@w3.org>
>> > To: David Wood <david@3roundstones.com>,
>> > Cc: Arnaud Le Hors/Cupertino/IBM@IBMUS, W3C RDF WG <public-rdf-wg@w3.org>
>> > Date: 09/26/2012 11:19 AM
>> > Subject: naming dataset syntax
>> >
>> > On 09/26/2012 01:58 PM, David Wood wrote:
>> > Hi Arnaud,
>> >
>> > We agreed quite early (Feb 2011) to "use http://www.w3.org/2010/01/Turtle/
>> > as the starting point for the Turtle work" [1] and in April 2011 to
>> > limit syntactic sugar additions to Turtle [2].
>> >
>> > IIRC, we had substantial conversations regarding the desirability of
>> > turning Turtle into a quad language, but we decided (without
>> > resolution) not to do that because:
>> > - Turtle is widely fielded already
>> > - We wished to minimize disruption, as per our charter
>> > - Issues around datasets/quads were (and are) less agreed upon
>> >
>> >
>> > Yes, we agreed to get Turtle out the door as a language for Triples.
>> >
>> > So, now, what do we call a language that's like Turtle except it can
>> > also include datasets (that is, the triples can be segmented into
>> > named sections)?
>> >
>> > Frankly I expect this language to supplant Turtle as soon as it is
>> > well supported, as long as it doesn't do anything to exclude simple
>> > usage.   I think the kind of people who use Turtle (or RDF) are the
>> > kind of people who will want to segment and manage their data. But
>> > (1) I could be wrong, and (2) it may be a long time before it is
>> > well-supported, given how confused we are about it within the WG.
>> >
>> > So, myself, I'm split about what to call it.  Compared to me,
>> > however, the WG tends to lean more toward existing users and
>> > experts, over new users and non-experts, so I expect the WG to just
>> > go with "trig" unless someone makes a strong case for something else.
>> >
>> > (In my prototype coding, I called the hypothetical trig-like
>> > language "mugl", for MultiGraphLanguage.    If we start from a blank
>> > slate, we can probably do better than mugl or trig.)
>> >
>> >        -- Sandro
>> >
>> >
>>
>> > Regards,
>> > Dave
>> >
>> > [1] http://www.w3.org/2011/rdf-wg/meeting/2011-02-23#resolution_1
>> > [2] http://www.w3.org/2011/rdf-wg/track/issues/34
>>
>> >
>> > On Sep 26, 2012, at 12:42, Arnaud Le Hors <lehors@us.ibm.com> wrote:
>> >
>> > Hi Sandro,
>> >
>> > This discussion had already started when I joined the WG and as I
>> > caught it midstream I thought it was about extending Turtle. I've
>> > since then realized that this wasn't the intent and everybody seems
>> > to agree with that but I must admit that I still don't know why.
>> > Could you please explain or point me to some reference I could read
>> > to catch up on that?
>> >
>> > I have to say that the proliferation of formats for RDF makes me a
>> > bit nervous. This doesn't go along with making RDF simpler for the
>> > masses/industry and facilitating adoption.
>> >
>> > Thanks.
>> > --
>> > Arnaud  Le Hors - Software Standards Architect - IBM Software Group
>> >
>> >
>> > Sandro Hawke <sandro@w3.org> wrote on 09/25/2012 04:14:25 PM:
>> >
>> > > From: Sandro Hawke <sandro@w3.org>
>> > > To: W3C RDF WG <public-rdf-wg@w3.org>,
>> > > Date: 09/25/2012 04:14 PM
>> > > Subject: Dataset Syntax - checking for consensus
>> > >
>> > > I'm not sure how much progress we'll be able to make on dataset
>> > > semantics tomorrow, so I thought I'd draft some proposals on dataset
>> > > syntax.   The chairs can put this on the agenda if they like (but it's
>> > > too short notice for these decisions to be binding yet).  I'm thinking
>> > > it would be useful to see how close we are to agreement on these issues.
>> > >
>> > > If you follow up with votes, please use -1 for Formal Objection, 0 for
>> > > abstain, +1 for approve.   Numbers in between are fine, too.
>> > >
>> > > PROPOSED: We will produce a W3C Recommendation for a dataset syntax,
>> > > similar to TriG and to SPARQL's named graph syntax.
>> > >
>> > > PROPOSED: We'll request a media-type for this syntax which is different
>> > > from the media-type for Turtle.  (That is, we will not consider this
>> > > language to supplant Turtle and take over the name, becoming the new
>> > > "Turtle", as was once proposed.)
>> > >
>> > > PROPOSED: Our dataset syntax will allow for the expression of empty
>> > > named graphs, whatever their semantics might be (to be decided). The
>> > > syntax is an empty curly-braces expression, as in "<g> { }".
>> > >
>> > > PROPOSED: Our dataset syntax will have some standard mechanism (to be
>> > > determined within the next few weeks) through which a Dataset
>> > > serialization can include some RDF data about the Dataset (that is, some
>> > > metadata in the form of an RDF graph).
>> > >
>> > >
>> > > Below, there are groups of proposals which are alternative solutions to
>> > > a design issue.   If you approve of more than one of the alternatives,
>> > > please vote "+2" for your favorite.
>> > >
>> > > * Name of the dataset syntax
>> > >
>> > > PROPOSED: We will call our recommended dataset syntax "trig",
>> > > capitalized to Trig as needed.
>> > > PROPOSED: We will call our recommended dataset syntax "TriG", but
>> > > informally and in the media type, "trig".
>> > > PROPOSED: We will call our recommended dataset syntax "TriG", and use
>> > > that capitalization everywhere.
>> > >
>> > > * Use of equals sign, like <g> = { <s> <p> <o> } .  This is not in
>> > > SPARQL but is in traditional TriG, for compatibility with N3.
>> > >
>> > > PROPOSED: In our dataset syntax, a "=" MAY appear between the name and
>> > > the graph.
>> > > PROPOSED: In our dataset syntax, a "=" MUST appear between the name and
>> > > the graph.
>> > > PROPOSED: In our dataset syntax, a "=" MUST NOT appear between the name
>> > > and the graph.
>> > >
>> > > * Use of the "graph" keyword, which MUST be used in SPARQL and MUST NOT
>> > > be used in traditional TriG.
>> > >
>> > > PROPOSED: In our dataset syntax, the case-insensitive keyword "graph"
>> > > MAY appear before the name, in a name-graph pair.
>> > > PROPOSED: In our dataset syntax, the case-insensitive keyword "graph"
>> > > MUST appear before the name, in a name-graph pair.
>> > > PROPOSED: In our dataset syntax, the case-insensitive keyword "graph"
>> > > MUST NOT appear before the name, in a name-graph pair.
>> > >
>> > > * Use of curly braces { <a> <b> <c> } around the default graphs.  They
>> > > MUST be used in traditional TriG, and MUST NOT be used in SPARQL.
>> > >
>> > > PROPOSED: In our dataset syntax, triples of the dataset's default graph
>> > > MAY be surrounded by curly braces.
>> > > PROPOSED: In our dataset syntax, triples of the dataset's default graph
>> > > MUST be surrounded by curly braces.
>> > > PROPOSED: In our dataset syntax, triples of the dataset's default graph
>> > > MUST NOT be surrounded by curly braces.
>> > >
>> > > * Some designs for carrying metadata
>> > >
>> > > PROPOSED: In our dataset syntax, we'll say that metadata goes in the
>> > > default graph
>> > > PROPOSED: In our dataset syntax, we'll say that the default graph goes
>> > > inside curly braces and the metadata goes outside curly braces
>> > > PROPOSED: In our dataset syntax, we'll say that metadata goes inside a
>> > > set of curly braces after a keyword "meta".
>> > > PROPOSED: In our dataset syntax, we'll have a keyword "meta" followed by
>> > > "default" or the name of a named graph, to indicate to readers where the
>> > > metadata is.
>> > >
>> > >
>>
>
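
For reference, the name-graph header variants in Sandro's quoted
proposals (an optional case-insensitive "graph" keyword before the name,
an optional "=" before the opening brace) can be told apart with a small
recognizer. This is a hypothetical sketch, not a TriG parser: the names
HEADER and parse_header are invented here, and graph names are assumed
to be simple <...> IRI references.

```python
import re

# Matches the start of a name-graph pair under the most permissive
# ("MAY") reading of the proposals.
HEADER = re.compile(
    r"""
    ^\s*
    (?:(?P<kw>graph)\s+)?     # optional keyword, as in SPARQL
    (?P<name><[^<>\s]*>)      # graph name as an IRI reference
    \s*(?P<eq>=)?\s*          # optional equals sign, as in traditional TriG
    \{                        # opening brace of the graph block
    """,
    re.IGNORECASE | re.VERBOSE,
)


def parse_header(line):
    """Return which optional tokens a header uses, or None if no match."""
    m = HEADER.match(line)
    if m is None:
        return None
    return {
        "name": m.group("name"),
        "keyword": m.group("kw") is not None,
        "equals": m.group("eq") is not None,
    }


if __name__ == "__main__":
    for text in ("<g> { }", "GRAPH <g> { <s> <p> <o> }", "<g> = { <s> <p> <o> } ."):
        print(text, "->", parse_header(text))
```

Tightening a "MAY" to "MUST" or "MUST NOT" then amounts to checking the
"keyword" or "equals" flag after the match.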


-- 

Regards,

Kingsley Idehen	
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca handle: @kidehen
Google+ Profile: https://plus.google.com/112399767740508618350/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen
Received on Wednesday, 26 September 2012 22:34:36 UTC