Re: RDF Data Shapes WG agenda for 19 May 2016 from Dimitris Kontokostas on 2016-05-19 (public-data-shapes-wg@w3.org from May 2016)

From: Dimitris Kontokostas <kontokostas@informatik.uni-leipzig.de>
Date: Thu, 19 May 2016 09:09:18 +0300
To: "Peter F. Patel-Schneider" <pfpschneider@gmail.com>
Cc: Arnaud Le Hors <lehors@us.ibm.com>, public-data-shapes-wg <public-data-shapes-wg@w3.org>
Message-ID: <CA+u4+a23X=+TYiYmkQy1Jf=DCBURXjgbVWwTN6f1wvZ4eD6Jeg@mail.gmail.com>
On Thu, May 19, 2016 at 1:15 AM, Peter F. Patel-Schneider <pfpschneider@gmail.com> wrote:

> Here is a proposed partial reply to Tom Baker.  It depends on some changes
> that have not been done to the SHACL specification.
>
> peter
>
> Date: Sun, 1 May 2016 16:40:21 +0200
> From: Thomas Baker <tom@tombaker.org>
> To: RDF Shapes <public-rdf-shapes@w3.org>
> >
> > Comments on
> >
> > Shapes Constraint Language (SHACL)
> > Editors Draft 29 April 2016
> > http://w3c.github.io/data-shapes/shacl/
> >
> > Some context: I have followed this activity since participating in the
> > workshop on RDF validation in 2013 [1].  The activity seemed like it might
> > achieve the goals pursued a decade ago with the DCMI Working Draft,
> > Description Set Profile Constraint Language [2].  I have tried to keep up
> > with the excellent work by Karen Coyle, Antoine Isaac, Hugo Manguinhas,
> > Thomas Hartmann, and others on comparing the emerging SHACL specification
> > to requirements that have accumulated over the years in the Dublin Core
> > community.
> >
> > There is a lot to like in SHACL but I must confess that each time I tried
> > to actually read the specification I found myself getting stuck at the
> > same places.  I'd set it aside, assuming that the issues would shake out.
> > Many months later, however, I find the same sticking points, unchanged.
> > This time I pressed on through the introduction to Section 2.1.
> >
> > These comments convey my thoughts while reading the text and end with
> > some suggestions.  I have made no effort to catch up on discussion in the
> > relevant mailing lists [4,5], so please forgive me if I simply cover
> > issues here that are already well-understood.
> >
> > Abstract
> >
> >   First sentence (also first sentence of Introduction):
> >
> >     "SHACL is a language for describing and constraining the contents of
> >     RDF graphs"
> >
> >   So I ask myself: If an RDF graph is an immutable set of triples, in
> >   what sense can it be "constrained"?  If an RDF graph is a description
> >   with a meaning determined by RDF semantics, what does it mean for that
> >   _description_ to be "described"?  Surely SHACL is not meant to somehow
> >   limit the RDF-semantic meaning of an RDF graph, which would make no
> >   sense, but then what does "constraining" mean?  Surely the
> >   specification of a "constraint language" should start by defining
> >   "constraint".
>
> 1: Change: Replace "constraining" by "validating" wherever possible in the
> document.
>

One problem with this is that the C in SHACL stands for Constraint, and if
we eliminate the term "constraint" from the document the name becomes
irrelevant.  Especially in the abstract we should, imho, have all the SHACL
terms (Shapes, Constraint, Language).


>
> 2: Change in abstract and introduction: SHACL is a language for validating
> whether RDF graphs meet certain conditions.
>
> >   Further on, one finds that the "constraint language" actually has
> >   nothing to do with somehow constraining RDF graphs and everything to do
> >   with describing an instance of the class "shape", which can be used with
> >   a process for determining whether a given RDF graph conforms to the set
> >   of constraints described in that shape ("validation").  In the Abstract,
> >   however, validation is mentioned only in passing ("can be used to
> >   communicate information about data structures...  generate or validate
> >   data, or drive user interfaces").
> >
> >   The Abstract concludes with an unsettling reference to the "underlying
> >   semantics" of SHACL.  We already have RDF semantics. Will this document
> >   define another?
>
> 3: Change:  Use "SHACL" to differentiate SHACL versions of terminology
> from RDF and RDFS versions throughout the document.
>
> > 1. Introduction
> >
> >     "This document defines what it means for an RDF graph... to conform
> >     to a graph containing SHACL shapes"
> >
> >   An improvement over the Abstract.
>
> 4: Change to:
>
> This document defines the SHACL Shapes Constraint Language, a language for
> validating RDF graphs against a set of conditions.  These conditions are
> provided as shapes and other constructs expressed in the form of an RDF
> graph.  RDF graphs that are used in this manner are called "shapes graphs"
> in SHACL and the RDF graphs that are validated against a shapes graph are
> called "data graphs".  As SHACL shapes graphs are used to validate that data
> graphs satisfy a set of conditions, they can also be viewed as a description
> of the data graphs that do satisfy these conditions.
>
> > 1.2. SHACL example
> >
> >     "A shapes graph containing shape definitions and other information
> >     that can be utilized to determine what validation is to be done"
> >
> >   The wording is odd.  How about:
> >
> >     "A shapes graph, which describes a set of constraints, can be used to
> >     determine whether a given data graph conforms to the constraints."
>
> 5: Change to:
>
> A shapes graph contains shapes and other information to determine
> whether a data graph validates against the shapes graph.
>
> >   Up to this point, has the text actually said that SHACL shape graphs
> >   are expressed in RDF?  The Document Outline does say that examples are
> >   expressed in Turtle syntax, which strongly implies RDF.  But that SHACL
> >   shape graphs are expressed in RDF is actually not obvious for anyone who
> >   knows that SPARQL also expresses shape-like constructs for matching
> >   against RDF data, and that SPARQL constructs are not themselves
> >   expressed in RDF.
>
> >   (As an aside, readers of RDF 1.1 Turtle will find instances with
> >   prefixed names in lowercase, whereas in the SHACL spec the prefixed
> >   names are in uppercase.  A sentence about the naming conventions used
> >   in this document could make this explicit.)
> >
> >   Section 1.2 continues:
> >
> >     "ex:IssueShape... [has constraints that apply]... to a (transitive)
> >     subclass of ex:Issue following rdfs:subClassOf triples"
> >
> >   Hmm - nothing in the spec has yet hinted that the process of
> >   validating a data graph against a shape graph will _require_
> >   additional, out-of-band information such as schema definitions.
>
> 6: *NEEDS WORK*
>
> > 1.3. Relationship between SHACL and RDF
> >
> >     "SHACL uses RDF and RDFS vocabulary... and concepts... [but] SHACL
> >     does not always use this vocabulary or these concepts in exactly the
> >     way that they are formally defined in RDF and RDFS."
> >
> >   Hang on, so SHACL does _not_ use RDF/S vocabulary as defined by the
> >   RDF/S specs??  It is jarring to read this in a W3C rec-track
> >   specification.  How is this not a show-stopper?
> >
> >   One then learns that SHACL validation is about more than matching an
> >   immutable data graph against an immutable shapes graph.  Apparently it
> >   involves the prior creation of an _expanded_ data graph through
> >   selective materialization of inferred triples.
>
> 7: The only materialization-like notion in SHACL is default value types.
> This notion is being revised and may be done away with.
>
> >   The notion of "SHACL processors" having (selectively) to support
> >   inferencing goes far beyond just defining a vocabulary for describing a
> >   shape and a process for evaluating that shape against a data graph.  It
> >   implies a software application with SHACL-specific features and an
> >   inferencing style that is SHACL-specific -- both of which, to my way of
> >   thinking, should be completely orthogonal to the language specification,
> >   which could quite reasonably focus on just the vocabulary and validation
> >   algorithm.
>
> 8: SHACL shapes are written in RDF and some constructs in SHACL are
> grouped by using rdf:type and rdfs:subClassOf triples.  The document will
> be changed to use SHACL-specific vocabulary showing that there is no need
> for inferencing beyond SPARQL paths, in particular
> rdf:type/rdfs:subClassOf*.
>
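
To make the rdf:type/rdfs:subClassOf* path in point 8 concrete, here is a
minimal Python sketch.  It assumes a graph is just a set of
(subject, predicate, object) tuples of prefixed-name strings; the helper
names (superclasses, is_shacl_instance) and the ex: data are hypothetical
and not taken from the SHACL draft.  It only reads triples that are already
in the graph and never adds any.

```python
RDF_TYPE = "rdf:type"
RDFS_SUBCLASSOF = "rdfs:subClassOf"


def superclasses(graph, cls):
    """Return cls plus every class reachable from it by following
    rdfs:subClassOf triples that are actually present in graph."""
    seen = {cls}
    stack = [cls]
    while stack:
        current = stack.pop()
        for s, p, o in graph:
            if s == current and p == RDFS_SUBCLASSOF and o not in seen:
                seen.add(o)
                stack.append(o)
    return seen


def is_shacl_instance(graph, node, cls):
    """True if node has an rdf:type whose subClassOf chain reaches cls.
    This mirrors the path rdf:type/rdfs:subClassOf* (hypothetical helper)."""
    for s, p, o in graph:
        if s == node and p == RDF_TYPE and cls in superclasses(graph, o):
            return True
    return False


# Hypothetical data graph, used only for illustration.
data_graph = {
    ("ex:Bug", RDFS_SUBCLASSOF, "ex:Issue"),
    ("ex:CriticalBug", RDFS_SUBCLASSOF, "ex:Bug"),
    ("ex:issue1", RDF_TYPE, "ex:CriticalBug"),
    ("ex:note1", RDF_TYPE, "ex:Note"),
}

print(is_shacl_instance(data_graph, "ex:issue1", "ex:Issue"))
print(is_shacl_instance(data_graph, "ex:note1", "ex:Issue"))
```

The graph is treated as an immutable set throughout, so the check is a
lookup over stated triples rather than a materialization step.
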
> >   If, as the spec points out, "SHACL implementations may operate on RDF
> >   graphs that include entailments", couldn't the SHACL spec be helpfully
> >   simplified by leaving the materialization of inferred triples out of
> >   scope entirely -- as something done in a pre-processing phase, perhaps
> >   according to a few well-known patterns as described in a separate
> >   specification?
>
> 9: This could have been done but the working group did not want to depend
> on any external materialization of entailed triples.  SHACL thus works on
> any RDF graph.
>
> >   The section ends with very puzzling definitions for "subclass",
> >   "type", and "instance" -- "A node is an instance of a class if one of
> >   its types is the given class"?? -- but I press on, hoping the next
> >   section will bring some clarity...
>
> 10: This section will be eliminated in favor of SHACL-specific terms.
>
> > 2. Shapes
> >
> >   The first paragraph says:
> >
> >     "Shape scopes define the selection criteria"
> >
> >   but then Figure 1 says:
> >
> >     "Scope selects focus nodes"
> >
> >   If a shape is just a graph (or part of a shapes graph), then surely
> >   that graph cannot actually perform an action, like "selects", as if
> >   executed like a Java method.  Figure 1 also talks about filter shapes
> >   that "refine" or "eliminate" and constraints that "produce".  Talking
> >   about graphs as agents is deeply confusing.
> >
> >     "Class-based scopes define the scope as the set of all instances of a
> >     class."
> >
> >   Okay, yes... classes have extensions... after all, RDF Schema 1.1 says
> >   that "Associated with each class is a set, called the class extension of
> >   the class, which is the set of the instances of the class" [3].  But
> >   what does this have to do with defining the set of focus nodes for a
> >   shape?  The scope of a shape is _not_ a specific data graph but the set
> >   of all instances of a class in the world?
>
> 11: *NEEDS WORK*
>
> >   I stop reading.
> >
> > Summary and suggestions
> >
> > The spec looks quite nice on the surface but the explanation is
> > conceptually muddled.  Would it not be simpler and clearer to define a
> > SHACL where, to paraphrase the 2008 DSP specification [2], "the
> > fundamental usage model for a [shape] is to examine whether a [data
> > graph] matches the [shape]"?  Everything else could be out of scope.
> > Some suggestions:
> >
> > 1. Define "constraint" up-front.
>
> Shapes are discussed early.  Constraints are introduced in the new section
> on terminology.
>
> > 2. If a shape is described in RDF, say so early on, then avoid implying
> >    that a SHACL shape is based on any semantics other than RDF semantics.
>
> See change 4: above.
>
> > 3. Come up with better names than 'subclass', 'superclass', 'type', and
> >    'instance' for whatever it is that is being described.  Anyone
> >    familiar with classes and instances in RDF -- or classes and instances
> >    in OOP -- will surely be led astray by yet another completely different
> >    re-use of terminology that only _seems_ familiar.  Repurposing these
> >    well-worn terms actually gets in the way of understanding.
>
> *NEEDS TO BE DONE* Most of these have been eliminated in favor of "SHACL
> type".
>
> > 4. Move anything about materializing additional triples as a
> >    pre-processing step -- even sub-class relationships -- into a separate
> >    document specifically for implementation advice, such as a primer.  In
> >    other words, split out all references to inferencing from the SHACL
> >    language itself.  To keep the language specification clear, an
> >    immutable data graph need only be validated against an immutable shape
> >    graph, full stop.  Anything else can be moved elsewhere.
>
> *NEEDS TO BE DONE*
>
> > 5. Move Sections 6 through 11 into a separate document or primer.  Far
> >    better to put this into its own shorter, focused specification than
> >    tack it onto a specification that is already much too long -- 108
> >    pages, had I printed it out.
>
> *NEEDS TO BE DONE*
>
> > Simpler, clearer specs stand a correspondingly greater chance of
> > actually being read -- and used.
> >
> > Tom
> >
> > [1] https://www.w3.org/blog/SW/2013/10/04/w3c-workshop-report-rdf-validation-practical-assurances-for-quality-rdf-data/
> > [2] http://dublincore.org/documents/dc-dsp/
> > [3] https://www.w3.org/TR/rdf-schema/#ch_classes
> > [4] https://lists.w3.org/Archives/Public/public-rdf-shapes/
> > [5] https://lists.w3.org/Archives/Public/public-data-shapes-wg/
> >
> > --
> > Tom Baker <tom@tombaker.org>
>
>
>
> Date: Thu, 5 May 2016 10:15:11 +0200
> From: Thomas Baker <tom@tombaker.org>
> To: RDF Shapes <public-rdf-shapes@w3.org>
>
> > More comments on SHACL [1], Editor's Draft 29 April 2016
> > http://w3c.github.io/data-shapes/shacl/
> >
> > I posted a previous batch of comments on 1 May [1] but have learned a few
> > things since then.  I remain unsure what the specification really means
> > in some respects, so the following reflects what I think the
> > specification "really" means -- what I infer it to mean -- with some
> > suggestions on how the spec could help the reader by articulating some
> > key assumptions up-front.
> >
> > 1. SHACL provides a vocabulary for describing shapes and a simple
> >    algorithm for "validating" an arbitrary graph of RDF data (Data Graph)
> >    against an RDF description of data shapes (Shapes Graph).
>
> See 4:
>
> > 2. The SHACL validation algorithm checks the conformance of triples in
> >    the Data Graph to "constraints" described in the Shapes Graph.
>
> See 4:
>
> > 3. Validation evaluates a target Data Graph at the level of its abstract
> >    syntax.  In accordance with RDF 1.1 Concepts and Abstract Syntax [1],
> >    RDF abstract syntax consists of triples, or subject and object nodes
> >    connected with predicates, with nodes that may be IRIs, blanks, or
> >    datatyped literals. The SHACL spec's use of "focus nodes" fits with
> >    the use of "node" in rdf11-concepts [2].
>
> SHACL works on RDF graphs, which is the abstract syntax of RDF, but
> "RDF graphs" is a better name for this.  There is wording in the new
> terminology section that does defer to [2] for terminology from there.
>
> > 4. In accordance with the Closed-World Assumption (CWA), the validation
> >    algorithm limits itself to matching constraint patterns, as described
> >    in the Shapes Graph, against the abstract-syntactic components of the
> >    triples actually asserted in the target Data Graph, with no further
> >    interpretation of the Data Graph or inferencing based on its formal
> >    semantics.
>
> More care is now taken to say that SHACL works on data graphs directly.
>
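
As an illustration of "works on data graphs directly", here is a small
Python sketch of a check that looks only at the triples asserted in the
data graph.  The condition is loosely modeled on a minimum-count
constraint; the function names, the ex: properties, and the data are all
hypothetical and are not the validation algorithm of the draft.

```python
def values(graph, node, prop):
    """Objects of the triples (node, prop, ?o) that are stated in graph."""
    return {o for s, p, o in graph if s == node and p == prop}


def min_count_violations(graph, focus_nodes, prop, min_count):
    """Return the focus nodes that have fewer than min_count stated values
    for prop.  Nothing is inferred and nothing is added to graph."""
    return sorted(
        node for node in focus_nodes
        if len(values(graph, node, prop)) < min_count
    )


# Hypothetical data graph: a set of (subject, predicate, object) tuples.
data_graph = {
    ("ex:issue1", "ex:reportedBy", "ex:alice"),
    ("ex:issue2", "ex:title", "Crash on start"),
}

violations = min_count_violations(
    data_graph, ["ex:issue1", "ex:issue2"], "ex:reportedBy", 1
)
print(violations)
```

A missing ex:reportedBy triple is simply reported as missing here; the
sketch takes no position on whether the statement is false in the world.
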
> > 5. A Shapes Graph is expressed in RDF.  Even though the primary use of
> >    a Shapes Graph is for CWA-based validation, it should be noted that
> >    the semantics of the Shapes Graph itself, as of any other expression
> >    in RDF, follows the Open-World Assumption (OWA).
>
> Shapes graphs in SHACL are viewed as syntactic constructs, where the OWA
> and CWA assumptions are not relevant.  SHACL does determine whether some
> syntactic constructs are valid by using chains of rdfs:subClassOf triples,
> but this again only looks at the triples that are in the RDF graph.  Thus
> SHACL does not depend on any open-world notions.
>
> > 6. The inherently open-world meaning of the Shapes Graph, however, does
> >    not seem to be of practical consequence for its use in CWA-based
> >    validation -- unless, perhaps, one were to construct or augment a
> >    Shapes Graph with inferred triples -- with the caveat that shapes
> >    graphs could potentially pollute "real" data by adding meaning that is
> >    not intended to be interpreted as real data, e.g., as when the
> >    practical hack of using a class IRI to name a shape were followed
> >    (Section 2.1.2.1, "Implicit Class Scopes").
>
> The scopes of a shape are determined only by looking at the triples in the
> shapes graph.  SHACL does not depend on the addition of any triples to
> either the shapes graph or the data graph, even for shapes that are also
> SHACL instances of rdfs:Class.
>
> *NEED TO RENAME "implicit" to something else*
>
> > 7. A Shapes Graph may specify a potential set of "focus nodes" as the
> >    "scope" of validation in the Data Graph.  A Shapes Graph may also
> >    specify a potential set of "focus nodes" to be dropped out of the
> >    validation scope ("filtered").  Potential focus nodes may or may not
> >    match actual nodes in the Data Graph.
>
> The discussion of shapes, scopes, and filters has been revised
> considerably.
> *NEED TO DO THIS*
>
> > 8. Validation based on closed-world assumptions applies to the
> >    relationship between constraints (as described in the Shapes Graph)
> >    and triples in the data graph viewed at the level of their RDF
> >    abstract-syntactic components (e.g., the "focus nodes").
>
> *TO DO*
>
> > Note: An earlier iteration of these comments was posted on the
> > DC-ARCHITECTURE [3].  The resulting thread drew out some additional
> > comments and insights that could be of interest to members of Data
> > Shapes.
>
> The working group may take these extra comments into account.
>
> > [1] https://lists.w3.org/Archives/Public/public-rdf-shapes/2016May/0000.html
> > [2] https://www.w3.org/TR/rdf11-concepts/
> > [3] https://www.jiscmail.ac.uk/cgi-bin/webadmin?A2=ind1605&L=dc-architecture&P=3148
> >
> > ----------------------------------------------------------------------
> > Discussion
> >
> > Because SHACL is expressed in RDF, like it or not, a Shapes Graph is
> > interpreted according to OWA.  Since the design decision was made to
> > express the Shapes Graph in RDF, and not in a completely different syntax
> > -- as in the case of SPARQL or, for that matter, DCMI's DSP -- the native
> > OWA interpretation of a Shapes Graph cannot be papered over, ignored, or
> > otherwise contradicted.
>
> SHACL views the shapes graph as an RDF graph, i.e., a set of triples.  As
> all that counts is this set of triples, the OWA is not relevant.  Even if
> this were not the case, there is nothing in the SHACL specification that is
> concerned with whether something that is not stated in the shapes graph, or
> in the data graph, is false or not.
>
> > The design choice of expressing Shapes Graphs in RDF does somewhat limit
> > SHACL, in certain respects, compared to SPARQL or DSP.  In SPARQL, for
> > example, `rdfs:subClassOf*` is interpreted as referring to the transitive
> > closure of `rdfs:subClassOf`; the asterisk is a sort of syntactic sugar,
> > a convenience notation, that triggers specific inferences.  As there is
> > no equivalent way to express `rdfs:subClassOf*` in RDFS, there is no way
> > to say that `rdfs:subClassOf` actually _means_ the transitive closure
> > without, in effect, arbitrarily overriding its global semantics.
>
> The RDFS semantics implies that rdfs:subClassOf is transitive so any
> discussion of classes in RDFS has to take this into account.  As SHACL is
> only concerned with RDF graphs as sets of triples it does not depend on the
> RDFS semantics when it talks about SHACL types (and SHACL subclass,
> superclass, and instance) so these are defined as using the transitive
> closure of rdfs:subClassOf triples.  The SHACL specification takes care to
> use "SHACL" to distinguish these notions from their RDF and RDFS versions.
>
> *NEEDS TO BE DONE*
>
> > Perhaps this is why the SHACL spec says that "SHACL does not always use
> > this vocabulary or these concepts in exactly the way that they are
> > formally defined in RDF and RDFS" (Section 1.3) -- a notion which
> > gratuitously sets SHACL at odds with W3C Semantic Web standards.
>
> *TO DO*
>
> > One could perhaps sidestep the issue by dropping _all_ consideration of
> > inferencing from the normative SHACL specification; saying only that
> > there may be a need for inferencing in a pre-processing phase; then
> > discussing those pre-processing options in a separate guidance document.
> > Putting inferencing out of scope would make the SHACL spec simpler,
> > clearer, and shorter.
>
> Instead of depending on pre-processing, SHACL does its own determination
> here.
>
> > Abstract syntax issues
> >
> > Because SHACL is viewing RDF data graphs through a closed-world lens, the
> > meaning of the graph is beside the point -- just as the meaning of a
> > graph is beside the point with SPARQL.  A SHACL Shapes Graph is validated
> > against a Data Graph at the level of the abstract syntax of the Data
> > Graph.  According to RDF 1.1 Concepts and Abstract Syntax, RDF graphs are
> > sets of subject-predicate-object triples, where the elements may be IRIs,
> > blank nodes, or datatyped literals [1].
>
> This is made more clear in the current version of the specification.
>
> *NEEDS TO BE DONE*
>
> > Note that at the level of their abstract syntax, RDF Graphs have no
> > "classes" and no "instances"!  A search in rdf11-concepts [1] for the
> > words "instance" or "class" will find no mention of either one, anywhere
> > in the spec.
>
> This is why SHACL needs to define its own terminology in the new
> terminology section.
>
> > Confusingly, the SHACL spec makes reference to "instances", "classes", or
> > "instances of classes" in the Data Graph, viewing the Data Graph through
> > a semantic lens.  Coining a new SHACL-specific notion of "instance" (and
> > "class", etc) next to the existing notions of RDF "instance" and OO
> > "instance" makes SHACL particularly hard to grok.  At the end of Section
> > 1.3, for example, the definition for "instance" starts off by saying:
> >
> >   "A node is an instance of a class..."
> >
> > which I take to mean:
> >
> >   "A node [in the Data Graph] is an instance of a class..."
>
> *TO DO*
>
> > By comparison, the SPARQL spec specifies a SPARQL-specific syntax to
> > express triple patterns composed of variables and RDF-abstract-syntactic
> > things such as IRIs and Literals.  SPARQL itself does not "understand"
> > that something is a class or an instance -- it simply supports the
> > formation of triple patterns and leaves it to Primers and other usage
> > guides to express queries, informally, in semantic terms (e.g., "What
> > data is stored about instances of class X?").  This separation of
> > concerns makes the SPARQL specification much easier to understand.  It is
> > worth noting that DCMI's Description Set Profile Constraint Language [3]
> > also defines its own syntax.
>
> *TO DO*
>
> > As an aside, it is unclear to me why it is even necessary for the SHACL
> > spec to redefine an already-loaded, overdetermined term such as "class"
> > to refer to a set of what one might call "type-matched focus nodes".  If
> > the intention is to make SHACL more understandable to people who are
> > unfamiliar with RDF, this should be done not in the formal spec but in a
> > primer or tutorial, where an explanation can be customized for a specific
> > audience, such as programmers.
>
> *TO DO*
>
> > A year ago, it was proposed that an abstract syntax be developed for
> > SHACL [4].  There was little discussion and the issue remains open but
> > neglected.  Since SHACL is natively expressed in RDF, its abstract syntax
> > is in effect the abstract syntax for RDF.  It is not clear to me whether
> > this is actually a good idea.  If a Shapes Graph only exists to be used
> > in a closed-world process validating a Data Graph, what is the specific
> > advantage of expressing it in RDF?  Might a proper abstract syntax for
> > SHACL, based on its own BNF, etc, further focus and clarify the SHACL
> > language?  On the other hand, I see no specific reasons why SHACL should
> > _not_ use RDF to express shapes graphs as it does -- provided that the
> > spec (or a primer) point out any potential pitfalls, as touched on above.
>
> *TO DO*
>
> > [1] https://www.w3.org/TR/rdf11-concepts/
> > [2] https://www.w3.org/TR/rdf11-concepts/#data-model
> > [3] http://dublincore.org/documents/dc-dsp/
> > [4] https://www.w3.org/2014/data-shapes/track/issues/52
> >
> >
> > --
> > Tom Baker <tom@tombaker.org>
>
>


-- 
Dimitris Kontokostas
Department of Computer Science, University of Leipzig & DBpedia Association
Projects: http://dbpedia.org, http://rdfunit.aksw.org,
http://aligned-project.eu
Homepage: http://aksw.org/DimitrisKontokostas
Research Group: AKSW/KILT http://aksw.org/Groups/KILT
Received on Thursday, 19 May 2016 06:10:20 UTC