W3C

RDF Data Shapes Working Group Teleconference

10 Sep 2015

Agenda

See also: IRC log

Attendees

Present
Arnaud, iovka, ericP, kcoyle, labra, dimitris, hknublau, pfps, hsolbrig, aryman, simonstey
Regrets
Chair
Arnaud
Scribe
hknublau

<Arnaud> hello

<Arnaud> we're on, so feel free to call in

<scribe> scribenick: hknublau

Primer

Arnaud: we started with a Primer which then got abandoned.

… Not mandatory, although a lot of WGs do it

… how terse shall the spec be, different audiences

… entirely up to the WG to decide, middle ground might be to make the spec reasonably readable

… Primer adds to the workload

hknublau: we are spread thinly, we should rather wait with a Primer

aryman: My preference is to make the spec readable, with plenty of examples

… XML Schema spec is IMHO unreadable

… counter example: SPARQL spec is both a good reference and readable

… let’s try to make the spec self-contained.

Arnaud: Anyone in favor of a Primer now? Assuming we can find a middle-ground, we should continue

ericP: We still have our old document that could be resurrected.

Arnaud: Takes too much time, proposing to not worry about Primer now.

hknublau: Where to put SPARQL queries (to make it readable) - appendix or in the middle

aryman: Either variation works. Order by readership, SPARQL might confuse inexperienced users.

… Precise prose with examples, plus a semantics chapter (which means repeating the structure)

<simonstey> http://www.w3.org/TR/owl2-primer/

<simonstey> was about to say that

hknublau: Details could be optional, open by click

aryman: I did something like this before, works quite well.

… SPARQL is also normative, so it can be in the same document with collapsible sections

kcoyle: how does this work for printing?

Arnaud: we can do anything, may require programming

aryman: JavaScript is now stable enough across all browsers, so we could do anything

ericP: CSS print directives, e.g. some sections can be hidden if in print mode

aryman: could have pre-generated PDF versions
… suggesting we in-line the SPARQL; then we can make it interactive. (Agreement)

hknublau: how to format the SPARQL, e.g. what about helper functions

pfps: SPARQL functions should be completely vanilla

<pfps> pre-binding isn't vanilla SPARQL

<pfps> My opinion is that the SPARQL snippets have to be completely vanilla

aryman: we try to use SPARQL-as-is, WG members can suggest alternatives
… Implementation strategy should not shine through

… spec should be decoupled from implementation strategy

hknublau: we could put a short introduction explaining how the snippets are to be interpreted

pfps: pre-binding sounds like an implementation detail

aryman: instead of pre-bound variables, just use an example?

<pfps> I'm worried that several bits of the implementation details are going to bleed through to the snippets

aryman: maybe inject a VALUES clause for the variables

<pfps> I think that it would be better to have meta-variables, i.e., non-SPARQL syntax that is substituted by an argument to the construct

pfps: we still don’t have a description of pre-binding

<Dimitris> what about %var1%

pfps: could just be SPARQL variables

… much of this depends on wording

aryman: our argument variables use $ sign, others use ?

hknublau: strongly agreed

PROPOSAL: all SPARQL snippets should use $ for variables that represent arguments, e.g. $minCount, while all other variables remain ?

+1

<Dimitris> +1

<simonstey> +1

<hsolbrig> +0

<aryman> +1

<Labra> 0

<kcoyle> +1

<ericP> +1

<pfps> +1

<iovka> +0.5

RESOLUTION: all SPARQL snippets should use $ for variables that represent arguments, e.g. $minCount, while all other variables remain ?
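
A minimal sketch of the convention just resolved, assuming a hypothetical cardinality snippet. Argument variables are written with $ ($scopeClass, $predicate and $minCount are illustrative names only), while ordinary variables keep ?. Both markers are valid SPARQL variable syntax, so the snippet stays plain SPARQL. The helper `bind_arguments` is an assumption for illustration, in the spirit of the meta-variable substitution suggested above; it is not the WG's definition of pre-binding.

```python
import re

# Hypothetical snippet: $-variables are arguments supplied by the constraint,
# ?-variables are ordinary SPARQL variables.
SNIPPET = """
SELECT ?this
WHERE {
  ?this a $scopeClass .
  OPTIONAL { ?this $predicate ?value }
}
GROUP BY ?this
HAVING (COUNT(?value) < $minCount)
"""

def bind_arguments(snippet, arguments):
    """Textually replace each $name with an already-serialized SPARQL term."""
    def replace(match):
        name = match.group(1)
        if name not in arguments:
            raise KeyError(f"no value supplied for argument ${name}")
        return arguments[name]
    return re.sub(r"\$([A-Za-z_][A-Za-z0-9_]*)", replace, snippet)

# Illustrative argument values (example.org IRIs are placeholders).
query = bind_arguments(SNIPPET, {
    "scopeClass": "<http://example.org/Person>",
    "predicate": "<http://example.org/name>",
    "minCount": "1",
})
print(query)
```

Plain textual substitution like this would also rewrite a $ inside a string literal, which is one reason the exact wording of the substitution rule matters.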

UC&R document

Arnaud: unsure what the status is, it hasn’t been republished.

… we ought to publish an update

simonstey: I started to refactor it; I wanted to restructure it all around Use Cases, but lacked the time.

(I use Colloquy)

simonstey: I would like to republish a better version

<Arnaud> http://www.w3.org/TR/2015/WD-shacl-ucr-20150414/

<Arnaud> http://w3c.github.io/data-shapes/data-shapes-ucr/

Arnaud: the published version has significant differences

simonstey: additional use cases

… my changes are in a different place

… for next telco I could have a version that we could publish

Arnaud: we need another week of review anyway

… current editor draft has lots of improvements. Simon, feel free to take a couple of weeks and let us know.

<pfps> +1 to try to have a reviewable version next week or the next - I'll try to review it

Next: ISSUEs?

Arnaud: suggestion to talk about Results vocabulary

ISSUE-51

Arnaud: I wonder how much details we really need.

… different modes of operation?

… in case of Success, we said this should be optional

… at a minimum a boolean result should suffice

… we discussed it may lead to an infinite number of results

… compared to a C compiler: everything is clearly defined if the code is correct, but error handling is undefined

… implementation may have modes and flags

pfps: I don’t like the analogy to syntax errors. Misleading. A better analogy would be queries.

… when you run a query you may get syntax errors, and maybe only the first ones are reported.

… the validation results are like the answers of the query

… contract should be that everything gets reported

<ericP> E20935 Server vaporized by meteor

… some implementations may decide to limit

Arnaud: two different questions: 1) how much info needs to be reported 2) how do we report it

aryman: different modes possible, e.g. at development time you want verbose reporting

… yet at runtime less info may be needed

… analogy with syntax checking: 1 error in your code may trigger lots of other errors, which are clutter

… could we have a dependency graph between constraints?

… e.g. minCount 1 then check on actual value

pfps: sounds difficult

aryman: each constraint could have a priority

… optimizer could overrule that decision
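
A rough sketch of the dependency idea, assuming hypothetical names throughout (`Constraint`, `depends_on`, `validate`, and the two example checks are not taken from any draft). A constraint is skipped when a prerequisite did not pass, so a missing value yields one violation instead of a cascade. The sketch also assumes that prerequisites appear earlier in the list than the constraints that depend on them.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Constraint:
    name: str
    check: Callable[[dict], bool]  # True when the node satisfies the constraint
    depends_on: List[str] = field(default_factory=list)

def validate(node: dict, constraints: List[Constraint]) -> Dict[str, str]:
    """Evaluate constraints in list order, skipping those whose prerequisite did not pass."""
    outcome: Dict[str, str] = {}
    for c in constraints:
        if any(outcome.get(dep) != "ok" for dep in c.depends_on):
            outcome[c.name] = "skipped"
        else:
            outcome[c.name] = "ok" if c.check(node) else "violation"
    return outcome

# Example from the discussion: check minCount 1 first, then the actual value.
constraints = [
    Constraint("minCount1", lambda n: len(n.get("age", [])) >= 1),
    Constraint("ageIsInt",
               lambda n: all(isinstance(v, int) for v in n["age"]),
               depends_on=["minCount1"]),
]

print(validate({}, constraints))
print(validate({"age": ["x"]}, constraints))
```

A priority number per constraint, as also suggested, would amount to sorting the list before this loop.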

<pfps> This seems like a significant extension to both the syntax and the meaning of SHACL

<Dimitris> I have something similar in rdfunit (TestCaseDependency) but not yet implemented

<Dimitris> https://github.com/AKSW/RDFUnit/blob/master/ns/rdfunit_ontology_diagram.png

<ericP> hknublau: this feels complicated to specify and use

<ericP> ... but in cases like hasValue, we could just return one violation

<ericP> ... but that's an implementation detail

<ericP> ... in the test cases we need to look for all of the possible errors

<ericP> ... how to specify it is more important than what they report

(thanks eric)

<ericP> ... there will be variation between implementations

Dimitris: agreed this gets complicated

… sounds like a nice to have

… about reporting: multiple modes would be good, e.g. summary and details

Arnaud: nobody wants to drop the details?

… Option: we skip all this together, just a boolean

<pfps> part of the raison d'etre of SHACL is reporting violations - leaving the reporting undefined thus seems rather counterproductive

+1 peter

<pfps> it's like if SPARQL didn't define what queries are supposed to return

… Another option is to define all details of what needs to be reported under what options

<Dimitris> +1 to peter

aryman: just saying it’s invalid is not useful

<pfps> my complaint about violation reporting in the current spec is that it is unclear just what reporting is going on

… in source code we have line numbers, in RDF we only have triples

… we need a mechanism to guide people to source of a problem

Arnaud: difference between what is expected and what is required for conformance

<Dimitris> +q

<pfps> again - like SPARQL without defining what queries return

<ericP> hknublau: in my impl, i report the violating triple or the relevant component, e.g. s,p for cardinality constraint

<ericP> ... SHACL compliance (at some conformance level) should permit boolean and another should include everything (which still needs to be defined)

Dimitris: maybe to also accommodate ShEx: focus node, links to shapes and constraint

<iovka> +q

ericP: we are working on this problem too with ShEx

… a boolean might be sufficient as in some cases there is no way to correct the input anyway

ericP: challenge is which of the many fixes to report back

… example… two solutions may exist

… what about nested violations (sh:valueShape)

iovka: best thing would be interactive error reporting

Dimitris: interactive cannot work with an engine

<iovka> -q

aryman: we should be looking at this from use cases perspective

… e.g. input form should have a red star next to a field that requires fixing

… requirements document indicates we need enough reporting

<simonstey> http://w3c.github.io/data-shapes/data-shapes-ucr/#r10-vocabulary-for-constraint-violations

Arnaud: reducing the format may speed up the spec process

… (with a chair’s hat on)

<ericP> hknublau: i don't know what we would save if we abandoned this and it's already specified

<Zakim> ericP, you wanted to explore Dimitris's proposal

ericP: Dimitris’ proposal sounds promising

<pfps> the version of the spec document that I reviewed had significant problems with violation reporting not related to the violation/error/success difference

… if I validate X as a Y, and that entails validating V as a W we can still easily capture that tree

… simple thing: just return focus node and shape

kcoyle: how would that handle Arthur’s form use case

ericP: question of modularity, some implementations may report less or more

<pfps> again, these are not errors - they are violations

hsolbrig: Error handling in XSLT works pretty well

… they tie error messages to test cases

<pfps> the best XSLT analogue is the result of the transformation

… test case 43 says this data should fail due to a cardinality constraint

… at runtime, all test cases return the same error number

… a uniform, consistent “error number” should be sufficient

aryman: two parts: 1) what is the syntax for error reporting

… should have the capability to point at the source of the error and perhaps a human-readable message

… 2) minimum: if something is wrong we need at least one error

… details should be left to implementations

<pfps> it would be easy to specify exactly what gets reported - just say that it is the results of the SPARQL+ query

<ericP> what if we use SHACL to validate the validation report and it's invalid?

Arnaud: WG seems to want at least one violation

pfps: Disagree

… Typical use case: find the violations in a large RDF graph

… if all you get back is one violation then this is useless

<ericP> the C spec still doesn't mandate reporting more than one error

<ericP> the C spec didn't address your screw case; the implementors did that independently

<hsolbrig> How about at least one error per focus node?

<ericP> there's *0* standardization between MSVC, g++, clang, ...

Arnaud: what compliance level is needed

… order of violations reporting should not matter

<ericP> hknublau: disagree.

<pfps> I don't think that anyone would argue that the order of violation reporting can be specified

<ericP> ... in the SPARQL spec, there's a protocol and if you utter a query, you get the same response back

<ericP> ... so you can build code that works against any SHACL implementation

<ericP> ... in SPARQL there's SELECT and ASK; ASK is equivalent to getting a boolean response

aryman: for reference implementation performance characteristics should not matter

… lowering the bar

Dimitris: we define the output format and leave out details about time

Arnaud: Compliance levels one-star, two-star etc sounds complex (?)

… Peter suggested failures should be separated

pfps: Draft with Success, ValidationResults and Failures seem to be 3 different things

… risk of smushing together: implementations could produce some violations and quit

… only report violations, everything else is handled by another mechanism

<aryman> +1 to separating constraint violations from other failure/success signals

Arnaud: Terminology…

… severity levels: info, warning, error, then Failures for the runtime errors

<pfps> I would prefer, and have previously stated, that constraint violations should not be called errors

<aryman> +1 to calling a violation a Violation

<simonstey> do we have to report runtime errors? if not, info/warning/failure for the 3 levels

<hsolbrig> What is an example runtime error?

<Dimitris> I am also open to separate failures/success from violation instances

<simonstey> +1 to that

<pfps> I am against putting syntax "failures" into the results graph

aryman: in some cases you cannot even report a result object, it should be an API issue

<Arnaud> https://lists.w3.org/Archives/Public/public-data-shapes-wg/2015Sep/0025.html

pfps: I had sent a proposal

Dimitris: Similar to Holger’s latest draft, but removing Success and Failures, I would be OK with minor changes.

aryman: agree except where it references an API

<pfps> if "API" is not something to say, then "result of the operation" is a suitable replacement

<aryman> holger broke up

<simonstey> *brbl* *grmbl*

<ericP> i think that was more *brbl* *grmbl* *bggl* "and then we're done"

Dimitris: even an Info is a violation

hknublau: we are going back and forth

<simonstey> but peter is calling them validationresults in his proposal isn't he?

… constraint violation -> validation results -> constraint violation

<pfps> again, I think that "error" is the wrong word to use for any kind of constraint violation

… Suggest a class ValidationResult with severity. Keep Error as a level because it’s used in many places like this

<simonstey> +1 to validationresults

RESOLUTION: limit reporting to validation results, and not include runtime errors

<aryman> +1

<pfps> works for me

<ericP> kcoyle: +1

<Dimitris> +1

<hsolbrig> +1

<simonstey> +1

<Labra> 0
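
One way to picture the outcome, as a sketch only: a `ValidationResult` record carrying a severity, with runtime failures kept out of the results and signalled separately. All names here (`Severity`, `ValidationResult`, `ValidationFailure`, `conforms`, and the individual fields) are assumptions drawn from the discussion, not vocabulary from any draft. Treating a node as valid only when there are no results of any severity is likewise an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional

class Severity(Enum):
    INFO = "Info"
    WARNING = "Warning"
    ERROR = "Error"

@dataclass(frozen=True)
class ValidationResult:
    focus_node: str                  # node that was validated
    shape: str                       # shape the node was validated against
    severity: Severity
    predicate: Optional[str] = None  # e.g. the property of a cardinality constraint
    message: Optional[str] = None    # optional human-readable text

class ValidationFailure(Exception):
    """Runtime problem (e.g. an unreadable shapes graph); never part of the results."""

def conforms(results: List[ValidationResult]) -> bool:
    """Boolean summary, comparable to ASK versus SELECT in SPARQL."""
    return len(results) == 0

# Illustrative data; the example.org IRIs are placeholders.
results = [
    ValidationResult(
        focus_node="http://example.org/alice",
        shape="http://example.org/PersonShape",
        severity=Severity.ERROR,
        predicate="http://example.org/name",
        message="Fewer values than minCount 1",
    )
]
print(conforms(results))
```

An implementation that only offers the boolean could expose `conforms` alone, while one that reports everything would return the full list.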

Arnaud: I think we had a good meeting, let’s stay motivated and full of energy

<Arnaud> trackbot, end meeting

Summary of Action Items

Summary of Resolutions

  1. all SPARQL snippets should use $ for variables that represent arguments, e.g. $minCount, while all other variables remain ?
  2. limit reporting to validation results, and not include runtime errors
[End of minutes]

Minutes formatted by David Booth's scribe.perl version 1.143 (CVS log)
$Date: 2015/09/25 21:19:56 $