RDF Validation Workshop -- 11 Sep 2013

Attendees

Present: kcoyle, Workshop_room, DaveReynolds, hhalpin, dbs, Workshop_room.a, tbaker, DBooth
Chair: Arnaud Le Hors and Harold Solbrig
Scribe: DavidBooth, PhilA, Anamitra, TimCole

Topics

Validating statistical Index Data represented in RDF using SPARQL Queries - Jose Labra Gayo

(slides, report summary)

Jose: Motivation - Webindex Project

<ericP> [http://www.slideshare.net/jelabra/validating-statistical-index-data-using-sparql-qeries slide 2]

<ericP> jose: developed for web index

<ericP> ... we developed the data portal for web index

Visualization and data portal

<ericP> [slide 3]

<ericP> jose: the workflow involves:

<ericP> ... .. get data from external sources

<ericP> ... .. statisticians produce index

<ericP> ... .. we map that to RDF and provide visualizations

Conversion is from Excel to RDF

<kcoyle> tbaker: the piratepad has quite a bit of content http://piratepad.net/E255z6M73S

jose: Technical details. 61 countries, 85 indicators. More than 1 megatriple, linked to DBpedia, etc.

WebIndex slide

<ericP> [slide: WebIndex computation process (1)]

<PhilA> Jose Emilio is talking about http://thewebindex.org/ and its use of http://www.w3.org/TR/vocab-data-cube/

Jose: Used SPARQL CONSTRUCT instead of ASK
... empty graph if ok, else RDF graph with error
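
A minimal Python sketch of the pattern Jose describes, not the Computex implementation: a check builds a graph of error triples, much as a SPARQL CONSTRUCT would, and an empty result means the data passed. The tuple representation, the `ex:` names and the rule itself (every observation must have a value) are assumptions made for illustration.

```python
# Triples are plain (subject, predicate, object) tuples; all names are illustrative.
DATA = [
    ("ex:obs1", "rdf:type", "qb:Observation"),
    ("ex:obs1", "ex:value", "42"),
    ("ex:obs2", "rdf:type", "qb:Observation"),  # no ex:value: should be reported
]


def construct_errors(triples):
    """Return a list of error triples; an empty list means the data is valid."""
    observations = {s for s, p, o in triples
                    if p == "rdf:type" and o == "qb:Observation"}
    with_value = {s for s, p, o in triples if p == "ex:value"}
    errors = []
    for obs in sorted(observations - with_value):
        errors.append((obs, "ex:error", "observation has no value"))
    return errors


errors = construct_errors(DATA)
```

Unlike an ASK-style yes/no answer, the returned triples say which resource failed and why.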

[slide: SPARQL queries RDF Data Cube]

[slide: limitations of SPARQL expressivity]

Jose: A challenge is computing series on RDF collections
... Idea of RDF Profiles for dataset families

could it be the mouse?

Jose: http://computex.herokuapp.com
... Source code in Scala (site on slides)
... http://herokuapp.com/ - demo site
... Webindex as use case, SPARQL as implementation, RDF Profiles (declarative, Turtle)

<danbri> (somewhat related, SKOS validation - http://www.w3.org/2004/02/skos/webstage/validation - also just a structure)

jose: you can check e.g. that one observation is in one slice but not much more expressivity than that

ashok: if SPARQL works for a fairly complicated situation, why are we thinking about anything else?

jose: SPARQL is hard to debug
... we need to differentiate validating the graph vs. a dataset
... with SPARQL, we can test specific values in a particular graph
... though we could compile ShEx to SPARQL

<Jose> A couple of interesting, albeit unrelated ideas here...
... signing RDF - how do you generate a reproducible MD5 w/o order?
... functional patterns for RDF lists. Should there be "best practices"?
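
One common answer to the signing question, sketched in Python under the assumption that the graph contains no blank nodes (with blank nodes the labels would first need canonicalizing, which this sketch does not attempt): write each triple as one N-Triples-like line, sort the lines, and hash the result. The function name and sample triples are hypothetical.

```python
import hashlib


def graph_md5(triples):
    """MD5 over sorted, de-duplicated triple lines, so input order does not matter."""
    lines = sorted({f"{s} {p} {o} ." for s, p, o in triples})
    return hashlib.md5("\n".join(lines).encode("utf-8")).hexdigest()


a = [
    ("<http://example.org/a>", "<http://example.org/p>", '"1"'),
    ("<http://example.org/b>", "<http://example.org/p>", '"2"'),
]
b = list(reversed(a))  # same graph, different statement order
same = graph_md5(a) == graph_md5(b)
```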

PhilA: is slide 11 a candidate profile?
... if so, i see it as too complicated
... we have two req: validation and form creation. too complex for the latter

ericP: is that 'cause of the expressivity, or 'cause it's in RDF?

PhilA: i suppose 'cause it's in RDF

evren: re: UI generation, the issue is not the syntax, it's the SPARQL query. that's where the shape of the data is described

PhilA: EU reqs are "don't make me need to speak SPARQL to generate a UI"

gjiang: did you use SPARQL extensions?

jose: we weren't happy when we had to use jena:sqrt

gjiang: maybe there can be a link from SPARQL to some statistical package

Stardog ICV - Evren Sirin

(slides, report summary)

<gjiang> [at slide 8] Semantics in OWL are for inference, not suitable for validation

<gjiang> [at slide 11] Rule syntax for constraints

Evren: [at slide 13] if each person must have two parents, but only one was specified, inference can determine that there is another parent, and then the validation can be applied after inference.

(Evren talks to a slide that is not in the uploaded slides)

Evren: the tool figures out explanation of validation.

evren: i agree with the folks that said we need good explanations of errors but don't believe the constraints author should have to write the explanation. that should be the tool.
... we have definitions of constraints in W3C specs so we should capture those

EricP: Re validation and reasoning, SPARQL semantics say you have an RDF graph, but how you got it is up to you. The reasoning just changes what graph you use. Do you think that's a good model for validation use with entailment? If so, then we don't have to think about entailment.

Evren: Yes.

Arnaud: The question is whether the language allows you to specify that entailment should be used.
... the question is "does the language you use allow you to specify the entailment?"
... Initially you proposed to just change the OWL namespace. Is that what you use now?

Evren: No, that would require using all the tool chains. You just execute it through the validation. That's why at the tool level you need to separate the axioms from the constraints.

Arthur: How would you associate the constraints with a graph?

<PhilA> PhilA: A proposal (from Paul Davidson) is to add a property to VoID that links a dataset to a profile (constraint set)

Evren: You could use named graphs, to have your constraints in a named graph. You need to keep them separate. Axiom annotations could also be used to indicate constraints, but we didn't do that because axiom annotations are a lot like reification, and tools may not treat them well.

EricP: What if someone interprets constraints as inference rules accidentally?

Evren: under OWA it would just infer that person085 is a manager, instead of determining (under CWA) that there is an error because person085 is not a manager.
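
The difference Evren describes can be sketched in Python with one rule read two ways. The rule used here (anyone who supervises someone must be a manager), the property names and the data are assumptions for illustration, not taken from the slides.

```python
DATA = {
    ("ex:person085", "ex:supervises", "ex:person012"),
}
MANAGER = ("rdf:type", "ex:Manager")


def supervisors(triples):
    """Subjects that appear with the (hypothetical) ex:supervises property."""
    return {s for s, p, o in triples if p == "ex:supervises"}


def infer(triples):
    """Open-world reading: add the missing type statement as a new fact."""
    inferred = {(s, *MANAGER) for s in supervisors(triples)}
    return set(triples) | inferred


def validate(triples):
    """Closed-world reading: a missing type statement is a violation."""
    return sorted(s for s in supervisors(triples)
                  if (s, *MANAGER) not in triples)


inferred_graph = infer(DATA)
violations = validate(DATA)
```

Read as an inference rule, the data silently gains a triple; read as a constraint, the same data is reported as an error. This is why the axioms and the constraints need to be kept apart.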

@@1: how can i read this to learn about the graph to e.g. generate a form?

evren: you can think about it as the SPARQL BGP describes the graph
... so we see "someValuesFrom" and we'll create a text box, ...
[Evren explains how constraint can be represented in SPARQL]

_: What about optional properties?

@@1: how would i describe optional properties

evren: right, you wouldn't write that in the constraints language

Arthur: It's not really a constraint, it's a graph description.

<DaveReynolds> +1 to last speaker, optional properties are needed for describing the data "shape" as part of publish/consume contract, even though they are not part of validation

Arthur: You want to describe a contract with a service, and part of the contract is that a property can appear 0+ times.

evren: we added "min 0" to our OWL constraint. it's not actionable during constraints checking but it describes the graph

Bounds: Expressing Reservations about incoming Data - Martin Skjaeveland

(slides: http://www.w3.org/2001/sw/wiki/images/e/e5/MSkjaeveland_w3c-rdfval2013.pdf, report summary)

Arthur: [on slide 6] What do you mean by element? Does it depend on its position?

Martin: By element I mean a S P or O in a graph.

Evren: [on slide 13] What kind of use cases for ontology hijacking?

Martin: Can check if you are adding domain and range axioms.
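
A rough Python sketch of the kind of check Martin mentions: flag incoming triples that add rdfs:domain or rdfs:range axioms to terms in a namespace the receiver wants to protect. This is not the Bounds mechanism from the slides; the function name, the choice of protected namespace and the sample data are hypothetical.

```python
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
GUARDED_PREDICATES = {RDFS + "domain", RDFS + "range"}


def hijacking_triples(incoming, protected_ns):
    """Return incoming triples that put a domain/range axiom on a protected term."""
    return [
        (s, p, o) for s, p, o in incoming
        if p in GUARDED_PREDICATES and s.startswith(protected_ns)
    ]


incoming = [
    # Adds a domain axiom to someone else's term: flagged.
    ("http://xmlns.com/foaf/0.1/name", RDFS + "domain", "http://example.org/Robot"),
    # Adds a domain axiom to the sender's own term: allowed.
    ("http://example.org/nick", RDFS + "domain", "http://example.org/Robot"),
    # Ordinary instance data: allowed.
    ("http://example.org/r2", "http://xmlns.com/foaf/0.1/name", "R2"),
]
flagged = hijacking_triples(incoming, "http://xmlns.com/foaf/0.1/")
```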

Evren: OWL RD only allows things that can be expressed with one triple. Cannot have someValuesFrom, allValuesFrom, (and some others).

<danbri> if I say MyNewType is a subClassOf http://schema.org/Person, versus MyNewType is a superTypeOf http://schema.org/Person ... people tend to see the latter as weirder, the former as acceptable and non-hijack-y

gjiang: Re ont hijacking that adds statements. What about removing statements? what effect does it have?

martin: no, only considered use cases of receiving data and protecting existing dataset.

Eric: Use case came from practical considerations or theoretical?

Martin: We did prior work on managing RDF transformations. This is transforming by adding.

EricP: SADI project is all about inferring extra triples. Their rules are written in OWL DL.

Coffee

<dbs> coffee++

OSLC Resource Shape: A Linked Data Constraint Language

(slides, paper, report summary)

slide 1 - 1 slide intro to OSLC

Arthur: IBM customers want tools that cover the product life cycle and beyond
... core specs delivered to W3C, being worked on in LDP WG. More domain specific specs gone to OASIS
... customers bothered by lack of XML Schema analogue
... came up with minimal language
[on slide 6] ... RDF/XML snippet shown is a resource shape for a bug report
[on slide 7] ... Does data have to be in the graph or is it externally referenced, for e.g.
[on slide 8] ... Creation factory is the data source, query capability is the endpoint (scribe paraphrase)
... data can link to its description (its shape)
[on slide 10] ... example of declarative list of properties etc. Encoded in turtle
... OSLC is just a vocabulary, it's not an ontology. How you use it is up to you

slides 11 - 16 show the spec

Arthur: [on slide 17] SPARQL seems good for the task of testing against the resource shape

ericP: I notice people favour returning True if there's a failure (the inverse of OSLC model)

Arthur: OK, but you want data to be returned so you can fix it
[on slide 20 - Summary] ... OSLC has been around about 3 years

hsolbrig: How does this relate to WSDL?

Arthur: It's in the same spirit
... you can check for properties, cardinalities etc.
... I was in the WSDL WG and this one - I suggested re-using WSDL but there was too much baggage. WSDL basically too complicated although I fear we may have thrown away too much. We need a way to express constraints on RDF
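
The property and cardinality checks Arthur mentions could look roughly like this in Python. The shape here is a plain dict with min/max counts, a stand-in for the OSLC vocabulary rather than its actual terms, and the bug-report properties are assumptions for illustration.

```python
from collections import Counter

# Hypothetical shape for a bug report: property -> (min, max); None means unbounded.
BUG_SHAPE = {
    "dcterms:title": (1, 1),
    "ex:assignedTo": (0, 1),
    "ex:relatedTo": (0, None),  # optional and repeatable: describes, never fails
}


def check_shape(resource, triples, shape):
    """Return human-readable cardinality violations for one resource."""
    counts = Counter(p for s, p, o in triples if s == resource)
    problems = []
    for prop, (low, high) in shape.items():
        n = counts[prop]
        if n < low:
            problems.append(f"{prop}: expected at least {low}, found {n}")
        elif high is not None and n > high:
            problems.append(f"{prop}: expected at most {high}, found {n}")
    return problems


triples = [
    ("ex:bug1", "ex:assignedTo", "ex:alice"),
    ("ex:bug1", "ex:assignedTo", "ex:bob"),
]
problems = check_shape("ex:bug1", triples, BUG_SHAPE)
```

An entry such as `(0, None)` can never produce a violation, but it still tells a consumer which properties to expect, which is the description role discussed earlier for optional properties.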

hsolbrig: Is there a spec for the semantics of OSLC?

Arthur: The semantics would be formalised using SPARQL

[Discussion of what 'read-only' means]

<Zakim> evrensirin, you wanted to ask about deletes

evrensirin: you said something about not needing to do anything about DELETE?

arthur: you might want to specify a pre-condition for a delete
... that's a good point. The context of the constraint is important

Description Set Profiles - Tom Baker

(slides, paper, report summary)

tbaker: [Gives background on DC.] Application Profiles date from 2000
[on slide 6] ... Looks more like a record format

<kcoyle> [on slide 8] description set document

slide 10 shows same data in XML

tbaker: So can we validate the extracted data
... Defined a small set of constraints that we saw being used in the DC community in their app profiles
... being produced as natural language text

tbaker: [on slide 13] Just flash this up - it's the entire set of templates defined in the description set profile constraint language

tbaker: [on slide 16] The motivation was to help people author application profiles in a consistent way
... here's a screenshot from an experiment that sadly no longer exists although there is some Python code I can share
... it shows a tabular presentation of a profile - a style people are used to
[on slide 17] ... constraints are being embedded in the source of the wiki page in a controlled way
[on slide 19] ... vision was that the profile could be used to configure editors as well as validators
[on slide 20] ... We found that people were designing APs without looking at functional requirements
... so this is an attempt from 2007 to put the APs in context
... the yellow box is the AP - a set of documentation about the content of your metadata
... you can also document the domain model it was based on

<gjiang> ... distinction among foundation standards, domain standards, application profile

tbaker: we had some syntax definitions based on the abstract model
[on slide 21] ... I'm really offering this as a set of requirements that were gathered in the DC community up to 2008
[on slide 22] ... we wanted to encourage people to base their APs on functional requirements
[on slide 23] ... wanted to encourage people to model reality but with a light touch
[on slide 24] ... then we wanted to constrain the data - important for consistency and quality control
... bridging the gap between people who see the world as a series of records and those who see unbounded graphs
... record people, used to XML, just saw it as the latest validation syntax. Some APs were then written as OWL ontologies. Wanted to get people to constrain the data, not the vocabulary (scribe note - hope I got that right)

PhilA: [on slide 26] +1 to the 'Authored in an idiom usable by normal people' requirement

tbaker: Before questions - can I ask kcoyle to comment? Anything to add?

kcoyle: My only comment is that I've been doing a back of the envelope on what we have and do not have in DSP language
... when the requirements are completed, what we might want to do is to look at the existing languages and techniques and see which ones cover what
... my gut feeling is that there may not be a single solution because diff communities have diff contexts

Arnaud: I hope you'll be able to join us after lunch as that's when we'll step back from the reqs and look at use cases, diff technologies etc. whether they match or not
... challenge in standards is always to decide on the use cases
... that's all for after lunch

TimCole: Thinking about APs.... XML Schema always seems pretty powerful. Does anything on DSP provide any guidance on how we might make a language from what we have?
... One application might ask for foaf:name, another might want foaf:givenName and foaf:familyName - can I define a constraint doc in some way so that I can add an extra requirement?

tbaker: We refer to a specific DSP, or set of them - they're cookie cutters for data
... in the example from FOAF - those distinctions are defined in the FOAF vocab - the DSP would say what to use but I don't see how that eg would impact the design of the constraint language itself

TimCole: You've defined a profile with lots of things and I want to change one thing. Do I have to repeat the whole thing or can I just define the difference?

tbaker: We did discuss having a layered approach so people can define a basic profile and then just add a layer on. So that's in the same thought process but we decided not to solve that

Arnaud: Anything else?

Experiences with the Design of the W3C XML Schema Definition Language - Noah Mendelsohn

(slides, paper, report summary)

Arnaud: Noah was involved in XML Schema and so he's here to share his experiences of that

Noah: We went through a lot of things when designing XMLSchema - it has a lot right and a few problems

Noah: [on slide 3] These topics match those in the paper
[on slide 4] ... People came with very different assumptions and ideas and diff ideas about validation
... some thought the idea was to end up with a Boolean
... others wanted to say more
... some people wanted to know that data matched a type and why (data binding)
... following the 80/20 rule is good but one person's 80 is another's 20
[on slide 5] ... discussed diff between validating doc as a whole or at the element level
... RDF folks better at idea that serialisations are diff versions of same abstract model. That doesn't work so well for all XML folks
[on slide 6] ... No surprise that XML folks write their schemas in XML
... It's possible that there were better ways of encoding a schema

<Arnaud> Noah is talking about the example on page 3 of his paper

Noah: So the warning is - don't automatically write your schemas in RDF
[on slide 9] ... Anticipate versioning
... you're likely to need an answer
... people find that their previous work needs updating. May need to reinterpret something
... how do you write a schema on day 1 such that if I get something different you can handle that the supplier might be using a later version of the schema, or even just providing data that is correct and it's the schema that is in error. Do you throw out the whole thing or do you break it down at the element level and highlight the 'error'

ericP: Drills down a little.

<mgh> In GS1 standards that provide XSD artefacts, we use this mechanism to represent an extension point (wildcard) <xsd:any namespace="##other" processContents="lax" minOccurs="0" maxOccurs="unbounded"/>

Noah: Point is that how to handle such cases is essentially app-specific

ericP: Did you consider creating a compact syntax?

Noah: I guess you'll want your abstract model to map to RDF - you're used to that
... we do have the abstract model for XML, it's there.

Arnaud: Thank you for coming Noah

<tbaker> Thank you, Noah!

<kcoyle> could someone post here when things start up again, for those of us on the phone? thx

Arnaud: you're touching points that have been raised

Lunch - will be 25 minutes

<Anamitra> scribe-Anamitra

<dbs> aside: thanks to everyone for being so good to us remote attendees :)

<kcoyle> it's hard to hear - we may need some structure to be able to get participation of the phone people

<SteveS> Scribe: Anamitra

Next Steps

(report summary)

<Zakim> PhilA, you wanted to talk about queuing

<kcoyle> +1 dbs -- keeping track of the slides is a big help

up next Alignment of requirements and technology

Arnaud: questions regarding what we want to do
... capture use cases
... it's not just about validation - it's about describing the resource too

<mgh> XSD can also be used to generate an instance XML document example from an XSD. Do we need that kind of capability? - to generate a set of triples from a description?

PhilA: describe and validation are different

<Zakim> arthur, you wanted to describe scope

harold: [you want to] publish what you expect without going to SPARQL
... if i import data from an RDB with a good model, all i need from our language is to publish the description

<PhilA> +1 to Harold

Arthur: just calling this workshop validation is not accurate

ericP: let's call it validation and description

+1 ericP

<hsolbri> Characterization?

ericP: constraints is not a clear way to describe a resource

<Zakim> kcoyle, you wanted to ask about defining description

<ericP> kcoyle: when we talk about description, is it description for validation, or do we have a broader view of description?

kcoyle: there are certain aspects that are just description without any validation aspect

hsolbri: we need something that does not imply process

<TimCole> To provide scope, do we want though to focus on descriptive aspects that support validation?

hsolbri: test cases for RDF, and for software that produces RDF, should be considered

Arnaud: Resource shape serves dual purpose of describe and validation

<Zakim> evrensirin, you wanted to comment about being careful about descriptions

evrensirin: define the scope - main goal validation - side goal is describe the resource

<Zakim> PhilA, you wanted to talk about the likely new CSV on the Web WG which, in some ways, is closely related

gjiang: low level user should be able to define the constraints like UML

<PhilA> -> CSV on the Web http://www.w3.org/2013/05/lcsv-charter.html

<hsolbri> gjiang: we may need an OCL for the description language as well

philA: similar to csv metadata - like headers and data type

<Zakim> arthur, you wanted to discuss how resources can use existing vocabularies in a novel way

arthur: we need to describe resources/documents - you can describe that without inventing any new RDF terms
... we should avoid inventing vocab terms if we can
... and re-use as much we can

<Zakim> hsolbri, you wanted to say CSV is on our radar itself. We started with UML / XML Schema, need to produce RDF equiv and CSV

hsolbri: omv - schema for describing ontology - modeled in RDF
... started with UML
... UML->XML schema
... we need to be able to exchange constraints between different modeling frameworks - UML, RDF

<PhilA> +1 to hsolbri

Ashok_Malhotra: UML is useful - let's focus on just RDF validation - and then build tooling later for covering exchange between models - keep the scope small

Arnaud: can define a transformation from csv to RDF and then validate using the RDF validator

<Zakim> hsolbri, you wanted to rebut

hsolbri: the UML and XML Schema communities have already done the groundwork - let's start with that - as relevant to RDF

sandro: there is too much mismatch between these models

hsolbri: RDF type is analogous to UML class, and UML attribute to RDF predicate

<Zakim> arthur, you wanted to say UML has a different perspective

arnaud: guided by UML - makes sense

arthur: fundamental mismatch between UML and RDF
... RDF class is a classification - a resource can have many classifications
... UML and RDF have an intersection - so you can do an OO model as RDF - but not the other way

hsolbri: lossy in both direction

arthur: OO is about info hiding

<Zakim> kcoyle, you wanted to caution about starting with UML or XML or ??

kcoyle: agree with Arthur -
... UML and other models come with baggage

<Zakim> ericP, you wanted to say that it's probable that the info that we care about for shape/pattern description is largely covered by UML

SteveS: UML has evolved

sandro: we should have a way to produce the RDF constraints as UML diagrams

arthur: ER diagrams precede UML

<Zakim> hsolbri, you wanted to change the subject.

<kcoyle> SteveS: flow-charting

<kcoyle> can't we start as a community group?

<Zakim> DavidBooth, you wanted to say I think it would be helpful if we roughly ranked our use cases and requirements

arthur: we need to plan - have at least 2 stages
... stage 1: extremely simple spec - then follow that up with stage 2

Ashok_Malhotra: easy declarative stuff for 80% of stuff - and the SPARQL for rest of it

<Zakim> ericP, you wanted to ask if the description and validation of the issue tracking document in http://www.w3.org/2012/12/rdf-val/SOTA#ex seems useful to all of us here

+1 Ashok_Malhotra

<kcoyle> sandro: start with a spec, get all of the right people in the room

<Zakim> hsolbri, you wanted to ask eric a question about pushback

hsolbri: do we have a political issue for validating RDF -

<TimCole> +q Reaction may depend on definition of validation.

sandro: consumers need to know about what they are consuming - that argument works - as opposed to a triple store needing that info

<Zakim> TimCole, you wanted to suggest that reaction may depend on definition of validation

<SteveS> Based on past discussions within context of LDP: I think Tim and Henry see the need/motivation for this thing we called validation

TimCole: want to stay away from just a binary result - valid or not - give information about the result

_: the simple declarative format will lend itself to autogenerate SPARQL
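
As a hedged illustration of generating SPARQL from a declarative constraint, the Python sketch below turns a minimum-cardinality rule into a query that lists the violating resources. The function name and the example IRIs are hypothetical, and the sketch does no IRI escaping; it only shows the shape such a generator might take.

```python
def min_count_query(class_iri, property_iri, minimum):
    """Build a SPARQL SELECT that lists resources with too few values."""
    return (
        "SELECT ?s (COUNT(?o) AS ?n)\n"
        "WHERE {\n"
        f"  ?s a <{class_iri}> .\n"
        # OPTIONAL keeps resources with zero values in the result set.
        f"  OPTIONAL {{ ?s <{property_iri}> ?o }}\n"
        "}\n"
        "GROUP BY ?s\n"
        f"HAVING (COUNT(?o) < {int(minimum)})"
    )


query = min_count_query(
    "http://example.org/Bug", "http://purl.org/dc/terms/title", 1
)
```

The declarative input stays readable by tools that build forms or documentation, while the generated query is what a SPARQL engine would actually run.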

<Zakim> evrensirin, you wanted to comment on simplicity

ericP: if simple format is not able to define something - we will need to re-look as to whether we can improve it to cover that

<kcoyle> can't hear - pls scribe! thx

Arthur: disjoint constraint can be added to resource shape
... should be driven by use cases

Ashok_Malhotra: do we have people who would like to start off this spec?

Arthur: I would

Arnaud: is it a requirement to make this language RDF

<kcoyle> DCMI can offer the constraints in DSP - Arthur, I will do that

ericP: the primary language should be RDF

<kcoyle> +1 needs to be demonstrable

<tbaker> +1 agree that should be representable, not necessarily represented, in RDF - who am I agreeing with (is this being scribed)?

hsolbri: description should exist in SPARQL query form

arthur: the more declarative the language is - the easier it is to define in RDF

<Zakim> evrensirin, you wanted to talk about RDF representation

evrensirin: at least have a way to specify SPARQL as a literal in the constraint language

Ashok_Malhotra: schema for schemas never worked

<Zakim> ericP, you wanted to say that the requirement we're discussing is whether the expression in RDF is *interoperable*

ericP: interoperable RDF representation

hsolbri: represent in RDF as much as possible - should be able to publish a standard representation form

<kcoyle> pls scribe

<tbaker> didn't catch the point about SKOS - or who was talking...

<arthur> Eric was talking about SKOS

<tbaker> thnx

<Arnaud> scribe: TimCole

Can we define next steps

Do we agree that a Working Group should be formed to make a new declarative language with fall back to SPARQL

To speed things along we should start from a preliminary spec? Who would do this?

<kcoyle> I offered DSP structure and constraints to Arthur

Candidates: ResourceShape, Shape Expressions, DSP

Arthur: There will have to be a call...

<roger> can you send a link to your "shape expressions" information please Eric ?

<tbaker> wondering whether there is consensus that a working group is needed as opposed to a community group (as Karen suggested)

Arnaud: The working group will be chartered to use a spec as starting point, but WG can throw the spec out and start again.

evrensirin: Could the WG start with multiple specs?

Arnaud: There are IP issues which make this approach more difficult.

<Zakim> PhilA, you wanted to talk a little about W3C process

PhilA: To get a WG chartered, need bums on seats

<kcoyle> TimCole: :-)

<Zakim> SteveS, you wanted to talk about doing joint submission

SteveS: Submission (of starting spec) can be collaborative.

Arnaud: Charters need to be approved by W3C mgmt, and then by members.
... A draft charter is developed on mailing list. Responses feed the process of moving the charter forward.
... If interested in submitting a spec to serve as starting point, need to submit to W3C to clear IP issues.
... process takes a few months. PhilA: at least.

TimCole: Do we need to do any winnowing or prioritizing of list developed yesterday?

<kcoyle> TimCole: list needs a fair amount of work

Arnaud: Have we done enough for now? Let Chairs move forward, form mailing list, etc.
... Or we can work a little longer?

evrensirin: We need to say a little more about needs and priorities

<ericP> SOTA ex

ericP: The example implies some things about expressivity and interface

evrensirin: Wants to talk more about higher level aspects of use case. Who's this for?

ericP: Wants to keep concrete though. Not too hi-level.

<Zakim> tbaker, you wanted to ask if a WG is really needed. Why not a Community Group?

tbaker: Given lack of really strong agreement on task needed, do we want to start with a Community Group

Arnaud: Community Groups are recent. More of a forum to work together. No resources or formal endorsement by the W3C.

<Zakim> PhilA, you wanted to answer Tom

Arnaud: At best CG creates a spec which would need to be submitted, go through a WG, and then be ratified.

PhilA: Some commercial entities reluctant to implement a CG spec.

Arnaud: Some success stories, but although the startup is faster, it's not really faster in the end.

PhilA: if we can get a WG charter that tends to be better

<tbaker> +1 depends on how mature the concept is, and easier to involve people with CG

arthur: I think this is a mature area, and so appropriate for a WG

Commitments

Arnaud: Going back to having people commit. Do we have a critical mass?
... Who here would commit?

<arthur> +1

<hsolbri> +1

<ericP> +1

<roger> +0.6

<labra> +1

<evrensirin> +1?

<kcoyle> ~1 (unsure)

<nmihindu> +1

<tbaker> +0.6

<ssimister> 0.5 not sure yet

<Ashok_Malhotra> 0

<mSkjaeveland> 0

<mesteban> +0.5

<mgh> +0.5 not sure yet

TimCole: harder to join WG if your institution not part of W3C
... -1 since Illinois not part of W3C

<ddolan> +0.5

<SteveS> +0.1 I will participate through Arthur/Arnaud, definitely support it

<Ashok_Malhotra> Community Groups cannot create standards

tbaker: Not sure we are really ready to write a good charter yet.

Arnaud: Let's get back to what problem we're trying to solve.
... Let's focus on the use cases.

<Arnaud> http://piratepad.net/E255z6M73S

<kcoyle> pad has requirements, but not use cases. need to gather use cases

<kcoyle> most of the talks represented one or more use cases

moving to pirate pad now.

<Zakim> dbooth, you wanted to suggest roughly prioritizing use cases and requirements

<SteveS> How about a X day effort to build the list of requirements and/or use-cases, then Y day effort to prioritize them (using a surveying tool)?

<Zakim> dbooth, you wanted to say it is important to be able to apply different schemas to the same datasets

See also

RDF Validation Workshop

Practical Assurances for Quality RDF Data
