Attendees
- Present
- +1.617.715.aaaa, dbs, DaveReynolds, aisaac, +1.510.435.aabb, Workshop_room, +1.510.435.aacc, +1.510.435.aadd, +1.510.435.aaee, kcoyle
- Chair
- Arnaud Le Hors and Harold Solbrig
- Scribe
- sandro, arthur
Contents
- Topics
- Introductions
- State of the Art
- Presentation from Mark Harrison (U Cambridge)
- Requirements for RDF Validation - Harold Solbrig
- RDF Validation in a Linked Data World - Esteban-Gutiérrez
- Lightning Talks
- Requirements Discussion
- Guoqian Jiang presentation - Mayo Clinic
- Simple Application-Specific Constraints for RDF Models - Shawn Simister
- Tim Cole - Using SPARQL to validate Open Annotation RDF Graphs
- Further requirements Discussion
Introductions
Miguel Esteban Gutiérrez <mesteban> | Center for Open Middleware (Universidad Politecnica de Madrid) |
---|---|
Jose Labra presentation <labra> | My name is Jose Emilio Labra Gayo (University of Oviedo, Spain). I am interested in this workshop because we have a practical use case on the WebIndex and we have used a tool based on SPARQL queries, called Computex, to validate RDF. We are also interested in RDF profiles |
Graham Rong <GR> | PhD, from MIT has been working on semantic web application in financial industry ... http://bit.ly/RxzPyr Linking XBRL to RDF: The Road To Extracting Financial Data For Business Value |
Sandro Hawke <sandro> | W3C. Staff contact for RDF-WG, GLD-WG, and was for SPARQL, RIF, OWL, Prov |
Roger Menday <roger> | Fujitsu Laboratories of Europe. Working on using Linked Data technologies in the Enterprise |
Guoqian Jiang <guoqian> | Mayo Clinic, Rochester MN. I am a clinical informatics researcher. My research interests focus on clinical data standards and using semantic web tools for data validation and quality assurance in the health domain. |
Harold Solbrig <hsolbri> | Mayo Clinic. Focus on Ontologies in clinical research and standardized ontology representation. Editor and author of OMG LQS specification, HL7/ISO Common Terminology Services (CTS) and OMG CTS2. Participant in ISO 11179 and XMDR projects, IHTSDO SNOMED CT, WHO ICD-11 project. |
David Booth <DavidBooth> | KnowMED. Applying RDF and other semantic web technology to medical records and other healthcare information to facilitate better research and help measure quality of care. |
Ashok <Ashok_Malhotra> | Oracle. Member of LDP WG. Worked on XML Schema for many, many years! |
Martin G. Skjæveland <mSkjaeveland> | PhD student from University of Oslo, Norway. Will present work on validating incoming RDF data based on what is in the receiving dataset. |
Arthur Ryman <arthur> | IBM Rational, developed OSLC Resource Shape spec to fill the void where XML Schema lived, for documenting and specifying REST APIs for Linked Data |
Robert Beideman <rmb> | GS1: Leveraging RDF and LOD to facilitate availability of trusted, authentic data about Products, Companies, and Services on the Web |
Mark Harrison <mgh> | Auto-ID Lab at the University of Cambridge. We have a close collaboration with GS1 in the development of technical standards for supply chain visibility, traceability and electronic pedigree and we've recently been involved in the GS1 Digital project, which is looking at ways to use Linked Open Data for products |
Anamitra <Anamitra> | IBM/Maximo: RDF data introspection |
Evren Sirin <evrensirin> | Clark & Parsia. We develop the Stardog RDF database, which provides RDF validation capabilities |
Arnaud Le Hors <Arnaud> | ("Arno Luh Oarss"), IBM Linked Data Standards Lead, chair of the LDP WG and of this workshop (former W3C Team member :-) |
Tim Cole <timCole> | Univ of Illinois and W3C Open Annotation Community Group |
Steve Speicher <SteveS_> | IBM SWG Rational: LDP Editor: OSLC community/standards, I work with arthur |
Dave Reynolds <DaveReynolds> | Epimorphics Ltd. Part of GLD working group co-editing Data Cube and Org specs. Among other things work with UK public sector on use of Linked Data which has raised a number of validation-like requirements. |
Antoine Isaac <aisaac> | from Europeana: previously working on SKOS. Interested in getting good quality data from numerous, heterogeneous datasets |
David Dolan <ddolan> | from Cape Mobile Tech's |
Jim McCusker | from RPI: Biomedical Semantics; interests are data and provenance interoperability in life sciences |
Paul Davidson | Chief Information Officer, Sedgemoor District Council, UK |
Bob Morris | from University of Massachusetts Boston/Harvard Herbaria |
David Lowery | from Harvard University - Museum of Comparative Zoology |
Phil Archer | from W3C. working with government linked data in the UK |
Noah Mendelsohn | from Tufts University: XML Schema guru/historian |
State of the Art
<Ashok_Malhotra:> When we started RDF, folks said it was great BECAUSE it had no schema. Are we changing our mind?
<Arnaud:> Sounds like JSON :-)
<hsolbri:> The schema is there whether you write it down formally or not.
<DavidBooth:> There are lots of different schemas in RDF. The beauty of RDF is the ability to combine them.
<arthur> PDF version of my charts at http://www.w3.org/2001/sw/wiki/File:OSLC_Resource_Shapes.pdf
Presentation from Mark Harrison (U Cambridge)
(slides, report summary)
Robert: GS1: we did bar codes. We work with the Auto-Id Labs (started here at MIT)
... GS1 digital, trying to leverage all the master data in the supply chain, business-to-consumer
Mark: (slide with iPhones, LOD for products, Pre-Sale)
... more informed choices, eg products with particular environmental impact
[on slide 9] ... do we want broken hyperlink checking?
... (can we validate offline)
... what is the scope/boundary of what we validate?
... When we have these huge code lists, the scale of validation queries might be problematic
... 3000 attributes, hundreds of which are code-list-driven
<hsolbri> Focus on markup and validation tools rather than the actual validation
<DaveReynolds> +1, publishing and inspecting the contract is at least as important as enforcing the contract
hsolbri: Happy to see these use cases. I think RDF "validation" is not the best framing. I think it's MORE important to publish the characteristics of what's in a store, rather than just validating.
arthur: This sounds a lot like what we've done at IBM. Can you describe....
mark: It's about making sure you can ...
... We need to make sure the two datasets are in sync with each other.
... You need to have confidence that these are the true values asserted by manufacturer.
... Maybe we could use digital signatures. There's liability to consider.
arthur: you're comparing published data with reference data. You don't need to compute a sig
mark: true, we could use prov as an alternative to sigs
<arthur> GS1 use cases very similar to OSLC, except for digital signatures
timcole: The issue cardinality, not validations. Value is correct... unit transformations. 600g = 1.2lbs or whatever. Are you encompassing that in validation?
mark: Yes.
... like in eric's example of reproducedOn date -- you want to do checking like that, with units conversion
... EU legislation says vitamins are expressed in certain units. Sanity checking on values -- to make sure we're not off by orders of magnitude
timCole: Does broaden the scope.
mark: Yes.
Robert: We used to have a closed network for this. To open it to millions of producers makes this more complex.
Ashok_Malhotra: If you want to test whether this date follows this other date, there are xquery functions to handle all of that stuff. So we can just pick them up. We don't have to invent them again
mark: We should leverage what we can, yes.
... And using qudt for conversion of units, and so on.
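A minimal sketch of the kinds of checks discussed above (unit-aware comparison, order-of-magnitude sanity checks, one date following another). The conversion table, function names, and thresholds are illustrative assumptions, not anything presented at the workshop; a real system might take conversion factors from QUDT and date comparison from existing XQuery functions, as suggested.

```python
from datetime import date

# Hypothetical conversion factors to grams (assumption for illustration only).
TO_GRAMS = {"g": 1.0, "kg": 1000.0, "mg": 0.001, "lb": 453.59237}


def to_grams(value, unit):
    """Convert a mass to grams using the illustrative table above."""
    return value * TO_GRAMS[unit]


def masses_agree(a, b, tolerance=0.01):
    """True if two (value, unit) masses agree within a relative tolerance."""
    ga, gb = to_grams(*a), to_grams(*b)
    return abs(ga - gb) <= tolerance * max(ga, gb)


def within_magnitude(value, unit, low_g, high_g):
    """Sanity check: is the mass inside an expected range given in grams?"""
    return low_g <= to_grams(value, unit) <= high_g


def follows(later, earlier):
    """True if `later` does not precede `earlier`."""
    return later >= earlier
```

For example, `masses_agree((600, "g"), (1.32, "lb"))` holds under the 1% tolerance, while a value entered as 80 g where 80 mg was meant would fail a `within_magnitude` check with an upper bound of one gram.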
Requirements for RDF Validation - Harold Solbrig
hsolbri: [re: ASN1] we had "strings" which were kind of like rdf graphs. a ptext code was a sort of ontology
<guoqian> hsolbrig: from ptxt to ASN.1
hsolbri: RDF only guarantees triples, literals
... With SPARQL, you have to code EVERYTHING as optional!
<mgh> In SPARQL need to use OPTIONAL extensively for defensive coding in case value is not present
hsolbri: ... which is NP
... Side note: Dataset (identity is content), Triple store (Identity separate from content)
<guoqian> hsolbri: a definition about what is RDF store
hsolbri: We should focus on the invariants in an RDF store. The syntax MUST provide a way to state the invariants. What will always be true of this store, so when you're writing queries, you know what's optional, what can be in there, what can't be in there.
... We need a way for them to be published, and for them to be discovered.
... Future -- invariants will change over time.
<guoqian> hsolbri:RDF validation must provide a standard syntax and semantics for describing RDF invariants
hsolbri: Semantic Versioning. semver.org
... That was the MUST. Here's the SHOULD.
... representable in RDF, maybe also a DSL
... formally verifiable, consistent, maybe complete
... self-defining
... able to express a subset of UML 2 class and attribute assertions (and some OCL?)
... able to express XML Schema invariants
... implementable in existing tooling and infrastructure (RDF, SPARQL, REST, ...)
hsolbri: [slide 17] Example of allowed transitions -- you're allowed to add subjects, but not to add predicates.
... spectrum from read-only to write-any-triple.
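A minimal sketch of one point on that spectrum, assuming a store is modelled as a plain set of (subject, predicate, object) tuples: new subjects may be added, but a change that introduces a predicate not already present is rejected. The function name and the decision to ignore deletions are assumptions for illustration.

```python
def transition_allowed(before, after):
    """Allow a change only if it adds no predicate unseen in `before`.

    Both arguments are sets of (subject, predicate, object) tuples.
    New subjects are fine; new predicates are not. Deletions are ignored here.
    """
    known_predicates = {p for (_, p, _) in before}
    added = after - before
    return all(p in known_predicates for (_, p, _) in added)
```

A read-only store would instead require `after == before`, and a write-any-triple store would accept every transition.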
<guoqian> hsolbri: LOD today OK for research but not for production systems
<guoqian> ... OK for relatively static stores but not for federation and evolution
<aisaac> Question for Harold: Just checking, when you say "All constraints of XML Schema", this includes sequences?
guoqian: You're offering another definition of "store". Is this different from existing defn of named graphs?
hsolbri: I'd have to go back and look at that. I think Named Graphs are local to a quad store. And I'm focusing on having the identity of a store, but having the contents be constrained.
<sandro:> as i understand SPARQL11 terminology, a "graph store" can have multiple "states"
... so you're talking about a particular graph store to only contain certain datasets
Arnaud: people use the term "graph" sometimes to mean something mutable or not, gboxes and gsnaps.
hsolbri: "magic box" was a term we once used.
aisaac: I heard Harold say he wants to represent all that's allowed by XML Schema. Does that include Sequence Information?
hsolbri: Great question. There are situations where people take advantage of order, but this may be a drawback. so, maybe MOST of XML schema. The challenge is how to get it back out in the right order....
Arnaud: We have on the agenda a presentation from Noah Mendelsohn, to talk about XML Schema, warning us against reproducing some of their mistakes.
... Some people will say 20/80 rule, but which 80?
Arthur: Your summary slide was a bit disappointing/negative.
hsolbri: I believe fixing this is necessary to make RDF able to be a primary source for content.
arthur: I consider your second negative to be a positive. It's why we've adopted RDF. Traditional data warehouses are very expensive because they completely enforce the schema. RDF allows more graceful evolution.
hsolbri: So, the flexibility of RDF is seen as a real advantage. A fellow at OMG used to distinguish between precise and detailed. We publish the invariants that are known, but it's important to be able to leave flexibility. If we make no assertion about firstname and lastname, then that's important to know, too.
evrensirin: Graceful evolution of data is an advantage of RDF. That's not about enforcement of schema, but about having the option to not have a schema.
... Clarification on post-conditions. State transitions, or states?
hsolbri: Closely related to reasoning. If you're doing anything beyond a basic PUT, adding a triple to a store may involve doing additional inferences, eg adding a firstname may result in the presence of a fullname in a store.
... what has to be true for this set of rules to fire; what is true if they do.
RDF Validation in a Linked Data World - Esteban-Gutiérrez
(slides)
[discussion of dynamics in validation not captured]
Linked Data Profiles - Paul Davidson
Paul wants a "Linked Data Profile" that describes the properties, values, etc., that should be used so that multiple councils in England can share data
<sandro> +1 Paul Davidson, make it easier to share municipal data
Forms to direct interaction with Linked Data Platform APIs - Roger Menday
Roger: described use of REST APIs at Fujitsu
Roger: participating in LDP activity
... need to describe parameters to create resources (Progenitor)
... use case: enable robots to fill in forms
... proposed a vocab (f:parameterSet ...) to be included in an LDP container
Europeana and RDF data validation
(slides, slideshare, report summary)
Antoine: aggregates data from multiple sources (museums) and needs to enforce constraints
... described as table: property, occurrence, range
... using OWL now
... EDM is implemented as XML Schema (for RDF) with Schematron rules
<dbs> EDM = Europeana Data Model
Antoine: Also using Dublin Core Description Set
... OWL = hard, SPARQL = low-level
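A minimal sketch of how a constraint table with the columns mentioned above (property, occurrence, range) could be checked directly, as a middle ground between OWL and hand-written SPARQL. The property names, the table contents, and the tuple-based data model are assumptions for illustration and are not taken from EDM.

```python
# Hypothetical constraint table: property -> (min, max, range check).
# max=None means unbounded. Property names are illustrative, not taken from EDM.
CONSTRAINTS = {
    "dc:title": (1, None, lambda v: isinstance(v, str)),
    "ex:year": (0, 1, lambda v: isinstance(v, int)),
}


def check_resource(subject, triples, constraints=CONSTRAINTS):
    """Return a list of human-readable violations for one subject."""
    problems = []
    for prop, (lo, hi, in_range) in constraints.items():
        values = [o for (s, p, o) in triples if s == subject and p == prop]
        if len(values) < lo:
            problems.append(f"{prop}: expected at least {lo}, found {len(values)}")
        if hi is not None and len(values) > hi:
            problems.append(f"{prop}: expected at most {hi}, found {len(values)}")
        problems.extend(
            f"{prop}: value {v!r} out of range" for v in values if not in_range(v)
        )
    return problems
```

Keeping the table as data means the same description can be published for humans and executed for validation.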
Thoughts on Validating RDF Healthcare Data
<guoqian> -- Schema promiscuous: why RDF?
<aisaac> Bye folks. It was a great morning. Enjoy the rest of your day, and thx a lot for the slide moving!
dbooth: multiple schemas, multiple data sources
... ==> need multiple perspectives on validation of the same data
... wish list: build on SPARQL,
... use SPARQL UPDATE to build intermediate results (instead of one giant SPARQL query)
... check URI patterns
... must be incremental so you can do it continuously, e.g. like regression testing
... declarative is too awkward for complex rules ==> need operational (imperative): SPARQL UPDATE pipelines
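A minimal sketch of the pipeline idea in plain Python rather than SPARQL UPDATE: each step adds intermediate triples to a working copy, and a later step uses them to check URI patterns. The URI pattern, the `check:` predicates, and the function names are hypothetical; the input data is never modified.

```python
import re

# Hypothetical URI pattern; the workshop did not specify one.
PATIENT_URI = re.compile(r"^http://example\.org/patient/\d+$")


def tag_patients(triples):
    """Step 1: materialize an intermediate result, like an INSERT in SPARQL UPDATE."""
    patients = {s for (s, p, o) in triples if p == "rdf:type" and o == "ex:Patient"}
    return triples | {(s, "check:isPatient", "true") for s in patients}


def flag_bad_uris(triples):
    """Step 2: use the intermediate triples to flag subjects with malformed URIs."""
    tagged = {s for (s, p, o) in triples if p == "check:isPatient"}
    return triples | {
        (s, "check:error", "bad URI") for s in tagged if not PATIENT_URI.match(s)
    }


def run_pipeline(triples, steps=(tag_patients, flag_bad_uris)):
    """Apply each step to a working copy; the input set is left unchanged."""
    working = set(triples)
    for step in steps:
        working = step(working)
    return sorted(s for (s, p, o) in working if p == "check:error")
```

Because each step is small and ordered, the pipeline can be re-run after every change to the data, in the spirit of regression testing.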
Validate requirements and approaches - Dave Reynolds
DaveReynolds: currently working with UK gov: multiple vocabs, manual docs, each publisher validates their data
... need a shared validation approach: need to specify "shape" of data
... declarative rules are desirable
... understandable by "mortals"
<hsolbri> Interesting: does Reynolds's declarative requirement clash with Booth's procedural?
DaveReynolds cites W3C Datacube vocab
<DavidBooth> Harold, I think it depends on the complexity of the validation check. If it can be expressed in a simple declarative rule, then that is easiest. My point is that for more complex checks, operational is needed.
DaveReynolds: SPARQL used to express Datacube integrity constraints
... SPARQL queries hard to understand
... for irregular data, OWL is also too hard
<guoqian> need ability to validate against external services such as registries
DaveReynolds: need to specify controlled terms too
Requirements Discussion
Arnaud framing discussion: what do we need? What can we afford?
Harold: compare need for procedural steps versus declarative constraints
... must declarative description also be executable (for validation) e.g. by translation to SPARQL
... e.g. in many cases, the datastore content is already valid, so the missing capability is to advertise what's in a store
David: desirable to have high-level specification that is translatable to an executable language (SPARQL)
Arnaud: use the IRC queue system "q+" to get on queue
<DavidBooth> David: Want the best of both worlds: declarative when a constraint can be easily expressed that way, while allowing fall back to SPARQL when necessary. So to my mind the ideal would be declarative *within* the SPARQL framework.
<Zakim> ericP, you wanted to discuss XML Schema/RNG + schematron
Dave: SPARQL is too low level: need high-level description
Eric: uses multiple schema languages: XSD, RelaxNG, Schematron
... we'll probably have a high-level validation language that is extensible with low-level rules in SPARQL, JS, etc
<hsolbri> UML has Class, property and OCL (schematron equivalent)
Evren: SPARQL has extension points. Concern about SPARQL UPDATE since it changes data
David: didn't imply to actually change data
Tim: OWL wasn't developed for validation, SPARQL wasn't developed for validation: why not have a language without baggage
Harold: we should be informed by UML
Ashok: should split up problem, 1) state, 2) structure, 3) constraints
Arnaud: perspectives are 1) validation, 2) description
Eric: description should be translatable to SPARQL, SPIN, whatever
<Zakim> hsolbri, you wanted to say if it isn't compatible, I think we need a good justification as to why.
Eric: cites Stefan Decker proposal to translate description into SPARQL
Harold: cites project to translate UML -> Z -> SPARQL
<Zakim> evrensirin, you wanted to talk about what we can afford with sparql translation
<guoqian> hsolbri: working on translating from UML to Z to SPARQL
Evren: translation is good implementation strategy, but not for state transitions
<Zakim> ericP, you wanted to say that coverage of all triples may be tricky in SPARQL
David: use multiple graphs or datasets to describe pre/post conditions
<Zakim> labra, you wanted to talk about RDF profiles
Labra: describes work on RDF validation based on profiles
... like Schematron, using SPARQL instead of XPath
<Zakim> hsolbri, you wanted to say proposed requirement - invariants (and rules?) expressible in RDF
Harold: SPARQL not using RDF (unlike SPIN) - we should require an RDF representation
<guoqian> hsolbri:SPARQL should be able to be defined in RDF with meta data
Evren: SPIN is going to allow a literal string of SPARQL
Harold: don't want to parse another grammar
Evren: SPIN has both - RDF based and literal SPARQL string
<Zakim> ericP, you wanted to ask if the expressivity of SPIN in RDF is of operational value
Evren: what is the value of the RDF representation of SPARQL in SPIN? Is this just for query governance?
Harold: RDF is useful for impact analysis
<Zakim> DavidBooth, you wanted to say I think a main reason for the RDF-based SPIN syntax is the ability to change namespaces in the query
Steve: need to also see why validation fails
<guoqian> hsolbri: meta-repository may be an argument for RDF validation
<Zakim> DavidBooth, you wanted to say one thing I particularly like about SPIN CONSTRUCT rules is the ability to attach arbitrary data to a validation error
Harold: metadata merging is important so RDF is useful in that use case
David: SPIN CONSTRUCT rules allow attachment of other data
<SteveS> I'd like the validation results to not only provide a useful message that a tool could possibly recover from, but also the context, such as the triples causing the problem and the rules that cause it (some guidance on how to become valid would be helpful)
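A minimal sketch of a validation result that carries the context asked for above: the rule that fired, a message, the offending triples, and a hint on how to become valid. The `Violation` class, its field names, and the `require_property` rule are hypothetical and are not the SPIN vocabulary.

```python
from dataclasses import dataclass, field


@dataclass
class Violation:
    """One validation failure plus the context a tool would need to recover."""
    rule: str                 # identifier of the rule that fired
    message: str              # human-readable explanation
    triples: list = field(default_factory=list)  # offending triples
    hint: str = ""            # optional guidance on how to become valid


def require_property(triples, rdf_type, prop):
    """Report every instance of `rdf_type` that lacks `prop`."""
    instances = [s for (s, p, o) in triples if p == "rdf:type" and o == rdf_type]
    have = {s for (s, p, o) in triples if p == prop}
    return [
        Violation(
            rule=f"require:{prop}",
            message=f"{s} has no {prop}",
            triples=[(s, "rdf:type", rdf_type)],
            hint=f"add a triple ({s}, {prop}, ...)",
        )
        for s in instances
        if s not in have
    ]
```

Returning structured results rather than a bare pass/fail lets a client show the user exactly which resource failed and why.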
Arnaud: need to discuss what is affordable
... need to prioritize what we can do in a 2-year period
... experience shows that developing standards in chartered groups can be brutal [laughs]
End
<arthur> Break for lunch courtesy of W3C
<arthur> check out this w3c spec that contains Z notation http://www.w3.org/TR/wsdl20/wsdl20-z.html
Guoqian Jiang presentation - Mayo Clinic
<hsolbri> [at Slide 8: Architecture] Clinical Element Models converted to XML Schema, Instance data to XML then Schema to OWL and instance to RDF
Guoqian: Slide 11: Check constraints and validate
... Use SPARQL
Eric: Is SPIN generated from Schema?
Jiang: No, by hand ... perhaps in future
Guoqian: Slide 15: Reference Model picture
there is a SPARQL error on chart 16
Guoqian: Slide 16: Data values
... Slide 19: RDF Rendering of Domain Template
... using SPIN in an RDF Form
... Slide 20: Discussion Points
... RDF Validation against CIMI Models
... Challenging issues (data types, value set binding)
... XML Semantics Reuse Technology
<arthur> i don't understand XSD->OWL
<arthur> XSD = constraints, OWL = Inference
Slide 21: Picture showing Technologies and their Relationships
Overlay: BRIDGing Technology
Arthur: How can you translate XML Schema to OWL or UML to OWL?
Eric explains ... they are different but can be used in similar ways
Discussion on translation between UML and OWL, XML and OWL
scribe: constraints and reasoning are just different
<kcoyle> aadd is kcoyle
Q&A
Discussion of constraint checking vs. inference
Arnaud: Are you doing this mapping on Slide 10 or are you thinking of doing this?
... asks about validation at different levels
Harold: This is a vision ...
... MIF is an extension of UML with a higher degree of expressivity
... Effort to translate MIF to OWL [example resulting clinical data]
Simple Application-Specific Constraints for RDF Models - Shawn Simister
ssimister: RDF Validation at Google
... we are triplifying the Web
... What approaches did we consider?
... Schematron, SchemaRama
... SPIN constraints
... nice to be able to have metadata on constraints, like for severity of violations
... OWL Integrity Constraints
... Our Solution ... path-based constraints
... What did we learn
... Most constraints are property paths. SPARQL handles the rest
... constraints describes the app, not the world it inhabits
... Constraints need to be app specific
<arnaud:> how do the constraints get created? do you do it, does the developer?
ssimister: some of each. gmail team had their own internal software with their internal test cases, so it was easy to get them to generate stuff for us.
<guoqian> -- schema.org
<sandro:> surely an app has one set of property paths for what's needed to use the data at all, and another that it might be able to use.
ssimister: we only talk about the required stuff. for one thing, we're trying to not discourage people from providing information we don't happen to use yet.
<sandro:> It would be nice, probably, to still tell folks what data you can use if provided.
ssimister: good idea.
DBooth: Are the paths RDF property paths?
ssimister: No they are not ... very similar
Arthur: Why do you split into context and constraints when you can use a single SPARQL query?
ssimister: The design came from Schematron
<mgh> Seems like a constrained subset of the property paths that can be used in SPARQL 1.1 - not supporting *, + notation
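A minimal sketch of path-based constraints split into a context and a list of required paths, in the Schematron style mentioned above. This is not the Google implementation (which was described as not yet public); the function names and the tuple-based data model are assumptions. Paths are plain sequences of predicates, with no `*` or `+` repetition.

```python
def follow(triples, start, path):
    """Follow a sequence of predicates from `start`; no * or + repetition."""
    frontier = {start}
    for predicate in path:
        frontier = {o for (s, p, o) in triples if s in frontier and p == predicate}
    return frontier


def check_paths(triples, context_type, required_paths):
    """For each node of `context_type`, list required paths that reach nothing."""
    nodes = sorted(s for (s, p, o) in triples if p == "rdf:type" and o == context_type)
    return [
        (node, path)
        for node in nodes
        for path in required_paths
        if not follow(triples, node, path)
    ]
```

Only required paths are checked, so extra triples that the application does not use yet never cause a failure.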
Question about the parser
ssimister: Superset of RDF ...
... not public yet
Using SPARQL to validate Open Annotation RDF Graphs - Tim Cole
Tim: Context: W3C Open Annotation CG
... has 102 members
... narrow and easy use case for RDF
Tim describes the OA data model
Tim: describes the OA Ontology
... LoreStore Annotation Repository
... store, search, query, display and validate annotations
... approach
Bob Morris on FilteredPush RDF Validation
Tim: rules are grouped into RuleSets. All rules in a set must be valid
... the OAD namespace has some extensions to the OA namespace
[Q&A]
Tim: I was happy that most of these topics came up in the more complex cases as well
COFFEE BREAK for 15 Minutes
Requirements List
(discussion pirate pad, report summary)
The group collaborated on a PiratePad, with some extra coordination because PiratePad permits a maximum of 10 simultaneous users.
Minutes formatted by David Booth's scribe.perl version 1.138 (CVS log)
$Date: 2013/10/04 17:30:29 $