HCLS -- 25 Mar 2014

Shape Expressions

Eric: In Sept W3C had an RDF validation workshop. One outcome is a Shape Expressions language that I worked on, a BNF-grammar-like way to describe how an RDF graph should look.
... People tended to use SPARQL for validation, but it's terrible for implementing validation rules as a whole. Like writing a yacc parser by hand.
... People tried hand-tooled SPARQL queries, SPARQL path expressions. OWL with CWA and unique name assumption.
... Another language presented was Resource Shapes, from IBM; and Description Set Profile, from Dublin Core people.

<agray> Description set profile http://dublincore.org/documents/dc-dsp/

Eric: I've been working on Shape Expressions, which draws from those languages. It does conjunctions, disjunctions, optional groups, and more expressivity. Has an algebra: defined semantics such that for any schema and data there's a definite pass/fail answer.
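To make the pass/fail idea concrete, here is a minimal Python sketch of a shape checker with conjunction, disjunction, and optional groups. It is an illustration only, not the ShEx algebra or any real ShEx implementation: the helper names (`arc`, `all_of`, `one_of`, `optional`, `matches`), the `ex:` predicates, and the simplification that a node is a dict from predicate to values are all assumptions made for the example.

```python
from typing import Dict, List, Set, Tuple

Node = Dict[str, List[str]]  # predicate -> object values (simplified RDF node)


def arc(predicate: str, lo: int = 1, hi: int = 1) -> Tuple:
    """Require between lo and hi values for one predicate."""
    return ("arc", predicate, lo, hi)


def all_of(*parts: Tuple) -> Tuple:
    """Conjunction: every part must match."""
    return ("and", parts)


def one_of(*parts: Tuple) -> Tuple:
    """Disjunction (inclusive in this sketch): at least one part must match."""
    return ("or", parts)


def optional(part: Tuple) -> Tuple:
    """Optional group: either absent entirely or matched as a whole."""
    return ("opt", part)


def predicates(expr: Tuple) -> Set[str]:
    """Collect every predicate mentioned anywhere in an expression."""
    if expr[0] == "arc":
        return {expr[1]}
    if expr[0] == "opt":
        return predicates(expr[1])
    return set().union(*(predicates(p) for p in expr[1]))


def matches(expr: Tuple, node: Node) -> bool:
    """Return a definite True/False for any expression and node."""
    kind = expr[0]
    if kind == "arc":
        _, predicate, lo, hi = expr
        return lo <= len(node.get(predicate, [])) <= hi
    if kind == "and":
        return all(matches(p, node) for p in expr[1])
    if kind == "or":
        return any(matches(p, node) for p in expr[1])
    if kind == "opt":
        inner = expr[1]
        absent = not any(node.get(p) for p in predicates(inner))
        return absent or matches(inner, node)
    raise ValueError(f"unknown expression kind: {kind}")


# Hypothetical issue-tracker shape and data.
issue_shape = all_of(
    arc("ex:state"),
    one_of(arc("ex:reportedBy"), arc("ex:reportedOn")),
    optional(all_of(arc("ex:reproducedBy"), arc("ex:reproducedOn"))),
)
complete = {"ex:state": ["ex:unassigned"], "ex:reportedBy": ["ex:user1"]}
half_group = {**complete, "ex:reproducedBy": ["ex:user2"]}
```

Here `complete` passes because the optional group is wholly absent, while `half_group` fails because only half of the group is present.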

<agray> Shape expression http://www.w3.org/2001/sw/wiki/ShEx

Eric: Designed to be similar to Relax NG.

<ericP> http://www.w3.org/2013/ShEx/EvaluationLogic#tests

Eric: Click "Test with cut" or "Test without cut" at the above link.
... The goal is to start a WG on it. I've been working on a submission, with Harold Solbrig. Trying to get some AC reps to sign off and send it in by tomorrow or the next day.
... to have it ready for charter for RDF Validation WG
... Use cases were around FHIR in RDF. We have a mechanical transformation of FHIR to RDF, then using Shape Expressions to convert back to XML.

<ericP> ShEx Primer

<ericP> ShEx Examples

<ericP> GenX example

Eric: and here's a GenX example (above)
... GenX takes an RDF schema and instance data. If you load the above page and hit popup, there's a new window with transformations generated.
... Purpose is to demo the extensibility of Shape Expressions -- you can have your own expressions in {...}
... And the ability to turn the generated RDF back into XML. Because it didn't have 0/1 cardinality, GenX is a much more declarative way to handle this.

David: GenX uses RDF with Shape Expression to generate XML?

Eric: Yes.
... You can also change the schema so that the state is not written as at-state but at-condition or something else.
... In http://www.w3.org/2013/ShEx/FancyShExDemo?schemaURL=test/GenX/schema.shex&dataURL=test/Issue-pass-date.ttl the ShEx on the left is generic ShEx except for the GenX expressions.

<ericP> http://www.w3.org/2013/ShEx/FancyShExDemo?schemaURL=test/Issue-js-test-date-triple.shex&dataURL=test/Issue-fail-date.ttl&colorize=1

Eric: ShEx can be used as a transformation language via the actions.
... In this case the action generated XML.
... Regarding SPARQL, see above URL
... In http://www.w3.org/2013/ShEx/FancyShExDemo?schemaURL=test/Issue-js-test-date-triple.shex&dataURL=test/Issue-fail-date.ttl&colorize=1 it uses JavaScript and SPARQL semantic extensions. When you click "View as Sparql Query" the added functionality gets stuck into the SPARQL implementation to check the order of the dates.
... If you click the link to view as a SPARQL query, that query will validate that the dates are in the right order.
... When you use one of these semantic extensions, whether it is supported is implementation-dependent.
... After the WG works on this, they might add a date validation mechanism, but this illustrates extensibility.
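The date-order check described above can be pictured roughly as follows. This is a Python sketch under stated assumptions, not the JavaScript or SPARQL extension used in the demo: the field names `reportedOn` and `reproducedOn`, the extension name `date-order`, and the choice to silently skip unsupported extensions are all hypothetical.

```python
from datetime import datetime
from typing import Callable, Dict, Iterable

Issue = Dict[str, str]
Handler = Callable[[Issue], bool]


def dates_in_order(issue: Issue) -> bool:
    """Extension logic: the report must not come after the reproduction."""
    reported = datetime.fromisoformat(issue["reportedOn"])
    reproduced = datetime.fromisoformat(issue["reproducedOn"])
    return reported <= reproduced


def validate(issue: Issue, extensions: Iterable[str], supported: Dict[str, Handler]) -> bool:
    """Structural check first, then whichever named extensions this validator supports."""
    if not {"reportedOn", "reproducedOn"} <= set(issue):
        return False
    for name in extensions:
        handler = supported.get(name)
        # Assumption: an extension this implementation does not know is skipped.
        if handler is not None and not handler(issue):
            return False
    return True


in_order = {"reportedOn": "2013-01-23T10:18:00", "reproducedOn": "2013-01-23T11:00:00"}
out_of_order = {"reportedOn": "2013-01-23T10:18:00", "reproducedOn": "2013-01-23T09:00:00"}
```

A validator that registers `dates_in_order` rejects `out_of_order`; one that registers nothing accepts it, which is the implementation dependence mentioned above.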

<agray> For the dataset descriptions, we need to be able to do the choice between items. "One of dct:issued or dct:created MUST be provided"
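The choice requirement quoted above could be checked along these lines. A minimal Python sketch: the function name `has_one_of` and the `exclusive` switch are assumptions, since the requirement as quoted does not say whether supplying both properties is allowed.

```python
from typing import Dict, List, Sequence


def has_one_of(description: Dict[str, List[str]], choices: Sequence[str], exclusive: bool = False) -> bool:
    """True if at least one (or, when exclusive, exactly one) of the choices has a value."""
    present = [p for p in choices if description.get(p)]
    return len(present) == 1 if exclusive else len(present) >= 1


dataset = {"dct:title": ["Example dataset"], "dct:created": ["2014-03-25"]}
choice = ("dct:issued", "dct:created")
```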

Eric: IBM wants to start the Validation WG. They'll most likely want to work with Resource Shapes, because they've been using it for years, but it doesn't have optional groups, semantic extensions or disjunctions. I'm hoping to start with ShEx, but I expect IBM to say that it's too complicated. We'll see. I've sent the submission to several members to sign onto it. That would improve the odds of ShEx being the starting point.

Alistair: We need disjunction.

Eric: Some of the complexity is due to disjunction, but the majority comes from the fact that it gives you semantic actions in a particular order. Things like GenX demand those.
... I suspect people will want known ordering.

David: I also think extensibility is important.

Eric: That also came out of the workshop. Question is: what is the API promise for ext? Ordered or not?

David: Ordered is nice.

Eric: Yes, it also allows you to do things like SQL inserts.
... List is: public-rdf-shapes@w3.org
... Alistair has use cases on metadata. ShEx covers most, but a couple of things are left to semantic actions: note has SHOULD/MUST/MUST NOT, and if you fail a SHOULD it will go to a sem action, for a warning.
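One way to picture SHOULD failures becoming warnings rather than errors is the Python sketch below. It is an assumption-laden illustration, not how ShEx semantic actions are defined: the rule table, the `ex:internalNote` predicate, and the `check` function are hypothetical.

```python
from typing import Dict, List, Tuple

Rule = Tuple[str, str]  # (requirement level, predicate)


def check(description: Dict[str, List[str]], rules: List[Rule]) -> Tuple[bool, List[str], List[str]]:
    """Return (passed, errors, warnings); only MUST / MUST NOT violations fail validation."""
    errors: List[str] = []
    warnings: List[str] = []
    for level, predicate in rules:
        present = bool(description.get(predicate))
        if level == "MUST" and not present:
            errors.append(f"missing required {predicate}")
        elif level == "MUST NOT" and present:
            errors.append(f"forbidden {predicate} is present")
        elif level == "SHOULD" and not present:
            # Stand-in for a semantic action that reports a warning.
            warnings.append(f"recommended {predicate} is missing")
    return (not errors, errors, warnings)


rules = [("MUST", "dct:title"), ("SHOULD", "dct:description"), ("MUST NOT", "ex:internalNote")]
passed, errors, warnings = check({"dct:title": ["Example dataset"]}, rules)
```

In this run the description passes, with one warning for the missing `dct:description`.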

David: nice!
... Would be nice to have a cleaner simpler FHIR RDF representation, so that when people see it they'll like it. Would be nice to try ShEx for that purpose.
... Auto-generated FHIR RDF is rather verbose at present -- not very pretty.

https://github.com/jmandel?tab=repositories

<ericP> https://github.com/jmandel/fhir-rdf/blob/master/generic/fhir-shapes.shex

Eric: Without sem actions, there are scripts that gen ShEx for FHIR RDF.
... Hard job is round tripping from RDF back to FHIR. Need that for a good story.

FHIR RDF

<Claude> https://join.me/927-178-241

Eric: A while ago Josh and I worked on this, making it more or less friendly to RDF heads versus FHIR heads. Whether to rename the properties of the structure that's in FHIR XML/JSON? And whether to faithfully emulate the FHIR datatypes versus more terse RDF-typed nodes.
... There were ObjectIdentifier structures in FHIR, but in the XSLT I put it out as a URL instead of a bnode with a bunch of properties. There are other opportunities for that kind of thing.
... The question of using properties that are RDF friendly vs FHIR friendly, my inclination is to lean toward FHIR heads: underbars in names instead of camelCase.

David: Fine. I'm more concerned about ugly structures, like bnodes with a bunch of properties instead of a simple URL.

Eric: We created this a while ago, then Claude started working on an ont for this. Ideally the ont and data that we're producing should be aligned.
... Now we're more concerned with the subsumption rules.

Claude is screen sharing at https://join.me/927-178-241

Claude: Concern that I have, looking for feedback: I'm taking FHIR concepts and modeling them in an ont.
... Didn't want to add any hierarchy that wasn't already there in FHIR.
... Want to discuss FHIR extension mechanism, and building an ont, need to figure out the intent and audience.
... My assumption: desire for clinical sem representation, and FHIR is the next gen of models from HL7, but that's where it ends.
... When you think of a clinical model, it's to maximize interop. So it's designed to be extended in a generic fashion.
... Designed for closed world. Very strong constraints. I think an OWL ont for FHIR should inform itself on FHIR, but not necessarily tied to HL7 use cases for FHIR. Should look at FHIR as a common core, but in semantic world it's an open world.
... Core ontology could be extended, would support open world.
... Two areas to address: 1. extensibility of FHIR; 2. constraining via profiles.
... Extensions have an element with a URL that uniquely identifies that extension. That extension can contain other extensions. Or the value can be a code.
... That's attribute extension.
... The other approach is the Other concept. It has an identifier, an author and a created date. This is how you add new concepts in FHIR.
... if you have a library of core FHIR resources, the existing library should still be able to parse it.
... That's the extension mechanism, but if you use a semantic KR approach, it's very awkward because extensibility in a semantic world is part of the fabric of that world.
... My proposed approach would be a core ont with extension points. And when you want to add new properties to a concept, instead of using the FHIR approach, you just use the predicate, create an ont that imports the core ont and add that extension.

Eric: If I want to add a property "OrganDonor" to a Person, is it an open content model or closed? Non-monotonic changes? Can I add any property I want?
... FHIR XML as specified, can I add any element I want to a resource?

Claude: Yes, but you have to follow that structure.
... I defined an extension property, which would never be used directly, but would then define a new ont with OrganDonor that imports the core ont.
... This allows that property to be recognized as an extension property, rather than using the FHIR approach. Then if you want to add new concepts, there's an Other concept, and you define the new concept as a subclass of that in your new ont.
... The alternative, how FHIR does it, is you'd have a node of type Extension, where you add attributes. Suppose you want to add a new string value to Person. You'd have to name it "value...". It's pre-coordinated.
... Even primitive types are extensible. You'd have to represent all of the FHIR types as classes.

Eric: Suppose we skip the extensible types?

Claude: yes, we could.

Eric: Does their model require you to have an extension? Do they think of themselves as having an extension? If I put an XML extension element into Person, and I say it's got a URL and a value, to what degree do they think creating an ext and giving it a value versus ....
... In RDF, we just add properties, we don't think of creating an extension. But in FHIR they create extensions.

Claude: But the reason they do this is because one requirement is that if they see a new concept they don't want to modify their data models.
... If you have multiple extensions, I don't think it would be hard to add the classes formally. And if you need to use this generic approach and say it inherits from Other, when serializing to XML all predicates can be standard extension.

Eric: So GenX might be able to spit that back out as XML.
... A reason for having Element name extension and properties encoded in the URL of that extension, is that you still have a closed content model.
... In RDF we're doing the same job by saying "this is derived from Extension".

David: ShEx could indicate what's supposed to be there. And when you add an extension, you could supply a ShEx to indicate what else is supposed to be there.

Claude: How do you express a constraint in RDF? Suppose I have Person and a datatype property isOrganDonor, and it's boolean.
... In FHIR this is done by saying Person is an extension [ valueBoolean true, .. and maybe an other extension ]
... This is if you translated XML to RDF with bnodes.
... "Person dt:name STRING Nanjo, Claude " in RDF
... But in FHIR you'd say: Person extension [ valueString Nanjo,Claude, extension [ uri: parsingFormat ; valueString lastName,givenname* ] ]
... I think that's the reason ISO 21090 doesn't just say something is a string; it adds 20 other attributes to help you understand it. But since FHIR didn't want to be burdened with that, it gives you simple datatypes and lets you extend them to do things like that.
... Could we say: Person dt:name [ value "Nanjo, Claude", parsingFormat "SN, GN*"]
... and that would follow standard OWL approaches.
... The other approach would be to say: Person dt:name xs:string ; and Person dt:nameFormat xs:string.
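The contrast between the FHIR extension style and plain properties might be sketched as a small conversion. This Python illustration relies on assumptions: resources are plain dicts, each extension carries a `url` and exactly one `value...` key, nested extensions are ignored, and the `http://example.org/fhir/isOrganDonor` URL is made up.

```python
from typing import Any, Dict


def lift_extensions(resource: Dict[str, Any]) -> Dict[str, Any]:
    """Turn FHIR-style extension entries into direct properties keyed by the extension URL."""
    lifted = {k: v for k, v in resource.items() if k != "extension"}
    for ext in resource.get("extension", []):
        value_keys = [k for k in ext if k.startswith("value")]
        if len(value_keys) != 1:
            raise ValueError("this sketch expects exactly one value[x] per extension")
        lifted[ext["url"]] = ext[value_keys[0]]
    return lifted


# FHIR style: a generic extension node carrying a typed value.
person = {
    "resourceType": "Person",
    "extension": [
        {"url": "http://example.org/fhir/isOrganDonor", "valueBoolean": True},
    ],
}

# RDF style: the property sits directly on the resource.
flat_person = lift_extensions(person)
```

Going the other way, from direct properties back to extension nodes, is the round-tripping direction GenX is meant to help with; that direction needs the datatype to pick the right `value...` name and is not shown here.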

Eric: You lose the vapid value property.
... You could do either one.
... I like some properties of both of those approaches.

<ericP> https://github.com/ericprud/FDA-TA/blob/master/renal.ttl

<ericP> http://piratepad.net/ltbLsK7A08

Eric: In FDA work we've been doing, there's a concept of Observation. There's a code for it, might use SNOMED or LOINC to say serum creatinine level.
... There aren't any constraints on the codes or values.
... That means that people get to do whatever coding system they want, and they're promising a serum creatinine test.
... So blank node that you're generating could have a useful label.
... For strings you're less likely to have two different people asserting the same things on strings.

(See pirate pad notes: http://piratepad.net/ltbLsK7A08 )

Claude: Do we want to support both ways in OWL, the FHIR way and the RDF-way?

David: Prefer the simplest way.

Claude: Someone could add a 'NOT' extension that would negate the meaning.
... Any attribute that dramatically changes the semantics of the class is a problem.

David: Want it to look simple and easy in RDF, so that people can look at it and say "hey, that's not bad". We want to gain RDF acceptance.

Eric: Were there use cases where desirable naive interpretation was enabled in XML but disabled in RDF?
... Don't want to show them nice RDF that disables some of their use cases.
... There are some people already sympathetic to RDF, but we'd better not throw away serious use cases.

Claude: I suspect expressivity to be stronger in OWL than XML.
... I could have adverse event, which has an agent 0..*. But if the adverse event is a reaction, then it must have an agent. That's a need for a profile.
... Another is a problem code can only be bound to this value set.
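The two profile constraints just mentioned could be expressed as checks like the following. This is a rough Python sketch with hypothetical field names (`kind`, `agent`, `problemCode`) and placeholder codes, not FHIR's actual profile mechanism.

```python
from typing import Any, Dict, List, Set


def profile_violations(event: Dict[str, Any], allowed_problem_codes: Set[str]) -> List[str]:
    """List the ways a resource breaks the sketched profile; an empty list means it conforms."""
    problems: List[str] = []
    # Base model allows agent 0..*, but the profile tightens it for reactions.
    if event.get("kind") == "reaction" and not event.get("agent"):
        problems.append("a reaction must have at least one agent")
    # Value set binding: a problem code, if given, must come from the bound set.
    code = event.get("problemCode")
    if code is not None and code not in allowed_problem_codes:
        problems.append(f"problem code {code!r} is outside the bound value set")
    return problems


value_set = {"code-a", "code-b"}  # placeholder codes, not real terminology
reaction_without_agent = {"kind": "reaction", "agent": []}
plain_event = {"kind": "other", "problemCode": "code-a"}
```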

David: What about closed content model? Not yet sure if this is being sufficiently addressed in RDF.

Claude: Next time: profiles.

ADJOURNED

<egonw> mscottm: you saw there will be a European location for the Network of BioThings hackathon? In Maastricht, in fact?
