HCLS/ClinicalObservationsInteroperability/caBIGSemanticWorkflowsPrototyping

caBIG Semantic Workflows

Models for Composition

This section describes our basic models for composition, starting with the simplest case in Iteration 1.

Iteration 1

Assumptions made:

  • Web services have one input and one output (the In-Out message exchange pattern); this avoids having to take into account the choreography of the web services themselves.
  • Web services are described using WSDL.
  • Web services have their input and output messages annotated with SPAT Annotations. This way we can abstractly view a Web service as a tuple a = (pre, post) where both pre and post are sets of RDF triples (RDF graphs).
  • A composition consists of a sequence of Web services (parallelization later).
  • We have a pool of web services available WS = { a | a = (pre, post) }.
  • We have a user goal u = (input, goal) where input is a ground RDF graph and goal is a non-ground RDF graph (one that may contain variables). Note that we can write the user goal as a SPARQL query with body input merge goal, where 'merge' is an RDF merge (union with resolution of blank nodes).


Composition can then be described as follows. Given a user goal u and a pool of web services WS as above, a composition is a sequence of web services (a1,a2,...,an) where ai = (prei,posti) in WS such that

  • S1 = input entails pre1.
  • S2 = S1 merge post1 entails pre2.
  • S3 = S2 merge post2 entails pre3.
  • ...
  • Sn = S{n-1} merge post{n-1} entails pren.
  • S{n+1} = Sn merge postn entails goal

Intuitively, (S1,...,S{n+1}) is a sequence of search states such that the first state consists of the input of the user goal; each subsequent state is the previous state updated with the output RDF triples of the web service applied to it, and the final state satisfies the goal of our user query.
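
The definition above can be sketched as a breadth-first search over states. This is only an illustration under strong simplifying assumptions: all graphs are ground, so 'entails' is reduced to a subset test and 'merge' to set union. The names Service, entails and compose are hypothetical and not part of any prototype code.

```python
from collections import deque
from typing import FrozenSet, Iterable, List, NamedTuple, Optional, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]


class Service(NamedTuple):
    """A web service viewed abstractly as a = (pre, post)."""
    name: str
    pre: Graph
    post: Graph


def entails(state: Graph, graph: Graph) -> bool:
    # Assumption: ground triples only, so entailment is a subset test.
    return graph <= state


def compose(input_graph: Iterable[Triple], goal: Iterable[Triple],
            pool: List[Service]) -> Optional[List[str]]:
    """Return the names of a shortest service sequence, or None if there is none."""
    start = frozenset(input_graph)
    goal_graph = frozenset(goal)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, plan = queue.popleft()
        if entails(state, goal_graph):
            return plan
        for service in pool:
            if entails(state, service.pre):
                next_state = state | service.post  # S{i+1} = Si merge posti
                if next_state not in seen:
                    seen.add(next_state)
                    queue.append((next_state, plan + [service.name]))
    return None


pool = [
    Service("ItemLookup",
            frozenset({("book", "has", "asin")}),
            frozenset({("book", "has", "title")})),
    Service("ItemSearch",
            frozenset({("req", "has", "keywords")}),
            frozenset({("book", "has", "asin")})),
]
plan = compose({("req", "has", "keywords")}, {("book", "has", "title")}, pool)
print(plan)
```

Because states only grow and visited states are recorded, the search terminates for a finite pool. Handling a non-ground goal would require pattern matching instead of the subset test.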

Future Iterations

  • Introduction of SAWSDL
  • Parallelization of services in compositions (only sequences get generated in Iteration 1)
  • More intricate Semantic Web Service models, for example WSMO with more intricate pre- and postconditions in WSML or other logical languages
  • Adding heuristic functions and filtering techniques to reduce the search space
  • Taking into account background ontologies and reasoning (this basically amounts to calculating the implicit triples at each search state); the cost quickly becomes exponential, so restricted ontology languages have to be considered

Implementation

Iteration 1

Roughly, the implementation is based on translating the RDF triples associated with input and output into rules. The body of a rule contains the input triples and the head contains the output triples, as in SPDL.

(to be updated)
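
As a rough illustration of the rule view, the sketch below matches a rule body against a set of facts and instantiates the head with the resulting bindings. It is not the SPDL implementation; the helper names (match_triple, match_body, apply_rule) and the "?name" variable convention are assumptions made for this example. Head variables that the body does not bind are left as placeholders, standing in for values that a real service invocation would supply.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Bindings = Dict[str, str]


def is_var(term: str) -> bool:
    return term.startswith("?")


def match_triple(pattern: Triple, fact: Triple, env: Bindings) -> Optional[Bindings]:
    """Extend env so that pattern equals fact, or return None on a clash."""
    env = dict(env)
    for p, f in zip(pattern, fact):
        if is_var(p):
            if env.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return env


def match_body(body: List[Triple], facts: List[Triple],
               env: Optional[Bindings] = None) -> Iterator[Bindings]:
    """Yield every binding set under which all body patterns match some fact."""
    env = {} if env is None else env
    if not body:
        yield env
        return
    for fact in facts:
        new_env = match_triple(body[0], fact, env)
        if new_env is not None:
            yield from match_body(body[1:], facts, new_env)


def apply_rule(body: List[Triple], head: List[Triple],
               facts: Iterable[Triple]) -> Set[Triple]:
    """Instantiate the head once for each way the body is satisfied."""
    derived: Set[Triple] = set()
    for env in match_body(body, list(facts)):
        for triple in head:
            derived.add(tuple(env.get(term, term) for term in triple))
    return derived


# Body from the input annotations, head from the output annotations.
body = [("?req", "tns:keywords", "?kw"), ("?req", "tns:index", "?idx")]
head = [("?book", "tns:asin", "?asin")]
facts = {("r1", "tns:keywords", "Weaving"), ("r1", "tns:index", "Books")}
print(apply_rule(body, head, facts))
```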


Other approaches may use AI Planning techniques.

Issues

  • SAWSDL vs. SPAT embedded directly in the WSDL.
    • Since SPARQL Annotations (SPAT) map symmetrically between RDF graphs and XML documents, they can be considered equivalent to SAWSDL lifting and lowering schema mappings.
    • SPATs applied via lifting and lowering schema mappings imply an extra level of indirection; specifically, some document must associate annotations with the schema mapping names:
 :id       spat:SPAT '?req :id xpath(".")' .
 :keywords spat:SPAT '?req :keywords xpath("tns:Keywords") ; :index    xpath("tns:SearchIndex")' .
 :asin     spat:SPAT '?book :asin xpath("tns:ASIN")' .
 :title    spat:SPAT '?book tns:docTitle xpath("aws:Title")' .
    • The SAWSDL model reference could be used to convey preconditions, as could SPATs on the operation element.
    • Also, failure to use applicable standards is a bit sociopathic and hides your semantics from other tools which follow the standard.
  • Inference strategy
    • The original SPDL system used a forward-chaining reasoner and recorded the closure with proofs.
    • Multiple solutions were manifestations of multiple potential choreographies.
    • A choreography with two alternatives for operation 2 is likely more reliable than one with a single critical point of failure.
    • Using commodity forward or backward chaining reasoners may affect the degree to which alternative paths surface.

Process

Following are the process and data flow for an example rule compilation and invocation for Amazon Web Services (AWS) ItemSearch and ItemLookup operations.

Rule Compilation

  • wsdlEngine = new WSDL::Engine()
  • wsdlEngine->parse(WSDL)
    • validator = new SchemaValidator -- an XML Schema validator with SAX handlers for parsing schema definitions.
    • spatHandler = new SPAT::Handler(this)
    • validator->registerAppinfoHandler('SPDL:SPAT', spatHandler)
    • wsdlSAXhandler = new WSDL::SAXhandler(this, validator)
    • wsdlParser = new SAXparser(wsdlSAXhandler)
    • wsdlParser->parse(WSDL)
      • wsdlParser->start_element(<definitions>)
      • wsdlParser->start_element(<types>) -- types contains the XML schema so next start_element goes to validator schema parser handler.
      • wsdlParser->start_element(<xs:schema>)
        • validator->start_element(<xs:schema>)
          • spatHandler->attribute(SPAT, schemaContext SC1) -- SPAT is e.g. '?req this:id xpath(".")', schemaContext promises an API to the validator
            • SPARQLparser::parse('ASK {' + SPAT + '}', schemaContext) -- make something parsable as SPARQL (or subset the SPARQL parser).
              Returns a BasicGraphPattern of SpatTriplePatterns where embedded XPaths keep the passed schemaContext.
        • ... validator->end_element(</xs:schema>)
      • WSDL::start_element(<wsdl:message name="ItemSearchRequestMsg">)
      • ... message2element[ItemSearchRequestMsg] = tns:ItemSearch ...
      • WSDL::end_element(</wsdl:message>)
      • WSDL::start_element(<portType name="AWSECommerceServicePortType">)
      • WSDL::start_element(<operation name="ItemSearch">)
      • WSDL::start_element(<input message="tns:ItemSearchRequestMsg">)
      • ... portType2rule[AWSECommerceServicePortType][ItemSearch] = rule1 = WSDL::Rule(message2element[ItemSearchRequestMsg], message2element[ItemSearchResponseMsg])
      • WSDL::end_element(</input>)
      • WSDL::end_element(</operation>)
      • ... portType2rule[AWSECommerceServicePortType][ItemLookup] = rule2 = WSDL::Rule(message2element[ItemLookupRequestMsg], message2element[ItemLookupResponseMsg])
      • WSDL::end_element(</portType>)
      • ... <binding/> and <service/> tell you to use e.g. SOAP
  • wsdlEngine->invokables = the set of rules with antecedents and consequents tied to triple patterns with schema-resolvable XPaths.
  • schemaContext2ruleTerm: map from schemaContext (e.g. SC1) to XPath term in a triple pattern:
 <tns:SubscriptionId>001 => [XPath(., <tns:SubscriptionId>001)]
 <tns:Request>001 => [XPath(tns:Keywords, <tns:Request>001)
                      XPath(tns:SearchIndex, <tns:Request>001)]
 <tns:Item>001 => [XPath(tns:ASIN, <tns:Item>001)]
 <tns:ItemAttributes>001 => [XPath(tns:Title, <tns:ItemAttributes>001)]
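
One way to hold the schemaContext2ruleTerm map listed above is a dictionary from schema context to the XPath terms anchored there. The sketch below is an assumption about the data structure, not the prototype's actual representation; XPathTerm and build_context_map are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class XPathTerm(NamedTuple):
    xpath: str    # relative path taken from the SPAT annotation
    context: str  # schema context in which the annotation appears


def build_context_map(terms: Iterable[XPathTerm]) -> Dict[str, List[XPathTerm]]:
    """Group XPath terms by the schema context they are resolved against."""
    grouped: Dict[str, List[XPathTerm]] = defaultdict(list)
    for term in terms:
        grouped[term.context].append(term)
    return dict(grouped)


schema_context_to_rule_term = build_context_map([
    XPathTerm(".", "<tns:SubscriptionId>001"),
    XPathTerm("tns:Keywords", "<tns:Request>001"),
    XPathTerm("tns:SearchIndex", "<tns:Request>001"),
    XPathTerm("tns:ASIN", "<tns:Item>001"),
    XPathTerm("tns:Title", "<tns:ItemAttributes>001"),
])

for context, terms in schema_context_to_rule_term.items():
    print(context, "=>", [term.xpath for term in terms])
```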


Query Execution

Given a query:

 SELECT ?asin ?title WHERE {
  ?X tns:id "0FWYBWB91M5S26YBE382" ;
     tns:keywords "Weaving" ;
     tns:index "Books" ;
     tns:asin ?asin ;
     tns:title ?title }
  • wsdlEngine->executeQuery():
    • userQueryDB = new W3C::SPDL::RdfDB()
    • ... populate with user query turned into an assertion:
 <var:?X> tns:title <var:?title> .
 <var:?X> tns:keywords "Weaving" .
 <var:?X> tns:index "Books" .
 <var:?X> tns:id "0FWYBWB91M5S26YBE382" .
 <var:?X> tns:asin <var:?asin> .
    • backwardChain(query, invokables, userQueryDB) [query, rules, premise] -- This is the secret to treating services as databases. The trick here is that the rule bodies can leverage the premise, but we can't let the query match anything in the premise (because the premise is the query expressed as triples).
      • ... match triples from rule2 (ItemLookup)
        • ... match triples from rule1 (ItemSearch)
        • ... rule1 body fulfilled by userQueryDB with variables substitutions V.
        • rule1->invokeService(schemaContext2ruleTerm) -- @@ invoke now? simpler, but backtracking entails potentially wasted (though cachable) service invocations. perhaps adequate for the prototype.
          • generatorCallback = new WSDL::GeneratorCallback(V, schemaContext2ruleTerm)
          • <tns:ItemSearch>->generateDocument(generatorCallback)
            • <tns:SubscriptionId>->generateDocument(generatorCallback)
              • generatorCallback->getCharData(SC1)
            • "<tns:SubscriptionId>0FWYBWB91M5S26YBE382</tns:SubscriptionId>"
            • ...
 <tns:ItemSearch xmlns:tns="...">
   <tns:SubscriptionId>0FWY...</tns:SubscriptionId>
   <tns:Request>
     <tns:Keywords>Weaving</tns:Keywords>
     <tns:SearchIndex>Books</tns:SearchIndex>
   </tns:Request>
 </tns:ItemSearch>
          • HTTP POST SOAP envelope with above body
          • parse SOAP response
          • validator->setContextHandler(schemaContext2ruleTerm)
          • soapHandler = new W3C::SPDL::SOAP::SAXHandler(validator)
          • soapParser = new SAXparser(soapHandler)
            • ... validating the response element (and its sub-elements) produces a tree of bindings. These instantiate the head of the invoked rule (rule1).
 XPath(tns:ASIN, <tns:Item>001): "1883010039"
   XPath(tns:Title, <tns:ItemAttributes>001): "Learning to Weave, Revised Edition"
 XPath(tns:ASIN, <tns:Item>001): "1596680075"
   XPath(tns:Title, <tns:ItemAttributes>001): "Spin to Knit: The Knitter's Guide to Making Yarn"
      • ... match more triples from rule2 (ItemLookup)
      • ... rule2 body fulfilled by userQueryDB + rule1 invocation
      • rule2->invokeService(schemaContext2ruleTerm)...
    • backward chaining fulfilled
  • render answer
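
The step that turns the user query into an assertion can be sketched as follows. This is a minimal illustration, assuming triple patterns are plain string tuples and that variables are written with a leading "?"; query_to_premise is a hypothetical helper, not a function of the SPDL code.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


def query_to_premise(patterns: List[Triple]) -> List[Triple]:
    """Replace each ?var with an opaque <var:?var> term so the query can be asserted."""
    def freeze(term: str) -> str:
        return f"<var:{term}>" if term.startswith("?") else term

    return [tuple(freeze(term) for term in triple) for triple in patterns]


query = [
    ("?X", "tns:id", '"0FWYBWB91M5S26YBE382"'),
    ("?X", "tns:keywords", '"Weaving"'),
    ("?X", "tns:index", '"Books"'),
    ("?X", "tns:asin", "?asin"),
    ("?X", "tns:title", "?title"),
]

premise = query_to_premise(query)
for s, p, o in premise:
    print(f"{s} {p} {o} .")
```

Freezing the variables into ordinary terms is what lets rule bodies match the premise like any other data. The chainer still has to keep the query itself from matching those frozen triples, as noted in the backwardChain step.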

XML Schema Validator Context API

  • registerAppinfoHandler(namespace, SAXContextHandler)
    • for each appinfo in namespace, calls SAXContextHandler with the element/attribute name and a context
  • instantiate(...)
    • for each required CData section calls ...
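
The registration half of this API might look like the sketch below, which uses Python's standard xml.etree.ElementTree rather than a validating SAX parser. Everything here is an assumption for illustration: the SchemaWalker class, the spat namespace URI, and the use of the element name as a stand-in for the schema context. It performs no validation.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict

XS_NS = "http://www.w3.org/2001/XMLSchema"
SPAT_NS = "http://example.org/spat#"  # hypothetical namespace for SPAT annotations

# handler(local_name, value, context)
Handler = Callable[[str, str, str], None]


class SchemaWalker:
    """Walks a schema and reports annotation attributes to registered handlers."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def register_appinfo_handler(self, namespace: str, handler: Handler) -> None:
        self._handlers[namespace] = handler

    def parse(self, schema_text: str) -> None:
        root = ET.fromstring(schema_text)
        for element in root.iter(f"{{{XS_NS}}}element"):
            context = element.get("name", "")  # stand-in for a schema context
            for qname, value in element.attrib.items():
                if not qname.startswith("{"):
                    continue  # attribute without a namespace, e.g. name or type
                namespace, local = qname[1:].split("}", 1)
                handler = self._handlers.get(namespace)
                if handler is not None:
                    handler(local, value, context)


SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:spat="http://example.org/spat#">
  <xs:element name="SubscriptionId" type="xs:string"
      spat:SPAT='?req :id xpath(".")'/>
</xs:schema>"""

calls = []
walker = SchemaWalker()
walker.register_appinfo_handler(
    SPAT_NS, lambda name, value, context: calls.append((name, value, context)))
walker.parse(SCHEMA)
print(calls)
```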

Resources