SpotOfDrama



Aka "Atom SOAP RDF" ...

We're experimenting with SOAP/RDF mapping tools, on the hypothesis that the SOAP Encoding Data Model and RDF's graph data model are close enough for RDF tools to be able to consume SOAP-encoded graphs, and for SOAP Encoding to serve as an alternate unstriped XML syntax for RDF. The aspiration here is to try this with Atom/Echo/Pie and see what the resulting SOAP-based markup looks like.

Context: the "RDF can be readable" thread on Sam Ruby's weblog. Also the SWAD-Europe work on SOAP/RDF graph mapping.

Tools: We use the Jena/ARP-based W3C RDF Validator service, as well as W3C's XT-based online XSLT service.

Nearby: AtomJobExample SOAP/Encoding/RDF

Note: this page reports on a work-in-progress; the exact behaviour of these stylesheets may change as things are improved.

Section 2.2 of the SOAP primer has an example of SOAP Encoding:


<?xml version='1.0' ?>
<env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope" >
 <env:Header>
   <t:transaction
           xmlns:t="http://thirdparty.example.org/transaction"
           env:encodingStyle="http://example.com/encoding"
           env:mustUnderstand="true" >5</t:transaction>
 </env:Header>  
 <env:Body>
  <m:chargeReservation       env:encodingStyle="http://www.w3.org/2003/05/soap-encoding"
         xmlns:m="http://travelcompany.example.org/">
   <m:reservation xmlns:m="http://travelcompany.example.org/reservation">
    <m:code>FT35ZBQ</m:code>
   </m:reservation>   
   <o:creditCard xmlns:o="http://mycompany.example.com/financial">
    <n:name xmlns:n="http://mycompany.example.com/employees">
           Åke Jógvan Øyvind
    </n:name>     
    <o:number>123456789099999</o:number>
    <o:expiration>2005-02</o:expiration>
   </o:creditCard>
  </m:chargeReservation>
 </env:Body>
</env:Envelope>


Ignoring the header, and focussing on the contents of the "Body" of this document, we see a SOAP Encoded instance of the SOAP Encoding Data Model. As such it is in effect a serialization of the node-edge-node graph data structure that SOAP encoding uses. This is more-or-less (details still unclear) isomorphic to RDF's graph data model. Let's try feeding this to Max's SOAP-to-RDF convertor...


<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns:enc="http://www.w3.org/2003/05/soap-encoding" xmlns:env="http://www.w3.org/2003/05/soap-envelope" xmlns:ns1="urn:rubysoapservices" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

<rdf:Description>
<chargeReservation rdf:parseType="Resource" xmlns="http://travelcompany.example.org/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
   <reservation rdf:parseType="Resource" xmlns="http://travelcompany.example.org/reservation" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <code>FT35ZBQ</code>
   </reservation>
   <creditCard rdf:parseType="Resource" xmlns="http://mycompany.example.com/financial" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <name xmlns="http://mycompany.example.com/employees">
           ?ke J?gvan ?yvind
    </name>
    <number>123456789099999</number>
    <expiration>2005-02</expiration>
   </creditCard>
  </chargeReservation>
 </rdf:Description>
</rdf:RDF>


And here are the triples of the data model, represented in the N-Triples test case format (i.e. as subject, predicate, object):

_:jARP48903 <http://travelcompany.example.org/reservationcode> "FT35ZBQ" .
_:jARP48902 <http://travelcompany.example.org/reservationreservation> _:jARP48903 .
_:jARP48904 <http://mycompany.example.com/employeesname> "\n           ?ke J?gvan ?yvind\n    " .
_:jARP48904 <http://mycompany.example.com/financialnumber> "123456789099999" .
_:jARP48904 <http://mycompany.example.com/financialexpiration> "2005-02" .
_:jARP48902 <http://mycompany.example.com/financialcreditCard> _:jARP48904 .
_:jARP48901 <http://travelcompany.example.org/chargeReservation> _:jARP48902 .
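
A note on the predicate URIs above: RDF/XML names a property by plain concatenation of the element's namespace URI and its local name, with no separator added. That is why a namespace that doesn't end in "/" or "#" gives run-together names like "reservationcode". A tiny Python illustration (the function name is ours, purely for illustration):

```python
def property_uri(namespace_uri, local_name):
    """Build a property URI the way RDF/XML does: namespace URI + local name."""
    return namespace_uri + local_name


# The <code> element sits in a namespace with no trailing "/" or "#".
print(property_uri("http://travelcompany.example.org/reservation", "code"))
# The Atom namespace ends in "#", so the result reads more naturally.
print(property_uri("http://purl.org/atom/ns#", "title"))
```

Vocabularies intended for this kind of use would probably want namespace URIs ending in "/" or "#".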


And here's a picture (from the RDF validator). See larger image for a readable version.

soaprdf-a-small.png

You can hopefully reproduce this by taking the soap2rdf'd SOAP 1.2 example and feeding it to the RDF validator.

Recap, what's going on here?

We're exploring a mapping between SOAP Encoding and RDF. The basic idea is that SOAP Encoding is "edge-centric": all XML elements stand for relationships in the graph, and not for types. In RDF, we have a mixed ("striped") model, where sometimes elements encode edges, sometimes nodes.
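
To make the edge-centric reading concrete, here is a rough Python sketch. It is not the XSLT stylesheet used on this page, just an illustration under some assumptions: every element under env:Body is treated as an edge, an element with child elements points at a fresh blank node, and an element with only text points at a literal. The function names and blank node labels (_:b1, _:b2, ...) are made up for the example, and triples come out parent-first rather than in the order the validator prints them.

```python
import itertools
import xml.etree.ElementTree as ET

ENV = "http://www.w3.org/2003/05/soap-envelope"

# A cut-down version of the SOAP primer example (header and credit card omitted).
SAMPLE = """<env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope">
 <env:Body>
  <m:chargeReservation xmlns:m="http://travelcompany.example.org/">
   <m:reservation xmlns:m="http://travelcompany.example.org/reservation">
    <m:code>FT35ZBQ</m:code>
   </m:reservation>
  </m:chargeReservation>
 </env:Body>
</env:Envelope>"""


def escape_literal(text):
    """Minimal N-Triples escaping: backslash, double quote and newline only."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def soap_body_to_triples(soap_xml):
    """Read every element inside env:Body as an edge and return the triples."""
    body = ET.fromstring(soap_xml).find(f"{{{ENV}}}Body")
    counter = itertools.count(1)
    triples = []

    def new_node():
        return f"_:b{next(counter)}"

    def walk(subject, element):
        # ElementTree tags look like "{namespace}local"; join the two parts.
        tag = element.tag
        if tag.startswith("{"):
            namespace, _, local = tag[1:].partition("}")
        else:
            namespace, local = "", tag
        predicate = f"<{namespace}{local}>"
        children = list(element)
        if children:
            # Child elements present: the edge points at a new blank node.
            target = new_node()
            triples.append((subject, predicate, target))
            for child in children:
                walk(target, child)
        else:
            # Text only: the edge points at a literal.
            literal = escape_literal(element.text or "")
            triples.append((subject, predicate, f'"{literal}"'))

    top = new_node()
    for child in body:
        walk(top, child)
    return triples


for triple in soap_body_to_triples(SAMPLE):
    print(" ".join(triple), ".")
```

Node URIs, typed values and multi-reference (id/ref) values are all ignored here; those are exactly the details still to be worked through.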

There are plenty of details left to work through. What conventions from SOAP Encoding can we use for representing the URIs of nodes in the graph, for example?

Let's try working some Atom/Echo/Pie markup into SOAP-encoded format... Hmm, now should we start with the Atom version of the RDF flavours we've been seeing?

Here's one approach. We begin with Aaron's minAtom.rdf example, which uses a DTD-defaulting variant of RDF to omit much of RDF's verbosity. Such an approach is in fact similar syntactically to what we're doing here, except that it relies upon DTD-powered attribute defaulting mechanisms which many RDF parsers don't (currently) support.

Anyway, here is minAtom.rdf:

<?xml version="1.0"?>
<!DOCTYPE rdf:RDF SYSTEM "atom.dtd">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">

  <feed xmlns="http://purl.org/atom/ns#">
    <title>dive into mark</title>
    <link>http://diveintomark.org/</link>
    <modified>2003-08-05T18:30:02Z</modified>
    <author>
      <name>Mark Pilgrim</name>
    </author>
    <entry>
      <title>Atom 0.2 snapshot</title>
      <link>http://diveintomark.org/2003/08/05/atom02</link>
      <id>tag:diveintomark.org,2003:3.2397</id>
      <issued>2003-08-05T08:29:29-04:00</issued>
      <modified>2003-08-05T18:30:02Z</modified>
    </entry>
  </feed>
</rdf:RDF>


Here is atom.dtd, which is where the clever and sneaky stuff happens:

<!ELEMENT rdf:RDF (feed)>
<!ATTLIST generator rdf:parseType CDATA #FIXED "Resource">
<!ATTLIST author rdf:parseType CDATA #FIXED "Resource">
<!ATTLIST contributor rdf:parseType CDATA #FIXED "Resource">
<!ATTLIST entry rdf:parseType CDATA #FIXED "Resource">
<!ATTLIST content rdf:parseType CDATA #FIXED "Resource">
<!ATTLIST body rdf:parseType CDATA #FIXED "Literal">


What this generates is the following markup (i.e. here we expand this by hand, and drop the olde-style DTD declaration):


<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <feed xmlns="http://purl.org/atom/ns#">
    <title>dive into mark</title>
    <link>http://diveintomark.org/</link>
    <modified>2003-08-05T18:30:02Z</modified>
    <author rdf:parseType="Resource">
      <name>Mark Pilgrim</name>
    </author>
    <entry  rdf:parseType="Resource">
      <title>Atom 0.2 snapshot</title>
      <link>http://diveintomark.org/2003/08/05/atom02</link>
      <id>tag:diveintomark.org,2003:3.2397</id>
      <issued>2003-08-05T08:29:29-04:00</issued>
      <modified>2003-08-05T18:30:02Z</modified>
    </entry>
  </feed>
</rdf:RDF>


...this expanded (and more verbose!) RDF will parse happily with any decent RDF parser, no DTD magic required. But it does have those unsightly rdf:parseType="Resource" attributes everywhere. These are RDF's way of stepping out of "striped" syntax mode into a syntax that looks a lot more like SOAP Encoding markup, i.e. where (more or less) every element stands for an edge in the graph, and not for a node.
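
The by-hand expansion can also be scripted. The Python sketch below copies the defaults from the ATTLIST declarations in atom.dtd into a table and applies them to a parsed document. It assumes the elements live in the Atom namespace used in minAtom.rdf (DTDs themselves know nothing about namespaces), and the names DEFAULTS and expand_defaults are ours.

```python
import xml.etree.ElementTree as ET

ATOM = "http://purl.org/atom/ns#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# One entry per ATTLIST declaration in atom.dtd.
DEFAULTS = {
    "generator": "Resource",
    "author": "Resource",
    "contributor": "Resource",
    "entry": "Resource",
    "content": "Resource",
    "body": "Literal",
}


def expand_defaults(root):
    """Add the rdf:parseType attribute that the DTD would have defaulted in."""
    prefix = "{" + ATOM + "}"
    for element in root.iter():
        if not element.tag.startswith(prefix):
            continue
        local = element.tag[len(prefix):]
        if local in DEFAULTS:
            element.set(f"{{{RDF}}}parseType", DEFAULTS[local])
    return root


doc = ET.fromstring(
    '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">'
    '<feed xmlns="http://purl.org/atom/ns#">'
    "<title>dive into mark</title>"
    "<author><name>Mark Pilgrim</name></author>"
    "</feed></rdf:RDF>"
)
expand_defaults(doc)
author = doc.find(f".//{{{ATOM}}}author")
print(author.get(f"{{{RDF}}}parseType"))
```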

So, let's turn this into SOAP Encoding. Since we're using SOAP Encoding as a doc format not a protocol/message format, we omit the Header. This is unconventional, but what we're doing is unconventional anyway...

soapAtom-1.xml:

<?xml version="1.0"?>
<env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope" >
<env:Body>
  <feed xmlns="http://purl.org/atom/ns#" env:encodingStyle="http://www.w3.org/2003/05/soap-encoding">
    <title>dive into mark</title>
    <link>http://diveintomark.org/</link>
    <modified>2003-08-05T18:30:02Z</modified>
    <author>
      <name>Mark Pilgrim</name>
    </author>
    <entry>
      <title>Atom 0.2 snapshot</title>
      <link>http://diveintomark.org/2003/08/05/atom02</link>
      <id>tag:diveintomark.org,2003:3.2397</id>
      <issued>2003-08-05T08:29:29-04:00</issued>
      <modified>2003-08-05T18:30:02Z</modified>
    </entry>
  </feed>
  </env:Body>
</env:Envelope>
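
For what it's worth, the wrapping step is mechanical. A Python sketch, assuming we start from a parsed Atom feed element (the function name wrap_in_envelope is made up for illustration):

```python
import xml.etree.ElementTree as ET

ENV = "http://www.w3.org/2003/05/soap-envelope"
ENC = "http://www.w3.org/2003/05/soap-encoding"
ATOM = "http://purl.org/atom/ns#"


def wrap_in_envelope(feed):
    """Put an Atom feed element inside a header-less SOAP 1.2 envelope."""
    envelope = ET.Element(f"{{{ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{ENV}}}Body")
    # Flag the payload as SOAP Encoding, as in soapAtom-1.xml.
    feed.set(f"{{{ENV}}}encodingStyle", ENC)
    body.append(feed)
    return envelope


# Ask the serializer to use the "env" prefix rather than a generated one.
ET.register_namespace("env", ENV)

feed = ET.fromstring(
    '<feed xmlns="http://purl.org/atom/ns#"><title>dive into mark</title></feed>'
)
envelope = wrap_in_envelope(feed)
print(ET.tostring(envelope, encoding="unicode"))
```

The serializer picks its own prefix for the Atom namespace, so the output won't be character-for-character identical to soapAtom-1.xml, but it should be the same document as far as a namespace-aware parser is concerned.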


Now we've got this, next job is to try to auto-convert it back to RDF. And also we should see if we can find a SOAP de-serializer that likes what it sees...

Converting this back into RDF:

Here's the RDF/XML that we get from converting this:


<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns:enc="http://www.w3.org/2003/05/soap-encoding" xmlns:env="http://www.w3.org/2003/05/soap-envelope" xmlns:ns1="urn:rubysoapservices" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<rdf:Description>
<feed rdf:parseType="Resource" xmlns="http://purl.org/atom/ns#" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <title>dive into mark</title>
    <link>http://diveintomark.org/</link>
    <modified>2003-08-05T18:30:02Z</modified>
    <author rdf:parseType="Resource">
      <name>Mark Pilgrim</name>
    </author>
    <entry rdf:parseType="Resource">
      <title>Atom 0.2 snapshot</title>
      <link>http://diveintomark.org/2003/08/05/atom02</link>
      <id>tag:diveintomark.org,2003:3.2397</id>
      <issued>2003-08-05T08:29:29-04:00</issued>
      <modified>2003-08-05T18:30:02Z</modified>
    </entry>
  </feed>
</rdf:Description>
</rdf:RDF>


And here are the triples that represent these RDF statements:

_:jARP48918 <http://purl.org/atom/ns#title> "dive into mark" .
_:jARP48918 <http://purl.org/atom/ns#link> "http://diveintomark.org/" .
_:jARP48918 <http://purl.org/atom/ns#modified> "2003-08-05T18:30:02Z" .
_:jARP48919 <http://purl.org/atom/ns#name> "Mark Pilgrim" .
_:jARP48918 <http://purl.org/atom/ns#author> _:jARP48919 .
_:jARP48920 <http://purl.org/atom/ns#title> "Atom 0.2 snapshot" .
_:jARP48920 <http://purl.org/atom/ns#link> "http://diveintomark.org/2003/08/05/atom02" .
_:jARP48920 <http://purl.org/atom/ns#id> "tag:diveintomark.org,2003:3.2397" .
_:jARP48920 <http://purl.org/atom/ns#issued> "2003-08-05T08:29:29-04:00" .
_:jARP48920 <http://purl.org/atom/ns#modified> "2003-08-05T18:30:02Z" .
_:jARP48918 <http://purl.org/atom/ns#entry> _:jARP48920 .
_:jARP48917 <http://purl.org/atom/ns#feed> _:jARP48918 .


And let's have a picture... (and a larger readable version). soapAtom-1-image-small.png

So where are we up to?

We're forgetting SOAP as a huge framework for building distributed Web services, XML protocol etc., and looking at just one part of it, the SOAP Encoding Data Model and its XML representation. We have a quick demo of a SOAP-ized Atom fragment (this version not using any extension namespaces or fancy stuff; that can come later) and the conversion of this into RDF/XML using XSLT. Unlike other Atom-to-RDF convertors, the approach here is entirely generic. The same strategy would work on RSS1-as-SOAPRDF, FOAF-as-SOAPRDF, etc., since we are mapping between similar data models rather than between application vocabularies. This reduces the extent to which we need to build special-case parsing machinery for handling Atom-to-RDF, since a generic SOAPEnc-to-RDF facility should serve that purpose.

What's the cost?

  • need to include some extra markup if we really want this to be handle-able by off the shelf SOAP tools (eg. deserializers). If not, not so bad.
  • SOAP Encoding (like RDF) imposes its own syntactic discipline on the chaos that is XML Namespace mixing. We'd need extension vocabs to stick to the rules

What are the benefits?

  • adopt rather than invent an abstract data model to guide extension of the format
  • set up camp in a comfortable REST-friendly middle ground between the Web Service and Semantic Web worlds. We don't need fancy tools from either camp, but should be able to benefit from both.

What's next?

  • try this with a more verbose example
  • try mixed-content; need a convention for doing that within SOAP (by switching encoding rules?)
  • try extension names (Jobs example; see AtomJobExample)

OK...

Time to sort out how we represent URIs. Here's the SOAP Enc markup:


   <author> 
      <name>Mark Pilgrim</name> 
      <foaf:weblog xsi:type="xsd:anyURI">http://diveintomark.org/</foaf:weblog>
    </author> 


...and here's how we want it to look in RDF:


    <author rdf:parseType="Resource" xmlns:foaf="http://xmlns.com/foaf/0.1/"> 
      <name>Mark Pilgrim</name> 
      <foaf:weblog rdf:resource="http://diveintomark.org/"/>
    </author> 


(or similar, we could do the namespace prefix thing differently, eg changing the default ns on the 'weblog' element)
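
A rough Python sketch of that rewrite, assuming the URI is flagged with the XML Schema xsi:type attribute (the construct Sam Ruby points to in the chat below). The function name is made up, and comparing only the local part of the type name is a shortcut: strictly, the prefix in the attribute value should be resolved against the in-scope namespaces.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def uri_elements_to_resources(root):
    """Turn <p xsi:type="xsd:anyURI">URI</p> into <p rdf:resource="URI"/>."""
    type_attr = f"{{{XSI}}}type"
    for element in root.iter():
        declared = element.attrib.get(type_attr, "")
        # Shortcut: look at the local part only, without resolving the prefix.
        if declared.split(":")[-1] == "anyURI" and element.text:
            element.set(f"{{{RDF}}}resource", element.text.strip())
            del element.attrib[type_attr]
            element.text = None
    return root


author = ET.fromstring(
    '<author xmlns="http://purl.org/atom/ns#"'
    ' xmlns:foaf="http://xmlns.com/foaf/0.1/"'
    ' xmlns:xsd="http://www.w3.org/2001/XMLSchema"'
    ' xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">'
    "<name>Mark Pilgrim</name>"
    '<foaf:weblog xsi:type="xsd:anyURI">http://diveintomark.org/</foaf:weblog>'
    "</author>"
)
uri_elements_to_resources(author)
weblog = author.find("{http://xmlns.com/foaf/0.1/}weblog")
print(weblog.attrib)
```

The parent element still needs its rdf:parseType="Resource"; that part is the generic edge-to-node handling, not anything URI-specific.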

TODO:

  • only emit an rdf:type subelement if we're not dealing with a Literal (MaxF)
  • currently our 2nd example isn't auto-converting OK, because of this.
  • bug Sam into helping with an Axis consumer app that parses this stuff! (DanBri)

From chat w/ Sam Ruby,

<danbri> RDF's syntax overhead for flagging a URI is seen by some as too costly.
<danbri> Putting type=anyURI attribute on an element (to my eyes) looks more likely to be accepted by 'vanilla XML'ists
<danbri> (even if they're SOAP skeptics w.r.t. big picture)
<rubys> IIRC, that really isn't a SOAP construct, it is an XML schema construct.  Just made popular by SOAP encoding.
<danbri> If we could find a profile of this which (a) parsed into RDF generically (b) could get some use out of say Axis' deserializer (c) imposed a cross-vocab discipline on namespaces, I'd be happy.
<rubys> http://www.w3.org/TR/xmlschema-1/#xsi_type
<rubys> I can code/test (b)
<danbri> re (b), that'd be great..



Currently an IslandTopic; should plumb in from FaqIdeas (or wherever Faqs live now, some confusion) or other pages about RDF syntax.