StripeSkipping

From W3C Wiki
Revision as of 21:28, 5 April 2007 by SandroHawke

The RDF striped syntax is all well and good, but it's clumsy sometimes. Often a stripe doesn't carry much information, and you'd like to skip typing it.

rdf:parseType="Resource" lets you skip from a property stripe to another property stripe, assuming you don't care to name the class which would be named in the intermediate stripe.

rdf:parseType="Collection" lets you skip from class stripe to class stripe, assuming the property you want is first/rest. Perhaps it should instead assume a "child" property which is a list, but... whatever.

If the parser knew which XML tags were property tags and which were class tags, it could do this automatically.

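One could treat case as significant: tags starting with a capital letter (Doc, Para) name classes, and tags starting with a lowercase letter (author, name) name properties.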
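As a minimal sketch of that idea, assuming (as in the sample document further down) that capitalized tags are classes and lowercase tags are properties, a parser could classify each tag by the first letter of its local name. The function name stripe_kind is made up for illustration and is not taken from StripeSkippingParser.py.

```python
def stripe_kind(tag):
    """Classify an XML tag as a 'class' or 'property' stripe by its case.

    Accepts a bare name ('Para'), a prefixed name ('ex:author'), or
    ElementTree's '{namespace}local' spelling.
    """
    # Drop a '{namespace}' part first, then any 'prefix:' part.
    local = tag.rsplit("}", 1)[-1].rsplit(":", 1)[-1]
    return "class" if local[:1].isupper() else "property"


for tag in ("Doc", "author", "{http://example.com/htont/}Para", "ex:target"):
    print(tag, "->", stripe_kind(tag))
```

With this in hand, a class tag found where a property is expected (or the reverse) signals that a stripe was skipped and has to be inferred.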

mumble mumble ... skipping some explanation, link to StructuredTextOntology.

    <Para>   <Emph>    ....
    characters identify individuals -- themselves.
    <Resource><uri>http:....</uri></Resource>

Write demo parser? It would read simple XML, do ns-concat, decide striping based on case, and need to recognize special namespaces for (1) naming the characters and (2) naming the "uri" property. And maybe the Resource class. (and rdf:List stuff)
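The demo parser described above might look roughly like the following. This is a simplified sketch under stated assumptions, not the code of StripeSkippingParser.py: it decides striping purely by case, does ns-concat by gluing namespace and local name together, uses rdf:li for every inferred property stripe, drops whitespace-only text, and keeps each text run whole instead of splitting it up. It does not handle the special namespaces for characters, "uri", or Resource. All function names are hypothetical.

```python
import itertools
import json
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDF_TYPE = f"<{RDF}type>"
RDF_LI = f"<{RDF}li>"


def parse_stripes(xml_text):
    """Return (root_node, triples) for simple XML, striping by tag case."""
    triples = []
    ids = itertools.count()

    def fresh():
        return f"_:g{next(ids)}"

    def uri(tag):
        # ElementTree spells a qualified name '{namespace}local';
        # ns-concat simply joins the two parts.
        return "<" + tag.replace("{", "").replace("}", "") + ">"

    def is_class(tag):
        return tag.rsplit("}", 1)[-1][:1].isupper()

    def add_text(subject, text):
        # Assumption of this sketch: whitespace-only runs are ignored.
        if text and text.strip():
            triples.append((subject, RDF_LI, json.dumps(text)))

    def individual(elem):
        """Class stripe: a fresh node typed by the tag."""
        subject = fresh()
        triples.append((subject, RDF_TYPE, uri(elem.tag)))
        contents(elem, subject)
        return subject

    def contents(elem, subject):
        """Hang the children of elem off subject, inferring skipped stripes."""
        add_text(subject, elem.text)
        for child in elem:
            if is_class(child.tag):
                # Class inside an individual: infer an rdf:li property stripe.
                triples.append((subject, RDF_LI, individual(child)))
            else:
                triples.append((subject, uri(child.tag), value(child)))
            add_text(subject, child.tail)

    def value(elem):
        """Property stripe: return the object of the arc."""
        children = list(elem)
        if not children:
            return json.dumps(elem.text or "")
        has_text = bool((elem.text or "").strip()) or any(
            (c.tail or "").strip() for c in children)
        if len(children) == 1 and is_class(children[0].tag) and not has_text:
            return individual(children[0])  # ordinary, unskipped striping
        # Anything else inside a property: infer an individual stripe.
        obj = fresh()
        contents(elem, obj)
        return obj

    return individual(ET.fromstring(xml_text)), triples


SAMPLE = ('<Doc xmlns="http://example.com/htont/"><author>'
          '<name>Sandro Hawke</name></author>'
          '<Para>Hi <Emph>there</Emph></Para></Doc>')
root, triples = parse_stripes(SAMPLE)
for s, p, o in triples:
    print(s, p, o, ".")
```

Because of the simplifications listed above, the node numbering and the set of arcs will differ from the output of the real parser.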

Of course using lists of characters makes things vaguely hideous.... But that's mostly from an N-Triples view. If you're inside a running program, the pseudo-DOM has a million fairly-cheap lists anyway.

--->>> DONE! See StripeSkippingParser.py. It even mostly works nicely.... It's a good start.

Given

<Doc xmlns="http://example.com/htont/">
   <author>
      <name>Sandro Hawke</name>
      <email>sandro@w3.org</email>
   </author>
   <heading>This is a sample document!</heading>

<Para>I would like <Link><target>http://foo.bar</target>you</Link>
to read this <Emph>wonderful</Emph>
document</Para>

</Doc>


it produced these pseudo N-Triples:

_:g0 <http://...#type> <http://example.com/htont/Doc> .
<> root _:g0 .
# need to infer a property stripe
_:g0 <http://...#li> "\n" .
_:g0 <http://...#li> "   " .
# need to infer an individual stripe
_:g0 <http://...#li> _:g1 .
# need to infer an individual stripe
_:g1 <http://example.com/htont/author> _:g2 .
# need to infer a property stripe
_:g2 <http://...#li> "\n" .
_:g2 <http://...#li> "      " .
# need to infer an individual stripe
_:g2 <http://...#li> _:g3 .
_:g3 <http://example.com/htont/name> "Sandro Hawke" .
_:g2 <http://...#li> "\n" .
_:g2 <http://...#li> "      " .
# need to infer an individual stripe
_:g2 <http://...#li> _:g4 .
_:g4 <http://example.com/htont/email> "sandro@w3.org" .
_:g2 <http://...#li> "\n" .
_:g2 <http://...#li> "   " .
_:g0 <http://...#li> "\n" .
_:g0 <http://...#li> "   " .
# need to infer an individual stripe
_:g0 <http://...#li> _:g5 .
_:g5 <http://example.com/htont/heading> "This is a sample document!" .
_:g0 <http://...#li> "\n" .
_:g0 <http://...#li> "\n" .
_:g6 <http://...#type> <http://example.com/htont/Para> .
_:g0 <http://...#li> _:g6 .
# need to infer a property stripe
_:g6 <http://...#li> "I would like " .
_:g7 <http://...#type> <http://example.com/htont/Link> .
_:g6 <http://...#li> _:g7 .
_:g7 <http://example.com/htont/target> "http://foo.bar" .
# need to infer a property stripe
_:g7 <http://...#li> "you" .
_:g6 <http://...#li> "\n" .
_:g6 <http://...#li> "to read this " .
_:g8 <http://...#type> <http://example.com/htont/Emph> .
_:g6 <http://...#li> _:g8 .
# need to infer a property stripe
_:g8 <http://...#li> "wonderful" .
_:g6 <http://...#li> "\n" .
_:g6 <http://...#li> "document" .
_:g0 <http://...#li> "\n" .
_:g0 <http://...#li> "\n" .



Todo:

  • add filter to turn rdf:li arcs into proper first/rest arcs
  • fit into some RDF framework as a proper parser
  • add support for XML attributes
  • write a serializer for documents in this form
  • glue together with HypertextOntology
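
The first to-do item could be sketched as a filter over (subject, predicate, object) tuples. This is only an illustration under assumptions: the page does not say which property should link a node to its list, so the CHILDREN URI below is a hypothetical placeholder, as are the function name and the "_:l" labels for list cells (assumed not to clash with labels already in the input).

```python
import itertools

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDF_LI = f"<{RDF}li>"
RDF_FIRST = f"<{RDF}first>"
RDF_REST = f"<{RDF}rest>"
RDF_NIL = f"<{RDF}nil>"
# Hypothetical property linking a node to its list of children.
CHILDREN = "<http://example.org/hypothetical#children>"


def li_to_first_rest(triples, children_property=CHILDREN):
    """Replace each subject's rdf:li arcs with one first/rest list."""
    members = {}  # subject -> li objects, in document order
    out = []
    for s, p, o in triples:
        if p == RDF_LI:
            members.setdefault(s, []).append(o)
        else:
            out.append((s, p, o))
    ids = itertools.count()
    for subject, items in members.items():
        # One list cell per member; the last cell's rest is rdf:nil.
        cells = [f"_:l{next(ids)}" for _ in items]
        out.append((subject, children_property, cells[0]))
        for cell, item, rest in zip(cells, items, cells[1:] + [RDF_NIL]):
            out.append((cell, RDF_FIRST, item))
            out.append((cell, RDF_REST, rest))
    return out


example = [
    ("_:g6", RDF_LI, '"I would like "'),
    ("_:g6", RDF_LI, "_:g7"),
    ("_:g7", "<http://example.com/htont/target>", '"http://foo.bar"'),
]
for triple in li_to_first_rest(example):
    print(*triple, ".")
```

The filter relies on the rdf:li arcs of each subject arriving in document order, which is the only place their order is recorded.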