Arch/XML Syntax Issues 2 - W3C RIF-WG Wiki

The XML syntax for all RIF dialects is a simple recursive serialization of the syntactic objects being interchanged. For instance, the serialization of a ruleset includes the serialization of each rule in the ruleset, and the serialization of each rule includes the serialization of each condition used in that rule.

Several issues have not yet been decided by the Working Group. These are not entirely orthogonal, but some attempt has been made to keep them as independent as possible.

1. Is the XML serialization a subset of RDF/XML?

This issue connects to many of the other issues here.

Reasons for using RDF/XML subset:

It's an existing syntax that meets our need
Allows rules to process rules (RIF documents) as semantic-level data instead of just as XML trees, without schema/dialect awareness
Allows the design of the syntax to borrow more directly from experience and existing W3C Semantic Web Recommendations
Allows RDF tools to examine rules as data
Allows input handlers to re-use RDF parsers
Appears very supportive of the Semantic Web

Reasons against:

Does not follow the style and experience of the rule-related W3C Member Submissions (SWRL, SWSF, WRL)
Does not follow the style of the core W3C Web Services Recommendations, which are based directly on XML (SOAP, WSDL)
OWL 1.0 had difficulties using RDF/XML for its serialization
Could raise the barrier to entry for Production Rules and Business Rules users who do not work with RDF
The resulting XML syntax might be more verbose
The resulting XML syntax might have extra parts
Precludes very elegant XML syntaxes, such as can be obtained with stripe skipping
The elements from the RDF namespace may confuse people
The use of mixed namespaces with RIF and RDF may make it less convenient to use some XML tools (e.g., XML Schema validators and XSLT translators)
Some people feel that RDF as the serialization syntax of another language leads to a known impedance mismatch: RDF descriptions make an open-property assumption whereas syntactic descriptions must be closed by definition

1.1. Semantic Loop?

Question: There seems to be agreement that RDF graphs may be included in RIF by a normative embedding, i.e., including a triple (s p o.) by its translation to a RIF slot s[p->o]. Now, this slot in turn has a RIF/RDF serialization (splitting the RIF slot into a set of triples T), and each of the triples in T again has an embedding in RIF. How do we resolve this semantic cycle?

Answer: The cycle here is the same cycle that occurs in Prolog because clauses have the same syntax as terms. You can read a fact as a term, and assert it as part of another clause. You can read that clause as a term, and assert it as part of another clause, etc. But there is no actual problem here.

age(sam, 5).                                   % simple fact
claimed(fred, age(sam, 5)).                    % that fact turned into a term in another fact
claimed(jones, claimed(fred, age(sam, 5))).    % ... etc

This syntactic proposal does not define anything like 'claimed', it just aligns the rule syntax and the frame knowledge syntax, so that such things do not need to provide a mapping between the two.

1.2. Additional Issues/Requirements

In connection with the previous issue, I would like to discuss the following requirements in the group:

RDF Ground facts preserved in RIF/RDF i.e.
```
 s p o. (in RDF/XML) =?=  s p o. (in RIF/RDF)
 
```
Do we want RDF data to be preserved in its RIF reading? If not, i.e., if we reify triples, shouldn't we rather reuse the existing reification vocabulary of RDF?
It should be guaranteed that (RDF-)merging the RDF version of a ruleset r with some RDF graph g cannot change the RIF meaning of the individual rules in r; i.e., it should be possible to merge rulesets with rulesets, and rulesets with data, but not to merge data into rules, simply by merging the representing RDF graphs.

Reason for facts being preserved:

RDF and RIF would not differ on simple collections of ground facts.
Modularity: Adding data to a ruleset could simply be done by trivial RDF merge.

Reason against facts being preserved:

Probably not an easy win to get from the current XML proposal and the proposed embedding of RDF facts into RIF.

Reason for single rules not changeable by RDF merge:

Modularity: Adding data to a ruleset could simply be done by trivial RDF merge, without the danger that the rules themselves would be changed.

Reason against single rules not changeable by RDF merge:

Reflection??? (I still see no worked out use case for this in UCR, though)

The whole idea of RDF is based on the ease of merging RDF graphs; if we have a RIF/RDF version, we should follow this rationale as well.
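The merge concern in this section can be made concrete with a minimal sketch. It assumes ground graphs only (no blank nodes, which a real RDF merge would have to standardize apart) and models a graph as a Python set of (subject, predicate, object) tuples. All `ex:` names, including the `ex:if` and `ex:then` properties used to encode a rule, are hypothetical and not part of any RIF/RDF proposal.

```python
def merge(*graphs):
    """Merge ground RDF graphs, each modeled as a set of (s, p, o) tuples."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def rule_body(graph, rule):
    """Collect the conditions attached to a rule node via the hypothetical ex:if property."""
    return {o for (s, p, o) in graph if s == rule and p == "ex:if"}


# A ruleset naively encoded as triples (hypothetical vocabulary).
ruleset = {
    ("ex:rule1", "ex:if", "ex:condA"),
    ("ex:rule1", "ex:then", "ex:conclB"),
}
# Ordinary ground facts: merging them leaves the rule untouched.
facts = {("ex:sam", "ex:age", "5")}
# A graph that happens to mention the rule node: merging it alters the rule.
intruder = {("ex:rule1", "ex:if", "ex:condC")}

body_after_facts = rule_body(merge(ruleset, facts), "ex:rule1")
body_after_intruder = rule_body(merge(ruleset, intruder), "ex:rule1")
```

In this encoding, merging plain facts preserves the rule, while any graph that asserts further `ex:if` triples about `ex:rule1` silently extends its body. That is the situation the second requirement is meant to exclude.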

2. If so: is the XML root element 'rdf:RDF'?

Moved to Arch/XML Syntax Issues/Root Element.

3. Does the XML instance document indicate where order is significant?

The parts of XML documents, being sequential data structures, are inherently ordered. In some cases the order of elements in the document is significant, while in others it is not. For example, the order of rules in a ruleset might have important semantic consequences in one dialect while the order might have no meaning in another. In BLD, the order of rules in a Ruleset, of conjuncts in an And (Horn premise), of disjuncts in an Or, and of slots in a Uniterm or Frame is not significant, as specified by the formal BLD semantics.

When the order has no significance, systems may store the data in more efficient unordered containers. The choice of ordered versus unordered containers could also be used at the interchange-format level to signal to programmers whether order is significant, which affects how they write related software. Alternatively, semantic attributes can be used to mark elements as ordered or unordered at the interchange-format level (the default being unordered). At the implementation level, different optimized data structures can then be chosen according to the high-level specification. The question here is whether input processors can determine whether order is significant just by examining an input document, or whether they need to refer to the specification of the relevant dialect first.

(This is part of "Do we allow parsing without schema knowledge?" in Arch/XML_Syntax_Issues_1)

Reasons for including order information:

This information may be helpful for some implementation approaches
Required in RDF/XML

Reasons against including order information:

It's not usually done in XML
It complicates the syntax (with addition of some flag or alternate construct)
Order-independence is a matter for the semantics, not syntax, to specify.

With the right default, ordered="no", logics such as BLD can be easily represented in XML and can be conveniently modified to ordered dialects using an explicit ordered="yes" where required. For example, a BLD <Ruleset> is equivalent to <Ruleset ordered="no">. For a Prolog-like dialect <Ruleset ordered="yes"> can be used.
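As a rough illustration of how an input processor might act on such a flag without consulting the dialect specification, the following sketch reads the `ordered` attribute and picks a container accordingly. The `<rule>` child element and its text content are hypothetical placeholders, and the missing-attribute default of "no" is taken from the proposal above.

```python
import xml.etree.ElementTree as ET


def load_children(element):
    """Return the children's text in a container matching the 'ordered' flag."""
    # Assumption: a missing attribute means ordered="no", as proposed in the text.
    ordered = element.get("ordered", "no") == "yes"
    texts = [child.text for child in element]
    # A tuple keeps document order; a frozenset discards it (and also drops duplicates).
    return tuple(texts) if ordered else frozenset(texts)


bld = ET.fromstring("<Ruleset><rule>r1</rule><rule>r2</rule></Ruleset>")
bld_swapped = ET.fromstring("<Ruleset><rule>r2</rule><rule>r1</rule></Ruleset>")
prolog_like = ET.fromstring(
    '<Ruleset ordered="yes"><rule>r1</rule><rule>r2</rule></Ruleset>'
)
prolog_swapped = ET.fromstring(
    '<Ruleset ordered="yes"><rule>r2</rule><rule>r1</rule></Ruleset>'
)

same_when_unordered = load_children(bld) == load_children(bld_swapped)
same_when_ordered = load_children(prolog_like) == load_children(prolog_swapped)
```

Here the two unordered rulesets compare equal after loading, while the two ordered ones do not.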

4. Are there local pointers in the XML serialization?

(This is a modified version of "IRIs vs Local Identifiers" in Arch/XML_Syntax_Issues_1, based on the idea that if we need identifiers for parts of the syntax, local ones need to be allowed -- but we might not need them.)

Pointers are needed on some level if we want structure sharing or if the underlying instance data has loops. (Loops are avoided when developing a well-founded logic.)

In the RDF/XML example, these local pointers would be NodeIDs. In the general XML example, these local pointers would be id/idrefs.

Reasons for Local Pointers:

Allows structure sharing (see below)
Some dialects may have loop structures in their data

Reasons against:

Makes serialization and deserialization more complicated
Loop structures can also be represented using symbolic labels
Pointers should be avoided on the level of rule interchange (specifying the "What"), as they can be introduced on the implementation level (realizing the "How")
Reuse of rules is harder when they need to be disentangled from general graph structures, and easier when rules can be picked at the roots of their tree structures
As far as we know, all practical rule languages use syntax that avoids local pointers. It is unclear why an XML syntax should depart radically from the established practice of defining syntaxes for rule languages.

5. If so, is structure sharing allowed, mandated, or forbidden?

Structure sharing occurs when some part of the syntax tree occurs multiple times and, instead of repeating the serialization at each occurrence, a back-pointer is serialized.
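A minimal sketch of a serializer that shares structure might look as follows. Terms are modeled as Python strings (constants) or tuples of the form (op, arg, ...). The `Term` and `Ref` element names and the `id`/`idref` attribute names are assumptions made for this illustration only; no RIF syntax for back-pointers has been decided.

```python
import xml.etree.ElementTree as ET


def serialize(term, seen):
    """Serialize a term given as a string (constant) or a tuple (op, arg, ...)."""
    if isinstance(term, str):
        const = ET.Element("Const")
        const.text = term
        return const
    if term in seen:
        # Repeated subtree: emit a back-pointer instead of the full structure.
        return ET.Element("Ref", {"idref": seen[term]})
    node_id = "n%d" % len(seen)
    seen[term] = node_id
    op, *args = term
    node = ET.Element("Term", {"id": node_id, "op": op})
    for arg in args:
        node.append(serialize(arg, seen))
    return node


# f(g(a), g(a)): the subterm g(a) occurs twice.
shared = ("g", "a")
root = serialize(("f", shared, shared), {})
xml_text = ET.tostring(root, encoding="unicode")
```

The second occurrence of g(a) is written as a `Ref` pointing at the first. Sharing here is detected by structural equality of the tuples, which is one possible policy among several.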

Reasons to mandate structure sharing:

Saves higher-level work on input processing (e.g., no symbol tables needed for rule variables)
RIF documents will be smaller

Reasons to forbid it:

Structure sharing is somewhat complicated and implementation-level (see above)
The Equal predicate of RIF BLD can be used as a declarative way to obtain small RIF documents by naming structures once and reusing their names as often as necessary

Reasons to have it optional:

Allows simpler serializers (they don't have to implement it if they don't want to)

6. What XML serialization for variables?

Does a variable look the same where it's declared as where it's used? When/if variables are typed (sorted), will the type be repeated at each use? Will a pointer be used?

Since the multisorted logic was abandoned, a variable in BLD cannot be constrained to have only (long) integer values. However, if this were re-introduced in analogy to similarly constrained constants, it would be declared thus:

<Exists>
   <declare>
      <Var type="xsd:long">x</Var>
   </declare>
   <formula>
      ...
   </formula>
</Exists>

But imagine declaring a variable that is constrained to only have integer values as follows:

<Exists>
   <declare>
      <Var id="var_x">
         <name>x</name>
         <type>http://www.w3.org/2001/XMLSchema#int</type>
      </Var>
   </declare>
   <formula>
      ...
   </formula>
</Exists>

Now we have three options on using it:

6.1. Use Variable Name to Use Variable

<Uniterm>
   ...
   <VarReference>
      <name>x</name>
   </VarReference>
   ...
</Uniterm>

Note that we probably want to call it a "VarReference" instead of a "Var" because its syntax is different.

Actually, "Var" should be used since there is no syntactic distinction between bound and free variables in a pure logic such as BLD:

<Uniterm>
   ...
   <Var>x</Var>
   ...
</Uniterm>

Reasons For (Use Variable Name to Use Variable):

Fairly simple and well-understood

6.2. Repeat Variable Structure to Use Variable

<Uniterm>
   ...
      <Var>
         <name>x</name>
         <type>http://www.w3.org/2001/XMLSchema#int</type>
      </Var>
   ...
</Uniterm>

Actually,

<Uniterm>
   ...
      <Var type="xsd:long">x</Var>
   ...
</Uniterm>

Reasons For (Repeat Variable Structure to Use Variable):

Shows the type directly at the variable occurrence

Reasons against:

Increases variable serialization size somewhat redundantly, since the type can also be found by looking back to the declare

6.3. Use Variable ID for Reference

Suppose we add an ID to the declaration:

<Exists>
   <declare>
      <Var id="var_x">  <!-- Note the "id" attribute here -->
         <name>x</name>
         <type>http://www.w3.org/2001/XMLSchema#int</type>
      </Var>
   </declare>
   <formula>
      ...
   </formula>
</Exists>

Actually,

<Exists>
   <declare>
      <Var id="var_x" type="xsd:long">x</Var> <!-- Note the "id" attribute here -->
   </declare>
   <formula>
      ...
   </formula>
</Exists>

Then we can use it by just making a reference:

<Uniterm>
   ...
      <Var id="var_x" />
   ...
</Uniterm>

and the input processor (without knowing anything about the dialect or about variables) will produce, via structure sharing, an internal data structure like the one in option 2 (section 6.2).
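How such an input processor might resolve the references can be sketched as follows. The sketch assumes that the declaration precedes every use in document order and that a `Var` element with text content is a declaration while an empty one is a reference; both are assumptions of this illustration, not decisions of the Working Group. The `op` of the Uniterm is omitted for brevity.

```python
import xml.etree.ElementTree as ET

DOC = """
<Exists>
   <declare>
      <Var id="var_x" type="xsd:long">x</Var>
   </declare>
   <formula>
      <Uniterm>
         <arg><Var id="var_x" /></arg>
         <arg><Var id="var_x" /></arg>
      </Uniterm>
   </formula>
</Exists>
"""


def resolve_vars(root):
    """Map each empty <Var id=.../> to the object built from its declaration."""
    declared = {}
    uses = []
    for var in root.iter("Var"):  # iter() walks the tree in document order
        var_id = var.get("id")
        if var.text and var.text.strip():
            # Declaration: build the shared variable object once.
            declared[var_id] = {"name": var.text.strip(), "type": var.get("type")}
        else:
            # Reference: reuse the very same object (structure sharing).
            uses.append(declared[var_id])
    return declared, uses


declared, uses = resolve_vars(ET.fromstring(DOC))
```

Both uses end up pointing at the single object created for the declaration, so the type is available at each occurrence without being repeated in the document.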

Reasons For (Use Variable ID for Reference):

Much more terse than repeating structures
Makes the semantics of Declaration/Reference more available at syntactic level -- they become all about the same variable, not the same variable name.
Reuses a mechanism (structure sharing) from a lower level

Reasons Against (Use Variable ID for Reference):

May be confusing to people who are used to thinking of variables as character strings (instead of as things named by character strings).
Adds unnecessary bloat to the RIF specification.
May be error-prone, since ids must be kept unique.

7. What XML serialization for local named constants?

Example from BLD0921, with both book and LeRif made local:

<Uniterm>
  <op><Const type="rif:local">book</Const></op>
  <arg><Var>Author</Var></arg>
  <arg><Const type="rif:local">LeRif</Const></arg>
</Uniterm>

(fully striped, where names become content between element tags)

8. What XML serialization for global (IRI) named constants?

Example from BLD0921, with both book and LeRif made global:

<Uniterm>
  <op><Const type="rif:iri">http://example.com/products#book</Const></op>
  <arg><Var>Author</Var></arg>
  <arg><Const type="rif:iri">http://example.com/books#LeRif</Const></arg>
</Uniterm>

Example from eg7b:

     <Uniterm>
       ...
       <op>
         <GlobalConstant>
           <name>http://example.com/app#r</name>
         </GlobalConstant>
       </op>
     </Uniterm>

9. What XML serialization of literal data values?

Example from BLD0921:

   <arg><Const type="xsd:long">49</Const></arg>

(fully striped)

Example from eg7b:

   <Literal_int><intValue>1</intValue></Literal_int>
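For the BLD0921 style, where the datatype travels in the `type` attribute, a consumer could dispatch on that attribute roughly as sketched below. Only the three type names that appear in this page are handled. The returned (kind, value) pairs are a hypothetical internal representation, and Python's `int()` is somewhat more permissive than the xsd:long lexical space, so a real implementation would need stricter validation.

```python
import xml.etree.ElementTree as ET

# xsd:long is a 64-bit signed integer.
XSD_LONG_MIN = -(2 ** 63)
XSD_LONG_MAX = 2 ** 63 - 1


def const_value(const):
    """Convert a <Const> element to a (kind, value) pair based on its type attribute."""
    const_type = const.get("type")
    text = const.text or ""
    if const_type == "xsd:long":
        value = int(text)  # raises ValueError on non-numeric content
        if not XSD_LONG_MIN <= value <= XSD_LONG_MAX:
            raise ValueError("out of range for xsd:long: %s" % text)
        return ("literal", value)
    if const_type in ("rif:iri", "rif:local"):
        # Named constants keep their name as a string.
        return (const_type, text)
    raise ValueError("unsupported Const type: %r" % const_type)


literal = const_value(ET.fromstring('<Const type="xsd:long">49</Const>'))
local = const_value(ET.fromstring('<Const type="rif:local">LeRif</Const>'))
```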

10. Are builtins distinguished in the syntax from logical functions and predicates?

Assumption so far has been: no (except in their IRI path names).

11. If so: What syntax is used?

Example from BLD0921:

<Uniterm>
  <op><Const type="rif:iri">http://www.w3.org/2007/rif/builtin/greaterThan</Const></op>
  <arg><Var>diffdate</Var></arg>
  <arg><Const type="xsd:long">10</Const></arg>
</Uniterm>

(fully striped, where names become content between element tags)
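Under the working assumption of issue 10, that builtins differ from other functions and predicates only in their IRI path names, a processor could recognize them with a prefix test such as the one sketched here. The prefix is taken from the BLD0921 example above; whether all builtins would share it is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET

# Assumption: builtins share the IRI prefix seen in the BLD0921 example.
BUILTIN_PREFIX = "http://www.w3.org/2007/rif/builtin/"


def is_builtin(uniterm):
    """Return True if the Uniterm's operator is an IRI constant under the builtin prefix."""
    op = uniterm.find("op/Const")
    return (
        op is not None
        and op.get("type") == "rif:iri"
        and (op.text or "").startswith(BUILTIN_PREFIX)
    )


greater_than = ET.fromstring(
    "<Uniterm>"
    '<op><Const type="rif:iri">http://www.w3.org/2007/rif/builtin/greaterThan</Const></op>'
    "<arg><Var>diffdate</Var></arg>"
    '<arg><Const type="xsd:long">10</Const></arg>'
    "</Uniterm>"
)
book = ET.fromstring(
    "<Uniterm>"
    '<op><Const type="rif:iri">http://example.com/products#book</Const></op>'
    "<arg><Var>Author</Var></arg>"
    "</Uniterm>"
)

builtin_flags = (is_builtin(greater_than), is_builtin(book))
```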