Arch/XML Syntax Issues 1 - W3C RIF-WG Wiki

The XML syntax for all RIF dialects is a simple recursive serialization of the syntactic objects being interchanged. For instance, the serialization of a ruleset includes the serialization of each rule in the ruleset, and the serialization of each rule includes the serialization of each condition used in that rule.

Several issues have not yet been decided by the Working Group. These are not entirely orthogonal, but some attempt has been made to keep them as independent as possible.

Example of an XML serialization style:

<Dog>
    <iri>http://hawke.org/2005/Taiko</iri>
    <color>Black</color>
    <born><xsd:datetime>1995-05-28</xsd:datetime></born>
    <companion><Dog>
        <iri>http://hawke.org/2005/Tzuzumi</iri>
    </Dog></companion>
</Dog>

Another possible serialization style:

<Dog id="http://hawke.org/2005/Taiko" color="Black">
    <born><xsd:datetime>1995-05-28</xsd:datetime></born>
    <companion id="http://hawke.org/2005/Tzuzumi" />
</Dog>

AdrianGiurca: Another possible serialization:

<ex:Dog rif:id="http://hawke.org/2005/Taiko">
    <ex:born>
     <rif:TypedLiteral rif:type="xsd:datetime">1995-05-28</rif:TypedLiteral>
    </ex:born>
    <ex:color>
     <rif:TypedLiteral rif:type="ex:ColourEnumeration">Black</rif:TypedLiteral>
    </ex:color>
    <ex:companion>
     <ex:Dog rif:id="http://hawke.org/2005/Tzuzumi"/>
    </ex:companion>
</ex:Dog>

AdrianGiurca: I propose that the type attribute be mandatory.

1. Big Issues

1.1. Do we allow stripe-skipping?

(explained elsewhere)

SandroHawke: I lean against it, favoring ease of (correct) implementation over brevity and ease of human reading/writing.

DaveReynolds: Seconded. Non-stripe skipped is also easier to manipulate with XSLT et al.

GaryHallmark: After playing around a bit with converting the ASN06 of WD1 to XML Schema, I believe some stripe skipping "falls out" from these conversion rules:

Classes with subclasses are represented as a choice. Children of the choice are either
1. a ref to a group with name of the subclass, if such a group exists (i.e. the subclass itself has subclasses), or
2. an element with name of the subclass and type of the subclass.
If the class has superclasses, the choice is wrapped in a group definition and a group reference to that definition is wrapped in a complexType definition. Else the choice is wrapped in a complexType definition. Names of definitions are the same as the class name.
Classes without subclasses are represented as a sequence of properties, including properties in superclasses. Properties are represented as an element with name the same as the property name and with type the same as the property type. Lists are represented using maxOccurs on the element. The sequence is wrapped in a complexType definition with name the same as the class name.

See RIF3.xsd for the resulting schema. An example instance document is in RIF3.xml.

The conversion rules are simple and seem to yield a schema that is less verbose than non-skipped and yet still easy to use (at least from a JAXB perspective).
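As a rough illustration of the rules above (not a reproduction of RIF3.xsd), here is a minimal sketch for a hypothetical class hierarchy: Animal with subclasses Dog and Cat, plus a leaf class Person. The class names, the properties name, owner and companion, and the root element pet are all assumptions made for this example.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- Animal has subclasses and no superclass: a choice wrapped in a complexType -->
  <xsd:complexType name="Animal">
    <xsd:choice>
      <xsd:element name="Dog" type="Dog"/>
      <xsd:element name="Cat" type="Cat"/>
    </xsd:choice>
  </xsd:complexType>

  <!-- Leaf classes: a sequence of properties wrapped in a complexType -->
  <xsd:complexType name="Dog">
    <xsd:sequence>
      <xsd:element name="name" type="xsd:string"/>
      <xsd:element name="owner" type="Person"/>
      <!-- a list-valued property, expressed with maxOccurs -->
      <xsd:element name="companion" type="Animal" minOccurs="0" maxOccurs="unbounded"/>
    </xsd:sequence>
  </xsd:complexType>

  <xsd:complexType name="Cat">
    <xsd:sequence>
      <xsd:element name="name" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>

  <xsd:complexType name="Person">
    <xsd:sequence>
      <xsd:element name="name" type="xsd:string"/>
    </xsd:sequence>
  </xsd:complexType>

  <!-- Hypothetical document root, added only so that an instance can be validated -->
  <xsd:element name="pet" type="Animal"/>

</xsd:schema>
```

An instance of this sketch might look as follows. The owner property has a leaf class as its type, so its content appears directly with no Person stripe, while companion still shows a class element because Animal is a choice.

```xml
<pet>
  <Dog>
    <name>Taiko</name>
    <owner><name>Alice</name></owner>
    <companion>
      <Dog>
        <name>Tzuzumi</name>
        <owner><name>Alice</name></owner>
      </Dog>
    </companion>
  </Dog>
</pet>
```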

Here is a modified set of conversion rules that does not skip stripes:

Classes with subclasses are mapped to a choice. Children of the choice are either
1. a ref to the group with name of the subclass, if the subclass itself has subclasses, or
2. a ref to the element with name of the subclass
If the class has superclasses, the choice is wrapped in a group definition and a group reference to that definition is wrapped in a complexType definition. Else the choice is wrapped in a complexType definition. Names of definitions are the same as the class name.
Classes without subclasses are mapped to a sequence. Children of the sequence are mapped from the class properties, including properties in superclasses. A property is mapped to an element with name the same as the property name and with type the same as the property type. Lists are represented using maxOccurs on the element. An element definition is created with name the same as the class name. The element wraps a complexType that wraps the sequence. In addition to the element definition, if the class is the class of a property, a complexType definition is created with name the same as the class name. The complexType wraps a sequence that wraps an element that references the element with name the same as the class name.

See RIF4.xsd for the resulting schema. An example instance document is in RIF4.xml.

The conversion rules are simple enough and JAXB generates just 2 additional classes as compared to the skipped version. It is necessary to handle the name conflict of the element and complex type mapped from classes without subclasses. This can be done by choosing different names or by using JAXB customization instructions.
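A minimal sketch of the non-skipping rules (not a reproduction of RIF4.xsd), assuming a hypothetical hierarchy in which Animal has subclasses Dog and Cat and Person is a leaf class used as the type of an owner property. All class and property names here are assumptions made for this example.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

  <!-- Animal has subclasses and no superclass: a choice of element refs in a complexType -->
  <xsd:complexType name="Animal">
    <xsd:choice>
      <xsd:element ref="Dog"/>
      <xsd:element ref="Cat"/>
    </xsd:choice>
  </xsd:complexType>

  <!-- Leaf classes: an element definition wrapping a complexType wrapping the sequence -->
  <xsd:element name="Dog">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="name" type="xsd:string"/>
        <xsd:element name="owner" type="Person"/>
        <xsd:element name="companion" type="Animal" minOccurs="0" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>

  <xsd:element name="Cat">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="name" type="xsd:string"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>

  <xsd:element name="Person">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="name" type="xsd:string"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>

  <!-- Person is the class of a property (owner), so it also gets a complexType
       that wraps a reference to the Person element. The element and the type
       share a name, which is the conflict mentioned for JAXB. -->
  <xsd:complexType name="Person">
    <xsd:sequence>
      <xsd:element ref="Person"/>
    </xsd:sequence>
  </xsd:complexType>

</xsd:schema>
```

In an instance of this sketch every property value carries its class stripe, including owner:

```xml
<Dog>
  <name>Taiko</name>
  <owner><Person><name>Alice</name></Person></owner>
  <companion>
    <Dog>
      <name>Tzuzumi</name>
      <owner><Person><name>Alice</name></Person></owner>
    </Dog>
  </companion>
</Dog>
```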

AdrianGiurca: I argued in favor of the non-stripe-skipped form.

We do need to choose property names that indicate relationships between classes, rather than just repeat the class name. E.g. "Person has a companion that is a Dog" will look better when serialized than "Person has a dog that is a Dog".

1.2. Do we use an RDF/XML Normal Form?

RDF/XML can be used to serialize objects in XML. We could use a constrained subset of RDF/XML which allows strict schema validation. Should we? (The syntax would probably be a bit more complicated, but we get off-the-shelf parsers and specs.)

For example:

<Dog rdf:about="http://hawke.org/2005/Taiko">
    <color>Black</color>
    <born rdf:datatype="&xsd;datetime">1995-05-28</born>
    <companion rdf:resource="http://hawke.org/2005/Tzuzumi" />
</Dog>

is much like the other examples on this page, but is also RDF/XML. Parsers need not know anything about RDF/XML, though -- they can be written just using the RIF Core Schema. But by giving the identification and datatype properties the right names, the format can also be parsed by RDF/XML systems. The main cost may be in possible confusion.
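For an RDF/XML parser to accept such a fragment, the namespaces and the xsd entity have to be declared somewhere. A minimal sketch of a complete document follows. The vocabulary namespace http://example.org/pets# and the rdf:RDF wrapper are assumptions, and xsd:date is substituted for the datatype because the value shown has no time part.

```xml
<?xml version="1.0"?>
<!DOCTYPE rdf:RDF [
  <!ENTITY xsd "http://www.w3.org/2001/XMLSchema#">
]>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://example.org/pets#">
  <!-- the element name Dog becomes the rdf:type of the resource -->
  <Dog rdf:about="http://hawke.org/2005/Taiko">
    <color>Black</color>
    <born rdf:datatype="&xsd;date">1995-05-28</born>
    <companion rdf:resource="http://hawke.org/2005/Tzuzumi"/>
  </Dog>
</rdf:RDF>
```

An RDF/XML parser would read this as four triples about http://hawke.org/2005/Taiko: its type, a plain literal for color, a typed literal for born, and a resource reference for companion.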

1.3. Do we allow parsing without schema knowledge?

This design question affects many smaller questions. If schema knowledge is required for parsing, then (for instance) we don't need to specify the datatypes wherever data occurs (although we may want to anyway).

For example, in the instance:

<Dog>
   <weight>89</weight>
</Dog>

we can't tell whether the "89" should be turned into an integer, floating point number, or kept as a string -- unless we look at the schema.

In contrast, with

<Dog>
   <weight datatype="&xsd;int">89</weight>
</Dog>

we know it's an "int" even without the schema.
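Note that the &xsd; entity used here is not predefined in XML, so a schema-free parser only sees the full datatype IRI if the document declares the entity. A minimal sketch of a self-contained instance, assuming the entity expands to the customary XML Schema namespace followed by "#":

```xml
<?xml version="1.0"?>
<!DOCTYPE Dog [
  <!-- assumed expansion, so the attribute value becomes http://www.w3.org/2001/XMLSchema#int -->
  <!ENTITY xsd "http://www.w3.org/2001/XMLSchema#">
]>
<Dog>
   <weight datatype="&xsd;int">89</weight>
</Dog>
```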

2. Issues Related to Literals

See http://www.w3.org/TR/rdf-concepts/#section-Literals

2.1. Should string values be child-elements or attributes?

Child Elements:

<Dog>
   <color>Black</color>
</Dog>

Attributes:

<Dog color="Black" />

DaveReynolds: Minor preference in favour of child-elements (use attributes for annotations like typing, content for .. um .. content).

AdrianGiurca: I completely agree with Dave's comment, in the form:

<ex:Dog rif:id="http://hawke.org/2005/Taiko">
 <ex:color rif:type="ex:ColourEnumeration">Black</ex:color>
</ex:Dog>

2.2. Should datatypes look like class names?

Yes:

<Dog>
   <born><xsd:datetime>1995-05-30</xsd:datetime></born>
</Dog>

No:

<Dog>
   <born datatype="xsd:datetime">1995-05-30</born>
</Dog>

DaveReynolds: Minor preference in favour of the latter (same weak argument as above and consistency with xs:type, rdf:datatype).

AdrianGiurca: The same option as Dave in the form:

<ex:Dog>
  <ex:born>
   <rif:TypedLiteral rif:type="xsd:datetime">1995-05-28</rif:TypedLiteral>
  </ex:born>
</ex:Dog>

GaryHallmark: Another option:

<Dog>
    <born>xsd:datetime("1995-05-30")</born>
</Dog>

AdrianGiurca: I guess this is more difficult to parse and more error-prone.

2.3. Should strings require a datatype?

Yes:

<Dog>
    <color><xsd:string>Black</xsd:string></color>
</Dog>

No:

<Dog>
    <color>Black</color>
</Dog>

DaveReynolds: Minor preference for "no".

AdrianGiurca: Colour is a good example for an enumerated datatype:

<ex:color rif:type="ex:ColourEnumeration" xml:lang="en-US">Black</ex:color>

Are we going to allow that?

2.4. Where do we put the language tags?

Internationalization principles require us to allow users to tag strings with natural language.

Something like:

<Dog>
   <color lang="en-US">Black</color>
   <color lang="fr">Noir</color>
</Dog>

Note that language-tagged strings can't be xsd:strings. So if we do this, we need two kinds of strings.

DaveReynolds: Variable and rule names, comments, etc. should be internationalizable text because they will be presented to users (e.g. in rule editors), but most string literals are tests against object values and really are strings, not text. I can see an argument for a text datatype, but it should be a separate datatype and I don't think it is required for phase 1.

AdrianGiurca: I guess it is a good idea to use the xml:lang attribute defined in the XML Recommendation, whose values are language tags as defined by IETF RFC 3066.

<ex:color rif:type="ex:ColourEnumeration" xml:lang="en-us">Black</ex:color>
<ex:color rif:type="ex:ColourEnumeration" xml:lang="fr">Noir</ex:color>

3. Issues Related to Identifiers

3.1. Should identifiers be child-elements or attributes?

Child Elements:

<Dog>
    <iri>http://hawke.org/2005/Taiko</iri>
    <companion><Dog>
        <iri>http://hawke.org/2005/Tzuzumi</iri>
    </Dog></companion>
</Dog>

Attributes:

<Dog iri="http://hawke.org/2005/Taiko">
    <companion iri="http://hawke.org/2005/Tzuzumi" />
</Dog>

SandroHawke: I prefer Attributes, because I think the identifiers are fundamentally different from properties; they are something the parser should know about, for when there are loops in the transmitted objects.
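The loop case can be illustrated with a minimal sketch in the attribute style shown above. The mutual-companion data is an assumption made for this example. The innermost element carries only the identifier, so a parser that treats iri specially could recognize it as the Taiko object it is already building rather than create a second one. Whether an element carrying only an identifier should be read as a reference is itself one of the open questions on this page.

```xml
<Dog iri="http://hawke.org/2005/Taiko">
    <companion>
        <Dog iri="http://hawke.org/2005/Tzuzumi">
            <!-- points back at the outermost Dog, closing the loop -->
            <companion><Dog iri="http://hawke.org/2005/Taiko" /></companion>
        </Dog>
    </companion>
</Dog>
```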

AdrianGiurca: I also prefer attributes, but I favor a full syntax that makes the type of the resource easy to determine, i.e.:

<ex:Dog rif:id="http://hawke.org/2005/Taiko">
    <ex:companion>
    <ex:Dog rif:id="http://hawke.org/2005/Tzuzumi"/>
    </ex:companion>
</ex:Dog>

3.2. If attributes are used, should they be allowed directly on properties (as above), or should we force a class stripe?

Example:

<Dog iri="http://hawke.org/2005/Taiko">
    <companion><Dog iri="http://hawke.org/2005/Tzuzumi" /></companion>
</Dog>

DaveReynolds: Force a stripe (consistent with avoiding stripe skipping elsewhere).

3.3. If attributes are used, and allowed on properties, should the attribute name be the same?

For example, RDF/XML has the terms be different:

<Dog rdf:about="http://hawke.org/2005/Taiko">
    <companion rdf:resource="http://hawke.org/2005/Tzuzumi" />
</Dog>

SandroHawke: unless we're going to use the RDF names, I think they should be the same as each other.

DaveReynolds: if this usage is allowed the attribute names should be different, they are different concepts.

AdrianGiurca: This is a good example of serialization to RDF of the XML syntax previously discussed.

GaryHallmark: I'm used to something like:

<Document targetNamespace="http://hawke.org/2005/" tns="http://hawke.org/2005/">
    <Dog name="Taiko">
        <companion ref="tns:Tzuzumi"/>
    </Dog>
</Document>

I like this because:

http://hawke.org/2005/Taiko seems like an IRI but it doesn't ever need to appear in its full form (maybe this is bad from an RDF point of view?)
name v. ref makes it clear whether you are "defining" a new Dog or referencing a Dog

AdrianGiurca: I guess maybe this is appropriate too:

<Document xmlns="http://hawke.org/2005/" xmlns:pets="http://hawke.org/2005/pets" >
   <pets:Dog rif:id="pets:Taiko">
    <pets:companion> 
     <pets:Dog rif:ref="pets:Tzuzumi"/>
    </pets:companion>
   </pets:Dog>
</Document>

This again argues in favor of qualified names.

3.4. IRIs vs Local Identifiers

Can you have local identifiers, or only URIs, for the syntactic elements?

Local identifier using blank-node syntax:

<Dog iri="_:taiko">
    <companion><Dog iri="_:tzuzumi" /></companion>
</Dog>

Or using a different attribute name:

<Dog local_id="taiko">
    <companion><Dog local_id="tzuzumi" /></companion>
</Dog>

DaveReynolds: To me, locality of identifiers is a scoping issue, not a naming issue, and I'd rather just have IRIs, but I know that's not a shared view.

GaryHallmark: this is confusing because I don't know whether tzuzumi is a new Dog or a reference to a Dog defined elsewhere. I'd like to avoid bnodes.

AdrianGiurca: Looking at the model-theoretic semantics of RDF, I guess we need to avoid bnodes. Maybe this can be written as:

<ex:Dog rif:id="taiko">
 <ex:companion><rif:Var rif:type="Dog" rif:name="tzuzumi" /></ex:companion>
</ex:Dog>

3.5. Allow CURIEs? (and which style)

The issue here is how to use some kind of namespace declaration to avoid repeating long IRIs throughout the document.

No:

<Dog iri="http://hawke.org/2005/Taiko">

No (using standard XML entity references):

<Dog iri="&ns;taiko">

Yes (inside iri spot with brackets):

<Dog iri="[ns:taiko]">

Yes (with a different attribute name):

<Dog curie="ns:taiko">

GaryHallmark: I like the last option if its XML Schema declaration is

<attribute name="curie" type="xsd:QName"/>

Is there some reason why

<Dog iri="ns:taiko">

is bad?

AdrianGiurca: I prefer to use qualified names, since they make it easier to create or define local names when they are needed. I agree with the last solution proposed by Gary, but in the form:

<ex:Dog rif:id="ns:taiko"/>
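As a minimal sketch of what the QName-typed options rely on: a value of type xsd:QName is resolved against the namespace declarations in scope, so the ns prefix has to be bound somewhere above the attribute. The Document wrapper and the binding of ns to http://hawke.org/2005/ are assumptions made for this example.

```xml
<Document xmlns:ns="http://hawke.org/2005/">
    <!-- ns:taiko resolves to the pair (http://hawke.org/2005/, taiko) -->
    <Dog curie="ns:taiko" />
</Document>
```

Two points may be worth keeping in mind with this style. The local part of a QName must be an NCName, so it cannot contain characters such as "/" and not every IRI can be abbreviated this way. Also, a QName is a namespace and local-name pair, so the format would still need to say that the IRI is formed by concatenating the two.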