Nearby: DesignIssues, RDF Striping, Missing Isn't Broken, A story about RDF and XML, rdf:nodeID, ...

Why Is RDF The Way It Is?

Rough notes towards design choice archeology. danbri@w3.org

Syntax issues: the XML encoding

There are often many ways of serializing a complex graph into a tree. Serializers can start with different objects, and split multi-referenced things out into different places in the tree. So that's just a fact about the problem space of encoding RDF's structure into XML trees.

Imagine 10 people, 10 courses, 10 topics, with a few relations (teaches, attended, contributedTo, uses, ...).

Another observation is that the way the relationship is named (wrote versus authored) is related to the encoding strategy a serializer will use.

<Document><author><Person><nick>danbri</nick></Person></author></Document> ...works fine, so long as the vocab you're using defines 'author'. If it defined 'made' instead, the encoding would need to start with Person, or use a cross-reference (via URI or bnode).
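A rough sketch of that point, using a toy serializer that only nests outgoing edges. The function name `to_tree`, the `NODES` set and the two tiny graphs are made up for illustration; this is not how any real RDF/XML serializer works.

```python
# Toy model: nodes are named by their type, everything else is literal text.
NODES = {"Document", "Person"}

def to_tree(node, triples):
    """Nest every outgoing edge of `node` as a child element (acyclic graphs only)."""
    if node not in NODES:
        return node  # literal text
    inner = "".join(
        f"<{p}>{to_tree(o, triples)}</{p}>" for s, p, o in triples if s == node
    )
    return f"<{node}>{inner}</{node}>"

# Same facts, two vocab designs: the edge points in opposite directions.
with_author = [("Document", "author", "Person"), ("Person", "nick", "danbri")]
with_made = [("Person", "made", "Document"), ("Person", "nick", "danbri")]

print(to_tree("Document", with_author))  # nests Person inside Document
print(to_tree("Person", with_made))      # has to start from Person instead
print(to_tree("Document", with_made))    # starting at Document loses the Person
```

With 'made', a serializer that starts at Document has nothing to nest, so it either starts at Person or falls back to a cross-reference.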

So there were two designs in tension: being very regular (the <rdf:Description> approach) versus being idiomatic/intuitive in XML. That tension is where the option to pull down an RDF type and use it as the element name came from. In sad fact, rdf:Description should never have been used. The default should have been rdf:Thing or Resource, so that node elements always took a type. Water under the bridge.

But that can never work 100% of the time, since RDF allows multiple types, even from different schemas, and the XML element name is only one place to put something.

So if someone wants to say that George Bush is both a Person and a Great Golfer, a reasonable task simply forces the situation of multiple ways of saying the same thing. There may be other things true of him as well; RDF's syntax only offers a shorthand for mentioning one class, though.

The key niceness is that we bother to document in RDF/XML the ways in which different idioms _can_ mean the same thing. Another case of multiple ways of saying the same thing: ordering rarely carries meaning, i.e. if <Person><foo>a</foo><bar>b</bar></Person> is true, you know that <Person><bar>b</bar><foo>a</foo></Person> is true, and crawlers etc. can safely throw away such irrelevant detail.
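A minimal sketch of why the ordering can be thrown away: if each flat node element is read as a set of (type, property, value) statements, the two orderings come out identical. The helper name `flat_triples` is invented here, and it only handles this one flat, namespace-free shape.

```python
import xml.etree.ElementTree as ET

def flat_triples(xml_text):
    """Read one flat node element as a set of (type, property, value) statements."""
    node = ET.fromstring(xml_text)
    return {(node.tag, child.tag, child.text) for child in node}

foo_first = flat_triples("<Person><foo>a</foo><bar>b</bar></Person>")
bar_first = flat_triples("<Person><bar>b</bar><foo>a</foo></Person>")

# A set has no order, so the document order of the property elements vanishes.
print(foo_first == bar_first)
```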

So RDF, in trying to help, gets beaten up! So unfair!

Finally, there is the attribute form.

The world wanted a way of putting RDF inside HTML docs.

This was even in pre-XHTML days. We expected XML Schema to make that legal at some point, knowing it broke DTD validation. (RDF doesn't fit w/ DTDs btw, cos we tried to be nice and allow idiomatic XML, i.e. allow unpredictable user-defined types to appear as element names.)

The deployment consideration was that RDF in the HEAD of an HTML page should not "spill onto the page", and in '97 that seemed to require putting content into attributes.

So... why not put it *all* in XML attributes? Why define <Person><eg:foo>a</eg:foo></Person> to carry the same meaning as <Person eg:foo="a"/> ? ARE WE JUST PERVERSE? ;)

Main stopper there: XML doesn't allow you to repeat attributes of the same name on an element, so a property with several values can't be written in attribute form.
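XML's well-formedness rules forbid repeating an attribute name on one element, while repeated child elements are fine. A quick check with Python's stdlib parser (the helper name `is_well_formed` is made up for this sketch):

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """True if the stdlib (expat-based) parser accepts the text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

repeated_elements = "<Person><foo>a</foo><foo>b</foo></Person>"
repeated_attributes = '<Person foo="a" foo="b"/>'

print(is_well_formed(repeated_elements))    # two values for foo, as elements
print(is_well_formed(repeated_attributes))  # same thing as attributes is rejected
```

So a property that needs two values has to fall back to the element form.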

But also, literal content sometimes has substructure. XML Literal etc., so there was a desire for a form that allowed a chunk of hypertext to appear as a property value.

OK so now you've got

<Person><eg:foo>a</eg:foo><eg:bar>b</eg:bar></Person>

and <Person><eg:bar>b</eg:bar><eg:foo>a</eg:foo></Person>

and <Person eg:bar="b"><eg:foo>a</eg:foo></Person>

and <Person eg:bar="b" eg:foo="a"/>

and (from XML, where attribute order doesn't matter) <Person eg:foo="a" eg:bar="b"/>

(ignoring other DTD and namespace variants at the XML layer for now).

Plus you've also got:

<rdf:Description eg:bar="b" eg:foo="a"><rdf:type rdf:resource="http://.../Person"/></rdf:Description>

and hence <rdf:Description><rdf:type rdf:resource="http://.../Person"/><eg:foo>a</eg:foo><eg:bar>b</eg:bar></rdf:Description> AND <rdf:Description eg:bar="b"><eg:foo>a</eg:foo><rdf:type rdf:resource="http://.../Person"/></rdf:Description>
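To make the "same meaning" claim concrete, here is a small sketch that reads the forms above into sets of triples and compares them. It is not an RDF/XML parser: it handles only one flat node element, uses a fixed placeholder subject, and does not distinguish literals from resources. The `EG` namespace URI is a stand-in for the elided http://.../ in the examples, and `uri`, `triples` and `FORMS` are names invented for this sketch.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
EG = "http://example.org/eg#"  # hypothetical namespace, not from the notes
NS = f'xmlns:rdf="{RDF}" xmlns:eg="{EG}" xmlns="{EG}"'

def uri(qname):
    """Turn ElementTree's '{namespace}local' form into a plain URI string."""
    return qname[1:].replace("}", "", 1)

def triples(xml_text):
    """Triples for one flat node element; the subject is the placeholder '_:x'."""
    node = ET.fromstring(xml_text)
    out = set()
    if node.tag != f"{{{RDF}}}Description":
        # Typed node element: the element name is shorthand for rdf:type.
        out.add(("_:x", RDF + "type", uri(node.tag)))
    for name, value in node.attrib.items():
        out.add(("_:x", uri(name), value))       # attribute form
    for child in node:
        # Property element: value is rdf:resource if present, else the text.
        value = child.get(f"{{{RDF}}}resource", child.text)
        out.add(("_:x", uri(child.tag), value))
    return out

FORMS = [
    f"<Person {NS}><eg:foo>a</eg:foo><eg:bar>b</eg:bar></Person>",
    f'<Person {NS} eg:bar="b"><eg:foo>a</eg:foo></Person>',
    f'<Person {NS} eg:bar="b" eg:foo="a"/>',
    f'<rdf:Description {NS} eg:bar="b" eg:foo="a">'
    f'<rdf:type rdf:resource="{EG}Person"/></rdf:Description>',
    f'<rdf:Description {NS}><rdf:type rdf:resource="{EG}Person"/>'
    f"<eg:foo>a</eg:foo><eg:bar>b</eg:bar></rdf:Description>",
]

results = [triples(form) for form in FORMS]
print(all(r == results[0] for r in results))
```

Five spellings, one set of three statements.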

...and this without getting into my first comment, that serializers can traverse a fancy graph in various ways, hence variation in terms of subelements etc. (and the interaction with vocab design, 'wrote' vs 'author').

But go back through the issues. The RDF M+S WG were not stupid people.

Should there be an attribute form for simple flat properties, so they can be embedded in HTML docs without spilling in downlevel browsers? Sure. Score points for realism and deployment considerations.

Should all properties be encoded as XML attributes? Nope. XML doesn't allow repetition of attributes, and some property values have structure (e.g. XML, hypertext, lang tagging etc.) so a markup idiom was needed too. Score a point for thinking ahead.

Should it be possible to use category words from the domain we're describing on XML elements that stand for a node in the graph? Sure. Intuitive and idiomatic.

Is this enough? No, since nodes in an RDF graph may have multiple types (not just from a class hierarchy), because we encourage mixing of multiple independent schemas for rich description. So a property form was needed too (rdf:type). Score 1 for open world, pluralism etc.

Should property-encoding element ordering matter? No; 'cos the data model doesn't care.

Should users therefore have to use properties in 1 canonical order? No; hard to specify! Alphabetic order? <bar> before <foo>? Goes crazy w/ I18N, as well as being a dumb idea anyway.

Should property element ordering be ignored? Yup; oops, that means, again, there are multiple ways to say the same thing.

And so on.

Sometimes there are emergent properties of a set of sensible, well motivated decisions grounded in a whole load of subtle constraints.

We (I say we, I turned up late to this bit of work) had constraints from the nature of the task (graphs into trees), from HTML browser deployment concerns, from XML arcana, and from RDF's data model (unordered). </rant>

See Dave's paper (@@url) for ideas on how to do this stuff in future.

http://www.ilrt.bris.ac.uk/discovery/chatlogs/foaf/2004-07-17#T13-10-13-1

Each bit of the design (maybe except parseType="Resource", I forget where that came from) has its reasons for existing.

Q: who came up with parseType="Resource" ?

http://lists.w3.org/Archives/Member/w3c-rdf-syntax-wg/1998Oct/0068 examples of parseType usage

Example 2: anonymous node with properties

<R:Description about='#ResA'>
  <PropA>
    <R:Description> <!-- ** here's the problem -->
      <PropA1>ValA1</PropA1>
      <PropA2>ValA2</PropA2>
      <PropA3>ValA3</PropA3>
    </R:Description>
  </PropA>
</R:Description>

can be written as

<R:Description about='#ResA'>
  <PropA R:parseType='Resource'>
    <PropA1>ValA1</PropA1>
    <PropA2>ValA2</PropA2>
    <PropA3>ValA3</PropA3>
  </PropA>
</R:Description>

--from Ralph, 23 Oct 98
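A sketch of why the two spellings are interchangeable: a reader that mints a blank node either for a nested Description without 'about' or for a parseType='Resource' property ends up with the same triples. This is a toy reader for just this shape of the 1998-era syntax, not a real parser. The function name `parse` and the namespace bound to the R: prefix are assumptions, and the comparison naively relies on blank nodes being numbered in document order.

```python
import itertools
import xml.etree.ElementTree as ET

R = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"  # assumed binding for R:
DESC, PARSETYPE = f"{{{R}}}Description", f"{{{R}}}parseType"

def parse(xml_text):
    """Return sorted (subject, property, object) triples for one Description."""
    ids = itertools.count()
    out = []

    def props(subject, elem):
        for prop in elem:
            kids = list(prop)
            if prop.get(PARSETYPE) == "Resource":
                obj = f"_:b{next(ids)}"   # the property element itself implies a node
                props(obj, prop)
            elif kids and kids[0].tag == DESC:
                obj = node(kids[0])       # explicit nested Description
            else:
                obj = prop.text           # plain literal value
            out.append((subject, prop.tag, obj))

    def node(elem):
        subject = elem.get("about") or f"_:b{next(ids)}"
        props(subject, elem)
        return subject

    node(ET.fromstring(xml_text))
    return sorted(out)

LONG = f"""<R:Description xmlns:R="{R}" about="#ResA">
  <PropA><R:Description>
    <PropA1>ValA1</PropA1><PropA2>ValA2</PropA2><PropA3>ValA3</PropA3>
  </R:Description></PropA>
</R:Description>"""

SHORT = f"""<R:Description xmlns:R="{R}" about="#ResA">
  <PropA R:parseType="Resource">
    <PropA1>ValA1</PropA1><PropA2>ValA2</PropA2><PropA3>ValA3</PropA3>
  </PropA>
</R:Description>"""

print(parse(LONG) == parse(SHORT))
```

The attribute saves one level of element nesting and changes nothing in the graph.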

http://lists.w3.org/Archives/Member/w3c-rdf-syntax-wg/1998Nov/0022.html 17 Nov, it went into spec.

Another motivating example:

<Description about="John_Smith">
  <n:weight>
    <rdf:value>200</rdf:value>
    <n:units rdf:resource="http://www.nist.gov/units/Pounds"/>
  </n:weight>
</Description>

Should there be <Description> around the value of <n:weight> like:

<Description about="John_Smith">
  <n:weight>
    <Description>
      <rdf:value>200</rdf:value>
      <n:units rdf:resource="http://www.nist.gov/units/Pounds"/>
    </Description>
  </n:weight>
</Description>

--(not ralph this time) @@mail ren?

There was an understanding that striping was sometimes inelegant, and that an extra attribute was less intrusive than an extra element nesting level (for the parseType="Resource" case).

Remember this was very early days for XML too. XML could've gone out the door without hooks for namespaces; RDF was a big motivator for that. XML Schema etc. didn't exist yet, nor did XPath, XQuery etc. Not in '97 at least, when the basics of the RDF design were settled.