Position paper
W3C Rules language Workshop April 2005
Tim Berners-Lee, Dan Connolly, Eric Prud'hommeaux, Yosi Scharf
MIT/CSAIL Decentralized Information Group
This short paper summarizes experience at MIT/CSAIL in developing and using Notation3 (N3) as a language for RDF and as a rules language for the Semantic Web. N3 was developed as a simple syntax for RDF. Then, to make a rule language, graph literals and variables were added to N3. RDF properties were then introduced to express rules, web access, and built-in functions.
This paper is provided as input to the W3C workshop on rules languages for the semantic web. A more elaborate introduction is provided by the N3 Tutorial [Ber03].
The semantic web operates at the data level by (a) considering the semantics of any existing data and representing it as a graph of typed binary relationship arcs between items; and (b) using URIs to identify items, including the types of arcs. This is the RDF data model, and the RDF specifications provide a serialization format for such information in XML. [RDFC04]
From whiteboards to chat channels, it was found useful to have a minimalist syntax for jotting down and reading RDF. Notation3 is a language using conventional unix-style punctuation, which is both easier to write and easier to read than the RDF/XML syntax.
ex:c1 rdf:type ex:Car; ex:licensedYear 2002, 2003, 2004; ex:color "green".
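The semicolon and comma in this statement abbreviate a repeated subject and a repeated subject and predicate respectively. As a rough illustration only, with plain Python tuples standing in for RDF triples and a hypothetical helper named expand (this is not how cwm or any N3 parser works internally), the statement corresponds to five triples:

```python
# Plain tuples stand in for RDF triples (subject, predicate, object).
# This only illustrates the shorthand; it is not a parser.

def expand(subject, property_lists):
    """Expand {predicate: [objects]} for one subject into a set of triples."""
    return {
        (subject, predicate, obj)
        for predicate, objects in property_lists.items()
        for obj in objects
    }

triples = expand("ex:c1", {
    "rdf:type": ["ex:Car"],                 # ';' starts a new predicate, same subject
    "ex:licensedYear": [2002, 2003, 2004],  # ',' adds an object, same subject and predicate
    "ex:color": ["green"],
})

for triple in sorted(triples, key=str):
    print(triple)
```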
Various forms of literal value are allowed in RDF graphs; however, the RDF standard does not itself provide for another RDF graph to be a data value. Remedying this allows one to express relationships between graphs, for example that a given graph is the RDF content of a particular document. Agents on the semantic web need to be aware of where data has come from and where it is allowed to go, which raises a need to talk about graphs explicitly.
ex:Joe ex:said { ex:c1 ex:color "charcoal" }.
In its blank nodes (items in the graph not directly identified by a URI) an RDF graph has a form of existential variable. Extending the language to allow variables existentially or universally quantified over a graph allows N3 to be used for a form of logic. The initial motivation for this in N3 was that, given variables, a rule is just a relation between two graphs.
Variables are defined such that when substitution occurs in a graph, it also occurs in any nested graph.
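A minimal sketch of that substitution rule, assuming a graph is modelled as a Python frozenset of triples and a variable as a string beginning with "?" (both are modelling choices made here for illustration, not cwm's data structures):

```python
def substitute(term, bindings):
    """Replace variables in a term; nested graphs are handled recursively."""
    if isinstance(term, frozenset):  # a graph used as a term
        return frozenset(
            tuple(substitute(part, bindings) for part in triple)
            for triple in term
        )
    return bindings.get(term, term)  # a variable, URI, or literal

graph = frozenset({
    ("ex:Joe", "ex:said", frozenset({("?c", "ex:color", "charcoal")})),
    ("?c", "rdf:type", "ex:Car"),
})

# ?c is replaced both at the top level and inside the quoted graph.
result = substitute(graph, {"?c": "ex:c1"})
```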
In the <http://www.w3.org/2000/10/swap/log#> namespace, here given the prefix log:, the log:implies property expresses a rule, its subject being the antecedent graph, and the object being the consequent graph. The shorthand => may be used for log:implies.
{ ?x fam:brother ?y. ?y fam:son ?z } => { ?x fam:nephew ?z }.
The N3 rule engine built by the authors, cwm, is a crude forward chaining reasoner operating with such rules. Rules may have full N3, even with nested graphs, on both sides of the implication. This gives a form of completeness as rules can generate rules. When used as a rule language on RDF alone, N3 can of course be constrained so that there is no nesting of graphs.
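To make the forward chaining idea concrete, here is a small sketch for the constrained case with no nested graphs. It is not cwm's algorithm: the function names are hypothetical, triples are Python tuples, and the single rule used says that a brother's son is a nephew. Rules are applied repeatedly until no new triple is produced.

```python
def match(pattern, triple, bindings):
    """Unify one pattern triple with one ground triple; return bindings or None."""
    new = dict(bindings)
    for p, t in zip(pattern, triple):
        if isinstance(p, str) and p.startswith("?"):  # variable
            if new.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return new

def solutions(patterns, facts, bindings=None):
    """Yield every set of bindings that satisfies all patterns."""
    bindings = bindings or {}
    if not patterns:
        yield bindings
        return
    for fact in facts:
        b = match(patterns[0], fact, bindings)
        if b is not None:
            yield from solutions(patterns[1:], facts, b)

def forward_chain(facts, rules):
    """Apply (antecedent, consequent) rules until a fixed point is reached."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for antecedent, consequent in rules:
            # Materialize the matches before adding new facts.
            for b in list(solutions(antecedent, facts)):
                for triple in consequent:
                    derived = tuple(b.get(x, x) for x in triple)
                    if derived not in facts:
                        facts.add(derived)
                        changed = True
    return facts

rules = [(
    [("?x", "fam:brother", "?y"), ("?y", "fam:son", "?z")],
    [("?x", "fam:nephew", "?z")],
)]
facts = {("ex:Al", "fam:brother", "ex:Bob"), ("ex:Bob", "fam:son", "ex:Cy")}
closure = forward_chain(facts, rules)
```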
The fact that the rule language and the data language are the same gives a certain simplicity (there is only one syntax) and completeness (rules can operate on themselves, anything written in N3 can be queried in N3). This would be broken if a special syntax were added for built-in functions and operators. Instead, these are simply represented as RDF properties. The cwm engine, when analyzing a rule prior to running it, treats specially those properties it knows as calculable functions which occur in the antecedent.
{ ex:d test:point ?x. ?x math:sin ?y } => {...}
Across the wide range of applications in which we hope the Semantic Web will be deployed, different engines are expected to implement different sets of functions. One can also expect to be able to dynamically load software to implement new functions. In addition, cwm can be told that a particular property is defined by a particular remote document or remote service. This means that the treatment of a property can change dynamically as it becomes calculable. The boundary between "built-in" functions and other properties is not well defined. All this speaks against built-in functions being brought out as special syntax, and supports the use of RDF properties for them.
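The following sketch shows one way an engine could treat a property as calculable without any special syntax. It assumes a hypothetical registry named BUILTINS and a simple antecedent matcher; neither is taken from cwm. A pattern whose predicate is in the registry and whose subject is already bound is computed, and any other pattern is looked up in the data. Registering a new function at run time changes how the same property is treated.

```python
import math

# Hypothetical registry of calculable properties; it can grow at run time.
BUILTINS = {"math:sin": math.sin}

def is_var(term):
    return isinstance(term, str) and term.startswith("?")

def unify(term, value, bindings):
    """Bind or compare one pattern term against one value."""
    if is_var(term):
        return bindings.setdefault(term, value) == value
    return term == value

def solutions(patterns, facts, bindings):
    """Yield bindings satisfying all antecedent patterns, in order."""
    if not patterns:
        yield bindings
        return
    (s, p, o), rest = patterns[0], patterns[1:]
    s, o = bindings.get(s, s), bindings.get(o, o)
    if p in BUILTINS and not is_var(s):
        value = BUILTINS[p](s)  # computed, not looked up
        if is_var(o):
            yield from solutions(rest, facts, {**bindings, o: value})
        elif o == value:
            yield from solutions(rest, facts, bindings)
        return
    for fact in facts:  # ordinary property: consult the data
        new = dict(bindings)
        if all(unify(t, v, new) for t, v in zip((s, p, o), fact)):
            yield from solutions(rest, facts, new)

facts = {("ex:d", "test:point", 0.0)}
sin_rule = [("ex:d", "test:point", "?x"), ("?x", "math:sin", "?y")]
cos_rule = [("ex:d", "test:point", "?x"), ("?x", "math:cos", "?y")]

sin_result = list(solutions(sin_rule, facts, {}))
before = list(solutions(cos_rule, facts, {}))  # math:cos is just a property
BUILTINS["math:cos"] = math.cos                # now it becomes calculable
after = list(solutions(cos_rule, facts, {}))
```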
Using properties as built-in functions raises the common question in RDF of how n-ary functions are represented. The choice taken in cwm was to use RDF lists (collections) to group the multiple arguments to a function:
{ ?x a ex:TestData. ( ?x 1 ) math:sum ?y. ( ?y " is one more than " ?x ) string:concatenation ?s } => { ?s a ex:Result }.
This may require [BP] attributing more tuple-like semantics to lists than they come with out of the RDF box.
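As an illustration of the tuple-like reading, the sketch below treats an RDF list as a Python tuple that is handed whole to the function. The registry, the evaluate helper, and the use of str() for the lexical form of numbers are assumptions made here, not cwm behaviour.

```python
# Hypothetical built-ins that take a list (modelled as a tuple) as subject.
BUILTINS = {
    "math:sum": lambda args: sum(args),
    # Assumption: numbers are rendered with str() when concatenated.
    "string:concatenation": lambda args: "".join(str(a) for a in args),
}

def evaluate(steps, bindings):
    """Run (list_subject, property, output_variable) steps in order."""
    bindings = dict(bindings)
    for args, prop, out in steps:
        # Replace variables in the list by their current values.
        ground = tuple(bindings.get(a, a) for a in args)
        bindings[out] = BUILTINS[prop](ground)
    return bindings

steps = [
    (("?x", 1), "math:sum", "?y"),
    (("?y", " is one more than ", "?x"), "string:concatenation", "?s"),
]
result = evaluate(steps, {"?x": 3})
print(result["?s"])
```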
The built-in function log:semantics accesses a resource, retrieves a representation of it, parses that and returns the graph. (Currently, cwm will parse RDF/XML, and N3 and its subsets; GRDDL may be added later.)
Another function, log:includes, checks whether one graph is a subset of the other. Together, these allow rules to access the web, and to objectively check the contents of documents, without having to load them and believe everything they say. In this example the master.rdf file is checked to see what it says is an order, and those orders are checked to see what items they mention. At no time is either file trusted for any other information.
@forAll v:DOC, v:G1, v:ORDER, v:y. { <master.rdf> log:semantics v:G1. v:G1 log:includes { v:DOC a biz:CustomerOrder }. v:DOC log:semantics v:ORDER. v:ORDER log:includes { [] biz:item v:y }. } => { v:DOC ex:orderItem v:y }.
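The same pattern can be sketched in Python under some simplifying assumptions: a dictionary named WEB stands in for fetching and parsing documents, includes handles only single-triple patterns, and None plays the role of [] in a pattern. The document contents are invented for the illustration. Each graph is only queried, never merged into the engine's own knowledge, so unrelated statements in it have no effect.

```python
# Hypothetical stand-in for the web: document URI -> already parsed graph.
WEB = {
    "master.rdf": {
        ("order1.rdf", "rdf:type", "biz:CustomerOrder"),
        ("order1.rdf", "ex:discount", "100%"),  # never believed
    },
    "order1.rdf": {
        ("_:o", "biz:item", "ex:c1"),
        ("ex:c1", "ex:price", 0),               # never believed
    },
}

def semantics(uri):
    """Stand-in for log:semantics: return the graph a document parses to."""
    return WEB[uri]

def includes(graph, pattern):
    """Stand-in for log:includes with a single-triple pattern."""
    for triple in graph:
        bindings = {}
        for p, t in zip(pattern, triple):
            if p is None:  # [] in the pattern matches anything
                continue
            if isinstance(p, str) and p.startswith("?"):
                if bindings.setdefault(p, t) != t:
                    break
            elif p != t:
                break
        else:
            yield bindings

def order_items(master_uri):
    """Derive ex:orderItem triples, trusting the documents for nothing else."""
    derived = set()
    master = semantics(master_uri)
    for b in includes(master, ("?DOC", "rdf:type", "biz:CustomerOrder")):
        order = semantics(b["?DOC"])
        for c in includes(order, (None, "biz:item", "?y")):
            derived.add((b["?DOC"], "ex:orderItem", c["?y"]))
    return derived

derived = order_items("master.rdf")
```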
Whereas some datasets (such as a list of members of a club) are definitively complete, others (such as a set of temperature measurements) are not: one never knows when evidence of another may come to light. This aspect of the semantic web makes negation as failure meaningless unless it is associated with a specific dataset.
Just as RDF statements on the semantic web are reusable by other parties, and combinable with others to make larger applications, so also it is a design goal for semantic web rules that they be reusable in a similar way.
The effect of a default with an explicit domain is achieved with log:notIncludes, the negation of log:includes. In the example below, if an order has an item which is a car, and the order doesn't say that the car has some color, then the car is black.
{ <thisOrder.rdf> log:semantics ?ORDER. ?ORDER log:includes { ?x biz:item ?y. ?y a ex:Car }; log:notIncludes { ?y ex:color [] } } => { ?y ex:color "black" }.
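A minimal sketch of this scoped default, assuming the order document has already been parsed into a set of Python tuples (the data and the function name default_colors are hypothetical). The absence test looks only at the order graph passed in, so a color stated in some other document does not affect the result.

```python
def default_colors(order):
    """For each car item with no color stated in `order`, derive black."""
    derived = set()
    for _, predicate, item in order:
        if predicate != "biz:item":
            continue
        if (item, "rdf:type", "ex:Car") not in order:
            continue
        # Negation is checked against this one graph only.
        has_color = any(s == item and p == "ex:color" for s, p, _ in order)
        if not has_color:
            derived.add((item, "ex:color", "black"))
    return derived

order = {
    ("_:o", "biz:item", "ex:c1"),
    ("ex:c1", "rdf:type", "ex:Car"),
    ("_:o", "biz:item", "ex:c2"),
    ("ex:c2", "rdf:type", "ex:Car"),
    ("ex:c2", "ex:color", "red"),
}
defaults = default_colors(order)
```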
The semantics of this are a great improvement on negation as failure with an undisclosed domain, but the syntax is clumsy, and syntactic sugar could be investigated.
The authors have considered introducing binary operator syntax as syntactic sugar. This would be a general extension to the N3 syntax. The need for it has not been sufficiently acute to date to merit increasing the complexity of the language.
The language outlined here has been used as a data language by many implementations. It has also been used as a rule language by Euler [DR05], a backward-chaining reasoner, and Pychinko [Par05], a rete-based rule engine. The fact that the rule language has been used in fairly different engines is encouraging.
Subsets of N3 have been published as NTriples [RDFT04] and Turtle [Beck04].
The N3 extensions to RDF have also been used to represent queries and patches. [Ber04]
N3 was developed with much discussion with Jos de Roo and Sean Palmer and others in the RDF Interest Group, now the Semantic Web Interest Group. Thanks to everyone involved.
[RDFC04] Latest version available at http://www.w3.org/TR/rdf-concepts/
[RDFT04] Latest version available at http://www.w3.org/TR/rdf-testcases