
RDF bNode Semantics and Interoperability with Rules Languages

An RDF triple has the form (s,p,o), where s and o may be bNodes. "bNode" stands for "blank node", which refers to the fact that the corresponding nodes in the RDF graph are "blank", i.e., have no label.

On this page we first review bNodes and the semantics of bNodes in RDF, identify the problems in combining the RDF bNode semantics with rules languages, and propose a number of possible ways of dealing with the issue.

Blank Nodes in RDF

bNodes serve the purpose of representing resources which cannot be named at present. For example, bNodes allow us to represent something like "somebody whose name is 'John' and who is 25 years old". This can be represented using the following triples:
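As a minimal sketch, Python tuples can stand in for the triple notation. The bNode label _:x and the property names name and age are assumptions chosen for illustration, not part of any vocabulary defined here:

```python
# Each triple is a (subject, predicate, object) tuple; a graph is a set of triples.
# The label "_:x" and the property names "name" and "age" are illustrative assumptions.
john_graph = {
    ("_:x", "name", '"John"'),
    ("_:x", "age", '"25"'),
}

def is_bnode(term):
    """A term is a blank node if its label starts with '_:'."""
    return term.startswith("_:")

# Both triples share the same blank node, so they describe the same unnamed resource.
subjects = {s for (s, p, o) in john_graph}
```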

In this particular notation, blank nodes always start with '_:', followed by a name to identify the blank node, so that the same blank node can be referenced in several triples of the same graph (a graph is a set of triples).


Now, bNodes are not only used to represent resources which cannot be named, but also to overcome some syntactical limitations of RDF and to facilitate the encoding of other languages (e.g., OWL DL) in RDF.

Take the triple (_:X, name, "John"). By the RDFS semantic conditions [1], every plain literal must be in the class extension of rdfs:Literal, which means that <"John", rdfs:Literal> must be a tuple in the relation associated with rdf:type. One may therefore expect that a graph consisting of the triple (_:X, name, "John") entails the triple ("John", rdf:type, rdfs:Literal). However, this is not a valid triple in RDF, since RDF does not allow literals in the subject position of a triple. The RDFS semantics therefore approximates this triple by using a blank node. We can illustrate this as follows:
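The approximation can be sketched as follows. This is an illustration only: it assumes triples are Python tuples and literals are strings wrapped in double quotes, and the helper name literal_type_triples is hypothetical. It mimics the idea of allocating a bNode to each literal rather than implementing the full set of RDFS entailment rules.

```python
from itertools import count

def is_literal(term):
    # Assumption: literals are written with surrounding double quotes.
    return term.startswith('"') and term.endswith('"')

def literal_type_triples(graph):
    """For each literal in object position, allocate a fresh bNode standing
    for that literal and assert that the bNode has type rdfs:Literal."""
    fresh = count(1)
    allocated = {}   # literal -> bNode allocated to it
    derived = set()
    for (s, p, o) in sorted(graph):
        if is_literal(o):
            if o not in allocated:
                allocated[o] = f"_:l{next(fresh)}"
            b = allocated[o]
            derived.add((s, p, b))                         # same triple, bNode in place of the literal
            derived.add((b, "rdf:type", "rdfs:Literal"))   # the bNode is typed as a literal
    return derived

derived = literal_type_triples({("_:X", "name", '"John"')})
```

Here the derived triple (_:l1, rdf:type, rdfs:Literal) plays the role of the disallowed triple ("John", rdf:type, rdfs:Literal).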

This introduction of bNodes can be done not only for literals, but for any kind of resource:

Note also that the names of bNodes are not important. For example:
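A small sketch of this point, assuming graphs are sets of Python tuples (the helper name equivalent_up_to_renaming and the example graphs are hypothetical): two graphs that differ only in their bNode labels are treated as the same graph. The brute-force search over renamings is only meant for tiny examples.

```python
from itertools import permutations

def is_bnode(term):
    return term.startswith("_:")

def bnodes(graph):
    return sorted({t for triple in graph for t in triple if is_bnode(t)})

def rename(graph, mapping):
    # Apply the renaming to every term simultaneously.
    return {tuple(mapping.get(t, t) for t in triple) for triple in graph}

def equivalent_up_to_renaming(g1, g2):
    """True if some one-to-one renaming of bNodes turns g1 into g2."""
    b1, b2 = bnodes(g1), bnodes(g2)
    if len(b1) != len(b2):
        return False
    return any(rename(g1, dict(zip(b1, perm))) == g2
               for perm in permutations(b2))

g1 = {("_:x", "name", '"John"'), ("_:x", "age", '"25"')}
g2 = {("_:y", "name", '"John"'), ("_:y", "age", '"25"')}
same = equivalent_up_to_renaming(g1, g2)
```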

In fact, it turns out that bNodes can be interpreted as existentially quantified variables in first-order logical languages.

RDF bNodes in RIF

Although we recognize that there is an ongoing debate on how RDF triples are best represented using predicate formulas (see http://lists.w3.org/Archives/Public/public-rif-wg/2006Jan/0000.html), we choose, for reasons of convenience, to represent each triple (s,p,o) using a ternary predicate triple(s,p,o).

For this particular issue it does not matter whether triples are represented using a ternary predicate or binary predicates (e.g., p(s,o)). The binary representation does, however, bring additional challenges for certain SPARQL query patterns, namely those that contain a variable in place of the predicate name, because such patterns require a higher-order syntax (see also Higher-Orderness).
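To illustrate why the ternary encoding is convenient for such patterns, the following sketch matches a pattern with a variable in predicate position against triple(s,p,o) facts. The match helper, the constant john, and the use of '?'-prefixed strings for query variables are assumptions made for this example. With the ternary predicate, the predicate name is just another argument, so no higher-order syntax is needed.

```python
def is_var(term):
    return term.startswith("?")

def match(pattern, graph):
    """Return variable bindings for every triple that matches the pattern."""
    results = []
    for triple in sorted(graph):
        binding = {}
        ok = True
        for pat, val in zip(pattern, triple):
            if is_var(pat):
                # A repeated variable must be bound to the same value each time.
                if binding.setdefault(pat, val) != val:
                    ok = False
                    break
            elif pat != val:
                ok = False
                break
        if ok:
            results.append(binding)
    return results

facts = {("john", "name", '"John"'), ("john", "age", '"25"')}
bindings = match(("john", "?p", "?o"), facts)
```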


bNodes are represented by existentially quantified variables and an RDF graph corresponds to the existential closure of a conjunction of atomic formulae of the form triple(s,p,o).

For example, the RDF graph:

can be represented as:
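As a sketch of this translation, the code below builds the existential closure for a small graph. The graph reuses the "John" description and is an assumption, as are the helper name to_formula and the textual EXISTS/AND syntax. The sketch also assumes that bNode labels do not clash with constant names once the '_:' prefix is dropped.

```python
def is_bnode(term):
    return term.startswith("_:")

def to_formula(graph):
    """Existential closure of the conjunction of triple(s,p,o) atoms.
    Each bNode becomes an existentially quantified variable."""
    triples = sorted(graph)
    variables = {}
    for triple in triples:
        for term in triple:
            if is_bnode(term) and term not in variables:
                variables[term] = term[2:]   # "_:x" -> variable "x"
    atoms = [
        "triple(" + ", ".join(variables.get(t, t) for t in triple) + ")"
        for triple in triples
    ]
    body = " AND ".join(atoms)
    if not variables:
        return body
    return "EXISTS " + ", ".join(variables.values()) + " (" + body + ")"

graph = {("_:x", "name", '"John"'), ("_:x", "age", '"25"')}
formula = to_formula(graph)
# EXISTS x (triple(x, age, "25") AND triple(x, name, "John"))
```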

Although this type of formula is perfectly valid in first-order logic, it is not a Horn formula (see also Horn_Rules_Semantics) and thus does not fall in the domain of traditional rules languages.

RDF triples can be seen as facts, i.e., rules without a body, so an RDF graph containing bNodes corresponds to a fact with existentially quantified variables in its head. Traditional rule languages based on Horn logic and the Herbrand model semantics cannot deal with the unknown individuals that such variables introduce.


Through a process of skolemization (i.e., replacing all existentially quantified variables with fresh constants), one can eliminate the existentially quantified variables. bNodes which have been skolemized are sometimes called "rigid bNodes".
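A minimal sketch of skolemization over a graph of Python tuples follows. The function name skolemize and the "sk" prefix for fresh constants are assumptions, and the sketch presumes that no existing constant already uses that prefix.

```python
from itertools import count

def is_bnode(term):
    return term.startswith("_:")

def skolemize(graph, prefix="sk", counter=None):
    """Replace every bNode with a fresh constant; the same bNode
    gets the same constant throughout the graph."""
    counter = counter if counter is not None else count(1)
    constants = {}
    result = set()
    for triple in sorted(graph):
        new_triple = []
        for term in triple:
            if is_bnode(term):
                if term not in constants:
                    constants[term] = f"{prefix}{next(counter)}"
                term = constants[term]
            new_triple.append(term)
        result.add(tuple(new_triple))
    return result

graph = {("_:x", "name", '"John"'), ("_:x", "age", '"25"')}
skolemized = skolemize(graph)
```

After this step the graph contains only constants, so it can be read as a set of ordinary ground facts.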

However, skolemization does not solve all the problems. More specifically, when naively skolemizing RDF graphs, entailment may be lost.

Consider the two graphs:

Clearly, S rdf-entails E. The skolemization of S, denoted Sk(S), also rdf-entails E. However, Sk(S) does not rdf-entail Sk(E), since the occurrences of ?x are replaced with different fresh skolem constants.
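The loss of entailment can be reproduced with a small sketch. The graphs S and E below are hypothetical stand-ins, and the check implements simple entailment (mapping the bNodes of the entailed graph into the entailing graph) as an approximation of rdf-entailment that is sufficient for this example.

```python
from itertools import count, product

def is_bnode(term):
    return term.startswith("_:")

def terms(graph):
    return sorted({t for triple in graph for t in triple})

def simple_entails(g, e):
    """g entails e if the bNodes of e can be mapped to terms of g
    so that every triple of e becomes a triple of g."""
    e_bnodes = [t for t in terms(e) if is_bnode(t)]
    for image in product(terms(g), repeat=len(e_bnodes)):
        mapping = dict(zip(e_bnodes, image))
        mapped = {tuple(mapping.get(t, t) for t in triple) for triple in e}
        if mapped <= g:
            return True
    return False

def skolemize(graph, counter):
    """Replace each bNode with a fresh constant drawn from a shared counter."""
    constants = {}
    def sk(term):
        if is_bnode(term):
            if term not in constants:
                constants[term] = f"sk{next(counter)}"
            return constants[term]
        return term
    return {tuple(sk(t) for t in triple) for triple in sorted(graph)}

fresh = count(1)                 # shared, so each graph gets different constants
S = {("a", "p", "_:x")}          # hypothetical example graphs
E = {("a", "p", "_:y")}
Sk_S = skolemize(S, fresh)       # {("a", "p", "sk1")}
Sk_E = skolemize(E, fresh)       # {("a", "p", "sk2")}

results = (simple_entails(S, E), simple_entails(Sk_S, E), simple_entails(Sk_S, Sk_E))
# results == (True, True, False)
```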

Conclusion

There are three general ways to deal with the bNode issue.


[1] Hayes, P. 2004. RDF Semantics. W3C Recommendation, 10 February 2004. http://www.w3.org/TR/rdf-mt/.

[2] Rosati, R. 2006. DL+log: Tight integration of description logics and disjunctive datalog. In KR2006.

[3] Eiter, T.; Lukasiewicz, T.; Schindlauer, R.; and Tompits, H. 2004. Combining answer set programming with description logics for the semantic web. In Proc. of the International Conference of Knowledge Representation and Reasoning (KR2004).