RDF Graph Literals and Named Graphs

Introduction

This document introduces a formal semantics for (RDF) Graph Literals and Named Graphs. Graph Literals allow applications to make statements on triples (eg, provenance) without really asserting them, ie, without ensuring their existence. Named Graphs makes it possible to assign a URI to a collection of triples, and being able to make statements on the whole set (in some ways, the URI of an RDF/XML file that contains a number of triples can be considered to be a Named Graph).

Whereas Graph Literals are defined in terms of Datatypes (ie, they do not require any addition or change to the core RDF Model and Semantics), the interpretation of Named Graphs is an extension to the core RDF Semantics. The term “Quoted Graph” can also be found in different tools, but this document used the term of Graph Literal to reinforce the relationship to RDF Literals.

The syntax and semantics of Named Graphs is inspired by the main ideas described in the paper of Named Graphs by J. Carroll et al.[[NAMED-GRAPHS]] although there are some additional features (namely union and intersection as well as the connection to graph literals). Ie, this document contains, essentially, a formalization of what has been described there.

The term “Named Graph” is actually problematic. The semantics below refers to Graphs, ie, sets of triples that are treated as a unit. “Name” can be a URI, but can also be a bnode identifier; it would be much cleaner to talk simply about Graphs instead. However, there might be a confusion with the terms used in the RDF Concepts document[[RDF-CONCEPTS]]. Also, the term “Named Graph” has also become widely used in the community.

The treatment of Graph Literals is largely in line with the treatment of similar constructs in N3[[N3]], although that tool does not define the constructs, strictly speaking, as part of the RDF datatype mapping.

Note that there are other approaches to define semantics for Named Graphs (although the term used is often 'Context', which may not be fully identical). A typical case is described by P. Pediaditis et al.[[PROV-RDF-S]]: it extends triples to quandruples by adding a reference to a URI to each of those, and then extend, eg, the RDFS inference rules to quadruples. These quadruples (or “quads”) actually corresponds to many triple store implementations (eg, AllegroGraph or RDFLib). (Actually, Pediaditis et al. go one step further by defining sets of names sets, called graphsets, with the appropriate inference rules.) In another paper ([[Coloring-RDF]]) the same research group uses the idea of quadruples to describe provenance mechanisms. Another approach is by Guha et al[[Context-SW]], which assigns a (context) URI to each resource. A recent paper of Renata et al.[[PROV-QUERY]] goes even further by defining quintiples instead of quadruples, with an extra stament identifier attached to quadruples; the goal is again (like in the case of the [[Coloring-RDF]] paper) to express provenance information.

The goal of this document was, however, to explore what can be achieved with a minimal departure from the current RDF model, keeping an ease of implementation and transition from current systems in mind. The semantics in this document have the advantage of not changing the core RDF semantics at all, ie, Named Graphs and Graph Literals appear as an extension rather than a change. There is no need for an extension of the current RDF serialization syntaxes (although some extension is convenient). The reuse of the datatype mechanism also means that many constructs, already defined by current standards (eg, datatype restrictions) become automatically valid for graph literals (at least conceptually; implementation still have to follow).

Prefix	Namespace URI
`rdfl:`	`http://www.w3.org/2009/rdfl#`
`rdfg:`	`http://www.w3.org/2009/rdfg#`
`rgfn:`	`http://www.w3.org/2009/rgfn#`

Graph Literals

Definition of Graph Literals

This specification introduces the notion of Graph Literals (also referred to as “quoted graphs”), an additional datatype denoted by the URI rdfl:GraphLiteral. The datatype is defined as follows.

Lexical Space:: Valid serialization of an RDF Graph in RDF/XML, Turtle, or N-Triples. Specific RDF serializations may restrict the serialization syntaxes they accept.
Value Space:: Valid RDF Graphs

To make the discussions below simpler, we introduce the G(l) notation, where l is a Graph Literal: G(l) denotes the RDF Graph obtained by interpreting (“parsing”) the value of l, ie, the corresponding value of the Literal. Using this notation, the Graph Literals v₁ and v₂ are defined to be equal if the graphs G(v₁) and G(v₂) are equivalent as defined in the RDF Semantics recommendation [[!RDF-MT]]. (Ie, the two graphs are identical except for a possible remapping of blank node identifiers.)

Note that blank nodes have local scopes to Graph Literals. Ie, two different Graph Literals have different blank nodes, even if the names used in the syntax are identical.

To use Graph Literals in applications, facets, functions, and operators on Graph Literals are also defined. Facets can be used, eg, to construct datatype restrictions in OWL 2[[!OWL2-OVERVIEW]]; functions and operators can be used with SPARQL[[!RDF-SPARQL-QUERY]] or with RIF[[!RIF-OVERVIEW]].

Constraining Facets for Graph Literals

A facet pair `(F v)` is in the facet space of `rdfl:GraphLiteral`, if…	Each such facet pair is mapped to a subset of the value space of `rdfl:GraphLiteral` containing…
… `F` is `rdfl:graphLiteralContains` and `v` is a Graph Literal	`{G(x): x is a GraphLiteral, and an instance of G(v) contains G(x)` (ie, `G(x)` is a subgraph of `G(v)` with a suitable mapping of blank node identifiers)
… F is `rdfl:graphLiteralIsContainedBy` and `v` is a Graph Literal	`{G(x): x is a GraphLiteral, and an instance of G(v) is contained by G(x)` (ie, `G(x)` is a supergraph of `G(v)` with a suitable mapping of blank node identifiers)

Functions for Graph Literals

This section uses “RDF Term” to denote either a URI or a bnode identifier.

Operator `rgfn:equivalentGraph`

xsd:boolean rgfn:equivalent-graph( RDF Term arg1, RDF Term arg2 )

The value is True if G(arg1) and G(arg2) are equivalent graphs, False otherwise.

Operator `rgfn:subgraph`

xsd:boolean rgfn:subgraph( RDF Term arg1, RDF Term arg2 )

The value is True if and instance of G(arg1) is a subgraph of G(arg2), False otherwise.

Syntax

Because Graph Literals are defined as Literals, the generic syntaxes defined for RDF automatically extend to Graph Literals, too. However, as referred to in the definition of the Graph Literal lexical space, specific RDF serialization syntaxes may restrict the serialization syntax accepted for Graph Literals.

Turtle

Turtle accepts N-Triple and Turtle syntax as serialiation syntaxes when used as the lexical value of Graph Literals. Also, @prefix declarations are automatically valid in the lexical space of Graph Literals (unless overwritten). For example, the following is a valid Turtle encoding for a Graph Literal:

@prefix ex: <http://www.example.com/>
ex:A ex:prop "ex:B ex:prop2 ex:C"^^rdfl:GraphLiteral .

However, just like Turtle offers a simplified syntax for some of the numeric literals, similar simplified syntax should also be accepted using the '{' and '}' characters. Ie, the example above could be written as:

@prefix ex: <http://www.example.com/>
ex:A ex:prop { ex:B ex:prop2 ex:C } .

RDF/XML

RDF/XML only accepts an RDF/XML encoding for Graph Literals. To differentiate the Graph Literals from the “real” triples encoded in the file, a special parse type value can be used. Ie:

<rdf:Description rdf:about="http://www.example.com/A">
    <ex:prop rdf:parseType="GraphLiteral">
      <rdf:Description rdf:about="http://www.example.com/B">
         <ex:prop2 rdf:resource="http://www.example.com/C"/>
      </rdf:Description>
    </ex:prop>
</rdf:Description>

which encodes the same Graph as in the Turtle examples. Here again, following the XML conventions, namespaces defined on the "top" level are automatically valid for the Graph Literal.

Examples

These examples are used to emphasize some characteristics of Graph Literals.

Triples are not asserted

It should be emphasized that triples within a Graph Literal are not asserted. In other words, the function G in the specification is only for definion purposes and not for the creation of those triples as part of the triples of the enclosing graph. For example, consider the graph, G₁:

@prefix ex: <http://www.example.com>
ex:A ex:prop { ex:B ex:prop2 ex:C } .

and the SPARQL query

PREFIX ex: <http://www.example.com/>
ASK WHERE { ex:B ex:prop2 ex:C . }

on that data; the result of the query is False.

Usage of constraining facets

Constraining facets in OWL 2 and using Graph Literals can produce restrictions on the graph. Consider the following Graph:

:GL a rdfs:Datatype ;
    owl:onDatatype rdfl:GraphLiteral ;
    owl:withRestrictions (
        [ rdfl:GraphLiteralContains { [] cc:attributionURL [] }]
    ) .

That restricts graph literals to those that contain a Creative Commons attribution triple. Using this datatype one can also define a class restriction like:

:A  a owl:Restriction ;
    owl:onProperty ex:p ;
    owl:allValuesFrom :GL .

Using these, an OWL processor would detect an inconsistency in the following Graph:

:x a :A;
    ex:p { q:a q:b q:c } .

but the following Graph is o.k.:

:y a :A;
    ex:p { q:a q:b q:c;
        cc:attributionURL <uri>
    } .

Semantics of Graph Literals

Formally, Graph Literals add to the core vocabulary and the semantics conditions of all variants of RDF Inference regimes. This section gives a formal definition of those additional conditions and axiomatic triples, although they contain no particular complications. Graph Literals extend the RDFS vocabulary to a larger, Graph Literal vocabulary rdfgV with more complex semantic constraints:

RDFL vocabulary

rdfl:GraphLiteral rdfl:graphLiteralContains rdfl:graphLiteralIsContainedBy

The rdfl-interpretation of the RDFL vocabulary are completely defined by the conditions in the table of RDFL semantic conditions below. An rdfl-interpretation of V is an rdfs-interpretation I of (V ∪ rdfV ∪ rdfsV ∪ rdflV) which also satisfies the following semantic conditions and all the triples in the subsequent table, called the RDFL axiomatic triples.

Additional RDFL semantic conditions involving Graph Literals.

If "xxxx"^^rdfl:GraphLiteral and G("xxxx") is a valid RDF Graph then IL("xxxx"^^rdfl:GraphLiteral) is the value of xxxx and IL("xxxx"^^rdfl:GraphLiteral) ∈ IG and IEXT(rdf:type) ∈ 〈IL("xxxx"^^rdfl:GraphLiteral), I(rdfl:GraphLiteral)〉

If "xxxx"^^rdfl:GraphLiteral and G("xxxx") is not a valid RDF Graph then IL("xxxx"^^rdfl:GraphLiteral) ∉ IG and IEXT(rdf:type) ∉ 〈IL("xxxx"^^rdfl:GraphLiteral), I(rdfl:GraphLiteral)〉

RDFL axiomatic triples.

rdfl:GraphLiteral rdf:type rdfs:Datatype . rdfl:GraphLiteral rdfs:subClassOf rdfs:Literal . rdfl:graphLiteralContains rdf:type owl:DatatypeProperty . rdfl:graphLiteralContains rdfs:range rdfl:GraphLiteral . rdfl:graphLiterlaIsContainedBy rdf:type owl:DatatypeProperty . rdfl:graphLiterlaIsContainedBy rdfs:range rdfl:GraphLiteral .

Named Graphs

Semantics of Named Graphs

The J. Carroll et al. paper[[NAMED-GRAPHS]] describes the semantics of named graphs. Here is a quote from the paper (the reference has been changed):

In more detail a set of Named Graphs N is a 5-tuple 〈N,V,U,B,L〉 where: U is a set of URIrefs; L is a set of literals (both plain and typed); B is a set of ‘blank’ nodes; U, B and L are pairwise disjoint; V = U ∪ B ∪ L is the set of nodes of N; N is a set of pairs forming a partial function from U to V × U × V . Each pair ng = (n,g) ∈ N is a Named Graph in N, and we write n = name(ng) and g = rdfgraph(ng). Thus, for each ng ∈ N, rdfgraph(ng) is an RDF graph (a set of triples) which is named name(ng).

We do not directly semantically interpret Named Graphs; however an RDF(S) interpretation I (as in [[!RDF-MT]]) conforms with a set of Named Graphs N when:

For every Named Graph ng ∈ N, we have I(name(ng)) = ng

RDFG Interpretations

Based on the definition in the quote, an extra vocabulary, rdfgV, can be defined that extends the RDFS vocabulary to a larger, named graph vocabulary with more complex semantic constraints:

RDFG vocabulary

rdfg:Graph rdfg:subGraphOf rdfg:equivalentGraph rdfg:unionOf
                                rdfg:intersectionOf rdfg:isGraph

Note, however, that the validity (ie, the correct interpretation) for these vocabulary terms are always bound to a specific set of Named Graphs N.

An rdfg-interpretation of V for a set of Named Graphs N (also referred to as “rdfg-interpretation in N”) is an rdfs-interpretation I of (V ∪ rdfV ∪ rdfsV ∪ rdfgV) which conforms with N as defined above, and which also satisfies the following semantic conditions as well as all the triples in the subsequent table, called the RDFG axiomatic triples (in N). (To make formulae more consise, the notation ngraph(x) is used as a shorthand for rdfgraph(name^-1(x))).

RDFG semantic conditions in N.

ICEXT(I(rdfg:Graph)) = dom(name)

If 〈x,y〉 ∈ IEXT(I(rdfg:subGraphOf)) iff x ∈ dom(name) and y ∈ dom(name) and ngraph(x) ⊆ ngraph(y)

IEXT(I(rdfg:subGraphOf)) is transitive and reflexive on dom(name)

If 〈x,y〉 ∈ IEXT(I(rdfg:equivalentGraph)) iff x ∈ dom(name) and y ∈ dom(name) and ngraph(x) = ngraph(y)

IEXT(I(rdfg:equivalentGraph)) is transitive, reflexive, and symmetric on dom(name)

If s is a sequence of g₁ , … , g_n ∈ dom(name) and 〈g,s〉 ∈ IEXT(I(rdfg:unionOf)) then ngraph(g₁) ∪ ngraph(g₂) ∪…∪ ngraph(g_n) ⊆ IGEXT(g)

If s sequence of g₁ , … , g_n ∈ IG and 〈g,s〉 ∈ IEXT(I(rdfg:intersectionOf)) then ngraph(g₁) ∩ ngraph(g₂) ∩…∩ ngraph(g_n) = IGEXT(g)

If 〈x,"yyyy"^^rdfl:GraphLiteral〉 ∈ IEXT(I(rdfg:isGraph)) iff x ∈ IEXT(I(rdfg:Graph)) and G("yyyy"^^rdfl:GraphLiteral) ⊆ V and ngraph(x) = G("yyyy"^^rdfl:GraphLiteral)

The first semantic conditions states that rdfg:Graph is a class denoting all URI-s that are assigned as names for named graphs. In all other conditions, equivalence, subsumption, etc, is taken in terms of b-node remapping. (See also note below.)

The semantic conditions also define a set of axiomatic triples on the RDFG vocabulary, namely:

RDFG axiomatic triples in N.

rdfg:Graph rdf:type rdfs:Class . rdfg:subGraphOf rdfs:domain rdfg:Graph . rdfg:subGraphOf rdfs:range rdfg:Graph . rdfg:equivalentGraph rdfs:domain rdfg:Graph . rdfg:equivalentGraph rdfs:range rdfg:Graph . rdfg:intersectionOf rdfs:domain rdfg:Graph . rdfg:unionOf rdfs:domain rdfg:Graph . rdfg:isGraph rdfs:domain rdfg:Graph . rdfg:isGraph rdfs:range rdfl:GraphLiteral .

RDFG Entailment with a set of Named Graphs N

S rdfg-entails in N E when every rdfg-interpretation in N which satisfies every member of S also satisfies E. This follows the wording of the definition of simple entailment in Section 2 of the RDF Semantics Recommendation [RDF-SEMANTICS], but refers only to rdfg-interpretations instead of all simple interpretations.

Since every rdfg-interpretation in N is an rdfs-interpretation, if S rdfg-entails E then it rdfs-entails E.

Syntax

No particular extra syntax is needed for Named Graphs, except for the syntax for Graph Literals.

Examples

Triples are asserted

The Graph Literal example above showed that Graph Literals do not assert any triples by themselves. The definition and semantics of rdfg:isGraph of Named Graphs will assert them, however. Eg, with the data:

@prefix ex: <http://www.example.com/>
<> rdfg:isGraph { ex:B ex:prop2 ex:C } .

the SPARQL query

PREFIX ex: <http://www.example.com/>
ASK
WHERE { ex:B ex:prop2 ex:C . }

will return True.

A slightly more general case, involving the GRAPH notion of SPARQL, is:

@prefix ex: <http://www.example.com/>
ex:G rdfg:isGraph { ex:B ex:prop2 ex:C } .

and, in this case, the SPARQL query

PREFIX ex: <http://www.example.com/>
ASK
FROM NAMED <http://www.example.com/G>
WHERE { GRAPH ex:g { ex:B ex:prop2 ex:C . } }

will return True.

Issues, Questions

Blank Nodes

An open issue is as follows: what is the scope of blank nodes in Named Graphs and how does one define that? The top level question is whether two named graphs can share a blank node or not.

The named graph paper[[NAMED-GRAPHS]] explicitly says that two named graphs should not share blank nodes on the syntax level:

For ng, ng' ∈ N with ng ≠ ng' then the blank nodes used in triples from rdfgraph(ng) are all distinct from those used in triples from rdfgraph(ng'), i.e. blank nodes cannot be shared between different graphs named in N.

Such strict disjointness is indeed reinforced for Graph Literals. It is not clear at this moment whether such general statements for all named graphs is useful, however. Eg, when triple stores are considered where the triples are clustered into several graphs for, say, provenance purposes. In such a setup it is perfectly concievable that two graphs would have overlapping triples involving blank nodes. It also makes the practical implementations of, say, subgraph relationship a bit more complicated. Indeed, a practical inference rule for the semantics is to say that if a triple belongs to graph G₁ then adding that triple to G₂; however, at present, this should also include replacing possible blank node identifiers by new ones.

B.t.w., the named graph construct of SPARQL is silent on the issue… That may very well be the approach to take, and has been done in this document.

Union: equal or subset?

The current semantic conditions define the union as a superset of the set operations in the semantic restrictions, rather than an equality of the sets. The main motivation was to be able to define simple entailement rules for implementation, which would not require something like a 'there exist a g' for the implementations of unions. It also seems to fit better an open world assumption. Nevertheless, maybe this is too restrictive, equality should be used, and implementer should do what is required both for the runtime and the syntax…

Usage (or not) of the Reification Vocabulary

The original paper J. Carroll et al.[[NAMED-GRAPHS]] also referred to the possibility to use the RDF reification vocabulary to construct Named Graphs. In that setting, the following RDF triples

<S> a rdf:Statement;
  rdf:subject   <s> ;
  rdf:predicate <p> ;
  rdf:object    <o> .

would yield a Named Graph <S>, containing the triple (<s> <p> <o>). However, such a formulation (in combination with the possible usage of owl:sameAs) would lead to the 'Superman problem'. As that has been one of the strong objections against the usage of the reification vocabulary in, for example, provenance related issues (while that is one of the obvious application areas for Named Graphs), it may be better not to include it.

Introduction

Conventions

Graph Literals

Definition of Graph Literals

Constraining Facets for Graph Literals

Functions for Graph Literals

Operator rgfn:equivalentGraph

Operator rgfn:subgraph

Syntax

Turtle

RDF/XML

Examples

Triples are not asserted

Usage of constraining facets

Semantics of Graph Literals

Named Graphs

Semantics of Named Graphs

RDFG Interpretations

RDFG Entailment with a set of Named Graphs N

Syntax

Examples

Triples are asserted

Issues, Questions

Blank Nodes

Union: equal or subset?

Usage (or not) of the Reification Vocabulary

Operator `rgfn:equivalentGraph`

Operator `rgfn:subgraph`