Containers of Triples

From RDF Working Group Wiki
Revision as of 12:37, 5 May 2012 by Sandro (Talk | contribs)

(Or: Types of "Graphs")

In RDF, people use the term "graph" in many different ways, referring to many different kinds of collections of triples. Sometimes the distinctions do not matter; sometimes they do. This page points out some of the cases where they do.

See also the older Graph Terminology, which explains the terms nicely, but in an abstract way. Here we are trying to be very concrete about the differences.

Applicable Metadata Properties

| Property | g-text | g-snap | g-box | frozen g-box |
|---|---|---|---|---|
| "Last Modified" (What was the most recent time there was a change in which triples it contained?) | No | No | Yes | Yes [1] |
| "Creation Time" (When was it created? What is the time before which it did not exist?) | No | No | Yes | Yes [1] |
| "Content-Length" (How many bytes long is it?) | Yes [6] | No | No [5] | No [5] |
| "Content-Type" (Which format or language is it written in, e.g. "application/rdf+xml"?) | Yes | No | No [5] | No [5] |
| Read Access List (What is the complete list of people who are allowed to see which triples are in it?) | No | No | Yes | Yes |
| Write Access List (What is the complete list of people who are allowed to update which triples are in it?) | No | No | Yes | No |
| Maintainer (Who is responsible for maintaining this information?) | No | No | Yes | Yes [2] |
| Creator (Who created this?) | Yes | Yes | Yes [3] | Yes |
| License (What is the legal license expressing how this may be used?) | Yes | Yes [4] | Yes | Yes |
| In Dataset X (Is this queryable as part of some particular SPARQL Dataset? A SPARQL Dataset is the abstract information against which a SPARQL query is executed.) | No | Yes | No | No |

Notes:

[1] For a frozen g-box, Last-Modified is the time it became frozen.

[2] Debatable, but maybe providing metadata about it and making it dereferenceable is a kind of maintenance.

[3] Can be very poorly defined, like who created a given Wikipedia page. Probably it's the person who first put content there, pushing the "create page" button, but people may often incorrectly think they created some or all of the content currently there.

[4] What happens legally if Alice grants you one license and Bob grants you another? That could easily happen if they both licensed the same graph. Does copyright law deal with the possibility of two people creating the same work, independently? If they contain the same triples, there would be no way to distinguish between the g-snap Alice is trying to license and the g-snap Bob is trying to license.

[5] If the contents are only being served in one format (maybe it's just an RDF/XML file), then YES would make sense.

[6] Strictly speaking, there might be a distinction between a g-text expressed in characters and the same g-text expressed in bytes using some particular character encoding, such as UTF-7, UTF-8, or UTF-16. We're glossing over that difference for now.
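
To make note [6] concrete, here is a minimal Python sketch showing that the same g-text has a different length depending on whether it is counted in characters or in bytes under a given encoding. The sample triple and the variable names are illustrative assumptions, not taken from any spec.

```python
# A tiny g-text; the literal contains one non-ASCII character (e with acute accent).
g_text = '<a> <b> "caf\u00e9" .'

# Length in characters, independent of any encoding.
char_count = len(g_text)

# Length in bytes depends on the character encoding chosen.
utf8_len = len(g_text.encode("utf-8"))
utf16_len = len(g_text.encode("utf-16-le"))  # little-endian, no byte-order mark

print(char_count, utf8_len, utf16_len)
```

A "Content-Length" value is therefore only well defined once the encoding is fixed.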

Sorting Flowchart

So you have some sort of container or group or collection of RDF triples. What type is it?

1. At any given point in time, is it clearly defined which triples are contained in it? For example, considering a Web service which returns different RDF data based on the client's IP address, we would have to say "No". If YES, go to question 2. If NO, it's not a container of triples, and this is not the right quiz for you.

2. While it conceptually contains triples, does it actually consist entirely of a sequence of characters (or bytes)? If YES, it's g-text. If NO, proceed to number 3.

3. Could you have two of these, with distinct identities, but containing exactly the same triples? "Distinct identities" means you might want to say something about one and not also be saying the same thing about the other. If YES, it's some kind of g-box; proceed to number 4. If NO, it's a g-snap.

4. Is it theoretically possible for it to contain different triples tomorrow than it does today? If YES, it's a g-box. If NO, it's a frozen g-box.
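
The four questions above can be sketched as a small decision function. This is only an illustration of the flowchart; the function name, parameter names, and the `ContainerType` enum are hypothetical and do not come from any RDF specification.

```python
from enum import Enum


class ContainerType(Enum):
    NOT_A_CONTAINER = "not a container of triples"
    G_TEXT = "g-text"
    G_SNAP = "g-snap"
    G_BOX = "g-box"
    FROZEN_G_BOX = "frozen g-box"


def classify(well_defined, is_character_sequence,
             distinct_identities_possible, can_change):
    """Walk the four flowchart questions in order (all arguments are booleans)."""
    # Question 1: is it clearly defined which triples it contains?
    if not well_defined:
        return ContainerType.NOT_A_CONTAINER
    # Question 2: does it consist entirely of characters (or bytes)?
    if is_character_sequence:
        return ContainerType.G_TEXT
    # Question 3: could two of these be distinct yet hold the same triples?
    if not distinct_identities_possible:
        return ContainerType.G_SNAP
    # Question 4: could it contain different triples tomorrow?
    return ContainerType.G_BOX if can_change else ContainerType.FROZEN_G_BOX


# A Turtle file on disk, viewed as something whose contents can be edited:
print(classify(True, False, True, True).value)
```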

Types of Containers

G-Text

A g-text is a character string written in some RDF language, like RDF/XML or Turtle. Some might not call it a container of triples. In a sense it only contains them indirectly, because it completely describes a g-snap. Still, some people would probably call this a graph:

<a> <b> <c>
<c> <bb> <a>

When computers transmit triples, they put them in a g-text, then transmit the g-text.

Official name: RDF Graph Serialization
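
As a rough illustration of how a g-text "completely describes" a g-snap, the sketch below parses two different character strings into the same set of triples. The parser is a deliberately simplified assumption (one whitespace-separated triple per line, optional trailing dot); it is not a real Turtle or N-Triples parser, and `parse_g_text` is a hypothetical name.

```python
def parse_g_text(g_text):
    """Turn a toy g-text into a set of (subject, predicate, object) triples."""
    triples = set()
    for line in g_text.splitlines():
        line = line.strip()
        # Tolerate an optional statement-terminating dot.
        if line.endswith("."):
            line = line[:-1].strip()
        if not line:
            continue
        subject, predicate, obj = line.split()
        triples.add((subject, predicate, obj))
    return frozenset(triples)


# Two distinct character strings: different order and different whitespace.
text_1 = "<a> <b> <c>\n<c> <bb> <a>\n"
text_2 = "<c> <bb> <a>\n\n<a>   <b>   <c>\n"

print(text_1 == text_2)                               # the g-texts differ
print(parse_g_text(text_1) == parse_g_text(text_2))   # the triples do not
```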


G-Snap

This is for the purists: it's a mathematical set of triples. The g-snap has no identity of its own. Anything you can say about the g-snap containing just the triple <a> <b> <c> is true of every g-snap containing just that triple, because there's really only one g-snap like that.

It also makes no sense to talk about a g-snap changing.

G-snaps tend to be somewhat confusing outside of math. It may help to think of them like integers. There is only one number 7, for example. We can write it down many places, and change what's written in those places, but we can't change the number itself. We can assign meaning to some of those places -- on the score board in a stadium, for instance, it means a lot -- but it's complete nonsense to say that one person's number seven is somehow different from another person's number seven. There is only one number seven. Similarly, there is only one empty g-snap, only one g-snap containing just <a> <b> <c>, etc.

This is what the RDF Recommendation (2004) calls a "graph" or "RDF Graph".

Official Name: RDF Graph

Suggested other names: "graph snapshot" (but cf "frozen g-box"), "pure graph", ...
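
One way to get a feel for g-snaps is to model them as immutable sets, for instance with Python's `frozenset`. This is only an analogy under that modelling assumption; the variable names and the license strings are made up for illustration.

```python
# Two people independently "build" a g-snap containing just <a> <b> <c>.
alice_snap = frozenset({("<a>", "<b>", "<c>")})
bob_snap = frozenset([("<a>", "<b>", "<c>")])

# They are indistinguishable: equal, with equal hashes.
same = alice_snap == bob_snap and hash(alice_snap) == hash(bob_snap)

# Metadata keyed by the g-snap itself cannot tell Alice's from Bob's:
# the second assignment overwrites the first (compare note [4]).
license_of = {alice_snap: "license granted by Alice"}
license_of[bob_snap] = "license granted by Bob"

# "Adding" a triple yields a different g-snap; the original is untouched.
bigger = alice_snap | {("<c>", "<bb>", "<a>")}

print(same, len(license_of), len(alice_snap), len(bigger))
```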


G-Box

The g-box is perhaps the most versatile of containers for triples. G-boxes retain their separate identities regardless of which triples they contain, and their triples can (in general) change over time.

Examples of g-boxes include:

  • An HTML page with embedded RDFa or microdata
  • A SQL database exported as RDF using R2RML
  • The "default graph" or any of the "named graphs" of a SPARQL server supporting SPARQL 1.1 Update. (Quotes are used around these terms to indicate common usage, which is different from how the terms are defined in the specs.)
  • A file containing RDF/XML, Turtle, or N-Triples

Suggested names for this include "graph container", "data space", "data page", "layer", and "sheet".
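
A minimal sketch of a g-box, assuming it is modelled as an object with its own identity, mutable contents, and time-based metadata. The class name `GBox`, its methods, and the example.org URLs are hypothetical.

```python
from datetime import datetime, timezone


class GBox:
    """A mutable container of triples with an identity separate from its contents."""

    def __init__(self, name):
        self.name = name                      # e.g. a URL identifying the box
        self._triples = set()
        self.created = datetime.now(timezone.utc)
        self.last_modified = self.created

    def add(self, triple):
        # Only a real change in contents updates "Last Modified".
        if triple not in self._triples:
            self._triples.add(triple)
            self.last_modified = datetime.now(timezone.utc)

    def snapshot(self):
        """Return the current contents as an immutable set (a g-snap)."""
        return frozenset(self._triples)


box_1 = GBox("http://example.org/box1")
box_2 = GBox("http://example.org/box2")

# Same (empty) contents, yet two distinct boxes.
print(box_1.snapshot() == box_2.snapshot(), box_1 == box_2)

before = box_1.snapshot()
box_1.add(("<a>", "<b>", "<c>"))
print(before == box_1.snapshot())  # the box changed; the old snapshot did not
```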

Frozen G-Box

The frozen g-box is a subclass of g-box, which is defined such that it will never again change its contents. It's not clear how practical this is, but there seems to be some demand for it. See Rolling Snapshots.

If it were identified with a URL that included some kind of cryptographic hash, that might help keep it from changing.
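
A rough sketch of the hash idea, assuming the triples contain no blank nodes (which would make canonical ordering much harder) and assuming a made-up base URL. The function name `content_hash_url` and the serialization used for hashing are illustrative choices, not an established scheme.

```python
import hashlib


def content_hash_url(triples, base="http://example.org/frozen/"):
    """Build a URL whose last segment is a SHA-256 hash of the triples."""
    # Sort the triples so the hash does not depend on iteration order.
    canonical = "\n".join(sorted(" ".join(triple) for triple in triples))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return base + digest


triples = [("<a>", "<b>", "<c>"), ("<c>", "<bb>", "<a>")]
url = content_hash_url(triples)

# Anyone retrieving the contents can recompute the hash and compare.
unchanged = content_hash_url(reversed(triples)) == url
tampered = content_hash_url(triples + [("<x>", "<y>", "<z>")]) == url
print(unchanged, tampered)
```

Note that a URL derived only from the contents is the same for any two containers holding the same triples, so on its own it helps detect change rather than distinguish one container from another.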

The frozen g-box is different from the g-snap in that it retains its own identity, separate from its contents. For example, there can be many different empty frozen g-boxes, each meaningfully having its own metadata.

Suggested names for this include "graph snapshot" (confusingly), and 'frozen' followed by any of the names for a g-box.
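
To illustrate the difference between a frozen g-box and a g-snap, here is a small sketch in which two empty frozen g-boxes carry different metadata. The class name `FrozenGBox`, its attributes, and the example.org URLs are hypothetical, and immutability is by convention only (Python does not strictly enforce it).

```python
class FrozenGBox:
    """A container whose triples are fixed at creation but which keeps its own identity."""

    def __init__(self, name, triples, creator):
        self.name = name
        self.creator = creator
        self._triples = frozenset(triples)  # contents never change afterwards

    @property
    def triples(self):
        # Read-only view; there is deliberately no method for adding triples.
        return self._triples


empty_1 = FrozenGBox("http://example.org/empty1", [], "Alice")
empty_2 = FrozenGBox("http://example.org/empty2", [], "Bob")

# Identical contents (the one and only empty g-snap) ...
print(empty_1.triples == empty_2.triples)
# ... but distinct boxes, each with its own metadata.
print(empty_1 == empty_2, empty_1.creator, empty_2.creator)
```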