Containers of Triples

From RDF Working Group Wiki
Revision as of 11:13, 5 May 2012 by Sandro

Or: Types of "Graphs"

In RDF, people use the term "graph" in many different ways, referring to many different kinds of collections of triples. Sometimes the distinctions do not matter; sometimes they do. This page explains some of the cases where they do.

Applicable Metadata Properties

Property | g-text | g-snap | mutable g-box | static g-box
"Last Modified" (When did the set of triples it contains most recently change?) | No | No | Yes | Yes [1]
"Creation Time" (When was it created? What is the time before which it did not exist?) | No | No | Yes | Yes [1]
"Content-Length" (How many bytes long is it?) | Yes | No | No | No
"Content-Type" (Which format or language is it written in, e.g. "application/rdf+xml"?) | Yes | No | No | No
Read Access List (What is the complete list of people who are allowed to see which triples are in it?) | No | No | Yes | Yes
Write Access List (What is the complete list of people who are allowed to update which triples are in it?) | No | No | Yes | No
Maintainer (Who is responsible for maintaining this information?) | No | No | Yes | Yes [2]
Creator (Who created this?) | Yes | Yes | Yes [3] | Yes
License (What is the legal license expressing how this may be used?) | Yes | Yes [4] | Yes | Yes

Notes:

[1] For a static (frozen) g-box, "Last Modified" is the time it became frozen.

[2] Debatable, but providing metadata about it and making it dereferenceable may count as a kind of maintenance.

[3] This can become very poorly defined, as with the question of who created a given Wikipedia page. One person started it, but how much of that original work remains? Possibly none of it.

[4] What happens legally if Alice grants you one license and Bob grants you another? That could easily happen if they both licensed the same graph. Does copyright law deal with the possibility of two people creating the same work, independently? If they contain the same triples, there would be no way to distinguish between the g-snap Alice is trying to license and the g-snap Bob is trying to license.

Sorting Flowchart

So you have some sort of container or group or collection of RDF triples. What type is it?

1. At any given point in time, is it clearly defined which triples it contains? For a Web service that returns different RDF data depending on the client's IP address, for example, we would have to say "No". If YES, proceed to number 2. If NO, it's not a container of triples, and this is not the right quiz for you.

2. While it conceptually contains triples, does it actually consist entirely of a sequence of characters (or bytes)? If YES, it's g-text. If NO, proceed to number 3.

3. Could you have two of these, with distinct identities, but containing exactly the same triples? "Distinct identities" means you might want to say something about one without also saying the same thing about the other. If YES, it's some kind of g-box; proceed to number 4. If NO, it's a g-snap.

4. Is it theoretically possible for it to contain different triples tomorrow than it does today? If YES, it's a mutable g-box. If NO, it's a static g-box.
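The four questions form a simple decision procedure. The following is a minimal Python sketch, assuming a person has already answered each question as a boolean; the function, enum, and parameter names are illustrative only and do not come from any RDF library.

```python
from enum import Enum


class ContainerKind(Enum):
    NOT_A_CONTAINER = "not a container of triples"
    G_TEXT = "g-text"
    G_SNAP = "g-snap"
    MUTABLE_G_BOX = "mutable g-box"
    STATIC_G_BOX = "static g-box"


def classify(well_defined_contents: bool,
             is_character_sequence: bool,
             has_own_identity: bool,
             can_change: bool) -> ContainerKind:
    """Walk the four flowchart questions in order (hypothetical helper)."""
    # Question 1: is it clear, at any given time, which triples it holds?
    if not well_defined_contents:
        return ContainerKind.NOT_A_CONTAINER
    # Question 2: is it really just a sequence of characters or bytes?
    if is_character_sequence:
        return ContainerKind.G_TEXT
    # Question 3: can two of them hold the same triples yet stay distinct?
    if not has_own_identity:
        return ContainerKind.G_SNAP
    # Question 4: could its triples be different tomorrow?
    return ContainerKind.MUTABLE_G_BOX if can_change else ContainerKind.STATIC_G_BOX


# The default graph of a SPARQL server: well-defined, not text, has identity, can change.
print(classify(True, False, True, True).value)  # mutable g-box
```

The order of the checks matters: a g-text is identified before identity is even considered, just as the flowchart stops at question 2.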

Types of Containers

G-Text

Some might not call the g-text a container of triples, but it sure looks like one:

"<a> <b> <c>"

That looks like it has a triple in it. Maybe it does. But it's also a string of 11 characters. When computers transmit triples, they put them in a g-text, then transmit the g-text.
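The difference between the characters and the triples they describe can be shown directly: two distinct g-texts may describe exactly the same triples. The sketch below assumes the toy notation used on this page (three `<...>` terms per line, optional trailing " ."); the `parse` function is a hypothetical, simplified stand-in, not a conforming N-Triples parser.

```python
import re

# Matches one toy triple such as "<a> <b> <c>", with an optional trailing " .".
# This is a simplified stand-in for an N-Triples parser, not a full one.
TRIPLE = re.compile(r"^\s*<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.?\s*$")


def parse(g_text: str) -> frozenset:
    """Return the set of triples a g-text describes; blank lines are ignored."""
    triples = set()
    for line in g_text.splitlines():
        if not line.strip():
            continue
        match = TRIPLE.match(line)
        if match is None:
            raise ValueError(f"not a triple: {line!r}")
        triples.add(match.groups())
    return frozenset(triples)


text_a = "<a> <b> <c>"
text_b = "<a>   <b>   <c> .\n"

print(len(text_a), len(text_b))        # 11 18
print(text_a == text_b)                # False: different g-texts
print(parse(text_a) == parse(text_b))  # True: same triples
```

Only the g-text has a length or a format, which is why "Content-Length" and "Content-Type" apply to it and not to the other kinds of container.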


G-Snap

This is for the purists: it's a mathematical set of triples. The g-snap has no identity of its own. Anything you can say about the g-snap containing just the triple <a> <b> <c> is true of every g-snap containing just that triple, because there's really only one g-snap like that.

It also makes no sense to talk about a g-snap changing.

G-snaps tend to be somewhat confusing outside of math. It may help to think of them like integers. There is only one number 7, for example. We can write it down in many places, and change what's written in those places, but we can't change the number itself. We can assign meaning to some of those places (on the scoreboard in a stadium, for instance, it means a lot), but it's complete nonsense to say that one person's number seven is somehow different from another person's number seven. There is only one number seven. Similarly, there is only one empty g-snap, only one g-snap containing just <a> <b> <c>, etc.
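Python's built-in `frozenset` behaves much like a g-snap: two frozensets are equal exactly when they have the same members, and neither can be modified. A minimal sketch, assuming triples are plain tuples of strings and ignoring blank nodes (with blank nodes, real g-snap equality needs graph isomorphism rather than plain set equality):

```python
# A g-snap modelled as an immutable set of (subject, predicate, object) tuples.
snap_1 = frozenset({("a", "b", "c")})
snap_2 = frozenset([("a", "b", "c"), ("a", "b", "c")])  # duplicates collapse

print(snap_1 == snap_2)              # True: there is only one such g-snap
print(hash(snap_1) == hash(snap_2))  # True: nothing distinguishes them

# "Changing" a g-snap really produces a different g-snap; the old one is untouched.
snap_3 = snap_1 | {("a", "b", "d")}
print(snap_1 == snap_3, len(snap_1))  # False 1

try:
    snap_1.add(("x", "y", "z"))  # frozenset has no add method
except AttributeError as err:
    print("cannot change a g-snap:", err)
```

As with the number seven, there is no way to ask which copy of `snap_1` you have; only its contents matter.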

This is what the RDF Recommendation (2004) calls a "graph".

Suggested names for this include "RDF Graph", "graph snapshot", ...


Mutable G-Box

The mutable g-box is perhaps the most versatile of containers for triples. G-boxes retain their separate identities regardless of which triples they contain, and their triples can change over time.
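In object terms, a mutable g-box is an object compared by identity that holds a changeable set of triples; at any moment its contents form a g-snap. A minimal sketch (Python 3.9 or later), where `MutableGBox`, its `name` field, and the example.org URLs are hypothetical:

```python
from dataclasses import dataclass, field

Triple = tuple[str, str, str]


@dataclass(eq=False)  # eq=False: boxes are compared by identity, not by contents
class MutableGBox:
    name: str  # hypothetical label, standing in for e.g. a URL
    triples: set[Triple] = field(default_factory=set)

    def snapshot(self) -> frozenset[Triple]:
        """Return the g-snap this box holds right now."""
        return frozenset(self.triples)


box_1 = MutableGBox("http://example.org/box1")
box_2 = MutableGBox("http://example.org/box2")
box_1.triples.add(("a", "b", "c"))
box_2.triples.add(("a", "b", "c"))

print(box_1 == box_2)                        # False: distinct identities
print(box_1.snapshot() == box_2.snapshot())  # True: same triples right now

before = box_1.snapshot()
box_1.triples.add(("a", "b", "d"))           # contents change, identity does not
print(before == box_1.snapshot())            # False
```

Metadata such as a maintainer or an access list would naturally be attached to the box object, not to the snapshot it returns.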

Examples of mutable g-boxes include:

  • An HTML page with embedded RDFa or microdata
  • A SQL database exported as RDF using R2RML
  • The "default graph" of a normal SPARQL server
  • A file containing RDF/XML, Turtle, or N-Triples

Names sometimes used for this include "graph" and "named graph".

Suggested names for this include "graph container", "data space", "data page", "layer", and "sheet".

Static G-Box

The static g-box is like the mutable g-box, except it's defined to never change. It's not yet clear how practical this is.

If it were identified with a URL that included some kind of cryptographic hash of its contents, any change would at least be detectable, which might help keep it from changing.

The static g-box is different from the g-snap in that it retains its own identity, separate from its contents. There can be many different empty static g-boxes, each meaningfully having its own metadata.
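The hash idea and the separate-identity point can be combined. The sketch below is one possible scheme, not a standard: it assumes a hypothetical per-box base URL plus a `?sha256=` query parameter, and it computes the digest over sorted lines, which gives a single serialization per g-snap only because these toy triples contain no blank nodes.

```python
import hashlib
from dataclasses import dataclass

Triple = tuple[str, str, str]


def content_digest(triples: frozenset[Triple]) -> str:
    """SHA-256 over a sorted, line-per-triple serialization of the contents."""
    lines = sorted(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()


@dataclass(frozen=True, eq=False)  # immutable, but still compared by identity
class StaticGBox:
    base_url: str                  # hypothetical per-box URL
    triples: frozenset[Triple]

    @property
    def url(self) -> str:
        # Hypothetical URL scheme: the digest pins the contents to the name.
        return f"{self.base_url}?sha256={content_digest(self.triples)}"


def verify(url: str, retrieved: frozenset[Triple]) -> bool:
    """Check retrieved triples against the digest embedded in the URL."""
    _, _, digest = url.partition("?sha256=")
    return digest == content_digest(retrieved)


empty_1 = StaticGBox("http://example.org/alice/empty", frozenset())
empty_2 = StaticGBox("http://example.org/bob/empty", frozenset())

print(empty_1 == empty_2)                  # False: two distinct empty static g-boxes
print(empty_1.triples == empty_2.triples)  # True: same (empty) g-snap
print(verify(empty_1.url, frozenset()))    # True: contents match the URL
print(verify(empty_1.url, frozenset({("a", "b", "c")})))  # False: altered contents
```

The hash does not stop anyone from serving different triples at that address; it only lets a reader detect that the contents no longer match what the name promised.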

Suggested names for this include "graph snapshot" (confusingly), and 'static' followed by any of the names for a mutable g-box.