Dataset Dynamics
The Issue
Linked datasets change over time: resource representations and links between resources are created, updated, and removed; entire graphs can change or disappear. The frequency and dimension of such changes depend on the nature of the linked data source. Sensor data are likely to change more frequently than archival data. Updates to individual resources cause minor changes compared with a complete reorganization of a data source's infrastructure, such as a change of domain name. In many scenarios, applications that consume linked data need to deal with these kinds of changes in order to keep their local data dependencies consistent. Dataset dynamics denotes a research activity that investigates how to deal with this problem.
Use Cases
The Dataset Dynamics interest group identified three representative use cases in which applications need to be informed about changes in remote linked datasets.
- UC1 Link Maintenance: An application hosts resources that are linked with remote resources and uses remote data in its local application context. It needs to be informed when representations of these remote resources change or become unavailable under a given URI in order to keep these links valid.
- UC2 Dataset Synchronization: A dataset consumer wants to mirror or replicate (parts of) a linked dataset. The periodically running synchronization process needs to know which triples have changed at what time in order to perform efficient updates in the local dataset.
- UC3 Data Caching: An application that consumes data from one or more remote datasets uses an HTTP-level cache that stores local copies of remote data. This cache needs to be invalidated when the remote data changes.
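As an illustration of the synchronization use case (UC2), the following is a minimal sketch, not a reference implementation, of how a consumer might compute which triples changed between two snapshots and apply only that delta to a local mirror. The names `diff_triples` and `apply_delta` and the example triples are assumptions made for this sketch. Triples are modelled as plain string tuples, and blank nodes are ignored; a plain set difference is not sufficient for graphs that contain them.

```python
from typing import Iterable, Set, Tuple

# (subject, predicate, object) as plain strings, for illustration only.
Triple = Tuple[str, str, str]


def diff_triples(old: Iterable[Triple], new: Iterable[Triple]) -> Tuple[Set[Triple], Set[Triple]]:
    """Return (added, removed) between two snapshots of a dataset."""
    old_set, new_set = set(old), set(new)
    return new_set - old_set, old_set - new_set


def apply_delta(mirror: Set[Triple], added: Set[Triple], removed: Set[Triple]) -> Set[Triple]:
    """Apply a delta to a local mirror without re-fetching unchanged triples."""
    return (mirror - removed) | added


# Hypothetical snapshots of a remote dataset at two points in time.
snapshot_t1 = {
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:name", "Alice"),
}
snapshot_t2 = {
    ("ex:alice", "foaf:knows", "ex:carol"),
    ("ex:alice", "foaf:name", "Alice"),
}

added, removed = diff_triples(snapshot_t1, snapshot_t2)
mirror = apply_delta(set(snapshot_t1), added, removed)
```

In practice the publisher, not the consumer, would ideally provide the delta, so that the consumer does not need to hold two full snapshots; the sketch only shows what information such a delta has to carry.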
Technical Infrastructure
These use cases require a technical infrastructure comprising the following components:
- A Dataset Dynamics Vocabulary that can express meta-information about the dynamics of a data set (e.g., change frequency, dimension of changes, last update, etc.) and provide a link to the update notification source URI.
- A Change Description Vocabulary to express the semantics of changes at different granularity levels.
- A Change Notification Protocol that communicates changes from a remote linked dataset to a local client application.
- Applications for detecting and dealing with changes.
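How a change description, a notification channel, and a consuming application could fit together can be sketched roughly as follows. Everything here is hypothetical: `ChangeEvent`, `ChangeNotifier`, and `LocalClient` are names invented for this example, the notifier is an in-process stand-in for a real network protocol, and the three change kinds are an assumed granularity rather than one defined by any of the vocabularies listed on this page.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical change description: which resource changed, and how."""
    resource_uri: str
    kind: str        # one of "created", "updated", "removed" (assumed granularity)
    timestamp: str   # ISO 8601 string, kept opaque here


class ChangeNotifier:
    """In-process stand-in for a change notification protocol."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[ChangeEvent], None]] = []

    def subscribe(self, callback: Callable[[ChangeEvent], None]) -> None:
        self._subscribers.append(callback)

    def publish(self, event: ChangeEvent) -> None:
        # Deliver the event to every registered subscriber, in order.
        for callback in self._subscribers:
            callback(event)


class LocalClient:
    """Consumer that keeps a cache and a list of links needing review."""

    def __init__(self) -> None:
        self.cache: Dict[str, str] = {}
        self.broken_links: List[str] = []

    def on_change(self, event: ChangeEvent) -> None:
        # Any change invalidates the cached representation (UC3).
        self.cache.pop(event.resource_uri, None)
        # A removal also flags the link for maintenance (UC1).
        if event.kind == "removed":
            self.broken_links.append(event.resource_uri)


notifier = ChangeNotifier()
client = LocalClient()
client.cache["http://example.org/resource/1"] = "<cached representation>"
client.cache["http://example.org/resource/2"] = "<cached representation>"
notifier.subscribe(client.on_change)

notifier.publish(ChangeEvent("http://example.org/resource/1", "updated", "2010-05-01T12:00:00Z"))
notifier.publish(ChangeEvent("http://example.org/resource/2", "removed", "2010-05-02T08:30:00Z"))
```

After both events, the client's cache is empty and the removed resource is listed in `broken_links`. A real deployment would replace the in-process notifier with one of the protocols surveyed under Related Work.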
Examples and Demos
- sparqlPuSH, SPARQL + pubsubhubbub, Alexandre Passant
- GUO Graph Diff, a prototype script for performing "diffs" on RDF Graphs, Nathan
- Linked Data Camp Vienna 09 demo, using voiD+dady and Atom.
Related Work
name | type | discovery | notification | change representation |
DSNotify | protocol, RDF schema | no | yes | yes |
Web of Data Link Maintenance Protocol | protocol, XML schema | ? | yes | yes |
Ping the Semantic Web | centralised service | ? | ? | yes |
SemanticPingBack | Pingback extension | ? | ? | ? |
Memento: Time Travel for the Web | HTTP extension | yes | no | ? |
RFC4287 - Atom Syndication Format | XML schema | ? | yes | no |
RFC5023 - Atom Publishing Protocol | protocol | yes | yes | no |
Talis' Changesets | RDF vocabulary | ? | ? | yes |
Triplify's Updates | RDF vocabulary | ? | ? | yes |
Graph Update Ontology (GUO) | RDF vocabulary | ? | ? | yes |
Guaranteed RDF Update Format (GRUF) | format | ? | ? | yes |
Web Subscription (WebSub) | protocol | ? | yes | ? |
dady (data source dynamic) | RDF vocabulary | yes | no | no |
sparqlpush - pubsubhubbub (PuSH) interface for SPARQL endpoints. | protocol | no | yes | yes* |
PubSubHubbub (PSHB) - an open, simple, web-scale pubsub protocol | protocol | yes | yes | yes* (via Atom or RSS) |
Simple Update Protocol (SUP) - a simple and compact "ping feed" | protocol | yes | yes | no |
Delta - an ontology for the distribution of differences between RDF graphs | N3 vocabulary | no | no | yes |
SPARQL Inferencing Notation (SPIN) - SPIN SPARQL Syntax | RDF vocabulary | no | no | yes |
Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) | protocol / XML Schema | no | no | yes |
Discussion
Meetings
see http://www.w3.org/wiki/DatasetDynamics/Meetings
Related
- DSNotify: Handling Broken Links in the Web of Data
- Slide set by Michael Hausenblas motivating the problem, RDF Dataset Notifications
- Dataset Dynamics Guide
- Keeping up with a LOD of Changes, Talis Nodalities Magazine Issue 9, 2010
- Towards Dataset Dynamics: Change Frequency of Linked Open Data Sources, LDOW2010 paper
- Caching in HTTP, RFC2616
- Caching Tutorial by Mark Nottingham
- REDbot Resource Expert Droid (checks HTTP resources) by Mark Nottingham
- Web Authoring Statistics
- Things Caches Do by Ryan Tomayko
- Linked Open Data caching characteristics by Michael Hausenblas
- Modelling HTTP cache configuration in the Semantic Web, DanC on DIG blog
- discussion on LOD mailing list regarding RDF Update Feeds
- Memento: Time Travel for the Web
- PushBackDataToLegacySources
- Read Write Web Community Group