RDF Stream Models

From RDF Stream Processing Community Group

On this wiki page we post our shared knowledge on RDF Stream Models.

Background on Data Stream and Complex Event Processing

To be completed; only references for now.

  • Models used in DSMS:
    • Arvind Arasu, Shivnath Babu, and Jennifer Widom. The CQL continuous query language: semantic foundations and query execution. The VLDB Journal, 15(2):121–142, June 2006.
    • Daniel J. Abadi, Don Carney, Ugur Cetintemel, Mitch Cherniack, Christian Convey, Sangdon Lee, Michael Stonebraker, Nesime Tatbul, and Stan Zdonik. Aurora: a new model and architecture for data stream management. The VLDB Journal, 12(2):120–139, August 2003.
    • Sirish Chandrasekaran, Owen Cooper, Amol Deshpande, Michael J. Franklin, Joseph M. Hellerstein, Wei Hong, Sailesh Krishnamurthy, Samuel R. Madden, Fred Reiss, and Mehul A. Shah. TelegraphCQ: continuous dataflow processing for an uncertain world. In Proc. ACM SIGMOD International Conference on Management of Data, pages 668–668. ACM, 2003.

RDF Stream models

Brief overview of RDF stream models: Slides by D. Dell'Aglio

Temporal Graphs

Observation: If we decide to "just" consume triples, we completely ignore the consumption of (i) graphs and (ii) relational data. When processing streams of RDF data (not limited to triples), we invert the usual processing model: queries are typically fixed, while the data is volatile and not known in advance. RDF data is a graph, sometimes with a context (e.g. time) expressed as a named graph. SPARQL provides an extension point through basic graph pattern matching. None of the above assumes a limitation to single triples. It is therefore desirable to place no such limitation on the representation and processing of RDF streams. Worse, if we consume relational or graph data and split it into single triples solely so that only triples need to be consumed, we tear apart data that belongs together and have to re-join the relational tuples and graphs afterwards. This does not seem to be the most desirable approach.

We should therefore consider allowing the consumption of time-(interval-)annotated graphs rather than time-annotated triples. Classic RDF triples are defined to be facts. Facts are true at the current point in time. Yet when processing streams, at least an ordering component, if not a time component, is added to such facts. Following TEF-SPARQL [Kietz et al. 2012], we propose to distinguish between (possibly complex) events and facts. Simple events happen at a single time instant, complex events have a start and an end, whereas facts always have a time interval during which they are valid. Facts are usually triggered by events or represent background data that we consider valid. We hence need to be able to express complex events as graphs and to assign a time interval both to these graphs and to facts.

Variant 1 - time-stamp based

In this approach, the basic model is that of temporal graphs, which can be defined as follows:

  • A temporal graph is a set of temporal triples
  • A temporal triple is an RDF triple (a,b,c) with a temporal label t (a natural number), which can be written as (a,b,c)[t].
  • Reference:
    • C. Gutierrez, C.A. Hurtado, and A. Vaisman. Introducing time into RDF. IEEE Transactions on Knowledge and Data Engineering, 19(2):207–218, 2007.
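The definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names TemporalTriple and snapshot, as well as the example data, are hypothetical and do not come from the referenced paper.

```python
from typing import NamedTuple


class TemporalTriple(NamedTuple):
    """An RDF triple (a, b, c) with a temporal label t, written (a,b,c)[t]."""
    a: str
    b: str
    c: str
    t: int  # temporal label, a natural number


def snapshot(temporal_graph, t):
    """Return the plain RDF triples whose temporal label equals t."""
    return {(tt.a, tt.b, tt.c) for tt in temporal_graph if tt.t == t}


# A temporal graph is a set of temporal triples (hypothetical example data).
temporal_graph = {
    TemporalTriple(":sensor1", ":hasReading", "21", 1),
    TemporalTriple(":sensor1", ":hasReading", "23", 2),
    TemporalTriple(":sensor2", ":hasReading", "19", 2),
}

triples_at_2 = snapshot(temporal_graph, 2)
```

Here snapshot(temporal_graph, 2) selects the two triples labelled with time 2 and drops the label, giving an ordinary RDF graph for that time point.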

Variant 2 - interval based

Annotating a graph with a time interval implies that the interval also applies to the triples it contains. In the extreme case, when the graph contains only a single triple, the time interval is assigned to that triple. One open issue is punctuation for temporal graphs.


triple oriented

In this approach, the basic model is that of temporal graphs, which can be defined as follows:

  • A temporal graph is a set of temporal triples
  • A temporal triple is an RDF triple (a,b,c) with a temporal label i (an interval, i.e., two natural numbers), which can be written as (a,b,c)[i].
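A minimal Python sketch of the interval-labelled variant follows. The definition above does not say whether the interval is closed or half-open; the sketch assumes a half-open interval [start, end). The names IntervalTriple and valid_at and the example data are hypothetical.

```python
from typing import NamedTuple


class IntervalTriple(NamedTuple):
    """An RDF triple (a, b, c) labelled with an interval i = (start, end)."""
    a: str
    b: str
    c: str
    start: int  # first time point at which the triple is valid
    end: int    # first time point at which it is no longer valid (assumption)


def valid_at(temporal_graph, t):
    """Return the plain triples whose interval [start, end) contains t."""
    return {(x.a, x.b, x.c) for x in temporal_graph if x.start <= t < x.end}


# Hypothetical example data: a room changes state at time 4.
room_states = {
    IntervalTriple(":room1", ":hasState", ":occupied", 1, 4),
    IntervalTriple(":room1", ":hasState", ":empty", 4, 9),
}
```

With half-open intervals, adjacent intervals such as [1, 4) and [4, 9) never overlap, so valid_at returns exactly one state for every time point from 1 to 8.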

graph oriented

An alternative representation with the same expressive power is based on named graphs:

  • A temporal graph tg is a graph g that is associated with an interval i (i.e., two natural numbers)
  • Every triple (a,b,c) is associated with a temporal graph (analogous to a named graph)
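To illustrate why the two representations carry the same temporal information, the following Python sketch converts between them. It assumes a temporal graph is given as a name mapped to an interval and a set of triples; the function names and the example data are hypothetical.

```python
from collections import defaultdict


def to_triple_oriented(temporal_graphs):
    """Flatten {name: ((start, end), triples)} into (a, b, c, start, end) tuples."""
    return {
        (a, b, c, start, end)
        for (start, end), triples in temporal_graphs.values()
        for (a, b, c) in triples
    }


def to_graph_oriented(interval_triples):
    """Group (a, b, c, start, end) tuples into one graph per distinct interval."""
    graphs = defaultdict(set)
    for a, b, c, start, end in interval_triples:
        graphs[(start, end)].add((a, b, c))
    return dict(graphs)


# Hypothetical example data: two named temporal graphs.
temporal_graphs = {
    ":g1": ((1, 5), {(":s1", ":p", "a"), (":s1", ":q", "b")}),
    ":g2": ((5, 9), {(":s2", ":p", "c")}),
}

flat = to_triple_oriented(temporal_graphs)
regrouped = to_graph_oriented(flat)
```

The round trip preserves every triple together with its interval. The graph names themselves are not recovered, and two graphs that share the same interval would be merged, so this sketch only demonstrates equivalence of the temporal annotation, not of graph identity.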

References:

  • J. Tappolet and A. Bernstein. Applied temporal RDF: efficient temporal querying of RDF data with SPARQL. In: 6th European Semantic Web Conference (ESWC), 2009-06.
  • Konstantina Bereta, Panayiotis Smeros and Manolis Koubarakis. Representation and querying of valid time of triples in linked geospatial data. Proc ESWC 2013.
  • Jörg-Uwe Kietz, Thomas Scharrenbach, Lorenz Fischer, Minh Khoa Nguyen and Abraham Bernstein. TEF-SPARQL: The DDIS query-language for time annotated event and fact Triple-Streams - Version 1.0 -, Tech Report, University of Zurich, Department of Informatics, 2012.

Serialization

  • Considerations of the data format have to take the serialization into account. If we disentangle these questions, we risk failing to define a proper serialization in the end.
  • The most natural serialization is to convey RDF over HTTP and XML. This way we can easily represent complex events as subgraphs, since it is these subgraphs that we provide.

If we provide formats like Turtle/N3/N-Triples/N-Quads, we will have to face the problem of punctuation, i.e. when does a subgraph end? To handle punctuation we will have to add context to triples, and this context must relate to the graph we are processing. It does not really matter whether the time is modeled explicitly in the context.
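One possible way to recover subgraphs from a quad-based stream is sketched below in Python. It assumes that the quads have already been parsed into (subject, predicate, object, context) tuples and that all quads of one graph arrive contiguously; neither assumption is guaranteed by the formats themselves. The function name and the example data are hypothetical.

```python
from itertools import groupby


def graphs_from_quads(quads):
    """Yield (context, triples) for each run of consecutive quads with the same context."""
    for context, group in groupby(quads, key=lambda quad: quad[3]):
        yield context, [(s, p, o) for s, p, o, _ in group]


# Hypothetical example stream: two graphs, identified by their context.
quads = [
    (":s1", ":p", "a", ":g1"),
    (":s1", ":q", "b", ":g1"),
    (":s2", ":p", "c", ":g2"),
]

graphs = list(graphs_from_quads(quads))
```

Note that a graph is only known to be complete once a quad with a different context arrives or the stream ends. This is precisely the punctuation problem: without an explicit end marker, the consumer has to wait for the next graph to begin.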

Discussion of the RSP Serialization Group

Streams of Objects

Streaming objects consist of RDF sub-graphs. The number of triples per object can vary. An object may contain zero or more timestamps t and zero or more time intervals [ t(start) , t(end) ) to annotate the object and the validity of the included data. An object may also contain zero or more counter elements to validate ordering and detect missing objects. The characteristics of timestamps, time interval descriptors and counters are defined in a stream description.
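A streaming object and a simple gap check could be sketched in Python as follows. The sketch assumes that the first counter of each object increases by one per object; in practice, the counter characteristics would come from the stream description. The class name, field names and function name are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class StreamObject:
    """An RDF sub-graph with optional timestamps, intervals and counters."""
    triples: list                                    # variable number of (s, p, o) tuples
    timestamps: list = field(default_factory=list)   # zero or more timestamps t
    intervals: list = field(default_factory=list)    # zero or more (t_start, t_end) pairs
    counters: list = field(default_factory=list)     # zero or more counter values


def missing_counters(objects):
    """Return the counter values skipped between consecutive objects."""
    missing = []
    previous = None
    for obj in objects:
        if not obj.counters:
            continue  # objects without a counter cannot be checked
        current = obj.counters[0]
        if previous is not None and current > previous + 1:
            missing.extend(range(previous + 1, current))
        previous = current
    return missing


# Hypothetical example stream in which the objects with counters 3 and 4 were lost.
stream = [
    StreamObject([(":s1", ":p", "a")], timestamps=[10], counters=[1]),
    StreamObject([(":s1", ":p", "b")], intervals=[(11, 15)], counters=[2]),
    StreamObject([(":s2", ":p", "c"), (":s2", ":q", "d")], counters=[5]),
]

lost = missing_counters(stream)
```

The check only detects gaps; it does not handle objects that arrive out of order, which would need a separate comparison against the previous counter value.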

Event Processing ODP

A lightweight re-usable ontology component based on DOLCE Ultra Light. Aligned with SSN and Event-F, but can be used independently.

Addresses the following requirements:

  • Clear separation of events and event objects
  • Payload support
  • Encapsulated event objects (composite events)
  • References to triggering events
  • Support for multiple timestamps
  • SPARQL querying ability

[1] Event Processing ODP

[2] Presentation for RSP CG Telco 25.9.2013

[3] Rinne, M., Blomqvist, E., Keskisärkkä, R., Nuutila, E.: Event Processing in RDF. In: Proceedings of WOP2013 - Research paper track. CEUR Workshop Proceedings, CEUR-WS.org (2013)

Arguments against Event Processing ODP

  1. It adds overhead in the same manner as RDF reification.
  2. It mixes a data format with a high-level knowledge representation.
  3. With the named graph approach, we can achieve reification anyhow without explicitly requiring it.

Proposal: Leave the standard as simple as possible and define the ODP on top. We can solve this by adding a context to triples, i.e. a named graph that represents the actual event. We will still have to decide whether the reference explicitly refers to a timestamp or whether it refers to a time representation.

More

  • Davide Francesco Barbieri, Daniele Braga, Stefano Ceri, Emanuele Della Valle, and Michael Grossniklaus. C-SPARQL: A continuous query language for RDF data streams. International Journal of Semantic Computing, 4(1):3–25, 2010.
  • D. Le-Phuoc, M. Dao-Tran, J. Xavier Parreira, and M. Hauswirth. A native and adaptive approach for unified processing of linked streams and linked data. In Proc. 10th International Semantic Web Conference ISWC 2011, pages 370–388. Springer, 2011.
  • D. Anicic, P. Fodor, S. Rudolph, and N. Stojanovic. EP-SPARQL: a unified language for event processing and stream reasoning. In Proc. 20th International Conference on World Wide Web WWW 2011, pages 635–644. ACM, 2011.
  • Rinne, M., Abdullah, H., Törmä, S., Nuutila, E.: Processing Heterogeneous RDF Events with Standing SPARQL Update Rules. In: Meersman, R., Dillon, T. (eds.) OTM 2012 Conferences, Part II. pp. 793–802. Springer-Verlag, 2012.