From RDF Stream Processing Community Group
Participants
- Daniele Dell'Aglio
- Robin Keskisärkkä
- Bernhard Ortner
- Jean-Paul Calbimonte (Chair)
- Alejandro Llaves
- Minh Dao Tran
- Tara Athan
Agenda
- Definition of window functions
- Multiple triples in metadata of timestamped graph
- Examples of streams in the doc
- Timeline for this work
Resources
Minutes
- Tara Athan: I am having audio problems - I would like to point out that the example uses CURIEs where the prefixes have not been defined. Also, it would be good to specify what syntax is being used - is this JSON-LD or something else?
- Tara Athan: I think it is fine to use prefixes, but it is just necessary to mention what is intended.
- Tara Athan: I would like to see the definitions of stream, substream and window finalized before addressing the details of the window function, because there is a dependence.
- Tara Athan: The current definition of stream says nothing about order.
- Tara Athan: The definition of stream should, I believe, include a constraint about the timestamps. Not just any sequence of time-stamped graphs should be considered to be a stream.
- Daniele (PoliMi): yes, in particular if it is possible to assign different timestamps (i don't know if that proposal is still valid)
- Tara Athan: I think the last proposal was that each (time-stamping) predicate should have a partial order associated with it, that the time values must satisfy.
- Daniele (PoliMi): actually ??, it was something more generic, to associate also other kind of annotations (e.g., generation time, transmission time, etc.)
- Tara Athan: generation time, transmission time are the time-stamping predicates.
- Tara Athan: At present we only allow one such triple for each time-stamped graph. But it is possible to have the same graph have multiple occurrences in the stream.
- Tara Athan: Example: :g1 {...}{:g1,prov:generatedAtTime,t1}
- Tara Athan: :g1 {...}{:g1,prov:observedAtTime,t2}
- Daniele (PoliMi): and is the content always the same
- Daniele (PoliMi): content -> {...}
- Tara Athan: If there is different content both given the same name :g1, then that is an inconsistency.
- Tara Athan: Can we have a written proposal in the chat? Then we can vote on it.
- Jean-Paul: Proposal: that the partial order of timestamps in a stream be specified on a predicate-by-predicate basis, as a way to allow greater generality of streams while still preserving the ability to merge arbitrary streams.
- Jean-Paul: ordering in the stream is only with respect to timestamps of the same predicate.
- Alejandro Llaves: But with this definition, if I have a data input with unordered observation time, is it not a stream?
- Tara Athan: It is necessary to say that there is a unique partial order associated with each predicate. That is, it is not a user decision what partial order to use.
- Tara Athan: @Alejandro - If the data violates that partial order associated with the predicate, then it is not a stream.
- Alejandro Llaves: I.e., are we saying that a list of observations with unordered observation time is not a stream?
- Tara Athan: It is theoretically possible to define a predicate with a trivial partial order - that nothing is comparable. Then you can have unordered data.
- Minh: A stream S consists of a sequence of timestamped graphs whose elements sharing the same predicate are ordered by a partial order associated with this predicate on the timestamps.
- Tara Athan: by a partial order => by the partial order
- Daniele (PoliMi): what do you mean by "where"?
- Tara Athan: Yes, it is important to specify the properties of the partial ordering. I made a proposal in that email.
- Daniele (PoliMi): maybe i misunderstood your question
- Tara Athan: If it is possible to have different partial orders for the same predicate, then it may not be possible to merge those streams.
- Tara Athan: It is impossible to define the scope of an "application".
- Daniele (PoliMi): the time on which the data arrives
- Tara Athan: If the timestamp is the time it arrives, then the data *becomes* a stream once that timestamp is associated with it.
- Daniele (PoliMi): we have different use cases where the data is ordered in this way
- Robin Keskisärkkä: In practice this strict ordering would often require an intermediate step in which the stream "becomes" ordered with respect to some predicate (outside RSP). A typical case would be when sensor streams are processed in a cluster (e.g. streams with different partitions in Kafka, where there is an order that can become partially unordered when there is some network latency).
- Tara Athan: Perhaps we should qualify our terminology. Say "RDF stream" rather than "stream"
- Tara Athan: or "RDF time-stamped stream"
- Tara Athan:
  1. The usual mathematical requirements of a partial order apply (http://mathworld.wolfram.com/PartialOrder.html):
     a) Reflexivity: X <= X
     b) Antisymmetry: X <= Y and Y <= X implies X = Y
     c) Transitivity: X <= Y and Y <= Z implies X <= Z
  2. The partial order must respect the natural order of time. In particular, if every time instant within the closure of temporal entity X is earlier than every time instant within the closure of temporal entity Y, then X <= Y (where closure of a time instant t is defined as the degenerate interval [t, t], and closure of an interval is defined in the usual way).
- Tara Athan: Some formatting was lost in the copy-paste.
- Robin Keskisärkkä: A minor thing, but what is the motivation for the variables used in the window function definition (i.e. l and d)?
- Robin Keskisärkkä: Typically we speak of duration, upper/lower bound, step
- Robin Keskisärkkä: so it's kindow confusing when step = d
- Tara Athan: Any finite stream can be considered in its entirety as an RDF dataset. I think it is sufficient to define queries on RDF datasets - we shouldn't need something special for streams.
- Robin Keskisärkkä: kind of*
- Robin Keskisärkkä: it's a minor thing
- Tara Athan: The output of the window function is still a stream.
- Tara Athan: So it is just a two-step process - filter the original stream by the window function, then apply the query to resulting substream.
- Tara Athan: OK, so if it is a matter of the query language, then that is syntax, not semantics.
- Daniele (PoliMi): it's a matter of semantics
- Daniele (PoliMi): let's for example say that we want to eval a bgp p over the output of a window function
- Daniele (PoliMi): to follow the sparql definition, the bgp is applied to the active graph of the dataset
- Daniele (PoliMi): so we need a way to move from the output of the window function to a graph (that would be the active one in that case)
- Daniele (PoliMi): p is a typo
- Tara Athan: I still am not familiar with "bgp"
- Daniele (PoliMi): basic graph pattern
- Tara Athan: I would think that the query needs to be defined for an RDF dataset. How would you apply a bgp to an RDF dataset?
- Daniele (PoliMi): sorry for the silly question, but is the rdf dataset the same dataset defined in the sparql spec?
- Daniele (PoliMi): i can support minh on that
- Tara Athan: I am not completely familiar with the SPARQL spec, but at first glance it looks like SPARQL does not commit to a particular semantics of RDF datasets, while we do.
- Minh: ok, thanks everyone and have a nice weekend
- Alejandro Llaves: thanks, have a nice weekend!
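
Illustrative note (not part of the call): the stream definition discussed above, in which elements sharing a time-stamping predicate must be ordered by the partial order associated with that predicate, can be sketched as a small check. This is only a sketch under stated assumptions. The names `TimestampedGraph`, `natural_order`, `trivial_order` and `is_rdf_stream` are hypothetical, timestamps are plain numbers, and "ordered" is read as "no later element strictly precedes an earlier element with the same predicate", which the group did not formally agree on.

```python
from typing import Callable, Dict, List, NamedTuple


class TimestampedGraph(NamedTuple):
    name: str         # graph name, e.g. ":g1" (graph content omitted here)
    predicate: str    # time-stamping predicate, e.g. "prov:generatedAtTime"
    timestamp: float  # time value; a plain number is an assumption of this sketch


# leq(a, b) answers "a <= b" under the partial order fixed for one predicate.
PartialOrder = Callable[[float, float], bool]


def natural_order(a: float, b: float) -> bool:
    """The usual order of time instants."""
    return a <= b


def trivial_order(a: float, b: float) -> bool:
    """Only reflexive: distinct values are incomparable (cf. the 'trivial partial order' remark)."""
    return a == b


def is_rdf_stream(seq: List[TimestampedGraph], orders: Dict[str, PartialOrder]) -> bool:
    """Return True if no element strictly precedes an earlier element that
    carries the same time-stamping predicate (assumed reading of 'ordered')."""
    seen: Dict[str, List[float]] = {}
    for g in seq:
        leq = orders[g.predicate]
        for earlier in seen.get(g.predicate, []):
            # Violation: the later element is strictly less than an earlier one.
            if leq(g.timestamp, earlier) and not leq(earlier, g.timestamp):
                return False
        seen.setdefault(g.predicate, []).append(g.timestamp)
    return True


unordered_observations = [
    TimestampedGraph(":g1", "prov:generatedAtTime", 1),
    TimestampedGraph(":g1", "prov:observedAtTime", 5),
    TimestampedGraph(":g2", "prov:generatedAtTime", 2),
    TimestampedGraph(":g2", "prov:observedAtTime", 3),
]
```

Under this reading, `unordered_observations` is not a stream when `prov:observedAtTime` uses `natural_order`, but it is one when that predicate is given `trivial_order`. Ordering is only ever compared between timestamps of the same predicate, so interleaving different predicates is never a violation.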