Telecon 23.10.2015

From RDF Stream Processing Community Group

Participants

  • Daniele Dell'Aglio
  • Robin Keskisärkkä
  • Bernhard Ortner
  • Jean-Paul Calbimonte (Chair)
  • Alejandro Llaves
  • Minh Dao Tran
  • Tara Athan

Agenda

  • Definition of window functions
  • Multiple triples in metadata of timestamped graph
  • Examples of streams in the doc
  • Timeline for this work

Resources

Minutes

  • Tara Athan: I am having audio problems - I would like to point out that the example uses CURIEs where the prefixes have not been defined. Also, it would be good to specify what syntax is being used - is this JSON-LD or something else?
  • Tara Athan: I think it is fine to use prefixes, but it is just necessary to mention what is intended.
  • Jean-Paul: i agree
  • Tara Athan: I would like to see the definitions of stream, substream and window finalized before addressing the details of the window function, because there is a dependence.
  • Tara Athan: The current definition of stream says nothing about order.
  • Daniele (PoliMi): yes
  • Tara Athan: The definition of stream should, I believe, include a constraint about the timestamps. Not just any sequence of time-stamped graphs should be considered to be a stream.
  • Daniele (PoliMi): yes, in particular if it is possible to assign different timestamps (i don't know if that proposal is still valid)
  • Tara Athan: I think the last proposal was that each (time-stamping) predicate should have a partial order associated with it, that the time values must satisfy.
  • Minh: @Tara: yes
  • Daniele (PoliMi): actually ??, it was something more generic: to also associate other kinds of annotations (e.g., generation time, transmission time, etc.)
  • Tara Athan: generation time, transmission time are the time-stamping predicates.
  • Daniele (PoliMi): indeed
  • Tara Athan: At present we only allow one such triple for each time-stamped graph. But it is possible to have the same graph have multiple occurrences in the stream.
  • Tara Athan: Example: :g1 {...}{:g1,prov:generatedAtTime,t1}
  • Tara Athan: :g1 {...}{:g1,prov:observedAtTime,t2}
  • Daniele (PoliMi): and is the content always the same? (content -> {...})
  • Tara Athan: If there is different content both given the same name :g1, then that is an inconsistency.
  • Daniele (PoliMi): ok
  • Tara Athan: Can we have a written proposal in the chat? Then we can vote on it.
  • Jean-Paul: the partial order of timestamps in a stream should be specified on a predicate-by-predicate basis, as a way to allow greater generality of streams while still preserving the ability to merge arbitrary streams.
  • Jean-Paul: ordering in the stream is only with respect to timestamps of the same predicate.
  • Alejandro Llaves: But with this definition, if I have a data input with unordered observation time, is it not a stream?
  • Tara Athan: It is necessary to say that there is a unique partial order associated with each predicate. That is, it is not a user decision what partial order to use.
  • Tara Athan: @Alejandro - If the data violates that partial order associated with the predicate, then it is not a stream.
  • Alejandro Llaves: I.e., are we saying that a list of observations with unordered observation time is not a stream?
  • Tara Athan: It is theoretically possible to define a predicate with a trivial partial order - that nothing is comparable. Then you can have unordered data.
  • Minh: A stream S consists of a sequence of timestamped graphs whose elements sharing the same predicate are ordered by a partial order associated with this predicate on the timestamps.
  • Tara Athan: by a partial order => by the partial order
  • Daniele (PoliMi): what do you mean by "where"?
  • Tara Athan: Yes, it is important to specify the properties of the partial ordering. I made a proposal in that email.
  • Daniele (PoliMi): maybe i misunderstood your question
  • Tara Athan: If it is possible to have different partial orders for the same predicate, then it may not be possible to merge those streams.
  • Tara Athan: It is impossible to define the scope of an "application".
  • Daniele (PoliMi): the time at which the data arrives
  • Tara Athan: If the timestamp is the time it arrives, then the data *becomes* a stream once that timestamp is associated with it.
  • Daniele (PoliMi): we have different use cases where the data is ordered in this way
  • Robin Keskisärkkä: In practice this strict ordering would often require an intermediate step (outside RSP) in which the stream "becomes" ordered with respect to some predicate. A typical case is sensor streams processed in a cluster (e.g. streams split across different partitions in Kafka, where data that is ordered at the source can become partially unordered because of network latency).
  • Tara Athan: Perhaps we should qualify our terminology. Say "RDF stream" rather than "stream"
  • Tara Athan: or "RDF time-stamped stream"
  • Tara Athan:
      1. The usual mathematical requirements of a partial order apply (http://mathworld.wolfram.com/PartialOrder.html):
         a) Reflexivity: X <= X
         b) Antisymmetry: X <= Y and Y <= X implies X = Y
         c) Transitivity: X <= Y and Y <= Z implies X <= Z
      2. The partial order must respect the natural order of time. In particular, if every time instant within the closure of temporal entity X is earlier than every time instant within the closure of temporal entity Y, then X <= Y (where the closure of a time instant t is defined as the degenerate interval [t, t], and the closure of an interval is defined in the usual way).
  • Tara Athan: Some formatting was lost in the copy-paste.
  • Robin Keskisärkkä: A minor thing, but what is the motivation for the variables used in the window function definition (i.e. l and d)?
  • Robin Keskisärkkä: Typically we speak of duration, upper/lower bound, step
  • Robin Keskisärkkä: so it's kind of confusing when step = d
  • Tara Athan: Any finite stream can be considered in its entirety as an RDF dataset. I think it is sufficient to define queries on RDF datasets - we shouldn't need something special for streams.
  • Robin Keskisärkkä: it's a minor thing
  • Tara Athan: The output of the window function is still a stream.
  • Tara Athan: So it is just a two-step process - filter the original stream by the window function, then apply the query to the resulting substream.
  • Tara Athan: OK, so if it is a matter of the query language, then that is syntax, not semantics.
  • Daniele (PoliMi): it's a matter of semantics
  • Daniele (PoliMi): let's for example say that we want to eval a bgp p over the output of a window function
  • Daniele (PoliMi): to follow the sparql definition, the bgp is applied to the active graph of the dataset
  • Tara Athan: "bgp p"?
  • Daniele (PoliMi): so we need a way to move from the output of the window function to a graph (that would be the active one in that case)
  • Daniele (PoliMi): p is a typo
  • Tara Athan: I still am not familiar with "bgp"
  • Daniele (PoliMi): basic graph pattern
  • Tara Athan: I would think that the query needs to be defined for an RDF dataset. How would you apply a bgp to an RDF dataset?
  • Daniele (PoliMi): sorry for the silly question, but is the rdf dataset the same dataset defined in the sparql spec?
  • Daniele (PoliMi): i can support minh on that
  • Tara Athan: I am not completely familiar with the SPARQL spec, but at first glance it looks like SPARQL does not commit to a particular semantics of RDF datasets, while we do.
  • Minh: ok, thanks everyone and have a nice weekend
  • Robin Keskisärkkä: bye
  • Alejandro Llaves: thanks, have a nice weekend!
  • Daniele (PoliMi): bye
  • Tara Athan: bye
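The per-predicate ordering proposal (Jean-Paul's and Minh's wording) and Tara's requirements on the partial order can be made concrete with a small checker. This is a minimal sketch under stated assumptions, not anything the group adopted: timestamps are modelled as closed intervals (an instant t being the degenerate interval [t, t]); the per-predicate order is the smallest relation that Tara's requirement 2 forces (X <= Y iff X = Y or X ends before Y starts); and "ordered by the partial order" is read as "no later element is strictly before an earlier element with the same predicate", so equal or incomparable timestamps may appear in any order. The names `Interval`, `TimestampedGraph`, `natural_time_order`, `is_admissible_order` and `is_rdf_stream` are hypothetical.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, List, Sequence


@dataclass(frozen=True)
class Interval:
    """A closed temporal entity [start, end]; an instant t is the degenerate interval [t, t]."""
    start: float
    end: float

    @staticmethod
    def instant(t: float) -> "Interval":
        return Interval(t, t)


@dataclass(frozen=True)
class TimestampedGraph:
    graph: str      # graph name, e.g. ":g1"
    predicate: str  # time-stamping predicate, e.g. "prov:generatedAtTime"
    time: Interval  # the timestamp value


# le(x, y) is read as "x <= y" under one predicate's partial order.
PartialOrder = Callable[[Interval, Interval], bool]


def natural_time_order(x: Interval, y: Interval) -> bool:
    """Hypothetical choice: x <= y iff x == y or every instant of x precedes every instant of y."""
    return x == y or x.end < y.start


def is_admissible_order(le: PartialOrder, samples: Sequence[Interval]) -> bool:
    """Check requirements 1a-1c and 2 on a finite sample of temporal entities."""
    if not all(le(x, x) for x in samples):  # 1a reflexivity
        return False
    for x, y in product(samples, repeat=2):
        if le(x, y) and le(y, x) and x != y:  # 1b antisymmetry
            return False
        if x.end < y.start and not le(x, y):  # 2 respects the natural order of time
            return False
    for x, y, z in product(samples, repeat=3):
        if le(x, y) and le(y, z) and not le(x, z):  # 1c transitivity
            return False
    return True


def is_rdf_stream(elements: Sequence[TimestampedGraph],
                  orders: Dict[str, PartialOrder]) -> bool:
    """Ordering is only checked among elements sharing the same time-stamping predicate.

    A later element may not be strictly before an earlier one; equal or incomparable
    timestamps are allowed. Each predicate must have exactly one registered order
    (an unregistered predicate raises KeyError).
    """
    seen: Dict[str, List[Interval]] = {}
    for el in elements:
        le = orders[el.predicate]
        for earlier in seen.get(el.predicate, []):
            if le(el.time, earlier) and el.time != earlier:
                return False
        seen.setdefault(el.predicate, []).append(el.time)
    return True


if __name__ == "__main__":
    orders = {"prov:generatedAtTime": natural_time_order,
              "prov:observedAtTime": natural_time_order}
    stream = [
        TimestampedGraph(":g1", "prov:generatedAtTime", Interval.instant(1)),
        TimestampedGraph(":g1", "prov:observedAtTime", Interval.instant(8)),
        # earlier than 8, but only compared with other prov:generatedAtTime values
        TimestampedGraph(":g2", "prov:generatedAtTime", Interval(2, 3)),
        TimestampedGraph(":g2", "prov:observedAtTime", Interval.instant(9)),
    ]
    print(is_rdf_stream(stream, orders))  # True
```

Under this order, overlapping intervals such as [1, 3] and [2, 4] are incomparable, so they may appear in either order without breaking the stream.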
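Tara's two-step reading (filter the stream with the window function, then query the resulting substream) and Daniele's point that a BGP is evaluated against an active graph can be sketched together. This is an illustration under stated assumptions, not the group's definition: each element carries a single numeric timestamp plus its graph content; the window is one time-based interval [opening, opening + width) rather than the draft's l and d parameters; the bridge to an active graph is plain set union of the selected graphs (blank nodes are not standardised apart); and the BGP evaluator is a toy matcher in which terms starting with `?` are variables. All names (`Element`, `window`, `to_active_graph`, `eval_bgp`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


@dataclass(frozen=True)
class Element:
    graph: str                  # graph name, e.g. ":g1"
    time: float                 # single numeric timestamp (simplifying assumption)
    triples: FrozenSet[Triple]  # content of the named graph


def window(stream: Sequence[Element], opening: float, width: float) -> List[Element]:
    """Step 1: keep elements with opening <= time < opening + width; the result is again a (sub)stream."""
    return [e for e in stream if opening <= e.time < opening + width]


def to_active_graph(substream: Sequence[Element]) -> Set[Triple]:
    """Bridge: union the selected graphs into one graph that serves as the active graph."""
    graph: Set[Triple] = set()
    for e in substream:
        graph |= e.triples
    return graph


def _match(pattern_term: str, term: str, binding: Binding) -> bool:
    """Match one pattern term against one RDF term, extending the binding for '?'-variables."""
    if pattern_term.startswith("?"):
        if pattern_term in binding:
            return binding[pattern_term] == term
        binding[pattern_term] = term
        return True
    return pattern_term == term


def eval_bgp(bgp: Sequence[Triple], graph: Set[Triple]) -> List[Binding]:
    """Step 2: evaluate a basic graph pattern as a join of its triple patterns."""
    solutions: List[Binding] = [{}]
    for pattern in bgp:
        extended: List[Binding] = []
        for mu in solutions:
            for triple in graph:
                binding = dict(mu)  # fresh copy so a failed match leaves mu untouched
                if all(_match(p, t, binding) for p, t in zip(pattern, triple)):
                    extended.append(binding)
        solutions = extended
    return solutions


if __name__ == "__main__":
    stream = [
        Element(":g1", 1.0, frozenset({(":s1", ":temp", "20")})),
        Element(":g2", 4.0, frozenset({(":s2", ":temp", "22")})),
        Element(":g3", 12.0, frozenset({(":s1", ":temp", "25")})),
    ]
    sub = window(stream, opening=0.0, width=10.0)  # selects :g1 and :g2
    results = eval_bgp([("?s", ":temp", "?v")], to_active_graph(sub))
    print(sorted(b["?v"] for b in results))  # ['20', '22']
```

Because the union discards which graph each triple came from, a query that needs graph names or timestamps would need a different bridge, for example keeping the selected graphs as named graphs of an RDF dataset, which is closer to Tara's view that a finite stream can be treated as an RDF dataset.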