Telecon 21.02.2014

From RDF Stream Processing Community Group

Participants

  • Alasdair Gray
  • Claudio Di Ciccio
  • Jean-Paul Calbimonte
  • Oscar Corcho
  • Daniele Dell'Aglio
  • Danh Le Phuoc
  • Robin Keskisärkkä
  • Roland Stühmer
  • Monika Solanki
  • Darko Anicic
  • Josiane Parreira
  • Jesus Arias

Minutes

Query requirements in the Use cases

  • Jean-Paul: Reviewing the status of the use cases and their queries. There have been additions and new input. Let's start discussing the queries and the requirements found in them.
  • Oscar: In the wiki: [1]
  • Danh: General comment on some of the queries: some functionality is out of scope, such as learning or mining functions embedded in the query. Some query descriptions read more like workflow descriptions.
  • Claudio: feed the events to the queries. The learning part can be done by an external module or code package
  • Danh: there's a need to bridge the gap between high-level scenarios and query scenarios
  • Jean Paul: we have to iterate over the use cases to see which parts are queries and which parts are more focused on transformational workflows, etc.
  • Claudio: turn abstract ideas into real data: try to convert logs of activities into consumable RDF streams
  • Danh: willing to decompose the scenario and work it out a bit
  • Jean Paul: I have tried to decompose the parts of the queries that make more sense in the use cases page

Time windows and slide

  • Jean-Paul: Going through the use case queries and their features. There is a need for windows, for instance in the Inland water use case.
  • Claudio: only keep track of the most recent intervals
  • Darko: is the M<N a sort of step inside the window?
  • Oscar: Discussion on the need to provide more examples of what M<N means, as there is some confusion
  • Jean-Paul: is it like a window slide?
  • Danh: it is not a slide but a sub aggregation
  • Claudio: divide the results: subdivide the interval into steps
  • Alasdair: Claudio, can you provide a concrete example of what you want to do (an English statement of the desired functionality)?
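Since the meaning of M<N was still open on the call, the following is only a sketch of one possible reading, contrasting a sliding window (width N, slide M) with a sub-aggregation (a single window of width N split into steps of size M, each aggregated separately, closer to what Danh and Claudio described). The function names sliding_windows and sub_aggregate, the integer timestamps and the use of a sum as the aggregate are illustrative assumptions, not anything agreed by the group.

```python
from typing import Dict, List, Tuple

# Hypothetical stream item: (timestamp, value), timestamps are integers.
Event = Tuple[int, float]

def sliding_windows(events: List[Event], width: int, slide: int, end: int) -> Dict[int, float]:
    """Sliding window: every `slide` time units, sum the values in (t - width, t]."""
    results: Dict[int, float] = {}
    for t in range(slide, end + 1, slide):
        results[t] = sum(v for ts, v in events if t - width < ts <= t)
    return results

def sub_aggregate(events: List[Event], width: int, step: int, now: int) -> List[float]:
    """Sub-aggregation: split the single window (now - width, now] into
    consecutive steps of size `step` (step < width) and sum each step."""
    if not (0 < step < width and width % step == 0):
        raise ValueError("step must be smaller than width and divide it evenly")
    start = now - width
    sums: List[float] = []
    for k in range(width // step):
        lo, hi = start + k * step, start + (k + 1) * step
        sums.append(sum(v for ts, v in events if lo < ts <= hi))
    return sums

events = [(1, 1.0), (2, 1.0), (4, 2.0), (6, 3.0), (9, 1.0), (10, 4.0)]
print(sliding_windows(events, width=10, slide=5, end=10))  # {5: 4.0, 10: 12.0}
print(sub_aggregate(events, width=10, step=5, now=10))     # [4.0, 8.0]
```

In the sliding reading each result covers the full width N and the window moves; in the sub-aggregation reading there is one window and M only controls the granularity of the partial results.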

Sort of Geospatial queries

  • Jean Paul: there may be some cases where geospatial information is necessary, but this is most probably outside the context of this group (e.g., the next lock)
  • Claudio: yes, but the focus is not on geo-spatial functions but rather linking with 'static' data.
  • Alasdair Gray : We should look at incorporating GeoSPARQL for this aspect rather than inventing yet another syntax
  • Oscar :I agree with Alasdair, of course. Anything else on the geospatial aspects is out of scope, IMO
  • Oscar :Summary: static data, subqueries, aggregates, etc., from the same or different streams
  • Danh: we have dynamic data about position, direction, distance, etc. There is also spatial computation of a moving object
  • Danh: I do not think that GeoSPARQL is even enough for this
  • Danh: if this is out of scope, we should be very clear about that
  • Oscar: this can widen the scope of the group a lot
  • Jean-Paul: +1
  • Danh: +1

Timestamped Named graphs (dereferenceable?)

  • Monika: in my case, we are working on streams of named graphs
  • Monika: not sure about which query engine supports this
  • Monika : The query engine from Milano group
  • Jean-Paul: C-SPARQL??
  • Monika: not sure if CQELS can be used for this use case
  • Alasdair Gray : Example: g1 :occursAt 1; g2 :occursAt 2 ...
  • Alasdair Gray : g1 and g2 are graphs
  • Monika : Cannot share my desktop, option disabled on my machine
  • Oscar : Monika shares desktop to show an example
  • Alasdair Gray : There are two issues here. First, we have a stream of graph URIs which are given a timestamp of when they arrive in the system, and the statements in the WHERE clause of the query need to operate over that graph.
  • Daniele: working on this system (not C-SPARQL)
  • Daniele: Marco and Emanuele are working on SLD, which is not integrated in C-SPARQL, and may be useful here
  • Oscar : Suggestion to discuss about this next week
  • Oscar : ACTION (Monika): add example on the wiki for everybody to see it more calmly
  • Monika: the query needs to be run over the aggregation of the content of the named graphs that are inside a window
  • Daniele: why do you need a stream processor for this type of problem, given that you have time to write the RDF and make it available as Linked Data?
  • Danh: if you put all these RDF chunks together and send it over a stream, then you can actually run it in almost any processor
  • Danh: maybe this is a good case to discuss about whether the current RDF stream model that we are dealing with is enough to express easily cases like this
  • Monika: the structure that I propose makes it simpler to me
  • Monika: I would appreciate it if Danh could help me see what this would look like as a simple stream
  • Oscar: ACTION (Monika): put the material online and alert everybody by e-mail
  • Monika : That is true
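A minimal sketch of the model in Alasdair's example (g1 :occursAt 1; g2 :occursAt 2), assuming a stream of (graph URI, timestamp) pairs plus a store holding each named graph's triples: as Monika described, the query is evaluated over the union of the graphs whose timestamps fall inside the window. Triples are plain string tuples and the matcher handles a single triple pattern with ?-variables; both are simplifications for illustration and do not reflect how C-SPARQL, CQELS or SLD actually work. The graph contents and terms such as :sensor1 and :hasValue are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

# Stream of graph URIs annotated with a timestamp, e.g. g1 :occursAt 1; g2 :occursAt 2
graph_stream: List[Tuple[str, int]] = [("g1", 1), ("g2", 2), ("g3", 7)]

# Hypothetical contents of each named graph
named_graphs: Dict[str, Set[Triple]] = {
    "g1": {(":sensor1", ":hasValue", "20")},
    "g2": {(":sensor2", ":hasValue", "22"), (":sensor1", ":locatedIn", ":roomA")},
    "g3": {(":sensor1", ":hasValue", "25")},
}

def window_union(stream: List[Tuple[str, int]], graphs: Dict[str, Set[Triple]],
                 start: int, end: int) -> Set[Triple]:
    """Merge the triples of all named graphs whose timestamp lies in [start, end)."""
    merged: Set[Triple] = set()
    for uri, ts in stream:
        if start <= ts < end:
            merged |= graphs.get(uri, set())
    return merged

def match(pattern: Triple, triples: Set[Triple]) -> List[Dict[str, str]]:
    """Match one triple pattern against a set of triples; terms starting with '?' are variables."""
    results: List[Dict[str, str]] = []
    for triple in triples:
        binding: Dict[str, str] = {}
        ok = True
        for p, t in zip(pattern, triple):
            if p.startswith("?"):
                if binding.get(p, t) != t:  # same variable bound to a different term
                    ok = False
                    break
                binding[p] = t
            elif p != t:  # constant term does not match
                ok = False
                break
        if ok:
            results.append(binding)
    return results

window = window_union(graph_stream, named_graphs, 0, 5)  # contains g1 and g2 only
rows = sorted(match(("?s", ":hasValue", "?v"), window), key=lambda b: b["?s"])
print(rows)  # [{'?s': ':sensor1', '?v': '20'}, {'?s': ':sensor2', '?v': '22'}]
```

Flattening the graphs into one triple set, as Danh suggested, would make the same query runnable on a plain triple-stream processor, at the cost of losing which graph each triple came from.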

Application and System time

  • Alasdair Gray : The second issue that came up in Monika's example is that we really want to have the sliding window over a user defined property in the named graph, not over the system time added by the stream processing engine.
  • Danh: in most systems, windows use the time assigned by the system
  • Monika: the timestamps that currently appear in my files are there for logging, but they would be part of the stream. (Monika, correct me if I got this wrong.)
  • Daniele (PoliMI) : i think it is a problem related to the model of stream: do we want to model the temporal dimension through additional annotation, or do we want to model it in "standard" RDF?
  • Oscar : @Daniele, I agree
  • Roland: Esper supports both timestamps in the event and system time
  • Monika : The additional annotation may not be really important from the query perspective, at least in my case.
  • Monika : the auxiliary timestamps are just logs
  • Monika: Darko can you summarise what you explained on the wiki?
  • Darko: if there is no application time, it uses a system assigned time
  • Alasdair Gray : If we allow query processing based on an arbitrary timestamp in the data then we have issues around ordering and uncertainty in whether we've seen all the data by a specified time
  • Daniele: lose contemporaneity?
  • Roland: contemporaneity: we lose it with system time semantics
  • Jean Paul: not sure what we will be doing about handling system time vs. application time. ACTION: need to discuss this and make it explicit
  • Alasdair: in an earlier call, we talked about going into application time and not system time
  • Darko: however, in some scenarios we cannot assume that application time is going to be available (e.g., if small devices do not have a clock)
  • Daniele: in that case, what will happen is that the system time will be used as the application time.
  • Monika : @Daniele, agree
  • Daniele (PoliMI) : and it will introduce problems on correctness (it would be an approximation)
  • Alasdair Gray : In an earlier call we talked about having the output of one stream processing system be the input to another stream processing system. In this case we really need to work on application time, since the first stream processing system will introduce a delay
  • Danh: there are different ways to express time. We need an action to discuss this very clearly
  • Roland: e.g., the SSN ontology has several timestamps, in fact (system time would add a third timestamp)
  • Alasdair Gray : By application time I was meaning that the timestamp would be captured in the data with a property that states what type of timestamp it is
  • Daniele: adding something else on system time: this is an annotation that is made on the data that the data stream processor receives. But actually this data is only used internally by the system. When the data goes out it should be the application time, not the system time. So system time should be only internal to the processor
  • Oscar : I agree
  • Alasdair Gray : Although system time is internal, it is surfaced in the query language. Windows operate over the system time
  • Daniele: normally time windows are actually always defined on the time that is present/explicit on the stream
  • Roland: some engines, like Esper, default to system time <<is this right?>>
  • Roland: as a default, yes! (but esper can be made to recognize app time, too)
  • Jean Paul: provide more details on the use cases, especially if you have not done so yet, and interact with the RSP implementors
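To make the system time vs. application time discussion concrete, here is a minimal sketch, assuming each event may carry an application timestamp (the hypothetical field app_time) and always gets an arrival (system) time from the engine. Following Darko's and Daniele's remarks, the engine falls back to system time when no application time is present. It also illustrates Alasdair's ordering concern: under the simplifying assumptions that the engine closes a tumbling window as soon as its system clock reaches the window end and that both clocks are synchronized, an event can arrive after the window its application time belongs to has already been closed.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class Event:
    value: float
    arrival: int                    # system time: assigned by the engine on arrival
    app_time: Optional[int] = None  # application time carried in the data, if any

def effective_time(e: Event, use_app_time: bool) -> int:
    """Use application time when requested and present, else fall back to system time."""
    if use_app_time and e.app_time is not None:
        return e.app_time
    return e.arrival

def tumbling_counts(events: List[Event], width: int, use_app_time: bool) -> Dict[int, int]:
    """Count events per tumbling window [k*width, (k+1)*width), keyed by window start."""
    counts: Dict[int, int] = {}
    for e in events:
        start = (effective_time(e, use_app_time) // width) * width
        counts[start] = counts.get(start, 0) + 1
    return dict(sorted(counts.items()))

def late_events(events: List[Event], width: int) -> List[Event]:
    """Events that arrive (system time) after their application-time window has ended,
    assuming windows are closed exactly at their end with no slack for late data."""
    late: List[Event] = []
    for e in events:
        if e.app_time is None:
            continue
        window_end = (e.app_time // width) * width + width
        if e.arrival >= window_end:
            late.append(e)
    return late

events = [Event(1.0, arrival=3, app_time=1),
          Event(2.0, arrival=12, app_time=8),  # produced at 8, delayed upstream
          Event(3.0, arrival=13)]              # no application time: system time is used
print(tumbling_counts(events, 10, use_app_time=False))  # {0: 1, 10: 2}
print(tumbling_counts(events, 10, use_app_time=True))   # {0: 2, 10: 1}
print([e.value for e in late_events(events, 10)])       # [2.0]
```

The two window assignments differ for the delayed event, which is the kind of discrepancy Alasdair raised for chained stream processors; the fallback for the third event is the approximation Daniele pointed out.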

Query Semantics for timestamped graphs

  • Oscar: Jean Paul and Daniele have been working on the query semantics
  • Daniele (PoliMI) : on graph streams
  • Oscar : Jean Paul suggests to discuss it for the next call, although happy to receive comments in the meantime
  • Oscar : ACTION (JP): will summarise the requirements that are arising from the queries