Telecon 21.02.2014

From RDF Stream Processing Community Group

Participants

Alasdair Gray
Claudio Di Ciccio
Jean-Paul Calbimonte
Oscar Corcho
Daniele Dell'Aglio
Danh Le Phuoc
Robin Keskisärkkä
Roland Stühmer
Monika Solanki
Darko Anicic
Josiane Parreira
Jesus Arias

Minutes

Query requirements in the Use cases

Jean-Paul: Reviewing the status of the use cases and their queries. There have been additions and input. Let's start discussing the queries and the requirements found in them.

Oscar: In the wiki: [1]

Danh: General comment on some of the queries: some functionalities are out of scope, such as learning or mining functions embedded in the query. Some query descriptions look more like workflow descriptions.

Claudio: feed the events with the queries; the learning part can be done by an external module or code package.

Danh: there's a need to bridge the gap between the high-level scenarios and the query scenarios.

Jean-Paul: we have to iterate over the use cases to see which parts are queries and which parts are more focused on transformational workflows, etc.

Claudio: turn abstract ideas into real data, i.e., try to convert activity logs into consumable RDF streams.

Danh: willing to decompose the scenario and work it out a bit

Jean-Paul: I have tried to decompose, on the use cases page, the parts of the queries that make the most sense.

Time windows and slide

Jean-Paul: Looking through the use case queries and their features. Windows are needed, for instance in the Inland water use case.

Claudio: only keep track of the most recent intervals

Darko: is M<N a sort of step inside the window?

Oscar: Discussion on the need to provide more examples of what M<N means, as there is some confusion.

Jean-Paul: is it like a window slide?

Danh: it is not a slide but a sub aggregation

Claudio: divide the results: subdivide the interval into steps

Alasdair: Claudio, can you provide a concrete example of what you want to do (an English statement of the desired functionality)?
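One possible reading of M<N, taking Danh's and Claudio's description at face value (a window of width N subdivided into steps of width M, with one sub-aggregate per step rather than a sliding re-evaluation), is sketched below. The function name, the (timestamp, value) event format, the use of a sum as the aggregate, and the half-open window bounds are all assumptions for illustration, not the use case's actual definition.

```python
from collections import defaultdict

# Hypothetical event format: (timestamp, value), with integer timestamps.
Event = tuple[int, float]


def sub_aggregate(events: list[Event], window_end: int, n: int, m: int) -> list[float]:
    """Sum the values of a window of width n, split into steps of width m.

    The window covers [window_end - n, window_end); one partial sum is
    returned per step, oldest step first. Assumes 0 < m < n and m divides n.
    """
    if not (0 < m < n and n % m == 0):
        raise ValueError("expected 0 < m < n with n divisible by m")
    window_start = window_end - n
    sums: dict[int, float] = defaultdict(float)
    for ts, value in events:
        if window_start <= ts < window_end:
            # Index of the M-wide step that contains this timestamp.
            sums[(ts - window_start) // m] += value
    return [sums[i] for i in range(n // m)]


events = [(0, 1.0), (3, 2.0), (5, 4.0), (9, 8.0), (10, 16.0)]
print(sub_aggregate(events, window_end=10, n=10, m=5))  # [3.0, 12.0]
```

Here a single evaluation of the 10-unit window yields one partial sum per 5-unit step (the event at time 10 falls outside the half-open window). A slide of M, by contrast, would produce a new evaluation of the whole N-wide window every M time units.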

Sort of Geospatial queries

Jean-Paul: there may be some cases where geospatial information is necessary (e.g., the next lock), but this is most probably outside the scope of this group.

Claudio: yes, but the focus is not on geo-spatial functions but rather linking with 'static' data.

Oscar : BTW, we are specifically discussing http://www.w3.org/community/rsp/wiki/Use_cases#Inland-water_transportation_delay_prediction

Alasdair Gray : We should look at incorporating GeoSPARQL for this aspect rather than inventing yet another syntax

Alasdair Gray : http://www.opengeospatial.org/standards/geosparql

Oscar : I agree with Alasdair, of course. Anything else on the geospatial aspects is out of scope, IMO.

Oscar :Summary: static data, subqueries, aggregates, etc., from the same or different streams

Danh: we have dynamic data about position, direction, distance, etc. There is also spatial computation over moving objects.

Danh: I do not think that GeoSPARQL is even enough for this

Danh: if this is out of scope, we should be very clear about that

Oscar: this can widen the scope of the group a lot

Jean-Paul: +1

Danh: +1

Timestamped Named graphs (dereferenceable?)

Monika: in my case, we are working on streams of named graphs

Monika: not sure about which query engine supports this

Monika : The query engine from the Milano group

Jean-Paul: C-SPARQL?

Monika: not sure if CQELS can be used for this use case

Alasdair Gray : Example: g1 :occursAt 1; g2 :occursAt 2 ...

Alasdair Gray : g1 and g2 are graphs

Monika : Cannot share my desktop, option disabled on my machine

Oscar : Monika shares desktop to show an example

Alasdair Gray : There are two issues here. First, we have a stream of graph URIs which are given a timestamp when they arrive in the system, and the statements in the WHERE clause of the query need to operate over those graphs.

Daniele: working on this system (not C-SPARQL)

Daniele: Marco and Emanuele are working on SLD, which is not integrated in C-SPARQL, and may be useful here

Oscar : Suggestion to discuss this next week

Oscar : ACTION (Monika): add an example on the wiki so that everybody can look at it in more detail

Monika: the query needs to be run over the aggregation of the content of the named graphs that are inside a window
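A minimal sketch of what Monika describes, following Alasdair's example (g1 :occursAt 1; g2 :occursAt 2 ...): the stream carries graph names with timestamps, a window selects the graphs whose timestamp falls inside it, and the query runs over the union of their content. The graph names, the triples, the plain-tuple triple representation and the single-pattern `match` helper are hypothetical; a real engine would evaluate a full SPARQL pattern over the merged graph.

```python
Triple = tuple[str, str, str]

# Hypothetical content of each named graph, keyed by graph name.
graphs: dict[str, set[Triple]] = {
    ":g1": {(":ship1", ":atLock", ":lockA")},
    ":g2": {(":ship1", ":speed", "12")},
    ":g3": {(":ship2", ":atLock", ":lockB")},
}

# Stream of (graph name, timestamp) pairs, in the spirit of g1 :occursAt 1; g2 :occursAt 2 ...
stream: list[tuple[str, int]] = [(":g1", 1), (":g2", 2), (":g3", 5)]


def window_union(stream: list[tuple[str, int]], graphs: dict[str, set[Triple]],
                 start: int, end: int) -> set[Triple]:
    """Merge the content of every named graph whose timestamp lies in [start, end)."""
    merged: set[Triple] = set()
    for name, ts in stream:
        if start <= ts < end:
            merged |= graphs[name]
    return merged


def match(triples: set[Triple], predicate: str) -> list[tuple[str, str]]:
    """Evaluate the single triple pattern ?s <predicate> ?o over the merged content."""
    return sorted((s, o) for s, p, o in triples if p == predicate)


print(match(window_union(stream, graphs, 0, 3), ":atLock"))  # [(':ship1', ':lockA')]
```

The window [0, 3) picks up :g1 and :g2, and only :g1 contributes an :atLock triple; the query itself never needs to know which graph a triple came from.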

Daniele: why do you need a stream processor for this type of problem, given that you have time to write the RDF and make it available as Linked Data?

Danh: if you put all these RDF chunks together and send them over a stream, then you can actually run the query in almost any processor

Danh: maybe this is a good case for discussing whether the current RDF stream model that we are dealing with is enough to easily express cases like this

Monika: the structure that I propose makes it simpler for me

Monika: I would appreciate it if Danh could help in seeing what this would look like as a simple stream

Oscar: ACTION (Monika): put the material online and alert everybody by e-mail

Monika : That is true

Application and System time

Alasdair Gray : The second issue that came up in Monika's example is that we really want to have the sliding window over a user defined property in the named graph, not over the system time added by the stream processing engine.

Danh: in most systems, windows use the time assigned by the system

Monika: the timestamps that currently appear in my files are there for logging purposes, but they would be part of the stream. (Monika, correct me if I got this wrong.)

Daniele (PoliMI) : I think it is a problem related to the stream model: do we want to model the temporal dimension through additional annotation, or do we want to model it in "standard" RDF?

Oscar : @Daniele, I agree

Roland: Esper supports both timestamps in the event and system time

Monika : The additional annotation may not be really important from the query perspective, at least in my case.

Monika : the auxiliary timestamps are just logs

Monika: Darko, can you summarise what you explained on the wiki?

Darko: if there is no application time, it uses a system assigned time

Alasdair Gray : If we allow query processing based on an arbitrary timestamp in the data then we have issues around ordering and uncertainty in whether we've seen all the data by a specified time
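To make Alasdair's ordering concern concrete, here is a sketch of a tumbling window over application time that naively closes a window as soon as an event with a later timestamp arrives. The arrival sequence, the window width and the function name are made up; the point is that an event whose application time belongs to an already-closed window (here, timestamp 4 arriving after timestamp 6) can only be dropped or reported as late.

```python
# Hypothetical arrivals: (application timestamp, value); (4, "d") arrives after (6, "c").
arrivals: list[tuple[int, str]] = [(1, "a"), (3, "b"), (6, "c"), (4, "d"), (7, "e")]


def tumbling_windows(arrivals: list[tuple[int, str]], width: int
                     ) -> tuple[dict[int, list[str]], list[tuple[int, str]]]:
    """Close window k = [k*width, (k+1)*width) once an event at or past its end arrives.

    Returns the closed windows (by index) and the late events that belong to a
    window which had already been closed when they arrived.
    """
    closed: dict[int, list[str]] = {}
    late: list[tuple[int, str]] = []
    current = 0            # index of the window that is still open
    buffer: list[str] = []
    for ts, value in arrivals:
        k = ts // width
        if k < current:
            late.append((ts, value))  # its window has already been emitted
            continue
        while k > current:            # event lies beyond the open window: close it
            closed[current] = buffer
            buffer = []
            current += 1
        buffer.append(value)
    closed[current] = buffer          # flush the last open window
    return closed, late


print(tumbling_windows(arrivals, 5))  # ({0: ['a', 'b'], 1: ['c', 'e']}, [(4, 'd')])
```

Buffering for a bounded delay before closing each window is a common mitigation, at the cost of added latency, and it still rests on an assumption about how late data can arrive.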

Daniele: lose contemporaneity?

Roland: contemporaneity: we lose it with system time semantics

Jean-Paul: not sure what we will do about handling system time vs. application time. ACTION: discuss this and make it explicit

Alasdair: in an earlier call, we talked about going with application time and not system time

Darko: however, in some scenarios we cannot assume that application time is going to be available (e.g., if small devices do not have a clock)

Daniele: in that case, what will happen is that the system time will be used as the application time.
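A sketch of the fallback Darko and Daniele describe: each event may carry an application timestamp; when it does not, the engine's arrival (system) time stands in for it, and windows are evaluated over that effective time. The `Event` class, its field names and the `in_window` helper are hypothetical.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Event:
    """Hypothetical stream element with an optional application timestamp."""
    payload: str
    app_time: Optional[float] = None  # time carried in the data, if the source has a clock
    sys_time: float = field(default_factory=time.time)  # assigned by the engine on arrival

    def effective_time(self) -> float:
        # Prefer application time; fall back to system time when it is missing.
        return self.app_time if self.app_time is not None else self.sys_time


def in_window(events: list[Event], start: float, end: float) -> list[str]:
    """Payloads of events whose effective time lies in [start, end)."""
    return [e.payload for e in events if start <= e.effective_time() < end]


# One event with application time (it arrived much later, at system time 100),
# and one from a device without a clock.
e1 = Event("x", app_time=10.0, sys_time=100.0)
e2 = Event("y", sys_time=12.0)
print(in_window([e1, e2], 0.0, 20.0))   # ['x', 'y']
print(in_window([e1, e2], 11.0, 20.0))  # ['y']
```

For `e2` the effective time is its arrival time, so any transmission or processing delay is silently folded into the timestamp, which is why the fallback can only approximate the true occurrence time.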

Monika : @Daniele, agree

Daniele (PoliMI) : and it will introduce problems on correctness (it would be an approximation)

Alasdair Gray : In an earlier call we talked about the output of one stream processing system being the input to another stream processing system. In this case we really need to work on application time, since the first stream processing system will introduce a delay

Danh: there are different ways to express time. We need an action to discuss this very clearly

Roland: e.g., the SSN ontology has several timestamps, in fact (system time would add a third timestamp)

Alasdair Gray : By application time I was meaning that the timestamp would be captured in the data with a property that states what type of timestamp it is

Daniele: adding something else on system time: this is an annotation that is made on the data that the data stream processor receives. But actually this data is only used internally by the system. When the data goes out it should be the application time, not the system time. So system time should be only internal to the processor

Oscar : I agree

Alasdair Gray : Although system time is internal, it is surfaced in the query language. Windows operate over the system time

Daniele: normally time windows are actually always defined on the time that is present/explicit on the stream

Roland: some engines, like Esper, default to system time <<is this right?>>

Roland: as a default, yes! (but Esper can be made to recognize app time, too)

Jean-Paul: provide more details on the use cases, especially if you have not done so yet, and interact with the RSP implementors

Query Semantics for timestamped graphs

Jean-Paul: Bring to your attention some preliminary work on the window query semantics (with graphs): http://www.w3.org/community/rsp/wiki/RSP_Query_Semantics

Oscar: Jean Paul and Daniele have been working on the query semantics

Daniele (PoliMI) : on graph streams

Oscar : Jean-Paul suggests discussing it on the next call, although he is happy to receive comments in the meantime

Oscar : ACTION (JP): will summarise the requirements that are arising from the queries

Retrieved from "https://www.w3.org/community/rsp/wiki/index.php?title=Telecon_21.02.2014&oldid=206"