Use cases

From RDF Stream Processing Community Group

We are collecting relevant use cases that can help show where RDF Stream Processing is important, already used, or interesting to apply. Most members of the RSP Group have experience applying Stream Processing or Complex Event Processing, as well as RDF or ontology-based tools and frameworks.

Comment: We have to consider use cases where we consume (i) graph data and (ii) relational data. Both require input represented by more than a single RDF triple per stream element, i.e. RDF stream graphs. In the first case we consume, for example, the output of other stream processing systems, since we want to allow such systems to output graphs. In the second case we consume relational data, such as sensor data, where the schema is fixed. The second use case is also the most important, since it covers the integration of streams in a format other than RDF.

We propose the following template for describing the use cases (thanks to Emanuele):

Proposed Template

Use case name: a short title for the use case

Streaming Information

  • Type: illustrates the types of data
  • Nature:
    • Relational Stream: yes/no
    • Text stream: yes/no
  • Origin: where does the data come from?
  • Frequency of update:
    • Sub-seconds/seconds/minutes/hours
    • In triples/minute: ... t/min
  • Quality: is there any source of uncertainty? Are there errors? If yes, of what kind?
  • Management /access
    • Technology in use: which technology is currently used?
    • Problems: does the current technological solution present problems?
    • Means of improvement: has a way to address those problems been identified? If yes, please provide some details


  • Why RDF Stream? What benefits does the use case gain from using RDF streams rather than other data formats? Why should the data be mapped to RDF before the processing step?
    • Possible Answers:
      • Integration of several streams with dynamic schema (sharing/processing on the Web)
      • Integration with several static KBs
      • Integration of the history of data stream with other data sources after the processing
      • Increased system flexibility when event sources are highly dynamic

[optional] Static Information required to interpret the streaming information

  • Type: illustrates the types of data
  • Origin: where do the data come from?
  • Dimension:
    • MB/GB/TB/…
    • In triples: 10^??
  • Quality: is there any source of uncertainty? Are there errors? If yes, of what kind?
  • Management /access
    • Technology in use: which technology is currently used?
    • Available Ontologies and Vocabularies: is there any reference information model, ontology, vocabulary?

City Data Fusion

Streaming Information

  • Type:
    • Anonymized Call Data Records
    • Social Streams
    • Traffic sensors
    • Bike sharing
    • Weather and pollution sensors
  • Nature:
    • Relational Stream: yes
    • Text stream: yes
  • Origin:
    • Anonymized Call Data Records: Mobile Telecom operators
    • Social Streams: Twitter, Instagram, Foursquare, etc.
    • Traffic sensors: Municipality, insurance companies, car sharing companies
    • Bike sharing: Municipality
    • Weather and pollution sensors: environmental agencies
  • Frequency of update:
    • Anonymized Call Data Records
      • seconds
      • In triples/minute: from 1,000 to 100,000 t/min/km^2 of the city
    • Social Streams
      • seconds
      • In triples/minute: from 100 to 10,000 t/min/km^2 of the city
    • Traffic sensors
      • minutes
      • In triples/minute: 1,000 t/min for the entire city
    • Bike sharing
      • minutes
      • In triples/minute: 100 t/min for the entire city
    • Weather and pollution sensors
      • hours
      • In triples/minute: 10 t/min for the entire city


  • Quality: call data records are quite clean. Social streams are per se clean, but their interpretation introduces uncertainty. Traffic, weather and pollution sensors can break down and can record wrong observations.
  • Management /access
    • Technology in use: a variety! In some cases DSMS and Big Data solutions are used. Most of the sensor data are managed in proprietary systems.
    • Problems: variety and velocity are difficult to handle
    • Means of improvement: Big Data solutions

Static Information required to interpret the streaming information

  • Type: maps, point of interest, census data
  • Origin: OpenStreetMap, Wikipedia, Freebase, open data, etc.
  • Dimension:
    • MBs per km^2
    • In triples: 10^5 per km^2
  • Quality: open data tends to be of variable quality; veracity of the data is a serious issue
  • Management /access
    • Technology in use: GIS and DBMS
    • Available Ontologies and Vocabularies: OGC standards

Environmental Monitoring in Oil Industry

Streaming Information

  • Type: Environmental data: temperatures, pressures, salinity, acidity, fluid velocities, etc.
  • Nature:
    • Relational Stream: yes
    • Text stream: no
  • Origin: Data is produced by sensors in oil wells and on oil and gas platform equipment. Each oil platform has an average of 400,000 sensors.
  • Frequency of update:
    • from sub-second to minutes
    • In triples/minute: from 10 to 10,000 t/min
  • Quality: It varies, due to instrument/sensor issues
  • Management /access
    • Technology in use: Dedicated (relational and proprietary) stores
    • Problems: The ability of users to access data from different sources is limited by an insufficient description of the context
    • Means of improvement: Add context (metadata) to the data so it becomes meaningful, and use reasoning techniques to process that metadata

Static Information required to interpret the streaming information

  • Type: Topology of the sensor network, position of each sensor, the descriptions of the oil platform
  • Origin: Oil and gas production operations
  • Dimension:
    • 100s of MB as PostGIS dump
    • In triples: 10^8
  • Quality: Good
  • Management /access
    • Technology in use: RDBMS, proprietary technologies
    • Available Ontologies and Vocabularies: Reference Semantic Model (RSM), based on ISO 15926

EMT Bus Data from Madrid

Streaming Information

  • Type: Public Transport data: Bus stops and lines, time and distance for a bus to reach a stop.
  • Nature:
    • Relational Stream: yes
    • Text stream: no
  • Origin: Data is produced by sensors in buses (GPS).
  • Frequency of update:
    • from seconds to minutes
  • Quality: Generally accurate if the bus is close to the stop; otherwise only approximate.
  • Management /access
    • Technology in use: Web Services (REST and SOAP-based available)
    • Problems: No query infrastructure, lack of integration, no interoperability models.
    • Means of improvement: Add context (metadata) to the data, add query capabilities.

Static Information required to interpret the streaming information

  • Type: Bus stops, lines, coordinates.
  • Origin: EMT Madrid
  • Dimension:
    • Data needed does not exceed tens of MB
  • Quality: Good
  • Management /access
    • Technology in use: Web services, XML
    • Available Ontologies and Vocabularies: SSN ontology (any transport ontology available?)

Queries

  • Select all bus lines and the waiting time for each one, for a given bus stop, in the last minute (see the sketch after this list).
  • Select all bus stops where there is a waiting time less than 5 minutes, in a bounding box area.
  • Select all waiting times greater than 10 min, and the line number, for those lines with a scheduled frequency of 7 min, every two minutes.
  • Select the average waiting time to go from bus stop A to bus stop B over the last day, every half hour.
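
As an illustration, the first query above could be sketched as follows in a C-SPARQL-like extended SPARQL. This is only a sketch: the stream URI, the prefix and the vocabulary (:BusArrival, :stop, :line, :waitingTime) are assumptions, not an actual EMT schema.

PREFIX : <http://example.org/emt#>
SELECT ?line ?waitingTime
FROM STREAM <http://example.org/emt/arrivals> [RANGE 1m]
WHERE {
  # all arrival estimations for a given stop seen in the last minute
  ?e a :BusArrival ;
     :stop <http://example.org/emt/stop/1234> ;
     :line ?line ;
     :waitingTime ?waitingTime .
}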

OpenGov: Public Spending / Stock Ticker Data

This use case combines data streams from multiple sources.

The goal is to highlight (publicly traded) companies that double their stock price within a certain time frame and are/were awarded a government contract in the same time frame.

Streaming Information

  • Type: Data on public spending in the US and stock ticker data
  • Nature:
    • Relational Stream: yes
    • Text stream: no
  • Frequency of update:
    • Stock Data (daily values in 2001):
      • original CSV format: 1,961,295 lines (492 MB unzipped)
      • triples: 73,181,951 t/year
      • triples/day: 200k t/day
    • Public Contracts (in 2001):
      • original CSV format: 641,640 contracts (800 MB CSV unzipped)
      • triples: 109,996,479 t/year
      • triples/day: 300k t/day
  • Quality: Good
  • Management /access
    • Technology in use: Data read from disk in time-annotated N-Triples format
    • Problems: As we process the data on a distributed system, streaming triples is suboptimal as it does not allow efficient routing.
    • Means of improvement: Combining data in subgraphs would allow content-based routing of data for better workload scheduling.

Static Information required to interpret the streaming information

  • Type: Static information about stock tickers such as company names or industry codes
  • Management /access
    • Currently this data is replicated in each stock quote. This is inefficient and could be stored in a static data store (or possibly also in memory).

Queries

  • Select all companies that have doubled their stock price in the last week (this one only uses streaming data from one source)
    • Doable in all RDF Stream processors
  • Select all companies that have doubled their stock price in the last week and have also had a government contract in the same week (this one requires a combination of two streaming data sources)
    • This could be done by registering two CONSTRUCT queries that fire these events, plus another query that selects from both resulting streams (sketched below). This is doable, for instance, in CQELS; we are not sure about other RDF stream processors.
  • Select all companies that double their stock price in one week after they get a contract (this one requires a combination of two streaming data sources, and maybe some relationship between events)
    • The difficulty is in establishing what "after" means here. A CEP-oriented stream processor seems to be the most suitable.
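
A sketch of the two-step pattern described above, in the same extended-SPARQL style used elsewhere on this page. Everything is assumed for illustration: the stream URIs, the vocabulary (:company, :price, :awardedTo) and the derived :PriceDoubledEvent type.

PREFIX : <http://example.org/opengov#>
# Step 1: register a CONSTRUCT that derives a "price doubled" event stream.
# "Doubled in the last week" is approximated by comparing any two quotes
# for the same company within the 7-day window.
CONSTRUCT { ?company a :PriceDoubledEvent }
FROM STREAM <http://example.org/stocks> [RANGE 7d]
WHERE {
  ?q1 :company ?company ; :price ?pOld .
  ?q2 :company ?company ; :price ?pNow .
  FILTER (?pNow >= 2 * ?pOld)
}

# Step 2: a second query joins the derived event stream with the contracts stream.
SELECT ?company
FROM STREAM <http://example.org/events>    [RANGE 7d]
FROM STREAM <http://example.org/contracts> [RANGE 7d]
WHERE {
  ?company a :PriceDoubledEvent .
  ?contract :awardedTo ?company .
}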

Urban Transport Assistance

This use case aims to assist people moving in urban environments by providing information on problems and opportunities, based on real-time information and derived forecasts.

Streaming Information

  • Types:
    • Location events (lat, long) of personal sensors, public transportation, taxis, private cars
    • Recognized activities (in vehicle / on bicycle / on foot)
    • Geofence events (entry, exit)
    • Car OBD events (door sensors, safety belts, engine start / stop etc.)
    • Road segment velocities
    • Traffic exception info (accidents, cancelled or delayed public transportation, road construction etc.)
    • Weather information (temperature, rain/snow)
  • Nature:
    • Multiple event streams, some with multiple types of events
    • Personal information needs to be handled securely
    • Text stream: no
  • Data origin: Produced by sensors in mobile phones, cars, public transportation companies, traffic infrastructure management and weather services.
  • Frequency of update:
    • from sub-second to hours, depending on the stream
    • In events/minute: from 0.01 to 10,000 events/min
  • Quality: Variable
  • Management /access
    • Technology in use: Mobile phone sensors, sensor libraries (e.g. Google Play services), car connectivity, transport infrastructure web services, weather services.
    • Requirements: Event format has to be flexible enough to support multiple different event types within a stream, each event type containing different parameters. Even if the transport is split into multiple streams, the event processing system needs to be able to manage all types. Both public and private streams are needed.

Static Information required to interpret the streaming information

  • Type: Public transportation timetables
  • Origin: E.g. Helsinki Metropolitan transport timetables are available through HTTP GET, XML dump or GTFS dump.

Sample Queries

  • Destination detection (see the sketch after this list)
    • Trigger destination detection event, when GPS of a person is unavailable for t1 seconds
    • Trigger destination detection event, when person stays within radius r1 for t1 seconds
  • Travel Mode
    • Trigger an on_foot event when on_foot is detected by activity recognition for 20 seconds
    • Trigger an in_bus event when activity changes from on_foot to in_vehicle close (r2) to a bus stop and matching bus identification either from stop-specific timetable or live fleet tracking can be found.
    • Trigger an in_car event when activity changes from on_foot to in_vehicle without matching public transport.
    • Trigger state changes to store the latest travel mode based on the change events. Store a log of state changes.
  • Movement
    • Trigger a movement event, when person moves more than x1 meters and the state is on_foot or in_car.
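
As an example, the second destination-detection trigger could be sketched as follows, assuming a C-SPARQL-like window syntax, GeoSPARQL's geof:distance function and a hypothetical location vocabulary (:Location, :person, :pos):

PREFIX :     <http://example.org/mobility#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
PREFIX uom:  <http://www.opengis.net/def/uom/OGC/1.0/>
# Fire when no two observations of the same person in the last t1 = 60 s
# are more than r1 = 50 m apart, i.e. the person stayed within the radius.
CONSTRUCT { ?person a :DestinationDetected }
FROM STREAM <http://example.org/locations> [RANGE 60s]
WHERE {
  ?e a :Location ; :person ?person ; :pos ?p .
  FILTER NOT EXISTS {
    ?e2 a :Location ; :person ?person ; :pos ?p2 .
    FILTER (geof:distance(?p, ?p2, uom:metre) > 50)
  }
}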

Monitoring of Multi-Cloud Applications

Streaming Information

  • Type: logs
  • Nature:
    • Relational Stream: yes
    • Text stream: rarely
  • Origin: Infrastructure as a service (IaaS), Platform as a service (PaaS), Software as a service (SaaS), Network as a service (NaaS) from both public and private cloud systems
  • Frequency of update:
    • seconds
    • In triples/minute: a typical business application may produce from 1,000 to 10,000 t/min
  • Quality: no issues
  • Management /access
    • Technology in use: no standard solution
    • Problems: logs are in proprietary formats, but users need to query them homogeneously
    • Means of improvement: the MODAClouds EU project chose to investigate RDF stream processing

Static Information required to interpret the streaming information

  • Type: description of the deployment of the application
  • Origin: application developer and cloud solution provider
  • Dimension:
    • MB
    • In triples: 10^5
  • Quality: no issues

Document Editing

This use case provides a way for crowdsourcing edits of public documents. It is prototypical, but representative of the kind of work possible in the digital humanities textual scholarship community.

Streaming Information

  • Type: Open Annotation graphs indicating edits to a document as it exists in some point in time.
  • Nature:
    • A sequence of RDF graphs representing annotations using the Open Annotation data model.
  • Origin: Annotations could be provided by a number of tools or harvested from a number of annotation stores. The vetted edits would form another stream of annotations that would be used to create new versions of the document against which further edits could be made through annotation.
  • Frequency of update: Infrequent; less than a thousand per day.
  • Quality: variable
  • Management/access:
    • Technology: Web services, JSON-LD
    • Available Ontologies and Vocabularies: Open Annotation
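
The annotation stream could then be queried like any other RDF stream. A minimal sketch using Web/Open Annotation terms (the stream URI and document URI are hypothetical):

PREFIX oa: <http://www.w3.org/ns/oa#>
SELECT ?anno ?body
FROM STREAM <http://example.org/annotations> [RANGE 1h]
WHERE {
  # edits proposed against a specific version of the document
  ?anno a oa:Annotation ;
        oa:hasTarget <http://example.org/docs/doc1/v3> ;
        oa:hasBody ?body .
}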

Static Information required to interpret the streaming information

  • Type: Static version of the document against which the edits are being asserted.
  • Origin: Document store. Typically documents are stored as TEI (XML), available through some REST interface. The Memento extensions to HTTP could be used to select a time-based version of a document.
  • Management/access:
    • Document access managed through REST

Criminal Intelligence

Streaming Information

  • Type: Social media streams (e.g., tweets), money transfer data, individual travel trends (e.g., airport check-ins), etc.
  • Nature:
    • Relational Stream: yes
    • Text stream: yes
  • Origin: Data is produced by large sets of users, or existing systems that log user activities.
  • Frequency of update:
    • Sub-second to weekly, depending on the type of stream.
  • Quality: Social media (inaccurate and noisy but high volume), money transfers (accurate but possibly missing values), individual travel trends (generally accurate but infrequent and geographically distant).
  • Management /access
    • Technology in use: Web Services (push/pull), databases, documents
    • Problems: Timely information extraction from textual data, detecting relatively small and rare traces that form patterns in large sets of data.
    • Means of improvement: Enabling the use of queries that change over time as more information becomes available, plugging in new "detection patterns", and statistical evaluation/enrichment of noisy data streams.

Static Information required to interpret the streaming information

  • Type: Domain ontologies for banking and airports, geographical data, descriptions of patterns for detection of possible criminal activity.
  • Origin: DBpedia, domain ontologies, and patterns developed within the project.
  • Dimension:
    • Data needed may exceed hundreds of MB.
  • Quality: Good
  • Management /access
    • Technology in use: Web services, XML, databases
    • Available Ontologies and Vocabularies: SSN ontology (or similar), SIOC, and domain ontologies.

Road transportation delay prediction

In this use case, the objective of the data stream analysis is the automated run-time prediction of possible disruptions of road transportation. Based on traffic information, weather information, and trucks' routes and current positions, a predicting system would forecast a possible obstacle to the timely arrival of the means of transport at its destination. The idea is to adopt an external classifier, fed by multiple sources of information, which raises an alert in case of possible adverse weather conditions (e.g., storms), infrastructural problems (e.g., trees fallen on streets) or traffic problems (e.g., accidents) possibly obstructing the itinerary of the truck.

Streaming Information

  • Type:
    • Road transportation means’ positions, labelled with timestamps, comprising longitude, latitude and velocity.
    • Road congestion notifications.
    • Weather conditions and forecast.
  • Nature:
    • Relational Stream: yes
    • Text stream: yes
  • Origin:
    • Trucks’ on-board transponders.
    • Traffic reports - online APIs.
    • Weather information and forecast services - online APIs.
  • Frequency of update:
    • Trucks’ on-board transponders.
      • Seconds
      • In triples/minute: from 1 to 100 t/min per truck
    • Traffic reports
      • Minutes
      • In triples/minute: from 1 to 10 t/min per zone
    • Weather
      • Minutes
      • In triples/minute: from 1 to 10 t/min per zone


  • Quality: data are generally clean.
  • Management /access
    • Technology in use: traffic and weather reports can be gathered over the web through free APIs, or under subscription. CSV, JSON and XML formats are available for the returned information. Transponder data are based on GPS information and are conveyed in proprietary formats to the trucking companies; CSV formats are known to be used to this end.
    • Problems: given the route of the truck, the traffic conditions along the remaining section of the path have to be checked. Also, only the weather conditions that possibly affect the zones which will be traversed by the truck have to be considered; in particular, different weather conditions can affect broader or narrower areas. On-board transponder data are not public, and specific authorisations from the trucking company are needed to access them. Traffic and weather report/forecast online APIs are usually accessible under subscription.
    • Means of improvement: spatial proximity calculation heuristics, regarding roads and regional areas, may help as far as the cross-check with weather conditions is concerned. Moreover, the analysis would considerably benefit from the integration of event stream processing systems with automated classifiers from machine learning.

Static Information required to interpret the streaming information

  • Type: road-maps, tracked truck identifiers and planned routes.
  • Origin: OpenStreetMap, Wikipedia, Freebase, open data, routes planned on satnavs, etc.
  • Dimension:
    • MBs per truck
    • In triples: 10^6
  • Quality: as far as road maps are concerned, there is no particular issue. Routes, instead, can be more or less detailed, ranging from just the current position and final destination to the whole preset route on the satnav.
  • Management /access
    • Technology in use: web API for maps.
    • Problems: access to routes may not be granted.

Queries => Language functionalities

  • Select all the events within an N-minute time window. => Sliding window (C), Sequences (H)
  • Considering (M<N)-minute intervals in the previous time window, calculate the average of different properties (velocity, etc.) (see the sketch after this list). => C, H, Rename (A), Slicing window (D), Aggregation (B), Query composition (G)
  • From the previous result, given the truck number, its current position and its planned route, calculate the current distance to the next warehouse. => C, H, A, D, B, G, Integration with external data sources (F), Event source selection (I)
  • Given the truck number, its current position and its direction, retrieve the traffic information in the upcoming section of the road. => C, H, A, D, B, G, F, I
  • Given the truck number, its current position and its direction, retrieve information about the status of the next section of the road (e.g., construction sites). => C, H, A, D, B, G, F, I
  • Given the truck number, its current position and its direction, retrieve the weather information and forecast affecting the next section of the road. => C, H, A, D, B, G, F, I
  • Given a truck number, provide the joint XML representation of the result of queries 3-6. => C, H, A, D, B, G, F, I, Data extraction (E)
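
A sketch of the second query, assuming a C-SPARQL-like window syntax and a hypothetical vocabulary (:PositionEvent, :truck, :velocity):

PREFIX : <http://example.org/transport#>
# Approximates the M-minute slicing with a tumbling M-minute window
# (RANGE = SLIDE = 5 min); exact slicing-window syntax varies by engine.
SELECT ?truck (AVG(?vel) AS ?avgVel)
FROM STREAM <http://example.org/trucks/positions> [RANGE 5m SLIDE 5m]
WHERE {
  ?e a :PositionEvent ;
     :truck ?truck ;
     :velocity ?vel .
}
GROUP BY ?truck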

Aircraft diversion prediction

In this use case, the objective of the data stream analysis is the automated run-time prediction of aircraft diversions, based on the sequence of positions for the current flight. The gathered information is typically available from public web services. Data have to be interpolated over time intervals and may be passed to external components (e.g., anomaly detectors). After a variable-length period of anomalous behaviour for the flight under analysis, an alarm for a possible diversion is raised.

Streaming Information

  • Type:
    • Aircraft positions labelled with timestamps, comprising longitude, latitude, altitude and velocity.
  • Nature:
    • Relational Stream: yes
    • Text stream: no
  • Origin:
    • Flight monitoring on-line service API records
  • Frequency of update:
    • Flight monitoring on-line service API records
      • seconds
      • In triples/minute: from 1,000 to 100,000 t/min per flight monitoring on-line service
  • Quality: data are generally clean, but can suffer from some lack of accuracy in the position of the plane. Furthermore, data are up to date but can be delayed by seconds or minutes. Data are generally accurate if the aircraft is close to zones covered by antennas; otherwise, there can be approximations or blind stretches along the route.
  • Management /access
    • Technology in use: Most of the flight data are managed in proprietary systems and available on the web under subscription (web API). Data are provided in XML, JSON, etc., according to a proprietary format defined by the service provider.
    • Problems: cross-checking different data collections to determine aircraft positions with good approximation; blind zones over the sea.
    • Means of improvement: The analysis would considerably benefit from the integration of event stream processing systems with automated classifiers from the machine learning field.

Static Information required to interpret the streaming information

  • Type: maps, airports, flight numbers
  • Origin: OpenStreetMap, Wikipedia, Freebase, open data, etc.
  • Dimension:
    • MBs
    • In triples: 10^6
  • Management /access
    • Technology in use: web API, XML repositories

Queries => Language functionalities

  • Select all the events within an N-minute time window. => Sliding window (C), Sequences (H)
  • Considering (M<N)-minute intervals in the previous time window, calculate the average of different properties (velocity, altitude, etc.). => C, H, Rename (A), Slicing window (D), Aggregation (B), Query composition (G)
  • From the previous result, calculate the average gained distance with respect to the destination airport (see the sketch after this list). => C, H, A, D, B, G, Integration with external data sources (F)
  • Given a flight number, provide an XML representation of the result of the previous query. => C, H, A, D, B, G, F, Event Instance Selection (I), Data extraction (E)
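
A sketch of the third query, joining the position stream with a static graph of airports. The stream URI, the vocabulary and the use of GeoSPARQL's geof:distance are all assumptions:

PREFIX :     <http://example.org/flights#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
PREFIX uom:  <http://www.opengis.net/def/uom/OGC/1.0/>
SELECT ?flight (AVG(geof:distance(?pos, ?airportPos, uom:kilometre)) AS ?avgDist)
FROM STREAM <http://example.org/flights/positions> [RANGE 30m SLIDE 5m]
FROM <http://example.org/airports>   # static airport data
WHERE {
  ?e a :PositionEvent ; :flight ?flight ; :pos ?pos .
  ?flight :destination ?airport .
  ?airport :location ?airportPos .
}
GROUP BY ?flight

The average gained distance would then be obtained by differencing the results of consecutive evaluations.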

Inland-water transportation delay prediction

In this use case, the objective of the data stream analysis is the automated run-time prediction of delays for barges, in the context of inland-water transportation. The prediction would be based on multiple pieces of information, such as the sequence of positions of the ship, its direction (upstream or downstream with respect to the river), its draught, the water level, the number of vessels already in the locks, and the weather conditions. A predicting system would forecast a possible obstacle to the timely arrival of the means of transport at its destination. The idea is to adopt an external classifier, fed by multiple sources of information, which raises an alert in case of possible adverse weather conditions (e.g., ice banks), traffic problems (e.g., congestion at docks), or water-related problems (e.g., a water level too low to allow the barge to proceed). In other words, the classifier is meant to alert ahead of time whenever conditions arise that could slow down or impede the progress of the ship along the river.

Streaming Information

  • Type:
    • Barge positions labelled with timestamps, comprising longitude, latitude, (optionally) river kilometer and velocity.
    • Weather conditions and forecast.
    • Current water level, recorded for different sections of the river.
  • Nature:
    • Relational Stream: yes
    • Text stream: yes
  • Origin:
    • Ship monitoring on-line service API records.
    • Naval transponders data.
    • Weather information and forecast services - online APIs.
    • Water level reports.
  • Frequency of update:
    • Ship monitoring on-line service API records
      • seconds
      • In triples/minute: from 10,000 to 300,000 t/min per ship monitoring on-line service
    • Ships’ on-board transponders.
      • Seconds
      • In triples/5-minutes: from 1 to 10 t/5min per ship
    • Water-level updates
      • Minutes
      • In triples/hour: 1 to 10 t/hr per checkpoint (usually, in correspondence with locks)
    • Weather
      • Minutes
      • In triples/minute: from 1 to 10 t/min per zone
  • Quality: data are generally clean. However, the reported draught of the ship is usually a value provided by the captain on a voluntary basis, so it may not be completely accurate. Also, the velocity of the ship is determined on the basis of the ground speed: hence, it does not reflect the speed of the vessel alone, but rather the sum of the barge's engine speed and the thrust of the river stream.
  • Management /access
    • Technology in use: inland-water traffic (as well as maritime traffic) and weather reports can be gathered over the web through APIs, under subscription. CSV, JSON and XML formats are available for the returned information. Transponder data are conveyed in proprietary formats to the shipping companies; CSV formats are known to be used to this end. Water levels are stored in different textual formats, formatted at the discretion of the public authority gathering the data.
    • Problems: on-board transponder data are generally not public, and specific authorisations from the shipping company are needed to access them. The same holds true for water-level measurements, which require the agreement of public authorities. Weather conditions and forecast online APIs are usually accessible under subscription. Given the direction of the ship (i.e., either upstream or downstream) and its current location, only weather conditions affecting the remaining section of river to traverse are relevant. Furthermore, no ETA (Estimated Time of Arrival) is usually provided, therefore a delay can only be analysed w.r.t. the average timings of other ships.
    • Means of improvement: spatial proximity calculation heuristics, regarding rivers and regional areas, may help as far as the cross-check with weather conditions is concerned. Moreover, the analysis would considerably benefit from the integration of event stream processing systems with automated classifiers from machine learning.

Static Information required to interpret the streaming information

  • Type: maps, harbours, locks, ship numbers and vessel models
  • Origin: OpenStreetMap, Wikipedia, Freebase, open data, ship monitoring services, local authorities, etc.
  • Dimension:
    • MBs
    • In triples: 10^6
  • Management /access
    • Technology in use: web API, XML or CSV repositories.

Queries => Language functionalities

  • Select all the events within an N-minute time window. => Sliding window (C), Sequences (H)
  • Considering (M<N)-minute intervals in the previous time window, calculate the average of different properties (velocity, draught, etc.). => C, H, Rename (A), Slicing window (D), Aggregation (B), Query composition (G)
  • From the previous result, given the vessel number, its current position and its direction (upstream, downstream), calculate the current distance to the next lock. => C, H, A, D, B, G, Integration with external data sources (F), Event source selection (I)
  • Given the vessel number, its current position and its direction, retrieve the water level measured at the next lock. => C, H, A, D, B, G, F, I
  • Given the vessel number, its current position and its direction, calculate the number of vessels in the surroundings of the next lock. => C, H, A, D, B, G, F, I
  • Given the vessel number, its current position and its direction, retrieve the weather information and forecast for the next section of the river. => C, H, A, D, B, G, F, I
  • Given a vessel number, provide the joint XML representation of the result of queries 3-6. => C, H, A, D, B, G, F, I, Data extraction (E)

Attempt to write SPARQL queries (extended)

  • Select all the events within an N-minute time window. => Sliding window (C), Sequences (H)

SELECT ?e
FROM STREAM <streamhere> [N min]
WHERE {
  ?e a :Event .
}

Considering (M<N)-minute intervals in the previous time window, calculate the average of different properties (velocity, draught, etc.). => C, H, Rename (A), Slicing window (D), Aggregation (B), Query composition (G)

SELECT (AVG(?vel) AS ?velo)
FROM STREAM <streamhere> [N min]
WHERE {
  ?e a :Event ;
     :velocity ?vel .
}


From the previous result, given the vessel number, its current position and its direction (upstream, downstream), calculate the current distance to the next lock. => C, H, A, D, B, G, Integration with external data sources (F), Event source selection (I)
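
A possible sketch, joining the stream with a static graph of lock positions. The vocabulary, the static graph URI and the use of GeoSPARQL functions are assumptions; %VESNUMBER% is the same placeholder used below.

PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
PREFIX uom:  <http://www.opengis.net/def/uom/OGC/1.0/>
SELECT ?lock (geof:distance(?p, ?lockPos, uom:kilometre) AS ?dist)
FROM STREAM <boatsstream> [N min]
FROM <locksgraph>   # static lock positions
WHERE {
  ?vessel :number %VESNUMBER% ;
          :position ?p .
  ?lock a :Lock ;
        :position ?lockPos .
  # Direction filtering (upstream/downstream) is omitted; it would compare
  # the river kilometres of ?p and ?lockPos.
}
ORDER BY ASC(geof:distance(?p, ?lockPos, uom:kilometre))
LIMIT 1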

Given the vessel number, its current position and its direction, retrieve the water level measured at the next lock. => C, H, A, D, B, G, F, I

SELECT ?water
FROM STREAM <streamname>  [N min]
FROM STREAM <boatsstream> [N min]
WHERE {
  ?vessel :number %VESNUMBER% ;
          :position ?p .
  ?p gs:nearby ?loc .
  ?loc :waterlevel ?water .
}

The queries also require geospatial support (e.g. GeoSPARQL).

Event-based Traceability/Visibility in Supply Chains

Streaming Information

  • Type: Supply chain EPCIS events
  • Nature:
    • Relational Stream: yes
    • Text stream: no
  • Origin: RFID/Barcode readers
  • Frequency of update:
    • seconds to minutes
    • In events/minute: 10000-100 events/min
  • Quality: generally accurate
  • Management /access
    • Technology in use: Commercial RFID event processing platforms that provide EPCIS capture and query interfaces
    • Problems: Interoperability between systems
    • Means of improvement: Add semantics to EPCIS events

Static Information required to interpret the streaming information

  • Type: Product master data, GIS data, Environmental sensor data
  • Origin: Persistent databases and other sensors

Queries

  • Select the EPCIS events in the last 5 minutes where the business step is commissioning (see the sketch after this list).
  • Select the EPCIS events from the last 20 events where the EPCs commissioned are between X and Y.
  • Select the EPCs from the events in the last 5 minutes which were part of an aggregation event.
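
A sketch of the first query, assuming a C-SPARQL-like syntax and a direct RDF mapping of EPCIS events. The stream URI and the :bizStep property are assumptions; the business-step URN is from the EPCIS Core Business Vocabulary.

PREFIX : <http://example.org/epcis#>
SELECT ?event
FROM STREAM <http://example.org/epcis/events> [RANGE 5m]
WHERE {
  ?event a :ObjectEvent ;
         :bizStep <urn:epcglobal:cbv:bizstep:commissioning> .
}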

Water Supply and Sewage Network Management

Streaming Information

  • Type:
    • Water network sensors: pressure, flow rate, stored volume, tank level, power consumption.
    • Environmental data: rainfall, environmental hazard risk zones.
    • Water quality: turbidity, chlorine, free chlorine, pH, temperature.
    • Alarms: built-in alarms generated from sensors, sensor values out of defined range, or issues in network facilities.
    • Events data generated from crowd-sensing platforms.
  • Nature:
    • Relational Stream: yes
    • Text stream: no
  • Origin:
    • Sensors distributed in water and sewage network.
    • SCADA systems that generate new kinds of signals and alarms.
    • Citizens or facilities companies equipped with mobile devices.
  • Frequency of update:
    • Typically minutes.
    • In triples/minute: thousands of t/min for a medium-sized city (hundreds of thousands of inhabitants).
  • Quality:
    • Sensor generated: generally good, but there are missing data, erroneous values and diverse precision issues.
  • Management /access
    • Technology in use: proprietary SCADA systems.
    • Problems:
      • Closed proprietary systems.
      • Integration with other data sources is complex and costly.
      • Lack of standardization complicates aggregation of data.
      • Queries require proprietary mechanisms and usually custom programming per use case.
      • Low reasoning capabilities of existing systems to develop advanced analytics.
    • Means of improvement:
      • Open and standard query systems.
      • Easy integration of data sources (weather, administrative, geographical, environmental).
      • Improved reasoning and advanced analytics on the graph of sensors.

Static Information required to interpret the streaming information

  • Type:
    • Topology of the sensor network.
    • Specification of sensor devices.
    • Position of each sensor.
    • Maps showing network locations.
  • Origin:
    • SDIs
    • Custom or proprietary sources of topological and GIS data.
    • Custom schema definition of sensor devices.
  • Dimension:
    • Unknown
  • Quality:
    • RDF data mostly non-existent
  • Management /access
    • Technology in use: RDBMS, proprietary technologies.
    • Available Ontologies and Vocabularies:
      • OGC standards (SWE, SOS, SAS, O&M, WaterML, SensorML)
      • SSN ontology

Sample Queries

  • Get water-level sensors whose value exceeds 90% of maximum capacity, or where the incoming flow rate may fill the sensed reservoir or tank in less than 24 hours.
  • Get active alarms within a geographical area (bounding box) (see the sketch after this list).
  • Retrieve real-time sensor measured data for type of attribute or type of sensor within a geographical area.
  • Compare minimum flow rate in two similar weeks of different years for same sensors.
  • Get sensor attributes out of the defined range within a time window.
  • Get minimum values of hourly averaged water pressures in a pipe.
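
A sketch of the bounding-box alarm query, assuming GeoSPARQL support and a hypothetical alarm vocabulary; the polygon coordinates are placeholders.

PREFIX :     <http://example.org/waternet#>
PREFIX geo:  <http://www.opengis.net/ont/geosparql#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
SELECT ?alarm ?sensor
FROM STREAM <http://example.org/waternet/alarms> [RANGE 10m]
WHERE {
  ?alarm a :Alarm ;
         :status :active ;
         :sensor ?sensor .
  ?sensor :location ?loc .
  # the bounding box is given as a WKT polygon
  FILTER (geof:sfWithin(?loc,
    "POLYGON((-3.8 40.3, -3.6 40.3, -3.6 40.5, -3.8 40.5, -3.8 40.3))"^^geo:wktLiteral))
}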

Social Media

Streaming Information

  • Type:
    • Posts, Comments, Homepages
    • Identities
    • Clicks, Interactions with other Identities
  • Nature:
    • Relational Stream: yes
    • Text stream: yes
  • Origin:
    • Social Media Platforms: Facebook, Twitter, LinkedIn, ...
    • Homepages, newspapers, ...
  • Frequency of update:
    • Sub-seconds/seconds/minutes/hours
    • In data/sec (paid)
      • Twitter: 10k posts/sec
      • Facebook: est. 41k posts/sec
    • In data/sec (unpaid)
      • Both: 50 posts/sec
      • Homepages: hard to estimate, roughly 1 post per 5 minutes (newspapers)
  • Quality:
    • Platforms: Data is clean and well defined. Sometimes encoding problems occur, e.g. when a user posts a smiley or something similar.
    • Other SM: The maintainer sometimes uses the wrong field for the content, e.g. in RSS the content is sometimes placed in <content>, sometimes in <description>. Sometimes the content isn't static; a URL is used as a placeholder for a future post.
  • Management /access
    • Technology in use: mostly 3rd party APIs or libraries.
    • Available Ontologies and Vocabularies: mostly built upon SIOC and FOAF
    • Problems: mostly data harmonizing problems and integration issues
    • Means of improvement: --


  • Why RDF Stream?
    • Integration of several streams with dynamic schema (sharing/processing on the Web)
    • Integration with reasoning to apply high-level analysis, e.g. run predefined KPIs, such as top X opinion leaders or LDA
    • Integration of various sources

Example Queries

Motivated by Social Network Analysis (SNA)

  • get the top X, e.g. 3, opinion leaders within a defined time interval (= query window) (see the sketch after this list)
  • get the top X, e.g. 3, discussed topics within a defined time interval (similar to Emanuele's wine query)
  • activity on media channels
  • get Top X sentiments, e.g. positive/neutral/negative topics
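
A sketch of the first query, approximating "opinion leader" by post count and using the SIOC vocabulary; the stream URI is hypothetical.

PREFIX sioc: <http://rdfs.org/sioc/ns#>
SELECT ?user (COUNT(?post) AS ?activity)
FROM STREAM <http://example.org/social> [RANGE 1h]
WHERE {
  ?post a sioc:Post ;
        sioc:has_creator ?user .
}
GROUP BY ?user
ORDER BY DESC(?activity)
LIMIT 3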