RSP Serialization Group

Links with the Web of Things Interest Group

Discussions on this topic may overlap or be strongly correlated with those of this interest group:

http://www.w3.org/WoT/IG/

Assumptions

The ontology used for RDF data generation is known to the RDF stream generator, the processing engine and the consumers. It should be published openly.
The ontology should not be changed after the RDF data generator has started generating RDF data.
All URI prefixes used in the RDF data should be known to all components (generator, processor and consumers). They should be broadcast before RDF data generation and streaming begin.
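A minimal sketch, in Python, of the shared-prefix assumption: a prefix table fixed before streaming starts lets the generator shorten URIs and every consumer expand them identically. The table contents, including the `rsp` namespace URI `http://rsp.sample/ns#`, and the function names `compact` and `expand` are illustrative assumptions.

```python
from typing import Dict

# Hypothetical prefix table, agreed by generator, processor and consumers
# before any RDF data is streamed (contents are assumptions for illustration).
SHARED_PREFIXES: Dict[str, str] = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "rsp": "http://rsp.sample/ns#",  # placeholder namespace
}


def compact(uri: str, prefixes: Dict[str, str] = SHARED_PREFIXES) -> str:
    """Generator side: replace the longest matching namespace with its prefix."""
    best = None
    for prefix, ns in prefixes.items():
        if uri.startswith(ns) and (best is None or len(ns) > len(prefixes[best])):
            best = prefix
    if best is None:
        return f"<{uri}>"  # no shared prefix applies: send the full URI
    return f"{best}:{uri[len(prefixes[best]):]}"


def expand(term: str, prefixes: Dict[str, str] = SHARED_PREFIXES) -> str:
    """Consumer side: turn a prefixed name back into a full URI."""
    if term.startswith("<") and term.endswith(">"):
        return term[1:-1]
    prefix, _, local = term.partition(":")
    if prefix not in prefixes:
        raise KeyError(f"unknown prefix {prefix!r}; it was not shared in advance")
    return prefixes[prefix] + local


print(compact("http://rsp.sample/ns#timeStamp"))  # rsp:timeStamp
print(expand("rsp:timeStamp"))  # http://rsp.sample/ns#timeStamp
```

Because the table is fixed before generation starts, a consumer that meets an unknown prefix can fail fast instead of guessing.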

Requirements

Serialization format shall be streamable
Direct data retrieval (e.g., for querying) from the stream
Scalable
- applicable to powerful devices and
- to resource-constrained embedded devices (sensors, actuators, etc.)
Extendable (e.g., for timestamps or intervals)
Human readable (nice to have)
Compatible with existing standard serialization formats
Efficient (easy to process) and compact on the wire
Type-aware representation (provide data, e.g., objects, in a type-aware manner for fast processing and reduced memory use)
Punctuation in the stream
...
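Stream punctuation can be pictured as special markers embedded in the stream that tell consumers a subset of the data is complete. A minimal sketch, assuming a hypothetical `PUNCTUATION` marker that the generator emits whenever the timestamp of the next triple changes; the marker, the `ex:` terms and the grouping-by-timestamp rule are illustrative assumptions, not an agreed format.

```python
from typing import Iterable, Iterator, List, Tuple

Triple = Tuple[str, str, str]
# Hypothetical punctuation marker; a real format would need an agreed encoding.
PUNCTUATION = ("PUNCT",)


def punctuate(items: Iterable[Tuple[str, Triple]]) -> Iterator[tuple]:
    """Generator side: emit triples, plus PUNCTUATION after each timestamp group.

    `items` holds (timestamp, triple) pairs already in timestamp order.
    """
    current = None
    for ts, triple in items:
        if current is not None and ts != current:
            yield PUNCTUATION  # every triple for `current` has been sent
        current = ts
        yield triple
    if current is not None:
        yield PUNCTUATION  # close the final group


def groups(stream: Iterable[tuple]) -> Iterator[List[Triple]]:
    """Consumer side: buffer triples until a punctuation marker arrives."""
    buffer: List[Triple] = []
    for item in stream:
        if item == PUNCTUATION:
            yield buffer
            buffer = []
        else:
            buffer.append(item)


data = [
    ("2014-09-08T10:12:05", ("ex:s1", "ex:temp", '"21.5"')),
    ("2014-09-08T10:12:05", ("ex:s2", "ex:temp", '"22.0"')),
    ("2014-09-08T10:12:06", ("ex:s1", "ex:temp", '"21.6"')),
]
for group in groups(punctuate(data)):
    print(len(group), "triple(s) complete")
```

The consumer never has to guess when a group is finished, which is useful for window-based processing.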

RDF Formats

W3C Recommendations:

RDF/XML: the first standard format (based on the XML Infoset)
Turtle: a subset of Notation3 (N3)
- N3 is a shorthand (non-XML) serialization of RDF
N-Quads: an extension of N-Triples with an optional graph label per statement; supports multiple RDF graphs
N-Triples: line-based format, one triple per line
JSON-LD
TriG
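Because N-Triples puts one complete triple on each line, a consumer can process a stream incrementally without buffering the whole document. A minimal sketch in Python that reads line by line; the regular expression covers only IRIs, blank nodes and simple literals (optionally with a language tag or datatype) and is a simplification of the full N-Triples grammar, e.g., it does not handle escaped characters. The `http://rsp.sample/ns#` IRIs are placeholders.

```python
import io
import re
from typing import Iterator, TextIO, Tuple

# Simplified term patterns (assumption: no escaped characters inside literals).
IRI = r"<[^>]*>"
BNODE = r"_:[A-Za-z0-9_-]+"
LITERAL = r'"[^"]*"(?:@[A-Za-z-]+|\^\^<[^>]*>)?'
TRIPLE_RE = re.compile(
    rf"^\s*({IRI}|{BNODE})\s+({IRI})\s+({IRI}|{BNODE}|{LITERAL})\s*\.\s*$"
)


def stream_ntriples(source: TextIO) -> Iterator[Tuple[str, ...]]:
    """Yield one (subject, predicate, object) tuple per line, without buffering the stream."""
    for raw in source:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        match = TRIPLE_RE.match(line)
        if match is None:
            raise ValueError(f"unsupported N-Triples line: {line!r}")
        yield match.groups()


data = io.StringIO(
    '<http://rsp.sample/> <http://rsp.sample/ns#RSP> "Hello" .\n'
    '<http://rsp.sample/> <http://rsp.sample/ns#timeStamp> '
    '"2014-09-08T10:12:05"^^<http://www.w3.org/2001/XMLSchema#dateTime> .\n'
)
for s, p, o in stream_ntriples(data):
    print(p, o)
```

Turtle, by contrast, allows prefix declarations and abbreviations that span several lines, so a reader cannot always treat a single line as a self-contained statement.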

Binary Representations

HDT
- achieves high levels of compression.
- provides retrieving features to the compressed data.
- works on the complete dataset, with non-negligible processing time.
- is a W3C member submission and has several associated libraries and tools.
- Reference: Javier D. Fernández, Miguel A. Martínez-Prieto, Claudio Gutiérrez, Axel Polleres, Mario Arias. Binary RDF Representation for Publication and Exchange. Web Semantics: Science, Services and Agents on the World Wide Web, 19, pp. 22-41, 2013.
SHDT
- a variant of HDT with simplified associated metadata.
- is not competitive in compression ratio, but can operate on resource-constrained devices.
- Reference: H. Hasemann, A. Kroller, and M. Pagel. RDF Provisioning for the Internet of Things. In Proc. of the International Conference on the Internet of Things (IOT), pp. 143-150. IEEE, 2012.
ERI
- considers an RDF stream as a continuous flow of blocks (with predefined maximum size) of triples.
- follows an encoding procedure similar to that of the Efficient XML Interchange (EXI) format, multiplexing the information into structural and value channels.
- a standard compressor can be used in each channel, leveraging its data regularities to produce better compression results.
- produces state-of-the-art compression, remaining competitive in processing time.
- Reference: Javier D. Fernández, Alejandro Llaves, Oscar Corcho. Efficient RDF Interchange (ERI) Format for RDF Data Streams. In Proc. of the International Semantic Web Conference (ISWC), 2014.
RDSZ
- applies the general-purpose stream compressor zlib to RDF streams.
- uses differential encoding to take advantage of structural similarities.
- achieves gains in compression at the cost of increased processing time.
- Reference: Norberto Fernández, Jesús Arias, Luis Sánchez, Damaris Fuentes-Lorenzo, Óscar Corcho. RDSZ: An approach for lossless RDF stream compression. In Proc. of the Extended Semantic Web Conference (ESWC), LNCS 8465, pp. 52–67, 2014.
Efficient RDF with W3C EXI
- benefits from the properties of EXI encoding (e.g., low memory usage, fast processing, high compression ratio)
- EXI is applicable to resource-constrained embedded devices such as microcontrollers (e.g., see paper)
- integrates well with semantic repositories such as RDF stores (e.g., see μRDF Store; more information to follow)
RDF Binary using Apache Thrift
- can encode RDF graphs and SPARQL result sets
- its main objective is fast machine encoding and decoding
- Thrift makes it possible to implement the format in a wide variety of programming languages
- Reference: http://afs.github.io/rdf-thrift/
Binary RDF in Sesame
- its main objective is to reduce parsing overhead and memory requirements
- Reference: http://www.rivuli-development.com/2011/11/binary-rdf-in-sesame/
RDF-3X
- reportedly offers good compression and efficient indexing of static RDF data; it is unclear whether it is usable for streaming
- Reference: Thomas Neumann, Gerhard Weikum. The RDF-3X Engine for Scalable Management of RDF Data. The VLDB Journal, 19(1), pp. 91-113, 2010. DOI: 10.1007/s00778-009-0165-y
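HDT (described above) separates a Dictionary component, which maps each RDF term to an integer ID, from a Triples component holding the ID-triples in sorted order. The Python sketch below illustrates only that idea with plain data structures and a single sorted dictionary; the real HDT format uses separate dictionary sections, compressed string storage and bitmap-based triple structures, none of which are reproduced here. The `ex:` terms are placeholders.

```python
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]
IdTriple = Tuple[int, int, int]


def build_dictionary(triples: List[Triple]) -> Dict[str, int]:
    """Assign IDs 1..n to all terms in sorted order (requires the full dataset)."""
    terms = sorted({term for triple in triples for term in triple})
    return {term: i for i, term in enumerate(terms, start=1)}


def encode(triples: List[Triple]) -> Tuple[Dict[str, int], List[IdTriple]]:
    """Return the dictionary and the sorted, de-duplicated ID-triples."""
    dictionary = build_dictionary(triples)
    id_triples = sorted({tuple(dictionary[t] for t in triple) for triple in triples})
    return dictionary, id_triples


def decode(dictionary: Dict[str, int], id_triples: Iterable[IdTriple]) -> List[Triple]:
    """Invert the dictionary to recover the original terms."""
    reverse = {i: term for term, i in dictionary.items()}
    return [tuple(reverse[i] for i in ids) for ids in id_triples]


triples = [("ex:s1", "ex:p", "ex:o"), ("ex:s2", "ex:p", "ex:o"), ("ex:s1", "ex:p", "ex:o")]
dictionary, ids = encode(triples)
print(dictionary)
print(ids)
```

Repeated terms are stored once, which is where much of the saving comes from; but because IDs follow the sorted order of all terms, a new term arriving later would shift existing IDs, which is consistent with HDT working on the complete dataset rather than on an open-ended stream.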
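The central idea of ERI (described above), cutting the stream into blocks and multiplexing each block into separate channels compressed with a standard compressor, can be illustrated in a few lines of Python. This is a simplified sketch, not the ERI encoding: the channel layout (one structural channel with subjects and predicates, one value channel with objects), the hypothetical `BLOCK_SIZE` and the use of `zlib` per channel are assumptions made for illustration.

```python
import zlib
from typing import Iterator, List, Tuple

Triple = Tuple[str, str, str]
BLOCK_SIZE = 4096  # hypothetical maximum number of triples per block


def encode_block(block: List[Triple]) -> Tuple[bytes, bytes]:
    """Split a non-empty block into a structural and a value channel, then compress each.

    Assumes terms contain no tab or newline characters.
    """
    structure = "\n".join(f"{s}\t{p}" for s, p, _ in block)
    values = "\n".join(o for _, _, o in block)
    return zlib.compress(structure.encode("utf-8")), zlib.compress(values.encode("utf-8"))


def decode_block(channels: Tuple[bytes, bytes]) -> List[Triple]:
    """Decompress both channels and re-assemble the triples in order."""
    structure, values = (zlib.decompress(c).decode("utf-8") for c in channels)
    pairs = (line.split("\t") for line in structure.split("\n"))
    return [(s, p, o) for (s, p), o in zip(pairs, values.split("\n"))]


def encode_stream(triples: List[Triple]) -> Iterator[Tuple[bytes, bytes]]:
    """Cut the stream into blocks of at most BLOCK_SIZE triples and encode each."""
    for start in range(0, len(triples), BLOCK_SIZE):
        yield encode_block(triples[start:start + BLOCK_SIZE])


block = [("ex:s1", "ex:temp", '"21.5"'), ("ex:s1", "ex:hum", '"40"'), ("ex:s2", "ex:temp", '"22.0"')]
print(decode_block(encode_block(block)) == block)  # True
```

Keeping similar data together in one channel tends to give the general-purpose compressor more regularity to exploit than interleaved triples would; how much it helps depends on the data.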
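RDSZ (described above) combines differential encoding with zlib. The Python sketch below shows only the general shape of that combination, not the published RDSZ algorithm: each triple is compared with the previous one, repeated components are replaced by a hypothetical marker `=`, and the result is fed through one streaming `zlib` compressor, flushed after each triple so the receiver can decode data as soon as it arrives.

```python
import zlib
from typing import Iterable, Iterator, Tuple

Triple = Tuple[str, str, str]
SAME = "="  # hypothetical marker: "same as in the previous triple" (assumes no term is "=")


def diff_encode(triples: Iterable[Triple]) -> Iterator[str]:
    """Replace each component equal to the previous triple's with SAME."""
    previous = ("", "", "")
    for triple in triples:
        yield "\t".join(SAME if cur == prev else cur for cur, prev in zip(triple, previous))
        previous = triple


def diff_decode(lines: Iterable[str]) -> Iterator[Triple]:
    """Undo diff_encode by substituting SAME with the previous component."""
    previous = ("", "", "")
    for line in lines:
        triple = tuple(prev if cur == SAME else cur for cur, prev in zip(line.split("\t"), previous))
        yield triple
        previous = triple


def compress_stream(triples: Iterable[Triple]) -> Iterator[bytes]:
    """Feed diff-encoded lines through one zlib stream, flushing after each triple.

    Z_SYNC_FLUSH lets the receiver decode every chunk on arrival,
    at some cost in compression ratio.
    """
    compressor = zlib.compressobj()
    for line in diff_encode(triples):
        chunk = compressor.compress((line + "\n").encode("utf-8"))
        yield chunk + compressor.flush(zlib.Z_SYNC_FLUSH)
    yield compressor.flush()  # finish the zlib stream


# Receiver side: one decompressor for the whole stream.
triples = [("ex:s1", "ex:temp", '"21.5"'), ("ex:s1", "ex:hum", '"40"'), ("ex:s2", "ex:temp", '"40"')]
decompressor = zlib.decompressobj()
text = b"".join(decompressor.decompress(c) for c in compress_stream(triples)).decode("utf-8")
print(list(diff_decode(text.splitlines())) == triples)  # True
```

Flushing after every triple favours latency over ratio; flushing after a batch of triples would move the trade-off the other way.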
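RDF-3X keeps indexes over all six permutations of subject, predicate and object (SPO, SOP, PSO, POS, OSP, OPS), so any triple pattern can be answered by a range scan over one sorted index. The Python sketch below reproduces only that idea with sorted lists and `bisect`; RDF-3X itself stores dictionary-encoded IDs in compressed B+-trees, which is not shown. The function names and `ex:` terms are illustrative.

```python
from bisect import bisect_left
from itertools import permutations
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
ORDERS = ["".join(p) for p in permutations("spo")]  # spo, sop, pso, pos, osp, ops
POSITION = {"s": 0, "p": 1, "o": 2}


def build_indexes(triples: List[Triple]) -> Dict[str, List[tuple]]:
    """Store every distinct triple once per permutation, sorted in that order."""
    return {
        order: sorted(tuple(t[POSITION[c]] for c in order) for t in set(triples))
        for order in ORDERS
    }


def match(indexes: Dict[str, List[tuple]], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> List[Triple]:
    """Answer a triple pattern (None = variable) with a prefix range scan on one index."""
    bound = {"s": s, "p": p, "o": o}
    k = sum(v is not None for v in bound.values())
    # choose a permutation whose first k positions are exactly the bound components
    order = next(x for x in ORDERS if all(bound[c] is not None for c in x[:k]))
    prefix = tuple(bound[c] for c in order[:k])
    index = indexes[order]
    results = []
    i = bisect_left(index, prefix)
    while i < len(index) and index[i][:k] == prefix:
        row = dict(zip(order, index[i]))
        results.append((row["s"], row["p"], row["o"]))
        i += 1
    return results


data = [("ex:s1", "ex:temp", '"21.5"'), ("ex:s1", "ex:hum", '"40"'), ("ex:s2", "ex:temp", '"22.0"')]
idx = build_indexes(data)
print(match(idx, p="ex:temp"))
```

The six sorted copies are rebuilt from the whole dataset here, which mirrors why the approach is aimed at static data.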

Evaluation

	RDF/XML (XML Infoset)	Turtle	N-Quads	N-Triples	TriG	JSON-LD
Streamable / quads?	Yes	No	Yes	No	Yes	Yes
Extendable	Yes (e.g., use attributes)	Yes	Yes	Yes	Yes	Yes
Human readable	Yes	Yes	Yes	Yes	Yes	Yes
Standard	Yes	Yes	Yes	Yes	Yes	Yes
Efficient/Compact	Yes (with EXI)	No	No	No	No	Yes?
Directly processable / retrievable	Yes	Yes	Yes	Yes	Yes	Yes
Scalable	Yes (use EXI for constrained embedded devices)	No	No	No	No	No
Type-aware (announces the type of each value)	Yes	Yes*	Yes*	Yes*	Yes*	Yes*
...

RDF/XML Example

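<!-- placeholder namespace URI for the rsp prefix (assumption) -->
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:rsp="http://rsp.sample/ns#">
 <rdf:Description rdf:about="http://rsp.sample/" rsp:timeStamp="2014-09-08T10:12:05">
  <rsp:RSP>Hello</rsp:RSP>
 </rdf:Description>
</rdf:RDF>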
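To illustrate direct retrieval from the RDF/XML form, a minimal sketch using Python's standard `xml.etree.ElementTree` to pull the resource URI, timestamp and value out of a namespaced version of the example. The `rsp` namespace URI `http://rsp.sample/ns#` is a placeholder assumption, and this is plain XML access rather than a general RDF/XML parser.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSP = "http://rsp.sample/ns#"  # placeholder namespace (assumption)

DOC = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:rsp="{RSP}">
  <rdf:Description rdf:about="http://rsp.sample/" rsp:timeStamp="2014-09-08T10:12:05">
    <rsp:RSP>Hello</rsp:RSP>
  </rdf:Description>
</rdf:RDF>"""


def read_items(xml_text: str):
    """Yield (subject, timestamp, value) for each rdf:Description element."""
    root = ET.fromstring(xml_text)
    for desc in root.iter(f"{{{RDF}}}Description"):
        yield (
            desc.get(f"{{{RDF}}}about"),
            desc.get(f"{{{RSP}}}timeStamp"),
            desc.findtext(f"{{{RSP}}}RSP"),
        )


print(list(read_items(DOC)))
# [('http://rsp.sample/', '2014-09-08T10:12:05', 'Hello')]
```

ElementTree addresses namespaced elements and attributes in `{namespace}local` form, which is why the prefixes must be declared for the lookup to work.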

Open Issues

Has a decision been made about the kind of streaming (consecutive graph items vs. autonomous (sub-)graph transactions)?