RSP Serialization Group

From RDF Stream Processing Community Group

Links with Web of Things Interest Group

Discussions on this topic may overlap or be strongly correlated with those of this interest group:

http://www.w3.org/WoT/IG/

Assumptions

  • The ontology used to generate the RDF data is known to the RDF stream generator, the processing engine, and the consumers. It should be publicly broadcast.
  • The ontology should not change once the RDF data generator has started generating RDF data.
  • All URI prefixes used in the RDF data should be known to all components (generator, processor, and consumers). They should be broadcast before RDF data generation and streaming begin.
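
The prefix assumption can be sketched as a small table that every component loads before streaming starts. This is only an illustration: the `PREFIXES` table, the `ex` namespace, and the helper names `expand` and `compact` are hypothetical and not part of any agreed format.

```python
# Hypothetical prefix table, distributed to generator, processor and consumers
# before any RDF data is streamed.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "ex": "http://example.org/sensors#",  # placeholder namespace (assumption)
}


def expand(curie, prefixes=PREFIXES):
    """Expand 'prefix:local' to a full IRI; fail loudly on unknown prefixes."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError(f"unknown prefix in {curie!r}")
    return prefixes[prefix] + local


def compact(iri, prefixes=PREFIXES):
    """Shorten a full IRI using the longest matching namespace, if any."""
    best = max(
        (p for p, ns in prefixes.items() if iri.startswith(ns)),
        key=lambda p: len(prefixes[p]),
        default=None,
    )
    if best is None:
        return iri  # no known namespace matches; keep the IRI unchanged
    return f"{best}:{iri[len(prefixes[best]):]}"
```

Because the table is fixed before streaming, a generator can send the compact form and every consumer can restore the full IRI without per-message prefix declarations.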

Requirements

  • Serialization format shall be stream-able
  • Direct data retrieval (e.g., for querying) from the stream
  • Scalable
    • applicable for powerful devices and
    • for resource-constrained embedded devices (sensors, actuators, etc.)
  • Extendable (for time stamps or intervals)
  • Human readable (nice to have)
  • Should be compatible with existing standard serialization formats
  • Efficient (easy to process) and compact on the wire
  • Type-aware representation (provide data, e.g., objects, in a type-aware manner for fast processing and memory savings)
  • Punctuation in the stream
  • ...
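
A minimal sketch of how time stamps and punctuation could interact on the consumer side. The page does not define punctuation semantics, so the sketch assumes a punctuation element promises that no later item carries a time stamp at or before `up_to`. The names `Item`, `Punctuation`, and `batches` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple, Union

Triple = Tuple[str, str, str]


@dataclass
class Item:
    timestamp: str  # ISO 8601 string; same-format strings sort in time order
    triples: List[Triple]


@dataclass
class Punctuation:
    up_to: str  # assumed meaning: no later item has timestamp <= up_to


def batches(stream: Iterable[Union[Item, Punctuation]]) -> Iterator[List[Item]]:
    """Buffer items and release the completed ones at each punctuation."""
    pending: List[Item] = []
    for element in stream:
        if isinstance(element, Punctuation):
            ready = [i for i in pending if i.timestamp <= element.up_to]
            pending = [i for i in pending if i.timestamp > element.up_to]
            yield ready
        else:
            pending.append(element)
```

With this reading, a consumer can close a window as soon as the punctuation arrives instead of waiting for a timeout.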

RDF Formats

W3C Recommendations:

Binary Representations

  • HDT
    • achieves high levels of compression.
    • supports retrieval directly on the compressed data.
    • works on the complete dataset, with non-negligible processing time.
    • is a W3C member submission and has several associated libraries and tools.
    • Reference: Javier D. Fernández, Miguel A. Martínez-Prieto, Claudio Gutiérrez, Axel Polleres, Mario Arias. Binary RDF Representation for Publication and Exchange. Web Semantics: Science, Services and Agents on the World Wide Web, 19, pp. 22-41, 2013.
  • SHDT
    • a variant of HDT that simplifies the associated metadata.
    • does not compete on compression ratio but can operate on constrained devices.
    • Reference: H. Hasemann, A. Kroller, and M. Pagel. RDF Provisioning for the Internet of Things. In Proc. of the International Conference on the Internet of Things (IOT), pp. 143-150. IEEE, 2012.
  • ERI
    • considers an RDF stream as a continuous flow of blocks (with predefined maximum size) of triples.
    • follows an encoding procedure similar to that of the Efficient XML Interchange (EXI) format, multiplexing the information into structural and value channels.
    • a standard compressor can be used in each channel, leveraging its data regularities to produce better compression results.
    • produces state-of-the-art compression, remaining competitive in processing time.
    • Reference: Javier D. Fernández, Alejandro Llaves, Oscar Corcho. Efficient RDF Interchange (ERI) Format for RDF Data Streams. In Proc. of the International Semantic Web Conference (ISWC), 2014.
  • RDSZ
    • applies the general-purpose stream compressor Zlib to RDF streams.
    • uses differential encoding to take advantage of structural similarities.
    • achieves gains in compression at the cost of increasing the processing time.
    • Reference: Norberto Fernández, Jesús Arias, Luis Sánchez, Damaris Fuentes-Lorenzo, Óscar Corcho. RDSZ: An approach for lossless RDF stream compression. In Proc. of the Extended Semantic Web Conference (ESWC), LNCS 8465, pp. 52–67, 2014.
  • Efficient RDF with W3C EXI
    • uses the benefits of EXI coding (e.g., low memory usage, fast processing, high compression rate)
    • EXI is applicable to resource-constrained embedded devices such as microcontrollers (e.g., see paper)
    • can be harmonized well with semantic repositories such as an RDF store (e.g., see μRDF Store; more information will follow)
  • RDF Binary using Apache Thrift
    • can encode RDF graphs and SPARQL result sets
    • its main objective is to provide fast machine encoding and decoding
    • Thrift allows the format to be implemented in a wide variety of programming languages
    • Reference: http://afs.github.io/rdf-thrift/
  • Binary RDF in Sesame
  • RDF-3X
    • reportedly offers good compression and efficient indexing on static RDF data; whether it is usable for streaming is unclear.
    • Reference: Thomas Neumann and Gerhard Weikum. 2010. The RDF-3X engine for scalable management of RDF data. The VLDB Journal 19, 1 (February 2010), 91-113. DOI=10.1007/s00778-009-0165-y
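
To illustrate the general idea of running a general-purpose stream compressor over an RDF stream, here is a minimal sketch using Python's standard `zlib` module on N-Triples lines. It is not the RDSZ algorithm (the differential encoding step is not reproduced) and not ERI; the function names are hypothetical.

```python
import zlib


def compress_stream(lines):
    """Yield one compressed chunk per input line, sharing one zlib context."""
    comp = zlib.compressobj()
    for line in lines:
        data = comp.compress(line.encode("utf-8") + b"\n")
        # Z_SYNC_FLUSH pushes out everything buffered so far, so the receiver
        # can decode this line immediately; the shared dictionary is kept, so
        # repeated IRIs in later lines still compress well.
        yield data + comp.flush(zlib.Z_SYNC_FLUSH)
    yield comp.flush()  # finish the zlib stream


def decompress_stream(chunks):
    """Inverse of compress_stream: yield the original lines as they arrive."""
    decomp = zlib.decompressobj()
    buffer = b""
    for chunk in chunks:
        buffer += decomp.decompress(chunk)
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            yield line.decode("utf-8")
```

Flushing after every line trades some compression ratio for low latency; flushing once per block of triples would sit closer to the block-oriented approach described for ERI.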

Evaluation

Criterion | RDF/XML (XML Infoset) | Turtle | N-Quads | N-Triples | TriG | JSON-LD
Stream-able / quads? | Yes | No | Yes | No | Yes | Yes
Extendable | Yes (e.g., use attributes) | Yes | Yes | Yes | Yes | Yes
Human readable | Yes | Yes | Yes | Yes | Yes | Yes
Standard | Yes | Yes | Yes | Yes | Yes | Yes
Efficient/Compact | Yes (with EXI) | No | No | No | No | Yes?
Direct processable/retrieving | Yes | Yes | Yes | Yes | Yes | Yes
Scalable | Yes (use EXI for constrained embedded devices) | No | No | No | No | No
Type-aware (announce the type of value) | Yes | Yes* | Yes* | Yes* | Yes* | Yes*
...

RDF/XML Example

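<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rsp="http://rsp.sample/ns#"> <!-- rsp namespace URI is a placeholder -->
  <rdf:Description rdf:about="http://rsp.sample/" rsp:timeStamp="2014-09-08T10:12:05">
    <rsp:RSP>Hello</rsp:RSP>
  </rdf:Description>
</rdf:RDF>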
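
A consumer could read the time stamp attribute with a plain XML parser, as sketched below with Python's standard library. The embedded document is a well-formed variant of the example; the `rsp` namespace URI (`RSP_NS`) is a placeholder assumption, and `timestamped_values` is a hypothetical helper, not part of any RSP specification.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RSP_NS = "http://rsp.sample/ns#"  # placeholder namespace (assumption)

DOC = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:rsp="{RSP_NS}">
  <rdf:Description rdf:about="http://rsp.sample/" rsp:timeStamp="2014-09-08T10:12:05">
    <rsp:RSP>Hello</rsp:RSP>
  </rdf:Description>
</rdf:RDF>"""


def timestamped_values(xml_text):
    """Yield (subject, property, value, timestamp) for each property element."""
    root = ET.fromstring(xml_text)
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        stamp = datetime.fromisoformat(desc.get(f"{{{RSP_NS}}}timeStamp"))
        for prop in desc:
            # ElementTree reports tags as '{namespace}localname'
            yield subject, prop.tag, prop.text, stamp
```

This only reads the XML structure; a full RDF/XML parser would additionally turn the `rsp:timeStamp` property attribute into a triple of its own.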

Open Issues

  • Is there a decision about the kind of streaming (consecutive graph items vs. autonomic (sub-)graph transactions)?