
Team Comment on "Binary RDF Representation for Publication and Exchange (HDT)" Submission

W3C is pleased to receive the Binary RDF Representation for Publication and Exchange (HDT) Submission from DERI Galway.

The Submission proposes a binary representation of RDF graphs intended for efficient transfer. The term dictionary and the graph encoding (e.g. integer arrays or bitmaps) can be broken up across multiple resources. The HDT Submission also defines extensions to the VoID vocabulary to provide metadata about HDT formats, graph statistics and a general category of publication items.
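As a rough illustration of the split between a term dictionary and an integer-based graph encoding, the following Python sketch maps each distinct term to an integer ID and rewrites the triples as ID tuples. It is a simplification under stated assumptions, not the layout defined by the Submission: the single shared dictionary, the sorted ID assignment, and the names `build_dictionary`, `encode` and `decode` are all hypothetical.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
IdTriple = Tuple[int, int, int]


def build_dictionary(triples: List[Triple]) -> Dict[str, int]:
    """Assign every distinct term a 1-based integer ID, in sorted term order.

    Assumption: one dictionary shared by subjects, predicates and objects.
    """
    terms = sorted({term for triple in triples for term in triple})
    return {term: term_id for term_id, term in enumerate(terms, start=1)}


def encode(triples: List[Triple], dictionary: Dict[str, int]) -> List[IdTriple]:
    """Replace each term by its ID and sort the result by (subject, predicate, object)."""
    return sorted((dictionary[s], dictionary[p], dictionary[o]) for s, p, o in triples)


def decode(id_triples: List[IdTriple], dictionary: Dict[str, int]) -> List[Triple]:
    """Invert the dictionary to recover the original terms."""
    reverse = {term_id: term for term, term_id in dictionary.items()}
    return [(reverse[s], reverse[p], reverse[o]) for s, p, o in id_triples]


# Illustrative data only; the prefixed names are placeholders.
graph = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:name", '"Alice"'),
    ("ex:bob", "foaf:name", '"Bob"'),
]
dictionary = build_dictionary(graph)
id_triples = encode(graph, dictionary)
```

Because the dictionary and the ID triples are separate structures, they could in principle be stored or fetched as separate resources, which is the property the paragraph above refers to.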

Implementations

There is an open source implementation, HDT-It!, described in a separate document.

Relationships to W3C Activities

Efficient interchange of RDF graphs is clearly relevant to the Semantic Web Activity. However, there is, at the moment, no Working Group that would actively pursue this line of work.

While the current RDF Working Group is working on a standard for Turtle, the current standard serialization for RDF is RDF/XML. The Efficient XML Interchange Working Group has standardized a binary format for XML. Tensions exist between re-using an existing standard and creating a new standard, more tightly coupled to the RDF data and query model:

EXI is done
EXI is specified, implemented and debugged; there is a substantial community who already understand it.
model-specificity
EXI produces XML, so a further step (e.g. an RDF/XML SAX handler) is required to produce RDF triples; HDT is framed directly in terms of the RDF graph data model.
RDF/XML's open content model
RDF/XML's EXI encoding will produce heavy tag dictionaries and light CDATA dictionaries; we're unsure if that's optimal either in theory or in common implementation. Benchmarking the encoding time and final size of large RDF graphs would clarify this impact.
re-applicability of model
HDT's format is also useful in memory or in memory-mapped files, because its indexing favors common query navigation paths through RDF graphs.
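To make the last point more concrete, the sketch below shows one way an integer-encoded graph can be laid out so that a common navigation step (all outgoing edges of a subject) becomes a binary search plus a contiguous slice. It is a generic subject-ordered index written for illustration, assuming triples have already been mapped to integer IDs; the class name `SubjectIndex` and its fields are hypothetical and this is not the structure specified by HDT.

```python
from bisect import bisect_left
from typing import List, Tuple

IdTriple = Tuple[int, int, int]


class SubjectIndex:
    """Subject-ordered triple table with per-subject offsets (illustrative only)."""

    def __init__(self, id_triples: List[IdTriple]) -> None:
        # Sorting groups all triples of one subject into a contiguous run.
        self.rows = sorted(id_triples)
        self.subjects: List[int] = []  # distinct subject IDs, ascending
        self.starts: List[int] = []    # offset in rows where each run begins
        for pos, (s, _, _) in enumerate(self.rows):
            if not self.subjects or self.subjects[-1] != s:
                self.subjects.append(s)
                self.starts.append(pos)
        self.starts.append(len(self.rows))  # sentinel marking the end of the last run

    def outgoing(self, subject: int) -> List[Tuple[int, int]]:
        """Return the (predicate, object) pairs for one subject, or [] if absent."""
        i = bisect_left(self.subjects, subject)
        if i == len(self.subjects) or self.subjects[i] != subject:
            return []
        return [(p, o) for _, p, o in self.rows[self.starts[i]:self.starts[i + 1]]]


# Illustrative IDs only.
index = SubjectIndex([(4, 6, 2), (3, 6, 1), (3, 5, 4)])
edges_of_3 = index.outgoing(3)
```

The flat arrays used here are the kind of data that could be read from a memory-mapped file without first being parsed into a separate object graph, which is why a layout of this style can serve both exchange and querying.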

Next Steps

It is possible that a future Working Group will standardize a binary serialization for RDF, taking this Submission as input. The use cases presented by this Submission, as well as performance metrics that discriminate HDT from RDF/XML over EXI, will serve as input for future work on EXI.