Copyright © 2011 DERI Galway at the National University of Ireland, Galway, Ireland, Free University of Bozen-Bolzano, The Open University, Universidad Politécnica de Madrid, Alcatel-Lucent, Cisco, OpenLink Software and Profium Ltd. All rights reserved.
This document is available under the W3C Document License. See the W3C Intellectual Rights Notice and Legal Disclaimers for additional information.
This document contains a brief description of the implementation of a tool to create and interact with RDF HDT (Header-Dictionary-Triples).
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications can be found in the W3C technical reports index at http://www.w3.org/TR/.
This document is a part of the HDT Submission which comprises five documents:
By publishing this document, W3C acknowledges that the Submitting Members have made a formal Submission request to W3C for discussion. Publication of this document by W3C indicates no endorsement of its content by W3C, nor that W3C has, is, or will be allocating any resources to the issues addressed by it. This document is not the product of a chartered W3C group, but is published as potential input to the W3C Process. A W3C Team Comment has been published in conjunction with this Member Submission. Publication of acknowledged Member Submissions at the W3C site is one of the benefits of W3C Membership. Please consult the requirements associated with Member Submissions of section 3.3 of the W3C Patent Policy. Please consult the complete list of acknowledged W3C Member Submissions.
Figure 1 shows a conceptual description of the process of obtaining an HDT representation from an RDF graph. The first step extracts the basic RDF features needed to build the Dictionary and the underlying graph, as well as the information that will be included in the Header. The second and third steps build the Dictionary and encode the Triples, respectively. In the fourth step, the abstract notion of HDT is implemented as a practical, usable HDT ready for modular and clean publication (and management) and compact exchange.
HDT-It! 0.7 is a C++ tool that performs this process. It is free, open source software that uses the Raptor library to provide a set of parsers and serializers between HDT and the main RDF syntaxes. It also provides a basic querying interface. The project is hosted at http://code.google.com/p/hdt-it.
HDT creation refers to the process of converting an existing RDF document (in a given syntax) into HDT. HDT-It! first uses the Raptor library to parse the given document (RDF/XML, N3, Turtle, JSON).
HDT creation is guided by a configuration file, supplied at execution time, that holds the main parameters (documented on the project site). Converting the original RDF document is a multi-phase process.
The Dictionary component is an abstract class which is instantiated with a concrete dictionary implementation. HDT-It! 0.7 provides the concrete class DictionaryPlain, which is the default dictionary implementation.
HDT-It! 0.7 uses hash and vector structures to maintain the mapping between strings and IDs. The IDs follow alphabetical order, which is obtained through a final sorting and re-mapping operation.
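The hash-plus-vector approach with a final sort and re-map can be illustrated with a minimal sketch. This is not the DictionaryPlain class: the name SketchDictionary, its methods, the single shared ID space and the 1-based IDs are all assumptions made for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical sketch, not the HDT-It! DictionaryPlain class.
class SketchDictionary {
public:
    // Parsing pass: give each distinct term a temporary ID in arrival order.
    std::size_t insert(const std::string& term) {
        auto it = tempIds_.find(term);
        if (it != tempIds_.end()) return it->second;
        std::size_t id = terms_.size();
        tempIds_.emplace(term, id);
        terms_.push_back(term);
        return id;
    }

    // Final step: sort the terms alphabetically and re-map them to
    // 1-based IDs (an assumption of this sketch).
    void finalize() {
        std::sort(terms_.begin(), terms_.end());
        finalIds_.clear();
        for (std::size_t i = 0; i < terms_.size(); ++i) {
            finalIds_[terms_[i]] = i + 1;
        }
        tempIds_.clear();
        finalized_ = true;
    }

    // String -> ID lookup through the hash; returns 0 for unknown terms.
    std::size_t idOf(const std::string& term) const {
        assert(finalized_);
        auto it = finalIds_.find(term);
        return it == finalIds_.end() ? 0 : it->second;
    }

    // ID -> string lookup through the sorted vector.
    const std::string& termOf(std::size_t id) const {
        assert(finalized_ && id >= 1 && id <= terms_.size());
        return terms_[id - 1];
    }

    std::size_t size() const { return terms_.size(); }

private:
    std::unordered_map<std::string, std::size_t> tempIds_;
    std::unordered_map<std::string, std::size_t> finalIds_;
    std::vector<std::string> terms_;
    bool finalized_ = false;
};
```

In this sketch the hash serves string-to-ID lookups and the sorted vector serves ID-to-string lookups, so each direction is resolved without scanning the other structure.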
The Triples component is an abstract class which is instantiated with a concrete triples implementation. HDT-It! 0.7 provides the Plain Triples, Compact Triples and Bitmap Triples implementations. The configuration file specifies which concrete implementation to use.
Once the dictionary is built, HDT-It! 0.7 makes a second pass over the original RDF document, replacing each term with its ID, building an auxiliary vector structure to represent the triples, and sorting it in Adjacency List order (the default) or in the order specified in the configuration file. This structure is used by all three implementations.
If one or more output files are given in the configuration file, HDT-It! 0.7 creates the Header, Dictionary and Triples files.
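The second pass can be sketched as follows, assuming the S-P-O order. The types TermTriple, TripleID and IdMap and the function encodeAndSort are hypothetical names chosen for this illustration and are not part of HDT-It!; the dictionary is reduced to a plain string-to-ID map with 1-based IDs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <tuple>
#include <unordered_map>
#include <vector>

// Hypothetical types for illustration only.
struct TermTriple { std::string s, p, o; };   // a parsed triple of terms
struct TripleID   { std::size_t s, p, o; };   // the same triple as IDs
using IdMap = std::unordered_map<std::string, std::size_t>;

// Replace every term with its dictionary ID, then sort the ID triples
// by subject, then predicate, then object (S-P-O order).
std::vector<TripleID> encodeAndSort(const std::vector<TermTriple>& triples,
                                    const IdMap& ids) {
    std::vector<TripleID> out;
    out.reserve(triples.size());
    for (const auto& t : triples) {
        // at() throws std::out_of_range if a term is missing from the map.
        TripleID id{ids.at(t.s), ids.at(t.p), ids.at(t.o)};
        assert(id.s > 0 && id.p > 0 && id.o > 0);  // IDs assumed 1-based
        out.push_back(id);
    }
    std::sort(out.begin(), out.end(),
              [](const TripleID& a, const TripleID& b) {
                  return std::tie(a.s, a.p, a.o) < std::tie(b.s, b.p, b.o);
              });
    return out;
}
```

A different component order would only change the comparator, for example comparing `(p, s, o)` tuples for a P-S-O order.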
This implementation, HDT-It! 0.7, does not perform the compression phase for either the Dictionary or the Header. The user therefore has to run the appropriate application (e.g. gzip or Huffman) over the generated output and update the dc:format property in the Header.
HDT-It! 0.7 can load an HDT from a given HDT Header. It supports several features:
This implementation, HDT-It! 0.7, does not perform the decompression phase for either the Dictionary or the Header. The user therefore first has to run the appropriate application (e.g. gzip or Huffman) over the original input.
This feature is only available for Bitmap Triples, because it relies on the operations (rank, select) that the Bitmap indexing supports and that the Check&Find operation uses.
HDT-It! 0.7 allows querying from the console or from a given file (documented on the project site). The operations can be:
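For readers unfamiliar with rank and select, the following is a naive, linear-time sketch of the two operations over a bit sequence. It is not the HDT-It! implementation, which is not described here. The function names, the 0-based positions, and the convention that a 1-bit marks the last element of each adjacency list are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// rank1(bits, i): number of 1-bits in positions [0, i].
std::size_t rank1(const std::vector<bool>& bits, std::size_t i) {
    assert(i < bits.size());
    std::size_t count = 0;
    for (std::size_t k = 0; k <= i; ++k) {
        if (bits[k]) ++count;
    }
    return count;
}

// select1(bits, n): position of the n-th 1-bit (n is 1-based).
// Returns bits.size() when there are fewer than n 1-bits.
std::size_t select1(const std::vector<bool>& bits, std::size_t n) {
    std::size_t seen = 0;
    for (std::size_t k = 0; k < bits.size(); ++k) {
        if (bits[k] && ++seen == n) return k;
    }
    return bits.size();
}

// Assumption for illustration: bit k is 1 when entry k is the last
// element of its adjacency list. Returns the [first, last] positions
// of list number `list` (1-based).
std::pair<std::size_t, std::size_t> listRange(const std::vector<bool>& bits,
                                              std::size_t list) {
    assert(list >= 1);
    std::size_t first = (list == 1) ? 0 : select1(bits, list - 1) + 1;
    std::size_t last = select1(bits, list);
    return {first, last};
}
```

Under these assumptions, select locates the boundaries of a given list and rank tells which list a given position belongs to. Practical implementations answer both in constant or near-constant time using auxiliary counters rather than the linear scans shown here.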
The S-P-O Adjacency List order is assumed. The response patterns vary for alternative representations S-O-P Adj. List, P-S-O, P-O-S, O-P-S Adj. List and O-S-P Adj. List.
HDT work is partially funded by MICINN (TIN2009-14009-C02-02), the Millennium Institute for Cell Dynamics and Biotechnology (ICDB) (Grant ICM P05-001-F), and Fondecyt 1090565 and 1110287. Javier D. Fernández is supported by a grant from the Regional Government of Castilla y León (Spain) and the European Social Fund.