ConverterToRdf

From W3C Wiki

A Converter to RDF is a tool which converts application data from an application-specific format into RDF for use with RDF tools and integration with other data. Converters may be part of a one-time migration effort, or part of a running system which provides a semantic web view of a given application. See also: RDFImportersAndAdapters

Please add converters as you make them or hear of them.

Formats

in alphabetical order:

BibTex

BibTex is the format for bibliographic references in TeX.

Bittorrent

CSV (Comma-Separated Values)

See also: Flat Files and TSV

  • An RDF Extension is available for Google Refine. It can convert Excel, CSV, and other tabular data to RDF. The schema mapping can be defined in a graphical UI.
  • RDF123 has Windows and Linux applications to download, a Java application and servlet.
  • XLWrap wraps spreadsheets (including cross tables) to arbitrary RDF graphs; supports Excel/OpenDocument/CSV streamed processing, local/HTTP loading, expressions similar to Excel/OpenOffice Calc, custom functions, usage via API or SPARQL endpoint
  • csv2rdf4lod uses declarative RDF enhancement parameters to specify how to transform tabular data into well-structured, well-connected RDF. The tool uses identifiers for source organization, dataset, and version to establish default namespaces for all URIs created and provides VoID and provenance metadata as part of the conversion output.
  • Tarql is a command-line application that converts CSV to RDF with a user-defined mapping. The mapping is written in standard SPARQL 1.1.

Debian

The package information in Debian and similar systems (Ubuntu, Fink, etc), with its general usefulness and its graph-like nature, is a clear candidate for conversion to RDF.

See VitaVoni blog about this.

  • finkn3.py Takes Fink (OS X port of Debian packaging) dependencies and converts them to RDF/N3. (SWAP) It is unclear whether this could be quickly adapted to export Debian data.
  • STEAMY converts Debian packages to RDF.

Email (RFC822 headers)

There are others in this vein which run over IMAP or mailbox files.

Excel

  • Cambridge Semantics' Anzo for Excel extracts RDF data from Excel spreadsheets while keeping the spreadsheet in sync with the underlying data as things change.
  • XLWrap wraps spreadsheets (including cross tables) to arbitrary RDF graphs; supports Excel/OpenDocument/CSV streamed processing, local/HTTP loading, expressions similar to Excel/OpenOffice Calc, custom functions, usage via API or SPARQL endpoint
  • TopBraid Composer can convert Excel spreadsheets into instances of an RDF schema.
  • TabLinker can convert non-standard Excel spreadsheets to the Data Cube vocabulary, e.g. Excel files that contain hierarchical information in row and column headers etc.
  • NOR2O can convert excel to Scovo and Data Cube Vocabulary.
  • Excel2rdf is a Microsoft Windows program (exe) that converts Excel files into valid RDF. It has been tested on Windows 98 and Windows 2000 Professional. (MindSwap) Export can be done via comma- or tab-separated values. See Flat files below.
  • aperture.sf.net includes a Java crawler for Excel and OpenDocument. It extracts only plain text and basic metadata, though.
  • RDBToOnto can handle Excel spreadsheets. See description below under SQL section.
  • Sheet2RDF is a platform for acquisition and transformation of spreadsheets (Microsoft Excel and OpenDocument spreadsheets) into RDF. It combines a practical user interface (available for the RDF Management Framework Semantic Turkey) with the potentialities of a full transformation language (PEARL, provided by the CODA framework).
  • SKOS Play! also provides a way to convert Excel spreadsheets to SKOS files. This makes it easy to produce SKOS files for taxonomies and authority lists. The converter can even generate data in other RDF vocabularies.
  • Spread2RDF is a converter for complex spreadsheets to RDF and a Ruby-internal DSL for specifying the mapping rules for this conversion.

EXIF

See JPEG.

File Systems

  • TripFS exposes an entire file system as linked data, tracks changes, and links files to external data sources.

Flickr data

  • Dave Beckett's Flickcurl library can access Flickr information (including machine tags) and convert it to RDF.

Flat files

See also: CSV and TSV

  • flat2rdf converts classic unix text database files, like /etc/passwd, into RDF/N3 (Simile)
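
As an illustration of this kind of conversion, here is a minimal Python sketch that turns one /etc/passwd-style line into N3 statements. It is not flat2rdf itself: the namespace `http://example.org/passwd#`, the property names, and the function `passwd_line_to_n3` are assumptions made up for this example.

```python
from urllib.parse import quote

# Hypothetical namespace for the generated resources and properties.
NS = "http://example.org/passwd#"

# The seven colon-separated fields of a classic passwd entry.
FIELDS = ["name", "password", "uid", "gid", "gecos", "home", "shell"]


def passwd_line_to_n3(line):
    """Turn one /etc/passwd-style line into a list of N3 statements."""
    values = line.rstrip("\n").split(":")
    if len(values) != len(FIELDS):
        raise ValueError("expected 7 colon-separated fields")
    record = dict(zip(FIELDS, values))
    subject = "<%suser-%s>" % (NS, quote(record["name"], safe=""))
    statements = []
    for field in FIELDS:
        # Skip the password placeholder and empty fields.
        if field == "password" or record[field] == "":
            continue
        literal = record[field].replace("\\", "\\\\").replace('"', '\\"')
        statements.append('%s <%s%s> "%s" .' % (subject, NS, field, literal))
    return statements


if __name__ == "__main__":
    for statement in passwd_line_to_n3(
        "alice:x:1000:1000:Alice Example:/home/alice:/bin/sh"
    ):
        print(statement)
```

All values are emitted as plain string literals; a real converter would probably type the numeric IDs and link users to groups.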

Free Text

  • FRED [1] is a machine reading tool that extracts RDF from natural language text, by following ontology design and linked data patterns. Links to DBpedia, schema.org, WordNet, VerbNet, etc. are included in the output, as well as deep text annotation in either Earmark or NIF.
  • SemNova [2] is a tool to transform text into RDF.

GPS

  • garmin2rdf.py Reads a Garmin GPS receiver, dumping the contents in RDF/XML. (Matt Biddulph)
  • fromGarmin.py Downloads GPS data from a Garmin on a serial link to an RDF/N3 file. (SWAP)

iCalendar

iCalendar is an IETF standard for calendar (event and to-do list) data. iCalendar files are typically stored with a .ics extension.

Java bytecode

  • java2rdf scans java bytecode for method calls and creates a description of the dependencies between classes and the package/archive encoded in RDF/N3. (Simile)

Javadoc

  • javadoc2rdf is a doclet that makes javadoc output metadata about your code (structure of the classes, methods, comments, etc.) encoded in RDF/N3. (Simile)

Issue tracking: Jira

  • jira2rdf transforms Atlassian Jira's events about bug reports and issue tracking into RDF/N3.

JPEG

The metadata within a JPEG photo is encoded in the EXIF standard.

  • jpeg2rdf scans a folder for JPEG files, parses the EXIF and IPTC metadata found in those files and dumps an RDF/N3 representation of it into a file. (Simile)
  • An adapted version of jhead extracts RDF data from the EXIF metadata encoded in JPEG files within a directory. Generates RDF/N3. (SWAP)

LDIF

This is the format used for contact information in LDAP server systems. It is, for example, exported by Thunderbird's address book.

  • ldif2n3.py Very incomplete, but useful. Generates FOAF. Hides email addresses by hashing in the FOAF style if the -m command flag is given. (SWAP)
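
Hashing "in the FOAF style" refers to the foaf:mbox_sha1sum property, whose value is the SHA-1 hex digest of the mailbox URI including the `mailto:` scheme. The following Python sketch shows the idea. It is not the ldif2n3.py code; the function names `mailto_uri`, `mbox_sha1sum` and `foaf_person_n3` are assumptions for this example.

```python
import hashlib


def mailto_uri(email):
    """Return the mailbox as a mailto: URI (adds the scheme if missing)."""
    address = email.strip()
    if not address.lower().startswith("mailto:"):
        address = "mailto:" + address
    return address


def mbox_sha1sum(email):
    """SHA-1 hex digest of the mailto: URI, as used by foaf:mbox_sha1sum."""
    return hashlib.sha1(mailto_uri(email).encode("utf-8")).hexdigest()


def foaf_person_n3(name, email, hide_mail=True):
    """Describe one person in N3, optionally hiding the address by hashing."""
    safe_name = name.replace("\\", "\\\\").replace('"', '\\"')
    if hide_mail:
        mail = 'foaf:mbox_sha1sum "%s"' % mbox_sha1sum(email)
    else:
        mail = "foaf:mbox <%s>" % mailto_uri(email)
    return (
        "@prefix foaf: <http://xmlns.com/foaf/0.1/> .\n"
        "[] a foaf:Person ;\n"
        '   foaf:name "%s" ;\n'
        "   %s ." % (safe_name, mail)
    )


if __name__ == "__main__":
    print(foaf_person_n3("Alice Example", "alice@example.org"))
```

The hash lets two descriptions of the same person be merged without publishing the address itself, although common addresses can still be guessed and checked against the hash.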

Makefile

The unix Makefile syntax expresses dependencies between files in a software build.

  • make2n3.py Converts the makefiles in several directories into RDF and merges them to get the big picture. (SWAP)
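
To make the idea concrete, the sketch below reads simple `target: prerequisites` rules and emits one dependency triple per pair. It is a rough Python illustration, not make2n3.py: the property URI, the base URI and the function name `makefile_to_triples` are assumptions, and variables, pattern rules, includes and line continuations are ignored.

```python
# Hypothetical property URI used for the dependency relation.
DEPENDS_ON = "http://example.org/make#dependsOn"


def makefile_to_triples(text, base="http://example.org/build/"):
    """Return (target, DEPENDS_ON, prerequisite) triples for simple rules."""
    triples = []
    for raw in text.splitlines():
        # Recipe lines start with a tab and are not rules.
        if raw.startswith("\t"):
            continue
        line = raw.split("#", 1)[0].rstrip()  # drop comments
        if ":" not in line:
            continue
        targets, prereqs = line.split(":", 1)
        # Skip variable assignments such as "CC := gcc" or "X = a:b".
        if "=" in targets or prereqs.lstrip(":").startswith("="):
            continue
        prereqs = prereqs.lstrip(":")  # tolerate double-colon rules
        for target in targets.split():
            for prereq in prereqs.split():
                if prereq == "|":  # order-only separator, not a file
                    continue
                triples.append((base + target, DEPENDS_ON, base + prereq))
    return triples


if __name__ == "__main__":
    sample = "app: main.o util.o\n\tcc -o app main.o util.o\nCC := gcc\n"
    for s, p, o in makefile_to_triples(sample):
        print("<%s> <%s> <%s> ." % (s, p, o))
```

Calling the function once per directory with a different `base` and concatenating the results gives a merged graph of the kind described above.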

MARC

transforms MARC records from Z39.2 format into MODS and then from MODS to an RDF representation of MODS.

  • MARiMbA is a command-line tool, designed with librarians in mind, to transform MARC (MAchine-Readable Cataloging) records to RDF, following Linked Data best practices.
  • easyM2R is a PHP-based attempt to easily convert MARC data to RDF using JSON-LD and MARCspec for configuration.

Meteographical

  • Meteo is UK weather forecast data in RDF, extracted from NOAA's public domain GRIB files. Example: London.

Microformats

MongoDB

MongoDB is a NoSQL document database that stores JSON documents along with binary contents (BSON).

  • Morph-xR2RML implements the generic xR2RML mapping language and comes with a connector for MongoDB. It can map arbitrary MongoDB JSON documents to arbitrary domain ontologies, either by materializing the RDF data or by performing SPARQL-to-MongoDB query rewriting.

Multimedia

Following the DRY principle, a pointer to tools in the realm of multimedia (origin: MMSEM-XG):

OAI-PMH

  • oai2rdf harvests an OAI-PMH repository and transforms the captured metadata into an RDF representation through pluggable XSLT stylesheets.

Outlook

Microsoft Outlook stores contact data, event data, and so on in a proprietary format.

  • Lookout.py converts the Microsoft Outlook calendar and address format into RDF. (SWAP)
  • aperture.sf.net includes a Java crawler for MS Outlook.

Open Financial Exchange (OFX)

OFX is the format for downloaded bank statements and other financial information. There are various levels of OFX, the early ones being HTTP headers followed by SGML, the later ones being HTTP-like headers followed by XML.

  • OFX-to-n3.y converts OFX format to RDF/N3. The conversion is only syntactic. The OFX modeling is pretty well thought out, so taking it as defining an RDF ontology seems to make sense. Rules can then be used to define mapping into your favorite ontology.

Open CourseWare

Palm OS

  • Palmagent converts the calendar format of PalmOS into RDF. (SWAP)

plist

The Apple OS X property list (.plist) filetype is an XML format for arbitrary structured data. Numeric keys are used as local IDs. OS X applications store many kinds of data in these files, including configuration data, iPhoto album and photo data, iTunes metadata, and so on.

To convert plists well, added information is necessary, such as a namespace for the properties.

plist2rdf.xsl is an XSLT script to convert a plist file into RDF/XML. It does not add namespaces to the exported data.
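
The need for a namespace can be illustrated with a short Python sketch that uses the standard `plistlib` module and takes the namespace as an explicit argument. This is not plist2rdf.xsl; the function name `plist_to_triples`, the blank node labels and the example namespace are assumptions.

```python
import itertools
import plistlib


def plist_to_triples(data, namespace):
    """Map a plist (bytes) to (subject, predicate, object) tuples.

    Dictionary keys become properties in the caller-supplied namespace,
    nested dictionaries become blank nodes, and other values are kept
    as Python literals.
    """
    counter = itertools.count()
    triples = []

    def new_node():
        return "_:b%d" % next(counter)

    def walk(subject, mapping):
        for key, value in mapping.items():
            # Assumption: keys are usable as URI local names as they stand.
            predicate = namespace + str(key)
            items = value if isinstance(value, list) else [value]
            for item in items:
                if isinstance(item, dict):
                    node = new_node()
                    triples.append((subject, predicate, node))
                    walk(node, item)
                else:
                    triples.append((subject, predicate, item))

    root = plistlib.loads(data)
    if not isinstance(root, dict):
        raise ValueError("this sketch only handles a top-level dictionary")
    walk(new_node(), root)
    return triples


if __name__ == "__main__":
    sample = plistlib.dumps({"Name": "Holiday", "Album": {"Count": 3}})
    for triple in plist_to_triples(sample, "http://example.org/plist#"):
        print(triple)
```

Keys containing spaces or other characters not allowed in URIs would need escaping, and nested lists are kept as a single literal here.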

Quicken Interchange Format (QIF)

  • qif2n3.py Takes Quicken interchange format and converts it to RDF/N3. (SWAP)

Quick and Dirty CSV to RDF Converter (QUIDICRC)

  • quidicrc A Perl script for rapidly transferring CSV to RDF with some translation in the middle. (Not actively maintained; available as open source -- SWAP)

Random

Seriously.

  • random2rdf generates synthetic random graphs encoded in RDF/N3.
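
For readers who only need a quick test graph, the following Python sketch produces random edges as N-Triples lines. It is not random2rdf; the base URI, the `linksTo` property and the function name `random_graph_ntriples` are assumptions.

```python
import random


def random_graph_ntriples(nodes, edges, seed=None,
                          base="http://example.org/random/"):
    """Return `edges` random N-Triples lines over `nodes` numbered nodes."""
    rng = random.Random(seed)  # a fixed seed makes the graph reproducible
    lines = []
    for _ in range(edges):
        subject = rng.randrange(nodes)
        obj = rng.randrange(nodes)
        lines.append(
            "<%sn%d> <%slinksTo> <%sn%d> ." % (base, subject, base, base, obj)
        )
    return lines


if __name__ == "__main__":
    print("\n".join(random_graph_ntriples(nodes=10, edges=5, seed=42)))
```

Self-loops and duplicate edges are possible in this sketch; filter them out if the test requires a simple graph.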

SDMX

SDMX is an XML-based exchange format for statistical data and metadata, used by major statistics-producing organizations such as Eurostat, the World Bank, OECD, and the IMF.

  • Linked SDMX is an XSLT-based converter that turns SDMX data sets and data structure definitions into RDF, using the Data Cube Vocabulary.

Spreadsheet

See CSV (Comma-Separated Values) and Excel.

SQL

SQL databases are rich stores of relational data ideal for export as RDF. Conference tracks and many papers cover this subject from different angles. See also: RdfAndSql

  • D2RQ provides a mapping from a SQL server (tested with several brands), producing both linked virtual RDF data files and a SPARQL service. Uses a configuration file in Turtle. (DERI and FU Berlin)
  • dbview.py provides a mapping from a SQL server (tested with mySQL), producing linked virtual RDF data files. Uses a configuration file in N3. (SWAP)
  • OpenLink Virtuoso's declarative N3/Turtle based Metaschema Language enables the creation of RDF Instance Data for associated RDF Ontologies via RDF VIEWs of ODBC, JDBC, ADO.NET, and OLE-DB accessible SQL Data. It is important to note that these VIEWs also apply to Native Virtuoso Data and/or Heterogeneous Data from other Web Services, HTTP/WebDAV, NNTP, and other Data Sources known to Virtuoso. This is an enhancement of the traditional SQL VIEW concept that enables multiple use of the same base SQL Data from a variety of data access points.
  • Triplify is a small plugin for Web applications, which reveals the semantic structures encoded in relational databases by making database content available as RDF, JSON or Linked Data.
  • RDBToOnto is a full-fledged conversion tool that can produce accurate RDF/OWL models from various types of relational databases and Excel spreadsheets. The conversion is fully automated while various parameters can be set through the user interface to refine the resulting models (e.g., derivation of rich class hierarchies, proper naming of instances, database optimization before conversion, etc).
  • morph or morph implement R2RML and perform a transformation from RDB to RDF.

Some RDF Triple stores are implemented using SQL databases, but that is not covered here.
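
The simplest form of such an export treats each row as a resource and each column as a property. The Python sketch below does this for one table using the standard `sqlite3` module. It is only an illustration and does not reflect the configuration format of any tool listed above; the base URI, the URI patterns and the function name `table_to_triples` are assumptions.

```python
import sqlite3


def table_to_triples(conn, table, key_column,
                     base="http://example.org/db/"):
    """Export one table as (subject, predicate, object) tuples.

    Each row becomes a resource named after its key value and each
    non-key, non-NULL column becomes a property of that resource.
    """
    quoted = '"%s"' % table.replace('"', '""')  # quote the identifier
    cursor = conn.execute("SELECT * FROM %s" % quoted)
    columns = [description[0] for description in cursor.description]
    triples = []
    for row in cursor:
        record = dict(zip(columns, row))
        subject = "%s%s/%s" % (base, table, record[key_column])
        for column, value in record.items():
            if column == key_column or value is None:
                continue
            predicate = "%s%s#%s" % (base, table, column)
            triples.append((subject, predicate, value))
    return triples


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
    conn.execute("INSERT INTO person VALUES (1, 'Alice', NULL)")
    for triple in table_to_triples(conn, "person", "id"):
        print(triple)
```

Foreign keys are not followed here; turning them into links between row resources is the usual next step.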

Subversion

Subversion is a code-management system.

  • svn2rdf A pair of scripts; one can be used in a post-commit subversion hook to generate RDF/N3 with each commit, the other on a working copy. (Simile)

TSV (Tab-Separated Values)

See also: Flat Files and CSV (Comma-Separated Values)

  • tab2n3.py Takes tab-separated text (as typically output by all kinds of things including Microsoft Outlook and spreadsheets) and converts it to N3, using the column headings to generate property URIs. (SWAP)
  • TopBraid Composer can convert tab-separated spreadsheet files into an RDF/OWL class with corresponding properties and instances.
  • XLWrap [4] wraps CSV files (and spreadsheets) to arbitrary RDF graphs; supports local/HTTP loading, expressions similar to Excel/OpenOffice Calc, custom functions, usage via API or SPARQL endpoint
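
The technique of generating property URIs from column headings can be sketched in a few lines of Python. This is not the tab2n3.py code; the default namespace, the rule for cleaning headings and the function names are assumptions.

```python
import csv
import io
import re


def n3_string(value):
    """Quote a value as an N3 string literal."""
    escaped = (value.replace("\\", "\\\\")
                    .replace('"', '\\"')
                    .replace("\n", "\\n"))
    return '"%s"' % escaped


def tsv_to_n3(text, namespace="http://example.org/cols#"):
    """Convert tab-separated text to N3, one blank node per data row."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    headings = next(reader)
    # Column headings become local names in the given namespace.
    props = [re.sub(r"\W+", "_", heading.strip()) for heading in headings]
    out = ["@prefix : <%s> ." % namespace]
    for row in reader:
        pairs = [
            ":%s %s" % (prop, n3_string(value))
            for prop, value in zip(props, row)
            if value != ""
        ]
        if pairs:
            out.append("[] " + " ;\n   ".join(pairs) + " .")
    return "\n".join(out)


if __name__ == "__main__":
    print(tsv_to_n3("Name\tHome Page\nAlice\thttp://example.org/alice\n"))
```

Every cell is written as a string literal; recognising numbers, dates or URIs would be a natural refinement.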

Talis SW Format Converter

  • Talis' converter converts from various formats to various formats (including RDF->RDF with various serializations, RDF->HTML, etc.)

UML

  • TopBraid Composer can convert UML Class Diagrams (XMI format) into RDF/OWL models.
  • EulerGUI is a lightweight IDE that translates on the fly UML and eCore XMI into N3. Moreover there are N3 rules to convert UML to OWL.

VCARD, Addressbook, …

VCARD is a standard for interchange of contact data, such as business cards and address books.

"Representing vCard Objects in RDF/XML" is a W3C note defining an ontology for VCARD. FOAF is a widely used ontology covering some of the domain.

Weather

  • weather2rdf Given a US city or ZIP code, retrieves weather report data from weather.com and returns it in RDF. (Simile)

XML

  • GRDDL: Any XML files can be marked up with pointers to XSLT files which convert them to RDF. The standard for this is GRDDL. A GRDDL pointer can even be put in an XML schema, so that automatically all XML documents written to that schema will have a defined RDF mapping which any GRDDL-aware processor will benefit from. Several XSLT conversion transformations can be found linked from MicroModels
  • Krextor is a framework for extracting RDF in various notations from various XML languages and can easily be extended for additional input languages. Support for RDFa and some mathematical markup languages is built in. The implementation is done in XSLT, with a command-line frontend and a Java wrapper.
  • TopBraid Composer can convert XML Schema (and their XML instance files) into RDF/OWL models.
  • Rhizomik ReDeFer includes XSD2OWL and XML2RDF plus MPEG-7 to RDF (all XSLT-based)
  • XHTML: Convert existing pages to RDF. For example, see HtmlToRdf.
  • SPARQL2XQuery The SPARQL2XQuery Framework provides mechanisms for: (a) Query translation (SPARQL to XQuery) (b) Mapping specification & generation (Ontology to XML Schema) (c) Schema transformation (XML Schema to OWL) and (d) Data Transformation (XML to RDF and vice versa)

XMP

XMP is an Adobe-sponsored specification for putting RDF metadata in virtually any form of file, including binary formats. XMP metadata is RDF data in fact, but it has to be extracted from the file.
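
The extraction step can be approximated by scanning the file bytes for the XMP wrapper element. The Python sketch below rests on two assumptions that do not hold for every file: the packet is stored as UTF-8, and it is wrapped in an `x:xmpmeta` element. The function name `extract_xmp_packets` is also made up for this example.

```python
def extract_xmp_packets(data):
    """Return the XMP packets found in `data` (bytes) as strings.

    Assumes UTF-8 packets wrapped in <x:xmpmeta> ... </x:xmpmeta>.
    """
    start_tag = b"<x:xmpmeta"
    end_tag = b"</x:xmpmeta>"
    packets = []
    position = 0
    while True:
        start = data.find(start_tag, position)
        if start == -1:
            break
        end = data.find(end_tag, start)
        if end == -1:
            break
        end += len(end_tag)
        packets.append(data[start:end].decode("utf-8", errors="replace"))
        position = end
    return packets


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as handle:
        for packet in extract_xmp_packets(handle.read()):
            print(packet)
```

The RDF/XML inside each returned packet can then be passed to any RDF parser.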

Frameworks

The following are general tools which provide conversion from many formats.

AnnoCultor

AnnoCultor was built during several years of practical work on porting various datasets to RDF. It allows converting data from the following data sources:

  • databases via SQL and JDBC;
  • XML files, also in batch;
  • RDF files,
  • Solr servers,
  • custom formats, via format-specific parsers written in Java.

AnnoCultor is specifically suited for the situations where XSLT is not sufficient.

It comes with built-in converters for Geonames and Getty vocabularies (AAT, ULAN, TGN), that are ready to use. Several additional specific converters illustrate advanced use: converters for collections of Louvre and Joconde, Institute Collection Netherlands, Dutch Museum of Asian Ceramics, Tropenmuseum Amsterdam.

As part of conversion, AnnoCultor can semantically tag (enrich) data with links to various vocabularies, with advanced customised disambiguation and term processing possibilities. These vocabularies should be represented in RDF or SKOS to be imported via SPARQL queries. AnnoCultor comes with built-in tagging with Geonames and a custom time ontology.

AnnoCultor is written in Java, but conversion rules are written in XML. They are extensible with either small Java snippets or custom rule implementations in Java. AnnoCultor has been used in practice with datasets ranging from a few records to more than ten million, each containing up to dozens of fields.

Apache Any23

Apache Any23 is a Java library, web service, and command-line tool for parsing multiple document formats and extracting structured data in RDF format from a variety of Web documents. Currently it supports the following input formats:

  • RDF/XML, Turtle, Notation 3
  • RDFa
  • Microformats: Adr, Geo, hCalendar, hCard, hListing, hResume, hReview, License, XFN and Species

Apache Any23 is used in major Web of Data applications such as sindice.com and sig.ma.

Aperture

  • Aperture is a project written in Java gathering RDF extractors for many formats, mentioned in the list above.

Aperture supports crawling, making it not a converter but a framework to crawl updates of data (like rsync).

Datalift

Datalift is a platform dedicated to Linked Data. Its input is raw data in multiple heterogeneous formats, and its output is Linked Data. The Datalift platform is actively involved in the Web's transition to Linked Data.

Currently it supports the following input formats:

  • CSV
  • RDF/XML, Turtle, Notation 3, N-Triples, RDFa, TriG, TriX
  • RSS
  • XML
  • GML
  • Shapefile

Inputs can either be compressed (zip, gzip) or not.

It is also possible to input data from:

  • SQL request
  • SPARQL request

EasyRDF

A PHP library designed to make it easy to consume and produce RDF. Designed for use in mixed teams of experienced and inexperienced RDF developers. Written in PSR-12 compliant PHP and tested extensively using PHPUnit.

EasyRDF is also built in to Drupal core since version 8.

PiggyBank

  • Piggy-bank is a Simile project which allows the Firefox-based client to automatically load "RDFizers", JavaScript-based converters to RDF.

Piggy-bank associates given scraping scripts with given web sites. (How?)

RML

The RDF Mapping Language (RML) is defined as a superset of the W3C-recommended mapping language, R2RML. Where R2RML maps data in relational databases to RDF, RML can express rules that map data in heterogeneous structures and serializations to the RDF data model. A single language can thus map and join data from databases, existing RDF, CSV, JSON, XML and Web APIs, and is easily extensible to other data formats. Besides the language itself, extensions exist that allow the inclusion of data transformations (using FnO).

Multiple processors exist that handle RML mapping documents, such as the reference implementation and carml.

Mapping documents can be edited using the RMLEditor, and validated using the RMLValidator.

SPARQL Micro-Services

The SPARQL Micro-Service architecture is meant to bridge the world of Linked Data and Web APIs. It enables SPARQL querying over Web APIs and allows assigning dereferenceable URIs to Web API resources that do not have a URI in the first place.

The current PHP implementation supports JSON-based Web APIs and provides example SPARQL micro-services for Flickr, MusicBrainz, and three biodiversity-related Web APIs: Encyclopedia of Life, Biodiversity Heritage Library, and the Macaulay Library.

SPARQL-Generate

SPARQL-Generate is a language for expressing transformations from heterogeneous data formats to RDF. Its syntax is based on SPARQL. The output is specified in the form of a SPARQL graph template, while the values for the variables used to generate the graph are specified with SPARQL BIND and custom functions. The input files are specified in the language with the special keyword SOURCE and the URL of the file. The source file is fetched when the query is executed. An extra keyword, ITERATOR, is used to iterate over recurring data structures in the source file or files. For instance, one may iterate over XML elements with the same name in an XML file, or over JSON objects assigned to the same key in a JSON document. Existing RDF data can be combined with the transformation in the same SPARQL-Generate query, with the full expressive power of SPARQL 1.1. In addition, syntactic sugar is added to make the transformations more concise and easier to write.

The reference implementation is based on Apache Jena.

Triplr

Triplr is a general “Stuff in, triples out” system by Dave Beckett. Triplr handles GRDDL, RSS, Atom, and other formats.

Virtuoso Sponger

OpenLink Software via the "Sponger" component of Virtuoso's SPARQL Processor and Proxy Web Service (used by default by OpenLink Data Explorer) provides RDFization for:

  • RDFa
  • GRDDL
  • Amazon Web Services
  • eBay Web Services
  • Freebase Web Services
  • Facebook Web Services
  • Yahoo! Finance
  • XBRL Instance documents
  • DOI (includes a custom resolver for HTTP)
  • OAI
  • RSS/Atom Feeds
  • Digital Music Files (various formats via ID3 Tags)
  • Image Files
  • vCard
  • iCalendar
  • Microformats - hCard, hCalendar
  • HR-XML Resumes
  • Flickr
  • Del.icio.us
  • Bugzilla
  • ODBC or JDBC accessible SQL Data
  • Many others

Apache Marmotta LDClient

Apache Marmotta LDClient library is a flexible and modular Linked Data Client (RDFizer) that can be used by any Linked Data project independent of the Apache Marmotta platform. In other words, it provides the infrastructure for retrieving remote resources via different protocols (primarily HTTP) and offers pluggable adapters (called data providers) that wrap other data sources (e.g. YouTube or Facebook) as Linked Data resources by mapping their data to appropriate RDF structures.

It provides several backends with support for:

  • different RDF serializations
  • XML
  • HTML
  • RDFa
  • Facebook
  • YouTube
  • Vimeo
  • MediaWiki
  • phpBB
  • LDAP
  • ...

Applications

GRefine - OpenRefine RDF Extension

Add-on for OpenRefine (formerly Google Refine) to convert flat spreadsheet data to RDF.

sheet2rdf

Sheet2RDF is a platform for acquisition and transformation of datasheets into RDF, developed by the ART Research Group at the University of Rome Tor Vergata.

The platform supports Microsoft Excel, Apache OpenOffice and LibreOffice spreadsheets (through Apache POI and jOpenDocument), as well as CSV, TSV and other delimited formats (through Commons CSV).

SKOS Play!

Convert a spreadsheet to RDF. Requires the spreadsheet to be formatted in a well-defined way.

Notes

Historically, this list was made from lists of RDFizers and SWAP converters. It has grown significantly from community input since then.

This should be in a data format like Semantic Media Wiki or in N3 -- TimBL

> Would there be an advantage to having this kind of list in an RDF file, specifically to make queries on it? Maybe if we add a format on how to declare it here, we could create a converter to RDF. -- KarlDubost

> The task force InfoGathering from SWEO works on such a vocabulary, if you want to rewrite this list using this vocab, look here: DataVocabulary or contact me -- LeoSauermann on 22.1.2007