ConverterToRdf

From W3C Wiki
Revision as of 20:20, 7 February 2013 by Rcygania2

A Converter to RDF is a tool which converts application data from an application-specific format into RDF for use with RDF tools and integration with other data. Converters may be part of a one-time migration effort, or part of a running system which provides a semantic web view of a given application. See also: RDFImportersAndAdapters

Please add converters as you make them or hear of them.

Formats

in alphabetical order:

BibTeX

BibTeX is the format for bibliographic references in TeX.

Bittorrent

CSV (Comma-Separated Values)

See also: Flat Files and TSV

  • An RDF Extension is available for Google Refine. It can convert Excel, CSV, and other tabular data to RDF. The schema mapping can be defined in a graphical UI.
  • RDF123 has Windows and Linux applications to download, a Java application and servlet.
  • XLWrap wraps spreadsheets (including cross tables) to arbitrary RDF graphs; supports Excel/OpenDocument/CSV streamed processing, local/HTTP loading, expressions similar to Excel/OpenOffice Calc, custom functions, usage via API or SPARQL endpoint
  • csv2rdf4lod uses declarative RDF enhancement parameters to specify how to transform tabular data into well-structured, well-connected RDF. The tool uses identifiers for source organization, dataset, and version to establish default namespaces for all URIs created and provides VoID and provenance metadata as part of the conversion output.
  • Tarql is a command-line application that converts CSV to RDF with a user-defined mapping. The mapping is written in standard SPARQL 1.1.
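The tools above differ in how the mapping is specified, but the basic step they share is turning each row into a resource and each column into a property. The following is a minimal sketch of that idea using only the Python standard library. It is not how any of the listed tools is implemented; the namespace http://example.org/ and the function names are assumptions made for illustration.

```python
import csv
import io
from urllib.parse import quote

BASE = "http://example.org/"  # hypothetical namespace, chosen for illustration


def escape_literal(value):
    """Escape a string for use inside an N-Triples literal."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\n", "\\n").replace("\r", "\\r"))


def csv_to_ntriples(text, base=BASE):
    """Turn each CSV row into a resource and each column into a property."""
    lines = []
    reader = csv.DictReader(io.StringIO(text))
    for index, row in enumerate(reader, start=1):
        subject = f"<{base}row/{index}>"
        for column, value in row.items():
            if column is None or not value:
                continue  # skip surplus and empty cells
            predicate = f"<{base}prop/{quote(column.strip(), safe='')}>"
            lines.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return lines


sample = "name,city\nAlice,Berlin\nBob,\n"
for triple in csv_to_ntriples(sample):
    print(triple)
```

Row numbers are used as identifiers here only because the sample has no key column; a real mapping would normally mint URIs from a stable identifier in the data.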

Debian

The package information in Debian and similar systems (Ubuntu, Fink, etc), with its general usefulness and its graph-like nature, is a clear candidate for conversion to RDF.

See VitaVoni blog about this.

  • finkn3.py takes Fink (OS X port of Debian packaging) dependencies and converts them to RDF/N3. (SWAP) It is unclear whether this could be quickly adapted to export Debian data.
  • STEAMY converts Debian packages to RDF.

Email (RFC822 headers)

There are others in this vein which run over IMAP or mailbox files.

Excel

  • Cambridge Semantics' Anzo for Excel extracts RDF data from Excel spreadsheets while keeping the spreadsheet in sync with the underlying data as things change.
  • XLWrap wraps spreadsheets (including cross tables) to arbitrary RDF graphs; supports Excel/OpenDocument/CSV streamed processing, local/HTTP loading, expressions similar to Excel/OpenOffice Calc, custom functions, usage via API or SPARQL endpoint
  • TopBraid Composer can convert Excel spreadsheets into instances of an RDF schema.
  • TabLinker can convert non-standard Excel spreadsheets to the Data Cube vocabulary, e.g. Excel files that contain hierarchical information in row and column headers etc.
  • NOR2O can convert Excel to SCOVO and the Data Cube Vocabulary.
  • Excel2rdf is a Microsoft Windows program (exe) that converts Excel files into valid RDF. It has been tested on Windows 98 and Windows 2000 Professional. (MindSwap) Export can be done via comma- or tab-separated values. See Flat files below.
  • aperture.sf.net includes a Java crawler for Excel and OpenDocument files. It extracts only plain text and basic metadata, though.
  • RDBToOnto, see description below under SQL section.

EXIF

See JPEG.

File Systems

  • TripFS exposes an entire file system as linked data, tracks changes, and links files to external data sources.

Flickr data

  • Dave Beckett's flickcurl library can access Flickr information (including machine tags) and convert it to RDF.

Flat files

See also: CSV and TSV

  • flat2rdf converts classic Unix text database files, like /etc/passwd, into RDF/N3. (Simile)
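As a rough illustration of what converting a file like /etc/passwd involves, the sketch below splits colon-separated records into named fields and writes them as N3. It is not the flat2rdf implementation; the namespace http://example.org/passwd/, the pw: prefix, and the property names are assumptions.

```python
from urllib.parse import quote

# The seven colon-separated fields of a passwd record, in order.
FIELDS = ["login", "password", "uid", "gid", "gecos", "home", "shell"]


def escape(value):
    """Escape backslashes and double quotes for an N3 string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def passwd_to_n3(text, base="http://example.org/passwd/"):
    """Convert passwd-style lines to N3, one resource per account."""
    out = [f"@prefix pw: <{base}> ."]
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        values = line.split(":")
        if len(values) != len(FIELDS):
            continue  # skip malformed lines instead of guessing
        record = dict(zip(FIELDS, values))
        props = [
            f'pw:{name} "{escape(value)}"'
            for name, value in record.items()
            if name != "password" and value  # never export the password field
        ]
        subject = f"<{base}user/{quote(record['login'], safe='')}>"
        out.append(subject + " " + " ;\n    ".join(props) + " .")
    return "\n".join(out)


print(passwd_to_n3("root:x:0:0:root:/root:/bin/sh\n"))
```

All values are emitted as plain string literals; typing uid and gid as integers would be a natural refinement.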

GPS

  • garmin2rdf.py reads a Garmin GPS receiver, dumping the contents in RDF/XML. (Matt Biddulph)
  • fromGarmin.py Downloads GPS data from a Garmin on a serial link to an RDF/N3 file. (SWAP)

iCalendar

iCalendar is an IETF standard for calendar (event and to-do list) data. iCalendar files are typically stored with a .ics extension.

Java bytecode

  • java2rdf scans Java bytecode for method calls and creates a description of the dependencies between classes and the package/archive, encoded in RDF/N3. (Simile)

Javadoc

  • javadoc2rdf is a doclet that makes javadoc output metadata about your code (structure of the classes, methods, comments, etc.) encoded in RDF/N3. (Simile)

Issue tracking: Jira

  • jira2rdf transforms Atlassian Jira's events about bug reports and issue tracking into RDF/N3.

JPEG

The metadata within a JPEG photo is encoded using the EXIF standard.

  • jpeg2rdf scans a folder for JPEG files, parses the EXIF and IPTC metadata found in those files and dumps an RDF/N3 representation of it into a file. (Simile)
  • An adapted version of jhead extracts RDF data from the EXIF metadata encoded in JPEG files within a directory. Generates RDF/N3. (SWAP)

LDIF

This is the format used for contact information in LDAP server systems. It is, for example, exported by Thunderbird's address book.

  • ldif2n3.py Very incomplete, but useful. Generates FOAF. Hides email addresses by hashing in the FOAF style if the -m command flag is given. (SWAP)
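Hashing "in the FOAF style" refers to foaf:mbox_sha1sum, which holds the SHA-1 digest of the mailbox URI (including the mailto: prefix) instead of the address itself. The sketch below shows the idea for a single simple LDIF entry. It is not the ldif2n3.py code; the function names and the hide_mail parameter are assumptions, and base64 values and folded lines are not handled.

```python
import hashlib

FOAF_PREFIX = "@prefix foaf: <http://xmlns.com/foaf/0.1/> ."


def mbox_sha1sum(email):
    """SHA-1 hex digest of the mailto: URI, the value foaf:mbox_sha1sum holds."""
    uri = "mailto:" + email.strip()
    return hashlib.sha1(uri.encode("utf-8")).hexdigest()


def escape(value):
    """Escape backslashes and double quotes for an N3 string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def ldif_entry_to_n3(ldif, hide_mail=True):
    """Convert one simple LDIF entry (no base64 values, no folded lines)."""
    attrs = {}
    for line in ldif.splitlines():
        if line.startswith("#") or ": " not in line:
            continue
        key, value = line.split(": ", 1)
        attrs.setdefault(key.lower(), value)  # keep the first value only
    props = ["a foaf:Person"]
    if "cn" in attrs:
        props.append(f'foaf:name "{escape(attrs["cn"])}"')
    if "mail" in attrs:
        if hide_mail:
            props.append(f'foaf:mbox_sha1sum "{mbox_sha1sum(attrs["mail"])}"')
        else:
            props.append(f'foaf:mbox <mailto:{attrs["mail"].strip()}>')
    return FOAF_PREFIX + "\n\n[] " + " ;\n    ".join(props) + " ."


entry = "dn: cn=Alice Example\ncn: Alice Example\nmail: alice@example.org\n"
print(ldif_entry_to_n3(entry))
```

Two descriptions of the same person can still be merged on the hash, since equal addresses produce equal digests, while the address itself is not published.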

Makefile

The unix Makefile syntax expresses dependencies between files in a software build.

  • make2n3.py converts the makefiles in several directories into RDF and merges them to get the big picture. (SWAP)
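The dependency information in a makefile maps naturally onto triples: each "target: prerequisites" rule yields one statement per prerequisite. The sketch below illustrates this for plain rules only. It is not the make2n3.py code; the mk:dependsOn property and its namespace are hypothetical, and variables, pattern rules, and special targets such as .PHONY are not handled.

```python
def makefile_to_n3(text, ns="http://example.org/make#"):
    """Emit one mk:dependsOn triple per (target, prerequisite) pair."""
    triples = [f"@prefix mk: <{ns}> ."]
    for line in text.splitlines():
        if line.startswith("\t") or line.lstrip().startswith("#"):
            continue  # recipe lines and comments carry no dependency info
        head, sep, tail = line.partition(":")
        if not sep or "=" in head or tail.startswith("="):
            continue  # blank line or variable assignment, not a rule
        if head.startswith("."):
            continue  # special targets such as .PHONY are ignored
        for target in head.split():
            for dep in tail.split():
                # File names become relative URIs, resolved against the
                # location of the resulting N3 document.
                triples.append(f"<{target}> mk:dependsOn <{dep}> .")
    return "\n".join(triples)


sample = (
    "CC = gcc\n"
    "app: main.o util.o\n"
    "\tcc -o app main.o util.o\n"
    "main.o: main.c util.h\n"
)
print(makefile_to_n3(sample))
```

Because the output uses relative URIs, graphs produced from makefiles in different directories stay distinct until they are merged with an appropriate base URI for each.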

MARC

transforms MARC records from Z39.2 format into MODS and then from MODS to an RDF representation of MODS.

  • MARiMbA is a command-line tool, designed with librarians in mind, to transform MARC (MAchine-Readable Cataloging) records to RDF, following Linked Data best practices.

Meteographical

  • Meteo is UK weather forecast data in RDF, extracted from NOAA's public domain GRIB files. Example: London.

Microformats

Multimedia

Following the DRY principle, a pointer to tools in the realm of multimedia (origin: MMSEM-XG):

OAI-PMH

  • oai2rdf harvests an OAI-PMH repository and transforms the captured metadata into an RDF representation through pluggable XSLT stylesheets.

Outlook

Microsoft Outlook stores contact data, event data, and so on in a proprietary format.

  • Lookout.py converts the Microsoft Outlook calendar and address format into RDF. (SWAP)
  • aperture.sf.net includes a Java crawler for MS Outlook.

Open Financial Exchange (OFX)

OFX is the format for downloaded bank statements and other financial information. There are various levels of OFX, the early ones being HTTP headers followed by SGML, the later ones being HTTP-like headers followed by XML.

  • OFX-to-n3.y converts OFX format to RDF/N3. The conversion is only syntactic. The OFX modeling is pretty well thought out, so taking it as defining an RDF ontology seems to make sense. Rules can then be used to define mapping into your favorite ontology.

Open CourseWare

Palm OS

  • Palmagent converts the calendar format of PalmOS into RDF. (SWAP)

plist

The Apple OS X property list (.plist) file type is an XML format for arbitrary structured data. Numeric keys are used as local IDs. OS X applications store many kinds of data in these files, including configuration data, iPhoto album and photo data, iTunes metadata, and so on.

To convert plists well, added information is necessary, such as a namespace for the properties.

plist2rdf.xsl is an XSLT script to convert a plist file into RDF/XML. It does not add namespaces to the exported data.
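To show why a namespace has to be supplied from outside, the sketch below reads an XML plist with Python's standard plistlib module and turns its top-level keys into property URIs under a caller-provided namespace. It is unrelated to plist2rdf.xsl; the function name, the namespace, and the sample data are assumptions, and only top-level string and integer values are handled.

```python
import plistlib
from urllib.parse import quote

SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<plist version="1.0">
<dict>
  <key>Album Name</key><string>Holiday</string>
  <key>Photo Count</key><integer>12</integer>
</dict>
</plist>"""


def plist_to_n3(data, ns):
    """Describe the top-level dict of a plist as one blank node in N3."""
    plist = plistlib.loads(data)
    props = []
    for key, value in plist.items():
        predicate = f"<{ns}{quote(key, safe='')}>"  # the key alone has no namespace
        if isinstance(value, str):
            text = value.replace("\\", "\\\\").replace('"', '\\"')
            props.append(f'{predicate} "{text}"')
        elif isinstance(value, int) and not isinstance(value, bool):
            props.append(f"{predicate} {value}")
        # nested dicts, arrays, booleans, dates and binary data are left out
    if not props:
        return ""
    return "[] " + " ;\n    ".join(props) + " ."


print(plist_to_n3(SAMPLE, "http://example.org/iphoto#"))
```

The same key, such as "Album Name", can mean different things in different applications, which is why the namespace is chosen per application and not derived from the file.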

Quicken Interchange Format (QIF)

  • qif2n3.py takes Quicken Interchange Format and converts it to RDF/N3. (SWAP)

Quick and Dirty CSV to RDF Converter (QUIDICRC)

  • quidicrc A Perl script for rapidly converting CSV to RDF with some translation in the middle. (Not actively maintained; available as open source. SWAP)

Random

Seriously.

  • random2rdf generates synthetic random graphs encoded in RDF/N3.

SDMX

SDMX is an XML-based exchange format for statistical data and metadata, used by major statistics-producing organizations such as Eurostat, the World Bank, OECD, and the IMF.

  • SDMX to QB is an XSLT-based converter that turns SDMX data sets and data structure definitions into RDF, using the Data Cube Vocabulary.

Spreadsheet

See #CSV and #Excel.

SQL

SQL databases are rich stores of relational data ideal for export as RDF. Conference tracks and many papers cover this subject from different angles. See also: RdfAndSql

  • D2RQ provides a mapping from a SQL server (tested with several brands), producing both linked virtual RDF data files and a SPARQL service. Uses a configuration file in Turtle. (DERI and FU Berlin)
  • dbview.py provides a mapping from a SQL server (tested with mySQL), producing linked virtual RDF data files. Uses a configuration file in N3. (SWAP)
  • OpenLink Virtuoso's declarative N3/Turtle based Metaschema Language enables the creation of RDF Instance Data for associated RDF Ontologies via RDF VIEWs of ODBC, JDBC, ADO.NET, and OLE-DB accessible SQL Data. It is important to note that these VIEWs also apply to Native Virtuoso Data and/or Heterogeneous Data from other Web Services, HTTP/WebDAV, NNTP, and other Data Sources known to Virtuoso. This is an enhancement of the traditional SQL VIEW concept that enables multiple use of the same base SQL Data from a variety of data access points.
  • Triplify is a small plugin for Web applications, which reveals the semantic structures encoded in relational databases by making database content available as RDF, JSON or Linked Data.
  • RDBToOnto is a full-fledged conversion tool that can produce accurate RDF/OWL models from various types of relational databases and Excel spreadsheets. The conversion is fully automated while various parameters can be set through the user interface to refine the resulting models (e.g., derivation of rich class hierarchies, proper naming of instances, database optimization before conversion, etc).
  • morph or morph implement R2RML and perform a transformation from RDB to RDF.

Some RDF Triple stores are implemented using SQL databases, but that is not covered here.
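The simplest relational-to-RDF mapping treats each row as a resource identified by its primary key and each column as a property. The sketch below shows this against an in-memory SQLite database. It does not reflect the mapping languages of D2RQ, Virtuoso, or R2RML; the base URI, the URI patterns, and the function name are assumptions, and the table and key names are trusted input interpolated into the query.

```python
import sqlite3


def table_to_triples(conn, table, key, base="http://example.org/db/"):
    """Yield (subject, predicate, object) for every non-NULL, non-key cell."""
    cursor = conn.execute(f'SELECT * FROM "{table}" ORDER BY "{key}"')
    columns = [description[0] for description in cursor.description]
    for row in cursor:
        record = dict(zip(columns, row))
        subject = f"{base}{table}/{record[key]}"  # one URI per row
        for column, value in record.items():
            if column == key or value is None:
                continue  # NULL means "no statement", not an empty literal
            yield (subject, f"{base}{table}#{column}", value)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO person VALUES (?, ?, ?)",
    [(1, "Alice", "alice@example.org"), (2, "Bob", None)],
)

for s, p, o in table_to_triples(conn, "person", "id"):
    print(f'<{s}> <{p}> "{o}" .')
```

Foreign keys are where such a mapping becomes interesting: instead of a literal, the object would be the URI of the referenced row, which is what makes the output linked data.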

Subversion

Subversion is a code-management system.

  • svn2rdf A pair of scripts; one can be used in a post-commit subversion hook to generate RDF/N3 with each commit, the other on a working copy. (Simile)

TSV (Tab-Separated Values)

See also: Flat Files and CSV

  • tab2n3.py takes tab-separated text (as typically output by all kinds of things, including Microsoft Outlook and spreadsheets) and converts it to N3, using the column headings to generate property URIs. (SWAP)
  • TopBraid Composer can convert tab-separated spreadsheet files into an RDF/OWL class with corresponding properties and instances.
  • XLWrap wraps CSV files (and spreadsheets) to arbitrary RDF graphs; supports local/HTTP loading, expressions similar to Excel/OpenOffice Calc, custom functions, usage via API or SPARQL endpoint

Talis SW Format Converter

  • Talis' converter converts from various formats to various formats (including RDF->RDF with various serializations, RDF->HTML, etc.)

UML

  • TopBraid Composer can convert UML Class Diagrams (XMI format) into RDF/OWL models.
  • EulerGUI is a lightweight IDE that translates UML and eCore XMI into N3 on the fly. In addition, there are N3 rules to convert UML to OWL.

VCARD, Addressbook, …

vCard is a standard for interchange of contact data, such as business cards and address books.

"Representing vCard Objects in RDF/XML" is a W3C note defining an ontology for vCard. FOAF is a widely used ontology covering some of the domain.

Weather

  • weather2rdf Given a US city or ZIP code, retrieves weather report data from weather.com and returns it in RDF. (Simile)

XML

  • GRDDL: Any XML files can be marked up with pointers to XSLT files which convert them to RDF. The standard for this is GRDDL. A GRDDL pointer can even be put in an XML schema, so that automatically all XML documents written to that schema will have a defined RDF mapping which any GRDDL-aware processor will benefit from. Several XSLT conversion transformations can be found linked from MicroModels
  • Krextor is a framework for extracting RDF in various notations from various XML languages and can easily be extended for additional input languages. Support for RDFa and some mathematical markup languages is built in. The implementation is done in XSLT, with a command-line frontend and a Java wrapper.
  • TopBraid Composer can convert XML Schema (and their XML instance files) into RDF/OWL models.
  • Rhizomik ReDeFer includes XSD2OWL and XML2RDF plus MPEG-7 to RDF (all XSLT-based)
  • XHTML: Convert existing pages to RDF. For example, see HtmlToRdf.
  • SPARQL2XQuery The SPARQL2XQuery Framework provides mechanisms for: (a) Query translation (SPARQL to XQuery) (b) Mapping specification & generation (Ontology to XML Schema) (c) Schema transformation (XML Schema to OWL) and (d) Data Transformation (XML to RDF and vice versa)

XMP

XMP is an Adobe-sponsored specification for putting RDF metadata in virtually any form of file, including binary formats. XMP metadata is RDF data in fact, but it has to be extracted from the file.

Frameworks

The following are general tools which provide conversion from many formats.

AnnoCultor

AnnoCultor was built during several years of practical work on porting various datasets to RDF. It allows converting data from the following data sources:

  • databases via SQL and JDBC;
  • XML files, also in batch;
  • RDF files,
  • Solr servers,
  • custom formats, via format-specific parsers written in Java.

AnnoCultor is specifically suited for the situations where XSLT is not sufficient.

It comes with built-in converters for Geonames and Getty vocabularies (AAT, ULAN, TGN), that are ready to use. Several additional specific converters illustrate advanced use: converters for collections of Louvre and Joconde, Institute Collection Netherlands, Dutch Museum of Asian Ceramics, Tropenmuseum Amsterdam.

As part of conversion, AnnoCultor can semantically tag (enrich) data with links to various vocabularies, with advanced customised disambiguation and term processing possibilities. These vocabularies should be represented in RDF or SKOS to be imported via SPARQL queries. AnnoCultor comes with built-in tagging with Geonames and a custom time ontology.

AnnoCultor is written in Java, but conversion rules are written in XML. They are extensible with either small Java snippets or custom rule implementations in Java. AnnoCultor has been used in practice with datasets ranging from a few records to more than ten million, containing up to dozens of fields each.

Apache Any23

Apache Any23 is a Java library, web service, and command-line tool for parsing multiple document formats and extracting structured data in RDF format from a variety of Web documents. Currently it supports the following input formats:

  • RDF/XML, Turtle, Notation 3
  • RDFa
  • Microformats: Adr, Geo, hCalendar, hCard, hListing, hResume, hReview, License, XFN and Species

Apache Any23 is used in major Web of Data applications such as sindice.com and sig.ma.

Aperture

  • Aperture is a project written in Java gathering RDF extractors for many formats, mentioned in the list above.

Aperture supports crawling, making it not a converter but a framework to crawl updates of data (like rsync).

PiggyBank

  • Piggy-bank is a Simile project which allows the Firefox-based client to automatically load "RDFizers", JavaScript-based converters to RDF.

Piggy-bank associates given scraping scripts with given web sites. (How?)

Triplr

Triplr is a general “Stuff in, triples out” system by Dave Beckett. Triplr handles GRDDL, RSS, Atom, and other formats.

Virtuoso Sponger

OpenLink Software via the "Sponger" component of Virtuoso's SPARQL Processor and Proxy Web Service (used by default by OpenLink Data Explorer) provides RDFization for:

  • RDFa
  • GRDDL
  • Amazon Web Services
  • eBay Web Services
  • Freebase Web Services
  • Facebook Web Services
  • Yahoo! Finance
  • XBRL Instance documents
  • DOI (includes a custom resolver for HTTP)
  • OAI
  • RSS/Atom Feeds
  • Digital Music Files (various formats via ID3 Tags)
  • Image Files
  • vCard
  • iCalendar
  • Microformats - hCard, hCalendar
  • HR-XML Resumes
  • Flickr
  • Del.icio.us
  • Bugzilla
  • ODBC or JDBC accessible SQL Data
  • Many others

Notes

Historically, this list was made from lists of RDFizers and SWAP converters. It has grown significantly from community input since then.

This should be in a data format like Semantic Media Wiki or in N3 -- TimBL

> Would there be an advantage to having this kind of list in an RDF file, specifically to make queries on it? Maybe if we add a format on how to declare it here, we could create a converter to RDF. -- KarlDubost

> The task force InfoGathering from SWEO works on such a vocabulary, if you want to rewrite this list using this vocab, look here: DataVocabulary or contact me -- LeoSauermann on 22.1.2007