WebSchemas/LookInside
Overview
Much useful data exists in the form of tables, whether inside HTML documents, as standalone files in various formats (CSV, XLS, ODF), or in relational database management systems. While tables are a natural representation of data for many applications, they do not carry the semantics that search engines need to index the information they contain effectively and to surface it to users.
The goal of this proposal is to help bridge data tables and triples by providing a simple mechanism for describing data tables, so that their contents can be understood in terms of entities and properties.
Mark-up example
Here is an example of the mark-up for an HTML table, in RDFa:
<table typeof="SetOf/Painting" vocab="http://schema.org/">
<thead>
<tr>
<th property="image">Image</th>
<th property="name">Title</th>
<th property="dateCreated">Year</th>
<th>Technique</th>
<th>Dimensions</th>
<th property="contentLocation">Gallery</th>
</tr>
</thead>
<tbody>...</tbody>
</table>
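The table is annotated with typeof="SetOf/Painting", which means that each row describes one Painting instance. Each annotated column header carries the name of a property of the Painting type. Column headers without a property attribute (here, Technique and Dimensions) carry no property mapping in this example.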
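To make the annotation scheme concrete, here is a minimal sketch of how a consumer might read the row type and the column-to-property mapping from the header above. It uses only Python's standard html.parser module. The class name TableAnnotationParser and the function read_annotations are hypothetical names chosen for illustration, and the handling of the "SetOf/" prefix is an assumption about the proposal's intent, not a definitive implementation.

```python
from html.parser import HTMLParser

SET_PREFIX = "SetOf/"  # assumption: this prefix marks a table whose rows are instances


class TableAnnotationParser(HTMLParser):
    """Collects the row type and the per-column property names from a table header."""

    def __init__(self):
        super().__init__()
        self.row_type = None  # e.g. "Painting"
        self.columns = []     # list of (header text, property name or None)
        self._in_th = False
        self._prop = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "table" and "typeof" in attrs:
            value = attrs["typeof"]
            # Strip the "SetOf/" prefix to get the type of each row.
            self.row_type = value[len(SET_PREFIX):] if value.startswith(SET_PREFIX) else value
        elif tag == "th":
            self._in_th = True
            self._prop = attrs.get("property")  # None for unannotated columns
            self._text = []

    def handle_data(self, data):
        if self._in_th:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "th" and self._in_th:
            self.columns.append(("".join(self._text).strip(), self._prop))
            self._in_th = False


def read_annotations(html):
    """Return (row type, [(header text, property or None), ...]) for one table."""
    parser = TableAnnotationParser()
    parser.feed(html)
    return parser.row_type, parser.columns


EXAMPLE = """
<table typeof="SetOf/Painting" vocab="http://schema.org/">
<thead>
<tr>
<th property="image">Image</th>
<th property="name">Title</th>
<th property="dateCreated">Year</th>
<th>Technique</th>
<th>Dimensions</th>
<th property="contentLocation">Gallery</th>
</tr>
</thead>
</table>
"""

row_type, columns = read_annotations(EXAMPLE)
```

With this mapping in hand, a consumer could walk the body rows and emit one Painting per row, using the cell in each annotated column as the value of the corresponding property.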
JSON-LD example
For non-HTML tables, an equivalent description can be provided in JSON-LD. Assuming the same table is available as a CSV at http://wp.org/rembrandt-paintings.csv :
{
  "@context": "http://schema.org/",
  "@type": "SetOf/Painting",
  "name": "{http://wp.org/rembrandt-paintings.csv#col:Title}",
  "dateCreated": "{http://wp.org/rembrandt-paintings.csv#col:Year}",
  "contentLocation": "{http://wp.org/rembrandt-paintings.csv#col:Gallery}",
  "author": "http://en.wikipedia.org/wiki/Rembrandt"
}
More details are available in the full proposal.
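As an illustration of how the JSON-LD description could be applied, the sketch below expands it against CSV rows to produce one entity per row. It is written under stated assumptions: a value of the form "{url#col:Name}" is read as a reference to the CSV column called Name, any other value (such as "author") is treated as a constant shared by all rows, and the "SetOf/" prefix is stripped to obtain the row type. The function name expand_rows and the two sample rows are illustrative only and are not taken from the actual file; the full proposal defines the real semantics.

```python
import csv
import io
import re

# Matches a whole-value column reference such as "{http://.../file.csv#col:Title}".
COLUMN_REF = re.compile(r"^\{(?P<url>[^#}]+)#col:(?P<column>[^}]+)\}$")
SET_PREFIX = "SetOf/"  # assumption: marks a description of a set of row entities


def expand_rows(description, csv_text):
    """Yield one JSON-LD-style dict per CSV row, following the description."""
    set_type = description["@type"]
    row_type = set_type[len(SET_PREFIX):] if set_type.startswith(SET_PREFIX) else set_type

    for row in csv.DictReader(io.StringIO(csv_text)):
        entity = {"@context": description["@context"], "@type": row_type}
        for key, value in description.items():
            if key.startswith("@"):
                continue  # keywords were handled above
            match = COLUMN_REF.match(value) if isinstance(value, str) else None
            # A column reference becomes that row's cell; other values are constants.
            entity[key] = row[match.group("column")] if match else value
        yield entity


DESCRIPTION = {
    "@context": "http://schema.org/",
    "@type": "SetOf/Painting",
    "name": "{http://wp.org/rembrandt-paintings.csv#col:Title}",
    "dateCreated": "{http://wp.org/rembrandt-paintings.csv#col:Year}",
    "contentLocation": "{http://wp.org/rembrandt-paintings.csv#col:Gallery}",
    "author": "http://en.wikipedia.org/wiki/Rembrandt",
}

# Illustrative sample rows (assumed content, not the real CSV file).
SAMPLE_CSV = (
    "Title,Year,Gallery\n"
    "The Night Watch,1642,Rijksmuseum\n"
    "The Anatomy Lesson of Dr. Nicolaes Tulp,1632,Mauritshuis\n"
)

entities = list(expand_rows(DESCRIPTION, SAMPLE_CSV))
```

Each resulting dict has "@type" set to "Painting", with "name", "dateCreated" and "contentLocation" filled from the row and "author" repeated unchanged. The sketch does not fetch the CSV from the URL in the reference; it assumes the caller has already obtained the file's text.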
Background Research & Related Work
R, Mathematica, Matlab and Octave
- The R language has a 'data frame' construct: http://www.r-tutor.com/r-introduction/data-frame
- Octave has cell arrays ( http://www.gnu.org/software/octave/doc/interpreter/Cell-Arrays.html ) and also a dataframe package ( http://octave.sourceforge.net/dataframe/overview.html ), which is close to the R approach.
- Matlab has recently added a similar construct; see http://www.mathworks.se/help/matlab/tables.html (via http://www.mathworks.se/help/matlab/release-notes.html and http://www.mathworks.se/products/matlab/whatsnew.html )
- The Matlab table construct may be closer to a Matlab/Octave structure (named fields) than to a cell array, though the two are related; it is also new in Matlab 2013b. (via b_jonas, jwe in #octave on IRC.freenode.net)
- pandas ( http://pandas.pydata.org/ ) provides a data frame in Python
- R/JSON/D3 discussion
- Mathematica: see http://www.wolfram.com/mathematica/new-in-9/built-in-integration-with-r/create-and-display-data-frames.html and http://mathematica.stackexchange.com/questions/19136/creating-a-r-dataframe-like-construct-in-mathematica