WebSchemas/LookInside

Overview

Much useful data exists in the form of tables: inside HTML documents, as standalone files in various formats (CSV, XLS, ODF), or in relational database management systems. While tables are a natural representation of data for many applications, they do not carry the semantics that search engines need to index the information they contain effectively and surface it to users.

The goal of this proposal is to help bridge the gap between data tables and RDF triples by providing a simple mechanism for describing data tables, so that their contents can be understood in terms of entities and properties.

Mark-up example

Here is an example of RDFa mark-up for an HTML table:

<table typeof="SetOf/Painting" vocab="http://schema.org/">
  <thead>
    <tr>
      <th property="image">Image</th>
      <th property="name">Title</th>
      <th property="dateCreated">Year</th>
      <th>Technique</th>
      <th>Dimensions</th>
      <th property="contentLocation">Gallery</th>
    </tr>
  </thead>
  <tbody>...</tbody>
</table>


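The table is annotated with typeof="SetOf/Painting", which means that each row of the table body describes one Painting instance. Each annotated column header carries, in its property attribute, the name of a property of the Painting type; the Technique and Dimensions columns have no such annotation and are therefore not mapped to any property.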
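
One way to read these annotations: the SetOf/ prefix marks the table as a set, the part after it names the type of each row, and each annotated <th> supplies the property for its column, expanded against the vocab IRI. The following Python sketch applies that reading with the standard library's html.parser and produces one entity per body row. It is an illustration under assumptions, not the proposal's processing model: the names AnnotatedTableParser and table_to_entities are hypothetical, the sample row is made up because the original <tbody> is elided, cell values are treated as plain literals, and an <img> cell contributes its src.

```python
from html.parser import HTMLParser


class AnnotatedTableParser(HTMLParser):
    """Collects the table's typeof/vocab, the header properties and the body rows."""

    def __init__(self):
        super().__init__()
        self.type_ref = None      # typeof on <table>, e.g. "SetOf/Painting"
        self.vocab = ""           # vocab on <table>, e.g. "http://schema.org/"
        self.header_props = []    # one entry per <th>: property name or None
        self.rows = []            # one list of cell values per body row
        self._section = None      # "thead" or "tbody" while inside one
        self._row = None          # cells of the body row being read
        self._cell = None         # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "table":
            self.type_ref = attrs.get("typeof")
            self.vocab = attrs.get("vocab") or ""
        elif tag in ("thead", "tbody"):
            self._section = tag
        elif tag == "th" and self._section == "thead":
            self.header_props.append(attrs.get("property"))
        elif tag == "tr" and self._section == "tbody":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []
        elif tag == "img" and self._cell is not None:
            self._cell.append(attrs.get("src") or "")  # image cells yield their URL

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
        elif tag in ("thead", "tbody"):
            self._section = None


def table_to_entities(html):
    """Return one dict per body row, keyed by full property IRIs."""
    parser = AnnotatedTableParser()
    parser.feed(html)
    prefix = "SetOf/"  # assumption: "SetOf/X" means "every row is an X"
    if not parser.type_ref or not parser.type_ref.startswith(prefix):
        raise ValueError('expected typeof="SetOf/<Type>" on the table')
    row_type = parser.vocab + parser.type_ref[len(prefix):]
    entities = []
    for row in parser.rows:
        entity = {"@type": row_type}
        for prop, value in zip(parser.header_props, row):
            if prop:  # columns without a property (Technique, Dimensions) are skipped
                entity[parser.vocab + prop] = value
        entities.append(entity)
    return entities


# Hypothetical sample: the header from the example above plus one made-up row.
SAMPLE = """
<table typeof="SetOf/Painting" vocab="http://schema.org/">
  <thead>
    <tr>
      <th property="image">Image</th>
      <th property="name">Title</th>
      <th property="dateCreated">Year</th>
      <th>Technique</th>
      <th>Dimensions</th>
      <th property="contentLocation">Gallery</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><img src="night-watch.jpg"></td><td>The Night Watch</td><td>1642</td>
      <td>Oil on canvas</td><td>363 x 437 cm</td><td>Rijksmuseum</td>
    </tr>
  </tbody>
</table>
"""

if __name__ == "__main__":
    for entity in table_to_entities(SAMPLE):
        print(entity)
```

Each resulting dict has an @type of http://schema.org/Painting and four property IRIs; the unannotated Technique and Dimensions cells are dropped rather than guessed at.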

JSON-LD example

For tables that are not in HTML, a comparable description can be provided in JSON-LD. Assuming the same table is available as a CSV file at http://wp.org/rembrandt-paintings.csv:

{
  "@context": "http://schema.org/",
  "@type": "SetOf/Painting",
  "name": "{http://wp.org/rembrandt-paintings.csv#col:Title}",
  "dateCreated": "{http://wp.org/rembrandt-paintings.csv#col:Year}",
  "contentLocation": "{http://wp.org/rembrandt-paintings.csv#col:Gallery}",
  "author": "http://en.wikipedia.org/wiki/Rembrandt"
}
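
The {...#col:Header} placeholders appear to name a CSV column by its header, while a plain value such as author applies to every row. A minimal Python sketch of that reading follows. The placeholder regex, the function expand_description and the sample CSV content are hypothetical; the sketch takes the CSV text as an argument instead of fetching the URL, and it does not check that every placeholder points at the same file. Because the description above has no image entry, the Image column is not mapped.

```python
import csv
import io
import re

# Hypothetical pattern for "{<csv-url>#col:<Header>}" placeholders; the full
# proposal may define the syntax differently.
COL_REF = re.compile(r"^\{(?P<url>[^#{}]+)#col:(?P<column>[^{}]+)\}$")


def expand_description(description, csv_text):
    """Turn one SetOf description plus CSV text into one entity per CSV row."""
    prefix = "SetOf/"
    set_type = description["@type"]
    if not set_type.startswith(prefix):
        raise ValueError("expected an @type of the form SetOf/<Type>")
    row_type = set_type[len(prefix):]

    column_props = {}    # property -> CSV header it reads its value from
    constant_props = {}  # property -> value shared by every row
    for key, value in description.items():
        if key.startswith("@"):
            continue  # @context and @type are handled separately
        match = COL_REF.match(value) if isinstance(value, str) else None
        if match:
            column_props[key] = match.group("column")
        else:
            constant_props[key] = value

    entities = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        entity = {"@context": description.get("@context"), "@type": row_type}
        for prop, column in column_props.items():
            entity[prop] = row[column]
        entity.update(constant_props)
        entities.append(entity)
    return entities


DESCRIPTION = {
    "@context": "http://schema.org/",
    "@type": "SetOf/Painting",
    "name": "{http://wp.org/rembrandt-paintings.csv#col:Title}",
    "dateCreated": "{http://wp.org/rembrandt-paintings.csv#col:Year}",
    "contentLocation": "{http://wp.org/rembrandt-paintings.csv#col:Gallery}",
    "author": "http://en.wikipedia.org/wiki/Rembrandt",
}

# Hypothetical stand-in for the file at http://wp.org/rembrandt-paintings.csv
SAMPLE_CSV = """Image,Title,Year,Technique,Dimensions,Gallery
night-watch.jpg,The Night Watch,1642,Oil on canvas,363 x 437 cm,Rijksmuseum
"""

if __name__ == "__main__":
    for entity in expand_description(DESCRIPTION, SAMPLE_CSV):
        print(entity)
```

Separating column-bound properties from constants up front keeps the per-row loop trivial; every row inherits the same author value.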


More details are available in the full proposal.


Background Research & Related Work

R, Mathematica, Matlab and Octave