dwbp-ISSUE-239 (Laufer): machine-readable standardized data formats - serialization data formats - dataset formats [Best practices document(s)]

http://www.w3.org/2013/dwbp/track/issues/239

Raised by: Carlos Laufer
On product: Best practices document(s)

In Best Practice 14, "Use machine-readable standardized data formats", the term "data format" is used to mean the serialization format of a dataset distribution.

The example uses GTFS (https://developers.google.com/transit/gtfs/reference), a standard way of distributing timetables. Two standards are involved here: GTFS (structure and serialization) and CSV (serialization). A GTFS feed is distributed as a set of CSV files packaged in a single .zip file.

The previous BP examples use timetables, but they do not say explicitly whether the data is a GTFS feed. It could be in any format, and it appears to be a single file containing all the information, distributed in different formats such as CSV, JSON, TTL, etc. But GTFS is a standard that defines more than the serialization format (a set of CSV files). It also defines the structure and the meaning of the data (a set of specifically named files and a vocabulary).

Standardized serialization formats carry semantics about how a machine understands the meta-model of the different ways of distributing data; the data itself sits inside this packaging. That data can in turn follow a standard of its own: a vocabulary, or a more complex distribution structure such as GTFS.
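
To make the two levels concrete, here is a rough Python sketch, not taken from the BP document. The first function works purely at the serialization level (CSV files inside a zip) and knows nothing about GTFS. The second checks a small part of the structure that GTFS adds on top. The EXPECTED_COLUMNS table is only an illustrative subset that I picked for this example (the GTFS reference defines the full set of files and fields), and the helper names and sample rows are hypothetical.

```python
import csv
import io
import zipfile

# Illustrative subset only (an assumption for this sketch); the GTFS
# reference defines the complete list of files and fields.
EXPECTED_COLUMNS = {
    "stops.txt": {"stop_id", "stop_name"},
    "routes.txt": {"route_id", "route_type"},
}


def read_csv_members(zip_bytes):
    """Serialization level: parse every zip member as CSV, with no GTFS knowledge."""
    tables = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            # utf-8-sig drops a byte order mark if the file starts with one.
            text = archive.read(name).decode("utf-8-sig")
            reader = csv.DictReader(io.StringIO(text))
            rows = list(reader)
            tables[name] = {"columns": list(reader.fieldnames or []), "rows": rows}
    return tables


def missing_structure(tables):
    """Structure level: list the expected files or columns that are absent."""
    problems = []
    for name, columns in EXPECTED_COLUMNS.items():
        if name not in tables:
            problems.append(f"missing file: {name}")
            continue
        present = set(tables[name]["columns"])
        for column in sorted(columns - present):
            problems.append(f"{name}: missing column {column}")
    return problems


def make_sample_feed(files):
    """Build an in-memory zip from a {file name: CSV text} mapping (sample data only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, content in files.items():
            archive.writestr(name, content)
    return buffer.getvalue()


if __name__ == "__main__":
    feed = make_sample_feed({
        "stops.txt": "stop_id,stop_name\nS1,Central\n",
        "routes.txt": "route_id,route_short_name\nR1,10\n",
    })
    tables = read_csv_members(feed)
    print(sorted(tables))
    print(missing_structure(tables))
```

Any zip of well-formed CSV files passes the first step, so a generic CSV consumer can read it. Only the second step can tell whether the content is usable as a timetable feed, which is the distinction I would like the document to draw.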

I think this difference should be made clear in the document. It might be worthwhile to have a BP covering things like GTFS. I cannot find a BP that addresses this: using standards for publishing datasets for specific domains or applications.

Received on Wednesday, 17 February 2016 18:22:17 UTC