DataStore, Layers and legacy files from Paul Klink on 2015-02-25 (public-csv-wg@w3.org from February 2015)

From: Paul Klink <paul@klink.id.au>
Date: Wed, 25 Feb 2015 19:40:09 +1100
To: public-csv-wg@w3.org
Message-ID: <54ED8A69.8030803@klink.id.au>

Hi all,

I am the guy working on the "Fielded Text" standard mentioned in
December. I just did an update to the standard
(http://www.fieldedtext.org/Standard) and, while doing so, gave some
thought to how the aims of Fielded Text compare with the aims of "CSV
on the Web".

Fielded Text focuses on standardising the encoding and decoding of
tabular data into text for transport purposes. It aims to support as
wide a range of text formats as possible (delimited and fixed width)
and to provide as much compatibility with existing text files as
possible. The Meta in Fielded Text is limited to only that which is
needed for encoding and decoding. It recognises attributes or behaviour
that are implicit in text files with tabular data (e.g. headings,
comments, null values) and only adds a few considered essential to
encoding (e.g. typed fields, field names and Ids). It also adds some
attributes to support round tripping (e.g. write formats).
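
As a rough illustration of round tripping with a write format, here is
a minimal Python sketch. The dictionary layout and the field name are
hypothetical, and Python's strftime codes merely stand in for a format
attribute; none of this is Fielded Text's actual Meta syntax.

```python
from datetime import date, datetime

# Hypothetical field meta: just enough to decode the text and write it back.
# The "format" value uses Python strftime codes as a stand-in.
field_meta = {"name": "Received", "type": "date", "format": "%d/%m/%Y"}


def decode(text: str, meta: dict) -> date:
    """Turn the stored text into a typed value using the field's format."""
    return datetime.strptime(text, meta["format"]).date()


def encode(value: date, meta: dict) -> str:
    """Write the typed value back using the same format."""
    return value.strftime(meta["format"])


original = "25/02/2015"
value = decode(original, field_meta)
round_tripped = encode(value, field_meta)
```

Because the same format is used in both directions, the text written
out matches the text that was read in.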

A good analogy to Fielded Text is string encoding/decoding. If you want
to move text from one system to a different system, you will encode it
to one of the well-known formats (say UTF-8 or an MBCS). The person at
the other end will then be able to decode it using standardised methods
to import the text into their system.
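
The analogy can be shown in a few lines of Python. The sample string is
made up for illustration.

```python
# Sender: encode the text to a well-known format (UTF-8) for transport.
text = "Café, 10 €"
wire_bytes = text.encode("utf-8")

# Receiver: decode with the same standardised codec.
received = wire_bytes.decode("utf-8")
```

Neither side needs to know anything about the other's system, only the
name of the encoding.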

As I see it, "CSV on the Web" is more focused on publishing (as opposed
to transport). The Meta data for "CSV on the Web" assigns a far greater
number of attributes to the tabular data. The aim with this seems to be
to provide more information about the data within the files, describe
linkages between files, assist with transformations and control access
to them. In my view it seems like it's aiming to be a Text Database
focused on publishing, using CSV as the data store.

After I considered the above, I realised that Fielded Text covers a
subset of "CSV on the Web": specifically, access to tabular data in
the data store.

In .NET Microsoft defined a couple of interfaces which could be
construed as providing layers to data store access. These are
IDataRecord and IDataReader.

These are documented at:
-
https://msdn.microsoft.com/en-us/library/system.data.idatarecord%28v=vs.110%29.aspx
-
https://msdn.microsoft.com/en-us/library/system.data.idatareader%28v=vs.110%29.aspx

It was surprisingly easy to implement these interfaces in my
implementation of FieldedText:
http://sourceforge.net/p/tfieldedtext/code/ci/default/tree/delphi/2/Xilytix.FieldedText.DotNetDataReader.pas

After having said all of the above, here is a suggestion.

If "CSV on the Web" defined layers similar to the above for accessing
the Data Store, other standards such as Fielded Text could be used to
specify the implementation of the Data Store.

For example, "CSV on the Web" Meta would define a field's name, data
type and headings and then Fielded Text's Meta would define how that
field is actually stored (Delimited or Fixed Width, delimiter character,
format picture strings).
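
A minimal Python sketch of this split follows. All class and function
names are hypothetical and belong to neither standard: LogicalField
plays the role of the "CSV on the Web" Meta, the two storage classes
play the role of Fielded Text's Meta, and read_records is the access
layer between them. Quoting, headings and null handling are left out.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, List


@dataclass
class LogicalField:
    """Logical description of a field: name, heading and data type."""
    name: str
    heading: str
    convert: Callable[[str], object]


@dataclass
class DelimitedStorage:
    """Storage description for delimited text (no quoting support)."""
    delimiter: str = ","

    def split(self, line: str) -> List[str]:
        return line.split(self.delimiter)


@dataclass
class FixedWidthStorage:
    """Storage description for fixed width text."""
    widths: List[int]

    def split(self, line: str) -> List[str]:
        values, pos = [], 0
        for width in self.widths:
            values.append(line[pos:pos + width].strip())
            pos += width
        return values


def read_records(lines: Iterable[str], fields: List[LogicalField],
                 storage) -> Iterator[Dict[str, object]]:
    """Access layer: yields typed records whatever the storage is."""
    for line in lines:
        raw = storage.split(line.rstrip("\n"))
        yield {f.name: f.convert(text) for f, text in zip(fields, raw)}


fields = [LogicalField("id", "Id", int), LogicalField("city", "City", str)]

delimited = list(read_records(["1,Sydney", "2,Perth"],
                              fields, DelimitedStorage(",")))
fixed = list(read_records(["1  Sydney", "2  Perth "],
                          fields, FixedWidthStorage([3, 6])))
```

Both calls produce the same records, so code written against the access
layer does not need to know which storage description was used.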

The upside would be access to different types of data stores,
potentially providing access to a large number of 'legacy' text files.
The downside is that the standard would be less constrained, so
implementations would be harder to write and might not provide
complete coverage.

Anyway, I am just floating it as an idea. Hopefully you consider it
relevant.

Regards
Paul

Received on Wednesday, 25 February 2015 08:40:34 UTC