See also: IRC log
JeniT: approve previous meeting minutes?
...
...
(no comment...)
RESOLUTION: previous meeting minutes approved
DavideCeolin: on the use cases, five use cases were
assigned to me
... Jeremy already set up the requirements
... I am not sure they are really part of the others
JeniT: it would be useful to know whether there are any issues to be discussed
DavideCeolin: one was about provenance
... but this was discussed
... but it has changed a bit
... everything related to annotation is linked to this
... the possibility to map elements to URIs
... in some spreadsheets there are codes, and it would be useful to link
them to URIs
... I pointed that out as a separate requirement
... not sure whether this is covered by others
JeniT: I agree that is a requirement
... it is probably not noted elsewhere
DavideCeolin: is it covered by the external definition resource?
JeniT: keep it separate for now, and we will be able to come back to this later
DavideCeolin: there is also a requirement on
units of measure
... partially covered by the semantic requirements
... but I still kept it separate for now
JeniT: you mean that the semantic type might cover some part of the unit of measure?
DavideCeolin: I am not an expert on that; I am not sure what the best way to cover this is
JeniT: it may be the publisher's choice
whether this is something that goes into the semantic part
... thanks for that, it is really good
... our use case document is coming together
... anybody have any issues/questions?
...
...
(no comments)
<JeniT> http://w3c.github.io/csvw/syntax/
JeniT: following on from the last call, I have brought
the tabular data specification into the spec
... in the data model we said every column has a name; now we say every
column has an index
... the name of the column is part of the annotation, i.e., the annotated
data model
<JeniT> http://w3c.github.io/csvw/syntax/#annotated-model
JeniT: the annotated model talks about the
different types of annotations (table level, column level, etc.)
... these are the changes I made
... the first problem is issue 1
... is the order of rows significant in a table?
fresco: if you are using several tables in the same place, then it may be a problem
JeniT: in our model we have now one table only
yakovsh: if we follow the spreadsheet model, the row order is definitely significant
JeniT: agreed
<fresco> good point that references to individual cells rely on the row and column order being maintained
stasinos: one of the ways we are discussing
is an ID which then has the properties of the fields in a row
... we can then make it a requirement for that to represent a strict
order
yakovsh: RFC 7111 has row- and column-level references
<yakovsh> http://tools.ietf.org/html/rfc7111
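For reference, RFC 7111 defines fragment identifiers for text/csv such as `#row=7`, `#col=3`, and `#cell=4,2`. A minimal sketch of reading the single-value forms (the helper function is illustrative, not from the RFC, and it ignores the range forms like `row=5-7` that the RFC also allows):

```python
import re

# Illustrative parser for the simplest RFC 7111 fragment forms:
# "row=N", "col=N" and "cell=R,C" (single values only).
def parse_csv_fragment(fragment: str):
    m = re.fullmatch(r"(row|col)=(\d+)", fragment)
    if m:
        return (m.group(1), int(m.group(2)))
    m = re.fullmatch(r"cell=(\d+),(\d+)", fragment)
    if m:
        return ("cell", (int(m.group(1)), int(m.group(2))))
    raise ValueError(f"unsupported fragment: {fragment}")
```

Note that such references only stay stable if row and column order is preserved, which is the point made above.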
JeniT: any reason not to have it significant?
(no reaction...)
JeniT: in that case I will add it in
JeniT: next issue: in the annotated data
model I have annotations on tables, columns, rows, and cells
... I also have annotated regions
... I have a suspicion I put it in because it looked like a useful
generalization
... do we really need it, or should I take it out?
... any use case around that?
stasinos: one thing: I remember we were
discussing a situation of a cell being, say, the sum of other cells
... in that case there is the notion of a region
... in that case we may want to talk about general regions
JeniT: that is interesting because it brings
up referencing
... referencing cells in a random manner, in a way
... this is different from the current spec which talks about subtables
... I think that is a useful thing to say
stasinos: it is hard to tell without a real use case
fresco: referencing would be better as part of a separate specification
stasinos: that is reasonable
JeniT: in rfc 7111 we have that notion
<yakovsh> should RFC 7111 be referenced in our documents? I don't see it
<fresco> referencing abstract "table" regions (i.e. data parsed from a CSV file) vs referencing parts of a CSV file
ivan: do we really want to represent spreadsheet functionalities?
stasinos: it is not the reproduction of the functionalities, it is just to characterize the raw data itself
<stasinos> to be able to specify which regions have data, and which have derivatives (of any sort)
<fresco> requirement for row headings as well as column headings, to be able to say that a row is a derivative?
JeniT: what I will do is put in a note
that we talked about annotation regions; there may be some usage, but we
can defer this until we have more use cases
... is that a reasonable way forward?
+1
<stasinos> +1
<yakovsh> +1
<DavideCeolin> +1
fresco: the parser can draw up an index, and you can have headers and rows, i.e., you may want to specify the nature of rows and columns
JeniT: we do have annotated rows and columns
<JeniT> http://w3c.github.io/csvw/syntax/#syntax
JeniT: next issue is in section 3
... what I have done is cut it down; it does not talk about how to
input tabular data, but only how to output it
... that is the best practice for tabular data
JeniT: we are only looking at the
output for best practice
... issue 4 tries to make it as RFC-compatible as possible
... if we use the MIME type, that refers to the default and usual
character sets
... the issue is that we would like to say that UTF-8 is the default
yakovsh: I have discussed with the area
directors and it may be possible to amend the draft
... if there are specific suggestions for character sets
... there is also the issue of CR and LF
... I do not know about the default character set
... it is definitely possible to have a default character set if we get
guidance from W3C
... the issue currently says that the Content-Type header must be used
to set the character set
... I know that people do not change the content type anyway, let alone
the character set
... so it would be great if UTF-8 were the default
<JeniT> Section 4.1.1 of RFC2046 specifies that "The canonical form of any MIME "text" subtype must always represent a line break as a CRLF sequence. Similarly, any occurrence of CRLF in MIME "text" must represent a line break. Use of CR and LF outside of line break sequences is also forbidden."
<fresco> application/csv sounds like a good idea to me
JeniT: if we want to say that it is o.k. to use LF, then we have a problem using text/csv...
fresco: the old spec was ASCII, but all the
parsers ignored that
... the newer parsers fall back on UTF-8
... i.e., the specification could get away with using UTF-8
... most people ignore the original spec
JeniT: according to the RFC we should not
call it text/csv, only application/csv
... the line ending rule is quite clear
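As background on the CRLF point: Python's csv module already follows the RFC 4180 convention and writes CRLF line endings by default (when writing to a real file, the file should be opened with `newline=''` so the interpreter does not translate them again):

```python
import csv
import io

# csv.writer emits CRLF ("\r\n") line endings by default,
# matching the RFC 4180 / MIME "text" canonical form.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["a", "b"])
writer.writerow(["c", "d"])
assert buf.getvalue() == "a,b\r\nc,d\r\n"
```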
yakovsh: RFC 4180 was passed under the old
MIME guidelines, but those changed
... it is possible to change that
... I will go back and see what is involved
... I think it can be changed
... question: is there a byte order mark if the default is UTF-8?
JeniT: a BOM is usually optional with UTF-8
... you do not usually have to use it
... in practice, if you use it, you get horrible characters
... I would like to avoid that
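To illustrate the BOM point: in Python, decoding UTF-8 data that starts with a BOM using the plain 'utf-8' codec leaves a stray U+FEFF at the start (the "horrible characters"), while the 'utf-8-sig' codec strips it:

```python
# UTF-8 bytes with a leading byte order mark (U+FEFF).
data = "\ufeffname,value\r\n".encode("utf-8")

# Plain 'utf-8' keeps the BOM as a stray zero-width character...
assert data.decode("utf-8").startswith("\ufeff")

# ...while 'utf-8-sig' strips an optional leading BOM.
assert data.decode("utf-8-sig") == "name,value\r\n"
```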
yakovsh: I will talk to the applications working
group; CSV is not the only one that has this issue
... we will discuss that after the IETF meeting
... another question: RFC 4180 is an informational doc; if W3C really
wants that, IETF could push it through as a fast track
JeniT: yes, it would be good to have a
standard for CSV
... there have been other cases where the body of the work has been done by W3C
... if we can do this, that would be great
ivan: is any formal step necessary from W3C?
yakovsh: no, it should be o.k. without it
<fresco> parser parameters: https://github.com/hubgit/csvw/wiki/CSV-Parser-Parameters
<JeniT> https://github.com/hubgit/csvw/wiki/CSV-Parser-Notes
fresco: there are also notes looking at
the different parameters parsers use
... some of the things people have to specify
... there are 2-3 different sections
... character set, already discussed
... dialects of the CSV file (separators, whether white space should be
trimmed, how to select particular bits of the file to be used, i.e., what
is a comment line, etc.)
... there is also a separate set on how to transform data; that may be a
separate issue
... a lot of CSV parsers have these transformation fields in them
... one issue is trimming of white space
... one way is the '\' escape, the Unix way
... or the quotes, the Excel way
... the quoting is particularly relevant for output
... one may specify that things are put in quotes only if there
is a special character in the field
... that is basically it...
... I will clean it up
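A sketch of the two escaping conventions just mentioned, using Python's csv dialect parameters; `QUOTE_MINIMAL` quotes a field only when it contains a special character, which is exactly the output behaviour described above:

```python
import csv
import io

row = ["plain", "has,comma", 'has "quote"']

# The "Excel way": fields with special characters are wrapped in
# quotes, and embedded quotes are doubled; plain fields stay bare.
excel = io.StringIO()
csv.writer(excel, quoting=csv.QUOTE_MINIMAL).writerow(row)

# The "Unix way": no quoting at all; delimiters and quote
# characters are backslash-escaped instead.
unix = io.StringIO()
csv.writer(unix, quoting=csv.QUOTE_NONE, escapechar="\\").writerow(row)
```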
yakovsh: is there a list of applications that you looked at?
<fresco> https://github.com/hubgit/csvw/wiki/CSV-Parser-Notes
fresco: the big one is the Python csv parser, and
pandas in Python
... pandas has a lot of transformations, and a multi-index with several
header columns/rows
... it specifies the decimal and thousands separators
... Java has a nice one
... the standard one is the PHP csv parser, but it really only deals with
the standard case
<JeniT> https://github.com/theodi/csv-validation-research
JeniT: have you looked at this one?
fresco: the data package spec specifies only
a few parameters
... there are a few more parameters in common use
... they might become useful
JeniT: how do we move that into spec space?
... we could have it as a standalone spec
JeniT: or roll it into the syntax spec as a separate section
fresco: on the transformation side it is
an interesting question whether we would use these
... the different types of information, like comments and white space,
should be part of the syntax
... the transformation may be a separate specification
... that leaves us with the region specification
JeniT: the region selection should be part of the specification
stasinos: the question of how to describe a
region is another thing
... the syntax doc should have a best practice part
JeniT: my inclination is to roll it into the syntax spec, in a clearly separate section
<stasinos> ... and a much looser, permissive part describing what can be specified
JeniT: we need a separate spec for converting to, say, JSON; what we need here is how to convert into an abstract model
... AOB?
stasinos: there is a very interesting discussion on the problem of tables transformed into RDF graphs
JeniT: we took the decision with Dan not to discuss this until we have
the use case document out
... then, yes, we will get to it
... but we have to have the basic things done