CSV on the Web Working Group Teleconference -- 18 Mar 2015

<danbri1> jtandy: for json conversion. during csv into tabular mapping, urls get expanded. when we want them back into json they need to be compacted back again if they're in rdfa initial context, or using rdf type

jtandy: pointed out by ivan and gkellogg; during mapping of CSV into model URLs are expanded.

<danbri1> gkellogg; yes, things that are in the vocab range need to be recompacted

<danbri1> make them pretty again

<danbri1> compaction is really as simple as finding a prefix that has that url and substituting

ivan: the only minor disagreement is on where to put it; I feel that compaction is used for JSON only, but the concept may be useful elsewhere.

<danbri1> gkellogg: we could do it in the metadata doc, e.g. in appendix, but q is when to stop. Other versions might want to compact urls relative to the document base. Would you put that in there?

<danbri1> gkellogg: example of foaf vs foaf:foo

<danbri1> gkellogg: if you compact something ending ':' would you omit it?
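(Illustrative note, not part of the minutes.) The compaction gkellogg describes, finding a prefix whose namespace starts the URL and substituting it, might be sketched as below. The prefix table is a small hypothetical subset of the RDFa initial context, and `compact_url` is a name invented for this sketch. The open question about a URL that equals a namespace exactly (which would compact to something ending in ':') was not settled in the discussion; this sketch simply assumes such URLs are left uncompacted.

```python
# Hypothetical prefix table; a real one would be taken from the RDFa initial context.
PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "schema": "http://schema.org/",
}


def compact_url(url, prefixes=PREFIXES):
    """Return 'prefix:local' for the longest matching namespace, else the URL unchanged."""
    best = None
    for prefix, namespace in prefixes.items():
        # Assumption: require a non-empty local part, so a bare namespace
        # URL is never turned into something like 'foaf:'.
        if url.startswith(namespace) and len(url) > len(namespace):
            if best is None or len(namespace) > len(best[1]):
                best = (prefix, namespace)
    if best is None:
        return url
    prefix, namespace = best
    return prefix + ":" + url[len(namespace):]


print(compact_url("http://www.w3.org/1999/02/22-rdf-syntax-ns#type"))
print(compact_url("http://xmlns.com/foaf/0.1/"))
```

Choosing the longest matching namespace is a design choice of the sketch, made so that overlapping namespaces resolve predictably; the minutes do not specify it.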

<danbri1> why don't i put it in, see if people object to it as complex

jtandy: to confirm, we’ll add a section on URL compaction to the appendix to the metadata document on JSON-LD

… I’ll also need to update the examples for JSON output, because the schema.org MusicEvent listing uses rdf:type.

<danbri1> https://github.com/w3c/csvw/labels/Requires%20telcon%20discussion%2Fdecision 4!

<ivan> https://github.com/w3c/csvw/issues/344

<danbri1> JSON Common properties require removing JSON-LDisms #344

ivan: I think we’ve agreed to that in email.

jtandy: I wanted to confirm gregg’s proposal.

… I can take an action to insert that into the JSON doc.

<danbri1> https://github.com/w3c/csvw/issues/342

<danbri1> Behavior of Table Groups when starting with CSV #342

<JeniT> (what if it’s not a transformation)

<danbri1> gkellogg: q is … if you start the transformation from a csv, and that csv has a metadata doc which then introduces table group with other tables

<danbri1> … do you end up converting them all?

<danbri1> or just the table associated with the initial csv

<danbri1> part of the reasoning, is that if you wanted them all suggests you'd start with the metadata doc. By starting with a CSV seems to indicate you just want that one.

<danbri1> jenit: let's consider non-transform actions eg. validation

JeniT: considering the validation requirement, though, my position is that it’s perfectly okay to decide that it has some default that is “I will only pay attention to, and will suppress the output of …” any other CSV file.

… The question we should be answering is that if we’re creating a tabular data model, does it include such metadata or not.

<danbri1> +1 to jeni

<danbri1> (what tools actually do seems a more trivial detail, APIs and UI will handle that.)

… I think it should include the embedded metadata for all referenced CSV files, because people may point to the CSV files from web pages and try to embed graphics.

… Although they’re pointing to a single CSV file, the implication is that they’re also interested in referenced tables which you only discover by going to that metadata.

… When we’ve got a TableGroup, you really need to see it as all CSV files together.

… It doesn’t necessarily mean that all implementations need to get all the output. In terms of the model, we get maximum flexibility by looking at all the embedded metadata.

ivan: we have to have a clear statement what a conformant application is doing. My impression is that there are two approaches depending on the application scenario.

… I don’t know how we’d put that into the spec. We could say that applications must provide the ability to choose if just one CSV file is managed, or all the others as well, but it’s difficult to decide which one.

JeniT: we have a “suppressOutput” mechanism, so that the implementation could provide defaults which are automatically set, unless overridden by the user.

… Our definition about what metadata is retrieved needs to be the more general.

<danbri1> +1

… I don’t think it’s about different levels of conformance, but just add a note.

<danbri1> gkellogg: the other metadata that could be retrieved, if we start at a csv, we get its metadata. we can then identify other csv files, and retrieve them to get their embedded metadata i.e. their titles. But we wouldn't go retrieve the other metadata files that might be associated with those CSVs.

<danbri1> So the only other info we'd be getting is the titles, right?

<danbri1> jenit: precisely what i suggest - yes

<danbri1> gkellogg: would we also read the content of those files? or just the metadata?

<danbri1> would the model include all the rows, cols, cells from all the cells in the tablegroup?

<danbri1> jenit: way i see it is that the overall model is that

JeniT: as I see it, the model is that of a table group with tables, rows, columns, cells etc.

<danbri1> what you end up with is a tablegroup with tables in it, rows, cols, cells, for all the CSVs referenced from the metadata associated from 1st file.

… However, if you have an implementation which suppresses the output of those CSV files, so that even getting their embedded metadata is not of interest, and you’re not concerned about validation, then there’s no need to get that data.

… Conceptually it’s creating a model with all the tables, but practically, there’s no need to fetch that data.

ivan: can I as a user suppress output for any file except a specified file?

JeniT: a metadata document lists all the CSV files, and you are by definition listing all the CSV files.

… It would be creating an artificial metadata file based on ?

<danbri1> danbri: any proposal to close #342 ?

<JeniT> I said that the implementation would create an artificial metadata file based on the metadata that it’s retrieved, in which the suppressOutput flag is set on all the other CSV files aside from the one you start with

<JeniT> PROPOSAL: define that a model is created based on the embedded metadata in CSV files referenced from the metadata file retrieved from the CSV file; however note that implementations can choose to automatically define overriding metadata that suppresses the output from CSV files other than the one the implementation starts with
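(Illustrative note, not part of the minutes.) The "overriding metadata" idea in this proposal might look roughly like the sketch below: copy the table group description and set the suppress-output flag on every table other than the one the implementation started with. The `tables`, `url` and `suppressOutput` property names and the file names are assumptions made for illustration and may not match the draft under discussion; `suppress_other_tables` is a hypothetical helper.

```python
import copy


def suppress_other_tables(table_group, start_url):
    """Return a copy of the metadata in which only the starting table produces output."""
    overridden = copy.deepcopy(table_group)  # leave the retrieved metadata untouched
    for table in overridden.get("tables", []):
        if table.get("url") != start_url:
            # Assumed property name for the suppress-output flag.
            table["suppressOutput"] = True
    return overridden


# Hypothetical table group with three tables, one already suppressed.
group = {
    "tables": [
        {"url": "senior-roles.csv"},
        {"url": "junior-roles.csv"},
        {"url": "professions.csv", "suppressOutput": True},
    ]
}
overridden = suppress_other_tables(group, "senior-roles.csv")
print([t["url"] for t in overridden["tables"] if not t.get("suppressOutput")])
```

Under this reading the model still contains every table; only the output step differs, which is consistent with the point that the definition of what metadata is retrieved stays general.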

<danbri1> gkellogg: my concern re letting impl choose things is w.r.t. testing

<danbri1> i don't think we currently have tests with multiple established table groups

<danbri1> gkellogg: if we have such a test we need to be sure output is consistent

<danbri1> … too much app flexibility can interfere with testing

<danbri1> jtandy: public sector roles/salary example has tablegroup in it

jtandy: One example we have is roles and salaries...

<danbri1> jenit: suggest that the tests all make the assumption that there is no additional user defined metadata aside from whatever might be provided for the test

JeniT: have tests that have table groups with tables being suppressed, and one where they’re not suppressed

<danbri1> gkellogg: how would we express that? you suggest we add an option to control if output is suppressed.

<danbri1> jenit: we have that option in the metadata already in there, suppress output property

<jtandy> (the public sector roles and salaries example has two tables not suppressed and one that is suppressed)

<danbri1> gkellogg: if we have a test e.g. public sector roles/salaries, and input file is the senior roles csv, then a conforming application would output a table group with senior roles, junior roles, and 3rd table

<danbri1> jenit: yes

<danbri1> gkellogg: i believe we have another example too

<danbri1> resolution then to this, starting w/ a csv file is no diff from starting from the metadata file associated with it

<danbri1> jenit: yes

<jtandy> (the third table in this example is gov.uk/professions.csv ... and this is marked as suppressed)

<danbri1> prop was "define that a model is created based on the embedded metadata in CSV files referenced from the metadata file retrieved from the CSV file; however note that implementations can choose to automatically define overriding metadata that suppresses the output from CSV files other than the one the implementation starts with"

<JeniT> PROPOSAL: starting from a CSV file generates the same table model as starting from the metadata file that the CSV file references

<danbri1> +1

<DavideCeolin> +1

<jtandy> +0

+0.1

<ivan> +1

<JeniT> +1

<ivan> RESOLVED: starting from a CSV file generates the same table model as starting from the metadata file that the CSV file references

<danbri1> 2 left in https://github.com/w3c/csvw/labels/Requires%20telcon%20discussion%2Fdecision

<danbri1> https://github.com/w3c/csvw/issues/306 The table annotations in the csv2* documents? #306

jtandy: when I did the first big edit on the CSV conversion documents, we talked about callouts and examples separate. We also talked about restructuring examples so that example snippets went into the body, and the examples themselves into an annex.

… do we still want to do that given the short amount of time left?

ivan: the goal of publishing now is to have all technical issues solved. If there are editorial/beautification left till later, that should be okay.

<JeniT> https://rawgit.com/w3c/csvw/hiding-annotation-details/csv2json/index.html#example-countries

… That being said, before this call, I played with some JavaScript stuff to hide these detailed annotations; have a look at that to see if it’s useful.

<JeniT> +1 to having these buttons to show/hide the detailed model

ivan: if the default is “show”, it would show up in epub/print.

… Some epub readers understand these things, some don’t.

jtandy: just looking at it now, it does get rid of a lot of tables.

<JeniT> +1 yes please ivan

ivan: I’ll take care of this tomorrow with a PR.

<danbri1> We should also consider the accessibility aspect. i.e. if you're on a reader, how to skip these verbose tables.

jtandy: when gkellogg restructured the roles document, I missed the opportunity to put the examples back in. I’ll do that later.

<ivan> https://github.com/w3c/csvw/issues/175

<danbri1> https://github.com/w3c/csvw/issues/175 Need default metadata #175

ivan: JeniT asked me to go through and see if all defaults are settled. I believe they are; however, I tried to find in the document an answer on what happens if I have a CSV file with one row being the title.

… The implication that this should be the expanded metadata is there, but it seems to be hidden. This is probably the most likely usage of embedded metadata.

… I wonder if this should not be considered “standard” embedded metadata, and not just an implementation thing.

<JeniT> http://w3c.github.io/csvw/syntax/#embedded-metadata

JeniT: do you think it would work for the section on embedded metadata to have a very simple example?

ivan: that would help a lot.

… My question is, when the CSV file has only one header line whether that should be considered normative or not.

<danbri1> i.e. an empty table?

<jtandy> (the simple example with one header line is at http://w3c.github.io/csvw/csv2json/#example-countries)

<JeniT> http://w3c.github.io/csvw/syntax/#headers

… If the dialect says there is one header line, i.e. the simplest dialect, in that case, embedded metadata MUST be used as column titles.

JeniT: 6.4.1 says that it should contain the names of columns.

ivan: that’s informative, not normative.

<danbri1> proposal: it's fine. close the issue. celebrate and publish.

JeniT: we’d need to change the charter to do something normative around CSV files.

ivan: JeniT proposed to add an additional example into 4.2, which I agree with.

<danbri1> bye!

jtandy: the “countries” example used in the conversion document is nice and simple.

<jtandy> see http://w3c.github.io/csvw/csv2json/#example-countries

<danbri1> https://github.com/w3c/csvw/issues/175

<ivan> "Discussed on 2015.3.18, agreed to add a simple example into the syntax document at section 4.2 showing a table with a single header line"
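(Illustrative note, not part of the minutes.) The single-header-line case discussed here, where the header row supplies the column titles as embedded metadata, could be sketched as follows. The `tableSchema`, `columns` and `titles` property names, the sample data and the `embedded_metadata` helper are all assumptions for illustration; the minutes do not fix the exact shape of the embedded metadata.

```python
import csv
import io


def embedded_metadata(csv_text, url):
    """Build a metadata description whose column titles come from the first row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # an empty file yields no columns
    return {
        "url": url,
        # Assumed structure: one column description per header cell.
        "tableSchema": {"columns": [{"titles": [title]} for title in header]},
    }


# Hypothetical sample in the spirit of the "countries" example.
sample = "country,name\nAD,Andorra\nAE,United Arab Emirates\n"
metadata = embedded_metadata(sample, "countries.csv")
print([column["titles"] for column in metadata["tableSchema"]["columns"]])
```

The sketch assumes the simplest dialect mentioned in the discussion: exactly one header line, comma-separated, with no skipped rows.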

<danbri1> https://github.com/w3c/csvw/labels/Requires%20telcon%20discussion%2Fdecision

<ivan> https://github.com/w3c/csvw/labels/Requires%20telcon%20discussion/decision

<danbri1> "Bummer, we couldn't find anything."

<danbri1> ivan: publishing by end of March is probably too tight, but 1st week April is realistic.

ivan: there are still some editorial things to be done; I think publishing by the end of march will be a bit too tight. But the first week of April is realistic.

<danbri1> Formally these are drafts

… In any case, it’s just a draft.

… Not a formal “Last Call”

… For now the goal is to have the specs out; we will have to show that there is wide review, so they’ll need to go out to specific groups for comment.

… The high-priority thing is to have the docs done and published. Also, we need to be more careful about important changes we’ve made.

<Zakim> jtandy, you wanted to ask if there's appetite to refactor csv2rdf in a similar way to csv2json?

jtandy: I just want to make the point that the use case document still needs to have requirements elaborated, but that shouldn’t hold up publication.

… We’ve refactored csv2json; who’d have thought that the rdf implementation would be simpler!

<jtandy> :-)

ivan: it’s strange but true that in the RDF case, generation is relatively simple. The point is that the complicated stuff for RDF is done by that environment.

… serializing to Turtle needs to go through an algorithm similar to what’s in the JSON document.
