CSV on the Web Working Group Teleconference -- 12 Nov 2014

see also http://lists.w3.org/Archives/Public/public-csv-wg/2014Nov/0029.html (link added to agenda wiki)

<danbri> http://lists.w3.org/Archives/Public/public-csv-wg/2014Nov/0029.html

<danbri> <- issue list

danbri: trying to be more driven by GitHub issues.

… We’ll postpone message issues until JeniT comes on.

<danbri> i'll look up the others later

Discussion Issues from Mappings

jtandy: ivan is en route to Australia.

… We can talk on teleconf when no progress made on GitHub.

<danbri> https://github.com/w3c/csvw/issues/62 Should the RDF/JSON transformation check the values? - CSV to JSON mapping, CSV to RDF mapping

… first: issue #62

… we thought the mapping issues could assume all parsing has already been done.

<danbri> (see github for detailed text, we are not reminuting it all)

… question is, if we can’t perform a transformation, do we proceed or die?

… My thought is that we can assume parsing has already thrown out bad data.

danbri: I think this touches on what is being defined: conformance criteria or behavior.

… If it’s a class of software, we should just do the easiest thing. Advanced implementations might do better without being non-conformant.

… A perfectly conformant processor could just pass everything through.

<danbri> eg. we might say "conformant processors are not required to …" so we only require pass through

<danbri> gregg: it adds complexity/branching to testing

jtandy: summarizing gkellogg, in order to give us a spec which is testable, we’ll go with the pass-through.

<danbri> dan proposing simple pass through

<danbri> gregg: advanced processing needs to be under control of a flag, so all processors can be guaranteed to give same output from same input

jtandy: I’ll take an action to update GitHub.

<danbri> I suggest a "Resolution (of those present): " since we have low turnout

resolution of those present: different processor flags to control parsed or raw output.

… pass through literal values in conformant mode, and allow advanced processors to do some contextual parsing/fixing

<danbri> conformant mode passes through literal values, advanced processors may offer additional contextual checking/fixing (via flags)

<danbri> gregg: additionally, processors may support an advanced processing flag, which will allow us to test that advanced processors produce consistent output (if that's not over-constraining them)

<danbri> jtandy to capture this into github issue

jtandy: I’ll pass that through, and when we come to actual implementations, we’ll check back.
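
A minimal sketch of the resolution on issue #62, assuming a hypothetical `map_cell` helper and a hypothetical `advanced` flag (neither name comes from the drafts). Conformant mode passes the literal value through untouched; the optional advanced mode attempts a datatype check and falls back to the raw string when the check fails, so the transformation never dies on a bad value.

```python
from decimal import Decimal, InvalidOperation


def map_cell(raw, datatype, advanced=False):
    """Map one already-parsed cell value for the JSON/RDF output.

    advanced=False is the pass-through behaviour agreed in the call.
    advanced=True is an illustrative guess at "contextual checking".
    """
    if not advanced:
        return raw  # conformant mode: literal value, unchanged

    text = raw.strip()
    if datatype == "integer":
        try:
            return int(text)
        except ValueError:
            return raw  # cannot check the value: keep the literal
    if datatype == "decimal":
        try:
            return Decimal(text)
        except InvalidOperation:
            return raw
    return raw
```

Because the extra behaviour sits behind a flag, two processors run with the same flag settings can be tested for identical output from identical input.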

issue 61:

what should the mapping of an empty cell be for RDF and JSON

danbri: 61 not flagged for discussion

<danbri> https://github.com/w3c/csvw/issues/59 How should ``language`` be used in RDF mapping? - CSV to RDF mapping

issue 59: how should language be used in JSON mapping?

<danbri> Ivan wrote " • If the content of the cell is not datatyped, and is not a URI, and the language tag's value is not "en", then the generated object should be a language literal with the global language tag set."

jtandy: I’ll come back on that one.

… “How should the locale setting be used in the default mapping?”

… unless the data itself is saying the language of a cell is different, the language of the metadata should be applied to every literal.

<danbri> gregg: lets say default lang was english from default mapping, does the json now tag all its values with English, or is that assumed and untagged?

<danbri> (non-jsonld, normal colloquial non-rdfy json)

<danbri> jtandy: "plain old json"

<danbri> jtandy: suggesting … we transform verbatim and don't add locale info

… I proposed that for Plain Old JSON (“POJ”) we don’t put the default locale from the metadata into the output.

… So, the information in the metadata says it’s German, but that is not reflected in the output.

… we assume people can determine this from the complementary language mapping.

<jtandy> PROPOSAL: for RDF mapping, apply locale / language tag from metadata to all literal values in output

jenit: I think it should be all string values, because the number 2 is a literal value without a language.

<danbri> +1

<bill-ingram> +1

<JeniT> +1

<jtandy> +1

<danbri> revisiting https://github.com/w3c/csvw/labels/Requires%20telcon%20discussion/decision

RESOLUTION: for RDF mapping, apply locale / language tag from metadata to all literal string values in output
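
A small sketch of what this resolution implies, assuming a hypothetical `rdf_literal` helper that emits Turtle-style literals. Only string values receive the language tag from the metadata; numbers and booleans are typed literals and carry no language, which is the distinction jenit raised.

```python
def rdf_literal(value, lang=None):
    """Serialise a Python value as a Turtle-style literal (illustrative only)."""
    # bool must be tested before int, since bool is a subclass of int.
    if isinstance(value, bool):
        return '"true"^^xsd:boolean' if value else '"false"^^xsd:boolean'
    if isinstance(value, int):
        return f'"{value}"^^xsd:integer'
    if isinstance(value, float):
        return f'"{value}"^^xsd:double'
    if isinstance(value, str):
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        # The metadata language applies to string literals only.
        return f'"{escaped}"@{lang}' if lang else f'"{escaped}"'
    raise TypeError(f"unsupported value type: {type(value).__name__}")
```

For example, with a metadata language of `de`, the string `Berlin` becomes `"Berlin"@de`, while the integer `2` stays `"2"^^xsd:integer`.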

<jtandy> PROPOSAL: for (plain old) JSON mapping, no locale information is added to the JSON output - we assume that people will look at the complementary metadata for locale information

<danbri> gregg: in our discussion we talked about locale coming from mapping info, as opposed to locale info that might come from the data itself

<danbri> e.g. use of a particular col or diff lang. Are you proposing that we drop such from the JSON output also?

<danbri> jtandy: not sure how i'd write locale info in plain json

<danbri> gregg: you could use JSON-LD ...

<danbri> …but simple needs to be simple; people can use json-ld etc if they're more ambitious

<danbri> jtandy: agree

jtandy: if they care about locale, they should use RDF mapping with JSON-LD serialization.

<danbri> jenit: I think the metadata will say what the lang cols are in,

jenit: they could always trace it back from the original data. I think in the JSON mapping, they’ll use a property name to indicate that, or it would otherwise be implicit.

<jtandy> (if you want to say a particular locale for plain old json, you might say "property_en" or "property_fr")

<JeniT> +1

<jtandy> (e.g. the property has a human readable hint)

<jtandy> +1

<danbri> +1

<bill-ingram> +1

<jtandy> RESOLVED: for (plain old) JSON mapping, no locale information is added to the JSON output - we assume that people will look at the complementary metadata for locale information
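
As a rough illustration of this resolution, the sketch below assumes a hypothetical `row_to_plain_json` helper and a metadata dictionary with a `language` entry (both names are assumptions). The metadata is accepted but deliberately not copied into the output; any locale hint survives only through property names such as `label_en`, as jtandy suggested.

```python
import json


def row_to_plain_json(row, metadata):
    """Serialise one row as plain JSON, adding no locale information."""
    # `metadata` is intentionally unused: readers who need the locale are
    # expected to consult the accompanying metadata document instead.
    return json.dumps(row, ensure_ascii=False, sort_keys=True)
```

Calling it with `{"label_en": "Cologne", "name": "Köln"}` and `{"language": "de"}` yields the row unchanged, with no language marker anywhere in the JSON.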

https://github.com/w3c/csvw/issues/39 What should be generated for a value with datatype in the case of JSON

subtopic: issue 39: What should be generated for a value with datatype in the case of JSON

<danbri> (where JSON is plain old JSON)

jenit: similar to language: how much structure to put in the output.

… I suggested recognizing boolean, numbers and null, and otherwise just map to a string.

<danbri> jenit: (in github), "Given that we're aiming for a simple JSON mapping for simple JSON users, I think the first option above is the right one: map to a simple string, number, boolean (or null) as appropriate for the datatype."

<danbri> proposal: re #39 for simple JSON we map to a simple string, number, boolean (or null) as appropriate for the datatype.

<danbri> +1

<jtandy> +1

<JeniT> +1

<bill-ingram> +1

<danbri> resolved: re #39 for simple JSON we map to a simple string, number, boolean (or null) as appropriate for the datatype.
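
A sketch of the mapping agreed for #39, assuming a hypothetical `to_json_value` helper and a small set of datatype names chosen for illustration. Values that do not fit their declared datatype are passed through as strings, in line with the pass-through approach discussed under #62; how an empty cell maps (issue 61) is left open here, so only a missing value (`None`) becomes null.

```python
import json
import math


def to_json_value(raw, datatype="string"):
    """Map a cell to a JSON-ready string, number, boolean or None."""
    if raw is None:
        return None
    if datatype == "boolean":
        if raw in ("true", "1"):
            return True
        if raw in ("false", "0"):
            return False
        return raw
    if datatype == "integer":
        try:
            return int(raw)
        except ValueError:
            return raw
    if datatype in ("decimal", "double", "float"):
        try:
            number = float(raw)
        except ValueError:
            return raw
        # NaN and infinities have no JSON representation: keep the string.
        return number if math.isfinite(number) else raw
    return raw
```

Every other datatype, dates included, simply stays a string in the plain JSON output.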

https://github.com/w3c/csvw/issues/30 How to interpret fixed string type values ("Table", "Row",...)

subtopic: issue 30: How to interpret fixed string type values ("Table", "Row",...)

jtandy: I assume they’ll figure this out based on context.

danbri: propose we just endorse the editorial decision.

jtandy: we’re trying to make JSON as “brutally simple” as possible.

<danbri> proposal: #30 aiming for json mapping to be super simple, we endorse the 2nd option as currently implemented by editors

<bill-ingram> +1

<danbri> +1

jenit: I’d suggest the authors consider if anything can be considered table columns. I’m not sure where the typed table mapping applies.

<JeniT> +1

<jtandy> +1 ... noting that further thought is required about whether things should be declared @type

<danbri> yup

<danbri> https://github.com/w3c/csvw/issues/20 Is row by row processing sufficient?

subtopic: issue #20: Is row by row processing sufficient?

<jtandy> resolved: #30 aiming for json mapping to be super simple, we endorse the 2nd option as currently implemented by editors

danbri: I propose that, although we know many CSV mappings have interdependent rows, for now we go with row-by-row mapping.

<danbri> ie. we push work onto preprocessors etc

<danbri> … and advanced mappings

jtandy: I think we discussed different alternatives, but agreed that the simple mapping is definitely row-by-row, but perhaps the templated mapping might want to consider holdover values from previous rows.

jenit: a flag in the metadata saying “take it from the previous row” seems relatively simple, but I’m happy to keep it super-simple for now.

jtandy: I’d suggest people just pre-process the CSV to populate those blanks.

<jtandy> proposal: keep things simple - row by row processing only

<bill-ingram> +1

<jtandy> +1

<danbri> +1

<jtandy> resolved: keep things simple - row by row processing only

<JeniT> +1
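
The pre-processing step jtandy suggests could look roughly like the sketch below, which assumes a hypothetical `fill_down` helper and treats an empty cell as "same as the row above". It runs before the mapping, so the mapping itself can stay strictly row-by-row.

```python
import csv
import io


def fill_down(csv_text):
    """Return CSV text in which blank cells inherit the value from the row above."""
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    previous = []
    for index, row in enumerate(reader):
        if index == 0:
            writer.writerow(row)  # header row is copied unchanged
            continue
        filled = [
            previous[col] if cell == "" and col < len(previous) else cell
            for col, cell in enumerate(row)
        ]
        writer.writerow(filled)
        previous = filled
    return out.getvalue()
```

Whether a blank really means "repeat the previous value" depends on the dataset, which is one reason to leave this to a pre-processor rather than the simple mapping.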

<danbri> revisiting https://github.com/w3c/csvw/labels/Requires%20telcon%20discussion/decision

jenit: can we take issues in reverse order?

https://github.com/w3c/csvw/issues/23

<danbri> https://github.com/w3c/csvw/issues/23 CSV Dialect Description

subtopic: issue 23: CSV Dialect Description

jenit: following the F2F, I tried to map the flags we describe in the syntax doc into the dialect description within the metadata.

… but also tried to stay consistent with the datapackage, e.g. for header

<JeniT> http://w3c.github.io/csvw/metadata/#dialect-descriptions

<JeniT> PROPOSAL: we can close https://github.com/w3c/csvw/issues/23 as it’s sufficiently addressed by current draft

<danbri> +1

<bill-ingram> +1

<jtandy> +1

<JeniT> +1

<JeniT> RESOLVED: we can close https://github.com/w3c/csvw/issues/23 as it’s sufficiently addressed by current draft

https://github.com/w3c/csvw/issues/54

<danbri> Using JSON-LD for the metadata document

<danbri> oh sorry

<danbri> "Pattern string formats for parsing dates/numbers/durations", rather.

jenit: we discussed using pattern strings for parsing dates, numbers, durations, … in CSV files based on some kind of locale.

<JeniT> http://www.unicode.org/reports/tr35/

<danbri> see http://www.unicode.org/reports/tr35/

… The i18n guys pointed us at tr35, which describes the kind of format for pattern strings and their relation to different locales.

… having looked at it, it’s quite complicated, and I think we could cleanly drop that as a 1.0 requirement, and layer it on as something extra that implementations can play around with during the 1.0 period to be considered for 2.0

… We had previously agreed to try to do this; there are strong requirements for parsing different dates and numbers, so I’m a bit uncomfortable dropping it

jtandy: I’ve not looked at the ISO datetime standard, but I understand that it includes a number of structures for how dates and times are recognized. Perhaps that would be a place to start.
… I think the ISO standard allows a bit more variation than XSD, but is simpler than TR35.

<danbri> there was also http://www.w3.org/TR/NOTE-datetime pre-xml-schema

jenit: I don’t think it does, but it could. It may be that there’s some flexibility.

<jtandy> ACTION: jtandy to review ISO 8601 to determine if it supports 'locale' type strings for date-times [recorded in http://www.w3.org/2014/11/12-csvw-minutes.html#action01]

jenit: the other one is number format, such as using “,” instead of “.” as decimal point.

<trackbot> Created ACTION-57 - Review iso 8601 to determine if it supports 'locale' type strings for date-times [on Jeremy Tandy - due 2014-11-19].

danbri: is that covered by the dialect spec?

jenit: no, it’s not parsing the CSV into values, but parsing the values themselves.

<danbri> "1000,00"

jtandy: this is about picking up a string, which might have something like “nnn nnn,nn”, where it might be a decimal, vs a typo.

danbri: we ran into this with schema.org, and settled on the western method.

<danbri> see http://schema.org/price

jtandy: problem is, people don’t publish their data that way.

… we have a number of parsing directives, and having one to indicate decimal separator might be helpful.

<Zakim> danbri, you wanted to suggest "The CSVW Working Group considered requiring an implementation of http://www.unicode.org/reports/tr35/ pattern string formats. Given the complexity of

jenit: I was trying to think of a way forward which would enable us to make a more informed decision. I think jtandy looking at ISO 8601 would be very useful.

… I’ll look at what it takes to do the number parsing, and see if that’s something we want to go forward with.

<JeniT> ACTION: jenit to investigate what number parsing would look like if done right [recorded in http://www.w3.org/2014/11/12-csvw-minutes.html#action02]

<trackbot> Created ACTION-58 - Investigate what number parsing would look like if done right [on Jeni Tennison - due 2014-11-19].

https://github.com/w3c/csvw/issues/48

<danbri> "Using JSON-LD for the metadata document "

jenit: this is just a copy-over of the issue as it was in the document. I think it’s completely resolvable as saying “yes, we are using JSON-LD”.

… The only question is if we should rename some of the JSON-LD keywords, such as @id and @type, so they don’t stick out.

<danbri> gkellogg: some of discussions have been to serve doc as json, provide context via a header

<danbri> … aiming to look like JSON rather than JSON-LD

<danbri> … makes some sense, aliasing those keywords

<danbri> jtandy: to clarify, … it is possible to have a json-ld that replaces @id with something else and it will all work

jtandy: I want to clarify that it’s possible to have JSON-LD that replaces @id with something else and it will work.

<danbri> jenit: that's aliasing and it is fine

jenit: yes, that’s aliasing.

jtandy: we’ll always have a context?

jenit: we’ll publish one, and people point to a context.
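
Keyword aliasing as discussed here can be sketched as follows. The context maps the plain terms `id` and `type` to `@id` and `@type`, which JSON-LD permits; the document URL and the tiny `unalias` helper are hypothetical, and the helper only mimics what a real JSON-LD processor does during expansion.

```python
# A metadata document that reads like plain JSON but is still JSON-LD.
doc = {
    "@context": {"id": "@id", "type": "@type"},
    "id": "http://example.org/tables/1",  # hypothetical identifier
    "type": "Table",
}


def unalias(node):
    """Rewrite aliased keys back to their JSON-LD keywords (top level only)."""
    context = node.get("@context", {})
    aliases = {
        term: keyword
        for term, keyword in context.items()
        if isinstance(keyword, str) and keyword.startswith("@")
    }
    return {aliases.get(key, key): value for key, value in node.items()}
```

Here `unalias(doc)` gives back a node keyed by `@id` and `@type`, which is why the aliased form "will all work" as long as a context is available, whether inline or supplied via a header.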

<danbri> drafting a proposal: "close 48: we agree our metadata files are JSON-LD, and we are taking various measures to minimise associated syntax burdens"

<danbri> [I have a hard finish in 2 mins due to another meeting.]

jenit: I propose we split the issue into two: how json-ld processors can recognize this, and the other is aliasing of keywords.

<danbri> +1

<bill-ingram> +1

<jtandy> +1

<danbri> Adjourned.

probably some messup with topic vs subtopic to be fixed in minutes.

<danbri> yeah, not sure how to edit those files but at least most of the right words are in the irc log

<danbri> thanks gregg for scribing!

<JeniT> ACTION: jenit to close https://github.com/w3c/csvw/issues/48 and to open new issues on (a) JSON-LD processors recognising metadata documents as JSON-LD and (b) aliasing of JSON-LD keywords [recorded in http://www.w3.org/2014/11/12-csvw-minutes.html#action03]

<trackbot> Created ACTION-59 - Close https://github.com/w3c/csvw/issues/48 and to open new issues on (a) json-ld processors recognising metadata documents as json-ld and (b) aliasing of json-ld keywords [on Jeni Tennison - due 2014-11-19].

