CSV on the Web WG Telco

<JeniT> https://github.com/w3c/csvw/issues?q=is%3Aopen+is%3Aissue+label%3A%22Requires+telcon+discussion%2Fdecision%22

jenit: standing issue-driven agenda

<JeniT> https://github.com/w3c/csvw/issues/223

we started last week around unions of datatypes

Allowing "unions" of datatypes? #223

logger, pointer?

jumbrich: we tried to get some hints on frequency of datatypes in cols

we found our code slow, with some small/strange bugs ("features")

Starting at 93000 files, parsed until lunchtime around 10k.

Most prominently noticed that datatypes are mixed with just strings and integer / numerical values, and that we had diff types of numerical values e.g. integers and floats

jenit: how freq or common is that?

jumbrich: roughly 1/3 of docs has a mix of char strings vs numerical values

we parsed 9000 docs. In 2082 we found an alpha value i.e. chars, and a float combination
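A minimal sketch of the kind of per-column check described above, assuming a simple three-way classification (integer, float, string). The function names and the classification rules are illustrative assumptions, not the code jumbrich actually ran.

```python
from collections import Counter

def classify(value):
    """Classify a raw CSV cell as 'empty', 'integer', 'float', or 'string'."""
    text = value.strip()
    if not text:
        return "empty"
    try:
        int(text)
        return "integer"
    except ValueError:
        pass
    try:
        # Note: Python's float() also accepts "nan" and "inf".
        float(text)
        return "float"
    except ValueError:
        return "string"

def column_type_counts(rows, column_index):
    """Count how many cells of each kind appear in one column."""
    counts = Counter()
    for row in rows:
        if column_index < len(row):
            counts[classify(row[column_index])] += 1
    return counts

def is_mixed(counts):
    """A column is 'mixed' when it holds more than one non-empty kind."""
    kinds = {kind for kind, n in counts.items() if kind != "empty" and n > 0}
    return len(kinds) > 1

# Hypothetical sample rows, not taken from the parsed corpus.
sample = [["1", "a"], ["2.5", "b"], ["x", "c"]]
first_column = column_type_counts(sample, 0)
second_column = column_type_counts(sample, 1)
```

Here the first column of the sample would be reported as mixed (integer, float and string) and the second as purely string.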

danbri: how many (indep.) srcs are these things from?

jumbrich: good point, couple files are from same host, … eg. with same date. A lot from some brazilian portal who have a file per month. We looked at some gov uk…

… for now most of the files they probably belong to a small set of providers. Might be biased in that way.

as I wrote in the mail there is some risk of bias, hope for more next week

nothing in https://lists.w3.org/Archives/Public/public-csv-wg/2015Feb/ yet

[aside: jumbrich can you take a look at http://danbri.org/words/2012/03/08/776 … I tried to workaround the bias issue with some rdf vocab stats]

<JeniT> https://github.com/w3c/csvw/issues/223

<JeniT> sorry, doorbell brb

<JeniT> back

[discussion of null]

<Zakim> jtandy, you wanted to ask about multiple null values

jtandy: sometimes the reason I'll use multiple null values, … each letter I use for a null is for a diff reason.

would be useful to have this in the data

…if it says M it means null for this reason, N for this reason, … etc

but of course if it is a null value, the uri template gets to work on the canonicalized value, so M becomes null field, …

<JeniT> see https://github.com/w3c/csvw/issues/218

in a URI template, null is always null
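The point about null markers can be illustrated with a small sketch. It assumes a very simplified `{name}` expansion (not full RFC 6570, and no percent-encoding) in which a null variable expands to nothing; the marker letters, column names and example.org URL are hypothetical.

```python
import re

def canonicalize(cell, null_markers):
    """Return None when the cell equals any declared null marker."""
    return None if cell in set(null_markers) else cell

def expand(template, row):
    """Simplified {name} expansion: null (None) variables expand to nothing."""
    def substitute(match):
        value = row.get(match.group(1))
        return "" if value is None else value
    return re.sub(r"\{(\w+)\}", substitute, template)

# Hypothetical markers: "M" and "N" each signal a different reason for null.
markers = {"M", "N"}
template = "http://example.org/{station}/{reading}"

row_m = {k: canonicalize(v, markers) for k, v in {"station": "S1", "reading": "M"}.items()}
row_n = {k: canonicalize(v, markers) for k, v in {"station": "S1", "reading": "N"}.items()}

uri_m = expand(template, row_m)
uri_n = expand(template, row_n)
```

Both rows expand to the same URI, so the reason encoded by the marker letter is lost once the value is canonicalized, which is the concern jtandy raises.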

jenit: union datatypes could give some way out

… define a datatype w/ some string value

gregg: how about literal from some other col?

… could imagine source something or other, including name of col as way of retrieving, e.g. virtual variables

… seems like a lot of complexity

jenit: let's open a separate issue for this.

gkellogg: impl of multiple datatypes is straightforward

the complexity may be in the merge language, since a datatype is an atomic value

which does not otherwise take an array

that's the potential complexity - i.e. how it complicates the merge language
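A rough sketch of why the parsing side could be straightforward, assuming a union is an ordered list of datatype names and the first one that parses wins. That ordering rule, the parser table and the function name are assumptions for illustration; the group had not settled on any semantics at this point, and the sketch says nothing about merging.

```python
# Hypothetical parser table: each datatype name maps to a callable that
# raises ValueError when the text does not fit.
PARSERS = {
    "integer": int,
    "number": float,
    "string": str,
}

def parse_union(text, datatypes):
    """Try each datatype in order and return (name, parsed value) for the first match."""
    for name in datatypes:
        try:
            return name, PARSERS[name](text)
        except ValueError:
            continue
    raise ValueError("%r matches none of %r" % (text, list(datatypes)))

union = ["integer", "number", "string"]
as_integer = parse_union("12", union)
as_number = parse_union("1.5", union)
as_string = parse_union("n/a", union)
```

The array-valued `datatypes` argument is exactly the part that a property expecting a single atomic value would have to accommodate.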

jenit: we'll revisit this again next week w/ more data from jürgen and ivan back

<JeniT> https://github.com/w3c/csvw/issues/260

jtandy: we have core annos defined for tables

… e.g. to define the source, rows, cols, tables, … but not for table groups.

… given that resources is a mandatory property should it not be handled as a table group annotation?

jenit: makes sense

<jtandy> +1

<gkellogg> +1

<JeniT> +1

<DavideCeolin> +1

<jumbrich> sorry

<JeniT> https://github.com/w3c/csvw/issues/261

jtandy: re serializations, we talked about standard mode, and a minimal mode

latter with just the resources described _in_ the table, without all the other paraphernalia

… for latter we needed a property linking table to the things described by the table

e.g. describes

<JeniT> https://tools.ietf.org/html/rfc6892

… i didn't find one, but there must be similar out there.

maybe rdf data cube has such?

gkellogg: my interpret. of the minimal mode, is that it would not include any table data at least in the rdf

<JeniT> http://www.w3.org/TR/powder-dr/#assoc-linking

it would simply be the results

jtandy: i expected min mode to be a block describing table as dataset, and then have something saying 'all of these resources are described in this dataset'

jenit: either way you'll need this relationship

danbri: yup powder or something from skos/schema/foaf/whatever for the non-minimal. For minimal, focus on 'what the table tells you about the world'. Row links aren't minimal.

jenit: let's open an issue on this

jtandy: we discussed minimal standard and maybe another

<JeniT> https://tools.ietf.org/html/rfc6892

jenit: see also rfc 6892 'describes'

<JeniT> “In accordance with the ATOM specification [RFC4287], the describedby relationship is a relative URI, the base of which is http://www.iana.org/assignments/relation/ - i.e. the full URI of describedby is http://www.iana.org/assignments/relation/describedby - and this is included in the ATOM registry [AREG].”

(and in foaf we have xmlns.com/foaf/spec/#term_isPrimaryTopicOf … for things to docs and doc fragments)

(dc has 'subject', schema has 'about', …)

jenit: to close this issue, we'll need to pick something

jtandy: we haven't used powder or foaf so far in output

gkellogg: void would make a lot of sense

jenit: but void describes a linked data dataset

jtandy: i also agree that the rel between a void dataset and the thing is weak

… all the triples belong to this dataset implicitly, so a little weird.

jenit: I suggest, dc subject isn't strong enough, foaf would be the wrong vocabulary, so it comes down to schema.org or using the iana link relation

(http://schema.org/Dataset (dcat inspired) is in schema, but no notion of Table or Row in there yet)

jenit: so I'll propose isDescribedBy to have a candidate

gkellogg: except url doesn't link to anything useful

<gkellogg> +1

jtandy: which is kind of irritating

jenit: so schema.org/about ? …

also doesn't quite feel right

jenit: will re-propose using link relation isDescribedBy from POWDER

<JeniT> PROPOSAL: use http://www.iana.org/assignments/relation/describedby, as defined by POWDER, as relationship between entities & the table they are described within

( somewhat related - https://github.com/schemaorg/schemaorg/issues/301 )

jtandy: so this is rel from resource to row?

<gkellogg> +1

jenit: yes

jtandy: how would that work in the json serialization?

… you'd need a row object with all the things it describes in that row?

jenit: you'd maybe want described by property, maybe with a prefix on it, to avoid name clashes, ...

(discussion of fwd/back relation direction)

gkellogg: even from rdf, if you are filtering on subjects or entity types, having the row refer to the subjects, … vs other way, seems like it allows similar filtering

…and of course they're all triples.

jenit: is that a proposal to use 'describes' instead?

gkellogg: I guess 'describes' [from that rfc] would make more sense

jenit: i think yes in the json you don't need the specific rel as containment can clarify this

status of the rfc - informational

RFC-6892

danbri: would it be normative?

jenit: just saying use it as a property - not making normative statements about it?

jtandy: the full ref to this would be www.iana.org/assignments/relations/describes ?

yup

(presumably http or https :// )

jenit: this would/could make it easier to indicate what the table contains

<JeniT> PROPOSAL: use http://www.iana.org/assignments/relations/describes, as defined by RFC6892, as the relationship between a table/row and the entities that are described within it

<gkellogg> +1

<jtandy> +1

<JeniT> +1

<DavideCeolin> +1

<jumbrich> +1

<gkellogg> "The relationship A 'describes' B asserts that resource A provides a description of resource B. There are no constraints on the format or representation of either A or B, neither are there any further constraints on either resource."
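As a sketch of what the proposal could look like in RDF output, the following emits one N-Triples line per entity described by a row. The predicate IRI is copied verbatim from the PROPOSAL as minuted (note that the describedby quote earlier uses the base .../relation/, so the exact path should be checked against the registry); the row and entity IRIs are hypothetical.

```python
# IRI as written in the minuted proposal; treat the exact path as unverified.
DESCRIBES = "http://www.iana.org/assignments/relations/describes"

def describes_triples(row_iri, entity_iris, predicate=DESCRIBES):
    """Return N-Triples lines stating that the row describes each entity."""
    return [
        "<%s> <%s> <%s> ." % (row_iri, predicate, entity)
        for entity in entity_iris
    ]

# Hypothetical identifiers for a row and the entities found in it.
triples = describes_triples(
    "http://example.org/data.csv#row=2",
    ["http://example.org/entity/1", "http://example.org/entity/2"],
)
```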

jtandy: and the prefix would be what?

gkellogg: [missed]
... how does rdf reference link relations?

not in http://www.w3.org/2011/rdfa-context/rdfa-1.1

powder is in there

it points to http://www.iana.org/assignments/link-relations/link-relations.xml

jenit: I would use 'rel'

or http://www.iana.org/go/rfc5988:foo ?

er rfc5988:foo

jenit: it does not exactly matter except in examples

jtandy: when i use it in plain json, do i use unqualified 'describes'? no prefix?

jenit: you could use document containment hierarchy?

jtandy: you need a property to attach the array

jenit: yes then 'describes' without prefix
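A sketch of the plain JSON shape discussed here, assuming a row object that carries an unprefixed 'describes' property holding the array of entities. The "rownum" key and the entity contents are hypothetical; only the unprefixed property name comes from the discussion.

```python
import json

def row_object(row_number, entities):
    """Build a row object whose 'describes' array holds the described entities."""
    return {
        "rownum": row_number,           # hypothetical key name
        "describes": list(entities),    # unprefixed, as discussed
    }

# Hypothetical entity extracted from one row.
row = row_object(1, [{"name": "Example entity", "population": 1000}])
serialized = json.dumps(row, sort_keys=True)
```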

<JeniT> https://github.com/w3c/csvw/issues/245

"Separate RDF savy implementations? #245"

jenit: would be better with Ivan here

gkellogg: also https://github.com/w3c/csvw/issues/193 ("Documenting the processing steps for metadata #193") but also needs Ivan really.

… main point on that is that there is a finalization step to the metadata, pieces that are logically there are substantialized

e.g. name/title of a col

… what we have is consistent with the way we have been describing things so far.

… if we look at this as a consensus process, I think Ivan has a different feeling on that and I'd be wary of closing this issue without his involvement.

jenit: I see it as same result either way, comes down just to how it is described in the doc and whether it is clear to people. Let's revisit this once other changes are merged in, probably after we publish the next drafts. Reasonable?

gkellogg: I think what we should revisit is the finalization

… important for next set of pubs. Separate that bit about finalizing. There's text to review, … if we can include this info sooner and preserve finalization for a later round.

jenit: yes

… as long as we have covered saying what property values are

… and revisit after seeing the draft

(jenit to progress this issue)

aob?

jtandy: am planning a chunk of work on the conv docs this week, which should get through my resolved-but-not-yet-actioned issues.

jenit: great

… when done we can all review before aiming to publish as discussed end of March.

Adjourned.

25 Feb 2015