See also: IRC log
jenit: standing issue-driven agenda
<JeniT> https://github.com/w3c/csvw/issues/223
we started last week around unions of datatypes
Allowing "unions" of datatypes? #223
logger, pointer?
jumbrich: we tried to get some hints on frequency of datatypes in cols
we found our code was slow, with small/strange bugs/features
Starting with 93000 files, we had parsed around 10k by lunchtime.
Most prominently noticed that datatypes are mixed with just strings and integer / numerical values, and that we had diff types of numerical values e.g. integers and floats
jenit: how freq or common is that?
jumbrich: roughly 1/3 of docs have a mix of char strings vs numerical values
we parsed 9000 docs. In 2082 we found an alpha value i.e. chars, and a float combination
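The kind of per-column profiling jumbrich describes could be sketched roughly as below. This is not the code used for the survey; the function names (`classify`, `column_type_counts`, `mixed_columns`) and the four cell kinds are assumptions made for illustration, and the sketch assumes each CSV has a header row.

```python
import csv
import io
from collections import Counter


def classify(cell):
    """Classify one cell as 'empty', 'integer', 'float' or 'string'."""
    text = cell.strip()
    if not text:
        return "empty"
    try:
        int(text)
        return "integer"
    except ValueError:
        pass
    try:
        # Note: Python's float() also accepts 'nan' and 'inf', which a
        # real survey would probably want to treat as strings.
        float(text)
        return "float"
    except ValueError:
        return "string"


def column_type_counts(csv_text):
    """Return {column name: Counter of cell kinds} for a CSV with a header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames or []
    counts = {name: Counter() for name in fieldnames}
    for row in reader:
        for name in fieldnames:
            # Short rows yield None for missing cells; count them as empty.
            counts[name][classify(row[name] or "")] += 1
    return counts


def mixed_columns(counts):
    """Names of columns that contain both strings and numeric values."""
    numeric = ("integer", "float")
    return [
        name
        for name, kinds in counts.items()
        if kinds["string"] and any(kinds[k] for k in numeric)
    ]
```

Counting how many documents have a non-empty `mixed_columns` result would give a figure comparable to the "mix of char strings vs numerical values" ratio above, subject to the same provider bias.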
danbri: how many (indep.) srcs are these things from?
jumbrich: good point, couple files are from same host, … eg. with same date. A lot from some brazilian portal who have a file per month. We looked at some gov uk…
… for now most of the files probably belong to a small set of providers. Might be biased in that way.
as I wrote in the mail there is some risk of bias, hope for more next week
nothing in https://lists.w3.org/Archives/Public/public-csv-wg/2015Feb/ yet
[aside: jumbrich can you take a look at http://danbri.org/words/2012/03/08/776 … I tried to workaround the bias issue with some rdf vocab stats]
<JeniT> https://github.com/w3c/csvw/issues/223
<JeniT> sorry, doorbell brb
<JeniT> back
[discussion of null]
<Zakim> jtandy, you wanted to ask about multiple null values
jtandy: sometimes the reason I'll use multiple null values, … each letter I use for a null is for a diff reason.
would be useful to have this in the data
…if it says M it means null for this reason, N for this reason, … etc
but of course if it is a null value, the uri template gets to work on the canonicalized value, so M becomes null field, …
<JeniT> see https://github.com/w3c/csvw/issues/218
in a URI template, null is always null
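The point about multiple null markers can be illustrated with a small sketch. The marker letters and their reasons (`NULL_REASONS`) are hypothetical, and `expand` is only a toy stand-in for URI template expansion, not an RFC 6570 implementation. It shows that once a cell is canonicalized to null, the template can no longer tell the markers apart, so the reason has to be carried separately if it is to survive.

```python
# Hypothetical null markers and the reason each one stands for.
NULL_REASONS = {"M": "missing", "N": "not applicable"}


def canonicalize(cell, null_markers=NULL_REASONS):
    """Return (value, reason); value is None when the cell is a null marker."""
    if cell in null_markers:
        return None, null_markers[cell]
    return cell, None


def expand(template, values):
    """Toy stand-in for URI template expansion: null expands to nothing."""
    out = template
    for name, value in values.items():
        out = out.replace("{" + name + "}", "" if value is None else str(value))
    return out
```

With this sketch, `expand("http://example.org/obs/{flag}", {"flag": canonicalize("M")[0]})` and the same call with `"N"` produce the same URI, while the second element of the tuple still records why the value was null.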
jenit: union datatypes could give some way out
… define a datatype w/ some string value
gregg: how about literal from some other col?
… could imagine source something or other, including name of col as way of retrieving, e.g. virtual variables
… seems like a lot of complexity
jenit: let's open a separate issue for this.
gkellogg: impl of multiple datatypes is straightforward
the complexity may be in the merge language, since a datatype is an atomic value
which does not otherwise take an array
that's the potential complexity - i.e. how it complicates the merge language
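A minimal sketch of why the parsing side might be straightforward, assuming a datatype property that may be either a single name or an array of names. The `PARSERS` table and the first-match-wins rule are assumptions for illustration, not anything the group has decided, and the sketch does not address how arrays would interact with the merge language.

```python
# Hypothetical datatype names mapped to Python parsers.
PARSERS = {"integer": int, "float": float, "string": str}


def as_list(datatype):
    """Accept an atomic datatype or an array of datatypes."""
    if isinstance(datatype, (list, tuple)):
        return list(datatype)
    return [datatype]


def parse_cell(text, datatype):
    """Try each datatype in order; return (matched name, parsed value)."""
    for name in as_list(datatype):
        try:
            return name, PARSERS[name](text)
        except ValueError:
            continue
    raise ValueError(f"{text!r} matches none of {as_list(datatype)}")
```

For example, `parse_cell("1.5", ["integer", "float", "string"])` falls through `integer` and matches `float`, while an atomic `"integer"` datatype behaves as before.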
jenit: we'll revisit this again next week w/ more data from jürgen and ivan back
<JeniT> https://github.com/w3c/csvw/issues/260
jtandy: we have core annos defined for tables
… e.g. to define the source, rows, cols, tables, … but not for table groups.
… given that resources is a mandatory property should it not be handled as a table group annotation?
jenit: makes sense
+1
<jtandy> +1
<gkellogg> +1
<JeniT> +1
<DavideCeolin> +1
<jumbrich> sorry
<JeniT> https://github.com/w3c/csvw/issues/261
jtandy: re serializations, we talked about standard mode, and a minimal mode
latter with just the resources described _in_ the table, without all the other paraphernalia
… for latter we needed a property linking table to the things described by the table
e.g. describes
<JeniT> https://tools.ietf.org/html/rfc6892
… i didn't find one, but there must be similar out there.
maybe rdf data cube has such?
gkellogg: my interpret. of the minimal mode, is that it would not include any table data at least in the rdf
<JeniT> http://www.w3.org/TR/powder-dr/#assoc-linking
it would simply be the results
jtandy: i expected min mode to be a block describing table as dataset, and then have something saying 'all of these resources are described in this dataset'
jenit: either way you'll need this relationship
danbri: yup powder or something from skos/schema/foaf/whatever for the non-minimal. For minimal, focus on 'what the table tells you about the world'. Row links aren't minimal.
jenit: let's open an issue on this
jtandy: we discussed minimal standard and maybe another
<JeniT> https://tools.ietf.org/html/rfc6892
jenit: see also rfc 6892 'describes'
<JeniT> “In accordance with the ATOM specification [RFC4287], the describedby relationship is a relative URI, the base of which is http://www.iana.org/assignments/relation/ - i.e. the full URI of describedby is http://www.iana.org/assignments/relation/describedby - and this is included in the ATOM registry [AREG].”
(and in foaf we have xmlns.com/foaf/spec/#term_isPrimaryTopicOf … for things to docs and doc fragments)
(dc has 'subject', schema has 'about', …)
jenit: to close this issue, we'll need to pick something
jtandy: we haven't used powder or foaf so far in output
gkellogg: void would make a lot of sense
jenit: but void describes a linked data dataset
jtandy: i also agree that the rel between a void dataset and the thing is weak
… all the triples belong to this dataset implicitly, so a little weird.
jenit: I suggest, dc subject isn't strong enough, foaf would be the wrong vocabulary, so it comes down to schema.org or using the iana link relation
(http://schema.org/Dataset (dcat inspired) is in schema, but no notion of Table or Row in there yet)
jenit: so I'll propose isDescribedBy to have a candidate
gkellogg: except url doesn't link to anything useful
<gkellogg> +1
jtandy: which is kind of irritating
jenit: so schema.org/about ? …
also doesn't quite feel right
jenit: will re-propose using link relation isDescribedBy from POWDER
<JeniT> PROPOSAL: use http://www.iana.org/assignments/relation/describedby, as defined by POWDER, as relationship between entities & the table they are described within
( somewhat related - https://github.com/schemaorg/schemaorg/issues/301 )
jtandy: so this is rel from resource to row?
<gkellogg> +1
jenit: yes
+1
jtandy: how would that work in the json serialization?
… you'd need a row object with all the things it describes in that row?
jenit: you'd maybe want described by property, maybe with a prefix on it, to avoid name clashes, ...
(discussion of fwd/back relation direction)
gkellogg: even from rdf, if you are filtering on subjects or entity types, having the row refer to the subjects, … vs other way, seems like it allows similar filtering
…and of course they're all triples.
jenit: is that a proposal to use 'describes' instead?
gkellogg: I guess 'describes' [from that rfc] would make more sense
jenit: i think yes in the json you don't need the specific rel as containment can clarify this
status of the rfc - informational
RFC-6892
danbri: would it be normative?
jenit: just saying use it as a property - not making normative statements about it?
jtandy: the full ref to this would be www.iana.org/assignments/relations/describes ?
yup
(presumably http or https :// )
jenit: this would/could make it easier to indicate what the table contains
<JeniT> PROPOSAL: use http://www.iana.org/assignments/relations/describes, as defined by RFC6892, as the relationship between a table/row and the entities that are described within it
<gkellogg> +1
<jtandy> +1
<JeniT> +1
<DavideCeolin> +1
+1
<jumbrich> +1
<gkellogg> "The relationship A 'describes' B asserts that resource A provides a description of resource B. There are no constraints on the format or representation of either A or B, neither are there any further constraints on either resource."
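In RDF terms, the proposal amounts to one triple per row/entity pair, pointing from the row (or table) to each entity. A rough sketch, emitting N-Triples lines; the `DESCRIBES` constant copies the URI exactly as written in the PROPOSAL, and the function name and example URIs are assumptions for illustration.

```python
# Property URI copied as written in the PROPOSAL above.
DESCRIBES = "http://www.iana.org/assignments/relations/describes"


def describes_triples(row_uri, entity_uris):
    """Return one N-Triples line per entity described by the row."""
    return [f"<{row_uri}> <{DESCRIBES}> <{entity}> ." for entity in entity_uris]
```

Calling `describes_triples("http://example.org/t.csv#row=1", ["http://example.org/e/1"])` yields a single triple with the row as subject and the entity as object, which matches the forward direction ('describes' rather than 'describedby') settled on in the discussion.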
jtandy: and the prefix would be what?
gkellogg: [missed]
... how does rdf reference link relations?
not in http://www.w3.org/2011/rdfa-context/rdfa-1.1
powder is in there
it points to http://www.iana.org/assignments/link-relations/link-relations.xml
jenit: I would use 'rel'
or http://www.iana.org/go/rfc5988:foo ?
er rfc5988:foo
jenit: it does not exactly matter except in examples
jtandy: when i use it in plain json, do i use unqualified 'describes'? no prefix?
jenit: you could use document containment hierarchy?
jtandy: you need a property to attach the array
jenit: yes then 'describes' without prefix
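For the plain JSON case, the outcome could look something like the sketch below: each row object carries an unprefixed `describes` array holding the entities from that row. Only the `describes` key comes from the discussion; the `url`, `row` and `rownum` keys and the function name are hypothetical placeholders.

```python
import json


def table_to_json(table_url, entities_per_row):
    """Build a dict where each row object attaches its entities via 'describes'."""
    return {
        "url": table_url,  # hypothetical key
        "row": [  # hypothetical key
            {"rownum": number, "describes": list(entities)}
            for number, entities in enumerate(entities_per_row, start=1)
        ],
    }


if __name__ == "__main__":
    doc = table_to_json(
        "http://example.org/people.csv",
        [[{"name": "Alice"}], [{"name": "Bob"}]],
    )
    print(json.dumps(doc, indent=2))
```

Containment already implies the relationship here, so the property mostly serves as the place to hang the array, as jtandy notes.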
<JeniT> https://github.com/w3c/csvw/issues/245
"Separate RDF savy implementations? #245"
jenit: would be better with Ivan here
gkellogg: also https://github.com/w3c/csvw/issues/193 ("Documenting the processing steps for metadata #193") but also needs Ivan really.
… main point on that is that there is a finalization step to the metadata, pieces that are logically there are substantialized
e.g. name/title of a col
… what we have is consistent with the way we have been describing things so far.
… if we look at this as a consensus process, I think Ivan has a different feeling on that and I'd be wary of closing this issue without his involvement.
jenit: I see it as same result either way, comes down just to how it is described in the doc and whether it is clear to people. Let's revisit this once other changes are merged in, probably after we publish the next drafts. Reasonable?
gkellogg: I think what we should revisit is the finalization
… important for next set of pubs. Separate that bit about finalizing. There's text to review, … if we can include this info sooner and preserve finalization for a later round.
jenit: yes
… as long as we have covered saying what property values are
… and revisit after seeing the draft
(jenit to progress this issue)
aob?
jtandy: am planning a chunk of work on the conv docs this week, which should get through my resolved-but-not-yet-actioned issues.
jenit: great
… when done we can all review before aiming to publish as discussed end of March.
Adjourned.