CSV on the Web Working Group Teleconference

26 Mar 2014


See also: IRC log


Present: Andy Seaborne (AndyS), Jeni Tennison (JeniT), Gregg Kellogg (gkellogg), Ivan Herman (Ivan), Davide Ceolin (DavideCeolin), Yakov Shafranovich (yakovsh), Dan Brickley (danbri), Alfonso Noriega (fonso)
Regrets: Jeremy Tandy, Tim Finin, Eric Stephan, Axel Polleres
Chair: Jeni Tennison
Scribe: Gregg Kellogg


previous minutes

<JeniT> http://www.w3.org/2014/03/19-csvw-minutes.html

RESOLUTION: approve previous minutes

<danbri> I'll note that I requested mon/tuesday TPAC per my action.

Model for tabular data on the web

<JeniT> http://w3c.github.io/csvw/syntax/#metadata

jenit: this section is a sketch of different methods of finding a metadata document that provides metadata about a CSV, or finding it within the CSV itself.
... The metadata document can tell an application how to deal with that file, in particular, how to transform into different formats.
... In that document, there are five different methods listed with issues.
... 3.5, use a standard path

<JeniT> http://w3c.github.io/csvw/syntax/#standard-path

yakovsh: is 3.5 specifically when used with HTTP?

<danbri> http not https, ftp, gopher, … ?

yakovsh: If so, why is the standard name considered? If using HTTP, then the Link header can describe it.

<AndyS> and "file:"

jenit: yes, and 3.4 talks about using the Link header. When we discussed on the list, people felt that having a standard location relative to the CSV would be easier than controlling the Link header.

<danbri> "When retrieving a CSV file via HTTP, the default location for a metadata file that describes that CSV file is set to csv-metadata in the same directory. If this metadata file does not explicitly point to the relevant CSV file then it must be ignored."

yakovsh: Can 3.5 also be used when files are on disk? Why HTTP only?

jenit: No particular reason, and that's a good point.

<danbri> nearby: http://tools.ietf.org/html/draft-nottingham-site-meta-05

andys: I think we also need to be aware of the case where we're dealing with packages of CSV files, in which case a package description file will be needed. Something to address that will be needed.
... When I mentioned being able to work out a file given a CSV file, I was thinking of one per CSV, such as given foo.csv, it might be foo.csvm.

jenit: something about being in a similar directory
... Where I've seen metadata files used with CSV, such as Simple Data Format, or Google's, the metadata file has always been describing several related CSV files.

<danbri> (this? https://developers.google.com/public-data/ -> DSPL )

jenit: I took it as a strength that a metadata file would describe several CSV files, as that matched current usage.

<JeniT> danbri: yes

<yakovsh> for favicon, here is the link to the w3c doc: http://www.w3.org/2005/10/howto-favicon

andys: that's good when there's one publisher, but CSV files may come from a number of different publishers, and the publisher is just mechanically moving them into place.

<AndyS> "the final publisher" putting up the files on the directory.

<danbri> erg

andys: In a lot of environments, it's either impossible to control, or very difficult to control in terms of technology

jenit: what about if you use a suffix on a file name; if you want to use it on all files in a directory, use a suffix on the directory name.

andys: perhaps we document both, and in an issue say that the WG is likely to pick one, so people have a warning. It should depend on actual user experience.

jenit: let's change the document to cover both cases. I think it's reasonable for both to be possible: somewhere you look for an individual CSV file, and another default location.
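
A minimal sketch of the two lookup conventions discussed above: one metadata location per CSV file and one default location per directory. The file name "csv-metadata" is taken from the draft text quoted earlier, the "m" suffix follows Andy's foo.csv to foo.csvm example, and the order in which the candidates are tried is an assumption; the group had not settled any of these.

```python
from urllib.parse import urljoin

# Assumed names, for illustration only (nothing here was decided on the call).
DIRECTORY_METADATA_NAME = "csv-metadata"  # default name quoted from the draft
PER_FILE_SUFFIX = "m"                     # foo.csv -> foo.csvm, as in Andy's example


def candidate_metadata_urls(csv_url):
    """Return metadata locations to try for csv_url, per-file candidate first."""
    per_file = csv_url + PER_FILE_SUFFIX
    # urljoin resolves the name against the directory that contains the CSV file.
    per_directory = urljoin(csv_url, DIRECTORY_METADATA_NAME)
    return [per_file, per_directory]


for url in candidate_metadata_urls("http://example.org/data/foo.csv"):
    print(url)
```

The sketch ignores query strings and fragments on the CSV URL; a real implementation would need to decide how to treat them.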

<ivan> +1 to Jeni, we should have several documents in a priority order

<danbri> makes sense avoiding .xyz

<ivan> +1 to danbri + jeni, too

jenit: I'm inclined to use a suffix that doesn't look like ".foo", as those are associated with different formats.

<yakovsh> +q

jenit: do we anticipate a single mime type for CSV metadata, or not? We'll take this to the list.

<Zakim> danbri, you wanted to ask about "/.well-known/" ("http://tools.ietf.org/html/draft-nottingham-site-meta-05#appendix-B.4. Why aren't per-directory well-known locations defined?")

<yakovsh> its an rfc: http://tools.ietf.org/html/rfc5785

<yakovsh> https://www.iana.org/assignments/well-known-uris/well-known-uris.xhtml

<yakovsh> the actual registry

danbri: There's an IETF draft from mnot and friends. As I understand it, it's really one place per site. I wonder if we could consider extending it to be per-directory.
... Personally, I'm not excited about well known paths, but we should look at site-map files.

jenit: yes, .well-known is one-per-site.
... Given we're trying to do something really easy, I think it's unlikely they could access either .well-known or site-map.

yakovsh: Are we sure that every OS uses file extensions? I think MacOS uses something in the file itself.

<ivan> OS X uses extensions I believe

<danbri> osx is hybrid now

jenit: I think Mac uses a combination of both. When we're talking about a default method, I don't think that's relevant.

yakovsh: regarding .well-known URIs, it's tied into AWWW. It might be prudent to reach out to mnot. It's not clear how widely it's used, such as robots.txt.

jenit: some of these (e.g. robots.txt) came before .well-known. I'm not sure it's a relevant notion.

<danbri> even if something's not in the .well-known/ registry, it can still provide a safe sub-namespace to put such names where they'll only clash with other would-be-well-known names, and not with publisher names

<JeniT> http://tools.ietf.org/html/draft-nottingham-uri-get-off-my-lawn-02

jenit: we'll consider a standard path and a backup, possibly using a file extension.

<JeniT> http://w3c.github.io/csvw/syntax/#link-header

jenit: Moving on to 3.4, I think this is fairly straightforward.

jenit: We just need to be sure rel=describedby is the right link relation.

andys: we have the two cases again, a description per CSV, or one for a group.

<JeniT> http://www.w3.org/TR/powder-dr/#assoc-linking

jenit: It doesn't make a case, as the Link header describes the resource (which could be multi-part?)

<danbri> describedby is registered in https://www.iana.org/assignments/link-relations/link-relations.xhtml

jenit: Perhaps we can assume that we always have a package description.

andys: if multiple people are dropping files into a directory, this might not be a good assumption.

<danbri> +1 for one type of metadata file

<danbri> we can have conventions evolve over time

jenit: Andy seemed to be saying there would be two different types of files (packages, and individual). I'm suggesting there should be just one, but sometimes the package might just have one file in it.

<danbri> there might be a few of these that get composed

<danbri> (i.e. merged)

andys: there might be one directory with mixed information. Perhaps it should be either one or the other, a package or an individual file. Going down every path might be exhausting.

ivan: from a syntactic point, does describedby allow me to use a list of URIs or just one?

jenit: I think you can have multiple Link headers, with different types and locations.

ivan: that's also related to Andy's question: the various access methods. We have to allow for different routes to get metadata with a prioritization.
... In this sense, if it's one Link header with a list of references, they are in priority order; if some are metadata for the package and some for an individual file, it falls back to that priority order.
... I can imagine a system setting up a standard describedby for all CSV files, and the user adds more metadata with a well-known URI.

<JeniT> http://w3c.github.io/csvw/syntax/#metadata

jenit: I tried to put in something about cascade in section 3. That might not satisfy your requirements.
... for the Link header, we should say you can have multiple link headers and that they are merged with the one at the top being the highest priority.
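
A rough sketch of reading several Link headers and collecting the rel=describedby targets in the order they appear, so that the first one can be treated as the highest priority. The parser is deliberately simplified (it does not handle every case of the Web Linking syntax, such as commas inside URLs), and the example URLs are made up.

```python
import re

# One link-value looks like: <http://example.org/meta>; rel="describedby"
LINK_VALUE = re.compile(r'<([^>]*)>((?:\s*;\s*[^;,]+)*)')
REL_PARAM = re.compile(r';\s*rel\s*=\s*"?([^";,]+)"?', re.IGNORECASE)


def describedby_targets(link_headers):
    """Collect rel=describedby targets from Link header values, in header order."""
    targets = []
    for header in link_headers:
        for url, params in LINK_VALUE.findall(header):
            rel = REL_PARAM.search(params)
            # A rel value may hold several space-separated relation types.
            if rel and "describedby" in rel.group(1).lower().split():
                targets.append(url)
    return targets


headers = [
    '<http://example.org/data/foo.csvm>; rel="describedby"',
    '<http://example.org/style.css>; rel="stylesheet", '
    '<http://example.org/data/csv-metadata>; rel=describedby',
]
print(describedby_targets(headers))
```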

ivan: the problem is, what does priority really mean? Suppose it's all in RDF. The "RDF way" would say that all statements are accumulated and do not hide each other. Other systems would do occlusion.

<danbri> oops!

<danbri> 3 pixel difference between selecting a browser tab vs closing it

jenit: it says if the same property is specified in two different locations, information closer to the document should override that which is further away.
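
The two readings of "priority" raised here can be contrasted in a small sketch: occlusion, where the source closest to the document wins, and the accumulating "RDF way", where nothing is hidden. Metadata is modelled as plain dictionaries and the property names are invented for the example; neither is taken from the draft.

```python
def merge_override(sources):
    """Merge metadata dicts ordered nearest-to-the-document first.

    When a property appears in several sources, the nearest one wins.
    """
    merged = {}
    for source in reversed(sources):  # apply the furthest first, the nearest last
        merged.update(source)
    return merged


def merge_accumulate(sources):
    """Keep every distinct value seen for each property, nearest first."""
    merged = {}
    for source in sources:
        for key, value in source.items():
            values = merged.setdefault(key, [])
            if value not in values:
                values.append(value)
    return merged


embedded = {"title": "Rainfall 2013"}                 # hypothetical: found in the CSV itself
linked = {"title": "Rainfall", "license": "CC0"}      # hypothetical: found via a Link header

print(merge_override([embedded, linked]))
print(merge_accumulate([embedded, linked]))
```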

yakovsh: In RFC4180 I started defining metadata as part of the mime type. Is the mime type a good place to stash metadata?

jenit: Probably not, as it gets lost when it moves around.

<danbri> (isdescribedby seems ok to me.)


<JeniT> https://www.w3.org/2013/csvw/wiki/Conversions

danbri: it seems everyone wants to talk about RDF mappings, but we've been putting that off.
... Also, XML, JSON, ...

jenit: the best way to structure discussion is to have a spec to discuss and "kick".
... I'd like to have people step forward to edit a document and have others contribute.
... On CSV to RDF

<danbri> I'd like to help, and relay in some ideas from https://www.w3.org/wiki/WebSchemas/LookInside http://lists.w3.org/Archives/Public/public-vocabs/2013Aug/att-0061/Lookinginsidetables.html

<AndyS> We have two already -- https://www.w3.org/2013/csvw/wiki/CSV2RDF and https://www.w3.org/2013/csvw/wiki/CSV-LD

<danbri> also we have a backwards sparql proof of concept, http://svn.foaf-project.org/foaftown/2010/lqraps/lqraps.html

<Zakim> danbri, you wanted to suggest we pick some concrete CSV files (from the UC work) to focus the mapping design

jenit: I find it hard to be able to say that one direction is definitely the way to go. I think the next step is for someone to characterize the difference between the different approaches so we can have an educated discussion in order to make a decision.

danbri: I'm feeling a bit overwhelmed by the different threads. I suggest we pick a set of concrete CSV files; then we can compare different designs.

andys: There already are examples in the different proposals.

danbri: I think we should have some core examples.

<danbri> what can we take from WD-csvw-ucr-20140327 ?

jenit: I think the first step is to focus on the direct mapping, i.e. with zero metadata. If we can get that down, we're in a good position.
... Who'd like to take forward the direct mapping for CSV to RDF, covering the possibilities and advantages/disadvantages with proper examples?
... Can andys and gkellogg get together on this?
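
One possible shape for a zero-metadata direct mapping, sketched only to make the idea concrete: each data row becomes a subject, each column header a predicate, and every cell a plain string literal. The "#row=N" subject and "#header" predicate URIs are assumptions made for this example; they are not taken from either proposal on the wiki.

```python
import csv
import io
from urllib.parse import quote


def direct_mapping(csv_text, base):
    """Yield one N-Triples line per non-empty cell of a CSV file with a header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=1):
        subject = f"<{base}#row={row_number}>"  # assumed row-identifier pattern
        for header, value in row.items():
            if header is None or not value:
                continue  # skip cells without a header and empty cells
            predicate = f"<{base}#{quote(header, safe='')}>"
            # Escape the characters that N-Triples string literals require.
            literal = (value.replace("\\", "\\\\").replace('"', '\\"')
                       .replace("\n", "\\n").replace("\r", "\\r"))
            yield f'{subject} {predicate} "{literal}" .'


sample = "name,age\nAlice,30\n"
for triple in direct_mapping(sample, "http://example.org/people.csv"):
    print(triple)
```

With no metadata there is nothing to say that "30" is a number, so every value stays a string; that is one of the limits a metadata-driven mapping would be expected to lift.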

andys: I don't think this quite touches on the fundamental differences between the two approaches.
... Gregg's very much based on JSON-LD, and I'm interested in a mapping to RDF triples.

<danbri> ACTION: danbri try expressing a direct mapping expressed using http://www.w3.org/TR/2013/PR-vocab-data-cube-20131217/ [recorded in http://www.w3.org/2014/03/26-csvw-minutes.html#action01]

<trackbot> Created ACTION-10 - Try expressing a direct mapping expressed using http://www.w3.org/tr/2013/pr-vocab-data-cube-20131217/ [on Dan Brickley - due 2014-04-02].

<AndyS> http://lists.w3.org/Archives/Public/public-csv-wg/2014Mar/0140.html

andys: I found three classes of JSON-style output. I have no idea which are commonly used. I understand the first (one row to object), I understand column arrays, I don't know what the background is about turning everything into arrays without objects inside.
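
The three classes of JSON-style output mentioned here can be illustrated roughly as follows. This is one reading of the three shapes (an object per row, an array per column, and arrays only); the key names in the returned dictionary are invented for the example.

```python
import csv
import io
import json


def json_shapes(csv_text):
    """Return three JSON-style views of a CSV file that has a header row."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    return {
        # 1. One object per row, keyed by column header.
        "row_objects": [dict(zip(header, row)) for row in body],
        # 2. One array per column, keyed by column header.
        "column_arrays": {name: [row[i] for row in body]
                          for i, name in enumerate(header)},
        # 3. Arrays only: the header row and data rows as nested arrays.
        "arrays_only": rows,
    }


print(json.dumps(json_shapes("name,age\nAlice,30\nBob,25\n"), indent=2))
```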

jenit: given that the main difference between the two approaches is about the syntax of the metadata document, I'd like to get something down as a starting point, being just a direct mapping. This would be really helpful.
... Andy, if you could do this?

gkellogg: why don't we work together.

andys: I'd like feedback on what I've written.

jenit: something in ReSpec on GitHub; copy/paste is fine.

andys: I'm looking at a mapping to RDF, gregg's looks to both RDF and JSON through JSON-LD. When you compare and contrast, it might not be as useful.

ivan: In a way, the JSON vs RDF model is just one dimension of the differences. Another discussion is what level of complexity we want to allow and define within that.
... I'm a little concerned that we're having the same discussion as we had in RDB2RDF; I'm a bit worried we're just repeating the same arguments.
... Before going beyond that, I'd like to have an understanding on how the RDF conversions are done in the use cases. There are only 2-3 that really rely on an RDF mapping. R2RML can be quite complex, with a full SQL language inside. It's a level of complexity I'm quite afraid of.
... It's a kind of difference between the proposals I'd like to examine.
... Mappings of URIs and properties and much complexity.

<JeniT> +1 on defining the RDF output without dictating a particular serialisation of that RDF

<JeniT> I think there are layers: direct mapping using no metadata, mapping to RDF graph using metadata, mapping to RDF syntax using metadata

jenit: a clear document that says we need a decision would help focus discussion.


danbri: we should just say we're choosing one, say left to right, top to bottom.

jenit: also, commas are used as a syntactic marker, and not as text.

ivan: I had a conversation with Richard, our i18n guy; the best way to do that would be to contact the i18n mailing list and ask them to look at use cases to see if there's something too Latin-biased.
... apart from that, we should try to collect use cases outside of the US-Europe world.

ivan: I can try to reach out to Chinese colleagues, or Google has some Arabic people.

yakovsh: I'm a Hebrew speaker, but I've never seen a Hebrew CSV, but I'll poke around.

<danbri> can we add "We particularly seek feedback and suggestions on the Internationalization aspects of this work" to the Status section?

ivan: next time, I don't want to touch it now, it's in the webmaster's control.

Summary of Action Items

[NEW] ACTION: danbri try expressing a direct mapping expressed using http://www.w3.org/TR/2013/PR-vocab-data-cube-20131217/ [recorded in http://www.w3.org/2014/03/26-csvw-minutes.html#action01]
[End of minutes]

Minutes formatted by David Booth's scribe.perl version 1.138 (CVS log)
$Date: 2014-03-26 14:10:32 $