15:26:25 RRSAgent has joined #csvw
15:26:25 logging to http://www.w3.org/2014/10/27-csvw-irc
15:26:30 Zakim has joined #csvw
15:26:50 rrsagent, set log public
15:27:22 Meeting: CSV on the Web WG, F2F meeting @ TPAC, 2014-10-27
15:27:27 Chair: danbri
15:48:41 danbri has joined #csvw
15:50:01 bill-ingram has joined #csvw
15:50:03 jtandy has joined #csvw
15:50:11 hadleybeeman has joined #csvw
15:50:31 laufer has joined #csvw
15:50:40 bjdmeest has joined #csvw
15:52:17 scribe: hadleybeeman
15:52:29 agenda: https://www.w3.org/2013/csvw/wiki/F2F_Agenda_2014-10
15:53:00 em has joined #CSVW
15:53:16 AxelPolleres has joined #csvw
15:53:32 ErikMannens has joined #CSVW
15:54:01 hadleybeeman has changed the topic to: Agenda: https://www.w3.org/2013/csvw/wiki/F2F_Agenda_2014-10
15:54:08 ericstephan has joined #csvw
15:55:29 https://docs.google.com/presentation/d/1PYx7PmaB4Ouyf_uHJZwE331Cg0R9aGPspjx6y1Z-GNg/edit?usp=sharing
15:57:27 phila has joined #csvw
15:57:30 danbri: [introduces the agenda]
15:57:37 Hitoshi has joined #csvw
15:57:37 topic: intros
15:57:43 rrsagent, make logs public
15:57:59 danbri: works for Google, love/hate relationship with RDF. Interested in getting new ways of sucking data into search engines.
15:58:55 jenit: at the Open Data Institute, who are interested in helping people publish/consume open data. Wants to get more consistent CSVs on the Web, for users and publishers to express all the fiddly little context bits that are necessary for reusers to understand.
15:59:02 Present+ Dan Brickley
15:59:06 Present+ Jeni Tennison
16:00:09 chu has joined #csvw
16:00:44 bill-ingram: At the University of Illinois Urbana-Champaign. Interested in research data in the repository space, planning one now.
16:00:46 Present+ Bill Ingram
16:00:59 present+ Phila
16:01:02 Present+ Hadley Beeman
16:01:28 hadleybeeman: one of 4 co-chairs of the Data on the Web Best Practices WG. Day job: tech advisor to the government CTO in the UK. Removing barriers to data re-use and publication, making it more intuitive, part of everyday life, identifying bottlenecks in the system.
16:01:33 Present+ Jeremy Tandy
16:01:59 Present+ Eric Prud'hommeaux
16:02:22 jtandy: From the UK Met Office (the national weather service and research institute). We produce tonnes of CSV data. Interested in crossing domain boundaries. I want to take CSV and annotate it in a way that it can be combined with other data. Unanticipated reuse.
16:02:35 Present+ Laufer
16:03:14 laufer: Work at the Web Engineering Laboratory at the Catholic University of Rio de Janeiro. Also participates in the Data on the Web BP group. Interested in lots of kinds of data.
16:03:15 phila_ has joined #csvw
16:03:46 Present+ Chunming Hu
16:04:27 Chunming_hu: W3C team from China, Chinese host of W3C. Research on data storage and parallel data storage. Work with lots of companies who want to know more about this kind of work, semantics and CSV.
16:04:39 phila_ has joined #csvw
16:04:48 Present+ Eric Stephan
16:05:45 Present+ Ivan Herman
16:05:52 Present+ Axel Polleres
16:06:03 ericstephan: Works at a lab for the US Dept of Energy (Pacific Northwest Lab). Scientists are using .xls* and CSV data. They've looked at mixing data from domains beyond the original intentions for the data. Data has taken on a life of its own. I'm hands-on and real-world problem-driven in focus.
16:06:45 ivan: I am the staff contact for this group. I've been working on various forms of data on the web for 7 or 8 years; used to lead the Semantic Web activity. The transition to CSV was a natural one.
16:06:54 phila_ has joined #csvw
16:07:34 EricPrud'hommeaux: I'm W3C staff, mostly working in clinical informatics and bioinformatics. Worked with Sage, who were trying to get their data in a more useful form, but after a while they were still using CSVs.
16:08:10 phila: I'm W3C staff, am a member of the group and observing. For me it's about making sure the Web is a data platform, not just a platform for exchanging other files.
16:09:13 axelpolleres: I'm from the Vienna University of Economics and Business, and from the RDF linked data side. A year ago, we started to talk in Austria about how to publish data. We were quite surprised at how much needs to be done.
16:09:23 Present+ Erik Mannens
16:09:33 Present+ Ben De Meester
16:09:37 chu has joined #csvw
16:10:03 Hitoshi: I'm gathering information about W3C activities and how working groups go on and what they're focusing on. I don't have a particular interest in CSV, but I want to know how CSV will be used on the web.
16:10:10 we talk mainly with Open Data portal providers there, such as the federal chancellery, or the Cooperation OGD Austria.
16:10:38 ivan has joined #csvw
16:11:12 ErikMannens: AC rep for MMLabs. I lead a team of researchers at Ghent University on data analytics. We are working on open data publishing. Working on RML.
16:11:42 BJDmeest: I'm here for the Digital Publishing and Web Annotation WGs. Interested in the semantics of data in general.
16:12:19 topic: charter
16:12:36 charter: http://www.w3.org/2013/05/lcsv-charter.html
16:12:58 ivan: Finishing by the end of August 2015 is, in my view, impossible.
16:13:13 ... we will have to ask for a charter extension and hope that Phila will be kind enough to help
16:13:29 jumbrich has joined #csvw
16:13:36 danbri: This is our contract with the wider W3C community.
16:13:58 ...The specifics for our documents come from the numbered list in the Scope section.
16:14:39 ...Re metadata vocabulary: Tables are fantastic places to put stuff, but there is nowhere to put any other info. How much can we dare to say in this group about what the entire planet can say about their tables?
16:14:54 phila__ has joined #csvw
16:15:36 jtandy: Many people publish many CSVs together, and we want to be able to describe the relationship between them. That fits here too.
16:16:06 jenit: Not just describing the file, but also going into what the table contains. What kind of data, which columns it has, what they contain.
16:16:25 danbri: that also fits with "standard mapping mechanisms transforming CSV to other formats".
16:16:41 jenit: that's a stand-in for structure that most programming languages will consume.
16:17:24 ... the idea is that if you find a CSV file on the web, you want to be able to find out about it (metadata), or you may start with a metadata file which may point to a lot of CSV files.
16:17:43 jtandy: it may be that the metadata and data are published independently of each other. Possibly by different publishers.
16:18:03 danbri: Use cases. We have lots of them.
16:18:34 ericprudhommeaux: I assume use cases are linked to requirements. How easy is it for someone who has their own use case to discover that their requirements may be addressed?
16:18:46 use cases & requirements document: http://w3c.github.io/csvw/use-cases-and-requirements/
16:19:00 jtandy: the document makes more effort in describing the use case. We need to flesh out the requirements and make them clearer.
16:19:10 ...But there is a formalised linkage between the two.
16:19:17 q+ to talk about UCR
16:19:32 ack me
16:19:32 phila, you wanted to talk about UCR
16:19:39 ack phila
16:19:39 ericprudhommeaux: A measure of success may be that someone can bring in a use case, look at the requirements and see if theirs are included already.
16:20:07 q+
16:20:11 phila: The use case document for CSVW is useful for DWBP. That group (laufer) will pull use cases from this group's document for that group's use case doc.
16:20:12 q+
16:20:14 q+
16:20:29 ack laufer
16:21:12 laufer: You are talking about a file with metadata for other CSV files, and I've seen that you've proposed a file extension. We will have other metadata files, but I'm not sure a particular extension would be useful. A general way to link metadata files to data files may be better.
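(Editorial sketch: one way to picture the kinds of metadata-to-data linking being debated here. The specific candidate locations below — a user-supplied file, a `Link` header, a file-specific suffix, and a directory-wide default — are assumptions drawn from the draft discussions, not what the group finally agreed.)

```python
from urllib.parse import urljoin

def metadata_candidates(csv_url, link_header=None, user_supplied=None):
    """Return candidate locations for a CSV file's metadata, in priority order."""
    candidates = []
    if user_supplied:
        # Metadata handed straight to the processor wins.
        candidates.append(user_supplied)
    if link_header:
        # A Link: <...>; rel="describedby" header on the CSV response.
        candidates.append(urljoin(csv_url, link_header))
    # File-specific metadata next to the CSV (suffix is an assumed example).
    candidates.append(csv_url + "-metadata.json")
    # Directory-wide default metadata file.
    candidates.append(urljoin(csv_url, "csv-metadata.json"))
    return candidates
```

A processor would try each candidate in order and use the first one that resolves.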
16:21:36 jeniT: we'll be discussing that later today. But it contains 4 mechanisms for finding metadata; appending a file suffix is one of the four.
16:21:49 Hitoshi_ has joined #csvw
16:21:51 q?
16:22:05 ack ivan
16:22:32 ivan: Looking at the Use Cases document, to the editors: is the document done?
16:22:58 jtandy: I think we have a good collection of use cases. There may be others to include. D3: data-driven documents — we may want to look at it.
16:23:34 ...As we reviewed use cases earlier this year, we saw that most requirements in them had already been covered. But the requirements do need more work. They are placeholders that allow us in the group to work on them.
16:23:37 q?
16:23:43 ack jtandy
16:24:18 ericstephan: I'm not sure if we've drawn out — if we found use cases that correlated well, we combined them. That was an internal, organic process.
16:24:38 ...It might be useful to show something like characteristics? Not a requirement.
16:24:43 jenit: can you give an example?
16:25:17 ericstephan: In science efforts, there may be an approach (imaging formats, for instance) used in an entirely different discipline.
16:25:34 Present+ Gregg Kellogg
16:25:37 ...Is it enough to put it in requirements, or is there another outreach mechanism that would help draw people in, so they can relate to a use case?
16:26:10 jtandy: As an example, we had to work out which use cases covered data transformation. Not a requirement, but something they have in common. Maybe a simple lookup table at the top?
16:26:20 danbri: Do you have everything you need to do that?
16:26:38 jtandy: the ones we have are sufficiently articulated to do that. We should give them the chance to comment though.
16:26:50 danbri: and in terms of having their actual CSV files?
16:27:19 jtandy: Sometimes. Some are behind corporate firewalls. Obviously only those use cases that talk about transformation can have target XML, RDF, JSON. But examples of those help.
16:27:47 q?
16:27:51 ericstephan: It's like saying, "Here's something that illustrates this use case, and here are some sister or related datasets from something similar."
16:28:01 ...So you could expand from datasets from the explicit use case.
16:28:25 jtandy: But given the limited resources of the group, we have to balance that idea along with meeting the other deliverables. Let's try to work that out this week.
16:28:41 danbri: My feeling is that this document is in a good place. Better than many I've seen.
16:29:54 GreggKellogg: (introduces himself) I'm an IE in this group. I'm a consultant. I'm one of the editors of the JSON-LD spec. I've not participated a lot on calls due to time zone challenges.
16:30:32 danbri: re deliverables listed in the charter. UCR?
16:30:42 ivan: That's what I was checking. It's 80% done?
16:30:44 jtandy: yes
16:31:01 q+
16:31:09 danbri: Metadata vocabulary for tabular data. Title has changed from charter, but intention is still the same.
16:31:28 ...Access methods for CSV Metadata
16:32:06 jenit: This is talking about syntax around CSV, and the issues there. We have something to resolve there: we aren't the group in charge of syntax for CSV files. It's not in our charter. And yet it's the syntax that is one of the big sticking points for making this work.
16:32:31 ... This document therefore has a non-normative section on syntax issues, which will feed into the IETF's work on this.
16:32:41 q+ to ask about IETF a little more
16:32:42 ivan: This is rec track?
16:32:45 danbri: Yes.
16:32:53 q?
16:33:10 q+ EricP
16:33:11 ack jtandy
16:33:38 jtandy: I found useful from this document: knowing what IS tabular data. We had a use case from the medical community that was line-oriented data, but not tabular.
16:33:56 ...This is a useful document for helping determine what we do want to talk about. And what we don't.
16:34:14 ...I'd suggest reading this before you get coffee at the break.
16:34:14 q-
16:34:24 jeniT: we'll be going through this in depth later today.
16:34:49 ivan: The editor of the IETF document is a fairly active part of this group. He's not here now.
16:35:22 Frederick Hirsch, David Lewis: (introductions)
16:35:29 q?
16:35:35 ack phila
16:35:35 phila, you wanted to ask about IETF a little more
16:35:46 q?
16:35:53 phila: Do we expect the IETF spec to be updated in response to this work?
16:35:56 q?
16:35:59 jenit: yes
16:35:59 q?
16:36:03 ack ericp
16:36:04 q- ericP
16:36:16 danbri: are we happy with the mappings of the names in the charter to what we've done?
16:36:24 ivan: The titles in a charter often change.
16:36:41 danbri: It's not unreasonable to write down the data model for CSV before you move on.
16:36:55 jtandy: I don't remember having a document for access methods for metadata.
16:36:57 http://w3c.github.io/csvw/syntax/#locating-metadata
16:37:00 danbri: it's a section of the model
16:37:23 http://w3c.github.io/csvw/csv2json/
16:37:28 http://w3c.github.io/csvw/csv2rdf/
16:37:48 danbri: Mapping mechanisms is the last bit. We have Generating ...
16:37:49 ivan: and Generating JSON from Tabular Data on the Web
16:37:57 jtandy: and we anticipate having one for XML
16:38:05 ivan: Yes, but there has been no interest.
16:38:21 jenit: does anyone want to do this?
16:38:34 q+ ericP
16:38:35 Hitoshi has joined #csvw
16:38:45 Hitoshi_ has left #csvw
16:39:11 q?
16:39:13 jenit: a good mapping to XML would include xsi:type annotations to indicate the value types, which would go beyond what JSON supports.
16:39:31 ...You could envisage a mapping to XML that turns some things into elements and some into attributes.
16:39:54 q+ to talk about XML
16:40:09 ivan: But we have to be careful: if we define a mapping to XML, and we want it to be a recommendation, we need implementations, test suites, etc. Not just a cut-and-paste job.
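(Editorial sketch: the kind of CSV-to-XML shape JeniT describes — one element per cell, with an `xsi:type` annotation carrying the declared datatype. Element names, the `row` wrapper, and the datatype labels are illustrative assumptions, not a proposed mapping.)

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

def row_to_xml(row, types):
    """Serialise one CSV row (a dict) as XML, annotating each cell's type."""
    ET.register_namespace("xsi", XSI)
    root = ET.Element("row")
    for column, value in row.items():
        child = ET.SubElement(root, column)
        # xsi:type records the declared datatype; default to xsd:string.
        child.set("{%s}type" % XSI, types.get(column, "xsd:string"))
        child.text = value
    return ET.tostring(root, encoding="unicode")
```

A fuller mapping would also have to decide, per column, whether a value becomes an element or an attribute — the open question raised above.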
16:40:19 ack ericP
16:40:19 q+
16:40:48 ack me
16:40:48 phila, you wanted to talk about XML
16:40:48 ericP: Henry Thompson wrote a paper on normal forms of XML, turning XML into RDF. If you're going the other way you might want to see it.
16:41:04 phila: would it be useful to get an XML person in the room? They are here in the building.
16:41:05 fjh has joined #csvw
16:41:14 ivan: We should talk to Liam, the XML activity lead.
16:41:21 phila: he's currently scribing a meeting.
16:41:34 danbri: I spoke to him yesterday; he suggested eXSLT.
16:41:47 q?
16:41:55 jeniT: I was intimately involved in XSLT, but I don't remember that.
16:41:55 q+
16:42:32 ...For completeness, it would be good to have an XML mapping. Not a trivial amount of work, and we need someone within the group to take it on. If no one wants to, then we may have to rule it out of scope or issue a note with our thoughts on it.
16:42:42 danbri: We should take seriously that it hasn't cropped up in the use cases.
16:42:54 q-
16:43:06 q-
16:43:08 jtandy: Some mention it. But we don't have anyone keen to take a lead on the work though. Mismatch between what's being asked for and what this group can currently deliver.
16:43:22 q+
16:43:23 ack ericstephan
16:43:24 danbri: I see demand for it online. Look at StackOverflow; people are asking about libraries.
16:43:27 q?
16:43:33 gkellogg has joined #csvw
16:43:45 q+
16:43:51 gkellogg has joined #csvw
16:43:55 ericstephan: There are a lot of scientific communities that use XML, but they tend to use it more as a tag language. Not necessarily well-formed.
16:44:13 ...I don't see a lot of interest going between CSV and XML. They're either in one or the other.
16:44:17 q?
16:45:10 q-
16:45:12 chunming: We talk about someone sharing a big CSV file on the web. Another model is that someone has a huge dataset but allows a 3rd party to access just part of it, using CSV formats. Which model?
16:45:53 jenit: Scope is not to specify a query language over a large dataset that produces CSV. Or an API. But instead the files themselves. But that is a good use case, as jtandy discusses.
16:45:56 q+
16:46:09 ack chunming
16:46:20 jtandy: We do have a use case that is from PLOS, where we are requesting a subset of results, where those results are being produced in CSV or JSON or XML.
16:46:23 q+
16:46:31 q-
16:46:42 We talked about looking at a bit of that CSV and decided not to. But we are including the provenance relationship between a small dataset and its parent dataset.
16:47:08 q+
16:47:14 gkellogg: Using an HTTP header — that seems like a protocol. Ensuring that a client can parse the HTTP headers appropriately. Does that open the door?
16:47:15 q?
16:47:19 ack gkellogg
16:47:27 q+
16:47:42 jtandy: We were talking about using query parameters on an HTTP request in order to get rows 17-29. Not in our scope but relevant.
16:47:43 FWIW, IBM had some canonical JSON to XML mapping… http://pic.dhe.ibm.com/infocenter/wsdatap/v6r0m0/index.jsp?topic=%2Fcom.ibm.dp.xm.doc%2Fjson_jsonx.html (had to dig out the link)
16:48:12 ivan: The various methods to access the metadata mean that even for huge datasets I can get it all, because they are small compared to the dataset itself.
16:48:38 ...I don't know whether the mapping to JSON or to RDF can be helpful for someone to make an inverse and be able to query into the CSV.
16:49:05 ...In RDF terms, knowing the metadata, can I turn a SPARQL query back into a CSV? It's an exciting question which we won't answer here.
16:49:08 q?
16:49:11 ack ivan
16:49:12 ack ivan
16:49:42 Frederick: Regarding the charter, I'd imagine you'd defer this until you have a strong reason to address it.
16:49:58 ... @jtandy: You mentioned provenance, which is relevant to Web Annotations.
16:50:15 jtandy: we have a whole thread of discussions on benefitting from the good work of your group.
16:50:20 ack fjh
16:50:23 q?
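(Editorial sketch: the "rows 17-29 via query parameters" idea jtandy mentions as out of scope. The `rows` parameter name and its `start-end` form are hypothetical; this just shows a server-side slice keeping the header row intact.)

```python
import csv
import io
from urllib.parse import urlparse, parse_qs

def rows_subset(csv_text, url):
    """Return header plus the 1-based, inclusive row range named in ?rows=N-M."""
    query = parse_qs(urlparse(url).query)
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    if "rows" in query:
        start, end = (int(n) for n in query["rows"][0].split("-"))
        data = data[start - 1:end]
    return [header] + data
```

The provenance link back to the parent dataset (the part the group does consider in scope) would be carried in the subset's metadata, not in the CSV itself.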
16:50:24 ivan: we have a joint session this afternoon.
16:50:30 danbri: for XML then....?
16:51:00 ivan: For planning, we should make a final decision before the end of the year. Ideally earlier, but we have to talk to Liam.
16:51:02 q+
16:51:24 q+ to propose "The WG does not intend to work on XML/CSV mappings under its current chartered period."
16:51:27 ...He may say "forget it guys", but he may want us to talk to more of the community. In which case, Christmas is not an unrealistic time.
16:51:42 danbri: I was going to propose that we not work on XML mappings.
16:51:46 ack me
16:51:49 danbri: Does anyone agree?
16:51:52 ivan/phila: 'let's talk to liam'
16:52:03 phila: Let's talk to Liam.
16:52:39 jenit: I propose we catch up with Liam and other XML people over the next couple of days and address this with a resolution by the end of tomorrow.
16:53:01 also http://msdn.microsoft.com/en-us/library/bb924435(v=vs.110).aspx
16:53:03 AxelPolleres: I put something in IRC from IBM (above), but I don't know if there is anything more broad.
16:53:28 ivan: Doing a standard just because it's in the charter and not checking if it's the right thing to do — sounds awkward to me.
16:53:42 AxelPolleres: I thought there may be something we could refer to, that exists already.
16:54:10 JeniT: There are ways of doing that — but I don't think any of those are what we would call standards, where we could make normative references to them.
16:54:17 danbri: It would be helpful to end the week with a decision.
16:54:56 JeffJaffe: (introductions) CEO of W3C. Interoperable web standards, but particular interest in CSV. So much data out there; this is key.
16:55:25 danbri: Looking at the mapping mechanisms for CSV into other formats... ivan, can you talk about what you've done with direct mapping?
16:56:01 ivan: We had loads of discussion/emails on that. Not just direct mapping. My feeling is: what is realistic is a relatively simple mapping that doesn't require further language specification or syntax within the recommendation.
16:56:01 backchannel question… as for provenance… we would just hook in PROV with the 'provenance' metadata property, or was anything else discussed in this group? (sorry for having missed that, in case)
16:56:34 ...what we have now is a document that mimics the RDB2RDF direct mapping (ericp did that). We have metadata we can rely on, so it's a bit different.
16:56:35 AxelPolleres: that was my assumption, though how to structure it in a JSON format I'm not sure.
16:56:38 q?
16:57:04 @AxelPolleres: W3C PROV would seem the correct option; we're not intending to re-develop anything in this space.
16:57:04 ...We had last week a mail from jtandy with reference to an RFC for URI templates, which is a useful addition to that simple mapping.
16:57:08 q-
16:57:13 hmmmm, http://www.w3.org/Submission/2013/SUBM-prov-json-20130424/ seems to be "post-PROV-WG"
16:57:24 q+ to talk about terminology confusions
16:57:37 q+ to talk about IBM's work
16:57:44 q+ ericP
16:57:48 ...Those 2 documents exist; they need some care, especially in how the datatypes are interpreted. I think there is a separate discussion scheduled on the datatypes in the metadata.
16:58:10 ...Most of it is stable, the core is stable. The core can be implemented because I have a proof of concept for the RDF and JSON part.
16:58:11 q+
16:58:32 ...There have been two other works that we explored. 1) We had a long discussion about using this in a more general form (Mustache?).
16:58:52 ...Allowing a separate template to generate an RDF or JSON structure that is more complex than the line-by-line structure of a CSV file.
16:59:12 http://mustache.github.io
16:59:22 ...If we're not careful, this could become more complicated. I think we should not go this route for rec.
16:59:58 ...Independently, 2) Anastasia — the R2RML language minus the SQL-specific things that are irrelevant here.
17:00:27 ...My feeling is it has the same issue as Mustache — and is very RDF-specific. No structure for JSON.
17:00:39 q?
17:00:39 ...Right now, I think it's more important to produce JSON than RDF.
17:00:52 ack danbri
17:00:52 danbri, you wanted to talk about terminology confusions
17:01:06 danbri: Re terminology. I've realised that my idea of "direct mapping" was different to what ivan has meant.
17:01:42 ...In the R2RML group, mapping starts with an SQL table and creates RDF graphs, triples. Predicates aren't mapped to well-known RDF namespaces.
17:01:49 ...In this group, we have more richness.
17:02:13 ...When we say "direct mapping", we probably mean "simple mapping". Which could map to Dublin Core, or SKOS.
17:02:29 ivan: I plead guilty because I've said "direct mapping" on the mailing list.
17:02:47 danbri: This came to light when I said Google would have no interest in this. But the simple thing is potentially very valuable.
17:02:52 q?
17:02:54 ack me
17:02:54 phila, you wanted to talk about IBM's work
17:02:55 jenit: our first session tomorrow morning is on this.
17:03:21 phila: Axel found a document from IBM, so I pinged Arnaud to ask if we can use it. He wasn't sure. I'll ask him for a clearer answer.
17:03:27 ack ericp
17:03:58 jtandy: Re the difference between "simple mapping" and "templated mapping" — in use cases, I want to represent more complicated content. That needs to go in the Simple Mapping document.
17:04:15 q+ ericp
17:04:18 ack jtandy
17:04:31 ...In simple mapping, you have to have one property per column. With month and day properties in different columns, you can't create a date property merging them.
17:05:20 ... If you have one triple per cell — we can say "this is as far as we can go now, but there will be a community group or separate discussion to hook in external templating stuff."
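(Editorial sketch: the "one property per column, one triple per cell" simple mapping jtandy has just described, including its limitation — there is no way to merge month and day columns into one date value. Subject and predicate URI shapes are assumed for illustration.)

```python
import csv
import io

def simple_mapping(csv_text, base="http://example.org/"):
    """One subject per row, one predicate per column, one triple per cell."""
    reader = csv.DictReader(io.StringIO(csv_text))
    triples = []
    for rownum, row in enumerate(reader, start=1):
        subject = f"{base}row-{rownum}"
        for column, value in row.items():
            # Each cell becomes exactly one triple; columns cannot be combined.
            triples.append((subject, base + column, value))
    return triples
```

Merging columns (e.g. month + day into a date) is exactly what would need the external templating hook mentioned above.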
17:05:31 q+ on JSON-LD from RDF with Framing
17:05:37 ack ericp
17:06:43 ericP: If you want to characterise the difference between simple mapping and direct mapping: take a CSV of people and addresses, turn it into a graph. Rename the predicates in that graph to reflect the metadata. Compare to simple mapping. If they differ in substantial ways, then...
17:07:10 ivan: I use the direct mapping approach.
17:07:22 ericp: any differentiation would be defensible.
17:07:44 q?
17:07:44 ivan: In the case of simple mapping, there is more info than in the direct mapping — info about the CSV file as a whole.
17:08:09 AxelPolleres: to @ivan: if it covers more but should be the same, is it a requirement that the simple mapping produces more triples?
17:09:19 q+
17:09:20 @Axel I wonder if the IBM work related to DFDL and Daffodil annotating data as XML document...
17:09:21 gkellogg: There are advantages to looking at RDF mappings. Serialising RDF to JSON-LD gives you a JSON result. There is a spec for doing that. Looking at simple mapping — it does provide the RDF tools to turn the graph into something more structured using SPARQL.
17:09:24 ack gkellogg
17:09:24 gkellogg, you wanted to comment on JSON-LD from RDF with Framing
17:09:42 what I meant to say is, wouldn't it make sense to require that the "simple CSV to RDF" mapping is a *superset* (in terms of resulting triples) of "CSV->SQL->RDB2RDF direct mapping"?
17:10:03 ivan: Yes. Conceptually, I was wondering about the same thing. But as an implementer only interested in JSON: this is a long and torturous road. It might be a deal-breaker.
17:10:30 +1 to ivan
17:10:43 ...Having a separate document that shows what you get in JSON and making it as close as possible to JSON-LD — as ericP said, there should be no major difference between the direct mapping and the simple mapping —
17:10:56 q?
17:10:57 ...If there are differences because JSON requires something different, then we have to accept that.
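(Editorial sketch: ivan's point about `@context` noise, with made-up example data. The JSON-LD serialisation of the RDF result carries `@`-keyword machinery that a plain-JSON consumer never asked for; the "direct JSON" shape is the same object without it.)

```python
# What serialising the RDF mapping result as JSON-LD might look like
# (vocabulary URIs are invented for the example):
jsonld = {
    "@context": {"name": "http://example.org/name",
                 "city": "http://example.org/city"},
    "@id": "http://example.org/row-1",
    "name": "Alice",
    "city": "Vienna",
}

# The "direct JSON" shape a non-RDF consumer wants is just the bare object;
# dropping the @-keywords recovers it.
plain = {k: v for k, v in jsonld.items() if not k.startswith("@")}
```

To an RDF consumer the dropped keys are the valuable part; to everyone else, as ivan puts it, they are irritating noise.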
17:11:20 gkellogg: We need to include people comfortable with these technologies.
17:11:43 ivan: I disagree. People who don't know anything about RDF — they just want it in JSON. There are loads of people there.
17:11:49 hadleybeeman: I agree with that
17:11:58 +1 Ivan
17:12:03 ivan: Even as an RDF person — this is a painful reality.
17:12:08 ack danbri
17:12:22 danbri: We have a spectrum of enthusiasm for RDF.
17:12:30 q?
17:12:51 ErikMannens has joined #CSVW
17:12:51 ...We need to mush these interests together. With Schema.org and Microdata (designed to be super simple for publishers) — even those were too complex.
17:13:09 ...These developers aren't thinking in terms of triples or graphs.
17:13:17 q+
17:13:38 ... Saying RDF is the answer because you can serialise to RDF/XML — long histories of failings here. Let's not spend the next 10 years doing the same with JSON.
17:13:38 q+
17:13:47 ack ErikMannens
17:13:55 ErikMannens: What's wrong with profiles? Simple profiles? More extended profiles?
17:14:03 XML is not fading away - its use is growing. Honestly (Liam assures us)
17:14:24 ivan: The simple mapping to RDF is there. The definition is strictly done on the conceptual level in RDF. If someone wants to go that route and get JSON-LD, it's fine.
17:14:56 ...If they do that, or do direct JSON, the two things should be close. But we don't talk about that. The document should be readable for someone in that context.
17:15:30 q?
17:15:36 ...The context is a good example. If you serialise the result of the RDF mapping into JSON-LD, then you will have all those things there. But if you serialise directly in JSON, you will not.
17:15:37 ack fjh
17:16:09 ErikMannens has joined #CSVW
17:16:24 ivan: ...If you want to somehow be in the RDF world, then great. But if you're not — those are noise. Irritating noise.
17:16:57 q?
17:17:06 gkellogg: The tide seems to be moving toward well-understood structured data in a lot of communities that were hostile to RDF. I don't know that we need to pander to a JSON mapping that doesn't contain some aspects of this.
17:17:15 danbri: we'll pick this up later.
17:17:37 topic: meeting goals
17:17:48 @phila - I agree with Liam's comment: lots of legacy communities still using XML, and other communities that are emerging, such as High Energy Physics, very interested in XML. Just not sure about the CSV-XML connection.
17:18:24 topic: Review our implementation types
17:19:06 Any volunteers to take over scribing from Hadley?
17:19:07 jenit: We've looked at RDF, XML, JSON — that's one set of implementations. But I'm also interested in validators (validating a set of CSV files against the metadata to say if it's formatted correctly, has the right columns, etc.)
17:19:12 rrsagent, generate minutes
17:19:12 I have made the request to generate http://www.w3.org/2014/10/27-csvw-minutes.html fjh
17:19:16 http://csvlint.io/
17:20:02 gkellogg has joined #csvw
17:20:13 jeniT: (shows demo of csvlint.io)
17:21:29 ... Validation tools are really handy. We in the UK have a push to get local government to publish data about public toilets. The people pushing it defined a schema for the data, and 400+ local authorities had to validate against that.
17:21:46 ...That makes it easy to pull all of those datasets together into something consistent and coherent.
17:22:03 q?
17:22:58 ...Another important implementation: display of CSV. GOV.UK, data.gov.uk, GitHub — have displays of CSV as a table. They'll often add on filtering or sorting options.
17:23:18 ... it's important and useful to know what the data type of the column is, so you can filter it the right way.
17:23:26 ... using jQuery DataTables
17:23:26 q+
17:23:39 ... www.datatables.net
17:23:51 side remark… seeing csvlint.io reminds me somewhat of http://www.w3.org/2001/sw/wiki/RDF_Alerts which we did some years ago… that was RDF-specific though; not sure whether any of that is useful here.
17:24:04 ... Turn that CSV into an HTML table. You can imagine having pop-ups over the cells if they have annotations, having a metadata view, etc.
17:24:07 "display" / viewers
17:24:36 jenit: So those are the three implementations I think of: mappers, validators, and viewers.
17:25:11 (me: import from bytes into tabular data model, … but that's more IETFish)
17:25:40 q+ to talk about error messages & warnings
17:25:48 ack ivan
17:26:19 ivan: It's clear to me what the first two categories do for us. I'm not sure how the third category fits into the picture of checking our own work. Importing is definitely not in our charter. We are not defining the byte stream to tabular conversion — that's in the IETF spec.
17:26:22 q+ ericp
17:26:36 ... What are the implementations that we have to take seriously as part of the rec track?
17:26:47 ack jenit
17:26:47 JeniT, you wanted to talk about error messages & warnings
17:27:30 jeniT: It is useful to talk about the display in a non-normative fashion.
17:27:31 q+
17:27:58 ...Also, in what we need to do for validators: we need to talk about errors, warnings, etc.
17:28:04 ericP-mobile has joined #csvw
17:28:06 ivan: Do we have to define standard errors?
17:28:19 jeniT: I think so. Not standard wordings, but codes for them. I think it's helpful.
17:28:34 present+ Richard Ishida
17:28:34 Richard: (introductions)
17:29:15 ErikMannens has joined #CSVW
17:29:33 Richard: Display will be different. Internationalization are looking at the forms in HTML, numeric formats in different languages, etc. There are problems associated with that that may be relevant here.
17:30:10 q+
17:30:11 jenit: for CSV, unlike a lot of other data, it has the goal of being both machine-readable and human-readable. So we do have numerical formats that are locale-specific. (Dates, numbers, etc.)
17:30:33 Richard: You may need to account for locale in the metadata.
17:30:59 q?
17:30:59 ... As HTML does, a lot is done in the browser.
17:31:09 ... A lot of a locale is a language plus local settings.
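(Editorial sketch: the validator role JeniT describes — check a CSV against a schema for the right columns and for cells that parse against their declared datatype, reporting errors rather than stopping. The schema shape and datatype names here are assumptions, not the group's metadata vocabulary.)

```python
import csv
import io

PARSERS = {"string": str, "integer": int, "number": float}

def validate(csv_text, schema):
    """Return a list of error strings; an empty list means the file is valid."""
    errors = []
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    expected = [col["name"] for col in schema["columns"]]
    if header != expected:
        errors.append(f"expected columns {expected}, found {header}")
    for lineno, row in enumerate(reader, start=2):
        for cell, col in zip(row, schema["columns"]):
            try:
                # A cell is valid if it parses as its declared datatype.
                PARSERS[col.get("datatype", "string")](cell)
            except ValueError:
                errors.append(f"line {lineno}: {cell!r} is not a {col['datatype']}")
    return errors
```

This is the pattern behind the public-toilets example: one shared schema, hundreds of publishers validating against it before their files are pulled together.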
17:31:14 q- ericP 17:31:27 danbri: Shall we have a joint meeting about this? 17:31:39 jenit: we have a session on data types later today. Useful for this. 17:31:45 ack hadleybeeman 17:31:59 hadleybeeman: things like the display on html page may not be as relevant for this WG but it fits well with Data On Web Best Practices WG 17:32:01 hadleybeeman: The display issue may be relevsant to DWBP group 17:32:12 we are looking at barriers to use, if average user can't see/read/understand ... 17:32:15 q? 17:32:34 hadley: display may not fit this WG but it may fit well in DWBP. 17:32:49 ack jtandy 17:32:53 jtandy: For us, in terms of display, we often want to get data into just plain JSON. "javascript goodness" can then be applied. 17:33:13 q+ 17:33:20 ... In internationalization, we look at right-to-left and top-to-bottom languages too. 17:33:57 ivan: we have Japanese representation here. China are pretty agreeable to doing everything horizontally. Japan this is not so. 17:34:08 Vote of thanks to Hadley for scribing first (busy) session 17:34:16 RRSAgent, draft minutes 17:34:16 I have made the request to generate http://www.w3.org/2014/10/27-csvw-minutes.html phila 17:42:18 ErikMannens has joined #CSVW 17:51:40 JeniT has joined #csvw 17:52:01 daveL has joined #csvw 17:52:33 present+ DaveLewis 17:54:46 bill-ingram has joined #csvw 17:55:01 Best Practices for Multilingual Linked Open Data Community Group may be willing to help with internationalisation issues 17:55:13 http://www.w3.org/community/bpmlod/ 17:56:39 fjh has joined #csvw 17:56:45 ErikMannens has joined #CSVW 18:00:01 jtandy has joined #csvw 18:02:21 AxelPolleres has joined #csvw 18:02:37 gkellogg has joined #csvw 18:02:46 scribenick: gkellogg 18:03:43 Topic: Tabular metadata 18:03:54 rrsagent, draft minutes 18:03:54 I have made the request to generate http://www.w3.org/2014/10/27-csvw-minutes.html ivan 18:04:01 ericstephan has joined #csvw 18:05:00 JeniT: talking about metdata representation for individual 
tables, but also how it can be applied to columns 18:05:11 … title, description, date, … 18:05:18 s/metdata/metadata/ 18:05:28 http://w3c.github.io/csvw/metadata/#common-properties 18:05:33 q+ on provenance 18:05:41 … currently in “Metadata Vocabulary” spec sec 3.3 18:06:00 danbri has joined #csvw 18:06:08 … This pulls in and references all Dublin Core metadata terms 18:06:40 … In some cases terms describe data values, object, natural language string, or something with a particular date format 18:06:58 … Three areas to discuss. 18:07:23 … 1) what the list of properties should be, perhaps dcat, or schema.org instead of DC. Perhaps our own set. 18:07:46 q+ to ask about existing implementations 18:07:57 danbri: if they’re a DC-based project, they may need to use DC for everything. 18:08:00 q- 18:08:14 JeniT: sometimes it’s the consumer that cares most about vocabulary mapping, rather than the publisher. 18:08:42 … We need a list, as we’re expecting validators and mappers to reject properties not on the list (to avoid misspellings). 18:09:06 … 2) how are the properties defined, within the spec or outside. (Constraints on what we can point to) 18:09:29 … 3) How is metadata used to inform the mapping to different formats. 18:09:39 q+ 18:09:52 do we need/want any new properties on document level, anything which is not covered in DC, DCAT, PROV? Do we need to specify mappings to those? 18:09:58 ack AxelPolleres 18:09:58 AxelPolleres, you wanted to comment on provenance 18:10:07 AxelPolleres: there are two types of metadata, document-level and structural. 18:10:20 … The former is also around provenance, the second is for processing instructions. 18:10:42 http://www.w3.org/TR/prov-dc/ 18:10:46 … Also consider PROV vocabulary, there are notes on how to map PROV to DC. 18:11:06 … Do we need to ensure that there are mappings between the two. 18:11:24 JeniT: we can just pick up DC terms, or we could say use DCAT or ... 18:11:44 q?
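[Editor's note: a toy sketch of the validation rule JeniT describes above — validators warn on unprefixed properties that are not on the agreed list (to catch misspellings), while prefixed terms such as "dc:creator" pass through. The whitelist and property names here are hypothetical; the WG had not yet fixed the final list.]

```python
# Hypothetical whitelist of unprefixed properties; the WG's final list
# was still under discussion at this meeting.
UNPREFIXED_WHITELIST = {"title", "language", "name", "columns", "@id", "@context"}

def check_metadata(metadata):
    """Return warnings for unknown unprefixed properties in a metadata doc."""
    warnings = []
    for key in metadata:
        # Prefixed terms (containing ':') are always allowed through.
        if ":" not in key and key not in UNPREFIXED_WHITELIST:
            warnings.append("unknown unprefixed property: %s" % key)
    return warnings

doc = {
    "title": "Palo Alto trees",
    "language": "en",
    "dc:creator": "City of Palo Alto",  # prefixed, passes through
    "titel": "misspelled title",        # unprefixed and unknown: warn
}

print(check_metadata(doc))
```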
18:11:56 … In this case “provenance” is the DC term, not necessarily relating to a different spec. 18:12:07 q+ to ask about UCs and subsetting 18:12:09 ‘provenance’ isn’t in the schema.org set of terms 18:12:20 hadleybeeman: do we have any way of knowing what is used more between the different formats? 18:12:20 ack hadleybeeman 18:12:20 hadleybeeman, you wanted to ask about existing implementations 18:12:21 ack hadleybeeman 18:12:27 q+ 18:12:46 danbri: Google has information for microdata/rdfa/json-ld, but not from other RDF formats. 18:12:57 … Clearly, we’re going to see a lot of schema.org. 18:13:14 hadleybeeman: what would these numbers tell us if we could get them? 18:13:18 is it in us to define/extend mappings between - for us useful - properties among schema.org, DC, DCAT, PROV, e.g. extending http://www.w3.org/TR/prov-dc/ 18:13:22 q? 18:13:23 q+ 18:13:38 q+ to talk about how we should always enable extension names 18:13:43 q+ 18:14:00 ericstephan_ has joined #csvw 18:14:00 q+ ericstephan 18:14:31 ivan: Jeni said that “these terms” are the only terms you should use, which seems to be dangerous. 18:14:36 ack ivan 18:14:51 q- 18:14:52 ack me 18:14:54 danbri, you wanted to ask about UCs and subsetting 18:15:00 JeniT: That should be “un-prefixed” terms, it’s really about unprefixed terms. 18:15:09 q+ 18:15:17 danbri: can we be use-case driven? DC started with 15 terms, has grown over the years. 18:15:28 q? 18:15:29 ack danbri 18:15:29 … Can we use the use cases to pare down the set of terms we need to support. 18:16:01 jtandy: National Archives has some economic data which includes publisher, date, time, obvious stuff. 18:16:23 danbri: perhaps we can look at CSVs in repo. 18:16:29 q? 18:16:38 ack bill-ingram 18:17:07 bill-ingram: in the library, everyone in metadata knows what DC is, but it took a while to get there. People are starting to talk about schema.org.
18:17:36 … Most of this relates to the software we use, for that DC is the core metadata for describing objects. It’s starting to change. 18:17:40 +1 bill-ingram 18:17:50 q? 18:17:51 from the UC doc: see http://w3c.github.io/csvw/use-cases-and-requirements/#UC-PublicationOfNationalStatistics 18:17:53 FWIW, CKAN also has some metadata properties which I am not sure how far they are aligned with e.g. DC, etc., are they? 18:17:53 … I’m interested in schema.org, but it always ends up talking about mapping back to DC. 18:17:58 ack laufer 18:17:58 q+ to propose using the overlap between the various specs 18:18:47 laufer: there may be some mandatory items. 18:18:58 q? 18:19:00 … Some may be mandatory, others optional. 18:19:20 JeniT: different organizations always create their own profiles for what they expect. 18:19:21 ack ericstephan 18:19:46 ericstephan: predominance of data is in DC. I’m sensitive to DCAT and DC, as they’re forward thinking. 18:20:34 … looking at requirements derived from use-cases, that would be a way to help define a core set of metadata we should be considering, or if there are obvious glaring holes. 18:20:47 … I am worried about getting lost in the detail, however. 18:20:48 ack jtandy 18:21:10 ericP-mobile has joined #csvw 18:21:16 jtandy: we previously agreed to a short-list of about 15 terms. 18:21:42 … and of section 3.4.2 18:21:52 http://w3c.github.io/csvw/metadata/#optional-properties 18:22:24 … these are properties that relate to core information expected to be associated with CSVs and used in mapping. 18:22:36 q+ 18:23:14 ivan: spatial and temporal were unclear if they should be part of the core 18:23:42 JeniT: I think that list was “plucked out of the air”. There are so many groups who have thought about this, we shouldn’t re-do that thinking. 18:24:02 q?
18:24:07 ack jenit 18:24:08 JeniT, you wanted to propose using the overlap between the various specs 18:24:38 jtandy: we were looking at three main things: validation, mapping and display. 18:24:48 q? 18:24:56 … What metadata do we need to ensure that these mappings can occur? This list doesn’t form that. 18:25:38 … Maybe we can off-load choice of terms to Best Practices WG. cc/hadleybeeman 18:25:58 hadleybeeman: we haven’t gotten into this too much yet. 18:26:21 … We need to talk about this more, but that kind of a division of labor makes sense. 18:26:50 jtandy: It doesn’t matter how you’re publishing data, these questions are universal. 18:26:57 … It really should be about validation of parsing. 18:26:58 q+ 18:27:31 hadleybeeman: we’re shying away from specifying specific vocabularies, as there are many different needs. 18:27:54 jtandy: but you probably should be able to say that there should be a license, but there are many ways to express it. 18:28:05 ErikMannens has joined #CSVW 18:28:33 q? 18:28:35 laufer: we can’t make a complete list, but we can give examples of vocabularies which can do it. 18:29:23 ivan: until now everything is mapped to DC. The question is should we use schema.org or DCAT instead? 18:29:46 … We tried to specify a very small core, but leave the details up to the users. 18:30:09 … This list was for the small core; it does not exclude the use of other vocabularies. 18:30:36 q+ to talk about definition through implementation 18:30:42 … Do we define the 5..15 terms ourselves, or leave it open to the user to decide? 18:30:43 ack ivan 18:30:56 Is the question here: Are we defining a vocabulary, or pointing to existing work? 18:31:30 … What does it mean if we pick “language”, “title”, and “provenance”? Do we define a new core (Santa Clara Core?) 18:31:45 i thought the purpose of picking the terms was to enable a modicum of validation 18:31:51 ack fjh 18:32:01 fjh: What are the normative assertions, and how do you test them?
18:32:19 … If you push too much off to the Best Practices group, you might not have something testable. 18:32:36 q+ re DC testability 18:32:37 s/nodical/modicum/ 18:33:08 JeniT: two issues: when testing the metadata file, a validator has to generate a warning. 18:33:18 … The other level, is the actual use in deeper validation or mapping. 18:33:40 … For example, the “title” property might be used to validate column titles to be what is expected. 18:33:55 q? 18:33:55 q? 18:33:57 q+ on what are the expectations on validation 18:34:02 ack jenit 18:34:02 JeniT, you wanted to talk about definition through implementation 18:34:03 ivan: also need to check that the value given to a language mapping is a real language. 18:34:33 JeniT: That’s a way to distinguish between first-level terms, and other terms. 18:34:55 q+ 18:35:02 … The implication is that if you wanted to use, say, a license, they would need to use a prefixed-term. 18:35:08 q+ 18:35:12 ack me 18:35:12 danbri, you wanted to discuss DC testability 18:35:23 ErikMannens has joined #CSVW 18:35:38 danbri: we’re pushing on 20 years of DC work; never been too rigid. Everything’s optional. 18:35:49 q? 18:35:54 … If this group starts to make stronger claims about DC, that might be an issue. 18:36:04 AxelPolleres: what are the expectations on validity? 18:36:28 … For some things it’s lexical, but for others it is more challenging (license, for example). 18:36:47 ack axelpolleres 18:36:47 AxelPolleres, you wanted to comment on what are the expectations on validation 18:36:48 q+ to say that I expect that the shape view will be that it's encouraged to restrict Dublin Core 18:36:56 … Do we want to validate other types of things? Recommendations of particular strings to use? 18:37:19 q+ 18:37:23 q+ to note that DCMI (and schema.org) can be changed/improved/augmented too - we can push ideas upstream 18:37:28 ack laufer 18:37:30 … Some things we can validate, others we can’t; doesn’t mean they’re not important. 18:37:47 e.g.
license IS very important to be declared. 18:37:59 I note the many crossovers with DWBP WG 18:38:04 laufer: we need to classify types of data. Structural data? 18:38:18 … How important are different types of data for searchability, for example. 18:38:20 q+ to ask whether the world would collapse if we stay with a few Dublin Core terms 18:38:37 … What we can do is information about the structure of the data (rows and columns, datatypes, etc.) 18:39:05 … license, provenance, … Difficult to test these, you can test syntactically. 18:39:26 q+ to ask about relevance of RDF Shapes 18:40:02 ack jtandy 18:40:37 jtandy: If we’re focusing the metadata vocabulary on what is necessary for validation, then we shouldn’t spend too much time worrying about it. 18:40:44 ack ericp-mobile 18:40:44 ericP-mobile, you wanted to say that I expect that the shape view will be that it's encouraged to restrict Dublin Core 18:41:25 ericp: I’m working with DC application profiles group who wants to make sure there are some ways of describing restrictions on publication profiles. 18:41:27 q- 18:41:43 … We’re also starting work at the W3C on this. 18:42:21 danbri: the DC view is that such restrictions make sense in a particular context. For us the question is: do we decide this? 18:42:35 ack ericstephan 18:42:41 … Uses by government vs search engines may be different. 18:43:09 q? 18:43:12 ack me 18:43:12 danbri, you wanted to note that DCMI (and schema.org) can be changed/improved/augmented too - we can push ideas upstream 18:43:12 +1 to ericstephan 18:43:13 ericstephan: our focus has been generic document-level. what we’re going to do is give insight on describing CSV contents, that’s what people will look for. 18:43:38 s/bill-ingram/ericstephan/ 18:43:41 ack ivan 18:43:41 ivan, you wanted to ask whether the world would collapse if we stay with a few Dublin Core terms 18:43:43 q?
18:44:12 +1 to DanBri: make things work together (addition: rather than defining something new) 18:44:33 ivan: I wonder if the metadata document maybe just went too far? Perhaps we should just take the bare minimum (2-3 terms) but we use DC explicitly. 18:44:56 … We do rely on DC once and for all, as we have a mechanism for using other vocabularies. 18:45:18 … What counts is the metadata for describing the structure. 18:45:36 … Accept DC, use DC, make it clear that you can use schema and DCAT by using prefixes or contexts. 18:46:09 … a validator may then check these things. 18:46:17 q? 18:47:02 q+ to suggest dc:locale per richard ishida's contrib 18:47:11 … Just a few un-prefixed terms from DC, not defined by us. 18:47:40 JeniT: We need to have some restrictions on the values of these terms. 18:47:52 q? 18:47:53 q? 18:48:07 q+ 18:48:47 jtandy: For the validation to work, we need consistent syntax. I can imagine not caring about what the @context says, because I can validate the structure. It may map to schema, or to DC. 18:49:04 ivan: I’m saying we don’t define the value of the “language” value for example. 18:49:33 q+ to ask if the interoperability between metadata sets is part of the scope of this working group? 18:50:27 ack danbri 18:50:27 danbri, you wanted to suggest dc:locale per richard ishida's contrib 18:51:02 danbri: locale has been described as being important, but isn’t in our list. 18:51:02 ack bill-ingram 18:51:21 bill-ingram: we’d prefer that everything be prefixed, but if un-prefixed, we’d like them to map to dc? 18:51:37 ivan: perhaps, but I suggest that we only allow 5 terms to be unprefixed. 18:52:25 q+ to propose a list of the unprefixed properties 18:52:25 q? 18:53:04 hadleybeeman: scope question: If I have a dataset using DC, and you have one using DCAT, does that break the purpose of this WG? Or is it okay as long as they’re each valid? 18:53:20 JeniT: Just validity.
18:53:46 ack hadleybeeman 18:53:46 hadleybeeman, you wanted to ask if the interoperability between metadata sets is part of the scope of this working group? 18:53:48 ack jenit 18:53:49 JeniT, you wanted to propose a list of the unprefixed properties 18:54:14 JeniT suggests resolution "we are going to stick to a small set of properties that are used in validation or mapping" 18:54:35 gregg: i heard a couple things, … 1) at a very surface level terms are used, e.g. title as a string in json doc... 18:54:43 …this doesn't necessarily say that it maps to dc:title 18:54:57 … is everyone on board with this, or is there a feeling that it must map to DC title 18:55:00 q? 18:55:54 ivan: we define the meaning of terms according to DC, but it may be mapped. If mapped, it must be dc:title. 18:55:59 q+ 18:56:08 … Perhaps through entailment? 18:57:08 q+ 18:57:28 gregg: cautioning that looking at surface level of json where you also expect a mapping could be problematic 18:57:29 ack ericp-mobile 18:58:08 ericp: Is the WG receptive to the Best Practices coming back and saying that there may be some imposed restriction? 18:58:37 q? 18:58:37 hadleybeeman: I’d say that that is a different place for the discussion, but might include the same people 18:59:29 JeniT: I think that it’s reasonable for other groups to decide practices that we should conform to. 19:00:02 q+ 19:00:34 ivan: we use JSON, and when we can, the syntax conforms to JSON-LD. The metadata can be considered as JSON-LD by an implementor if it wants 19:00:50 … So there might not be a context? 19:01:05 q? 19:01:15 q+ to propose language as metadata for tables, title and language for columns 19:01:26 ericp, how bout calling it “best practices” rather than “imposed restrictions”? 19:01:35 q? 19:01:40 ack laufer 19:01:42 ack hadley 19:02:11 q+ AxelPolleres 19:02:19 hadleybeeman: I believe the REC track process is such that we (Best Practices) can’t decide things without considering the needs of other groups.
19:02:20 q+ on asking CSVW vs DWBP 19:02:21 q? 19:02:27 ack jenit 19:02:27 JeniT, you wanted to propose language as metadata for tables, title and language for columns 19:02:41 JeniT: perhaps we can make a decision? 19:03:20 … I’d say that “language” at document level (or group of CSV files), and “title” and “language” at the column level. 19:03:36 … “locale” is part of “language”. 19:03:46 timeless has joined #csvw 19:03:48 I'm logging. I don't understand 'this meeting spans midnight <- if you want a single log for the two days', timeless. Try /msg RRSAgent help 19:03:53 q? 19:04:17 What’s wrong with the list on http://w3c.github.io/csvw/metadata/#optional-properties ? I would like to make an attempt to argue for that list. 19:04:21 … Title is natural-language string at top of column. 19:04:45 … Name is like a variable name for that column, what is used in the mapping. Typically has a constrained syntax. 19:04:53 … @id is there for JSON-LD compatibility. 19:05:37 … it must be an IRI. 19:05:47 "locations for toilets, e.g. @id "lat" for latitude of toilets 19:05:59 gregg: that would be ok if we had a base location as you could construct an IRI 19:06:00 q? 19:06:17 q+ to note that additional properties like "base" will be required 19:06:25 ivan: that means that “title” is fundamentally different than other properties. 19:06:34 ack ax 19:06:34 AxelPolleres, you wanted to comment on asking CSVW vs DWBP 19:06:36 ack AxelPolleres 19:06:58 AxelPolleres: what’s wrong with the core list? I think the things we need are present in that list. 19:07:21 … It may be arbitrary, but it seems good. Better than something overly constrained. 19:07:39 timeless has left #csvw 19:07:50 q? 19:08:02 q+ 19:08:07 q+ 19:08:20 ack jtandy 19:08:20 jtandy, you wanted to note that additional properties like "base" will be required 19:08:24 +1 to starting small (3) 19:08:34 ErikMannens has joined #CSVW 19:08:37 q?
19:08:44 jtandy: there will be other things that are necessary, but they will emerge. 19:08:51 ack me 19:08:55 q+ 19:09:39 danbri: A small list is easier to stand behind; a medium sized list may give the false impression that we’ve thought deeply about it. 19:10:21 ack gk 19:10:39 gkellogg: if we overly restrict use of simple strings as properties within a json file, 19:10:57 …we are violating expectations of many json users who like js with dot notation (i.e. objects) 19:10:59 q? 19:11:07 jtandy: context could ... 19:11:15 gkellogg: context could … sure, ... 19:11:30 jtandy: only lang and title are the terms which we want to validate at the surface level 19:11:33 maybe a good idea to not put something expressed elsewhere (DC) into our standard… (retracting my concerns from before, if we point to that as a “for instance” option to extend the meta-data vocab). 19:11:42 gkellogg: also looking for bare terms that don't map 19:11:45 ack iv 19:12:02 ivan: replying to gregg, … what we are talking about here are the usual metadata terms 19:12:23 ivan: 90% of metadata file consists of terms describing the csv file 19:12:27 those are of course unqualified 19:12:30 ivan: we need to be careful: what we’re talking about is the “usual metadata terms”. 90% of the metadata file consists of terms describing the structure of the CSV file. Those are of course unqualified. 19:12:50 … There are other unqualified terms; we’ve been discussing 10% of the content of a metadata file. 19:12:54 Observers: Please consider volunteering to scribe next session. 19:12:56 q? 19:13:41 … Having a very restricted version as proposed by JeniT as being unqualified is fine. We can then see if “the other group” comes up with more required terms. 19:13:54 q? 19:14:11 jenit, want to take a poll on your proposal to bridge to lunch? 19:14:36 q? 19:14:44 … We really need to solve structural terms. The use of licence and title, say, should be solved by the Best Practices group.
Leave these open until the DWBP has something to say. 19:14:45 +1 Ivan 19:14:51 danbri: 19:14:58 q? 19:15:05 q? 19:15:10 @ivan: +1 19:15:52 PROPOSED RESOLUTION: We will define the terms ‘title’ and ‘language’ (for columns) and ‘language’ (for table groups down), provide examples using qualified terms for other metadata vocabularies, and be guided by DWBP wrt recommending other particular metadata terms to recommend 19:16:24 +q one last question 19:16:27 s/BPWG/DWBP/ 19:16:32 ack a 19:16:51 axel "what about encoding?" (utf-8 etc) 19:17:33 AxelPolleres: what about encoding metadata? 19:17:34 q? 19:17:40 JeniT: described elsewhere. 19:17:52 PROPOSED: We will define the terms ‘title’ and ‘language’ (for columns) and ‘language’ (for table groups down), provide examples using qualified terms for other metadata vocabularies, and be guided by DWBP wrt recommending other particular metadata terms to recommend 19:17:57 +1 19:17:57 +1 19:18:00 +1 19:18:01 +1 19:18:01 +1 19:18:01 +1 19:18:03 +q 19:18:04 +1 19:18:05 +1 19:18:08 -q 19:18:10 +1 though I'm just observing. But this makes sense from DWBP's perspective too 19:18:21 +1 19:18:36 +1 19:18:37 encoding is covered by the syntax http://www.w3.org/TR/tabular-data-model/#encoding 19:18:45 RESOLVED: We will define the terms ‘title’ and ‘language’ (for columns) and ‘language’ (for table groups down), provide examples using qualified terms for other metadata vocabularies, and be guided by DWBP wrt recommending other particular metadata terms to recommend 19:18:54 +1 (as observer) 19:19:38 Hitoshi has left #csvw 20:04:02 AxelPolleres has joined #csvw 20:04:21 bill-ingram has joined #csvw 20:04:33 JeniT has joined #csvw 20:04:33 scribe: AxelPolleres 20:04:34 gkellogg has joined #csvw 20:05:02 danbri has joined #csvw 20:05:13 jtandy has joined #csvw 20:05:28 ErikMannens has joined #CSVW 20:07:30 jeniT summarizing result of discussion before lunch.
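[Editor's note: a minimal sketch of what the resolved terms could look like in a metadata file — ‘language’ at the table level and ‘title’ plus ‘language’ at the column level, with columns inheriting the table default. The JSON shape is illustrative only; the metadata spec was still a draft at this meeting.]

```python
# Illustrative metadata: table-level 'language' default, column-level
# 'title' and 'language' overrides, per the resolution above.
metadata = {
    "language": "en",  # default for the table (group)
    "columns": [
        {"name": "rua", "title": "Rua", "language": "pt"},
        {"name": "population", "title": "Population"},  # inherits "en"
    ],
}

def column_language(meta, index):
    """Column-level language, falling back to the table-level default."""
    return meta["columns"][index].get("language", meta["language"])

print(column_language(metadata, 0), column_language(metadata, 1))
```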
20:08:30 ericstephan has joined #csvw 20:08:45 ivan has joined #csvw 20:09:15 … title and language being the only document level meta-data attributes standardised. 20:09:19 we should use BCP47 for languages 20:09:27 bill-ingram1 has joined #csvw 20:09:29 dc http://dublincore.org/documents/2012/06/14/dcmi-terms/?v=terms#terms-language 20:09:42 Addison: suggest to use language tags (BCP47) for languages 20:09:56 q? 20:10:05 s/Addison/Addison Phillips/ 20:10:30 Dan: if more than two codes are applicable, should we repeat the property? 20:12:02 Richard: several languages may appear in a doc, the intended language of the user is the top level of the meta-data, but particular cells or columns could have different language. 20:12:20 Ivan: there should be levels for meta-data at all levels of granularity. 20:12:28 ericP-mobile has joined #csvw 20:12:38 JeniT_ has joined #csvw 20:13:02 (example in mind: col in table that might be any of the croatian/bosnian/serbian lang, some in cyrillic some in latin script, but lacking formal per-cell details) 20:13:04 … additional information about “all languages” used/mentioned in the document? 20:13:44 Richard: top level is “who are the users”, second level is “rendering”. 20:14:02 q+ to ask if there are use cases for language-independent locales? 20:14:24 ivan: we didn’t differentiate that so far 20:14:26 q- ericP-mobile 20:15:40 ericP: trivial use cases, like numeric data have no language. 20:15:59 Addison Phillips: there are language tags for “no language” 20:16:38 topic: datatypes 20:17:40 jeniT: datatypes per columns, cell, etc. are a common issue, e.g. xml:schema, string-value vs. semantic value 20:18:24 … in XML schema string values are constrained. 20:18:41 q? 20:19:06 … to ISO format, extremely difficult for CSVs if generated locally. 20:19:28 … we would like to be able to map from type to particular formatting for that type. 20:19:52 q+ 20:20:04 bjdmeest has joined #csvw 20:20:13 Addison Phillips: e.g.
there are other calendars besides Gregorian, this makes it much more complex 20:20:46 ack ivan 20:20:53 (on the example of data “27/10/2014” vs “27th October 2014” on the whiteboard) 20:21:08 q+ r12a 20:21:21 e.g. http://www.w3.org/TR/xpath-functions-30/#syntax-of-picture-string 20:21:22 ivan: trying to see whether there’s a standard on the “picture values” 20:21:27 q? 20:21:53 http://cldr.unicode.org/ 20:22:18 Addison: refers to ICU library 20:22:48 (is that this one http://site.icu-project.org/ ?) 20:23:05 http://www.unicode.org/cldr/charts/26/by_type/index.html 20:23:23 q? 20:23:52 Richard: different schemas, XML Schema (referring to ISO), HTML, UNICODE… 20:24:31 Ivan: we are rather talking about how to “understand” certain strings as “ISO” … 20:25:14 q? 20:25:22 ack r 20:25:26 http://www.unicode.org/cldr/charts/26/summary/en.html 20:25:37 Addison Phillips: unicode defines all those here : http://www.unicode.org/cldr/charts/26/summary/en.html 20:26:44 Richard: you might need more than just the picture strings, e.g. ‘$’ meaning USD or Australian Dollars or HK Dollar, etc. 20:27:06 aphillip has joined #csvw 20:27:16 http://en.wikipedia.org/wiki/ISO_4217 20:27:21 Addison Phillips: 3-char code for currencies: ISO4217 20:27:29 http://www.unicode.org/reports/tr35 20:27:57 q? 20:28:09 q+ to ask where to stop 20:28:17 s/Addison Williams/Addison Phillips/ 20:28:34 q? 20:28:45 q+ 20:28:57 ack axel 20:28:57 AxelPolleres, you wanted to ask where to stop 20:29:05 "There are things like unit ontologies too... 20:29:10 .. could go arbitrarily far 20:29:15 … standard units, … 20:29:21 … i am not clear on where this would stop 20:29:28 e.g. the number of cars per 1000 people 20:29:33 [see also QUDT] 20:30:07 r12a has joined #csvw 20:30:08 ack jtandy 20:30:13 ivan: we shouldn’t go beyond XSD datatypes. 20:30:32 q+ 20:31:01 q+ 20:31:10 Jeremy: maybe we can add in metadata a script that transforms “picture strings” into prescribed format before validation.
20:31:20 ack r12a 20:31:25 s/trnasforms/transforms/ 20:31:25 … we should allow people to work around. 20:31:36 q- 20:32:00 Richard: it would be easier if you’d go with one global standard. 20:32:26 ivan: if you consider the data out there, that wouldn’t work, in reality, everybody uses what they want. 20:33:01 q+ 20:33:34 Addison Phillips: range of date variation formats is huge. 20:33:35 gkellogg has joined #csvw 20:33:37 q+ 20:33:48 q+ to ask the coverage of picture formats 20:34:08 Ivan: is there a relatively simple picture string format we could refer to and use, which covers ~70% of cases? 20:34:35 … that we can refer to and otherwise, for special cases, allow preprocessing? 20:35:10 Addison Phillips: e.g. month abbreviations in various languages already make it complex. 20:36:06 Ivan: month abbreviations should be part of locale. We should look around usual libraries in common prog. langs 20:36:32 … I am uneasy with saying “either use an ISO string or give me a program” 20:36:53 q? 20:37:20 ack erics 20:38:23 ericstephan, … see http://www.w3.org/TR/2014/WD-tabular-data-model-20140327/#excel for special casing documentation around Excel 20:38:25 erik: we specify in the tabular metadata doc explicitly about e.g. Excel and the Date-formatting they use 20:38:51 Re spreadsheets, I think the Open Document Format supports dc:date, if that helps any 20:38:53 … that is a technology-based solution. 20:38:57 ack r 20:40:02 Richard: HTML will require you to use one standard format for dates, why not start out with that format. 20:40:16 JeniT: because there are masses of documents that don’t use it. 20:40:40 ack ericP-mobile 20:40:40 ericP-mobile, you wanted to ask the coverage of picture formats 20:41:01 Richard: the argument about prescribing utf-8 is similar.
20:41:32 Here is how the ODF spec handles it: http://docs.oasis-open.org/office/v1.2/os/OpenDocument-v1.2-os-part1.html#__RefHeading__1416366_253892949 20:42:08 Addison Phillips: CLDR contains 100s of locales, not everything, but for data out there it has decent coverage 20:42:18 q? 20:42:43 JeniT, how much time needed for rest of datatypes topics in agenda? 20:42:52 ericP: the value of being able to read existing data is not that much value. 20:43:02 ErikMannens has joined #CSVW 20:43:07 we should move on if we can, but it would be good to get a direction of travel 20:43:26 q+ to ask about the value of annotating existing data with scripts. 20:43:27 phila has joined #csvw 20:43:42 q+ to ask i18n folks how to continue this 20:44:39 ericstephan has left #csvw 20:45:09 ericstephan has joined #csvw 20:45:16 ericP: we could do several levels, the question we want to ask ourselves is where to stop. 20:45:55 axel: q … if we ask ourselves how far we want to go. What makes us believe people who are not willing to convert their data into a specific format, … why will they go produce metadata to do the mappings 20:46:00 jenit: it could be a 3rd party 20:46:01 q? 20:46:04 ack axel 20:46:04 AxelPolleres, you wanted to ask about the value of annotating existing data with scripts. 20:46:21 Ivan: metadata can be decoupled 20:46:23 ack me 20:46:23 danbri, you wanted to ask i18n folks how to continue this 20:46:30 Axel: that answers my question then. 20:47:15 JeniT: i’m inclined to get us just to Picture-String without locale (e.g. no multi-language month abbreviations) 20:47:57 JeniT: … that seems like a good direction for me. 20:48:00 q+ r12a 20:48:35 Ivan: not sure, we make a requirement that is weaker than most of the prog. lang. libraries out there. 20:48:56 q+ 20:49:41 ack r 20:49:48 Richard: currency examples not covered. 20:49:55 (eg. $) 20:49:58 q?
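[Editor's note: the whiteboard example was reading "27/10/2014" via a picture string such as CLDR's "dd/MM/yyyy" (UTS #35). As an analogy only, the sketch below maps a tiny, hypothetical subset of such pictures onto Python strptime directives — strptime patterns are not CLDR picture strings, and the WG had not settled on a format.]

```python
from datetime import date, datetime

# Hypothetical mapping from CLDR-style date pictures to strptime directives;
# only a tiny subset, for illustration.
PICTURE_TO_STRPTIME = {
    "dd/MM/yyyy": "%d/%m/%Y",
    "yyyy-MM-dd": "%Y-%m-%d",
}

def parse_cell(value, picture):
    """Parse a CSV cell into a date using a declared picture string."""
    return datetime.strptime(value, PICTURE_TO_STRPTIME[picture]).date()

print(parse_cell("27/10/2014", "dd/MM/yyyy"))
```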
20:50:02 q+ to state unit != datatype 20:50:04 q+ 20:50:23 Is currency best captured under locale, or as metadata in its own right? 20:50:25 q- 20:50:36 ack jt 20:50:48 ivan: datatypes are not units, currency not a good example. 20:51:20 q+ to repeat myself. 20:51:39 q? 20:52:09 ack A 20:52:09 AxelPolleres, you wanted to repeat myself. 20:52:50 axel "agree currency eg is not a good one (for datatypes). The datatype of a price is number not currency. If the metadata could be decoupled from data, … we could equally well say that someone else republishes the curated data." 20:53:12 … if there is so much data out there with poor datatyping, … … isn't data republishing as likely as metadata annotation? 20:53:27 q+ to say someone could publish a parsing method 20:53:31 ivan: a matter of scale, … if you have terabytes of data working at metadata level is easier 20:54:08 ack b 20:54:08 bjdmeest, you wanted to say someone could publish a parsing method 20:54:24 Ben: instead of republishing, can we publish a “re-publishing” method?
20:54:39 s/method./method?/ 20:54:55 Thanks aphillip, r12a :) 20:55:16 http://w3c.github.io/csvw/metadata/#datatypes 20:55:31 topic: built-in datatypes 20:56:33 jenit: number, binary, datetime come from json tables which comes from json schema 20:56:51 q+ to ask about schema:Date/Time/Duration types 20:57:15 ivan: we may not want to add geopoint 20:57:39 JeniT: propose we just ignore geopoint altogether 20:58:12 PROPOSAL: We should not support ‘geopoint’ as a datatype 20:58:15 discussing issues ISSUE-13 ff 20:58:18 +1 20:58:19 +1 20:58:22 +1 20:58:22 +1 20:58:25 +0 20:58:25 +1 20:58:27 +1 20:58:30 +0 20:58:30 +0 20:58:45 RESOLVED: We should not support ‘geopoint’ as a datatype 20:59:05 PROPOSAL: We should not support ‘object’, ‘array’ or ‘geojson’ as datatypes 20:59:12 (this is ISSUE 14 in the document) 20:59:13 (issue is purely in the doc, not in w3c tracker or github tracker) 20:59:16 +1 20:59:21 +1 20:59:22 +1 20:59:24 +1 20:59:26 +1 20:59:29 (ISSUE 13 in the document should be closed) 20:59:32 +1 20:59:49 +1 21:00:32 +1 21:00:35 RESOLVED: We should not support ‘object’, ‘array’ or ‘geojson’ as datatypes 21:00:49 issue 15, the any type 21:01:01 from doc, "We invite comment on whether the any type is useful." 21:01:04 Jeremy: we will support some kind of list types though, or parsing lists. 21:01:11 q? 21:01:31 gkellogg, can we deal with your point after these issues are resolved? 21:02:17 JeniT: ISSUE-15 we could enable people to declare explicitly that something is of no particular datatype. 21:02:19 PROPOSAL: It is useful to have an ‘any’ type to explicitly say that anything is allowed 21:02:23 +1 21:02:30 +1 21:03:40 eric: what’s the difference between any type and string? 21:03:53 +1 21:03:59 jeremy: it is made explicit. 21:04:45 -0 21:04:52 +1 21:04:54 +1 21:04:55 +0.5 21:05:02 JeniT: still unsure. 21:05:03 +0 21:05:15 q?
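[Editor's note: a toy sketch of how a validator might treat declared column datatypes after these resolutions — ‘geopoint’, ‘object’, ‘array’ and ‘geojson’ are rejected as unsupported, while ‘any’ explicitly accepts every value. The checkers below are deliberately minimal and hypothetical, not the spec's datatype definitions.]

```python
# Hypothetical per-datatype cell checkers; 'any' accepts everything,
# making "no particular datatype" an explicit declaration.
CHECKERS = {
    "string": lambda v: True,
    "integer": lambda v: v.lstrip("-").isdigit(),
    "any": lambda v: True,
}

def valid_cell(value, datatype):
    """Check a cell's string value against a declared column datatype."""
    if datatype not in CHECKERS:
        # e.g. 'geopoint', per the resolution, is not a supported datatype
        raise ValueError("unsupported datatype: %s" % datatype)
    return CHECKERS[datatype](value)

print(valid_cell("42", "integer"), valid_cell("n/a", "integer"))
```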
21:05:17 fjh has joined #csvw 21:05:38 q+ to ask about null 21:05:42 fjh has joined #csvw 21:05:53 fjh has joined #csvw 21:05:58 Ivan: this is for mixed type columns 21:06:57 ack g 21:06:57 gkellogg, you wanted to ask about schema:Date/Time/Duration types 21:07:42 gregg: schema.org uses different datatypes than xml schema… 21:07:58 ack danbri 21:07:58 danbri, you wanted to ask about null 21:08:00 q? 21:08:06 ivan: I am changing back my vote to 0 21:08:29 example: birthdate, deathdate 21:08:44 JeniT: shall we allow empty values for particular cells, e.g. deathdate 21:09:11 “”^^:null 21:09:18 +1 21:10:59 EricP: any can be solved on the application level 21:11:02 scribe: bjdmeest 21:11:07 q+ to add a reference back to the locale issue for the minutes 21:11:13 time til next break: 19 mins 21:11:19 EricP: technically, it will be a string 21:11:19 q+ 21:11:21 +1 21:11:27 thanks new scribe, thanks old scribe! 21:12:03 q+ 21:12:03 EricP: for RDF: top-level is string 21:12:58 q- 21:13:25 Jeni: might keep this as an issue... 21:13:46 laufer: semantics is in the application, or in the datatype? 21:13:58 ivan: data can come from different sources 21:14:24 ack ivan 21:14:24 ivan, you wanted to add a reference back to the locale issue for the minutes 21:14:24 q? 21:14:27 The reference to use for the date 'picture strings' is: http://www.unicode.org/reports/tr35/ 21:14:40 ack laufer 21:14:59 next, "We invite comment on whether there should be types for formats like XML, HTML and markdown which may appear within CSV cells" 21:15:06 Jeni: issue 16 is support for other kind of stuff (xml, html, markdown) 21:15:12 ... how to handle those 21:15:20 ... simple strings? specific datatypes? 21:15:42 ... Markdown would be very useful 21:15:55 q? 21:15:55 EricP: what if users can define types? 21:16:06 q+ to suggest using media types 21:16:11 ...
to support different markdown flavors 21:16:12 q+ to ask difference datatype / mediatypes (aka mimetypes) 21:16:16 q+ 21:16:27 ack j 21:16:27 JeniT, you wanted to suggest using media types 21:16:29 Jeni: possibility: 21:16:45 ... specify media-type 21:17:13 ack me 21:17:13 danbri, you wanted to ask difference datatype / mediatypes (aka mimetypes) 21:17:14 danbri: what's the difference between media-type and data-type? 21:17:27 ack i 21:17:33 ivan: technical question: 21:17:45 ... not full xml, but only fragments, same for html 21:18:08 ... is that something to use a datatype? 21:19:09 ... maybe we have to define a datatype for markdown? 21:19:27 (q: normative refs to markdown?) 21:19:33 q? 21:19:35 ... don't standardize markdown, just add datatype 21:20:02 jenit: people can specify their own with a prefix 21:20:31 ... we cannot define a markdown datatype, as there is no spec 21:21:11 danbri: fragments are useful, we get hyperlinks 21:21:17 jenit: what about json in CSV? 21:21:47 PROPOSAL: We should add ‘xml’ and ‘html’ datatypes 21:21:49 jtandy: i have a lot of people adding json in CSV 21:21:53 q+ 21:21:56 +1 21:22:04 q+ 21:22:05 gkellogg: it's quite common to add an html table 21:22:06 +1 21:22:08 +1 21:22:09 +1 21:22:19 danbri: csv is not really specified strictly 21:22:37 ack me 21:22:48 +1 21:23:00 laufer: one may define one's own datatype? 21:23:14 PROPOSAL: We should add ‘xml’ and ‘html’ datatypes as defined in RDF 21:23:19 ... string can have its own datatype 21:23:21 chunming has joined #csvw 21:23:29 jtandy: define it in your own namespace 21:23:42 laufer: json is not a qualified datatype? 21:23:51 jenit: it's not on the list (yet) 21:24:13 +1 21:24:19 +1 21:24:25 RESOLVED: We will add ‘xml’ and ‘html’ datatypes as defined in RDF 21:24:36 PROPOSAL: We should add ‘json’ datatype with our own namespace 21:24:52 ivan: what is the official status of json? 21:25:01 jenit: there is an IETF and an ??
spec 21:25:08 (ECMA spec) 21:25:11 ivan: is anything stable? 21:25:29 Common Markdown Spec: http://spec.commonmark.org/0.6/ 21:25:41 ericP: easier to have a stable spec for json than for markdown 21:25:54 gkellogg: there is a community spec for (common) markdown 21:26:16 +0 21:26:20 jenit: should we have a json datatype? 21:26:21 +1 21:26:29 +1 21:26:32 +1 ... I've seen it in the wild 21:26:32 +1 21:26:34 +1 21:26:39 +1 21:26:54 ericP: implications might be big? at parsing... 21:27:15 q? 21:27:21 ... signed up for row with value, possibly array, but with json... might explode 21:27:38 jenit: if there is embedded json: do we parse it? 21:27:39 laufer, you're still on the queue. Was this a new question/topic? 21:27:53 gkellogg: what about json-ld? merge with graph in RDF serialization? 21:28:06 ivan: json inside json? how do processors react? 21:28:18 jenit: same with xml serialization 21:28:41 q+ 21:28:55 ericP: my guess: serialization: escape with quotes 21:28:56 q- 21:29:27 ack me 21:29:31 danbri: what is worse than 10 mb of json inside csv? 10 mb of anything inside csv 21:29:50 jenit: during mapping: json or xml or ... remain strings 21:29:58 ... possibly datatype string ---- 00:00:11 ... but if the title is in English, someone else will want it in Portuguese 00:00:27 q+ to seek volunteers to create some problematic specific examples 00:00:52 q+ to complain about having to hunt for files 00:01:01 gkellogg: in the JSON-LD expansion algorithm, @@1 is processed first. 00:01:31 ... there's no current way to say that "2 levels deep in this doc, there's an X of type int and i want it to be a float" 00:01:33 q- 00:01:35 q+ to reflect on Gregg’s comment. 00:01:47 ... the concept from CSS is !important 00:02:19 ...
we could provide something like that, but we will have issues with the JSON-LD algorithms 00:02:43 bjdmeest has joined #csvw 00:03:02 JeniT: arguing against collecting all possible metadata docs, you have to do a bunch of optimistic GETs 00:03:12 q? 00:03:17 ack j 00:03:17 JeniT, you wanted to complain about having to hunt for files 00:04:40 ivan: if we don't do that, and i look at the publisher who supplies a range of metadata files, the metadata.json becomes useless 00:05:09 ... so i have to copy it into all of the metadata files. 00:06:16 q+ 00:06:24 JeniT: i think we're muddling up our access and metadata resolution 00:06:33 ack AxelPolleres 00:06:33 AxelPolleres, you wanted to reflect on Gregg’s comment. 00:06:57 ... what keeps them from mechanically combining their sources to create one metadata file? 00:07:01 ivan: publishers aren't going to know to do that 00:08:41 AxelPolleres: the client could load a metadata file that points to another and finally the CSV 00:10:36 gkellogg: there's no import in JSON-LD, but there is in @context 00:11:30 PROPOSAL: We use an ‘import’ property in the first metadata document you find to merge in metadata from other files 00:11:34 ack laufer 00:12:26 laufer: do different types of metadata have the same precedence order? (e.g. license vs. title) 00:13:33 q? 00:13:48 JeniT: i think they all have to be the same 00:14:09 laufer: new types of metadata will have to agree with our ordering 00:14:32 +1 00:14:49 -0.0 00:15:11 q+ to ask if users will find links to metadata files 00:15:41 jtandy: in your UC4 example, you used a schema IRI to reference another doc 00:15:52 in here somewhere - https://github.com/w3c/csvw/tree/gh-pages/examples/tests/scenarios/uc-4/attempts - ? 00:16:12 q?
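[Scribe's note: a depth-first recursive merge along the lines of the ‘import’ proposal above might look like the sketch below. The property name `import`, the "earlier document wins on conflict" rule, and the `fetch_json` helper are assumptions drawn from the discussion, not settled spec text.]

```python
def merge_metadata(doc, fetch_json, seen=None):
    """Return doc with every document named in its 'import' list merged
    in, depth first. Keys already present in doc take precedence
    (the first metadata document found wins). fetch_json is a caller-
    supplied function mapping a URL to a parsed JSON object."""
    seen = set() if seen is None else seen
    merged = dict(doc)
    for url in doc.get("import", []):
        if url in seen:            # guard against import cycles
            continue
        seen.add(url)
        imported = merge_metadata(fetch_json(url), fetch_json, seen)
        for key, value in imported.items():
            merged.setdefault(key, value)   # earlier documents win
    merged.pop("import", None)
    return merged
```

For example, a document `{"title": "A", "import": ["b"]}` merged against an imported `{"title": "B", "license": "CC0"}` would keep the local title and gain the imported license.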
00:16:15 ack e 00:16:15 ericP, you wanted to ask if users will find links to metadata files 00:18:27 +1 00:21:10 PROPOSAL: We use an ‘import’ property in the first metadata document found through the precedence hierarchy described in section 3 (but with inclusion of user-defined metadata); the merge is a depth first recursive inclusion 00:21:32 +1 (as observer) 00:21:40 q? 00:21:48 +1 00:21:58 +1 00:22:09 +1 00:22:28 +1 00:22:30 +1 00:22:48 RESOLVED: We use an ‘import’ property in the first metadata document found through the precedence hierarchy described in section 3 (but with inclusion of user-defined metadata); the merge is a depth first recursive inclusion 00:22:49 +1 00:23:04 ivan: 2⅞... 00:23:26 ... we need lots of examples in the spec 00:23:35 JeniT: [issue 3.2 packaging] 00:23:58 ... tomorrow we have a specific issue around multiple CSV files 00:24:13 ... but we have the general problem of how to package all this stuff 00:24:29 ivan: we won't get packaging on the web done before we finish. 00:24:45 ... they can accept current stuff, e.g. zip, gzip. 00:24:56 gkellogg: we can address the result of unpacking 00:25:24 JeniT: [issue 7: link header] 00:25:35 ... rel="describedBy" 00:26:00 ... plus the content type (which is the metadata media type) 00:26:22 gkellogg: we tried various things in JSON-LD and came back to DescribedBy 00:26:34 PROPOSAL: We will use ‘describedby’ as the relevant link relation 00:26:34 +1 00:26:35 +1 00:26:41 +1 00:26:42 +1 00:26:46 +1 00:26:52 +1 00:26:59 +1 00:27:05 +1 (as observer listening to gkellogg) 00:27:15 RESOLVED: We will use ‘describedby’ as the relevant link relation 00:27:23 JeniT: [issue 8: standard path] 00:27:37 ... we have two standard paths: 00:27:39 ericstephan_ has joined #csvw 00:27:40 ... .. file-specific 00:27:48 ... .. more generic metadata file 00:28:05 ...
pushback on .CSVM 'cause processors will understand .JSON 00:28:06 .csvm vs .json 00:28:11 how about .csv.json ? 00:28:34 q? 00:28:39 ... propose toilets.csv -> toilets.csv.json 00:29:20 q+ to ask re i18n 00:30:05 q+ to ask what of this is specific to CSV data? 00:31:16 And would this be better as a best practice rather than "hacking the URI"? 00:31:34 http://jsontocsvconverter.example.org?input=file1.csv.json :-) 00:31:51 q? 00:32:03 http://jsontocsvconverter.example.org?input=file1.json 00:32:21 —> metadata http://jsontocsvconverter.example.org?input=file1.json.json ? 00:32:37 q+ 00:32:46 q+ 00:32:48 q+ 00:32:54 ack danbri 00:32:54 danbri, you wanted to ask re i18n 00:33:16 danbri: what happens in non-Latin scripts at the end? 00:34:11 ivan: if it's all Chinese chars, we end with ".json" 00:34:17 ack hadl 00:34:17 hadleybeeman, you wanted to ask what of this is specific to CSV data? 00:34:36 hadleybeeman: how much of this is specific to CSV vs. other data? 00:35:03 q? 00:35:06 maybe better … /metadata.json 00:35:08 ... 2. JeniT said hacking URIs is unpleasant so i wonder if this should be a "best practice" 00:35:55 jtandy: i think this is specifically about CSV/TSV (tabular data) 00:36:12 ericstephan_: this is about the connection between the CSV file and the metadata file. 00:37:09 ... if you save a file in your favorite office tool, and you give it a file extension, and then it appends ".doc" 00:37:20 q? 00:37:34 ack ivan 00:38:17 ivan: if there is a publisher that puts out a bunch of CSV data, and there's a need for the query component, they can [damn well] use the link header 00:38:25 q- 00:38:45 ... this -.json is for folks who can't control the server 00:39:23 ack axel 00:40:03 ... so if there's a '?' in the URI, don't look for the .json 00:40:50 AxelPolleres: i don't like ".json", can we have "-metadata.json"?
00:41:34 q+ 00:42:16 ack g 00:42:40 JeniT: the point of these simple methods of finding the metadata is that in many environments, folks have no control over http headers 00:42:40 can we vote on the suffix? “-metadata.json” ? 00:42:50 PROPOSAL: We find a metadata file by adding ‘.json’ to the end of the URL of the CSV file, but only if the URL doesn’t contain a query component 00:46:12 PROPOSAL: We find a metadata file by adding ‘-metadata.json’ to the end of the URL of the CSV file, but only if the URL doesn’t contain a query component 00:46:15 +1 00:46:16 +0.3 00:46:16 +1 00:46:17 +1 00:46:20 +1 00:46:21 +1 00:46:24 +0.1 00:46:28 +1 (as observer, under the influence) 00:46:31 RESOLVED: We find a metadata file by adding ‘-metadata.json’ to the end of the URL of the CSV file, but only if the URL doesn’t contain a query component 00:47:12 JeniT: [issue 9: default navigational climb] 00:47:27 We should note in the doc that the link header is the preferred version, yes? 00:47:28 q? 00:47:33 q+ to suggest http://www.sitemaps.org/protocol.html 00:47:37 ... still completely possible. i might have metadata file at the top of a directory of csv files 00:48:58 "The location of a Sitemap file determines the set of URLs that can be included in that Sitemap. A Sitemap file located at http://example.com/catalog/sitemap.xml can include any URLs starting with http://example.com/catalog/ but can not include URLs starting with http://example.com/images/. 00:48:58 If you have the permission to change http://example.org/path/sitemap.xml, it is assumed that you also have permission to provide information for URLs with the prefix http://example.org/path/." 00:50:03 q- 00:51:01 AxelPolleres has left #csvw 00:51:51 jtandy: i propose that we say that there's value but that we decided not to do it. 
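[Scribe's note: the ‘-metadata.json’ convention resolved above can be sketched as a small helper. The function name is mine, and the exact "append to the end of the URL" reading follows the resolution text; only the query-component guard is spec-driven.]

```python
from urllib.parse import urlsplit

def metadata_url(csv_url):
    """Per the resolution: append '-metadata.json' to the CSV file's
    URL, but only if the URL has no query component; otherwise signal
    that other discovery mechanisms (e.g. the Link header) apply."""
    if urlsplit(csv_url).query:
        return None
    return csv_url + "-metadata.json"
```

So `toilets.csv` yields `toilets.csv-metadata.json`, while a URL with a `?` query yields nothing.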
00:52:11 rrsagent, draft minutes 00:52:11 I have made the request to generate http://www.w3.org/2014/10/28-csvw-minutes.html ivan 00:52:20 AxelPolleres has joined #csvw 00:52:23 PROPOSAL: We do not traverse path hierarchies to locate metadata files 00:52:30 +1 00:52:32 +1 00:52:48 +1 00:52:48 +1 00:52:51 +1 00:52:51 +1 00:52:52 +1 00:52:58 +1 as observer 00:53:00 gkellogg suggests that this use case might be resolved using the package mechanism 00:53:37 laufer: CKAN has a resolution for this 00:53:59 JeniT: [issue 10] 00:54:12 RESOLVED: We do not traverse path hierarchies to locate metadata files 00:54:39 jtandy: we have plenty of good ways to find things. if people want to add, they have to motivate us. 00:54:53 danbri: we should have an informative ref to sitemaps 00:55:05 q+ 00:55:14 action: danbri propose a sentence informative-referencing sitemaps.org xml format 00:55:14 Created ACTION-42 - Propose a sentence informative-referencing sitemaps.org xml format [on Dan Brickley - due 2014-11-04]. 00:55:19 q? 00:55:19 rrsagent, draft minutes 00:55:19 I have made the request to generate http://www.w3.org/2014/10/28-csvw-minutes.html phila 00:55:21 So, either link header or ‘[originalcsvfilename]-metadata.json’, with the former preferred, that’s it, yes? 00:55:44 laufer: you talk of a structure that you can access directly. 00:55:55 ... in this case, DCAT can make this link. 00:55:57 meaning that if people want to include additional mechanisms to find the metadata file (e.g. sitemaps), then they need to provide a rational argument for doing so 00:56:12 i.e. http://www.w3.org/TR/vocab-dcat/#class-distribution 00:56:14 ... you have link and distribution 00:56:23 ... in DWBP, we plan to extend DCAT 00:56:23 q? 00:56:25 ack l 00:56:34 ... so i don't know if you can add this extension. 00:57:23 JeniT: [issue 12: separate media type] 00:58:07 ... i think we need a specific media type, e.g. application/csv-metadata+json 00:58:23 Meeting: CSVW WG f2f at TPAC 2014 00:58:46 q?
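[Scribe's note: as resolved earlier in the session, a Link header with rel="describedby" is the preferred discovery mechanism. A deliberately minimal sketch of extracting that target from a Link header value — the helper name is mine, and a real client should use a full RFC 5988 Web Linking parser rather than this regex.]

```python
import re

def describedby_from_link_header(link_header):
    """Extract the first rel="describedby" target from an HTTP Link
    header value, e.g. '<meta.json>; rel="describedby"'. Minimal
    parser for illustration only."""
    for part in link_header.split(","):
        m = re.match(r'\s*<([^>]+)>\s*;\s*rel="?describedby"?', part)
        if m:
            return m.group(1)
    return None
```

A client would try this first and, failing that, fall back to the ‘-metadata.json’ convention; per the resolution above, it would not climb the path hierarchy.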
00:59:03 ivan: doc has to be at a certain level of maturity 01:00:05 JeniT: i think that issue 15 (trimming whitespace) is an IETF issue 01:00:06 rrsagent, draft minutes 01:00:06 I have made the request to generate http://www.w3.org/2014/10/28-csvw-minutes.html ivan 01:00:24 see also http://tools.ietf.org/html/rfc4180 01:00:45 RRSAgent, draft minutes 01:00:45 I have made the request to generate http://www.w3.org/2014/10/28-csvw-minutes.html phila 01:00:51 ADJOURNED 01:00:59 example of what we did in SPARQL regarding mimetypes: http://www.w3.org/TR/rdf-sparql-json-res/#mediaType 01:01:15 Hitoshi has left #csvw 01:03:46 AxelPolleres has left #csvw 01:05:39 em has joined #CSVW 01:09:59 bill-ingram has joined #csvw 01:18:05 ivan has joined #csvw 01:22:00 ivan_ has joined #csvw 01:28:53 gkellogg has joined #csvw 03:00:56 ivan has joined #csvw 04:27:45 JeniT has joined #csvw 07:50:41 jumbrich has joined #csvw 11:36:29 ivan has joined #csvw