IRC log of csvw on 2014-10-27
Timestamps are in UTC.
- 15:26:25 [RRSAgent]
- RRSAgent has joined #csvw
- 15:26:25 [RRSAgent]
- logging to http://www.w3.org/2014/10/27-csvw-irc
- 15:26:30 [Zakim]
- Zakim has joined #csvw
- 15:26:50 [ivan]
- rrsagent, set log public
- 15:27:22 [ivan]
- Meeting: CSV on the Web WG, F2F meeting @ TPAC, 2014-10-27
- 15:27:27 [ivan]
- Chair: danbri
- 15:48:41 [danbri]
- danbri has joined #csvw
- 15:50:01 [bill-ingram]
- bill-ingram has joined #csvw
- 15:50:03 [jtandy]
- jtandy has joined #csvw
- 15:50:11 [hadleybeeman]
- hadleybeeman has joined #csvw
- 15:50:31 [laufer]
- laufer has joined #csvw
- 15:50:40 [bjdmeest]
- bjdmeest has joined #csvw
- 15:52:17 [hadleybeeman]
- scribe: hadleybeeman
- 15:52:29 [JeniT]
- agenda: https://www.w3.org/2013/csvw/wiki/F2F_Agenda_2014-10
- 15:53:00 [em]
- em has joined #CSVW
- 15:53:16 [AxelPolleres]
- AxelPolleres has joined #csvw
- 15:53:32 [ErikMannens]
- ErikMannens has joined #CSVW
- 15:54:01 [hadleybeeman]
- hadleybeeman has changed the topic to: Agenda: https://www.w3.org/2013/csvw/wiki/F2F_Agenda_2014-10
- 15:54:08 [ericstephan]
- ericstephan has joined #csvw
- 15:55:29 [danbri]
- https://docs.google.com/presentation/d/1PYx7PmaB4Ouyf_uHJZwE331Cg0R9aGPspjx6y1Z-GNg/edit?usp=sharing
- 15:57:27 [phila]
- phila has joined #csvw
- 15:57:30 [hadleybeeman]
- danbri: [introduces the agenda]
- 15:57:37 [Hitoshi]
- Hitoshi has joined #csvw
- 15:57:37 [hadleybeeman]
- topic: intros
- 15:57:43 [phila]
- rrsagent, make logs public
- 15:57:59 [hadleybeeman]
- danbri: works for Google, love/hate relationship with RDF. Interested in getting new ways of sucking data into the search engine.
- 15:58:55 [hadleybeeman]
- jenit: at the Open Data Institute, who are interested in helping people publish/consume open data. Wants to get more consistent CSVs on the Web, for users and publishers to express all the fiddly little context bits that are necessary for reusers to understand.
- 15:59:02 [JeniT]
- Present+ Dan Brickley
- 15:59:06 [JeniT]
- Present+ Jeni Tennison
- 16:00:09 [chu]
- chu has joined #csvw
- 16:00:44 [hadleybeeman]
- bill-ingram: At the University of Illinois Urbana-Champaign. Interested in research data in the repository space, planning one now.
- 16:00:46 [ivan]
- Present+ Bill Ingram
- 16:00:59 [phila]
- present+ Phila
- 16:01:02 [ivan]
- Present+ Hadley Beeman
- 16:01:28 [danbri]
- hadley beeman: one of 4 co-chairs of the Data on the Web Best Practices WG. Day job: tech advisor to govt CTO in UK. Removing barriers to data re-use and publication, so it becomes more intuitive and part of everyday life; identifying bottlenecks in the system.
- 16:01:33 [ivan]
- Present+ Jeremy Tandy
- 16:01:59 [ivan]
- Present+ Eric Prud'hommeaux
- 16:02:22 [hadleybeeman]
- jtandy: From the UK Met Office (the national weather service and research institute). We produce tonnes of CSV data. Interested in crossing domain boundaries. I want to take CSV and annotate it in a way that it can be combined with other data. Unanticipated reuse.
- 16:02:35 [ivan]
- Present+ Laufer
- 16:03:14 [hadleybeeman]
- laufer: Work at Web Engineering Laboratory at the Catholic University of Rio de Janeiro. Also participates in the Data on the Web BP group. Interested in lots of kinds of data.
- 16:03:15 [phila_]
- phila_ has joined #csvw
- 16:03:44 [JeniT]
- Present+ Chuming Hu
- 16:03:46 [ivan]
- Present+ Chunming Hu
- 16:04:27 [hadleybeeman]
- Chunming_hu: W3C team from China, Chinese host of W3C. Research on data storage and parallel data storage. Work with lots of companies who want to know more about this kind of work, semantics and CSV.
- 16:04:39 [phila_]
- phila_ has joined #csvw
- 16:04:48 [ivan]
- Present+ Eric Stephan
- 16:05:45 [ivan]
- Present+ Ivan Herman
- 16:05:52 [ivan]
- Present+ Axel Polleres
- 16:06:03 [hadleybeeman]
- ericstephan: Works at a lab for the US Dept of Energy (Pacific Northwest Lab). Scientists are using .xls* and CSV data. They've looked at mixing data from domains beyond original intentions for the data. Data has taken on a life of its own. I'm hands-on, with a real-world, problem-driven focus.
- 16:06:45 [hadleybeeman]
- ivan: I am the staff contact for this group. I've been working on various forms of data on the web for 7 or 8 years; used to lead the Semantic Web activity. The transition to CSV was a natural one
- 16:06:54 [phila_]
- phila_ has joined #csvw
- 16:07:34 [hadleybeeman]
- ericprudhommeaux: I'm W3C staff, mostly working in clinical informatics and bioinformatics. Worked with Sage, who were trying to get their data into a more useful form, but after a while they were still using CSVs.
- 16:08:10 [hadleybeeman]
- phila: I'm W3C staff, am a member of the group and observing. For me it's about making sure the Web is a data platform, not just a platform for exchanging other files.
- 16:09:13 [hadleybeeman]
- axelpolleres: I'm from Vienna University of Economics and Business, and from the RDF linked data side. A year ago, we started to talk in Austria about how to publish data. We were quite surprised at how much needs to be done.
- 16:09:23 [ivan]
- Present+ Erik Mannens
- 16:09:33 [ivan]
- Present+ Ben De Meester
- 16:09:37 [chu]
- chu has joined #csvw
- 16:10:03 [hadleybeeman]
- Hitoshi: I'm gathering information about W3C activities and how working groups go on and what they're focusing on. I don't have an interest in CSV, but I want to know how CSV will be used on the web.
- 16:10:10 [AxelPolleres]
- we talk mainly with Open Data portal providers there, such as the federal chancellery, or the Cooperation OGD Austria.
- 16:10:38 [ivan]
- ivan has joined #csvw
- 16:11:12 [hadleybeeman]
- Erikmannens: AC rep for MMLabs. I had a team of researchers at Ghent University on data analytics. We are working on open data publishing. Working on RML
- 16:11:42 [hadleybeeman]
- BJDmeest: I'm here for the Digital Publishing and Web Annotation WGs. Interested in the semantics of data in general.
- 16:12:19 [hadleybeeman]
- topic: charter
- 16:12:36 [hadleybeeman]
- topic: charter http://www.w3.org/2013/05/lcsv-charter.html
- 16:12:58 [hadleybeeman]
- ivan: Finishing by the end of August 2015 is, in my view, impossible.
- 16:13:13 [hadleybeeman]
- ... we will have to ask for a charter extension and hope that Phila will be kind enough to help.
- 16:13:29 [jumbrich]
- jumbrich has joined #csvw
- 16:13:36 [hadleybeeman]
- danbri: This is our contract with the wider W3C community.
- 16:13:58 [hadleybeeman]
- ...The specifics for our documents come from the numbered list in the Scope section
- 16:14:39 [hadleybeeman]
- ...Re metadata vocabulary: Tables are fantastic places to put stuff, but there is nowhere to put any other info. How much can we dare to say in this group about what the entire planet can say about their tables?
- 16:14:54 [phila__]
- phila__ has joined #csvw
- 16:15:36 [hadleybeeman]
- jtandy: Many people publish many CSVs together, and we want to be able to describe the relationship between them. That fits here too.
- 16:16:06 [hadleybeeman]
- jenit: Not just describing the file, but also going into what the table contains. What kind of data, which columns it has, what they contain.
- 16:16:25 [hadleybeeman]
- danbri: that also fits with "standard mapping mechanisms transforming CSV to other formats".
- 16:16:41 [hadleybeeman]
- Jenit: that's a stand-in for structure that most programming languages will consume
- 16:17:24 [hadleybeeman]
- ... the idea is that if you find a CSV file on the web, you want to be able to find out about it (metadata) or you may start with a metadata file which may point to a lot of CSV files
- 16:17:43 [hadleybeeman]
- jtandy: it may be that the metadata and data are published independently of each other. Possibly by different publishers.
- 16:18:03 [hadleybeeman]
- danbri: Use cases. We have lots of them
- 16:18:34 [hadleybeeman]
- ericprudhommeaux: I assume use cases are linked to requirements. How easy is it for someone who has their own use case to discover that their requirements may be addressed?
- 16:18:46 [JeniT]
- use cases & requirements document: http://w3c.github.io/csvw/use-cases-and-requirements/
- 16:19:00 [hadleybeeman]
- jtandy: the document makes more effort in describing the use case. We need to flesh out the requirements and make them clearer.
- 16:19:10 [hadleybeeman]
- ...But there is a formalised linkage between the two
- 16:19:17 [phila]
- q+ to talk about UCR
- 16:19:32 [phila]
- ack me
- 16:19:32 [Zakim]
- phila, you wanted to talk about UCR
- 16:19:39 [ivan]
- ack phila
- 16:19:39 [hadleybeeman]
- ericprudhommeaux: A measure of success may be that someone can bring in a use case, look at the requirements and see if theirs are included already
- 16:20:07 [laufer]
- q+
- 16:20:11 [hadleybeeman]
- phila: The use case document for CSVW is useful for DWBP. That group (laufer) will pull use cases from this group's document for that group's use case doc.
- 16:20:12 [ivan]
- q+
- 16:20:14 [jtandy]
- q+
- 16:20:29 [ivan]
- ack laufer
- 16:21:12 [hadleybeeman]
- laufer: You are talking about a file with metadata for other CSV files, and I've seen that you've proposed a file extension. We will have other metadata files, but I'm not sure a particular extension would be useful. A general way to link metadata files to data files may be better.
- 16:21:36 [hadleybeeman]
- jeniT: we'll be discussing that later today. But it contains 4 mechanisms for finding metadata; appending a file suffix is one of the four.
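The file-suffix mechanism jeniT mentions can be sketched roughly as follows. This is only an illustration: the minutes do not say which suffix the draft proposes, so `METADATA_SUFFIX`, the function name, and the example URL below are all assumptions.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical suffix: the minutes do not name the one the draft proposes.
METADATA_SUFFIX = "-metadata.json"


def metadata_url_for(csv_url, suffix=METADATA_SUFFIX):
    """Return a candidate metadata URL by appending a suffix to the CSV URL's path."""
    parts = urlsplit(csv_url)
    # Append to the path only; any query string or fragment is dropped.
    return urlunsplit((parts.scheme, parts.netloc, parts.path + suffix, "", ""))


print(metadata_url_for("http://example.org/data/toilets.csv"))
```

A client would then try to fetch the derived URL and fall back to one of the other discovery mechanisms if nothing is found there.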
- 16:21:49 [Hitoshi_]
- Hitoshi_ has joined #csvw
- 16:21:51 [phila]
- q?
- 16:22:05 [ivan]
- ack ivan
- 16:22:32 [hadleybeeman]
- ivan: Looking at the Use Cases document, to the editors: is the document done?
- 16:22:58 [hadleybeeman]
- jtandy: I think we have a good collection of use cases. There may be others to include. D3: data driven documents — we may want to look at it.
- 16:23:34 [hadleybeeman]
- ...As we reviewed use cases earlier this year, we saw that most requirements in them had already been covered. But the requirements do need more work. They are placeholders that allow us in the group to work on them.
- 16:23:37 [danbri]
- q?
- 16:23:43 [danbri]
- ack jtandy
- 16:24:18 [hadleybeeman]
- ericstephan: I'm not sure if we've drawn out — if we found use cases that correlated well, we combined them. That was an internal, organic process.
- 16:24:38 [hadleybeeman]
- ...It might be useful to show something like characteristics? Not a requirement.
- 16:24:43 [hadleybeeman]
- jenit: can you give an example?
- 16:25:17 [hadleybeeman]
- ericstephan: In science efforts, there may be an approach (imaging formats, for instance) used in an entirely different discipline.
- 16:25:34 [ivan]
- Present+ Gregg Kellogg
- 16:25:37 [hadleybeeman]
- ...Is it enough to put it in requirements, or is there another outreach mechanism that would help draw people in, so they can relate to a use case?
- 16:26:10 [hadleybeeman]
- jtandy: As an example, we had to work out which use cases covered data transformation. Not a requirement, but something they have in common. Maybe a simple lookup table at the topic?
- 16:26:20 [hadleybeeman]
- danbri: Do you have everything you need to do that?
- 16:26:38 [hadleybeeman]
- jtandy: the ones we have are sufficiently articulated to do that. We should give them the chance to comment though.
- 16:26:50 [hadleybeeman]
- danbri: and in terms of having their actual CSV files?
- 16:27:19 [hadleybeeman]
- jtandy: Sometimes. Some are behind corporate firewalls. Obviously only those use cases that talk about transformation can have target XML, RDF, JSON. But examples of those help.
- 16:27:47 [danbri]
- q?
- 16:27:51 [hadleybeeman]
- ericstephan: It's like saying, "Here's something that illustrates this use case, and here are some sister or related datasets from something similar."
- 16:28:01 [hadleybeeman]
- ...So you could expand from datasets from the explicit use case.
- 16:28:25 [hadleybeeman]
- jtandy: But given the limited resources of the group, we have to balance that idea along with meeting the other deliverables. Let's try to work that out this week.
- 16:28:41 [hadleybeeman]
- danbri: My feeling is that this document is in a good place. Better than many I've seen.
- 16:29:54 [hadleybeeman]
- GreggKellogg: (introduces himself) I'm an IE in this group. I'm a consultant. I'm one of the editors of the JSON-LD spec. I've not participated a lot on calls due to time zone challenges.
- 16:30:32 [hadleybeeman]
- danbri: re deliverables listed in the charter. UCR?
- 16:30:42 [hadleybeeman]
- ivan: That's what I was checking. It's 80% done?
- 16:30:44 [hadleybeeman]
- jtandy: yes
- 16:31:01 [jtandy]
- q+
- 16:31:09 [hadleybeeman]
- danbri: Metadata vocabulary for tabular data. Title has changed from charter, but intention is still same.
- 16:31:28 [hadleybeeman]
- ...Access methods for CSV Metadata
- 16:32:06 [hadleybeeman]
- jenit: This is talking about syntax around CSV, and the issues there. We have something to resolve there: we aren't the group in charge of syntax for CSV files. It's not in our charter. And yet it's the syntax that is one of the big sticking points for making this work.
- 16:32:31 [hadleybeeman]
- ... This document therefore has a non-normative section on syntax issues, which will feed into the IETF's work on this.
- 16:32:41 [phila]
- q+ to ask about IETF a little more
- 16:32:42 [hadleybeeman]
- ivan: This is rec track?
- 16:32:45 [hadleybeeman]
- danbri: Yes.
- 16:32:53 [danbri]
- q?
- 16:33:10 [phila]
- q+ EricP
- 16:33:11 [ivan]
- ack jtandy
- 16:33:38 [hadleybeeman]
- jtandy: What I found useful from this document: knowing what IS tabular data. We had a use case from the medical community that was line-oriented data, but not tabular.
- 16:33:56 [hadleybeeman]
- ...This is a useful document for helping determine what we do want to talk about. And what we don't.
- 16:34:14 [hadleybeeman]
- ...I'd suggest reading this before you get coffee at the break.
- 16:34:14 [jtandy]
- q-
- 16:34:24 [hadleybeeman]
- jeniT: we'll be going through this in depth later today.
- 16:34:49 [hadleybeeman]
- ivan: The editor of the IETF document is a fairly active part of this group. He's not here now.
- 16:35:22 [hadleybeeman]
- Frederick Hirsch: David Lewis: (introductions)
- 16:35:29 [hadleybeeman]
- q?
- 16:35:35 [hadleybeeman]
- ack phila
- 16:35:35 [Zakim]
- phila, you wanted to ask about IETF a little more
- 16:35:46 [phila]
- q==
- 16:35:53 [hadleybeeman]
- phila: Do we expect the IETF spec to be updated in response to this work?
- 16:35:56 [JeniT]
- q?
- 16:35:59 [hadleybeeman]
- jenit: yes
- 16:35:59 [phila]
- q?
- 16:36:03 [JeniT]
- ack ericp
- 16:36:04 [phila]
- q- ericP
- 16:36:16 [hadleybeeman]
- danbri: are we happy with the mappings of the names in the charter to what we've done?
- 16:36:24 [hadleybeeman]
- ivan: The titles in a charter often change.
- 16:36:41 [hadleybeeman]
- danbri: It's not unreasonable to write down the data model for CSV before you move on.
- 16:36:55 [hadleybeeman]
- jtandy: I don't remember having a document for access methods for metadata
- 16:36:57 [JeniT]
- http://w3c.github.io/csvw/syntax/#locating-metadata
- 16:37:00 [hadleybeeman]
- danbri: it's a secant of the model
- 16:37:04 [hadleybeeman]
- s/secant/seciton
- 16:37:09 [hadleybeeman]
- s/seciton/section
- 16:37:23 [JeniT]
- http://w3c.github.io/csvw/csv2json/
- 16:37:28 [JeniT]
- http://w3c.github.io/csvw/csv2rdf/
- 16:37:48 [hadleybeeman]
- danbri: Mapping mechanisms is the last bit. We have Generating ...
- 16:37:49 [hadleybeeman]
- ivan: and Generating JSON from Tabular Data on the Web
- 16:37:57 [hadleybeeman]
- jtandy: and we anticipate having one for XML
- 16:38:05 [hadleybeeman]
- ivan: Yes, but there has been no interest
- 16:38:21 [hadleybeeman]
- jenit: does anyone want to do this?
- 16:38:34 [phila]
- q+ ericP
- 16:38:35 [Hitoshi]
- Hitoshi has joined #csvw
- 16:38:45 [Hitoshi_]
- Hitoshi_ has left #csvw
- 16:39:11 [danbri]
- q?
- 16:39:13 [hadleybeeman]
- jenit: a good mapping to XML would include XSI-type elements to indicate the values, which would go beyond what JSON supports.
- 16:39:31 [hadleybeeman]
- ...You could envisage a mapping to XML that turns some things in to elements and some into attributes.
- 16:39:54 [phila]
- q+ to talk about XML
- 16:40:09 [hadleybeeman]
- ivan: But we have to be careful: if we define a mapping to XML, and we want it to be a Recommendation, we need implementations, test suites, etc. Not just a cut-and-paste job.
- 16:40:19 [phila]
- ack ericP
- 16:40:19 [ericstephan]
- q+
- 16:40:48 [phila]
- ack me
- 16:40:48 [Zakim]
- phila, you wanted to talk about XML
- 16:40:48 [hadleybeeman]
- ericP: Henry Thompson wrote a paper on normal forms of XML, turning XML into RDF. If you're going the other way you might want to see it.
- 16:41:04 [hadleybeeman]
- phila: would it be useful to get an XML person in the room? They are here in the building.
- 16:41:05 [fjh]
- fjh has joined #csvw
- 16:41:14 [hadleybeeman]
- ivan: We should talk to Liam, the XML activity lead.
- 16:41:21 [hadleybeeman]
- phila: he's currently scribing a meeting
- 16:41:34 [hadleybeeman]
- danbri: I spoke to him yesterday; he's suggested eXSLT.
- 16:41:47 [danbri]
- q?
- 16:41:55 [hadleybeeman]
- jeniT: I was intimately involved in XSLT, but I don't remember that.
- 16:41:55 [fjh]
- q+
- 16:42:32 [hadleybeeman]
- ...For completeness, it would be good to have an XML mapping. Not a trivial amount of work, and we need someone within the group to take it on. If no one wants to, then we may have to rule it out of scope or issue a note with our thoughts on it.
- 16:42:42 [hadleybeeman]
- danbri: We should take seriously that it hasn't cropped up in the use cases.
- 16:42:54 [fjh]
- q-
- 16:43:06 [ericstephan]
- q-
- 16:43:08 [hadleybeeman]
- jtandy: Some mention it. But we don't have anyone keen to take a lead on the work. Mismatch between what's being asked for and what this group can currently deliver.
- 16:43:22 [chunming]
- q+
- 16:43:23 [danbri]
- ack ericstephan
- 16:43:24 [hadleybeeman]
- danbri: I see demand for it online. Look at StackOverflow, people are asking about libraries.
- 16:43:27 [danbri]
- q?
- 16:43:33 [gkellogg]
- gkellogg has joined #csvw
- 16:43:45 [fjh]
- q+
- 16:43:51 [gkellogg]
- gkellogg has joined #csvw
- 16:43:55 [hadleybeeman]
- ericstephan: There are a lot of scientific communities that use XML but they tend to use it more as a tag language. Not necessarily well-formed.
- 16:44:13 [hadleybeeman]
- ...I don't see a lot of interest going between CSV and XML. They're either in one or the other.
- 16:44:17 [danbri]
- q?
- 16:45:10 [fjh]
- q-
- 16:45:12 [hadleybeeman]
- chunming: We talk about someone sharing a big CSV file on the web. Another model is that someone has a huge dataset but allows a 3rd party to access just part of it, using CSV formats. Which model?
- 16:45:53 [hadleybeeman]
- jenit: Scope is not to specify a query language over a large dataset that produces CSV. Or an API. But instead the files themselves. But that is a good use case, as jtandy discusses.
- 16:45:56 [ivan]
- q+
- 16:46:09 [danbri]
- ack chunming
- 16:46:20 [hadleybeeman]
- jtandy: We do have a use case that is from PLOS, where we are requesting a subset of results where those results are being produced in CSV or JSON or XML
- 16:46:23 [gkellogg]
- q+
- 16:46:31 [ivan]
- q-
- 16:46:42 [hadleybeeman]
- ...We talked about looking at a bit of that CSV and decided not to. But we are including the provenance relationship between a small dataset and its parent dataset
- 16:47:08 [ivan]
- q+
- 16:47:14 [hadleybeeman]
- gkellogg: Using an HTTP header — that seems like a protocol. Ensuring that a client can parse the HTTP headers appropriately. Does that open the door?
- 16:47:15 [danbri]
- q?
- 16:47:19 [danbri]
- ack gkellogg
- 16:47:27 [fjh]
- q+
- 16:47:42 [hadleybeeman]
- jtandy: We were talking about using query parameters on an HTTP request in order to get rows 17-29. Not in our scope but relevant.
- 16:47:43 [AxelPolleres]
- FWIW, IBM had some canonical JSON to XML mapping… http://pic.dhe.ibm.com/infocenter/wsdatap/v6r0m0/index.jsp?topic=%2Fcom.ibm.dp.xm.doc%2Fjson_jsonx.html (had to dig out the link)
- 16:48:12 [hadleybeeman]
- ivan: The various methods to access the metadata mean that even for huge datasets I can get it all, because they are small compared to the dataset itself.
- 16:48:38 [hadleybeeman]
- ...I don't know whether the mapping to JSON or to RDF can be helpful for someone to make an inverse and be able to query into the CSV.
- 16:49:05 [hadleybeeman]
- ...In RDF terms, knowing the metadata can I turn a SPARQL query back into a CSV? It's an exciting question which we won't answer here.
- 16:49:08 [danbri]
- q?
- 16:49:11 [hadleybeeman]
- ack ivan
- 16:49:12 [danbri]
- ack ivan
- 16:49:42 [hadleybeeman]
- Frederick: Regarding the charter, i'd imagine you'd defer this until you have a strong reason to address it.
- 16:49:58 [hadleybeeman]
- ... @Jtandy: You mentioned provenance, which is relevant to Web Annotations
- 16:50:15 [hadleybeeman]
- jtandy: we have a whole thread of discussions on benefitting from the good work of your group
- 16:50:20 [danbri]
- ack fjh
- 16:50:23 [JeniT]
- q?
- 16:50:24 [hadleybeeman]
- ivan: we have a joint session this afternoon
- 16:50:30 [hadleybeeman]
- danbri: for XML then....?
- 16:51:00 [hadleybeeman]
- ivan: For planning, we should make a final decision before the end of the year. Ideally earlier, but we have to talk to Liam.
- 16:51:02 [phila]
- q+
- 16:51:24 [danbri]
- q+ to propose "The WG does not intend to work on XML/CSV mappings under its current chartered period."
- 16:51:27 [hadleybeeman]
- ...He may say "forget it guys", but he may want us to talk to more of the community. In which case, Christmas is not an unrealistic time
- 16:51:42 [hadleybeeman]
- danbri: I was going to propose that we not work on XML mappings
- 16:51:46 [phila]
- ack me
- 16:51:49 [hadleybeeman]
- danbri: Does anyone agree?
- 16:51:52 [danbri]
- ivan/phil 'let's talk to liam'
- 16:52:03 [hadleybeeman]
- phil: Let's talk to Liam.
- 16:52:12 [hadleybeeman]
- s/phil/phila
- 16:52:39 [hadleybeeman]
- jenit: I propose we catch up to Liam and other XML people over the next couple of days and address this with a resolution by the end of tomorrow.
- 16:53:01 [JeniT]
- also http://msdn.microsoft.com/en-us/library/bb924435(v=vs.110).aspx
- 16:53:03 [hadleybeeman]
- AxelPolleres: I put something in IRC from IBM (above), but I don't know if there is anything more broad.
- 16:53:28 [hadleybeeman]
- ivan: Doing a standard just because it's the charter and not checking if it's the right thing to do — sounds awkward to me.
- 16:53:42 [hadleybeeman]
- AxelPolleres: I thought there may be something we could refer to, that exists already.
- 16:54:10 [hadleybeeman]
- JeniT: There are ways of doing that — but I don't think any of those are what we would call standards. Where we could make normative references to them.
- 16:54:17 [hadleybeeman]
- danbri: It would be helpful to end the week with a decision.
- 16:54:56 [hadleybeeman]
- JeffJaffe: (introductions) CEO of W3C. Interoperable web standards, but particular interest in CSV. So much data out there, this is key.
- 16:55:25 [hadleybeeman]
- danbri: Looking at the mapping mechanisms for CSV into other formats... ivan, can you talk about what you've done with direct mapping?
- 16:56:01 [hadleybeeman]
- ivan: We had loads of discussion/emails on that. Not just direct mapping. My feeling is: what is realistic: a relatively simple mapping that doesn't require further language specification or syntax within the recommendation
- 16:56:01 [AxelPolleres]
- backchannel-question …. as for provenance … we would just hook in PROV with the ‘provenance’ metadata property, or was anything else discussed in this group? (sorry for having missed that, in case)
- 16:56:34 [hadleybeeman]
- ...what we have now is a document that mimics the RDB2RDF as a direct mapping (ericp did that). We have metadata we can rely on, so it's a bit different.
- 16:56:35 [JeniT]
- AxelPolleres: that was my assumption, though how to structure it in a JSON format I’m not sure
- 16:56:38 [danbri]
- q?
- 16:57:04 [jtandy]
- @AxelPolleres: W3C PROV would seem the correct option; we're not intending to re-develop anything in this space
- 16:57:04 [hadleybeeman]
- ...We had last week a mail from jtandy with reference to an RFC for URI templates which is a useful addition to that simple mapping.
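The RFC for URI templates that ivan refers to is presumably RFC 6570. The minutes do not say how templates would be used in the mapping, so the sketch below simply assumes they build identifiers from cell values. It covers only the simplest `{name}` form (Level 1 string expansion); the function name, template, and values are illustrative.

```python
import re
from urllib.parse import quote


def expand_level1(template, values):
    """Expand simple {name} expressions, percent-encoding everything but unreserved characters."""
    def replace(match):
        # An undefined variable expands to the empty string.
        value = values.get(match.group(1), "")
        return quote(str(value), safe="")

    return re.sub(r"\{(\w+)\}", replace, template)


# Hypothetical template and cell values, for illustration only.
print(expand_level1(
    "http://example.org/station/{station}/{date}",
    {"station": "Exeter Airport", "date": "2014-10-27"},
))
```

Operators such as `{+var}` or `{?var}` from the higher template levels are deliberately left out of this sketch.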
- 16:57:08 [danbri]
- q-
- 16:57:13 [AxelPolleres]
- hmmmm, http://www.w3.org/Submission/2013/SUBM-prov-json-20130424/ seems to be “post-PROV-WG”
- 16:57:24 [danbri]
- q+ to talk about terminology confusions
- 16:57:37 [phila]
- q+ to talk about IBM's work
- 16:57:44 [phila]
- q+ ericP
- 16:57:48 [hadleybeeman]
- ...Those 2 documents exist, they need some care especially in how the datatypes are interpreted. I think there is a separate discussion scheduled on the datatypes in the metadata.
- 16:58:10 [hadleybeeman]
- ...Most of it is stable, the core is stable. The core can be implemented because I have a proof of concept for the RDF and JSON part.
- 16:58:11 [jtandy]
- q+
- 16:58:32 [hadleybeeman]
- ...There have been two other works that we explored. 1) We had long discussion about using this in a more general form. (Mustache?)
- 16:58:52 [hadleybeeman]
- ...Allowing a separate template to generate an RDF or JSON structure that is more complex than the line-by-line structure of a CSV file.
- 16:59:12 [jtandy]
- http://mustache.github.io
- 16:59:22 [hadleybeeman]
- ...If we're not careful, this could become more complicated. I think we should not go this route for rec.
- 16:59:58 [hadleybeeman]
- ...Independently, 2) Anastasia — the R2RML language minus the SQL-specific things that are irrelevant here.
- 17:00:27 [hadleybeeman]
- ...My feeling is that it has the same issue as Mustache, and is very RDF-specific. No structure for JSON.
- 17:00:39 [danbri]
- q?
- 17:00:39 [hadleybeeman]
- ...Right now, I think it's more important to produce JSON than RDF.
- 17:00:46 [hadleybeeman]
- ack danbri:
- 17:00:52 [JeniT]
- ack danbri
- 17:00:52 [Zakim]
- danbri, you wanted to talk about terminology confusions
- 17:01:06 [hadleybeeman]
- danbri: Re terminology. I've realised that my thought of "direct mapping" was different to what ivan has meant.
- 17:01:42 [hadleybeeman]
- ...In R2RML group, mapping starts with an SQL table and creates RDF graphs, triples. Predicates aren't mapped to well known RDF namespaces.
- 17:01:49 [hadleybeeman]
- ...In this group, we have more richness.
- 17:02:13 [hadleybeeman]
- ...When we say "direct mapping", we probably mean "simple mapping". Which could map to Dublin Core, or SKOS.
- 17:02:29 [hadleybeeman]
- ivan: I plead guilty because I've said "direct mapping" on the mailing list.
- 17:02:47 [hadleybeeman]
- danbri: This came to light when I said Google would have no interest in this. But the simple thing is potentially very valuable.
- 17:02:52 [danbri]
- q?
- 17:02:54 [phila]
- ack me
- 17:02:54 [Zakim]
- phila, you wanted to talk about IBM's work
- 17:02:55 [hadleybeeman]
- jenit: our first session tomorrow morning is on this.
- 17:03:21 [hadleybeeman]
- phila: Axel found a document from IBM, so I pinged Arnaud to ask if we can use it. He wasn't sure. I'll ask him for a clearer answer
- 17:03:27 [danbri]
- ack ericp
- 17:03:58 [hadleybeeman]
- jtandy: Re diff between "simple mapping" and "templated mapping" — in use cases, I want to represent more complicated content. That needs to go in Simple Mapping document.
- 17:04:15 [danbri]
- q+ ericp
- 17:04:18 [danbri]
- ack jtandy
- 17:04:31 [hadleybeeman]
- ...In simple mapping, you have to have property per column, Month and day property in different columns — can't create a date property merging them.
- 17:05:20 [hadleybeeman]
- ...If you have one triple per cell, we can say "this is as far as we can go now, but there will be a community group or separate discussion to hook in external templating stuff."
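The constraint jtandy describes (one property per column, one triple per cell) can be sketched as below. This is a rough illustration, not the group's mapping: the function names, the `rowN` identifiers, and the sample data are made up.

```python
import csv
import io


def simple_mapping(csv_text):
    """Turn each data row into one object with exactly one property per column."""
    return [dict(row) for row in csv.DictReader(io.StringIO(csv_text))]


def cell_triples(csv_text):
    """Yield one (row id, column name, cell value) triple per cell."""
    for number, row in enumerate(simple_mapping(csv_text), start=1):
        for column, value in row.items():
            yield (f"row{number}", column, value)


# Made-up sample data: month and day live in separate columns.
SAMPLE = "station,month,day\nExeter,10,27\n"

objects = simple_mapping(SAMPLE)
triples = list(cell_triples(SAMPLE))
print(objects)
print(triples)
```

Month and day stay separate properties here; merging them into a single date value is the kind of step that would need the templating discussed above.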
- 17:05:31 [gkellogg]
- q+ on JSON-LD from RDF with Framing
- 17:05:34 [danbri]
- eric ericp
- 17:05:37 [danbri]
- ack ericp
- 17:06:43 [hadleybeeman]
- ericP: If you want to characterise the difference between simple mapping and direct mapping: CSV of people and addresses. turn into a graph. Rename predicates in that graph, reflect the metadata. Compare to simple mapping. If they differ in substantial ways, then...
- 17:07:10 [hadleybeeman]
- ivan: I use the direct mapping approach.
- 17:07:22 [hadleybeeman]
- ericp: any differentiation would be defensible.
- 17:07:44 [JeniT]
- q?
- 17:07:44 [hadleybeeman]
- ivan: In the case of simple mapping, there is more info that we know: info about the CSV file as a whole.
- 17:08:09 [hadleybeeman]
- AxelPolleres: to @ivan: if it covers more but should be the same, is it a requirement that the simple mapping produces more triples?
- 17:09:19 [danbri]
- q+
- 17:09:20 [ericstephan]
- @Axel I wonder if the IBM work related to DFDL and Daffodil annotating data as XML document...
- 17:09:21 [hadleybeeman]
- gkellogg: There are advantages to looking at RDF mappings. Serialising RDF to JSON-LD gives you a JSON result. There is a spec for doing that. Looking at simple mapping — it now does provide the RDF tools to turn the graph into something more structured using SPARQL
- 17:09:24 [danbri]
- ack gkellogg
- 17:09:24 [Zakim]
- gkellogg, you wanted to comment on JSON-LD from RDF with Framing
- 17:09:42 [AxelPolleres]
- what I meant to say is, wouldn’t it make sense to require that the “simple CSV to RDF” mapping is a *superset* (in terms of resulting triples) of “CSV->SQL->RDB2RDF direct mapping”?
- 17:10:03 [hadleybeeman]
- ivan: Yes. Conceptually, I was wondering about the same thing. But as an implementer only interested in JSON: this is a long and torturous road. It might be a deal-breaker.
- 17:10:30 [JeniT]
- +1 to ivan
- 17:10:43 [hadleybeeman]
- ...Having a separate document that shows what you get in JSON and making it as close as possible to JSON-LD — as ericP said, there should be no major difference between the direct mapping and the simple mapping —
- 17:10:56 [danbri]
- q?
- 17:10:57 [hadleybeeman]
- ...If there are differences because JSON requires something different then we have to accept that.
- 17:11:20 [hadleybeeman]
- gkellogg: We need to include people comfortable with these technologies.
- 17:11:43 [hadleybeeman]
- ivan: I disagree. People who don't know anything about RDF — they just want it in JSON. There are loads of people there
- 17:11:49 [hadleybeeman]
- hadleybeeman: I agree with that
- 17:11:58 [ericstephan]
- +1 Ivan
- 17:12:03 [hadleybeeman]
- ivan: Even as an RDF person — this is a painful reality.
- 17:12:08 [phila]
- ack danbri
- 17:12:22 [hadleybeeman]
- danbri: We have a spectrum of enthusiasm for RDF.
- 17:12:30 [JeniT]
- q?
- 17:12:51 [ErikMannens]
- ErikMannens has joined #CSVW
- 17:12:51 [hadleybeeman]
- ...We need to mush these interests together. With Schema.org and Microdata (designed to be super simple for publishers) — even those were too complex
- 17:13:09 [hadleybeeman]
- ...These developers aren't thinking in terms of triples or graphs.
- 17:13:17 [ErikMannens]
- q+
- 17:13:38 [hadleybeeman]
- ...Saying RDF is the answer because you can serialise to RDF/XML: long histories of failings here. Let's not spend the next 10 years doing the same with JSON
- 17:13:38 [fjh]
- q+
- 17:13:47 [danbri]
- ack ErikMannens
- 17:13:55 [hadleybeeman]
- ErikMannens: What's wrong with profiles? Simple profiles? More extended profiles?
- 17:14:03 [phila]
- XML is not fading away - its use is growing. Honestly (Liam assures us)
- 17:14:24 [hadleybeeman]
- ivan: The simple mapping to RDF is there. The definition is strictly done on the conceptual level in RDF. If someone wants to go that route and get JSON-LD, it's fine.
- 17:14:56 [hadleybeeman]
- ...If they do that, or do direct JSON, the two things should be close. But we don't talk about that. The document should be readable for someone in that context.
- 17:15:30 [danbri]
- q?
- 17:15:36 [hadleybeeman]
- ...The context is a good example. If you serialise the result of the RDF mapping into JSON-LD, then you will have all those things there. But if you serialise directly in JSON, you will not.
- 17:15:37 [danbri]
- ack fjh
- 17:16:09 [ErikMannens]
- ErikMannens has joined #CSVW
- 17:16:24 [hadleybeeman]
- ... If you want to somehow be in the RDF world, then great. But if you're not, those are noise. Irritating noise.
- 17:16:57 [fjh]
- q?
- 17:17:06 [hadleybeeman]
- gkellogg: The tide seems to be moving toward well understood structured data in a lot of communities that were hostile to RDF. I don't know that we need to pander to a JSON mapping that doesn't contain some aspects of this.
- 17:17:15 [hadleybeeman]
- danbri: we
- 17:17:22 [hadleybeeman]
- ...'ll pick this up later
- 17:17:37 [hadleybeeman]
- topic: meeting goals
- 17:17:48 [ericstephan]
- @phila - I agree with Liam's comment, lots of legacy communities still using XML, other communities that are emerging such as High Energy Physics very interested in XML. Just not sure about the CSV XML connection.
- 17:18:24 [hadleybeeman]
- topic: Review our implementation types
- 17:19:06 [danbri]
- Any volunteers to take over scribing from Hadley?
- 17:19:07 [hadleybeeman]
- jenit: We've looked at RDF, XML, JSON — that's one set of implementations. But I'm also interested in validators (validating a set of CSV files against the metadata to say if it's formatted correctly, has the right columns, etc.)
- 17:19:12 [fjh]
- rrsagent, generate minutes
- 17:19:12 [RRSAgent]
- I have made the request to generate http://www.w3.org/2014/10/27-csvw-minutes.html fjh
- 17:19:16 [JeniT]
- http://csvlint.io/
- 17:20:02 [gkellogg]
- gkellogg has joined #csvw
- 17:20:13 [hadleybeeman]
- jeniT: (shows demo of csvlint.io )
- 17:21:29 [hadleybeeman]
- ... Validation tools are really handy. We in the UK have a push to get local government to publish data about public toilets. The people pushing it defined a schema for the data, and 400+ local authorities had to validate against that.
- 17:21:46 [hadleybeeman]
- ...That makes it easy to pull all of those datasets together into something consistent and coherent.
- 17:22:03 [danbri]
- q?
- 17:22:58 [hadleybeeman]
- ...Another important implementation: display of CSV. GOV.UK, data.gov.uk, github — have displays of CSV as a table. They'll often add on filtering or sorting options.
- 17:23:18 [hadleybeeman]
- ... it's important and useful to know what the data type of the column is, so you can filter it the right way.
- 17:23:26 [hadleybeeman]
- ... using jquery datatables
- 17:23:26 [ivan]
- q+
- 17:23:39 [hadleybeeman]
- ... www.datatables.net
- 17:23:51 [AxelPolleres]
- sideremark… seeing csvlint.io it reminds me somewhat of http://www.w3.org/2001/sw/wiki/RDF_Alerts which we did some years ago… that was RDF specific though, not sure whether any of that is useful here.
- 17:24:04 [hadleybeeman]
- ... Turn that CSV into an HTML table. You can imagine having pop-ups over the cells if they have annotations, having a metadata view, etc.
- 17:24:07 [danbri]
- "display" / viewers
- 17:24:36 [hadleybeeman]
- jenit: So those are the three implementations I think of: mappers, validators, and viewers.
- 17:25:11 [danbri]
- (me: import from bytes into tabular data model, … but that's more IETFish)
- 17:25:40 [JeniT]
- q+ to talk about error messages & warnings
- 17:25:48 [danbri]
- ack ivan
- 17:26:19 [hadleybeeman]
- ivan: It's clear to me what the first two categories do for us. I'm not sure how the third category fits into the picture of checking our own work. Importing is definitely not in our charter. We are not defining the byte stream to tabular conversion; that's in the IETF spec.
- 17:26:22 [phila]
- q+ ericp
- 17:26:36 [hadleybeeman]
- ... What are the implementations that we have to take seriously as part of the rec track?
- 17:26:47 [danbri]
- ack jenit
- 17:26:47 [Zakim]
- JeniT, you wanted to talk about error messages & warnings
- 17:27:30 [hadleybeeman]
- jeniT: It is useful to talk about the display in a non-normative fashion
- 17:27:31 [hadleybeeman]
- q+
- 17:27:58 [hadleybeeman]
- ...Also, in what we need to do for validators: we need to talk about errors, warnings, etc.
- 17:28:04 [ericP-mobile]
- ericP-mobile has joined #csvw
- 17:28:06 [hadleybeeman]
- ivan: Do we have to define standard errors?
- 17:28:19 [hadleybeeman]
- jeniT: I think so. Not standard wordings, but codes for them. I think it's helpful.
- 17:28:34 [phila]
- present+ Richard Ishida
- 17:28:34 [hadleybeeman]
- Richard: (introductions)
- 17:29:15 [ErikMannens]
- ErikMannens has joined #CSVW
- 17:29:33 [hadleybeeman]
- Richard: Display will be different. Internationalization are looking at the forms in HTML, numeric formats in different languages, etc. There are problems associated with that that may be relevant here.
- 17:30:10 [jtandy]
- q+
- 17:30:11 [hadleybeeman]
- jenit: for CSV, unlike a lot of other data, it has the goal of being both machine readable and human readable. So we do have numerical formats that are location specific. (Dates, numbers, etc.)
- 17:30:33 [hadleybeeman]
- Richard: You may need to account for locale in the metadata.
- 17:30:59 [danbri]
- q?
- 17:30:59 [hadleybeeman]
- ... As HTML does, a lot is done in the browser.
- 17:31:09 [hadleybeeman]
- ... A lot of a locale is a language plus local settings.
- 17:31:14 [phila]
- q- ericP
- 17:31:27 [hadleybeeman]
- danbri: Shall we have a joint meeting about this?
- 17:31:39 [hadleybeeman]
- jenit: we have a session on data types later today. Useful for this.
- 17:31:45 [phila]
- ack hadleybeeman
- 17:31:59 [danbri]
- hadleybeeman: things like the display on html page may not be as relevant for this WG but it fits well with Data On Web Best Practices WG
- 17:32:01 [phila]
- hadleybeeman: The display issue may be relevant to DWBP group
- 17:32:12 [danbri]
- we are looking at barriers to use, if average user can't see/read/understand ...
- 17:32:15 [danbri]
- q?
- 17:32:34 [ericP-mobile]
- hadley: display may not fit this WG but it may fit well in DWBP.
- 17:32:49 [phila]
- ack jtandy
- 17:32:53 [hadleybeeman]
- jtandy: For us, in terms of display, we often want to get data into just plain JSON. "javascript goodness" can then be applied.
- 17:33:13 [danbri]
- q+
- 17:33:20 [hadleybeeman]
- ... In internationalization, we look at right-to-left and top-to-bottom languages too.
- 17:33:57 [hadleybeeman]
- ivan: we have Japanese representation here. China are pretty agreeable to doing everything horizontally. In Japan this is not so.
- 17:34:08 [phila]
- Vote of thanks to Hadley for scribing first (busy) session
- 17:34:16 [phila]
- RRSAgent, draft minutes
- 17:34:16 [RRSAgent]
- I have made the request to generate http://www.w3.org/2014/10/27-csvw-minutes.html phila
- 17:42:18 [ErikMannens]
- ErikMannens has joined #CSVW
- 17:51:40 [JeniT]
- JeniT has joined #csvw
- 17:52:01 [daveL]
- daveL has joined #csvw
- 17:52:33 [daveL]
- present+ DaveLewis
- 17:54:46 [bill-ingram]
- bill-ingram has joined #csvw
- 17:55:01 [daveL]
- Best Practices for Multilingual Linked Open Data Community Group may be willing to help with internationalisation issues
- 17:55:13 [daveL]
- http://www.w3.org/community/bpmlod/
- 17:56:39 [fjh]
- fjh has joined #csvw
- 17:56:45 [ErikMannens]
- ErikMannens has joined #CSVW
- 18:00:01 [jtandy]
- jtandy has joined #csvw
- 18:02:21 [AxelPolleres]
- AxelPolleres has joined #csvw
- 18:02:37 [gkellogg]
- gkellogg has joined #csvw
- 18:02:46 [gkellogg]
- scribenick: gkellogg
- 18:03:43 [gkellogg]
- Topic: Tabular metadata
- 18:03:54 [ivan]
- rrsagent, draft minutes
- 18:03:54 [RRSAgent]
- I have made the request to generate http://www.w3.org/2014/10/27-csvw-minutes.html ivan
- 18:04:01 [ericstephan]
- ericstephan has joined #csvw
- 18:05:00 [gkellogg]
- JeniT: talking about metdata representation for individual tables, but also how it can be applied to columns
- 18:05:11 [gkellogg]
- … title, description, date, …
- 18:05:18 [ivan]
- s/metdata/metadata/
- 18:05:28 [JeniT]
- http://w3c.github.io/csvw/metadata/#common-properties
- 18:05:33 [AxelPolleres]
- q+ on provenance
- 18:05:41 [gkellogg]
- … currently in “Metadata Vocabulary” spec sec 3.3
- 18:06:00 [danbri]
- danbri has joined #csvw
- 18:06:08 [gkellogg]
- … This pulls in and references all Dublin Core metadata terms
- 18:06:40 [gkellogg]
- … In some cases terms describe data values, object, natural language string, or something with a particular date format
- 18:06:58 [gkellogg]
- … Three areas to discuss.
- 18:07:23 [gkellogg]
- … 1) what list of properties should be, perhaps dcat, or schema.org instead of DC. Perhaps our own set.
- 18:07:46 [hadleybeeman]
- q+ to ask about existing implementations
- 18:07:57 [gkellogg]
- danbri: if they’re a DC-based project, they may need to use DC for everything.
- 18:08:00 [danbri]
- q-
- 18:08:14 [gkellogg]
- JeniT: sometimes it’s the consumer that cares most about vocabulary mapping, rather than the publisher.
- 18:08:42 [gkellogg]
- … We need a list, as we’re expecting validators and mappers to reject properties not on the list (to avoid misspellings).
- 18:09:06 [gkellogg]
- … 2) how are the properties defined, within the spec or outside. (Constraints on what we can point to)
- 18:09:29 [gkellogg]
- … 3) How is metadata used to inform the mapping to different formats.
- 18:09:39 [ivan]
- q+
- 18:09:52 [AxelPolleres]
- do we need/want any new properties on document level anything which is not covered in DC, DCAT, PROV? Do we need to specify mappings to those?
- 18:09:58 [ivan]
- ack AxelPolleres
- 18:09:58 [Zakim]
- AxelPolleres, you wanted to comment on provenance
- 18:10:07 [gkellogg]
- AxelPolleres: there are two types of metadata, document-level and structural.
- 18:10:20 [gkellogg]
- … The former is also around provenance, the second is for processing instructions.
- 18:10:42 [AxelPolleres]
- http://www.w3.org/TR/prov-dc/
- 18:10:46 [gkellogg]
- … Also consider PROV vocabulary, there are notes on how to map PROV to DC.
- 18:11:06 [gkellogg]
- … Do we need to ensure that there are mappings between the two.
- 18:11:24 [gkellogg]
- JeniT: we can just pick up DC terms, or we could say use DCAT or ...
- 18:11:44 [danbri]
- q?
- 18:11:56 [gkellogg]
- … In this case “provenance” is the DC term, not necessarily relating to a different spec.
- 18:12:07 [danbri]
- q+ to ask about UCs and subsetting
- 18:12:09 [JeniT]
- ‘provenance’ isn’t in the schema.org set of terms
- 18:12:20 [gkellogg]
- hadleybeeman: do we have any way of knowing what is used more between the different formats?
- 18:12:20 [JeniT]
- ack hadleybeeman
- 18:12:20 [Zakim]
- hadleybeeman, you wanted to ask about existing implementations
- 18:12:21 [ivan]
- ack hadleybeeman
- 18:12:27 [bill-ingram]
- q+
- 18:12:46 [gkellogg]
- danbri: Google has information for microdata/rdfa/json-ld, but not from other RDF formats.
- 18:12:57 [gkellogg]
- … Clearly, we’re going to see a lot of schema.org.
- 18:13:14 [gkellogg]
- hadleybeeman: what would these numbers tell us if we could get them.
- 18:13:18 [AxelPolleres]
- is it in us to define/extend mappings between - for us useful - properties among schema.org, DC, DCAT, PROV, e.g. extending http://www.w3.org/TR/prov-dc/
- 18:13:22 [danbri]
- q?
- 18:13:23 [JeniT]
- q+
- 18:13:38 [JeniT]
- q+ to talk about how we should always enable extension names
- 18:13:43 [laufer]
- q+
- 18:14:00 [ericstephan_]
- ericstephan_ has joined #csvw
- 18:14:00 [danbri]
- q+ ericstephan
- 18:14:31 [gkellogg]
- ivan: Jeni said that “these terms” are the only terms you should use, which seems to be dangerous.
- 18:14:36 [danbri]
- ack ivan
- 18:14:51 [JeniT]
- q-
- 18:14:52 [danbri]
- ack me
- 18:14:54 [Zakim]
- danbri, you wanted to ask about UCs and subsetting
- 18:15:00 [gkellogg]
- JeniT: That should be “un-prefixed” terms, it’s really about unprefixed terms.
- 18:15:09 [jtandy]
- q+
- 18:15:17 [gkellogg]
- danbri: can we be use-case driven? DC started with 15 terms, has grown over the years.
- 18:15:28 [danbri]
- q?
- 18:15:29 [ivan]
- ack danbri
- 18:15:29 [gkellogg]
- … Can we use the use cases to pare down the set of terms we need to support?
- 18:16:01 [gkellogg]
- jtandy: National Archives has some economic data which includes publisher, date, time, obvious stuff.
- 18:16:23 [gkellogg]
- danbri: perhaps we can look at CSVs in repo.
- 18:16:29 [danbri]
- q?
- 18:16:38 [danbri]
- ack bill-ingram
- 18:17:07 [gkellogg]
- bill-ingram: in the library, everyone in metadata knows what DC is, but it took a while to get there. People starting to talk about schema.org.
- 18:17:36 [gkellogg]
- … Most of this relates to the software we use, for that DC is the core metadata for describing objects. It’s starting to change.
- 18:17:40 [ericstephan]
- +1 bill-ingram
- 18:17:50 [danbri]
- q?
- 18:17:51 [jtandy]
- from the UC doc: see http://w3c.github.io/csvw/use-cases-and-requirements/#UC-PublicationOfNationalStatistics
- 18:17:53 [AxelPolleres]
- FWIW, CKAN also has some metadata properties which I am not sure how far they are aligned with e.g. DC, etc., are they?
- 18:17:53 [gkellogg]
- … I’m interested in schema.org, but it always ends up talking about mapping back to DC.
- 18:17:58 [danbri]
- ack laufer
- 18:17:58 [JeniT]
- q+ to propose using the overlap between the various specs
- 18:18:47 [gkellogg]
- laufer: there may be some mandatory items.
- 18:18:58 [danbri]
- q?
- 18:19:00 [gkellogg]
- … Some may be mandatory, others optional.
- 18:19:20 [gkellogg]
- JeniT: different organizations always create their own profiles for what they expect.
- 18:19:21 [danbri]
- ack ericstephan
- 18:19:46 [gkellogg]
- ericstephan: predominance of data is in DC. I’m sensitive to DCAT and DC, as they’re forward thinking.
- 18:20:34 [gkellogg]
- … looking at requirements derived from use-cases, that would be a way to help define a core set of metadata we should be considering, or if there are obvious glaring holes.
- 18:20:47 [gkellogg]
- … I am worried about getting lost in the detail, however.
- 18:20:48 [danbri]
- ack jtandy
- 18:21:10 [ericP-mobile]
- ericP-mobile has joined #csvw
- 18:21:16 [gkellogg]
- jtandy: we previously agreed to a short-list of about 15 terms.
- 18:21:42 [gkellogg]
- … end of section 3.4.2
- 18:21:52 [danbri]
- http://w3c.github.io/csvw/metadata/#optional-properties
- 18:22:24 [gkellogg]
- … these are properties that relate to core information expected to be associated with CSVs and used in mapping.
- 18:22:36 [ivan]
- q+
- 18:23:14 [gkellogg]
- ivan: spatial and temporal were unclear if they should be part of the core
- 18:23:42 [gkellogg]
- JeniT: I think that list was “plucked out of the air”. There are so many groups who have thought about this, we shouldn’t re-do that thinking.
- 18:24:02 [danbri]
- q?
- 18:24:07 [danbri]
- ack jenit
- 18:24:08 [Zakim]
- JeniT, you wanted to propose using the overlap between the various specs
- 18:24:38 [gkellogg]
- jtandy: we were looking at three main things: validation, mapping and display.
- 18:24:48 [danbri]
- q?
- 18:24:56 [gkellogg]
- … What metadata do we need to ensure that these mappings can occur, this list doesn’t form that.
- 18:25:38 [gkellogg]
- … Maybe we can off-load choice of terms to Best Practices WG. cc/hadleybeeman
- 18:25:58 [gkellogg]
- hadleybeeman: we haven’t gotten into this too much yet.
- 18:26:21 [gkellogg]
- … We need to talk about this more, but that kind of a division of labor makes sense.
- 18:26:50 [gkellogg]
- jtandy: It doesn’t matter how you’re publishing data, these questions are universal.
- 18:26:57 [gkellogg]
- … It really should be about validation of parsing.
- 18:26:58 [fjh]
- q+
- 18:27:31 [gkellogg]
- hadleybeeman: we’re shying away from specifying specific vocabularies, as there are many different needs.
- 18:27:54 [gkellogg]
- jtandy: but you probably should be able to say that there should be a license, but there are many ways to express it.
- 18:28:05 [ErikMannens]
- ErikMannens has joined #CSVW
- 18:28:33 [danbri]
- q?
- 18:28:35 [gkellogg]
- laufer: we can’t make a complete list, but we can give examples of vocabularies which can do it.
- 18:29:23 [gkellogg]
- ivan: until now everything is mapped to DC. The question is should we use schema.org or DCAT instead?
- 18:29:46 [gkellogg]
- … We tried to specify a very small core, but leave the details up to the users.
- 18:30:09 [gkellogg]
- … This list was for the small core; it does not exclude the use of other vocabularies.
- 18:30:36 [JeniT]
- q+ to talk about definition through implementation
- 18:30:42 [gkellogg]
- … Do we define the 5..15 terms ourselves, or leave it open to the user to decide?
- 18:30:43 [danbri]
- ack ivan
- 18:30:56 [hadleybeeman]
- Is the question here: Are we defining a vocabulary, or pointing to existing work?
- 18:31:30 [gkellogg]
- … What does it mean if we pick “language”, “title”, and “provenance”? Do we define a new core (Santa Clara Core?)
- 18:31:45 [ericP-mobile]
- i thought the purpose of picking the terms was to enable s nodical of validation
- 18:31:51 [danbri]
- ack fjh
- 18:32:01 [gkellogg]
- fjh: What are the normative assertions, and how do you test them?
- 18:32:19 [gkellogg]
- … If you push too much off to the Best Practices group, you might not have something testable.
- 18:32:36 [danbri]
- q+ re DC testability
- 18:32:37 [ericP-mobile]
- s/nodical/modicum/
- 18:33:08 [gkellogg]
- JeniT: two issues: when testing the metadata file, a validator has to generate a warning.
- 18:33:18 [gkellogg]
- … The other level, is the actual use in deeper validation or mapping.
- 18:33:40 [gkellogg]
- … For example, the “title” property might be used to validate column titles to be what is expected.
- 18:33:55 [fjh]
- q?
- 18:33:55 [danbri]
- q?
- 18:33:57 [AxelPolleres]
- q+ on what are the expectations on validation
- 18:34:02 [danbri]
- ack jenit
- 18:34:02 [Zakim]
- JeniT, you wanted to talk about definition through implementation
- 18:34:03 [gkellogg]
- ivan: also need to check that the value given to a language mapping is a real language.
- 18:34:33 [gkellogg]
- JeniT: That’s a way to distinguish between first-level terms, and other terms.
- 18:34:55 [laufer]
- q+
- 18:35:02 [gkellogg]
- … The implication is that if you wanted to use, say, a license, they would need to use a prefixed-term.
- 18:35:08 [jtandy]
- q+
- 18:35:12 [danbri]
- ack me
- 18:35:12 [Zakim]
- danbri, you wanted to discuss DC testability
- 18:35:23 [ErikMannens]
- ErikMannens has joined #CSVW
- 18:35:38 [gkellogg]
- danbri: we’re pushing on 20 years of DC work; never been too rigid. Everything’s optional.
- 18:35:49 [danbri]
- q?
- 18:35:54 [gkellogg]
- … If this group starts to make stronger claims about DC, that might be an issue.
- 18:36:04 [gkellogg]
- AxelPolleres: what are the expectations on validity?
- 18:36:28 [gkellogg]
- … For some things it’s lexical, but for others it is more challenging (license, for example).
- 18:36:47 [danbri]
- ack axelpolleres
- 18:36:47 [Zakim]
- AxelPolleres, you wanted to comment on what are the expectations on validation
- 18:36:48 [ericP-mobile]
- q+ to say the i expect that the shape view will be that it's encouraged to restrict Duckling Core
- 18:36:56 [gkellogg]
- … Do we want to validate other types of things? Recommendations of particular strings to use?
- 18:37:19 [ericstephan]
- q+
- 18:37:23 [danbri]
- q+ to note that DCMI (and schema.org) can be changed/improved/augmented too - we can push ideas upstream
- 18:37:28 [danbri]
- ack laufer
- 18:37:30 [gkellogg]
- … Some things we can validate, others we can’t; doesn’t mean they’re not important.
- 18:37:47 [AxelPolleres]
- e.g. license IS very important to be declared.
- 18:37:59 [hadleybeeman]
- I note the many crossovers with DWBP WG
- 18:38:04 [gkellogg]
- laufer: we need to classify types of data. Structural data?
- 18:38:18 [gkellogg]
- … How important are different types of data for searchability, for example?
- 18:38:20 [ivan]
- q+ to ask whether the world would collapse if we stay with a few dublin core term
- 18:38:37 [gkellogg]
- … What we can do is information about the structure of the data (rows and columns, datatypes, etc.)
- 18:39:05 [gkellogg]
- … license, provenance, … Difficult to test these, you can test syntactic.
- 18:39:26 [gkellogg]
- q+ to ask about relevance of RDF Shapes
- 18:40:02 [danbri]
- ack jtandy
- 18:40:37 [gkellogg]
- jtandy: If we’re focusing the metadata vocabulary on what is necessary for validation, then we shouldn’t spend too much time worrying about it.
- 18:40:44 [danbri]
- ack ericp-mobile
- 18:40:44 [Zakim]
- ericP-mobile, you wanted to say the i expect that the shape view will be that it's encouraged to restrict Duckling Core
- 18:41:25 [gkellogg]
- ericp: I’m working with DC application profiles group who wants to make sure there are some ways of describing restrictions on publication profiles.
- 18:41:27 [gkellogg]
- q-
- 18:41:43 [gkellogg]
- … We’re also starting work at the W3C on this.
- 18:42:21 [gkellogg]
- danbri: the DC view is that such restrictions make sense in a particular context. For us the question is: do we decide this?
- 18:42:35 [danbri]
- ack ericstephan
- 18:42:41 [gkellogg]
- … Uses by government vs search engines may be different.
- 18:43:09 [danbri]
- q?
- 18:43:12 [danbri]
- ack me
- 18:43:12 [Zakim]
- danbri, you wanted to note that DCMI (and schema.org) can be changed/improved/augmented too - we can push ideas upstream
- 18:43:12 [hadleybeeman]
- +1 to ericstephan
- 18:43:13 [gkellogg]
- bill-ingram: our focus has been generic document-level. What we’re going to do is give insight on describing CSV contents, that’s what people will look for.
- 18:43:38 [gkellogg]
- s/bill-ingram/ericstephan/
- 18:43:41 [danbri]
- ack ivan
- 18:43:41 [Zakim]
- ivan, you wanted to ask whether the world would collapse if we stay with a few dublin core term
- 18:43:43 [danbri]
- q?
- 18:44:12 [AxelPolleres]
- +1 to DanBri: make things work together (addition: rather than defining something new)
- 18:44:33 [gkellogg]
- ivan: I wonder if the metadata document maybe just went too far? Perhaps we should just take the bare minimum (2-3 terms) but we use DC explicitly.
- 18:44:56 [gkellogg]
- … We do rely on DC once and for all, as we have a mechanism for using other vocabularies.
- 18:45:18 [gkellogg]
- … What counts is the metadata for describing the structure.
- 18:45:36 [gkellogg]
- … Accept DC, use DC, make it clear that you can use schema and DCAT by using prefixes or contexts.
- 18:46:09 [gkellogg]
- … a validator may then check these things.
- 18:46:17 [danbri]
- q?
- 18:47:02 [danbri]
- q+ to suggest dc:locale per richard ishida's contrib
- 18:47:11 [gkellogg]
- … Just a few un-prefixed terms from DC, not defined by us.
- 18:47:40 [gkellogg]
- JeniT: We need to have some restrictions on the values of these terms.
- 18:47:52 [JeniT]
- q?
- 18:47:53 [danbri]
- q?
- 18:48:07 [bill-ingram]
- q+
- 18:48:47 [gkellogg]
- jtandy: For the validation to work, we need consistent syntax. I can imagine not caring about what the @context says, because I can validate the structure. It may map to schema, or to DC.
- 18:49:04 [gkellogg]
- ivan: I’m saying we don’t define the value of the “language” value for example.
- 18:49:33 [hadleybeeman]
- q+ to ask if the interoperability between metadata sets is part of the scope of this working group?
- 18:50:27 [danbri]
- ack danbri
- 18:50:27 [Zakim]
- danbri, you wanted to suggest dc:locale per richard ishida's contrib
- 18:51:02 [gkellogg]
- danbri: locale has been described as being important, but isn’t in our list.
- 18:51:02 [danbri]
- ack bill-ingram
- 18:51:21 [gkellogg]
- bill-ingram: we’d prefer that everything be prefixed, but if un-prefixed, we’d like them to map to dc?
- 18:51:37 [gkellogg]
- ivan: perhaps, but I suggest that we only allow 5 terms to be unprefixed.
- 18:52:25 [JeniT]
- q+ to propose a list of the unprefixed properties
- 18:52:25 [danbri]
- q?
- 18:53:04 [gkellogg]
- hadleybeeman: scope question: If I have a dataset using DC, and you have one using DCAT, does that break the purpose of this WG? Or is it okay as long as they’re each valid?
- 18:53:20 [gkellogg]
- JeniT: Just validity.
- 18:53:46 [danbri]
- ack hadleybeeman
- 18:53:46 [Zakim]
- hadleybeeman, you wanted to ask if the interoperability between metadata sets is part of the scope of this working group?
- 18:53:48 [danbri]
- ack jenit
- 18:53:49 [Zakim]
- JeniT, you wanted to propose a list of the unprefixed properties
- 18:54:14 [danbri]
- JeniT suggests resolution "we are going to stick to a small set of properties that are used in validation or mapping"
- 18:54:35 [danbri]
- greg: i heard a couple things, …1 at v surface level terms are used, e.g. title as a string in json doc...
- 18:54:43 [danbri]
- …this doesn't necc say that it maps to dc:title
- 18:54:57 [danbri]
- … is everyone on board with this, or is there a feeling that it must map to DC title
- 18:55:00 [danbri]
- q?
- 18:55:54 [gkellogg]
- ivan: we define the meaning of terms according to DC, but it may be mapped. If mapped, it must be dc:title.
- 18:55:59 [ericP-mobile]
- q+
- 18:56:08 [gkellogg]
- … Perhaps through entailment?
- 18:57:08 [laufer]
- q+
- 18:57:28 [danbri]
- gregg: cautioning that looking at surface level of json where you also expect a mapping could be problematic
- 18:57:29 [danbri]
- ack ericp-mobile
- 18:58:08 [gkellogg]
- ericp: Is the WG receptive to the Best Practices coming back and saying that there may be some imposed restriction?
- 18:58:37 [danbri]
- q?
- 18:58:37 [gkellogg]
- hadleybeeman: I’d say that that is a different place for the discussion, but might include the same people
- 18:59:29 [gkellogg]
- JeniT: I think that it’s reasonable for other groups to decide practices that we should conform to.
- 19:00:02 [hadleybeeman]
- q+
- 19:00:34 [gkellogg]
- ivan: we use JSON, and when we can, the syntax conforms to JSON-LD. The metadata can be considered as JSON-LD by an implementor if it wants
- 19:00:50 [gkellogg]
- … So there might not be a context?
- 19:01:05 [danbri]
- q?
- 19:01:15 [JeniT]
- q+ to propose language as metadata for tables, title and language for columns
- 19:01:26 [AxelPolleres]
- ericp, how bout calling it “best practices” rather than “imposed restrictions”?
- 19:01:35 [danbri]
- q?
- 19:01:40 [danbri]
- ack laufer
- 19:01:42 [danbri]
- ack hadley
- 19:02:11 [JeniT]
- q+ AxelPolleres
- 19:02:19 [gkellogg]
- hadleybeeman: I believe the REC track process is such that we (Best Practices) can’t decide things without considering the needs of other groups.
- 19:02:20 [AxelPolleres]
- q+ on asking CSVW vs DWBP
- 19:02:21 [danbri]
- q?
- 19:02:27 [danbri]
- ack jenit
- 19:02:27 [Zakim]
- JeniT, you wanted to propose language as metadata for tables, title and language for columns
- 19:02:29 [danbri]
- p
- 19:02:41 [gkellogg]
- JeniT: perhaps we can make a decision?
- 19:03:20 [gkellogg]
- … I’d say that “language” at document level (or group of CSV files), and “title” and “language” at the column level.
- 19:03:36 [gkellogg]
- … “locale” is part of “language”.
- 19:03:46 [timeless]
- timeless has joined #csvw
- 19:03:48 [RRSAgent]
- I'm logging. I don't understand 'this meeting spans midnight <- if you want a single log for the two days', timeless. Try /msg RRSAgent help
- 19:03:53 [danbri]
- q?
- 19:04:17 [AxelPolleres]
- What’s wrong with the list on http://w3c.github.io/csvw/metadata/#optional-properties ? would like to make an attempt to argue for that list.
- 19:04:21 [gkellogg]
- … Title is natural-language string at top of column.
- 19:04:45 [gkellogg]
- … Name is like a variable name for that column, what is used in the mapping. Typically has a constrained syntax.
- 19:04:53 [gkellogg]
- … @id is there for JSON-LD compatibility.
- 19:05:37 [gkellogg]
- … it must be an IRI.
- 19:05:47 [danbri]
- "locations for toilets, e.g. @id "lat" for latitude of toilers
- 19:05:59 [danbri]
- gregg: that would be ok if we had a base location as you could construct an IRI
- 19:06:00 [danbri]
- q?
- 19:06:17 [jtandy]
- q+ to note that additional properties like "base" will be required
- 19:06:25 [gkellogg]
- ivan: that means that “title” is fundamentally different than other properties.
- 19:06:34 [danbri]
- ack ax
- 19:06:34 [Zakim]
- AxelPolleres, you wanted to comment on asking CSVW vs DWBP
- 19:06:36 [ivan]
- ack AxelPolleres
- 19:06:58 [gkellogg]
- AxelPolleres: what’s wrong with the core list? I think the things we need are present in that list.
- 19:07:21 [gkellogg]
- … It may be arbitrary, but it seems good. Better than something overly constrained.
- 19:07:39 [timeless]
- timeless has left #csvw
- 19:07:50 [danbri]
- q?
- 19:08:02 [danbri]
- q+
- 19:08:07 [gkellogg]
- q+
- 19:08:20 [danbri]
- ack jtandy
- 19:08:20 [Zakim]
- jtandy, you wanted to note that additional properties like "base" will be required
- 19:08:24 [ericP-mobile]
- +1 to starting small (3)
- 19:08:34 [ErikMannens]
- ErikMannens has joined #CSVW
- 19:08:37 [danbri]
- q?
- 19:08:44 [gkellogg]
- jtandy: there will be other things that are necessary, but they will emerge.
- 19:08:51 [danbri]
- ack me
- 19:08:55 [ivan]
- q+
- 19:09:39 [gkellogg]
- danbri: A small list is easier to stand behind; a medium sized list may give the false impression that we’ve thought deeply about it.
- 19:10:21 [danbri]
- ack gk
- 19:10:39 [danbri]
- gkellogg: if we overly restrict use of simple strings as properties within a json file,
- 19:10:57 [danbri]
- …we are violating expectations of many json users who like js with dot notation (i.e. objects)
- 19:10:59 [danbri]
- q?
- 19:11:07 [danbri]
- jtandy: context could ...
- 19:11:15 [danbri]
- gkellogg: context could … sure, ...
- 19:11:30 [danbri]
- jtandy: only lang and title are the terms which we want to validate at the surface level
- 19:11:33 [AxelPolleres]
- maybe a good idea to not put something expressed elsewhere (DC) into our standard… (retracting my concerns from before, if we point to that as a “for instance” option to extend the meta-data vocab).
- 19:11:42 [danbri]
- gkellogg: also looking for bare terms that don't map
- 19:11:45 [danbri]
- ack iv
- 19:12:02 [danbri]
- ivan: replying to gregg, … what we are talking about here are the usual metadata terms
- 19:12:23 [danbri]
- ivan: 90% of metadata file consists of terms describing the csv file
- 19:12:27 [danbri]
- those are of course unqualified
- 19:12:30 [gkellogg]
- ivan: we need to be careful: what we’re talking about is the “usual metadata terms”. 90% of the metadata file consists of terms describing the structure of the CSV file. Those are of course unqualified.
- 19:12:50 [gkellogg]
- … There are other unqualified terms; we’ve been discussing 10% of the content of a metadata file.
- 19:12:54 [danbri]
- Observers: Please consider volunteering to scribe next session.
- 19:12:56 [danbri]
- q?
- 19:13:41 [gkellogg]
- … Having a very restricted version as proposed by JeniT as being unqualified is fine. We can then see if “the other group” comes up with more required terms.
- 19:13:54 [danbri]
- q?
- 19:14:11 [danbri]
- jenit, want to take a poll on your proposal to bridge to lunch?
- 19:14:36 [danbri]
- q?
- 19:14:44 [gkellogg]
- … We really need to solve structural terms. The use of licence and title, say, should be solved by the Best Practices group. Leave these open until the BPWG has something to say.
- 19:14:45 [ericstephan]
- +1 Ivan
- 19:14:51 [laufer]
- danbri:
- 19:14:58 [danbri]
- q?
- 19:15:05 [chunming]
- q?
- 19:15:10 [jtandy]
- @ivan: +1
- 19:15:52 [JeniT]
- PROPOSED RESOLUTION: We will define the terms ‘title’ and ‘language’ (for columns) and ‘language’ (for table groups down), provide examples using qualified terms for other metadata vocabularies, and be guided by DWBP wrt recommending other particular metadata terms to recommend
- 19:16:24 [AxelPolleres]
- +q one last question
- 19:16:27 [gkellogg]
- s/BPWG/DWBP/
- 19:16:32 [danbri]
- ack a
- 19:16:51 [danbri]
- axel "what about encoding?" (utf-8 etc)
- 19:17:33 [gkellogg]
- AxelPolleres: what about encoding metadata?
- 19:17:34 [danbri]
- q?
- 19:17:40 [gkellogg]
- JeniT: described elsewhere.
- 19:17:52 [danbri]
- PROPOSED: We will define the terms ‘title’ and ‘language’ (for columns) and ‘language’ (for table groups down), provide examples using qualified terms for other metadata vocabularies, and be guided by DWBP wrt recommending other particular metadata terms to recommend
- 19:17:57 [AxelPolleres]
- +1
- 19:17:57 [ericstephan]
- +1
- 19:18:00 [gkellogg]
- +1
- 19:18:01 [danbri]
- +1
- 19:18:01 [ivan]
- +1
- 19:18:01 [bill-ingram]
- +1
- 19:18:03 [JeniT]
- +q
- 19:18:04 [jtandy]
- +1
- 19:18:05 [JeniT]
- +1
- 19:18:08 [JeniT]
- -q
- 19:18:10 [hadleybeeman]
- +1 though I'm just observing. But this makes sense from DWBP's perspective too
- 19:18:21 [bjdmeest]
- +1
- 19:18:36 [chunming]
- +1
- 19:18:37 [AxelPolleres]
- encoding is covered by the syntax http://www.w3.org/TR/tabular-data-model/#encoding
- 19:18:45 [ivan]
- RESOLVED: We will define the terms ‘title’ and ‘language’ (for columns) and ‘language’ (for table groups down), provide examples using qualified terms for other metadata vocabularies, and be guided by DWBP wrt recommending other particular metadata terms to recommend
- 19:18:54 [ericP-mobile]
- +1 (as observer)
- 19:19:38 [Hitoshi]
- Hitoshi has left #csvw
- 20:04:02 [AxelPolleres]
- AxelPolleres has joined #csvw
- 20:04:21 [bill-ingram]
- bill-ingram has joined #csvw
- 20:04:33 [JeniT]
- JeniT has joined #csvw
- 20:04:33 [AxelPolleres]
- scribe: AxelPolleres
- 20:04:34 [gkellogg]
- gkellogg has joined #csvw
- 20:05:02 [danbri]
- danbri has joined #csvw
- 20:05:13 [jtandy]
- jtandy has joined #csvw
- 20:05:28 [ErikMannens]
- ErikMannens has joined #CSVW
- 20:07:30 [AxelPolleres]
- jeniT summarizing result of discussion before lunch.
- 20:08:30 [ericstephan]
- ericstephan has joined #csvw
- 20:08:45 [ivan]
- ivan has joined #csvw
- 20:09:15 [AxelPolleres]
- … title and language being the only document-level metadata attributes standardised.
- 20:09:19 [JeniT]
- we should use BCP47 for languages
- 20:09:27 [bill-ingram1]
- bill-ingram1 has joined #csvw
- 20:09:29 [danbri]
- dc http://dublincore.org/documents/2012/06/14/dcmi-terms/?v=terms#terms-language
- 20:09:42 [AxelPolleres]
- Addison: suggest to use language tags (BCP47) for languages
- 20:09:56 [danbri]
- q?
- 20:10:05 [AxelPolleres]
- s/Addison/Addison Phillips/
- 20:10:30 [AxelPolleres]
- Dan: if more than two codes are applicable, should we repeat the property?
- 20:12:02 [AxelPolleres]
- Richard: several languages may appear in a doc, the intended language of the user is the top level of the meta-data, but particular cells or columns could have different language.
- 20:12:20 [AxelPolleres]
- Ivan: there should be levels for meta-data at all levels of granularity.
- 20:12:28 [ericP-mobile]
- ericP-mobile has joined #csvw
- 20:12:38 [JeniT_]
- JeniT_ has joined #csvw
- 20:13:02 [danbri]
- (example in mind: col in table that might be any of the croatian/bosnian/serbian lang, some in cyrillic some in latin script, but lacking formal per-cell details)
- 20:13:04 [AxelPolleres]
- … additional information about “all languages” used/mentioned in the document?
- 20:13:44 [AxelPolleres]
- Richard: top level is “who are the users”, second level is “rendering”.
- 20:14:02 [ericP-mobile]
- q+ to ask if there are use cases for language-independent locales?
- 20:14:24 [AxelPolleres]
- ivan: we didn’t differentiate that so far
- 20:14:26 [danbri]
- q- ericP-mobile
- 20:15:40 [AxelPolleres]
- ericP: trivial use cases, like numeric data have no language.
- 20:15:59 [AxelPolleres]
- Addison Phillips: there are language tags for “no language”
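The layering discussed above (a default language at the top of the metadata, overridden by columns or cells) can be sketched as a simple lookup. This is only an illustration, not anything the group resolved: the level names `cell`, `column`, `table` and `table_group` and the function `effective_language` are hypothetical. The BCP47 tags `und` (undetermined) and `zxx` (no linguistic content) are real registered subtags and are one way to read the "no language" remark.

```python
# Hypothetical metadata levels, ordered from most specific to most general.
LEVELS = ("cell", "column", "table", "table_group")


def effective_language(annotations: dict) -> str:
    """Return the BCP47 tag from the most specific level that declares one.

    `annotations` maps a level name to a BCP47 language tag. If no level
    declares a language, fall back to "und" (undetermined).
    """
    for level in LEVELS:
        tag = annotations.get(level)
        if tag:
            return tag
    return "und"


# A column of Serbian text in Cyrillic script inside an English table group.
print(effective_language({"table_group": "en", "column": "sr-Cyrl"}))
# A purely numeric cell can be marked as having no linguistic content.
print(effective_language({"table": "en", "cell": "zxx"}))
```

The most specific declaration wins, which matches the idea that particular cells or columns can differ from the language intended for the user of the whole document.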
- 20:16:38 [AxelPolleres]
- topic: datatypes
- 20:17:40 [AxelPolleres]
- jeniT: datatypes per columns, cell, etc. are a common issue, e.g. xml:schema, string-value vs. semantic value
- 20:18:24 [AxelPolleres]
- … in XML schema string values are constrained.
- 20:18:41 [danbri]
- q?
- 20:19:06 [AxelPolleres]
- … to ISO format, extremely difficult for CSVs if generated locally.
- 20:19:28 [AxelPolleres]
- … we would like to be able to map from type to particular formatting for that type.
- 20:19:52 [ivan]
- q+
- 20:20:04 [bjdmeest]
- bjdmeest has joined #csvw
- 20:20:13 [AxelPolleres]
- Addison Phillips: e.g. there are other calendars besides Gregorian, this makes it much more complex
- 20:20:46 [danbri]
- ack ivan
- 20:20:53 [AxelPolleres]
- (on the example of data “27/10/2014” vs “27th October 2014” on the whiteboard)
- 20:21:08 [ericP-mobile]
- q+ r12a
- 20:21:21 [danbri]
- e.g. http://www.w3.org/TR/xpath-functions-30/#syntax-of-picture-string
- 20:21:22 [AxelPolleres]
- ivan: trying to see whether there’s a standard on the “picture values”
- 20:21:27 [danbri]
- q?
- 20:21:53 [danbri]
- http://cldr.unicode.org/
- 20:22:18 [AxelPolleres]
- Addison: refers to ICU library
- 20:22:48 [AxelPolleres]
- (is that this one http://site.icu-project.org/ ?)
- 20:23:05 [danbri]
- http://www.unicode.org/cldr/charts/26/by_type/index.html
- 20:23:23 [danbri]
- q?
- 20:23:52 [AxelPolleres]
- Richard: different schemas, XML Schema (referring to ISO), HTML, UNICODE…
- 20:24:31 [AxelPolleres]
- Ivan: we are rather talking about how to “understand” certain strings as “ISO” …
- 20:25:14 [danbri]
- q?
- 20:25:22 [danbri]
- ack r
- 20:25:26 [JeniT]
- http://www.unicode.org/cldr/charts/26/summary/en.html
- 20:25:37 [AxelPolleres]
- Addison Phillips: Unicode defines all those here: http://www.unicode.org/cldr/charts/26/summary/en.html
- 20:26:44 [AxelPolleres]
- Richard: you might need more than just the picture strings, e.g. ‘$’ meaning USD or Australian Dollars or HK Dollars, etc.
- 20:27:06 [aphillip]
- aphillip has joined #csvw
- 20:27:16 [danbri]
- http://en.wikipedia.org/wiki/ISO_4217
- 20:27:21 [AxelPolleres]
- Addison Williams: 3-char code for currencies: ISO4217
- 20:27:29 [aphillip]
- http://www.unicode.org/reports/tr35
- 20:27:57 [danbri]
- q?
- 20:28:09 [AxelPolleres]
- q+ to ask where to stop
- 20:28:17 [aphillip]
- s/Addison Williams/Addison Phillips/
- 20:28:34 [danbri]
- q?
- 20:28:45 [jtandy]
- q+
- 20:28:57 [danbri]
- ack axel
- 20:28:57 [Zakim]
- AxelPolleres, you wanted to ask where to stop
- 20:29:05 [danbri]
- "There are things like unit ontologies too...
- 20:29:10 [danbri]
- … could go arbitrarily far
- 20:29:15 [danbri]
- … standard units, …
- 20:29:21 [danbri]
- … i am not clear on where this would stop
- 20:29:28 [danbri]
- e.g. the number of cars per 1000% people
- 20:29:33 [danbri]
- [see also QUDT]
- 20:30:07 [r12a]
- r12a has joined #csvw
- 20:30:08 [danbri]
- ack jtandy
- 20:30:13 [AxelPolleres]
- ivan: we shouldn’t go beyond XSD datatypes.
- 20:30:32 [r12a]
- q+
- 20:31:01 [danbri]
- q+
- 20:31:10 [AxelPolleres]
- Jeremy: maybe we can add in metadata a script that trnasforms “picture strings” into prescribed format before validation.
- 20:31:20 [danbri]
- ack r12a
- 20:31:25 [bjdmeest]
- s/trnasforms/transforms/
- 20:31:25 [AxelPolleres]
- … we should allow people to work around.
- 20:31:36 [danbri]
- q-
- 20:32:00 [AxelPolleres]
- Richard: it would be easier if you’d go with one global standard.
- 20:32:26 [AxelPolleres]
- ivan: if you consider the data out there, that wouldn’t work, in reality, everybody uses what they want.
- 20:33:01 [ericstephan]
- q+
- 20:33:34 [AxelPolleres]
- Addison Phillips: the range of date format variations is huge.
- 20:33:35 [gkellogg]
- gkellogg has joined #csvw
- 20:33:37 [r12a]
- q+
- 20:33:48 [ericP-mobile]
- q+ to ask the coverage of picture formats
- 20:34:08 [AxelPolleres]
- Ivan: is there a relatively simple picture string format we could refer to and use, which covers ~70% of cases?
- 20:34:35 [AxelPolleres]
- … that we can refer to and otherwise, for special cases, allow preprocessing?
- 20:35:10 [AxelPolleres]
- Addison Phillips: e.g. month abbreviations in various languages already make it complex.
- 20:36:06 [AxelPolleres]
- Ivan: month abbreviations should be part of locale. We should look around usual libraries in common prog. langs
- 20:36:32 [AxelPolleres]
- … I am uneasy with saying “either use an ISO string or give me a program”
- 20:36:53 [danbri]
- q?
- 20:37:20 [danbri]
- ack erics
- 20:38:23 [danbri]
- ericstephan, … see http://www.w3.org/TR/2014/WD-tabular-data-model-20140327/#excel for special casing documentation around Excel
- 20:38:25 [AxelPolleres]
- erik: we specify in the tabular metadata doc explicitly about e.g. Excel and the Date-formatting they use
- 20:38:51 [hadleybeeman]
- Re spreadsheets, I think the Open Document Format supports dc:date, if that helps any
- 20:38:53 [AxelPolleres]
- … that is a technology-based solution.
- 20:38:57 [danbri]
- ack r
- 20:40:02 [AxelPolleres]
- Richard: HTML will require you to use one standard format for dates, why not start out with that format.
- 20:40:16 [AxelPolleres]
- JeniT: because there are masses of documents that don’t use it.
- 20:40:40 [danbri]
- ack ericP-mobile
- 20:40:40 [Zakim]
- ericP-mobile, you wanted to ask the coverage of picture formats
- 20:41:01 [AxelPolleres]
- Richard: the argument about prescribing utf-8 is similar.
- 20:41:32 [hadleybeeman]
- Here is how the ODF spec handles it: http://docs.oasis-open.org/office/v1.2/os/OpenDocument-v1.2-os-part1.html#__RefHeading__1416366_253892949
- 20:42:08 [AxelPolleres]
- Addison Phillips: CLDR contains 100s of locales, not everything, but for data out there it has decent coverage
- 20:42:18 [danbri]
- q?
- 20:42:43 [danbri]
- JeniT, how much time needed for rest of datatypes topics in agenda?
- 20:42:52 [AxelPolleres]
- ericP: there is not that much value in being able to read existing data.
- 20:43:02 [ErikMannens]
- ErikMannens has joined #CSVW
- 20:43:07 [JeniT]
- we should move on if we can, but it would be good to get a direction of travel
- 20:43:26 [AxelPolleres]
- q+ to ask about the value of annotating existing data with scripts.
- 20:43:27 [phila]
- phila has joined #csvw
- 20:43:42 [danbri]
- q+ to ask i18n folks how to continue this
- 20:44:39 [ericstephan]
- ericstephan has left #csvw
- 20:45:09 [ericstephan]
- ericstephan has joined #csvw
- 20:45:16 [AxelPolleres]
- ericP: we could do several levels, the question we want to ask ourselves is where to stop.
- 20:45:55 [danbri]
- axel: q … if we ask ourselves how far we want to go. What makes us believe people who are not willing to convert their data into a specific format, … why will they go produce metadata to do the mappings
- 20:46:00 [danbri]
- jenit: it could be a 3rd party
- 20:46:01 [danbri]
- q?
- 20:46:04 [danbri]
- ack axel
- 20:46:04 [Zakim]
- AxelPolleres, you wanted to ask about the value of annotating existing data with scripts.
- 20:46:21 [AxelPolleres]
- Ivan: metadata can be decoupled
- 20:46:23 [danbri]
- ack me
- 20:46:23 [Zakim]
- danbri, you wanted to ask i18n folks how to continue this
- 20:46:30 [AxelPolleres]
- Axel: that answers my question then.
- 20:47:15 [AxelPolleres]
- JeniT: I’m inclined to get us just to Picture-String without locale (e.g. no multi-language month abbreviations)
- 20:47:57 [AxelPolleres]
- JeniT: … that seems like a good direction for me.
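As a rough sketch of the "picture string without locale" direction, the snippet below maps a small, numeric-only subset of UTS #35 style pattern letters onto Python's `strptime` directives and normalises the whiteboard example "27/10/2014" to an ISO date. The subset chosen here (`yyyy`, `MM`, `dd`, `HH`, `mm`, `ss`) and the function names are assumptions made for illustration; quoted literals and month names (which would need locale data, e.g. "27th October 2014") are deliberately not handled.

```python
import re
from datetime import datetime

# Assumed numeric-only subset of UTS #35 date pattern fields.
_TOKENS = {
    "yyyy": "%Y",  # four-digit year
    "MM": "%m",    # two-digit month
    "dd": "%d",    # two-digit day of month
    "HH": "%H",    # hour, 00-23
    "mm": "%M",    # minute
    "ss": "%S",    # second
}
_TOKEN_RE = re.compile("|".join(_TOKENS))


def picture_to_strptime(picture: str) -> str:
    """Translate a picture string into a strptime format string."""
    escaped = picture.replace("%", "%%")  # keep literal percent signs literal
    return _TOKEN_RE.sub(lambda m: _TOKENS[m.group(0)], escaped)


def to_iso_date(value: str, picture: str) -> str:
    """Parse `value` according to `picture` and return an ISO 8601 date."""
    parsed = datetime.strptime(value, picture_to_strptime(picture))
    return parsed.date().isoformat()


print(to_iso_date("27/10/2014", "dd/MM/yyyy"))
```

Keeping the pattern in metadata means the CSV itself stays untouched, which fits the point that the metadata can be supplied by a third party.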
- 20:48:00 [danbri]
- q+ r12a
- 20:48:35 [AxelPolleres]
- Ivan: not sure, we make a requirement that is weaker than most of the prog. lang. libraries out there.
- 20:48:56 [r12a]
- q+
- 20:49:41 [danbri]
- ack r
- 20:49:48 [AxelPolleres]
- Richard: currency examples not covered.
- 20:49:55 [danbri]
- (eg. $)
- 20:49:58 [danbri]
- q?
- 20:50:02 [AxelPolleres]
- q+ to state unit != datatype
- 20:50:04 [jtandy]
- q+
- 20:50:23 [hadleybeeman]
- Is currency best captured under locale, or as metadata in its own right?
- 20:50:25 [AxelPolleres]
- q-
- 20:50:36 [danbri]
- ack jt
- 20:50:48 [AxelPolleres]
- ivan: datatypes are not units, currency not a good example.
- 20:51:20 [AxelPolleres]
- q+ to repeat myself.
- 20:51:39 [jtandy]
- q?
- 20:52:09 [danbri]
- ack A
- 20:52:09 [Zakim]
- AxelPolleres, you wanted to repeat myself.
- 20:52:50 [danbri]
- axel "agree currency eg is not a good one (for datatypes). The datatype of a price is number not currency. If the metadata could be decoupled from data, … we could equally well say that someone else republishes the curated data."
- 20:53:12 [danbri]
- … if there is so much data out there with poor datatyping, … … isn't data republishing as likely as metadata annotation?
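To illustrate the point that a price is a number and the currency is a separate annotation, here is a minimal sketch. The `Amount` class, the `parse_amount` function and the idea of a per-column currency code are hypothetical and were not proposed in the meeting; the three-letter codes follow ISO 4217 as mentioned earlier in the log.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Amount:
    value: Decimal   # the datatype of the cell is simply a number
    currency: str    # ISO 4217 code, supplied by (hypothetical) column metadata


def parse_amount(cell: str, column_currency: str) -> Amount:
    """Strip an ambiguous '$' and thousands separators; take the unit from metadata."""
    digits = cell.strip().lstrip("$").replace(",", "")
    return Amount(Decimal(digits), column_currency)


# The same cell text yields the same number; only the annotation differs.
print(parse_amount("$1,250.50", "AUD"))
print(parse_amount("$1,250.50", "HKD"))
```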
- 20:53:27 [bjdmeest]
- q+ to say someone could publish a parsing method
- 20:53:31 [danbri]
- ivan: a matter of scale, … if you have terabytes of data working at metadata level is easier
- 20:54:08 [danbri]
- ack b
- 20:54:08 [Zakim]
- bjdmeest, you wanted to say someone could publish a parsing method
- 20:54:24 [AxelPolleres]
- Ben: instead of republishing, can we publish a “re-publishing” method.
- 20:54:39 [AxelPolleres]
- s/method./method?/
- 20:54:55 [danbri]
- Thanks aphillip, r12a :)
- 20:55:16 [JeniT]
- http://w3c.github.io/csvw/metadata/#datatypes
- 20:55:31 [AxelPolleres]
- topic: built-in datatypes
- 20:56:33 [danbri]
- jenit: number, binary, datetime come from json tables which comes from json schema
- 20:56:51 [gkellogg]
- q+ to ask about schema:Date/Time/Duration types
- 20:57:15 [AxelPolleres]
- ivan: we may not want to add geopoint
- 20:57:39 [AxelPolleres]
- JeniT: propose we just ignore geopoint altogether
- 20:58:12 [JeniT]
- PROPOSAL: We should not support ‘geopoint’ as a datatype
- 20:58:15 [AxelPolleres]
- discussing issues ISSUE-13 ff
- 20:58:18 [ivan]
- +1
- 20:58:19 [JeniT]
- +1
- 20:58:22 [jtandy]
- +1
- 20:58:22 [ericstephan]
- +1
- 20:58:25 [gkellogg]
- +0
- 20:58:25 [danbri]
- +1
- 20:58:27 [bill-ingram1]
- +1
- 20:58:30 [AxelPolleres]
- +0
- 20:58:30 [bjdmeest]
- +0
- 20:58:45 [ivan]
- RESOLVED: We should not support ‘geopoint’ as a datatype
- 20:59:05 [JeniT]
- PROPOSAL: We should not support ‘object’, ‘array’ or ‘geojson’ as datatypes
- 20:59:12 [JeniT]
- (this is ISSUE 14 in the document)
- 20:59:13 [danbri]
- (issue is purely in the doc, not in w3c tracker or github tracker)
- 20:59:16 [ivan]
- +1
- 20:59:21 [danbri]
- +1
- 20:59:22 [JeniT]
- +1
- 20:59:24 [bill-ingram]
- +1
- 20:59:26 [ericstephan]
- +1
- 20:59:29 [AxelPolleres]
- (ISSUE 13 in the document should be closed)
- 20:59:32 [gkellogg]
- +1
- 20:59:49 [AxelPolleres]
- +1
- 21:00:32 [bjdmeest]
- +1
- 21:00:35 [ivan]
- RESOLVED: We should not support ‘object’, ‘array’ or ‘geojson’ as datatypes
- 21:00:49 [danbri]
- issue 15, the any type
- 21:01:01 [danbri]
- from doc, "We invite comment on whether the any type is useful."
- 21:01:04 [AxelPolleres]
- Jeremy: we will support some kind of list types though, or parsing lists.
- 21:01:11 [danbri]
- q?
- 21:01:31 [danbri]
- gkellogg, can we deal with your point after these issues are resolved?
- 21:02:17 [AxelPolleres]
- JeniT: ISSUE-15 we could enable to let people declare explicitly that something is of no particular datatype.
- 21:02:19 [JeniT]
- PROPOSAL: It is useful to have an ‘any’ type to explicitly say that anything is allowed
- 21:02:23 [jtandy]
- +1
- 21:02:30 [ivan]
- +1
- 21:03:40 [AxelPolleres]
- eric: what’s the difference between any type and string?
- 21:03:53 [danbri]
- +1
- 21:03:59 [AxelPolleres]
- jeremy: it is made explicit.
- 21:04:45 [AxelPolleres]
- -0
- 21:04:52 [ivan]
- +1
- 21:04:54 [ericstephan]
- +1
- 21:04:55 [danbri]
- +0.5
- 21:05:02 [AxelPolleres]
- JeniT: still unsure.
- 21:05:03 [gkellogg]
- +0
- 21:05:15 [danbri]
- q?
- 21:05:17 [fjh]
- fjh has joined #csvw
- 21:05:38 [danbri]
- q+ to ask about null
- 21:05:42 [fjh]
- fjh has joined #csvw
- 21:05:53 [fjh]
- fjh has joined #csvw
- 21:05:58 [AxelPolleres]
- Ivan: this is for mixed type columns
- 21:06:57 [danbri]
- ack g
- 21:06:57 [Zakim]
- gkellogg, you wanted to ask about schema:Date/Time/Duration types
- 21:07:42 [AxelPolleres]
- gregg: schema.org uses different datatypes than xml schema…
- 21:07:58 [danbri]
- ack danbri
- 21:07:58 [Zakim]
- danbri, you wanted to ask about null
- 21:08:00 [danbri]
- q?
- 21:08:06 [AxelPolleres]
- ivan: I am changing back my vote to 0
- 21:08:29 [danbri]
- example: birthdate, deathdate
- 21:08:44 [AxelPolleres]
- JeniT: shall we allow empty values for particular cells, e.g. deathdate
- 21:09:11 [AxelPolleres]
- “”^^:null
- 21:09:18 [danbri]
- +1
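The birthdate/deathdate example can be sketched as follows, assuming the cells hold ISO dates and that an empty string (or some other declared token) stands for "no value". The function name and the `null_token` parameter are hypothetical, chosen only to show the idea of treating an empty cell as null rather than as an invalid date.

```python
from datetime import date
from typing import Optional


def parse_date_cell(cell: str, null_token: str = "") -> Optional[date]:
    """Return None for the null token, otherwise parse an ISO 8601 date."""
    if cell == null_token:
        return None
    return date.fromisoformat(cell)


# A person who is still alive has a birthdate but an empty deathdate cell.
row = {"birthdate": "1912-06-23", "deathdate": ""}
parsed = {name: parse_date_cell(value) for name, value in row.items()}
print(parsed)
```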
- 21:10:59 [bjdmeest]
- EricP: any can be solved on the application level
- 21:11:02 [AxelPolleres]
- scribe: bjdmeest
- 21:11:07 [ivan]
- q+ to a reference back to the locale issue for the minutes
- 21:11:13 [danbri]
- time til next break: 19 mins
- 21:11:19 [bjdmeest]
- EricP: technically, it will be a string
- 21:11:19 [laufer]
- q+
- 21:11:21 [JeniT]
- +1
- 21:11:27 [danbri]
- thanks new scribe, thanks old scribe!
- 21:12:03 [danbri]
- q+
- 21:12:03 [bjdmeest]
- EricP: for RDF: top-level is string
- 21:12:58 [danbri]
- q-
- 21:13:25 [bjdmeest]
- Jeni: might keep this as an issue...
- 21:13:46 [bjdmeest]
- laufer: semantics is in the application, or in the datatype?
- 21:13:58 [bjdmeest]
- ivan: data can come from different sources
- 21:14:24 [danbri]
- ack ivan
- 21:14:24 [Zakim]
- ivan, you wanted to a reference back to the locale issue for the minutes
- 21:14:24 [AxelPolleres]
- q?
- 21:14:27 [ivan]
- The reference to use for the date 'picture strings' is: http://www.unicode.org/reports/tr35/
- 21:14:40 [danbri]
- ack laufer
- 21:14:59 [danbri]
- next, "We invite comment on whether there should be types for formats like XML, HTML and markdown which may appear within CSV cells"
- 21:15:06 [bjdmeest]
- Jeni: issue 16 is support for other kind of stuff (xml, html, markdown)
- 21:15:12 [bjdmeest]
- ... how to handle those
- 21:15:20 [bjdmeest]
- ... simple strings? specific datatypes?
- 21:15:42 [bjdmeest]
- ... Markdown would be very useful
- 21:15:55 [danbri]
- q?
- 21:15:55 [bjdmeest]
- EricP: what if users can define types?
- 21:16:06 [JeniT]
- q+ to suggest using media types
- 21:16:11 [bjdmeest]
- ... to support different markdown flavors
- 21:16:12 [danbri]
- q+ to ask difference datatype / mediatypes (aka mimetypes)
- 21:16:16 [ivan]
- q+
- 21:16:27 [danbri]
- ack j
- 21:16:27 [Zakim]
- JeniT, you wanted to suggest using media types
- 21:16:29 [bjdmeest]
- Jeni: possibility:
- 21:16:45 [bjdmeest]
- ... specify media-type
- 21:17:13 [danbri]
- ack me
- 21:17:13 [Zakim]
- danbri, you wanted to ask difference datatype / mediatypes (aka mimetypes)
- 21:17:14 [bjdmeest]
- danbri: what's the difference between media-type and data-type?
- 21:17:27 [danbri]
- ack i
- 21:17:33 [bjdmeest]
- ivan: technical question:
- 21:17:45 [bjdmeest]
- ... not full xml, but only fragments, same for html
- 21:18:08 [bjdmeest]
- ... is that something to use a datatype?
- 21:19:09 [bjdmeest]
- ... maybe we have to define a datatype for markdown?
- 21:19:27 [danbri]
- (q: normative refs to markdown?)
- 21:19:33 [danbri]
- q?
- 21:19:35 [bjdmeest]
- ... don't standardize markdown, just add datatype
- 21:20:02 [bjdmeest]
- jenit: people can specify their own with a prefix
- 21:20:31 [bjdmeest]
- ... we cannot define a markdown datatype, as there is no spec
- 21:21:11 [bjdmeest]
- danbri: fragments are useful, we get hyperlinks
- 21:21:17 [bjdmeest]
- jenit: what about json in CSV?
- 21:21:47 [JeniT]
- PROPOSAL: We should add ‘xml’ and ‘html’ datatypes
- 21:21:49 [bjdmeest]
- jtandy: I have seen a lot of people add JSON in CSV
- 21:21:53 [danbri]
- q+
- 21:21:56 [JeniT]
- +1
- 21:22:04 [laufer]
- q+
- 21:22:05 [bjdmeest]
- gkellogg: it's quite common to add html table
- 21:22:06 [ivan]
- +1
- 21:22:08 [ericstephan]
- +1
- 21:22:09 [bill-ingram]
- +1
- 21:22:19 [bjdmeest]
- danbri: csv is not really specified strictly
- 21:22:37 [danbri]
- ack me
- 21:22:48 [gkellogg]
- +1
- 21:23:00 [bjdmeest]
- laufer: one may define one's own datatype?
- 21:23:14 [JeniT]
- PROPOSAL: We should add ‘xml’ and ‘html’ datatypes as defined in RDF
- 21:23:19 [bjdmeest]
- ... string can have its own datatype
- 21:23:21 [chunming]
- chunming has joined #csvw
- 21:23:29 [bjdmeest]
- jtandy: define it in your own namespace
- 21:23:42 [bjdmeest]
- laufer: json is not a qualified datatype?
- 21:23:51 [bjdmeest]
- jenit: its not on the list (yet)
- 21:23:58 [bjdmeest]
- s/its/it's/
- 21:24:13 [ivan]
- +1
- 21:24:19 [danbri]
- +1
- 21:24:25 [JeniT]
- RESOLVED: We will add ‘xml’ and ‘html’ datatypes as defined in RDF
- 21:24:36 [JeniT]
- PROPOSAL: We should add ‘json’ datatype with our own namespace
- 21:24:52 [bjdmeest]
- ivan: what is the official status of json?
- 21:25:01 [bjdmeest]
- jenit: there is an IETF and an ?? spec
- 21:25:08 [danbri]
- (ECMA spec)
- 21:25:11 [bjdmeest]
- ivan: is anything stable?
- 21:25:29 [gkellogg]
- Common Markdown Spec: http://spec.commonmark.org/0.6/
- 21:25:41 [bjdmeest]
- ericP: easier to have a stable spec for json than for markdown
- 21:25:54 [bjdmeest]
- gkellogg: there is a communicty spec for (common) markdown
- 21:26:12 [bjdmeest]
- s/communicty/community/
- 21:26:16 [ivan]
- +0
- 21:26:20 [bjdmeest]
- jenit: should we have a json datatype?
- 21:26:21 [danbri]
- +1
- 21:26:29 [ericstephan]
- +1
- 21:26:32 [jtandy]
- +1 ... I've seen it in the wild
- 21:26:32 [gkellogg]
- +1
- 21:26:34 [JeniT]
- +1
- 21:26:39 [bill-ingram]
- +1
- 21:26:54 [bjdmeest]
- ericP: implications might be big? at parsing...
- 21:27:15 [danbri]
- q?
- 21:27:21 [bjdmeest]
- ... signed up for row with value, possibly array, but with json... might explode
- 21:27:38 [bjdmeest]
- jenit: if there is embedded json: do we parse it?
- 21:27:39 [danbri]
- laufer, you're still on the queue. Was this a new question/topic?
- 21:27:53 [bjdmeest]
- gkellogg: what about json-ld? merge with graph in RDF serialization?
- 21:28:06 [bjdmeest]
- ivan: json inside json? how do processors react?
- 21:28:18 [bjdmeest]
- jenit: same with xml serialization
- 21:28:41 [danbri]
- q+
- 21:28:55 [bjdmeest]
- ericP: my guess: serialization: escape with quotes
- 21:28:56 [laufer]
- q-
- 21:29:27 [danbri]
- ack me
- 21:29:31 [bjdmeest]
- danbri: what is worse than 10 mb of json inside csv? 10 mb of anything inside csv
- 21:29:50 [bjdmeest]
- jenit: during mapping: json or xml or ... remain strings
- 21:29:58 [bjdmeest]
- ... possibly datatype string
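A small sketch of what "embedded JSON remains a string" could look like in practice, using Python's standard `csv` and `json` modules. The helper names and the sample row are made up for illustration. The CSV layer escapes the JSON by quoting the field and doubling its inner quotes; after reading, the cell is an ordinary string, and decoding it is left to whichever application wants to.

```python
import csv
import io
import json


def write_row(cells):
    """Serialise one row; the csv module quotes fields and doubles inner quotes."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow(cells)
    return buf.getvalue()


def read_row(line):
    """Parse one CSV line back into a list of string cells."""
    return next(csv.reader(io.StringIO(line)))


payload = {"lat": 1.5, "tags": ["a", "b"]}
line = write_row(["sensor-1", json.dumps(payload)])
print(line, end="")

cells = read_row(line)
# The mapping step sees a plain string; parsing is an application-level choice.
print(type(cells[1]).__name__)
print(json.loads(cells[1]) == payload)
```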