Data on the Web Best Practices Working Group Teleconference -- 27 Mar 2015

<trackbot> Date: 27 March 2015

<hadleybeeman> Hi all!

<annette_g> Hi!

<RiccardoAlbertoni> hi all!

<hadleybeeman> adler1, would you be willing to scribe?

<phila> scribe: adler1

<hadleybeeman> http://www.w3.org/2013/meeting/dwbp/2015-03-20

proposed: approve last meeting minutes

<ericstephan> +1

<yaso1> +1

<MTCarrasco> +1

<annette_g> +1

<newton> +1

<RiccardoAlbertoni> +1

<deirdrelee> +1

<phila> RESOLVED: approve last meeting minutes

<hadleybeeman> http://www.w3.org/2013/dwbp/track/issues/open

hadleybeeman: let's jump into open issues

<phila> issue-48?

<trackbot> issue-48 -- Phil to look at whether the ucr doc sufficiently covers code lists -- open

<trackbot> http://www.w3.org/2013/dwbp/track/issues/48

<phila> issue-48 resolved on the mailing list http://www.w3.org/2013/dwbp/track/issues/48

<phila> close issue-48

<trackbot> Closed issue-48.

<deirdrelee> formats: 67 examples: 56 policy: 74 technologies: 144

what is the general structure of the best practices document

<phila> issue-67?

<trackbot> issue-67 -- Should we include a best practice around which format to use? (csv, json, json-ld, xml, etc.) -- closed

<trackbot> http://www.w3.org/2013/dwbp/track/issues/67

<phila> issue-56?

<trackbot> issue-56 -- We need context and examples. do they go into the rec-track documents or into a separate note? -- closed

<trackbot> http://www.w3.org/2013/dwbp/track/issues/56

<phila> issue-74?

<trackbot> issue-74 -- Is it in scope to include mention of policy framework etc. as part of the non-normative discussion/editorialisation of the bp doc -- open

<trackbot> http://www.w3.org/2013/dwbp/track/issues/74

<phila> issue-144?

<trackbot> issue-144 -- There is a technological bias in several parts of the document -- open

<trackbot> http://www.w3.org/2013/dwbp/track/issues/144

<phila> issue-67?

<trackbot> issue-67 -- Should we include a best practice around which format to use? (csv, json, json-ld, xml, etc.) -- closed

<trackbot> http://www.w3.org/2013/dwbp/track/issues/67

<annette_g> link for 67? not on the list

bernadette: I was in São Paulo, and we worked through some issues and closed issue-67

<MTCarrasco> DSV (Delimiter-separated values)

<MTCarrasco> http://en.wikipedia.org/wiki/Delimiter-separated_values

phila: you can use tabs or commas to separate things
... the term everyone uses is CSV

<ericstephan> I agree that we should broaden the term, but the CSV on the Web WG uses "Tabular Data".

phila: when we say CSV we also mean tab and comma, but it may be more accurate to say delimiter-separated values

CSV is easier

no need to use terms people don't often use

<MTCarrasco> Fine to use CSV meaning DSV

<BartvanLeeuwen> +1 to confusion

<ericstephan> http://www.w3.org/TR/tabular-data-model/ Tabular data is data that is structured into rows, each of which contains information about some thing.

annette: we could build a glossary to explain formats

<MTCarrasco> http://en.wikipedia.org/wiki/Attribute%E2%80%93value_pair

<MTCarrasco> Key-value pairs

thomas: the question includes using key value pairs as a format

<ericstephan> adler1: There are so many other file types that are used in the open data

<BernadetteLoscio> +1 to Annette

annette: we should avoid key value pairs because it suggests many formats

deirdrelee: I agree with annette. For the main types of data we can stay with common formats, and in Ireland we can name common formats for additional file types
... there are other file types, and we should acknowledge those as well

<ericstephan> there are also many other open binary formats that might be useful as well

<ericstephan> +1 deirdrelee

<MTCarrasco> Formats: the most popular - textual: CSV, key-value, JSON, XML

<Zakim> phila, you wanted to talk about recommended file types

<annette_g> JSON = key-value

thomas: we can mention the most popular file types, csv, key value, xml, json

phila: RDF too

<MTCarrasco> Graphical: PNG

<phila> adler1: Makes the point about information needing to be human consumable in things like images, videos and docs etc.

ericstephan: the original discussions were about formats, and we should be explicit about file formats
... maybe we need a document talking about specific formats
... families of formats, like netcdf. we could go down the road describing all kinds of formats
... is there a bucket we could put all the file formats into

bernadette: maybe we can find some data formats and create a section in the document and list the most used data formats
... i think we can list the main data formats

deirdrelee: I am not going to disagree with Steve. There is an org in Ireland that does archiving for Ireland
... we focus on tabular data and the org focuses on other file types

<ericstephan> +1 deirdrelee maybe a "data catalog"?

deirdrelee: we could classify formats as open and machine-readable and make a matrix

JPEG is far from dead. as a photographer, I use it all the time

annette: a good idea to include image and video formats, but we should focus on a data format and reference to media files

<MTCarrasco> The examples must be specific and give examples with the most popular formats: textual, graphics ... at least one for each type

media files have EXIF metadata

thomas: we must use specific examples and most popular formats

<MTCarrasco> Also package formats

how to package the things

<MTCarrasco> http://joinup.ec.europa.eu/site/med/tem/

<hadleybeeman> +1 to phila

<MTCarrasco> Package format: http://dragoman.org/xdossier

<RiccardoAlbertoni> +1 to phil

phila: there is an intermediary between the data and a human being, and our remit stops at that intermediary

<hadleybeeman> We have to help people publish good, clean, usable data so that developers can make good visualisations, displays, conclusions, etc from it.

<ericstephan> scientists also have nasty habits of hanging onto formats that might be considered obsolete by the web publishing community. non-binary formats are used to be a bridge to those formats

<phila> Share-PSI http://www.w3.org/2013/share-psi/bp/hls/

<MTCarrasco> The formats must be human and machine readable -

the use of media files is well documented in our use cases

<CarlosIglesias> multimedia galleries is a quite usual use case for open government data as well

<deirdrelee> +1

phila: disagrees with adler1 and thinks media files are out of scope for the group

<MTCarrasco> "Package or Perish"

bart: how much impact does a specific file format have on the best practices?

<hadleybeeman> @mtcarrasco: I don't know much about packaging. When would someone want to do it? What are the use cases?

<ericstephan> good point BartvanLeeuwen

<MTCarrasco> Not mentioned formats are not excluded - but we have to be specific

<ericstephan> of course if you note my comment on the sciences this isn't true

<phila> adler1: CSV is nearly 40 years old as an export format from Lotus

<phila> adler1: JPG is used everywhere for images

<hadleybeeman> steve is brilliantly articulating the reason for the charter for the CSV on the Web working group :)

<ericstephan> shapefiles geospatial...

<phila> ... AVI, MP3 etc are all widely used. Open data is new

<phila> ... we don't have to catalogue all file formats but we do need to look at our use cases and the file formats that they refer to

<phila> ... I like the idea of a table of file formats that may be used and what they can be used for.

<phila> adler1: Some files have their own metadata file (EXIF etc.)

<phila> ... and we can talk about that.

<MTCarrasco> The Art of Unix Programming - Textuality - http://www.catb.org/esr/writings/taoup/html/textualitychapter.html

<phila> adler1: I think it behoves us to create a glossary of these and where our standards fit

deirdrelee: there are an infinite number of file types, and a lot are specific to systems, and we care about data interoperability

<ericstephan> Should we ONLY reference file formats that have a syntactic online definition we can reference?

deirdrelee: we focus on RDF and CSV, and they are fit for purpose

<ericstephan> E.g. IETF, W3C links to formats

deirdrelee: our document is a snapshot of what we know and does not exclude new formats

annette: I think we are talking about two different things, data vs file format
... maybe we need to split that into two different things

thomas: we must create a specific example with specific formats
... the example must be human and machine-readable

<CarlosIglesias> don't think all formats need to be human and machine readable

<CarlosIglesias> something nice to have sometimes

hadley: we have a lot of different topics in this discussion and we should focus on data on the web

<CarlosIglesias> but not others

<CarlosIglesias> even counter-productive in some cases

<MTCarrasco> terminology: data model vs. data model - http://dragoman.org/format

JPEG is machine readable

<MTCarrasco> terminology: data model vs. data format - http://dragoman.org/format

<phila> issue-67?

<trackbot> issue-67 -- Should we include a best practice around which format to use? (csv, json, json-ld, xml, etc.) -- closed

<trackbot> http://www.w3.org/2013/dwbp/track/issues/67

open a new issue

what file formats should we list in our glossary that are human and machine readable

<annette_g> should we add a BP about file formats (as opposed to data formats)?

<annette_g> +1 to close

<newton> Maybe we should create one action to someone to provide examples for the BP. Those examples showing the possible formats that could be used, showing json, xml, csv

<MTCarrasco> Format is central to data

<annette_g> http://www.w3.org/TR/dwbp/#dataFormats

<phila> adler1: I agree that the issue is closed. But today we're talking about something else. We're looking at discovery, quality etc. And that can include data in any file format

<phila> ... these BPs could apply to multiple file types that are both human and machine readable, and that's the criteria for inclusion in the glossary

<MTCarrasco> Define data types

annette: file formats are jpeg and data file formats are csv

<hadleybeeman> +1 to annette

annette: ASCII is the file format and CSV is the structure

<Zakim> phila, you wanted to talk about using PDF consistently with our BPs

<phila> Is the requirement specifically relevant to data published on the Web?

<phila> Does the requirement encourage reuse or publication of data on the Web?

<phila> Is the requirement testable?

<CarlosIglesias> jpeg has also its own structure

<CarlosIglesias> why it is not a "data format"?

phila: if you publish data in pdf you are publishing information for people, not machines, it works against re-use

<hadleybeeman> @CarlosIglesias, how can jpegs be reused? :)

photos are machine searchable today for faces, buildings, and text

<hadleybeeman> (or rather, how can the data in jpegs be reused)

<CarlosIglesias> for multimedia content

<yaso1> Agree with phil, maybe put metadata on the image, on the video, on the gif, but that's all

<CarlosIglesias> through the associated metadata

<ericstephan> images are analyzed for electron microscopy; new instruments are producing 100,000 images a second that need to be analyzed by machines.

<CarlosIglesias> in fact they are one of the most reused formats

<CarlosIglesias> you can reuse the full image, not necessary just some bits

<MTCarrasco> Define: data model, file format, data format, data type

<annette_g> we can define terms in the intro, too

<MTCarrasco> file format and data format: the same

antoine: in some of the use cases, DQ mentions quality of type and features

<ericstephan> It's a really interesting discussion: publishing for humans/machines

<hadleybeeman> +1 to ericstephan

antoine: one idea to sort out the issue of different file types is to consider DQ and whether it makes sense for these files

indeed it is critically important to validate DQ in images and video which are constantly abused

<phila> I like that antoine

<ericstephan> BernadetteLoscio: perhaps this is a data usage vocab item?

<MTCarrasco> data quality: format quality and content quality

I could send you all terabytes of fraudulent images and video

<BernadetteLoscio> https://www.w3.org/2013/dwbp/wiki/Glossary

<MTCarrasco> terms: +1

80% of all internet traffic is media

<hadleybeeman> me, bye antoine!

<deirdrelee> +1. A glossary will address all these issues: 52, 59, 68, 80, 82, 133

thank you

<Caroline_> bye!

bank holiday next week

cancelling call next week

<Caroline_> it will be a holiday in Brazil

<phila> Easter Friday is a holiday in many countries so no call next week

<Caroline_> ok

<AdrianoC-UFMG> Thanks, all!

<ericstephan> yeah for sleeping in!

<ericstephan> We need a F2F agenda still?

<yaso1> Bye! Happy easter and eat chocolates!

<CarlosIglesias> bye!

thanks everyone

<MTCarrasco> bye

<RiccardoAlbertoni> bye

<annette_g> bye

<phila> trackbot end meeting
