<trackbot> Date: 27 March 2015
<hadleybeeman> Hi all!
<annette_g> Hi!
<RiccardoAlbertoni> hi all!
<hadleybeeman> adler1, would you be willing to scribe?
<phila> scribe: adler1
<hadleybeeman> http://www.w3.org/2013/meeting/dwbp/2015-03-20
proposed: approve last meeting minutes
<ericstephan> +1
<yaso1> +1
<MTCarrasco> +1
<annette_g> +1
<newton> +1
<RiccardoAlbertoni> +1
<deirdrelee> +1
<phila> RESOLVED: approve last meeting minutes
<hadleybeeman> http://www.w3.org/2013/dwbp/track/issues/open
hadleybeeman: let's jump into open issues
<phila> issue-48?
<trackbot> issue-48 -- Phil to look at whether the ucr doc sufficiently covers code lists -- open
<trackbot> http://www.w3.org/2013/dwbp/track/issues/48
<phila> issue-48 resolved on the mailing list http://www.w3.org/2013/dwbp/track/issues/48
<phila> close issue-48
<trackbot> Closed issue-48.
<deirdrelee> formats: 67 examples: 56 policy: 74 technologies: 144
what is the general structure of the best practices document
<phila> issue-67?
<trackbot> issue-67 -- Should we include a best practice around which format to use? (csv, json, json-ld, xml, etc.) -- closed
<trackbot> http://www.w3.org/2013/dwbp/track/issues/67
<phila> issue-56?
<trackbot> issue-56 -- We need context and examples. do they go into the rec-track documents or into a separate note? -- closed
<trackbot> http://www.w3.org/2013/dwbp/track/issues/56
<phila> issue-74?
<trackbot> issue-74 -- Is it in scope to include mention of policy framework etc. as part of the non-normative discussion/editorialisation of the bp doc -- open
<trackbot> http://www.w3.org/2013/dwbp/track/issues/74
<phila> issue-144?
<trackbot> issue-144 -- There is a technological bias in several parts of the document -- open
<trackbot> http://www.w3.org/2013/dwbp/track/issues/144
<phila> issue-67?
<trackbot> issue-67 -- Should we include a best practice around which format to use? (csv, json, json-ld, xml, etc.) -- closed
<trackbot> http://www.w3.org/2013/dwbp/track/issues/67
<annette_g> link for 67? not on the list
bernadette: I was in São Paulo, and we worked through some issues and closed issue-67
<MTCarrasco> DSV (Delimiter-separated values)
<MTCarrasco> http://en.wikipedia.org/wiki/Delimiter-separated_values
phila: you can use tabs or commas to separate things
... the term everyone uses is CSV
<ericstephan> I agree that we should broaden the term, but the CSV on the Web WG uses "Tabular Data".
phila: when we say CSV we also mean tab and comma, but it may be more accurate to say delimiter-separated values
CSV is easier
no need to use terms people don't often use
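A minimal sketch of the point about tabs and commas, assuming Python's standard csv module: the same tabular content can be parsed whether the delimiter is a comma or a tab. The helper name read_delimited and the sample rows are hypothetical, made up for illustration only.

```python
import csv
import io


def read_delimited(text, delimiter):
    """Parse delimiter-separated text into a list of rows (hypothetical helper)."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))


# Illustrative sample data, not taken from the meeting.
comma_text = "name,country\nAda,UK\n"
tab_text = "name\tcountry\nAda\tUK\n"

comma_rows = read_delimited(comma_text, ",")
tab_rows = read_delimited(tab_text, "\t")

# Both inputs describe the same table; only the delimiter differs.
print(comma_rows == tab_rows)
```

Only the delimiter argument changes between the two calls, which is one reason "CSV" is often used loosely for the whole delimiter-separated family.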
<MTCarrasco> Fine to use CSV meaning DSV
<BartvanLeeuwen> +1 to confusion
<ericstephan> http://www.w3.org/TR/tabular-data-model/ Tabular data is data that is structured into rows, each of which contains information about some thing.
annette: we could build a glossary to explain formats
<MTCarrasco> http://en.wikipedia.org/wiki/Attribute%E2%80%93value_pair
<MTCarrasco> Key-value pairs
thomas: the question includes using key value pairs as a format
<ericstephan> adler1: There are so many other file types that are used in the open data
<BernadetteLoscio> +1 to Annette
annette: we should avoid key value pairs because it suggests many formats
deirdrelee: I agree with annette; for the main types of data we can stay with common formats, and in Ireland we can name common formats for additional filetypes
... there are other filetypes, and we should acknowledge those as well
<ericstephan> there are also many other open binary formats that might be useful as well
<ericstephan> +1 deirdrelee
<MTCarrasco> Formats: the most populars - textual: CSV, key-value, JSON, XML
<Zakim> phila, you wanted to talk about recommended file types
<annette_g> JSON = key-value
thomas: we can mention the most popular file types, csv, key value, xml, json
phila: RDF too
<MTCarrasco> Graphical: PNG
<phila> adler1: Makes the point about information needing to be human consumable in things like images, videos and docs etc.
ericstephan: there were original discussions about formats, and we should be explicit about file formats
... maybe we need a document talking about specific formats
... families of formats, like NetCDF. We could go down the road of describing all kinds of formats
... is there a bucket we could put all the file formats into?
bernadette: maybe we can find some data formats and create a section in the document and list the most used data formats
... I think we can list the main data formats
deirdrelee: I am not going to disagree with steve. There is an org in Ireland that does archiving for Ireland
... we focus on tabular and the org focuses on other file types
<ericstephan> +1 deirdrelee maybe a "data catalog"?
deirdrelee: we could classify formats as open and machine-readable and make a matrix
JPEG is far from dead. as a photographer, I use it all the time
annette: a good idea to include image and video formats, but we should focus on a data format and reference to media files
<MTCarrasco> The examples must be specific and give examples with the most popular formats: textual, graphics ... at least one for each type
media files have EXIF metadata
thomas: we must use specific examples and most popular formats
<MTCarrasco> Also package formats
how to package the things
<MTCarrasco> http://joinup.ec.europa.eu/site/med/tem/
<hadleybeeman> +1 to phila
<MTCarrasco> Package format: http://dragoman.org/xdossier
<RiccardoAlbertoni> +1 to phil
phila: there is an intermediary between the data and a human being, and our remit stops at that intermediary
<hadleybeeman> We have to help people publish good, clean, usable data so that developers can make good visualisations, displays, conclusions, etc from it.
<ericstephan> scientists also have nasty habits of hanging onto formats that might be considered obsolete by the web publishing community. non-binary formats are used to be a bridge to those formats
<phila> Share-PSI http://www.w3.org/2013/share-psi/bp/hls/
<MTCarrasco> The formats must be human and machine readable -
the use of media files is well documented in our use cases
<CarlosIglesias> multimedia galleries is a quite usual use case for open government data as well
<deirdrelee> +1
phila: disagrees with adler1 and thinks media files are out of scope for the group
<MTCarrasco> "Package or Perish"
bart: how much impact does a specific file format have on the best practices?
<hadleybeeman> @mtcarrasco: I don't know much about packaging. When would someone want to do it? What are the use cases?
<ericstephan> good point BartvanLeeuwen
<MTCarrasco> Not mentioned formats are not excluded - but we have to be specific
<ericstephan> of course if you note my comment on the sciences this isn't true
<phila> adler1: CSV is nearly 40 years old as an export format from Lotus
<phila> adler1: JPG is used everywhere for images
<hadleybeeman> steve is brilliantly articulating the reason for the charter for the CSV on the Web working group :)
<ericstephan> shapefiles geospatial...
<phila> ... AVI, MP3 etc are all widely used. Open data is new
<phila> ... we don't have to catalogue all file formats but we do need to look at our use cases and the file formats that they refer to
<phila> ... I like the idea of a table of file formats that may be used and what they can be used for.
<phila> adler1: Some files have their own metadata file (EXIF etc.)
<phila> ... and we can talk about that.
<MTCarrasco> The Art of Unix Programming - Textuality - http://www.catb.org/esr/writings/taoup/html/textualitychapter.html
<phila> adler1: I think it behoves us to create a glossary of these and where our standards fit
deirdrelee: there are an infinite number of file types, and a lot are specific to systems, and we care about data interoperability
<ericstephan> Should we ONLY reference file formats that have a syntactic online definition we can reference?
deirdrelee: we focus on RDF, CSV and they are fit for purpose
<ericstephan> E.g. IETF, W3C links to formats
deirdrelee: our document is a snapshot of what we know and does not exclude new formats
annette: I think we are talking about two different things, data vs file format
... maybe we need to split that into two different things
thomas: we must create a specific example with specific formats
... the example must be human and machine-readable
<CarlosIglesias> don't think all formats need to be human and machine readable
<CarlosIglesias> something nice to have sometimes
hadley: we have a lot of different topics in this discussion and we should focus on data on the web
<CarlosIglesias> but not others
<CarlosIglesias> even counter-productive in some cases
<MTCarrasco> terminology: data model vs. data model - http://dragoman.org/format
JPEG is machine readable
<MTCarrasco> terminology: data model vs. data format - http://dragoman.org/format
<phila> issue-67?
<trackbot> issue-67 -- Should we include a best practice around which format to use? (csv, json, json-ld, xml, etc.) -- closed
<trackbot> http://www.w3.org/2013/dwbp/track/issues/67
open a new issue
what file formats should we list in our glossary that are human and machine readable
<annette_g> should we add a BP about file formats (as opposed to data formats)?
<annette_g> +1 to close
<newton> Maybe we should create one action to someone to provide examples for the BP. Those examples showing the possible formats that could be used, showing json, xml, csv
<MTCarrasco> Format is central to data
<annette_g> http://www.w3.org/TR/dwbp/#dataFormats
<phila> adler1: I agree that the issue is closed. But today we're talking about something else. We're looking at discovery, quality etc. And that can include data in any file format
<phila> ... these BPs could apply to multiple file types that are both human and machine readable, and that's the criteria for inclusion in the glossary
<MTCarrasco> Define data types
annette: file formats are jpeg and data file formats are csv
<hadleybeeman> +1 to annette
annette: ASCII is the file format and CSV is the structure
<Zakim> phila, you wanted to talk about using PDF consistently with our BPs
<phila> Is the requirement specifically relevant to data published on the Web?
<phila> Does the requirement encourage reuse or publication of data on the Web?
<phila> Is the requirement testable?
<CarlosIglesias> jpeg has also its own structure
<CarlosIglesias> why it is not a "data format"?
phila: if you publish data in pdf you are publishing information for people, not machines, it works against re-use
<hadleybeeman> @CarlosIglesias, how can jpegs be reused? :)
photos are machine searchable today for faces, buildings, and text
<hadleybeeman> (or rather, how can the data in jpegs be reused)
<CarlosIglesias> for multimedia content
<yaso1> Agree with phil, maybe put metadata on the image, on the video, on the gif, but that's all
<CarlosIglesias> through the associated metadata
<ericstephan> images are analyzed for electron microscopy , new instruments are producing 100,000 images a second that need to be analyzed by machines.
<CarlosIglesias> in fact they are one of the most reused formats
<CarlosIglesias> you can reuse the full image, not necessary just some bits
<MTCarrasco> Define: data model, file format, data format, data type
<annette_g> we can define terms in the intro, too
<MTCarrasco> file format and data format: the same
antoine: in some of the use cases, DQ mentions quality of type and features
<ericstephan> Its a really interesting discussion publishing for humans/machines
<hadleybeeman> +1 to ericstephan
antoine: idea to sort the issue of different file types to consider DQ and if it makes sense for these files
indeed it is critically important to validate DQ in images and video which are constantly abused
<phila> I like that antoine
<ericstephan> BernadetteLoscio: perhaps this is a data usage vocab item?
<MTCarrasco> data quality: format quality and content quality
I could send you all terabytes of fraudulent images and video
<BernadetteLoscio> https://www.w3.org/2013/dwbp/wiki/Glossary
<MTCarrasco> terms: +1
80% of all internet traffic is media
<hadleybeeman> me, bye antoine!
<deirdrelee> +1. A glossary will address all these issues: 52, 59, 68, 80, 82, 133
+1
thank you
<Caroline_> bye!
bank holiday next week
cancelling call next week
<Caroline_> it will be a holiday in Brazil
<phila> Easter Friday is a holiday in many countries so no call next week
<Caroline_> ok
<AdrianoC-UFMG> Thanks, all!
<ericstephan> yeah for sleeping in!
<ericstephan> We need a F2F agenda still?
<yaso1> Bye! Happy easter and eat chocolates!
<CarlosIglesias> bye!
thanks everyone
<MTCarrasco> bye
<RiccardoAlbertoni> bye
<annette_g> bye
<phila> trackbot end meeting
Present: BartvanLeeuwen, phila, MTCarrasco, annette_g, HadleyBeeman, ericstephan, deirdrelee, RiccardoAlbertoni, antoine, adler1, newton, BernadetteLoscio, Caroline_, CarlosIglesias
Agenda: https://www.w3.org/2013/dwbp/wiki/Meetings:Telecon20150327
Date: 27 Mar 2015
Minutes URL: http://www.w3.org/2015/03/27-dwbp-minutes.html