W3C

– DRAFT –
DXWG DCAT subgroup teleconference 28 June 2018

28 June 2018

Meeting Minutes

Confirm agenda

+1 for agenda

Approve minutes from last meeting

<riccardoAlbertoni> +1 to agenda

0 not there

<SimonCox> https://www.w3.org/2018/06/21-dxwgdcat-minutes

<SimonCox> 0 not there

<PWinstanley> +1

<riccardoAlbertoni> 0 ( i was not there)

<Jaroslav_Pullmann> 0 (absent)

Resolved: minutes approved

Mailing list questions

<SimonCox> https://lists.w3.org/Archives/Public/public-dxwg-comments/2018Apr/0001.html

<riccardoAlbertoni> +1 to have an issue

Action: SimonCox to acknowledge comment

<trackbot> Sorry, but no Tracker is associated with this channel.

Action: Simon to create issue from mailing list comment

Catalogues in which dataset is a bag of files

<SimonCox> https://github.com/w3c/dxwg/issues/256

SimonCox: Do we need the plenary to vote on this?

PWinstanley: Push it through here and get the nod from the main group (UCR).
… should get top of the agenda next week
… for the main meeting
… this is a special case, we won't need new use cases in many instances

How to express distributions provided as compressed files

<SimonCox> https://github.com/w3c/dxwg/issues/259

<SimonCox> arminhaller: from API pov it is just the media-type that matters

Jaroslav_Pullmann: There was a suggestion to indicate the original media type, i.e. in the distribution metadata we should allow the media type description

PWinstanley: How deep do we go: MIME types, encodings? This can become a rabbit hole

Jaroslav_Pullmann: we have packaging like tar, compression like zip

PWinstanley: We should only be concerned with the compression type, not the content

+1 to PWinstanley

SimonCox: we should give the user more information than the Web architecture provides us

<Zakim> AndreaPerego, you wanted to ask if Makx's comment on GH could be relevant here: https://github.com/w3c/dxwg/issues/54#issuecomment-359062055

AndreaPerego: for packaged distributions, at least compressed ones, you can use the +zip suffix to indicate the included MIME type

SimonCox: In the past we encountered similar problems with GML, with one level of compression

<riccardoAlbertoni> +1 to consider also the type of resource considered ( ex. Gml in the simon's example)

SimonCox: the risk is that you have potentially infinite levels of nesting

<PWinstanley> +1 to arminhaller and the avoidance of rabbit holes

arminhaller: we should support +zip suffix, but not arbitrary levels of file hierarchies that may be contained in a packaged file or compressed file
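
As an illustration of the +zip suffix convention discussed here, the following is a minimal Python sketch that splits a media type into its base type and its suffix. The function name split_media_type is hypothetical, and a value such as text/csv+zip is only the pattern proposed in this discussion, not necessarily a registered media type.

```python
def split_media_type(media_type):
    """Split 'type/subtype+suffix' into (base type, suffix or None)."""
    # Drop any parameters such as '; charset=utf-8' and normalise case.
    essence = media_type.split(";", 1)[0].strip().lower()
    type_, _, subtype = essence.partition("/")
    # The suffix is whatever follows the last '+' in the subtype.
    base, plus, suffix = subtype.rpartition("+")
    if not plus:
        # No suffix present: the media type describes the content directly.
        return essence, None
    return f"{type_}/{base}", suffix


# Hypothetical values following the pattern suggested in the discussion.
print(split_media_type("text/csv+zip"))  # ('text/csv', 'zip')
print(split_media_type("text/csv"))      # ('text/csv', None)
```

Only one level of suffix is handled, which matches the position that arbitrary levels of nesting should not be supported.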

Jaroslav_Pullmann: there might be recursive structures, but that is normally not the case
… it is important to let the user know what is in the compressed format

AndreaPerego: I want to report how we deal with this issue: we ignore the fact that a packaged distribution is compressed.
… people like to know what is inside, shape, CSV or whatever
… we want to say what is the primary format
… there was also a comment from David Reed. I don't care about the compression. The software can uncompress on the fly. There is no need to tell the machine the compression format.

<Zakim> SimonCox, you wanted to note connection with previous agenda topic

<riccardoAlbertoni> I agree on the fact that the compression is not the interesting thing the interesting thing is what is compressed

SimonCox: Do we understand what the requirement is, as riccardoAlbertoni said?
… the fact that there may be multiple files within overlaps with the previous issue that we did not discuss
… we should record an agreement that it is important to know what is within the archive

<Jaroslav_Pullmann> +1 for focusing on purpose of Distribution metadata (indicating bare content type)

<SimonCox> Proposed: We agree that the content of an archive distribution (i.e. what is inside a zip or tar file) is important for the users of a Catalogue and should be part of the description

<SimonCox> +1

<Jaroslav_Pullmann> +1

<riccardoAlbertoni> +1

<AndreaPerego> +1 (we also do that in the JRC Data Catalogue)

+1 (just using the +zip suffix if it is a compressed file)

<SimonCox> ... in the dcat:mediaType property

<PWinstanley> I've a question about the degree of requirement of this, because there is an element of agency involved here - sometimes people keeping the collection (e.g. in CKAN etc) may be given a compressed file but might not have all the information / understanding of the contents. So the information is desirable, but the resolution uses 'should'

Resolved: We agree that the content of an archive distribution (i.e. what is inside a zip or tar file) is important for the users of a Catalogue and should be part of the description

Jaroslav_Pullmann: We are not considering nesting of content
… flat content that is optimised through compression
… metadata is there for a purpose: for the agent to know what is inside
… automated agents should know about the surface format

<riccardoAlbertoni> +1 to Jaroslav_Pullmann (otherwise, if we are not talking about "flat" file we need an extra use case/requirement to consider)

Jaroslav_Pullmann: I am advocating that we have both in

SimonCox: what about an archive with mixed file formats in it?

Jaroslav_Pullmann: Then we talk about different data

<AndreaPerego> I think it's too strong to say it's different data.

arminhaller: What about a compressed file that contains ttl, n3 and rdf/xml files that are all equivalent, semantically? I have done that before.

<AndreaPerego> +1 to arminhaller. Same with CSV, TSV, and spreadsheet formats.

AndreaPerego: In the geospatial community it is common to have a shape file with additional files like manifests included in there.

<riccardoAlbertoni> +1 AndreaPerego

AndreaPerego: for standard nested formats we don't need to do anything
… if nesting is done in an arbitrary way, a readme file within the structure should be used

<SimonCox> This topic is also related to https://github.com/w3c/dxwg/issues/256 and https://github.com/w3c/dxwg/issues/81

AndreaPerego: the fact that they use a zip bundle is deliberate, because they intentionally want to help users

<riccardoAlbertoni> perhaps we should add an extra use case/requirement about the andrea's bundle ..

AndreaPerego: here I find it difficult to use metadata to describe the content, unless we use a sitemap

<SimonCox> where it is well-known bundle structure, then is this handled with dct:conformsTo ?

<SimonCox> +1 to riccardoAlbertoni !

PWinstanley: In highly structured content, they give you a context.xml file with a common pattern that only deals with the current level in relation to the parent level

SimonCox: I wonder if this is already covered with the conformsTo property

+1 on two separate use cases

riccardoAlbertoni: just to reiterate we need two use cases, one flat file use case and one for bundled distributions
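
To illustrate the distinction between the two use cases, here is a minimal Python sketch, assuming the distribution is a zip archive and that "flat" means the archive holds exactly one file. The function names describe_archive and make_zip and the member file names are hypothetical and serve only as an example.

```python
import io
import zipfile


def describe_archive(data):
    """Return ('flat' or 'bundle', sorted member names) for zip bytes."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        # Ignore directory entries; only real files count as content.
        names = sorted(
            info.filename for info in archive.infolist() if not info.is_dir()
        )
    # Assumption: exactly one file is the flat case, anything else a bundle.
    kind = "flat" if len(names) == 1 else "bundle"
    return kind, names


def make_zip(members):
    """Helper for the example: build a zip in memory from {name: text}."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, text in members.items():
            archive.writestr(name, text)
    return buffer.getvalue()


# A single compressed CSV file versus a shapefile-style bundle with a readme.
print(describe_archive(make_zip({"data.csv": "a,b\n1,2\n"})))
print(describe_archive(make_zip({"roads.shp": "", "roads.dbf": "", "README.txt": ""})))
```

In the flat case a single media type can describe the content; in the bundled case the member list would have to be described in some other way, such as the readme file or conformsTo approaches mentioned above.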

SimonCox: +1 to Jakubklimek's contributions on Github

Action: SimonCox to add some notes into https://github.com/w3c/dxwg/issues/259 about our discussion

No feedback on DCAT FPWD

AndreaPerego: We are not getting any feedback on DCAT
… is there a conference where we can get feedback?

SimonCox: I presented DCAT at several conferences recently, including one in Melbourne organised by my organisation, CSIRO.
… another one with ANDS, the data service provider in Australia
… I can trigger responses from those users

Action: SimonCox to trigger feedback from ANDS

<riccardoAlbertoni> thanks enjoy the rest of week!

bye

<Jaroslav_Pullmann> bye!

<PWinstanley> bye

Summary of Action Items

  1. SimonCox to acknowledge comment
  2. Simon to create issue from mailing list comment
  3. SimonCox to add some notes into https://github.com/w3c/dxwg/issues/259 about our discussion
  4. SimonCox to trigger feedback from ANDS

Summary of Resolutions

  1. minutes approved
  2. We agree that the content of an archive distribution (i.e. what is inside a zip or tar file) is important for the users of a Catalogue and should be part of the description
Minutes formatted by Bert Bos's scribe.perl version 2.41 (2018/03/23 13:13:49), a reimplementation of David Booth's scribe.perl. See CVS log.
