W3C

– DRAFT –
DXWG DCAT Subgroup

28 October 2020

Attendees

Present
alejandra, AndreaPerego, PWinstanley, riccardoAlbertoni_
Regrets
-
Chair
riccardoAlbertoni_
Scribe
alejandra, AndreaPerego

Meeting minutes

<riccardoAlbertoni_> PROPOSED: approve last meeting minutes https://www.w3.org/2020/10/14-dxwgdcat-minutes

approve last meeting's minutes

<AndreaPerego> +1

+0 (was not present)

<riccardoAlbertoni_> +1

Resolution: approve last meeting minutes https://www.w3.org/2020/10/14-dxwgdcat-minutes

approve agenda

<AndreaPerego> +1

<riccardoAlbertoni_> +1

+1

Dataset Series

riccardoAlbertoni_: we prepared some material about dataset series

<riccardoAlbertoni_> https://github.com/w3c/dxwg/wiki/Examples-on-dataset-series

<AndreaPerego> Draft section in ED: https://raw.githack.com/w3c/dxwg/dcat-dataset-series/dcat/index.html#dataset-series

riccardoAlbertoni_: I'd like to start discussing the PR prepared by Andrea

AndreaPerego: put together what was discussed so far, trying to structure some consistent ideas on how to specify dataset series
… I organised the section into an introduction and two subsections
… notion of dataset series, same issue as with versions
… it is community specific, and the term is not used consistently
… in the statistics domain it is formally defined in a very strict way
… set of dimensions where the values are always the same
… in other cases it is not so strict
… group according to space or temporal dimensions and you may need to include some additional information as part of the dataset that is part of the series
… it is a very heterogeneous situation, so we are not trying to define dataset series but to explain how to represent them using DCAT
… considered use cases that we got from people
… consensus of using dct:isPartOf dct:hasPart
… for datasets released yearly, reusing what we did for the versioning section
… last section on some discussion not developed completely
… dataset with some characteristics, spatial / temporal coverage, spatial / temporal resolution
… how this should be reflected in the metadata describing the series
… discussion about this topic and idea of inheritance principle
… child dataset should inherit what was defined at the series level
… upstream inheritance might make more sense
… the reason is that, for a dataset series with data from different years, the temporal coverage of each dataset is a specific year, but the temporal coverage of the series is all the years of the series
… similarly with spatial coverage
… also temporal / spatial resolution
… how to deal with annotation properties related to the publication process
… what should be the last update date, update frequency
… these aspects are addressed in the last part of the proposal
… the idea is that the creation date of the series should be when the first child dataset has been created, earliest date
… when talking about last update, the series will have the date of the last child
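
As a rough illustration of the upstream inheritance idea described above, the following Python sketch derives series-level values from the child datasets. The `Child` class, its field names, and the sample data are assumptions made for illustration and are not taken from the draft section. "Date of the last child" is read here as the most recent update among the children, which is only one possible reading of the proposal.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Child:
    """Hypothetical stand-in for a child dataset record (linked via dct:isPartOf)."""
    title: str
    coverage_start: date  # start of the child's temporal coverage
    coverage_end: date    # end of the child's temporal coverage
    issued: date          # creation date of the child
    modified: date        # last update date of the child


def series_metadata(children):
    """Derive series-level values from the children (upstream inheritance)."""
    if not children:
        raise ValueError("at least one child is needed to derive series values")
    return {
        # dct:hasPart side of the relation
        "hasPart": [c.title for c in children],
        # series coverage spans the coverage of all children
        "coverage_start": min(c.coverage_start for c in children),
        "coverage_end": max(c.coverage_end for c in children),
        # creation date of the series: earliest child creation date
        "issued": min(c.issued for c in children),
        # last update of the series: most recent child update (assumed reading)
        "modified": max(c.modified for c in children),
    }


# Sample data, invented for this sketch.
children = [
    Child("budget-2019", date(2019, 1, 1), date(2019, 12, 31),
          date(2020, 3, 1), date(2020, 3, 1)),
    Child("budget-2020", date(2020, 1, 1), date(2020, 12, 31),
          date(2021, 3, 1), date(2021, 4, 15)),
]
series = series_metadata(children)
```

With this sample data, the series coverage runs from the start of 2019 to the end of 2020, even though each child covers a single year.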

AndreaPerego: final point is about update frequency, the proposal is that it should correspond to the most frequently updated child dataset
… the proposal was a way to raise these issues for discussion
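
The update frequency rule proposed above could be sketched as follows. The frequency labels and their approximate intervals in days are assumptions for illustration only; a real catalogue would presumably use a controlled vocabulary, which the minutes do not specify.

```python
# Hypothetical frequency labels mapped to an approximate interval in days.
INTERVAL_DAYS = {
    "daily": 1,
    "weekly": 7,
    "monthly": 30,
    "quarterly": 91,
    "annual": 365,
}


def series_update_frequency(child_frequencies):
    """Return the frequency of the most frequently updated child, or None."""
    if not child_frequencies:
        return None
    # The shortest interval corresponds to the most frequent updates.
    return min(child_frequencies, key=lambda f: INTERVAL_DAYS[f])


print(series_update_frequency(["annual", "monthly", "quarterly"]))  # monthly
```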

PWinstanley: all of what you said makes sense
… when using dcat:Distribution, can we have distribution of distributions
… I wonder if we should have something that specifically characterises datasets that are series
… we've included hasPart but there might be a situation when we created a dataset and the hasPart has not been minted yet but we know it is going to be a series

AndreaPerego: about the last point, I forgot to mention that we are discussing if we should create a new type or use a soft type approach
… for the time being, while we discuss it, I used the soft typing approach
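
A minimal sketch of the soft typing approach, assuming it is expressed with `dct:type` on an ordinary `dcat:Dataset`. The `example.org` URIs, including the series type URI, are hypothetical and are not the ones used in the draft. The point is that a dataset can be flagged as a series even before any `dct:hasPart` link exists.

```python
DCT = "http://purl.org/dc/terms/"
DCAT = "http://www.w3.org/ns/dcat#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical URIs, for illustration only.
SERIES = "https://example.org/dataset/budget"
SERIES_TYPE = "https://example.org/type/series"

# Triples as plain (subject, predicate, object) tuples.
triples = {
    (SERIES, RDF_TYPE, DCAT + "Dataset"),  # still an ordinary dcat:Dataset
    (SERIES, DCT + "type", SERIES_TYPE),   # soft typing: flagged as a series
}


def is_series(graph, subject):
    """True if the subject is soft-typed as a series, with or without children."""
    return (subject, DCT + "type", SERIES_TYPE) in graph


def children_of(graph, subject):
    """Child datasets linked with dct:hasPart (empty for a newly created series)."""
    return sorted(o for s, p, o in graph if s == subject and p == DCT + "hasPart")
```

Here `is_series(triples, SERIES)` holds while `children_of(triples, SERIES)` is still empty, which matches the case raised by PWinstanley of a series whose parts have not been minted yet.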

AndreaPerego: for the distribution, practical issues
… the way people are using it
… most frequently, catalogue patterns do not support the notion of dataset series

PWinstanley: if I have a budget dataset and I have distributions (csv, and other format), those are distributions
… having it in there, acknowledges them

AndreaPerego: we have an example by Jakub about many datasets
… other examples for geospatial data from the INSPIRE infrastructure
… each satellite image as a dataset
… many datasets for one series
… in the geospatial domain, people are creating series and not creating the children dataset records

PWinstanley: use data cube for this

AndreaPerego: yes, but this is on the data management side of things

PWinstanley: as a cube, it could be slices of information

AndreaPerego: for me, the use cases that were contributed shouldn't be dealt with at the catalogue level
… postal information, one of the most complex data models in INSPIRE
… complex in terms of representation
… have a landing page where you can filter, based on some criteria, the slice of the dataset that you want
… in these cases, I don't see how it can be managed effectively

PWinstanley: service that delivers the distribution, but its metadata is going to be fairly generic

AndreaPerego: something else - the section is focusing on the approach that a series is itself a dataset
… there is a paragraph mentioning that people are using distributions, but we propose as a default approach creating a dataset for each child dataset
… in case where the distribution approach is more suited, we don't restrict it
… this is the request that we got, consistent way of doing this if people want to do it

riccardoAlbertoni_: as far as I am concerned, this is a step forward; some issues still need to be discussed
… we can try to highlight the open issues in the document

<riccardoAlbertoni_> https://github.com/w3c/dxwg/wiki/Examples-on-dataset-series

riccardoAlbertoni_: some of Peter's observations are mentioned as aspects in the wiki page
… at the end, there are different examples in the spirit that Andrea explained
… at the end there is a discussion section, where different aspects are listed
… one aspect about using dataset or distribution, already expressed my concerns about using distribution
… but at the moment, mentioning both is the best approach
… what can we do next?
… one question for Andrea - what is missing to accept the draft PR?

riccardoAlbertoni_: we are expecting to have the first PWD at the end of November
… there will be issues we won't be able to resolve
… but important to show the discussions
… for example, the note about the upstream inheritance could deserve an issue

+1 for creating an issue on that
… fixing all the points, trying to get an agreement and get it out

riccardoAlbertoni_: we have time to do some refinements on this proposal
… including what we already have is a good start

AndreaPerego: fine with that, final version of this section and the versioning section by the end of the month
… worth including in the PWD
… having a draft version is not so dangerous
… based on what has happened so far, it is the only way to have it in a PWD

alejandra: It would be good to have it in the FPWD.
… I also agree to create issues for specific aspects.
… My question is about the small number of meetings we have before FPWD.

<alejandra> PROPOSED: draft on dataset series should be included in the FPWD for DCAT 3

<riccardoAlbertoni_> +1

+1

<AndreaPerego> +1

<PWinstanley> +1

Resolution: draft on dataset series should be included in the FPWD for DCAT 3

Versioning

riccardoAlbertoni_: have we received any feedback?

AndreaPerego: no new feedback

FPWD is due by 24th November

<riccardoAlbertoni_> https://github.com/w3c/dxwg/milestone/27

riccardoAlbertoni_: how to speed up?

AndreaPerego: most of the issues are going to be addressed by the two new sections that we have (versioning and dataset series)
… and I would close them once we have them in the spec
… we can create new issues if necessary
… left other issues not related to versioning or series, and they could be removed if they are not going to be addressed
… last two can be easily merged
… no intention on our side of putting restrictions on domain and range

Issue: https://github.com/w3c/dxwg/pull/1260

<trackbot> Created ISSUE-1 - Https://github.com/w3c/dxwg/pull/1260. Please complete additional details at <https://www.w3.org/2017/dxwg/track/issues/1/edit>.

AndreaPerego: second one is the inconsistent use of IANA URIs

https://github.com/w3c/dxwg/pull/1261

AndreaPerego: two new sections - versioning and dataset series
… they mention some properties that are not explicitly included in the class descriptions
… two options: either we include them in the class descriptions or we wait for some feedback

<riccardoAlbertoni_> +1 to postpone the change in the normative part

AndreaPerego: about modifying the class descriptions, we haven't received any feedback, so it might not be worth making modifications to the normative part of the document
… we leave the changes of the normative part for later

alejandra: I agree with the milestone, and how to address the issues

outstanding actions

riccardoAlbertoni_: I think we can close alejandra's action.

close action-391
… and also AndreaPerego's one.

<trackbot> Closed action-391.

close action-433

<trackbot> Closed action-433.

<alejandra> close ACTION-394

<trackbot> Closed ACTION-394.

<alejandra> AndreaPerego: question about what to do with issues that we cannot cover by end of Nov

<alejandra> alejandra: I think we can move them to the next milestone, as it is better to have shorter cycles for PWD

AndreaPerego: revision of deadlines

riccardoAlbertoni_: deadline to share with the group

PWinstanley: is there a concept of issues at risk?

riccardoAlbertoni_: no, we are in the FPWD

the HTTPS versions seem to work: check https://www.iana.org/assignments/media-types/text/csv

discussion about the IANA issues
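
A small sketch of how IANA media type URIs could be made consistent, assuming the HTTPS form is the one to settle on (the minutes only note that the HTTPS versions seem to work, not that this was decided). The function name is hypothetical, and no network request is made.

```python
from urllib.parse import urlsplit


def to_https_iana(uri):
    """Rewrite an http://www.iana.org/... URI to https; leave other URIs untouched."""
    parts = urlsplit(uri)
    if parts.scheme == "http" and parts.netloc == "www.iana.org":
        return parts._replace(scheme="https").geturl()
    return uri


print(to_https_iana("http://www.iana.org/assignments/media-types/text/csv"))
# https://www.iana.org/assignments/media-types/text/csv
```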

Action: riccardoAlbertoni_ to write to DXWG group about the dataset series PR and advance notice for FPWD

<trackbot> Error finding 'riccardoAlbertoni_'. You can review and register nicknames at <https://www.w3.org/2017/dxwg/track/users>.

[meeting adjourned]

Summary of action items

  1. riccardoAlbertoni_ to write to DXWG group about the dataset series PR and advance notice for FPWD

Summary of resolutions

  1. approve last meeting minutes https://www.w3.org/2020/10/14-dxwgdcat-minutes
  2. draft on dataset series should be included in the FPWD for DCAT 3

Summary of issues

  1. https://github.com/w3c/dxwg/pull/1260
Minutes manually created (not a transcript), formatted by scribe.perl version 124 (Wed Oct 28 18:08:33 2020 UTC).
