W3C

– DRAFT –
Dataset Exchange Working Group Teleconference

17 June 2020

Attendees

Present
AndreaPerego, PWinstanley_, riccardoAlbertoni
Regrets
SimonCox
Chair
RiccardoAlbertoni
Scribe
PWinstanley_

Meeting minutes

<riccardoAlbertoni> PROPOSED: approve last meeting minutes https://www.w3.org/2020/06/03-dxwgdcat-minutes

<riccardoAlbertoni> +1

<AndreaPerego> +1

+1

Resolution: approve last meeting minutes https://www.w3.org/2020/06/03-dxwgdcat-minutes

<riccardoAlbertoni> https://www.w3.org/2017/dxwg/wiki/Meetings:Telecon2020.06.17

versioning

<riccardoAlbertoni> https://github.com/w3c/dxwg/issues/89

riccardoAlbertoni: issue #89 - version delta
… in the last meeting we discussed a lot but didn't come to a clear conclusion
… I am proposing that we use adms:versionNotes

<AndreaPerego> +1

<riccardoAlbertoni> Proposed: Consider adms:versionNotes as a working solution for expressing textual description of the version delta.

<AndreaPerego> +1

riccardoAlbertoni: principally because it does not imply any entailments. It is a good flexible starting point

<riccardoAlbertoni> +1

+1

<riccardoAlbertoni> resolve:Consider adms:versionNotes as a working solution for expressing textual description of the version delta.

Resolution: Consider adms:versionNotes as a working solution for expressing textual description of the version delta.

<riccardoAlbertoni> https://github.com/w3c/dxwg/issues/91 Version release date

riccardoAlbertoni: version release date -
… we need to have some mapping between dcterms and prof
… we had already included dc:modified
… what are the feelings about dc:issued?

AndreaPerego: I think we should start with the simple solution, in the absence of a motivating use case
… until we have a good use case then we should not tie ourselves to anything. A version of a dataset is a dataset

riccardoAlbertoni: so you agree with adopting dct:created ... assuming that each time a new version is created it is a new dataset

AndreaPerego: perhaps the idea was to have a creation date with the initiation of the dataset, then a version for each publication. Until there is something tangible, using the same properties means that it is like another dataset. Using this approach we are not explicitly providing a solution for a release date

riccardoAlbertoni: one discussion related to the point you are making is PAV, which distinguishes between two types of version: one for long-standing entities and another for more short-lived ones
… since one of the driving ideas I was following was to make DCAT a lingua franca, I wonder if we need to distinguish between long-standing assets and shorter lived assets

AndreaPerego_: we need a use case.

<riccardoAlbertoni> https://docs.google.com/spreadsheets/d/1kOp810ep3gQ2iezVXH-abX2q2QubqxNmyJ2bcX6WAFw/edit?usp=sharing

AndreaPerego_: for the time-being we assume that a version of a dataset is another dataset. Without a use case it is difficult to know how to handle this point

riccardoAlbertoni: To explain why I am dealing with this type of issue, 2 of the most powerful vocabularies about versioning make this distinction (long-lived vs short-lived) and so perhaps it is an implicit requirement
… we could ignore fine-tuning but we lose expressivity. We need to think about the need to be able to map a vocab like PAV to DCAT
… So, although we don't have a formal requirement, my concern is that we should be able to map these vocabularies.

AndreaPerego: I understand the point, but I'm still concerned about addressing this in DCAT without a statement about how and why we are doing this. Perhaps it is just to get alignment with PAV or Version
… so perhaps it is an alignment issue rather than an extension of DCAT

PWinstanley_: there is already a link to Prov, and we can use things like wasGeneratedBy

riccardoAlbertoni: PAV shows how dct:isVersionOf is the property that links something that is not versioned with something that is.... This kind of property occurs in so many places that it is difficult to find the actual version that an entity is. We need something that is explicit

AndreaPerego: so we can provide something simple in DCAT and if that is not enough we can direct users to PAV or other version vocabularies

riccardoAlbertoni: I think that if you provide only the simplest thing you are providing something that is not very useful/usable
… This isn't an issue that we absolutely need to deal with, but we risk providing a solution that is not really satisfactory - but it depends on what kind of user you have in mind. If only simple annotation of the next version is needed, then the simple case is enough, but if there is a need to know the most recent version then something more sophisticated is needed

AndreaPerego: I agree that the key information needed is whether a version is the most recent or not.
… I think with the prof and adms properties we can cover things like next and last. This is ok for the use cases that I can figure out, but there may be some that cannot be covered by this. For the time-being I think that what we have from dc and adms could be enough

riccardoAlbertoni: the relation which is the pointer to the latest version - what does that point to? There needs to be an update to earlier versions at publication time

AndreaPerego: if there is a URL pointing to the latest version, then that is OK, but otherwise there is a need to have updates
… so we need to distinguish between a vocabulary requirement and an implementation approach
… this is something from DWBP

riccardoAlbertoni: there needs to be a property that returns the latest version without assuming anything about versioning practice
… my hunch is that if we don't want to enter into this best practice we need a more complex solution
… each time the version is updated the prev relationship needs to be entered to the previous version
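
The trade-off described here can be illustrated with a minimal sketch. It is not something shown in the meeting: the dictionary layout, the `previous` key (a stand-in for a previous-version property) and the `latest_version` helper are all hypothetical. If each version records only a backward link to its predecessor, older records never need updating, but the latest version has to be derived by finding the one record that nothing else points back to.

```python
def latest_version(records):
    """Return the id of the record that no other record names as its predecessor.

    `records` maps a version id to a dict with a hypothetical "previous" key
    (None for the first version). Assumes a single linear version chain.
    """
    # Every id that appears as somebody's predecessor has been superseded.
    superseded = {r["previous"] for r in records.values() if r["previous"] is not None}
    heads = [vid for vid in records if vid not in superseded]
    if len(heads) != 1:
        raise ValueError(f"expected exactly one latest version, found {len(heads)}")
    return heads[0]


# Hypothetical catalogue: each version only knows its predecessor.
catalogue = {
    "dataset-v1": {"previous": None},
    "dataset-v2": {"previous": "dataset-v1"},
    "dataset-v3": {"previous": "dataset-v2"},
}

print(latest_version(catalogue))  # dataset-v3
```

Publishing a new version under this assumption only adds one record with its own backward link; the alternative, an explicit "latest" pointer on every record, would require touching all earlier records at publication time.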

AndreaPerego: the unversioned entity always corresponds to the most recent?
… this is about the ways of publishing data. It is platform-dependent. Otherwise it has to be done another way
… Zenodo has a mechanism for providing a list of previous versions
… but in this case it is related to the publication mechanism

riccardoAlbertoni: you can cover both approaches with the same vocabulary

AndreaPerego: in CKAN I don't think this is possible

AndreaPerego: so there is no single option. PAV would fit with some platforms; adms into others
… so instead of mandating a single approach we can show alternatives and people can select depending on the data publishing platform

AndreaPerego: I know use cases where conceptually this relationship wouldn't work
… when gathering data for a specific purpose, the survey approach may change from year to year, so the results are different datasets
… so it is important to know that datasets are related, but they are not versions of the same entity because of the survey changes. Nevertheless I would still want to know the latest version

PWinstanley_: we need to keep a broad approach that will allow the versioning of any of the main classes of DCAT

riccardoAlbertoni: one possible approach is to describe the cases that AndreaPerego was suggesting and see how different vocabs measure up

AndreaPerego: we are not going to address versioning in detail, we are going to deal with a subset - but we can refer to other vocabularies whilst looking at the subset of cases
… we can provide pointers to other vocabularies

AndreaPerego: prov is powerful but the same thing can be said in different ways.

riccardoAlbertoni: we just need to discuss this. We might not arrive at a solution, but we need to see what we can do

AndreaPerego: we can include dc and adms, they are stable and related to DCAT

riccardoAlbertoni: if there are terms from vocabularies that are not standard we could implement something similar within DCAT. If we are precise enough we can have a mapping with existing vocabularies

riccardoAlbertoni: People still want a 'solution' for versioning in spite of there being many vocabularies.... I think we should take this opportunity

how to catalog a relational database

<riccardoAlbertoni> https://github.com/w3c/dxwg/issues/1240

AndreaPerego: this is related to cataloguing a service or API. There was a similar requirement for a triplestore recently. We need to put this on the agenda
… this is something we might want to address for the next version

riccardoAlbertoni: this is also something for the primer, and we might not be able to provide a full solution, but we can give some pointers
… We need to get more thoughts about versioning, but for the next meeting we perhaps need to focus on something else and see if we can make progress there

riccardoAlbertoni: I will send out an email asking for topics for the next call

Summary of resolutions

  1. approve last meeting minutes https://www.w3.org/2020/06/03-dxwgdcat-minutes
  2. Consider adms:versionNotes as a working solution for expressing textual description of the version delta.
Minutes manually created (not a transcript), formatted by scribe.perl version 121 (Mon Jun 8 14:50:45 2020 UTC).
