DXWG DCAT subgroup call

Meeting minutes

<riccardoAlbertoni> PROPOSED: approve last meeting minutes https://www.w3.org/2020/04/29-dxwgdcat-minutes

proposed: accept last meeting minutes

<riccardoAlbertoni> +1

<AndreaPerego> +1 (but I'm missing from the participants)

Resolution: accept last meeting minutes

riccardoAlbertoni: agenda mainly to attend to the wiki page on versioning and to consider next steps

<riccardoAlbertoni> https://github.com/w3c/dxwg/wiki/Material-for-a-SPRINT-on-Versioning

Versioning

riccardoAlbertoni: there are 2 main sections - the vocabularies, and the principles we may want to consider - and also strategies
… Starting with the section on vocabularies: there is a table gathering the vocabulary terms (thanks, AndreaPerego), organised according to the versioning aspect that each term addresses
… this is the first step in evaluation
… In the subsections there are descriptions of each of the vocabularies with consideration of how suitable they might be for our solution (but this is not the official opinion of the group)
… Some adjustments are still needed. Pros and cons are not handled consistently. The Registered Version vocabulary is missing.

AndreaPerego: I was trying to group the terms of the vocabularies to help understand the different aspects but if there are other approaches for classification the table can be reorganised

riccardoAlbertoni: the table is very useful. I have been looking at requirements but haven't found anything more precise. At some point we need to fix the requirements that we have

PWinstanley: is there any distinction between 'lineage' and 'provenance'?

riccardoAlbertoni: here we are concerned with a specific aspect of provenance which is 'versioning'
… I think that Prov is very flexible and can address many uses, but versioning is more specific than provenance

PWinstanley: perhaps we need to establish what is the difference between a new dataset and a version change

riccardoAlbertoni: we have discussed this elsewhere and cannot provide much guidance

AndreaPerego: one reason for the table is to help understand if there are aspects of versioning covered by more than one vocabulary - this would indicate that there are aspects common across communities
… We also need to understand how we can look at versioning not as a concept but as an activity ...
… We need to understand what we are targeting - if it is too abstract then we risk it not being applicable or adopted
… Are there examples of catalogues where a versioning policy is in place?

riccardoAlbertoni: we need to determine if there are new terms required in DCAT

<riccardoAlbertoni> https://github.com/w3c/dxwg/wiki/Material-for-a-SPRINT-on-Versioning#2-design-considerations

riccardoAlbertoni: if we move quickly on the design considerations (see link) then we need to ensure that DCAT is an interoperability solution, perhaps between the diverse approaches for versioning. In the desiderata 2 there are simple competency questions (CQs)
… which we need to validate against the existing vocabularies
… if we discuss provenance, lineage, etc we end up in the weeds and miss the point about versioning

AndreaPerego: I recall discussions about statistical data where statistical agencies are keeping versions, but for others there is no use in storing different copies. But Zenodo, for instance, supports versioning through URIs
… The idea is that in research data I should be able to use the exact data used for analysis

riccardoAlbertoni: in this first stage we should focus on the terminology and leave the guidance to later
… we need to focus on requirements and CQs and then we can build examples using the vocabs you provided
… communities differ in their interpretation of the term 'version', so we should avoid generalisation

AndreaPerego: I agree
… I am also not sure that we can provide guidance at the moment. My concern about where we should put this is relevant too - in the standard or in the primer

riccardoAlbertoni: I think the answer depends on the need or otherwise to develop new terms for DCAT
… There are many vocabs out there providing some versioning support - but we need to consider what elements we want to include (we can't necessarily promote the use of whole vocabularies)
… We need to form the recommendation first to see if we are discussing new terms or simply the use of existing

AndreaPerego: I agree. Are you or Peter taking this forward? We need a roadmap. FRBR includes a vocabulary for derivations of works. We can reach out to other experts in the group, and see if this is helpful for defining the versioning vocabulary and the practice

riccardoAlbertoni: I'm not an expert in these vocabularies, but I am following them and looking for what will work. I am cautious about asking the group for their views as to what works because there are similar elements in many domain vocabularies. My attempt was to see if there is a strategy without asking for wide generic feedback, and this is in the last part of the document
… Perhaps AndreaPerego's proposal could be re-worked to ask specific questions rather than general ones

AndreaPerego: I think it is right to avoid generic feedback but to focus on expert feedback and then make a proposal for the whole group to see if it needs more input or not
… The feedback I'm missing at the moment is about FRBR

<riccardoAlbertoni> ack

<riccardoAlbertoni> yes ..

PWinstanley: first point: perhaps we need to get wide agreement on the CQs. The tight focus of the CQs will direct the solution
… second point: we should add a review of the SPAR ontologies to the mix: http://www.sparontologies.net/ontologies

<riccardoAlbertoni> ok

<riccardoAlbertoni> ack

<riccardoAlbertoni> https://www.rd-alliance.org/group/data-versioning-wg/outcomes/principles-and-best-practices-data-versioning-all-data-sets-big

riccardoAlbertoni: I agree with the CQs point. My main concern is that the SPAR ontologies might be interpreted in a different way when it comes to data. Same with RDA. When looking at RDA ...

PWinstanley: My reference to RDA was for https://www.oclc.org/en/rda/about.html

riccardoAlbertoni: the Research Data Alliance (RDA) distinguished between versions of documents and versions of datasets
… I'm not sure if the 4-level structure of FRBR fits with data, because in DCAT we have already said that all first-class objects are subject to versioning

<riccardoAlbertoni> Frbr

riccardoAlbertoni: I would use FRBR to complement the solution, and only in that librarian community

AndreaPerego: the SPAR ontologies can be reviewed but some are already based on ones that we have looked at.
… I wonder whether adding other vocabularies can be helpful or not. Perhaps it could be done in parallel; but we need concrete steps. We could add an agenda item to the plenary to state the position and to seek feedback from the specialists

riccardoAlbertoni: Should we put a specific issue to the plenary? I think this is the only way to collect wide feedback

AndreaPerego: it would be good to know if they think our approach makes sense, and to double check that we have all the best relevant vocabularies

AndreaPerego: in the plenary we need to have some specific questions and also invite people to join in. The risk is that we ask for feedback but nothing happens

<AndreaPerego> [meeting adjourned]

– DRAFT –
20 May 2020
