DXWG DCAT Working Session teleconference 13 February 2019 21:00 UTC

Meeting minutes

<DaveBrowning> https://docs.google.com/document/d/1fApxJIotapugde-hyS2lmsElNO3mLvoi7nLqDYJQZ7g/edit?usp=sharing

<SimonCox> regrets, AndreaPerego

scribenick PWinstanley

<DaveBrowning> https://github.com/w3c/dxwg/issues?q=is%3Aopen+is%3Aissue+label%3Adcat+label%3Aversioning

DaveBrowning: there are useful resources in the links - esp the list of relevant github issues
… these have been tagged with 'versioning'

<DaveBrowning> https://github.com/w3c/dxwg/projects/9

DaveBrowning: the same things are grouped into github project categories
… Work done towards the beginning of the WG - alejandra did a review of versioning
… there are other notes
… and notes on using pav

<DaveBrowning> https://lists.w3.org/Archives/Public/public-dxwg-wg/2019Feb/0208.html

DaveBrowning: Makx sent a suggestion to the mailing list

<riccardoAlbertoni> +1 to Makx's suggestion

DaveBrowning: we need to take the position that it is not for DCAT to determine the point of change from one version to another for a dataset - this is established within the domain
… but we need to provide a mechanism. We need to formalise any consensus view on this requirement first

<SimonCox> +1

proposed: we are not going to talk about why, when or where, but are talking about how

<DaveBrowning> +1

<riccardoAlbertoni> +1

<Jaroslav_Pullmann> +1

Resolved: we are not going to talk about why, when or where, but are talking about how

<alejandra> +1

DaveBrowning: the follow on: do we want to make an explicit statement about this?

+1 to explicit - very explicit

<riccardoAlbertoni> +1 to make it explicit in the document

<DaveBrowning> +1

<alejandra> +1 to explicit

<Jaroslav_Pullmann> +1

<Makx> +1

DaveBrowning: So the question now is: how much effort does this task require

<DaveBrowning> PWinstanley: Don't gold plate, go for coverage not depth

<DaveBrowning> ... simple illustrative case

<DaveBrowning> ... that shows something can be done, something that shows it can scale...

<DaveBrowning> ... maybe do a more complex case...?

<riccardoAlbertoni> +1 to simple and illustrative ..

Jaroslav_Pullmann: I am having difficulties understanding if we have a vision of versions. Do we consider alternative distributions of different languages 'versions' or 'distributions'?
… what are our version properties?

DaveBrowning: last week we agreed to be loose in interpretation or definition of distributions. We are minimising the complexity of 'informational equivalence', leaving this to the publisher

<alejandra> you can catch up on the update of the distribution definition at: https://w3c.github.io/dxwg/dcat/#Class:Distribution

Makx: reacting to the point of languages, this is a common case, but I suggest the set of 6 scenarios, and I think it is for publishers to define what constitutes a version change
… we can say what the properties might be to support their choices
… but we need to leave it open to them to make the design decisions depending on their requirements

<alejandra> https://www.w3.org/TR/hcls-dataset/

<SimonCox> https://www.w3.org/TR/hcls-dataset/#datasetdescriptionlevels

alejandra: I agree with Makx - but I thought the work we did with the HCLS profile for data sets might be instructive - see s.5 and the diagram that separates the data set description from the distribution and the version level description that allows one to describe the relations between data versions

alejandra: in the table, for each description level we specify which properties are required

SimonCox: this still leaves it quite abstract in terms of scenarios

alejandra: the specific stuff for versioning is within the table 'provenance and change'

alejandra: using dct and pav attributes

riccardoAlbertoni: I like the approach of pav - a potential solution. the other approach is using qualified relations.
… I mention qualified relations because versioning is a relationship between datasets, and in DCAT we are already considering qualified relations. Covering versioning the same way is another possibility.

<DaveBrowning> ack +

<DaveBrowning> ack Makx

<SimonCox> Qualified Relations are the answer to everything ... but must be accompanied by a controlled vocabulary of 'roles' that relate one dataset to another

Makx: alejandra mentioned the HCLS - is the approach there to be the one we direct people to, or do we bring an extra class into DCAT?

alejandra: HCLS is in use, but in niche life sciences areas. Combined with riccardoAlbertoni's comment, qualified relations are referenced
… I think it is better for DCAT to have its own version of this because we are covering a wider range of users. I'm not suggesting a specific solution ... yet. But my mention of this is to add it to the discussion of options.

<alejandra> We do already have 'is version of' as an example in the Qualified Relationships section: https://w3c.github.io/dxwg/dcat/#qualified-forms

<DaveBrowning> PWinstanley: Might want a more low tech option, as well as the more sophisticated

<DaveBrowning> ack Jaroslav_Pullmann

Jaroslav_Pullmann: it might not be possible to combine both approaches - the DCAT document itself might not be the place to describe this information. If I'm wanting just summary material what am I expecting to see?
… we could try to draw out what this constellation might actually look like
… AFAIK the distributions in HCLS are not versioned, only the dataset.

alejandra: the dataset is abstract, the distributions are concrete and can come in different languages/formats/profiles
… the versions are of datasets only. For representation one doesn't need a separate class

Jaroslav_Pullmann: can we consider versioning in terms of effort - it is a lot of effort to describe a dataset. we can approach this on different levels of resolution
… perhaps we should take effort into account

DaveBrowning: Summary: lots of suggestions, but with the exception of riccardoAlbertoni and the qualified relations, we are circling around the problem
… I am still looking for a strong suggestion

riccardoAlbertoni: the example proposed in the google doc is a straw man
… perhaps we should try to keep it simple, suggesting pav as the first attempt. One issue I see is that we had another vocab which is not a W3C standard
… but this could be unproblematic. I also acknowledge that the reference to PAV is easier to realise in the short time we have
… The qualified relations could be part of an incremental approach for which plain PAV is the start. PAV itself uses qualified relations for complex patterns

DaveBrowning: one advantage of that approach is that we start drafting and become more elaborate as we move forward

+1 to drafting

DaveBrowning: what examples might we want to use?

Jaroslav_Pullmann: Makx already mentioned 2 scenarios where the data sets are versions of the 'summary level'
… but we can have versions at distribution level too as an example

Makx: in my message last week there were plenty of illustrations that we can use to test the current DCAT and evaluate to see if anything is missing

DaveBrowning: are you expecting qualified relations modelling?

Makx: these are 'real world' examples from DCAT-AP work. The language one can be done with different distributions under the same dataset.

<SimonCox> It would be easy to add more examples like https://w3c.github.io/dxwg/dcat/#qualified-relationship and make it explicit that the related resources are of type=dcat:Dataset

Makx: but there are annual budgets for different periods - there are many options, but we need to be able to point to which one follows which
… going through these examples will be instructive
… we either model different dataset versions, or different versions of distributions

<SimonCox> +1 to Makx - provide an inventory of use-cases based on our known cases, from DCAT-AP etc, with recommended patterns, showing both Distributions and Datasets

Makx: we need to discover in real stuff what works and what doesn't. At the moment we are not discussing concrete things, just general stuff

SimonCox: in the contributions I've made I find real examples most helpful - I used the CSIRO data repo
… in most cases it has uncovered niggles
… let's descend to concrete examples
… I also drop examples into the 'examples' folder of github, we can place them there

<SimonCox> https://github.com/w3c/dxwg/tree/gh-pages/dcat/examples

<alejandra> https://www.w3.org/TR/hcls-dataset/#appendix_1

alejandra: I agree about the examples. The HCLS example from the chemical compounds database doesn't fit DCAT.
… in addition to examples and UC as proposed by Makx, perhaps we should also consider what queries we would like the metadata to answer
… we can only attach qualified relations or PAV properties to dataset level

<SimonCox> (more examples coming when https://github.com/w3c/dxwg/pull/730 is merged :-) )

alejandra: we need to determine which domains require these properties

Jaroslav_Pullmann: I think this will remain inconsistent because of the choices of the publishers. I think both (dataset/distribution) levels might be applicable

<DaveBrowning_> nick: DaveBrowning

Makx: I don't see that we are concerned with inconsistency. We can create new datasets, or new distributions under the same dataset. We just need to say which properties need to be used for each case. We don't need a singular view of everything, but we need to say that if you want to do A then do this, and B then do that

Jaroslav_Pullmann: if we have a set of properties that might be used on either dataset or distribution level then the querying might yield confusing results

Makx: people are doing these things, so we have to roll with it.
… but we can suggest routes

<Zakim> riccardoAlbertoni, you wanted to say alejandra was referring to the issue : Version subject https://github.com/w3c/dxwg/issues/93

Jaroslav_Pullmann: this might suggest different sets of properties for datasets and distributions, and this choice might then be informative

riccardoAlbertoni: the discussion suggests to me that we are discussing issue #93
… saying that it is up to the user to determine the subject of the versioning
… it might be any first-class object from the DCAT vocab

alejandra: if we support this then we are leaning towards a solution that will combine properties and qualified relationships
… This needs to be illustrated with our examples

Jaroslav_Pullmann: searching - creating models leads to diversity, but queries will need to be able to establish the type of versioning pattern

<alejandra> considering the time series data, which is one of the use cases that Makx listed, DCAT-AP represents it using hasPart and no reference to versions: https://joinup.ec.europa.eu/release/dcat-ap-how-model-dataset-series

Jaroslav_Pullmann: this is up to the exploration of the patterns applied to the metadata

<SimonCox> The PROV-O property is called `prov:qualifiedDerivation` - https://www.w3.org/TR/prov-o/#qualifiedDerivation - (g) here https://www.w3.org/TR/prov-o/#qualifiedDerivation

<SimonCox> I meant here https://www.w3.org/TR/prov-o/#qualified-terms-figure

alejandra: Makx - re: the link of how DCAT-AP does this with annual budget data. AFAIK there is no reference to version, but to dataset parts
… please can you (Makx) point to how DCAT-AP handles versions

<riccardoAlbertoni> SimonCox: https://www.w3.org/TR/prov-o/#qualifiedRevision

<alejandra> "Additionally, DCAT-AP allows relating datasets as ‘versions’ using dct:hasVersion/dct:isVersionOf but it is not clearly described in which cases to use these properties."

Makx: we looked and couldn't find an agreed approach.
… CKAN thought it was ridiculous to have different distributions. It is only visible on the screen, there is no metadata.

<SimonCox> The `dcat:qualifiedRelation` has domain `dcat:Resource` and range `dcat:Relation` which carries the property `dct:relation` which can point to anything
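
As an illustration of the pattern SimonCox describes, a version link between two datasets could be written as a qualified relation. This is only a sketch: triples are modelled as plain Python tuples, the `ex:` identifiers and the role `ex:isVersionOf` are hypothetical, and the class name `dcat:Relationship` and the property `dcat:hadRole` are assumptions taken from the DCAT draft rather than from this discussion.

```python
# Triples are plain (subject, predicate, object) tuples; no RDF library is used.
# All ex: identifiers and the blank node label are hypothetical.
def qualified_version_link(newer, older, role, node):
    """Return triples linking `newer` to `older` through a relationship node."""
    return [
        (newer, "dcat:qualifiedRelation", node),
        (node, "rdf:type", "dcat:Relationship"),  # class name assumed from the DCAT draft
        (node, "dct:relation", older),            # the related resource
        (node, "dcat:hadRole", role),             # role drawn from a controlled vocabulary
    ]


triples = qualified_version_link(
    "ex:budget-2019", "ex:budget-2018", "ex:isVersionOf", "_:rel1"
)
for triple in triples:
    print(triple)
```

The role value is where a controlled vocabulary of relationship types would plug in, so the same four-triple shape could carry other relations without new properties.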

Makx: my point was that W3C was going to resolve it (Us?!!)

<Makx> yes, that would be us!

DaveBrowning_: can we summarise the conversation about qualified relations?

riccardoAlbertoni: there are diverse properties that relate to this area between DCAT and PROV, but I am uncertain that it is totally appropriate to our needs

<SimonCox> Makx - are you proposing a 'versionNumber' or 'versionDesignator' property?

Makx: we need to have the qualified relation to express the exact version. We also need to be cautious about how deep we go into this. In the library world there is this issue of complexity in book revisions: sometimes it is not just version 1, 2, 3, etc., but there are additional free notes. We need to ensure that any solution we achieve is reasonable and fits people's needs
… some basic approach might be a good way forward, then to increase the complexity and see how it fares

Jaroslav_Pullmann: we have discussions on 2 levels of solution
… on the lower slopes simple properties are enough, but for more complex situations qualified relations
… but we need to agree to what entities we would apply a version. We need to provide hints or definite advice.
… if we cannot easily provide these then this is the problem to solve.

SimonCox: asking Makx - it sounds like you're identifying a gap. Perhaps, with enough examples using the properties we have available, would another property giving a version be something that meets your requirements?

Makx: in DCAT-AP there is a version indicator and a version note - these meet the simple requirements, but equally are not the only way to do this. I was wanting to discover how much precision we need to bring into this, because we were working for a long time and agreement was hard to reach
… so perhaps we should not do that effort

SimonCox: anytime there is a property and some explanation, it is another class - a more complicated pattern

<SimonCox> more than one property grouped together == a class

<Zakim> SimonCox, you wanted to ask if Makx is proposing a 'versionNumber' or 'versionDesignator' property?

alejandra: In the google doc there is a diagram - if we have 2 versions of a dataset we want to describe their relations, but these might have different distributions. We need to be able to relate the datasets or the distributions so as to decide which is the next version
… we want to give people freedom, but we need to give them the properties to express these relationships

<alejandra> the google doc link is: https://docs.google.com/document/d/1fApxJIotapugde-hyS2lmsElNO3mLvoi7nLqDYJQZ7g/edit

Makx: I cannot see the diagram, but rather than saying to people how to do things, the DCMI Terms versionOf does the job.
… we cannot expect people to do what 'we' think

alejandra: yes, but we need to provide guidance
… we need a position on the best , cleanest way of doing this

Makx: I agree, - the examples I have are ones that we might want to say something about

riccardoAlbertoni: I agree with the idea of allowing the user to do what they want.

<alejandra> and let's not forget dcat:Resource and services!

riccardoAlbertoni: the user should decide when to apply versioning. On the issue of simplicity I take a different line to Makx - using the same qualified pattern will avoid using a different pav term for every possibility. It is just a matter of judicious choice of the term

Jaroslav_Pullmann: my summary - support for simplicity; for the drawing which shows the degree of freedom people have;
… the modelling pattern is the individual decision of the publisher

<alejandra> even if we allow freedom, we should guide through a few patterns

Jaroslav_Pullmann: we can describe options, as alejandra did, which do not break the structures

<SimonCox> I just took a look at schema.org - it has fairly weak support for versioning, only a version designator https://schema.org/version, which is not tied into a link to another thing, and https://meta.schema.org/supersededBy which is only one possible versioning relationship

<SimonCox> ... and is not in the core

<SimonCox> ... and is only related to model constructs not datasets

DaveBrowning: bringing us back to the recommendation, do we expect to talk much in the rec - or is it there to provide some illustrations of versioning and we accept that publishers will develop their own styles?

alejandra: we could discuss and illustrate riccardoAlbertoni's point - there are vocabularies, so let's decide which might be used for our examples

DaveBrowning: tbh, pav not being a W3C standard is an advantage - there is more than one provider, and this shows strength in the approach

Makx: when doing DCAT v1 there was pushback from W3C for using DCT, but PAV is referred to in DWBP so there is no problem referencing it

riccardoAlbertoni: I don't know if using PAV is a problem, but in DWBP PAV is provided as an example only, not a recommendation

Jaroslav_Pullmann: do we have a gap? is there a real issue of missing vocab?

Makx: good point Jaroslav_Pullmann. I suggest we go through examples and that will illustrate any gaps.

<riccardoAlbertoni> +1 to Makx and Jaroslav_Pullmann about examples

DaveBrowning: families of examples: are the ones from Makx good? Nobody suggests otherwise... are there any others?

Makx: there is serial versioning and parallel versioning.

<SimonCox> I like Makx's categories

Makx: I don't know if it is an issue that we take into account

Jaroslav_Pullmann: I support this vision - it is to do with obsolescence. One obsoletes the other. I want to know what is current

<SimonCox> supersedes?

Makx: Jaroslav_Pullmann brings up a number of points - there is sequencing where members are equally valid. each requires a different set of functions.

<SimonCox> we see many 'versions' in simulation and forecasting datasets, all of which are 'valid' for different functions

Makx: There can be replacement.

Jaroslav_Pullmann: we are reaching the crucial point of the version - to let the client indicate the current shape of the dataset
… we should support this axis of interest using the most appropriate means
… these may inform the gap analysis.
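
The question of "what is current" can be sketched as a query over version links. The example below assumes, purely for illustration, that publishers link each version to its predecessor with `pav:previousVersion`; triples are plain Python tuples and all `ex:` identifiers are hypothetical.

```python
# Assumption: each version points to its predecessor via pav:previousVersion.
# Triples are plain (subject, predicate, object) tuples; ex: names are hypothetical.
def current_versions(triples):
    """Return the resources that no other resource names as its previous version."""
    links = [(s, o) for s, p, o in triples if p == "pav:previousVersion"]
    mentioned = {resource for pair in links for resource in pair}
    superseded = {older for _, older in links}
    return sorted(mentioned - superseded)


chain = [
    ("ex:dataset-v2", "pav:previousVersion", "ex:dataset-v1"),
    ("ex:dataset-v3", "pav:previousVersion", "ex:dataset-v2"),
]
latest = current_versions(chain)
print(latest)
```

With serial versioning the result holds a single resource; with parallel versioning, where two versions share a predecessor, it holds several, which matches the point that members of a sequence can be equally valid.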

<SimonCox> 'supersedes' is more common term than 'obsoletes'

<SimonCox> (unless there is a nuance I'm missing)

Jaroslav_Pullmann: the different patterns could be described, and we could give a minimal requirement of how each pattern might be expressed

<riccardoAlbertoni> +1 to sprint if we have examples in the meanwhile

q: are people still getting value from the sprint approach?

<SimonCox> I got less value from this sprint, because no concrete proposal on the table

<DaveBrowning> +1 to SimonCox view

<SimonCox> ... need some wording, a document section ...

<SimonCox> Can Jaroslav_Pullmann draft a starting point?

Makx: I think Jaroslav_Pullmann did a concrete proposal. Whatever you do, provide version information, version indicator, version notes. If you think it is the dataset that has changed, then apply to dataset. If distribution, then apply to the distributions.
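
A minimal sketch of this basic approach follows, attaching the same version properties to either a dataset or a distribution. It assumes `owl:versionInfo` for the version indicator and `adms:versionNotes` for the notes, as DCAT-AP does, with an optional `dct:isVersionOf` link; the helper name, the `ex:` identifiers and the literal values are hypothetical.

```python
# Assumptions: owl:versionInfo and adms:versionNotes carry the indicator and notes;
# dct:isVersionOf is an optional link to the resource being versioned.
# Triples are plain (subject, predicate, object) tuples; ex: names are hypothetical.
def version_triples(subject, indicator, notes, version_of=None):
    """Describe the version of `subject`, which may be a dataset or a distribution."""
    triples = [
        (subject, "owl:versionInfo", indicator),
        (subject, "adms:versionNotes", notes),
    ]
    if version_of is not None:
        triples.append((subject, "dct:isVersionOf", version_of))
    return triples


# The dataset itself changed: describe the version at dataset level.
dataset_level = version_triples(
    "ex:dataset-2019", "2.0", "Figures revised", version_of="ex:dataset-2018"
)
# Only one file changed: describe the version at distribution level.
distribution_level = version_triples(
    "ex:dataset-2019-csv", "1.1", "Column headers corrected"
)
print(dataset_level)
print(distribution_level)
```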

<alejandra> for the basic structure, we might as well refer to https://www.w3.org/TR/dwbp/#dataVersioning

Makx: going further than that (e.g. annual budget data) then provide examples of handling these more complex cases.
… I think that what Jaroslav_Pullmann proposed takes us the first step of the way.
… but we need some concrete proposals, and if we have those then we will clean up the work quickly
… We don't need the sprint to create the proposal though

alejandra: what Jaroslav_Pullmann proposed is similar to DWBP

1= sprint; 2= meeting as normal

<alejandra> we need a concrete proposal about this

<Makx> 1

<Makx> no sorry 2

<Jaroslav_Pullmann> +1 for meeting (2), since too late

1= sprint around a concrete proposal; 2= meeting as normal

<SimonCox> 2

<alejandra> 2

<Makx> 2

<Jaroslav_Pullmann> 2

<SimonCox> ... until there is a concrete proposal on versioning ready

<riccardoAlbertoni> +1 (if we have proposals to discuss) - +2 otherwise

<riccardoAlbertoni> yes

<riccardoAlbertoni> bye thanks for the interesting discussion

<Jaroslav_Pullmann> thank you!

<riccardoAlbertoni> RSAgent, draft minutes v2

<alejandra> we have a section here: https://w3c.github.io/dxwg/dcat/#dataset-versions

<Jaroslav_Pullmann> thanks!

<Jaroslav_Pullmann> present

bye!
