DXWG Oxford Face to Face -- 17 Jul 2017

<Caroline_> Introductions...

Introductions

kcoyle: the main goal for our F2F is to discuss the UCR
... Caroline and I tried to categorize them. If we get to a Use Case and think it is in another category we just move it
... the idea is to get through all of them, even if we don't reach resolutions on everything we have listed

<danbri> https://www.w3.org/2017/dxwg/wiki/Main_Page#Working_Documents -> https://www.w3.org/2017/dxwg/wiki/Use_Case_Working_Space & https://www.w3.org/2017/dxwg/wiki/Use_Cases_and_Requirements

kcoyle: if needed, we may finish some of them afterwards

DCAT and "dataset"

kcoyle: the first one we are going to discuss is https://www.w3.org/2017/dxwg/wiki/Use_Case_Working_Space#ID8

SimonCox: looking at version one of DCAT, there is no ??
... for the extended DCAT, part of what we are looking at is part of Dublin Core

<danbri> https://www.w3.org/TR/vocab-dcat/#introduction """Data can come in many formats, ranging from spreadsheets over XML and RDF to various speciality formats. DCAT does not make any assumptions about the format of the datasets described in a catalog. Other, complementary vocabularies may be used together with DCAT to provide more detailed format-specific information."""

SimonCox: what is the scope for DCAT descriptions?
... also dataset
... we recommend the use of existing DCAT recommendations

<danbri> DCAT alludes to http://dublincore.org/documents/2003/02/12/dcmi-type-vocabulary/ """(Dataset) A dataset is information encoded in a defined structure (for example, lists, tables, and databases), intended to be useful for direct machine processing."""

SimonCox: the original Dublin Core metadata
... the description of the use case is above
... it is clear as well as the requirements "Guidance on use of dc:type or similar for DCAT records. Recommendation on content-type vocabularies."

Jaroslav_Pullmann: Is this still a dataset, or is it any resource, which is then no longer a dataset?
... I support the dataset

<AndreaPerego> About the different resource types in different metadata standards, I prepared a summary table (incomplete): https://docs.google.com/spreadsheets/d/1nlAgLUGQcBe40oTk5WNCVz-6rud1JtLwjoYyyqAT45U/edit?usp=sharing

Jaroslav_Pullmann: it should be more than separately

s/separetaly/separately

Makx: I am against limiting the scope of what a DCAT dataset is

<AndreaPerego> +1 to Makx

Makx: I am in favor of using a vocabulary to say what a dataset is

annette_g: I think the use case approach should come down to actual use cases
... some of the use cases are questions
... we may consider those as separate questions

LuizBonino: I like the idea of being able to describe different types of information as assets

antoine: it seems this use case is to describe what is the dataset but it can also be understood about the context

alejandra: I think it is important to discuss the scope of the use cases
... make sure that we provide guidance on the type
... I agree with the Use Case and I think we need to consider it

<Keith> the problem with using 'type' is that 'type' may be made up of many different attributes

<antoine> Keith++

<Zakim> danbri, you wanted to suggest that ANY collection of 0s and 1s (including empty collection) can be treated as a dataset; "dataset" is about how the data is handled/treated/managed,

Makx: the definition of dataset
... has to be curated

<alejandra> curated

<roba> seems to me the main thing is not to try to define it now - but to decide if we will maintain (or adopt) a list of types

Makx: I think it is important to clear it up

danbri: I think we agree
... it is about the curation process around data

<Thomas> +1 for makx and dan

<alejandra> maybe this is useful: software vs data https://github.com/danielskatz/software-vs-data

<Makx> accept

<danbri> [I agree with Makx that being a dataset is around the social context surrounding data, not the data itself]

kcoyle: can we accept the use case ID8 as it is?

Jaroslav_Pullmann: we can't just accept it

<annette_g> +1 to Jaroslav

Jaroslav_Pullmann: there are questions that are not stated on the use case

<annette_g> S/can/can't/

Jaroslav_Pullmann: maybe we could check other related use cases to see whether their requirements and descriptions complement each other

kcoyle: let's check the use case ID20 https://www.w3.org/2017/dxwg/wiki/Use_Case_Working_Space#ID20
... we want to be able to specify a type
... we are probably going to have to point to a small number of recommended vocabs
... given that, could we vote on the ID8 and ID20 at the same time?

<SimonCox> The link to parse.insight in the use-case description was unhelpful - I've corrected it

antoine: I think it would be all right
... maybe SimonCox could explain

SimonCox: there are a lot of different file types
... which they call content type

<roba> dataset type != encoding type - dataset may be exposed in many encodings

SimonCox: there are different formats of media type

antoine: in the web context, content type uses media types

<danbri> [media type could be .Z (application/x-compress, LZW) in the case of the Web History collection https://www.w3.org/History/1992/timbl-floppies/TimBerners-Lee_CERN/hype.tar.Z]

<Keith> a problem with the concept of dataset concerns streaming data because of its continuity: is the dataset the whole thing or a defined 'window'

SimonCox: I am talking about semantic oriented
... the language chosen is certainly conflicted
... talking about content type

<roba> should definitely change "content-type" wording in Use Case

<roba> we are talking about the range of dc:type

<LuizBonino> Is it the "nature" of the dataset instead of how it is serialised, right?

<Thomas> Right; that's how I perceive it also
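
A minimal sketch of the distinction drawn above between the nature of a dataset (the kind of value a dc:type-style property would carry) and the way each copy is serialised (its media type). The class names, field names, URLs and type label are assumptions made for illustration; none of them are DCAT terms or group decisions.

```python
from dataclasses import dataclass, field


@dataclass
class Distribution:
    access_url: str
    media_type: str  # how this particular copy is serialised


@dataclass
class Dataset:
    title: str
    nature: str  # what kind of thing the dataset is, independent of encoding
    distributions: list = field(default_factory=list)

    def media_types(self):
        """Return the distinct serialisations on offer, sorted for stable output."""
        return sorted({d.media_type for d in self.distributions})


# Hypothetical dataset: one nature, two encodings of the same content.
stats = Dataset(
    title="Example statistics",
    nature="statistical data cube",
    distributions=[
        Distribution("https://example.org/stats.ttl", "text/turtle"),
        Distribution("https://example.org/stats.rdf", "application/rdf+xml"),
    ],
)
```

Here the dataset keeps a single `nature` however many encodings are exposed, which is the point behind "dataset type != encoding type".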

SimonCox: the Dublin Core descriptions from 20 years ago recognize datasets which are images, maps, spreadsheets, etc
... there is a strong sense the images are different

antoine: I accept it

<antoine> https://en.wikipedia.org/wiki/List_of_HTTP_header_fields

antoine: as someone suggested, we could add a small note saying so

kcoyle: we have mentioned something that was not discussed on the use cases

SimonCox: it says that in the use case

kcoyle: are we at a point where we could vote on this?

Jaroslav_Pullmann: we should merge them

<alejandra> +1 to merge them

<AndreaPerego> Sorry, merge what?

Makx: reminded us that we could merge only the requirements

<Makx> +1 to merging reqs

the use cases ID8 and ID20, AndreaPerego

<antoine> +1 to keeping the use case separated (they were contributed separately) but having the requirements consolidated.

<Jaroslav_Pullmann> +1

<SimonCox> +1 to merge reqs - this will drive DCAT 1.x - keep use cases separate for record keeping

Jaroslav_Pullmann: if we are looking at audiences, we have different ones
... they were not in the discussions. That was my motivation to merge them
... it might be interesting for researchers to see them merged
... if we talk about access, the question is whether we are talking about datasets
... we should always be talking about digitally accessible resources
... the access would be only by protocols
... the definition of data may also cover non-digital data. It can be anything. So we must be sure we are talking about accessible data

<danbri> [is there anything DCAT can't describe? :]

Thomas: these two use cases could be about anything
... the discussion about content type and so on is part of content negotiation
... agree with Jaroslav_Pullmann to merge the requirements

<SimonCox> +1 danbri

Jaroslav_Pullmann: if the purpose is to have a history, we should merge only the requirements
... sometimes the use cases are very valuable
... it is important to have reports of what we are missing

kcoyle: if you feel there is a use case missing, please create it

AndreaPerego: we should consider including descriptions of resources that are not data

<Zakim> LarsG, you wanted to ask if it's just about to accept or decline use cases

LarsG: I have a metaquestion. Are we discussing the merging and how to proceed?
... we discussed that in a call and agreed to keep the use cases separated and merge the requirements
... also, a catalogue should be considered

<Thomas> Proposal will follow here

<SimonCox> I agree that ID20 partly elaborates ID8, but it is only the requirements arising from these that matters in the end!

PROPOSAL: to accept the use cases ID8 and ID20 as they are

<SimonCox> The use-cases stay on the books so that we can check at the end if the products solve the use-cases

<Makx> +1 to Simon

<antoine> +1

kcoyle: it is up to the group to drive requirements

PROPOSAL: to accept the use cases ID8 and ID20 as they are

<newton> +1

<Thomas> +1

<SimonCox> +1

<riccardoAlbertoni> +1

<alejandra> +1

<kcoyle> +1

<annette_g_> -!

<PWinstanley> +1

<LuizBonino> +1

<AndreaPerego> +1

<Jaroslav_Pullmann> +1

<LarsG> +1

<roba> +1

<Ine_> +1

<Makx> +1

<annette_g_> -1

<Keith> +1

<danbri> +1

<antoine> with or without the requirement part?

<DaveBrowning> +1

<dsr> +1

<Thomas> antoine without for now

annette_g_: I still have a concern about the ID8 being a use case

<antoine> ok then +1

annette_g_: it is too general
... I feel the use cases should be concrete

kcoyle: annette_g_ do you volunteer to rewrite it?

SimonCox: I agree that annette_g_ should do it

PROPOSAL: to accept the use cases ID8 with edits that annette_g_ will provide and ID20 as it is

<Thomas> +1

<PWinstanley> +1

<newton> +1

<annette_g_> +1

<alejandra> +1

<DaveBrowning> +1

<AndreaPerego> +1

<LarsG> +1

<dsr> +1

<Ine_> +1

<Keith> +1

<kcoyle> +1

<roba> +1

<Jaroslav_Pullmann> +1

<LuizBonino> +1

<riccardoAlbertoni> +1

<SimonCox> +1

<danbri> +1

<antoine> +0

RESOLUTION: to accept the use cases ID8 with edits that annette_g_ will provide and ID20 as it is

<Thomas> philippe keep the space after +

<Thomas> sorry; it works

<Thomas> (still getting used to IRC)

<SimonCox> IMO we should be quite generous in accepting use-cases, since these exemplify concerns in the community. The more challenging part is distilling the _requirements_ and consolidating these where they overlap or duplicate. The requirements will drive the design of the products.

<antoine> sorry I've abstained only because I've missed the explanation of how annette_g_ wanted to make the UC more concrete.

the use case ID36 https://www.w3.org/2017/dxwg/wiki/Use_Case_Working_Space#ID36

Makx: Cross-vocabulary relationships is about the need there might be in DCAT concerning those other types of datasets

<riccardoAlbertoni> +1 to Makx ( probably is just a matter of providing some examples..)

<Keith> agree with Simon, accept all use cases and get on with the work of distilling requirements

<danbri> [q: couldn't I distribute my qb:DataSet in either Turtle or RDF/XML syntaxes, each being a Distribution?]

Jaroslav_Pullmann: I can refer to the wikipage
... Makx is right. Some, such as schema.org, consider the data to be abstract

roba: I think it is an important use case
... it is not just a distribution
... we should just double-check that we don't create a situation that DCAT can't describe

<AndreaPerego> +1 to roba

Makx: it is a little bit more complicated than that
... if you have a dataset as a datacube
... the concept is almost the same, but now you have two implementations
... one part would be what DCAT calls a dataset

roba: I was saying that description can be a distribution
... just so we don't get confused when describing data

<Zakim> danbri, you wanted to mention CSVW too

danbri: it is a very important problem

<Keith> dataset/distribution: the problem is DCAT does not use the concepts conceptual, logical, physical - this would help

danbri: we have the choice of going for very specific things
... it seems that we have agreed with every domain
... we have to be pragmatic
... if we are describing as a distribution then describe it as a distribution
... there is no right answer, but having concrete use cases might help

Jaroslav_Pullmann: if this would modify the DCAT standard
... concepts of what this dataset is
... if we agree that the dataset is abstract
... with this notion in mind we should compare with other standards
... these are the differences
... comparing to schema

<danbri> [Dublin Core is scruffy and pragmatic where https://en.wikipedia.org/wiki/Functional_Requirements_for_Bibliographic_Records#FRBR_entities is overly prescriptive; even scoped to libraries, having 4 mutually exclusive types has been hard. It feels like there's a lesson for describing data here.]

kcoyle: is this a use case we want to address?

PROPOSAL: accept the use case ID36

<Thomas> +1

<LuizBonino> +1

<AndreaPerego> +1

<Philippe> +1

<Makx> +1 of course

<Jaroslav_Pullmann> +1

<newton> +1

<alejandra> +1

<PWinstanley> +1

<kcoyle> +1

<LarsG> +1

<Ine_> +1

<danbri> +1

<riccardoAlbertoni> +1

<antoine> +1

<DaveBrowning> +1

<SimonCox> +1

<annette_g_> +1

<dsr> +1

<Keith> +1

<danbri> +2

RESOLUTION: accept the use case ID36

<scribe> scribe: DaveBrowning

<SimonCox> I vote to accept all use cases. But then we will need to distill, and collate, the *requirements* implied by the use cases.

<dsr> scribenick: dsr

<roba> +1

<scribe> scribe: Dave_Raggett

DCAT data elements

We start with ID9, see https://www.w3.org/2017/dxwg/wiki/Use_Case_Working_Space#ID9

which talks about Common requirements for scientific data

AndreaPerego: this is a use case based upon experience at JRC

We need to verify requirements for multidisciplinary scientific data

we want to be able to describe the context, including authors, lineage, usage, links to publications about the dataset and links to input data

we should start with a link to the context, and later work on what we can describe in the context

PWinstanley: I would be very hesitant to distinguish scientific in the requirements, although it's fine as a use case

<danbri> +1 to Peter's concern about distinguishing "science" from non

<alejandra> +q

Keith: I would like to go further with a complex set of role bound properties

We need this additional layer if intelligent software is to make use of it effectively

Annette will extend the use case

<Keith> Keith will generate an extended use case referencing ID9 emphasising relationships of dataset to many other entities with role and temporal limits

Jaroslav: for scientific datasets, there will be an appropriate set of metadata

<scribe> ACTION: Keith to generate an extended use case referencing ID9 emphasising relationships of dataset to many other entities with role and temporal limits [recorded in http://www.w3.org/2017/07/17-dxwg-minutes.html#action01]

<trackbot> Error finding 'Keith'. You can review and register nicknames at <https://www.w3.org/2017/dxwg/track/users>.

Thomas: we don’t want to scare people off with long lists of metadata which may be optional

<Zakim> AndreaPerego, you wanted to comment on the use of "scientific" in the use case

<AndreaPerego> Data lineage: https://www.w3.org/2017/dxwg/wiki/Use_Case_Working_Space#Modeling_data_lineage

<Caroline_> ACTION: annette_g_ to make the UC ID8 more concrete [recorded in http://www.w3.org/2017/07/17-dxwg-minutes.html#action02]

<trackbot> Error finding 'annette_g_'. You can review and register nicknames at <https://www.w3.org/2017/dxwg/track/users>.

<danbri> [what is the unique content of this usecase, beyond those it covers? e.g. I just noticed data citation is also in UC10 also from AndreaPerego]

<annette_g_> S/annette_g_/annette_g/

AndreaPerego: you put a link to the original dataset, but it could also be interesting to describe the processing involved in the lineage

<Caroline_> ACTION: annette_g to make the UC ID8 more concrete [recorded in http://www.w3.org/2017/07/17-dxwg-minutes.html#action03]

<trackbot> Created ACTION-14 - Make the uc id8 more concrete [on Annette Greiner - due 2017-07-24].

It would be very useful to have two levels

<LarsG> [Do use cases need to be unique? I thought we just decided that they are only there to be hooks for unique _requirements_]

AndreaPerego: it is useful to have a link to a specific community where the metadata is relevant

<Zakim> danbri, you wanted to note that data citation metadata is critical for data(set) discovery (+maybe "scholarly" can substitute for "science" in some places?)

danbri: I wanted to speak up for search indexing, we love text rather than numbers

<danbri> AndreaPerego, is there anything unique in UC9 not in your other related UCs?

LuizBonino talks about different roles of authors

<AndreaPerego> danbri, it's more a "meta" use case, giving the general context

<SimonCox> +1 to danbri "what is the extra requirement from this use case" - we should spend our time on extracting requirements. There will be a lot of overlaps, but I'm not sure that chugging through votes on each uc is good use of time?

Karen: can we vote on ID9, and how it relates to profiles

<danbri> +0 then (it seems a useful aggregation of the others, but if it has no unique content, seems an administrative/editorial matter)

<kcoyle> PROPOSAL: accept id9, and consider this also when we discuss profiles

<danbri> +1

<antoine> +1

<Jaroslav_Pullmann> +1

<SimonCox> +1

<alejandra> +1

<LuizBonino> +1

<Ine_> +1

<riccardoAlbertoni> +1

<newton> +1

<Philippe> +1

<Caroline_> +1

<Keith> +1 accept all use cases and get on with requirements

<LarsG> +1

<annette_g_> +1

<PWinstanley> +1 but with the caveat that we remove 'scientific' from it

<Thomas> +1 and +2 to Keith

<kcoyle> +1

<DaveBrowning> +1

<Makx> +1

RESOLUTION: accept id9, and consider this also when we discuss profiles

<kcoyle> https://www.w3.org/2017/dxwg/wiki/Use_Case_Working_Space#ID10

We look at ID10 Requirements for data citation

Karen invites Andrea to introduce ID10

AndreaPerego: this is about being able to cite bibliographic information and to associate related resources with persistent identifiers

<danbri> "Being able to specify the basic mandatory information for data citation" suggests a relation to using SHACL/SHEX or similar c.f. https://www.w3.org/2017/dxwg/wiki/Use_Case_Working_Space#ID41. I don't see the word mandatory in https://www.w3.org/TR/vocab-dcat/

DCAT doesn’t yet provide the means to distinguish these identifiers sufficiently

Karen summarises the wording in the DXWG charter. We want to describe what is meant by an application profile, but not to define domain specific vocabularies for profiles

AndreaPerego: the use case is not about that as such, but rather about enabling citations

so ID10 is about DCAT rather than app profiles

<alejandra> +q

Thomas: the title of this use case is a little misleading

<danbri> [i.e. If a DCAT-based description is going to be useful for data citation, we'll need to at least show how it would be modeled i.e. in terms of vocabulary not mandatory-ness. Someone else's problem to represent that profile using shex/shacl/etc.]

Keith: one of the big things with citation is being able to reference a specific version and section of a data set, and this is best handled in terms of a query expression
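
A minimal sketch of a citation that pins a specific version and a section of a data set through a query expression, as described above. The class name, field names, identifier and query syntax are all hypothetical; DCAT defines none of them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataCitation:
    identifier: str               # persistent identifier of the data set
    version: str                  # the version that was actually used
    query: Optional[str] = None   # expression selecting the cited section

    def as_text(self) -> str:
        """Render the citation as a single human-readable string."""
        parts = [self.identifier, f"version {self.version}"]
        if self.query:
            parts.append(f"subset: {self.query}")
        return ", ".join(parts)


# Hypothetical identifier and query, for illustration only.
whole = DataCitation("https://example.org/id/rainfall", "2.1")
subset = DataCitation(
    "https://example.org/id/rainfall", "2.1", "year=2016 AND station=OX"
)
```

Two citations of the same data set and version compare as different when their queries differ, so a cited section stays distinguishable from the whole.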

LarsG asks for clarification about the target of the use case

AndreaPerego: this is about DCAT

alejandra: I think this is an important use case for DCAT, and we need to clarify the requirements as it overlaps with ID9

Jaroslav: we need to consider the query parameters for referencing the distribution

Keith: this can get really complicated with some data stores

LarsG: I am still not sure if this is about DCAT or DCAT-AP

are we here to extend DCAT or to support some form of profile of DCAT usage

<alejandra> +q

Jaroslav: we seem to be missing a use case on data identification

<AndreaPerego> I wonder whether "data identification" could not be too abstract. I see it more as a requirement.

<alejandra> isn't that kind of described in ID11?

<scribe> ACTION: Jaroslav_Pullmann to work with Keith on a use case on data identification [recorded in http://www.w3.org/2017/07/17-dxwg-minutes.html#action04]

<trackbot> Created ACTION-15 - Work with keith on a use case on data identification [on Jaroslav Pullmann - due 2017-07-24].

<kcoyle> PROPOSAL: accept ID10

<annette_g_> +1

<alejandra> +1

<riccardoAlbertoni> +1

<Caroline_> +1

<LuizBonino> +1

<Keith> +1

<Philippe> +1

<Ine_> +1

<Jaroslav_Pullmann> +1

<newton> +1

<Thomas> Meaning some core citation info is essential for DCAT VOC

<Thomas> +1

<antoine> +1

<PWinstanley> +1

<DaveBrowning> +1

annette_g: every scientific domain has its own list of metadata for its data sets

what is the level that a profile sits at?

<Makx> +1 to Karen

Karen: we will define how to express a dataset profile, but we won’t work on specific profiles which will be left to the relevant communities

annette_g: we do need to provide guidance to communities as to what we’re expecting them to do

Thomas talks about how to define profiles

how to provide a consistent set of extensions

Karen: we can look at how Dublin Core tackled this

<Makx> +1

<AndreaPerego> +1

<annette_g_> +1

RESOLUTION: accept ID10

<SimonCox> +1 to accept all use cases ...

<Makx> +1

<Thomas> +1

<Jaroslav_Pullmann> +1

<newton> +1

<antoine> +1

<LarsG> +1

<Ine_> +1

<Keith> +1 accept use cases and get to requirements

<LuizBonino> +1

We move onto ID11 https://www.w3.org/2017/dxwg/wiki/Use_Case_Working_Space#ID11 Modeling identifiers and making them actionable

<danbri> +1

<Thomas> +1

Karen: this is similar to others, can we just vote on accepting it?

Keith: many identifiers are role based, and we need to be general in supporting them

Karen: you need to say what kind of identifier it is to enable search

<alejandra_> it also says alternative identifiers

Annette_G: we need to avoid limiting people to a single de-referenceable link

LuizBonino: people need to state their identifier and the schema it belongs to
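
A minimal sketch of the points made above: each identifier carries the scheme it belongs to, a data set may have several alternative identifiers, and only some of them can be turned into a link. The class, the `RESOLVERS` table, the sample identifiers and the `local-catalogue` scheme are assumptions for illustration; the only real service referred to is the DOI resolver at https://doi.org/.

```python
from dataclasses import dataclass
from typing import Optional

# Resolver prefixes for schemes that can be made actionable.
# Only the DOI entry points at a real service; extend as needed.
RESOLVERS = {"doi": "https://doi.org/"}


@dataclass(frozen=True)
class Identifier:
    value: str
    scheme: str  # stated explicitly so that search can distinguish kinds

    def actionable_url(self) -> Optional[str]:
        """Return a dereferenceable link, or None if the scheme has no resolver."""
        prefix = RESOLVERS.get(self.scheme.lower())
        return prefix + self.value if prefix else None


# Alternative identifiers for one data set; both values are made up.
identifiers = [
    Identifier("10.1234/example", "DOI"),
    Identifier("ABC-000123", "local-catalogue"),
]
links = [i.actionable_url() for i in identifiers]
```

Keeping a list of identifiers rather than a single link means an identifier without a resolver is still recorded alongside the actionable ones.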

<kcoyle> PROPOSED: Accept ID11

<Thomas> +1

<annette_g_> +1

<LarsG> +1

<Philippe> +1

<Caroline_> +1

<LuizBonino> +1

<riccardoAlbertoni> +1

<SimonCox> +1

<Keith> +1

<PWinstanley> +1

<alejandra_> +1

<newton> +1

<Makx> +1

<kcoyle> +1

<Ine_> +1

<antoine> +1

RESOLUTION: Accept ID11

<danbri> +1

<Jaroslav_Pullmann> +1

<DaveBrowning> +1

<Makx> the question should be: is the UC clear?

<danbri> PROPOSAL: We accept all use cases.

<SimonCox> +1

<LuizBonino> +1

<Keith> +1

<Makx> accept them unless someone objects

<danbri> No objections so far

<Makx> +1

<AndreaPerego> +1

danbri proposes we accept all of the use cases so we can discuss requirements

<Zakim> danbri, you wanted to propose we accept all usecases

<antoine> -1

<DaveBrowning> +1

<danbri> antoine, can you repeat?

<SimonCox> notuc = no objection to unanimous consent

<antoine> I am co-chairing a group where people actually submitted UCs that were out of scope

<antoine> so I have to flag this

<antoine> That said, I will not strongly object if the WG here decides to just move on!

<antoine> Maybe this is a better group :-)

Karen: we want to decide whether the use cases are in or out of scope

danbri: does anyone have ideas as to which use cases should be out of scope?

ID9 needs some rewriting to avoid being specific to scientific datasets

Out of scope, means that we won’t address the use case in DCAT

<riccardoAlbertoni> +1 to AndreaPerego

AndreaPerego: it is better to be concrete in the use cases and then generalise the requirements

Karen agrees

Do we want to go through each of the use cases to resolve whether they are in or out of scope?

We won’t be able to get a full list of requirements from the use cases today.

<Makx> what time come back?

Our purpose today is to determine what is the scope of DCAT 1.1

<danbri> [Maybe we can get away with a 'bulk' resolution that we believe all UCs submitted are in-scope to *consider* as reasonable asks of DCAT 1.1]

scribe: we break for 15 minutes …

<Caroline_> in 15min we will come back

<SimonCox> I won't come back. Getting late here and I'm still nursing pneumonia

<Caroline_> hope you get better soon SimonCox

<danbri> goodnight, SimonCox!

RESOLUTION: SimonCox to get well soon

<SimonCox> :-)

<Thomas> on a pause here, makx/andrea

<Thomas> starting within a few minutes

<Thomas> (I think)

<Makx> My apologies, will have to disconnect at 13:00 my time/noon Oxford for another call.

<Makx> Hope we'll get through Quality by the top of the hour

<Thomas> scribe Thomas

<scribe> scribenick: Thomas

kcoyle: next two use cases - similar to identifiers

id28 & id29 parallel

are these out of scope?

AndreaPerego: relationship id28-29

both about spatial aspect of data

not related otherwise

28 is about specifying reference systems that the data use (coordinate systems)

29 is about the spatial coverage

29 is more literal by nature

<kcoyle> https://www.w3.org/2017/dxwg/wiki/Use_Case_Working_Space#ID28

<kcoyle> https://www.w3.org/2017/dxwg/wiki/Use_Case_Working_Space#ID38

kcoyle: talking about 28 and 38

29 later

<roba> these are both cases of ID26

looking at id28 and id38 now

<Caroline_> https://www.w3.org/2017/dxwg/wiki/Use_Case_Working_Space#ID28

<Caroline_> https://www.w3.org/2017/dxwg/wiki/Use_Case_Working_Space#ID38

any objections against 28?

<roba> 28 and 27 are basically the same pattern

<danbri> Noting that https://www.w3.org/TR/owl-time/ is in Candidate Recommendation review re time-related aspects in ID38

<LuizBonino> Again, it seems that both 28 and 38 are suitable for extensions/profiles

<alejandra> can we update the agenda to have the correct name of ID28?

roba: 27 and 28 are alike and concern the modelling aspects of semantics

general set of requirements will have to be extracted from there

and then discussed

<AndreaPerego> I would say UC27 (temporal coverage) is related to UC29 (spatial coverage), not UC28 (reference systems).

a lot of people are using these things differently

<antoine> AndreaPerego++

kcoyle: grandfather in related aspects to 26

Jaroslav_Pullmann: we're going to look into requirements - we have to ask whether the requirement in UC28 is enough

kcoyle: are there other requirements - they can be added to the UC

PWinstanley: spatial and temporal - have a 'scaling'-property

Maybe some 'superclass'-object for scaling?

Describe a reference system where the scaling info originates from

some ontology etc

<roba> UC1 is such an even more general UC in https://www.w3.org/2017/dxwg/wiki/Use_Cases_and_Requirements

dsr: what are we up to here?

describing conventions or validating consistency to a reference system

PWinstanley: do we need a place for this in DCAT? We shouldn't miss out on flexibility
... keep a future-proof architecture

dsr: do we need to check on integrity?

PWinstanley: yes, but we need a chunk to make that possible

kcoyle: can we make a use case for this?

PWinstanley: ACTION: Peter W will do a use case for this

<kcoyle> https://www.w3.org/2017/dxwg/wiki/Use_Cases_and_Requirements#UC1_Consistent_use_of_summary_properties_and_extension_points_with_more_detailed_domain_specific_information_models

roba: general use case-attempt for that one

feel free to edit this one

roba: job of just describing a reference is really not easy

a simple property defining the reference is just not enough

especially within spatial world

look at the spatial data on the web WG for that

not overspecify within DCAT

Keith: don't forget astronomical and microscopical coordinate systems

AndreaPerego: in the UC there is a reference to the SDW-WG

reviewing the work from there and following the best practices might be an option

<Zakim> AndreaPerego, you wanted to point also to UC14 and ID16

AndreaPerego: related to UC14 and UC 16

kcoyle: let's stick with the spatial and temporal for the time being

objections for having these?

<kcoyle> PROPOSED: Accept ID28 and ID38

<AndreaPerego> Relevant SDWBP, linked from UC28: https://www.w3.org/TR/sdw-bp/#bp-crs

<annette_g> +1

<antoine> +1

<alejandra> +1

<LarsG> +1

<Caroline_> +1

<riccardoAlbertoni> +1

<Jaroslav_Pullmann> +1

<Philippe> +1

<DaveBrowning> +1

<roba> +1

<dsr> +1

<Ine_> +1

<kcoyle> +1

<Keith> +1

<AndreaPerego> +1

<PWinstanley> +1

<danbri> +1

RESOLUTION: Accept ID28 and ID38

<kcoyle> https://www.w3.org/2017/dxwg/wiki/Use_Case_Working_Space#ID14

kcoyle: data quality; UC14-15

<kcoyle> https://www.w3.org/2017/dxwg/wiki/Use_Case_Working_Space#ID15

<alejandra> ID14 is related to ID43

<dsr> I note that XBRL supports hypercubes as an abstract coordinate space for financial reporting data

kcoyle: 14 about data quality and 15 about precision and accuracy

<AndreaPerego> alejandra, yep, it is.

PWinstanley: 15 is a subset of 14

And constraints on usability come into that

PWinstanley: What are the things that restrict the re-use of the data?

Partly the data-quality and partly the collection-process

Maybe we should use another pluggable container, like the one for reference systems

PWinstanley: we shouldn't focus on the things that we directly see at hand

<Zakim> danbri, you wanted to ask what "Provide patterns for " means here; e.g. is it showing some examples using other vocabs?

all the time

danbri: should we provide examples of other vocabularies

AndreaPerego: what is missing was a recommendation to follow

<danbri> Thomas: what I saw in these 2 use cases is another case of using reference systems. We shouldn't make lots and lots of reference system classes, but have some common modelling structure.

<danbri> ... we shouldn't go further than that in defining DCAT. Anything going deeper is up to the profiles.

Jaroslav_Pullmann: from a commercial PoV, in order to express quality of service levels, we need that information also
... they ought to be valid use cases for us

<dsr> Can we enable profiles with annotations covering:

<dsr> * Where the estimates of precision/accuracy come from

<dsr> * When a data point has been interpolated (e.g. lost data, broken sensor)

<dsr> * When a sensor is no longer trusted, despite what it says
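
A minimal sketch of the three kinds of annotation listed above, attached to individual data points. The class and field names, the sample values and the filtering rule are assumptions for illustration, not a proposed DCAT or profile model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reading:
    value: float
    accuracy_source: Optional[str] = None  # where the accuracy estimate comes from
    interpolated: bool = False             # filled in, e.g. after lost data
    sensor_trusted: bool = True            # False once the sensor is distrusted


def usable(readings):
    """Keep only measured values from sensors that are still trusted."""
    return [r.value for r in readings if r.sensor_trusted and not r.interpolated]


# Hypothetical series: one plain reading, one interpolated, one distrusted.
series = [
    Reading(20.1, accuracy_source="manufacturer datasheet"),
    Reading(20.4, interpolated=True),
    Reading(55.0, sensor_trusted=False),
]
```

A consumer could then decide per use whether interpolated or distrusted points are acceptable, instead of having them silently mixed in with measured ones.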

<danbri> [ AndreaPerego - do you think the topic of modeling caveats would fit in these UCs? https://lists.w3.org/Archives/Public/public-dxwg-wg/2017Jul/0041.html]

<AndreaPerego> [ danbri, possibly, but need to look at it more closely ]

dsr: a question is how to enable profiles to allow the reason/connotation on data quality issues

<kcoyle> PROPOSED: Accept ID14 and ID15

broken sensors vs faulty data

<alejandra> ++1

<alejandra> +1

<annette_g> +1

<Caroline_> +1

<Philippe> +1

<antoine> +1

<LuizBonino> +1

<roba> +1

<dsr> +1

<riccardoAlbertoni> +1

<Ine_> +1

<Keith> +1

<LarsG> +1

<DaveBrowning> +1

<danbri> +1

<newton> +1

<PWinstanley> +1

<newton> +1

<Jaroslav_Pullmann> +1

RESOLUTION: Accept ID14 and ID15

kcoyle: go to ID12 and then have lunch
... ID12 - data lineage

https://www.w3.org/2017/dxwg/wiki/Use_Case_Working_Space#ID12

kcoyle: we probably need a place to denote the source of the data

<Keith> does data lineage = provenance?

Is PROV an option?

Jaroslav_Pullmann: provenance is important

Textual or structured?

we need a structured, machine-readable way for this

<AndreaPerego> Keith, indeed, they may be related (lineage / provenance), but it depends on the definition of provenance.

<AndreaPerego> Thomas, PROV is mentioned as an option in the UC.

alejandra: isn't this too generic?

thx, andrea

s/isn't isn't

Keith: vertical vs horizontal provenance including the relationship between those two

goes beyond PROV VOC

AndreaPerego: comment on provenance lineage

sometimes 'who did the job'?

<dsr> Keith: the ability to reconstruct the state at a specified time

you have provenance on the dataset-level and provenance on the agent roles, workload, ...

s/you AndreaPerego :/

PWinstanley: you have also instances where the data is being used etc

don't strictly belong to the provenance of the dataset

<AndreaPerego> Quoting from the DC definition of dct:provenance: "A statement of any changes in ownership and custody of the resource since its creation that are significant for its authenticity, integrity, and interpretation."

kcoyle: we don't have anything that goes beyond the strict provenance

PWinstanley: when a dataset is used in different contexts, the meaning/nature of the dataset might change but the dataset itself doesn't
... we could have an 'event' and a 'transition' (transition = change; event = not changed)

all can adhere to 'provenance'

kcoyle: are we describing another requirement

PWinstanley: want to leave that to be decided

Jaroslav_Pullmann: UC 'funding sources' is another related aspect - that UC is linked to this one

also very strongly related to versioning

<scribe> ACTION: Jaroslav_Pullmann will link these [recorded in http://www.w3.org/2017/07/17-dxwg-minutes.html#action05]

<trackbot> Created ACTION-16 - Will link these [on Jaroslav Pullmann - due 2017-07-24].

roba: what is the goal for bringing extra properties into DCAT?

<kcoyle> PROPOSED: Accept ID12

AndreaPerego: we should also take into account the purpose for which the dataset is to be used

(andrea: correct me if I'm wrong please)

Jaroslav_Pullmann: we should understand why provenance should be modeled

it isn't clear at this time

kcoyle: that's what happens when we pull the requirements out of the use cases

<AndreaPerego> Thanks, Thomas. The 2 purposes I see are: data reproducibility and fitness for purpose.

thx

<Caroline_> +1

<roba> +1

<alejandra> +1

<newton> +1

<danbri_> +1

<LuizBonino> +1

<Jaroslav_Pullmann> +1

<LarsG> +1

<PWinstanley> +1

<DaveBrowning> +1

<dsr> +1

<Philippe> +1

<Ine_> +1

<annette_g> +1

<Keith> +1

RESOLUTION: Accept ID12

kcoyle: declares lunch

<antoine> +1

resume in one hour

<AndreaPerego> +1

<roba> bye

<AndreaPerego> Bye.

<roba> have other commitments tomorrow eve - so will be joining later in your morning for a bit.

<Makx> Where are we on the agenda?

<Makx> OK so I just missed the whole item on Quality. Pity.

<Caroline_> chair: Caroline_

<Caroline_> scribenick: alejandra

data quality

<Caroline_> Use Case https://www.w3.org/2017/dxwg/wiki/Use_Case_Working_Space#ID16

ID16: https://www.w3.org/2017/dxwg/wiki/Use_Case_Working_Space#ID16

is it a duplicate of ID43?

<dsr> (we break for lunch)

antoine: wondering if Andrea was joking, there was a lot of discussion about this use case
... in the DWBP WG
... to test whether a dataset complies with a given standard, one wants to record this
... this was a use case in the data quality vocabulary
... we ended up not being able to implement what Andrea wanted
... should I reformulate the discussion or confirm that the use case is still relevant?

Caroline_: do people think it is out of scope or relevant?

Jaroslav_Pullmann: quality is hard to express or assess without reference to evaluation criteria

kcoyle: same as ID14

LarsG: same question about providing a hook in DCAT core to provide these things?
... or is it outside of DCAT core?

LuizBonino: my understanding is about compliance with something (standard, quality parameter, etc)
... then there will be a validator to check the compliance
... compliance is context dependent

<antoine> For the record here is the part about it in DQV, with a note about our resolution in DWBP: https://www.w3.org/TR/vocab-dqv/#ExpressConformanceWithStandard

annette_g: there is a lot of overlap with the data quality vocabulary

<AndreaPerego> Thanks, Antoine.

<Zakim> danbri, you wanted to note that lots of UCs have this structure - a reasonable usecase that may likely be beyond core and addressed by dcat + another vocab. Will we make a

annette_g: maybe we need to discuss how to address that with DCAT and at this point, not how to deal with it

danbri: a lot of the UCs indicate that we can grow the DCAT core very quickly
... should we collect a list of useful extras?

<Makx> +q

<Keith> can we get requirements from UCs and then decide what is core and what not?

<Makx> -q

kcoyle: we need a document advising about what fits in

<danbri> alejandra, ... sorry I meant to say the opposite. Rather that we keep getting UCs where we could make a small change to the core but not address the full use case.

sorry!

<antoine> Also for the record, I could dig up the issue about Andrea's suggestions: https://www.w3.org/2013/dwbp/track/issues/202

<Zakim> AndreaPerego, you wanted to explain about the specificity of this UC

<dsr> The charter makes provision for work on a primer

<danbri> ... and that we don't have a repeatable answer, such as "we'll add this to our 'useful multi-vocabularies cookbook' page/document"

<PWinstanley> https://lists.w3.org/Archives/Public/public.../att.../DCAT-APimplementationguide.pdf

AndreaPerego: this UC is not included in data quality 1 because we came across it when using DCAT for spatial metadata
... you should be able to express conformity and non-conformity
... important for discovery purposes
... what are the data that needs to be modified to be conformant
... general UC wasn't explaining these specific issues

<PWinstanley> https://lists.w3.org/Archives/Public/public-dwbp-wg/2015Jul/att-0010/DCAT-APimplementationguide.pdf

AndreaPerego: we identified this in the implementation of DCAT
... in some cases, we found a solution, not in others
... 90% of these use cases are meta-UCs
... use of DCAT for supporting cross-domain interoperability
... there is always a reference to other standards
... we want to support interoperability across metadata standards
... we had to address this problem on how data standards are modeling things

LuizBonino: the majority of the use cases we discussed seem suitable for extensions around datasets
... other parts may interfere with the structure of DCAT as it is now
... if you have an approach for versioning, you have a version and a distribution, the distribution should not be attached to the dataset anymore but to the version

<Makx> @PWinstanley https://joinup.ec.europa.eu/asset/dcat-ap_implementation_guidelines/description are based on actual problems brought forward by implementers.

LuizBonino: we need to define the profile description method to define how people are going to use this

kcoyle: I edited some UCs related to this
... e.g. in ID42
... this is about the dataset itself and not about the data itself
... I don't know if it needs to be brought to the level of DCAT

LuizBonino: we have the dataset and the distribution, and we have the metadata about the semantics
... each distribution matches to the generic concept
... the constraints on what you have to provide are what I would consider a profile

kcoyle: a picture would be good

PROPOSAL: accept ID16

<annette_g> +1

<riccardoAlbertoni> +1

<antoine> +1

<newton> +1

<Caroline_> +1

<Ine_> +1

<Makx> +1

<Jaroslav_Pullmann> +1

<PWinstanley> +1

<danbri> +1

<LarsG> +1

<Caroline_> +1

<Keith> +1

<DaveBrowning> +1

<dsr> +1 - yes to in scope as a use case

<antoine> I agree it is very similar to 43

is this overlapping with ID43?

<Makx> @antoine French headphone level limit?

<annette_g> Overlap is okay, no?

<AndreaPerego> Yes

<Thomas> +1

LuizBonino: explaining diagram

diagram here also useful: https://www.w3.org/TR/hcls-dataset/

<danbri> see https://twitter.com/danbri/status/886933651178573824

<Makx> can't you share a screen on Webex?

LuizBonino: dataset can be extended with any profile
... distinction between dataset, version/release, distribution
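The distinction LuizBonino describes can be pictured with a minimal Python sketch. The class names, field names and example.org URLs below are hypothetical and chosen only for illustration; they are not DCAT terms and not a model the group has adopted. The point shown is that distributions hang off a release rather than off the dataset itself.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Distribution:
    # A concrete, downloadable form of one release (hypothetical fields).
    media_type: str
    download_url: str


@dataclass
class Release:
    # A version/release of a dataset; distributions are attached here.
    version: str
    distributions: List[Distribution] = field(default_factory=list)


@dataclass
class Dataset:
    # The abstract dataset only knows its releases, not files directly.
    title: str
    releases: List[Release] = field(default_factory=list)

    def distributions_for(self, version: str) -> List[Distribution]:
        """Return the distributions attached to the given release."""
        for release in self.releases:
            if release.version == version:
                return release.distributions
        return []


air = Dataset(
    title="Air quality",
    releases=[
        Release("1.0", [Distribution("text/csv", "https://example.org/air-1.0.csv")]),
        Release("2.0", [Distribution("text/csv", "https://example.org/air-2.0.csv")]),
    ],
)
```

In DCAT 1.0 the dcat:distribution property links a dcat:Dataset directly to a dcat:Distribution, so inserting an intermediate release level as sketched here would change that link.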

Makx: this is a substantial model change to DCAT

kcoyle: this is LuizBonino's current model
... it doesn't mean that we will follow this model

Makx: we need to be very careful in making substantial model changes
... as it can break current implementations

<AndreaPerego> +1 to Makx

RESOLUTION: accepted ID 16

<antoine> +1

Caroline_: now discussing ID23: https://www.w3.org/2017/dxwg/wiki/Use_Case_Working_Space#ID23

riccardoAlbertoni: I'll give some context to this UC that we collected from the DQV
... Antoine, other people and I contributed
... meta-use case, data quality is very important for any reuse of data
... data collected in the past is considered also in this group
... it seems that there is some overlap in the use cases that Andrea proposed
... even though from a different perspective
... some concrete case studies, how to identify integrity constraints (e.g.)
... depending on how far we want to go in data quality within DCAT, there is some DQV housekeeping
... DQV was released last December
... it would be great for us to have the possibility to make small changes

antoine: general question on what should be the position of the DQV in terms of the core and profiles for DCAT

riccardoAlbertoni: it is quite difficult to define how far we have to go in data quality
... this discussion should consider the UCs presented by AndreaPerego

Makx: two comments on DQV
... it makes sense to use UCs as a good point to see how DQV can be attached to DCAT
... the one that danbri came up with in the last couple of days w.r.t caveats on statistical data

<danbri> Caveats discussion: https://lists.w3.org/Archives/Public/public-dxwg-wg/2017Jul/0041.html (caveat/footnotes even at data item level)

Makx: I've got a use case I forgot to put in

<danbri> statDCAT AP, https://joinup.ec.europa.eu/node/147940

Makx: people have included annotations of DQV to datasets
... I will write that UC
... and danbri can write the other UC

Caroline_: yes, please write more use cases

<AndreaPerego> Just to note that another case for the use of DQV in DCAT is UC15, where DQV is indicated in the existing approaches, and mentioned also in SDWBP: https://www.w3.org/2017/dxwg/wiki/Use_Case_Working_Space#ID15

<danbri> ACTION: danbri write UC for data-item level caveat annotations [recorded in http://www.w3.org/2017/07/17-dxwg-minutes.html#action06]

<trackbot> Created ACTION-17 - Write uc for data-item level caveat annotations [on Dan Brickley - due 2017-07-24].

PROPOSAL: accept ID23 (https://www.w3.org/2017/dxwg/wiki/Use_Case_Working_Space#ID23)

<Thomas> +1

<AndreaPerego> +1

<danbri> +1

<LuizBonino> +1

<newton> +1

<DaveBrowning> +1

<annette_g> +1

<PWinstanley> +1

<Makx> will you give me an action for DQV annotation?

<antoine> +1

<Caroline_> +1

<Ine_> +1

kcoyle: there will be a need to tease out lots of requirements

<dsr> +1

<LarsG> +1

<Keith> +1

<riccardoAlbertoni> +1

<scribe> ACTION: Makx to create a UC for DQV annotation [recorded in http://www.w3.org/2017/07/17-dxwg-minutes.html#action07]

<trackbot> Created ACTION-18 - Create a uc for dqv annotation [on Makx Dekkers - due 2017-07-24].

RESOLUTION: accepted ID23

Next UC ID19: https://www.w3.org/2017/dxwg/wiki/Use_Case_Working_Space#ID19

AndreaPerego: this is a general or meta UC
... modelling different types of information
... e.g. input data
... property linking a dataset with the publisher and the author
... you may need to attach to these relationships some additional information
... such as the temporal context
... general use case where we need some guidance on how to provide this information
... it can be applied for any type of information to be attached to a relationship

kcoyle: in our UCs we have mixed up UCs about the dataset and the data in the dataset
... we need to tease those apart
... as we may want to address them differently
... there may be statements that we may want to make about the data / data semantics
... we haven't made that distinction in the discussion

AndreaPerego: when we talk about dataset and when we talk about the data itself
... do we have this in DCAT already? CatalogRecord and Dataset

kcoyle: ... are we ok in mixing those or do we need to keep them separately?
... if we need to say something about quality, we need to say quality about what

AndreaPerego: we had these discussions in the DQV and we concluded that in most cases we are talking about data
... we can use the same approach in data or metadata
... this is a topic that may need further discussion

Jaroslav_Pullmann: I didn't make this difference
... dataset is about whatever data is behind
... I think this is related to what Rob was proposing
... atomic properties and specific descriptions
... I think this is a general approach of modelling
... atomic properties and complex descriptions, which the UC says qualified descriptions
... what does "qualified form" mean?

AndreaPerego: this is from PROV-O
... different ways of representing the same information: the core, the extended, and the qualified
... reified representation of a relationship
... where you can attach additional information to a relationship

<AndreaPerego> PROV properties with qualified forms: https://www.w3.org/TR/prov-o/#inverse-names-table

AndreaPerego: e.g. prov:qualifiedAssociation
... I don't know if there is a better term, but the idea is to have another relationship to add more information
... bridge between the dataset and the source data
... where you can attach more information on when the data was processed and so on
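The difference between a simple and a qualified relationship can be sketched as follows. This is only an illustration: triples are plain Python tuples rather than a real RDF graph, all ex: identifiers are made up, and the use of dct:date on the intermediate node is an assumption. It uses prov:qualifiedDerivation (the PROV-O qualified form of prov:wasDerivedFrom) because the example links a dataset to its source data.

```python
# Triples are plain (subject, predicate, object) tuples; no RDF library is used.
# All ex: identifiers are hypothetical.

# Simple (unqualified) form: one direct link, nowhere to hang extra detail.
simple = [
    ("ex:dataset", "prov:wasDerivedFrom", "ex:sourceData"),
]

# Qualified form: an intermediate node stands for the relationship itself,
# so extra statements (here a processing date, an assumed property choice)
# can be attached to it.
qualified = [
    ("ex:dataset", "prov:qualifiedDerivation", "ex:derivation1"),
    ("ex:derivation1", "rdf:type", "prov:Derivation"),
    ("ex:derivation1", "prov:entity", "ex:sourceData"),
    ("ex:derivation1", "dct:date", "2017-07-17"),
]


def details_of(node, triples):
    """Collect every property/value pair whose subject is the given node."""
    return {p: o for s, p, o in triples if s == node}


info = details_of("ex:derivation1", qualified)
```

Here `info` holds both the source data and the date, whereas the simple form has no node to which a date for the derivation could be attached.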

Jaroslav_Pullmann: simple atomic properties and qualified properties
... is there a concrete suggestion on when this pattern applies?
... e.g. for quality, accuracy
... how would you restrict the application of this pattern?

AndreaPerego: personally I would stick to what we define as concrete requirements
... I would rely on what the community used in DCAT to identify what is relevant
... the point is to have concrete use cases where people want to specify concrete information and either they can't do it or they do it differently every time

Jaroslav_Pullmann: would this be an extensibility pattern for DCAT?

AndreaPerego: yes, I think so - but you cannot be sure if the proposal is universally applicable
... unless you collect use cases

PROPOSAL: accept ID19 as relevant use case https://www.w3.org/2017/dxwg/wiki/Use_Case_Working_Space#ID19

<riccardoAlbertoni> +1

<Caroline_> +1

<Philippe> +1

<danbri> +1

<Thomas> +1

<Ine_> +1

<Keith> +1

<dsr> +1

<Jaroslav_Pullmann> +1

<DaveBrowning> +1

<newton> +1

<annette_g> +1

<LarsG> +1

<PWinstanley> +1

<LuizBonino> +1

RESOLUTION: ID19 accepted

<antoine> +1

DCAT general

Next UC: https://www.w3.org/2017/dxwg/wiki/Use_Case_Working_Space#ID26

ID26

Jaroslav_Pullmann: this is a meta UC
... I consider DCAT as a core where one would attach properties with specific standards/vocabularies

<Thomas> UTC :-)

Jaroslav_Pullmann: identify what are the specific aspects that need an extension
... and identify a property for each of them

kcoyle: are you anticipating that there will be particular vocabularies that DCAT needs to consider?
... how open should we be?
... should we decide for each element whether there are recommended vocabularies?

<Makx> +q

Jaroslav_Pullmann: what should happen is an analysis of what is there
... and have simple properties for describing simple stuff
... and attach further descriptions

<Zakim> danbri, you wanted to suggest UC bakes in a specific kind of solution

danbri: I like the general intent
... the current formulation of the UC assumes a specific technical approach
... LuizBonino's diagram includes a release structure
... Makx pointed out that we should be careful about changing the structure
... maybe it would be good to rephrase not to consider specific implementation
... we shouldn't assume that a specific property is the solution

Makx: I have difficulties understanding how this would work
... danbri indicates that we shouldn't mention properties, but in an RDF world we need to speak about properties
... these 6 bullet points seem to imply that there are separate properties for separate extensions
... but we already have a potential provenance one
... we already have specific ones for temporal and spatial coverage

<danbri> Makx, my point was that in some extreme/complex important cases the WG might actually restructure DCAT's overall pattern with new types (e.g. Release/Version as in Luiz's diagram)

Makx: I'm not quite sure on what the proposal is
... if a catch-all approach
... with loose semantics
... or identifying what extensions are needed
... I prefer the latter

LarsG: I'm pro having extension points in DCAT, I don't think we should mandate which vocabularies to use

<Makx> +q

LarsG: here you can put provenance, you may want to use PROV
... but we shouldn't say 'you must use PROV', as this is getting into the area of profiles

Jaroslav_Pullmann: benefit of this use case is to identify the important aspects that should not be forgotten
... there should be a dedicated property that relates to whatever specification of this aspect

Makx: I wanted to react to what LarsG was saying
... we are absolutely not in a position to recommend vocabularies
... we can only provide a property where people can put whatever they think is relevant

annette_g: I agree with LarsG, it would be the role of a profile to define what extensions to use
... we can say in a profile 'we will use PROV-O'
... but not in the core

<riccardoAlbertoni> +q

kcoyle: we can provide guidance

<riccardoAlbertoni> -q

<LarsG> alejandra: for each element listed in the UC there are specific use cases

<LarsG> ... there are areas that might not be relevant for DCAT but for profiles

<LarsG> Jaroslav_Pullmann: it's about grouping the other UCs

<LarsG> ... we have extension points where we can hook that in

so, this UC is for grouping the other UCs

<Caroline_> scribenick: LarsG

Thomas: It would be useful for clients to know how to handle that

antoine: wants to continue on alejandra 's point
... UCs are not very specific, but more meta
... every property in DCAT can be seen as an extension point

Jaroslav_Pullmann: it wasn't meant to be implemented in the model

<Makx> good point Antoine

Jaroslav_Pullmann: more a hint that we need to consider this in the model
... it's a meta use case listing what I think is important

antoine: so it's more like a design principle

<Makx> +q

antoine: "if I want to extend DCAT this is what I should consider"

Jaroslav_Pullmann: if we agree on accepting this UC, Jaroslav_Pullmann would have a task
... to think about this

Makx: when we worked on the European profile this is exactly what we did (describing extension points)
... if there is a large group of people that want the same thing, we could go back to DCAT and add it
... so the six bullet points in the UC, there might be 200, but in the end we need to figure
... out which ones we want: what goes into DCAT core and what is profile

antoine: agrees with Makx, would be in favour of accepting the UC
... suggests that Jaroslav_Pullmann focuses on the meta aspect
... should phrase it as a methodological point

<Makx> +100 to antoine

antoine: "if you have own needs, we have a methodology for creating profiles"

Jaroslav_Pullmann: The UC was a first shot. Compared to the ISO standards there is an aspect
... of maintenance that isn't covered in DCAT
... DCAT needs a reference to that
... those are aspects that are usually covered by specific vocabularies

PROPOSED: Accept ID26

<Makx> +1

<antoine> +1

<annette_g> +1

<AndreaPerego> +1

<antoine> with editing!

<PWinstanley> +1

<LuizBonino> +1

<Ine_> +1

<PWinstanley> with editing

<Thomas> +1

PROPOSED: Accept ID26 with editing by Jaroslav_Pullmann

<riccardoAlbertoni> +1

<dsr> +1 subject to editing the text

<Caroline_> +1 with editing

<annette_g> W/e

<newton> +1

<DaveBrowning> +1

<danbri> Jaroslav_Pullman, suggested edit: "extension points (properties) " -> "extension points (typically properties)"

<danbri> +1

<scribe> ACTION: Jaroslav_Pullmann to edit ID26 [recorded in http://www.w3.org/2017/07/17-dxwg-minutes.html#action08]

<trackbot> Created ACTION-19 - Edit id26 [on Jaroslav Pullmann - due 2017-07-24].

[discussion about how to get from Use Cases to Requirements...]

RESOLUTION: Accept ID26 with editing by Jaroslav_Pullmann

<Caroline_> https://www.w3.org/2017/dxwg/wiki/Use_Case_Working_Space#ID33

alejandra: we need a way to provide an overview of data
... could be statistics
... and might go into a profile

<danbri> https://www.w3.org/TR/hcls-dataset/#s6_6 (mentioned by alejandra) seems to use VoID for rdf triple stats

kcoyle: danbri said that search engines are better at text than at numbers

<alejandra> +q

kcoyle: so like an abstract in a paper, an overview could improve discovery
... can DCAT help there

<danbri> (text and also entities that can be found via textual queries, rather than raw decontextualized numbers)

alejandra: it's more than just a description, but telling potential users of how much data to expect
... ten patients or 1000
... also how many triples etc

<Makx> +q

alejandra: but hard to generalise so might go into a profile

PWinstanley: so it won't be text

kcoyle: if it's not in a particular format, is it for display?

LuizBonino: it might be for validation
... if you use a profile you want to check
... in the profile you need to attach the vocabulary that describes how many patients
... and then you can validate: does the metadata contain this statistical information?
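The validation step LuizBonino mentions could look roughly like the sketch below. It treats a profile as nothing more than a set of required properties, which is a strong simplification; `ex:numberOfPatients` and the function name are invented for illustration and do not come from DCAT or any existing profile.

```python
# A profile is modelled here as the set of properties it requires.
# "ex:numberOfPatients" is a made-up, community-specific property.
REQUIRED_BY_PROFILE = {"dct:title", "ex:numberOfPatients"}


def missing_properties(metadata, required=REQUIRED_BY_PROFILE):
    """Return the required properties absent from a metadata record, sorted."""
    return sorted(required - set(metadata))


complete = {"dct:title": "Cohort A", "ex:numberOfPatients": 1000}
incomplete = {"dct:title": "Cohort B"}
```

A record conforms to this toy profile when `missing_properties` returns an empty list; for `incomplete` it reports the missing statistical property.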

Makx: we tried some of that. DCAT only has byteSize (not clear to everybody). When you start talking about how many things are in your DS, there are many
... different ways to define that and that is specific to the use of the DS
... so this is community-specific and can hardly be generalised => profile

<alejandra> I agree Makx - maybe to consider for AP guidelines

<antoine> https://www.w3.org/2013/dwbp/track/issues/164

<antoine> https://www.w3.org/2013/dwbp/track/issues/189

antoine: sends around a couple of pointers from the data quality work.
... there statistical information was very important
... they point to initiatives about statistics that were considered relevant
... agrees with Makx that counting is very difficult

<antoine> https://www.w3.org/TR/void/#statistics

<alejandra> +q

antoine: void has some counting properties, and even if we don't incorporate
... void into DCAT there are similarities that might satisfy this DCAT

alejandra: void is specific to RDF so could be more a guidance
... but this UC could be about a generic pattern how to count things

Thomas: if you leave semantics behind you can count anything, so we need to stay within the domain

antoine: we should at least be able to say why we didn't look at these issues

PWinstanley: if we were to put the summary as an XMLLiteral the search engines could pick that up and leave a hook for people to provide structured data

Jaroslav_Pullmann: usually it's important to provide the range of a property to give hints: Do we plan to do that in our ontologies?

PWinstanley: we're not obliged to, it's an additional layer of modelling

<Zakim> danbri, you wanted to report a bioschemas discussion on "Data record" structures that relates

Jaroslav_Pullmann: but that's an important part of modelling

danbri: It depends on how static your ontology is. schema keeps changing so they are very conservative with domains and ranges

<danbri> http://bioschemas.org/

danbri: Bioschemas do much typing (rows/columns) and it would be good if DXWG do the same

kcoyle: domains and ranges could be part of a profile, not necessarily DCAT

<danbri> ... we saw value in having a DataRecord view into contents of a dataset, but even a simple multi-table dataset has two obvious representations as a set of records (1. entities 2. table rows). Either or both may be useful.

kcoyle: profiles can not only add new elements but also add constraints to existing ones

alejandra: much of it could be put into a profile
... healthcare data is important because it's often not freely available

PROPOSED: Accept ID33

<alejandra> +1

<annette_g> +1

<Caroline_> +1

<Jaroslav_Pullmann> +1

<LuizBonino> +1

<dsr> +1

<danbri> +1

<PWinstanley> +1

<antoine> +1

<Ine_> +1

<newton> +1

<DaveBrowning> +1

<Thomas> +1; curious about the requirement here

RESOLUTION: Accept ID33

<Keith> +1

<Makx> +1

next: UC35

<Caroline_> https://www.w3.org/2017/dxwg/wiki/Use_Case_Working_Space#ID35

Makx: we have had this discussion before: There is a dataset and a description of it but no designated catalogue (or no catalogue at all)
... like people creating datasets of their own.

<alejandra> +q

Makx: Is DCAT about the data _catalogue_ or about datasets, too?

<Zakim> danbri, you wanted to suggest RDF vocabs don't do 'mandatory'

Makx: you could use DCAT to describe a dataset before it's made part of a catalogue

danbri: doesn't see a big problem. Vocabularies just provide useful terms
... can provide some statistics from google

Keith: catalogues are created manually or by harvesting from other catalogues, so it's not a problem

LuizBonino: the focus of DCAT is the dataset and not the catalogue
... the model isn't clear, though. It should be possible to have datasets without a catalogue, so we need to fix the cardinality

Jaroslav_Pullmann: one issue could be that the concepts/topics are part of the catalogue and would be lost for datasets without it

<alejandra> dcat:Dataset definition: A collection of data, published or curated by a single agent, and available for access or download in one or more formats.

<Keith> and of course the catalogs are themselves all datasets

<alejandra> doesn't mention catalogue

<AndreaPerego> First sentence from DCAT does: "DCAT is an RDF vocabulary designed to facilitate interoperability between data catalogs published on the Web."

PROPOSED: Accept ID35

<LuizBonino> +1

<newton> +1

<Thomas> +1

<Ine_> +1

<AndreaPerego> +1

<Jaroslav_Pullmann> +1

<Keith> +1

<Caroline_> +1

<PWinstanley> +1

<Makx> +1

<Philippe> +1

<alejandra> +1

<dsr> +1

<antoine> +1

<danbri> Interesting - https://www.w3.org/TR/vocab-dcat/#class-dataset is quite restrictive. Whereas https://www.w3.org/TR/vocab-dcat/#introduction is quite general about data.

<danbri> +1

<annette_g> +0

<DaveBrowning> +1

<Makx> Section 1 is non-normative

<danbri> What does "published or curated by a single agent" mean? If two people publish something together must we treat them as an Organization to meet this semantic?

RESOLUTION: Accept ID35

<Thomas> I agree, Lars

<Makx> OK

Caroline_: we're discussing more than the chairs had planned

LuizBonino: What about the people who participate remotely in specific timeslots?

kcoyle: it's specifically about profile negotiation where Ruben wanted to call in, but LarsG is here to cover that

antoine: would want to have ID37 moved up since he can only join until 3pm

Coffee break until 4pm

<AndreaPerego> ^^ 5PM CEST

<Makx> when will you be back from break

in 30 minutes

<AndreaPerego> scribe: alejandra

<AndreaPerego> scribe: LarsG

https://www.w3.org/2017/dxwg/wiki/Use_Case_Working_Space#ID40

<Caroline> scribe: Jaroslav_Pullmann

<Caroline> https://www.w3.org/2017/dxwg/wiki/Use_Case_Working_Space#ID40

<AndreaPerego> scribenick: Jaroslav_Pullmann

<Caroline> reading the use case ID40

kcoyle: what part of dcat should be aligned with Schema.org?

<AndreaPerego> kcoyle, I made a comment on this topic here http://lists.w3.org/Archives/Public/public-dxwg-wg/2017Jul/0052.html

danbri: summarizing about the evolution approach of Schema.org

kcoyle: the question remains - how do we get both aligned (via sameAs etc.)?

<Makx> can we please respect speaker queue?

kcoyle: all the properties are in dcat/dct namespace ..

<Caroline> kcoyle was on her turn :)

<danbri> view-source:http://schema.org/Dataset

Keith: suggestion to use a Schema-annotated HTML page to make catalog/datasets accessible (~ landing page)

<dsr> How are data catalogues discovered? One idea is to embed schema.org tags in web pages as a means to discover catalogues, and then use the DCAT vocabulary for further queries

Makx: rewrite the use case to exemplify the publishing process

<dsr> Keith: we need a way to expose DCAT to schema.org

danbri: there are multiple classes in Schema.org that might fit the individual Dataset notion like Product, CreativeWork

<Makx> are we talking about the use case or arguing dumping DCAT in favour of schema.org?

danbri: describes how the metadata is being extracted and processed out of the web pages, there should be mapping to this Schema.org subset from DCAT

<danbri> see https://developers.google.com/search/docs/data-types/datasets and associated blog post

<danbri> i.e. https://research.googleblog.com/2017/01/facilitating-discovery-of-public.html

<Makx> +1 to kcoyle

kcoyle: approach to be best found by search engines via web pages annotated with Schema.org metadata?

<Caroline> Jaroslav_Pullmann: at the level of a dataset, which may be dynamic, we don't need an API endpoint

<Caroline> we can annotate with some of the schema.org elements

<Caroline> which of these properties could be exported to schema.org?

<Caroline> danbri: shared the documentation see https://developers.google.com/search/docs/data-types/datasets and associated blog post i.e. https://research.googleblog.com/2017/01/facilitating-discovery-of-public.html

<danbri> https://www.w3.org/TR/2016/NOTE-csvw-html-20160225/

<danbri> ... is the CSVW WG's note on JSON-LD in HTML

dsr: asking for use cases for searching for particular types of resources (datasets, services) ..

LuizBonino: 2 different ways considered

<Makx> can LuizBonino speak louder please

<Caroline> is it better Makx ?

1) we generate HTML pages annotated by Schema.org metadata for dataset

<Makx> a bit better yes

<dsr> users typing informal queries to discover catalogues and data sets and linking to pages offering a richer more structured search, intent based search involving hidden APIs for quick added value results, or back end use cases where a service initiates a query and generates a composition of services as a design for later instantiation or a dynamically instantiated composition for immediate use.

the dcat:Dataset is described as schema:Dataset as well

<Caroline> LuizBonino was showing on the figure he drew what he is explaining. He is talking about DCAT:dataset in the figure

2) the findability is further supported by indicating a dcat:theme
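The first approach, generating landing-page markup from a DCAT description, might be sketched like this. The property correspondences in `DCAT_TO_SCHEMA` are illustrative assumptions, not a mapping agreed by the group, and the record values and example.org URL are invented.

```python
import json

# Illustrative DCAT/DC Terms -> schema.org correspondences (an assumption,
# not an agreed mapping).
DCAT_TO_SCHEMA = {
    "dct:title": "name",
    "dct:description": "description",
    "dcat:keyword": "keywords",
    "dcat:landingPage": "url",
}


def to_schema_org(dcat_record):
    """Build a schema.org Dataset description from a flat DCAT-style record."""
    doc = {"@context": "http://schema.org/", "@type": "Dataset"}
    for dcat_property, schema_property in DCAT_TO_SCHEMA.items():
        if dcat_property in dcat_record:
            doc[schema_property] = dcat_record[dcat_property]
    return doc


record = {
    "dct:title": "Air pollution monitoring data",
    "dcat:keyword": ["air quality", "pollution"],
    "dcat:landingPage": "https://example.org/dataset/air-pollution",
    "dcat:theme": "ex:environment",  # no counterpart in the table, so skipped
}

jsonld = to_schema_org(record)
# The serialised form would go inside <script type="application/ld+json">.
snippet = json.dumps(jsonld, indent=2)
```

Properties without an entry in the table, such as dcat:theme here, are simply dropped, which is one reason guidance on the mapping would be useful.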

alejandra: might Google support DCAT natively, in contrast to mapping to Schema.org?

<Makx> European Data Portal 750.000, data.gov 160.000 data sets

Jaroslav_Pullmann: What would be the target of such an indexing by search engines?

<Caroline> Jaroslav_Pullmann: what to do if you are searching for data and are given 200.000 datasets?

<Makx> +q

Makx: coming back to the use case..

<kcoyle> +1

assuming the approach to define such a landing page of a dataset, what is the guidance on how to expose it in terms of Schema.org annotation?

<Makx> cookbooks are good

<alejandra> +1

<danbri> cookbook feels the right level to me; DCAT-shaped structures...

<Makx> +1

PWinstanley: suggesting a cook book with examples

<danbri> ... with extras from other vocabs, and in schema case maybe mappings

<dsr> +1 to separate cookbook + alignment on terms where possible

<danbri> (prefer "cookbook" to "best practice" given that these things are still in flux)

danbri: in effect, this is mostly about mapping DC terms to Schema, which has been done already

<newton> danbri would the cookbook be like a primer?

PWinstanley: what is the (tool) support for creating these annotations?

dsr: what is our advice on choosing and using such tools?

PWinstanley: there is a commercial potential for creation and provision of such tools and services

<danbri> +1

<kcoyle> PROPOSED: accept ID40 as a use case for a non-normative document

<Ine_> +1

<Thomas> +1

<annette_g> +1

<danbri> +1

<LuizBonino> +1

<newton> +1

<alejandra> +1

<dsr> +1

<LarsG> +1

<kcoyle> +1

<Keith> +1

<Caroline> +1

<DaveBrowning> +1

RESOLUTION: accept ID40 as a use case for a non-normative document

<dsr> We should consider how open source projects could help with building both DCAT and schema.org markup

<antoine> belated +1

<riccardoAlbertoni> /me sorry I have to leave.. Thanks for the interesting discussion, See you tomorrow!

bye

<Caroline> thank you for participating riccardoAlbertoni

<danbri> [PWinstanley talking about https://twitter.com/nwplanet]

<danbri> dsr, I did have a conversation with someone in CKAN community about getting schema.org dataset markup into CKAN per-dataset landing pages. Idea would be to improve and publicise the existing DCAT addon rather than make a rival addon.

PWinstanley: suggesting to create a wiki on tooling support

Caroline: we will create an informal document on the topic (cookbook)

<antoine> +1 for discussing it now.

<kcoyle> scribenick: kcoyle

<Caroline> https://www.w3.org/2017/dxwg/wiki/Use_Case_Working_Space#ID18

<LarsG> scribe: kcoyle

andrea: ID18; there are three other use cases that mention the same problem
... problem is that in many cases your distribution is not a direct file download
... could be an api or another service

<danbri> [q: (Makx?) how would I find DCAT from a page like view-source:https://www.europeandataportal.eu/data/en/dataset/air-pollution-monitoring-data-dublin-city?]

andrea: the issue is that neither machines nor humans know what happens when you follow the link
... the response from the api may be an error or may not make sense to you
... this is a main issue left open by DCAT 1.0
... there was once a subclass of dcat:Distribution -> dcat:WebService, but this was dropped
... this is a big problem that users have - when they don't get the data back they are confused
... with a sparql endpoint, they get multiple datasets back

dsr: this is about what people expect from a search

Jaroslav_Pullmann: dynamic distribution - let the data be pushed

danbri: does the group consider finding commercial datasets? find out that it exists and how much you pay for it

Jaroslav_Pullmann: there could be domain-specific solutions
... or templated urls

<dsr> where we need to give the url parameters some kind of semantics
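The point about templated URLs and parameter semantics could be sketched as follows. This is only an illustration: the parameter names, their descriptions, the base URL and the `build_access_url` helper are all hypothetical and not part of DCAT.

```python
from urllib.parse import urlencode

# Hypothetical description of the parameters an access URL accepts.
# Each entry gives the parameter some minimal semantics.
PARAMS = {
    "station": {"meaning": "monitoring station identifier", "required": True},
    "year": {"meaning": "calendar year of the observations", "required": False},
}


def build_access_url(base, params, values):
    """Fill a templated access URL, rejecting missing or undescribed parameters."""
    missing = [n for n, spec in params.items() if spec["required"] and n not in values]
    if missing:
        raise ValueError(f"missing required parameters: {missing}")
    unknown = [n for n in values if n not in params]
    if unknown:
        raise ValueError(f"parameters without a description: {unknown}")
    return base + "?" + urlencode(values)


url = build_access_url(
    "https://example.org/api/observations",  # assumed endpoint
    PARAMS,
    {"station": "D01", "year": 2016},
)
print(url)
```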

LarsG: We are not limiting ourselves only to open datasets; it's about finding
... this is how Europeana works; you can find things but they may be behind a firewall

<DaveBrowning> +1 for Lars

AndreaPerego: find a way to model the info in a domain-independent way, a minimal set
... 2 main things: 1) distribution is not direct, uses API / service
... 2) type of service - specify with a code type of endpoint/service
... even this small amount of info would be helpful to people
... and could be used by software engines if they know the code
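The minimal set described above (a flag saying the distribution is not a direct download, plus a code for the type of service) might look like the sketch below. The class, field names and service codes are assumptions for illustration only; DCAT 1.0 defines no such properties.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DistributionInfo:
    """Hypothetical minimal, domain-independent description of a distribution."""
    access_url: str
    is_direct_download: bool
    service_type: Optional[str] = None  # assumed code, e.g. "sparql" or "rest"


def describe(dist):
    """Tell a human (or a client) what to expect when following the link."""
    if dist.is_direct_download:
        return "direct file download"
    if dist.service_type is None:
        return "service or API of unknown type"
    return f"service of type '{dist.service_type}'"


csv_file = DistributionInfo("https://example.org/data.csv", True)
endpoint = DistributionInfo("https://example.org/sparql", False, "sparql")

print(describe(csv_file))
print(describe(endpoint))
```

Even this small amount of information lets a client decide whether to fetch the URL directly or hand it to software that understands the given service code.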

<Ine_> +1 for Andrea

<Zakim> AndreaPerego, you wanted to say that it may be worth finding a domain-independent solution

AndreaPerego: what is missing is that minimal info

Keith: 1) searching an individual dataset
... worst case 2) complex API with distributed data
... are we going to describe APIs or datasets?

<AndreaPerego> Yep, the API description is the complex bit.

<danbri> suggesting that https://www.w3.org/TR/vocab-data-cube/ covers some of Keith's (1.).

dsr: links to goals of WoT in W3C - links to general services and domain-specific situations
... dcat needs to say - the type of this is an api. beyond that is outside of dcat

LuizBonino: in health area, data is electronic, access process is offline

<Zakim> LarsG, you wanted to say that antoine was accidentally kicked out of the queue...

<Caroline> so sorry antoine

antoine: asking Andrea if his use case includes sparql endpoints, because dcat has that solution

AndreaPerego: does dcat have a solution for sparql endpoints?

antoine: there is a dcat access url that could be used for sparql endpoints

Makx: it's true that dcat says that this could be used with sparql endpoints but it never says how

<dsr> DCAT should provide information about where to get further information about an API and if this is machine interpretable, what formats are supported, e.g. thing descriptions for the Web of Things, or schema languages for RESTful APIs

<antoine> https://www.w3.org/TR/vocab-dcat/, search for 'SPARQL' and this eventually gives dcat:accessURL

<AndreaPerego> dcat:WebService: https://www.w3.org/TR/2012/WD-vocab-dcat-20120405/#Class:_WebService

AndreaPerego: dcat:WebService was dropped from the document

<danbri> nearby, sparql, void etc: https://www.w3.org/TR/void/#sparql-sd

antoine: is sparql included in your use case?

AndreaPerego: no, not mentioned

<danbri> ... and then there is some literature around SPARQL as interface to data cubes e.g. https://jbiomedsem.biomedcentral.com/articles/10.1186/s13326-017-0112-6

<scribe> ACTION: Andrea will add SPARQL endpoint to ID18 [recorded in http://www.w3.org/2017/07/17-dxwg-minutes.html#action09]

<trackbot> Created ACTION-20 - Will add sparql endpoint to id18 [on Andrea Perego - due 2017-07-24].

<danbri> FWIW in last week's IoT/WoT discussions, RAML, Swagger (https://swagger.io/specification/) and JSON-Schema came up a lot.

<danbri> Peter: some don't handle both GET and POST params

dsr: as above, what comes back: file or msg? what service/more info do you get.
... is it machine-readable?

annette_g: what you get back ... can differ

<Thomas> +1

PROPOSED: Accept use case ID18 as in scope

<danbri> +1

<newton> +1

<DaveBrowning> +1

<Ine_> +1

<LuizBonino> +1

<antoine> +1

<alejandra> +1

<Caroline> +1

<annette_g> +1

<LarsG> +1

<Philippe> +1

<dsr> +1

<Makx> +1

<PWinstanley> +1

<Keith> +1

RESOLUTION: Accept use case ID18 as in scope

Acknowledgements

We gratefully acknowledge funding for lunch and coffee breaks from the VRE4EIC project.
