W3C

- DRAFT -

Smart Descriptions & Smarter Vocabularies (SDSVoc) Day 1

30 Nov 2016

Agenda

See also: IRC log

Attendees

Present
PWinstanley, AndreaPerego, nandana, phila, newton, Ivan, BartvanLeeuwen, Caroline_, brandon, damires, LarsG, Caroline
Regrets
Chair
PhilA
Scribe
phila, Jacco, deirdrelee


Opening remarks

<phila> Jacco and Phil made general opening welcomes

VRE4EIC Project, CERIF

<phila> Keith: Follows slides which are self describing

<phila> scribe: phila

<scribe> scribeNick: phila

Keith: Talks about context of a citation, which includes many facets
... Gets in a slight dig about 'the later Dublin Core'

[Not making many notes here as Keith's slides are comprehensive]

AndreaPerego: You said you have mapping, Keith. Temporal dimension seems to be missing?

Keith: Yes, it's missing and we know that. Metadata often ignores the temporal dimension

AndreaPerego: Do you plan to add, maybe using PROV?

Keith: We're working on PROV-O with Kerry Taylor, and there's an ENVRI+ project

PeterW: Acronym hell. RDA means something else (Resource Description and Access)

Keith: Sorry, yes. resource Description

PWinstanley: So RDA is the body to use for this?

Keith: I did point out the acronym clash

ThomasDH: You talked about locations and persons etc. In the EU context we have the Core Vocs. I hope we can merge? Across Govt and science?
... Don't want different standards on different levels.

Keith: Yes, the VRE4EIC project has that in its sights. CERIF has concept of declared semantics.
... Doesn't say you must use these semantics, but provides containers for semantics

Dataset Description Models

AG: Talks about The HCLS Community Profile: Describing Datasets, Versions, and Distributions
... Talks about origin in OpenPHACTS project
... Highlights ChEMBL versioning issues
... ChEMBL was at version 13, but that number wasn't in our data
... Still don't know if we used version 8 or 13 in OpenPHACTS
... Includes provenance feature so you can see where data items came from
... Now using ChEMBL 20

[Slides are self describing]

AG: contrasts DC and VOiD as opposite ends of spectrum, neither met requirements for HCLS

-> https://www.w3.org/TR/hcls-dataset/ Dataset Descriptions: HCLS Community Profile

AG: Talks about mandatory and optional properties
... This requires tooling
... Developed the validata tool, more on that tomorrow.
... Several implementations of HCLS profile
... Emphasises that we need to know about versions

Q: Adopted beyond your community?

AG: Not aware of it but it is generic and could be

Q: The latest version of DCAT-AP covers some of what you say

AG: isVersionOf didn't exist when we were doing this 4 years ago, glad it's in DCAT-AP

Jacco: Debate about whether version is in the URL?

AG: We don't say that, just that there should be different URLs for summary description, etc.
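
The three-level pattern AG describes (a summary description, a description per version, and a description per distribution, each with its own URL) can be sketched roughly as follows. This is only an illustration under assumptions: the example.org URLs, the `describe_dataset` helper and the choice of linking properties (dct:isVersionOf, pav:version, dcat:distribution, dct:format) are picked for the sketch, and the HCLS profile itself should be consulted for the actual mandatory and optional properties.

```python
# Sketch: separate URLs for summary, version and distribution descriptions.
# All URLs and the helper name are hypothetical.

def describe_dataset(base, version, formats):
    """Return (subject, predicate, object) triples for one dataset version."""
    summary = base                          # summary level: carries no version info
    version_url = f"{base}/{version}"       # version level
    triples = [
        (version_url, "dct:isVersionOf", summary),
        (version_url, "pav:version", version),
    ]
    for fmt in formats:
        dist_url = f"{version_url}/{fmt}"   # distribution level
        triples.append((version_url, "dcat:distribution", dist_url))
        triples.append((dist_url, "dct:format", fmt))
    return triples


triples = describe_dataset("http://example.org/dataset/chembl", "20", ["rdf", "csv"])
for s, p, o in triples:
    print(s, p, o)
```

Because the version number is recorded in the version-level description rather than inferred from a file name, a consumer can always tell which release it used.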

Andrea Perego Using DCAT-AP for research data

AndreaPerego: Introduces DCAT-AP

-> https://joinup.ec.europa.eu/asset/dcat_application_profile/description DCAT-AP

AndreaPerego: Introduces JRC

[Slide on JRC is self explanatory]

AndreaPerego: Talks about wide variety of methods and standards. Some people asking what metadata is
... Talking about citations. Some people don't care about their data being cited.
... Prov used for complex/complete info
... On Data Citation
... Data reproducibility is important for policy as well as science
... Did a mapping exercise between DCAT-AP and DataCite
... Mostly good matches
... Agent Roles seems particularly hard
... May need a registry of roles to use across standards
... Skips to Publishing metadata on the Web
... Talks about mapping to schema.org
... Identified some gaps. But do we need to fill those in schema.org?
... Do we need to publish all our metadata, or just what improves visibility?

AxelPolleres: Is there any effort to endorse identifiers like ORCID?
... The link with STORK etc. would be interesting, but there's no initiative AFAIK

Keith: Often IDs are associated with a role, like ORCID and Driving Licence info

Ivan: Force11 had their general principles. Did you match against those?

AndreaPerego: Yes, we have looked at that, and FAIR
... Trying to address practical issues

Ivan: Sure they're at a higher level

The Metadata Ecosystem of DataID

Markus: I'm release manager of DBpedia
... We have a lot of data in our releases

[Slides include text]

<AxelPolleres> FWIW, further to my question… there seem to have been some efforts to e.g. link STORK (national eIDs) to ECAS, cf. https://www.eid-stork.eu/index.php?option=com_content&task=view&id=253&Itemid=83 … the reason why I had asked about links to ORCID is that many of the information you have to provide to the EU for ECAS overlap with info covered in ORCID, e.g. publications, grants, etc.

[Slides still self-explanatory]

Markus: Talks about core and extensions in DataID for things like statistics, you need extra fields

phila: You used ODRL a little, but not a lot. Is it lacking?

Markus: Nothing fixed yet, open to change

phila: Good - ODRL on Rec Track now so speak up!

Towards a Common Description Vocabulary for Industrial Datasets

CM: Work with Soeren Auer
... Introduces Smart Services and Industry 4.0
... Talks about needing to be aware of privacy and some control over data
... Want to build reference architecture for secure data infrastructure, retaining sovereignty
... IDS = Industrial Data Spaces
... Industrial Data Space vocab as glue to capture domain-specific semantics

[Slide self explanatory]

CM: IDS defining own protocol

-> http://ids.semantic-interoperability.org/ The Industrial Data Space Metadata Vocabulary

Q: How specific is this to industrial data? Can it work in other domains?

CM: It's about requirements, like security. Things like which vocabs to use for different tasks, not domains

PeterW: Have you thought of entity resolution? What metadata to associate with their data? They may not know.

CM: No, we've not looked at that. We want to partner with data publishers and help them make their data more easily found on the Web.

Loupe - An RDF Dataset Description Model for Expressing Vocabulary Usage Patterns

nandana: 2 use cases we have problems with
... Discovery, I don't want to spend a lot of time searching.
... data for training machine learning is hard to find automatically
... Another use case - if I'm an ontology engineer, I'd like to see how my vocab has been used in a dataset, to see if my conceptualisation matches reality
... e.g. where is the SSN Ontology used?
... This can be done using LOD stats but if you want to know how they were used, ranges etc. that's harder

[slides descriptive]

nandana: Wraps up brief presentation and invites questions

AG: Can you give a statistical report about which properties and classes are linked?

Discussion

Keith: Big range of topics. Expressivity etc.

makx: I was hearing things like there is this standard, but it didn't work for me.
... You need to be aware - we need to try and solve a problem. DC tries to look at common problems, CERIF tries to go into depth
... We spent most time trying to solve common problems. DC and DCAT start simple and then people complain that things are missing
... You can extend
... You come up with different requirements and you soon come up with 50 properties that are never used

MF: I agree. The general approach of DCAT has its benefits
... But there is important data missing when we handle datasets, e.g. more specific prov info
... Basic pattern of catalogue, dataset and distribution has prevailed
... But we should look at how to improve DCAT and that's why we're here

AG: In HCLS we didn't want to come up with a standard, just a profile that used existing ones

PeterW: I find lots of people talking about different metadata frameworks but less about the data that goes into them
... Some illustrations of marked up stuff. If I have a dataset, what are the frameworks that match the pattern of the data that I have
... Maybe ML techniques can be used.

Keith: The papers have more. The RDA has a metadata standards group that is making a list of the available metadata schemes. Lots of work coming from Digital Curation Centre (see Kevin Ashley)

Q: Automatic machine readable to access the data itself, not just a URL

MF: Yes, this is a task for us
... This problem came up a lot
... I'm hoping for insights from other directions

Q: Accessing satellite data for e.g. you need to restrict the access to specific subsets.

scribe: There are REST APIs like Swagger, but there's no predefined method

CM: Lots of approaches for describing services on the Web, but haven't had a lot of impact for some reason
... Maybe because they introduce complexity

nandana: Hydra CG is in that direction

Q: But that's very restricted to REST.

AndreaPerego: We have a bar camp on this specific topic :-)

MF: It's a big issue. We're dealing with datasets usually, not endpoints

phila: Talks about subsetting issue Open Search etc.

Keith: You can't get into the data because it's too big so you don't know what to ask for

AndreaPerego: Talks about different levels that can be addressed. Need to include users

Keith: Geonetwork allows you to peek into the data to see if you're in the right area

CM: Working with industrial partners - tooling is very important. If you have a schema, you need the partners to tell you the detail you need

AG: We developed a very specific tool that was user-driven
... focussed on user-friendliness so it's not easily transferrable

<danbri> :)

<PWinstanley> could panel members please talk to the room rather than just among themselves - it's not easy to hear them without PA systems

<danbri> there's a microphone on the smaller desk, is it not wired up?

<PWinstanley> @danbri: it is needed at the larger table for the group discussion

Call for greater clarity in some of the DCAT definitions. Also guidance, perhaps a primer. Take various national APs as input

<PWinstanley> @phila: https://lists.w3.org/Archives/Public/public-dwbp-wg/2015Jul/att-0010/DCAT-APimplementationguide.pdf needs to be updated

<danbri> can I respond to the google/schema question?

<danbri> ok will respond later

Dee is in charge, not me :-)

Discussion around data that is not published in discrete versions

Need to handle data that changes all the time (real time data etc.)

<AxelPolleres> +1 volatile/dynamic datasets probably need different metadata than “slower changing” datasets… where more versioning vocab is an issue.

<PWinstanley> mutable vs immutable datasets is relevant information

<AxelPolleres> there are different forms of “mutable”, e.g. (monotone) growing vs. actually changing… is that reflected in any of the existing vocabs?

<PWinstanley> @AxelPolleres: yes, that's my point

<AxelPolleres> for us (use case crawling and tracking changes/evolution) it would be very(!) useful if these were advertised.

<jrvosse> Daniele Bailo: what are the boundaries of DCAT?

<jrvosse> Andreas Kuckartz: DCAT seems less useful for describing binary programs

<jrvosse> Makx: Some people in the WG see DCAT as very general that can describe many things

<newton> PWinstanley and AxelPolleres it's a real issue, we had some discussions about it during DWBP meetings

<newton> I would like to see this addressed on the charter of a new WG

<AxelPolleres> newton, are you aware of any vocabs that actually define this difference? i.e., monotone growth vs. arbitrary changes, changeFrequency, growth rate, etc.?

<antoine> newton, Axel: there used to be a vocabulary for 'accrual policies' at DC. Maybe not the right granularity though.

<jrvosse> Linda van den Brink from Geonovum on geospatial data

<jrvosse> ... a key problem is that people from outside the geo domain do not understand the standards we use

<jrvosse> * I'm scribing but feel free to add

<scribe> scribe: Jacco

<scribe> scribeNick: jrvosse

Linda is discussing a testbed testing use of mappings in the context of geoDCAT (see https://joinup.ec.europa.eu/node/154143/) and schema.org

see slides for testbed report

Q: Jacco: what do you think the key mission of a new WG should be?

A: Linda: Small core of a standard, for SDI coverage is really key, quality is also very important

Q: Phil: is something like the dcterms spatial concept core?

A: Linda: yes

Q: Daniele Bailo: Is the loss of data in the mappings really an issue for end users on the web?

A: Linda: Maybe not, for discovery it may not be a problem. There are levels of importance

Andrea Perego on GeoDCAT-AP

GeoDCAT-AP is not replacing existing standards such as INSPIRE or ISO 19115 metadata for spatial data, but provides extra interoperability through an RDF binding

Time and Space

scribe: need for http conneg on profiles/schemas not just on format
... need to model dataset distributions, distinguish data sets from data APIs
... need for best practices for quality-related descriptions, there are too many patterns/standards
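
The call for content negotiation on profiles as well as formats can be illustrated with a small server-side sketch. It assumes a hypothetical catalogue that holds several representations of one dataset description, keyed by media type and profile. The `select_representation` function, the file names and the profile identifiers are all assumptions made for illustration, since the minutes do not name a concrete mechanism.

```python
# Sketch: choose a representation by BOTH media type and profile.
# The representations table and profile identifiers are hypothetical.

REPRESENTATIONS = {
    ("text/turtle", "dcat-ap"): "dataset.dcat-ap.ttl",
    ("text/turtle", "geodcat-ap"): "dataset.geodcat-ap.ttl",
    ("application/xml", "iso-19115"): "dataset.iso19115.xml",
}


def select_representation(accept_types, accept_profiles):
    """Return the first available file matching the client's preferences.

    Both arguments are lists ordered from most to least preferred.
    Media type preference is considered before profile preference.
    """
    for media_type in accept_types:
        for profile in accept_profiles:
            match = REPRESENTATIONS.get((media_type, profile))
            if match is not None:
                return match
    return None  # a real server would answer 406 Not Acceptable here


print(select_representation(["text/turtle"], ["geodcat-ap", "dcat-ap"]))
print(select_representation(["application/json"], ["dcat-ap"]))
```

The point of the sketch is that the same media type (Turtle here) can carry more than one profile, so negotiating on format alone cannot tell the two apart.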

Q: Keith Jeffery: need spatial coordinates both for what is observed and from where, what are your thoughts?

A: Yes, this is a difficult problem, also in crowdsource context and other contexts, but is not addressed at the metadata level, more at the level of the features

Q: Herbert: New NISO spec "ResourceSync" from those that made OAI-PMH, but more "webby"

Herbert: I'm involved in signposting.org which is also relevant

Panel on Time and Space

Otakar Čerba joins panel

Otakar Čerba: I'm here because we are developing a smart points of interest RDF dataset with 120M POI published, incl via a SPARQL endpoint

Daniele Bailo joins the panel

Daniele Bailo represents the EPOS geo ESFRI with lots of geospatial data

scribe: with many different types of data, this needs to be reflected in the metadata
... need to think about who the audience is: general web users vs scientists from specific domains?

Q: Bart: is just getting the metadata currently not too complicated already?

<AxelPolleres> remark Re: dataset vs service description - this is also to some extent related to the issue we mentioned before some time up in the chat about fast-changing/highly dynamic data (which may be rather seen as a service than a dataset)

A: Andrea: Yes, for the general public ISO may be too much, especially if it is just for discovery purposes

scribe: for us, a dataset is what you decided to call a dataset

Linda: I see DCAT as something for portals to find and reuse each other's datasets, not necessarily as something for the end user

Daniele: I know the scientific user relatively well, typically does not want general web search. The "web user" could be a software agent or a human user.

Otakar: same experience, users often do not use metadata. We have many Czech data portals but few real users

Andrea: my students use Google also because they do not know where the data is, this also makes it important to publish data on the Web

Daniele: I agree, but I'm trying to understand the requirements for doing so. In my community people tend not to use persistent IDs or even URLs. This is a challenge.

Bart: high level conclusion could be that there is too much info from the data in the metadata

Otakar: we also need feature metadata in the geo spatial domain

Andrea: data quality is more general than just spatial, and solutions can be reused for other domains

Linda: spatial coverage is key for first discovery step, use of all other quality and prov metadata is part of a second step

Daniele: what is needed on the scientific side is a huge effort on data and metadata harmonisation

<deirdrelee> scribe: deirdrelee

Searching for data

Show Me The Way session

Searching for data session

Dmytro Potiekhin

CivicOS: Governance & Campaigning Data Standard

dmytro: worked in ukraine
... working with civil society and citizens is very important when there is a danger of falsification at elections
... integrating data is an important issue to protect democracy
... it is obvious that without a proper vocabulary describing the needs of civil society, this is impossible
... secondly, it is impossible to create such a vocabulary from top-down approach
... e.g. even with vocabularies that the european commission are working on
... this vocab or set of interoperability vocabs must be demand driven
... something that is accepted by the citizens
... this is CivicOS
... if we can unite efforts around development of such vocabularies, I am glad to help and this is what I am trying to do with colleagues
... e.g. i am collaborating with the Stanford ?? Institute
... this is not just a problem for Ukrainians, but it is a global problem
... my final request would be, not to just give everything to the governments.
... in democratic societies, it is okay for governments to have all this technology, etc.
... but in countries still fighting for democracy, this can be a problem
... for example, there are often petitions to put pressure on governments. but if the petition is run by the government, it is bureaucratic, and it also gives them a contact list of people who disagree with them
... undermining civil society and what they are trying to achieve
... i encourage to keep developing vocabularies, but also to retain the activation and development of civil society

kevin: questions?
... the situation at the moment isn't ideal for discovery and interoperability of data, for the use-case you are talking about - empowering citizens

dmytro: for the commercial part it is working great, e.g. flight information automatically added to Google Calendar
... so standards are already working, but this needs to be brought to our community
... in egovernment, we see this too. but we see a trend to focus on egovernment, and not on egovernance or ecivilsociety
... these platforms should be controlled by civil society, not by dictators
... even if personal identity issues are resolved, there will be interoperability issues
... and this integration should not be less successful than government or commercial sectors
... we need to apply these commercial standards in the government and civil society sectors

phila: you mentioned you wanted to integrate with schema.org

dmytro: we are experienced in the structures of what makes civil society work
... but these are not described in schema.org
... e.g. we have a list of 200 different types of non-violent actions
... the leading vocabularies only document about 5
... we would like to collaborate on the development of vocabularies and how to incorporate into schema

danbri: you can just go ahead and develop a vocabulary. we have built extensions that facilitate that
... there are some generic descriptions that could potentially go in schema.org core, and for more detailed terms, an extension might be best
... but happy to chat

attendee: there is some similar work being done in the US, by Beth Noveck, called ???
... this can have an impact and is similar to what you were talking about

<danbri> cf huridocs for human rights documentation

danbri: we were actually discussing schema.org and the documentation of hate crimes last week
... happy to continue discussions

Raf Buyle

The Public Sector DNA on the web: semantically marking up government portals.

Raf: representing the Flemish Government
... we believe public services should be centered around citizens and businesses
... today if you ask for info online about opening times and location of public buildings you get it
... we think this should go further, e.g. providing info on using services
... need to link to base registries
... Flemish government is working on strategy to add markup to government portals
... we have seen success with schema.org, etc. this can be a bridge between public and private sectors
... the citizen wants to find the info on the public service they want, regardless of public body providing it
... we are looking at using and extending open standards, e.g. from W3C, ISA, OGC, etc.
... the European Interoperability Framework states that you should look at all layers of interoperability, semantic, technical, etc
... base registries are fundamental, but it is very difficult to get this data on the web, to integrate it with the private sector
... imagine if we could ask a private company, like google, about public services. where you could make an appointment, all the information at a user's fingertips
... bridging between public and private sectors. schema.org is working very well. it has been widely adopted
... this could be a strategy to get public services information out there
... schema.org was first to discover data, but it is also used for new data services, e.g. bing and google knowledge graph
... we have a pilot [slide with architecture diagram]
... we would like to combine schema.org with ISA core vocabularies
... rdfs:seeAlso pointing from a schema.org resource to an ISA core voc resource shows that more info is available
... we are waiting to roll this out at local and regional level
... on the one hand, we are saying it is not difficult to annotate data in this way
... we also want to see if these annotations are picked up by major search engines
... and also interested in seeing if the search engines will pick up the extra ISA core voc info and display that as well
... i have some questions [FEEDBACK slide with questions]
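
The linking pattern Raf describes, an rdfs:seeAlso pointer from a schema.org description to a richer ISA core vocabulary description, might look roughly like this when generated as JSON-LD for embedding in a portal page. This is a sketch under assumptions: the example.org identifiers, the service name and the use of schema.org's GovernmentService type are illustrative and are not taken from the Flemish pilot.

```python
import json

# Sketch: schema.org markup for a public service, pointing via rdfs:seeAlso
# to a fuller description that uses the ISA core vocabularies.
# All identifiers below are hypothetical.
markup = {
    "@context": {
        "@vocab": "http://schema.org/",
        "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    },
    "@id": "http://example.org/service/building-permit",
    "@type": "GovernmentService",
    "name": "Building permit application",
    # The richer ISA core vocabulary record lives at a separate URL.
    "rdfs:seeAlso": {"@id": "http://example.org/cpsv/building-permit"},
}

# Wrap the JSON-LD in a script element so it can be embedded in an HTML page.
snippet = (
    '<script type="application/ld+json">\n'
    + json.dumps(markup, indent=2)
    + "\n</script>"
)
print(snippet)
```

A consumer that only understands schema.org can stop at the name and type, while one that follows the rdfs:seeAlso link can fetch the more detailed record.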

kevin: questions?

PWinstanley: what kind of mechanisms can we use to avoid false information getting into system?

Raf: i talked about a feedback loop. perhaps there could be a validation check comparing the original data and data being presented

Luis-Daniel Ibáñez

How we search for data? Towards User-Driven dataset descriptions

Luis-Daniel: For better data search
... we carried out an analysis of data searches by talking to data professionals and analysing logs from data portals
... [reads feedback from interviews - quotes from data professionals]
... we found a lot of things that we discussed in previous talks
... something maybe to highlight is users asking for a summary/preview of data
... with quantitative results, mainly desktop devices, etc....
... 68% of queries came from web search engines, suggesting that data search is a work-related activity and people are relying on general-purpose search engines
... is this because people use what they know or data portals are not doing their job properly? still open question for us
... query characteristics show exploratory search, e.g. 'crime' - show me all crime data, not specific query

Artemis Lavasa

CERN Analysis Preservation

Artemis: our aim is to capture, analyse and preserve data
... we need to preserve the tools, processing steps, etc. we capture everything
... we want to have as much context as possible so that we can recreate that in future
... we capture all that information via our forms
... we describe our information using a json-based schema
... it can handle complex metadata, which we have
... the data capture forms are rich, so can be very long and vary a lot from experiment to experiment
... [showing slide of example metadata]
... we work closely with physicists and calibrate them according to their needs
... in order to facilitate search, we need this metadata. e.g. a physicist might want to look at a particular particle, so looking at the title of the metadata is not sufficient
... we need intelligent search, very precise
... we played around with schema.org and json-ld. we could describe the high-level information, but not specialised fields.
... we would like to use a standardised approach
... i have tried to harmonise the schemas we have, but 80% of fields are something unique to what a physicist wanted

Alejandra Gonzalez-Beltran

DATS: dataset descriptions for data discovery in DataMed

Alejandra: project funded by NIH in the US
... DATS (DatA Tag Suite) is used to index data sources in DataMed
... [slide with online links to work]
... we focus on the findability and accessibility of datasets
... we rely on adoption by data providers
... we started by collecting lots of use-cases from the community and by looking at existing schemas
... we considered multiple existing models, e.g. schema.org, datacite, rif-cs, hcls, dcat, etc
... these models are lacking some elements in use-cases
... we also looked at domain-specific models from biomed domain
... the DATS model is a combination of elements we needed
... we split the model into core entities (adopted elements from datacite and Force) and extended entities
... we did a mapping to schema.org and looking at elixir
... there are adopters of DATS, implementing it in their systems
... i would like to thank groups that were involved

Richard Nagelmaeker

Linked Data needs a Data Location Service

Richard: I would like to pitch an idea to you
... when i started with Linked Data, the idea was to put all data in one triple store
... the internet has DNS
... [shows slide with diagram]
... there is data that as an organisation you have control over, and can have IRIs part of big picture
... but there is also information that as an organisation you want to know, but is external to the organisation
... e.g. customers, suppliers, etc. but as they are external they will have different IRIs

the issue is that the data behind a SPARQL endpoint will always contain the IRIs of the domain of the endpoint

scribe: DNS cannot help us
... but the problem is similar to what DNS solves, so could it potentially help us find Linked Data IRIs?
... there are a number of building blocks, e.g. triple stores, SPARQL endpoints, VoID
... results slide..... resolves the discrepancy between dataset IRIs and IRIs of a SPARQL endpoint
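
Richard's idea of a DNS-like lookup for Linked Data can be sketched as a registry that maps IRI namespaces to the SPARQL endpoints holding data about them, in the spirit of VoID's void:uriSpace and void:sparqlEndpoint properties. The registry contents, the example.org and example.com URLs and the `locate_endpoint` function are hypothetical. This is not the service presented in the talk, only one way such a lookup could behave.

```python
# Sketch: resolve an IRI to a SPARQL endpoint via longest-prefix match,
# using (uriSpace -> sparqlEndpoint) pairs such as a VoID file might give.
# All entries are hypothetical.

REGISTRY = {
    "http://example.org/": "http://example.org/sparql",
    "http://example.org/customers/": "http://crm.example.org/sparql",
    "http://supplier.example.com/id/": "http://supplier.example.com/sparql",
}


def locate_endpoint(iri, registry=REGISTRY):
    """Return the endpoint whose uriSpace is the longest prefix of the IRI."""
    candidates = [space for space in registry if iri.startswith(space)]
    if not candidates:
        return None
    return registry[max(candidates, key=len)]


print(locate_endpoint("http://example.org/customers/42"))
print(locate_endpoint("http://supplier.example.com/id/acme"))
print(locate_endpoint("http://unknown.example.net/x"))
```

Longest-prefix matching lets a more specific namespace (customers) be served by a different endpoint than the domain it sits under, which is the discrepancy between dataset IRIs and endpoint domains that the talk raises.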

panel

kevin: what evidence do we have that any of the efforts we've been talking about today will help people find the data they want?
... if we don't have evidence, how can we get it?

richard: just do it!

kevin: instead of 'let's build it and see what happens', with the web, how can you measure before something is implemented?

PWinstanley: there was a project in 2004 on bioinformatics [Cancer Bioinformatics Grid -- caBIG] that disappeared. Were the lessons learned from that picked up by alejandra's project?

alejandra: there was a heavy load, you had to build a UML model, tag with ontologies, etc. I think the lesson learned is that there is a more 'webby' approach, lighter, easier
... at least in biomedical databases, there is a lot of effort in curation. many databases already have ways to find data
... hopefully we will help people search across databases

kevin: luis, you have looked at how users actually behave on open data portals

luis: one observation, for the qualitative part we were with data experts, but with the quantitative part, it was open to all users. those that just wanted an answer, not necessarily 'data'

danbri: there are two very different paths, one making data available to billions of people, and making data available to the tiny minority of people who want to analyse specific data
... both are important and can have huge impact, but very different.
... ultimately, we want computer/google that knows the information, not just the data file

attendee: for luis' presentations, the one-word searches might be more related to people just finding an answer, not that there is structured data behind them

luis: we also know what people actually click on, not just search

Andreas Kuckartz: will technologies like sparql still play a role in ten years?

richard: it depends how you look at IT. I think IT is a tool to help people. in this way i think sparql will be there
... the way i look at neural networks, they are trying to do something by themselves, this is a different kind of IT

Raf: if you can look at rdf and sparql, i think these are approaches, moreso than technology

Artemis: i think in one way or another we all use rdf, so if not in this form it will survive in some form

luis: i think neural networks will learn how to use rdf
... but the big question...will neural networks replace us all!

danbri: sparql is a very practical technology, and those tend to stick around. I'm sure it'll be seen as a tool for using data, like sql and purl. but AI might increase more and more

kevin: will it be difficult to enrich data?

luis: it is important to know what has been done to the data, for example with crime data if the data was anonymised on purpose, should there be an effort to uncover the data that was removed?

alejandra: whatever the data is, what we care about is finding patterns in the data ...
... [question to luis] because you were looking at user search, were you constrained by keyword

luis: we wanted to see if people asked questions or used keyword search

phila: In Raf's case, data is relevant to everyone in Flanders (public) and Artemis' case is relevant to very specialised physicists
... danbri said that csv data can be incorporated into google's knowledge graph using csv on the web

Artemis: there is a cern data portal, with huge data releases - TBs and PBs of data
... there is also private data, meant for collaborations
... the analysis is for specific purposes, people wanted to preserve this, but it is very sensitive, it won't be opened. most people also won't be interested in this data
... aim was to help physicists preserve their analysis. that was the demand

Raf: why are public services data important?
... 1. if I want to move to Flanders and want to set up a business
... 2. for business intelligence, e.g. you can compare Flanders with other regions, e.g. as a place to live

scribe: if this data is on the Web, more people can use these public services, lowering the barriers

danbri: a good thing from private orgs is that even if they can't release the data, you can release software

kevin: and also release info about the data that is there, so that people can follow up on potential data access

BartvanLeeuwen: Raf, you asked is 'annotated data the new dataset'?
... so danbri is this something that will be possible

danbri: you can do ?productname?
... there will also be dataset search, there is a page online already, will distribute
... we are looking at research data, data portals, we'll see what we can build

Raf: a lot of portals have feedback channels. should schema.org incorporate that?

danbri: maybe. schema.org is a dictionary, you have to build things with it. we have reviews, rating, etc

Attendee: Describing datasets properly is a problem. also the problem is mapping the questions from natural language to sparql for example. A lot of problems around data discovery relates to where the data is and what kind of data there is. Any comments on natural language to formal queries?

Alejandra: there is one pilot project who are looking into this question of how the user can find datasets

danbri: we did put question/answer in schema.org, e.g. stack overflow. and researchers are starting to pick up on that
... i would hope there would be more focus on social aspects of open data portals, which could in turn help discoverability

attendee: What mechanisms can address general search but also very focused search?
... how does this affect reproducibility

Raf: we combine schema.org at a general level for discoverability, which we combine with the core vocabularies which help with the specifics

alejandra: in curation practices, generically it is very important to consider this. it is very relevant to know both the higher-level terms and the more specific terms
... for reproducibility it goes much further, you not only need the discoverability metadata, but also how the data was prepared, etc.

danbri: we recently added a field in schema.org, variable

luis: to me there is the dataset search levels in metadata
... but to answer a more detailed question, you have to go deeper
... what is the effort involved in adding this to metadata

AndreaPerego: another piece of information on helping to find the data is how the data is being used
... e.g. feedback from users, this is important data
... datasets have been used for purposes other than their original purpose
... people can see how other people have used the data and it might help them decide if it's useful for them

Raf: if you knew information about when people physically go to public services, this could help advise when people should go
... info on how public services are used could help improve the service provided

luis: i agree, it's important to know how data is being used, but it's difficult to convince users of this

alejandra: something very important is data citations. it is great to have it, but it has limitations

danbri: when datasets are used and discovered, they can go to their funders and justify the availability of data
... data citation in the scholarly sector is done, but it is not common for example in media
... this might be a turning point

alejandra: it is also important to have contact information

<newton> wonders if W3C WebMention spec could help with this issue of citation and data usage [ https://www.w3.org/TR/webmention/ ]

kevin: it is difficult to measure how the impact would have differed if we had implemented something differently
... guidelines like the w3c dwbp and csv on the web have been referenced a lot today, they're obviously very useful for the community

Time for wine and canapes!!!

<newton> +1 Dee

Summary of Action Items

Summary of Resolutions

[End of minutes]

Minutes formatted by David Booth's scribe.perl version 1.148 (CVS log)
$Date: 2016/11/30 16:56:05 $

Agenda: https://www.w3.org/2016/11/sdsvoc/agenda