W3C

Smart Descriptions & Smarter Vocabularies (SDSVoc) Day 2

01 Dec 2016

Agenda

See also: IRC log

Attendees

Present
See attendee list
Scribes
Tessel, AndreaPerego, Dan Brickley, Phil et al

Contents


NB These are the raw minutes from day 2 of the workshop and may contain errors. A report will be published on the workshop website as soon as possible, likely to be in January 2017.

Negotiation by Profile

<tessel> ruben: necessity of content negotiation

<phila> scribe: Tessel

<phila> scribeNick: tessel

[slides]

[slides self-explanatory]

Ivan: agree BUT: practicalities

... content negotiation on the server side is a pain

... partially technical, partially lobbying

Ruben: two mechanisms side to side possible
... rethink this as a basis

Eric: roll out without profiles
... later lobbying

Ivan: suboptimal other solution, deliver as any other resource
... we need fall back or guidelines
... http guideline, you need guidelines

Attendee: one resource, problem is providing link
... maybe querystring with format params

Ruben: an application-specific URL will be used
... not interoperable
... an application-specific URL is risky

Jacco: I disagree :-), successful negotiation room for neg.
... client/server: switching

ruben: already discussion, so there is a possibility

Attendee: not just tech issue, groups of communities interested in linked data or not, difficult to reach agreement
... simple tech solution to use this as a basis
... big fan content neg., in this case using linkheaders where server provides a list of alternates
... imagine profiles here
... client picks format of choice
... diff. link, but link canonical
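
A minimal sketch of the Link-header idea just described, in Python with the requests library; the dataset and profile URLs are placeholders and the header layout is illustrative, not a standardised mechanism:

    import requests

    # Hypothetical resource; the profile-specific URLs below are placeholders.
    DATASET = "https://example.org/dataset/42"

    resp = requests.get(DATASET, headers={"Accept": "text/turtle"})

    # A server following this idea might advertise alternates roughly like:
    #   Link: <https://example.org/dataset/42?profile=dcat-ap>; rel="alternate"; type="text/turtle",
    #         <https://example.org/dataset/42?profile=schema-org>; rel="alternate"; type="application/ld+json"
    link_header = resp.headers.get("Link", "")
    alternates = [part.strip() for part in link_header.split(",")
                  if 'rel="alternate"' in part]

    # The client picks the alternate whose profile it understands and
    # dereferences that URL, while the original URI stays canonical.
    for alt in alternates:
        print(alt)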

Ruben: other form of negotiation
... diff. negotiations

LarsG: RFC 2296 (1998)
... content neg. on MIME types
... there are uses for it
... precedent for future, this is what it should look like

Eugene: why do we have to negotiate? why segregate? why not send instance with full properties

Ruben: complex is more problematic

Attendee: content neg. not only neg. fall back options needed

Ivan: special header requires certain rights

danbri: can we distinguish more carefully on the profiles? look carefully why accept content neg.
... there are other options, what has gone wrong?

Ruben: publication for ages, needed there

danbri: accept header is huge failure, not used
... bit overstated, what about mobile?

ruben: timescale, publishing for ages, diff timescale

<AxelPolleres> my question/remark: content negotiation is cool, but hard to enforce, we need precedence/best practices to enable some methods for all possible publishers, e.g. CSVW defined 4 possible ways (in an order of precedence) to advertise CSV metadata: https://www.w3.org/TR/2015/REC-tabular-data-model-20151217/#locating-metadata

next speaker: LarsG

LarsG: violent disagreement here

[slides]

LarsG: i think we do need content negotiation
... we need a way for the client to choose resource
... five options [slides]
... getting the profile helps the client know what is there

Attendee: MIME cannot be touched, no attributes can be added
... it would be an enormous uphill battle

LarsG: also an n-squared problem, explosion of accept header
... RFC 6906, but we cannot state preferences with q-value
... q-values range from 0 to 1; 1 means most preferred
... xml doc with namespaces, each ns should conform to xml schema
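
To make the q-value point concrete, a hedged sketch of two of the options on the slide, using Python's requests library; the profile URIs are invented, and the separate profile header is shown only as a possibility, since no such header was standardised at the time:

    import requests

    URL = "https://example.org/dataset/42"  # placeholder resource

    # Option 1: overload Accept with a profile parameter plus q-values
    # (q ranges from 0 to 1, 1 = most preferred). Listing every media type
    # for every profile is the "n-squared" explosion mentioned above.
    accept = (
        'text/turtle;profile="https://example.org/profiles/dcat-ap";q=1.0, '
        'application/ld+json;profile="https://example.org/profiles/schema-org";q=0.8'
    )
    r1 = requests.get(URL, headers={"Accept": accept})

    # Option 2: keep media type and profile orthogonal in separate headers.
    # "Accept-Profile" is purely illustrative here, not a registered header.
    r2 = requests.get(URL, headers={
        "Accept": "text/turtle",
        "Accept-Profile": '<https://example.org/profiles/dcat-ap>;q=1.0',
    })

    print(r1.status_code, r2.status_code)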

herbert: not seen link header in request

LarsG: need to register profile headers
... none of the options is perfect

panel: topic content negotiation

antoine: before we start, maybe try recap value of solution, diff. flavors of data, need access to that
... we need ways to distinguish between flavors

Content Negotiation Panel

eric: measure against infrastructure where we do have a lot of control, where content neg. does work
... critical service, so people will work hard

herbert: agree with antoine, important problem
... use diff. xml schema, diff. formats, now we have the ingredients for coming with a solution
... simplicity of speccing it, will you find a listening ear?
... that's why i said harshly don't touch mime
... adoption, certain paths might be easier

ivan: i don't agree with eric. we need a solution that works for simple people, who are not in a data warehouse
... we need something that works, fall back standardized from ideal solution
... we try to push problem server side
... client only expresses it
... other possibility: server gives options, clients choose

<AxelPolleres> +1 to have nice solutions spec’d PLUS best practices for fallback solutions to allow “best effort” workarounds for those publishers without control of the servers.

ivan: we have to be able to provide that, get back from dumb simple server that says what it has
... allow people to publish data without touching the server

herbert: does not require a manifest

ivan: manifest generally

ruben: it is possible to do both, risk fallback
... can become complicated
... fallback will be the easy solution
... i think it's important to invest in tooling, rather than fallbacks

phila: w3c and others cannot say this is what you should do, does not work
... it has to work, simple option, people will take that
... if it does come down to new header, hard task

eric: do not raise bar for people who publish trivially, people who publish critical infrastructure, and people who try to predict the future, three diff use cases

LarsG: fallback, ignore the accept header, if you don't have diff profiles. where can I ignore things?

antoine: every fallback creates problems choosing between diff. options
... one of the problems flavors many resources at the same time. extract links interested in.
... what will the profile be? we are looking at the definition, you cannot say everything, if we work on speccing that stuff, need to be more precise

herbert: just want to add one caveat, something we ran into with memento,
... specify what has preference, profile in two media types, server can give only one

ruben: i guess maybe fallback is not that bad if you measure the consequences, show it is worth the extra investment
... what types of clients do we envision? what are the use cases? what else do they want to do?
... not publishing but also clients as consumers

antoine: next to techn. proposal a decision tree.

eric: i think the last slides had preference for new header, phil cannot do it, but it might fly

herbert: no entire menu, you want one fallback solution

LarsG: you can combine diff. solutions

herbert: depend on link headers, not all
... under negotiation more possible
... just like to add, there is way to avoid new header, but it takes you a long way.
... not ideal, but takes you there

phila: track process, RFC not w3c, but there is room for cooperation
... clearly this needs fixing, there is a community

<AndreaPerego> scribe: AndreaPerego

<danbri> This is officially a permathread now, … we crossed the 20 year mark (!!) earlier this year - see Warwick Framework DC discussions, http://www.dlib.org/dlib/july96/lagoze/07lagoze.html

<scribe> scribeNick: AndreaPerego

Lightning Talks

Carlo Meghini: [introducing the session]

Configuring the EntryScape platform to effectively support Application Profiles

Matthias Palmér: [slides self-explanatory]

Duplicate Evaluation in the European Data Portal

Simon Dutkowski: [slides self-explanatory]

Keith Jeffery: The problem you're describing is basically about versioning. Also, I disagree that people would like to have only 1 version - having more than one can be relevant for different use cases.

Simon Dutkowski: The issue is also to detect whether it's a different version or a duplicate.

Challenges of mapping current CKAN metadata to DCAT

Sebastian Neumaier: [slides self-explanatory]

Sebastian Neumaier: We define our quality metrics on top of DCAT

phila: Would you like W3C to include CKAN mappings, possibly in an informative section?

Sebastian Neumaier: Well, this is not a problem of the DCAT specification, but of its implementation.

Carlo Meghini: So, in your experience, why are not all CKAN portals supporting DCAT?

Sebastian Neumaier: Simply because they are not using the relevant CKAN extension

Attendee: About the distribution issue you mentioned, shouldn't this rather be modelled as a sub/super dataset?

Sebastian Neumaier: CKAN does not support the notion of sub/super dataset, so this would make the implementation complicated.

Interoperability between metadata standards: a reference implementation for metadata catalogues

Geraldine Nolf: [slides self-explanatory]

Keith Jeffery: We are doing the same task in CERIF - bridging metadata standards. In Flanders there's an organisation FRIS using CERIF, it may be worth contacting them.

<tessel> FRIS is the missed name

phila: About contribution from standard bodies, W3C and OGC are collaborating on this. The problem is that it's the community that needs to be active on this, and adopt the developed solutions.
... Another point is: how should we publish mappings?

Geraldine Nolf: Standard bodies should basically provide best practices on the use of different competing standards.

<danbri> scribe: Dan Brickley

<danbri> scribenick: danbri

Ronald Siebes (chairing) intro

ronald: previous session, a maze of standards, … to navigate and make the best of opportunities you need tools

Christian Mader on version control systems

christian: overview on a product/tool we are developing at Fraunhofer, based on phd work from Lavdim Halilaj
... motivation: in software development tools for versioning are well established, e.g. recently large impact of GitHub, with 35m repos, 40m+ users.

Tooling

christian: this is in software dev world. On the other hand, we have the need for developing/creating various vocabs e.g. schema.org, FOAF et al.
... these need some tooling in the background, for developing them in a collaborative way, as they go beyond what a single person can do
... in our research we are looking at distributed vocabulary development. Requirements: collaboration support, governance mechanisms with roles and permissions, we need communication methods, and to integrate quality assurance into the dev process and technical validation, testing, as in software development to verify vocab dev w/ competency questions.
... UX is very important, users need a good visualisation of what they're developing.
... we also have tools that support them on the vocab side. All these need integrating into a vocab dev environment. We came up with a tool called VoCol, opensourced on Github.
... it should give user support in the modeling task. A user creates a vocabulary, creating also instance data into a SPARQL endpoint
... database in background for testing purposes. Round-trip development supported by tool.
... we can cover whole lifecycle - authoring, validation, issue tracking, …[etc - see slide]
... for all this we do not invent new systems, but look at version control systems that are currently available, notably git/github
... people use their own preferred editors
... people use those but submit content into version control / hosting
... with VoCol we extend these mechanisms by building w/ a VoCol server
... which in addition to existing mechanisms, when a vocol server is attached/configured, after changes are committed to repo, additional things happen -
... such as syntax validation, giving timely feedback on problems
... a monitoring service in background can generate documentation, visualizations of ontology, build up environment for querying the data etc.
... users don't need to care about these things they just work, on top of the basic usual environment
... [shows screenshot of one-time configuration page]
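
As a rough illustration of the validate-after-commit step just described, a minimal Python check that only parses Turtle files and reports syntax errors; VoCol's actual server does far more (documentation, visualisation, querying), so this hook-style script is only an assumption about how such a check could look, not VoCol's code:

    # Minimal stand-in for a syntax-validation hook: parse every Turtle file
    # in the working tree and report errors, exiting non-zero on failure.
    import glob
    import sys

    from rdflib import Graph

    errors = 0
    for path in glob.glob("**/*.ttl", recursive=True):
        try:
            Graph().parse(path, format="turtle")
        except Exception as exc:  # rdflib raises parser-specific exceptions
            errors += 1
            print(f"{path}: {exc}")

    print(f"{errors} file(s) with syntax errors")
    sys.exit(1 if errors else 0)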

christian: we also provide a standard turtle-based editor
... directly in VoCol server on a Web page
... generates documentation based on existing tools
... runs documentation generator on each commit
... also visualization of ontologies generated
... integrated capability of querying the ontology and/or verifying instance data
... also can run various analytics on data you created
... e.g. histograms of types
... opensource licensed. Has been deployed in industrial settings, e.g. manufacturing company, 10+ people (knowledge engineers + domain experts)
... also R2RML support
... to conclude, VoCol builds upon (rather than replaces) existing version control systems, in a user friendly way. Can integrate several services. Extensible and loosely coupled architecture.
... based around git but could be applied to other protocols

[end]

Questions:

IvanHerman: Trying to understand exact setup, I can store/use my usual env on github, … set this up as we might e.g. use Travis?

christian: yes

IvanHerman: … so the service would generate the outputs e.g. into github via ghpages?

christian: currently visualizations etc are served as web pages but not currently committed into a repo

Lavdim Halilaj: there are a number of services directly provided by VoCol, and some services are limited by the nature of github currently

alasdair gray: with your diffs, are you just doing github diffs, or at the semantic level, "ontology/logical diffs"

a: semantic diffs too

phil: this is v interesting to us. Would it be possible for the documentation to be pushed to e.g. W3C's site?
... one thing that's missing so far from my perfect wishlist is various kinds of usage statistics
... integration with lodstats?

a: yes, we could do this

scribe: we are also working more on test-driven development e.g. metadata for tests and constraints and their relationships

Lavdim Halilaj answered the last Qs too

danbri: do you support shexacl etc?

lavdim: not yet but could integrate

<Linda> Shexacl, a funny name indeed.

Validata: A tool for testing profile conformance

Alasdair Gray: presenting Validata, result of an undergraduate project, see slides for list of co-authors

context: HCLS Dataset Descriptions
... requirements came from W3C Interest Group, … requirement was that the tool would be published as static content on a W3C Apache http server

alasdair: expected both Web UI and an API, supporting property and data value constraints, different levels of user messages (error, warning, info), and configurable for different application areas (hcls, dcat, open phacts etc).
... you could also change the level of validation e.g. "check this against everything that is mandatory and recommended"
... highly configurable - not hardcoded for HCLS
... example: A dataset must be declared as of type dctype:Dataset, must have a dcterms:title as a language-tagged string, and must not have dcterms:createdDate

(see slide for shape graph)

if we change our example data to use pav:createdOn instead of dcterms:creationDate it can be ok per the above rule (although there is also a closed world which will complain when encountering unknown terms)
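
As a rough paraphrase of those rules (Validata itself uses ShEx and JavaScript, so this is not its implementation), a small rdflib check over made-up data; dct:created stands in for the "createdDate" property named in the minutes:

    from rdflib import Graph, Namespace, RDF
    from rdflib.namespace import DCTERMS

    DCTYPE = Namespace("http://purl.org/dc/dcmitype/")

    DATA = """
    @prefix dct:    <http://purl.org/dc/terms/> .
    @prefix dctype: <http://purl.org/dc/dcmitype/> .
    @prefix ex:     <http://example.org/> .

    ex:ds a dctype:Dataset ;
        dct:title "Example dataset"@en .
    """

    g = Graph().parse(data=DATA, format="turtle")

    for ds in g.subjects(RDF.type, DCTYPE.Dataset):
        # must have a language-tagged dct:title
        titles = [t for t in g.objects(ds, DCTERMS.title)
                  if getattr(t, "language", None)]
        assert titles, f"{ds} has no language-tagged dct:title"
        # must not use the forbidden creation-date property (stand-in here)
        assert (ds, DCTERMS.created, None) not in g, f"{ds} uses dct:created"

    print("toy checks passed")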

uses ShEx Validation

we added requirement levels to Shex shape, to have MUST vs MAY

scribe: to support warnings on missing parts (? missed exact detail)

see https://github.com/HW-SWeL/Validata https://github.com/HW-SWeL/ShEx-validator

live demo at http://hw-swel.github.io/Validata/

[shows demo]

loads built-in chembl dataset

shows nice feature that errors are linked to line in the input causing the problem, where appropriate

configuration file is a shape expression

it looks quite like turtle

there are still a few bugs and the original creators have now graduated, so encouraging opensource collaboration for further development

open source javascript implementation, can be run via node

we used shape expression 2 years ago, shacl was then in its early days, one option could be to migrate or keep going with shex

[end]

questions?

Ronald: Chembl - does it load the whole thing into your browser? can I express similar constraints in OWL?

alasdair: it is only loading a description of chembl, not the entire chembl dataset
... if you had 1 million versions of chembl or distributions you might have issues but we don't load core data

re OWL, Eric answers

eric: You can't fail minimum cardinality constraint with OWL, it is open world

danbri: in OWL your constraints are constraints about the world, not about the data description of the world

mysterious-voice-from-above: with constraint violation, have you considered offering guidance on fixing e.g. standardized recovery methods?

alasdair: we split into 2 parts. The validator via API just reports a problem, and then the user of the API, such as our own GUI, has to do something with that. We tried to make useful user error messages.

Ronald: how much is this depending on the UI? Can you make this available e.g. to integrate with VoCol?

alasdair: yes, feasible

----

Willem van Gemert and Eugeniu Costetchi - Towards Executable Application Profiles for European vocabularies

https://www.w3.org/2016/11/sdsvoc/agenda#p23

Willem: At the Publications Office of the EU, we are an inter-institutional body, work for all the EU institutions, and publish for them, also access/re-use and preservation.
... the institutions send us publications and we disseminate them
... services include EUR-Lex, EU Bookshop, EU Open Data Portal, TED, EU Whoiswho, CORDIS
... we needed to standardize the metadata we get from these different sources
... often the EU institutions cannot agree amongst themselves, so we - as an inter-institutional body - got involved with standardization efforts
... around 70 authority tables (with dereferencable URIs)
... concept URIs now de-reference :)
... working to improve this further

full paper: https://www.w3.org/2016/11/sdsvoc/SDSVoc16_paper_23

Willem: We also maintain EuroVoc (EU multilingual thesaurus)
... popular work includes alignment mappings to other controlled vocabularies
... using vocbench to maintain eurovoc; received ISA funding for this
... "Application Profiles" that we have developed -
... began towards a common representation rich enough to capture all that we need
... using SKOS but with other standards too

For EU controlled vocabs: SKOS-AP-EU (SKOS, DCT, Lemon, LexVo, etc.). For EU Open Data Portal: DCAT-AP-OP (DCAT, ADMS, DCT etc); for [see slides for rest!]

Eugeniu: I'm helping develop these vocabularies and putting them in practice
... one issue we bumped into was making sure things are published properly as intended, and to spot any errors early.
... context of workflow - the source format of authority tables is XML
... from this via XSLT, we make ATTO XML, SKOS (RDF/XML), IMMC (XSD), SKOS-AP-EU (RDF/XML)
... Issues. Need for RDF validation, to make sure it meets our profile. Separation between the documentation and our implementation.
... checking of integrity constraints on source data
... also data fingerprinting - what patterns are in the data, what regularities emerge? (kind of backwards validation)
... originally app profiles were like a recipe

(see nice slide)

Eugeniu: different approaches to RDF validation. People have tried [stretching] OWL reasoners, SPARQL, XSD schemas, … rule languages like SPIN, ShEx, SHACL, SWRL, RIF, ...
... ShEx as predecessor of SHACL
... we translate the constraints of the app profile into a set of SHACL shapes
... we created a small commandline wrapper, makes things easy to automate in a publications workflow
... also generate HTML documents from the shape definitions, similar to the technique Alasdair described above

e.g. we have for SKOS a set of constraints for anything that claims to be a skos:Concept

scribe: can create arbitrary graph patterns using shacl / sparql constraints
... our main constraints were property cardinality constraints, domain/range constraints, and "if p then Q" constraints, e.g. if an end date then also need a start date; if C1 replaces C2, then C2 must have a deprecated status
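
A minimal sketch of that kind of cardinality constraint, assuming rdflib and pySHACL; the shape below is a toy, not the actual SKOS-AP-EU shapes, and the "if property A then property B" style rules mentioned above would typically need SPARQL-based constraints on top of this:

    from rdflib import Graph
    from pyshacl import validate

    # Toy shapes graph: every skos:Concept needs at least one skos:prefLabel.
    # The real SKOS-AP-EU shapes are richer (datatypes, ranges, conditionals).
    SHAPES = """
    @prefix sh:   <http://www.w3.org/ns/shacl#> .
    @prefix skos: <http://www.w3.org/2004/02/skos/core#> .
    @prefix ex:   <http://example.org/shapes/> .

    ex:ConceptShape a sh:NodeShape ;
        sh:targetClass skos:Concept ;
        sh:property [
            sh:path skos:prefLabel ;
            sh:minCount 1 ;
        ] .
    """

    DATA = """
    @prefix skos: <http://www.w3.org/2004/02/skos/core#> .
    @prefix ex:   <http://example.org/> .

    ex:concept1 a skos:Concept .              # no prefLabel -> violation
    ex:concept2 a skos:Concept ;
        skos:prefLabel "Budget"@en .
    """

    shapes = Graph().parse(data=SHAPES, format="turtle")
    data = Graph().parse(data=DATA, format="turtle")

    conforms, _, report_text = validate(data, shacl_graph=shapes)
    print(conforms)          # False: ex:concept1 violates the shape
    print(report_text)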

Eugeniu: we also want to look at property/class patterns to analyse what's in the dataset
... e.g. can estimate app profile ingredients for dct:FrequencyClass

e.g. this helps highlight other error forms, e.g. a typo in a namespace

Conclusions (see slide for details)

scribe: we use shacl as a source for our documentation and to validate the data
... helps us discover inconsistencies in the data and/or transformation rules
... we use it for skos-ap-eu but can be applied to any other ap (e.g. dcat-ap, org-ap)

but: is shacl stable enough? what to expect of the future?

Qs, direct and then general for all speakers (15 mins for questions)

antoine: checking points re shacl: do I understand well that you have a shacl specification for skos as part of your work?

eugeniu: not for SKOS itself but SKOS and a few other things in our AP
... skos plus eurovoc

see also https://lists.w3.org/Archives/Public/semantic-web/2005Nov/0000.html

eugeniu: using prev shacl, there was already a file for skos shapes in tool

eric: domain of shapes … is data that I'm expecting a service to produce or produce (or have a consistent db on my disk)
... relationship w/ ontologies is combinational
... narrow down the near-infinite number of ways of combining vocabularies

eugeniu: shape definitions look v similar to OWL

scribe's aside: see also https://www.w3.org/TR/schema-arch and https://www.w3.org/2001/06/rdf-xproc/1 for history on this

eugeniu: like different dialects of same language...

ronald: q about the community. is there some kind of w3c-like approach? can people suggest changes in the vocabulary? how can constraints be overridden?

eugeniu: the way constraints are generated, we do not complain about new/un-anticipated content
... e.g. we can give minimal definitions for Person or Concept

eric: problem with open shapes, you don't spot typos
... but you can say, if you have multiple shapes that apply to something, you can notice triples that are not touched/used

Matthias Palmér: q: one thing we are doing w/ shapes … defining templates, one for each property, … and then re-use

scribe: overriding most often cardinality constraints
... when I did Swedish DCAT-AP templates I used several from different sources, amended them

eric: restricting or expanding?

matthias: mostly restricting

eugeniu: we have some auxiliary shapes e.g. dated with start/end; also mapped pairs of entities
... we can then say that a corporate body is a dated thing, a mapped thing etc. to invoke these re-usable shape definitions
... to restrict how the corporate bodies table should look

matthias: we made a tool formulator that helps profile development with rdf forms

eugeniu: one thing shacl does further, it allows shape templates to be defined
... e.g. whenever you find a property A, make sure you have a property B, else it is a faulty entity [description]

ChristianMader: there has been work at the University of Leipzig … w/ similar functionality

danbri: what should w3c do, is anything blocking that needs fixing?

eugeniu: around the SHACL API implementation there are things we would like to do that we can't currently. A simple commandline tool would help. Others are making similar.

ivanherman: w3c itself wouldn't do that

eugeniu: indeed, community would do the tools

ronald: shex, shacl, … what's up?

eric/ivan/phila: [embarrassed noises]

eric: the biggest differences are that … way docs are presented. But in fundamental semantics differences, shacl builds on sparql extension mechanism and inference rules. shex has a core piece, a bnf-like behaviour if you see repeated properties, a and b or c and d you can only have one side of that (?). If we partition it to exclude this feature then there is potential for convergence.

[audience thanks the speakers]

Show Me The Way

<phila> scribe: Phil

<phila> scribeNick: phila

Discovering Open Data Standards, Deirdre Lee

Dee: Talks about value when data is integrated. Many standards from many places
... Highlights DWBP

[Slides]

Dee: Talks about IODC session
... Catalogue for discovery and evaluation of OD standards

[Slides self explanatory]

Dee: How can we bring this work into Ireland?
... Invites everyone to join in

PeterW: EC has had assessment method for standards for aeons, and have catalogue of them. Should this not be merged?

Dee: It's about building on what is there already
... Want to discover who is using the standard as well

PeterW: CAMSS has all of that

Makx: There is a standard for describing standards, it's called ADMS but we found that there was no interest in it
... Don't try to dilute the work, at least reuse it
... Things like usage etc. can be added to DCAT

Susanne: I am the maintainer of the CAMSS method so we should definitely be talking
... It's based on ADMS which is based on DCAT.
... Joinup has the same standards described using ADMS and we found that we can merge it all easily.

Ronald: Are you talking about syntaxes, linksets? vocabularies? Which linkset is appropriate for the vocabs
... Do you want to standardise all 4 of them?

Dee: Diff people work in diff areas

Show me the Way Panel

Jacco: My high level summary is that a lot of good work has been done with more ahead.
... Let's start with the 2 people we haven't met yet

RW: Rebecca Williams from GovEx. Prior to that worked at the White House on OD policy, geospatial data, before that in data.gov
... Worked on US DCAT extension
... I think Europe is further ahead with DCAT-AP than the US

MD: Makx introduces self
... editor of DCAT-AP, StatDCAT-AP

Jacco: Asks for a one line opening statement of what needs to be done and by whom?

AndreaPerego: In INSPIRE - the adoption of geospatial standards - there are a number of issues. In the EU context, guidance in diff languages
... Not all actors are familiar with English
... Don't ask us to adopt a standard without the tools to use it
... We have ISO/OGC standards, but industry is not always able to provide tools supporting the latest versions of the standards.
... I'd like to know how the others have done that

Makx: I'm encouraged by seeing so many people here building experiences with things like DCAT-AP
... When we defined DCAT we didn't have any experience
... Now we see lots of things that need to be changed/added - and W3C needs to do that, turn the implementation experience into DCAT 2.0

AG: We all think metadata is essential and fun. Data providers don't agree, it's a necessary evil. Need to create tools to help.

Dee: +1 to the before. Learn from the experiences of diff sectors
... Some tools more advanced. How can we share those experiences
... Perhaps W3C can improve DCAT...
... But also the sharing of experiences and tools that are available, but that's more the community's responsibility

AL: CERN is a bit of a bubble, we don't expect DCAT to meet our requirements. But a way to describe the actual dataset would be appreciated, not just the context
... We're the ones that know what the requirements are. Some tools and support would be useful.

RW: I'm part of the community and a Gov advocate. Tools important.

KJ: End user view of the assets available is most important. Need to interoperate across hundreds/thousands of diff metadata standards
... We have to interoperate across them. The end user should see the assets through their own prism. We have to make it look like that.
... Who should do it? The research community and W3C

Jacco: I was expecting to hear - Google Rich snippets
... What is the role of schema.org?

AG: In the bioInformatics community - they're setting up a task force this year to deliver a small extn to schema.org to help find the long tail of datasets
... Want to use the search engines to find lesser known datasets

RW: data.gov, majority traffic is from search engine and that's from rich snippets.
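
For context, the kind of markup behind those rich-snippet results: a hypothetical schema.org/Dataset description serialised as JSON-LD from Python; the property choices are illustrative, not the life-sciences extension mentioned above.

    import json

    # Hypothetical description; embedded in the dataset's landing page as
    # <script type="application/ld+json">, this is what lets search engines
    # surface the dataset in results.
    dataset = {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": "Example measurements",
        "description": "Toy dataset used only to illustrate the markup.",
        "url": "https://example.org/dataset/42",
        "license": "https://creativecommons.org/publicdomain/zero/1.0/",
        "distribution": {
            "@type": "DataDownload",
            "encodingFormat": "text/csv",
            "contentUrl": "https://example.org/dataset/42.csv",
        },
    }

    print(json.dumps(dataset, indent=2))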

KJ: Standardised vocabs are useful, but won't substitute domain-specific

AL: HEPData has loosely used schema.org, if people were looking for a dataset, we wanted to see if they'd find it through Google. This wasn't possible
... as far as we know, the only people who find it are the Google bots.
... We wanted to get users from Google - hasn't worked out so far.

Ronald: Who is familiar with Swagger/OpenAPI?
... Years ago we had lots of conversations, then it died out, then swagger came out, which is just a JSON schema to annotate restful services
... Wasn't meant to be a standard, just got used. Good news was more stylesheets generated nice clickable UI
... Swagger also used to match make between services
... Maybe that example from the grassroots will be instructive
... Don't make it complicated
... Maybe then it becomes useful

PeterW: The prob with Swagger, as with DCAT, someone sets it up, then when you get to use it, there's a gaping hole that needs patching.

ivan: One thing I didn't hear... metadata is a pain for providers. Only answer I heard is that we need better tools.
... We need tools that use the metadata and convince people that it's useful.
... Then tools follow.
... What is the reason to have metadata?
... Always boils down to show me the users

MD: We shouldn't be too pessimistic
... The people who have implemented DCAT-AP are asking for things. They don't need convincing.
... EU portals harvest other portals
... No lack of data, maybe lack of tools. No lack of willingness to provide data.

AG: data providers could see benefit of metadata for openPHACTS. Even with tooling, it's too costly

ivan: An example in a different area. In the publication world, they have ONIX metadata standard, hundreds of pages of XML that publishers
... use. The tools are difficult to use but publishers know they need it for their own survival.

RW: The US experience - the federal Open Data policy was passed. DCAT wasn't finalised at that time.
... Used a simpler version. People asking for a JSON file - had to explain what JSON was
... Building tools as we went - converters and validators
... All happened very slowly.
... Proprietary portals, Socrata was only supporter. Most top providers now support it.
... Lots of guidance needed. There are some tools, but we need more.
... The OD schema, lots of local governments don't know about it. Have to look to turnkey vendors
... who don't know either
... Need more mapping and more conversion tools

KJ: I agree with Ivan for the need for real life uses.
... Look at the SWIFT network - that works
... Amazon works
... Travel industry have metadata standards that work well across transport varieties
... These work because of financial gain. Ours doesn't work because it's an academic discussion

Daniele: We're setting up a hub to integrate data from different communities
... Big need was the need for guidance and tools
... We needed rich metadata to cover provenance.
... Many providers saying that they could set up a CKAN instance. It's easy to download and install.
... We shouldn't expect domain specialists to be IT specialists

Dee: During the last 2 days, the DWBP guidelines have been mentioned a lot. They're guidelines. They're easy to understand, they're clear, the benefits are there
... We put a lot of focus on making them attainable across the board and that's helping with take up.
... So maybe the next WG can dig into them a little deeper
... Maybe we need examples of the impact it can have

MD: Guidelines are necessary. But on the level of a W3C standard works well. People need guidance on how to do their work in their particular situation.
... DCAT didn't need selling as it provided a means to meet the end of making catalogues interoperable.
... It's the community that does the implementing.

Jacco: Phil highlighted the 2 main communities here.
... We all believe that research is getting better if we share our data and society gets better if we share gov data. Lots of money involved but not clear where it is

KJ: The highest spending nations are spending 7% of GDP, most around 2-3%. It's a drop in the ocean.
... Lots of open gov data comes from research
... Looking at data.gov.* most is in PDF which is useless for research.

Jacco: Makx says he wants to do a DCAT 2.0 which could be an outcome of this workshop. What would it contain?

MD: There are things missing and errors.
... There things that have been left out
... service based data access, versioning and grouping

MD: There were some problems we couldn't solve in DCAT-AP because we couldn't change DCAT
... relationships, and maybe Best Practices

<PWinstanley> ok

ivan: Time stamps?

MD: Sure. I was talking about things that people were screaming about.
... Mustn't overstretch.
... take on what we can do

Jacco: Can you be concrete about what's missing?

MD: Versioning, time series, inheritance... modelling of pav isn't standardised. Non-file datasets
... Clarifications of what a distribution is

Jacco: That sounds like 2.0, not 1.1

MD: We should add to what people have not start again

Geraldine: And taking one step back and looking at other standards?

MD: We'd want people from different communities
... So people can refer to their work

Dee: If there is a next WG, there is also scope for seeing if there are other vocabs that could be made recs, like Data Usage, dataset quality
... Should they be brought forward? Others that could be standardised

Jacco: It's good to have a stamp of approval on a vocab/data

Dee: People want to see how stable a vocab is? is it persistent?

KJ: Always very worried about people talking about quality
... It's usually the supplier that describes quality

MD: The work that was done in the DWBP doesn't say what good and bad is, it gives a vocab for expressing what you think. And the DUV allows users to say what they think
... Ways of describing what the quality is

AndreaPerego: On the DQV and DUV, there are pieces that are missing. We have heard about quality in the last days
... The DQV and DUV provide solutions to this. But they're a bit complex - less flat than DCAT
... Maybe future work can be to provide guidance on how to use them

AG: We should also be wary of expanding to cover everything that can ever be said about a dataset. DCAT is extensible. See examples from spatial, life sciences etc.

Kevin: Back up what Makx said about the way the DWBP handle quality
... It's a multi-dimensional thing
... It allowed different people to make different assessments.
... bear that way of working in mind

Ronald: The Web is waiting for accountability
... Someone writes what I say but is it true? What the validation?
... Accountability is important
... Maybe on statement/triple level
... Claiming ownership of the statements you make. JJC did some work on that.

AG: There is some work in that area - Nano-publications
... You end up with a huge number of triples.
... Can be moved around. For one claim you get a hundred triples
... UniProt is several billion triples

ivan: One thing that I haven't heard yet - we discussed this morning on HTTP headers etc. Is it something that W3C should take up? Leave to IETF? Leave it to Eric?

AG: If you go ahead with it, it's different from DCAT

scribe: Not same activity

<PWinstanley> phila: closing remarks

<PWinstanley> ... it seems clear that work on DCAT is needed

<PWinstanley> ... pleased to see so many communities, but there is a need to get Americans involved

<PWinstanley> ... we need to be cross-domain

<PWinstanley> ... whether 1.1 or 2.0 isn't a W3C decision, but it's clear something has to be done. However REC track is too heavyweight

<PWinstanley> ... a new w/g should do it

<PWinstanley> ... on the IETF - ???

<PWinstanley> ... W3C don't write guidelines, but write primers. We can go down the road a little, but are mainly a standards body

<PWinstanley> ... W3C doesn't do anything - its members and the community do

<PWinstanley> Jacco: thanks to the audience for participation

Bar Camp session: Property: Legal Ground

Many web resources published by governments (datasets, but also service descriptions and legislation) are created and published because that government is obliged to do so by some regulation. Let's call that the 'legal ground' for today. In this bar camp session we would like to discuss the need and possibilities to re-use existing vocabularies to define properties that can relate resources to legislation. We will look for instance at the Core Vocabularies of ISA2, the European Legislation Identifier and Akoma Ntoso. Any additional source is welcome.

Steps forward:

Bar Camp session: Illustrative data examples

Chair: Peter Winstanley

The two questions to answer:

  1. Is there agreement that there need to be gold standard datasets against which the effectiveness and efficiency of metadata standards and vocabularies can be compared and against which new standards could be created?
  2. If so, what are the next steps to get something off the ground that is persistent and sustainable? Like TREC (http://trec.nist.gov/) - gold standard datasets for the metadata models

Making a standardised platform of test data for metadata modelling

Mark: sometimes looking for detail on a dataset that I can't find. Had to call them up and ask them about it. DBpedia is a good but non-diverse use case

Using TREC analogy - how complex is the process of marking up a dataset? Need something comprehensive enough to illustrate how individual metadata schemas would be implemented and assess why one is better than the other in any use case?

Start with use cases but we need the proof that one schema is better than another to satisfy that use case.

Tooling and automation must be standardised against a common set of objects for analysis.

Some example use cases would also be handy in this case. When a standard is created there must have been some data involved that drove that process, is it possible to find that again and use it as a test corpus.

Test dataset ensures the intentions of the authors of that data.

Enable comparison between one version of dcat and another.

The dataset (and combination of datasets) needs to be sufficiently complex to test. The results need to be packaged and published to the standards creators so they can assess the results against their standards.

Start with: do we have a dataset? Do you have a metadata standard you want to use? What is the use case?

E.g.'s from the group.

EU open data portal - want to publish metadata as multilingual as possible using EuroVoc, automatic translation of certain fields. This dataset could be useful.

A: EuroVoc is a thesaurus but also is a dataset. This could be a useful test case as that sort of thing is not immediately thought of as a dataset.

Take a subset of those and show a worked example of how they are mapped into CERIF and how they get mapped into other standards?

How much metadata is common across research and how much is specific to the discipline/environment?

Often we only get metadata and some narrative, not a fully worked up example i.e. every aspect that was chosen and why.

DCAT and SKOS - become problems in the real world (not where they were created)

Need to have many more notes. SKOS has been around for a long time, it has worked well, but over time issues come up.

Need exemplars to help developers understand what good metadata on data looks like. Not adequate to guess what the intention of the metadata schema developer was.

What do the consumers want? And what is the intention?

Q2. How do we choose appropriate datasets, and what do they look like?

E.g. collection (rather than a dataset) - MOOC at the University of Delft. Problem with that, how can that be delivered as a consistent chunk?

This would represent all of the conditions. And brings in a lot of the entities, person, org, etc. It is sufficiently complex to not be fitted by one standard.

Good for interoperability because these are linked and connected so could see the links between the metadata from the data.

Datasets from years ago, there is little metadata available (apart from the really big well curated ones) - having examples is good for showing people what good curation looks like and induces good behaviour in others to maintain the quality.

Worth the group finding datasets that fit the requirements. Do an initial data drop and PW to sift through?

Would it be worth picking chunks of websites? Or pick specific dimensions for a set of datasets (i.e. base a collection around multimedia not just pick one dataset that fits all conditions for test), diversity through statistics.

This would allow specific benchmarking against certain conditions of the metadata performance.

Hosting: taxonomy and diff should be easy to host. Need to rely on the people who download and test to host the results. Not very expensive. Closed areas? Everything should be open.

Tabular, teaching materials, uniprot - do you test across all three or specialise in a particular kind of data mix? Let the tester have the choice, allow flexibility in approach.

Volcanos and earthquakes

Who are the users?

It allows comparison between the needs of these users. Hard to compare the outputs of these groups without having a test set.

Problem is a lot of metadata standards are community focused. But these still need to be compared when it is claimed that these standards are suitable.

Are there lots of competing standards and vocabularies? This does happen, a lot of people start off doing things then realise they move into the same space (try to do everything)

Now it's either comprehensive or minimal.

Application profiles: how do I determine what goes into an application profile?

Are we interested in examples of well described data but don't know the standard? (if there is one)

Highlight elements in datasets that are well done. (e.g. EEA, good work with versioning) - lots of people struggle with this element and struggle to get it into the model. Important to look for the elegant way of doing things.

Sustainability: if we are going to pick big chunks of websites (that contains datasets, information assets), where will we put it?

Categorisation of data is really important. How do we define what the datasets are about? Not all have a subject or are discipline-specific.

Let users categorise downstream.

So these chunks of websites can be tagged independently by users / testers.

Where can the data be hosted?

W3C(?) - everyone can get at it and want it going in perpetuity. Need other bodies and groups to say that this data is an open, useful, valuable asset and then it can be frozen and used long term.

Makx (from summary session): One possible host could be the European Commission?

Start small and add to it over time. Promote the same way TREC does and use it for everything. Have it as THE standard place.

Five different kinds of datasets that can be tested on. Streaming data test also.

Some way of recording in the metadata how the data has been used would be really useful. Doesn't limit the usage though, people come to the datasets and use it how they want. (numerous examples from science given about how unexpected benefits can be found from outside of the original use)

Has to be applicable to a wide range of use cases. Have to have the depth to illustrate the complexity of what others have to do. Ultimately, these datasets will be linked to tools.

Example: Visualisation and preview has a positive effect on data usage.

Bar Camp session: Versions and archives - how to annotate and query

See separate minutes

Bar Camp session: Modelling service/API-based data access

Chair: Andrea Perego

Participants:

Andrea: [Summarising the problem, starting from what proposed in the DCAT-AP IGs - see https://www.w3.org/2016/11/sdsvoc/SDSVoc16_paper_27#modelling-service-api-based-data-access]

Makx: Recap: we have files and endpoints/API - so this is about the distribution type. The type may be more than one.

All: [Discussion on the type of distributions from the OP's MDR and how to use them: http://publications.europa.eu/mdr/authority/distribution-type/ ]

Uwe: It may not be so straightforward whether it's a file or a service

Andrea: The distinction can simply be based on what we get back, a file or a query interface

All: [Discussing again about the distribution types from the OP's MDR: fine with downloadable file, fine with visualisation]

Matthias: It may be good to provide information on the "type of service", e.g., REST or SOAP - which is a different thing from the specific service interface - e.g., SPARQL, WMS, WPS

Makx: An option is to use the format, maybe

Simon: What we are still missing is how to query the service, and the relevant dataset

Makx: We have 4 levels: distribution type, service type, link to a service description, and a description of how to instantiate the service.

Simon + Matthias: [discussion about the need of being able to parametrise the call to the service to get the relevant dataset]

Matthias: I would also care about having a textual, human-readable description.

All: [No problem to have it, but we have to decide whether to have also a machine readable description]

Phil: I think this came up frequently, and the issue is that there's no standard way of specifying the relevant data subset.

All: [Discussion on the possibility of using URL templates to specify parameters as a general approach - see https://tools.ietf.org/html/rfc6570]
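
A small sketch of the RFC 6570 idea under discussion, using the Python uritemplate package; the template and parameter names are invented for illustration:

    from uritemplate import URITemplate

    # Hypothetical template a distribution could advertise for an API, with
    # the variables documented separately (see point 6 in the list below).
    template = URITemplate("https://example.org/api/observations{?region,year,format}")

    url = template.expand(region="BE", year="2016", format="csv")
    print(url)  # https://example.org/api/observations?region=BE&year=2016&format=csv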

Andrea: Trying to make the point:

  1. We agree to have the distribution type (dct:type) - see MDR NAL: http://publications.europa.eu/mdr/resource/authority/distribution-type/
  2. We agree to have a free text description to inform humans this is not a file and, possibly, how to use the service / API. A possibility is to use dct:description
  3. We agree to have the "service macro-type" (we can also call it "service category" or "service protocol") - SOAP, REST, etc. - but we have to decide how to model this.
  4. We agree to have the "service type" (dct:conformsTo) - WMS, SPARQL - see, e.g., https://github.com/OSGeo/Cat-Interop/blob/master/LinkPropertyLookupTable.csv
  5. We agree to have the template URL but (a) we need a specific property for this and (b) this does not address POST requests
  6. We agree to have the information used to instantiate the template URL - to be investigated how this information can be found / derived
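
Putting the six points together, a hedged sketch of how such a distribution might be described in Turtle (parsed here with rdflib); the dct:type and dct:conformsTo values are placeholders for entries from the MDR NAL and a service-type register such as the Cat-Interop table linked above, and ex:serviceCategory / ex:urlTemplate stand in for the properties the group agreed still need to be chosen (points 3 and 5):

    from rdflib import Graph

    # Illustrative only; every URI below is a placeholder.
    DISTRIBUTION = """
    @prefix dcat: <http://www.w3.org/ns/dcat#> .
    @prefix dct:  <http://purl.org/dc/terms/> .
    @prefix ex:   <http://example.org/vocab/> .

    <https://example.org/dataset/42/api> a dcat:Distribution ;
        dct:type <https://example.org/distribution-type/web-service> ;
        dct:description "REST API over the observations; use the URL template to select a subset."@en ;
        ex:serviceCategory "REST" ;
        dct:conformsTo <https://example.org/service-type/wms> ;
        ex:urlTemplate "https://example.org/api/observations{?region,year,format}" .
    """

    g = Graph().parse(data=DISTRIBUTION, format="turtle")
    print(len(g), "triples parsed")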

Barcamp session closes. Results reported to the plenary by Matthias.

[End of minutes]