See also: IRC log
<phila> Jacco and Phil made general opening welcomes
<phila> Keith: Follows slides which are self describing
<phila> scribe: phila
<scribe> scribeNick: phila
Keith: Talks about context of a
citation, which includes many facets
... Gets in a slight dig about 'the later Dublin Core'
[Not making many notes here as Keith's slides are comprehensive]
AndreaPerego: You said you have mapping, Keith. Temporal dimension seems to be missing?
keith: Yes, it's missing and we know that. Metadata often ignores temporal
AndreaPerego: Do you plan to add, maybe using PROV?
Keith: We're working on Prov-o with Kerry Taylor, and there's an ENVRI+ project
PeterW: Acronym hell. RDA means something else (Resource Description and Access)
Keith: Sorry, yes. Resource Description and Access
PWinstanley: So RDA is the body to use for this?
Keith: I did point out the acronym clash
ThomasDH: You talked about
locations and persons etc. In the EU context we have the Core
Vocs. I hope we can merge? Across Govt and science?
... Don't want different standards on different levels.
Keith: Yes, the VRE4EIC project
has that in its sights. CERIF has concept of declared
semantics.
... Doesn't say you must use these semantics, but provides
containers for semantics
AG: Talks about The HCLS
Community Profile: Describing Datasets, Versions, and
Distributions
... Talks about origin in OpenPHACTS project
... Highlights ChEMBL versioning issues
... ChEMBL was at version 13, but that number wasn't in our
data
... Still don't know if we used version 8 or 13 in
OpenPHACTS
... Includes provenance feature so you can see where data items
came from
... Now using ChEMBL 20
[Slides are self describing]
AG: contrasts DC and VoID as opposite ends of spectrum, neither met requirements for HCLS
-> https://www.w3.org/TR/hcls-dataset/ Dataset Descriptions: HCLS Community Profile
AG: Talks about mandatory and
optional properties
... This requires tooling
... Developed the validata tool, more on that tomorrow.
... Several implementations of HCLS profile
... Emphasises that we need to know about versions
Q: Adopted beyond your community?
AG: Not aware of it but it is generic and could be
Q: In latest version of DCAT-AP covers some of what you say
AG: isVersionOf didn't exist when we were doing this 4 years ago, glad it's in DCAT-AP
Jacco: Debate about whether version is in the URL?
AG: We don't say that, just that there should be different URLs for summary description, etc.
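[Scribe note: a minimal sketch of the three description levels AG mentions (summary, version, distribution), assuming the HCLS profile's use of pav:version and dct:isVersionOf; the example.org URIs are hypothetical]

```turtle
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix pav:  <http://purl.org/pav/> .

# Summary-level description: the dataset in the abstract
<http://example.org/chembl>
    a dcat:Dataset ;
    dct:title "ChEMBL"@en .

# Version-level description: one release, with its own URI
<http://example.org/chembl/20>
    a dcat:Dataset ;
    dct:title "ChEMBL release 20"@en ;
    pav:version "20" ;
    dct:isVersionOf <http://example.org/chembl> ;
    dcat:distribution <http://example.org/chembl/20/rdf> .

# Distribution-level description: a concrete serialisation of that release
<http://example.org/chembl/20/rdf>
    a dcat:Distribution ;
    dcat:mediaType "text/turtle" .
```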
AndreaPerego: Introduces DCAT-AP
-> https://joinup.ec.europa.eu/asset/dcat_application_profile/description DCAT-AP
AndreaPerego: Introduces JRC
[Slide on JRC is self explanatory]
AndreaPerego: Talks about wide
variety of methods and standards. Some people asking what
metadata is
... Talking about citations. Some people don't care about their
data being cited.
... Prov used for complex/complete info
... On Data Citation
... Data reproducibility is important for policy as well as
science
... Did a mapping exercise between DCAT-AP and DataCite
... Mostly good matches
... Agent Roles seems particularly hard
... May need a registry of roles to use across standards
... Skips to Publishing metadata on the Web
... Talks about mapping to schema.org
... Identified some gaps. But do we need to fill those in
schema.org?
... Do we need to publish all our metadata, or just what
improves visibility?
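[Scribe note: a hedged illustration of the mapping question, not the JRC's actual output - a DCAT-AP description duplicating only its visibility-relevant fields in schema.org terms; URIs and titles are hypothetical]

```turtle
@prefix dct:    <http://purl.org/dc/terms/> .
@prefix dcat:   <http://www.w3.org/ns/dcat#> .
@prefix schema: <http://schema.org/> .

<http://example.org/dataset/42>
    a dcat:Dataset, schema:Dataset ;
    # DCAT-AP terms, for catalogue-to-catalogue interoperability
    dct:title "Example dataset"@en ;
    dct:publisher <http://example.org/publisher> ;
    # schema.org duplicates of only the fields that improve visibility
    schema:name "Example dataset" ;
    schema:publisher <http://example.org/publisher> .
```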
AxelPolleres: Is there any effort
to endorse identifiers like ORCID?
... The link with STORK etc. would be interesting, but there's
no initiative AFAIK
Keith: Often IDs are associated with a role, like ORCID and Driving Licence info
Ivan: Force11 had their general principles. Did you match against those?
AndreaPerego: Yes, we have looked
at that, and FAIR
... Trying to address practical issues
Ivan: Sure they're at a higher level
Markus: I'm release manager of
DBPedia
... We have a lot of data in our releases
[Slides include text]
<AxelPolleres> FWIW, further to my question… there seem to have been some efforts to e.g. link STORK (national eIDs) to ECAS, cf. https://www.eid-stork.eu/index.php?option=com_content&task=view&id=253&Itemid=83 … the reason why I had asked about links to ORCID is that many of the information you have to provide to the EU for ECAS overlap with info covered in ORCID, e.g. publications, grants, etc.
[Slides still self-explanatory]
Markus: Talks about core and extensions in DataID for things like statistics, you need extra fields
phila: You used ODRL a little, but not a lot. Is it lacking?
Markus: Nothing fixed yet, open to change
phila: Good - ODRL on Rec Track now so speak up!
CM: Work with Soeren Auer
... Introduces Smart Services and Industry 4.0
... Talks about needing to be aware of privacy and some control
over data
... Want to build reference architecture for secure data
infrastructure, retaining sovereignty
... IDS = Industrial Data Spaces
... Industrial Data Space vocab as glue to capture
domain-specific semantics
[Slide self explanatory]
CM: IDS defining own protocol
-> http://ids.semantic-interoperability.org/ The Industrial Data Space Metadata Vocabulary
Q: How specific is this to industrial data? Can it work in other domains?
CM: It's about requirements, like security. Things like which vocabs to use for different tasks, not domains
PeterW: Have you thought of entity resolution? What metadata to associate with their data? They may not know.
CM: No, we've not looked at that. We want to partner with data publishers and help them make their data more easily found on the Web.
nandana: 2 use cases we have
problems with
... Discovery, I don't want to spend a lot of time
searching.
... data for training machine learning is hard to find
automatically
... Another use case - if I'm an ontology engineer, I'd like to
see how my vocab has been used in a dataset, to see if my
conceptualisation matches reality
... e.g. where is the SSN Ontology used?
... This can be done using LOD stats but if you want to know
how they were used, ranges etc. that's harder
[slides descriptive]
nandana: Wraps up brief presentation and invites questions
AG: Can you give a statistical report about which properties and classes are linked
Keith: Big range of topics. Expressivity etc.
makx: I was hearing things like
there is this standard, but it didn't work for me.
... You need to be aware - we need to try and solve a problem.
DC tries to look at common problems, CERIF tries to go into
depth
... We spent most time trying to solve common problems. DC and
DCAT start simple and then people complain that things are
missing
... You can extend
... You come up with different requirements and you soon come
up with 50 properties that are never used
MF: I agree. The general approach
of DCAT has its benefits
... But there is important data missing when we handle
datasets, e.g. more specific prov info
... Basic pattern of catalogue, dataset and distribution has
prevailed
... But we should look at how to improve DCAT and that's why
we're here
AG: In HCLS we didn't want to come up with a standard, just a profile that used existing ones
PeterW: I find lots of people
talking about different metadata frameworks but less about the
data that goes into them
... Some illustrations of marked up stuff. If I have a dataset,
what are the frameworks that match the pattern of the data that
I have
... Maybe ML techniques can be used.
Keith: The papers have more. The RDA has a metadata standards group that is making a list of the available metadata schemes. Lots of work coming from the Digital Curation Centre (see Kevin Ashley)
Q: Automatic machine readable to access the data itself, not just a URL
MF: Yes, this is a task for
us
... This problem came up a lot
... I'm hoping for insights from other directions
Q: Accessing satellite data for e.g. you need to restrict the access to specific subsets.
scribe: There are REST APIs like Swagger, but there's no predefined method
CM: Lots of approaches for
describing services on the Web, but haven't had a lot of impact
for some reason
... Maybe because they introduce complexity
nandana: Hydra CG is in that direction
Q: But that's very restricted to REST.
AndreaPerego: We have a bar camp on this specific topic :-)
MF: It's a big issue. We're dealing with datasets usually, not endpoints
phila: Talks about subsetting issue Open Search etc.
Keith: You can't get into the data because it's too big so you don't know what to ask for
AndreaPerego: Talks about different levels that can be addressed. Need to include users
Keith: GeoNetwork allows you to peek into the data to see if you're in the right area
CM: Working with industrial partners - tooling is very important. If you have a schema, you need the partners to tell you the detail you need
AG: We developed a very specific
tool that was user-driven
... focussed on user-friendliness so it's not easily
transferrable
<danbri> :)
<PWinstanley> could panel members please talk to the room rather than just among themselves - it's not easy to hear them without a PA system
<danbri> there's a microphone on the smaller desk, is it not wired up?
<PWinstanley> @danbri: it is needed at the larger table for the group discussion
Call for greater clarity in some of the DCAT definitions. Also guidance, perhaps a primer. Take various national APs as input
<PWinstanley> @phila: https://lists.w3.org/Archives/Public/public-dwbp-wg/2015Jul/att-0010/DCAT-APimplementationguide.pdf needs to be updated
<danbri> can I respond to the google/schema question?
<danbri> ok will respond later
Dee is in charge, not me :-)
Discussion around data that is rarely published as discrete versions
Need to handle data that changes all the time (real time data etc.)
<AxelPolleres> +1 volatile/dynamic datasets probably need different metadata than “slower changing” datasets… where more versioning vocab is an issue.
<PWinstanley> mutable vs immutable datasets is relevant information
<AxelPolleres> there are different forms of “mutable”, e.g. (monotone) growing vs. actually changing… is that reflected in any of the existing vocabs?
<PWinstanley> @AxelPolleres: yes, that's my point
<AxelPolleres> for us (use case crawling and tracking changes/evolution) it would be very(!) useful if these were advertised.
<jrvosse> Danielle Bailo: what are the boundaries of DCAT?
<jrvosse> Andreas Kuckartz: DCAT seems less useful for describing binary programs
<jrvosse> Makx: Some people in the WG see DCAT as very general that can describe many things
<newton> PWinstanley and AxelPolleres it's a real issue, we had some discussions about it during DWBP meetings
<newton> I would like to see this addressed on the charter of a new WG
<AxelPolleres> newton, are you aware of any vocabs that actually define this difference? i.e., monotone growth vs. arbitrary changes, changeFrequency, growth rate, etc.?
<antoine> newton, Axel: there used to be a vocabulary for 'accrual policies' at DC. Maybe not the right granularity though.
<jrvosse> Linda van den Brink from Geonovum on geospatial data
<jrvosse> ... a key problem is that people from outside the geo domain do not understand the standards we use
<jrvosse> * I'm scribing but feel free to add
<scribe> scribe: Jacco
<scribe> scribeNick: jrvosse
Linda is discussing a testbed testing use of mappings in the context of geoDCAT (see https://joinup.ec.europa.eu/node/154143/) and schema.org
see slides for testbed report
Q: Jacco: what should the key mission of a new WG be?
A: Linda: Small core of a standard; for SDI, coverage is really key, quality is also very important
Q: Phil: is something like the dcterms spatial concept core?
A: Linda: yes
Q: Daniele Bailo: Is the loss of data in the mappings really an issue for end users on the web?
A: Linda: Maybe not, for discovery it may not be a problem. There are levels of importance
Andrea Perego on GeoDCAT-AP
GeoDCAT-AP not replacing existing standards such as INSPIRE or ISO 19115 metadata for spatial, but providing extra interoperability by providing RDF-binding
scribe: need for http conneg on
profiles/schemas not just on format
... need to model dataset distributions, distinguish data sets
from data APIs
... need for best practices for quality-related descriptions,
there are too many patterns/standards
Q: Keith Jeffery: need spatial coordinates both for what is observed and from where, what are your thoughts?
A: Yes, this is a difficult problem, also in crowdsource context and other contexts, but is not addressed at the metadata level, more at the level of the features
Q: Herbert: New NISO spec "ResourceSync" from those that made OAI-PMH, but more "webby"
Herbert: I'm involved in signposting.org which is also relevant
Otakar Čerba joins panel
Otakar Čerba: I'm here because we are developing a smart points of interest RDF dataset with 120M POIs published, including via a SPARQL endpoint
Daniele Bailo joins the panel
Daniele Bailo represents the EPOS geo ESFRI with lots of geospatial data
scribe: with many different types
of data, this needs to be reflected in the metadata
... need to think about who the audience is: general web users
vs scientists from specific domains?
Q: Bart: is just getting the metadata currently not too complicated already?
<AxelPolleres> remark Re: dataset vs service description - this is also to some extent related to the issue we mentioned before some time up in the chat about fast-changing/highly dynamic data (which may be rather seen as a service than a dataset)
A: Andrea: Yes, for the general public ISO may be too much, especially if it is just for discovery purposes
scribe: for us, a dataset is what you decided to call a dataset
Linda: I see DCAT as something for portals to find and reuse each other's data sets, not necessarily as something for the end user
Daniele: I know the scientific user relatively well; they typically do not want general web search. The "web user" could be a software agent or a human user.
Otakar: same experience, users often do not use metadata. We have many Czech data portals but few real users
Andrea: my students use Google also because they do not know where the data is, this also makes it important to publish data on the Web
Daniele: I agree, but I'm trying to understand the requirements for doing so. In my community people tend not to use persistent IDs or even URLs. This is a challenge.
Bart: high level conclusion could be that there is too much info from the data in the metadata
Otakar: we also need feature metadata in the geo spatial domain
Andrea: data quality is more general than just spatial, and solutions can be reused for other domains
Linda: spatial coverage is key for first discovery step, use of all other quality and prov metadata is part of a second step
Daniele: what is needed on the scientific side is a huge effort on data and metadata harmonisation
<deirdrelee> scribe: deirdrelee
Show Me The Way session
Searching for data session
Dmytro Potiekhin
CivicOS: Governance & Campaigning Data Standard
dmytro: worked in ukraine
... working with civil society and citizens is very important
when there is danger of falsification at
elections
... integrating data is an important issue to protect
democracy
... it is obvious that without a proper vocabulary describing
the needs of civil society, this is impossible
... secondly, it is impossible to create such a vocabulary from
top-down approach
... e.g. even with vocabularies that the european commission
are working on
... this vocab or set of interoperability vocabs must be demand
driven
... something that is accepted by the citizens
... this is CivicOS
... if we can unite efforts around development of such
vocabularies, I am glad to help and this is what I am trying to
do with colleagues
... e.g. i am collaborating with the Stanford ??
Institute
... this is not just a problem for Ukranians, but it is a
global problem
... my final request would be, not to just give everything to
the governments.
... in democratic societies, it is okay for governments to have
all this technology, etc.
... but in countries still fighting for democracy, this can be
a problem
... for an example, there are often petitions to put pressure
on governments. but if the petition is done by the governments,
it is bureaucratic. and it is also giving them a contact list
of people that disagree with them
... undermining civil society and what they are trying to
achieve
... i encourage to keep developing vocabularies, but also to
retain the activation and development of civil society
kevin: questions?
... the situation at the moment isn't ideal for discovery and
interoperability of data, for the use-case you are talking
about - empowering citizens
dmytro: for the commercial part
it is working great, e.g. flight information automatically
added to google calendar
... so standards are already working, but this needs to be
brought to our community
... in egovernment, we see this too. but we see a trend to
focus on egovernment, and not on egovernance or
ecivilsociety
... these platforms should be controlled by civil society, not
by dictators
... even if personal identity issues are resolved, there will
be interoperability issues
... and this integration should not be less successful than
government or commercial sectors
... we need to apply these commercial standards in the
government and civil society sectors
phila: you mentioned you wanted to integrate with schema.org
dmytro: we are experienced in the
structures of what makes civil society work
... but these are not described in schema.org
... e.g. we have a list of 200 different types of non-violent
actions
... the leading vocabularies only document about 5
... we would like to collaborate on the development of
vocabularies and how to incorporate into schema
danbri: you can just go ahead and
develop a vocabulary. we have built extensions that facilitate
that
... there are some generic descriptions that could potentially
in schema.org core, and for more detailed terms, extension
might be best
... but happy to chat
attendee: there is some similar
work being done in the US, by Beth Noveck, called ???
... this can have an impact and is similar to what you were
talking about
<danbri> cf huridocs for human rights documentation
danbri: we were actually
discussing schema.org and the documentation of hate crimes last
week
... happy to continue discussions
Raf Buyle
Raf: representing the Flemish
Government
... we believe publi services should be centered around
citizens and businesses
... today if you ask for info online about opening times and
location of a public building you get it
... we think this should go further, e.g. providing info on
using services
... need to link to base registries
... flemish government is working on a strategy to add markup
to government portals
... we have seen success with schema.org, etc. this can be a
bridge between public and private sectors
... the citizen wants to find the info on the public service
they want, regardless of public body providing it
... we are looking at using and extending open standards, e.g.
from W3C, ISA, OGC, etc.
... the European Interoperability Framework states that you
should look at all layers of interoperability, semantic,
technical, etc
... base registries are fundamental, but it is very difficult
to get this data on the web, to integrate it with the private
sector
... imagine if we could ask private company, like google, about
public services. where you could make an appointment, all the
information at a user's fingertips
... bridging between public and private sectors. schema.org is
working very well. it has been widely adopted
... this could be a strategy to get public services information
out there
... schema.org was first to discover data, but it is also used
for new data services, e.g. bing and google knowledge
graph
... we have a pilot [slide with architecture diagram]
... we would like to combine schema.org with ISA core
vocabularies
... rdfs:seeAlso pointing from a schema.org resource to an ISA
core voc resource shows that more info is available
... we are waiting to roll this out on local and regional
level
... on the one hand, we are saying it is not difficult to
annotate data in this way
... we also want to see if these annotations are picked up by
major search engines
... and also interested in seeing if the search engines will
pick up the extra ISA core voc info and display that as
well
... i have some questions [FEEDBACK slide with questions]
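[Scribe note: a sketch of the bridging pattern Raf describes, with hypothetical URIs - lightweight schema.org markup pointing via rdfs:seeAlso to a richer ISA Core Public Service Vocabulary description]

```turtle
@prefix rdfs:   <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dct:    <http://purl.org/dc/terms/> .
@prefix schema: <http://schema.org/> .
@prefix cpsv:   <http://purl.org/vocab/cpsv#> .

# Lightweight schema.org markup that search engines can pick up
<http://example.org/service/building-permit#schema>
    a schema:GovernmentService ;
    schema:name "Building permit application"@en ;
    # signal that a richer description is available
    rdfs:seeAlso <http://example.org/service/building-permit#cpsv> .

# Richer description using the ISA Core Public Service Vocabulary
<http://example.org/service/building-permit#cpsv>
    a cpsv:PublicService ;
    dct:title "Building permit application"@en .
```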
kevin: questions?
PWinstanley: what kind of mechanisms can we use to avoid false information getting into system?
Raf: i talked about a feedback loop. perhaps there could be a validation check comparing the original data and data being presented
Luis-Daniel Ibáñez
Luis-Daniel: For better data
search
... we carried out an analysis of data searches by talking to
data professionals and analysing logs from data portals
... [reads feedback from interviews - quotes from data
professionals]
... we found a lot of things that we discussed in previous
talks
... something maybe to highlight is users asking for a
summary/preview of data
... with quantitative results, mainly desktop devices,
etc....
... 68% of queries came from web search engines, suggesting
that data search is a work-related activity and people are
relying on general-purpose search engines
... is this because people use what they know or data portals
are not doing their job properly? still open question for
us
... query characteristics show exploratory search, e.g. 'crime'
- show me all crime data, not specific query
Artemis Lavasa
Artemis: our aim is to capture,
analyse and preserve data
... we need to preserve the tools, processing steps, etc. we
capture everything
... we want to have as much context as possible so that we can
recreate the data in future
... we capture all that information via our forms
... we describe our information using a json-based schema
... it can handle complex metadata, which we have
... the data capture forms are rich, so can be very long and
vary a lot from experiment to experiment
... [showing slide of example metadata]
... we work closely with physicists and calibrate the forms
according to their needs
... in order to facilitate search, we need this metadata. e.g.
a physicist might want to look at a particular particle, so
looking at the title of the metadata is not sufficient
... we need intelligent search, very precise
... we played around with schema.org and json-ld. we could
describe the high-level information, but not specialised
fields.
... we would like to use a standardised approach
... i have tried to harmonise the schemas we have, but 80% of
fields are something unique to what a physicist wanted
Alejandra Gonzalez-Beltran
Alejandra: project funded by NIH
in the US
... DATS DatA Tag Suite is used to index data sources in
datamed
... [slide with online links to work]
... we focus on the findability and accessibility of
datasets
... we rely on adoption by data providers
... we started by collecting lots of use-cases from the
community and by looking at existing schemas
... we considered multiple existing models, e.g. schema.org,
datacite, rif-cs, hcls, dcat, etc
... these models are lacking some elements in use-cases
... we also looked at domain-specific models from biomed
domain
... the DATS model is a combination of elements we needed
... we split the model into core entities (adopted elements
from DataCite and FORCE11) and extended entities
... we did a mapping to schema.org and looking at elixir
... there are adopters of DATS, implementing it in their
systems
... i would like to thank groups that were involved
Richard Nagelmaeker
Richard: I would like to pitch an
idea to you
... when i started with Linked Data, the idea was to put all
data in one triple store
... the internet has DNS
... [shows slide with diagram]
... there is data that as an organisation you have control
over, and can have IRIs part of big picture
... but there is also information that as an organisation you
want to know, but is external to the organisation
... e.g. customers, suppliers, etc. but as they are external
they will have different IRIs
the issue is that the data behind a sparql endpoint will always contain the IRIs of the domain of the endpoint
scribe: DNS cannot help us
... but the problem is similar to what DNS solves, so could it
potentially help us find Linked Data IRIs?
... there are a number of building blocks, e.g. triple stores,
sparql endpoints, VoID
... results slide..... resolves the discrepancy between dataset
IRIs and IRIs of a SPARQL endpoint
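[Scribe note: the building blocks mentioned can be combined roughly as below - a VoID description advertising which IRI namespace a SPARQL endpoint covers, so a DNS-like lookup could route an external IRI to the right endpoint; all URIs are hypothetical]

```turtle
@prefix void: <http://rdfs.org/ns/void#> .

# A dataset of external entities (e.g. customers), advertised with VoID
<http://example.org/void#customers>
    a void:Dataset ;
    # where the data can be queried
    void:sparqlEndpoint <http://example.org/sparql> ;
    # the IRI namespace the endpoint's data uses; a lookup service
    # could match an external IRI against this prefix
    void:uriSpace "http://example.org/id/" .
```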
kevin: what evidence do we have
that any of the efforts we've been talking about today will
help people find the data they want?
... if we don't have evidence, how can we get it?
richard: just do it!
kevin: instead of 'let's build it and see what happens' - with the web, once something is implemented, how can you measure before?
PWinstanley: there was a project in 2004 on bioinformatics [Cancer Bioinformatics Grid -- caBIG] that disappeared. Were the lessons learned from that picked up by alejandra's project?
alejandra: there was a heavy
load, you had to build uml model, tag with ontologies, etc. I
think the lessons learned is that there is a more 'webby'
approach, lighter, easier
... at least in biomedical databases, there is a lot of effort
in curation. many databases already have ways to find
data
... hopefully we will help people search across databases
kevin: luis, you have looked at how users are actually behaving on open data portals
luis: one observation, for the qualitative part we were with data experts, but with the quantitative part, it was open to all users. those that just wanted an answer, not necessarily 'data'
danbri: there are two very
different paths, one making data available to billions of
people, and making data available to the tiny minority of
people who want to analyse specific data
... both are important and can have huge impact, but very
different.
... ultimately, we want computer/google that knows the
information, not just the data file
attendee: for luis' presentations, the one-word searches might be more related to people just finding an answer, not that there is structured data behind them
luis: we also know what people actually click on, not just search
Andreas Kuckartz: will technologies like sparql still play a role in ten years?
richard: it depends how you look
at IT. I think IT is a tool to help people. in this way i think
sparql will be there
... the way i look at neural networks, they are trying to do
something by themselves, this is a different kind of IT
Raf: if you look at rdf and sparql, i think these are approaches, more so than technologies
Artemis: i think in one way or another we all use rdf, so if not in this form it will survive in some form
luis: i think neural networks
will learn how to use rdf
... but the big question...will neural networks replace us
all!
danbri: sparql is a very practical technology, which tend to stick around. I'm sure it'll be seen as a tool for using data, like sql and purl. but AI might increase more and more
kevin: will it be difficult to enrich data?
luis: it is important to know what has been done to the data, for example with crime data if the data was anonymised on purpose, should there be an effort to uncover the data that was removed?
alejandra: whatever the data is,
what we care about is finding patterns in the data ...
... [question to luis] because you were looking at user search,
were you constrained to keyword search?
luis: we wanted to see if people asked questions or used keyword search
phila: In Raf's case, data is
relevant to everyone in Flanders (public) and Artemis' case is
relevant to very specialised physicists
... danbri said that csv data can be incorporated into google's
knowledge graph using csv on the web
artemis: there is a cern data
portal, with huge data releases - TBs and PBs of data
... there is also private data, meant for collaborations
... the analysis is for specific purposes, people wanted to
preserve this, but it is very sensitive, it won't be opened.
most people also won't be interested in this data
... aim was to help physicists preserve their analysis. that
was the demand
Raf: why are public services data
important?
... 1. if I want to move to flanders
... and want to set up a business
2. for business intelligence - e.g. you can compare how flanders compares to other regions, e.g. for a place to live
scribe: if this data is on the Web, more people can use these public services, lowering the barriers
danbri: a good thing from private orgs is that even if they can't release the data, you can release software
kevin: and also release info that the data is there, so that people can follow up on potential data access
BartvanLeeuwen: Raf, you asked is
'annotated data the new dataset'?
... so danbri is this something that will be possible
danbri: you can do
?productname?
... there will also be dataset search, there is a page online
already, will distribute
... we are looking at research data, data portals, we'll see
what we can build
Raf: a lot of portals have feedback channels. should schema.org incorporate that?
danbri: maybe. schema.org is a dictionary, you have to build things with it. we have reviews, rating, etc
Attendee: Describing datasets properly is a problem. also the problem is mapping the questions from natural language to sparql for example. A lot of problems around data discovery relates to where the data is and what kind of data there is. Any comments on natural language to formal queries?
Alejandra: there is one pilot project who are looking into this question of how the user can find datasets
danbri: we did put
question/answer in schema.org,e.g. stack overflow. and
researchers are starting to pick up on that
... i would hope there would be more focus on social aspects of
open data portals, which could in turn help discoverability
attendee: What mechanisms can
address general search but also very focused search?
... how does this affect reproducibility
Raf: we use schema.org at a general level for discoverability, combined with the core vocabularies which help with the specifics
alejandra: in the curation
practices, generically it is very important to consider this.
it is very relevant to know higher-level terms and more
specific terms
... for reproducibility it goes much further; you not only need
the discoverability metadata, but also how the data was
prepared, etc.
danbri: we recently added a variable field in schema.org
luis: to me there is the dataset
search levels in metadata
... but to answer a more detailed question, you have to go
deeper
... what is the effort involved in adding this to metadata
AndreaPerego: another piece of
information on helping to find the data is how the data is
being used
... e.g. feedback from users, this is important data
... datasets have been used for purposes other than their
original purpose
... people can see how other people have used the data and it
might help them decide if it's useful for them
Raf: if you knew information
about when people physically go to public services, this could
help advise when people should go
... info on how public services are used could help improve the
service provided
luis: i agree, it's important to know how data is being used, but it's difficult to convince users of this
alejandra: something very important is data citations. it is great to have, but it has limitations
danbri: when datasets are used
and discovered, they can go to their funders and justify the
availability of data
... data citation in the scholarly sector is done, but it is
not common for example in media
... this might be turning point
alejandra: it is also important to have contact information
<newton> wonders if W3C WebMention spec could help with this issue of citation and data usage [ https://www.w3.org/TR/webmention/ ]
kevin: it is difficult to measure
how different the impact would be if we had implemented
something differently
... guidelines like the w3c dwbp and csv on the web have been
referenced a lot today, they're obviously very useful for the
community
Time for wine and canapes!!!
<newton> +1 Dee
Present: PWinstanley, AndreaPerego, nandana, phila, newton, Ivan, BartvanLeeuwen, Caroline_, brandon, damires, LarsG, Caroline
Scribes: phila, Jacco, deirdrelee
Agenda: https://www.w3.org/2016/11/sdsvoc/agenda
Date: 30 Nov 2016
Minutes: http://www.w3.org/2016/11/30-sdsvoc-minutes.html