W3C

Disposition of comments for the Data on the Web Best Practices Working Group


Not all comments have been marked as replied to. The disposition of comments is not complete.

In the table below, red in the "WG decision" column indicates that the Working Group didn't agree with the comment, green indicates that it agreed with it, and yellow reflects an in-between situation.

In the "Commentor reply" column, red indicates the commenter objected to the WG resolution, green indicates approval, and yellow means the commenter didn't respond to the request for feedback.

Commentor | Comment | Working Group decision | Commentor reply
LC-3036
# Preservation
There are existing guidelines about the process of preservation itself. Those could be cited to guide people on how to do preservation. There are also a lot of repositories that exist to preserve data at different levels (institutional, national, ...).
There should be something there! In terms of BPs, the following points should be addressed:
* As a data publisher, do you want to, or have to, preserve your data?
* If yes, what to preserve?
* Who to give it to? Only one archive, or several? An archive could be mandated to do preservation regardless of its quality as an archive. There are existing certifications (DSA, etc.) that can be used to help publishers make informed choices about who to trust.
* Think about the level of access for the preserved copy (public, private, ...)
* The type of data matters for preservation. Publishers need to be aware of that. It is also important to think about preserving with context, and thus to preserve not only the dataset alone but also the resources that are needed to make sense of it (documentation, schemas, ...)
A data preservation section was included in the document.
http://w3c.github.io/dwbp/bp.html#dataPreservation
yes
LC-3003 Andrea Perego <andrea.perego@jrc.ec.europa.eu> (archived comment)
1. BP-1 ("Document data") seems to mix two different requirements:
(a) publishing data documentation (metadata)
(b) publishing metadata in human-readable formats
Is this correct?
In such a case, shouldn't these be rather addressed by two different
BPs? The requirement of publishing metadata shouldn't necessarily
address *how* this is done. This would also be inconsistent with the
fact that the requirement about publishing metadata in
machine-readable formats is addressed by a specific BP (BP-2).
Text is being updated; Bernadette has been in touch with Andrea. Resolved at f2f 2015-04-13. yes
LC-3004 Andrea Perego <andrea.perego@jrc.ec.europa.eu> (archived comment)
2. BP-2 ("Use machine-readable formats to provide metadata"), section
"Intended outcome":
"It should be possible for computer applications, notably search
tools, to locate and process the metadata easily, which makes it human
readable metadata, machine readability metadata."
(a) It is unclear why this "makes it human readable metadata".
(b) There's probably a typo in "[... ] machine readability metadata" -
shouldn't this rather be "[...] machine readable metadata"?
At f2f 2015-04-13 yes
LC-3005 Andrea Perego <andrea.perego@jrc.ec.europa.eu> (archived comment)
3. BP-2 makes the point about the use of machine-readable formats for
data discovery via software agents, including search engines. It
points also to specific machine-readable metadata serialisations that
can be embedded in human-readable metadata, and that are currently
used by search engines to optimise discovery. However, I have two
questions on this:
(a) Shouldn't it be a requirement for human-readable metadata to *always*
embed their machine-readable version? This applies also when machine-readable
metadata are available separately. I see a couple of use cases for
this - e.g., optimising discovery via search engines, existing browser
plug-ins able to read RDFa, etc.
(b) Do you think that the requirement of being "discoverable" by Web
search tools should be extended to data? BP-12 partially addresses this,
but not explicitly. I'm asking since this issue may be relevant to the
SDW WG - see http://www.w3.org/2015/spatial/wiki/BP_Requirements#Content_need_to_be_crawlable.2C_then_able_to_ask_search_engine_or_other_service
All linked in with the wider issues around metadata, which are being updated. yes
LC-3048 Annette Greiner <amgreiner@lbl.gov> (archived comment)
Data enrichment
Issue: to discuss that enrichment yields derived data, not just metadata. For example, you could take a dataset of scheduled and real bus arrival times and enrich it by adding on-time arrival percentages. The percentages are data, not metadata.
Issue: to discuss the meaning of the word “topification”.
The section was reformulated. yes
LC-3046 Annette Greiner <amgreiner@lbl.gov> (archived comment)
* Data Identification
Issue: to discuss limiting this section to information that applies to publishing *data*.
yes
LC-3049 Annette Greiner <amgreiner@lbl.gov> (archived comment)
Data Identification
Issue: to discuss limiting this section to information that applies to publishing *data*.
yes
LC-3050 Annette Greiner <amgreiner@lbl.gov> (archived comment)
Provide data up to date
Issue: to debate whether the goal should be to adhere to a published schedule for updates.
Duplicate of LC-3047. tocheck
LC-3037 Christophe Guéret <christophe.gueret@dans.knaw.nl> (archived comment)
# Preservation
There are existing guidelines about the process of preservation itself. Those could be cited to guide people on how to do preservation. There are also a lot of repositories that exist to preserve data at different levels (institutional, national, ...).
There should be something there! In terms of BPs, the following points should be addressed:
* As a data publisher, do you want to, or have to, preserve your data?
* If yes, what to preserve?
* Who to give it to? Only one archive, or several? An archive could be mandated to do preservation regardless of its quality as an archive. There are existing certifications (DSA, etc.) that can be used to help publishers make informed choices about who to trust.
* Think about the level of access for the preserved copy (public, private, ...)
* The type of data matters for preservation. Publishers need to be aware of that. It is also important to think about preserving with context, and thus to preserve not only the dataset alone but also the resources that are needed to make sense of it (documentation, schemas, ...)
yes
LC-3038 Christophe Guéret <christophe.gueret@dans.knaw.nl> (archived comment)
# Feedback
This section should also relate to preservation. One way to do it is to list stakeholders around preservation (see RDA for an impression).
BP: there should be identifiers to give feedback on a specific part of the data
BP: Use feedback as data enrichment, e.g. crowd annotation
yes
LC-3039 Christophe Guéret <christophe.gueret@dans.knaw.nl> (archived comment)
# Metadata
Need to say where the taxonomy comes from. The document speaks about 3 types instead of the 5 commonly observed. The two missing ones are preservation metadata (how, where, ...) and technical metadata (EXIF,...)
BP: Use standard terms but then make extensions public when they are needed
To update the document to be more generic about the different types of metadata. Christophe said: "It should also be ok saying there are several type of metadata without going into much details." yes
LC-3040 Christophe Guéret <christophe.gueret@dans.knaw.nl> (archived comment)
# Data quality
Does this apply to data or metadata?
There are a lot of granularity aspects in data that need to be taken into account.
How do you define quality?
Completeness of the data is not related to quality on its own. There should be an element of comparison to check the completeness against something (e.g. "data is complete according to EDM").
There should be something about quality vs. usability, partly because fitting data into quality standards can lead to losing important data (mainly everything that does not fit).
yes
LC-3053 Erik Wilde <dret@berkeley.edu> (archived comment)
- "Best Practice 14: Provide data in multiple formats" might want to say
if that should be done by different URIs, or one URI and HTTP conneg.
that's a very typical question publishers have, so it should be
mentioned at the very least, even if the answer is "we have no specific
recommendation either way".

- "Best Practice 14: Provide data in multiple formats" should say that
for fragment identifiers to be consistent across formats, care is needed
to make sure that this is the case (as much as possible, depending on
the formats and their features).
tocheck
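The two options the comment raises for BP 14 can be sketched concretely: either each format gets its own URI (e.g. /dataset.csv, /dataset.json), or one URI serves all formats and the server picks a representation from the HTTP Accept header. Below is a minimal, simplified sketch of the server-side selection step for the second option; the media types and function name are illustrative, and a real implementation would also handle q-values and wildcards per RFC 7231.

```python
# Simplified sketch of content negotiation for a dataset at a single URI.
# Exact-match lookup only; real servers also honor q-values and wildcards.

SUPPORTED = {
    "text/csv": "csv",
    "application/json": "json",
    "text/turtle": "ttl",
}

def negotiate(accept_header: str, default: str = "json") -> str:
    """Pick a dataset serialization from an Accept header (exact match only)."""
    for media_type in accept_header.split(","):
        media_type = media_type.split(";")[0].strip()  # drop q-value parameters
        if media_type in SUPPORTED:
            return SUPPORTED[media_type]
    return default

print(negotiate("text/turtle, application/json"))  # -> "ttl"
print(negotiate("application/xml"))                # -> "json" (fallback)
```

With separate URIs this logic disappears, but consumers must then discover each format's URI themselves; conneg keeps one canonical URI at the cost of the selection step above.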
LC-3055 Erik Wilde <dret@berkeley.edu> (archived comment)
- best practices 24 and 27 kind of conflict. one important idea of REST
is to avoid versioning, and having versioned URIs is a pretty certain
sign of bad design smell when it comes to media types and API design.
tocheck
LC-3033 Maurino Andrea <maurino@disco.unimib.it> (archived comment)
Please consider that this BP is strictly related to the data quality BP, because of the way temporal quality dimensions are calculated; the two BPs must be correlated and coherent.
This BP concerns how to keep data up to date, rather than providing information about whether data is being updated as expected.

The discussion about data quality assessment is in the scope of the Data Quality Vocabulary (http://w3c.github.io/dwbp/vocab-dqg.html).
yes
LC-3028 Maurino Andrea <maurino@disco.unimib.it> (archived comment)
Issue 6: IMHO at least a very well defined subset of metadata terms MUST be described by means of standard terms, and consequently expressed with well-known RDF vocabularies. Examples for such a mandatory list of metadata terms could include the owner, the type of license associated with the data, and the date of last modification.
Changes were made on the metadata section and specific vocabularies are mentioned in the Possible Approach to Implementation section. yes
LC-3029 Maurino Andrea <maurino@disco.unimib.it> (archived comment)
According to the experience of the Comsode project, a license is a mandatory requirement for publishing data on the web: without a license there is no clear indication about the limits (if any) on the usability of such data, and this lack significantly reduces the possibility of having a real web of data. One could suggest that, in case someone publishes data without a license, such data can be consumed for free by both humans and machines, but cannot be modified, reused and so on without an explicit acceptance by the data owner.
I am not sure if we can make such a suggestion, because this may depend on the policies of the organization. I think we can only suggest that data license information should be available. yes
LC-3032 Maurino Andrea <maurino@disco.unimib.it> (archived comment)
This is a big issue, because while it is correct to protect people's right to privacy, there is also the "right to know" about activities carried out by public administrations (for example, legal sentences). In Italy, just as an example, personal information, including salary, related to people working in public administration at higher levels, or to consultants paid with public money, must be released as open data under Italy's transparency decree for 5 years (after that period, the "right to be forgotten" applies, which many of you know from the Google vs. European Union case).
Some actions were taken to change the BP for Sensitive Data. Changes will be made for the next version.
http://www.w3.org/2013/dwbp/track/actions/164
http://www.w3.org/2013/dwbp/track/actions/166
yes
LC-3030 Maurino Andrea <maurino@disco.unimib.it> (archived comment)
Issue 7: I suggest drawing up some strategies for how to attach quality information. In some cases such information is defined inside the data (for example, when the time of last modification of an item is part of the dataset itself); in other situations there is the need to express quality dimensions related to the schema description only (e.g. conciseness of the schema), or related to the dataset. I also suggest (but it is clear that I'm a little biased on such a topic :) ) to better describe how to express the quality information (including quality dimensions, the adopted quality metric, and the quality value; see for example [1] as a starting point)
To keep this discussion for the Quality and Granularity Vocabulary document. yes
LC-3031 Maurino Andrea <maurino@disco.unimib.it> (archived comment)
This is a crucial problem, in particular in the case of linked data, due to the possible impact on existing interlinked resources. Some good practices could be discussed.
In the current version of the document there is a section for Data Versioning and the following BP:

http://w3c.github.io/dwbp/bp.html#VersioningInfo
http://w3c.github.io/dwbp/bp.html#VersionHistory

It is not in the scope of the document to propose BPs specific to linked data. The proposed BPs are generic.
yes
LC-3007 Herbert Van de Sompel <hvdsomp@gmail.com> (archived comment)
(1) vocabulary versioning

The Memento-related comments I made about Data Versioning apply
equally to Vocabulary Versioning. All approaches described in
<http://mementoweb.org/guide/howto/> apply to data and vocabulary. As
a matter of fact, when implementing Memento protocol support for both
data and vocabularies used in data, temporal versions of the data can
automatically be aligned with the temporally correct version of the
used vocabulary.
Resolved 2015-04-13 that we refer to Memento in the 'possible implementation' section of the relevant BP; resolve yes to the comment and action Newton to write to Herbert.
The action assigned to Newton was done by Christophe Guéret, who replied to Herbert.
tocheck
LC-3034 Christophe Guéret <christophe.gueret@dans.knaw.nl> (archived comment)
# Overall points
The document concerns data publishers more than it concerns consumers. This also seems to be reflected by the composition of editors/contributors; there should be more data consumers jumping in and adding BPs that matter to them.
"Data must be available in machine readable" -> only should; must is way too strong. Some data consumers may want to have access to data that is not machine readable (e.g. a scanned old document) and not be restricted only to its machine-translated counterpart (e.g. an OCRed old document)
yes
LC-3035 Christophe Guéret <christophe.gueret@dans.knaw.nl> (archived comment)
# Data vocabularies
Issue 9 : we should stick to using "vocabularies"
Issue 10 : we should aim at being generic
BP 19: there is a problem in advocating for simplicity, as this can prevent having rich vocabularies. It could instead be suggested that publishers may provide vocabularies as rich as needed, but strive to base them on "simpler" ones (e.g. core ontologies / upper ontologies / ... ) to ensure there is always a minimum level of understanding. See, e.g., http://arxiv.org/abs/1304.5743 for a discussion about this.
The group discussed this topic and we got a consensus about the terms. yes
LC-3052 Erik Wilde <dret@berkeley.edu> (archived comment)
- when it comes to versioning, i am always recommending to focus on
openness and extensibility and have robust and well-defined models for
those (this almost always requires well-defined processing models for
data). this often avoids the need for versioning, which when done badly
will be a breaking change.

- when it comes to versioning, it is important to distinguish between
breaking and non-breaking versioning changes. this comes down to the
comment above: good openness and extensibility makes it easier to have
non-breaking versioning, which helps tremendously in decentralized
ecosystems.
Addressed: We now have a BP “Avoid breaking changes to your API” tocheck
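The openness-and-extensibility point above can be illustrated with a small sketch of a "must-ignore" processing model: consumers process only the fields they know and silently skip the rest, so adding fields to a record later is a non-breaking change. All field names here are illustrative, not taken from the DWBP document.

```python
# Sketch of a "must-ignore" processing model: a v1 consumer keeps only
# the fields it knows, so a v2 record with extra fields still processes
# cleanly — the addition is a non-breaking change.

KNOWN_FIELDS = {"id", "title", "modified"}

def process_record(record: dict) -> dict:
    """Keep known fields, silently ignore unknown ones (must-ignore rule)."""
    return {k: v for k, v in record.items() if k in KNOWN_FIELDS}

# A hypothetical v2 record adds a "license" field the v1 consumer never saw:
v2 = {"id": "d1", "title": "Bus arrivals", "modified": "2015-04-13", "license": "CC0"}
print(process_record(v2))  # the extra field is ignored, not an error
```

Under this model a breaking change is one that removes or redefines a known field; only those would require a new versioned identifier.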
LC-3056 Erik Wilde <dret@berkeley.edu> (archived comment)
- regarding best practice 30, i am wondering if
https://github.com/dret/I-D/blob/master/sunset-header/draft-wilde-sunset-header-00.txt
is something that might be worth mentioning in some form. this is
currently a pre-I-D draft, but maybe the general idea of communicating
resource availability is relevant for DWBP?
tocheck
LC-3008 Herbert Van de Sompel <hvdsomp@gmail.com> (archived comment)
(2) preservation

The Memento protocol can play a significant role in the realm of
access to preserved data, as is exemplified by its broad adoption by
web archives and the demonstration implementation of the DBpedia
archive. But it also plays a role in making preserved/captured
resources recognizable via the Memento-Datetime header (expresses
datetime of capture/preservation) and the HTTP Link header that
carries an "original" link that connects a preserved/captured resource
with the URI where it originally resided.
An example that uses Memento was created in BP29, and it is available in the current draft.

Christophe and Herbert made this: https://lists.w3.org/Archives/Public/public-dwbp-wg/2015May/0103.html
yes
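The two header mechanisms the comment describes can be sketched in a few lines: a Memento-aware archive attaches a Memento-Datetime header (the datetime of capture, in HTTP-date format per RFC 7089) and a Link header with an "original" relation pointing at the URI where the resource originally resided. The URI and datetime below are placeholders, not values from any real archive.

```python
from datetime import datetime, timezone
from email.utils import format_datetime

def memento_headers(original_uri: str, captured_at: datetime) -> dict:
    """Build the two headers an archive attaches to a preserved resource."""
    return {
        # Datetime of capture/preservation, as an RFC 7089 HTTP-date:
        "Memento-Datetime": format_datetime(captured_at, usegmt=True),
        # Link back to the URI where the resource originally resided:
        "Link": f'<{original_uri}>; rel="original"',
    }

headers = memento_headers(
    "http://example.org/dataset",  # placeholder original URI
    datetime(2015, 4, 13, 12, 0, 0, tzinfo=timezone.utc),
)
print(headers["Memento-Datetime"])  # -> Mon, 13 Apr 2015 12:00:00 GMT
print(headers["Link"])              # -> <http://example.org/dataset>; rel="original"
```

A client that sees these headers can recognize the response as a preserved capture rather than the live resource, which is the recognizability property the comment highlights.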

Developed and maintained by Dominique Hazaël-Massieux (dom@w3.org).
Please send bug reports and requests for enhancements to w3t-sys.org