W3C

Disposition of comments for the Data on the Web Best Practices Working Group


Not all comments have been marked as replied to. The disposition of comments is not complete.

In the table below, red in the "WG decision" column indicates that the Working Group didn't agree with the comment, green indicates that it agreed with it, and yellow reflects an in-between situation.

In the "Commentor reply" column, red indicates the commenter objected to the WG resolution, green indicates approval, and yellow means the commenter didn't respond to the request for feedback.

Commentor | Comment | Working Group decision | Commentor reply
LC-3036
# Preservation
There are existing guidelines about the process of preservation itself. Those could be cited to guide people on how to do preservation. There are also a lot of repositories that exist to preserve data at different levels (institutional, national, ...).
There should be something there! In terms of BPs, the following points should be addressed:
* As a data publisher, do you want to, or have to, preserve your data?
* If yes, what to preserve?
* Who to give it to? Only one archive, or several? An archive could be mandated to do preservation regardless of its quality as an archive. There are existing certifications (DSA, etc.) that can be used to help publishers make informed choices about who to trust.
* Think about the level of access for the preserved copy (public, private, ...)
* The type of data matters for preservation. Publishers need to be aware of that. It is also important to think about preserving with context, and thus to preserve not only the dataset alone but also the resources that are needed to make sense of it (documentation, schemas, ...)
A data preservation section was included in the document.
http://w3c.github.io/dwbp/bp.html#dataPreservation
yes
LC-3003 Andrea Perego <andrea.perego@jrc.ec.europa.eu> (archived comment)
1. BP-1 ("Document data") seems to mix two different requirements:
(a) publishing data documentation (metadata)
(b) publishing metadata in human-readable formats
Is this correct?
In such a case, shouldn't these be rather addressed by two different
BPs? The requirement of publishing metadata shouldn't necessarily
address *how* this is done. This would also be inconsistent with the
fact that the requirement about publishing metadata in
machine-readable formats is addressed by a specific BP (BP-2).
Text is being updated; Bernadette has been in touch with Andrea. Resolved at f2f 2015-04-13. yes
LC-3004 Andrea Perego <andrea.perego@jrc.ec.europa.eu> (archived comment)
2. BP-2 ("Use machine-readable formats to provide metadata"), section
"Intended outcome":
"It should be possible for computer applications, notably search
tools, to locate and process the metadata easily, which makes it human
readable metadata, machine readability metadata."
(a) It is unclear why this "makes it human readable metadata".
(b) There's probably a typo in "[... ] machine readability metadata" -
shouldn't this rather be "[...] machine readable metadata"?
At f2f 2015-04-13 yes
LC-3005 Andrea Perego <andrea.perego@jrc.ec.europa.eu> (archived comment)
3. BP-2 makes the point about the use of machine-readable formats for
data discovery via software agents, including search engines. It
points also to specific machine-readable metadata serialisations that
can be embedded in human-readable metadata, and that are currently
used by search engines to optimise discovery. However, I have two
questions on this:
(a) Shouldn't it be a requirement for human-readable metadata to *always*
embed their machine-readable version? This applies also when machine-readable
metadata are available separately. I see a couple of use cases for
this - e.g., optimising discovery via search engines, existing browser
plug-ins able to read RDFa, etc.
(b) Do you think that the requirement of being "discoverable" by Web
search tools should be extended to data? BP-12 partially addresses this,
but not explicitly. I'm asking since this issue may be relevant to the
SDW WG - see http://www.w3.org/2015/spatial/wiki/BP_Requirements#Content_need_to_be_crawlable.2C_then_able_to_ask_search_engine_or_other_service
All linked in with the wider issues around metadata, which are being updated. yes
LC-3048 Annette Greiner <amgreiner@lbl.gov> (archived comment)
Data enrichment
Issue: to discuss that enrichment yields derived data, not just metadata. For example, you could take a dataset of scheduled and real bus arrival times and enrich it by adding on-time arrival percentages. The percentages are data, not metadata.
Issue: to discuss the meaning of the word “topification”.
The section was reformulated. yes
LC-3046 Annette Greiner <amgreiner@lbl.gov> (archived comment)
* Data Identification
Issue: to discuss limiting this section to information that applies to publishing *data*.
yes
LC-3049 Annette Greiner <amgreiner@lbl.gov> (archived comment)
Data Identification
Issue: to discuss limiting this section to information that applies to publishing *data*.
yes
LC-3050 Annette Greiner <amgreiner@lbl.gov> (archived comment)
Provide data up to date
Issue: to debate whether the goal should be to adhere to a published schedule for updates.
Duplicate of LC-3047. tocheck
LC-3037 Christophe Guéret <christophe.gueret@dans.knaw.nl> (archived comment)
# Preservation
There are existing guidelines about the process of preservation itself. Those could be cited to guide people on how to do preservation. There are also a lot of repositories that exist to preserve data at different levels (institutional, national, ...).
There should be something there! In terms of BPs, the following points should be addressed:
* As a data publisher, do you want to, or have to, preserve your data?
* If yes, what to preserve?
* Who to give it to? Only one archive, or several? An archive could be mandated to do preservation regardless of its quality as an archive. There are existing certifications (DSA, etc.) that can be used to help publishers make informed choices about who to trust.
* Think about the level of access for the preserved copy (public, private, ...)
* The type of data matters for preservation. Publishers need to be aware of that. It is also important to think about preserving with context, and thus to preserve not only the dataset alone but also the resources that are needed to make sense of it (documentation, schemas, ...)
yes
LC-3038 Christophe Guéret <christophe.gueret@dans.knaw.nl> (archived comment)
# Feedback
This section should also relate to preservation. One way to do it is to list stakeholders around preservation (see RDA for an impression).
BP: there should be identifiers to give feedback on a specific part of the data
BP: Use feedback as data enrichment, e.g. crowd annotation
yes
LC-3039 Christophe Guéret <christophe.gueret@dans.knaw.nl> (archived comment)
# Metadata
Need to say where the taxonomy comes from. The document speaks about 3 types instead of the 5 commonly observed. The two missing ones are preservation metadata (how, where, ...) and technical metadata (EXIF,...)
BP: Use standard terms but then make extensions public when they are needed
To update the document to be more generic about the different types of metadata. Christophe said: "It should also be ok saying there are several type of metadata without going into much details." yes
LC-3040 Christophe Guéret <christophe.gueret@dans.knaw.nl> (archived comment)
# Data quality
Does this apply to data or metadata?
There are a lot of granularity aspects in data that need to be taken into account.
How do you define quality?
Completeness of the data is not related to quality on its own. There should be an element of comparison to check the completeness against something (e.g. "data is complete according to EDM").
There should be something about quality vs. usability, partly because fitting data into quality standards can lead to losing important data (mainly everything that does not fit).
yes
LC-3053 Erik Wilde <dret@berkeley.edu> (archived comment)
- "Best Practice 14: Provide data in multiple formats" might want to say
if that should be done by different URIs, or one URI and HTTP conneg.
that's a very typical question publishers have, so it should be
mentioned at the very least, even if the answer is "we have no specific
recommendation either way".

- "Best Practice 14: Provide data in multiple formats" should say that
for fragment identifiers to be consistent across formats, care is needed
to make sure that this is the case (as much as possible, depending on
the formats and their features).
tocheck
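The two options the comment raises for BP 14 can be sketched concretely: either each format gets its own URI (e.g. /dataset.csv, /dataset.json), or one URI serves all formats and the server picks a representation from the HTTP Accept header. Below is a minimal, simplified sketch of the server-side selection step for the second option; the media types and function name are illustrative, and a real implementation would also handle q-values and wildcards per RFC 7231.

```python
# Simplified sketch of content negotiation for a dataset at a single URI.
# Exact-match lookup only; real servers also honor q-values and wildcards.

SUPPORTED = {
    "text/csv": "csv",
    "application/json": "json",
    "text/turtle": "ttl",
}

def negotiate(accept_header: str, default: str = "json") -> str:
    """Pick a dataset serialization from an Accept header (exact match only)."""
    for media_type in accept_header.split(","):
        media_type = media_type.split(";")[0].strip()  # drop q-value parameters
        if media_type in SUPPORTED:
            return SUPPORTED[media_type]
    return default

print(negotiate("text/turtle, application/json"))  # -> "ttl"
print(negotiate("application/xml"))                # -> "json" (fallback)
```

With separate URIs this logic disappears, but consumers must then discover each format's URI themselves; conneg keeps one canonical URI at the cost of the selection step above.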
LC-3055 Erik Wilde <dret@berkeley.edu> (archived comment)
- best practices 24 and 27 kind of conflict. one important idea of REST
is to avoid versioning, and having versioned URIs is a pretty certain
sign of bad design smell when it comes to media types and API design.
tocheck
LC-3033 Maurino Andrea <maurino@disco.unimib.it> (archived comment)
Please consider that this BP is strictly related to the data quality BP, because of the way temporal quality dimensions are calculated; the two BPs must be correlated and coherent.
This BP concerns how to keep data up to date, rather than providing information about whether data is being updated as expected.

The discussion about data quality assessment is in the scope of the Data Quality Vocabulary (http://w3c.github.io/dwbp/vocab-dqg.html).
yes
LC-3028 Maurino Andrea <maurino@disco.unimib.it> (archived comment)
Issue 6: IMHO at least a very well defined subset of metadata terms MUST be described by means of standard terms, and consequently expressed with well-known RDF vocabularies. Examples for such a mandatory list of metadata terms could include the owner, the type of license associated with the data, and the date of last modification.
Changes were made on the metadata section and specific vocabularies are mentioned in the Possible Approach to Implementation section. yes
LC-3029 Maurino Andrea <maurino@disco.unimib.it> (archived comment)
According to the experience of the Comsode project, a license is a mandatory requirement for publishing data on the web: without a license there is no clear indication about the limits (if any) on the usability of such data, and this lack significantly reduces the possibility of having a real web of data. One could suggest that, in case someone publishes data without a license, such data can be consumed for free by both humans and machines, but cannot be modified, reused and so on without an explicit acceptance by the data owner.
I am not sure if we can make such a suggestion, because this may depend on the policies of the organization. I think we can only suggest that data license information should be available. yes
LC-3032 Maurino Andrea <maurino@disco.unimib.it> (archived comment)
This is a big issue, because while it is correct to protect people's right to privacy, there is also the "right to know" about activities carried out by public administrations (for example, legal sentences). In Italy, just as an example, personal information, including salary, related to people working in public administration at higher levels, or to consultants paid with public money, must be released as open data under Italy's transparency decree for 5 years (after that period, the "right to be forgotten" applies, which many of you know from the Google vs. European Union case).
Some actions were taken to change the BP for Sensitive Data. Changes will be made for the next version.
http://www.w3.org/2013/dwbp/track/actions/164
http://www.w3.org/2013/dwbp/track/actions/166
yes
LC-3030 Maurino Andrea <maurino@disco.unimib.it> (archived comment)
Issue 7: I suggest drawing up some strategies for how to attach quality information. In some cases such information is defined inside the data (for example, when the time of last modification of an item is part of the dataset itself); in other situations there is the need to express quality dimensions related to the schema description only (e.g. conciseness of the schema), or related to the dataset. I also suggest (but it is clear that I'm a little biased on such a topic :) ) to better describe how to express the quality information (including quality dimensions, the adopted quality metric, and the quality value; see for example [1] as a starting point)
To keep this discussion for the Quality and Granularity Vocabulary document. yes
LC-3031 Maurino Andrea <maurino@disco.unimib.it> (archived comment)
This is a crucial problem, in particular in the case of linked data, due to the possible impact on existing interlinked resources. Some good practices could be discussed.
In the current version of the document there is a section for Data Versioning and the following BP:

http://w3c.github.io/dwbp/bp.html#VersioningInfo
http://w3c.github.io/dwbp/bp.html#VersionHistory

It is not in the scope of the document to propose BPs specific to linked data. The proposed BPs are generic.
yes
LC-3007 Herbert Van de Sompel <hvdsomp@gmail.com> (archived comment)
(1) vocabulary versioning

The Memento-related comments I made about Data Versioning apply
equally to Vocabulary Versioning. All approaches described in
<http://mementoweb.org/guide/howto/> apply to data and vocabulary. As
a matter of fact, when implementing Memento protocol support for both
data and vocabularies used in data, temporal versions of the data can
automatically be aligned with the temporally correct version of the
used vocabulary.
Resolved 2015-04-13 that we refer to Memento in the 'possible implementation' section of the relevant BP; resolve yes to the comment and action Newton to write to Herbert.
The action assigned to Newton was done by Christophe Guéret, who replied to Herbert.
tocheck
LC-3034 Christophe Guéret <christophe.gueret@dans.knaw.nl> (archived comment)
# Overall points
The document concerns data publishers more than it concerns consumers. This also seems to be reflected by the composition of editors/contributors; there should be more data consumers jumping in and adding BPs that matter to them.
"Data must be available in machine readable" -> only should; must is way too strong. Some data consumers may want to have access to data that is not machine readable (e.g. a scanned old document) and not be restricted only to its machine-translated counterpart (e.g. an OCRed old document)
yes
LC-3035 Christophe Guéret <christophe.gueret@dans.knaw.nl> (archived comment)
# Data vocabularies
Issue 9 : we should stick to using "vocabularies"
Issue 10 : we should aim at being generic
BP 19: there is a problem in advocating for simplicity, as this can prevent having rich vocabularies. It could instead be suggested that publishers may provide vocabularies as rich as needed, but strive to base them on "simpler" ones (e.g. core ontologies / upper ontologies / ... ) to ensure there is always a minimum level of understanding. See, e.g., http://arxiv.org/abs/1304.5743 for a discussion about this.
The group discussed this topic and we got a consensus about the terms. yes
LC-3052 Erik Wilde <dret@berkeley.edu> (archived comment)
- when it comes to versioning, i am always recommending to focus on
openness and extensibility and have robust and well-defined models for
those (this almost always requires well-defined processing models for
data). this often avoids the need for versioning, which when done badly
will be a breaking change.

- when it comes to versioning, it is important to distinguish between
breaking and non-breaking versioning changes. this comes down to the
comment above: good openness and extensibility makes it easier to have
non-breaking versioning, which helps tremendously in decentralized
ecosystems.
Addressed: We now have a BP “Avoid breaking changes to your API” tocheck
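The openness-and-extensibility point above can be illustrated with a small sketch of a "must-ignore" processing model: consumers process only the fields they know and silently skip the rest, so adding fields to a record later is a non-breaking change. All field names here are illustrative, not taken from the DWBP document.

```python
# Sketch of a "must-ignore" processing model: a v1 consumer keeps only
# the fields it knows, so a v2 record with extra fields still processes
# cleanly — the addition is a non-breaking change.

KNOWN_FIELDS = {"id", "title", "modified"}

def process_record(record: dict) -> dict:
    """Keep known fields, silently ignore unknown ones (must-ignore rule)."""
    return {k: v for k, v in record.items() if k in KNOWN_FIELDS}

# A hypothetical v2 record adds a "license" field the v1 consumer never saw:
v2 = {"id": "d1", "title": "Bus arrivals", "modified": "2015-04-13", "license": "CC0"}
print(process_record(v2))  # the extra field is ignored, not an error
```

Under this model a breaking change is one that removes or redefines a known field; only those would require a new versioned identifier.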
LC-3056 Erik Wilde <dret@berkeley.edu> (archived comment)
- regarding best practice 30, i am wondering if
https://github.com/dret/I-D/blob/master/sunset-header/draft-wilde-sunset-header-00.txt
is something that might be worth mentioning in some form. this is
currently a pre-I-D draft, but maybe the general idea of communicating
resource availability is relevant for DWBP?
tocheck
LC-3008 Herbert Van de Sompel <hvdsomp@gmail.com> (archived comment)
(2) preservation

The Memento protocol can play a significant role in the realm of
access to preserved data, as is exemplified by its broad adoption by
web archives and the demonstration implementation of the DBpedia
archive. But it also plays a role in making preserved/captured
resources recognizable via the Memento-Datetime header (expresses
datetime of capture/preservation) and the HTTP Link header that
carries an "original" link that connects a preserved/captured resource
with the URI where it originally resided.
An example that uses Memento was created in BP29, and it is available in the current draft.

Christophe and Herbert made this: https://lists.w3.org/Archives/Public/public-dwbp-wg/2015May/0103.html
yes
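The two header mechanisms the comment describes can be sketched in a few lines: a Memento-aware archive attaches a Memento-Datetime header (the datetime of capture, in HTTP-date format per RFC 7089) and a Link header with an "original" relation pointing at the URI where the resource originally resided. The URI and datetime below are placeholders, not values from any real archive.

```python
from datetime import datetime, timezone
from email.utils import format_datetime

def memento_headers(original_uri: str, captured_at: datetime) -> dict:
    """Build the two headers an archive attaches to a preserved resource."""
    return {
        # Datetime of capture/preservation, as an RFC 7089 HTTP-date:
        "Memento-Datetime": format_datetime(captured_at, usegmt=True),
        # Link back to the URI where the resource originally resided:
        "Link": f'<{original_uri}>; rel="original"',
    }

headers = memento_headers(
    "http://example.org/dataset",  # placeholder original URI
    datetime(2015, 4, 13, 12, 0, 0, tzinfo=timezone.utc),
)
print(headers["Memento-Datetime"])  # -> Mon, 13 Apr 2015 12:00:00 GMT
print(headers["Link"])              # -> <http://example.org/dataset>; rel="original"
```

A client that sees these headers can recognize the response as a preserved capture rather than the live resource, which is the recognizability property the comment highlights.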

Developed and maintained by Dominique Hazaël-Massieux (dom@w3.org).
Please send bug reports and requests for enhancements to w3t-sys.org