List of comments on “Data on the Web Best Practices” (dated 24 February 2015)

There are 30 comments (sorted by their types, and the section they are about).

question comments

Comment LC-3051: Difference between BP 8 and BP 18: Commenter: Erik Wilde <dret@berkeley.edu> (archived message); Context: Best Practice 18: Re-use vocabularies; Status:
Not assigned; Comment:
- what is the difference between "Best Practice 8" and "Best Practice
18"? it seems that they are very similar, and if there indeed is a
subtle difference, maybe create one practice that spans both, or make it
more clear what the difference is?; Related issues:; WG Notes: Email sent to Erik Wilde https://lists.w3.org/Archives/Public/public-dwbp-wg/2016Mar/0124.html with the proposed resolution.; Resolution: Best Practice 18: Vocabulary versioning was removed from the document. The current version of the document doesn't deal with vocabulary versioning.

general comment comments

Comment LC-3034: Christophe01: Commenter: Christophe Guéret <christophe.gueret@dans.knaw.nl> (archived message); Context: in; Status:
Not assigned; Comment:
# Overall points
The document concerns more data publishers than it concerns consumers. This also seems to be reflected by the composition of editors/contributors, there should be more data consumers jumping in and adding BPs that matter to them.
"Data must be available in machine readable" -> only should, must is way too strong. Some data consumers may want to have access to data that is not machine readable (e.g. scanned old document) and not be restricted only to their machine-translated counterparts (e.g. OCRed old document); Related issues:; WG Notes: During the discussions about the audience, the group agreed that publishers will be our primary audience. In this case, best practices should be employed by data publishers instead of data consumers. However, both publishers and consumers will benefit from this. I therefore suggest keeping publishers as the primary audience for our BPs. Concerning "Data must be available in machine readable", I suggest changing "must" to "should".; Resolution:

Comment LC-3035: Christophe02: Commenter: Christophe Guéret <christophe.gueret@dans.knaw.nl> (archived message); Context: in; Status:
Not assigned; Comment:
# Data vocabularies
Issue 9 : we should stick to using "vocabularies"
Issue 10 : we should aim at being generic
BP 19: there is a problem in advocating for simplicity as this can prevent from having rich vocabularies. It could instead be suggested that publishers may provide vocabularies as rich as needed but strive at basing them on "simpler" ones (e.g. core ontologies / upper ontologies / ... ) to ensure there is always a minimum level of understanding. See, e.g. http://arxiv.org/abs/1304.5743 for a discussion about this.; Related issues: http://www.w3.org/2013/dwbp/track/issues/166; WG Notes: There is an ongoing discussion about the Data Vocabularies section. I propose to postpone this discussion to the next draft of the DWBP document.; Resolution: The group discussed this topic and reached consensus on the terms.

Comment LC-3054: Data linkable and linked: Commenter: Erik Wilde <dret@berkeley.edu> (archived message); Context: Document as a whole; Status:
assigned to Bernadette Farias Loscio; Comment:
- generally speaking, i am wondering why the terms hypertext or
hypermedia are not even mentioned in the spec. isn't that what data on
the web ideally should be, linkable and linked?
https://github.com/dret/webdata#one-star-linkable and
https://github.com/dret/webdata#four-star-linked are core principles for
good web data. *linkable* means more than just URIs. it also means, for
example, to provide meaningful and robust fragment identifiers for
others to link to. *linked* means to use URIs and to specifically avoid
other kinds of (often non-globally scoped) identifiers, so that links
don't break when taken out of context.; Related issues:; WG Notes: Same as https://www.w3.org/2006/02/lc-comments-tracker/68239/WD-dwbp-20150625/3059; Resolution:

Comment LC-3056: BP 30: Commenter: Erik Wilde <dret@berkeley.edu> (archived message); Context: in; Status:
assigned to Bernadette Farias Loscio; Comment:
- regarding best practice 30, i am wondering if
https://github.com/dret/I-D/blob/master/sunset-header/draft-wilde-sunset-header-00.txt
is something that might be worth mentioning in some form. this is
currently a pre-I-D draft, but maybe the general idea of communicating
resource availability is relevant for DWBP?; Related issues:; WG Notes: Duplicated comment. Same as https://www.w3.org/2006/02/lc-comments-tracker/68239/WD-dwbp-20150625/3061; Resolution:

Comment LC-3052: Comments regarding versioning: Commenter: Erik Wilde <dret@berkeley.edu> (archived message); Context: 8.1.6 Data Versioning; Status:
Not assigned; Comment:
- when it comes to versioning, i am always recommending to focus on
openness and extensibility and have robust and well-defined models for
those (this almost always requires well-defined processing models for
data). this often avoids the need for versioning, which when done badly
will be a breaking change.

- when it comes to versioning, it is important to distinguish between
breaking and non-breaking versioning changes. this comes down to the
comment above: good openness and extensibility makes it easier to have
non-breaking versioning, which helps tremendously in decentralized
ecosystems.; Related issues:; WG Notes: Email from Annette - Addressed: We now have a BP “Avoid breaking changes to your API”. Carol has sent an email to Erik Wilde; Resolution: Addressed: We now have a BP “Avoid breaking changes to your API”

Comment LC-3007: vocabulary versioning 1: Commenter: Herbert Van de Sompel <hvdsomp@gmail.com> (archived message); Context: Best Practice 17: Vocabulary versioning; Status:
Not assigned; Comment:
(1) vocabulary versioning

The Memento-related comments I made about Data Versioning apply
equally to Vocabulary Versioning. All approaches described in
<http://mementoweb.org/guide/howto/> apply to data and vocabulary. As
a matter of fact, when implementing Memento protocol support for both
data and vocabularies used in data, temporal versions of the data can
automatically be aligned with the temporally correct version of the
used vocabulary.; Related issues:; WG Notes:; Resolution: Resolved 2015-04-13 That we refer to Memento in the 'possible implementation' section of the relevant BP; resolve yes to the comment and action Newton to write to Herbert. The action to Newton was done by Christophe Guéret. Christophe replied back to Herbert.

Comment LC-3008: preservation 1: Commenter: Herbert Van de Sompel <hvdsomp@gmail.com> (archived message); Context: 8.7 Data Preservation; Status:
Not assigned; Comment:
(2) preservation

The Memento protocol can play a significant role in the realm of
access to preserved data, as is exemplified by its broad adoption by
web archives and the demonstration implementation of the DBpedia
archive. But it also plays a role in making preserved/captured
resources recognizable via the Memento-Datetime header (expresses
datetime of capture/preservation) and the HTTP Link header that
carries an "original" link that connects a preserved/captured resource
with the URI where it originally resided.; Related issues:; WG Notes: In this version of the document, Memento was not used in the Preservation section, but it was used in the Versioning section, with an example of using Memento: http://w3c.github.io/dwbp/bp.html#VersioningInfo; Resolution: An example that uses Memento was created in BP29 and is available in the current draft. Christophe and Herbert made this: https://lists.w3.org/Archives/Public/public-dwbp-wg/2015May/0103.html
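
As a rough illustration of the two headers mentioned in comment LC-3008, the following Python sketch reads the capture datetime and the "original" link from a set of HTTP response headers. The helper name `describe_memento` and the sample header values are hypothetical; only the header names `Memento-Datetime` and `Link` and the `original` relation type come from the Memento protocol (RFC 7089). The Link parsing is deliberately simplified and is not a full RFC 8288 parser.

```python
import re
from email.utils import parsedate_to_datetime


def describe_memento(headers):
    """Return (capture_datetime, original_uri) from a dict of response headers.

    Assumes the dict uses the exact header spellings below; real HTTP
    header names are case-insensitive, so a client would normalise them first.
    """
    captured = None
    if "Memento-Datetime" in headers:
        # Memento-Datetime is an HTTP date, e.g. "Tue, 20 Jun 2000 18:02:59 GMT".
        captured = parsedate_to_datetime(headers["Memento-Datetime"])

    original = None
    # Each link-value looks like: <uri>; rel="..."; other="..."
    # Splitting on "<" avoids breaking on commas inside quoted datetimes.
    for match in re.finditer(r'<([^>]+)>([^<]*)', headers.get("Link", "")):
        uri, params = match.groups()
        rel = re.search(r'\brel\s*=\s*"?([^";,]*)"?', params)
        if rel and "original" in rel.group(1).split():
            original = uri
            break
    return captured, original


# Hypothetical response headers for a preserved copy of a resource.
sample_headers = {
    "Memento-Datetime": "Tue, 20 Jun 2000 18:02:59 GMT",
    "Link": '<http://archive.example.net/2000/http://a.example.org/>; '
            'rel="memento"; datetime="Tue, 20 Jun 2000 18:02:59 GMT", '
            '<http://a.example.org/>; rel="original"',
}
captured_at, original_uri = describe_memento(sample_headers)
```

With headers like these, a consumer can tell both when the copy was captured and where the resource originally lived, which is the recognisability the comment describes.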

substantive comments

Comment LC-3039: Christophe04: Commenter: Christophe Guéret <christophe.gueret@dans.knaw.nl> (archived message); Context: in; Status:
Not assigned; Comment:
# Metadata
Need to say where the taxonomy comes from. The document speaks about 3 types instead of the 5 commonly observed. The two missing ones are preservation metadata (how, where, ...) and technical metadata (EXIF,...)
BP: Use standard terms but then make extensions public when they are needed; Related issues:; WG Notes: I don't have the reference. I'm going to ask Christophe for it.; Resolution: Update the document to be more generic about the different types of metadata. Christophe said: "It should also be ok saying there are several type of metadata without going into much details."

Comment LC-3038: Christophe03: Commenter: Christophe Guéret <christophe.gueret@dans.knaw.nl> (archived message); Context: in; Status:
Not assigned; Comment:
# Feedback
This section should also relate to preservation. One way to do it is to list stakeholders around preservation (see RDA for an impression).
BP: there should be identifiers to give feedback on a specific part of the data
BP: Use feedback as data enrichment, e.g. crowd annotation; Related issues:; WG Notes: I propose to keep this discussion for the Dataset Usage Vocabulary document (http://w3c.github.io/dwbp/vocab-du.html).; Resolution:

Comment LC-3040: Christophe05: Commenter: Christophe Guéret <christophe.gueret@dans.knaw.nl> (archived message); Context: in; Status:
Not assigned; Comment:
# Data quality
Does this apply to data or metadata ?
There are a lot of granularity aspects in data that need to be taken into account
How do you define quality ?
Completeness of the data is not related to quality. There should be an element of comparison to check the completeness against something (e.g. "data is complete according to EDM")
There should be something about Quality VS Usability, partly because fitting data into quality standards can lead to losing important data (mainly everything that does not fit); Related issues:; WG Notes: I suggest to keep this discussion (the meaning of quality, granularity and completeness) for the Quality and Granularity Vocabulary.; Resolution:

Comment LC-3048: Data enrichment: Commenter: Annette Greiner <amgreiner@lbl.gov> (archived message); Context: in; Status:
Not assigned; Comment:
Data enrichment
Issue: to discuss that enrichment yields derived data, not just metadata. For example, you could take a dataset of scheduled and real bus arrival times and enrich it by adding on-time arrival percentages. The percentages are data, not metadata.
Issue: to discuss the meaning of the word “topification”.; Related issues:; WG Notes: Enrichment BP was reformulated by Annette.; Resolution: The section was reformulated.
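
The bus example in comment LC-3048 can be sketched in Python as follows. The record layout, the route names, the function name and the five-minute tolerance are all assumptions made for illustration; the comment itself does not define what counts as "on time".

```python
from collections import defaultdict

# Hypothetical source data: (route, scheduled minute of day, actual minute of day).
arrivals = [
    ("route-1", 480, 482),
    ("route-1", 510, 519),
    ("route-2", 495, 495),
    ("route-2", 525, 531),
]


def on_time_percentages(records, tolerance_minutes=5):
    """Derive, per route, the percentage of arrivals no later than the tolerance.

    Early arrivals count as on time in this sketch (an assumption).
    """
    totals = defaultdict(int)
    on_time = defaultdict(int)
    for route, scheduled, actual in records:
        totals[route] += 1
        if actual - scheduled <= tolerance_minutes:
            on_time[route] += 1
    return {route: 100.0 * on_time[route] / totals[route] for route in totals}


# The result is new, derived data about bus service, not a description of the dataset.
percentages = on_time_percentages(arrivals)
```

The derived percentages could be published alongside the original arrival times as an enriched dataset.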

Comment LC-3036: Christophe03: Commenter:; Context: in; Status:
assigned to Bernadette Farias Loscio; Comment:
# Preservation
There are existing guidelines about the process of preservation itself. Those could be cited to guide people on how to do preservation. There is also a lot of repositories that exist to preserve data at different levels (institution, national, ...).
There should be something there! In terms of BPs, the following points should be addressed:
* As a data publisher, do you want to, or have to, preserve your data ?
* If yes, what to preserve ?
* Who to give it to ? Only to one archive or several ? One could be mandated to do preservation whatever is quality as an archive is. There are existing certifications (DSA, etc) that can be used to help publishers make informed choices about who to trust.
* Think about the level of access for the preserved copy (public, private, ...)
* The type of data matters for preservation. Publishers need to be aware of that. It is also important to think about preserving with context and thus push not only a dataset alone but also preserve the resources that are needed to make sense of it (documentation, schemas, ...); Related issues:; WG Notes: A data preservation section was included in the document. http://w3c.github.io/dwbp/bp.html#dataPreservation; Resolution: A data preservation section was included in the document. http://w3c.github.io/dwbp/bp.html#dataPreservation

Comment LC-3055: About REST: Commenter: Erik Wilde <dret@berkeley.edu> (archived message); Context: in; Status:
assigned to Bernadette Farias Loscio; Comment:
- best practices 24 and 27 kind of conflict. one important idea of REST
is to avoid versioning, and having versioned URIs is a pretty certain
sign of bad design smell when it comes to media types and API design.; Related issues:; WG Notes: Duplicated to LC-3060.; Resolution:

Comment LC-3037: Christophe03: Commenter: Christophe Guéret <christophe.gueret@dans.knaw.nl> (archived message); Context: in; Status:
Not assigned; Comment:
# Preservation
There are existing guidelines about the process of preservation itself. Those could be cited to guide people on how to do preservation. There is also a lot of repositories that exist to preserve data at different levels (institution, national, ...).
There should be something there! In terms of BPs, the following points should be addressed:
* As a data publisher, do you want to, or have to, preserve your data ?
* If yes, what to preserve ?
* Who to give it to ? Only to one archive or several ? One could be mandated to do preservation whatever is quality as an archive is. There are existing certifications (DSA, etc) that can be used to help publishers make informed choices about who to trust.
* Think about the level of access for the preserved copy (public, private, ...)
* The type of data matters for preservation. Publishers need to be aware of that. It is also important to think about preserving with context and thus push not only a dataset alone but also preserve the resources that are needed to make sense of it (documentation, schemas, ...); Related issues:; WG Notes: Duplicated to LC-3036.; Resolution:

Comment LC-3006: DanBri-1: Commenter: Dan Brickley <danbri@google.com> (archived message); Context: 8.1 Metadata; Status:
assigned to Phil Archer; Comment:
re http://www.w3.org/TR/2015/WD-dwbp-20150224/#metadata

Congratulations on your new Working Draft. Just a brief point as I
begin to work through the doc... I'd like to suggest that you consider
recycling an old sentence from the early RDF '97-9 work, which
addresses up front the awkwardness inherent in defining "metadata" as
"data about data":

"""The distinction between "data" and "metadata" is not an absolute
one; it is a distinction created primarily by a particular
application, and many times the same resource will be interpreted in
both ways simultaneously."""

One of RDF's strengths is that it works at both these levels. While
the dwbp doc's scope goes beyond RDF, I think the insight in that old
paragraph from the first RDF Recommendation remains relevant.
Currently you write "Metadata is data about data." as well as "A
metadata document must be published together with the data"; taken
together this makes the distinction seem more clear-cut than it often
seems in practice.

cheers,

Dan

(*) context: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222/
"The World Wide Web was originally built for human consumption, and
although everything on it is machine-readable, this data is not
machine-understandable. It is very hard to automate anything on the
Web, and because of the volume of information the Web contains, it is
not possible to manage it manually. The solution proposed here is to
use metadata to describe the data contained on the Web. Metadata is
"data about data" (for example, a library catalog is metadata, since
it describes publications) or specifically in the context of this
specification "data describing Web resources". The distinction between
"data" and "metadata" is not an absolute one; it is a distinction
created primarily by a particular application, and many times the same
resource will be interpreted in both ways simultaneously."; Related issues:; WG Notes:; Resolution: See http://www.w3.org/2013/meeting/dwbp/2015-04-13#resolution_15

Comment LC-3003: Andrea-1: Commenter: Andrea Perego <andrea.perego@jrc.ec.europa.eu> (archived message); Context: Best Practice 1: Document data; Status:
Not assigned; Comment:
1. BP-1 ("Document data") seems to mix two different requirements:
(a) publishing data documentation (metadata)
(b) publishing metadata in human-readable formats
Is this correct?
In such a case, shouldn't these be rather addressed by two different
BPs? The requirement of publishing metadata shouldn't necessarily
address *how* this is done. This would also be inconsistent with the
fact that the requirement about publishing metadata in
machine-readable formats is addressed by a specific BP (BP-2).; Related issues:; WG Notes:; Resolution: Text is being updated, Bernadette has been in touch with Andrea. Resolved at f2f 2015-04-13

Comment LC-3004: Andrea-2: Commenter: Andrea Perego <andrea.perego@jrc.ec.europa.eu> (archived message); Context: Best Practice 2: Use machine-readable formats to provide met...; Status:
assigned to Bernadette Farias Loscio; Comment:
2. BP-2 ("Use machine-readable formats to provide metadata"), section
"Intended outcome":
"It should be possible for computer applications, notably search
tools, to locate and process the metadata easily, which makes it human
readable metadata, machine readability metadata."
(a) It is unclear why this "makes it human readable metadata".
(b) There's probably a typo in "[... ] machine readability metadata" -
shouldn't this rather be "[...] machine readable metadata"?; Related issues:; WG Notes:; Resolution: At f2f 2015-04-13

Comment LC-3005: Andrea-3: Commenter: Andrea Perego <andrea.perego@jrc.ec.europa.eu> (archived message); Context: Best Practice 2: Use machine-readable formats to provide met...; Status:
assigned to Bernadette Farias Loscio; Comment:
3. BP-2 makes the point about the use of machine-readable formats for
data discovery via software agents, including search engines. It
points also to specific machine-readable metadata serialisations that
can be embedded in human-readable metadata, and that are currently
used by search engines to optimise discovery. However, I have two
questions on this:
(a) Shouldn't it be a requirement for human-readable metadata to *always*
embed their machine-readable version? This also when machine-readable
metadata are available separately. I see a couple of use cases for
this - e.g., optimising discovery via search engines, existing browser
plug-ins able to read RDFa, etc.
(b) Do you think that the requirement of being "discoverable" by Web
search tools should be extended to data? BP-12 partially addresses this,
but not explicitly. I'm asking since this issue may be relevant to the
SDW WG - see http://www.w3.org/2015/spatial/wiki/BP_Requirements#Content_need_to_be_crawlable.2C_then_able_to_ask_search_engine_or_other_service; Related issues:; WG Notes:; Resolution: All linked in with the wider issues around metadata which are being updated.

Comment LC-3028: Use standard terms to define metadata: Commenter: Maurino Andrea <Maurino Andrea> (archived message); Context: Best Practice 3: Use standard terms to define metadata; Status:
assigned to Bernadette Farias Loscio; Comment:
Issue 6: IMHO there is the need that at least a very well defined subset of metadata terms MUST be described by means of standard terms and consequently if they must be expressed with well-known RDF vocabulary. Example of such mandatory list of metadata terms could include the owner, the type of license associated to the data, and date of last modification.; Related issues:; WG Notes:; Resolution: Changes were made to the metadata section and specific vocabularies are mentioned in the Possible Approach to Implementation section.
