W3C

List of comments on “Data on the Web Best Practices” (dated 24 February 2015)

There are 30 comments (sorted by their types, and the section they are about).

question comments

Comment LC-3051: Difference between BP 8 and BP 18
Commenter: Erik Wilde <dret@berkeley.edu> (archived message)
Context: Best Practice 18: Re-use vocabularies
Not assigned
Resolution status:

- what is the difference between "Best Practice 8" and "Best Practice
18"? it seems that they are very similar, and if there indeed is a
subtle difference, maybe create one practice that spans both, or make it
more clear what the difference is?

general comment comments

Comment LC-3034: Christophe01
Commenter: Christophe Guéret <christophe.gueret@dans.knaw.nl> (archived message)
Context: in
Not assigned
Resolution status:

# Overall points
The document concerns data publishers more than it concerns consumers. This also seems to be reflected by the composition of editors/contributors; there should be more data consumers jumping in and adding BPs that matter to them.
"Data must be available in machine readable" -> this should only be a "should"; "must" is way too strong. Some data consumers may want access to data that is not machine readable (e.g. a scanned old document) rather than being restricted to its machine-translated counterpart (e.g. the OCRed old document).

Comment LC-3035: Christophe02
Commenter: Christophe Guéret <christophe.gueret@dans.knaw.nl> (archived message)
Context: in
Not assigned
Resolution status:

# Data vocabularies
Issue 9 : we should stick to using "vocabularies"
Issue 10 : we should aim at being generic
BP 19: there is a problem in advocating for simplicity, as this can prevent publishers from having rich vocabularies. It could instead be suggested that publishers may provide vocabularies as rich as needed but strive to base them on "simpler" ones (e.g. core ontologies / upper ontologies / ... ) to ensure there is always a minimum level of understanding. See, e.g. http://arxiv.org/abs/1304.5743 for a discussion about this.
http://www.w3.org/2013/dwbp/track/issues/166

Comment LC-3054: Data linkable and linked
Commenter: Erik Wilde <dret@berkeley.edu> (archived message)
Context: Document as a whole
assigned to Bernadette Farias Loscio
Resolution status:

- generally speaking, i am wondering why the terms hypertext or
hypermedia are not even mentioned in the spec. isn't that what data on
the web ideally should be, linkable and linked?
https://github.com/dret/webdata#one-star-linkable and
https://github.com/dret/webdata#four-star-linked are core principles for
good web data. *linkable* means more than just URIs. it also means, for
example, to provide meaningful and robust fragment identifiers for
others to link to. *linked* means to use URIs and to specifically avoid
other kinds of (often non-globally scoped) identifiers, so that links
don't break when taken out of context.

Comment LC-3056: BP 30
Commenter: Erik Wilde <dret@berkeley.edu> (archived message)
Context: in
assigned to Bernadette Farias Loscio
Resolution status:

- regarding best practice 30, i am wondering if
https://github.com/dret/I-D/blob/master/sunset-header/draft-wilde-sunset-header-00.txt
is something that might be worth mentioning in some form. this is
currently a pre-I-D draft, but maybe the general idea of communicating
resource availability is relevant for DWBP?

Comment LC-3052: Comments regarding versioning
Commenter: Erik Wilde <dret@berkeley.edu> (archived message)
Context: 8.1.6 Data Versioning
Not assigned
Resolution status:

- when it comes to versioning, i am always recommending to focus on
openness and extensibility and have robust and well-defined models for
those (this almost always requires well-defined processing models for
data). this often avoids the need for versioning, which when done badly
will be a breaking change.

- when it comes to versioning, it is important to distinguish between
breaking and non-breaking versioning changes. this comes down to the
comment above: good openness and extensibility makes it easier to have
non-breaking versioning, which helps tremendously in decentralized
ecosystems.
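
One possible reading of the point about openness and extensibility is a "must-ignore" processing rule: a consumer keeps the fields it relies on and skips anything it does not recognise, so adding a field is a non-breaking change while removing a required one is breaking. The sketch below only illustrates that reading; the field names and the process_record helper are hypothetical and come neither from the comment nor from the DWBP draft.

```python
# Hypothetical fields that this consumer depends on.
REQUIRED_FIELDS = {"id", "title"}


def process_record(record):
    """Apply a must-ignore rule: keep required fields, skip unknown ones."""
    missing = REQUIRED_FIELDS - set(record)
    if missing:
        # A required field disappeared: this is a breaking change for the consumer.
        raise ValueError(f"breaking change, missing fields: {sorted(missing)}")
    # Unknown fields are silently ignored, so additions do not break the consumer.
    return {name: record[name] for name in REQUIRED_FIELDS}


original = {"id": 1, "title": "Bus stops"}
extended = {"id": 1, "title": "Bus stops", "license": "CC-BY"}  # non-breaking addition
result_original = process_record(original)
result_extended = process_record(extended)
```

Under this rule both records are processed identically, whereas a record without "title" would be rejected.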

Comment LC-3007: vocabulary versioning 1
Commenter: Herbert Van de Sompel <hvdsomp@gmail.com> (archived message)
Context: Best Practice 17: Vocabulary versioning
Not assigned
Resolution status:

(1) vocabulary versioning

The Memento-related comments I made about Data Versioning apply
equally to Vocabulary Versioning. All approaches described in
<http://mementoweb.org/guide/howto/> apply to data and vocabulary. As
a matter of fact, when implementing Memento protocol support for both
data and vocabularies used in data, temporal versions of the data can
automatically be aligned with the temporally correct version of the
used vocabulary.
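
The alignment described above can be sketched as sending the same Accept-Datetime request header (the Memento datetime negotiation header) to the TimeGate of the data and to the TimeGate of the vocabulary. This is a minimal sketch, not a Memento client: the TimeGate URIs and the aligned_timegate_requests helper are hypothetical, and no request is actually sent.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def aligned_timegate_requests(timegate_uris, when):
    """Pair each TimeGate URI with the same Accept-Datetime request header."""
    # Accept-Datetime uses the HTTP date format, expressed in GMT.
    headers = {"Accept-Datetime": format_datetime(when, usegmt=True)}
    return [(uri, dict(headers)) for uri in timegate_uris]


# Hypothetical TimeGates for a dataset and for the vocabulary it uses.
plan = aligned_timegate_requests(
    ["http://example.org/timegate/data", "http://example.org/timegate/vocab"],
    datetime(2015, 2, 24, tzinfo=timezone.utc),
)
```

Because both requests carry the same datetime, each TimeGate would be asked for the version that was current at that moment.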

Comment LC-3008: preservation 1
Commenter: Herbert Van de Sompel <hvdsomp@gmail.com> (archived message)
Context: 8.7 Data Preservation
Not assigned
Resolution status:

(2) preservation

The Memento protocol can play a significant role in the realm of
access to preserved data, as is exemplified by its broad adoption by
web archives and the demonstration implementation of the DBpedia
archive. But it also plays a role in making preserved/captured
resources recognizable via the Memento-Datetime header (expresses
datetime of capture/preservation) and the HTTP Link header that
carries an "original" link that connects a preserved/captured resource
with the URI where it originally resided.
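
As an illustration of how a client might recognise a preserved resource from these two headers, the sketch below reads the capture datetime from Memento-Datetime and the "original" link from the Link header. It is a simplified sketch: the example URIs and the parse_memento_headers helper are hypothetical, and the regular expression covers only the common Link header layout, not every form the header syntax allows.

```python
import re
from email.utils import parsedate_to_datetime


def parse_memento_headers(headers):
    """Return (capture_datetime, original_uri) from a dict of response headers."""
    captured = parsedate_to_datetime(headers["Memento-Datetime"])
    original = None
    # Each link value looks like: <uri>; rel="original"
    for uri, params in re.findall(r"<([^>]*)>([^<]*)", headers.get("Link", "")):
        rel = re.search(r'rel="?([^";,]*)"?', params)
        if rel and "original" in rel.group(1).split():
            original = uri
    return captured, original


# Hypothetical response headers from an archive.
example_headers = {
    "Memento-Datetime": "Tue, 24 Feb 2015 12:00:00 GMT",
    "Link": '<http://example.org/data>; rel="original", '
            '<http://archive.example.org/timemap/data>; rel="timemap"',
}
captured, original = parse_memento_headers(example_headers)
```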

substantive comments

Comment LC-3039: Christophe04
Commenter: Christophe Guéret <christophe.gueret@dans.knaw.nl> (archived message)
Context: in
Not assigned
Resolution status:

# Metadata
Need to say where the taxonomy comes from. The document speaks about 3 types instead of the 5 commonly observed. The two missing ones are preservation metadata (how, where, ...) and technical metadata (EXIF,...)
BP: Use standard terms but then make extensions public when they are needed

Comment LC-3038: Christophe03
Commenter: Christophe Guéret <christophe.gueret@dans.knaw.nl> (archived message)
Context: in
Not assigned
Resolution status:

# Feedback
This section should also relate to preservation. One way to do it is to list stakeholders around preservation (see RDA for an impression).
BP: there should be identifiers to give feedback on a specific part of the data
BP: Use feedback as data enrichment, e.g. crowd annotation

Comment LC-3040: Christophe05
Commenter: Christophe Guéret <christophe.gueret@dans.knaw.nl> (archived message)
Context: in
Not assigned
Resolution status:

# Data quality
Does this apply to data or metadata?
There are many granularity aspects of data that need to be taken into account.
How do you define quality?
Completeness of the data is not related to quality. There should be an element of comparison to check the completeness against something (e.g. "data is complete according to EDM")
There should be something about Quality VS Usability, partly because fitting data into quality standards can lead to losing important data (mainly everything that does not fit)

Comment LC-3048: Data enrichment
Commenter: Annette Greiner <amgreiner@lbl.gov> (archived message)
Context: in
Not assigned
Resolution status:

Data enrichment
Issue to discuss: enrichment yields derived data, not just metadata. For example, you could take a dataset of scheduled and real bus arrival times and enrich it by adding on-time arrival percentages. The percentages are data, not metadata.
Issue to discuss: the meaning of the word “topification”.
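
The bus example can be made concrete with a small sketch that derives an on-time percentage from scheduled and actual arrival times. The sample times, the one-minute tolerance, and the on_time_percentage helper are assumptions made for illustration only; the comment does not define what counts as "on time".

```python
from datetime import datetime, timedelta


def on_time_percentage(arrivals, tolerance=timedelta(minutes=1)):
    """Percentage of (scheduled, actual) pairs arriving within the tolerance."""
    on_time = sum(1 for scheduled, actual in arrivals if actual - scheduled <= tolerance)
    return 100.0 * on_time / len(arrivals)


# Hypothetical (scheduled, actual) arrival times for one bus stop.
arrivals = [
    (datetime(2015, 2, 24, 8, 0), datetime(2015, 2, 24, 8, 0)),
    (datetime(2015, 2, 24, 8, 15), datetime(2015, 2, 24, 8, 16)),
    (datetime(2015, 2, 24, 8, 30), datetime(2015, 2, 24, 8, 37)),
    (datetime(2015, 2, 24, 8, 45), datetime(2015, 2, 24, 8, 44)),
]
percentage = on_time_percentage(arrivals)
```

The resulting value is a new data point computed from the dataset, which is the sense in which enrichment produces derived data rather than metadata.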

Comment LC-3036: Christophe03
Commenter:
Context: in
assigned to Bernadette Farias Loscio
Resolution status:

# Preservation
There are existing guidelines about the process of preservation itself. Those could be cited to guide people on how to do preservation. There are also many repositories that exist to preserve data at different levels (institution, national, ...).
There should be something there! In terms of BPs, the following points should be addressed:
* As a data publisher, do you want to, or have to, preserve your data?
* If yes, what to preserve?
* Who to give it to? Only to one archive or several? One could be mandated to do preservation, whatever its quality as an archive is. There are existing certifications (DSA, etc) that can be used to help publishers make informed choices about who to trust.
* Think about the level of access for the preserved copy (public, private, ...)
* The type of data matters for preservation. Publishers need to be aware of that. It is also important to think about preserving with context and thus push not only a dataset alone but also preserve the resources that are needed to make sense of it (documentation, schemas, ...)

Comment LC-3055: About REST
Commenter: Erik Wilde <dret@berkeley.edu> (archived message)
Context: in
assigned to Bernadette Farias Loscio
Resolution status:

- best practices 24 and 27 kind of conflict. one important idea of REST
is to avoid versioning, and having versioned URIs is a pretty certain
sign of bad design smell when it comes to media types and API design.

Comment LC-3037: Christophe03
Commenter: Christophe Guéret <christophe.gueret@dans.knaw.nl> (archived message)
Context: in
Not assigned
Resolution status:

# Preservation
There are existing guidelines about the process of preservation itself. Those could be cited to guide people on how to do preservation. There are also many repositories that exist to preserve data at different levels (institution, national, ...).
There should be something there! In terms of BPs, the following points should be addressed:
* As a data publisher, do you want to, or have to, preserve your data?
* If yes, what to preserve?
* Who to give it to? Only to one archive or several? One could be mandated to do preservation, whatever its quality as an archive is. There are existing certifications (DSA, etc) that can be used to help publishers make informed choices about who to trust.
* Think about the level of access for the preserved copy (public, private, ...)
* The type of data matters for preservation. Publishers need to be aware of that. It is also important to think about preserving with context and thus push not only a dataset alone but also preserve the resources that are needed to make sense of it (documentation, schemas, ...)

Comment LC-3006: DanBri-1
Commenter: Dan Brickley <danbri@google.com> (archived message)
Context: 8.1 Metadata
assigned to Phil Archer
Resolution status:

re http://www.w3.org/TR/2015/WD-dwbp-20150224/#metadata

Congratulations on your new Working Draft. Just a brief point as I
begin to work through the doc... I'd like to suggest that you consider
recycling an old sentence from the early RDF '97-9 work, which
addresses up front the awkwardness inherent in defining "metadata" as
"data about data":

"""The distinction between "data" and "metadata" is not an absolute
one; it is a distinction created primarily by a particular
application, and many times the same resource will be interpreted in
both ways simultaneously."""

One of RDF's strengths is that it works at both these levels. While
the dwbp doc's scope goes beyond RDF, I think the insight in that old
paragraph from the first RDF Recommendation remains relevant.
Currently you write "Metadata is data about data." as well as "A
metadata document must be published together with the data"; taken
together this makes the distinction seem more clear-cut than it often
seems in practice.

cheers,

Dan

(*) context: http://www.w3.org/TR/1999/REC-rdf-syntax-19990222/
"The World Wide Web was originally built for human consumption, and
although everything on it is machine-readable, this data is not
machine-understandable. It is very hard to automate anything on the
Web, and because of the volume of information the Web contains, it is
not possible to manage it manually. The solution proposed here is to
use metadata to describe the data contained on the Web. Metadata is
"data about data" (for example, a library catalog is metadata, since
it describes publications) or specifically in the context of this
specification "data describing Web resources". The distinction between
"data" and "metadata" is not an absolute one; it is a distinction
created primarily by a particular application, and many times the same
resource will be interpreted in both ways simultaneously."

Comment LC-3003: Andrea-1
Commenter: Andrea Perego <andrea.perego@jrc.ec.europa.eu> (archived message)
Context: Best Practice 1: Document data
Not assigned
Resolution status:

1. BP-1 ("Document data") seems to mix two different requirements:
(a) publishing data documentation (metadata)
(b) publishing metadata in human-readable formats
Is this correct?
In such a case, shouldn't these be rather addressed by two different
BPs? The requirement of publishing metadata shouldn't necessarily
address *how* this is done. This would also be inconsistent with the
fact that the requirement about publishing metadata in
machine-readable formats is addressed by a specific BP (BP-2).

Comment LC-3004: Andrea-2
Commenter: Andrea Perego <andrea.perego@jrc.ec.europa.eu> (archived message)
Context: Best Practice 2: Use machine-readable formats to provide met...
assigned to Bernadette Farias Loscio
Resolution status:

2. BP-2 ("Use machine-readable formats to provide metadata"), section
"Intended outcome":
"It should be possible for computer applications, notably search
tools, to locate and process the metadata easily, which makes it human
readable metadata, machine readability metadata."
(a) It is unclear why this "makes it human readable metadata".
(b) There's probably a typo in "[... ] machine readability metadata" -
shouldn't this rather be "[...] machine readable metadata"?

Comment LC-3005: Andrea-3
Commenter: Andrea Perego <andrea.perego@jrc.ec.europa.eu> (archived message)
Context: Best Practice 2: Use machine-readable formats to provide met...
assigned to Bernadette Farias Loscio
Resolution status:

3. BP-2 makes the point about the use of machine-readable formats for
data discovery via software agents, including search engines. It
points also to specific machine-readable metadata serialisations that
can be embedded in human-readable metadata, and that are currently
used by search engines to optimise discovery. However, I have two
questions on this:
(a) Shouldn't it be a requirement for human-readable metadata to *always*
embed their machine-readable version, even when machine-readable
metadata are available separately? I see a couple of use cases for
this - e.g., optimising discovery via search engines, existing browser
plug-ins able to read RDFa, etc.
(b) Do you think that the requirement of being "discoverable" by Web
search tools should be extended to data? BP-12 partially addresses this,
but not explicitly. I'm asking since this issue may be relevant to the
SDW WG - see http://www.w3.org/2015/spatial/wiki/BP_Requirements#Content_need_to_be_crawlable.2C_then_able_to_ask_search_engine_or_other_service

Comment LC-3028: Use standard terms to define metadata
Commenter: Maurino Andrea <Maurino Andrea> (archived message)
Context: Best Practice 3: Use standard terms to define metadata
assigned to Bernadette Farias Loscio
Resolution status:

Issue 6: IMHO at least a very well-defined subset of metadata terms MUST be described by means of standard terms and, consequently, expressed with well-known RDF vocabularies. Examples of such mandatory metadata terms could include the owner, the type of license associated with the data, and the date of last modification.


Developed and maintained by Dominique Hazaël-Massieux (dom@w3.org).
$Id: index.html,v 1.1 2017/08/11 06:47:21 dom Exp $
Please send bug reports and request for enhancements to w3t-sys.org