LC-3036
|
# Preservation
There are existing guidelines about the process of preservation itself. Those could be cited to guide people on how to do preservation. There are also many repositories that exist to preserve data at different levels (institutional, national, ...).
There should be something there! In terms of BPs, the following points should be addressed:
* As a data publisher, do you want to, or have to, preserve your data?
* If yes, what to preserve?
* Who to give it to? Only to one archive or several? One could be mandated to do preservation, whatever its quality as an archive. There are existing certifications (DSA, etc.) that can be used to help publishers make informed choices about who to trust.
* Think about the level of access for the preserved copy (public, private, ...)
* The type of data matters for preservation. Publishers need to be aware of that. It is also important to think about preserving with context, and thus to preserve not only the dataset alone but also the resources needed to make sense of it (documentation, schemas, ...)
|
A data preservation section was included in the document.
http://w3c.github.io/dwbp/bp.html#dataPreservation |
yes |
---|
LC-3003
Andrea Perego <andrea.perego@jrc.ec.europa.eu> (archived comment) |
1. BP-1 ("Document data") seems to mix two different requirements:
(a) publishing data documentation (metadata)
(b) publishing metadata in human-readable formats
Is this correct?
In such a case, shouldn't these be rather addressed by two different
BPs? The requirement of publishing metadata shouldn't necessarily
address *how* this is done. This would also be inconsistent with the
fact that the requirement about publishing metadata in
machine-readable formats is addressed by a specific BP (BP-2).
|
Text is being updated, Bernadette has been in touch with Andrea. Resolved at f2f 2015-04-13 |
yes |
---|
LC-3004
Andrea Perego <andrea.perego@jrc.ec.europa.eu> (archived comment) |
2. BP-2 ("Use machine-readable formats to provide metadata"), section
"Intended outcome":
"It should be possible for computer applications, notably search
tools, to locate and process the metadata easily, which makes it human
readable metadata, machine readability metadata."
(a) It is unclear why this "makes it human readable metadata".
(b) There's probably a typo in "[... ] machine readability metadata" -
shouldn't this rather be "[...] machine readable metadata"?
|
At f2f 2015-04-13 |
yes |
---|
LC-3005
Andrea Perego <andrea.perego@jrc.ec.europa.eu> (archived comment) |
3. BP-2 makes the point about the use of machine-readable formats for
data discovery via software agents, including search engines. It
points also to specific machine-readable metadata serialisations that
can be embedded in human-readable metadata, and that are currently
used by search engines to optimise discovery. However, I have two
questions on this:
(a) Shouldn't it be a requirement for human-readable metadata to *always*
embed their machine-readable version? This applies even when machine-readable
metadata are available separately. I see a couple of use cases for
this - e.g., optimising discovery via search engines, existing browser
plug-ins able to read RDFa, etc.
(b) Do you think that the requirement of being "discoverable" by Web
search tools should be extended to data? BP-12 partially addresses this,
but not explicitly. I'm asking since this issue may be relevant to the
SDW WG - see http://www.w3.org/2015/spatial/wiki/BP_Requirements#Content_need_to_be_crawlable.2C_then_able_to_ask_search_engine_or_other_service
|
All linked in with the wider issues around metadata, which are being updated. |
yes |
---|
LC-3048
Annette Greiner <amgreiner@lbl.gov> (archived comment) |
Data enrichment
Issue: to discuss that enrichment yields derived data, not just metadata. For example, you could take a dataset of scheduled and real bus arrival times and enrich it by adding on-time arrival percentages. The percentages are data, not metadata.
Issue: to discuss the meaning of the word “topification”.
|
The section was reformulated. |
yes |
---|
LC-3046
Annette Greiner <amgreiner@lbl.gov> (archived comment) |
* Data Identification
Issue: to discuss limiting this section to information that applies to publishing *data*.
|
|
yes |
---|
LC-3049
Annette Greiner <amgreiner@lbl.gov> (archived comment) |
Data Identification
Issue: to discuss limiting this section to information that applies to publishing *data*.
|
|
yes |
---|
LC-3050
Annette Greiner <amgreiner@lbl.gov> (archived comment) |
Provide data up to date
Issue: to debate whether the goal should be to adhere to a published schedule for updates.
|
Duplicate of LC-3047 |
tocheck |
---|
LC-3037
Christophe Guéret <christophe.gueret@dans.knaw.nl> (archived comment) |
# Preservation
There are existing guidelines about the process of preservation itself. Those could be cited to guide people on how to do preservation. There are also many repositories that exist to preserve data at different levels (institutional, national, ...).
There should be something there! In terms of BPs, the following points should be addressed:
* As a data publisher, do you want to, or have to, preserve your data?
* If yes, what to preserve?
* Who to give it to? Only to one archive or several? One could be mandated to do preservation, whatever its quality as an archive. There are existing certifications (DSA, etc.) that can be used to help publishers make informed choices about who to trust.
* Think about the level of access for the preserved copy (public, private, ...)
* The type of data matters for preservation. Publishers need to be aware of that. It is also important to think about preserving with context, and thus to preserve not only the dataset alone but also the resources needed to make sense of it (documentation, schemas, ...)
|
|
yes |
---|
LC-3038
Christophe Guéret <christophe.gueret@dans.knaw.nl> (archived comment) |
# Feedback
This section should also relate to preservation. One way to do it is to list stakeholders around preservation (see RDA for an impression).
BP: there should be identifiers to give feedback on a specific part of the data
BP: Use feedback as data enrichment, e.g. crowd annotation
|
|
yes |
---|
LC-3039
Christophe Guéret <christophe.gueret@dans.knaw.nl> (archived comment) |
# Metadata
Need to say where the taxonomy comes from. The document speaks about 3 types instead of the 5 commonly observed. The two missing ones are preservation metadata (how, where, ...) and technical metadata (EXIF, ...).
BP: Use standard terms but then make extensions public when they are needed
|
To update the document to be more generic about the different types of metadata. Christophe said: "It should also be ok saying there are several type of metadata without going into much details." |
yes |
---|
LC-3040
Christophe Guéret <christophe.gueret@dans.knaw.nl> (archived comment) |
# Data quality
Does this apply to data or metadata?
There are many granularity aspects in data that need to be taken into account.
How do you define quality?
Completeness of the data is not related to quality. There should be an element of comparison to check the completeness against something (e.g. "data is complete according to EDM").
There should be something about quality vs. usability, partly because fitting data into quality standards can lead to losing important data (mainly everything that does not fit).
|
|
yes |
---|
LC-3053
Erik Wilde <dret@berkeley.edu> (archived comment) |
- "Best Practice 14: Provide data in multiple formats" might want to say
if that should be done by different URIs, or one URI and HTTP conneg.
that's a very typical question publishers have, so it should be
mentioned at the very least, even if the answer is "we have no specific
recommendation either way".
- "Best Practice 14: Provide data in multiple formats" should say that
for fragment identifiers to be consistent across formats, care is needed
to make sure that this is the case (as much as possible, depending on
the formats and their features).
|
|
tocheck |
---|
LC-3055
Erik Wilde <dret@berkeley.edu> (archived comment) |
- best practices 24 and 27 kind of conflict. one important idea of REST
is to avoid versioning, and having versioned URIs is a pretty certain
sign of bad design smell when it comes to media types and API design.
|
|
tocheck |
---|
LC-3033
Maurino Andrea <maurino@disco.unimib.it> (archived comment) |
Please consider that this BP is closely related to the data quality BP, because of the way in which temporal quality dimensions are calculated; the two BPs must be correlated and coherent.
|
This BP concerns how to keep data up to date, rather than providing information about whether data is being updated as expected.
The discussion about data quality assessment is in the scope of the Data Quality Vocabulary (http://w3c.github.io/dwbp/vocab-dqg.html). |
yes |
---|
LC-3028
Maurino Andrea <maurino@disco.unimib.it> (archived comment) |
Issue 6: IMHO at least a very well-defined subset of metadata terms MUST be described by means of standard terms and, consequently, expressed with a well-known RDF vocabulary. Examples of such a mandatory list of metadata terms could include the owner, the type of license associated with the data, and the date of last modification.
|
Changes were made to the metadata section, and specific vocabularies are mentioned in the Possible Approach to Implementation section. |
yes |
---|
LC-3029
Maurino Andrea <maurino@disco.unimib.it> (archived comment) |
According to the experience of the Comsode project, a license is a mandatory requirement for publishing data on the web, because without a license there is no clear indication of the limits (if any) on the usability of such data, and this lack significantly reduces the possibility of having a real web of data. One could suggest that when someone publishes data without a license, this implies that such data can be consumed for free by both humans and machines but cannot be modified, reused and so on without the explicit consent of the data owner.
|
I am not sure if we can make such a suggestion, because this may depend on the policies of the organization. I think we can only suggest that data license information should be available. |
yes |
---|
LC-3032
Maurino Andrea <maurino@disco.unimib.it> (archived comment) |
This is a big issue because, while it is correct to protect people's right to privacy, there is also the "right to know" about activities carried out by public administrations (for example, legal sentences). In Italy, just as an example, personal information, including salary, about people working at higher levels of public administration or consultants paid with public money must be released as open data for 5 years under Italy's transparency decree (after that period there is "the right to be forgotten", which many of you know from the Google vs. European Union case).
|
Some actions were taken to change the BP for Sensitive Data. Changes will be made for the next version.
http://www.w3.org/2013/dwbp/track/actions/164
http://www.w3.org/2013/dwbp/track/actions/166 |
yes |
---|
LC-3030
Maurino Andrea <maurino@disco.unimib.it> (archived comment) |
Issue 7: I suggest outlining some strategies for how to attach quality information. In some cases such information is defined inside the data (for example, when the time of last modification of an item is part of the dataset itself); in other situations there is a need to express quality dimensions related to the schema description only (e.g. conciseness of the schema), or related to the dataset. I also suggest (but it is clear that I'm a little biased on this topic :) ) describing better how to express the quality information (including quality dimensions, adopted quality metric, and quality value; see, for example, [1] as a starting point).
|
This discussion will be kept for the Quality and Granularity Vocabulary document. |
yes |
---|
LC-3031
Maurino Andrea <maurino@disco.unimib.it> (archived comment) |
This is a crucial problem, in particular in the case of linked data, due to the possible impact on existing interlinked resources. Some good practices could be discussed.
|
In the current version of the document there is a section for Data Versioning and the following BPs:
http://w3c.github.io/dwbp/bp.html#VersioningInfo
http://w3c.github.io/dwbp/bp.html#VersionHistory
It is not in the scope of the document to propose BPs specific to linked data. The proposed BPs are generic. |
yes |
---|