Status of comments about the last call working draft

From Data on the Web Best Practices
No. Subject Author Message Comment Proposal Resolution and Implementation
1 Metadata - Structural Ivan Herman In example four it is probably a good practice to use datatypes that are as specific as possible to allow for data checks, for example. The one that comes to my mind is the "stop_url"; [2] includes the "anyURI" datatype, it is probably worth using it. There may be other such cases in the listed example. proposal: - inclusion of anyURI datatype

message to the author:

2 Metadata - Structural Ivan Herman By the way, it may be a good idea to refer to the CSVW primer[3] in this context. Maybe it goes too far to include it in this document, but [4] also includes an example on how to incorporate geospatial data into the metadata (which may be more appropriate for the bus stop example); it may be worth at least mentioning it in the text. proposal: - inclusion of CSVW primer reference

message to the author:

3 Metadata - Locale parameters Michel Dumontier Best practice 3 uses dct:conformsTo, but the range of this is a URI of type dct:Standard, so it should be a URI for the ISO spec. proposal:

message to the author:

4 Data Enrichment Michel Dumontier one social: if one wants to follow BP31 (enrich data by generating new data) but is not the original data provider, I'd recommend that they make their contribution public (rather than republishing the whole dataset), with a machine-readable provenance description of the work, and contribute the enrichment back to the original data provider. proposal:

message to the author:

5 Data versioning Andrea Perego The issue is about a specific metadata field, namely the date of last modification of a dataset (dct:modified in DCAT).

(see message for more details)

proposal:

message to the author:

6 Data access Andrea Perego There's a scenario that I'm not sure is addressed, at least explicitly. This concerns data that require users to register before they can be accessed. This is different from data that can be accessed only by authorised users.

(see message for more details)


message to the author:

7 Numerical data Frans Knibbe One thing I miss is the advice to use significant figures in numerical data. It is an easy way to make the data match their uncertainty, and in many cases it helps to compact data too. Numerical data with the wrong number of significant digits is a very common problem in geographical data (e.g. geographic coordinates with nanometre precision). proposal: this kind of advice is out of the scope of the DWBP working group.
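
To illustrate the commenter's point, a minimal Python sketch of rounding to a fixed number of significant figures might look like the following. The function name `round_significant` and the sample coordinate are hypothetical and do not come from the DWBP document; how many digits are appropriate depends on the actual uncertainty of the data.

```python
def round_significant(value: float, digits: int) -> float:
    """Round `value` to `digits` significant figures (hypothetical helper)."""
    if digits < 1:
        raise ValueError("digits must be at least 1")
    if value == 0:
        return 0.0
    # Scientific notation keeps exactly `digits` significant digits:
    # one before the decimal point and `digits - 1` after it.
    return float(f"{value:.{digits - 1}e}")


# A made-up latitude carrying far more digits than a typical
# measurement could justify.
latitude = 52.37315123456789
print(round_significant(latitude, 7))
```

Publishing the rounded value also shortens the serialized data, which is the compaction benefit mentioned in the comment.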

message to the author:

8 Data Enrichment David (Annette's colleague) IMO Topic 8.13 is a little too focused on automated methods for "filling in missing values". I like the summary ("Enrich your data by generating new data from the raw data when doing so will enhance its value."), but the text does not really address the "enhancement of value" part. It also seems weighted toward interpolation of data values as opposed to "generating new data".

Do you think it's worth emphasizing that enrichment should be demonstrable? I see this as a QA issue. Other examples include visual inspection to identify features in spatial data and cross-reference to external databases for demographic information. [ Lastly, generation of new data may be demand-driven, where missing values are calculated or otherwise determined by direct means. Measured application of these techniques informs the degree and direction of data enrichment]


resolution and commit: the author agreed with the proposal

message to the author:

9 Scope aphillips The scope section of the document lists criteria for inclusion in Section 3. The document makes no mention of character encoding later. While an exhaustive survey of best practices such as [Charmod-Norm] is not desirable here, it is a somewhat fundamental BP to always choose a Unicode encoding. Would this be appropriate to this document? We are not sure if it is possible to include another BP at this moment. Should we include it as a general guideline in Section 1?
10 Descriptive Metadata aphillips In **Example 2**, the media type of the file is given as:

> dcat:mediaType "text/csv" ;

The media type should include the charset parameter, e.g. "text/csv;charset=UTF-8" since the default for text/* is ASCII and since UTF-8 should be preferred.
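
As a rough illustration of why the explicit parameter matters, the sketch below reads the charset from a media type string using Python's standard `email.message` module. The helper name `charset_of` is hypothetical, and the `us-ascii` fallback simply encodes the default described in the comment above; it is an assumption of this sketch, not a rule taken from the DWBP document.

```python
from email.message import Message


def charset_of(media_type: str, default: str = "us-ascii") -> str:
    """Return the charset parameter of a media type, or `default` if absent.

    Hypothetical helper; the default mirrors the ASCII fallback that the
    comment attributes to text/* media types.
    """
    msg = Message()
    msg["Content-Type"] = media_type
    # get_content_charset() lower-cases the parameter value and returns
    # the fallback object when no charset parameter is present.
    return msg.get_content_charset(default)


print(charset_of("text/csv"))                # falls back to the default
print(charset_of("text/csv;charset=UTF-8"))  # explicit charset is used
```

A consumer that sees only "text/csv" has to guess or fall back, whereas the parameterized form removes the ambiguity.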

proposal: We will change it on GitHub.

message to the author:

11 Locale parameters Metadata aphillips In BP3 the word 'locale' is misspelled once as 'local', which could be confusing:

> Check if the metadata for the dataset itself includes information about local parameters (i.e. data, time, number formats, and language) in a human-readable format.

proposal: We will change it on GitHub.

message to the author:

12 Data formats aphillips Most machine-readable standardized formats also happen to be locale-neutral (by design, because it is a best practice). Mention that fact as one of the benefits here? proposal: Should we include a new benefit or just mention that fact in the section's introduction?

message to the author:

13 Locale parameters Metadata aphillips Best practice #3 introduces itself as:

> Providing locale parameters helps humans and computer applications to work accurately with things like dates, currencies and numbers that may look similar but have different meanings in different locales.

But the actual best practice is to use **locale-neutral** representations that are interpreted/displayed to end-users in a locale-appropriate manner. For example, instead of storing the string "€2000.00", exchanging a data structure like the following is strongly preferred:

```
"price": {
  "value": 2000.00,
  "currency": "EUR"
}
```

The date examples given are all in xsd:date format, which is an excellent example of using a locale-neutral format.

Many things are dependent on locale: decimal symbol, grouping symbol, number of grouping digits, digit shapes, etc. It's because there can be wide variation (sometimes open to misinterpretation) that sending a locale-neutral format is preferred for data values. Note also that the position of the currency symbol is dependent on the locale. In France it would be normal to write 2000.00 € rather than €2000.00. The same holds even for USD when using $, i.e. 2000.00 $.
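
The separation between exchanged data and displayed text could be sketched in Python as follows. The `Conventions` tuples, the `SYMBOLS` table, and the function `format_price` are hypothetical and deliberately simplified; the separator and symbol-placement values are hand-written assumptions for illustration, and a real application would take them from a locale data source such as CLDR.

```python
import json
from decimal import Decimal
from typing import NamedTuple


class Conventions(NamedTuple):
    """Hand-written, simplified display conventions (illustrative only)."""
    decimal_sep: str
    group_sep: str
    symbol_first: bool


# Assumed values for two locales; not taken from any locale database.
EN_US = Conventions(decimal_sep=".", group_sep=",", symbol_first=True)
FR_FR = Conventions(decimal_sep=",", group_sep=" ", symbol_first=False)

SYMBOLS = {"EUR": "€", "USD": "$"}


def format_price(price: dict, conv: Conventions) -> str:
    """Render a locale-neutral price structure for display."""
    # Format with the neutral separators first, then map both in one pass.
    number = format(price["value"], ",.2f")
    number = number.translate(
        str.maketrans({",": conv.group_sep, ".": conv.decimal_sep})
    )
    symbol = SYMBOLS[price["currency"]]
    return f"{symbol}{number}" if conv.symbol_first else f"{number} {symbol}"


# The exchanged data stays locale-neutral; Decimal avoids float rounding.
data = json.loads(
    '{"price": {"value": 2000.00, "currency": "EUR"}}', parse_float=Decimal
)
print(format_price(data["price"], EN_US))
print(format_price(data["price"], FR_FR))
```

The same stored value yields different display strings, which is why the display form is a poor choice for the exchanged data itself.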

proposal: Annette will contact the I18N WG to ask more details about this issue.

14 DUV aphillips & fsasaki

In the section above, there are a number of natural language field types that should have language and direction metadata associated with them.

proposal: To change DUV. resolution and commit
15 Locale parameters metadata aphillips Best Practice 3 includes as an example the use of `dct:language` to indicate locale and refers loosely to the need to indicate the language and locale of data values. The standards that embody locale and language identification on the Web (and on the Internet more generally) are IETF BCP 47 [RFC5646/RFC4647] and [CLDR].

The I18N WG recognizes that these standards do not have a linked data representation currently, but the current representations given as examples in this document are incomplete and have a variety of limitations or problems. This is recognized, for example, by the fact that Dublin Core's language element was defined to reference RFC3066, which was the current BCP47 when that standard was published. However, BCP 47 has been updated since and the current formulation, while fully compatible with RFC3066, is the preferred reference.

The WG feels that BP3 should include a reference or recommendation to consistently use BCP47 as the standard for language and locale identification and, informatively, to CLDR as the source for both representing specific localized formats and as a reference for specific locale data values.
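
For readers unfamiliar with BCP 47 tags, the following Python sketch splits the most common tag shape, language[-script][-region], into its subtags and applies the casing conventions recommended by RFC 5646. It is an assumption-laden simplification: the name `parse_simple_tag` is hypothetical, the pattern covers only a subset of the grammar (no variants, extensions, private-use or grandfathered tags), and it checks syntax only, not whether the subtags are registered.

```python
import re

# Subset of the BCP 47 grammar: language[-script][-region].
_SIMPLE_TAG = re.compile(
    r"(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?"
)


def parse_simple_tag(tag: str):
    """Split a simple language tag into subtags, or return None.

    Hypothetical helper: syntax check only, for a subset of BCP 47.
    """
    match = _SIMPLE_TAG.fullmatch(tag)
    if match is None:
        return None
    script = match["script"]
    region = match["region"]
    return {
        # Conventional casing: lowercase language, titlecase script,
        # uppercase region.
        "language": match["language"].lower(),
        "script": script.title() if script else None,
        "region": region.upper() if region else None,
    }


print(parse_simple_tag("fr-CA"))
print(parse_simple_tag("zh-hant-tw"))
print(parse_simple_tag("en_US"))  # underscore form is not a BCP 47 tag
```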

Please note that this is in addition to the need to recommend locale-neutral representations [1].

proposal: To change BP 3. We need some help to make this change.

message to the author:

16 Locale parameters metadata aphillips BP3 covers locale-formats but does not suggest the need to tag natural language text with a language tag and with base direction metadata. Natural language can be indicated via the existing RDF representation. However, there currently exists no good mechanism to indicate base direction. This is seen as a gap in formats such as JSON-LD for which no good generalized solution exists. However, it is an unsolved problem that should be acknowledged (and which might be overcome via other means, such as providing Unicode controls or adding field-specific metadata to document formats). proposal: Annette will prepare a text to be included in the DWBP doc.
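
One of the workarounds the comment alludes to, providing Unicode controls, could be sketched in Python as below. The function name `isolate` is hypothetical; the code points are the Unicode directional isolate characters (U+2066 to U+2069). This is only an illustration of the idea, not a recommendation made by the DWBP document, and it does not replace proper base direction metadata.

```python
# Unicode directional isolate controls.
LRI = "\u2066"  # LEFT-TO-RIGHT ISOLATE
RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE
FSI = "\u2068"  # FIRST STRONG ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def isolate(text: str, direction=None) -> str:
    """Wrap `text` in isolate controls for the given base direction.

    Hypothetical helper: `direction` is "ltr", "rtl", or None. With None,
    FSI lets the first strong character of the text decide the direction.
    """
    openers = {"ltr": LRI, "rtl": RLI, None: FSI}
    if direction not in openers:
        raise ValueError(f"unknown direction: {direction!r}")
    return f"{openers[direction]}{text}{PDI}"


# A Hebrew title embedded in a left-to-right field, marked as right-to-left.
wrapped = isolate("\u05e9\u05dc\u05d5\u05dd", "rtl")
print([f"U+{ord(ch):04X}" for ch in wrapped])
```

The drawback, as the comment implies, is that the direction is then carried inside the string value rather than as metadata a consumer can query.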

17 Vocabularies aphillips

Example 15 contains this citation:

> The Library of Congress publishes lists of ISO 639 country codes as Linked Data (see [ISO639-1-LOC] for two-letter codes):

ISO 639-1 is a list of language codes, not country codes. The standard for country codes is ISO 3166-1. Please either change the reference to ISO 3166-1 (the change I18N WG would prefer) or change the text to say "language codes".

proposal: We will change it on GitHub. We should validate the update with Antoine.

message to the author:

18 Metadata fsasaki Section 9.2 Metadata

The section says "Best Practice 1: Provide metadata": "Metadata must be provided for both human users and computer applications"

This best practice does not talk about metadata in multiple languages. One should require that metadata is provided in the language of the user at least.

The same comment holds for "Best Practice 2: Provide descriptive metadata" in the same section, and in section 9.14 on enrichment

There is "Best Practice 3: Provide locale parameters metadata … Information about locale parameters (date, time, and number formats, language) should be described by metadata." but this best practice does not talk about metadata in multiple languages.

proposal: We will change it on GitHub. Should we include a general guideline in the introduction of Metadata Section?

message to the author: