comments on Data on the Web Best Practices

Dear working group, I'd like to report some comments and suggestions on your first draft of the Data on the Web Best Practices, based on my research activities on data quality and on my role as coordinator of the European project COMSODE (www.comsode.eu).



Best Practice 3: Use standard terms to define metadata



Issue 6: IMHO at least a very well-defined subset of metadata terms MUST be described by means of standard terms, and consequently these must be expressed with well-known RDF vocabularies. Such a mandatory list of metadata terms could include, for example, the owner, the type of license associated with the data, and the date of last modification.
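As a minimal sketch of what such a mandatory subset could look like, the snippet below maps the three example terms to Dublin Core Terms properties and refuses to serialize a description that lacks any of them. The choice of `dcterms:rightsHolder` for the owner, the `MANDATORY_TERMS` table, the helper names, and the example.org IRIs are assumptions made for illustration, not something the draft prescribes.

```python
from datetime import date

DCT = "http://purl.org/dc/terms/"
XSD_DATE = "http://www.w3.org/2001/XMLSchema#date"

# Hypothetical mandatory subset, taken from the three examples in the comment.
MANDATORY_TERMS = {
    "owner": DCT + "rightsHolder",
    "license": DCT + "license",
    "modified": DCT + "modified",
}


def missing_mandatory(metadata):
    """Return the mandatory keys that are absent or empty in `metadata`."""
    return sorted(key for key in MANDATORY_TERMS if not metadata.get(key))


def to_ntriples(dataset_iri, metadata):
    """Serialize the mandatory metadata as a list of N-Triples lines."""
    missing = missing_mandatory(metadata)
    if missing:
        raise ValueError("missing mandatory metadata: " + ", ".join(missing))
    lines = []
    for key, predicate in MANDATORY_TERMS.items():
        value = metadata[key]
        if isinstance(value, date):
            # Dates become typed literals.
            obj = '"%s"^^<%s>' % (value.isoformat(), XSD_DATE)
        else:
            # Everything else is assumed to be an IRI.
            obj = "<%s>" % value
        lines.append("<%s> <%s> %s ." % (dataset_iri, predicate, obj))
    return lines


example = {
    "owner": "http://example.org/org/publisher",
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "modified": date(2015, 3, 12),
}
for line in to_ntriples("http://example.org/dataset/1", example):
    print(line)
```

A publisher-side check of this kind would make the "MUST" testable by machines, which is the main reason for fixing the vocabulary of the mandatory terms.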



Best Practice 6: Provide data license information



According to the experience of the COMSODE project, a license is a mandatory requirement for publishing data on the web: without a license there is no clear indication of the limits (if any) on the usability of the data, and this lack significantly reduces the possibility of having a real web of data. It could be suggested that, when someone publishes data without a license, this implies that the data can be consumed for free by both humans and machines, but cannot be modified, reused, and so on without the explicit consent of the data owner.





Best Practice 8: Provide data quality information

Issue 7: I suggest outlining some strategies for how to attach quality information. In some cases such information is defined inside the data (for example, when the time of last modification of an item is part of the dataset itself); in other situations there is a need to express quality dimensions related to the schema description only (e.g. conciseness of the schema), or related to the dataset. I also suggest (though it is clear that I'm a little biased on this topic :) ) describing better how to express the quality information (including quality dimensions, the adopted quality metric, and the quality value; see for example [1] as a starting point).
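To make the three attachment levels and the dimension/metric/value triple concrete, here is a minimal sketch. The `Level` enum, the `QualityMeasurement` record, and the `completeness` metric are hypothetical names chosen for illustration; they are not taken from the draft or from [1].

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    """Where a quality statement is attached."""
    ITEM = "item"        # inside the data, e.g. per-record last modification
    SCHEMA = "schema"    # about the schema description only
    DATASET = "dataset"  # about the dataset


@dataclass(frozen=True)
class QualityMeasurement:
    target: str      # IRI or identifier of the thing being measured
    level: Level
    dimension: str   # e.g. "completeness", "conciseness"
    metric: str      # how the dimension was measured
    value: float


def completeness(records, fields):
    """Fraction of non-empty cells over the given fields (0.0 if there are none)."""
    total = len(records) * len(fields)
    if total == 0:
        return 0.0
    filled = sum(
        1 for record in records for field in fields
        if record.get(field) not in (None, "")
    )
    return filled / total


records = [
    {"name": "Alice", "city": "Milano"},
    {"name": "Bob", "city": ""},
]
measurement = QualityMeasurement(
    target="http://example.org/dataset/1",
    level=Level.DATASET,
    dimension="completeness",
    metric="ratio of non-empty cells",
    value=completeness(records, ["name", "city"]),
)
print(measurement)
```

Keeping the metric name next to the value matters because the same dimension can be measured in several ways, and a bare number is not comparable across publishers.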





Best Practice 9: Provide versioning information



This is a crucial problem, in particular in the case of linked data, because of the possible impact on existing interlinked resources. Some good practices could be discussed.



Best Practice 20: Preserve people's right to privacy

This is a big issue: while it is right to protect people's right to privacy, there is also the "right to know" about the activities carried out by public administrations (for example, court rulings). In Italy, just as an example, personal information, including salary, about people working in the public administration at higher levels, or about consultants paid with public money, must be released as open data for 5 years under the Italian transparency decree (after that period there is the "right to be forgotten", which many of you know from the Google vs. European Union case).



Thus I suggest changing the best practice to "Data publishers should preserve the privacy of individuals according to the law of the country of the data owner".



Best Practice 25: Provide data up to date

Please consider that this BP is strictly related to the data quality BP, because of the way in which temporal quality dimensions are calculated; the two BPs must be correlated and coherent.
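As an illustration of why the two BPs need to agree, a temporal quality dimension such as currency is typically computed from the same last-modification timestamp that an "up to date" requirement relies on. The sketch below is one possible formulation under stated assumptions: the linear decay, the `currency_score` name, and the `update_period` parameter are hypothetical and not defined by the draft.

```python
from datetime import datetime, timedelta, timezone


def currency_score(last_modified, update_period, now):
    """Score in [0.0, 1.0]: 1.0 when just updated, 0.0 once a full
    update period (or more) has passed since the last modification."""
    if update_period <= timedelta(0):
        raise ValueError("update_period must be positive")
    elapsed = now - last_modified
    ratio = elapsed / update_period  # timedelta / timedelta gives a float
    return max(0.0, min(1.0, 1.0 - ratio))


last_modified = datetime(2015, 3, 2, tzinfo=timezone.utc)
now = datetime(2015, 3, 12, tzinfo=timezone.utc)
print(currency_score(last_modified, timedelta(days=20), now))
```

If the "up to date" BP and the quality BP used different timestamps or different expected update periods, the published quality value and the actual freshness of the data could contradict each other.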







Best regards



Andrea  Maurino



[1] http://ceur-ws.org/Vol-1184/ldow2014_paper_09.pdf

Received on Thursday, 12 March 2015 20:45:00 UTC