Technical factors for consideration when choosing data sets for publication

From Data on the Web Best Practices


This section of the Data on the Web Best Practices document will include best practices on the technical factors to consider when choosing data sets for publication.

Good Practices

1. Open to integration with other services/platforms

2. Open file format

According to [1], a proprietary file format can create a technology dependency for anyone using the information, which in turn can restrict access to the data. The data therefore need to be structured and organised so that different software can manipulate them. For example, some data are available only in PDF format, which does not lend itself to analysis by software.

Open file formats avoid the need for scraping techniques to translate a proprietary file format into open formats such as XML or JSON.
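
As a minimal sketch of the point above, the following Python snippet reads a small CSV document (an open, structured format) using only the standard library, with no scraping step. The column names and values are hypothetical and exist only for illustration.

```python
import csv
import io

# Hypothetical dataset published as CSV; the columns are made up for this sketch.
CSV_TEXT = """city,year,visitors
Springfield,2010,30000
Springfield,2011,31000
Shelbyville,2010,12000
"""

def total_by_city(csv_text):
    """Sum the 'visitors' column per city straight from the CSV text."""
    totals = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["city"]] = totals.get(row["city"], 0) + int(row["visitors"])
    return totals

print(total_by_city(CSV_TEXT))
```

The same aggregation over a PDF would first require extracting the table from the page layout, which is the dependency the practice tries to avoid.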

3. "RDFizations of Datasets"

4. Dataset versioning

5. Machine-readable files

6. REST access for individual datasets

7. It is good to use other protocols

such as sftp, rsync, scp

8. Ontology definition must be standard and machine readable

9. URIs must be persistent

10. Data must be available in more than one format

According to [1], files must not be made available in only one open format, as this would hinder use by some groups of people (for lack of knowledge), and in other cases the single format could lack the structure needed to manipulate the files.
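
One way to offer more than one format is to keep a single set of records and serialize it on request. The sketch below assumes hypothetical records and a hypothetical `publish` helper; it is an illustration under those assumptions, not a prescribed implementation.

```python
import csv
import io
import json

# Hypothetical records; the field names are illustrative only.
RECORDS = [
    {"station": "A1", "temperature_c": 21.5},
    {"station": "B2", "temperature_c": 19.0},
]

def to_json(records):
    """Serialize the records as a JSON array of objects."""
    return json.dumps(records, indent=2, sort_keys=True)

def to_csv(records):
    """Serialize the records as CSV; assumes a non-empty list with uniform keys."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(records[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()

# Map each offered format to its serializer.
SERIALIZERS = {"json": to_json, "csv": to_csv}

def publish(records, fmt):
    """Return the records in the requested format (hypothetical helper)."""
    return SERIALIZERS[fmt](records)

print(publish(RECORDS, "csv"))
print(publish(RECORDS, "json"))
```

Adding a further format then means adding one serializer to the mapping, while the underlying records stay unchanged.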

11. Definition of data update frequency

12. Good filters/searches to avoid many unnecessary requests

13. Data must be structured

14. Reuse of existing ontologies

15. Dataset size must be limited to small portions to be consumed bit by bit

16. It is good to have a dataset management tool

17. Use of a dedicated service

a service independent of the data origin
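
Practices 12 and 15 above (filters that avoid unnecessary requests, and small portions consumed bit by bit) can be sketched together as a filtered, paginated query. The function name `query`, its parameters, and the sample data are all assumptions made for this illustration.

```python
def query(records, predicate=None, offset=0, limit=2):
    """Return one page of matching records plus the offset of the next page."""
    # Filter first so the consumer only pages through relevant records.
    matching = [r for r in records if predicate is None or predicate(r)]
    page = matching[offset:offset + limit]
    # next_offset is None when there is nothing left to fetch.
    has_more = offset + limit < len(matching)
    return {
        "total": len(matching),
        "items": page,
        "next_offset": offset + limit if has_more else None,
    }

# Hypothetical dataset: seven records spread over three years.
DATA = [{"id": i, "year": 2010 + i % 3} for i in range(7)]

first = query(DATA, predicate=lambda r: r["year"] == 2010)
second = query(DATA, predicate=lambda r: r["year"] == 2010,
               offset=first["next_offset"])
print(first)
print(second)
```

A consumer keeps requesting pages until `next_offset` is `None`, so no single response has to carry the whole dataset.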



Relation between good practices and use cases

Good Practices    Use Cases
12                22

Editors and Contributors

Flávio Yanai

Links and References