Best Practices/Publish Statistical data in Linked Data format

From Share-PSI EC Project

Share-PSI 2.0 Best Practice

Source: Best Practices/Publishing_Statistical_data_in_Linked_Data_format

Outline of the best practice

Linked Open Data (LOD) is a growing movement in which organizations make their existing data available in a machine-readable format. There are two equally important perspectives on LOD: publishing and consuming. Government open data policies should cover both processes, with particular focus on publication.

Management summary

Challenge

Statistical data is the foundation for policy prediction, planning and adjustment, and therefore has a significant impact on society (from citizens to businesses to governments). The process of collecting and monitoring socio-economic indicators can be considerably improved if the data produced by government organizations such as statistical offices, national banks, employment services, etc. is published in Linked Data format.

Solution

The Linked Data paradigm has opened new possibilities and perspectives for government organizations to open data and interchange information. Data is open if it is technically open (available in a machine-readable standard format, which means it can be retrieved and meaningfully processed by a computer application) and legally open (explicitly licensed in a way that permits commercial and non-commercial use and re-use without restrictions); see the World Bank Open Data Essentials, http://opendatatoolkit.worldbank.org/en/essentials.html

The Linked Data approach enables datasets to be linked together through references to common concepts. A dataset is represented in the form of a graph, using the Resource Description Framework (RDF) as a general-purpose language. The Linked Data publication process refers to a set of activities related to the extraction, transformation, validation, exploration and publication on the Web of RDF datasets originating from different sources (e.g., databases). The ready-to-use RDF datasets can be either stored locally or registered in a metadata catalogue, e.g. one built with the open-source tool CKAN.
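As a minimal sketch of how two datasets become linked through a common concept, the following Python snippet models each dataset as a list of RDF-style triples and finds the records that reference the same URI. All URIs, identifiers and values here are hypothetical and chosen for illustration only; a real publication pipeline would use an RDF library and a triple store rather than plain lists.

```python
# Hypothetical namespaces, assumed for illustration only.
GEO = "http://example.org/geo/"
STATS = "http://example.org/stats/"
BANK = "http://example.org/bank/"

# Each triple is (subject, predicate, object), the basic unit of an RDF graph.
# In this sketch, URIs are strings and literal values are floats.
statistical_office = [
    (STATS + "obs1", STATS + "refArea", GEO + "areaA"),
    (STATS + "obs1", STATS + "unemploymentRate", 12.5),  # made-up value
]
national_bank = [
    (BANK + "obs9", BANK + "country", GEO + "areaA"),
    (BANK + "obs9", BANK + "inflationRate", 2.0),  # made-up value
]


def subjects_by_uri(triples):
    """Index subjects by the URI they point to, skipping literal objects."""
    index = {}
    for subject, _predicate, obj in triples:
        if isinstance(obj, str):
            index.setdefault(obj, set()).add(subject)
    return index


def linked_subjects(left, right):
    """Pair up subjects from two graphs that reference the same URI."""
    left_index = subjects_by_uri(left)
    right_index = subjects_by_uri(right)
    shared = sorted(set(left_index) & set(right_index))
    return {
        uri: (sorted(left_index[uri]), sorted(right_index[uri]))
        for uri in shared
    }


links = linked_subjects(statistical_office, national_bank)
print(links)
```

Because both publishers reuse the same URI for the area, the two observations can be joined even though each dataset uses its own predicate names.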

In 2014, the RDF Data Cube Vocabulary was published by the W3C Government Linked Data Working Group as a Recommendation for publishing multi-dimensional data on the Web.
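To illustrate what a statistical observation looks like in this vocabulary, the sketch below generates a small Turtle document with one qb:DataSet and one qb:Observation using only the Python standard library. The `qb:`, `sdmx-dimension:` and `sdmx-measure:` terms come from the published vocabularies; the `ex:` namespace, the identifiers and the value are assumptions made for this example. It is deliberately simplified: a complete cube would also declare a qb:DataStructureDefinition, which is omitted here.

```python
# Namespaces defined by the RDF Data Cube and SDMX-RDF vocabularies.
QB = "http://purl.org/linked-data/cube#"
SDMX_DIMENSION = "http://purl.org/linked-data/sdmx/2009/dimension#"
SDMX_MEASURE = "http://purl.org/linked-data/sdmx/2009/measure#"
XSD = "http://www.w3.org/2001/XMLSchema#"
# Hypothetical publisher namespace, assumed for illustration only.
EX = "http://example.org/stats/"

PREFIXES = "".join(
    f"@prefix {name}: <{uri}> .\n"
    for name, uri in [
        ("qb", QB),
        ("sdmx-dimension", SDMX_DIMENSION),
        ("sdmx-measure", SDMX_MEASURE),
        ("xsd", XSD),
        ("ex", EX),
    ]
)


def observation_turtle(obs_id, dataset_id, area_id, year, value):
    """Serialize one qb:Observation as Turtle text.

    The observation is attached to its dataset with qb:dataSet, located
    by two dimensions (area and period) and carries one measured value.
    """
    return (
        f"ex:{obs_id} a qb:Observation ;\n"
        f"    qb:dataSet ex:{dataset_id} ;\n"
        f"    sdmx-dimension:refArea ex:{area_id} ;\n"
        f'    sdmx-dimension:refPeriod "{year}"^^xsd:gYear ;\n'
        f"    sdmx-measure:obsValue {value} .\n"
    )


# Made-up identifiers and value, not real statistics.
observation = observation_turtle("obs1", "dataset1", "areaA", 2014, 12.5)
document = PREFIXES + "\nex:dataset1 a qb:DataSet .\n\n" + observation
print(document)
```

Generating text by hand keeps the example dependency-free; in practice an RDF library or a tool such as the LOD2 Statistical Workbench would handle serialization and validation.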

Best Practice identification

Why is this a Best Practice? What’s the impact of the Best Practice?

The approach contributes to standardizing the publication and re-use of multi-dimensional data on the Web. It is based on the RDF Data Cube vocabulary, which is mature enough to be used for publishing statistical data: it improves interoperability and allows data from different statistical sources to be compared. The vocabulary builds on the cube model that underlies SDMX (Statistical Data and Metadata eXchange), an ISO standard for exchanging and sharing statistical data and metadata among organizations, and provides a layer on top of the data to describe domain semantics, dataset metadata, and other crucial information needed in the process of statistical data exchange.

Links to the PSI Directive

Policies and Legislation

Why is there a need for this Best Practice?

To spread experience and encourage government organizations to follow existing approaches.

What do you need for this Best Practice?

A new governmental open data policy that mandates the use of tools for automating the data extraction and publication process.

This can be based on existing open-source tools for publishing statistical data in Linked Data format, e.g. the LOD2 Statistical Workbench (https://www.w3.org/2013/share-psi/wiki/images/6/65/Samos_Workshop_2014_-_IMP_submission.pdf).

Applicability by other Member States

The approach is applicable to any Member State. Many EU Member States (especially their statistical offices) already publish data in Linked Data format. These services are usually available on national Web portals, while the metadata is harvested at the European level, e.g. by Publicdata.eu. Additionally, the European Commission already maintains the Open Data Portal as a metadata catalogue available as Linked Data; see http://open-data.europa.eu/en/linked-data.

Contact info

Valentina Janev, Institute Mihajlo Pupin, valentina.janev@institutepupin.com

Related Best Practices