Data on the Web Life Cycle

From Data on the Web Best Practices

Data on the Web

Data from diverse domains (e.g. governmental data, cultural heritage, scientific data, cross-domain data) that is available on the Web in a machine-processable format.

Data on the Web Life Cycle

A set of tasks or activities that take place during the process of publishing and using data on the Web.
The process may pass through some number of iterations and may be represented using a spiral model.

Figure

An Overview of the Data on the Web Life Cycle

Data collection

Source selection: Identify data sources that may offer relevant data (e.g. relational databases, XML files, Excel documents)

Data Generation

1st iteration: Dataset project
- Define the schema of the target dataset (structural metadata)
- Choose standard vocabularies
  - Data (e.g. FOAF, DC, SKOS, Data Cube)
  - Dataset (e.g. DCAT, PROV, VoID, Data Quality Vocabulary)
  - Data Catalog (e.g. DCAT)
- Choose data formats (machine processable data)
- Create new vocabularies
- …
2nd iteration: ETL process (Extract, Transform and Load)
- Extract data from the selected data sources, transform it according to the decisions made during the dataset project, and load it into the target dataset
- Metadata generation
  - Produce (manually or automatically) structured metadata according to the metadata standards defined during the dataset project
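
The ETL step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the source is a small CSV text, the target schema (`label`, `population`) is hypothetical, and the "target dataset" is a JSON string rather than a real data store. The sample rows are invented for the example.

```python
import csv
import io
import json

# Hypothetical source data; a real source could be a database, XML file or spreadsheet.
SOURCE_CSV = "name,population\nAlpha, 1200 \nBeta,850\n"


def extract(text):
    """Extract: read the rows of the selected source (CSV text in this sketch)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: map source columns onto the (assumed) target schema and clean values."""
    return [
        {"label": row["name"].strip(), "population": int(row["population"].strip())}
        for row in rows
    ]


def load(records):
    """Load: serialize the records into the target dataset (a JSON string here)."""
    return json.dumps(records, indent=2)


target = load(transform(extract(SOURCE_CSV)))
print(target)
```

Keeping the three steps as separate functions makes it easier to swap the source or the target format without touching the transformation rules decided during the dataset project.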

Data Distribution

1st iteration: Plan
- URIs project: Design URIs that will persist and continue to mean the same thing in the long term
- Choose one or more solutions for publishing the data: data catalogue, API, SPARQL endpoint, dataset dump, …
2nd iteration: Publish
- Publish data and metadata: Make data and metadata available on the Web
3rd iteration: Update
- Update data: Make a new version of the dataset available on the Web
- Update metadata: Make a new version of the metadata available on the Web
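
One possible way to combine persistent URIs with dataset updates is to keep a stable URI for the dataset and mint an additional URI for each published version. The sketch below assumes a hypothetical publisher domain (`data.example.org`) and a hypothetical path pattern; neither is prescribed by this page.

```python
from urllib.parse import quote

# Hypothetical base URI of the publisher; replace with a domain you control.
BASE = "https://data.example.org"


def dataset_uri(slug):
    """Stable URI that identifies the dataset regardless of how often it is updated."""
    return f"{BASE}/dataset/{quote(slug)}"


def version_uri(slug, version):
    """URI for one specific published version of the dataset."""
    return f"{dataset_uri(slug)}/version/{quote(version)}"


# Illustrative version labels; each update adds a new snapshot URI.
versions = ["2015-01", "2015-02"]
latest = dataset_uri("bus-stops")
snapshots = [version_uri("bus-stops", v) for v in versions]

print(latest)
for uri in snapshots:
    print(uri)
```

With this pattern the dataset URI never changes when a new version is published, while earlier versions remain addressable through their own URIs.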

Data usage

Explore data: Bring important aspects of the data into focus for further analysis
Analyze data: Develop applications, build visualizations, …
Give feedback: Provide useful information about the dataset (e.g. dataset relevance, data quality, …)
Provide data usage descriptions

Data on the Web Life Cycle and Best Practices

Best practices may be applied during the whole process of publishing and using data on the Web.
Best practices may be defined according to the activities performed in each one of the quadrants (or tasks).
For each best practice, guidance on how to implement it must be provided.
Some best practices may be implemented in more than one way.

Figure

Examples of Best Practices

Best practices - Data collection
- Have a catalogue to describe potential data sources, i.e., data sources that could provide data to be published on the Web
- …

Best practices - Data Generation
- Document the process of data generation
- Use standard vocabularies to describe data
- Use standard vocabularies to describe datasets and data catalogues (e.g. DCAT)
- Provide stable URIs
- Provide data in machine-processable formats
- Provide metadata to describe data
- …
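
As an illustration of describing a dataset with a standard vocabulary, the sketch below builds a small DCAT description as JSON-LD using only the Python standard library. The helper `describe_dataset`, the example title and the `data.example.org` URL are hypothetical; the DCAT and Dublin Core terms are real, but which properties to include is a choice made for this example.

```python
import json


def describe_dataset(title, description, download_url, license_url):
    """Return a JSON-LD dictionary describing one dataset with one CSV distribution."""
    return {
        "@context": {
            "dcat": "http://www.w3.org/ns/dcat#",
            "dct": "http://purl.org/dc/terms/",
        },
        "@type": "dcat:Dataset",
        "dct:title": title,
        "dct:description": description,
        "dcat:distribution": [
            {
                "@type": "dcat:Distribution",
                "dcat:downloadURL": {"@id": download_url},
                "dcat:mediaType": {
                    "@id": "https://www.iana.org/assignments/media-types/text/csv"
                },
                "dct:license": {"@id": license_url},
            }
        ],
    }


# Hypothetical dataset; all values below are illustrative only.
metadata = describe_dataset(
    title="Bus stops",
    description="Locations of bus stops (illustrative example).",
    download_url="https://data.example.org/dataset/bus-stops.csv",
    license_url="https://creativecommons.org/licenses/by/4.0/",
)
print(json.dumps(metadata, indent=2))
```

The same description could equally be written in Turtle or RDF/XML; JSON-LD is used here only because it needs no third-party library.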

Best Practices - Data Distribution
- Use standard ways to distribute data (e.g. data catalogues and APIs)
- Provide details about data access
- Provide details about data licence
- Provide details about dataset provenance and quality
- Provide a schedule of dataset updates
- Keep a dataset history
- Provide ways to collect feedback from data consumers
- Announce the publication of new datasets or new versions of existing datasets
- …
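
Keeping a dataset history and announcing new versions could, for instance, be supported by a simple version log. The classes and fields below (`Version`, `DatasetHistory`, `note`) are hypothetical names chosen for this sketch, not part of any standard.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Version:
    label: str    # version identifier, e.g. "1.1"
    issued: date  # publication date of this version
    note: str     # short description of what changed


@dataclass
class DatasetHistory:
    title: str
    versions: list = field(default_factory=list)

    def publish(self, label, issued, note):
        """Record a new version; versions must be added in chronological order."""
        if self.versions and issued < self.versions[-1].issued:
            raise ValueError("versions must be published in chronological order")
        self.versions.append(Version(label, issued, note))

    def announcement(self):
        """Build a short announcement text for the most recent version."""
        if not self.versions:
            return f"{self.title}: no version published yet"
        v = self.versions[-1]
        return f"{self.title}: version {v.label} published on {v.issued.isoformat()} ({v.note})"


# Illustrative history with invented dates and notes.
history = DatasetHistory("Bus stops")
history.publish("1.0", date(2015, 1, 10), "initial release")
history.publish("1.1", date(2015, 2, 10), "added new stops")
print(history.announcement())
```

The full list in `history.versions` serves as the dataset history, while `announcement()` gives the text that could be posted when a new version goes online.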

Best Practices - Data usage
- Provide feedback about datasets
- Provide descriptions about the usage of the dataset
- …

Retrieved from "https://www.w3.org/2013/dwbp/wiki/index.php?title=Data_on_the_Web_Life_Cycle&oldid=480"