Glossary

From Data on the Web Best Practices

Remark: We could re-use many of the definitions available in http://www.alliancepermanentaccess.org/index.php/consultancy/dpglossary

Dataset

DCAT definition: a dataset in DCAT is defined as a "collection of data, published or curated by a single agent, and available for access or download in one or more formats". A dataset does not have to be available as a downloadable file.
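As a rough illustration of this definition, the sketch below models one dataset offered in more than one format. The class and field names are hypothetical and only loosely modeled on DCAT's Dataset and Distribution classes; the titles and example.org URLs are invented for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Distribution:
    """One concrete form in which a dataset is offered (hypothetical model)."""
    media_type: str                      # e.g. "text/csv"
    access_url: str                      # landing page or API endpoint
    download_url: Optional[str] = None   # None when there is no downloadable file


@dataclass
class Dataset:
    """A collection of data published by a single agent (hypothetical model)."""
    title: str
    publisher: str
    distributions: List[Distribution] = field(default_factory=list)

    def formats(self) -> List[str]:
        # Distinct media types in which the dataset is available, sorted.
        return sorted({d.media_type for d in self.distributions})

    def is_downloadable(self) -> bool:
        # True if at least one distribution offers a direct file download.
        return any(d.download_url is not None for d in self.distributions)


# Invented example: the same dataset as a CSV file and behind a JSON API.
bus_stops = Dataset(
    title="Bus stops",
    publisher="Example Transit Agency",
    distributions=[
        Distribution("text/csv", "https://example.org/bus-stops",
                     "https://example.org/bus-stops.csv"),
        Distribution("application/json", "https://example.org/api/bus-stops"),
    ],
)

# A dataset that is reachable only through an API is still a dataset.
api_only = Dataset(
    title="Live departures",
    publisher="Example Transit Agency",
    distributions=[
        Distribution("application/json", "https://example.org/api/departures"),
    ],
)
```

Here `api_only.is_downloadable()` is false, which matches the last sentence of the definition: the dataset exists even though no downloadable file does.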

Data representation

DH Curation Guide: By "data representation" we mean any convention for the arrangement of symbols in such a way as to enable information to be encoded by a data producer and later decoded by data consumers.

Data format

DH Curation Guide: A specific convention for data representation, i.e., the way that information is encoded and stored for use in a computer system, possibly constrained by a formal data type or set of standards.
Examples of data formats: CSV, XML, RDF and JSON.
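To make the link between data representation and data format concrete, the following minimal sketch encodes the same record in two of the formats listed above (CSV and JSON) and decodes each back to the same information. The record and the helper names are invented for illustration; only Python's standard library is used.

```python
import csv
import io
import json

# Hypothetical record, kept as strings so both formats round-trip identically.
record = {"city": "Exampleville", "population": "12345"}


def to_csv(row):
    """Encode one record as CSV text: a header line plus one data line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(row), lineterminator="\n")
    writer.writeheader()
    writer.writerow(row)
    return buf.getvalue()


def from_csv(text):
    """Decode the first data row of CSV text back into a dict."""
    return dict(next(csv.DictReader(io.StringIO(text))))


# Producer side: same information, two different conventions.
csv_text = to_csv(record)
json_text = json.dumps(record)

# Consumer side: each convention is decoded with the matching parser.
decoded_from_csv = from_csv(csv_text)
decoded_from_json = json.loads(json_text)
```

The two encoded strings differ, but both decode to the original record, which is the sense in which a format is one specific convention for representing the data.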

Data Preservation

APA defines preservation as "The processes and operations in ensuring the technical and intellectual survival of objects through time". This is part of a data management plan focusing on preservation planning and metadata. Whether it is worthwhile to put effort into preservation depends on the (future) value of the objects, the resources available, and the opinion of the stakeholders (the designated community).

Data Archiving

Data archiving is the set of practices around the storage and monitoring of the state of digital material over the years. These tasks are the responsibility of a Trusted Digital Repository (TDR), sometimes also referred to as a Long-Term Archive Service (LTA). Such services often follow the Open Archival Information System (OAIS) reference model, which defines the archival process in terms of ingest, monitoring and re-use of data.

File Format

Wikipedia definition: A file format is a standard way that information is encoded for storage in a computer file. It specifies how bits are used to encode information in a digital storage medium. File formats may be either proprietary or free and may be either unpublished or open.
Examples of file formats: txt, pdf, ps, avi, gif and jpg.

Machine Readable Data

Extracted from the Linked Data Glossary definition: Data formats that may be readily parsed by computer programs without access to proprietary libraries. For example, CSV, TSV and RDF formats are machine readable, but PDF and Microsoft Excel are not.
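A small sketch of what "readily parsed" can mean in practice: the TSV text below is turned into records using only Python's standard library, with no proprietary or third-party code. The sample content and the function name are invented for illustration.

```python
import csv
import io

# Hypothetical tab-separated content: one header line, two data lines.
tsv_text = "name\tformat\nstations\tCSV\nroutes\tRDF\n"


def parse_tsv(text):
    """Parse tab-separated text into a list of dicts keyed by the header row."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [dict(row) for row in reader]


rows = parse_tsv(tsv_text)
```

Each element of `rows` is a plain dictionary, so a program can work with the values directly without any manual interpretation of the file layout.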

Vocabulary

Linked Data Glossary definition: A collection of "terms" for a particular purpose. Vocabularies can range from simple, such as the widely used RDF Schema, FOAF and Dublin Core Metadata Element Set, to complex vocabularies with thousands of terms, such as those used in healthcare to describe symptoms, diseases and treatments. Vocabularies play a very important role in Linked Data, specifically to help with data integration. The use of this term overlaps with Ontology.

Structured data

Bernadette's definition: structured data refers to data that conforms to a fixed schema. Relational databases and spreadsheets are examples of structured data.
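The idea of a fixed schema can be sketched with a relational table. In the minimal example below, which uses Python's built-in sqlite3 module, the table and column names are hypothetical; rows that match the declared columns are accepted, while rows that omit a required column or add an undeclared one are rejected.

```python
import sqlite3

# In-memory database with one table whose columns form the fixed schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE station ("
    "id INTEGER PRIMARY KEY, "
    "name TEXT NOT NULL, "
    "line_count INTEGER NOT NULL)"
)


def try_insert(connection, sql, params):
    """Run an INSERT and report whether the schema accepted the row."""
    try:
        connection.execute(sql, params)
        return True
    except sqlite3.Error:
        # Raised for NOT NULL violations and for unknown column names alike.
        return False


# Conforms to the schema: accepted.
ok = try_insert(
    conn,
    "INSERT INTO station (id, name, line_count) VALUES (?, ?, ?)",
    (1, "Central", 4),
)

# Omits the required line_count column: rejected.
missing_column = try_insert(
    conn,
    "INSERT INTO station (id, name) VALUES (?, ?)",
    (2, "North"),
)

# Adds a column that the schema does not declare: rejected.
extra_column = try_insert(
    conn,
    "INSERT INTO station (id, name, line_count, color) VALUES (?, ?, ?, ?)",
    (3, "South", 2, "red"),
)

row_count = conn.execute("SELECT COUNT(*) FROM station").fetchone()[0]
```

Only the conforming row ends up in the table, which is what distinguishes structured data from content that carries no enforced shape.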