Use Cases Requirements


Each requirement is an imperative sentence starting with a verb, describing an individual task that contributes to solving the stated problem.

  • Define a way to describe/specify the packaging of files in a Distribution.
  • Indicate dataset authors.
  • Describe data lineage.
  • Give potential data consumers information on how to use the data ("usage notes").
  • Link to scientific publications about a dataset.
  • Link to input data (i.e., data used to create a dataset).
  • Link to software used to produce the data.
  • Specify the basic mandatory information for data citation - i.e., dataset authors, title, publication year, publisher, persistent identifier (see the citation sketch after this list).
  • Specify additional, although not mandatory, information relevant for data citation - e.g., dataset contributors and funding reference.
  • Encode identifiers as dereferenceable HTTP URIs
  • Model the identifier type
  • Model primary and alternative identifiers
  • Specify data lineage in a human-readable way.
  • Specify data lineage in a machine-readable way.
  • Model different types of agent roles
  • Provide patterns for a consistent modeling of the different aspects of data quality
  • Specify dataset precision
  • Specify dataset accuracy
  • Express conformance with a given quality standard / benchmark (see the quality sketch after this list)
  • Express data quality conformance test results
  • Specify access restrictions at both dataset and distribution level
  • Specify why a dataset or distribution is not publicly accessible
  • Denote distributions as pointing to a service / API, and not directly to the actual data.
  • Provide a human- and machine-readable description of the API / service, and its interface.
  • Availability of modeling patterns for qualified forms
  • Map qualified and non-qualified forms
  • Be able to specify the dataset "type" (data, documents, software)
  • Model resources that are not datasets (services, events)
  • DCAT deliverable - requirements for a DCAT extension or, in case DCAT 1.1 entirely relies on DQV for addressing quality documentation, for the next round of the DQV specification.
  • Update the Data Quality Vocabulary with respect to updates in the W3C Permissions and Obligations Expression vocabulary, as per https://github.com/w3c/dxwg/issues/5
  • Specify in which part of the dataset a quality issue is present, as raised by Amrapali Zaveri and Anisa Rula (see DWBP mail; related to use case ID18 - Modeling conformance test results on data quality)
  • Add attributes for the severity of a quality problem, as per discussion with Amrapali Zaveri and Anisa Rula (see DWBP mail)
  • Discuss adding attributes for the 'provenance' of a quality measurement in a part of a dataset, as per discussion with Amrapali Zaveri (see mail on the DWBP mailing list)
  • Elaborate on Parameters for Quality Metrics (DWBP-Issue-223)
  • Multilingual Translation for DQV
  • Should we rename QualityCertificate? The current name is a little misleading: it suggests the resource is a quality certificate itself rather than an annotation pointing to a quality certificate.
  • AP Guidelines - Concrete case study
  • Guidance on how to specify integrity and cardinality constraints when defining an AP
  • Concrete case study: DQV wants to have more integrity conditions (in SHACL?) to enhance interoperability between DQV implementations.
  • Guidance on how to express alignment/compatibility between profiles? (somehow related to the notions implied in the use cases ID16 - Quality modeling patterns and ID21 - Guidance on the use of qualified forms)
  • Concrete case study: alignment between DQV and the quality features of the HCLS dataset profile (DWBP-Issue-221)
  • Concrete case study: alignment between DQV and other quality vocabularies - and try to have these vocabularies use DQV patterns instead of reinventing the wheel (Radulovic et al., Färber et al., sister ontologies of daQ).
  • Guidance on how DQV can work with quality statistics vocabulary shall be provided with future versions of the DQV documentation.
  • Guidance on how to publish an Application Profile
  • Suggestions about when intermediate landing pages are appropriate and when they are not - see the issues raised in the mail on Vocabulary Content Negotiation
  • Allow for explicit control of Dataset publication at dedicated Catalogs
  • Missing aspects of dataset descriptions are identified (see reference listing)
  • Corresponding extension points (ports) are defined
  • Specify start / end date of temporal coverage
  • Specify the reference system(s) used in a dataset
  • Specify spatial coverage with geometries
  • Negotiate a metadata profile via HTTP (see the negotiation sketch after this list)
  • Ability to represent the different relationships between datasets (see the relationships sketch after this list), including:
  • ability to represent the relationships between different versions of a dataset
  • ability to represent collection of datasets, to describe their inclusion criteria and to define the 'hasPart'/'partOf' relationships
  • ability to represent derivation, e.g. processed data that is derived from raw data
  • Support the summarization/characterization of a dataset through summary statistics and similar metrics, possibly providing a pattern for people to provide information about statistics of a dataset
  • Revise definition of Distribution, making it clearer what a distribution is and what it is not, in order to provide better guidance for data publishers.
  • Clarify the relationships between datasets and zero, one or multiple catalogues. In particular, consider approaches to harvesting and aggregation, where descriptions of datasets are copied from one catalogue to another, contemplating how relationships between the descriptions can be maintained and how identifiers can be assigned that allow linking back to the source descriptions.
  • Clarify how these approaches are similar or different and how they interact, for example in the form of guidelines on how to create a DCAT description of a VoID or Data Cube dataset.
  • Each application profile needs to be documented, preferably by showing/reusing what is common across profiles
  • Machine-readable specifications of application profiles need to be easily publishable, and optimize re-use of existing specifications.
  • Application profiles need a rich expression for the validation of metadata
  • Data publishers (data providers, intermediary aggregators, Europeana and DPLA) need to be able to indicate the profile to which a certain piece of data (a record describing an individual cultural object, or a whole dataset) belongs.
  • Data publishers need to be able to serve different profiles of the same data via the same data publication channel (Web API)
  • Data consumers (intermediary aggregators, Europeana and DPLA, data consumers) need to be able to specify the profile they are interested in
  • Europeana needs to be able to accept data described using EDM extensions that are compatible with its EDM-external profile, whether it does not ingest this data entirely (i.e. some elements will be left out as they are useless for the main Europeana Collections portal) or it does ingest it (e.g. for Thematic Collections portals or domain-specific applications that Europeana or third parties would develop)
  • TBD
  • (at least) a generic recommendation/guideline on how to proceed with this problem
  • (if possible) a start with a joint W3C-OGC-INSPIRE-JRC effort to harmonize standards regarding dataset descriptions
  • Define schema.org equivalents for DCAT properties to support entailment of schema.org-compliant profiles of DCAT records.
  • Profiles must support declaration of vocabulary constraints
  • It must be possible to indicate, in the profiled vocabulary for a set of data, which guidance rules were applied.
  • Ability to describe the standards to which the dataset conforms.
  • Define a means (templated URIs, RDF models, etc.) to capture the identity of a DCAT data(sub)set
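
The sketches below are non-normative illustrations of a few of the requirements above; every URI, name and value in them is a hypothetical placeholder, not part of any specification. The first sketch shows how the basic citation information (authors, title, publication year, publisher, persistent identifier) and identifiers encoded as dereferenceable HTTP URIs might be expressed with existing DCAT/DCTERMS terms, here built with Python's rdflib.

```python
# Non-normative sketch: basic citation metadata for a dcat:Dataset.
# All URIs and literal values below are hypothetical examples.
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import DCAT, DCTERMS, RDF, XSD

g = Graph()
g.bind("dcat", DCAT)
g.bind("dct", DCTERMS)

ds = URIRef("https://example.org/dataset/climate-obs")  # hypothetical dataset URI

g.add((ds, RDF.type, DCAT.Dataset))
g.add((ds, DCTERMS.title, Literal("Climate observations 2010-2017", lang="en")))
# Dataset author, encoded as a dereferenceable HTTP URI (e.g. an ORCID iD)
g.add((ds, DCTERMS.creator, URIRef("https://orcid.org/0000-0000-0000-0000")))
g.add((ds, DCTERMS.publisher, URIRef("https://example.org/org/acme-research")))
g.add((ds, DCTERMS.issued, Literal("2017-06-01", datatype=XSD.date)))  # publication date
# Persistent identifier used for citation
g.add((ds, DCTERMS.identifier, Literal("https://doi.org/10.9999/example")))

print(g.serialize(format="turtle"))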
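
The quality sketch below assumes DQV terms are reused as-is: it states conformance with a quality standard / benchmark via dct:conformsTo and records a machine-readable quality test result as a dqv:QualityMeasurement. The standard and metric URIs are placeholders.

```python
# Non-normative sketch: conformance statement and a quality measurement
# using DQV; the standard and metric URIs are hypothetical placeholders.
from rdflib import BNode, Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS, RDF, XSD

DQV = Namespace("http://www.w3.org/ns/dqv#")

g = Graph()
g.bind("dqv", DQV)
g.bind("dct", DCTERMS)

ds = URIRef("https://example.org/dataset/climate-obs")                 # hypothetical
standard = URIRef("https://example.org/standards/quality-benchmark")   # hypothetical

# Express conformance with a given quality standard / benchmark
g.add((ds, DCTERMS.conformsTo, standard))

# Express a quality test result as a dqv:QualityMeasurement
measurement = BNode()
g.add((ds, DQV.hasQualityMeasurement, measurement))
g.add((measurement, RDF.type, DQV.QualityMeasurement))
g.add((measurement, DQV.isMeasurementOf, URIRef("https://example.org/metric/completeness")))
g.add((measurement, DQV.value, Literal("0.98", datatype=XSD.double)))

print(g.serialize(format="turtle"))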
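
The relationships sketch below illustrates one possible modeling of dataset-to-dataset relationships (derivation, collection membership, versioning) by reusing DCTERMS and PROV-O terms; it is not the only option, and all resources are hypothetical.

```python
# Non-normative sketch: relationships between datasets (derivation,
# collection membership, versioning) with DCTERMS and PROV-O terms.
from rdflib import Graph, Namespace, URIRef
from rdflib.namespace import DCTERMS

PROV = Namespace("http://www.w3.org/ns/prov#")

g = Graph()
g.bind("dct", DCTERMS)
g.bind("prov", PROV)

raw = URIRef("https://example.org/dataset/obs-raw")                 # hypothetical
processed = URIRef("https://example.org/dataset/obs-processed")     # hypothetical
collection = URIRef("https://example.org/dataset/obs-collection")   # hypothetical
edition_2017 = URIRef("https://example.org/dataset/obs-processed/2017")

# Derivation: processed data derived from raw data
g.add((processed, PROV.wasDerivedFrom, raw))
# Collection membership: 'hasPart' / 'partOf' relationships
g.add((collection, DCTERMS.hasPart, raw))
g.add((collection, DCTERMS.hasPart, processed))
# Versioning: a dated edition is a version of the processed dataset
g.add((edition_2017, DCTERMS.isVersionOf, processed))

print(g.serialize(format="turtle"))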
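
Finally, the negotiation sketch shows what requesting a metadata profile via HTTP could look like. It assumes a server that understands an Accept-Profile request header and answers with a Content-Profile header - one of the mechanisms under discussion for profile negotiation, not an agreed solution; the endpoint and profile URIs are hypothetical.

```python
# Non-normative sketch: request a dataset description in a specific
# application profile, assuming the server supports Accept-Profile /
# Content-Profile headers. Endpoint and profile URIs are hypothetical.
import urllib.request

req = urllib.request.Request(
    "https://example.org/catalog/dataset/climate-obs",
    headers={
        "Accept": "text/turtle",
        # Ask for the description that conforms to a given application profile
        "Accept-Profile": "<https://example.org/profile/dcat-ap>",
    },
)

with urllib.request.urlopen(req) as resp:
    # The server is assumed to indicate the served profile in Content-Profile
    print("Served profile:", resp.headers.get("Content-Profile"))
    print(resp.read().decode("utf-8")[:500])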