Copyright © 2014 W3C® (MIT, ERCIM, Keio, Beihang), All Rights Reserved. W3C liability, trademark and document use rules apply.
This document lists some use cases, compiled by the Data on the Web Best Practices Working Group, that represent scenarios of how data is commonly published on the Web and how it is used. This document also provides a set of requirements derived from these use cases that have been used to guide the development of the set of Data on the Web Best Practices and the development of two new vocabularies: Quality and Granularity Description Vocabulary and Data Usage Description Vocabulary.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This document was published by the Data on the Web Best Practices Working Group as a First Public Working Draft. If you wish to make comments regarding this document, please send them to public-dwbp-comments@w3.org (subscribe, archives). All comments are welcome.
Publication as a First Public Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
This section is non-normative.
There is a growing interest in publishing and consuming data on the Web. Both government and non-government organizations already make a variety of data available on the Web, covering domains such as education, the economy, security, cultural heritage and science. Developers and journalists, in turn, manipulate this data to create visualizations and to perform data analysis. However, despite these experiences, several important issues need to be addressed in order to meet the requirements of both data publishers and data consumers.
To address these issues, the Data on the Web Best Practices Working Group seeks to provide guidance to publishers that will improve consistency in the way data is managed, thus promoting the reuse of data. The guidance will take two forms: a set of best practices that apply to multiple technologies, and vocabularies that are currently missing but needed to support the data ecosystem on the Web.
In order to determine the scope of the best practices and the requirements for the new vocabularies, a set of use cases has been compiled. Each use case provides a narrative describing an experience of publishing and using Data on the Web. The use cases cover different domains and illustrate some of the main challenges faced by data publishers and data consumers. A set of requirements, used to guide the development of the set of best practices as well as the development of the vocabularies, has been derived from the compiled use cases.
This is a First Public Working Draft and shows the working group's current thinking and direction. Comments and new use cases are particularly welcome via public-dwbp-comments@w3.org (subscribe, archives).
There are many outstanding issues associated with the use cases presented here that are being addressed. Where those issues are related to a specific use case or requirement, they are highlighted in the body of the document below.
A use case describes a scenario that illustrates an experience of publishing and using Data on the Web. The information gathered from the use cases should help identify the best practices that will guide the publishing and usage of Data on the Web. In general, a best practice will be described by at least a statement and a "how to do it" section, i.e., a discussion of techniques and suggestions on how to implement it. Use case descriptions show some of the main challenges faced by publishers or developers. Information about challenges will help identify areas where Best Practices are necessary. Based on the challenges, a set of requirements was defined in such a way that each requirement motivates the creation of one or more best practices.
(Contributed by Deirdre Lee)
Buildingeye.com makes building and planning information easier to find and understand by mapping what's happening in your city. In Ireland, local authorities handle planning applications and usually provide some customized views of the data (PDFs, maps, etc.) on their own websites. However, there isn't an easy way to get a nationwide view of the data. BuildingEye, an independent SME, built http://mypp.ie/ to achieve this. However, since the local authorities did not have Open Data portals, BuildingEye had to ask each local authority directly for its data. It was granted access by some authorities, but not all. The data it did receive was in different formats and of varying quality and detail. BuildingEye harmonized this data for its own system. However, if another SME wanted to use this data, it would have to go through the same process and again ask each local authority for the data.
Elements:
Challenges:
Potential Requirements:
Requires: MetadataAvailable , FormatMachineRead , FormatStandardized , FormatOpen , LicenseAvailable and AccessBulk
R-FormatMachineRead seems to be more specific than the requirement from the two use cases listed as motivation?
(Contributed by Carlos Iglesias)
The IFAD Land Portal platform has been completely rebuilt as an Open Data collaborative platform for the Land Governance community. Among the new features, the Land Portal will provide access to more than 100 comprehensive and in-depth indicators from more than 25 different sources on land governance issues for more than 200 countries around the world, as well as a repository of land-related content and documentation. Thanks to the new platform, people can (1) curate and incorporate new data and metadata by means of different data importers, making use of the underlying common data model; (2) search, explore and compare the data across countries and indicators; and (3) consume and reuse the data by different means (i.e. raw data download at the data catalog; linked data and SPARQL endpoint at the RDF triplestore; RESTful API; and built-in graphic visualization framework).
Elements:
Challenges:
Potential Requirements:
Requires: MetadataMachineRead , GranularityLevels , FormatMachineRead , FormatStandardized , FormatLocalize , VocabReference , VocabVersion , LicenseInteroperable , LicenseStandardized , ProvAvailable , AccessBulk , AccessRealTime , Persistent , QualityCompleteness and QualityMetrics
Requires: R-MetadataStandardized , MetadataInteroperable and GranularityLevels
(Contributed by Bernadette Lóscio )
Recife is a city situated in the Northeast of Brazil, famous for being one of Brazil's biggest tech hubs. Recife is also one of the first Brazilian cities to release data generated by public sector organizations for public use as Open Data. The Open Data Portal Recife was created to offer access to a repository of governmental machine-readable data about several domains, including finances, health, education and tourism. Data is available in CSV and GeoJSON format and every dataset has a metadata description, i.e. descriptions of the data, that helps in the understanding and usage of the data. However, the metadata is not described using standard vocabularies or taxonomies. In general, data is created in a static way: data from relational databases is exported in CSV format and then published in the data catalog. Currently, they are working on generating data dynamically from the contents of relational databases, so that data will be available as soon as it is created. The main phases of the development of this initiative were: educating people with appropriate knowledge concerning Open Data; identifying the relevant data, i.e. the sources of data that potential consumers could find useful; extracting and transforming data from the original data sources to the open data format; configuring and installing the open data catalogue tool; and publishing the data and releasing the portal.
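The static export step described above (relational tables dumped to CSV before publication) can be sketched as follows. This is a minimal illustration, not the portal's actual pipeline: the in-memory SQLite database, the `schools` table and its rows are all hypothetical stand-ins.

```python
import csv
import io
import sqlite3


def export_table_to_csv(conn, table):
    """Return the contents of `table` as CSV text, header row first."""
    # NOTE: `table` is interpolated into the SQL text, so it must come from
    # trusted configuration, never from user input.
    cursor = conn.execute(f"SELECT * FROM {table}")
    header = [col[0] for col in cursor.description]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(cursor)  # each fetched row becomes one CSV line
    return out.getvalue()


# Hypothetical source data standing in for a municipal database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE schools (id INTEGER, name TEXT, district TEXT)")
conn.executemany(
    "INSERT INTO schools VALUES (?, ?, ?)",
    [(1, "Escola A", "Boa Viagem"), (2, "Escola B", "Casa Forte")],
)
csv_text = export_table_to_csv(conn, "schools")
print(csv_text)
```

Running the same function on a schedule, or on every database change, would be one way to approximate the dynamic generation the portal is working towards.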
Elements:
Challenges:
Requires: MetadataMachineRead , MetadataStandardized , MetadataDocum , VocabReference , VocabDocum , VocabOpen , SelectHighValue , SelectDemand , QualityCompleteness , DynamicGeneration , AutomaticUpdate and QualityComparable
Are R-SelectHighValue and R-SelectHighDemand workable requirements?
The difference between R-DynamicGeneration and R-AutomaticUpdate is not clear.
(Contributed by Yasodara)
Data.gov.br is the open data portal of Brazil's Federal Government. The site was built collaboratively by a community network led by three technicians from the Ministry of Planning. They managed the WG3 of "INDA", the "National Infrastructure for Open Data". CKAN was chosen because it is Free Software and offers a more independent solution for hosting the Federal Government's data catalog on the Internet.
Elements:
Challenges:
Requires: R-VocabReference , R-LicenseAvailable , R-LicenseStandardized and R-QualityOpinions
(Contributed by Ghislain Atemezing)
ISO GEO is a company managing catalog records of geographic information in XML, conforming to ISO 19139. (ISO 19139 is a French adaptation of ISO 19115.) An excerpt is here. They export thousands of catalogs like that today, but they need to manage them better. In their platform, they store the information in a more conventional manner, and use this standard to export datasets compliant with INSPIRE interoperability, or via the CSW protocol. Sometimes, they have to enrich their own metadata records with others produced by tools like GeoSource and accessed through an SDI (Spatial Data Infrastructure). A sample containing 402 metadata records in ISO 19139 is in public consultation here. They want to be able to integrate all the different implementations of ISO 19139 in different tools into a single framework, to better understand the thousands of metadata records they use in their day-to-day business. The types of information recorded in each file (see example here) are the following: contact info (metadata) [data issued]; spatial representation; reference system info [code space]; spatial resolution; geographic extension of the data; file distribution; data quality; process step; etc.
Challenges:
(Contributed by Christophe Guéret)
The Netherlands has a set of registers it is looking at opening and exposing as Linked (Open) Data in the context of the "PiLOD" project community of expertise. The registers contain information about buildings, people, businesses and other individuals that public bodies may want to refer to in their daily activities. One of them is, for instance, the public tax service ("BelastingDienst"), which regularly pulls data from several registers, stores this data in a big Oracle instance and curates it. This costly and time-consuming process could be optimized by providing on-demand access to up-to-date descriptions provided by the register owners.
Challenges:
In terms of challenges, linking is for once not much of an issue, as registers already cross-reference unique identifiers (see also http://www.wikixl.nl/wiki/gemma/index.php/Ontsluiting_basisgegevens ). A URI scheme with predictable URIs is being considered for implementation. Actual challenges include:
Requires: VocabReference , R-SensitivePrivacy , UniqueIdentifier , MultipleRepresentations and R-CoreRegister
(Contributed by Eric Stephan)
This use case describes a data management facility being constructed to support scientific offshore wind energy research for the U.S. Department of Energy’s Office of Energy Efficiency and Renewable Energy (EERE) Wind and Water Power Program. The Reference Facility for Renewable Energy (RFORE) project is responsible for collecting wind characterization data from remote sensing and in situ instruments located on an offshore platform. This raw data is collected by the Data Management Facility (DMF) and processed into a standardized NetCDF format. Both the raw measurements and processed data are archived in the PNNL Institutional Computing (PIC) petascale computing facility. The DMF will record all processing history, quality assurance work, problem reporting, and maintenance activities for both instrumentation and data. All datasets, instrumentation, and activities are cataloged, providing a seamless knowledge representation of the scientific study. The DMF catalog relies on linked open vocabularies and domain vocabularies to make the study data searchable. Scientists will be able to use the catalog for faceted browsing, ad-hoc searches and query by example. For accessing individual datasets, a REST GET interface to the archive will be provided.
Challenges:
For accessing numerous datasets, scientists will access the archive directly using other protocols such as sftp, rsync and scp, and access techniques such as http://www.psc.edu/index.php/hpn-ssh
Requires: FormatStandardized , VocabReference , VocabOpen and AccessRealTime
(Contributed by Christophe Guéret)
Taking the concrete example of the digital archive "DANS", digital archives have so far been concerned with the preservation of what could be defined as "frozen" datasets. A frozen dataset is a finished, self-contained set of data that does not evolve after it has been constituted. The goal of the preserving institution is to ensure this dataset remains available and readable for as many years as possible. This can, for example, concern an audio recording, a digitized image, e-books or database dumps. Consumers of the data are expected to look up specific content based on its associated persistent identifier, download it from the archive and use it. Now comes the question of the preservation of Linked Open Data. In contrast to "frozen" datasets, linked data can be qualified as "live" data. The resources it contains are part of a larger entity to which third parties contribute; one of the design principles indicates that other data producers and consumers should be able to point to data. When LD publishers stop offering their data (e.g. at the end of a project), taking the LD off-line as a dump and putting it in an archive effectively turns it into a frozen dataset, like SQL dumps and other kinds of databases. The question then arises as to what extent this is an issue...
Challenges: The archive has to consider whether or not serving dereferencing for resources found in preserved datasets is required, and whether or not to provide a SPARQL endpoint. If data consumers and publishers are fine with having RDF data dumps downloaded from the archive prior to their usage, just like any other digital item so far, the technical challenges could be limited to handling the size of the dumps and taking care of serialisation evolution over time (e.g. from N-Triples to TriG, or from RDF/XML to HDT) as the preference for these formats evolves. Turning a live dataset into a frozen dump also raises the question of scope. Considering that LD items are only part of a much larger graph that gives them meaning through context, the only valid dump would be a complete snapshot of the entire connected component of the Web of Data graph that the target dataset is part of.
Potential Requirements: Decide on the importance of the dereferenceability of resources and the potential implications for domain names and naming of resources. Decide on the scope of the step that will turn a connected sub-graph into an isolated data dump.
Requires: VocabReference , UniqueIdentifier , PersistentIdentification and Archiving
(Contributed by Phil Archer )
On 27 March 2014, the LA Times published a story, "Women earn 83 cents for every $1 men earn in L.A. city government". It was based on an infographic released by LA's City Controller, Ron Galperin. The infographic was based on a dataset published on LA's open data portal, Control Panel LA. That portal uses the Socrata platform, which offers a number of spreadsheet-like tools for examining the data, the ability to download it as CSV, embed it in a Web page and see its metadata.
Positive aspects:
Negative aspects:
Challenges:
Other Data Journalism blogs:
Requires: MetadataAvailable , MetadataStandardized , UniqueIdentifier and Citable
Review R-Citable as a requirement for Data Usage.
(Contributed by AGESIC )
Uruguay's open data site holds 85 datasets containing 114 resources since the first dataset was published in Dec. 2012. The open data initiative prioritizes the “use of data” rather than the “quantity of data”, which is why the catalogue holds 25 applications using dataset resources in some way. It is important for the project to keep the 1/3 ratio between applications and datasets. Most of the resources are CSV and shapefiles; basically we have a 3-star catalogue, and the reason why we can’t go to the next level is the lack of resources (time, human, economic, etc.) at government agencies to implement an open data liberation strategy. So when we are asked about opening data, "keep it simple" is the answer, and CSV is by far the easiest and smartest way to start. Uruguay has an access to public information law but doesn’t have legislation about open data. The open data initiative is led by AGESIC with the support of the open data working group. The OD working group comprises: Intendencia de Montevideo (www.montevideo.gub.uy), INE (www.ine.gub.uy), AGEV (www.agev.opp.gub.uy), FING UDELAR (www.fing.edu.uy) and D.A.T.A. (www.datauy.org).
Elements:
Challenges: Consolidate tool to manage datasets, improve visualizations and transform resources to higher level (4 – 5 stars). Automate publication process using harvesting or similar tools. Alerts or control panel to keep data updated.
Requires: VocabReference , DynamicGeneration and AutomaticUpdate
(Contributed by Mark Harrison (University of Cambridge) & Eric Kauz (GS1) )
Retailers and Manufacturers / Brand Owners are beginning to understand that there can be benefits to openly publishing structured data about products and product offerings on the web as Linked Open Data. Some of the initial benefits may be enhanced search listing results (e.g. Google Rich Snippets) that improve the likelihood of consumers choosing such a product or product offer over an alternative product that lacks the enhanced search results. However, the longer term vision is that an ecosystem of new product-related services can be enabled if such data is available. Many of these will be consumer-facing and might be accessed via smartphones and other mobile devices, to help consumers to find the products and product offers that best match their search criteria and personal preferences or needs - and to alert them if a particular product is incompatible with their dietary preferences or other criteria such as ethical / environmental impact considerations - and to suggest an alternative product that may be a more suitable match. A more complete description of this use case is available.
Elements:
Challenges:
Potential Requirements:
Requires: FormatStandardized , FormatMultiple , ProvAvailable , AccessUptodate , LicenseLiability , PersistentIdentification , Citable , AutomaticUpdate and CoreRegister
(Contributed by Luis Polo )
Tabul.ae is a framework to publish and visually explore data that can be used to deploy powerful and easy-to-exploit open data platforms, thus helping organizations to unleash the potential of their data. The aim is to enable data owners (public organizations) and consumers (citizens and business reusers) to transform the information they manage into added-value knowledge, empowering them to easily create data-centric web applications. These applications are built upon interactive and powerful graphs, and take the shape of interactive charts, dashboards, infographics and reports. Tabulae provides a high degree of assistance in creating these apps and also automates several data visualization tasks (e.g., recognition of geographical entities to automatically generate a map). In addition, the charts and maps are portable outside the platform and can be smartly integrated with any web content, enhancing the reusability of the information.
Elements:
Challenges:
Potential Requirements:
Requires: FormatStandardized , FormatLocalize , VocabReference , VocabVersion , LicenseStandardized , LicenseInteroperable , ProvAvailable , AutomaticUpdate and QualityCompleteness
(Contributed by Yasodara )
This is a data visualization made in 2012 by Vitor Batista, Léo Tartari and Thiago Bueno for a W3C Brazil Office challenge about data from Rio Grande do Sul (a Brazilian region). The data was released in a .zip package; the original format was .csv. The code and the documentation of the project are in its GitHub repository.
Elements:
Positive Aspects: the decision to transform CSV into JSON was based on the need for hierarchical data. CSV only covers tabular data, whereas JSON can cover more complex structures; the fact that a CSV structure can be mapped to XML or JSON was considered a positive point.
Negative Aspects: the data was in CSV format, but it is now (2014) outdated, and no new releases are foreseen. It carries no metadata.
Requires: MetadataAvailable , QualityCompleteness , PersistentIdentification and AutomaticUpdate
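The CSV-to-JSON mapping mentioned under the positive aspects can be sketched as below. This is a minimal illustration under stated assumptions: the column names (`region`, `city`, `population`) and the sample rows are hypothetical and are not taken from the project's data.

```python
import csv
import io
import json
from collections import defaultdict

# Hypothetical flat CSV: one row per (region, city) observation.
CSV_TEXT = """region,city,population
North,Alpha,1000
North,Beta,2500
South,Gamma,4000
"""


def csv_to_hierarchy(csv_text, parent_key, child_key):
    """Group flat CSV rows under the value of their `parent_key` column."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        parent = row.pop(parent_key)  # remove the parent column from the child
        groups[parent].append(row)
    # Emit one object per parent, with its children nested under `child_key`.
    return [
        {parent_key: parent, child_key: children}
        for parent, children in groups.items()
    ]


tree = csv_to_hierarchy(CSV_TEXT, "region", "cities")
print(json.dumps(tree, indent=2))
```

Note that the CSV reader yields every value as a string; converting `population` to a number would be a separate, dataset-specific step.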
(Contributed by Carlos Laufer)
Bio2RDF is an open source project that uses Semantic Web technologies to make possible the distributed querying of integrated life sciences data. Since its inception [2], Bio2RDF has made use of the Resource Description Framework (RDF) and the RDF Schema (RDFS) to unify the representation of data obtained from diverse (molecules, enzymes, pathways, diseases, etc.) and heterogeneously formatted biological data (e.g. flat-files, tab-delimited files, SQL, dataset specific formats, XML etc.). Once converted to RDF, this biological data can be queried using the SPARQL Protocol and RDF Query Language (SPARQL), which can be used to federate queries across multiple SPARQL endpoints.
Elements:
Dataset metrics
Dataset | Namespace | # of triples |
---|---|---|
Affymetrix | affymetrix | 44469611 |
Biomodels | biomodels | 589753 |
Comparative Toxicogenomics Database | ctd | 141845167 |
DrugBank | drugbank | 1121468 |
NCBI Gene | ncbigene | 394026267 |
Gene Ontology Annotations | goa | 80028873 |
HUGO Gene Nomenclature Committee | hgnc | 836060 |
Homologene | homologene | 1281881 |
InterPro | interpro | 999031 |
iProClass | iproclass | 211365460 |
iRefIndex | irefindex | 31042135 |
Medical Subject Headings | mesh | 4172230 |
National Center for Biomedical Ontology | ncbo | 15384622 |
National Drug Code Directory | ndc | 17814216 |
Online Mendelian Inheritance in Man | omim | 1848729 |
Pharmacogenomics Knowledge Base | pharmgkb | |
SABIO-RK | sabiork | 2618288 |
Saccharomyces Genome Database | sgd | 5551009 |
NCBI Taxonomy | taxon | 17814216 |
Total | 19 datasets | 1010758291 |
Challenges:
Potential Requirements:
Requires: Archiving and R-FormatStandardized
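To make the idea of federating queries across multiple SPARQL endpoints concrete, the sketch below builds a SPARQL 1.1 query string that wraps the same graph pattern in one `SERVICE` block per endpoint. It only constructs the query text and sends nothing over the network. The endpoint URLs are hypothetical placeholders, not actual Bio2RDF addresses, and the helper name is an assumption made for illustration.

```python
def build_federated_query(endpoints, pattern, variables=("?s", "?p", "?o"), limit=10):
    """Build a SPARQL 1.1 query that evaluates `pattern` at each endpoint.

    Each endpoint gets its own SERVICE block; the blocks are combined
    with UNION so that results from all endpoints are returned together.
    """
    blocks = [f"  {{ SERVICE <{url}> {{ {pattern} }} }}" for url in endpoints]
    body = "\n  UNION\n".join(blocks)
    return (
        f"SELECT {' '.join(variables)} WHERE {{\n"
        f"{body}\n"
        f"}} LIMIT {limit}"
    )


# Hypothetical endpoints standing in for two per-dataset SPARQL services.
query = build_federated_query(
    ["http://example.org/drugbank/sparql", "http://example.org/omim/sparql"],
    "?s ?p ?o",
)
print(query)
```

The resulting string could be submitted to any SPARQL 1.1 processor that supports federated query; whether a given endpoint permits `SERVICE` calls is a deployment decision outside this sketch.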
(Contributed by Deirdre Lee)
While many cases of Data on the Web may contain metadata about the creation date and last update, the regularity of the release schedule is not always clear. Similarly, how and by whom the dataset is supported should also be made clear in the metadata. These attributes are necessary to improve the reliability of the data, so that third-party users can trust the timely delivery of the data and have a follow-up point should there be any issues.
Challenges:
Requires: MetadataAvailable , AccessUptodate and SLAAvailable
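One way to picture the attributes discussed in this use case is a simple completeness check over a dataset description. The sketch below is an assumption-laden illustration: it borrows property names from Dublin Core Terms and DCAT (`dct:issued`, `dct:modified`, `dct:accrualPeriodicity`, `dcat:contactPoint`), which this document does not mandate, and the dataset record is hypothetical. The frequency value is a plain string for brevity, where a real description would normally use a term from a controlled vocabulary.

```python
# Fields assumed (for this sketch) to describe release and support:
# creation date, last update, release frequency and a support contact.
REQUIRED_FIELDS = (
    "dct:issued",
    "dct:modified",
    "dct:accrualPeriodicity",
    "dcat:contactPoint",
)


def missing_support_metadata(record):
    """Return the required fields that are absent or empty in `record`."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


# Hypothetical dataset description lacking a support contact.
record = {
    "dct:title": "Bus timetables",
    "dct:issued": "2014-01-15",
    "dct:modified": "2014-05-01",
    "dct:accrualPeriodicity": "monthly",
}
missing = missing_support_metadata(record)
print(missing)  # ['dcat:contactPoint']
```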
(Contributed by Deirdre Lee (based on Pieter Colpaert's paper 'Route planning using Linked Open Data') )
One of the advantages of publishing Open Data is often said to be an improvement in the quality of the data. Many eyes looking at a dataset help spot errors and holes more quickly than a public body may identify them itself. For example, in his paper 'Route planning using Linked Open Data', Colpaert looks at how feedback can be incorporated into transport data to improve its quality. How can this 'improved' data be fed back to the public body, processed and incorporated into the original dataset? Should there be an automated mechanism for this? How can the improvement be described in a machine-readable format? What is best practice for reincorporating such improvements?
Technical Challenges:
Requires: QualityOpinions and IncorporateFeedback
(Contributed by Deirdre Lee (based on OKF Greece workshop) )
Many of the datasets that are required for Natural Disaster Management, for example critical infrastructure, utility services, road networks, are not available online as they are also deemed to be datasets that could be used for homeland security attacks.
Requires: SensitiveSecurity
(Contributed by Deirdre Lee (based on 2012 ePSI Open Transport Data Manifesto) )
The Context: Transportation is an important contemporary issue, which has a direct impact on economic strength, environmental sustainability and social equity. Accordingly, transport data (largely produced or gathered by public sector organisations or semi-private entities, quite often locally) represents one of the most valuable sources of public sector information (PSI, also called ‘Open Data’), a key policy area for many, including the European Commission.
The Challenge: Combined with the advancement of Web 2.0 technologies and the increasing use of smart phones, the demand for high quality machine-readable and openly licensed transport data, allowing for reuse in commercial and non-commercial products and services, is rising rapidly. Unfortunately this demand is not met by current supply: many transport data producers and holders (from the public and private sectors) have not managed to respond adequately to these new challenges set by society and technology.
So what do we need?
Why is this not happening?
Requires: AccessBulk, FormatOpen, VocabOpen, QualityMetrics, FormatLocalize and LicenseLiability
(Contributed by Deirdre Lee)
There are many potential or perceived benefits of Open Data; however, in order to publish data, public bodies must make some initial investment of resources. When justifying these resources and evaluating the impact of the investment, many Open Data providers express the desire to be able to track how the datasets are being used. However, Open Data by design often requires no registration, explanation or feedback to enable access to and usage of the data. How can data usage be tracked in order to inform the Open Data ecosystem and improve data provision?
Challenges:
Requires: TrackDataUsage
(Contributed by Deirdre Lee)
The Open City Data Pipeline aims to provide an extensible platform to support citizens and city administrators by providing city key performance indicators (KPIs), leveraging Open Data sources. The base assumption is that "added value comes from comparable Open datasets being combined". Open Data needs stronger standards to be useful, in particular for industrial uptake. Industrial usage has different requirements than app hobbyists or civil society, so it is important to consider at the time of publication how Open Data can be used by industry. They have developed a data pipeline to:
Current Data Summary
Base assumption (for our use case): Added value comes from comparable Open datasets being combined.
Challenges & Lessons Learnt:
Challenges:
Requires: FormatStandardized , LicenseInteroperable , IndustryReuse , QualityCompleteness and QualityComparable
(Contributed by Deirdre Lee, based on post by Leigh Dodds)
There are many different licenses under which data on the Web can be published, e.g. Creative Commons, Open Data Commons and national licenses. It is important that the license is available in a machine-readable format. Leigh Dodds has done some work towards this with the Open Data Rights Statement Vocabulary; see http://schema.theodi.org/odrs/ and the guides at http://theodi.org/guides/publishers-guide-to-the-open-data-rights-statement-vocabulary and http://theodi.org/guides/odrs-reusers-guide . Another issue arises when data under different licenses are combined: the license terms under which the data is available also have to be merged. This interoperability of licenses is a challenge [may be out of scope of W3C DWBP, as it is more concerned with legal issues].
Challenges:
Requires: LicenseAvailable , LicenseMachineRead , LicenseStandardized and LicenseInteroperable
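As a rough picture of what a machine-readable license statement could look like, the sketch below attaches a license URI to a dataset description and reads it back. It deliberately uses the generic Dublin Core `dct:license` property in a JSON-LD-style dictionary rather than the Open Data Rights Statement Vocabulary mentioned above, so it is a simpler stand-in and not an ODRS example. The dataset title and the helper names are hypothetical.

```python
import json


def describe_dataset(title, license_url):
    """Return a JSON-LD-style description linking a dataset to its license."""
    return {
        "@context": {"dct": "http://purl.org/dc/terms/"},
        "dct:title": title,
        "dct:license": {"@id": license_url},
    }


def license_of(description):
    """Return the license URI of a description, or None if none is stated."""
    lic = description.get("dct:license")
    return lic["@id"] if lic else None


# Hypothetical dataset published under Creative Commons Attribution 4.0.
desc = describe_dataset(
    "Planning applications",
    "https://creativecommons.org/licenses/by/4.0/",
)
print(json.dumps(desc, indent=2))
```

Because the license is identified by a URI rather than free text, a consumer can compare it programmatically with the licenses of other datasets, which is a precondition for any automated treatment of license interoperability.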
Does the WG have the capacity to deliver this? It's a potentially huge piece of work.
(Contributed by Deirdre Lee (based on a number of talks at EDF14) )
A main focus of publishing data on the Web is to facilitate industry reuse for commercial purposes. In order for a commercial body to reuse data on the Web, the terms of reuse must be clear. The legal terms of reuse are included in the license, but there are other factors that are important for commercial reuse, e.g. reliability, support, incident recovery, etc. These could be included in a service level agreement (SLA). Is there a standardized, machine-readable approach to SLAs?
Challenges:
Requires: SLAAvailable, SLAMachineRead and SLAStandardized
(Contributed by Deirdre Lee)
APIs are commonly used to publish data in formats designed for machine consumption, as opposed to the corresponding HTML pages, whose main aim is to deliver content suitable for human consumption. There remain questions around how APIs can best be designed to publish data, and even whether APIs are the most suitable way of publishing data at all. Could the use of HTTP and URIs be sufficient? If the goal is to facilitate machine-readable data, what is best practice?
Challenges:
Requires: AccessBulk and AccessRealTime
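One way plain HTTP and URIs can serve both humans and machines from the same address is content negotiation on the `Accept` request header. The sketch below is a simplified illustration of that idea, not a complete implementation of the HTTP rules: it honours `q` values and the `*/*` wildcard but ignores partial wildcards such as `text/*` and media-type parameters. The function names are hypothetical.

```python
def parse_accept(header):
    """Parse an HTTP Accept header into (media_type, q) pairs, best first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0]
        if not media_type:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q value as "not acceptable"
        prefs.append((media_type, q))
    # The sort is stable, so entries with equal q keep their header order.
    return sorted(prefs, key=lambda p: p[1], reverse=True)


def choose_representation(accept_header, available):
    """Pick the first acceptable media type the server can provide."""
    for media_type, q in parse_accept(accept_header):
        if q <= 0:
            continue
        if media_type in available:
            return media_type
        if media_type == "*/*":
            return available[0]
    return None


available = ["text/html", "application/json"]
chosen = choose_representation("text/html;q=0.5, application/json", available)
print(chosen)  # application/json
```

With this approach a browser asking for `text/html` and a script asking for `application/json` can dereference the same URI and each receive a suitable representation.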
The use cases presented in the previous section illustrate a number of challenges faced by data publishers and data consumers. These challenges show that some guidance is required in specific areas and that best practices should therefore be provided. Based on the challenges, a set of requirements was defined in such a way that each requirement motivates the creation of one or more best practices. Challenges related to Data Quality and Data Usage motivated the definition of specific requirements for the Quality and Granularity Description Vocabulary and the Data Usage Vocabulary.
Challenge | Requirements |
---|---|
Metadata | Requirements for Metadata |
Data Granularity | Requirements for Data Granularity |
Data Formats | Requirements for Data Formats |
Data Vocabularies | Requirements for Data Vocabularies |
Licenses | Requirements for Licenses |
Provenance | Requirements for Provenance |
Data Selection | Requirements for Data Selection |
Data Access | Requirements for Data Access |
Sensitive Data | Requirements for Sensitive Data |
Data Identification | Requirements for Data Identification |
Data Publication | Requirements for Data Publication |
Industry-reuse | Requirements for Industry reuse |
Persistence | Requirements for Persistence |
Data Quality | Requirements for Data Quality |
Data Usage | Requirements for Data Usage |
Metadata should be available
Motivation: DocumentedSupportandRelease, BuildingEye, LATimesReporting and ViolenceMap
Metadata should be machine-readable
Motivation: RecifeOpenDataPortal, Bio2RDF and TheLandPortal
Metadata should be standardized
Motivation: RecifeOpenDataPortal, ISOGEOStory and LATimesReporting
Metadata vocabulary, or values if vocabulary is not standardized, should be well-documented
Motivation: RecifeOpenDataPortal
Metadata should be interoperable
Motivation: ISOGEOStory
Data available at different levels of granularity should be accessible and modelled in a common way
Motivation: ISOGEOStory and TheLandPortal
Data should be available in a machine-readable format
Motivation: BuildingEye and TheLandPortal
Data should be available in a standardized format
Motivation: OpenCityDataPipeline, WindCharacterizationScientificStudy, BuildingEye, TheLandPortal, GS1 Digital and Tabulae
Data should be available in an Open format
Motivation: BuildingEye
Data should be available in multiple formats
Motivation: GS1 Digital
It should be possible to localize data on the Web
Motivation: TheLandPortal and Tabulae
Existing reference vocabularies should be reused where possible
Motivation: OpenCityDataPipeline, RecifeOpenDataPortal, DadosGovBr, ISOGEOStory, DutchBasicRegisters, DigitalArchivingofLinkedData, TheLandPortal, UruguayOpenDataCatalogue and Tabulae
Vocabularies should be clearly documented
Motivation: RecifeOpenDataPortal
Vocabularies should be shared in an Open way
Motivation: RecifeOpenDataPortal and WindCharacterizationScientificStudy
Vocabularies should include versioning information
Motivation: TheLandPortal and Tabulae
Data should be associated with a license
Motivation: MachineReadabilityandInteroperabilityofLicenses, DadosGovBr and BuildingEye
Data licenses should be provided in a machine-readable format
Motivation: MachineReadabilityandInteroperabilityofLicenses
Standard vocabularies should be used to describe licenses
Motivation: MachineReadabilityandInteroperabilityofLicenses, DadosGovBr, TheLandPortal and Tabulae
Data licenses should be interoperable
Motivation: OpenCityDataPipeline, MachineReadabilityandInteroperabilityofLicenses, TheLandPortal and Tabulae
Liability terms associated with usage of Data on the Web should be clearly outlined
Motivation: GS1 Digital
Data provenance information should be available
Motivation: TheLandPortal, GS1 Digital and Tabulae
Datasets selected for publication should be of high value
Motivation: RecifeOpenDataPortal
Datasets selected for publication should be in demand by potential users
Motivation: RecifeOpenDataPortal
Data should be available for bulk download
Motivation: PublicationofDataviaAPIs, BuildingEye and TheLandPortal
Where data is produced in real-time, it should be available on the Web in real-time
Motivation: PublicationofDataviaAPIs, WindCharacterizationScientificStudy and TheLandPortal
Data should be available in an up-to-date manner
Motivation: DocumentedSupportandRelease and GS1 Digital
Data should not infringe on a person's right to privacy
Motivation: DutchBasicRegisters
Data should not infringe on national security
Motivation: DatasetsforNaturalDisasterManagement
Each data resource should be associated with a unique identifier
Motivation: DutchBasicRegisters, DigitalArchivingofLinkedData, LATimesReporting and UruguayOpenDataCatalogue
A data resource may have multiple representations, e.g. XML, HTML, JSON and RDF
Motivation: DutchBasicRegisters
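One common way to serve several representations of a single resource from one URI is HTTP content negotiation. The sketch below is a simplified illustration of that idea, not a complete implementation of the HTTP rules: it ranks media ranges only by their `q` value and the order in which the client listed them, and ignores specificity precedence. The function names and the list of available media types are hypothetical.

```python
def _matches(media_range, media_type):
    """Return True if a media range such as 'text/*' covers the media type."""
    if media_range == "*/*":
        return True
    range_type, _, range_subtype = media_range.partition("/")
    main_type, _, subtype = media_type.partition("/")
    return range_type == main_type and range_subtype in ("*", subtype)


def choose_representation(accept_header, available):
    """Pick a media type from `available` for an Accept header, or None.

    Simplification: preferences are ordered by q value, then by the order
    given by the client; more specific ranges get no extra precedence.
    """
    preferences = []
    for position, item in enumerate(accept_header.split(",")):
        parts = [part.strip() for part in item.split(";")]
        media_range = parts[0].lower()
        if not media_range:
            continue
        quality = 1.0  # q defaults to 1 when the client omits it
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0  # treat an unparsable q as "not acceptable"
        preferences.append((quality, position, media_range))

    preferences.sort(key=lambda pref: (-pref[0], pref[1]))

    for quality, _, media_range in preferences:
        if quality <= 0:
            continue
        for media_type in available:
            if _matches(media_range, media_type):
                return media_type
    return None


# Hypothetical representations offered for one data resource.
available = ["text/html", "application/json", "text/turtle"]
print(choose_representation("text/turtle;q=0.9, application/json;q=0.5", available))
```

A server that finds no acceptable representation (a `None` result here) would typically answer with 406 Not Acceptable or fall back to a default format.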
Dynamic generation of Data on the Web from non-Web data resources
Motivation: RecifeOpenDataPortal and UruguayOpenDataCatalogue
Automatic update of Data on the Web when original data source is updated
Motivation: RecifeOpenDataPortal, UruguayOpenDataCatalogue, GS1 Digital, Tabulae and ViolenceMap
Core registers should be accessible
Motivation: DutchBasicRegisters and GS1 Digital
Data should be suitable for industry reuse
Motivation: OpenCityDataPipeline
Service Level Agreements (SLAs) for industry reuse of the data should be available if requested
Motivation: DocumentedSupportandRelease and MachineReadabilityofSLAs
SLAs should be provided in a machine-readable format
Motivation: MachineReadabilityofSLAs
Standard vocabularies should be used to describe SLAs
Motivation: MachineReadabilityofSLAs
Potential revenue streams from data should be described
Motivation: DutchBasicRegisters
Data should be persistently identifiable
Motivation: DigitalArchivingofLinkedData, TheLandPortal, GS1 Digital and ViolenceMap
It should be possible to archive data
Motivation: DigitalArchivingofLinkedData
Data should be complete
Motivation: OpenCityDataPipeline, RecifeOpenDataPortal, TheLandPortal, Tabulae and ViolenceMap
Data should be comparable with other datasets
Motivation: OpenCityDataPipeline
Data should be associated with a set of standardized, objective quality metrics
Motivation: TheLandPortal
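To make the idea of an objective quality metric concrete, the following sketch computes one possible measure of completeness: the fraction of required fields that are actually filled in across a set of records. This particular definition is an assumption chosen for illustration; the requirement does not prescribe any specific metric, and the function name and sample records are hypothetical.

```python
def completeness(rows, required_fields):
    """Fraction of required cells that are present and non-empty (0.0 to 1.0).

    Assumption: a cell counts as missing when its key is absent or its value
    is None or the empty string. An empty input is reported as fully complete.
    """
    total = len(rows) * len(required_fields)
    if total == 0:
        return 1.0
    filled = sum(
        1
        for row in rows
        for field in required_fields
        if row.get(field) not in (None, "")
    )
    return filled / total


# Hypothetical records: the second has an empty name, the third has no name.
rows = [
    {"id": "1", "name": "A"},
    {"id": "2", "name": ""},
    {"id": "3"},
]
score = completeness(rows, ["id", "name"])
print(f"completeness: {score:.2f}")
```

Publishing such a score alongside a dataset, together with a description of how it was computed, lets consumers compare datasets on the same basis.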
Subjective quality opinions on the data should be supported
Motivation: FeedbackLoopforCorrections and DadosGovBr
It should be possible to track the usage of data
Motivation: TrackingofDataUsage
It should be possible to incorporate feedback on the data
Motivation: FeedbackLoopforCorrections
It should be possible to cite data on the Web
Motivation: LATimesReporting and GS1 Digital