Guidelines for Collecting Metadata on Linked Datasets in the datahub.io Data Catalog
To keep the LOD cloud diagram up to date, the Linking Open Data community effort has started to collect meta-information about Linked datasets on datahub.io, a registry of open data and content packages provided by the Open Knowledge Foundation.
This page explains how dataset publishers, or other people who want a dataset to be added to the LOD cloud, should describe datasets on datahub.io.
The list of datasets about which we have already collected information can be found here:
http://validator.lod-cloud.net/
Which datasets are included in the LOD cloud diagram?
All datasets that fulfill the following requirements are included:
- Data items are accessible via dereferenceable URIs. Offering only a SPARQL endpoint but no dereferenceable URIs is not considered sufficient for inclusion.
- The dataset sets at least 50 RDF links pointing at other datasets, or at least one other dataset sets at least 50 RDF links pointing at your dataset.
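The two requirements above can be expressed as a small check. This is a minimal sketch, not an official validator: the DatasetLinks class, its field names, and the function name are hypothetical, and it assumes that outgoing links are counted in total across all target datasets while incoming links are counted per source dataset.

```python
from dataclasses import dataclass, field

MIN_LINKS = 50  # threshold stated in the requirements above


@dataclass
class DatasetLinks:
    """Hypothetical summary of a dataset's link statistics."""
    has_dereferenceable_uris: bool
    # Number of RDF links from this dataset to each target dataset.
    outgoing: dict = field(default_factory=dict)
    # Number of RDF links from each other dataset pointing at this one.
    incoming: dict = field(default_factory=dict)


def qualifies_for_lod_cloud(ds: DatasetLinks) -> bool:
    """Apply both inclusion requirements to one dataset."""
    # A SPARQL endpoint alone is not enough; URIs must be dereferenceable.
    if not ds.has_dereferenceable_uris:
        return False
    # Assumption: outgoing links are summed over all target datasets.
    total_outgoing = sum(ds.outgoing.values())
    # Assumption: a single other dataset must reach the threshold on its own.
    strongest_incoming = max(ds.incoming.values(), default=0)
    return total_outgoing >= MIN_LINKS or strongest_incoming >= MIN_LINKS
```

Under these assumptions, a dataset with dereferenceable URIs and 30 links to one dataset plus 25 to another would qualify, while a dataset that offers only a SPARQL endpoint would not, regardless of its link counts.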
How do I add a data set to datahub.io or edit an existing data set?
- Please register with datahub.io before editing or adding any packages.
- Please confirm that your data set does not already exist on datahub.io before adding a new data set.
- Add or edit your data set and describe it with the following minimum required information:
- name (a unique id)
- title
- URL
- number of triples
- links to other data sets.
- Please tag newly added data sets with lod.
- If you are not aware of any in- or outlinks, tag it with lodcloud.nolinks.
- Please provide as much additional information as possible (e.g. SPARQL endpoint, voiD description, license, and the topic of the data set) as described below. This information helps the community to know more about the development state of the Web of Linked Data and is made available via the datahub.io API.
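The tagging rule from the list above can be sketched as a small helper. The function name initial_tags and its known_links parameter are hypothetical and only serve as an illustration; they are not part of datahub.io.

```python
def initial_tags(known_links):
    """Return the tags for a newly added data set.

    known_links maps the IDs of other data sets to the number of known
    RDF links to or from them (hypothetical input structure).
    """
    tags = ["lod"]  # every newly added data set gets this tag
    # No known in- or outlinks: mark the data set accordingly.
    if not any(count > 0 for count in known_links.values()):
        tags.append("lodcloud.nolinks")
    return tags
```

For example, calling initial_tags({}) yields both tags, while a data set with known links to another data set is tagged with lod only.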
Minimum Information
Please provide the following minimum information about your data set.
Standard CKAN fields
Field name | Description | Format/Examples |
Name | Unique ID for your data set on datahub.io | [a-z0-9-]+ "my-dataset" |
Title | Full name of your data set | "My Dataset" |
URL | Link to data set homepage | http://example.com/my-ds |
Custom datahub.io fields
Field name | Description | Format/Examples |
triples | Approximate size of your data set in RDF triples | 100000, 62345123 |
links:xxx | Number of RDF links pointing at the data set with Data Hub ID xxx (http://thedatahub.org/dataset/xxx). Please provide a separate links:xxx statement for each data set you are linking to | 20000 |
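As an illustration of the minimum fields, the following sketch collects them into one dictionary and checks the name against the pattern given in the table. The function name, the dictionary layout, and the "extras" key are assumptions made for this example; the exact structure expected by the datahub.io API may differ.

```python
import re

# Name format from the table above: lowercase letters, digits and hyphens.
NAME_PATTERN = re.compile(r"[a-z0-9-]+")


def minimum_metadata(name, title, url, triples, links):
    """Build a dictionary with the minimum information for one data set.

    links maps the Data Hub ID of each target data set to the number of
    RDF links pointing at it (one links:xxx entry per target).
    """
    if not NAME_PATTERN.fullmatch(name):
        raise ValueError(f"invalid data set name: {name!r}")
    if triples < 0:
        raise ValueError("number of triples must not be negative")
    # Custom fields: the approximate size plus one entry per link target.
    extras = {"triples": triples}
    for target_id, count in links.items():
        extras[f"links:{target_id}"] = count
    return {"name": name, "title": title, "url": url, "extras": extras}


# Example values taken from the tables above; "dbpedia" is an assumed target ID.
record = minimum_metadata(
    "my-dataset", "My Dataset", "http://example.com/my-ds",
    triples=100000, links={"dbpedia": 20000},
)
```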
datahub.io tags
Please use the following tags to provide meta-information about your data set.
We will use the topic information to color the LOD cloud later.
Please also list the vocabularies used by your data set so that the community can get an overview of which vocabularies are commonly used on the Web of Linked Data.
Linked Data published on the Web should be as self-describing as possible in order to make it easier for clients to understand and use the data. Important aspects of self-descriptiveness are making vocabulary terms dereferenceable according to the best practices described in Publishing RDF Vocabularies, using terms from common vocabularies, and providing vocabulary mappings for proprietary vocabulary terms. To give the community an overview of which data sets implement these best practices, please tag your data set accordingly.
Tag | Purpose |
<topic> | One of: |
Enhanced Information
Please provide the following additional information about your data set. This information helps the community to know more about the development state of the Web of Linked Data and is made available via the datahub.io API.
Standard datahub.io fields
Field name | Description | Format/Examples |
Version | Last modification date or version of your data set | "2010-04 (3.5)", "2006", "beta" |
Notes | Description of your data set | some free text |
Author | Name of publishing org and/or person | "Talis (Leigh Dodds)" |
Author email | Contact email | leigh@ldodds.com |
License | Standard license drop-down | OSI approved::MIT license |
Custom datahub.io fields
Field name | Description | Format/Examples |
shortname | Short name for LOD bubble | "NY Times" |
license_link | Custom license link | http://example.com/so-sue-me |
sparql_graph_name | Named graph in SPARQL store (if used by your SPARQL endpoint) | http://species.geospecies.org |
namespace | Instance namespace | http://dbpedia.org/resource/ |
datahub.io resource links
Links (other than dereferenceable URIs) that enable alternative access to the data set (e.g., via downloads or SPARQL endpoints) should be specified in the Resources section of the CKAN entry form. Please also provide links to the voiD description or Semantic Web Sitemap describing your data set.
Purpose | Format | Description |
Download page | — | Download |
XML Sitemap | meta/sitemap | XML Sitemap |
SPARQL endpoint | api/sparql | SPARQL endpoint |
voiD file | meta/void | voiD description |
RDF/XML download | application/rdf+xml | Download |
Turtle download | text/turtle | Download |
N-Triples download | application/x-ntriples | Download |
N-Quads download | application/x-nquads | Download |
RDF Schema | meta/rdf-schema | Download link to RDF/OWL Schema used by your data set (in addition to having dereferenceable vocabulary URIs) |
RDF/XML example link | example/rdf+xml | Link to an example data item within your data set (RDF/XML) |
Turtle example link | example/turtle | Link to an example data item within your data set (Turtle) |
N-Triples example link | example/ntriples | Link to an example data item within your data set (N-Triples) |
HTML+RDFa example link | example/rdfa | Link to an example data item within your data set (RDFa) |
Vocabulary Mappings, e.g., OWL, RDFS, RIF, R2R | mapping/<format> | If your data set uses proprietary vocabulary terms and you know that these terms also exist in other vocabularies, you should set owl:equivalentClass, owl:equivalentProperty, rdfs:subClassOf, and/or rdfs:subPropertyOf links pointing at these terms, or provide mappings expressed as RIF rules or using the R2R Mapping Language. If your mappings can be downloaded as a single file, please provide the link to the download. |
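The format values in the table above can be checked mechanically before they are entered in the Resources section. The sketch below is illustrative only: the helper names and example URLs are hypothetical, and the url/format/description keys are an assumption about how a resource entry might be represented. The download page row, which has no format value, is not covered.

```python
# Format strings copied from the table above.
KNOWN_FORMATS = {
    "meta/sitemap",
    "api/sparql",
    "meta/void",
    "application/rdf+xml",
    "text/turtle",
    "application/x-ntriples",
    "application/x-nquads",
    "meta/rdf-schema",
    "example/rdf+xml",
    "example/turtle",
    "example/ntriples",
    "example/rdfa",
}
MAPPING_PREFIX = "mapping/"


def is_known_format(fmt):
    """Return True for a table value or a value of the form mapping/<format>."""
    if fmt in KNOWN_FORMATS:
        return True
    # mapping/<format> needs a non-empty suffix after the slash.
    return fmt.startswith(MAPPING_PREFIX) and len(fmt) > len(MAPPING_PREFIX)


def make_resource(url, fmt, description=""):
    """Build one resource entry, rejecting unknown format strings."""
    if not is_known_format(fmt):
        raise ValueError(f"unknown resource format: {fmt!r}")
    return {"url": url, "format": fmt, "description": description}


# Hypothetical example URLs for a data set hosted at example.com.
resources = [
    make_resource("http://example.com/sparql", "api/sparql", "SPARQL endpoint"),
    make_resource("http://example.com/void.ttl", "meta/void", "voiD description"),
    make_resource("http://example.com/dump.nt", "application/x-ntriples", "Download"),
]
```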
datahub.io tags
Please use the following tags to provide meta-information about your data set.
Please also list the vocabularies used by your data set so that the community can get an overview of which vocabularies are commonly used on the Web of Linked Data.
Linked Data published on the Web should be as self-describing as possible in order to make it easier for clients to understand and use the data. Important aspects of self-descriptiveness are making vocabulary terms dereferenceable according to the best practices described in Publishing RDF Vocabularies, using terms from common vocabularies, and providing vocabulary mappings for proprietary vocabulary terms. To give the community an overview of which data sets implement these best practices, please tag your data set accordingly.
Tag | Purpose |
format-<prefix> | A vocabulary used by the data set, e.g., format-skos, format-dc, format-foaf. Use http://prefix.cc/ to find a prefix for a vocabulary. If a vocabulary is not in prefix.cc, then add it there or ignore that vocabulary. |
no-proprietary-vocab | Indicates that your data set does not use a proprietary vocabulary (defined within your top-level domain). |
deref-vocab | Indicates whether the proprietary vocabulary terms used by your data set (the ones that are defined within your top-level domain) are dereferenceable according to the best practices for Publishing RDF Vocabularies |
vocab-mappings | Indicates whether you provide mappings for proprietary vocabulary terms (by setting owl:equivalentClass, owl:equivalentProperty, rdfs:subClassOf, and/or rdfs:subPropertyOf links, or by publishing mappings expressed as RIF rules or using the R2R Mapping Language). |
provenance-metadata | Indicates whether the data set provides provenance meta-information (creator of the data set, creation date, maybe creation method) as document meta-information or via a voiD description. For instance, using the dc:creator or dc:date properties. |
license-metadata | Indicates whether the data set provides licensing meta-information as document meta-information or via a voiD description. For instance, using the dc:rights property. |
published-by-producer | Indicates whether the data set is published by the original data producer or a third party. |
limited-sparql-endpoint | Indicates that the SPARQL endpoint does not serve the whole data set. |
lodcloud.nolinks | Dataset has no external RDF links to other datasets. |
lodcloud.unconnected | Dataset has no external RDF links to or from other datasets. |
lodcloud.needsinfo | The data provider or dataset homepage does not provide the minimum information (and the information can't be determined from the SPARQL endpoint or downloads). |
lodcloud.needsfixing | The dataset is currently broken. Provide details in the Notes. |
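The format-<prefix> tags from the table above follow a simple naming scheme, which the following sketch applies to a list of vocabulary prefixes. The function name vocabulary_tags is hypothetical, and the sketch assumes the prefixes have already been looked up on prefix.cc; it does not contact that service.

```python
def vocabulary_tags(prefixes):
    """Turn vocabulary prefixes into format-<prefix> tags.

    Duplicate prefixes are dropped and the original order is kept.
    """
    tags = []
    for prefix in prefixes:
        tag = f"format-{prefix.strip()}"
        if tag not in tags:
            tags.append(tag)
    return tags


# Prefixes taken from the examples in the table above.
tags = vocabulary_tags(["skos", "dc", "foaf", "skos"])
```

Here tags holds format-skos, format-dc and format-foaf, each exactly once.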