Data Catalog Vocabulary/Recipes

From Government Linked Data (GLD) Working Group Wiki

This page collects recipes for describing typical dataset use cases in catalogs. All of the examples should be checked, because several open issues in the vocabulary affect them directly.
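
The snippets on this page omit their prefix declarations. As a convenience, the block below sketches the declarations they appear to rely on. The vocabulary namespaces are the published ones; the default prefix ":" is an assumption, since the page never states which namespace it stands for. Note also that names such as :dataset/001 contain a slash, which Turtle 1.1 accepts in a prefixed name only when escaped (:dataset\/001); writing the full IRI in angle brackets is an alternative.

```turtle
# Prefixes used by the recipes on this page.
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix time: <http://www.w3.org/2006/time#> .
@prefix cc:   <http://creativecommons.org/ns#> .

# Assumption: a placeholder namespace for the example resources.
@prefix :     <http://example.org/> .
```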

How to describe a catalog

Note that an organization may publish more than one catalog; each catalog should be identified by its own URI.

:catalog
  a dcat:Catalog ;
  dct:title "Imaginary catalog" ;
  rdfs:label "Imaginary catalog" ;
  foaf:homepage <http://example.org/catalog> ;
  dct:publisher :body/transparency-office ;
  dcat:themes :themes ;
  dct:language "en"^^xsd:language ;
  dct:spatial :location ;
  dct:license :main-license ;
  dcat:record :record/001 ;
  dcat:dataset :dataset/001 ;
  .
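
To illustrate the note about organizations with several catalogs, here is a minimal sketch of a second catalog published by the same body under a different URI. The name :catalog-2, its title, and its homepage are hypothetical, and the default prefix is assumed to be http://example.org/. The publisher is written as a full IRI to avoid the slash in the prefixed name.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix :     <http://example.org/> .   # assumed namespace

# Hypothetical second catalog: same publisher, distinct URI.
:catalog-2
  a dcat:Catalog ;
  dct:title "Second imaginary catalog" ;
  foaf:homepage <http://example.org/catalog-2> ;
  dct:publisher <http://example.org/body/transparency-office> ;
  .
```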

The body that publishes the catalog:

:body/transparency-office
  a foaf:Agent ;
  rdfs:label "Transparency Office" ;
  .

Theme scheme of the catalog datasets:

:themes
  a skos:ConceptScheme ;
  skos:prefLabel "A set of domains to classify datasets" ;
  .

The scheme contains a taxonomy of concepts to classify the datasets of the catalog.

:themes/environment 
  a skos:Concept ;
  skos:inScheme :themes ;
  skos:prefLabel "Environment" ;
  .
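
Since the scheme is described as a taxonomy, a narrower concept can be attached to an existing one with skos:broader. The sketch below assumes the default prefix maps to http://example.org/ and uses relative IRIs against that base; the "Air quality" concept is hypothetical and only shows the pattern.

```turtle
@base <http://example.org/> .            # assumed namespace of the ":" prefix
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

# Hypothetical child concept under the "Environment" theme.
<themes/environment/air-quality>
  a skos:Concept ;
  skos:inScheme <themes> ;
  skos:broader <themes/environment> ;
  skos:prefLabel "Air quality" ;
  .

# Optionally mark "Environment" as a top-level concept of the scheme.
<themes> skos:hasTopConcept <themes/environment> .
```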

Spatial coverage of the catalog. In this case, a country.

:location
  a dct:Location ;
  rdfs:label "Belgium"@en ;
  rdfs:seeAlso <http://sws.geonames.org/2802361/> ;
  .

The default terms of use for the catalog:

:main-license 
  a cc:License ;
  rdfs:label "My license"@en ;
  cc:requires <http://creativecommons.org/ns#Notice> ;
  cc:requires <http://creativecommons.org/ns#Attribution> ;
  cc:permits <http://creativecommons.org/ns#Reproduction> ;
  cc:permits <http://creativecommons.org/ns#DerivativeWorks> ;
  cc:permits <http://creativecommons.org/ns#Distribution> ;
  dct:creator :body/transparency-office ;
  .

How to describe Catalog Records

Each dataset can be linked to an entry in the catalog; this record is typed as dcat:CatalogRecord.

:record/001
  a dcat:CatalogRecord ;
  foaf:primaryTopic :dataset/001 ;
  dct:issued "2011-12-11"^^xsd:date ;
  dct:modified "2012-02-01"^^xsd:date ;
  .

How to describe a Dataset

:dataset/001
  a dcat:Dataset ;
  dct:title "Pollution levels in 2011"@en ;
  dct:description "Evolution of the pollution levels during 2011"@en;
  dcat:keyword "environment","pollution" ,"air" ;
  dcat:theme :themes/environment;
  dct:issued "2012-01-01"^^xsd:date;
  dct:modified "2012-01-05"^^xsd:date;
  dct:temporal [
    a dct:PeriodOfTime, time:Interval;
    time:hasBeginning [
      a time:Instant;
      time:inXSDDateTime "2011-01-01T00:00:00-05:00"^^xsd:dateTime
    ];
    time:hasEnd [
      a time:Instant;
      time:inXSDDateTime "2011-12-31T23:59:59-05:00"^^xsd:dateTime
    ]
  ];
  dct:publisher :agency/environment ;
  dct:accrualPeriodicity :frequency-year ;
  dct:language "en"^^xsd:language ;
  dct:license <http://creativecommons.org/licenses/by/3.0/>;
  dcat:dataDictionary <http://www.example.org/files/usage-scheme-001.csv> ;
  dcat:dataQuality "This data may contain errors due to unexpected malfunction of the air pollution stations"@en;
  dcat:distribution :dataset/001/csv ;
  dcat:distribution :dataset/001/webservice ;
  .

How to describe a distribution based on a CSV dump

:dataset/001/csv
  a dcat:Distribution ;
  dct:title "CSV distribution of the pollution levels in..." ;
  dcat:accessURL <http://www.example.org/files/001.csv> ;
  dcat:size [
    dcat:bytes "16162"^^xsd:integer ;
    rdfs:label "16.2KB"
  ] ;
  dct:format [
    a dct:IMT ;
    rdf:value "text/csv" ;
    rdfs:label "CSV"
  ] ;
  .

How to describe a distribution based on a Web Service

:dataset/001/webservice
  a dcat:Distribution ;
  dct:title "SOAP Web Service to access data of the pollution levels in..." ;
  dcat:accessURL <http://www.example.org/endpoints/SOAP> ;
  dct:format [
    a dct:IMT ;
    rdf:value "application/soap+xml" ;
    rdfs:label "SOAP Web Service"
  ] ;
  .

This description may need additional information about the technical implementation, parameters, etc. Such documents could be referenced using "dcterms:references".
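
A minimal sketch of that idea, assuming the default prefix maps to http://example.org/ and that the service documentation lives at a hypothetical URL (neither is given on this page):

```turtle
@base <http://example.org/> .            # assumed namespace of the ":" prefix
@prefix dct: <http://purl.org/dc/terms/> .

# Link the web service distribution to its technical documentation.
<dataset/001/webservice>
  dct:references <http://www.example.org/docs/soap-service.html> ;  # hypothetical documentation page
  .
```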


How to describe a distribution based on a CSV compressed into a ZIP file

:dataset/001/csv-zip
  a dcat:Distribution ;
  dct:title "CSV distribution of the pollution levels in..." ;
  dcat:accessURL <http://www.example.org/files/001.csv.zip> ;
  # the size of the ZIP or the CSV file?
  dcat:size [
    dcat:bytes "6237"^^xsd:integer ;
    rdfs:label "6.2KB"
  ] ;
  # the format used to represent the data
  dct:format [
    a dct:IMT ;
    rdf:value "text/csv" ;
    rdfs:label "CSV"
  ] ;
  .

This case needs a way to state whether the accessURL gives direct access to the data. Here the access is indirect, because the reuser must perform an intermediate step (unzipping the file) to reach the data of the distribution.

This could be solved with something like this:

:dataset/001/csv-zip
  dcat:accessURL <http://www.example.org/files/001.csv.zip> ;
  dct:type :indirect-access.

or

:dataset/001/csv
  dcat:accessURL <http://www.example.org/files/001.csv> ;
  dct:type :direct-access.
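
The two access types used above are not defined anywhere on this page. One possible way to declare them, sketched here as SKOS concepts in a hypothetical scheme, is shown below; the scheme name, labels, and definitions are assumptions, as is the mapping of the default prefix to http://example.org/.

```turtle
@base <http://example.org/> .            # assumed namespace of the ":" prefix
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

# Hypothetical scheme grouping the access types.
<access-types>
  a skos:ConceptScheme ;
  skos:prefLabel "Ways of reaching the data behind an accessURL"@en ;
  .

<direct-access>
  a skos:Concept ;
  skos:inScheme <access-types> ;
  skos:prefLabel "Direct access"@en ;
  skos:definition "The accessURL returns the data itself."@en ;
  .

<indirect-access>
  a skos:Concept ;
  skos:inScheme <access-types> ;
  skos:prefLabel "Indirect access"@en ;
  skos:definition "An intermediate step, such as unpacking an archive, is needed to reach the data."@en ;
  .
```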