From Government Linked Data (GLD) Working Group Wiki


Dealing with vocabularies (Michael, Boris, Ghislain, Dani)

Discovery checklist

This checklist lists steps to follow when looking for existing vocabularies that best fit the needs of a government or a specialized agency. The goal is to avoid building a new vocabulary from scratch wherever possible and to reuse existing *good* vocabularies for the domain instead.

  • What is your domain of interest?
Answering this question defines and restricts the scope of your search, so that you can quickly find related work in Linked Open Data (LOD) for your domain.
Example: Geography, Environment, Administrations, State Services, Statistics, etc.
  • What are the keywords in your dataset?
Identifying the relevant keywords or categories of your dataset helps when searching with a Semantic Web search engine.
If you have raw data in CSV, the column names of the tables can be used as search terms.
Example: commune, county, point, feature, address, etc.
  • Are you looking for a vocabulary in one specific language?
Many vocabularies are available in English. You may need a vocabulary in your own language; consider this early,
as it may restrict your search. Sometimes it is useful to translate some of the keywords into English.

  • How could you find vocabularies?
 There are specific search tools (Falcons, Watson, Sindice, Semantic Web Search Engine, Swoogle) that collect, analyse and index
 vocabularies and semantic data available online for efficient access.
  • Where could you find related vocabularies in dataset catalogues?
 Another approach is to search dataset catalogues using the key terms identified previously. The catalogues provide samples
 of how the underlying data was modelled and what it was used for.
 Some existing catalogues are: Data Hub (previously CKAN), LOV directory, Kasabi, etc.
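
The keyword step of the checklist can be partly automated when the raw data is in CSV. The following is a minimal sketch, not a prescribed method: it reads only the header row and splits column names into candidate search terms. The function name `header_keywords` and the sample column names are assumptions made for illustration.

```python
import csv
import io
import re

def header_keywords(csv_text):
    """Return lowercase candidate search keywords taken from the CSV header row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    keywords = []
    for column in header:
        # Insert a space at camelCase boundaries, e.g. "streetAddress".
        spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", column)
        # Split on anything that is not a letter (spaces, digits, "_", "-").
        for word in re.split(r"[\W\d_]+", spaced):
            word = word.lower()
            if word and word not in keywords:
                keywords.append(word)
    return keywords

# Hypothetical sample data; only the first row (the header) is used.
sample = "commune,county,postal_code,streetAddress\nParis,Paris,75001,1 rue X\n"
print(header_keywords(sample))
# ['commune', 'county', 'postal', 'code', 'street', 'address']
```

The resulting terms can then be entered by hand into the search tools and catalogues listed above; translating them into English first may widen the results.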

Vocabulary selection criteria checklist

This checklist gives some advice for assessing and selecting the best vocabulary among those found in the *Discovery* section. The final result should be one or two vocabularies that can be reused for your own purpose (mappings, extension, etc.).

  • Does the vocabulary make good use of rdfs:label and rdfs:comment?
The vocabulary should be self-descriptive. Each class and property should have a label and a comment associated with it.
  • Is the vocabulary available in more than one language?
Multilingualism should be supported by the vocabulary, with at least one language other than English. This is also important
for the documentation: labels and comments should carry an appropriate tag for the language used.
  • Who is using the vocabulary?
It is always better to check how the vocabulary is used by other initiatives and how popular it is.
  • How is the vocabulary maintained?
The selected vocabulary should have a guarantee of long-term maintenance, or at least the editors should be aware of that issue.
This also includes checking the permanence of the URIs and the vocabulary's versioning policy.
  • Who is the publisher of the vocabulary?
Although anyone can create a vocabulary, it is always better to check whether a single person, a group, or an organization is
responsible for publishing and maintaining the vocabulary.
It is recommended to trust a well-known organization rather than a single person.
  • How permanent are the URIs?
This means that accessing any *thing* of the vocabulary should never return an HTTP 404 error.
It also means that the server hosting the vocabulary should remain permanently accessible.
  • What policies are applied to control the changes?
This refers to the mechanism put in place by the publisher to preserve backward compatibility between versions and to record how changes affect previous versions.
  • Is the documentation available?
A vocabulary should be well documented both for machines (use of labels and comments, with language tags) and
for humans: the publisher should provide additional documentation that helps readers understand
the classes and properties, if possible with some valuable use cases.
  • Labels and comments
  • Multilingualism (Bart: is this a criterion? See the discussion)
  • Usage
  • Maintenance
    • Permanent URIs
    • Change control
  • Publisher
  • Documentation
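
The label, comment and language criteria above can be checked mechanically for a candidate vocabulary. The following is a minimal sketch using only the Python standard library. It assumes the vocabulary is serialized as RDF/XML with each term written as a typed top-level element carrying `rdf:about`; other serializations, or a full RDF parser, would need a different approach. The function names and the `http://example.org/vocab#` sample are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Element names treated as vocabulary terms (an assumption of this sketch).
TERM_TAGS = {
    f"{{{RDFS}}}Class", f"{{{OWL}}}Class", f"{{{RDF}}}Property",
    f"{{{OWL}}}ObjectProperty", f"{{{OWL}}}DatatypeProperty",
}

def documentation_report(rdfxml_text):
    """Map each term URI to the language tags of its labels and comments."""
    root = ET.fromstring(rdfxml_text)
    report = {}
    for node in root:
        uri = node.get(f"{{{RDF}}}about")
        if node.tag not in TERM_TAGS or uri is None:
            continue
        # Literals without xml:lang are recorded under the empty string.
        labels = sorted(e.get(XML_LANG, "") for e in node.findall(f"{{{RDFS}}}label"))
        comments = sorted(e.get(XML_LANG, "") for e in node.findall(f"{{{RDFS}}}comment"))
        report[uri] = {"label": labels, "comment": comments}
    return report

def undocumented_terms(report):
    """Return the URIs of terms lacking a label or a comment."""
    return sorted(u for u, doc in report.items() if not doc["label"] or not doc["comment"])

# Hypothetical two-term vocabulary used only to exercise the functions.
SAMPLE = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdfs:Class rdf:about="http://example.org/vocab#Commune">
    <rdfs:label xml:lang="en">Commune</rdfs:label>
    <rdfs:label xml:lang="fr">Commune</rdfs:label>
    <rdfs:comment xml:lang="en">A municipality.</rdfs:comment>
  </rdfs:Class>
  <rdf:Property rdf:about="http://example.org/vocab#county">
    <rdfs:label>county</rdfs:label>
  </rdf:Property>
</rdf:RDF>"""

report = documentation_report(SAMPLE)
print(undocumented_terms(report))
# ['http://example.org/vocab#county']
```

The per-term language lists in the report also show at a glance whether a vocabulary offers labels in any language other than English. Usage, maintenance and publisher still have to be assessed by hand.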

Editorial notes for creators/maintainers:

The need for creating a vocabulary.

  • Creation
    • Namespace management
      • stability (PURL)
      • longevity (hit by bus)
    • Available ontology development methodologies (Informative)
  • Usage (instance-level, SPARQL, etc.)
    • Versioning
    • go back to creation
  • Partial or full deprecation

Cross-cutting issues: "Hit-by-bus"

Links from oleyerickson