Multi-lingualismOfVocabs

From Government Linked Data (GLD) Working Group Wiki

Multi-lingualism Of Vocabs

This page addresses issues that arise when vocabularies are used by people who speak different languages. Vocabularies are often created and published by English speakers, who therefore use English terms to compose the URIs. This is not a problem in itself: in RDF, the meaning of a resource is given by its description rather than by its URI. In too many cases, however, this description is not good enough to make English vocabularies usable by non-English speakers. In the following, we outline some quality criteria for assessing the multi-lingual quality of a vocabulary and answer some related questions that a person designing a vocabulary may have.

Contributors: Christophe, Bart, Dani, Boris

Quality criteria related to multi-lingualism

  • The vocabulary uses English terms for Concepts and Properties (that's the most spoken language after all!)
  • The vocabulary uses several instances of rdfs:label to provide a human-readable version of the resource name in different languages
  • The vocabulary uses several instances of rdfs:comment to provide a human-readable description of the resource in different languages
  • The vocabulary is accompanied by a document explaining its usage.
  • The vocabulary extends a multi-lingual vocabulary
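
The rdfs:label and rdfs:comment criteria above can be checked mechanically. The following is a minimal sketch in plain Python, assuming the vocabulary has already been loaded into a list of (subject, predicate, literal, language tag) tuples. The http://example.org/vocab# terms are hypothetical, and the comparison is an exact, case-insensitive match on the tag, so a label tagged "fr-CA" would not count as "fr" here.

```python
from collections import defaultdict

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
RDFS_COMMENT = "http://www.w3.org/2000/01/rdf-schema#comment"

# Hypothetical vocabulary fragment: (subject, predicate, literal, language tag).
TRIPLES = [
    ("http://example.org/vocab#Person", RDFS_LABEL, "Person", "en"),
    ("http://example.org/vocab#Person", RDFS_LABEL, "Personne", "fr"),
    ("http://example.org/vocab#Person", RDFS_COMMENT, "A human being.", "en"),
    ("http://example.org/vocab#name", RDFS_LABEL, "name", "en"),
]


def missing_languages(triples, predicate, required):
    """Map each term to the required languages that `predicate` does not cover."""
    wanted = {lang.lower() for lang in required}
    found = defaultdict(set)
    terms = set()
    for subject, pred, _text, lang in triples:
        terms.add(subject)
        if pred == predicate:
            found[subject].add(lang.lower())
    report = {}
    for term in sorted(terms):
        gaps = sorted(wanted - found[term])
        if gaps:  # only report terms that lack at least one language
            report[term] = gaps
    return report


if __name__ == "__main__":
    for term, gaps in missing_languages(TRIPLES, RDFS_LABEL, ["en", "fr"]).items():
        print(term, "lacks rdfs:label in:", ", ".join(gaps))
```

A term that is absent from the report has a literal in every required language; running the same function with RDFS_COMMENT covers the rdfs:comment criterion.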

Questions & Answers

In the following, replace "X" by your favourite language.

I found a suitable vocabulary but its concepts are not in X. Should I create a new one in X?

The names of the concepts and properties do not matter (at least from a machine's point of view). What matters most is that machines are able to convey the proper meaning of these concepts and properties to the end users of applications. Rather than creating a new vocabulary, extend the one you found by adding translations of the names and short descriptions in your own language.
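
As an illustration of such an extension, the sketch below generates Turtle statements that attach a translated rdfs:label (and optionally an rdfs:comment) to existing terms without touching their URIs. It is only a sketch under assumptions: the helper names and the http://example.org/vocab# terms are hypothetical, and the output is meant to be published alongside the original vocabulary or offered to its maintainers.

```python
RDFS_PREFIX = "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> ."


def escape_literal(text):
    """Escape characters that are special inside a double-quoted Turtle string."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def translation_triples(term_uri, lang, label, comment=None):
    """Return Turtle statements adding a label (and comment) in language `lang`."""
    lines = [f'<{term_uri}> rdfs:label "{escape_literal(label)}"@{lang} .']
    if comment is not None:
        lines.append(f'<{term_uri}> rdfs:comment "{escape_literal(comment)}"@{lang} .')
    return lines


def translation_document(translations, lang):
    """Build a small Turtle document from {term URI: (label, comment or None)}."""
    body = []
    for term_uri, (label, comment) in translations.items():
        body.extend(translation_triples(term_uri, lang, label, comment))
    return "\n".join([RDFS_PREFIX, ""] + body) + "\n"


if __name__ == "__main__":
    # Hypothetical French translations for two terms of an existing vocabulary.
    french = {
        "http://example.org/vocab#Person": ("Personne", "Un être humain."),
        "http://example.org/vocab#name": ("nom", None),
    }
    print(translation_document(french, "fr"))
```

The term URIs stay in English; only language-tagged literals are added, so existing data that uses the vocabulary remains valid.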

Checking the quality

Note: the presence of rdfs:label/rdfs:comment can be tested with SPARQL queries, with LOV providing the endpoint. Should someone from Mondeca include a checking system on their site?
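
One possible shape for such a test is sketched below: a small Python helper that builds a SPARQL 1.1 query listing the terms of a namespace that have no rdfs:label (or rdfs:comment) in a given language. The helper name and the example namespace are hypothetical, the endpoint address is deliberately left out, and selecting terms by URI prefix is an assumption that may need adapting to how a given endpoint stores vocabularies.

```python
import re

QUERY_TEMPLATE = """\
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?term WHERE {{
  ?term ?p ?o .
  FILTER(STRSTARTS(STR(?term), "{namespace}"))
  FILTER NOT EXISTS {{
    ?term rdfs:{predicate} ?text .
    FILTER(LANGMATCHES(LANG(?text), "{lang}"))
  }}
}}
ORDER BY ?term
"""


def missing_text_query(namespace, lang, predicate="label"):
    """Build a query for terms in `namespace` lacking rdfs:<predicate> in `lang`."""
    if predicate not in ("label", "comment"):
        raise ValueError("predicate must be 'label' or 'comment'")
    # Accept only simple language tags such as "fr" or "pt-BR".
    if not re.fullmatch(r"[A-Za-z]{1,8}(-[A-Za-z0-9]{1,8})*", lang):
        raise ValueError("unexpected language tag: " + repr(lang))
    # Reject characters that would break out of the quoted string in the query.
    if any(ch in namespace for ch in '"\\<> \n\r\t'):
        raise ValueError("unexpected character in namespace")
    return QUERY_TEMPLATE.format(namespace=namespace, predicate=predicate, lang=lang)


if __name__ == "__main__":
    # Hypothetical namespace; send the resulting text to a SPARQL endpoint.
    print(missing_text_query("http://example.org/vocab#", "fr"))
```

An empty result for a language means every term found in that namespace carries the text in that language; LANGMATCHES also accepts regional variants such as "fr-CA" when "fr" is requested.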

Bart: My 2 cents

I think we should try to address 3 issues:

  1. Make non-native English speakers aware that the majority of ontologies use English labels and comments, so anyone looking for domain-specific ontologies should consider searching with English terms.
  2. When new ontologies are created or existing ones are extended, make sure that the multilingual aspect is addressed. This is also a matter of governance: if you create an ontology, the process for accepting translations should be taken care of.
  3. It should be made clear that even if the documentation of an ontology is not available in your native language, this has no influence on the language of the statements you make with it. You can have perfect French data described with an ontology that has only English documentation.

I think it is very dangerous to put a multilingual quality stamp on ontologies. Even RDFS is only available in English; is that bad? Reuse and adoption are far more important, I think.

(Christophe) > Agree, we should point at the translation possibilities and encourage people to properly document the ontology even if it's only in English (add labels, and language tags on the labels).