Best practices - previous notes

From Best Practices for Multilingual Linked Open Data Community Group

Some annotations and background material for the topic's discussion

Naming and dereferencing

Topics

  • IRIs vs URIs
  • Opaque URIs vs descriptive URIs
  • Namespace selection
  • Language based content negotiation

Patterns for Naming

ID Name Description Example Arguments in favor Arguments against References
P1.1 Descriptive URIs Descriptive URIs http://example.org/Armenia
  1. Readability (but English/latin alphabet)
  2. Tool support
  3. The only way of describing the concept in ontologies that lack labels and comments
  1. Unreadable for non-Latin alphabet users
  2. Difficult to be descriptive enough in a URI in certain contexts (biomedical, financial, ...); descriptive names are sometimes cryptic
P1.2 Opaque URIs Opaque URIs http://example.org/#I23AX45
  1. Independence between content and language
  2. Changes in textual descriptions do not imply changes in URIs
  3. Suitable for automatic LD generation from existing resources
  1. Non human-readable
  2. Worse for developers
Examples: http://lemon-model.net/lexica/uby/wn/WN_Lexicon_0

Princeton WordNet, EuroWordNet, Agrovoc, FRBR models and ISBD in the bibliographic domain

P1.3 Full IRIs Full IRIs http://օրինակ.օրգ/#Հայաստան
  1. Readable (for one language)
  1. Security issues (spoofing)
  2. Unreadable for speakers of other languages
  3. Issues with right-to-left languages like Arabic or Hebrew
P1.4 Internationalized paths only Internationalized paths only http://example.org/#Հայաստան, http://example.org/Հայաստան
  1. Fewer security risks
  2. Path readable (for one language)
  1. Unreadable for speakers of other languages
  2. When the namespace does not end in / or #, it is difficult to see where the term starts (e.g. the namespace is http://w3.org/html and the full name is http://w3.org/htmldiv). Define prefixes that end with / or #.
Local DBpedias
P1.5a Include language in host name of URI Include language in URIs http://hy.example.org/#Հայաստան
  1. Practical reasons: the data can be divided into separate datasets, as in DBpedia
  1. Where do we put it? (beginning, end, ..)
  2. Dialects
  3. Actually, "es" in a DBpedia URI identifies NOT the language but the source
  4. May be technically challenging (requires DNS entry for each language)
  5. Confuses server/data distinction
Local DBpedias
P1.5b Include language in path of URI Include language in URIs http://example.com/en/Armenia

http://example.com/Armenia.en http://example.com/Armenia?lang=en

  1. Compatible with content negotiation
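The difference between full IRIs (P1.3) and internationalized paths only (P1.4) becomes concrete when an IRI has to be mapped to an ASCII-only URI for tools that do not accept IRIs. The following is a minimal sketch using only the Python standard library; the function name `iri_to_uri` is hypothetical, and Python's built-in `idna` codec follows the older IDNA 2003 rules, so this is an approximation rather than a full implementation of the mapping.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def iri_to_uri(iri: str) -> str:
    """Map an IRI to an ASCII-only URI (hypothetical helper, sketch only)."""
    parts = urlsplit(iri)
    # Host labels become Punycode ("xn--..."); the stdlib codec is IDNA 2003.
    host = parts.hostname.encode("idna").decode("ascii") if parts.hostname else ""
    if parts.port:
        host = f"{host}:{parts.port}"
    # Path, query and fragment are UTF-8 encoded and percent-escaped.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    fragment = quote(parts.fragment, safe="%")
    return urlunsplit((parts.scheme, host, path, query, fragment))


# P1.3: both the host and the fragment are internationalized.
full_iri = iri_to_uri("http://օրինակ.օրգ/#Հայաստան")
# P1.4: only the path is internationalized; the host stays readable.
path_only = iri_to_uri("http://example.org/Հայաստան")
print(full_iri)
print(path_only)
```

With P1.4 only the local name is escaped, whereas with P1.3 the host name is rewritten as well, which is where the spoofing concern arises.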


Related links:

See thread about "Fragment issues in ITS/HTML/XML mapping to NIF" at http://lists.w3.org/Archives/Public/public-bpmlod/2013Sep/0002.html

See interesting e-mail from Berners-Lee on this subject http://lists.w3.org/Archives/Public/public-lod/2011Apr/0282.html

QUESTIONS:

Which pattern for naming do we propose as preferred in multilingual scenarios?

Which other patterns are acceptable and under which conditions?

Patterns for Dereferencing

ID Name Description Example Arguments in favor Arguments against References
P1.6 No language negotiation Always return the same triples, without taking the HTTP Accept-Language header into account. No language negotiation You have all the information. In RDF, if you have the language tag, you do not need content negotiation
P1.7 Language content negotiation The server honours the language preferences of the user agent, presented in the Accept-Language header, and returns different data for each language preference. Language content negotiation See example here Saves bandwidth You can lose information, especially if there is multilingual content
P1.8 Language content redirection The server honours the language preferences of the user agent, presented in the Accept-Language header, and returns a 303 (See Other) redirect to a resource with triples in that language Example: if the URI to dereference is http://example.org and the Accept-Language header is es, the server returns 303 (See Other) to the URI http://example.org/?lang=es, which contains triples with Spanish content Maintains the difference between the generic representation of a resource in any language and the representation of that resource in a given language Not feasible for all the resources to have representations in different languages.
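The redirection pattern P1.8 can be sketched as a small decision function. This is an illustration under assumptions, not a reference implementation: the function names, the `?lang=` query parameter (taken from the example above) and the fallback to the generic URI are choices made for this sketch, and the header parsing is deliberately simplified.

```python
def parse_accept_language(header: str) -> list[str]:
    """Return the language ranges of an Accept-Language header, best first."""
    weighted = []
    for position, item in enumerate(header.split(",")):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        quality = 1.0  # a missing q parameter means q=1
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        if quality > 0:
            weighted.append((-quality, position, tag.strip().lower()))
    return [tag for _, _, tag in sorted(weighted)]


def redirect_for(uri: str, header: str, available: set[str]) -> tuple[int, str]:
    """Answer 303 with a language-specific URI, or 200 on the generic one."""
    for tag in parse_accept_language(header):
        primary = tag.split("-")[0]  # "es-ar" may be served by "es"
        for candidate in (tag, primary):
            if candidate in available:
                return 303, f"{uri}?lang={candidate}"
    return 200, uri


status, location = redirect_for(
    "http://example.org/", "es-AR,es;q=0.9,en;q=0.5", {"en", "es"}
)
print(status, location)
```

Returning the generic URI when no language matches keeps the resource dereferenceable even when, as noted above, not every resource has a representation in every language.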

Related Links

W3C Internationalization Activity's Best Practice on selecting language tags

Comments

In principle we see few arguments in favour of language content negotiation in the context of Linked Data (http://www.w3.org/2014/02/21-bpmlod-minutes.html)

Textual Information

Topics

  • Labels with language tag
  • Labels without language tag
  • Longer descriptions
  • Lexicalizations and linguistic information
  • Localization information, see related discussion on localization workflows at 7 March 2014 call

Patterns for Textual information

ID Name Description Example Arguments in favour Arguments against References
P2.1 Use rdfs:label for Everything Linked data datasets should provide labels with the property rdfs:label for all resources: individuals, concepts and properties, not just the main entities.
:juan rdfs:label "Juan" . 
:Professor rdfs:label "Professor"@en .
:position rdfs:label "Position"@en ;
          rdfs:label "Posición"@es .
  • rdfs:label is a well established property which is supported by most of the tools
  • There may be some tools that don't support rdfs:label
  • It may be difficult to associate an rdfs:label with some resources, especially automatically generated resources
Label Everything
P2.2 Multilingual labels In a multilingual setting, it is necessary to attach language tags to textual information, in order to identify the appropriate label for localized applications.
:juan :position "Professor"@en ; 
      :position "Catedrático"@es .
Multilingual labels are part of the RDF standard and well supported by semantic web tools. Querying a dataset with multilingual labels in SPARQL is slightly harder than querying one without language tags. For example, the following SPARQL query would return no results:
 SELECT * WHERE {
  ?x :position "Professor" .
 }

It is necessary to specify the language. So the SPARQL query that works is:

 SELECT * WHERE {
  ?x :position "Professor"@en .
 }

or if the language is unknown, it can be expressed as:

 SELECT * WHERE {
  ?x :position ?p .
  FILTER ( str(?p)="Professor" )
  }
Multilingual labels
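Why the first query returns nothing can be illustrated outside SPARQL with a toy model of RDF literals. This is a sketch only: the `Literal` class and the two helper functions are hypothetical, and they imitate term equality and `FILTER(str(?p) = ...)` rather than reproduce a real query engine.

```python
from typing import NamedTuple, Optional


class Literal(NamedTuple):
    lexical: str
    lang: Optional[str] = None  # None models a literal without a language tag


# Toy data mirroring the :position values of :juan in the example.
positions = [Literal("Professor", "en"), Literal("Catedrático", "es")]


def match_exact(literals, pattern):
    """Term equality: lexical form and language tag must both agree."""
    return [lit for lit in literals if lit == pattern]


def match_str(literals, text):
    """Imitates FILTER(str(?p) = text): only the lexical form is compared."""
    return [lit for lit in literals if lit.lexical == text]


no_tag = match_exact(positions, Literal("Professor"))          # plain literal
with_tag = match_exact(positions, Literal("Professor", "en"))  # tagged literal
any_lang = match_str(positions, "Professor")                   # str() comparison
print(no_tag, with_tag, any_lang)
```

The plain literal matches nothing because the language tag is part of the term, while the comparison on lexical forms matches regardless of language.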
P2.3 Labels without language tags Apart from language-tagged labels, one can also associate plain labels without a language tag.
:juan :position "Professor"@en ; 
      :position "Catedrático"@es ;
      :position  "Professor" .
This pattern can facilitate SPARQL queries when one does not know the language of the literals. In general, this is perceived as a bad practice. In which language should we write the label without language tag? Labels without language tags

Rules of thumb by Richard Cyganiak

P2.4 Annotate long descriptions Annotate long descriptions using new resources that can be represented with shorter labels or lexical entities.
:juan :job "Professor at the University of León"@en .

would turn into

:juan :job 
 "Professor at the University of León"@en ;
 :position :professor ;
 :workPlace :unileón .
:professor rdfs:label "Professor"@en .
:unileón rdfs:label "University of León"@en .
It facilitates their future translation to other languages. It is not always feasible to find the right resources.

Sometimes the long description can be replaced by the annotations.

divide longer descriptions
P2.5 Provide lexical information Provide lexical information in an external lexicon, for instance using lemon model
:unileón a lemon:LexicalEntry ;
  lemon:decomposition (
    [ lemon:element :University ]
    [ lemon:element :Of ]
    [ lemon:element :León ]
 );
  rdfs:label "University of León"@en .
:University a lemon:LexicalEntry ;
   lexinfo:partOfSpeech lexinfo:commonNoun ;
   rdfs:label "University"@en ;
   rdfs:label "Universidad"@es .
:Of a lemon:LexicalEntry ;
   lexinfo:partOfSpeech lexinfo:preposition ;
   rdfs:label "of"@en ;
   rdfs:label "de"@es .
:León a lemon:LexicalEntry ;
   lexinfo:partOfSpeech lexinfo:properNoun ;
   rdfs:label "León" .
Providing lexical metadata for a resource can help linked data applications to visualize and manage textual information. It can also add a complexity overhead to the dataset that may be undesired. Labels Provide lexical information.

Lexical annotations can also be done using NIF to reference parts of the strings.

P2.6 Structured literals Literals in the RDF model can also be structured values in XML or HTML. Using structured literals.
:unileón :desc 
   """<p>University of
    <span translate="no">León</span>,
    Spain.</p>"""^^rdf:XMLLiteral .
It is possible to offer longer descriptions by leveraging the internationalization practices that have already been proposed for those languages. Overusing structured literals can forfeit the advantages of semantic modelling in RDF. Structured literals

Related Links

W3C Internationalization Activity's Best Practice on selecting language tags

Comments

Whatever naming convention is used, it is important to provide descriptions or explanations of what the terms actually mean. In particular, it is a good idea to provide titles and if possible descriptions in multiple languages - ideally languages with different structures.

Linking

See http://oa.upm.es/8848/1/Multiling.pdf and http://www.weso.es/MLODPatterns/catalog.html

Patterns for Linking at the Conceptual level

ID Name Description Example Arguments in favour Arguments against References
P3.1 Cross-lingual identity links Use `owl:sameAs` to link resources expressed in different languages Suppose we have information about Armenia in Armenian, identified by http://hy.example.org#Հայաստան, while the URI http://en.example.org#Armenia contains information about Armenia in English. We can declare that both URIs refer to the same thing by asserting:

 <http://hy.example.org#Հայաստան>
  owl:sameAs
   <http://en.example.org#Armenia> .
owl:sameAs is a well-known property which is supported by several linked data applications. The semantics of owl:sameAs has some implications which may be undesirable. For example, the information about Armenia in the different languages may come from different sources and thus contain different data. Using owl:sameAs can then introduce inconsistencies.

Inter-language Identity links
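The inconsistency risk of owl:sameAs can be made concrete with a small sketch. It assumes a hypothetical property `ex:population` that is expected to have a single value, and the figures are invented purely for illustration; the merge step imitates what a reasoner does when it treats two URIs as the same individual.

```python
from collections import defaultdict

# Hypothetical descriptions from two language-specific sources (made-up values).
triples = [
    ("http://hy.example.org#Հայաստան", "ex:population", "3000000"),
    ("http://en.example.org#Armenia", "ex:population", "2900000"),
]
same_as = [("http://hy.example.org#Հայաստան", "http://en.example.org#Armenia")]


def merge_same_as(triples, same_as):
    """Merge the descriptions of all subjects declared identical."""
    canonical = {}

    def find(node):
        # Follow the chain of replacements to the representative URI.
        while canonical.get(node, node) != node:
            node = canonical[node]
        return node

    for left, right in same_as:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            canonical[root_right] = root_left

    merged = defaultdict(set)
    for subject, predicate, obj in triples:
        merged[(find(subject), predicate)].add(obj)
    return merged


merged = merge_same_as(triples, same_as)
# A single-valued property with several values signals a conflict.
conflicts = {key: values for key, values in merged.items() if len(values) > 1}
print(conflicts)
```

After the merge the single resource carries both population values, which is the kind of inconsistency described above.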
P3.2 Cross-lingual soft links Use a soft property to state that two resources are inter-language linked (e.g., rdfs:seeAlso, skos:closeMatch, skos:exactMatch). The example in pattern P3.1 can be expressed as:
 <http://hy.example.org#Հայաստան>
  rdfs:seeAlso
   <http://en.example.org#Armenia> .  
Soft links are weaker regarding semantic implications than an owl:sameAs link.

Using a custom property like dbo:wikiPageInterLanguageLink can provide more freedom, but such properties are usually not well recognized by automated software agents. Thus, the use of more common properties with similar semantics (e.g. rdfs:seeAlso, skos:related) should be considered.

Soft inter-language links
P3.3 Cross-lingual taxonomical relations For instance rdfs:subClassOf, skos:broader, etc. thus considering not only identity links as in P3.1 but any other possible taxonomical relationship [This pattern somehow subsumes P3.1]
   ontology1:Person rdfs:label "person"@en .
   ontology2:Hombre rdfs:label "hombre"@es .
   ontology2:Hombre rdfs:subClassOf ontology1:Person .
P3.4 Domain dependent relations That is, using properties coming from other ontologies. E.g., foaf:currentProject, dbpedia-owl:capital, mo:artist, etc.
   ontology1:Москва rdfs:label "Москва"@ru .
   ontology2:Russia rdfs:label "Russia"@en .
   ontology2:Russia  dbpedia-owl:capital ontology1:Москва .
P3.5 Linkage by using common background knowledge In case related ontology entities are linked to a common external ontology, dataset (e.g., BabelNet, DBpedia) or lexicon, this background knowledge can be used as pivot for inferring a relation between such ontology entities.
 :bench-en a lemon:LexicalEntry ; 
   lemon:form [lemon:writtenRep "bench"@en] . 
:bench-en-sense_1 a lemon:LexicalSense ;    
   lemon:isSenseOf :bench-en ; 
   lemon:reference ontology1:bench . 
:bench-en-sense_2 a lemon:LexicalSense ;     
   lemon:isSenseOf :bench-en ; 
   lemon:reference ontology2:banco.

Patterns for Linking at the Linguistic level

Here the links would not be established between the concepts (or instances) themselves but between their associated linguistic information. This sort of mapping can be very useful when keeping the conceptual and linguistic information decoupled is a major requirement. In order to allow two ontologies to interoperate at the linguistic level, mappings would be established between the linguistic descriptions of their concepts, which are not necessarily exact equivalents but the closest correspondences between culture-specific concepts (Gracia et al. 2012).

ID Name Description Example Arguments in favour Arguments against References
P3.6 Implicit translations Let us suppose that entities in the ontology point to lexical entries in different monolingual lexicons. If lexical entries in different lexicons share the same (or equivalent) ontological referent, a translation can be inferred between them. In the following example, the entries for "bench"@en and "banco"@es belong to different lexicons but point to the same ontology entity, ontology1:bench.
:lexiconEN lemon:term :bench-en .
:bench-en a lemon:LexicalEntry ; 
   lemon:form [lemon:writtenRep "bench"@en] . 
:lexiconES lemon:term :banco-es .
:banco-es a lemon:LexicalEntry ; 
   lemon:form [lemon:writtenRep "banco"@es] .

:bench-en-sense a lemon:LexicalSense ;    
   lemon:isSenseOf :bench-en ; 
   lemon:reference ontology1:bench . 
:banco-es-sense a lemon:LexicalSense ;     
   lemon:isSenseOf :banco-es ; 
   lemon:reference ontology1:bench .
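The inference behind implicit translations can be sketched as a grouping step: entries whose senses share a referent, but whose languages differ, are candidate translations. The data structures and the function name below are hypothetical and only loosely mirror the English and Spanish lexicon entries of the example; a real implementation would read them from the lemon data.

```python
from collections import defaultdict
from itertools import combinations

# Lexical entries: identifier -> (written representation, language).
entries = {":bench-en": ("bench", "en"), ":banco-es": ("banco", "es")}
# Senses: (lexical entry, ontology reference).
senses = [(":bench-en", "ontology1:bench"), (":banco-es", "ontology1:bench")]


def implicit_translations(entries, senses):
    """Pair entries of different languages that share an ontology referent."""
    by_reference = defaultdict(list)
    for entry, reference in senses:
        by_reference[reference].append(entry)

    pairs = []
    for reference, group in by_reference.items():
        for first, second in combinations(group, 2):
            if entries[first][1] != entries[second][1]:
                pairs.append((entries[first], entries[second], reference))
    return pairs


translations = implicit_translations(entries, senses)
print(translations)
```

Because nothing is stated explicitly, the quality of the inferred pairs depends entirely on how precisely the shared referent is defined.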

P3.7 Linkage by using explicit translations When the lexical information of the ontology is represented in an external lexicon, explicit translations can be declared among their senses. See http://purl.org/net/translation In the following example "bench"@en and "banco"@es are represented using lemon and their senses linked through a Translation object.
:bench-en-sense a lemon:LexicalSense ;
       lemon:isSenseOf :bench-en ;
       lemon:reference ontology1:bench .
:bench-en a lemon:LexicalEntry ;
       lemon:form [lemon:writtenRep "bench"@en] .
:banco-es-sense a lemon:LexicalSense ;
       lemon:isSenseOf :banco-es ;
       lemon:reference ontology2:banco . 
:banco-es a lemon:LexicalEntry ;
       lemon:form [lemon:writtenRep "banco"@es] .
:bench_banco-trans a tr:Translation ;
       tr:translationSource :bench-en-sense ;
       tr:translationTarget :banco-es-sense .
This pattern allows translations to be represented explicitly and, as the relation is reified, additional information can be attached to it (provenance, confidence, etc.). On the other hand, it adds complexity. A dataset using this representation mechanism can be found at http://linguistic.linkeddata.es/apertium/

Ontologies and vocabularies

Patterns for Vocabulary Reuse

ID Name Description Example Arguments in favour Arguments against References
P4.1 Monolingual vocabularies Define vocabularies with terms defined in a single language, usually English Many popular vocabularies and ontologies for the semantic web (FOAF, Dublin Core, OWL, RDF Schema, etc.) are monolingual in English, both for labels and comments. Easy to control the vocabulary evolution and avoid the appearance of bad translations or ambiguities between language versions Translation is needed if the ontology is used in a multilingual setup. Monolingual vocabularies
P4.2 Multilingual vocabularies Define vocabularies and ontologies where the concepts contain translations for several languages.
:position  a  owl:DatatypeProperty ;
 rdfs:domain  :UniversityStaff ;
 rdfs:label  "Position"@en ;
 rdfs:label  "Puesto"@es .
:UniversityStaff  a  owl:Class ;
 rdfs:label  "University staff"@en ;
 rdfs:label  "Trabajador universitario"@es .

Some multilingual ontologies are Agrovoc and Eurovoc.

  • Good to support multilingual applications.
  • Ontologies easier to understand by more people (users, developers, ...).
  • Ontology harder to maintain.
  • Some concepts are difficult to translate, and ambiguities may appear in the translations. For example, the label Professor may be translated as Profesor in Spanish. However, the meaning of those concepts is different (in Spanish, Catedrático is usually the closer equivalent).
Multilingual vocabularies
P4.3 Localize existing vocabularies Enrich existing vocabularies with local translations, externally to the original vocabulary. A linked data application in Spanish may use the Dublin Core vocabulary to indicate the contributors of a given work. The end-user should see the labels in their own language. To that end, one can add a localized label to dc:contributor as:
dc:contributor  rdfs:label  "Colaborador"@es . 
  • A multilingual linked data application could transparently select the tagged literals in its preferred language.
  • Easier to maintain than the multilingual ontology solution
  • The data is not centralised in a single ontology, thus being more difficult to discover
  • Polluting well known vocabularies with localized literals may be controversial and should be handled with caution.
Localize existing vocabularies
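How an application might select a tagged literal in its preferred language can be sketched as follows. The function `pick_label`, its fallback order (exact tag, then primary language, then an untagged label) and the sample labels are assumptions made for this illustration, not behaviour prescribed by any vocabulary.

```python
def pick_label(labels, preferred, default=None):
    """Choose a label from (text, lang) pairs; lang is None when untagged."""
    by_lang = {}
    for text, lang in labels:
        # Keep the first label seen for each (lower-cased) language tag.
        by_lang.setdefault(lang.lower() if lang else None, text)

    for wanted in preferred:
        wanted = wanted.lower()
        if wanted in by_lang:
            return by_lang[wanted]
        primary = wanted.split("-")[0]  # "es-ES" falls back to "es"
        if primary in by_lang:
            return by_lang[primary]
    # Last resort: an untagged label, if the dataset provides one.
    return by_lang.get(None, default)


# Illustrative labels for dc:contributor: the original plus a localized one.
contributor_labels = [("Contributor", "en"), ("Colaborador", "es")]

spanish = pick_label(contributor_labels, ["es-ES", "en"])
missing = pick_label(contributor_labels, ["fr"])
print(spanish, missing)
```

The localized label is found without touching the original vocabulary, which is the main attraction of this pattern.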
P4.4 Create new localized vocabularies This pattern is about creating new localized properties and classes and relating them to existing ones using the owl:sameAs, owl:equivalentProperty or owl:equivalentClass properties.
dc:contributor 
   owl:equivalentProperty :colaborador .
:colaborador   
   rdfs:label "Colaborador"@es . 
This pattern gives freedom to vocabulary creators to tailor the vocabulary according to their exact needs. However, it can be more difficult for both humans and software agents to recognize and consume these new properties and classes. Create new localized vocabularies
P4.5 Use Lemon to enrich the multilingual semantics of existing vocabularies This pattern is about using well established ontologies, such as Lemon, to add multilingual semantics and translations to existing vocabulary terms.
:colaborador a lemon:LexicalEntry ; 
             rdfs:label "Colaborador"@es ;
             lemon:language "es" ;
             lemon:sense :colaborador_sense . 
:colaborador_sense lemon:reference dc:contributor .
Users can provide translations to vocabularies using well established multilingual standards (Lemon) which also add finer-grained semantics to the terms. However, agents not familiar with Lemon may find it harder to interpret the translations. lemon - The Lexicon Model for Ontologies

Quality