Lexicon Model for Ontologies: Community Report, 10 May 2016

Final Community Group Report

Editors:
Philipp Cimiano (Cognitive Interaction Technology Excellence Center, Bielefeld University)
John P. McCrae (Insight Centre for Data Analytics, National University of Ireland, Galway)
Paul Buitelaar (Insight Centre for Data Analytics, National University of Ireland, Galway)

Abstract

This document describes the lexicon model for ontologies (lemon) as a main outcome of the work of the Ontology Lexicon (Ontolex) community group.

Ontologies are an important component of the Semantic Web but current ontology languages such as OWL and RDF(S) lack support for enriching ontologies with linguistic information, in particular with information concerning how ontology entities, i.e. properties, classes, individuals, etc. can be realized in natural language. The model described in this document aims to close this gap by providing a vocabulary that allows ontologies to be enriched with information about how the vocabulary elements described in them are realized linguistically, in particular in natural languages.

OWL and RDF(S) rely on the rdfs:label property to capture the relation between a vocabulary element and its (preferred) lexicalization in a given language. This lexicalization provides a lexical anchor that makes the class, property, individual etc. understandable to a human user. The use of a simple label for linguistic grounding as available in OWL and RDF(S) is far from being able to capture the necessary linguistic and lexical information that Natural Language Processing (NLP) applications working with a particular ontology need.

The aim of lemon is to provide rich linguistic grounding for ontologies. Rich linguistic grounding includes the representation of morphological and syntactic properties of lexical entries as well as the syntax-semantics interface, i.e. the meaning of these lexical entries with respect to an ontology or vocabulary.

Status of This Document

This specification was published by the Ontology-Lexicon Community Group. It is not a W3C Standard nor is it on the W3C Standards Track. Please note that under the W3C Community Final Specification Agreement (FSA) other conditions apply. Learn more about W3C Community and Business Groups.

This specification was published by the OntoLex Community Group. It is not a W3C Standard nor is it on the W3C Standards Track. Please note that under the W3C Community Contributor License Agreement (CLA) there is a limited opt-out and other conditions apply. Learn more about W3C Community and Business Groups.

This document is the first official report of the OntoLex community group. It does not represent the view of single individuals but reflects the consensus and agreement reached as part of the regular group discussions. The report should be regarded as the official specification of lemon.

If you wish to make comments regarding this document, please send them to public-ontolex@w3.org.

1. Overview

This document describes the specification of the lexicon model for ontologies (lemon) as resulting from the work of the W3C Ontology Lexicon Community Group.

The aim of the lexicon model for ontologies (lemon) is to provide rich linguistic grounding for ontologies. Rich linguistic grounding includes the representation of morphological and syntactic properties of lexical entries as well as the syntax-semantics interface, i.e. the meaning of these lexical entries with respect to an ontology or vocabulary.

This document is structured into nine sections, where the first five correspond to the main modules of lemon. Depending on their needs and requirements, applications will use one or more of the modules mentioned below, with the use of the OntoLex module being the minimal choice.

The last three sections do not describe the formal modelling but clarify

2. Introduction

Ontologies are an important component of the Semantic Web but current standards such as OWL only support the addition of a simple label to entities in the ontology. It is not currently possible to add inflected forms, different genders, usage notes or create a full lexical resource such as Princeton WordNet. The model described in this document aims to close this gap by providing a vocabulary that allows ontologies to be enriched with information about how the vocabulary elements described in them are realized linguistically, in particular in natural languages, in order to render ontologies suitable for supporting meaningful interaction with and manipulation of them by human users and to allow NLP tools to work with ontologies.

OWL and RDF(S) rely on a property rdfs:label to capture the relation between a vocabulary element and its (preferred) lexicalization in a given language. This lexicalization provides a lexical anchor that makes the concept, property, individual etc. understandable to a human user. The use of a simple label for linguistic grounding as available in OWL and RDF(S) is far from being able to capture the necessary linguistic and lexical information that Natural Language Processing (NLP) applications working with a particular ontology need. Such NLP applications are for example:

2.1 Purpose of the model

The purpose of the model is to support linguistic grounding of a given ontology by adding information about how the elements in the vocabulary of the ontology (individuals, classes, properties) are lexicalized in a given natural language.

The model follows the principle of semantics by reference [1] in the sense that the semantics of a lexical entry is expressed by reference to an individual, class or property defined in the ontology. In some cases, the lexicon itself can add named concepts which are not made explicit in the ontology.

The model described here is open in the sense that it provides a core vocabulary to add information about the linguistic realization of ontology and vocabulary elements. This vocabulary can and should be extended as required by a particular application. In particular, the model abstracts from specific linguistic theories and category systems used to describe the linguistic properties of lexical entries and their syntactic behavior, encouraging reuse of existing data category systems or linguistic ontologies. The model is thus agnostic with respect to the linguistic theory and category systems. We make explicit in this document at which points we refer to an external repository of data categories or introduce novel sub-properties of properties defined in lemon.

The model as presented here is inspired by many other models, in particular the Lexical Markup Framework (LMF), the LexInfo model, the LIR model, the Linguistic Meta Model (LMM), the semiotics.owl ontology design pattern, and the Senso Comune core model.

It is important to also mention what is not the purpose of the model:

2.2 Namespaces

The model is available with the following sub-namespaces for the various modules of the overall model:

All modules may be imported from the following URL:

2.3 Conventions in this document

Throughout this document, we will use Turtle RDF Syntax to provide examples showing the use of the model. Axioms will be paraphrased in natural language. We will assume the following namespaces throughout all the examples in this document:

@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .
@prefix synsem: <http://www.w3.org/ns/lemon/synsem#> .
@prefix decomp: <http://www.w3.org/ns/lemon/decomp#> .
@prefix vartrans: <http://www.w3.org/ns/lemon/vartrans#> .
@prefix lime: <http://www.w3.org/ns/lemon/lime#> .

As we frequently also refer to other models, we will also assume the following namespaces in all examples:

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix owl: <http://www.w3.org/2002/07/owl#>.
@prefix xsd: <http://www.w3.org/2001/XMLSchema#>.
@prefix skos: <http://www.w3.org/2004/02/skos#>.
@prefix dbr: <http://dbpedia.org/resource/>.
@prefix dbo: <http://dbpedia.org/ontology/>.
@prefix void: <http://rdfs.org/ns/void#>.
@prefix lexinfo: <http://www.lexinfo.net/ontology/2.0/lexinfo#>.
@prefix semiotics: <http://www.ontologydesignpatterns.org/cp/owl/semiotics.owl#>.
@prefix oils: <http://lemon-model.net/oils#>.
@prefix dct: <http://purl.org/dc/terms/>.
@prefix provo: <http://www.w3.org/ns/prov#>.

Furthermore, we require that instances of the model adhere to the RDF 1.1 specification and follow the appropriate guidelines. In particular, we require that language tags adhere to Best Common Practice 47, where tags are made up of a language code (based on ISO 639 codes part 1, 2, 3 or 5), optionally followed by a hyphen and an ISO 3166-1 country code. Language tags may also contain further subtags expressing e.g. the region, script or further variants.
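As an illustration of a script subtag, the following minimal sketch assumes a hypothetical entry for the Serbian name of Belgrade, whose Cyrillic and Latin spellings are recorded as two written representations of a single form (following the rule for orthographic variants given in the section on forms). The `:` namespace and the resource names are placeholders; the prefix declarations are repeated so that the snippet parses on its own.

```turtle
@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .
@prefix : <http://www.example.org/lexicon#> .   # hypothetical namespace

# Language code "sr" followed by a BCP 47 script subtag ("Cyrl" or "Latn")
:lex_beograd a ontolex:LexicalEntry ;
  ontolex:canonicalForm :form_beograd .

:form_beograd a ontolex:Form ;
  ontolex:writtenRep "Београд"@sr-Cyrl , "Beograd"@sr-Latn .
```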

In all examples in this document, the above namespaces are introduced using an appropriate @prefix statement. Prefixes are omitted from class and object property definitions if the referenced ontology element is defined in the same module. For cross-module and external references, the prefix is made explicit.

In many examples we will use the LexInfo ontology to describe grammatical categories, although this is not required for using this model. The LexInfo model and guidelines for constructing and extending linguistic category schemes are provided in the section on linguistic description.

3. Core

The following diagram depicts the core model (ontolex). Boxes represent classes of the model. Arrows with filled heads represent object properties, while arrows with empty heads represent subclass relations. In arrows labeled 'X/Y' (e.g. sense/isSenseOf), X (sense) is the name of the object property and Y (isSenseOf) the name of the inverse property.

[Figure 1: Diagram of the lemon core model (ontolex), Lemon_OntoLex_Core.png]

3.1 Lexical Entries

The main class of the core of the lexicon ontology model is the class Lexical Entry. A lexical entry is defined as follows:

Lexical Entry (Class)

URI: http://www.w3.org/ns/lemon/ontolex#LexicalEntry

A lexical entry represents a unit of analysis of the lexicon that consists of a set of forms that are grammatically related and a set of base meanings that are associated with all of these forms. Thus, a lexical entry is a word, multiword expression or affix with a single part-of-speech, morphological pattern, etymology and set of senses.

SubClassOf: lexicalForm min 1 Form, canonicalForm max 1 Form, semiotics:Expression

A Lexical Entry thus needs to be associated with at least one form, and has at most one canonical form (see below).
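A minimal sketch of an entry that meets both constraints, assuming a hypothetical entry for "dog" and a placeholder `:` namespace. Only a canonical form is asserted; because canonicalForm is a sub-property of lexicalForm (see below), this is already enough to satisfy the minimum cardinality on lexicalForm.

```turtle
@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .
@prefix : <http://www.example.org/lexicon#> .   # hypothetical namespace

# "lexicalForm min 1 Form": canonicalForm is a sub-property of lexicalForm,
# so a reasoner infers  :lex_dog ontolex:lexicalForm :form_dog .
# "canonicalForm max 1 Form": only one canonical form is asserted.
:lex_dog a ontolex:LexicalEntry ;
  ontolex:canonicalForm :form_dog .

:form_dog a ontolex:Form ;
  ontolex:writtenRep "dog"@en .
```

Note that OWL does not make the unique name assumption: asserting a second, differently named canonical form would not by itself make the data inconsistent, but would lead a reasoner to infer that the two form resources are the same individual.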

Lexical entries are further specialized into words, affixes (e.g., suffix, prefix, infix or circumfix) and multiword expressions.

Word (Class)

URI: http://www.w3.org/ns/lemon/ontolex#Word

A word is a lexical entry that consists of a single token.

SubClassOf: LexicalEntry

Multiword Expression (Class)

URI: http://www.w3.org/ns/lemon/ontolex#MultiwordExpression

A multiword expression is a lexical entry that consists of two or more words.

SubClassOf: LexicalEntry

Affix (Class)

URI: http://www.w3.org/ns/lemon/ontolex#Affix

An affix is a lexical entry that represents a morpheme (suffix, prefix, infix, circumfix) that is attached to a word stem to form a new word.

SubClassOf: LexicalEntry

The following Turtle code gives examples of lexical entries for each of these subclasses, corresponding to the word cat, the multiword expression minimum finance lease payments and the affix anti:

:cat a ontolex:Word .

:minimum_finance_lease_payments a ontolex:MultiwordExpression .

:anti- a ontolex:Affix .

3.2 Forms

A lexical entry can be realized in different ways from a grammatical point of view. These different grammatical realizations are represented as different forms of the lexical entry. A form is defined as follows:

Form (Class)

URI: http://www.w3.org/ns/lemon/ontolex#Form

A form represents one grammatical realization of a lexical entry.

SubClassOf: writtenRep min 1 rdf:langString

A lexical entry can be associated with one of its forms by means of the lexicalForm property, although it is preferred to use one of its two subproperties, canonical form and other form, defined below.

Lexical Form (Object Property)

URI: http://www.w3.org/ns/lemon/ontolex#lexicalForm

The lexical form property relates a lexical entry to one grammatical form variant of the lexical entry.

Domain: LexicalEntry

Range: Form

Each form thus has one or more written representations, defined as follows:

Written Representation (Datatype Property)

URI: http://www.w3.org/ns/lemon/ontolex#writtenRep

The written representation property indicates the written representation of a form.

Domain: Form

Range: rdf:langString

SubPropertyOf: representation

A simple example of a lexical entry with two different forms corresponding to two different grammatical realizations (as singular and plural noun, respectively) is given below:

:lex_child a ontolex:LexicalEntry ;                                                
  ontolex:lexicalForm :form_child_singular, :form_child_plural .              
                                                                                
:form_child_singular a ontolex:Form ;                                          
  ontolex:writtenRep "child"@en .                                               
                                                                                
:form_child_plural a ontolex:Form ;                                            
  ontolex:writtenRep "children"@en .

Different forms are used to express different morphological forms of the entry. They should not be used to represent orthographic variants, which should be represented as different representations of the same form. For example, for the lexical entry color, we would have two different representations of the same form, one for the British English written representation colour and one for the American English written representation color. Both representations have the same pronunciation and the same meaning, so they are two different lexicographic variants of the same lexical entry:

:lex_color a ontolex:LexicalEntry;
     ontolex:lexicalForm :form_color.

:form_color a ontolex:Form;
     ontolex:writtenRep "colour"@en-GB, "color"@en-US.
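To contrast the two kinds of variation, the following sketch extends the color entry with its plural. It uses the canonicalForm and otherForm properties defined later in this section, and assumes the LexInfo values lexinfo:singular and lexinfo:plural; the `:` namespace is a placeholder. The singular/plural distinction yields two forms, while the British/American spelling difference stays within each form.

```turtle
@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .
@prefix lexinfo: <http://www.lexinfo.net/ontology/2.0/lexinfo#> .
@prefix : <http://www.example.org/lexicon#> .   # hypothetical namespace

:lex_color a ontolex:LexicalEntry ;
  ontolex:canonicalForm :form_color ;
  ontolex:otherForm :form_colors .

# Morphological variation (singular vs. plural): two separate forms
:form_color a ontolex:Form ;
  lexinfo:number lexinfo:singular ;
  # Orthographic variation (GB vs. US spelling): two representations of one form
  ontolex:writtenRep "colour"@en-GB , "color"@en-US .

:form_colors a ontolex:Form ;
  lexinfo:number lexinfo:plural ;
  ontolex:writtenRep "colours"@en-GB , "colors"@en-US .
```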

A form may also have a phonetic representation, indicating the pronunciation of the word.

Phonetic Representation (Datatype Property)

URI: http://www.w3.org/ns/lemon/ontolex#phoneticRep

The phonetic representation property indicates one phonetic representation of the pronunciation of the form using a scheme such as the International Phonetic Alphabet (IPA).

Domain: Form

Range: rdf:langString

SubPropertyOf: representation

The following example shows how we can represent two different pronunciations for one form of a lexical entry using the example of "privacy" (the phonetic code is based on IPA):

:lex_privacy a ontolex:LexicalEntry;
     ontolex:lexicalForm :form_privacy.

:form_privacy a ontolex:Form;
     ontolex:writtenRep "privacy"@en;
     ontolex:phoneticRep "pɹɪv.si"@en-US-fonipa;
     ontolex:phoneticRep "pɹaɪ.və.si"@en-GB-fonipa.

Phonetic representation and written representation are both considered to be sub-properties of a more general property representation, for which users may define extra sub-properties as required.

Representation (Datatype Property)

URI: http://www.w3.org/ns/lemon/ontolex#representation

The representation property indicates a string by which the form is represented according to some scheme.

Domain: Form

Range: rdf:langString
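As a sketch of such a user-defined extension, the following assumes a hypothetical property :hyphenatedRep for recording hyphenation points. It is not part of lemon; it is declared as a sub-property of ontolex:representation, so every value it carries is also a representation of the form.

```turtle
@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix : <http://www.example.org/lexicon#> .   # hypothetical namespace

# Hypothetical representation scheme: written form with hyphenation points
:hyphenatedRep a owl:DatatypeProperty ;
  rdfs:subPropertyOf ontolex:representation ;
  rdfs:domain ontolex:Form ;
  rdfs:range rdf:langString .

:form_privacy a ontolex:Form ;
  ontolex:writtenRep "privacy"@en ;
  :hyphenatedRep "pri-va-cy"@en .
```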

A lexical entry has a canonical form, which is the form that primarily identifies this entry and may be used as an index term in the lexicon. The canonical form for single words is typically the lemma of that word and is determined by lexicographic conventions for that language. In the case of verbs, the lemma is typically the infinitive form or, alternatively, the present tense of the verb (note that if an external particle is used to indicate the infinitive, as in English "to play", this particle should be omitted). For nouns it is the singular form, while for adjectives it is the positive (i.e., non-negative, non-graded) form. For multiword entries it is assumed that the same principles of lemmatization are applied to the head word.

The property canonical form has a LexicalEntry as domain and a Form as range. It is a subproperty of the property lexicalForm. The canonical form has to be unique, so that the property canonical form is declared to be functional:

Canonical Form (Object Property)

URI: http://www.w3.org/ns/lemon/ontolex#canonicalForm

The canonical form property relates a lexical entry to its canonical or dictionary form. This usually indicates the "lemma" form of a lexical entry.

Domain: LexicalEntry

Range: Form

Characteristics: Functional

SubPropertyOf: lexicalForm

It is recommended to use the rdfs:label property to indicate the canonical form in addition to the property canonicalForm to ensure compatibility with RDFS-based systems that expect an RDFS label. The lexical entries for the noun "cat", the verb "marry" and the adjective "high" would look as follows (in Turtle syntax):

:lex_cat a ontolex:LexicalEntry, ontolex:Word;
     ontolex:canonicalForm :form_cat;
     rdfs:label "cat"@en .

:form_cat a ontolex:Form;
     ontolex:writtenRep "cat"@en .

:lex_marry a ontolex:LexicalEntry, ontolex:Word;
     ontolex:canonicalForm :form_marry;
     rdfs:label "marry"@en .

:form_marry a ontolex:Form;
     ontolex:writtenRep "marry"@en .

:lex_high a ontolex:LexicalEntry, ontolex:Word;
     ontolex:canonicalForm :form_high;
     rdfs:label "high"@en .

:form_high a ontolex:Form; 
    ontolex:writtenRep "high"@en .

Of course, lexical entries need not correspond to one word only; they can correspond to a multiword term, as the following example for the lexical entry "intangible assets" shows:

:lex_intangible_assets a ontolex:LexicalEntry, ontolex:MultiwordExpression;
     ontolex:canonicalForm :form_intangible_assets;
     rdfs:label "intangible assets"@en .

:form_intangible_assets a ontolex:Form;
     ontolex:writtenRep "intangible assets"@en .

Multiword expressions and their abbreviated forms are assumed to be distinct lexical entries, as there may be distinct lexical and pragmatic properties associated with the full and the abbreviated form of the term. Properties from other vocabularies such as LexInfo may be used to link the two entries and describe the type of abbreviation:

:nasa a ontolex:LexicalEntry, lexinfo:Acronym ;
  ontolex:canonicalForm :form_nasa ;
  lexinfo:abbreviationFor :national_aeronautics_and_space_administration;
  rdfs:label "NASA"@en .

:form_nasa a ontolex:Form ;
  ontolex:writtenRep "NASA"@en .

:national_aeronautics_and_space_administration a ontolex:LexicalEntry, ontolex:MultiwordExpression ;
  ontolex:canonicalForm :form_national_aeronautics_and_space_administration ;
  lexinfo:abbreviationFor :nasa ;
  rdfs:label "National Aeronautics and Space Administration"@en .

:form_national_aeronautics_and_space_administration a ontolex:Form ;
  ontolex:writtenRep "National Aeronautics and Space Administration"@en .

It is also possible to indicate non-canonical forms of lexical entries, which we call other forms:

Other Form (Object Property)

URI: http://www.w3.org/ns/lemon/ontolex#otherForm

The other form property relates a lexical entry to a non-preferred ("non-lemma") form that realizes the given lexical entry.

Domain: LexicalEntry

Range: Form

SubPropertyOf: lexicalForm

For example, we may specify non-canonical forms of the verb (to) marry as follows:

:lex_marry a ontolex:LexicalEntry ;
  ontolex:canonicalForm :form_marry ;
  ontolex:otherForm :form_marries .

:form_marry a ontolex:Form;
     ontolex:writtenRep "marry"@en .

:form_marries a ontolex:Form;
     ontolex:writtenRep "marries"@en .
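The non-canonical forms can additionally be annotated with grammatical features. The following sketch assumes the LexInfo properties lexinfo:person, lexinfo:number and lexinfo:tense with the values shown; any other category scheme could be used instead, and the `:` namespace is a placeholder.

```turtle
@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .
@prefix lexinfo: <http://www.lexinfo.net/ontology/2.0/lexinfo#> .
@prefix : <http://www.example.org/lexicon#> .   # hypothetical namespace

:lex_marry a ontolex:LexicalEntry ;
  ontolex:canonicalForm :form_marry ;
  ontolex:otherForm :form_marries , :form_married .

:form_marry a ontolex:Form ;
  ontolex:writtenRep "marry"@en .

# Third person singular present tense
:form_marries a ontolex:Form ;
  ontolex:writtenRep "marries"@en ;
  lexinfo:person lexinfo:thirdPerson ;
  lexinfo:number lexinfo:singular ;
  lexinfo:tense lexinfo:present .

# Past tense
:form_married a ontolex:Form ;
  ontolex:writtenRep "married"@en ;
  lexinfo:tense lexinfo:past .
```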

The morphological class (i.e., declension, conjugation or similar) may be specified with the morphological pattern property to avoid having to list all regular forms of a word. The implementation of these patterns is not specified by this document (but should be provided by some suitable vocabulary such as LIAM).

Morphological Pattern (Object Property)

URI: http://www.w3.org/ns/lemon/ontolex#morphologicalPattern

The morphological pattern property indicates the morphological class of a word.

Domain: LexicalEntry

The following example shows how to indicate the conjugation for the Latin words amare and videre.

:amare ontolex:morphologicalPattern :latin_first_conjugation ;
  ontolex:canonicalForm :amare_form .

:amare_form ontolex:writtenRep "amare"@la .

:videre ontolex:morphologicalPattern :latin_second_conjugation ;
  ontolex:canonicalForm :videre_form .

:videre_form ontolex:writtenRep "videre"@la .

3.3 Semantics

The model supports the specification of the meaning of lexical entries with respect to a given ontology. The lexicon model for ontologies follows the paradigm of semantics by reference in the sense that the meaning of a lexical entry is specified by pointing to the ontological concept that captures or represents its meaning.

The property denotes is defined as follows:

Denotes (Object Property)

URI: http://www.w3.org/ns/lemon/ontolex#denotes

The denotes property relates a lexical entry to a predicate in a given ontology that represents its meaning and has some denotational or model-theoretic semantics.

Domain: LexicalEntry

Range: rdfs:Resource

SubPropertyOf: semiotics:denotes

InverseOf: isDenotedBy

PropertyChain: sense o reference

For the lexical entries cat and marriage, the meaning could be expressed by pointing to the corresponding DBpedia resources:

:lex_cat a ontolex:LexicalEntry;
   ontolex:canonicalForm :form_cat;
   ontolex:denotes <http://dbpedia.org/resource/Cat>.

:form_cat a ontolex:Form;
   ontolex:writtenRep "cat"@en.

:lex_marriage a ontolex:LexicalEntry;
   ontolex:canonicalForm :form_marriage;
   ontolex:denotes <http://dbpedia.org/resource/Marriage>.

:form_marriage a ontolex:Form;
   ontolex:writtenRep "marriage"@en .

A word may be ambiguous with respect to the meanings it denotes. For example, the word 'troll' can refer both to a mythical creature and to someone who makes inflammatory posts on the internet. These two meanings can be captured as follows:

:troll a ontolex:LexicalEntry ;
  ontolex:denotes <http://dbpedia.org/resource/Troll> ;
  ontolex:denotes <http://dbpedia.org/resource/Internet_troll> .

Two terms are treated as different lexical entries if they differ in part-of-speech, gender, inflected forms or etymology. For example, the following words with the lemma 'bank' are all considered distinct:

:bank1_en a ontolex:LexicalEntry ;
  dct:language <http://id.loc.gov/vocabulary/iso639-2/eng>, <http://lexvo.org/id/iso639-1/en> ;
  lexinfo:partOfSpeech lexinfo:noun ;
  lexinfo:etymologicalRoot :banque_frm ;
  ontolex:denotes <http://dbpedia.org/resource/Bank> .

:bank2_en a ontolex:LexicalEntry ;
  dct:language <http://id.loc.gov/vocabulary/iso639-2/eng>, <http://lexvo.org/id/iso639-1/en> ;
  lexinfo:partOfSpeech lexinfo:noun ;
  lexinfo:etymologicalRoot :hobanca_ang ;
  ontolex:denotes <http://dbpedia.org/resource/Bank_(geographic)> .

:bank3_en a ontolex:LexicalEntry ;
  dct:language <http://id.loc.gov/vocabulary/iso639-2/eng>, <http://lexvo.org/id/iso639-1/en> ;
  lexinfo:partOfSpeech lexinfo:verb ;
  lexinfo:etymologicalRoot :hobanca_ang ;
  ontolex:denotes <http://dbpedia.org/resource/Banked_turn> .

:bank1_de a ontolex:LexicalEntry ;
  dct:language <http://id.loc.gov/vocabulary/iso639-2/ger>, <http://lexvo.org/id/iso639-1/de> ;
  lexinfo:partOfSpeech lexinfo:noun ;
  lexinfo:gender lexinfo:feminine ;
  ontolex:denotes <http://dbpedia.org/resource/Bank> ;
  ontolex:otherForm :banken .

:banken ontolex:writtenRep "Banken"@de ;
  lexinfo:number lexinfo:plural .

:bank2_de a ontolex:LexicalEntry ;
  dct:language <http://id.loc.gov/vocabulary/iso639-2/ger>, <http://lexvo.org/id/iso639-1/de> ;
  lexinfo:partOfSpeech lexinfo:noun ;
  lexinfo:gender lexinfo:feminine ;
  ontolex:denotes <http://dbpedia.org/resource/Bench_(furniture)> ;
  ontolex:otherForm :baenke .

:baenke ontolex:writtenRep "Bänke"@de ;
  lexinfo:number lexinfo:plural .

Note that the target of a denotation does not need to be an individual in the ontology but may also refer to a class, property or datatype property defined by the ontology. The model is agnostic with respect to the ontology language used to express the ontological meaning referred to. The assumption is merely that the entity in the range represents some predicate that has a denotational semantics in some formal logical system.

Each property in the model that links to an ontology has an inverse property named "is X-ed by", where X is the original property name, so that the lexicon can also be described in an ontology-focused manner. In the case of denotes, this inverse property is isDenotedBy.
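A minimal sketch of the ontology-focused direction, assuming the :lex_cat entry from the earlier example and a placeholder `:` namespace. Because isDenotedBy is declared as the inverse of denotes, this single triple entails :lex_cat ontolex:denotes <http://dbpedia.org/resource/Cat>.

```turtle
@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .
@prefix : <http://www.example.org/lexicon#> .   # hypothetical namespace

# Written from the ontology side: the DBpedia resource points to its lexicalization.
# Entailed via the inverse:  :lex_cat ontolex:denotes <http://dbpedia.org/resource/Cat> .
<http://dbpedia.org/resource/Cat> ontolex:isDenotedBy :lex_cat .
```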

In some cases the meaning of a lexical entry is not explicit in the given ontology. Yet, to represent the meaning of a lexical entry we might want to create a new class at the interface between lexicon and ontology by reusing atomic ontological entities defined in the ontology in question. For example, we might want to express the meaning of an adjective by creating an anonymous restriction class at the level of the lexicon-ontology interface. This is illustrated below for the adjective "female" expressing the membership of an anonymous class ∃gender.{female}:

:female a ontolex:LexicalEntry; 
  lexinfo:partOfSpeech lexinfo:adjective;
  ontolex:canonicalForm :female_canonical_form;
  ontolex:sense :female_sense.

:female_canonical_form ontolex:writtenRep "female"@en.

:female_sense ontolex:reference [
    a owl:Restriction;
    owl:onProperty <http://dbpedia.org/ontology/gender> ;
    owl:hasValue <http://dbpedia.org/resource/Female> ] ;
  synsem:isA :female_arg .

3.4 Lexical Sense & Reference

For many practical modelling situations, the denotes property is not sufficient to capture the precise linking between a lexical entry and its meaning with respect to a given ontology. Thus, lemon introduces an intermediate element called lexical sense to capture the particular sense of a word that refers to the particular ontology entity. The lexical entry is linked to a lexical sense by means of the sense property and the lexical sense is linked to the ontology by means of the reference property. The chain sense o reference is equivalent to the property denotes introduced above.
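The cat example from the previous section can be rewritten with an explicit sense, as in the following sketch (the sense name and the `:` namespace are placeholders). The comment shows the denotes triple that an OWL 2 reasoner can derive from the property chain declared on ontolex:denotes.

```turtle
@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .
@prefix : <http://www.example.org/lexicon#> .   # hypothetical namespace

:lex_cat a ontolex:LexicalEntry ;
  ontolex:sense :cat_sense .

:cat_sense a ontolex:LexicalSense ;
  ontolex:reference <http://dbpedia.org/resource/Cat> .

# Inferred from the chain sense o reference:
#   :lex_cat ontolex:denotes <http://dbpedia.org/resource/Cat> .
```

Under OWL 2 semantics a property chain axiom licenses inference in one direction only: a denotes triple on its own does not entail the existence of a named sense. Data that needs to attach information to the sense should therefore assert the sense explicitly.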

LexicalSense (Class)

URI: http://www.w3.org/ns/lemon/ontolex#LexicalSense

A lexical sense represents the lexical meaning of a lexical entry when interpreted as referring to the corresponding ontology element. A lexical sense thus represents a reification of a pair of a uniquely determined lexical entry and a uniquely determined ontology entity it refers to. A link between a lexical entry and an ontology entity via a Lexical Sense object implies that the lexical entry can be used to refer to the ontology entity in question.

SubClassOf: reference exactly 1 rdfs:Resource; isSenseOf exactly 1 LexicalEntry, semiotics:Meaning

Via the lexical sense object we can attach additional properties to a pair of lexical entry and ontological predicate that it denotes to describe under which conditions (context, register, domain, etc.) it is valid to regard the lexical entry as having the ontological entity as meaning. For example, we may wish to express the usages of the word "consumption" in terms of the topic and diachronic usage of the word. As shown in the following example, we can use the Dublin Core property subject to indicate the topic of the sense. The example also shows how to use the property dating defined in the LexInfo ontology to specify that the fourth sense of consumption is outdated.

:lex_consumption a ontolex:LexicalEntry;
   ontolex:canonicalForm :form_consumption;
   ontolex:sense :consumption_sense1;
   ontolex:sense :consumption_sense2;
   ontolex:sense :consumption_sense3;
   ontolex:sense :consumption_sense4 .

:form_consumption ontolex:writtenRep "consumption"@en.

:consumption_sense1 a ontolex:LexicalSense;
  dct:subject <http://dbpedia.org/resource/Ecology> ;
  ontolex:reference <http://dbpedia.org/resource/Consumption_(ecology)> .

:consumption_sense2 a ontolex:LexicalSense;
  dct:subject <http://dbpedia.org/resource/Anatomy> ;
  ontolex:reference <http://dbpedia.org/resource/Ingestion> .

:consumption_sense3 a ontolex:LexicalSense;
   dct:subject <http://dbpedia.org/resource/Economics> ;
   ontolex:reference <http://dbpedia.org/resource/Consumption_(economics)> .

:consumption_sense4 a ontolex:LexicalSense;
   dct:subject <http://dbpedia.org/resource/Medicine> ;
   lexinfo:dating lexinfo:old ;
   ontolex:reference <http://dbpedia.org/resource/Tuberculosis> .

The lexical sense has a single lexical entry and a single reference in the ontology. As a consequence, the properties "sense" and "reference" are defined as inverse functional and functional, respectively.

Sense (Object Property)

URI: http://www.w3.org/ns/lemon/ontolex#sense

The sense property relates a lexical entry to one of its lexical senses.

Domain: LexicalEntry

Range: LexicalSense

InverseOf: isSenseOf

Characteristics: Inverse Functional

Reference (Object Property)

URI: http://www.w3.org/ns/lemon/ontolex#reference

The reference property relates a lexical sense to an ontological predicate that represents the denotation of the corresponding lexical entry.

Domain: LexicalSense or synsem:OntoMap

Range: rdfs:Resource

InverseOf: isReferenceOf

Characteristics: Functional
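One consequence of these characteristics is that synonyms referring to the same ontology entity need separate sense objects. The following sketch assumes hypothetical entries for "car" and "automobile" and a placeholder `:` namespace.

```turtle
@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .
@prefix : <http://www.example.org/lexicon#> .   # hypothetical namespace

# Two synonyms with the same reference: each entry has its own sense.
:lex_car a ontolex:LexicalEntry ;
  ontolex:sense :car_sense .

:lex_automobile a ontolex:LexicalEntry ;
  ontolex:sense :automobile_sense .

:car_sense a ontolex:LexicalSense ;
  ontolex:reference <http://dbpedia.org/resource/Car> .

:automobile_sense a ontolex:LexicalSense ;
  ontolex:reference <http://dbpedia.org/resource/Car> .
```

If both entries instead shared a single sense node, the inverse functional characteristic of sense would lead a reasoner to conclude that :lex_car and :lex_automobile are the same individual, which is rarely what is intended.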

3.5 Usage

The interpretation of a word (lexical entry) with respect to a meaning defined in a given ontology is often modulated by usage conditions or pragmatic implications, in particular due to register, connotations or meaning nuances of a word. Consider, for example, the French words 'rivière' and 'fleuve', which refer to rivers flowing into other rivers and rivers flowing into the sea, respectively. As ontological classes capturing the specific meanings of these French words might not be available in the ontology, these meaning nuances can be specified using the property usage, which captures the usage conditions and pragmatic implications under which the lexical entry can be used to refer to the ontological meaning in question. These usage conditions do not replace a formally defined sense; they complement it with additional information describing the usage of the lexical entry.

How exactly constraints on the usage of senses are defined is not specified by lemon. Yet, we give an example below that shows how to model the lexical meaning of 'rivière' and 'fleuve' when used to refer to the DBpedia class River:

:riviere a ontolex:LexicalEntry ;
  ontolex:sense :riviere_sense .

:fleuve a ontolex:LexicalEntry ;
  ontolex:sense :fleuve_sense .

:riviere_sense ontolex:reference <http://dbpedia.org/ontology/River> ;
  ontolex:usage [ 
    rdf:value "A riviere is a river that flows into another river"@en
  ] .

:fleuve_sense ontolex:reference <http://dbpedia.org/ontology/River>;
  ontolex:usage [
    rdf:value "A fleuve is a river that flows into the sea"@en
  ] .

Usage (Object Property)

URI: http://www.w3.org/ns/lemon/ontolex#usage

The usage property indicates usage conditions or pragmatic implications when using the lexical entry to refer to the given ontological meaning.

Domain: LexicalSense

Range: rdfs:Resource

3.6 Lexical Concept

We have seen above how to capture the fact that a certain lexical entry can be used to denote a certain ontological predicate. We capture this by saying that the lexical entry denotes the class or ontology element in question. However, sometimes we would like to express the fact that a certain lexical entry evokes a certain mental concept rather than that it refers to a class with a formal interpretation in some model. Thus, in lemon we introduce the class Lexical Concept that represents a mental abstraction, concept or unit of thought that can be lexicalized by a given collection of senses. Lexical Concept is thus defined as a subclass of skos:Concept.

Lexical Concept (Class)

URI: http://www.w3.org/ns/lemon/ontolex#LexicalConcept

A lexical concept represents a mental abstraction, concept or unit of thought that can be lexicalized by a given collection of senses.

SubClassOf: skos:Concept

The lexical entry is said to evoke a particular lexical concept, similar to how a lexical entry denotes an ontology reference.

Evokes (Object Property)

URI: http://www.w3.org/ns/lemon/ontolex#evokes

The evokes property relates a lexical entry to one of the lexical concepts it evokes, i.e. the mental concept that speakers of a language might associate when hearing the lexical entry.

Domain: Lexical Entry

Range: Lexical Concept

InverseOf: isEvokedBy

Property Chain: sense o isLexicalizedSenseOf
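The property chain states that whenever an entry has a sense that lexicalizes a concept, the entry evokes that concept. The following sketch assumes the chain is stated with the standard OWL 2 construct owl:propertyChainAxiom. The published ontolex OWL file may phrase it differently. The names :bank, :bank_sense and :FinancialInstitution_concept are hypothetical, and prefixes are as in the other examples.

```turtle
# Schema level (assumed formulation): evokes follows from sense, then isLexicalizedSenseOf
ontolex:evokes owl:propertyChainAxiom ( ontolex:sense ontolex:isLexicalizedSenseOf ) .

# Data level (hypothetical names)
:bank a ontolex:LexicalEntry ;
  ontolex:sense :bank_sense .

:bank_sense ontolex:isLexicalizedSenseOf :FinancialInstitution_concept .

# An OWL 2 reasoner applying the chain would derive:
#   :bank ontolex:evokes :FinancialInstitution_concept .
```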

The evoked concept is different from the reference in the ontology. The reference primarily interprets a word in terms of the identifiers that semantic parsing of a sentence would generate. For example, in understanding the sentence "John F. Kennedy died in 1963.", we may interpret the verb "die (in)" as generating the URI deathDate within a SPARQL query. However, we might also want to record the actual lexical meaning of the word with respect to a mental lexicon, in which die evokes the event of dying. The following example models this:


:die a ontolex:Word ;
     ontolex:denotes <http://dbpedia.org/ontology/deathDate> ;
     ontolex:evokes  :Dying .

We can link a lexical concept to a lexical sense that lexicalizes the concept via the property lexicalized sense:

Lexicalized Sense (Object Property)

URI: http://www.w3.org/ns/lemon/ontolex#lexicalizedSense

The lexicalized sense property relates a lexical concept to a corresponding lexical sense that lexicalizes the concept.

Domain: Lexical Concept

Range: Lexical Sense

InverseOf: isLexicalizedSenseOf

A simple example involving the use of a lexical concept is the following:


:temporary_change_of_possession a ontolex:LexicalConcept;
     ontolex:lexicalizedSense :borrow_sense;
     ontolex:lexicalizedSense :lend_sense;
     ontolex:isEvokedBy :borrow_le;
     ontolex:isEvokedBy :lend_le.

:borrow_le a ontolex:LexicalEntry;
     ontolex:sense :borrow_sense;
     ontolex:evokes :temporary_change_of_possession.

:lend_le a ontolex:LexicalEntry;
    ontolex:sense :lend_sense;
    ontolex:evokes :temporary_change_of_possession.

Similarly, we can link a lexical concept to a reference in the ontology by means of the concept property:

Concept (Object Property)

URI: http://www.w3.org/ns/lemon/ontolex#concept

The concept property relates an ontological entity to a lexical concept that represents the corresponding meaning.

Domain: owl:Thing

Range: Lexical Concept

InverseOf: isConceptOf

The example below demonstrates the combined use of the properties denotes, sense, evokes, concept and lexicalized sense for a lexical resource such as Princeton WordNet. Roughly, each synset in a wordnet corresponds to a lexical concept in lemon. The modelling would thus look as follows:


:cat_lex a ontolex:LexicalEntry ;                                               
  ontolex:canonicalForm :cat_form ;
  ontolex:sense :cat_sense ;
  ontolex:denotes <http://dbpedia.org/resource/Cat> ;
  ontolex:evokes pwn:102124272-n .

:cat_form ontolex:writtenRep "cat"@en .

:cat_sense a ontolex:LexicalSense ;
  ontolex:reference <http://dbpedia.org/resource/Cat> ;
  ontolex:isLexicalizedSenseOf pwn:102124272-n ;
  ontolex:isSenseOf :cat_lex .

<http://dbpedia.org/resource/Cat>
  ontolex:concept pwn:102124272-n ;
  ontolex:isReferenceOf :cat_sense ;
  ontolex:isDenotedBy :cat_lex .

pwn:102124272-n a ontolex:LexicalConcept;
  ontolex:isEvokedBy :cat_lex ;
  ontolex:lexicalizedSense :cat_sense ;
  ontolex:isConceptOf <http://dbpedia.org/resource/Cat> .

A definition can be added to a lexical concept as a gloss by using the skos:definition property.
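As an illustration, a gloss could be attached to the WordNet synset used in the cat example as follows. The definition text is written for illustration only and is not quoted from Princeton WordNet.

```turtle
pwn:102124272-n a ontolex:LexicalConcept ;
  # Illustrative gloss, not the official WordNet definition
  skos:definition "a mammal of the family Felidae"@en .
```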

In addition to organizing a lexicon by lexical entries, we may alternatively create a lexicon of concepts by means of the concept set class, defined as follows:

Concept Set (Class)

URI: http://www.w3.org/ns/lemon/ontolex#ConceptSet

A concept set represents a collection of lexical concepts.

SubClassOf: skos:ConceptScheme, void:Dataset

EquivalentClass: skos:inScheme min 1 LexicalConcept

In this way, lexicons can be organized onomasiologically, that is, by meanings rather than by lemmas. The concept set is a special type of skos:ConceptScheme, and a lexical concept is linked to a concept set using the property skos:inScheme:


:conceptLexicon a ontolex:ConceptSet .

:consumption1 a ontolex:LexicalConcept ;
  ontolex:isConceptOf <http://dbpedia.org/resource/Tuberculosis> ;
  skos:definition "Tuberculosis, MTB, or TB (short for tubercle bacillus), in the past also called phthisis, phthisis pulmonalis, or consumption, is a widespread, and in many cases fatal, infectious disease caused by various strains of mycobacteria, usually Mycobacterium tuberculosis. Tuberculosis typically attacks the lungs, but can also affect other parts of the body. It is spread through the air when people who have an active TB infection cough, sneeze, or otherwise transmit respiratory fluids through the air."@en;
  ontolex:isEvokedBy :consumption ;
  skos:inScheme :conceptLexicon .
                                                                                
:consumption2 a ontolex:LexicalConcept ;                                         
  ontolex:isConceptOf <http://dbpedia.org/resource/Consumption_(Economics)> ;
  skos:definition "Consumption is a major concept in economics and is also studied by many other social sciences. Economists are particularly interested in the relationship between consumption and income, and therefore in economics the consumption function plays a major role."@en;
  ontolex:isEvokedBy :consumption ;
  skos:inScheme :conceptLexicon .
                                                                                
:tuberculosis1 a ontolex:LexicalConcept ;
  ontolex:isConceptOf <http://dbpedia.org/resource/Tuberculosis> ;
  skos:definition "Tuberculosis, MTB, or TB (short for tubercle bacillus), in the past also called phthisis, phthisis pulmonalis, or consumption, is a widespread, and in many cases fatal, infectious disease caused by various strains of mycobacteria, usually Mycobacterium tuberculosis. Tuberculosis typically attacks the lungs, but can also affect other parts of the body. It is spread through the air when people who have an active TB infection cough, sneeze, or otherwise transmit respiratory fluids through the air."@en;
  ontolex:isEvokedBy :tuberculosis ;
  skos:inScheme :conceptLexicon .

:consumption a ontolex:LexicalEntry ;
  ontolex:canonicalForm :consumption_lemma .

:consumption_lemma ontolex:writtenRep "consumption"@en .

:tuberculosis a ontolex:LexicalEntry ;
  ontolex:canonicalForm :tuberculosis_lemma .

:tuberculosis_lemma ontolex:writtenRep "tuberculosis"@en .

4. Syntax and Semantics (synsem)

Figure 2: lemon Syntax and Semantics (synsem) module

4.1 Syntactic Frames

Most words in a language do not stand on their own. They have a certain syntactic behavior, in the sense that they appear in certain syntactic structures and require a number of syntactic arguments to be complete. Examples are i) transitive verbs (e.g. to own), which require a syntactic subject and a syntactic object, ii) relational nouns (e.g. capital (of), mother (of), son (of), brother (of), etc.), which require a prepositional object, and iii) adjectives, which require a noun to modify. The syntactic behavior of a lexical entry is defined in lemon by a syntactic frame:

Syntactic Frame (Class)

URI: http://www.w3.org/ns/lemon/synsem#SyntacticFrame

A syntactic frame represents the syntactic behavior of an open class word in terms of the (syntactic) arguments it requires. It essentially describes the so-called subcategorization structure of the word in question.

In order to relate a lexical entry to one of its various syntactic behaviors as captured by a syntactic frame, the synsem module defines the syntactic behavior property. Each lexical entry should have its own syntactic frame instance. Generic behavior such as 'transitive' should be captured by classes.

Syntactic Behavior (Object Property)

URI: http://www.w3.org/ns/lemon/synsem#synBehavior

The syntactic behavior property relates a lexical entry to one of its syntactic behaviors as captured by a syntactic frame.

Domain: ontolex:LexicalEntry

Range: SyntacticFrame

Characteristics: InverseFunctional
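Because synBehavior is inverse functional, a frame instance can belong to only one lexical entry. The sketch below shows the recommended pattern with the hypothetical entries :own_lex and :possess_lex. Each entry gets its own frame instance, and the shared behavior is expressed through the class lexinfo:TransitiveFrame.

```turtle
:own_lex synsem:synBehavior :own_frame_transitive .
:possess_lex synsem:synBehavior :possess_frame_transitive .

# Separate frame instances that share one generic class
:own_frame_transitive a synsem:SyntacticFrame, lexinfo:TransitiveFrame .
:possess_frame_transitive a synsem:SyntacticFrame, lexinfo:TransitiveFrame .

# Not recommended: one shared instance, e.g.
#   :own_lex synsem:synBehavior :transitive .
#   :possess_lex synsem:synBehavior :transitive .
# would, by inverse functionality, entail :own_lex owl:sameAs :possess_lex .
```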

The following example shows how to indicate that the verb (to) own can be used as a transitive verb. This is accomplished by adding a frame own_frame_transitive that is declared as a transitive frame, using the class TransitiveFrame defined in the LexInfo Ontology.


:own_lex a ontolex:LexicalEntry ;
  ontolex:canonicalForm :own_form ;
  synsem:synBehavior :own_frame_transitive .

:own_frame_transitive a synsem:SyntacticFrame, lexinfo:TransitiveFrame.

:own_form ontolex:writtenRep "own"@en . 

Arguments of a syntactic frame are represented by the class Syntactic Argument:

Syntactic Argument (Class)

URI: http://www.w3.org/ns/lemon/synsem#SyntacticArgument

A syntactic argument represents a slot that needs to be filled for a certain syntactic frame to be complete. Syntactic arguments typically realize a certain grammatical function (e.g. subject, direct object, indirect object, prepositional object, etc.).

The object property synArg is used to relate a (syntactic) frame to one of its syntactic arguments.

SynArg (Object Property)

URI: http://www.w3.org/ns/lemon/synsem#synArg

The object property synArg relates a syntactic frame to one of its syntactic arguments.

Domain: SyntacticFrame

Range: SyntacticArgument

The following example extends the example for the verb (to) own by explicitly indicating the arguments. Here the arguments are given via two specific sub-properties of synArg, namely lexinfo:subject and lexinfo:directObject, which are defined in the external LexInfo ontology.


:own_lex a ontolex:LexicalEntry ;
  ontolex:canonicalForm :own_form ;
  synsem:synBehavior :own_frame_transitive .

:own_form ontolex:writtenRep "own"@en. 

:own_frame_transitive a lexinfo:TransitiveFrame;
       lexinfo:subject :own_frame_subj;
       lexinfo:directObject :own_frame_obj.

Note that if an external ontology is used to describe the type of arguments in more detail, e.g. indicating the grammatical function as in the example above, the external property used needs to be a sub-property of synArg.
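In practice, this means declaring the external property as a sub-property of synArg. The following is a minimal sketch for the two LexInfo properties used above. It is only needed if the vocabulary in use does not already declare these axioms.

```turtle
lexinfo:subject rdfs:subPropertyOf synsem:synArg .
lexinfo:directObject rdfs:subPropertyOf synsem:synArg .

# With these axioms, an RDFS reasoner infers from the example above:
#   :own_frame_transitive synsem:synArg :own_frame_subj, :own_frame_obj .
```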

4.2 Ontology Mappings

At the lexicon-ontology interface, syntactic frames need to be mapped or bound to ontological structures that represent their meaning. In the same way that a lexical sense binds a lexical entry to an ontology entity, the OntoMap maps a syntactic frame onto an ontology entity.

OntoMap (Class)

URI: http://www.w3.org/ns/lemon/synsem#OntoMap

An ontology mapping (OntoMap for short) specifies how a syntactic frame and its syntactic arguments map to a set of concepts and properties in the ontology that together specify the meaning of the syntactic frame.

To link an ontology map to a corresponding sense, the model provides the property ontoMapping. This property is defined as functional and inverse functional, so an ontology map stands in an exact 1:1 relationship with a lexical sense. If a lexicon requires both the ontology map and the lexical sense, it is therefore recommended to give these two entities the same URI. There is no technical reason to distinguish them, and they serve very similar functions.

ontoMapping (Object Property)

URI: http://www.w3.org/ns/lemon/synsem#ontoMapping

The ontoMapping property relates an ontology mapping to its corresponding lexical sense.

Domain: OntoMap

Range: LexicalSense

Characteristics: Functional, InverseFunctional
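Following the recommendation above, a single resource can play both roles. In this minimal sketch, :capital_lex and :capital_sense are hypothetical names. The resource is typed as both an ontology map and a lexical sense, and its ontoMapping points to itself.

```turtle
:capital_lex a ontolex:LexicalEntry ;
  ontolex:sense :capital_sense .

# One URI serves as both the OntoMap and the LexicalSense
:capital_sense a synsem:OntoMap, ontolex:LexicalSense ;
  synsem:ontoMapping :capital_sense ;
  ontolex:reference <http://dbpedia.org/ontology/capital> .
```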

The synsem module introduces the property ontoCorrespondence to establish a mapping between an argument of a predicate defined in the ontology and the syntactic argument that realizes this predicate argument in a given syntactic frame:

ontoCorrespondence (Object Property)

URI: http://www.w3.org/ns/lemon/synsem#ontoCorrespondence

The ontoCorrespondence property binds an argument of a predicate defined in the ontology to a syntactic argument that realizes this predicate argument syntactically.

Domain: OntoMap or LexicalSense

Range: SyntacticArgument

Without loss of generality, we assume that an ontology consists of symbols representing individuals, unary predicates and binary predicates, as indicated by the following table:

| Type | Predicate | Predicate Logic Notation | RDF Notation |
|---|---|---|---|
| Class | Unary predicate | City(x) | ?x rdf:type dbo:City |
| Object, Datatype or Annotation Property | Binary predicate | knows(x,y) | ?x foaf:knows ?y |
| Individual | Constant (null-ary predicate) | London | dbr:London |

This module is aligned to RDF and OWL, which distinguish only between individuals/resources (constants), classes (unary predicates) and properties (predicates of arity 2). Predicates with an arity greater than two therefore cannot be expressed by a single ontology element. They can instead be represented by complex senses (see below).

In the following, we introduce three sub-properties of the ontoCorrespondence property. The first, isA, is used to refer to the single argument of a unary predicate in the ontology:

Is A (Object Property)

URI: http://www.w3.org/ns/lemon/synsem#isA

The is a property represents the single argument of a class or unary predicate.

SubPropertyOf: ontoCorrespondence

Following the terminology used in RDF/OWL we call the first argument of a property its subject and the second argument the object. The synsem module defines two properties subjOfProp and objOfProp that can be used to refer to the 1st (subject) and 2nd (object) argument of a property, that is a predicate of arity "2".

Subject of Property (Object Property)

URI: http://www.w3.org/ns/lemon/synsem#subjOfProp

The subjOfProp property represents the 1st argument or subject of a binary predicate (property) in the ontology.

SubPropertyOf: ontoCorrespondence

Object of Property (Object Property)

URI: http://www.w3.org/ns/lemon/synsem#objOfProp

The objOfProp property represents the 2nd argument or object of a binary predicate (property) in the ontology.

SubPropertyOf: ontoCorrespondence
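The following sketch connects these properties to the table above. It maps the binary predicate foaf:knows (as in ?x foaf:knows ?y) onto a transitive frame for the verb "know". All names other than the vocabulary terms are hypothetical, and the sense doubles as its own ontology map.

```turtle
:know_lex a ontolex:LexicalEntry ;
  synsem:synBehavior :know_frame_transitive ;
  ontolex:sense :know_sense .

:know_frame_transitive a lexinfo:TransitiveFrame ;
  lexinfo:subject :know_subj ;
  lexinfo:directObject :know_obj .

# "X knows Y": the subject fills the 1st argument, the direct object the 2nd
:know_sense a ontolex:LexicalSense, synsem:OntoMap ;
  synsem:ontoMapping :know_sense ;
  ontolex:reference foaf:knows ;
  synsem:subjOfProp :know_subj ;
  synsem:objOfProp :know_obj .
```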

Finally, we can specify that the DBpedia property owner is the reference that expresses the meaning of "to own". We also specify the mapping between the arguments of owner and the syntactic arguments that realize them:


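:own_lex a ontolex:LexicalEntry ;
  ontolex:canonicalForm :own_form ;
  synsem:synBehavior :own_frame_transitive ;
  ontolex:denotes <http://dbpedia.org/ontology/owner> ;
  ontolex:sense :own_ontomap .

:own_form ontolex:writtenRep "own"@en. 

:own_frame_transitive a lexinfo:TransitiveFrame;
       lexinfo:subject :own_subj;
       lexinfo:directObject :own_obj.

# "X owns Y": Y is the subject of dbo:owner and X is its object
:own_ontomap a synsem:OntoMap, ontolex:LexicalSense;
         synsem:ontoMapping :own_ontomap;
         ontolex:reference <http://dbpedia.org/ontology/owner>;
         synsem:subjOfProp :own_obj;
         synsem:objOfProp :own_subj.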

As a further example, we show a lexical entry for the relational noun "father (of)". The entry indicates that "father (of)" can be used to verbalize the DBpedia property father. In a copula construction such as "X is father of Y", the subject X (:arg1 below) corresponds to the 2nd argument of the property father, and the prepositional argument Y (:arg2 below) corresponds to its 1st argument. We use the LexInfo vocabulary to provide linguistic information.


:father_of a ontolex:LexicalEntry ; 
    lexinfo:partOfSpeech lexinfo:noun ;
    ontolex:canonicalForm :father_form;
    synsem:synBehavior :father_of_nounpp;
    ontolex:sense :father_sense_ontomap.

:father_form a ontolex:Form;
    ontolex:writtenRep "father"@en.

:father_of_nounpp a lexinfo:NounPPFrame;
   lexinfo:subject :arg1;
   lexinfo:prepositionalArg :arg2.

:father_sense_ontomap a synsem:OntoMap, ontolex:LexicalSense;
   synsem:ontoMapping :father_sense_ontomap;
   ontolex:reference <http://dbpedia.org/ontology/father>;
   synsem:subjOfProp :arg2;
   synsem:objOfProp :arg1.

:arg2 synsem:marker :of .

Marker (Object Property)

URI: http://www.w3.org/ns/lemon/synsem#marker

The object property marker indicates the marker of a syntactic argument; this can be a case marker or some other lexical entry such as a preposition or particle.

Domain: SyntacticArgument

Range: rdfs:Resource
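The father example above uses :of as a marker without defining it. The following sketch assumes the marker is modelled as a lexical entry. Because the range is rdfs:Resource, other resources such as a case value are also possible.

```turtle
:of a ontolex:LexicalEntry ;
  lexinfo:partOfSpeech lexinfo:preposition ;
  ontolex:canonicalForm :of_form .

:of_form ontolex:writtenRep "of"@en .

# A case marker could instead point to a case value (assuming LexInfo's case values):
#   :some_arg synsem:marker lexinfo:genitiveCase .
```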

The following example concerns the intransitive verb operate, which subcategorizes a prepositional phrase introduced by the preposition in. It shows how to specify that this verb can be used to denote the DBpedia property regionServed. In a construction such as "X operates in Y", X refers to the subject of regionServed and Y refers to its object. Again, we use the LexInfo ontology to provide linguistic information:


:operate_in a ontolex:LexicalEntry ; 
    lexinfo:partOfSpeech lexinfo:verb ;
    ontolex:canonicalForm :operate_form;
    synsem:synBehavior :operate_intransitivepp;
    ontolex:sense :operate_sense_ontomap.

:operate_form a ontolex:Form;
   ontolex:writtenRep "operate"@en.

:operate_intransitivepp a synsem:SyntacticFrame;
   lexinfo:subject :operate_subj ;
   lexinfo:prepositionalArg :operate_pobj.

:operate_sense_ontomap a ontolex:LexicalSense, synsem:OntoMap;
   synsem:ontoMapping :operate_sense_ontomap;
   ontolex:reference <http://dbpedia.org/ontology/regionServed>;
   synsem:subjOfProp :operate_subj;
   synsem:objOfProp :operate_pobj.
 
:operate_pobj synsem:marker :in .

4.3 Complex ontology mappings / submappings

In many cases, the meaning of a syntactic frame cannot be expressed by exactly one binary predicate as in the examples above. Take, for instance, the transitive verb (to) launch. It subcategorizes a subject expressing the company that launched a product, a direct object expressing the launched product, and a prepositional object introduced by the preposition in indicating the year in which the product was launched. The important point is that there are three syntactic arguments (subject, object and prepositional object, represented as arg1, arg2 and arg3 below, respectively). These realize the arguments of a complex predicate consisting of the sub-predicates dbpedia:product and dbpedia:launchDate.

Thus, the synsem module introduces the property submap that relates a (complex) ontological map involving various ontological predicates to a set of less complex ontological maps that bind the arguments of one of the involved predicates to a syntactic argument that realizes it.

Submap (Object Property)

URI: http://www.w3.org/ns/lemon/synsem#submap

The submap property relates a (complex) ontological mapping to a set of bindings that together bind the arguments of the involved predicates to a set of syntactic arguments that realize them syntactically.

Domain: OntoMap

Range: OntoMap

The following example shows how to use the submap property to indicate that the meaning of the phrase X launched Y in Z is a composition of the properties dbpedia:product and dbpedia:launchDate, which together express the meaning of the syntactic frame:


:launch a ontolex:LexicalEntry ;
  lexinfo:partOfSpeech lexinfo:verb ;
  ontolex:canonicalForm :launch_canonical_form;
  synsem:synBehavior :launch_transitive_pp;
  ontolex:sense :launch_sense_ontomap.

:launch_canonical_form ontolex:writtenRep "launch"@en.

:launch_transitive_pp a lexinfo:TransitivePPFrame;
 lexinfo:subject              :arg1 ;
 lexinfo:directObject         :arg2 ;
 lexinfo:prepositionalAdjunct :arg3.

:arg3 synsem:marker :in ;
             synsem:optional "true"^^xsd:boolean .


:launch_sense_ontomap a ontolex:LexicalSense, synsem:OntoMap;
   synsem:ontoMapping :launch_sense_ontomap;
   synsem:submap :launch_submap1;
   synsem:submap :launch_submap2.

:launch_submap1 ontolex:reference <http://dbpedia.org/ontology/product>;
                                 synsem:subjOfProp :arg1;
                                 synsem:objOfProp  :arg2.

:launch_submap2 ontolex:reference <http://dbpedia.org/ontology/launchDate>;
                                 synsem:subjOfProp :arg2;
                                 synsem:objOfProp  :arg3.

The optional property specifies that a certain argument is not compulsory. It is generally advisable to use this property only with complex senses. An optional argument need not be realized syntactically. In that case, from a semantic point of view, the corresponding argument of the ontological predicate is existentially quantified over. In the above example, arg3 is marked as optional. This allows the correct semantics to be assigned to an expression such as X launched Y by existentially quantifying over the year.

Optional (Datatype Property)

URI: http://www.w3.org/ns/lemon/synsem#optional

The optional property indicates whether a syntactic argument is optional, that is, it can be syntactically omitted.

Domain: SyntacticArgument

Range: xsd:boolean

The following example shows how we can capture the diathesis alternation between X gave Y Z and X gave Z to Y, which in our modelling represent the same ontological meaning:


:give a ontolex:LexicalEntry ; 
    lexinfo:partOfSpeech lexinfo:verb ;
    ontolex:canonicalForm :give_form;
    synsem:synBehavior :give_ditransitive;
    synsem:synBehavior :give_transitive_pp;
    ontolex:sense :giving_sense_ontomap.

:give_form a ontolex:Form;
   ontolex:writtenRep "give"@en.

:give_transitive_pp a lexinfo:TransitivePPFrame;
   lexinfo:subject :give_subj1 ;
   lexinfo:directObject :give_dobj1; 
   lexinfo:prepositionalArg :give_pobj1.

:give_ditransitive a lexinfo:DitransitiveFrame;
   lexinfo:subject :give_subj2 ;
   lexinfo:indirectObject :give_iobj2;
   lexinfo:directObject :give_dobj2.


:giving_sense_ontomap a ontolex:LexicalSense, synsem:OntoMap;
   synsem:ontoMapping :giving_sense_ontomap;
   ontolex:reference <http://www.ontologyportal.org/SUMO.owl#Giving>;
   synsem:submap :giving_submap1;
   synsem:submap :giving_submap2;
   synsem:submap :giving_submap3.
 
:giving_submap1 ontolex:reference <http://www.ontologyportal.org/SUMO.owl#agent>;
                                 synsem:subjOfProp :giving_event;
                                 synsem:objOfProp  :give_subj1;
                                 synsem:objOfProp  :give_subj2.

:giving_submap2 ontolex:reference <http://www.ontologyportal.org/SUMO.owl#patient>;
                                 synsem:subjOfProp :giving_event;
                                 synsem:objOfProp  :give_dobj2;
                                 synsem:objOfProp :give_dobj1.

:giving_submap3 ontolex:reference <http://www.ontologyportal.org/SUMO.owl#destination>;
                                 synsem:subjOfProp :giving_event;
                                 synsem:objOfProp  :give_iobj2;
                                 synsem:objOfProp :give_pobj1.

:give_pobj1 synsem:marker :to .

For adjectives a modelling may be as follows:


:female a ontolex:LexicalEntry; 
  lexinfo:partOfSpeech lexinfo:adjective;
  ontolex:canonicalForm :female_canonical_form;
  synsem:synBehavior :female_syn,:female_syn1;
  ontolex:sense :female_sense_ontomap.

:female_canonical_form ontolex:writtenRep "female"@en.

:female_sense_ontomap ontolex:reference [
    a owl:Restriction;
    owl:onProperty <http://dbpedia.org/ontology/gender> ;
    owl:hasValue <http://dbpedia.org/resource/Female> ] ;
  synsem:ontoMapping :female_sense_ontomap;
  synsem:isA :female_arg .

:female_syn a lexinfo:AdjectivePredicateFrame;
   lexinfo:copulativeSubject :female_arg.
                                                                                
:female_syn1 a lexinfo:AdjectiveAttributiveFrame ;                              
   lexinfo:attributiveArg :female_arg.  

Note that in the above example the property synsem:isA is used to mark the single argument (variable) of the class of all things that have female gender. The copulative subject in an expression such as "Mary is female" is bound to this single argument of the corresponding ontological predicate. The semantics is thus, in essence, the characteristic function that decides for each element whether it is in the set denoted by the class.

4.4 Conditions

Conditions describe precise requirements that a context must meet for a lexical entry to be used to refer to a certain ontological predicate (reference). These contextual conditions are attached to the lexical sense that mediates the relation between a lexical entry and the ontological predicate it can be used to express.

condition (Object Property)

URI: http://www.w3.org/ns/lemon/synsem#condition

The condition property defines an evaluable constraint that derives from using a certain lexical entry to express a given ontological predicate.

Domain: LexicalSense

Range: rdfs:Resource

SubPropertyOf: usage

Two special types of conditions are defined in the synsem module, which formulate constraints on the type of arguments that can be used at the first or second position of a property when a certain lexical entry is used to express that property. Take for instance the distinction between the English verbs (to) ride and (to) drive. Both express the means of transportation, but have different implications. Ride implies that the means of transportation is a bike. Instead of introducing different ontological predicates and different senses, the modulation can be captured by specifying restrictions on the values that can fill the 1st or 2nd argument of the corresponding ontological predicate. This is illustrated by the example below:


:ride a ontolex:LexicalEntry ;
  ontolex:sense :ride_sense1 .

:ride_sense1 a ontolex:LexicalSense ;
  ontolex:reference :methodOfTransportation ;
  synsem:propertyRange :Bicycle ;
  synsem:subjOfProp :subj ;
  synsem:objOfProp :obj .

:methodOfTransportation a rdf:Property ;
  rdfs:range :Vehicle .

It is important to note that the propertyDomain or propertyRange properties do not modify in any way the ontological status or commitment of the corresponding property (here: methodOfTransportation). Instead, they make explicit certain implications on the type of arguments involved that derive from the use of a certain lexical entry to express the property in question.

propertyDomain (Object Property)

URI: http://www.w3.org/ns/lemon/synsem#propertyDomain

The propertyDomain property specifies a constraint on the type of arguments that can be used at the first position of the property that is referenced by the given sense.

Domain: LexicalSense

Range: rdfs:Resource

propertyRange (Object Property)

URI: http://www.w3.org/ns/lemon/synsem#propertyRange

The propertyRange property specifies a constraint on the type of arguments that can be used at the second position of the property that is referenced by the given sense.

Domain: LexicalSense

Range: rdfs:Resource
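For comparison with the ride example, the following sketch gives a corresponding sense of drive. :Car is a hypothetical class, assumed here to be a subclass of :Vehicle. As with ride, the constraint restricts only how drive is used. It does not change the range of :methodOfTransportation itself.

```turtle
:drive a ontolex:LexicalEntry ;
  ontolex:sense :drive_sense1 .

:drive_sense1 a ontolex:LexicalSense ;
  ontolex:reference :methodOfTransportation ;
  # Constrains the 2nd argument only when "drive" is used
  synsem:propertyRange :Car .

# Hypothetical class, assumed for this sketch
:Car rdfs:subClassOf :Vehicle .
```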

5. Decomposition (decomp)

Figure 3: lemon Decomposition (decomp) module

5.1 Subterms

Decomposition is the process of indicating which elements constitute a multiword or compound lexical entry. The simplest way to do this is by means of the subterm property, which indicates that a lexical entry is a part of another entry. This property allows us to specify which lexical entries a certain compound lexical entry is composed of.

Subterm (Object Property)

URI: http://www.w3.org/ns/lemon/decomp#subterm

The property subterm relates a compound lexical entry to one of the lexical entries it is composed of.

Domain: LexicalEntry

Range: LexicalEntry

The subterm property can be used to indicate that one term has been derived from another by adding or removing words, for example:


:AfricanSwineFever a ontolex:LexicalEntry ;
  decomp:subterm :SwineFever .

The subterm property may also be used to indicate the decomposition of compound words. The following example shows how to indicate that the German compound Lungenentzündung ('pneumonia', literally 'lung inflammation') is decomposed into the lexical entries Lunge and Entzündung:


:Lungenentzündung a ontolex:LexicalEntry ;
  decomp:subterm :Lunge_lex;
  decomp:subterm :Entzündung_lex .

It is important to note that the subterm property is a relation between lexical entries. It indicates neither the specific inflected form of a lexical entry that appears in the compound nor the position at which it appears.

5.2 Components

The subterm property allows us to indicate which lexical entries a compound is composed of, but it does not indicate the internal structure of the compound. This can be achieved by introducing so-called components. The components of a lexical entry form a fixed list of the elements that compose it. In the most common case of a multiword expression, the components of the lexical entry are the individual tokens that compose that entry.

Component (Class)

URI: http://www.w3.org/ns/lemon/decomp#Component

A component is a particular realization of a lexical entry that forms part of a compound lexical entry.

Each component is said to be a constituent of a lexical entry:

Constituent (Object Property)

URI: http://www.w3.org/ns/lemon/decomp#constituent

The property constituent relates a lexical entry or component to a component that it is constituted by.

Domain: LexicalEntry or Component

Range: Component


:AfricanSwineFever a ontolex:MultiwordExpression ;
  decomp:constituent :African_comp , :Swine_comp , :Fever_comp ;
  decomp:subterm :SwineFever .

:African_comp a decomp:Component .

:Swine_comp a decomp:Component .

:Fever_comp a decomp:Component .

:SwineFever a ontolex:MultiwordExpression ;
  decomp:constituent :Swine_comp , :Fever_comp .

As a component represents a particular realization of a lexical entry which forms part of a compound lexical entry, we need to link the component to the corresponding lexical entry it is a realization of. This is done by the property correspondsTo:

Corresponds To (Object Property)

URI: http://www.w3.org/ns/lemon/decomp#correspondsTo

The property correspondsTo links a component to a corresponding lexical entry or argument.

Domain: Component

Range: LexicalEntry or SyntacticArgument

It may be necessary to add inflectional properties to the component to uniquely determine the actual form of the lexical entry. This inflectional information can be attached to the component as shown in the following example for the Spanish term 'comunidad autónoma' (federal state), whose second word is the singular feminine form autónoma instead of the canonical form autónomo.


:comunidad_autonoma_lex a ontolex:LexicalEntry ;
  decomp:constituent :comunidad_component;
  decomp:constituent :autonoma_component .

:comunidad_component a decomp:Component;
     decomp:correspondsTo :comunidad_lex.

:autonoma_component a decomp:Component;
     decomp:correspondsTo :autonomo_lex;
     lexinfo:gender lexinfo:feminine;
     lexinfo:number lexinfo:singular.

If we want to specify the order of the components, we can use the RDF properties rdf:_1, rdf:_2, etc. as in the following example to specify the absolute order, in addition to the constituent properties. Note that the property constituent alone is not sufficient to specify the order of components.


:comunidad_autonoma_lex a ontolex:LexicalEntry ;
  decomp:constituent :comunidad_component;
  rdf:_1             :comunidad_component; 
  decomp:constituent :autonoma_component;
  rdf:_2             :autonoma_component;
  ontolex:denotes <http://dbpedia.org/ontology/federalState>;
  ontolex:canonicalForm :comunidad_autonoma_lex_canonical_form.

:comunidad_autonoma_lex_canonical_form ontolex:writtenRep "comunidad autónoma"@es.

:comunidad_component a decomp:Component;
     decomp:correspondsTo :comunidad_lex.

:autonoma_component a decomp:Component;
     decomp:correspondsTo :autonomo_lex;
     lexinfo:gender lexinfo:feminine;
     lexinfo:number lexinfo:singular.

5.3 Phrase structure

The constituent property can also be used to specify the structure of a phrase, by means of showing some components as being constituted of further components. In this way, each of the components represents a node in the phrase structure tree and may be annotated with a phrase tag as in the following example:


:AfricanSwineFever_root a decomp:Component ;
  decomp:correspondsTo :AfricanSwineFever ;
  decomp:constituent :African_node, :SwineFever_node ;
  rdf:_1 :African_node;
  rdf:_2 :SwineFever_node;
  olia:hasTag penn:NP .

:African_node a decomp:Component ;
  decomp:correspondsTo :African ;
  olia:hasTag penn:JJ .

:SwineFever_node a decomp:Component ;
  decomp:constituent :Swine_node, :Fever_node ;
  rdf:_1 :Swine_node;
  rdf:_2 :Fever_node;
  olia:hasTag penn:NP .

:Swine_node a decomp:Component ; 
  decomp:correspondsTo :Swine ;
  olia:hasTag penn:NN .

:Fever_node a decomp:Component ; 
  decomp:correspondsTo :Fever ;
  olia:hasTag penn:NN .

The syntactic categories of the phrases are indicated with the property olia:hasTag from the OLiA vocabulary, using tags from the Penn Treebank tagset.

The following example shows how to use the synsem module together with the decomp module to indicate the phrase structure tree of a frame. This is done by making the frame the target of the correspondsTo property and including components in the tree that correspond to individual arguments. In this way, lexicalized grammars can be modelled within the lexicon.


:know a ontolex:Word ;
  synsem:synBehavior :know_frame .

:know_frame a synsem:SyntacticFrame ;
  lexinfo:subject :subject ;
  lexinfo:directObject :directObject .

:know_root a decomp:Component ;
  decomp:correspondsTo :know_frame ;
  decomp:constituent :X_node, :knowY_node ;
  olia:hasTag penn:S .

:X_node a decomp:Component ;
  decomp:correspondsTo :subject ;
  olia:hasTag penn:NP .

:knowY_node a decomp:Component ;
  decomp:constituent :know_node, :Y_node ;
  olia:hasTag penn:VP .

:know_node a decomp:Component ;
  decomp:correspondsTo :know ;
  olia:hasTag penn:V .

:Y_node a decomp:Component ;
  decomp:correspondsTo :directObject ;
  olia:hasTag penn:NP .

6. Variation & Translation (vartrans)

The variation and translation module introduces vocabulary needed to represent relations between lexical entries and lexical senses that are variants of each other. The following diagram provides an overview of the vocabulary introduced by the module:

Figure 4: Lemon variation and translation (vartrans) module

6.1 Lexico-Semantic Relations

The model defines a generic class, lexico-semantic relation, for relating two lexical entries or two lexical senses to each other. Such relations are stated principally by means of two properties, lexicalRel and senseRel, which directly link two related lexical entries or two related lexical senses, respectively.

lexicalRel (Object Property)

URI: http://www.w3.org/ns/lemon/vartrans#lexicalRel

The lexicalRel property relates two lexical entries that stand in some lexical relation.

Domain: ontolex:LexicalEntry

Range: ontolex:LexicalEntry

senseRel (Object Property)

URI: http://www.w3.org/ns/lemon/vartrans#senseRel

The senseRel property relates two lexical senses that stand in some sense relation.

Domain: ontolex:LexicalSense

Range: ontolex:LexicalSense

In general, these properties should not be used directly but instead a sub-property should be introduced, for example:


:fao lexinfo:initialismFor :food_and_agriculture_organization.

:surrogate_mother_sense lexinfo:hypernym :mother_sense.

lexinfo:initialismFor rdfs:subPropertyOf vartrans:lexicalRel.
lexinfo:hypernym rdfs:subPropertyOf vartrans:senseRel.
    

If further information about the relationship needs to be represented, it is possible to create an individual that 'reifies' the relationship.

Lexico-Semantic Relation (Class)

URI: http://www.w3.org/ns/lemon/vartrans#LexicoSemanticRelation

A lexico-semantic relation represents the relation between two lexical entries or lexical senses that are related by some lexical or semantic relationship.

subClassOf: relates exactly 2 (ontolex:LexicalEntry OR ontolex:LexicalSense OR ontolex:LexicalConcept)

The object property relates links a lexico-semantic relation to the lexical entries or lexical senses between which it establishes the relation:

relates (Object Property)

URI: http://www.w3.org/ns/lemon/vartrans#relates

The relates property links a lexico-semantic relation to the two lexical entries or lexical senses between which it establishes the relation.

Domain: LexicoSemanticRelation

Range: ontolex:LexicalEntry OR ontolex:LexicalSense OR ontolex:LexicalConcept

As many lexico-semantic relations are asymmetric, it is necessary to distinguish the source from the target:

source (Object Property)

URI: http://www.w3.org/ns/lemon/vartrans#source

The source property indicates the lexical sense or lexical entry involved in a lexico-semantic relation as a 'source'.

SubPropertyOf: relates

target (Object Property)

URI: http://www.w3.org/ns/lemon/vartrans#target

The target property indicates the lexical sense or lexical entry involved in a lexico-semantic relation as a 'target'.

SubPropertyOf: relates

The class lexico-semantic relation is specialized into the following two subclasses: lexical relation and sense relation, which relate two lexical entries or two lexical senses, respectively:

Lexical Relation (Class)

URI: http://www.w3.org/ns/lemon/vartrans#LexicalRelation

A lexical relation is a lexico-semantic relation that represents the relation between two lexical entries the surface forms of which are related grammatically, stylistically or by some operation motivated by linguistic economy.

subClassOf: LexicoSemanticRelation, relates exactly 2 ontolex:LexicalEntry

By lexical relations we understand those relations at the level of surface forms, mainly motivated by grammatical requirements, style (Wortklang, i.e. the sound of words), and linguistic economy (helping to avoid excessive denominative repetition and improving textual coherence). Examples of lexical relations are the following:

The specific type of lexical or sense relation can be specified via the object property category, which is defined as follows:

Category (Object Property)

URI: http://www.w3.org/ns/lemon/vartrans#category

The category property indicates the specific type of relation by which two lexical entries or two lexical senses are related.

Domain: LexicoSemanticRelation

Characteristics: Functional

The following example shows how to model the relation between "Food and Agriculture Organization" and its initialism "FAO" as one example of a lexical relation:


:fao a ontolex:LexicalEntry ;
     ontolex:sense :fao_sense; 
     ontolex:lexicalForm :fao_form.

:fao_sense ontolex:reference <http://dbpedia.org/resource/Food_and_Agriculture_Organization> .

:food_and_agriculture_organization a ontolex:LexicalEntry;
     ontolex:sense :food_and_agriculture_organization_sense ;
     ontolex:lexicalForm :food_and_agriculture_organization_form.

:food_and_agriculture_organization_sense ontolex:reference <http://dbpedia.org/resource/Food_and_Agriculture_Organization> .

:fao_form ontolex:writtenRep "FAO"@en .
:food_and_agriculture_organization_form ontolex:writtenRep "Food and Agriculture Organization"@en .

:fao_initialism a vartrans:LexicalRelation ;
      vartrans:source :food_and_agriculture_organization ; 
      vartrans:target :fao ;
      vartrans:category :initialism.

Sense Relation (Class)

URI: http://www.w3.org/ns/lemon/vartrans#SenseRelation

A sense relation is a lexico-semantic relation that represents the relation between two lexical senses the meanings of which are related.

subClassOf: LexicoSemanticRelation, relates exactly 2 ontolex:LexicalSense

Examples of sense relations are equivalence between two senses, hypernymy and hyponymy, synonymy, antonymy, translation, etc.

The following is an example of a sense relation:


:surrogate_mother_lex a ontolex:LexicalEntry ;
     ontolex:sense :surrogate_mother_sense ;
     ontolex:canonicalForm :surrogate_mother_form.

:surrogate_mother_sense ontolex:reference <http://dbpedia.org/ontology/surrogate_mother>.

:surrogate_mother_form ontolex:writtenRep "surrogate mother"@en .

:mother_lex a ontolex:LexicalEntry ;
     ontolex:sense :mother_sense ;
     ontolex:canonicalForm :mother_form.

:mother_sense ontolex:reference <http://dbpedia.org/ontology/mother>.

:mother_form ontolex:writtenRep "mother"@en .

:senseRelation a vartrans:SenseRelation ;
      vartrans:source :surrogate_mother_sense ;
      vartrans:target :mother_sense ; 
      vartrans:category lexinfo:hypernym .
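Hypernymy is asymmetric, hence the use of source and target above. For a symmetric sense relation such as synonymy neither sense is privileged, so a minimal sketch might link both senses through the generic vartrans:relates property instead. The entries, senses and the category individual :synonymy below are hypothetical illustrations.

```turtle
@prefix :         <http://www.example.com/lexicon#> .
@prefix ontolex:  <http://www.w3.org/ns/lemon/ontolex#> .
@prefix vartrans: <http://www.w3.org/ns/lemon/vartrans#> .

# Hypothetical senses of two English entries ("car" and "automobile").
:car_sense        a ontolex:LexicalSense .
:automobile_sense a ontolex:LexicalSense .

# Symmetric relation: neither sense is singled out as source or target,
# so only the super-property vartrans:relates is used.
:car_automobile_synonymy a vartrans:SenseRelation ;
    vartrans:relates  :car_sense , :automobile_sense ;
    vartrans:category :synonymy .   # hypothetical category individual
```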

Further, we consider terminological relations, which are defined as follows:

Terminological Relation (Class)

URI: http://www.w3.org/ns/lemon/vartrans#TerminologicalRelation

A terminological relation is a sense relation that relates two lexical senses of terms that are semantically related in the sense that they can be exchanged in most contexts, but their surface forms are not directly related. The variants vary along dimensions that are not captured by the given ontology and are intentionally (pragmatically) caused.

subClassOf: SenseRelation

Examples of categories of terminological relations include:

We illustrate the use of terminological relations with the following example of a diachronic variant:


:tuberculosis a ontolex:LexicalEntry ;
       ontolex:lexicalForm :tuberculosis_form ; 
       ontolex:sense :tuberculosis_sense.

:tuberculosis_form ontolex:writtenRep "tuberculosis"@en .

:tuberculosis_sense ontolex:reference <http://dbpedia.org/resource/Tuberculosis>.

:phthisis a ontolex:LexicalEntry ;
       ontolex:lexicalForm :phthisis_form ; 
       ontolex:sense :phthisis_sense.

:phthisis_form ontolex:writtenRep "phthisis"@en .

:phthisis_sense ontolex:reference <http://dbpedia.org/resource/Tuberculosis>;
               dct:subject <http://dbpedia.org/resource/Medicine> .

:phthisis_diachronic_relation a vartrans:TerminologicalRelation ;
      vartrans:source :phthisis_sense ;
      vartrans:target :tuberculosis_sense ; 
      vartrans:category :diachronic.

Finally, relations can also be stated between lexical concepts; this is useful for modelling relations between synsets in wordnets and similar resources:

conceptRel (Object Property)

URI: http://www.w3.org/ns/lemon/vartrans#conceptRel

The conceptRel property relates two lexical concepts that stand in some sense relation.

Domain: ontolex:LexicalConcept

Range: ontolex:LexicalConcept

Conceptual Relation (Class)

URI: http://www.w3.org/ns/lemon/vartrans#ConceptualRelation

A conceptual relation is a lexico-semantic relation that represents the relationship between two lexical concepts.

subClassOf: LexicoSemanticRelation, relates exactly 2 ontolex:LexicalConcept
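The definitions above come without an example. A minimal sketch of how two wordnet synsets might be related, first directly through a sub-property of conceptRel and then in reified form, is given below; the synset URIs, the property :hypernym and the category individual :hypernymy are hypothetical and not part of vartrans.

```turtle
@prefix :         <http://www.example.com/wordnet#> .
@prefix rdfs:     <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ontolex:  <http://www.w3.org/ns/lemon/ontolex#> .
@prefix vartrans: <http://www.w3.org/ns/lemon/vartrans#> .

# Two hypothetical synsets modelled as lexical concepts.
:synset_dog    a ontolex:LexicalConcept .
:synset_animal a ontolex:LexicalConcept .

# Direct form: a hypothetical sub-property of vartrans:conceptRel.
:hypernym rdfs:subPropertyOf vartrans:conceptRel .
:synset_dog :hypernym :synset_animal .

# Reified form, for when further information about the relation is needed.
:dog_animal_relation a vartrans:ConceptualRelation ;
    vartrans:source   :synset_dog ;
    vartrans:target   :synset_animal ;
    vartrans:category :hypernymy .
```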

6.2 Translation

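Translation relates two lexical entries from different languages whose meanings are 'equivalent'. This 'equivalence' can be expressed at three different levels: as a shared reference to the same ontology element, as a relation between lexical senses, or as a relation between lexical entries that leaves the senses involved unspecified (translatableAs).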

6.2.1 Translation as shared reference

To express that the lexical senses of two lexical entries are ontologically equivalent, no machinery is needed beyond what was already introduced in the ontolex section above:


:surrogate_mother a ontolex:LexicalEntry;
      dct:language <http://id.loc.gov/vocabulary/iso639-2/eng>, <http://lexvo.org/id/iso639-1/en> ;
      ontolex:sense :surrogate_mother_sense.

:surrogate_mother_sense ontolex:reference ontology:SurrogateMother.

:madre_de_alquiler a ontolex:LexicalEntry;
      dct:language <http://id.loc.gov/vocabulary/iso639-2/spa>, <http://lexvo.org/id/iso639-1/es> ;
      ontolex:sense :madre_de_alquiler_sense.

:madre_de_alquiler_sense ontolex:reference ontology:SurrogateMother.

:leihmutter a ontolex:LexicalEntry;
      dct:language <http://id.loc.gov/vocabulary/iso639-2/ger>, <http://lexvo.org/id/iso639-1/de> ;
      ontolex:sense :leihmutter_sense.

:leihmutter_sense ontolex:reference ontology:SurrogateMother.

In this way, the senses of the lexical entries surrogate mother, madre de alquiler and Leihmutter are stated to be equivalent, in that they denote the same class in the ontology.

6.2.2 Translation as a relation between lexical senses

The second alternative mentioned above can be realized through the class translation, which relates two senses that can be regarded as equivalent in that they can be exchanged for each other.

Translation (Class)

URI: http://www.w3.org/ns/lemon/vartrans#Translation

A translation is a sense relation expressing that two lexical senses corresponding to two lexical entries in different languages can be translated to each other without any major meaning shifts.

subClassOf: SenseRelation


:zip_code a ontolex:LexicalEntry;
      dct:language <http://id.loc.gov/vocabulary/iso639-2/eng>, <http://lexvo.org/id/iso639-1/en> ;
      ontolex:sense :zip_code_sense.

:zip_code_sense ontolex:reference <http://dbpedia.org/ontology/zipCode>.

:postleitzahl a ontolex:LexicalEntry;
      dct:language <http://id.loc.gov/vocabulary/iso639-2/ger>, <http://lexvo.org/id/iso639-1/de> ;
      ontolex:sense :postleitzahl_sense.

:postleitzahl_sense ontolex:reference <http://de.dbpedia.org/resource/Postleitzahl>.


:trans a vartrans:Translation;
       vartrans:source :zip_code_sense;
       vartrans:target :postleitzahl_sense;
       vartrans:category <http://purl.org/net/translation-categories#directEquivalent>.

Thus, in spite of having different denotations, both Postleitzahl and zip code can be seen as cross-lingual equivalents and thus as translations of each other.

Besides the class Translation, which reifies the translation relation between two lexical senses, as a shortcut the model also allows us to directly express the relation of translation between lexical senses by a property translation that is regarded as equivalent to the reification:

translation (Object Property)

URI: http://www.w3.org/ns/lemon/vartrans#translation

The translation property relates two lexical senses of two lexical entries that stand in a translation relation to one another.

subPropertyOf: senseRel

With the translation property, the above example can be replaced with:


:zip_code a ontolex:LexicalEntry;
      dct:language <http://id.loc.gov/vocabulary/iso639-2/eng>, <http://lexvo.org/id/iso639-1/en> ;
      ontolex:sense :zip_code_sense.

:zip_code_sense ontolex:reference <http://dbpedia.org/ontology/zipCode>.

:postleitzahl a ontolex:LexicalEntry;
      dct:language <http://id.loc.gov/vocabulary/iso639-2/ger>, <http://lexvo.org/id/iso639-1/de> ;
      ontolex:sense :postleitzahl_sense.

:postleitzahl_sense ontolex:reference <http://de.dbpedia.org/resource/Postleitzahl>.

:zip_code_sense vartrans:translation :postleitzahl_sense.

6.2.3 Translatable As

The third option foreseen in the vartrans model is one where we say that a lexical entry can be translated into some other entry in some contexts, underspecifying the exact lexical senses involved and the exact contextual conditions under which this translation is valid. For this, the model introduces the property translatableAs:

translatableAs (Object Property)

URI: http://www.w3.org/ns/lemon/vartrans#translatableAs

The translatableAs property relates a lexical entry in some language to a lexical entry in another language that it can be translated as depending on the particular context and specific senses of the involved lexical entries.

Domain: ontolex:LexicalEntry

Range: ontolex:LexicalEntry

Characteristics: Symmetric

Subproperty of: isSenseOf o translation o sense
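The sub-property statement above is a property chain written in composition notation, so it is read right to left. A sketch of the same chain in OWL 2 Turtle syntax follows. It illustrates the chain only and is not a claim about how the published vartrans ontology file encodes it.

```turtle
@prefix owl:      <http://www.w3.org/2002/07/owl#> .
@prefix ontolex:  <http://www.w3.org/ns/lemon/ontolex#> .
@prefix vartrans: <http://www.w3.org/ns/lemon/vartrans#> .

# entry --ontolex:sense--> sense --vartrans:translation--> sense
#       --ontolex:isSenseOf--> entry
# entails: entry --vartrans:translatableAs--> entry.
# The OWL list gives the properties in traversal order, i.e. the
# reverse of the composition notation "isSenseOf o translation o sense".
vartrans:translatableAs owl:propertyChainAxiom
    ( ontolex:sense vartrans:translation ontolex:isSenseOf ) .
```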

The following example shows how to use the relation translatableAs to specify that corner (which can mean street intersection or intersection of two inside walls) can be translated as the Spanish rincón (intersection of two inside walls) or esquina (street intersection), depending on the particular sense involved.


:corner a ontolex:LexicalEntry;
      dct:language <http://id.loc.gov/vocabulary/iso639-2/eng>, <http://lexvo.org/id/iso639-1/en> .
 
:rincón a ontolex:LexicalEntry;
       dct:language <http://id.loc.gov/vocabulary/iso639-2/spa>, <http://lexvo.org/id/iso639-1/es> .

:esquina a ontolex:LexicalEntry;
       dct:language <http://id.loc.gov/vocabulary/iso639-2/spa>, <http://lexvo.org/id/iso639-1/es> .

:corner vartrans:translatableAs :rincón.
:corner vartrans:translatableAs :esquina.
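If the individual senses are known, the same information can be stated more precisely at the sense level. The sketch below uses hypothetical sense URIs. Given the property chain in the definition of translatableAs, and ontolex:isSenseOf being the inverse of ontolex:sense, it would entail the two translatableAs triples above.

```turtle
@prefix :         <http://www.example.com/lexicon#> .
@prefix ontolex:  <http://www.w3.org/ns/lemon/ontolex#> .
@prefix vartrans: <http://www.w3.org/ns/lemon/vartrans#> .

# Hypothetical senses separating the two readings of "corner".
:corner  ontolex:sense :corner_street_sense , :corner_wall_sense .
:esquina ontolex:sense :esquina_sense .
:rincón  ontolex:sense :rincón_sense .

# Sense-level translations: the street-intersection reading maps to
# "esquina", the inside-walls reading to "rincón".
:corner_street_sense vartrans:translation :esquina_sense .
:corner_wall_sense   vartrans:translation :rincón_sense .
```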

6.2.4 Translation Set

We can group translations into a set by using the class translation set, for instance if they come from the same language resource or belong to the same organisation:

Translation Set (Class)

URI: http://www.w3.org/ns/lemon/vartrans#TranslationSet

A translation set is a set of translations that have some common source.

In order to relate a translation set to one of the translations contained in it, the model defines a property trans:

trans (Object Property)

URI: http://www.w3.org/ns/lemon/vartrans#trans

The trans property relates a TranslationSet to one of its translations.

Domain: TranslationSet

Range: Translation


:study a ontolex:LexicalEntry ;
  ontolex:sense :study_sense ;
  dct:language iso639:en .

:Studium a ontolex:LexicalEntry ;
  ontolex:sense :Studium_sense ;
  dct:language iso639:de .

:Untersuchung a ontolex:LexicalEntry ;
  ontolex:sense :Untersuchung_sense ;
  dct:language iso639:de .

:staidear a ontolex:LexicalEntry ;
  ontolex:sense :staidear_sense ;
  dct:language iso639:ga .

:t1 a vartrans:Translation ;
  vartrans:source :study_sense ;
  vartrans:target :Studium_sense .

:t2 a vartrans:Translation ;
  vartrans:source :study_sense ;
  vartrans:target :staidear_sense .

:t3 a vartrans:Translation ;
  vartrans:source :study_sense ;
  vartrans:target :Untersuchung_sense .

:ts1 a vartrans:TranslationSet ;
  vartrans:trans :t1, :t3 ;
  dc:source "Automatically translated"@en .

:ts2 a vartrans:TranslationSet ;
  vartrans:trans :t2 ;
  dc:source "Wiktionary"@en .

7. Metadata (lime)

Figure 5: Lemon linguistic metadata (lime) module

The LInguistic MEtadata (lime) module allows for describing metadata at the level of the lexicon-ontology interface. This module is intended to complement existing metadata schemas such as Dublin Core, the PROV ontology, DCAT or VoID, as lime provides a profile to describe metadata as related to the lexicon-ontology interface.

Following the conceptual model of the lexicon-ontology interface, lime distinguishes three main metadata entities:

  1. the reference dataset (describing the semantics of the domain, e.g., the ontology),
  2. the lexicon (being a collection of lexical entries),
  3. the concept set (an optional set of lexical concepts, bearing a conceptual backbone to a lexicon)

Note: the reference dataset here is not limited to OWL vocabularies, but includes any RDF dataset which contains references to objects of a domain of discourse.

As a metadata vocabulary, lime focuses on summarizing quantitative and qualitative information about these entities and the relations among them.

Metadata is attached in particular to three types of sets that lime distinguishes:

  1. the set of lexicalizations, containing the bindings between logical predicates in the ontology and lexical entries in the lexicon
  2. the set of conceptualizations, containing the bindings between lexical concepts in the concept set and entries in the lexicon
  3. the set of lexical links, linking lexical concepts from a concept set to references in an ontology

In the following sections, we provide detailed descriptions for the lime vocabulary to describe metadata for the lexicon as a whole as well as for the three types of sets described above. Metadata about ontologies (and domain datasets as well) and lexical concept sets can be provided by means of the already mentioned existing metadata vocabularies.

7.1 Lexicon and Lexicon Metadata

The main metadata-bearing entity in lemon is a lexicon object that represents a collection of lexical entries for a particular language. A small example lexicon consisting of four lexical entries for cat, marry, high and intangible assets would look as follows:


:lexicon a lime:Lexicon;
   lime:language "en";
   lime:entry :lex_high;
   lime:entry :lex_cat;
   lime:entry :lex_marry;
   lime:entry :lex_intangible_assets.

A lexicon is expected to consist of at least one lexical entry and is defined as a subclass of void:Dataset:

Lexicon (Class)

URI: http://www.w3.org/ns/lemon/lime#Lexicon

A lexicon represents a collection of lexical entries for a particular language or domain.

SubClassOf: entry min 1 ontolex:LexicalEntry, language exactly 1 rdfs:Literal, void:Dataset

The property linking a lexicon to a lexical entry is the property entry:

Entry (Object Property)

URI: http://www.w3.org/ns/lemon/lime#entry

The entry property relates a lexicon to one of the lexical entries contained in it.

Domain: Lexicon

Range: ontolex:LexicalEntry

The language property can be stated on either a lexicon or a lexical entry (note that all entries in the same lexicon should be in the same language and that the language of the lexicon and entry should be consistent with the language tags used on all forms) and its value should be a literal representing the language.

Language (Datatype Property)

URI: http://www.w3.org/ns/lemon/lime#language

The language property indicates the language of a lexicon, a lexical entry, a concept set or a lexicalization set.

Domain: Lexicon or ontolex:LexicalEntry or ConceptSet or LexicalizationSet

Range: xsd:language

In addition to the lime:language property, whose range is a literal, it is recommended to use the Dublin Core language property (dct:language) with a reference to either Lexvo.org or the Library of Congress vocabulary.

The property lexical entries indicates the number of lexical entries contained in a lexicon. The property is also used for lexicalization and conceptualization sets, indicating in this case the number of lexical entries involved in these sets.

Lexical Entries (Datatype Property)

URI: http://www.w3.org/ns/lemon/lime#lexicalEntries

The lexical entries property indicates the number of distinct lexical entries contained in a lexicon, lexicalization set or conceptualization set.

Domain: Lexicon or LexicalizationSet or ConceptualizationSet

Range: xsd:integer

The model also allows us to specify the linguistic (annotation) model used to describe characteristics of lexical entries via the linguisticCatalog property:

Linguistic Catalog (Object Property)

URI: http://www.w3.org/ns/lemon/lime#linguisticCatalog

The linguistic catalog property indicates the catalog of linguistic categories used in a lexicon to define linguistic properties of lexical entries.

Domain: Lexicon

Range: voaf:Vocabulary

As an example we may describe a simple lexicon using the above introduced properties in addition to Dublin Core properties. The part-of-speech of the four lexical entries is indicated using the lexinfo vocabulary, so that the value of linguisticCatalog is set to http://www.lexinfo.net/ontologies/2.0/lexinfo. In the example, there is one (RDF) resource that represents both the lexicon itself and its metadata:


:lexicon a lime:Lexicon;
   lime:language "en";
   dct:language <http://id.loc.gov/vocabulary/iso639-2/eng>, <http://lexvo.org/id/iso639-1/en> ;
   lime:lexicalEntries "4"^^xsd:integer;                                               
   lime:linguisticCatalog <http://www.lexinfo.net/ontologies/2.0/lexinfo> ;
   dct:description "This is an example lexicon"@en;                              
   dct:description "Questo è un lessico di esempio"@it;                          
   dct:creator <http://john.mccr.ae/>;                                           
   void:triples "30"^^xsd:integer ;
   lime:entry :lex_high;                                                     
   lime:entry :lex_cat;                                                      
   lime:entry :lex_marry;                                                    
   lime:entry :lex_intangible_assets.                                        

                                                                                
:lex_cat a ontolex:LexicalEntry, lexinfo:Noun;                                                
   ontolex:canonicalForm :form_cat.
:form_cat ontolex:writtenRep "cat"@en.                                          
                                                                                
:lex_marry a ontolex:LexicalEntry, lexinfo:Verb;                                              
   ontolex:canonicalForm :form_marry.                    
:form_marry ontolex:writtenRep "marry"@en .                                     
                                                                                
:lex_high a ontolex:LexicalEntry, lexinfo:Adjective;                                               
   ontolex:canonicalForm :form_high.                   
:form_high ontolex:writtenRep "high"@en .                                       
                                                                                
:lex_intangible_assets a ontolex:LexicalEntry, lexinfo:Noun;                               
  ontolex:canonicalForm :form_intangible_assets.               
:form_intangible_assets ontolex:writtenRep "intangible assets"@en.

7.2 Lexicalization Set

A lexicalization set is a void:Dataset that comprises a collection of so-called lexicalizations, which we understand as pairs of a lexical entry and an associated reference in the ontology.

Lexicalization Set (Class)

URI: http://www.w3.org/ns/lemon/lime#LexicalizationSet

A lexicalization set is a dataset that comprises a collection of lexicalizations, that is pairs of lexical entry and corresponding reference in the associated ontology/vocabulary/dataset.

SubClassOf: void:Dataset, lexiconDataset max 1 lime:Lexicon, referenceDataset exactly 1 void:Dataset, partition only LexicalizationSet, lexicalizationModel exactly 1

The lexicalization set is linked to the ontology and the lexicon by means of the properties reference dataset and lexicon dataset, respectively.

Reference Dataset (Object Property)

URI: http://www.w3.org/ns/lemon/lime#referenceDataset

The reference dataset property indicates the dataset that contains the domain objects or vocabulary elements that are either referenced by a given lexicon, providing the grounding vocabulary for the meaning of the lexical entries, or linked to lexical concepts in a concept set by means of a lexical link set.

Domain: LexicalizationSet or LexicalLinkset

Range: void:Dataset

Lexicon Dataset (Object Property)

URI: http://www.w3.org/ns/lemon/lime#lexiconDataset

The lexicon dataset property indicates the lexicon that contains the entries referred to in a lexicalization set or a conceptualization set.

Domain: LexicalizationSet or ConceptualizationSet

Range: Lexicon

The lexicon dataset property is optional in order to support other lexicalization models (e.g. RDFS, SKOS, SKOS-XL) that do not introduce a separate notion of lexicon, since in these models lexical entries exist only implicitly, as part of a lexicalization. The property lexicalization model indicates the specific lexicalization model used.

Lexicalization Model (Object Property)

URI: http://www.w3.org/ns/lemon/lime#lexicalizationModel

The lexicalization model property indicates the model used for representing lexical information. Possible values include (but are not limited to) http://www.w3.org/2000/01/rdf-schema# (for the use of rdfs:label), http://www.w3.org/2004/02/skos/core (for the use of skos:pref/alt/hiddenLabel), http://www.w3.org/2008/05/skos-xl (for the use of skosxl:pref/alt/hiddenLabel) and http://www.w3.org/ns/lemon/ontolex for lemon.

Domain: LexicalizationSet

Range: rdfs:Resource

SubPropertyOf: void:vocabulary
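To contrast the two cases just described, the following sketch shows two hypothetical lexicalization sets for the same ontology. One is based on plain rdfs:label values and therefore omits lime:lexiconDataset. The other is based on a lemon lexicon. The URIs :rdfsLabels, :ontolexLexicalization and :en_lexicon are illustrative assumptions.

```turtle
@prefix :     <http://www.example.com/> .
@prefix lime: <http://www.w3.org/ns/lemon/lime#> .
@prefix void: <http://rdfs.org/ns/void#> .

# Lexicalizations given as rdfs:label values: there is no separate
# lexicon, so lime:lexiconDataset is simply not stated.
:rdfsLabels a lime:LexicalizationSet ;
    lime:language "en" ;
    lime:lexicalizationModel <http://www.w3.org/2000/01/rdf-schema#> ;
    lime:referenceDataset :ontology .

# The same ontology lexicalized by the entries of a lemon lexicon.
:ontolexLexicalization a lime:LexicalizationSet ;
    lime:language "en" ;
    lime:lexicalizationModel <http://www.w3.org/ns/lemon/ontolex> ;
    lime:referenceDataset :ontology ;
    lime:lexiconDataset   :en_lexicon .

:ontology   a void:Dataset .
:en_lexicon a lime:Lexicon .
```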

The model defines the property references, which indicates the number of vocabulary elements lexicalized by at least one lexical entry. This number may be smaller than the number of entities in the ontology (if some vocabulary elements are not lexicalized) and also smaller than the number of lexical entries in the lexicon (if several lexical entries refer to the same ontology element).

References (Datatype Property)

URI: http://www.w3.org/ns/lemon/lime#references

The references property indicates the number of distinct ontology or vocabulary elements that are either associated with lexical entries via a lexicalization set or linked to lexical concepts via a lexical link set.

Domain: LexicalizationSet or LexicalLinkset

Range: xsd:integer

In the following example, we describe a lexicalization set expressing how elements of an ontology can be verbalized in Japanese by means of entries from a supplied lexicon. The metadata clearly tells which ontology and lexicon are involved in the lexicalization set, that is http://www.example.com/ontology and http://www.example.com/lexicon, respectively, as well as the relevant natural language. Knowing these facts about a lexicalization set allows us to assess its usefulness for a given task as well as to discover relevant lexicalization sets when we are constrained by the choice of an ontology, lexicon or natural language.

The ontology is modelled as an instance of the class voaf:Vocabulary that is a kind of void:Dataset representing vocabularies (both RDFS Schemas and OWL Ontologies). We benefit from the more specific distinctions made by VOAF, by breaking down the total number of entities in the ontology (held by the property void:entities) into separate counts for the classes and properties (held by voaf:classNumber and voaf:propertyNumber, respectively).

Similarly, terms from the lime vocabulary are used to represent statistics about the linguistic content of the lexicon and the lexicalization set. Overall, the ontology defines 100 entities and the lexicon 80 lexical entries; however, only 20 entities from the target ontology have been associated with a total of 50 lexical entries. In this sense, only 20 references from the ontology have been actually lexicalized by linking them to a lexical entry.

When counting the entities in the ontology or, in general, in the reference dataset, we recommend ignoring the resource describing the ontology itself (that is, the instance of the class owl:Ontology) as well as other metadata entities.


:Lexicalization a lime:LexicalizationSet ;
  lime:language "ja";
  dct:language  <http://id.loc.gov/vocabulary/iso639-1/ja>, <http://lexvo.org/id/iso639-3/jpn> ;
  lime:lexicalizationModel <http://www.w3.org/ns/lemon/all> ;
  lime:referenceDataset <http://www.example.com/ontology> ;
  lime:lexiconDataset <http://www.example.com/lexicon> ;
  lime:references 20 ;
  lime:lexicalEntries 50 .

<http://www.example.com/ontology> a owl:Ontology, voaf:Vocabulary, void:Dataset ;
  void:entities 100 ;
  voaf:classNumber 60 ;
  voaf:propertyNumber 40 .

<http://www.example.com/lexicon> a lime:Lexicon ;
  lime:language "ja" ;
  dct:language  <http://id.loc.gov/vocabulary/iso639-1/ja>, <http://lexvo.org/id/iso639-3/jpn> ;
  lime:lexicalEntries 80 .

A lexicalization set comprises a set of pairs of a lexical entry and the corresponding reference that the lexical entry denotes. These pairs are expressed differently depending on the lexicalization model adopted:

In addition to specifying the number of lexicalized entities in the ontology, it is also possible to give the total number of lexicalizations, that is, the total number of connections between lexical entries and references. In most cases this number should be the same as the total number of lexical senses defined in the lexicon. The value may be given as the absolute number of lexicalizations:

Lexicalizations (Datatype Property)

URI: http://www.w3.org/ns/lemon/lime#lexicalizations

The lexicalizations property indicates the total number of lexicalizations in a lexicalization set, that is the number of unique pairs of lexical entry and denoted ontology element.

Domain: LexicalizationSet

Range: xsd:integer

In addition to, or instead of, the absolute number of lexicalizations, the model also supports indicating the average number of lexicalizations per ontology element:

Average Number of Lexicalizations (Datatype Property)

URI: http://www.w3.org/ns/lemon/lime#avgNumOfLexicalizations

The average number of lexicalizations property indicates the average number of lexicalizations per ontology element.

Domain: LexicalizationSet

Range: xsd:decimal

The average number of lexicalizations is calculated as specified by the following formula:

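$$\mathrm{avgNumOfLexicalizations} = \frac{\mathrm{lexicalizations}}{\mathrm{entities}}$$
Figure 6: Formula for avgNumOfLexicalizations, where lexicalizations is the number of unique (lexical entry, ontology element) pairs in the lexicalization set and entities is the number of entities in the reference dataset (void:entities).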

The following example describes an ontology consisting of 30 ontology elements. The corresponding lexicalization set contains 20 lexicalizations involving 15 lexical entries, so some entries have multiple meanings in the ontology. On average, each element in the ontology thus has 20/30 ≈ 0.67 lexicalizations.

:Lexicalization a lime:LexicalizationSet ;
  lime:lexicalizations 20 ;
  lime:references 20 ;
  lime:lexicalEntries 15 ;
  lime:avgNumOfLexicalizations 0.67 ;
  lime:referenceDataset <http://www.example.com/ontology> ;
  lime:lexiconDataset <http://www.example.com/lexicon> .

<http://www.example.com/ontology> a owl:Ontology, void:Dataset ;
  void:entities 30 .
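
The counts in this example can be derived mechanically from the set of lexicalizations. The following is a minimal Python sketch that models each lexicalization as a (reference, lexical entry) pair of strings; the identifiers are synthetic and merely reproduce the figures of the example (30 elements, 20 lexicalizations, 20 references, 15 entries).

```python
"""Derive the lime counts of a lexicalization set from its pairs (sketch).

Assumption: a lexicalization is a (reference, lexical entry) pair of plain
strings; all identifiers below are synthetic.
"""


def lexicalization_stats(pairs, entities):
    """Return lime:lexicalizations, lime:references, lime:lexicalEntries and
    lime:avgNumOfLexicalizations for a collection of (reference, entry) pairs.

    `entities` is the number of entities in the reference dataset (void:entities).
    """
    unique_pairs = set(pairs)  # lexicalizations count unique pairs only
    return {
        "lexicalizations": len(unique_pairs),
        "references": len({ref for ref, _ in unique_pairs}),
        "lexicalEntries": len({entry for _, entry in unique_pairs}),
        "avgNumOfLexicalizations": len(unique_pairs) / entities,
    }


# 20 references, each lexicalized once; entries e0..e4 are reused, so only
# 15 distinct entries occur (some entries have two meanings in the ontology).
pairs = [(f"ref{i}", f"e{i % 15}") for i in range(20)]
stats = lexicalization_stats(pairs, entities=30)
print(stats["lexicalizations"], stats["references"], stats["lexicalEntries"])  # 20 20 15
print(stats["avgNumOfLexicalizations"])  # 0.6666666666666666
```

Deduplicating the pairs first matters because lime:lexicalizations is defined over unique pairs; a duplicated triple in the source data should not inflate the average.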

Finally, the percentage property may be used to express the percentage of entities in an ontology which are lexicalized, formally:

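$$\mathrm{percentage} = \frac{\mathrm{references}}{\mathrm{entities}}$$
Figure 7: Formula for percentage, where references is the number of entities in the reference dataset that have at least one lexicalization (or link) and entities is the number of entities in the reference dataset (void:entities).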

Percentage (Datatype Property)

URI: http://www.w3.org/ns/lemon/lime#percentage

The percentage property expresses the percentage of entities in the reference dataset which have at least one lexicalization in a lexicalization set or are linked to a lexical concept in a lexical linkset.

Domain: LexicalizationSet or LexicalLinkset

Range: xsd:decimal
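
The following is a minimal Python sketch of the percentage computation for a lexicalization set. It assumes the ratio reading references/entities (a value between 0 and 1), and it uses synthetic pairs chosen to match the first example in this section: 100 entities, 20 lexicalized references and 50 lexical entries.

```python
"""Compute lime:percentage for a lexicalization set or lexical linkset (sketch).

Assumption: percentage = references / entities, i.e. a fraction in [0, 1].
All identifiers below are synthetic.
"""


def percentage(pairs, entities):
    """Fraction of reference-dataset entities that occur in at least one
    (reference, lexical entry or lexical concept) pair."""
    if entities <= 0:
        raise ValueError("the reference dataset must contain at least one entity")
    references = {ref for ref, _ in pairs}
    return len(references) / entities


# 50 lexical entries spread over 20 references, in an ontology of 100 entities.
pairs = [(f"ref{k % 20}", f"entry{k}") for k in range(50)]
print(percentage(pairs, entities=100))  # 0.2
```

Only distinct references count, so how many entries lexicalize a given entity does not affect the result.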

7.3 Partitions

In many cases, we want to provide descriptive metadata about a subset of a lexicalization set, namely the subset containing all the lexicalizations for a certain type of ontology entity (class, property, etc.). To logically partition a lexicalization set, the lime module introduces the property partition:

Partition (Object Property)

URI: http://www.w3.org/ns/lemon/lime#partition

The partition property relates a lexicalization set or lexical linkset to a logical subset that contains lexicalizations for a given ontological type only.

Domain: LexicalizationSet or LexicalLinkset

Range: LexicalizationSet or LexicalLinkset

SubPropertyOf: void:subset

Resource Type (Object Property)

URI: http://www.w3.org/ns/lemon/lime#resourceType

The resource type property indicates the type of ontological entity of a lexicalization set or lexical linkset.

Domain: LexicalizationSet or LexicalLinkset

Range: rdfs:Class

Characteristics: Functional

For example, we may restrict our metadata to the logical partition of lexicalizations that denote an element in the extension of a particular class, here ontology:Country:

:Lexicalization a lime:LexicalizationSet ;
  lime:partition :CountryPartition ;
  lime:references 2000 .

:CountryPartition
  lime:resourceType ontology:Country ;
  lime:references 50 .

It is also possible to use RDF(S) or OWL types, such as owl:Class, as the value of the resource type property. This allows us, for instance, to state the number of classes that are lexicalized by at least one lexical entry:

:Lexicalization a lime:LexicalizationSet ;
  lime:partition :ClassPartition .

:ClassPartition
  lime:resourceType owl:Class ;
  lime:references 50 .
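
Partition-level references can be computed by grouping the lexicalized references by their type. The following is a minimal Python sketch. It assumes the rdf:type values of each reference are already available as a dictionary (for example from a reasoner), and the identifiers ontology:Germany, ontology:France and the entry names are hypothetical.

```python
"""Per-partition lime:references, grouped by lime:resourceType (sketch).

Assumption: `types` maps each reference to the set of classes it instantiates;
all identifiers are hypothetical.
"""
from collections import defaultdict


def references_by_type(pairs, types):
    """Return {resource type: number of distinct references lexicalized}."""
    refs_per_type = defaultdict(set)
    for ref, _entry in pairs:
        for rdf_type in types.get(ref, ()):
            refs_per_type[rdf_type].add(ref)
    return {t: len(refs) for t, refs in refs_per_type.items()}


types = {
    "ontology:Germany": {"ontology:Country"},
    "ontology:France": {"ontology:Country"},
    "ontology:Country": {"owl:Class"},
}
pairs = [
    ("ontology:Germany", ":germany_entry"),
    ("ontology:Germany", ":deutschland_entry"),
    ("ontology:France", ":france_entry"),
    ("ontology:Country", ":country_entry"),
]
print(references_by_type(pairs, types))
# {'ontology:Country': 2, 'owl:Class': 1}
```

Each partition counts distinct references, so Germany contributes once to the country partition even though it has two lexical entries.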

7.4 Lexical Linkset

Lexical linksets are similar in many ways to the lexicalization sets above, in the sense that they connect a concept set to an ontology. Their primary purpose is to describe the linking of a concept set, such as the synsets in a wordnet, to an ontology.

Lexical Linkset (Class)

URI: http://www.w3.org/ns/lemon/lime#LexicalLinkset

A lexical linkset represents a collection of links between a reference dataset and a set of lexical concepts (e.g. synsets of a wordnet).

SubClassOf: void:Linkset, conceptualDataset exactly 1 ontolex:ConceptSet, referenceDataset exactly 1 void:Dataset, partition only LexicalLinkset

The lexical linkset is linked to a concept set by means of the conceptual dataset property:

Conceptual Dataset (Object Property)

URI: http://www.w3.org/ns/lemon/lime#conceptualDataset

The conceptual dataset property relates a lexical linkset or a conceptualization set to a corresponding concept set.

Domain: LexicalLinkset or ConceptualizationSet

Range: ontolex:ConceptSet

Several properties are analogous to properties defined for a lexicalization set. For example, the concepts property indicates the number of concepts in a concept set:

Concepts (Datatype Property)

URI: http://www.w3.org/ns/lemon/lime#concepts

The concepts property indicates the number of lexical concepts defined in a concept set or involved in either a LexicalLinkset or ConceptualizationSet.

Domain: ontolex:ConceptSet or LexicalLinkset or ConceptualizationSet

Range: xsd:integer

Similarly, the links and avgNumOfLinks properties are analogous to the properties lexicalizations and avgNumOfLexicalizations.

Finally, we note that the references, percentage and partition properties apply to the lexical linkset in the same way as to the lexicalization set.

7.5 Conceptualization Set

A conceptualization set is analogous to a lexicalization set, but associates a concept set with a lexicon and consists of conceptualizations, that is, pairs formed by a single lexical entry and its associated lexical concept.

Conceptualization Set (Class)

URI: http://www.w3.org/ns/lemon/lime#ConceptualizationSet

A conceptualization set represents a collection of links between lexical entries in a lexicon and the lexical concepts in a concept set that they evoke.

SubClassOf: void:Dataset, lexiconDataset exactly 1 Lexicon, conceptualDataset exactly 1 ontolex:ConceptSet

A number of properties already described for other metadata entities can also be used in the description of a conceptualization set.

Additional properties have been defined specifically to characterize a given set of conceptualizations:

Conceptualizations (Datatype Property)

URI: http://www.w3.org/ns/lemon/lime#conceptualizations

The conceptualizations property indicates the number of distinct conceptualizations in a conceptualization set.

Domain: ConceptualizationSet

Range: xsd:integer

Average Ambiguity (Datatype Property)

URI: http://www.w3.org/ns/lemon/lime#avgAmbiguity

The average ambiguity property indicates the average number of lexical concepts evoked by each lemma/canonical form in the lexicon.

Domain: ConceptualizationSet

Range: xsd:decimal

Average Synonymy (Datatype Property)

URI: http://www.w3.org/ns/lemon/lime#avgSynonymy

The average synonymy property indicates the average number of lexical entries evoking each lexical concept in the concept set.

Domain: ConceptualizationSet

Range: xsd:decimal

The following example shows how to describe the metadata of a version of WordNet 3.0 transformed into RDF. The example illustrates how to describe the main components of the resource (a lexicon, a concept set and a conceptualization relating them). The transformation to RDF is based on a straightforward mapping between the WordNet meta-model and the ontolex model:

With this mapping in mind, it should be clear how some of the statistics about WordNet 3.0 can be specified by means of the vocabulary introduced by the lime module:

:WnConceptualizationSet a lime:ConceptualizationSet ;
  lime:conceptualDataset :WnConceptSet ;
  lime:lexiconDataset :WnLexicon ;
  lime:lexicalEntries "155287"^^xsd:integer ;
  lime:concepts "117659"^^xsd:integer ;
  lime:conceptualizations "206941"^^xsd:integer ;
  lime:avgAmbiguity "1.33"^^xsd:decimal ;
  lime:avgSynonymy "1.76"^^xsd:decimal
  .

:WnConceptSet a ontolex:ConceptSet ;
  lime:concepts "117659"^^xsd:integer .

:WnLexicon a lime:Lexicon ;
  lime:lexicalEntries "155287"^^xsd:integer .
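
The two averages in the WordNet example can be cross-checked against the other counts. The following is a minimal Python sketch. It assumes that avgAmbiguity is conceptualizations divided by lexical entries and that avgSynonymy is conceptualizations divided by concepts. This is our reading of the definitions above, and it reproduces the published values.

```python
"""Cross-check the WordNet 3.0 averages against the counts in the example (sketch).

Assumption: avgAmbiguity = conceptualizations / lexicalEntries and
avgSynonymy = conceptualizations / concepts.
"""


def avg_ambiguity(conceptualizations, lexical_entries):
    """Average number of lexical concepts evoked per lexical entry."""
    return conceptualizations / lexical_entries


def avg_synonymy(conceptualizations, concepts):
    """Average number of lexical entries evoking each lexical concept."""
    return conceptualizations / concepts


# Counts taken from the WordNet 3.0 metadata above.
CONCEPTUALIZATIONS = 206941
LEXICAL_ENTRIES = 155287
CONCEPTS = 117659

print(round(avg_ambiguity(CONCEPTUALIZATIONS, LEXICAL_ENTRIES), 2))  # 1.33
print(round(avg_synonymy(CONCEPTUALIZATIONS, CONCEPTS), 2))  # 1.76
```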

7.6 Formal definition of properties

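The lime module essentially provides vocabulary to describe the relations between three sets: the set O of entities in the reference dataset (ontology), the set L of lexical entries in the lexicon, and the set C of lexical concepts in the concept set.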

The model considers the following binary relations over these sets: R_lex ⊆ O × L (lexicalizations), R_con ⊆ L × C (conceptualizations) and R_link ⊆ O × C (links).

For each R_i, it holds that the relation is a subset of the Cartesian product of the involved sets, i.e. R_i ⊆ A × B.

For each of these relations R_i ⊆ A × B, we define the following counts:

and ratios:
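
One reading of these counts and ratios is sketched below. It is consistent with the property definitions and the worked examples in this section, but it is our reconstruction rather than a normative definition.

```latex
\begin{aligned}
\mathrm{cardinality}(R_i) &= |R_i| \\
\mathrm{count}(\pi_A(R_i)) &= |\{\, a \in A \mid \exists b \in B : (a, b) \in R_i \,\}| \\
\mathrm{count}(\pi_B(R_i)) &= |\{\, b \in B \mid \exists a \in A : (a, b) \in R_i \,\}| \\
\mathrm{coverage}_A(R_i) &= \frac{\mathrm{count}(\pi_A(R_i))}{|A|} \\
\mathrm{average}_A(R_i) &= \frac{\mathrm{cardinality}(R_i)}{|A|} \\
\mathrm{average}_B(R_i) &= \frac{\mathrm{cardinality}(R_i)}{|B|}
\end{aligned}
```

For R_lex, average_A is avgNumOfLexicalizations (20/30 for the 30-element ontology in the earlier example). For R_con, the WordNet 3.0 counts give average_L = 206941/155287 ≈ 1.33 (avgAmbiguity) and average_C = 206941/117659 ≈ 1.76 (avgSynonymy).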

The lime model does not define properties for all of the above counts and ratios for all three relations, but provides them only for the following:

| Relation | Related dataset | cardinality(R_i) | count(π_A(R_i)) | count(π_B(R_i)) | coverage_A(R_i) | average_A(R_i) | average_B(R_i) |
|---|---|---|---|---|---|---|---|
| R_lex ⊆ O × L | lime:LexicalizationSet | lime:lexicalizations | lime:references | lime:lexicalEntries | lime:percentage | lime:avgNumOfLexicalizations | N/A |
| R_con ⊆ L × C | lime:ConceptualizationSet | lime:conceptualizations | lime:lexicalEntries | lime:concepts | N/A | lime:avgAmbiguity | lime:avgSynonymy |
| R_link ⊆ O × C | lime:LexicalLinkset | lime:links | lime:references | lime:concepts | lime:percentage | lime:avgNumOfLinks | N/A |

7.7 Publication Scenarios

In this section, we describe different publication scenarios for lemon models. The lexicon ontology model essentially describes three types of entities:

Irrespective of their logical dependencies, all of the entities above can be published as physically independent data sources. At the other extreme, the entities can be published together as one data source.

We highlight four common publication scenarios:

  1. Independent resources: A reference dataset, a lexicon and a lexicalization set are published as independent data sources. This scenario is very common in the case of independently developed resources. A reference dataset and a general-purpose (i.e. not tailored towards that dataset) lexicon exist and are published separately (possibly by different publishers). A third party then decides to link these datasets by means of a lexicalization set, publishes it as a third entity and advertises it through appropriate lime metadata.
  2. Linking to 3rd party lexicon: A general-purpose lexicon is published as an independent resource. Then, in developing a reference dataset/ontology, its authors decide to publish it together with a lexicalization set based on the lexical entries from the existing lexicon.
  3. Linking to 3rd party ontology: A lexicon tailored to an existing reference dataset is published together with a lexicalization set. This is the opposite scenario to scenario 2 above. In this case the reference dataset or ontology vocabulary is the pre-existing resource developed by some 3rd party, and a lexicon is created ad hoc for it, with the associated lexicalizations.
  4. Integrated: Reference dataset, dataset-specific lexicon and lexicalization set are combined into a single data source. This scenario corresponds to closed environments where a single party is in control of the ontology, the lexicon and the lexicalizations and publishes the three as one dataset. In this scenario, the reference dataset is created and lexicalized with lexical elements created specifically for it. This is the typical setting of ontology vocabularies/datasets that are naturally lexicalized by means of rdfs:label or the SKOS or SKOS-XL labeling properties.

Similarly, there is the class ontolex:ConceptSet for a collection of lexical concepts and the class lime:ConceptualizationSet for the triples expressing how lexical concepts relate to lexical entries from a given lexicon. Considerations similar to the ones above apply to these datasets.

Identifying a concept set as an independent dataset allows the same lexical concepts to be reused across different conceptualization sets. For example, the lexical concepts of an existing wordnet can be reused to conceptualize a lexicon in a natural language other than the one for which the resource was initially conceived. Alternatively, it is possible to define a separate concept set for each conceptualization set and then relate these concept sets via a VoID Linkset.

8. Linguistic Description

An important goal of a lexicon is to record linguistic properties of the lexical entries it defines, such as their part of speech, gender, aspect, inflectional pattern, etc. The lemon model does not prescribe any vocabulary for doing so, but leaves it to the user of the model to select an appropriate vocabulary that is in line with a given theoretical linguistic framework or grammar. We show below how third-party category systems can be reused to describe the properties of lexical entries in a lemon lexicon. In our examples, we use the lexinfo ontology (http://lexinfo.net/ontology/2.0/lexinfo#) as such a third-party ontology describing relevant linguistic categories and properties.

8.1 Morphosyntactic Description

A lexicon typically indicates the part-of-speech of a given lexical entry. We can specify the part of speech of a word as follows using the lexinfo vocabulary:

:cat a ontolex:Word ;
  lexinfo:partOfSpeech lexinfo:noun .

When defining categories, it is crucial to link these categories to other models to establish coherence. The partOfSpeech property is defined as follows in lexinfo:

lexinfo:partOfSpeech 
  rdfs:label "part of speech"@en ;
  rdfs:comment "A category assigned to a word based on its grammatical and semantic properties."@en ;
  dcr:datcat <http://www.isocat.org/datcat/DC-1345> ,
             <http://www.isocat.org/datcat/DC-396> ;
  rdfs:range lexinfo:PartOfSpeech ;
  rdfs:subPropertyOf lexinfo:morphosyntacticProperty .

The concrete part of speech "noun" is defined as follows and linked to the ISOcat category DC-1333.

lexinfo:noun
  a lexinfo:PartOfSpeech, lexinfo:NounPOS ;
  rdfs:label "noun"@en ;
  rdfs:comment "Part of speech used to express the name of a person, place, action or thing."@en ;
  dcr:datcat <http://www.isocat.org/datcat/DC-1333> .

The following morpho-syntactic properties are defined in the lexinfo ontology:

When using these properties, care should be taken to distinguish between linguistic properties of the entry itself and properties of its individual forms. By default, it should be assumed that a property of a lexical entry also holds for all its forms. For instance, in many languages gender is a property of the entry for nouns but a property of the form for adjectives, as in the following Italian example:

:spiaggia a ontolex:Word ;
  ontolex:canonicalForm :spiaggia_lemma ;
  ontolex:otherForm :spiaggia_plural ;
  lexinfo:partOfSpeech lexinfo:noun ;
  lexinfo:gender lexinfo:feminine .

:spiaggia_lemma 
  ontolex:writtenRep "spiaggia"@it ;
  lexinfo:number lexinfo:singular .

:spiaggia_plural
  ontolex:writtenRep "spiagge"@it ;
  lexinfo:number lexinfo:plural .

:famoso a ontolex:Word ;
  ontolex:canonicalForm :famoso_lemma ;
  ontolex:otherForm :famosa_form, :famose_form, :famosi_form ;
  lexinfo:partOfSpeech lexinfo:adjective .

:famoso_lemma
  ontolex:writtenRep "famoso"@it ;
  lexinfo:number lexinfo:singular ;
  lexinfo:gender lexinfo:masculine .

:famosa_form 
  ontolex:writtenRep "famosa"@it ;
  lexinfo:number lexinfo:singular ;
  lexinfo:gender lexinfo:feminine .

:famose_form
  ontolex:writtenRep "famose"@it ;
  lexinfo:number lexinfo:plural ;
  lexinfo:gender lexinfo:feminine .

:famosi_form
  ontolex:writtenRep "famosi"@it ;
  lexinfo:number lexinfo:plural ;
  lexinfo:gender lexinfo:masculine .

For convenience, lexinfo also introduces a specific class for each part of speech, so that the part of speech of a word can be specified by an rdf:type statement. For example, the class Noun is defined as follows:

Noun ≡ ∃ partOfSpeech.NounPOS

It is recommended to use both the rdf:type statement and the lexinfo:partOfSpeech property to maximize interoperability, despite the slight redundancy:

:geneesmiddel a lexinfo:Noun ;
  lexinfo:partOfSpeech lexinfo:noun .
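
The redundancy recommended above matters for consumers that do not run an OWL reasoner, because they see only the statements that are asserted. The following is a minimal Python sketch of the inference a reasoner would draw from one direction of the axiom (a word whose part of speech is a NounPOS is a Noun). The two lookup tables are a hypothetical excerpt, not the full lexinfo ontology.

```python
"""Materialize lexinfo class types from lexinfo:partOfSpeech values (sketch).

Implements only the direction  ∃ partOfSpeech.NounPOS ⊑ Noun  of the axiom.
The lookup tables are a hypothetical excerpt, not the full lexinfo ontology.
"""

# Part-of-speech value -> its class (lexinfo:noun is typed lexinfo:NounPOS).
POS_VALUE_TYPES = {"lexinfo:noun": "lexinfo:NounPOS"}
# Part-of-speech class -> the corresponding lexical entry class.
POS_TYPE_TO_CLASS = {"lexinfo:NounPOS": "lexinfo:Noun"}


def infer_types(triples):
    """Return the rdf:type triples entailed by lexinfo:partOfSpeech statements."""
    inferred = set()
    for subject, predicate, obj in triples:
        if predicate == "lexinfo:partOfSpeech":
            pos_type = POS_VALUE_TYPES.get(obj)
            entry_class = POS_TYPE_TO_CLASS.get(pos_type)
            if entry_class is not None:
                inferred.add((subject, "rdf:type", entry_class))
    return inferred


data = [(":geneesmiddel", "lexinfo:partOfSpeech", "lexinfo:noun")]
print(infer_types(data))  # {(':geneesmiddel', 'rdf:type', 'lexinfo:Noun')}
```

Values missing from the excerpt are simply skipped, so the sketch never invents a type it cannot justify.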

8.2 Paradigmatic Description

Pragmatic aspects related to the usage of a lexical entry, as well as paradigmatic relationships between lexical entries, can also be described with the lemon model by resorting to an external vocabulary. As with the morphosyntactic properties of lexical entries and their forms, lemon does not prescribe any vocabulary but encourages the use of external vocabularies to describe aspects related to the temporal use of a lexical entry (e.g. to indicate whether the use of the lexical entry is modern or archaic) or to specify lexico-semantic relationships between lexical senses. Examples of such paradigmatic or lexico-semantic relationships are synonymy, antonymy, holonymy, hypernymy and meronymy.

8.3 Arguments

When describing syntactic frames it is important to specify the grammatical role or function played by different syntactic arguments. We might want to specify, for instance, which argument plays the grammatical role of subject and which argument plays the role of a direct object, etc. LexInfo distinguishes the following types of arguments:

Each argument type is associated with a specific property that relates a frame to the object representing the syntactic argument and thereby indicates the grammatical role of that argument.

:father a lexinfo:Noun ;
  synsem:synBehavior :father_frame.

:father_frame a lexinfo:NounPredicateFrame ;
  rdfs:label "X is the father of Y" , "X is Y's father" ;
  lexinfo:copulativeArg :father_frame_arg1 ;
  lexinfo:possessiveAdjunct :father_frame_arg2 .

:father_frame_arg1 a lexinfo:CopulativeArg .

:father_frame_arg2 a lexinfo:PossessiveAdjunct .

8.4 Frames

Syntactic or subcategorization frames describe which syntactic arguments a certain lexical entry (verb, noun, etc.) requires to be complete. A verb that requires a subject and a direct object is called a transitive verb. The corresponding frame that generalizes across particular verbs is called a transitive frame or transitive construction (in construction grammar theories).

In lexinfo, frames can be axiomatized by describing which types of arguments they subcategorize for. A transitive frame would be axiomatized as follows in lexinfo:

TransitiveFrame ≡ VerbFrame ⊓ (=1 subject ⊓ =1 directObject)
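
The cardinality restrictions of this axiom can be checked directly against the arguments attached to a frame. The following is a minimal Python sketch under a closed-world assumption. OWL itself is open-world, so a reasoner would not conclude from a missing argument that the restriction fails. The frame dictionaries and argument names are hypothetical.

```python
"""Check a frame against the TransitiveFrame axiom (closed-world sketch).

Assumption: a frame is given by its set of types and a dict mapping argument
properties to the argument nodes it lists; identifiers are hypothetical.
"""


def is_transitive_frame(frame_types, arguments):
    """True if the frame is a VerbFrame with exactly one subject and exactly
    one direct object (the =1 restrictions read as exact counts)."""
    return (
        "lexinfo:VerbFrame" in frame_types
        and len(arguments.get("lexinfo:subject", [])) == 1
        and len(arguments.get("lexinfo:directObject", [])) == 1
    )


# Hypothetical frame for "X loves Y".
loves_frame = {"lexinfo:subject": [":loves_arg1"], "lexinfo:directObject": [":loves_arg2"]}
# Hypothetical frame for "X sleeps": no direct object.
sleeps_frame = {"lexinfo:subject": [":sleeps_arg1"]}

print(is_transitive_frame({"lexinfo:VerbFrame"}, loves_frame))  # True
print(is_transitive_frame({"lexinfo:VerbFrame"}, sleeps_frame))  # False
```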

8.5 Other properties

It is also possible to define other properties in an external resource that may be difficult to translate across resources. An example of such a property is translation confidence, as shown below:

:bench a ontolex:LexicalEntry ;
        ontolex:lexicalForm [ ontolex:writtenRep "bench"@en].

:bench-sense a ontolex:LexicalSense ;
        ontolex:isSenseOf :bench .

:banco a ontolex:LexicalEntry ;
        ontolex:lexicalForm [ ontolex:writtenRep "banco"@es].

:banco-sense a ontolex:LexicalSense ;       
        ontolex:isSenseOf :banco .


:tranSetEN-ES a vartrans:TranslationSet ;
        dc:source <http://hdl.handle.net/10230/17110> ;
        vartrans:trans :bench_banco-trans .

:bench_banco-trans a vartrans:Translation ;
        vartrans:source :bench-sense ;
        vartrans:target :banco-sense .


:tranSetEN-ES a prov:Entity .  
:bench_banco-trans a prov:Entity .
    
:humanTranslationActivity a prov:Activity .
:executionOfMyAlgorithm a prov:Activity .

:bench_banco-trans prov:qualifiedGeneration [
     a prov:Generation ;
     prov:activity :humanTranslationActivity ;
     lexinfo:translationConfidence 1.0 ;
] .

:bench_banco-trans prov:qualifiedGeneration [
     a prov:Generation ;
     prov:activity :executionOfMyAlgorithm ;
     lexinfo:translationConfidence 0.3 ;
] .

9. Lexical Nets

Lexical nets, and wordnets in particular, are an important type of lexical resource that is very often used in natural language processing applications. Lexical nets organize the senses of words into groups of equivalent meaning, so-called synsets. Furthermore, synsets are related to each other by lexico-semantic relationships, so that the resource can be regarded as a "net". Below, we discuss how lexical nets can be represented with the lemon vocabulary, using Princeton WordNet as an example.

9.1 Lexical nets in OntoLex-lemon

As mentioned above, lexical nets indicate the different lexical senses that a word has and group these senses into sets of equivalent senses (so-called synsets). Below we state how the main entities of a lexical net (words, lemmas, senses and synsets) can be represented in lemon:

Lexico-semantic relations should be represented between lexical concepts. The WordNet-RDF ontology defines some of these lexico-semantic relations:

http://globalwordnet.github.io/schemas/wn#

A full description of the Global WordNet Association extension of OntoLex-lemon is available here, and an example of modelling is given below:

<#example-en> a lime:Lexicon ;
  rdfs:label "Example wordnet (English)"@en ;
  dc:language "en" ;
  schema:email "john@mccr.ae" ;
  cc:license <https://creativecommons.org/publicdomain/zero/1.0/> ;
  owl:versionInfo "1.0" ;
  schema:citation "CILI: the Collaborative Interlingual Index. Francis Bond, Piek Vossen, John P. McCrae and Christiane Fellbaum, Proceedings of the Global WordNet Conference 2016, (2016)." ;
  schema:url "http://globalwordnet.github.io/schemas/" ;
  dc:publisher "Global Wordnet Association" ;
  lime:entry <#w1>, <#w2>, <#w3> .

<#w1> a ontolex:LexicalEntry ;
  ontolex:canonicalForm [
    ontolex:writtenRep "grandfather"@en 
  ] ;
  wn:partOfSpeech wn:noun ;
  ontolex:sense <#example-10161911-n-1> .

<#example-10161911-n-1>  a ontolex:LexicalSense ;
  ontolex:reference <#example-10161911-n> .

<#w2> a ontolex:LexicalEntry ;
  ontolex:canonicalForm [
    ontolex:writtenRep "paternal grandfather"@en 
  ] ;
  wn:partOfSpeech wn:noun ;
  ontolex:sense <#example-1-n-1> .

<#example-1-n-1> a ontolex:LexicalSense ;
  ontolex:reference <#example-1-n> .

[] a vartrans:SenseRelation ;
  vartrans:source <#example-1-n-1> ;
  vartrans:category wn:derivation ;
  vartrans:target <#example-10161911-n-1> ;
  dc:creator "John McCrae"@en .
          
<#w3> a ontolex:LexicalEntry ;
  ontolex:canonicalForm [
    ontolex:writtenRep "pay"@en
  ] ;
  wn:partOfSpeech wn:verb ;
  synsem:synBehavior [
    rdfs:label "Sam cannot %s Sue"@en
  ], [
    rdfs:label "Sam and Sue %s"@en
  ], [
    rdfs:label "The banks %s the check"@en
  ] .

<#example-10161911-n> a ontolex:LexicalConcept ;
  skos:inScheme <#example-en> ;
  wn:ili ili:i90287 ;
  wn:definition [
    rdf:value "the father of your father or mother"@en
  ] .

[] 
  vartrans:source <#example-10161911-n> ;
  vartrans:category wn:hypernym ; 
  vartrans:target <#example-10162692-n> .
          
<#example-1-n> a ontolex:LexicalConcept ;
  skos:inScheme <#example-en> ;
  wn:definition [
    rdf:value "the father of your father"@en 
  ] ;
  wn:iliDefinition [
    rdf:value "the father of your father"@en ;
    dc:source "https://en.wiktionary.org/wiki/farfar"
  ] .

[]
  vartrans:source <#example-1-n> ;
  vartrans:category wn:hypernym ;
  vartrans:target <#example-10162692-n> .

<#example-sv> a lime:Lexicon ;
  rdfs:label "Example wordnet (Swedish)"@sv ;
  dc:language "sv" ;
  schema:email "john@mccr.ae" ;
  cc:license <https://creativecommons.org/publicdomain/zero/1.0/> ;
  owl:versionInfo "1.0" ;
  schema:citation "CILI: the Collaborative Interlingual Index. Francis Bond, Piek Vossen, John P. McCrae and Christiane Fellbaum, Proceedings of the Global WordNet Conference 2016, (2016)." ;
  schema:url "http://globalwordnet.github.io/schemas" ;
  dc:publisher "Global Wordnet Association" ;
  lime:entry <#w4> .

<#w4> a ontolex:LexicalEntry ;
  ontolex:canonicalForm [
    ontolex:writtenRep "farfar"@sv 
  ] ;
  ontolex:otherForm [
    ontolex:writtenRep "farfäder"@sv ;
    wn:tag "NNS" 
  ] ;
  wn:partOfSpeech wn:noun ;
  ontolex:sense <#example-2-n-1> .

<#example-2-n-1> a ontolex:LexicalSense ;
  ontolex:reference <#example-1-n> ;
  wn:example [
    rdf:value "Jag vill berätta för er att min farfar var svensk beredskapssoldat vid norska gränsen under andra världskriget, ett krig som Sverige stod utanför"@sv ;
    dc:source "Europarl Corpus"
  ] .
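
The English entry "paternal grandfather" and the Swedish entry "farfar" both have senses that refer to the same lexical concept <#example-1-n>, so a cross-lingual synset can be recovered by grouping entries by concept. The following is a minimal Python sketch over data hand-copied from the example (no RDF parsing is done). The "@en" and "@sv" suffixed keys are labels chosen for readability.

```python
"""Group lexical entries by the lexical concept their senses refer to (sketch).

Data is hand-copied from the example above; keys like "farfar@sv" are
readability labels, not IRIs.
"""
from collections import defaultdict

# Entry -> its senses.
ENTRY_SENSES = {
    "grandfather@en": ["#example-10161911-n-1"],
    "paternal grandfather@en": ["#example-1-n-1"],
    "farfar@sv": ["#example-2-n-1"],
}
# Sense -> lexical concept (ontolex:reference).
SENSE_CONCEPT = {
    "#example-10161911-n-1": "#example-10161911-n",
    "#example-1-n-1": "#example-1-n",
    "#example-2-n-1": "#example-1-n",
}


def entries_by_concept(entry_senses, sense_concept):
    """Return {lexical concept: set of entries with a sense referring to it}."""
    groups = defaultdict(set)
    for entry, senses in entry_senses.items():
        for sense in senses:
            groups[sense_concept[sense]].add(entry)
    return dict(groups)


synsets = entries_by_concept(ENTRY_SENSES, SENSE_CONCEPT)
print(sorted(synsets["#example-1-n"]))  # ['farfar@sv', 'paternal grandfather@en']
```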

10. Relation to Other Models

In this section, we informally clarify the relation of lemon to other models, in particular SKOS, the Lexical Markup Framework (LMF) and the Open Annotation standard.

10.1 SKOS(-XL)

SKOS is a vocabulary used to represent so-called knowledge organization systems (KOS), comprising taxonomies, classification schemes, thesauri, etc. SKOS thus addresses a use case orthogonal to that of lemon. lemon was designed to provide detailed information about the linguistic grounding of an ontological vocabulary, specifying in particular by which lexical entries a class or property can be verbalized. SKOS offers only a very rudimentary way of doing this, namely the labeling properties prefLabel, altLabel and hiddenLabel. This is by no means a criticism of SKOS; it merely makes clear that SKOS and lemon have been designed with different purposes and use cases in mind.

Nevertheless, SKOS and lemon can be used in conjunction to provide more detailed information about the "labels". The use case we address is one where a thesaurus or other taxonomic resource or classification system in SKOS needs to be enriched with more detailed linguistic information. We recommend using the property evokes and its inverse isEvokedBy to relate a skos:Concept to a lexical entry, as shown in the following example:

:financial_assets a skos:Concept;
                ontolex:isEvokedBy :financial_assets_lex.

:financial_assets_lex a ontolex:LexicalEntry;
                 ontolex:evokes :financial_assets;
                 ontolex:canonicalForm :financial_assets_form. 

:financial_assets_form ontolex:writtenRep "financial assets"@en.

The above represents the recommended way of linking a SKOS concept to a lexical entry in the lexicon ontology model.

To show how to make statements about preferred lexicalizations akin to the properties prefLabel, altLabel and hiddenLabel as used in SKOS, the following example shows how to attach such preference information via the lexical senses:

:tuberculosis a skos:Concept;
     ontolex:isEvokedBy :tuberculosis_lex;
     ontolex:isEvokedBy :consumption_lex.

:tuberculosis_lex a ontolex:LexicalEntry;
      ontolex:sense :tuberculosis_sense;
      ontolex:evokes :tuberculosis.
   
:tuberculosis_sense a ontolex:LexicalSense;
      ontolex:isLexicalizedSenseOf :tuberculosis; 
      ontolex:usage [ rdf:value "preferred" ].

:consumption_lex a ontolex:LexicalEntry;
       ontolex:sense :consumption_sense;
       ontolex:evokes :tuberculosis.

:consumption_sense a ontolex:LexicalSense;
        ontolex:isLexicalizedSenseOf :tuberculosis;
        ontolex:usage [ rdf:value "outdated" ]. 
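
A consumer that needs SKOS-style preferred and alternative labels could derive them from the usage notes on senses. The following is a minimal Python sketch over the tuberculosis example. It assumes the plain usage strings "preferred" and "outdated" used there; lemon does not fix these values.

```python
"""Derive prefLabel/altLabel-like groupings from sense usage notes (sketch).

Assumption: usage values are the plain strings from the example
("preferred", "outdated"); lemon does not define a fixed vocabulary for them.
"""

SENSES = [
    # (lexical entry, concept, usage value)
    (":tuberculosis_lex", ":tuberculosis", "preferred"),
    (":consumption_lex", ":tuberculosis", "outdated"),
]


def labels_for(concept, senses):
    """Split the entries evoking `concept` into preferred and alternative ones."""
    preferred = [e for e, c, usage in senses if c == concept and usage == "preferred"]
    alternative = [e for e, c, usage in senses if c == concept and usage != "preferred"]
    return preferred, alternative


print(labels_for(":tuberculosis", SENSES))
# ([':tuberculosis_lex'], [':consumption_lex'])
```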

If reified labels are used, as in SKOS-XL, it is possible to have forms or lexical entries in the range of the skosxl:prefLabel, skosxl:altLabel and skosxl:hiddenLabel properties. However, since the range of these properties is skosxl:Label, lexical entries and forms would then be inferred to be instances of skosxl:Label. This does not match this community's understanding of forms and lexical entries as linguistic objects rather than mere "labels".

10.2 LMF

The Lexical Markup Framework (LMF) (ISO 24613:2008) is a standard for representing machine-readable lexicons. However, the model is not suited to publishing lexica on the web as linked data, as it only defines an XML serialization rather than an RDF one. Furthermore, LMF does not address the interface between lexica and ontologies as lemon does.

Nevertheless, the lemon model draws heavily on LMF. lemon has imported many classes/entities from LMF and adopted its core ontology. On the other hand, lemon has added vocabulary to describe the syntax-semantics interface with respect to an ontology and has removed a number of classes that create syntactic overhead. A complete description of the relationship between LMF and the original lemon model is provided here. The main differences are summarized here:

10.3 OpenAnnotation

In many use cases, the need arises to annotate a text corpus with links to entities defined in a lexicon, e.g. lexical entries, forms, lexical senses, lexical concepts, etc. lemon does not support this kind of annotation per se, as other models are dedicated exactly to this, such as the Open Annotation standard. Used together with such a model, an element of the lexicon may be the body or the target of an annotation. This element may be a form, lexical entry, lexical sense or lexical concept, and it is important to give its class to make clear what kind of lexicon element the annotation involves.

We now give an example of annotating the word "cat" occurring at character 7 in the file http://www.example.com/doc.txt, where the lemon element is given as the body of the annotation:

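# The default namespace below is a hypothetical one chosen for this example.
@prefix : <http://www.example.com/annotations#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix dctypes: <http://purl.org/dc/dcmitype/> .
@prefix oa: <http://www.w3.org/ns/oa#> .
@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

:annotation a oa:Annotation ;
  oa:hasBody :cat ;
  oa:hasTarget :target .

:target a oa:SpecificResource ;
  oa:hasSource <http://www.example.com/doc.txt> ;
  oa:hasSelector :selector .

<http://www.example.com/doc.txt> a dctypes:Text .

:selector a oa:FragmentSelector ;
  dcterms:conformsTo <http://tools.ietf.org/rfc/rfc5147> ;
  rdf:value "char=7,10" .

:cat a ontolex:LexicalEntry .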
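
To locate the annotated word, a consumer interprets the selector value "char=7,10" according to the plain-text fragment identifiers of RFC 5147. These count positions between characters starting from 0, so the range selects the characters at indices 7, 8 and 9. The following is a minimal Python sketch that handles only this range form, not single positions or the integrity-check parameters of RFC 5147. The document text is hypothetical.

```python
"""Resolve an RFC 5147 "char=start,end" fragment against plain text (sketch).

Only the range form is supported; the document text below is hypothetical.
"""
import re

CHAR_RANGE = re.compile(r"^char=(\d+),(\d+)$")


def resolve_char_fragment(text, fragment):
    """Return the substring between character positions start and end."""
    match = CHAR_RANGE.match(fragment)
    if match is None:
        raise ValueError(f"unsupported fragment: {fragment!r}")
    start, end = int(match.group(1)), int(match.group(2))
    if not 0 <= start <= end <= len(text):
        raise ValueError("range outside the document")
    return text[start:end]


document = "My pet cat sleeps."  # hypothetical content of doc.txt
print(resolve_char_fragment(document, "char=7,10"))  # cat
```

Rejecting out-of-range positions, rather than silently clamping them, makes a stale annotation on an edited document fail visibly.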

11. Acknowledgements

The following persons have contributed to the creation of this document and are gratefully acknowledged.