Goals and Scope of Ontology-Lexica Community Group


Motivation

Ontologies have numerous applications, and they represent the conceptual backbone of the Semantic Web. Indeed, significant standardization efforts under the auspices of the W3C have produced “recommendations” for data and knowledge representation languages, namely the Resource Description Framework (RDF) and the Web Ontology Language (OWL).

While such ontology languages allow us to define logical theories consisting of ungrounded symbols and corresponding axioms, a grounding in language is crucial for rendering such ontologies for human consumption and thus for supporting meaningful interaction with them by human users. Going further, it seems reasonable to assume that access to the Semantic Web will to a large extent be mediated by language, as language is the natural means of expression and communication employed by humans.

However, current web-based knowledge representation languages such as OWL and RDF(S) lack the rich linguistic grounding that is required for language-mediated access to ontologies. OWL and RDF(S) rely on the property rdfs:label to capture the relation between a vocabulary element and its (preferred) lexicalization in a given language. This lexicalization provides, in some sense, a lexical anchor that makes the concept, property, individual etc. understandable to a user. The mechanisms for linguistic grounding available in OWL and RDF(S) are thus rudimentary at best, far from capturing the linguistic and lexical information that NLP applications working with a particular ontology need. Examples of such NLP applications are:

  • Natural language generation systems that produce coherent discourse verbalizing a set of triples.
  • Question answering systems that interpret user questions with respect to ontologies.
  • Text interpretation systems that interpret texts with respect to a given ontological vocabulary, extracting triples with respect to this vocabulary.
  • Information retrieval systems.
  • ...
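To make the limitation concrete, here is a minimal Python sketch (standard library only) that verbalizes a triple using nothing but rdfs:label. The ex: namespace, the resources Berlin, Germany and capitalOf, and the helper functions are hypothetical illustrations, not part of any existing ontology or library; triples are modelled as plain tuples rather than with an RDF toolkit.

```python
# Hypothetical example namespace; these URIs are made up for illustration.
EX = "http://example.org/ontology#"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Triples as (subject, predicate, object); literals are (text, language) pairs.
triples = [
    (EX + "Berlin", RDFS_LABEL, ("Berlin", "en")),
    (EX + "Germany", RDFS_LABEL, ("Germany", "en")),
    (EX + "capitalOf", RDFS_LABEL, ("capital", "en")),
]


def label(uri, lang="en"):
    """Return the rdfs:label of uri in the given language, or None."""
    for s, p, o in triples:
        if s == uri and p == RDFS_LABEL and o[1] == lang:
            return o[0]
    return None


def naive_verbalize(s, p, o, lang="en"):
    """Verbalize a triple by concatenating labels, falling back to the URI."""
    return " ".join(label(u, lang) or u for u in (s, p, o))


print(naive_verbalize(EX + "Berlin", EX + "capitalOf", EX + "Germany"))
# Berlin capital Germany
```

The result is not a well-formed sentence: the label "capital" says nothing about its part of speech, about the construction ("X is the capital of Y") in which it is used, or about which syntactic argument realizes the subject and which the object of the property. This is the kind of lexical information that rdfs:label cannot express.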


Milestones

The mission of the ontology-lexica group is to produce a specification for a lexicon-ontology model that can be used to provide rich linguistic grounding for domain ontologies. Rich linguistic grounding includes the representation of morphological and syntactic properties of lexical entries as well as the syntax-semantics interface, i.e. the meaning of these lexical entries with respect to the ontology in question. An important issue here will be to clarify how existing lexical and language resources can be leveraged and reused for this purpose. As a byproduct of our work on specifying a lexicon-ontology model, we hope that the model can become the basis for a web of lexical linked data (LLD): a network of lexical and terminological resources that are linked according to Linked Data principles, thus forming a large network of lexico-syntactic knowledge.
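A lexical entry with this kind of grounding might bundle morphology, a syntactic frame, and a mapping from the frame's argument slots to the subject and object of an ontology property. The following Python sketch shows one way to picture this; the LexicalEntry class, its fields, the frame notation and the ex:capitalOf property are all hypothetical and do not reproduce the vocabulary of the model the group will specify.

```python
from dataclasses import dataclass, field

EX = "http://example.org/ontology#"  # hypothetical ontology namespace


@dataclass
class LexicalEntry:
    lemma: str
    pos: str            # part of speech, e.g. "noun"
    forms: dict         # morphology: feature -> written form
    frame: str          # syntactic frame; {N} is the entry itself, others are slots
    reference: str      # URI of the ontology property the entry denotes
    arg_map: dict = field(default_factory=dict)  # slot -> "subject" / "object"


capital = LexicalEntry(
    lemma="capital",
    pos="noun",
    forms={"singular": "capital", "plural": "capitals"},
    frame="{X} is the {N} of {Y}",
    reference=EX + "capitalOf",
    arg_map={"X": "subject", "Y": "object"},
)


def verbalize(entry, subj_label, obj_label):
    """Fill the entry's frame, routing the triple's arguments via arg_map."""
    roles = {"subject": subj_label, "object": obj_label}
    slots = {slot: roles[role] for slot, role in entry.arg_map.items()}
    return entry.frame.format(N=entry.forms["singular"], **slots)


print(verbalize(capital, "Berlin", "Germany"))
# Berlin is the capital of Germany
```

Keeping the slot-to-argument mapping in arg_map, rather than hard-coding it in the frame, is what lets the entry state which syntactic argument realizes the subject of the property and which realizes its object.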


Specifying requirements and use cases for such a model will be an important milestone for this group. Roughly, the six milestones for the ontology-lexica group are as follows:

  • M1: Specification of Use Cases
  • M2: Specification of Requirements on the model
  • M3: Specification of the core lexicon-ontology model
  • M4: Development of an API for the model
  • M5: Development of further modules
  • M6: Release of the final specification including the core ontology-model and modules on which consensus has been reached.

General Requirements on the Model

Five important meta-requirements can already be stated:

  • R1: The actual model will be an OWL ontology, while a specific lexicon instantiating the model will be a plain RDF document.
  • R2 (“Multilinguality”): The model should support the specification of linguistic grounding with respect to any language.
  • R3 (“Semantics by reference”): The meaning of lexical entries will be specified through a principle we call “semantics by reference”, whereby the semantics of a lexical entry with respect to a given ontology is essentially specified by referencing the URI of the concept or property in question.
  • R4 (“Openness”): The lexicon-ontology model will be “open” in two ways. First, it will be extensible by new constructs as needed, e.g. by a particular application. Second, it will not make unnecessary choices with respect to which linguistic data categories to use, leaving open the possibility of very different instantiations of the lemon model. In this sense, lemon can be called a lexicon-ontology meta-model.
  • R5 (“Reuse of relevant standards”): We will aim to reuse as many standards as possible, in particular lexicon models such as LMF, terminology models such as TMF, and linguistic data categories.
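Requirements R1 to R3 can be illustrated together: a lexicon is a plain set of RDF triples, the language of each written form is given by a language tag, and the meaning of an entry is simply the URI of an ontology element. The sketch below, in Python with the standard library only, emits such a lexicon as N-Triples. All namespaces and property names (writtenForm, partOfSpeech, reference) are hypothetical placeholders, not terms of lemon or any published vocabulary.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical namespaces: LEX stands in for the vocabulary of the model
# (an OWL ontology, per R1); ONT is a domain ontology; LEXICON names entries.
LEX = "http://example.org/lexicon-model#"
ONT = "http://example.org/ontology#"
LEXICON = "http://example.org/lexicon/"


def entry_triples(entry_id, lemma, lang, pos, reference):
    """Describe one lexical entry as plain RDF triples (R1).

    The language tag on the lemma allows any language (R2); the meaning is
    given only by the URI of an ontology element (R3).
    """
    e = LEXICON + entry_id
    return [
        (e, RDF_TYPE, LEX + "LexicalEntry"),
        (e, LEX + "writtenForm", (lemma, lang)),  # literal: (text, language tag)
        (e, LEX + "partOfSpeech", LEX + pos),
        (e, LEX + "reference", reference),
    ]


def escape(text):
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def to_ntriples(triples):
    """Serialize (subject, predicate, object) tuples as N-Triples lines."""
    lines = []
    for s, p, o in triples:
        if isinstance(o, tuple):  # language-tagged literal
            obj = f'"{escape(o[0])}"@{o[1]}'
        else:                     # IRI
            obj = f"<{o}>"
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


# Two entries in different languages that share one meaning by reference.
lexicon = (entry_triples("capital-en", "capital", "en", "noun", ONT + "capitalOf")
           + entry_triples("Hauptstadt-de", "Hauptstadt", "de", "noun", ONT + "capitalOf"))
print(to_ntriples(lexicon))
```

Both the English and the German entry point to the same ontology URI, so they share their meaning by reference while their lexical information remains language specific.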

Scope

Having stated the main mission of the ontology-lexica group above, we consider it equally important to explicitly exclude certain goals. The mission of the group will not be:

  • to create an annotation format for annotating texts with ontological concepts (see older work on SHOE / OntoMat as well as recent work on RDFa)
  • to develop specific NLP tools that make use of lexicon ontologies
  • to develop yet another lexicon model (e.g. LMF, ...), though a meta-model for wrapping existing lexicons will probably be an outcome of this group's work
  • to develop yet another terminology standard (see TMF etc.)
  • to develop a model for translation memories or bilingual dictionaries