Specification of Use Cases

From Ontology-Lexica Community Group
Revision as of 16:15, 9 May 2012 by Astellat2 (Talk | contribs)


This document specifies relevant use cases that the ontology-lexicon model is expected to support.

Each use case has an identifier and one or more owners.

A description of a use case consists of

  • a general description/motivation,
  • a detailed example involving specific ontologies and triples as well as
  • a specification of minimal necessary knowledge that is needed to account for the example.

Ontology-based Information Extraction and Ontology Population from Text
Philipp Cimiano, Tobias Wunner, Paul Buitelaar
This use case describes how the ontology-lexicon model can be used as background knowledge in ontology-based information extraction systems. The system is assumed to be ontology-based in the sense that the triples it extracts from text conform to the vocabulary of the given ontology.
Imagine that our ontology models the gross domestic product as follows:
ex:gdp rdfs:subClassOf ex:FinancialIndicator
ex:year rdfs:domain ex:FinancialIndicator
ex:year rdfs:range xsd:string
ex:value rdfs:domain ex:FinancialIndicator
ex:value rdfs:range xsd:double
ex:country rdfs:range ex:Country

Assuming we want to populate our example ontology by analysing textual data, we might want to extract the triples below from the following sentence: "The GDP of Germany was $3.306 trillion in 2010.":

ex:gdp_germany_2010 rdf:type ex:gdp
ex:gdp_germany_2010 ex:year "2010"
ex:gdp_germany_2010 ex:country ex:Germany
ex:gdp_germany_2010 ex:value "3.306"

The same triples should be extracted from equivalent sentences in other languages, e.g.

  • Das Bruttoinlandsprodukt von Deutschland betrug im Jahr 2010 3.306 Billionen Dollar. (DE)
  • El producto interior bruto de Alemania ascendió a 3.306 billones de dólares en el 2010. (ES)

The knowledge required for this example is the following:

  • The relational noun "gross domestic product" subcategorizes a prepositional phrase headed by the preposition "of" expressing the country in question, as well as a second prepositional phrase headed by the preposition "in" expressing the relevant year.
  • The above syntactic frame for "gross domestic product", abbreviated here as "X is the gross domestic product of Y in Z", lexicalizes the class ex:gdp, where X is to be interpreted as filling the range of ex:value, Y as filling the range of ex:country and Z as filling the range of ex:year.
  • "GDP" is an abbreviation of "gross domestic product" and has the same subcategorization behaviour.
  • "Germany" refers to the entity ex:Germany.
Necessary Knowledge (Examples)
  • subcategorization information
  • mapping of subcategorization frame to predicates in the ontology
  • abbreviations
  • named entities and a mapping to individuals in the ontology
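
The way these pieces of knowledge combine can be illustrated with a minimal Python sketch. It is not part of the model itself: the dictionaries CLASS_LEXICON and ENTITY_LEXICON, the regular expression FRAME and the function extract_triples are hypothetical stand-ins for lexicon entries, and the pattern only covers the single English frame discussed above.

```python
import re

# Hypothetical mini-lexicon (assumption): surface forms -> ontology terms.
CLASS_LEXICON = {"gdp": "ex:gdp", "gross domestic product": "ex:gdp"}
ENTITY_LEXICON = {"Germany": "ex:Germany"}

# Illustrative pattern for the frame "the <noun> of Y was $X trillion in Z".
FRAME = re.compile(
    r"The (?P<noun>.+?) of (?P<country>\w+) was "
    r"\$(?P<value>[\d.]+) trillion in (?P<year>\d{4})",
    re.IGNORECASE,
)

def extract_triples(sentence):
    """Return the triples licensed by the frame, or [] if nothing matches."""
    match = FRAME.search(sentence)
    if match is None:
        return []
    # Abbreviations and full forms map to the same class.
    cls = CLASS_LEXICON.get(match["noun"].lower())
    # Named entities map to individuals in the ontology.
    country = ENTITY_LEXICON.get(match["country"])
    if cls is None or country is None:
        return []
    subject = "ex:gdp_{}_{}".format(match["country"].lower(), match["year"])
    return [
        (subject, "rdf:type", cls),
        (subject, "ex:year", '"{}"'.format(match["year"])),
        (subject, "ex:country", country),
        (subject, "ex:value", '"{}"'.format(match["value"])),
    ]
```

Calling extract_triples on the example sentence yields four triples about ex:gdp_germany_2010. A real system would obtain the frame and the mappings from the ontology-lexicon rather than from hard-coded tables.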

Ontology-based Question Answering
Philipp Cimiano, Nitish Aggarwal
This use case describes how the ontology-lexicon model can be used as background knowledge in an ontology-based question answering system that interprets natural language questions with respect to a given ontology.
Imagine a user posing the question "Who painted the Mona Lisa?" to a semantic question answering engine that has indexed all the RDF data available on the Web, in particular DBpedia. DBpedia contains a triple:

<http://dbpedia.org/resource/Mona_Lisa> <http://dbpedia.org/property/artist> "Leonardo Da Vinci"

The ontology-lexicon should contain all the relevant lexico-semantic knowledge that is needed to translate the question "Who painted the Mona Lisa?" into a SPARQL query such as the following:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?who WHERE {
  <http://dbpedia.org/resource/Mona_Lisa> <http://dbpedia.org/property/artist> ?who
}

The following sentences in other languages should be mapped to the same query:

  • Wer malte die Mona Lisa? (DE)
  • ¿Quién es el pintor de la Mona Lisa? (ES)

This SPARQL query would retrieve the answer from the data graph above, i.e. "Leonardo Da Vinci".

The lexico-linguistic knowledge necessary to map the above questions into the right SPARQL query includes at least the following:

  • "paint" is a transitive verb that subcategorizes for a subject and an object. The subcategorization frame can be abbreviated as "X:subj paint Y:obj".
  • The lexical entry "paint" and the corresponding subcategorization frame "X:subj paint Y:obj" can be interpreted as the triple Y <http://dbpedia.org/property/artist> X.
  • "Mona Lisa" lexicalizes <http://dbpedia.org/resource/Mona_Lisa>.
  • "painted" is the simple past or past participle of "paint".
Necessary Knowledge
  • verb knowledge, including subcategorization information
  • mapping of the subcategorization frame to ontological predicates
  • named entities and their mapping to URIs
  • inflectional and morphological variants of base forms
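
As a rough illustration of how the four kinds of knowledge listed above interact, the following Python sketch turns the English question into a SPARQL string. The tables INFLECTED_FORMS, VERB_TO_PROPERTY and ENTITY_LEXICON, the pattern QUESTION and the function question_to_sparql are all assumptions made for this example; they only cover "Who <verb> (the) <entity>?" questions.

```python
import re

# Hypothetical lexicon tables (assumptions), mirroring the knowledge above.
INFLECTED_FORMS = {"painted": "paint", "paints": "paint", "paint": "paint"}
# The frame "X:subj paint Y:obj" is read as the triple  Y <property> X.
VERB_TO_PROPERTY = {"paint": "http://dbpedia.org/property/artist"}
ENTITY_LEXICON = {"mona lisa": "http://dbpedia.org/resource/Mona_Lisa"}

QUESTION = re.compile(r"^Who (?P<verb>\w+) (?:the )?(?P<obj>.+?)\?$", re.IGNORECASE)

def question_to_sparql(question):
    """Map a 'Who <verb> <entity>?' question to a SPARQL query, or None."""
    match = QUESTION.match(question.strip())
    if match is None:
        return None
    lemma = INFLECTED_FORMS.get(match["verb"].lower())   # morphology
    prop = VERB_TO_PROPERTY.get(lemma)                   # frame -> predicate
    entity = ENTITY_LEXICON.get(match["obj"].lower())    # named entity -> URI
    if prop is None or entity is None:
        return None
    # The question word fills the subject slot X, the object fills Y.
    return "SELECT ?who WHERE {{ <{}> <{}> ?who }}".format(entity, prop)
```

The German and Spanish questions would need their own lexicon entries and patterns, but should end up producing the same query string.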

Natural Language Generation from Triples
Philipp Cimiano, Brian Davis
This use case describes how the ontology-lexicon model can be used as background knowledge in a system that verbalizes triples, i.e. one that generates natural language text from a given set of triples using the vocabulary defined by some ontology.

Imagine we have the following triples given:

dbp:Claude_Monet dbo:name "Claude Oscar Monet"
dbp:Claude_Monet dbpr:birthdate dbp:Claude_Monet/dateOfBirth/birth_date
dbp:Claude_Monet dbo:deathPlace dbp:Giverny
dbp:Claude_Monet dbo:nationality "French"
dbp:Claude_Monet dbo:field "Painting"
dbp:Claude_Monet/dateOfBirth/birth_date dbpr:day "14"^^xsd:integer
dbp:Claude_Monet/dateOfBirth/birth_date dbpr:month "11"^^xsd:integer
dbp:Claude_Monet/dateOfBirth/birth_date dbpr:year "1840"^^xsd:integer
dbp:Claude_Monet/dateOfBirth/death_date dbpr:day "5"^^xsd:integer
dbp:Claude_Monet/dateOfBirth/death_date dbpr:month "12"^^xsd:integer
dbp:Claude_Monet/dateOfBirth/death_date dbpr:year "1926"^^xsd:integer

Assume further that an NLP system should generate the following sentences for the above triples in different languages:

  • "Claude Monet was a French painter born in Paris on the 14th of November of 1840. He died in Giverny on the 5th of December of 1926." (EN)
  • "Claude Monet war ein französischer Maler, der in Paris am 14. November 1840 geboren wurde. Er starb in Giverny am 5. Dezember 1926." (DE)
  • "Claude Monet fue un pintor francés nacido en París el 14 de noviembre de 1840. Murió en Giverny el 5 de diciembre de 1926." (ES)

In order to generate such discourses from the above triples, the following lexico-linguistic knowledge is needed (shown for English; the other languages are analogous):

  • "(be) born" requires a subject (the entity given birth to), a prepositional phrase headed by the preposition "on" (expressing the day of birth) as well as a prepositional phrase headed by the preposition "in" (expressing the place of birth).
  • The above-mentioned subcategorization frame "X be (born) in Y on Z" lexicalizes the following triples: { X onto:birthplace Y, X onto:birthdate Z }.
  • The adjective "French", when modifying an entity X, expresses the triple "X onto:nationality "French_people"".
  • The noun "painter", when predicated of an entity X, expresses the triple "X onto:field "Painting"".
  • The verb "die" requires a subject as well as a prepositional phrase headed by the preposition "on" (expressing the date of death) as well as a prepositional phrase headed by the preposition "in" (expressing the place of death).
Necessary Knowledge
  • verb knowledge together with subcategorization information
  • adjective and noun knowledge, and their mapping to ontological properties and classes
  • morphological and inflectional knowledge (for generation)
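
A minimal Python sketch of the generation direction is given below. It only verbalizes the nationality, field and birth date, since those are the facts present in the example triples. The lookup tables ADJECTIVES and NOUNS and the functions ordinal, verbalize_date and verbalize_person are hypothetical names introduced for illustration, not part of any existing system.

```python
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

# Hypothetical lexicon (assumption): (property, value) -> English word.
ADJECTIVES = {("onto:nationality", "French"): "French"}
NOUNS = {("onto:field", "Painting"): "painter"}

def ordinal(day):
    """Return the English ordinal form of a day number, e.g. 14 -> '14th'."""
    if 10 <= day % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(day % 10, "th")
    return "{}{}".format(day, suffix)

def verbalize_date(day, month, year):
    """Render a (day, month, year) triple in the style of the example."""
    return "the {} of {} of {}".format(ordinal(day), MONTHS[month - 1], year)

def verbalize_person(name, nationality, field, birth_date):
    """Fill the frame 'X was a <adj> <noun> born on Z' from triple values."""
    adjective = ADJECTIVES[("onto:nationality", nationality)]
    noun = NOUNS[("onto:field", field)]
    return "{} was a {} {} born on {}.".format(
        name, adjective, noun, verbalize_date(*birth_date))
```

For the Monet triples, verbalize_person("Claude Monet", "French", "Painting", (14, 11, 1840)) produces the first half of the English target sentence without the place of birth.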

Integration and publishing of legacy language resources
Lexical Linked Data
John McCrae, Aldo Gangemi, Armando Stellato
There is a large number of machine-readable lexical resources already available in various formats, structured according to different schemata. However, these resources are not accessible on the Web, and the schemata are both implicit and mutually incompatible. Through a set of good practices proposed by this CG, it will be possible to publish the resources in a homogeneous format (RDF), and link them to the LOD cloud. In addition, the alignment of the resource schemata to an open lexicon ontology designed by this CG (or directly the representation of those resources according to the open lexicon ontology) will enable a deeper and sounder semantic interoperability.

WordNet is a first example, as it is a widely-used, broad-coverage resource for English. It has already been published as linked data in two versions: 2.0 (work of the W3C Task Force on porting WordNet to the Semantic Web), and 3.0 (work of the Free University of Amsterdam). However, the schema (ontology) used in (RDF) WordNet is a conservative refactoring of the original WordNet database schema, and is not easily comparable to other resources using different schemata. For example, the class wn20:Synset in the WordNet ontology can be aligned to some other class in an RDF lexical dataset, e.g. to the class framenet:Frame.

Other lexical resources have been ported to the LOD cloud, e.g. FrameNet (work of STLab at ISTC-CNR), each with its own ontology conservatively based on the original schema.

Wiktionary is another resource that could be improved through the use of linked data publishing. The publicly available version of Wiktionary is in MediaWiki (presentation) mark-up, and hence is difficult for machines to understand and often inconsistent. There are a number of ongoing attempts to fix this, but there is no consensus on the correct representation of the data or on how it should interact with other resources in the linked data cloud (notably WordNet and DBpedia). It is clear, however, that Wiktionary is a very rich resource containing significantly richer information than WordNet, so any existing format would need to be significantly extended to capture all the information in Wiktionary.

It's therefore very relevant for interoperability at the ontology and data layers of lexical resources to establish equivalences or similarities between the respective original ontologies, and an open lexicon format.

/* For example, it is important to run linking algorithms generating owl:sameAs or skos:closeMatch triples between data of the same or similar type e.g. between a WordNet synset and a FrameNet frame, and not between a WordNet word and a FrameNet frame. In the first case, we could get a (useful) union of documents annotated with wn20Synset:Desire and those annotated with fnFrame:Desiring, while in the second case, we could get a noisy union of documents annotated with wn20Word:desire and fnFrame:Desiring. The noisy case is evident if we consider a philologist's paper on the history of the word desire (annotated with wn20Word:desire), and a biochemistry article on the physiology of desire (annotated with fnFrame:Desiring). */
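
The type check described in this note can be sketched in a few lines of Python. The KIND table, which assigns synsets and frames to a shared "sense-level" kind and words to a "word-level" kind, is an assumption made for illustration, as is the function propose_links.

```python
# Hypothetical typing of resource entries (assumption): synsets and frames
# are treated as comparable "sense-level" items, words as "word-level" items.
KIND = {
    "wn20Synset:Desire": "sense-level",
    "wn20Word:desire": "word-level",
    "fnFrame:Desiring": "sense-level",
}

def propose_links(candidate_pairs):
    """Keep only candidate pairs whose members are of the same known kind."""
    links = []
    for left, right in candidate_pairs:
        left_kind = KIND.get(left)
        if left_kind is not None and left_kind == KIND.get(right):
            links.append((left, "skos:closeMatch", right))
    return links
```

With the two candidate pairs from the note, only the synset-frame pair survives; the word-frame pair is filtered out before any link is generated.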

One possibility (compatible with suggesting a lexicon model) is also to keep existing resources already expressed in RDF, and find a good metamodel to fit their content and make it compatible/comparable. Here is a simple example of a pattern for linking external linguistic resources to an ontology through a metamodel:

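<wn20schema:NounSynset rdf:about="wn20instances:synset-entity-noun-1" rdfs:label="entity"/>

<rdf:Description rdf:about="wn20schema:Synset">
    <rdfs:subClassOf rdf:resource="ml:SemanticDescriptor"/>
</rdf:Description>

<!-- XYZ is a placeholder for any concept from an ontology -->
<rdf:Description rdf:about="XYZ">
    <otml:semanticDescriptor rdf:resource="wn20instances:synset-entity-noun-1"/>
</rdf:Description>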


The prefixes ml and otml indicate two possible subvocabularies, for "meta-lexicon" and "ontology to meta-lexicon linking" respectively.

ml:SemanticDescriptor indicates a very general concept of "semantic collector" in lexicons (such as synonymy sets, i.e. synsets, in WordNet).

otml:semanticDescriptor is a pointer to ml:SemanticDescriptor(s).

Thus the WordNet synset 100001740 is declared in wn20schema as a NounSynset (and thus a Synset), and Synset is an ml:SemanticDescriptor. A given XYZ concept from an ontology is then "decorated" with a reference to the WordNet synset.

Obviously, all names are provisional and only meant as a first suggestion.

Necessary Knowledge

The requirements for this use case concern the ability of an open lexicon ontology to enable a smooth representation of data from existing LRs. This includes but is not limited to:

  • Metadata about the type of Linguistic Resource (a lexicon, a frame, a lexical entry, ...) and its characteristics. A LR may belong to more than one type.
  • Representation of forms, senses, synsets, sentences, and other entities contained in LRs. It is important to find the proper balance between generality and expressiveness: should we commit to any particular theory? Should we formalize synsets, or define a more general word-collector of which the synset, as defined in WordNet's theory, is a subtype?
  • Representation of relations between entities from LRs, such as synonymy, derived forms, translations, hypernyms, antonyms, frame-role associations, and other relations.
  • Representation of etymology, pronunciation, definitions and (usage) examples that typically document and/or enrich the structure of LRs.
  • Good practices for linking LRs between them, as well as to ontologies in the LOD cloud, with case studies and known anti-patterns (typical errors).
  • Tools for translating, refactoring, linking, and publishing LRs in the LOD cloud

The objective here is to develop a minimal yet sufficiently expressive model that can represent various lexical resources in a uniform and principled way and that supports the linking between them.

Representation of Translations in the Web of Data
Elena Montiel-Ponsoda, John McCrae
This use case describes how the representation mechanisms provided by the lexicon-ontology model can be used to represent translations in the Web of Data, taking advantage of the linguistic descriptions associated with ontologies and data sets.
Example 1: Imagine that we have two data sets in the Web of Data about administrative and governmental organizations in Europe. One represents the administrative organization of Great Britain and is documented in English; the other represents the administrative organization of Spain and has labels in Spanish. In each of the ontologies we find the concept of "head of the executive branch of the government". In the English ontology the label describing this concept is "Prime Minister". In the Spanish ontology, the corresponding label is "Presidente del Gobierno". For certain purposes, such as interoperability at a European level, we may want to express that one label is the cultural equivalent translation of the other. This means that although they cannot be considered "exact conceptual equivalents", in certain contextual conditions it may be convenient or adequate to express that one is a translation of the other.

Example 2: Now let us imagine that we have one data set in the Web of Data about e-commerce. This could be described according to the GoodRelations ontology. We may want to translate the GoodRelations ontology into Spanish in order to annotate information about e-commerce in Spanish. In this case we also need to represent translation relations even when descriptions in English and Spanish are pointing to the same ontology concepts. For example, we may want to express that the English label "payment methods" is translated as "medios de pago" into Spanish.

Necessary Knowledge
  • We may need to specify in which contextual conditions two labels or terms can be considered translations of each other.
  • Terms in the different languages can point to the same ontology or resource, or they can belong to lexicons or natural language descriptions of different ontologies or datasets, so we need mechanisms to represent these translation links across ontologies and data sets in different languages.
  • Translations may also be of different sorts. We may be talking about "cultural equivalents", or if no equivalent exists, we may want to provide a description in the target language. In our first example this would involve including the label "Primer Ministro británico" in Spanish.
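
One possible way to hold this information in memory is sketched below in Python. The TranslationLink record, its field names and the kind values "descriptive translation" and "direct translation" are assumptions made for this example ("cultural equivalent" is the term used above); the model may end up representing these distinctions quite differently.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class TranslationLink:
    """Hypothetical record for one translation relation between two labels."""
    source_label: str
    source_lang: str
    target_label: str
    target_lang: str
    kind: str                      # sort of translation (assumed vocabulary)
    context: Optional[str] = None  # conditions under which the link holds

LINKS = [
    TranslationLink("Prime Minister", "en", "Presidente del Gobierno", "es",
                    "cultural equivalent", "interoperability at a European level"),
    TranslationLink("Prime Minister", "en", "Primer Ministro británico", "es",
                    "descriptive translation"),
    TranslationLink("payment methods", "en", "medios de pago", "es",
                    "direct translation"),
]

def find_translations(label, target_lang, kind=None):
    """Return target labels for a source label, optionally filtered by kind."""
    return [link.target_label for link in LINKS
            if link.source_label == label
            and link.target_lang == target_lang
            and (kind is None or link.kind == kind)]
```

Keeping the kind and the context on the link itself, rather than on either label, reflects the point above that the same two labels may count as translations only under certain conditions.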

Benefits of rich Linguistic Descriptions for Ontology Translation
Mihael Arcan, Elena Montiel-Ponsoda
This use case describes how rich linguistic descriptions associated to ontology elements can help in the activity of ontology translation or localization.

The type of linguistic descriptions associated to ontology elements can range from simple part-of-speech annotations or terminology variation to a deeper morphosyntactic analysis of labels or a description of syntactic frames (subcategorization), for example. In the following we include several examples of how linguistic descriptions may contribute to obtaining the most adequate translation candidates.

Example 1: It is well known that part-of-speech annotations may help in disambiguating translation candidates. Let us take the example of "book". If "book" is describing an element of the ontology, it can refer to the noun "a book" as a set of written, printed sheets bound together, or to the verb "to book", as to make a room reservation in a hotel. In Spanish, for example, the noun would be translated as "libro" and the verb as "reservar". Apart from the semantic context provided by the ontology, this simple linguistic analysis can already give some hints about the most appropriate translation.

Example 2: If term variation is captured in the linguistic descriptions associated to the ontology, i.e., if several variants (such as orthographical variants, dialectal variants or register variants) are used to describe ontology elements, the chances of getting the correct translation will increase. Let us imagine that we have the concept "headache" in our ontology, defined as "a pain in the region of the head or the neck", and two register variants associated to it: "headache" and "cephalalgia". Having both variants increases the probability of finding the right translation for the concept in general and in specialized dictionaries.

Example 3: A richer analysis of labels can also help: identifying the subterms of a complex label can improve the translation of every label that contains the same subterm. For example, the label "cost of raw materials, consumables and supplies, and of purchased merchandise" contains the financial subterm "raw materials, consumables and supplies", which is stored in the ontology and linked with the German translation "Roh-, Hilfs- und Betriebsstoffe". In such cases the subterm can be matched and translated as a single unit, which yields the proper translation, instead of being split into even smaller pieces.

Necessary Knowledge
  • basic linguistic description (part-of-speech, canonical forms, natural language of the label...)
  • morphology (inflectional variants)
  • variation (synonymy, terminology variation...)
  • phrase structure of multi-word forms (identification of subterms within a term)
  • syntactic frames (subcategorization frames)
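
Examples 1 and 3 can be sketched together in Python. The tables TRANSLATIONS_ES and SUBTERMS_DE and the two functions are hypothetical; in particular, the extra entry "raw materials" is only added here to show why the longest known subterm should win.

```python
# Hypothetical lexicon (assumption): (lemma, part of speech) -> Spanish.
TRANSLATIONS_ES = {("book", "noun"): "libro", ("book", "verb"): "reservar"}

# Hypothetical subterm lexicon (assumption): English subterm -> German.
SUBTERMS_DE = {
    "raw materials, consumables and supplies": "Roh-, Hilfs- und Betriebsstoffe",
    "raw materials": "Rohstoffe",
}

def translate_by_pos(lemma, pos):
    """Pick the translation candidate that matches the part of speech."""
    return TRANSLATIONS_ES.get((lemma, pos))

def longest_known_subterm(label, subterms):
    """Return (subterm, translation) for the longest subterm found in label."""
    found = [subterm for subterm in subterms if subterm in label]
    if not found:
        return None
    best = max(found, key=len)
    return best, subterms[best]
```

Preferring the longest match keeps "raw materials, consumables and supplies" together as one unit instead of translating "raw materials" on its own.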

Ontology-based Machine Translation
John McCrae, Elena Montiel Ponsoda
Similarly to the above use case (LDOT), the use of ontology semantics should help in the translation of text documents. The use of an ontology-lexicon allows elements in the text to be identified and extra information about their semantic content to be obtained, aiding principally in the problem of deducing valid ontology candidates. In addition, the use of more sophisticated linguistic descriptions allows the syntax of a term to be better understood by means of ontological constraints.

Example 1: An ontology-lexicon can be used to disambiguate terms in text based on their syntactic information and semantic context. For example, the sentence "Push the rudder to the left to bank the airplane" contains the highly ambiguous term "to bank". Firstly, this can be disambiguated by applying part-of-speech tagging to deduce that this entry is a verb, hence ruling out translations such as "Bank" (financial institution) or "Ufer" (river bank). Further, looking at the surrounding words ("rudder", "airplane") indicates that the word is likely to be associated with the domain of air travel, so the correct translation should come from that domain. For reference, Google Translate generated "Schieben Sie das Ruder nach links, um das Flugzeug Bank", where "Bank" should be "zu neigen".

Example 2: Semantic role labeling has been shown by many authors to be beneficial for machine translation. This involves identifying the subcategorization of verbs and their arguments, for which a semantic role lexicon is required. If such a lexicon were bound to an ontology, it would be possible to constrain the applications of a frame not just by the syntactic role of the arguments but also by their semantic type. For example, the verb "to know" is translated into German as either "kennen" or "wissen", the former generally when the object is a person, and the latter when the object is a fact.

Example 3: Reference resolution is the process of finding the referent of anaphors such as pronouns. Languages use pronouns in very different ways. For example, English uses pronouns differentiated by gender and number, but uses the neuter gender for all inanimate objects; in contrast, German uses gendered pronouns for inanimate objects, based on the grammatical gender of the referent. Furthermore, some languages do not use gender for pronouns (e.g., Turkish) or omit pronouns altogether (e.g., zero anaphora in Japanese). As the referent does not necessarily occur close to the anaphor in the text, deep processing of the document is required, and the selection of the correct translation requires knowledge about the grammatical gender, number and semantic class of the referent.

Necessary Knowledge
  • POS, case, gender, number etc. of words
  • Frames and syntactic roles of the labels
  • Top-level semantic classes of referenced ontology entities
  • Domain/range restriction of ontological predicate
  • Higher-order ontological predicates (e.g., order > 2)
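
The "kennen"/"wissen" choice from Example 2 can be sketched in Python as a lookup constrained by the semantic type of the object. The entity names in TYPE_OF, the class names ex:Person and ex:Fact and the function translate_know are hypothetical and stand in for information that would come from the ontology and its lexicon.

```python
# Hypothetical ontology fragment (assumption): entity -> semantic class.
TYPE_OF = {"ex:Maria": "ex:Person", "ex:TheAnswer": "ex:Fact"}

# Frame entries for "to know": (semantic type required of the object, verb).
KNOW_FRAMES = [("ex:Person", "kennen"), ("ex:Fact", "wissen")]

def translate_know(object_entity):
    """Choose the German verb whose object constraint fits the entity type."""
    object_type = TYPE_OF.get(object_entity)
    for required_type, german_verb in KNOW_FRAMES:
        if object_type == required_type:
            return german_verb
    return None  # no frame applies; a real system would need a fallback
```

A fuller version would also walk the class hierarchy, so that an object typed with a subclass of ex:Person still selects "kennen".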

Semantic Search
Nitish Aggarwal, Philipp Cimiano
This use case is concerned with how the ontology-lexicon model can support the monolingual and cross-lingual semantic search problem. Semantics of queries and documents can be interpreted by annotating them with domain ontologies, enriched with lexical information provided by the lemon model. The semantic annotation enables cross-lingual matching of a query in one language to relevant documents or document segments in other languages.

Example 1: A sophisticated cross-lingual semantic similarity assessment can help to match a query in one language to available information in other languages. E.g. the German term "Umlaufvermögen" corresponds to the English term "current assets" and is composed of two lexical entries as follows:

1. Umlauf/JJ (modifier) - circulate, orbital, fluid

2. Vermögen/NN (head) - assets, wealth, money

The meaning obtained from the above lexical entries also points to the English terms "fluid assets", "liquid assets" and "circulating money", as they have semantically similar heads and modifiers. This example also shows that the terms "current assets", "fluid assets" and "liquid assets" refer to the same concept in English.

This semantic similarity can help to interpret a query such as "Current assets of Vestas in 2011", as described below.

Imagine that we have a finance ontology as follows:

<xebr:LiquidAssetsPresentation> <rdfs:subClassOf> <xebr:Class> .
<xebr:CurrentAssetsPresentation> <rdfs:subClassOf> <xebr:Class> .
<xebr:LiquidAssetsPresentation> <rdfs:label> "liquid assets [presentation]"@en .
<xebr:hasLiquidAssetsTotal> <rdf:type> <owl:FunctionalProperty> .
<xebr:hasLiquidAssetsTotal> <rdf:type> <owl:DatatypeProperty> .
<xebr:hasLiquidAssetsTotal> <rdfs:domain> <xebr:LiquidAssetsPresentation> .
<xebr:hasLiquidAssetsTotal> <rdfs:label> "liquid assets [total]"@en .
<xebr:hasLiquidAssetsTotal> <rdfs:range> <xsd:monetary> .
ex:year rdfs:domain ex:FinancialIndicator
ex:year rdfs:range xsd:string
<xebr:hasCompanyNameText> <rdf:type> <owl:FunctionalProperty> .
<xebr:hasCompanyNameText> <rdf:type> <owl:DatatypeProperty> .
<xebr:hasCompanyNameText> <rdfs:domain> <xebr:CompanyNameList> .
<xebr:hasCompanyNameText> <rdfs:label> "company name, text"@en .
<xebr:hasCompanyNameText> <rdfs:range> <xsd:string> .

The concept "current assets" appearing in the given query is semantically similar to the ontology concept "liquid assets". Therefore this query is interpreted as searching for a monetary value tagged with "liquid assets" in a report that is tagged with the company name "Vestas" and the year "2011":

ex:Vestas xebr:hasCompanyNameText "Vestas"
ex:Vestas ex:year "2011"
ex:Vestas xebr:hasLiquidAssetsTotal  ?
Necessary Knowledge
  • basic linguistic description (part-of-speech, syntactic analysis and parsing)
  • variation (synonymy, terminology variation...)
  • phrase structure of multi-word forms
  • basic similarity measures (edit distance, n-gram models, ...)
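
The head/modifier comparison used in this example can be sketched in Python as follows. The SYNONYM_SETS table is an assumption that simply groups the modifiers and heads mentioned above; a real system would derive such sets from the lexicon and combine them with the similarity measures listed.

```python
# Hypothetical synonym sets (assumption), grouping the modifiers and heads
# mentioned in the example.
SYNONYM_SETS = [
    {"current", "fluid", "liquid", "circulating"},
    {"assets", "wealth", "money"},
]

def similar(word_a, word_b):
    """Two words are similar if equal or listed in the same synonym set."""
    if word_a == word_b:
        return True
    return any(word_a in group and word_b in group for group in SYNONYM_SETS)

def match_terms(query_term, labels):
    """Return labels whose words are pairwise similar to the query term."""
    query_words = query_term.lower().split()
    matches = []
    for label in labels:
        label_words = label.lower().split()
        if len(label_words) == len(query_words) and all(
                similar(q, w) for q, w in zip(query_words, label_words)):
            matches.append(label)
    return matches
```

Under these assumptions the query term "Current assets" matches the label "liquid assets", which is what allows the query to be answered from xebr:hasLiquidAssetsTotal.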

Semantic Tagging of text
Cross-lingual Web Service Retrieval
Philipp Cimiano, Maria Maleshkova (Open University)
This use case is a specialization of the above use case on cross-lingual information retrieval. This use case describes how the ontology-lexicon model could support the retrieval of web services across languages.
Assume that a web service is described by the following RDF description:
@prefix : <http://iserve.kmi.open.ac.uk/resource/services/e8f9548e-bbed-43fe-9d8a-71b7fdef9da#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix msm: <http://cms-wg.sti2.org/ns/minimal-service-model#> .
@prefix rest: <http://purl.org/hRESTS/1.1#> .
@prefix waa: <http://purl.org/hRESTS/1.1#> .
@prefix sawsdl: <http://www.w3.org/ns/sawsdl#> .

:lastfmService a msm:Service ;
    rdfs:isDefinedBy <http://www.last.fm/api/show?service=267> ;
    sawsdl:modelReference <http://programmableweb.com/classes/Music> ;
    rest:hasAddress "method=artist.getinfo&artist={p1}&api_key={p2}"^^rest:URITemplate ;
    waa:requiresAuthentication waa:All .

The above description makes reference to the Programmable Web Taxonomy: http://www.programmableweb.com/apis/directory see Category

Necessary Knowledge

Support to Automatic Ontology Mediation through exploitation of Linguistic Metadata
Armando Stellato
Many ontology mapping/matching tools available from the ontology research communities need to be tuned and configured according to the specific scenario in which they are employed: the modeling language adopted (RDFS, OWL, SKOS), the language(s) in which labels are expressed (or whether local names of URIs are the sole source available), the ontology constructs used to provide these labels, etc. (see [1] and [2] for some application scenarios).

Though many of these tools are claimed to be usable (with little or no adaptation) in scenarios where mappings are to be computed on demand, the context of a mediation does not actually allow any fine-tuning of the matching process: the two ontologies are not both locally available (e.g. they are wrapped by resource agents on the Web, or available as linked data), and there is no need to align the whole ontologies, only to "negotiate the meaning" of the specific concepts involved in a remote query. In practice, this results in:

  • The unavailability of explicit information about the ontologies to be mediated (languages, references to notable lexical resources etc..)
  • The impossibility of producing this information in a short time, as this involves a global analysis of the ontology or at least an analysis to be carried over a sample of the sources

A vocabulary supporting OM should provide:

  • Explicit links between ontologies and lexical resources. This means a general vocabulary to properly represent linguistic information, but also an explicit way to link information from notable resources (e.g. WordNet, see note 1 below this table).
  • A set of ontology properties (metadata for the ontology as a whole) providing quantitative/qualitative information on the degree of "linguistic expressiveness" of an ontology with respect to given languages and/or linguistic resources.

The following informal "story" depicts a dialogue between two ontology agents, Merlin and Djinni, which benefit from the above descriptors for their ontologies. Text between square brackets explains what happens behind the scenes.

Merlin: Hi, I’m Merlin the Wizard. I see you are a Genie, so I suppose we can talk about magic*

Djinni: Oh yes, I like talking about magic. My reference ontology for magic is: Xxxxx/magic.owl

Merlin: Erm…sorry, mine is: YYYYY/mana.owl

Djinni: Well, ok, what’s (are) your language(s)?

Merlin: Actually I’m a good English speaker [ontology natively filled with English terms]

Djinni: Mmm…I just speak Arabic, and I’m able to express some of my ideas in very simple English [Freelang (http://www.freelang.net/) bilingual vocabulary, automatic translation with 23% coverage of ontology concepts]

Merlin: That’s much better than nothing. I can summon a familiar of mine who is a good English speaker [a WordNet 3.0 resource agent] and I’ve just found in the yellow pages an English/Arabic translator [Dict (http://www.dict.org) English/Arabic dictionary Semantic Web Service], maybe they can help us a bit…

Another example is given by the possibility of using a lexical resource as an interlingua. Given a dialogue such as the one above, if the two agents discover that they both have references to WordNet (also checked on quantitative grounds), they can use synsets as a less ambiguous language (but still not a formal constraint) to match their concepts.

Necessary Knowledge

The dialogue above could represent a real (formal) dialogue between two distinct agents, or an inspection done by a single agent having complete access to the ontology data (as with Linked Open Data). In both cases, it implies the availability of explicit knowledge about the linguistic expressiveness of the mediated ontologies. Potentially useful information is:

  • List of languages for which concepts are being described in the ontology
  • List of (linkable) lexical resources (see note 1 below this table) which have been adopted in enriching the concepts (especially necessary in case the resource is enriched with specific semantic links such as wordnet’s synsets).
  • Percentage of terms common to both ontology and linguistic resource w.r.t. the total number of terms (for the same language) in the ontology. This information is useful for knowing to what extent the considered LR covers the ontology
  • Average number of terms per concept, which belong to the linguistic resource
  • Percentage of ontology concepts which are represented by at least a term from the linguistic resource

None of the above information is difficult to produce (i.e. it is not "unrealistic" to think of people adding this metadata), as it can be generated easily and automatically by dedicated tools. So, what we would need here is to reuse the core elements needed to describe linguistic resources at a meta-level (that is, able to wrap different lexical resources), thus coming from the requirements of the LLD use case, and add specific elements of the lemon vocabulary to describe the ontology-lexicon interaction on qualitative and quantitative grounds, thus:

  • Metadata about expressivity of an ontology/concept scheme for a given language (in general) or, more specifically, for a given lexical resource
    • void-like statistical info about:
      • the expressivity of an ontology/concept scheme with respect to a given language (e.g. how many resources are labelled or otherwise linguistically expressed in a given language)
      • the coverage of an ontology/concept scheme with respect to a given lexical resource (i.e. how much of the conceptual content is covered by that resource's lexical content)
    • the "linguistic enrichment" of an ontology with respect to a given lexical resource could be an entity in its own right, representing the specific relation between the ontology and that lexical resource; the statistical data of the point above would then be associated with each "linguistic enrichment" instance

Lexicon driven Ontology Evolution
Lexicon driven Ontology Evolution
Dagmar Gromann, Thierry Declerck
Large-scale ontologies inevitably change and evolve, which renders it necessary to propagate these changes to dependent elements. This use case focuses on the evolution of elements of an existing domain ontology itself rather than instance data by applying bootstrapping and ontology learning methods. The ontology-lexicon model augments both ontology learning and ontology evolution by contributing rich linguistic resources to the process of obtaining valid ontology concepts and relations from text.

Either a changing view of the world or an altered usage situation may necessitate change in the ontology. Furthermore, an elaborate and rich lexical description of tokens occurring in natural language text might call for changing elements of the ontology. In the event of modified structures in the domain, the ontology has to evolve to reflect these changes, and its consistency needs to be maintained in all its parts.


Example 1: The Industrial Classification Benchmark ontology [3] contains the following concept:

ex:AssetManager rdfs:subClassOf ex:FinancialServices
ex:AssetManager rdfs:label "Asset Managers"@en
ex:AssetManager rdfs:label "Administradores de activos"@es
ex:AssetManager rdfs:label "Vermögensverwaltung"@de

On the basis of the following sentence, new ontology nodes can be derived as subclass of “Financial Services”.

These strategies had proved resilient for asset managers, including hedge funds.

Hearst Pattern:

[NP0 asset_NN managers_NNS], [VBG including] [NP1 hedge_NN funds_NNS]

This allows us to extract the following triples from the sentence above and its translations, and to extend the ICB ontology by one sub-concept, "hedge fund":

ex:AssetManager rdfs:subClassOf ex:FinancialServices
ex:HedgeFund rdfs:subClassOf ex:AssetManager
ex:HedgeFund rdfs:label "Hedge Fund"@en
ex:HedgeFund rdfs:label "Fondo de inversión libre"@es
ex:HedgeFund rdfs:label "Hedgefonds"@de
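
The extraction step can be sketched as follows. This is a minimal illustration, assuming the sentence has already been POS-tagged in a `token_TAG` format; the regular expression only covers the "NP0, including NP1" Hearst pattern shown above, the plural stripping in `to_class_name` is deliberately naive, and all function names are hypothetical.

```python
import re
from typing import List, Tuple

# A noun phrase is approximated as optional adjective/noun modifiers plus a head noun.
MODIFIER = r"[^\s_]+_(?:JJ|NNS?)\s+"
HEAD_NOUN = r"[^\s_]+_NNS?(?!\S)"
NOUN_PHRASE = rf"((?:{MODIFIER})*{HEAD_NOUN})"

# Hearst pattern "NP0 (,) including NP1": NP1 is a hyponym of NP0.
INCLUDING = re.compile(NOUN_PHRASE + r"\s+(?:,_,\s+)?including_VBG\s+" + NOUN_PHRASE)


def strip_tags(chunk: str) -> str:
    """Remove the POS tags from a chunk such as 'hedge_NN funds_NNS'."""
    return " ".join(token.rsplit("_", 1)[0] for token in chunk.split())


def extract_hyponyms(tagged_sentence: str) -> List[Tuple[str, str]]:
    """Return (hyponym, hypernym) pairs found by the 'including' pattern."""
    return [(strip_tags(m.group(2)), strip_tags(m.group(1)))
            for m in INCLUDING.finditer(tagged_sentence)]


def to_class_name(term: str) -> str:
    """Turn 'hedge funds' into 'HedgeFund' (naive singularisation: drop a final 's')."""
    words = term.split()
    if words[-1].endswith("s"):
        words[-1] = words[-1][:-1]
    return "".join(word.capitalize() for word in words)


tagged = ("These_DT strategies_NNS had_VBD proved_VBN resilient_JJ for_IN "
          "asset_NN managers_NNS ,_, including_VBG hedge_NN funds_NNS ._.")

pairs = extract_hyponyms(tagged)
triples = [f"ex:{to_class_name(hypo)} rdfs:subClassOf ex:{to_class_name(hyper)}"
           for hypo, hyper in pairs]
```

In practice, lemma and morphological information from the lexicon would replace the naive plural stripping used here.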

This evolution is further underpinned by the fact that most companies classified as ICB 8771 “Asset Managers“ explicitly refer to hedge funds in their company profiles, such as the Bank of New York Mellon Corp. or the State Street Corp.

According to a recent article in The Economist, 67% of hedge funds were below their high-water marks at the end of 2011. Thus, some companies in the industry decided to amend “the lexicon of hedge funds” by changing their title to “alternative asset managers” and turning from “absolute” to “relative” returns, as incisively depicted in a letter to investors from Zilch Capital, LLC. “Hedge fund” and “alternative asset manager” point to the same ontology concept but represent different aspects of it, i.e. they differ semantically. The ontology-lexicon model represents this term variation by linking each term to a different lexical sense, with both senses pointing to the same ontology entity.

:hedge_fund lemon:canonicalForm [lemon:writtenRep "hedge fund"@en];
    lemon:otherForm [lemon:writtenRep "hedge funds"@en];
    lemon:sense [lemon:reference ontology:HedgeFund] .
:alternative_asset_manager lemon:canonicalForm [lemon:writtenRep "alternative asset manager"@en];
    lemon:otherForm [lemon:writtenRep "alternative asset managers"@en];
    lemon:sense [lemon:reference ontology:HedgeFund] .
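
To illustrate how an application might use such entries, the sketch below mirrors the two lexical entries as plain Python data and looks up ontology references by surface form. It assumes the lexicon has already been loaded into memory; the dictionary layout and the function names are hypothetical and do not correspond to any lemon API.

```python
from typing import Dict, List, Set

# In-memory mirror of the two lexical entries above (layout is an assumption).
LEXICON: Dict[str, Dict[str, object]] = {
    "hedge_fund": {
        "forms": {"hedge fund", "hedge funds"},
        "reference": "ontology:HedgeFund",
    },
    "alternative_asset_manager": {
        "forms": {"alternative asset manager", "alternative asset managers"},
        "reference": "ontology:HedgeFund",
    },
}


def references_for(surface_form: str) -> Set[str]:
    """Return the ontology entities whose lexical entries list this written form."""
    needle = surface_form.lower()
    return {str(entry["reference"])
            for entry in LEXICON.values()
            if needle in entry["forms"]}


def entries_for(reference: str) -> List[str]:
    """Return the identifiers of all lexical entries pointing to an ontology entity."""
    return sorted(entry_id for entry_id, entry in LEXICON.items()
                  if entry["reference"] == reference)


same_concept = references_for("hedge funds") == references_for("alternative asset managers")
variants = entries_for("ontology:HedgeFund")
```

Both terms resolve to the same ontology entity while remaining distinct lexical entries, which is the term variation the example describes.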
Necessary Knowledge
  • basic lexical information (lemma, POS, morphology, etc.), lexical semantic relations (hyponymy, meronymy, semantic and terminological variation, etc.) and linguistic structures (collocation patterns, constituency and dependency information, grammatical roles, etc.)
  • application of linguistic heuristics to text
  • ontology learning methods from text (formalization of lexical relations, etc.)
  • bootstrapping

Ontology alignment
Ontology alignment
Jorge Gracia, John McCrae
Ontology alignment, or ontology matching, is the task of discovering correspondences between entities in different ontologies. It forms the basis of several other tasks, such as information integration, ontology evolution and semantic-based query answering. A range of techniques can be used for ontology matching, such as graph-based comparison, common extension comparison and terminology similarity. However, these techniques are ultimately grounded in comparisons of the lexical information contained in the ontology and as such rely on NLP tools.

External lexicons associated with ontologies could be used as a sort of interlingua and exploited to better discover correspondences between ontology entities on the basis of their lexical commonalities.

Some situations in which richer external lexical information would help OA:

1) Ontology alignment often consists of matching entities based on their labels, which in turn often involves handling term variation such as

  • Synonyms: "Car" vs. "Automobile"
  • Rearrangement: "prostate cancer" vs "cancer of the prostate"
  • Modification by adjunct or adjectives: "hospital-acquired MRSA" vs "MRSA"

Such term variations could be represented in the ontology lexica and used to obtain a better lexical mapping between the compared concepts.
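
As a rough illustration of how a matcher could recognise the three kinds of variation, consider the sketch below. The synonym table stands in for information that would come from the lexicon, the stopword list is minimal, and the function names are hypothetical; it is not a description of any existing matching tool.

```python
from typing import List, Optional

# Stand-ins for lexicon content (assumptions made for this sketch).
STOPWORDS = {"of", "the", "a", "an"}
SYNONYMS = {frozenset({"car", "automobile"})}


def content_words(label: str) -> List[str]:
    """Lower-case a label and drop function words."""
    return [word for word in label.lower().split() if word not in STOPWORDS]


def variation_type(label_a: str, label_b: str) -> Optional[str]:
    """Classify the relation between two labels, or return None if unrelated."""
    words_a, words_b = content_words(label_a), content_words(label_b)
    if words_a == words_b:
        return "identical"
    if frozenset({" ".join(words_a), " ".join(words_b)}) in SYNONYMS:
        return "synonym"
    if sorted(words_a) == sorted(words_b):
        return "rearrangement"
    # One label contains all content words of the other plus extra modifiers.
    if set(words_a) < set(words_b) or set(words_b) < set(words_a):
        return "modification"
    return None


examples = [
    variation_type("Car", "Automobile"),
    variation_type("prostate cancer", "cancer of the prostate"),
    variation_type("hospital-acquired MRSA", "MRSA"),
]
```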

2) If links from different ontology entities to common lexical entries (in external lexicons) can be discovered, lexical similarity between the entities could be inferred.

  • OntologyA#entityA -> hasLexicalEntry -> Lexicon#lexicalEntryA
  • OntologyB#entityB -> hasLexicalEntry -> Lexicon#lexicalEntryA
  • => OntologyA#entityA and OntologyB#entityB are (lexically) similar

3) Even if the compared ontologies point to different lexicons, the associated lexical descriptions contained in such lexicons can be compared (in a way that is richer than traditional label-based comparisons, as there are more features than "label" to consider) and lexical similarities inferred.

  • OntologyA#entityA -> hasLexicalEntry -> LexiconA#lexicalEntryA
  • OntologyB#entityB -> hasLexicalEntry -> LexiconB#lexicalEntryB
  • LexiconA#lexicalEntryA and LexiconB#lexicalEntryB are equally defined (e.g., same written form, variations, PoS, etc.)
  • => OntologyA#entityA and OntologyB#entityB are (lexically) similar
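
Situations 2) and 3) can be sketched together. The example assumes that links from ontology entities to lexical entries are available as a simple mapping, and it treats two entries as "equally defined" when their written form and part of speech agree; both the data layout and this equivalence criterion are simplifying assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass(frozen=True)
class LexicalEntry:
    lexicon: str        # lexicon the entry belongs to
    written_form: str   # canonical written representation
    pos: str            # part of speech


def equally_defined(a: LexicalEntry, b: LexicalEntry) -> bool:
    """Situation 3: entries from different lexicons with matching features."""
    return a.written_form.lower() == b.written_form.lower() and a.pos == b.pos


def lexically_similar(entity_a: str, entity_b: str,
                      links: Dict[str, Set[LexicalEntry]]) -> bool:
    """True if the entities share an entry (2) or have equally defined entries (3)."""
    for entry_a in links.get(entity_a, set()):
        for entry_b in links.get(entity_b, set()):
            if entry_a == entry_b or equally_defined(entry_a, entry_b):
                return True
    return False


# Hypothetical links, named after the placeholders used in the text.
has_lexical_entry = {
    "OntologyA#entityA": {LexicalEntry("LexiconA", "car", "noun")},
    "OntologyB#entityB": {LexicalEntry("LexiconB", "Car", "noun")},
    "OntologyB#entityC": {LexicalEntry("LexiconB", "bank", "noun")},
}

similar = lexically_similar("OntologyA#entityA", "OntologyB#entityB", has_lexical_entry)
```

A richer comparison could also take term variants and other lexical features into account, as suggested in situation 3).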
Necessary Knowledge
  • Basic syntactic information, e.g., part-of-speech
  • Synonyms
  • Term structure

Ontology transformation enhanced by lexical information
Ontology transformation
Ondřej Zamazal, Vojtěch Svátek

With the help of lexical information, different applications can better work with alternative modelling styles employed in an ontology. Ontology transformation [4] in this context means modification of an ontology in terms of its structural and naming aspects. Lexical information would help detect ontology fragments to be transformed and generate the new variants of those ontology fragments.


Ontology transformation basically consists of three steps: detection of the ontology fragments to be transformed, generation of transformation instructions, and the transformation as such. Ontology fragments to be transformed are detected using the structural aspect (employed axioms) and the naming aspect (names of entities). The names of entities (the local fragment of an IRI) are usually very short, and their analysis can hardly reveal much of the underlying lexical background. Using labels improves the situation only slightly. Therefore, an explicit representation of the lexical characteristics of entity names could help resolve this 'detection bottleneck' and consequently increase the quality of the transformation result (see the example below).

Limitations of existing approach

Currently, NLP-based techniques applied to local IRI fragments or labels of entities often fail due to a lack of material for proper analysis and conclusion.


Let us assume that we want to change (here, unfold) the representation of a concept by a named entity to its alternative modelling style, i.e. class A will be replaced by the definition 'p some B'. In particular, we would like to transform the 'AcceptedPaper' named entity into 'Paper and (hasDecision some Acceptance)'. In order to perform such a transformation properly, we would need the following lexical information:

  • main term of AcceptedPaper
  • noun form of the adjectival modifier in the noun phrase, i.e. Accepted -> acceptance
  • hint about the naming property p (i.e. hasDecision), e.g. reference to a related activity.
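
The unfolding step can be sketched as follows, assuming the three pieces of lexical information above are available. Here they are stored in a small hand-written table; the table layout and the function names are hypothetical, and the head noun is simply taken to be everything after the first CamelCase segment.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical lexical information: modifier -> (noun form, naming property hint).
LEXICAL_INFO: Dict[str, Tuple[str, str]] = {
    "Accepted": ("Acceptance", "hasDecision"),
}


def split_camel_case(name: str) -> List[str]:
    """Split an entity name such as 'AcceptedPaper' into ['Accepted', 'Paper']."""
    return re.findall(r"[A-Z][a-z0-9]*", name)


def unfold(entity_name: str,
           lexical_info: Dict[str, Tuple[str, str]]) -> Optional[str]:
    """Rewrite a named class as 'Head and (p some B)', or None if not applicable."""
    parts = split_camel_case(entity_name)
    if len(parts) < 2:
        return None  # no modifier to unfold
    modifier, main_term = parts[0], "".join(parts[1:])
    if modifier not in lexical_info:
        return None  # the lexicon offers no noun form or property hint
    noun_form, naming_property = lexical_info[modifier]
    return f"{main_term} and ({naming_property} some {noun_form})"


definition = unfold("AcceptedPaper", LEXICAL_INFO)
```

The function returns nothing when the lexicon lacks the required information, which corresponds to the 'detection bottleneck' described above.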
Necessary Knowledge

  • Basic syntactic knowledge (e.g., adjectival inflection for modification)
  • Patterns matching syntactical constructs to semantic constructs

[1] Sure, Y., Corcho, O., Euzenat, J., and Hughes, T. (eds.): Proceedings of the 3rd Evaluation of Ontology-based Tools (EON) workshop, located at the 3rd International Semantic Web Conference ISWC 2004, Hiroshima, Japan, November 2004

[2] http://ontologymatching.org/

[3] This ontology is a transformation of the ICB taxonomy (http://www.icbenchmark.com/) into OWL, performed by the partner DFKI in the project Monnet (http://www.monnet-project.eu/).

Note 1. This use case relies heavily on the outcome of the LLD scenario proposed by John McCrae. I originally intended to publish two use cases, but when I noticed that John's overlapped almost completely with my first one, I preferred to publish only SAOM and add a dependency on the already existing LLD. To John: I didn't add anything to the description of LLD, to avoid stating things that might not match your original intention. However, whenever you feel that some of the things described in SAOM are perfectly aligned with your idea, feel free to copy them into LLD and leave in SAOM only those things peculiar to ontology mapping support, plus links to LLD.