Terminology

From Ontology-Lexica Community Group

Connection Details

https://upm.zoom.us/j/85417259039

Meeting Minutes

November 27th 2023

Agenda:

  1. Review action points from the previous meeting
  2. Review already identified representation needs
  3. Identify new use cases/representation needs
  4. Discuss (proposed) solutions

Meeting minutes: https://docs.google.com/document/d/13Tc_qRcvZR6Qv5rKWFII6tLBK1_cu3viQK5KwyTIzPg/edit?usp=sharing

September 12th 2023

W3C Day @LDK23 Meeting minutes here.

Slides here.

May 10th 2023

Online meeting

  1. Update of the proposal towards a terminology module in Ontolex by Elena, Víctor, Thierry and Patricia.
  2. Several expressions of disagreement were received.
  3. Fahad proposes to have two co-leaders of this initiative, expressing the interest of Rute and Sara.
  4. Jorge proposes to start identifying open issues that cannot be covered with the current status of Ontolex and start working from there.
  5. Patricia, Elena, Víctor and Thierry agree and add the open issues identified during their work to the Terminology Wiki.
  6. The rest of the community members are expected to add their issues as well.

September 4th 2021

Physical meeting in Zaragoza

  1. Introduction of a proposal towards a terminology module in Ontolex by Thierry and Patricia during the W3C Day at LDK in Zaragoza.
  2. Several expressions of interest were received.

Open Issues

This section is used to collect and document, collaboratively, the doubts and issues that arose when working with lemon-ontolex for terminologies.

Proposed typology of issues:

  • T1- Best practices when applying lemon core and existing modules (i.e., best ways of modelling certain things with the existing "ingredients")
  • T2- Detected limitations of the lemon core (i.e., things that cannot be easily modelled with the current "ingredients")
  • T3- Missing entities to account for specific information in terminologies
  • T4- Missing categories in existing catalogues (e.g., LexInfo)
  • T5- Other


I1: The need to have a class for definitions
Reported by: P. Martín-Chozas on 11/5/2023, last updated on 19/06/23 (J. Gracia)
Type: T3
Related references:
Description:

A single triple does not suffice to capture all the key information of terminological definitions. Considering that neither the specification of skos:definition nor that of dct:source restricts the object to be a literal, we recommend that definitions and sources be given as resources (possibly blank nodes) carrying further properties.

     <subject_resource> skos:definition
                  [
                  rdf:value "This is an example of definition" ;
                  dct:source "Dictionary X"
                  ] .

The current Ontolex-lemon specification (https://www.w3.org/2016/05/ontolex/) states only that “A definition can be added to a lexical concept as a gloss by using the skos:definition property”, which is somewhat nonspecific. We believe the specification should make explicit that implementors of Ontolex-lemon should expect the definition text to be given either as a literal or as the rdf:value of a definition resource.

Discussion:
Solution:

With the purpose of harmonizing the properties to be used for definitions and sources, we believe a few classes and properties should be specified, thus favouring the interoperability of Ontolex-lemon implementations.
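
As a minimal sketch of what such a specification could look like, the Turtle below types the definition resource with a dedicated class. The term: and ex: namespaces, the class term:Definition and the resource names are placeholders invented for illustration; they are not part of Ontolex-lemon or of any published vocabulary.

```turtle
@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:     <http://www.w3.org/2002/07/owl#> .
@prefix skos:    <http://www.w3.org/2004/02/skos/core#> .
@prefix dct:     <http://purl.org/dc/terms/> .
@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .
@prefix term:    <http://example.org/termlex#> .   # hypothetical module namespace
@prefix ex:      <http://example.org/data/> .      # hypothetical data namespace

# Hypothetical class for reified definitions
term:Definition a owl:Class ;
    rdfs:label "Definition"@en .

# The concept points to a definition resource instead of a plain literal
ex:concept1 a ontolex:LexicalConcept ;
    skos:definition ex:concept1_def1 .

# The definition resource carries the text and its source
ex:concept1_def1 a term:Definition ;
    rdf:value "This is an example of definition"@en ;
    dct:source ex:dictionaryX .

# The source is itself a resource, so it can be described further
ex:dictionaryX rdfs:label "Dictionary X" .
```

Giving the definition and the source their own identifiers, rather than blank nodes, is one possible design choice: it lets other datasets refer to them, at the cost of having to mint URIs.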

I2: The need to have a class for notes
Reported by: P. Martín-Chozas on 11/5/2023
Type: T3
Related references:
Description: Notes are key elements of traditional term records, providing additional information, such as usage recommendations, domain data and references; they are considered valuable pieces of knowledge for language professionals. This type of information is still present in authoritative resources. (Add example)

TBC

Discussion:
Solution: We propose to reify the note properties of other models such as SKOS, OntoLex and LexInfo (skos:note, ontolex:usage, lexinfo:note) into a class, so that we can also assert the provenance of a note, or any additional relevant data that the note may contain.
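
The reification proposed here could be sketched roughly as follows. The class term:Note, the term: and ex: namespaces, and the note text are assumptions made for illustration only; the sketch relies on the fact that SKOS does not restrict the object of skos:note to a literal.

```turtle
@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:     <http://www.w3.org/2002/07/owl#> .
@prefix skos:    <http://www.w3.org/2004/02/skos/core#> .
@prefix dct:     <http://purl.org/dc/terms/> .
@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .
@prefix term:    <http://example.org/termlex#> .   # hypothetical module namespace
@prefix ex:      <http://example.org/data/> .      # hypothetical data namespace

# Hypothetical class for reified notes
term:Note a owl:Class ;
    rdfs:label "Note"@en .

# The entry links to a note resource rather than to a string
ex:entry1 a ontolex:LexicalEntry ;
    skos:note ex:entry1_note1 .

# The note resource holds the text plus its provenance
ex:entry1_note1 a term:Note ;
    rdf:value "Illustrative note text."@en ;   # invented example text
    dct:source ex:sourceA .                    # hypothetical source resource
```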

I3: The need to have a class for usage information
Reported by: P. Martín-Chozas on 11/5/2023
Type: T3
Related references:
Description: This need is derived from the observation of language-level notes on different IATE entries, which contain usage recommendations for the different terms that denote the concept. Such usage recommendations can be expressed as strings of text and links to related resources, meaning that there are different pieces of information that need to be represented.
Discussion:
Solution: We propose to reify this property into a class, to represent the different pieces of usage information.
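
One possible shape for such a reified usage record is sketched below. The class term:UsageNote, the namespaces and the example values are hypothetical; ontolex:usage is the existing OntoLex property, which relates a lexical sense to a resource.

```turtle
@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:     <http://www.w3.org/2002/07/owl#> .
@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .
@prefix term:    <http://example.org/termlex#> .   # hypothetical module namespace
@prefix ex:      <http://example.org/data/> .      # hypothetical data namespace

# Hypothetical class for reified usage information
term:UsageNote a owl:Class ;
    rdfs:label "Usage note"@en .

ex:sense1 a ontolex:LexicalSense ;
    ontolex:usage ex:sense1_usage1 .

# The usage resource separates the recommendation text from the related links
ex:sense1_usage1 a term:UsageNote ;
    rdf:value "Illustrative usage recommendation."@en ;   # invented example text
    rdfs:seeAlso <http://example.org/related-resource> .  # placeholder link
```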

I4: The need to have a class for sources
Reported by: P. Martín-Chozas on 11/5/2023
Type: T3
Related references:
Description:

Like definitions, sources play a very important role in this modelling approach. Especially when terminologies are generated from multiple resources, it is crucial to maintain the traceability of the different terminological data (be they definitions, term notes, term contexts, etc.). With the automation of the terminology creation process, we may distinguish between two types of sources:

  • Intermediate Sources: not direct sources but information providers, such as existing linguistic resources from which information is retrieved (IATE, for instance) or applications (e.g., a Definition Extractor).
  • Original Sources: direct sources, meaning corpora (e.g., European Legislation), organisations (e.g., European Commission) or individuals (e.g., John Doe, European terminologist).
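
The distinction between the two types of sources could be captured, for instance, with two subclasses of a common source class. This is only a sketch: term:Source, term:IntermediateSource, term:OriginalSource and the ex: resources are hypothetical names, and the use of dct:source to attach both kinds of source is an assumption.

```turtle
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix term: <http://example.org/termlex#> .   # hypothetical module namespace
@prefix ex:   <http://example.org/data/> .      # hypothetical data namespace

# Hypothetical class hierarchy for sources
term:Source a owl:Class .
term:IntermediateSource a owl:Class ;
    rdfs:subClassOf term:Source .
term:OriginalSource a owl:Class ;
    rdfs:subClassOf term:Source .

# An information provider from which data was retrieved
ex:iate a term:IntermediateSource ;
    rdfs:label "IATE" .

# A direct source (here, a corpus)
ex:euLegislation a term:OriginalSource ;
    rdfs:label "European Legislation" .

# A definition resource that keeps track of both kinds of source
ex:def1 rdf:value "This is an example of definition"@en ;
    dct:source ex:iate , ex:euLegislation .
```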

Discussion:
Solution:

I5: The need to have a (standardised) class for the reliability of a term
Reported by: P. Martín-Chozas on 12/7/2023
Type:
Related references: Cimiano, et al. 2015
Description:

Previous work on the representation of terminologies as Linked Data (Cimiano, et al. 2015) proposed an ontology based on the TBX specification which used the property tbx:reliabilityCode to represent the kind of confidence rating that terminologists assign to terms. However, the domain of that property is ontolex:LexicalEntry, and the property admits any type of rating. Following the guidelines of IATE, we propose a class ReliabilityCode with a fixed set of values (1 to 4) to standardise this rating.
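
A closed set of four values could be expressed, for example, as an enumerated OWL class. The sketch below assumes a hypothetical term: namespace and hypothetical individual names; only the class name ReliabilityCode and the value range 1 to 4 come from the proposal.

```turtle
@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix owl:     <http://www.w3.org/2002/07/owl#> .
@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .
@prefix term:    <http://example.org/termlex#> .   # hypothetical module namespace
@prefix ex:      <http://example.org/data/> .      # hypothetical data namespace

# ReliabilityCode is restricted to exactly four individuals
term:ReliabilityCode a owl:Class ;
    owl:oneOf ( term:reliability1 term:reliability2
                term:reliability3 term:reliability4 ) .

term:reliability1 a term:ReliabilityCode ; rdf:value 1 .
term:reliability2 a term:ReliabilityCode ; rdf:value 2 .
term:reliability3 a term:ReliabilityCode ; rdf:value 3 .
term:reliability4 a term:ReliabilityCode ; rdf:value 4 .

# Illustrative use on a lexical entry (the property name is an assumption)
ex:entry1 a ontolex:LexicalEntry ;
    term:reliabilityCode term:reliability3 .
```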

Discussion:
Solution:

I6: The need to have a class for terminological concepts
Reported by: P. Martín-Chozas on 12/7/2023
Type: T3
Related references:
Description:

We propose the addition of a TerminologicalConcept as a sister class of ontolex:LexicalConcept, to maintain the consistency throughout the model. The addition of this class does not affect the OntoLex vocabulary: we are not redefining any property related to this class but adding new ones to account for their domains and ranges.

Discussion:
Solution:

I7: The need to have properties that connect the classes of the proposed extension with the Ontolex core model
Reported by: P. Martín-Chozas on 12/7/2023
Type: T3
Related references:
Description:

We propose 3 properties:

  1. isEvokedBy. Domain: TerminologicalConcept. Range: LexicalEntry.
  2. lexicalizedSense. Domain: TerminologicalConcept. Range: LexicalSense.
  3. reliabilityCode. Range: ReliabilityCode.
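
The three properties could be declared along the following lines. This is a sketch under the assumption of a hypothetical term: namespace for the extension; the domain of reliabilityCode is left open because the proposal does not state one. The first two names mirror the existing ontolex:isEvokedBy and ontolex:lexicalizedSense, whose domain is ontolex:LexicalConcept.

```turtle
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:     <http://www.w3.org/2002/07/owl#> .
@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .
@prefix term:    <http://example.org/termlex#> .   # hypothetical module namespace

# Classes of the proposed extension
term:TerminologicalConcept a owl:Class .
term:ReliabilityCode a owl:Class .

# 1) Links a terminological concept to the entries that evoke it
term:isEvokedBy a owl:ObjectProperty ;
    rdfs:domain term:TerminologicalConcept ;
    rdfs:range  ontolex:LexicalEntry .

# 2) Links a terminological concept to the senses that lexicalize it
term:lexicalizedSense a owl:ObjectProperty ;
    rdfs:domain term:TerminologicalConcept ;
    rdfs:range  ontolex:LexicalSense .

# 3) Attaches a reliability rating (no domain given in the proposal)
term:reliabilityCode a owl:ObjectProperty ;
    rdfs:range term:ReliabilityCode .
```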

Discussion:
Solution:

I8: Doubts about Catalan verbs modeling
Reported by: Paula Diez-Ibarbia on 27/11/2023
Type:
Related references:
Description:

I am trying to model Catalan verbs registered in Termcat Terminologia Oberta terminologies. Here we have two examples of the possible information we are provided:

Case 1: verb intransitive pronominal

Case 2: verb prepositional

I will model ‘verb’ with lexinfo:partOfSpeech and lexinfo:verb. Also, I will model ‘intransitive’ with olia:hasValency and olia:Intransitive.
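
In Turtle, the part that can already be modelled might look like the sketch below. The ex: namespace and the entry name are placeholders, the LexInfo namespace version (3.0) is an assumption, and whether olia:Intransitive can be used directly as the object of olia:hasValency should be checked against the OLiA documentation.

```turtle
@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .
@prefix lexinfo: <http://www.lexinfo.net/ontology/3.0/lexinfo#> .  # version assumed
@prefix olia:    <http://purl.org/olia/olia.owl#> .
@prefix ex:      <http://example.org/data/> .   # hypothetical data namespace

# Case 1 (verb intransitive pronominal): only 'verb' and 'intransitive' are covered
ex:verb1 a ontolex:LexicalEntry ;
    lexinfo:partOfSpeech lexinfo:verb ;
    olia:hasValency olia:Intransitive .
    # 'pronominal' has no obvious counterpart yet, which is the open question here
```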

However, I do not know how to model the information ‘pronominal’ and ‘prepositional’. For prepositional verbs, Termcat’s documentation says that “Prepositional verbs are verbs that usually require a complement headed by a preposition” (translation). To model prepositional information I have found:

1. olia:PrepositionalCase (used for pronouns and determiners according to the description)

2. olia:PrepositionalObject

3. olia:PrepositionalAdverb or lexinfo:PrepositionalAdverb

4. lexinfo:PrepositionalAdjunct

5. olia:PrepositionalPhrase or lexinfo:PrepositionPhrase

Using the PrepositionalPhrase class does not seem fully adequate because we are given a verb, not a phrase, although the verb will trigger the need for a preposition. Therefore, there seems to be a lack of a 'prepositional verb' representation.

In terms of pronominal verbs, according to Termcat’s documentation (translation): “Pronominal verbs of Romance languages are verbs that, in all tenses, are accompanied by a pronoun that matches the subject and does not develop any specific syntactic function within the sentence”. To model pronominal information, I have found:

1. olia:PronominalAdverb or lexinfo:PronominalAdverb.

There seems to be a lack of ‘pronominal verb’.

Discussion: Not a terminological issue; will not be discussed in this forum.
Solution: Christian will support Paula with Olia and Synsem. Move discussion to a lexicographical forum.

Use Cases

This section is used to collect and document, collaboratively, the use cases where previously reported issues arose when working with lemon-ontolex for terminologies.


Use Case 1: The representation of data from IATE
Reported by: P. Martín-Chozas on 21/7/2023
Related references: https://iate.europa.eu/entry/result/1443648/
Description:

IATE is one of the largest terminology databases and the main reference for translators across the bodies of the European Union. Since its creation in 1999, the database has grown remarkably, and term entries are becoming more and more complex. The screenshots below show the different types of information attached to the term "train path". Many of these items can already be represented with existing Ontolex extensions and additional ontologies, but others cannot, for instance the different elements contained in definitions and notes, the different sources for each element, and the reliability and evaluation of the term.

[Two screenshots of the IATE entry for "train path"; images not reproduced here.]

Discussion:
Solution:

The addition of new classes and properties to the Ontolex model.

Use Case 2: The representation of automatically generated terminologies enriched from different data sources
Reported by: P. Martín-Chozas on 21/7/2023
Related references: https://termitup.oeg.fi.upm.es/
Description:

TBD

Discussion:
Solution:

The addition of new classes and properties to the Ontolex model.

Use Case 3: The representation of cybersecurity terminology from the Lithuanian-English Cybersecurity Termbase
Reported by: Sigita Rackevičienė (Mykolas Romeris University) and Andrius Utka (Vytautas Magnus University) on 03/8/2023
Related references: https://www.terminologue.org/csterms/
Description:

The Lithuanian-English Cybersecurity Termbase was compiled in 2022 as a result of the COST Action Nexus Linguarum WG4 Use Case 4.3.1 (Cybersecurity) and the national project DVITAS (https://sitti.vdu.lt/dvitas/en). The termbase currently contains 233 concept entries, which are organised in the following levels and data categories:

  • Concept level: Concept ID, Subdomain the concept belongs to
  • Language level LT: Term, Term frequency, Definition, Definition source, Context example, Context example source
  • Language level EN: Term, Term frequency, Definition, Definition source, Context example, Context example source

Synonymous terms are categorised according to their frequencies. The most frequent synonyms (up to 3) are presented in the Language level Term category slots with frequency labels, while less frequent terms are presented in a category slot created especially for less frequent synonymous terms.

The termbase can be exported into XML TermBase eXchange (TBX) format. Its latest TBX version is deposited at the CLARIN-LT repository: https://clarin.vdu.lt/xmlui/handle/20.500.11821/55

We would like to represent our terminological dataset as LLOD and link it with other terminological resources and ontologies.

Discussion:
Solution:

Use Case 4: The representation of Termcat Terminologia Oberta terminologies
Reported by: Paula Diez-Ibarbia on 27/11/2023
Related references: https://www.termcat.cat/ca/terminologia-oberta
Description:

The Catalan Terminology Center, Termcat, has a collection of terminologies available in three formats: XML, HTML, and PDF. We would like to model the 159 XML terminologies. These terminologies are multilingual, cover many domains, and provide a wide range of information such as grammatical information, notes, descriptions, languages (including Catalan Sign Language), chemical formulas, codes, CAS numbers, scientific names, examples, and domains and subdomains.

Discussion:
Solution: