Morphology

From Ontology-Lexica Community Group

OntoLex-Morph is an ongoing initiative to create a vocabulary for morphologically rich languages that complements OntoLex core. The morphology module telco takes place every second Wednesday at 1pm CET/CEST (Berlin time).

Organizational

This OntoLex wiki page documents the procedure, methods and modeling of the creation of the OntoLex-Lemon Morphology Module. This module is the outcome of collaborative work by OntoLex community members, based on regular telcos and discussions via the mailing list. Anyone can join the creation process by joining the W3C Ontology-Lexica Community Group, which gives access to the calls and the e-mail exchange.

WHEN'S THE NEXT CALL: please see the most recent agenda/minutes document on GDrive. Unless announced otherwise (see below), meetings follow a bi-weekly rhythm.

Since 2019, these discussions have also been supported by and conducted in collaboration with Nexus Linguarum, an open academic network funded as a COST Action (2019-2023). Registration with Nexus Linguarum is free, without obligations and not required for joining the discussion. It is recommended, however, because Nexus organizes events related to our discussions. For how to join Nexus Linguarum, see their web site. Note that, unlike Nexus registration, OntoLex membership registration is required.

  • Telcos are conducted bi-weekly and announced:
  * in the minutes document of the last call,
  * via the Nexus Linguarum Slack,
  * via the Nexus Linguarum Calendar, and
  * via the OntoLex mailing list, if we deviate from the bi-weekly rhythm.
  • Telco link: The call can be accessed by using a telco link announced in the minutes/agenda documents. Note that the link can vary from one meeting to the next.
  • Minutes/agenda documents are collected ON GDRIVE and are accessible to all members of the Ontology-Lexica Community group. You can also use this directory to confirm the date of the next meeting, because each document is named after its meeting date.

The contents below reflect the module development and describe:

  • the phases of vocabulary development
  • the goals underlying the Morphology Module, collected in the form of representation needs
  • general discussions, decisions and working examples
  • the current status and drafts that emerged during the development process

Module creation procedure

PHASE I: Definition of the purpose of the module [telcos from 20/11/2018 to 12/02/2019 and 17/02/2021]

This preparatory phase served to define the purpose and scope of the module based on example data and identified representation needs.

PHASE II: Exploration of example data [telcos from 26/02/2019 to 30/04/2019]

In this phase, the examples of representation needs that had been marked in Phase I as “should be an explicit part of the model” were explored in more detail. They formed the modeling basis for a module draft that fulfills the first purpose, i.e., the representation of morphological decomposition on the lexical entry and form levels. At the end of this phase, an interim report on the status of the Morphology Module was published in the paper Challenges for the Representation of Morphology in Ontology Lexicons.

PHASE III: Investigation of possibilities for automatic data generation [telcos from 25/06/2019 to 04/02/2020]

The published module draft was the starting point for developing the second purpose, i.e., enabling the representation of building patterns that are involved in the formation of lexical entries and forms. Detailed discussions investigated whether numerous drafts were suitable for the automatic generation of inflectional paradigms.

PHASE IV: Refinement and testing [telcos from 20/01/2021 - ongoing]

After a one-year pause in module development, the regular telcos were resumed. The discussions built on the published draft and on the latest draft for the representation of inflectional building patterns. They continued with drafting the representation of word-formation patterns, which resulted in a module draft that covers both defined module purposes. This draft is currently being tested against the representation needs of several languages, and it will be refined continuously until the module is finished.

Modeling goals

Purpose and scope

The morphology module aims at fulfilling two modelling purposes:

1. Stating the elements that are involved in the decomposition of lexical entries and forms.

1.1 Morphological decomposition on the lexical entry level.

scope: The kinds of elements of which a lexical entry can consist should be as unrestricted as possible, i.e., the decomposition of lexical entries encompasses lexical entries, components, derivational affixes, inflectional affixes, stems, roots and zero morphs. However, a lexical entry can NEVER be composed of a form!

1.2 Morphological decomposition on the form level.

scope: Elements of which a form can consist include roots, stems, inflectional affixes and zero morphs. A form can NEVER be decomposed into lexical entries (including ontolex:Affix), components and forms.

2. Enabling the representation of building patterns that are involved in the formation of lexical entries and forms.

2.1 Representation of decompositional building patterns for lexical entries.

2.2 Representation of decompositional building patterns for forms.


Discussion:

Katrien raised the question of scope in the telco of 15 January 2019. She pointed out the distinction between traditional dictionary content (which does not contain information about morphological rules and paradigms) and structured computational lexical data. In the development of this module, we are focusing more on the latter.

John's reply: We are at the intersection of the two. Our scope ranges from traditional dictionaries to the more structured computational lexical datasets.

Representation needs

The following representation needs have been collected by the community members in various discussions during the Morphology Module telcos. They are meant to be addressed in the modeling of the module. The current status of each need is given with its entry.


N1: Morph resources
Description: Some morphemic elements do not fit the restrictive definition of ontolex:Affix, which makes them ontolex:LexicalEntry resources. To represent these elements, a distinct class morph:Morph is required as another top-level class alongside ontolex:LexicalEntry and ontolex:Form. Moreover, with regard to a future OntoLex etymology module, it could serve to represent data that has been identified and should be pointed to, but for which no detailed knowledge exists yet (such knowledge might be added later).
Required vocabulary: owl:Class
Initial consensus: approved modeling:

morph:Morph a owl:Class ; rdfs:subClassOf owl:Thing .

Status Updates: as of 2021, we shifted towards modelling morph:Morph as a subclass of ontolex:LexicalEntry. This was done to eliminate redundancy in morph-level form and sense attributes.
N2: Specific morph resources
Description: In addition to the main morph:Morph class, more specific morph resources should be representable. For morphological representation, the elements root and stem should be assignable to classes. Furthermore, a morph:Affix class is required in parallel to ontolex:Affix. This enables the representation of morphs that are not considered ontolex:LexicalEntry resources. More specific affix types are also required, namely those that are not covered by other existing RDF vocabularies:
  • transfix: a discontinuous affix.
  • simulfix: a change or replacement of vowels or consonants (usually vowels) that changes the meaning of a word.
  • zero morph: a morpheme that has a morphological meaning but corresponds to no overt form.
Language example:

English Simulfix: a-->e in man (singular) vs. men (plural)

Arabic Transfix: grammatical information is encoded in a discontinuous vowel pattern that is applied to a consonantal root pattern. E.g., the transfix a-a-a (third person, singular, past) is inserted into the root k-t-b 'all concepts revolving around writing' to render the word-form kataba 'he wrote'.

German Zero Morph: case and number are not overtly marked in the German noun Herr 'master' (nominative singular) and thus correspond to no overt form. The morpheme NOM.SG is realized by the zero morph Ø, i.e., Herr-Ø at the morph level vs. ‘master-NOM.SG’ at the morpheme level.

Required vocabulary: owl:Class
Initial consensus: approved modeling:

current modeling with fixed set of classes

morph:RootMorph, morph:StemMorph, morph:AffixMorph, morph:TransfixMorph, morph:SimulfixMorph, morph:ZeroMorph rdfs:subClassOf morph:Morph .

Status Updates: The need is agreed upon, but as of early 2022, we decided to move the subclassification of morphs into LexInfo. LexInfo 3.0 already provides part of this hierarchy, and having multiple namespaces for information of the same kind would confuse users (e.g., lexinfo:Suffix alongside morph:Simulfix).
N3: Differentiation between derivational and inflectional morph resources
Description: When representing the morphological content of lexical data, it should be possible to express and extract the distinction between word-form forming (inflectional) and lexeme-forming (derivational) morph:Morph resources. Currently, ontolex:Affix resources can only represent the latter type of morphs, because ontolex:Affix is a subclass of ontolex:LexicalEntry. This limitation will thereby be overcome.
Language example:

German (homonym) suffixes:

1) -er: an inflectional affix forming comparative adjectives, e.g. schön 'beautiful' --> schöner 'more beautiful'

2) -er: a derivational affix forming agent nouns from verbs, e.g. fahren 'to drive' --> Fahrer 'driver'

Required vocabulary: Explicit identification of morph:Morph resources as being an inflectional or derivational morph.
Initial consensus: initial modeling:

morph:Morph morph:hasMorphStatus morph:Value .

morph:derivational a morph:Value .

morph:inflectional a morph:Value .

Status Updates: 2021/2022: The need is agreed upon. However, with the inclusion of data from the LinkingLatin project, we shifted towards class-based modelling, i.e., WordFormationRule (or WordFormationRelation) vs. InflectionRule. Furthermore, we encode the difference between compounding and derivation in subclasses of WordFormationRule and, partially, of WordFormationRelation.
current modelling:

[a morph:WordFormationRule ] morph:involves [a morph:Morph ].

[a morph:CompoundingRule ] morph:involves [a morph:Morph ].

[a morph:DerivationRule ] morph:involves [a morph:Morph ].

[a morph:WordFormationRelation ] morph:wordFormationRule [ a morph:DerivationRule; morph:involves [ a morph:Morph ]] .

Note that we do not model the difference as a property of the morph here. Instead, it is a property of the analysis (i.e., of the rule), expressed via morph:WordFormationRelation.
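The following minimal Python sketch illustrates the class-based modelling and is not part of the module. The same -er morph is involved in two rules, and only the class of the rule distinguishes inflection from derivation. `Morph`, `Rule`, `InflectionRule`, `DerivationRule` and `apply` are hypothetical names that loosely mirror morph:Morph, morph:InflectionRule and morph:DerivationRule. The sketch assumes that suffixation is plain string concatenation, and it gives the stem fahr- directly instead of deriving it from fahren.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Morph:
    """Hypothetical stand-in for a morph:Morph, identified by its surface string."""
    surface: str


@dataclass(frozen=True)
class Rule:
    """Hypothetical stand-in for a rule that involves exactly one morph."""
    involves: Morph

    def apply(self, base: str) -> str:
        # Simplification: suffixation as plain string concatenation.
        return base + self.involves.surface


class InflectionRule(Rule):
    """Word-form forming: the output is another form of the same lexical entry."""


class DerivationRule(Rule):
    """Lexeme forming: the output is a new lexical entry (here a German noun)."""

    def apply(self, base: str) -> str:
        # German nouns are capitalized, so the derived entry gets an initial capital.
        return super().apply(base).capitalize()


er = Morph("er")                           # one and the same morph
comparative = InflectionRule(involves=er)  # involved in an inflection rule
agent_noun = DerivationRule(involves=er)   # and in a derivation rule

print(comparative.apply("schön"))  # schöner
print(agent_noun.apply("fahr"))    # Fahrer (fahr- is the stem of fahren)
```

The morph itself carries no inflectional or derivational flag. The classification is attached to the rule.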

N4: Inflectional paradigm
Description: Lexical data contains pointers to and/or tables of inflectional paradigms or classes, including the respective stem affixes or the full word-forms. Both the pointers to paradigms and the interconnection of word-forms that belong to a paradigm should be representable.
Language example:

Greek assignment of a lexical entry to an inflection class: λόγος:

  mounce-morphcat: n-2a

Greek inflectional class paradigm n-3e(3) (with reconstructed underlying stem endings and desinences):

  NS:   -ευς     {-εϝ+ς}
  GS:   -εως     {-εϝ+ος}
  DS:   -ει      {-εϝ+ι}
  AS:   -εα      {-εϝ+α}
  VS:   -ευ      {-εϝ+}
  NP:   -εις     {-εϝ+ες}
  VP:   -εις     {-εϝ+ες}
  GP:   -εων     {-εϝ+ων}
  DP:   -ευσι    {-εϝ+σι}
  AP:   -εις     {-εϝ+ας}

Examples of inflection tables that show the inflectional paradigm structure and the inflected word-forms: Latin: https://en.wiktionary.org/wiki/Appendix:Latin_third_conjugation

German: https://de.wiktionary.org/wiki/Flexion:jagen

Required vocabulary:

ontolex:LexicalEntry [object property] [morph:Paradigm] . ontolex:Form [object property] [morph:Paradigm] .

Tested on data:
Status: agreed (version 4.16)

ontolex:LexicalEntry lexinfo:morphologicalPattern morph:Paradigm .

ontolex:Form morph:inflectionRule morph:InflectionRule .

morph:InflectionRule morph:hasParadigm morph:Paradigm .

N5: Morphology crosses part-of-speech boundaries (derivation)
Description (John): Issue derived from "Linguistic Fundamentals for Natural Language Processing" by Emily Bender (source: https://www.morganclaypool.com/doi/abs/10.2200/S00493ED1V01Y201303HLT020)
Language example:

Morphological processes can turn one part-of-speech into another, effectively creating a distinct LexicalEntry

English

  • "to play" (verb) => "played" (adjective)
  • "to play" (verb) => "the playing" (noun)
Required vocabulary:

ontolex:LexicalEntry ontolex:lexicalForm ontolex:Form .

ontolex:Form morph:consistsOf morph:ZeroMorph .

Tested on data:
Status: agreed modelling

CC: This should include "zero derivation", where one word receives another part-of-speech without any difference in form or meaning. For example, every German adjective can be used as an adverb, and most English prepositions also occur as subordinating conjunctions (complementizers) and verbal particles. For "zero morphology", a distinct LexicalEntry is necessary only if differences in sense can be established. The underlying issue is that OntoLex does not permit more than one part-of-speech per LexicalEntry (which would be the natural modeling here).

Bettina: Derivation should be expressible, at least as the underlying word-formation process. There are three types of derivation: 1) zero derivation, 2) word-class changing derivation with no additional meaning and 3) word-class changing derivation with additional meaning. Whether these types should be expressible depends on the needs of the lexicographers.

Current draft: use the established means for derivation to represent conversion, and specify a zero morph, e.g., “play” (noun):

descriptive/extensional modelling:

ex:play_v_rel_play_n a morph:WordFormationRelation ;
  vartrans:source ex:lex_play_verb ;
  vartrans:target ex:lex_play_noun .
ex:lex_play_noun ontolex:lexicalForm ex:form_play_noun_sg .
ex:lex_play_noun rdfs:member|morph:consistsOf ex:lex_play_verb, [a morph:ZeroMorph ].

or generative/intensional modelling:

ex:play_v_rel_play_n a morph:WordFormationRelation ;
  vartrans:source ex:lex_play_verb ;
  vartrans:target ex:lex_play_noun .
ex:lex_play_noun ontolex:lexicalForm ex:form_play_noun_sg .
ex:play_v_rel_play_n morph:wordFormationRule/morph:involves [a morph:ZeroMorph ].


N6: Morphs linked to Lexical Entries
Description: Many dictionaries contain information about the morphology of a headword, typically given relative to the lemma. It should be possible to state explicitly which word-forms or morphemic elements are given as part of the lexical entry.
Language example:

German (from "Langenscheidt Taschenwörterbuch Deutsch als Fremdsprache"):

  • Bedingung die; -, -en
  • Bedürfnis das; -ses, -se
  • Beitrag der; -(e)s, Beiträge

Note that this does not cover all forms of the German noun, e.g., "Bedürfnissen", "Beiträgen".

It should be possible to model this information with two conditions:

  1. It is not necessary to materialize all forms of the word. Only the relevant stems and a minimal set of inflected forms or inflectional morphemes are needed.
  2. It is possible to generate any form programmatically.
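Condition 2 can be illustrated with a minimal Python sketch that expands the abbreviations of the three entries above into full genitive singular and nominative plural forms. The reading of the abbreviations is an assumption based on these three entries only: a bare hyphen repeats the lemma, a hyphenated string is appended, parenthesized material is optional and a string without a hyphen is a full form. `expand` and `ENTRIES` are hypothetical names.

```python
def expand(lemma: str, abbreviation: str) -> list:
    """Expand one print-dictionary inflection abbreviation into full word-forms.

    Assumed conventions (read off the three entries above, not a dictionary standard):
      "-"        form identical to the lemma
      "-ses"     lemma + "ses"
      "-(e)s"    two variants, with and without the optional "e"
      "Beiträge" (no leading hyphen) a full form, e.g. an umlauted plural
    """
    if not abbreviation.startswith("-"):
        return [abbreviation]
    suffix = abbreviation[1:]
    if "(" in suffix:
        start, end = suffix.index("("), suffix.index(")")
        with_optional = suffix[:start] + suffix[start + 1:end] + suffix[end + 1:]
        without_optional = suffix[:start] + suffix[end + 1:]
        return [lemma + with_optional, lemma + without_optional]
    return [lemma + suffix]


# Hypothetical record: lemma -> (article, genitive singular, nominative plural) as printed
ENTRIES = {
    "Bedingung": ("die", "-", "-en"),
    "Bedürfnis": ("das", "-ses", "-se"),
    "Beitrag": ("der", "-(e)s", "Beiträge"),
}

for lemma, (article, gen_sg, nom_pl) in ENTRIES.items():
    print(article, lemma, "GEN.SG:", expand(lemma, gen_sg), "NOM.PL:", expand(lemma, nom_pl))
```

Forms such as Bedürfnissen or Beiträgen are outside the reach of this sketch and would need additional, e.g. paradigm-based, rules.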

JMC: the question is whether we can underspecify the morphological pattern.

Required vocabulary: 1. reuse the vocabulary for the automatic generation of word-forms, and 2. create a new property with ontolex:LexicalEntry in its domain to explicitly state which word-forms, morphs or grammatical information are considered custom extensions of a lemma.
Tested on data:
Status: unclear if this representation need should be kept

Look up TEI representation: https://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-gramGrp.html and https://www.tei-c.org/release/doc/tei-p5-doc/en/html/DI.html

Telco 09.06.2021:

Proposal: an object property morph:morphologicalForm could be created (parallel to ontolex:lexicalForm) with domain ontolex:LexicalEntry and range morph:Morph.

→ Positions differ on whether this should be representable in the module at all. One argument against it is that all the information is already covered by the vocabulary. Another is that the need arises only from space-restricted print dictionaries. To be discussed again later!

Status 07.09.2022: This can be done via morph:morphologicalPattern and morph:paradigm. However, THERE IS NO DIRECT LINK between morph:InflectionRule and morph:Morph, so this would be represented only as string replacements, not as morphs.


N7: Multiple segmentation strategies
Description: A way to allow more than one segmentation of a single ontolex:Form.
Language example:

The segmentation of lexical entries or word-forms varies in granularity:

German verb jagte "hunted"

Complete segmentation: root-stem-suffix

[[[jag]-t]-e] - [[[root]tense suffix]number suffix]wordform

Contracted segmentation: stem-suffix

[[jagt]-e] - [[past tense stem]number suffix]wordform
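The following minimal Python sketch shows how the two analyses could coexist for one form. It assumes that a segmentation is recorded as a nested tuple of morph strings, with the nesting mirroring the brackets above. The names `surface`, `morphs` and `SEGMENTATIONS` are hypothetical and are not decomp vocabulary. The consistency check states the minimal condition for merging analyses from different sources: all of them must flatten to the same surface form.

```python
def surface(segmentation) -> str:
    """Flatten a nested segmentation (strings and tuples) into its surface string."""
    if isinstance(segmentation, str):
        return segmentation
    return "".join(surface(part) for part in segmentation)


def morphs(segmentation) -> list:
    """List the morphs (leaves) of a nested segmentation from left to right."""
    if isinstance(segmentation, str):
        return [segmentation]
    return [m for part in segmentation for m in morphs(part)]


# Hypothetical record of two concurrent analyses of the single form "jagte"
SEGMENTATIONS = {
    "complete": (("jag", "t"), "e"),  # [[[jag]-t]-e]
    "contracted": ("jagt", "e"),      # [[jagt]-e]
}

# Both analyses must describe the same surface form to be merged into one record.
assert all(surface(s) == "jagte" for s in SEGMENTATIONS.values())
for name, seg in SEGMENTATIONS.items():
    print(name, morphs(seg))
```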

Required vocabulary:
Tested on data:
Status: to be discussed

Christian: This does occur in Splett's Old High German dictionary (https://brill.com/view/journals/abag/42/1/article-p264_28.xml), where full morphological parses (tree structures) are used. The other (main) use case is language documentation, where dictionaries are created with Toolbox. Here, linguistic glossing can operate on a superficial or on a deep level. Cf. German fressen ("to eat", of an animal): superficially, it involves two morphemes (fress- + -en), but on a deep level it involves three (*ver- + ess- + -en). Here, *ver- contributes the derogatory [non-human] meaning, as in verwerfen "reject", lit. "cast away". While one dictionary may choose one level of depth, another dictionary may choose another. Admitting more than one level of depth allows information from different sources to be merged into a coherent representation. Wrt. the morphological pattern: isn't the idea that the morphological pattern describes a context for one given morph(eme)? So if we have more than one (-t- and -e-) here, how will we formalize their combination?

Petra Steiner (7.9.2022): need for modelling derivation trees ((A B) C) confirmed.

Current recommendation: model with decomp; no designated vocabulary is needed. HOWEVER, it is not clear whether this supports multiple concurrent segmentations in a single data structure.

N8, N9, and N10 have been removed after discussion in the telco of 26.05.2021

N11: Meanings of stems and roots
Description: Link morphs and senses. This applies to roots or stems with lexical senses or lexical concepts, e.g., the semantic fields of roots or the meanings of reconstructed protoforms. [Why is Morph not a LexicalEntry?]
Language example:

The meanings of stems and roots differ in that the former are language-specific and the latter language-independent concepts. Stems have a word-class affiliation and often also entail grammatical information like tense and number (inherent inflectional meanings). Stems function as the underlying semantic core of the lexical entry they occur in, so their meanings could be treated as the meanings of lexical entries. Roots, however, comprise very unspecific meanings from which words of various word classes can be built.

Hebrew root k-t-b conveys the concept "anything related to writing". From this root, nouns and verbs can be built, e.g. to write, journalist, author.

Required vocabulary:

'sense' property: domain: ontolex:LexicalEntry, morph:StemMorph and morph:RootMorph

range: 'sense' concept class

Tested on data:
Initial proposal: modeled as draft:

Bettina: The meanings of stems and roots could be described in the same way as the meanings of lexical entries in ontolex. For roots, external resources such as Concepticon could perhaps be recommended, or plain textual definitions could be allowed in addition.

Discussed proposal: Extend domain of ontolex:sense with ontolex:LexicalEntry and morph:StemMorph and morph:RootMorph.

JMC: not in favour of extending ontolex:sense domain with morph:Morph, proposes new property morph:sense with ontolex:LexicalSense and another Concept class.

JBG: With the use of ontolex:LexicalSense we are assuming an ontological reference, so we might run into the same problems as the ones we found when converting dictionaries (which ontological references to point to?). Since in the lexicog specification we opted to stick to ontolex:LexicalConcepts for the meaning of lexical entries in the conversion of dict entries to LLD, why would we want to point to LexicalSense in this case, instead of Concept?

Current draft: property morph:sense with morph:Morph in domain and ontolex:LexicalSense in range

object property: morph:sense

domain: morph:Morph

range: ontolex:LexicalSense

Status: solved: use the OntoLex core vocabulary, as morph:Morph is now a subclass of ontolex:LexicalEntry


N12: Derivational Meanings
Description: Issue derived from "Linguistic Fundamentals for Natural Language Processing" by Emily Bender, Source: https://www.morganclaypool.com/doi/abs/10.2200/S00493ED1V01Y201303HLT020
Language example:

Diminutives create a new noun with the meaning of being smaller. This could be modelled by adding a 'small' class to the meaning of a noun. Three types of derivational meanings should be considered:

Conversion: word-class change with no affixal marking and no additional meaning, e.g. play (v) → play (n)

Derivation 1: word-class change with affixal marking and no additional meaning, e.g. play (v) → playing (n)

Derivation 2: with or without word-class change, with affixal marking and additional meaning, e.g. book (n) → booklet (n), play (v) → player (n)

Required vocabulary: class for representing derivational meanings, e.g. morph:DerivationalConcept
Tested on data:
Status: modeled as draft:

Diminutives are not an ideal example because they are sometimes considered inflectional rather than semantic features (a form of degree, like the comparative). A better example might be the English morpheme "-er", which attaches to a verb to form a noun that represents the agent. The classic representation is by means of a rule: V + "-er" => N_ag (CC)

John: Model derivational meanings as concepts and link morph instances to this concept.

Fahad: Ignore examples with lexicalized words (e.g. computer). We do not need to model too deeply; just state “diminutive”.

John: Proposes to have DerivationalConcept as a subclass of ontolex:Concept (but no need for an InflectionalConcept subclass).

Current draft: property morph:evokes with morph:Morph in domain and ontolex:LexicalConcept in range

object property: morph:evokes

domain: morph:Morph

range: ontolex:LexicalConcept

morph:DerivationalConcept rdfs:subClassOf ontolex:LexicalConcept .

Current status: NOT MODELLED: instead of morph:evokes, we can use ontolex:evokes. TBC: what is the added value of morph:DerivationalConcept?

N13: “missing” part of the stem becomes a separate token
Description: I think there is a need to allow for morphology to break up a stem. I see John has raised a similar issue in N9, but what I am suggesting is that some tokens represent reduced forms of the stem/headword, but that the “missing” part of the stem becomes a separate token.
Language example:

E.g., Old Irish verbs like do-beir:

1. The prototonic form is tabair (a verb), with ta- mapping to the do- of the stem.
2. The deuterotonic form is do + beir (a particle + a verb).

In this case, while the headword, do-beir contains do-, the morphological form does not, and do- exists as a separate particle token. Pronouns can come between the particle and the verb and this is not considered tmesis.

Required vocabulary: class for representing free and/or grammatical morphs and an object property that allows statements to express that a free/grammatical morph is part of an ontolex:Form or a complex morph:Morph resource
Tested on data:
Status: consensus on modelling:

object property: morph:consistsOf

domain: morph:Morph

range: morph:Morph

ontolex:Form morph:consistsOf morph:Morph .

N14 has been removed after discussion in the telco of 07.07.2021


N15: Lexeme generation takes LexicalEntry and Form as input
Description: The generation of ontolex:LexicalEntry resources should accept resources of the type ontolex:LexicalEntry as well as ontolex:Form as input sources. This is required for languages that form new lexemes from inflected word-forms. One example is compounding in German, where the modifier takes on inflected forms (e.g., Gäste+haus "guest house", lit. "guests' house" [plural]).
Language example:
Required vocabulary: morph:consistsOf with range ontolex:Form
Tested on data:
initial proposal: modeled as draft:

The object properties vartrans:source and vartrans:target are reused, and the range of morph:consistsOf will not be extended to ontolex:Form. Any word-forms involved in the source or target of a generated ontolex:LexicalEntry have to be expressed using morph:WordFormationRule.

vartrans:source

vartrans:target

morph:WordFormationRule

current status: to be dropped? There is no real data. An extension of vartrans:source is possible but beyond our scope (it would be a matter for vartrans). We would need to suggest a vartrans:LexicalRelation between forms.

In German linguistics, an alternative view on compounding with inflected modifiers has been advocated, i.e., that the (diachronic) inflection now serves as interfix. This is supported by the fact that these "inflections" lost their grammatical meaning, so there is German Gästehaus (guest house) along with Gasthaus (restaurant), but the difference in meaning has nothing to do with the singular or plural morpheme that acts as interfix.


N16: Coverage of morphological language types
Description: Languages vary in their degree of morphological complexity on a scale that ranges over isolating, agglutinating, synthetic and polysynthetic languages. The morphology module should cover the representation of the most frequent morphological phenomena of agglutinating and synthetic languages. How much should be covered with regard to highly complex polysynthetic languages, e.g. Inuktitut? Decide on test data.
Language example:
Required vocabulary:
Tested on data: Turkish (agglutinating)
Status: support of "slot grammars" confirmed (morph:InflectionType)


N17: Part of Speech transformation
Description: Some categories of lexical entries systematically come in two variants by “zero derivation”, e.g., every German adjective is also an adverb. This cannot be modelled with the OntoLex vocabulary, and it is a productive process (so it falls within morphology).
Language example:
Required vocabulary:
Tested on data:
Status: same as N5?


N18: Recursive morphology
Description: Current focus is on inflectional morphology. But there are languages where after inflection is applied, additional layers of inflection can be applied (see data samples for Sumerian).
Language example:
Required vocabulary:
Tested on data: German, Turkish
Status: agreed modelling via the next property of morph:InflectionType


N19: Incorporation
Description: In many languages, incorporation is a way for a verb to refer to a specific semantic role (usually THEME). This is a productive process that corresponds in function to case inflection in Standard Average European, so it falls within the realm of inflection (see data samples on Inuktitut).
Language example:
Required vocabulary:
Tested on data:
Status: untested


N20: Weak noun/verb distinction
Description: Some languages are relatively flexible in “recasting” a verb into a noun. This applies to Standard Average European participles (verb => adjective), but also to finite verbs: Inuktitut qimmiuvuq “he has a dog” (verb) = “dog owner” (noun). As a noun, it can be inflected like a noun, e.g., qimmiuvup “dog owner.ERG”. The mechanism here is that a particular type of inflection is “repurposed” for derivation. Technically, this can be treated like zero derivation.
Language example:
Required vocabulary:
Tested on data:
Status: same as N5?


N21: Derivation
Description: Recursion and incorporation are two morphological processes that share important characteristics with derivation. Incorporation takes a noun and uses a verbal affix to produce a verb, so it involves a shift in part of speech. Recursive morphology is a recursive process, like most forms of derivation. To generalize over both aspects, derivation would require an affix (morph?) to impose constraints on the base form it is applied to (e.g., nominal for incorporation). The affix would also specify the grammatical features (“meaning”?) of the resulting form (e.g., verbal for incorporation). Sample data for derivation (plus the variants above) can be found in the Inuktitut data sample.
Language example:
Required vocabulary:
Tested on data:
Status: consensus: modelling of constraints with morph:GrammaticalMeaning (morph:baseConstraint, morph:grammaticalMeaning)
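The constraint-based view of derivation can be sketched in Python as follows. This is a minimal illustration, not the module's vocabulary. `base_constraint` and `result_pos` only loosely correspond to morph:baseConstraint and morph:grammaticalMeaning, and parts of speech are plain strings. Spelling adjustments (e.g., drive + -er → driver) are ignored, so the example uses play. The agentive -er is the example given under N12. `Entry`, `DerivationalAffix` and `derive` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """Hypothetical minimal lexical entry: a lemma and its part of speech."""
    lemma: str
    pos: str  # part of speech, e.g. "verb" or "noun"


@dataclass(frozen=True)
class DerivationalAffix:
    """Hypothetical affix carrying a base constraint and a resulting grammatical meaning."""
    surface: str
    base_constraint: str  # part of speech the base must have
    result_pos: str       # part of speech of the derived entry

    def derive(self, base: Entry) -> Entry:
        # Reject bases that violate the constraint instead of overgenerating.
        if base.pos != self.base_constraint:
            raise ValueError(
                f"-{self.surface} needs a {self.base_constraint} base, got {base.pos}"
            )
        # Simplification: plain concatenation, no spelling adjustments.
        return Entry(base.lemma + self.surface, self.result_pos)


agentive_er = DerivationalAffix("er", base_constraint="verb", result_pos="noun")
print(agentive_er.derive(Entry("play", "verb")))  # Entry(lemma='player', pos='noun')
```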


N22: Compositional structure of compounds and derivations
Description: Basically, being able to represent morphological parses. See Old High German sample data.
Language example:
Required vocabulary:
Tested on data:
Status: cf. N7


N23: Slot grammars
Description: A number of agglutinative languages are described in terms of slots, i.e., morphemes that follow each other in a fixed order. This is not the same as a paradigm, because a slot covers only part of a paradigm. See Sumerian sample data.
Language example: Finnish number and case, see the demonstration on form generation. morph:InflectionRule represents a paradigm for one slot.
Required vocabulary: morph:next, morph:InflectionRule
Tested on data: Turkish
Status: agreed upon, modelled by the next property of morph:InflectionType
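The following minimal Python sketch shows slot-based generation, following the Finnish number and case example. It makes these assumptions: the list order of `SLOTS` stands in for morph:next, each slot dictionary stands in for one morph:InflectionRule, and the singular is a zero morph. Only the back-vowel case variants are listed, so vowel harmony is ignored (cf. N24). The stem talo 'house' was chosen because it takes plural -i- without stem alternation. `SLOTS` and `generate` are hypothetical names.

```python
from itertools import product

# Hypothetical slot table for Finnish nouns. List order stands in for morph:next:
# the number slot precedes the case slot. Only back-vowel case variants are listed.
SLOTS = [
    {"SG": "", "PL": "i"},                       # number (singular as zero morph)
    {"INE": "ssa", "ELA": "sta", "ADE": "lla"},  # inessive, elative, adessive
]


def generate(stem: str, slots: list) -> dict:
    """Build every word-form by choosing one value per slot, in slot order."""
    forms = {}
    for choice in product(*(slot.items() for slot in slots)):
        features = ".".join(value for value, _ in choice)
        forms[features] = stem + "".join(suffix for _, suffix in choice)
    return forms


for features, form in generate("talo", SLOTS).items():
    print(features, form)
```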


N24: Assimilation rules
Description: Depending on context, a morpheme can be serialized in different ways. So far, we decided to not model assimilation rules as productive rules, but we might want to capture interdependencies, e.g., vowel harmony in Turkic languages.
Language example:
Required vocabulary: morph:InflectionRule, morph:Rule
Tested on data: Toy Finnish example
Status: in 2021/2022, we decided to focus on the first level of two-level morphologies. Technically, assimilation rules could be expressed as finite-state rules, but the naming would be counterintuitive. Assimilation rules are currently considered out of scope.
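Assimilation is currently out of scope. Still, a minimal Python sketch can show what such a rule would compute, using Finnish vowel harmony as in the toy example. The sketch assumes that suffixes are written with archiphonemes (A for a/ä, O for o/ö, U for u/y). It also assumes that a stem containing any back vowel (a, o, u) selects the back variants and that all other stems select the front variants. Compounds and loanwords are ignored. `harmonize` and `HARMONY` are hypothetical names.

```python
# Hypothetical archiphoneme notation: each capital stands for a (back, front) vowel pair.
HARMONY = {"A": ("a", "ä"), "O": ("o", "ö"), "U": ("u", "y")}
BACK_VOWELS = set("aou")


def harmonize(stem: str, suffix: str) -> str:
    """Resolve archiphonemes in a suffix by (simplified) Finnish vowel harmony.

    Simplification: any back vowel in the stem selects back variants,
    otherwise front variants are used; compounds and loanwords are ignored.
    """
    back = any(ch in BACK_VOWELS for ch in stem.lower())
    resolved = "".join(
        HARMONY[ch][0 if back else 1] if ch in HARMONY else ch for ch in suffix
    )
    return stem + resolved


print(harmonize("talo", "ssA"))   # talossa 'in the house'
print(harmonize("kylä", "ssA"))   # kylässä 'in the village'
print(harmonize("metsä", "stA"))  # metsästä 'from the forest'
```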


N25: Transliteration rules
Description: (Technically similar to assimilation rules.) Many languages have defective orthographies (e.g., writing a CCVCC language like Greek with a CV syllabary like Cypriot). In such cases, a morphologically derivable form must be mapped onto a particular orthography. Providing rules that directly produce orthographic forms may be too complex. Sample data from Sumerian (the actual data samples use idealized representations, not orthographic strings).
Language example:
Required vocabulary:
Tested on data:
Status: out of scope, a possible solution could, however, converge with N24

Decisions

Fixed set of morph:Morph classes: Telco 12.05.2021

The morphology module provides explicit subclasses of morph:Morph to represent roots, stems, affixes, transfixes, simulfixes and zero morphs (cf. representation need #2). It has been decided that other frequent types of morphs, e.g. prefix, suffix, circumfix, will not be included because the Lexinfo vocabulary can be reused to represent them. If specific subclass relations between morph:AffixMorph and Lexinfo morph classes are required, the Lexinfo vocabulary can be extended accordingly (to do so, refer to https://github.com/ontolex/lexinfo).

At present (2022-02-23), the following TermElements are provided:

If the modelling decision from 2021-05-12 is to be respected, we need to add the current subclasses to Lexinfo

The following TermElements are not relevant for morphology or too coarse-grained:

Update Telco 06.04.2022

LexInfo also provides subclasses of ontolex:Affix, e.g., lexinfo:Infix. We should extend the set of Affix/Morph subclasses in LexInfo rather than use TermElement instances. One reason is symmetry; another is to avoid the notion of 'term', which may be felt to be inappropriate for morphological entities. Accordingly, we removed the subclasses of Morph from the diagram.

Working examples

Examples that concern the modelling of morphological data on the lexicon level (lexical decomposition and word-formation)

E1: Representing the lexical entries of which a derived word (as a lexical entry) consists:

RESULT: Implement Option 1 and reuse the decomp vocabulary by adapting decomp:correspondsTo, decomp:subterm and decomp:Component according to this document.

The example entry is the Wiktionary entry for the noun driver. Since the elements of a morphological decomposition of a lexical entry are themselves lexical entries, they do not need to be described further morphologically but can be specified using the OntoLex core module.

MODELLING OPTION 1: reuse decomp:constituent

 ex:lex_driver_n a ontolex:LexicalEntry ;
 decomp:constituent ex:lex_drive_v , ex:lex_suffix_er . 

Discussion:

Bettina: The object property for these statements should also interrelate other parts (i.e. derivational affixes), not only a “particular realization of a lexical entry that forms part of a compound lexical entry”. There is a risk that users will understand decomp:constituent as applying only to parts of compound words and not also to derived words.

Christian: 'compound' is used in a more technical sense here; would reuse decomp:constituent.

Thierry: is also ok with decomp:constituent

Francesca: for NLP users, it is confusing to have compounds composed of morphemes (de-compounding is a specific task).

Thierry: maybe adjust the existing decomp model to describe more encompassing properties?

Christian: agrees with Thierry (roughly: the vocabulary needs to be consolidated in the end, and existing terminology has to take priority, though not for the moment).

MODELLING OPTION 2: create new property morph:consistsOfLexEntry

 ex:lex_driver_n a ontolex:LexicalEntry ;
 morph:consistsOfLexEntry ex:lex_drive_v , ex:lex_suffix_er . 

Discussion:

Bettina: consistsOfLexEntry is meant for decomposing lexical entries, which can only consist of other lexical entries (not of morphs); its domain and range are both ontolex:LexicalEntry. Analogously, there could be a consistsOfMorphEntry object property stating which morph resources an ontolex:Form (i.e. a word form in an inflectional paradigm) consists of.
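A sketch of how the two properties proposed here might be declared in Turtle. The morph: namespace IRI is an assumption taken from the current draft; the declarations only restate the domains and ranges given in the proposal.

 @prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .
 @prefix morph:   <http://www.w3.org/ns/lemon/morph#> .   # assumed namespace IRI
 @prefix owl:     <http://www.w3.org/2002/07/owl#> .
 @prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
 # Option 2: lexical entries decompose only into lexical entries
 morph:consistsOfLexEntry a owl:ObjectProperty ;
   rdfs:domain ontolex:LexicalEntry ;
   rdfs:range  ontolex:LexicalEntry .
 # analogous proposal: forms are segmented into morphs
 morph:consistsOfMorphEntry a owl:ObjectProperty ;
   rdfs:domain ontolex:Form ;
   rdfs:range  morph:Morph .

Note that rdfs:domain and rdfs:range license inferences rather than act as constraints: a reasoner would infer that any object of consistsOfLexEntry is a lexical entry instead of rejecting a morph there. Actually enforcing "only lexical entries" would need a validation layer such as SHACL.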

MODELLING OPTION 3: create more specific subproperties

 ex:lex_driver_n a ontolex:LexicalEntry ;
 morph:hasRoot ex:lex_drive_v ;
 morph:hasSuffix ex:lex_suffix_er . 

Discussion:

Fahad: It might be a good idea to have more specific properties such as hasRoot and hasAffix/hasPrefix/hasSuffix/hasInfix here instead of the rather generic consistsOfLexEntry or decomp:constituent.

Bettina: I have strong objections to using :hasRoot, because the decomposition of lexical entries on the word-formation level does not involve roots; this conflates the description of word formation and inflection. Also, I cannot see how this would help to describe complex derived words, e.g. driverless. One would have to adopt a linear decomposition rather than a binary one, i.e. :root_drive plus :suffix_er plus :suffix_less instead of :lex_driver_n plus :lex_less, with :lex_driver_n then decomposed again.
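To make the contrast concrete, a sketch of the binary decomposition described here, using decomp:subterm in the adapted sense defined on this page (range LexicalEntry or Morph). All ex: identifiers are hypothetical; ex:lex_suffix_less stands in for the :lex_less entry mentioned above.

 @prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .
 @prefix decomp:  <http://www.w3.org/ns/lemon/decomp#> .
 @prefix ex:      <http://example.org/lexicon#> .   # hypothetical namespace
 # first level: driverless = driver + -less
 ex:lex_driverless_adj a ontolex:LexicalEntry ;
   decomp:subterm ex:lex_driver_n , ex:lex_suffix_less .
 # second level: driver = drive + -er
 ex:lex_driver_n a ontolex:LexicalEntry ;
   decomp:subterm ex:lex_drive_v , ex:lex_suffix_er .
 # affixes are lexical entries too (ontolex:Affix is a subclass of ontolex:LexicalEntry)
 ex:lex_suffix_er   a ontolex:Affix .
 ex:lex_suffix_less a ontolex:Affix .

A linear analysis would instead attach three elements (root, -er, -less) directly to the adjective, and the intermediate entry driver would no longer appear in the structure.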

E2: Stating that a (derived) word has a derivational relation to another word:

RESULT: Create new property morph:derivationalRel as subproperty of vartrans:lexicalRel.

MODELLING OPTION: create subproperty of vartrans:lexicalRel

 ex:lex_driver_n a ontolex:LexicalEntry ;
 morph:derivationalRel ex:driverless_adj , 
                       ex:driverside , 
                       ex:pile_driver_n .  

E3: Stating the derivational relation that holds between a lexical entry and its derivative:

RESULT:

MODELLING OPTION: create new class morph:DerivationalRelation as a subclass of vartrans:LexicalRelation.

 ex:driver_AgentNoun a morph:DerivationalRelation ;
 vartrans:source   ex:lex_drive_v ;
 vartrans:target   ex:lex_driver_n .
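As in VarTrans, the reified relation and the direct property can co-exist. A sketch combining E2 and E3 for the same pair of entries; vartrans:category is the VarTrans property for typing a relation, while ex:agentNounFormation and the ex: namespace are hypothetical.

 @prefix ontolex:  <http://www.w3.org/ns/lemon/ontolex#> .
 @prefix vartrans: <http://www.w3.org/ns/lemon/vartrans#> .
 @prefix morph:    <http://www.w3.org/ns/lemon/morph#> .   # assumed namespace IRI
 @prefix ex:       <http://example.org/lexicon#> .         # hypothetical namespace
 # reified relation (E3), typed with a hypothetical category resource
 ex:driver_AgentNoun a morph:DerivationalRelation ;
   vartrans:source   ex:lex_drive_v ;
   vartrans:target   ex:lex_driver_n ;
   vartrans:category ex:agentNounFormation .
 # direct shortcut for the same fact (E2 style)
 ex:lex_drive_v morph:derivationalRel ex:lex_driver_n .

The reified form is only needed when the relation itself carries information (here, its category); otherwise the single triple suffices.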


Examples that concern the modelling of morphological data on the morphology level (morphological segmentation and inflectional paradigms)

Modelling

Current Modelling Status

The current modelling status can be inspected in our GitHub repository:
  • diagram: https://ontolex.github.io/morph/doc/diagrams/
  • definitions: https://ontolex.github.io/morph/draft.html

The notes below are preserved for historical reasons.

Modelling Status eLex 2019

Relations and classes:
morph:derivationalRel

subproperty of vartrans:lexicalRel

definition: The 'derivationalRel' property relates two lexical entries that stand in some derivational relation.

domain: ontolex:LexicalEntry

range: ontolex:LexicalEntry
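The definition above translates directly into an OWL property declaration. A sketch in Turtle, assuming the morph: namespace IRI of the current draft:

 @prefix ontolex:  <http://www.w3.org/ns/lemon/ontolex#> .
 @prefix vartrans: <http://www.w3.org/ns/lemon/vartrans#> .
 @prefix morph:    <http://www.w3.org/ns/lemon/morph#> .   # assumed namespace IRI
 @prefix owl:      <http://www.w3.org/2002/07/owl#> .
 @prefix rdfs:     <http://www.w3.org/2000/01/rdf-schema#> .
 morph:derivationalRel a owl:ObjectProperty ;
   rdfs:subPropertyOf vartrans:lexicalRel ;
   rdfs:domain ontolex:LexicalEntry ;
   rdfs:range  ontolex:LexicalEntry ;
   rdfs:comment "The 'derivationalRel' property relates two lexical entries that stand in some derivational relation."@en .

Through rdfs:subPropertyOf, every morph:derivationalRel triple also entails a vartrans:lexicalRel triple, so generic VarTrans consumers still see the link.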


morph:Morph

definition: A morph is a concrete primitive element of morphological analysis.

morph:consistsOf

definition: This property states into which Morph resources a Form resource can be segmented.

domain: ontolex:Form

range: morph:Morph
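A sketch of morph:consistsOf segmenting the plural form drivers into its morphs. The ex: identifiers are hypothetical, and rdfs:label is used for the morph strings purely as a placeholder choice.

 @prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .
 @prefix morph:   <http://www.w3.org/ns/lemon/morph#> .   # assumed namespace IRI
 @prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
 @prefix ex:      <http://example.org/lexicon#> .         # hypothetical namespace
 ex:lex_driver_n a ontolex:LexicalEntry ;
   ontolex:otherForm ex:form_drivers .
 ex:form_drivers a ontolex:Form ;
   ontolex:writtenRep "drivers"@en ;
   morph:consistsOf ex:morph_drive , ex:morph_er , ex:morph_s .
 ex:morph_drive a morph:Morph ; rdfs:label "drive"@en .
 ex:morph_er    a morph:Morph ; rdfs:label "er"@en .
 ex:morph_s     a morph:Morph ; rdfs:label "s"@en .

The three consistsOf triples form an unordered set, so this property alone does not record morph order (an open point raised in the draft discussion).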


morph:DerivationalRelation

subclass of vartrans:LexicalRelation

Definition: A 'derivational relation' is a lexical relation that relates two lexical entries by means of a derivational affix.

decomp:correspondsTo (adapted)

definition: The property correspondsTo links a component to a corresponding lexical entry or argument.

domain: Component

range: LexicalEntry or Argument or Frame or Morph
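A disjunctive range like this is typically written in OWL as an anonymous owl:unionOf class. A sketch, assuming that "Argument" and "Frame" refer to synsem:SyntacticArgument and synsem:SyntacticFrame from the OntoLex synsem module, and that morph: uses the namespace of the current draft:

 @prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .
 @prefix decomp:  <http://www.w3.org/ns/lemon/decomp#> .
 @prefix synsem:  <http://www.w3.org/ns/lemon/synsem#> .
 @prefix morph:   <http://www.w3.org/ns/lemon/morph#> .   # assumed namespace IRI
 @prefix owl:     <http://www.w3.org/2002/07/owl#> .
 @prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
 decomp:correspondsTo rdfs:domain decomp:Component ;
   rdfs:range [
     a owl:Class ;
     owl:unionOf ( ontolex:LexicalEntry synsem:SyntacticArgument synsem:SyntacticFrame morph:Morph )
   ] .

Stating several separate rdfs:range triples instead would mean the intersection of the classes, which is why the union class is needed.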

morph:MorphValue

definition: The value of a morph states the relationship that holds between the morph and the forms or lexical entries in which it can occur.

decomp:subterm (adapted)

definition: The property subterm relates a compound lexical entry to one of the lexical entries it is composed of.

domain: LexicalEntry

range: LexicalEntry or Morph

decomp:Component (adapted)

definition: A component is a particular realization of a lexical entry that forms part of a compound lexical entry or a derived word.
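A sketch of the Component-based pattern applied to the driver example (E1). The ex: identifiers are hypothetical; the rdf:_1/rdf:_2 ordering follows the convention used in the OntoLex decomp examples.

 @prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
 @prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .
 @prefix decomp:  <http://www.w3.org/ns/lemon/decomp#> .
 @prefix ex:      <http://example.org/lexicon#> .   # hypothetical namespace
 ex:lex_driver_n a ontolex:LexicalEntry ;
   decomp:constituent ex:comp_drive , ex:comp_er ;
   rdf:_1 ex:comp_drive ;   # component order
   rdf:_2 ex:comp_er .
 ex:comp_drive a decomp:Component ;
   decomp:correspondsTo ex:lex_drive_v .
 ex:comp_er a decomp:Component ;
   decomp:correspondsTo ex:lex_suffix_er .

Under the adapted range of decomp:correspondsTo, ex:comp_er could equally point to a morph:Morph instead of an affix entry.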

ontolex:morphologicalPattern (adapted)

definition: The 'morphological pattern' property indicates the morphological class of a word.

domain: ontolex:LexicalEntry

range: morph:MorphologicalPattern

morph:MorphologicalPattern

definition: The morphological pattern states the inflectional, derivational or compositional building pattern that applies to a lexical entry.

morph:hasMorphStatus

definition: States whether a morphological element functions as inflectional or derivational.

domain: morph:Morph or ontolex:Affix

range: morph:MorphValue
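A sketch of morph:hasMorphStatus distinguishing a derivational from an inflectional morph. The MorphValue instances ex:derivational and ex:inflectional are hypothetical, as is the ex: namespace.

 @prefix morph: <http://www.w3.org/ns/lemon/morph#> .   # assumed namespace IRI
 @prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
 @prefix ex:    <http://example.org/lexicon#> .         # hypothetical namespace
 ex:derivational a morph:MorphValue .
 ex:inflectional a morph:MorphValue .
 # -er derives agent nouns; -s marks plural
 ex:morph_er a morph:Morph ;
   rdfs:label "er"@en ;
   morph:hasMorphStatus ex:derivational .
 ex:morph_s a morph:Morph ;
   rdfs:label "s"@en ;
   morph:hasMorphStatus ex:inflectional .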


morph:InflectionalParadigm

definition: A structured set of inflected forms, organized according to specific grammatical parameters.

morph:belongsToMorphPattern

definition: This property states that an inflectional paradigm belongs to the morphological pattern of a lexical entry.

domain: morph:InflectionalParadigm

range: morph:MorphologicalPattern

morph:hasParadigm

definition: This property assigns a form to an inflectional paradigm.

domain: ontolex:Form

range: morph:InflectionalParadigm
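The three properties above form a chain from a form to its paradigm and from the paradigm to the pattern of the entry. A sketch with hypothetical ex: resources:

 @prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .
 @prefix morph:   <http://www.w3.org/ns/lemon/morph#> .   # assumed namespace IRI
 @prefix ex:      <http://example.org/lexicon#> .         # hypothetical namespace
 ex:lex_cat_n a ontolex:LexicalEntry ;
   ontolex:morphologicalPattern ex:pattern_regular_noun ;
   ontolex:otherForm ex:form_cats .
 ex:form_cats a ontolex:Form ;
   ontolex:writtenRep "cats"@en ;
   morph:hasParadigm ex:paradigm_regular_noun .
 ex:paradigm_regular_noun a morph:InflectionalParadigm ;
   morph:belongsToMorphPattern ex:pattern_regular_noun .
 ex:pattern_regular_noun a morph:MorphologicalPattern .

The entry thus reaches the same pattern twice: directly via ontolex:morphologicalPattern and indirectly through its form's paradigm, which makes the two statements checkable against each other.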


Previous Modelling Drafts/Discussion

Draft until 26.04.2019

This second draft works through the first two subareas (1.1 and 1.2) of modelling as defined in the purpose and scope section. It shows the realization of working examples E1 to E3 so far and proposes a model for the morphological decomposition of forms.

Module draft 2
Relations and classes:
morph:derivationalRel

subproperty of vartrans:lexicalRel

definition: The 'derivationalRel' property relates two lexical entries that stand in some derivational relation.

domain: ontolex:LexicalEntry

range: ontolex:LexicalEntry


morph:Morph

definition:

morph:consistsOf

definition: This property states into which Morph resources a Form resource can be segmented.

domain: ontolex:Form

range: morph:Morph


morph:DerivationalRelation

subclass of vartrans:LexicalRelation

Definition: A 'derivational relation' is a lexical relation that relates two lexical entries by means of a derivational affix.

decomp:correspondsTo (adapted)

definition: The property correspondsTo links a component to a corresponding lexical entry or argument.

domain: Component

range: LexicalEntry or Argument or Frame or Morph

morph:RootMorph

definition:

decomp:subterm (adapted)

definition: The property subterm relates a compound lexical entry to one of the lexical entries it is composed of.

domain: LexicalEntry

range: LexicalEntry or Morph

morph:StemMorph

definition:

- morph:AffixMorph

definition:

- morph:TransfixMorph

definition:

- morph:SimulfixMorph

definition:

- morph:ZeroMorph

definition:


Draft until 26.03.2019

This first draft is based on John's initial modelling diagram proposed in the eLex abstract ([1]). Bettina reproduced it here and added numbers to the relation arrows as well as namespace prefixes for clarity. For now, "morph" has been chosen as the prefix for the morphology module; this can be debated and changed later.

Module Draft 1.2

John Todo: add class descriptions

Relations and classes:
vartrans:source

definition: The source property indicates the lexical sense or lexical entry involved in a lexico-semantic relation as a 'source'.

domain: vartrans:LexicoSemanticRelation

range: ontolex:LexicalEntry OR ontolex:LexicalSense OR ontolex:LexicalConcept

morph:DerivationalRelation

definition:

vartrans:target

definition: The target property indicates the lexical sense or lexical entry involved in a lexico-semantic relation as a 'target'.

domain: vartrans:LexicoSemanticRelation

range: ontolex:LexicalEntry OR ontolex:LexicalSense OR ontolex:LexicalConcept

morph:MorphologicalPattern

definition:

ontolex:evokes

definition: The evokes property relates a lexical entry to one of the lexical concepts it evokes, i.e. the mental concept that speakers of a language might associate when hearing the lexical entry.

domain: ontolex:LexicalEntry

range: ontolex:LexicalConcept

morph:FormPrototype

definition:

ontolex:lexicalForm

definition: The lexical form property relates a lexical entry to one grammatical form variant of the lexical entry.

domain: ontolex:LexicalEntry

range: ontolex:Form

morph:MorphPrototype

definition:

ontolex:morphologicalPattern

definition: The morphological pattern property indicates the morphological class of a word.

domain: ontolex:LexicalEntry

morph:Morph

definition: open for later discussion

1-7: object properties that still need to be specified

morph:DerivationalPrototype

definition:


The diagram has raised the following questions:

  1. DerivationalRelation:
    • CC: any reason not to just name this Relation (assuming the namespace will be something like morph: or ontolex-morph:, this will be unambiguous)? Otherwise, neither inflection nor composition with interfixes can be expressed.
      • JM: I think this would cause confusion with more generic relations in the VarTrans model
    • CC: not in the diagram, but do I understand correctly that this is the reification of proposal:derivRel?
      • JM: Like in VarTrans we could support both a single relation and a reified relation
    • CC: a derivational relation normally involves three or more elements: a base morpheme (say, a root), one or more derivational morphemes, and the derived lexeme; each of these could have a lexical entry. I guess this may be the idea behind DerivationalPrototype, but if so, what is the relation between DerivationalRelation and DerivationalPrototype?
      • JM: Prototypes are patterns that express generic derivations, whereas the relation is between two specific lexical entries
    • CC: if Form is defined as an aggregation of proposal:Morph elements, would the aggregation relation be proposal:consistsOfMorphEntry from the Feb 26 notes?
  2. *Prototype:
    • CC: I might have missed it, but what is the definition of these?
      • JM: A prototype is like a morphological analysis, but with one or more elements replaced by a generic placeholder, typically for the stem
  3. MorphologicalPattern:
    • CC: I guess this is ontolex:MorphologicalPattern, with the proposal adding FormPrototype and DerivationalPrototype
  4. MorphPrototype:
    • CC: I assume this is the representation of a morpheme (as an abstraction over different allomorphs). Why isn't it connected with Morph, then?
      • JM: This is the placeholder for the 'stem' in the model. I will add an example soon.
  5. FormPrototype:
    • CC: not quite sure what this is supposed to be. Do you mean something like a normalized morphological segmentation (with morphemes rather than morphs/substrings)? If so, why link Morphs directly rather than indirectly via MorphPrototypes?
    • CC: assuming that FormPrototype and Form are basically morphological segmentations at different levels of (morphological) normalization, I can see that one would need different segmentations both on an abstract level (MorphologicalPattern) and for a specific lexical entry (Form). However, if MorphPrototype and Morph were linked, there would be no need for a relation with Morph (or Form).
    • CC: Morph must have a way to represent both order (always) and hierarchical structure (cf. Old High German Splett dictionary)
    • CC: assuming that the MorphPrototype does have multiple allomorphs attached: Is this a LexicalEntry?
    • CC: I might be all wrong about the division of labour between Morph and MorphPrototype, but if the MorphPrototype is something like a morpheme, and the Morph is understood as the specific form of a morpheme (e.g., an allomorph), the Morph subclasses should probably be MorphPrototype subclasses.
  6. DerivationalPrototype:
    • CC: Do you have an example? I have difficulty seeing a DerivationalPrototype that doesn't involve a Morph(Prototype).
      • JM: see below for an example

For example, the straightforward modelling of an English noun and its regular plural would be:

 <#cat> a ontolex:LexicalEntry ;
 ontolex:canonicalForm [
   ontolex:writtenRep "cat"@en ;
   lexinfo:number lexinfo:singular ;
   morph:consistsOf [
      a morph:StemMorph ;
      morph:representation "cat"@en
   ]
 ] ;
 ontolex:otherForm [
   ontolex:writtenRep "cats"@en ;
   lexinfo:number lexinfo:plural ;
   morph:consistsOf [
     a morph:StemMorph ;
     morph:representation "cat"@en 
   ] , [
     a morph:AffixMorph ;
     morph:representation "s"@en
   ] 
 ].

We could make this a prototype as follows:

 <#cat> a ontolex:LexicalEntry ;
 ontolex:canonicalForm [
   lexinfo:number lexinfo:singular ;
   ontolex:writtenRep "cat"@en ;
   morph:consistsOf [ # This bit is optional
      a morph:StemMorph ;
      morph:representation "cat"@en
   ]
 ] ;
 ontolex:morphologicalPattern <#englishNoun> .
 <#englishNoun> a morph:MorphologicalPattern ;
  morph:canonicalForm [
     a morph:FormPrototype ;
     lexinfo:number lexinfo:singular ;
     morph:consistsOf [
       a morph:PrototypeStemMorph ]
  ] ;
  morph:otherForm [
    a morph:FormPrototype ;
    lexinfo:number lexinfo:plural ;
    morph:consistsOf [
       a morph:PrototypeStemMorph
     ] , [
       a morph:AffixMorph ;
       morph:representation "s"@en
     ] 
  ] .


Draft until 22.01.2019

Telco 22.01.2019: Both descriptive and generative aspects should be covered by this module; they mirror each other. For every element there will be an abstract version of it. A form will be composed of morphs, and a form pattern will be composed of morph patterns. A morph pattern is composed of concrete morphs or of abstract morphs, which are slots (like the stems). As in N7: suffixes are concrete morphs, and all noun stems that can be combined with them are the abstract stem slots; see the structure proposal https://www.w3.org/community/ontolex/wiki/File:Morphology_structure_proposal.jpg


Telco 27.11.2018: The starting point for modelling is a reduced version of MMoOn Core. This version enables the representation of inflection, derivation and compounding by explicitly stating the morphs of which words consist and assigning meanings to the morphs.

[[2]]

Minutes: CC: Start with ontolex definitions for elements that already exist.

Word:

Jesse: instead of linking ontolex:Form or ontolex:LexicalEntry, just link to ontolex:Form. Replace mmoon:Wordform with the ontolex:Form class and interrelate other elements to it via properties.

B: represent derivation and compounding on the LexicalEntry level and inflection on the Form level (CC agrees)

Max: cluster examples in types of use cases

Meaning:

CC: cover morphology generation aspect, collect use case

Upcoming issues:

Julia: a table juxtaposing the definitions of OntoLex and MMoOn classes (Todo: Bettina)

K: would like to have example data in MMoOn together with its source data (Todo: Bettina, for Hebrew and Xhosa); will provide examples as well

Julia: provide KD examples

Summary: The discussion of specific modelling issues is postponed until we have collected and analyzed enough example data and defined the purpose of the module.

Morphology structure proposal (early 2019)