Valency and Semantics

From Ontology-Lexica Community Group

Suggestion

Develop a designated vocabulary for syntactic and semantic relations (frames and valency), possibly as an extension (revision?) of the SynSem module. The SynSem module currently captures primarily aspects of syntax.

Material for this discussion is collected in the SynSem GitHub repository. There is no formal discussion (series of telcos) on the topic.

Motivations

- Frame semantics has been an area of intense research since Fillmore’s seminal “Case for Case” article (Fillmore, 1968), and numerous digital resources for frame semantics have subsequently emerged, most notably FrameNet (Baker et al., 1998) and PropBank (Kingsbury and Palmer, 2003). Other specifications exist, but FrameNet and PropBank are more representative in that their specifications have been applied to several languages beyond English. FrameNet is an inventory of frames, i.e., predicates, their roles and potential fillers, and constraints on those, coupled with lexicalization preferences and subsequently augmented with annotations in actual text. PropBank is an annotation effort that develops a frame inventory as a means to annotate textual data. The two differ in philosophy and granularity, but are nevertheless closely interrelated and complementary resources. Unfortunately, their respective data models and formats are quite different, so that harmonization between the two resources could only be implemented by untyped hyperlinks (the Unified Verb Index, http://verbs.colorado.edu/verb-index/index.php, Palmer, 2009). This mapping is informative, but incomplete and not machine-readable, as it is implemented on the level of human-readable visualizations (websites) rather than machine-readable web resources. (Pret-a-LLOD D5.1 report, 2020, p.13)

- More recent efforts to integrate both resources with each other and with related resources (VerbNet, NomBank, BabelNet, etc.) have thus been developed on the basis of Linked Data principles and technology. At the same time, we are faced with a multitude of proposals for vocabularies for this purpose, so that the desideratum is less to develop a novel or more adequate vocabulary than to harmonize or synthesize existing proposals. (Pret-a-LLOD D5.1 report, 2020, p.13)

- Valency dictionaries, e.g., Latin Vallex lexicon (https://itreebank.marginalia.it/view/lvl.php): The Latin Vallex describes the valency frame attached to possible senses of a series of valency-capable words, which are for now limited to verbs. Each sense of a verb is assigned a valency frame that is described with the appropriate set of semantic-role descriptors (called "functors"; a list is available here). Furthermore, we're defining the senses in close connection with our new Latin WordNet. So, for instance, the verb "abduco" (remove) in the sense connected to the synset "remove something concrete, as by lifting, pushing, or taking off, or remove something abstract", is assigned a frame with the roles: ACT (roughly: agent), DIR1 (direction from), DIR3 (direction to), PAT (roughly: patient). Note that "functors" are semantic, rather than syntactic, descriptors. Since the beginning our Vallex has been closely modeled on the Czech Vallex: http://ufal.mff.cuni.cz/vallex, http://ufal.mff.cuni.cz/%7Elopatkova/literatura/06-TR-vallex-2.0.pdf (Francesco Mambrini, via OntoLex mailing list, 2020-12-14)

- Valency dictionaries vs. FrameNet: In the theoretical frame of Prague's Vallex, "valency frames" and "roles" are assigned to "Lexical Units", i.e. (quoting from the PDF linked above) "form-meaning complexes with (relatively) stable and discrete semantic properties. Roughly speaking, LU can be understood as a given word in the given sense"; frames are defined locally for each lexical unit (there are no general classes of frames or of words, like VerbNet or PropBank classes). As all the LiLa resources are based on OntoLex, I am looking for any OntoLex-based solution that would be coherent with the Lexicon as it was designed. Now, it seems to me that it would be relatively easy to model it using the older classes and properties of Lemon, and assigning semantic roles to OntoLex senses (as was done with UbyLemon, if I see it right). I have a much harder time figuring out how I can model this information using SynSem now. One could theoretically say that the Vallex model presupposes a syntactic frame that is linked to a semantic frame (the Czech Vallex does indeed link the two); but I am quite at a loss as to how I could model this semantic frame using SynSem's ontology mapping, which seems designed to do other things. (Francesco Mambrini, via OntoLex mailing list, 2020-12-14)
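As an illustration of the Vallex-style data described above, a lexical unit with a locally defined valency frame could be sketched as a plain record. This is only a sketch under assumptions: the class name `LexicalUnit`, its field names, and the helper `describe` are hypothetical and belong neither to Vallex nor to OntoLex; only the lemma and the four functors come from the "abduco" example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LexicalUnit:
    """A word in a given sense, carrying its own locally defined valency frame."""
    lemma: str
    sense_gloss: str
    functors: tuple  # semantic-role descriptors such as ACT or PAT


# The frame for "abduco" as described in the text; the gloss is abbreviated.
abduco_remove = LexicalUnit(
    lemma="abduco",
    sense_gloss="remove something concrete or abstract",
    functors=("ACT", "DIR1", "DIR3", "PAT"),
)


def describe(unit):
    """Render a lexical unit as 'lemma: FUNCTOR FUNCTOR ...'."""
    return unit.lemma + ": " + " ".join(unit.functors)


print(describe(abduco_remove))
```

Because the frame is stored on the unit itself, nothing in this sketch presupposes shared frame classes, which mirrors the point that Vallex frames are defined per lexical unit.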

Candidate vocabularies

Most existing semantic representation models address lexical semantic aspects, which capture the underlying predicate-argument structure, without providing elements from logical semantics, which can be described as truth-conditional and model-theoretic semantics. The emerging need is to formalize propositions, i.e., idealised sentences suitable for logical manipulation, so that the meaning of the various parts of a proposition is given by a group of interpretation functions which license important inferences. (Pret-a-LLOD D5.1 report, 2020, p.21)

The main goal for emerging models should be to describe how lexical and logical aspects can be combined, in order to integrate typed predicates into the existing models and to model ambiguous predicates. In fact, as described by Berant et al. (2011), different type signatures of the same predicate have different meanings, but given a type signature a predicate is unambiguous, and may reflect a distinction in the semantics that is not always obvious in the syntax. The representation of arguments to induce n-ary relations should allow a separate predicate to be created for each pair of arguments of a word, furthering generalizations and supporting formal semantics for logical operators within linguistic theories. (Pret-a-LLOD D5.1 report, 2020, p.21)
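The two ideas in this paragraph, disambiguation by type signature and one predicate per pair of arguments, can be sketched as follows. The predicate "charge", its type names, and its glosses are made-up illustrations rather than data from Berant et al. (2011), and the function names `lookup` and `binary_predicates` are hypothetical.

```python
from itertools import combinations

# Hypothetical typed predicates: the same name with two type signatures.
typed_predicates = {
    ("charge", ("authority", "person")): "accuse",
    ("charge", ("person", "device")): "supply with electricity",
}


def lookup(name, signature):
    """A predicate name plus a type signature selects exactly one meaning."""
    return typed_predicates[(name, tuple(signature))]


def binary_predicates(word, arguments):
    """Split an n-ary relation into one predicate per pair of arguments."""
    return [(word, a, b) for a, b in combinations(arguments, 2)]


print(lookup("charge", ["authority", "person"]))
print(binary_predicates("give", ["giver", "recipient", "gift"]))
```

A relation with n arguments yields n(n-1)/2 binary predicates in this sketch, so the three-argument example produces three pairs.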

A preliminary outcome of this discussion is a tentative recommendation for one of the candidate vocabularies described below. This discussion will be continued in exchange with the communities involved. For the moment, we express a preference for the PreMon vocabulary, as its development seems to be well-coordinated with the development of OntoLex-Lemon. (Pret-a-LLOD D5.1 report, 2020, p.21)

Lexfom

- Lexfom (for Meaning-Text Theory, Mel’čuk 1997)

PreMon

The PREdicate Model for ONtologies (PreMon, Rospocher et al., 2019) is an ontology that extends the lemon model to provide for the representation of predicate models and their mappings. PreMon supports the representation of predicate models such as PropBank, NomBank, VerbNet and FrameNet. PreMon provides an OWL ontology for modelling semantic classes (i.e., verb classes, rolesets, or frames) with their roles, mappings across different predicate models and to ontological resources, and annotations, based on OntoLex-Lemon. For this, the model extends lemon by introducing the classes pmo:SemanticClass and pmo:SemanticRole. pmo:SemanticClass homogeneously represents the semantic classes from the various predicate models. Mappings are explicitly represented as individuals of class pmo:Mapping, and can be seen as sets of (or n-ary relations between) either (i) pmo:Conceptualizations, (ii) pmo:SemanticClasses, or (iii) pmo:SemanticRoles, with role mappings anchored to conceptualization or class mappings via the property pmo:semRoleMapping. Structurally, a pmo:Conceptualization can be seen as the reification of the ontolex:evokes relation between ontolex:LexicalEntry and ontolex:LexicalConcept. Semantically, it can be seen as a very specific intensional concept (among many, in case of polysemy) evoked by a single ontolex:LexicalEntry, which can be generalized to an ontolex:LexicalConcept when multiple entries are considered, but with a possible loss of information that prevents precise alignments from being represented. Besides the core PreMon vocabulary, there are extensions to represent predicate models in FrameNet, PropBank and VerbNet. (Pret-a-LLOD D5.1 report, 2020, p.14)
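The structural reading of pmo:Conceptualization as a reified ontolex:evokes link can be sketched with plain subject-predicate-object tuples. This is a minimal sketch under assumptions: the `ex:` identifiers and the linking properties `ex:entry` and `ex:concept` are hypothetical placeholders, not PreMon terms, and the function name `reify_evokes` is invented for illustration.

```python
def reify_evokes(entry, concept, node):
    """Replace a direct ontolex:evokes link by an intermediate node.

    The returned triples type the node as a pmo:Conceptualization and
    attach the entry and the concept to it.
    """
    return [
        (node, "rdf:type", "pmo:Conceptualization"),
        (node, "ex:entry", entry),      # hypothetical linking property
        (node, "ex:concept", concept),  # hypothetical linking property
    ]


# A direct link between a lexical entry and a lexical concept ...
direct = ("ex:abduco", "ontolex:evokes", "ex:remove_concept")

# ... and its reified counterpart, which can itself take part in mappings.
reified = reify_evokes(direct[0], direct[2], "ex:abduco_remove")

for triple in reified:
    print(triple)
```

The benefit of the intermediate node is that it can be the subject or object of further statements, for example membership in a pmo:Mapping, which a bare ontolex:evokes triple cannot be.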

Framester

Framester (Gangemi et al., 2016) is a linked data resource that acts as a hub between FrameNet, WordNet, VerbNet, BabelNet, DBpedia, Yago, DOLCE-Zero, as well as other resources. Framester is not only a strongly connected knowledge graph, but also applies a rigorous formal treatment to Fillmore’s frame semantics, enabling full-fledged OWL querying and reasoning on a large frame-based knowledge graph. Following frame semantics, which is a development of case grammar and relates linguistic semantics to encyclopaedic knowledge, Framester describes the frame evoked by a single word. The underlying idea is to formalize the semantic frame of encyclopaedic meaning that is evoked or activated by a word and related to the specific concept to which the word refers. Words are not only the expression of individual concepts, but also the description of a certain perspective from which the frame is viewed. (Pret-a-LLOD D5.1 report, 2020, p.14)

Framester core maps WordNet, BabelNet, VerbNet and FrameNet, expanding them to other linguistic resources transitively. It features a subsumption hierarchy of semantic roles, namely frame elements and generic roles on top of frame-specific roles. The core schema for Framester can be found at: https://w3id.org/framester/schema/. Framester has been released in version 3.0. Framester can be queried via a SPARQL endpoint and also features a Word-Frame Disambiguation API. (Pret-a-LLOD D5.1 report, 2020, p.14)
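What transitive expansion of mappings amounts to can be sketched as a reachability computation over direct links. The identifiers below are placeholders, not actual Framester data, and the function name `reachable` is hypothetical; Framester itself relies on OWL and SPARQL rather than on code like this.

```python
from collections import deque

# Placeholder direct links between entries of different resources.
links = {
    "wn:A": ["fn:B"],
    "fn:B": ["vn:C"],
    "vn:C": [],
}


def reachable(start, links):
    """Return every entry linked to `start` directly or through other entries."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for target in links.get(node, ()):
            if target not in seen and target != start:
                seen.add(target)
                queue.append(target)
    return seen


print(sorted(reachable("wn:A", links)))
```

Here "wn:A" reaches "vn:C" although no direct link between the two was stated, which is the effect the transitive expansion is meant to provide.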

Rich Event Ontology (REO)

The Rich Event Ontology (Brown et al., 2017) provides an independent conceptual backbone to unify existing semantic role labeling (SRL) schemas and augment them with event-to-event causal and temporal relations. By unifying the FrameNet, VerbNet, Automatic Content Extraction, and Rich Entities, Relations and Events resources, the ontology serves as a shared hub for the disparate annotation schemas and therefore enables the combination of SRL training data into a larger, more diverse corpus. By adding temporal and causal relational information not found in any of the independent resources, the ontology facilitates reasoning on and across documents, revealing relationships between events that come together in temporal and causal chains to build more complex scenarios. (Pret-a-LLOD D5.1 report, 2020, p.15)

Intended as a resource for a wide range of tasks, the Rich Event Ontology (REO) has been designed to encompass both meta-level concepts in its upper level and many general domains in its mid level. REO has been implemented in OWL, which allows for easy extension with more detailed, domain-specific ontologies. The main reference ontology now encompasses 161 classes and 553 axioms. Including the lexical resource ontologies and the linking models (described in detail in sections 2.5 and 2.6) in these counts brings the totals to 3,065 classes and 60,531 axioms, as well as 16,005 individuals representing the vocabulary (unique lemmas) of event denotations. (Pret-a-LLOD D5.1 report, 2020, p.15)

References

- Mel’čuk, I. (1997). Vers une linguistique Sens-Texte. Leçon inaugurale. Paris: Collège de France, 78 p.

- Mel’čuk, I. (1996). Lexical functions: A tool for the description of lexical relations in the lexicon. In L. Wanner (ed.): Lexical Functions in Lexicography and Natural Language Processing, pp. 37-102. Amsterdam/Philadelphia: Benjamins.

- Pret-a-LLOD D5.1 Report on Vocabularies for Interoperable Language Resources and Services; Author(s): Christian Chiarcos, Philipp Cimiano, Julia Bosque-Gil, Thierry Declerck, Christian Fäth, Jorge Gracia, Maxim Ionov, John McCrae, Elena Montiel-Ponsoda, Maria Pia di Buono, Roser Saurí, Fernando Bobillo, Mohammad Fazleh Elahi; Date: 2020-01-25; see https://cordis.europa.eu/project/id/825182/results