Syntax and Semantics Module

From Ontology-Lexica Community Group

This page is for the code and documentation of this module. Please use the discussion page to discuss all modelling issues.

Code

This document is in Manchester OWL Syntax. Please refer to this guide for how to write the ontology in this syntax.

Prefix: owl: <http://www.w3.org/2002/07/owl#>
Prefix: rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
Prefix: xsd: <http://www.w3.org/2001/XMLSchema#>
Prefix: rdfs: <http://www.w3.org/2000/01/rdf-schema#>
Prefix: skos: <http://www.w3.org/2004/02/skos/core#>
Prefix: ontolex: <http://www.w3.org/ns/ontolex#>
Prefix: synsem: <http://www.w3.org/ns/ontolex-synsem#>

Class: synsem:Frame

    Annotations:
        rdfs:comment "Represents a syntactic usage of a lexical entry"@en

    DisjointWith:
        ontolex:LexicalConcept, ontolex:Lexicon, ontolex:Form, ontolex:LexicalSense, ontolex:LexicalEntry

Class: synsem:Argument

    Annotations:
        rdfs:comment "Represents an argument used in a syntactic or semantic frame"@en

    DisjointWith:
        ontolex:LexicalConcept, ontolex:Lexicon, ontolex:Form, ontolex:LexicalSense, ontolex:LexicalEntry

ObjectProperty: synsem:synBehavior

    Annotations:
        rdfs:comment "Indicates the frame, which represents the syntactic behavior of a lexical entry"@en

    Domain:
        ontolex:LexicalEntry

    Range:
        synsem:Frame

ObjectProperty: synsem:synArg

    Annotations:
        rdfs:comment "Indicates that an argument participates in a syntactic frame"@en

    Domain:
        synsem:Frame

    Range:
        synsem:Argument

ObjectProperty: synsem:semArg

    Annotations:
        rdfs:comment "Indicates that an argument participates in a semantic frame"@en

    Domain:
        ontolex:LexicalSense

    Range:
        synsem:Argument

ObjectProperty: synsem:subjOfProp

    Annotations:
        rdfs:comment "Indicates that an argument is the subject (domain) of the property referred to by this sense"@en

    SubPropertyOf:
        synsem:semArg

ObjectProperty: synsem:objOfProp

    Annotations:
        rdfs:comment "Indicates that an argument is the object (range) of the property referred to by this sense"@en

    SubPropertyOf:
        synsem:semArg

ObjectProperty: synsem:isA

    Annotations:
        rdfs:comment "Indicates that an argument is an instance (subject of an rdf:type triple) of the class referred to by this sense"@en

    SubPropertyOf:
        synsem:semArg

ObjectProperty: synsem:linguisticProperty

    Annotations:
        rdfs:comment "Super-property of all linguistic properties used to describe elements of the lexicon"@en

Documentation

Syntactic Frames

The primary units of this module are the frame and its arguments, which may be used to describe the syntactic behavior of a lexical entry. For example, we may describe a verb such as "love" as having transitive behavior as follows:

<> a ontolex-lemon:LexicalEntry ;
  ontolex-lemon:canonicalForm <#CanonicalForm> ;
  ontolex-synsem:synBehavior <#TransitiveFrame> .

<#CanonicalForm> ontolex-lemon:writtenRep "love"@eng .

A transitive frame may be said to have two arguments as follows:

<#TransitiveFrame> ontolex-synsem:synArg <#arg0> , <#arg1> .

Sub-properties of synArg may be introduced to describe specific syntactic roles by use of data category properties, e.g.,

<#TransitiveFrame> 
   dc:subject <#arg0> ;
   dc:directObject <#arg1> .

dc:subject rdfs:subPropertyOf ontolex-synsem:synArg .
dc:directObject rdfs:subPropertyOf ontolex-synsem:synArg .
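
The effect of the two rdfs:subPropertyOf statements can be illustrated with a small Python sketch: every triple stated with a sub-property also holds for synArg. This only mimics the RDFS sub-property rule over triples modelled as tuples of strings. The helper names entail_subproperties and syn_args are invented for this example and are not part of the model.

```python
# Triples are modelled as (subject, predicate, object) tuples of plain strings.
TRIPLES = {
    ("#TransitiveFrame", "dc:subject", "#arg0"),
    ("#TransitiveFrame", "dc:directObject", "#arg1"),
    ("dc:subject", "rdfs:subPropertyOf", "ontolex-synsem:synArg"),
    ("dc:directObject", "rdfs:subPropertyOf", "ontolex-synsem:synArg"),
}


def entail_subproperties(triples):
    """Apply the RDFS sub-property rule until no new triples appear.

    Hypothetical helper: if (p subPropertyOf q) and (s p o), add (s q o).
    """
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        sub_props = [(s, o) for (s, p, o) in inferred
                     if p == "rdfs:subPropertyOf"]
        for sub, sup in sub_props:
            for (s, p, o) in list(inferred):
                if p == sub and (s, sup, o) not in inferred:
                    inferred.add((s, sup, o))
                    changed = True
    return inferred


def syn_args(triples, frame):
    """Return the sorted synArg values of a frame after entailment."""
    closed = entail_subproperties(triples)
    return sorted(o for (s, p, o) in closed
                  if s == frame and p == "ontolex-synsem:synArg")


print(syn_args(TRIPLES, "#TransitiveFrame"))  # ['#arg0', '#arg1']
```

A query for synArg therefore finds both arguments even though the data only states the more specific roles.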

Semantic Frames

Semantic frames in this module are considered to be equivalent to lexical senses in the whole model. As such, arguments may be added directly to a sense using the semArg property, e.g.,

<> a ontolex-lemon:LexicalEntry ;
   ontolex-lemon:sense <#sense1> .

<#sense1> a ontolex-lemon:LexicalSense ;
   ontolex-synsem:semArg <#arg0>, <#arg1> .

There are three properties used to indicate the exact role that a semantic argument takes in a semantic frame: subjOfProp, objOfProp and isA. For example:

Ontology

ontology:isFatherOf a owl:ObjectProperty ;
  rdfs:domain ontology:Father .
ontology:Father a owl:Class .

Lexicon

<> a ontolex-lemon:LexicalEntry ;
  ontolex-lemon:sense <#sense1>, <#sense2> .

<#sense1> a ontolex-lemon:LexicalSense ;
  ontolex-lemon:reference ontology:isFatherOf ;
  ontolex-synsem:subjOfProp <#arg0> ;
  ontolex-synsem:objOfProp <#arg1> .

<#sense2> a ontolex-lemon:LexicalSense ;
  ontolex-lemon:reference ontology:Father ;
  ontolex-synsem:isA <#arg0> .

This indicates that the semantic frames are:

 arg0 isFatherOf arg1

and

 arg0 rdf:type Father
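
How the two frames above follow from the sense descriptions can be sketched in Python. The senses are copied into plain dictionaries, and the function semantic_frame is a hypothetical helper written for this illustration. It assumes that a sense with isA describes class membership and that a sense with both subjOfProp and objOfProp describes a property.

```python
# Sense descriptions copied from the lexicon above, as plain dictionaries.
SENSES = {
    "#sense1": {
        "reference": "ontology:isFatherOf",
        "subjOfProp": "#arg0",
        "objOfProp": "#arg1",
    },
    "#sense2": {
        "reference": "ontology:Father",
        "isA": "#arg0",
    },
}


def semantic_frame(sense):
    """Return the (subject, predicate, object) pattern a sense stands for.

    Hypothetical helper, not part of the model.
    """
    if "isA" in sense:
        # Class reference: the argument is an instance of the class.
        return (sense["isA"], "rdf:type", sense["reference"])
    if "subjOfProp" in sense and "objOfProp" in sense:
        # Property reference: the arguments fill subject and object.
        return (sense["subjOfProp"], sense["reference"], sense["objOfProp"])
    raise ValueError("sense does not specify enough semantic arguments")


for name, sense in SENSES.items():
    print(name, semantic_frame(sense))
```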

Mapping syntactic and semantic frames

Correspondence between a syntactic and a semantic frame is stated simply by using the same argument URIs in both frames. For example:

<> a ontolex-lemon:LexicalEntry ;
  ontolex-synsem:synBehavior <#TransitiveFrame> ;
  ontolex-lemon:sense <#Sense1> ;
  ontolex-lemon:canonicalForm <#CanonicalForm> .

<#CanonicalForm> ontolex-lemon:writtenRep "know"@eng .

<#TransitiveFrame> 
  dc:subject <#arg0> ;
  dc:directObject <#arg1> .

<#Sense1>
  ontolex-lemon:reference foaf:knows ;
  ontolex-synsem:subjOfProp <#arg0> ;
  ontolex-synsem:objOfProp <#arg1> .

Here, we state that the syntactic frame

 arg0 knows arg1

corresponds to the semantic frame

 arg0 foaf:knows arg1
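
Because the correspondence rests only on shared argument URIs, it can be recovered with a simple join. The following Python sketch pairs each syntactic role with the semantic role that points at the same argument. The triples are those of the "know" example, and align_roles is a hypothetical helper, not something defined by the module.

```python
# Triples from the "know" example, as (subject, predicate, object) tuples.
TRIPLES = [
    ("#TransitiveFrame", "dc:subject", "#arg0"),
    ("#TransitiveFrame", "dc:directObject", "#arg1"),
    ("#Sense1", "ontolex-lemon:reference", "foaf:knows"),
    ("#Sense1", "ontolex-synsem:subjOfProp", "#arg0"),
    ("#Sense1", "ontolex-synsem:objOfProp", "#arg1"),
]

# The three semantic role properties of this module.
SEMANTIC_ROLES = {
    "ontolex-synsem:subjOfProp",
    "ontolex-synsem:objOfProp",
    "ontolex-synsem:isA",
}


def align_roles(triples, frame, sense):
    """Map each syntactic role of `frame` to the semantic role of `sense`
    that refers to the same argument URI (hypothetical helper)."""
    semantic = {o: p for (s, p, o) in triples
                if s == sense and p in SEMANTIC_ROLES}
    return {p: semantic[o] for (s, p, o) in triples
            if s == frame and o in semantic}


print(align_roles(TRIPLES, "#TransitiveFrame", "#Sense1"))
```

For the example data this pairs dc:subject with subjOfProp and dc:directObject with objOfProp.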

Linguistic Properties

This module introduces a 'linguistic property' property that is intended to indicate a linguistic annotation on an element, for example:

<> a ontolex-lemon:LexicalEntry ;
  ontolex-synsem:linguisticProperty isocat:DC-1333 ; # noun
  ontolex-lemon:canonicalForm <#CanonicalForm> .

<#CanonicalForm> ontolex-lemon:writtenRep "cat"@eng .

Subproperties of this may be introduced to show particular kinds of annotations:

<> a ontolex-lemon:LexicalEntry ;
  dc:partOfSpeech isocat:DC-1333 ; # noun
  ontolex-lemon:canonicalForm <#CanonicalForm> .

<#CanonicalForm> ontolex-lemon:writtenRep "cat"@eng .

dc:partOfSpeech rdfs:subPropertyOf ontolex-synsem:linguisticProperty .

Term Decomposition

A key part of the description of syntax in the Lemon OntoLex model is the description of how a multi-word lexical entry can be related to each of its component words. There are three principal ways this can be done:

  • Sub-terms, which are links between a multi-word term and its components. These are commonly found in terminologies and do not generally describe which word or words the component actually refers to.
  • Tokenization, which represents each of the words of a term, the order in which these words occur and optionally some inflectional information.
  • Phrase Structure, which represents a term by means of a graph (usually a phrase structure tree or dependency graph).

Term decomposition in Monnet Lemon

Term decomposition in Monnet Lemon was handled by three different mechanisms, one for each of the approaches above.

Sub-terms in Monnet Lemon

Sub-terms were indicated by a simple lexical entry to lexical entry relation (actually defined in LexInfo 2), as follows:

:AfricanSwineFlu a lemon:LexicalEntry ;
  lexinfo:subterm :African , :SwineFlu .

This method is very simple, but it does not capture which words are actually used in the decomposition.

Tokenization in Monnet Lemon

Tokenization was represented by means of a decomposition property pointing to an RDF list of components, which then referred to the lexical entries by means of element properties:

:AfricanSwineFlu a lemon:LexicalEntry ;
  lemon:decomposition (
    [ lemon:element :African ]
    [ lemon:element :Swine ]
    [ lemon:element :Flu ]
  ) .

This captures the order and the individual words of a decomposition well, but it involves the creation of many blank nodes and is difficult to query with SPARQL.

Phrase structure in Monnet Lemon

Phrase structure was captured by means of a separate node graph that referred to the decomposition (so that the order could be maintained).

:AfricanSwineFlu a lemon:LexicalEntry ;
  lemon:decomposition (
    :AfricanSwineFlu#Component_African
    :AfricanSwineFlu#Component_Swine
    :AfricanSwineFlu#Component_Flu
  ) ;
  lemon:phraseRoot [
    lemon:leaf :AfricanSwineFlu#Component_African ;
    lemon:edge [
      lemon:leaf :AfricanSwineFlu#Component_Swine ;
      lemon:edge [
        lemon:leaf :AfricanSwineFlu#Component_Flu
      ]
    ]
  ] .

This method is heavyweight and involves the definition of much extra vocabulary.

Term Decomposition in OntoLex (John's Proposal)

I propose we handle term decomposition by introducing the following class and properties:

  • A Component class that captures both the usage of Component and Node in Monnet Lemon
  • A transitive property constituent that links either Lexical Entries or Components to Components
  • A property identifies that links Components to Lexical Entries
  • A property subterm equal to the chain constituent o identifies

Sub-terms can then use the same light-weight syntax as Monnet Lemon:

(In Red)

:AfricanSwineFlu a ontolex:LexicalEntry ;
  syntax:subterm :African, :SwineFlu .
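
The definition of subterm as the chain "constituent o identifies", with constituent transitive, can be checked with a small Python sketch. The component graph below is an assumption made for this illustration: the blank node names (_:c1 to _:c4) and the way the entry is split are invented, and the helper functions are hypothetical.

```python
# Assumed constituent links: node -> list of direct components.
CONSTITUENT = {
    ":AfricanSwineFlu": ["_:c1", "_:c2"],
    "_:c2": ["_:c3", "_:c4"],
}

# Assumed identifies links: component -> lexical entry.
IDENTIFIES = {
    "_:c1": ":African",
    "_:c2": ":SwineFlu",
    "_:c3": ":Swine",
    "_:c4": ":Flu",
}


def all_constituents(node, constituent):
    """Return every component reachable from `node` via constituent links
    (the transitive closure), in depth-first order."""
    found = []
    seen = set()
    stack = list(reversed(constituent.get(node, [])))
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        found.append(current)
        stack.extend(reversed(constituent.get(current, [])))
    return found


def subterms(entry, constituent, identifies):
    """Follow constituent (transitively), then identifies: the subterm chain."""
    return [identifies[c] for c in all_constituents(entry, constituent)
            if c in identifies]


print(subterms(":AfricanSwineFlu", CONSTITUENT, IDENTIFIES))
# [':African', ':SwineFlu', ':Swine', ':Flu']
```

Under these assumed links the chain also yields :Swine and :Flu, because transitivity makes nested components constituents of the whole entry as well.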

To enable better SPARQL querying, I propose we use an RDF Seq (numbered rdf:_1, rdf:_2, ... properties) in place of an RDF list to represent tokenization:

(In Green)

:AfricanSwineFlu a ontolex:LexicalEntry ;
  rdf:_1 [ syntax:identifies :African ] ;
  rdf:_2 [ syntax:identifies :Swine ] ;
  rdf:_3 [ syntax:identifies :Flu ] .
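
With numbered membership properties, the token order can be read directly from the property names. The Python sketch below shows one way to recover it. As a simplifying assumption, each blank node is collapsed to the lexical entry it identifies, and tokens_in_order is a hypothetical helper.

```python
import re

# Simplified triples: each rdf:_n property points straight at the
# identified lexical entry (the intermediate blank nodes are omitted).
TRIPLES = [
    (":AfricanSwineFlu", "rdf:_3", ":Flu"),
    (":AfricanSwineFlu", "rdf:_1", ":African"),
    (":AfricanSwineFlu", "rdf:_2", ":Swine"),
    (":AfricanSwineFlu", "rdf:type", "ontolex:LexicalEntry"),
]

# Container membership properties have the form rdf:_1, rdf:_2, ...
MEMBERSHIP = re.compile(r"^rdf:_([1-9][0-9]*)$")


def tokens_in_order(triples, entry):
    """Return the tokens of `entry` sorted by their rdf:_n index."""
    members = []
    for s, p, o in triples:
        match = MEMBERSHIP.match(p)
        if s == entry and match:
            # Compare indices as integers so that rdf:_10 sorts after rdf:_2.
            members.append((int(match.group(1)), o))
    return [o for _, o in sorted(members)]


print(tokens_in_order(TRIPLES, ":AfricanSwineFlu"))
# [':African', ':Swine', ':Flu']
```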

Finally, we represent phrase structure by first defining a component that identifies the lexical entry:

(In Blue)

:AfricanSwineFlu#Root syntax:identifies :AfricanSwineFlu .

We then hang the decomposition from this component. Subclasses of Component can be used to give the phrase type.

(In Blue)

:AfricanSwineFlu#Root syntax:constituent
  [ a :ADJ ; syntax:identifies :African ] ,
  [ a :NP ; syntax:constituent
    [ a :ADJ ; syntax:identifies :Swine ] ,
    [ a :NP ; syntax:constituent
      [ a :NN ; syntax:identifies :Flu ]
    ]
  ] .

Morphology

TODO