Syntax and Semantics Module
This page is for the code and documentation of this module; please use the discussion page to discuss all modelling issues.
Code
This document is in Manchester OWL Syntax. Please refer to this guide for how to write the ontology in this syntax.
Prefix: owl: <http://www.w3.org/2002/07/owl#>
Prefix: rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
Prefix: xsd: <http://www.w3.org/2001/XMLSchema#>
Prefix: rdfs: <http://www.w3.org/2000/01/rdf-schema#>
Prefix: skos: <http://www.w3.org/2004/02/skos/core#>
Prefix: ontolex: <http://www.w3.org/ns/ontolex#>
Prefix: synsem: <http://www.w3.org/ns/ontolex-synsem#>

Class: synsem:Frame
    Annotations: rdfs:comment "Represents a syntactic usage of a lexical entry"@eng
    DisjointWith: ontolex:LexicalConcept, ontolex:Lexicon, ontolex:Form, ontolex:LexicalSense, ontolex:LexicalEntry

Class: synsem:Argument
    Annotations: rdfs:comment "Represents an argument used in a syntactic or semantic frame"@eng
    DisjointWith: ontolex:LexicalConcept, ontolex:Lexicon, ontolex:Form, ontolex:LexicalSense, ontolex:LexicalEntry

ObjectProperty: synsem:synBehavior
    Annotations: rdfs:comment "Indicates the frame, which represents the syntactic behavior of a lexical entry"@en
    Domain: ontolex:LexicalEntry
    Range: synsem:Frame

ObjectProperty: synsem:synArg
    Annotations: rdfs:comment "Indicates that an argument participates in a syntactic frame"@en
    Domain: synsem:Frame
    Range: synsem:Argument

ObjectProperty: synsem:semArg
    Annotations: rdfs:comment "Indicates that an argument participates in a semantic frame"@en
    Domain: ontolex:LexicalSense
    Range: synsem:Argument

ObjectProperty: synsem:subjOfProp
    Annotations: rdfs:comment "Indicates that an argument is the subject (domain) of the property referred to by this sense"@en
    SubPropertyOf: synsem:semArg

ObjectProperty: synsem:objOfProp
    Annotations: rdfs:comment "Indicates that an argument is the object (range) of the property referred to by this sense"@en
    SubPropertyOf: synsem:semArg

ObjectProperty: synsem:isA
    Annotations: rdfs:comment "Indicates that an argument is an instance (subject of rdf:type triple) of the class referred to by this sense"@en
    SubPropertyOf: synsem:semArg

ObjectProperty: synsem:linguisticProperty
    Annotations: rdfs:comment "Super-property of all linguistic properties used to describe elements of the lexicon"@en
Documentation
Syntactic Frames
The primary units of this module are the frame and its arguments, which may be used to describe the syntactic behavior of a lexical entry. For example, we may describe a verb such as "love" as having a transitive behavior as follows:
<> a ontolex-lemon:LexicalEntry ;
    ontolex-lemon:canonicalForm <#CanonicalForm> ;
    ontolex-synsem:synBehavior <#TransitiveFrame> .
<#CanonicalForm> ontolex-lemon:writtenRep "love"@eng .
A transitive frame may be said to have two arguments as follows:
<#TransitiveFrame> ontolex-synsem:synArg <#arg0> , <#arg1> .
Sub-properties of synArg may be introduced to describe specific syntactic roles by use of data category properties, e.g.,
<#TransitiveFrame> dc:subject <#arg0> ;
    dc:directObject <#arg1> .
dc:subject rdfs:subPropertyOf ontolex-synsem:synArg .
dc:directObject rdfs:subPropertyOf ontolex-synsem:synArg .
Semantic Frames
Semantic frames in this module are considered to be equivalent to lexical senses in the whole model. As such, arguments may be added to a sense using the semArg property, e.g.,
<> a ontolex-lemon:LexicalEntry ;
    ontolex-lemon:sense <#sense1> .
<#sense1> a ontolex-lemon:LexicalSense ;
    ontolex-synsem:semArg <#arg0>, <#arg1> .
There are three properties used to indicate the exact role that the semantic argument takes in a semantic frame: subjOfProp, objOfProp and isA. For example:
Ontology
ontology:isFatherOf a owl:ObjectProperty ;
    rdfs:domain ontology:Father .
ontology:Father a owl:Class .
Lexicon
<> a ontolex-lemon:LexicalEntry ;
    ontolex-lemon:sense <#sense1>, <#sense2> .
<#sense1> a ontolex-lemon:LexicalSense ;
    ontolex-lemon:reference ontology:isFatherOf ;
    ontolex-synsem:subjOfProp <#arg0> ;
    ontolex-synsem:objOfProp <#arg1> .
<#sense2> a ontolex-lemon:LexicalSense ;
    ontolex-lemon:reference ontology:Father ;
    ontolex-synsem:isA <#arg0> .
This indicates that the semantic frames are
arg0 isFatherOf arg1
and
arg0 rdf:type Father
Mapping syntactic and semantic frames
Correspondence between a syntactic and semantic frame is simply stated by using the same argument URIs for the arguments of both a syntactic and semantic frame. For example
<> a ontolex-lemon:LexicalEntry ;
    ontolex-synsem:synBehavior <#TransitiveFrame> ;
    ontolex-lemon:sense <#Sense1> ;
    ontolex-lemon:canonicalForm <#CanonicalForm> .
<#CanonicalForm> ontolex-lemon:writtenRep "know"@eng .
<#TransitiveFrame> dc:subject <#arg0> ;
    dc:directObject <#arg1> .
<#Sense1> ontolex-lemon:reference foaf:knows ;
    ontolex-synsem:subjOfProp <#arg0> ;
    ontolex-synsem:objOfProp <#arg1> .
Here, we state that the syntactic frame
arg0 knows arg1
corresponds to the semantic frame
arg0 foaf:knows arg1
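The effect of sharing argument URIs can be sketched in a few lines of Python. This is only an illustration, not part of the module: triples are modelled as plain tuples, the short strings stand in for the URIs of the "know" example, and the helper names objects and interpret are hypothetical.

```python
# Triples are modelled as (subject, predicate, object) tuples; the short
# names stand in for the URIs used in the "know" example (an assumption
# made purely for readability).
LEXICON = {
    ("know", "synBehavior", "TransitiveFrame"),
    ("know", "sense", "Sense1"),
    ("TransitiveFrame", "dc:subject", "arg0"),
    ("TransitiveFrame", "dc:directObject", "arg1"),
    ("Sense1", "reference", "foaf:knows"),
    ("Sense1", "subjOfProp", "arg0"),
    ("Sense1", "objOfProp", "arg1"),
}


def objects(graph, subject, predicate):
    """Return every o such that (subject, predicate, o) is in the graph."""
    return [o for (s, p, o) in graph if s == subject and p == predicate]


def interpret(graph, entry, fillers):
    """Turn syntactic role fillers into a triple via shared argument nodes.

    fillers maps a syntactic role property (e.g. "dc:subject") to the
    entity found in that position of a parsed sentence.
    """
    frame = objects(graph, entry, "synBehavior")[0]
    sense = objects(graph, entry, "sense")[0]
    # Bind each argument node to the entity filling its syntactic role.
    binding = {}
    for role, entity in fillers.items():
        for arg in objects(graph, frame, role):
            binding[arg] = entity
    # The same argument nodes say which entity is subject and which is object.
    subj = binding[objects(graph, sense, "subjOfProp")[0]]
    obj = binding[objects(graph, sense, "objOfProp")[0]]
    prop = objects(graph, sense, "reference")[0]
    return (subj, prop, obj)
```

For a sentence such as "John knows Mary", calling interpret(LEXICON, "know", {"dc:subject": ":John", "dc:directObject": ":Mary"}) yields the triple (":John", "foaf:knows", ":Mary"), because arg0 and arg1 are shared between the syntactic frame and the sense.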
Linguistic Properties
This module introduces a 'linguistic property' property that is intended to indicate a linguistic annotation on an element, for example
<> a ontolex-lemon:LexicalEntry ;
    ontolex-synsem:linguisticProperty isocat:DC-1333 ; # noun
    ontolex-lemon:canonicalForm <#CanonicalForm> .
<#CanonicalForm> ontolex-lemon:writtenRep "cat"@eng .
Subproperties of this may be introduced to show particular kinds of annotations
<> a ontolex-lemon:LexicalEntry ;
    dc:partOfSpeech isocat:DC-1333 ; # noun
    ontolex-lemon:canonicalForm <#CanonicalForm> .
<#CanonicalForm> ontolex-lemon:writtenRep "cat"@eng .
dc:partOfSpeech rdfs:subPropertyOf ontolex-synsem:linguisticProperty .
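Declaring a sub-property means that an RDFS-aware consumer can still find the annotation through linguisticProperty. The following Python sketch applies the standard RDFS sub-property rule to a toy graph; the tuple representation, the subject name "cat" and the function name infer_super_properties are assumptions made for illustration.

```python
def infer_super_properties(triples):
    """Apply the RDFS subPropertyOf rule until no new triples appear.

    If (p rdfs:subPropertyOf q) and (s p o) hold, then (s q o) is added.
    """
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        sub_props = [(s, o) for (s, p, o) in inferred if p == "rdfs:subPropertyOf"]
        for (s, p, o) in list(inferred):
            for (sub, sup) in sub_props:
                if p == sub and (s, sup, o) not in inferred:
                    inferred.add((s, sup, o))
                    changed = True
    return inferred


# Toy graph: "cat" stands in for the lexical entry of the example.
GRAPH = {
    ("cat", "dc:partOfSpeech", "isocat:DC-1333"),
    ("dc:partOfSpeech", "rdfs:subPropertyOf", "ontolex-synsem:linguisticProperty"),
}
```

After inference the graph also contains ("cat", "ontolex-synsem:linguisticProperty", "isocat:DC-1333"), so a query over the super-property finds every annotation regardless of which specific sub-property was used.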
Term Decomposition
A key part of the description of syntax in the Lemon OntoLex model is the description of how a multi-word lexical entry can be related to each of its component words. There are three principal ways this can be done
- Sub-terms, which are links between a multi-word term and its components. These are commonly found in terminologies and do not generally describe which word or words the component actually refers to.
- Tokenization, which represents each of the words of a term, the order in which these words occur and optionally some inflectional information.
- Phrase Structure, which represents a term by means of a graph (usually a phrase structure tree or dependency graph).
Term decomposition in Monnet Lemon
Term decomposition in Monnet Lemon was handled by using three different mechanisms
Sub-terms in Monnet Lemon
Sub-terms were indicated by a simple lexical entry to lexical entry relation (actually defined in LexInfo 2), as follows:
:AfricanSwineFlu a lemon:LexicalEntry ; lexinfo:subterm :African , :SwineFlu .
This method is very simple; however, it does not capture which words are actually used in the decomposition.
Tokenization in Monnet Lemon
Tokenization was represented by means of a decomposition property pointing to an RDF list of components, which then referred to the lexical entries by means of element properties:
:AfricanSwineFlu a lemon:LexicalEntry ;
    lemon:decomposition ( [ lemon:element :African ]
                          [ lemon:element :Swine ]
                          [ lemon:element :Flu ] ) .
This captures the order and the individual words of a decomposition well; however, it involves the creation of many blank nodes and cannot easily be queried with SPARQL.
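To make the cost of the list encoding concrete, the sketch below expands the RDF list above into its underlying rdf:first/rdf:rest triples and walks it cell by cell. It is a plain-Python illustration only: the blank node labels (_:l1, _:c1, ...) and the helper names one and decomposition are hypothetical.

```python
RDF_FIRST, RDF_REST, RDF_NIL = "rdf:first", "rdf:rest", "rdf:nil"

# The list from the example expanded into triples: three list cells
# (_:l1.._:l3) plus three component nodes (_:c1.._:c3), six blank nodes in all.
GRAPH = {
    (":AfricanSwineFlu", "lemon:decomposition", "_:l1"),
    ("_:l1", RDF_FIRST, "_:c1"), ("_:l1", RDF_REST, "_:l2"),
    ("_:l2", RDF_FIRST, "_:c2"), ("_:l2", RDF_REST, "_:l3"),
    ("_:l3", RDF_FIRST, "_:c3"), ("_:l3", RDF_REST, RDF_NIL),
    ("_:c1", "lemon:element", ":African"),
    ("_:c2", "lemon:element", ":Swine"),
    ("_:c3", "lemon:element", ":Flu"),
}


def one(graph, subject, predicate):
    """Return the single object of (subject, predicate, ?o)."""
    matches = [o for (s, p, o) in graph if s == subject and p == predicate]
    if len(matches) != 1:
        raise ValueError(f"expected one value for {subject} {predicate}")
    return matches[0]


def decomposition(graph, entry):
    """Walk the linked list cell by cell to recover the ordered elements."""
    elements = []
    cell = one(graph, entry, "lemon:decomposition")
    while cell != RDF_NIL:
        component = one(graph, cell, RDF_FIRST)
        elements.append(one(graph, component, "lemon:element"))
        cell = one(graph, cell, RDF_REST)
    return elements
```

The position of a word is only implicit in how many rdf:rest links precede it, so a consumer has to traverse the whole chain to learn the order.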
Phrase structure in Monnet Lemon
Phrase structure was captured by means of a separate node graph that referred to the decomposition (so that the order could be maintained).
:AfricanSwineFlu a lemon:LexicalEntry ;
    lemon:decomposition ( :AfricanSwineFlu#Component_African
                          :AfricanSwineFlu#Component_Swine
                          :AfricanSwineFlu#Component_Flu ) ;
    lemon:phraseRoot [
        lemon:leaf :AfricanSwineFlu#Component_African ;
        lemon:edge [
            lemon:leaf :AfricanSwineFlu#Component_Swine ;
            lemon:edge [ lemon:leaf :AfricanSwineFlu#Component_Flu ]
        ]
    ] .
This method is heavyweight and involves the definition of much extra vocabulary.
Term Decomposition in OntoLex (John's Proposal)
I propose we handle term decomposition by introducing the following class and properties:
- A Component class that captures both the usage of Component and Node in Monnet Lemon
- A transitive property constituent that links either Lexical Entries or Components to Components
- A property identifies that links Components to Lexical Entries
- A property subterm equal to the chain constituent o identifies
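The interaction between the transitive constituent property and the chain constituent o identifies can be sketched as follows. This is a minimal Python illustration under stated assumptions: relations are sets of pairs, the component names (c_African, c_SwineFlu, ...) are hypothetical, and the nesting of the SwineFlu component is invented for the example.

```python
def transitive_closure(pairs):
    """Return the transitive closure of a binary relation given as pairs."""
    closure = set(pairs)
    while True:
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if new <= closure:
            return closure
        closure |= new


def subterms(constituent, identifies):
    """Compose transitive 'constituent' with 'identifies' (the property chain)."""
    reach = transitive_closure(constituent)
    return {(x, z) for (x, y) in reach for (y2, z) in identifies if y == y2}


# Hypothetical component structure for the running example.
CONSTITUENT = {
    (":AfricanSwineFlu", "c_African"),
    (":AfricanSwineFlu", "c_SwineFlu"),
    ("c_SwineFlu", "c_Swine"),
    ("c_SwineFlu", "c_Flu"),
}
IDENTIFIES = {
    ("c_African", ":African"),
    ("c_SwineFlu", ":SwineFlu"),
    ("c_Swine", ":Swine"),
    ("c_Flu", ":Flu"),
}
```

Under these assumptions the derived subterms of :AfricanSwineFlu are :African, :SwineFlu, :Swine and :Flu. Because constituent is transitive, the chain also reaches entries identified by nested components, which is worth keeping in mind when deciding how fine-grained the component structure should be.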
Sub-terms can then use the same light-weight syntax as Monnet Lemon
(In Red)
:AfricanSwineFlu a ontolex:LexicalEntry ; syntax:subterm :African, :SwineFlu .
To enable better SPARQL querying I propose we use RDF Seq in place of an RDF list to represent tokenization
(In Green)
:AfricanSwineFlu a ontolex:LexicalEntry ;
    rdf:_1 [ syntax:identifies :African ] ;
    rdf:_2 [ syntax:identifies :Swine ] ;
    rdf:_3 [ syntax:identifies :Flu ] .
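With container membership properties the position of each token is carried by the predicate itself, so the order can be recovered by a flat lookup and a sort. The Python sketch below illustrates this on a tuple-based toy graph; the blank node labels and the function name tokens are hypothetical.

```python
import re

# Matches container membership properties rdf:_1, rdf:_2, ...
MEMBERSHIP = re.compile(r"^rdf:_([1-9][0-9]*)$")

# The tokenization above as triples; deliberately listed out of order.
GRAPH = {
    (":AfricanSwineFlu", "rdf:_2", "_:c2"),
    (":AfricanSwineFlu", "rdf:_1", "_:c1"),
    (":AfricanSwineFlu", "rdf:_3", "_:c3"),
    ("_:c1", "syntax:identifies", ":African"),
    ("_:c2", "syntax:identifies", ":Swine"),
    ("_:c3", "syntax:identifies", ":Flu"),
}


def tokens(graph, entry):
    """Return the lexical entries of a term's tokens in their stated order."""
    identifies = {s: o for (s, p, o) in graph if p == "syntax:identifies"}
    numbered = []
    for (s, p, o) in graph:
        match = MEMBERSHIP.match(p)
        if s == entry and match:
            # Sort on the integer index so that rdf:_10 follows rdf:_9.
            numbered.append((int(match.group(1)), identifies[o]))
    return [word for (_, word) in sorted(numbered)]
```

Each token is reachable in a single step from the entry, and a specific position (for example the second word) can be requested directly through its membership property without traversing the others.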
Finally, we represent phrase structure by first defining a component that identifies the lexical entry
(In Blue)
:AfricanSwineFlu#Root syntax:identifies :AfricanSwineFlu .
We then hang the decomposition from this component; subclasses of Component can be used to give the phrase type.
(In Blue)
:AfricanSwineFlu#Root syntax:constituent
    [ a :ADJ ; syntax:identifies :African ] ,
    [ a :NP ;
      syntax:constituent
        [ a :ADJ ; syntax:identifies :Swine ] ,
        [ a :NP ;
          syntax:constituent [ a :NN ; syntax:identifies :Flu ] ] ] .
Morphology
TODO