Lexicography
The final draft of the lexicog module specification can be found at https://jogracia.github.io/ontolex-lexicog/
This section was used to collaboratively collect and document the doubts and issues that emerged when working with lemon-ontolex for dictionaries. The issues fall into several types:
- T1- Best practices when applying the lemon core and existing modules (i.e., the best ways of modelling certain things with the existing "ingredients")
- T2- Detected limitations of the lemon core (i.e., things that cannot be easily modelled with the current "ingredients")
- T3- Missing entities to account for specific dictionary information
- T4- Missing categories in existing catalogues (e.g., LexInfo)
In the following, we detail the different issues.
Issues
I1: Lexical sense cannot be associated to a particular form of a lexical entry. [PENDING CONFIRMATION TO CLOSE] |
---|
Reported by: UPM on 2/2/2017, last updated on 25/07/2017 |
Type: T2 |
Related references: |
Description: How can we model a lexical sense that is directly associated with a particular form, in addition to a lexical entry? In Spanish, the word “cura” in its masculine form means “priest, pastor”, but in its feminine form it means “healing, cure”. The two share an etymology and, although they could be regarded as two different words, the RAE dictionary (and many others) presents them as one. This is also the case for "la cometa" (kite) and "el cometa" (comet). A case in which we are clearly dealing with one word rather than two is “margen” (bank, edge), which can be masculine or feminine, but when it refers to the "margin of a document" only the masculine form is applicable. In English, the sense of manner referring to polite social behavior requires the plural form manners, and we find a similar case with refreshment when it denotes snacks and beverages. In German, the multiple senses of Wasser, Band, and Wort occur with different plural forms but belong to the same dictionary entry. |
Discussion:
02/02/2017
Philipp> I see them ("cura") as two lexical entries.
Julia> How do we stick to the original source? What about the case of "margen"?
Ph> According to the way we define LEs, all the lexical entries assume the same meaning; it is not possible to attach two meanings to different forms.
Paul> The core idea of lemon is to group senses but split at the morphosyntactic level.
Ph> Are we over-fitting to a particular way of doing things (standard dictionaries)?
Jorge> In "margen" the meaning is the same, but it is the context that forces one form or the other.
Decision: in "cura" it seems that two LEs are needed, but in "margen" it is not so clear; let's continue thinking about this. |
Solution:
Some solutions for this issue are proposed at https://www.w3.org/community/ontolex/wiki/Lexicography#TELCO_05.09.2017. Update 11.12.17: There seems to be agreement on using Option 3' (DictionaryEntry and DictionaryEntry Components):
:manner_n_sp a ontolex:LexicalEntry ;   # with both singular and plural forms
    ontolex:lexicalForm :manner_n_sp_form_s ;
    ontolex:lexicalForm :manner_n_sp_form_p .
:manner_n_sp_form_s ontolex:writtenRep "manner"@en ;
    lexinfo:number lexinfo:singular .
:manner_n_sp_form_p ontolex:writtenRep "manners"@en ;
    lexinfo:number lexinfo:plural .
:manner_n_sp_sense a ontolex:LexicalSense ;
    ontolex:isLexicalizedSenseOf :manner_n_sp_sense_lc .
:manner_n_sp_sense_lc a ontolex:LexicalConcept ;
    skos:definition "A way in which a thing is done or happens"@en .
:manner_n_sp ontolex:sense :manner_n_sp_sense .
:manner_n_mn a ontolex:LexicalEntry ;   # with only singular form (mass noun)
    ontolex:lexicalForm :manner_n_mn_form .
:manner_n_mn_form ontolex:writtenRep "manner"@en ;
    lexinfo:number lexinfo:massNoun .
:manner_n_mn_sense a ontolex:LexicalSense ;
    ontolex:isLexicalizedSenseOf :manner_n_mn_sense_lc .
:manner_n_mn_sense_lc a ontolex:LexicalConcept ;
    skos:definition "A semantic category of adverbs and adverbials which answer the question ‘how?’"@en .
:manner_n_mn ontolex:sense :manner_n_mn_sense .
:manner_n_p a ontolex:LexicalEntry ;   # with only plural form
    ontolex:lexicalForm :manner_n_p_form .
:manner_n_p_form ontolex:writtenRep "manners"@en ;
    lexinfo:number lexinfo:plural .
:manner_n_p_sense a ontolex:LexicalSense ;
    ontolex:isLexicalizedSenseOf :manner_n_p_sense_lc .
:manner_n_p_sense_lc a ontolex:LexicalConcept ;
    skos:definition "Polite or well-bred social behaviour"@en .
:manner_n_p ontolex:sense :manner_n_p_sense .
:manner_n a lex:DictionaryEntry ;   # described in one single entry
    rdf:_1 :manner_n_sp_sense_comp ;
    rdf:_2 :manner_n_mn_sense_comp ;
    rdf:_3 :manner_n_p_sense_comp .
:manner_n_sp_sense_comp a lexgr:DictionaryEntryComponent ;
    lexgr:describes :manner_n_sp_sense .
:manner_n_mn_sense_comp a lexgr:DictionaryEntryComponent ;
    lexgr:describes :manner_n_mn_sense .
:manner_n_p_sense_comp a lexgr:DictionaryEntryComponent ;
    lexgr:describes :manner_n_p_sense .
:exampleDictionary a lexgr:Dictionary ;
    lime:language "en" ;
    lex:dictEntry :manner_n . |
I2: same sense has different conditions on register or geography that affect the grammatical gender of the entry [NOT ENOUGH DATA] |
---|
Reported by: UPM on 2/2/2017, last updated on 25/07/2017 |
Type: T2 |
Related references: [1] |
Description: The same sense has different conditions on register or geography that affect the grammatical gender of the entry: in Spanish “el mar/la mar” (the sea) have the same meaning but different register (e.g., feminine in poetry or nautical domain, or masculine when metaphorically referred as "a mass of something"). Both forms have the same sense but different usages (context). |
Discussion:
02/02/2017
Jorge> Similar to "margen".
Paul> These two "mar" would be associated to two different ontological concepts.
Julia> People are using ontolex without the ontology, starting from the lexical resource.
Fahad> I faced a similar problem. There is the option of using TEI if you want to model dictionaries.
Ilan> TEI is far from ideal.
Ilan> In our LD conversion of dictionaries, we wanted everything revertible.
Jorge> What about adding a property, not in the core model but in a lexicography module, to account for this register-dependent relation between sense and form? |
Solution:
Update: Julia> I have not been able to find other examples of this phenomenon besides el/la mar. I suggest we skip this issue until we get more data. |
I3: Headwords that can take different parts-of-speech [PENDING CONFIRMATION TO CLOSE] |
---|
Reported by: UPM and U Bielefeld on 25/07/2017 |
Type: T2 |
Related references: |
Description: Some headwords can take different parts-of-speech but many dictionaries group all of their senses under the same dictionary entry due to etymological reasons. E.g. poison, bread, water (noun and verb), wash (noun and verb), Sp. lento ‘slow, slowly’ (adjective and adverb), Sp. bajo ‘short, quietly, bass, under’ (adjective, adverb, noun, preposition), Sp. general (adj, noun), etc. The limitation does not concern the access to this information (we can get lexical entries that share lemma but not part-of-speech) but the representation of entries that share etymology and lemma, and are conceived as single dictionary entries in the source data. |
Discussion: |
Solution:
Some solutions for this issue are proposed at https://www.w3.org/community/ontolex/wiki/Lexicography#TELCO_26.07.2017 and https://www.w3.org/community/ontolex/wiki/Lexicography#TELCO_05.09.2017. Update 11.12.17: There seems to be agreement on using Option 3' (DictionaryEntry and DictionaryEntry Components). Example for the headword Sp. bajo, which in Spanish takes four parts of speech:
:bajo_adj a ontolex:LexicalEntry ;
    lexinfo:partOfSpeech lexinfo:adjective .
:bajo_adj_sense_1 a ontolex:LexicalSense ;
    ontolex:isLexicalizedSenseOf :bajo_adj_sense_1_lc .
:bajo_adj_sense_1_lc a ontolex:LexicalConcept ;
    skos:definition "De poca altura"@es .   # "short" sense
:bajo_adj ontolex:sense :bajo_adj_sense_1 .
:bajo_n a ontolex:LexicalEntry ;
    lexinfo:partOfSpeech lexinfo:noun .
:bajo_n_sense_1 a ontolex:LexicalSense ;
    ontolex:isLexicalizedSenseOf :bajo_n_sense_1_lc .
:bajo_n_sense_1_lc a ontolex:LexicalConcept ;
    skos:definition "Piso bajo de las casas que tienen dos o más"@es .   # "ground floor" sense
:bajo_n ontolex:sense :bajo_n_sense_1 .
:bajo_adv a ontolex:LexicalEntry ;
    lexinfo:partOfSpeech lexinfo:adverb .
:bajo_adv_sense_1 a ontolex:LexicalSense ;
    ontolex:isLexicalizedSenseOf :bajo_adv_sense_1_lc .
:bajo_adv_sense_1_lc a ontolex:LexicalConcept ;
    skos:definition "A poca altura"@es .   # "low" sense
:bajo_adv ontolex:sense :bajo_adv_sense_1 .
:bajo_prep a ontolex:LexicalEntry ;
    lexinfo:partOfSpeech lexinfo:preposition .
:bajo_prep_sense_1 a ontolex:LexicalSense ;
    ontolex:isLexicalizedSenseOf :bajo_prep_sense_1_lc .
:bajo_prep_sense_1_lc a ontolex:LexicalConcept ;
    skos:definition "debajo de (‖ en lugar inferior a)"@es .   # "under" sense
:bajo_prep ontolex:sense :bajo_prep_sense_1 .
:bajo a lex:DictionaryEntry ;
    rdf:_1 :bajo_adj_sense_1_comp ;
    rdf:_2 :bajo_n_sense_1_comp ;
    rdf:_3 :bajo_adv_sense_1_comp ;
    rdf:_4 :bajo_prep_sense_1_comp .
:example_Dictionary a lex:Dictionary ;
    lime:language "es" ;
    lex:dictEntry :bajo .
:bajo_adj_sense_1_comp a lex:DictionaryEntryComponent ;
    lex:describes :bajo_adj_sense_1 .
:bajo_n_sense_1_comp a lex:DictionaryEntryComponent ;
    lex:describes :bajo_n_sense_1 .
:bajo_adv_sense_1_comp a lex:DictionaryEntryComponent ;
    lex:describes :bajo_adv_sense_1 .
:bajo_prep_sense_1_comp a lex:DictionaryEntryComponent ;
    lex:describes :bajo_prep_sense_1 . |
I4: usage examples [PENDING CONFIRMATION TO CLOSE] |
---|
Reported by: UPM on 2/2/2017, last updated 25/07/2017 |
Type: T1, T2 |
Related references: [1][4] |
Description: There is currently no way of representing sense examples and their translations. Some dictionaries group together examples that are translations of each other, for instance: prep. besides. Def. in addition to. Example: Besides my brother, I have three other siblings. Translations: En plus de mon frère, j'ai trois autres frères et sœurs (fr), 弟に加えて、私には他に3人の兄弟がいる。 (ja), Além do meu irmão, tenho três outras irmãs (br-po), Foruten broren min har jeg tre andre søsken (no), Förutom min bror har jag tre andra syskon (sv) [K Dictionaries Global Series English Dictionary] |
Discussion: |
Solution:
During the telco on 26.07.2017 the class UsageExample and the property lexgr:example were briefly discussed. Update: if examples do not include translations and there is no further information about them, the property skos:example with a literal suffices (instead of instantiating a UsageExample):
:verre_n_lc_3 a ontolex:LexicalConcept ;
    skos:definition "Vase à boire, fait de verre ; ce qu'il contient"@fr ;
    skos:example "un verre de vin."@fr .
If examples do include translations, comments, sources, etc., then the classes UsageExample and ExampleCluster would be suitable:
:besides_prep a ontolex:LexicalEntry ;
    lexinfo:partOfSpeech lexinfo:preposition ;
    ontolex:sense :besides_sense_1 .
:besides_sense_1 a ontolex:LexicalSense ;
    ontolex:isLexicalizedSenseOf :besides_sense_1_lc .
:besides_sense_1_lc a ontolex:LexicalConcept ;
    lex:example :besides_sense_1_lc_example .
:besides_sense_1_lc_example a lex:UsageExample ;
    rdf:value "Besides my brother, I have three other siblings."@en ;
    lex:exampleTranslation :en-plus-de_prep_sense_1_lc_example .
:en-plus-de_prep a ontolex:LexicalEntry ;
    lexinfo:partOfSpeech lexinfo:preposition ;
    ontolex:sense :en-plus-de_prep_sense_1 .
:en-plus-de_prep_sense_1 a ontolex:LexicalSense ;
    ontolex:isLexicalizedSenseOf :en-plus-de_prep_sense_1_lc .
:en-plus-de_prep_sense_1_lc a ontolex:LexicalConcept ;
    lex:example :en-plus-de_prep_sense_1_lc_example .
:en-plus-de_prep_sense_1_lc_example a lex:UsageExample ;
    rdf:value "En plus de mon frère, j'ai trois autres frères et sœurs"@fr .
:に加えて_post a ontolex:LexicalEntry ;
    lexinfo:partOfSpeech lexinfo:postposition ;
    ontolex:sense :に加えて_post_sense_1 .
:に加えて_post_sense_1 a ontolex:LexicalSense ;
    ontolex:isLexicalizedSenseOf :に加えて_post_sense_1_lc .
:に加えて_post_sense_1_lc a ontolex:LexicalConcept ;
    lex:example :に加えて_post_sense_1_lc_example .
:に加えて_post_sense_1_lc_example a lex:UsageExample ;
    rdf:value "弟に加えて、私には他に3人の兄弟がいる。"@ja .
:além-do_prep a ontolex:LexicalEntry ;
    lexinfo:partOfSpeech lexinfo:preposition ;
    ontolex:sense :além-do_prep_sense_1 .
:além-do_prep_sense_1 a ontolex:LexicalSense ;
    ontolex:isLexicalizedSenseOf :além-do_prep_sense_1_lc .
:além-do_prep_sense_1_lc a ontolex:LexicalConcept ;
    lex:example :além-do_prep_sense_1_lc_example .
:além-do_prep_sense_1_lc_example a lex:UsageExample ;
    rdf:value "Além do meu irmão, tenho três outras irmãs"@pt-BR .
:besides_prep_sense_1_ec a lex:ExampleCluster ;
    lex:containsExample :besides_sense_1_lc_example ,
        :en-plus-de_prep_sense_1_lc_example ,
        :に加えて_post_sense_1_lc_example ,
        :além-do_prep_sense_1_lc_example . |
I5: geographical information [TO DISCUSS] |
---|
Reported by: UPM on 2/2/2017 |
Type: T1,T4 |
Related references: [1] |
Description: Geographical information (not geographical variants!). One of the lexical senses of a lexical entry might be applicable to a specific place only (e.g, “manejar”@es, which is “to drive” in Venezuela and “to handle” in Spain). Is it appropriate to use "ontolex:usage" in this case? How to inform that this usage is of "geographical" type? Add new external registry of categories of usage?
The Lexvo Ontology includes the property "lexvo:usedIn", with range lexvo:GeographicRegion, described as "The property of a language or writing system being used somewhat extensively in a particular geographical region at some point in time".
Some examples:
English: OED: trainer
Sense 1: A person who trains people or animals
Sense 2: [British] A soft sports shoe suitable for casual wear.
Option 1: Treat this as a case of issue 1, where a sense goes only with a form (here, the form would have writtenRep "trainer"@en-GB). We would have different lexical entries, each with its sense, and the DictionaryEntry "trainer" would point to all senses through the use of DictionaryEntryComponents, following an order.
Option 2: Keep both senses under the same lexical entry and use ontolex:usage, lexvo:usedIn + Geographical Region, or a new property to state that Sense 2 is only used in British English.
The same issue arises with football, vest, mate, etc. A mix of this issue and issue 1 is found in OED braces (the "suspenders" sense takes the plural form and is used in British English).
Spanish: RAE: bolsa (bag, sack)
Sense 1: Especie de talega o saco de tela u otro material, que sirve para llevar o guardar algo. (bag or sack of cloth or other material used to carry or to store something.)
Sense 17: [C. Rica, El Salv., Guat., Hond., Méx. y Nic.] Bolsillo de las prendas de vestir. (A pocket in garments)
Sense 17 is attested in six countries, whereas sense 1 carries no indication and is therefore taken to be attested in all Spanish-speaking countries. |
Discussion:
Ph> ontolex:usage is OK although its range is fuzzy. Another option is subsenses |
Solution:
|
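Option 2 for "trainer" could be sketched roughly as follows. This is only an illustration under stated assumptions, not an agreed solution: the `:` namespace and all identifiers (`:trainer_n`, `:trainer_n_sense_2`, etc.) are hypothetical, and the regional restriction is recorded as a free-text note attached through ontolex:usage.

```turtle
@prefix :        <http://example.org/dict#> .   # hypothetical namespace for this sketch
@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .
@prefix lexinfo: <http://www.lexinfo.net/ontology/2.0/lexinfo#> .
@prefix skos:    <http://www.w3.org/2004/02/skos/core#> .
@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

# One lexical entry carrying both senses (Option 2).
:trainer_n a ontolex:LexicalEntry ;
    lexinfo:partOfSpeech lexinfo:noun ;
    ontolex:sense :trainer_n_sense_1 , :trainer_n_sense_2 .

:trainer_n_sense_1 a ontolex:LexicalSense ;
    ontolex:isLexicalizedSenseOf :trainer_n_lc_1 .
:trainer_n_lc_1 a ontolex:LexicalConcept ;
    skos:definition "A person who trains people or animals"@en .

:trainer_n_sense_2 a ontolex:LexicalSense ;
    ontolex:isLexicalizedSenseOf :trainer_n_lc_2 ;
    # Assumption: the geographical restriction is given as a plain usage note.
    ontolex:usage [ rdf:value "British"@en ] .
:trainer_n_lc_2 a ontolex:LexicalConcept ;
    skos:definition "A soft sports shoe suitable for casual wear"@en .
```

A blank node with rdf:value keeps the label as free text. It does not by itself state that the usage is of a "geographical" type, which is the open question raised in the description.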
I6: order of senses [CLOSED] |
---|
Reported by: UPM on 2/2/2017 |
Type: T3 |
Related references: [5] |
Description: Order of senses for a lexical entry. Different dictionaries apply different criteria (frequency of use, date of origin, …). That ordering should be searchable and retrievable. |
Discussion:
(collected points from e-mail threads, telcos, etc. related to sense order):
A. Stellato> this can be a metadata information
A. Tchechmedjiev> In collaborative dictionaries such as Wiktionary (and the Ontolex version in dbnary), the order of senses is typically conditioned by the prevalence of the use of the sense from the perspective of the contributors (that is, a purely cognitive criterion unrelated to any objective measure of actual use of the senses). As such, there is no knowable concrete reason for the ordering, yet it is still valuable. DBNary encodes this both in the URI scheme and as a custom property of LexicalSense.
M. Alper> information like what headwords are grouped in dictionary entries, sense ordering etc. could be expressed more transparently (e.g. if sense ordering is based on frequency of use, add frequency of use data directly) so that this data is accessible and will be used consistently across data sets (which fits with the aspiration of linked data)
F. Khan> for legacy dictionaries it's not always clear why senses have a certain order, or it involves too much of an interpretation
T. Knorr> I would not spend much time on in what sequence the entries should be displayed/stored. If any, I would focus on the ability of the computer to select based on context.
J. Gracia> this translates into a model that supports ANY order criteria, being agnostic of the particular sequence in which the entries should be displayed/stored, which is left to the dictionary creator's choice or to an NLP service that dynamically assigns it according to the context. In principle we should focus on the modelling side |
Solution:
There seems to be agreement on using option 3', which indicates the order of DictionaryEntryComponents. However, the ordering criterion is not yet addressed; this option merely serves to represent a sequence.
:verre_n_dict a lexgr:DictionaryEntry ;
    rdf:_1 :verre_n_sense_1_comp ;
    rdf:_2 :verre_n_sense_2_comp ;
    rdf:_3 :verre_n_sense_3_comp .
:verre_n_sense_1_comp a lexgr:DictionaryEntryComponent ;
    lexgr:describes :verre_n_sense_1 .
:verre_n_sense_2_comp a lexgr:DictionaryEntryComponent ;
    lexgr:describes :verre_n_sense_2 .
:verre_n_sense_3_comp a lexgr:DictionaryEntryComponent ;
    lexgr:describes :verre_n_sense_3 . |
I7: onomasiological ordering: relating senses to concepts [TO DISCUSS] | ||||||||
---|---|---|---|---|---|---|---|---|
Reported by: Ssstolk on 25/7/2017 | ||||||||
Type: T3 | ||||||||
Related references: [8] | ||||||||
Description: Onomasiological ordering of senses can be looser than ontolex:isLexicalizedSenseOf. That property indicates that a sense explicitly lexicalizes a certain concept (and therefore all other senses that do so are considered near-synonyms), but a great number of thesauri use a looser approach by simply stating a sense is (categorized) within a certain concept. Much like a word that indicates "a basket to carry seeds" would not lexicalize the concept "Farming" but could definitely be said to be within its domain. See example taken from [8]:
A second illustration is taken from Stolk's dissertation. It depicts sample content from the HTOED, in which lexical senses found in the same category are (near-)synonymous. | ||||||||
Discussion: In order to truly express how senses are categorized according to topical systems in thesauri, additional terminology is required beyond what OntoLex currently offers. Properties from other vocabularies that might fill the gap, such as dcterms:subject, tend to be too generic to be able to infer further knowledge from topical systems of thesauri. Moreover, the relation between such properties and ontolex:isLexicalizedSenseOf is not evident. As such, the required terminology is best captured in an update of the OntoLex vocabulary itself or in a lexicographical module.
Suggested property :isSenseInConcept, which can relate a lexical sense to a semantic/lexical concept. By definition it would be a superproperty of ontolex:isLexicalizedSenseOf.
:isSenseInConcept a owl:ObjectProperty ;
    rdfs:label "is sense in concept"@en ;
    rdfs:comment "This property relates a lexical sense to a concept to which it onomasiologically belongs."@en ;
    rdfs:domain ontolex:LexicalSense ;
    rdfs:range ontolex:LexicalConcept .
ontolex:isLexicalizedSenseOf rdfs:subPropertyOf :isSenseInConcept . | ||||||||
Further discussion:
|
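As a usage illustration of the suggested property, the "basket to carry seeds" case from the description might look as follows. This is a sketch only: the namespace and the identifiers `:seed_basket_n`, `:seed_basket_n_sense` and `:farming_concept` are hypothetical, and :isSenseInConcept is the proposed property, not part of OntoLex.

```turtle
@prefix :        <http://example.org/thesaurus#> .   # hypothetical namespace for this sketch
@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .
@prefix skos:    <http://www.w3.org/2004/02/skos/core#> .

:farming_concept a ontolex:LexicalConcept ;
    skos:prefLabel "Farming"@en .

:seed_basket_n a ontolex:LexicalEntry ;
    ontolex:sense :seed_basket_n_sense .

# The sense is categorized under "Farming" without lexicalizing that concept.
:seed_basket_n_sense a ontolex:LexicalSense ;
    :isSenseInConcept :farming_concept .
```

Because ontolex:isLexicalizedSenseOf would be declared a subproperty, a query over :isSenseInConcept with RDFS inference enabled would also return the senses that strictly lexicalize a concept.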
I8: onomasiological ordering: specifying conceptual levels [TO DISCUSS] |
---|
Reported by: Ssstolk on 25/7/2017 |
Type: T3 |
Related references: |
Description: In many onomasiological orderings, the concepts are divided over different levels. Classifications such as Linnaeus's already provide examples: classes are divided into orders, then families, genera, species, etc. In thesauri, similar distinctions are made to suggest a certain granularity or depth of the concepts in the overall categorization system. The vocabulary XKOS (an extension of SKOS) provides something like depth of concepts, but instead of a conceptual depth, it defines the notion of depth as the exact depth in the tree. However, quite a number of thesauri (such as the Thesaurus of Old English, the Historical Thesaurus of the Oxford English Dictionary, the Historical Thesaurus of Scots, etc.) do not have their conceptual levels coincide exactly with a specific depth of the tree. The only rule is that a concept in a branch can only branch out into concepts of equal or lower conceptual level. Two examples of the topical system of a thesaurus are depicted here (visualizations taken from dissertation chapters of S. Stolk): |
Discussion: Possible solution would be to add terminology for conceptual levels. Example using tentative lex: namespace below.
<lexicon> a ontolex:ConceptSet . <lexicon> lex:numberOfConceptualLevels 2 . <lexicon> lex:conceptualLevels ( <conceptualLevel1> <conceptualLevel2> ) . <conceptualLevel1> a lex:ConceptualLevel . <conceptualLevel1> skos:prefLabel "Main categories" . <conceptualLevel1> lex:conceptualDepth 1 . <conceptualLevel1> skos:member <someConcept_A> . <conceptualLevel2> a lex:ConceptualLevel . <conceptualLevel2> skos:prefLabel "Subgroups" . <conceptualLevel2> lex:conceptualDepth 2 . <conceptualLevel2> skos:member <someConcept_B> , <someConcept_C> . |
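The point that conceptual levels need not coincide with tree depth can be made concrete with a small sketch. It reuses the tentative lex: terms from the example above; the lex: namespace IRI is a placeholder, and the concept and level identifiers are hypothetical.

```turtle
@prefix :     <http://example.org/thesaurus#> .   # hypothetical namespace for this sketch
@prefix lex:  <http://example.org/lex#> .         # placeholder IRI for the tentative lex: namespace
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

:mainCategories a lex:ConceptualLevel ;
    lex:conceptualDepth 1 ;
    skos:member :concept_A , :concept_A1 .

:subgroups a lex:ConceptualLevel ;
    lex:conceptualDepth 2 ;
    skos:member :concept_A1a .

# Tree structure: A > A1 > A1a
:concept_A1  skos:broader :concept_A .    # tree depth 2, but still conceptual level 1
:concept_A1a skos:broader :concept_A1 .   # tree depth 3, conceptual level 2
```

Here :concept_A1 sits one step below :concept_A in the tree while remaining at the same conceptual level, which is the situation a purely depth-based notion cannot express.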
I9: Representation of complex forms, idioms, etc. and their relation to the lexical entry and/or lexical sense [CLOSED] |
---|
Reported by: UPM on the basis of the example provided by F. Frontini (Petit Larousse illustré, 1905 edition) |
Type: T1 |
Related references: |
Description: Most dictionaries include a set of complex forms related to the entry, usually compounds, idioms, collocations, etc. These sometimes are defined in relation to a particular sense (e.g. K Dictionaries), sometimes are encoded as sub-senses of a sense (OED, s. fool -- fool around), but very frequently they are just defined under the same dictionary entry after all senses have been presented. How should these forms be represented and linked to a) the main sense under which they are defined, if applicable or b) the main entry under which they are defined? (see discussion) |
Discussion:
In the case of the example Verre (glass) provided by F. Frontini, there are three complex forms related to it which are defined in that same dictionary entry: verre double (double glass), Maison de verre (lit. glass house, a house where there are no secrets) and Petit verre (a shot, alcoholic drink). In contrast to the K Dictionaries data, where complex forms were introduced at the sense level, in the P. Larousse example they all seem to refer to the dictionary entry as a whole and not to one of its senses in particular.
Possible approaches (based on UPM's work with K Dictionaries):
a) Complex forms related to the lexical entry: treat them as independent lexical entries and relate them to the lexical entry in which they are defined through the decomp module or through vartrans:lexRel.
b) Complex forms related specifically to a lexical sense of the entry in which they are defined: treat them as independent lexical entries and relate their sense to the appropriate sense of the main entry through vartrans:senseRel. Relate both lexical entries with vartrans:lexRel or the decomp module.
Problem: By "extracting" new lexical entries from the original dictionary entry (verre, verre double, Maison de verre, Petit verre) we are no longer encoding that only one of them corresponds to the lemma in the P. Larousse. In the work with K Dictionaries we created an element kd:Dictionary that gathered those lexical entries that were also dictionary entries in the source data, whereas the new ones, created on the fly during the conversion, belonged to a lime:Lexicon but not to a kd:Dictionary. |
Solution:
Update 11.12.17: Complex and/or related forms to the lemma are treated as LexicalEntr[ies] and linked to the LexicalEntry encoding the lemma through decomp:subterm. Additionally, the property decomp:constituent can be used to mark the components of the compound, if desired: :verre_double a ontolex:LexicalEntry ; decomp:subterm :verre_n ; ontolex:lexicalForm :verre_double_form ; ontolex:sense :verre_double_sense ; decomp:constituent :verre_comp, :double_comp; rdf:_1 :verre_comp; rdf:_2 :double_comp. :verre_comp a decomp:Component . :double_comp a decomp:Component . By adopting option 3', the entry "verre" is represented as a lexical entry *and* as a Dictionary Entry. If the related forms are not dictionary entries themselves in a given dictionary, they will only be treated as ontolex:LexicalEntries and not as lexgr:DictionaryEntries. This solves the problem mentioned above. :verre_n a ontolex:LexicalEntry . :petit_verre a ontolex:LexicalEntry. :verre_n_dict a lexgr:DictionaryEntry. |
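Approach b) from the discussion, where a complex form is tied to one particular sense of the main entry, is not illustrated above. A minimal sketch could look like this, assuming (hypothetically, and unlike the P. Larousse layout) that "petit verre" is attached to the "drinking glass" sense :verre_n_sense_3. The namespace and the identifier :petit_verre_sense are assumptions made for illustration.

```turtle
@prefix :         <http://example.org/dict#> .   # hypothetical namespace for this sketch
@prefix ontolex:  <http://www.w3.org/ns/lemon/ontolex#> .
@prefix decomp:   <http://www.w3.org/ns/lemon/decomp#> .
@prefix vartrans: <http://www.w3.org/ns/lemon/vartrans#> .

# Entry-level link, as in the solution above.
:petit_verre a ontolex:LexicalEntry ;
    decomp:subterm :verre_n ;
    ontolex:sense :petit_verre_sense .

# Sense-level link (approach b): the sense of the complex form is related
# to the specific sense of the main entry under which it is defined.
:petit_verre_sense a ontolex:LexicalSense ;
    vartrans:senseRel :verre_n_sense_3 .

:verre_n a ontolex:LexicalEntry ;
    ontolex:sense :verre_n_sense_3 .
:verre_n_sense_3 a ontolex:LexicalSense .
```

vartrans:senseRel is deliberately generic; a more specific subproperty could be used if the source dictionary states the kind of relation.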
I10: Encyclopedic information [CLOSED] |
---|
Reported by: UPM on the basis of the example provided by F. Frontini (Petit Larousse illustré, 1905 edition) |
Type: T1, T3? |
Related references: |
Description: The Petit Larousse illustré, 1905 edition, provides an encyclopedic definition along with the lexical ones. Do we want to represent this information as RDF? If so, how and where? |
Discussion:
Possible approach: If we encode the lexical definition at the level of the lexical concept with skos:definition, the encyclopedic definition could be included with a different property in order to distinguish both kinds of definitions. The information provided in this section of the dictionary entry can be used to create the ontological layer that we usually lack in the conversion of linguistic resources to linguistic linked data, and could later be linked to other encyclopedic resources such as BabelNet. |
Solution:
Update 11.12.17: If the dictionary includes encyclopedic definitions (or encyclopedic information in general), this can be represented using a skos:Concept and the property skos:definition:
:verre_n_concept a skos:Concept ;
    skos:definition "Le verre, dont l'invention est attribuée aux Phéniciens, est obtenu par la fusion dans des creusets (ou pots) d'un mélange de silice (sable) avec des sels de soude, de potasse (verre ordinaire) ou de plomb (cristal.)"@fr .
The lexical concepts that will be defined for each dictionary sense can point to this general concept through skos:related or skos:broader, for instance. The property ontolex:concept relates an ontological entity to a lexical concept that represents the corresponding meaning, but in a dictionary the lemma may be polysemous and we do not know which of the lexical concepts that it evokes represents the general/encyclopedic one. The properties skos:broader and skos:related do not make such strong assumptions.
:verre_n_lc_3 a ontolex:LexicalConcept ;
    skos:definition "Vase à boire, fait de verre ; ce qu'il contient"@fr ;
    skos:example "un verre de vin."@fr ;
    skos:related :verre_n_concept . |
I11: Dictionary Senses as Lexical Senses or as Lexical Concepts [CLOSED] |
---|
Reported by: UPM on the basis of the example provided by F. Frontini (Petit Larousse illustré, 1905 edition) |
Type: T1 |
Related references: |
Description: There is a discussion on whether dictionary senses should be modelled as lexical concepts or lexical senses: definitions in a dictionary are not logical, and there is nothing in the dictionary that leads us to assume an external ontology entity to which dictionary senses refer. If translations or lexico-semantic restrictions are to be represented, the lexical sense is the class used in those cases, but it requires a reference that we lack in a dictionary. |
Discussion:
See https://www.w3.org/community/ontolex/wiki/Teleconference,_2017.11.07,_14-15_pm_CET |
Solution:
In order to keep both lexical senses (compatibility with other lexica migrated to lemon, possibility to represent translations and lexico-semantic restrictions specific to the lexical entry) and lexical concepts (the definition is not logical, no ontology reference, definitions may have cross-lexical validity, compatibility with WordNets), the following modelling is suggested: LexicalEntry -- ontolex:sense -- LexicalSense -- isLexicalizedSenseOf -- LexicalConcept (with definition):
:verre_n a ontolex:LexicalEntry ;
    lexinfo:partOfSpeech lexinfo:noun ;
    lexinfo:gender lexinfo:masculine ;
    ontolex:lexicalForm :verre_n_form ;
    ontolex:sense :verre_n_sense_1 , :verre_n_sense_2 , :verre_n_sense_3 .
:verre_n_sense_1 a ontolex:LexicalSense ;
    ontolex:isLexicalizedSenseOf :verre_n_lc_1 .
:verre_n_lc_1 a ontolex:LexicalConcept ;
    skos:definition "Corps solide, transparent et fragile, produit de la fusion d'un sable siliceux mêlé de potasse ou de soude"@fr ;
    skos:example "le verre est très cassant."@fr ;
    skos:related :verre_n_concept . |
Discussion
This section presents in chronological order all examples, figures, suggestions and possible solutions that were discussed in relation to the above-mentioned issues.
Some comments by UPM post 2/2/17 telco
Issues 1 and 2 could be potentially solved with an explosion of lexical entries. For instance “folk”@en means “people in general” or “traditional music” (same etymology, PoS, morphological patterns).
Option 1 (lemon-compliant) only one lexical entry [1]
:folk_n a lemon:LexicalEntry ;
    lemon:lexicalForm :folk_n_form_s ;
    lemon:lexicalForm :folk_n_form_p .
:folk_n_form_s lemon:writtenRep "folk"@en ;
    lexinfo:number lexinfo:singular .
:folk_n_form_p lemon:writtenRep "folks"@en ;
    lexinfo:number lexinfo:plural .
:folk_n lemon:sense :folk_sense_people ;
    lemon:sense :folk_sense_music .
Problem: how to say that, when :folk_sense_music applies, the lexical entry can only take its singular form :folk_n_form_s?
Option 2 (ontolex-compliant) several lexical entries [2]
:folk_n_s a ontolex:LexicalEntry ;
    ontolex:lexicalForm :folk_n_form_s .
:folk_n_sp a ontolex:LexicalEntry ;
    ontolex:lexicalForm :folk_n_form_s ;
    ontolex:lexicalForm :folk_n_form_p .
:folk_n_form_s ontolex:writtenRep "folk"@en ;
    lexinfo:number lexinfo:singular .
:folk_n_form_p ontolex:writtenRep "folks"@en ;
    lexinfo:number lexinfo:plural .
:folk_n_s ontolex:sense :folk_sense_music .
:folk_n_sp ontolex:sense :folk_sense_people .
Problem 1 (minor): proliferation of lexical entries [imagine another case with only plural -> 3rd lexical entry; many more if a similar situation happens with other properties (e.g., gender) ]. Much information duplicated (etymology, etc.)
Problem 2 (major): the form is “aware” of the senses and the lexical entry is not independent of the senses (which is against the idea of decoupling lexical layer from conceptual layer)
REMARK: ontolex does not support option 1; lemon supports both -> lemon supports the “descriptive but not prescriptive” design principle better
Telco 26.07.2017
Proposed solution for Issues 1 and 3 (temporarily skipping Issue 2 until we find more examples):
A new class, Dictionary Entry, serves both to group lexical entries together and to attach any information shared by all of them, either by a) linking to the ontolex-compliant lexical entries extracted from the dictionary entry (first proposal by UPM) or b) pointing directly to the senses of those lexical entries (revised proposal after discussion with U Bielefeld). Mirroring the lime:Lexicon-LexicalEntry relation, we suggest a Dictionary-DictionaryEntry relation in order to distinguish lexical entries and lime:Lexic[a] from the original dictionary entry and the original dictionary resource (a new class Dictionary), which would in turn serve to record the provenance of each dictionary entry, dictionary versions, etc. Any lexical entry created during the conversion to LLD but not originally provided in the resource would then belong to a lime:Lexicon, but not to the instance of Dictionary representing that resource.
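The distinction between dictionary membership and lexicon membership might be sketched as follows, using variant b) (the dictionary entry points to senses). The lex: terms are the illustrative ones used on this page, the lex: namespace IRI is a placeholder, and the identifiers (:example_Lexicon, :verre_n_dict, :petit_verre, etc.) are hypothetical.

```turtle
@prefix :        <http://example.org/dict#> .   # hypothetical namespace for this sketch
@prefix lex:     <http://example.org/lex#> .    # placeholder IRI for the illustrative lex: namespace
@prefix ontolex: <http://www.w3.org/ns/lemon/ontolex#> .
@prefix lime:    <http://www.w3.org/ns/lemon/lime#> .

# The original resource and its entry, as found in the source dictionary.
:example_Dictionary a lex:Dictionary ;
    lime:language "fr" ;
    lex:dictEntry :verre_n_dict .
:verre_n_dict a lex:DictionaryEntry ;
    lex:dictSense :verre_n_sense_1 .

# The lexicon holds every lexical entry, including those created during conversion.
:example_Lexicon a lime:Lexicon ;
    lime:language "fr" ;
    lime:entry :verre_n , :petit_verre .

:verre_n a ontolex:LexicalEntry ;
    ontolex:sense :verre_n_sense_1 .
:verre_n_sense_1 a ontolex:LexicalSense .

# Created on the fly during conversion: in the lexicon, but with no DictionaryEntry of its own.
:petit_verre a ontolex:LexicalEntry .
```

Keeping the two containers separate means the source structure can be recovered from the lex:Dictionary alone, while the lime:Lexicon remains a plain OntoLex lexicon.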
Issue 1 (I1: Lexical sense cannot be associated to a particular form of a lexical entry. ) and lex:DictionaryEntry (namespace "lex" used for illustration):
E.g. manner in Oxford English Dictionary (skipping the sense-subsense hierarchy at this point):
Sense: A way in which a thing is done or happens
Sense: [mass noun] A semantic category of adverbs and adverbials which answer the question ‘how?’
Sense: (manners) Polite or well-bred social behaviour
:manner_n_sp a ontolex:LexicalEntry ;
    ontolex:lexicalForm :manner_n_sp_form_s ;
    ontolex:lexicalForm :manner_n_sp_form_p .
:manner_n_sp_form_s ontolex:writtenRep "manner"@en ;
    lexinfo:number lexinfo:singular .
:manner_n_sp_form_p ontolex:writtenRep "manners"@en ;
    lexinfo:number lexinfo:plural .
:manner_n_sp_sense a ontolex:LexicalSense .
:manner_n_sp_sense_lc a ontolex:LexicalConcept ;
    ontolex:lexicalizedSense :manner_n_sp_sense ;
    skos:definition "A way in which a thing is done or happens"@en .
:manner_n_sp ontolex:sense :manner_n_sp_sense .

:manner_n_mn a ontolex:LexicalEntry ;
    ontolex:lexicalForm :manner_n_mn_form .
:manner_n_mn_form ontolex:writtenRep "manner"@en ;
    lexinfo:number lexinfo:massNoun .
:manner_n_mn_sense a ontolex:LexicalSense .
:manner_n_mn_sense_lc a ontolex:LexicalConcept ;
    ontolex:lexicalizedSense :manner_n_mn_sense ;
    skos:definition "A semantic category of adverbs and adverbials which answer the question ‘how?’"@en .
:manner_n_mn ontolex:sense :manner_n_mn_sense .

:manner_n_p a ontolex:LexicalEntry ;
    ontolex:lexicalForm :manner_n_p_form .
:manner_n_p_form ontolex:writtenRep "manners"@en ;
    lexinfo:number lexinfo:plural .
:manner_n_p_sense a ontolex:LexicalSense .
:manner_n_p_sense_lc a ontolex:LexicalConcept ;
    ontolex:lexicalizedSense :manner_n_p_sense ;
    skos:definition "Polite or well-bred social behaviour"@en .
:manner_n_p ontolex:sense :manner_n_p_sense .

:manner_n a lex:DictionaryEntry ;
    lex:dictSense :manner_n_sp_sense, :manner_n_mn_sense, :manner_n_p_sense .

:example_Dictionary a lex:Dictionary ;
    lime:language "en" ;
    lex:dictEntry :manner_n .
Issue 3 (Headwords that can take different parts-of-speech) and lex:DictionaryEntry (namespace "lex" used for illustration):
E.g. bajo 'short, flat (adj.); ground floor, hem, bottom of pants, bass (n.); quietly, low (adv.); under, from (a viewpoint) (prep.)' in Diccionario de la Lengua Española:
Sense: adj. De poca altura. (short)
Sense: adj. Dicho del calzado: Que no tiene tacón o lo tiene de poca altura. (said of shoes: flat, low)
Sense: n. Piso bajo de las casas que tienen dos o más. (ground floor)
Sense: n. Dobladillo de la parte inferior de la ropa. (hem, bottom of pants)
Sense: adv. A poca altura. (low)
Sense: adv. En voz baja o que apenas se oiga. (quietly, in a way difficult to hear)
Sense: prep. debajo de (‖ en lugar inferior a). (under)
Sense: prep. Desde un enfoque u opinión. (from a viewpoint or an approach)
:bajo_adj a ontolex:LexicalEntry ;
    lexinfo:partOfSpeech lexinfo:adjective .
:bajo_adj_sense_1 a ontolex:LexicalSense .
:bajo_adj_sense_1_lc a ontolex:LexicalConcept ;
    ontolex:lexicalizedSense :bajo_adj_sense_1 ;
    skos:definition "De poca altura"@es .    # "short" sense
:bajo_adj_sense_2 a ontolex:LexicalSense .
:bajo_adj_sense_2_lc a ontolex:LexicalConcept ;
    ontolex:lexicalizedSense :bajo_adj_sense_2 ;
    skos:definition "Dicho del calzado: Que no tiene tacón o lo tiene de poca altura"@es .    # "flat" sense
:bajo_adj ontolex:sense :bajo_adj_sense_1, :bajo_adj_sense_2 .

:bajo_n a ontolex:LexicalEntry ;
    lexinfo:partOfSpeech lexinfo:noun .
:bajo_n_sense_1 a ontolex:LexicalSense .
:bajo_n_sense_1_lc a ontolex:LexicalConcept ;
    ontolex:lexicalizedSense :bajo_n_sense_1 ;
    skos:definition "Piso bajo de las casas que tienen dos o más"@es .    # "ground floor" sense
:bajo_n_sense_2 a ontolex:LexicalSense .
:bajo_n_sense_2_lc a ontolex:LexicalConcept ;
    ontolex:lexicalizedSense :bajo_n_sense_2 ;
    skos:definition "Dobladillo de la parte inferior de la ropa"@es .    # "hem, bottom of pants" sense
:bajo_n ontolex:sense :bajo_n_sense_1, :bajo_n_sense_2 .

:bajo_adv a ontolex:LexicalEntry ;
    lexinfo:partOfSpeech lexinfo:adverb .
:bajo_adv_sense_1 a ontolex:LexicalSense .
:bajo_adv_sense_1_lc a ontolex:LexicalConcept ;
    ontolex:lexicalizedSense :bajo_adv_sense_1 ;
    skos:definition "A poca altura"@es .    # "low" sense
:bajo_adv_sense_2 a ontolex:LexicalSense .
:bajo_adv_sense_2_lc a ontolex:LexicalConcept ;
    ontolex:lexicalizedSense :bajo_adv_sense_2 ;
    skos:definition "En voz baja o que apenas se oiga."@es .    # "quietly, in a way difficult to hear" sense
:bajo_adv ontolex:sense :bajo_adv_sense_1, :bajo_adv_sense_2 .

:bajo_prep a ontolex:LexicalEntry ;
    lexinfo:partOfSpeech lexinfo:preposition .
:bajo_prep_sense_1 a ontolex:LexicalSense .
:bajo_prep_sense_1_lc a ontolex:LexicalConcept ;
    ontolex:lexicalizedSense :bajo_prep_sense_1 ;
    skos:definition "debajo de (‖ en lugar inferior a)"@es .    # "under" sense
:bajo_prep_sense_2 a ontolex:LexicalSense .
:bajo_prep_sense_2_lc a ontolex:LexicalConcept ;
    ontolex:lexicalizedSense :bajo_prep_sense_2 ;
    skos:definition "Desde un enfoque u opinión"@es .    # "from a viewpoint or approach" sense
:bajo_prep ontolex:sense :bajo_prep_sense_1, :bajo_prep_sense_2 .

:bajo a lex:DictionaryEntry ;
    lex:dictSense :bajo_adj_sense_1, :bajo_adj_sense_2,
                  :bajo_n_sense_1, :bajo_n_sense_2,
                  :bajo_adv_sense_1, :bajo_adv_sense_2,
                  :bajo_prep_sense_1, :bajo_prep_sense_2 .

:example_Dictionary a lex:Dictionary ;
    lime:language "es" ;
    lex:dictEntry :bajo .
Proposed solution for Issue 4 (UPM and U Bielefeld):
A new class Usage Example and a property example, which links a Lexical Sense to its usage examples. Examples in different languages can each be represented as a Usage Example and linked through exampleTranslation (property name tbd). The new class Example Cluster would group together multiple UsageExamples that are translations of each other, to prevent a proliferation of exampleTranslation statements.
:besides_prep a ontolex:LexicalEntry ;
    lexinfo:partOfSpeech lexinfo:preposition ;
    ontolex:sense :besides_sense_1 .
:besides_sense_1 a ontolex:LexicalSense ;
    lex:example :besides_sense_1_example .
:besides_sense_1_example a lex:UsageExample ;
    rdf:value "Besides my brother, I have three other siblings."@en ;
    lex:exampleTranslation :en-plus-de_prep_sense_1_example .

:en-plus-de_prep a ontolex:LexicalEntry ;
    lexinfo:partOfSpeech lexinfo:preposition ;
    ontolex:sense :en-plus-de_prep_sense_1 .
:en-plus-de_prep_sense_1 a ontolex:LexicalSense ;
    lex:example :en-plus-de_prep_sense_1_example .
:en-plus-de_prep_sense_1_example a lex:UsageExample ;
    rdf:value "En plus de mon frère, j'ai trois autres frères et sœurs"@fr .

:加_post a ontolex:LexicalEntry ;
    lexinfo:partOfSpeech lexinfo:postposition ;
    ontolex:sense :加_post_sense_1 .
:加_post_sense_1 a ontolex:LexicalSense ;
    lex:example :加_post_sense_1_example .
:加_post_sense_1_example a lex:UsageExample ;
    rdf:value "弟に加えて、私には他に3人の兄弟がいる。"@ja .

:além-do_prep a ontolex:LexicalEntry ;
    lexinfo:partOfSpeech lexinfo:preposition ;
    ontolex:sense :além-do_prep_sense_1 .
:além-do_prep_sense_1 a ontolex:LexicalSense ;
    lex:example :além-do_prep_sense_1_example .
:além-do_prep_sense_1_example a lex:UsageExample ;
    rdf:value "Além do meu irmão, tenho três outras irmãs"@pt-BR .

:foruten_prep a ontolex:LexicalEntry ;
    lexinfo:partOfSpeech lexinfo:preposition ;
    ontolex:sense :foruten_prep_sense_1 .
:foruten_prep_sense_1 a ontolex:LexicalSense ;
    lex:example :foruten_prep_sense_1_example .
:foruten_prep_sense_1_example a lex:UsageExample ;
    rdf:value "Foruten broren min har jeg tre andre søsken"@no .

:besides_prep_sense_1_ec a lex:ExampleCluster ;
    lex:containsExample :besides_sense_1_example,
                        :en-plus-de_prep_sense_1_example,
                        :加_post_sense_1_example,
                        :além-do_prep_sense_1_example,
                        :foruten_prep_sense_1_example .
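The motivation for Example Cluster can be made concrete by counting statements. The sketch below is only an illustration in plain Python (triples as tuples, no RDF library). It assumes that exampleTranslation would be stated in both directions between every pair of examples, which the proposal does not actually specify; if the property were stated in one direction only, the pairwise count would be halved.

```python
from itertools import permutations

# The five usage examples from the listing above (same identifiers).
examples = [
    ":besides_sense_1_example",
    ":en-plus-de_prep_sense_1_example",
    ":加_post_sense_1_example",
    ":além-do_prep_sense_1_example",
    ":foruten_prep_sense_1_example",
]


def pairwise_translation_triples(items):
    """One lex:exampleTranslation triple per ordered pair of distinct examples.

    Assumption: the property is asserted in both directions.
    """
    return [(a, "lex:exampleTranslation", b) for a, b in permutations(items, 2)]


def cluster_triples(cluster, items):
    """One lex:containsExample triple per example, all attached to one cluster."""
    return [(cluster, "lex:containsExample", e) for e in items]


pairwise = pairwise_translation_triples(examples)
clustered = cluster_triples(":besides_prep_sense_1_ec", examples)

# Pairwise linking grows as n * (n - 1); the cluster grows as n.
print(len(pairwise), len(clustered))
```

For the five examples above this gives 20 pairwise statements against 5 cluster statements, and the gap widens with every language added.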
The following diagram summarizes the proposed solutions for Issues 1, 3 and 4. (Telco 26.07.2017)
Telco 05.09.2017
Context: Issues I1 (Lexical sense cannot be associated to a particular form of a lexical entry) and I3 (Headwords that can take different parts-of-speech).
Dictionaries differ in how they structure lexical entries that share a written representation but differ in part of speech. The most typical approaches are the following:
1) They are presented as independent dictionary entries (treated as homographs). Example dictionaries: K Dictionaries or Merriam Webster. E.g. fool in Merriam Webster.
2) One dictionary entry is clearly subdivided by part-of-speech, with each sub-entry containing a list of ordered senses. Example dictionary: Oxford English Dictionary. Sometimes, the order of senses is continuous through the whole list of sub-entries (Collins English Dictionary). Note that in the fool case these sub-entries with different part-of-speech but common etymology are still under the same dictionary entry, whereas the entry referring to the British dessert is an independent dictionary entry (homograph). E.g. fool in OED, fool in Collins Dictionary.
3) One single dictionary entry with no sub-division by part-of-speech. Example dictionaries: Collins Learner's Dictionary, Diccionario de la Lengua Española (RAE). Senses are presented in an ordered (long) list, listed by frequency, difficulty, etc. E.g. fool in Collins Learner's Dictionary.
From 26.07.2017's telco:
Introduce a new class grouping lexical entries/lexical senses. Either:
1. DictionaryEntry as a group of lexical senses (OPTION 1)
2. DictionaryEntry as a group of lexical entries (OPTION 2)
3. DictionaryEntry as a group of lexical senses and/or lexical entries (OPTION 3)
4. LexicalDictionaryEntry as a group of lexical entries and SenseDictionaryEntry as a group of lexical senses (OPTION 4)
ADDED: 3'. DictionaryEntry may be divided into DictionaryEntryComponents (OPTION 3', proposed by S. Stolk)
The following diagram shows the five proposals:
Option 1 [Deprecated]: DictionaryEntry as a group of lexical senses
This was the option discussed during the telco on 26.07.2017. The only change is that the property dictSense has been replaced by the more general describes, in analogy to the CSVW vocabulary, as proposed by S. Stolk.
Bajo example (issue 3)
:bajo a lex:DictionaryEntry ;
    lex:describes :bajo_adj_sense_1, :bajo_adj_sense_2,
                  :bajo_n_sense_1, :bajo_n_sense_2,
                  :bajo_adv_sense_1, :bajo_adv_sense_2,
                  :bajo_prep_sense_1, :bajo_prep_sense_2 .

:example_Dictionary a lex:Dictionary ;
    lime:language "es" ;
    lex:dictEntry :bajo .
Manner example (issue 1)
# "manner" used both in singular and plural
:manner_n_sp a ontolex:LexicalEntry ;
    ontolex:lexicalForm :manner_n_sp_form_s ;
    ontolex:lexicalForm :manner_n_sp_form_p .
:manner_n_sp_sense a ontolex:LexicalSense .
:manner_n_sp ontolex:sense :manner_n_sp_sense .
# [...]

# "manner" only used as mass noun
:manner_n_mn a ontolex:LexicalEntry ;
    ontolex:lexicalForm :manner_n_mn_form .
:manner_n_mn_sense a ontolex:LexicalSense .
:manner_n_mn ontolex:sense :manner_n_mn_sense .
# [...]

# "manner" only used in plural
:manner_n_p a ontolex:LexicalEntry ;
    ontolex:lexicalForm :manner_n_p_form .
:manner_n_p_sense a ontolex:LexicalSense .
:manner_n_p ontolex:sense :manner_n_p_sense .

:manner_n a lex:DictionaryEntry ;
    lex:describes :manner_n_sp_sense, :manner_n_mn_sense, :manner_n_p_sense .

:example_Dictionary a lex:Dictionary ;
    lime:language "en" ;
    lex:dictEntry :manner_n .
Fool example (issue 3) fool (from OED). Two homographs, the first one with three sub-entries.
# The first homograph of "fool"
:fool_1_n a ontolex:LexicalEntry ;
    lexinfo:partOfSpeech lexinfo:noun .
:fool_1_n_sense a ontolex:LexicalSense .
:fool_1_n_sense_lc a ontolex:LexicalConcept ;
    ontolex:lexicalizedSense :fool_1_n_sense ;
    skos:definition "A person who acts unwisely or imprudently; a silly person."@en .
:fool_1_n ontolex:sense :fool_1_n_sense .

:fool_1_v a ontolex:LexicalEntry ;
    lexinfo:partOfSpeech lexinfo:verb .
:fool_1_v_sense a ontolex:LexicalSense .
:fool_1_v_sense_lc a ontolex:LexicalConcept ;
    ontolex:lexicalizedSense :fool_1_v_sense ;
    skos:definition "Trick or deceive (someone); dupe."@en .
:fool_1_v ontolex:sense :fool_1_v_sense .

:fool_1_adj a ontolex:LexicalEntry ;
    lexinfo:partOfSpeech lexinfo:adjective .
:fool_1_adj_sense a ontolex:LexicalSense .
:fool_1_adj_sense_lc a ontolex:LexicalConcept ;
    ontolex:lexicalizedSense :fool_1_adj_sense ;
    skos:definition "Foolish; silly."@en .
:fool_1_adj ontolex:sense :fool_1_adj_sense .

# The second homograph of "fool"
:fool_2_n a ontolex:LexicalEntry ;
    lexinfo:partOfSpeech lexinfo:noun .
:fool_2_n_sense a ontolex:LexicalSense .
:fool_2_n_sense_lc a ontolex:LexicalConcept ;
    ontolex:lexicalizedSense :fool_2_n_sense ;
    skos:definition "A cold dessert made of pureed fruit mixed or served with cream or custard"@en .
:fool_2_n ontolex:sense :fool_2_n_sense .

:fool_1 a lex:DictionaryEntry ;
    lex:describes :fool_1_n_sense, :fool_1_v_sense, :fool_1_adj_sense .
:fool_2 a lex:DictionaryEntry ;
    lex:describes :fool_2_n_sense .

:exampleDictionary a lex:Dictionary ;
    lime:language "en" ;
    lex:dictEntry :fool_1, :fool_2 .

:exampleLexicon a lime:Lexicon ;
    lime:entry :fool_1_n, :fool_1_v, :fool_1_adj, :fool_2_n .
Option 2 [Deprecated]: DictionaryEntry as a group of lexical entries
Fool example (see previous section for the whole RDF representation of the entry "fool")
:fool_1 a lex:DictionaryEntry; lex:describes :fool_1_n, :fool_1_v, :fool_1_adj . :fool_2 a lex:DictionaryEntry; lex:describes :fool_2_n . :exampleDictionary a lex:Dictionary ; lime:language "en" ; lex:dictEntry :fool_1, :fool_2 . :exampleLexicon a lime:Lexicon ; lime:entry :fool_1_n, :fool_1_v, :fool_1_adj, :fool_2_n .
Manner example: manner (from OED)
:manner a lex:DictionaryEntry; lex:describes :manner_n_sp, :manner_n_mn, :manner_n_p. :example_Dictionary a lex:Dictionary ; lime:language "en" ; lex:dictEntry :manner.
PROBLEM (?) -> In the "bajo" and "manner" cases the dictionary does not explicitly divide the dictionary entry into several lexical entries, in contrast to the fool (n, adj, v) example. Nevertheless, we are treating both kinds of cases in the same way.
Option 3 [Deprecated]: DictionaryEntry as a group of lexical entries OR lexical senses
If DictionaryEntry is considered to describe both LexicalEntries and LexicalSenses, lex:describes (exact name TBD) would take both as its range. This would allow us to link DictionaryEntries directly to LexicalSenses in cases in which there is no sub-division of entries in the source dictionary (see examples Sp. 'bajo' and 'manner' above)...:
:bajo a lex:DictionaryEntry; lex:describes :bajo_adj_sense_1, :bajo_adj_sense_2, :bajo_n_sense_1, :bajo_n_sense_2, :bajo_adv_sense_1, :bajo_adv_sense_2, :bajo_prep_sense_1, :bajo_prep_sense_2.
:manner a lex:DictionaryEntry; lex:describes :manner_n_sp_sense, :manner_n_mn_sense, :manner_n_p_sense.
...and directly to LexicalEntries if there is a division by part-of-speech in the source dictionary:
:fool a lex:DictionaryEntry; lex:describes :fool_n, :fool_v, :fool_adj .
Option 4 [Deprecated]: LexicalDictionaryEntry as a group of lexical entries and SenseDictionaryEntry as a group of lexical senses
This option suggests two different elements to deal with each one of these cases (sense-based dictionary entries vs. lexical entry-based ones):
lex:SenseDictionaryEntry rdfs:subClassOf lex:DictionaryEntry .

:bajo a lex:SenseDictionaryEntry ;
    lex:describes :bajo_adj_sense_1, :bajo_adj_sense_2,
                  :bajo_n_sense_1, :bajo_n_sense_2,
                  :bajo_adv_sense_1, :bajo_adv_sense_2,
                  :bajo_prep_sense_1, :bajo_prep_sense_2 .
VS.
lex:LexicalDictionaryEntry rdfs:subClassOf lex:DictionaryEntry . :fool a lex:LexicalDictionaryEntry; lex:describes :fool_n, :fool_v, :fool_adj . :manner a lex:LexicalDictionaryEntry; lex:describes :manner_n_sp, :manner_n_mn, :manner_n_p.
Option 3' [Agreed on]: DictionaryEntry may be divided into DictionaryEntryComponents
[Proposed by S. Stolk]
In analogy to W3C's CSVW vocabulary, where a csvw:Table consists of csvw:Row(s) and csvw:Column(s), we can state that a DictionaryEntry may contain several DictionaryEntryComponents. Each component can be assigned a value or linked to ontolex elements (e.g. lexical senses, sense usage information, etc.). These (very general) classes capture the structure of the dictionary while keeping the lexicographic vocabulary to a minimum. Since the goal of this option is solely to represent the structure of the entry without addressing the meaning of the data, interpreting the data still requires ontolex elements. If an entry contains components that cannot yet be expressed with ontolex, however, this approach offers a simple and straightforward way to ensure that all data can still be captured as LLD. As such, it is best seen as a complementary approach to Option 3 (with a number of added benefits) whose use is certainly not meant to be compulsory.
With issue 3:
:bajo a lex:DictionaryEntry ;
    rdf:_1 :bajo_adj_sense_1_comp .
:bajo_adj_sense_1_comp a lex:DictionaryEntryComponent .
:bajo_adj a ontolex:LexicalEntry ;
    ontolex:sense :bajo_adj_sense_1 .
:bajo_adj_sense_1 a ontolex:LexicalSense ;
    lex:example :bajo_adj_sense_1_ex .
:bajo_adj_sense_1_ex a lex:UsageExample .
:bajo lex:describes :bajo_adj .
:bajo_adj_sense_1_comp lex:describes :bajo_adj_sense_1_ex .
Update: This option can also be used to account for sense order. The entry Fr.verre (`glass') in the Petit Larousse Illustré has three different senses, which can be ordered using Dictionary Entry Components:
:verre_n_dict a lexgr:DictionaryEntry ;
    rdf:_1 :verre_n_sense_1_comp ;
    rdf:_2 :verre_n_sense_2_comp ;
    rdf:_3 :verre_n_sense_3_comp .
:verre_n_sense_1_comp a lexgr:DictionaryEntryComponent ;
    lexgr:describes :verre_n_sense_1 .
:verre_n_sense_2_comp a lexgr:DictionaryEntryComponent ;
    lexgr:describes :verre_n_sense_2 .
:verre_n_sense_3_comp a lexgr:DictionaryEntryComponent ;
    lexgr:describes :verre_n_sense_3 .
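To illustrate how a consumer could recover the sense order from such data, the following sketch reads the rdf:_n membership properties of the entry and follows describes from each component to its sense. It is a minimal sketch in plain Python over hand-written (subject, predicate, object) tuples, not an RDF library call. It assumes that each component describes exactly one sense; the function names are made up for this illustration.

```python
import re

# Triples from the verre listing above, deliberately given out of order.
triples = [
    (":verre_n_dict", "rdf:type", "lexgr:DictionaryEntry"),
    (":verre_n_dict", "rdf:_2", ":verre_n_sense_2_comp"),
    (":verre_n_dict", "rdf:_1", ":verre_n_sense_1_comp"),
    (":verre_n_dict", "rdf:_3", ":verre_n_sense_3_comp"),
    (":verre_n_sense_1_comp", "lexgr:describes", ":verre_n_sense_1"),
    (":verre_n_sense_2_comp", "lexgr:describes", ":verre_n_sense_2"),
    (":verre_n_sense_3_comp", "lexgr:describes", ":verre_n_sense_3"),
]

# Container membership properties have the shape rdf:_1, rdf:_2, ...
MEMBERSHIP = re.compile(r"rdf:_([1-9][0-9]*)$")


def ordered_components(entry, triples):
    """Components of `entry`, sorted numerically by their rdf:_n index."""
    numbered = []
    for s, p, o in triples:
        m = MEMBERSHIP.match(p)
        if s == entry and m:
            numbered.append((int(m.group(1)), o))
    return [o for _, o in sorted(numbered)]


def ordered_senses(entry, triples):
    """Senses of `entry` in dictionary order (one sense per component assumed)."""
    describes = {s: o for s, p, o in triples if p == "lexgr:describes"}
    return [describes[c] for c in ordered_components(entry, triples)]


print(ordered_senses(":verre_n_dict", triples))
```

The index is compared as an integer rather than as a string, so that rdf:_10 sorts after rdf:_9 in long entries.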
Telcos from 10.10.2017 to 28.11.2017
Modelling of the entry verre from the Petit Larousse Illustré 1905 edition:
:verre_n a ontolex:LexicalEntry ;
    lexinfo:partOfSpeech lexinfo:noun ;
    lexinfo:gender lexinfo:masculine ;
    ontolex:lexicalForm :verre_n_form ;
    ontolex:sense :verre_n_sense_1, :verre_n_sense_2, :verre_n_sense_3 ;
    etym:etymon "vitrum"@la .
:verre_n_form a ontolex:Form ;
    ontolex:writtenRep "verre"@fr ;
    ontolex:phoneticRep "vè-re"@fr-x-petitlar .

:verre_n_sense_1 a ontolex:LexicalSense ;
    ontolex:isLexicalizedSenseOf :verre_n_lc_1 .
:verre_n_lc_1 a ontolex:LexicalConcept ;
    skos:definition "Corps solide, transparent et fragile, produit de la fusion d'un sable siliceux mêlé de potasse ou de soude"@fr ;
    skos:example "le verre est très cassant."@fr ;
    skos:related :verre_n_concept .

:verre_n_sense_2 a ontolex:LexicalSense ;
    ontolex:isLexicalizedSenseOf :verre_n_lc_2 .
:verre_n_lc_2 a ontolex:LexicalConcept ;
    skos:definition "Objet fait de verre"@fr ;
    skos:example "verre de montre."@fr ;
    skos:related :verre_n_concept .

:verre_n_sense_3 a ontolex:LexicalSense ;
    ontolex:isLexicalizedSenseOf :verre_n_lc_3 .
:verre_n_lc_3 a ontolex:LexicalConcept ;
    skos:definition "Vase à boire, fait de verre ; ce qu'il contient"@fr ;
    skos:example "un verre de vin."@fr ;
    skos:related :verre_n_concept .

:verre_n_concept a skos:Concept ;
    skos:definition "Le verre. dont l'invention est attribuée aux Phéniciens, est obtenu par la fusion dans des creusets (ou pots) d'un mélange de silice (sable) avec des sels de soude, de potasse (verre ordinaire) ou de plomb (cristal.) Les creusets sont placés dans des fours où la température est poussée jusqu'à 1.000°. Cueilli avec une canne que l'on plonge dans les creusets par une ouverture (ouvreau) pratiquée dans la paroi du four, le verre pâteux est travaillé, soufflé, moulé, étiré, pour donner des bouteilles, des vitres, des objets de gobeleterie, des tubes, etc. Les glaces sont obtenues par coulage ; on sort du four le creuset tout entier et l'on en verse le contenu sur une immense table de fonte. Tous les objets de verre, avant d'être livrés au commerce et indépendamment des façons qu'on leur fait subir ou des décors dont on les agrémente, doivent être recuits c'est-à-dire refroidis lentement, pour être moins cassants. Outre les mille objets à l'usage domestique, le verre sert encore à fabriquer les verres optiques et les instruments si nombreux utilisés dans les laboratoires. Ramolli au four et comprimé fortement, il donne la pierre de verre, qu'on emploie au revêtement des murs et même au pavage des rues."@fr .

:verre_double a ontolex:LexicalEntry ;
    decomp:subterm :verre_n ;
    ontolex:lexicalForm :verre_double_form ;
    ontolex:sense :verre_double_sense .
:verre_double_form a ontolex:Form ;
    ontolex:writtenRep "Verre double"@fr .
:verre_double_sense a ontolex:LexicalSense ;
    ontolex:isLexicalizedSenseOf :verre_double_lc .
:verre_double_lc a ontolex:LexicalConcept ;
    skos:definition "verre très épais."@fr .

:maison_de_verre a ontolex:LexicalEntry ;
    decomp:subterm :verre_n ;
    ontolex:lexicalForm :maison_de_verre_form ;
    ontolex:sense :maison_de_verre_sense .
:maison_de_verre_form a ontolex:Form ;
    ontolex:writtenRep "Maison de verre"@fr .
:maison_de_verre_sense a ontolex:LexicalSense ;
    ontolex:isLexicalizedSenseOf :maison_de_verre_lc .
:maison_de_verre_lc a ontolex:LexicalConcept ;
    skos:definition "maison où il n'y a rien de secret."@fr .

:petit_verre a ontolex:LexicalEntry ;
    decomp:subterm :verre_n ;
    ontolex:lexicalForm :petit_verre_form ;
    ontolex:sense :petit_verre_sense .
:petit_verre_form a ontolex:Form ;
    ontolex:writtenRep "Petit verre"@fr .
:petit_verre_sense a ontolex:LexicalSense ;
    ontolex:isLexicalizedSenseOf :petit_verre_lc .
:petit_verre_lc a ontolex:LexicalConcept ;
    skos:definition "liqueur alcoolique qu'on prend dans un verre de petite dimension"@fr ;
    skos:example "boire un petit verre."@fr .
:verre_n_dict a lexgr:DictionaryEntry ;
    rdf:_1 :verre_n_sense_1_comp ;
    rdf:_2 :verre_n_sense_2_comp ;
    rdf:_3 :verre_n_sense_3_comp .
:verre_n_sense_1_comp a lexgr:DictionaryEntryComponent ;
    lexgr:describes :verre_n_sense_1 .
:verre_n_sense_2_comp a lexgr:DictionaryEntryComponent ;
    lexgr:describes :verre_n_sense_2 .
:verre_n_sense_3_comp a lexgr:DictionaryEntryComponent ;
    lexgr:describes :verre_n_sense_3 .
Telcos from December 2017 to May 2018
December 2017
By the end of 2017 we had agreed on the use of DictionaryEntr[ies] and DictionaryEntryComponent[s] to describe dictionary entries as a group of senses in a certain order, encoded by RDF container membership properties (rdf:_1, rdf:_2, ...). The Petit Larousse example provided by Francesca was the example at hand:
:verre_n_dict a lexgr:DictionaryEntry ;
    rdf:_1 :verre_n_sense_1_comp ;
    rdf:_2 :verre_n_sense_2_comp ;
    rdf:_3 :verre_n_sense_3_comp .
:verre_n_sense_1_comp a lexgr:DictionaryEntryComponent ;
    lexgr:describes :verre_n_sense_1 .
:verre_n_sense_2_comp a lexgr:DictionaryEntryComponent ;
    lexgr:describes :verre_n_sense_2 .
:verre_n_sense_3_comp a lexgr:DictionaryEntryComponent ;
    lexgr:describes :verre_n_sense_3 .
January -- March 2018
From January on we started looking at how these elements would solve the issues listed on the wiki page.
- One of these issues was the sense-subsense hierarchy, which we discussed on the basis of the telos example provided by Fahad. We agreed to have a subComponent property between DictionaryEntryComponents to describe sub-senses:
:telos_n_dict a lexgr:DictionaryEntry ;
    rdf:_1 :lsjsense_n32381_0_comp ;    # a component describing the first main sense
    rdf:_2 :lsjsense_n32381_5_comp .

:lsjsense_n32381_0_comp a lexgr:DictionaryEntryComponent ;
    lexgr:describes :lsjsense_n32381_0 ;    # the first main sense
    # subcomponents describing the children of the first main sense (sub-senses)
    lexgr:subComponent :lsjsense_n32381_1_comp, :lsjsense_n32381_2_comp ;
    # those subcomponents are also ordered, reflecting the subsense order
    rdf:_1 :lsjsense_n32381_1_comp ;
    rdf:_2 :lsjsense_n32381_2_comp .
- In a telco we briefly discussed how to encode geographical usage of senses. ontolex:usage would be an option but it is very general and its range is fuzzy, so geographicalUsage was proposed.
- The class UsageExample was proposed to be used instead of skos:example in those cases in which the dictionary provides more information about a sense example than merely the string.
- The attestation/citations issue and how to relate the attestation to a corpus came up during several telcos, but we have not yet discussed it with an example. There is some material in the Google Doc provided by Katrien and Jesse, as well as some examples of attestations in Fahad's telos entry. TO-DO.
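Returning to the sense-subsense hierarchy in the first point above: since components are ordered at every level, the nesting can be walked recursively to obtain an outline of the entry. The sketch below does this in plain Python over the four membership triples of the telos listing. The dotted labels (1, 1.1, ...) are generated here for illustration only and need not match the sense labels printed in the source dictionary; the walk also assumes the component graph contains no cycles.

```python
import re

# Membership triples from the telos listing above.
triples = [
    (":telos_n_dict", "rdf:_1", ":lsjsense_n32381_0_comp"),
    (":telos_n_dict", "rdf:_2", ":lsjsense_n32381_5_comp"),
    (":lsjsense_n32381_0_comp", "rdf:_1", ":lsjsense_n32381_1_comp"),
    (":lsjsense_n32381_0_comp", "rdf:_2", ":lsjsense_n32381_2_comp"),
]

MEMBERSHIP = re.compile(r"rdf:_([1-9][0-9]*)$")


def children(node, triples):
    """Direct members of `node` as (index, member) pairs, sorted by index."""
    found = []
    for s, p, o in triples:
        m = MEMBERSHIP.match(p)
        if s == node and m:
            found.append((int(m.group(1)), o))
    return sorted(found)


def outline(node, triples, prefix=""):
    """Depth-first list of (label, component) rows below `node`."""
    rows = []
    for index, child in children(node, triples):
        label = f"{prefix}{index}"
        rows.append((label, child))
        rows.extend(outline(child, triples, prefix=label + "."))
    return rows


for label, component in outline(":telos_n_dict", triples):
    print(label, component)
```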
March -- April 2018
- Some dictionary entries would need to be divided into several lexical entries when senses occur with different parts of speech, or even when a single sense is associated with more than one part of speech (see the German Blau and English counterclockwise examples, involving adverbs and adjectives). With the examples of the wiki we saw that this would lead to a high number of triples if we take DictionaryEntryComponents to describe only senses; there are situations in which we just want to "group" those lexical entries which have different part-of-speech into one dictionary entry. After some discussion on whether to have two properties, one to link DictionaryEntryComponents to senses and another to link them to lexical entries (e.g. describesSense/describesEntry), in our last telco we agreed on using just one property, describes, with multiple defined ranges.
- The definition of lexical entry is strict in that all its lexical forms are associated with the same set of meanings, but sometimes a dictionary sense is only associated with a particular form. If that restriction were loosened, this problem would be solved and we would not need to divide the entry into several lexical entries. Specifying that a sense takes a particular form could be done with an extensional and an intensional modelling. TO-DO.
- During some telcos we emphasized that these elements, DictionaryEntry and DictionaryEntryComponents, would be used if necessary in the conversion of a dictionary to lemon-OntoLex. With most entries in the dictionary the core would suffice.
- With this in mind, the Dictionary class, intended to group everything originally in the dictionary (vs. lime:Lexicon), would need to group both DictionaryEntries and LexicalEntries.
- If we use DictionaryEntry just occasionally in the transformation of a dictionary to RDF, it seems that those entries for which the lemon core suffices are not dictionary entries (or, rather, that we do not know whether they are), as we are not treating them as such.
- In the future we might want to use these elements with lexicographical data that do not belong to what is traditionally called a "dictionary". Ilan's notion of "post-dictionary lexicography" comes to mind.
These aspects brought about the discussion on changing the name of DictionaryEntry to a more general name such as SuperEntry or SupEntry: a Dictionary would group together LexicalEntries and SuperEntries (arranged as desired, with a specific sense order, etc.). However, Philipp's point on DictionaryEntry being "a conscious and deliberate arrangement of lexical entries / words into collections, making lexicographic choices what to group, etc." and emphasis on its nature as "product" that describes language in a certain way perfectly fits the intended use of that class during our whole series of telcos.
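One practical consequence of the single describes property agreed on above is that a consumer cannot tell from the property alone whether the target is a lexical entry or a lexical sense; it has to inspect the rdf:type of each target. The sketch below shows one way to do that in plain Python. The data are hypothetical: the identifiers are borrowed from the fool and bajo listings, and the explicit rdf:type triples are assumed to be present in the dataset.

```python
# Hypothetical data: describes targets of both kinds, each explicitly typed.
triples = [
    (":fool", "rdf:type", "lex:DictionaryEntry"),
    (":fool", "lex:describes", ":fool_n"),
    (":fool", "lex:describes", ":fool_v"),
    (":fool_n", "rdf:type", "ontolex:LexicalEntry"),
    (":fool_v", "rdf:type", "ontolex:LexicalEntry"),
    (":bajo", "rdf:type", "lex:DictionaryEntry"),
    (":bajo", "lex:describes", ":bajo_adj_sense_1"),
    (":bajo_adj_sense_1", "rdf:type", "ontolex:LexicalSense"),
]

KINDS = ("ontolex:LexicalEntry", "ontolex:LexicalSense")


def described_by_kind(subject, triples):
    """Group the lex:describes targets of `subject` by their rdf:type."""
    types = {}
    for s, p, o in triples:
        if p == "rdf:type":
            types.setdefault(s, set()).add(o)

    grouped = {kind: [] for kind in KINDS}
    grouped["untyped"] = []
    for s, p, o in triples:
        if s != subject or p != "lex:describes":
            continue
        matched = [kind for kind in KINDS if kind in types.get(o, set())]
        if not matched:
            grouped["untyped"].append(o)
        for kind in matched:
            grouped[kind].append(o)
    return grouped


print(described_by_kind(":fool", triples))
print(described_by_kind(":bajo", triples))
```

With two separate properties (describesSense/describesEntry) this lookup would not be needed, at the cost of a larger vocabulary; the sketch is only meant to make that trade-off visible.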
May 2018
- SuperEntry replaces DictionaryEntry and all related class and property names are changed accordingly.
- The attestation/citations issue and how to relate the attestation to a corpus is to be discussed. TO-DO
- There are some doubts as to whether current class and property names are too long and specific: SuperEntryComponent vs. Component.
- The use of the property subComponent in those cases in which a component (e.g. describing a sense) has subcomponents (describing sub-senses) *already ordered* with rdf:_1, rdf:_2, etc. is to be discussed: is it necessary?
<comp1> a lex:SuperEntryComponent ;
    rdf:_1 <subcomp1> ;
    rdf:_2 <subcomp2> .
# Possibly redundant given the ordering above:
# <comp1> lex:subComponent <subcomp1>, <subcomp2> .
- How to specify that a sense takes a particular form could be done with an extensional and an intensional modelling, not yet defined. TO-DO
- In a telco we briefly discussed how to encode geographical usage of senses. ontolex:usage would be an option but it is very general and its range is fuzzy, so geographicalUsage was proposed. To-confirm.
- Issues 7 (onomasiological ordering: relating senses to concepts) and 8 (onomasiological ordering: specifying conceptual levels) are also on the queue to discuss.
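On the subComponent question raised above, the following sketch shows in what sense the explicit statements are redundant. It assumes that every rdf:_n member of a SuperEntryComponent is itself a sub-component; under that assumption the subComponent triples can be derived mechanically from the ordering triples. This is plain Python over tuples, and the function name is made up for this illustration.

```python
import re

MEMBERSHIP = re.compile(r"rdf:_([1-9][0-9]*)$")


def derived_subcomponents(triples, component_type="lex:SuperEntryComponent"):
    """subComponent triples implied by the rdf:_n members of each component.

    Assumption: every rdf:_n member of a component is a sub-component.
    """
    components = {
        s for s, p, o in triples if p == "rdf:type" and o == component_type
    }
    return {
        (s, "lex:subComponent", o)
        for s, p, o in triples
        if s in components and MEMBERSHIP.match(p)
    }


# The ordering triples from the snippet above.
ordering = [
    ("<comp1>", "rdf:type", "lex:SuperEntryComponent"),
    ("<comp1>", "rdf:_1", "<subcomp1>"),
    ("<comp1>", "rdf:_2", "<subcomp2>"),
]

# The explicit statements whose necessity is in question.
explicit = {
    ("<comp1>", "lex:subComponent", "<subcomp1>"),
    ("<comp1>", "lex:subComponent", "<subcomp2>"),
}

print(derived_subcomponents(ordering) == explicit)
```

This does not settle the question. A single named property can still be more convenient to query than an open-ended family of rdf:_n properties, so keeping subComponent may be a matter of usability rather than of information content.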
June 2018
- Discussion started on how to represent ATTESTATION, and other related notions such as citation, quotation, usage example, ...
- In order to clarify the different views, some definitions have been proposed:
- Ilan>
- A citation consists of a quote from a corpus (text); it may either (i) include a reference to its origin (bibl), or (ii) not.
- An attestation is the reference to a source (bibl) without its actual citation.
- (in other words, bibl and attestation might be similar, but the latter is not preceded by a citation)
- An example of usage (or usage example) is human-crafted, whether (i) corpus-inspired/derived, or (ii) not. The example can consist of either a full sentence or a short phrase (and could also be a citation)
- (there are different types of examples – mainly of general patterns, for reception/decoding purposes, active for production/encoding – but that is probably beyond the scope here)
- Philipp>
- Attestation: An attestation is a reference to a source that proves that the lexical entry has a certain linguistic property (e.g. a sense).
- CiTO ontology>
- A citation is a conceptual directional link from a citing entity to a cited entity, created by a human performative act of making a citation, typically instantiated by the inclusion of a bibliographic reference (biro:BibliographicReference) in the reference list of the citing entity, or by the inclusion within the citing entity of a link, in the form of an HTTP Uniform Resource Locator (URL), to a resource on the World Wide Web.
- Fahad (from [10])>
- [citations] do not deal directly with words or their usages rather they are concerned with documents or works and the rhetorical/organisational structure pertaining to them.
- [attestations] describe the direct relationship between an item in a lexicon and a text which evidences, or better, attests to its past use. These latter statements are at the level of linguistic facts about words and other lexical entries.
- [...] the example demonstrates the clear conceptual distinction that exists between the performative act of citing a piece of text as evidence -- and there can be no reasonable doubt that the 1947 edition of the LSJ did indeed cite Thucydides 7.71 in its entry for ἀνώμᾰλος -- and an instance of a word in a text attesting to a given sense -- it is doubtful whether Thucydides did use the word in that sense in the passage in question: that is, the distinction between citations and attestations.
- Attestation reifies the relationship between a given lexical element in lemon -- whether this is a Lexical Entry, a Lexical Sense, a Lexical Form, or something else -- and a bibliographic item that contains a text exemplifying the use of the element in question; we will also be able to relate an Attestation with any citation that is associated with it. [An RDF example at http://lari-datasets.ilc.cnr.it/lsj_anomalos].
- Ilan>
Current status
This diagram illustrates the current status of the module.
References
[1] J. Bosque-Gil, J. Gracia, E. Montiel-Ponsoda, and G. Aguado-de Cea, "Modelling multilingual lexicographic resources for the web of data: the K Dictionaries case," in Proc. of GLOBALEX'16 workshop at LREC'16, Portoroz, Slovenia, May 2016.
[2] J. Bosque-Gil, J. Gracia, and A. Gómez-Pérez, "Linked data in lexicography," Kernerman Dictionary News, pp. 19-24, Jul. 2016.
[3] Declerck, T., Wandl-Vogt, E., & Mörth, K. (2015). "Towards a Pan European Lexicography by Means of Linked (Open) Data". In Electronic lexicography in the 21st century: linking lexical data in the digital age. Proceedings of the eLex 2015 conference (pp. 342-355).
[4] Klimek, B., & Brümmer, M. (2015). "Enhancing lexicography with semantic language databases." Kernerman Dictionary News, 23, 5-10.
[5] Parvizi, A., Kohl, M., González, M., Saurí, R. (2016, May). "Towards a Linguistic Ontology with an Emphasis on Reasoning and Knowledge Reuse". Language Resources and Evaluation Conference (LREC), 2016
[6] Abromeit, F., Chiarcos, C., Fäth, C., & Ionov, M. (2016, May). "Linking the Tower of Babel: Modelling a Massive Set of Etymological Dictionaries as RDF". In LDL 2016 5th Workshop on Linked Data in Linguistics: Managing, Building and Using Linked Language Resources (p. 11).
[7] Khan, F., Díaz-Vera, J. E., & Monachini, M. (2016). "Representing Polysemy and Diachronic Lexico-Semantic Data on the Semantic Web". ESWC (2016)
[8] Stolk, S., "OntoLex and Onomasiological Ordering: Supporting Topical Thesauri", in Proc. of the LDK2017 Workshops (2017), NUI Galway, Ireland, 18 June (pp. 60–67).
[9] El Maarouf, I., Bradbury, J., & Hanks, P. (2014). "PDEV-lemon: a Linked Data implementation of the Pattern Dictionary of English Verbs based on the Lemon model". In 3rd Workshop on Linked Data in Linguistics: Multilingual Knowledge Resources and Natural Language Processing (p. 88).
[10] Khan, F. & Boschetti, F. (2018). "Towards a Representation of Citations in Linked Data Lexical Resources". In Proc. of the XVIII EURALEX International Congress (EURALEX 2018).