metadata (and not only): a few discussion points from Armando Stellato on 2014-12-17 (public-ontolex@w3.org from December 2014)

From: Armando Stellato <stellato@info.uniroma2.it>
Date: Wed, 17 Dec 2014 18:42:49 +0100
To: <public-ontolex@w3.org>
Message-ID: <DUB408-EAS157E84D29B8E26242E44C3CA06D0@phx.gbl>

Dear all,

we are expanding a bit here (wrt the meeting minutes) on what has been discussed in the last calls regarding the final touches still open on the metadata module.

We share the feeling of being very close to the finalization of the vocabulary. However, we would like to deepen the discussion on a few aspects which we consider fundamental for the consistency of the whole model (and thus not limited to the metadata), by weighing the pros and cons of each of them.

In some cases we may suggest our position, in some others we take no stance at all, so just take this as a neutral checklist to be verified all together.

1. Model and Terminology Consistency

We report here a few statements/idiosyncrasies that have been made/noticed in the context of our calls. We suggest verifying them all together and then reporting them explicitly somewhere, to make things clear from the start.

a. a Lexicon contains only lexical information (no conceptual information, such as synsets)

are we fine with this? In some cases, WordNet (as a whole), which is mostly known as a “lexical database” (correct though maybe too general), has also been called a Lexicon (computational Lexicon). We know the literature often abounds with terminology misuses, so it’s ok if we decide to keep the above statement in a strict way. We are just asking for confirmation (this influences other choices). It is also important to take into consideration all the modules and where their information belongs (semantic / lexicon part).

b. Lexical/Lexicalized (and then Conceptual/Conceptualized), not only terminology…
During the first year, I (Armando) suggested introducing a superclass for synset-like things, and suggested using the name LexicalConcept (used by Miller himself in describing synsets) to represent a common semantic entity for synonymic lexical entries. It is important that we recall a cause/effect distinction. A LexicalConcept is not a domain concept which is being lexicalized (that would be a “lexicalizED concept”), but an entity which exists as a complementary semantic element in the description of a lexicon (whether it is technically part of it or not, see point (a) above). So it is lexical in that it “has to do” with lexical descriptions. A few consequences:

i. ConceptualLexicon

This was the name reported in the minutes to represent Lexicons which have a conceptual backbone (like synsets in WordNet); actually we suggested ConceptualizedLexicon. This does not sound like an oxymoron (we agree with John that ConceptualLexicon does..), and actually tells more about something which is still (purely) a Lexicon. To be confirmed after the vote on (a): whether the conceptual backbone is part of the Lexicon or not (and so, technically, to which dataset the “evokes” triples belong).

ii. Use of properties evokes/denotes
We got the impression during the last calls that ontolex:evokes is intended to be used whenever a skos:Concept is being described.

Actually it is important that domain skos:Concepts in KOSs which are lexicalized through an ontolex:Lexicon fall in the same category as owl:Classes or properties, and so are linked through the ontolex:denotes property.

The ontolex:LexicalConcept should be meant to represent the conceptual backbone of a lexical database such as WordNet (proposed name: ConceptualizedLexicon) which is something totally different (and opposite) from lexicalizing a domain concept scheme.
Note: ontolex:evokes triples should not be part of a LexicalizationSet. Synsets are not concepts being lexicalized, but they exist to give meaning to lexical entries.
What to do then? Introduce another class for the set of LexicalConcepts and for the binding between them and LexicalEntries, such as lime:Conceptualization?
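
To make the denotes/evokes distinction concrete, here is a minimal sketch in Python. It only models triples as plain tuples; every ex: identifier is invented for illustration, and the split function is our reading of the note above (denotes links go into a LexicalizationSet, evokes links go elsewhere), not an agreed design.

```python
# Triples as (subject, predicate, object) strings.
# All ex: identifiers are hypothetical examples.
TRIPLES = [
    ("ex:lex_cat", "rdf:type", "ontolex:LexicalEntry"),
    # Domain concept in a KOS being lexicalized: linked with denotes.
    ("ex:lex_cat", "ontolex:denotes", "ex:kos_Cat"),
    # Synset-like LexicalConcept giving meaning to the entry: linked with evokes.
    ("ex:lex_cat", "ontolex:evokes", "ex:synset_cat_n_1"),
    ("ex:kos_Cat", "rdf:type", "skos:Concept"),
    ("ex:synset_cat_n_1", "rdf:type", "ontolex:LexicalConcept"),
]


def split_by_link(triples):
    """Separate lexicalization links (denotes) from conceptualization links (evokes)."""
    lexicalization = [t for t in triples if t[1] == "ontolex:denotes"]
    conceptualization = [t for t in triples if t[1] == "ontolex:evokes"]
    return lexicalization, conceptualization


lexicalization, conceptualization = split_by_link(TRIPLES)
```

Under this reading, only the first list would be counted as part of a LexicalizationSet; the second would belong to whatever collector is chosen for the conceptual backbone.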

2. Requirement: we should be able to model WordNet (and resources alike)
We felt it important that ontolex should be able to represent WordNet-like resources (or wordnets), giving an umbrella over everything that is inside them. We should know explicitly from the vocabulary (for the reasons already cited) whether a Lexicon has a conceptual backbone or not, and many other things. The general idea is that an agent using a lexical resource should know, from the available metadata/data classification, which features it can rely on.

Proposal: Introduce property: ontolex:scheme : Lexicon => skos:ConceptScheme

this originated from requirement (1.a). If accepted, a WordNet would be identifiable as a dataset containing a Lexicon AND a (separate) Conceptualization.

One note: the name ontolex:scheme is quite misleading as it is taken from skos:ConceptScheme. Actually, the focus should be more on the fact that the pointed element is a conceptualization of the Lexicon. ontolex:conceptualization would be more appropriate.

we would also propose to introduce ontolex:LexicalConceptScheme. John suggested that it could be more complicated. Our point is this: if we give credit to the existence of LexicalConcepts, then it is much better to represent a proper collector for them: LexicalConceptScheme.

Introduce Class ConceptualizedLexicon ⊑ some.conceptualization

Introduce ConceptualizedLexicon ⊑ Lexicon ⊓ some.conceptualization
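
As a rough illustration of how an agent could use this, the sketch below checks whether a resource qualifies as a ConceptualizedLexicon. It assumes the proposed property is named ontolex:conceptualization, and it reads the axiom as a definition (the axiom above only states the ⊑ direction, so treating it as sufficient is our assumption). The ex: identifiers are invented.

```python
# A tiny triple store as a set of (subject, predicate, object) strings.
# All ex: identifiers are hypothetical; ontolex:conceptualization is the
# property name proposed in this message, not an established one.
TRIPLES = {
    ("ex:wordnet_lexicon", "rdf:type", "ontolex:Lexicon"),
    ("ex:wordnet_lexicon", "ontolex:conceptualization", "ex:wordnet_synsets"),
    ("ex:wordnet_synsets", "rdf:type", "skos:ConceptScheme"),
    ("ex:plain_lexicon", "rdf:type", "ontolex:Lexicon"),
}


def is_conceptualized_lexicon(node, triples):
    """True if node is typed as a Lexicon and has at least one conceptualization."""
    is_lexicon = (node, "rdf:type", "ontolex:Lexicon") in triples
    has_conceptualization = any(
        s == node and p == "ontolex:conceptualization" for s, p, _ in triples
    )
    return is_lexicon and has_conceptualization


wordnet_ok = is_conceptualized_lexicon("ex:wordnet_lexicon", TRIPLES)
plain_ok = is_conceptualized_lexicon("ex:plain_lexicon", TRIPLES)
```

Here the WordNet-like lexicon passes the check while the plain lexicon does not, which is the kind of feature detection requirement 2 asks for.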

..and yes…maybe we should say something about (the presence/quantity of) glosses…

3. Keeping ontolex:Lexicon separate from lime:Lexicon
In principle no issue, we can collapse them. However, a few things will happen as a consequence of the merge:

Pro:
- merge allows for one single entry in the vocabulary (only ontolex:Lexicon, no lime:Lexicon)
- more agile to embed metadata in the data (but only in those cases where the lexicon is small enough to be a single file, and not a SPARQL-accessible dataset, which would require a separate VoID file, see second point of contra below)

Contra:
- if you have a separate data file and an associated metadata file, you have in any case (LOD principle for proper HTTP dereferencing) to use different URIs to refer to the same object (one in the data and one in the metadata file), and then state an owl:sameAs between the two. Quite confusing wrt the usual VoID pattern...
- common use: the separate VoID file is necessary whenever the lexicalization is separate from the lexicon, and also in any scenario where the resources are of non-trivial size. Together these are the most common cases, and they would require the separate URIs and the owl:sameAs mentioned above
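
The two-URI pattern from the contra points can be sketched as follows. This is only an illustration in Python with triples as tuples: both URIs and the dct:title value are invented, and the helper functions are hypothetical, showing what a consumer would have to do to get one merged description of the lexicon.

```python
# Both URIs are invented for illustration.
DATA_URI = "http://example.org/data/lexicon"        # lexicon as named in the data
META_URI = "http://example.org/void.ttl#lexicon"    # same lexicon as named in the VoID file

TRIPLES = [
    (DATA_URI, "rdf:type", "ontolex:Lexicon"),
    (META_URI, "rdf:type", "lime:Lexicon"),
    (META_URI, "dct:title", "Example lexicon"),
    (META_URI, "owl:sameAs", DATA_URI),
]


def aliases(uri, triples):
    """Return uri plus every node directly linked to it by owl:sameAs (either direction)."""
    same = {uri}
    for s, p, o in triples:
        if p == "owl:sameAs":
            if s == uri:
                same.add(o)
            elif o == uri:
                same.add(s)
    return same


def describe(uri, triples):
    """Collect (predicate, object) pairs stated about uri or any of its aliases."""
    names = aliases(uri, triples)
    return sorted((p, o) for s, p, o in triples if s in names and p != "owl:sameAs")


description = describe(DATA_URI, TRIPLES)
```

The extra alias-resolution step is exactly the overhead the contra points refer to: without following owl:sameAs, a consumer starting from the data URI would not see the metadata at all.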

One point also discussed is whether there should be any other element which is not properly a Lexicon (in the ontolex sense at least) but still represents a purely lexical resource addressable from the metadata point of view. A set of SKOS-XL labels as a separate lexicon is very rare, though not unheard-of, as reported in the minutes of 5-Dec. Actually we know of cases of thesauri with file dumps of skosxl:Labels. These would be a “sort-of-Lexicon + lexicalization” for a given dataset.

3.1. General Discussion on the usefulness of ontolex:Lexicon in the data

In general, in any ontology vocabulary, there is no tradition of declaring, in the data, collectors for "typical" objects that are defined in the vocabulary...they are just there, put in the data. E.g., there is no ClassSet, no PropertySet etc…

A notable exception is skos:ConceptScheme, which is not intended merely as a collector. It has been invented to provide different views over the same content (like a specific scheme rooted over a non-root concept of the main scheme, or even more fine-grained filters applied on the whole content).
Taken as-is, an ontolex:Lexicon does not provide any of this useful information, unless that was the intention. A case coming to our mind is providing a general Lexicon which may have topic separations, in which one LexicalEntry belongs to the general Lexicon and to one or more (sub)Lexicons… is this the case?
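
A minimal sketch of that case, in Python, with lexicon membership modelled as plain sets. The lexicon and entry identifiers are invented, and how membership would actually be expressed in the vocabulary is left open here.

```python
# Hypothetical membership table: lexicon identifier -> set of LexicalEntry identifiers.
MEMBERSHIP = {
    "ex:general_lexicon": {"ex:lex_cat", "ex:lex_femur", "ex:lex_run"},
    "ex:anatomy_lexicon": {"ex:lex_femur"},
}


def lexicons_of(entry, membership):
    """Return, in sorted order, every lexicon that lists the given entry."""
    return sorted(lex for lex, entries in membership.items() if entry in entries)


femur_lexicons = lexicons_of("ex:lex_femur", MEMBERSHIP)
cat_lexicons = lexicons_of("ex:lex_cat", MEMBERSHIP)
```

In this reading the Lexicon acts as a view over shared entries, much like the skos:ConceptScheme use described above, rather than as a mere collector.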

Cheers,

Manuel and Armando

Received on Wednesday, 17 December 2014 17:43:26 UTC