Re: [ACTION-94]: go and find examples of concept ontology (semantic features of terms as opposed to domain type ontologies) from Dave Lewis on 2012-06-07 (public-multilingualweb-lt@w3.org from June 2012)

From: Dave Lewis <dave.lewis@cs.tcd.ie>
Date: Thu, 07 Jun 2012 22:58:21 +0100
To: public-multilingualweb-lt@w3.org
Message-ID: <4FD123FD.1050404@cs.tcd.ie>
Hi Tadej,
I spoke to some people from ISOCAT at LREC. They operate persistent URLs
for their platform, so once we have an example, perhaps we could add it
to the list?

cheers,
Dave

On 07/06/2012 15:19, Felix Sasaki wrote:
>
>
> 2012/6/7 Tadej Stajner <tadej.stajner@ijs.si>
>
>     Hi Felix,
>     as far as I'm aware, URIs only exist for the English wordnet.
>     Maybe prefixing the value with a # was not the best stylistic
>     choice here, but yes, what I meant to convey is that the value was
>     a local identifier, valid within a particular semantic network.
>
>     In the ideal scenario, these selectors would be dereferenceable
>     and verifiable via URIs for arbitrary wordnets and terminology
>     lexicons and their entries.
>
>
>
> OK, the main point would be that they are dereferenceable and
> verifiable. In practice, you will not achieve that for arbitrary
> wordnets, but you can achieve that for a subset, if the related
> "players" agree. In the "collation" example mentioned before, the
> identifier for the Unicode code point based collation
> http://www.w3.org/2005/xpath-functions/collation/codepoint/ was the
> lowest common denominator; in addition to that, everybody is free to
> have other URIs for arbitrary collations. I would hope that we could
> end up with such a list (hopefully longer than one entry) for the
> semantic networks too.
>
> Felix
>
>     Do we have any people involved in developing semantic networks or
>     term lexicons on this list? A possible compromise is to allow some
>     limited classes of non-URI local selectors, like synset IDs for
>     wordnets, and term IDs for TBX lexicons.
>
>     -- Tadej
>
>
>     On 6/7/2012 3:44 PM, Felix Sasaki wrote:
>>     Thanks, Tadej.
>>
>>     The value of the its-selector attribute looks like a
>>     document-internal link. But it is probably an identifier of the
>>     synset in the given semantic network, no?
>>
>>     About 1) and 2): is your made-up example then the output of the
>>     text annotation use case? I am asking since you say "2) markup in
>>     raw ITS", so I'm not sure.
>>
>>     Also, it seems that an implementation needs to "know" about the
>>     resources that are identified via its-semantic-network-ref. This
>>     is really an identifier, like
>>     http://www.w3.org/2005/xpath-functions/collation/codepoint/
>>     is an identifier for a Unicode code point collation; it doesn't
>>     give you the collation data, but creating an implementation that
>>     "understands" the identifier probably means caching the collation
>>     data. The same would be true for the semantic network.
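The point about identifiers can be illustrated with a minimal Python sketch, assuming an implementation that keeps a local cache keyed by semantic-network identifier. As with the collation URI, the identifier itself carries no data; the implementation "understands" only the identifiers for which it holds cached data. The names KNOWN_NETWORKS, understands and resolve are hypothetical, and the gloss is a placeholder rather than real data from the Tübingen resource.

```python
from typing import Dict, Optional

# Hypothetical local cache: semantic-network identifier -> {local sense ID: gloss}.
# The identifier is the one used in the made-up example in this thread;
# the gloss is a placeholder, not real data from that resource.
KNOWN_NETWORKS: Dict[str, Dict[str, str]] = {
    "http://www.sfs.uni-tuebingen.de/lsd/index.shtml": {
        "synset_loschen_3": "<placeholder gloss>",
    },
}


def understands(network_ref: str) -> bool:
    """True if this implementation holds cached data for the identified network."""
    return network_ref in KNOWN_NETWORKS


def resolve(network_ref: str, local_id: str) -> Optional[str]:
    """Look up a sense in a cached network; None if the network or ID is unknown."""
    return KNOWN_NETWORKS.get(network_ref, {}).get(local_id)


print(understands("http://example.org/some-other-network"))  # False
```

An unknown identifier is simply not understood, which mirrors how an XPath processor treats a collation URI it does not support.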
>>
>>     This leads to the next question: can we engage the developers of
>>     the semantic network (or other disambiguation related) resources
>>     to come up with stable URIs for these? It would be great to list
>>     these URIs in our specification and say "this is how you identify
>>     the English wordnet etc.", for scenarios like the collation data
>>     mentioned above.
>>
>>     Felix
>>
>>     2012/6/7 Tadej Štajner <tadej.stajner@ijs.si>
>>
>>         Hi,
>>
>>         I agree with Pedro on the questions. Automatic word sense
>>         disambiguation is in practice still not perfect, so some
>>         semi-automatic user interfaces make a lot of sense. Here is how
>>         I think this could look in a made-up example, answering
>>         Felix's 1) and 2):
>>
>>         1) HTML+ITS: <span its-disambiguation
>>         its-semantic-network-ref="http://www.sfs.uni-tuebingen.de/lsd/index.shtml"
>>         its-selector="#synset_loschen_3">löschen</span>
>>
>>         2) Markup in raw ITS:
>>         <its:disambiguation
>>             semanticNetworkRef="http://www.sfs.uni-tuebingen.de/lsd/index.shtml"
>>             selector="#synset_loschen_3">löschen</its:disambiguation>
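To show that the two made-up forms carry the same information, here is a minimal Python sketch that extracts (text, semantic network, selector) triples from each using only the standard library. The attribute and element names come from the made-up example above, not from a published specification. The wrapping <p> element and the xmlns:its declaration (the ITS namespace, http://www.w3.org/2005/11/its) are additions so that the XML form parses; nested annotated elements are not handled.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

ITS_NS = "http://www.w3.org/2005/11/its"

HTML_SAMPLE = (
    '<p><span its-disambiguation '
    'its-semantic-network-ref="http://www.sfs.uni-tuebingen.de/lsd/index.shtml" '
    'its-selector="#synset_loschen_3">löschen</span></p>'
)
XML_SAMPLE = (
    '<p xmlns:its="http://www.w3.org/2005/11/its"><its:disambiguation '
    'semanticNetworkRef="http://www.sfs.uni-tuebingen.de/lsd/index.shtml" '
    'selector="#synset_loschen_3">löschen</its:disambiguation></p>'
)


class _AnnotatedSpanCollector(HTMLParser):
    """Collects (text, network, selector) for elements carrying its-disambiguation."""

    def __init__(self):
        super().__init__()
        self.results = []
        self._open = None  # state of the currently open annotated element (no nesting)

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)  # a bare boolean attribute maps to None
        if "its-disambiguation" in a:
            self._open = {"tag": tag, "text": "",
                          "network": a.get("its-semantic-network-ref"),
                          "selector": a.get("its-selector")}

    def handle_data(self, data):
        if self._open is not None:
            self._open["text"] += data

    def handle_endtag(self, tag):
        if self._open is not None and tag == self._open["tag"]:
            o = self._open
            self.results.append((o["text"], o["network"], o["selector"]))
            self._open = None


def from_html(markup):
    parser = _AnnotatedSpanCollector()
    parser.feed(markup)
    parser.close()
    return parser.results


def from_xml(markup):
    root = ET.fromstring(markup)
    # Unprefixed attributes are not namespaced, so plain names work with get().
    return [(el.text, el.get("semanticNetworkRef"), el.get("selector"))
            for el in root.iter(f"{{{ITS_NS}}}disambiguation")]


print(from_html(HTML_SAMPLE) == from_xml(XML_SAMPLE))  # True
```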
>>
>>         -- Tadej
>>
>>
>>
>>         On 04. 06. 2012 13:53, Pedro L. Díez Orzas wrote:
>>>
>>>         Dear Felix,
>>>
>>>         Thank you very much. Tadej can probably prepare the use
>>>         cases you mention, with the consolidated data category.
>>>         Regarding questions 3 and 4, I can tell you the following:
>>>
>>>         3) Would it be produced also by an automatic text annotation
>>>         tool?
>>>
>>>         For the pointers to the three kinds of information referred
>>>         to (concepts in an ontology, meanings in a lexical database,
>>>         and terms in terminological resources), I think semi-automatic
>>>         annotation tools would be possible, that is, annotations
>>>         proposed by the tool and confirmed by the user.
>>>
>>>         Fully automatic text annotation would need a more
>>>         sophisticated “semantic calculus”, most of which is still
>>>         under research, as far as I know. In these cases, it could
>>>         perhaps be combined with textAnalysisAnnotation, using
>>>         *Annotation agent* and *Confidence score* to specify which
>>>         system produced the annotation and with what reliability.
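The semi-automatic workflow described above (the tool proposes, the user confirms), combined with textAnalysisAnnotation-style provenance, could look roughly like the following Python sketch. The Proposal class, the [0.0, 1.0] range for the confidence score, the 0.8 review threshold and the annotator name are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Proposal:
    """A disambiguation proposed by a tool, with textAnalysisAnnotation-style provenance."""
    text: str
    selector: str
    annotator: str     # "Annotation agent": the system that produced the proposal
    confidence: float  # "Confidence score": assumed here to lie in [0.0, 1.0]


def needs_review(p: Proposal, threshold: float = 0.8) -> bool:
    """Send low-confidence proposals to a human for confirmation."""
    if not 0.0 <= p.confidence <= 1.0:
        raise ValueError("confidence must be in [0.0, 1.0]")
    return p.confidence < threshold


# Usage: a hypothetical annotator proposes a sense with moderate confidence.
p = Proposal("löschen", "#synset_loschen_3", "example-annotator", 0.55)
print(needs_review(p))  # True: a user should confirm this proposal
```

Keeping the agent and score on every proposal means a consumer can later decide for itself which annotations to trust, rather than relying on the threshold chosen at annotation time.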
>>>
>>>         4) Would 1-2 be consumed by an MT tool, or by other tools?
>>>
>>>         These can basically be consumed by language processing
>>>         tools, like MT, and by other language technology that needs
>>>         content or semantic information, for instance text analytics,
>>>         semantic search, etc. In localization chains, this
>>>         information can also be used by automatic or semi-automatic
>>>         processes (like selecting dictionaries for translation, or
>>>         selecting translators/revisers by subject area).
>>>
>>>         It could also be used by humans for translation or
>>>         post-editing in case of ambiguity or lack of context in the
>>>         content, but mostly by automatic systems.
>>>
>>>         I hope this helps.
>>>
>>>         Pedro
>>>
>>>         ------------------------------------------------------------------------
>>>
>>>         *From:* Felix Sasaki [mailto:fsasaki@w3.org]
>>>         *Sent:* Saturday, 2 June 2012 14:13
>>>         *To:* Tadej Stajner; pedro.diez
>>>         *CC:* public-multilingualweb-lt@w3.org
>>>         *Subject:* Re: [ACTION-94]: go and find examples of concept
>>>         ontology (semantic features of terms as opposed to domain
>>>         type ontologies)
>>>
>>>         Hi Tadej, Pedro, all,
>>>
>>>         this looks like a great chain of producing and consuming
>>>         metadata.
>>>
>>>         Apologies if this was explained during last week's call or
>>>         before, but could you clarify the following:
>>>
>>>         1) What would the actual HTML markup produced in the original
>>>         text annotation use case look like?
>>>
>>>         2) What would the markup in this use case look like?
>>>
>>>         3) Would it be produced also by an automatic text annotation
>>>         tool?
>>>
>>>         4) Would 1-2 be consumed by an MT tool, or by other tools?
>>>
>>>         Thanks again,
>>>
>>>         Felix
>>>
>>>         2012/5/31 Tadej Stajner <tadej.stajner@ijs.si>
>>>
>>>         Hi Pedro,
>>>         thanks for the excellent explanation. If I understand you
>>>         correctly, a sufficient example for this use case would be
>>>         the annotation of individual words with the synset URI of the
>>>         appropriate wordnet? If so, then I believe this route can be
>>>         practical: given the available tools, I think linking to the
>>>         synset is more practical than expressing semantic features of
>>>         the word.
>>>
>>>         Enrycher can do automatic all-word disambiguation into the
>>>         English WordNet, whereas we don't have anything specific in
>>>         place for semantic features (which I suspect also holds for
>>>         other text analytics providers).
>>>
>>>         I'm also in favor of prescribing wordnets for individual
>>>         languages as valid selector domains as you suggest in option
>>>         1). That would make validation easier since we have a known
>>>         domain.
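As a rough illustration of how a known set of selector domains simplifies validation, and of the compromise of allowing local non-URI selectors, the following Python sketch accepts either an absolute URI or a "#local-id" selector whose semantic network is on an allow-list. ALLOWED_NETWORKS, is_absolute_uri and validate_selector are hypothetical names; only the Tübingen URI from the made-up example is listed.

```python
from urllib.parse import urlparse

# Hypothetical allow-list of semantic networks whose local IDs are accepted.
ALLOWED_NETWORKS = {
    "http://www.sfs.uni-tuebingen.de/lsd/index.shtml",
}


def is_absolute_uri(value: str) -> bool:
    """True for values such as 'http://host/path' that have a scheme and a host."""
    parts = urlparse(value)
    return bool(parts.scheme) and bool(parts.netloc)


def validate_selector(network_ref: str, selector: str) -> bool:
    """Accept an absolute URI, or a '#local-id' belonging to an allowed network."""
    if is_absolute_uri(selector):
        return True
    if selector.startswith("#") and len(selector) > 1:
        return network_ref in ALLOWED_NETWORKS
    return False


print(validate_selector("http://www.sfs.uni-tuebingen.de/lsd/index.shtml", "#synset_loschen_3"))  # True
print(validate_selector("http://example.org/unknown-network", "#synset_loschen_3"))  # False
```

A URI selector is accepted regardless of the network because it can, in principle, be dereferenced; a local ID is only meaningful inside a network the validator knows about.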
>>>
>>>         @All: Can we come up with a second implementation for this
>>>         use case, preferably a consumer?
>>>
>>>         -- Tadej
>>>
>>>
>>>
>>>
>>>         On 5/29/2012 2:00 PM, Pedro L. Díez Orzas wrote:
>>>
>>>         Dear all,
>>>
>>>         Sorry for the delay. I tried to contact some people who I think
>>>         can contribute to this, but they are not available at the moment.
>>>
>>>         Before providing an example, so that we can all consider
>>>         whether it is worthwhile to keep the “semantic selector”
>>>         attribute in the consolidated “Disambiguation” data category,
>>>         I would like to make a couple of observations:
>>>
>>>          1. We will probably not have any implementation in the short
>>>             term, but there are, for example, a few semantic networks
>>>             available on the web (see
>>>             http://www.globalwordnet.org/gwa/wordnet_table.html)
>>>             that could be mapped using semantic selectors. See, for
>>>             example, the well-known Princeton WordNet online at
>>>             http://wordnetweb.princeton.edu/perl/webwn.
>>>          2. The W3C work on SKOS (Simple Knowledge Organization
>>>             System Reference) may be dealing with similar issues.
>>>
>>>         The “semantic selector” allows finer lexical distinctions (for
>>>         single words or multi-word expressions) than a “domain” or an
>>>         ontology like NERD. Also, its denotation differs from that of
>>>         the “concept reference”, especially for parts of speech such
>>>         as verbs.
>>>
>>>         Within the same domain, even when referring to very similar
>>>         concepts, languages have semantic differences. Depending on
>>>         the semantic theory used, each tries to capture these
>>>         differences by means of different systems (semantic
>>>         features, semantic primitives, semantic nodes (in semantic
>>>         networks), other semantic representations). An example could
>>>         be the German verb “löschen”, which can take different
>>>         meanings in different contexts; one can try to capture these
>>>         meanings using different selectors in the different systems.
>>>
>>>         löschen -> clear       (some bits)
>>>                 -> delete      (files)
>>>                 -> cancel      (programs)
>>>                 -> erase       (a scratchpad)
>>>                 -> extinguish  (a fire)
>>>
>>>         Other possible translations of the verb “löschen” are:
>>>
>>>         English       German equivalents
>>>         ------------  --------------------------------------------------
>>>         delete        löschen, streichen, tilgen, ausstreichen, herausstreichen
>>>         clear         löschen, klären, klarmachen, leeren, räumen, säubern
>>>         erase         löschen, auslöschen, tilgen, ausradieren, radieren, abwischen
>>>         extinguish    löschen, auslöschen, zerstören
>>>         quench        löschen, stillen, abschrecken, dämpfen
>>>         put out       löschen, bringen, ausmachen, ausschalten, treiben, verstimmen
>>>         unload        entladen, abladen, ausladen, löschen, abstoßen, abwälzen
>>>         discharge     entladen, erfüllen, entlassen, entlasten, löschen, ausstoßen
>>>         wipe out      auslöschen, löschen, ausrotten, tilgen, zunichte machen, auswischen
>>>         slake         stillen, löschen
>>>         close         schließen, verschließen, abschließen, sperren, zumachen, löschen
>>>         blot          löschen, abtupfen, klecksen, beklecksen, sich unmöglich machen, sich verderben
>>>         turn off      ausschalten, abbiegen, abstellen, abdrehen, einbiegen, löschen
>>>         blow out      auspusten, löschen, aufblasen, aufblähen, aufbauschen, platzen
>>>         zap           abknallen, düsen, umschalten, löschen, töten, kaputtmachen
>>>         redeem        einlösen, erlösen, zurückkaufen, tilgen, retten, löschen
>>>         pay off       auszahlen, bezahlen, tilgen, abzahlen, abbezahlen, löschen
>>>         switch out    löschen
>>>         unship        ausladen, entladen, abnehmen, löschen
>>>         souse         eintauchen, durchtränken, löschen, nass machen
>>>         rub off       abreiben, abgehen, abwetzen, ausradieren, abscheuern, löschen
>>>         strike off    löschen
>>>         land          landen, an Land gehen, kriegen, an Land ziehen, aufsetzen, löschen
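The practical value of a selector for MT can be shown with the five senses of “löschen” listed earlier: without a selector, every sense is a candidate translation; with one, the choice is fixed. In this Python sketch the translations and contexts are taken from that list, while the sense IDs and the function name are hypothetical stand-ins for identifiers a real semantic network would define.

```python
from typing import Dict, List, Optional, Tuple

# Senses of "löschen" from the list above: English translation and context.
# The sense IDs are hypothetical; a real semantic network defines its own.
LOESCHEN_SENSES: Dict[str, Tuple[str, str]] = {
    "loeschen_clear": ("clear", "some bits"),
    "loeschen_delete": ("delete", "files"),
    "loeschen_cancel": ("cancel", "programs"),
    "loeschen_erase": ("erase", "a scratchpad"),
    "loeschen_extinguish": ("extinguish", "a fire"),
}


def candidate_translations(selector: Optional[str] = None) -> List[str]:
    """Without a selector every sense is a candidate; with one, the choice is fixed."""
    if selector is None:
        return [english for english, _context in LOESCHEN_SENSES.values()]
    english, _context = LOESCHEN_SENSES[selector]
    return [english]


print(candidate_translations())                       # ['clear', 'delete', 'cancel', 'erase', 'extinguish']
print(candidate_translations("loeschen_extinguish"))  # ['extinguish']
```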
>>>
>>>         Based on this, the consolidation of the
>>>         disambiguation/namedEntity data categories under
>>>         “Terminology”
>>>         http://www.w3.org/International/multilingualweb/lt/wiki/Requirements#disambiguation
>>>         could be as follows. It is intended to cover operational
>>>         URI or XPath pointers to what are currently the three most
>>>         important kinds of semantic resources: conceptual (ontologies),
>>>         semantic (semantic networks or lexical databases) and
>>>         terminological (glossaries and terminological resources).
>>>         Ontologies are used for both general lexicon and terminology,
>>>         semantic networks to represent general vocabulary (lexicon),
>>>         and terminological resources to represent specialized
>>>         vocabulary.
>>>
>>>         *disambiguation*
>>>
>>>         Includes data to be used by MT systems in disambiguating
>>>         difficult content
>>>
>>>         *Data model*
>>>
>>>           * concept reference: points to a *concept in an ontology*
>>>             that this fragment of text represents. May be a URI or
>>>             an XPath pointer.
>>>           * semantic selector: points to a *meaning in a semantic
>>>             network* that this fragment of text represents. May be
>>>             a URI or an XPath pointer.
>>>           * terminology reference: points to *a term in a
>>>             terminological resource* that this fragment of text
>>>             represents. May be a URI or an XPath pointer.
>>>           * equivalent translation: expressions of that concept in
>>>             other languages, for example for training MT systems.
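One way to read the proposed data model is as a record with three optional pointers plus a set of equivalent translations. The Python sketch below assumes that reading; the class and field names, the use of plain strings for URI or XPath pointers, and keying translations by language code are illustrative assumptions, not part of the proposal.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Disambiguation:
    """Hypothetical record mirroring the four items of the proposed data model."""
    # Each pointer may be a URI or an XPath expression, stored here as a string.
    concept_ref: Optional[str] = None        # concept in an ontology
    semantic_selector: Optional[str] = None  # meaning in a semantic network
    terminology_ref: Optional[str] = None    # term in a terminological resource
    # Equivalent translations, keyed by language code (an assumption).
    equivalent_translations: Dict[str, List[str]] = field(default_factory=dict)

    def pointers(self) -> Dict[str, str]:
        """Return only the pointers that are actually set."""
        candidates = {
            "concept reference": self.concept_ref,
            "semantic selector": self.semantic_selector,
            "terminology reference": self.terminology_ref,
        }
        return {name: value for name, value in candidates.items() if value is not None}


# Usage: annotate "löschen" with only a semantic selector.
d = Disambiguation(semantic_selector="#synset_loschen_3")
print(d.pointers())  # {'semantic selector': '#synset_loschen_3'}
```

Making every pointer optional reflects that a given fragment may be linked to an ontology, a semantic network, a terminological resource, or any combination of them.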
>>>         Also, I would keep *textAnalysisAnnotation* as a separate data
>>>         category, since its purpose is quite different.
>>>
>>>         Anyway, if we decide not to include “semantic selector” now,
>>>         it could perhaps be considered for future versions or
>>>         handled in liaison with other groups.
>>>
>>>         I hope it helps,
>>>
>>>         Pedro
>>>
>>>         *__________________________________*
>>>
>>>         *Pedro L. Díez Orzas*
>>>
>>>         *Executive President/CEO*
>>>
>>>         *Linguaserve Internacionalización de Servicios, S.A.*
>>>
>>>         *Tel.: +34 91 761 64 60
>>>         Fax: +34 91 542 89 28*
>>>
>>>         *E-mail: pedro.diez@linguaserve.com*
>>>
>>>         *www.linguaserve.com*
>>>
>>>         "According to the provisions set forth in articles 21 and 22
>>>         of Law 34/2002 of July 11 regarding Information Society and
>>>         eCommerce Services, we will store and use your personal data
>>>         with the sole purpose of marketing the products and services
>>>         offered by LINGUASERVE INTERNACIONALIZACIÓN DE SERVICIOS,
>>>         S.A. If you do not wish your personal data to be stored and
>>>         handled, or you do not wish to receive further information
>>>         regarding products and services offered by our company,
>>>         please e-mail us at clients@linguaserve.com. Your request will
>>>         be processed immediately."
>>>
>>>         *____________________________________*
>>>
>>>
>>>
>>>         -- 
>>>         Felix Sasaki
>>>
>>>         DFKI / W3C Fellow
>>>
>>
>>
>>
>>
>>     -- 
>>     Felix Sasaki
>>     DFKI / W3C Fellow
>>
>
>
>
>
> -- 
> Felix Sasaki
> DFKI / W3C Fellow
>
Received on Thursday, 7 June 2012 21:58:48 UTC