issue-68 (Re: Comment on ITS 2.0 WD-its20-20121206 - Disambiguation (and term)) from Felix Sasaki on 2013-01-11 (public-multilingualweb-lt-comments@w3.org from January 2013)

From: Felix Sasaki <fsasaki@w3.org>
Date: Fri, 11 Jan 2013 18:16:48 +0100
To: "public-multilingualweb-lt-comments@w3.org" <public-multilingualweb-lt-comments@w3.org>
Message-ID: <50F04900.9010606@w3.org>
All (co-chair hat on),

thank you for this discussion. General remark: as explained at
http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2013Jan/0045.html
please add the issue number to the mail subject. Otherwise it will be 
very hard to track discussions.

It would now be interesting to hear from the implementers: according to
http://tinyurl.com/its20-testsuite-dashboard
Enlaso, Tilde and UL will implement terminology. As I understand it, UL 
will make a wrapper around the Enlaso / Okapi engine, correct?
Now, for disambiguation we have Enlaso, JSI, Moravia and UL. Here I 
*think* that Moravia and UL will basically have an Okapi wrapper. Please 
correct me if I'm wrong.

This leaves us with the following situation:
- two implementations for terminology (Enlaso and Tilde)
- two for disambiguation (Enlaso and JSI)

So Mārcis, Tadej, Yves - what do you think about this proposal?

I'm also asking this because I have to remind people about the W3C process:

(W3C process hat on) We cannot just say "we don't like a comment". There 
need to be good reasons to reject it. The argumentation below can support 
the rejection, but the rejection is rather weak if implementers don't 
have an opinion or would even say "I would do the change". So please 
express your thoughts in this thread.

Best,

Felix

On 11.01.13 14:07, Jörg Schütz wrote:
> +1
>
> Hi Christian, David, and all,
>
> I would have similar arguments for keeping term and disambiguation
> separate, although they are related. There are several use cases in
> the wild that need this kind of separation, e.g. terminology-based
> workflows in a particular supply chain vs. data stream analyses that
> prepare the data for further treatment, such as a machine translation
> application (vocabulary support and training/tuning life cycles).
>
> Another topic is the discussion of the ISOCat elements, which to some
> extent would force applications to adopt an NLP standard that might
> not be appropriate for a given application scenario, e.g. one that
> does not use NLP technologies at all. Therefore, I would also recommend
> that we not talk about bringing ITS closer to NLP, because ITS
> should remain open and deployable for different language processing
> strategies.
>
> Nevertheless, thanks a lot for raising these concerns.
>
> All the best -- Jörg
>
> On Jan 11, 2013, at 12:22 (CET), Dr. David Filip wrote:
>> Dear Christian, thanks for this insightful comment.
>> I agree that the disambiguation category is one of the most important
>> additions: it can expand the usage of the standard and make it more
>> useful across technologies and industries.
>>
>> The group has discussed this, and it is clear that disambiguation and
>> term are somewhat related categories. However, we have not considered
>> deprecating the ITS 1.0 term category, at least not explicitly.
>>
>> I believe this follows from the group's chartered principles
>> [paraphrasing]:
>> 1) Do not break 1.0.
>> 2) Keep the 1.0 principle of independent categories that can also be
>> implemented independently.
>>
>> I believe that your proposal to fuse term and disambiguation is in line
>> with 2), in the sense of turning two seemingly interdependent categories
>> into one fully self-contained and independent category, but it would
>> violate 1).
>>
>> But even if we did not care about 1), I believe that the relationship
>> between term and disambiguation is reasonably loose, i.e. not a
>> hard formal interdependency that would warrant or even mandate normative
>> handling. It can and should therefore be handled in non-normative
>> material such as a best practice document, while we keep both
>> categories, because they have discernible use cases and can still be
>> implemented independently.
>>
>> A)
>> A user who uses both a terminology management system and a text
>> analytics system for disambiguation can reasonably combine them, and
>> the combination can be driven by organization-specific, process-driven
>> considerations. For instance, they can harvest spans marked with
>> disambiguation as term candidates for their terminology database, and
>> these can be encoded as terms next time if, e.g., a terminologist
>> approves them.
>>
>> B)
>> People who use only text analytics input do not need to care about term.
>>
>> C)
>> People who use terminology management as their only source do not need
>> to bother with the complexities of the disambiguation category.
>>
>> To summarize:
>> While many ITS categories, and prominently term and disambiguation, are
>> informally semantically related, it seems important to keep a reasonable
>> and manageable granularity of the independently implementable 
>> categories.
>>
>> I hope this helps explain the group's motivation for keeping the
>> categories apart.
>> Please let me know
>> Rgds
>> dF
>>
>> Dr. David Filip
>> =======================
>> LRC | CNGL | LT-Web | CSIS
>> University of Limerick, Ireland
>> telephone: +353-6120-2781
>> cellphone: +353-86-0222-158
>> facsimile: +353-6120-2734
>> mailto: david.filip@ul.ie
>>
>>
>> On Thu, Jan 10, 2013 at 9:14 AM, Lieske, Christian
>> <christian.lieske@sap.com> wrote:
>>
>>     Hi,
>>
>>     Please find below comments/observations/questions/ideas concerning
>>     the ITS 2.0 Working Draft dated December 6, 2012
>>     (http://www.w3.org/TR/2012/WD-its20-20121206/). Please feel free to
>>     contact me for clarification if anything is unclear.
>>
>>     To me, the section on the “disambiguation” data category is one of
>>     the most important in the draft. From my point of view, ITS 2.0
>>     moves ITS 1.0 closer to Natural Language Processing (NLP), and
>>     “disambiguation” is related to NLP in various ways. Thus, making
>>     “disambiguation” powerful and easy to use (e.g. via a clear
>>     distinction from other data categories, as well as
>>     conceptualizations and wording that are not known only within
>>     linguistics) seems important to me.
>>
>>     While looking at “disambiguation” from this angle, I started to
>>     wonder whether it could benefit from additions/modifications. I
>>     apologize in advance if replying to this comment requires
>>     summarizing discussions that have presumably already taken place.
>>
>>     Here are my observations/questions/ideas:
>>
>>     a. I sense that ITS users will have difficulty deciding when
>>     to use “term” and when to use “disambiguation” (the note in the
>>     Working Draft indicates this).
>>
>>     b. Annotation of known terms, generation of so-called “term
>>     candidates”, (named) entity recognition, and other automation can be
>>     subsumed under the heading “(automated) text analysis”.
>>
>>     I am thus wondering whether the following would be worth considering:
>>
>>     1. Enhance the current “disambiguation” so that the current
>>        “term” can also be covered
>>     2. Deprecate “term”
>>     3. Revise some of the terminology used in the spec (e.g.
>>        “disambiguation”, “disambigGranularity”)
>>
>>     An example use of a revised “disambiguation” (and a deprecated
>>     “term”), partially inspired by ISOCat (see http://www.isocat.org/),
>>     is the following:
>>
>>     Data category name: (automated) text analysis annotation (atan/tan).
>>     Using “text analysis annotation” would have the advantage that even
>>     manual work (e.g. “promoting a term candidate to a term”) could be
>>     covered.
>>
>>     Data category “qualifier” (currently “disambigGranularity”):
>>     atan-type or tan-type
>>
>>     Values for “qualifier”: lexical, term, termCandidate,
>>     ontological-class, ontological-entity; possibly even URIs such as
>>     http://www.isocat.org/datcat/DC-2275. This would allow rather
>>     fine-grained and, under certain provisions, standard-conformant (ISO
>>     12620; see http://www.ttt.org/clsframe/datcats.html) annotation.
>>
>>     Example:
>>
>>         <span
>>             its-tan-confidence="0.7"
>>             its-tan-class-ref="http://nerd.eurecom.fr/ontology#Place"
>>             its-tan-ident-ref="http://dbpedia.org/resource/Dublin"
>>             its-tan-type="http://www.isocat.org/datcat/DC-2275">Dublin</span>
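A minimal Python sketch of how a consuming tool might read an annotation like the one in the example. It assumes the hypothetical its-tan-* attribute names from the proposal (they are not defined in the ITS 2.0 draft) and uses only the standard library's html.parser; the class name TanAnnotationReader is invented for illustration.

```python
from html.parser import HTMLParser

# Hypothetical attribute prefix taken from the proposal; not part of ITS 2.0.
TAN_PREFIX = "its-tan-"


class TanAnnotationReader(HTMLParser):
    """Collects <span> elements that carry the proposed its-tan-* attributes."""

    def __init__(self):
        super().__init__()
        self.annotations = []  # one dict per annotated span, in document order
        self._stack = []       # one entry per open <span>: a dict, or None if unannotated

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        # Strip the prefix so "its-tan-ident-ref" becomes "ident-ref".
        tan = {name[len(TAN_PREFIX):]: value
               for name, value in attrs if name.startswith(TAN_PREFIX)}
        if tan:
            tan["text"] = ""
        self._stack.append(tan or None)

    def handle_data(self, data):
        # Text counts toward every annotated span that is currently open.
        for ann in self._stack:
            if ann is not None:
                ann["text"] += data

    def handle_endtag(self, tag):
        if tag != "span" or not self._stack:
            return
        ann = self._stack.pop()
        if ann is None:
            return
        # The confidence arrives as a string attribute; convert it when present.
        if ann.get("confidence") is not None:
            ann["confidence"] = float(ann["confidence"])
        self.annotations.append(ann)


EXAMPLE = ('<span its-tan-confidence="0.7" '
           'its-tan-class-ref="http://nerd.eurecom.fr/ontology#Place" '
           'its-tan-ident-ref="http://dbpedia.org/resource/Dublin" '
           'its-tan-type="http://www.isocat.org/datcat/DC-2275">Dublin</span>')

reader = TanAnnotationReader()
reader.feed(EXAMPLE)
reader.close()
print(reader.annotations[0]["ident-ref"])  # http://dbpedia.org/resource/Dublin
```

Keeping a stack of open spans, rather than a single "current" annotation, means that an unannotated span nested inside an annotated one does not end the outer annotation early.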
>>
>>     Cheers,
>>
>>     Christian
>>
>
Received on Friday, 11 January 2013 17:17:14 UTC