Re: [Issue-41, issue-42, Action-190, action-194] Draft a section about mtConfidence, based on the discussion from Felix Sasaki on 2012-08-23 (public-multilingualweb-lt@w3.org from August 2012)

From: Felix Sasaki <fsasaki@w3.org>
Date: Thu, 23 Aug 2012 13:45:22 +0200
To: Tadej Stajner <tadej.stajner@ijs.si>
Cc: public-multilingualweb-lt@w3.org
Message-ID: <CAL58czqGNpFs3jxPMcON4yO-fxhvnr1T=ddkBOGiO-u+mGtBqw@mail.gmail.com>
That is an issue, indeed. I assume that the consolidation will not lead to
100% the same definitions. For mtconfidence and taa, I see more similarity.

Felix

2012/8/23 Tadej Stajner <tadej.stajner@ijs.si>

>  Hi,
> following up on the idea of consolidating textAnalysisAnnotation with
> something else, like qualityReviewAgent, or provenance: intuitively, it
> should fit nicely, but I would point out that textAnalysisAnnotation talks
> about other annotations in the document ('this <its:disambiguation> was
> produced by that tool'), not the document's content ('the quality of that
> translation is good'). In a sense, it's meta-metadata. Is this difference in
> targets an issue for consolidation?
>
> -- Tadej
>
>
> On 8/23/2012 9:41 AM, Felix Sasaki wrote:
>
> Hi Dave, we discussed this on the call during your absence. The general
> opinion was that the information needed for mtconfidence, quality and
> disambiguation is very similar and very specific. I had a brief look at the
> drafts for the three data categories and came to that conclusion, hence the
> issue-42 (drafted before the call).
>
>  We also discussed whether this would be a separate data category, or
> whether we need to interrelate data categories. Instead of going down this path,
> the rough consensus was that it is OK if the three data categories convey
> the same information - we should just try to harmonize the description of
> aspects like score or tool identification. Hence my action-194 related to
> issue-42, to come up with such a harmonization proposal. I am not sure yet
> if I'll get to it before the call.
>
>  Best,
>
>  Felix
>
> On Thursday, 23 August 2012, Dave Lewis wrote:
>
>  Felix,
>> I've only now got to your post on ISSUE-42 :
>>
>> http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Aug/0149.html
>>
>> I think the combination of an mtconfidence score and translationAgent that I
>> suggest below amounts to pretty much the same thing, just arrived at via
>> a different route. Was that the direction you were heading in?
>>
>> The translationReviewAgent could work similarly with quality, and we
>> could add a sourceReviewAgent or terminologyReviewAgent, or generalise
>> translationReviewAgent or qualityReviewAgent to address
>> textAnalysisAnnotation.
>>
>> One point here: as different data categories are separately
>> conformant, will specifications of how they are used in combination
>> essentially have to be non-normative, or would we need a distinct normative
>> section on data categories in combination?
>>
>> cheers,
>> Dave
>>
>> On 23/08/2012 01:56, Dave Lewis wrote:
>>
>>> Yves, David,
>>> Apologies for coming to this thread a bit late. You've already pointed out
>>> that the score needs to be mostly local, i.e. per segment as passed to an
>>> MT service, while the definition of providers/engine would be more likely
>>> global, i.e. the same engine would be used for most segments in a document.
>>> We also have distinct use cases where only the score is relevant or where
>>> the score and the service is needed. So it seems that two data categories
>>> would suit, one for score and one for identifying the engine.
>>>
>>> We do however already have a way of identifying an MT service that has been
>>> used on a document or its segments, in the form of translationAgent (see
>>> call for consensus
>>> http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Jul/0256.html
>>> ).
>>>
>>> I propose therefore that translationAgent, used in conjunction with an
>>> mtConfidence score data category that has just one score attribute, would
>>> cover the different use cases while also supporting the existing
>>> use cases outlined for translationAgent.
>>>
>>> Note translationAgent allows multiple agents to be specified, but
>>> doesn't concern itself with distinguishing the types of agent, e.g.
>>> provider/organisation from software/engine, though both are possible. The
>>> form of the ID or the result of dereferencing it is assumed to address
>>> this, given the lack of common naming schemes for organisations or engines.
>>>
>>> I'd be happy anyway to include the example IDs from the mtConfidence engine
>>> attribute in translationAgent, as these are sensible ideas, and
>>> something we could address more comprehensively as best practice next year.
>>>
>>> cheers,
>>> Dave
>>>
>>>
>>>
>>> On 09/08/2012 13:56, Yves Savourel wrote:
>>>
>>>>> The end user who does not understand this MUST NOT be exposed to values
>>>>> coming from mixed engines/producers.
>>>>> In other words it is OK to DISPLAY SCORE ONLY TO THE END USER
>>>>> if you have ensured up the stream that they DO come from the same
>>>>> producer AND engine.
>>>>> Again not sure how to cut this with defaults, as the defaults would
>>>>> collapse filtering.
>>>>>
>>>> Again all this applies only when you have translations from different
>>>> providers/engines for the same text. That is only one part of the scenarios.
>>>>
>>>>   In any case, the bottom line is that making a local attribute's
>>>> presence required or not based on whether a global one is present or not is
>>>> not easily implementable. It could be defined in a linked rule file for
>>>> example.
>>>>
>>>> What I think you are really trying to do is make sure a value is defined for
>>>> mtProducer and mtEngine. I don't agree that one is always needed, but that is
>>>> a different topic (as discussed above). But if we decide one is needed, we
>>>> can just state that one must be defined. It doesn't make sense to me to try
>>>> to define how or where it should be defined: the inheritance takes care of
>>>> that.
>>>>
>>>
>>>
>>>
>>
>>
>
>


-- 
Felix Sasaki
DFKI / W3C Fellow
Received on Thursday, 23 August 2012 11:45:51 UTC