Re: [Action-126] David to come up with a proposal for mtConfidence

Hi David, all,

2012/7/31 Dr. David Filip <David.Filip@ul.ie>

> Hi all, I was trying to engage a PhD student here at LRC to produce a
> proposal for this data category, but I failed.
>
> Nevertheless, here is my thinking on the category; maybe someone else
> (Declan?) could take it to the call for consensus stage.
>

co-chair hat on: If there is no strong support for this, I would propose to
put this on hold until we have finished all other data categories. As you
wrote in your agenda,

http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Jul/0311.html

we have various data category proposals on the table that are not
finished: special requirements, named entity, quality, ... I will send a
proposal for the time until last call later today, which will show that we
need to finish these and the various "ed. notes" in
http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html
I think we need the time and your input to work on these.

I very much hope for your understanding - let's discuss this also during
the call on Thursday,

Felix


>
> I believe that mtConfidence is being produced in some form or other by all
> major current MT systems. As discussed in Dublin, the issue is that these
> confidence scores are not really comparable between engines: not only
> between Bing and Google, or Matrex, but not even between different
> language-pair engines or domain-specific engines based on the same
> general technology.
>
> Nevertheless, there are prospects for standardizing based on cognitive
> effort in post-editing etc. Even knowing that the usability of confidence
> scores is limited, there are valid production-consumption scenarios in the
> content lifecycle.
> If a client, service provider, translator, or reviewer works repeatedly with
> the same engine, they will find even the engine's self-evaluation useful.
>
> Further to this, there is potential for connecting this with automated and
> human MT evaluation scores, so I'd propose to generalize as mtQuality
> [meaning raw MT quality, NOT talking about levels of PE] that would subsume
> mtConfidence etc., as seen below.
>
> My proposal for the data model, based on the above:
>
> -mtQuality
> --mtConfidence
> ---mtProducer [string identifying producer Bing, DCU-Matrex etc.]
> ----mtEngine [string identifying the engine on one of the above platforms,
> can be potentially quite structured, pair domain etc.]
> -----mtConfidenceScore [0-100% or interval 0-1]
> --mtAutomatedMetrics
> ---mtScoreType [METEOR, TER, BLEU, Levenshtein distance etc.]
> ----mtAutomatedMetricsScore [0-100% or interval 0-1]
> --mtHumanMetrics
> ---mtHumanMetricsScale [{4,3,2,1,0},{0,1,2,3,4},{3,2,1,0} etc.]
> ----mtHumanMetricsValue [one of the above values depending on scale]
>
> mtQuality is an optional attribute of a machine text segment (as in
> Unicode or localization segmentations). I do not think this is useful on
> higher or lower levels.
>
> mtQuality must be specified as mtConfidence XOR mtAutomatedMetrics
> XOR mtHumanMetrics
>
> Then comes the compulsory specification of the actual value (possibly
> preceded by value change if more options exist).
>
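
To make the proposed model concrete, here is a minimal Python sketch of the hierarchy and the XOR rule quoted above. It is only an illustration, not part of the proposal: the class and field names are hypothetical renderings of the mt* names, scores are assumed to be normalised to the 0-1 interval, and XOR is read as "exactly one of the three".

```python
from dataclasses import dataclass
from typing import Optional, Tuple


def _check_unit_interval(name: str, score: float) -> None:
    # Assumption: percentages are normalised to the 0-1 interval beforehand.
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"{name} must lie in [0, 1], got {score}")


@dataclass(frozen=True)
class MtConfidence:
    producer: str  # mtProducer, e.g. "Bing" or "DCU-Matrex"
    engine: str    # mtEngine, free-form (language pair, domain, ...)
    score: float   # mtConfidenceScore

    def __post_init__(self) -> None:
        _check_unit_interval("mtConfidenceScore", self.score)


@dataclass(frozen=True)
class MtAutomatedMetrics:
    score_type: str  # mtScoreType, e.g. "METEOR", "TER", "BLEU"
    score: float     # mtAutomatedMetricsScore

    def __post_init__(self) -> None:
        _check_unit_interval("mtAutomatedMetricsScore", self.score)


@dataclass(frozen=True)
class MtHumanMetrics:
    scale: Tuple[int, ...]  # mtHumanMetricsScale, e.g. (4, 3, 2, 1, 0)
    value: int              # mtHumanMetricsValue, must be one of the scale values

    def __post_init__(self) -> None:
        if self.value not in self.scale:
            raise ValueError(f"value {self.value} is not on scale {self.scale}")


@dataclass(frozen=True)
class MtQuality:
    # Exactly one of the three branches may be set (the XOR rule).
    confidence: Optional[MtConfidence] = None
    automated: Optional[MtAutomatedMetrics] = None
    human: Optional[MtHumanMetrics] = None

    def __post_init__(self) -> None:
        branches = (self.confidence, self.automated, self.human)
        if sum(b is not None for b in branches) != 1:
            raise ValueError(
                "mtQuality needs exactly one of mtConfidence, "
                "mtAutomatedMetrics, mtHumanMetrics"
            )


# Example: a segment annotated with an engine's self-reported confidence.
segment_quality = MtQuality(confidence=MtConfidence("Bing", "en-de", 0.8))
```

Rejecting a second branch at construction time keeps a single annotation unambiguous about which kind of score it carries; whether several annotations per segment should be allowed is a separate question.
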
> Cheers
> dF
>
>
> Dr. David Filip
> =======================
> LRC | CNGL | LT-Web | CSIS
> University of Limerick, Ireland
> telephone: +353-6120-2781
> cellphone: +353-86-0222-158
> facsimile: +353-6120-2734
> mailto: david.filip@ul.ie
>
>


-- 
Felix Sasaki
DFKI / W3C Fellow

Received on Wednesday, 1 August 2012 07:50:59 UTC