Re: [Issue-41][Action-190] Draft a section about mtConfidence, based on the discussion

Hi David, Yves, all,

2012/8/9 Yves Savourel <ysavourel@enlaso.com>

> Thanks for the explanations David.
>
>
> > the XOR just includes slightly different header
> > as another example inserted in the example
>
> IMO examples should be straight real files, that we can actually process.
>

+1. Note that the examples are the basis of our test suite: in creating
the spec we follow a literate programming approach that generates, from a
single source, the HTML for human readers, the schemas (for the XML world)
and the input for the test suite. That helps consistency, but it also means
we need real examples now; otherwise they are useless for testing.


> If we want to show two ways to do something we should use two separate
> examples.
>
>
> > The value "en-t-cs" follow the t extension
> > syntax from BCP 47, so it means English transformed
> > from Czech. I am aware that the usual MT pair
> > convention is the other way round (I use this
> > convention in the private string examples), but I
> > thought that the t extension would find valid usage here..
>
> I see that now. So none of my notes about text that should be translated
> stand.
>
> This said, I'm not sure using the t extension is a good way to identify an
> 'engine'. In addition to being counterintuitive, we don't really intend to
> standardize that value, do we? So I suppose one example can use it, but
> we could maybe have several other examples.
>
>
> > I do not understand this part at all. MT candidate translations
> > are always 100% matches in the terms of TM matching.
> > The self-reported confidence expresses what might be the
> > chance that the 100% match is accurate/usable.. I do not
> > think we need a combined value here. And this is also a
> > reason why XLIFF would need a separate mechanism for
> > reporting the confidence, we could not overload the normal
> > match rate..
>
> I guess the point I was making was that Bing doesn't provide 0-100%
> confidence score. So if we use this as an example we should explain how we
> get it. Or use another example.
>
>
> >> I'm not understanding why it's there. I think you
> >> mean that the global rule must not use that attribute.
> >> Then just don't say anything. If it's not listed it
> >> cannot be used (it's just not an attribute of
> >> <mtConfidenceRule>)
> >
> > It is true, still in my experience redundancy serves
> > the purpose of absolute clarity
>
> As a developer I'm utterly confused to see a mention of an attribute that
> does not exist in that element.
>
>
> > Well, the whole point is that the score is
> > worth nothing at all if you do not know what
> > the producer and engine are. I first thought that
> > GLOBAL does not make sense at all for confidence.
> > But later reintroduced GLOBAL for producer and
> > engine, as they are likely to be the same throughout
> > the whole document in many scenarios, so that
> > you can save lot of space not specifying them for
> > each and every segment. So mtProducer and mtEngine
> > are only optional at the local level if they
> > have been specified at the global level
>
> So your real goal is to have a value set for mtProducer and mtEngine when
> we have a local mtConfidenceScore. You don't really care how or where it is
> set, right?
> Then we should have defaults for those values. Validating if those
> attribute should be defined locally or not based on whether they are
> defined at a higher level is going to be very difficult to implement.
>

+1
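
To illustrate what default values would mean for an implementer, here is a
minimal Python sketch. The function name, the dictionary layout and the
"unknown" fallback are assumptions made for illustration only; none of them
comes from the draft. The point is that a consumer resolves the values
without having to validate where they were declared.

```python
# Hypothetical fallback used when neither level specifies a value.
DEFAULT_VALUE = "unknown"


def resolve_mt_info(local, global_rule=None):
    """Return producer, engine and score for one segment.

    local       -- attributes found on the segment itself (dict)
    global_rule -- attributes from a document-level rule, if any (dict)
    """
    global_rule = global_rule or {}
    resolved = {}
    for name in ("mtProducer", "mtEngine"):
        # Local value wins, then the global value, then the default.
        resolved[name] = local.get(name, global_rule.get(name, DEFAULT_VALUE))
    # In this sketch the score is read from the segment only, never inherited.
    resolved["mtConfidenceScore"] = local.get("mtConfidenceScore")
    return resolved
```

With this shape, a segment that carries only a score still yields a complete
record, and no cross-level validation is needed.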


>
>
> > And there must be a processing requirement to
> > move them onto the segment level should the
> > header be separated during processing..
>
> Some formats may not allow those attribute elsewhere than the top of the
> document. But anyway, I don't think we can have such processing
> requirements for ITS. Default values solve all this as far as I can tell.
>
> I would also disagree that the score is worth nothing without the
> mtProvider and mtEngine values. Actually, in many scenarios knowing the
> provider or the engine means diddly squat to the end-users. They just care
> about the score.
>
>
> Cheers,
> -yves
>
>
>
>
Not sure where this fits into this thread, but: I was confused by the three
MAYs here:

[

    - This MAY be a privately structured string, e.g. Domain:IT-Pair:IT-JA,
      IT-JA:Medical, etc.
    - This MAY be a BCP 47 language tag WITH t-extension, e.g. ja-t-it
      for an Italian to Japanese MT engine
    - This MAY specify a Domain as per the 6.9 Domain [link to 6.9] data
      category

]


RFC 2119 MAY expresses something which is beyond our (=spec writers)
control. The examples of MAY you find in ITS 2.0 are all like that, e.g.

This specification defines two types of conformance: conformance of 1) ITS
markup declarations
<http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#conformance-product-schema>,
and conformance of 2) processing expectations for ITS Markup
<http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#conformance-product-processing-expectations>.
These conformance types complement each other. An implementation of this
specification MAY
<http://www.w3.org/International/multilingualweb/lt/drafts/its20/its20.html#rfc2119>
use them separately or together.

Also, MAY is used rarely, because in the end it doesn't really convey
implementation-relevant information; that is conveyed by MUST (NOT) or (if
really needed) SHOULD (NOT). I think the three MAYs above are really
examples, so I would propose creating examples that demonstrate what you
say, if possible with real-life data (see above).
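
As one possible starting point, here is a rough Python sketch of how a
consumer might read an engine value written as a BCP 47 tag with the t
extension, such as ja-t-it or en-t-cs. The function name is hypothetical,
and the sketch handles only the simple target-t-source shape. It does not
implement the full syntax of the extension (RFC 6497) and it does not handle
private-use subtags.

```python
def split_t_extension(tag):
    """Split a tag such as 'ja-t-it' into (target, source).

    Both parts are returned in lower case.
    Returns (tag, None) when no 't' singleton is present.
    """
    subtags = tag.lower().split("-")
    # The singleton 't' cannot be the first subtag.
    if "t" not in subtags[1:]:
        return tag.lower(), None
    idx = subtags.index("t", 1)
    target = "-".join(subtags[:idx])
    source_parts = []
    for sub in subtags[idx + 1:]:
        # Stop at the next singleton (start of another extension).
        if len(sub) == 1:
            break
        # Stop at a t-extension field key: one letter plus one digit.
        if len(sub) == 2 and sub[0].isalpha() and sub[1].isdigit():
            break
        source_parts.append(sub)
    return target, "-".join(source_parts) or None
```

For "ja-t-it" this returns ("ja", "it"), that is, Japanese transformed from
Italian, which matches the reading given in the draft text quoted above.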

Best,

Felix

-- 
Felix Sasaki
DFKI / W3C Fellow

Received on Thursday, 9 August 2012 04:34:44 UTC