RE: Tool info specification (Re: action-221 summary of overriding discussion) from Yves Savourel on 2012-09-21 (public-multilingualweb-lt@w3.org from September 2012)

From: Yves Savourel <ysavourel@enlaso.com>
Date: Fri, 21 Sep 2012 10:22:54 -0600
To: "'Felix Sasaki'" <fsasaki@w3.org>
CC: <public-multilingualweb-lt@w3.org>
Message-ID: <assp.0611008315.assp.06113398cb.007401cd9815$5b0d8400$11288c00$@com>
> I think the issues you mention can be resolved, but first we'd need 
> to agree on the following:
> ...
> Information about tools used for producing metadata (+content) 
> is orthogonal to data categories

Shockingly, some of us don't have PhDs and, not being completely familiar with academic lingo, may need a precise definition of what 'orthogonal' means in this context :)

I agree that the information about the tool used to annotate the document is unrelated to the information carried by the data category itself.
With one exception: somewhere in the data category information there should be a way to point to the tool information.

-yves



From: Felix Sasaki [mailto:fsasaki@w3.org] 
Sent: Friday, September 21, 2012 8:13 AM
To: Yves Savourel
Cc: public-multilingualweb-lt@w3.org
Subject: Re: Tool info specification (Re: action-221 summary of overriding discussion)

Hi Yves,
2012/9/21 Yves Savourel <ysavourel@enlaso.com>
Thanks for the example Felix,

> ... All tool specifications allow for identifying the relevant
> data categories. In that way it becomes explicit that e.g. a
> certain MT tool is relevant for mt-confidence.
>
> the tool specifications have "id" attributes, e.g. "t-2" for "bing" translator.
> Yves' requirement of referring to tool info from a piece of XLIFF could be
> realized by referring to the ID attribute.
How exactly is the relationship between the local data category markup and the tool expressed?

Currently not at all.
 

It seems you are saying: the ITS way is to look at the itsDataCategoryIdentifer element in the tool info.
That's clumsy IMO, but it does prevent any tool-specific data on the data category side.

Correct, that's a huge benefit IMO: separating the metadata itself from information about how the metadata was produced, or in this case how content + metadata were produced.
 

But the case for several tools used for the same data category is not really catered for.

Correct.
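To illustrate the limitation just agreed on, here is a minimal Python sketch, assuming tool information is held as records keyed by ID, each listing the data categories it declares (the "look at the identifier in the tool info" approach). "t-2" for Bing follows this thread; the record layout, the helper name and the "t-3" second MT engine are hypothetical. With two MT tools declaring mt-confidence, the lookup returns both and cannot say which one produced a given score.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToolInfo:
    """Hypothetical tool-information record, addressed by its ID."""
    tool_id: str
    name: str
    # Data categories the tool declares itself relevant for.
    data_categories: List[str] = field(default_factory=list)


# Illustrative registry: "t-2" for Bing follows the thread; "t-3" is made up.
TOOLS = {
    t.tool_id: t
    for t in [
        ToolInfo("t1", "Enrycher", ["disambiguation"]),
        ToolInfo("t-2", "Bing Translator", ["mt-confidence"]),
        ToolInfo("t-3", "Other MT engine", ["mt-confidence"]),
    ]
}


def tools_for_category(category):
    """Return the IDs of every tool that declares the given data category."""
    return [t.tool_id for t in TOOLS.values() if category in t.data_categories]


print(tools_for_category("disambiguation"))  # ['t1']
print(tools_for_category("mt-confidence"))   # ['t-2', 't-3']: ambiguous
```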
 
When you say "referring to tool info from a piece of XLIFF could be realized by referring to the ID attribute" who is defining the attribute that does the referring? ITS or XLIFF?

Good question :) In my mind it was XLIFF, but obviously you are pushing for a mechanism on the ITS side.
 

If it's XLIFF, then I disagree: I think the ITS mechanism must make provision for both cases. (Actually, I even think the MT case tends to favor the multi-tool case: knowing which tool produced a given MT is probably more relevant when you have several candidates.)

Having such a provision probably means some kind of tool-ref attribute in each data category that uses the tool information.
Which means it probably needs to be specified for each local occurrence, over and over again.
We're back to square one, admittedly now with only one attribute referring to the tool info rather than with all the tool info... I suppose that's progress :)
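A per-occurrence reference attribute is at least cheap to process. A minimal Python sketch, assuming the attribute is the hypothetical its:tool-ref in the ITS namespace and holds a comma-separated list of same-document "#id" references; the helper name and sample markup are illustrative only.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"
# Hypothetical attribute from this proposal, in the ITS namespace.
TOOL_REF = f"{{{ITS_NS}}}tool-ref"


def parse_tool_refs(value):
    """Split a tool-ref value such as "#t1, #t-2" into bare IDs.

    Assumes every entry is a same-document fragment reference ("#id");
    whitespace around entries is ignored and empty entries are skipped.
    """
    ids = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        if not entry.startswith("#"):
            raise ValueError(f"expected a '#id' reference, got {entry!r}")
        ids.append(entry[1:])
    return ids


doc = ET.fromstring(
    '<p xmlns:its="http://www.w3.org/2005/11/its">'
    '<span its:tool-ref="#t1, #t-2">Paris</span></p>'
)
span = doc.find("span")
print(parse_tool_refs(span.get(TOOL_REF)))  # ['t1', 't-2']
```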

Yes, that's progress :)

I think the issues you mention can be resolved, but first we'd need to agree on the following:
- Partial inheritance is out of scope
- Information about tools used for producing metadata (+content) is orthogonal to data categories


Now, if we agree on that, I think it would be OK to have a data category "ITS Tool information" which is available both locally and globally. Locally, it would have the tool references you mention, e.g.

<span its:tool-ref="#t1" ...> (in tool-ref there might be a comma-separated list of "ref" values)
meaning that Enrycher and the "disambiguation" data category have been used to create metadata for the content of "span". We could also have a global rule like

<its:toolInfoRule selector="trans-unit/target" tool-ref="#t-2"/>
meaning that the "Bing" translator has been used to create the translated content and the MT confidence score information.
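A minimal Python sketch of how an application might combine such a global rule with local markup. It assumes the hypothetical its:tool-ref attribute and represents toolInfoRule elements as (selector, tool-ref) pairs. ITS gives local markup precedence over global rules, and among global rules the last matching one wins; the sketch follows that order. Real ITS selectors are full XPath, while ElementTree supports only a subset, so each selector is prefixed with ".//" here as a simplification. Inheritance is left out, and "#t-3" is a made-up tool ID.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"
TOOL_REF = f"{{{ITS_NS}}}tool-ref"  # hypothetical local attribute


def apply_tool_info_rules(root, rules):
    """Map each selected element to its tool-ref value.

    rules: (selector, tool_ref) pairs in document order, standing in for
    <its:toolInfoRule selector="..." tool-ref="..."/>. Selectors go through
    ElementTree's limited XPath with a ".//" prefix (a simplification).
    Later rules overwrite earlier ones; local attributes overwrite rules.
    """
    result = {}
    for selector, tool_ref in rules:
        for el in root.findall(".//" + selector):
            result[el] = tool_ref
    for el in root.iter():
        local = el.get(TOOL_REF)
        if local is not None:
            result[el] = local
    return result


# XLIFF-like sample; the real XLIFF namespace is omitted for brevity.
xliff = ET.fromstring(
    '<xliff xmlns:its="http://www.w3.org/2005/11/its"><file><body>'
    '<trans-unit id="1"><source>Hello</source><target>Bonjour</target></trans-unit>'
    '<trans-unit id="2"><source>Bye</source>'
    '<target its:tool-ref="#t-3">Au revoir</target></trans-unit>'
    '</body></file></xliff>'
)
refs = apply_tool_info_rules(xliff, [("trans-unit/target", "#t-2")])
for unit in xliff.iter("trans-unit"):
    print(unit.get("id"), refs[unit.find("target")])
```

Here unit 1 gets "#t-2" from the rule, while unit 2 keeps its local "#t-3".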

What is the difference from previous approaches? With the above, we don't change selection at all, and we don't say anything about the relations between data categories. E.g., there might be no disambiguation or mt-confidence annotation at all. The "toolInfo" data category allows applications to interrelate the annotations if they are available, but we don't require testing that and don't create new conformance claims. That's a huge benefit IMO.

If the above approach has a "local" tool-ref attribute, its value would be inherited within the document. So, since Declan and Tadej need a "document only" solution without XPath, that approach would accommodate them.
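A minimal Python sketch of the inheritance described above, assuming the hypothetical its:tool-ref attribute: each element takes its own value if present, otherwise its parent's, so a single attribute on the root covers the whole document without any XPath. The IDs "#t1" and "#t3" are illustrative.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"
TOOL_REF = f"{{{ITS_NS}}}tool-ref"  # hypothetical attribute from this proposal


def effective_tool_refs(root):
    """Compute the inherited tool-ref value for every element.

    Each element uses its own its:tool-ref if present, otherwise the value
    inherited from its parent (None if no ancestor declares one).
    """
    result = {}

    def walk(el, inherited):
        value = el.get(TOOL_REF, inherited)
        result[el] = value
        for child in el:
            walk(child, value)

    walk(root, None)
    return result


doc = ET.fromstring(
    '<html xmlns:its="http://www.w3.org/2005/11/its" its:tool-ref="#t1">'
    '<body><p>Paris <span its:tool-ref="#t3">Texas</span></p><p>Lyon</p></body>'
    '</html>'
)
refs = effective_tool_refs(doc)
for el in doc.iter():
    print(el.tag, refs[el])
```

Every element reports "#t1" except the span, which overrides it with "#t3".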

The "new" ITS mechanism of referencing is actually not new: we already do that with standoff markup in Localization Quality Issue. And it seems that in the new draft of Provenance, standoff would also be much more appropriate than heavy use of pointer attributes.
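A minimal Python sketch of standoff-style resolution, by analogy with how locQualityIssuesRef points to a standoff its:locQualityIssues element. The its:tools / its:tool elements, their "name" attribute and the its:tool-ref attribute are all hypothetical; only the xml:id mechanism is standard XML. References without a matching standoff entry are reported as errors.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
TOOL_REF = f"{{{ITS_NS}}}tool-ref"  # hypothetical attribute
TOOL = f"{{{ITS_NS}}}tool"          # hypothetical standoff element


def index_standoff_tools(root):
    """Index the hypothetical standoff <its:tool xml:id="..."> elements by ID."""
    return {el.get(XML_ID): el for el in root.iter(TOOL)}


def resolve_tool_refs(root):
    """Yield (element, tool names) for each element carrying its:tool-ref.

    Raises KeyError for a reference with no matching standoff entry.
    """
    tools = index_standoff_tools(root)
    for el in root.iter():
        value = el.get(TOOL_REF)
        if value is None:
            continue
        names = []
        for ref in value.split(","):
            tool_id = ref.strip().lstrip("#")
            if tool_id not in tools:
                raise KeyError(f"dangling tool reference: #{tool_id}")
            names.append(tools[tool_id].get("name"))
        yield el, names


doc = ET.fromstring(
    '<doc xmlns:its="http://www.w3.org/2005/11/its">'
    '<its:tools>'
    '<its:tool xml:id="t1" name="Enrycher"/>'
    '<its:tool xml:id="t-2" name="Bing Translator"/>'
    '</its:tools>'
    '<p><span its:tool-ref="#t1, #t-2">Paris</span></p>'
    '</doc>'
)
for el, names in resolve_tool_refs(doc):
    print(el.text, names)  # Paris ['Enrycher', 'Bing Translator']
```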

Best,

Felix
Received on Friday, 21 September 2012 16:23:27 UTC