MultilingualWeb-LT Working Group Teleconference -- 05 Nov 2012

agenda

http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Nov/0024.html

http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Nov/0026.html

Doodle poll about virtual f2f

<tadej> http://doodle.com/heh7k59h7vkvnv88#table

<tadej> daveL: the poll shows the 27th and 28th are both good candidates

<tadej> ... I would suggest taking both the 27th and 28th, with a call of around 3 hours in the afternoon on each day

<tadej> ... however, we should deal with more specific issues beforehand

<tadej> daveL: Tuesday, Nov 20th is also a good candidate

<tadej> ACTION: daveL to confirm November 20, 27 and 28 as virtual session dates [recorded in http://www.w3.org/2012/11/05-mlw-lt-minutes.html#action01]

<trackbot> Created ACTION-278 - Confirm November 20, 27 and 28 as virtual session dates [on David Lewis - due 2012-11-12].

Upcoming meetings

http://www.w3.org/International/multilingualweb/lt/wiki/Main_Page#Upcoming

<tadej> daveL: checking if the schedule makes sense - so far Prague 23-24 Jan, Rome 12-13 March, Bled 7-8 May, and Madrid still unspecified

<tadej> daveL: as for events, there's a GALA event, LocWorld, the WWW conference in Rio, and the LRC conference in Limerick

<tadej> Yves_: the only thing we need to fix is the dates for the Madrid meeting, since July is a holiday month

<Arle> We may be able to get on the GALA program. I will know more soon.

<tadej> Pedro: For July, the sooner the better, ideally first week

<tadej> ... or even last week of June

<tadej> ACTION: daveL to open doodle poll for Madrid dates (end June - beginning July) [recorded in http://www.w3.org/2012/11/05-mlw-lt-minutes.html#action02]

<trackbot> Created ACTION-279 - Open doodle poll for Madrid dates (end June - beginning July) [on David Lewis - due 2012-11-12].

<Arle> (Separate from what Pedro has already submitted, which is a great start.)

Standoff markup

http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Nov/0019.html

<tadej> Yves_: we should use a single root element, like its:standOffList (or similarly named). The inclusion mechanism would be via the script element, either inline or as a separate file

<tadej> ... given the example, it would be better to split the standoff into two separate <script> elements, and have each script element's id match the id of its standoff list.

<tadej> Pedro: the external files can be problematic in cases with real-time translation

<tadej> daveL: do you think the its:rules elements could be the enclosing element?

<tadej> Yves_: since we need to point to multiple its:standOffList elements, they can't be the root element, since several could exist in the same file; its:rules could be a root.

<tadej> daveL: could you correct the schema so it takes this into account?

<tadej> Yves_: mixing rules and standoff can get messy

<tadej> daveL: its:rules is easy from the conformance point of view, easier to explain, although there may be confusion

<tadej> Jirka: there's conceptual overload with this - we'd be declaring its:rules, and it wouldn't contain actual rules, but standoff info

<tadej> daveL: let's summarize: a single element, its:standOffList, with an id attribute that matches the script element's id.

<tadej> ... in external files, we could have multiple standoff lists

<tadej> ACTION: Yves_ to edit the spec to unique standoff markup [recorded in http://www.w3.org/2012/11/05-mlw-lt-minutes.html#action03]

<trackbot> Sorry, couldn't find Yves_. You can review and register nicknames at <http://www.w3.org/International/multilingualweb/lt/track/users>.

<tadej> ACTION: Yves to edit the spec to unique standoff markup [recorded in http://www.w3.org/2012/11/05-mlw-lt-minutes.html#action04]

<trackbot> Created ACTION-280 - Edit the spec to unique standoff markup [on Yves Savourel - due 2012-11-12].
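
Editor's note: the id-matching convention summarized above can be illustrated with a minimal sketch. It is not the group's agreed markup: the element name its:standOffList is taken from the discussion and was still open to renaming, the script type value and the sample ids are assumptions, and the sample document is written as well-formed XML so that a plain XML parser can read it. Only the ITS namespace URI is the published one.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"

# Hypothetical document: element name, type value and ids are assumptions
# based on the discussion, not a published schema.
DOC = """<html>
  <head>
    <script id="list1" type="application/its+xml">
      <its:standOffList xmlns:its="http://www.w3.org/2005/11/its" id="list1"/>
    </script>
    <script id="list2" type="application/its+xml">
      <its:standOffList xmlns:its="http://www.w3.org/2005/11/its" id="other"/>
    </script>
  </head>
  <body/>
</html>"""


def mismatched_script_ids(doc_text):
    """Return ids of script elements that break the id-matching convention."""
    root = ET.fromstring(doc_text)
    tag = "{%s}standOffList" % ITS_NS
    bad = []
    for script in root.iter("script"):
        lists = script.findall(tag)
        # Expect exactly one standoff list per script, carrying the same id.
        if len(lists) != 1 or lists[0].get("id") != script.get("id"):
            bad.append(script.get("id"))
    return bad


print(mismatched_script_ids(DOC))
```

In this sample the second script element is reported, because the id of the list it contains differs from its own.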

its-tools

http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Nov/0004.html

<tadej> daveL: Marcis sent an update consolidating MT confidence and TA Annotation into simpler definitions

<tadej> ... there's still an open issue on whether defining its:tools should be compulsory for these two data categories. any opinions?

<tadej> Yves_: sounds reasonable

<tadej> daveL: I'll modify the text and make it compulsory.

<tadej> daveL: Marcis also pointed out that several tools could process a fragment of text, which makes things confusing. it's different than MT, since you're annotating an annotation.

<tadej> ... should we then just apply the its:tool to those data categories rather than have it as a separate data category?

<tadej> tadej: disambiguation could survive that, it's equivalent

http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Nov/0006.html

<scribe> scribe: daveL

tadej: is currently updating its-tools, looking at use of non-its annotations

<tadej> daveL: right now we have a mechanism to identify which data category it applies to, allowing for user-defined names

<tadej> daveL: ... since you're borrowing the mechanism, you're out of conformance anyway

<tadej> daveL: we could remove it, since we don't have a formal extension mechanism

<Marcis> I hear you, I just cannot say anything

<tadej> tadej: if we define a per-datacategory confidence attribute, how do we express multi-valued attributes?

<Marcis> I mean, if the domains are automatically identified, then you will have a confidence (if the systems return probabilistic results)

<Marcis> As tadej said - the weighted mechanism says that there is a confidence

<tadej> tadej: It boils down to whether that number is useful for the consumer

<Marcis> The categories (not in exact names...) that I see requiring the confidence are: MT, Terminology, Domain segmentation tools (are there any currently used by the MT use cases?), Named Entity Recognition (currently in Disambiguation, right?), others (?)

<tadej> ACTION: daveL to ask for use cases of data category-specific confidence scores [recorded in http://www.w3.org/2012/11/05-mlw-lt-minutes.html#action05]

<trackbot> Created ACTION-281 - Ask for use cases of data category-specific confidence scores [on David Lewis - due 2012-11-12].

<Ankit> w.r.t. confidence scores in MT, they are mainly used in a post-editing environment, i.e. when a human translator uses these scores to determine which outputs of an MT system they want to correct.
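
Editor's note: the post-editing use described above could look roughly like the following sketch. The Segment type, the 0 to 1 score range, the 0.7 threshold and the sample sentences are all assumptions made for illustration; nothing here comes from the ITS drafts or from any particular MT system.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    source: str
    mt_output: str
    confidence: float  # assumed to lie in [0, 1]; higher means more reliable


def needs_post_editing(segments, threshold=0.7):
    """Return the segments whose MT confidence falls below the threshold."""
    return [s for s in segments if s.confidence < threshold]


# Hypothetical sample data.
segments = [
    Segment("Guten Morgen", "Good morning", 0.95),
    Segment("Das ist ein Test", "This is a test", 0.88),
    Segment("Er hat den Vogel abgeschossen", "He shot the bird", 0.41),
]

for seg in needs_post_editing(segments):
    print(seg.mt_output, seg.confidence)
```

A translator's tool would present only the low-scoring segments for review, which is where a per-segment score is useful to the consumer.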

<tadej> tadej: disambiguation can produce scores, but they are not commonly used

<tadej> daveL: its:tools has its own element, the its:standOffList - we should describe how it works within a script element, so it's as similar as possible to the XML markup.
