IRC log of mlw-lt on 2012-11-05

Timestamps are in UTC.

Meeting: MultilingualWeb-LT Working Group Teleconference
16:08:53 [trackbot]
Date: 05 November 2012
16:09:53 [daveL]
topic: agenda
16:09:56 [daveL]
16:10:15 [daveL]
16:11:05 [daveL]
topic: Doodle poll about virtual f2f
16:12:27 [tadej]
16:13:59 [tadej]
daveL: poll shows 27th and 28th to be both good candidates
16:15:08 [tadej]
... I would suggest taking the 27th and 28th, with a call of around 3 hours on each afternoon
16:16:24 [tadej]
... however, we should deal with more specific issues beforehand
16:17:01 [tadej]
daveL: Tuesday, Nov 20th is also a good candidate
16:18:46 [tadej]
action: daveL to confirm November 20, 27 and 28 as virtual session dates
16:18:58 [daveL]
topic: upcoming meetings
16:19:02 [daveL]
16:22:11 [tadej]
daveL: checking if the schedule makes sense - so far Prague 23-24 Jan, Rome 12-13 March, Bled 7-8 May, and Madrid still unspecified
16:23:32 [tadej]
daveL: as for events, there's a GALA event, LocWorld, the WWW conference in Rio, and the LRC conference in Limerick
16:24:26 [tadej]
Yves_: the only thing we need to fix is the dates for the Madrid meeting, since July is a holiday month
16:24:36 [Arle]
We may be able to get on the GALA program. I will know more soon.
16:24:45 [tadej]
Pedro: For July, the sooner the better, ideally first week
16:24:58 [tadej]
... or even last week of June
16:25:22 [tadej]
action: daveL to open doodle poll for Madrid dates (end June - beginning July)
16:25:30 [Arle]
(Separate from what Pedro has already submitted, which is a great start.)
16:26:54 [tadej]
topic: Standoff markup
16:26:57 [daveL]
topic: standoff markup
16:27:08 [daveL]
16:29:00 [tadej]
Yves_: we should use a single root element, like its:standOffList (or similarly named). The inclusion mechanism would be via the script element, either inline or in a separate file
16:30:12 [tadej]
... given the example, it would be better to split the standoff into two separate <script> elements, and have each script element's id match the id of its standoff list.
16:32:11 [tadej]
Pedro: the external files can be problematic in cases with real-time translation
16:32:26 [tadej]
daveL: do you think the its:rules elements could be the enclosing element?
16:33:32 [tadej]
Yves_: since we need to point to multiple its:standofflists, they can't be the root element, since they could exist in the same file; its:rules could be a root.
16:33:51 [tadej]
daveL: could you correct the schema so it takes this into account?
16:34:51 [tadej]
Yves_: mixing rules and standoff can get messy
16:35:19 [tadej]
daveL: its:rules is easy from the conformance point of view, easier to explain, although there may be confusion
16:36:32 [tadej]
Jirka: there's conceptual overload with this - we'd be declaring its:rules, and it wouldn't contain actual rules, but standoff info
16:37:59 [tadej]
daveL: let's summarize having a single element its:standoffList having an id attribute which matches the script element's id.
16:38:18 [tadej]
... in external files, we could have multiple standoff lists
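
A minimal sketch of the id-matching idea summarized above, not the group's agreed syntax. It assumes the standoff list is serialized as XML inside an HTML script element and that both carry an id attribute. The element name its:standOffList comes from the discussion; the script type value, the its:item child, and the helper names ScriptCollector and check_standoff_ids are hypothetical and used only for illustration.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# ITS namespace; the standOffList element name is taken from the discussion
# and is an assumption, not a finalized part of the spec.
ITS_NS = "http://www.w3.org/2005/11/its"
STANDOFF_TAG = f"{{{ITS_NS}}}standOffList"


class ScriptCollector(HTMLParser):
    """Collect the raw text content of every <script> element that has an id."""

    def __init__(self):
        super().__init__()
        self.scripts = {}        # script id -> raw text content
        self._current_id = None  # id of the script currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._current_id = dict(attrs).get("id")
            if self._current_id is not None:
                self.scripts[self._current_id] = ""

    def handle_data(self, data):
        if self._current_id is not None:
            self.scripts[self._current_id] += data

    def handle_endtag(self, tag):
        if tag == "script":
            self._current_id = None


def check_standoff_ids(html):
    """Map each script id to True if its root is a standOffList with the same id."""
    collector = ScriptCollector()
    collector.feed(html)
    collector.close()
    result = {}
    for script_id, content in collector.scripts.items():
        root = ET.fromstring(content.strip())
        result[script_id] = root.tag == STANDOFF_TAG and root.get("id") == script_id
    return result


# Hypothetical document: two inline standoff lists, one per script element.
EXAMPLE_HTML = """
<html><body>
<p>Some text</p>
<script id="so1" type="application/xml">
<its:standOffList xmlns:its="http://www.w3.org/2005/11/its" id="so1">
  <its:item/>
</its:standOffList>
</script>
<script id="so2" type="application/xml">
<its:standOffList xmlns:its="http://www.w3.org/2005/11/its" id="other">
  <its:item/>
</its:standOffList>
</script>
</body></html>
"""

print(check_standoff_ids(EXAMPLE_HTML))
```

In this sketch the first script passes the check and the second fails, because its standoff list carries a different id than the enclosing script element.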
16:40:42 [tadej]
action: Yves_ to edit the spec to unique standoff markup
16:40:54 [tadej]
action: Yves to edit the spec to unique standoff markup
16:41:48 [daveL]
topic: its-tools
16:41:50 [daveL]
16:43:28 [tadej]
daveL: Marcis sent an update consolidating MT confidence and TA Annotation into simpler definitions
16:43:51 [tadej]
... there's still an open issue on whether defining its:tools should be compulsory for these two data categories. any opinions?
16:44:29 [tadej]
Yves_: sounds reasonable
16:44:38 [tadej]
daveL: I'll modify the text and make it compulsory.
16:46:08 [tadej]
daveL: Marcis also pointed out that several tools could process a fragment of text, which makes things confusing. it's different than MT, since you're annotating an annotation.
16:46:26 [tadej]
... should we then just apply the its:tool to those data categories rather than have it as a separate data category?
16:47:26 [tadej]
tadej: disambiguation could survive that, it's equivalent
16:47:38 [daveL]
16:47:53 [daveL]
scribe: daveL
16:48:26 [daveL]
tadej: is currently updating its-tools, looking at use of non-its annotations
16:50:56 [tadej]
daveL: right now we have a mechanism to identify which data category it applies to, allowing for user-defined names
16:51:10 [tadej]
16:51:59 [tadej]
daveL: ... since you're borrowing the mechanism anyway, you're out of conformance anyway
16:52:13 [tadej]
daveL: we could remove it, since we don't have a formal extension mechanism
16:54:14 [tadej]
tadej: if we define a per-datacategory confidence attribute, how to express multi-valued attributes?
16:54:54 [Marcis]
I mean, if the domains are automatically identified, then you will have a confidence (if the systems will return probabilistic results)
16:55:22 [Marcis]
As tadej said - the weighted mechanism says that there is a confidence
16:56:16 [tadej]
tadej: It boils down to whether that number is useful for the consumer
16:57:38 [Marcis]
The categories (not in exact names...) that I see requiring the confidence are: MT, Terminology, Domain segmentation tools (are there any currently used by the MT use cases?), Named Entity Recognition (currently in Disambiguation, right?), others (?)
16:58:03 [tadej]
action: daveL to ask for use cases of data category-specific confidence scores
16:59:06 [Ankit]
w.r.t. confidence scores in MT, they are mainly used in a post-editing environment, i.e. when a human translator uses these scores to determine which outputs of an MT system they want to correct.
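
A small sketch of the post-editing use described above, assuming each MT segment carries a confidence score between 0 and 1. The Segment type, the needs_post_editing helper, the 0.7 threshold, and the sample segments are all made up for illustration and do not come from any tool discussed in the meeting.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    source: str
    mt_output: str
    confidence: float  # assumed to lie in [0, 1]; higher means more trusted


def needs_post_editing(segments, threshold=0.7):
    """Return the segments whose MT confidence falls below the threshold."""
    return [s for s in segments if s.confidence < threshold]


# Illustrative data only.
segments = [
    Segment("Guten Morgen", "Good morning", 0.95),
    Segment("Das ist ein Test", "This is a test", 0.88),
    Segment("Er hat den Vogel abgeschossen", "He shot the bird", 0.41),
]

for seg in needs_post_editing(segments):
    print(f"{seg.confidence:.2f}  {seg.mt_output}")
```

A translator would then review only the low-confidence segments first, which is the kind of consumer-side use the confidence value is meant to support.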
17:00:35 [tadej]
tadej: disambiguation can produce scores, but not commonly used
17:02:33 [tadej]
daveL: its:tools has its own element, the its:standOffList - we should describe how it works within a script element, so it's as similar as possible to the XML markup.
