IRC log of mlw-lt on 2012-11-05

Timestamps are in UTC.

16:08:48 [RRSAgent]
RRSAgent has joined #mlw-lt
16:08:49 [RRSAgent]
logging to
16:08:50 [trackbot]
RRSAgent, make logs world
16:08:50 [Zakim]
Zakim has joined #mlw-lt
16:08:52 [trackbot]
Zakim, this will be
16:08:52 [Zakim]
I don't understand 'this will be', trackbot
16:08:53 [trackbot]
Meeting: MultilingualWeb-LT Working Group Teleconference
16:08:53 [trackbot]
Date: 05 November 2012
16:09:17 [daveL]
zakim, who's here ?
16:09:17 [Zakim]
sorry, daveL, I don't know what conference this is
16:09:19 [Zakim]
On IRC I see RRSAgent, Pedro, Marcis, daveL, Jirka, DomJones, leroy, Yves_, Ankit, tadej, omstefanov, kfritsche, Arle, timeless, tpacbot, trackbot, fantasai, shaunm
16:09:29 [Naoto]
Naoto has joined #mlw-lt
16:09:53 [daveL]
topic: agenda
16:10:58 [mdelolmo]
mdelolmo has joined #mlw-lt
16:11:05 [daveL]
topic: Doodle poll about virtual f2f
16:13:49 [Milan]
Milan has joined #mlw-lt
16:13:59 [tadej]
daveL: poll shows 27th and 28th to be both good candidates
16:15:08 [tadej]
... I would suggest taking both the 27th and 28th, with a call of around 3 hours each afternoon
16:16:24 [tadej]
... however, we should deal with more specific issues beforehand
16:17:01 [tadej]
daveL: Tuesday, Nov 20th is also a good candidate
16:18:46 [tadej]
action: daveL to confirm November 20, 27 and 28 as virtual session dates
16:18:46 [trackbot]
Created ACTION-278 - Confirm November 20, 27 and 28 as virtual session dates [on David Lewis - due 2012-11-12].
16:18:58 [daveL]
topic: upcoming meetings
16:20:17 [shaunm_]
shaunm_ has joined #mlw-lt
16:22:11 [tadej]
daveL: checking if the schedule makes sense - so far Prague 23-24 Jan, Rome 12-13 March, Bled 7-8 May, and Madrid still unspecified
16:23:32 [tadej]
daveL: as for events, there's a GALA event, LocWorld, the WWW conference in Rio, and the LRC conference in Limerick
16:24:26 [tadej]
Yves_: the only thing we need to fix is the dates for the Madrid meeting, since July is a holiday month
16:24:36 [Arle]
We may be able to get on the GALA program. I will know more soon.
16:24:45 [tadej]
Pedro: For July, the sooner the better, ideally first week
16:24:58 [tadej]
... or even last week of June
16:25:22 [tadej]
action: daveL to open doodle poll for Madrid dates (end June - beginning July)
16:25:26 [trackbot]
Created ACTION-279 - Open doodle poll for Madrid dates (end June - beginning July) [on David Lewis - due 2012-11-12].
16:25:30 [Arle]
(Separate from what Pedro has already submitted, which is a great start.)
16:26:54 [tadej]
topic: Standoff markup
16:26:57 [daveL]
topic: standoff markup
16:29:00 [tadej]
Yves_: we should use a single root element, like its:standOffList (or similarly named). The inclusion mechanism would be via the script element, either inline or in a separate file
16:30:12 [tadej]
... given the example, it would be better to split the standoff into two separate <script> elements, and have each script element's id match the id of its standoff list.
16:32:11 [tadej]
Pedro: the external files can be problematic in cases with real-time translation
16:32:26 [tadej]
daveL: do you think the its:rules elements could be the enclosing element?
16:33:32 [tadej]
Yves_: since we need to point to multiple its:standofflists, they can't be the root element, because several of them could exist in the same file; its:rules could be a root.
16:33:51 [tadej]
daveL: could you correct the schema so it takes this into account?
16:34:51 [tadej]
Yves_: mixing rules and standoff can get messy
16:35:19 [tadej]
daveL: its:rules is easy from the conformance point of view, easier to explain, although there may be confusion
16:36:32 [tadej]
Jirka: there's conceptual overload with this - we'd be declaring its:rules, and it wouldn't contain actual rules, but standoff info
16:36:59 [dF]
dF has joined #mlw-lt
16:37:59 [tadej]
daveL: let's summarize: a single its:standoffList element with an id attribute that matches the script element's id.
16:38:18 [tadej]
... in external files, we could have multiple standoff lists
16:40:42 [tadej]
action: Yves_ to edit the spec to unique standoff markup
16:40:42 [trackbot]
Sorry, couldn't find Yves_. You can review and register nicknames at <>.
16:40:54 [tadej]
action: Yves to edit the spec to unique standoff markup
16:40:55 [trackbot]
Created ACTION-280 - Edit the spec to unique standoff markup [on Yves Savourel - due 2012-11-12].
16:41:48 [daveL]
topic: its-tools
16:43:28 [tadej]
daveL: Marcis sent an update consolidating MT confidence and TA Annotation into simpler definitions
16:43:51 [tadej]
... there's still an open issue on whether defining its:tools should be compulsory for these two data categories. any opinions?
16:44:29 [tadej]
Yves_: sounds reasonable
16:44:38 [tadej]
daveL: I'll modify the text and make it compulsory.
16:46:08 [tadej]
daveL: Marcis also pointed out that several tools could process a fragment of text, which makes things confusing. It's different than MT, since you're annotating an annotation.
16:46:26 [tadej]
... should we then just apply the its:tool to those data categories rather than have it as a separate data category?
16:47:26 [tadej]
tadej: disambiguation could survive that, it's equivalent
16:47:53 [daveL]
scribe: daveL
16:48:26 [daveL]
tadej: is currently updating its-tools, looking at use of non-its annotations
16:50:56 [tadej]
daveL: right now we have a mechanism to identify which data category it applies to, allowing for user-defined names
16:51:59 [tadej]
daveL: ... since you're borrowing the mechanism anyway, you're out of conformance anyway
16:52:13 [tadej]
daveL: we could remove it, since we don't have a formal extension mechanism
16:54:11 [Marcis]
I hear you, I just cannot say anything
16:54:14 [tadej]
tadej: if we define a per-datacategory confidence attribute, how to express multi-valued attributes?
16:54:54 [Marcis]
I mean, if the domains are automatically identified, then you will have a confidence (if the systems return probabilistic results)
16:55:22 [Marcis]
As tadej said - the weighted mechanism says that there is a confidence
16:56:16 [tadej]
tadej: It boils down to whether that number is useful for the consumer
16:57:38 [Marcis]
The categories (not in exact names...) that I see requiring the confidence are: MT, Terminology, Domain segmentation tools (are there any currently used by the MT use cases?), Named Entity Recognition (currently in Disambiguation, right?), others (?)
16:58:03 [tadej]
action: daveL to ask for use cases of data category-specific confidence scores
16:58:03 [trackbot]
Created ACTION-281 - Ask for use cases of data category-specific confidence scores [on David Lewis - due 2012-11-12].
16:59:06 [Ankit]
w.r.t. confidence scores in MT, they are mainly used in a post-editing environment, i.e. when a human translator uses these scores to determine which outputs of an MT system they want to correct.
17:00:35 [tadej]
tadej: disambiguation can produce scores, but they are not commonly used
17:02:33 [tadej]
daveL: its:tools has its own element, the its:standOffList - we should describe how it works within a script element, so it's as similar as possible to the XML markup.
17:03:13 [DomJones]
DomJones has left #mlw-lt
17:03:20 [tadej]
RRSAgent, draft minutes
17:03:20 [RRSAgent]
I have made the request to generate tadej
17:04:07 [Jirka]
Jirka has left #mlw-lt
17:05:42 [shaunm]
shaunm has left #mlw-lt
17:07:14 [Marcis]
Marcis has left #mlw-lt
18:18:19 [Fredrik]
Fredrik has joined #mlw-lt
18:34:33 [Zakim]
Zakim has left #mlw-lt
20:55:13 [timeless]
timeless has left #mlw-lt