IRC log of mlw-lt on 2013-01-16

Timestamps are in UTC.

14:50:32 [fsasaki]
meeting: MLW-LT WG
14:50:34 [fsasaki]
chair: felix
14:50:51 [fsasaki]
14:51:01 [fsasaki]
topic: roll call
14:51:06 [fsasaki]
tbd ...
14:53:39 [fsasaki]
present+ felix
14:53:45 [kfritsche]
present+ karl
14:54:18 [Marcis]
present+ Marcis
14:55:20 [philr]
14:55:26 [philr]
present+ philr
14:55:52 [daveL]
14:56:01 [fsasaki]
regrets+ dom
14:57:44 [Ankit]
14:58:20 [leroy]
present+ leroy
14:58:23 [Ankit]
present+ Ankit
14:58:45 [joerg]
14:59:00 [shaunm]
present+ shaunm
14:59:37 [joerg]
present+ joerg
14:59:50 [mdelolmo]
15:00:15 [Clemens]
15:00:57 [Jirka]
15:00:58 [Clemens]
present+ Clemens
15:01:03 [Des]
15:01:04 [Jirka]
present+ Jirka
15:01:12 [Clemens]
present+ Clemens
15:01:23 [fsasaki]
present+ dave
15:01:26 [Des]
present+ Des
15:01:52 [mdelolmo]
present+ mdelolmo
15:02:04 [renatb]
present+ renatb
15:02:45 [Arle]
15:02:51 [pnietoca]
15:03:14 [fsasaki]
scribe: daveL
15:03:36 [fsasaki]
15:03:50 [fsasaki]
topic: Meeting time
15:03:56 [Arle]
Can someone post the Germany dial in info?
15:04:02 [fsasaki]
15:04:31 [joerg]
+49 (0) 811 8899 6930
15:04:52 [tadej]
15:04:54 [omstefanov]
15:05:16 [Arle]
And the meeting number?
15:05:17 [Yves_]
15:05:27 [pnietoca]
present+ pnietoca
15:05:29 [Yves_]
present+ Yves
15:05:44 [fsasaki]
present+ giuseppe
15:05:53 [omstefanov]
can someone please post the gotomeeting link, I lost it
15:06:05 [omstefanov]
present+ omstefanov
15:06:10 [Yves_]
15:06:14 [tadej]
present+ tadej
15:07:31 [omstefanov]
15:07:49 [daveL]
felix: there is no apparent slot that works. felix will distribute a weekly alternating proposal
15:08:01 [fsasaki]
topic: state of XLIFF mapping
15:08:36 [fsasaki]
scribe: fsasaki
15:08:46 [fsasaki]
dave: haven't updated the mapping page a lot
15:09:05 [fsasaki]
.. there is more work to be done to formalize the mapping
15:09:11 [fsasaki]
.. and come up with examples
15:09:24 [fsasaki]
.. I think we want to focus on XLIFF 1.2 mapping first
15:09:39 [fsasaki]
.. we were hoping that XLIFF 2 would be stable, but there is a delay
15:09:44 [fsasaki]
15:10:01 [fsasaki]
dave: focus on XLIFF 1.2 also helps with putting a demonstrator together
15:10:08 [fsasaki]
yves: dave summarized everything right
15:10:19 [fsasaki]
.. in okapi we implemented ITS mapping on what we have
15:10:26 [fsasaki]
.. it is partially implemented, ongoing
15:10:36 [fsasaki]
dave: we will come back shortly on that
15:10:39 [Pedro]
15:10:56 [fsasaki]
.. wrt interop between solas and CMS lion, also using okapi
15:11:03 [fsasaki]
.. with the preparation for rome
15:11:22 [fsasaki]
phil: it is now on our critical path for our implementation
15:11:32 [fsasaki]
.. david said he would have a prototype a few weeks ago
15:11:39 [fsasaki]
.. even if there is nothing final
15:11:47 [fsasaki]
.. even if we would have a rough direction
15:12:00 [fsasaki]
.. e.g. yves said that with xliff 1.2, he would use mrk markup
15:12:18 [fsasaki]
.. even if we had directions what is easily acceptable
15:12:26 [fsasaki]
.. otherwise it could hold up my implementation
15:12:38 [fsasaki]
yves: the xliff 1.2 mapping is what we used for implementations
15:12:45 [fsasaki]
.. most of the time it made sense
15:13:02 [fsasaki]
.. we have tackled some of the standoff stuff
15:13:10 [giuseppe]
15:13:15 [fsasaki]
.. it is also in the git repository (for okapi, scribe assumes)?
15:13:26 [Yves_]
15:13:55 [fsasaki]
phil: provenance and loc quality issue, rating are relevant for us here
15:13:56 [Yves_]
15:14:40 [fsasaki]
phil: Yves' page for 1.2. we can certainly use that as our direction
15:14:50 [fsasaki]
dave: will talk to david tomorrow about that
15:14:53 [fsasaki]
phil: tx
15:15:15 [fsasaki]
topic: New value for localization quality type "conformance"
15:15:19 [daveL]
scribe: daveL
15:15:45 [daveL]
felix: asks if anyone has further thoughts on, or support for, this new type
15:15:52 [fsasaki]
topic: Regular expression change
15:15:56 [daveL]
... no responses yet
15:16:39 [daveL]
shaun: no update on this
15:17:07 [daveL]
action: shaunm to work on regex for validating regex subset proposal
15:17:39 [fsasaki]
action: shaun to work on regex for validating regex subset proposal
15:17:54 [fsasaki]
topic: Disambiguation and term
15:18:34 [daveL]
topic: disambiguation and term
15:18:55 [daveL]
felix: has been discussed in response to christian comment
15:19:05 [daveL]
... any further comments
15:19:21 [daveL]
marcis: what is the goal?
15:19:47 [daveL]
felix: christian suggested merging term and disambig data categories
15:20:23 [daveL]
... but response was that both had distinct use cases, that could merge but are valid individually
15:20:58 [daveL]
marcis: would not want to drop data category, term is easier to implement and purpose is clear
15:21:25 [daveL]
... not so clear on disambiguation category, in terms of what is possible to do with this
15:21:44 [Milan]
15:22:05 [daveL]
... for example there may be other types that might be useful in the disambiguation use case
15:22:42 [daveL]
... and doing term management with disambig would make it very heavy
15:23:02 [fsasaki]
present+ milan, tadej
15:23:35 [daveL]
marcis: so there might need to be more attributes specifically for named entities
15:23:46 [daveL]
... referencing input from W3C India received today
15:24:39 [daveL]
tadej: motivation for separate data category was because it covered some use cases that fell out of the scope of terminology
15:25:07 [daveL]
... by providing some additional context
15:25:23 [daveL]
... but do see that there is some commonality
15:25:56 [daveL]
... Also term must remain to keep compatibility with named entity 1
15:26:26 [daveL]
correction, > with terminology in ITS1
15:26:57 [daveL]
joerg: still in favour of having the two data categories
15:27:28 [daveL]
... since disambiguation can cover many other tasks in content or NLP processing
15:27:40 [daveL]
... whereas term is more specific
15:28:32 [fsasaki]
15:28:37 [daveL]
pedro: the sort of text we mark up is different in both cases so it makes sense to keep the distinction
15:30:45 [daveL]
tadej: agree granularities are quite limiting; should we have more identifiers to support this?
15:30:56 [daveL]
... but this might be more complicated
15:31:19 [daveL]
joerg: yes this would be more complicated, clearer as it is
15:31:28 [fsasaki]
15:31:54 [daveL]
felix: christian will dial in to f2f to discuss this and resolve the topic next week
15:32:52 [daveL]
... we also need to consider number of implementations, which are not so many, when considering any possible merger
15:33:18 [daveL]
Des: agree with joerg, keep them separate as they are distinct use cases
15:33:38 [fsasaki]
15:33:58 [daveL]
joerg: clarified, attributes as defined currently are clearer than making them more fine grained
15:36:11 [daveL]
felix: reminds that the W3C process requires responding, which involves some work
15:36:50 [Yves_]
could we talk about annotatorsRef a bit during this call?
15:37:02 [daveL]
felix: the current level of comment is good
15:37:05 [fsasaki]
topic: annotatorsRef
15:38:29 [daveL]
yves: two data categories, provenance and localization quality issue, can have information from multiple annotators, but we have no way of doing this for annotatorsRef
15:38:40 [fsasaki]
15:39:32 [daveL]
... for current implementation, we assume the most recent annotator is the correct one, but this is not ideal
15:40:16 [daveL]
... provenance especially has multiple items and requires annotatorsRef
15:40:28 [fsasaki]
daveL: will get onto this
15:40:42 [daveL]
scribe: daveL
15:41:30 [daveL]
phil: let's talk about the ordering of provenance
15:41:33 [Yves_]
provenance data category
15:41:55 [Arle_]
15:42:02 [Arle_]
zakim, Arle_ is Arle
15:42:59 [fsasaki]
15:43:02 [Arle_]
I am back on the call.
15:43:11 [fsasaki]
15:43:15 [fsasaki]
15:43:17 [daveL]
felix: this was a discussion of whether there was any implication between ordering and time of record
15:43:21 [fsasaki]
15:43:35 [fsasaki]
(mails related to the discussion)
15:43:43 [fsasaki]
topic: provenance record ordering
15:43:53 [daveL]
phil: asks about the lack of a date stamp
15:44:11 [fsasaki]
daveL: a date stamp was discussed
15:44:17 [fsasaki]
.. there are two aspects:
15:44:34 [fsasaki]
.. a lot of original requirements didn't have a strong need for a time stamp
15:45:00 [fsasaki]
.. the original requirement was about identification being rich enough that we can differentiate
15:45:13 [fsasaki]
.. see e.g. "agent provenance" that used to include that
15:45:26 [fsasaki]
.. the 2nd aspect:
15:45:46 [fsasaki]
.. we discussed whether the order in which the provenance records are added is significant
15:46:00 [fsasaki]
.. but from an implementation point of view it is again complicated
15:46:16 [fsasaki]
.. and there hadn't been much of a call for this during requirements gathering
15:46:36 [fsasaki]
.. "time" also has various aspects: start of a translation, finish, duration, ...
15:46:52 [fsasaki]
.. it is also a point that the provenance wg in w3c had addressed
15:47:10 [fsasaki]
.. so we just provide identifiers of who made the translation and revision
15:47:13 [dF]
15:47:35 [fsasaki]
.. for knowing more there is the provenance model
15:47:41 [fsasaki]
.. more = more about time
15:48:11 [fsasaki]
.. so in summary, there was no big requirement to have a time stamp
15:48:22 [fsasaki]
.. and *if* you want to do that, you can use the w3c prov model
15:48:30 [fsasaki]
.. I'll reply to that mail thread
15:48:49 [fsasaki]
pablo: I think provenance can stay as is
15:49:14 [fsasaki]
.. adding a time stamp can be useful and interesting - if every implementer is fine with that i'm fine too
15:49:51 [daveL]
scribe: daveL
15:50:20 [daveL]
felix: adding a timestamp is a substantive change and would require another call, plus tests etc
15:51:35 [fsasaki]
topic: Test suite
15:51:43 [fsasaki]
15:52:27 [daveL]
felix: from this week on, people should stop using the google docs and update the test suite master themselves
15:52:40 [fsasaki]
15:53:04 [daveL]
... we still need some input on tests related to assertions (MUSTs), which need suggestions for tests
15:53:27 [fsasaki]
topic: prague f2f
15:53:30 [fsasaki]
15:53:39 [fsasaki]
15:53:54 [daveL]
felix: thanks to jirka for organising this
15:54:00 [fsasaki]
15:54:24 [daveL]
jirka: if you are not yet registered, please do so, so that numbers are known for wifi etc.
15:54:29 [dF]
present+ dF
15:55:08 [daveL]
felix: also need to know in advance when people want to dial in for organising the agenda
15:55:12 [fsasaki]
15:56:52 [daveL]
felix: go through objectives
15:57:43 [fsasaki]
15:58:15 [daveL]
... in particular the relationship between the different posters and links to where people can access them and update high level summary, adding any new use cases
15:58:25 [Naoto]
present+ Naoto
15:58:58 [fsasaki]
daveL: some time to discuss preparing EU project review?
15:59:09 [daveL]
... also brainstorm on activities for rest of year and new projects and synergy between them
15:59:42 [daveL]
felix: the Rome preparation should cover that.
16:00:20 [fsasaki]
scribe: fsasaki
16:00:21 [omstefanov]
I will not be able to take part in the Prague f2f, but definitely intend to come to Rome, so please make sure preps for Rome are recorded in writing
16:00:41 [fsasaki]
topic: xliff mapping implementation update
16:01:06 [fsasaki]
david: phil asked on that, we got good comments from xyz
16:01:22 [fsasaki]
.. status of xliff mapping - only written piece is xliff mapping wiki
16:01:23 [dF]
16:02:02 [fsasaki]
david: will work on this today, yesterday / today was EC deadline
16:02:28 [fsasaki]
.. we should publish this as a note / PC
16:02:40 [fsasaki]
.. what is the editorial setup for such a note?
16:03:11 [fsasaki]
.. we will need an additional namespace itsx
16:03:40 [fsasaki]
felix: update on implementation prototype?
16:03:53 [fsasaki]
david: solas is consuming ITS2 categories
16:03:57 [fsasaki]
.. like OKAPI does
16:04:11 [fsasaki]
.. that is being tested as part of the test suite
16:04:27 [fsasaki]
.. that is consumed by various components of solas architecture
16:04:34 [fsasaki]
.. one is an MT broker
16:04:46 [fsasaki]
.. works with different MT systems
16:05:16 [fsasaki]
.. depends on the MT systems whether they can deal with ITS metadata
16:05:27 [fsasaki]
.. moravia is contributing to that
16:05:37 [fsasaki]
.. m4loc can be used as middleware
16:05:58 [fsasaki]
.. in our current prototype the mt service exposes the m4loc service
16:06:14 [fsasaki]
.. from the deliverable - open source xliff roundtrip
16:06:23 [fsasaki]
.. the okapi filter interprets the ITS decoration
16:06:39 [fsasaki]
.. then the mapping in the wiki is used
16:06:56 [fsasaki]
.. it is consumed by a middleware open source component
16:07:50 [fsasaki]
felix: would be good to see a demo
16:08:18 [fsasaki]
david: will do, in prague and in rome
16:09:23 [fsasaki]
topic: metadata harvesting
16:09:37 [fsasaki]
ankit: we are waiting for some sort of data from cocomore
16:09:53 [fsasaki]
felix: what data?
16:10:13 [fsasaki]
ankit: we said that cocomore would provide us with annotated data
16:12:26 [fsasaki]
ankit will provide module by prague f2f
16:13:14 [fsasaki]
pedro: will have annotated data from spanish client
16:13:48 [fsasaki]
pedro: client is the spanish gov tax office
16:13:58 [fsasaki]
.. they will annotate with ITS metadata for this show case
16:14:05 [fsasaki]
.. spanish content in HTML5
16:14:12 [fsasaki]
.. we will generate english content
16:14:21 [fsasaki]
.. and annotate it in the output of the real time system
16:14:49 [fsasaki]
felix: so ankit could later use the data to test the module?
16:15:07 [fsasaki]
ankit: training data is as much as you can get
16:15:26 [fsasaki]
pedro: annotated data from cocomore is html content
16:15:35 [fsasaki]
.. we will generate content in chinese and french
16:16:08 [fsasaki]
.. so ankit can take chinese, french, german into account in his system
16:16:14 [fsasaki]
.. and spanish
16:16:35 [fsasaki]
ankit: this will be german to english, german to french, german to chinese, german to spanish
16:16:59 [fsasaki]
16:17:21 [Pedro]
Showcase WP3 (Cocomore-Linguaserve) is German to Chinese and German to French
16:17:43 [Clemens]
16:18:10 [Pedro]
Showcase WP4 (Linguaserve-Lucy-DCU) is the full demo Spanish to English, and partial demo Spanish to French and Spanish to German
16:19:01 [Jirka]
16:19:32 [fsasaki]
thanks to everybody for staying longer, meeting adjourned
16:20:41 [mdelolmo]
16:38:23 [fsasaki]
present+ olaf
