MLW-LT WG -- 16 Jan 2013

roll call

<fsasaki> checking attendance

<fsasaki> scribe: daveL

<fsasaki> http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2013Jan/0090.html

Meeting time

<fsasaki> http://www.doodle.com/pn6xa86rfbypmd2k

felix: there is no apparent slot that works. felix will distribute a weekly alternating proposal

state of XLIFF mapping

<fsasaki> scribe: fsasaki

dave: haven't updated the mapping page a lot
... there is more work to be done to formalize the mapping
... and come up with examples
... I think we want to focus on the XLIFF 1.2 mapping first
... we were hoping that XLIFF 2 would be stable, but there is a delay
... focus on XLIFF 1.2 also helps with putting a demonstrator together

yves: dave summarized everything correctly
... in okapi we implemented ITS mapping on what we have
... it is partially implemented, ongoing

dave: we will come back shortly on that
... wrt interop between solas and CMS lion, also using okapi
... with the preparation for rome

phil: it is now on our critical path for our implementation
... david said he would have a prototype a few weeks ago
... even if there is nothing final
... even if we would have a rough direction
... e.g. yves said that with xliff 1.2, he would use mrk markup
... even if we had directions what is easily acceptable
... otherwise it could hold up my implementation

yves: the xliff 1.2 mapping is what we used for implementations
... most of the time it made sense
... we have tackled some of the standoff stuff
... it is also in the git repository (for okapi, scribe assumes)?

<Yves_> yes

phil: provenance, loc quality issue and loc quality rating are relevant for us here

<Yves_> Location: http://code.google.com/p/okapi/source/list?name=html5

phil: Yves' page for 1.2. we can certainly use that as our direction

dave: will talk to david tomorrow about that

phil: tx

New value for localization quality type "conformance"

<daveL> scribe: daveL

felix: asks if anyone has further thoughts on, or support for, this new type

Regular expression change

felix: no responses yet

shaun: no update on this

<fsasaki> ACTION: shaun to work on regex for validating regex subset proposal [recorded in http://www.w3.org/2013/01/16-mlw-lt-minutes.html#action02]

<trackbot> Created ACTION-385 - Work on regex for validating regex subset proposal [on Shaun McCance - due 2013-01-23].

Disambiguation and term

felix: has been discussed in response to christian comment
... any further comments

marcis: what is the goal?

felix: christian suggested merging term and disambig data categories
... but the response was that both have distinct use cases, that they could be merged but are valid individually

marcis: would not want to drop data category, term is easier to implement and purpose is clear
... not so clear on disambiguation category, in terms of what is possible to do with this
... for example there may be other types that might be useful in the disambiguation use case
... and doing term management with disambig would make it very heavy
... so there might need to be more attributes specifically for named entities
... referencing input from W3C India received today

tadej: motivation for separate data category was because it covered some use cases that fell out of the scope of terminology
... by providing some additional context
... but do see that there is some commonality
... Also term must remain to keep compatibility with terminology in ITS1

jörg: still in favour of having the two data categories

... since disambiguation can cover many other tasks in content or NLP processing
... whereas term is more specific

pedro: the sort of text we mark up is different in both cases so it makes sense to keep the distinction

tadej: agree granularities are quite limiting; or should we have more identifiers to support this?

... but this might be more complicated

jorge: yes this would be more complicated, clearer as it is

<fsasaki> http://tinyurl.com/its20-testsuite-dashboard

felix: christian will dial in to f2f to discuss this and resolve the topic next week
... we also need to consider number of implementations, which are not so many, when considering any possible merger

Des: agree with jorge, keep them separate as they are distinct use cases

jorge: clarified, attributes as defined currently are clearer than making them more fine grained

felix: reminds that the W3C process requires responding to comments, which involves some work

<Yves_> could we talk about annotatorsRef https://www.w3.org/International/multilingualweb/lt/track/issues/71 a bit during this call?

felix: replying to a question from Dave: the current number of comments received is good

annotatorsRef

yves: two data categories, prov and locqualiss, can have information from multiple annotators, but we have no way of expressing this with annotatorsRef
... for the current implementation, we assume the most recent annotator is the correct one, but this is not ideal
... provenance especially has multiple items and requires annotatorsRef

<fsasaki> daveL: will look into this thread

<scribe> scribe: daveL

provenance record ordering

phil: let's talk about the ordering of provenance records

<Yves_> provenance data category https://www.w3.org/International/multilingualweb/lt/track/issues/72

<fsasaki> http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2013Jan/0090.html

<Arle_> I am back on the call.

<fsasaki> http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2013Jan/0061.html

<fsasaki> http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2013Jan/0066.html

felix: this was a discussion of whether there was any implication between ordering and time of record

<fsasaki> http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2013Jan/0055.html

<fsasaki> (mails related to the discussion)

phil: asks whether the lack of a date stamp is intentional

<fsasaki> daveL: a date stamp was discussed

<fsasaki> .. there are two aspects:

<fsasaki> .. a lot of original requirements didn't have a strong need for a time stamp

<fsasaki> .. the original requirement was about identification that is rich enough so that we can differentiate

<fsasaki> .. see e.g. "agent provenance" that used to include that

<fsasaki> .. the 2nd aspect:

<fsasaki> .. we discussed whether the order in which the provenance records are added is significant

<fsasaki> .. but from an implementation point of view it is again complicated

<fsasaki> .. and there hadn't been much of a call for this during requirements gathering

<fsasaki> .. "time" also has various aspects: start of a translation, finish, duration, ...

<fsasaki> .. it is also a point that the provenance wg in w3c had addressed

<fsasaki> .. so we just provide identifiers of who made the translation and revision

<fsasaki> .. for knowing more there is the provenance model

<fsasaki> .. more = more about time

<fsasaki> .. so in summary, there was no big requirement to have a time stamp

<fsasaki> .. and *if* you want to do that, you can use the w3c prov model

<fsasaki> .. I'll reply to that mail thread

<fsasaki> pablo: I think provenance can stay as is

<fsasaki> .. adding a time stamp can be useful and interesting - if every implementer is fine with that i'm fine too

<scribe> scribe: daveL

felix: adding a timestamp is a substantive change and would require another call, plus tests etc

Test suite

<fsasaki> http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2013Jan/0090.html

felix: be aware that from this week on people should stop using the google docs and update the test suite master themselves

<fsasaki> http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012Dec/0087.html

felix: we still need some input on tests related to assertions (MUSTs), which need suggestions for tests

prague f2f

<fsasaki> http://www.w3.org/International/multilingualweb/lt/wiki/PragueJan2013f2f

<fsasaki> http://www.w3.org/International/multilingualweb/lt/wiki/PragueJan2013f2f#Objectives

felix: thanks to jirka for organising this

<fsasaki> http://www.w3.org/International/multilingualweb/lt/wiki/PragueJan2013f2f#Participants

jirka: if you are not yet registered, please do so asap. The number of people needs to be known for wifi etc.

felix: also need to know in advance when people want to dial in for organising the agenda

<fsasaki> http://www.w3.org/International/multilingualweb/lt/wiki/PragueJan2013f2f#Objectives

felix: going through objectives

<fsasaki> http://www.w3.org/International/multilingualweb/lt/wiki/Use_cases_-_high_level_summary

felix: in particular the relationship between the different posters and links to where people can access them and update high level summary, adding any new use cases

<fsasaki> daveL: some time to discuss preparing EU project review?

felix: also brainstorm on activities for rest of year and new projects and synergy between them
... the Rome preparation should cover that.

<fsasaki> scribe: fsasaki

<omstefanov> I will not be able to take part in the Prague f2f, but definitely intend to come to Rome, so please make sure preps for Rome are recorded in writing

xliff mapping implementation update (with David on the call)

david: phil asked on that, we got good comments from xyz
... status of xliff mapping - only written piece is xliff mapping wiki

<dF> http://www.w3.org/International/multilingualweb/lt/wiki/XLIFF_Mapping

david: will work on this today, yesterday / today was EC deadline
... we should publish this as a note / PC
... what is the editorial setup for such a note?
... we will need an additional namespace itsx

felix: update on implementation prototype?

david: solas is consuming ITS2 categories
... like OKAPI does
... that is being tested as part of the test suite
... that is consumed by various components of solas architecture
... one is an MT broker
... works with different MT systems
... depends on the MT systems whether they can deal with ITS metadata
... moravia is contributing to that
... m4loc can be used as middleware
... in our current prototype the mt service exposes the m4loc service
... from the deliverable - open source xliff roundtrip
... the okapi filter interprets the ITS decoration
... then the mapping in the wiki is used
... it is consumed by a middleware open source component

felix: would be good to see a demo

david: will do, in prague and in rome

metadata harvesting

ankit: we are waiting for some sort of data from cocomore

felix: what data?

ankit: we said that cocomore would provide us with annotated data

ankit will provide module by prague f2f

pedro: will have annotated data from spanish client
... client is the spanish gov tax office
... they will annotate with ITS metadata for this show case
... spanish content in HTML5
... we will generate english content
... and annotate it in the output of the real time system

felix: so ankit could later use the data to test the module?

ankit: training data is as much as you can get

pedro: annotated data from cocomore is html content
... we will generate content in chinese and french
... so ankit can take chinese, french and german into account in his system
... and spanish
... this will be german to english, german to french, german to chinese, german to spanish

<Pedro> Showcase WP3 (Cocomore-Linguaserve) is German to Chinese and German to French

<Clemens> right!

<Pedro> Showcase WP4 (Linguaserve-Lucy-DCU) is the full demo Spanish to English, and partial demo Spanish to French and Spanish to German

thanks to everybody for staying longer, meeting adjourned
