MLW-LT Working Group

09 Mar 2012

See also: IRC log


Present: aaron, carina, declan, fsasaki, Mihael, Milan, Moritz, Nicoletta (on IRC), Phil, Tadej, doates
Regrets: daniel, jirka, Yves, Jan



david: going through the list of participants in the group

current state of the participant list: https://www.w3.org/2000/09/dbwg/details?group=53116&public=1

michael from DERI introducing himself

michael: working at DERI, focusing on the Semantic Web
... dealing with ontologies
... question answering, multilingual ontology generation

pedro: from Linguaserve, a localization service provider
... working on multilingual solutions:
... e.g. online translation systems
... localization chain interoperability and web services
... participating here for three reasons:
... 1) providing input to the standard definition
... 2) practical experiences from multilingual processing
... 3) specific work package contribution

david: for people who are not in the EU project: the work packages are the reference implementations that the EC-funded group is developing

see http://www.w3.org/International/multilingualweb/lt/wiki/Main_Page#Deliverables.2C_WPs

david: some of the reference implementations will be open source

aaron: from Opera, nice to be here
... I am localization coordinator for Opera
... my interest is the localization of web properties
... managing the relationship between browsers and users
... currently handled differently for each browser
... we have developed ways of managing translations with providers and volunteers
... I'm here to learn more about best practices

<dF> Felix, do we have a public list for the LUX f-2-f

aaron: ... so that we can increase efficiency

link for Luxembourg meeting: http://www.w3.org/International/multilingualweb/lt/wiki/LuxembourgMarch2012

nc, do you want to post a brief introduction here on IRC?

giuseppe: development director of Linguaserve, I'm from Italy

david: describing the role of XLIFF, the main format for localization roundtripping
... we also have relationships with other TCs such as the OASIS XLIFF TC
... and with the Unicode Consortium, ETSI etc.
... I am working at the University of Limerick
... research in standardization related to localization

declan: I work at DCU
... European projects such as EuroMatrixPlus
... PANACEA (for data acquisition)
... CoSyne (multilingual synchronization of wikis)
... background is mostly in MT
... my role in MLW-LT is about metadata for machine translation training
... and work package 4 about online MT systems

milan: from Moravia Worldwide, another LSP in the group
... helping develop reference implementations related to XLIFF roundtripping

david: it is too early in the day for ENLASO

phil: CTO of VistaTEC, also an LSP
... based in Dublin, Ireland
... we do a lot of process automation
... in MLW-LT, we want to improve the automation of decision making, based on metadata
... excited about the potential for improving our processes
... we are also a member of CNGL

(Centre for Next Generation Localisation)

felix: working for W3C as a DFKI fellow, coordinating the underlying EU project, and co-chairing with David and Dave
... background in the W3C Internationalization (i18n) Activity; bringing in existing standardization work here (ITS) and ensuring the relation to HTML5 etc.

doates: from Adobe

<doates> A brief intro from me as requested: I'm a localization architect working within the globalization group at Adobe. I've been in that group for 14 years, working on various products and tools. I am now the architect of our internal localization platform and am keen to help shape the definition of this group, help bring it to a point where it is suitable to drive back into our internal machinery, and also push it into our products.


doates, can you type a short self intro into IRC?

tadej: from JSI (Jožef Stefan Institute)
... working on NLP, data mining, semantic web
... our role is to be a provider of text analytics tools
... that provide metadata for existing content
... and to use that for other processes in the localization or other pipelines

david: anyone else on the call?

moritz: from Cocomore, an agency for communication and technology
... joining to work on multilingual CMS solutions

moritz: we also want to evangelize the metadata
... our metadata work will focus on Drupal

about carina (introduced by moritz): also at Cocomore

scribe: will also join us

about the group

David: the EU project started in January 2012
... the W3C Working Group just started on 7 March

(see press release at http://www.w3.org/Press/Releases-2012#x2012-mwlt )

david: 13 members of the group belong to the LT-Web consortium; LT-Web is the EC working title of the project
... the W3C Working Group is larger and works under the W3C IPR policies
... the EU project has some additional obligations, e.g. work on the standard and test suites
... and the development of reference implementations
... see the deliverables list at http://www.w3.org/International/multilingualweb/lt/wiki/Main_Page#Deliverables.2C_WPs

first f2f meeting

david: not a formal f2f meeting, since it could not be announced in time
... still, the meeting in Luxembourg is open to everybody

see http://www.w3.org/International/multilingualweb/lt/wiki/LuxembourgMarch2012

david: meeting is open for all participants

felix: what about other people joining the meeting?

david: should be OK, but let's address this offline
... other meetings planned: Dublin meeting in June

<Carina> looking forward to Dublin


david: the Dublin meeting will be a requirements-gathering meeting; we hope to get a lot of feedback

david going through the WG homepage

see http://www.w3.org/International/multilingualweb/lt/#feedback for questionnaire about requirements


david: please fill in the questionnaire http://www.w3.org/2002/09/wbs/1/mlw-lt-requirements/
... and let other stakeholders know about it and see if they can fill it in



david: you can still add points of interest to the agenda

felix: please also have a look at the pre-read materials


david: we had a good opening call, looking forward to meeting you regularly, 95 times

felix: we will need an additional meeting slot for people in the US

felix: continuing now with work package specific discussion; everybody can stay on the call.

this call adjourned; continuing with WP discussions

WP 5 discussion (EU project specific, but open to others too)

tadej: interested in how the MT workflow will use metadata
... declan said in the Berlin meeting that there are some opportunities

<dF> Declan: what we could handle just now

<dF> Declan: priorities

declan: types of metadata for MT training: domain, terminology, do-not-translate items

<dF> .. 1. terms

declan: and metadata related to the training process

<dF> .. 2. do not translate

@DF, happy to continue scribing

<dF> .. 3. domain-related metadata

declan: the above would be most useful for MT training

<dF> OK :-)

tadej: for terminology, we can contribute automatic terminology generation
... for "do not translate" it depends on the target language

declan: domain information is about the topic, e.g. chemistry, biology etc.
... but also stylistic
... e.g. whether the text is formal etc.
... e.g. the content could come from someone dealing with patents
... we use this to categorize training materials

tadej: sounds very useful
... our existing tools can analyze topic information
... but not genre information yet

david: in the LetsMT! project, things like that were done
... there was a challenge in organizing training data
... the result was multidimensional
... the main interest was on the side of the SMT builders
... they are interested in data; metadata is nice to have
... but building the training corpus needs something else
... e.g. clustering
... if a corpus is really big
... you may need to use metadata for clustering
... but clustering can also depend on things other than metadata

declan: the issue is granularity
... on top of a baseline you would want to use more specific models
... you would use more domain-specific data, identified by metadata
... and cascade the training
... in an SMT system you can have multiple translation models and language models
... we assume that domain specific models can help the quality

david: true, but doesn't "domain-specific" cut across metadata categories?

declan: for general clustering that might be the case
... but you could use a number of domains to represent a particular domain

tadej: what granularity are you looking for? Coarse-grained?

declan: yes, higher level

milan: granularity also depends on amount of data

declan: do you have example of ontology you use, Tadej?

tadej: yes
... we are using DMOZ
... the Open Directory Project ontology
... has about 1 million categories
... hierarchical
... covers a lot of domains

<tadej> http://www.dmoz.org/

declan: want to have a granularity that MT systems can make use of

tadej: so maybe only top level is useful

<tadej> http://enrycher.ijs.si/

declan: yes

tadej: for Enrycher, you can see the categories we are providing

declan: great, we could use that as a starting point
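Illustration (not discussed on the call): a minimal sketch of the "only the top level is useful" idea, assuming each training segment carries a DMOZ-style category path such as "Top/Science/Chemistry". The path format, the "Top" root, and the function names top_level_domain and group_by_coarse_domain are hypothetical; the sketch only collapses fine categories to coarse domains and buckets segments per domain, it does not reflect how Enrycher or any MT system actually exchanges this metadata.

```python
from collections import defaultdict


def top_level_domain(category_path: str) -> str:
    """Collapse a hierarchical path like 'Top/Science/Chemistry' to the
    first level below the root ('Science'). The 'Top' root and the '/'
    separator are assumptions modelled on DMOZ-style paths."""
    parts = [p for p in category_path.split("/") if p]
    if parts and parts[0] == "Top":
        parts = parts[1:]
    return parts[0] if parts else "Unknown"


def group_by_coarse_domain(segments):
    """Group (text, category_path) pairs into coarse-grained buckets, each of
    which could feed a separate domain-specific training corpus."""
    buckets = defaultdict(list)
    for text, path in segments:
        buckets[top_level_domain(path)].append(text)
    return dict(buckets)


if __name__ == "__main__":
    # Hypothetical annotated segments; the category paths are illustrative only.
    segments = [
        ("The reaction yields an ester.", "Top/Science/Chemistry/Organic"),
        ("Cells divide by mitosis.", "Top/Science/Biology/Cell_Biology"),
        ("The claim defines a fastening device.", "Top/Society/Law/Intellectual_Property/Patents"),
    ]
    for domain, texts in group_by_coarse_domain(segments).items():
        print(domain, len(texts))
```

Keeping only the first level trades precision for enough data per bucket, which matches milan's point that usable granularity depends on the amount of data.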

kimmo: any practical questions for me at the moment?

nothing specific for kimmo at the moment

declan: the deliverables are due in month 15, but the standard is due in month 21
... we will adjust our deliverable for the final standard in month 21
... we will stay within the allocated person-months (PM), but divide the time between the start and the end
... so we still do what is in the DoW, but adhere to the standardization process

moritz: there should not be much that we have to change

declan: agree
... so move deliverables to month 21 or have two parts
... any other questions about the WP5 work plan?

declan, fine by me

<scribe> ACTION: declan to adjust the WP plan for WP 5 [recorded in http://www.w3.org/2012/03/09-mlw-lt-minutes.html#action01]

moritz: it would be good to have some background material about the localization chain

declan: I'll send you some material

moritz: great, thanks
... two main workflows that we could use from Drupal: XLIFF or HTML(5)

declan: usually we do a lot of batch processing, offline
... we have done work with XLIFF and TMX and other formats
... we are open to whatever is easiest and most flexible
... to define standard ways of input

moe, let's have the call with the whole group, so that others can jump in

declan: so that's all for WP5 discussion so far
... anything else?


moritz: nothing specific to discuss


felix: propose to raise Yves's proposal about a separate module for getting localizable content out of Drupal at this call
... but not in detail now; discuss it later with Yves on the list or on a call

<scribe> ACTION: moritz to trigger the discussion with Yves on the public mailing list and or a regular call about Yves's issue [recorded in http://www.w3.org/2012/03/09-mlw-lt-minutes.html#action02]

david: Yves reported that extraction is hard; that was expected
... the issue of getting all localizable content into the cycle has been addressed by the XLIFF module

that's all for me for WP3, I have no further items





<Declan> It may be worth compiling a master planning chart for the WPs (in a similar format to the plans for WP3, WP5 that I've seen already) so that we can clearly see the inter-dependencies between the deliverables across WPs


<scribe> ACTION: felix to check excel upload in the w3c side [recorded in http://www.w3.org/2012/03/09-mlw-lt-minutes.html#action03]

draft agenda for Luxembourg

moritz: agenda needs times in it

<scribe> ACTION: felix to add times to luxembourg agenda [recorded in http://www.w3.org/2012/03/09-mlw-lt-minutes.html#action04]

meeting adjourned

Summary of Action Items

[NEW] ACTION: declan to adjust the WP plan for WP 5 [recorded in http://www.w3.org/2012/03/09-mlw-lt-minutes.html#action01]
[NEW] ACTION: felix to add times to luxembourg agenda [recorded in http://www.w3.org/2012/03/09-mlw-lt-minutes.html#action04]
[NEW] ACTION: felix to check excel upload in the w3c side [recorded in http://www.w3.org/2012/03/09-mlw-lt-minutes.html#action03]
[NEW] ACTION: moritz to trigger the discussion with Yves on the public mailing list and or a regular call about Yves's issue [recorded in http://www.w3.org/2012/03/09-mlw-lt-minutes.html#action02]
[End of minutes]

Minutes formatted by David Booth's scribe.perl version 1.136 (CVS log)
$Date: 2012/03/09 11:42:21 $