MLW workshop - breakout session feedback

22 Sep 2011


See also: IRC log


Jaap van der Meer
Tadej Štajner

This is the raw scribe log for the group reports for the Open Space discussion on day two of the MultilingualWeb workshop in Limerick. The log has not undergone careful post-editing and may contain errors or omissions. It should be read with that in mind. It constitutes the best efforts of the scribes to capture the gist of the talks and discussions that followed, in real time. IRC is used not only to capture notes on the talks, but can be followed in real time by remote participants, or participants with accessibility problems. People following IRC can also add contributions to the flow of text themselves.

See also the log for the first day.


  1. Standardization
  2. Translation Container Standard
  3. LT-Web
  4. Multilingual Social Media
  5. Best practices
  6. Conclusions
  7. Wrap-up


Standardization

Gerhard: Decided to draft a Wikipedia article on the topic
... The idea is to have a meta-article that refers to and summarizes existing articles on i18n standards
... In parallel, publishing the meta-article on a Tiki Wiki CMS

... also, including training information: tutorials, best practices
... many different communities are interested in the same standard, but from different perspectives
... a wiki article could connect these interest groups
... the standards scene is fragmented, with a lack of coordination and communication.

... We increasingly employ joint meetings.

... we would have websites that implement this standard and simply use it as an example.
... Given that here we have representatives from many working groups, it would be a good idea to do something concrete that has immediate impact on people, e.g. wiki article.

... we have started collecting addresses from people to start this; we expect many of you in this room to join.
... it's an open group, so anyone with good ideas is invited to join, provided that there is execution.

Olaf: if anyone wants to join, contact me or Gerhard.

SebastianHellman: instead of maintaining a separate org for this initiative, we could employ Wikipedia's WikiProject infrastructure.
... there's also MetaWiki, which coordinates action across different language Wikipedias.

<r12a> contact for this report: Gerhard Budin

Translation Container Standard

Christian: translation tools create packages, which are not always interoperable.
... so, what should a container format look like?

... In practice, translation seems to include a lot of files being e-mailed around.

... we want to achieve automation and interop.
... we have discussed many of these issues. Some of the issues were addressed in LINPORT.
... Workflow issues: 1) can I send you something and you can immediately use it?
... This was the focus of LINPORT - we focused on containers, but did not work on the concrete content formats.
... Standards are often too narrowly focused in their use cases. InteroperabilityNow was slightly broader, LINPORT even more so.
... 2) can we merge all of the efforts into one?
... At the least, we want to avoid overlapping development of the same functionality.
... We can't just focus on being a translation-focused project (LINPORT), we need to look for broader scope.

ManuelThomas: There's an open symposium next week, and also everything LINPORT publishes is open.

Nico van de Water: do the uses include thousands of independent translators? MT is not widely implemented, especially at the individual translator level, where a lot of work takes place.

Answer: That's true, independent translators don't have the bandwidth to participate in these contributions.

ManuelThomas: There's more to the independent translators: the ML electronic dossier takes them strongly into account
... two main cases are relevant in practice: fully manual translation and low-cost tools.

<r12a> Contact point for summary review: Tomas CB


LT-Web

fsasaki: The scenarios the LT-Web project wants to tackle: 1) deep web localization, 2) making content available for MT, 3) using CMS content for MT training.
... We got feedback from Monica from the ISO perspective, emphasizing importance of test suites and reference impls.
... Industry input also says that it will be hard to convince use case partners to use our categories instead of theirs.
... This will be addressed by carefully constructing categories that are general enough to achieve alignment.
... The conclusion is that we shouldn't focus too much on developing new standards for MLW, but rather on describing best practices for implementing these connectors, developing test suites that people with their own implementations can test against, and supplying reference implementations.
... Another idea was to focus on very specific pieces of content, namely HTML, CSS, JavaScript and XML-based content in various applications.
... Everything will take place as a W3C process.
... Even if we narrow the scope of categories, we should still define a general business process.

r12a: In ITS development, one of the goals was not to describe how people would use it; we would just define the concepts.
... Implementation was more open-ended.
... a data concept is a concept, while implementation was left to concrete use cases.

fsasaki: We don't tackle the details of what the exact code or format will look like, but focus more on concepts and on how they map to various organization-dependent concepts.
... Around HTML, there's no convergence on the concept of translatability; there are many different implementations.

<r12a> Contact point for summary review: Felix

<fsasaki> will just paste my summary into IRC now ....

<fsasaki> - NIF format (Sebastian): annotated (linked open) data can contribute to test suites, if NIF (meta) data is close to what is used in the localization area. That can be an input to the test suites that need to be developed for MLW-LT. - SDL (Matthias Heyn): all three scenarios are interesting: deep web <> LSP, surface web <> real time MT, deep web <> MT training. Details are important. - ISO (Monica): reference implementations, test suites good idea. People who w

Multilingual Social Media

Timo: One important topic was crowdsourcing for translation, emphasizing the Facebook use case

... Half a million people are participating in choices of terminology for 75 languages.

... Important criteria are speed, cost, quality, trustworthiness.
... Lionbridge said that a lot of trustworthiness comes from the fact that crowdsourced content has a local feel.
... People have many motivations to participate: the chance to make decisions, peer recognition, seeing their contributions become visible, national pride.
... Hybrid approaches of professional + crowdsourced localization are also feasible and practical.
... There are also two cases for cultural differences: design of the product and interaction with user-generated content.
... For instance, family relationships are not always directly translatable, since they mean different things in different cultures.
... Multilingual mining of social media can be very useful for marketing analytics.

Question: How to actually deal with the millions of contributions in terms of QA?

Ghassan: We deal with this thing very concretely - it's an important issue.

Arle: That looks like an opportunity for describing best practices for implementing crowdsourcing QA.

Timo: If there are many people in a local community, mistakes get discovered sooner or later.
... on the other hand, pro translators make correct translations, but may not adapt to the market completely.
... A rule of thumb: if a use of a term is shared by a large number of people, it's preferred over the official translation.

<r12a> Contact point for summary review: Timo

Best practices

Problem 1: Accessibility

Often the practices are very complex and time consuming, so developers often accomplish only the first level of guidelines.

scribe: One suggestion is to have an annual selection of "best website of the year" in order to promote this.
... Many differences between original text and target accessible text.
... Another problem: how to deal with clients who want a "rainbow" progress bar.
... How to localize sign languages?
... One solution is to embed videos, but that opens a new set of problems
... Another topic was multilingual education websites.
... Here the content is important and should be adapted to the audience, e.g. children. For instance, larger fonts, images, appropriate vocabulary complexity.
... There are institutional websites: they are not user-centric but more focused on themselves.
... Often the case is that the client institution provides a document of text which gets uploaded to the website.
... The second problem is the language policy of institutional websites. In EU institutions, they should reach 25 countries, but the content is often available only in four languages (or even English only).

Question: Since 2008, things have changed a lot. One of the suggestions was to look at the guidelines and come up with a simplified version with a lower barrier to entry.

Cristina: There is a need to bridge experts in different fields. There was agreement in our group that it needs to involve developers, translators, localizers, designers, as well as a need for further training.

Question2: The suggestion of inclusion of sign language videos is great, especially combined with subtitles for screen readers.
... Eventually we could establish a single body of diverse participants on the model of the German organization (T-COM ?)

r12a: Guidelines are often based around checklists, which is convenient for QA, but not good for designers - it's better to warn designers while they're actually designing. WAI is starting a new EC-funded project which may provide an opportunity to input your concerns and get them addressed.

Question3: Sometimes accessibility can be considered as another element to localize. We wanted to promote more communication across the whole web design process.

ManuelTomas: Don't reinvent the wheel with methodologies. In the case of EU institutions, since there is no standard for multilingual websites, large ML websites are often expensive to maintain.

Christian: In Germany there are actually awards for accessible websites.
... Another interesting thing was that we are entering the mobile world. If you provide mobile-ready content, you almost automatically get accessibility.


Conclusions

fsasaki: Some topics didn't get through, like what to put on research agendas.
... there's a public mailing list for these purposes.

r12a: Everyone here is on the announcement list, where we will promote the public discussions list.

Olaf: Each of the breakout groups generated starting points, but they need to keep going. There are contact points, which is good. Is there a host which can provide infrastructure for these groups to be active?
... Interactions on wikis need constant prodding and pushing; they need to reach a certain point before they actually grow by themselves.

r12a: We could schedule a follow-up at the next workshop.

Question: Can we use the MLW project website for this?

r12a: Of course, along with the MLW mailing lists.

Olaf: Should the five groups report at the next workshop? (many raise hands)
... What about moving the half-day session to the middle of the event? There could be more follow-ups at the same event.
... (many raise hands)

GunnarBittersmann: it would also be good because it's hard to follow talks by the end of the day.

fsasaki: A lot of activities will be continued in the scope of the LT-Web project.


Wrap-up

r12a: The open discussion format seems to be a good idea, and will most likely continue at the next workshop.
... We are done, thanks for attending, there will be videos, as well as notes from the presentations.

[End of minutes]

Minutes formatted by David Booth's scribe.perl version 1.136 (CVS log)
$Date: 2011/09/27 09:03:38 $