IRC log of mlw on 2011-09-22

Timestamps are in UTC.

10:22:29 [RRSAgent]
RRSAgent has joined #mlw
10:22:29 [RRSAgent]
logging to
10:22:41 [tadej]
... also, including training information: tutorials, best practices
10:23:01 [tadej]
... many different communities interested in the same standard, but from different perspectives
10:23:19 [tadej]
... a wiki article could connect these interest groups
10:23:48 [tadej]
... the standards scene is fragmented, lack of coordination and communication.
10:24:07 [r12a]
contact for this report: Gerhard Budin
10:24:08 [tadej]
... We increasingly employ joint meetings.
10:24:29 [fsasaki]
fsasaki has joined #mlw
10:24:44 [tadej]
Gerhard: we would have websites, that implement this standard and simply use it as an example.
10:24:54 [tadej]
s/, that/ that/
10:25:47 [tadej]
Gerhard: Given that here we have representatives from many working groups, it would be a good idea to do something concrete that has immediate impact on people, e.g. a wiki article.
10:25:52 [fsasaki]
meeting: mlw workshop - breakout session feedback
10:25:58 [fsasaki]
chair: jaap
10:26:01 [fsasaki]
scribe: tadej
10:26:29 [fsasaki]
topic: breakout session feedback
10:26:36 [fsasaki]
present: richard, mlw, manyFolks
10:26:38 [tadej]
Gerhard: we have started collecting addresses from people to start these, we expect many of you in this room to join.
10:26:40 [RRSAgent]
I have made the request to generate fsasaki
10:27:34 [tadej]
... it's an open group, so anyone with good ideas is invited to join, provided that there is execution.
10:27:54 [tadej]
Olaf: if anyone wants to join, contact me or Gerhard.
10:29:07 [tadej]
SebastianHellman: instead of maintaining a separate org for this initiative, we could employ Wikipedia's WikiProject infrastructure.
10:29:26 [tadej]
... there's also MetaWiki, which coordinates action across different language Wikipedias.
10:29:52 [tadej]
topic: Translation Container Standard
10:30:40 [tadej]
Christian: translation tools create packages, which are not always interoperable.
10:30:55 [tadej]
... what should a container format look like?
10:31:55 [tadej]
In practice, translation seems to include a lot of files being e-mailed around.
10:32:03 [tadej]
... we want to achieve automation and interop.
10:32:23 [tadej]
... we have discussed many of these issues. Some of the issues were addressed in LINPORT.
10:32:52 [tadej]
... Workflow issues: 1) can I send you something and you can immediately use it?
10:33:20 [tadej]
... This was the focus of LINPORT - we focused on containers, but did not work on the concrete content formats.
10:34:13 [tadej]
... Standards are often too narrowly focused in their use cases. InteroperabilityNow was slightly broader, LINPORT even more so.
10:34:35 [tadej]
... 2) can we merge all of the efforts into one?
10:34:59 [tadej]
... At the least, we want to avoid overlapping development of the same functionality.
10:35:57 [tadej]
... We can't just focus on being a translation-focused project (LINPORT), we need to look for a broader scope.
10:36:41 [tadej]
ManuelTomas: There's an open symposium next week, and also everything LINPORT publishes is open.
10:37:42 [tadej]
Question: do the uses include thousands of independent translators? MT is not widely implemented, especially at the individual translator levels where a lot of work takes place.
10:38:08 [tadej]
Answer: That's true, independent translators don't have the bandwidth to participate in these contributions.
10:38:52 [r12a]
s/Question/Nico de Water/
10:39:41 [tadej]
ManuelTomas: There's more to the independent translators: the ML electronic dossier takes them strongly into account
10:39:59 [tadej]
... there are two main cases relevant in practice: fully manual translation and low-cost tools.
10:40:29 [tadej]
Topic: LT-Web
10:40:31 [r12a]
Contact point for summary review: Tomas CB
10:41:18 [tadej]
fsasaki: The scenarios the LTWeb project wants to tackle. 1) deep web localization 2) making content available for MT 3) making CMS contents for MT training.
10:42:00 [tadej]
... We got feedback from Monica from the ISO perspective, emphasizing importance of test suites and reference impls.
10:42:40 [tadej]
... Industry input also says that it will be hard to convince use case partners to use our categories instead of theirs.
10:43:01 [tadej]
... This will be addressed by carefully constructing categories that are general enough to achieve alignment.
10:44:24 [tadej]
... The conclusion is that we shouldn't focus too much on developing new standards for MLW, but on describing best practices for implementing these connectors, developing test suites that people with their own implementations can test against, as well as supplying reference implementations.
10:45:04 [tadej]
... Another idea was to focus on very specific pieces of content, namely HTML and XML based content in various applications.
10:45:20 [tadej]
... Everything will take place as a W3C process.
10:45:47 [tadej]
... Even if we narrow the scope of categories, we should still define a general business process.
10:46:26 [tadej]
r12a: In ITS development, one of the goals was not to describe how people would use it, we would just define the concepts.
10:46:38 [tadej]
... Implementation was more open-ended.
10:47:07 [tadej]
... a data concept is a concept, while implementation was left to concrete use cases.
10:47:45 [tadej]
fsasaki: We don't tackle the details of what the exact code or format will look like, but focus more on concepts and on how they map to various organization-dependent concepts.
10:48:13 [tadej]
fsasaki: Around HTML, there's no convergence on the concept of translatability, many different implementations.
10:48:30 [tadej]
topic: Multilingual Social Media
10:48:52 [fsasaki]
will just paste my summary into IRC now ....
10:48:54 [fsasaki]
- NIF format (Sebastian): annotated (linked open) data can contribute to test suites, if NIF (meta) data is close to what is used in localization area. That can be an input to the test suites that need to be developed for MLW-LT. - SDL (Matthias Heyn): all three scenarios are interesting: deep web <> LSP, surface web <> real time MT, deep web <> MT training. Details are important. - ISO (Monica): reference implementations, test suites good idea. People who w
10:49:31 [tadej]
One important topic was crowdsourcing for translation, emphasizing the Facebook use case
10:49:51 [r12a]
felix, are you my contact for summary of your discussion ?
10:49:53 [tadej]
s/One important/Timo: One important/
10:50:22 [tadej]
Timo: a half a million people are participating in choices of terminology for 75 languages.
10:50:52 [tadej]
Timo: Important criteria are speed, cost, quality, trustworthiness.
10:51:28 [fsasaki]
s/namely HTML/namely HTML, CSS, JavaScript/
10:51:34 [RRSAgent]
I have made the request to generate fsasaki
10:51:35 [tadej]
Timo: Lionbridge said that a lot of trustworthiness comes from the fact that crowdsourced content has a local feel.
10:52:22 [tadej]
... People have many motivations to participate: the chance to make decisions, peer recognition, seeing their contributions become visible, national pride.
10:52:56 [tadej]
... Hybrid approaches of professional + crowdsource localization are also feasible and practical.
10:53:27 [tadej]
... There are also two cases for cultural differences: design of the product and interacting with the user generated content
10:54:04 [tadej]
... For instance, family relationships are not always directly translatable, since they mean different things in different cultures.
10:54:35 [tadej]
... Multilingual mining of SM can be very useful for marketing analytics.
10:55:15 [RRSAgent]
I have made the request to generate fsasaki
10:55:29 [tadej]
Question: How to actually deal with the millions of contributions in terms of QA?
10:56:18 [tadej]
Ghassan: We deal with this thing very concretely - it's an important issue.
10:56:38 [tadej]
Arle: That looks like an opportunity for describing best practices for implementing crowdsourcing QA.
10:57:26 [tadej]
Timo: If there are many people in a local community, mistakes get discovered sooner or later.
10:57:55 [tadej]
... on the other hand, pro translators make correct translations, but may not adapt to the market completely
10:58:28 [tadej]
Timo: A rule of thumb: if a use of a term is shared by a large number of people, it's preferred over the official translation.
10:59:10 [tadej]
Contact point for future action: Timo.
10:59:24 [tadej]
topic: Best practices
10:59:41 [tadej]
Problem 1: Accessibility
11:00:23 [tadej]
Often the practices are very complex and time-consuming, so developers often accomplish only the first level of guidelines.
11:00:31 [RRSAgent]
I have made the request to generate fsasaki
11:00:57 [tadej]
... One suggestion is to have an annual selection of "best website of the year" in order to promote this.
11:01:16 [tadej]
... Many differences between original text and target accessible text.
11:01:46 [tadej]
... Another problem: how to deal with clients which want a "rainbow" progress bar.
11:01:57 [tadej]
... How to localize sign languages?
11:02:13 [tadej]
... One solution is to embed videos, but that opens a new set of problems
11:02:27 [tadej]
... Another topic was multilingual education websites.
11:02:58 [tadej]
... Here the content is important, and should be adapted to the audience, e.g. children. For instance, larger fonts, images, appropriate vocabulary complexity.
11:03:24 [tadej]
... There are institutional websites: they are not user-centric but more focused on themselves.
11:03:46 [tadej]
... Often the case is that the client institution provides a document of text which gets uploaded to the website.
11:04:37 [tadej]
... The second problem is the language policy of institutional websites. In EU institutions, they should reach 25 countries, but the content is often available only in four (or even English only).
11:04:45 [RRSAgent]
I have made the request to generate fsasaki
11:05:23 [RRSAgent]
I have made the request to generate fsasaki
11:05:57 [tadej]
Question: Since 2008, things have changed a lot. One of the suggestions was looking at the guidelines and coming up with a simplified version with a lower barrier to entry.
11:07:07 [tadej]
Cristina: There is a need to bridge experts in different fields. There was agreement in our group that it needs to involve developers, translators, localizers, designers, as well as a need for further training.
11:07:40 [tadej]
Question2: The suggestion of inclusion of sign language videos is great, especially combined with subtitles for screen readers.
11:08:57 [tadej]
... Eventually we could establish a single body of diverse participants on the model of the German organization (T-COM ?)
11:09:54 [tadej]
r12a: Guidelines are often based around checklists, which is convenient for QA, but not good for designers - it's better to warn designers while they're actually designing.
11:10:47 [tadej]
Question3: Sometimes accessibility can be considered as another element to localize. We wanted to promote more communication across the whole web design process.
11:11:59 [tadej]
ManuelTomas: Don't reinvent the wheel with methodologies. For the case of EU institutions, since there is no standard for multilingual websites, large ML websites are often expensive to maintain.
11:12:47 [tadej]
Christian: In Germany there are actually awards for accessible websites.
11:13:22 [tadej]
... Another interesting thing was that we are entering the mobile world. If you provide mobile-ready content, you almost automatically get accessibility.
11:14:38 [tadej]
topic: Conclusions
11:15:06 [tadej]
fsasaki: Some topics didn't get through, like what to put on research agendas.
11:15:22 [tadej]
... there's a public mailing list for these purposes.
11:15:46 [tadej]
r12a: Everyone here is on the announcement list, where we will promote the public discussions list.
11:16:33 [RRSAgent]
I have made the request to generate fsasaki
11:16:33 [tadej]
Olaf: Each of the breakout groups generated starting points, but they need to keep going. There are contact points, which is good. Is there a host which can provide infrastructure for these groups to be active?
11:17:16 [tadej]
Olaf: Interactions on wikis need constant prodding and pushing, a wiki needs to reach a certain point before it actually grows by itself.
11:18:13 [RRSAgent]
I have made the request to generate fsasaki
11:18:20 [tadej]
r12a: We could schedule a follow-up at the next workshop.
11:18:33 [tadej]
Question: Can we use the MLW project website for this?
11:18:48 [tadej]
r12a: Of course, along with the MLW mailing lists.
11:19:23 [tadej]
Olaf: Should the five groups report at the next workshop? (many raise hands)
11:20:19 [tadej]
... What about moving the half-day session to the middle half? There could be more follow-ups at the same event.
11:20:29 [tadej]
... (many raise hands)
11:21:06 [tadej]
GunnarBittersmann: it would also be good because it's hard to follow talks by the end of the day.
11:21:24 [tadej]
fsasaki: A lot of activities will be continued in the scope of the LT-Web project.
11:21:40 [RRSAgent]
I have made the request to generate fsasaki
11:22:15 [tadej]
topic: Wrap-up
11:22:44 [tadej]
r12a: The open discussion format seems to be a good idea, will most likely continue at the next workshop.
11:24:31 [tadej]
r12a: We are done, thanks for attending, there will be videos, as well as notes from the presentations.
11:26:22 [RRSAgent]
I have made the request to generate fsasaki
11:26:28 [tadej]
rrsagent, draft minutes
11:26:28 [RRSAgent]
I have made the request to generate tadej
11:47:44 [tadej]
tadej has joined #mlw
13:35:53 [r12a]
r12a has joined #mlw
13:36:07 [r12a]
rrsagent, bye
13:36:07 [RRSAgent]
I see no action items