IRC log of mlw on 2011-04-04

Timestamps are in UTC.

07:13:17 [RRSAgent]
RRSAgent has joined #mlw
07:13:17 [RRSAgent]
logging to http://www.w3.org/2011/04/04-mlw-irc
07:13:24 [fsasaki]
meeting: MLW workshop, PISA
07:13:27 [fsasaki]
chair: richard
07:13:32 [fsasaki]
scribe: felix
07:13:44 [fsasaki]
topic: introduction
07:13:52 [Jirka]
Jirka has joined #mlw
07:13:56 [fsasaki]
Richard introduces the project and the workshop
07:14:12 [luke]
2nd of 4 MultilingualWeb conferences
07:14:41 [luke]
Goal is to facilitate cross-pollination across different areas, so don't tune out if it's not your specialty!
07:18:56 [tadej]
tadej has joined #mlw
07:20:27 [mpo]
mpo has joined #mlw
07:20:38 [chaals]
chaals has joined #mlw
07:21:14 [chaals]
rrsagent, draft minutes
07:21:15 [RRSAgent]
I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html chaals
07:21:36 [chaals]
rrsagent, make log public
07:22:51 [lbellido]
lbellido has joined #mlw
07:24:16 [fsasaki]
topic: Presentation from Domenico Laforenza on "The Italian approach to Internationalized Domain Names"
07:29:19 [tadej1]
tadej1 has joined #mlw
07:30:22 [tadej1]
tadej1 has left #mlw
07:30:26 [tadej1]
tadej1 has joined #mlw
07:31:13 [fsasaki]
Domenico describes the mechanisms behind IDN, domain names in general, the usage of the internet
07:31:47 [r12a]
r12a has joined #mlw
07:33:18 [fsasaki]
Domenico describes what is possible with IDN, compared to domain names in general
07:33:59 [iantruscott]
iantruscott has joined #mlw
07:36:10 [fsasaki]
Domenico describes how the punycode translation helps to use IDN, while keeping the underlying domain name system as is
07:39:38 [tadej]
tadej has joined #mlw
07:40:40 [tadej]
tadej has left #mlw
07:40:47 [tadej]
tadej has joined #mlw
07:45:33 [lbellido]
lbellido has joined #mlw
07:46:56 [fsasaki]
fsasaki has joined #mlw
07:47:26 [chaals]
q+ to ask about how users will distinguish papa.it and papá.it
07:47:33 [Zakim]
Zakim has joined #mlw
07:47:38 [chaals]
q+ to ask about how users will distinguish papa.it and papá.it
07:48:01 [fsasaki]
topic: presentation from oreste signore on "web for all"
07:48:08 [RRSAgent]
I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki
07:50:09 [chaals]
[webfonts is actually really important for some places ... ]
07:50:14 [fsasaki]
oreste is showing various areas that need more work to create "a web for all", e.g. in the area of accessibility, multilinguality etc.
07:50:53 [fsasaki]
oreste describes wcag 2.0
07:51:01 [RRSAgent]
I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki
07:51:59 [fsasaki]
oreste: issues of multilingual web: encoding, colors, navigation, ...
07:52:12 [RRSAgent]
I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki
07:52:49 [fsasaki]
q-
07:54:18 [PBS]
PBS has joined #mlw
07:54:27 [r12a]
q?
07:56:42 [fsasaki]
oreste describes the role of W3C offices, translations, W3C I18N Activity etc. as important means to push the multilingual web
07:57:08 [RRSAgent]
I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki
07:58:55 [fsasaki]
topic: Presentation from Kimmo Rossi
07:59:51 [fsasaki]
Kimmo: I am project officer for mlw project
08:00:45 [fsasaki]
.. I am very happy about the enthusiasm in this project. It is very small in terms of budget, but it is very successful
08:01:22 [fsasaki]
.. mlw has also been very successful in using social media
08:01:39 [fsasaki]
.. looking forward to seeing the next steps, including the review which is coming up
08:02:11 [fsasaki]
.. mlw has been a wonderful forum for gathering new ideas, to understand how much fragmentation still exists
08:02:26 [fsasaki]
.. now it is time to become operational, to start to put ideas into practice
08:02:47 [fsasaki]
.. I expect that this project will come up with good recommendations: what needs to be done, why, who could do it?
08:03:01 [fsasaki]
.. we have to create operational working links to other European projects
08:03:20 [fsasaki]
.. mid 2015 we will have about 50 ongoing projects in the area of multilingual technologies
08:03:39 [fsasaki]
.. we started creating these links, i.e. we have speakers from several European projects
08:03:56 [fsasaki]
.. please look into these other initiatives and see what we can do together
08:04:19 [fsasaki]
.. we started funding language technology 2 years ago - we are reaching a plateau
08:04:47 [fsasaki]
.. we just evaluated 90 proposals, asking 240 mill. Euros, we only have 50 mill. Euros
08:04:59 [fsasaki]
.. we can only select one of five projects
08:05:20 [fsasaki]
.. there is still one more call coming up for SME: 35 mill. Euro for sharing data / language resources
08:05:35 [fsasaki]
.. there are still three weeks to put in a proposal
08:05:41 [RRSAgent]
I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki
08:06:17 [fsasaki]
kimmo: once SME call is over, we will have about 50 projects
08:06:49 [fsasaki]
.. we spent 150.000 Euros to fund a survey, interviewing many people in European states
08:07:03 [fsasaki]
.. asking about language use while being online
08:07:24 [fsasaki]
.. results will soon be public on our web site and the Eurobarometer web site
08:07:37 [fsasaki]
.. results are that use of other languages is mostly passive
08:07:56 [fsasaki]
.. when people write and engage in social networking, they prefer to use their own language
08:08:17 [fsasaki]
.. 44% said: they are missing important information because they don't understand the language used
08:08:39 [fsasaki]
.. thank you, have a successful conference
08:08:49 [RRSAgent]
I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki
08:09:10 [fsasaki]
topic: presentation from ralf steinberger
08:09:54 [fsasaki]
ralf: talking about attempts to give access to information across languages
08:10:15 [fsasaki]
.. monitoring news in 50 languages
08:10:29 [RRSAgent]
I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki
08:12:23 [fsasaki]
ralf introduces JRC
08:14:12 [fsasaki]
ralf describes the news sources used for "media monitoring": 100.000 news articles gathered per day, in 50 languages
08:14:48 [fsasaki]
ralf: articles are converted into rss for further processing
08:16:48 [fsasaki]
ralf gives examples of news coverage: news is not always available in English, and sometimes more is available in other languages
08:18:38 [fsasaki]
ralf: we also find co-occurrences: who or what is mentioned with whom or what in different languages?
08:19:25 [fsasaki]
.. also analysing quotation networks: who gets mentioned by whom, also different depending on the language
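[A minimal sketch of the co-occurrence counting described above. It assumes that entity recognition has already been done and that each article is represented simply as a list of entity names; the function name and the sample data are hypothetical and not taken from the system presented.]

```python
from collections import Counter
from itertools import combinations


def cooccurrences(articles):
    """Count how often two entity names appear in the same article."""
    counts = Counter()
    for entities in articles:
        # De-duplicate and sort so that (A, B) and (B, A) count as one pair.
        for pair in combinations(sorted(set(entities)), 2):
            counts[pair] += 1
    return counts


# Hypothetical input: for each article, the entities recognised in it.
sample = [
    ["Person A", "Person B", "Organisation X"],
    ["Person A", "Person B"],
    ["Person B", "Organisation X"],
]
counts = cooccurrences(sample)
for pair, n in counts.items():
    print(pair, n)
```

[Running the same count separately per language would give the per-language comparison mentioned in the talk.]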
08:19:37 [RRSAgent]
I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki
08:19:46 [r12a]
q?
08:20:12 [fsasaki]
ralf: recognition of entities (mostly persons) in about 20 languages
08:22:35 [fsasaki]
ralf: multilingual categorization, using about 1000 categories, using boolean search word operations, optional weights of words, co-occurrence and distance of words, regular expressions for inflection forms (not only morphological)
08:24:11 [fsasaki]
.. multilingual categorization in general and specific for medicine in the medisys system
08:25:29 [fsasaki]
.. classifying by country and category, e.g. there is 1/2 article about tuberculosis in Czech, but if suddenly it is 5 articles a day, we can issue an alert
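[A rough sketch of the two ideas above: categories defined by weighted regular expressions, and an alert when the daily count jumps well above the usual level. The category definition, weights, thresholds and function names are all assumptions made for illustration; reading "1/2 article" as a usual level of one or two articles a day is also an assumption.]

```python
import re

# Hypothetical category definition: weighted regular expressions.
# Each pattern covers a few inflected forms; a real system would need
# separate definitions per language.
TUBERCULOSIS = {
    "threshold": 2.0,
    "patterns": [
        (re.compile(r"\btubercul(osis|ous|ar)\b", re.IGNORECASE), 2.0),
        (re.compile(r"\bTB\b"), 1.0),
        (re.compile(r"\boutbreaks?\b", re.IGNORECASE), 0.5),
    ],
}


def score(text, category):
    """Sum the weights of all pattern matches found in the text."""
    return sum(weight * len(pattern.findall(text))
               for pattern, weight in category["patterns"])


def matches(text, category):
    """An article belongs to the category if its score reaches the threshold."""
    return score(text, category) >= category["threshold"]


def should_alert(todays_count, daily_average, factor=3.0, minimum=3):
    """Flag a day whose article count is far above the usual daily level."""
    return todays_count >= minimum and todays_count >= factor * daily_average


print(matches("Tuberculosis outbreak reported in the region", TUBERCULOSIS))
print(should_alert(todays_count=5, daily_average=1.5))
```

[The minimum count keeps very rare topics from triggering an alert on a single article.]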
08:27:08 [j]
j has joined #mlw
08:27:51 [j]
j has left #mlw
08:28:32 [j]
j has joined #mlw
08:28:43 [RRSAgent]
I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki
08:29:22 [fsasaki]
ralf introducing news explorer - multilingual news daily overview
08:36:08 [fsasaki]
ralf: application about multilingual template filling - NEXUS, extracting structured information about events
08:36:21 [fsasaki]
.. focusing on conflicts, crimes, disasters, ...
08:36:40 [fsasaki]
.. want to know if there is a disaster with the need to send aid etc.
08:41:27 [RRSAgent]
I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki
08:44:03 [fsasaki]
ralf: summarizing: have demonstrated our EMM system, technologies being used, application scenarios
08:44:55 [fsasaki]
.. modest attempts to get access across languages, but users appreciate it and it shows that the Web is not only for English
08:45:11 [fsasaki]
topic: Q/A for welcome session
08:45:18 [Zakim]
chaals, you wanted to ask about how users will distinguish papa.it and papá.it
08:46:49 [fsasaki]
domenico: punycode translation of papa.it and papá.it is different, so sure, yes
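[A small sketch of the point made in the answer above, using Python's built-in "idna" codec. That codec follows the older IDNA 2003 rules, so treating it as equivalent to what a registry applies is an assumption; the helper name is hypothetical.]

```python
def to_ascii(domain: str) -> bytes:
    """Return the ASCII (Punycode) form of a domain name, as used in DNS."""
    return domain.encode("idna")


plain = to_ascii("papa.it")
accented = to_ascii("papá.it")

print(plain)     # the all-ASCII name is passed through unchanged
print(accented)  # the accented label is rewritten with an "xn--" prefix

# The two names map to different ASCII names, so DNS sees distinct domains.
print(plain != accented)

# Decoding the ASCII form recovers the original Unicode name.
print(accented.decode("idna"))
```

[The names differ at the DNS level; whether users can tell them apart visually is a separate question.]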
08:47:36 [fsasaki]
XYZ: question about nexus: if one newspaper says "person X is a freedom fighter" and another says "person X is a terrorist", how do you deal with this?
08:48:12 [fsasaki]
ralf: there is political analysis being done, but categorization like the above is normally not being done
08:48:26 [fsasaki]
.. system is publicly accessible via our home page
08:48:50 [fsasaki]
richard: now break
08:48:58 [tadej]
tadej has left #mlw
08:49:00 [RRSAgent]
I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki
09:18:06 [Jirka]
Jirka has joined #mlw
09:18:21 [Jirka]
scribe Jirka
09:18:37 [RRSAgent]
I'm logging. I don't understand 'scribe Jirka', Jirka. Try /msg RRSAgent help
09:20:43 [Jirka]
scribenick: Jirka
09:21:22 [Jirka]
Adriane Rinsche opens Developer session
09:22:24 [Jirka]
topic: "Multilingual forms and applications" by Steven Pemberton
09:23:20 [Jirka]
Steven talks about HTTP content negotiation
09:23:43 [fsasaki]
fsasaki has joined #mlw
09:23:48 [Tomas]
Tomas has joined #mlw
09:24:26 [tadej]
tadej has joined #mlw
09:25:08 [PBS]
PBS has joined #mlw
09:25:31 [Jirka]
Steven shows some examples of content negotiation
09:26:01 [Jirka]
Steven talks about the possibility of providing better 404 error pages
09:27:16 [Jirka]
... and 406 pages
09:28:07 [Jirka]
... some servers like www.google.com ignore content negotiation headers
09:29:13 [Jirka]
... and try to guess your location based on your IP address
09:29:20 [Tomas]
Most do. The general problem is Multilingual Web Sites (MWS).
09:30:36 [Jirka]
... another approach is to have button for changing language on the web page itself
09:31:42 [Jirka]
... some sites even use Javascript to change content inside the page
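[A simplified sketch of server-side language negotiation as discussed above, assuming the server knows which language variants it has. It parses an Accept-Language header and picks the best match; returning None stands for the case where a server might answer 406. The function names are hypothetical and the matching is deliberately simpler than the full standard rules.]

```python
def parse_accept_language(header):
    """Return (language_tag, q) pairs sorted by descending preference."""
    prefs = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        q = 1.0  # a tag without an explicit q-value counts as q=1
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0
        prefs.append((tag.strip().lower(), q))
    # sorted() is stable, so tags with equal q keep their header order.
    return sorted(prefs, key=lambda p: p[1], reverse=True)


def negotiate(header, available):
    """Pick the best available language, or None if nothing is acceptable."""
    available = [a.lower() for a in available]
    for tag, q in parse_accept_language(header):
        if q <= 0:
            continue
        if tag == "*":
            return available[0] if available else None
        for candidate in available:
            # Exact match, or a requested primary tag such as "en"
            # matching a more specific variant such as "en-gb".
            if candidate == tag or candidate.startswith(tag + "-"):
                return candidate
    return None


print(negotiate("nl, en-gb;q=0.8, en;q=0.7", ["en", "fr"]))
print(negotiate("ja", ["en", "fr"]))
```

[A friendlier 406 page could list the languages in `available` instead of failing silently.]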
09:32:31 [Jirka]
After summarizing some bad practices in serving multilingual websites Steven now introduces XForms
09:32:55 [RRSAgent]
I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html Jirka
09:35:19 [Jirka]
XForms separate data and presentation. Steven shows this with an example of a simple form
09:35:44 [luke]
luke has joined #mlw
09:36:42 [Jirka]
... XForms can contain calculations
09:37:03 [Jirka]
... controls are abstract and can get different styling easily
09:37:28 [Jirka]
... it's possible to use different datasources
09:37:36 [RRSAgent]
I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html Jirka
09:39:22 [Jirka]
Steven shows form which can dynamically change labels for form fields based on the selected language for the form
09:42:23 [Jirka]
... XForms use a declarative approach which requires much less work to produce
09:43:44 [Jirka]
... conclusion - XForms allow the use of "language stylesheets" to create multilingual forms even if this wasn't an original goal for XForms
09:44:16 [Tomas]
It is in my presentation this afternoon. An overview http://dragoman.org/mws-india.html
09:45:00 [Jirka]
topic: "Lessons from standardizing i18n aspects of packaged web applications" by Charles McCathieNevile
09:45:22 [Jirka]
Chaals introduces Widgets technology
09:46:37 [Jirka]
... history of Widgets development and standardization in W3C
09:46:53 [Jirka]
... Widgets are now split into 7 specifications
09:47:32 [PBS]
PBS has joined #mlw
09:47:46 [Jirka]
Chaals shows source of simple Widget
09:48:18 [kimmo]
kimmo has joined #mlw
09:49:10 [Jirka]
... describes l10n features of Widgets
09:49:20 [RRSAgent]
I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki
09:52:22 [Jirka]
... Widgets use xml:lang, and for larger resources a separate language-specific directory can be used
09:53:42 [Steven]
Steven has joined #mlw
09:54:42 [Jirka]
... Widgets do not use ITS because namespaces are too hard for some web developers, instead a few specific attributes and elements were adopted (span, dir, xml:lang)
09:55:22 [Jirka]
... Opera extensions are based on Widgets
09:56:13 [Jirka]
... l10n is hard, you should get advice and do proper testing
09:59:41 [Jirka]
topic: "HTML5 proposed markup changes related to internationalization" by Richard Ishida
10:00:34 [r12a]
r12a has left #mlw
10:00:45 [Jirka]
Richard closes his IRC client
10:01:02 [Steven]
Steven has joined #mlw
10:01:18 [Jirka]
Richard tries to explain what HTML5 means
10:01:38 [Jirka]
... Richard will talk only about HTML5 specification
10:01:40 [RRSAgent]
I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki
10:02:11 [Jirka]
... not about related things like CSS3, new Javascript APIs, ...
10:02:33 [Jirka]
... HTML5 endorses utf-8 encoding
10:03:17 [Jirka]
... simplified encoding declaration <meta charset=utf-8>
10:03:56 [Jirka]
... polyglot documents are both XML and HTML5 (HTML syntax) documents, use utf-8, no XML declaration
10:04:28 [Steven]
Actually, XHTML 1.0 had the same thing, but didn't call it "Polyglot"
10:05:23 [Steven]
But it was addressing the same problem
10:07:34 [Jirka]
... charset attribute was removed from link and a elements
10:08:40 [Jirka]
... language declaration can use lang attribute or content-language HTTP header
10:09:05 [Jirka]
... content-language can contain more than one language
10:10:00 [Jirka]
... content-language was just recently removed from HTML5 draft
10:10:26 [Jirka]
Richard now explains Ruby
10:11:03 [chaals]
[Ruby was very common in western medieval texts, where greek, latin, hebrew etc would be mixed. E.g. religious texts, and scholarly documents]
10:11:18 [Steven]
Yes, Chaals, it is very useful for other things than Ruby; pity they called it Ruby mark up, since it is more than that
10:11:22 [Jirka]
... HTML5 has support for Ruby, but uses slightly different markup than XHTML 1.1 or ITS (missing rb element for base text)
10:11:54 [Jirka]
... Bidi support
10:12:58 [PBS_]
PBS_ has joined #mlw
10:13:00 [Jirka]
... HTML5 adds bdi element for bidi isolation
10:13:36 [Jirka]
... dir="auto" allows run-time decision about directionality
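[A minimal sketch of how a run-time direction decision can work, assuming the common "first strong character" heuristic: the first character with a strong left-to-right or right-to-left bidi class decides. The function name and the left-to-right default are assumptions, and a browser's handling has more cases than this.]

```python
import unicodedata


def auto_direction(text, default="ltr"):
    """Guess the base direction from the first strongly directional character."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":            # strong left-to-right (e.g. Latin)
            return "ltr"
        if bidi_class in ("R", "AL"):    # strong right-to-left (Hebrew, Arabic)
            return "rtl"
    # Digits, spaces and punctuation are not strong, so fall back to a default.
    return default


print(auto_direction("hello"))
print(auto_direction("שלום"))
print(auto_direction("123 مرحبا"))
```

[This is useful for user-supplied text, such as form input, whose language is not known when the page is authored.]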
10:15:10 [Steven]
I sent a last call comment to the ruby WG, saying they should call it something more generic, but they declined "because Microsoft had already implemented it"
10:15:29 [Jirka]
... Richard invites all to get involved in spec development
10:16:06 [Jirka]
topic: "Internationalization (or the lack of it) in current browsers" by Gunnar Bittersmann
10:16:51 [RRSAgent]
I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki
10:17:54 [Jirka]
Gunnar talks about some problems in the HTML5
10:18:25 [Jirka]
... valdation of email input type field is too restrictive in spec - doesn't support IDN
10:19:34 [Jirka]
s/valdation/validation/
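[A sketch of one possible way around the restriction mentioned above: convert the domain part of an address to its Punycode form before applying an ASCII-only check. The pattern below is a simplified stand-in written for illustration, not the expression from the HTML5 draft, and the sample address and helper name are hypothetical.]

```python
import re

# Simplified ASCII-only pattern (an assumption for illustration only).
ASCII_EMAIL = re.compile(
    r"^[A-Za-z0-9.!#$%&'*+/=?^_`{|}~-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)*$"
)


def idn_to_ascii(address):
    """Convert only the domain part of an address to its Punycode form."""
    local, _, domain = address.rpartition("@")
    return local + "@" + domain.encode("idna").decode("ascii")


address = "mario@papá.it"  # hypothetical address with an IDN domain

# Rejected as typed, because the domain contains a non-ASCII letter.
print(bool(ASCII_EMAIL.match(address)))

# Accepted once the domain has been converted to ASCII.
print(bool(ASCII_EMAIL.match(idn_to_ascii(address))))
```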
10:19:52 [RRSAgent]
I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html Jirka
10:21:43 [Jirka]
... each browser provides different UI for changing preferred language
10:22:05 [Jirka]
... some browsers have bugs in this
10:22:35 [r12a]
r12a has joined #mlw
10:23:39 [Steven]
Some browsers have bugs, but some do it completely wrong :-)
10:25:12 [Jirka]
... language negotiation is missing some features
10:25:25 [Jirka]
... how to label original and translation
10:25:34 [Jirka]
... how to label human and machine translation
10:26:04 [Jirka]
topic: "What's Next in Multilinguality, Web News & Social Media Standardization?" by Jochen Leidner
10:26:08 [RRSAgent]
I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html Jirka
10:28:21 [Jirka]
Jochen shows mind map of presentation
10:28:41 [Jirka]
... presents details about Thomson Reuters company
10:32:00 [Jirka]
... customers require high quality
10:32:18 [Jirka]
... combination of human and automatic methods is in use
10:32:30 [Jirka]
... XML and Unicode is heavily used
10:33:10 [Jirka]
... main issue is not lack of standards but developer education
10:33:48 [Jirka]
... i18n and l10n is not a part of curriculum
10:38:11 [Jirka]
... new challenges are support for multimedia content
10:38:29 [Jirka]
... some content is hidden (Facebook, Twitter, ...)
10:39:38 [RRSAgent]
I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki
10:41:15 [Jirka]
... proposes more open twitter-like messaging system with better support for i18n
10:42:19 [omstefanov]
omstefanov has joined #mlw
10:42:21 [Jirka]
... it might be useful to have an HTML tag saying that some page is a translation of a different page
10:42:40 [Jirka]
topic: Q&A session
10:44:27 [Jirka]
Question from Google: Defends current state of affair regarding language selection. Asks whether easier UI will help?
10:45:09 [RRSAgent]
I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki
10:45:59 [Jirka]
Chaals: Interface should be easier to use, most users don't set their language
10:46:51 [Jirka]
... content should contain as much metadata as possible to inform about alternative versions of content
10:47:21 [Jirka]
Richard: mentions some extension that allows easier change of preferred language
10:48:12 [RRSAgent]
I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki
10:49:05 [RRSAgent]
I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki
10:50:34 [Andrea]
Andrea has joined #mlw
10:50:46 [Jirka]
Question from Olaf: What is the chance of implementing some notation for marking a document as being in the original language?
10:51:11 [Jirka]
Chaals: There are many notations starting from simple rel= going to RDF
10:52:40 [Jirka]
... you should use it, browsers will support what is used on the pages visited by users
10:53:24 [Jirka]
... you should talk to producers of content creation tools
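[A small sketch of the "simple rel=" end of that range, assuming the existing rel="alternate" plus hreflang link markup is what is meant. It only lists the available language variants; it does not say which one is the original, which is the open question here. The URLs and the function name are hypothetical.]

```python
from html import escape


def alternate_links(variants):
    """Build <link rel="alternate"> elements, one per language variant."""
    lines = []
    for lang, url in sorted(variants.items()):
        lines.append('<link rel="alternate" hreflang="%s" href="%s">'
                     % (escape(lang, quote=True), escape(url, quote=True)))
    return "\n".join(lines)


# Hypothetical URLs for one page that is available in three languages.
variants = {
    "en": "http://example.org/page.en.html",
    "it": "http://example.org/page.it.html",
    "cs": "http://example.org/page.cs.html",
}
print(alternate_links(variants))
```

[Content creation tools could emit such links automatically whenever a translation is added.]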
10:54:35 [Jirka]
Richard: you should be more involved, create proposals, ...
10:55:15 [Jirka]
Felix Sasaki: It's possible to introduce new language subtag for this
10:55:52 [fsasaki]
.. use the ietf-languages list to discuss this with the people reviewing such proposals
10:55:59 [RRSAgent]
I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki
11:38:56 [Zakim]
Zakim has left #mlw
11:59:44 [Steven]
Steven has joined #mlw
11:59:54 [Steven]
Scribe: Steven
12:01:27 [Steven]
Topic: Creators
12:02:28 [Steven]
i/Scribe: Felix/scribenick: fsasaki
12:02:38 [Steven]
rrsagent, make minutes
12:02:38 [RRSAgent]
I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html Steven
12:03:41 [Steven]
i/scribe: felix/scribenick: fsasaki
12:03:49 [Steven]
rrsagent, make minutes
12:03:49 [RRSAgent]
I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html Steven
12:06:22 [Jirka]
Jirka has joined #mlw
12:06:36 [Steven]
Felix: Welcome to afternoon session
12:06:51 [Steven]
Topic: Office.com 2010
12:07:43 [Steven]
(Speaker - Dag Schmidtke)
12:08:11 [Steven]
Dag: 37 langs, 51 markets
12:09:37 [Steven]
... some countries have more than one language (eg Belguim, Canada)
12:09:43 [Steven]
s/ui/iu/
12:10:19 [Steven]
Dag: adding value to Office
12:10:47 [Steven]
... content, templates, also sell Office
12:11:05 [Steven]
... campaigns in different markets at different times
12:11:16 [Steven]
... market specific engagement
12:11:49 [Steven]
Dag: Recent migration, site management and authoring from XMetal to Word
12:12:19 [Steven]
... and using sharepoint instead of a custom publishing system
12:12:36 [Steven]
... we did extend Word to support this
12:12:45 [Steven]
... allows federated authoring
12:13:22 [Steven]
... helps with localization
12:13:55 [Steven]
Dag: Lessons from this migration
12:14:07 [Steven]
... internationalisation was a key stakeholder
12:14:22 [Steven]
... designed for scale
12:15:12 [Steven]
... it was quite an effort, next time we won't do everything at once
12:16:08 [Steven]
Dag: 100s of thousands of help documents for at least the last three releases
12:16:17 [Steven]
... content heavy
12:16:53 [Steven]
... complexity wasn't where we expected, and was more complex than we expected
12:17:51 [Steven]
Dag: General lessons from the site
12:18:43 [Steven]
... Serve all global market needs, English is just another language
12:18:56 [Steven]
... scale up *and* down
12:19:48 [r12a]
r12a has joined #mlw
12:19:55 [chaals]
chaals has joined #mlw
12:20:10 [Steven]
... design for growth
12:20:57 [Steven]
[gives example of content riginating in Japan, and translated to other languages]
12:21:03 [Steven]
s/rig/orig/
12:22:11 [Steven]
Dag: No character formating, only character styles
12:22:18 [Steven]
s/ting/tting/
12:22:53 [Steven]
Dag: We have an XML format for translation
12:23:02 [Steven]
Dag: Local touch
12:23:20 [Steven]
... deliver right experience to each market
12:23:45 [Steven]
[examples]
12:25:19 [Steven]
Dag: Customer connection
12:25:41 [Steven]
... feedback, evaluation, SEO
12:26:36 [Steven]
[examples from site]
12:27:02 [Steven]
Dag: Continuous updates
12:27:27 [Steven]
... respond to regional events, A/B testing
12:27:48 [Steven]
... use some machine translation
12:27:56 [Steven]
Dag: Future trends
12:28:05 [Steven]
... moving to the cloud
12:28:43 [Steven]
... multilingual multimedia
12:28:59 [Steven]
... language automation
12:29:21 [Steven]
.... interoperability with standards
12:29:27 [Steven]
s/..../.../
12:29:36 [Steven]
Dag: Conclusions
12:29:59 [Steven]
... It is possible to design for scale and local relevance
12:30:49 [Steven]
Topic:
12:30:49 [Steven]
Jirka Kosek - Using ITS in the common content formats
12:31:06 [Steven]
s/Jirka/Topic: Jirka/
12:32:19 [Steven]
rrsagent, make minutes
12:32:19 [RRSAgent]
I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html Steven
12:33:16 [Steven]
Jirka: tag set designed to help with translations
12:33:37 [Steven]
... usable with any XML vocabulary
12:34:39 [Steven]
[example of use]
12:36:21 [Steven]
Jirka: Allows automatic software to see what should not be translated, as well as human translators
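[A minimal sketch of how software could honour a "do not translate" marker, assuming the local its:translate attribute in the ITS 1.0 namespace and simple inheritance from parent to child. The sample document and the function name are made up for illustration; global ITS rules are not handled.]

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"
TRANSLATE = "{%s}translate" % ITS_NS


def translatable_text(element, inherited=True):
    """Collect text from elements that are not marked its:translate="no"."""
    flag = element.get(TRANSLATE)
    # A missing attribute means the value is inherited from the parent.
    translate = inherited if flag is None else (flag == "yes")
    parts = []
    if translate and element.text and element.text.strip():
        parts.append(element.text.strip())
    for child in element:
        parts.extend(translatable_text(child, translate))
        # Tail text belongs to the parent, so the parent's setting applies.
        if translate and child.tail and child.tail.strip():
            parts.append(child.tail.strip())
    return parts


doc = ET.fromstring(
    '<para xmlns:its="%s">Press the <code its:translate="no">OK</code> '
    'button to continue.</para>' % ITS_NS
)
print(translatable_text(doc))
```

[The same marker that guides a tool like this is also visible to human translators working on the source.]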
12:36:35 [omstefanov]
omstefanov has joined #mlw
12:36:58 [chaals]
[As Jirka said, you don't have to use the actual ITS namespace to use the ITS pieces - and the decision for widgets was indeed to do that]
12:37:05 [Steven]
Jirka: Now to look at formats that support ITS
12:37:14 [Steven]
... first DocBook
12:38:16 [lbellido]
lbellido has joined #mlw
12:38:22 [Steven]
[example]
12:41:24 [Steven]
Jirka: Next format, DITA
12:41:45 [Steven]
... for topic-based documentation
12:42:07 [Steven]
... DITA doesn't natively support ITS
12:42:13 [Steven]
... can be added
12:42:31 [Steven]
Jirka: Now OOXML
12:42:40 [jan]
jan has joined #mlw
12:43:00 [Steven]
... Open Office, and even for MS Office 2007+
12:43:23 [Steven]
... no native support, but can be added
12:44:30 [jan]
Office Open XML is a MS developed standard, not Open Office... ;-)
12:44:43 [Steven]
Jirka: ODF is similar
12:45:07 [Steven]
Jirka: XHTML allows use of ITS
12:45:46 [Steven]
... HTML5 has no extension points to allow ITS
12:47:02 [Steven]
... what is to be done?
12:48:09 [Steven]
... HTML5 needs to be augmented to support ITS
12:50:01 [Steven]
Dag: MS translator does support something similar
12:51:00 [Steven]
Steven: If XHTML5 supports it, why not just say "Use XML serialization if you want this facility"?
12:51:23 [Steven]
Jirka: Not sure if people can produce well-formed XML
12:51:50 [Steven]
Topic: Chaals - standards for multilingual websites
12:52:38 [Jirka]
Slides from my presentation http://www.kosek.cz/xml/2011mlwpisa/
12:52:58 [Steven]
Chaals: What standards should be developed?
12:53:35 [Steven]
... there are lots of multilingual sites. Substantial problems
12:53:56 [Tomas]
I am here ... just in case
12:54:04 [Steven]
... principles - don't break existing stuff
12:54:25 [Steven]
... expect it to take time
12:54:38 [Steven]
... two sides of coin: users and webmasters
12:54:39 [Tomas]
Slides - http://dragoman.org/pisa/carrasco-mw-pisa.pdf
12:55:07 [Steven]
Chaals: But it is often less clear-cut
12:55:46 [Steven]
Chaals: Currently - no consistent user interface for a ML website.
12:56:22 [Steven]
... this should be fixed
12:56:25 [Encarna]
Encarna has joined #mlw
12:56:53 [Steven]
... No standards for multilingual content production
12:57:02 [Steven]
... this should be fixed
12:57:54 [Tomas]
No standards for content production - in general - not a particular problem to MWS
12:58:02 [Steven]
Chaals: Most users are monolingual
12:58:09 [Steven]
[Scribe: he claims]
12:58:43 [Tomas]
One needs hard data
12:58:46 [Steven]
Chaals: Webmasters must manage multilingual system
12:59:06 [Steven]
... users don't want more complexity
13:00:21 [Steven]
... webmasters aren't necessarily experts in this stuff
13:00:56 [Steven]
... interfaces for content from the user side are well-established
13:01:01 [Steven]
... not so for webmasters
13:02:33 [Steven]
Chaals: Some ideas - language button in the browser
13:02:48 [Steven]
... use HTTP header fields maybe
13:03:48 [Steven]
... content negotiation
13:03:49 [Tomas]
Another good "high level" variant is memento http://www.mementoweb.org
13:04:05 [Steven]
... reserved URIs
13:04:37 [Steven]
... I am not sure if reserved URIs are a good idea
13:04:54 [Steven]
Chaals: It should be possible to request a translation
13:05:07 [Steven]
... there's an Opera extension for that
13:05:35 [Tomas]
A reserved URI is very good as one can have all the pages in the MWS with the same URI pointing to the variants
13:06:20 [Tomas]
maintaining pages with different URIs for the variants is very hard
13:06:38 [Steven]
... need a metaresource concept
13:06:45 [Steven]
[Scribe: RDFa!]
13:07:46 [Tomas]
RDF might do it - needs verification
13:08:26 [Steven]
Chaals: Need server-side standards
13:09:21 [iantruscott]
iantruscott has joined #mlw
13:09:35 [Steven]
Scribe: RDFa was largest growing web format last year http://rdfa.info/2011/01/26/rdfa-grows/
13:09:58 [Steven]
Chaals: Next step? Working group maybe
13:10:23 [Steven]
... at W3C? Elsewhere?
13:10:37 [Tomas]
No WG, not specifications
13:10:43 [Steven]
... or create a new initiative?
13:11:12 [Steven]
Chaals: Need guides for best practice on user and webmaster sides
13:13:44 [r12a]
scribe: r12a
13:14:05 [r12a]
Topic: Sophie Hurst - Local is global
13:14:11 [Tomas]
A tabular view http://dragoman.org/mws-india.html
13:14:35 [Tomas]
Chaals: you wont your beer !!!
13:15:06 [r12a]
Sophie: 90% of HP buy based on content rather than touching product
13:15:24 [r12a]
s/HP/HP's customers/
13:16:14 [r12a]
Sophie: 42% of web users are from Asia
13:16:26 [r12a]
... only 13% from USA
13:16:38 [r12a]
... yet English still leading language
13:17:20 [r12a]
... asia has highest usage but low penetration
13:17:27 [r12a]
... therefore it's a growth area
13:17:49 [r12a]
... 10% retails sales in CHina are done online
13:18:00 [r12a]
s/retails/retail/
13:18:15 [r12a]
s/CH/Ch/
13:19:02 [chaals]
[My concern with reserved URIs is that it breaks some existing standards and expectations. I think HTTP headers and metadata are better approaches. (I generally hate reserved URIs - they are used in P3P, favicons, robots.txt and a couple of other places, but I don't think they're going to handle the complexity of multilingual websites without creating as many problems as they solve...)]
13:19:04 [r12a]
Sophie: How to represent brand consistently, locally
13:19:12 [r12a]
... how to make it relevant
13:19:23 [chaals]
[I certainly think that being able to get the information about available variants is really important]
13:19:29 [r12a]
... how to manage translation
13:20:05 [r12a]
Sophie: First is to use component based system
13:21:10 [Jirka]
chaals: yes, but it might be sufficient to have a link/HTTP header pointing to another URL where a manifest listing all possible variants sits, rather than a dozen alternatives in each page -- too much change when a new translation is added
13:21:17 [r12a]
... synchronisation between components is then eassy to manage
13:21:23 [r12a]
s/ss/s
13:21:54 [r12a]
... allows local components, but global style
13:22:17 [r12a]
... eg Emirates site
13:23:18 [r12a]
Sophie: Use positioning information to personalise information
13:23:54 [r12a]
... example, Lux brand which is up-market in India, but not elsewhere
13:24:21 [r12a]
... need local input to ensure local nuances are working
13:25:22 [r12a]
... users come with cultural layers as well
13:25:38 [r12a]
... cultures vary in many dimensions
13:26:39 [r12a]
Sophie: Finally, managing content
13:26:57 [r12a]
... need a well-managed process
13:27:06 [Tomas]
[The browser side is much better, but we have to care for the server side. This is the question: how to implement the server side. Separate function from the mechanism: we can explore different mechanisms. One fixed reserved URI for the whole server, combined with the Referer header, will certainly resolve a big problem (different URIs for each page).]
13:27:30 [r12a]
... can be automated to large extent (the management, not the translation)
13:28:10 [r12a]
[shows an example process]
13:29:30 [r12a]
Sophie: In conclusion, translation must be part of a larger picture
13:29:55 [r12a]
... use component, geo-positioning, and translation management
13:30:09 [r12a]
[Q&A session]
13:30:12 [Tomas]
Question of scope: what should be in MWS and what in other specifications for full translation system.
13:30:51 [Tomas]
The picture is larger: Authorship, Translation and Publishing Chain
13:31:51 [Tomas]
Translation is only part of the whole production chain
13:31:59 [r12a]
Christian Lieske: For Chaals- I got different messages - we've got to do stuff, but Sophie seems to suggest we can already do it.
13:32:40 [r12a]
Chaals: It's not that we can't do it already, but that there is no agreed way to do it
13:33:59 [Tomas]
We need to define the different scopes and how the different fields integrate; a MWS is *not* a translation management system.
13:34:30 [r12a]
Chaals: We have no ineroperability
13:34:38 [r12a]
s/iner/inter/
13:34:46 [Tomas]
You wont another beer !!!
13:35:13 [r12a]
Sophie: Changing solutions is hard, standards could help
13:35:31 [Tomas]
We need to identify what is particular to MWS and is general.
13:36:15 [r12a]
Sophie: We should work towards a position where you need fewer developers
13:36:44 [Tomas]
Language is just one of the dimensions in TCN; e.g., mementos should be integrated in the same mechanism http://www.mementoweb.org/
13:37:42 [r12a]
Dag: We have a translation tag, but it is not standard, so there is less customer value, in the long run a standard lowers the cost of entry for us
13:37:45 [Tomas]
+1 regarding developers: one should be able to construct a MWS from Apache out of the box
13:37:57 [Andrea]
Andrea has joined #mlw
13:38:15 [r12a]
Tomas Abramovitch: Do you use different CSS for different cultures?
13:38:30 [r12a]
... and how accurate is geo-location?
13:39:07 [Tomas]
One could (CSS)
13:39:10 [r12a]
Dag: We componentise our pages, the local part is not done by CSS
13:39:45 [r12a]
Sophie: I can't totally answer the geo-loc part.
13:40:28 [r12a]
Chaals: It is a spectrum from one person to just someone in a country
13:40:28 [Tomas]
One could generate some pages: "5.3. Generating language in parallel" in http://dragoman.org/mws/oamws.pdf
13:41:15 [chaals]
s/one person/identifying one seat in an audience/
13:42:22 [r12a]
Ian Truscott: identifying people is always a guess until they log in
13:42:55 [Tomas]
Or he set his browser preferences
13:43:27 [r12a]
Reinhard: How do we learn from research? No one has mentioned this
13:43:48 [r12a]
... different people like different things
13:44:08 [r12a]
... 16 year olds in China have more in common with 16 year olds in the USA than with their parents
13:44:35 [r12a]
... all I've heard is corporate policy. Why not let the user decide?
13:45:01 [Tomas]
A user wants the page in his language
13:45:10 [r12a]
Sophie: Crowd sourcing is an option
13:45:39 [Tomas]
Choosing is already a hurdle
13:47:33 [Tomas]
We need to look at all the available mechanisms and decide on a recommendation: "4.4. Options" in http://dragoman.org/mws/oamws.pdf
13:50:00 [r12a]
Dag: There are areas where our interest and the users' coincide
13:50:22 [r12a]
... but we can't do translation on demand
13:50:38 [r12a]
... they pay for premium product
13:51:05 [chaals]
[It isn't always a guess identifying the user until they log in. In fact, technically it is often easy to identify users anyway - this is why we have laws to protect privacy and limit the things done to make it easy]
13:54:04 [r12a]
Steven: A good example of Reinhard's point is websites that conflate region with language. I often don't know which question they are asking.
13:54:38 [r12a]
... and I don't believe that most people are monolingual. There are 6000 languages, and 150 countries. Most people are at least bilingual
13:55:06 [r12a]
[scribe's computer is nearly out of battery]
13:55:26 [Tomas]
[we need to identify what the user wants, not who he is]
13:55:36 [r12a]
Reinhard: Corwdsourcing translation is often not possible because of copyright issues
13:55:54 [r12a]
s/Cor/Cro/
13:57:12 [r12a]
Olaf: We need the possibility to offer translations of parts of sites
13:57:38 [r12a]
... it works on wikipedia
13:57:52 [Tomas]
Monolingual user: we need hard data; but circumstantial data suggest that the requirement of most users is monolingual.
13:58:01 [r12a]
... microsoft needs to open its translation tools
13:58:48 [r12a]
Chaals: I use crowdsourced translation of Norwegian law
13:59:31 [r12a]
... it is easy to do, but by and large it doesn't happen
13:59:45 [r12a]
... too little reward
14:01:41 [Tomas]
Translation integration in MWS: a language that is not available could be defined as a "language potentially available" (after translation). One needs a mechanism covering all the aspects of the different translation techniques: human (professional, crowd), machine (fast as RBMT or slow as SMT).
14:02:30 [karl]
karl has joined #mlw
14:04:14 [Tomas]
For the whole enchilada: "Open architecture for multilingual parallel texts" http://arxiv.org/ftp/arxiv/papers/0808/0808.3889.pdf
14:06:05 [Tomas]
OK: I go for coffee.
14:30:02 [Jirka]
Jirka has joined #mlw
14:42:22 [fsasaki]
fsasaki has joined #mlw
14:43:23 [fsasaki]
topic: presentation from Christian Lieske et al.
14:44:03 [lbellido]
lbellido has joined #mlw
14:45:10 [fsasaki]
christian: five areas show that there is a need for change:
14:45:39 [fsasaki]
.. demand for language related services, shortcomings of today's translation-related standards, ...
14:45:57 [fsasaki]
.. why talking about standards: demand & lack of interoperability
14:47:02 [fsasaki]
.. lack of interoperability e.g. for XLIFF
14:47:46 [fsasaki]
.. things break down across tool chains
14:48:15 [fsasaki]
.. standards in localization area are sometimes not compatible
14:48:26 [fsasaki]
.. example of phrases in TMX vs phrases in XLIFF
14:49:16 [fsasaki]
christian: not a lot of work in localization standardization on integrating new web technologies
14:49:43 [fsasaki]
.. e.g. aspect of RESTful services, use of related protocols (odata, gdata) for translation related services
14:50:11 [fsasaki]
.. these problems have led to implementation challenges, problems for standards that are already here
14:50:47 [fsasaki]
.. how to solve the problems: four areas are important: requirements, methodology, compliance, stewardship
14:51:44 [fsasaki]
.. requirements: identify processing areas related to language processing - and keep them separated
14:52:00 [fsasaki]
.. determine the entities that are needed in each area
14:52:24 [fsasaki]
.. chart technology options and needs
14:52:48 [fsasaki]
... etc. Next: methodology:
14:53:12 [fsasaki]
.. distinguish between models and implementation / serialization
14:53:13 [omstefanov]
omstefanov has joined #mlw
14:53:41 [fsasaki]
.. distinguish between entities without context and entities with business / processing context
14:53:53 [fsasaki]
.. set up rules to transform data models into syntax
14:54:15 [fsasaki]
.. set up flexible registries, e.g. CLDR, IANA
14:54:33 [fsasaki]
.. provide migration paths / mapping mechaisms for legacy data
14:54:52 [fsasaki]
s/mechaisms/mechanisms/
14:55:01 [RRSAgent]
I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki
14:55:39 [fsasaki]
.. third, compliance: e.g. what does "support for standard X" mean?
14:56:50 [fsasaki]
.. finally, stewardship: driving, supporting standardization activity
14:57:07 [fsasaki]
.. anyone who shouts for small standards should be willing to invest
14:57:25 [fsasaki]
.. EC has a track record, see e.g. mlw project
14:57:37 [fsasaki]
.. make donations / contributions easy
14:57:50 [fsasaki]
.. discourage fragmentation and unclear roles
14:58:20 [fsasaki]
.. LISA no longer exists, now there is a kind of competition over who could follow in its footsteps
14:58:53 [fsasaki]
.. my fear is that another organization is being created; my and probably Felix' and Yves' thought is that this should be avoided
14:59:36 [fsasaki]
topic: David Filip on "Multilingual transformations on the web via XLIFF current and via XLIFF next"
14:59:52 [fsasaki]
.. christian has covered a lot for XLIFF 2.0 - what do I want to cover?
15:01:10 [fsasaki]
s/.. christian/david: christian/
15:02:12 [fsasaki]
david: my main statements: metadata must survive language transformations, content metadata must be designed upfront with the transformation process in mind, XLIFF is the principal vehicle for critical metadata throughout multilingual transformations
15:02:51 [fsasaki]
.. and finally: next generation XLIFF standard is an exciting work in progress in OASIS TC
15:03:06 [RRSAgent]
I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki
15:03:20 [cchiavet]
cchiavet has joined #mlw
15:03:49 [fsasaki]
.. about preserving metadata: there are various transformations: g11n, i18n, l10n, t9n ("GILT")
15:04:01 [fsasaki]
.. transformation modi: manual, automated, assisted
15:04:37 [fsasaki]
.. transformation types: MT, human translation, postediting, stylistic review, tagging (semantic, subject matter review, transcribing), subtitling, ...
15:04:47 [fsasaki]
.. growing number of source languages
15:04:54 [RRSAgent]
I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki
15:05:54 [Steven_]
Steven_ has joined #mlw
15:06:26 [fsasaki]
david: what metadata is necessary?
15:06:34 [fsasaki]
.. preview and context are critical
15:06:54 [fsasaki]
.. argue for creating standardized XSLT artefacts for preview
15:07:12 [fsasaki]
.. metadata for legally conscious sharing (ownership, licensing, ...)
15:08:44 [fsasaki]
.. grammatical, syntactic, morphological and lexical metadata
15:09:18 [fsasaki]
.. example of m4loc project: they developed an XLIFF middleware to ensure interop between localization open source tool and moses MT tool
15:10:15 [fsasaki]
.. tagging of culturally and legally targeted information
15:11:07 [fsasaki]
.. home for LT standardization? Leverage BP of existing loc standards (XLIFF, TBX, SRX, ...) - pointing into the past (OASIS, LISA)
15:12:13 [fsasaki]
.. now: leverage OASIS XLIFF, ISO TC37, Unicode SRX and GMX
15:12:48 [fsasaki]
.. further development of W3C ITS and RDF, create conscious standardization including RDF and XLIFF
15:13:13 [fsasaki]
david: OASIS is home of XLIFF, but has also UBL and XBL as its home
15:13:31 [fsasaki]
.. W3C has ITS and RDF modeling, Unicode - see above
15:13:48 [fsasaki]
.. ISO TC 37, important not for standards creation but for secondary publishing
15:14:30 [fsasaki]
.. why XLIFF?, and why 2.0? see also presentation from christian
15:15:21 [fsasaki]
.. good progress of XLIFF in 2011 possible, as SWOT analysis shows
15:15:40 [fsasaki]
.. prediction: 2011 will see definition of new features, in 2012 new standard
15:16:13 [fsasaki]
topic: presentation from Sven C. Andrä
15:16:19 [RRSAgent]
I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki
15:17:19 [fsasaki]
sven: kilgray, "we localize", Andrä, biobloom are behind the "interoperability now!" initiative
15:17:36 [Steven_]
i/christian: five/scribenick: fsasaki
15:17:41 [fsasaki]
.. translation (technology) industry is a niche industry
15:17:42 [Steven_]
rrsagent, make minutes
15:17:42 [RRSAgent]
I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html Steven_
15:18:03 [fsasaki]
.. very few computer scientists here, not a technical, but experience driven industry
15:18:16 [fsasaki]
.. industry is getting more and more important, including technology
15:18:23 [fsasaki]
.. hence interop is getting more important
15:18:39 [fsasaki]
.. there are enough standards here, but they are complex, not many have reference implementations
15:18:47 [fsasaki]
.. and there is little exchange within tool providers
15:19:28 [fsasaki]
table of features in XLIFF that are supported by all tools - only two features (from about 50?) are supported by all tools
15:20:01 [fsasaki]
sven: we want lossless data exchange in a mixed (tool) environment
15:20:30 [fsasaki]
.. standards are important, also develo
15:20:45 [fsasaki]
s/develo/further development of XLIFF/
15:21:00 [fsasaki]
.. but mindset is most important, i.e. about the lossless data exchange
15:21:13 [fsasaki]
.. basis of our work: "interoperability manifesto"
15:22:34 [fsasaki]
.. pushing standards over the edge, give feedback to the TC
15:22:56 [fsasaki]
.. modules that we are working on: about content, package, transportation
15:23:06 [fsasaki]
.. content is modified xliff
15:23:16 [fsasaki]
.. package is currently just made up
15:23:27 [fsasaki]
.. for transportation we are using regular web services
15:23:48 [fsasaki]
.. basic approach: disclose our concepts
15:24:27 [fsasaki]
.. reference implementations are open source
15:25:00 [fsasaki]
.. early real life usage
15:25:52 [fsasaki]
.. test scenarios to verify compliance
15:26:48 [fsasaki]
.. theoretical aspect: agile vs. standard?
15:27:30 [fsasaki]
.. would be good to have a framework for organizations like W3C that could help us to bring this into standardization step by step
15:28:17 [fsasaki]
.. benefits of this approach: it is a limited time that we are working on this
15:29:12 [fsasaki]
topic: presentation from Eliott Nedas
15:29:18 [RRSAgent]
I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki
15:30:15 [tadej]
tadej has joined #mlw
15:34:27 [tadej]
David Grunwald, GTS
15:34:49 [fsasaki]
s/Eliott Nedas/David Grunwald/
15:35:14 [fsasaki]
david: our vision: have a box that creates quality content very quickly and cheaply
15:35:39 [fsasaki]
.. using MT, we want an efficient solution that will make mlw a reality
15:36:04 [fsasaki]
.. need to develop MT which is good for blog publishing
15:36:16 [fsasaki]
.. MT will never be ready "as is" for human quality translation
15:36:27 [fsasaki]
.. we developed a system for cheap and quick post editing
15:36:52 [fsasaki]
.. currently, explosion of content, lots of it is local because of language barriers
15:37:03 [fsasaki]
.. translation costs are very high
15:37:58 [fsasaki]
.. we are targeting open source CMS platforms
15:38:10 [fsasaki]
.. 20 % of web sites are published on such platforms
15:38:28 [fsasaki]
.. we could offer a good translation solution to these
15:38:41 [fsasaki]
.. large media publishers who use open source CMS
15:38:58 [fsasaki]
.. wordpress, movable type are created for all kinds of web sites, not only blogs
15:39:12 [fsasaki]
.. our solution: based on MT; human post editing, and crowd sourcing
15:39:21 [RRSAgent]
I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki
15:40:03 [fsasaki]
david: crowdsourcing startups in many regions
15:40:44 [fsasaki]
.. our solution: not automated open source CMS solution for small guys
15:40:54 [fsasaki]
.. no automated tools for post editing / MT either
15:42:06 [fsasaki]
.. our solution uses data from blogs that is available on the web
15:43:55 [fsasaki]
.. workflow: user installs wordpress, MT is done, email notification is sent to crowdsourcing translators, translation is integrated after review by a moderator
15:45:55 [fsasaki]
.. interested in opportunities for funding this kind of work
15:46:11 [fsasaki]
topic: presentation from Pål Nes
15:46:20 [RRSAgent]
I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki
15:46:42 [fsasaki]
Pål: opera has been using crowd sourcing for a long time
15:47:32 [fsasaki]
.. caveat of crowd sourcing: it is not free, organizing it is difficult
15:47:45 [fsasaki]
.. e.g. employing managers for the crowd
15:48:18 [jan]
jan has joined #mlw
15:48:50 [fsasaki]
.. should only be used for certain tasks
15:49:01 [fsasaki]
.. not for time critical tasks
15:49:27 [fsasaki]
.. mostly students are participating, picked up from university talks
15:49:34 [tadej1]
tadej1 has joined #mlw
15:49:43 [fsasaki]
.. large crowd is not necessarily a good crowd
15:49:59 [fsasaki]
.. better 3,4,5 good translators, than 50 translators doing nothing
15:51:09 [fsasaki]
.. e.g. press releases, marketing material are not well suited for crowd translations
15:51:29 [fsasaki]
.. good for crowd sourcing: applications (web site "my opera", "opera com"), with a stable set of text
15:51:38 [fsasaki]
.. and documentation, that is easy to maintain
15:52:28 [fsasaki]
.. start small, put your crowd under embargo / NDA
15:52:54 [r12a]
r12a has joined #mlw
15:53:37 [fsasaki]
.. try building up a hierarchy
15:53:47 [fsasaki]
.. be careful with your branding
15:53:58 [fsasaki]
.. and your terminology
15:56:06 [fsasaki]
.. for opera we used XLIFF - we used our own, incompatible version of XLIFF
15:56:19 [fsasaki]
.. discovered that open source is not open standard
15:57:41 [fsasaki]
.. tools we used: gettext and po4a, transifex, translate toolkit with pootle and virtaal, homebrew applications to bridge the vast gaps
15:58:14 [fsasaki]
.. XLIFF is a minefield, in the current version
15:58:40 [fsasaki]
.. about html: keep it as simple as possible, semantic markup is key
15:59:03 [fsasaki]
.. write proper CSS - write a separate RTL stylesheet to negate RTL-challenged CSS
16:00:02 [fsasaki]
topic: presentation from Eliott Nedas
16:00:10 [RRSAgent]
I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki
16:05:10 [fsasaki]
eliott: everything that was said by David, Christian etc. in this session about interoperability was right, I concur with them
16:05:47 [fsasaki]
.. we need standards because of interdependence.
16:06:23 [fsasaki]
.. the demise of LISA. Sad that they are gone, but opportunity to look into this in a new way
16:06:32 [fsasaki]
.. LISA standards are important
16:06:53 [fsasaki]
.. now is a good opportunity for a new model of standardization
16:07:03 [fsasaki]
.. new kids on the block: TAUS and Gala
16:07:50 [fsasaki]
.. currently lots of different technologies
16:09:23 [fsasaki]
.. and many different standards
16:09:34 [fsasaki]
.. OAXAL is a solution that brings these together
16:09:50 [fsasaki]
.. that can be used for free
16:11:18 [fsasaki]
description of various aspects of standards and applications built on top of it
16:12:41 [fsasaki]
eliott: how to spread the message: important e.g. in academic curricula
16:13:26 [fsasaki]
topic: presentation from Manuel Herranz
16:14:02 [fsasaki]
manuel: presentation about PangeaMT project
16:14:30 [Andrea]
Andrea has joined #mlw
16:14:42 [RRSAgent]
I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki
16:15:03 [fsasaki]
manuel: translation is something that you have to go through for achieving what you want
16:15:32 [fsasaki]
.. web people except immediate translation
16:15:42 [fsasaki]
s/except/expect/
16:16:11 [fsasaki]
.. why don't we have immediate translations?
16:17:33 [fsasaki]
.. introducing pangeanic: LSP, major clients in Asia and Europe
16:17:48 [fsasaki]
.. we wanted to provide faster service for translation
16:17:54 [fsasaki]
.. became founding member of TAUS
16:19:17 [tadej]
tadej has joined #mlw
16:19:20 [fsasaki]
.. four years ago created relation with computer science institute in valencia
16:19:42 [fsasaki]
.. challenge at that time: turn academic development (moses) into a commercial application
16:20:17 [fsasaki]
.. limitations: plain text, language model building (first), no recording, no update feature, data availability, ...
16:21:08 [fsasaki]
.. objectives: provide high quality MT for post editing
16:21:19 [fsasaki]
.. and to use only open standards: XLIFF, tmx, xml
16:21:34 [RRSAgent]
I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki
16:22:10 [fsasaki]
.. built a TMX - XLIFF workflow
16:22:39 [fsasaki]
.. not to be locked into a solution
16:23:35 [fsasaki]
.. PangeaMT system: comes as TMX or as XLIFF
16:23:51 [fsasaki]
.. TMX should not die, people are still using it
16:24:17 [lbellido]
lbellido has joined #mlw
16:25:07 [fsasaki]
.. future work: on the fly MT training
16:25:15 [fsasaki]
.. pick and match sets of data
16:25:23 [fsasaki]
.. objective stats for post-editors
16:25:28 [fsasaki]
.. confidence scores for users
16:26:50 [fsasaki]
topic: Q/A for localizers
16:27:19 [fsasaki]
reinhard: thank you, was a great session
16:28:26 [fsasaki]
.. about remarks on crowd sourcing: there was emphasis on crowd sourcing for enterprise
16:28:36 [fsasaki]
.. this does not go well together
16:29:29 [fsasaki]
.. other people like rosetta foundation, translators without borders etc. have had good experiences
16:29:59 [fsasaki]
Pål: crowd sourcing was good for us
16:30:11 [fsasaki]
.. it just took us a lot of effort and time to get there
16:31:18 [fsasaki]
jörg: there is some similarity: you have to train translators, otherwise you won't get the good results in medical translation
16:33:02 [fsasaki]
felix: one comment on interop now, it is very important to go into a standards body as a next step
16:33:15 [fsasaki]
sven: thanks, we will definitely try to do that
16:33:46 [fsasaki]
richard: w3c just created business groups / community groups, that might be a thing for you to look into
16:33:56 [fsasaki]
david: about what reinhard said
16:34:20 [fsasaki]
.. if your expectation is high you will be disappointed, but the business case is in the future
16:34:43 [fsasaki]
topic: wrap up
16:35:02 [fsasaki]
richard: see you tomorrow, speakers please show up at 8:30 tomorrow
16:35:07 [RRSAgent]
I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki
16:40:55 [tadej]
tadej has left #mlw
17:00:38 [asgeirf]
asgeirf has joined #mlw