IRC log of mlw on 2011-04-04
Timestamps are in UTC.
- 07:13:17 [RRSAgent]
- RRSAgent has joined #mlw
- 07:13:17 [RRSAgent]
- logging to http://www.w3.org/2011/04/04-mlw-irc
- 07:13:24 [fsasaki]
- meeting: MLW workshop, PISA
- 07:13:27 [fsasaki]
- chair: richard
- 07:13:32 [fsasaki]
- scribe: felix
- 07:13:44 [fsasaki]
- topic: introduction
- 07:13:52 [Jirka]
- Jirka has joined #mlw
- 07:13:56 [fsasaki]
- Richard introduces the project and the workshop
- 07:14:12 [luke]
- 2nd of 4 MultilingualWeb conferences
- 07:14:41 [luke]
- Goal is to facilitate cross-pollination across different areas, so don't tune out if it's not your specialty!
- 07:18:56 [tadej]
- tadej has joined #mlw
- 07:20:27 [mpo]
- mpo has joined #mlw
- 07:20:38 [chaals]
- chaals has joined #mlw
- 07:21:14 [chaals]
- rrsagent, draft minutes
- 07:21:15 [RRSAgent]
- I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html chaals
- 07:21:36 [chaals]
- rrsagent, make log public
- 07:22:51 [lbellido]
- lbellido has joined #mlw
- 07:24:16 [fsasaki]
- topic: Presentation from Domenico Laforenza on "The Italian approach to Internationalized Domain Names"
- 07:29:19 [tadej1]
- tadej1 has joined #mlw
- 07:30:22 [tadej1]
- tadej1 has left #mlw
- 07:30:26 [tadej1]
- tadej1 has joined #mlw
- 07:31:13 [fsasaki]
- Domenico describes the mechanisms behind IDN, domain names in general, the usage of the internet
- 07:31:47 [r12a]
- r12a has joined #mlw
- 07:33:18 [fsasaki]
- Domenico describes what is possible with IDN, compared to domain names in general
- 07:33:59 [iantruscott]
- iantruscott has joined #mlw
- 07:36:10 [fsasaki]
- Domenico describes how the punycode translation helps to use IDN, while keeping the underlying domain name system as is
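A minimal sketch of the Punycode step described above, assuming Python's built-in `idna` codec (it implements the older IDNA 2003 rules, so registry policy for .it is not modelled). The helper name `to_ascii` is hypothetical. It illustrates that an accented name reaches the unchanged DNS as an ASCII `xn--` label.

```python
# Sketch only: Python's built-in "idna" codec applies IDNA 2003
# ToASCII/ToUnicode; real registries may apply stricter rules.

def to_ascii(domain: str) -> bytes:
    """Return the ASCII (Punycode) form that is actually looked up in DNS."""
    return domain.encode("idna")


plain = to_ascii("papa.it")      # already ASCII, so it is unchanged
accented = to_ascii("papá.it")   # the label with "á" becomes an "xn--" label

print(plain)
print(accented)
print(accented.decode("idna"))   # round-trips back to the Unicode form
```

Because the two ASCII forms differ, the DNS treats the names as unrelated domains, even though they look similar to a reader.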
- 07:39:38 [tadej]
- tadej has joined #mlw
- 07:40:40 [tadej]
- tadej has left #mlw
- 07:40:47 [tadej]
- tadej has joined #mlw
- 07:45:33 [lbellido]
- lbellido has joined #mlw
- 07:46:56 [fsasaki]
- fsasaki has joined #mlw
- 07:47:26 [chaals]
- q+ to ask about how users will distinguish papa.it and papá.it
- 07:47:33 [Zakim]
- Zakim has joined #mlw
- 07:47:38 [chaals]
- q+ to ask about how users will distinguish papa.it and papá.it
- 07:48:01 [fsasaki]
- topic: presentation from oreste signore on "web for all"
- 07:48:08 [RRSAgent]
- I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki
- 07:50:09 [chaals]
- [webfonts is actually really important for some places ... ]
- 07:50:14 [fsasaki]
- oreste is showing various areas that need more work to create "a web for all", e.g. in the area of accessibility, multilinguality etc.
- 07:50:53 [fsasaki]
- oreste describes wcag 2.0
- 07:51:01 [RRSAgent]
- I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki
- 07:51:59 [fsasaki]
- oreste: issues of multilingual web: encoding, colors, navigation, ...
- 07:52:12 [RRSAgent]
- I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki
- 07:52:49 [fsasaki]
- q-
- 07:54:18 [PBS]
- PBS has joined #mlw
- 07:54:27 [r12a]
- q?
- 07:56:42 [fsasaki]
- oreste describes the role of W3C offices, translations, W3C I18N Activity etc. as important means to push the multilingual web
- 07:57:08 [RRSAgent]
- I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki
- 07:58:55 [fsasaki]
- topic: Presentation from Kimmo Rossi
- 07:59:51 [fsasaki]
- Kimmo: I am project officer for mlw project
- 08:00:45 [fsasaki]
- .. I am very happy about the enthusiasm in this project. It is very small in terms of budget, but it is very successful
- 08:01:22 [fsasaki]
- .. mlw has also been very successful in using social media
- 08:01:39 [fsasaki]
- .. looking forward to seeing the next steps, including the review which is coming up
- 08:02:11 [fsasaki]
- .. mlw has been a wonderful forum for gathering new ideas, to understand how much fragmentation still exists
- 08:02:26 [fsasaki]
- .. now it is time to become operational, to start to put ideas into practice
- 08:02:47 [fsasaki]
- .. I expect that this project will come up with good recommendations: what needs to be done, why, who could do it?
- 08:03:01 [fsasaki]
- .. we have to create operational working links to other European projects
- 08:03:20 [fsasaki]
- .. mid 2015 we will have about 50 ongoing projects in the area of multilingual technologies
- 08:03:39 [fsasaki]
- .. we started creating these links, i.e. we have speakers from several European projects
- 08:03:56 [fsasaki]
- .. please look into these other initiatives and see what we can do together
- 08:04:19 [fsasaki]
- .. we started funding language technology 2 years ago - we are reaching a plateau
- 08:04:47 [fsasaki]
- .. we just evaluated 90 proposals, asking 240 mill. Euros, we only have 50 mill. Euros
- 08:04:59 [fsasaki]
- .. we can only select one of five projects
- 08:05:20 [fsasaki]
- .. there is still one more call coming up for SME: 35 mill. Euro for sharing data / language resources
- 08:05:35 [fsasaki]
- .. there are still three weeks to put in a proposal
- 08:05:41 [RRSAgent]
- I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki
- 08:06:17 [fsasaki]
- kimmo: once SME call is over, we will have about 50 projects
- 08:06:49 [fsasaki]
- .. we spent 150.000 Euros to fund a survey, interviewing many people in European states
- 08:07:03 [fsasaki]
- .. asking about language use while being online
- 08:07:24 [fsasaki]
- .. results will soon be public on our web site and the Eurobarometer web site
- 08:07:37 [fsasaki]
- .. results are that use of other languages is mostly passive
- 08:07:56 [fsasaki]
- .. when people write and engage in social networking, they prefer to use their own language
- 08:08:17 [fsasaki]
- .. 44% said: they are missing important information because they don't understand the language used
- 08:08:39 [fsasaki]
- .. thank you, have a successful conference
- 08:08:49 [RRSAgent]
- I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki
- 08:09:10 [fsasaki]
- topic: presentation from ralf steinberger
- 08:09:54 [fsasaki]
- ralf: talking about attempts to give access to information across languages
- 08:10:15 [fsasaki]
- .. monitoring news in 50 languages
- 08:10:29 [RRSAgent]
- I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki
- 08:12:23 [fsasaki]
- ralf introduces JRC
- 08:14:12 [fsasaki]
- ralf describes the news sources used for "media monitoring": 100.000 news articles gathered per day, in 50 languages
- 08:14:48 [fsasaki]
- ralf: articles are converted into rss for further processing
- 08:16:48 [fsasaki]
- ralf gives examples of news coverage: news is not always available in English, and sometimes more is available in other languages
- 08:18:38 [fsasaki]
- ralf: we also find out co-occurrences: who or what is mentioned with whom or what in different languages?
- 08:19:25 [fsasaki]
- .. also analysing quotation networks: who gets mentioned by whom, also different depending on the language
- 08:19:37 [RRSAgent]
- I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki
- 08:19:46 [r12a]
- q?
- 08:20:12 [fsasaki]
- ralf: recognition of entities (mostly persons) in about 20 languages
- 08:22:35 [fsasaki]
- ralf: multilingual categorization, using about 1000 categories, using boolean search word operations, optional weights of words, co-occurrence and distance of words, regular expressions for inflection forms (not only morphological)
- 08:24:11 [fsasaki]
- .. multilingual categorization in general, and specific to medicine in the MediSys system
- 08:25:29 [fsasaki]
- .. classifying countries and category, e.g. there is 1/2 article about tuberculosis in Czech, but if suddenly it is 5 articles a day, we can issue an alert
- 08:27:08 [j]
- j has joined #mlw
- 08:27:51 [j]
- j has left #mlw
- 08:28:32 [j]
- j has joined #mlw
- 08:28:43 [RRSAgent]
- I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki
- 08:29:22 [fsasaki]
- ralf introducing news explorer - multilingual news daily overview
- 08:36:08 [fsasaki]
- ralf: application about multilingual template filling - NEXUS, extracting structured information about events
- 08:36:21 [fsasaki]
- .. focusing on conflicts, crimes, disasters, ...
- 08:36:40 [fsasaki]
- .. want to know if there is a disaster with the need to send aid etc.
- 08:41:27 [RRSAgent]
- I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki
- 08:44:03 [fsasaki]
- ralf: summarizing: have demonstrated our EMM system, technologies being used, application scenarios
- 08:44:55 [fsasaki]
- .. modest attempts to get access across languages, but users appreciate it and it shows that the Web is not only for English
- 08:45:11 [fsasaki]
- topic: Q/A for welcome session
- 08:45:18 [Zakim]
- chaals, you wanted to ask about how users will distinguish papa.it and papá.it
- 08:46:49 [fsasaki]
- domenico: punycode translation of papa.it and papá.it is different, so sure, yes
- 08:47:36 [fsasaki]
- XYZ: question about nexus: if one newspaper says "person X is a freedom fighter" and another says "person X is a terrorist", how do you deal with this?
- 08:48:12 [fsasaki]
- ralf: there is political analysis being done, but categorization like the above is normally not being done
- 08:48:26 [fsasaki]
- .. system is publicly accessible via our home page
- 08:48:50 [fsasaki]
- richard: now break
- 08:48:58 [tadej]
- tadej has left #mlw
- 08:49:00 [RRSAgent]
- I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki
- 09:18:06 [Jirka]
- Jirka has joined #mlw
- 09:18:21 [Jirka]
- scribe Jirka
- 09:18:37 [RRSAgent]
- I'm logging. I don't understand 'scribe Jirka', Jirka. Try /msg RRSAgent help
- 09:20:43 [Jirka]
- scribenick: Jirka
- 09:21:22 [Jirka]
- Adriane Rinsche opens Developer session
- 09:22:24 [Jirka]
- topic: "Multilingual forms and applications" by Steven Pemberton
- 09:23:20 [Jirka]
- Steven talks about HTTP content negotiation
- 09:23:43 [fsasaki]
- fsasaki has joined #mlw
- 09:23:48 [Tomas]
- Tomas has joined #mlw
- 09:24:26 [tadej]
- tadej has joined #mlw
- 09:25:08 [PBS]
- PBS has joined #mlw
- 09:25:31 [Jirka]
- Steven shows some examples of content negotiation
- 09:26:01 [Jirka]
- Steven talks about the possibility of providing better 404 error pages
- 09:27:16 [Jirka]
- ... and 406 pages
- 09:28:07 [Jirka]
- ... some servers like www.google.com ignore content negotiation headers
- 09:29:13 [Jirka]
- ... and try to guess your location based on your IP address
- 09:29:20 [Tomas]
- Most do. The general problem is Multilingual Web Sites (MWS).
- 09:30:36 [Jirka]
- ... another approach is to have a button for changing language on the web page itself
- 09:31:42 [Jirka]
- ... some sites even use Javascript to change content inside the page
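A minimal sketch of the server-side language negotiation discussed above, under stated assumptions: the function names `parse_accept_language` and `choose_language` are hypothetical, only the `q` parameter is handled, and matching falls back from a regional range such as `it-CH` to its primary subtag. It is not the algorithm of any particular server.

```python
from typing import List, Optional, Tuple


def parse_accept_language(header: str) -> List[Tuple[str, float]]:
    """Return (language-range, q) pairs sorted by descending q value."""
    ranges = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        parts = [p.strip() for p in item.split(";")]
        tag = parts[0].lower()
        q = 1.0  # a range without an explicit q counts as q=1
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0  # assumption: treat a malformed q as "not acceptable"
        ranges.append((tag, q))
    # sorted() is stable, so ranges with equal q keep their header order
    return sorted(ranges, key=lambda r: r[1], reverse=True)


def choose_language(header: str, available: List[str]) -> Optional[str]:
    """Pick the best available language, or None if nothing matches."""
    offered = {lang.lower(): lang for lang in available}
    for tag, q in parse_accept_language(header):
        if q <= 0:
            continue
        if tag == "*":
            return available[0] if available else None
        if tag in offered:
            return offered[tag]
        primary = tag.split("-")[0]  # fall back from "it-ch" to "it"
        if primary in offered:
            return offered[primary]
    return None


print(choose_language("it-CH, it;q=0.8, en;q=0.5", ["en", "it"]))
```

When `choose_language` returns `None`, a server could answer with a 406 page listing the available variants, or fall back to a default language.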
- 09:32:31 [Jirka]
- After summarizing some bad practices in serving multilingual websites Steven now introduces XForms
- 09:32:55 [RRSAgent]
- I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html Jirka
- 09:35:19 [Jirka]
- XForms separates data and presentation. Steven shows this with an example of a simple form
- 09:35:44 [luke]
- luke has joined #mlw
- 09:36:42 [Jirka]
- ... XForms can contain calculations
- 09:37:03 [Jirka]
- ... controls are abstract and can get different styling easily
- 09:37:28 [Jirka]
- ... it's possible to use different datasources
- 09:37:36 [RRSAgent]
- I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html Jirka
- 09:39:22 [Jirka]
- Steven shows a form which can dynamically change labels for form fields based on the selected language for the form
- 09:42:23 [Jirka]
- ... XForms uses a declarative approach, which requires much less work to produce
- 09:43:44 [Jirka]
- ... conclusion - XForms allows the use of "language stylesheets" to create multilingual forms, even though this wasn't an original goal for XForms
- 09:44:16 [Tomas]
- It is in my presentation this afternoon. An overview http://dragoman.org/mws-india.html
- 09:45:00 [Jirka]
- topic: "Lessons from standardizing i18n aspects of packaged web applications" by Charles McCathieNevile
- 09:45:22 [Jirka]
- Chaals introduces Widgets technology
- 09:46:37 [Jirka]
- ... history of Widgets development and standardization in W3C
- 09:46:53 [Jirka]
- ... Widgets are now split into 7 specifications
- 09:47:32 [PBS]
- PBS has joined #mlw
- 09:47:46 [Jirka]
- Chaals shows the source of a simple Widget
- 09:48:18 [kimmo]
- kimmo has joined #mlw
- 09:49:10 [Jirka]
- ... describes l10n features of Widgets
- 09:49:20 [RRSAgent]
- I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki
- 09:52:22 [Jirka]
- ... Widgets use xml:lang, and for larger resources a separate language-specific directory can be used
- 09:53:42 [Steven]
- Steven has joined #mlw
- 09:54:42 [Jirka]
- ... Widgets do not use ITS because namespaces are too hard for some web developers, instead a few specific attributes and elements were adopted (span, dir, xml:lang)
- 09:55:22 [Jirka]
- ... Opera extensions are based on Widgets
- 09:56:13 [Jirka]
- ... l10n is hard, you should get advice and do proper testing
- 09:59:41 [Jirka]
- topic: "HTML5 proposed markup changes related to internationalization" by Richard Ishida
- 10:00:34 [r12a]
- r12a has left #mlw
- 10:00:45 [Jirka]
- Richard closes his IRC client
- 10:01:02 [Steven]
- Steven has joined #mlw
- 10:01:18 [Jirka]
- Richard tries to explain what HTML5 means
- 10:01:38 [Jirka]
- ... Richard will talk only about HTML5 specification
- 10:01:40 [RRSAgent]
- I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki
- 10:02:11 [Jirka]
- ... not about related things like CSS3, new Javascript APIs, ...
- 10:02:33 [Jirka]
- ... HTML5 endorses utf-8 encoding
- 10:03:17 [Jirka]
- ... simplified encoding declaration <meta charset=utf-8>
- 10:03:56 [Jirka]
- ... polyglot documents are both XML and HTML5 (HTML syntax) documents, use utf-8, no XML declaration
- 10:04:28 [Steven]
- Actually, XHTML 1.0 had the same thing, but didn't call it "Polyglot"
- 10:05:23 [Steven]
- But it was addressing the same problem
- 10:07:34 [Jirka]
- ... charset attribute was removed from link and a elements
- 10:08:40 [Jirka]
- ... language declaration can use lang attribute or content-language HTTP header
- 10:09:05 [Jirka]
- ... content-language can contain more than one language
- 10:10:00 [Jirka]
- ... content-language was just recently removed from HTML5 draft
- 10:10:26 [Jirka]
- Richard now explains Ruby
- 10:11:03 [chaals]
- [Ruby was very common in western medieval texts, where greek, latin, hebrew etc would be mixed. E.g. religious texts, and scholarly documents]
- 10:11:18 [Steven]
- Yes, Chaals, it is very useful for other things than Ruby; pity they called it Ruby mark up, since it is more than that
- 10:11:22 [Jirka]
- ... HTML5 has support for Ruby, but uses slightly different markup than XHTML 1.1 or ITS (missing rb element for base text)
- 10:11:54 [Jirka]
- ... Bidi support
- 10:12:58 [PBS_]
- PBS_ has joined #mlw
- 10:13:00 [Jirka]
- ... HTML5 adds bdi element for bidi isolation
- 10:13:36 [Jirka]
- ... dir="auto" allows a run-time decision about directionality
- 10:15:10 [Steven]
- I sent a last call comment to the ruby WG, saying they should call it something more generic, but they declined "because Microsoft had already implemented it"
- 10:15:29 [Jirka]
- ... Richard invites all to get involved in spec development
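A minimal sketch of the run-time decision that dir="auto" (mentioned above) implies: direction is taken from the first strongly directional character. The function name `first_strong_direction` is hypothetical, the fallback to "ltr" when no strong character is found is an assumption of this sketch, and the real HTML algorithm also skips certain child elements, which is not modelled here.

```python
import unicodedata


def first_strong_direction(text: str) -> str:
    """Guess base direction from the first strong bidi character."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":            # strong left-to-right (e.g. Latin)
            return "ltr"
        if bidi_class in ("R", "AL"):    # strong right-to-left (Hebrew, Arabic)
            return "rtl"
    # Assumption: no strong character found, so default to left-to-right.
    return "ltr"


print(first_strong_direction("hello"))
print(first_strong_direction("123 שלום"))
```

Digits and spaces are not strong characters, so a string such as "123 שלום" is still detected as right-to-left.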
- 10:16:06 [Jirka]
- topic: "Internationalization (or the lack of it) in current browsers" by Gunnar Bittersmann
- 10:16:51 [RRSAgent]
- I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki
- 10:17:54 [Jirka]
- Gunnar talks about some problems in HTML5
- 10:18:25 [Jirka]
- ... valdation of email input type field is too restrictive in spec - doesn't support IDN
- 10:19:34 [Jirka]
- s/valdation/validation/
- 10:19:52 [RRSAgent]
- I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html Jirka
- 10:21:43 [Jirka]
- ... each browser provides different UI for changing preferred language
- 10:22:05 [Jirka]
- ... some browsers have bugs in this
- 10:22:35 [r12a]
- r12a has joined #mlw
- 10:23:39 [Steven]
- Some browsers have bugs, but some do it completely wrong :-)
- 10:25:12 [Jirka]
- ... language negotiation is missing some features
- 10:25:25 [Jirka]
- ... how to label original and translation
- 10:25:34 [Jirka]
- ... how to label human and machine translation
- 10:26:04 [Jirka]
- topic: "What's Next in Multilinguality, Web News & Social Media Standardization?" by Jochen Leidner
- 10:26:08 [RRSAgent]
- I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html Jirka
- 10:28:21 [Jirka]
- Jochen shows mind map of presentation
- 10:28:41 [Jirka]
- ... presents details about Thomson Reuters company
- 10:32:00 [Jirka]
- ... customers require high quality
- 10:32:18 [Jirka]
- ... combination of human and automatic methods is in use
- 10:32:30 [Jirka]
- ... XML and Unicode are heavily used
- 10:33:10 [Jirka]
- ... main issue is not lack of standards but developer education
- 10:33:48 [Jirka]
- ... i18n and l10n are not part of the curriculum
- 10:38:11 [Jirka]
- ... new challenges include support for multimedia content
- 10:38:29 [Jirka]
- ... some content is hidden (Facebook, Twitter, ...)
- 10:39:38 [RRSAgent]
- I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki
- 10:41:15 [Jirka]
- ... proposes more open twitter-like messaging system with better support for i18n
- 10:42:19 [omstefanov]
- omstefanov has joined #mlw
- 10:42:21 [Jirka]
- ... it might be useful to have an HTML tag saying that a page is a translation of a different page
- 10:42:40 [Jirka]
- topic: Q&A session
- 10:44:27 [Jirka]
- Question from Google: Defends the current state of affairs regarding language selection. Asks whether an easier UI will help.
- 10:45:09 [RRSAgent]
- I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki
- 10:45:59 [Jirka]
- Chaals: Interface should be easier to use, most users don't set their language
- 10:46:51 [Jirka]
- ... content should contain as much metadata as possible to inform about alternative versions of content
- 10:47:21 [Jirka]
- Richard: mentions some extension that allows easier change of preferred language
- 10:48:12 [RRSAgent]
- I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki
- 10:49:05 [RRSAgent]
- I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki
- 10:50:34 [Andrea]
- Andrea has joined #mlw
- 10:50:46 [Jirka]
- Question from Olaf: What is the chance of implementing some notation for marking a document as being in the original language?
- 10:51:11 [Jirka]
- Chaals: There are many notations starting from simple rel= going to RDF
- 10:52:40 [Jirka]
- ... you should use it, browsers will support what is used on the pages visited by users
- 10:53:24 [Jirka]
- ... you should talk to producers of content creation tools
- 10:54:35 [Jirka]
- Richard: you should be more involved, create proposals, ...
- 10:55:15 [Jirka]
- Felix Sasaki: It's possible to introduce new language subtag for this
- 10:55:52 [fsasaki]
- .. use the ietf-languages list to discuss this with the people reviewing such proposals
- 10:55:59 [RRSAgent]
- I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki
- 11:38:56 [Zakim]
- Zakim has left #mlw
- 11:59:44 [Steven]
- Steven has joined #mlw
- 11:59:54 [Steven]
- Scribe: Steven
- 12:01:27 [Steven]
- Topic: Creators
- 12:02:28 [Steven]
- i/Scribe: Felix/scribenick: fsasaki
- 12:02:38 [Steven]
- rrsagent, make minutes
- 12:02:38 [RRSAgent]
- I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html Steven
- 12:03:41 [Steven]
- i/scribe: felix/scribenick: fsasaki
- 12:03:49 [Steven]
- rrsagent, make minutes
- 12:03:49 [RRSAgent]
- I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html Steven
- 12:06:22 [Jirka]
- Jirka has joined #mlw
- 12:06:36 [Steven]
- Felix: Welcome to afternoon session
- 12:06:51 [Steven]
- Topic: Office.com 2010
- 12:07:43 [Steven]
- (Speaker - Dag Schmidtke)
- 12:08:11 [Steven]
- Dag: 37 langs, 51 markets
- 12:09:37 [Steven]
- ... some countries have more than one language (eg Belguim, Canada)
- 12:09:43 [Steven]
- s/ui/iu/
- 12:10:19 [Steven]
- Dag: adding value to Office
- 12:10:47 [Steven]
- ... content, templates, also sell Office
- 12:11:05 [Steven]
- ... campaigns in different markets at different times
- 12:11:16 [Steven]
- ... market specific engagement
- 12:11:49 [Steven]
- Dag: Recent migration, site management and authoring from XMetal to Word
- 12:12:19 [Steven]
- ... and using sharepoint instead of a custom publishing system
- 12:12:36 [Steven]
- ... we did extend Word to support this
- 12:12:45 [Steven]
- ... allows federated authoring
- 12:13:22 [Steven]
- ... helps with localization
- 12:13:55 [Steven]
- Dag: Lessons from this migration
- 12:14:07 [Steven]
- ... internationalisation was a key stakeholder
- 12:14:22 [Steven]
- ... designed for scale
- 12:15:12 [Steven]
- ... it was quite an effort, next time we won't do everything at once
- 12:16:08 [Steven]
- Dag: 100s of thousands of help documents for at least the last three releases
- 12:16:17 [Steven]
- ... content heavy
- 12:16:53 [Steven]
- ... complexity wasn't where we expected, and was more complex than we expected
- 12:17:51 [Steven]
- Dag: General lessons from the site
- 12:18:43 [Steven]
- ... Serve all global market needs, English is just another language
- 12:18:56 [Steven]
- ... scale up *and* down
- 12:19:48 [r12a]
- r12a has joined #mlw
- 12:19:55 [chaals]
- chaals has joined #mlw
- 12:20:10 [Steven]
- ... design for growth
- 12:20:57 [Steven]
- [gives example of content riginating in Japan, and translated to other languages]
- 12:21:03 [Steven]
- s/rig/orig/
- 12:22:11 [Steven]
- Dag: No character formating, only character styles
- 12:22:18 [Steven]
- s/ting/tting/
- 12:22:53 [Steven]
- Dag: We have an XML format for translation
- 12:23:02 [Steven]
- Dag: Local touch
- 12:23:20 [Steven]
- ... deliver right experience to each market
- 12:23:45 [Steven]
- [examples]
- 12:25:19 [Steven]
- Dag: Customer connection
- 12:25:41 [Steven]
- ... feedback, evaluation, SEO
- 12:26:36 [Steven]
- [examples from site]
- 12:27:02 [Steven]
- Dag: Continuous updates
- 12:27:27 [Steven]
- ... respond to regional events, A/B testing
- 12:27:48 [Steven]
- ... use some machine translation
- 12:27:56 [Steven]
- Dag: Future trends
- 12:28:05 [Steven]
- ... moving to the cloud
- 12:28:43 [Steven]
- ... multilingual multimedia
- 12:28:59 [Steven]
- ... language automation
- 12:29:21 [Steven]
- .... interoperability with standards
- 12:29:27 [Steven]
- s/..../.../
- 12:29:36 [Steven]
- Dag: Conclusions
- 12:29:59 [Steven]
- ... It is possible to design for scale and local relevance
- 12:30:49 [Steven]
- Topic:
- 12:30:49 [Steven]
- Jirka Kosek - Using ITS in the common content formats
- 12:31:06 [Steven]
- s/Jirka/Topic: Jirka/
- 12:32:19 [Steven]
- rrsagent, make minutes
- 12:32:19 [RRSAgent]
- I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html Steven
- 12:33:16 [Steven]
- Jirka: tag set designed to help with translations
- 12:33:37 [Steven]
- ... usable with any XML vocabulary
- 12:34:39 [Steven]
- [example of use]
- 12:36:21 [Steven]
- Jirka: Allows automatic software to see what should not be translated, as well as human translators
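A minimal sketch of how software might honour the "do not translate" marking described above, assuming the ITS 1.0 local `its:translate` attribute in the namespace `http://www.w3.org/2005/11/its`. The sample vocabulary (`doc`, `p`, `code`) and the function name `translatable_text` are hypothetical, and ITS global rules are not handled.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"

# Hypothetical sample document; only the its:translate attribute is real ITS markup.
SAMPLE = """<doc xmlns:its="http://www.w3.org/2005/11/its">
  <p>Press the <code its:translate="no">Enter</code> key.</p>
  <p its:translate="no">ACME-42</p>
</doc>"""


def translatable_text(elem, inherited=True):
    """Collect text chunks that a translator or MT system should process."""
    flag = elem.get("{%s}translate" % ITS_NS)
    # The value is inherited by child elements unless they override it.
    translate = inherited if flag is None else (flag == "yes")
    chunks = []
    if translate and elem.text and elem.text.strip():
        chunks.append(elem.text.strip())
    for child in elem:
        chunks.extend(translatable_text(child, translate))
        # Tail text follows the child but belongs to this (parent) element.
        if translate and child.tail and child.tail.strip():
            chunks.append(child.tail.strip())
    return chunks


print(translatable_text(ET.fromstring(SAMPLE)))
```

Here the key name inside `code` and the product identifier in the second `p` are skipped, while the surrounding sentence text is kept for translation.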
- 12:36:35 [omstefanov]
- omstefanov has joined #mlw
- 12:36:58 [chaals]
- [As Jirka said, you don't have to use the actual ITS namespace to use the ITS pieces - and the decision for widgets was indeed to do that]
- 12:37:05 [Steven]
- Jirka: Now to look at formats that support ITS
- 12:37:14 [Steven]
- ... first DocBook
- 12:38:16 [lbellido]
- lbellido has joined #mlw
- 12:38:22 [Steven]
- [example]
- 12:41:24 [Steven]
- Jirka: Next format, DITA
- 12:41:45 [Steven]
- ... for topic-based documentation
- 12:42:07 [Steven]
- ... DITA doesn't natively support ITS
- 12:42:13 [Steven]
- ... can be added
- 12:42:31 [Steven]
- Jirka: Now OOXML
- 12:42:40 [jan]
- jan has joined #mlw
- 12:43:00 [Steven]
- ... Open Office, and even for MS Office 2007+
- 12:43:23 [Steven]
- ... no native support, but can be added
- 12:44:30 [jan]
- Office Open XML is a MS developed standard, not Open Office... ;-)
- 12:44:43 [Steven]
- Jirka: ODF is similar
- 12:45:07 [Steven]
- Jirka: XHTML allows use of ITS
- 12:45:46 [Steven]
- ... HTML5 has no extension points to allow ITS
- 12:47:02 [Steven]
- ... what is to be done?
- 12:48:09 [Steven]
- ... HTML5 needs to be augmented to support ITS
- 12:50:01 [Steven]
- Dag: MS translator does support something similar
- 12:51:00 [Steven]
- Steven: If XHTML5 supports it, why not just say "Use XML serialization if you want this facility"?
- 12:51:23 [Steven]
- Jirka: Not sure if people can produce well-formed XML
- 12:51:50 [Steven]
- Topic: Chaals - standards for multilingual websites
- 12:52:38 [Jirka]
- Slides from my presentation http://www.kosek.cz/xml/2011mlwpisa/
- 12:52:58 [Steven]
- Chaals: What standards should be developed?
- 12:53:35 [Steven]
- ... there are lots of multilingual sites. Substantial problems
- 12:53:56 [Tomas]
- I am here ... just in case
- 12:54:04 [Steven]
- ... principles - don't break existing stuff
- 12:54:25 [Steven]
- ... expect it to take time
- 12:54:38 [Steven]
- ... two sides of coin: users and webmasters
- 12:54:39 [Tomas]
- Slides - http://dragoman.org/pisa/carrasco-mw-pisa.pdf
- 12:55:07 [Steven]
- Chaals: But it is often less clear-cut
- 12:55:46 [Steven]
- Chaals: Currently - no consistent user interface for a ML website.
- 12:56:22 [Steven]
- ... this should be fixed
- 12:56:25 [Encarna]
- Encarna has joined #mlw
- 12:56:53 [Steven]
- ... No standards for multilingual content production
- 12:57:02 [Steven]
- ... this should be fixed
- 12:57:54 [Tomas]
- No standards for content production - in general - not a particular problem to MWS
- 12:58:02 [Steven]
- Chaals: Most users are monolingual
- 12:58:09 [Steven]
- [Scribe: he claims]
- 12:58:43 [Tomas]
- One needs hard data
- 12:58:46 [Steven]
- Chaals: Webmasters must manage multilingual system
- 12:59:06 [Steven]
- ... users don't want more complexity
- 13:00:21 [Steven]
- ... webmasters aren't necessarily experts in this stuff
- 13:00:56 [Steven]
- ... interfaces for content from the user side are well-established
- 13:01:01 [Steven]
- ... not so for webmasters
- 13:02:33 [Steven]
- Chaals: Some ideas - language button in the browser
- 13:02:48 [Steven]
- ... use HTTP header fields maybe
- 13:03:48 [Steven]
- ... content negotiation
- 13:03:49 [Tomas]
- Another good "high level" variant is memento http://www.mementoweb.org
- 13:04:05 [Steven]
- ... reserved URIs
- 13:04:37 [Steven]
- ... I am not sure if reserved URIs are a good idea
- 13:04:54 [Steven]
- Chaals: It should be possible to request a translation
- 13:05:07 [Steven]
- ... there's an Opera extension for that
- 13:05:35 [Tomas]
- A reserved URI is very good as one can have all the pages in the MWS with the same URI pointing to the variants
- 13:06:20 [Tomas]
- maintaining pages with different URIs for the variants is very hard
- 13:06:38 [Steven]
- ... need a metaresource concept
- 13:06:45 [Steven]
- [Scribe: RDFa!]
- 13:07:46 [Tomas]
- RDF might do it - needs verification
- 13:08:26 [Steven]
- Chaals: Need server-side standards
- 13:09:21 [iantruscott]
- iantruscott has joined #mlw
- 13:09:35 [Steven]
- Scribe: RDFa was largest growing web format last year http://rdfa.info/2011/01/26/rdfa-grows/
- 13:09:58 [Steven]
- Chaals: Next step? Working group maybe
- 13:10:23 [Steven]
- ... at W3C? Elsewhere?
- 13:10:37 [Tomas]
- No WG, not specifications
- 13:10:43 [Steven]
- ... or create a new initiative?
- 13:11:12 [Steven]
- Chaals: Need guides for best practice on user and webmaster sides
- 13:13:44 [r12a]
- scribe: r12a
- 13:14:05 [r12a]
- Topic: Sophie Hurst - Local is global
- 13:14:11 [Tomas]
- A tabular view http://dragoman.org/mws-india.html
- 13:14:35 [Tomas]
- Chaals: you won your beer !!!
- 13:15:06 [r12a]
- Sophie: 90% of HP buy based on content rather than touching the product
- 13:15:24 [r12a]
- s/HP/HP's customers/
- 13:16:14 [r12a]
- Sophie: 42% of web users are from Asia
- 13:16:26 [r12a]
- ... only 13% from USA
- 13:16:38 [r12a]
- ... yet English still leading language
- 13:17:20 [r12a]
- ... asia has highest usage but low penetration
- 13:17:27 [r12a]
- ... therefore it's a growth area
- 13:17:49 [r12a]
- ... 10% retails sales in CHina are done online
- 13:18:00 [r12a]
- s/retails/retail/
- 13:18:15 [r12a]
- s/CH/Ch/
- 13:19:02 [chaals]
- [My concern with reserved URIs is that it breaks some existing standards and expectations. I think HTTP headers and metadata are better approaches. (I generally hate reserved URIs - they are used in P3P, favicons, robots.txt and a couple of other places, but I don't think they're going to handle the complexity of multilingual websites without creating as many problems as they solve...)]
- 13:19:04 [r12a]
- Sophie: How to represent brand consistently, locally
- 13:19:12 [r12a]
- ... how to make it relevant
- 13:19:23 [chaals]
- [I certainly think that being able to get the information about available variants is really important]
- 13:19:29 [r12a]
- ... how to manage translation
- 13:20:05 [r12a]
- Sophie: First is to use component based system
- 13:21:10 [Jirka]
- chaals: yes, but it might be sufficient to have a link/http header pointing to another URL where a manifest listing all possible variants will be sitting, rather than to have dozens of alternatives in each page -- too much change when a new translation is added
- 13:21:17 [r12a]
- ... synchronisation between components is then eassy to manage
- 13:21:23 [r12a]
- s/ss/s
- 13:21:54 [r12a]
- ... allows local components, but global style
- 13:22:17 [r12a]
- ... eg Emirates site
- 13:23:18 [r12a]
- Sophie: Use positioning information to personalise information
- 13:23:54 [r12a]
- ... example, Lux brand which is up-market in India, but not elsewhere
- 13:24:21 [r12a]
- ... need local input to ensure local nuances are working
- 13:25:22 [r12a]
- ... users come with cultural layers as well
- 13:25:38 [r12a]
- ... cultures vary in many dimensions
- 13:26:39 [r12a]
- Sophie: Finally, managing content
- 13:26:57 [r12a]
- ... need a well-managed process
- 13:27:06 [Tomas]
- [The browser side is much better, but we have to care for the server side. This is the question: how to implement the server side. Separate function from the mechanism: we can explore different mechanisms. One fixed reserved URI for the whole server, combined with the Referer header, will certainly resolve a big problem (different URIs for each page).]
- 13:27:30 [r12a]
- ... can be automated to large extent (the management, not the translation)
- 13:28:10 [r12a]
- [shows an example process]
- 13:29:30 [r12a]
- Sophie: In conclusion, translation must be part of a larger picture
- 13:29:55 [r12a]
- ... use component, geo-positioning, and translation management
- 13:30:09 [r12a]
- [Q&A session]
- 13:30:12 [Tomas]
- Question of scope: what should be in MWS and what in other specifications for full translation system.
- 13:30:51 [Tomas]
- The picture is larger: Authorship, Translation and Publishing Chain
- 13:31:51 [Tomas]
- Translation is only part of the whole production chain
- 13:31:59 [r12a]
- Christian Lieske: For Chaals- I got different messages - we've got to do stuff, but Sophie seems to suggest we can already do it.
- 13:32:40 [r12a]
- Chaals: It's not that we can't do it already, but that there is no agreed way to do it
- 13:33:59 [Tomas]
- We need to define the different scopes and how the different fields integrate; a MWS is *not* a translation management system.
- 13:34:30 [r12a]
- Chaals: We have no ineroperability
- 13:34:38 [r12a]
- s/iner/inter/
- 13:34:46 [Tomas]
- You wont another beer !!!
- 13:35:13 [r12a]
- Sophie: Changing solutions is hard, standards could help
- 13:35:31 [Tomas]
- We need to identify what is particular to MWS and what is general.
- 13:36:15 [r12a]
- Sophie: We should work towards a position where you need fewer developers
- 13:36:44 [Tomas]
- Language is just one of the dimensions in TCN; e.g., mementos should be integrated in the same mechanism http://www.mementoweb.org/
- 13:37:42 [r12a]
- Dag: We have a translation tag, but it is not standard, so there is less customer value, in the long run a standard lowers the cost of entry for us
- 13:37:45 [Tomas]
- +1 regarding developers: one should be able to construct a MWS from Apache out of the box
- 13:37:57 [Andrea]
- Andrea has joined #mlw
- 13:38:15 [r12a]
- Tomas Abramovitch: Do you use different CSS for different cultures?
- 13:38:30 [r12a]
- ... and how accurate is geo-location?
- 13:39:07 [Tomas]
- One could (CSS)
- 13:39:10 [r12a]
- Dag: We componentise our pages, the local part is not done by CSS
- 13:39:45 [r12a]
- Sophie: I can't totally answer the geo-loc part.
- 13:40:28 [r12a]
- Chaals: It is a spectrum from one person to just someone in a country
- 13:40:28 [Tomas]
- One could generate some pages: "5.3. Generating language in parallel" in http://dragoman.org/mws/oamws.pdf
- 13:41:15 [chaals]
- s/one person/identifying one seat in an audience/
- 13:42:22 [r12a]
- Ian Truscott: identifying people is always a guess until they log in
- 13:42:55 [Tomas]
- Or he set his browser preferences
- 13:43:27 [r12a]
- Reinhard: How do we learn from research? No one has mentioned this
- 13:43:48 [r12a]
- ... different people like different things
- 13:44:08 [r12a]
- ... 16 year olds in China have more in common with 16 year olds in the USA than with their parents
- 13:44:35 [r12a]
- ... all I've heard is corporate policy. Why not let the user decide?
- 13:45:01 [Tomas]
- A user wants the page in his language
- 13:45:10 [r12a]
- Sophie: Crowd sourcing is an option
- 13:45:39 [Tomas]
- Choosing is already a hurdle
- 13:47:33 [Tomas]
- We need to look at all the available mechanisms and decide on a recommendation: "4.4. Options" in http://dragoman.org/mws/oamws.pdf
- 13:50:00 [r12a]
- Dag: There are areas where our interest and the users' coincide
- 13:50:22 [r12a]
- ... but we can't do translation on demand
- 13:50:38 [r12a]
- ... they pay for premium product
- 13:51:05 [chaals]
- [It isn't always a guess identifying the user until they log in. In fact, technically it is often easy to identify users anyway - this is why we have laws to protect privacy and limit the things done to make it easy]
- 13:54:04 [r12a]
- Steven: A good example of Reinhard's point is websites that conflate region with language. I often don't know which question they are asking.
- 13:54:38 [r12a]
- ... and I don't believe that most people are monolingual. There are 6000 languages, and 150 countries. Most people are at least bilingual
- 13:55:06 [r12a]
- [scribe's computer is nearly out of battery]
- 13:55:26 [Tomas]
- [we need to identify what the user wants, not who he is]
- 13:55:36 [r12a]
- Reinhard: Corwdsourcing translation is often not possible because of copyright issues
- 13:55:54 [r12a]
- s/Cor/Cro/
- 13:57:12 [r12a]
- Olaf: We need the possibility to offer translations of parts of sites
- 13:57:38 [r12a]
- ... it works on wikipedia
- 13:57:52 [Tomas]
- Monolingual user: we need hard data; but circumstantial data suggest that the requirement of most users is monolingual.
- 13:58:01 [r12a]
- ... microsoft needs to open its translation tools
- 13:58:48 [r12a]
- Chaals: I use crowdsourced translation of Norwegian law
- 13:59:31 [r12a]
- ... it is easy to do, but by and large it doesn't happen
- 13:59:45 [r12a]
- ... too little reward
- 14:01:41 [Tomas]
- Translation integration in MWS: a language not available could be defined as a "language potentially available" (after translation). One needs a mechanism covering all the aspects of the different translation techniques: human (professional, crowd), machine (fast as RBMT or slow as SMT).
- 14:02:30 [karl]
- karl has joined #mlw
- 14:04:14 [Tomas]
- For the whole enchilada: "Open architecture for multilingual parallel texts" http://arxiv.org/ftp/arxiv/papers/0808/0808.3889.pdf
- 14:06:05 [Tomas]
- OK: I go for coffee.
- 14:30:02 [Jirka]
- Jirka has joined #mlw
- 14:42:22 [fsasaki]
- fsasaki has joined #mlw
- 14:43:23 [fsasaki]
- topic: presentation from Christian Lieske et al.
- 14:44:03 [lbellido]
- lbellido has joined #mlw
- 14:45:10 [fsasaki]
- christian: five areas show that there is a need for change:
- 14:45:39 [fsasaki]
- .. demand for language related services, shortcomings of today's translation-related standards, ...
- 14:45:57 [fsasaki]
- .. why talking about standards: demand & lack of interoperability
- 14:47:02 [fsasaki]
- .. lack of interoperability e.g. for XLIFF
- 14:47:46 [fsasaki]
- .. things break down across tool chains
- 14:48:15 [fsasaki]
- .. standards in localization area are sometimes not compatible
- 14:48:26 [fsasaki]
- .. example of phrases in TMX vs phrases in XLIFF
- 14:49:16 [fsasaki]
- christian: not a lot of work in localization standardization on integrating new web technologies
- 14:49:43 [fsasaki]
- .. e.g. aspect of RESTful services, use of related protocols (odata, gdata) for translation related services
- 14:50:11 [fsasaki]
- .. these problems have led to implementation challenges, problems for standards that are already here
- 14:50:47 [fsasaki]
- .. how to solve the problems: four areas of requirements, methodology, compliance , stewardship are important
- 14:51:44 [fsasaki]
- .. requirements: identify processing areas related to language processing - and keep them separated
- 14:52:00 [fsasaki]
- .. determine the entities that are needed in each area
- 14:52:24 [fsasaki]
- .. chart technology options and needs
- 14:52:48 [fsasaki]
- ... etc. Next: methodology:
- 14:53:12 [fsasaki]
- .. distinguish between models and implementation / serialization
- 14:53:13 [omstefanov]
- omstefanov has joined #mlw
- 14:53:41 [fsasaki]
- .. distinguish between entities without context and entities with business / processing context
- 14:53:53 [fsasaki]
- .. set up rules to transform data models into syntax
- 14:54:15 [fsasaki]
- .. set up flexible registries, e.g. CLDR, IANA
- 14:54:33 [fsasaki]
- .. provide migration paths / mapping mechaisms for legacy data
- 14:54:52 [fsasaki]
- s/mechaisms/mechanisms/
- 14:55:01 [RRSAgent]
- I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki
- 14:55:39 [fsasaki]
- .. third, compliance: e.g. what does "support for standard X" mean?
- 14:56:50 [fsasaki]
- .. finally, stewardship: driving, supporting standardization activity
- 14:57:07 [fsasaki]
- .. anyone who shouts for small standards should be willing to invest
- 14:57:25 [fsasaki]
- .. EC has a track record, see e.g. mlw project
- 14:57:37 [fsasaki]
- .. make donations / contributions easy
- 14:57:50 [fsasaki]
- .. discourage fragmentation and unclear roles
- 14:58:20 [fsasaki]
- .. LISA no longer exists, now there is a kind of competition over who could follow in its footsteps
- 14:58:53 [fsasaki]
- .. my fear is that another organization is being created, my and probably Felix' and Yves' thought is that this should be avoided
- 14:59:36 [fsasaki]
- topic: David Filip on "Multilingual transformations on the web via XLIFF current and via XLIFF next"
- 14:59:52 [fsasaki]
- .. christian has covered a lot for XLIFF 2.0 - what do I want to cover?
- 15:01:10 [fsasaki]
- s/.. christian/david: christian/
- 15:02:12 [fsasaki]
- david: my main statements: metadata must survive language transformations, content metadata must be designed upfront with the transformation process in mind, XLIFF is the principal vehicle for critical metadata throughout multilingual transformations
- 15:02:51 [fsasaki]
- .. and finally: next generation XLIFF standard is an exciting work in progress in OASIS TC
- 15:03:06 [RRSAgent]
- I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki
- 15:03:20 [cchiavet]
- cchiavet has joined #mlw
- 15:03:49 [fsasaki]
- .. about preserving metadata: there are various transformations: g11n, i18n, l10n, t9n ("GILT")
- 15:04:01 [fsasaki]
- .. transformation modi: manual, automated, assisted
- 15:04:37 [fsasaki]
- .. transformation types: MT, human translation, postediting, stylistic review, tagging (semantic, subject matter review, transcribing), subtitling, ...
- 15:04:47 [fsasaki]
- .. growing number of source languages
- 15:04:54 [RRSAgent]
- I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki
- 15:05:54 [Steven_]
- Steven_ has joined #mlw
- 15:06:26 [fsasaki]
- david: what metadata is necessary?
- 15:06:34 [fsasaki]
- .. preview and context are critical
- 15:06:54 [fsasaki]
- .. argue for creating standardized XSLT artefacts for preview
- 15:07:12 [fsasaki]
- .. metadata for legally conscious sharing (ownership, licensing, ...)
- 15:08:44 [fsasaki]
- .. grammatical, syntactic, morphological and lexical metadata
- 15:09:18 [fsasaki]
- .. example of m4loc project: they developed an XLIFF middleware to ensure interop between localization open source tool and moses MT tool
- 15:10:15 [fsasaki]
- .. tagging of culturally and legally targeted information
- 15:11:07 [fsasaki]
- .. home for LT standardization? Leverage BP of existing loc standards (XLIFF, TBX, SRX, ...) - pointing into the past (OASIS, LISA)
- 15:12:13 [fsasaki]
- .. now: leverage OASIS XLIFF, ISO TC37, Unicode SRX and GMX
- 15:12:48 [fsasaki]
- .. further development of W3C ITS and RDF, create conscious standardization including RDF and XLIFF
- 15:13:13 [fsasaki]
- david: OASIS is home of XLIFF, but has also UBL and XBL as its home
- 15:13:31 [fsasaki]
- .. W3C has ITS and RDF modeling, Unicode - see above
- 15:13:48 [fsasaki]
- .. ISO TC 37, important not for standards creation but for secondary publishing
- 15:14:30 [fsasaki]
- .. why XLIFF?, and why 2.0? see also presentation from christian
- 15:15:21 [fsasaki]
- .. good progress of XLIFF in 2011 possible, as SWOT analysis shows
- 15:15:40 [fsasaki]
- .. prediction: 2011 will see definition of new features, in 2012 new standard
- 15:16:13 [fsasaki]
- topic: presentation from Sven C. Andrä
- 15:16:19 [RRSAgent]
- I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki
- 15:17:19 [fsasaki]
- sven: kilgray, "we localize", Andrä, biobloom are behind the "interoperability now!" initiative
- 15:17:36 [Steven_]
- i/christian: five/scribenick: fsasaki
- 15:17:41 [fsasaki]
- .. translation (technology) industry is a niche industry
- 15:17:42 [Steven_]
- rrsagent, make minutes
- 15:17:42 [RRSAgent]
- I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html Steven_
- 15:18:03 [fsasaki]
- .. very few computer scientists here, not a technical, but experience driven industry
- 15:18:16 [fsasaki]
- .. industry is getting more and more important, including technology
- 15:18:23 [fsasaki]
- .. hence interop is getting more important
- 15:18:39 [fsasaki]
- .. there are enough standards here, but they are complex, not many have reference implementations
- 15:18:47 [fsasaki]
- .. and there is little exchange within tool providers
- 15:19:28 [fsasaki]
- table of features in XLIFF that are supported by all tools - only two features (from about 50?) are supported by all tools
- 15:20:01 [fsasaki]
- sven: we want lossless data exchange in a mixed (tool) environment
- 15:20:30 [fsasaki]
- .. standards are important, also develo
- 15:20:45 [fsasaki]
- s/develo/further development of XLIFF/
- 15:21:00 [fsasaki]
- .. but mindset is most important, i.e. about the lossless data exchange
- 15:21:13 [fsasaki]
- .. basis of our work: "interoperability manifesto"
- 15:22:34 [fsasaki]
- .. pushing standards over the edge, give feedback to the TC
- 15:22:56 [fsasaki]
- .. modules that we are working on: about content, package, transportation
- 15:23:06 [fsasaki]
- .. content is modified xliff
- 15:23:16 [fsasaki]
- .. package is currently just made up
- 15:23:27 [fsasaki]
- .. for transportation we are using regular web services
- 15:23:48 [fsasaki]
- .. basic approach: disclose our concepts
- 15:24:27 [fsasaki]
- .. reference implementations are open source
- 15:25:00 [fsasaki]
- .. early real life usage
- 15:25:52 [fsasaki]
- .. test scenarios to verify compliance
- 15:26:48 [fsasaki]
- .. theoretical aspect: agile vs. standard?
- 15:27:30 [fsasaki]
- .. would be good to have a framework for organizations like W3C that could help us to bring this into standardization step by step
- 15:28:17 [fsasaki]
- .. benefits of this approach: it is a limited time that we are working on this
- 15:29:12 [fsasaki]
- topic: presentation from Eliott Nedas
- 15:29:18 [RRSAgent]
- I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki
- 15:30:15 [tadej]
- tadej has joined #mlw
- 15:34:27 [tadej]
- David Grunwald, GTS
- 15:34:49 [fsasaki]
- s/Eliott Nedas/David Grunwald/
- 15:35:14 [fsasaki]
- david: our vision: have a box that creates quality content very quickly and cheaply
- 15:35:39 [fsasaki]
- .. using MT, we want an efficient solution that will make mlw a reality
- 15:36:04 [fsasaki]
- .. need to develop MT which is good for blog publishing
- 15:36:16 [fsasaki]
- .. MT will never be ready "as is" for human quality translation
- 15:36:27 [fsasaki]
- .. we developed a system for cheap and quick post editing
- 15:36:52 [fsasaki]
- .. currently, explosion of content, lots of it is local because of language barriers
- 15:37:03 [fsasaki]
- .. translation costs are very high
- 15:37:58 [fsasaki]
- .. we are targeting open source CMS platforms
- 15:38:10 [fsasaki]
- .. 20 % of web sites are published on such platforms
- 15:38:28 [fsasaki]
- .. we could offer a good translation solution to these
- 15:38:41 [fsasaki]
- .. large media publishers who use open source CMS
- 15:38:58 [fsasaki]
- .. wordpress, movable type are created for all kinds of web sites, not only blogs
- 15:39:12 [fsasaki]
- .. our solution: based on MT; human post editing, and crowd sourcing
- 15:39:21 [RRSAgent]
- I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki
- 15:40:03 [fsasaki]
- david: crowdsourcing startups in many regions
- 15:40:44 [fsasaki]
- .. our solution: not automated open source CMS solution for small guys
- 15:40:54 [fsasaki]
- .. no automated tools for post editing / MT either
- 15:42:06 [fsasaki]
- .. our solution uses data from blogs that is available on the web
- 15:43:55 [fsasaki]
- .. workflow: user installs wordpress, MT is done, email notification is sent to crowdsourcing translators, integrated after review by a moderator
- 15:45:55 [fsasaki]
- .. interested in opportunities for funding this kind of work
- 15:46:11 [fsasaki]
- topic: presentation from Pål Nes
- 15:46:20 [RRSAgent]
- I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki
- 15:46:42 [fsasaki]
- Pål: opera has been using crowd sourcing for a long time
- 15:47:32 [fsasaki]
- .. caveat of crowd sourcing: it is not free, organizing it is difficult
- 15:47:45 [fsasaki]
- .. e.g. employing managers for the crowd
- 15:48:18 [jan]
- jan has joined #mlw
- 15:48:50 [fsasaki]
- .. should only be used for certain tasks
- 15:49:01 [fsasaki]
- .. not for time critical tasks
- 15:49:27 [fsasaki]
- .. mostly students are participating, picked up from university talks
- 15:49:34 [tadej1]
- tadej1 has joined #mlw
- 15:49:43 [fsasaki]
- .. large crowd is not necessarily a good crowd
- 15:49:59 [fsasaki]
- .. better 3,4,5 good translators, than 50 translators doing nothing
- 15:51:09 [fsasaki]
- .. e.g. press releases, marketing material are not well suited for crowd translations
- 15:51:29 [fsasaki]
- .. good for crowd sourcing: applications (web site "my opera", "opera com"), with a stable set of text
- 15:51:38 [fsasaki]
- .. and documentation, that is easy to maintain
- 15:52:28 [fsasaki]
- .. start small, put your crowd under embargo / NDA
- 15:52:54 [r12a]
- r12a has joined #mlw
- 15:53:37 [fsasaki]
- .. try building up a hierarchy
- 15:53:47 [fsasaki]
- .. be careful with your branding
- 15:53:58 [fsasaki]
- .. and your terminology
- 15:56:06 [fsasaki]
- .. for opera we used XLIFF - we used our own, incompatible version of XLIFF
- 15:56:19 [fsasaki]
- .. discovered that open source is not open standard
- 15:57:41 [fsasaki]
- .. tools we used: gettext and po4a, transifex, translate toolkit with pootle and virtaal, homebrew applications to bridge the vast gaps
- 15:58:14 [fsasaki]
- .. XLIFF is a minefield, in the current version
- 15:58:40 [fsasaki]
- .. about html: keep it as simple as possible, semantic markup is key
- 15:59:03 [fsasaki]
- .. write proper CSS - write a separate RTL - stylesheet to negate RTL-challenged CSS
- 16:00:02 [fsasaki]
- topic: presentation from Eliott Nedas
- 16:00:10 [RRSAgent]
- I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki
- 16:05:10 [fsasaki]
- eliott: everything that was said from David, Christian etc. in this session about interoperability was right, I concur with them
- 16:05:47 [fsasaki]
- .. we need standards because of interdependence.
- 16:06:23 [fsasaki]
- .. the demise of LISA. Sad that they are gone, but opportunity to look into this in a new way
- 16:06:32 [fsasaki]
- .. LISA standards are important
- 16:06:53 [fsasaki]
- .. now is a good opportunity for a new model of standardization
- 16:07:03 [fsasaki]
- .. new kids on the block: TAUS and Gala
- 16:07:50 [fsasaki]
- .. currently lots of different technologies
- 16:09:23 [fsasaki]
- .. and many different standards
- 16:09:34 [fsasaki]
- .. OAXAL is a solution that brings these together
- 16:09:50 [fsasaki]
- .. that can be used for free
- 16:11:18 [fsasaki]
- description of various aspects of standards and applications built on top of it
- 16:12:41 [fsasaki]
- eliott: how to spread the message: important e.g. in academic curricula
- 16:13:26 [fsasaki]
- topic: presentation from Manuel Herranz
- 16:14:02 [fsasaki]
- manuel: presentation about PangeaMT project
- 16:14:30 [Andrea]
- Andrea has joined #mlw
- 16:14:42 [RRSAgent]
- I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki
- 16:15:03 [fsasaki]
- manuel: translation is something that you have to go through for achieving what you want
- 16:15:32 [fsasaki]
- .. web people except immediate translation
- 16:15:42 [fsasaki]
- s/except/expect/
- 16:16:11 [fsasaki]
- .. why don't we have immediate translations?
- 16:17:33 [fsasaki]
- .. introducing pangeanic: LSP, major clients in Asia and Europe
- 16:17:48 [fsasaki]
- .. we wanted to provide faster service for translation
- 16:17:54 [fsasaki]
- .. became founding member of TAUS
- 16:19:17 [tadej]
- tadej has joined #mlw
- 16:19:20 [fsasaki]
- .. four years ago created relation with computer science institute in valencia
- 16:19:42 [fsasaki]
- .. challenge at that time: turn academic development (moses) into a commercial application
- 16:20:17 [fsasaki]
- .. limitations: plain text, language model building (first), no recording, no update feature, data availability, ...
- 16:21:08 [fsasaki]
- .. objectives: provide high quality MT for post editing
- 16:21:19 [fsasaki]
- .. and to use only open standards: XLIFF, tmx, xml
- 16:21:34 [RRSAgent]
- I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki
- 16:22:10 [fsasaki]
- .. built a TMX - XLIFF workflow
- 16:22:39 [fsasaki]
- .. not to be locked into a solution
- 16:23:35 [fsasaki]
- .. PangeaMT system: comes as TMX or as XLIFF
- 16:23:51 [fsasaki]
- .. TMX should not die, people are still using it
- 16:24:17 [lbellido]
- lbellido has joined #mlw
- 16:25:07 [fsasaki]
- .. future work: on the fly MT training
- 16:25:15 [fsasaki]
- .. pick and match sets of data
- 16:25:23 [fsasaki]
- .. objective stats for post-editors
- 16:25:28 [fsasaki]
- .. confidence scores for users
- 16:26:50 [fsasaki]
- topic: Q/A for localizers
- 16:27:19 [fsasaki]
- reinhard: thank you, was a great session
- 16:28:26 [fsasaki]
- .. about remarks on crowd sourcing: there was emphasis on crowd sourcing for enterprise
- 16:28:36 [fsasaki]
- .. this does not go well together
- 16:29:29 [fsasaki]
- .. other people like the rosetta foundation, translators without borders etc. have had good experiences
- 16:29:59 [fsasaki]
- Pål: crowd sourcing was good for us
- 16:30:11 [fsasaki]
- .. it just took us a lot of effort and time to get there
- 16:31:18 [fsasaki]
- jörg: there is some similarity: you have to train translators, otherwise you won't get the good results in medical translation
- 16:33:02 [fsasaki]
- felix: one comment on interop now, it is very important to go into a standards body as a next step
- 16:33:15 [fsasaki]
- sven: thanks, we will definitely try to do that
- 16:33:46 [fsasaki]
- richard: w3c just created business groups / community groups, that might be a thing for you to look into
- 16:33:56 [fsasaki]
- david: about what reinhard said
- 16:34:20 [fsasaki]
- .. if your expectation is high you will be disappointed, but the business case is in the future
- 16:34:43 [fsasaki]
- topic: wrap up
- 16:35:02 [fsasaki]
- richard: see you tomorrow, speakers please show up at 8:30 tomorrow
- 16:35:07 [RRSAgent]
- I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki
- 16:40:55 [tadej]
- tadej has left #mlw
- 17:00:38 [asgeirf]
- asgeirf has joined #mlw