07:13:17 RRSAgent has joined #mlw 07:13:17 logging to http://www.w3.org/2011/04/04-mlw-irc 07:13:24 meeting: MLW workshop, PISA 07:13:27 chair: richard 07:13:32 scribe: felix 07:13:44 topic: introduction 07:13:52 Jirka has joined #mlw 07:13:56 Richard introduces the project and the workshop 07:14:12 2nd of 4 MultilingualWeb conferences 07:14:41 Goal is to facilitate cross-pollination across different areas, so don't tune out if it's not your specialty! 07:18:56 tadej has joined #mlw 07:20:27 mpo has joined #mlw 07:20:38 chaals has joined #mlw 07:21:14 rrsagent, draft minutes 07:21:15 I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html chaals 07:21:36 rrsagent, make log public 07:22:51 lbellido has joined #mlw 07:24:16 topic: Presentation from Domenico Laforenza on "The Italian approach to Internationalized Domain Names" 07:29:19 tadej1 has joined #mlw 07:30:22 tadej1 has left #mlw 07:30:26 tadej1 has joined #mlw 07:31:13 Domenico describes the mechanisms behind IDN, domain names in general, the usage of the internet 07:31:47 r12a has joined #mlw 07:33:18 Domenico describes what is possible with IDN, compared to domain names in general 07:33:59 iantruscott has joined #mlw 07:36:10 Domenico describes how the punycode translation helps to use IDN, while keeping the underlying domain name system as is 07:39:38 tadej has joined #mlw 07:40:40 tadej has left #mlw 07:40:47 tadej has joined #mlw 07:45:33 lbellido has joined #mlw 07:46:56 fsasaki has joined #mlw 07:47:26 q+ to ask about how users will distinguish papa.it and papá.it 07:47:33 Zakim has joined #mlw 07:47:38 q+ to ask about how users will distinguish papa.it and papá.it 07:48:01 topic: presentation from oreste signore on "web for all" 07:48:08 I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki 07:50:09 [webfonts is actually really important for some places ... 
] 07:50:14 oreste is showing various areas that need more work to create "a web for all", e.g. in the area of accessibility, multilinguality etc. 07:50:53 oreste describes WCAG 2.0 07:51:01 I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki 07:51:59 oreste: issues of multilingual web: encoding, colors, navigation, ... 07:52:12 I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki 07:52:49 q- 07:54:18 PBS has joined #mlw 07:54:27 q? 07:56:42 oreste describes the role of W3C offices, translations, W3C I18N Activity etc. as important means to push the multilingual web 07:57:08 I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki 07:58:55 topic: Presentation from Kimmo Rossi 07:59:51 Kimmo: I am project officer for mlw project 08:00:45 .. I am very happy about the enthusiasm in this project. It is very small in terms of budget, but it is very successful 08:01:22 .. mlw has also been very successful in using social media 08:01:39 .. looking forward to seeing the next steps including the review which is coming up 08:02:11 .. mlw has been a wonderful forum for gathering new ideas, to understand how much fragmentation still exists 08:02:26 .. now it is time to become operational, to start to put ideas into practice 08:02:47 .. I expect that this project will come up with good recommendations: what needs to be done, why, who could do it? 08:03:01 .. we have to create operational working links to other European projects 08:03:20 .. mid 2015 we will have about 50 ongoing projects in the area of multilingual technologies 08:03:39 .. we started creating these links, i.e. we have speakers from several European projects 08:03:56 .. please look into these other initiatives and see what we can do together 08:04:19 .. we started funding language technology 2 years ago - we are reaching a plateau 08:04:47 .. we just evaluated 90 proposals, asking 240 mill.
Euros, we only have 50 mill. Euros 08:04:59 .. we can only select one of five projects 08:05:20 .. there is still one more call coming up for SME: 35 mill. Euro for sharing data / language resources 08:05:35 .. there are still three weeks to put in a proposal 08:05:41 I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki 08:06:17 kimmo: once SME call is over, we will have about 50 projects 08:06:49 .. we spent 150.000 Euros to fund a survey, interviewing many people in European states 08:07:03 .. asking about language use while being online 08:07:24 .. results will soon be public on our web site and the Eurobarometer web site 08:07:37 .. results are that use of other languages is mostly passive 08:07:56 .. when people write and engage in social networking, they prefer to use their own language 08:08:17 .. 44% said: they are missing important information because they don't understand the language used 08:08:39 .. thank you, have a successful conference 08:08:49 I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki 08:09:10 topic: presentation from ralf steinberger 08:09:54 ralf: talking about attempts to give access to information across languages 08:10:15 .. monitoring news in 50 languages 08:10:29 I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki 08:12:23 ralf introduces JRC 08:14:12 ralf describes the news sources used for "media monitoring": 100.000 news articles gathered per day, in 50 languages 08:14:48 ralf: articles are converted into rss for further processing 08:16:48 ralf gives examples of news coverage: news is not always available in English, but sometimes more is available in other languages 08:18:38 ralf: we also find out co-occurrences: who or what is mentioned with whom or what in different languages? 08:19:25 ..
also analysing quotation networks: who gets mentioned by whom, also different depending on the language 08:19:37 I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki 08:19:46 q? 08:20:12 ralf: recognition of entities (mostly persons) in about 20 languages 08:22:35 ralf: multilingual categorization, using about 1000 categories, using boolean search word operations, optional weights of words, co-occurrence and distance of words, regular expressions for inflection forms (not only morphological) 08:24:11 .. multilingual categorization in general and specific for medicine in the medisys system 08:25:29 .. classifying countries and category, e.g. there is 1/2 article about tuberculosis in Czech, but if suddenly it is 5 articles a day, we can issue an alert 08:27:08 j has joined #mlw 08:27:51 j has left #mlw 08:28:32 j has joined #mlw 08:28:43 I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki 08:29:22 ralf introducing news explorer - multilingual news daily overview 08:36:08 ralf: application about multilingual template filling - NEXUS, extracting structured information about events 08:36:21 .. focusing on conflicts, crimes, disasters, ... 08:36:40 .. want to know if there is a disaster with the need to send aid etc. 08:41:27 I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki 08:44:03 ralf: summarizing: have demonstrated our EMM system, technologies being used, application scenarios 08:44:55 ..
modest attempts to get access across languages, but users appreciate it and it shows that the Web is not only for English 08:45:11 topic: Q/A for welcome session 08:45:18 chaals, you wanted to ask about how users will distinguish papa.it and papá.it 08:46:49 domenico: punycode translation of papa.it and papá.it is different, so sure, yes 08:47:36 XYZ: question about nexus: if one newspaper says "person X is a freedom fighter" and another says "person X is a terrorist", how do you deal with this? 08:48:12 ralf: there is political analysis being done, but categorization like the above is normally not being done 08:48:26 .. system is publicly accessible via our home page 08:48:50 richard: now break 08:48:58 tadej has left #mlw 08:49:00 I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki 09:18:06 Jirka has joined #mlw 09:18:21 scribe Jirka 09:18:37 I'm logging. I don't understand 'scribe Jirka', Jirka. Try /msg RRSAgent help 09:20:43 scribenick: Jirka 09:21:22 Adriane Rinsche opens Developer session 09:22:24 topic: "Multilingual forms and applications" by Steven Pemberton 09:23:20 Steven talks about HTTP content negotiation 09:23:43 fsasaki has joined #mlw 09:23:48 Tomas has joined #mlw 09:24:26 tadej has joined #mlw 09:25:08 PBS has joined #mlw 09:25:31 Steven shows some examples of content negotiation 09:26:01 Steven talks about the possibility of providing better 404 error pages 09:27:16 ... and 406 pages 09:28:07 ... some servers like www.google.com ignore content negotiation headers 09:29:13 ... and try to guess your location based on your IP address 09:29:20 Most do. The general problem is Multilingual Web Sites (MWS). 09:30:36 ... another approach is to have a button for changing language on the web page itself 09:31:42 ...
some sites even use Javascript to change content inside the page 09:32:31 After summarizing some bad practices in serving multilingual websites Steven now introduces XForms 09:32:55 I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html Jirka 09:35:19 XForms separate data and presentation. Steven shows this with an example of a simple form 09:35:44 luke has joined #mlw 09:36:42 ... XForms can contain calculations 09:37:03 ... controls are abstract and can get different styling easily 09:37:28 ... it's possible to use different datasources 09:37:36 I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html Jirka 09:39:22 Steven shows a form which can dynamically change labels for form fields based on the selected language for the form 09:42:23 ... XForms use a declarative approach which requires much less work to produce 09:43:44 ... conclusion - XForms allow using "language stylesheets" to create multilingual forms even if this wasn't the original goal for XForms 09:44:16 It is in my presentation this afternoon. An overview http://dragoman.org/mws-india.html 09:45:00 topic: "Lessons from standardizing i18n aspects of packaged web applications" by Charles McCathieNevile 09:45:22 Chaals introduces Widgets technology 09:46:37 ... history of Widgets development and standardization in W3C 09:46:53 ... Widgets are now split into 7 specifications 09:47:32 PBS has joined #mlw 09:47:46 Chaals shows the source of a simple Widget 09:48:18 kimmo has joined #mlw 09:49:10 ... describes l10n features of Widgets 09:49:20 I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki 09:52:22 ... Widgets use xml:lang and for larger resources a separate language-specific directory can be used 09:53:42 Steven has joined #mlw 09:54:42 ... Widgets do not use ITS because namespaces are too hard for some web developers, instead a few specific attributes and elements were adopted (span, dir, xml:lang) 09:55:22 ...
Opera extensions are based on Widgets 09:56:13 ... l10n is hard, you should get advice and do proper testing 09:59:41 topic: "HTML5 proposed markup changes related to internationalization" by Richard Ishida 10:00:34 r12a has left #mlw 10:00:45 Richard closes his IRC client 10:01:02 Steven has joined #mlw 10:01:18 Richard tries to explain what HTML5 means 10:01:38 ... Richard will talk only about the HTML5 specification 10:01:40 I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki 10:02:11 ... not about related things like CSS3, new Javascript APIs, ... 10:02:33 ... HTML5 endorses utf-8 encoding 10:03:17 ... simplified encoding declaration 10:03:56 ... polyglot documents are both XML and HTML5 (HTML syntax) documents, use utf-8, no XML declaration 10:04:28 Actually, XHTML 1.0 had the same thing, but didn't call it "Polyglot" 10:05:23 But it was addressing the same problem 10:07:34 ... charset attribute was removed from link and a elements 10:08:40 ... language declaration can use lang attribute or content-language HTTP header 10:09:05 ... content-language can contain more than one language 10:10:00 ... content-language was just recently removed from HTML5 draft 10:10:26 Richard now explains Ruby 10:11:03 [Ruby was very common in western medieval texts, where greek, latin, hebrew etc would be mixed. E.g. religious texts, and scholarly documents] 10:11:18 Yes, Chaals, it is very useful for other things than Ruby; pity they called it Ruby mark up, since it is more than that 10:11:22 ... HTML5 has support for Ruby, but uses slightly different markup than XHTML 1.1 or ITS (missing rb element for base text) 10:11:54 ... Bidi support 10:12:58 PBS_ has joined #mlw 10:13:00 ... HTML5 adds bdi element for bidi isolation 10:13:36 ...
dir="auto" allows a run-time decision about directionality 10:15:10 I sent a last call comment to the ruby WG, saying they should call it something more generic, but they declined "because Microsoft had already implemented it" 10:15:29 ... Richard invites all to get involved in spec development 10:16:06 topic: "Internationalization (or the lack of it) in current browsers" by Gunnar Bittersmann 10:16:51 I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki 10:17:54 Gunnar talks about some problems in HTML5 10:18:25 ... valdation of email input type field is too restrictive in spec - doesn't support IDN 10:19:34 s/valdation/validation/ 10:19:52 I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html Jirka 10:21:43 ... each browser provides a different UI for changing preferred language 10:22:05 ... some browsers have bugs in this 10:22:35 r12a has joined #mlw 10:23:39 Some browsers have bugs, but some do it completely wrong :-) 10:25:12 ... language negotiation is missing some features 10:25:25 ... how to label original and translation 10:25:34 ... how to label human and machine translation 10:26:04 topic: "What's Next in Multilinguality, Web News & Social Media Standardization?" by Jochen Leidner 10:26:08 I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html Jirka 10:28:21 Jochen shows mind map of presentation 10:28:41 ... presents details about Thomson Reuters company 10:32:00 ... customers require high quality 10:32:18 ... combination of human and automatic methods is in use 10:32:30 ... XML and Unicode are heavily used 10:33:10 ... main issue is not lack of standards but developer education 10:33:48 ... i18n and l10n are not part of the curriculum 10:38:11 ... new challenges are support for multimedia content 10:38:29 ... some content is hidden (Facebook, Twitter, ...) 10:39:38 I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki 10:41:15 ...
proposes a more open twitter-like messaging system with better support for i18n 10:42:19 omstefanov has joined #mlw 10:42:21 ... it might be useful to have an HTML tag saying that a page is a translation of a different page 10:42:40 topic: Q&A session 10:44:27 Question from Google: Defends current state of affairs regarding language selection. Asks whether easier UI will help? 10:45:09 I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki 10:45:59 Chaals: Interface should be easier to use, most users don't set their language 10:46:51 ... content should contain as much metadata as possible to inform about alternative versions of content 10:47:21 Richard: mentions some extension that allows easier change of preferred language 10:48:12 I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki 10:49:05 I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki 10:50:34 Andrea has joined #mlw 10:50:46 Question from Olaf: What is the chance of implementing some notation for marking a document as being in the original language? 10:51:11 Chaals: There are many notations starting from simple rel= going to RDF 10:52:40 ... you should use it, browsers will support what is used on the pages visited by users 10:53:24 ... you should talk to producers of content creation tools 10:54:35 Richard: you should be more involved, create proposals, ... 10:55:15 Felix Sasaki: It's possible to introduce a new language subtag for this 10:55:52 ..
use the ietf-languages list to discuss this with the people reviewing such proposals 10:55:59 I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki 11:38:56 Zakim has left #mlw 11:59:44 Steven has joined #mlw 11:59:54 Scribe: Steven 12:01:27 Topic: Creators 12:02:28 i/Scribe: Felix/scribenick: fsasaki 12:02:38 rrsagent, make minutes 12:02:38 I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html Steven 12:03:41 i/scribe: felix/scribenick: fsasaki 12:03:49 rrsagent, make minutes 12:03:49 I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html Steven 12:06:22 Jirka has joined #mlw 12:06:36 Felix: Welcome to afternoon session 12:06:51 Topic: Office.com 2010 12:07:43 (Speaker - Dag Schmidtke) 12:08:11 Dag: 37 langs, 51 markets 12:09:37 ... some countries have more than one language (eg Belguim, Canada) 12:09:43 s/ui/iu/ 12:10:19 Dag: adding value to Office 12:10:47 ... content, templates, also sell Office 12:11:05 ... campaigns in different markets at different times 12:11:16 ... market specific engagement 12:11:49 Dag: Recent migration, site management and authoring from XMetal to Word 12:12:19 ... and using sharepoint instead of a custom publishing system 12:12:36 ... we did extend Word to support this 12:12:45 ... allows federated authoring 12:13:22 ... helps with localization 12:13:55 Dag: Lessons from this migration 12:14:07 ... internationalisation was a key stakeholder 12:14:22 ... designed for scale 12:15:12 ... it was quite an effort, next time we won't do everything at once 12:16:08 Dag: 100s of thousands of help documents for at least the last three releases 12:16:17 ... content heavy 12:16:53 ... complexity wasn't where we expected, and was more complex than we expected 12:17:51 Dag: General lessons from the site 12:18:43 ... Serve all global market needs, English is just another language 12:18:56 ... 
scale up *and* down 12:19:48 r12a has joined #mlw 12:19:55 chaals has joined #mlw 12:20:10 ... design for growth 12:20:57 [gives example of content riginating in Japan, and translated to other languages] 12:21:03 s/rig/orig/ 12:22:11 Dag: No character formating, only character styles 12:22:18 s/ting/tting/ 12:22:53 Dag: We have an XML format for translation 12:23:02 Dag: Local touch 12:23:20 ... deliver right experience to each market 12:23:45 [examples] 12:25:19 Dag: Customer connection 12:25:41 ... feedback, evaluation, SEO 12:26:36 [examples from site] 12:27:02 Dag: Continuous updates 12:27:27 ... respond to regional events, A/B testing 12:27:48 ... use some machine translation 12:27:56 Dag: Future trends 12:28:05 ... moving to the cloud 12:28:43 ... multilingual multimedia 12:28:59 ... language automation 12:29:21 .... interoperability with standards 12:29:27 s/..../.../ 12:29:36 Dag: Conclusions 12:29:59 ... It is possible to design for scale and local relevance 12:30:49 Topic: 12:30:49 Jirka Kosek - Using ITS in the common content formats 12:31:06 s/Jirka/Topic: Jirka/ 12:32:19 rrsagent, make minutes 12:32:19 I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html Steven 12:33:16 Jirka: tag set designed to help with translations 12:33:37 ... usable with any XML vocabulary 12:34:39 [example of use] 12:36:21 Jirka: Allows automatic software to see what should not be translated, as well as human translators 12:36:35 omstefanov has joined #mlw 12:36:58 [As Jirka said, you don't have to use the actual ITS namespace to use the ITS pieces - and the decision for widgets was indeed to do that] 12:37:05 Jirka: Now to look at formats that support ITS 12:37:14 ... first DocBook 12:38:16 lbellido has joined #mlw 12:38:22 [example] 12:41:24 Jirka: Next format, DITA 12:41:45 ... for topic-bsed documentation 12:42:07 .... DITA doesn't natively support ITS 12:42:13 ... can be added 12:42:31 Jirka: Now OOXML 12:42:40 jan has joined #mlw 12:43:00 ... 
Open Office, and even for MS Office 2007+ 12:43:23 ... no native support, but can be added 12:44:30 Office Open XML is a MS developed standard, not Open Office... ;-) 12:44:43 Jirka: ODF is similar 12:45:07 Jirka: XHTML allows use of ITS 12:45:46 ... HTML5 has no extension points to allow ITS 12:47:02 ... what is to be done? 12:48:09 ... HTML5 needs to be augmented to support ITS 12:50:01 Dag: MS translator does support something similar 12:51:00 Steven: If XHTML5 supports it, why not just say "Use XML serialization if you want this facility"? 12:51:23 Jirka: Not sure if people can produce well-formed XML 12:51:50 Topic: Chaals - standards for multilingual websites 12:52:38 Slides from my presentation http://www.kosek.cz/xml/2011mlwpisa/ 12:52:58 Chaals: What standards should be developed? 12:53:35 ... there are lots of multilingual sites. Substantial problems 12:53:56 I am here ... just in case 12:54:04 ... principles - don't break existing stuff 12:54:25 ... expect it to take time 12:54:38 ... two sides of coin: users and webmasters 12:54:39 Slides - http://dragoman.org/pisa/carrasco-mw-pisa.pdf 12:55:07 Chaals: But it is often less clear-cut 12:55:46 Chaals: Currently - no consistent user interface for a ML website. 12:56:22 ... this should be fixed 12:56:25 Encarna has joined #mlw 12:56:53 ... No standards for multilingual content production 12:57:02 ... this should be fixed 12:57:54 No standards for content production - in general - not a particular problem to MWS 12:58:02 Chaals; Most users are monolingual 12:58:09 [Scribe: he claims] 12:58:43 One needs hard data 12:58:46 Chaals: Webmasters must manage multilingual system 12:59:06 ... users don't want more complexity 13:00:21 ... webmasters aren't necessarily experts in this stuff 13:00:56 ... interfaces for content from the user side are well-established 13:01:01 ... not so for webmasters 13:02:33 Chaals: Some ideas - language button in the browser 13:02:48 ... use HTTP header fields maybe 13:03:48 ... 
content negotiation 13:03:49 Another good "high level" variant is memento http://www.mementoweb.org 13:04:05 ... reserved URIs 13:04:37 ... I am not sure if reserved URIs are a good idea 13:04:54 Chaals: It should be possible to request a translation 13:05:07 ... there's an Opera extension for that 13:05:35 A reserved URI is very good as one can have all the pages in the MWS with the same URI pointing to the variants 13:06:20 maintaining pages with different URIs for the variants is very hard 13:06:38 ... need a metaresource concept 13:06:45 [Scribe: RDFa!] 13:07:46 RDF might do it - needs verification 13:08:26 Chaals: Need server-side standards 13:09:21 iantruscott has joined #mlw 13:09:35 Scribe: RDFa was largest growing web format last year http://rdfa.info/2011/01/26/rdfa-grows/ 13:09:58 Chaals: Next step? Working group maybe 13:10:23 ... at W3C? Elsewhere? 13:10:37 No WG, not specifications 13:10:43 ... or create a new initiative? 13:11:12 Chaals: Need guides for best practice on user and webmaster sides 13:13:44 scribe: r12a 13:14:05 Topic: Sophie Hurst - Local is global 13:14:11 A tabular view http://dragoman.org/mws-india.html 13:14:35 Chaals: you won your beer !!! 13:15:06 Sophie: 90% of HP buy based on content rather than touching the product 13:15:24 s/HP/HP's customers/ 13:16:14 Sophie: 42% of web users are from Asia 13:16:26 ... only 13% from USA 13:16:38 ... yet English still leading language 13:17:20 ... asia has highest usage but low penetration 13:17:27 ... therefore it's a growth area 13:17:49 ... 10% retails sales in CHina are done online 13:18:00 s/retails/retail/ 13:18:15 s/CH/Ch/ 13:19:02 [My concern with reserved URIs is that it breaks some existing standards and expectations. I think HTTP headers and metadata are better approaches.
(I generally hate reserved URIs - they are used in P3P, favicons, robots.txt and a couple of other places, but I don't think they're going to handle the complexity of multilingual websites without creating as many problems as they solve...)] 13:19:04 Sophie: How to represent brand consistently, locally 13:19:12 ... how to make it relevant 13:19:23 [I certainly think that being able to get the information about available variants is really important] 13:19:29 ... how to manage translation 13:20:05 Sophie: First is to use a component-based system 13:21:10 chaals: yes, but it might be sufficient to have a link/HTTP header pointing to another URL where a manifest listing all possible variants sits, rather than to have a dozen alternatives in each page -- too much change when a new translation is added 13:21:17 ... synchronisation between components is then eassy to manage 13:21:23 s/ss/s 13:21:54 ... allows local components, but global style 13:22:17 ... eg Emirates site 13:23:18 Sophie: Use positioning information to personalise information 13:23:54 ... example, Lux brand which is up-market in India, but not elsewhere 13:24:21 ... need local input to ensure local nuances are working 13:25:22 ... users come with cultural layers as well 13:25:38 ... cultures vary in many dimensions 13:26:39 Sophie: Finally, managing content 13:26:57 ... need a well-managed process 13:27:06 [The browser side is much better, but we have to care for the server side. This is the question: how to implement the server-side. Separate function from the mechanism: we can explore different mechanisms. One fixed reserved URI for the whole server combined with the Referer header will certainly resolve a big problem (different URIs for each page).] 13:27:30 ... can be automated to large extent (the management, not the translation) 13:28:10 [shows an example process] 13:29:30 Sophie: In conclusion, translation must be part of a larger picture 13:29:55 ...
use components, geo-positioning, and translation management 13:30:09 [Q&A session] 13:30:12 Question of scope: what should be in MWS and what in other specifications for full translation system. 13:30:51 The picture is larger: Authorship, Translation and Publishing Chain 13:31:51 Translation is only part of the whole production chain 13:31:59 Christian Lieske: For Chaals- I got different messages - we've got to do stuff, but Sophie seems to suggest we can already do it. 13:32:40 Chaals: It's not that we can't do it already, but that there is no agreed way to do it 13:33:59 We need to define the different scopes and how the different fields integrate; a MWS is *not* a translation management system. 13:34:30 Chaals: We have no ineroperability 13:34:38 s/iner/inter/ 13:34:46 You won another beer !!! 13:35:13 Sophie: Changing solutions is hard, standards could help 13:35:31 We need to identify what is particular to MWS and what is general. 13:36:15 Sophie: We should work towards a position where you need fewer developers 13:36:44 Language is just one of the dimensions in TCN; e.g., mementos should be integrated in the same mechanism http://www.mementoweb.org/ 13:37:42 Dag: We have a translation tag, but it is not standard, so there is less customer value, in the long run a standard lowers the cost of entry for us 13:37:45 +1 regarding developers: one should be able to construct a MWS from Apache out of the box 13:37:57 Andrea has joined #mlw 13:38:15 Tomas Abramovitch: Do you use different CSS for different cultures? 13:38:30 ... and how accurate is geo-location? 13:39:07 One could (CSS) 13:39:10 Dag: We componentise our pages, the local part is not done by CSS 13:39:45 Sophie: I can't totally answer the geo-loc part. 13:40:28 Chaals: It is a spectrum from one person to just someone in a country 13:40:28 One could generate some pages: "5.3.
Generating language in parallel" in http://dragoman.org/mws/oamws.pdf 13:41:15 s/one person/identifying one seat in an audience/ 13:42:22 Ian Truscott: identifying people is always a guess until they log in 13:42:55 Or he set his browser preferences 13:43:27 Reinhard: How do we learn from research? No one has mentioned this 13:43:48 ... different people like different things 13:44:08 ... 16 year olds in China have more in common with 16 year olds in the USA than with their parents 13:44:35 ... all I've heard is corporate policy. Why not let the user decide? 13:45:01 A user wants the page in his language 13:45:10 Sophie: Crowd sourcing is an option 13:45:39 Choosing is already a hurdle 13:47:33 We need to look at all the available mechanisms and decide on a recommendation: "4.4. Options" in http://dragoman.org/mws/oamws.pdf 13:50:00 Dag: There are areas where our interest and the users' coincide 13:50:22 ... but we can't do translation on demand 13:50:38 ... they pay for a premium product 13:51:05 [It isn't always a guess identifying the user until they log in. In fact, technically it is often easy to identify users anyway - this is why we have laws to protect privacy and limit the things done to make it easy] 13:54:04 Steven: A good example of Reinhard's point is websites that conflate region with language. I often don't know which question they are asking. 13:54:38 ... and I don't believe that most people are monolingual. There are 6000 languages, and 150 countries. Most people are at least bilingual 13:55:06 [scribe's computer is nearly out of battery] 13:55:26 [we need to identify what the user wants, not who he is] 13:55:36 Reinhard: Corwdsourcing translation is often not possible because of copyright issues 13:55:54 s/Cor/Cro/ 13:57:12 Olaf: We need the possibility to offer translations of parts of sites 13:57:38 ...
it works on wikipedia 13:57:52 Monolingual user: we need hard data; but circumstantial data suggest that the requirement of most users is monolingual. 13:58:01 ... microsoft needs to open its translation tools 13:58:48 Chaals: I use crowdsourced translation of Norwegian law 13:59:31 ... it is easy to do, but by and large it doesn't happen 13:59:45 ... too little reward 14:01:41 Translation integration in MWS: a language not available could be defined as a "language potentially available" (after translation). One needs a mechanism covering all the aspects of the different translation techniques: human (professional, crowd), machine (fast as RBMT or slow as SMT). 14:02:30 karl has joined #mlw 14:04:14 For the whole enchilada: "Open architecture for multilingual parallel texts" http://arxiv.org/ftp/arxiv/papers/0808/0808.3889.pdf 14:06:05 OK: I go for coffee. 14:30:02 Jirka has joined #mlw 14:42:22 fsasaki has joined #mlw 14:43:23 topic: presentation from Christian Lieske et al. 14:44:03 lbellido has joined #mlw 14:45:10 christian: five areas show that there is a need for change: 14:45:39 .. demand for language related services, shortcomings of today's translation-related standards, ... 14:45:57 .. why talking about standards: demand & lack of interoperability 14:47:02 .. lack of interoperability e.g. for XLIFF 14:47:46 .. things break down across tool chains 14:48:15 .. standards in localization area are sometimes not compatible 14:48:26 .. example of phrases in TMX vs phrases in XLIFF 14:49:16 christian: not a lot of work in localization standardization on integrating new web technologies 14:49:43 .. e.g. aspect of RESTful services, use of related protocols (odata, gdata) for translation related services 14:50:11 .. these problems have led to implementation challenges, problems for standards that are already here 14:50:47 .. how to solve the problems: four areas of requirements, methodology, compliance, stewardship are important 14:51:44 ..
requirements: identify processing areas related to language processing - and keep them separated 14:52:00 .. determine the entities that are needed in each area 14:52:24 .. chart technology options and needs 14:52:48 ... etc. Next: methodology: 14:53:12 .. distinguish between models and implementation / serialization 14:53:13 omstefanov has joined #mlw 14:53:41 .. distinguish between entities without context and entities with business / processing context 14:53:53 .. set up rules to transform data models into syntax 14:54:15 .. set up flexible registries, e.g. CLDR, IANA 14:54:33 .. provide migration paths / mapping mechaisms for legacy data 14:54:52 s/mechaisms/mechanisms/ 14:55:01 I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki 14:55:39 .. third, compliance: e.g. what does "support for standard X" mean? 14:56:50 .. finally, stewardship: driving, supporting standardization activity 14:57:07 .. anyone who shouts for small standards should be willing to invest 14:57:25 .. EC has a track record, see e.g. mlw project 14:57:37 .. make donations / contributions easy 14:57:50 .. discourage fragmentation and unclear roles 14:58:20 .. LISA no longer exists, now there is a kind of competition who could follow in the footsteps 14:58:53 .. my fear is that another organization is being created, my and probably Felix' and Yves' thought is that this should be avoided 14:59:36 topic: David Filip on "Multilingual transformations on the web via XLIFF current and via XLIFF next" 14:59:52 .. christian has covered a lot for XLIFF 2.0 - what do I want to cover? 15:01:10 s/.. christian/david: christian/ 15:02:12 david: my main statements: metadata must survive language transformations, content metadata must be designed upfront with the transformation process in mind, XLIFF is the principal vehicle for critical metadata throughout multilingual transformations 15:02:51 ..
and finally: next generation XLIFF standard is an exciting work in progress in OASIS TC 15:03:06 I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki 15:03:20 cchiavet has joined #mlw 15:03:49 .. about preserving metadata: there are various transformations: g11n, i18n, l10n, t9n ("GILT") 15:04:01 .. transformation modi: manual, automated, assisted 15:04:37 .. transformation types: MT, human translation, postediting, stylistic review, tagging (semantic, subject matter review, transcribing), subtitling, ... 15:04:47 .. growing number of source languages 15:04:54 I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki 15:05:54 Steven_ has joined #mlw 15:06:26 david: what metadata is necessary? 15:06:34 .. preview and context are critical 15:06:54 .. argue for creating standardized XSLT artefacts for preview 15:07:12 .. metadata for legally conscious sharing (ownership, licensing, ...) 15:08:44 .. grammatical, syntactic, morphological and lexical metadata 15:09:18 .. example of m4loc project: they developed an XLIFF middleware to ensure interop between localization open source tool and moses MT tool 15:10:15 .. tagging of culturally and legally targeted information 15:11:07 .. home for LT standardization? Leverage BP of existing loc standards (XLIFF, TBX, SRX, ...) - pointing into the past (OASIS, LISA) 15:12:13 .. now: leverage OASIS XLIFF, ISO TC37, Unicode SRX and GMX 15:12:48 .. further development of W3C ITS and RDF, create conscious standardization including RDF and XLIFF 15:13:13 david: OASIS is home of XLIFF, but has also UBL and XBL as its home 15:13:31 .. W3C has ITS and RDF modeling, Unicode - see above 15:13:48 .. ISO TC 37, important not for standards creation but for secondary publishing 15:14:30 .. why XLIFF?, and why 2.0? see also presentation from christian 15:15:21 .. good progress of XLIFF in 2011 possible, as SWOT analysis shows 15:15:40 ..
prediction: 2011 will see definition of new features, in 2012 new standard 15:16:13 topic: presentation from Sven C. Andrä 15:16:19 I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki 15:17:19 sven: kilgray, "we localize", Andrä, biobloom are behind the "interoperability now!" initiative 15:17:36 i/christian: five/scribenick: fsasaki 15:17:41 .. translation (technology) industry is a niche industry 15:17:42 rrsagent, make minutes 15:17:42 I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html Steven_ 15:18:03 .. very few computer scientists here, not a technical, but experience driven industry 15:18:16 .. industry is getting more and more important, including technology 15:18:23 .. hence interop is getting more important 15:18:39 .. there are enough standards here, but they are complex, not many have reference implementations 15:18:47 .. and there is little exchange within tool providers 15:19:28 table of features in XLIFF that are supported by all tools - only two features (from about 50?) are supported by all tools 15:20:01 sven: we want lossless data exchange in a mixed (tool) environment 15:20:30 .. standards are important, also develo 15:20:45 s/develo/further development of XLIFF/ 15:21:00 .. but mindset is most important, i.e. about the lossless data exchange 15:21:13 .. basis of our work: "interoperability manifesto" 15:22:34 .. pushing standards over the edge, give feedback to the TC 15:22:56 .. modules that we are working on: about content, package, transportation 15:23:06 .. content is modified xliff 15:23:16 .. package is currently just made up 15:23:27 .. for transportation we are using regular web services 15:23:48 .. basic approach: disclose our concepts 15:24:27 .. reference implementations are open source 15:25:00 .. early real life usage 15:25:52 .. test scenarios to verify compliance 15:26:48 .. theoretical aspect: agile vs. standard? 15:27:30 ..
would be good to have a framework for organizations like W3C that could help us to bring this into standardization step by step 15:28:17 .. benefits of this approach: it is a limited time that we are working on this 15:29:12 topic: presentation from Eliott Nedas 15:29:18 I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki 15:30:15 tadej has joined #mlw 15:34:27 David Grunwald, GTS 15:34:49 s/Eliott Nedas/David Grunwald/ 15:35:14 david: our vision: have a box that creates quality content very quickly and cheaply 15:35:39 .. using MT, we want an efficient solution that will make mlw a reality 15:36:04 .. need to develop MT which is good for blog publishing 15:36:16 .. MT will never be ready "as is" for human quality translation 15:36:27 .. we developed a system for cheap and quick post editing 15:36:52 .. currently, explosion of content, lots of it is local because of language barriers 15:37:03 .. translation costs are very high 15:37:58 .. we are targeting open source CMS platforms 15:38:10 .. 20 % of web sites are published on such platforms 15:38:28 .. we could offer a good translation solution to these 15:38:41 .. large media publishers who use open source CMS 15:38:58 .. wordpress, movable type are created for all kinds of web sites, not only blogs 15:39:12 .. our solution: based on MT; human post editing, and crowd sourcing 15:39:21 I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki 15:40:03 david: crowdsourcing startups in many regions 15:40:44 .. our solution: no automated open source CMS solution for small guys 15:40:54 .. no automated tools for post editing / MT either 15:42:06 .. our solution uses data from blogs that is available on the web 15:43:55 .. workflow: user installs wordpress, MT is done, email notification is sent to crowdsourcing translators, integrated after review by a moderator 15:45:55 ..
interested in opportunities for funding this kind of work 15:46:11 topic: presentation from Pål Nes 15:46:20 I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki 15:46:42 Pål: opera has been using crowd sourcing for a long time 15:47:32 .. caveat of crowd sourcing: it is not free, organizing it is difficult 15:47:45 .. e.g. employing managers for the crowd 15:48:18 jan has joined #mlw 15:48:50 .. should only be used for certain tasks 15:49:01 .. not for time critical tasks 15:49:27 .. mostly students are participating, picked up from university talks 15:49:34 tadej1 has joined #mlw 15:49:43 .. large crowd is not necessarily a good crowd 15:49:59 .. better 3,4,5 good translators, than 50 translators doing nothing 15:51:09 .. e.g. press releases, marketing material are not well suited for crowd translations 15:51:29 .. good for crowd sourcing: applications (web site "my opera", "opera com"), with a stable set of text 15:51:38 .. and documentation, that is easy to maintain 15:52:28 .. start small, put your crowd under embargo / NDA 15:52:54 r12a has joined #mlw 15:53:37 .. try building up a hierarchy 15:53:47 .. be careful with your branding 15:53:58 .. and your terminology 15:56:06 .. for opera we used XLIFF - we used our own, incompatible version of XLIFF 15:56:19 .. discovered that open source is not open standard 15:57:41 .. tools we used: gettext and po4a, transifex, translate toolkit with pootle and virtaal, homebrew applications to bridge the vast gaps 15:58:14 .. XLIFF is a minefield, in the current version 15:58:40 .. about html: keep it as simple as possible, semantic markup is key 15:59:03 .. write proper CSS - write a separate RTL - stylesheet to negate RTL-challenged CSS 16:00:02 topic: presentation from Eliott Nedas 16:00:10 I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki 16:05:10 eliott: everything that was said from David, Christian etc.
in this session about interoperability was right, I concur with them 16:05:47 .. we need standards because of interdependence. 16:06:23 .. the demise of LISA. Sad that they are gone, but opportunity to look into this in a new way 16:06:32 .. LISA standards are important 16:06:53 .. now is a good opportunity for a new model of standardization 16:07:03 .. new kids on the block: TAUS and Gala 16:07:50 .. currently lots of different technologies 16:09:23 .. and many different standards 16:09:34 .. OAXAL is a solution that brings these together 16:09:50 .. that can be used for free 16:11:18 description of various aspects of standards and applications built on top of it 16:12:41 eliott: how to spread the message: important e.g. in academic curricula 16:13:26 topic: presentation from Manuel Herranz 16:14:02 manuel: presentation about PangeaMT project 16:14:30 Andrea has joined #mlw 16:14:42 I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki 16:15:03 manuel: translation is something that you have to go through for achieving what you want 16:15:32 .. web people except immediate translation 16:15:42 s/except/expect/ 16:16:11 .. why don't we have immediate translations? 16:17:33 .. introducing pangeanic: LSP, major clients in Asia and Europe 16:17:48 .. we wanted to provide faster service for translation 16:17:54 .. became founding member of TAUS 16:19:17 tadej has joined #mlw 16:19:20 .. four years ago created relation with computer science institute in valencia 16:19:42 .. challenge at that time: turn academic development (moses) into a commercial application 16:20:17 .. limitations: plain text, language model building (first), no recording, no update feature, data availability, ... 16:21:08 .. objectives: provide high quality MT for post editing 16:21:19 .. and to use only open standards: XLIFF, tmx, xml 16:21:34 I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki 16:22:10 ..
built a TMX - XLIFF workflow 16:22:39 .. not to be locked into a solution 16:23:35 .. PangeaMT system: comes as TMX or as XLIFF 16:23:51 .. TMX should not die, people are still using it 16:24:17 lbellido has joined #mlw 16:25:07 .. future work: on the fly MT training 16:25:15 .. pick and match sets of data 16:25:23 .. objective stats for post-editors 16:25:28 .. confidence scores for users 16:26:50 topic: Q/A for localizers 16:27:19 reinhard: thank you, was a great session 16:28:26 .. about remarks on crowd sourcing: there was emphasis on crowd sourcing for enterprise 16:28:36 .. this does not go well together 16:29:29 .. other people like rosetta foundation, translators without borders etc. have had good experiences 16:29:59 Pål: crowd sourcing was good for us 16:30:11 .. it just took us a lot of effort and time to get there 16:31:18 jörg: there is some similarity: you have to train translators, otherwise you won't get the good results in medical translation 16:33:02 felix: one comment on interop now, it is very important to go into a standards body as a next step 16:33:15 sven: thanks, we will definitely try to do that 16:33:46 richard: w3c just created business groups / community groups, that might be a thing for you to look into 16:33:56 david: about what reinhard said 16:34:20 .. if your expectation is high you will be disappointed, but the business case is in the future 16:34:43 topic: wrap up 16:35:02 richard: see you tomorrow, speakers please show up at 8:30 tomorrow 16:35:07 I have made the request to generate http://www.w3.org/2011/04/04-mlw-minutes.html fsasaki 16:40:55 tadej has left #mlw 17:00:38 asgeirf has joined #mlw