08:11:29 RRSAgent has joined #mlwrome
08:11:29 logging to http://www.w3.org/2013/03/12-mlwrome-irc
08:11:39 meeting: MLW workshop, rome, day 1
08:11:46 agenda: http://www.multilingualweb.eu/documents/rome-workshop/rome-program
08:11:48 chair: arle
08:11:52 scribe: various
08:11:56 Labra has joined #mlwrome
08:12:36 topic: Welcome and Keynote
08:13:01 Daniel Gustafson, on behalf of FAO, introducing the conference
08:13:24 arle showing an example of why internationalization is needed
08:13:44 arle: language and culture are not easy issues to solve
08:13:49 I have made the request to generate http://www.w3.org/2013/03/12-mlwrome-minutes.html fsasaki
08:14:27 Gudrun has joined #mlwrome
08:14:48 arle: we will learn about things from other communities that we normally wouldn't see
08:14:57 Jirka has joined #mlwrome
08:15:08 .. this workshop series shows what actions across communities are needed
08:15:29 .. the mlw workshop series drives the development of a community that takes up the challenge of the multilingual web
08:16:16 arle: we also act as a catalyst for future projects that take up the challenge of the multilingual web
08:16:19 .. e.g. European projects that work on the topics discussed here
08:16:33 .. we want to improve the use of standards and best practices
08:16:48 .. and want to improve support of multilingual features in browsers
08:17:00 ..
we have seen real engagement in this area to tackle the issues
08:17:12 omstefanov has joined #mlwrome
08:18:19 Clara has joined #mlwrome
08:19:40 arle going through admin issues and the program
08:19:58 s/Welcome and Keynote/Welcome and workshop intro/
08:23:40 topic: Keynote: Innovations in Internationalization at Google
08:24:52 mark: we will focus on some of the technologies at google and how we make our products more multilingual
08:24:55 scribe: fsasaki
08:25:54 mark: at google we have to deal with core localization - here we will talk about work that goes beyond the core
08:26:06 .. google is about search for text
08:26:28 .. we take synonyms into account, but more recently we want to take entities into account
08:26:34 .. "entity i18n"
08:26:48 .. we do this e.g. by looking at wikipedia to find out what entities look like
08:27:12 .. english wikipedia is huge, so we cross-connected wikipedias to find out more about entities
08:27:35 GordonD has joined #mlwrome
08:27:35 .. part of this is names, e.g. personal names, google+ pages with more free-form names
08:27:42 .. and also URLs, which present their own problems
08:27:49 .. some problems are related to security
08:28:00 .. many characters are look-alikes
08:28:30 .. this creates opportunities for spoofing
08:28:45 .. normalization of names: when are two names the same?
08:29:09 .. that includes handling of inflection and of how people want their names represented
08:29:36 ..
this involves issues related to semantics, encoding, formatting etc.
08:30:17 mark: recently we worked on plurals and gender
08:32:21 mark: plural and gender are tricky features to deal with
08:32:32 .. currently we have patterns for numbers written as digits
08:32:42 .. used in messages, units, contact numbers, etc.
08:33:27 mark: now handing over to vladimir, who will take us through these areas
08:33:47 vladimir: we have a nice way to represent e.g. gender and number information across languages
08:33:55 .. we want translators to handle this properly
08:34:19 .. we built a tool that shows the localization specialist what he needs to see
08:34:39 vladimir: engineers write forms in English
08:34:55 .. in our code we have ways to specify how gender and plural should be treated
08:35:19 .. examples of how this works for various languages
08:35:44 Andrea has joined #mlwrome
08:35:44 smokingred83 has joined #mlwrome
08:36:13 vladimir: now a phone number example
08:36:25 .. operating system that runs on the phone
08:36:35 .. we want each phone number to be unique
08:36:48 .. that is, there is a country code, area code, digits to dial the number
08:37:09 .. people in different countries don't want to think about +xx, they just want to write numbers the way they are used to
08:37:21 .. so we have an open source phone number library
08:37:53 .. see http://code.google.com/p/libphonenumber/
08:38:07 .. it handles parsing, formatting, canonicalization
08:38:11 .. getting types and examples
08:38:14 ..
finding numbers in text
08:38:34 vladimir: canonicalization is a hard problem
08:38:54 .. the google contact book has a flag that tells you the country of the number
08:39:12 .. it allows you to fix the number if it's wrong, and it relies on the above library
08:39:19 .. geolocation of a number is an issue
08:39:24 .. in some places you cannot solve it
08:39:39 .. in the us there is a physical, territorial designation
08:39:50 .. in Europe that is different
08:39:59 .. so the problem can't be solved in general
08:40:14 vladimir: now about addresses
08:40:25 .. e.g. for sending a check
08:41:10 .. see library at http://code.google.com/p/libaddressinput/
08:41:13 .. this is also open source
08:41:32 .. allows for validation of regions, layout and basic validation
08:41:50 .. to e.g. give a street address that is actually meaningful
08:42:28 mark: "getting language settings wrong" - an issue for users and also for people inside Google
08:42:45 .. was a hard problem to fix
08:43:04 .. even worse for enterprises, since the language is often set by some administrator
08:43:24 .. we created "universal language settings"
08:43:34 .. this allows people to set the language across products
08:43:46 .. sounds simple, but rollout across products is hard
08:43:58 .. allows for setting more than one language
08:44:18 .. did some analysis of gmail - a fair number of users speak more than one language
08:44:40 .. fallback is needed if the preferred language is not available
08:45:02 .. a use case is to serve better content in search scenarios
08:45:29 .. with language settings the outcome is better than trying to guess the user's language
08:45:36 .. "60 language initiative" at google
08:45:47 ..
we did a 40 language initiative
08:46:03 DomJones has joined #mlwrome
08:46:18 Monica has joined #mlwrome
08:46:19 .. we showed internally at google how support for more languages leads to more satisfied customers
08:46:29 lbellido has joined #mlwrome
08:46:37 .. that was an incentive inside google to support more languages
08:46:56 .. now we are rolling out 60 languages in many products
08:47:41 vladimir: it is important for people that they can interact with their device the way they want
08:48:25 .. a speech-to-text library is important for that; our library now supports 42 languages, including accents / dialects in 46 countries
08:48:49 ivan has joined #mlwrome
08:49:31 vladimir: text input - a dedicated team is developing input methods for many languages
08:49:44 .. input methods on android, native input methods to store on your device
08:49:51 .. also cloud input methods
08:50:17 .. librar(ies) for input http://www.google.com/inputtools/
08:50:47 .. with dictionaries and word frequency data we are trying to guess what people want to type
08:50:54 .. which is helpful for many users
08:51:18 .. another team at google is creating fonts for all unicode scripts
08:51:30 see http://code.google.com/p/noto/
08:51:35 noto = no tofu
08:51:49 vladimir: dealing with so many fonts and so much font data
08:52:06 .. needs tools for reading and writing basic font tables, special bitmap glyphs
08:52:17 .. allowing us to serve smaller subsets of fonts
08:52:29 .. we have the sfntly font library for that
08:52:56 http://code.google.com/p/sfntly/
08:52:56 vladimir: now google translate
08:53:03 .. now support for 65+ languages
08:53:46 ..
if you use the chrome browser, the browser can detect the language of the website and will offer translation
08:54:12 .. we also allow you to submit feedback to make the translation better
08:54:20 .. we are allowing people to access content on the web
08:54:28 .. we do not aim for specialized engines
08:54:46 vladimir: now google localization infrastructure
08:54:57 .. see http://translate.google.com/toolkit
08:55:16 .. now used for all google localization
08:55:52 .. important to have everything available under google control, so that changes can be rolled out everywhere easily
08:56:13 .. the toolkit can also be used by outside users
08:56:26 .. you can use translation memories, glossaries etc. for yourself
08:56:49 vladimir: the world of localization is not well standardized
08:56:55 .. we created the ARB format
08:57:00 .. for web applications
08:57:34 .. we created an easy-to-use JSON format to skin applications at runtime
08:57:52 .. we also allow people to produce localized content
08:58:05 .. example of youtube caption translation
08:58:36 .. allows you to take the captions from your video and ask your friends to translate them
08:58:53 smokingred83 has joined #mlwrome
08:59:15 vladimir: the next integration even allows you to buy captions from a vendor
08:59:28 nicoletta has joined #mlwrome
08:59:29 .. the general idea is that people can access translation of their content on many different levels
08:59:54 present: many, many, many, people
09:00:15 topic: keynote q/a
09:00:42 joomla has joined #mlwrome
09:01:06 chaals: question on the ARB format
09:01:27 ..
w3c standardized various XML formats related to that
09:01:36 .. google didn't follow that path and chose the json way
09:01:42 .. why did google take that path?
09:02:01 mark: google's goal for the format is to have something light
09:02:09 .. that can be mapped to xml, but something simple
09:03:06 tomas: do you follow language identification standards?
09:03:10 mark: we use bcp 47
09:03:19 .. we do not follow the accept-language header in HTTP
09:03:33 .. since that has never really been used in a consistent way
09:03:53 tomas: sending in the header "this is my language preference" - how about that?
09:04:07 mark: we cannot rely on people having set the header right
09:04:21 .. and then people's settings also don't depend on which machine they are working on
09:04:22 Andrea has joined #mlwrome
09:04:33 .. things are handled just by the google (account) settings
09:04:50 xyz: are you using timed text standards?
09:05:00 (= caption translation standard)
09:05:18 vladimir: we support one standard here, not sure which one
09:05:29 .. different standards for captions have different ways to convert from one to another
09:06:12 christianL: you use quite a bit of natural language processing in your tools
09:06:20 .. e.g. google translate in 65+ languages
09:06:42 .. in some areas, are you still working with rule-based, that is, non-statistical methods?
09:07:04 mark: google translate uses masses of bilingual data
09:07:27 mark: that doesn't work with languages that need reordering
09:07:36 .. so there are now pre-ordering steps
09:07:46 .. and syntactic analysis is being used more and more
09:08:02 .. the advantage of the data approach is that you can easily accommodate new language pairs
09:08:07 .. just with new data
09:08:23 ..
the rule-based approach was very labor-intensive for us
09:08:44 mark: google translate does not focus on the most commercial languages
09:08:58 .. but on the languages that have the most data
09:09:08 tomas: how do you handle the data that you use in MT?
09:09:32 .. if you have the choice between a smaller amount of good-quality data and large "dirty" data sets?
09:10:00 mark: we take into account large data sets in standard format, but we also want to translate tweets
09:10:23 .. we also need to make sure we do not use data for training that has already been machine translated
09:10:44 Monica has joined #mlwrome
09:10:49 .. or an example of date representation: sometimes auto-generated with CLDR, which then influences training
09:11:31 richard: maybe explain what CLDR is
09:11:49 .. and explain what the mobile internet tool HarfBuzz does
09:11:58 mark: cldr is a project for gathering localization data
09:12:15 .. e.g. formatting of dates, times, numbers, collation and sorting rules, currency formatting
09:12:23 .. contact numbers
09:13:13 mark: HarfBuzz is an open source project to handle complex scripts
09:13:26 .. it is gradually developing to support more and more scripts in the world
09:14:05 now coffee break
09:20:44 vp has joined #mlwrome
09:38:04 stocazzo has joined #mlwrome
09:38:49 antonino_carella has joined #mlwrome
09:39:13 stocazzo has left #mlwrome
09:39:57 lmatteis_ has joined #mlwrome
09:40:04 hi all
09:40:07 what's up?
09:43:38 why is nobody talking?
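The keynote Q/A noted that Google uses BCP 47 language tags, and the talk described universal language settings that let a user list several languages, with fallback when a preferred one is not available. As a minimal sketch of one way such fallback could work, assuming the "lookup" scheme from RFC 4647 (progressively truncating each tag) rather than anything Google described, the hypothetical helpers `truncations` and `pick_language` below walk a user's ordered preferences against the languages a product ships:

```python
from __future__ import annotations


def truncations(tag: str) -> list[str]:
    """Progressively shorter forms of a BCP 47 tag, as in RFC 4647 'lookup'."""
    subtags = tag.lower().split("-")
    forms = []
    while subtags:
        forms.append("-".join(subtags))
        subtags.pop()
        # Also drop a trailing single-character subtag (an extension or
        # private-use singleton), which is meaningless on its own.
        if subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return forms


def pick_language(preferred: list[str], available: set[str], default: str = "en") -> str:
    """Return the first shipped language matching the user's ordered preferences.

    Hypothetical helper: `available` is whatever set of UI languages a product
    ships; `default` is returned when nothing matches.
    """
    # Tags are case-insensitive, so compare lowercased forms but return the
    # tag exactly as the product spells it.
    by_lower = {tag.lower(): tag for tag in available}
    for tag in preferred:
        for candidate in truncations(tag):
            if candidate in by_lower:
                return by_lower[candidate]
    return default


if __name__ == "__main__":
    # A Swiss German speaker who also reads French:
    print(pick_language(["de-CH", "fr"], {"de", "fr", "en"}))  # de
    # Swahili is not shipped, so Canadian French falls back to French:
    print(pick_language(["sw", "fr-CA"], {"en", "fr"}))  # fr
```

Each preference is tried through all of its truncations before the next preference is considered, which keeps the user's explicit ordering intact for the multi-language users mentioned in the keynote.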
09:44:56 tadej has joined #mlwrome
09:45:03 laurent_oz has joined #mlwrome
09:45:09 hi tadej and laurent_oz
09:46:10 Monica has joined #mlwrome
09:49:09 hi everyone, Laurent from CSIRO/W3C Australia (Canberra). Not staying long but hoping to finish a conversation with Felix Sasaki about NUTTAB http://www.foodstandards.gov.au/consumerinformation/nuttab2010/
09:49:53 Andrea has joined #mlwrome
09:50:15 fsasaki has joined #mlwrome
09:50:22 scribe: fsasaki
09:50:39 topic: Going Global with Mobile App Development: Enabling the Connected Enterprise
09:50:44 presentation by jan nelson
09:51:57 jan: windows phone 8 toolkit
09:52:19 .. visual studio IDE, provides pseudo languages for "in house" testing
09:52:25 .. XLIFF support
09:52:36 Clara has joined #mlwrome
09:52:39 .. integration with the MS translator service
09:52:42 .. via the internet
09:53:00 nicoletta has joined #mlwrome
09:53:13 lbellido has joined #mlwrome
09:53:34 Arle has joined #mlwrome
09:53:43 jan: windows phone 8 - you create a new project, bind to resources, then test in various languages
09:53:54 .. you test in many languages, apply resource changes across languages
09:54:06 .. restart testing, ... it takes a lot of work to ship in one language
09:54:23 .. demo today of the toolkit: how we are trying to make the process easier
09:54:51 .. will show how to add a new language, how to export to XLIFF, store the data in SkyDrive etc.
09:55:08 .. so a high level overview: how you can create a windows 8 app in various languages
09:55:40 .. in the enterprise it is important: the "bring your own device" scenario
09:56:02 .. in the past that didn't work in the enterprise, due to security, protocols etc.
09:56:38 ..
this tool supports that scenario, but also goes beyond the enterprise
09:57:39 ian demoing the toolkit
09:57:54 ian: resources file, an XLIFF file
09:58:03 .. a specific locale for pseudo localization
09:58:42 ian: now adding french as a new language
09:58:48 .. there is a notation for a translator service
09:58:52 why so much attention on wp8? the most widespread platform is android
09:58:55 .. this service adds more language pairs
09:59:29 Monica has joined #mlwrome
09:59:35 antonino_carella: true, isn't it one of the least used mobile operating systems?
09:59:54 and especially for w3c, it seems to me that open source would be much more interesting
10:00:06 things like Firefox OS are much more promising
10:00:12 lmatteis, antonino, are you in the room? do you want to ask the question during the q/a session?
10:00:26 i'm not in the room
10:00:44 besides: at this workshop we try not to evaluate company-specific technologies - just show what they can do and where the "standardization gaps" are
10:01:17 yes of course
10:03:02 fsasaki: which standards is the current presenter showing?
10:03:07 also live stream has no more audio.
10:03:13 at lmatteis: XLIFF
10:03:49 the editor allows you to generate XLIFF (XML Localization Interchange File Format)
10:03:56 ok audio is back on live stream
10:04:26 there was no audio, just video - is the video streaming working for you?
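The toolkit demo above mentions pseudo languages and a specific pseudo-localization locale for in-house testing. The sketch below shows the general technique only, not the toolkit's actual algorithm: the accent mapping, the roughly 30% length padding and the bracket markers are illustrative assumptions, and `pseudo_localize` is a hypothetical name. Accented letters reveal strings that were never externalized, padding approximates longer translations, and the brackets expose truncated or concatenated strings; placeholders such as `{0}` are left untouched so format strings keep working.

```python
import re

# Illustrative ASCII -> accented look-alike mapping (an assumption; real tools
# use their own, usually larger, tables).
ACCENTED = str.maketrans("aceinousyACEINOUSY", "àçéîñöûšýÀÇÉÎÑÖÛŠÝ")

# Placeholders like {0} or {name} must survive unchanged.
PLACEHOLDER = re.compile(r"\{[^{}]*\}")


def pseudo_localize(text: str, expansion: float = 0.3) -> str:
    """Return a pseudo-localized version of an English UI string."""
    pieces = []
    last = 0
    for match in PLACEHOLDER.finditer(text):
        pieces.append(text[last:match.start()].translate(ACCENTED))
        pieces.append(match.group(0))  # copy the placeholder verbatim
        last = match.end()
    pieces.append(text[last:].translate(ACCENTED))
    # Pad to simulate languages whose translations run longer than English.
    padding = "~" * max(1, round(len(text) * expansion))
    # Brackets make clipped or concatenated strings easy to spot on screen.
    return "[" + "".join(pieces) + padding + "]"


if __name__ == "__main__":
    print(pseudo_localize("Save {0} files"))  # [Šàvé {0} fîléš~~~~]
```

Because only the text between placeholders is transformed, a string like `Hello {name}` still formats correctly after pseudo-localization.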
10:04:32 now yes
10:04:40 good, thanks
10:04:56 ian continuing demo - now loading resource file
10:05:53 Arle has joined #mlwrome
10:07:42 Andrea has joined #mlwrome
10:08:17 ian: in windows 8 you have language preference settings - you can add several and a fallback too
10:10:42 ian: XLIFF support is important for working with many translator services
10:11:02 .. we continue to work actively in the XLIFF TC to ensure that there will be interoperability
10:11:46 topic: gavin brelstaff - Multilingual Mark-Up of Text-Audio Synchronization at a Word-by-Word Level – How HTML5 May Assist In-Browser Solutions
10:12:42 fsasaki: there are no questions?
10:12:49 (for the speaker)
10:12:59 Monica has joined #mlwrome
10:13:24 lmatteis, q/a will be at the end of the session
10:13:37 ok
10:13:46 see http://www.multilingualweb.eu/documents/rome-workshop/rome-program - noon - 12:15 "q/a"
10:14:32 gavin: example of a movie - french movie with english subtitles
10:15:10 .. example of closed captions in HTML5 - just add a track element, use timed text markup
10:15:11 .. can be e.g. vtt or srt standards
10:15:16 .. it gives you line-by-line translation
10:15:19 does youtube support this?
10:15:21 ..
supported by youtube
10:16:24 example of english subtitles for an italian clip on wikimedia commons
10:17:19 gavin: difficult to type all the numbers of timed text
10:17:25 .. we will show later how that can be made easier
10:18:08 .. at the pisa mlw workshop we gave a presentation
10:18:12 .. on alignment of bilingual text
10:18:53 gavin: example works on the desktop
10:20:09 omstefanov has joined #mlwrome
10:20:13 gavin: using javascript, html and jquery, we can highlight the text step by step
10:21:03 gavin: semantics can spread over the whole page, not always easy to extract
10:21:10 .. somebody has put some timed text markup in
10:21:19 .. we can say "we are here in the text"
10:21:25 .. we don't make a video - we work on text
10:21:52 hrm
10:21:57 he didn't explain what this interface does
10:22:05 i can guess, but it's not very clear
10:22:21 gavin: a bit like karaoke
10:22:45 .. you can hear it in the original language and follow in your language
10:22:58 what you can?
10:23:06 there was no intro on what that interface with text was
10:23:23 demo of alignment across languages - left side italian, right side english, the highlighting of words moves with the audio
10:23:31 Clara has joined #mlwrome
10:23:50 gavin: we have a way to mark up semantic correspondence - green is direct correspondence
10:23:50 ok... well, good intuition i guess
10:24:06 .. and there are other color codes for other types of equivalence
10:24:32 .. HTML under the hood: audio element, no controls
10:24:40 ..
there are start and end time indicators
10:25:00 gavin: the markup reflects "semantic segmentation"
10:25:36 gavin: there is also an archival format that can be saved by the server, an XML format specified by the Text Encoding Initiative
10:26:02 .. we do that for historical reasons - compared to JSON it might seem archaic, but academics use it a lot
10:26:12 .. the hard task is to deal with overlapping hierarchies
10:26:34 .. it would be useful if some kind of timed text standard would address the overlap issue
10:28:37 demo continuing
10:28:43 Monica has joined #mlwrome
10:28:54 gavin: visual interface to add cue points
10:29:01 gavin: why are we doing this?
10:29:13 .. the aim is to activate poetic memory
10:30:18 .. complementary to the external memory produced by europeana - we try to get the memory inside people's heads
10:31:28 talk finished
10:31:35 thanks to gavin
10:32:03 topic: Gábor Hojtsy - Multilingual Challenges from a Tool Developer's Perspective
10:33:17 gabor: I have a decade of experience with support for multiple languages in drupal
10:33:55 .. drupal is mostly used as a CMS platform; examples of several famous sites
10:35:23 Gábor: first, the gettext .po format
10:35:30 .. we use it since it fulfils our needs
10:35:44 Gábor: location information, plural format, message texts etc.
10:35:53 .. we only use this as a transport format
10:35:58 .. very simple, small
10:36:05 ..
we don't need to deal with gender issues
10:36:17 Gábor: that's rarely an issue for drupal web sites
10:36:47 Gábor: at drupal.org we have 20 000 modules hosted
10:37:14 .. you need to have incentives for people to participate in open source efforts
10:37:47 Gábor: we found that there is a lot of overlap between drupal projects
10:37:59 .. when you use projects together you can share translations
10:38:18 .. we built a sub model for building translations for drupal
10:38:29 .. we found that encouraging micro contributions is helpful
10:38:35 .. people can submit one translation each
10:38:58 Gábor: we built a diffing tool etc., and people gain fame by contributing more
10:39:34 Gábor: configuration is a problematic area
10:39:43 .. configuration in drupal can be edited by the user
10:40:58 gabor: we use YAML for configuration
10:41:04 .. we ship these files with drupal itself
10:41:24 .. people create their own configuration - people should ship it as part of the software
10:41:37 .. breaking it down into small pieces was very beneficial
10:42:02 .. that is a kind of dual system that we needed to support
10:43:00 gabor: getting acceptance for two different translation models is difficult - it will freak out developers
10:43:10 .. you need to have one clean API
10:43:24 .. we go with the 2nd model to identify pieces in the content
10:43:36 .. for workflow support we have manual support for XLIFF
10:44:13 ..
we have vendors that build tools - an XLIFF tool, a translation management tool (tmgmt) that has support for ITS and is demoed at the showcases
10:44:57 (see more about ITS + tmgmt at http://drupal.org/sandbox/kfritsche/1908598 )
10:45:35 topic: Reinhard Schäler - Enabling the Global Conversation in Communities
10:46:27 reinhard: a confession - I'm not a developer - but "every saint has a past, and every sinner has a future" :)
10:47:13 reinhard: the rosetta foundation is about empowering language communities
10:47:45 .. two or three years ago we found that this is also about a business model
10:48:29 reinhard: you can reach 90% of customers by localizing into just 50 languages
10:48:43 .. that's not 90% of the population, but 90% of customers
10:48:55 .. story of bloomberg storyful
10:49:17 .. a startup. By looking at tweets, it tells you what affects stock prices
10:49:27 .. access to information and knowledge is crucial for money
10:49:49 .. "social localization": non-market localization
10:49:57 .. localization for world peace - all the good things you can do
10:50:23 .. with this you will reach 70% of citizens - not customers, but citizens
10:50:29 .. many organizations are active in this space
10:50:41 .. so the situation here: no business case, but lots of activity
10:51:12 .. the problem is how to connect content with volunteers who want to help
10:51:24 .. the answer we got was: why not use what we have got already?
10:51:50 ..
all the stuff we have works well in a commercial setting
10:51:57 .. but often it is closed and working in silos
10:52:20 .. with the "70% of population" scenario anything can happen
10:52:38 .. what we need is something open, configurable, standards-based
10:52:45 .. that is where XLIFF and w3c standards come in
10:53:02 .. SOLAS is a localization architecture
10:53:10 .. I will talk about the productivity part of SOLAS
10:53:30 .. david filip and dave lewis have a demo here to show the interoperability in SOLAS productivity
10:53:43 .. the focus is on impact rather than commercial value
10:54:23 ivan has joined #mlwrome
10:54:28 reinhard: we are about to announce an open source project about solas
10:54:43 .. we have just signed a related licensing agreement with the univ. of limerick
10:54:57 .. we will open source it and announce that next week at the gala conference
10:55:25 .. we see solas as the tool that will connect a massive amount of content with a massive number of people
10:55:37 .. not only to produce wealth
10:55:48 .. the project will run in trommons - translation commons
10:57:13 various solas screenshots demonstrating the functionality
10:57:39 reinhard: example of translating a wikipedia article
10:57:47 .. easy to organize with solas
10:58:05 .. you can have tasks or sub-tasks that volunteer translators can take up
10:58:39 reinhard: matching the interests of translators with specific tasks is the key
10:58:50 .. that's why it is called solas match
10:58:58 .. you find the preferred task that fits you
10:59:27 ..
we need to find bankers to support this
11:00:05 topic: q/a for developers
11:00:19 christian introducing q/a session
11:01:09 ioannis: many different platforms
11:01:23 .. for translation: google, microsoft, solas, ...
11:01:25 .. drupal
11:01:28 labra has joined #mlwrome
11:01:37 .. is there a danger of having too many platforms? Can this be unified?
11:01:49 gavin: the standards-based web is the platform of the future
11:01:58 gabor: agree that this is a problem
11:02:15 .. we had a problem in drupal too - if we build it ourselves it needs to be maintained, we need to build the features
11:02:25 .. if we cannot sustain interest it is a problem
11:02:48 .. a different solution is to have a backend that we re-use and just have our interface on top of it
11:03:00 reinhard: you need to integrate, standardize and strive for interop
11:03:02 tgraham has joined #mlwrome
11:03:08 ioannis: is this happening?
11:03:13 reinhard: in solas
11:03:30 gavin: 5 years ago the browser wouldn't have given me the technology to do what I demoed
11:03:39 ian: you heard XLIFF, CLDR, Unicode, ...
11:03:46 .. there are many standards we can work on
11:03:56 .. think of the network stack
11:04:08 .. this is similar: we are working on the multilingual stack
11:04:22 .. we have to be careful about what becomes part of the standardization
11:04:34 .. see e.g. the ITS shown at the workshop
11:04:45 olaf-michael: competition is important
11:05:23 .. as long as we have serious competition, standards are helpful
11:05:42 .. if we don't have competition, things will not develop as fast as we'd like
11:06:00 mark: at gabor - how do you manage your community
11:06:14 .. if you get 10 different translations for the same thing, for example?
11:06:21 s/gabor/reinhard and gabor/
11:06:32 AndChat496496 has joined #mlwrome
11:06:32 reinhard: we are trying to manage as little as possible
11:06:34 .. and to trust
11:06:45 .. that implies some monitoring and intervention
11:07:01 reinhard: we don't have entry barriers
11:07:09 .. people can take part and can contribute
11:07:19 .. there are good examples in the open source community
11:07:22 .. that this works
11:07:42 reinhard: there are big questions around the quality of translations
11:07:54 .. but there are no off-the-shelf answers to this
11:07:58 nwaltham has joined #mlwrome
11:08:05 .. I think twitter is trying to work with 500,000 volunteers
11:08:11 .. there is no silver bullet solution
11:08:17 .. you trust your community and work with it
11:08:57 mark: an example - using a volunteer translation
11:08:57 .. you can take the wikipedia approach and take the latest translation
11:09:18 .. my question about managing is: how do you choose the latest translation?
11:09:36 reinhard: we have proofreaders - we trust them more than editors
11:09:45 gabor: every language community in drupal has its own subsite
11:09:50 .. they manage permissions
11:10:02 .. we found out that they have very different management styles
11:10:43 gabor: I found that in the open source world you need to have incentives to start work
11:10:49 .. you need to have tasks that make people happy
11:11:00 ..
the micro-contribution approach is helpful with that
11:11:33 gabor: I just read a book about incentives for people, how to make them happy doing that - very useful
11:12:35 xyz: memorizing music - at gavin: do you have experience with that?
11:12:53 gavin: there are cantador, singers that listen to each other while singing
11:13:05 .. that active listening while doing helps with memorizing
11:13:32 xyz: students in germany have more and more problems memorizing what we did previously, so we need ways to help here
11:14:31 chaals: many people talk about XLIFF - the web world uses json
11:14:43 .. but tools use XLIFF etc. internally
11:14:54 .. how do we manage the transition from XLIFF to json etc.?
11:15:00 .. how do we know when standards change?
11:15:14 gavin: you have to be clear about what is a declaration and what is an object
11:15:24 .. a declaration is self-evident
11:15:30 .. in json you can't declare
11:15:44 gavin: we are using gettext, yaml, xliff, tools for ITS
11:15:50 .. we use different tools for different problems
11:16:05 .. we have a trained community and people using them
11:16:19 .. if we don't see a benefit in moving to new approaches, we won't move
11:16:25 jan: agree
11:16:36 .. we support many platforms, e.g. html5 windows web apps
11:17:04 jan: how do we manage exporting to something other than XLIFF?
11:17:16 .. when we look at services it becomes arbitrary - can be xliff, json etc.
11:17:28 ..
so keeping our eye on the ball of standardization becomes crucial
11:17:38 .. we have to continue those conversations
11:17:44 reinhard: XLIFF still has a long way to go
11:17:52 .. sometimes standards can remove competition
11:18:02 ... it is good to have competition in some areas
11:18:22 .. XLIFF is trying to solve a problem that is not related to competition
11:18:33 .. more uptake of XLIFF will help everybody
11:18:52 .. it reminds me of the character encoding discussion - now everybody is using unicode
11:19:06 .. such developments allow us to concentrate on more interesting problems
11:19:14 lunch break
11:20:40 laurent_oz has left #mlwrome
12:34:23 ivan has joined #mlwrome
12:43:59 lmatteis_ has joined #mlwrome
12:56:04 tgraham has joined #mlwrome
13:02:26 fsasaki has joined #mlwrome
13:03:54 topic: Román Díez González and Pedro L. Díez-Orzas - ITS2.0 Implementation Experience in HTML5 with the "Spanish Tax Agency" Site
13:04:48 roman: from the spanish tax agency
13:04:51 ..
we are partners in the use case
13:04:54 kfritsche has joined #mlwrome
13:06:59 ??: why are we here - we are contributing with linguaserve to the MLW-LT project
13:07:11 scribe: kfritsche
13:07:20 s/??:/roman:
13:10:58 Clara has joined #mlwrome
13:11:08 GordonD has joined #mlwrome
13:11:21 scribe: fsasaki
roman: requires shifting to HTML5
...: automatic adding of ITS2.0 data categories by linguaserve
pedro showing demo
pedro: you can see here the global rules
...: added localization quality
pedro: we also have a poster outside for further information
pedro: the user doesn't see anything of the translation process; the translation happens in real time
pedro: it is aimed at big companies with more than 1 million words, otherwise it's not profitable
roman: what is the advantage of ITS2.0?
...: for us it is control
...: the user has control over the data categories
13:11:48 pedro: the client loses control over the translation process in the tooling scenario
13:11:59 .. the ITS2 metadata allows the client to regain control
13:12:23 roman: giving control to the client - that is what ITS2 is good for, for us
13:13:07 .. we use various metadata items ("data categories"): translate, domain, localization note, localization quality issue, mt confidence, provenance
13:14:04 pedro: page in various languages
13:14:18 .. ITS2 metadata specifies what can be translated or not
13:14:31 .. we also have localization quality issues created by the post-editor
13:14:56 ..
we also need to develop related best practices for using ITS2, e.g. by post-editors
13:14:56 kfritsche has joined #mlwrome
13:15:44 roman: shifting to HTML5 involves various steps
13:15:57 .. first, shallow HTML5: obsolete attributes
13:16:22 .. second, automatic annotation
13:16:28 .. third, manual annotation facilities
13:17:16 .. example of domain name annotation - tagging was done by scripts
13:17:29 .. provide an editor for manual annotation
13:17:36 pedro: last 30 seconds:
13:17:43 .. next step is to conclude the use case
13:17:51 .. it is complete and functional
13:18:16 .. exploring best practices is a critical topic once the ITS2 standard itself is finished
13:18:42 .. there are other metadata items like "readiness" that are not part of ITS2 but which can be extensions to ITS2
13:18:50 .. then methodologies for post-editing
13:19:03 .. and specific tools for dealing with the metadata are needed
13:19:30 topic: Hans-Ulrich von Freyberg - Standardization for the Multilingual Web: A Driver of Business Opportunities
13:20:15 hans: MLW-LT is not only for geeks, but also for accountants
13:20:52 Clara has joined #mlwrome
13:21:01 .. about cocomore: communication and IT, largest drupal dev teams in germany and spain
13:21:47 hans: why do we engage in MLW-LT? we want to lay the foundation for a real integration of a CMS into the localization chain
13:21:57 .. our role in the MLW-LT project: contributing to the ITS2 standard
... we had two main tasks in ITS
... first, enhance drupal to use ITS2.0
... second, create a round trip for translations
13:22:10 ..
and enhancing drupal to work with ITS rules
13:22:43 .. also creating a use case to demonstrate that ITS2 creates business benefits
13:22:52 .. not only for localization but for business at large
13:23:17 .. example client vdma - German industrial association of exporting companies
13:23:29 .. export business means a multilingual challenge
13:23:42 .. today vdma has to handle 9 European languages
... VDMA, one of our customers, wants to translate into many languages
... translation needs time and has to be managed
... in our use case ITS2.0 is used
13:24:15 hans: vdma has to publish online and offline for 60 sub-sectors = domains that have to be covered
13:24:27 .. they have a central product database that also has to be multilingual
13:24:43 .. all of this has to be managed
13:25:09 .. we told vdma that ITS2 / mlw-lt metadata can help the vdma business
13:25:44 hans: implementations in drupal for ITS2:
13:25:49 .. ITS rules to be used in drupal
... we have a drupal module to add local and global ITS data categories
... developed a language management system to edit ITS data without touching content
... extended tmgmt to handle ITS and to export strings
... showing data categories in the WYSIWYG editor
13:26:01 .. a wysiwyg editor for applying ITS rules
13:26:13 .. and we implemented a translation mgmt tool
13:26:24 .. it allows viewing and editing the metadata without a CMS
13:27:50 hans: now more about the use case - annotation editor screenshot
13:28:12 .. it allows dealing with items that have been defined in ITS 2.0
13:28:36 hans: export of XML file to linguaserve
... we are allowing domains
... exporting to linguaserve
13:28:49 ..
here is how the XML arrives at linguaserve
13:29:06 .. in the format you have XML with ITS 2.0 metadata
13:29:26 .. after translation all the information returns to Cocomore, into the CMS for review
13:29:46 hans: two more interfaces in the CMS
13:30:18 hans: translation process overview
13:30:23 .. interface of the language mgmt tool
13:30:40 .. tool to view and edit the metadata, the jquery plugin
... reintegrated content can be reviewed
... translation process overview to see the status of the translation for each element
... time is money
... difference between manual and automatic processing of a translation
13:32:06 hans: options for saving time
13:32:19 .. translation process steps that have to be done from the client point of view
13:33:21 .. from the client and LSP point of view we analysed the process of an LSP
13:33:24 .. e.g. receiving, storing data etc.
13:33:39 .. processing annotation information etc.
13:34:03 .. a lot of time saved in all process steps
... 84% of time is saved
... we also have this from the LSP point of view
... 83% of time is saved
... standardization saves money
... it reduced the overhead, not the translation time
13:34:26 .. conclusion: the saving from standardization combined with automatic annotation and round tripping was 80%
13:34:41 .. we also reduced the timeline
13:35:22 kfritsche has joined #mlwrome
13:35:41 topic: Brian Teeman - joomla: Building Multilingual Web Sites with Joomla!
the Leading Open Source CMS
13:36:37 s/bryan/brian/
13:42:56 ok, thank you very much
14:13:45 omstefanov has joined #mlwrome
14:22:04 leroy has joined #mlwrome
15:10:15 fsasaki has joined #mlwrome
Brian: language matters
... available in 65+ translations
... some created by teams, some by single persons
... en-gb is the default, but not required
... monolingual works
... multilingual, also no problem
... tag, translate and associate content
... joomla 3 is mobile ready: one site, all sizes
... you can check out the multilingual joomla sites juv.nu and hotelreoyalsavoymadeira.com
... joomla is easy to extend
... community translation support to translate your own customizations
... developers can use opentranslator.org
... enhance your site with over 6000 extras
... learnjoomla3.com is english only
... ayudajoomla3.com would be better for spanish content
... example 2 - translation management system (module called josetta)
... overview of translated content
... see translation and original side by side
... here for questions on both days
topic: Vivien Petras - The Europeana Use Case - Multilingual and Semantic Interoperability in Cultural Heritage Information Systems
vivien: we want everything from cultural heritage
... many different languages; english isn't even in the top 5
... users use their own language
... many collections come from germany, therefore many meta tags are in german
... even if there are many pictures, most people only look at items with meta tags in their native language
... search is not multilingual-aware: same results for all languages
... music and images have no language, therefore most objects are in the "Multilingual" language
... because most users only look at items in their native languages, most of the items may never get clicked
... users want to search in their own languages
... we could use semantic enrichment to improve multilingual search
... but this led to other problems
... new enrichment plan to link to contextual vocabularies from providers
... Europeana is now Open Data; we provide a SPARQL endpoint and RDF download
topic: Inna Nickel, Christian Lieske and Daniel Naber - tool-supported linguistic quality in web-related multilingual contexts
christian: linguistic quality is highly scenario-dependent and never the same
... requirements are very different in style, voice and terminology
... integration into openoffice
... sample: the single word "Link" - what is the meaning of this?
... NLP is about voice control, machine translation and linguistic quality checks
... tooling to support high quality translation is
Inna: open source tool for SLQ - LanguageTool supports ITS
... LanguageTool can be used standalone, embedded in java, or through OKAPI over HTTP
... firefox plugin to produce a quality report directly in the browser
... implemented a Russian data rule set from an enterprise scenario
... limitations are the processing of LanguageTool and the complexity of enterprise data
... LanguageTool can also check whether a homepage is simple to read, which is important for better accessibility
Q/A: Creators
???: does joomla support language tools?
brian: no, does drupal?
gabor: we are building a framework, which would make it easy to use any
felix: a jquery plugin would be CMS-independent
felix: when will joomla support ITS or XLIFF?
brian: when someone implements it - we have no company behind it - when there is a use for it
hans: joomla users are mostly private persons and small companies; drupal has high-level enterprise users, who could make more use of ITS
brian: interferences between the jQuery plugin and TMGMT?
hans: no problems between the jQuery plugin and TMGMT, as they address different areas
15:10:39 topic: bryan schnabel - Making the Multilingual Web Work: When Open Standards Meet CMS
15:14:08 scribe: fsasaki
15:15:10 bryan: the multilingual web needs multilingual content
15:15:20 .. I focus on the drupal CMS and XLIFF extensions to it
15:15:55 .. drupal has an out-of-the-box solution - the translation core module
15:16:05 .. "translator logs into drupal"
15:16:14 .. they click the "translate" tab
15:16:21 .. choose the language of translation
15:16:33 .. add a new translation etc.
15:17:34 .. a unique name .. boom!
15:17:44 .. pros of the out-of-the-box solution: easy to use
15:17:59 .. cons: translator can't leverage translation memory
15:18:10 .. needs access to the drupal cms
15:18:13 .. and he can do harm
15:18:23 .. so the idea is to leverage drupal with XLIFF
15:18:58 bryan: in the xliff scenario, the roles are like this:
15:19:07 .. drupal admin selects the node types to translate
15:19:12 .. he exports to xliff
15:19:17 ..
saves it to the hard drive
15:19:36 .. sometimes post-processing is needed
15:20:05 .. to avoid change of text is set
15:20:14 .. the XLIFF is then sent to the LSP
15:20:29 .. so the translator doesn't get drupal nodes, but XLIFF
15:20:39 .. the translator can use TM, their own translation tool etc.
15:20:57 .. I got what I wanted:
15:21:06 .. the translation is as I wanted
15:21:38 bryan: some strings are not part of the xliff model
15:21:41 .. e.g. UI strings
15:22:17 .. the drupal admin can export these as PO files
15:22:36 .. and there is a workflow: PO > XLIFF, translation > PO, then back into drupal
15:22:53 bryan: advantage of the approach:
15:23:08 .. better for the translator
15:23:19 .. now about CCMS (component content management system)
15:23:32 .. a CCMS uses topic-based authoring
15:23:42 ... DITA lets you have topics, maps, images etc.
15:24:05 bryan: we could be talking about thousands or millions of files
15:24:53 .. now the workflow in a CCMS
15:25:18 .. the DITA-aware system knows about the translation-related properties of content
15:25:49 .. coming back to the two scenarios: in the first scenario, we put everything into a ZIP file and the LSP has to deal with this
15:26:06 .. the disadvantage is that they have to deal with millions of topics to translate
15:26:43 bryan: now augmenting the process by using XLIFF
15:26:51 .. see the XLIFF DITA Open Toolkit plugin
15:27:20 .. integration with trisoft / visual studio
15:27:44 ..
now with the new workflow: 1 xliff file, not millions of DITA files, is sent to the LSP
15:27:56 .. advantage: the LSP doesn't have to know DITA
15:28:18 bryan: disadvantage: it is complex
15:28:40 s/complex/complex (scribe missed)
15:28:58 topic: sinclair morgan - How do you publish one thousand web pages, in 12 languages, at a high quality, 50% quicker than you can today?
15:30:52 sinclair: I want to emphasize how machine translation can help to manage a high volume of content in a high quality translation process
15:33:05 sinclair: three scenarios - conventional translation from scratch, with TM, and with MT + post-editing
15:33:33 .. scenario 1): 2500 words per day per translator. Total costs = 86.000 Euro
15:33:47 .. scenario 2): 42.500 Euro
15:34:13 .. assuming 50% leverage
15:34:29 leroy has joined #mlwrome
15:34:37 .. scenario 3): trained MT + PE: many variables are important
15:35:03 .. e.g. MT output quality, language direction, quality of the source content, the training content, translation environment etc.
15:35:16 .. snapshot of recent term projects:
15:35:33 ... very dependent on projects and content
15:35:49 .. average improvement of 43%
15:36:02 .. so that's 30.000 Euro
15:36:24 s/improvemen tof/improvement of/
15:36:32 .. in scenario 3) we use both MT and TM
15:36:42 .. we have a team of post-editors, not translators
15:37:03 .. so we had 40 days in 1), 30 days in 2), 14 days in 3)
15:37:09 .. and the above cost savings
15:37:33 sinclair: what is needed for this performance?
15:37:40 .. you need an MT system with high quality
15:37:46 ..
a set of baseline languages
15:37:56 .. you need a large amount of data to build trained engines
15:38:06 .. you need a system that is easy to administer and secure
15:38:15 .. MT adaptation is important for specific applications
15:38:31 .. you need to be able to integrate the MT system into your environment
15:39:08 sinclair: comparison of conventional translation versus baseline vs. trained MT systems
15:39:19 .. the trained MT system scores much better than a baseline system
15:40:10 sinclair: productivity increase is important
15:40:30 .. integration with the translation environment
15:41:38 .. easy-to-use interface, TM and automated workflow, support functions (terminology, spell checker, QA tools, reporting)
15:42:02 .. efficient integration means: retain all the benefits that we have in an MT environment, and add the savings from MT
15:43:09 bryan: example of baseline + trained MT engine - quality of the trained MT engine is clearly better
15:43:28 .. human resources: need MT developers, MT linguists, post-editors
15:43:38 .. post-editing is a professional skill
15:44:14 sinclair: how to guarantee high quality: use the same QA checks as for human translation
15:44:27 .. include a linguistic review of the post-editing work
15:44:40 .. use the same QA standard as for conventional translations
15:44:53 .. review the gains of the process
15:45:47 need a language technology infrastructure platform
15:45:55 .. productivity, quality, automated workflow
15:46:08 ..
SMT, baselines, fast and cost-efficient training
15:46:12 .. easy-to-deploy MT
15:46:24 .. and underpinning that with human resources
15:46:43 s/need/sinclair: summary: need
15:47:02 s/productiiy/productivity/
15:47:10 s/quliaty/quality/
15:47:23 s/automat worfklw/automated workflow/
15:47:43 s/csot efficient rainign/cost efficient training/
15:47:53 s/underpinnoing/underpinning/
15:48:54 s/productivity/including - productivity/
15:49:13 topic: Charles McCathie Nevile - Localization in a Big Company: The Hard Bits
15:51:01 chaals: introducing yandex - major russian search engine
15:51:10 .. providing all kinds of services running in russian
15:51:18 .. the company's DNA comes from language technology tools
15:51:28 .. the company was started based on russian morphology analysis
15:51:37 .. they applied these to search
15:51:49 .. originally this was just for russian
15:52:52 chaals: our search results page has 1-2 results for each page, for a particular user in a given time + place
15:53:20 .. we try and give you local results, but we try not to do too much of that
15:53:41 chaals: we try not to personalize too much, people don't like that
15:54:58 ..
linguistic processing of russian: gender issues, case issues
15:56:28 chaals: some services are localized, focus is still on russia
15:56:52 example of yandex homepage
15:58:31 home pages for Russia and Kazakhstan
15:58:47 chaals: BEM (block-element-modifier)
15:58:59 .. open source library that lets front-end developers put together a new page
15:59:31 .. yandex uses it for user settings, e.g. flags per language
15:59:44 .. we are using flags and ISO language codes
16:00:32 chaals: different content in different languages, different design aesthetics
16:02:13 chaals: every company has an internal bias
16:02:26 .. language technology is yandex's DNA
16:02:51 .. we are building on top of what has been developed so far
16:02:59 .. standards are important, we recently joined w3c
16:03:18 .. our developers speak and read and write .. in russian
16:03:26 .. that is something the company has to deal with
16:03:33 ..
thank you for your attention
16:07:38 topic: hans uszkoreit - Quality Translation: Addressing the Next Barrier to Multilingual Communication on the Internet
16:08:06 had to pass Manish Kanwal's presentation, due to connectivity problems
16:08:13 s/pass/pass by/
16:08:45 hans: other speakers are from business and talking about success stories
16:09:06 .. in business, if you are dissatisfied too long you are out of business
16:09:16 .. in research, if you are satisfied too long you are out of business
16:09:40 .. MT success stories: free online MT systems, in-house online MT systems
16:11:20 fsasaki_ has joined #mlwrome
16:11:32 scribe: fsasaki_
16:11:42 hans: there are MT success stories
16:12:17 .. closely related languages work well, but other languages and specialized languages don't
16:12:31 .. MT research has concentrated on high volumes
16:12:38 .. needed for inbound translation etc.
16:13:20 hans: there is a lack of translation quality for outbound translation
16:13:27 .. let's take a new approach:
16:13:45 .. separation into good enough, almost good enough and not usable for outbound
16:13:48 scribe: fsasaki
16:14:15 hans: good can be 5-75%, then 15-65, then 5-75
16:14:57 .. increases in BLEU score: currently they are gained in the "red" part, that is "not usable"
16:15:19 ..
the new approach tries to recognize truly good, high quality output through quality estimation
16:15:53 .. in the middle is computer-assisted translation - how to move the "almost good" into the "good" area
16:16:09 .. many projects now help to move things to the left, e.g. with respect to post-editing
16:16:49 hans: we tried to push the topic with various instruments
16:17:19 .. one is META-NET, a network of excellence that created a vision process in line with the preparation of the EU Horizon 2020 program
16:17:54 .. we focused on 31 languages in Europe
16:18:18 .. see http://www.meta-net.eu/whitepapers
16:18:42 .. then there is an infrastructure for sharing resources, see http://www.meta-net.eu/meta-share/
16:19:03 .. and there is a strategic research agenda, see http://www.meta-net.eu/sra
16:19:42 .. the SRA describes the needs of the industry, the predictions, mega-trends, research priorities
16:19:57 hans: strategic considerations:
16:20:28 .. we said we would concentrate on some areas which have a high chance of being successful for Europe
16:20:37 .. three research topics for the SRA
16:20:58 .. translingual cloud, social intelligence, socially aware interactive assistants
16:21:06 .. all topics are highly interconnected
16:21:46 hans: a lot of technology is used by several groups at the same time, e.g. a text parser
16:22:01 .. about the translingual cloud: it can be a method of generic and special-purpose checking
16:22:16 .. automatic translation, language checking, post-editing etc.
16:22:29 ..
systematic concentration on quality barriers
16:23:24 .. ingredients: semantic translation paradigm
16:23:45 .. exploitation of strong monolingual natural language analysis and generation
16:24:06 .. and modular combination of specialized analysis
16:24:45 .. european service platform: in addition to the strategic research agenda, there is a proposal for a platform for services not restricted to translation
16:25:09 .. you can hook into services that are not yet multilingual
16:25:25 .. finally, about a project that is a pilot to prepare something: QTLaunchPad
16:25:42 .. assemble data and tools
16:25:49 .. create shared quality metrics
16:26:03 .. has been demoed and is now being finished
16:26:15 .. then extending existing platforms for sharing
16:26:51 .. the consortium comprises DFKI, CNGL DCU, ILSP athena and Univ. of Sheffield
16:26:56 .. with GALA as a subcontractor
16:27:09 .. QTLP planning panel:
16:27:21 .. has many names from the European translation industry
16:27:31 .. the important bit is semantics-based translation
16:27:48 .. that was already a vision of Warren Weaver in 1949
16:28:14 .. now in statistical MT, it is clear that we have to go deeper into semantics
16:28:23 .. this can be a topic for people working on the semantic web
16:28:33 .. if you look into the work on the semantic web
16:28:43 .. there is stuff that can be used for semantics-based MT
16:29:10 topic: localizers Q/A
16:31:03 questions from Paula Shannon
16:31:08 scribe missed
16:31:21 sinclair: cost calculation is still an issue
16:31:34 ..
if MT is provided as a service, there is no additional cost 16:31:48 hans: my answer to my question 16:32:01 .. in spirit, what I proposed is close to "knowledge-based translation" 16:32:16 .. but it is different: the approach to semantics has changed completely 16:32:39 .. in the past the idea was that people sit together and build enhanced semantic models 16:32:54 .. but now we are getting to applications based also on crowdsourced applications 16:33:13 .. now people find huge collections that can be used as interlingua 16:33:23 .. so it will come, but in a totally different dress 16:34:30 various questions from Tomas 16:35:57 bryan: the zip file approach - we never did it that way 16:36:45 dan: specialized research looking for language universals found only 6-7 universals 16:36:52 .. this is not enough for an MT system 16:37:00 hans: a very good point 16:37:08 .. highly specialized systems in speech and MT 16:37:18 .. the systems do not perform better than the generic ones 16:37:29 .. but that doesn't mean that this has to stay so 16:37:37 .. it only shows that we are doing things wrong 16:37:57 hans: this just shows that we don't get various models right 16:38:12 .. if you talk to developers of big systems 16:38:23 .. right now, because of the huge amount of data 16:38:34 .. the generic system is still ahead 16:38:49 .. but I would bet quite a bit of money that the specialized systems will be better 16:39:05 hans: that is why we leave the space for many companies 16:39:11 .. that Google and Bing will never enter 16:39:21 chaals: not so sure 16:39:44 .. Yandex has no reason to go away from any area of translation 16:40:01 .. 
and not to augment our tools with tweaks that will make an improvement 16:40:32 hans: this is not a technical question 16:40:42 .. of course a generic system can do what the specialized system does 16:41:08 .. but it is a question of whether the big players, when it comes to semantics, will have all the knowledge of every subfield 16:41:15 .. and whether they will share it 16:41:56 .. if in this more semantic age it is possible to have more models that are not shared, then there is a possibility that the business of the smaller companies is successful 16:43:18 gavin: missed question 16:47:11 wrapping up day one