07:07:42 RRSAgent has joined #mlw
07:07:42 logging to http://www.w3.org/2011/04/05-mlw-irc
07:07:52 meeting: MLW Pisa Workshop, day 2
07:07:54 jan has joined #mlw
07:07:55 chair: Richard
07:08:00 hm, i think charles was supposed to be scribing?
07:08:00 scribe: various
07:08:21 topic: Presentation from Dave Lewis et al.
07:08:26 I have made the request to generate http://www.w3.org/2011/04/05-mlw-minutes.html fsasaki
07:08:43 Dave: Presenting on CNGL research
07:09:59 Dave: multilingual IR, real time social media translation etc. are all part of the aim to support the global customer
07:10:26 lbellido has joined #mlw
07:12:44 Dave: web services - benefits for localisation like "pay as you use" models, easy deployment, ....
07:16:23 Dave: industry survey shows barriers to adoption of technology
07:17:50 Jirka has joined #mlw
07:18:13 dave: web services interoperability - needs to be very careful in profiling
07:18:55 I can jump in - although getting someone from the floor would help
07:19:33 yes
07:19:34 r12a has joined #mlw
07:19:46 Dave: proposing to apply semantic web technology to the MT use case
07:19:53 cool, thanks
07:20:08 dave: semantic web may help to solve the problems we are looking at
07:20:25 .. sw is a good mechanism to leverage other things
07:20:58 .. tools are maturing
07:21:10 .. we are interested in a small part of the sw stack, that is RDF
07:22:04 luke has joined #mlw
07:22:06 .. RDF is a triple language, everything gets a URI and can be referenced, RDF schema provides some basic modeling methods
07:22:18 Dave compares RDF to relational databases
07:23:32 until now, fsasaki was
07:24:11 dave: RDF provides classes, properties, ...
07:24:29 ..
including multiple inheritance, allows combinations in an interesting way
07:25:36 .. semantic web does not necessarily involve standardization, people just create a vocabulary
07:25:54 .. if it is taken up, good - a "survival of the fittest" approach
07:26:48 .. existing data can be annotated with RDF - for Web services there is WASDL
07:27:05 s/for Web/e.g. for Web/
07:28:11 dave: developed a seed taxonomy for next generation localisation (NGL) content
07:28:45 .. working with many researchers in CNGL to see whether the taxonomy fits their needs, otherwise it is changed
07:29:29 s/until now, fsasaki was//
07:29:44 s/cool, thanks//
07:30:09 dave: have a model refinement cycle for this
07:30:51 .. fine-grained roundtrips involving customer, content developer, LSP, translators
07:31:00 .. looking into doing this with RDF
07:31:22 .. "linked open data" - not focusing so much on reasoning, but on seeing how to publish the data you have
07:31:54 dave: triple stores are becoming robust, starting to scale
07:32:22 s/hm, i think charles was supposed to be scribing//
07:32:38 s/I can jump in - although getting someone from the floor would help//
07:32:59 dave: important vocabulary from LOD: open provenance vocabulary
07:33:11 .. helpfrul for author, segment and source QA
07:33:19 s/helpfrul/helpful/
07:34:08 dave: next steps:
07:34:41 .. revise semantic model, semantic sandpit, content markup via RDFa, not standardising semantics, testing semantic technology
07:34:54 .. access control, etc.
07:35:18 dave: real power of SW is its extensibility
07:35:30 .. semantic annotations can help to improve interoperability
07:35:41 .. provenance linked data can help with roundtripping
07:36:00 ..
will gather a lot of quality metadata about the content we are localising
07:36:14 .. that might be helpful for training statistical MT
07:36:49 topic: presentation from alexandra weisgerber
07:38:05 Agenda: http://www.multilingualweb.eu/documents/pisa-workshop/program
07:38:10 alexandra: introducing swinng project, part of the software cluster
07:38:21 .. central principle: emergence
07:38:45 .. emergent software: enables combination of components and services for digital comparison
07:39:02 .. components can come from ERP, BMP, BPI, the Web, ...
07:41:30 alexandra: agility to better account for reducing waste, empowering the team and the employee, ...
07:44:00 .. challenges: find a balance for the right amount of documentation
07:44:37 .. had experiences with writing larger user concepts or user concepts on the white board
07:44:44 s|s/hm, i think charles was supposed to be scribing//||
07:45:13 s/hm, i think charles was supposed to be scribing?//
07:45:46 rrsagent, make minutes
07:46:23 alexandra: actions and research areas: include a technical writer in maximum 2 SCRUM teams
07:47:59 .. want to set up controlling to measure software quality and time to market
07:48:44 .. difficult task, software quality is hard to measure
07:50:35 topic: presentation from Andrejs Vasiljevs
07:50:58 Andrejs: talking about challenges for smaller languages
07:51:25 .. tools should be provided to help to bridge language barriers esp. for these languages
07:51:59 .. unesco is working on code of ethics, including demand to represent all linguistic groups in cyberspace
07:52:49 ..
alvin toffler: "survival of smaller languages depends on outcome of MT versus proliferation of larger languages"
07:53:12 Andrejs: Tilde is doing both language technology and localization services
07:53:26 .. we can see real needs of users and test new approaches
07:54:11 lbellido has joined #mlw
07:55:24 Andrejs: MT at Tilde: first rule-based, switching to data-driven methods in 2008, heavy participation in EU R&D
07:55:31 .. about MT development
07:55:51 .. not only research, but bring results into tools we provide
07:56:03 .. MT, dictionaries widely used in the country
07:56:22 .. work with MS research to improve MT engine for our language
07:56:56 .. problem of data driven MT: translation quality is low for under-resourced languages
07:58:29 .. other challenge is customization: mass-market, online MT-systems are general
07:58:38 .. performance is poor for specific domains
07:59:30 .. open source tools like GIZA++ or Moses are hard to use for the ordinary user, too complex
07:59:33 ChriLi has joined #mlw
08:00:02 .. strategies to help: see "LetsMT!" project
08:00:17 .. building a platform to gather public and user-provided MT training data
08:00:32 .. increasing quality, scope and language coverage for MT
08:01:24 .. area is "machine translation for the multilingual web"
08:01:34 ChriLi has left #mlw
08:01:54 .. user survey about IPR of text resources
08:02:04 .. there is willingness to share data
08:02:23 s/willingness/some willingness/
08:03:26 Andrejs: another project "Accurat"
08:03:40 .. non-parallel bi- or multilingual text resources
08:03:48 .. e.g. multilingual news feeds
08:04:01 .. wikipedia articles, multilingual web sites, ...
08:04:20 .. these show scale of comparability
08:04:40 .. we calculate the comparability
08:05:03 .. develop comparability metrics
08:05:26 .. develop methods for automatic acquisition of parallel texts
08:05:50 ..
consortium has both research institutions and SMEs
08:06:54 .. tagging MT translated texts would be very helpful
08:07:07 .. to be able to distinguish MT translated texts from human translated text
08:07:22 .. common interfaces for MT engines would facilitate interoperability
08:07:31 .. standardization / BP are needed
08:08:06 topic: presentation from Boštjan Pajntar
08:08:40 Boštjan: about collecting aligned textual corpora from the hidden web
08:09:32 .. aligned parallel corpus: a text alongside its translation(s)
08:09:42 omstefanov has joined #mlw
08:09:49 .. usage: translation memory, training MT systems, many NLP scenarios
08:10:10 .. looked at standards, decided to go for TMX
08:10:31 .. XLIFF is in my list in the last bullet point, in brackets
08:10:41 .. so XLIFF needs more marketing & development
08:11:16 .. getting data: non-english professional web sites
08:11:26 .. huge amount of translated text
08:11:37 .. in general quality translations
08:12:19 Boštjan: problems:
08:12:28 .. translation memory is hard to get
08:12:35 .. data should have high precision
08:14:32 .. no standard fully supports automatic harnessing or cleaning of data
08:15:00 .. proposed solution: crawl from the web
08:15:32 .. > database > list of HTML candidates > list of text candidates > paralell corpora
08:15:55 .. see http://kameleon.ijs.si/t4m for more info
08:16:59 http://kameleon.ijs.si/t4me/
08:18:03 s/t4m/t4m\//
08:18:58 s/paralell/parallel/
08:19:20 Boštjan: we used TMX - is it the right choice?
08:19:25 .. source language must be defined
08:19:59 .. no need for me to do that, I just have parallel texts for machine consumption
08:20:14 ..
would need an optional parameter to define the source for each segment
08:20:39 joerg has joined #mlw
08:20:50 .. when you develop a standard, think also about "machines" as users, not only people
08:21:16 .. future work: optimization in the areas of two phrase crawling, character encoding, enhanced candidate extraction
08:21:27 To answer Bostjan's question about "how many errors are acceptable", the answer (frustratingly for him, I'm sure) is "it depends": Is the text a guide for system administrators or the company homepage? Also: what are the types of errors (people can usually understand text with some grammatical errors, but if the key nouns/verbs are incorrect, it could be confusing/embarrassing).
08:22:23 Boštjan: web service for TM memory distribution and filtering (web 2.0 style)
08:22:44 topic: presentation from Gavin Brelstaff
08:24:47 gavin: interactive alignment of parallel texts
08:25:43 .. world wide web: need to think globally, but also locally, e.g. in terms of minority languages
08:26:13 .. "a seed-bed for poetic expression, beyond mere communication"
08:28:30 .. cultural context is important, see R. Jakobson
08:29:08 .. there is an osmosis between minority languages and global languages
08:29:11 .. everybody becomes a 2nd language speaker
08:30:51 Gavin: parallel text alignment <> to communicate semantics
08:31:16 .. we have standards-based markup, web delivery cross-browser, non-verbal interactivity ...
08:31:35 .. statistical MT will not translate poetry in the next 20-50 years
08:32:47 ..
we developed a parallel text alignment web interface
08:34:22 demo of interactive text alignment
08:36:07 .. standards that have been used for the demo: TEI (XML-based) structure
08:36:27 .. presented as XHTML, with CSS, JavaScript
08:36:46 .. semantics is not RDF, but the TEI structure
08:38:09 gavin: beauty of Unicode - one can put multilingual information directly into the content
08:38:29 .. pros: we can interact directly with semantics
08:38:46 .. w3crange does not work in browsers
08:38:58 .. TEI P5 must be subsetted
08:39:03 s/pros/pros and cons/
08:39:19 .. CSS selection helps with jQuery
08:40:02 .. some browser issues, does not work everywhere
08:40:10 topic: q/a session
08:42:29 question about semantic web for MT training
08:42:40 dave: have thought about that
08:42:48 .. e.g. linking to terminology databases
08:43:17 .. looking into lexical markup, there was a presentation at the last mlw workshop about this
08:43:30 .. hot topic in MT; linguistically informed MT
08:46:05 discussion about legal issues with gathering corpora via the Web - is it legal at all?
08:46:25 Boštjan: lawyers will work on finding that out
08:46:59 Alexandra: all languages in our project need to be finished
08:47:30 .. depending on the language it is difficult or easier
08:47:58 christian: funny to see the same questions, I had the remark on IP too, let's see where this goes
08:48:23 christian lieske: everyone mentioned that categorisation of what we find on the web would help with machine analysis
08:48:23 .. not a question, but a remark: all of you mentioned that categorization of what we find on the Web would be helpful for reliable machine analysis
08:48:40 ... some communities have a detailed approach to this
08:49:08 ...
look at last year's w3c day in berlin and you'll see how work on digital libraries may fit well with machine translation
08:49:26 (above is presentation from Günther Neher)
08:50:36 ??: often pages with the same url that are translated do not have exactly the same structure
08:51:06 (see, in German, http://www.xinnovations.de/downloads-2010.html?file=tl_files/xinnovations.2010/Download/W3C-Tag/Prof.%20Dr.%20Guenther%20Neher.pdf)
08:51:18 bostjan: we have done little testing so far - about 7000 translations - and it worked well
08:52:13 ... our preliminary experiments show that it still works very well, even if there isn't the same content on both sides of parallel text
08:52:48 andrejs: see the FP7 project that is looking at how to extract comparable corpora
08:55:15 s pemberton: i'm impressed by willingness to translate poetry - i'm performing in an opera and it took me a while to understand some allusions and references (gives examples)
08:55:25 ... i'm amazed that you hope ever to do this
08:55:49 gavinB: our approach is to find the interface - to see how far machines can go
08:56:40 ... it is possible to do a translation based on bare bones - even humans can get things wrong...
08:57:04 jorgS: if you have conceptual mismatches, how do you resolve them?
08:57:39 gavinB: this is where the human translator accepts that they need to go away and study it - in our system we mark it up in red
08:57:49 ... the translation will never be exact
08:58:15 jorgS: for dave, what do you think of the next generation of content generation based on RDF?
08:59:08 dave: there's still a gap between computational linguists and semantic web folks - there are people looking at how to apply these things, and there are proposals out there
08:59:23 ... we're looking at how to integrate those approaches into what we do
08:59:41 jorgB: i'm looking forward to multilingual text generation
09:00:15 lukeS: i was intrigued by gavin's presentation
09:00:38 ...
seems the best you can do wrt translation is to come up with a separate poem that has the same feel
09:01:11 ... but this may be a useful tool for understanding the original material better
09:01:28 Jirka has left #mlw
09:01:28 ... there may be implications for other translation approaches
09:02:43 christianL: i understand the remarks about translating poems with machines - but to me Gavin's talk was about an annotation mechanism based on standards
09:03:02 ... there is a need for this approach, and gavin's presentation was inspirational
09:03:47 ... more and more accurate annotations are needed, but there are other aspects to translation and gavin's presentation pointed to many useful aspects of this
09:05:46 chaals has joined #mlw
09:34:03 tadej has joined #mlw
09:34:51 topic: Users session, Paula Shannon, Social Media is Global. Now What?
09:35:04 rrsagent, draft minutes
09:35:35 Steven has joined #mlw
09:37:31 rrsagent, here?
09:37:31 See http://www.w3.org/2011/04/05-mlw-irc#T09-37-31
09:37:40 paula: introducing how social media is changing localization
09:38:36 ... showing video on social media
09:40:04 fsasaki has joined #mlw
09:40:56 ... video emphasizing rapid growth and scale of various SNs, describing the relationship of the new generation towards social media
09:42:16 ... video focusing on effect of social media on advertising, enabling higher ROI for marketing
09:42:39 ... introducing the term "socialnomics"
09:43:35 One mistake in the video - it conflated Internet and Web, so the time to 50M users was for the web, not for the internet
09:44:04 lbellido has joined #mlw
09:44:53 paula: describing the notion of reputation control via media - the talk will be about showing how this does not hold in the presence of social media
09:45:51 ...
analogy with toddlers as example of parents not being in control
09:47:17 ... in social media, the user is in the middle of the system and his worldview actually defines his experience
09:47:38 omstefanov has joined #mlw
09:48:17 ... emphasizing other social networks than facebook, e.g. hi5, orkut - a reason for their success was the fact that they were localized
09:49:29 ... talking about surveys on social media and lionbridge involvement - how people are using social media multilingually
09:50:35 ... companies using social media: a quarter of companies are using all 4 platforms - european and especially asian businesses are growing much
09:50:51 ... faster than u.s. companies, likely due to legal issues
09:52:07 ... twitter is increasingly popular, fastest growth
09:53:26 ... 60% of tweets are non-english, but twitter is localized in only 7 languages
09:53:39 rrsagent, draft minutes
09:54:17 paula: companies engage in hyper-local strategies, twitter account-per-region
09:54:36 joerg has joined #mlw
09:55:41 paula: twitter brought new metric: TPS - tweets per second
09:56:00 scribe: tadej
09:56:58 paula: smartphones becoming the relevant computing platform
09:57:06 rrsagent, draft minutes
09:58:13 paula: why are companies engaging?
because SM allows them to really interact with the users
09:58:51 scribe: tadej, r12a, fsasaki
09:59:40 paula: strategies of social media: 1) single centralized controlled SM outlet
10:00:14 scribes: fsasaki,r12a,tadej
10:00:29 Steven has joined #mlw
10:00:31 rrsagent, draft minutes
10:01:10 paula: 2) decentralized local pages - more effective, but users have more control
10:01:35 s/scribe: various/scribe: fsasaki/
10:01:41 rrsagent, make minutes
10:03:15 i/christian lieske: everyone mentioned that categorisation/scribe: r12a
10:03:21 rrsagent, make minutes
10:03:24 paula: it is still a huge opportunity - example: coca-cola has 250 people who are tasked with buying keywords
10:04:13 i/paula: introducing how social media/scribe: tadej
10:04:20 rrsagent, make minutes
10:04:45 ... important assertions: it is happening quickly, it's huge and growing. instantly available content has more value than quality content
10:06:19 paula: the real-time aspect also affects localization processes - when localizing a message, the process might take too much time
10:07:06 ... real-time multilingual communication does not leave space for pre- and post- editing, leaving a lot of human intervention out
10:08:10 paula: last assertion: machine translation is becoming increasingly relevant for SM outlets
10:09:17 topic: Presentation of Maarten de Rijke - Emotions, experiences and the social media
10:11:41 maarten: intro - academics are not concerned with standards per se, but with trying to get things done
10:12:19 ...
talk will be about standards supporting intelligent information access of content
10:13:38 ... in social media, people still do the same things, but online instead of offline
10:15:23 ... presenting concrete project of a political mashup
10:16:17 ... gather political social media content, debates, analyze and semantify it. political scientists are interested in tracking topic ownership
10:17:04 ... traditionally, this research was conducted via clasisic clipping, now via social media.
10:17:19 ... however, data gathered this way is increasingly multilingual
10:18:22 ... another project, CoSyne, about cross-completing wikipedia pages using different language articles on the same topic
10:18:30 s/clasisic/classic
10:19:40 ... third example: The Mood of the Web - Livejournal has mood annotated blogs, serving as a stream of mood-annotated data
10:21:59 ... when following mood patterns across time, you can try to interpret them, for instance "shocked", "tired"
10:23:49 ... what would explain a huge spike in "shocked" in 2008? by combining livejournal streams with news and counting word usage statistics, it turns out that it was the death of actor heath ledger.
10:26:17 ... showing a time series on stress measurements, showing a spike at the end of the year - that sort of analysis requires a lot of technology for text processing and information extraction
10:27:57 ... introducing Fietstas, a multilingual en/nl text processing engine as infrastructure for what was presented
10:29:32 topic: presentation from Gustavo Lucardi - Nascent Best Practices of Multilingual SEO
10:29:56 rrsagent, draft minutes
10:30:35 gustavo: comparing the SEO process with preparing a gourmet meal
10:31:05 ... posing the question, "what are the right ingredients for multilingual SEO?"
10:32:36 ... high search engine positioning is very important, holding potential for high revenue
10:33:11 ...
introducing terms: SEO, MSEO, SMO, Social SEO as different strategies in the field
10:33:55 an important distinction is that whereas in SEO traffic comes from search engines, in SMO traffic comes from social media
10:34:35 gustavo: for example, 500 tweets have more effect than 500 incoming links
10:35:15 ... however, SEO still has higher ROI than SMO
10:36:42 ... an important concept in SEO is the long tail effect in certain business models
10:38:02 ... just translating keywords does bring traffic, but has low conversion rates
10:39:31 ... for effective multilingual, international SEO, he recommends the W3C Language Standards as basic rules
10:40:33 ... SEO can be multilingual, international or geographical; these are not mutually exclusive.
10:41:24 gustavo: what did we learn doing it:
10:41:26 ... 1) focus on the long tail and niche market
10:41:33 ... 2) conversions, not traffic
10:41:44 3) things change, iterate
10:43:32 gustavo: showing examples - a legal company campaign was successful once they used correct glossary translations
10:44:11 ... healthcare insurance campaign was better once they regionalized their content
10:44:53 ... hotel chain: 12 languages, necessary to cover all
10:45:18 topic: presentation from Chiara Pacella - Controlled and uncontrolled environments in social networking websites and linguistic rules for multilingual websites
10:47:00 chiara: the talk will be around control of content and the implications of having controlled vs. uncontrolled content
10:48:16 ... controlled environment - the user does not have influence, the content is relatively static
10:49:18 ... in a controlled environment, the developers work with strings with sentences, which are then combined
10:50:02 ... in an uncontrolled component, the content is very dynamic, developers have limited control - they combine it with the controlled component before outputting
10:51:21 ...
even in a single sentence, there may be a combination of controlled and uncontrolled strings
10:51:59 ... in the translator's view, the content is treated as token variables
10:52:51 ... explaining their approach to i18n: handling languages with gender, number, declensions, etc.
10:53:20 ... different languages may have different needs than the source language
10:53:48 s/3) things/... 3) things
10:54:23 ... they solve that by "dynamic string explosion", which enables a translator to have multiple translations for the same source string depending on the linguistic context
10:54:46 s/an important distinction/... an important distinction
10:55:55 chiara: ... in romanian, the translator must specify gender, but in finnish and russian, it is even more complicated
10:56:36 ... an important aspect and the point of this talk is that facebook users are the translators
10:57:47 ... considering machine translation, but haven't implemented it yet
10:58:58 ... french was translated in 24 hours, released in three weeks, now supporting 67 languages, many released without professional review
10:59:10 ... review process:
10:59:26 ... 1) translating the glossary of individual terms
10:59:41 ... 2) translating the content
10:59:53 ... 3) professional supervision and checking
11:00:24 ... the tool supports both inline and bulk translation for in- and out- of context translation
11:01:35 ... why use community translation: 1) users are domain experts 2) speed 3) reach
11:02:22 ...
why do users translate: personal satisfaction and pride, leaderboard of translation statistics
11:02:41 rrsagent, draft minutes
11:03:01 topic: presentation of Ian Truscott, Customizing the multilingual customer experience – deliver targeted online information based on geography, user preferences, channel and visitor demographics
11:04:33 ian: SDL is an international company, and itself also faces the multilingual problem
11:06:26 jan has joined #mlw
11:06:38 ... big themes: social media and different devices, how information is shaping opinions, relevant content is often in the user's language
11:07:54 ... reiterates the point that buyers are sensitive to the language of the content when buying
11:10:40 ... while around 50% of tweets are english, that share is diminishing
11:11:35 ... connecting with visitors: be relevant, listen, understand, engage
11:11:48 ... this requires monitoring solution
11:12:12 s/requires monitoring solution/requires monitoring solutions
11:13:07 ian: understanding: finding common interests across languages, demographics and geographies
11:13:18 ... it turns out that the common interests are key
11:15:40 ... content should be relevant, and better relevance via localisation is reflected in better effectiveness of communication
11:16:37 ... presenting the journey of the customer engagement, from research of products to buying and customer support
11:17:37 ... for the customer's journey, there's a lot of content with which the user engages that needs to be appropriate
11:18:01 davidf has joined #mlw
11:19:36 ... if people are coming to the website, they are trying to get stuff done, so 'user engagement' may be an obstacle
11:20:39 ... users' expectations have changed
11:21:00 ...
they expect content in their own language
11:21:23 topic: Users Q&A
11:22:18 DavidGrunwald: to paula - you haven't discussed whether you have the tools in place to harness social media?
11:24:00 paula: they don't crowdsource, they crowd manage - using input from users of various levels of skills, split the work into tasks and monitor that
11:24:51 DavidGrunwald: you are not letting the crowd control the message, as you claimed in your talk
11:25:12 paula: the content that I am referring to is not always in public or social media
11:25:51 lbellido has joined #mlw
11:26:36 ian: with social media, you can translate and listen, but you have to be cautious with translating and speaking (with automatic tools)
11:27:02 ian: agree with crowdsourcing, but it needs to be a love brand, for which people want to write
11:28:46 LukeS: on exploding translations in facebook - there is an open source project that unicode consortium supports that handles a subset of the language morphology problem
11:29:44 DanTufiş: to maarten - what theories is your work relying on?
11:30:14 maarten: machine translation as core technologies, political science as application
11:30:45 DanTufiş: points out Osgood's work on subjectivity with using wordnet to extract sentiment
11:31:44 Steven: points out that in paula's presentation, it was not the internet that took 4 years to reach 50 million, but the WWW
11:31:53 rrsagent, draft minutes
11:32:20 tadej has left #mlw
11:32:44 Steven has left #mlw
12:20:26 Steven has joined #mlw
12:33:59 joerg has joined #mlw
12:34:38 start scribing / policy session
12:35:11 fsasaki has joined #mlw
12:35:17 jaap van der meer starts with naming different devices
12:35:58 ScribeNick: joerg
12:35:58 ...
reports on last standard summit in Boston
12:36:38 topic: Presentation from Jaap van der Meer
12:37:12 r12a has joined #mlw
12:38:07 ... interoperability questionnaire
12:38:39 ... and interestedness in standards in particular
12:39:29 ... quotes some of the statements regarding costs
12:40:10 ... where is the friction? mostly TM followed by terminology
12:41:07 ... reasons to support: freedom of tool choice
12:41:31 Steven has joined #mlw
12:42:06 ... biggest barriers: lack of compliance, lack of maturity, etc.
12:43:27 ... sort of resistance against interop. such as market drop-down
12:43:37 IanTruscott has joined #mlw
12:43:43 omstefanov has joined #mlw
12:44:34 ... different perspectives of believers
12:47:43 ... realists' point of view such as "accept market forces", "show business advantage", "resistance to tools", etc.
12:49:54 ... and now the pragmatists: "they have hope..." ;-)
12:51:27 ChriLi has joined #mlw
12:52:29 ... future outlook (5 years!)
12:52:54 lbellido has joined #mlw
12:53:46 jan has joined #mlw
12:53:51 ... content increase, multimedia, mobile, more cross-lingual challenges, ...
12:54:52 ... brief SWOT analysis (see other TAUS publications too)
12:57:24 ... information pyramid representing content disruption
12:58:07 ... apply pyramid to SWOT graphic
12:58:54 ... business model attributes: old vs. new
13:00:49 ... e.g. TM is core vs. data is core; one- vs. multi-directional; word based pricing vs. SaaS; GMS vs. MT embedded
13:02:46 ... enterprises in 5 years will need a language strategy
13:04:39 ... last slide: interoperability agenda
13:05:29 ... more changes in the next 5 years than in the past 25 years
13:07:02 Next speaker: Fernando Servan
13:08:38 talks about the challenges of multilinguality for international organizations
13:09:25 ... gives the context of the food and agriculture organization of the UN
13:10:45 ...
6 languages (en, fr, es, arabic, zh, ru); approx. 12 m words/year
13:11:37 ... English has the largest share of doc lang.
13:12:50 ... websites in 6 lang. and regional relevance content in 3 lang.
13:15:09 ... challenges for doc. and web content: tech., prof. profiles, workflow, "consumer" languages
13:18:20 ... additional challenges are: rules and regulations, re-use of translations, TM/MT integration
13:19:07 ... no analysis or lessons learned available currently
13:20:31 ... envision the employment of CMS, CAT-tools, extend prof. profiles, optimize workflows
13:21:34 ... under discussion: employment of open source software, cloud services, etc.
13:22:13 ... funding could be based on current SME call of the EC
13:23:21 Next speaker: Stelios Peperidis
13:24:13 s/Next/Topic:/
13:24:22 talks about language resources sharing initiative in the context of MetaNet
13:24:39 s/Peperidis/Piperidis/
13:26:40 ... introduces the objectives and structure of Meta-Net, focus will be on Meta-Share
13:27:51 ... emphasizes the key challenge of data and how it relates to LT research and development
13:29:33 ... another important point in the initial discussions was standards
13:31:15 ... observations: making data employable is costly
13:32:36 ... Meta-Share shall be an open infrastructure that enables interoperability on various layers
13:33:50 ... it is also built on existing projects and initiatives that are already active in this broad field
13:35:10 ... as an umbrella organization which shall also include national efforts
13:35:31 s/talks about language resources sharing initiative in the context of MetaNet/Stelios: [talks about language resources sharing initiative in the context of MetaNet]/
13:35:59 s/talks about the challenges of multilinguality for international organizations/Fernando: [talks about the challenges of multilinguality for international organizations]/
13:36:02 ...
the main idea of the Meta-Share architecture is distribution based on a "meta schema" model 13:36:04 s/Next/Topic:/ 13:36:36 s/start scribing/Topic:/ 13:37:18 s/jaap van der meer starts with naming different devices/Topic: Speaker - Jaap van der Meer/ 13:37:22 ... users/consumers will have the possibility to search, browse and download resources 13:37:56 i/... reports on last standard summit in Boston/Jaap: begins by asking for the different ways people call a mobile phone/ 13:39:03 ... fully supports open source developments including appropriate maintenance 13:40:44 ... Meta-Share governance is given by members and associate members; legal issues are under cc 13:42:17 Start of discussion of Policy Session 13:43:38 Chaals: Word count is going down does mean translation workload decreases. Speculate on the implications? 13:43:55 s/does mean/does not mean/ 13:44:17 s/implications/implications where in fact we get more complex multimedia to include in the mix/ 13:45:07 Jaap: Identification of different rating criteria; human interference; word count is unmanageable; more demand 13:45:30 ... for MT but with different pricing models 13:46:27 Fernando: New challenges through users; relying on help from different sites 13:47:48 Stelios: Subtitling has a different approach based on intellectual capabilities needed; time of media content 13:48:07 ... multiplied by a certain factor 13:48:42 Chaals: Who owns the data question? 13:50:10 Steven: In the Netherlands all films are subtitled... quotes a translator "we are paid by the word"? What would the 13:50:22 integration of MT mean? 13:51:54 Stelios: Translation based on a "master file", i.e. the translation pricing model applies. 13:52:50 Reinhard: Subtitling for free i.e. by volunteers? 13:53:54 Chaals: Student's translations, shipped to India; there are several models... 13:55:54 Stefanov: Some points need to be highlighted: PEs, interpretation vs. 
translation, different multi-media presentations, 13:56:13 quality control will change, etc. 13:56:41 Chaals: You mean librarians? 13:57:23 Stefanov: Not really... the picture is changing. 13:59:17 [modern librarians learn to manage digital multimedia collections, and don't have to have their hair in a bun anymore. I am often surprised that they are not present at all at conferences like this - it seems we're missing out on expertise that seems highly relevant] 13:59:50 Christian: MT in subtitling already exists, e.g. in Scandinavia. Question on (gov) rules to multimedia? 14:00:45 Chaals: I have seen such rules but there are a lot of options. 14:00:53 END of Session 14:04:37 tadej has joined #mlw 14:10:22 tadej has left #mlw 14:11:17 s/(gov) rules to/whether there are policies aimed at reducing translation costs by limiting use of/ 16:51:26 RRSAgent has joined #mlw 16:51:26 logging to http://www.w3.org/2011/04/05-mlw-irc 16:51:31 rrsagent, here? 16:51:31 See http://www.w3.org/2011/04/05-mlw-irc#T16-51-31 16:51:53 rrsagent, make minutes 16:51:53 I have made the request to generate http://www.w3.org/2011/04/05-mlw-minutes.html Steven