07:07:42 <RRSAgent> RRSAgent has joined #mlw
07:07:42 <RRSAgent> logging to http://www.w3.org/2011/04/05-mlw-irc
07:07:52 <fsasaki> meeting: MLW Pisa Workshop, day 2
07:07:54 <jan> jan has joined #mlw
07:07:55 <fsasaki> chair: Richard
07:08:00 <tadej> hm, i think charles was supposed to  be scribing?
07:08:00 <fsasaki> scribe: various
07:08:21 <fsasaki> topic: Presentation from Dave Lewis et al.
07:08:26 <RRSAgent> I have made the request to generate http://www.w3.org/2011/04/05-mlw-minutes.html fsasaki
07:08:43 <fsasaki> Dave: Presenting on CNGL research
07:09:59 <fsasaki> Dave: multilingual IR, real-time social media translation etc. are all part of the aim to support the global customer
07:10:26 <lbellido> lbellido has joined #mlw
07:12:44 <fsasaki> Dave: web services - benefits for localisation like "pay as you use" models, easy deployment, ...
07:16:23 <fsasaki> Dave: industry survey shows barriers for adoption of technology
07:17:50 <Jirka> Jirka has joined #mlw
07:18:13 <fsasaki> dave: web services interoperability - needs to be very careful in profiling
07:18:55 <tadej> I can jump in - although getting someone from the floor would help
07:19:33 <tadej> yes
07:19:34 <r12a> r12a has joined #mlw
07:19:46 <tadej> Dave: proposing employing semantic web technology to the MT use case
07:19:53 <tadej> cool, thanks
07:20:08 <fsasaki> dave: semantic web may help to solve the problems we are looking at
07:20:25 <fsasaki> .. sw is a good mechanism to leverage other things
07:20:58 <fsasaki> .. tools are maturing
07:21:10 <fsasaki> .. we are interested in a small part of the sw stack, that is RDF
07:22:04 <luke> luke has joined #mlw
07:22:06 <fsasaki> .. RDF is a triple language: everything gets a URI and can be referenced; RDF Schema provides some basic modeling methods
07:22:18 <fsasaki> Dave compares RDF to relational databases
07:23:32 <tadej> until now, fsasaki was
07:24:11 <fsasaki> dave: RDF provides classes, properties, ...
07:24:29 <fsasaki> .. including multiple inheritance, which allows combinations in interesting ways
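The triple model and the multiple inheritance Dave describes can be illustrated with a minimal sketch in plain Python (no RDF library). The http://example.org/ngl# namespace and the class names are hypothetical and not taken from the CNGL taxonomy; only the rdf:type and rdfs:subClassOf URIs are the real W3C ones.

```python
# Hypothetical namespace for illustration only; not a real CNGL vocabulary.
EX = "http://example.org/ngl#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS = "http://www.w3.org/2000/01/rdf-schema#subClassOf"

# Each statement is a (subject, predicate, object) triple; every resource is a URI.
triples = {
    (EX + "UserManual", RDFS_SUBCLASS, EX + "TechnicalContent"),
    (EX + "UserManual", RDFS_SUBCLASS, EX + "LocalisableContent"),  # second parent: multiple inheritance
    (EX + "TechnicalContent", RDFS_SUBCLASS, EX + "Content"),
    (EX + "doc42", RDF_TYPE, EX + "UserManual"),
}

def superclasses(cls, graph):
    """All classes reachable from cls via rdfs:subClassOf (transitive closure)."""
    result, stack = set(), [cls]
    while stack:
        current = stack.pop()
        for s, p, o in graph:
            if s == current and p == RDFS_SUBCLASS and o not in result:
                result.add(o)
                stack.append(o)
    return result

def types_of(resource, graph):
    """Direct rdf:type classes of a resource plus every class they inherit from."""
    direct = {o for s, p, o in graph if s == resource and p == RDF_TYPE}
    inferred = set(direct)
    for cls in direct:
        inferred |= superclasses(cls, graph)
    return inferred
```

For example, `sorted(t.split("#")[1] for t in types_of(EX + "doc42", triples))` gives `['Content', 'LocalisableContent', 'TechnicalContent', 'UserManual']`: the document inherits along both parent branches.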
07:25:36 <fsasaki> .. the semantic web does not necessarily need standardization; people just create a vocabulary
07:25:54 <fsasaki> .. if it is taken up, good - a "survival of the fittest" approach
07:26:48 <fsasaki> .. existing data can be annotated with RDF - for Web services there is SAWSDL
07:27:05 <fsasaki> s/for Web/e.g. for Web/
07:28:11 <fsasaki> dave: developed a seed taxonomy for next generation localisation (NGL) content 
07:28:45 <fsasaki> .. working with many researchers in CNGL to see whether the taxonomy fits their needs, otherwise it is changed
07:29:29 <fsasaki> s/until now, fsasaki was//
07:29:44 <fsasaki> s/cool, thanks//
07:30:09 <fsasaki> dave: have a model refinement cycle for this 
07:30:51 <fsasaki> .. fine-grained roundtrips involving customer, content developer, LSP, translators
07:31:00 <fsasaki> .. looking into doing this with RDF
07:31:22 <fsasaki> .. "linked open data" - not focusing so much on reasoning, but on seeing how to publish the data you have
07:31:54 <fsasaki> dave: triple stores are becoming robust, starting to scale
07:32:22 <fsasaki> s/hm, i think charles was supposed to be scribing//
07:32:38 <fsasaki> s/I can jump in - although getting someone from the floor would help//
07:32:59 <fsasaki> dave: important vocabulary from LOD: open provenance vocabulary
07:33:11 <fsasaki> .. helpfrul for author, segment and source QA
07:33:19 <fsasaki> s/helpfrul/helpful/
07:34:08 <fsasaki> dave: next steps:
07:34:41 <fsasaki> .. revise semantic model, semantic sandpit, content markup via RDFa, not standardising semantics, testing semantic technology
07:34:54 <fsasaki> .. access control, etc.
07:35:18 <fsasaki> dave: real power of SW is its extensibility
07:35:30 <fsasaki> .. semantic annotations can help to improve interoperability
07:35:41 <fsasaki> .. provenance linked data can help for roundtripping
07:36:00 <fsasaki> .. will gather a lot of quality metadata about the content we are localising
07:36:14 <fsasaki> .. that might be helpful for training statistical MT
07:36:49 <fsasaki> topic: presentation from alexandra weisgerber
07:38:05 <Steven_> Agenda: http://www.multilingualweb.eu/documents/pisa-workshop/program
07:38:10 <fsasaki> alexandra: introducing the SWINNG project, part of the Software-Cluster
07:38:21 <fsasaki> .. central principle: emergence
07:38:45 <fsasaki> .. emergent software: enables combination of components and services for digital comparison
07:39:02 <fsasaki> .. components can come from ERP, BMP, BPI, the Web, ...
07:41:30 <fsasaki> alexandra: agility to better account for reducing waste, empowering the team and the employee, ...
07:44:00 <fsasaki> .. challenges: find a balance for right amount of documentation
07:44:37 <fsasaki> .. had experiences with writing larger user concepts or user concepts on the white board
07:44:44 <Steven> s|s/hm, i think charles was supposed to be scribing//||
07:45:13 <Steven> s/hm, i think charles was supposed to be scribing?//
07:45:46 <Steven> rrsagent, make minutes
07:46:23 <fsasaki> alexandra: actions and research areas: include a technical writer in maximum 2 SCRUM teams
07:47:59 <fsasaki> .. want to set up controlling processes to measure software quality and time to market
07:48:44 <fsasaki> .. difficult task, software quality is hard to measure
07:50:35 <fsasaki> topic: presentation from Andrejs Vasiljevs
07:50:58 <fsasaki> Andrejs: talking about challenges for smaller languages
07:51:25 <fsasaki> .. tools should be provided to help to bridge language barriers esp. for these languages
07:51:59 <fsasaki> .. UNESCO is working on a code of ethics, including a demand to represent all linguistic groups in cyberspace
07:52:49 <fsasaki> .. Alvin Toffler: "survival of smaller languages depends on outcome of MT versus proliferation of larger languages"
07:53:12 <fsasaki> Andrejs: Tilde is doing both language technology and localization services
07:53:26 <fsasaki> .. we can see real needs of users and test new approaches 
07:54:11 <lbellido> lbellido has joined #mlw
07:55:24 <fsasaki> Andrejs: MT at tilde: first rule-based, switching to data-driven methods in 2008, heavy participation in EU R&D
07:55:31 <fsasaki> .. about MT development
07:55:51 <fsasaki> .. not only research, but bring results in tools we provide
07:56:03 <fsasaki> .. MT, dictionaries widely used in the country
07:56:22 <fsasaki> .. work with MS research to improve MT engine for our language
07:56:56 <fsasaki> .. problem of data-driven MT: translation quality is low for under-resourced languages
07:58:29 <fsasaki> .. other challenge is customization: mass-market, online MT-systems are general
07:58:38 <fsasaki> .. performance is poor for specific domains
07:59:30 <fsasaki> .. open source tools like GIZA++ or moses are hard to use for the ordinary user, too complex
07:59:33 <ChriLi> ChriLi has joined #mlw
08:00:02 <fsasaki> .. strategies to help: see "LetsMT!" project
08:00:17 <fsasaki> .. building a platform to gather public and user-provided MT training data
08:00:32 <fsasaki> .. increasing quality, scope and language coverage for MT
08:01:24 <fsasaki> .. area is "machine translation for the multilingual web"
08:01:34 <ChriLi> ChriLi has left #mlw
08:01:54 <fsasaki> .. user survey about IPR of text resources
08:02:04 <fsasaki> .. there is willingness to share data
08:02:23 <fsasaki> s/willingness/some willingness/
08:03:26 <fsasaki> Andrejs: another project "Accurat"
08:03:40 <fsasaki> .. non-parallel bi- or multilingual text resources
08:03:48 <fsasaki> .. e.g. multilingual news feeds
08:04:01 <fsasaki> .. wikipedia articles, multilingual web sites, ...
08:04:20 <fsasaki> .. these show varying degrees of comparability
08:04:40 <fsasaki> .. we calculate the comparability
08:05:03 <fsasaki> .. develop comparability metrics
08:05:26 <fsasaki> .. develop methods for automatic acquisition of parallel texts
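A toy comparability score can make the idea concrete. This is a minimal sketch assuming a simple dictionary-based Jaccard overlap; the metrics actually developed in Accurat are not described here, and the seed dictionary below is an invented example.

```python
import re

def tokens(text):
    """Lower-cased set of word tokens (the regex also matches non-ASCII letters)."""
    return set(re.findall(r"\w+", text.lower()))

def comparability(source, target, dictionary):
    """Jaccard overlap in [0, 1] between the dictionary translations of the
    source tokens and the tokens of the target text."""
    translated = {dictionary[t] for t in tokens(source) if t in dictionary}
    target_tokens = tokens(target)
    if not translated or not target_tokens:
        return 0.0
    return len(translated & target_tokens) / len(translated | target_tokens)

# Hypothetical English-German seed dictionary.
EN_DE = {"parliament": "parlament", "vote": "abstimmung", "budget": "haushalt"}
```

With `comparability("Parliament vote on the budget", "Abstimmung im Parlament über den Haushalt", EN_DE)`, three of the six distinct target tokens are translations of source words, so the score is 0.5; function words and inflected forms pull such a bag-of-words score down, which is one reason real comparability metrics need more than this.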
08:05:50 <fsasaki> .. consortium has both research institutions and SMEs
08:06:54 <fsasaki> .. tagging MT-translated text would be very helpful
08:07:07 <fsasaki> .. to be able to distinguish MT-translated texts from human-translated text
08:07:22 <fsasaki> .. common interfaces for MT engines would facilitate interoperability
08:07:31 <fsasaki> .. standardization / best practices (BP) are needed
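One way to tag machine-translated text so that consumers can tell it apart from human translation is an attribute on the markup itself. The attribute name data-translation-origin below is hypothetical (no standard attribute was agreed at the time of this discussion); the sketch uses only the Python standard library and does not handle nested spans.

```python
from html import escape
from html.parser import HTMLParser

ORIGIN_ATTR = "data-translation-origin"  # hypothetical attribute name

def mark_translation(text, lang, origin):
    """Wrap a translated segment in a span recording whether it came from MT or a human."""
    if origin not in ("mt", "human"):
        raise ValueError("origin must be 'mt' or 'human'")
    return f'<span lang="{escape(lang)}" {ORIGIN_ATTR}="{origin}">{escape(text)}</span>'

class OriginFilter(HTMLParser):
    """Collect the text of spans whose origin attribute equals `wanted`."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = wanted
        self.inside = False
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag == "span" and dict(attrs).get(ORIGIN_ATTR) == self.wanted:
            self.inside = True
            self.found.append("")

    def handle_endtag(self, tag):
        if tag == "span":
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.found[-1] += data
```

A consumer such as an MT training pipeline could then feed a page to `OriginFilter("human")` and keep only human-translated segments, avoiding retraining on its own output.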
08:08:06 <fsasaki> topic: presentation from Boštjan Pajntar
08:08:40 <fsasaki> Boštjan: about collecting aligned textual corpora from the hidden web
08:09:32 <fsasaki> .. aligned parallel corpus: a text alongside its translation(s)
08:09:42 <omstefanov> omstefanov has joined #mlw
08:09:49 <fsasaki> .. usage: translation memory, training MT systems, many NLP scenarios
08:10:10 <fsasaki> .. looked at standards, decided to go for TMX
08:10:31 <fsasaki> .. XLIFF is in my list in the last bullet point, in brackets
08:10:41 <fsasaki> .. so XLIFF needs more marketing & development
08:11:16 <fsasaki> .. getting data: non-English professional web sites
08:11:26 <fsasaki> .. huge amount of translated text
08:11:37 <fsasaki> .. in general quality translations
08:12:19 <fsasaki> Boštjan: problems:
08:12:28 <fsasaki> .. translation memory is hard to get
08:12:35 <fsasaki> .. data should have high precision
08:14:32 <fsasaki> .. no standard fully supports automatic harnessing or cleaning of data
08:15:00 <fsasaki> .. proposed solution: crawl from the web 
08:15:32 <fsasaki> .. > database > list of HTML candidates > list of text candidates > paralell corpora
08:15:55 <fsasaki> .. see http://kameleon.ijs.si/t4m for more info
08:16:59 <tadej> http://kameleon.ijs.si/t4me/
08:18:03 <fsasaki> s/t4m/t4m\//
08:18:58 <fsasaki> s/paralell/parallel/
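The candidate stage of such a pipeline often relies on URL patterns. The following is a minimal sketch of one common heuristic, pairing pages whose URLs differ only in a language path segment; it is an assumption for illustration, not the method used by the t4me tools, and the language list and URLs are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical heuristic: a path segment consisting of a known language code.
LANG_SEGMENT = re.compile(r"/(en|sl|de|it|fr)(?=/|$)")

def candidate_pairs(urls, source="en"):
    """Group URLs that become identical once the language segment is masked and
    return (source_url, {other_lang: url}) candidates for text alignment."""
    groups = defaultdict(dict)
    for url in urls:
        match = LANG_SEGMENT.search(url)
        if not match:
            continue  # no language marker: not a candidate
        key = LANG_SEGMENT.sub("/{lang}", url, count=1)
        groups[key][match.group(1)] = url
    pairs = []
    for versions in groups.values():
        if source in versions and len(versions) > 1:
            others = {lang: u for lang, u in versions.items() if lang != source}
            pairs.append((versions[source], others))
    return pairs
```

Given `http://www.example.si/en/products/`, `http://www.example.si/sl/products/` and `http://www.example.si/en/about`, only the products pages form a candidate pair; the later stages (text extraction, sentence alignment) would then decide whether the pair really is parallel.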
08:19:20 <fsasaki> Boštjan: we used TMX - is it the right choice?
08:19:25 <fsasaki> .. source language must be defined
08:19:59 <fsasaki> .. no need for me to do that, I just have parallel texts for machine consumption
08:20:14 <fsasaki> .. would need an optional parameter to define the source for each segment
08:20:39 <joerg> joerg has joined #mlw
08:20:50 <fsasaki> .. when you develop a standard, think also about "machines" as users, not only people
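Writing aligned pairs as TMX needs only the standard library. A minimal sketch follows; the tool name is hypothetical, and the header attributes are the ones TMX 1.4b lists as required. That specification also allows srclang="*all*" in the header, meaning any language may serve as the source, and lets an individual <tu> carry its own srclang, which comes close to the per-segment option Boštjan asks for.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"  # serialised as xml:lang

def build_tmx(units, srclang="*all*", tool="hypothetical-aligner"):
    """Serialise aligned segments as a minimal TMX 1.4-style document.
    units: list of dicts mapping a language code to segment text."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": tool,
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "none",
        "adminlang": "en",
        "srclang": srclang,  # "*all*": no fixed source language
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for unit in units:
        tu = ET.SubElement(body, "tu")
        for lang, text in unit.items():
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")
```

For example, `build_tmx([{"en": "Good day", "sl": "Dober dan"}])` yields one <tu> with two <tuv xml:lang="..."> variants and no declared direction between them.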
08:21:16 <fsasaki> .. future work: optimization in the areas of two-phase crawling, character encoding, enhanced candidate extraction
08:21:27 <luke> To answer Bostjan's question about "how many errors are acceptable", the answer (frustratingly for him, I'm sure) is "it depends":  Is the text a guide for system administrators or the company homepage?  Also: what type of errors are they (people can usually understand text with some grammatical errors, but if the key nouns/verbs are incorrect, it could be confusing/embarrassing).
08:22:23 <fsasaki> Boštjan: web service for TM (translation memory) distribution and filtering (web 2.0 style)
08:22:44 <fsasaki> topic: presentation from Gavin Brelstaff
08:24:47 <fsasaki> gavin: interactive alignment of parallel texts
08:25:43 <fsasaki> .. World Wide Web: need to think both globally and locally, e.g. in terms of minority languages
08:26:13 <fsasaki> .. "a seed-bed for poetic expression, beyond mere communication"
08:28:30 <fsasaki> .. cultural context is important, see R. Jakobson
08:29:08 <fsasaki> .. there is an osmosis between minority languages and global languages
08:29:11 <fsasaki> .. everybody becomes a 2nd language speaker
08:30:51 <fsasaki> Gavin: parallel text alignment <> to communicate semantics
08:31:16 <fsasaki> .. we have standards-based markup, web delivery cross-browser, non-verbal interactivity ...
08:31:35 <fsasaki> .. statistical MT will not translate poetry in the next 20-50 years 
08:32:47 <fsasaki> .. we developed a parallel text alignment web interface
08:34:22 <fsasaki> demo of interactive text alignment
08:36:07 <fsasaki> .. standards that have been used for the demo: TEI (XML-based) structure
08:36:27 <fsasaki> .. presented as XHTML, with CSS, JavaScript
08:36:46 <fsasaki> .. semantics is not RDF, but the TEI structure
08:38:09 <fsasaki> gavin: beauty of Unicode - one can put multilingual information directly into the content
08:38:29 <fsasaki> .. pros: we can interact directly with semantics
08:38:46 <fsasaki> .. w3crange does not work in browsers
08:38:58 <fsasaki> .. TEI P5 must be subsetted
08:39:03 <fsasaki> s/pros/pros and cons/
08:39:19 <fsasaki> .. CSS selectors help, with jQuery
08:40:02 <fsasaki> .. some browser issues, does not work everywhere
08:40:10 <fsasaki> topic: q/a session
08:42:29 <fsasaki> question about semantic web for MT training 
08:42:40 <fsasaki> dave: have thought about that
08:42:48 <fsasaki> .. e.g. linking to terminology data bases
08:43:17 <fsasaki> .. looking into lexical markup, there was a presentation at the last mlw workshop about this
08:43:30 <fsasaki> .. hot topic in MT; linguistically informed MT
08:46:05 <fsasaki> discussion about legal issues with gathering corpora via the Web - is it legal at all?
08:46:25 <fsasaki> Boštjan: lawyers will work on finding that out
08:46:59 <fsasaki> Alexandra: all languages in our project need to be finished
08:47:30 <fsasaki> .. depending on the language it is difficult or easier
08:47:58 <fsasaki> christian: funny to see the same questions, I had the remark on IP too, let's see where this goes
08:48:23 <r12a> christian lieske: everyone mentioned that categorisation of what we find on the web would help with machine analysis
08:48:23 <fsasaki> .. not a question, but a remark: all of you mentioned that categorization of what we find on the Web would be helpful for reliable machine analysis
08:48:40 <r12a> ... some communities have a detailed approach to this
08:49:08 <r12a> ... look at last year's w3c day in berlin and you'll see how work on digital libraries may fit well with machine translation
08:49:26 <fsasaki> (the above refers to a presentation from Günther Neher)
08:50:36 <r12a> ??: often pages with the same url that are translated are not exactly the same structure
08:51:06 <fsasaki> (see, in German, http://www.xinnovations.de/downloads-2010.html?file=tl_files/xinnovations.2010/Download/W3C-Tag/Prof.%20Dr.%20Guenther%20Neher.pdf) 
08:51:18 <r12a> bostjan: we have done little testing so far - about 7000 translations - and it worked well
08:52:13 <r12a> ... our preliminary experiments show that it still works very well, even if there isn't the same content on both sides of parallel text
08:52:48 <r12a> andrejs: see the FP7 project that is looking how to extract comparable corpora
08:55:15 <r12a> s pemberton: i'm impressed by the willingness to translate poetry - i'm performing in an opera and it took me a while to understand some allusions and references (gives examples)
08:55:25 <r12a> ... i'm amazed that you hope ever to do this
08:55:49 <r12a> gavinB: our approach is to find the interface - to see how far machines can go
08:56:40 <r12a> ... it is possible to do a translation based on bare bones - even humans can get things wrong...
08:57:04 <r12a> jorgS: if you have conceptual mismatches, how do you resolve them?
08:57:39 <r12a> gavinB: this is where the human translator accepts that they need to go away and study it - in our system we mark it up in red
08:57:49 <r12a> ... the translation will never be exact
08:58:15 <r12a> jorgS: for dave, what do you think of the next generation of content generation based on RDF?
08:59:08 <r12a> dave: there's still a gap between computational linguists and semantic web folks - there are people looking at how to apply these things, and there are proposals out there
08:59:23 <r12a> ... we're looking at how to integrate those approaches into what we do
08:59:41 <r12a> jorgB: i'm looking forward to multilingual text generation
09:00:15 <r12a> lukeS: i was intrigued by gavin's presentation
09:00:38 <r12a> ... seems the best you can do wrt translation is to come up with a separate poem that has the same feel
09:01:11 <r12a> ... but this  may be a useful tool for understanding the original material better
09:01:28 <Jirka> Jirka has left #mlw
09:01:28 <r12a> ... there may be implications for other translation approaches
09:02:43 <r12a> christianL: i understand the remarks about translating poems with machines - but to me Gavin's talk was about an annotation mechanism based on standards
09:03:02 <r12a> ... there is a need for this approach, and gavin's presentation was inspirational
09:03:47 <r12a> ... more and more accurate annotations are needed, but there are other aspects to translation and gavin's presentation pointed to many useful aspects of this
09:05:46 <chaals> chaals has joined #mlw
09:34:03 <tadej> tadej has joined #mlw
09:34:51 <tadej> topic: Users session, Paula Shannon, Social Media is Global. Now What?
09:35:04 <tadej> rrsagent, draft minutes
09:35:35 <Steven> Steven has joined #mlw
09:37:31 <Steven> rrsagent, here?
09:37:31 <RRSAgent> See http://www.w3.org/2011/04/05-mlw-irc#T09-37-31
09:37:40 <tadej> paula: introducing how social media is changing localization 
09:38:36 <tadej> ... showing video on social media
09:40:04 <fsasaki> fsasaki has joined #mlw
09:40:56 <tadej> ... video emphasizing rapid growth and scale of various SNs, describing the relationship of new generation towards social media 
09:42:16 <tadej> ... video focusing on effect of social media on advertising, enabling higher ROI for marketing 
09:42:39 <tadej> ... introducing the term "socialnomics"
09:43:35 <Steven> One mistake in the video - it conflated Internet and Web, so the time to 50M users was for the web, not for the internet
09:44:04 <lbellido> lbellido has joined #mlw
09:44:53 <tadej> paula: describing the notion of reputation control via media - the talk will show how this does not hold in the presence of social media
09:45:51 <tadej> ... analogy with toddlers as example of parents not being in control 
09:47:17 <tadej> ... in social media, the user is in the middle of the system and his worldview actually defines his experience
09:47:38 <omstefanov> omstefanov has joined #mlw
09:48:17 <tadej> ... emphasizing other social networks than facebook, e.g. hi5, orkut - a reason for their success was the fact that they were localized
09:49:29 <tadej> ... talking about surveys on social media and lionbridge involvement - how people are using social media multilingually
09:50:35 <tadej> ... companies using social media: a quarter of companies are using all 4 platforms - businesses in Europe and especially Asia are growing much
09:50:51 <tadej> ... faster than U.S. companies, likely due to legal issues
09:52:07 <tadej> ... twitter is increasingly popular, fastest growth
09:53:26 <tadej> ... 60% of tweets are non-english, but twitter localized only in 7 languages
09:53:39 <tadej> rrsagent, draft minutes
09:54:17 <tadej> paula: companies engage in hyper-local strategies, twitter account-per-region
09:54:36 <joerg> joerg has joined #mlw
09:55:41 <tadej> paula: twitter brought new metric: TPS - tweets per second
09:56:00 <tadej> scribe: tadej
09:56:58 <tadej> paula: smartphones becoming the relevant computing platform
09:57:06 <tadej> rrsagent, draft minutes
09:58:13 <tadej> paula: why are companies engaging? because SM allows them to really interact with the users
09:58:51 <tadej> scribe: tadej, r12a, fsasaki
09:59:40 <tadej> paula: strategies of social media: 1) single centralized controlled SM outlet
10:00:14 <tadej> scribes: fsasaki,r12a,tadej
10:00:29 <Steven> Steven has joined #mlw
10:00:31 <tadej> rrsagent, draft minutes
10:01:10 <tadej> paula: 2) decentralized local pages - more effective, but users have more control
10:01:35 <Steven> s/scribe: various/scribe: fsasaki/
10:01:41 <Steven> rrsagent, make minutes
10:03:15 <Steven> i/christian lieske: everyone mentioned that categorisation/scribe: r12a
10:03:21 <Steven> rrsagent, make minutes
10:03:24 <tadej> paula: it is still a huge opportunity - example: coca-cola has 250 people who are tasked with buying keywords
10:04:13 <Steven> i/paula: introducing how social media/scribe: tadej
10:04:20 <Steven> rrsagent, make minutes
10:04:45 <tadej> ... important assertions: it is happening quickly, it's huge and growing; instantly available content has more value than quality content
10:06:19 <tadej> paula: the real-time aspect also affects localization processes - when localizing a message, the process might take too much time 
10:07:06 <tadej> ... real-time multilingual communication does not leave space for pre- and post- editing, leaving a lot of human intervention out
10:08:10 <tadej> paula: last assertion: machine translation is becoming increasingly relevant for SM outlets
10:09:17 <tadej> topic: Presentation of Maarten de Rijke - Emotions, experiences and the social media
10:11:41 <tadej> maarten: intro - academics are not concerned with standards per se, but with trying to get things done
10:12:19 <tadej> ... talk will be about standards supporting intelligent information access of content
10:13:38 <tadej> ... in social media, people still do the same things, but online instead of offline
10:15:23 <tadej> ... presenting concrete project of a political mashup
10:16:17 <tadej> ... gather political social media content, debates, analyze and semantify it. political scientists are interested in tracking topic ownership
10:17:04 <tadej> ... traditionally, this research was conducted via clasisic clipping, now via social media.
10:17:19 <tadej> ... however, data gathered this way is increasingly multilingual
10:18:22 <tadej> ... another project, CoSyne, about cross-completing wikipedia pages using different language articles on the same topic
10:18:30 <Steven> s/clasisic/classic
10:19:40 <tadej> ... third example: The Mood of the Web - Livejournal has mood annotated blogs, serving as a stream of mood-annotated data
10:21:59 <tadej> ... when following mood patterns across time, you can try to interpret them, for instance "shocked", "tired"
10:23:49 <tadej> ... what would explain a huge spike in "shocked" in 2008? By combining LiveJournal streams with news and counting word usage statistics, it turns out that it was the death of actor Heath Ledger.
10:26:17 <tadej> ... showing a time series of stress measurements, with a spike at the end of the year - that sort of analysis requires a lot of technology for text processing and information extraction
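Finding a spike like the ones described above can be sketched as a trailing-window z-score test over daily counts of a mood label. This is a generic technique assumed for illustration, not the method used in the work presented; the counts are invented.

```python
from statistics import mean, pstdev

def spikes(counts, window=7, threshold=3.0, min_sigma=1.0):
    """Indices whose count exceeds the mean of the previous `window` counts by
    more than `threshold` standard deviations; sigma is floored at min_sigma
    so that a perfectly flat history does not cause a division by zero."""
    found = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        sigma = max(pstdev(history), min_sigma)
        if (counts[i] - mean(history)) / sigma > threshold:
            found.append(i)
    return found

# Invented daily counts of posts tagged "shocked".
daily_shocked = [10, 12, 11, 9, 10, 11, 12, 10, 58, 20]
```

Here `spikes(daily_shocked)` flags only day 8; explaining *why* that day spikes is the harder part, which is where combining the mood stream with news and word statistics comes in.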
10:27:57 <tadej> ... introducing Fietstas, a multilingual en/nl text processing engine as infrastructure for what was presented
10:29:32 <tadej> topic: presentation from Gustavo Lucardi - Nascent Best Practices of Multilingual SEO
10:29:56 <tadej> rrsagent, draft minutes
10:30:35 <tadej> gustavo: comparing the SEO process with preparing a gourmet meal
10:31:05 <tadej> ... posing the question, "what are the right ingredients for multilingual SEO?"
10:32:36 <tadej> ... high search engine positioning is very important, holding potential for high revenue
10:33:11 <tadej> ... introducing terms: SEO, MSEO, SMO, Social SEO as different strategies in the field
10:33:55 <tadej> an important distinction is that whereas in SEO traffic comes from search engines, in SMO traffic comes from social media
10:34:35 <tadej> gustavo: for example, 500 tweets have more effect than 500 incoming links
10:35:15 <tadej> ... however, SEO still has higher ROI than SMO
10:36:42 <tadej> ... an important concept in SEO is the long tail effect in certain business models
10:38:02 <tadej> ... just translating keywords does bring traffic, but has low conversion rates
10:39:31 <tadej> ... for effective multilingual, international SEO, he recommends the W3C Language Standards as basic rules
10:40:33 <tadej> ... SEO can be multilingual, international or geographical; these are not mutually exclusive.
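The notes do not record which W3C language standards were meant. Assuming the HTML lang attribute with BCP 47 language tags, plus <link rel="alternate" hreflang="..."> pointers between language versions, a minimal sketch that generates this markup for one page could look as follows; the function name and URLs are hypothetical. Region subtags such as es-AR versus es-ES cover the geographical case as well as the multilingual one.

```python
from html import escape

def language_head(current, versions):
    """versions maps a BCP 47 language tag to the URL of that language version.
    Returns the opening <html> tag for the page in `current` followed by
    alternate links to every other version."""
    if current not in versions:
        raise KeyError(f"no URL for current language {current!r}")
    lines = [f'<html lang="{escape(current)}">']
    for tag, url in sorted(versions.items()):
        if tag != current:
            lines.append(f'<link rel="alternate" hreflang="{escape(tag)}" href="{escape(url)}">')
    return lines
```

For the Argentine Spanish page, `language_head("es-AR", {"en": "https://example.com/en/", "es-AR": "https://example.com/ar/", "es-ES": "https://example.com/es/"})` declares `lang="es-AR"` and links to the English and Spain-Spanish versions.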
10:41:24 <tadej> gustavo: what did we learn doing it: 
10:41:26 <tadej> ... 1) focus on the long tail and niche market
10:41:33 <tadej> ... 2) conversions, not traffic
10:41:44 <tadej> 3) things change, iterate
10:43:32 <tadej> gustavo: showing examples - a legal company campaign was successful once they used correct glossary translations
10:44:11 <tadej> ... healthcare insurance campaign was better once they regionalized their content
10:44:53 <tadej> ... hotel chain: 12 languages, necessary to cover all
10:45:18 <tadej> topic: presentation from Chiara Pacella - Controlled and uncontrolled environments in social networking websites and linguistic rules for multilingual websites
10:47:00 <tadej> chiara: the talk will be around control of content and the implications of having a controlled vs. uncontrolled content
10:48:16 <tadej> ... controlled environment - the user does not have influence, the content is relatively static
10:49:18 <tadej> ... in a controlled environment, the developers work with strings with sentences, which are then combined 
10:50:02 <tadej> ... in an uncontrolled component, the content is very dynamic, developers have limited control - they combine it with the controlled component before outputting
10:51:21 <tadej> ... even in a single sentence, there may be a combination of controlled and uncontrolled strings
10:51:59 <tadej> ... in the translator's view, the content is treated as token variables
10:52:51 <tadej> ... explaining their approach to i18n: handling languages with gender, number, declensions, etc.
10:53:20 <tadej> ... different languages may have different needs than the source language
10:53:48 <chaals> s/3) things/... 3) things
10:54:23 <tadej> ... they solve that by "dynamic string explosion", which enables a translator to have multiple translations for the same source string depending on the linguistic context
10:54:46 <tadej> s/an important distinction/... an important distinction
10:55:55 <tadej> chiara: in Romanian, the translator must specify gender, but in Finnish and Russian it is even more complicated
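How one source string can explode into several target variants can be sketched as a lookup keyed by the grammatical gender of a token variable. The data model, function name and fallback rule below are hypothetical and not Facebook's implementation; Russian is used because it was named as one of the harder cases.

```python
# Hypothetical store: language -> source string -> variants keyed by the
# grammatical gender of the {name} token.
VARIANTS = {
    "ru": {
        "{name} commented on your photo.": {
            "male": "{name} прокомментировал вашу фотографию.",
            "female": "{name} прокомментировала вашу фотографию.",
            "unknown": "{name} прокомментировал(а) вашу фотографию.",
        },
    },
}

def render(source, lang, name, gender="unknown"):
    """Pick the variant matching the token's gender, falling back to the
    'unknown' variant, or to the untranslated source if no variants exist."""
    variants = VARIANTS.get(lang, {}).get(source)
    if variants is None:
        template = source
    else:
        template = variants.get(gender, variants["unknown"])
    return template.format(name=name)
```

The translator supplies one entry per grammatical context rather than a single string, and the application fills in the token at runtime; number or case would add further keys in the same way.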
10:56:36 <tadej> ... an important aspect and the point of this talk is that facebook users are the translators
10:57:47 <tadej> ... considering machine translation, but haven't implemented it yet
10:58:58 <tadej> ... French was translated in 24 hours and released in three weeks; now supporting 67 languages, many released without professional review
10:59:10 <tadej> ... review process:
10:59:26 <tadej> ... 1) translating the glossary of individual terms
10:59:41 <tadej> ... 2) translating the content
10:59:53 <tadej> ... 3) professional supervision and checking
11:00:24 <tadej> ... the tool supports both inline and bulk translation for in- and out- of context translation
11:01:35 <tadej> ... why use community translation: 1) users are domain experts 2) speed 3) reach
11:02:22 <tadej> ... why do users translate: personal satisfaction and pride, leaderboard of translation statistics
11:02:41 <tadej> rrsagent, draft minutes
11:03:01 <tadej> topic: presentation of Ian Truscott, Customizing the multilingual customer experience – deliver targeted online information based on geography, user preferences, channel and visitor demographics
11:04:33 <tadej> ian: SDL is an international company, and itself also faces the multilingual problem
11:06:26 <jan> jan has joined #mlw
11:06:38 <tadej> ... big themes: social media and different devices, how information is shaping opinions, relevant content is often in the user's language
11:07:54 <tadej> ... reiterates the point that buyers are sensitive to the language of the content when buying
11:10:40 <tadej> ... while around 50% of tweets are in English, that share is diminishing
11:11:35 <tadej> ... connecting with visitors: be relevant, listen, understand, engage
11:11:48 <tadej> ... this requires monitoring solution
11:12:12 <tadej> s/requires monitoring solution/requires monitoring solutions
11:13:07 <tadej> ian: understanding: finding common interests across languages, demographics and geographies
11:13:18 <tadej> ... it turns out that the common interests are key
11:15:40 <tadej> ... content should be relevant, and better relevance via localisation is reflected in better effectiveness of communication
11:16:37 <tadej> ... presenting the journey of the customer engagement, from research of products to buying and customer support
11:17:37 <tadej> ... for the customer's journey, there's a lot of content with which the user engages that needs to be appropriate
11:18:01 <davidf> davidf has joined #mlw
11:19:36 <tadej> ... if people are coming to the website, they are trying to get stuff done, so 'user engagement' may be an obstacle
11:20:39 <tadej> ... users' expectations have changed
11:21:00 <tadej> ... they expect content in their own language
11:21:23 <tadej> topic: Users Q&A
11:22:18 <tadej> DavidGrunwald: to paula - you haven't discussed whether you have the tools in place to harness social media?
11:24:00 <tadej> paula: they don't crowdsource, they crowd manage - using input from users of various levels of skills, split the work into tasks and monitor that
11:24:51 <tadej> DavidGrunwald: you are not letting the crowd control the message, as you claimed in your talk
11:25:12 <tadej> paula: the content that I am referring to is not always in public or social media
11:25:51 <lbellido> lbellido has joined #mlw
11:26:36 <tadej> ian: with social media, you can translate and listen, but you have to be cautious about translating and speaking (with automatic tools)
11:27:02 <tadej> ian: agree with crowdsourcing, but it needs to be a love brand, for which people want to write
11:28:46 <tadej> LukeS: on exploding translations in Facebook - there is an open source project that the Unicode Consortium supports that handles a subset of the language morphology problem
11:29:44 <tadej> DanTufiş: to maarten - what theories does your work rely on?
11:30:14 <tadej> maarten: machine translation as the core technology, political science as the application
11:30:45 <tadej> DanTufiş: points out Osgood's work on subjectivity with using wordnet to extract sentiment
11:31:44 <tadej> Steven: points out that in paula's presentation, it was not the Internet but the WWW that took 4 years to reach 50 million
11:31:53 <tadej> rrsagent, draft minutes
11:31:53 <RRSAgent> I have made the request to generate http://www.w3.org/2011/04/05-mlw-minutes.html tadej
11:32:20 <tadej> tadej has left #mlw
11:32:44 <Steven> Steven has left #mlw
12:20:26 <Steven> Steven has joined #mlw
12:33:59 <joerg> joerg has joined #mlw
12:34:38 <joerg> start scribing / policy session
12:35:11 <fsasaki> fsasaki has joined #mlw
12:35:17 <joerg> jaap van der meer starts with naming different devices
12:35:58 <chaals> ScribeNick: joerg
12:35:58 <joerg> ... reports on last standard summit in Boston
12:36:38 <fsasaki> topic: Presentation from Jaap van der Meer
12:36:46 <RRSAgent> I have made the request to generate http://www.w3.org/2011/04/05-mlw-minutes.html fsasaki
12:37:12 <r12a> r12a has joined #mlw
12:38:07 <joerg> ... interoperability questionnaire
12:38:39 <joerg> ... and interest in standards in particular
12:39:29 <joerg> ... quotes some of the statements regarding costs
12:40:10 <joerg> ... where is the friction? mostly TM followed by terminology
12:41:07 <joerg> ... reasons to support: freedom of tool choice
12:41:31 <Steven> Steven has joined #mlw
12:42:06 <joerg> ... biggest barriers: lack of compliance, lack of maturity, etc.
12:43:27 <joerg> ... some forms of resistance against interoperability, such as market drop-down
12:43:37 <IanTruscott> IanTruscott has joined #mlw
12:43:43 <omstefanov> omstefanov has joined #mlw
12:44:34 <joerg> ... different perspectives of believers
12:47:43 <joerg> ... realists' point of view, such as "accept market forces", "show business advantage", "resistance to tools", etc.
12:49:54 <joerg> ... and now the pragmatists: "they have hope..." ;-)
12:51:27 <ChriLi> ChriLi has joined #mlw
12:52:29 <joerg> ... future outlook (5 years!)
12:52:54 <lbellido> lbellido has joined #mlw
12:53:46 <jan> jan has joined #mlw
12:53:51 <joerg> ... content increase, multimedia, mobile, more cross-lingual challenges, ...
12:54:52 <joerg> ... brief SWOT analysis (see other TAUS publications too)
12:57:24 <joerg> ... information pyramid representing content disruption
12:58:07 <joerg> ... apply pyramid to SWOT graphic
12:58:54 <joerg> ... business model attributes: old vs. new
13:00:49 <joerg> ... e.g. TM is core vs. data is core; one- vs. multi-directional; word based pricing vs. SaaS; GMS vs. MT embedded
13:02:46 <joerg> ... in 5 years, enterprises will need a language strategy
13:04:39 <joerg> ... last slide: interoperability agenda
13:05:29 <joerg> ... more changes in the next 5 years than in the past 25 years
13:07:02 <joerg> Next speaker: Fernando Servan
13:08:38 <joerg> talks about the challenges of multilinguality for international organizations
13:09:25 <joerg> ... gives the context of the Food and Agriculture Organization (FAO) of the UN
13:10:45 <joerg> ... 6 languages (English, French, Spanish, Arabic, Chinese, Russian); approx. 12 million words/year
13:11:37 <joerg> ... English has the largest share of document languages
13:12:50 <joerg> ... websites in 6 languages, and content of regional relevance in 3 languages
13:15:09 <joerg> ... challenges for documents and web content: technology, professional profiles, workflow, "consumer" languages
13:18:20 <joerg> ... additional challenges are: rules and regulations, re-use of translations, TM/MT integration
13:19:07 <joerg> ... no analysis or lessons learned available currently
13:20:31 <joerg> ... envisions the employment of CMS and CAT tools, extending professional profiles, and optimizing workflows
13:21:34 <joerg> ... under discussion: employment of open source software, cloud services, etc.
13:22:13 <joerg> ... funding could be based on current SME call of the EC
13:23:21 <joerg> Next speaker: Stelios Peperidis
13:24:13 <chaals> s/Next/Topic:/
13:24:22 <joerg> talks about language resources sharing initiative in the context of MetaNet
13:24:39 <chaals> s/Peperidis/Piperidis/
13:26:40 <joerg> ... introduces the objectives and structure of Meta-Net, focus will be on Meta-Share
13:27:51 <joerg> ... emphasizes the key challenge of data and how it relates to LT research and development
13:29:33 <joerg> ... another important point in the initial discussions was standards
13:31:15 <joerg> ... observations: making data employable is costly
13:32:36 <joerg> ... Meta-Share shall be an open infrastructure that enables interoperability on various layers
13:33:50 <joerg> ... it is also built on existing projects and initiatives that are already active in this broad field
13:35:10 <joerg> ... as an umbrella organization which shall also include national efforts
13:35:31 <chaals> s/talks about language resources sharing initiative in the context of MetaNet/Stelios: [talks about language resources sharing initiative in the context of MetaNet]/
13:35:59 <chaals> s/talks about the challenges of multilinguality for international organizations/Fernando: [talks about the challenges of multilinguality for international organizations]/
13:36:02 <joerg> ... the main idea of the Meta-Share architecture is distribution based on a "meta schema" model
13:36:04 <chaals> s/Next/Topic:/
13:36:36 <chaals> s/start scribing/Topic:/
13:37:18 <chaals> s/jaap van der meer starts with naming different devices/Topic: Speaker - Jaap van der Meer/
13:37:22 <joerg> ... users/consumers will have the possibility to search, browse and download resources
13:37:56 <chaals> i/... reports on last standard summit in Boston/Jaap: begins by asking for the different ways people call a mobile phone/
13:39:03 <joerg> ... fully supports open source developments including appropriate maintenance
13:40:44 <joerg> ... Meta-Share governance is given by members and associate members; legal issues are under cc
13:42:17 <joerg> Start of discussion of Policy Session 
13:43:38 <joerg> Chaals: Word count is going down does mean translation workload decreases. Speculate on the implications?
13:43:55 <chaals> s/does mean/does not mean/
13:44:17 <chaals> s/implications/implications where in fact we get more complex multimedia to include in the mix/
13:45:07 <joerg> Jaap: Identification of different rating criteria; human interference; word count is unmanageable; more demand
13:45:30 <joerg> ... for MT but with different pricing models
13:46:27 <joerg> Fernando: New challenges through users; relying on help from different sites
13:47:48 <joerg> Stelios: Subtitling has a different approach based on intellectual capabilities needed; time of media content
13:48:07 <joerg> ... multiplied by a certain factor
13:48:42 <joerg> Chaals: What about the question of who owns the data?
13:50:10 <joerg> Steven: In the Netherlands all films are subtitled... quotes a translator: "we are paid by the word". What would the
13:50:22 <joerg> ... integration of MT mean?
13:51:54 <joerg> Stelios: Translation based on a "master file", i.e. the translation pricing model applies.
13:52:50 <joerg> Reinhard: Subtitling for free, i.e. by volunteers?
13:53:54 <joerg> Chaals: Students' translations, shipped to India; there are several models...
13:55:54 <joerg> Stefanov: Some points need to be highlighted: PEs, interpretation vs. translation, different multi-media presentations,
13:56:13 <joerg> ... quality control will change, etc.
13:56:41 <joerg> Chaals: You mean librarians?
13:57:23 <joerg> Stefanov: Not really... the picture is changing.
13:59:17 <chaals> [modern librarians learn to manage digital multimedia collections, and don't have to have their hair in a bun anymore. I am often surprised that they are not present at all at conferences like this - it seems we're missing out on highly relevant expertise]
13:59:50 <joerg> Christian: MT in subtitling already exists, e.g. in Scandinavia. Question on (gov) rules to multimedia?
14:00:45 <joerg> Chaals: I have seen such rules but there are a lot of options.
14:00:53 <joerg> END of Session
14:04:37 <tadej> tadej has joined #mlw
14:10:22 <tadej> tadej has left #mlw
14:11:17 <chaals> s/(gov) rules to/whether there are policies aimed at reducing translation costs by limiting use of/
16:51:26 <RRSAgent> RRSAgent has joined #mlw
16:51:26 <RRSAgent> logging to http://www.w3.org/2011/04/05-mlw-irc
16:51:31 <Steven> rrsagent, here?
16:51:31 <RRSAgent> See http://www.w3.org/2011/04/05-mlw-irc#T16-51-31
16:51:53 <Steven> rrsagent, make minutes
16:51:53 <RRSAgent> I have made the request to generate http://www.w3.org/2011/04/05-mlw-minutes.html Steven