IRC log of mlw on 2011-04-05

Timestamps are in UTC.

07:07:42 [RRSAgent]
RRSAgent has joined #mlw
07:07:42 [RRSAgent]
logging to http://www.w3.org/2011/04/05-mlw-irc
07:07:52 [fsasaki]
meeting: MLW Pisa Workshop, day 2
07:07:54 [jan]
jan has joined #mlw
07:07:55 [fsasaki]
chair: Richard
07:08:00 [tadej]
hm, i think charles was supposed to be scribing?
07:08:00 [fsasaki]
scribe: various
07:08:21 [fsasaki]
topic: Presentation from Dave Lewis et al.
07:08:26 [RRSAgent]
I have made the request to generate http://www.w3.org/2011/04/05-mlw-minutes.html fsasaki
07:08:43 [fsasaki]
Dave: Presenting on CNGL research
07:08:46 [RRSAgent]
I have made the request to generate http://www.w3.org/2011/04/05-mlw-minutes.html fsasaki
07:09:06 [RRSAgent]
I have made the request to generate http://www.w3.org/2011/04/05-mlw-minutes.html fsasaki
07:09:59 [fsasaki]
Dave: multilingual IR, real time social media translation etc. are all part of the aim to support the global customer
07:10:01 [RRSAgent]
I have made the request to generate http://www.w3.org/2011/04/05-mlw-minutes.html fsasaki
07:10:26 [lbellido]
lbellido has joined #mlw
07:12:44 [fsasaki]
Dave: web services - benefits for localisation like "pay as you use" models, easy deployment, ....
07:16:23 [fsasaki]
Dave: industry survey shows barriers for adoption of technology
07:17:50 [Jirka]
Jirka has joined #mlw
07:18:13 [fsasaki]
dave: web services interoperability - needs to be very careful in profiling
07:18:55 [tadej]
I can jump in - although getting someone from the floor would help
07:19:33 [tadej]
yes
07:19:34 [r12a]
r12a has joined #mlw
07:19:46 [tadej]
Dave: proposing employing semantic web technology to the MT use case
07:19:53 [tadej]
cool, thanks
07:20:08 [fsasaki]
dave: semantic web may help to solve the problems we are looking at
07:20:25 [fsasaki]
.. sw is a good mechanism to leverage other things
07:20:58 [fsasaki]
.. tools are maturing
07:21:10 [fsasaki]
.. we are interested in a small part of the sw stack, that is RDF
07:22:04 [luke]
luke has joined #mlw
07:22:06 [fsasaki]
.. RDF is a triple language, everything gets a URI and can be referenced, RDF schema provides some basic modeling methods
07:22:18 [fsasaki]
Dave compares RDF to relational data bases
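A minimal sketch of the triple model described above, in plain Python with no RDF library. The `http://example.org/` URIs and the property names `language` and `translationOf` are invented for illustration and are not taken from the CNGL taxonomy.

```python
# Each RDF statement is a (subject, predicate, object) triple.
# All URIs below are hypothetical examples.
EX = "http://example.org/"

triples = {
    (EX + "doc1", EX + "language", "en"),
    (EX + "doc2", EX + "language", "fr"),
    (EX + "doc2", EX + "translationOf", EX + "doc1"),
}


def match(store, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Find every "translationOf" statement, whatever its subject and object.
translations = match(triples, p=EX + "translationOf")
```

Unlike a relational table with fixed columns, a new kind of statement can be added to such a store simply by using a new predicate URI.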
07:23:32 [tadej]
until now, fsasaki was
07:24:11 [fsasaki]
dave: RDF provides classes, properties, ...
07:24:29 [fsasaki]
.. including multiple inheritance, allows combinations in an interesting way
07:25:36 [fsasaki]
.. the semantic web does not necessarily involve standardization, people just create a vocabulary
07:25:54 [fsasaki]
.. if it is taken up, good - a "survival of the fittest" approach
07:26:48 [fsasaki]
.. existing data can be annotated with RDF - for Web services there is SAWSDL
07:27:05 [fsasaki]
s/for Web/e.g. for Web/
07:27:09 [RRSAgent]
I have made the request to generate http://www.w3.org/2011/04/05-mlw-minutes.html fsasaki
07:28:11 [fsasaki]
dave: developed a seed taxonomy for next generation localisation (NGL) content
07:28:45 [fsasaki]
.. working with many researchers in CNGL to see whether the taxonomy fits their needs, otherwise it is changed
07:29:29 [fsasaki]
s/until now, fsasaki was//
07:29:44 [fsasaki]
s/cool, thanks//
07:30:09 [fsasaki]
dave: have a model refinement cycle for this
07:30:51 [fsasaki]
.. fine-grained roundtrips involving customer, content developer, LSP, translators
07:31:00 [fsasaki]
.. looking into doing this with RDF
07:31:22 [fsasaki]
.. "linked open data" - not focusing so much on reasoning, but to see how to publish data you have
07:31:38 [RRSAgent]
I have made the request to generate http://www.w3.org/2011/04/05-mlw-minutes.html fsasaki
07:31:54 [fsasaki]
dave: triple stores are becoming robust, starting to scale
07:32:22 [fsasaki]
s/hm, i think charles was supposed to be scribing//
07:32:38 [fsasaki]
s/I can jump in - although getting someone from the floor would help//
07:32:59 [fsasaki]
dave: important vocabulary from LOD: open provenance vocabulary
07:33:11 [fsasaki]
.. helpfrul for author, segment and source QA
07:33:19 [fsasaki]
s/helpfrul/helpful/
07:33:25 [RRSAgent]
I have made the request to generate http://www.w3.org/2011/04/05-mlw-minutes.html fsasaki
07:34:08 [fsasaki]
dave: next steps:
07:34:41 [fsasaki]
.. revise semantic model, semantic sandpit, content markup via RDFa, not standardising semantics, testing semantic technology
07:34:54 [fsasaki]
.. access control, etc.
07:35:18 [fsasaki]
dave: real power of SW is its extensibility
07:35:30 [fsasaki]
.. semantic annotations can help to improve interoperability
07:35:41 [fsasaki]
.. provenance linked data can help for roundtripping
07:36:00 [fsasaki]
.. will gather a lot of quality metadata about the content we are localising
07:36:14 [fsasaki]
.. that might be helpful for training statistical MT
07:36:20 [RRSAgent]
I have made the request to generate http://www.w3.org/2011/04/05-mlw-minutes.html fsasaki
07:36:49 [fsasaki]
topic: presentation from alexandra weisgerber
07:38:05 [Steven_]
Agenda: http://www.multilingualweb.eu/documents/pisa-workshop/program
07:38:10 [fsasaki]
alexandra: introducing swinng project, part of the software cluster
07:38:21 [fsasaki]
.. central principle: emergence
07:38:45 [fsasaki]
.. emergent software: enables combination of components and services for digital comparison
07:39:02 [fsasaki]
.. components can come from ERP, BMP, BPI, the Web, ...
07:41:30 [fsasaki]
alexandra: agility to better account for reducing waste, empowering the team and the employee, ...
07:44:00 [fsasaki]
.. challenges: find a balance for right amount of documentation
07:44:37 [fsasaki]
.. had experiences with writing larger user concepts or user concepts on the white board
07:44:44 [Steven]
s|s/hm, i think charles was supposed to be scribing//||
07:45:05 [RRSAgent]
I have made the request to generate http://www.w3.org/2011/04/05-mlw-minutes.html fsasaki
07:45:13 [Steven]
s/hm, i think charles was supposed to be scribing?//
07:45:46 [Steven]
rrsagent, make minutes
07:45:46 [RRSAgent]
I have made the request to generate http://www.w3.org/2011/04/05-mlw-minutes.html Steven
07:46:23 [fsasaki]
alexandra: actions and research areas: include a technical writer in maximum 2 SCRUM teams
07:47:59 [fsasaki]
.. want to set up a controlling to measure software quality and time to market
07:48:44 [fsasaki]
.. difficult task, software quality is hard to measure
07:48:47 [RRSAgent]
I have made the request to generate http://www.w3.org/2011/04/05-mlw-minutes.html fsasaki
07:50:35 [fsasaki]
topic: presentation from Andrejs Vasiljevs
07:50:58 [fsasaki]
Andrejs: talking about challenges for smaller languages
07:51:25 [fsasaki]
.. tools should be provided to help to bridge language barriers esp. for these languages
07:51:59 [fsasaki]
.. unesco is working on a code of ethics, including a demand to represent all linguistic groups in cyberspace
07:52:49 [fsasaki]
.. alvin toffler: "survival of smaller languages depends on outcome of MT versus proliferation of larger languages"
07:52:54 [RRSAgent]
I have made the request to generate http://www.w3.org/2011/04/05-mlw-minutes.html fsasaki
07:53:12 [fsasaki]
Andrejs: Tilde is doing both language technology and localization services
07:53:26 [fsasaki]
.. we can see real needs of users and test new approaches
07:54:11 [lbellido]
lbellido has joined #mlw
07:55:24 [fsasaki]
Andrejs: MT at tilde: first rule-based, switching to data-driven methods in 2008, heavy participation in EU R&D
07:55:31 [fsasaki]
.. about MT development
07:55:51 [fsasaki]
.. not only research, but bring results in tools we provide
07:56:03 [fsasaki]
.. MT, dictionaries widely used in the country
07:56:22 [fsasaki]
.. work with MS research to improve MT engine for our language
07:56:56 [fsasaki]
.. problem of data driven MT: translation quality is low for under-resourced languages
07:58:29 [fsasaki]
.. other challenge is customization: mass-market, online MT-systems are general
07:58:38 [fsasaki]
.. performance is poor for specific domains
07:59:30 [fsasaki]
.. open source tools like GIZA++ or moses are hard to use for the ordinary user, too complex
07:59:33 [ChriLi]
ChriLi has joined #mlw
08:00:02 [fsasaki]
.. strategies to help: see "LetsMT!" project
08:00:17 [fsasaki]
.. building a platform to gather public and user-provided MT training data
08:00:32 [fsasaki]
.. increasing quality, scope and language coverage for MT
08:01:24 [fsasaki]
.. area is "machine translation for the multilingual web"
08:01:34 [ChriLi]
ChriLi has left #mlw
08:01:54 [fsasaki]
.. user survey about IPR of text resources
08:02:04 [fsasaki]
.. there is willingness to share data
08:02:23 [fsasaki]
s/willingness/some willingness/
08:03:26 [fsasaki]
Andrejs: another project "Accurat"
08:03:40 [fsasaki]
.. non-parallel bi- or multilingual text resources
08:03:48 [fsasaki]
.. e.g. multilingual news feeds
08:04:01 [fsasaki]
.. wikipedia articles, multilingual web sites, ...
08:04:20 [fsasaki]
.. these show scale of comparability
08:04:40 [fsasaki]
.. we calculate the comparability
08:05:03 [fsasaki]
.. develop comparability metrics
08:05:26 [fsasaki]
.. develop methods for automatic acquisition of parallel texts
08:05:50 [fsasaki]
.. consortium has both research institutions and SMEs
08:05:56 [RRSAgent]
I have made the request to generate http://www.w3.org/2011/04/05-mlw-minutes.html fsasaki
08:06:54 [fsasaki]
.. tagging MT translated texts would be very helpful
08:07:07 [fsasaki]
.. to be able to distinguish MT translated texts from human translated text
08:07:22 [fsasaki]
.. common interfaces for MT engines would facilitate interoperability
08:07:31 [fsasaki]
.. standardization / BP are needed
08:08:06 [fsasaki]
topic: presentation from Boštjan Pajntar
08:08:40 [fsasaki]
Boštjan: about collecting aligned textual corpora from the hidden web
08:09:32 [fsasaki]
.. aligned parallel corpus: a text alongside its translation(s)
08:09:42 [omstefanov]
omstefanov has joined #mlw
08:09:49 [fsasaki]
.. usage: translation memory, training MT systems, many NLP scenarios
08:10:10 [fsasaki]
.. looked at standards, decided to go for TMX
08:10:31 [fsasaki]
.. XLIFF is in my list in the last bullet point, in brackets
08:10:41 [fsasaki]
.. so XLIFF needs more marketing & development
08:11:16 [fsasaki]
.. getting data: non-english professional web sites
08:11:26 [fsasaki]
.. huge amount of translated text
08:11:37 [fsasaki]
.. in general quality translations
08:11:43 [RRSAgent]
I have made the request to generate http://www.w3.org/2011/04/05-mlw-minutes.html fsasaki
08:12:19 [fsasaki]
Boštjan: problems:
08:12:28 [fsasaki]
.. translation memory is hard to get
08:12:35 [fsasaki]
.. data should have high precision
08:14:32 [fsasaki]
.. no standard fully supports automatic harnessing or cleaning of data
08:15:00 [fsasaki]
.. proposed solution: crawl from the web
08:15:32 [fsasaki]
.. > database > list of HTML candidates > list of text candidates > paralell corpora
08:15:55 [fsasaki]
.. see http://kameleon.ijs.si/t4m for more info
08:16:59 [tadej]
http://kameleon.ijs.si/t4me/
08:18:03 [fsasaki]
s/t4m/t4m\//
08:18:12 [RRSAgent]
I have made the request to generate http://www.w3.org/2011/04/05-mlw-minutes.html fsasaki
08:18:58 [fsasaki]
s/paralell/parallel/
08:19:20 [fsasaki]
Boštjan: we used TMX - is it the right choice?
08:19:25 [fsasaki]
.. source language must be defined
08:19:59 [fsasaki]
.. no need for me to do that, I just have parallel texts for machine consumption
08:20:14 [fsasaki]
.. would need an optional parameter to define the source for each segment
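For readers unfamiliar with TMX, a minimal sketch of the structure under discussion, built with Python's standard `xml.etree.ElementTree`. It shows the `srclang` attribute that the TMX header requires and one translation unit with two language variants. The tool name, the language codes and the example sentence pair are illustrative assumptions, not data from the speaker's corpus.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this qualified name as the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_tmx(pairs, srclang="en", tgtlang="sl"):
    """Build a small TMX document from (source, target) segment pairs."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "example-crawler",  # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "unknown",
        "adminlang": "en",
        "srclang": srclang,  # the header must declare a source language
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for src, tgt in pairs:
        tu = ET.SubElement(body, "tu")  # one translation unit per pair
        for lang, text in ((srclang, src), (tgtlang, tgt)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return tmx


tmx = build_tmx([("Good morning", "Dobro jutro")])
xml_text = ET.tostring(tmx, encoding="unicode")
```

For crawled parallel text the direction of translation is often unknown, which is why a single declared source language is awkward for this use case.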
08:20:39 [joerg]
joerg has joined #mlw
08:20:50 [fsasaki]
.. when you develop a standard, think also about "machines" as users, not only people
08:21:16 [fsasaki]
.. future work: optimization in the areas of two phrase crawling, character encoding, enhanced candidates extraction
08:21:21 [RRSAgent]
I have made the request to generate http://www.w3.org/2011/04/05-mlw-minutes.html fsasaki
08:21:27 [luke]
To answer Bostjan's question about "how many errors are acceptable", the answer (frustratingly for him, I'm sure) is "it depends": Is the text a guide for system administrators or the company homepage? Also: what are the types of errors (people can usually understand text with some grammatical errors, but if the key nouns/verbs are incorrect, it could be confusing/embarrassing).
08:22:23 [fsasaki]
Boštjan: web service for TM memory distribution and filtering (web 2.0 style)
08:22:44 [fsasaki]
topic: presentation from Gavin Brelstaff
08:22:53 [RRSAgent]
I have made the request to generate http://www.w3.org/2011/04/05-mlw-minutes.html fsasaki
08:24:47 [fsasaki]
gavin: interactive alignment of parallel texts
08:25:43 [fsasaki]
.. world wide web: need to think both globally and locally, e.g. in terms of minority languages
08:26:13 [fsasaki]
.. "a seed-bed for poetic expression, beyond mere communication"
08:28:30 [fsasaki]
.. cultural context is important, see R. Jakobson
08:29:08 [fsasaki]
.. there is an osmosis between minority languages and global languages
08:29:11 [fsasaki]
.. everybody becomes a 2nd language speaker
08:29:17 [RRSAgent]
I have made the request to generate http://www.w3.org/2011/04/05-mlw-minutes.html fsasaki
08:30:51 [fsasaki]
Gavin: parallel text alignment <> to communicate semantics
08:31:16 [fsasaki]
.. we have standards-based markup, web delivery cross-browser, non-verbal interactivity ...
08:31:35 [fsasaki]
.. statistical MT will not translate poetry in the next 20-50 years
08:32:47 [fsasaki]
.. we developed a parallel text alignment web interface
08:33:06 [RRSAgent]
I have made the request to generate http://www.w3.org/2011/04/05-mlw-minutes.html fsasaki
08:34:22 [fsasaki]
demo of interactive text alignment
08:36:07 [fsasaki]
.. standards that have been used for the demo: TEI (XML-based) structure
08:36:27 [fsasaki]
.. presented as XHTML, with CSS, JavaScript
08:36:46 [fsasaki]
.. semantics is not RDF, but the TEI structure
08:38:09 [fsasaki]
gavin: beauty of Unicode - one can put multilingual information directly into the content
08:38:29 [fsasaki]
.. pros: we can interact directly with semantics
08:38:46 [fsasaki]
.. w3crange does not work in browsers
08:38:58 [fsasaki]
.. TEI P5 must be subsetted
08:39:03 [fsasaki]
s/pros/pros and cons/
08:39:19 [fsasaki]
.. CSS selection helps with jquery
08:40:02 [fsasaki]
.. some browser issues, does not work everywhere
08:40:10 [fsasaki]
topic: q/a session
08:42:29 [fsasaki]
question about semantic web for MT training
08:42:40 [fsasaki]
dave: have thought about that
08:42:48 [fsasaki]
.. e.g. linking to terminology data bases
08:43:17 [fsasaki]
.. looking into lexical markup, there was a presentation at the last mlw workshop about this
08:43:30 [fsasaki]
.. hot topic in MT; linguistically informed MT
08:46:05 [fsasaki]
discussion about legal issues with gathering corpora via the Web - is it legal at all?
08:46:25 [fsasaki]
Boštjan: lawyers will work on finding that out
08:46:59 [fsasaki]
Alexandra: all languages in our project need to be finished
08:47:30 [fsasaki]
.. depending on the language it is difficult or easier
08:47:58 [fsasaki]
christian: funny to see the same questions, I had the remark on IP too, let's see where this goes
08:48:23 [r12a]
christian lieske: everyone mentioned that categorisation of what we find on the web would help with machine analysis
08:48:23 [fsasaki]
.. not a question, but a remark: all of you mentioned that categorization of what we find on the Web would be helpful for reliable machine analysis
08:48:40 [r12a]
... some communities have a detailed approach to this
08:49:08 [r12a]
... look at last year's w3c day in berlin and you'll see how work on digital libraries may fit well with machine translation
08:49:26 [fsasaki]
(above is presentation from Günther Neher)
08:50:36 [r12a]
??: often pages with the same url that are translated are not exactly the same structure
08:51:06 [fsasaki]
(see, in German, http://www.xinnovations.de/downloads-2010.html?file=tl_files/xinnovations.2010/Download/W3C-Tag/Prof.%20Dr.%20Guenther%20Neher.pdf)
08:51:18 [r12a]
bostjan: we have done little testing so far - about 7000 translations - and it worked well
08:52:13 [r12a]
... our preliminary experiments show that it still works very well, even if there isn't the same content on both sides of parallel text
08:52:48 [r12a]
andrejs: see the FP7 project that is looking how to extract comparable corpora
08:55:15 [r12a]
s pemberton: i'm impressed by willingness to translate poetry - i'm performing in an opera and it took me a while to understand some allusions and references (gives examples)
08:55:25 [r12a]
... i'm amazed that you hope ever to do this
08:55:49 [r12a]
gavinB: our approach is to find the interface - to see how far machines can go
08:56:40 [r12a]
... it is possible to do a translation based on bare bones - even humans can get things wrong...
08:57:04 [r12a]
jorgS: if you have conceptual mismatches, how do you resolve them?
08:57:39 [r12a]
gavinB: this is where the human translator accepts that they need to go away and study it - in our system we mark it up in red
08:57:49 [r12a]
... the translation will never be exact
08:58:15 [r12a]
jorgS: for dave, what do you think of the next generation of content generation based on RDF?
08:59:08 [r12a]
dave: there's still a gap between computational linguists and semantic web folks - there are people looking at how to apply these things, and there are proposals out there
08:59:23 [r12a]
... we're looking at how to integrate those approaches into what we do
08:59:41 [r12a]
jorgS: i'm looking forward to multilingual text generation
09:00:15 [r12a]
lukeS: i was intrigued by gavin's presentation
09:00:38 [r12a]
... seems the best you can do wrt translation is to come up with a separate poem that has the same feel
09:01:11 [r12a]
... but this may be a useful tool for understanding the original material better
09:01:28 [Jirka]
Jirka has left #mlw
09:01:28 [r12a]
... there may be implications for other translation approaches
09:02:43 [r12a]
christianL: i understand the remarks about translating poems with machines - but to me Gavin's talk was about an annotation mechanism based on standards
09:03:02 [r12a]
... there is a need for this approach, and gavin's presentation was inspirational
09:03:47 [r12a]
... more and more accurate annotations are needed, but there are other aspects to translation and gavin's presentation pointed to many useful aspects of this
09:04:13 [RRSAgent]
I have made the request to generate http://www.w3.org/2011/04/05-mlw-minutes.html fsasaki
09:05:46 [chaals]
chaals has joined #mlw
09:34:03 [tadej]
tadej has joined #mlw
09:34:51 [tadej]
topic: Users session, Paula Shannon, Social Media is Global. Now What?
09:35:04 [tadej]
rrsagent, draft minutes
09:35:04 [RRSAgent]
I have made the request to generate http://www.w3.org/2011/04/05-mlw-minutes.html tadej
09:35:35 [Steven]
Steven has joined #mlw
09:37:31 [Steven]
rrsagent, here?
09:37:31 [RRSAgent]
See http://www.w3.org/2011/04/05-mlw-irc#T09-37-31
09:37:40 [tadej]
paula: introducing how social media is changing localization
09:38:36 [tadej]
... showing video on social media
09:40:04 [fsasaki]
fsasaki has joined #mlw
09:40:56 [tadej]
... video emphasizing rapid growth and scale of various SNs, describing the relationship of new generation towards social media
09:42:16 [tadej]
... video focusing on effect of social media on advertising, enabling higher ROI for marketing
09:42:39 [tadej]
... introducing the term "socialnomics"
09:43:35 [Steven]
One mistake in the video - it conflated Internet and Web, so the time to 50M users was for the web, not for the internet
09:44:04 [lbellido]
lbellido has joined #mlw
09:44:53 [tadej]
paula: describing the notion of reputation control via media - the talk will be about showing how this does not hold in presence of social media
09:45:51 [tadej]
... analogy with toddlers as example of parents not being in control
09:47:17 [tadej]
... in social media, the user is in the middle of the system and his worldview actually defines his experience
09:47:38 [omstefanov]
omstefanov has joined #mlw
09:48:17 [tadej]
... emphasizing other social networks than facebook, e.g. hi5, orkut - a reason for their success was the fact that they were localized
09:49:29 [tadej]
... talking about surveys on social media and lionbridge involvement - how people are using social media multilingually
09:50:35 [tadej]
... companies using social media: a quarter of companies are using all 4 platforms - europe and especially asia businesses are growing much
09:50:51 [tadej]
... faster than u.s. companies, likely due to legal issues
09:52:07 [tadej]
... twitter is increasingly popular, fastest growth
09:53:26 [tadej]
... 60% of tweets are non-english, but twitter localized only in 7 languages
09:53:39 [tadej]
rrsagent, draft minutes
09:53:40 [RRSAgent]
I have made the request to generate http://www.w3.org/2011/04/05-mlw-minutes.html tadej
09:54:17 [tadej]
paula: companies engage in hyper-local strategies, twitter account-per-region
09:54:36 [joerg]
joerg has joined #mlw
09:55:41 [tadej]
paula: twitter brought new metric: TPS - tweets per second
09:56:00 [tadej]
scribe: tadej
09:56:58 [tadej]
paula: smartphones becoming the relevant computing platform
09:57:06 [tadej]
rrsagent, draft minutes
09:57:06 [RRSAgent]
I have made the request to generate http://www.w3.org/2011/04/05-mlw-minutes.html tadej
09:58:13 [tadej]
paula: why are companies engaging? because SM allows them to really interact with the users
09:58:51 [tadej]
scribe: tadej, r12a, fsasaki
09:59:40 [tadej]
paula: strategies of social media: 1) single centralized controlled SM outlet
10:00:14 [tadej]
scribes: fsasaki,r12a,tadej
10:00:29 [Steven]
Steven has joined #mlw
10:00:31 [tadej]
rrsagent, draft minutes
10:00:31 [RRSAgent]
I have made the request to generate http://www.w3.org/2011/04/05-mlw-minutes.html tadej
10:01:10 [tadej]
paula: 2) decentralized local pages - more effective, but users have more control
10:01:35 [Steven]
s/scribe: various/scribe: fsasaki/
10:01:41 [Steven]
rrsagent, make minutes
10:01:41 [RRSAgent]
I have made the request to generate http://www.w3.org/2011/04/05-mlw-minutes.html Steven
10:03:15 [Steven]
i/christian lieske: everyone mentioned that categorisation/scribe: r12a
10:03:21 [Steven]
rrsagent, make minutes
10:03:21 [RRSAgent]
I have made the request to generate http://www.w3.org/2011/04/05-mlw-minutes.html Steven
10:03:24 [tadej]
paula: it is still a huge opportunity - example: coca-cola has 250 people who are tasked with buying keywords
10:04:13 [Steven]
i/paula: introducing how social media/scribe: tadej
10:04:20 [Steven]
rrsagent, make minutes
10:04:20 [RRSAgent]
I have made the request to generate http://www.w3.org/2011/04/05-mlw-minutes.html Steven
10:04:45 [tadej]
... important assertions: it is happening quickly, it's huge and growing. instantly available content has more value than quality content
10:06:19 [tadej]
paula: the real-time aspect also affects localization processes - when localizing a message, the process might take too much time
10:07:06 [tadej]
... real-time multilingual communication does not leave space for pre- and post- editing, leaving a lot of human intervention out
10:08:10 [tadej]
paula: last assertion: machine translation is being increasingly more relevant for SM outlets
10:09:17 [tadej]
topic: Presentation of Maarten de Rijke - Emotions, experiences and the social media
10:11:41 [tadej]
maarten: intro - academics are not concerned with standards per se, but with trying to get things done
10:12:19 [tadej]
... talk will be about standards supporting intelligent information access of content
10:13:38 [tadej]
... in social media, people still do the same things, but online instead of offline
10:15:23 [tadej]
... presenting concrete project of a political mashup
10:16:17 [tadej]
... gather political social media content, debates, analyze and semantify it. political scientists are interested in tracking topic ownership
10:17:04 [tadej]
... traditionally, this research was conducted via clasisic clipping, now via social media.
10:17:19 [tadej]
... however, data gathered this way is increasingly multilingual
10:18:22 [tadej]
... another project, CoSyne, about cross-completing wikipedia pages using different language articles on the same topic
10:18:30 [Steven]
s/clasisic/classic
10:19:40 [tadej]
... third example: The Mood of the Web - Livejournal has mood annotated blogs, serving as a stream of mood-annotated data
10:21:59 [tadej]
... when following mood patterns across time, you can try to interpret them, for instance "shocked", "tired"
10:23:49 [tadej]
... what would explain a huge spike in "shocked" in 2008? by combining livejournal streams with news and counting word usage statistics, it turns out that it was the death of actor heath ledger.
10:26:17 [tadej]
... showing a time series on stress measurements, showing a spike at the end of the year - that sort of analyses require a lot of technology for text processing and information extraction
10:27:57 [tadej]
... introducing Fietstas, a multilingual en/nl text processing engine as infrastructure for what was presented
10:29:32 [tadej]
topic: presentation from Gustavo Lucardi - Nascent Best Practices of Multilingual SEO
10:29:56 [tadej]
rrsagent, draft minutes
10:29:56 [RRSAgent]
I have made the request to generate http://www.w3.org/2011/04/05-mlw-minutes.html tadej
10:30:35 [tadej]
gustavo: comparing the SEO process with preparing a gourmet meal
10:31:05 [tadej]
... posing the question, "what are the right ingredients for multilingual SEO?"
10:32:36 [tadej]
... high search engine positioning is very important, holding potential for high revenue
10:33:11 [tadej]
... introducing terms: SEO, MSEO, SMO, Social SEO as different strategies in the field
10:33:55 [tadej]
an important distinction is that whereas in SEO traffic comes from search engines, in SMO traffic comes from social media
10:34:35 [tadej]
gustavo: for example, 500 tweets have more effect than 500 incoming links
10:35:15 [tadej]
... however, SEO still has higher ROI than SMO
10:36:42 [tadej]
... an important concept in SEO is the long tail effect in certain business models
10:38:02 [tadej]
... just translating keywords does bring traffic, but has low conversion rates
10:39:31 [tadej]
... for effective multilingual, international SEO, he recommends the W3C Language Standards as basic rules
10:40:33 [tadej]
... SEO can be multilingual, international or geographical; these are not mutually exclusive.
10:41:24 [tadej]
gustavo: what did we learn doing it:
10:41:26 [tadej]
... 1) focus on the long tail and niche market
10:41:33 [tadej]
... 2) conversions, not traffic
10:41:44 [tadej]
3) things change, iterate
10:43:32 [tadej]
gustavo: showing examples - a legal company campaign was successful once they used correct glossary translations
10:44:11 [tadej]
... healthcare insurance campaign was better once they regionalized their content
10:44:53 [tadej]
... hotel chain: 12 languages, necessary to cover all
10:45:18 [tadej]
topic: presentation from Chiara Pacella - Controlled and uncontrolled environments in social networking websites and linguistic rules for multilingual websites
10:47:00 [tadej]
chiara: the talk will be around control of content and the implications of having a controlled vs. uncontrolled content
10:48:16 [tadej]
... controlled environment - the user does not have influence, the content is relatively static
10:49:18 [tadej]
... in a controlled environment, the developers work with strings with sentences, which are then combined
10:50:02 [tadej]
... in an uncontrolled component, the content is very dynamic, developers have limited control - they combine it with the controlled component before outputting
10:51:21 [tadej]
... even in a single sentence, there may be a combination of controlled and uncontrolled strings
10:51:59 [tadej]
... in the translator's view, the content is treated as token variables
10:52:51 [tadej]
... explaining their approach to i18n: handling languages with gender, number, declensions, etc.
10:53:20 [tadej]
... different languages may have different needs than the source language
10:53:48 [chaals]
s/3) things/... 3) things
10:54:23 [tadej]
... they solve that by "dynamic string explosion", which enables a translator to have multiple translations for the same source string depending on the linguistic context
10:54:46 [tadej]
s/an important distinction/... an important distinction
10:55:55 [tadej]
chiara: ... in romanian, the translator must specify gender, but in finnish and russian, it is even more complicated
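A minimal sketch of the idea behind "dynamic string explosion" as described above: one source string maps to several target strings, and the variant is chosen from the linguistic context of the token. The data structure, the function name and the French example strings are assumptions made for illustration and do not describe Facebook's actual implementation.

```python
# Hypothetical variant table: one source string, several translations
# keyed by the grammatical gender of the person substituted for {name}.
VARIANTS = {
    "{name} is connected.": {
        "male": "{name} est connecté.",
        "female": "{name} est connectée.",
        "unknown": "{name} est en ligne.",  # gender-neutral fallback
    },
}


def translate(source, name, gender="unknown"):
    """Pick the variant for the given gender, falling back to 'unknown'."""
    forms = VARIANTS[source]
    template = forms.get(gender, forms["unknown"])
    return template.format(name=name)


female_form = translate("{name} is connected.", "Marie", "female")
fallback_form = translate("{name} is connected.", "Sam")
```

Languages with number or case agreement would need more keys per string (for example plural category or declension), which is where the "explosion" in the number of variants comes from.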
10:56:36 [tadej]
... an important aspect and the point of this talk is that facebook users are the translators
10:57:47 [tadej]
... considering machine translation, but haven't implemented it yet
10:58:58 [tadej]
... french was translated in 24 hours, released in three weeks, now supporting 67 languages, many released without professional review
10:59:10 [tadej]
... review process:
10:59:26 [tadej]
... 1) translating the glossary of individual terms
10:59:41 [tadej]
... 2) translating the content
10:59:53 [tadej]
... 3) professional supervision and checking
11:00:24 [tadej]
... the tool supports both inline and bulk translation for in- and out- of context translation
11:01:35 [tadej]
... why use community translation: 1) users are domain experts 2) speed 3) reach
11:02:22 [tadej]
... why do users translate: personal satisfaction and pride, leaderboard of translation statistics
11:02:41 [tadej]
rrsagent, draft minutes
11:02:41 [RRSAgent]
I have made the request to generate http://www.w3.org/2011/04/05-mlw-minutes.html tadej
11:03:01 [tadej]
topic: presentation of Ian Truscott, Customizing the multilingual customer experience – deliver targeted online information based on geography, user preferences, channel and visitor demographics
11:04:33 [tadej]
ian: SDL is an international company, and itself also faces the multilingual problem
11:06:26 [jan]
jan has joined #mlw
11:06:38 [tadej]
... big themes: social media and different devices, how information is shaping opinions, relevant content is often in the user's language
11:07:54 [tadej]
... reiterates the point that buyers are sensitive to the language of the content when buying
11:10:40 [tadej]
... while around 50% of tweets are english, it is diminishing
11:11:35 [tadej]
... connecting with visitors: be relevant, listen, understand, engage
11:11:48 [tadej]
... this requires monitoring solution
11:12:12 [tadej]
s/requires monitoring solution/requires monitoring solutions
11:13:07 [tadej]
ian: understanding: finding common interests across languages, demographics and geographies
11:13:18 [tadej]
... it turns out that the common interests are key
11:15:40 [tadej]
... content should be relevant, and better relevance via localisation is reflected in better effectiveness of communication
11:16:37 [tadej]
... presenting the journey of the customer engagement, from research of products to buying and customer support
11:17:37 [tadej]
... for the customer's journey, there's a lot of content with which the user engages that needs to be appropriate
11:18:01 [davidf]
davidf has joined #mlw
11:19:36 [tadej]
... if people are coming to the website, they are trying to get stuff done, so 'user engagement' may be an obstacle
11:20:39 [tadej]
... users' expectations have changed
11:21:00 [tadej]
... they expect content in their own language
11:21:23 [tadej]
topic: Users Q&A
11:22:18 [tadej]
DavidGrunwald: to paula - you haven't discussed whether you have the tools in place to harness social media?
11:24:00 [tadej]
paula: they don't crowdsource, they crowd manage - using input from users of various levels of skills, split the work into tasks and monitor that
11:24:51 [tadej]
DavidGrunwald: you are not letting the crowd control the message, as you claimed in your talk
11:25:12 [tadej]
paula: the content that I am referring to is not always in public or social media
11:25:51 [lbellido]
lbellido has joined #mlw
11:26:36 [tadej]
ian: with social media, you can translate and listen, but you have to be cautious with translating and speaking (with automatic tools)
11:27:02 [tadej]
ian: agree with crowdsourcing, but it needs to be a love brand, for which people want to write
11:28:46 [tadej]
LukeS: on exploding translations in facebook - there is an open source project that unicode consortium supports that handles a subset of the language morphology problem
11:29:44 [tadej]
DanTufiş: to maarten - what theories is your work relying on
11:30:14 [tadej]
maarten: machine translation as core technology, political science as application
11:30:45 [tadej]
DanTufiş: points out Osgood's work on subjectivity with using wordnet to extract sentiment
11:31:44 [tadej]
Steven: points out that in paula's presentation, it was not the internet that took 4 years to 50 million, but the WWW
11:31:53 [tadej]
rrsagent, draft minutes
11:31:53 [RRSAgent]
I have made the request to generate http://www.w3.org/2011/04/05-mlw-minutes.html tadej
11:32:20 [tadej]
tadej has left #mlw
11:32:44 [Steven]
Steven has left #mlw
12:20:26 [Steven]
Steven has joined #mlw
12:33:59 [joerg]
joerg has joined #mlw
12:34:38 [joerg]
start scribing / policy session
12:35:11 [fsasaki]
fsasaki has joined #mlw
12:35:17 [joerg]
jaap van der meer starts with naming different devices
12:35:58 [chaals]
ScribeNick: joerg
12:35:58 [joerg]
... reports on last standard summit in Boston
12:36:38 [fsasaki]
topic: Presentation from Jaap van der Meer
12:36:46 [RRSAgent]
I have made the request to generate http://www.w3.org/2011/04/05-mlw-minutes.html fsasaki
12:37:12 [r12a]
r12a has joined #mlw
12:38:07 [joerg]
... interoperability questionnaire
12:38:39 [joerg]
... and interestedness in standards in particular
12:39:29 [joerg]
... quotes some of the statements regarding costs
12:40:10 [joerg]
... where is the friction? mostly TM followed by terminology
12:41:07 [joerg]
... reasons to support: freedom of tool choice
12:41:31 [Steven]
Steven has joined #mlw
12:42:06 [joerg]
... biggest barriers: lack of compliance, lack of maturity, etc.
12:43:27 [joerg]
... sort of resistance against interop. such as market drop-down
12:43:37 [IanTruscott]
IanTruscott has joined #mlw
12:43:43 [omstefanov]
omstefanov has joined #mlw
12:44:34 [joerg]
... different perspectives of believers
12:47:43 [joerg]
... realists' point of view such as "accept market forces", "show business advantage", "resistance to tools", etc.
12:49:54 [joerg]
... and now the pragmatists: "they have hope..." ;-)
12:51:27 [ChriLi]
ChriLi has joined #mlw
12:52:29 [joerg]
... future outlook (5 years!)
12:52:54 [lbellido]
lbellido has joined #mlw
12:53:46 [jan]
jan has joined #mlw
12:53:51 [joerg]
... content increase, multimedia, mobile, more cross-lingual challenges, ...
12:54:52 [joerg]
... brief SWOT analysis (see other TAUS publications too)
12:57:24 [joerg]
... information pyramid representing content disruption
12:58:07 [joerg]
... apply pyramid to SWOT graphic
12:58:54 [joerg]
... business model attributes: old vs. new
13:00:49 [joerg]
... e.g. TM is core vs. data is core; one- vs. multi-directional; word based pricing vs. SaaS; GMS vs. MT embedded
13:02:46 [joerg]
... enterprises in 5 years will need a language strategy
13:04:39 [joerg]
... last slide: interoperability agenda
13:05:29 [joerg]
... more changes in the next 5 years than in the past 25 years
13:07:02 [joerg]
Next speaker: Fernando Servan
13:08:38 [joerg]
talks about the challenges of multilinguality for international organizations
13:09:25 [joerg]
... gives the context of the food and agriculture organization of the UN
13:10:45 [joerg]
... 6 languages (en, fr, es, arabic, zh, ru); approx. 12 m words/year
13:11:37 [joerg]
... English has the largest share of doc lang.
13:12:50 [joerg]
... websites in 6 lang. and regional relevance content in 3 lang.
13:15:09 [joerg]
... challenges for doc. and web content: tech., prof. profiles, workflow, "consumer" languages
13:18:20 [joerg]
... additional challenges are: rules and regulations, re-use of translations, TM/MT integration
13:19:07 [joerg]
... no analysis or lessons learned available currently
13:20:31 [joerg]
... envision the employment of CMS, CAT-tools, extend prof. profiles, optimize workflows
13:21:34 [joerg]
... under discussion: employment of open source software, cloud services, etc.
13:22:13 [joerg]
... funding could be based on current SME call of the EC
13:23:21 [joerg]
Next speaker: Stelios Peperidis
13:24:13 [chaals]
s/Next/Topic:/
13:24:22 [joerg]
talks about language resources sharing initiative in the context of MetaNet
13:24:39 [chaals]
s/Peperidis/Piperidis/
13:26:40 [joerg]
... introduces the objectives and structure of Meta-Net, focus will be on Meta-Share
13:27:51 [joerg]
... emphasizes the key challenge of data and how it relates to LT research and development
13:29:33 [joerg]
... another important point in the initial discussions was standards
13:31:15 [joerg]
... observations: making data employable is costly
13:32:36 [joerg]
... Meta-Share shall be an open infrastructure that enables interoperability on various layers
13:33:50 [joerg]
... it is also built on existing projects and initiatives that already exist in this broad field
13:35:10 [joerg]
... as an umbrella organization which shall also include national efforts
13:35:31 [chaals]
s/talks about language resources sharing initiative in the context of MetaNet/Stelios: [talks about language resources sharing initiative in the context of MetaNet]/
13:35:59 [chaals]
s/talks about the challenges of multilinguality for international organizations/Fernando: [talks about the challenges of multilinguality for international organizations]/
13:36:02 [joerg]
... the main idea of the Meta-Share architecture is distribution based on a "meta schema" model
13:36:04 [chaals]
s/Next/Topic:/
13:36:36 [chaals]
s/start scribing/Topic:/
13:37:18 [chaals]
s/jaap van der meer starts with naming different devices/Topic: Speaker - Jaap van der Meer/
13:37:22 [joerg]
... users/consumers will have the possibility to search, browse and download resources
13:37:56 [chaals]
i/... reports on last standard summit in Boston/Jaap: begins by asking for the different ways people call a mobile phone/
13:39:03 [joerg]
... fully supports open source developments including appropriate maintenance
13:40:44 [joerg]
... Meta-Share governance is given by members and associate members; legal issues are under cc
13:42:17 [joerg]
Start of discussion of Policy Session
13:43:38 [joerg]
Chaals: Word count is going down does mean translation workload decreases. Speculate on the implications?
13:43:55 [chaals]
s/does mean/does not mean/
13:44:17 [chaals]
s/implications/implications where in fact we get more complex multimedia to include in the mix/
13:45:07 [joerg]
Jaap: Identification of different rating criteria; human interference; word count is unmanageable; more demand
13:45:30 [joerg]
... for MT but with different pricing models
13:46:27 [joerg]
Fernando: New challenges through users; relying on help from different sites
13:47:48 [joerg]
Stelios: Subtitling has a different approach based on intellectual capabilities needed; time of media content
13:48:07 [joerg]
... multiplied by a certain factor
13:48:42 [joerg]
Chaals: Who owns the data question?
13:50:10 [joerg]
Steven: In the Netherlands all films are subtitled... quotes a translator "we are paid by the word"? What would the
13:50:22 [joerg]
integration of MT mean?
13:51:54 [joerg]
Stelios: Translation based on a "master file", i.e. the translation pricing model applies.
13:52:50 [joerg]
Reinhard: Subtitling for free i.e. by volunteers?
13:53:54 [joerg]
Chaals: Student's translations, shipped to India; there are several models...
13:55:54 [joerg]
Stefanov: Some points need to be highlighted: PEs, interpretation vs. translation, different multi-media presentations,
13:56:13 [joerg]
quality control will change, etc.
13:56:41 [joerg]
Chaals: You mean librarians?
13:57:23 [joerg]
Stefanov: Not really... the picture is changing.
13:59:17 [chaals]
[modern librarians learn to manage digital multimedia collections, and don't have to have their hair in a bun anymore. I am often surprised that they are not present at all at conferences like this - it seems we're missing out on expertise that seems highly relevant]
13:59:50 [joerg]
Christian: MT in subtitling already existing, e.g. in Scandinavia. Question on (gov) rules to multimedia?
14:00:45 [joerg]
Chaals: I have seen such rules but there are a lot of options.
14:00:53 [joerg]
END of Session
14:04:37 [tadej]
tadej has joined #mlw
14:10:22 [tadej]
tadej has left #mlw
14:11:17 [chaals]
s/(gov) rules to/whether there are policies aimed at reducing translation costs by limiting use of/
16:51:26 [RRSAgent]
RRSAgent has joined #mlw
16:51:26 [RRSAgent]
logging to http://www.w3.org/2011/04/05-mlw-irc
16:51:31 [Steven]
rrsagent, here?
16:51:31 [RRSAgent]
See http://www.w3.org/2011/04/05-mlw-irc#T16-51-31
16:51:53 [Steven]
rrsagent, make minutes
16:51:53 [RRSAgent]
I have made the request to generate http://www.w3.org/2011/04/05-mlw-minutes.html Steven