IRC log of mlw on 2010-10-26

Timestamps are in UTC.

07:20:11 [RRSAgent]
RRSAgent has joined #mlw
07:20:11 [RRSAgent]
logging to
07:20:24 [fsasaki]
meeting: Multilingual Web Workshop
07:20:28 [fsasaki]
chair: Richard
07:20:34 [fsasaki]
scribe: various
07:20:45 [fsasaki]
07:23:53 [fsasaki]
present: many people
07:23:59 [fsasaki]
topic: welcome
07:28:02 [fsasaki]
welcome address from UPM
07:30:55 [fsasaki]
introduction to the workshop by Richard Ishida (W3C)
07:31:52 [Jirka]
Jirka has joined #mlw
07:33:33 [fsasaki]
Richard introduces the "Multilingual Web" project - aims, goals
07:34:31 [fsasaki]
project homepage
07:37:37 [fsasaki]
topic: Talk from Kimmo Rossi
07:39:31 [Jirka]
Jirka has joined #mlw
07:40:15 [fsasaki]
kimmo: I am project officer of this project. Lot of enthusiasm of participants - a dream team of coverage of different areas
07:40:23 [Jirka_]
Jirka_ has joined #mlw
07:40:41 [fsasaki]
.. so project can really make a difference for the multilingual web
07:40:58 [Jirka_]
Jirka_ has left #mlw
07:41:28 [Jirka__]
Jirka__ has joined #mlw
07:42:15 [fsasaki]
.. this project is about much more than standardization
07:43:09 [Jirka]
Jirka has joined #mlw
07:43:14 [fsasaki]
.. EC has made a commitment on the "digital agenda" in Europe
07:43:40 [fsasaki]
.. how communication technology can help to solve European / global challenges
07:44:45 [fsasaki]
.. trying to boost innovation and faster uptake of research results by the industry
07:45:11 [fsasaki]
.. 8th framework program now "research and innovation" as focus
07:45:47 [joerg]
joerg has joined #mlw
07:45:53 [fsasaki]
.. we need good input to discussions about 8th framework program. Nobody knows how the Web will look in 10 years
07:47:15 [fsasaki]
.. so we want to make use of the opinions of the stakeholders - people like you
07:47:51 [fsasaki]
.. scribing sessions is a very important job. Scribes should help us to get conclusions & recommendations to process outcomes of this event
07:49:31 [fsasaki]
.. my job is to sell money to people who have good ideas. I make an attempt to convince them to work in / on European projects
07:49:45 [fsasaki]
.. 50 mill. Euros for projects in Language Technology available
07:50:14 [fsasaki]
.. areas: multilingual content processing, including Machine Translation, chain of authoring / managing multilingual online content
07:50:40 [fsasaki]
s/multilingual/one is multilingual/
07:50:50 [RRSAgent]
I have made the request to generate fsasaki
07:51:12 [fsasaki]
.. another area: on multilingual information access & mining
07:51:48 [fsasaki]
.. and third area: natural speech interaction
07:52:10 [fsasaki]
.. call I just described is ongoing, we are taking submissions now
07:52:33 [fsasaki]
.. a future call for 1st February 2011: SME initiative for digital content and languages
07:53:05 [fsasaki]
.. focused on SME, but consortia can encompass also large companies & research institutes. At least 2 SMEs need to be involved
07:53:54 [fsasaki]
.. data sharing & pooling
07:56:33 [fsasaki]
.. mlw project is about standards. Can we address standards in upcoming calls?
07:56:50 [fsasaki]
.. yes, we can. But rather than having a project developing standards, put standards in action
07:57:02 [fsasaki]
.. build something useful around the standard
07:57:22 [fsasaki]
.. thank you for your time, enjoy the workshop!
07:57:31 [paaln]
paaln has joined #mlw
07:57:32 [fsasaki]
topic: keynote from Reinhard Schäler
07:59:55 [fsasaki]
"The Multilingual Web, Policy Making and Access to Digital Knowledge for All"
08:04:54 [fsasaki]
reinhard: we made a survey on mlw - see current state of results at
08:05:47 [fsasaki]
.. standards and commercial interests are sometimes in a difficult relation
08:08:53 [fsasaki]
.. there are about 800.000 standards around
08:10:23 [fsasaki]
.. getting a standard through requires political cleverness, friends, power to push against strong interests
08:10:57 [fsasaki]
.. standards in localization: Encoding (Unicode), Quality (e.g. LISA Q/A), data exchange (XLIFF, TMX), Metrics
08:12:06 [fsasaki]
.. Unicode was successful since industry players came together and just did it, also giving up their own existing work on encoding
08:13:15 [fsasaki]
.. people are not so much interested in standards, but in what they can make with them
08:14:10 [fsasaki]
.. expectations and reality are often different things - e.g. sometimes people say they support XLIFF, but they "just" can read / import / export XML files
08:17:51 [fsasaki]
.. comparison to a bus stop - do you want to be in a standardized environment with the bus on time, or in an environment with the bus being delayed and you have time to talk to your friends?
08:18:59 [fsasaki]
.. standards means making compromises. You don't want to wait for a committee, you just do it
08:19:11 [fsasaki]
.. also you don't want to cooperate with your competitors
08:22:25 [fsasaki]
.. data exchange and process management are important too
08:22:50 [fsasaki]
.. you always want to keep an advantage compared to your competitors
08:23:10 [fsasaki]
.. where are we? 19 billion $ industry
08:23:26 [fsasaki]
.. but highly fragmented. Some (2,3,4) dominant players
08:24:34 [fsasaki]
.. short term ROI oriented
08:25:01 [fsasaki]
.. localization industry was established in the 80s since companies want to sell products in many regions / countries
08:25:58 [fsasaki]
.. LOC people don't look ahead long term, because of short term ROI
08:26:06 [fsasaki]
.. so who can drive mlw?
08:26:41 [RRSAgent]
I have made the request to generate fsasaki
08:28:18 [fsasaki]
.. we made a survey, see results link here
08:29:03 [fsasaki]
.. many people want to have the multilingual web - so why isn't it happening?
08:30:38 [fsasaki]
.. localization for all is now in focus:
08:31:02 [fsasaki]
.. more people / languages / content, user drive / own / manage the content
08:31:13 [fsasaki]
.. networks become standards based & interoperable
08:31:48 [fsasaki]
.. that is happening in the non L10N world - why couldn't it happen also in the localization world?
08:31:58 [fsasaki]
.. companies have to give up illusion of control
08:33:06 [fsasaki]
.. focus has to be on implementations and benefits for the people
08:33:20 [fsasaki]
.. there is a fundamental right: access to information
08:33:35 [fsasaki]
.. that does not need to be judged by business case
08:35:06 [fsasaki]
.. drivers for this change: maybe not large corporations, but nonprofit sector
08:35:26 [fsasaki]
.. nonprofit translation is the world's largest translation movement
08:35:44 [fsasaki]
.. motivation is to make the world a better place
08:36:55 [fsasaki]
.. a forum to achieve that: Internet Governance Forum (IGF)
08:37:01 [fsasaki]
.. UN's IGF working group
08:39:44 [fsasaki]
.. standards, access to technology, and skill are important
08:42:44 [fsasaki]
.. as we can see in the "Close Encounters Of The Third Kind" clip
08:43:37 [fsasaki]
.. if aliens could talk, we would not understand them but we could try to develop technologies to achieve that
08:44:59 [fsasaki]
08:55:50 [E_N]
E_N has joined #mlw
09:10:43 [Jirka]
Jirka has joined #mlw
09:10:54 [Jirka]
scribe: Jirka
09:11:10 [Jirka]
topic: Developers session
09:12:11 [Jirka]
chaired by Adriane Rinsche
09:13:00 [RRSAgent]
I have made the request to generate Jirka
09:19:40 [xxx_]
xxx_ has joined #mlw
09:19:58 [fsasaki]
fsasaki has joined #mlw
09:20:48 [Jirka]
Adriane: announcements
09:21:06 [Jirka]
... Mark Davis (Google) will videocast tomorrow at 16:30
09:21:52 [Jirka]
topic: The Multilingual Web: Latest developments at the W3C/IETF
09:22:05 [chaals]
chaals has joined #mlw
09:22:06 [Jirka]
by Richard Ishida (W3C)
09:22:25 [chaals]
rrsagent, draft minutes
09:22:25 [RRSAgent]
I have made the request to generate chaals
09:23:02 [Jirka]
Richard: introduces W3C
09:23:31 [chaals]
s/scribe:various/scribe: Felix/
09:23:35 [Jirka]
... 22 activities, 50 working groups and more
09:23:45 [chaals]
s/scribe: various/scribe: Felix/
09:24:03 [chaals]
i/scribe: Felix/scribenick: fsasaki/
09:24:14 [Jirka]
... internationalization activity is part of W3C work
09:24:25 [chaals]
rrsagent, draft minutes
09:24:25 [RRSAgent]
I have made the request to generate chaals
09:24:40 [Jirka]
... standards supporting multilingual web are
09:25:21 [Jirka]
... Unicode, W3C technology is built on top of Unicode
09:25:30 [chaals]
scribenick: Jirka
09:25:50 [Jirka]
... 70% of web pages are using Unicode encodings
09:26:56 [Jirka]
... some mistakes solved recently, XML 5ed extends characters for identifiers
09:27:41 [Jirka]
... Unicode normalization, W3C proposed to use NFC form
09:28:52 [Jirka]
... work on allowing national characters in resource identifiers -- IDN (Internationalized Domain Names)
09:29:40 [Jirka]
... in June IANA started to release internationalized top-level domain names
09:30:16 [Jirka]
... IRI allows internationalizing the path part of a resource identifier
09:31:02 [Jirka]
... language tags, 8000 subtags, described in BCP47
09:32:15 [Jirka]
... Speech Synthesis Markup Language
09:32:29 [RRSAgent]
I have made the request to generate Jirka
09:33:25 [Jirka]
... CSS3 adds more internationalization support
09:35:22 [Jirka]
... browser implementers have to support all details in different languages
09:36:11 [Jirka]
... browser developers need to have reason to implement support for i18n features
09:36:34 [Jirka]
... they need to hear from users that i18n features are critical
09:37:12 [Jirka]
... vertical text is needed in Japan, Korea, Thailand, China; covered by CSS3 module
09:38:31 [Jirka]
... there are problems with mixing various scripts
09:38:52 [Jirka]
... Ruby annotation
09:39:11 [Jirka]
... there is CSS3 Ruby module
09:39:44 [Jirka]
... HTML5 has implemented Ruby but differently than other specs, need for convergence
09:40:10 [chaals]
[HTML5 tried to copy IE's implementation to be interoperable with existing usage (which is why it is different from the original spec)]
09:40:18 [Jirka]
... plans to support complex Ruby
09:41:21 [chaals]
[/me is excited about the requirements for layout, because it has motivated groups in a number of other languages to do the same]
09:41:23 [Jirka]
... Requirements for Japanese Layout are used as an input to several specs, including XSL-FO, HTML, CSS
09:41:59 [Jirka]
.. Web Fonts - ability to use custom fonts in web pages
09:42:12 [Jirka]
s/.. /... /
09:42:51 [Jirka]
... there are still subsetting and licensing issues
09:43:52 [Jirka]
... HTML5 - language identification, ability to specify dates in standardized way [using uF approach]
09:44:12 [Jirka]
... HTML5 new input types for forms
09:44:42 [Jirka]
... issues with bidirectional markup
09:45:56 [Jirka]
... there are additional requirements related to ordering and alignment of text
09:46:34 [Jirka]
... MathML 3.0 supports Arabic math typesetting
09:46:49 [Jirka]
... ITS -- there will be separate talk by Christian
09:47:02 [Jirka]
... The rise of Mobile Web
09:47:27 [Jirka]
... MW4D
09:48:05 [Jirka]
... Best practices developed by W3C
09:48:17 [Jirka]
... there are also tests
09:48:48 [RRSAgent]
I have made the request to generate Jirka
09:49:37 [chaals]
-> W3C Internationalisation Activity (lots of useful links and things)
09:51:00 [Jirka]
... I18N Checker
09:51:34 [Jirka]
... there is also MobileOK checker
09:51:53 [Jirka]
... Web is about people, not about technology
09:52:15 [Jirka]
... we need you to make Web worldwide
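Richard's point about Unicode normalization and the NFC form can be illustrated with a minimal sketch. It uses Python's standard `unicodedata` module; the helper name `to_nfc` and the sample strings are illustrative assumptions, not something shown in the talk.

```python
import unicodedata

# Two ways to write "é" that look identical on screen.
composed = "\u00e9"      # single precomposed code point
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT


def to_nfc(text):
    """Return text in Unicode Normalization Form C (composed form)."""
    return unicodedata.normalize("NFC", text)


# Without normalization the strings compare as different.
print(composed == decomposed)
# After NFC normalization both are the same sequence of code points.
print(to_nfc(composed) == to_nfc(decomposed))
```

Normalizing both sides before comparing, searching or using text as an identifier avoids mismatches between visually identical strings.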
09:52:47 [Jirka]
topic: Localizing the web from the Mozilla perspective
09:53:01 [Jirka]
by Axel Hecht (Mozilla)
09:53:27 [Jirka]
s/the web/the Web/
09:54:27 [Jirka]
Axel: Firefox 4 will change User-Agent header, no more locale info here, use Accept-Language instead
09:54:42 [Jirka]
... 80+ localizations
09:54:56 [Jirka]
... community driven
09:56:00 [Jirka]
... it is a challenge to make everything work on all platforms
09:56:17 [Jirka]
... negotiating content language
09:56:39 [Jirka]
... balance between best content and user privacy
09:57:37 [Jirka]
... problems in JavaScript, e.g. Date.toLocaleString() is not truly i18n
09:58:36 [Jirka]
... better APIs -- for BCP 47, site-specific Accept-Language
09:59:20 [Jirka]
Axel: web sites at Mozilla
09:59:28 [Jirka]
... mostly static content
10:00:00 [Jirka]
... locale-dependent content
10:00:32 [Jirka]
... data-driven sites, not easy
10:00:59 [Jirka]
... live multi-lingual documents, like documentation, knowledge base, ...
10:01:41 [Jirka]
... how to differentiate between an added translation and a bug fix in content that should be propagated to pages in other languages
10:02:02 [Jirka]
... international feedback button
10:04:06 [Jirka]
... several existing wiki systems used, none really sufficient
10:04:12 [Jirka]
... developing own Kitsune system
10:05:19 [Jirka]
... question: what functionality is missing in the browsers (in general or in Firefox)?
10:05:39 [Jirka]
... question: localizing HTML5 content on the client
10:05:59 [Jirka]
... question: managing live multilingual docs
10:06:20 [Jirka]
question from auditorium about handling speech
10:06:51 [Jirka]
Axel: I'm not working in speech area
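Axel's advice to use Accept-Language rather than the User-Agent string for negotiating content language can be sketched on the server side as follows. This is a simplified illustration, not Mozilla's implementation: the function names, the default language and the fallback to the primary subtag are assumptions, and a production server would follow the full HTTP and BCP 47 matching rules.

```python
def parse_accept_language(header):
    """Parse an Accept-Language header into (tag, q) pairs, best first."""
    prefs = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        q = 1.0  # a missing q-value means full preference
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat a malformed weight as "not acceptable"
        prefs.append((tag.strip().lower(), q))
    # sorted() is stable, so header order is kept among equal weights.
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


def pick_language(header, supported, default="en"):
    """Pick the best supported language, falling back to the primary subtag."""
    by_lower = {tag.lower(): tag for tag in supported}
    for tag, q in parse_accept_language(header):
        if q <= 0:
            continue
        if tag in by_lower:
            return by_lower[tag]
        primary = tag.split("-")[0]
        if primary in by_lower:
            return by_lower[primary]
    return default


print(pick_language("de-AT,de;q=0.8,en;q=0.5", ["en", "de", "cs"]))
```

The trade-off Axel mentions still applies: a long, precise Accept-Language list gives better content matches but also reveals more about the user.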
10:07:49 [Jirka]
topic: The Web everywhere, multilingualism at Opera
10:08:02 [Jirka]
by Charles McCathieNevile (Opera)
10:08:43 [Jirka]
Chaals: Opera was created especially to support non-English web
10:09:08 [Jirka]
... Opera supports all kinds of devices
10:09:23 [RRSAgent]
I have made the request to generate Jirka
10:09:47 [Jirka]
Chaals: some key markets:
10:10:07 [Jirka]
... Japanese -- how to make it work on mobile phones
10:10:24 [Jirka]
... Russian
10:10:38 [Jirka]
... Vietnamese -- multiple diacritics over one character
10:10:48 [Jirka]
... India, Iran -- phones
10:11:04 [Jirka]
... technology/standards used
10:11:22 [Jirka]
... originally used UTF-16, good for CJK efficiency
10:11:45 [Jirka]
... UTF-8 is better for real world
10:11:55 [Jirka]
... using gettext PO files for software
10:12:07 [Jirka]
... content for l10n is handled in XLIFF
10:12:57 [Jirka]
... translation of Opera Desktop by agencies and volunteers (for minor languages)
10:13:27 [Jirka]
... Opera Mini translated only by agencies; 100+ languages, because of space constraints
10:13:43 [Jirka]
... Widgets and extensions are translated by developers
10:14:58 [Jirka]
... My.Opera content translated by community
10:15:05 [Jirka]
... issues:
10:15:18 [Jirka]
... XLIFF more complicated than Opera needs
10:15:58 [Jirka]
... translators' tools -- we use open-source tools and some in-house stuff
10:16:37 [Jirka]
... different translation agencies use different software, so it is hard to change agencies
10:17:01 [Jirka]
... word-breaking dictionaries are getting large
10:17:23 [Jirka]
... problem for embedded devices (TV, game consoles, ...)
10:18:20 [Jirka]
... layout (RTL, vertical), scary part of browser code, very expensive to run
10:18:39 [Jirka]
... without clear message from users there is no interest in touching this complex code
10:18:50 [Jirka]
... people issues:
10:19:30 [Jirka]
... people don't understand how translation works
10:20:12 [Jirka]
... now everything is translated from English
10:20:52 [Jirka]
... we are trying to use multiple source languages to cater to translators
10:21:18 [Jirka]
... it is harder to maintain quality of such translations, but you can have larger translator community
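Chaals's remark that UTF-16 is efficient for CJK while UTF-8 is better "for real world" can be made concrete by comparing encoded sizes. The sketch below is one plausible reading, assuming that real pages mix CJK text with ASCII markup; the sample strings and the helper `encoded_sizes` are illustrative only.

```python
def encoded_sizes(text):
    """Return the number of bytes text needs in UTF-8 and UTF-16."""
    return {
        "utf-8": len(text.encode("utf-8")),
        # The -le variant avoids counting a byte order mark.
        "utf-16": len(text.encode("utf-16-le")),
    }


cjk_text = "多言語ウェブ"        # six characters from the Basic Multilingual Plane
markup = '<p lang="ja"></p>'   # pure ASCII markup around the text

print(encoded_sizes(cjk_text))           # CJK alone: UTF-16 is smaller
print(encoded_sizes(markup))             # ASCII markup: UTF-8 is smaller
print(encoded_sizes(markup + cjk_text))  # mixed content, as on a real page
```

Each of these CJK characters takes three bytes in UTF-8 and two in UTF-16, while every ASCII character takes one byte in UTF-8 and two in UTF-16, so markup-heavy pages tend to favour UTF-8.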
10:22:14 [Jirka]
presentation suddenly ends with alarm sound
10:22:36 [Jirka]
topic: Bridging languages, cultures, and technology
10:22:55 [Jirka]
by Jan Nelson (Microsoft), Peter Constable (Microsoft)
10:23:50 [chaals]
s/expensive to run/expensive to change/
10:24:51 [chaals]
a/complex code/, so the fact that communities have started being clear about a need for this and writing documents on what is required is important
10:25:33 [chaals]
s/are trying/have tried/
10:26:13 [chaals]
[My presentation had got to the end exactly when the alarm sounded! I thought that was perfect timing :P ]
10:26:18 [Jirka]
Jan: MS does a lot of translation, l18n and globalization
10:26:34 [chaals]
rrsagent, draft minutes
10:26:34 [RRSAgent]
I have made the request to generate chaals
10:26:37 [Jirka]
... Microsoft research working on Microsoft Translate service
10:27:15 [Jirka]
... WikiBhasha -- browser-based multilingual content creator for Wikipedia
10:27:38 [Jirka]
10:28:51 [Jirka]
... open-source tool, currently supporting 35 languages
10:29:49 [Jirka]
... Microsoft Local Language Program
10:31:12 [Jirka]
Peter: about enabling mlw web
10:31:23 [Jirka]
... IE is localized in 95 languages
10:32:16 [chaals]
i/welcome address from UPM/scribenick: fsasaki
10:32:19 [chaals]
rrsagent, draft minutes
10:32:19 [RRSAgent]
I have made the request to generate chaals
10:32:55 [Jirka]
s/presentation suddenly ends with alarm sound/
10:33:39 [Jirka]
Peter: pages in UTF-8 are growing (over 50% of content)
10:34:00 [chaals]
s/[My presentation had got to the end exactly when the alarm sounded! I thought that was perfect timing :P ]//
10:34:14 [Jirka]
... HTML/CSS is ready for multilingual web
10:34:48 [Jirka]
... issues in separation of content and application code
10:35:16 [Jirka]
... client-server interaction issues -- handling preferred language when travelling, ...
10:36:40 [RRSAgent]
I have made the request to generate Jirka
10:38:42 [Jirka]
Peter: shows some examples of rendering various scripts on HTML page and on HTML5 canvas
10:39:08 [Jirka]
10:39:49 [Jirka]
Question for Axel: Why have you changed between several Wiki implementations? Was it because of poor localization support?
10:40:42 [Jirka]
Axel: content developers want visual editing (no wiki codes). For other wikis I don't know.
10:42:11 [Jirka]
Question from Christian Lieske (SAP): There are some issues in handling i18n content by WebKit-based applications. Is this caused by the core (WebKit) or by the applications?
10:42:44 [Jirka]
Axel: there are several rendering engines; no representative from a project using WebKit is here
10:44:25 [Jirka]
Chaals: If there is no functionality in the rendering engine, it simply doesn't work and web developers have to find workarounds. Parties have to work together to implement the most demanded features.
10:45:35 [Jirka]
Question from David (BBC World Service): Who is responsible for rendering complex scripts -- browser engine or underlying operating systems?
10:46:37 [Jirka]
Peter: In Windows we serve more than browsers, so we have this functionality in OS. Some browsers use Windows functionality, some depend on their own rendering engine.
10:47:49 [Jirka]
Chaals: some devices don't have any such support, we have this in browser engine to support various devices
10:49:10 [Jirka]
Axel: It depends on the platform.
10:49:40 [Jirka]
... you have to have good fonts for scripts/languages. This is not easy for minor languages
10:51:01 [Jirka]
... the problem is that developers of rendering engines don't have knowledge of foreign languages
10:52:03 [Jirka]
Question from Jörg Schultz (BioLoom): There are two groups working on HTML5 -- WHATWG and W3C HTML WG. How will this evolve?
10:52:20 [Jirka]
Chaals: HTML5 spec will be produced by W3C.
10:53:01 [Jirka]
... WHATWG is very informal and open place for developers playing with possible HTML5 features.
10:53:30 [Jirka]
... some features from WHATWG were removed from W3C HTML5
10:56:43 [Jirka]
Question from Felix Sasaki: Do you see a need for a common way of marking up what should/should not be translated?
10:57:20 [Jirka]
Chaals: It is not a matter for the browser; the browser doesn't do translation.
10:58:00 [Jirka]
Richard: Google supports it, Hixie (editor of HTML5) dismissed this feature proposed by Microsoft [not sure if I scribed it correctly]
10:58:22 [chaals]
s/Google/Google translation/
10:58:56 [Jirka]
Axel: if we are going to support localization directly in browser, we will consider supporting it
10:59:20 [Jirka]
Peter: this is not relevant for browser, but for upstream process when content is created and translated
10:59:29 [chaals]
s/doesn't do translation./doesn't do translation. You can use XHTML and add that right now, and it will not be a problem/
11:00:03 [chaals]
s/proposed by Microsoft [not sure if I scribed it correctly]/proposed by Microsoft for inclusion in HTML5/
11:01:22 [Jirka]
Closing question: where is MLW going in the next couple of years?
11:01:55 [Jirka]
Richard: [missed this]
11:02:25 [Jirka]
s/Richard: [missed this]//
11:03:17 [chaals]
Richard: There are a huge number of things to work on...
11:03:41 [Jirka]
Chaals: A lot of things deserve better support, eg. vertical text
11:04:08 [Jirka]
Jan: work together with local governments on support of more languages
11:04:33 [RRSAgent]
I have made the request to generate Jirka
12:06:16 [fsasaki]
fsasaki has joined #mlw
12:06:36 [RRSAgent]
I have made the request to generate fsasaki
12:06:36 [chaals]
chaals has joined #mlw
12:07:49 [fsasaki]
scribe: fsasaki
12:08:42 [fsasaki]
topic: creators session introduction
12:09:03 [fsasaki]
chaals: session about making content
12:09:07 [fsasaki]
chaals introduces the speakers
12:09:46 [fsasaki]
topic: Challenges for a multilingual news provider: pursuing best practices and standards for BBC World Service
12:10:01 [fsasaki]
talk by Roberto Belo Rovella, David Vella
12:10:48 [fsasaki]
roberto: in charge of bbc world services
12:11:07 [fsasaki]
.. our focus now on online platforms
12:12:56 [fsasaki]
.. we are a multilingual site, but each site has its own editors, i.e. not direct translation
12:14:17 [fsasaki]
.. recently re-released news site in 32 sites, with new fonts and Unicode support
12:15:48 [fsasaki]
s/news site in 32 sites/news site in Burmese/
12:16:09 [fsasaki]
Roberto gives over examples for Chinese market
12:17:07 [fsasaki]
later Uzbek site
12:17:13 [fsasaki]
12:17:42 [chaals]
[Chinese: produced in simplified Chinese, can be auto-converted to traditional]
12:18:17 [chaals]
[Uzbek site: Uzbekistan is moving from writing in Cyrillic to using the same language in Latin. Plus Arabic]
12:18:55 [fsasaki]
roberto: lots of other markets, e.g. Brazil
12:19:06 [fsasaki]
.. 5% of the BBC traffic
12:20:29 [fsasaki]
.. we have to package content (javascript, XML, CSS, ....) in some cases to deliver it properly
12:21:18 [fsasaki]
.. past: challenges for multilinguality: creating fonts / input methods "from scratch"
12:22:28 [fsasaki]
.. operating systems get better, so this has become much easier
12:23:12 [fsasaki]
.. BBC was one of the first content providers to publish in Unicode
12:23:51 [fsasaki]
.. was a hard challenge. Some websites offered content as GIFs
12:24:03 [fsasaki]
.. people said "why don't you use a font we are already using?"
12:24:26 [fsasaki]
.. in the end we prevailed, Unicode succeeded. But we lost a lot of the market
12:25:59 [fsasaki]
.. in some languages we had to use English. Urdu now works on the local script
12:26:35 [fsasaki]
.. new look for Arabic website, with new fonts
12:27:03 [fsasaki]
.. publication depends also on other parts of the website, like a localized video player
12:27:44 [fsasaki]
.. there are still web sites offering images instead of fonts, but solutions are coming
12:28:09 [fsasaki]
.. currently on a mobile displaying Hindi, a user sees only boxes
12:28:38 [fsasaki]
.. iOS 4 is close to the correct rendering
12:29:00 [fsasaki]
.. reading with that display is very hard
12:29:23 [fsasaki]
.. 70% of devices in India cannot display the Hindi text properly
12:29:49 [fsasaki]
.. we created an image based solution, in addition to the text based one
12:30:07 [fsasaki]
.. we publish both, and have links from the images to the text on the paragraph level
12:30:34 [fsasaki]
.. so we ignored the (W3C) advice of not using images for character display
12:30:54 [fsasaki]
.. we will not replace the text based version in the CMS
12:31:09 [fsasaki]
.. images display on every device, we control the rendering
12:31:32 [fsasaki]
.. we used Pango text rendering library
12:31:44 [fsasaki]
.. average page size is 45 KB, text only is 20 KB
12:32:02 [fsasaki]
.. launched Aug 2010, Hindi mobile traffic up by 50%
12:32:12 [fsasaki]
.. this is only temporary
12:32:29 [fsasaki]
.. Nokia and Samsung, they only localize the UI, so the situation is not changing
12:33:04 [fsasaki]
.. new mobile phones use their own software, no standard solution
12:33:25 [fsasaki]
.. not standards based, but they have 30% market share in India, we can't ignore them
12:34:11 [chaals]
s/new mobile phones/white-label or cheap brand lookalike/
12:35:05 [fsasaki]
.. collaboration with Google. Users see only messages in their own language, but real time translation in / from other languages
12:35:34 [fsasaki]
.. size of areas on a page is changing depending on the languages involved
12:36:13 [chaals]
s/from other languages/from other languages. Results were interesting (but not always coherent...)/
12:36:27 [fsasaki]
roberto: wishful thinking: create once, publish everywhere
12:36:51 [fsasaki]
.. encourage proper font rendering
12:37:58 [fsasaki]
.. offer language expertise to mobile manufacturers
12:38:07 [fsasaki]
.. rapidly deprecate support for older browsers
12:39:58 [fsasaki]
alex o conner (CNGL): have you thought of processing binary assets?
12:40:27 [fsasaki]
(scribe missed answer)
12:41:06 [chaals]
[The images are generated on publishing, and left as static. If you correct, you regenerate new images (not an expensive process, like printing)]
12:41:47 [fsasaki]
topic: presentation from Loquendo
12:42:55 [fsasaki]
paolo: many people cannot read, in many circumstances you don't have written language
12:43:01 [fsasaki]
.. need to handle speech too
12:43:24 [fsasaki]
.. in the last 10 years W3C started a voice browser and multimodal working group
12:43:56 [fsasaki]
.. today quite a few parts of speech processing are controlled by standards
12:44:14 [fsasaki]
.. other important thing is the language subtag registry
12:44:53 [fsasaki]
.. example: small speech application asking "what do you want to drink"
12:45:22 [fsasaki]
.. recognition of speech means: you have to create grammars
12:45:30 [fsasaki]
.. even the grammar has xml:lang
12:45:39 [fsasaki]
... you can also create a multilingual grammar
12:46:04 [fsasaki]
.. that uses language identifiers for defining and re-defining the language
12:46:39 [fsasaki]
.. tts means e.g. "reading a book"
12:46:51 [fsasaki]
.. richard mentioned SSML. We developed 1.0, later 1.1
12:47:12 [fsasaki]
.. Chinese people said that they need improvements
12:47:36 [fsasaki]
.. now (in 1.1) there is a tag to specify the language also for pieces of text
12:47:45 [fsasaki]
.. another point: you need to have a voice speaking
12:48:05 [fsasaki]
.. other application areas: dubbing or gaming
12:48:30 [fsasaki]
.. you can be more precise in SSML 1.1, e.g. for specifying the accent
12:49:35 [fsasaki]
(very nice :) ) demo of phonetic mapping to change spoken language
12:49:45 [fsasaki]
.. e.g. English spoken by Germans
12:50:07 [fsasaki]
.. another standard: PLS 1.0, a lexicon used to correct errors
12:50:18 [fsasaki]
.. for specific words, like locations, proper names
12:51:02 [fsasaki]
.. application for TTS or speech recognition
12:51:30 [fsasaki]
.. BCP 47 did a lot of things, but there is a need to standardize phonetic alphabets in more detail
12:52:48 [fsasaki]
.. development tool - LoquendoTTs director editing tool
12:53:07 [fsasaki]
.. speech is another way of using the web
12:53:14 [fsasaki]
.. standards help to create speech applications
12:53:37 [fsasaki]
.. work by IANA / BCP 47 is good, but need to extend it for phonetic alphabets
12:54:08 [fsasaki]
peter_constable(Microsoft): your request about phonetic alphabets
12:54:22 [fsasaki]
.. there are subtags registered to denote that content is in IPA
12:54:47 [fsasaki]
.. or other phonetic alphabets
12:55:04 [fsasaki]
paolo: did not follow the discussion in detail
12:55:25 [fsasaki]
.. but idea was to have two registries, one for all subtags, one for phonetic alphabet only
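Paolo's point that SSML 1.1 can mark the language of individual pieces of text can be sketched with a small parser. The example assumes the SSML 1.1 `lang` element and the standard `xml:lang` attribute; the sample document and the helper `language_spans` are illustrative and are not taken from the Loquendo presentation.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the fixed XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

SAMPLE = """<speak version="1.1"
       xmlns="http://www.w3.org/2001/10/synthesis"
       xml:lang="en-US">
  The French word for cat is <lang xml:lang="fr-FR">chat</lang>.
</speak>"""


def language_spans(ssml):
    """Return (language tag, leading text) for every element carrying xml:lang."""
    root = ET.fromstring(ssml)
    spans = []
    for elem in root.iter():  # iter() includes the root element itself
        tag = elem.get(XML_LANG)
        if tag is not None:
            spans.append((tag, (elem.text or "").strip()))
    return spans


for tag, text in language_spans(SAMPLE):
    print(tag, "->", text)
```

A synthesizer can use these BCP 47 tags to switch voice or pronunciation rules for the embedded span while keeping the document-level language for the rest.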
12:55:51 [fsasaki]
topic: Experiences in creating multilingual web sites - talk by Luis Bellido
12:57:47 [fsasaki]
luis: multilingual search in catalogue
12:57:56 [chaals]
-> lingu@net europa web site
12:57:58 [fsasaki]
.. currently restricted, search only on multilingual metadata
12:58:09 [fsasaki]
.. 32 different languages
12:59:16 [fsasaki]
.. people involved: language teaching professionals
12:59:29 [fsasaki]
.. no professional translators
12:59:41 [fsasaki]
.. have problems getting used to translation memory etc.
13:00:00 [fsasaki]
.. so we have to develop a process to create the site
13:00:25 [fsasaki]
.. we created our own solution, after looking into existing ones
13:00:56 [fsasaki]
.. relies on: utf-8, HTML, XML, CSS, MS-Office, Apache, Java Servlets, Tomcat, Lucene, Zope
13:01:02 [fsasaki]
.. po files
13:02:58 [fsasaki]
luis describes the workflow in detail
13:03:15 [fsasaki]
13:04:29 [fsasaki]
luis: now about "multilingual links"
13:04:40 [fsasaki]
.. we have same resources in different languages
13:04:49 [fsasaki]
.. but not all the languages are on the site
13:05:01 [fsasaki]
.. e.g. we have a link to "more information"
13:05:22 [fsasaki]
.. but we don't have let's say a Spanish version
13:05:33 [fsasaki]
.. in that case we give a box presenting the languages available
13:06:12 [fsasaki]
.. question is what to show to the user: the current language (of the user), the versions in other languages
13:06:33 [fsasaki]
.. we'd need a CMS supporting all these scenarios
13:06:41 [fsasaki]
.. that is for a whole site
13:08:23 [fsasaki]
.. initial prototype: using XSLT to generate multilingual links
13:08:46 [fsasaki]
.. not sure if that is something to standardize: how to present multilingual links
13:08:56 [fsasaki]
.. would be good to have a solution for that in CMS
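The "multilingual links" behaviour Luis describes (follow the link directly when a version in the user's language exists, otherwise show a box with the available languages) can be sketched as below. The prototype in the talk used XSLT; this Python version, the function name `resolve_link` and the example.org URLs are hypothetical and only illustrate the decision logic.

```python
def resolve_link(available, user_lang):
    """Decide what a multilingual link should present to the user.

    available maps language tags to URLs of the same resource.
    Returns the direct target when the user's language exists,
    otherwise the list of alternatives to show in a language box.
    """
    if user_lang in available:
        return {"target": available[user_lang], "alternatives": []}
    # No version in the user's language: offer every version that exists.
    return {"target": None, "alternatives": sorted(available.items())}


# Hypothetical "more information" resource without a Spanish version.
more_info = {
    "en": "https://example.org/en/more-information",
    "de": "https://example.org/de/more-information",
}

print(resolve_link(more_info, "en"))
print(resolve_link(more_info, "es"))
```

Whether the box should also list the other languages when a direct match exists is the open presentation question raised in the talk; this sketch leaves the alternatives empty in that case.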
13:09:21 [fsasaki]
topic: Pedro L. Díez Orzas, Giuseppe Deriard, Pablo Badía Mas - Key Aspects of Multilingual Web Content Life Cycles: Present and Future
13:09:34 [fsasaki]
presentation from linguaserve
13:10:26 [fsasaki]
pedro: for us multilingual content is content in motion
13:11:15 [chaals]
.me points out that people know Zara, until a native spanish speaker asks about it...
13:11:22 [fsasaki]
.. current state: multilingual web site
13:11:34 [chaals]
s/.me points out that people know Zara, until a native spanish speaker asks about it...//
13:12:03 [fsasaki]
s/web site/webservices/
13:12:23 [fsasaki]
.. current model has problems, since there are more and more languages to serve
13:12:34 [chaals]
rrsagent, draft minutes
13:12:34 [RRSAgent]
I have made the request to generate chaals
13:12:54 [fsasaki]
.. using "translatability data type definition" (tDTD)
13:13:06 [fsasaki]
.. indicating which part to translate and what not
13:13:24 [fsasaki]
.. e.g. if you change a price, no need to translate that
13:13:34 [Jirka]
Jirka has joined #mlw
13:13:35 [fsasaki]
.. attributes control if content has been translated or not
13:14:02 [fsasaki]
.. systems are sophisticated for multilingual publishing, but
13:14:12 [fsasaki]
.. multilingual content web life cycles:
13:14:18 [fsasaki]
.. we work with 7 different CMS
13:14:27 [chaals]
13:14:41 [fsasaki]
.. they do not consider multilinguality in the content life cycle
13:15:02 [fsasaki]
.. so no real management for different language versions
13:16:11 [fsasaki]
.. localization - three ways:
13:16:15 [fsasaki]
.. direct access to CMS
13:16:51 [fsasaki]
13:16:57 [fsasaki]
13:17:03 [fsasaki]
.. or offline access
13:17:27 [fsasaki]
.. third way: automatic real-time machine translation
13:17:35 [fsasaki]
13:17:44 [fsasaki]
.. we want professional quality
13:18:23 [fsasaki]
.. disconnect between CMS, MT and language versions allows for a reduction of implementation costs
13:18:28 [fsasaki]
.. now near future
13:18:35 [fsasaki]
.. "everything is hybrid"
13:18:54 [fsasaki]
.. in globalisation: combining different services
13:19:10 [fsasaki]
.. in localization: combining different production systems
13:19:33 [fsasaki]
.. like online and offline
13:20:00 [fsasaki]
.. combining several translation methodologies, like MT + professional post editing
13:21:40 [fsasaki]
.. good to use XHTML and good source content
13:21:56 [fsasaki]
.. summary: CMS has to take into account that multilingual content management is important
13:22:05 [fsasaki]
.. all of them need to collaborate
13:22:56 [fsasaki]
christian_lieske(SAP): how can adding machine translation help to reduce complexity
13:23:08 [fsasaki]
.. e.g. in content management
13:23:28 [fsasaki]
pedro: MT for translating critical content may be a problem
13:23:39 [fsasaki]
.. but for translating content that changes every two hours, MT is useful
13:23:47 [fsasaki]
.. depends also on the language pairs
13:23:58 [fsasaki]
.. we are working more on integration methodologies
13:24:21 [fsasaki]
.. using XML can be a part of the solution, real time HTML another one
13:24:27 [fsasaki]
.. how much each, depends on the client
13:25:12 [fsasaki]
topic: Max Froumentin World Wide Web Foundation - The Remaining Five Billion: Why is Most of The World's Population Not Online and What Mobile Phones Can Do About It
13:25:27 [fsasaki]
max: introducing the world wide web foundation
13:27:53 [fsasaki]
.. now: 75% of world population have access to the Web
13:28:27 [fsasaki]
.. 1.2 Billion people are able to use the internet
13:28:42 [fsasaki]
.. but we normally don't take messaging like SMS into account
13:28:56 [fsasaki]
.. SMS is the Web, same with Voice
13:29:19 [fsasaki]
.. e.g. if you call an interactive voice system
13:29:30 [fsasaki]
.. it's another way to access the Web
13:29:39 [fsasaki]
.. messaging and voice are the Web
13:30:01 [fsasaki]
.. Web as it was created was: a desktop PC, HTTP
13:30:11 [fsasaki]
.. now we have mobile browsers
13:30:26 [fsasaki]
.. there is also HTTP, but also apps, widgets
13:30:36 [fsasaki]
.. the system still uses HTTP, goes to a URI
13:30:49 [fsasaki]
.. but the user does not see the URI
13:31:01 [fsasaki]
.. an SMS gateway uses that too
13:31:09 [fsasaki]
.. we claim that the Web is not only a browser
13:31:29 [fsasaki]
.. many people have access to e.g. SMS, even if they don't have a browser on the mobile phone
13:32:18 [fsasaki]
.. voice browser applications
13:32:37 [fsasaki]
.. consequences of this situation: SMS input / output problems
13:32:52 [fsasaki]
.. no documentation for SMS, no authoring tools
13:34:13 [fsasaki]
.. in voice: prompts ok, acceptance of dialog systems / NLP low
13:35:23 [fsasaki]
.. no content that is interesting enough to send an SMS
13:35:32 [fsasaki]
.. little knowledge about applications you can build with SMS
13:35:38 [fsasaki]
.. no knowledge about business models
13:36:24 [fsasaki]
.. Web for regreening alliance
13:36:34 [fsasaki]
.. big problems with farmers in Sahara region
13:36:51 [fsasaki]
.. one guy has invented a way to grow trees in the desert
13:37:08 [fsasaki]
.. there are thousands of other farmers who don't know about that
13:37:32 [fsasaki]
.. the guy who has the knowledge can't read, but he has a mobile phone
13:37:53 [fsasaki]
.. the foundation will help to build an application to record advice etc.
13:38:09 [fsasaki]
.. other farmers will be able to share their knowledge
13:38:22 [fsasaki]
.. via accessing the web by voice and SMS
13:38:55 [fsasaki]
.. another project: cgnet swara, in India
13:39:04 [fsasaki]
.. a voice application to do citizen journalism
13:39:10 [fsasaki]
.. there is no news in the local language
13:39:23 [fsasaki]
.. using the system participants record a story, calling a number
13:39:31 [fsasaki]
.. other people get that story via the web
13:39:40 [fsasaki]
.. people are willing to pay for the information
13:39:55 [fsasaki]
.. so people are able to make businesses out of this
13:40:38 [fsasaki]
topic: Q/A of creators session
13:41:16 [fsasaki]
jörg_schutz: you said that users hesitate to press a button, but you need that in your projects
13:41:22 [fsasaki]
.. what about acceptance?
13:41:41 [fsasaki]
max: if the people think that the system is useful, they use it
13:41:52 [fsasaki]
.. in Kenya they had an existing system for banking
13:42:19 [fsasaki]
.. they designed both an SMS based and a voice based system to access banking information
13:42:41 [fsasaki]
jörg: so it's a learning process for everybody
13:42:56 [fsasaki]
thierry_declerk(DFKI): interesting to see so much news in many languages
13:43:08 [fsasaki]
.. it seems that it is more the broadcaster doing the publication
13:43:18 [fsasaki]
.. and not newspaper publishers
13:43:43 [fsasaki]
.. 2nd question: would you make your content available for e.g. language technology?
13:43:59 [fsasaki]
roberto: to 2nd question, would like to
13:44:13 [fsasaki]
.. but might sometimes not be able to do so, due to restrictions of contributors
13:45:37 [fsasaki]
thierry: in Germany there is now legislation which forces broadcasters to take away content
13:45:42 [fsasaki]
roberto: not here
13:46:55 [fsasaki]
josef(CNGL): question for pedro
13:47:03 [fsasaki]
.. in a context for localization company
13:47:21 [fsasaki]
.. is anything changing in business models with the mobile web?
13:47:35 [fsasaki]
pedro: telephone companies are asking for solutions
13:47:46 [fsasaki]
.. problem is not only technical
13:48:08 [fsasaki]
.. problems of text / mobile phones are similar to the other web, maybe different with voice
13:48:24 [fsasaki]
paolo: there are problems with mobile, but no solution yet
13:48:48 [fsasaki]
pedro: training of systems, noise etc. makes it very hard to create general purpose applications
13:49:11 [fsasaki]
paolo: companies doing that were relying on humans, but that was no sustainable business model
13:49:28 [fsasaki]
.. we need to use machines, but solution is still a way ahead
13:49:44 [fsasaki]
natasha_brown(wiki-translate): BBC has a lot of teaching materials
13:49:51 [fsasaki]
.. copyright is about xyz years
13:50:15 [fsasaki]
.. can BBC afford to give up copyright, so that children can learn British English
13:50:56 [fsasaki]
roberto: cannot answer
13:51:15 [fsasaki]
axel(mozilla): our right-to-left community said
13:51:42 [fsasaki]
.. keep it left-to-right, since there is e.g. no video player which does right-to-left
13:51:54 [fsasaki]
.. so they would not be confused by having right-to-left
13:52:22 [fsasaki]
roberto: trying to do things 100% right is not always the solution
13:52:40 [fsasaki]
axel: please don't go for the multilingual links, they are awful
13:53:11 [fsasaki]
luis: not going for a specific solution, just trying to find solutions
13:53:23 [fsasaki]
axel: we use accept-language header, locale info etc.
13:53:30 [fsasaki]
chaals: what do you do if that does not work?
13:53:39 [fsasaki]
axel: file a bug in the browser
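The Accept-Language negotiation axel refers to can be sketched roughly as follows. This is a minimal illustration, not Mozilla's implementation: the function names parse_accept_language and choose_language are hypothetical, the fallback to the primary subtag is an assumed policy, and the "*" wildcard is not handled.

```python
def parse_accept_language(header):
    """Return language tags from an Accept-Language header, best first.

    Tags are ordered by descending q-value; ties keep header order.
    Entries with q=0 ("not acceptable") are dropped.
    """
    entries = []
    for position, part in enumerate(header.split(",")):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        quality = 1.0  # default weight when no q parameter is given
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat a malformed weight as unusable
        if quality > 0:
            entries.append((-quality, position, tag.strip().lower()))
    return [tag for _, _, tag in sorted(entries)]


def choose_language(header, available, default="en"):
    """Pick the best available language for a request (hypothetical policy)."""
    by_lower = {lang.lower(): lang for lang in available}
    for tag in parse_accept_language(header):
        if tag in by_lower:
            return by_lower[tag]
        primary = tag.split("-")[0]  # e.g. "es-ES" falls back to "es"
        if primary in by_lower:
            return by_lower[primary]
    return default
```

For example, choose_language("es-ES,es;q=0.9,en;q=0.8", ["en", "es"]) selects "es", and a request for an unsupported language falls back to the default.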
13:54:04 [fsasaki]
michael_staffanov(former-un): at w3c and various speakers
13:54:21 [fsasaki]
.. there should be something in HTML that lets us tag "this page is multilingual"
13:54:50 [fsasaki]
.. this is becoming more and more important as technologies like machine translation evolve
13:55:39 [fsasaki]
.. to BBC: "to be able to say 'this is the spanish version of the UK page'" is important
13:55:51 [fsasaki]
.. there should be a tag in HTML that says "there are other pages available"
13:56:23 [fsasaki]
chaals: there is, you can use explicit tags which are machine interpretable for linking to other languages
13:57:22 [fsasaki]
pedro: no way to distinguish between pages that have been translated, and the ones in the original language
13:57:43 [fsasaki]
richard: there is a way to link to different language versions, but not many browsers implement it
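A hedged sketch of how a tool could read such machine-interpretable links. It assumes that the mechanism chaals and richard mean is the HTML link element with rel="alternate" and an hreflang attribute; the class name AlternateLinkParser, the helper find_alternates and the example.org URLs are made up for illustration.

```python
from html.parser import HTMLParser


class AlternateLinkParser(HTMLParser):
    """Collect <link rel="alternate" hreflang="..." href="..."> elements."""

    def __init__(self):
        super().__init__()
        self.alternates = {}  # maps language tag -> URL of that language version

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        hreflang = attributes.get("hreflang")
        href = attributes.get("href")
        if "alternate" in rel_values and hreflang and href:
            self.alternates[hreflang] = href


def find_alternates(html_text):
    """Return a dict of language versions advertised by a page."""
    parser = AlternateLinkParser()
    parser.feed(html_text)
    parser.close()
    return parser.alternates


# Illustrative page; the URLs are placeholders.
PAGE = """<html><head>
<link rel="alternate" hreflang="es" href="http://example.org/es/">
<link rel="alternate" hreflang="de" href="http://example.org/de/">
<link rel="stylesheet" href="style.css">
</head><body></body></html>"""
```

Calling find_alternates(PAGE) yields the Spanish and German URLs and ignores the stylesheet link. Note that this markup says which language a linked page is in, not whether it is a translation or the original, which is the gap pedro points out.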
13:58:09 [fsasaki]
claudio(lionbridge): pedro mentioned real-time translation
13:58:36 [fsasaki]
.. when you talk about machine translation and "real time" translation, concept of "good enough" needs to be taken into account
13:58:44 [fsasaki]
pedro: "hybrid" is the answer
13:59:08 [fsasaki]
.. we can work with MT, including all kinds of processing (statistical, rules, ...)
13:59:20 [fsasaki]
.. we have to be realistic in front of our customers
13:59:32 [fsasaki]
.. merging different types of filters is important
14:00:06 [fsasaki]
.. we can memorize pages, combine MT with xyz, but not in every language
14:00:27 [fsasaki]
.. at the moment we have good results with the "filters" approach
14:00:37 [fsasaki]
.. with close enough languages we have good results
14:00:50 [fsasaki]
.. but with e.g. Spanish and English, it does not work so well
14:01:21 [fsasaki]
.. problem of MT is not a problem for language pairs, but for a given text
14:01:38 [fsasaki]
.. we resolve problem for text of a given client, not for any text
14:02:14 [fsasaki]
UNKNOWN-SPEAKER: how do you deal with interoperability between CMS?
14:02:28 [fsasaki]
pedro: have a web service (SOAP-based)
14:05:56 [fsasaki]
chaals closes the session
14:06:19 [RRSAgent]
I have made the request to generate fsasaki
14:07:22 [nicoletta]
nicoletta has joined #mlw
14:20:15 [me]
me has joined #mlw
14:20:34 [Roberto]
Roberto has joined #mlw
14:21:01 [me]
nobody else in the channel at the moment
14:21:27 [sven_]
sven_ has joined #mlw
14:21:45 [me]
here at the bottom you add text
14:21:53 [me]
and at the top it appears
14:22:03 [me]
coffee time
14:25:38 [E_N]
E_N has joined #mlw
14:29:01 [E_N]
scribe test
14:33:25 [E_N]
Christian Lieske on Best Practices and Standards for Improving Globalization-related Processes
14:37:03 [E_N]
Felix: Introducing and reminding delegates about cocktail reception
14:38:21 [E_N]
Christian: Intro re best practices and then moving to wish list
14:39:01 [Jirka]
Jirka has joined #mlw
14:39:08 [E_N]
Christian: presentation is for those with little or lots of knowledge, not so informative for those with middle knowledge
14:39:16 [fsasaki]
fsasaki has joined #mlw
14:40:04 [chaals]
chaals has joined #mlw
14:41:08 [E_N]
Christian: core processes - how are they managed with best practices? Standards.
14:41:55 [fsasaki]
topic: Christian Lieske (SAP) on best practices and standards for improving globalization-related processes
14:42:27 [E_N]
Christian: many technological components to take into account
14:42:36 [E_N]
thank you! ;)
14:43:15 [E_N]
Christian: Does globalization really matter? Why does it matter?
14:43:15 [chaals]
content passes through a chain and there are a lot of things that help improve quality - translation memories, ...
14:43:26 [Sven]
Sven has joined #mlw
14:43:41 [E_N]
1/3rd of money that is involved usually goes to translators
14:43:47 [E_N]
therefore everything else can be overhead
14:44:10 [E_N]
there may be a problem with globalization processes?
14:44:39 [E_N]
we do not have simple processing chains, there are many of them, e.g. software, documentation, training
14:45:08 [E_N]
and there are many people involved...and they need to communicate together - we have to manage this with best practices and standards
14:45:39 [E_N]
basic best practices - when you start to do something please start with best
14:45:54 [E_N]
s/start with best/start with best practices/
14:46:24 [E_N]
make sure that all metadata is able to travel with data that has to be globalized
14:46:34 [chaals]
s/1/... 1/
14:46:34 [E_N]
thanks chaals
14:46:47 [E_N]
I was not aware of this
14:46:48 [chaals]
s/therefore everything/... therefore everything/
14:47:12 [chaals]
s/there may be/... there may be/
14:47:31 [E_N]
"good resources for best practices - W3C Note - XML internationalization best practices"
14:47:33 [chaals]
s/we do not/... we do not/
14:47:47 [E_N]
chaals you are an expert at this
14:48:46 [E_N]
"pseudo translation - enables you to find problems quickly - before begin translating - allows one to save time and money - and to prevent tension"
14:49:01 [E_N]
"best practice rule = get terminology in order!"
14:50:10 [E_N]
"Christian - please take care of terminology and source content quality - to keep downstream processes clean and less troublesome"
14:50:37 [E_N]
"Christian - please automate these early processes to ensure consistency early on"
14:51:13 [chaals]
... like this
14:51:23 [chaals]
s/... like this//
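The pseudo-translation practice Christian describes can be sketched as below. This is an illustrative approximation, not any specific tool: the name pseudo_translate, the bracket markers, the 30% padding and the placeholder patterns ({name}, %s, %d) are all assumptions chosen for the example.

```python
import re

# Map plain ASCII letters to accented look-alikes (a-grave, e-acute, ...),
# so untranslated or hard-coded strings stand out and encoding bugs surface.
ACCENTED = str.maketrans(
    "aeiouAEIOUcnyCNY",
    "\u00e0\u00e9\u00ee\u00f5\u00fc\u00c0\u00c9\u00ce\u00d5\u00dc"
    "\u00e7\u00f1\u00fd\u00c7\u00d1\u00dd",
)

# Placeholders that must survive translation untouched (assumed syntax).
PLACEHOLDER = re.compile(r"(\{[^{}]*\}|%[sd])")


def pseudo_translate(text, expansion=0.3):
    """Return a fake 'translation' of text for testing a localization chain.

    Letters are accented, placeholders are preserved, and the string is
    padded to simulate longer target languages and wrapped in brackets so
    that truncation in the user interface is easy to spot.
    """
    pieces = PLACEHOLDER.split(text)
    converted = []
    for index, piece in enumerate(pieces):
        # With one capturing group, re.split puts the matched
        # placeholders at the odd indexes of the result.
        if index % 2 == 1:
            converted.append(piece)
        else:
            converted.append(piece.translate(ACCENTED))
    padding = "~" * int(len(text) * expansion)
    return "[" + "".join(converted) + padding + "]"
```

Running every source string through such a function before real translation starts shows quickly which strings are not externalized, which layouts break with longer text, and where placeholders get mangled.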
14:52:02 [chaals]
TMX = Translation Memory eXchange (a standard for managing assets in translation)
14:52:25 [E_N]
me/ chaals I suggest that you take over, I am afraid I will be producing poor quality notes
14:52:43 [Sven2]
Sven2 has joined #mlw
14:52:54 [chaals]
but will take over.
14:52:57 [Sven]
Sven has left #mlw
14:54:02 [E_N]
"Christian machines and humans need the right type of information - to ensure consistency - an important standard is ITS"
14:54:31 [E_N]
"Christian ITS very important for tagging"
14:54:53 [E_N]
"XLIFF helps to unify the world, allows to do away with all the myriad formats"
14:54:56 [chaals]
scribe: chaals
14:55:20 [chaals]
Christian: With XLIFF you don't need multiple filters, you just have one format.
14:55:34 [chaals]
... The ITS is about describing resources. It is about explaining things
14:55:56 [chaals]
... e.g. someone comes with content and says "make this in 3 languages"
14:56:24 [chaals]
... and you ask "are there parts in there that shouldn't be translated because they are trademarks or software commands"?
14:56:36 [chaals]
... ITS helps you to provide this sort of information.
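A rough sketch of how a tool might honour this kind of information. It assumes the local ITS 1.0 attribute its:translate in the namespace http://www.w3.org/2005/11/its and handles only that local markup with simple inheritance; global ITS rules are not covered. The function name translatable_text and the sample document are illustrative.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"
TRANSLATE_ATTR = "{" + ITS_NS + "}translate"


def translatable_text(element, inherited=True):
    """Yield the text fragments of an XML tree that should be translated.

    An its:translate="no" attribute switches translation off for an element
    and its descendants; its:translate="yes" switches it back on.
    """
    flag = element.get(TRANSLATE_ATTR)
    translate = inherited if flag is None else (flag == "yes")
    if translate and element.text and element.text.strip():
        yield element.text.strip()
    for child in element:
        yield from translatable_text(child, translate)
        # Text after a child's end tag belongs to the current element,
        # so it follows the current element's setting, not the child's.
        if translate and child.tail and child.tail.strip():
            yield child.tail.strip()


# Sample document: a software command and a trademark are marked
# as not translatable.
SAMPLE = """<doc xmlns:its="http://www.w3.org/2005/11/its">
  <p>Press <cmd its:translate="no">git commit</cmd> to save your work.</p>
  <p its:translate="no">Zara</p>
</doc>"""

fragments = list(translatable_text(ET.fromstring(SAMPLE)))
```

Here fragments contains only "Press" and "to save your work."; the command and the brand name are left out of what is sent to translation.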
14:56:49 [E_N]
scribe: Elliot
14:57:10 [chaals]
scribenick: E_N
14:57:26 [E_N]
scribe: E_N
14:57:49 [lbellido]
lbellido has joined #mlw
14:57:53 [E_N]
Christian: virtues of standards enable easier data transfer between environments (saving time)
14:58:22 [chaals]
chaals has changed the topic to: Multilingual Web workshop. For help on IRC type /msg rrsagent help (chaals)
14:58:28 [E_N]
... some people may disagree with standards, perhaps not applicable in real world
14:58:40 [E_N]
... things are not as simple in real-world applications
14:58:53 [E_N]
... world is much more complex
14:59:25 [E_N]
... a reality check - the scope of standards is not adequate (either too large or too complex)
14:59:47 [E_N]
... also some standards may not be mature enough, e.g. some lack conformance clauses
14:59:55 [lbellido]
lbellido has joined #mlw
15:00:51 [E_N]
... another issue is that there are not many implementations of standards and the completeness of implementations is sub-optimal
15:01:13 [E_N]
... they only implement a fraction of the standard (related to standards being too broad or large)
15:01:41 [E_N]
... there can be data loss during transfer
15:01:59 [E_N]
reducing efficeincy
15:02:36 [E_N]
... there is still much scope for improvement
15:03:04 [E_N]
... How are standards created?
15:03:11 [E_N]
... by accident?
15:03:22 [E_N]
... with grand pretensions?
15:03:44 [E_N]
... what issues can these methods bring?
15:04:32 [chaals]
s/reducing efficeincy/... reducing efficiency/
15:04:40 [E_N]
... 5m safety system to avoid accidents - by liaising between teams and coordinating
15:05:21 [E_N]
... the 5ms are a way to reduce issues during globalization life cycle
15:06:11 [idx]
idx has joined #mlw
15:06:36 [E_N]
... 5ms are essentially ensuring that people, processes and technology are able to communicate adequately
15:06:51 [E_N]
... for smooth globalization processes
15:07:06 [Roberto]
Roberto has joined #mlw
15:07:53 [Roberto]
Need cocktail now!
15:08:23 [E_N]
... meta - work with standardized vocab - reduced subset (this reduces complexity and therefore issues related to data transfer between globalization workflow steps)
15:09:37 [E_N]
Josef van Genabith: Where are we going?
15:09:41 [chaals]
Topic: Next Generation Localisation
15:10:11 [E_N]
... overview (next generation and future)
15:10:28 [RRSAgent]
I have made the request to generate Jirka
15:10:42 [E_N]
... starting with mega trends
15:10:51 [E_N]
Topic: what is localization?
15:11:44 [E_N]
... it is the industrial process of adapting digital content to culture, locale and linguistic environment, but why? For ROI.
15:11:58 [claudio]
claudio has joined #mlw
15:12:24 [E_N]
... core topics - volume (great magnitude of content and target languages)
15:12:50 [E_N]
... also there is a shift from business content to user generated and social media
15:13:24 [E_N]
... additionally the way people access data is evolving rapidly
15:14:01 [E_N]
... people have much more in common than they did in the past
15:14:09 [E_N]
... people are becoming globalized
15:14:17 [r12a]
r12a has joined #mlw
15:14:44 [E_N]
... and therefore people are the ultimate locale - they are the specific person who we are creating content for
15:14:54 [E_N]
... related to custom content and semantic web
15:15:26 [E_N]
... but how can we balance these various factors?
15:17:57 [E_N]
... example: localization often goes together with customer support - it has to be multilingual, this data shows that for every 10,000 that call there are an additional 100K who use the website to self-serve + an additional 300K using external forums (customers have direct access to peer groups, before contacting the customer support teams)
15:18:32 [E_N]
... in fact, the customer support teams may even be the last to know about the fixes which are available in the online forums
15:18:45 [labra]
labra has joined #mlw
15:19:13 [r12a]
felix ?
15:19:29 [E_N]
... currently state of the art solutions are kluges of online/offline translation tools + mt + user generated content
15:19:59 [E_N]
... many ways to skin a cat - lots of potential methods to reduce costs and improve processes
15:20:25 [E_N]
... must use a holistic view
15:20:33 [E_N]
... to ensure a mature approach
15:20:56 [E_N]
... feedback between content generation and localization and content delivery
15:21:18 [E_N]
... how it is delivered is related to how it should be created and translted.
15:21:45 [E_N]
s/translated /translted
15:22:21 [chaals]
s/and translted/and translated/
15:22:27 [E_N]
... there is much to do, but we can all play a part
15:22:42 [E_N]
... this work is not just in the scope of large business
15:22:58 [lbellido]
lbellido has joined #mlw
15:23:34 [E_N]
Marko Grobelnik:
15:24:00 [E_N]
Topic: Cross-lingual information retrieval
15:24:36 [fsasaki]
s/Cross-lingual information retrieval/Cross-lingual document similarity for Wikipedia languages/
15:25:02 [E_N]
scribe: E_N
15:25:32 [E_N]
Marko: examples of cross-lingual information retrieval on Reuters RCV 2 news corpus
15:26:05 [E_N]
... where does this info relate to text related research fields?
15:26:24 [E_N]
... there are many fields, what is the difference between the areas?
15:27:19 [E_N]
... main difference is that each area represents text in different ways, one doc can be seen as a sequence of characters - as we increase structure we see lexical patterns expanding and context recognition becoming obvious
15:28:12 [E_N]
... essentially from simple to complex and the ideal solutions for large scale cross-lingual IR?
15:29:50 [E_N]
... uses of this = language neutrality relies on statistical methods
15:30:37 [E_N]
... this method enables the clear representation of languages and allows for further user analysis on data
15:31:35 [E_N]
... using these stat techniques we can map documents into document neutral representations
15:32:18 [E_N]
... in summary - this technique provides for a way to analyse cross-lingual data
15:32:48 [E_N]
Topic: Q&A
15:33:13 [E_N]
... are there any results from cross-ling?
15:33:24 [E_N]
... yes, but fuzzy results
15:33:33 [E_N]
... many problems and challenges
15:33:35 [chaals]
s/Richard /Richard: /
15:33:46 [chaals]
s/... yes/Christian: yes/
15:33:48 [E_N]
scribe suggests that delegates do further research
15:34:43 [chaals]
Q: How does this relate to DBPedia?
15:35:03 [E_N]
Q: What features does this use?
15:35:40 [E_N]
A: words and phrases are the only features
15:36:14 [E_N]
Felix: Introducing new speaker, please fill out feedback forms!
15:36:19 [chaals]
i/Q: What features/A: DBPedia addresses different use cases, so it has some conceptual similarities but is different
15:36:41 [E_N]
... new speaker is Daniel Grasmick
15:36:59 [E_N]
Daniel: delighted to be here for the first time,
15:37:12 [E_N]
... presentation will be less technical, pragmatic and candid
15:37:31 [E_N]
... from Lucy software, a young company combining language and technology
15:38:11 [E_N]
... three pillars SAP app translation, MT, Standards
15:38:36 [E_N]
... Daniel started as a translator, then worked in MT
15:38:44 [E_N]
... used to sell MT all around the world
15:38:56 [E_N]
... Daniel has noticed how industry has changed
15:39:12 [E_N]
... worked with SAP for years before Lucy Software
15:39:32 [E_N]
... been involved with defining LISA standards including TMX
15:39:40 [E_N]
... TBX
15:39:55 [E_N]
... Evolution of TMX -
15:40:08 [E_N]
... from open tag to TMX
15:40:27 [E_N]
... finding a unified standard for translated segments
15:40:42 [E_N]
for simple data transfer between or within a business
15:41:03 [E_N]
... provides freedom to choose, non-proprietary
15:41:07 [E_N]
... from TMX
15:41:15 [E_N]
they felt they needed SRX
15:41:26 [E_N]
FYI - all of these are part of OAXAL
15:41:45 [E_N]
Open Architecture for XML authoring and localization
15:42:01 [E_N]
scribe recommends looking up OAXAL
15:42:34 [E_N]
Daniel: LISA has suffered from a lack of volunteers
15:42:51 [E_N]
... but standards are a must (rhetorical)
15:42:54 [E_N]
are they?
15:43:33 [E_N]
... sometimes there are too many flavors
15:43:48 [E_N]
... reduced subsets
15:44:02 [E_N]
...minimal meta data for easiest exchange
15:45:09 [E_N]
... XLIFF is far more practical than EXCEL
15:45:27 [E_N]
... Excel should really not be a source format (if at all possible)
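To make the XLIFF point concrete, here is a minimal sketch that writes translatable strings to an XLIFF 1.2 file instead of a spreadsheet. It is a bare-bones illustration under assumptions: the function name build_xliff, the "strings.txt" original name and the sample strings are invented, and only the basic elements (file, body, trans-unit, source, target) are produced.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"


def build_xliff(segments, source_lang, target_lang, original="strings.txt"):
    """Serialize segments as a minimal XLIFF 1.2 document.

    segments maps a unit id to a (source, target) pair; target may be None
    for text that has not been translated yet.
    """
    ET.register_namespace("", XLIFF_NS)  # emit XLIFF as the default namespace
    root = ET.Element("{%s}xliff" % XLIFF_NS, {"version": "1.2"})
    file_el = ET.SubElement(root, "{%s}file" % XLIFF_NS, {
        "original": original,
        "source-language": source_lang,
        "target-language": target_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(file_el, "{%s}body" % XLIFF_NS)
    for unit_id, (source, target) in segments.items():
        unit = ET.SubElement(body, "{%s}trans-unit" % XLIFF_NS, {"id": unit_id})
        ET.SubElement(unit, "{%s}source" % XLIFF_NS).text = source
        if target is not None:
            ET.SubElement(unit, "{%s}target" % XLIFF_NS).text = target
    return ET.tostring(root, encoding="unicode")


# Example: one translated and one still untranslated string.
document = build_xliff(
    {"greeting": ("Hello", "Hola"), "bye": ("Goodbye", None)},
    source_lang="en",
    target_lang="es",
)
```

Unlike a spreadsheet column, each unit here carries an id and explicit source and target languages, so tools along the chain do not need a custom filter to find out what is to be translated.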
15:46:40 [E_N]
Q: Why does SAP not directly support XLIFF?
15:47:13 [E_N]
A: Daniel - SAP text is stored in tables, hard to see potential win of transferring to XLIFF
15:47:25 [E_N]
... ROI questions perhaps?
15:47:44 [E_N]
... but solution will be developed at some point
15:47:53 [E_N]
... future is bright!
15:48:39 [E_N]
Q: From Prof. Reinhard - How do you get standards developed? Who's interested and why?
15:49:06 [E_N]
... to delegates > how do you get them made?
15:50:08 [E_N]
... Prof. Reinhard continued, highlighting a plethora of inherent issues in developing standards
15:50:47 [E_N]
A: Christian Lieske - we make standards happen by networking activities (by developing understanding, the first step is knowing it is possible)
15:51:11 [E_N]
... secondly (people want to get problems out of the way)
15:51:31 [E_N]
Daniel: to develop standards one needs to be an idealist
15:52:31 [E_N]
A: Standards used to eliminate competition, e.g. RTF (well it was a non-standard, standard)
15:53:04 [E_N]
A: mix of needs pressures, avoiding opportunities
15:53:32 [E_N]
Charles: Are people prepared to use standards if they will make a profit?
15:54:35 [E_N]
... takes real work to develop a standard
15:54:47 [E_N]
... cost 2 million to create SVG
15:54:55 [E_N]
SVG 1.2 much cheaper
15:55:52 [E_N]
... companies are there for their own interests.
15:56:37 [E_N]
... in web, standards are very important
15:56:52 [E_N]
... no benefits in not having standards
15:57:03 [E_N]
... specifically on the web
15:57:44 [E_N]
... Governments can spearhead standards, e.g. S1000D
15:58:09 [chaals]
s/SVG 1.2 much cheaper/... SVG 1.2 was much more expensive/
15:58:17 [E_N]
... from SGML to XML
15:58:36 [E_N]
standardizing the language of standards
15:58:52 [E_N]
... reducing the complexity to reduce the potential for error
15:59:16 [chaals]
s/no benefits in not/companies work on standards where there is no benefits in not/
15:59:30 [chaals]
scribe: chaals
15:59:48 [chaals]
Comment: A big motivation is governments (especially military purchasing)
15:59:55 [chaals]
... wanting not to be locked in to a supplier
16:00:23 [chaals]
Paulo: Working in speech with standards is great - it means people can build a big market
16:00:52 [chaals]
... Loquendo uses standards as a differentiating factor like Opera "trust us now because we are standards-based which means you're not locked into us"
16:01:35 [chaals]
Denis: Webkit question for developer panel. Given the same underlying objective for end users, are there specific reasons not to use webkit?
16:02:12 [E_N]
... webkit doesn't do things we want it to do, had 3rd rate svg support
16:02:25 [E_N]
... but great CSS support
16:03:05 [E_N]
... retooling engineers one must show a clear benefit - Opera assessment of webkit (some people it works for, some it doesn't)
16:03:24 [E_N]
... competition is a good thing
16:03:48 [E_N]
... single standards reduce competition in some instances
16:03:57 [E_N]
... especially in browser world
16:04:36 [E_N]
... when market shifted there was competition. Having one core browser is not a smart way to go.
16:04:44 [E_N]
... competition is important
16:04:55 [E_N]
... people can choose to implement as they see fit
16:05:00 [E_N]
... everyone to their own
16:05:16 [E_N]
... to suit their own software ecology
16:05:29 [E_N]
... and needs of internal and external users
16:06:25 [E_N]
... customers don't want to rely on webkit (it could be shown to be unsatisfactory in the future)
16:06:58 [E_N]
... suppliers disappear ( the future is unknown)
16:07:08 [chaals]
s/to rely on/to be forced to rely on/
16:07:46 [chaals]
Alex: Mozilla thinks it is important to have difference and choice, as a core value.
16:08:03 [chaals]
... And in fact there are a lot of different branched Webkits out there.
16:08:23 [E_N]
my pleasure, it was fun in a paint balling kind of way ;)
16:08:32 [chaals]
... It isn't a browser, it is a core rendering engine. A key part of a browser, but just one part.
16:08:58 [chaals]
... We gladly took other components, because they fit in well.
16:09:53 [chaals]
Josef: To go back to the point from this morning, about whether there should be a tag that identifies things that have been translated.
16:10:24 [chaals]
... there are multiple stakeholders that should be involved (e.g. in that case it didn't matter to the browser makers, but did matter to other stakeholders)
16:10:56 [chaals]
... There are now a lot of spiders using the web for machine learning to improve translation systems.
16:11:27 [chaals]
... would be interesting to identify content that has been machine-translated, so spiders can exclude that from learning because they assume it isn't that good...
16:12:11 [chaals]
PeterC: The browsers aren't impacted by the tag for things that were translated, so aren't the right people to be defining it.
16:13:17 [chaals]
??: We all have cellphones and PDAs. Power plugs aren't standardised, and there has been a lot of talk. But imagine if they had been standardised a decade ago? We would have huge horrible things. Standardisation can hold back innovation as well, so you have to be aware of the cost as well as the benefit.
16:13:56 [chaals]
i/PeterC/FS: Note that the European Commission is here... and interested in fostering innovation
16:14:09 [chaals]
Christian: Please come to the cocktail reception and keep talking.
16:14:33 [chaals]
rrsagent, draft minutes
16:14:33 [RRSAgent]
I have made the request to generate chaals
16:14:41 [chaals]
rrsagent, this meeting spans midnight
16:17:07 [RRSAgent]
I have made the request to generate fsasaki
16:18:10 [Sven2]
Sven2 has joined #mlw