W3C

Multilingual Web Workshop

26 Oct 2010

Agenda

See also: IRC log

Attendees

Present
many, people
Regrets
Chair
Richard
Scribe
Felix, Jirka, fsasaki, chaals, Elliot, E_N

This is the raw scribe log for the sessions on day one of the MultilingualWeb workshop in Madrid. The log has not undergone careful post-editing and may contain errors or omissions. It should be read with that in mind. It constitutes the best efforts of the scribes to capture the gist of the talks and discussions that followed, in real time. IRC was used not only to capture notes on the talks, but also so that remote participants, or participants with accessibility problems, could follow in real time. People following IRC can also add contributions to the flow of text themselves.

See also the log for the second day.


<fsasaki> scribe: Felix

welcome

<inserted> scribenick: fsasaki

welcome address from UPM

introduction to the workshop by Richard Ishida (W3C)

Richard introduces the "Multilingual Web" project - aims, goals

project homepage http://www.multilingualweb.eu/

Talk from Kimmo Rossi

kimmo: I am project officer of this project. Lot of enthusiasm of participants - a dream team of coverage of different areas
... so project can really make a difference for the multilingual web
... this project is about much more than standardization
... EC has made a commitment on the "digital agenda" in Europe
... how communication technology can help to solve European / global challenges
... trying to boost innovation and faster uptake of research results by the industry
... 8th framework program now "research and innovation" as focus
... we need good input to discussions about 8th framework program. Nobody knows how the Web will look in 10 years
... so we want to make use of the opinions of the stakeholders - people like you
... scribing sessions is a very important job. Scribes should help us to get conclusions & recommendations to process outcomes of this event
... my job is to sell money to people who have good ideas. I make an attempt to convince them to work in / on European projects
... 50 mill. Euros for projects in Language Technology available
... areas: multilingual content processing, including Machine Translation, chain of authoring / managing multilingual online content
... another area: on multilingual information access & mining
... and third area: natural speech interaction
... call I just described is ongoing, we are taking submissions now
... a future call for 1st February 2011: SME initiative for digital content and languages
... focused on SME, but consortia can encompass also large companies & research institutes. At least 2 SMEs need to be involved
... data sharing & pooling
... mlw project is about standards. Can we address standards in upcoming calls?
... yes, we can. But rather than having a project developing standards, put standards in action
... build something useful around the standard
... thank you for your time, enjoy the workshop!

keynote from Reinhard Schäler

"The Multilingual Web, Policy Making and Access to Digital Knowledge for All"

reinhard: we made a survey on mlw - see current state of results at http://tinyurl.com/3xgfydl
... standards and commercial interests are sometimes in a difficult relationship
... there are about 800.000 standards around
... getting a standard through requires political cleverness, friends, power to push against strong interests
... standards in localization: Encoding (Unicode), Quality (e.g. LISA Q/A), data exchange (XLIFF, TMX), Metrics
... Unicode was successful since industry players came together and just did it, also giving up their own existing work on encoding
... people are not so much interested in standards as in what they can make with them
... expectations and reality are often different things - e.g. sometimes people say they support XLIFF, but they "just" can read / import / export XML files
... comparison to a bus stop - do you want to be in a standardized environment with the bus on time, or in an environment with the bus being delayed and you have time to talk to your friends?
... standards means making compromises. You don't want to wait for a committee, you just do it
... also you don't want to cooperate with your competitors
... data exchange and process management are important too
... you always want to keep an advantage compared to your competitors
... where are we? 19 billion $ industry
... but highly fragmented. Some (2,3,4) dominant players
... short term ROI oriented
... localization industry was established in the 1980s since companies wanted to sell products in many regions / countries
... LOC people don't look ahead long term, because of short term ROI
... so who can drive mlw?
... we made a survey, see results link here http://tinyurl.com/3xgfydl
... many people want to have the multilingual web - so why isn't it happening?
... localization for all is now in focus:
... more people / languages / content, users drive / own / manage the content
... networks become standards based & interoperable
... that is happening in the non L10N world - why couldn't it happen also in the localization world?
... companies have to give up illusion of control
... focus has to be on implementations and benefits for the people
... there is a fundamental right: access to information
... that does not need to be judged by business case
... drivers for this change: maybe not large corporations, but nonprofit sector
... nonprofit translation is the world's largest translation movement
... motivation is to make the world a better place
... a forum to achieve that: Internet Governance Forum (IGF)
... UN's IGF working group
... standards, access to technology, and skill are important
... as we can see in the "Close Encounters Of The Third Kind" clip
... if aliens could talk, we would not understand them but we could try to develop technologies to achieve that

break

<Jirka> scribe: Jirka

Developers session

chaired by Adriane Rinsche

Adriane: announcements
... Mark Davis (Google) will videocast tomorrow at 16:30

The Multilingual Web: Latest developments at the W3C/IETF

by Richard Ishida (W3C)

Richard: introduces W3C

<inserted> scribenick: fsasaki

<chaals> s/scribe:various/scribe: Felix/

<Jirka> ... 22 activities, 50 working groups and more

<Jirka> ... internationalization activity is part of W3C work

<Jirka> ... standards supporting multilingual web are

<Jirka> ... Unicode, W3C technology is built on top of Unicode

<chaals> scribenick: Jirka

Richard: 70% of web pages are using Unicode encodings
... some mistakes solved recently, XML 1.0 Fifth Edition extends the characters allowed in identifiers
... Unicode normalization, W3C proposed to use NFC form
... work on allowing national characters in resource identifiers -- IDN (Internationalized Domain Names)
... in June IANA started to release internationalized top-level domain names
... IRI allows internationalizing the path part of a resource identifier
... language tags, 8000 subtags, described in BCP47
... Speech Synthesis Markup Language
... CSS3 adds more internationalization support
... browser implementers have to support all details in different languages
... browser developers need to have reason to implement support for i18n features
... they need to hear from users that i18n features are critical
... vertical text is needed in Japan, Korea, Thailand, China; covered by CSS3 module
... there are problems with mixing various scripts
... Ruby annotation
... there is CSS3 Ruby module
... HTML5 has implemented Ruby but differently than other specs, need for convergence

<chaals> [HTML5 tried to copy IE's implementation to be interoperable with existing usage (which is why it is different from the original spec)]

Richard: plans to support complex Ruby

<chaals> [/me is excited about the requirements for layout, because it has motivated groups in a number of other languages to do the same]

Richard: Requirements for Japanese Layout are used as an input to several specs, including XSL-FO, HTML, CSS
... Web Fonts - ability to use custom fonts in web pages
... there are still subsetting and licensing issues
... HTML5 - language identification, ability to specify dates in standardized way [using uF approach]
... HTML5 new input types for forms
... issues with bidirectional markup
... there are additional requirements related to ordering and alignment of text
... MathML 3.0 supports arabic math typesetting
... ITS -- there will be separate talk by Christian
... The rise of Mobile Web
... MW4D
... Best practices developed by W3C
... there are also tests

<chaals> http://www.w3.org/International -> W3C Internationalisation Activity (lots of useful links and things)

Richard: I18N Checker http://qa-dev.w3.org/i18n-checker/
... there is also MobileOK checker
... Web is about people, not about technology
... we need you to make Web worldwide
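
The point above about Unicode normalization and NFC can be illustrated with a minimal sketch. It assumes Python's standard `unicodedata` module; the sample strings and the helper name `to_nfc` are illustrative and were not part of the talk.

```python
import unicodedata

# "é" can be stored as one code point (U+00E9) or as "e" followed by a
# combining acute accent (U+0301). Both render the same on screen.
precomposed = "caf\u00e9"
decomposed = "cafe\u0301"


def to_nfc(text: str) -> str:
    """Return text in Unicode Normalization Form C (canonical composition)."""
    return unicodedata.normalize("NFC", text)


# The raw strings compare unequal; after NFC normalization they match.
raw_equal = precomposed == decomposed
nfc_equal = to_nfc(precomposed) == to_nfc(decomposed)
```

Normalizing at the point where text enters a system (form input, file import) is one common design choice, since it keeps later comparisons and searches simple.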

Localizing the Web from the Mozilla perspective

by Axel Hecht (Mozilla)

Axel: Firefox 4 will change User-Agent header, no more locale info here, use Accept-Language instead
... 80+ localizations
... community driven
... it is a challenge to make everything work on all platforms
... negotiating content language
... balance between best content and user privacy
... problems in JavaScript, e.g. Date.toLocaleString() is not truly i18n
... better APIs -- for BCP 47, site-specific Accept-Language
... web sites at Mozilla
... mostly static content
... locale-dependent content
... data-driven sites, not easy
... live multi-lingual documents, like documentation, knowledge base, ...
... how to differentiate between an added translation and a bug fix in content that should be propagated to pages in other languages
... international feedback button
... several existing wiki systems used, none really sufficient
... developing own Kitsune system
... question: what functionality is missing in the browsers (in general or in Firefox)?
... question: localizing HTML5 content on the client
... question: managing live multilingual docs
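
The content-language negotiation mentioned in this talk can be sketched roughly as follows. This is a simplified illustration and not Mozilla's implementation: the function names `parse_accept_language` and `pick_locale` are hypothetical, wildcard (`*`) ranges are ignored, and only a fallback to the primary language subtag is attempted.

```python
def parse_accept_language(header: str) -> list[tuple[str, float]]:
    """Parse an Accept-Language header into (tag, q) pairs, highest q first."""
    ranges = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        q = 1.0  # default quality when no q parameter is given
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat a malformed weight as "not acceptable"
        ranges.append((tag.strip().lower(), q))
    # sorted() is stable, so equal weights keep the order the user listed
    return sorted(ranges, key=lambda item: item[1], reverse=True)


def pick_locale(header: str, available: list[str], default: str = "en") -> str:
    """Choose the best available locale for the header, else the default."""
    by_lower = {loc.lower(): loc for loc in available}
    for tag, q in parse_accept_language(header):
        if q <= 0:
            continue
        if tag in by_lower:          # exact match, e.g. "pt-br"
            return by_lower[tag]
        primary = tag.split("-")[0]  # fall back to the primary subtag, e.g. "pt"
        if primary in by_lower:
            return by_lower[primary]
    return default
```

For example, `pick_locale("de-AT,de;q=0.9,en;q=0.5", ["en", "de", "fr"])` falls back from `de-AT` to the available `de` locale.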

question from auditorium about handling speech

Axel: I'm not working in speech area

The Web everywhere, multilingualism at Opera

by Charles McCathieNevile (Opera)

Chaals: Opera was created especially to support non-English web
... Opera supports all kind of devices
... some key markets:
... Japanese -- how to make it work on mobile phones
... Russian
... Vietnamese -- multiple diacritics over one character
... India, Iran -- phones
... technology/standards used
... originally used UTF-16, good for CJK efficiency
... UTF-8 is better for real world
... using gettext PO files for software
... content for l10n is handled in XLIFF
... translation of Opera Desktop by agencies and volunteers (for minor languages)
... Opera Mini translated only by agencies; 100+ languages, because of space constraints
... Widgets and extensions are translated by developers
... My.Opera content translated by community
... issues:
... XLIFF more complicated than Opera needs
... translators tools -- we use open-source tools and some inhouse stuff
... different translation agencies use different software, so it is hard to change agencies
... word-breaking dictionaries are getting large
... problem for embedded devices (TV, game consoles, ...)
... layout (RTL, vertical), scary part of browser code, very expensive to change
... without clear message from users there is no interest in touching this complex code
... people issues:
... people don't understand how translation works
... now everything is translated from English
... we have tried to use multiple source languages to cater to translators
... it is harder to maintain quality of such translations, but you can have larger translator community

Bridging languages, cultures, and technology

by Jan Nelson (Microsoft), Peter Constable (Microsoft)

<chaals> a/complex code/, so the fact that communities have started being clear about a need for this and writing documents on what is required is important

Jan: MS does a lot of translation, l18n and globalization
... Microsoft research working on Microsoft Translate service
... WikiBhasha -- browser-based multilingual content creator for Wikipedia
... http://www.wikibhasha.org
... open-source tool, currently supporting 35 languages
... Microsoft Local Language Program

Peter: about enabling mlw web
... IE is localized in 95 languages
... pages in UTF-8 are growing (over 50% of content)
... HTML/CSS is ready for multilingual web
... issues in separation of content and application code
... client-server interaction issues -- handling preferred language when travelling, ...
... show some examples of rendering various scripts on HTML page and on HTML5 canvas

Q&A for Developers session

Question for Axel: Why have you changed between several wiki implementations? Was it because of poor localization support?

Axel: content developers want visual editing (no wiki codes). For other wikis I don't know.

Question from Christian Lieske (SAP): There are some issues in handling i18n content by Webkit based applications. Is this caused by core (webkit) or applications?

Axel: there are several rendering engines, no representative from project using webkit is here

Chaals: If there is no functionality in rendering engine, it simply doesn't work and web developers have to find workarounds. Parties have to work together to implement the most demanded features.

Question from David (BBC World Service): Who is responsible for rendering complex scripts -- browser engine or underlying operating systems?

Peter: In Windows we serve more than browsers, so we have this functionality in OS. Some browsers use Windows functionality, some depend on their own rendering engine.

Chaals: some devices don't have any such support, we have this in browser engine to support various devices

Axel: It depends on the platform.
... you have to have good fonts for scripts/languages. This is not easy for minor languages
... the problem is that developers of rendering engines don't have knowledge of foreign languages

Question from Jörg Schultz (BioLoom): There are two groups working on HTML5 -- WHATWG and W3C HTML WG. How will this evolve?

Chaals: HTML5 spec will be produced by W3C.
... WHATWG is very informal and open place for developers playing with possible HTML5 features.
... some features from WHATWG were removed from W3C HTML5

Question from Felix Sasaki: Do you see need for common way marking up what should/should not be translated?

Chaals: It is not a matter for the browser, the browser doesn't do translation. You can use XHTML and add that right now, and it will not be a problem

Richard: Google translation supports it, Hixie (editor of HTML5) dismissed this feature proposed by Microsoft for inclusion in HTML5

Axel: if we are going to support localization directly in browser, we will consider supporting it

Peter: this is not relevant for browser, but for upstream process when content is created and translated

Closing question: where is MLW going in the next couple of years?

<chaals> Richard: There are a huge number of things to work on...

Chaals: A lot of things deserve better support, eg. vertical text

Jan: work together with local governments on support of more languages

<fsasaki> scribe: fsasaki

creators session introduction

chaals: session about making content

chaals introduces the speakers

Challenges for a multilingual news provider: pursuing best practices and standards for BBC World Service

talk by Roberto Belo Rovella, David Vella

roberto: in charge of BBC World Service
... our focus now on online platforms
... we are multilingual site, but each site has its own editors, i.e. not direct translation
... recently re-released news site in Burmese, with new fonts and Unicode support

Roberto goes over examples for the Chinese market

Uzbek site

<chaals> [chinese: produced in simplified chinese, can be auto-converted to traditional]

<chaals> [Uzbek site: Uzbekistan is moving from writing in cyrillic to using the same language in latin. Plus arabic]

roberto: lots of other markets, e.g. Brazil
... 5% of the BBC traffic
... we have to package content (javascript, XML, CSS, ....) in some cases to deliver it properly
... past: challenges for multilinguality: creating fonts / input methods "from scratch"
... operating systems get better, so this has become much easier
... BBC was one of the first content providers to publish in Unicode
... was a hard challenge. Some websites offered content as GIFs
... people said "why don't you use a font we are already using?"
... at the end we prevailed, Unicode succeeded. But we lost a lot of the market
... in some languages we had to use English. Urdu now works on the local script

roberto: new look for Arabic website, with new fonts

David: publication depends also on other parts of the website, like a localized video player
... there are still web sites offering images instead of fonts, but solutions are coming
... currently on a mobile displaying hindi, a user sees only boxes
... iOS 4 is close to the correct rendering
... reading with that display is very hard
... 70% of devices in India cannot display the Hindi text properly
... we created an image based solution, in addition to the text based one
... we publish both, and have links from the images to the text on the paragraph level
... so we ignored the (W3C) advice of not using images for character display
... we will not replace the text based version in the CMS
... images display on every device, we control the rendering
... we used Pango text rendering library
... average page size is 45 KB, text only is 20 KB
... launched Aug 2010, Hindi mobile traffic up by 50%
... this is only temporary
... Nokia and Samsung, they only localize the UI, so the situation is not changing
... white-label or cheap brand lookalike use their own software, no standard solution
... not standards based, but they have 30% market share in India, we can't ignore them
... collaboration with Google. Users see only messages in their own language, but real time translation in / from other languages. Results were interesting (but not always coherent...)
... size of areas on a page is changing depending on the languages involved

roberto: wishful thinking: create once, publish everywhere
... encourage proper font rendering
... offer language expertise to mobile manufacturers
... rapidly deprecate support for older browsers

alex o'connor (CNGL): have you thought of processing binary assets?

(scribe missed answer)

<chaals> [The images are generated on publishing, and left as static. If you correct, you regenerate new images (not an expensive process, like printing)]

presentation from Loquendo

paolo: many people cannot read, in many circumstances you don't have written language
... need to handle speech too
... in the last 10 years W3C started a voice browser and multimodal working group
... today quite a few parts of speech processing are covered by standards
... other important thing is the language subtag registry
... example. small speech application asking "what do you want to drink"
... recognition of speech means: you have to create grammars
... even the grammar has xml:lang
... you can also create a multilingual grammar
... that uses language identifiers for defining and re-defining the language
... tts means e.g. "reading a book"
... richard mentioned SSML. We developed 1.0, later 1.1
... Chinese people said that they need improvements
... now (in 1.1) there is a tag to specify the language also for pieces of text
... another point: you need to have a voice speaking
... other application areas: dubbing or gaming
... you can be more precise in SSML 1.1, e.g. for specifying the accent

(very nice :) ) demo of phonetic mapping to change spoken language

scribe: e.g. English spoken by Germans
... another standard: PLS 1.0, a lexicon used to correct errors
... for specific words, like locations, proper names
... application for TTS or speech recognition
... BCP 47 did a lot of things, but there is a need to standardize phonetic alphabets in more detail
... development tool - Loquendo TTS Director editing tool
... speech is another way of using the web
... standards help to create speech applications
... work by IANA / BCP 47 is good, but need to extend it for phonetic alphabets

peter_constable(Microsoft): your request about phonetic alphabets

scribe: there are subtags registered to denote that content is in IPA
... or other phonetic alphabets

paolo: did not follow the discussion in detail
... but idea was to have two registries, one for all subtags, one for phonetic alphabet only

Experiences in creating multilingual web sites - talk by Luis Bellido

luis: multilingual search in catalogue

<chaals> http://www.linguanet-europa.org/plus/welcome.htm -> lingu@net europa web site

luis: currently restricted, search only on multilingual metadata
... 32 different languages
... people involved: language teaching professionals
... no professional translators
... have problems to get used to translation memory etc.
... so we have to develop a process to create the site
... we created our own solution, after looking into existing ones
... relies on: UTF-8, HTML, XML, CSS, MS-Office, Apache, Java Servlets, Tomcat, Lucene, Zope
... po files

luis describes the workflows in detail

luis: now about "multilingual links"
... we have same resources in different languages
... but not all the languages are on the site
... e.g. we have a link to "more information"
... but we don't have let's say a Spanish version
... in that case we give a box presenting the languages available
... question is what to show to the user: the current language (of the user), the versions in other languages
... we'd need a CMS supporting all these scenarios
... that is for a whole site
... initial prototype: using XSLT to generate multilingual links
... not sure if that is something to standardize: how to present multilingual links
... would be good to have a solution for that in CMS

Pedro L. Díez Orzas, Giuseppe Deriard, Pablo Badía Mas - Key Aspects of Multilingual Web Content Life Cycles: Present and Future

presentation from linguaserve

pedro: for us multilingual content is content in motion
... current state: multilingual webservices
... current model has problems, since there are more and more language to serve
... using "translatability data type definition" (tDTD)
... indicating which part to translate and what not
... e.g. if you change a price, no need to translate that
... attributes control if content has been translated or not
... systems are sophisticated for multilingual publishing, but
... multilingual content web life cycles:
... we work with 7 different CMS

<chaals> s/UNKNOWN_SPEAKER/David/

pedro: they do not consider multilinguality in the content life cycle
... so no real management for different language version
... localization - different ways:
... online access to CMS
... or offline access
... third way: automatic real-time machine translation
... we want professional quality
... disconnect between CMS, MT and language versions allows for a reduction of implementation costs
... now near future
... "everything is hybrid"
... in globalisation: combining different services
... in localization: combining different production systems
... like online and offline
... combining several translation methodologies, like MT + professional post editing
... good to use XHTML and good source content
... summary: CMS has to take into account that multilingual content management is important
... all of them need to collaborate

christian_lieske(SAP): how addition of machine translation can help to reduce complexity

scribe: e.g. in content management

pedro: MT for translating critical content may be a problem
... but for translating content that changes every two hours, MT is useful
... depends also on the language pairs
... we are working more on integration methodologies
... using XML can be a part of the solution, real time HTML another one
... how much each, depends on the client

Max Froumentin World Wide Web Foundation - The Remaining Five Billion: Why is Most of The World's Population Not Online and What Mobile Phones Can Do About It

max: introducing the world wide web foundation
... now: 75% of world population have access to the Web
... Richard: 1.2 Billion people are able to use the internet
... but we normally don't take messaging like SMS into account
... SMS is the Web, same with Voice
... e.g. if you call an interactive voice system
... it's another way to access the Web
... messaging and voice are the Web
... Web as it was created was: a desktop PC, HTTP
... now we have mobile browsers
... there is also HTTP, but also apps, widgets
... the system still uses HTTP, goes to a URI
... but the user does not see the URI
... an SMS gateway uses that too
... we claim that the Web is not only a browser
... many people have access to e.g. SMS, even if they don't have a browser on the mobile phone
... voice browser applications
... consequences of this situation: SMS input / output problems
... no documentation for SMS, no authoring tools
... in voice: prompts ok, acceptance of dialog systems / NLP low
... no content that is interesting enough to send an SMS
... little knowledge about applications you can build with SMS
... no knowledge about business models
... Web for regreening alliance
... big problems with farmers in Sahara region
... one guy has invented a way to grow trees in the desert
... there are thousands of other farmers who don't know about that
... the guy who has the knowledge can't read, but he has a mobile phone
... the foundation will help to build an application to record advice etc.
... other farmers will be able to share their knowledge
... via accessing the web by voice and SMS
... another project: cgnet swara, in India
... a voice application to do citizen journalism
... there is no news in the local language
... using the system participants record a story, calling a number
... other people get that story via the web
... people are willing to pay for the information
... so people are able to make businesses out of this

Q/A of creators session

jörg_schutz: you said that users hesitate to press a button, but you need that in your projects

scribe: what about acceptance?

max: if the people think that the system is useful, they use it
... in Kenya they had an existing system for banking
... they designed both an SMS based and a voice based system to access banking information

jörg: so it's a learning process for everybody

thierry_declerck(DFKI): interesting to see so much news in many languages

scribe: it seems that it is more the broadcaster doing the publication
... and not newspaper publishers
... 2nd question: would you make your content available for e.g. language technology?

roberto: to 2nd question, would like to
... but might sometimes not be able to do so, due to restrictions of contributors

thierry: in Germany there is legislation now which forces broadcasters to take away content

roberto: not here

josef(CNGL): question for pedro

scribe: in a context for localization company
... is something changing of business models with mobile web?

pedro: telephone companies are asking for solutions
... problem is not only technical
... problems of text / mobile phones are similar to the other web, maybe different with voice

paolo: there are problems with mobile, but no solution yet

pedro: training of systems, noise etc. makes it very hard to create general purpose applications

paolo: companies doing that were relying on humans, but that was no sustainable business model
... we need to use machines, but solution is still a way ahead

natasha_brown(wiki-translate): BBC has a lot of teaching materials

scribe: copyright is about 70 years
... can BBC afford to give up copyright, so that children can learn British English

roberto: cannot answer

axel(mozilla): our right-to-left community said

scribe: keep it left-to-right, since there is e.g. no video player which does right-to-left
... so they would not be confused by having right-to-left

roberto: trying to do things 100% right is not always the solution

axel: please don't go for the multilingual links, they are awful

luis: not going for a specific solution, just trying to find solutions

axel: we use accept-language header, locale info etc.

chaals: what do you do if that does not work?

axel: file a bug in the browser

michael_staffanov(former-un): at w3c and various speakers

scribe: there should be something in HTML that lets us tag "this page is multilingual"
... this is something more and more important as technologies like machine translation evolve
... to BBC: "to be able to say 'this is the Spanish version of the UK page'" is important
... there should be a tag in HTML that says "there are other pages available"

chaals: there is, you can use explicit tags which are machine interpretable for linking to other languages

pedro: no way to distinguish between pages that have been translated, and the ones in the original language

richard: there is a way to link to different language version, but not many browsers implement it
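
The machine-interpretable linking mentioned by chaals and richard is presumably the HTML `link` element with `rel="alternate"` and an `hreflang` attribute. A minimal sketch of generating such links follows; the helper name `alternate_links` and the example.org URLs are hypothetical and only serve as illustration.

```python
from html import escape


def alternate_links(versions: dict[str, str]) -> list[str]:
    """Build one <link rel="alternate"> element per language version.

    `versions` maps a BCP 47 language tag to the URL of that version.
    """
    return [
        f'<link rel="alternate" hreflang="{escape(lang, quote=True)}" '
        f'href="{escape(url, quote=True)}">'
        for lang, url in sorted(versions.items())
    ]


links = alternate_links({
    "es": "https://example.org/es/noticias",
    "en-GB": "https://example.org/uk/news",
})
```

Each page would carry the links to its sibling versions in its `head`. As pedro notes, this markup alone does not say which version is the original and which is a translation.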

claudio(lionbridge): pedro mentioned real-time translation

scribe: when you talk about machine translation and "real time" translation, concept of "good enough" needs to be taken into account

pedro: "hybrid" is the answer
... we can work with MT, including all kinds of processing (statistical, rules, ...)
... we have to be realistic in front of our customers
... merging different types of filters is important
... we can memorize pages, combine MT with xyz, but not in every language
... at the moment we have good results with the "filters" approach
... with close enough languages we have good results
... but with e.g. Spanish and English, it does not work so well
... problem of MT is not a problem for language pairs, but for a given text
... we resolve problem for text of a given client, not for any text

<scribe> UNKNOWN-SPEAKER: how do you deal with interoperability between CMS?

pedro: have a web service (SOAP-based)

chaals closes the session

<E_N> Christian Lieske on Best Practices and Standards for Improving Globalization-related Processes

<E_N> Felix: Introducing and reminding delegates about cocktail reception

<E_N> Christian: Intro re best practices and then moving to wish list

<E_N> Christian: presentation is for those with little or lots of knowledge, not so informative for those with middle knowledge

<E_N> Christian: core processes - how are they managed with best practices? Standards.

Christian Lieske (SAP) on best practices and standards for improving globalization-related processes

<E_N> Christian: many technological components to take into account

<E_N> thank you! ;)

<E_N> Christian: Does globalization really matter? Why does it matter?

<chaals> content passes through a chain and there are a lot of things that help improve quality - translation memories, ...

<E_N> 1/3rd of money that is involved usually goes to translators

<E_N> therefore everything else can be overhead

<E_N> there may be a problem with globalization processes?

<E_N> we do not have simple processing chains, there are many of them, i.e software, docuementation, training

<E_N> and there are many people involved...and they need to communicate together - we have to manager this with best practices and standards

<chaals> s/manager/manage/

<E_N> basic best practices - when you start to do something please start with best

<E_N> s/best practices

<E_N> make sure that all metadata is able to travel with data that has to be globalized

<E_N> thanks chaals

<E_N> I was not aware of this

<chaals> s/therefore everything/... therefore everything/

<chaals> s/there may be/... there may be/

<E_N> "good resources for best practices - w3c node - xml internationalization best practices"

<chaals> s/we do not/... we do not/

<E_N> chaals you are an expert at this

<E_N> "pseudo translation - enables you to find problems quickly - before you begin translating - allows one to save time and money - and to prevent tension"
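
A minimal sketch of the pseudo-translation idea mentioned above, not taken from the talk. It assumes a simple scheme: accent the vowels, pad the string to simulate text expansion, and wrap it in brackets so truncation is visible. The function name `pseudo_translate`, the accent map, the placeholder pattern and the 30% expansion figure are all illustrative assumptions.

```python
import re

# Hypothetical accent map: swaps ASCII vowels for accented look-alikes
ACCENTS = str.maketrans("aeiouAEIOU", "áéíóúÁÉÍÓÚ")
# Placeholders such as {name}, %s or %d must survive untouched (assumed syntax)
PLACEHOLDER = re.compile(r"(\{[^{}]*\}|%[sd])")


def pseudo_translate(text: str, expansion: float = 0.3) -> str:
    """Return a fake 'translation' that exposes i18n problems early."""
    parts = PLACEHOLDER.split(text)
    out = []
    for i, part in enumerate(parts):
        # re.split with one capture group puts the placeholders at odd indexes
        out.append(part if i % 2 else part.translate(ACCENTS))
    # Padding simulates target languages that need more room than the source
    padding = "~" * int(len(text) * expansion)
    # Brackets make clipped or concatenated strings easy to spot in the UI
    return "[" + "".join(out) + padding + "]"


print(pseudo_translate("Save {name}"))
```

Running the pseudo-translated strings through the product before any real translation starts can surface hard-coded text (it stays unaccented), clipped layouts (the closing bracket disappears) and broken placeholders.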

<E_N> "best practice rule = get terminology in order!"

<E_N> "Christian - please take care of terminology and source content quality - to keep downstream processes clean and less troublesome"

<E_N> "Christian - please automate these early processes to ensure consistency early on"

<chaals> ... like this

<E_N> thanks

<chaals> s/... like this//

<chaals> TMX = Translation Memory eXchange (a standard for managing assets in translation)

<E_N> me/ chaals I suggest that you take over, I am afraid I will be producing poor quality notes

<chaals> but will take over.

<E_N> thanks

<E_N> "Christian machines and humans need the right type of information - to ensure consistency - an important standard is ITS"

<E_N> "Christian ITS very important for tagging"

<E_N> "XLIFF helps to unify the world, allows to do away with all the myriad formats"

<chaals> scribe: chaals

Christian: With XLIFF you don't need multiple filters, you just have one format.
... The ITS is about describing resources. It is about explaining things
... e.g. someone comes with content and says "make this in 3 languages"
... and you ask "are there parts in there that shouldn't be translated because they are trademarks or software commands"?
... ITS helps you to provide this sort of information.
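
A rough sketch of how the "do not translate" information described above might be consumed, not taken from the talk. It assumes the ITS 1.0 namespace and its local `its:translate="yes|no"` attribute, with the value inherited by child elements. The sample document, its element names (`doc`, `p`, `cmd`, `tm`) and the product name are hypothetical, and global `its:rules` are deliberately ignored.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"
TRANSLATE = f"{{{ITS_NS}}}translate"


def translatable_text(elem, inherited=True):
    """Yield the text chunks a translator should see.

    Only the local its:translate attribute is honoured here; elements
    without it inherit the setting of their parent (default: translate).
    """
    flag = elem.get(TRANSLATE)
    translate = inherited if flag is None else (flag == "yes")
    if translate and elem.text and elem.text.strip():
        yield elem.text.strip()
    for child in elem:
        yield from translatable_text(child, translate)
        # Tail text belongs to the parent, so the parent's setting applies
        if translate and child.tail and child.tail.strip():
            yield child.tail.strip()


# Hypothetical content: a software command and a trademark are protected
SAMPLE = """<doc xmlns:its="http://www.w3.org/2005/11/its">
  <p>Press <cmd its:translate="no">CTRL+S</cmd> to save in <tm its:translate="no">AcmeWriter</tm>.</p>
</doc>"""

root = ET.fromstring(SAMPLE)
chunks = list(translatable_text(root))
print(chunks)
```

The command and the trademark are skipped, while the surrounding sentence fragments are passed on for translation.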

<E_N> scribe: Elliot

<scribe> scribenick: E_N

<scribe> scribe: E_N

Christian: virtues of standards enable easier data transfer between environments (saving time)
... some people may disagree with standards, perhaps not applicable in real world
... things are not as simple in real-world applications
... world is much more complex
... a reality check - the scope of standards is not adequate (either too large or too complex)
... also some standards may not be mature enough, e.g. some lack conformance clauses
... another issue is that there are not many implementations of standards and the completeness of implementations is sub-optimal
... they only implement a fraction of the standard (related to standards being too broad or large)
... there can be data loss during transfer
... reducing efficiency
... there is still much scope for improvement
... How are standards created?
... by accident?
... with grand pretensions?
... what issues can these methods bring?

scribe: 5m safety system to avoid accidents - by liaising between teams and coordinating
... the 5ms are a way to reduce issues during globalization life cycle
... 5ms are essentially ensuring that people, processes and technology are able to communicate adequately
... for smooth globalization processes

<Roberto> Need cocktail now!

scribe: meta - work with standardized vocab - reduced subset (this reduces complexity and therefore issues related to data transfer between globalization workflow steps)

Josef van Genabith: Where are we going?

Next Generation Localisation

scribe: overview (next generation and future)
... starting with mega trends

Applying Standards - Benefits and Drawbacks

Josef: localisation is the industrial process of adapting digital content to culture, locale and linguistic environment, but why? For ROI.
... core topics - volume (great magnitude of content and target languages)
... also there is a shift from business content to user generated and social media
... additionally the way people access data is evolving rapidly
... people have much more in common than they did in the past
... people are becoming globalized
... and therefore people are the ultimate locale - they are the specific person who we are creating content for
... related to custom content and semantic web
... but how can we balance these various factors?
... example: localization often goes together with customer support, which has to be multilingual. Data shows that for every 10,000 customers who call, an additional 100K use the website to self-serve and an additional 300K use external forums (customers have direct access to peer groups before contacting the customer support teams)
... in fact, the customer support teams may even be the last to know about the fixes which are available in the online forums

Josef: currently state of the art solutions are kluges of online/offline translation tools + MT + user generated content
... many ways to skin a cat - lots of potential methods to reduce costs and improve processes
... must use a holistic view
... to ensure a mature approach
... feedback between content generation and localization and content delivery
... how it is delivered is related to how it should be created and translated.

Josef: there is much to do, but we can all play a part
... this work is not just in the scope of large business

Cross-lingual document similarity for Wikipedia languages

<scribe> scribe: E_N

Marko: examples of cross-lingual information retrieval on Reuters RCV 2 news corpus
... where does this info relate to text related research fields?
... there are many fields, what is the difference between the areas?
... main difference is that each area represents text in a different way; one doc can be seen as a sequence of characters - as we increase structure we see lexical patterns expanding and context recognition becoming obvious
... essentially from simple to complex and the ideal solutions for large scale cross-lingual IR?
... uses of this = language neutrality relies on statistical methods
... this method enables the clear representation of languages and allows for further user analysis on data
... using these stat techniques we can map documents into language-neutral representations
... in summary - this technique provides for a way to analyse cross-lingual data

Q&A

UNKNOWN_SPEAKER: are there any results from cross-ling?
... yes, but fuzzy results
... many problems and challenges

<chaals> s/... yes/Christian: yes/

scribe suggests that delegates do further research

<chaals> Q: How does this relate to DBPedia?

A: DBPedia addresses different use cases, so it has some conceptual similarities but is different

Q: What features does this use?

A: words and phrases are the only features

Felix: Introducing new speaker, please fill out feedback forms!

<scribe> ... new speaker is Daniel Grasmick

Daniel: delighted to be here for the first time,
... presentation will be less technical, pragmatic and candid
... from Lucy software, a young company combining language and technology
... three pillars SAP app translation, MT, Standards
... Daniel started as a translator, then worked in MT
... used to sell MT all around the world
... Daniel has noticed how industry has changed
... worked with SAP for years before Lucy Software
... been involved with defining LISA standards including TMX
... TBX
... Evolution of TMX -
... from OpenTag to TMX
... finding a unified standard for translated segments

for simple data transfer between or within a business

scribe: provides freedom to choose, non-proprietary
... from TMX

they felt they needed SRX

FYI - all of these are part of OAXAL

Open Architecture for XML authoring and localization

scribe recommends looking up OAXAL

Daniel: LISA has suffered from a lack of volunteers
... but standards are a must (rhetorical)

are they?

scribe: sometimes there are too many flavors
... reduced subsets
... minimal meta data for easiest exchange
... XLIFF is far more practical than Excel
... Excel should really not be a source format (if at all possible)

Q&A

Q: Why does SAP not directly support XLIFF?

A: Daniel - SAP text is stored in tables, hard to see potential win of transferring to XLIFF

scribe: ROI questions perhaps?
... but solution will be developed at some point
... future is bright!

Q: From Prof. Reinhard - How do you get standards developed? Who's interested and why?

scribe: to delegates > how do you get them made?
... Prof. Reinhard continued, highlighting a plethora of inherent issues in developing standards

A: Christian Lieske - we make standards happen by networking activities (by developing understanding, the first step is knowing it is possible)

scribe: secondly (people want to get problems out of the way)

Daniel: to develop standards one needs to be an idealist

A: Standards used to eliminate competition, e.g. RTF (well, it was a non-standard standard)

A: mix of needs pressures, avoiding opportunities

Charles: Are people prepared to use standards if they will make a profit?

specification

scribe: takes real work to develop a standard
... cost 2 million to create SVG
... SVG 1.2 was much more expensive

scribe: companies are there for their own interests.
... in web, standards are very important
... companies work on standards where there are no benefits in not having standards
... specifically on the web
... Governments can spearhead standards, e.g. S1000D

XML

scribe: from SGML to XML

standardizing the language of standards

scribe: reducing the complexity to reduce the potential for error

<chaals> scribe: chaals

Comment: A big motivation is governments (especially military purchasing)
... wanting not to be locked in to a supplier

Paulo: Working in speech with standards is great - it means people can build a big market
... Loquendo uses standards as a differentiating factor like Opera "trust us now because we are standards-based which means you're not locked into us"

Denis: Webkit question for developer panel. Given the same underlying objective for end users, are there specific reasons not to use webkit?

<E_N> ... WebKit doesn't do things we want it to do, had third-rate SVG support

<E_N> ... but great CSS support

<E_N> ... to retool engineers one must show a clear benefit - Opera's assessment of WebKit (it works for some people, for others it doesn't)

<E_N> ... competition is a good thing

<E_N> ... single standards reduce competition in some instances

<E_N> ... especially in browser world

<E_N> ... when market shifted there was competition. Having one core browser is not a smart way to go.

<E_N> ... competition is important

<E_N> ... people can choose to implement as they see fit

<E_N> ... everyone to their own

<E_N> ... to suit their own software ecology

<E_N> ... and needs of internal and external users

<E_N> ... customers don't want to be forced to rely on WebKit (it could be shown to be unsatisfactory in the future)

<E_N> ... suppliers disappear (the future is unknown)

Alex: Mozilla thinks it is important to have difference and choice, as a core value.
... And in fact there are a lot of different branched Webkits out there.

<E_N> my pleasure, it was fun in a paint balling kind of way ;)

Alex: It isn't a browser, it is a core rendering engine. A key part of a browser, but just one part.
... We gladly took other components, because they fit in well.

Josef: To go back to the point from this morning, about whether there should be a tag that identifies things that have been translated.
... there are multiple stakeholders that should be involved (e.g. in that case it didn't matter to the browser makers, but did matter to other stakeholder)
... There are now a lot of spiders using the web for machine learning to improve translation systems.
... would be interesting to identify content that has been machine-translated, so spiders can exclude that from learning because they assume it isn't that good...

FS: Note that the European Commission is here... and interested in fostering innovation

PeterC: The browsers aren't impacted by the tag for things that were translated, so aren't the right people to be defining it.

??: We all have cellphones and PDAs. Power plugs aren't standardised, and there has been a lot of talk. But imagine if they had been standardised a decade ago? We would have huge horrible things. Standardisation can hold back innovation as well, so you have to be aware of the cost as well as the benefit.

Christian: Please come to the cocktail reception and keep talking.

Summary of Action Items

[End of minutes]

Minutes formatted by David Booth's scribe.perl version 1.135 (CVS log)
$Date: 2010/10/30 16:12:17 $