See also: IRC log
This is the raw scribe log for the sessions on day one of the MultilingualWeb workshop in Madrid. The log has not undergone careful post-editing and may contain errors or omissions. It should be read with that in mind. It constitutes the best efforts of the scribes to capture the gist of the talks and discussions that followed, in real time. IRC was used not only to capture notes on the talks; it could also be followed in real time by remote participants, or participants with accessibility problems. People following IRC could also add contributions to the flow of text themselves.
See also the log for the second day.
<fsasaki> scribe: Felix
<inserted> scribenick: fsasaki
welcome address from UPM
introduction to the workshop by Richard Ishida (W3C)
Richard introduces the "Multilingual Web" project - aims, goals
project homepage http://www.multilingualweb.eu/
kimmo: I am the project officer of this project. Lots of enthusiasm from participants - a dream team covering different areas
... so project can really make a difference for the multilingual web
... this project is about much more than standardization
... EC has made a commitment on the "digital agenda" in Europe
... how communication technology can help to solve European / global challenges
... trying to boost innovation and faster uptake of research results by the industry
... 8th framework program now "research and innovation" as focus
... we need good input to discussions about 8th framework program. Nobody knows how the Web will look in 10 years
... so we want to make use of the opinions of the stakeholders - people like you
... scribing sessions is a very important job. Scribes should help us to get conclusions & recommendations to process outcomes of this event
... my job is to sell money to people who have good ideas. I make an attempt to convince them to work in / on European projects
... 50 mill. Euros for projects in Language Technology available
... areas: multilingual content processing, including Machine Translation and the chain of authoring / managing multilingual online content
... another area: on multilingual information access & mining
... and third area: natural speech interaction
... call I just described is ongoing, we are taking submissions now
... a future call for 1st February 2011: SME initiative for digital content and languages
... focused on SME, but consortia can encompass also large companies & research institutes. At least 2 SMEs need to be involved
... data sharing & pooling
... mlw project is about standards. Can we address standards in upcoming calls?
... yes, we can. But rather than having a project developing standards, put standards in action
... build something useful around the standard
... thank you for your time, enjoy the workshop!
"The Multilingual Web, Policy Making and Access to Digital Knowledge for All"
reinhard: we made a survey on mlw
- see current state of results at http://tinyurl.com/3xgfydl
... standards and commercial interests are sometimes in a difficult relationship
... there are about 800.000 standards around
... getting a standard through requires political cleverness, friends, power to push against strong interests
... standards in localization: Encoding (Unicode), Quality (e.g. LISA Q/A), data exchange (XLIFF, TMX), Metrics
... Unicode was successful since industry players came together and just did it, also giving up their own existing work on encoding
... people are not so much interested in standards as in what they can make with them
... expectations and reality are often different things - e.g. sometimes people say they support XLIFF, but they "just" can read / import / export XML files
... comparison to a bus stop - do you want to be in a standardized environment with the bus on time, or in an environment with the bus being delayed and you have time to talk to your friends?
... standards means making compromises. You don't want to wait for a committee, you just do it
... also you don't want to cooperate with your competitors
... data exchange and process management are important too
... you always want to keep an advantage compared to your competitors
... where are we? 19 billion $ industry
... but highly fragmented. Some (2,3,4) dominant players
... short term ROI oriented
... localization industry was established in the 80s since companies wanted to sell products in many regions / countries
... LOC people don't look ahead long term, because of short term ROI
... so who can drive mlw?
... we made a survey, see results link here http://tinyurl.com/3xgfydl
... many people want to have the multilingual web - so why isn't it happening?
... localization for all is now in focus:
... more people / languages / content, user drive / own / manage the content
... networks become standards based & interoperable
... that is happening in the non L10N world - why couldn't it happen also in the localization world?
... companies have to give up illusion of control
... focus has to be on implementations and benefits for the people
... there is a fundamental right: access to information
... that does not need to be judged by business case
... drivers for this change: maybe not large corporations, but nonprofit sector
... nonprofit translation is the world's largest translation movement
... motivation is to make the world a better place
... a forum to achieve that: Internet Governance Forum (IGF)
... UN's IGF working group
... standards, access to technology, and skill are important
... as we can see in the "Close Encounters Of The Third Kind" clip
... if aliens could talk, we would not understand them but we could try to develop technologies to achieve that
<Jirka> scribe: Jirka
chaired by Adriane Rinsche
... Mark Davis (Google) will videocast tomorrow at 16:30
by Richard Ishida (W3C)
Richard: introduces W3C
<inserted> scribenick: fsasaki
<Jirka> ... 22 activities, 50 working groups and more
<Jirka> ... internationalization activity is part of W3C work
<Jirka> ... standards supporting multilingual web are
<Jirka> ... Unicode, W3C technology is built on top of Unicode
<chaals> scribenick: Jirka
Richard: 70% of web pages are using Unicode encodings
... some mistakes solved recently, XML 1.0 5th edition extends the characters allowed in identifiers
... Unicode normalization, W3C proposed to use NFC form
... work on allowing national characters in resource identifiers -- IDN (Internationalized Domain Names)
... in June IANA started to release internationalized top-level domain names
... IRI allows internationalizing the path part of a resource identifier
... language tags, 8000 subtags, described in BCP47
... Speech Synthesis Markup Language
... CSS3 adds more internationalization support
... browser implementers have to support all details in different languages
... browser developers need to have reason to implement support for i18n features
... they need to hear from users that i18n features are critical
... vertical text is needed in Japan, Korea, Thailand, China; covered by CSS3 module
... there are problems with mixing various scripts
... Ruby annotation
... there is CSS3 Ruby module
... HTML5 has implemented Ruby but differently than other specs, need for convergence
<chaals> [HTML5 tried to copy IE's implementation to be interoperable with existing usage (which is why it is different from the original spec)]
Richard: plans to support complex Ruby
<chaals> [/me is excited about the requirements for layout, because it has motivated groups in a number of other languages to do the same]
Richard: Requirements for Japanese Layout are used as an input to several specs, including XSL-FO, HTML, CSS
... Web Fonts - ability to use custom fonts in web pages
... there are still subsetting and licensing issues
... HTML5 - language identification, ability to specify dates in standardized way [using uF approach]
... HTML5 new input types for forms
... issues with bidirectional markup
... there are additional requirements related to ordering and alignment of text
... MathML 3.0 supports Arabic math typesetting
... ITS -- there will be separate talk by Christian
... The rise of Mobile Web
... Best practices developed by W3C
... there are also tests
<chaals> http://www.w3.org/International -> W3C Internationalisation Activity (lots of useful links and things)
Richard: I18N Checker http://qa-dev.w3.org/i18n-checker/
... there is also MobileOK checker
... Web is about people, not about technology
... we need you to make Web worldwide
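As an illustration of the Unicode normalization point in Richard's talk (use of NFC), here is a minimal Python sketch using the standard library's unicodedata module. The helper name to_nfc and the sample strings are assumptions made for this example; nothing like it was shown in the talk.

```python
import unicodedata


def to_nfc(text: str) -> str:
    """Return text in Unicode Normalization Form C (canonical composition)."""
    return unicodedata.normalize("NFC", text)


# Two ways of encoding the same visible character "é" (hypothetical sample data).
decomposed = "e\u0301"  # LATIN SMALL LETTER E + COMBINING ACUTE ACCENT
composed = "\u00e9"     # LATIN SMALL LETTER E WITH ACUTE

# Without normalization the two strings compare as different.
print(decomposed == composed)          # False
# After normalizing to NFC they compare as equal.
print(to_nfc(decomposed) == composed)  # True
```

Normalizing both sides before comparison, search or identifier matching avoids treating visually identical strings as different.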
by Axel Hecht (Mozilla)
Axel: Firefox 4 will change User-Agent header, no more locale info here, use
... 80+ localizations
... community driven
... it is a challenge to make everything work on all platforms
... negotiating content language
... balance between best content and user privacy
... better APIs -- for BCP 47, site-specific Accept-Language
... web sites at Mozilla
... mostly static content
... locale dependent content
... data-driven sites, not easy
... live multi-lingual documents, like documentation, knowledge base, ...
... how to differentiate between an added translation and a bug fix in content that should be propagated to pages in other languages
... international feedback button
... several existing wiki systems used, none really sufficient
... developing own Kitsune system
... question: what functionality is missing in the browsers (in general or in Firefox)?
... question: localizing HTML5 content on the client
... question: managing live multilingual docs
question from auditorium about handling speech
Axel: I'm not working in speech area
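Axel's point about negotiating content language via Accept-Language can be sketched roughly as follows. This is a simplified illustration and not Mozilla's implementation: the function names, the fallback to the primary subtag and the sample header are assumptions, and wildcard ranges ("*") are simply never matched.

```python
def parse_accept_language(header: str) -> list[tuple[str, float]]:
    """Parse an Accept-Language header into (language-range, q) pairs,
    sorted by descending q-value (header order is kept for ties)."""
    ranges = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        q = 1.0  # a missing q parameter means q=1
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat a malformed q-value as "not acceptable"
        ranges.append((tag.strip().lower(), q))
    ranges.sort(key=lambda item: item[1], reverse=True)
    return ranges


def pick_language(header: str, available: list[str], default: str) -> str:
    """Pick the best available language for the header, else the default."""
    supported = {lang.lower(): lang for lang in available}
    for tag, q in parse_accept_language(header):
        if q <= 0:
            continue
        if tag in supported:
            return supported[tag]
        primary = tag.split("-")[0]  # e.g. "es-ES" falls back to "es"
        if primary in supported:
            return supported[primary]
    return default


# Hypothetical request header and site languages.
print(pick_language("es-ES,es;q=0.9,en;q=0.5", ["en", "es"], "en"))  # es
```

Falling back to a site default when nothing matches is one possible design choice; a site could instead offer the user an explicit language selection.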
by Charles McCathieNevile (Opera)
Chaals: Opera was created especially to support non-English web
... Opera supports all kind of devices
... some key markets:
... Japanese -- how to make it work on mobile phones
... Vietnamese -- multiple diacritics over one character
... India, Iran -- phones
... technology/standards used
... originally used UTF-16, good for CJK efficiency
... UTF-8 is better for real world
... using gettext PO files for software
... content for l10n is handled in XLIFF
... translation of Opera Desktop by agencies and volunteers (for minor languages)
... Opera Mini translated only by agencies; 100+ languages, because of space constraints
... Widgets and extensions are translated by developers
... My.Opera content translated by community
... XLIFF more complicated than Opera needs
... translators tools -- we use open-source tools and some inhouse stuff
... different translation agencies use different software, so it is hard to change agencies
... word-breaking dictionaries are getting large
... problem for embedded devices (TV, game consoles, ...)
... layout (RTL, vertical), scary part of browser code, very expensive to change
... without clear message from users there is no interest in touching this complex code, so the fact that communities have started being clear about a need for this and writing documents on what is required is important
... people issues:
... people don't understand how translation works
... now everything is translated from English
... we have tried to use multiple source languages to cater to translators
... it is harder to maintain quality of such translations, but you can have larger translator community
by Jan Nelson (Microsoft), Peter Constable (Microsoft)
Jan: MS does a lot of translation, l18n and globalization
... Microsoft research working on Microsoft Translate service
... WikiBhasha -- browser-based multilingual content creator for Wikipedia
... open-source tool, currently supporting 35 languages
... Microsoft Local Language Program
Peter: about enabling mlw
... IE is localized in 95 languages
... pages in UTF-8 are growing (over 50% of content)
... HTML/CSS is ready for multilingual web
... issues in separation of content and application code
... client-server interaction issues -- handling preferred language when travelling, ...
... show some examples of rendering various scripts on HTML page and on HTML5 canvas
Question for Axel: Why have you changed between several wiki implementations? Was it because of poor localization support?
Axel: content developers want visual editing (no wiki codes). For other wikis I don't know.
Question from Christian Lieske (SAP): There are some issues in handling i18n content by Webkit based applications. Is this caused by core (webkit) or applications?
Axel: there are several rendering engines; no representative from a project using WebKit is here
Chaals: If there is no functionality in the rendering engine, it simply doesn't work and web developers have to find workarounds. Parties have to work together to implement the most demanded features.
Question from David (BBC World Service): Who is responsible for rendering complex scripts -- browser engine or underlying operating systems?
Peter: In Windows we serve more than browsers, so we have this functionality in OS. Some browsers use Windows functionality, some depend on their own rendering engine.
Chaals: some devices don't have any such support, we have this in browser engine to support various devices
Axel: It depends on the
... you have to have good fonts for scripts/languages. This is not easy for minor languages
... the problem is that developers of rendering engines don't have knowledge of foreign languages
Question from Jörg Schultz (BioLoom): There are two groups working on HTML5 -- WHATWG and W3C HTML WG. How will this evolve?
Chaals: HTML5 spec will be produced by W3C.
... WHATWG is very informal and open place for developers playing with possible HTML5 features.
... some features from WHATWG were removed from W3C HTML5
Question from Felix Sasaki: Do you see need for common way marking up what should/should not be translated?
Chaals: It is not a matter for the browser, the browser doesn't do translation. You can use XHTML and add that right now, and it will not be a problem
Richard: Google translation supports it, Hixie (editor of HTML5) dismissed this feature proposed by Microsoft for inclusion in HTML5
Axel: if we are going to support localization directly in browser, we will consider supporting it
Peter: this is not relevant for browser, but for upstream process when content is created and translated
Closing question: where is MLW going in the next couple of years?
<chaals> Richard: There are a huge number of things to work on...
Chaals: A lot of things deserve better support, eg. vertical text
Jan: work together with local governments on support of more languages
<fsasaki> scribe: fsasaki
chaals: session about making content
chaals introduces the speakers
talk by Roberto Belo Rovella, David Vella
roberto: in charge of bbc world
... our focus now on online platforms
... we are a multilingual site, but each site has its own editors, i.e. not direct translation
... recently re-released news site in Burmese, with new fonts and Unicode support
Roberto gives over examples for Chinese market
<chaals> [Chinese: produced in simplified Chinese, can be auto-converted to traditional]
<chaals> [Uzbek site: Uzbekistan is moving from writing in Cyrillic to using the same language in Latin. Plus Arabic]
roberto: lots of other markets,
... 5% of the BBC traffic
... past: challenges for multilinguality: creating fonts / input methods "from scratch"
... operating systems get better, so this has become much easier
... BBC was one of the first content providers to publish in Unicode
... was a hard challenge. Some websites offered content as GIFs
... people said "why don't you use a font we are already using?"
... in the end we prevailed, Unicode succeeded. But we lost a lot of the market
... in some languages we had to use English. Urdu now works on the local script
... new look for Arabic website, with new fonts
... depends also on other parts of the website, like a localized
... there are still web sites offering images instead of fonts, but solutions are coming
... currently on a mobile displaying Hindi, a user sees only boxes
... iOS 4 is close to the correct rendering
... reading with that display is very hard
... 70% of devices in India cannot display the Hindi text properly
... we created an image based solution, in addition to the text based one
... we publish both, and have links from the images to the text on the paragraph level
... so we ignored the (W3C) advice of not using images for character display
... we will not replace the text based version in the CMS
... images display on every device, we control the rendering
... we used Pango text rendering library
... average page size is 45 KB, text only is 20 KB
... launched Aug 2010, Hindi mobile traffic up by 50%
... this is only temporary
... Nokia and Samsung, they only localize the UI, so the situation is not changing
... white-label or cheap brand lookalike use their own software, no standard solution
... not standards based, but they have 30% market share in India, we can't ignore them
... collaboration with Google. Users see only messages in their own language, but real time translation in / from other languages. Results were interesting (but not always coherent...)
... size of areas on a page is changing depending on the languages involved
roberto: wishful thinking: create once, publish everywhere
... encourage proper font rendering
... offer language expertise to mobile manufacturers
... rapidly deprecate support for older browsers
alex o conner (CNGL): have you thought of processing binary assets?
(scribe missed answer)
<chaals> [The images are generated on publishing, and left as static. If you correct, you regenerate new images (not an expensive process, like printing)]
paolo: many people cannot read, in many circumstances you don't have written language
... need to handle speech too
... in the last 10 years W3C started a voice browser and multimodal working group
... today quite a few parts of speech processing are covered by standards
... other important thing is the language subtag registry
... example. small speech application asking "what do you want to drink"
... recognition of speech means: you have to create grammars
... even the grammar has xml:lang
... you can also create a multilingual grammar
... that uses language identifiers for defining and re-defining the language
... tts means e.g. "reading a book"
... richard mentioned SSML. We developed 1.0, later 1.1
... Chinese people said that they need improvements
... now (in 1.1) there is a tag to specify the language also for pieces of text
... another point: you need to have a voice speaking
... other application areas: dubbing or gaming
... you can be more precise in SSML 1.1, e.g. for specifying the accent
(very nice :) ) demo of phonetic mapping to change spoken language
... e.g. English spoken by
... another standard: PLS 1.0, a lexicon used to correct errors
... for specific words, like locations, proper names
... application for TTS or speech recognition
... BCP 47 did a lot of things, but there is a need to standardize phonetic alphabets in more detail
... development tool - LoquendoTTs director editing tool
... speech is another way of using the web
... standards help to create speech applications
... work by IANA / BCP 47 is good, but need to extend it for phonetic alphabets
peter_constable(Microsoft): your request about phonetic alphabets
... there are subtags registered to denote that content is in IPA
... or other phonetic alphabets
paolo: did not follow the discussion in detail
... but idea was to have two registries, one for all subtags, one for phonetic alphabet only
luis: multilingual search in catalogue
<chaals> http://www.linguanet-europa.org/plus/welcome.htm -> lingu@net europa web site
luis: currently restricted, search only on multilingual metadata
... 32 different languages
... people involved: language teaching professionals
... no professional translators
... have problems to get used to translation memory etc.
... so we have to develop a process to create the site
... we created our own solution, after looking into existing ones
... relies on: UTF-8, HTML, XML, CSS, MS-Office, Apache, Java Servlets, Tomcat, Lucene, Zope
... po files
luis describes the workflows in detail
luis: now about "multilingual
... we have same resources in different languages
... but not all the languages are on the site
... e.g. we have a link to "more information"
... but we don't have let's say a Spanish version
... in that case we give a box presenting the languages available
... question is what to show to the user: the current language (of the user), the versions in other languages
... we'd need a CMS supporting all these scenarios
... that is for a whole site
... initial prototype: using XSLT to generate multilingual links
... not sure if that is something to standardize: how to present multilingual links
... would be good to have a solution for that in CMS
presentation from linguaserve
pedro: for us multilingual content is content in motion
... current state: multilingual webservices
... current model has problems, since there are more and more languages to serve
... using "translatability data type definition" (tDTD)
... indicating which part to translate and what not
... e.g. if you change a price, no need to translate that
... attributes control if content has been translated or not
... systems are sophisticated for multilingual publishing, but
... multilingual content web life cycles:
... we work with 7 different CMS
pedro: they do not consider multilinguality in the content life cycle
... so no real management for different language versions
... localization - different ways:
... online access to CMS
... or offline access
... third way: automatic real-time machine translation
... we want professional quality
... disconnect between CMS, MT and language versions allows for a reduction of implementation costs
... now near future
... "everything is hybrid"
... in globalisation: combining different services
... in localization: combining different production systems
... like online and offline
... combining several translation methodologies, like MT + professional post editing
... good to use XHTML and good source content
... summary: CMS has to take into account that multilingual content management is important
... all of them need to collaborate
christian_lieske(SAP): how can the addition of machine translation help to reduce complexity
... e.g. in content management
pedro: MT for translating critical content may be a problem
... but for translating content that changes every two hours, MT is useful
... depends also on the language pairs
... we are working more on integration methodologies
... using XML can be a part of the solution, real time HTML another one
... how much each, depends on the client
max: introducing the world wide
... now: 75% of world population have access to the Web
... Richard: 1.2 Billion people are able to use the internet
... but we normally don't take messaging like SMS into account
... SMS is the Web, same with Voice
... e.g. if you call an interactive voice system
... it's another way to access the Web
... messaging and voice are the Web
... Web as it was created was: a desktop PC, HTTP
... now we have mobile browsers
... there is also HTTP, but also apps, widgets
... the system still uses HTTP, goes to a URI
... but the user does not see the URI
... an SMS gateway uses that too
... we claim that the Web is not only a browser
... many people have access to e.g. SMS, even if they don't have a browser on the mobile phone
... voice browser applications
... consequences of this situation: SMS input / output problems
... no documentation for SMS, no authoring tools
... in voice: prompts ok, acceptance of dialog systems / NLP low
... no content that is interesting enough to send an SMS
... little knowledge about applications you can build with SMS
... no knowledge about business models
... Web for regreening alliance
... big problems with farmers in Sahara region
... one guy has invented a way to grow trees in the desert
... there are thousands of other farmers who don't know about that
... the guy who has the knowledge can't read, but he has a mobile phone
... the foundation will help to build an application to record advice etc.
... other farmers will be able to share their knowledge
... via accessing the web by voice and SMS
... another project: cgnet swara, in India
... a voice application to do citizen journalism
... there is no news in the local language
... using the system participants record a story, calling a number
... other people get that story via the web
... people are willing to pay for the information
... so people are able to make businesses out of this
jörg_schutz: you said that users hesitate to press a button, but you need that in your projects
... what about acceptance?
max: if the people think that the system is useful, they use it
... in Kenya they had an existing system for banking
... they designed both an SMS based and a voice based system to access banking information
jörg: so it's a learning process for everybody
thierry_declerck(DFKI): interesting to see so much news in many languages
... it seems that it is more the broadcaster doing the publication
... and not newspaper publishers
... 2nd question: would you make your content available for e.g. language technology?
roberto: to 2nd question, would
... but might sometimes not be able to do so, due to restrictions of contributors
thierry: in Germany there is legislation now which forces broadcasters to take away content
roberto: not here
josef(CNGL): question for pedro
scribe: in a context for
... is something changing of business models with mobile web?
pedro: telephone companies are asking for solutions
... problem is not only technical
... problems of text / mobile phones are similar to the other web, maybe different with voice
paolo: there are problems with mobile, but no solution yet
pedro: training of systems, noise etc. makes it very hard to create general purpose applications
paolo: companies doing that were relying on humans, but that was not a sustainable business
... we need to use machines, but solution is still a way ahead
natasha_brown(wiki-translate): BBC has a lot of teaching materials
scribe: copyright is about xyz
... can BBC afford to give up copyright, so that children can learn British English
roberto: cannot answer
axel(mozilla): our right-to-left community said
... keep it left-to-right, since there is e.g. no video player which does
... so they would not be confused by having right-to-left
roberto: trying to do things 100% right is not always the solution
axel: please don't go for the multilingual links, they are awful
luis: not going for a specific solution, just trying to find solutions
axel: we use accept-language header, locale info etc.
chaals: what do you do if that does not work?
axel: file a bug in the browser
michael_staffanov(former-un): at w3c and various speakers
... there should be something in HTML that lets us tag "this page is multilingual"
... this is something more and more important as technologies like machine translation evolve
... to BBC: "to be able to say 'this is the spanish version of the UK page'" is important
... there should be a tag in HTML that says "there are other pages available"
chaals: there is, you can use explicit tags which are machine interpretable for linking to other languages
pedro: no way to distinguish between pages that have been translated, and the ones in the original language
richard: there is a way to link to different language versions, but not many browsers implement it
claudio(lionbridge): pedro mentioned real-time translation
scribe: when you talk about machine translation and "real time" translation, concept of "good enough" needs to be taken into account
pedro: "hybrid" is the
... we can work with MT, including all kinds of processing (statistical, rules, ...)
... we have to be realistic in front of our customers
... merging different types of filters is important
... we can memorize pages, combine MT with xyz, but not in every language
... at the moment we have good results with the "filters" approach
... with close enough languages we have good results
... but with e.g. Spanish and English, it does not work so well
... problem of MT is not a problem for language pairs, but for a given text
... we resolve problem for text of a given client, not for any text
<scribe> UNKNOWN-SPEAKER: how do you deal with interoperability between CMS?
pedro: have a web service (SOAP-based)
chaals closes the session
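The machine-interpretable links to other language versions that chaals and richard mention in the discussion above are commonly written in HTML as link elements with rel="alternate" and an hreflang attribute. A minimal Python sketch that generates such markup follows; the helper name alternate_links and the example.org URLs are assumptions for illustration, not something presented in the session.

```python
from html import escape


def alternate_links(versions: dict[str, str]) -> str:
    """Build <link rel="alternate" hreflang="..."> elements, one per
    language version, for inclusion in a page's <head>."""
    lines = []
    for lang, url in versions.items():
        lines.append(
            '<link rel="alternate" hreflang="{}" href="{}">'.format(
                escape(lang, quote=True), escape(url, quote=True)
            )
        )
    return "\n".join(lines)


# Hypothetical language versions of one page (BCP 47 tags as keys).
print(alternate_links({
    "en": "https://example.org/en/news",
    "es": "https://example.org/es/news",
}))
```

This only declares that other versions exist; as pedro notes, it does not say which version is the original and which are translations.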
<E_N> Christian Lieske on Best Practices and Standards for Improving Globalization-related Processes
<E_N> Felix: Introducing and reminding delegates about cocktail reception
<E_N> Christian: Intro re best practices and then moving to wish list
<E_N> Christian: presentation is for those with little or lots of knowledge, not so informative for those with middle knowledge
<E_N> Christian: core processes - how are they managed with best practices? Standards.
<E_N> Christian: many technological components to take into account
<E_N> Christian: Does globalization really matter? Why does it matter?
<chaals> content passes through a chain and there are a lot of things that help improve quality - translation memories, ...
<E_N> ... 1/3 of the money that is involved usually goes to translators
<E_N> ... therefore everything else can be overhead
<E_N> ... there may be a problem with globalization processes?
<E_N> ... we do not have simple processing chains, there are many of them, i.e. software, documentation, training
<E_N> ... and there are many people involved, and they need to communicate together - we have to manage this with best practices and standards
<E_N> ... basic best practices - when you start to do something please start with best practices
<E_N> ... make sure that all metadata is able to travel with data that has to be globalized
<E_N> "good resources for best practices - W3C Note - XML Internationalization Best Practices"
<E_N> "pseudo translation - enables you to find problems quickly - before you begin translating - allows one to save time and money - and to prevent tension"
<E_N> "best practice rule = get terminology in order!"
<E_N> "Christian - please take care of terminology and source content quality - to keep downstream processes clean and less troublesome"
<E_N> "Christian - please automate these early processes to ensure consistency early on"
<chaals> TMX = Translation Memory eXchange (a standard for managing assets in translation)
<E_N> "Christian machines and humans need the right type of information - to ensure consistency - an important standard is ITS"
<E_N> "Christian ITS very important for tagging"
<E_N> "XLIFF helps to unify the world, allows to do away with all the myriad formats"
<chaals> scribe: chaals
Christian: With XLIFF you don't need multiple filters, you just have one format.
... The ITS is about describing resources. It is about explaining things
... e.g. someone comes with content and says "make this in 3 languages"
... and you ask "are there parts in there that shouldn't be translated because they are trademarks or software commands"?
... ITS helps you to provide this sort of information.
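The pseudo-translation practice mentioned earlier in Christian's talk can be illustrated with a small Python sketch. It is only one possible approach under stated assumptions: the function name pseudo_translate, the bracket markers, the "~" padding and the default expansion factor are invented for this example, and real tools also need to protect placeholders and markup, which this sketch does not do.

```python
# Map plain ASCII vowels to accented ones so that encoding problems
# and hard-coded strings become visible before real translation starts.
ACCENTED = str.maketrans("aeiouAEIOU", "áéíóúÁÉÍÓÚ")


def pseudo_translate(text: str, expansion: float = 0.3) -> str:
    """Return a pseudo-translated version of a UI string.

    - vowels are replaced by accented vowels (tests non-ASCII handling)
    - "~" padding simulates longer target-language text (tests layout)
    - brackets mark the start and end (reveals truncation)
    """
    padding = "~" * int(len(text) * expansion)
    return "[" + text.translate(ACCENTED) + padding + "]"


# Hypothetical UI strings.
for source in ["Save file", "Open"]:
    print(pseudo_translate(source))
```

Strings that still appear unaccented in the running product were probably never externalized for translation, and strings whose closing bracket is cut off point to layout that cannot accommodate longer text.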
<E_N> scribe: Elliot
<scribe> scribenick: E_N
<scribe> scribe: E_N
Christian: virtues of standards - enable easier data transfer between environments (saving
... some people may disagree with standards, perhaps not applicable in real world
... things are not as simple in the real world applications
... world is much more complex
... a reality check - the scope of standards is not adequate (either too large or too complex)
... also some standards may not be mature enough, e.g. some lack conformance clauses
... another issue is that there are not many implementations of standards, and the completeness of implementations is sub-optimal
... they only implement a fraction of the standard (related to standards being too broad or large)
... there can be data loss during transfer
scribe: there is still much scope
... How are standards created?
... by accident?
... with grand pretensions?
... what issues can these methods bring?
<chaals> s/reducing efficeincy/... reducing efficiency/
scribe: 5m safety system to avoid accidents - by liaising between teams and coordinating
... the 5ms are a way to reduce issues during the globalization life cycle
... 5ms essentially ensure that people, processes and technology are able to communicate adequately
... for smooth globalization processes
<Roberto> Need cocktail now!
scribe: meta - work with standardized vocab - reduced subset (this reduces complexity and therefore issues related to data transfer between globalization workflow steps)
Josef van Genabith: Where are we going?
scribe: overview (next generation
... starting with mega trends
Daniel Grasmick: it is the
industrial process of adapting digital content to culture,
locale and linguistic environment, but why? For ROI.
... core topics - volume (great magnitude of content and target languages)
... also there is a shift from business content to user generated and social media
... additionally the way people access data is evolving rapidly
... people have much more in common than they did in the past
... people are becoming globalized
... and therefore people are the ultimate locale - they are the specific person who we are creating content for
... related to custom content and semantic web
... but how can we balance these various factors?
... example: localization often goes together with customer support, which has to be multilingual. This data shows that for every 10,000 who call there are an additional 100K who use the website to self-serve, plus an additional 300K using external forums (customers have direct access to peer groups before contacting the customer support teams)
... in fact, the customer support teams may even be the last to know about the fixes which are available in the online forums
Daniel Grasmick: currently, state-of-the-art solutions are kludges of online/offline translation tools + MT + user-generated content
... many ways to skin a cat - lots of potential methods to reduce costs and improve processes
... must use a holistic view
... to ensure a mature approach
... feedback between content generation and localization and content delivery
... how it is delivered is related to how it should be created and translated.
Daniel Grasmick: there is much to do, but
we can all play a part
... this work is not just in the scope of large business
<fsasaki> s/Cross-lingual information retrieval/Cross-lingual document similarity for Wikipedia languages/
<scribe> scribe: E_N
Marko: examples of cross-lingual
information retrieval on Reuters RCV 2 news corpus
... where does this info relate to text related research fields?
... there are many fields, what is the difference between the areas?
... main difference is that each area represents text in a different way. One doc can be seen as a sequence of characters - as we increase structure we see lexical patterns expanding and context recognition becoming obvious
... essentially from simple to complex - and what are the ideal solutions for large-scale cross-lingual IR?
... uses of this = language neutrality relies on statistical methods
... this method enables the clear representation of languages and allows for further user analysis on data
... using these stat techniques we can map documents into language-neutral representations
... in summary - this technique provides for a way to analyse cross-lingual data
UNKNOWN_SPEAKER: are there any
results from cross-ling?
... yes, but fuzzy results
... many problems and challenges
<chaals> s/... yes/Christian: yes/
scribe suggests that delegates do further research
<chaals> Q: How does this relate to DBPedia?
Q: What features does this use?
A: words and phrases are the only features
Felix: Introducing new speaker, please fill out feedback forms!
<chaals> i/Q: What features/A: DBPedia addresses different use cases, so it has some conceptual similarities but is different
<scribe> ... new speaker is Daniel Grasmick
Daniel: delighted to be here for
the first time,
... presentation will be less technical, pragmatic and candid
... from Lucy software, a young company combining language and technology
... three pillars SAP app translation, MT, Standards
... Daniel started as a translator, then worked in MT
... used to sell MT all around the world
... Daniel has noticed how the industry has changed
... worked with SAP for years before Lucy Software
... been involved with defining LISA standards including TMX
... Evolution of TMX -
... from OpenTag to TMX
... finding a unified standard for translated segments
for simple data transfer between or within a business
scribe: provides freedom to choose, non-proprietary
... from TMX
they felt they needed SRX
FYI - all of these are part of OAXAL
Open Architecture for XML Authoring and Localization
scribe recommends looking up OAXAL
Daniel: LISA has suffered from a
lack of volunteers
... but standards are a must (rhetorical)
scribe: sometimes there are too
... reduced subsets
... minimal meta data for easiest exchange
... XLIFF is far more practical than Excel
... Excel should really not be a source format (if at all possible)
Q: Why does SAP not directly support XLIFF?
A: Daniel - SAP text is stored in tables, hard to see potential win of transferring to XLIFF
scribe: ROI questions
... but solution will be developed at some point
... future is bright!
Q: From Prof. Reinhard - How do you get standards developed? Who's interested and why?
scribe: to delegates > how do
you get them made?
... Prof. Reinhard continued, highlighting a plethora of inherent issues in developing standards
A: Christian Lieske - we make standards happen by networking activities (by developing understanding, the first step is knowing it is possible)
scribe: secondly (people want to get problems out of the way)
Daniel: to develop standards one needs to be an idealist
A: Standards used to eliminate competition, e.g. RTF (well, it was a non-standard standard)
A: mix of needs pressures, avoiding opportunities
Charles: Are people prepared to use standards if they will make a profit?
scribe: takes real work to
develop a standard
... cost 2 million to create SVG
SVG 1.2 much cheaper
scribe: companies are there for
their own interests.
... in web, standards are very important
... no benefits in not having standards
... specifically on the web
... Governments can spearhead standards, e.g. S1000D
<chaals> s/SVG 1.2 much cheaper/... SVG 1.2 was much more expensive/
scribe: from SGML to XML
standardizing the language of standards
scribe: reducing the complexity to reduce the potential for error
<chaals> s/no benefits in not/companies work on standards where there is no benefits in not/
<chaals> scribe: chaals
Comment: A big motivation is
governments (especially military purchasing)
... wanting not to be locked in to a supplier
Paulo: Working in speech with
standards is great - it means people can build a big
... Loquendo uses standards as a differentiating factor like Opera "trust us now because we are standards-based which means you're not locked into us"
Denis: Webkit question for developer panel. Given the same underlying objective for end users, are there specific reasons not to use webkit?
<E_N> ... webkit doesn't do things we want it to do, had third-rate SVG support
<E_N> ... but great CSS support
<E_N> ... to retool engineers one must show a clear benefit - Opera's assessment of webkit (for some people it works, for some it doesn't)
<E_N> ... competition is a good thing
<E_N> ... single standards reduce competition in some instances
<E_N> ... especially in browser world
<E_N> ... when the market shifted there was competition. Having one core browser is not a smart way to go.
<E_N> ... competition is important
<E_N> ... people can choose to implement as they see fit
<E_N> ... everyone to their own
<E_N> ... to suit their own software ecology
<E_N> ... and needs of internal and external users
<E_N> ... customers don't want to rely on webkit (it could be shown to be unsatisfactory in the future)
<E_N> ... suppliers disappear ( the future is unknown)
s/to rely on/to be forced to rely on/
Alex: Mozilla thinks it is
important to have difference and choice, as a core value.
... And in fact there are a lot of different branched Webkits out there.
<E_N> my pleasure, it was fun in a paint balling kind of way ;)
Alex: It isn't a browser, it is a
core rendering engine. A key part of a browser, but just one
... We gladly took other components, because they fit in well.
Josef: To go back to the point
from this morning, about whether there should be a tag that
identifies things that have been translated.
... there are multiple stakeholders that should be involved (e.g. in that case it didn't matter to the browser makers, but did matter to other stakeholder)
... There are now a lot of spiders using the web for machine learning to improve translation systems.
... would be interesting to identify content that has been machine-translated, so spiders can exclude that from learning because they assume it isn't that good...
PeterC: The browsers aren't impacted by the tag for things that were translated, so aren't the right people to be defining it.
??: We all have cellphones and PDAs. Power plugs aren't standardised, and there has been a lot of talk. But imagine if they had been standardised a decade ago? We would have huge horrible things. Standardisation can hold back innovation as well, so you have to be aware of the cost as well as the benefit.
i/PeterC/FS: Note that the European Commission is here... and interested in fostering innovation
Christian: Please come to the cocktail reception and keep talking.