MLW workshop, PISA -- 04 Apr 2011

This is the raw scribe log for the sessions on day one of the MultilingualWeb workshop in Pisa. The log has not undergone careful post-editing and may contain errors or omissions. It should be read with that in mind. It constitutes the best efforts of the scribes to capture the gist of the talks and discussions that followed, in real time. IRC is used not only to capture notes on the talks, but can be followed in real time by remote participants, or participants with accessibility problems. People following IRC can also add contributions to the flow of text themselves.

Welcome

Richard introduces the project and the workshop

<luke> 2nd of 4 MultilingualWeb conferences

<luke> Goal is to facilitate cross-pollination across different areas, so don't tune out if it's not your specialty!

Domenico Laforenza, "The Italian approach to Internationalized Domain Names"

Domenico describes the mechanisms behind IDN, domain names in general, the usage of the internet

Domenico describes what is possible with IDN, compared to domain names in general

Domenico describes how the punycode translation helps to use IDN, while keeping the underlying domain name system as is

Oreste Signore, "Web for all"

<chaals> [webfonts is actually really important for some places ... ]

oreste is showing various areas that need more work to create "a web for all", e.g. in the area of accessibility, multilinguality etc.

oreste describes wcag 2.0

oreste: issues of multilingual web: encoding, colors, navigation, ...

oreste describes the role of W3C offices, translations, W3C I18N Activity etc. as important means to push the multilingual web

Kimmo Rossi, "Welcome message"

Kimmo: I am project officer for mlw project
... I am very happy about the enthusiasm in this project. It is very small in terms of budget, but it is very successful
... mlw has also been very successful in using social media
... looking forward to seeing the next steps, including the review which is coming up
... mlw has been a wonderful forum for gathering new ideas, to understand how much fragmentation still exists
... now it is time to become operational, to start to put ideas into practice
... I expect that this project will come up with good recommendations: what needs to be done, why, who could do it?
... we have to create operational working links to other European projects
... mid 2015 we will have about 50 ongoing projects in the area of multilingual technologies
... we started creating these links, i.e. we have speakers from several European projects
... please look into these other initiatives and see what we can do together
... we started funding language technology 2 years ago - we are reaching a plateau
... we just evaluated 90 proposals, asking 240 mill. Euros, we only have 50 mill. Euros
... we can only select one of five projects
... there is still one more call coming up for SME: 35 mill. Euro for sharing data / language resources
... there are still three weeks to put in a proposal
... once SME call is over, we will have about 50 projects
... we spent 150.000 Euros to fund a survey, interviewing many people in European states
... asking about language use while being online
... results will soon be public on our web site and the Eurobarometer web site
... results are that use of other languages is mostly passive
... when people write and engage in social networking, they prefer to use their own language
... 44% said: they are missing important information because they don't understand the language used
... thank you, have a successful conference

Ralf Steinberger, "Complementarity of information found in media reports across different countries and languages"

ralf: talking about attempts to give access to information across languages
... monitoring news in 50 languages

ralf introduces JRC

ralf describes the news sources used for "media monitoring": 100.000 news articles gathered per day, in 50 languages

ralf: articles are converted into rss for further processing

ralf gives examples of news coverage: news is not always available in English, and sometimes more is available in other languages

ralf: we also find co-occurrences: who or what is mentioned with whom or what in different languages?
... also analysing quotation networks: who gets mentioned by whom, also different depending on the language
... recognition of entities (mostly persons) in about 20 languages
... multilingual categorization, using about 1000 categories, using boolean search word operations, optional weights of words, co-occurrence and distance of words, regular expressions for inflection forms (not only morphological)
... multilingual categorization in general and specific for medicine in the MedISys system
... classifying countries and category, e.g. there is 1/2 article about tuberculosis in Czech, but if suddenly it is 5 articles a day, we can issue an alert
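
[Editorial sketch, not from the talk: the alerting idea described above could look roughly like this. The function name, the factor of 2 and the minimum baseline are assumptions chosen for illustration; nothing here reflects how the JRC system actually computes alerts.]

```python
from statistics import mean


def should_alert(history, today, factor=2.0, min_baseline=1.0):
    """Return True when today's article count is well above the recent average.

    history      -- daily article counts for one (country, category) pair
    today        -- today's count for the same pair
    factor       -- hypothetical multiplier that defines "suddenly more"
    min_baseline -- floor so that very quiet topics do not alert on one article
    """
    baseline = max(mean(history), min_baseline) if history else min_baseline
    return today >= factor * baseline


# Usually one or two articles a day, then five on the same day.
print(should_alert([1, 2, 1, 2, 1], 5))
print(should_alert([1, 2, 1, 2, 1], 2))
```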

ralf introducing news explorer - multilingual news daily overview

ralf: application about multilingual template filling - NEXUS, extracting structured information about events
... focusing on conflicts, crimes, disasters, ...
... want to know if there is a disaster with the need to send aid etc.

ralf: summarizing: have demonstrated our EMM system, technologies being used, application scenarios
... modest attempts to get access across languages, but users appreciate it and it shows that the Web is not only for English

Welcome session, Q&A

<Zakim> chaals, you wanted to ask about how users will distinguish papa.it and papá.it

domenico: punycode translation of papa.it and papá.it is different, so sure, yes
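
[Editorial sketch, not from the session: the point that the two names map to different ASCII forms can be checked with Python's built-in "idna" codec. That codec follows the older IDNA 2003 rules, so treat it as an illustration rather than what a registry would run; the helper names are made up for this example.]

```python
def to_ascii(domain):
    """Convert a (possibly internationalized) domain name to its ASCII form."""
    return domain.encode("idna").decode("ascii")


def to_unicode(domain):
    """Convert an ASCII (punycode) domain name back to Unicode."""
    return domain.encode("ascii").decode("idna")


plain = to_ascii("papa.it")
accented = to_ascii("papá.it")

# The accented name gets an "xn--" label, so the DNS sees two distinct names.
print(plain, accented, plain != accented)
print(to_unicode(accented))
```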

XYZ: question about nexus: if one newspaper says "person X is a freedom fighter" and another says "person X is a terrorist", how do you deal with this?

ralf: there is political analysis being done, but categorization like the above is normally not being done
... system is publicly accessible via our home page

Developers

Adriane Rinsche opens Developer session

Steven Pemberton, "Multilingual forms and applications"

Steven talks about HTTP content negotiation

Steven shows some examples of content negotiation

Steven talks about the possibility of providing better 404 error pages

scribe: and 406 pages
... some servers like www.google.com ignore content negotiation headers
... and try to guess your location based on your IP address
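
[Editorial sketch, not from the talk: server-side language negotiation based on the Accept-Language header could be approximated as below. The matching is a simplified prefix lookup, not a full implementation of the HTTP or language-range matching rules, and the function names are assumptions. A result of None corresponds to the case where a server might answer with a 406 page.]

```python
def parse_accept_language(header):
    """Parse an Accept-Language value into (tag, q) pairs, best first."""
    ranges = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        q = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat a malformed weight as "not acceptable"
        ranges.append((tag.strip().lower(), q))
    # sorted() is stable, so equal weights keep the order given by the user
    return sorted(ranges, key=lambda item: -item[1])


def negotiate(header, available):
    """Pick the best available language, or None if nothing is acceptable."""
    for tag, q in parse_accept_language(header):
        if q <= 0:
            continue
        for candidate in available:
            c = candidate.lower()
            if (tag == "*" or c == tag
                    or c.startswith(tag + "-") or tag.startswith(c + "-")):
                return candidate
    return None


print(negotiate("nl, en;q=0.8", ["en", "fr"]))
print(negotiate("ja", ["en", "fr"]))
```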

<Tomas> Most do. The general problem is Multilingual Web Sites (MWS).

scribe: another approach is to have button for changing language on the web page itself
... some sites even use Javascript to change content inside the page

After summarizing some bad practices in serving multilingual websites Steven now introduces XForms

XForms separate data and presentation. Steven shows this using the example of a simple form

scribe: XForms can contain calculations
... controls are abstract and can get different styling easily
... it's possible to use different datasources

Steven shows form which can dynamically change labels for form fields based on the selected language for the form

scribe: XForms use a declarative approach, which requires much less work to produce
... conclusion - XForms make it possible to use "language stylesheets" to create multilingual forms, even if this wasn't an original goal of XForms
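
[Editorial sketch, not from the talk and not XForms: the "language stylesheet" idea, where the form is defined once and the labels come from a separate per-language resource, can be imitated in a few lines of Python. Field names, labels and the fallback language are invented for illustration.]

```python
# The form structure is defined once, independent of any language.
FORM_FIELDS = ["name", "email"]

# Labels live in a separate per-language table (the "language stylesheet").
LABELS = {
    "en": {"name": "Name", "email": "E-mail"},
    "nl": {"name": "Naam", "email": "E-mail"},
}


def render_form(lang, fallback="en"):
    """Return one text line per field, labelled in the requested language."""
    labels = LABELS.get(lang, LABELS[fallback])
    return [f"{labels[field]}: [{field}]" for field in FORM_FIELDS]


print(render_form("nl"))
```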

<Tomas> It is in my presentation this afternoon. An overview http://dragoman.org/mws-india.html

Charles McCathieNevile (for Marcos Caceres), "Lessons from standardizing i18n aspects of packaged web applications"

Chaals introduces Widgets technology

scribe: history of Widgets development and standardization in W3C
... Widgets are now split into 7 specifications

Chaals shows source of simple Widget

scribe: describes l10n features of Widgets
... Widgets use xml:lang, and for larger resources a separate language-specific directory can be used
... Widgets do not use ITS because namespaces are too hard for some web developers; instead a few specific attributes and elements were adopted (span, dir, xml:lang)
... Opera extensions are based on Widgets
... l10n is hard, you should get advice and do proper test
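
[Editorial sketch, not from the talk: the language-specific directories mentioned above could be resolved with a fallback from specific to general language tags. This assumes a package layout with a "locales/" folder containing one subfolder per lowercase language tag, which is the scribe-independent reading of the Widgets packaging approach; the function names are hypothetical.]

```python
def localized_candidates(path, user_locales):
    """List the places to look for `path`, most specific locale first."""
    candidates = []
    for locale in user_locales:
        parts = locale.lower().split("-")
        while parts:
            candidate = f"locales/{'-'.join(parts)}/{path}"
            if candidate not in candidates:
                candidates.append(candidate)
            parts.pop()  # en-us -> en
    candidates.append(path)  # unlocalized default at the package root
    return candidates


def resolve(path, user_locales, package_files):
    """Return the first candidate that exists in the package, else None."""
    for candidate in localized_candidates(path, user_locales):
        if candidate in package_files:
            return candidate
    return None


print(localized_candidates("index.html", ["en-US", "fr"]))
```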

Richard Ishida, "HTML5 proposed markup changes related to internationalization"

Richard tries to explain what HTML5 means

scribe: Richard will talk only about HTML5 specification
... not about related things like CSS3, new Javascript APIs, ...
... HTML5 endorses utf-8 encoding
... simplified encoding declaration <meta charset=utf-8>
... polyglot documents are both XML and HTML5 (HTML syntax) documents, use utf-8, no XML declaration

<Steven> Actually, XHTML 1.0 had the same thing, but didn't call it "Polyglot"

<Steven> But it was addressing the same problem

scribe: charset attribute was removed from link and a elements
... language declaration can use lang attribute or content-language HTTP header
... content-language can contain more than one language
... content-language was just recently removed from HTML5 draft

Richard now explains Ruby

<chaals> [Ruby was very common in western medieval texts, where greek, latin, hebrew etc would be mixed. E.g. religious texts, and scholarly documents]

<Steven> Yes, Chaals, it is very useful for other things than Ruby; pity they called it Ruby mark up, since it is more than that

scribe: HTML5 has support for Ruby, but uses slightly different markup than XHTML 1.1 or ITS (missing rb element for base text)
... Bidi support
... HTML5 adds bdi element for bidi isolation
... dir="auto" allows run-time decision about directionality
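
[Editorial sketch, not from the talk: the run-time decision behind dir="auto" is based on the first strongly directional character in the text. A rough Python equivalent is shown below; it ignores details of the HTML5 algorithm such as skipping nested isolated content, and the function name is made up.]

```python
import unicodedata


def first_strong_direction(text, default="ltr"):
    """Guess text direction from the first strongly directional character."""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi == "L":            # strong left-to-right (e.g. Latin)
            return "ltr"
        if bidi in ("R", "AL"):    # strong right-to-left (Hebrew, Arabic)
            return "rtl"
    return default                 # digits, spaces, punctuation only


print(first_strong_direction("שלום world"))
print(first_strong_direction("hello שלום"))
```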

<Steven> I sent a last call comment to the ruby WG, saying they should call it something more generic, but they declined "because Microsoft had already implemented it"

scribe: Richard invites all to get involved in spec development

Gunnar Bittersmann, "Internationalization (or the lack of it) in current browsers"

Gunnar talks about some problems in the HTML5

scribe: validation of email input type field is too restrictive in spec - doesn't support IDN
... each browser provides different UI for changing preferred language
... some browsers have bugs in this

<Steven> Some browsers have bugs, but some do it completely wrong :-)

scribe: language negotiation is missing some features
... how to label original and translation
... how to label human and machine translation

Jochen Leidner, "What's Next in Multilinguality, Web News & Social Media Standardization?"

Jochen shows mind map of presentation

scribe: presents details about Thomson Reuters company
... customers require high quality
... combination of human and automatic methods is in use
... XML and Unicode is heavily used
... main issue is not lack of standards but developer education
... i18n and l10n is not a part of curriculum
... new challenges are support for multimedia content

... some content is hidden (Facebook, Twitter, ...)
... proposes more open twitter-like messaging system with better support for i18n
... it might be useful to have an HTML tag saying that some page is a translation of a different page

Developers session, Q&A

Question from Google: Defends current state of affair regarding language selection. Asks whether easier UI will help?

Chaals: Interface should be easier to use, most users don't set their language
... content should contain as much metadata as possible to inform about alternative versions of content

Richard: mentions some extension that allows easier change of preferred language

Question from Olaf: What is the chance of implementing some notation for marking a document as being in the original language?

Chaals: There are many notations starting from simple rel= going to RDF
... you should use it, browsers will support what is used on the pages visited by users
... you should talk to producers of content creation tools

Richard: you should be more involved, create proposals, ...

Felix Sasaki: It's possible to introduce new language subtag for this

<fsasaki> .. use the ietf-languages list to discuss this with the people reviewing such proposals

Creators

Felix: Welcome to afternoon session

Dag Schmidtke, "Office.com 2010: Re-engineering for Global reach and local touch"

Dag: 37 langs, 51 markets
... some countries have more than one language (eg Belgium, Canada)
... adding value to Office
... content, templates, also sell Office
... campaigns in different markets at different times
... market specific engagement
... Recent migration, site management and authoring from XMetal to Word
... and using sharepoint instead of a custom publishing system
... we did extend Word to support this
... allows federated authoring
... helps with localization
... Lessons from this migration
... internationalisation was a key stakeholder
... designed for scale
... it was quite an effort, next time we won't do everything at once
... 100s of thousands of help documents for at least the last three releases
... content heavy
... complexity wasn't where we expected, and was more complex than we expected
... General lessons from the site
... Serve all global market needs, English is just another language
... scale up *and* down
... design for growth

[gives example of content originating in Japan, and translated to other languages]

Dag: No character formatting, only character styles
... We have an XML format for translation
... Local touch
... deliver right experience to each market

[examples]

Dag: Customer connection
... feedback, evaluation, SEO

[examples from site]

Dag: Continuous updates
... respond to regional events, A/B testing
... use some machine translation
... Future trends
... moving to the cloud
... multilingual multimedia
... language automation
... interoperability with standards
... Conclusions
... It is possible to design for scale and local relevance

Jirka Kosek, "Using ITS in the common content formats"

Jirka: tag set designed to help with translations
... usable with any XML vocabulary

[example of use]

Jirka: Allows automatic software to see what should not be translated, as well as human translators
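
[Editorial sketch, not from the talk: a tool could use the ITS local "translate" attribute to skip non-translatable content roughly as follows. The sample document is invented, only the local its:translate attribute is handled (no global rules), and the function name is hypothetical.]

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"
TRANSLATE = f"{{{ITS_NS}}}translate"

SAMPLE = """<doc xmlns:its="http://www.w3.org/2005/11/its">
  <p>Press the <code its:translate="no">Enter</code> key.</p>
  <p its:translate="no">ACME-1234</p>
</doc>"""


def translatable_text(elem, inherited=True):
    """Collect text chunks that are not excluded by its:translate="no"."""
    flag = elem.get(TRANSLATE)
    translate = inherited if flag is None else (flag == "yes")
    chunks = []
    if translate and elem.text and elem.text.strip():
        chunks.append(elem.text.strip())
    for child in elem:
        chunks.extend(translatable_text(child, translate))
        # Text after a child belongs to the current element, not the child.
        if translate and child.tail and child.tail.strip():
            chunks.append(child.tail.strip())
    return chunks


print(translatable_text(ET.fromstring(SAMPLE)))
```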

<chaals> [As Jirka said, you don't have to use the actual ITS namespace to use the ITS pieces - and the decision for widgets was indeed to do that]

Jirka: Now to look at formats that support ITS
... first DocBook

[example]

Jirka: Next format, DITA
... for topic-based documentation
... DITA doesn't natively support ITS
... can be added
... Now OOXML
... Open Office, and even for MS Office 2007+
... no native support, but can be added

<jan> Office Open XML is a MS developed standard, not Open Office... ;-)

Jirka: ODF is similar
... XHTML allows use of ITS
... HTML5 has no extension points to allow ITS
... what is to be done?
... HTML5 needs to be augmented to support ITS

Dag: MS translator does support something similar

Steven: If XHTML5 supports it, why not just say "Use XML serialization if you want this facility"?

Jirka: Not sure if people can produce well-formed XML

<Jirka> Slides from my presentation http://www.kosek.cz/xml/2011mlwpisa/

Charles McCathieNevile (for Manuel Tomas Carrasco Benitez), "Standards for multilingual websites"

Chaals: What standards should be developed?
... there are lots of multilingual sites. Substantial problems

<Tomas> I am here ... just in case

Chaals: principles - don't break existing stuff
... expect it to take time
... two sides of coin: users and webmasters

<Tomas> Slides - http://dragoman.org/pisa/carrasco-mw-pisa.pdf

Chaals: But it is often less clear-cut
... Currently - no consistent user interface for a ML website.
... this should be fixed
... No standards for multilingual content production
... this should be fixed

<Tomas> No standards for content production - in general - not a particular problem to MWS

Chaals: Most users are monolingual

<Tomas> One needs hard data

Chaals: Webmasters must manage multilingual system
... users don't want more complexity
... webmasters aren't necessarily experts in this stuff
... interfaces for content from the user side are well-established
... not so for webmasters
... Some ideas - language button in the browser
... use HTTP header fields maybe
... content negotiation

<Tomas> Another good "high level" variant is memento http://www.mementoweb.org

Chaals: reserved URIs
... I am not sure if reserved URIs are a good idea
... It should be possible to request a translation
... there's an Opera extension for that

<Tomas> A reserved URI is very good as one can have all the pages in the MWS with the same URI pointing to the variants

<Tomas> maintaining pages with different URIs for the variants is very hard

Chaals: need a metaresource concept

<Tomas> RDF might do it - needs verification

Chaals: Need server-side standards

<scribe> Scribe: RDFa was largest growing web format last year http://rdfa.info/2011/01/26/rdfa-grows/

Chaals: Next step? Working group maybe
... at W3C? Elsewhere?

<Tomas> No WG, not specifications

Chaals: or create a new initiative?
... Need guides for best practice on user and webmaster sides

<Tomas> A tabular view http://dragoman.org/mws-india.html

Sophie Hurst, "Local is global: Effective Multilingual Web Strategies"

Sophie: 90% of HP's customers buy based on content rather than touching product
... 42% of web users are from Asia
... only 13% from USA
... yet English still leading language
... asia has highest usage but low penetration
... therefore it's a growth area
... 10% retail sales in China are done online

<chaals> [My concern with reserved URIs is that it breaks some existing standards and expectations. I think HTTP headers and metadata are better approaches. (I generally hate reserved URIs - they are used in P3P, favicons, robots.txt and a couple of other places, but I don't think they're going to handle the complexity of multilingual websites without creating as many problems as they solve...)]

Sophie: How to represent brand consistently, locally
... how to make it relevant

<chaals> [I certainly think that being able to get the information about available variants is really important]

Sophie: how to manage translation
... First is to use component based system

<Jirka> chaals: yes, but it might be sufficient to have a link/HTTP header pointing to another URL where a manifest listing all possible variants sits, rather than a dozen alternatives in each page -- too much change when a new translation is added

Sophie: synchronisation between components is then easy to manage
... allows local components, but global style
... eg Emirates site
... Use positioning information to personalise information
... example, Lux brand which is up-market in India, but not elsewhere
... need local input to ensure local nuances are working
... users come with cultural layers as well
... cultures vary in many dimensions
... Finally, managing content
... need a well-managed process

<Tomas> [The browser side is much better, but we have to care for the server side. This is the question: how to implement the server side. Separate function from the mechanism: we can explore different mechanisms. One fixed reserved URI for the whole server combined with the Referer header will certainly resolve a big problem (different URIs for each page).]

Sophie: can be automated to large extent (the management, not the translation)

[shows an example process]

Sophie: In conclusion, translation must be part of a larger picture
... use component, geo-positioning, and translation management

Creators session Q&A

<Tomas> Question of scope: what should be in MWS and what in other specifications for full translation system.

<Tomas> The picture is larger: Authorship, Translation and Publishing Chain

<Tomas> Translation is only part of the whole production chain

Christian Lieske: For Chaals- I got different messages - we've got to do stuff, but Sophie seems to suggest we can already do it.

Chaals: It's not that we can't do it already, but that there is no agreed way to do it

<Tomas> We need to define the different scopes and how the different fields integrate; a MWS is *not* a translation management system.

Chaals: We have no interoperability

<Tomas> You wont another beer !!!

Sophie: Changing solutions is hard, standards could help

<Tomas> We need to identify what is particular to MWS and is general.

Sophie: We should work towards a position where you need fewer developers

<Tomas> Language is just one of the dimensions in TCN; e.g., mementos should be integrated in the same mechanism http://www.mementoweb.org/

Dag: We have a translation tag, but it is not standard, so there is less customer value, in the long run a standard lowers the cost of entry for us

<Tomas> +1 regarding further development of XLIFFpers: one should be able to construct a MWS from Apache out of the box

Tomas Abramovitch: Do you use different CSS for different cultures?

scribe: and how accurate is geo-location?

<Tomas> One could (CSS)

Dag: We componentise our pages, the local part is not done by CSS

Sophie: I can't totally answer the geo-loc part.

Chaals: It is a spectrum from identifying one seat in an audience to just someone in a country

<Tomas> One could generate some pages: "5.3. Generating language in parallel" in http://dragoman.org/mws/oamws.pdf

Ian Truscott: identifying people is always a guess until they log in

<Tomas> Or he set his browser preferences

Reinhard: How do we learn from research? No one has mentioned this
... different people like different things
... 16 year olds in China have more in common with 16 year olds in the USA than with their parents
... all I've heard is corporate policy. Why not let the user decide?

<Tomas> A user wants the page in his language

Sophie: Crowd sourcing is an option

<Tomas> Choosing is already a hurdle

<Tomas> We need to look at all the available mechanisms and decide on a recommendation: "4.4. Options" in http://dragoman.org/mws/oamws.pdf

Dag: There are areas where our interest and the users' coincide
... but we can't do translation on demand
... they pay for premium product

<chaals> [It isn't always a guess identifying the user until they log in. In fact, technically it is often easy to identify users anyway - this is why we have laws to protect privacy and limit the things done to make it easy]

Steven: A good example of Reinhard's point is websites that conflate region with language. I often don't know which question they are asking.
... and I don't believe that most people are monolingual. There are 6000 languages, and 150 countries. Most people are at least bilingual

[scribe's computer is nearly out of battery]

<Tomas> [we need to identify what the user wants, not who he is]

Reinhard: Crowdsourcing translation is often not possible because of copyright issues

Olaf: We need the possibility to offer translations of parts of sites
... it works on wikipedia

<Tomas> Monolingual user: we need hard data; but circumstantial data suggest that the requirement of most users is monolingual.

Olaf: microsoft needs to open its translation tools

Chaals: I use crowdsourced translation of Norwegian law
... it is easy to do, but by and large it doesn't happen
... too little reward

<Tomas> Translation integration in MWS: a language not available could be defined as a "language potentially available" (after translation). One needs a mechanism covering all the aspects of the different translation techniques: human (professional, crowd), machine (fast as RBMT or slow as SMT).

<Tomas> For the whole enchilada: "Open architecture for multilingual parallel texts" http://arxiv.org/ftp/arxiv/papers/0808/0808.3889.pdf

Localizers

Christian Lieske et al., "The Bricks to Build Tomorrow's Translation Technologies and Processes"

christian: five areas show that there is a need for change:
... demand for language related services, shortcomings of today's translation-related standards, ...
... why talking about standards: demand & lack of interoperability
... lack of interoperability e.g. for XLIFF
... things break down across tool chains
... standards in localization area are sometimes not compatible
... example of phrases in TMX vs phrases in XLIFF
... not a lot of work in localization standardization on integrating new web technologies
... e.g. aspect of RESTful services, use of related protocols (odata, gdata) for translation related services
... these problems have led to implementation challenges, problems for standards that are already here
... how to solve the problems: four areas of requirements, methodology, compliance, stewardship are important
... requirements: identify processing areas related to language processing - and keep them separated
... determine the entities that are needed in each area
... chart technology options and needs
... etc. Next: methodology:
... distinguish between models and implementation / serialization
... distinguish between entities without context and entities with business / processing context
... set up rules to transform data models into syntax
... set up flexible registries, e.g. CLDR, IANA
... provide migration paths / mapping mechanisms for legacy data
... third, compliance: e.g. what does "support for standard X" mean?
... finally, stewardship: driving, supporting standardization activity
... anyone who shouts for small standards should be willing to invest
... EC has a track record, see e.g. mlw project
... make donations / contributions easy
... discourage fragmentation and unclear roles
... LISA no longer exists, now there is a kind of competition over who could follow in its footsteps
... my fear is that another organization is being created; my and probably Felix' and Yves' thought is that this should be avoided

David Filip on "Multilingual transformations on the web via XLIFF current and via XLIFF next"

David: christian has covered a lot for XLIFF 2.0 - what do I want to cover?

david: my main statements: metadata must survive language transformations, content metadata must be designed upfront with the transformation process in mind, XLIFF is the principal vehicle for critical metadata throughout multilingual transformations
... and finally: next generation XLIFF standard is an exciting work in progress in OASIS TC
... about preserving metadata: there are various transformations: g11n, i18n, l10n, t9n ("GILT")
... transformation modi: manual, automated, assisted
... transformation types: MT, human translation, postediting, stylistic review, tagging (semantic, subject matter review, transcribing), subtitling, ...
... growing number of source languages
... what metadata is necessary?
... preview and context are critical
... argue for creating standardized XSLT artefacts for preview
... metadata for legally conscious sharing (ownership, licensing, ...)
... grammatical, syntactic, morphological and lexical metadata
... example of m4loc project: they developed an XLIFF middleware to ensure interop between localization open source tool and moses MT tool
... tagging of culturally and legally targeted information
... home for LT standardization? Leverage BP of existing loc standards (XLIFF, TBX, SRX, ...) - pointing into the past (OASIS, LISA)
... now: leverage OASIS XLIFF, ISO TC37, Unicode SRX and GMX
... further development of W3C ITS and RDF, create conscious standardization including RDF and XLIFF
... OASIS is home of XLIFF, but has also UBL and XBL as its home
... W3C has ITS and RDF modeling, Unicode - see above
... ISO TC 37, important not for standards creation but for secondary publishing
... why XLIFF?, and why 2.0? see also presentation from christian
... good progress of XLIFF in 2011 possible, as SWOT analysis shows
... prediction: 2011 will see definition of new features, in 2012 new standard

Sven C. Andrä, "Interoperability Now! A pragmatic approach to interoperability in language technology"

sven: kilgray, "we localize", Andrä, biobloom are behind the "interoperability now!" initiative
... translation (technology) industry is a niche industry
... very few computer scientists here, not a technical, but experience driven industry
... industry is getting more and more important, including technology
... hence interop is getting more important
... there are enough standards here, but they are complex, not many have reference implementations
... and there is little exchange within tool providers

table of features in XLIFF that are supported by all tools - only two features (from about 50?) are supported by all tools

sven: we want lossless data exchange in a mixed (tool) environment
... standards are important, also develo
... but mindset is most important, i.e. about the lossless data exchange
... basis of our work: "interoperability manifesto"
... pushing standards over the edge, give feedback to the TC
... modules that we are working on: about content, package, transportation
... content is modified xliff
... package is currently just made up
... for transportation we are using regular web services
... basic approach: disclose our concepts
... reference implementations are open source
... early real life usage
... test scenarios to verify compliance
... theoretical aspect: agile vs. standard?
... would be good to have a framework for organizations like W3C that could help us to bring this into standardization step by step
... benefits of this approach: it is a limited time that we are working on this

David Grunwald, "Website translation using post-edited machine translation and crowdsourcing"

david: our vision: have a box that creates quality content very quickly and cheaply
... using MT, we want an efficient solution that will make mlw a reality
... need to develop MT which is good for blog publishing
... MT will never be ready "as is" for human quality translation
... we developed a system for cheap and quick post editing
... currently, explosion of content, lots of it is local because of language barriers
... translation costs are very high
... we are targeting open source CMS platforms
... 20 % of web sites are published on such platforms
... we could offer a good translation solution to these
... large media publishers who use open source CMS
... wordpress, movable type are created for all kinds of web sites, not only blogs
... our solution: based on MT; human post editing, and crowd sourcing
... crowdsourcing startups in many regions
... our solution: not automated open source CMS solution for small guys
... no automated tools for post editing / MT either
... our solution uses data from blogs that is available on the web
... workflow: user installs WordPress, MT is done, email notification is sent to crowdsourcing translators, integrated after review by a moderator
... interested in opportunities for funding this kind of work

Pål Nes, "Challenges in Crowd-sourcing"

Pål: opera has been using crowd sourcing for a long time

Pål: caveat of crowd sourcing: it is not free, organizing it is difficult
... e.g. employing managers for the crowd
... should only be used for certain tasks
... not for time critical tasks
... mostly students are participating, picked up from university talks
... large crowd is not necessarily a good crowd
... better 3,4,5 good translators, than 50 translators doing nothing
... e.g. press releases, marketing material are not well suited for crowd translations
... good for crowd sourcing: applications (web site "my opera", "opera com"), with a stable set of text
... and documentation, that is easy to maintain
... start small, put your crowd under embargo / NDA
... try building up a hierarchy
... be careful with your branding
... and your terminology
... for opera we used XLIFF - we used our own, incompatible version of XLIFF
... discovered that open source is not open standard
... tools we used: gettext and po4a, transifex, translate toolkit with pootle and virtaal, homebrew applications to bridge the vast gaps
... XLIFF is a minefield, in the current version
... about html: keep it as simple as possible, semantic markup is key
... write proper CSS - write a separate RTL - stylesheet to negate RTL-challenged CSS

Eliott Nedas, "Flexibility and robustness: The cloud, standards, web services and the hybrid future of translation technology"

eliott: everything that was said from David, Christian etc. in this session about interoperability was right, I concur with them
... we need standards because of interdependence.
... the demise of LISA. Sad that they are gone, but opportunity to look into this in a new way
... LISA standards are important
... now is a good opportunity for a new model of standardization
... new kids on the block: TAUS and Gala

eliott: currently lots of different technologies
... and many different standards
... OAXAL is a solution that brings these together
... that can be used for free

description of various aspects of standards and applications built on top of it

eliott: how to spread the message: important e.g. in academic curricula

Manuel Herranz, "Open Standards in Machine Translation"

manuel: presentation about PangeaMT project
... translation is something that you have to go through for achieving what you want
... web people expect immediate translation

manuel: why don't we have immediate translations?
... introducing pangeanic: LSP, major clients in Asia and Europe
... we wanted to provide faster service for translation
... became founding member of TAUS
... four years ago created relation with computer science institute in valencia
... challenge at that time: turn academic development (moses) into a commercial application
... limitations: plain text, language model building (first), no recording, no update feature, data availability, ...
... objectives: provide high quality MT for post editing
... and to use only open standards: XLIFF, tmx, xml
... built an TMX - XLIFF workflow
... not to be locked into a solution
... PangeaMT system: comes as TMX or as XLIFF
... TMX should not die, people are still using it
... future work: on the fly MT training
... pick and match sets of data
... objective stats for post-editors
... confidence scores for users

Localizers session Q&A

reinhard: thank you, was a great session
... about remarks on crowd sourcing: there was emphasis on crowd sourcing for enterprise
... this does not go well together
... other people like rosetta foundations, translators without borders etc. have had good experiences

Pål: crowd sourcing was good for us

scribe: it just took us a lot of effort and time to get there

jörg: there is some similarity: you have to train translators, otherwise you won't get the good results in medical translation

felix: one comment on interop now, it is very important to go into a standards body as a next step

sven: thanks, we will definitely try to do that

richard: w3c just created business groups / community groups, that might be a thing for you to look into

david: about what reinhard said
... if your expectation is high you will be disappointed, but the business case is in the future
