W3C

MLW workshop

21 Sep 2011

Agenda

See also: IRC log

Attendees

Chair
Richard Ishida
Scribe
Felix Sasaki, Christian Lieske, Jirka Kosek, Dag Schmidtke, Charles McCathieNevile

This is the raw scribe log for the sessions on day one of the MultilingualWeb workshop in Limerick. The log has not undergone careful post-editing and may contain errors or omissions; it should be read with that in mind. It represents the best efforts of the scribes to capture, in real time, the gist of the talks and the discussions that followed. IRC is used not only to capture notes on the talks; it can also be followed in real time by remote participants or participants with accessibility needs, and people following IRC can add their own contributions to the flow of text.

See also the log for the second day.

Contents


Welcome

welcome address from Kieran Hodnett

richard introduces the project and the workshop

... many thanks to the sponsors lionbridge and meta-net

... if you create resources (blog entries, photos, etc.) related to the workshop, please use the tag mlwlim

richard describes the setup of the 2nd day: breakout sessions

... idea is to get more feedback from you on standards and best practices for the mlw

Daniel Glazman, "Babel 2012 on the Web"

daniel: co-chair of CSS WG. In HTML WG; also working on editing software more than 20 years

quote from the CSS working group (1998): "is it really important to support boustrophedon or Mongolian in CSS?"

... "since many countries use characters which are not part of ASCII, the default character-set for modern browsers iso-8859-1" (from w3cschool web site)

daniel: above is totally obsolete

.. top 10 languages on the web. june 2010: English still dominant

.. but most other languages use different scripts, different writing directions etc.

... technological bits of the mlw: utf-8, MIME, IRIs, Accept-language, HTML5, CSS3, xml:lang, ...

... what is on the radar today in CSS:

... screenshot of the business card from richard ishida

<r12a> business card: http://www.flickr.com/photos/ishida/4462733374/

... (no vertical text, you should add it ;)

... card is done with HTML

... card is great, a good example of what we want to achieve

... daniel: HTML5 charset

... original charset for HTML5 is utf-8

... authoring tools should use utf-8 only

... recently got files that were not in utf-8, was quite difficult to handle. Please don't create files anymore that are not utf-8!
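
[ed. illustrative sketch, not from the slides: declaring UTF-8 in an HTML5 document; the file itself must also be saved as UTF-8]

  <!DOCTYPE html>
  <html lang="en">
    <head>
      <meta charset="utf-8">
      <title>Example page</title>
    </head>
    <body>
      <p>Content in any script: ελληνικά, العربية, 日本語.</p>
    </body>
  </html>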

... language tagging. There is still xml:lang and lang attribute available.

... authoring tools rarely set the language or even offer the user interface for it

... we need only one attribute for language ("lang"), and not xml:lang. Everyone should recognize "lang"

... links: the hreflang attribute can target only one language

... if you want to add a link to multiple languages, it is not possible
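
[ed. illustrative sketch, not from the slides: the lang attribute on content and hreflang on a link; hreflang can name only a single language for the link target]

  <p lang="de">Ein Absatz auf Deutsch mit einem <span lang="en">English</span> Wort.</p>
  <!-- hreflang takes exactly one language tag -->
  <a href="/about" hreflang="fr" lang="fr">À propos</a>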

... about direction: not enough input on how to realize that in HTML5, need more input from the various communities

<Jirka> XSL-FO already has a property for this: http://www.w3.org/TR/xsl/#writing-mode

... if you want to put data in a text area in multiple writing scripts, it is still not possible
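
[ed. illustrative sketch, not from the slides: direction markup in HTML5; dir="auto" lets the browser guess the direction of user input, but mixed scripts inside one text area remain a problem]

  <p dir="rtl" lang="ar">نص عربي يُعرض من اليمين إلى اليسار</p>
  <form>
    <textarea name="comment" dir="auto"></textarea>
  </form>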

... forms: date and calendar issues, issues with time zones, "what's a name?", ...

... javascript - poor localization. user interaction entirely based on UA's language and direction

... Node.js is spreading; you don't write PHP anymore. If your website is not localizable, all these issues will move to the server side. It will be a mess

... DOM - charset an issue.

... PHP has issues

... question - what language has good practice for internationalization?

... no

... for people working on programming languages it is out of scope

... they say it should be done in the framework

.. there is no widespread programming language with good i18n / l10n features

... CSS3 writing mode

... example of japanese vertical text

... writing mode has to work for all languages and mixtures of directions

... there is one screenshot from the css writing modes spec containing Mongolian. So we are going to do what has been missed in the past
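
[ed. illustrative sketch, not from the slides: vertical text with the CSS3 Writing Modes draft; property names and browser support were still in flux in 2011, hence the vendor prefix]

  <style>
    .tategaki {
      -webkit-writing-mode: vertical-rl;  /* prefixed form for early implementations */
      writing-mode: vertical-rl;          /* lines top-to-bottom, columns right-to-left */
    }
  </style>
  <p class="tategaki" lang="ja">日本語の縦書きの例</p>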

... now css3 text

... hyphenation is based on dictionaries

... very complex since it is language dependent

... in css3 text we have emphasis marks, good for highlighting in asian languages
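
[ed. illustrative sketch, not from the slides: East Asian emphasis marks from the CSS3 text drafts, used instead of italics for highlighting]

  <style>
    em:lang(ja), em:lang(zh) {
      font-style: normal;
      text-emphasis: filled sesame;  /* draws a small mark next to each character */
    }
  </style>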

... css3 columns

.. being requested by newspapers on the web

.. implemented by many browsers already
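
[ed. illustrative sketch, not from the slides: CSS3 multi-column layout; vendor prefixes were still needed in 2011]

  <style>
    .newspaper {
      -moz-column-count: 3;
      -webkit-column-count: 3;
      column-count: 3;
      column-gap: 1.5em;
    }
  </style>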

... css3: lists

... list-style-type property extended to dozens of values

... ability to define your own is missing
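
[ed. illustrative sketch, not from the slides: some script-specific list-style-type values; author-defined counter styles were the missing piece at the time]

  <style>
    ol:lang(ja) { list-style-type: hiragana; }
    ol:lang(hy) { list-style-type: armenian; }
    ol:lang(ka) { list-style-type: georgian; }
  </style>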

... css3 box model

... left and right were used all over the place

... css3 fonts

... with language specific display, control of glyph substitution and positioning of east asian text

.. ruby in css3: annotation mechanism mostly used for Japanese
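
[ed. illustrative sketch, not from the slides: ruby annotation markup in HTML, which CSS3 Ruby is meant to style]

  <p lang="ja">
    <ruby>漢字<rp>（</rp><rt>かんじ</rt><rp>）</rp></ruby>を読む。
  </p>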

... epub3 another important area with i18n facilities

... conclusion

... HTML5 plus CSS3 will be the new pivot format for new wysiwyg editors with good i18n

.. massive adoption of epub3 in asia

.. will help multilinguality on the web

q&a

thomas: how many people know BlueGriffon?

http://bluegriffon.org/

daniel: editor that implements a lot of HTML5 / CSS3 features mentioned here

Developers

Tadej introduces sessions and speakers

David Filip, "MultilingualWeb-LT: Meta-data interoperability between Web CMS, Localization tools and Language Technologies at the W3C"

David: explains terminology related to EU projects and l10n
... CSA, W3C, WG, LSP, TM, MT, TMS, CMS, CCMS, OASIS DITA, XLIFF, ...
... explains relation between LT-Web and MLW-LT
... LT-Web is CSA (Coordination and Support Action) funded by EU
... LT-Web members will join W3C
... introduces members of MLW-LT
... 3 main scenarios: (1) Deep Web <-> LSP; (2) Surface Web <-> Real Time MT; (3) Deep Web <-> MT Training
... metadata in question
... data categories based on ITS (translate, localization note, terminology, language)
... additional categories - translation provenance, human post-editing, QA, legal metadata, topic/domain, ...
... input is welcome, work will be open under W3C
... more input is expected tomorrow during break-out session

Christian Lieske, "The journey of the W3C Internationalization Tag Set - current location and possible itinerary"

Christian: makes survey - about 1/3 of audience is already aware of ITS
... ITS is about annotations
... which part of content has to be translated?
... does element "x" split a run of text into two linguistic units?
... shows an example of XML and explains what kind of additional metadata is needed to support translation and correct processing
... explains ITS data categories (ed. for more info see http://www.w3.org/TR/its/#datacategory-description)
... ITS is supported in various tools like SDL Studio, XTM, ...
... support in open-source tools, Okapi framework
... ITS2XLIFF

<fsasaki> link to ITS2XLIFF tool: http://fabday.fh-potsdam.de/~sasaki/its/

Christian: shows support of ITS in various content formats
... HTML5 still doesn't support at least translate data category

<fsasaki> bug to vote for such support: please vote at http://www.w3.org/Bugs/Public/show_bug.cgi?id=12417
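
[ed. illustrative sketch, not from the talk: the kind of markup the bug asks for - a translate flag on HTML content, mirroring the ITS Translate data category]

  <p>Press <span translate="no">Ctrl+S</span> to save the file.</p>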

Christian: suggested enhancements to ITS
... targetPointer, idValue, local elementsWithinText, whitespaces, context, localeSpecificContent, ...
... outlook: further usage scenarios in MLW-LT, possibly ITS 2.0

Gunnar Bittersmann, "CSS & i18n: dos and don'ts when styling multilingual Web sites"

Gunnar: problem - long words (e.g. in German)
... possible solutions: cutting, soft hyphens, hyphenation (will be implemented in browsers in the future)

There are JS libraries for client-side hyphenation: http://code.google.com/p/hyphenator/

Gunnar: current support for hyphenation in browsers is poor
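
[ed. illustrative sketch, not from the talk: CSS3 hyphenation, which depends on a correct lang attribute and still needed vendor prefixes in 2011; &shy; marks a manual soft hyphen as a fallback]

  <style>
    p:lang(de) {
      -webkit-hyphens: auto;
      -moz-hyphens: auto;
      hyphens: auto;
    }
  </style>
  <p lang="de">Donaudampfschifffahrts&shy;gesellschaft</p>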
... problems with layout, when changing directionality
... for RTL language it might be necessary to introduce special rules in CSS
... tips: do not assume that something will fit inside a box
... use text effects appropriate for language
... flip everything for RTL scripts

... use one stylesheet for all languages
... use soft-hyphens for long words or use hyphenation
... CSS3 adds hyphenation, new text emphasis, text-align supports start/end not just left/right
... styling by script
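
[ed. illustrative sketch, not from the talk: keeping one stylesheet and flipping left/right rules for RTL pages; text-align: start/end comes from CSS3 Text]

  <style>
    blockquote { margin-left: 2em; }                               /* LTR default */
    [dir="rtl"] blockquote { margin-left: 0; margin-right: 2em; }  /* flipped for RTL pages */
    p { text-align: start; }  /* start/end instead of hard-coded left/right */
  </style>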

Developers session, Q&A

daniel glazman: Farsi and Arabic use different fonts, you need different CSS rules for them

... similar for various Cyrillic languages
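
[ed. illustrative sketch, not from the discussion: language- and script-specific font rules via the :lang() selector; the font family names are placeholders]

  <style>
    :lang(fa) { font-family: "A Persian Font", serif; }   /* placeholder family name */
    :lang(ar) { font-family: "An Arabic Font", serif; }   /* placeholder family name */
    :lang(sr-Cyrl) { font-family: "A Cyrillic Font", serif; }
  </style>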

Felix Sasaki: There is a gap on the client side, as Daniel Glazman pointed out. MLW-LT should close some of these gaps, focusing on simple definitions like "What is translatable content?" provided by ITS.

Gunnar: Support for script styling independent from language could still be useful.

Q: Question about ITS. Should ITS remain small, simple and lean?

Felix: ITS should be kept small and simple.

Christian: Survey again, now more than 1/3 is aware of ITS

Creators

Charles introduces the session - problems with multilingual content production in three areas

Moritz Hellwig, "CMS and Localisation – Challenges in Multilingual Web Content Management"

The web site Ellviva.de as an example of why multilingual capabilities are needed in a CMS like Drupal

Link to legacy system as trouble spot

Complex workflows and image database as additional challenges when it comes to multilinguality

A web offering does not just need to serve multilingual content

it also needs to be able to offer functionalities related to multilingual content; example: return relevant search results

Integration with translation processes is key for multilingual content on the web

Sample issue: Terms and terminology

To be specific: a "marker" that allows one to say "treat with particular care"

Sample issue: lack of domain information (e.g. for adequate Machine Translation)

To be specific: translation of something like "sake" may not be possible without info on domain

Domains may originate in two dimensions: subject area (e.g. financials), or audience (e.g. medical doctor)

Sample issue: Translation workflow integration

To be specific: meta data needs to be able to travel from a CMS to a Translation Agency

XLIFF possibly is a good choice for encapsulating/packaging data

Suggestions for addressing issues in CMS: a metadata standard, an agreed-on content format like XLIFF

MLW-LT holds the promise to take care of the issues

Customers and Translation Agencies will benefit from improved capabilities in CMS-based Web presences

Danielle Boßlet, "Multilinguality on Health Care Websites - Local Multi-Cultural Challenges"

Looked at 3 Health Care related Web Sites

Analyzed technical aspects and cultural aspects

Example: www.who.int

Shortcomings: content offering differs between languages

Example: Info in Spanish is not as rich as info in English

English 27 links, Spanish 10 links; Spanish: no info on Cancer Control Programme

Special topic "linked content": in Spanish some link texts not translated, some links lead to English content

Example: ec.europa.eu

"Technical" shortcomings: none of the 19 links on the Spanish site lead to Spanish content - all lead to English

The reader is kind of deceived: a lot of content seems to be available in Spanish - but in reality it is not

Additional issue: Site Map and A-Z Function only available for English

Example: German Government Web Site

www.bmg.bund.de

Example: Layout is not consistent across languages

www.bzga.de

Example: ...

Good: There is the information that not all info is available in all languages

Summary

User support (e.g. site map) may not be considered to be part of core localization processes

Links need special attention: has translated text, leads to content in user's language, indicates language of linked content

Lise Bissonnette Janody, "Balance and Compromise: Issues in Content Localization"

Theme: Need to have content strategy up front

Perception of localization for web content

costly, time-consuming, creates complexity, needs special tools

Possible issues with content strategy: no governance, cost, ...

What is useful, usable content?

appropriate, useful, user-centered, clear, consistent, ...

Useful, usable content does not just happen - it requires efforts in several areas

Set objectives, understand what you have, define a plan

What is useful, usable _localized_ content?

applies to local context, addresses market-specific purpose, users understand it easily, terminology and brand requirements upheld, ...

Step 1 towards localized content: set the target

Cultural forces (e.g. language preference), site objectives (e.g. inform), internal forces (e.g. market presence), market forces (e.g. legal and contractual obligations)

Possible specific baseline targets: number of languages, IA model, critical mass of localized content for each tier of site

Step 2: Examine what you have

volume of web pages, volume of associated content assets, speed of change, ..

Governance model: centralized or decentralized language version creation

Tools, time and metrics related to your language-related processes

What toolset do you have (e.g. Translation Memory), latency that is acceptable until localized version is available, ...

Step 3: Make smart localization decisions

Example: Do not just try to translate everything. You may keep, chunk, change, ...

Pay particular attention to what is available locally.

Address localization at level of tiers and locales

Creators session, Q&A

Q (for Danielle): Do web sites have info on what has been translated?

Q (for Danielle): Can providers of the web sites be easily contacted?

Danielle did not get a chance to discuss her findings with the web site providers

Usually, info is only available on the general policy of providing local language versions; it is hard to find out whether specific content really is or will be translated

Comment: Some aspects mentioned (e.g. related to User Assistance) are not just linked to multilingual web sites

Comment: Often, resources are the most challenging dimension for offering multilingual web sites

Q (to everybody): there are many standards available for creating multilingual web sites; to what extent are the standards being applied?

The examples related to links (e.g. translated text) indicated that the standards are not applied too often

A: Many standards are not being applied - yet. There is a time lag. However, as globalization needs increase, more people realize that the standards help to build solutions.

A: Standards are being used - however, there are surprises. Example: the WHO web site does not offer language negotiation, a German web site is not encoded in Unicode

Comment: Transparent content negotiation (a standard from 1998) would help to solve many issues

Q: How can better implementations of multilinguality be "motivated"? How can we make companies that implement properly rich?

<chaals> ... and what is the specific role of SEO in that?

Localizers

Christian introduces the speakers: speed, volume and cost are important in localization

... all speakers have to say important things about these issues

Matthias Heyn, "Efficient translation production for the Multilingual Web"

matthias: content is translated by professional translators, by non-professionals, by MT, or left untranslated

.. this presentation focuses on content production, related to high quality on the web

.. professional translations can be done in any authoring tool, in MS word + plugin, in dedicated translation editor

... here looking into translation editor

.. these have explicit representation of source and target language, do some segmentation

... they abstract from formatting information

... productivity accelerators: at topic level, segment level, subsegment level

.. mechanisms to help here have an impact on update translations, new translations, redundancies etc.

.. example for topic level: "don't translate if it doesn't change"

... markup exclusions: use ITS or other convention to lock text

.. or custom arrangements between CMS + translation system

.. alternative mechanism available under many different names: "perfect matching": go back to the previous translation project, see what has not been changed, and lock these parts

matthias: increasing level of matching: 100%, fuzzy, context matches, ...

... cascaded TMs, ranking of TMs

... increasing acceptance of foruming with (statistical) MT

... productivity gains depending on SMT engine training, possibility to choose in-domain trained engine

... and trust scores

... "trust score": determines whether a proposal is useful or not

.. scope is on document and sentence level

.. automating the "retraining" of MT engines is a hot topic

... subsegment level: auto-suggest

.. strategies: display not too many suggestions, avoid noise

... relevant and related standards:

... ITS important on the topic level, XLIFF on the xxx level, something missing for auto-suggest

.. current theme for CAT tools is reviewer productivity

.. and how to have information available in the production chain

Asanka Wasala, "A Micro Crowdsourcing Architecture to Localize Web Content for Less-Resourced Languages"

asanka: focusing on languages that are not yet well represented on the web

demo - a language that has no online MT system available - how to get the content (MT) translated?

... web is not accessible for people who do not understand English

.. English-dominant web pages result in loss of business - millions of dollars per year

... crowd-based web content localization can help with the problem

... idea is simple. Example: you visit a website, right click, and translate it

.. architecture: extension talks to server. Translation goes to server, stores the translation as XLIFF

.. then you get the translated web site from the server

... issues: legal, updates, formatting, translation voting, deployment

... standardization mechanism - different extension mechanisms in browsers

... need standardization of these

... summary: MLW is not just about top 10 languages

... even small extensions like dictionaries play a crucial role in MLW

more detailed demo of the system

Sukumar Munshi, "Interoperability standards in the localization industry – Status today and opportunities for the future"

sukumar: interoperability in localization industry

.. many perspectives, data sharing and processing. Definitions from IEEE glossary & wikipedia

... aspects of interoperability: data management, technology usage, business purpose, regulatory aspects

... and process benefits

... who cares about interoperability?

.. interop standards are complex in nature

.. issues with file formats are still common

... some vendors embrace standards wholeheartedly, others make their own "story" out of them

.. interop standards: TMX, ITS 1.0, TBX, SRX, XLIFF, ...

... success is heterogeneous

... some concepts have become obsolete and integrated in other standards

... tool support for a single standard varies across tools

.. interop issues esp. in XLIFF: connectivity, format, data, metadata, ...

.. important to agree upon: common data set, expected behaviour for that data set

<fsasaki> above looks like asking for an XLIFF test suite

... interop standards should improve process efficiency

.. examples from other industries, like EDI or HL7

.. have been successful standards for promoting interop

.. example DOCSIS: CableLabs pushed the standard forward, making the use of cable modems mainstream

.. lack of interop costs a lot of money (NIST study)

.. 1 billion USD in automotive for engineering data

.. 5 billion USD lost in the car supply chain

.. organizations interested in interop in loc industry:

ISO TC 37, ETSI, W3C, OASIS, GALA, TAUS

.. EU, Unicode, OMG, SAE, SIG

.. support of gala initiative:

... many standards are not in sync with requirements

.. lack of promotion of standards

.. gala initiative aims at bringing initiatives related to standards together

.. Arle Lommel's presentation will provide more info about that later

Localizers session, Q&A

Lise: question to asanka - how can we get the extension?

xyz: question for matthias: if you will use ITS - how will you work on attribute level?

matthias: as I understand it, there are rules that make that possible

xyz: to matthias - how would you distinguish domains?

matthias: would use ATC codes

... typically we use level one of these codes

arle: asanka said that 10 mill. is lost - is that all? Companies might not care about 10 mill

asanka: there was an article about that, provides more details

sebastian: comment to asanka - there are many extensions - one should make the source code public

.. a question to asanka: what would you do if the web site changes?

asanka: we have a translation memory that takes care of that

xyz: to asanka - if you compare your system to WikiBhasha from Microsoft Research

asanka: don't know about WikiBhasha

xyz: WikiBhasha is doing what you are doing, in a cross-browser fashion

olaf: to asanka - will the output of your work be open source?

reinhard: of course!

christian: w3c has compliance mechanisms which are not available in other orgs

.. other orgs are offering certification

.. above are two issues to take into account for interop

Machines

Felix introduces the session about machines

Thomas Dohmen, "The use of SMT in financial news sentiment analysis"

Thomas: Semlab does financial and healthcare software

... flagship product is ViewerPro, a semantic analysis platform for news
... To handle 10,00 messages per day a software solution is useful to process news as it comes in
... News processed with sentiment analysis, traders can use information quickly
... Spinoff is www.newssentiment.eu
... Research, EU and national funds, research institute collaboration. Invested 5.8M euro in the last decade - a lot for a small company
... Semantic analysis is versatile, research topics include ontology, lang tech, financial tools and social media
... Semlab involved in Let's MT, an SMT system funded by EU FP7. Focus on small languages, Dutch
... Hard to find resources, can do topic specific translation
... Let's MT set up by Tilde, University of Edinburgh, Uppsala...
... Semlab provides parallel corpora for system training, news feeds, Dow Jones
... Goal for project is to integrate translation in sentiment website
... Here's what the website looks like. Shows companies on left, hotspot map ordered by positive/negative, on right the events
... The SMT project integrates a translation function, button. You can click an event/message and get an option to translate to a specific language
... MT, have had surprisingly good results with test sets, but problems with business and finance, cannot publish yet
... Service to translate terminology
... Goal is to translate important financial news from other countries in local language relevant to you
... Semantic analysis or sentiment extraction is sensitive, translation may change subtle elements
... Challenges: MT system integration, standards for APIs. Google API not rich enough for domain specific MT
... Problem with data collection: backlog of news, but not delivered as parallel corpora, not manageable. Instead focus on news reports, use as source for parallel corpora
... Issue with difference in phrasing, writing
... To wrap up: 2 kinds of things needed:
... Accessibility, standard APIs, find the right services
... Quality assurance, how do you know if data is good when it's scraped automatically

Sebastian Hellmann, "NLP interchange format (NIF)"

Sebastian: NLP interchange format (NIF) is an RDF/OWL-based format allowing combination of NLP tools
... Problem: NLP normally organized in pipelines (UIMA, Gate)
... Integration is hardwired, need an adapter for each tool, no ad hoc integration, difficult to aggregate, not robust
... Comparison chart for criteria for integration
... Example of criteria, support for: typed annotation, annotation type inheritance, alternative annotations - all supported by RDF
... NIF integration architecture: client server model, client has a local db, sends document to different NLP tools, retrieves back RDF
... Each NLP tool has an NIF wrapper
... Example language resources, DBPedia, Wordnet - can also be accessed
... Challenge, how to handle strings with URIs as RDF is made up of URIs?
... 2 schemes to handle: version 1, use begin and end index, not good if offset changes
... But this version is easy to handle, parse
... Version 2 is based on a filehash, more stable, only considers local context. Works even if text before changes.
... Some annotation samples, showing how RDF is used, how tools can add annotation to a sentence, snowball stemmer, adding a tag, stanford parser, merged rdf
... OLiA, Ontologies of Linguistic Annotation, local annotation model for a tagset, linked to the OLiA reference model
... Ontologies can be used to achieve parser, lang and framework independence
... Great for conceptual interoperability
... Roadmap: NIF 1.0 published, http://nlp2rdf.org allows to browse implementations
... Next step is to benchmark string uri properties
... Interactive tutorial challenges online to foster adoption,
... later 2.0 draft will be refined based on implementation experience, can serve as basis for standard

Yoshihiko Hayashi, "LMF-aware web services for accessing lexical resources"

Yoshihiko: Take home message, intl standards for lang resource management, worked out by ISO, can be effectively utilized in implementing standardized language web services
... in particular for accessing lexical resources
... Wordnet type semantic lexicons. Princeton WordNet PWN is a large lexical database of English based on relational semantics
... Nouns, verbs etc are grouped into sets of cognitive synonyms, synsets
... Figure, relational structure in Wordnet, relations between synsets, more general concept: hypernym
... Lexical Markup Framework (LMF)
... Standard ISO framework for modelling lexical resources (lexicons). Wordnet-LMF is version for Wordnet type lexicons
... Basic ideas: access to a lexicon is achieved by query-driven extraction and presentation of relevant portions, a sub-lexicon
... Sublexicon represented by a REST URI, sublexicon can be rendered as Wordnet-LMF

... Explanation of URI patterns, uri specifies lexicon, structural constraints, other attributes.
... Example showing part of speech as constraint in uri
... URI used as query language
... Directives, to collect synsets, retrieve by sense number, by relation

[Resulting document example given]

... Format can easily be converted to other forms, to html from Wordnet-LMF via style conversion
... Next topic: to revise Wordnet-LMF to accommodate bilingual semantic lexicons
... Concluding remarks, LMF can be used to implement standardized lexicon access webservices, however modifications were needed for EDR bilingual dictionary

Machines session, Q&A

Christian: For Sebastian, you are looking at an ability to annotate strings, with RDF correct?

Sebastian: Correct, use ontology

Christian: Could not XPointer be used?

Sebastian: You are thinking about XML/HTML, I'm thinking about text, anything in a text editor.

Christian: Question on Wordnet, usage of RDF, why can't I use SPARQL?

Hayashi-san: I think it's simpler to use URIs

Question: Introducing XML:TM would enhance the application

Sebastian: You can use URIs on the web

Andrej: You use a copy

Users

Session introduced by Reza Keschawarz.

Alexander O'Connor, "Digital Content Management Standards for the Personalised Multilingual Web"

Alexander: There is huge amount of data generated each day on the Web

... audience needs to find and understand the content

... web 2.0 was about interaction, web 3.0 must be about recommendation and personalization

... personalization in use: Google search history, Amazon recommendations

... personalization strategy = user model + domain model + content model

... there are a lot of content formats; we need something which can be reordered for personalization

... linked data is more research oriented

... schema.org is an effort of Google, Microsoft and Yahoo to provide a common vocabulary for events, people, ...

... for identifying people 3 standards are most used: OpenID, WebID and OAuth

... standards for exporting data from G+/FB are very strange for now

Olaf-Michael Stefanov, "An Open Source tool helps a global community of professionals shift from traditional contacts and annual meetings to continuous interaction on the web"

Olaf: Challenge - maintenance of multilingual site without budget

... solution - use open-source tool supporting multilinguality and crowdsourcing

... tiki wiki is one such piece of software

... JIAMCATT = Joint Inter-Agency Meeting on Computer-Assisted Translation and Terminology

... shows some impressive numbers

... linport.org project

... tiki has no concept of single source language

Olaf introduces more tiki features useful for collaborative translation

Users session, Q&A

Christian: Alex, what was the recommendation to localizers?

Alex: Good localization needs additional metadata, including identification, personalization, ...

Q: Some businesses also use linked data; there is also schema.rdf.org

Alex: it is not a war, but a question of adoption

Policy

Joerg: We start with terminology, new approaches.

Gerhard Budin, "Terminologies for the Multilingual Semantic Web - Evaluation of Standards in the light of current and emerging needs"

Gerhard: Terminology, who needs it and what's up, and standards... how they happen, where to.
... Multilingual SemWeb is a current trend. Often terminologies are ontologies, and discussion / work topics include multi-lingualising them.
... Problem: Semweb is language-unaware, in contrast to the fact that there are partial symmetries between languages, which doesn't match the language-independent data-models.
... Lots of standards, projects, etc. Lots of diversity...
... maybe too much

[explains some alphabet soup of examples]

... consider lemon, one of the most promising approaches so far - might be brought forward within w3c

... pre-normative research is important to road-test potential standards.
... examples for Multilingual SemWeb include Lise (legal), TES4IP (intellectual property), MGRM (risk management)
... Conclusions: We need to make stakeholders talk to each other, implement and test, with users involved at the beginning. Different domains imply diverse scenarios.
... The degree of diversity can be easily under-estimated, but there is still a lot to learn from each other.
... Linked Open Data is a promising approach.

Georg Rehm, "META-NET: Towards a Strategic Research Agenda for Multilingual Europe"

Georg: Who is familiar with META-NET?

[about half]

... ML Europe: Challenge is to stop small languages from being a disadvantage.
... we have progress but not fast enough to succeed.
... META-NET is trying to get stakeholders to team up to foster a multilingual information society.

... community and strategic research goals are my topic for today.
... open technology alliance of 300 members, anyone can join.

... Metanet language white papers describe technology status for different languages
... key messages are about social situation - economic, challenges, risks they face.
... Assessing Language technology support is tricky - can count many different kinds of things.
... we went from surveys assessing what is where and how it works, and aggregated and condensed it for journalists/politicians (our target audience)
... clustering different themes and different quality.
... in speech things look good for large languages, while for machine translation things look bad to awful outside english...
... Most large companies have stopped working on language technology.
... Trying to use this information to drive a research agenda for filling the gaps identified.
... And to build a community behind it, with a vision.

... We assembled groups to propose visions.
... these are condensed down.

[roadmap: finish spring next year with a roadmap for more to do]

Arle Lommel, "Beyond Specifications: Looking at the Big Picture of Standards"

Arle: GALA vision is same as what we are thinking here, but not restricted to the web
... problems that standards are created to solve are business needs, output is a technical specification.
... by geeks, for geeks; not for their users.
... How do companies give input to the standards that have an impact on them, if they're not rich?

http://xkcd.com/927 -> what we do

... There are specs that are *unimplementable*, and/or *incomprehensible*.
... So what is missing that GALA can contribute?
... Coordination, Education, Promotion.
... Coordination: bilateral liaison doesn't scale.
... Need to teach people what to use how, and promote the use of best practice in the first place.
... Need to identify what we are doing at a business level, and then how to tie it to technical solutions
... There is a gap between web solutions, and things that solve these problems not on the web. need to get these groups to talk to each other.
... So join in and help.

Policy session, Q&A

Q: What languages are the META-NET language white papers available in?

Georg: Written in English, being translated to the language they report on.

Q: Maybe we could get a summary of all the standards from the workshop - a glossary of alphabet soup we've talked about

Gerhard: I can give you a starting list of a few dozen (although it is incomplete...)

Q: European Commission tried to have a couple of people working full time to maintain the list of standards. It's a big job.

O-M: If we list the things we have now can we make it simpler?

Chaals: How do we reduce the number of standards - by getting multiple things to merge together?
... or are we doomed to an ever-increasing list?

Arle: Think they don't have so much overlap, not sure we want to do that

Timo: We would need to have social scientists. Having communication is important. For developers it is important to know about the market situation - which standards are disappearing...

Felix: For people working in web standards, Gala and MetaNet look really similar. But people inside them see the difference clearly.
... we need to make the differences understandable. Which is what this work is doing...

Gerhard: Agree. There are so many groups using these, even without knowing much about them.
... There is a lot of work to prevent confusing target audiences.

Joerg: You mentioned MT support in different languages. Are your data up to date?

Georg: They are very stark generalisations of aggregated expert opinions (and guesses)
... I think yes.

Joerg: Does your strategy include recommendations for implementation and new standardisation?

Georg: Mostly it will be roadmaps to about 2025, concerned with solution visions.

[logistics, adjourn. Think about ideas for discussion tomorrow]

[End of minutes]


Minutes formatted by David Booth's scribe.perl version 1.136 (CVS log)
$Date: 2011/09/27 14:38:27 $