ICT PSP, Thematic Network preliminary proposal

Call for proposals ICT PSP 3

Call identifier: CIP-ICT-PSP-2009-3

Enhancing the Multilingual Web

 

Key facts

This is an early outline of a proposal under THEME 5: MULTILINGUAL WEB of the COMPETITIVENESS AND INNOVATION FRAMEWORK PROGRAMME (CIP) ICT POLICY SUPPORT PROGRAMME, for which the EC put out a call on 29 January 2009. The proposal is for the World Wide Web Consortium (W3C) to coordinate a Thematic Network (TN), with funding to support European partners (though some additional non-funded partners are permitted).

This is Objective 5.2: Multilingual Web content management: standards and best practices.

We are looking for potential partners. We need a minimum of 7 participants from 7 European or associated countries (listed on page 9 of the Guide for Applicants).

Funded partners get a fixed lump sum of 8K / year (this includes 5K to cover travel costs). The W3C will choose partners based on the potential contributions of applicants.

Submission deadline: 2 June 2009. Negotiations: September 2009. Signature: November 2009.

See the Guide for Applicants.

Introduction

The World Wide Web is important to communication in all walks of life. As the share of English Web pages decreases and the number of languages spoken in the European Union increases, ensuring the multilingual viability of the Web is paramount. To build on the current internationalization of the Web and move it forward, it is important to review and consolidate existing best practices, standards and partnerships related to managing content on the multilingual Web, and to project new ones. We expect this proposal to support, but not be limited to, the aims of the EC to improve the use of machine translation on the Web.

This project, if accepted, will bring together stakeholders from a range of affected activities to discuss where we are currently in terms of providing for a truly multilingual Web, and where we need to concentrate effort as we move forward. Although one of the major goals of the project is to establish a network of stakeholders, it will also deliver some practical initiatives, with an expected lifetime that continues beyond the project itself, aimed at supporting the development of the multilingual Web. The project will last for two years.

The project is led by the World Wide Web Consortium (W3C), an organization of currently over 400 members worldwide from research and industry, headed by the Web's inventor, Sir Tim Berners-Lee. The project will be supported out of the Internationalization Activity of the W3C, which has been involved in producing specifications, guidelines, and best practices, in education and outreach, and in reviewing new Web technologies for internationalization issues since 1998.

In addition to organizations such as LISA and the LRC, and standards organizations, we are looking for participants from various industry sectors, including content developers and publishers, localization and MT (machine translation) tool developers, user agent developers and ISPs, localization providers, government bodies, universities, etc. We are particularly interested in users and developers of multilingual Web content.

Work programme details

The following reproduces the text on pages 28-29 of the Work programme relevant to the Thematic Network.

Focus and outcomes

The aim is to promote Web standards, best practices and partnerships for multilingual Web content management, in particular the authoring, versioning and maintenance of (parallel) multilingual Web sites, portals or repositories.

The Thematic Network is expected to raise awareness, build consensus on and encourage the use of standards and widely recognised conventions to promote the Web as a primary medium for truly multilingual and cross-lingual content and services.

Conditions and characteristics

The consortium should involve world-class organisations and key stakeholders in the field of Web engineering and standards, as well as public and private providers and distributors of multilingual content. It should demonstrate convincingly its ability to promote the uptake of its results by a broad range of stakeholders from most if not all of the EU Member States.

Expected impact

Broader and faster adoption of standards and open reference architectures, guidelines and best practices in the Web environment which foster and facilitate the creation and management of multilingual Web sites and content, minimising the overhead resulting from multilingualism and capitalising on the rich multilingual and multicultural contribution of online communities.

The EC literature suggests that the Thematic Network should:

For more information, and to see how this fits with other objectives in the call, see the following links:

 

Proposed work items (by type)

The following are brief suggestions for how the W3C currently plans to propose the project. We list here work items that will need to be combined into four 6-monthly work packages for the submission. We welcome your suggestions for improvement. Please provide feedback to me initially at ishida@w3.org.

The network will have ongoing access to two or more archived, public mailing lists, as needed. One list will be for coordination of workshop logistics, admin-related activities, etc. and will be mainly for partners. The other list(s) will be for general communication on any of the topics addressed during the project, and will be publicly viewable and open to subscription by the general public. The lists will be maintained by W3C.

There will also be a project home page for coordinating work and for pointing to workshop and face to face outputs, ongoing status of work, etc. This will be on the W3C site. Workshops and other major project milestones will also be announced on the main W3C home page.

A wiki, also hosted by W3C, can be set up whenever it is needed to support collaborative working.

It is currently envisaged that each workshop and face to face meeting will last for two days.

The project home page and useful resources will be pointed to from the W3C Internationalization Activity home page, amongst others, ensuring high visibility for the project. It is hoped that other partners with web sites consulted by the industry would also point to the information.

It is also envisaged that the findings would be disseminated via conference channels such as Unicode, LISA, Localization World, etc.

Note that the funding provided by the Commission for partners is primarily aimed at enabling participation of stakeholders in the network, promoting useful interactions between them and enabling publication of considered recommendations for future work. Partners are not required to be involved in actual development work related to the practical work items described below (creation of an i18n checker, a training package and publication of test results). The development work will be carried out by the W3C. As part of the network, partners are, however, asked to review, suggest ideas for, contribute feedback on and support the deployment of those deliverables, as described in the following list.

Participants will be asked to:

  1. Attend and participate in workshops and face to face meetings (for which funding is provided). Participation in workshops means sharing information about initiatives the partner is involved in, and working together to summarise the current status in the area being discussed as well as issues that need to be addressed. Participants will also be asked to help build the agenda for the workshops by participating in the program committee.
  2. Provide support, where possible, for hosting one of the above meetings. We would like to hold meetings in diverse locations around Europe (particularly in countries new to the European Union), to increase exposure of attendees to a variety of local people, cultures and issues.
  3. Review and provide feedback on the practical work items.
  4. Assist in providing results of internationalization tests for compilation on the W3C site.
  5. Where it is of interest, participants may become more actively involved in the development of the practical work items, though this work will mostly be driven by the W3C.

 

Workshop 1: The landscape of multilingual Web standards & best practices

Partners and any other participants (the latter subject to acceptance of position papers) will review currently available standards, best practices and initiatives supporting the multilingual Web, and discuss areas needing attention. This workshop will look at the landscape from a high level and range widely, whereas follow-on workshops will be somewhat more focused. Participants will also select a workshop theme to be treated later in the project.

One output of the workshop will be a list of initiatives, standards, guidelines and best practices currently available or in development. Another will be a set of recommendations for areas which need attention. A third will be the general education of the participants in the issues facing the various areas they represent.

This workshop will be open to the public. It will last two days. Ideally the workshop facilities will be provided by a TN partner. The minutes and a summary report of the findings of the workshop will be made publicly available on the W3C web site.

Workshop 2: Authoring the multilingual Web

Partners and any other participants (the latter subject to acceptance of position papers) will share their experiences with standards, guidelines, best practices and initiatives related to authoring content for the Web, and discuss areas needing attention.

There will also be the opportunity for a small number of selected subject matter experts to present short educational sessions relating to practical techniques for authors (e.g. latest developments in language tagging, character and language declarations in HTML5, current status of IDNA, etc.).

The workshop should cover authoring of corporate content using content management systems and organization websites, but also personal authoring in such things as blogs and social networking environments. Topics can include authoring practices related to automated checking of character encoding, language and other declarations, CSS styling features, translatability issues, navigating around multilingual sites, use of language tagging, IDNA, authoring for mobile devices, etc.

This workshop will be open to the public. The minutes and a summary report of the findings of the workshop will be made publicly available on the W3C web site. All position papers and slides presented will also be available. In particular, any presentations in which subject matter experts share best practices will be made available on or linked to from the W3C site.

Workshop 3: Translation tool support

Partners and any other participants (the latter subject to acceptance of position papers) will share their experiences with standards, best practices and techniques related to enabling efficient and effective translation of Web-based content, and discuss areas needing attention. Relevant topics will include a summary of the status and content of, and discussion of issues and possible next steps for, such things as the W3C's Internationalization Tag Set specification, standards such as XLIFF, TMX, etc., and translation tools.

This workshop will be open to the public. The minutes and a summary report of the findings of the workshop will be made publicly available on the W3C web site.

Workshop 4: Optional topic

The theme of the workshop will be chosen at a later date. It may be used to extend discussions started in earlier workshops, to provide some continuity and follow-on for those ideas, or a new topic may be introduced. Examples could include such things as the following, or other ideas agreed upon by the partners during the first meeting:

  1. Meeting the needs of minority languages. Could be held in conjunction with the Digital World project, looking at minority languages in Europe, but also at how Europe should apply its experience and knowledge to support cultures trying to introduce the Web outside Europe.
  2. Workshops to review and provide feedback on W3C specifications in development, e.g. Character Model: Normalization, Web Services Internationalization, ...
  3. Experience sharing about barriers and lessons in handling deployment of multilingual information
  4. Wrap up for the project and discussion of ways to continue the work after the end of the project.
  5. etc... (more ideas needed here)

This workshop will be open to the public. The minutes and a summary report of the findings of the workshop will be made publicly available on the W3C web site.

 

Face to face meeting 1: Practical work item kick-off

This face to face meeting will be open to the public. The meeting will provide an opportunity for partners to contribute ideas on the training, i18n checker, and test suite work items.

The minutes of the meeting will be made publicly available on the W3C web site.

Face to face meeting 2: Practical work item review

This face to face meeting will be open to the public. The meeting will provide an opportunity for partners to review and provide feedback on the training, i18n checker, and test suite work items.

The minutes of the meeting will be made publicly available on the W3C web site.

 

Practical work item 1: i18n checker tool

Authors are largely unaware of defects in their content related to internationalization. There is a need for a tool that authors can run on content to assess its international readiness. The tool should not only point out problem areas, but should also point to advice on what to do if a problem is reported.

This will be an online tool similar to the W3C's HTML validator and mobileOK checker, but aimed at checking web pages for internationalization issues. For example, it will report errors, warnings and other advice to page authors on a range of topics that will include such things as character encodings and declarations, language declarations, use of directional markup, non-normalized class or id names, byte order marks, navigational links, etc. Feedback on issues identified will explain the issues in simple terms, and link to existing guidelines and best practices as well as further reading, so that the tool acts as an educational resource rather than simply listing errors and warnings. Any member of the public will be able to submit any (X)HTML or CSS file for checking by specifying the URI. The tool may also address other technologies, such as SVG.
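To make the kind of check described above more concrete, the following is a minimal sketch in Python, not a design for the actual tool. It looks at only three of the topics listed (byte order mark, character encoding declaration, language declaration). The names Finding, DeclarationScanner and check_page, the severity labels and the message wording are all assumptions made for illustration.

```python
import codecs
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import List, Optional


@dataclass
class Finding:
    severity: str  # "error", "warning" or "info" (labels are assumptions)
    message: str   # plain-language explanation for the page author


class DeclarationScanner(HTMLParser):
    """Records the language and encoding declarations found in the markup."""

    def __init__(self) -> None:
        super().__init__()
        self.html_lang: Optional[str] = None
        self.charset: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)  # names are already lower-cased by HTMLParser
        if tag == "html":
            self.html_lang = attr.get("lang") or attr.get("xml:lang")
        elif tag == "meta":
            if attr.get("charset"):
                self.charset = attr["charset"]
            elif (attr.get("http-equiv") or "").lower() == "content-type":
                content = (attr.get("content") or "").lower()
                if "charset=" in content:
                    self.charset = content.split("charset=", 1)[1].strip()


def check_page(raw: bytes) -> List[Finding]:
    """Run the three illustrative checks on the raw bytes of a page."""
    findings: List[Finding] = []
    has_bom = raw.startswith(codecs.BOM_UTF8)
    if has_bom:
        findings.append(Finding("info", "UTF-8 byte order mark found at the start of the file."))

    # Decoding as UTF-8 is a simplification; a real checker would have to
    # detect the encoding from HTTP headers and in-document declarations.
    scanner = DeclarationScanner()
    scanner.feed(raw.decode("utf-8-sig", errors="replace"))
    scanner.close()

    if not scanner.charset and not has_bom:
        findings.append(Finding("warning", "No character encoding declaration found in the markup."))
    if not scanner.html_lang:
        findings.append(Finding("warning", "No language declared on the html element."))
    return findings
```

In this sketch, a page with neither a charset declaration nor a lang attribute on the html element yields two warnings, and a page with both yields an empty list. A real checker would also attach links to guidelines and further reading to each finding, as described above.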

This work will leave behind a durable and widely useful legacy from the project work. The partners will be involved in providing ideas for included features, and reviewing and testing the checker as it is developed. Some partners may wish to provide assistance in developing the tool.

The tool will be available to the general public on the W3C site under open source licences (at a minimum the W3C software licence), and promoted along with the W3C's similar tools. It is expected that the tests will eventually also be integrated into the existing validators.

 

Practical work item 2: i18n training

The W3C will develop a one-day training package for web content developers that will be made available for delivery by and to the public at large. The partners will assist in the development by providing suggestions for content and by providing review feedback at face to face meetings. Discussions may also take place on the email lists. Some partners may wish to provide support for development of the course.

Project partners will also be expected to help by providing facilities and organising attendees for the delivery of a certain number of courses.

The training package will address such things as an overview of Unicode and related concepts, encoding and language declarations, composite messages, dealing with text expansion, navigation, ITS, etc.

Annotated slides for the training will be made available for download from the W3C site under open source licences (at a minimum the W3C document licence).

 

Practical work item 3: i18n test suite results gathering

Tests for internationalization features on Web user agents are extremely useful for highlighting areas where user agent support can be improved and alerting content authors to features that may not work interoperably. The W3C already has a number of internationalization-related tests in various test suites, including tests on the i18n area of the site, and is developing more on an ongoing basis. Several of the i18n tests have been picked up by browser development teams to promote new features internally and to test prior to deployment. Reporting the results of the tests not only significantly heightens the visibility of these features among user agent developers, but also serves to educate and inform content developers.

This work package will bring together partners to review tests and provide test results for user agents on a range of platforms and devices. (Some partners may also wish to assist in the development of the tests themselves. This would be by agreement with the W3C.) During the face to face meetings, partners will also have the opportunity to suggest and discuss additional tests that might be useful.

The test results will be made publicly available on the W3C site. The work begun in this way will hopefully continue after the end of the project, as further tests are developed.
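As an illustration only, results contributed by different partners could be compiled into a simple test-by-user-agent table along the following lines. This is a sketch under assumptions: the record fields, the outcome labels and the names ResultRecord and compile_results are hypothetical, and no reporting format has been decided.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class ResultRecord(NamedTuple):
    """One reported result (field names are assumptions for this sketch)."""
    test_id: str
    user_agent: str
    platform: str
    outcome: str  # e.g. "pass", "fail" or "partial"


def compile_results(records: Iterable[ResultRecord]) -> Dict[str, Dict[str, str]]:
    """Group results by test, keyed by "user agent (platform)".

    If the same test is reported twice for the same user agent and
    platform, the later report replaces the earlier one.
    """
    table: Dict[str, Dict[str, str]] = defaultdict(dict)
    for rec in records:
        table[rec.test_id][f"{rec.user_agent} ({rec.platform})"] = rec.outcome
    return {test_id: dict(row) for test_id, row in table.items()}
```

Each row of the resulting table corresponds to one test and each column to one user agent on one platform, which is one possible shape for results published on the site.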

 

Next steps

If you are interested in participating as a partner in the project as outlined, please let us know and send feedback on the proposal.

We will then prepare a draft version of the official proposal, for which we will need information about your organization and the person who will represent it, and when we have secured agreement on the proposal we will submit it.

 

Version: $Id: ictpsp-preliminary-description.html,v 1.10 2009/04/03 16:54:38 rishida Exp $