The MultilingualWeb-LT Working Group published a First Public Working Draft of Metadata for the Multilingual Web – Usage Scenarios and Implementations. This document introduces a variety of usage scenarios and applications for the Internationalization Tag Set (ITS) 2.0, ranging from simple machine translation and human translation quality checking to training machine translation systems and automatic text analysis. Many of the underlying implementations will be showcased at the upcoming W3C MultilingualWeb Workshop, 12-13 March in Rome.
Until now, it has been very difficult for web application designers to do something as simple as sort names correctly according to the user’s language. The new standard ECMA-402 changes this. It provides:
- string comparison for sorting (such as for Swedish, where “ö” is a separate letter that sorts after “z”),
- number and currency formatting (such as “1.234,56 €” for a German language euro presentation, or the following choices for a Serbian language USD presentation: 12.345,12 US$, 12.345,12 USD or 12.345,12 америчких долара),
- date and time formatting capabilities (such as 2012年12月12日 for a Japanese language date, or for a French date: mercredi 12 décembre 2012).
ECMA-402, ECMAScript Internationalization API Specification, is available free of charge from the Ecma International website. See also An introduction to the standard.
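The three capabilities listed above can be sketched in a few lines of TypeScript. This is a minimal illustration, not taken from the specification: the helper names (sortForLocale, formatCurrency, formatLongDate) are hypothetical, and only the Intl.Collator, Intl.NumberFormat and Intl.DateTimeFormat constructors come from ECMA-402. Exact output strings depend on the locale data shipped with the runtime.

```typescript
// Hypothetical helper names; only the Intl.* constructors are part of ECMA-402.

// Sort a copy of the list using the collation rules of the given locale.
function sortForLocale(words: string[], locale: string): string[] {
  const collator = new Intl.Collator(locale);
  return words.slice().sort(collator.compare);
}

// Format an amount as a currency value for the given locale.
function formatCurrency(amount: number, locale: string, currency: string): string {
  return new Intl.NumberFormat(locale, { style: "currency", currency }).format(amount);
}

// Format a date in long form, optionally including the weekday name.
// The time zone is pinned to UTC so the result does not depend on the host.
function formatLongDate(date: Date, locale: string, withWeekday: boolean): string {
  const options: Intl.DateTimeFormatOptions = {
    year: "numeric",
    month: "long",
    day: "numeric",
    timeZone: "UTC",
  };
  if (withWeekday) {
    options.weekday = "long";
  }
  return new Intl.DateTimeFormat(locale, options).format(date);
}

// 12 December 2012, the date used in the examples above.
const exampleDay = new Date(Date.UTC(2012, 11, 12));

console.log(sortForLocale(["ö", "z", "a"], "sv"));   // Swedish: "ö" sorts after "z"
console.log(sortForLocale(["ö", "z", "a"], "de"));   // German: "ö" sorts with "o"
console.log(formatCurrency(1234.56, "de-DE", "EUR"));
console.log(formatLongDate(exampleDay, "ja-JP", false));
console.log(formatLongDate(exampleDay, "fr-FR", true));
```

Note that the space before the currency symbol in the German result is typically a non-breaking space, so comparing the output against a literal string may need whitespace normalization.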
Comments are requested on the following proposed updates to material on the Internationalization site, prior to final publication. NOTE THAT the articles are in a temporary location, and will be moved to their final location after the review.
Text direction and structural markup in HTML
This article has been created from material formerly in the tutorial “Creating HTML Pages in Arabic, Hebrew and Other Right-to-left Scripts” and augmented with information about new HTML5 markup constructs that are beginning to see adoption. It should be regarded as a new article, focusing on applying bidi markup to document- and block-level situations and to forms.
What you need to know about the bidi algorithm and inline markup
This is an update of an existing article. It has been almost completely rewritten. The most significant changes are the new parts describing how to apply the new HTML5 constructs which are beginning to see adoption. Additional changes will be needed as HTML5 bidi markup is finalised over the coming months. The article also proposes a simpler way to approach markup of bidi text, particularly useful for those with less experience, that relies less on a deep understanding of the issues involved.
Visual vs. logical ordering of text
This is a new article created from material that has been removed from the article mentioned above. It was moved into a separate article because visual ordering is much less important these days, and to avoid duplication. Only a few changes have been made to the content itself.
Creating HTML Pages in Arabic, Hebrew and Other Right-to-left Scripts
This tutorial has been modified to bring it in line with the current tutorial format. Rather than contain duplicate content, it now introduces the novice to key concepts and points off to useful further reading in an organized fashion. It has been completely rewritten.
Please send any comments over the next two weeks to email@example.com (subscribe).
We hope to publish a final version shortly after that.
The program has been published for the upcoming W3C MultilingualWeb Workshop: Making the Multilingual Web Work in Rome, 12–13 March 2013.
Mark Davis and Vladimir Weinstein of Google will deliver the keynote presentation, “Innovations in Internationalization at Google”. This will be followed by one and a half days of talks on various aspects of what it takes to make multilingualism work on the Web, plus an afternoon of discussion-oriented breakout sessions that focus on best practices for various aspects of the multilingual Web. Speakers will come from organizations like Adobe Systems, SAP, Yandex, the Spanish Tax Agency, the U.N. Food and Agriculture Organization, Microsoft, Lionbridge, SDL, the European Commission, and leading universities and research institutions from around the world.
The program will also feature a showcase of implementations of the forthcoming ITS 2.0 specification that will allow attendees to get a sneak peek at how this specification will impact and support multilingual requirements on the Web.
See the Call for Participation for details about how to register for the workshop. Participation in the workshop is free.
Important: The deadline for registration is 8 March, but available attendance slots are filling up fast and are expected to run out before the deadline. So please be sure to register soon to ensure that you can attend.
The MultilingualWeb workshops, funded by the European Commission and coordinated by the W3C, look at best practices and standards related to all aspects of creating, localizing and deploying the multilingual Web. The workshops are successful because they attract a wide range of participants, from fields such as localization, language technology, browser development, content authoring and tool development, to create a holistic view of the interoperability needs of the multilingual Web.
We look forward to seeing you in Rome!
The article The byte-order mark (BOM) in HTML was updated significantly to reflect the fact that the byte-order mark in UTF-8 is less problematic now than it used to be, and that it has a higher precedence than the HTTP header for character encoding detection.
The article was largely rewritten, and now incorporates the relevant information that used to be in the article “Display problems caused by the UTF-8 BOM”. That article has now been decommissioned.
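The precedence rule described above can be illustrated with a short TypeScript sketch. This is a simplified illustration under stated assumptions, not the full HTML5 encoding-detection algorithm: the names detectBom and chooseEncoding are hypothetical, and the fallback steps a browser applies after the HTTP header (such as looking for a meta element) are left out. The byte sequences checked are the standard UTF-8 and UTF-16 byte-order marks.

```typescript
// Hypothetical helper names, for illustration only.
type BomEncoding = "UTF-8" | "UTF-16BE" | "UTF-16LE";

// Return the encoding signalled by a byte-order mark at the start of the
// byte stream, or null if the stream does not start with one.
function detectBom(bytes: Uint8Array): BomEncoding | null {
  if (bytes.length >= 3 && bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) {
    return "UTF-8";
  }
  if (bytes.length >= 2 && bytes[0] === 0xfe && bytes[1] === 0xff) {
    return "UTF-16BE";
  }
  if (bytes.length >= 2 && bytes[0] === 0xff && bytes[1] === 0xfe) {
    return "UTF-16LE";
  }
  return null;
}

// A BOM, when present, wins over the charset declared in the HTTP header.
// Later fallback steps are deliberately omitted from this sketch.
function chooseEncoding(bytes: Uint8Array, httpCharset: string | null): string | null {
  const bom = detectBom(bytes);
  return bom !== null ? bom : httpCharset;
}

// A page that starts with the UTF-8 BOM but is served with a conflicting header.
const pageWithBom = new Uint8Array([0xef, 0xbb, 0xbf, 0x3c, 0x68, 0x74, 0x6d, 0x6c, 0x3e]);
console.log(chooseEncoding(pageWithBom, "ISO-8859-1"));
```

In this sketch the page is treated as UTF-8 even though the header declares ISO-8859-1, which is the behaviour the updated article describes.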
German, Spanish, Russian and Ukrainian translations need to be updated. Translators, please contact Richard Ishida (firstname.lastname@example.org) for the source text.
The deadline for speaker submissions for the 6th MultilingualWeb Workshop (March 12–13, 2013 in Rome, Italy) is this Friday (January 18 at 23:59 UTC).
With a keynote by Mark Davis and Vladimir Weinstein (Google) and special breakout sessions on linked open data and other critical topics, this Workshop is set to continue the tradition of excellence set by the previous five Workshops, and will provide an outstanding forum for thought leaders to share their ideas and gain critical feedback.
While the organizers have already received many excellent submissions, there is still time to make a proposal, and we encourage interested parties to do so by the deadline. With over 100 attendee registrations already submitted for the Workshop, we are certain to have a large and diverse audience and stimulating discussion about all of the presentations.
For more information, please visit the Rome Workshop Call for Participation.
Mark Davis and Vladimir Weinstein (Google) to deliver keynote, “Innovations in Internationalization at Google,” at MultilingualWeb Workshop
Mark Davis (President and Cofounder, Unicode Consortium, and Software Engineer, Unicode and ICU, Google) and Vladimir Weinstein (Engineering Manager, Google) will deliver the keynote talk at the upcoming 6th MultilingualWeb Workshop in Rome, Italy (March 12–13).
The keynote will discuss how Google supports its ambitious goals of removing barriers to information, in an ever increasing number of languages, through recent innovations in internationalization technology.
The MultilingualWeb workshop series examines best practices and standards related to all aspects of creating, localizing and deploying the Web multilingually. It aims to raise the visibility of existing best practices and standards and identify gaps, with a view to helping content creators, localizers, tools developers, and others meet the challenges of the multilingual Web.
Participation is free. We welcome participation from both speakers and non-speaking attendees. For more information and to register, see the Call for Participation.
Comments are requested on the following proposed update of the article The byte-order mark (BOM) in HTML prior to final publication. NOTE THAT the article is in a temporary location, and will be moved to its final location after the review.
The majority of the article has been rewritten, with the aim of reducing the previous warnings against using the BOM for UTF-8 documents. Also taken into account is the change to the HTML5 spec that raises the precedence of the BOM above that of the HTTP header when determining the character encoding.
We hope to publish a final version at the beginning of the New Year.
Led by experts in the field, two special break-out sessions on Internationalized Domain Names (IDN) and Linked Open Data (LOD) are planned for the upcoming MultilingualWeb workshop, to be held at the headquarters of the UN’s Food and Agriculture Organization in the heart of Rome, on 12-13 March. We will also continue the Open Space discussions that have been so popular in the past.
In addition, lunch-time exhibition sessions will showcase the recent work and progress made on implementing the ITS 2.0 specification, a major effort in the W3C to improve support for language- and translation-related processes.
Register soon to ensure you get a place, especially if you are also interested in speaking. See the Call for Participation.
The W3C’s MultilingualWeb workshops bring together approximately 150 implementers, leading developers, localizers, researchers and users of the Web to discuss best practices and standards related to all aspects of creating, localizing and deploying the Web multilingually. One and a half days of presentations will be followed by break-out sessions that will allow attendees to explore additional topics in an in-depth, discussion-oriented fashion.
Participation is free.
If you have any questions, contact the program committee chair, Dr. Arle Lommel (email@example.com).
This document defines data categories and their implementation as a set of elements and attributes called the Internationalization Tag Set (ITS) 2.0.
ITS 2.0 is designed to foster the creation and localization of multilingual Web content. It focuses on HTML5 and XML-based formats in general, and can leverage localization workflows based on the XML Localization Interchange File Format (XLIFF) as well as language technology applications like machine translation or named entity annotation. In addition to the HTML5 and XML implementations, algorithms to convert ITS attributes to NIF are provided.
Last Call means that the MultilingualWeb-LT Working Group feels that ITS 2.0 is ready to move to recommendation. If you have comments on the document, please send them to the list mentioned in the document status before 10 January.