Archives for: 2009
Posts
Talk slides: Standards-based Translations with W3C ITS and OASIS XLIFF
On November 5th, Christian Lieske and Felix Sasaki gave a talk entitled Standards-based Translations with W3C ITS and OASIS XLIFF at TCWorld, Wiesbaden, Germany.
The slides are in PDF. The presentation describes ITS and XLIFF, the two standards which are important for proper internationalization and localization of XML. Topics include a discussion of general benefits of standards-based internationalization and localization, an introduction to both standards and how they help to achieve such benefits, and an explanation of the relation between the two. A highlight was the introduction of a tool for round-tripping from an XML document with ITS information to XLIFF, and the integration of translated material from XLIFF back into the original XML. [search keys: talk-2009 talk-sasaki talk-lieske]
Updated article: Styling using language attributes
The major change was the addition of detailed information about use of CSS selectors with xml:lang, but there were many other edits (see the list below). Translators should consider retranslating the whole tutorial. [search keys: qa-css-lang]
Unicode Collation Algorithm Version 5.2 Released
Version 5.2 of the Unicode Collation Algorithm has been released. This version resynchronizes the Unicode Collation Algorithm with all of the updates for the Unicode Standard, Version 5.2.
The rest of this post is taken from the Unicode Consortium's release notification and details changes and issues for implementations.
- The text of UTS #10 has been updated. Among other changes, the revised text for UTS #10 makes it clear that the BASE for implicit generation of weights for Han characters does not include unassigned code points.
- There are small changes in Gujarati, Telugu, Malayalam (including weighting for chillus), Tamil, and Sinhala. While these changes move in the direction of expected behavior, good results will only come from tailoring for particular languages, such as with CLDR.
- There have been significant changes to the ordering of many combining marks. Many combining marks that are not in customary use in modern languages now have the same secondary weight, and will only be distinguished on a fourth level, by code point ordering. This can be seen by looking at the Unicode Collation Charts (http://unicode.org/charts/collation/). In 5.2, many characters now have a white background, indicating that they sort exactly the same as the previous character, unless a 4th (codepoint) level is used.
- Implementations of UCA should take note that the increased number of characters may cause overflows if the implementing code makes certain assumptions or optimizations. This can result either from the new character additions (which increase the number of distinct weights in the table) or because of changes in the way the weights, particularly for secondary weight values, are assigned in the table. The latter change may result in unexpected numbers of characters having the same weight.
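The effect of combining marks sharing a secondary weight can be illustrated with a toy multi-level sort key. This is only a sketch: the weight table, the choice of characters, and the `sort_key` helper are invented for illustration and are not taken from the Unicode Collation Algorithm data files.

```python
# Toy collation element table: character -> (primary, secondary, tertiary).
# All weights are invented for illustration; they are NOT real UCA table values.
TOY_WEIGHTS = {
    "a": (0x10, 0x20, 0x02),
    "b": (0x11, 0x20, 0x02),
    "\u0351": (0x00, 0x21, 0x02),  # a combining mark
    "\u0357": (0x00, 0x21, 0x02),  # another mark given the same secondary weight
}


def sort_key(text, codepoint_level=False):
    """Build a level-by-level key: all primaries, then secondaries, then tertiaries."""
    elements = [TOY_WEIGHTS[ch] for ch in text]
    key = [
        tuple(e[level] for e in elements if e[level] != 0)
        for level in range(3)
    ]
    if codepoint_level:
        # Optional fourth level: break remaining ties by code point order.
        key.append(tuple(ord(ch) for ch in text))
    return tuple(key)


x = "a\u0351"
y = "a\u0357"
print(sort_key(x) == sort_key(y))                          # equal on three levels
print(sort_key(x, codepoint_level=True) < sort_key(y, codepoint_level=True))
```

With only three levels the two strings are indistinguishable, which is the situation the white background in the collation charts indicates. Code that assumes every character has a distinct weight may need to be revisited.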
Article for wide review: Choosing a language tag
Comments are being sought on this article prior to final release. Please send any comments to www-international@w3.org (subscribe). We expect to publish a final version in one to two weeks. [search keys: qa-choosing-language-tags]
Updated article: Language tags in HTML and XML
This tutorial was updated to incorporate changes made to BCP 47 by the recent publication of RFC 5646. Changes to BCP 47 include the introduction of extended language subtags, and the addition of ISO 639-3 language subtags, bringing the total number of subtags in the registry to almost 8,000.
Translators should consider retranslating the whole tutorial. [search keys: article-language-tags]
Unicode 5.2.0 Released
On 1st October, Unicode 5.2 was released! The data files, code charts, and Unicode Standard Annexes for this version are final and are posted on the Unicode site.
For Unicode 5.2, the core specification is no longer just a delta document applied to the book; instead, the entire core specification, with all textual changes integrated, will be available on the Unicode site. As of this announcement, the first five chapters are available; the other chapters will follow soon.
For full details about what is new or changed in this release, see the version documentation for Unicode 5.2.
New Working Group Note: Requirements for String Identity Matching and String Indexing
On 15th September, the Internationalization Core Working Group published Requirements for String Identity Matching and String Indexing as a Working Group Note.
This document is being published as a Working Group Note in order to capture and preserve historical information. It contains requirements elaborated in 1998 for aspects of the character model for W3C specifications. It was developed and extensively reviewed by the Internationalization Working Group, but never progressed beyond Working Draft status. For this publication, the wording of the 1998 version remains unchanged (except for correction of a small number of typographic errors), but the links to references have been updated.
The document describes requirements for some important aspects of the character model for W3C specifications. The two aspects discussed are string identity matching and string indexing.
Editor: Martin Dürst. [search keys: tr-charreq]
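As a rough illustration of why string identity matching and string indexing need explicit requirements, the following sketch uses Python's standard `unicodedata` module. The helper name `same_string` is hypothetical, and the choice of NFC here is an assumption made for the example rather than a statement of what the Note requires.

```python
import unicodedata

composed = "\u00e9"      # "é" as a single code point
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT


def same_string(a, b):
    """Compare two strings after normalizing both to NFC (an assumed choice)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# Identity matching: a naive comparison treats the two spellings as different.
print(composed == decomposed)
print(same_string(composed, decomposed))

# Indexing: the "length" of a string depends on the unit being counted.
supplementary = "\U00010400"  # a character outside the Basic Multilingual Plane
print(len(supplementary))                           # code points
print(len(supplementary.encode("utf-16-le")) // 2)  # UTF-16 code units
print(len(supplementary.encode("utf-8")))           # UTF-8 bytes
```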
More new translations into Spanish
Thanks to the Spanish Translation Team, Spanish Translation US, the following articles have been translated into Spanish.
Codificación de caracteres para principiantes (Character encodings for beginners)
Configuración de codificaciones en aplicaciones de autoría web (Setting encoding in web authoring applications)
[search keys: qa-what-is-encoding qa-setting-encoding-in-applications]
New translations into Spanish
Thanks to the Spanish Translation Team, Spanish Translation US, the following articles have been translated into Spanish.
Uso de entidades de caracteres y NCR (Using character entities and NCRs)
Set de caracteres para documentos (Document character set)
Cómo cambiar la codificación de la página (X)HTML a UTF-8 (Changing (X)HTML page encoding to UTF-8)
[search keys: qa-escapes qa-doc-charset qa-changing-encoding]
New translations into Spanish
Thanks to the Spanish Translation Team, Spanish Translation US, the following articles have been translated into Spanish.
Tutorial: Identificación del idioma en XHTML y HTML (Tutorial: Declaring Language in XHTML and HTML)
Encabezado Accept-Language utilizado para ubicar la configuración (Accept-Language used for locale setting)
HTTP y metadatos para información sobre el idioma (HTTP and meta for language information)
¿Por qué utilizar el atributo de idioma? (Why use the language attribute?)
Etiquetado de texto sin idioma (Tagging text with no language)
xml:lang en esquemas de documentos XML (xml:lang in XML document schemas)
[search keys: qa-accept-lang-locales tutorial-language-decl qa-no-language qa-lang-why qa-http-and-lang qa-when-xmllang]
New Working Group Note: Authoring HTML: Handling Right-to-left Scripts
The Internationalization Core Working Group has published Authoring HTML: Handling Right-to-left Scripts as a Working Group Note.
This document describes techniques for the use of HTML markup and CSS style sheets when creating content in languages that use right-to-left scripts, such as Arabic, Hebrew, Persian, Thaana, Urdu, etc. It builds on (but also goes beyond) markup needed to supplement the Unicode bidirectional algorithm, and also touches on how to prepare content that will later be localized into right-to-left scripts.
Editor: Richard Ishida. [search keys: tr-i18n-html-tech-bidi]
New language tag specification, RFC 5646, published
The IETF has published RFC 5646, an update of Tags for Identifying Languages. This specification obsoletes former RFCs 4646, 3066 and 1766.
RFC 5646 makes it possible to use over 7,000 three-letter ISO 639-3 language codes, in addition to the two-letter codes that have been in use for some time. It also introduces 220 'extended language' subtags, mainly for backwards compatibility.
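To make the subtag types concrete, here is a minimal sketch that splits a hyphen-separated tag into language, extended language, script, region and variant subtags. It checks shape only: it does not consult the IANA registry, it ignores extensions, private-use and grandfathered tags, and the names `LANGTAG` and `parse_language_tag` are made up for this example.

```python
import re

# Simplified pattern covering a subset of the language tag syntax.
LANGTAG = re.compile(
    r"(?P<language>[a-z]{2,3})"
    r"(?:-(?P<extlang>[a-z]{3}))?"
    r"(?:-(?P<script>[a-z]{4}))?"
    r"(?:-(?P<region>[a-z]{2}|[0-9]{3}))?"
    r"(?P<variants>(?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*)",
    re.IGNORECASE,
)


def parse_language_tag(tag):
    """Return a dict of subtags, or None if the tag does not fit the subset."""
    match = LANGTAG.fullmatch(tag)
    if match is None:
        return None
    parts = match.groupdict()
    parts["variants"] = [v for v in parts["variants"].split("-") if v]
    return parts


for example in ("en-GB", "zh-Hant-TW", "zh-yue", "de-1996", "es-419", "en_US"):
    print(example, parse_language_tag(example))
```

A tag that passes this check is only plausibly well-formed; whether each subtag actually exists still has to be verified against the registry.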
It continues to be best to refer to this specification as BCP 47. This is a non-changing name and web address that points to the latest relevant RFCs.
The Internationalization Working Group at the W3C is working on an article to help users choose language tags, given the various types of subtag that are now available, and the sheer number of subtags.
You can look up language and other subtags in the IANA Language Subtag Registry.
(Richard Ishida has provided an unofficial tool for searching the registry that also provides advice for choosing subtags, and allows you to partially validate a hyphen-separated language tag.)
New translations into Hungarian
Thanks to Dénes Kohn, Metaphraser - Translation Company, the following articles have been translated into Hungarian.
Mikor használjunk nyelvi egyeztetést (When to use language negotiation)
Útmutató: A Nyelv Deklarálása XHTML-ben és HTML-ben (Tutorial: Declaring Language in XHTML and HTML)
Szöveg nyelv nélküli címkézése (Tagging text with no language)
Miért használjuk a nyelv attribútumot? (Why use the language attribute?)
HTTP és meta a nyelvi információhoz (HTTP and meta for language information)
[search keys: qa-when-lang-neg tutorial-language-decl qa-no-language qa-lang-why qa-http-and-lang]
New translation into Hungarian
Thanks to Dénes Kohn, Metaphraser - Translation Company, the following article has been translated into Hungarian.
Karakterkódolások kezdőknek (Character encodings for beginners)
[search keys: qa-what-is-encoding]
New translations into Spanish
Thanks to the English to Spanish Translation Team, Spanish Translation US, the following articles have been translated into Spanish.
El tamaño del texto en la traducción (Text size in translation)
Imágenes de fondo que admiten la localización (Background images that support localization)
Estilos con el atributo lang (Styling using the lang attribute)
[search keys: article-text-size qa-resizing-backgrounds qa-css-lang]
New translations into Hungarian
Thanks to Dénes Kohn, Metaphraser - Translation Company, the following articles have been translated into Hungarian.
Karakterkódolások (Character encodings)
Dokumentum karakter beállítás (Document character set)
Megjelenítési problémák amelyeket az UTF-8 BOM okoz (Display problems caused by the UTF-8 BOM)
[search keys: article-o-charset qa-doc-charset qa-utf8-bom]
New translations into French, German and Italian
Thanks to Trusted Translations Inc. the FAQ-based article "Bidi Space Loss" has now been translated into French, German and Italian.
Bidirektionaler Leerzeichenverlust
[search key: qa-bidi-space]
Updated tests: Web fonts
The tests of font linking and eot fonts were updated, along with the associated results pages. The number of tests was reduced to a single test per script, but test cases were created for HTML4, XHTML 1.1 and XHTML served as both text/html and XML. In addition, links to font licence information were added to the test notes. The Urdu font was also updated.
The tests are linked from here:
The results can be found here:
New translation into German
Thanks to Jens Meiert the following ’getting-started’ article has been translated into German.
Internationalisierungstips für das Web (Internationalization Quick Tips for the Web)
[search keys: article-quicktips]
New translation into Spanish
Thanks to the Spanish Translation Team, Spanish Translation US, the following article has been translated into Spanish.
CSS3 y texto internacional (CSS3 and International Text)
[search keys: article-css3-text]
New translations into Hungarian
Thanks to Dénes Kohn, Metaphraser - Translation Company, the following articles have been translated into Hungarian.
Az (X)HTML oldal kódolásának megváltoztatása UTF-8-ra (Changing (X)HTML page encoding to UTF-8)
A Nyelv a Weben (Language on the Web)
Bevezető a Karakterkészletekbe és Karakterkódolásba (Introducing Character Sets and Encodings)
[search keys: qa-changing-encoding gs-language gs-characters]
tcworld article about Japanese Requirements Note
tcworld magazine has published an article by Tony Graham about the recently published W3C Note, Requirements for Japanese Text Layout.
Updated tests: HTML and CSS and text direction
Continuing the work of repackaging the tests in the Internationalization test suite, around 87 more tests, this time relating to right-to-left and bidirectional text, have been updated. Each of the 87 tests is implemented for HTML 4.0, XHTML 1.0 served as text/html, XHTML 1.0 served as XML, and XHTML 1.1 served as XML (i.e. around 350 test cases in total).
There are also tables covering the results of the tests, and summaries of the findings. Most of these are new. The tests were run on recent versions of major browsers.
The tests and results are linked from here:
(Note that the vertical text tests are not included in this announcement, since they are still in the early stages of development.)
Updated tests: HTML and CSS character encodings and language declarations
As part of the ongoing work of repackaging the tests in the Internationalization test suite, around 70 tests relating to character encodings and language declarations have been updated. Each of the 70 tests is implemented for HTML 4.0, XHTML 1.0 served as text/html, XHTML 1.0 served as XML, and XHTML 1.1 served as XML (i.e. around 280 test cases in total).
There are also tables covering the results of each test, and summaries of the findings. The tests were run on recent versions of major browsers.
The tests and results are linked from here:
New translations into Hungarian
Thanks to Dénes Kohn, Metaphraser - Translation Company, the following articles have been translated into Hungarian. These are our first Hungarian translations on the Internationalization subsite.
Honosítás és Internacionalizálás (Localization vs. Internationalization)
Nemzetközi és többnyelvű weboldalak (International & multilingual web sites)
Szövegméret a fordításban (Text size in translation)
[search key: qa-i18n] [search key: qa-international-multilingual] [search key: article-text-size]
Updated Working Draft: Best Practices for Authoring HTML: Handling Right-to-left Scripts
The Internationalization Core Working Group has published an updated Working Draft of Best Practices for Authoring HTML: Handling Right-to-left Scripts.
This document provides advice for the use of HTML markup and CSS style sheets to create pages containing languages that use right-to-left scripts, such as Arabic, Hebrew, Persian, Thaana, Urdu, etc.
The Working Group believes this document is complete and does not anticipate any substantive changes. This draft is provided as a last chance for review and feedback before publication as a Working Group Note.
Please send comments on this document to www-international@w3.org (publicly archived) by 28 July 2009.
Editor: Richard Ishida. [search key: tr-bp-bidi]
New translation into Spanish
Thanks to the Spanish Translation Team, Spanish Translation US, the article "Setting the HTTP charset parameter" has now been translated into Spanish. [search key: article-o-http-charset]
Configuración del parámetro charset de HTTP
Updated Polish translation
Thanks to K. Wiśniewski the Getting Started article "Language on the Web" has now been updated in Polish. [search key: gs-language]
New article: Using Unicode controls for bidi text
FAQ-based article: If I'm unable to use markup to correctly order bidirectional text, what can I do?
By Richard Ishida, W3C. [search key: qa-bidi-unicode-controls]
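As a small sketch of what using Unicode controls looks like in plain text where markup is unavailable, the following wraps a run of text in embedding controls. The `embed` helper is hypothetical; the code points are the standard directional formatting characters.

```python
import unicodedata

LRE = "\u202A"  # LEFT-TO-RIGHT EMBEDDING
RLE = "\u202B"  # RIGHT-TO-LEFT EMBEDDING
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING
LRM = "\u200E"  # LEFT-TO-RIGHT MARK
RLM = "\u200F"  # RIGHT-TO-LEFT MARK


def embed(text, direction):
    """Wrap text in an embedding of the given direction ('ltr' or 'rtl')."""
    if direction not in ("ltr", "rtl"):
        raise ValueError("direction must be 'ltr' or 'rtl'")
    opener = RLE if direction == "rtl" else LRE
    # Every opener needs a matching PDF, much like a closing tag in markup.
    return opener + text + PDF


wrapped = embed("\u05E9\u05DC\u05D5\u05DD W3C", "rtl")
print([unicodedata.name(ch) for ch in (wrapped[0], wrapped[-1])])
```

The controls are invisible, so unbalanced or stray characters are easy to introduce; where markup is available it remains the more maintainable choice.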
Updated tutorial: Creating HTML Pages in Arabic, Hebrew and Other Right-to-left Scripts
This tutorial was updated to incorporate changes made to the article What you need to know about the bidi algorithm and inline markup, but various additional changes were made, including a new approach to handling examples. For a detailed list of changes read the full post.
Translators should consider retranslating the whole tutorial. [search keys: tutorial-bidi-xhtml]
Updated article: What you need to know about the bidi algorithm and inline markup
This article was revised substantially.
Translators should consider retranslating the whole article. [search keys: article-inline-bidi-markup]
New translation into Bulgarian: Changing (X)HTML page encoding to UTF-8
Thanks to Ivan Baldwin the FAQ-based article "Changing (X)HTML page encoding to UTF-8" has now been translated into Bulgarian.
Променяне на (X)HTML кодировката на страницата с UTF-8
[search key: qa-changing-encoding]
Article for wide review: Using Unicode controls for bidi text
Comments are being sought on this article prior to final release. Please send any comments to www-international@w3.org (subscribe). We expect to publish a final version in one to two weeks. [search keys: qa-bidi-unicode-controls]
Talk slides: New Work on Japanese Layout Requirements
Richard Ishida gave a presentation entitled New Work on Japanese Layout Requirements on 11 June, 2009 at the Fachhochschule Potsdam, Germany. The slides are annotated and in PDF. They build on a previous talk by Richard Ishida, Steve Zilles and Tatsuo Kobayashi at the Unicode Conference, and describe some of the key characteristics of Japanese Layout described in the newly published W3C Note, Requirements for Japanese Text Layout. [search keys: talk-2009 talk-ishida]
Talk slides: Practical Tips for Designing International Web Pages
Richard Ishida gave a presentation entitled Practical Tips for Designing International Web Pages on 9 June, 2009, at Localization World, Berlin, Germany.
The slides are annotated and in PDF. The presentation looked at a selection of practical issues for people who develop web pages for a multilingual audience. Topics included the dangers of composing sentences in content using scripting, strategies for designing layout so that text expansion during translation will not destroy your efforts, strategies for navigating localized content, and the separation of content and presentation. It explored some of the potential difficulties that can be encountered in these areas and recommended some best practices to help you avoid them. [search keys: talk-2009 talk-ishida]
New translations into Dutch, Spanish and Portuguese: Bidi Space Loss
Thanks to Trusted Translations Inc. the FAQ-based article "Bidi Space Loss" has now been translated into Spanish, Dutch and Portuguese.
[search key: qa-bidi-space]
New translations into Dutch, Spanish and Portuguese: Creating SVG Tiny Pages in Arabic, Hebrew and other Right-to-Left Scripts
Thanks to Trusted Translations Inc. the tutorial "Creating SVG Tiny Pages in Arabic, Hebrew and other Right-to-Left Scripts" has now been translated into Spanish, Dutch and Portuguese.
Creación de páginas SVG Tiny en árabe, hebreo y otros sistemas de escritura de derecha a izquierda
SVG Tiny pagina's creëren in het Arabisch, Hebreeuws en andere 'van rechts naar links' schriften
Criação de SVG Tiny Pages em árabe, hebraico, e em outros scripts da direita para a esquerda
[search key: tutorial-svg-tiny-bidi]
New translation: Дву-символни или три-символни кодове за език
Thanks to Ivan Baldwin the FAQ-based article "Two-letter or three-letter language codes" has now been translated into Bulgarian. [search key: qa-lang-2or3]
New Working Group Note: Requirements for Japanese Text Layout (日本語組版処理の要件)
This document describes requirements for Japanese layout realized with technologies like CSS, SVG and XSL-FO. For non-Japanese speakers it provides access for the first time to a wealth of detailed and authoritative information about Japanese typesetting. The document is mainly based on a standard for Japanese layout, JIS X 4051 and its authors include key contributors to that standard. However, it also addresses areas which are not covered by JIS X 4051.
The document was created by the Japanese Layout Task Force (with participation from four W3C Working Groups, CSS, Internationalization Core, SVG and XSL).
A Japanese version is also available.
New translation: Codificación de caracteres
Thanks to the Spanish Translation Team at Spanish Translation US the article "Character encodings" has now been translated into Spanish. [search key: article-O-charset]
New translation: Darstellungsprobleme durch das UTF-8-BOM
Thanks to Gunnar Bittersmann and Juliane Wünsche the FAQ-based article "Display problems caused by the UTF-8 BOM" has now been translated into German. [search key: qa-utf8-bom]
New translation: Проблеми с визуализацията на UTF-8 BOM
Thanks to Ivan Baldwin the FAQ-based article "Display problems caused by the UTF-8 BOM" has now been translated into Bulgarian. [search key: qa-utf8-bom]
New translation: Кодировка на символите
Thanks to Ivan Baldwin the article "Character encodings" has now been translated into Bulgarian. [search key: article-o-charset]
ITS support in the Okapi framework
The Okapi Framework Team has announced the first milestone of its Java-based products. The framework provides cross-platform and open-source components and applications for localization tasks.
One of the components in this release is an XML filter based on an implementation of the W3C Internationalization Tag Set (ITS) Recommendation.
The filter allows access to the translatable content of an XML document, based on any external or internal global rules, as well as local rules. The ITS processor provided supports the following data categories: Translate, Localization Note, Element Within Text, Terminology, Directionality, and Language Information.
Rainbow, an Okapi application, uses the filter to extract and merge translatable content to and from XLIFF. Many other utilities provided in the framework take advantage of the ITS-based filter as well, for example to perform pseudo-translation.
You can download the Okapi components and get their source code.
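To give a feel for what the Translate data category means for a filter, here is a minimal sketch that collects translatable text while honoring local its:translate attributes. It is not the Okapi API, it ignores global rules and all other data categories, and the function name `translatable_text` and the sample document are invented for illustration.

```python
import xml.etree.ElementTree as ET

ITS_NS = "http://www.w3.org/2005/11/its"
TRANSLATE = "{%s}translate" % ITS_NS

SAMPLE = """<doc xmlns:its="http://www.w3.org/2005/11/its">
  <title>Hello</title>
  <code its:translate="no">printf <b>x</b></code>
  <p>Press <kbd its:translate="no">Enter</kbd> now</p>
</doc>"""


def translatable_text(element, inherited=True):
    """Collect text runs whose nearest its:translate setting is 'yes'.

    Element content is assumed translatable by default, and a local
    its:translate value is inherited by child elements.
    """
    local = element.get(TRANSLATE)
    current = inherited if local is None else (local == "yes")
    runs = []
    if current and element.text and element.text.strip():
        runs.append(element.text.strip())
    for child in element:
        runs.extend(translatable_text(child, current))
        # Tail text sits after the child but belongs to this element.
        if current and child.tail and child.tail.strip():
            runs.append(child.tail.strip())
    return runs


print(translatable_text(ET.fromstring(SAMPLE)))
```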
New translation: 使用<select>鏈結到本地化內容
Thanks to Samuel Chong the FAQ-based article "Using <select> to Link to Localized Content" has now been translated into Traditional Chinese. [search key: qa-navigation-select]
Updated article: Setting language preferences in a browser
This article was updated to add and remove browser information and correct some text. For a detailed list of changes read the full post.
Translators should consider retranslating the whole article. [search keys: qa-lang-priorities]
Changes in the W3C Internationalization Team
We wish Felix success for the future, and thank him for his dedication and hard work in supporting the internationalization effort for the past four years.
New translation: CSS versus marcação para suporte bidirecional
Thanks to Gaston Diego Valente the FAQ-based article "CSS vs. markup for bidi support" has now been translated into Portuguese. [search key: qa-bidi-css-markup]
New translation: Hojas de estilo en cascada en contraposición al etiquetado para la compatibilidad bidireccional
Thanks to Gaston Diego Valente the FAQ-based article "CSS vs. markup for bidi support" has now been translated into Spanish. [search key: qa-bidi-css-markup]
New translation: Rozmiar tekstu w tłumaczeniu
Thanks to Kamil Wiśniewski the article "Text size in translation" has now been translated into Polish. [search key: article-text-size]
Updated tests and results: list-style-type set to armenian
These tests check whether and how a user agent displays list numbering when the value of the CSS list-style-type property is set to armenian, lower-armenian and upper-armenian.
A number of errors in the tests were corrected and the results page was rewritten to reflect the changes and results for latest versions of major browsers. [search keys: test-list-style-type results-list-style-type]
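For readers unfamiliar with Armenian numbering, the sketch below converts an integer to the traditional additive notation, in which the first 36 letters of the alphabet stand for 1 to 9, 10 to 90, 100 to 900 and 1000 to 9000. The function name `to_armenian` and the 1 to 9999 range are assumptions of this example; whether a given browser produces the same output is what the tests themselves check.

```python
def to_armenian(number, lower=False):
    """Convert an integer in 1..9999 to traditional Armenian numbering."""
    if not 1 <= number <= 9999:
        raise ValueError("only 1..9999 is supported in this sketch")
    # U+0531 is ARMENIAN CAPITAL LETTER AYB, U+0561 its lowercase form.
    # The 36 numeral letters are contiguous in alphabetical order.
    base = 0x0561 if lower else 0x0531
    letters = []
    for power, digit_char in enumerate(reversed(str(number))):
        digit = int(digit_char)
        if digit:  # zeros contribute no letter in an additive system
            letters.append(chr(base + power * 9 + digit - 1))
    return "".join(reversed(letters))


for n in (1, 10, 2009):
    print(n, to_armenian(n), to_armenian(n, lower=True))
```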
New translation: Sygnatura UTF-8 BOM a problemy z wyświetlaniem
Thanks to Ana Backstone the FAQ-based article "Display problems caused by the UTF-8 BOM" has now been translated into Polish (language negotiated). [search key: qa-utf8-bom]
New translation: Witryny jednojęzyczne a wielojęzyczne
Thanks to Ana Backstone the FAQ-based article "Monolingual vs. multilingual Web sites" has now been translated into Polish (language negotiated). [search key: qa-mono-multilingual]
New tutorial: Creating SVG Tiny Pages in Arabic, Hebrew and other Right-to-Left Scripts
Right-to-left scripts include Arabic, Hebrew, Thaana and N'ko, and are used by a large number of people around the world. If you are new to dealing with bidirectional text, getting it to display correctly can sometimes appear complex and confusing, but it need not be so. If you have struggled with this or have yet to start, this tutorial should help you adopt the best approach to marking up your content. It also explains enough of how the bidirectional algorithm works for you to understand much better the root causes of most problems, and it addresses some common misconceptions about ways to deal with markup for bidirectional content.
After reading this tutorial you should be able to:
- create effective SVG Tiny 1.2 content containing text written in the Arabic or Hebrew (or other right-to-left) scripts
- understand the basics of how the Unicode bidirectional algorithm works, so that you can understand why bidirectional text behaves the way it does, and how to work around problems
- take decisions about the appropriateness of alternatives to markup
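One starting point for understanding the bidirectional algorithm is that every character has a directional type, which the algorithm uses to resolve the order of a run of text. The following sketch, which is not part of the tutorial, lists those types using Python's standard `unicodedata` module; the helper name `bidi_classes` is made up.

```python
import unicodedata


def bidi_classes(text):
    """Return the Unicode bidirectional class of each character in text."""
    return [unicodedata.bidirectional(ch) for ch in text]


# Latin letter, Hebrew letter, Arabic letter, European digit, space
sample = "a\u05D0\u0627" + "1 "
for ch, cls in zip(sample, bidi_classes(sample)):
    print("U+%04X" % ord(ch), cls)
```

Strongly typed characters (L, R, AL) set the direction of their surroundings, while digits and whitespace take their direction from context, which is why punctuation and spaces at the boundaries of mixed-direction text are where most surprises occur.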
Questions or comments? ishida@w3.org
Copyright © 1997-2009 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark, document use and software licensing rules apply. Your interactions with this site are in accordance with our public and Member privacy statements.