Localization-Related Formats

Last update: Jan-30-02

Table of Contents

1. TMX
    1.1. Purpose
    1.2. Stage
    1.3. Maintaining Organization
    1.4. Relevance to the W3C
2. TBX
    2.1. Purpose
    2.2. Stage
    2.3. Maintaining Organization
    2.4. Relevance to the W3C
3. OLIF
    3.1. Purpose
    3.2. Stage
    3.3. Maintaining Organization
    3.4. Relevance to the W3C
4. XLIFF
    4.1. Purpose
    4.2. Stage
    4.3. Maintaining Organization
    4.4. Relevance to the W3C
5. Summary
    5.1. Language/Locale Identification
    5.2. Localization Properties for XML Formats
    5.3. Localization Namespace

1. TMX

TMX is the Translation Memory eXchange format.
Web site: http://www.lisa.org/tmx.

1.1. Purpose

Allows the transfer of translation memories from one translation tool to another. A translation memory (TM) is a collection of source entries with their translations in one or more target languages.

Example: TMXExample.xml (normally the file uses a .tmx extension).
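
To make the idea of a translation memory concrete, the following is a minimal Python sketch that reads a stripped-down TMX-like document into a list of entries. The sample data is invented for illustration and is not the TMXExample.xml file referenced above. The required header attributes are omitted. The use of xml:lang on each variant follows what this document says about TMX and should be checked against the TMX specification for the version in use.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under this expanded name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample: two translation units, each with a source entry
# and one translation. Header attributes are omitted for brevity.
SAMPLE_TMX = """<tmx version="1.3">
  <header/>
  <body>
    <tu>
      <tuv xml:lang="en"><seg>Open file</seg></tuv>
      <tuv xml:lang="fr"><seg>Ouvrir le fichier</seg></tuv>
    </tu>
    <tu>
      <tuv xml:lang="en"><seg>Save</seg></tuv>
      <tuv xml:lang="fr"><seg>Enregistrer</seg></tuv>
    </tu>
  </body>
</tmx>
"""


def read_memory(tmx_text):
    """Return one dict per translation unit, mapping language to text."""
    root = ET.fromstring(tmx_text)
    memory = []
    for tu in root.iter("tu"):
        entry = {}
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            if seg is None:
                continue  # skip variants without a segment
            # itertext() also collects text inside any inline elements.
            entry[tuv.get(XML_LANG)] = "".join(seg.itertext())
        memory.append(entry)
    return memory


memory = read_memory(SAMPLE_TMX)
for entry in memory:
    print(entry)
```

Each dictionary holds one source entry together with its translations, keyed by language identifier, which is the structure described in the paragraph above.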

1.2. Stage

Version 1.3 was released on August 29th 2001.
The format is implemented, to varying degrees, by most translation and localization tools.

1.3. Maintaining Organization

The OSCAR Special Interest Group at LISA (the Localisation Industry Standards Association).

1.4. Relevance to the W3C

Only partially relevant. One of the main common areas of interest is the definition of a proper set of identifiers for languages. Currently TMX uses xml:lang, but the consensus is that its values do not cover all necessary languages/locales (for example, Latin-American Spanish). OSCAR has a sub-committee on this topic.

2. TBX

TBX is the TermBase eXchange format. It is also known as DXLT, the Default XLT format, where XLT stands for XML representations of Lexicons and Terminologies.
Web site: http://www.ttt.org/oscar/xlt/DXLT.html.

2.1. Purpose

Allows the transfer of glossaries from one translation tool to another. The format is based on ISO 12200: MARTIF (Machine-Readable Terminology Interchange Format).

Example: TBXExample.xml (normally the file uses a .tbx extension).

2.2. Stage

Still at a draft stage, but well advanced.

2.3. Maintaining Organization

SALT (Standards-based Access service to multilingual Lexicons and Terminologies) at BYU.

2.4. Relevance to the W3C

Only partially relevant. TBX currently uses a lang attribute and plans to use xml:lang, but the consensus is that the xml:lang values do not cover all necessary languages/locales (for example, Latin-American Spanish).

3. OLIF

OLIF is the Open Lexicon Interchange Format.
Web site: http://www.olif.net.

3.1. Purpose

Allows the transfer of terminological and lexical data from one translation tool to another. This is close to the purpose of TBX, but OLIF is more geared toward NLP data (for example, Machine Translation lexicons). It is designed for six languages for now.

Example: OLIFExample.xml.

3.2. Stage

Version 2.0 is still at a draft stage, but well advanced.

3.3. Maintaining Organization

The OLIF Consortium. (Note: the OLIF Consortium and the SALT group collaborate closely.)

3.4. Relevance to the W3C

Only partially relevant. One of the main common areas of interest is the definition of a proper set of identifiers for languages. Currently OLIF uses a <language> element and a <geogUsage> element.

4. XLIFF

XLIFF is the XML Localisation Interchange File Format.
Web site: http://www.oasis-open.org/committees/xliff.

4.1. Purpose

Allows the transfer of localizable data, extracted from various original files, from one stage of the localization process to the next, up to merging the localized data back into its original format.

Example: XLIFFExample.xml (normally the file uses a .xlf extension).
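
As a rough illustration of the final merging step, the Python sketch below reads a small XLIFF-like document and produces the strings to write back into the original format, falling back to the source text where no translation exists yet. The sample content, the file name hello.properties, and the key=value output layout are assumptions made for this example. It is not the XLIFFExample.xml file referenced above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: one translated unit and one still untranslated.
SAMPLE_XLIFF = """<xliff version="1.0">
  <file original="hello.properties" source-language="en"
        target-language="fr" datatype="plaintext">
    <body>
      <trans-unit id="greeting">
        <source>Hello</source>
        <target>Bonjour</target>
      </trans-unit>
      <trans-unit id="farewell">
        <source>Goodbye</source>
      </trans-unit>
    </body>
  </file>
</xliff>
"""


def merged_strings(xliff_text):
    """Map each unit id to its target text, or to its source if untranslated."""
    root = ET.fromstring(xliff_text)
    result = {}
    for unit in root.iter("trans-unit"):
        source = unit.find("source")
        target = unit.find("target")
        # Compare with None explicitly: an element without children is falsy.
        chosen = target if target is not None else source
        if chosen is None:
            continue  # malformed unit with neither source nor target
        result[unit.get("id")] = "".join(chosen.itertext())
    return result


strings = merged_strings(SAMPLE_XLIFF)
# Write the result back in a simple key=value layout (assumed original format).
merged_file = "\n".join(f"{key}={value}" for key, value in strings.items())
print(merged_file)
```

A real merge would rebuild the original file from a skeleton or by re-parsing it. The key=value output here only stands in for that step.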

4.2. Stage

Version 1.0 is within days of being moved to Committee Specification status, and is then to be submitted as an OASIS Standard.

4.3. Maintaining Organization

The XLIFF Technical Committee at OASIS.

4.4. Relevance to the W3C

There are several common areas of interest:

  1. As for all other localization-related formats: definition of a proper set of identifiers for languages. Currently XLIFF uses xml:lang, source-language, and target-language, the latter two taking the same values as xml:lang.
  2. As XLIFF takes care of extracting localizable text, it is also interested in a standard rule file for defining which parts of a given XML format are translatable.
  3. In addition, XLIFF elements carry a number of properties associated with each translatable piece of text (for example maxwidth, maxheight, and maxbytes). It would be very advantageous to have a standard way of defining such properties, both for a given vocabulary (along with the rule file or as part of the schema) and within any XML document (as a standard set of attributes and elements belonging to a reserved namespace). Many of the XLIFF attributes should have a counterpart in this namespace.
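
To show how a tool might use one of the properties mentioned in item 3, here is a minimal Python sketch that flags translations longer than the maxwidth value of their unit. Treating maxwidth as a count of characters is an assumption made for this example, since the actual unit depends on the XLIFF specification and the file. The sample units are invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: only the body of an XLIFF-like file is shown.
SAMPLE_UNITS = """<body>
  <trans-unit id="ok_button" maxwidth="6">
    <source>OK</source>
    <target>OK</target>
  </trans-unit>
  <trans-unit id="cancel_button" maxwidth="6">
    <source>Cancel</source>
    <target>Annuler</target>
  </trans-unit>
  <trans-unit id="title">
    <source>Title</source>
    <target>Titre</target>
  </trans-unit>
</body>
"""


def too_wide(xml_text):
    """Return the ids of units whose target is longer than maxwidth.

    Assumption: maxwidth is interpreted as a number of characters.
    """
    root = ET.fromstring(xml_text)
    offenders = []
    for unit in root.iter("trans-unit"):
        limit = unit.get("maxwidth")
        target = unit.find("target")
        if limit is None or target is None:
            continue  # no constraint, or nothing translated yet
        text = "".join(target.itertext())
        if len(text) > int(limit):
            offenders.append(unit.get("id"))
    return offenders


offenders = too_wide(SAMPLE_UNITS)
print(offenders)
```

A standard way of declaring such properties, as the item above suggests, would let the same check run against any XML vocabulary rather than against XLIFF alone.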

5. Summary

There are several areas where localization-related formats currently need some kind of standardization that may be relevant for W3C work:

5.1. Language/Locale Identification

Need for a better mechanism to identify languages and/or locales. Several participants have expressed ideas on this topic.

5.2. Localization Properties for XML Formats

Need for a way to identify the localizable elements and attributes of an XML vocabulary. Several participants have expressed ideas on this topic.
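
Purely as an illustration, and not as one of the participants' proposals, the Python sketch below shows what such a set of rules could drive: a table naming the translatable elements and attributes of a made-up vocabulary, and a function that extracts the matching text. The vocabulary, the rule tables, and the function name are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules for a made-up vocabulary: elements whose text is
# translatable, and (element, attribute) pairs whose value is translatable.
TRANSLATABLE_ELEMENTS = {"title", "para"}
TRANSLATABLE_ATTRIBUTES = {("image", "alt")}

SAMPLE_DOC = """<doc>
  <title>User Guide</title>
  <code>print x</code>
  <para>Press the button.</para>
  <image src="button.png" alt="The button"/>
</doc>
"""


def extract_translatable(xml_text):
    """Return translatable strings in document order, according to the rules."""
    root = ET.fromstring(xml_text)
    found = []
    for elem in root.iter():
        # Element content: only the leading text is considered in this sketch.
        if elem.tag in TRANSLATABLE_ELEMENTS and elem.text and elem.text.strip():
            found.append(elem.text.strip())
        # Attribute values named by the rules.
        for name, value in elem.attrib.items():
            if (elem.tag, name) in TRANSLATABLE_ATTRIBUTES:
                found.append(value)
    return found


extracted = extract_translatable(SAMPLE_DOC)
print(extracted)
```

With a standard rule format, the two tables would be read from a file shipped with the vocabulary rather than hard-coded in each tool.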

5.3. Localization Namespace

Need for a common way to provide additional localization-specific information within XML documents. Several participants have expressed ideas on this topic.