ITS WG Collaborative editing page

Author: Christian Lieske

Requirements related to Purpose specification/mapping

Summary

[R008] Currently, it does not appear realistic to expect that all XML vocabularies tag localization-relevant information identically (e.g. that all use the "term" tag for terms). One way to handle diverse localization-relevant markup in localization environments is a mapping mechanism that maps such markup onto a canonical representation (such as the Internationalization Tag Set).

Challenges

From a localization point of view, many XML vocabularies include markup which requires special attention, since the markup is associated with a specific type of content. Examples:

elements which are associated with embedded/binary graphics
elements which are associated with specific text styles (e.g. underline and bold)
elements which are associated with linking (e.g. <a> in HTML)
elements which are associated with lists
elements which are associated with tables
elements which are associated with generated content (e.g. an element that fires a query to a database in order to pull in the data for a product catalogue)

Here are some reasons why this type of markup may require special attention:

the localization tool may be able to render specific text styles in a standard way (e.g. increased font weight for bold)
embedded binary images may have to follow a specific workflow
content generation queries may have to be adapted

Since it is hardly imaginable that all content developers will be able to work with the same elements and attributes for this specific type of content, the ITS should include markup which allows people to specify the purpose of specific elements.

Challenges arise, for example, from the fact that the 'source/original' vocabularies may vary widely with regard to the representation they choose for a specific data category (e.g. their markup related to graphics; see the longer discussion of this).

Notes

This requirement is related to the "Section 3.15: Limited Impact" requirement.

For the specific case of linking, an existing approach is worth looking at: HLink [HLink].

The approach may be used to support term identification. Suppose that an original document has the following:

Markup to map

You can define multiple computation IDs for one company in the <index sortstr="currency restatement">Currency Restatement</index> program.

If you want the <index> element to serve as an ITS "term", you could use the following mapping:

Mapping

<purposeSpec>
 <servesPurpose origVoc="index" its="term"/>
</purposeSpec>
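
As an illustration of how a localization tool might consume such a mapping, here is a minimal Python sketch. It is not part of any ITS specification: the function names load_purpose_map and find_by_purpose are hypothetical, and the <p> wrapper around the sample sentence is an assumption added only so that the fragment parses as well-formed XML.

```python
import xml.etree.ElementTree as ET

# The mapping from the example above.
MAPPING = """
<purposeSpec>
 <servesPurpose origVoc="index" its="term"/>
</purposeSpec>
"""

# Assumption: the sample sentence is wrapped in <p> to make it well-formed.
DOCUMENT = """
<p>You can define multiple computation IDs for one company in the
<index sortstr="currency restatement">Currency Restatement</index> program.</p>
"""


def load_purpose_map(mapping_xml):
    """Return a dict from original element name to ITS purpose."""
    root = ET.fromstring(mapping_xml.strip())
    return {
        rule.get("origVoc"): rule.get("its")
        for rule in root.iter("servesPurpose")
    }


def find_by_purpose(document_xml, purpose_map, purpose):
    """Return (element name, text) pairs for elements mapped to `purpose`."""
    root = ET.fromstring(document_xml.strip())
    return [
        (elem.tag, "".join(elem.itertext()))
        for elem in root.iter()
        if purpose_map.get(elem.tag) == purpose
    ]


purpose_map = load_purpose_map(MAPPING)
terms = find_by_purpose(DOCUMENT, purpose_map, "term")
print(terms)  # [('index', 'Currency Restatement')]
```

Note that this sketch looks only at element names; the sortstr attribute is ignored.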

One question to answer is: How can existing attributes (e.g. sortstr in the sample above) be carried over, or how can new attributes (like partOfSpeech, termType) be introduced?

Quick Guideline Thoughts

The purpose specification could look like the following example:

<its:purposeSpec>
 <servesPurpose origVoc="img" its="graphic"/>
</its:purposeSpec>

Here, we specify that the original vocabulary (e.g. HTML) maps to ITS. For example, "img" in HTML maps to "graphic" in ITS.

A specific application scenario for this requirement is a 1:1 mapping from a vocabulary in one natural language (e.g. English) to a vocabulary in a target language (e.g. Japanese). For highly technical vocabularies (e.g. in business documents), such a mapping can ease the task of editing a document in the target language.
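
The 1:1 scenario could be sketched roughly as follows in Python. The element names (invoice, customer, amount), their Japanese counterparts, and the function translate_vocabulary are hypothetical and chosen only for illustration; a real mapping would come from the vocabularies in question.

```python
import xml.etree.ElementTree as ET

# Hypothetical 1:1 mapping from English element names to Japanese ones.
EN_TO_JA = {"invoice": "請求書", "customer": "顧客", "amount": "金額"}


def translate_vocabulary(xml_text, name_map):
    """Rename every element that has an entry in name_map; keep the rest."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        elem.tag = name_map.get(elem.tag, elem.tag)
    # encoding="unicode" returns a str rather than bytes.
    return ET.tostring(root, encoding="unicode")


source = "<invoice><customer>ACME</customer><amount>100</amount></invoice>"
translated = translate_vocabulary(source, EN_TO_JA)
print(translated)
```

Because the mapping is 1:1, inverting the dictionary is enough to convert a document back to the original vocabulary.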

Useful input for specifying the mapping description may be found in [1], which is part of an ISO effort to standardize various aspects of validation and XML processing in general; see [2] for more information.