ITS WG Collaborative editing page

Status: Working Draft

Author: Masaki Itagaki

Term Identification


[R007] It should be possible to identify terms inside an element or a span and to provide data for terminology management and index generation. Terms should be either associated with attributes for related term information or linked to external terminology data.


The capability of specifying terms within the source content is important for terminology management that is beneficial to translation/localization quality. Terms to be identified include any domain-specific words and abbreviations for which translators need additional information in order to find appropriate concepts in their target languages. Term identification also facilitates the creation of glossaries and allows validation of terminology usage in the source and target documents.

Identified terms can also be used for index generation, which may require language-specific information. For example, Japanese words are sorted not by the characters in which they are written, but by their phonetic reading. Therefore, when a Japanese index item is created, it should be accompanied by a phonetic string, called Yomigana.
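The indexing case above can be sketched in a few lines of Python. This is only an illustration: the `IndexEntry` type, the `sort_index` function, and the sample entries are hypothetical, and sorting hiragana readings by code point is a rough approximation of Japanese dictionary order (a production index would use a proper collation).

```python
from typing import List, NamedTuple


class IndexEntry(NamedTuple):
    """A hypothetical index item: the written term plus its phonetic reading."""
    term: str      # written form, e.g. in kanji
    yomigana: str  # phonetic reading in hiragana


def sort_index(entries: List[IndexEntry]) -> List[IndexEntry]:
    """Order index entries by reading rather than by written form.

    Assumption: readings are plain hiragana, so code-point order is
    close enough to phonetic order for this illustration.
    """
    return sorted(entries, key=lambda entry: entry.yomigana)


# Sample data (illustrative only).
ENTRIES = [
    IndexEntry("翻訳", "ほんやく"),
    IndexEntry("用語", "ようご"),
    IndexEntry("索引", "さくいん"),
    IndexEntry("漢字", "かんじ"),
]

if __name__ == "__main__":
    for entry in sort_index(ENTRIES):
        print(entry.yomigana, entry.term)
```

Without the reading attached to each term, the sort key would have to be derived from the written form, which does not reflect how the word is pronounced.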

As a result, terms may require various attributes, such as part of speech, gender, number, term type, definition, and notes on usage. To avoid repeating such a large amount of attribute data within a document, it should be possible for identified terms to link to externalized attribute data, such as glossary documents and terminology databases.
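As a rough sketch of how such a link might be resolved, the Python below looks up each marked term in a separate glossary by a shared identifier. The `term` and `termEntry` element names and the `cID` attribute follow the example markup used on this page; the `glossary` wrapper element, the `pos` child, and the function names are assumptions made for illustration and are not part of any specification.

```python
import xml.etree.ElementTree as ET

# Source content with one identified term (abbreviated, illustrative).
SOURCE = (
    '<source>Our guests can explore our small '
    '<term cID="1234ABE34FE">archipelago</term>.</source>'
)

# External term data; the <glossary> wrapper and <pos> are hypothetical.
GLOSSARY = """
<glossary>
  <termEntry cID="1234ABE34FE">
    <def>Group of islands</def>
    <pos>noun</pos>
  </termEntry>
</glossary>
"""


def load_glossary(xml_text):
    """Map each termEntry's cID to a dict of its child element values."""
    root = ET.fromstring(xml_text)
    return {
        entry.get("cID"): {child.tag: (child.text or "").strip() for child in entry}
        for entry in root.iter("termEntry")
    }


def resolve_terms(source_xml, glossary):
    """Return (term text, external data or None) for every marked term."""
    root = ET.fromstring(source_xml)
    return [(term.text, glossary.get(term.get("cID"))) for term in root.iter("term")]


if __name__ == "__main__":
    for text, data in resolve_terms(SOURCE, load_glossary(GLOSSARY)):
        print(text, data)
```

Because the source carries only the identifier, the attribute data is stored once in the glossary no matter how often the term occurs in the document.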


For more details, please see discussions on term links at OASIS/XLIFF[1].

The OSCAR/TBX working group is currently drafting the TBX-Link specification [TBX-Link][2].

This requirement is related to the "[3]" usage scenario.

Quick Guidelines

An example of externalization of term attributes:

<source>Our guests can appease their spirit of adventure and itchy feet
by exploring the various islands of
our small <term cID="1234ABE34FE">archipelago</term>.</source>

The term, archipelago, could link to the following external data:

<termEntry cID="1234ABE34FE">
   <def>Group of islands</def>
   <pronunciation>"är-k&amp;-'pe-l&amp;-"gO, "är-ch&amp;-</pronunciation>