ITS WG Collaborative editing page

Author: Tim Foster Type: Spec, requires scope

Span-like Element

Summary

[R002] A span-like element is required to allow authors to mark sections of text that may have special properties from a localization and internationalization point of view.

Challenges

Given a section of XML text, the original markup often carries insufficient information to determine exactly how the contents should be handled from a localization and internationalization point of view. Adding span-like elements to the markup at the authoring stage would allow this information to be passed on to localization processes (either human or machine-assisted).

For example, span-like elements could be used to mark sections of text that need to be translated by a domain expert (as with source code fragments), or to mark those that need special terminology in order to be properly translated. A span-like element can also help translation tools determine where to apply sentence breaks, and can assist metrics-calculating algorithms.

A span-like element is also useful for marking language information in source files, which translation tools can use to determine which translation process to apply to each section of text. For example, a Latin quotation in a section of English text is often intended to be left in Latin in the translated version of the English text. Other uses are foreseen within the scope of the ITS.
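
As a rough illustration of the language-marking case, the sketch below walks a marked-up sentence and reports the language in effect for each run of text. The sample sentence, the use of xml:lang on a span element, and the helper name runs_by_language are all assumptions made for this example; they are not part of the requirement.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample: English text containing a Latin quotation.
SAMPLE = ('<p xml:lang="en">As the saying goes, '
          '<span xml:lang="la">carpe diem</span>, so start today.</p>')

def runs_by_language(elem, lang=None):
    """Yield (language, text) pairs in document order."""
    # An xml:lang attribute on this element overrides the inherited value.
    lang = elem.get(XML_LANG, lang)
    if elem.text:
        yield lang, elem.text
    for child in elem:
        yield from runs_by_language(child, lang)
        # Tail text sits outside the child, so it uses this element's language.
        if child.tail:
            yield lang, child.tail

for lang, text in runs_by_language(ET.fromstring(SAMPLE)):
    # Assumed policy for this sketch: only English runs are sent to translation.
    action = "translate" if lang == "en" else "leave as-is"
    print(f"{lang}: {action}: {text!r}")
```

A real tool would choose its process per language from configuration rather than the hard-coded rule used here.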

One example would be the following sentence, which contains some source code that we would like to treat specially during translation:

Text with portion of source code

The Java statement System.out.println("Hello World!"); prints the text "Hello World!" to standard output.

Here, we would like to put a span-like element around the source code fragment to indicate that it is not standard text for translation and should be translated by someone familiar with the Java programming language. Also, translation tools that perform sentence segmentation should treat the exclamation points in this sample text carefully.

While the <code> tag in XHTML could be used to mark up this text (in an XHTML document), it is often not specific enough for translators: it does not tell the translator what sort of source code is contained inside the tag, nor does it mark which portions of the code contents are translatable.

One possible usage of a span-like element is the following:

Text with marked-up source code

The Java statement <code><span trans="no">System.out.println("</span>Hello World!<span trans="no">");</span></code> prints the text "Hello World!" to standard output.
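
To show how a tool might consume markup of this kind, here is a minimal sketch that splits the sentence into translatable and non-translatable segments. It assumes the trans="no" attribute suggested above, an enclosing p element added only so the fragment parses, and a hypothetical helper named segments; none of these are settled names.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup modelled on the example above; the enclosing <p>
# element is an assumption so the fragment has a single root.
SAMPLE = (
    '<p>The Java statement <code><span trans="no">System.out.println("</span>'
    'Hello World!<span trans="no">");</span></code> prints the text '
    '"Hello World!" to standard output.</p>'
)

def segments(elem, translatable=True):
    """Yield (text, translatable) pairs in document order."""
    # A trans attribute on this element overrides the inherited value.
    flag = elem.get("trans")
    if flag is not None:
        translatable = (flag != "no")
    if elem.text:
        yield elem.text, translatable
    for child in elem:
        yield from segments(child, translatable)
        # Tail text belongs to the parent, so it uses the parent's flag.
        if child.tail:
            yield child.tail, translatable

for text, ok in segments(ET.fromstring(SAMPLE)):
    print(("TRANSLATE " if ok else "KEEP      ") + repr(text))
```

Only the string Hello World! inside the code fragment is reported as translatable; the surrounding Java syntax is kept as-is.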

An alternative to this sort of construction would be to put the translatable text in a separate document, and then refer to it using some form of linking mechanism, for example:

Source code with entity reference

<code>System.out.println("&java.code.example.text;");</code>
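
The following sketch shows one way the entity approach could work in practice: the translatable string lives in an entity declaration, and swapping the declaration changes the text without touching the code fragment. Declaring the entity in an internal DTD subset, the helper name build_document, and the French rendering are assumptions for illustration; the text above leaves the linking mechanism open.

```python
import xml.etree.ElementTree as ET

def build_document(translated_text):
    """Return the code fragment with the entity declared inline.

    Assumes translated_text contains no '"', '%' or '&' characters,
    which would need escaping inside an entity declaration.
    """
    return (
        '<!DOCTYPE code [\n'
        f'  <!ENTITY java.code.example.text "{translated_text}">\n'
        ']>\n'
        '<code>System.out.println("&java.code.example.text;");</code>'
    )

# The parser expands the entity, so each locale yields a complete statement.
for text in ("Hello World!", "Bonjour le monde !"):
    code = ET.fromstring(build_document(text))
    print(code.text)
```

In a real workflow the declarations would more likely sit in a separate per-locale file, so that translators never see the surrounding source code.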

Another example is shown below, where we have a piece of text that contains a file name which should also not be translated:

Text with non-translatable file name

The file /etc/passwd is a local source of information about users' accounts.

In this case, the filename /etc/passwd should not be translated, and we would like to add markup to indicate this.
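
One way a translation tool could honour such markup is to swap protected fragments for placeholders before translation and restore them afterwards. The sketch below assumes the file name is wrapped in a span with trans="no" (following the earlier suggestion), uses a regular expression instead of a full XML parser for brevity, and stands in a simple string replacement for the translation step; the helper names protect and restore are hypothetical.

```python
import re

# Hypothetical markup: the span element and trans attribute follow the
# suggestion made earlier in this section.
SOURCE = ('The file <span trans="no">/etc/passwd</span> is a local source '
          "of information about users' accounts.")

SPAN = re.compile(r'<span trans="no">(.*?)</span>')

def protect(text):
    """Replace non-translatable spans with numbered placeholders."""
    kept = []
    def stash(match):
        kept.append(match.group(1))
        return "{%d}" % (len(kept) - 1)
    return SPAN.sub(stash, text), kept

def restore(text, kept):
    """Put the protected fragments back after translation."""
    return re.sub(r"\{(\d+)\}", lambda m: kept[int(m.group(1))], text)

masked, kept = protect(SOURCE)
print(masked)
# Stand-in for a real translation step (partial, hypothetical rendering).
translated = masked.replace("The file", "Le fichier")
print(restore(translated, kept))
```

Because the translator only ever sees the placeholder, the file name cannot be altered by accident.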

These examples show that we aim to shift some of the responsibility for identifying translatable versus non-translatable content from the translation tool author to the content author or, at the very least, to recommend that content authors separate the translatable and non-translatable portions of text more clearly.

Notes

This requirement is related to other requirements, namely:

For the section "http://esw.w3.org/topic/its0504ReqPurposeSpecMap", we need to ensure that any related semantics in the target schema are also sufficient for translation. For example, stating that a <programlisting> element in DocBook is related to a <code> element in XHTML is useful, but neither element helps the translator determine which contents of <code> or <programlisting> are actually translatable.

A span-like element could be used in cases like these where specific text properties are identified.