Re-using Strings in Scripted Content

In this article we look at a particular design and development practice that can cause major problems for the translation of content. Many programmers and designers decide that if a particular string is used in many places, they will reference a single instance of the string rather than create many identical strings. The perceived advantages are saving memory, promoting consistency in the source and, sometimes, reducing translation cost.

String reuse is not necessarily a bad thing. It sometimes makes very good sense to ask the translator to translate a string once rather than 50 times. The trick is to know what constitutes a good candidate for reuse and what does not.

If you get it wrong, you can create an insuperable obstacle to good localization.

The problem

In the example below, the designer has decided to use the single English string On in three different locations on the user interface, rather than create three separate instances of the string.

A diagram showing the word 'On' being inserted into 3 panels containing the text 'Printer', 'Stacker', and 'Stapler options'.

In Spanish, whereas the printer would normally be encendida, that is, on-line, the stacker would be encendido and the stapler options would be activadas, that is, enabled.

If there is only a single available instance of the string expressing the English idea On, these variations cannot be expressed. It is therefore impossible to obtain a correct translation, and the quality of the Spanish user interface is seriously impaired.
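As a minimal sketch of the alternative, the catalog below gives each location its own string ID, so the three Spanish labels can differ even though the English labels are identical. The catalog layout, the string IDs and the `label` helper are hypothetical and chosen only for illustration; the Spanish words are the ones discussed above.

```python
# Hypothetical message catalog: one string ID per place the text appears.
CATALOG = {
    "en": {
        "printer.status.on": "On",
        "stacker.status.on": "On",
        "stapler_options.status.on": "On",
    },
    "es": {
        "printer.status.on": "Encendida",
        "stacker.status.on": "Encendido",
        "stapler_options.status.on": "Activadas",
    },
}


def label(locale: str, string_id: str) -> str:
    """Return the UI label for a string ID in the given locale."""
    return CATALOG[locale][string_id]


# Print each ID with its English and Spanish label side by side.
for string_id in CATALOG["en"]:
    print(string_id, "->", label("en", string_id), "/", label("es", string_id))
```

With three separate IDs the translator is free to choose a different form for each panel. With one shared ID, only one of the three Spanish forms could ever be displayed.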

These differences arise out of the way agreement and concept mappings change in translation, and cannot be avoided. The English word On can be translated into Spanish in at least 12 ways: activado, conectado, encendido, activada, conectada, encendida, activados, conectados, encendidos, activadas, conectadas and encendidas.

Examples of good and bad string re-use

The key to deciding whether or not it is appropriate to reuse a string is knowing whether or not the string will be used in different contexts.

In many languages an adjective like none usually agrees with its subject. Suppose, as an example, that the word none is used three times in the message text. If the three uses refer to shading, line and background color respectively, the word is used in three different contexts and so there should be three different strings. If, however, all three uses only ever refer to shading, there is only one context and the string is therefore a good candidate for reuse.
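One way to keep such contexts apart while still writing the same source text is to key each translation by context as well as by text, in the spirit of gettext's message contexts (Python's `gettext.pgettext(context, message)` serves the same purpose). The sketch below uses a plain dictionary rather than a real catalog file. The `FRENCH` table and the `translate` helper are hypothetical, and the French forms are illustrative only.

```python
# Hypothetical French catalog keyed by (context, source text).
# The translations are illustrative, not taken from a real product.
FRENCH = {
    ("shading", "None"): "Aucun",            # "ombrage" is masculine
    ("line", "None"): "Aucune",              # "ligne" is feminine
    ("background color", "None"): "Aucune",  # "couleur" is feminine
}


def translate(context: str, text: str, catalog: dict) -> str:
    """Look up text in the given context; fall back to the source text."""
    return catalog.get((context, text), text)


print(translate("shading", "None", FRENCH))  # Aucun
print(translate("line", "None", FRENCH))     # Aucune
```

Two contexts may end up with the same translation in one language, as line and background color do here, but that is a property of French and cannot be assumed for other languages.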

Bad candidates for reuse

The following examples are taken from a real product. The product team who developed a message set submitted to the localization group a list of strings they were thinking of reusing, and the group provided the following feedback.

Any word that will change its shape through agreement is a bad candidate — for example, all adjectives. The following strings in the message set just contained adjectives:

Many ordinary words are also subject to differences in translation according to the context because of different concept mappings between languages. Here are some examples of strings that were bad candidates for reuse:

Here are a few explanations of why there are issues with these unsuitable terms.

Time: In German, this can be expressed as Uhr for the current time, that is, nine o'clock, but as Dauer for a duration, that is, the file was downloading for nine hours.

Reset: This appears to be a technical word, but it is a special case. It has two translations in Dutch. The general translation for this term is Herstel. However, when talking about a System Reset, the appropriate translation is Herstarten or Herstart. This illustrates how useful it is to have the localization group review the strings you propose to reuse, since it would be difficult for a developer who doesn't speak the language to spot this.

Low: This will have different endings in French and German according to whether it refers to something singular or plural, and to something masculine, feminine or (in German) neuter. This is the case for most adjectives.

None: Don't forget that this also tends to agree with the subject in gender and number.

Insert before job is in progress: This has no object, that is, it doesn't state what you are inserting. In languages like Japanese and German the “what” may lead to different translations of the word insert.

Good candidates for re-use

Most technical words are reasonably safe candidates (but see the previous example of Reset).

Any self-contained phrase such as a complete sentence or a heading is likely to be safe.

Any word — even an adjective — is a good candidate as long as it is always used in exactly one context.

Here are some examples of good candidates for reuse from the same product review.

Cancel: Used 48 times, but always in this product to cancel a dialog box. (If it had been used to cancel an operation of some kind, an additional string would be needed.)

Open the front door: Used 16 times, but a self-contained sentence.

Misfed original: Used 14 times, but although original is an adjective, it always refers to the original document in this product.

Paper supply: Used 4 times, but sufficiently technical and specific to have a single translation.
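A review like the one above could be supported by a small audit that lists, for each source string, the contexts in which it is used. The sketch below assumes that such usage records are available as (string, context) pairs. The `USAGES` data and both helper functions are hypothetical.

```python
from collections import defaultdict

# Hypothetical usage records: (source string, context in which it appears).
USAGES = [
    ("Cancel", "dialog box"),
    ("Cancel", "dialog box"),
    ("On", "printer"),
    ("On", "stacker"),
    ("On", "stapler options"),
    ("Paper supply", "tray status"),
]


def contexts_by_string(usages):
    """Map each source string to the set of contexts it is used in."""
    contexts = defaultdict(set)
    for text, context in usages:
        contexts[text].add(context)
    return dict(contexts)


def needs_review(usages):
    """Return the strings used in more than one context, sorted by text."""
    found = contexts_by_string(usages)
    return sorted(text for text, ctx in found.items() if len(ctx) > 1)


print(needs_review(USAGES))  # ['On']
```

A flagged string is not automatically a bad candidate, since a self-contained sentence is likely to be safe in several contexts. The audit only narrows down what the localization group needs to look at.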

Recommendations

Strings should be reused where text is always used in exactly the same context, or where the string is a self-contained, independent sentence or phrase.

Reused strings must not refer to more than one text, graphic or conceptual context.

If in doubt as to whether a string is a good candidate for re-use, don't reuse it. Create separate strings instead. If you are in a position to ask localization experts for advice, that can often be helpful.

If re-used strings will be displayed in fixed-size display boxes of varying sizes, ensure that every translation will fit in the smallest display box. Remember, also, that text is likely to grow in translation from English and Chinese into other languages.
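The last recommendation can be checked mechanically. The sketch below compares each translation of a re-used string against the smallest box it has to fit in. It assumes, for simplicity, that box sizes are measured in characters, whereas a real layout would measure rendered width in the actual font. The `overflowing` function and the sample data are hypothetical.

```python
def overflowing(translations: dict, box_widths: list) -> dict:
    """Return, per locale, by how many characters a translation
    exceeds the smallest box it must fit in."""
    limit = min(box_widths)
    return {
        locale: len(text) - limit
        for locale, text in translations.items()
        if len(text) > limit
    }


# Hypothetical data: one re-used string shown in boxes of 8, 12 and 20 characters.
translations = {"en": "Cancel", "de": "Abbrechen", "fr": "Annuler"}
box_widths = [8, 12, 20]

print(overflowing(translations, box_widths))  # {'de': 1}
```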

By the way

String re-use is very commonly seen in association with composite messages. It tends to be a problem when variable substrings in a composite message are shared by more than one parent string.

Optimizing memory usage by reducing multiple instances of the same string to one can still be done for all strings, even adjectives, but it should be done after translation, not before. It should also be done on a language-by-language basis. This would mean, for example, that the word On is optimized to one string in English, but three or more strings in Spanish.
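A minimal sketch of such post-translation optimization is shown below. Each language's catalog is processed separately, identical texts are stored once, and every string ID keeps a reference into that pool. The `build_pool` function, the string IDs and the catalog contents are assumptions made for illustration.

```python
def build_pool(catalog: dict):
    """Collapse identical texts in one language's catalog into a shared pool.

    Returns (pool, refs): pool holds each distinct text once, and refs maps
    every string ID to the position of its text in the pool.
    """
    pool = []    # distinct texts, in first-seen order
    index = {}   # text -> position in pool
    refs = {}    # string ID -> position in pool
    for string_id, text in catalog.items():
        if text not in index:
            index[text] = len(pool)
            pool.append(text)
        refs[string_id] = index[text]
    return pool, refs


# Hypothetical translated catalogs, one per language.
english = {"printer.on": "On", "stacker.on": "On", "stapler_options.on": "On"}
spanish = {
    "printer.on": "Encendida",
    "stacker.on": "Encendido",
    "stapler_options.on": "Activadas",
}

en_pool, en_refs = build_pool(english)
es_pool, es_refs = build_pool(spanish)
print(len(en_pool), len(es_pool))  # 1 3
```

Because the pooling runs on the translated text, the string IDs stay distinct for the translator and the memory saving is whatever each language allows.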