Bidi Controls and Plain Strings on the Web

This page is in response to https://www.w3.org/International/track/actions/702

Introduction

In a recent thread on www-international@ and later in a teleconference, we discussed what the addition of the isolating bidi controls (U+2066 LRI, U+2067 RLI, U+2068 FSI, and U+2069 PDI) implies for strings on the Web. In particular, the question is whether externalized strings, such as those stored in a content management system (CMS) or in localized string tables, should routinely include the isolating bidi controls as part of the text. This page examines both possibilities.

Proposal: Recommend including isolating controls in external strings

A recommendation to include isolating bidi controls around externalized strings would be a recommendation to authors and authoring tools. Tools might produce the isolating controls when serializing the strings and check for the controls when the strings are later read, transferred, or accessed. The controls would then be part of the string when it is used at runtime.

The argument for including these controls is that the author of a string cannot see the context on the Web in which the string will be presented. While some strings are clearly standalone, many can be inserted into a larger string at display time. There may be no surrounding element, such as <span>, to provide isolation. If the surrounding string has the opposite base direction to the inserted string, or the context otherwise differs from what the author expected, the string may display incorrectly.
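The insertion scenario described above can be sketched as follows. This is a minimal illustration, not a recommended implementation: the helper names `isolate` and `insert_isolated` are hypothetical, and the template text is invented for the example. Only the code points of the four controls come from the Unicode Bidirectional Algorithm.

```python
# Unicode isolating bidi controls.
LRI = "\u2066"  # LEFT-TO-RIGHT ISOLATE
RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE
FSI = "\u2068"  # FIRST STRONG ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def isolate(text: str, direction: str = "auto") -> str:
    """Wrap text in a matched pair of isolating controls.

    direction is "ltr", "rtl", or "auto". With "auto", FSI is used, so the
    direction is taken from the first strong character of the text.
    """
    initiators = {"ltr": LRI, "rtl": RLI, "auto": FSI}
    return initiators[direction] + text + PDI


def insert_isolated(template: str, value: str) -> str:
    """Hypothetical helper: put value into the '{}' placeholder, isolated."""
    return template.format(isolate(value))


# A Hebrew word inserted into an English template. Without isolation, the
# digit that follows the inserted word may be reordered along with it.
message = insert_isolated("{} - 3 reviews", "\u05e9\u05dc\u05d5\u05dd")
```

Because the controls travel inside the string, the result can be handed to a plain-text context as well as to markup, which is the main point made in favor of this proposal.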


Pro

  • String can be presented in any context, including outside of a markup context, with appropriate bidi formatting
  • String can be inserted into any content without wrapping (isolating) markup

Con

  • Most strings are unidirectional, which makes the controls superfluous overhead
  • Controls add to the length of the string, which may limit the content that fits in length-constrained applications
  • Controls are paired, and the leading or trailing control might be lost during normal processing, such as truncation
  • Rendering of the controls is not uniformly supported, and they may be displayed as tofu (missing-glyph boxes)
  • Controls may interfere with naive text processing
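Two of the points above, added length and lost pairing, can be made concrete with a short sketch. The function name `isolates_balanced` is hypothetical, and treating an unpaired control as an error is an assumption of this example: the Unicode Bidirectional Algorithm itself tolerates unmatched controls, so a tool could reasonably choose a more lenient check.

```python
ISOLATE_INITIATORS = {"\u2066", "\u2067", "\u2068"}  # LRI, RLI, FSI
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def isolates_balanced(text: str) -> bool:
    """Return True if every isolate initiator has a matching PDI and vice versa."""
    depth = 0
    for ch in text:
        if ch in ISOLATE_INITIATORS:
            depth += 1
        elif ch == PDI:
            if depth == 0:
                return False  # a PDI with no open isolate before it
            depth -= 1
    return depth == 0  # False if an initiator was never closed


plain = "caf\u00e9"
wrapped = "\u2068" + plain + PDI

# Cost of one pair of controls, in code points and in UTF-8 bytes.
overhead_chars = len(wrapped) - len(plain)
overhead_bytes = len(wrapped.encode("utf-8")) - len(plain.encode("utf-8"))

# Naive truncation to four code points keeps the FSI but drops the PDI.
truncated = wrapped[:4]
still_paired = isolates_balanced(truncated)
```

Here the pair adds two code points (six bytes in UTF-8), and `still_paired` is `False` because the truncated string no longer ends with its closing control.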

Proposal: Do NOT recommend including isolating controls

The long-standing recommendation of the I18N WG has been to use markup for bidi, and considerable effort has gone into providing isolating semantics in HTML.

Pro

  • Directional metadata is preferred. Having both is overkill.
  • No extra data or pair matching required.
  • No length effects

Con

  • Dynamically created DOM elements, and strings inserted by a rendering engine or page script, cannot rely on the presence or absence of isolating markup
  • Not everything is HTML: strings used in plain text and other non-HTML contexts may need controls to help rendering

Recommendation

While some strings certainly benefit from isolating bidi controls, requiring them consistently may produce layers of overhead, processing, and validation that are unnecessary. There are clearly use cases where the controls are the best solution, but the list of negatives and the lack of clear benefit from consistently requiring them suggest that isolating controls should be provided automatically only for strings that are of ambiguous direction and whose display context is not known ahead of time. Even in these cases, the base direction should almost always be provided as metadata as well.
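One possible reading of this recommendation is sketched below. It is an illustration under stated assumptions, not a specification: the page does not define "ambiguous direction", so this example assumes it means a string that contains strong characters of both directions, or no strong characters at all. The function names and the `context_known` flag are hypothetical.

```python
import unicodedata

FSI = "\u2068"  # FIRST STRONG ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE

RTL_CLASSES = {"R", "AL"}  # strong right-to-left bidi classes


def strong_directions(text: str) -> set:
    """Collect the directions of the strongly directional characters in text."""
    found = set()
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            found.add("ltr")
        elif bidi_class in RTL_CLASSES:
            found.add("rtl")
    return found


def prepare_string(text: str, context_known: bool) -> dict:
    """Return the string with base-direction metadata.

    Controls are added only when the direction is ambiguous (an assumption:
    mixed or no strong characters) and the display context is unknown.
    """
    directions = strong_directions(text)
    ambiguous = len(directions) != 1
    base_dir = "auto" if ambiguous else next(iter(directions))
    if ambiguous and not context_known:
        text = FSI + text + PDI
    # Metadata is always supplied, whether or not controls were added.
    return {"text": text, "dir": base_dir}
```

With this sketch, a purely English or purely Hebrew string is returned unchanged with `dir` set to `ltr` or `rtl`, while a mixed string destined for an unknown context is wrapped in FSI and PDI and marked `auto`.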