
ITS WG Collaborative editing page

Follow the conventions for editing this page.

Status: Initial Draft, i.e. please focus on technical content rather than wordsmithing at this stage.

Author: Yves Savourel

Handling of White Spaces


It must be possible to specify, for the content of a given element, how white spaces are to be handled (i.e. whether they are to be preserved or can be collapsed).


YS--''' Here is a new try for a description of the issues. '''

Knowing whether the white spaces in a given element (especially the line-breaks) are collapsible or not is important for proper segmentation and matching when using computer-assisted translation tools.

There are three main types of wrapped text:

1. Text wrapped for no special reason:

<para>This is the first
sentence of the paragraph. It's followed
by a second sentence.</para>

2. Text where line-breaks can be segment-breaks:

<data name="CMD_USAGE">
 <value>Usage: po2xliff input[ options[ output]]
Where options are:
   -trg : create target entries
   -fill: fill the target entries with the source text</value>
</data>

3. Text intentionally pre-formatted for display constraints without regard for the linguistic aspects:

<print width="75">Copyright (C) 2005 Okapi Framework Developers

This library is free software; you can redistribute it and/or modify it
under the terms of the GNU Lesser General Public License as published by
the Free Software Foundation; either version 2.1 of the License, or (at
your option) any later version.

This library is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU Lesser
General Public License for more details.

You should have received a copy of the GNU Lesser General Public License
along with this library; if not, write to the Free Software Foundation,
Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307 USA</print>

  • Translation tools need to distinguish between the first example (white spaces can be collapsed safely) and the last two (white spaces should not be collapsed).
  • Translation tools also need a way to tell the last two examples apart (i.e. to know how line-breaks should be treated: as segment-breaks, or as display formatting only).
  • The indication of whether white spaces should be preserved or not should be available from the document itself, because information defined at the rendering level (e.g. in a CSS style sheet) may not be accessible to the translation tool.
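
As a rough illustration of the first two points, the sketch below applies two different treatments to the text of the first two examples above. The function names collapse and split_on_line_breaks are hypothetical, and the code only assumes that white space means the four XML white space characters; it is not meant as a description of how any particular translation tool works.

```python
import re

# XML white space is limited to space, tab, carriage return and line feed.
XML_WS = re.compile(r"[ \t\r\n]+")


def collapse(text: str) -> str:
    """Treat every run of white space, line-breaks included, as one space."""
    return XML_WS.sub(" ", text).strip(" ")


def split_on_line_breaks(text: str) -> list[str]:
    """Treat each non-empty line as its own segment, keeping inner spacing."""
    return [line for line in re.split(r"\r\n|\r|\n", text) if line.strip()]


para = "This is the first\nsentence of the paragraph. It's followed\nby a second sentence."
usage = (
    "Usage: po2xliff input[ options[ output]]\n"
    "Where options are:\n"
    "   -trg : create target entries\n"
    "   -fill: fill the target entries with the source text"
)

print(collapse(para))
for segment in split_on_line_breaks(usage):
    print(repr(segment))
```

Collapsing the first example yields a single run of text that a tool can then segment by sentence, whereas the second example yields four segments whose leading spaces are kept. Nothing in the text itself tells the tool which treatment to apply, which is why the indication has to come from the document.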


There are cases where the white space handling can be overridden at the style sheet level only, bypassing the information within the XML document itself: CSS provides the 'white-space' property (see [1]).

CL I am not sure if we need to say something about the whitespace-related changes (NEL) in XML 1.1

[[CL Should we possibly go for a general requirement (stated in the guidelines) along the lines of "canonicalize your XML" (see [2]).]]

Quick Guidelines

The xml:space="preserve" attribute may provide a solution for some of these requirements at the document instance level.

YS--''' Not sure if it is important to be noted, but xml:space defines only "preserve" and "default", "default" not being necessarily "do-not-preserve". Do we have situations where "do-not-preserve" would be needed? '''
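
A minimal sketch of how a tool might resolve xml:space for each element is given below, assuming Python's standard xml.etree.ElementTree parser. The helper name space_modes and the sample document are hypothetical. The value applies to an element and its descendants until a nested declaration overrides it, and "default" only means that the application's own default processing applies, which is not necessarily collapsing.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the reserved xml: prefix under this namespace URI.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"


def space_modes(root, inherited="default"):
    """Yield (element, mode) pairs, where mode is "preserve" or "default".

    The nearest xml:space declaration on the element or an ancestor wins.
    """
    mode = root.get(XML_SPACE, inherited)
    yield root, mode
    for child in root:
        yield from space_modes(child, mode)


# Hypothetical sample: only <print> asks for preservation, and <note> opts out again.
doc = ET.fromstring(
    "<doc>"
    "<para>This is the first\nsentence.</para>"
    '<print xml:space="preserve">Line one\nLine two'
    '<note xml:space="default">back to\nthe default</note>'
    "</print>"
    "</doc>"
)

modes = {elem.tag: mode for elem, mode in space_modes(doc)}
print(modes)
```

Note that this only reports the declared intent. What the tool does with content in "default" mode remains its own decision, which is the gap raised in the comment above.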

The whiteSpace constraint defined in the XML Schema Part 2: Datatypes Second Edition may provide a solution for these requirements at the schema level.
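
For reference, the three values of the whiteSpace facet (preserve, replace, collapse) can be sketched as follows. The function name apply_white_space is hypothetical and the code is only an illustration of the normalization rules as described in the Datatypes specification, not a schema processor. Note that the facet applies to simple-type values.

```python
import re


def apply_white_space(value: str, mode: str) -> str:
    """Normalize value per the XML Schema whiteSpace facet."""
    if mode == "preserve":
        # preserve: the value is left untouched.
        return value
    # replace: each tab, line feed and carriage return becomes one space.
    replaced = re.sub(r"[\t\n\r]", " ", value)
    if mode == "replace":
        return replaced
    if mode == "collapse":
        # collapse: after replace, runs of spaces shrink to one and the ends are trimmed.
        return re.sub(r" +", " ", replaced).strip(" ")
    raise ValueError(f"unknown whiteSpace value: {mode!r}")


sample = "  Usage:\tpo2xliff\n   -trg "
for mode in ("preserve", "replace", "collapse"):
    print(mode, repr(apply_white_space(sample, mode)))
```

None of the three values treats a line-break as a segment-break, so the facet can separate the first example from the other two but cannot, on its own, tell the second from the third.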

[[ GS-- Keep at least two white spaces as a default for target Indian languages (e.g. Bangla,
        Hindi, etc.) in order to make two words visible (i.e., separated by space). ]]