ITS WG Collaborative editing page
Status: Initial Draft, i.e. please focus on technical content rather than wordsmithing at this stage.
Author: Yves Savourel
Handling of White Spaces
It must be possible to specify, for the content of a given element, how white spaces are to be handled (i.e. whether they are to be preserved or may be collapsed).
Knowing whether the white spaces in a given element (especially the line-breaks) are collapsible or not is important for proper segmentation and matching when using computer-assisted translation tools.
There are three main types of wrapped text:
1. Text wrapped for no special reason:
<para>This is the first sentence of the paragraph. It's followed by a second sentence.</para>
2. Text where line-breaks can be segment-breaks:
<data name="CMD_USAGE">
 <value>Usage: po2xliff input[ options[ output]]
Where options are:
 -trg : create target entries
 -fill: fill the target entries with the source text</value>
</data>
3. Text intentionally pre-formatted for display constraints without regard for the linguistic aspects:
<print width="75">Copyright (C) 2005 Okapi Framework Developers
This library is free software; you can redistribute it and/or
modify it under the terms of the GNU Lesser General Public
License as published by the Free Software Foundation; either
version 2.1 of the License, or (at your option) any later version.
This library is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public
License along with this library; if not, write to the Free Software
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA</print>
- It is important for translation tools to distinguish between the first example (white space can be collapsed safely) and the last two (white space should not be collapsed).
- It is important for translation tools to have a way to tell the last two examples apart (i.e. to know how line-breaks should be treated).
- The indication of whether white space should be preserved should be available in the document itself, because information defined at the rendering level (e.g. in a CSS style sheet) may not be accessible to the translation tool.
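As a rough illustration of why the distinction matters to a tool, the following Python sketch treats the three kinds of text differently. The function name `prepare_for_translation` and the mode labels (`collapse`, `line-segments`, `preformatted`) are hypothetical, invented for this example only; they are not taken from any specification or existing tool.

```python
import re


def prepare_for_translation(text, mode):
    """Return a list of translatable units for `text`.

    `mode` is a hypothetical label for the three cases described above:
      "collapse"      - wrapping is incidental, white space can be collapsed
      "line-segments" - each line-break may be a segment-break
      "preformatted"  - layout is deliberate, the text is kept untouched
    """
    if mode == "collapse":
        # Runs of spaces, tabs, and line-breaks become a single space.
        return [re.sub(r"\s+", " ", text).strip()]
    if mode == "line-segments":
        # Each non-empty line is its own unit; inner spacing is kept.
        return [line for line in text.splitlines() if line.strip()]
    if mode == "preformatted":
        return [text]
    raise ValueError(f"unknown mode: {mode!r}")


usage = (
    "Usage: po2xliff input[ options[ output]]\n"
    "Where options are:\n"
    " -trg : create target entries"
)
for unit in prepare_for_translation(usage, "line-segments"):
    print(repr(unit))
```

Without an indication in the document, a tool cannot know which of the three branches to take, which is the point of the requirement.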
There are cases where the white space handling can be overridden at the style sheet level only, bypassing information within the XML document itself: CSS provides the 'white-space' property (See ).
[[CL Should we possibly go for a general requirement (stated in the guidelines) along the lines of "canonicalize your XML" (see ).]]
The xml:space="preserve" attribute may provide a solution for some of these requirements at the document instance level.
[[YS-- Not sure if this is important to note, but xml:space defines only "preserve" and "default", and "default" does not necessarily mean "do-not-preserve". Do we have situations where "do-not-preserve" would be needed? ]]
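To make the xml:space behaviour concrete, here is a minimal Python sketch of how a tool might work out which value is in effect for each element, given that a value declared on an element also applies to its descendants until overridden. The helper `collect_space_modes` and the sample document are hypothetical. Treating elements with no xml:space in scope as "default" is an assumption made for this sketch, not something XML mandates.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the reserved "xml:" prefix to this namespace URI.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"


def collect_space_modes(element, inherited="default", out=None):
    """List (tag, xml:space value in effect) for an element and its descendants.

    Assumption for this sketch: when no xml:space attribute is in scope,
    the element is reported as "default".
    """
    if out is None:
        out = []
    mode = element.get(XML_SPACE, inherited)
    out.append((element.tag, mode))
    for child in element:
        # Children inherit the value in effect on their parent.
        collect_space_modes(child, mode, out)
    return out


doc = ET.fromstring(
    "<doc>"
    "<para>Wrapped text</para>"
    '<print xml:space="preserve">Laid out\n text <b>bold</b></print>'
    "</doc>"
)
for tag, mode in collect_space_modes(doc):
    print(tag, mode)
```

Note that the sketch can only report "preserve" or "default"; it has no way to learn that white space is definitely collapsible, which is the gap raised in the comment above.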
The whiteSpace constraint defined in the XML Schema Part 2: Datatypes Second Edition may provide a solution for these requirements at the schema level.
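For reference, the whiteSpace facet has three values: "preserve", "replace", and "collapse". The Python sketch below mimics the normalization each value describes; the function name `apply_white_space` is hypothetical, and the code is an illustration of the rules rather than a schema processor.

```python
import re


def apply_white_space(value, facet):
    """Normalize `value` the way the XML Schema whiteSpace facet describes."""
    if facet == "preserve":
        # No normalization is done.
        return value
    # "replace": each tab, line feed, and carriage return becomes a space.
    replaced = re.sub(r"[\t\n\r]", " ", value)
    if facet == "replace":
        return replaced
    if facet == "collapse":
        # After "replace", runs of spaces become one space and
        # leading and trailing spaces are removed.
        return re.sub(r" +", " ", replaced).strip(" ")
    raise ValueError(f"unknown whiteSpace value: {facet!r}")


sample = " -trg :\tcreate\n target  entries "
for facet in ("preserve", "replace", "collapse"):
    print(facet, repr(apply_white_space(sample, facet)))
```

Unlike xml:space, the "collapse" value gives an explicit "do-not-preserve" signal, but only to tools that have access to the schema.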
[[ GS-- Keep at least two whiteSpaces as a default for target Indian languages (e.g. Bangla, Hindi, etc.) in order to keep two words visibly distinct (i.e., separated by a space). ]]