ACTION-124: Write an appropriately reworded email to ULI expressing concern about the SEGMENTATION character


State:
closed
Person:
Felix Sasaki
Due on:
May 16, 2012
Created on:
May 9, 2012
Related emails:
  1. [minutes] Internationalization Core WG telecon 2012-05-30 (from ishida@w3.org on 2012-05-31)

Related notes:

Mail below sent 17 May.

=================
Dear Helena, all,

this is a follow-up on the "segmentation character proposal" discussion we had a while ago. Arle Lommel may have told you already that he and I have talked about this topic. Arle also opened a related issue within the W3C MultilingualWeb-LT Working Group; see

http://lists.w3.org/Archives/Public/public-multilingualweb-lt/2012May/0081.html

I assume that you are aware of this discussion, so this mail is just to let you know that the i18n Core WG would like to be involved before the proposal is finalized, since the group has previously discussed issues in the area of "characters vs. markup".

I am not sure how best to arrange this involvement; one approach might be to ping i18n Core once ULI has reached consensus on a proposal. We don't want to interfere with your discussion, so just let me know what works for you.

Regards,

Felix

2011/10/14 Felix Sasaki <felix.sasaki@dfki.de>
Dear Lisa and Helena,

recently the W3C i18n Core Working Group was made aware of a proposal [1] to add a SEGMENTATION MARKER character to the General Punctuation block of Unicode.
We think that this proposal should not be adopted, for several reasons.

(1) As Unicode TR20 [2] points out, there are many characters that are not suitable for use with markup; [3] provides an overview. We think that the function of expressing segmentation information falls into the same category as such characters, e.g. those related to language information, ruby, or bidi embedding controls.

(2) Introducing a segmentation marker creates problems similar to those caused by bidi embedding controls. Processes that modify the text without being aware of the special role of such markers may inadvertently create unexpected side effects.

(3) The proposal for a segmentation marker character argues that the mechanism is needed for plain text. Citing from [1]: "While there are markup-based solutions, none of these are widely used and such methods are not supported for plain-text solutions." We believe that rather than introducing a plain-text solution that creates issues for users and tooling, as bidi embedding controls do (see above), the ULI TC should encourage the use of markup-based solutions. We are concerned that introducing the new character would break more existing tools than promoting the use of existing markup would.

(4) TR20 lists at [3] existing markup such as <xhtml:br /> and <xhtml:p></xhtml:p>. The xhtml namespace and the use of XML here are only examples; similar mechanisms are available for other formats. See SSML [4] for an example. As with [4], these mechanisms are not meant as general segmentation markers but have their own specific processing requirements, such as display in HTML or boundary control in speech synthesis. Nevertheless, they demonstrate the feasibility of markup-based solutions.

(5) If only a character-based solution, as opposed to a markup-based solution, can satisfy the requirement, an update of UAX29 should be considered: the Unicode repertoire should be searched carefully for an existing character that could be used as a segmentation marker, in preference to inventing a new character for that purpose.

On behalf of the i18n Core Working Group,

Felix

[1] http://dl.dropbox.com/u/223919/uli/SEGMENTATION-MARKER-proposal.pdf
[2] http://unicode.org/reports/tr20/
[3] http://unicode.org/reports/tr20/#Charlist
[4] http://www.w3.org/TR/speech-synthesis/#S3.2.3


--
Felix Sasaki
DFKI / W3C Fellow
=================

Felix Sasaki, 17 May 2012, 07:01:28

Done on 17 May.

Felix Sasaki, 18 May 2012, 15:50:41

[aphillip]: some confusion on other end?

30 May 2012, 15:09:11



Addison Phillips <addisonI18N@gmail.com>, Chair, Richard Ishida <ishida@w3.org>, Bert Bos <bert@w3.org>, Fuqiao Xue <xfq@w3.org>, Atsushi Shimono <atsushi@w3.org>, Staff Contacts