ISSUE-496: note about character counts ⓟ

State: CLOSED
Product: find-text
Raised by: Addison Phillips
Opened on: 2015-10-15
Description:
http://www.w3.org/TR/2015/WD-findtext-20151015/#introduction

In the introduction there is this note:

--
For character counts in ranges, what exactly would be counted as a character? Unicode code points? Graphemes?
--

This is an important distinction. I would suggest that the most commonly needed offset will be counted in either (a) Unicode code points or (b) UTF-16 code units.

The latter would be best for JavaScript and DOM access, which are based on UTF-16 and thus would allow direct application of APIs such as substring().

The former would be better from a pure API perspective and for computing things such as string length in "characters".
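
To make the difference between the two candidate units concrete, here is a minimal TypeScript sketch. The helper names (`utf16Length`, `codePointLength`, `codePointOffsetToUtf16`) and the sample string are hypothetical and are not taken from the find-text draft; they only illustrate how an offset counted in code points would have to be converted before it could be passed to substring().

```typescript
// Hypothetical helpers for illustration only; none of these names come from the draft.

// Length in UTF-16 code units: this is what String.prototype.length reports.
function utf16Length(text: string): number {
  return text.length;
}

// Length in Unicode code points: string iteration walks code point by code point.
function codePointLength(text: string): number {
  return Array.from(text).length;
}

// Convert an offset counted in code points into an offset counted in UTF-16 code units.
function codePointOffsetToUtf16(text: string, codePointOffset: number): number {
  let codeUnits = 0;
  let codePointsSeen = 0;
  for (const ch of text) {
    if (codePointsSeen === codePointOffset) {
      break;
    }
    codeUnits += ch.length; // 1 for BMP characters, 2 for supplementary characters
    codePointsSeen += 1;
  }
  return codeUnits;
}

// "a", then U+1F600 (outside the BMP, so two UTF-16 code units), then "b".
const sample = "a\u{1F600}b";

// The same position ("just before b") is offset 2 in code points but 3 in code units.
const unitOffset = codePointOffsetToUtf16(sample, 2);
const tail = sample.substring(unitOffset);
```

With this sample, `utf16Length(sample)` is 4 while `codePointLength(sample)` is 3. Passing the code point offset 2 directly to substring() would cut between the two surrogates of U+1F600, which is the practical reason option (b) maps more directly onto JavaScript and DOM APIs.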

Grapheme clusters can be complex and, while APIs may wish to find grapheme boundaries or to avoid splitting within a grapheme, it is rarely the case that API access should be in these terms. Indeed, in some cases, it may be desirable to find text within a grapheme and not the entire thing.
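
As a rough illustration of finding text within a grapheme, the TypeScript sketch below uses a decomposed "é" (the letter "e" followed by U+0301 COMBINING ACUTE ACCENT). The variable names and the sample word are assumptions chosen for the example, not anything defined by the draft.

```typescript
// Illustrative values only; the names and the sample word are hypothetical.

// One user-perceived character (grapheme cluster) built from two code points:
// "e" followed by U+0301 COMBINING ACUTE ACCENT.
const cluster = "e\u0301";

// A word that ends in that cluster.
const word = "caf" + cluster;

// Counted in code points (or UTF-16 code units), the cluster has length 2.
const clusterCodePoints = Array.from(cluster).length;

// A plain search for the base letter matches inside the cluster, at offset 3.
const baseLetterOffset = word.indexOf("e");

// Offset 4 addresses the combining mark, a position strictly inside the cluster
// that a grapheme-based count could not express.
const combiningMark = word.substring(4);
```

Here the match for "e" is part of a grapheme rather than the whole of it, so an offset scheme based on grapheme clusters would have no way to point at it, whereas code point or code unit offsets can.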
Related Actions Items:
No related actions
Related emails:
  1. Weekly github digest (Tracker items) (from sysbot+gh@w3.org on 2021-03-03)
  2. Daily github digest (WG INTERNAL review issues) (from sysbot+gh@w3.org on 2021-02-26)
  3. I18N-ISSUE-496: note about character counts ⓟ [find-text] (from sysbot+tracker@w3.org on 2015-10-15)

Related notes:

https://github.com/w3c/i18n-activity/issues/102

Richard Ishida, 17 Mar 2016, 12:29:03


Addison Phillips <addisonI18N@gmail.com>, Chair, Richard Ishida <ishida@w3.org>, Bert Bos <bert@w3.org>, Fuqiao Xue <xfq@w3.org>, Atsushi Shimono <atsushi@w3.org>, Staff Contacts