APIs vs. physical string representations

This is a last call comment from Björn Höhrmann (bjoern@hoehrmann.de) on
the Character Model for the World Wide Web 1.0
(http://www.w3.org/TR/2002/WD-charmod-20020430/).

Semi-structured version of the comment:

Submitted by: Björn Höhrmann (bjoern@hoehrmann.de)
Submitted on behalf of (maybe empty): 
Comment type: other
Chapter/section the comment applies to: 4.4 Choice and Identification of Character Encodings
The comment will be visible to: public
Comment title: APIs vs. physical string representations
Comment:
4.4, Choice and Identification of Character Encodings:

[...]
  C016 [S] When designing a new protocol, format or API, specifications
  SHOULD mandate a unique character encoding.
[...]

This would only be a good thing if everyone adhered to this approach.
Consider the DOM: it requires UTF-16, yet many DOM implementations are
non-conforming simply because it made more sense for them to use UTF-8,
for example when the programming language uses UTF-8 for strings and
the DOM implementation is built on that string type. In fact, if the
language provides a Unicode string type, the internal storage should
not matter to an API specification. I am fine with this requirement for
protocols and content.
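
To illustrate the point about internal storage, here is a minimal sketch
of a string type that stores UTF-8 internally but still answers
DOM-style questions in UTF-16 code units. The class name
Utf8BackedString and its methods are hypothetical, chosen only for this
illustration; they come from neither the DOM nor any existing library.

```python
class Utf8BackedString:
    """Hypothetical string type: UTF-8 storage, UTF-16 code unit API."""

    def __init__(self, text: str) -> None:
        # Internal (physical) representation: UTF-8 bytes.
        self._storage = text.encode("utf-8")

    def _utf16_units(self) -> bytes:
        # Transcode on demand; a real implementation would likely cache
        # or index this instead of re-encoding on every call.
        return self._storage.decode("utf-8").encode("utf-16-le")

    def utf16_length(self) -> int:
        # Length in UTF-16 code units, as a DOM-style `length` reports it.
        return len(self._utf16_units()) // 2

    def utf16_code_unit_at(self, index: int) -> int:
        # Value of the UTF-16 code unit at the given index.
        units = self._utf16_units()
        if not 0 <= index < len(units) // 2:
            raise IndexError(index)
        return int.from_bytes(units[2 * index:2 * index + 2], "little")

    def storage_size(self) -> int:
        # Size of the internal representation in bytes.
        return len(self._storage)


if __name__ == "__main__":
    s = Utf8BackedString("Bj\u00f6rn \U0001D11E")
    print(s.utf16_length(), s.storage_size())
    print(hex(s.utf16_code_unit_at(6)), hex(s.utf16_code_unit_at(7)))
```

Under these assumptions the caller only ever sees UTF-16 code unit
semantics, so the choice of UTF-8 for storage is invisible at the API
level; the cost is the transcoding work on each call.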


Structured version of the comment:

<lc-comment
  visibility="public" status="pending"
  decision="pending" impact="pending" id="LC-">
  <originator email="bjoern@hoehrmann.de"
      >Björn Höhrmann</originator>
  <represents email=""
      >-</represents>
  <charmod-section href='http://www.w3.org/TR/2004/WD-charmod-20040225/#sec-Encodings'
    >4.4</charmod-section>
  <title>APIs vs. physical string representations</title>
  <description>
    <comment>
      <dated-link date="2004-04-08"
         href="http://www.w3.org/mid/598374775.20040408212915@toro.w3.mag.keio.ac.jp"
        >APIs vs. physical string representations</dated-link>
      <para>4.4, Choice and Identification of Character Encodings:

[...]
  C016 [S] When designing a new protocol, format or API, specifications
  SHOULD mandate a unique character encoding.
[...]

This would only be a good thing if everyone adhered to this approach.
Consider the DOM: it requires UTF-16, yet many DOM implementations are
non-conforming simply because it made more sense for them to use UTF-8,
for example when the programming language uses UTF-8 for strings and
the DOM implementation is built on that string type. In fact, if the
language provides a Unicode string type, the internal storage should
not matter to an API specification. I am fine with this requirement for
protocols and content.</para>
    </comment>
  </description>
</lc-comment>

Received on Thursday, 8 April 2004 17:29:17 UTC