Clarify "legacy encoding"

This is a last call comment from Björn Höhrmann (bjoern@hoehrmann.de) on
the Character Model for the World Wide Web 1.0
(http://www.w3.org/TR/2002/WD-charmod-20020430/).

Semi-structured version of the comment:

Submitted by: Björn Höhrmann (bjoern@hoehrmann.de)
Submitted on behalf of (may be empty):
Comment type: other
Chapter/section the comment applies to: 1.2 Background
The comment will be visible to: public
Comment title: Clarify "legacy encoding"
Comment:
Section 1.2, Background:

[...]
  It should be noted that such aspects also exist in legacy encodings
  (where legacy encoding is taken to mean any character encoding not
  based on Unicode), and in many cases have been inherited by Unicode
  in one way or another from such legacy encodings.
[...]

It is not clear to me what it means for an encoding to be based on Unicode. Is US-ASCII a legacy encoding (there is a complete mapping to Unicode, hence it appears to be based on Unicode)? Is UTF-7 a legacy encoding (a UTF is clearly based on Unicode, isn't it)? Or CESU-8? I would suggest defining, e.g., "Unicode Encoding" (or the existing "Unicode encoding form") to mean UTF-8/16/32 and "legacy encoding" to mean all other encodings.
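
To make the suggested definition concrete, here is a minimal sketch of how it would sort encoding labels. It is only an illustration: the set UNICODE_ENCODING_FORMS and the function classify are hypothetical names, and folding the -le/-be byte-order variants into their base form is an assumption the comment does not make. Label normalization relies on Python's codecs.lookup, so labels unknown to Python (CESU-8, for instance) are not covered.

```python
import codecs

# Hypothetical set: the encoding forms the comment proposes to call
# "Unicode encodings". Everything else counts as a legacy encoding.
UNICODE_ENCODING_FORMS = {"utf-8", "utf-16", "utf-32"}


def classify(label: str) -> str:
    """Classify an encoding label under the definition suggested above."""
    # codecs.lookup normalizes aliases, e.g. "UTF8" -> "utf-8".
    name = codecs.lookup(label).name
    # Assumption: byte-order variants belong to their base encoding form.
    for suffix in ("-le", "-be"):
        if name.endswith(suffix):
            name = name[: -len(suffix)]
    if name in UNICODE_ENCODING_FORMS:
        return "Unicode encoding"
    return "legacy encoding"


if __name__ == "__main__":
    for label in ("UTF-8", "utf-16-le", "US-ASCII", "UTF-7"):
        print(label, "->", classify(label))
```

Under this reading US-ASCII and UTF-7 both come out as legacy encodings, even though every US-ASCII byte maps to a Unicode code point and UTF-7 is a UTF by name. That is the ambiguity the wording "based on Unicode" leaves open.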


Structured version of the comment:

<lc-comment
  visibility="public" status="pending"
  decision="pending" impact="pending" id="LC-">
  <originator email="bjoern@hoehrmann.de"
      >Björn Höhrmann</originator>
  <represents email=""
      >-</represents>
  <charmod-section href='http://www.w3.org/TR/2004/WD-charmod-20040225/#sec-Background'
    >1.2</charmod-section>
  <title>Clarify &#x22;legacy encoding&#x22;</title>
  <description>
    <comment>
      <dated-link date="2004-04-08"
         href="http://www.w3.org/mid/858624148.20040408212607@toro.w3.mag.keio.ac.jp"
        >Clarify "legacy encoding"</dated-link>
      <para>Section 1.2, Background:

[...]
  It should be noted that such aspects also exist in legacy encodings
  (where legacy encoding is taken to mean any character encoding not
  based on Unicode), and in many cases have been inherited by Unicode
  in one way or another from such legacy encodings.
[...]

It is not clear to me what it means for an encoding to be based on Unicode. Is US-ASCII a legacy encoding (there is a complete mapping to Unicode, hence it appears to be based on Unicode)? Is UTF-7 a legacy encoding (a UTF is clearly based on Unicode, isn&#x27;t it)? Or CESU-8? I would suggest defining, e.g., &#x22;Unicode Encoding&#x22; (or the existing &#x22;Unicode encoding form&#x22;) to mean UTF-8/16/32 and &#x22;legacy encoding&#x22; to mean all other encodings.</para>
    </comment>
  </description>
</lc-comment>

Received on Thursday, 8 April 2004 17:26:10 UTC