Clarify C034 in case of heuristics

This is a last call comment from Björn Höhrmann (bjoern@hoehrmann.de) on
the Character Model for the World Wide Web 1.0
(http://www.w3.org/TR/2002/WD-charmod-20020430/).

Semi-structured version of the comment:

Submitted by: Björn Höhrmann (bjoern@hoehrmann.de)
Submitted on behalf of (maybe empty): 
Comment type: other
Chapter/section the comment applies to: 4.4.2 Character encoding identification
The comment will be visible to: public
Comment title: Clarify C034 in case of heuristics
Comment:
Section 4.4.2, Character encoding identification

[...]
  C034 [C] Content MUST make use of available facilities for character
  encoding identification by always indicating character encoding; where
  the facilities offered for character encoding identification include
  defaults (e.g. in XML 1.0 [XML 1.0]), relying on such defaults is
  sufficient to satisfy this identification requirement. 
[...]

This needs some clarification. Is this a requirement because otherwise
the implementation does not know the encoding of the content? What if
the specification requires heuristics; would content still be required
to include such information? For example, would a CSS 2.1 style sheet be
required to have either a charset parameter or the @charset rule (or
maybe a BOM) in order to conform to C034? CSS 2.1 has a default, but it
applies only if the style sheet is loaded without a referring resource
(editors or validators might do this, browsers typically do not [1]).
So it seems that in most cases style sheets would be required to have
charset/@charset. That would be the most reasonable outcome, but I think
there is not necessarily consensus to this effect in the CSS WG.

[1] This raises an interesting question: would a style sheet displayed
    by a "View Style Sheet Source" function in a browser be treated as
    having no referring document, and thus show different content than
    what was applied to the document? ...
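
To make the question concrete, here is a minimal Python sketch of the
kind of fallback chain discussed above. It is an illustration only, not
the CSS 2.1 algorithm as specified: the function names, the returned
labels, the restriction to UTF-8/UTF-16 BOMs, and the "utf-8" default
are all assumptions made for the example. It separates what the content
itself declares (charset parameter, BOM, @charset) from what is inferred
from a referring resource or a default.

```python
import codecs
import re
from typing import Optional, Tuple

# Hypothetical sketch; names and precedence are assumptions for
# illustration, not taken from the CSS 2.1 or charmod text.

# Only UTF-8 and UTF-16 BOMs are checked here, to keep the sketch short.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]

# Matches an ASCII-compatible @charset rule at the very start of the sheet.
_CHARSET_RULE = re.compile(rb'^@charset "([^"]*)";')


def declared_encoding(http_charset: Optional[str],
                      css_bytes: bytes) -> Optional[str]:
    """Return the encoding the content itself identifies, or None."""
    if http_charset:
        return http_charset
    for bom, name in _BOMS:
        if css_bytes.startswith(bom):
            return name
    match = _CHARSET_RULE.match(css_bytes)
    if match:
        return match.group(1).decode("ascii", "replace")
    return None


def effective_encoding(http_charset: Optional[str],
                       css_bytes: bytes,
                       referrer_encoding: Optional[str] = None,
                       default: str = "utf-8") -> Tuple[str, str]:
    """Return (encoding, source), where source is one of
    'declared', 'referrer', or 'default'."""
    explicit = declared_encoding(http_charset, css_bytes)
    if explicit:
        return explicit, "declared"
    if referrer_encoding:
        # Heuristic step: inherit from the referring document.
        return referrer_encoding, "referrer"
    return default, "default"
```

With the same undeclared bytes, `effective_encoding(None, b"body{}",
"shift_jis")` yields `("shift_jis", "referrer")`, while
`effective_encoding(None, b"body{}")` yields `("utf-8", "default")`.
That difference is the situation footnote [1] asks about, and the open
question for C034 is whether only the "declared" outcome, or also the
"default" one, counts as identifying the encoding.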


Structured version of the comment:

<lc-comment
  visibility="public" status="pending"
  decision="pending" impact="pending" id="LC-">
  <originator email="bjoern@hoehrmann.de"
      >Björn Höhrmann</originator>
  <represents email=""
      >-</represents>
  <charmod-section href='http://www.w3.org/TR/2004/WD-charmod-20040225/#sec-EncodingIdent'
    >4.4.2</charmod-section>
  <title>Clarify C034 in case of heuristics</title>
  <description>
    <comment>
      <dated-link date="2004-04-13"
         href="http://www.w3.org/mid/353642239.20040413071107@toro.w3.mag.keio.ac.jp"
        >Clarify C034 in case of heuristics</dated-link>
      <para>Section 4.4.2, Character encoding identification

[...]
  C034 [C] Content MUST make use of available facilities for character
  encoding identification by always indicating character encoding; where
  the facilities offered for character encoding identification include
  defaults (e.g. in XML 1.0 [XML 1.0]), relying on such defaults is
  sufficient to satisfy this identification requirement. 
[...]

This needs some clarification. Is this a requirement because otherwise
the implementation does not know the encoding of the content? What if
the specification requires heuristics; would content still be required
to include such information? For example, would a CSS 2.1 style sheet be
required to have either a charset parameter or the @charset rule (or
maybe a BOM) in order to conform to C034? CSS 2.1 has a default, but it
applies only if the style sheet is loaded without a referring resource
(editors or validators might do this, browsers typically do not [1]).
So it seems that in most cases style sheets would be required to have
charset/@charset. That would be the most reasonable outcome, but I think
there is not necessarily consensus to this effect in the CSS WG.

[1] This raises an interesting question: would a style sheet displayed
    by a &#x22;View Style Sheet Source&#x22; function in a browser be treated as
    having no referring document, and thus show different content than
    what was applied to the document? ...</para>
    </comment>
  </description>
</lc-comment>

Received on Tuesday, 13 April 2004 03:11:09 UTC