Proposed revision of CSS2.1 description of backslash escapes

I wanted to provide a concrete proposal to deal with the backslash
issues I raised in
http://lists.w3.org/Archives/Public/www-style/2010Feb/0150.html and
http://lists.w3.org/Archives/Public/www-style/2010Feb/0210.html but
was not able to do so in a way that made sense, without rewriting the
whole section.  So here is a rewrite of the whole section. :)

I *believe* that the only normative changes are to clarify the behavior
of \-newline not within a string, and \-EOF in any context.  However, I
may have made errors.  Please let me know if you find any.

The attached diff is not especially readable, so I have also appended the
text that should entirely replace the third bullet point of section 4.1.3.

zw

--- new text ---
    <li><p>Backslash (\) characters are not significant inside
	<a href="#comments">comments</a>.  Elsewhere, they introduce
	<span class="index-def" title="backslash escapes"><a
	name="escaped-characters"><dfn>character
	escapes</dfn></a></span>.</p>

        <p>Some character escapes have the effect of inserting a
        character into the style sheet, in place of the escape.
        Whenever this happens, the inserted character is treated as
        either part of an identifier, or part of a string, even if it
        normally would have some special meaning.  See the examples
        below.</p>

	<p>If a backslash is immediately followed by the end of the
	style sheet, it is a normal character, not an escape.</p>

    <ol>
      <li><p>Within <a href="#strings">strings</a>, a backslash
	followed by a newline is ignored; i.e., the string continues
	on the next line, but with neither the backslash nor the
	newline included in the string's value.  Outside strings, a
	backslash followed by a newline is a normal punctuation
	character.</p></li>

      <li><p>A backslash followed by one to six hexadecimal digits,
	[0-9a-fA-F], inserts the ISO 10646
	(<a href="refs.html#ref-ISO10646" rel="biblioentry"
	class="noxref"><span class="normref">[ISO10646]</span></a>)
	character with that number into the style sheet.</p>

	<p>One (and only one) white space character is ignored after a
	hexadecimal escape of any length.  This rule allows authors to
	write hexadecimal escapes that are immediately followed by
	characters from the set [0-9a-fA-F], without ambiguity.  For
	instance, <samp>"\26&nbsp;B"</samp>,
	<samp>"\000026B"</samp>, and <samp>"\000026&nbsp;B"</samp> are
	all equivalent to <samp>"&amp;B"</samp>.
	However, <samp>"\26B"</samp> is equivalent
	to <samp>"&#619;"</samp> (a string containing the single
	character U+026B).</p>

	<p>If a hexadecimal escape would insert the character with
	code point U+0000, the behavior is undefined.  Hexadecimal
	escapes that are outside the range allowed by Unicode
	(e.g. "\110000" stands for a character above the current limit
	of U+10FFFF) may be treated as inserting the "replacement
	character" (U+FFFD).  If such characters are to be displayed,
	the UA should show a visible symbol, such as a "missing
	character" glyph (cf. <a href="fonts.html#algorithm">15.2</a>,
	point 5).</p></li>

      <li><p>A backslash followed by any other character (neither a
	  hexadecimal digit nor a newline) simply removes that
	  character's special meaning.  For instance, <samp>"\""</samp>
	  is a string consisting of one double quote, <samp>a\:b</samp>
	  is an identifier consisting of the three characters
	  <samp>a:b</samp>, and <samp>"te\nt"</samp> is exactly the
	  same string as <samp>"tent"</samp>.  <samp>\7B</samp> is not
	  punctuation, even though <samp>{</samp> is,
	  and <samp>\32</samp> is allowed at the start of an
	  identifier, even though <samp>2</samp> is not.</p></li>
    </ol>

    <p class="note">Style sheet preprocessors are free to convert
      escape sequences to the equivalent characters, or vice versa, as
      long as they do not change the style sheet's meaning.  For
      instance, "\61 b" may be rewritten as "ab"; "a\3a b" may be
      rewritten as "a\:b" or vice versa, but not "a:b".</p>
  </li>
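
--- illustrative sketch (not part of the proposed text) ---

The three numbered rules and the end-of-style-sheet rule above can be
sketched in Python for the string case. This is only a rough model under
stated assumptions, not a tokenizer: the input is taken to be the body of
a string with the quotes already removed, and the end of that body stands
in for the end of the style sheet. The name `decode_string_escapes` is
hypothetical. Mapping U+0000 to U+FFFD is a choice made here (the text
leaves that case undefined), out-of-range escapes use the U+FFFD option
the text permits, and counting "\r\n" as one white space character
follows the CSS 2.1 tokenizer grammar rather than anything stated above.

```python
import re

# One to six hex digits, then optionally ONE white space character.
# Counting "\r\n" as a single white space character is an assumption
# taken from the CSS 2.1 tokenizer grammar.
_HEX_ESCAPE = re.compile(r"([0-9a-fA-F]{1,6})(\r\n|[ \t\r\n\f])?")
_NEWLINE = re.compile(r"\r\n|[\n\r\f]")
REPLACEMENT = "\ufffd"


def decode_string_escapes(body: str) -> str:
    """Decode backslash escapes in the body of a CSS string (quotes removed)."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        i += 1  # step over the backslash
        if i == len(body):
            # Backslash followed by end of input: a normal character.
            out.append("\\")
            break
        newline = _NEWLINE.match(body, i)
        if newline:
            # Rule 1: inside a string, backslash-newline contributes nothing.
            i = newline.end()
            continue
        hex_escape = _HEX_ESCAPE.match(body, i)
        if hex_escape:
            # Rule 2: hexadecimal escape, with one trailing white space
            # character swallowed if present.
            code_point = int(hex_escape.group(1), 16)
            if code_point == 0 or code_point > 0x10FFFF:
                # Sketch's choice: U+FFFD for U+0000 (undefined in the
                # text) and for values beyond the Unicode range.
                out.append(REPLACEMENT)
            else:
                out.append(chr(code_point))
            i = hex_escape.end()
            continue
        # Rule 3: any other character stands for itself.
        out.append(body[i])
        i += 1
    return "".join(out)
```

Fed the string bodies from the examples above, `\26 B`, `\000026B`, and
`\000026 B` all decode to `&B`, while `\26B` decodes to the single
character U+026B. The sketch does not model identifiers or the
"backslash-newline outside strings" case, since both need a surrounding
tokenizer.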

Received on Wednesday, 24 February 2010 00:36:53 UTC