Treating carriage return as white space in layout

Context:
https://bugzilla.mozilla.org/show_bug.cgi?id=557197
https://bugzilla.mozilla.org/show_bug.cgi?id=534071

An XML Processor or an implementation of the HTML5 parsing algorithm puts a carriage return in a text node in the document tree when the source contains a numeric character reference for carriage return, such as &#13;. It is also possible to introduce a carriage return into the document tree via scripting. Thus, a carriage return may participate in layout.
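
As a rough illustration of the parser behavior described above, here is a small Python sketch using the standard library's expat-based XML parser. The sample markup and the variable names are made up for the example, and it only covers the XML side, not an HTML5 parser or a browser DOM.

```python
import xml.etree.ElementTree as ET

# A literal CRLF in the source goes through XML end-of-line
# normalization, so the text node ends up with a single line feed.
literal = ET.fromstring(b"<p>a\r\nb</p>").text

# A numeric character reference for CR is not subject to that
# normalization, so the carriage return reaches the text node.
referenced = ET.fromstring(b"<p>a&#13;b</p>").text

# Script-like code can also put a CR into the tree directly.
element = ET.Element("p")
element.text = "a\rb"

print(repr(literal), repr(referenced), repr(element.text))
```

The point of the comparison is that only the first case looks like a line break in the source; the other two carriage returns arrive in the tree without ever having been newlines.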

The CSS 2.1 'white-space' processing model is oddly inconsistent about the treatment of carriage return.

First, http://www.w3.org/TR/CSS21/text.html#white-space-prop says: "Newlines in the source can be represented by a carriage return (U+000D), a linefeed (U+000A) or both (U+000D U+000A) or by some other mechanism that identifies the beginning and end of document segments, such as the SGML RECORD-START and RECORD-END tokens. The CSS 'white-space' processing model assumes all newlines have been normalized to line feeds."

It's not exactly clear to me if the last sentence is an informative statement or a normative statement. As an informative statement it's misleading, since both XML and HTML5 parsers can introduce carriage returns into the document tree even though such carriage returns weren't any kind of line breaks in the source. As a normative statement it would introduce needless complexity if it meant that the CSS formatting layer re-normalized CRLF to LF.

Regardless of whether the last sentence of the quote above is meant as informative or normative, the next section of the spec (http://www.w3.org/TR/CSS21/text.html#white-space-model) implies that a carriage return may be found in a text node in the document tree.

In point #1, carriage return is treated as white space and trimmed like space, line feed and tab. Yet, carriage return is not mentioned in either point #3 or point #4 subpoint #1. Since carriage return is mentioned in point #1, this implies that CSS treats carriage return as somehow white space-like. Therefore, I'd expect CSS to treat carriage return as white space-like in the later points: either as line feed-like or as tab-like.
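
To make the gap concrete, here is a simplified Python sketch of points #1, #3 and #4 for 'white-space: normal'. The function name collapse_normal and the flag cr_as_tab are hypothetical names for this example, not anything from the spec or from Gecko. Point #3 is reduced to always turning a line feed into a space, and element boundaries are ignored.

```python
import re


def collapse_normal(text: str, cr_as_tab: bool) -> str:
    """Sketch of CSS 2.1 white space collapsing for 'white-space: normal'."""
    # Point #1: tabs, carriage returns and spaces around a line feed are removed.
    text = re.sub(r"[\t\r ]*\n[\t\r ]*", "\n", text)
    # Point #3 (simplified): each line feed becomes a space.
    text = text.replace("\n", " ")
    # Point #4, subpoint #1: every tab becomes a space.
    text = text.replace("\t", " ")
    if cr_as_tab:
        # Assumption for this sketch: CR is handled like tab here.
        text = text.replace("\r", " ")
    # Point #4, subpoint #2: a space following another space is removed.
    return re.sub(r" {2,}", " ", text)


# CR next to a line feed is already trimmed by point #1 in both modes.
near_line_feed = collapse_normal("a \r\n b", cr_as_tab=False)

# CR with no line feed nearby survives unless it is treated as tab-like.
as_written = collapse_normal("a\r b", cr_as_tab=False)
tab_like = collapse_normal("a\r b", cr_as_tab=True)
```

Read literally, the later points leave a stray carriage return in the text that reaches line layout, which is the case the tab-like treatment is meant to cover.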

CSS3 Text is more vague: http://dev.w3.org/csswg/css3-text/#white-space-processing

Changing Gecko to treat carriage return as tab-like for the purpose of white space treatment in layout fixed both Gecko bugs mentioned at the start of this email.

Could you please change CSS specs so that:
 1) the possibility of a carriage return making its way to the document tree is acknowledged clearly
 2) it is clear that the CSS formatter doesn't normalize CRLF to LF
 3) CR is treated equivalently to tab or line feed for the purpose of white space processing

I'm not quite sure which is better in point #3. My guess is that treating CR as not having line breaking semantics is better, since CR tends to end up in the document tree by mistake. That is why my initial Gecko patch treated CR like tab (except for tab stops, of course).

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/

Received on Tuesday, 11 May 2010 13:15:26 UTC