This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 13309 - HTML and XHTML handle newline differently
Summary: HTML and XHTML handle newline differently
Status: RESOLVED WONTFIX
Alias: None
Product: HTML WG
Classification: Unclassified
Component: LC1 HTML5 spec
Version: unspecified
Hardware: All All
Importance: P2 normal
Target Milestone: ---
Assignee: Ian 'Hixie' Hickson
QA Contact: HTML WG Bugzilla archive list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-07-20 12:08 UTC by Franklin Tse
Modified: 2011-08-09 23:17 UTC
CC List: 6 users

See Also:


Attachments
text/html test document (390 bytes, text/html)
2011-07-20 14:36 UTC, Franklin Tse
application/xhtml+xml test document (390 bytes, application/xhtml+xml)
2011-07-20 14:37 UTC, Franklin Tse
Firefox 5 and Chrome 12 output of both HTML and XHTML documents (86 bytes, text/plain)
2011-07-20 14:39 UTC, Franklin Tse
IE 9 output of text/html document (108 bytes, text/plain)
2011-07-20 14:39 UTC, Franklin Tse
IE 9 output of application/xhtml+xml document and Opera output for both HTML and XHTML (94 bytes, text/plain)
2011-07-20 14:40 UTC, Franklin Tse

Description Franklin Tse 2011-07-20 12:08:37 UTC
text/html and application/xhtml+xml documents currently have two different newline handling rules, causing inconsistent rendering even when the markup is the same. The rule for text/html ignores a character reference to U+000D CARRIAGE RETURN (Section 8.1.3.1 of the HTML 5 spec), while the rule for application/xhtml+xml documents translates both the two-character sequence #xD #xA and any #xD that is not followed by #xA to a single #xA character (Section 2.11 of the XML spec).

Should HTML 5 follow the handling rule of XML?
Comment 1 Henri Sivonen 2011-07-20 14:30:15 UTC
EDITOR'S RESPONSE: This is an Editor's Response to your comment. If you are
satisfied with this response, please change the state of this bug to CLOSED. If
you have additional information and would like the editor to reconsider, please
reopen this bug. If you would like to escalate the issue to the full HTML
Working Group, please add the TrackerRequest keyword to this bug, and suggest
title and text for the tracker issue; or you may create a tracker issue
yourself, if you are able to do so. For more details, see this document:
   http://dev.w3.org/html5/decision-policy/decision-policy.html

Status: The spec already says what is requested
Change Description: no spec change
Rationale:

HTML5 doesn't ignore carriage returns. Lone literal carriage returns turn into line feeds. So does the CRLF pair. An escaped carriage return ends up in the DOM as a carriage return.

Just like in XML.
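
The rules above can be sketched in a few lines. This is only an illustration under stated assumptions, not the spec's algorithm: the function names (normalize_newlines, decode_numeric_refs, text_as_seen_in_dom) and the sample string are hypothetical, and only numeric character references are handled, without the parser's error-handling tables.

```python
import re

# Matches numeric character references such as &#13; or &#xD; (simplified).
_NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")


def normalize_newlines(source):
    """Turn literal CRLF pairs and lone CRs in the raw input into LF."""
    return source.replace("\r\n", "\n").replace("\r", "\n")


def decode_numeric_refs(text):
    """Replace each numeric character reference with the code point it names."""
    def _decode(match):
        hex_digits, dec_digits = match.groups()
        code = int(hex_digits, 16) if hex_digits else int(dec_digits)
        return chr(code)
    return _NUMERIC_REF.sub(_decode, text)


def text_as_seen_in_dom(source):
    # Order matters: newline normalization runs on the raw input before
    # references are decoded, so an escaped CR is never rewritten to LF.
    return decode_numeric_refs(normalize_newlines(source))


# Literal CRLF, literal lone CR, escaped CR, escaped LF.
sample = "a\r\nb\rc&#xD;d&#xA;e"
result = text_as_seen_in_dom(sample)
print(repr(result))
```

The point of the sketch is the ordering: because literal line breaks are normalized before references are decoded, only the escaped carriage return survives as U+000D.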

I don't know which particular passage or sentence in which particular publication you are referring to, so I can't comment on the exact text that made it seem as though this wasn't already the case. For future bugs: Please avoid referring to the section numbers of the HTML5 spec. The numbering changes all the time and differs between different views and publication venues of HTML5. In the future, please use a URL with a fragment id part to refer to a section of the spec.
Comment 2 Franklin Tse 2011-07-20 14:34:59 UTC
Under Section 8.1.3.1 of HTML 5:

"Where character references are allowed, a character reference of a U+000A LINE FEED (LF) character (but not a U+000D CARRIAGE RETURN (CR) character) also represents a newline."

In other words, &#xD; is ignored.

I will upload some browser test results to this report soon.
Comment 3 Franklin Tse 2011-07-20 14:36:45 UTC
Created attachment 1013
text/html test document
Comment 4 Franklin Tse 2011-07-20 14:37:23 UTC
Created attachment 1014
application/xhtml+xml test document
Comment 5 Franklin Tse 2011-07-20 14:39:27 UTC
Created attachment 1015
Firefox 5 and Chrome 12 output of both HTML and XHTML documents
Comment 6 Franklin Tse 2011-07-20 14:39:51 UTC
Created attachment 1016
IE 9 output of text/html document
Comment 7 Franklin Tse 2011-07-20 14:40:35 UTC
Created attachment 1017
IE 9 output of application/xhtml+xml document and Opera output for both HTML and XHTML
Comment 8 Franklin Tse 2011-07-20 14:43:11 UTC
IE 9's text/html output is buggy, while its application/xhtml+xml output follows the XML spec. Firefox and Chrome use HTML 5's handling regardless of media type. Opera uses the XML spec's handling regardless of media type.
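
For reference, the XML-side handling can be observed outside a browser with a conforming XML parser. A minimal sketch, assuming Python's standard xml.etree.ElementTree module; the markup below is a hypothetical sample, not the attached test document.

```python
import xml.etree.ElementTree as ET

# Literal CRLF, literal lone CR, then an escaped CR inside element content.
doc = "<p>a\r\nb\rc&#xD;d</p>"

# The parser applies XML end-of-line handling to the literal characters
# before the character reference is expanded.
text = ET.fromstring(doc).text
print(repr(text))
```

Under the XML 1.0 end-of-line rules, the literal CRLF and the literal CR should each come out as a single line feed, while the escaped carriage return stays a U+000D in the parsed text.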
Comment 9 Michael[tm] Smith 2011-08-04 05:01:27 UTC
mass-moved component to LC1
Comment 10 Ian 'Hixie' Hickson 2011-08-09 23:17:46 UTC
(In reply to comment #2)
> Under Section 8.1.3.1 of HTML 5:
> 
> "Where character references are allowed, a character reference of a U+000A LINE
> FEED (LF) character (but not a U+000D CARRIAGE RETURN (CR) character) also
> represents a newline."
> 
> In other words, &#xD; is ignored.

No, what you quote is not at all saying that &#xD; is ignored. It's saying that if you want to represent a newline, you can use &#xA;, but not &#xD;.

If you read the parser section, you'll see &#xD; turns into a U+000D character, which doesn't represent anything useful in HTML.


Status: Rejected
Change Description: no spec change
Rationale: Spec is as intended.