This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 10954 - plain text processing breaks text/plain; format=flowed
Summary: plain text processing breaks text/plain; format=flowed
Status: RESOLVED FIXED
Alias: None
Product: HTML WG
Classification: Unclassified
Component: pre-LC1 HTML5 spec (editor: Ian Hickson)
Version: unspecified
Hardware: PC Windows NT
Importance: P2 normal
Target Milestone: ---
Assignee: Ian 'Hixie' Hickson
QA Contact: HTML WG Bugzilla archive list
URL: http://dev.w3.org/html5/spec/Overview...
Whiteboard:
Keywords: NE
Depends on:
Blocks:
 
Reported: 2010-10-01 13:57 UTC by Julian Reschke
Modified: 2010-10-12 09:41 UTC
CC List: 7 users

See Also:


Attachments

Description Julian Reschke 2010-10-01 13:57:55 UTC
<http://dev.w3.org/html5/spec/Overview.html#read-text>:

"When a plain text document is to be loaded in a browsing context, the user agent should queue a task to create a Document object, mark it as being an HTML document, create an HTML parser, associate it with the document, act as if the tokenizer had emitted a start tag token with the tag name "pre" followed by a single U+000A LINE FEED (LF) character, and switch the HTML parser's tokenizer to the PLAINTEXT state. Each task that the networking task source places on the task queue while the fetching algorithm runs must then fill the parser's input stream with the fetched bytes and cause the HTML parser to perform the appropriate processing of the input stream."
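To make the quoted algorithm concrete, here is a rough model of the tree it produces. It is only a sketch: it assumes the bytes have already been decoded to characters, and `Node` and `load_plain_text` are hypothetical names, not spec or browser APIs. It relies on the tree-construction rule that a single LF immediately after a <pre> start tag is ignored, which is why the synthetic LF is emitted: it gets eaten, so a leading newline in the file itself survives. Because the tokenizer is in the PLAINTEXT state, markup in the file is never recognized and everything lands in one text node.

```python
from dataclasses import dataclass, field
from typing import Iterable


@dataclass
class Node:
    # Hypothetical, heavily simplified DOM node.
    name: str
    children: list = field(default_factory=list)
    text: str = ""


def load_plain_text(chunks: Iterable[str]) -> Node:
    """Model the DOM built for a text/plain resource delivered in chunks."""
    pre = Node("pre")
    html = Node("html", [Node("head"), Node("body", [pre])])

    def char_tokens():
        # The synthetic LF emitted right after the <pre> start tag...
        yield "\n"
        # ...then every character of every network chunk. In the PLAINTEXT
        # state "<" is just a character, so no tags are ever produced.
        for chunk in chunks:
            yield from chunk

    # Tree construction ignores one LF directly following a <pre> start tag.
    ignore_next_lf = True
    buf = []
    for ch in char_tokens():
        if ignore_next_lf:
            ignore_next_lf = False
            if ch == "\n":
                continue
        buf.append(ch)
    pre.text = "".join(buf)
    return html
```

Under this model the <pre> text is byte-for-byte the (decoded) file, line breaks included, which is exactly what leaves no room for format=flowed processing.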

Handling text/plain this way seems to make it impossible to handle format=flowed properly (unless I'm missing something about <pre>).

Oddly enough, the next paragraph does mention format=flowed (which is what RFC 2646 defines):

"The rules for how to convert the bytes of the plain text document into actual characters are defined in RFC 2046, RFC 2646, and subsequent versions thereof. [RFC2046] [RFC2646]"

I assume that RFC2646 is mentioned here by mistake.

That being said, it would be cool if HTML5 *allowed* UAs to do the right thing with format=flowed.
Comment 1 Boris Zbarsky 2010-10-01 14:32:39 UTC
The <pre> per se is not an issue for format=flowed. One can just change its styling to allow wrapping inside; the pre-wrap white-space style would work fine, for example.
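The point about pre-wrap is that soft wrapping happens at render time and leaves the text in the DOM untouched. A sketch of that idea, assuming Python's `textwrap` as a rough stand-in for CSS line breaking (it is not how a browser lays out text, and `render_pre_wrap` is a hypothetical name):

```python
import textwrap


def render_pre_wrap(text: str, width: int) -> list[str]:
    """Approximate the visual lines of a white-space: pre-wrap box.

    Hard line breaks in `text` are always honored; long lines are
    additionally broken at spaces to fit `width`. `text` itself is
    never modified, mirroring a style change rather than a DOM change.
    """
    visual = []
    for hard_line in text.split("\n"):
        # textwrap returns [] for an empty line; keep it as a blank visual line.
        visual.extend(textwrap.wrap(hard_line, width) or [""])
    return visual
```

For example, `render_pre_wrap("aaa bbb ccc\nd", 7)` yields three visual lines while the string passed in still contains only one line break.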

The PLAINTEXT state would be more of a problem, since it doesn't do the processing format=flowed needs to do (removing line breaks following spaces, for example).  Also, to get the wrapping of quoted text right (and especially the quoting) one probably needs to introduce more nodes into the DOM.
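For reference, the receiving-side processing format=flowed asks for looks roughly like the sketch below. It follows RFC 3676 (which obsoletes RFC 2646): count and strip leading ">" quote markers, remove one space of space-stuffing, treat a line ending in a space as a soft break (except the "-- " signature separator), optionally delete that trailing space when DelSp=yes, and never flow across a change in quote depth. `decode_flowed` and `Paragraph` are hypothetical names; this is an illustrative sketch, not a complete implementation.

```python
from dataclasses import dataclass


@dataclass
class Paragraph:
    quote_depth: int
    text: str


def decode_flowed(body: str, delsp: bool = False) -> list[Paragraph]:
    """Join soft-broken lines of a format=flowed body into paragraphs."""
    lines = body.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # a final line break does not start another line
    paragraphs: list[Paragraph] = []
    current: list[str] = []
    current_depth = 0
    for raw in lines:
        raw = raw.rstrip("\r")  # tolerate CRLF line endings
        # Quote depth = number of leading ">" characters.
        depth = 0
        while depth < len(raw) and raw[depth] == ">":
            depth += 1
        line = raw[depth:]
        # Undo space-stuffing: one leading space is logically deleted.
        if line.startswith(" "):
            line = line[1:]
        # A trailing space marks a soft break, except on the signature separator.
        flowed = line.endswith(" ") and line != "-- "
        # A quote-depth change ends any paragraph in progress.
        if current and depth != current_depth:
            paragraphs.append(Paragraph(current_depth, "".join(current)))
            current = []
        if flowed and delsp:
            line = line[:-1]  # DelSp=yes: the trailing space was added for the break
        current.append(line)
        current_depth = depth
        if not flowed:  # a fixed line ends the paragraph
            paragraphs.append(Paragraph(depth, "".join(current)))
            current = []
    if current:
        paragraphs.append(Paragraph(current_depth, "".join(current)))
    return paragraphs
```

The output carries a quote depth per paragraph, which is where the extra DOM nodes mentioned above would come from if a UA rendered quoting structurally.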

The big worry with that last bit is that some pages reach into text/plain DOMs and muck with them, last I checked. Maybe that's rare enough that modifying those DOMs slightly would cause no issues, though.
Comment 2 Henri Sivonen 2010-10-02 13:13:40 UTC
What's the use case for serving format=flowed over HTTP? Wouldn't it make more sense not to put UA-removable line breaks in the text data and to make sure the styling of the pre allows the UA to wrap long lines?

I worry that allowing something here without requiring it would expose a different DOM in different UAs.
Comment 3 Julian Reschke 2010-10-02 14:00:11 UTC
(In reply to comment #2)
> What's the use case for serving format=flowed over HTTP? Wouldn't it make more

The same as sending it per email, I guess.

> sense not to put UA-removable line breaks in the text data and to make sure the
> styling of the pre allows the UA to wrap long lines?

As far as I understand, that would be a change over what's being done today.

> I worry that allowing something here without requiring it would expose a
> different DOM in different UAs.

I'm not sure how the DOM exposed for something that hasn't got any support today is really relevant. That being said, I'd be happy if the spec required support.

But the original issue would remain: by being overly specific, the spec in principle prevents innovation in text/plain. It shouldn't.
Comment 4 Henri Sivonen 2010-10-04 06:56:03 UTC
(In reply to comment #3)
> (In reply to comment #2)
> > What's the use case for serving format=flowed over HTTP? Wouldn't it make more
> 
> The same as sending it per email, I guess.

I think the IETF mindset of just copying the email stuff over to HTTP has been a mistake. (See CRLF line breaks and US-ASCII default for text/*, for one.)

format=flowed exists because legacy email protocols wanted the payloads to have limited line lengths. HTTP has no such limitation, so it's OK to send long chunks of text between line breaks over HTTP.

If you meant backwards-compatibility with UAs that don't soft-wrap long lines, I can see a potential use case there, but is there a reason to believe that Web authors who aren't IETF WG participants would use format=flowed over HTTP if UA supported it?

> > sense not to put UA-removable line breaks in the text data and to make sure the
> > styling of the pre allows the UA to wrap long lines?
> 
> As far as I understand, that would be a change over what's being done today.

A style change--not a DOM change.

> > I worry that allowing something here without requiring it would expose a
> > different DOM in different UAs.
> 
> I'm not sure how the DOM exposed for something that hasn't got any support
> today is really relevant.

Suppose the spec said UAs are allowed but not required to implement a PLAINTEXT_flowed state alongside the PLAINTEXT state and could use PLAINTEXT_flowed for text/plain; format=flowed. The PLAINTEXT_flowed state removed line breaks preceded by a space. Now the DOM for text/plain; format=flowed resources would be different in UAs that do the allowed-but-not-required thing and UAs that don't.
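The divergence described here is easy to see in a small sketch. `pre_text_content` is a hypothetical helper, and the flowed branch implements only the one rule stated above (drop a line break preceded by a space), not full RFC 3676 processing:

```python
def pre_text_content(body: str, flowed_state: bool) -> str:
    """Text of the <pre> for a text/plain; format=flowed resource.

    flowed_state=False models the PLAINTEXT state: every character,
    line breaks included, goes into the DOM unchanged.
    flowed_state=True models the hypothetical PLAINTEXT_flowed state:
    a line break preceded by a space is removed.
    """
    if not flowed_state:
        return body
    return body.replace(" \n", " ")
```

With `body = "A long \nline\n"`, a UA without the optional state exposes the text with two line breaks, while a UA with it exposes `"A long line\n"`, so a script reading the <pre> would see different data depending on the browser.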
Comment 5 Julian Reschke 2010-10-04 07:17:26 UTC
(In reply to comment #4)

Note that we're starting to get off-topic. The main point of this bug was that the spec cites the format=flowed spec normatively, but then doesn't appear to allow flowed handling.

> I think the IETF mindset of just copying the email stuff over to HTTP has been
> a mistake. (See CRLF line breaks and US-ASCII default for text/*, for one.)

In retrospect, that's true. But I'm sure this sounded like a good, pragmatic idea when it was introduced.
 
> format=flowed exists because legacy email protocols wanted the payloads to have
> limited line lengths. HTTP has no such limitation, so it's OK to send long
> chunks of text between line breaks over HTTP.

Indeed. We're doing cleanup where we can; compare, for instance, RFC 2231 and RFC 5987 (the continuation mechanism is gone).

> If you meant backwards-compatibility with UAs that don't soft-wrap long lines,
> I can see a potential use case there, but is there a reason to believe that Web
> authors who aren't IETF WG participants would use format=flowed over HTTP if UA
> supported it?

Among those who currently serve text/plain with wide lines? Probably.

> > As far as I understand, that would be a change over what's being done today.
> 
> A style change--not a DOM change.

Just asking. It's not always clear what kind of change is considered to be acceptable. For instance, I'm very surprised that anybody is interested in the DOM generated for text/plain (or is even assuming there is one).
 
> > I'm not sure how the DOM exposed for something that hasn't got any support
> > today is really relevant.
> 
> Suppose the spec said UAs are allowed but not required to implement a
> PLAINTEXT_flowed state alongside the PLAINTEXT state and could use
> PLAINTEXT_flowed for text/plain; format=flowed. The PLAINTEXT_flowed state
> removed line breaks preceded by a space. Now the DOM for text/plain;
> format=flowed resources would be different in UAs that do the
> allowed-but-not-required thing and UAs that don't.

Yes. So? After all, that's an effect of what the author of the document wanted. Why not trust her/him?
Comment 6 Ian 'Hixie' Hickson 2010-10-11 22:12:27 UTC
EDITOR'S RESPONSE: This is an Editor's Response to your comment. If you are satisfied with this response, please change the state of this bug to CLOSED. If you have additional information and would like the editor to reconsider, please reopen this bug. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the tracker issue; or you may create a tracker issue yourself, if you are able to do so. For more details, see this document:
   http://dev.w3.org/html5/decision-policy/decision-policy.html

Status: Accepted
Change Description: see diff given below
Rationale: This was already implicitly supported, but I've added a note and some more text to make it clearer that the spec is not trying to contradict those other RFCs.


> I'm very surprised that anybody is interested in the DOM generated for text/plain

This actually turns out to be surprisingly common (enough that UAs have to be interoperable on the handling of short text files in iframes). I highly recommend working as QA for a browser vendor for a couple of years, it is quite enlightening. :-)
Comment 7 contributor 2010-10-11 22:12:48 UTC
Checked in as WHATWG revision r5598.
Check-in comment: Mention Format=Flowed explicitly.
http://html5.org/tools/web-apps-tracker?from=5597&to=5598
Comment 8 Julian Reschke 2010-10-12 09:41:50 UTC
Thanks for adding the consideration for "flowed".

That being said, I'm still not convinced that specifying the DOM at this level of detail, even for future variants of text/plain, makes sense. If an author augments the type information like that, he/she clearly *wants* different processing and is very unlikely to care about a modified DOM.

Also, making this a CSS issue is a nice idea; I wonder whether you have asked any of the implementers whether they're willing to do that.