This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 23145 - Add <textarea> content restrictions for XHTML5
Summary: Add <textarea> content restrictions for XHTML5
Status: CLOSED INVALID
Alias: None
Product: HTML WG
Classification: Unclassified
Component: HTML5 spec
Version: unspecified
Hardware: PC All
Importance: P3 normal
Target Milestone: ---
Assignee: This bug has no owner yet - up for the taking
QA Contact: HTML WG Bugzilla archive list
URL: http://www.w3.org/html/wg/drafts/html...
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-09-03 23:49 UTC by Leif Halvard Silli
Modified: 2013-09-05 05:18 UTC
CC List: 4 users

See Also:


Attachments

Description Leif Halvard Silli 2013-09-03 23:49:51 UTC
PROPOSAL: 

   For <textarea>, state that, in XML/XHTML,
   no elements are allowed as children. 

BACKGROUND: 

<textarea> is an "escapable raw text element" (which is the same element type as the <title> element btw).

Everything related to "content model: Text", to elements of the "raw text" kind, and to elements of the "escapable raw text" kind is problematic when it comes to XHTML.

Seemingly, the spec allows any content for <textarea>. For instance:

   <textarea><html></textarea>

The above will, in HTML, be interpreted the same as this:

   <textarea>&lt;html></textarea>

But in XML, the first example would count as NOT well-formed. And if you do it the well-formed way, like so:

   <textarea><html /></textarea>

then the <html> element will be parsed as an element, and not as text (thus: a difference from how it is parsed in HTML).
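To make the XML behaviour concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree module. It is an XML parser, so it follows the XHTML rules rather than the text/html ones. The XHTML namespace declaration is omitted for brevity, and parse_textarea is a hypothetical helper written only for this illustration.

```python
import xml.etree.ElementTree as ET

# Markup from the examples above, parsed with an XML parser (the XHTML case).
NOT_WELL_FORMED = "<textarea><html></textarea>"
SELF_CLOSED = "<textarea><html /></textarea>"
ESCAPED = "<textarea>&lt;html></textarea>"

def parse_textarea(markup):
    """Return (child tag names, text) of a <textarea>, or None if not well-formed."""
    try:
        textarea = ET.fromstring(markup)
    except ET.ParseError:
        # <html> is still open when </textarea> appears: a mismatched tag.
        return None
    return [child.tag for child in textarea], textarea.text

print(parse_textarea(NOT_WELL_FORMED))  # None
print(parse_textarea(SELF_CLOSED))      # (['html'], None)
print(parse_textarea(ESCAPED))          # ([], '<html>')
```

Only the escaped form yields a <textarea> whose content is the text "<html>", which is what the same markup means in text/html.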

CONCLUSION:

It seems like an omission that the spec does not state that elements are not permitted as children of <textarea>. By contrast, for the <iframe> element, the spec states that it must be empty if used in XHTML. 

http://www.w3.org/html/wg/drafts/html/master/embedded-content-0.html#iframe-content-model

Thus, XHTML requires different use. And the same goes for <textarea> - which does not need to be empty, but which does need to have escaped content.

Maybe you should simply say that it is not required to escape the "<" in HTML, but that it is required to escape it in XHTML.
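What "escaping it in XHTML" means in practice can be sketched with Python's standard library: xml.sax.saxutils.escape turns "&", "<" and ">" into character references, so the serialized <textarea> is always well-formed XML and parses back to the original text. The xhtml_textarea helper is hypothetical, not part of any spec or library.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

def xhtml_textarea(text):
    """Serialize `text` as the content of an XHTML <textarea> (hypothetical helper)."""
    # escape() replaces "&", "<" and ">" with character references, so the
    # result contains no markup and therefore no child elements.
    return "<textarea>" + escape(text) + "</textarea>"

markup = xhtml_textarea("<html>")
print(markup)                      # <textarea>&lt;html&gt;</textarea>
print(ET.fromstring(markup).text)  # <html>
```

In text/html the same content could be written with a literal "<"; in XHTML the escaped form is required.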
Comment 1 Michael[tm] Smith 2013-09-04 01:58:33 UTC
The spec already makes it very clear that no elements are allowed as children of <textarea>:

  http://www.w3.org/html/wg/drafts/html/master/forms.html#the-textarea-element

It says, "Content model: Text".

It doesn't matter whether that <textarea> is in a text/html document or an XML/XHTML document. It's still only allowed to have text. If any elements were allowed as children of <textarea>, the "Content model" field there would explicitly list which elements, instead of saying "Text".

(In reply to comment #0)
> Seemingly, the spec allows any content for <textarea>.

No, it very clearly only allows Text.

> For instance:
> 
>    <textarea><html></textarea>

That's not "any content". In a text/html document, that is just text. It doesn't matter at all that it's text which looks like markup. There is nothing special about that "<html>" text. It could just as well be <textarea><<html>></textarea> or <textarea><>html<></textarea> or <textarea>>html<</textarea> or whatever.

In a text/html document, the string of characters "<html>" is not always a start tag. It may just be text, depending on where it occurs in the document. When it is text, it doesn't matter that it happens to look like a start tag. 

> But in XML, the first example would count as NOT well-formed. And if you do
> it the well formed way, like so:
> 
>    <textarea><html /></textarea>
> 
> then the <html> element will be parsed as an element, and not as text (thus:
> difference from how it is parsed in HTML).

Yeah, that's because unlike in a text/html document, where "<html />" can sometimes just be text, in an XML document, it can never be. So it's not text in your example, and so it's not allowed as a child of <textarea>. Because the spec says that content model for <textarea> is Text.

> CONCLUSION:
> 
> It seems like an omission that the spec does not state that elements are
> not permitted as children of <textarea>.

It does state that, very clearly, by defining the content model of <textarea> as Text. That explicitly disallows any elements. So there's no omission.
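The "Content model: Text" rule can be checked mechanically on an XML tree. A minimal Python sketch, assuming the standard xml.etree.ElementTree parser, whose default tree builder discards comments and processing instructions, so every child left in the tree is an element; has_text_content_model is a hypothetical name.

```python
import xml.etree.ElementTree as ET

def has_text_content_model(elem):
    """True if `elem` has no element children, i.e. it contains only text."""
    # Comments and processing instructions are dropped by the default
    # parser, so len() counts element children only.
    return len(elem) == 0

for markup in ("<textarea>&lt;html></textarea>",
               "<textarea><html /></textarea>"):
    print(markup, has_text_content_model(ET.fromstring(markup)))
# <textarea>&lt;html></textarea> True
# <textarea><html /></textarea> False
```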

> For contrast, then, for the <iframe>
> element, it is specified that it must be empty if used in XHTML. 
> 
> http://www.w3.org/html/wg/drafts/html/master/embedded-content-0.html#iframe-
> content-model
> 
> Thus, XHTML requires different use. And the same goes for <textarea> - which
> does not need to be empty, but which does need to have escaped content.

No, <textarea> doesn't need to have escaped content. It just needs to have text.

> Maybe you should simply say that it is not required to escape the "<" in
> HTML,

A statement like that would be fine as a non-normative Note in the section on <textarea>. But if it's added, it should also be added as a Note in the section on <title>.

However, it's not really necessary to add it at all, because the details on what kind of Text is allowed where in text/html documents are already provided in the "HTML syntax" section of the spec:

  http://www.w3.org/html/wg/drafts/html/master/syntax.html

That section says:

  [1] http://www.w3.org/html/wg/drafts/html/master/syntax.html#syntax-text
  "Text is allowed inside elements, attribute values, and comments. Extra
  constraints are placed on what is and what is not allowed in text based
  on where the text is to be put, as described in the other sections."

Then it says,

  [2] http://www.w3.org/html/wg/drafts/html/master/syntax.html#elements-0
  "There are five different kinds of elements: void elements, raw text
  elements, escapable raw text elements, foreign elements, and normal elements."

And it lists "textarea, title" as being "escapable raw text elements".
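The five kinds can be expressed as a lookup. This is a sketch: the raw text (script, style) and escapable raw text (textarea, title) lists come from this thread, while the void-element list follows the HTML5 specification of that period (which still included keygen) and should be checked against the current spec. The function name element_kind is hypothetical.

```python
HTML_NS = "http://www.w3.org/1999/xhtml"

VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input", "keygen",
        "link", "meta", "param", "source", "track", "wbr"}
RAW_TEXT = {"script", "style"}
ESCAPABLE_RAW_TEXT = {"textarea", "title"}

def element_kind(tag, namespace=HTML_NS):
    """Classify an element into one of the five kinds of the HTML syntax."""
    if namespace != HTML_NS:
        return "foreign"  # e.g. MathML and SVG elements
    tag = tag.lower()  # tag names are case-insensitive in text/html
    if tag in VOID:
        return "void"
    if tag in RAW_TEXT:
        return "raw text"
    if tag in ESCAPABLE_RAW_TEXT:
        return "escapable raw text"
    return "normal"

print(element_kind("textarea"))  # escapable raw text
print(element_kind("option"))    # normal
```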

Then it says,

  [3] http://www.w3.org/html/wg/drafts/html/master/syntax.html#normal-elements
  "Normal elements can have text, character references, other elements,
  and comments, but the text must not contain the character "<" (U+003C) or an
  ambiguous ampersand."

So that means that for text in normal elements, if you want to use the character "<", you have to escape it as a character reference.
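The rule for normal elements can be sketched as a check on the text as written in the markup, before character references are decoded. This Python sketch assumes the standard html.entities.html5 table as the list of named character references; the helper names are hypothetical.

```python
import re
from html.entities import html5

# "&", then one or more ASCII alphanumerics, then ";" (a candidate named reference).
_NAMED_REF = re.compile(r"&([0-9A-Za-z]+);")

def has_ambiguous_ampersand(text):
    """True if `text` contains "&name;" where name is not a known named reference."""
    return any(m.group(1) + ";" not in html5 for m in _NAMED_REF.finditer(text))

def is_valid_normal_element_text(text):
    """Text rule for normal elements in text/html: no "<", no ambiguous ampersand."""
    return "<" not in text and not has_ambiguous_ampersand(text)

print(is_valid_normal_element_text("&lt;html>"))  # True
print(is_valid_normal_element_text("<html>"))     # False
```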

Then it says,

  [4] http://www.w3.org/html/wg/drafts/html/master/syntax.html#escapable-raw-text-elements
  "Escapable raw text elements can have text and character references, but
  the text must not contain an ambiguous ampersand...
  "The text in raw text and escapable raw text elements must not contain any
  occurrences of the string "</".

So unlike for "normal elements", "Escapable raw text elements" don't have any restriction about the "<" character. Therefore you don't need to escape it.
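In text/html the parser, not only the conformance rules, treats "<" inside an escapable raw text element as ordinary text until the element's own end tag. A simplified Python sketch of that behaviour, with hypothetical names: it looks for "</textarea" followed by whitespace, "/" or ">", and decodes character references with the standard html.unescape. It ignores details such as newline normalization and the single newline that may immediately follow the <textarea> start tag.

```python
import html
import re

def read_escapable_raw_text(source, tag):
    """Simplified reading of escapable raw text content.

    `source` starts just after the start tag. Returns (text, rest), where `text`
    is the decoded content and `rest` begins at the matching end tag.
    """
    # Only "</tag" followed by whitespace, "/" or ">" ends the element;
    # any other "<" (including "<html>") stays ordinary text.
    end = re.compile(r"</" + re.escape(tag) + r"[\t\n\f />]", re.IGNORECASE)
    match = end.search(source)
    stop = match.start() if match else len(source)
    # Character references are still decoded: that is the "escapable" part.
    return html.unescape(source[:stop]), source[stop:]

text, rest = read_escapable_raw_text("<html> &amp; &lt;b></textarea>", "textarea")
print(repr(text))  # '<html> & <b>'
print(rest)        # </textarea>
```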

> but that it is required to escape it in XHTML.

That's because unlike in text/html, in XML there is no such thing as "escapable raw text". But that fact is already documented in the spec in the "HTML syntax" section and "XHTML syntax" section.
Comment 2 Leif Halvard Silli 2013-09-04 07:24:03 UTC
Hi Mike. Thanks for your teaching. I shall NOT say that it isn't helpful. However, some things need to be clarified, both for myself and in the spec.

FIRSTLY, you seem to say that the element kinds
               "raw text"
and
     "escapable raw text"
are synonymous with elements that have
    "content model: Text"

But then, why don't the "raw text" elements <script> and <style> have content model: Text? (Yeah, in a way they *do* have content model: Text, but that is not their "official" content model.) In fact, none of the elements (there are not many) with content model: Text are listed as being of the "raw text" kind.

To underline my point, the spec says, about the content model of <option>:

 "If the element has a label attribute but no value attribute: Text."

However, if you, in text/html, do this:

  <select><option label='l'><html/></option></select>

then you get an error.

CONCLUSION: Currently, the meaning of 'content model: Text' depends not only on text/html vs. XML, but also on the *kind* of element.
Comment 3 Leif Halvard Silli 2013-09-04 09:23:52 UTC
Upon further reflection, I simply agree with you, Mike.

However, I believe the differences in meaning [for authors/code] of "content model: Text" _within HTML_ (that is: it depends on the kind of element) need to be called out in the spec. And, as a consequence, because we are speaking "only in HTML" (that is: it differs from XHTML), it must be called out that in XHTML, the meaning of 'content model: Text' does *not* vary with the kind of element. 

For this I have opened bug 23152.
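The distinction drawn in Comment 3 can be summarized as a small decision function. A Python sketch under stated assumptions: kind is one of the element kinds from the "HTML syntax" section, syntax is either "html" (text/html) or "xhtml" (XML), and lt_allowed_unescaped is a hypothetical name. It answers only whether a literal "<" may appear in an element's text; in text/html the restriction on "</" quoted in Comment 1 still applies.

```python
def lt_allowed_unescaped(kind, syntax):
    """Whether a literal "<" may appear in the text of an element of this kind."""
    if syntax == "xhtml":
        # XML has no raw text or escapable raw text: outside a CDATA section,
        # "<" always starts markup, whatever the kind of element.
        return False
    # text/html: raw text (script, style) and escapable raw text
    # (textarea, title) may contain "<"; other kinds must use "&lt;".
    return kind in ("raw text", "escapable raw text")

print(lt_allowed_unescaped("escapable raw text", "html"))   # True
print(lt_allowed_unescaped("escapable raw text", "xhtml"))  # False
```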