This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 23157 - XHTML: validator doesn’t check that Raw Text elements (style/script) match the constraints
Status: REOPENED
Alias: None
Product: HTML Checker
Classification: Unclassified
Component: General
Version: unspecified
Hardware: PC All
Importance: P2 normal
Target Milestone: ---
Assignee: Michael[tm] Smith
QA Contact: qa-dev tracking
URL: http://www.w3.org/html/wg/drafts/html...
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-09-04 17:48 UTC by Leif Halvard Silli
Modified: 2013-09-05 16:09 UTC

Description Leif Halvard Silli 2013-09-04 17:48:54 UTC
For *escapable* Raw text elements (textarea/title), the validator does a *good* job. 

Example:

     <textarea><foo/></textarea>

The validator blesses the above as valid, if it occurs in an HTML document, whereas if it occurs inside XHTML, it stamps it as invalid. This is how it should be!

However, for <script> and <style>, the story is different: Irrespective of HTML or XHTML, the validator will use the rules of HTML parsing and stamp the following as valid:

   <script><FOO/></script>

Per the XML parser, the above <FOO> is an element. Thus: not text. Whereas the content model of <script> and <style> is (some flavour of) text. Hence, the above ought to be stamped as invalid, measured as XHTML against the HTML5 specification.
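
To make the parsing difference described above concrete, here is a minimal sketch using only Python's standard library. It is an illustration, not the validator's actual code: `xml.etree.ElementTree` stands in for "an XML parser" and `html.parser.HTMLParser` for "an HTML parser", and the names `MARKUP` and `ScriptEvents` are made up for this example.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

MARKUP = "<script><FOO/></script>"

# XML parsing: <FOO/> becomes a child element of <script>.
xml_script = ET.fromstring(MARKUP)
xml_children = [child.tag for child in xml_script]


class ScriptEvents(HTMLParser):
    """Record the parse events produced for the markup (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_startendtag(self, tag, attrs):
        self.events.append(("startend", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


# HTML parsing: the contents of <script> are treated as raw text,
# so "<FOO/>" arrives as character data rather than as a tag.
html_parser = ScriptEvents()
html_parser.feed(MARKUP)
html_parser.close()
html_events = html_parser.events

print(xml_children)
print(html_events)
```

With the XML parser the script element ends up with one element child; with the HTML parser the same characters are reported as data inside the script element.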
Comment 1 Michael[tm] Smith 2013-09-04 22:49:39 UTC
(In reply to comment #0)

> However, for <script> and <style>, the story is different: Irrespective of
> HTML or XHTML, the validator will use the rules of HTML parsing

That's not true. The reason for the behavior you're seeing is not due to the validator using the "rules of HTML parsing" to parse <script> and <style> in XHTML documents. If the documents are served with an XML MIME type, the validator is using the XML parser to parse them, exactly as you'd expect.

> and stamp the following as valid:
> 
>    <script><FOO/></script>

In an XHTML document, that markup above results in a DOM with an empty element named FOO as a child of that script element. There's nothing in the spec that disallows XHTML documents from having arbitrary elements as children of <script>. So in XHTML, <script><FOO/></script> is actually valid.

> Per the XML parser, the above <FOO> is an element.

Yep

> Thus: not text.

Yep

> Whereas
> the content model of <script> and <style> is (some flavour of) text.

Actually, the spec doesn't say that the content model of <script> and <style> is text. In a text/html document it is necessarily text, because that's how it's parsed into the DOM. But the spec doesn't say that it must be text in an XML document. It just says, "If there is no src attribute, depends on the value of the type attribute, but must match script content restrictions."

In some XML/XHTML documents, it's imaginable that the contents of the script element might actually be in some XML language. Maybe even XSLT or XQuery or something. Or maybe not those but anyway, it can be element content. The spec has it that way by design.

> Hence,
> the above ought to be stamped as invalid, measured as XHTML against the HTML5
> specification.

No, <script><FOO/></script> is not invalid per the current HTML spec.
Comment 2 Leif Halvard Silli 2013-09-05 00:36:01 UTC
(In reply to comment #1)
> (In reply to comment #0)

> Actually, the spec doesn't say that the content model of <script> and
> <style> is text.

You are right. But wrong. The spec says this about the content model:

“If there is no src attribute, depends on the value of the type attribute, but must match script content restrictions.”

So, in the default state - which is what I intended by this bug, the content model is _JavaScript_. And do elements belong inside the JavaScript code? As far as I know, the answer is "no". I don't think they belong in any scripting language (except perhaps one that is based on XML - like you mentioned below).

Hence, <script><foo/></script> does break the content type of that particular <script> element.

Of course, I don’t expect the validator to validate the JavaScript. But I do expect the validator to understand whether markup is mixed in with the code. Just like the validator can flag a <title> that occurs in the <body>, it can also flag an <element> that occurs inside a <script> element of type javascript.

> In a text/html document it is necessarily text, because
> that's how it's parsed into the DOM. But the spec doesn't say that it must
> be text in an XML document. It just says, "If there is no src attribute,
> depends on the value of the type attribute, but must match script content
> restrictions."

Right!

> In some XML/XHTML documents, it's imaginable that the contents of the script
> element might actually be in some XML language

Absolutely. But then the @type attribute must be something other than "text/javascript".

>. Maybe even XSLT or XQuery or
> something. Or maybe not those but anyway, it can be element content. The
> spec has it that way by design.
> 
> > Hence,
> > the above ought to be stamped as invalid, measured as XHTML against the HTML5
> > specification.
> 
> No, <script><FOO/></script> is not invalid per the current HTML spec.

It seems you are only looking at it from a well-formedness perspective, and I agree that it is well-formed, of course.

But the way I see it, since there is no @type attribute in <script><FOO/></script>, the content is, by default, JavaScript. Hence, it is not in line with the content model if an element occurs in the midst of the script.

It might be that you don’t wanna add this feature, but that does not make the request invalid.
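
The check requested in this comment could look roughly like the sketch below. It is only an illustration under stated assumptions: it uses Python's `xml.etree.ElementTree` rather than anything from the validator, `find_markup_in_scripts`, `JAVASCRIPT_TYPES` and `DOC` are hypothetical names, and the set of types treated as JavaScript is a small illustrative subset, not the spec's full list.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Assumption: a small illustrative set; "" stands for a missing type attribute.
JAVASCRIPT_TYPES = {"", "text/javascript", "application/javascript"}


def find_markup_in_scripts(xhtml_source):
    """Return (type, child_tag) pairs for element children of JavaScript scripts."""
    root = ET.fromstring(xhtml_source)
    problems = []
    for script in root.iter(f"{{{XHTML_NS}}}script"):
        script_type = script.get("type", "").strip().lower()
        if script_type not in JAVASCRIPT_TYPES:
            continue  # Possibly an XML-based language or a data block; not flagged.
        for child in script:  # Iterates over element children only.
            problems.append((script_type, child.tag))
    return problems


DOC = """<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>t</title>
<script><FOO/></script>
<script type="application/xml"><data/></script>
</head><body/></html>"""

print(find_markup_in_scripts(DOC))
```

Only the first script element is reported: it has no type attribute, so it defaults to JavaScript, while the second one declares a non-JavaScript type and is left alone.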
Comment 3 Michael[tm] Smith 2013-09-05 03:13:28 UTC
(In reply to comment #2)
> So, in the default state - which is what I intended by this bug, the content
> model is _JavaScript_. And do elements belong inside the JavaScript code?
> As far as I know, the answer is "no". I don't think they belong in any
> scripting language (except perhaps one that is based on XML - like you
> mentioned below).

You have a point there, but that's a different bug than what you described in the Summary/Description for this bug report.

> Hence, <script><foo/></script> does break the content type of that
> particular <script> element.
> 
> Of course, I don’t expect the validator to validate the JavaScript.

It'd be feasible for it to check that the syntax is valid JavaScript, actually. We already have an error-reporting JS parser (Rhino) that we're using to check the syntax of attribute values that can contain JS. So I'll probably add JS-syntax-checking support for <script> contents at some point. There may even be an open validator bug for it already. If not, feel free to raise one.

> But I do
> expect the validator to understand whether markup is mixed in with the code.

I don't expect it to. There's nothing special about the fact that it's markup as opposed to just some JS syntax error. So I don't plan to add any special checking for whether there's markup in there or not.

> Just like the validator can flag a <title> that occurs in the <body>, it can
> also flag an <element> that occurs inside a <script> element of type
> javascript.

I guess it could, if there were any special value in having it do that. But as far as I see, there's not.

> But the way I see it, since there is no @type attribute in
> <script><FOO/></script>, the content is, by default, JavaScript. Hence, it
> is not in line with the content model if an element occurs in the midst of
> the script.
> 
> It might be that you don’t wanna add this feature, but that does not make
> the request invalid.

In the Summary/Description for this bug, you didn't mention anything about checking for JavaScript. You described something else. If you want to request that we add a feature for checking that the syntax of <script> contents is valid JS, then I think you should raise a different bug for that.
Comment 4 Simon Pieters 2013-09-05 15:46:28 UTC
(In reply to comment #3)
> I don't expect it to.

The spec requirement is

[[
Whatever language is used, the contents of the script element must conform with the requirements of that language's specification.
]]

Since javascript is text rather than a tree of elements, elements should be banned.

> There's nothing special about the fact that it's
> markup as opposed to just some JS syntax error. So I don't plan to add any
> special checking for whether there's markup in there or not.

It is, actually, since execution of the script only uses child Text nodes, so child Text nodes are what you'd use for syntax-checking the script, too.

[[
If the script is inline and the script block's type is a text-based language
The value of the text IDL attribute at the time the element's "already started" flag was last set is the script source.
]]

There's also the case where <script> is being used for data blocks:

[[
When used to include data blocks (as opposed to scripts), the data must be embedded inline, the format of the data must be given using the type attribute, the src attribute must not be specified, and the contents of the script element must conform to the requirements defined for the format used.
]]

...which one could argue should allow elements, although it's not really clear.
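
The point above about the script source being built only from child Text nodes can be sketched as follows. This is a rough illustration using Python's `xml.dom.minidom`, not the validator's or any browser's implementation; `script_text` is a hypothetical helper that mimics what the text IDL attribute returns for direct children.

```python
from xml.dom import minidom, Node


def script_text(script_element):
    """Concatenate the data of child Text nodes, skipping elements and comments."""
    return "".join(
        child.data
        for child in script_element.childNodes
        # CDATA sections are a kind of Text node in the DOM, so they are included.
        if child.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE)
    )


doc = minidom.parseString(
    "<script>var a = 1;<FOO>not script</FOO> var b = 2;</script>"
)
source = script_text(doc.documentElement)
print(source)
```

The FOO element and its contents never reach the script source, which is why an element child is a different kind of problem from an ordinary syntax error in the text.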
Comment 5 Leif Halvard Silli 2013-09-05 16:09:43 UTC
(In reply to comment #4)

That’s the best bug reopening you have ever seen!

> ...which one could argue should allow elements, although it's not really
> clear.

I agree.