24451 – editorial comments on LCWD

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Summary: editorial comments on LCWD

Status:	RESOLVED FIXED

Alias:	None

Product:	HTML WG
Classification:	Unclassified
Component:	HTML/XHTML Compatibility Authoring Guide (ed: Eliot Graff)
Version:	unspecified
Hardware:	PC Linux

Importance:	P2 normal
Target Milestone:	---
Assignee:	Leif Halvard Silli
QA Contact:	HTML WG Bugzilla archive list

Reported:	2014-01-31 05:20 UTC by Liam R E Quin
Modified:	2014-05-27 20:52 UTC
CC List:	5 users

Description Liam R E Quin 2014-01-31 05:20:02 UTC

There are a lot of comments here, but I think they are mostly or all editorial, except for a Process comment at the end, so I have sent them all in one comment. If you prefer I can separate them into multiple Bugzilla entries.

This is a useful and good document - I'm pleased to see it move forward, but I found quite a few minor typos and some slightly confusing passages, as noted...



Status of this Document

Please don't refer to "legacy XML" - I think you mean just "XHTML 1.x".
XML in general is not deprecated by W3C.

"this recommendation" - it's not yet a W3C Recommendation, although I do hope it becomes one!


2.1 Principles

s/requiremetn/requirement/


3.1 Processing instructions and 3.2

Forbidding the XML Declaration - 3. says, "character encoding MAY be left undeclared in XML" but 3.1 forbids the XML declaration, which is where an encoding would be declared in XML. The document goes on to clarify, but I suggest changing

"As such, character encoding MAY be left undeclared in XML with the result that UTF-8 is still supported"

to

"Documents served with an XML content type therefore do not need to use any of the HTML encoding declaration methods, although if the document might be interpreted as text/html it SHOULD do so."

However, the green NOTE further down restates this, so just removing the "As such" sentence would also be fine.

Further down you note that the I18N WG recommends [that one] always include an encoding declaration, which is helpful but may leave the reader confused as to whether this applies to HTML or to XHTML.


3.3 The DOCTYPE


The note that the string may be in mixed case or uppercase letters and still be well-formed XML is perhaps confusing since it starts talking about valid xml and then, later in the same sentence, moves to well-formed XML.

Suggest,

Note
For valid XML the document element named in the document type declaration must exactly match the top-level element of the document, including in case.  This rule is relaxed for well-formed, rather than valid, XML documents. Since XHTML requires a lower-case <code>html</code> element, Polyglot documents <rfc>should</rfc> use lower-case <code>html</code> for the element named in the DOCTYPE declaration.


but not sure if it's worth the extra length.
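
As a rough illustration of the valid versus well-formed distinction (not proposed spec text): the sketch below uses Python's standard-library ElementTree purely as a convenient non-validating XML parser. The helper name root_tag and the sample documents are made up for the example. A DOCTYPE whose name differs in case from the root element is still accepted, because well-formedness does not check that match; only a validating parser would complain.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def root_tag(markup: str) -> str:
    """Parse with a non-validating XML parser and return the root element's tag."""
    return ET.fromstring(markup).tag


# DOCTYPE name matches the root element, including case.
matching = (
    f'<!DOCTYPE html><html xmlns="{XHTML_NS}">'
    "<head><title>t</title></head><body/></html>"
)
# DOCTYPE name differs in case: not valid XML, but still well-formed.
mismatched = matching.replace("<!DOCTYPE html>", "<!DOCTYPE HTML>")

print(root_tag(matching))
print(root_tag(mismatched))
```

Both calls succeed and report the same root element, which is why the lower-case recommendation can only be a SHOULD for well-formed documents.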


It would probably be worth saying something about customized XHTML DTDs here,
with element and entity declarations inside the document type definition subset within the document, or that point to an alternate DTD.


3.4 Namespaces


In XML it is the URI, not the prefix, that is the namespace, so the first paragraph (3.4.1, [HTML5] introduces..) is, formally, meaningless.

What is meant, I think, is that the HTML 5 specification requires that HTML processors implicitly associate the prefixes html, svg and math with their respective URIs, which are as follows [...].
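
To make the prefix versus URI point concrete, here is a small sketch using Python's ElementTree as a stand-in for any namespace-aware XML parser (the prefix "shape" and the sample element are invented for the example). Three documents that use different prefixes, or no prefix at all, produce the same expanded name, because only the URI identifies the namespace.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# Same element, three different ways of binding the SVG namespace URI.
with_svg_prefix = f'<svg:circle xmlns:svg="{SVG_NS}" r="1"/>'
with_other_prefix = f'<shape:circle xmlns:shape="{SVG_NS}" r="1"/>'
default_namespace = f'<circle xmlns="{SVG_NS}" r="1"/>'

# ElementTree reports names as "{namespace-uri}local-name"; the prefix is gone.
tags = [
    ET.fromstring(doc).tag
    for doc in (with_svg_prefix, with_other_prefix, default_namespace)
]
print(tags)
```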

Regarding the paragraph, [[
Note that there are other prefixed attributes that can be used beyond xlink:href (such as xml:base). Polyglot markup does not declare these prefixes via xmlns. The prefixes are implicitly declared in XML and are automatically applied to the appropriate attributes in HTML
]]

Is this a note or is it normative? It says it's a note but does not use Note markup. Also, "such as xml:base" seems far too wishy-washy for a specification. Is foaf:email such a prefixed attribute? what about xml:id?

I _think_ what is meant is,

The "xml" namespace prefix used e.g. in xml:base, xml:lang, xml:space and xml:id
does not need to be declared in XML documents. See CSS namespaces [CSS3NAMESPACE] for how to use CSS selectors with these attributes.

The following paragraph seems to be attempting to say this.

I don't think the "Note" means to say anything about attributes associated with namespace URIs other than the URI normally associated with the "xml" prefix.
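
A quick sketch of the difference between the predeclared "xml" prefix and any other prefix, again using Python's ElementTree only as an example of a namespace-aware XML parser. The helper is_well_formed and the foaf sample attribute are assumptions made for the illustration.

```python
import xml.etree.ElementTree as ET

XML_NS = "http://www.w3.org/XML/1998/namespace"


def is_well_formed(markup: str) -> bool:
    """Return True if a namespace-aware XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# The xml: prefix is bound implicitly, so no xmlns:xml declaration is needed.
para = ET.fromstring('<p xml:lang="en" xml:space="preserve">text</p>')
print(para.get(f"{{{XML_NS}}}lang"))

# Any other prefix has to be declared before it can be used.
undeclared = '<p foaf:email="a@example.org">text</p>'
declared = (
    '<p xmlns:foaf="http://xmlns.com/foaf/0.1/" '
    'foaf:email="a@example.org">text</p>'
)
print(is_well_formed(undeclared), is_well_formed(declared))
```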

I do like the "can be sued as CSS selectors" and have contacted my attorneys already :-)


3.5.1 Required elements and tags

The first paragraph seems superfluous, but maybe it's needed for HTML people?

In the next paragraph there's an extra comma in "optional tags, may create".

s/in their code/in their markup/

Remove the extra comma in " with regard to tags, is"

3.5.1.1

"Every polyglot markup document therefore contains an html, head, title, and body element, represented in the code with their tags." -- that's true in HTML too, as the previous section just explained, although they are not represented in the "code" [please let's call it markup, not code]. Maybe you have an extra comma there just before "represented"?

s/following source code/following markup/


3.5.1.2 Required tags examples

I think this section is talking about required _elements_, not required _tags_.
Of course, in XML, the presence of an element is never inferred, so tags are always required at the start and end of element boundaries.


3.5.2 Excluded elements and tags

This should just be Excluded elements. All three XML tags (start, end, null) are used in XHTML and polyglot HTML.


Delete spurious comma in "Elements with features designed for HTML alone, are non-polyglot from the outset."

(the rationale for excluding noscript is a nonsense of course: there's also no mechanism for producing img or a or table in XML directly. But we'll let that pass)

In this section (3.5.2) you say that noscript is not allowed, then have a non-normative note that says there are other elements that are also not allowed but which you do not list. Since this is non-normative, how should the reader know which elements have features designed for HTML? I'd say "a" and "img" are the obvious candidates, but surely you don't mean these?


3.5.3.1 Element names.

"Polyglot markup uses the correct case for element names."

I think this sentence translates to, "conforming documents conform to this
specification", and can be deleted. I'd suggest making the bullet list that follows it be three simple paragraphs instead.

3.5.3.2 Attribute names

Again, since no conforming document could use an "incorrect" case, I'd delete the first sentence, and maybe promote the bullet list items to paragraphs.


3.6 Element Contents

The term strictly speaking in both SGML and XML is "Element content",
although I think everyone will understand "Element contents" not to be a reference to an element called Contents :-)

3.6.1 "Example: Polyglot markup uses the minimized tag syntax for void elements"

It uses the empty element tag syntax.

You (mis)use the "minimized form" term again in the Example and, confusingly, use the undefined term "self-closing" in the note. Please either use the same term in all places or define all the terms.

3.6.2 Raw text elements

XML does not have "comment tags" or "cdata tags". SGML does have CDATA elements, but that's not what you mean here.

A better way to put it is that in HTML the content of the script and style elements is treated as if it were CDATA, so that & and < are not special except when they occur as the end tag to close the element.
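
The difference can be sketched with Python's standard library, used here only as a stand-in for "an HTML tokenizer" and "an XML parser"; the class TextCollector and the helper names are invented for the example. The HTML side hands the script content over untouched, the XML side rejects the same bytes, and wrapping the content in a CDATA section makes the XML side see the same text.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

BODY = "if (a < b && c) { run(); }"
SCRIPT = f"<script>{BODY}</script>"
CDATA_SCRIPT = f"<script><![CDATA[{BODY}]]></script>"


class TextCollector(HTMLParser):
    """Collect the character data an HTML tokenizer reports."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_script_text(markup: str) -> str:
    collector = TextCollector()
    collector.feed(markup)
    collector.close()
    return "".join(collector.chunks)


def xml_accepts(markup: str) -> bool:
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


print(html_script_text(SCRIPT))          # raw text, & and < left alone
print(xml_accepts(SCRIPT))               # bare < and & are not well-formed XML
print(ET.fromstring(CDATA_SCRIPT).text)  # CDATA gives XML the same text
```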

The "As a result" paragraph doesn't seem to add anything except suggesting that the editor of this document prefers HTML in some way :-)

In the last column of the table, </script and </style should have the same description as for HTML - they terminate the corresponding element.


3.6.2.2.1 Safe CDATA usage rules

s/These rules assumes that CDATA is of limited use for CSS./These rules assume that CDATA is of limited use for CSS and therefore focus on JavaScript used with the script element./

HTML's restrictions on <script>/<style> -- probably you should say what they are, and I suggest using an "and" instead of a virgule/slash here, as it looks like part of markup syntax.

"Before the CDATA section there can only be one node - preferably only one line of code" -- by code here do you mean JavaScript code? There aren't any nodes at all in an XML document, nor in an HTML document until it's parsed, and then you get nodes in the DOM representation (XML systems mostly don't use DOM at all). So I don't understand this phrase.

EXAMPLE 12

has a </script> but no <script>, is that intended?

"Disadvantage: Less safe for templating since the comment could become treated as part of the template." I think this needs an explanation. Are you referring to XSLT templates here?

You probably need an example in which the string ]]> occurs as part of the
text, to demonstrate how to handle it.

You may want to mention the problem of CDATA injection in which a malicious user creates data that looks like ]]> nasty stuff here <![CDATA[
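
A minimal sketch of the injection problem, seen from the XML side only. The function names naive_wrap and split_cdata_end and the payload are hypothetical. The first parse shows untrusted data breaking out of the CDATA section and creating an element; the second applies the usual XML-only workaround of splitting every "]]>" across two CDATA sections. Note the assumption: that workaround produces more than one CDATA section, which the "only one CDATA section per raw text element" rule quoted later in this thread would not permit, and an HTML parser would see the markers as literal script text, so it is shown to explain the problem rather than as polyglot advice.

```python
import xml.etree.ElementTree as ET


def naive_wrap(data: str) -> str:
    """Wrap data in a CDATA section without checking its content."""
    return f"<script><![CDATA[{data}]]></script>"


def split_cdata_end(data: str) -> str:
    """Split each ']]>' so that it never terminates the CDATA section early."""
    return data.replace("]]>", "]]]]><![CDATA[>")


payload = "x]]><injected/><![CDATA[y"

# Unchecked data closes the CDATA section and smuggles in an element.
naive = ET.fromstring(naive_wrap(payload))
print([child.tag for child in naive])

# After splitting, the parser sees only character data again.
safe = ET.fromstring(naive_wrap(split_cdata_end(payload)))
print(len(safe), safe.text == payload)
```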


3.6.3 Escapable raw text elements


delete spurious comma after "permitted"

you could also delete the comma after "safe text content"

s/permittd/permitted/


3.6.5 Normal Elements

add a missing comma after iframe element to end the parenthetical clause in "Normal elements have no special restrictions other than those that normally apply to polyglot markup. But note that some elements, such as the iframe element must be empty"

When you say these elements must be empty,
1. which elements exactly?
2. do you mean EMPTY, using the empty element tag syntax <iframe/> ?
3. If not, what do you mean?


3.7.1 newlines

You probably need to explain that the problem is that HTML/SGML-based systems will delete the initial newline on parsing, but XML parsers will not.

3.8 Attributes

"the literal character '\t'" -- that's actually four characters. Do you mean a literal tab character or do you mean that in HTML one can use \t to represent a tab? (I have no idea which you mean)

It might be worth noting that JavaScript and CSS in attribute values are affected by attribute value normalization, because a comment will end up commenting out not to the end of the source line but to the end of the entire attribute value. (whether CSS has comments to end of line is up for debate, but browsers behave as if it does, which is all most authors care about)
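
For illustration, here is what XML attribute value normalization does to literal whitespace, sketched with Python's ElementTree as the XML parser (the helper attr and the sample attributes are made up). A literal tab or newline inside the attribute value comes out as a space, a character reference survives, and a "//" comment in an event handler therefore swallows the code that used to be on the next line.

```python
import xml.etree.ElementTree as ET


def attr(markup: str, name: str) -> str:
    """Return the attribute value as an XML parser reports it."""
    return ET.fromstring(markup).get(name)


# A literal tab character in the source is normalized to a space.
literal_tab = attr('<a title="x\ty"/>', "title")
# A character reference to the tab is preserved.
char_ref_tab = attr('<a title="x&#x9;y"/>', "title")
# The newline after the comment becomes a space, so run() is commented out.
handler = attr('<a onclick="// note\nrun()"/>', "onclick")

print(repr(literal_tab), repr(char_ref_tab), repr(handler))
```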


In 3.8.1 Disallowed attributes you say that xml:space and xml:base are not allowed in HTML but are allowed on SVG and MathML elements - do you mean, even when those SVG or MathML elements occur within HTML documents? (if so, you should probably say so; as it stands it could be taken to mean that they are allowed by those specs but not when SVG or MathML are used inside HTML)


3.8.3.1 The id attribute

Note that for valid XHTML the value of every id attribute must be unique within the document and must be a legal XML name, starting with a letter.

[[
Polyglot markup always uses character references for the less than sign (<) and ampersand (&) when they are used as characters, except when those characters appear inside a CDATA section.
]]
s/ inside a CDATA section/ inside a CDATA section or a comment/


3.10 Comments

"Polyglot markup does not begin a comment with either ">" or "->". "
That's good because neither HTML nor XML do this - they use <! and <!-- respectively.


3.11.1 s/XHTM/XHTML/

3.11.2 CSS

I think the example at the start should be [attr]{property:value;}

Remove spurious comma in "required by polyglot markup, are namespaced"

[[
 As result, a selector such as [xmlns]{rule:foo} will only work in HTML – it will not work in XHTML, where it is a namespace attribute.
]]

The selector is not a namespace attribute. I think you mean, where the attribute has an associated namespace.

[[
And the same goes for prefixed attributes – even if one escapes the colon ([xml\:lang]{rule:foo}), such selectors will only work in HTML, except that for the namespace declaration for the xlink: prefix, then it works like in XML even in the HTML syntax and must thus be selected in a namespaced way in both syntaxes.
]]

This sentence is confusing for me and hard to read. Part of the problem is that the editor seems unaware of the distinction between a prefix, a namespace and a namespace URI, but most of the problem is that it's a run-on sentence. "it works like in XML" -- what works "like in XML"? Suggest rewriting as multiple sentences. I can't comment on correctness because I don't understand it, sorry.

I think this section overall is good and correct, but needs a slight polishing. Hey, it's a draft :-)

3.12 Templating restrictions

This section appears to be empty.


*

What is the relationship between Polyglot and XML 1.1? Is NEL allowed in whitespace in HTML? What about C0 and C1 controls?

*

Please remember to send a formal request to the XML Core Working Group to review this document; they/we may decline, or may accept these (personal) comments and endorse them, or do something else, but they must obviously be consulted just as the XML Working Group would consult the HTML Working Group in similar circumstances.

Thank you, and thank you for working on this important and helpful document.

Comment 1 Eliot Graff 2014-02-02 00:24:13 UTC

Thanks, Liam. I'll start working through these.

Comment 2 Eliot Graff 2014-02-02 01:20:42 UTC

Done:

Please don't refer to "legacy XML" - I think you mean just "XHTML 1.x".
XML in general is not deprecated by W3C.
     Changed to XML 1.x

"this recommendation" - it's not yet a W3C Recommendation, although I do hope it becomes one!
     Changed to "this document"

3.12 Templating restrictions
This section appears to be empty.
    Deleted header

2.1 Principles
s/requiremetn/requirement/
     Can't repro. I only see "requirement"

     Changed
"As such, character encoding MAY be left undeclared in XML with the result that UTF-8 is still supported"
     to
"Documents served with an XML content type therefore do not need to use any of the HTML encoding declaration methods, although if the document might be interpreted as text/html it SHOULD do so."

     Changed
The W3C Internationalization (i18n) Group recommends to always include a visible encoding declaration in a document...
     to
The W3C Internationalization (i18n) Group recommends that one always include a visible encoding declaration in an HTML document...

More to come later....

Comment 3 Eliot Graff 2014-02-03 22:33:41 UTC

3.3 The DOCTYPE
     Changed
The string html SHOULD be in lowercase letters, in order to be both well-formed and valid XML; however, the string MAY be in mixed case or uppercase letters and still be well-formed XML. 
     to
For valid XML the document element named in the document type declaration must exactly match the top-level element of the document, including in case. This rule is relaxed for well-formed, rather than valid, XML documents. Because XHTML requires a lower-case html element, Polyglot documents SHOULD use lower-case html for the element named in the DOCTYPE declaration. Bear in mind that a customized XHTML DTD with element and entity declarations inside the document type definition subset within the document, or one that points to an alternate DTD, may have special case requirements. 

3.4 Namespaces
     changed
Note that there are other prefixed attributes...
     to a Note

Edited 4.4.2 Attribute-level namespaces to incorporate feedback.

Sadly, I cannot repro the "sued" typo, as I thought that was one of the funniest things I've heard.

3.5.1 Required elements and tags
   Fixed superfluous commas
     changed
s/in their code/in their markup/

Will pick it back up with 3.5.1.1

Comment 4 Eliot Graff 2014-02-06 00:35:55 UTC

3.5.1.1
     Edited to read:
Every polyglot markup document therefore contains an html, head, title, and body element. ... Therefore, the following is the most basic polyglot markup document. 

3.5.1.2 Required tags examples
     Changed "tags" to "element"

3.5.2 Excluded elements and tags
     Changed to "Excluded elements"

Still need to get to the list of elements in 3.5.2

Comment 5 Liam R E Quin 2014-02-06 14:07:38 UTC

Eliot, I'll respond formally when you've finished :-) but I just want to thank you for being so responsive.

Comment 6 Eliot Graff 2014-02-19 00:57:17 UTC

3.5.2
     Changed the content to read:
Polyglot markup does not use the noscript element, because the noscript element cannot be used in XML documents. [HTML5] 

Note   Polyglot markup should not use any elements excluded from HTML, XHTML, or both. For example, including any of the elements listed in <a>Non-conforming features within a document</a> increases the risk of that document not being polyglot markup. 

3.5.3.1 Element names. AND 3.5.3.2 Attribute names
     Rewrote per your insightful suggestions.

3.6 Element Contents
     Changed to 
Element Content
     Also     
Rewrote section content so that "empty element tag syntax" is used correctly and throughout.

3.6.2 Raw text elements
   Rewrote to:
In polyglot markup, the contents of all elements listed as raw text elements in the HTML specification or in an extension spec, MUST conform to the extra requirements defined in this section. 

HTML5 defines the following raw text elements: 

script, style  

In HTML, the content of the script and style elements is treated as if it were CDATA, so that & and < are not special except when they occur as the end tag to close the element. In XHTML, however, the same elements are treated as tags, character references, CDATA, etc. 
   Also,
Changed two values in the table as suggested.


Will pick up next time with 3.6.2.2.1 Safe CDATA usage rules

Comment 7 Eliot Graff 2014-03-19 23:58:17 UTC

EDITOR'S RESPONSE: This is an Editor's Response to your comment. If
you are satisfied with this response, please change the state of
this bug to CLOSED. If you have additional information and would
like the Editor to reconsider, please reopen this bug. If you would
like to escalate the issue to the full HTML Working Group, please
add the TrackerRequest keyword to this bug, and suggest title and
text for the Tracker Issue; or you may create a Tracker Issue
yourself, if you are able to do so. For more details, see this
document:

http://dev.w3.org/html5/decision-policy/decision-policy.html

Status: Partially (mostly) Accepted
Change Description: See comments. CVS Commit 1.31
Rationale: Varied, and explained in comments

Would appreciate any further feedback you have. Smaller bugs would be better, though. ;-)

The XML Core Working Group has reviewed the document and decided to give no comments.

Liam,

Thank you so much for taking the time to go through this spec in such detail and with such thought. We really appreciate your time and effort.

Eliot

3.6.2.2.1 Safe CDATA usage rules
Can't repro the first two editorial requests.
Commented out (in source) the two bullet points about nodes, to give Leif time to respond. If he says OK, will remove them entirely.

EXAMPLE 12
has a </script> but no <script>, is that intended?
Added opening <script>
Disadvantage is removed, as we're leaving usage advice for another venue. Therefore, we'll leave off the problem of CDATA injection.

3.6.3 Escapable raw text elements
edits incorporated

3.6.5 Normal Elements
Removed commas, used parentheses instead
Reworded and retitled section:
4.6.5 Special elements
Unless otherwise specified, elements have no special restrictions other than those that apply to all polyglot markup.

The iframe element has restrictions in polyglot markup, because the HTML specification sets special restraints on iframe in XML documents. [HTML5]

3.7.1 newlines
Incorporated suggested text.

3.8 Attributes
Changed sentence to:
For example, within an attribute's value, polyglot markup uses &#x9; for a tab rather than the verbatim string literal, \t. This is because of <link>attribute normalization</link> in XML [XML10]. Note, too, that JavaScript and CSS in attribute values are affected by attribute value normalization, because a comment ends up commenting out not to the end of the source line but to the end of the entire attribute value.

3.8.1 Disallowed attributes
Rewrote note to say:
Note that the xml:space and xml:base attributes <strong>are</strong> allowed on SVG and MathML elements. The attributes may therefore appear in polyglot markup when they appear within SVG or MathML as foreign content.

3.8.3.1 The id attribute
Added:
Polyglot markup ensures that every id attribute must be unique within the document and must be a legal XML name, starting with a letter. [XML10]

4.9 Named entity references
changed to:
however for CDATA inside foreign content, strings within comments, and for safe CDATA, the following rules apply:

3.10 Comments
Changed to
Polyglot markup begins a comment with either "<!" or "<!--". Polyglot markup does not begin a comment with either ">" or "->".

3.11.1 s/XHTM/XHTML/

3.11.2 CSS
s/[attr]{Foo:value;}/[attr]{property:value;}
comma removed

Changed the paragraph in question to the following:
However, some of the attributes required by polyglot markup are namespaced. Some are namespaced by default, such as the xmlns attribute. Some attributes are namespaced by a prefix that is namespaced by default, such as xml:, xmlns:, and xlink:. In addition, extension specs may allow namespaced attributes other than those defined by the HTML specification. As result, a selector such as [xmlns]{rule:foo} will not work in XHTML, where the attribute has an associated namespace. The same is true for prefixed attributes. Even if one escapes the colon ([xml\:lang]{rule:foo}), such selectors will only work in HTML (except for the namespace declaration for the xlink: prefix. This works in XML and in HTML and must thus be selected in a namespaced way in both syntaxes).

Comment 8 Liam R E Quin 2014-03-20 01:33:03 UTC

Thank you for taking the time to go through all these. I did offer to submit them as separate comments :-)

Marking as CLOSED - changes accepted with thanks.

Comment 9 Leif Halvard Silli 2014-03-20 02:58:30 UTC

(In reply to Liam R E Quin from comment #8)


> 3.6.2.2.1 Safe CDATA usage rules

> "Before the CDATA section there can only be one node - preferably only one
> line of code" -- by code here do you mean JavaScript code? There aren't any
> nodes at all in an XML document, nor in an HTML document until it's parsed,
> and then you get nodes in the DOM representation (XML systems mostly don't
> use DOM at all). So I don't understand this phrase.

I believe it is essential that the text says "node". This was deliberate. DOM equivalence is one of the most important things we look for in polyglot markup.

The part of the spec you ask about has a parsed XHTML DOM in mind. To say "line of code" would be a huge loophole. The script element is not only for JavaScript, for instance.

Parsed as HTML, a script element has only a single text node - always. Parsed as XML, if the script element contains correctly closed tags, comments, cdata etc, then each such "thing" will result in a node. Thus in XHTML we can have multiple nodes. Whereas in HTML we only have one node.

In polyglot markup, while the ideal is equality between HTML and XML, we must sometimes operate with inequality. In which case, we must define the limits of the inequality. 

And so for the script element, the goal is to have the same number of nodes. For instance this script element, since it contains no whitespace outside the CDATA node, has but one node, whether parsed as HTML or as XML:

<script><![CDATA[foo]]></script>

But, because we may need to comment out the CDATA ”start tag” and ”end tag”, we should allow one node before and after the CDATA section, like so:

<script>/*<![CDATA[*/
        foo 
/*]]>*/</script>

The above example has 3 nodes in XML: One text node before and after the CDATA section - and the CDATA section.  And spec thus says that <script> should not have more than 3 nodes. 
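
The XML side of this node count can be checked with a short sketch. It uses Python's xml.dom.minidom purely as an example of an XML parser that builds a DOM and keeps CDATA sections as separate nodes; the helper child_node_types is a made-up name. The HTML side (a single text node in every case) is not reproduced here, since the Python standard library has no HTML DOM builder.

```python
from xml.dom import minidom


def child_node_types(markup: str) -> list:
    """Parse as XML and list the DOM node types of the root element's children."""
    root = minidom.parseString(markup).documentElement
    return [node.nodeType for node in root.childNodes]


bare = "<script><![CDATA[foo]]></script>"
commented = "<script>/*<![CDATA[*/\n        foo \n/*]]>*/</script>"

# Node.TEXT_NODE is 3 and Node.CDATA_SECTION_NODE is 4.
print(child_node_types(bare))       # a single CDATA section node
print(child_node_types(commented))  # text node, CDATA section node, text node
```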

I hope with this explanation you understand my choice of the word "node". 

Hence I propose that we keep the section as it is/was, when you opened the bug. However, I am open to state something about what we mean about node and why we talk about nodes.

I reopen until we get the old text back, eventually with improvements.

Comment 10 Liam R E Quin 2014-03-20 04:46:39 UTC

Leif, thanks for your detailed response.

My comment was not about the intent, I'm fine with that, but just about the phrasing. There are no nodes in a text file. There are no nodes in an XML document, nor in an HTML document, before you parse it. So it doesn't actually make sense to say there must be no nodes before a CDATA section, because there will never be any nodes before a CDATA section.

The nodes are constructed by some (not all) XML parsers and HTML parsers (e.g. a validator might stream the document and never construct an in-memory tree). The nodes are a result of parsing, and are not in the input document.

So, "A CDATA section must appear at the start of its containing element, and hence be the first child of that element" would be fine, as would a note talking about DOM nodes.

The point of this spec is to be a bridge between two worlds; yes, you're right, we have to compromise, but it's best to be clear and to write text that actually applies to both worlds, not, through accident, text that applies to neither world :-)

Thanks.

Comment 11 Leif Halvard Silli 2014-03-20 07:05:41 UTC

(In reply to Liam R E Quin from comment #10)
> Leif, thanks for your detailed response.

> The point of this spec is to be a bridge between two worlds; yes, you're
> right, we have to compromise, but it's best to be clear and to write text
> that actually applies to both worlds, not, through accident, text that
> applies to neither world :-)

Be aware that "element" and "CDATA", just like "nodes", are things that do not exist until parsing, as I see it.

What I gather is that you think the current text can be misunderstood by people from ”the XML world” - they might not think in "nodes".  So, OK, I will try to think up something that is better and maybe move the node talk to a note - as you hinted.

Comment 12 Eliot Graff 2014-03-26 21:44:32 UTC

I believe that I understand both of your positions here, so I have made the following changes to section 4.6.2.2.1:

****************************************

General rules:
 • The CDATA section is subject to HTML’s restrictions on <script> and <style>.
 • There can be only one CDATA section per raw text element.
 • A CDATA section must appear at the start of its containing element, and hence be the first child of that element. 
    ◦ Before the CDATA section there can only be content that creates one node - preferably only one line of code - which may consist of whitespace, an XML comment, or a construct of the scripting/styling language (usually a comment of the scripting/styling language).
    ◦ After the CDATA section there can only be content that creates one node - preferably only one line of code - which may consist of whitespace, an XML comment, or a construct of the scripting/styling language (usually a comment of the scripting/styling language).

Note
The statement that a "CDATA section must appear at the start of its containing element, and hence be the first child of that element," is due to how parsers may create DOM nodes based on characters and whitespace. The following script element, because it contains no whitespace outside the CDATA node, has one node, whether parsed as HTML or as XML: 
   <script><![CDATA[foo]]></script> 
Because an author may need to comment out the CDATA "start tag" and "end tag," polyglot markup allows for one node before and after the CDATA section. The following example has three nodes: one text node before the CDATA section, one for the CDATA section, itself, and one after the CDATA section: 

Fig. 5 CDATA section that is commented out, resulting in a total of three DOM nodes.

Example 12
<script>/*<![CDATA[*/
    foo 
    /*]]>*/</script>

Thus, for polyglot markup, a CDATA section must appear at the start of its containing element, and hence be the first child of that element. 

****************************************

I'll resolve this as fixed, but please let me know if this text needs to be modified in any way.

Comment 13 Liam R E Quin 2014-03-26 21:50:28 UTC

Eliot, I'm happy with that careful wording - thank you!

Comment 14 Leif Halvard Silli 2014-03-27 02:53:42 UTC

(In reply to Eliot Graff from comment #12)

The wording seems fine to me. But I have one small gripe with the very last sentence, see below.


> Fig. 5 CDATA section that is commented out, resulting in a total of three
> DOM nodes.
> 
> Example 12
> <script>/*<![CDATA[*/
>     foo 
>     /*]]>*/</script>
> 
> Thus, for polyglot markup, a CDATA section must appear at the start of its
> containing element, and hence be the first child of that element. 

The last sentence is some kind of a summary, which can be a nice thing to have, although it sometimes creates confusion too, whenever such a summary doesn’t fully support the previous text.

The sentence currently sounds as if the code in Fig. 5 is not permitted in polyglots. The problem is the word "child", which to me is synonymous with "node". Thus, in Fig. 5, the CDATA section will be the second child, while the summary says it should be the first child.

I think perhaps we should forget the wording "first child":

[[
  Thus, for polyglot markup, a CDATA section must appear at the start of its containing element and must span the entire element. It should be the only node of the element. However, as described above, content that results in a single DOM node before and/or after the CDATA section is permitted.
]]

Comment 15 Liam R E Quin 2014-03-27 03:32:29 UTC

I'm Ok also with the change suggested in comment 14, even though "node" and "child" are not for me synonyms :-) (node is about a particular representation and child about the abstract tree).

Comment 16 Eliot Graff 2014-03-27 16:30:37 UTC

OK, hopefully, this will bring us to a single solution, even if we may generate three nodes of thinking*

(*Sorry, geek humor)

Anyway, I've rewritten the concluding sentences of the note so that the whole note now reads as follows. If this is acceptable, Leif, can you please close the bug. If you see the need for further edits, just let me know in the bug and reassign it to me and I'll take care of them right away.

Thanks, gentlemen!!


***********************************************

NOTE
The statement that a "CDATA section must appear at the start of its containing element, and hence be the first child of that element," is due to how parsers may create DOM nodes based on characters and whitespace. The following script element, because it contains no whitespace outside the CDATA node, has one node, whether parsed as HTML or as XML: 

 <script><![CDATA[foo]]></script> 

Because an author may need to comment out the CDATA "start tag" and "end tag," polyglot markup allows for one node before and after the CDATA section. The following example has three nodes: one text node before the CDATA section, one for the CDATA section, itself, and one after the CDATA section: 

Fig. 5 CDATA section that is commented out, resulting in a total of three DOM nodes.

Example 12
<script>/*<![CDATA[*/
    foo 
    /*]]>*/</script>

Thus, a CDATA section may appear at the beginning of its containing element, span the entire element, be the only node of the element, and yet still generate more than one DOM node. Polyglot markup therefore permits content that results in a single DOM node before and/or after the CDATA section.

Comment 17 Liam R E Quin 2014-03-27 19:27:08 UTC

Sorry to do this to you! but...

"be the only node of the element" isn't actually technically correct - not being pedantic here, but because the *only* nodes in the picture are DOM nodes (XML files do not have nodes and neither do HTML files), so you have written, in effect,
Even when there is only one DOM node in the constructed tree there may still be more than one DOM node in the constructed tree.

I don't see how to implement or test for that assertion.

So what you want instead is,
[[
Because an author may need to comment out the CDATA "start tag" and "end tag," polyglot markup allows plain text before and after the CDATA section. The following example generates three DOM nodes: one text node before the CDATA section, one for the CDATA section, itself, and one after the CDATA section:

Fig. 5 CDATA section that is commented out, resulting in a total of three DOM nodes.

Example 12
<script>/*<![CDATA[*/
foo
/*]]>*/</script>

]]

The contradictory "This,..." statement should just be removed:
[[
Thus, a CDATA section may appear at the beginning of its containing element, span the entire element, be the only node of the element, and yet still generate more than one DOM node. Polyglot markup therefore permits content that results in a single DOM node before and/or after the CDATA section.
]]
as it tries to restate something but ends up not quite correct. I don't think we need to come to agreement on the "Thus" paragraph if we've agreed on the text before the example (except I added the word DOM in front of node, and "generate", to try and reduce the confusion).

Thanks.

Comment 18 Leif Halvard Silli 2014-03-27 19:46:45 UTC

(In reply to Liam R E Quin from comment #17)

> The contradictory "This,..." statement should just be removed:
> [[
> Thus, a CDATA section may appear at the beginning of its containing element,
> span the entire element, be the only node of the element, and yet still
> generate more than one DOM node. Polyglot markup therefore permits content
> that results in a single DOM node before and/or after the CDATA section.
> ]]
> as it tries to restate something but ends up not quite correct. I don't
> think we need to come to agreement on the "Thus" paragraph if we've agreed
> on the text before the example (except I added the word DOM in front of
> node, and "generate", to try and reduce the confusion).
> 
> Thanks.

I think you are right. One could just remove that paragraph.

Comment 19 Leif Halvard Silli 2014-03-28 00:18:30 UTC

(In reply to Liam R E Quin from comment #17)

> The contradictory "This,..." statement should just be removed:
> [[
> Thus,

Btw, what I said in comment #18 was based on the assumption that you by "This,..." really meant "Thus,...".

Comment 20 Eliot Graff 2014-05-27 20:52:41 UTC

Removed the ambiguous paragraph as determined in comment 17. CVS version 1.37

Thanks!!!