Ext element

From SVG

From WHATWG Wiki:

Note: proposal by Doug Schepers, comments by Ian Hickson.
Each example should explain in details (ideally with examples) how to handle:
* Syntax errors at the tokeniser level, the tree construction level, and the schema level.
* Existing content that happens to use elements or syntax that you are proposing have special 
  processing rules.
* Pages that contain any special syntax after that syntax was copied and pasted by an 
  ignorant Web author from a valid page written by a competent Web author aware of the new syntax.


Extensibility Element

This is a proposed generic extensibility point, for SVG and possibly MathML or other XML content, with the temporary placeholder name of <ext> (the real element name would be some token that doesn't clash with existing content). Inside you can use XML or another format. Naturally, any content placed in an <ext> element would have to be understood by the UA in order to render correctly, and more complex rules may need to be developed for specific kinds of interaction between the root document and the inline content, such as with script, CSS, etc. For some discussion, see the IRC logs.

 <p>Hello world. 
    <ext type="image/svg+xml">
       <svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 10 10">
          <circle x="5" y="5 r="5" stroke="green"/>
       </svg>
    </ext> 
 </p>

We should define a content model for where the <ext> element can occur, and if there are implications for different locations (such as inside a table, a paragraph, the head, etc). The simplest thing, at least for SVG (and probably MathML), would be that it would have the same restrictions as an <img> element. Also, there should be a default block model for <ext> in CSS.

Please define this in enough detail that I can construct a tokeniser and tree constructor from the description. Ideally, just provide the actual tokeniser and tree constructor that you are proposing. This is what it looks like in the HTML5 spec (sections 8.2.3 and 8.2.4). It's probably easiest to define it as a delta from what the spec has today. Perhaps easier for you. :) Since you are the expert in the tokenizer and the tree construction algorithm, I ask you to extrapolate from what we've already contributed. If you require such precision of detail for all proposals, then the deadline should be expanded for submission, as 12 days from your initial announcement is insufficient time to make a complete proposal; I suggest that 2 months is more appropriate. -Shepazu

This first came up more than 2 years ago, and I've been asking for detailed proposals for most of that time, including repeatedly over the past few weeks; there's been plenty of time for proposals. Not quite accurate. In the past, you have said that you did not see this in scope for HTML5, but rather for HTML5.1 or such. If you are moving up the timeframe, that's great, but I see no reason to create the rush for this at the present time; the spec is not expected to be stable for at least a year and a half, at the most optimistic. -Shepazu

Regarding extrapolation: I've already tried to extrapolate, and I can't find an extrapolation that works with existing Web content (for example, all the possibilities I've tried end up with the problem I've listed below under "Why it won't work"). That's why I'm asking for details.


Notes:

  • This is similar to IE's "XML islands" with the <xml> element. It's believed that there are some conflicts with the <xml> element itself, since it creates a separate document that is tied to the <xml> element in the DOM, but more research is needed. See also Using XML Data Islands in Mozilla.
  • The <ext> element could potentially be an implicit element, generated by the HTML5 parser on encountering a start tag of e.g. "<svg " or "<math ". That would save authors of having to type this extra element, but has a drawback in that it doesn't provide fallback content for legacy UAs. -Ed
  • We could specify exactly what flavors of markup must be supported by a UA, and which may be supported by a UA. This would be rather restrictive, but could improve interoperability of UA features, and would ensure that the proper DOM interfaces are available. For example, SVG and MathML must be supported, and FooML may be supported (or something).

Error Handling

Notes:
The main options seem to be:
# strict XML parsing (not favored by many)
# very permissive error handling (as in HTML5); this idea is controversial 
  and has many open issues, which should be detailed below 
# moderate error handling, as detailed in SVG Tiny 1.2 
# other ideas?

The chief risk with permissive error handling is that it would create 
content that is not compatible across different UAs, including mobile 
devices and authoring tools.

Proposal:

  • Tree construction recovers from errors by closing the <svg> element, and not rendering any content after the error. What are you considering an error in tree construction?
  • Case folding is not supported within the main body of the <ext> element, though it would be within the <fallback> element.
  • The tree builder would assign the appropriate namespace URI to the element and attribute nodes it creates.
  • Unknown attributes and elements are ignored. What elements/attributes are known?
  • Unquoted attribute values will be ignored (should the element also not be rendered?)
  • If the "/>" is not found at the end of an element, all subsequent element will be placed as child elements of the element (and thus not rendered) until a matching closing tag is found, or until the a matching root tag is found, or until the "</ext>" element is found. Does this mean that <ext> introduces a new scope? e.g. what does the DOM look like for this?:
    <table><tr><td><ext></table></ext>A
    • If a matching closing tag is found, the element is closed, and subsequent elements are rendered as normal.
    • If a matching closing root tag is found, the element is closed, the root tag is closed, and any subsequent elements are ignored if outside a root tag.
    • If a matching closing <ext> tag is found, the element is closed, the root tag is closed, the <ext> tag is closed, and HTML processing continues as normal.

One question is how to find the matching tag. Which parsing mode is active after 2-open/1-close:

<ext><ext></ext>

Notes (not sure this fits in the above proposal):
* Tokenizer recovers from errors by ignoring subsequent content 
  between the error and the closing <ext> tag, closing the <svg> 
  element and <ext> element, and moving on with normal HTML 
  processing; for SVG, any element with errors is not rendered.


See the following emails by Henri Sivonen for comparison and contrast:

Embedded HTML

The case of content inside a <foreignObject> element could be subject to the parsing model of the root document. (Note that this is only a partial solution, and more thought and details are needed.)

For content outside <foreignObject>, it should follow the XML processing rules.

Fallback Behavior

This is an opportunity to get nice fallback behavior, as well.

Here's a possible suggestion, where the raster image would show in UAs that didn't support the <ext> syntax, and the SVG would show in those that did (and which support SVG). In UAs which support <ext> and not SVG, the fallback would also be the raster. The fallback content should be inside a wrapper element (<fallback>), so that you can have rich fallback options, such as an image map, a table, <canvas> and an accompanying <script> element, or whatever; in this case, I also include fallback CSS to hide textual content in title, desc, and text elements, but it may be desirable to leave this content as alternate text to the image, even including styling.

For MathML content, a conditional CSS override could allow for CSS styling of MathML elements for those that don't render MathML natively.

Note: as stated before, the names of the <ext> and <fallback> elements are subject to change based on existing element names in the wild.

<html lang="en">
<head>
	<title>HTML Extensibility Test</title>
</head>
<body>
	<h1 id="test_of_extensibility">Test of Extensibility</h1>
	<p>This is a test of an extensibility point in text/html, with a fallback mechanism.</p>
	<ext type="image/svg+xml">
		<fallback>
			<img src="anIsland.png" alt="..."/>
			<style type='text/css'>
			   svg > * { display: none; }
		        </style>
		</fallback>
		<svg xmlns="http://www.w3.org/2000/svg"
		     xmlns:xlink="http://www.w3.org/1999/xlink"
		     width="100%" height="100%"
		     version="1.1">
			<title>My Title</title>
			<desc>schepers, 01-04-2008</desc>
			<circle id="circle_1" cx="75" cy="25" r="20" fill="lime" />
      			<text id='text_1' x='10' y='25' font-size='18' fill='crimson'>This is some text.</text>
		</svg>		
	</ext>
</body>
</html>

Reasons why we can't do this

It's not clear what the processing model being proposed actually is. However, there is already one problem:

  • The idea relies on not conflicting with legacy content. Unfortunately, whatever syntax we end up using, people will copy and paste it from documents that were written by competent authors that tested it against the new UAs, into documents written by authors who don't know about this, and who don't have the new UA, thus creating new "legacy documents" that use whatever syntax we come up with. Saying the risk is minimal doesn't mitigate this problem. It's a real problem, and we have to deal with it. Such content clearly wouldn't work in the legacy UAs, and so the mistake will have no reason to propagate (the novice author will have no incentive to copy such content if it doesn't work in their UA); further, this is an issue with any new HTML5 syntax. Please explain in more detail how this is a problem. -Shepazu

Say someone writes:

 <p>foo <newsyntax> ... </newsyntax> bar </p>

...and that that is all good, and then someone copies just the "foo" part, accidentally including the <newsyntax> bit:

 <p>bla bla foo <newsyntax> bla bla </p>

For most features in HTML5, nothing drastically bad will happen. With <ext>, depending on exactly what your proposal is, the rest of the page would now be showing an error or be ignored.

What happens for those features in HTML5 where something bad does happen? Which features are those? This will give me some idea of the range of acceptable behavior. -Shepazu

Note also that the fallback idea doesn't work. Elements like <script>, <style>, <title>, <input>, <textarea> etc, get treated as HTML elements in legacy UAs. Please expand on this; the <fallback> proposal relies on the fact that any content in the <fallback> element be treated as HTML, so it's not clear what the objection is. -Shepazu The objection is relating to content in the <svg> part, not the <fallback> part.

Relying on CSS for hiding the text content doesn't work either, because CSS is optional and might not be enabled (or supported). In this case, legacy browsers that don't support <fallback> and in which CSS (or optionally JS) is not enabled or supported would, unfortunately print the textnodes; this still doesn't break the page, however, and any page which relies on JS or CSS will always fail in such UAs; this is true of almost every Webapp. -Shepazu It means the fallback doesn't work -- showing both isn't fallback. And yes, pages relying on CSS are badly designed. HTML5 goes out of its way to make that unnecessary.

(It doesn't much matter, though, because fallback isn't one of the things we're trying to address with this.) This seems like an artificial constraint; any solution which solves another larger problem (even if that's not the problem being addressed) is not a bug, it's a feature. -Shepazu