SVG in text-html 2009

These are the opinions of the SVG WG on the topic of SVG in HTML for consideration by the HTML WG.

XXX I'd rather say this is "some feedback" on their proposal. I'm not ready to say/imply any of this is a position I'm taking, just yet. -jwatt

The requirements are based on consensus reached at the SVG WG Sydney F2F 2009, and in part from TPAC 2008. For reference, here's the old SVG in HTML proposal from the SVG WG.

Requirements

HTML5 and SVG should make every effort to minimize the learning curve, pitfalls and other undesirable issues that content authors may encounter due to differences between SVG served as image/svg+xml and SVG in text/html, and when it comes to moving SVG between these two types of document. In so far as is possible, content authors should be able to take a valid SVG document, paste its markup into an HTML document, and have it render as expected and have the SVG fragment's DOM be identical to the DOM of the standalone SVG document when served as image/svg+xml. Content authors should not be burdened with unnecessary debugging, tweaking or cleanup steps in the common case when it comes to this simple process.
HTML5 should not place unnecessary barriers in the way of, or unnecessarily restrict, the future evolution of the SVG language. (Both working groups should coordinate to maximize compatibility between the two specifications and avoid standing on each other's toes, of course.)

In line with making the Open Web Platform as easy and pain free to use as possible, the WG believes that, in general, when HTML5 parsers encounter SVG that would not be valid XML SVG, the SVG should be non-conforming, even though it would render. The rationale is that validators and error consoles would flag and raise awareness of any issues in someone's HTML SVG that would stop them from copying the markup out to an XML file, and thereby be another weapon in reducing author pain when working with open formats like SVG and HTML.

Feedback

The following is feedback on the "foreign content" text that is currently commented out of the HTML5 draft by "XXXSVG" comments. (These comments can be seen by loading pages from the parsing section of the HTML5 draft, running this Show XXXSVG comments bookmarklet, and then searching the pages for "XXXSVG".)

The SVG WG is of the opinion that the contents of the SVG 'title' element should be RCDATA, and therefore would prefer that the HTML5 parsing algorithm not require conforming parsers to break out of foreign content mode and parse the element's content as HTML.

The SVG WG feels that, on balance, it would be useful for the contents of SVG's 'desc' and 'foreignObject' elements to be parsed as HTML by default, and therefore does not object to the HTML5 draft requiring conforming parsers to break out of foreign content mode to parse the content of these elements. However, the SVG WG does have some concerns regarding adverse effects on extensibility. We also do not support the use of 'desc' as a container for fallback content, as has been suggested, though we do agree that a fallback mechanism for both SVG and HTML is a useful idea.

The SVG WG recognizes that entities pose a particular challenge: undefined entity/character references won't work if SVG fragments are copied out of HTML, and DOCTYPE-defined entities (as is common for some SVG authoring tools) could only work if those entity definitions are included in the file and are somehow recognized. The same problem could also occur in XHTML+SVG documents. In general, the SVG WG agrees that special-casing some entity handling is acceptable, and is happy to have a further dialog with implementers about this.
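
As a rough illustration of the entity problem, the following Python sketch uses the standard library's xml.etree.ElementTree as a stand-in for an XML consumer of SVG (an assumption; no particular parser is implied by the text above). The sample markup and the entity name "label" are made up for the example. An HTML-style entity reference with no declaration is rejected, while an entity declared in the DOCTYPE internal subset is expanded.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# HTML-style named entity that is not declared anywhere in the document.
undeclared = f'<svg xmlns="{SVG_NS}"><text>a&nbsp;b</text></svg>'

# An entity declared in the DOCTYPE internal subset, in the style some
# SVG authoring tools use. The entity name "label" is hypothetical.
declared = (
    '<!DOCTYPE svg [<!ENTITY label "hello">]>'
    f'<svg xmlns="{SVG_NS}"><text>&label;</text></svg>'
)


def parses_as_xml(markup: str) -> bool:
    """Return True if the XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


undeclared_ok = parses_as_xml(undeclared)
declared_ok = parses_as_xml(declared)
# Text content of the first child element once the entity is expanded.
declared_text = ET.fromstring(declared)[0].text if declared_ok else None
```

If the DOCTYPE is dropped when the fragment is pasted elsewhere, the second sample fails in the same way as the first.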

For the 'font' element: If the HTML WG believes that the special handling of the <font> element is worth the extra implementation complexity in order to have a minor fraction of existing HTML content not change its rendering, then OK. (The SVG WG thinks it's good that the <font> element won't break out of foreign content mode for SVG for the most part.)

There's a comment in the HTML5 spec [[]] Could the HTML WG please clarify what is required with regards to that?

In XML, CDATA sections are distinct from text, but in HTML it's all the same. This means that scripts that look at the structure of documents may not work. However, this is a minor issue that the SVG WG is willing to live with.
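
The XML side of this difference can be sketched with Python's xml.dom.minidom, used here only as a convenient XML DOM (an assumption, not something the draft refers to). The sample markup is invented. A script that walks the tree and checks node types would see a CDATA section node under 'script' and a text node under 'desc'.

```python
from xml.dom import Node
from xml.dom.minidom import parseString

doc = parseString(
    '<svg xmlns="http://www.w3.org/2000/svg">'
    "<script><![CDATA[if (a < b) run();]]></script>"
    "<desc>plain text</desc>"
    "</svg>"
)

script_child = doc.getElementsByTagName("script")[0].firstChild
desc_child = doc.getElementsByTagName("desc")[0].firstChild

# In the XML DOM the two children have different node types.
script_is_cdata = script_child.nodeType == Node.CDATA_SECTION_NODE
desc_is_text = desc_child.nodeType == Node.TEXT_NODE
```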

The SVG WG is happy to see that XML and DOCTYPE declarations are ignored if found under the root element of the document. In that case they should have no effect (though it may be useful to discuss this in terms of the effect on entities declared in the DOCTYPE).

The HTML5 draft defines a set of tag names for which the parser should break out of foreign content mode. The SVG WG would like to know the rationale for doing so for each of these tags.

The SVG WG suggests that, unless it is proven to break lots of content, character encoding detection based on <?xml encoding="..."?> be added for SVG files served as "text/html". There would still be an issue with UTF-8 SVG documents lacking an XML declaration; perhaps the fact that the first open tag encountered in the document is an <svg> tag could make the encoding guesser choose UTF-8 in this case?
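
A minimal sketch of what such an encoding guesser might look like is given below, in Python. It is not taken from the HTML5 draft: the function name guess_encoding, the regular expressions, and the choice to return None when nothing is detected are all assumptions made for illustration.

```python
import re
from typing import Optional

# Matches an XML declaration at the very start of the input and captures
# the value of its encoding pseudo-attribute.
XML_DECL = re.compile(
    rb'^<\?xml\b[^>]*?\bencoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)
# Comments, processing instructions and <!...> declarations to skip over.
NON_ELEMENT = re.compile(rb"<!--.*?-->|<\?.*?\?>|<![^>]*>", re.S)
FIRST_TAG = re.compile(rb"<([A-Za-z][A-Za-z0-9:_-]*)")


def guess_encoding(head: bytes) -> Optional[str]:
    """Guess an encoding from the first bytes of a text/html resource."""
    decl = XML_DECL.match(head)
    if decl:
        return decl.group(1).decode("ascii").lower()
    # No usable declaration: fall back to UTF-8 if the first start tag
    # in the document is <svg>.
    tag = FIRST_TAG.search(NON_ELEMENT.sub(b"", head))
    if tag and tag.group(1).lower() == b"svg":
        return "utf-8"
    return None  # leave the decision to the normal HTML sniffing rules
```

For example, guess_encoding(b'<svg xmlns="http://www.w3.org/2000/svg"/>') falls through to the UTF-8 default, while input starting with an HTML DOCTYPE yields None.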

Ideally, the SVG WG would like the HTML tokenizer to be case-preserving for attribute and element names.

The SVG WG requests that the SVG case-fixup table be removed from the draft. We believe that HTML5 should defer to the appropriate (SVG) specification(s), and that this is not something that HTML5 should define. If the tokenizer is required to be case-preserving, the table is no longer necessary.

Going forward, the SVG WG recognizes that choosing all lowercase attribute names would be helpful for both integration in HTML and if certain attributes are to become CSS properties. Choosing all lowercase element names would also be preferred, although in some cases consistency would dictate that we would introduce some new mixed case element names. For example, if we introduced a new filter primitive element that didn't adhere to the "feSomethingOrOther" style, it would be confusing for authors.

XXX do we really want to do this, even for attributes? We have a ton of attributes with mixed case. HTML5 is going to have to deal with them one way or another, so I see little value in changing convention for future attributes or elements. We then just create an inconsistent mess internally within SVG. What's the rationale for breaking internal consistency in SVG? The SVG 1.1 mixed case attributes are: attributeType, baseFrequency, baseProfile, calcMode, clipPathUnits, contentScriptType, contentStyleType, diffuseConstant, edgeMode, externalResourcesRequired, filterRes, filterUnits, glyphRef, gradientTransform, gradientUnits, kernelMatrix, kernelUnitLength, keyPoints, keySplines, keyTimes, lengthAdjust, limitingConeAngle, markerHeight, markerUnits, markerWidth, maskContentUnits, maskUnits, numOctaves, pathLength, patternContentUnits, patternTransform, patternUnits, pointsAtX, pointsAtY, repeatCount, repeatDur, requiredExtensions, specularConstant, specularExponent, spreadMethod, stdDeviation, stitchTiles, surfaceScale, systemLanguage, tableValues, targetX, targetY, textLength, viewBox, viewTarget, xChannelSelector, yChannelSelector and zoomAndPan. So lots.
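
To make the trade-off concrete, here is a small Python sketch of how a case-fixup table works when the tokenizer lowercases names. It uses a handful of the attribute names listed above; the function name fix_attribute_case and the name "futuremixedcase" are hypothetical, and the sketch is not the table from the HTML5 draft.

```python
# A subset of the SVG 1.1 mixed-case attribute names, for illustration.
MIXED_CASE_ATTRIBUTES = [
    "attributeType",
    "baseFrequency",
    "clipPathUnits",
    "gradientTransform",
    "stdDeviation",
    "viewBox",
    "zoomAndPan",
]

# Map from the lowercased form the tokenizer produces back to the
# casing that SVG defines.
FIXUP = {name.lower(): name for name in MIXED_CASE_ATTRIBUTES}


def fix_attribute_case(lowercased_name: str) -> str:
    """Restore SVG casing for a name the tokenizer has lowercased."""
    return FIXUP.get(lowercased_name, lowercased_name)


restored = fix_attribute_case("viewbox")
# A mixed-case name added to SVG after the table was written is not in
# the table, so it stays lowercase.
unknown = fix_attribute_case("futuremixedcase")
```

The second lookup shows the maintenance cost of a fixed table: a name it does not know about cannot be restored, whereas a case-preserving tokenizer would need no table at all.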

For the case where an SVG file is inadvertently served as 'text/html', the SVG WG proposes that if the parser encounters an 'svg' element in the "before html" parse mode, no 'html' and 'body' elements be inserted above the 'svg' element. Rather, we would prefer that the parser be required to simply insert the 'svg' element and switch to foreign content mode. (HTML5 could specify that documents with 'svg' as the root element are non-conforming so validators would flag this case.) There are at least two reasons for making this change. First, if parented by an implicit 'body' element, most SVG (specifically SVG that depends on the default value of 100% for the 'height' attribute on the 'svg' element) would then get a used height of 150px (the CSS 2.1 replaced element fallback height). This would result in SVG mistakenly or deliberately served as text/html rendering differently from the same SVG viewed locally or served as image/svg+xml. Second, accessing the 'document.documentElement' object is common in JavaScript in SVG, and SVG content assumes that this will be the 'svg' element and will not be prepared to encounter inserted parent 'html' and 'body' elements. This script would need to be changed if pasted into the middle of an HTML document, but we would be able to prevent breakage if the SVG were pasted as the whole document. Such documents should be in standards mode, regardless of whether they include the SVG DOCTYPE. We do have one unresolved issue with our request, however. If the parser encounters an HTML start tag that breaks out of foreign content mode, where would it "break out" to? (There is no <body> element to pop back to.)
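
The 'document.documentElement' concern can be sketched in Python with xml.dom.minidom. Python's standard library has no HTML5 tree builder, so the implied 'html' and 'body' elements are simulated here by writing them out explicitly; that simulation, the sample markup and the helper name root_width are all assumptions for illustration.

```python
from xml.dom.minidom import parseString

SVG = '<svg xmlns="http://www.w3.org/2000/svg" width="200" height="100"/>'

# The SVG as a standalone document.
standalone = parseString(SVG)
# The same SVG with 'html' and 'body' parents, standing in for the
# elements an HTML parser would insert implicitly.
wrapped = parseString(f"<html><body>{SVG}</body></html>")


def root_width(doc) -> str:
    """Mimic a typical SVG script: read 'width' off the document element."""
    return doc.documentElement.getAttribute("width")


standalone_root = standalone.documentElement.tagName
wrapped_root = wrapped.documentElement.tagName
```

With the extra parents in place, the script finds 'html' as the document element and reads an empty string for 'width' instead of the value on the 'svg' element.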

When SVG fragments in HTML are encountered, any invalid element or attribute casing should generate parse errors.

The SVG WG is happy to see that unknown elements that are inside SVG fragments are inserted as SVG elements, but we'd like to see the casing of attributes and element names preserved.

The SVG WG agrees that foreign content should not be allowed to imply start or end tags.

The SVG WG requests that minimized and unquoted attribute values raise parse errors when found on SVG elements. Rationale:
1. Consistency with making incorrect xmlns attributes generate a parse error.
2. Minimizing the number of documents that are conforming HTML but whose SVG fragments, when copied to "image/svg+xml", are not well-formed.
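
The well-formedness point in the rationale above can be checked with any XML parser. The sketch below uses Python's xml.etree.ElementTree as one example of an XML consumer (an assumption); the sample markup, including the 'hidden' attribute used to show a minimized attribute, is made up.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

quoted = f'<svg xmlns="{SVG_NS}" width="100"/>'
# Unquoted attribute value: tolerated by HTML parsers, rejected by XML.
unquoted = f'<svg xmlns="{SVG_NS}" width=100/>'
# Minimized attribute (name with no value): likewise not well-formed XML.
minimized = f'<svg xmlns="{SVG_NS}"><rect hidden/></svg>'


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


results = {
    "quoted": is_well_formed(quoted),
    "unquoted": is_well_formed(unquoted),
    "minimized": is_well_formed(minimized),
}
```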

The SVG WG agrees that it may be useful to forego namespace declarations for the SVG and XLink namespaces (as well as certain others, such as MathML). However, we believe that rather than hardcoding the namespace prefixes, those prefixes should default to that namespace. We are not suggesting at this time that namespace declarations should be able to override that default in HTML5, but some future revision of the language may specify that behavior, and hardcoding limits the potential for future extensibility solutions.
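
For comparison, this is how an XML parser treats the same prefixes today. The Python sketch below (using xml.etree.ElementTree, chosen only for illustration, with invented sample markup) shows that an 'xlink:href' attribute without an in-scope 'xmlns:xlink' declaration is an error in XML, while the declared form resolves to the XLink namespace.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
XLINK_NS = "http://www.w3.org/1999/xlink"

# No xmlns:xlink declaration in scope.
undeclared = f'<svg xmlns="{SVG_NS}"><use xlink:href="#a"/></svg>'
# Prefix declared on the root element.
declared = (
    f'<svg xmlns="{SVG_NS}" xmlns:xlink="{XLINK_NS}">'
    '<use xlink:href="#a"/></svg>'
)

try:
    ET.fromstring(undeclared)
    undeclared_ok = True
except ET.ParseError:
    undeclared_ok = False

# ElementTree exposes namespaced attributes as "{namespace}localname".
href = ET.fromstring(declared)[0].get(f"{{{XLINK_NS}}}href")
```

Markup that relies on a defaulted prefix in text/html would therefore need the declaration added before it could be served as image/svg+xml.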

Unresolved issues (no consensus):

Should 'xlink:href' attributes be recognized in the face of xmlns:xlink="typo"? What if there is no 'xmlns:xlink' attribute specified (in scope)? Even if they are recognized, should they be parse errors so validators will flag them?

Should SVG elements with namespace prefixes be supported by HTML5 parsers?

Instead of requesting that unquoted attributes etc. be parse errors, should we instead suggest that authors write polyglot SVG-in-HTML and SVG-in-XHTML documents, and validate against both of those schemas?

Should a missing xmlns="http://www.w3.org/2000/svg" attribute on a root <svg> element be a parse error? [I think it should be so that validators will flag it, even though the SVG will still render. -jwatt]

How do we fix our suggestion that doesn't generate implied <html> and <body> open tags?

We need to discuss a "merging" strategy. The SVG WG is interested in aligning with some HTML elements and attributes to make it simpler to use SVG for those already familiar with HTML. Among the things discussed:
- allowing @null:href or even @null:src where currently SVG requires @xlink:href (e.g. <a href="...">)
- adding a <link> element to allow linking to CSS stylesheets

How would these be parsed under the current proposal?

Additional Considerations

In addition to inline SVG in text/html and XHTML, there are other points that need some clarification and coordination.

The SVG WG is planning to specify various capabilities of SVG files when referenced in different contexts. For example, when an SVG file with script functionality and declarative animation is referenced by an <html:object> element, the script will execute and the declarative animation should run; however, that same file when referenced by an <html:img> element may only run the animation, and not execute the script, for security reasons. Similarly, restrictions on interactivity, link traversal, external (or cross-domain) references, and other capabilities may be imposed, based on the referencing context. One use case is permitting SVG to be used as an animated, but not scripted, icon for the Widget packaging format. Where should the specifications for these restrictions happen? The SVG WG is operating on the assumption that different "profiles" (or sets of feature strings) of SVG can be defined in the SVG specification itself, and that it can work with other WGs, such as the HTML and CSS WGs, to provide references that those specifications can employ in their own specs. For example, the HTML spec could point to the full profile of SVG as possible content for <html:object>, and a restricted profile of SVG for <html:img>. The SVG WG is interested in discussing this further with the HTML and CSS WGs.

Some thoughts on this are gathered in Embedding SVG Examples.