This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.
Over in bug 5808 I suggested a way to coerce the output of the HTML5 parsing algorithm into XML. It's theoretically impure for conforming documents to trigger coercions that aren't mostly harmless. I therefore suggest narrowing the conformance definition accordingly.

* The document mode isn't part of the infoset: Optionally communicate it as out-of-infoset-band data. Instruct apps to use the standards mode when it is not communicated. Mostly harmless.
* The form pointer isn't part of the infoset: Make communicating the form pointer optional. Allow communicating it as out-of-infoset-band data. When the form element is not an ancestor of the form control, allow a UUID id attribute to be generated on the form element and allow a form attribute to be generated on the form control. Mostly harmless.
* Some XML APIs treat the doctype as syntactic sugar: Make representing the document type information item optional. Mostly harmless.
* Attributes with the local name "xmlns" or a local name starting with "xmlns:" are not permitted attribute information items: Drop on the floor. Mostly harmless. However, in the case of <embed>, this theoretically loses conforming data. These attributes could be excluded from what is permitted on <embed> as plug-in parameters.
* Namespace declarations are not attribute information items: Drop on the floor. (Optionally synthesize namespace information items for XLink and SVG or MathML on <svg> and <math> nodes, respectively, and XHTML namespace information items on HTML elements (including root) that do not have an HTML element as the parent.) Mostly harmless.
* Form feed is not an XML character (either literally or as a character reference expansion): Turn into a space. Mostly harmless.
* The input stream contains a literal non-XML character other than form feed: Turn into a REPLACEMENT CHARACTER. Mostly harmless, but these might as well be defined as non-conforming.
* A comment contains "--": Replace with "- -". Mostly harmless.
* A name is not an NCName: Use the original name on the tree builder stack for matching, but use an escaped name in the output. The escaping function must escape each non-NCName to a unique NCName, and the result must have at least one upper case ASCII character but must not match any known SVG camelCase name. This is data loss in theory even if not in probable practice. Attributes that are actually used on <embed> are NCNames anyway, so forbidding non-NCNames wouldn't break anything. Forbidding data-* from forming a non-NCName would still leave a countably infinite space of names, and authors are likely to use printable ASCII anyway.
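For illustration only, the last four coercions in the list could be sketched roughly as follows. This is not taken from bug 5808 or from any implementation. The function names are made up, the NCName test is an ASCII-only approximation (so non-ASCII name characters get escaped even where XML would allow them), and the "U" plus six hex digits escape format is just one hypothetical scheme. It relies on the assumption that names coming out of the HTML tokenizer never contain an upper case ASCII "U", and that no SVG camelCase name looks like the escaped output.

```python
import re

# Negation of the XML 1.0 Char production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_NON_XML_CHAR = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)

# ASCII-only approximation of NCName start and continuation characters
# (assumption: real NCNames also allow many non-ASCII characters).
_NAME_START = re.compile(r"[A-Za-z_]")
_NAME_CONT = re.compile(r"[A-Za-z0-9._-]")


def coerce_text(text: str) -> str:
    """Turn form feed into a space and other non-XML characters into U+FFFD."""
    text = text.replace("\f", " ")
    return _NON_XML_CHAR.sub("\uFFFD", text)


def coerce_comment(data: str) -> str:
    """Replace "--" with "- -", repeating so that longer hyphen runs are covered."""
    while "--" in data:
        data = data.replace("--", "- -")
    return data


def escape_name(name: str) -> str:
    """Hypothetical escaping: each disallowed character becomes 'U' + 6 hex digits.

    A name that already passes the (approximate) NCName test is returned
    unchanged; an escaped name always contains an upper case 'U'.
    """
    out = []
    for i, ch in enumerate(name):
        allowed = (_NAME_START if i == 0 else _NAME_CONT).fullmatch(ch)
        out.append(ch if allowed else "U%06X" % ord(ch))
    return "".join(out)
```

With this sketch, escape_name("xmlns:foo") gives "xmlnsU00003Afoo", while "data-foo" passes through untouched. A comment that ends in a single "-" is not covered by the list above and is not handled here either.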
So what exactly are the changes you're proposing? (No need to tell me what you _don't_ want me to change!)
I'm suggesting that attributes on <embed> and data-* attributes be restricted to XML 1.0 4th ed. + Namespaces NCNames for the purpose of conformance (with the consequence that xmlns:foo on <embed> ends up as non-conforming, too). If this is against data-* principles, we could at least have this restriction on <embed> to get some theoretical purity without actually restricting any practical activities.
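As a rough sketch of what a conformance checker might do under this proposal (the function names are invented for illustration, and the pattern only covers the ASCII subset of NCName, so it would wrongly flag valid non-ASCII NCNames):

```python
import re

# ASCII-only subset of the XML 1.0 4th ed. + Namespaces NCName production.
# Assumption: a real checker would use the full Letter/Digit/CombiningChar/
# Extender tables from the XML specification instead.
_ASCII_NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def is_ascii_ncname(name: str) -> bool:
    """Return True if the whole name matches the ASCII NCName subset."""
    return _ASCII_NCNAME.fullmatch(name) is not None


def nonconforming_attributes(element: str, attribute_names: list) -> list:
    """List attribute names the proposed restriction would flag.

    On <embed> every attribute is checked; on other elements only data-*.
    """
    if element == "embed":
        candidates = list(attribute_names)
    else:
        candidates = [n for n in attribute_names if n.startswith("data-")]
    return [n for n in candidates if not is_ascii_ncname(n)]
```

For example, nonconforming_attributes("embed", ["src", "xmlns:foo"]) flags only "xmlns:foo", since the colon keeps it from being an NCName.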
Bah. You ruin all the fun. :-P
r1836
This bug predates the HTML Working Group Decision Policy. If you are satisfied with the resolution of this bug, please change the state of this bug to CLOSED. If you have additional information and would like the editor to reconsider, please reopen this bug. If you would like to escalate the issue to the full HTML Working Group, please add the TrackerRequest keyword to this bug, and suggest title and text for the tracker issue; or you may create a tracker issue yourself, if you are able to do so. For more details, see this document: http://dev.w3.org/html5/decision-policy/decision-policy.html This bug is now being moved to VERIFIED. Please respond within two weeks. If this bug is not closed, reopened or escalated within two weeks, it may be marked as NoReply and will no longer be considered a pending comment.