This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.
According to section 7.3 of Serialization,[1] "Certain characters, specifically the control characters #x7F-#x9F, are legal in XML but not in HTML. It is a serialization error [err:SERE0014] to use the HTML output method when such characters appear in the instance of the data model. The serializer MUST signal the error."

The definition of the error in appendix B[2] repeats this with a slightly different formulation: "It is an error to use the HTML output method when characters which are legal in XML but not in HTML, specifically the control characters #x7F-#x9F, appear in the instance of the data model."

It is true that the control characters #x7F through #x9F were the only characters permitted in XML 1.0 that were not permitted in HTML. However, XML 1.1 additionally permits the control characters #x1 through #x1F, excepting #x9, #xA and #xD (though only as character references), and these are likewise not permitted in HTML per the SGML declaration of HTML 4.[3]

I suggest the following corrections:

. In the third paragraph of section 7.3, change "specifically the control characters #x7F-#x9F, are legal in XML" to "specifically the control characters #x1-#x8, #xB, #xC, #xE-#x1F and #x7F-#x9F, are legal in one or both versions of XML, but not in HTML"

. In appendix B, in the definition of err:SERE0014, change "specifically the control characters #x7F-#x9F" to "specifically the control characters #x1-#x8, #xB, #xC, #xE-#x1F and #x7F-#x9F"

[1] http://www.w3.org/TR/xslt-xquery-serialization/#HTML_CHARDATA
[2] http://www.w3.org/TR/xslt-xquery-serialization/#ERRSERE0014
[3] http://www.w3.org/TR/html401/sgml/sgmldecl.html
Is this now a complete list? Will it always remain a complete list? Might it not be better to change the "specifically" to "such as"?
Yes, it's quite possible that an explicit enumeration of characters will become out of date. I had worried about that, but I was also concerned that the list of proscribed characters in HTML is so obscure that simply saying "such as" wouldn't be of much help to either implementers or users. (After seven years of experience with implementing XSLT, it took me about an hour to discover where the list appears. I'd like to save others that pain.)

How would you feel about the following proposed edits, which list all the control characters, while still hedging by using "such as"?

. In the third paragraph of section 7.3, change "specifically the control characters #x7F-#x9F, are legal in XML" to "such as the control characters #x1-#x8, #xB, #xC, #xE-#x1F and #x7F-#x9F, are legal in one or both versions of XML, but not in HTML"

. In appendix B, in the definition of err:SERE0014, delete ", specifically the control characters #x7F-#x9F,"
That looks fine to me.
At its teleconference of 2009-11-12,[4] the WG suggested that the wording proposed in comment #2 be reworked to make clear which control characters are permitted by which version of XML, particularly as many people will not be as familiar with the XML 1.1 Recommendation.

This is my revised proposal:

. In the third paragraph of section 7.3, change "Certain characters, specifically the control characters #x7F-#x9F, are legal in XML but not in HTML." to "Certain characters are legal in XML, but not in HTML -- for example, the control characters #x7F-#x9F, are legal in both XML 1.0 and XML 1.1, and the control characters #x1-#x8, #xB, #xC and #xE-#x1F are legal in XML 1.1, but none of these is permitted in HTML."

. In appendix B, in the definition of err:SERE0014, delete ", specifically the control characters #x7F-#x9F,"

[4] http://lists.w3.org/Archives/Member/w3c-xsl-wg/2009Nov/0028.html (Member-only link)
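For implementers, the two ranges named in the revised wording can be sketched as a simple check. This is only an illustration under stated assumptions: the names `find_html_illegal`, `check_html_output` and `SerializationError` are hypothetical and do not come from the Serialization specification or any particular processor, and because the wording says "for example", the set below should not be read as an exhaustive list.

```python
# Code points taken from the proposed wording; all names here are hypothetical.
LEGAL_IN_XML_10_AND_11 = set(range(0x7F, 0xA0))  # #x7F-#x9F
# #x1-#x8, #xB, #xC and #xE-#x1F (legal in XML 1.1 only)
LEGAL_IN_XML_11_ONLY = set(range(0x01, 0x20)) - {0x09, 0x0A, 0x0D}
NOT_PERMITTED_IN_HTML = LEGAL_IN_XML_10_AND_11 | LEGAL_IN_XML_11_ONLY


class SerializationError(ValueError):
    """Assumed stand-in for however a serializer reports err:SERE0014."""


def find_html_illegal(text):
    """Return (index, code point) pairs for characters in the ranges above."""
    return [(i, ord(ch)) for i, ch in enumerate(text)
            if ord(ch) in NOT_PERMITTED_IN_HTML]


def check_html_output(text):
    """Return text unchanged, or raise if it contains a listed character."""
    offenders = find_html_illegal(text)
    if offenders:
        listed = ", ".join(f"#x{cp:X} at index {i}" for i, cp in offenders)
        raise SerializationError(f"err:SERE0014: {listed}")
    return text
```

Under these assumptions, `find_html_illegal("a\x85b\x0bc")` reports the characters at indexes 1 and 3, while tab, line feed and carriage return pass through untouched.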
At the joint teleconference of the XQuery and XSL Working Groups of 2009-12-01,[5] the proposal in comment #4 was accepted. As only a few members of the XSL WG were present on the call, I will bring the proposal back to that working group for final ratification.

[5] http://lists.w3.org/Archives/Member/w3c-xsl-query/2009Dec/0005.html (Member-only link)
[Revising the abstract.]
At its teleconference of 2009-12-03,[6] the XSL Working Group ratified the decision reported in comment #5. This will be Serialization erratum SE.E15. [6] http://lists.w3.org/Archives/Member/w3c-xsl-wg/2009Dec/0008.html