This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.
This function would perform URI escaping as currently defined by the Serialization specification. If this function is adopted, then the Serialization specification would reference this function definition when describing URI escaping in the character expansion phase.

--------------------------------------------------

fn:escape-html-uri($uri as xs:string?) as xs:string

This function escapes all characters except the printable characters of the US-ASCII coded character set, specifically the octets ranging from 32 to 126 (decimal). The effect of the function is to escape a URI in the way that HTML user agents handle attribute values that expect URIs. Each character in $uri to be escaped is replaced by an escape sequence, which is formed by encoding the character as a sequence of octets in UTF-8 and then representing each of these octets in the form %HH, where HH is the hexadecimal representation of the octet. This function must always generate hexadecimal values using the upper-case letters A-F.

If $uri is the empty sequence, the function returns the zero-length string.

Note: The behavior of this function corresponds to the recommended handling of non-ASCII characters in URI attribute values as described in Appendix B.2.1 of [HTML 4.0].

--------------------------------------------------

Thanks,
Joanne
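For readers who want to see the proposed rule in executable form, here is a minimal Python sketch of it. It is only an illustration of the definition above, not the specification text or any processor's implementation. The name `escape_html_uri` and the use of `None` to stand in for the empty sequence are assumptions made for this example.

```python
def escape_html_uri(uri):
    """Illustrative sketch of the proposed fn:escape-html-uri rule.

    Assumption: None stands in for the XPath empty sequence.
    """
    if uri is None:
        # Empty sequence in, zero-length string out.
        return ""
    pieces = []
    for ch in uri:
        if 32 <= ord(ch) <= 126:
            # Printable US-ASCII (including space) is left untouched.
            pieces.append(ch)
        else:
            # Encode the character as UTF-8 and write each octet as %HH,
            # using upper-case hexadecimal digits.
            pieces.extend("%{:02X}".format(octet) for octet in ch.encode("utf-8"))
    return "".join(pieces)


print(escape_html_uri("http://www.example.com/~bébé"))
print(escape_html_uri("a b\tc"))
```

Note that a space (octet 32) is inside the printable range and is therefore kept, while a tab (octet 9) falls outside it and is escaped.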
(In reply to comment #0)

In the serialization specification, you refer to XLink 1.0. In this specification, you say that URI escaping is defined in terms of the serialization specification, but you also define it in terms of HTML 4.0. I'm a little confused by this. Could you clarify?

In a previous comment, we pointed out that XLink 1.1 defines escaping in terms of IRI. Would you consider referring to IRI, section 3.1, for the URI escaping?

Thank you in advance for your reply.

--
Regards, Felix Sasaki.
Here's an attempt to clarify this confusing subject.

Currently, the serialization specification, when describing URI escaping for the HTML output method, does indeed contain a reference to XLink; but the detailed algorithm described is, by design, identical to that described in Appendix B.2.1 of the HTML 4.01 specification:

http://www.w3.org/TR/1999/REC-html401-19991224/appendix/notes.html#non-ascii-chars

People have often asked why we escape non-ASCII characters rather than escaping the characters listed in the XLink specification; it therefore seems useful to reference the HTML algorithm rather than the XLink algorithm, since that is the one we are using. (The practical reason for choosing this algorithm is that using the XLink algorithm doesn't work: in particular, it breaks many JavaScript URIs in typical browsers.)

This proposal (which the WGs have accepted) makes the algorithm that is currently built into the serializer available as a user-callable function, so that applications can invoke it when they need it and use a different algorithm when they don't. As a result of this proposal, there is a new F+O function which refers to the HTML 4.01 specification, and the serialization specification will refer to this new F+O function to describe the default serialization behavior.

Does this make things clearer?

Michael Kay (personal response)
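The point about JavaScript URIs can be illustrated with a small Python sketch. The helper `escape_non_ascii_only` is a hypothetical name that follows the non-ASCII-only rule described in this thread. The second call uses `urllib.parse.quote` purely as a stand-in for a scheme that also escapes some printable ASCII characters; it is not the XLink algorithm, and the `safe` set chosen here is an assumption made for the example.

```python
from urllib.parse import quote


def escape_non_ascii_only(uri):
    # Hypothetical helper: escape only characters outside octets 32..126,
    # writing each UTF-8 octet as %HH with upper-case hex digits.
    return "".join(
        ch if 32 <= ord(ch) <= 126
        else "".join("%{:02X}".format(octet) for octet in ch.encode("utf-8"))
        for ch in uri
    )


script_uri = "javascript:alert('héllo world')"

# Only the non-ASCII character is touched; quotes and the space survive.
print(escape_non_ascii_only(script_uri))

# Stand-in for a stricter scheme (NOT XLink's algorithm): the quotes and
# the space are percent-encoded as well, which changes the script text.
print(quote(script_uri, safe=":/?#[]@!$&()*+,;=%"))
```

With the first approach the script text stays as the author wrote it apart from the non-ASCII character; with the stricter stand-in, the quotes and the space are rewritten too.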
On the joint WG telcon on 7/12/2005 we decided to add this function based on the text provided by Joanne Tong. Joanne was asked to provide examples, which she sent to me privately.
(In reply to comment #2)

Sorry for the late reply. Yes, this makes things clearer. Thank you very much for your explanation.

Felix Sasaki