This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.
Encoded Unicode characters in a link anchor are incorrectly reported as "broken URI fragments". (It seems the program does not recognize percent-encoded characters in a fragment.)

Example: http://www.domain.org/folder/file#espa%C3%B1a

Error output:
=============================================================================
Status: 200 OK
Some of the links to this resource point to broken URI fragments (such as index.html#fragment).
=============================================================================

But encoded Unicode characters in other parts of the URL do not produce an error. Example: http://www.domain.org/espa%C3%B1a/espa%C3%B1a

P.S. I am using http://validator.w3.org/checklink, which is currently at version 4.5. Unfortunately, only version 4.4 is available in the bug report drop-down menu.
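To make the report concrete, here is a minimal Python sketch of what the two example URLs contain once percent-decoding is applied. It is only an illustration and not the link checker's own code; the helper name `decoded_parts` is made up for this example, and UTF-8 is assumed as the encoding behind `%C3%B1`.

```python
from typing import Tuple
from urllib.parse import urlsplit, unquote


def decoded_parts(url: str) -> Tuple[str, str]:
    """Return the percent-decoded path and fragment of a URL.

    Hypothetical helper for illustration; assumes UTF-8 percent-encoding.
    """
    parts = urlsplit(url)  # urlsplit leaves percent-escapes untouched
    return unquote(parts.path), unquote(parts.fragment)


# The same escape sequence decodes identically in the path and the fragment.
print(decoded_parts("http://www.domain.org/folder/file#espa%C3%B1a"))
print(decoded_parts("http://www.domain.org/espa%C3%B1a/espa%C3%B1a"))
```

Under these assumptions the escape sequence yields the same text in either position, which is why the differing treatment of path and fragment looks inconsistent.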
The set of allowed characters depends on the type of the document containing the target of the fragment identifier (not the document containing the link), and on what kind of target it is. If the target is an "id" attribute, the set of characters is quite restricted, and for <a name="..."> HTML 4.x and XHTML 1.x differ in which characters are allowed: http://www.w3.org/TR/xhtml1/#C_8

No working, real URL to check was provided, so it isn't possible to dig deeper into your particular case. For compatibility I would suggest sticking with the characters allowed for the ID and NAME types in HTML 4: http://www.w3.org/TR/html4/types.html#type-name

It is quite possible that the link checker does have issues with some things that should be allowed, so I'm leaving this bug open for now. A working link to a real, public document with which such issues can be reproduced would be nice, though. (Version 4.5 is in the drop-down menu now, thanks for noting its absence.)
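For reference, the HTML 4 rule linked above (a letter first, then letters, digits, hyphens, underscores, colons, and periods) can be sketched as a small Python check. The function name `is_html4_name` is invented for this illustration, and the pattern covers only the HTML 4 ID/NAME rule, not the larger XML 1.0 Name production that XHTML uses.

```python
import re

# HTML 4 ID and NAME tokens: an ASCII letter, then any number of
# ASCII letters, digits, hyphens, underscores, colons, and periods.
HTML4_NAME = re.compile(r"[A-Za-z][A-Za-z0-9\-_:.]*")


def is_html4_name(value: str) -> bool:
    """Return True if value is a valid HTML 4 ID/NAME token (illustrative helper)."""
    return HTML4_NAME.fullmatch(value) is not None


print(is_html4_name("espana"))   # ASCII-only identifier
print(is_html4_name("españa"))   # contains a non-ASCII letter
```

An identifier that passes this check should be usable as a fragment target in both HTML 4 and XHTML 1.x documents, which is the compatibility argument made above.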
I am using XHTML 1.0 sent as text/html and XHTML 1.1 sent as application/xhtml+xml, with content negotiation depending on what the user agent accepts. I use "id" inside an "a" element.

Example 1, with "id": <a href="/espa%C3%B1a/espa%C3%B1a" id="españa">España</a>
Example 2, with "#": <a href="/espa%C3%B1a/espa%C3%B1a#espa%C3%B1a">España</a>
(Note the different encoding of the character in the text and in the URI.)

Both validate using the W3C Markup Validator. Unfortunately, Example 2 fails the W3C Link Checker with the error mentioned in my previous post.

I have read your links, where it is stated, "Note that the collection of legal values in XML 1.0 Section 2.3, production 5 is much larger than that permitted to be used in the ID and NAME types defined in HTML 4." HTML 4's ID and NAME tokens must begin with a letter (A-Z, a-z), which may be followed by letters, digits, hyphens, underscores, colons, and periods. Unfortunately, I have failed to find a list of the characters that are valid in an XHTML "id". Nevertheless, since XHTML's list is "much larger", I guess the previously mentioned anchor is correct?
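The behaviour expected in this comment can be sketched in Python: collect the anchors a document defines, percent-decode the fragment, and compare. This is an assumption about how a checker could resolve such fragments, not a description of the W3C Link Checker's implementation; `AnchorCollector` and `fragment_resolves` are hypothetical names, and UTF-8 is assumed for the escapes.

```python
from html.parser import HTMLParser
from urllib.parse import unquote


class AnchorCollector(HTMLParser):
    """Collect every id value, plus name values on <a> elements."""

    def __init__(self):
        super().__init__()
        self.anchors = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value is None:
                continue  # attribute present without a value
            if name == "id" or (tag == "a" and name == "name"):
                self.anchors.add(value)


def fragment_resolves(html: str, fragment: str) -> bool:
    """Return True if the percent-decoded fragment matches an anchor in html."""
    collector = AnchorCollector()
    collector.feed(html)
    collector.close()
    return unquote(fragment) in collector.anchors


doc = '<a href="/espa%C3%B1a/espa%C3%B1a" id="españa">España</a>'
print(fragment_resolves(doc, "espa%C3%B1a"))
```

A comparison against the raw, still-encoded fragment would fail for this document, which would be consistent with the "broken URI fragments" message reported here.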