This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Bug 13821 - href attributes not being restricted to valid URIs
Summary: href attributes not being restricted to valid URIs
Status: RESOLVED INVALID
Alias: None
Product: Validator
Classification: Unclassified
Component: Parser
Version: HEAD
Hardware: PC Windows NT
Importance: P2 normal
Target Milestone: ---
Assignee: This bug has no owner yet - up for the taking
QA Contact: qa-dev tracking
Reported: 2011-08-18 11:45 UTC by Ben
Modified: 2011-08-19 10:41 UTC
CC List: 2 users

Description Ben 2011-08-18 11:45:04 UTC
Invalid URIs are being allowed as values for href attributes in anchor tags within XHTML document instances.

For example, this element:
<a href=" \ \ / / \\ \\ \\ ">blah</a>

was contained within an instance that "was successfully checked as XHTML 1.0 Transitional".

Header from the instance:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" dir="ltr" lang="en-US">
Comment 1 Ben 2011-08-18 11:50:08 UTC
Section 2.4.3 of the URI definition at http://www.ietf.org/rfc/rfc2396.txt clearly excludes the backslash character, for example, from a valid URI.
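
As a rough illustration of that exclusion, here is a minimal Python sketch that lists the characters in a string that section 2.4.3 rules out. The function name find_excluded_chars is hypothetical, and two simplifications are assumed: "#" and "%" are left out of the check because a URI reference uses them for fragments and escapes, and non-ASCII characters are not considered at all.

```python
# Delimiters and "unwise" characters excluded by RFC 2396, section 2.4.3.
# Assumption: "#" and "%" are deliberately omitted, since a URI reference
# uses them for fragment identifiers and escape sequences.
EXCLUDED_DELIMS = set('<>"')
UNWISE = set('{}|\\^[]`')


def find_excluded_chars(value):
    """Return the excluded characters found in value, in order of first appearance."""
    found = []
    for ch in value:
        is_control = ord(ch) <= 0x1F or ord(ch) == 0x7F
        is_excluded = is_control or ch == " " or ch in EXCLUDED_DELIMS or ch in UNWISE
        if is_excluded and ch not in found:
            found.append(ch)
    return found


# The href value from the description, written as a raw string.
href = r" \ \ / / \\ \\ \\ "
print(find_excluded_chars(href))
```

For the href in the description this reports the space and the backslash; the forward slash is a reserved character and is not flagged.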
Comment 2 Ville Skyttä 2011-08-18 17:16:39 UTC
Validator checks against the given DTD, which defines the content of the href attribute as %URI, which again is defined as CDATA, so however invalid the URL might in reality be, it is valid as far as validation against the DTD goes.

http://www.w3.org/TR/xhtml1/dtds.html#dtdentry_xhtml1-transitional.dtd_a
http://www.w3.org/TR/xhtml1/dtds.html#dtdentry_xhtml1-transitional.dtd_URI
http://validator.w3.org/docs/help.html#validation_basics
Comment 3 Ben 2011-08-19 00:14:22 UTC
I appreciate that the DTD does not enforce a valid URI (in fact, most data "types" defined there are just CDATA??).

So the point you're making I think is that the href example I've given is "valid", but does not "conform".

But the help page for this validator states the following:

"
Is validity the same thing as conformance?

 No, they are different concepts. 

Markup languages are defined in technical specifications, which generally include a formal grammar. A document is valid when it is correctly written in accordance to the formal grammar, whereas conformance relates to the specification itself. The two might be equivalent, but in most cases, some conformance requirements can not be expressed in the grammar, making validity only a part of the conformance. 
"

I understand and agree with this statement.

But my point is that the ACTUAL specification for a URI IS defined in a formal grammar within a technical specification (the RFC I referenced earlier). While the DTD may be A specification (that the validator uses) and does include A formal grammar, it's not THE specification that properly defines a valid URI.

In short, the DTD does not properly reflect what is specified by the defining grammar, and hence the validator is not properly validating.
Comment 4 Ville Skyttä 2011-08-19 07:43:33 UTC
No matter how incomplete/insufficient wrt. conformance it might be, the "defining grammar" for XHTML 1.0 validity is the DTD, not the URI RFC.  See "Validation" at http://www.w3.org/TR/xhtml1/#general
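
To make the distinction concrete: a check of URI syntax would have to be a separate pass on top of DTD validation. The following Python sketch is only an illustration of such a pass, not something the validator does. It uses the standard library's xml.etree.ElementTree, which checks well-formedness only and does not validate against a DTD; the function name suspicious_hrefs and the sample document are hypothetical, and "#" and "%" are assumed to be acceptable in an href.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"

# Space plus the delimiters and "unwise" characters excluded by
# RFC 2396, section 2.4.3 ("#" and "%" are intentionally not included).
EXCLUDED = set(' <>"{}|\\^[]`')


def has_excluded_char(value):
    """True if value contains a control character or a character from EXCLUDED."""
    return any(ch in EXCLUDED or ord(ch) <= 0x1F or ord(ch) == 0x7F for ch in value)


def suspicious_hrefs(xhtml_source):
    """Return the href values of <a> elements that contain an excluded character."""
    root = ET.fromstring(xhtml_source)  # well-formedness check only, no DTD validation
    flagged = []
    for anchor in root.iter(XHTML_NS + "a"):
        href = anchor.get("href")
        if href is not None and has_excluded_char(href):
            flagged.append(href)
    return flagged


# Hypothetical sample document containing the anchor from the description.
sample = r'''<html xmlns="http://www.w3.org/1999/xhtml" dir="ltr" lang="en-US">
<body><p><a href=" \ \ / / \\ \\ \\ ">blah</a> <a href="http://www.w3.org/">ok</a></p></body>
</html>'''

print(suspicious_hrefs(sample))
```

The parse succeeds because the attribute value is just character data as far as the markup is concerned; only the second pass flags the first href.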
Comment 5 Ben 2011-08-19 10:22:45 UTC
The help page says this, regarding the gap between conformance and validity:

"A document is valid when it is correctly written in
accordance to the formal grammar, whereas conformance relates to the
specification itself. The two might be equivalent, but in most cases, some
conformance requirements can not be expressed in the grammar, making validity
only a part of the conformance."  

That last sentence implies that any gap between validity and conformance is a result of being unable to express certain usage requirements in the language of a validity grammar. It does not suggest, as you have, that any gap could be a result of incomplete/insufficient specification within the DTD.

But I appreciate that the validator is working against the DTD.  Perhaps I should be logging a bug with the DTD itself?  Or perhaps with the help documentation itself which is apparently misleading?  Is this possible?

Because one way or the other and regardless of where the fault lies, the validator is NOT producing correct results wrt validity against the authoritative technical specification, and the formal grammar contained therein, for a URI as defined by the IETF RFC, as the help documentation implies it should.

This is not minutiae. The URI is a concept that exists beyond an XHTML definition, so it seems reasonable that the XHTML definition should use a definition of a URI that is in line with the broader understanding (and more importantly, specification) of what a URI actually is. It's NOT a general string, by (strict) definition!

Perhaps the help documentation could be re-written to qualify that the formal grammar that is referred to, is the W3C one, and that this may or may not accurately reflect the restrictions imposed by the authoritative technical specification and/or formal grammar of the individual sub-units that form part of an XHTML document?

And I have to ask (as I'm seriously struggling to understand it!) why isn't a URI, of all things, sufficiently specified in the DTD?  Am I missing something?  Surely well-formed URIs within a document instance are of much more (practical) importance than a misplaced or missing closing P tag, for example?
Comment 6 Ville Skyttä 2011-08-19 10:30:14 UTC
(In reply to comment #5)
> Perhaps I should be logging a bug with the DTD itself?

Regarding feedback on the HTML specifications and DTDs etc, I suggest contacting the W3C HTML Working Group: http://www.w3.org/html/wg/
Comment 7 Ben 2011-08-19 10:41:18 UTC
Thanks Ville, I'll do that.

In the meantime though, what do you think of my suggestion:

"
Perhaps the help documentation could be re-written to qualify that the formal
grammar that is referred to, is the W3C one, and that this may or may not
accurately reflect the restrictions imposed by the authoritative technical
specification and/or formal grammar of the individual sub-units that form part
of an XHTML document?
"

As it sits, the help documentation for this validator "hides" the fact that the DTD used by the validator causes insufficient validation.