<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE bugzilla SYSTEM "https://www.w3.org/Bugs/Public/page.cgi?id=bugzilla.dtd">

<bugzilla version="5.0.4"
          urlbase="https://www.w3.org/Bugs/Public/"
          
          maintainer="sysbot+bugzilla@w3.org"
>

    <bug>
          <bug_id>13821</bug_id>
          
          <creation_ts>2011-08-18 11:45:04 +0000</creation_ts>
          <short_desc>href attributes not being restricted to valid URIs</short_desc>
          <delta_ts>2011-08-19 10:41:18 +0000</delta_ts>
          <reporter_accessible>1</reporter_accessible>
          <cclist_accessible>1</cclist_accessible>
          <classification_id>1</classification_id>
          <classification>Unclassified</classification>
          <product>Validator</product>
          <component>Parser</component>
          <version>HEAD</version>
          <rep_platform>PC</rep_platform>
          <op_sys>Windows NT</op_sys>
          <bug_status>RESOLVED</bug_status>
          <resolution>INVALID</resolution>
          
          
          <bug_file_loc></bug_file_loc>
          <status_whiteboard></status_whiteboard>
          <keywords></keywords>
          <priority>P2</priority>
          <bug_severity>normal</bug_severity>
          <target_milestone>---</target_milestone>
          
          
          <everconfirmed>1</everconfirmed>
          <reporter name="Ben">sepster</reporter>
          <assigned_to name="This bug has no owner yet - up for the taking">dave.null</assigned_to>
          <cc>sepster</cc>
          <cc>ville.skytta</cc>
          
          <qa_contact name="qa-dev tracking">www-validator-cvs</qa_contact>

      

      

      

          <comment_sort_order>oldest_to_newest</comment_sort_order>  
          <long_desc isprivate="0" >
    <commentid>55415</commentid>
    <comment_count>0</comment_count>
    <who name="Ben">sepster</who>
    <bug_when>2011-08-18 11:45:04 +0000</bug_when>
    <thetext>Invalid URIs are being allowed as values for href attributes in anchor tags within XHTML document instances.

E.g. this element:
&lt;a href=&quot; \ \ / / \\ \\ \\ &quot;&gt;blah&lt;/a&gt;

was contained within an instance that &quot;was successfully checked as XHTML 1.0 Transitional&quot;.

Header from the instance:

&lt;!DOCTYPE html PUBLIC &quot;-//W3C//DTD XHTML 1.0 Transitional//EN&quot; &quot;http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd&quot;&gt;
&lt;html xmlns=&quot;http://www.w3.org/1999/xhtml&quot; dir=&quot;ltr&quot; lang=&quot;en-US&quot;&gt;</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>55417</commentid>
    <comment_count>1</comment_count>
    <who name="Ben">sepster</who>
    <bug_when>2011-08-18 11:50:08 +0000</bug_when>
    <thetext>Section 2.4.3 of the URI definition at http://www.ietf.org/rfc/rfc2396.txt clearly excludes the backslash character, for example, from a valid URI.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>55438</commentid>
    <comment_count>2</comment_count>
    <who name="Ville Skyttä">ville.skytta</who>
    <bug_when>2011-08-18 17:16:39 +0000</bug_when>
    <thetext>The validator checks against the given DTD, which declares the content of the href attribute as %URI;, which in turn is defined as CDATA. So however invalid the URL might be in reality, it is valid as far as validation against the DTD goes.

http://www.w3.org/TR/xhtml1/dtds.html#dtdentry_xhtml1-transitional.dtd_a
http://www.w3.org/TR/xhtml1/dtds.html#dtdentry_xhtml1-transitional.dtd_URI
http://validator.w3.org/docs/help.html#validation_basics</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>55455</commentid>
    <comment_count>3</comment_count>
    <who name="Ben">sepster</who>
    <bug_when>2011-08-19 00:14:22 +0000</bug_when>
    <thetext>I appreciate that the DTD does not enforce a valid URI (in fact, most data &quot;types&quot; defined there are just CDATA?).

So the point you&apos;re making, I think, is that the href example I&apos;ve given is &quot;valid&quot; but does not &quot;conform&quot;.

But the help page for this validator states the following:

&quot;
Is validity the same thing as conformance?

 No, they are different concepts. 

Markup languages are defined in technical specifications, which generally include a formal grammar. A document is valid when it is correctly written in accordance to the formal grammar, whereas conformance relates to the specification itself. The two might be equivalent, but in most cases, some conformance requirements can not be expressed in the grammar, making validity only a part of the conformance. 
&quot;

I understand and agree with this statement.

But my point is that the ACTUAL specification for a URI IS defined in a formal grammar within a technical specification (the RFC I referenced earlier). While the DTD may be A specification (that the validator uses) and does include A formal grammar, it&apos;s not THE specification that properly defines a valid URI.

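To make concrete the kind of check I have in mind, here is a minimal sketch in Python. It is only an illustration and not anything the validator does today: the names EXCLUDED and excluded_chars are my own, and it only covers the characters that section 2.4.3 of RFC 2396 lists as excluded (leaving out # and %, which have their own roles in URI references), not the full URI grammar.

```python
# Sketch only: flags characters that RFC 2396 section 2.4.3 excludes from URIs.
# EXCLUDED and excluded_chars are illustrative names, not part of the validator.

# Space, double quote, the angle brackets (built with chr() so the snippet
# stays markup-safe), and the "unwise" set: { } | \ ^ [ ] `
EXCLUDED = set(' "{}|\\^[]`') | {chr(60), chr(62)}


def excluded_chars(value):
    """Return the sorted list of excluded characters found in an href value."""
    found = set()
    for ch in value:
        # Control characters (0x00-0x1F and 0x7F) are excluded as well.
        if ch in EXCLUDED or ord(ch) in range(0x20) or ord(ch) == 0x7F:
            found.add(ch)
    return sorted(found)


# The href value from my original report, and a well-formed URI for contrast.
print(excluded_chars(" \\ \\ / / \\\\ \\\\ \\\\ "))
print(excluded_chars("http://www.w3.org/"))
```

Run against the href value from my report, this flags both the space and the backslash, while the second call finds nothing to complain about.
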
In short, the DTD does not properly reflect what is specified in the defining grammar, and hence the validator is not properly validating.</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>55467</commentid>
    <comment_count>4</comment_count>
    <who name="Ville Skyttä">ville.skytta</who>
    <bug_when>2011-08-19 07:43:33 +0000</bug_when>
    <thetext>No matter how incomplete/insufficient wrt. conformance it might be, the &quot;defining grammar&quot; for XHTML 1.0 validity is the DTD, not the URI RFC.  See &quot;Validation&quot; at http://www.w3.org/TR/xhtml1/#general</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>55473</commentid>
    <comment_count>5</comment_count>
    <who name="Ben">sepster</who>
    <bug_when>2011-08-19 10:22:45 +0000</bug_when>
    <thetext>The help page says this regarding the gap between conformance and validity:

&quot;A document is valid when it is correctly written in
accordance to the formal grammar, whereas conformance relates to the
specification itself. The two might be equivalent, but in most cases, some
conformance requirements can not be expressed in the grammar, making validity
only a part of the conformance.&quot;  

That last sentence implies that any gap between validity and conformance results from being unable to express certain usage requirements in the language of a validity grammar. It does not suggest, as you have, that a gap could also result from an incomplete or insufficient specification within the DTD.

But I appreciate that the validator is working against the DTD. Perhaps I should be logging a bug with the DTD itself? Or perhaps with the help documentation itself, which is apparently misleading? Is this possible?

Because one way or the other, and regardless of where the fault lies, the validator is NOT producing correct results with respect to validity against the authoritative technical specification for a URI (the IETF RFC and the formal grammar it contains), as the help documentation implies it should.

This is not a trivial detail. The URI is a concept that exists beyond the XHTML definition, so it seems reasonable that the XHTML definition should use a definition of a URI that is in line with the broader understanding (and, more importantly, specification) of what a URI actually is. It&apos;s NOT a general string, by (strict) definition!

Perhaps the help documentation could be re-written to qualify that the formal grammar that is referred to, is the W3C one, and that this may or may not accurately reflect the restrictions imposed by the authoritative technical specification and/or formal grammar of the individual sub-units that form part of an XHTML document?

And I have to ask (as I&apos;m seriously struggling to understand it!) why isn&apos;t a URI, of all things, sufficiently specified in the DTD?  Am I missing something?  Surely well-formed URIs within a document instance are of much more (practical) importance than a misplaced or missing closing P tag, for example?</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>55474</commentid>
    <comment_count>6</comment_count>
    <who name="Ville Skyttä">ville.skytta</who>
    <bug_when>2011-08-19 10:30:14 +0000</bug_when>
    <thetext>(In reply to comment #5)
&gt; Perhaps I should be logging a bug with the DTD itself?

Regarding feedback on the HTML specifications and DTDs etc, I suggest contacting the W3C HTML Working Group: http://www.w3.org/html/wg/</thetext>
  </long_desc><long_desc isprivate="0" >
    <commentid>55475</commentid>
    <comment_count>7</comment_count>
    <who name="Ben">sepster</who>
    <bug_when>2011-08-19 10:41:18 +0000</bug_when>
    <thetext>Thanks Ville, I&apos;ll do that.

In the meantime, though, what do you think of my suggestion:

&quot;
Perhaps the help documentation could be re-written to qualify that the formal
grammar that is referred to, is the W3C one, and that this may or may not
accurately reflect the restrictions imposed by the authoritative technical
specification and/or formal grammar of the individual sub-units that form part
of an XHTML document?
&quot;

As it stands, the help documentation for this validator &quot;hides&quot; the fact that the DTD used by the validator results in insufficient validation.</thetext>
  </long_desc>
      
      

    </bug>

</bugzilla>