This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 6089 - Revise anyURI to use RFCs 3986 and 3987
Summary: Revise anyURI to use RFCs 3986 and 3987
Status: RESOLVED WONTFIX
Alias: None
Product: XML Schema
Classification: Unclassified
Component: Datatypes: XSD Part 2
Version: 1.1 only
Hardware: PC Windows XP
Importance: P2 normal
Target Milestone: ---
Assignee: C. M. Sperberg-McQueen
QA Contact: XML Schema comments list
 
Reported: 2008-09-16 20:25 UTC by David Ezell
Modified: 2011-05-09 12:48 UTC

Description David Ezell 2008-09-16 20:25:23 UTC
From MURATA Makoto (FAMILY Given) <eb2m-mrt@asahi-net.or.jp> 

Please revise subsubsection 3.2.17 ("anyURI") using RFCs 3986 and 3987.  
At present, RFC 2396 is still referenced.  Do you really disallow 
IPv6 Addresses?

http://lists.w3.org/Archives/Public/www-xml-schema-comments/2008JulSep/0040.html
Comment 1 Michael Kay 2008-09-16 20:42:43 UTC
This is surely intended as a bug against 1.0, rather than "1.0/1.1 both"?  The 1.1 spec has gone completely liberal about the contents of an anyURI, stating only that it's a character string whose intended purpose is as a URI.

As for XSD 1.0, I'm personally against historical revisionism, but I know that others disagree. The fact is that after 7 years, many of the implementations that exist today are stable and are not going to track every erratum to the spec anyway. The 1.0 spec for anyURI already allows implementations a fair degree of latitude, and many implementations take advantage of it.

Michael Kay (personal response)
Comment 2 Murata 2010-09-19 07:55:08 UTC
xsd:anyURI of 1.1 should allow LEIRIs of W3C (and IETF) and nothing else.  

W3C appears to be promoting two extensions of IRIs.  One extension 
is LEIRIs, while the other is xsd:anyURI of XSD 1.1.  IMHO, this is an 
extremely bad idea.  These two should be aligned.   After all, we already 
have so many variations of web addresses.  Why do XSD people introduce yet
another?

I would strongly urge the XML Schema WG to contact I18N, TAG, and IETF.
Comment 3 Michael Kay 2010-09-19 09:08:11 UTC
Changed the "version" to 1.1 to ensure this comes up on the WG agenda.
Comment 4 Pete Cordell 2010-09-20 11:10:21 UTC
My concern with only allowing LEIRIs is that xs:targetNamespace is declared as xs:anyURI, but it's long been understood that any string will pretty much do.  So there may be cases in the wild of people using, e.g., com.myco.spec or just foo, especially for internal purposes.  This change therefore stands a chance of needlessly breaking stuff.  

Maybe language along the lines of xs:anyURI SHOULD be a LEIRI, but any xs:string will do.
Comment 5 Michael Kay 2010-09-20 13:17:51 UTC
I'm inclined to suggest leaving xs:anyURI allowing any string, but defining a facet that allows the space to be constrained, e.g. <xs:specification value="rfc3986"/>. Too late to do this in XSD 1.1, but the spec allows vendor-defined facets so I may experiment with this in Saxon.
Comment 6 David Ezell 2010-09-24 14:41:46 UTC
The WG debated this point at some length in 2007, and my recollection is that we settled on registering the semantic of a URI in the anyURI datatype, but not testing implementations for conformance.  My opinion is that anyURI should be considered an accommodation to current practice, and not an alternative definition.
Comment 7 Mukul Gandhi 2010-09-26 04:14:30 UTC
(In reply to comment #5)
> I'm inclined to suggest leaving xs:anyURI allowing any string, but defining a
> facet that allows the space to be constrained, e.g. <xs:specification
> value="rfc3986"/>. Too late to do this in XSD 1.1, but the spec allows
> vendor-defined facets so I may experiment with this in Saxon.

Maybe even a pattern or assertion facet could be used for this purpose :)
Comment 8 Murata 2010-09-26 04:31:07 UTC
(In reply to comment #4)
> My concern with only allowing LEIRIs is that xs:targetNamespace is declared as
> xs:anyURI, but it's long been understood that any string will pretty much do. 
> So there may be cases in the wild of people using, e.g., com.myco.spec or just
> foo, especially for internal purposes.  This change therefore stands a chance
> of needlessly breaking stuff.  

Pete, 

"com.myco.spec" and "foo" are correct URI references.  W3C XML Schema 1.0 
Part 2 already allows them.

[Definition:]   anyURI represents a Uniform Resource Identifier Reference (URI). An anyURI value can be absolute or relative, and may have an optional fragment identifier (i.e., it may be a URI Reference). This type should be used to specify the intention that the value fulfills the role of a URI as defined by [RFC 2396], as amended by [RFC 2732].
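
To illustrate the point about "com.myco.spec" and "foo", here is a minimal Python sketch that splits a reference into components with the regular expression given in Appendix B of RFC 3986. The function name split_reference is hypothetical. Note that this expression matches any string at all, so it only shows how a reference is parsed; it does not check the reference against the RFC grammar.

```python
import re

# Component-splitting expression from RFC 3986, Appendix B.
# It matches every string, so it splits a reference but does not validate it.
URI_PARTS = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?"
)


def split_reference(ref: str) -> dict:
    """Return the five RFC 3986 components; absent components are None."""
    m = URI_PARTS.match(ref)
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }


for candidate in ("foo", "com.myco.spec", "http://www.w3.org/2001/XMLSchema"):
    print(candidate, split_reference(candidate))
```

Under this parse, "foo" and "com.myco.spec" have no scheme and no authority; the whole string is the path, which is the shape of a relative reference.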
Comment 9 Henry S. Thompson 2010-11-14 18:50:23 UTC
The WG asked me to write explaining our position, which is against applying any real constraint to the value space of anyURI.  Before doing so, however, I wonder if the WG has misunderstood Murata's original request.  He cites the following phrase (from 1.0 2nd edition):

   . . . the value fulfills the role of a URI as defined
      by [RFC 2396], as amended by [RFC 2732].

But the relevant section now reads

  [Definition:]   anyURI represents an Internationalized Resource Identifier
  Reference (IRI).  An anyURI value can be absolute or relative, and may have an
  optional fragment identifier (i.e., it may be an IRI Reference).  This type
  should be used when the value fulfills the role of an IRI, as defined in [RFC
  3987] or its successor(s) in the IETF Standards Track.

  Note: IRIs may be used to locate resources or simply to identify them. In the
  case where they are used to locate resources using a URI, applications should 
  use the mapping from anyURI values to URIs given by the reference escaping 
  procedure defined in [LEIRI] and in Section 3.1 Mapping of IRIs to URIs of
  [RFC 3987] or its successor(s) in the IETF Standards Track.  This means that
  a wide range of internationalized resource identifiers can be specified when
  an anyURI is called for, and still be understood as URIs per [RFC 3986] and
  its successor(s).

It seems to me that this wording addresses Murata's concerns -- Murata, is that in fact the case?
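
As an illustration of the reference escaping that the quoted note mentions, here is a minimal Python sketch of the core step of Section 3.1 of RFC 3987: each non-ASCII character is encoded as UTF-8 and each resulting octet is percent-encoded. The function name iri_to_uri is hypothetical, and the sketch deliberately omits the additional characters that [LEIRI] escapes (such as spaces) and any special handling of host names, so it is not a complete implementation of either procedure.

```python
def iri_to_uri(iri: str) -> str:
    """Percent-encode the UTF-8 octets of every non-ASCII character.

    Assumption: ASCII characters are passed through untouched, including
    any that a full LEIRI or RFC 3987 implementation would also escape.
    """
    out = []
    for ch in iri:
        if ord(ch) > 0x7F:
            # One non-ASCII character may become several %HH triplets.
            out.extend(f"%{octet:02X}" for octet in ch.encode("utf-8"))
        else:
            out.append(ch)
    return "".join(out)


print(iri_to_uri("http://r\u00e9sum\u00e9.example.org"))
```

The result contains only ASCII characters, so it can be handed to software that expects a URI in the sense of RFC 3986.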
Comment 10 C. M. Sperberg-McQueen 2010-12-17 01:52:12 UTC
For the record:  the original email from Murata Makoto is quoted in full in the bug description, and does not specify explicitly whether it is directed against XSD 1.0, XSD 1.1, or both.  The reference to section 3.2.17 and the statement that RFC 2396 is referenced seem to suggest, however, that 1.0 is intended, not 1.1:  in the then current working draft of XSD 1.1 (as now) there is no section 3.2.17, the type anyURI is discussed in section 3.3.18, and the references given are to RFC 3986 and 3987.

So perhaps the original bug report should be against 1.0, not 1.1.  But in comment 2, Murata-san addresses the design of XSD 1.1 and says, if I understand him correctly, that in XSD 1.1 the anyURI type should accept LEIRIs and no other strings.

It may be helpful to summarize some of the technical arguments brought forward at the time the WG made the decision on bug 2751, bug 2754, and bug 2755.  When I tried to formulate such a summary, it ran rather long, longer than can be read comfortably in the Bugzilla interface. So I have put the summary into an email to the www-xml-schema-comments list at

    http://lists.w3.org/Archives/Public/www-xml-schema-comments/2010OctDec/0153.html

If this issue is taken to be about XSD 1.0, I lean toward classifying it WONTFIX.

If this issue is taken to be about XSD 1.1 and the need to cite the current RFCs, I think the correct disposition is RESOLVED/FIXED.  If it's about 1.1 and the need to enforce tightly the rules in the RFCs, I regretfully lean toward classifying it WONTFIX:  such a change would not be backward compatible, since RFC 3986 and RFC 3987 are not backward compatible.  XSD 1.1 would have been finished a lot faster if we had been willing to break backward compatibility, but it's too late to change our decisions on that now.
Comment 11 C. M. Sperberg-McQueen 2010-12-17 02:49:35 UTC
In private email, Murata Makoto asks me to post the following comments on his behalf, since owing to  technical difficulties he cannot currently post to Bugzilla himself.

    I'm sorry for my unclear bug report.  I'm afraid that I do not remember
    my original intention, but I am now concerned about 1.1.

    Some XML documents contain a great many IRIs.  For example, Atom feeds
    contain a lot of IRIs.  If we allow any string for xsd:anyURI, validators 
    will not validate such IRIs at all.  Will people then migrate from WXS Part
    2 1.0 to WXS Part 2 1.1?  I probably won't.
Comment 12 C. M. Sperberg-McQueen 2011-05-09 12:48:21 UTC
When the WG discussed this issue again on 17 December 2010 I was asked to draft a user-defined type definition to illustrate the possibility of using the pattern facet to enforce the rules of RFC 3986, as suggested by Mukul Gandhi in comment 7.  This has taken longer than hoped to reach the top of my to-do list, but two schema documents defining URI and IRI types using patterns are now available in the directory 

  http://www.w3.org/2011/04/XMLSchema/
  (world-accessible resource)

The schema document http://www.w3.org/2011/04/XMLSchema/TypeLibrary-URI-RFC3986.xsd defines types based on xsd:anyURI with patterns that require the literal to be legal according to the RFC 3986 definitions of 'URI', 'URI-ref', 'absolute-URI', and 'relative-ref'.  The schema document http://www.w3.org/2011/04/XMLSchema/TypeLibrary-IRI-RFC3987.xsd does analogous work based on the grammar of RFC 3987; it defines types called IRI-3987, absolute-IRI-3987, relative-reference-3987, and IRI-reference-3987.  It is this last which most users who want IRI validation should use unless they know what they are doing and know that they want one of the others.

The testframe.xsd and testframe.xml documents in the same directory illustrate the application of the datatypes to various strings some of which match the relevant grammars and some of which do not.  So, for example, the IRI-reference-3987 type accepts the string 

  http://r&#xE9;sum&#xE9;.example.org

and rejects the string 

  //2/-:)z254/:2a2$::25[v42.42.42.42:AA:]3 

Further work that may be done when time allows (not soon, probably) may include definition of similar types for the grammars of RFC 2396 and other earlier definitions of URIs and IRIs.  In due course the WG will prepare and publish a new version of the type library at http://www.w3.org/2001/03/XMLSchema/ incorporating these new datatypes.

As the discussion of this issue has shown (both in the comments here and in the long technical discussions summarized in the email mentioned in comment 10) there is no unanimity in the community about what form of checking should be done for URIs and IRIs.  Type definitions like those in the schema documents mentioned above show how users can control their own destiny and get the validation they need for their particular applications.  Those who want to be careful about namespace names, for example, will want relative references to be caught as errors, so they will want to use type absolute-IRI-3987 and not type IRI-reference-3987.  
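
The namespace-name policy described above (absolute references accepted, relative references reported as errors) can be approximated outside a schema as well. The following Python sketch is only an assumption-laden illustration: the function name has_scheme is hypothetical, and the check looks solely for a leading scheme as defined in RFC 3986 (a letter followed by letters, digits, "+", "-", or "."), then a colon. It is far weaker than the patterns in the type library, which cover the whole grammar.

```python
import re

# RFC 3986: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ), followed by ":".
SCHEME_PREFIX = re.compile(r"^[A-Za-z][A-Za-z0-9+.\-]*:")


def has_scheme(value: str) -> bool:
    """Return True if the value starts with a syntactically valid scheme.

    This says nothing about whether the rest of the value is well formed.
    """
    return SCHEME_PREFIX.match(value) is not None


for name in ("http://www.w3.org/2001/XMLSchema", "urn:example:ns", "foo", "com.myco.spec"):
    status = "has a scheme" if has_scheme(name) else "relative reference"
    print(f"{name}: {status}")
```

An application that wants to warn about relative namespace names without rejecting the document could run a check of this kind and report the result as a message rather than as a type error.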

And of course implementations can always check the syntactic correctness of anyURI values as a service to their users; failure to match the grammar of the RFC isn't a well defined type error in XSD 1.0, and it's clearly defined as NOT a type error in XSD 1.1, but that doesn't mean it can't be mentioned in a message to the user.

Since the WG is firm in its decision not to change the text of XSD 1.1 as suggested here and hopes that the user-defined types described above will show how users can get whatever form and level of validation is appropriate to their situation, I am marking this issue RESOLVED / WONTFIX.  I am sorry that the WG has not found a way to make all interested parties happy.

Murata-san, as the originator of the issue you are asked to mark this issue CLOSED, thus indicating that you are satisfied with the working group's efforts to resolve the issue (even if not happy with the final result) and are willing to accept the decision.  Or alternatively you may choose to REOPEN the issue, thus indicating that you do not believe the working group has made a sufficient effort to resolve the issue, that you refuse to accept the outcome, and that if necessary you wish to appeal the decision of the WG to the director of the W3C.

If we do not hear from you within a period of two weeks, we will assume that you are willing to accept the outcome.