This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Bug 4665 - Clarify URI equivalence in reference to RFC 3986
Status: RESOLVED FIXED
Alias: None
Product: SML
Classification: Unclassified
Component: Interchange Format
Version: unspecified
Hardware: Macintosh All
Importance: P2 normal
Target Milestone: Second draft
Assignee: Kumar Pandit
QA Contact: SML Working Group discussion list
 
Reported: 2007-06-19 20:39 UTC by C. M. Sperberg-McQueen
Modified: 2007-09-26 21:22 UTC (History)

Description C. M. Sperberg-McQueen 2007-06-19 20:39:36 UTC
Section 3.3.1 defines URI equivalence with an appeal to RFC 3986:

   To determine whether two URIs are equivalent, consumers 
   MUST use the definition of URI equivalence given by RFC 
   3986 [IETF RFC 3986]. 

Examining the RFC, one doesn't find a single definition of 
equivalence, but a "ladder" of increasingly aggressive
normalizations.  It's not clear which level of equivalence
defined in 3986 is intended to be used by SML-IF implementations.
I think the reference needs to be more specific about which 
concept of 3986 is being appealed to.
Comment 1 Kumar Pandit 2007-09-20 03:28:59 UTC
Added the following definition to section 3.4.1, URI equivalence:

To determine whether two URIs are equivalent, consumers MUST perform case sensitive simple string comparison based on codepoint-by-codepoint comparison of the corresponding characters in the URIs.
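
As a rough illustration of what this definition asks of a consumer, here is a minimal Python sketch. The function name `uris_equivalent` and the example URIs are hypothetical and do not come from the SML-IF spec. The comparison is written out explicitly to mirror the wording of the definition, although in Python it gives the same result as `a == b`.

```python
def uris_equivalent(a: str, b: str) -> bool:
    """Case-sensitive, codepoint-by-codepoint comparison of two URI strings.

    No normalization of any kind is applied: no case folding, no
    percent-decoding, no path-segment cleanup.
    """
    if len(a) != len(b):
        return False
    # Compare the corresponding characters by their code points.
    return all(ord(x) == ord(y) for x, y in zip(a, b))


# Hypothetical URIs, used only for illustration.
same = uris_equivalent("http://example.org/a", "http://example.org/a")
case_alias = uris_equivalent("http://example.org/a", "http://Example.org/a")
escape_alias = uris_equivalent("http://example.org/~a", "http://example.org/%7Ea")
```

Under this definition only the first pair is equivalent. The other two pairs are aliases that some of the RFC 3986 normalizations would make identical, but a consumer following the definition treats them as different URIs.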
Comment 2 Kumar Pandit 2007-09-20 03:30:36 UTC
The current definition is based on the following proposal sent to the WG earlier. The only change is that the current definition uses case sensitive comparison instead.
  
Proposal: 
URI equivalence in SML-IF should be defined as case-insensitive simple string comparison based on codepoint-by-codepoint comparison of the corresponding characters in the URI. 
  
Justification: 
1. Performance: Simple string comparison gives the best performance. Without normalization, two aliases of the same URI may not compare as equal, but this problem does not arise in the specific context of an SML-IF producer: when writing out an SML-IF document, a producer can apply normalizations (if necessary) so that a given URI always appears in the same form. Consumers can then perform fast string comparison without any normalization. 

RFC 3986 section 6.2 (Comparison Ladder) describes several forms of normalization (syntax-based, which covers case, percent-encoding, and path-segment normalization, as well as scheme-based and protocol-based). If we require a consumer to perform normalizations, we not only make the consumer less efficient but also need to add very specific normalization step definitions to the SML-IF spec. If we instead leave the burden of normalization to the producer, we can keep the SML-IF spec much simpler and allow consumers to be more efficient. The spec then does not need to say which comparison ladder steps a producer performs. The producer is free to apply any normalization steps (or none) as long as it always writes a given URI in the same form. 
2. Precise definition: RFC 3986 section 6.2.1 (Simple String Comparison) discusses the issues involved in performing a string comparison but does not give a precise definition of how the comparison must be performed. In other words, it leaves some room for interpretation. We should avoid this by presenting an unambiguous definition based on that discussion. 
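
The division of labor described in point 1 can be sketched roughly as follows in Python. Everything here is an assumption made for illustration: the helper `canonical_form`, the choice to lowercase only the scheme and authority, and the example URIs are hypothetical. The proposal deliberately leaves the choice of normalization steps (if any) to the producer.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_form(uri: str) -> str:
    """Hypothetical producer-side step: lowercase the scheme and authority.

    This is one possible normalization, chosen for illustration only.
    It assumes the URI has no userinfo component, since lowercasing the
    whole authority would also alter a case-sensitive userinfo part.
    """
    parts = urlsplit(uri)
    return urlunsplit(parts._replace(scheme=parts.scheme.lower(),
                                     netloc=parts.netloc.lower()))


def uris_equivalent(a: str, b: str) -> bool:
    """Consumer side: plain case-sensitive string comparison, nothing else."""
    return a == b


# Two aliases of the same resource, as a producer might encounter them.
alias_1 = "HTTP://Example.ORG/models/Server.xml"
alias_2 = "http://example.org/models/Server.xml"

# The producer writes both in one consistent form before emitting them.
written_1 = canonical_form(alias_1)
written_2 = canonical_form(alias_2)

raw_match = uris_equivalent(alias_1, alias_2)
written_match = uris_equivalent(written_1, written_2)
```

The raw aliases do not match under simple string comparison, while the forms the producer writes do. The consumer never needs to know which normalization the producer used.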
  
Comment 3 Kumar Pandit 2007-09-26 21:22:58 UTC
The original proposal was for case-insensitive comparison. In the conf call on 9/20/2007, the WG indicated a preference for case-sensitive comparison. The proposal was then changed verbally to reflect that preference. 

Thus, the resultant proposal as discussed during the 9/20 conf call was:

URI equivalence in SML-IF should be defined as case-sensitive simple string
comparison based on codepoint-by-codepoint comparison of the corresponding
characters in the URI. 

The WG reached consensus on this proposal on 9/20. The spec has already been updated to reflect this.