This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Bug 27640 - Define canonicalization
Summary: Define canonicalization
Status: RESOLVED FIXED
Alias: None
Product: WHATWG
Classification: Unclassified
Component: URL
Version: unspecified
Hardware: All All
Importance: P2 normal
Target Milestone: Unsorted
Assignee: Anne
QA Contact: sideshowbarker+urlspec
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-12-17 20:43 UTC by Sam Ruby
Modified: 2015-08-14 07:39 UTC
CC List: 4 users

Description Sam Ruby 2014-12-17 20:43:48 UTC
At the moment, canonicalization is listed in the goals, but nowhere else. The implicit assumption is that the output of parsing is a canonical URL. If so, that should be stated explicitly.
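
To illustrate the "output of parsing is canonical" assumption, here is a minimal sketch in which two URLs count as the same when they parse and serialize to the same string. It uses Python's urllib.parse, which is not the parser the URL spec defines. The names canonicalize, same_url and DEFAULT_PORTS are hypothetical, and the rules shown (lowercase scheme and host, drop the default port, empty path becomes "/") are an illustrative subset only.

```python
from urllib.parse import urlsplit, urlunsplit

# Default ports for a few schemes; an illustrative subset, not a complete table.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}


def canonicalize(url: str) -> str:
    """Parse, then serialize in one fixed form (hypothetical helper).

    Limitations of this sketch: userinfo is dropped, IPv6 literals are not
    handled, and no percent-encoding normalization is attempted.
    """
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port
    # Omit the port when it is absent or equal to the scheme's default.
    if port in (None, DEFAULT_PORTS.get(scheme)):
        netloc = host
    else:
        netloc = f"{host}:{port}"
    # Treat an empty path as "/".
    path = parts.path or "/"
    return urlunsplit((scheme, netloc, path, parts.query, parts.fragment))


def same_url(a: str, b: str) -> bool:
    """Equivalence defined as equality of the serialized forms."""
    return canonicalize(a) == canonicalize(b)
```

With this definition, same_url("HTTP://Example.COM:80", "http://example.com/") holds, while a non-default port keeps two URLs distinct.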

See also: https://annevankesteren.nl/2012/09/url-equivalence

This bug was opened on behalf of Larry Masinter, based on:

https://github.com/webspecs/url/issues/18
Comment 1 Sam Ruby 2014-12-19 12:27:27 UTC
See also: https://github.com/webspecs/url/issues/20, where in addition to "same as", there is a request for "same origin" and "subsumes".
Comment 2 Larry Masinter 2014-12-21 16:52:57 UTC
RFC 3986 and RFC 3987 both talk about how to compare two URIs for equivalence for various purposes. Neither is entirely accurate or corresponds to current implementation. Every URL processor implicitly defines an equivalence relationship -- two URLs are X-equivalent if they have the same result when processed by X -- but comparison generally is predictive, and sometimes being conservative means avoiding false positives and sometimes means avoiding false negatives. For example, cache invalidation should prefer false positives over false negatives (invalidate if in doubt), while security questions likely prefer false negatives (when in doubt, treat as different origins).

Any API for comparison needs to be explicit (or parameterized) about the context presumed (equivalent for what purpose) and how it handles edge cases, illegal values, etc.
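
A rough sketch of what such a parameterized comparison could look like, using Python's urllib.parse rather than any spec-defined parser. Purpose, equivalent and _key are hypothetical names. The only "in doubt" case modelled here is input that cannot be parsed into a scheme, host and valid port; default ports and percent-encoding are deliberately left out.

```python
from enum import Enum
from urllib.parse import urlsplit


class Purpose(Enum):
    """Context the comparison is made for (hypothetical)."""
    CACHE_INVALIDATION = "cache-invalidation"  # prefer false positives
    SECURITY = "security"                      # prefer false negatives


def _key(url: str):
    """Return a comparison key, or None when the URL cannot be handled."""
    try:
        parts = urlsplit(url)
        port = parts.port  # raises ValueError on an invalid port
    except ValueError:
        return None
    if not parts.scheme or not parts.hostname:
        return None
    return (parts.scheme.lower(), parts.hostname.lower(), port,
            parts.path or "/", parts.query)


def equivalent(a: str, b: str, purpose: Purpose) -> bool:
    """Compare two URLs; the purpose decides how doubtful input is treated."""
    key_a, key_b = _key(a), _key(b)
    if key_a is None or key_b is None:
        # In doubt: report "equivalent" so a cache invalidates, but report
        # "different" for security decisions.
        return purpose is Purpose.CACHE_INVALIDATION
    return key_a == key_b
```

Note that under Purpose.SECURITY even two identical but unparseable strings compare as different. That is the conservative direction for security; a real API would need to document this kind of edge case explicitly.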
Comment 3 Anne 2015-08-14 07:39:17 UTC
https://github.com/whatwg/url/commit/3f0bc8b84d2f3bdb651207ce0f90c659c3a5a573

There's probably still scope for normalization operations that operate on percent-encoded bytes, but for now this seems sufficient and in line with implementations.
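
For reference, one example of a normalization that operates on percent-encoded bytes is the one described in RFC 3986 section 6.2.2: uppercase the hex digits and decode escapes of unreserved characters. The sketch below illustrates that RFC 3986 operation only; it is not something the commit above adds, and normalize_percent_encoding is a hypothetical name.

```python
import re

# Unreserved characters per RFC 3986 section 2.3.
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)

# A well-formed percent-encoded byte: "%" followed by two hex digits.
_PCT = re.compile(r"%([0-9A-Fa-f]{2})")


def normalize_percent_encoding(component: str) -> str:
    """Normalize percent-encoded bytes in one URL component (hypothetical).

    Escapes of unreserved characters are decoded; every other escape is
    kept with uppercase hex digits. Malformed escapes such as "%zz" are
    left untouched.
    """
    def fix(match):
        char = chr(int(match.group(1), 16))
        if char in UNRESERVED:
            return char
        return "%" + match.group(1).upper()

    return _PCT.sub(fix, component)
```

For instance, "/%7euser/a%2fb" becomes "/~user/a%2Fb": the tilde is decoded because it is unreserved, while the encoded slash stays encoded so the path structure does not change.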