This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Bug 26553 - Ambiguous statements in basic URL parser
Status: RESOLVED FIXED
Alias: None
Product: WHATWG
Classification: Unclassified
Component: URL
Version: unspecified
Hardware: PC Linux
Importance: P2 minor
Target Milestone: Unsorted
Assignee: Anne
QA Contact: sideshowbarker+urlspec
 
Reported: 2014-08-11 12:48 UTC by Brandon Ross
Modified: 2014-08-12 17:15 UTC


Description Brandon Ross 2014-08-11 12:48:41 UTC
http://url.spec.whatwg.org/#parsing

[[
If url's scheme is not "file", or c is not an ASCII alpha, or remaining does not start with either ":" or "|", or remaining does not consist of one code point, or remaining's second code point is not one of "/", "\", "?", and "#", then set url's host to base's host, url's port to base's port, url's path to base's path, and then pop url's path. 
]]

There are some ambiguous statements in this sentence.

1) "...remaining does not consist of one code point..."

Does this mean that remaining must be exactly one code point? If so, the rest of the checks make no sense, since there cannot be a second code point. Or does it mean that remaining has at least one code point? Then it is just checking that the string is not empty, which the previous checks already catch.

2) "...and then pop url's path."

What does this mean? Remove the first segment? Remove the last segment? The term "pop" is only meaningful in the context of a stack, and is ambiguous as the name of an operation on a general-purpose list.

I tried to make sense of this part in the context of the whole parsing algorithm, but the algorithm itself is very hard to follow. It could benefit from some examples of which kinds of input lead to particular states, or some additional explanation of each state's purpose. As it is, I have no idea what sort of cases this particular check is supposed to detect.
Comment 1 Anne 2014-08-12 11:47:33 UTC
(In reply to Brandon Ross from comment #0)
> 1) "...remaining does not consist of one code point..."
> 
> Does this mean that remaining must be exactly one code point? If so, the
> rest of the checks make no sense, since there cannot be a second code point.

Why? It uses OR, not AND.


> 2) "...and then pop url's path."
> 
> What does this mean? Remove the first segment? Remove the last segment?

The last segment, as in JavaScript's [].pop(). I can clarify that.
Comment 2 Brandon Ross 2014-08-12 15:49:26 UTC
(In reply to Anne from comment #1)
> Why? It uses OR, not AND.

Exactly. It's saying "if remaining has either 0 or 2+ code points, the whole statement is true". Then there's an additional check that looks at the second character of remaining, but the only way we'd need to perform that check is if there's exactly 1 code point (since 0 or 2+ code points is covered), which means there is no second code point to check. Is it supposed to read "remaining consists of only one code point"?
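
The argument above can be checked mechanically. What follows is only a minimal sketch, assuming a literal reading of the two disputed clauses; the function names and sample strings are hypothetical and do not come from the URL specification.

```python
def not_one_code_point(remaining: str) -> bool:
    """Clause 4 read literally: remaining does not consist of one code point."""
    return len(remaining) != 1


def has_second_code_point(remaining: str) -> bool:
    """Clause 5 inspects remaining's second code point, so one must exist."""
    return len(remaining) >= 2


def clause5_is_decisive(remaining: str) -> bool:
    """In an OR chain, clause 5 only matters when clause 4 is false."""
    return not not_one_code_point(remaining)


# Hypothetical sample values for remaining (a Python str indexes by code point).
samples = ["", ":", "|", ":/", "|\\x", ":abc"]

# Collect inputs where clause 5 would decide the outcome AND has something to inspect.
conflicts = [r for r in samples if clause5_is_decisive(r) and has_second_code_point(r)]
print(conflicts)  # prints []
```

Under this reading the list is empty for any input: the second-code-point check can only be decisive when remaining is exactly one code point long, and in that case there is nothing for it to inspect.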

> The last segment, as in JavaScript's [].pop(). I can clarify that.

Okay, I figured that out from later parts of the document. It differs in other languages, though: Java's LinkedList.pop() removes the first element.
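
The two readings of "pop" discussed in this thread can be illustrated side by side. This is a rough sketch only, assuming the path is stored as a list of segments; the example path is hypothetical.

```python
from collections import deque

# Hypothetical path, stored as a list of segments.
path = ["docs", "spec", "index.html"]

# "Pop" as clarified in comment 1: remove the last segment.
last_removed = list(path)
last_removed.pop()

# The other reading raised in comment 2: remove the first segment.
first_removed = deque(path)
first_removed.popleft()

print(last_removed)         # ['docs', 'spec']
print(list(first_removed))  # ['spec', 'index.html']
```

Only the first result matches the behaviour described in comment 1, which is why spelling out "remove the last segment" avoids the ambiguity.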