This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.
The parser should consist of a clearer set of components. Simon hacked together a great start in Rust. Take that and build on it.
https://github.com/SimonSapin/rust-url now has a complete parser (except for IDNA). All issues that I know of are filed in this Bugzilla. Other than these, I believe this code to be equivalent to the algorithm in the spec.
Is the API implemented? I ask because there are quite a few alternate code paths just for the API.
There is a UrlUtils trait that implements the setters defined in the spec: https://github.com/servo/rust-url/blob/f27c9e691d3b2db21650be7603e84b52d1cdac20/src/url.rs#L835 I didn’t bother with the getters since they are fairly straightforward. (This trait is private because I’d rather not expose this API to Rust code. For now it exists only to demonstrate feasibility, though at some point it might be used in Servo.)
Another potential source for inspiration: http://intertwingly.net/blog/2014/10/21/pegurl-js . If there is agreement that this seems promising, I'm willing to do the work and submit pull requests.
In addition to rewriting the parser to be a series of functions that return values, I propose following the lead of https://github.com/whatwg/streams/tree/master/reference-implementation and committing the reference implementation (in this case, written in ES5) into the same repository.
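To illustrate what "a series of functions that return values" could look like, here is a minimal ES5 sketch. It is not the algorithm from the URL Standard and not code from any of the repositories mentioned in this thread. The function names (parseScheme, parseFragment, parseQuery, parseUrl) and the shape of the returned objects are hypothetical, and the sketch ignores hosts, relative URLs, percent-encoding and error reporting.

```javascript
// Hypothetical sketch: every step takes a string and returns a plain
// object, rather than mutating shared parser state.

// Returns { scheme, rest }, or null when the input has no leading scheme.
function parseScheme(input) {
  var match = /^([A-Za-z][A-Za-z0-9+.\-]*):/.exec(input);
  if (!match) {
    return null;
  }
  return {
    scheme: match[1].toLowerCase(),
    rest: input.slice(match[0].length)
  };
}

// Splits off everything after the first "#".
function parseFragment(input) {
  var i = input.indexOf('#');
  if (i === -1) {
    return { fragment: null, rest: input };
  }
  return { fragment: input.slice(i + 1), rest: input.slice(0, i) };
}

// Splits off everything after the first "?".
// Assumes the fragment has already been removed.
function parseQuery(input) {
  var i = input.indexOf('?');
  if (i === -1) {
    return { query: null, rest: input };
  }
  return { query: input.slice(i + 1), rest: input.slice(0, i) };
}

// Composes the steps above. Returns null for input without a scheme.
function parseUrl(input) {
  var s = parseScheme(input);
  if (!s) {
    return null;
  }
  var f = parseFragment(s.rest);
  var q = parseQuery(f.rest);
  return {
    scheme: s.scheme,
    schemeData: q.rest,
    query: q.query,
    fragment: f.fragment
  };
}
```

Because each step only depends on its argument, it can be tested in isolation, for example by calling parseScheme('HTTP://example.com/') and inspecting the returned object.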
To help further discussion: http://intertwingly.net/projects/pegurl/url.html
Spec and reference implementation source are now checked into GitHub: git clone https://github.com/rubys/url; cd url; git checkout peg.js; make (before running make, `head -10 make` might be useful). For more information, see: https://github.com/rubys/url/blob/peg.js/README.md
Sam, unless Simon thinks otherwise, your approach seems like the way to go to me. We'll need to carefully compare it with the existing text and make sure it handles all the cases properly, of course. If that all works out, we'll have something that is much more readable.
I haven’t looked into the details, but I like the general approach in https://github.com/rubys/url of specifying the parser.
Thanks! I've said this elsewhere, but I'll repeat it here for visibility. My PEG.js effort started out as a spike in the agile programming sense of the word. I did test-driven development using urltestdata.txt. When I hit a problem, I consulted the URL Standard for guidance. When I couldn't determine my answer there, I consulted other implementations (first rust-url and later galimatias). Any deviations that remain are therefore likely due to a missing test. If there are any deviations, I would appreciate tests that exhibit the problem, as my workflow is: add a test, see it fail; fix reference implementation, get it working; fix spec prose, publish.
Progress on the merge can be seen here: https://specs.webplatform.org/url/webspecs/develop/ ; issues related to the merge can be found at https://github.com/webspecs/url/issues . Once the merge has been verified as complete, this work will be pulled into the whatwg repository and published with a traditional whatwg specification look and feel.
Since this report was filed, there has been at least one new implementation based on the existing parser. The proposed replacement had a number of issues reported against it that were not resolved, and I don't really want to put effort into rewriting the parser into a more method-based style. Closing this for now.