25946 – Rewrite URL parser

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 25946 - Rewrite URL parser

Summary: Rewrite URL parser

Status:	RESOLVED WONTFIX

Alias:	None

Product:	WHATWG
Classification:	Unclassified
Component:	URL (show other bugs)
Version:	unspecified
Hardware:	All All

Importance:	P2 normal
Target Milestone:	Unsorted
Assignee:	Sam Ruby
QA Contact:	sideshowbarker+urlspec

URL:
Whiteboard:
Keywords:

Depends on:	27236
Blocks:	26341 26362 26431 27233 27234 27289 27518
	Show dependency tree / graph

Reported:	2014-06-02 11:58 UTC by Anne
Modified:	2017-05-25 05:51 UTC (History)
CC List:	4 users (show)

See Also:

Attachments

Description Anne 2014-06-02 11:58:45 UTC

The parser should exist out of a clearer set of components. Simon hacked together a great start in Rust. Take that and build on it.

Comment 1 Simon Sapin 2014-07-19 16:01:27 UTC

https://github.com/SimonSapin/rust-url now has a complete parser (except for IDNA). All issues that I know of are filed in this Bugzilla. Other than these, I believe this code to be equivalent to the algorithm in the spec.

Comment 2 Anne 2014-07-28 08:17:36 UTC

Is the API implemented? As there's quite a few alternating code paths just for the API.

Comment 3 Simon Sapin 2014-07-28 08:48:15 UTC

There is a UrlUtils trait that implements the setters defined in the spec: 

https://github.com/servo/rust-url/blob/f27c9e691d3b2db21650be7603e84b52d1cdac20/src/url.rs#L835

I didn’t bother with the getters since they are fairly straightforward. (This trait is private because I’d rather not expose this API to rust code. For now it exists only to demonstrate feasibility, though at some point it might be used in Servo.)

Comment 4 Sam Ruby 2014-10-21 15:25:35 UTC

Another potential source for inspiration: http://intertwingly.net/blog/2014/10/21/pegurl-js .  If there is agreement that this seems promising, I'm willing to do the work and submit pull requests.

Comment 5 Sam Ruby 2014-10-22 14:57:41 UTC

In addition to rewriting the parser to be a series of functions that return values, I propose following the lead of https://github.com/whatwg/streams/tree/master/reference-implementation and committing the reference implementation (in this case, written in ES5) into the same repository.

Comment 6 Sam Ruby 2014-10-24 11:57:43 UTC

To help further discussion: http://intertwingly.net/projects/pegurl/url.html

Comment 7 Sam Ruby 2014-10-27 16:28:48 UTC

Spec and reference implementation source now checked into github:

git clone https://github.com/rubys/url; cd url; git checkout peg.js; make

(before running make, `head -10 make` might be useful)

For more information, see: https://github.com/rubys/url/blob/peg.js/README.md

Comment 8 Anne 2014-11-04 16:53:14 UTC

Sam, unless Simon thinks otherwise your approach seems like the way to go to me. We'll need to carefully compare it with the existing text and make sure it handles all the cases properly of course. If that all works out, we'll have something that is much more readable.

Comment 9 Simon Sapin 2014-11-04 18:11:01 UTC

I haven’t looked into the details, but I like the general approach in https://github.com/rubys/url of specifying the parser.

Comment 10 Sam Ruby 2014-11-05 11:34:28 UTC

Thanks!

I've said this elsewhere, but I'll repeat it here for visibility.

My PEG.js effort started out as a spike in the agile programming sense of the word.  I did test driven development using urltestdata.txt.  When I hit a problem, I consulted the URL Standard for guidance.  When I couldn't determine my answer there, I consulted other implementations (first rust-url and later galimatias).

Any deviations that remain is therefore likely due a missing test.

If there are any deviations, I would appreciate tests that exhibit the problem, as my workflow is: add a test, see it fail; fix reference implementation, get it working; fix spec prose, publish.

Comment 11 Sam Ruby 2014-12-04 21:39:20 UTC

Progress on the merge can be seen here: https://specs.webplatform.org/url/webspecs/develop/ ; issues related to the merge can be found at https://github.com/webspecs/url/issues

Once the merge has been verified as complete, this work will be pulled into the whatwg repository and published with a traditional whatwg specification look and feel.

Comment 12 Anne 2015-08-13 08:21:45 UTC

Since this report was filed there has been at least one new implementation based on the existing parser. The proposed replacement had a number of issues reported against it that were not resolved. And I don't really want to put effort into rewriting the parser into a more method-based style. Closing this for now.