Bug 20148 - URLQuery interface does not handle query parameter ordering
Status: RESOLVED WORKSFORME
Product: WHATWG
Classification: Unclassified
Component: URL
Version: unspecified
Hardware: PC Windows NT
Importance: P2 normal
Target Milestone: Unsorted
Assigned To: Anne
QA Contact: sideshowbarker+urlspec
Reported: 2012-11-29 04:24 UTC by Simon Kaegi
Modified: 2012-11-30 21:58 UTC (History)
Description Simon Kaegi 2012-11-29 04:24:14 UTC
For good or bad there exist a number of back-end systems that rely on a specific ordering of query parameters. The URLQuery abstraction seems to try to treat the query like a property bag so ordering is lost and multiple value handling is awkward.

If a "query" abstraction is still seen as desirable, an alternate way of looking at things might be to treat the query as an ordered set of URLQueryParameter(s).

Something like:

interface URLQueryParameter {
  attribute DOMString name; // decoded
  attribute DOMString? value; // decoded
};

interface URLUtils {
  ...
  attribute URLQueryParameter[]? query;
};
---
1) http://localhost/a/path
url.query -> null;
2) http://localhost/a/path?
url.query -> []
3) http://localhost/a/path?a=b&c%20d=e%20f&g=&h&a=newb
url.query ->[
  {name:"a", value:"b"},
  {name:"c d", value:"e f"},
  {name:"g", value:""},
  {name:"h", value:null}, //or possibly value is not present at all
  {name:"a", value:"newb"}
]

The idea being that the "query" attribute would be get/set friendly and kept in synch with "search".
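
The ordered-list idea above can be sketched in JavaScript as follows. This is only an illustration: parseQueryParameters is a hypothetical helper, not part of the proposal or any spec. It assumes pairs are split on "&", that the name ends at the first "=", and that "+" is left as-is.

```javascript
// Hypothetical helper, not part of the URL spec or the IDL proposed above.
// Returns null when the URL has no "?" at all, otherwise an ordered array
// of {name, value} objects with percent-decoded strings.
function parseQueryParameters(url) {
  const hashIndex = url.indexOf("#");
  const withoutFragment = hashIndex === -1 ? url : url.slice(0, hashIndex);
  const queryIndex = withoutFragment.indexOf("?");
  if (queryIndex === -1) return null; // case 1: no query
  const query = withoutFragment.slice(queryIndex + 1);
  if (query === "") return []; // case 2: empty query
  return query.split("&").map((pair) => {
    const eq = pair.indexOf("=");
    if (eq === -1) {
      // A bare name such as "h" has no value at all.
      return { name: decodeURIComponent(pair), value: null };
    }
    return {
      name: decodeURIComponent(pair.slice(0, eq)),
      value: decodeURIComponent(pair.slice(eq + 1)),
    };
  });
}

const params = parseQueryParameters(
  "http://localhost/a/path?a=b&c%20d=e%20f&g=&h&a=newb"
);
// Names come back in their original order: a, c d, g, h, a
console.log(params.map((p) => p.name));
```

Because the result is a plain array, the first parameter is simply params[0], and a repeated name such as "a" keeps both of its positions.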
Comment 1 Glenn Maynard 2012-11-29 04:39:49 UTC
Ordering isn't lost; it's just not exposed by the "get" method.  getAll lets you access the less common cases: repeated keys, significant ordering, and when you have elements that aren't key=value.  (A system I've used now and then is "http://url.com?foo/bar&x=y", where the first parameter is treated like a path.)
Comment 2 Simon Kaegi 2012-11-29 05:07:35 UTC
(In reply to comment #1)
> Ordering isn't lost; it's just not exposed by the "get" method.  getAll lets
> you access the less common cases: repeated keys, significant ordering, and
> when you have elements that aren't key=value.  (A system I've used now and
> then is "http://url.com?foo/bar&x=y", where the first parameter is treated
> like a path.)

Hmm... perhaps I'm misunderstanding the intent of the api.

I can see how "getAll" supports repeated keys and perhaps value ordering for the repeated key case but not overall ordering. e.g. How do I use getAll to get the first query parameter?

Also, when outputting a URL, how do I control parameter ordering for serialization? This matters (I believe) because http://url.com?a=b&c=d and http://url.com?c=d&a=b are treated as different URLs for caching purposes.
Comment 3 Glenn Maynard 2012-11-29 05:22:47 UTC
It's not defined in the spec yet (the interface is just a stub at the moment), so I can only say what I expect it'll do, but I expect getAll would return an array of the query portion of the URL, split on "&".  The first query parameter would be url.query.getAll()[0].
Comment 4 Simon Kaegi 2012-11-29 06:00:32 UTC
re: just a stub etc. fair enough

My interpretation of "getAll" is that it is identical to "get" except that it returns an array/sequence of values (to handle the repeated key case) associated with the parameter "name".

e.g.
For http://localhost?a=b&c=d&c=e

url.getAll("a") -> ["b"]
url.getAll("c") -> ["d", "e"]
url.getAll("z") -> []
url.getAll() -> TypeError

Essentially
url.get(x) === url.getAll(x)[0]

--

So at least by my interpretation "getAll" does not cut it for query parameter ordering and the bug is valid.
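
For comparison, the URLSearchParams interface that browsers and Node.js shipped later behaves the way this interpretation describes for named lookups. It is not the URLQuery stub under discussion here, so the snippet below is only an illustration of the get/getAll relationship, not of the draft at the time.

```javascript
// URLSearchParams is used as a stand-in (assumption): it postdates this
// bug and is not the URLQuery interface discussed in the thread.
const query = new URL("http://localhost?a=b&c=d&c=e").searchParams;

const a = query.getAll("a"); // ["b"]
const c = query.getAll("c"); // ["d", "e"]
const z = query.getAll("z"); // []

// get(x) returns the first entry of getAll(x) when the name is present.
const firstC = query.get("c"); // "d"
console.log(a, c, z, firstC === c[0]);
```

Note that lookups by name still give no way to ask for "the first parameter overall", which is the gap this comment points at.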
Comment 5 Anne 2012-11-29 09:15:32 UTC
I do not understand why that would not cut it. Also, any examples of the systems mentioned in comment 0?
Comment 6 Anne 2012-11-29 13:19:12 UTC
To explain a bit more. There will be an iterator at some point. And set() will be made to work to accept multiple values. And we'll add either append() or add() to take care of just adding a new parameter to the end.

But it should remain very simple for the common case. Which is no duplicate parameter keys. So you'd just use get() and set() with single values.
Comment 7 Simon Kaegi 2012-11-30 19:33:13 UTC
(In reply to comment #5)
> I do not understand why that would not cut it.

Ok, I see you're doing changes in the spec now. Thanks Anne.
I have a URL shim that I'll update and follow along with... it's here for the moment -- https://github.com/eclipse/orion.client/blob/master/bundles/org.eclipse.orion.client.core/web/orion/URL-shim.js (New BSD license)
At the moment I have nothing for URLQueryUtil but will add it in the next few days.

> Also, any examples of the systems mentioned in comment 0?
I was able to find many examples where order mattered for a repeated query parameter including the project I'm currently working on (Eclipse Orion). In the Java Servlet API ServletRequest.getParameter(name) will get just the first parameter so if adding an additional parameter can change relative order this can cause problems.

I do not have any concrete examples of systems where single parameter order has an impact. So... 
1) http://localhost?a=1&b=2
2) http://localhost?b=2&a=1
... are generally treated the same in terms of behavior to the backend systems. A general pattern is that a backend using an HTTP library is probably fine. Direct use of a regular expression on the query string might be problematic but is arguably a bug. 

With that said, these are still different URLs, so systems like Squid (if they cache URLs with query parameters) and traffic analyzers will treat them differently. Again, Orion is an example of a system where both caching and analysis are affected by query order differences.
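
A small sketch of why the order matters to anything keyed on the URL string. It uses the later URL and URLSearchParams interfaces as stand-ins, and cacheKey is a hypothetical normalization step; it says nothing about how Squid or Orion actually compute their keys.

```javascript
// Two URLs that most backends treat alike but that differ as strings.
const first = new URL("http://localhost?a=1&b=2");
const second = new URL("http://localhost?b=2&a=1");

// A cache keyed on the serialized URL sees two distinct entries.
const sameKey = first.href === second.href; // false

// Hypothetical normalization: sort parameters by name before keying.
// URLSearchParams.sort() is stable, so repeated names keep their
// relative order.
function cacheKey(url) {
  const copy = new URL(url.href);
  copy.searchParams.sort();
  return copy.href;
}

const sameNormalizedKey = cacheKey(first) === cacheKey(second); // true
console.log(sameKey, sameNormalizedKey);
```

Sorting is only safe when the backend does not depend on the order of differently named parameters, which is the case described above for systems using an HTTP library.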
Comment 8 Anne 2012-11-30 19:41:43 UTC
FYI http://url.spec.whatwg.org/#interface-urlquery now reflects the final design. set() only changes a single value, the first. If you want to affect multiple name-value pairs you will have to use a combination of delete() and append().

get() returns the first value for a given name, getAll() returns all values for a given name.
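
The behavior described above can be sketched roughly as follows. OrderedQuery is a hypothetical class written for illustration, not the specification's algorithm. Returning null for a missing name, appending when set() finds no match, and delete() removing every pair with the name are all assumptions.

```javascript
// Hypothetical sketch: a query kept as an ordered list of [name, value] pairs.
class OrderedQuery {
  constructor(pairs = []) {
    this.pairs = pairs.map(([name, value]) => [name, value]);
  }
  // First value for the name, or null if absent (assumed return value).
  get(name) {
    const pair = this.pairs.find(([n]) => n === name);
    return pair ? pair[1] : null;
  }
  // All values for the name, in order of appearance.
  getAll(name) {
    return this.pairs.filter(([n]) => n === name).map(([, v]) => v);
  }
  // Changes only the first matching pair; appends if there is none (assumed).
  set(name, value) {
    const pair = this.pairs.find(([n]) => n === name);
    if (pair) pair[1] = value;
    else this.append(name, value);
  }
  // Adds a new pair at the end of the list.
  append(name, value) {
    this.pairs.push([name, value]);
  }
  // Removes every pair with the given name (assumed).
  delete(name) {
    this.pairs = this.pairs.filter(([n]) => n !== name);
  }
}

const q = new OrderedQuery([["a", "b"], ["c", "d"], ["a", "newb"]]);
q.set("a", "x"); // only the first "a" changes
const afterSet = q.getAll("a"); // ["x", "newb"]

q.delete("a"); // replace every "a" with two new values
q.append("a", "1");
q.append("a", "2");
const afterReplace = q.getAll("a"); // ["1", "2"]
console.log(afterSet, afterReplace);
```

In this sketch, delete() followed by append() moves the replaced pairs to the end of the list, which matters if the overall parameter order is significant.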

You might want to have a look at https://github.com/annevk/url for a URL parser in JavaScript. I haven't updated it to match the specification yet though, I wrote it to base the specification on.

I'm going to mark this bug WORKSFORME. Specific bugs for serialization and comparison APIs are welcome provided there are good enough use cases. (There's a vague plan of adding URL comparison to the API and serialization might make sense for URLQuery I suppose, but it's not really concrete at this stage.)
Comment 9 Simon Kaegi 2012-11-30 21:45:56 UTC
How do I get the set/multi-set of parameter names?
Comment 10 Anne 2012-11-30 21:58:00 UTC
That's waiting for bug 20019 to be fixed.