This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 26405 - URL terminology for other specs
Summary: URL terminology for other specs
Status: RESOLVED FIXED
Alias: None
Product: WHATWG
Classification: Unclassified
Component: URL
Version: unspecified
Hardware: All All
Importance: P2 normal
Target Milestone: Unsorted
Assignee: Anne
QA Contact: sideshowbarker+urlspec
URL:
Whiteboard:
Keywords:
Duplicates: 27528
Depends on:
Blocks: 26402
Reported: 2014-07-22 16:51 UTC by Ian 'Hixie' Hickson
Modified: 2015-08-15 16:54 UTC (History)

Description Ian 'Hixie' Hickson 2014-07-22 16:51:58 UTC
What do we call a string that will be interpreted as a Web page address?

What do we call the conceptual object consisting of the post-parse components of a Web page's address, obtained from parsing the aforementioned string type, and which can be serialised into the aforementioned string type?
Comment 1 Anne 2014-07-30 13:50:13 UTC
What's wrong with calling them both URL and letting the context make sense of it? It seems like we do that for a lot of these things, such as HTML and WebVTT.
Comment 2 Ian 'Hixie' Hickson 2014-07-30 21:32:43 UTC
It causes a lot of confusion for HTML. I recommend strongly against it.

At least with HTML we have terms that mean the various specific subparts — the DOM, the text/html stream, Documents, Elements vs tags, etc. Right now, I get really confused when reading or talking about the URL spec, because I can never figure out if I'm talking about the string or the parts.
Comment 3 James Graham 2014-08-05 13:45:35 UTC
"URL string" and "URL object"?
Comment 4 Ian 'Hixie' Hickson 2014-09-04 22:58:29 UTC
"URL object" sounds like it refers to the JS Object implementing the IDL interface "URL", which is a thing. "URL string" would be ok, but people really refer to URL strings as just "URL", so I think it would sound stilted.

In practice, I've found that I rarely refer to the "parsed URL", because I usually refer to a "URL" (the string) and "the /path/ component resulting from _parsing_ the URL" (or similar). It's mostly in the URL spec itself that there's a URL object (or URL record, to use JS-like terminology) that is widely referenced.
Comment 5 Anne 2014-09-08 08:48:00 UTC
URL record and URL string would work too...

Note that what I want specifications to pass around are URL records. This is more important now than before because they carry a pointer to an object (for now just Blob objects) which needs to survive. So once we have parsed a URL string, we should only operate on the resulting URL record to ensure that is passed to Fetch eventually.
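The string/record distinction can be illustrated with a minimal sketch. It uses the standard `URL` class as a stand-in for parsing; the plain `record` object and its field names are assumptions made for illustration and are not the URL record defined by the spec (as comment 4 notes, the JS `URL` object is a separate thing).

```javascript
// "URL string": the text an author writes, before any parsing.
const urlString = "HTTP://Example.COM:80/a/../b?q=1#frag";

// Parse once. The `record` object below is a hypothetical stand-in for
// the post-parse components; its field names are chosen for illustration.
const parsed = new URL(urlString);
const record = {
  scheme: parsed.protocol.slice(0, -1), // drop the trailing ":"
  host: parsed.hostname,
  port: parsed.port,                    // empty when it is the scheme's default
  path: parsed.pathname,
  query: parsed.search.slice(1),        // drop the leading "?"
  fragment: parsed.hash.slice(1),       // drop the leading "#"
};

// Serialising yields a URL string again, normalised rather than
// byte-identical to the input.
const serialised = parsed.href;
```

The serialised form differs from the input (lowercased scheme and host, default port dropped, `..` resolved), which is one reason it matters whether a spec is talking about the string or the parsed components.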
Comment 6 Ian 'Hixie' Hickson 2014-09-18 21:44:57 UTC
I don't think it is realistic to expect URL records to be what is passed around all the time. It's too different from what authors will do (e.g. you can't put a record in an attribute). What would it mean for e.g. srcset="", for example? Or <a href="">? Also, many places that use URLs have relative URLs which need to be reparsed at certain times to take into account new base URLs (e.g. again, <a href="">).
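The reparsing concern can be sketched as follows. The same relative URL string resolves to different absolute URLs depending on the base in effect when it is parsed; the host names and paths here are made up for the example.

```javascript
// A relative URL string, as it might appear in <a href="">.
const href = "../img/logo.png";

// Two hypothetical base URLs: the document's original base, and a base
// that takes effect later (for example after a <base> element changes).
const oldBase = "https://example.org/docs/guide/index.html";
const newBase = "https://cdn.example.net/site/v2/page.html";

// Parsing the same string against each base gives different results,
// so a result parsed early goes stale once the base changes.
const before = new URL(href, oldBase).href;
const after = new URL(href, newBase).href;
```

Here `before` is `https://example.org/docs/img/logo.png` and `after` is `https://cdn.example.net/site/img/logo.png`.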
Comment 7 Anne 2014-09-19 08:09:21 UTC
If we are not passing around records/objects, blob URLs won't work.

The idea for srcset is that when it is set, it is parsed and the records stored on the associated object. Then, when time comes to invoke Fetch, the appropriate record is selected and passed on.

I would expect something similar for <a>.
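A rough sketch of the parse-on-set model described for srcset. `ImageModel`, `selectForFetch`, and the comma-splitting parser are all hypothetical simplifications (real srcset parsing and candidate selection are more involved); the point is only that parsing happens when the attribute is set and that a stored record is what gets handed on later.

```javascript
// Hypothetical model of an element that parses srcset when it is set.
class ImageModel {
  constructor(baseURL) {
    this.baseURL = baseURL;
    this.candidates = [];
  }

  // Simplified: split on commas, then whitespace; only "Nx" density
  // descriptors are understood. This is not the real srcset algorithm.
  set srcset(value) {
    this.candidates = value
      .split(",")
      .map((entry) => entry.trim().split(/\s+/))
      .filter(([url]) => url)
      .map(([url, descriptor = "1x"]) => ({
        record: new URL(url, this.baseURL), // parsed now, against the current base
        density: parseFloat(descriptor),
      }));
  }

  // Pick the lowest density that satisfies the device, else the highest.
  selectForFetch(devicePixelRatio) {
    const sorted = [...this.candidates].sort((a, b) => a.density - b.density);
    const match = sorted.find((c) => c.density >= devicePixelRatio);
    return (match ?? sorted[sorted.length - 1])?.record;
  }
}

const img = new ImageModel("https://example.org/gallery/");
img.srcset = "small.png 1x, large.png 2x";
img.baseURL = "https://elsewhere.example/"; // later change; stored records are unaffected
const chosen = img.selectForFetch(2);
```

In this sketch `chosen.href` is still resolved against the original base, which is the behaviour that makes early parsing attractive for blob URLs and awkward for relative URLs.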
Comment 8 Glenn Maynard 2014-09-19 13:56:06 UTC
We've discussed this already at https://www.w3.org/Bugs/Public/show_bug.cgi?id=17765, including what this would mean for srcset.  Parse it at the time the attribute is set and store the parsed URL internally with the URL.

I'm not sure what the right model is for the reparsing issue.  It's not incompatible with why blob URLs need to be parsed early, since blob URLs are never relative, but it would be pretty annoying to have to say "reparse the URL unless its scheme is blob".  It would mean the early-parsed URL could never really be used for anything but blobs.
Comment 9 Ian 'Hixie' Hickson 2014-09-24 20:57:35 UTC
(In reply to Anne from comment #7)
> [...] blob URLs won't work.

I've been saying _that_ for a long time.

I don't think this "store a record" model is viable. It would mean that things like:

   img.src = img.src

...would have highly unintuitive side-effects. It would mean that <picture>'s implementation model is way more complicated than it already is. It would have all kinds of problems. URLs are strings, fundamentally.

If this is a problem for blobs, then the problem is with blobs.
Comment 10 Glenn Maynard 2014-09-24 22:58:27 UTC
(In reply to Ian 'Hixie' Hickson from comment #9)
> (In reply to Anne from comment #7)
> > [...] blob URLs won't work.
> 
> I've been saying _that_ for a long time.

(Removing text from quotes in a way that changes its meaning from what the person actually said is poor practice, even if everyone reading knows what you mean.  People reading quickly or reading individual comments will think that Anne said something he didn't.)

> I don't think this "store a record" model is viable. It would mean that
> things like:
> 
>    img.src = img.src
> 
> ...would have highly unintuitive side-effects. It would mean that
> <picture>'s implementation model is way more complicated than it already is.

Unintuitive is much better than unspecified or nondeterministic, and we have both of those today.  I don't know anything about <picture> (except to ask the question which I'm sure everybody else is asking: why would we need another image element?)

I think there's no comparison in this particular case: it's far better for "img.src = img.src" in a script to do the same odd thing everywhere ("blob-fed images disappear if the blob URL is no longer registered") than to have them disappear in one browser and not another, or disappear on fast computers but not slow computers, or disappear only if the blob URL isn't in use somewhere else, or whatever other unpredictable nonsense we'll get if we don't nail this down.

> It would have all kinds of problems. URLs are strings, fundamentally.

This doesn't change URLs, it changes what APIs do when they're handed a URL.

> If this is a problem for blobs, then the problem is with blobs.

They're a part of the platform which is widely used and catastrophically badly specified, and this is the only approach I've seen that even comes close to fixing that.  It also makes auto-revoke work, which avoids the huge problems of manual revocation.  Everyone's open to other fixes if there are any, but as far as I can tell this is the best we've got.
Comment 11 Ian 'Hixie' Hickson 2014-09-25 22:26:06 UTC
Sorry, I didn't mean to convey a different meaning. I don't mean that blob URLs won't work in general, I mean that they won't work if we don't do the whole thing with URL records and so on. I think the whole approach we're taking with blob URLs is wrong, precisely because of the issue Anne mentioned.


> I don't know anything about <picture> (except to ask the question which I'm sure 
> everybody else is asking: why would we need another image element?)

Apparently you and I are the only ones asking this question. :-(
Comment 12 Glenn Maynard 2014-09-25 22:54:58 UTC
(In reply to Ian 'Hixie' Hickson from comment #11)
> Sorry, I didn't mean to convey a different meaning. I don't mean that blob
> URLs won't work in general, I mean that they won't work if we don't do the
> whole thing with URL records and so on. I think the whole approach we're
> taking with blob URLs is wrong, precisely because of the issue Anne
> mentioned.

I didn't see Anne mention an issue, he just said that blob URLs won't work without this.  (Well, they will, since they do today, they'll just work inconsistently, like any other poorly-specified feature...)  I understand the unintuitive cases, but I think there will be fewer with this than without it, and unexpected behavior seems better than undefined behavior (which is also unexpected).

The semantic I was after here was: "if you hand a blob URL to an API, and the blob URL is valid at the time you hand it to the API, it'll work, even if under the hood we're not making use of the URL until after the revoke".  That gives both users and APIs a simple rule to understand how this works, and users don't have to care about different APIs performing fetches at different times.
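That rule can be modelled with a toy sketch. Everything here is hypothetical: `toyCreateObjectURL`, `toyRevokeObjectURL`, `parseToRecord`, and `ToyImage` are invented names standing in for the blob URL store and an element that parses on set; none of this is the real browser API.

```javascript
// Toy blob URL store: maps URL strings to objects. Not the real API.
const registry = new Map();
let nextId = 0;

function toyCreateObjectURL(obj) {
  const url = `blob:toy/${nextId++}`;
  registry.set(url, obj);
  return url;
}

function toyRevokeObjectURL(url) {
  registry.delete(url);
}

// Parsing captures a pointer to the object, so the record keeps it
// alive even if the string is revoked afterwards.
function parseToRecord(urlString) {
  return { href: urlString, object: registry.get(urlString) ?? null };
}

class ToyImage {
  #record = null;
  set src(value) { this.#record = parseToRecord(value); } // parsed when set
  get src() { return this.#record ? this.#record.href : ""; }
  fetchLater() { return this.#record ? this.#record.object : null; }
}

const blob = { bytes: "placeholder image data" };
const url = toyCreateObjectURL(blob);

const img = new ToyImage();
img.src = url;            // valid at the time it is handed over
toyRevokeObjectURL(url);  // revoked before the "fetch" happens
const stillWorks = img.fetchLater() === blob;

img.src = img.src;        // reparses the string, which is no longer registered
const afterReassign = img.fetchLater();
```

In this model `stillWorks` is `true`, while `afterReassign` is `null`: the same mechanism that makes the rule deterministic also produces the `img.src = img.src` side effect raised in comment 9.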

This doesn't work everywhere: I think APIs that really don't want to parse out what they're given synchronously are a lost cause with blob URLs.  The big one is CSS, and maybe innerHTML too.

Anyhow, maybe you'll find another idea.  It just feels like the only alternative to this is leaving things as they are and letting it keep deteriorating, and then having a much harder problem to solve in the future...
Comment 13 Anne 2014-12-06 14:22:58 UTC
*** Bug 27528 has been marked as a duplicate of this bug. ***
Comment 14 Ian 'Hixie' Hickson 2014-12-22 23:54:33 UTC
"at the time you hand [something] to the API" is not "a simple rule to understand".
This is especially the case given that the "something" here is just a string, which can be concatenated with other strings, kept in author script before being passed to the platform, stored in attributes, comments, or other out-of-band contexts, sent to the server, stored in databases, sent back from the server, etc.

I think it would be better to drop blob: entirely than require such an API. Something like srcObject, which we now have for media elements, is much simpler and saner.
Comment 15 Anne 2014-12-23 08:39:31 UTC
Yes, that was my position, but the blob URL crowd won :-/ I don't think we can take it back now.
Comment 16 Anne 2015-08-15 16:54:26 UTC
To address comment 0, I have introduced "URL string" and "URL record" as disambiguation terms. You can now also reference model and syntax productions separately from other specifications.

If further blob URL discussion is warranted we should probably do that elsewhere.

https://github.com/whatwg/url/commit/656b803201027c022e3603c5b2b4d4fa498bc911