17765 – APIs need to take a reference to blob data underlying object URLs

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Status:	RESOLVED FIXED

Alias:	None

Product:	WHATWG
Classification:	Unclassified
Component:	URL
Version:	unspecified
Hardware:	All All

Importance:	P2 normal
Target Milestone:	Unsorted
Assignee:	Anne
QA Contact:	sideshowbarker+urlspec

Duplicates (1):	22526

Reported:	2012-07-12 23:04 UTC by Glenn Maynard
Modified:	2014-05-22 14:37 UTC
CC List:	10 users

Description Glenn Maynard 2012-07-12 23:04:00 UTC

This is the remaining portion of the autoRevoke blob URL feature.  APIs that take URLs and then operate on them asynchronously need to synchronously take a reference to the underlying blob data before returning.

For example, when you say "img.src = URL.createObjectURL(blob)", the image fetch doesn't necessarily begin immediately.  It may not happen until well after the script returns.  Currently, that means that by the time the fetch begins, the URL would no longer exist, because it's released as soon as the script returns.

To fix this, "img.src = blobURL" needs to take a reference to the underlying blob data before the assignment returns.  Then, all fetches that would normally operate on the @src URL actually take place on the blob data.
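
A minimal sketch of the race described above, using a toy model rather than real browser APIs. All names here (store, createAutoRevokeURL, runStableState, ToyImage) are hypothetical stand-ins for the blob URL store, auto-revoking URL creation, the point where the script returns to the browser, and an image element. The real "update the image data" algorithm is far more involved.

```javascript
// Toy model of the race; none of these names are real platform APIs.
const store = new Map();          // blob URL -> underlying data
const pendingRevocations = [];    // URLs to revoke when the script returns
let nextId = 0;

function createAutoRevokeURL(data) {
  const url = "blob:toy/" + nextId++;
  store.set(url, data);
  pendingRevocations.push(url);
  return url;
}

// Stands in for "the script returns to the browser".
function runStableState() {
  for (const url of pendingRevocations.splice(0)) store.delete(url);
}

class ToyImage {
  constructor(captureOnAssign) {
    this.captureOnAssign = captureOnAssign;
    this.url = null;
    this.captured = undefined;
  }
  set src(url) {
    this.url = url;
    // The proposed fix: grab the data synchronously, during assignment.
    if (this.captureOnAssign) this.captured = store.get(url);
  }
  // Stands in for the fetch that runs some time after the assignment.
  load() {
    const data = this.captureOnAssign ? this.captured : store.get(this.url);
    return data === undefined ? "network error" : data;
  }
}

const lazy = new ToyImage(false);
const eager = new ToyImage(true);
const url = createAutoRevokeURL("pixels");
lazy.src = url;
eager.src = url;
runStableState();                 // the URL is auto-revoked here
const lazyResult = lazy.load();   // looks the URL up too late
const eagerResult = eager.load(); // uses the reference taken at assignment
```

In this model the image that looks the URL up only at load time fails, while the one that captured the data during assignment still loads, which is the deterministic behavior the bug asks for.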

Using XHR2 as an example, "Associate blob data with *url*" would be added as a step after 6 (after resolving the URL).  The "associate blob data with url" algorithm would look up the Blob associated with the URL, and associate the underlying data of that blob with it (or do nothing if it's not a blob URL).  That way, the blob data tags along with the URL when it's later sent to fetch (when send is called).

(In case this isn't clear, this is treating XHR2's "url" property as a string-like object with a property hanging off it, and this wouldn't be visible to scripts.  It's essentially shorthand for passing a (url, blob) tuple to fetch.)
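
The (url, blob) tuple idea could be sketched roughly as follows. This is only an illustration: ToyXHR, associateBlobData, toyFetch, and blobStore are made-up names, not the real XMLHttpRequest or fetch algorithms, and strings stand in for blob data.

```javascript
// Hypothetical sketch; ToyXHR is not the real XMLHttpRequest.
const blobStore = new Map([["blob:toy/1", "hello"]]);

// "Associate blob data with url": returns a (url, blobData) pair.
function associateBlobData(url) {
  const blobData = url.startsWith("blob:") ? blobStore.get(url) : undefined;
  return { url, blobData }; // blobData stays undefined for non-blob URLs
}

function toyFetch(request) {
  if (request.blobData !== undefined) return request.blobData;
  if (request.url.startsWith("blob:")) return "network error";
  return "would go to network: " + request.url;
}

class ToyXHR {
  open(method, url) {
    this.method = method;
    this.request = associateBlobData(url); // taken synchronously in open()
  }
  send() {
    return toyFetch(this.request);
  }
}

const xhr = new ToyXHR();
xhr.open("GET", "blob:toy/1");
blobStore.delete("blob:toy/1");   // URL revoked between open() and send()
const body = xhr.send();

const lateXhr = new ToyXHR();
lateXhr.open("GET", "blob:toy/1"); // opened after revocation
const lateBody = lateXhr.send();
```

The pair is invisible to the caller: open() and send() keep their shape, and only the internal request carries the extra data.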

As a side-effect, this also prevents "img.src = URL.createObjectURL(blob); blob.close();" from being nondeterministic.  Currently, it depends on what stage the image update was at.

One other note: all APIs should attempt to take this reference at the time the URL enters the API (eg. when xhr.open is called), not at some later point (like xhr.send).  That is, this should still always work (for any API) without caring if the caller gave you a blob URL:

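function openURLLater(url)
{
    var xhr = new XMLHttpRequest();
    // open() is where the reference to any blob data should be taken.
    xhr.open("GET", url);
    setTimeout(function() { xhr.send(); }, 1000);
}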

Comment 1 Arun 2012-07-13 15:23:35 UTC

This bug is well logged, and it's good to have it as a tracking bug, but I'm not sure that the answer lies in File API.  Of course, a Note would be useful in File API, and I'm definitely going to add one.  

I'm worried that similar bugs to this must be logged against ALL affiliated specifications: XHR, HTML5, MediaStream, etc.  We could use this bug as a tracking bug for those.

Comment 2 Glenn Maynard 2012-07-13 16:30:42 UTC

The algorithm involves File API, since the definition involves Blob, but the actual definition might belong in HTML.  You can reassign wherever it makes sense, but the autoRevoke feature is incomplete until this is resolved.

I do think bugs should be filed against other affected specs, but not until the algorithm exists for them to hook into.

Comment 3 Anne 2012-07-13 18:54:50 UTC

Just to make sure I understand this correctly: URL.createObjectURL(blob) creates a blob URL with an identifier. The identifier is tied to a revoke flag and origin.

What if we add a "revoke from now on flag" to that?

Then, once the URL parser is adequately defined, blob URLs have some post-parsing steps that set the revoke flag if the revoke from now on flag is set, and otherwise set the revoke from now on flag.

That would mean that:

xhr.open("GET", blob) // can be fetched
img.src = blob // fails

Parsing is always done when setting so tying it to that makes the most sense I think. It also means not all the specifications will have to start special casing for blob URLs. That would be a big mess.

Comment 4 Glenn Maynard 2012-07-13 19:05:34 UTC

(In reply to comment #3)
> xhr.open("GET", blob) // can be fetched
> img.src = blob // fails

Both of these succeed.  You can use the URL as many times as you like (including zero).  The URL is revoked when the script returns to the browser, the next time "provide a stable state" happens.

This ticket is about something else: whenever an API receives a blob URL that it will use later asynchronously, it needs to grab a reference to the underlying blob data immediately, so the reference is still available when the async load begins (even though the blob URL no longer exists).

It might be possible to do this in resolve, attaching the blob data to result.  We'd need to make sure that all APIs resolve synchronously ("update the image data" and XHR do, at least); doing it in a queued task or async section would be too late.
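
To illustrate why resolving in a queued task would be too late, here is a small sketch with a hand-rolled task queue. The names (urls, tasks, resolveSync) are invented for the example, and the explicit delete stands in for the revocation that happens at the next stable state.

```javascript
// Toy model: a blob URL table and a queue of tasks that run after the script.
const urls = new Map([["blob:toy/a", "data"]]);
const tasks = [];

// Resolve and attach the blob data (undefined if the URL is unknown).
function resolveSync(url) {
  return { url, data: urls.get(url) };
}

let syncResolved;
let queuedResolved;

// Script body: one API resolves immediately, another defers to a task.
syncResolved = resolveSync("blob:toy/a");
tasks.push(() => { queuedResolved = resolveSync("blob:toy/a"); });

// Script returns: the URL is revoked first, then queued tasks run.
urls.delete("blob:toy/a");
for (const task of tasks) task();
```

Only the synchronous resolve sees the data; the queued one runs after the URL is gone.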

Comment 5 Anne 2012-07-13 19:25:02 UTC

I believe parsing always happens synchronously because it can fail.

The only way I can see this work is by tying some kind of flag to the identifier that is set after the task in which the associated blob URL is being parsed and anytime after that when the same blob URL is being parsed the result ends up being about:invalid or some such (about:invalid is a URL representing a network error). Because if it would still be the same URL there would be no way to distinguish it when you start fetching the thing. (The blob data would be stored after parsing along with the identifier.)

Comment 6 Glenn Maynard 2012-07-13 19:29:54 UTC

(In reply to comment #5)
> I believe parsing always happens synchronously because it can fail.
> 
> The only way I can see this work is by tying some kind of flag to the
> identifier that is set after the task in which the associated blob URL is being
> parsed and anytime after that when the same blob URL is being parsed the result
> ends up being about:invalid or some such (about:invalid is a URL representing a
> network error). Because if it would still be the same URL there would be no way
> to distinguish it when you start fetching the thing. (The blob data would be
> stored after parsing along with the identifier.)

Sorry, I'm having trouble parsing this.  You can parse/resolve/fetch the same autoRevoke blob URL any number of times, so why would it resolve to about:invalid?

Comment 7 Anne 2012-07-13 19:53:05 UTC

How does the UA tell the difference between parsed (please assume parsing and resolving becomes a single thing) blob URL and a parsed blob URL after it has been revoked? They need to be different.

Comment 8 Glenn Maynard 2012-07-13 20:19:28 UTC

(In reply to comment #7)
> How does the UA tell the difference between parsed (please assume parsing and
> resolving becomes a single thing) blob URL and a parsed blob URL after it has
> been revoked? They need to be different.

They aren't different.  Once you pass a blob URL to an API, that API retains a reference to the blob data.  Revoking the URL causes future uses of the URL to fail, but not ones you've already handed off to APIs.

See the earlier example:

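function openURLLater(url)
{
    var xhr = new XMLHttpRequest();
    // open() is where the reference to any blob data should be taken.
    xhr.open("GET", url);
    setTimeout(function() { xhr.send(); }, 1000);
}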

This would work, even if "url" is revoked between open() and send().  It'll also work if blob.close() has been called (the reference is to the underlying data, not the Blob itself).  XHR retains a reference to the blob data (attached to the resolved URL).  Revoking the URL would never interrupt the XHR fetch after send() is called, either.

Similarly (and more importantly, since this one is harder to sidestep), if you say

    img.src = URL.createObjectURL(blob, {autoRevoke: true});

the "update the image data" algorithm takes a reference to the underlying blob data (possibly on resolve).  That way, even though the fetch happens asynchronously (the algorithm goes async in step 10) and the blob URL may be revoked before the fetch begins, the fetch still has access to the blob data.

One other important detail, though: if a reference is taken to the blob data, then the release of the reference needs to be defined too.  I don't know if that can be done other than spec-by-spec.  XHR would want to do this when entering the DONE state, for example; "update the image data" would do it when the algorithm returns.

I suppose in principle you don't need to define this; just pretend the reference is kept forever and leave releasing it as a QoI detail, since you can derive when that data can no longer possibly be accessed from the spec implicitly.  I'm not sure if that'd be a good idea or not...

Comment 9 Anne 2012-07-13 20:29:41 UTC

Right, but I don't want to tie the data to the API, just the URL. Otherwise all the APIs will have to special case blob URLs and that just seems unclean. So rewriting revoked blob URLs to about:invalid so the API can still fetch the blob URL as-is seems better.
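
One possible reading of this proposal, as a sketch only: the parser, not each API, decides what a blob URL means at parse time. The names toyParse, toyFetch, and liveBlobs are hypothetical, and the real URL parser is not structured this way.

```javascript
// Hypothetical model: the parser rewrites revoked blob URLs to about:invalid.
const liveBlobs = new Map([["blob:toy/x", "payload"]]);

function toyParse(input) {
  if (!input.startsWith("blob:")) return { href: input };
  // Revoked or unknown blob URL: the result represents a network error.
  if (!liveBlobs.has(input)) return { href: "about:invalid" };
  // Live blob URL: the data is stored along with the parsed result.
  return { href: input, blobData: liveBlobs.get(input) };
}

function toyFetch(parsed) {
  if (parsed.href === "about:invalid") return "network error";
  if (parsed.blobData !== undefined) return parsed.blobData;
  return "network fetch of " + parsed.href;
}

const before = toyParse("blob:toy/x"); // parsed while the URL is live
liveBlobs.delete("blob:toy/x");        // revocation
const after = toyParse("blob:toy/x");  // parsed after revocation
```

Under this reading the APIs themselves never inspect the scheme; they just hand whatever the parser returned to fetch.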

Comment 10 Glenn Maynard 2012-07-13 20:54:21 UTC

(In reply to comment #9)
> Right, but I don't want to tie the data to the API, just the URL. Otherwise all
> the APIs will have to special case blob URLs and that just seems unclean. So
> rewriting revoked blob URLs to about:invalid so the API can still fetch the
> blob URL as-is seems better.

If the blob URL is revoked at the time the resolve happens, then it can just do nothing, as if it's not a blob URL at all.  The fetch would follow the same failure path it does now.

Are we thinking of this the same way?  I'm not saying this should cause the resolved URL to be different, just that the data would be attached as a property.  For example, in JS-like terms (not quite, since you can't actually put properties on strings):

function resolveURL(url)
{
    // ... perform other resolve steps
    var blob = getBlobFromURL(url);
    if(blob)
        url.blobData = blob.underlyingData;
    return url;
}

It still returns the same resolved URL, but (conceptually) the associated blob data is attached to the URL string like a property, which fetch can check for and use.  (There may be a clearer way to conceptualize this.)  The algorithm resolving the URL wouldn't have to know anything about this (except possibly for releasing the url.blobData reference).
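
A runnable variant of the idea above, assuming a small wrapper object in place of a property on a string (assigning a property to a primitive string is silently dropped in sloppy mode and throws in strict mode). blobsByURL and the shape of the returned object are assumptions made for this illustration.

```javascript
// Hypothetical lookup table standing in for the browser's blob URL mapping.
const blobsByURL = new Map([["blob:toy/42", { underlyingData: "bytes" }]]);

function getBlobFromURL(url) {
  return blobsByURL.get(url) || null;
}

function resolveURL(url) {
  // ... other resolve steps would run here
  const resolved = { href: url, blobData: null };
  const blob = getBlobFromURL(url);
  if (blob) resolved.blobData = blob.underlyingData;
  return resolved; // same URL, with the data riding along
}

const viaBlob = resolveURL("blob:toy/42");
const viaHttp = resolveURL("http://example.com/");
```

Callers that do not care about blobs can keep reading only href; fetch would be the one consumer that checks blobData.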

Comment 11 Masatoshi Kimura 2012-07-14 20:44:43 UTC

If all relevant APIs need to be updated anyway, why did you avoid taking a blob parameter directly in the first place?
  audio.src = new Blob([...], {type: "audio/webm"});
would be much more concise and intuitive. Also, that has no backward compatibility problem because we don't have to change createObjectURL().

Comment 12 Glenn Maynard 2012-07-14 21:04:50 UTC

I fully agree, but for reasons I don't understand there was pushback against that.

You couldn't use that with CSS url(), but I'm not familiar enough with CSS parsing to know if that could ever work anyway (eg. whether CSS parsing happens synchronously when inserting a <style> element--if not, there's no chance to grab a reference).

It'd require adding new entry points to every API that takes a URL, so the surface area might be a lot higher.

Comment 13 Arun 2012-08-24 18:56:22 UTC

Let's start with this being reassigned to HTML.  It would seem that a clone of this bug should be assigned to XHR too, but I leave that to the editor to decide.  In the meantime, I am reassigning this bug.

Comment 14 Ian 'Hixie' Hickson 2012-10-31 23:43:25 UTC

I disagree with the very premise of this bug. When you set a srcset="" content attribute on an HTMLImageElement, nothing happens until the UA wants it to happen. Adding prose everywhere that might eventually use part of a string as a URL to make sure it parses that string and grabs the data of any blob: URLs mentioned in it is just impractical. I don't even know how you would do it. For example, suppose you stuck a blob: URL into a Text node as "html { background: url(blob:...)}", and that you then insert that Text node in an HTMLStyleElement node that has a type="" attribute set to "bogus/bogus", and you later change the attribute to "text/css". What happens? What about if instead of type="", it's media="" that's set to a media query that doesn't apply, but later the user resizes the window so it does apply? What about if you set it on the fifth <source> element of a <video> element that is the child of an <object> element that has a plugin instantiated, but you later change the <object> element's data="" attribute so the plugin goes away, but the video's first <source> works fine, and then later you remove all but the 5th <source>, should that work? What about if the URL is in the string of an event handler attribute, as in setAttribute('onclick', 'image.src="' + myBlobURL + '"')? Or in a string you document.write('<script src="' + blobURL + '"><\/script>') or some such? What if it's a <link href=""> to a favicon but you change the rel="" from rel=icon to rel=stylesheet afterwards, does it survive? What if you change from rel=bogus to rel=icon? What if you set HTMLAnchorElement.href to a blob: URL, and then move that <a> element to another document and call .click() on it? What if you set a data-foo-src="" attribute to a blob: URL because you passed it to a library that used to use the blob: URL directly on an HTMLImageElement but now it puts it on a <span> element and later grabs the URL for stuffing into CSS?

Comment 15 Glenn Maynard 2012-11-01 00:23:19 UTC

(Paragraphs, dude, I thought the bugmail was misbehaving...)

One of the simplest, most obvious uses of blob URLs is:

    img.src = URL.createObjectURL(blob, {autoRevoke: true});

Worse than merely not working--a dealbreaker for such a basic thing--this *may or may not* work, because the fetch happens asynchronously.  It may begin before or after the "URL scope" is exited and the blob URL expires.  The image may or may not be loaded.  This is the basic problem we're trying to solve: this sort of thing needs to work, and without nondeterministic behavior leaking through.

> Adding prose everywhere that might eventually use part of a string as a URL to make sure it parses that string and grabs the data of any blob: URLs mentioned in it 

That's not what we need.  It's perfectly OK if auto-revoke blob URLs don't work in every conceivable place.  In particular, the goal is *not* to take a reference to the blob in every single possible way you can store a blob URL.  But the behavior needs to be deterministic.  img.src currently isn't, since the fetch on the URL happens asynchronously.

So, I think the premise of the wall of text was mistaken, but to respond to a couple with what I'd expect to happen:

> What about if the URL is in the string of an event handler attribute, as in setAttribute('onclick', 'image.src="' + myBlobURL + '"')?

This would deterministically fail (unless you happen to fire onclick synchronously before returning from the same invocation).  The goal is not to extend the lifetime of blob URLs if you happen to store them anywhere (such as within another string, in this case).

I don't think anything needs to be done here.  This will already deterministically fail now, and that's the correct behavior.

> Or in a string you document.write('<script src="' + blobURL + '"><\/script>') or some such? 

Just as HTMLImageElement should synchronously take a reference to the blob when @src is set, HTMLScriptElement should too.  Since the HTMLScriptElement is created and its @src attribute set synchronously, before document.write returns, the reference will be taken before the calling script returns.  So, this would deterministically work.

> What if you set HTMLAnchorElement.href to a blob: URL, and then move that <a> element to another document and call .click() on it? 

This would work.  The <a> would take a reference to the blob synchronously, when a.href is assigned (again identically to img.src and script.src).  That reference would be carried around within the <a> when it's moved around.

(If it's possible to move an element to a document in a different origin, then it shouldn't work, however.  Blob URLs are restricted to same-origin, so references to the blob through a blob URL should be, too.)

Comment 16 Glenn Maynard 2012-11-01 14:35:52 UTC

(In reply to comment #15)
> > What if you set HTMLAnchorElement.href to a blob: URL, and then move that <a> element to another document and call .click() on it? 
> 
> This would work.

(Well, it would work at this level--it probably doesn't always make sense to allow navigating to blob URLs.  It definitely doesn't make sense for top-level browsing contexts.  That's a separate issue, though.)

Comment 17 Ian 'Hixie' Hickson 2012-11-01 22:24:24 UTC

This is nuts, IMHO.

Please list all the places you think that should have this behaviour.

Comment 18 Glenn Maynard 2012-11-01 23:50:21 UTC

I don't know if it's nuts, but I know of no other proposal so far that even attempts to make the auto-revoke feature interoperable.  I'm happy to see alternatives that are better than "drop the feature" and "have it remain unspecified".

I can't list every possible place it might be needed, because I'm not familiar with every piece of the platform, but I'm happy to make a start.

- HTML attributes which take a URL that gets fetched.  The itemprop section gives a start on this: @src for audio, embed, img, source, track, and video; @href for a, area and link; and <object data>.  Additionally, <img srcset>.  (This doesn't include every attribute which is a URL; for example, it probably doesn't make sense to do this for <blockquote cite> or <base href>.)
- WebSocket's constructor.  (If handing blobs to WebSocket isn't useful, then simply prohibit them here, so there's no question of async races.)
- XHR's client.open() (@anne).

(I'm ignoring CSS since that's a different spec, and because I don't know enough--or anything, really--about its processing model to guess at what it should do.  FWIW, "prohibit blob URLs entirely" is one possible answer.)

> What if it's a <link href=""> to a favicon but you change the rel="" from rel=icon to rel=stylesheet afterwards, does it survive? What if you change from rel=bogus to rel=icon? 

I think that for all of these attributes, the reference should be taken when the attribute itself is assigned, and should survive until the next time the attribute is assigned, and other attributes should have no effect.  So, the value of @rel, or @rel changing in the future, would have no effect here.
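
The rule suggested here (capture on assignment of the URL attribute, ignore every other attribute) might look like this in a toy element. ToyLink, blobTable, and heldData are invented for the sketch and do not correspond to real DOM interfaces.

```javascript
// Toy model of a <link>-like element; not a real DOM object.
const blobTable = new Map([["blob:toy/icon", "icon-bytes"]]);

class ToyLink {
  constructor() {
    this.rel = "";
    this._href = "";
    this._blobRef = undefined;
  }
  set href(value) {
    this._href = value;
    // The held reference is replaced on every assignment of href.
    this._blobRef = blobTable.get(value);
  }
  get href() {
    return this._href;
  }
  heldData() {
    return this._blobRef;
  }
}

const link = new ToyLink();
link.rel = "bogus";
link.href = "blob:toy/icon";       // reference taken here
blobTable.delete("blob:toy/icon"); // URL revoked
link.rel = "icon";                 // no effect on the held reference
const stillHeld = link.heldData();
link.href = "blob:toy/icon";       // reassigned after revocation
const afterReassign = link.heldData();
```

Changing rel leaves the reference alone, while reassigning href after revocation drops it, matching the "survive until the next time the attribute is assigned" wording.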

Comment 19 Simon Pieters 2012-11-02 09:50:23 UTC

(In reply to comment #15)
> (If it's possible to move an element to a document in a different origin,
> then it shouldn't work, however.  Blob URLs are restricted to same-origin,
> so references to the blob through a blob URL should be, too.)

It is possible, using document.domain.

(In reply to comment #18)
> - WebSocket's constructor.  (If handing blobs to WebSocket isn't useful,
> then simply prohibit them here, so there's no question of async races.)

I don't see how it could succeed, and currently this fails at the step of resolving the URL (the scheme is not "ws" or "wss").

Comment 20 Glenn Maynard 2012-11-02 21:08:55 UTC

(In reply to comment #19)
> I don't see how it could succeed, and currently this fails at the step of
> resolving the URL (the scheme is not "ws" or "wss").

OK, so that can be dropped from the list.

Comment 21 Michael[tm] Smith 2013-01-09 03:35:44 UTC

Housekeeping note: This bug was already resolved for HTML WG purposes before it was reopened and moved to the WHATWG product.

Comment 22 Ian 'Hixie' Hickson 2013-04-23 16:56:41 UTC

I'd like to see some other spec (e.g. a CSS module) do this before HTML does it, because it affects disproportionally more of the HTML spec than other specs and so HTML is probably not the best place to experiment with this in.

Comment 23 Arun 2013-07-09 20:36:12 UTC

*** Bug 22526 has been marked as a duplicate of this bug. ***

Comment 24 Arun 2013-07-09 20:41:14 UTC

Per discussions with Anne, I'm reassigning this bug to URL, modulo a fix in File API to Bug 22626.

Comment 25 Anne 2014-01-14 13:21:47 UTC

Arun, is there a summary somewhere of what needs to be done here now there is some machinery in place in the File API specification?

Comment 26 Arun 2014-01-14 23:56:33 UTC

Should the "fetch" part for blob URLs be done within the File API spec. or within http://fetch.spec.whatwg.org/?  I can see that given that the Blob model should live in File API, maybe the fetch part should too.  On the other hand, the blob URL scheme has related schemes (mediastream) and so maybe the fetch spec. could serve those as well.

But aside from deciding that, I think what should be done is:

1. When fetching, user agents should check to see whether the scheme data portion of a blob URL matches an identifier in the Blob URL store.  If yes, it should create a response (basically using the sanctioned response headers in File API) and return response.

2. If not in Blob URL store, it should return a network error.
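
The two steps above can be sketched as a lookup against a toy Blob URL store. The store layout, the fetchBlobURL name, the response shape, and the choice of Content-Type and Content-Length as the headers are all assumptions for illustration; a string's length stands in for the byte size.

```javascript
// Hypothetical Blob URL store keyed by the scheme data (the identifier).
const blobURLStore = new Map();
blobURLStore.set("toy-id-1", { data: "abc", type: "text/plain" });

function fetchBlobURL(url) {
  const prefix = "blob:";
  if (!url.startsWith(prefix)) throw new TypeError("not a blob URL");
  const identifier = url.slice(prefix.length); // scheme data portion
  const entry = blobURLStore.get(identifier);
  // Step 2: no matching identifier means a network error.
  if (entry === undefined) return { type: "error" };
  // Step 1: build a response from the stored blob data.
  return {
    type: "ok",
    headers: {
      "Content-Type": entry.type,
      "Content-Length": String(entry.data.length),
    },
    body: entry.data,
  };
}

const hit = fetchBlobURL("blob:toy-id-1");
const miss = fetchBlobURL("blob:no-such-id");
```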

Currently, the "machinery" in the File API is imperfect, because it isn't strict enough about what URL.createObjectURL and URL.createFor emit, and it isn't strict enough about scheme data as identifier.  I intend to fix this.

Related is Bug 24072, in which Blob.close() might force an identifier for that resource to be removed the next time the Revocation List is processed.  This part of the machinery needs fixing also.

The important aspects of the File API machinery are that the Blob URL Store and the Revocation List give Fetch enough to work with Blob URL references.

Comment 27 Anne 2014-01-15 10:33:21 UTC

So that makes sense for Fetch or the File API to define.

However, there was also an aspect for the URL parser to define, no? Where we store a reference on the parsed URL? And then if the URL is parsed again it would no longer work? See also the description in comment 0.

Comment 28 Arun 2014-01-15 19:17:05 UTC

(In reply to Anne from comment #27)
> So that makes sense for Fetch or the File API to define.
> 
> However, there was also an aspect for the URL parser to define, no? Where we
> store a reference on the parsed URL? And then if the URL is parsed again it
> would no longer work? See also the description in comment 0.

My understanding is that part *is* in the Blob URL store, which is the bit that addresses Glenn's Comment 0: 

"To fix this, "img.src = blobURL" needs to take a reference to the underlying blob data before the assignment returns.  Then, all fetches that would normally operate on the @src URL actually take place on the blob data."

The Blob URL Store stores the identifier (which is the scheme data) and the blob data (which is the resource); if both exist, then the response is prepared, and the fetch acts on the blob data that is stored in the Blob URL Store.

If the URL is fetched *again* and the reference is no longer valid (since while processing the Revocation List, it may have been expunged), it returns a network error.  Does URL parsing itself need to also touch on this further?

Comment 29 Anne 2014-01-16 14:56:07 UTC

You have minted a blob URL that can be used once.

Multiple APIs parse it in deterministic order. They then fetch it in random order.

The idea is that we can deterministically tell which APIs will get a network error and which single API (hint: the first to parse) will get the contents of the blob.

So either all future attempts to parse that blob URL need to fail or we need to store on the URL object that only this one has a reference to the actual blob.

Comment 30 Arun 2014-01-16 19:42:37 UTC

(In reply to Anne from comment #29)
> You have minted a blob URL that can be used once.
> 
> Multiple APIs parse it in deterministic order. They then fetch it in random
> order.
> 
> The idea is that we can deterministically tell which APIs will get a network
> error and which single API (hint: the first to parse) will get the contents
> of the blob.

Well, we know it's not necessarily true that only the first-to-parse API should get the contents of the blob upon fetch, and the rest should get a network error.

1. Blob URLs from URL.createFor(blob), formerly URL.createObjectURL(blob, {autoRevoke: true}), get automatically revoked when processing global script cleanup jobs, which essentially follows script order within that unit of similar-origin browsing contexts (ref for global script cleanup list: http://www.whatwg.org/specs/web-apps/current-work/multipage/webappapis.html#scripting). This should allow the same URL to be used more than once.

2. Blob URLs from URL.createObjectURL(blob) are revoked only by a matching URL.revokeObjectURL(url) call; again, multiple APIs can use the URL.

> 
> So either all future attempts to parse that blob URL need to fail or we need
> to store on the URL object that only this one has a reference to the actual
> blob.

Can we have "URL parse" do something like check the Blob URL Store during the scheme data portion of the parse algorithm, and then store the blob data at that point (toggling on scheme of course)?  This way, each time the reference is valid, the invoking API has a reference to the blob data.  I'm not sure what we should say about flushing the reference, if anything at all.

Comment 31 Glenn Maynard 2014-01-16 20:35:58 UTC

(In reply to Anne from comment #29)
> So either all future attempts to parse that blob URL need to fail or we need
> to store on the URL object that only this one has a reference to the actual
> blob.

Why would only the first API call get a reference?  As long as the URL parse happens before the blob URL is revoked, you should be able to use the URL as many times as you want.

(In reply to Arun from comment #30)
> Can we have "URL parse" do something like check the Blob URL Store during
> the scheme data portion of the parse algorithm, and then store the blob data
> at that point (toggling on scheme of course)?  This way, each time the
> reference is valid, the invoking API has a reference to the blob data.  I'm
> not sure what we should say about flushing the reference, if anything at all.

That was my recommendation.  Blob URLs should always be reusable--the mechanics of auto-revoke are partially based on the problems caused by the original "one-use" blob URL idea.
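
A small sketch of the reusable model, where every parse that happens before revocation gets its own reference. toyParse and the store are hypothetical and strings stand in for blob data.

```javascript
// Toy Blob URL store; each parse looks the URL up at parse time.
const blobURLStore = new Map([["blob:toy/r", "shared"]]);

function toyParse(url) {
  // Every successful lookup hands the caller a reference to the data.
  return { href: url, blobData: blobURLStore.get(url) };
}

const first = toyParse("blob:toy/r");  // e.g. an image element
const second = toyParse("blob:toy/r"); // e.g. an XHR, same URL reused
blobURLStore.delete("blob:toy/r");     // the URL is revoked
const third = toyParse("blob:toy/r");  // too late, no reference
```

Both early parses keep working after revocation, and only the parse that comes after it is left without data.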

If possible, I wouldn't say anything about releasing/flushing the data reference, and just treat it as an (important) optimization.  Some cases wouldn't want to hold the reference at all, if a URL is being parsed for some reason other than a fetch (new URL(url)).  Ideally individual APIs that invoke fetch wouldn't have to care about this very much, beyond needing to invoke URL parsing synchronously even if the fetch doesn't happen until later.

Are there any cases where this could result in browsers never being able to release the blob data, or having to hold onto it for too long?

Comment 32 Arun 2014-01-16 23:01:35 UTC

(In reply to Glenn Maynard from comment #31)
 
> Are there any cases where this could result in browsers never being able to
> release the blob data, or having to hold onto it for too long?

In the case of a long-living web app, it would be ideal if we can hitch the lifetime of the blob data resource held in the URL parse step to the Revocation List for autorevoking Blob URLs, or allow for a synchronous purge of the resource in the case of a URL.revokeObjectURL call.  I can't exactly think of how to do this now, though, or even if this is feasible.  Maybe another list, which has a process that matches the Blob URL Store and Revocation List?

I think there might be cases where browsers hold on too long (e.g. what if some code mints URL.createObjectURL blob URLs, and creates new URL objects for them?) , so finding a way to expunge that kind of thing would be good.

Comment 33 Glenn Maynard 2014-01-17 00:06:51 UTC

(In reply to Arun from comment #32)
> In the case of a long-living web app, it would be ideal if we can hitch the
> lifetime of the blob data resource held in the URL parse step to the
> Revocation List for autorevoking Blob URLs, or allow for a synchronous purge
> of the resource in the case of a URL.revokeObjectURL call.  I can't exactly
> think of how to do this now, though, or even if this is feasible.  Maybe
> another list, which has a process that matches the Blob URL Store and
> Revocation List?

The revocation list is the list of blob URLs that are revoked synchronously when the script returns to the browser, right?  You wouldn't want to discard the blob data there, since the fetch hasn't happened yet.  I might not be following what you mean, though.

> I think there might be cases where browsers hold on too long (e.g. what if
> some code mints URL.createObjectURL blob URLs, and creates new URL objects
> for them?) , so finding a way to expunge that kind of thing would be good.

I'm a bit out of date: is there a way to initiate a fetch from a URL object (other than converting the URL object back to a string, of course)?  If not, then a URL object created with new URL(blobURL) wouldn't actually need to keep a data reference, since it would never be used.  (If there is then that's tricky, since you wouldn't want to force blob data to stay alive until the URL object is GC'd, given that most of those URL objects created by scripts will probably be just to parse out parts of the URL.)

img.src = blobURL would need to stash the blob data for the lifetime of the <img>, though, at least if the browser might discard the image data (so it can reload it again).  I think those sort of cases can't always be avoided without causing interop problems.

Comment 34 Arun 2014-01-17 18:41:42 UTC

(In reply to Glenn Maynard from comment #33)
> (In reply to Arun from comment #32)
> > In the case of a long-living web app, it would be ideal if we can hitch the
> > lifetime of the blob data resource held in the URL parse step to the
> > Revocation List for autorevoking Blob URLs, or allow for a synchronous purge
> > of the resource in the case of a URL.revokeObjectURL call.  I can't exactly
> > think of how to do this now, though, or even if this is feasible.  Maybe
> > another list, which has a process that matches the Blob URL Store and
> > Revocation List?
> 
> The revocation list is the list of blob URLs that are revoked synchronously
> when the script returns to the browser, right?  


Yep -- the global script cleanup jobs list is processed synchronously AFAICT.

> You wouldn't want to discard
> the blob data there, since the fetch hasn't happened yet.  I might not be
> following what you mean, though.


True!  I was trying to cobble a strawperson for how we could discard, if we do at all.

The Fetch itself should have a dependency on the entries in the Blob URL Store.  That is, if the scheme data of a parsed Blob URL (returned from URL parse) matches an entry in the Blob URL Store, only then return the response headers and response itself (where the response is the blob data; this is essentially Comment 26).

But since a Fetch runs asynchronously to when parse occurs, we're in the situation of having to get a ref to the blob data during URL parse.  I think spec'ing *getting the ref* is straightforward.  I'm still mystified by whether we should spec releasing the ref at all, and if we don't, whether we're setting things up for Bad Things.



> 
> I'm a bit out of date: is there a way to initiate a fetch from a URL object
> (other than converting the URL object back to a string, of course)?  


I don't think so!  But if we introduce getting a ref into URL parse, aren't we setting things up so that whether or not the URL is fetched, a reference to blob data is stored?  Or am I misunderstanding?


> 
> img.src = blobURL would need to stash the blob data for the lifetime of the
> <img>, though, at least if the browser might discard the image data (so it
> can reload it again).  I think those sort of cases can't always be avoided
> without causing interop problems.


This sounds right.

Comment 35 Glenn Maynard 2014-01-17 22:40:36 UTC

(In reply to Arun from comment #34)
> But since a Fetch runs asynchronously to when parse occurs, we're in the
> situation of having to get a ref to the blob data during URL parse.  I think
> spec'ing *getting the ref* is straightforward.  I'm still mystified by
> whether we should spec releasing the ref at all, and if we don't, whether
> we're setting things up for Bad Things.

Doing that in every case would need changes to every spec that uses parse+fetch.  For example, img.src assignment would need to explicitly say "if the old parsed URL was a blob URL, discard the reference".  I don't think that's needed, since the releasing is really just a side-effect of the captured data no longer being required (and since it'd be redundant with that, it could actually introduce bugs).

> I don't think so!  But if we introduce getting a ref into URL parse, aren't
> we setting things up so that whether or not the URL is fetched, a reference
> to blob data is stored?  Or am I misunderstanding?

Browsers can do whatever they want as long as the end result is the same, so if the data reference can never actually be used, they don't have to store it--the difference isn't observable.

The blob-releasing side of things can be thought of as the same as what we already have with Blob itself, where you have to keep the blob's data around while there are any accessible Blobs referencing it.  The spec doesn't talk about things like "discard the blob's data after all blobs are inaccessible and no fetches are active against it", it's just a side-effect of the other requirements of the API.  It's basically just GC.

Comment 36 Arun 2014-01-21 16:46:17 UTC

(In reply to Glenn Maynard from comment #35)

 
> Browsers can do whatever they want as long as the end result is the same, so
> if the data reference can never actually be used, they don't have to store
> it--the difference isn't observable.

In fact Firefox does something like what we discussed, and the goal was to see if we could specify it as part of the platform (short answer: I don't think we can). But I think you're right: as long as the end result is the same, we shouldn't have to nitpick at the details around releasing references.

The next problem is whether both *fetch* checking the Blob URL store *and* parse checking the Blob URL store and taking a reference creates a conflict, or thwarts the intent of parse taking a reference.  If we proceed like Comment 26 suggests, there might be a "valid" reference lingering because of parse, but one that fetch no longer thinks is valid.

Comment 37 Glenn Maynard 2014-01-21 18:53:40 UTC

(By the way, the name "URL.createFor" seems confusing.  It sounds like it's something that returns a URL object, but from what you said it's just another name for createObjectURL with a different autoRevoke default.)

(In reply to Arun from comment #36)
> The next problem is whether both *fetch* checking the Blob URL store *and*
> parse checking the Blob URL store and taking a reference creates a conflict,
> or thwarts the intent of parse taking a reference.  If we proceed like
> Comment 26 suggests, there might be a "valid" reference lingering because of
> parse, but one that fetch no longer thinks is valid.

Fetch definitely shouldn't care about the state of blob URLs.  The only time the blob URL state should matter is once, synchronously, when the actual API call is invoked and the URL is parsed.  If fetch cares about whether the URL was already revoked then we're back where we started.

This pattern should be guaranteed to work:

var xhr = new XMLHttpRequest();
var url = URL.createObjectURL(blob);
xhr.open("GET", url);
URL.revokeObjectURL(url); // before send()
xhr.send();

The URL is (or would be) parsed when open() is called, so that's when XHR takes a reference to the blob.  The URL is long gone when fetch begins (after send is called), but it already has taken a reference to the data, so that's OK.
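
As a rough illustration of why that pattern can work, here is a minimal toy model in plain JavaScript. It does not use the real XHR or blob URL machinery: blobURLStore, pendingFetches, createToyObjectURL, revokeToyObjectURL, ToyRequest and runPendingFetches are all made-up names, and the task queue only stands in for "the fetch happens later". The one point it tries to show is that the store is consulted once, at open(), and never again at fetch time.

```javascript
// Toy model only: none of these names are real browser APIs.
const blobURLStore = new Map(); // blob URL string -> blob data
const pendingFetches = [];      // stands in for work done after the script returns
let nextBlobId = 0;

function createToyObjectURL(data) {
  const url = "blob:toy/" + nextBlobId++;
  blobURLStore.set(url, data);
  return url;
}

function revokeToyObjectURL(url) {
  blobURLStore.delete(url);
}

class ToyRequest {
  open(url) {
    // Parse time: consult the store once, synchronously, and keep the data.
    this.captured = blobURLStore.has(url) ? blobURLStore.get(url) : null;
  }
  send(onDone) {
    // Fetch time: only the captured reference is used, never the store.
    const data = this.captured;
    pendingFetches.push(() => onDone(data));
  }
}

function runPendingFetches() {
  while (pendingFetches.length > 0) pendingFetches.shift()();
}

const toyURL = createToyObjectURL("hello");
const early = new ToyRequest();
early.open(toyURL);          // reference taken here
revokeToyObjectURL(toyURL);  // before send()
const late = new ToyRequest();
late.open(toyURL);           // URL already revoked: nothing to capture

const results = {};
early.send((data) => { results.early = data; });
late.send((data) => { results.late = data; });
runPendingFetches();
// results.early is "hello"; results.late is null, where a real API would be
// expected to fail the request instead.
```

If send() looked the URL up in the store again, the "early" request would fail as well, which is the "back where we started" case.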

Comment 38 Anne 2014-01-28 00:35:59 UTC

Parse URL takes a string and returns a URL. As part of its algorithm, if it determines the input is a blob/mediastream URL, it does a lookup in the blob/mediastream URL table, does a structured clone of the object the URL maps to, and stores that clone on the URL it is about to return.

Doing a structured clone ensures that blob.close() and such do not affect it, and structured clones are cheap. Fetch can just invoke a read operation on that object.
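
A minimal sketch of that parse step, again as a toy model rather than the URL Standard's actual algorithm. ToyBlob, toyStore, toyParseURL and toyFetch are hypothetical names, and snapshot() is only a stand-in for a structured clone (a new object sharing the same underlying bytes, which is why it is cheap).

```javascript
// Toy model only: ToyBlob, toyStore, toyParseURL and toyFetch are hypothetical names.
class ToyBlob {
  constructor(bytes) {
    this.bytes = bytes;
  }
  close() {
    // Models blob.close(): this object can no longer be read.
    this.bytes = null;
  }
  snapshot() {
    // Stand-in for a structured clone: a new object sharing the same bytes,
    // so no data is copied.
    return new ToyBlob(this.bytes);
  }
}

const toyStore = new Map(); // blob URL string -> ToyBlob

function toyParseURL(input) {
  const parsed = { href: input, object: null };
  const entry = input.startsWith("blob:") ? toyStore.get(input) : undefined;
  if (entry !== undefined && entry.bytes !== null) {
    parsed.object = entry.snapshot(); // stored on the returned URL
  }
  return parsed;
}

function toyFetch(parsed) {
  // Reads from the object stored at parse time; the store is not consulted.
  if (parsed.object === null) throw new Error("no blob data captured");
  return parsed.object.bytes;
}

const original = new ToyBlob(Uint8Array.of(1, 2, 3));
toyStore.set("blob:toy/1", original);
const parsedURL = toyParseURL("blob:toy/1");
original.close();               // closing the original does not affect the clone
toyStore.delete("blob:toy/1");  // neither does revoking the URL
const bytes = toyFetch(parsedURL); // the three original bytes
```

Parsing the same string again after the revocation yields a URL with no captured object, so only callers that parsed while the entry existed can still read the data.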

Comment 39 Anne 2014-02-03 17:23:30 UTC

https://github.com/whatwg/url/commit/4ba631a00c768b9ee8a5cc50b6ded6277d498f85

Comment 40 Glenn Maynard 2014-05-22 14:26:50 UTC

Nice work on this.

This isn't finished until other APIs are actually using this, like img.src, but we're probably at a good stopping point for now.  This is specced end-to-end for XHR, so we can see if this gets implemented for that before lobbying other specs to start using it.  Should we open another bug for that and mark it LATER to remind us to come back to it?

Here's a quick test for this:

https://zewt.org/~glenn/test-grabbing-blob-url-references.html

This passes in Firefox, which is surprising...

Comment 41 Anne 2014-05-22 14:37:39 UTC

Well, other specifications not using the URL Standard's URL parser would indeed be problematic and worthy of a bug on them, but not worthy of a bug on the URL Standard.