This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Bug 25081 - Make read operation really async
Summary: Make read operation really async
Status: RESOLVED FIXED
Alias: None
Product: WebAppsWG
Classification: Unclassified
Component: File API
Version: unspecified
Hardware: PC All
Importance: P2 normal
Target Milestone: ---
Assignee: Arun
QA Contact: public-webapps-bugzilla
Depends on: 24576
Blocks: 24338
Reported: 2014-03-17 15:58 UTC by Anne
Modified: 2014-05-28 19:25 UTC
CC List: 3 users

Description Anne 2014-03-17 15:58:43 UTC
It seems that if the blob is closed, read should not return failure synchronously when you expect to use it asynchronously. A task would be better.

Also, what happens if a blob is closed while you're reading it?
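
To make the two behaviors concrete, a minimal sketch (assuming the proposed Blob.close(); the handler wiring is illustrative, not from any spec):

var blob = new Blob(["data"]);
blob.close();

var reader = new FileReader();
reader.onerror = function () {
  // asynchronous failure: what this bug asks for (queue a task)
};
try {
  reader.readAsText(blob); // currently spec'd to throw synchronously on a closed blob
} catch (e) {
  // synchronous failure: callers must handle the same error in two places
}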
Comment 1 Arun 2014-03-17 16:48:58 UTC
Read methods (e.g. FileReader.readAsText) currently throw on attempts to read a closed blob; do you think all of them should fail asynchronously, or do you only care about the generic read method?

Asynchronous failures without promises now seem awkward, but doable.
Comment 2 Anne 2014-03-17 18:01:10 UTC
Okay, if throwing is always the correct thing to do (and compatible) we can do that, but then the check should not be there.

There's a bunch of scenarios.

1. send(closedBlob);
2. send(blob); blob.close();
3. send(blob); setTimeout(() => blob.close(), 0);

It seems you account for 1, but not 2 or 3. I think the check for 1 would actually belong directly in the algorithm itself (potentially a wrapper around the read operation if it happens often) and not in the asynchronous read operation, because otherwise you overload that operation with two distinct purposes (synchronous checking of close and asynchronous reading).
Comment 3 Arun 2014-03-19 20:03:05 UTC
(In reply to Anne from comment #2)
> Okay, if throwing is always the correct thing to do (and compatible) we can
> do that but then the check should not be there.
> 
> There's a bunch of scenarios.
> 
> 1. send(closedBlob);


As you mention, this is accounted for in the current spec: read methods throw if the Blob is closed; the Blob is set to 0 bytes if CLOSED, and subsequent operations on it are mostly spec'd to throw.


> 2. send(blob); blob.close();

Close is spec'd not to affect any read currently underway. That is, unless the blob is already neutered at the time a read operation starts, the read will go through and the bag of bits will be read into the format specified by the read method. Close won't affect a read in progress, but of course it will affect a subsequent read.


> 3. send(blob); setTimeout(() => blob.close(), 0)


I think the above holds true here too.

> 
> It seems you account for 1, but not 2 or 3. I think the way to account for 1
> would actually be directly in the algorithm itself (potentially a wrapper
> around the read operation if it happens often) and not the asynchronous read
> operation because then you overload that operation with two distinct
> purposes (sync checking of close and async reading).


So actually, I just have a "read operation" which takes a synchronous flag, and behaves differently depending on whether the synchronous flag is set or unset. The check for neutered blobs occurs before the check for the synchronous flag. I am checking "directly in the algorithm itself." Can you help me out with an example of what you think would be an improvement?
Comment 4 Anne 2014-03-20 10:16:56 UTC
I don't see how this approach is going to work for FormData, which holds references to a bunch of blobs that it will have to read in sequence; it cannot deal with them being closed, and probably cannot start reading them all at once.

I'm not entirely sure what I would do. I find the current setup rather weird. I would probably only have the close checks at the API level. That way, for FormData, we can do a check for all its associated blobs once it is passed, and if none of them is closed we can proceed with reading them all, regardless of whether they have been closed in the meantime.
Comment 5 Arun 2014-03-20 21:24:24 UTC
(In reply to Anne from comment #4)
But the "current setup" is born out of trying to use the same relationship as XHR:Fetch (that is, FileReader.readAsxx:Read Operation), as discussed!

OK, I can definitely move the CLOSED check to the API level. That way, all APIs are responsible for checking the internal state of Blobs, and the read operation does not check state itself.

-- A*
Comment 6 Anne 2014-03-21 11:18:36 UTC
Yeah that would be good.

I think if we want to define FormData and keep the concept of closed blobs that can be collected, we should have some kind of "keepalive number" on blob. By default it would be 0. Whenever an attempt is made to serialize a FormData object, it would increase the "keepalive number" of each of the blobs it consists of by 1. Meaning that even if the blobs get closed they would still be kept alive, because they will be read from. After reading, the FormData object would be responsible for decreasing the number by 1. It needs to be a number as multiple FormData objects might reference the same blob.

Does that make sense?
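
A rough sketch of that bookkeeping (every underscored name and helper here is hypothetical, purely to illustrate):

function serializeFormData(fd) {
  var blobs = fd._blobs;                           // hypothetical internal list
  blobs.forEach(function (b) { b._keepalive++; }); // pin each blob, even if closed
  readInSequence(blobs, function done() {          // hypothetical async reader
    blobs.forEach(function (b) { b._keepalive--; });
    // an implementation may collect a blob's data once closed && _keepalive === 0
  });
}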
Comment 7 Arun 2014-03-25 20:46:24 UTC
(In reply to Anne from comment #6)
OK, but a few questions:

1. Is this really a specification detail, or should it be an implementation detail? If a specification detail, is there a limit on the keepalive number for Blobs?

2. Other than FormData, which API uses the keepalive property on Blobs? Should this be internal to FormData? That is, FormData is responsible for keeping records of Blobs it needs after they are neutered?
Comment 8 Arun 2014-04-09 21:38:38 UTC
(In reply to Arun from comment #7)
Bug 25302 is to address Comment 6. I've Cc'd you to that bug, since I have questions.
Comment 9 Dave Herman 2014-04-14 22:57:44 UTC
It looks like this feature is trying to do a lot of work to make .close() automatically race-free, but it feels awfully DWIMmy to me. The stuff here and especially in bug 25302 sounds super complex and has me pretty concerned.

ISTM .close() is an inherently racy concept, and adding a lot of machinery to try to defer closing won't eliminate races and will make it harder to predict when the blob is actually closed. The straightforward semantics, it seems to me, is that any outstanding async operation immediately fails if a concurrent .close() happens. Then it's on the program to manage the asynchrony (which is facilitated by features like promises, task.js, and ES7 async/await).

Dave
Comment 10 Jonas Sicking (Not reading bugmail) 2014-04-15 00:24:50 UTC
I don't think we need keepalives and the like. It's easier to simply define that a lot of these APIs clone the Blob that is passed to them (by calling blob.slice()).

So calling `myFormData.append("foo", myblob)` would make the myFormData object call myblob.slice() before returning from .append. When myFormData is submitted it wouldn't matter if myblob has been closed or not.

Same thing when xhr.send(myblob) is called. Before .send() returns it would call myblob.slice() and hold a reference to the returned object.

This should generally avoid racy behavior. Though it does have the downside that it's somewhat harder to make sure that the data backing the blob goes away *right now* since there are more clones floating around.

However it has the upside that if you want to use a blob and free the resources backing the blob as soon as you're done using it, you can do

`useBlob(myblob); myblob.close();`

rather than

`x = useBlob(myblob); x.registerDoneHandler(() => myblob.close());`

I think the latter is more error prone given that it puts more distance between when the blob is used and when it's closed in application code. So it's more likely that bugs, error conditions, or other edge cases will make the application not close the blob.


So, in short, I think we can get rid of the keepalive counter and instead simply use Blob.slice() which gives us the semantics that we want.
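
To make the proposed semantics concrete (Blob.close() is the spec concept under discussion, not a shipped API; assume an open XMLHttpRequest `xhr`):

var blob = new Blob(["payload"]);
var fd = new FormData();
fd.append("foo", blob); // under this proposal, append() calls blob.slice() internally
blob.close();           // neuters blob itself...
xhr.send(fd);           // ...but the internal slice is unaffected, so this still works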
Comment 11 Arun 2014-04-15 03:36:00 UTC
(In reply to Jonas Sicking from comment #10)
Do you explicitly mean for it to be a slice or a structured clone?
Comment 12 Jonas Sicking (Not reading bugmail) 2014-04-15 05:51:38 UTC
The two behave the same way, no? I.e. both x = myblob.slice() and a structured clone of myblob create a semantic copy of the data. In both cases the implementation would likely just grab another reference to the underlying resource, be that an in-memory buffer, a filename, or something else, and in both cases calling myblob.close() doesn't affect the "copy".
Comment 13 Anne 2014-04-15 10:22:48 UTC
Jonas, you would not want FormData to clone on append, but rather when serializing. Otherwise you are not preserving object identity in what is essentially a map, which seems bad.
Comment 14 Jonas Sicking (Not reading bugmail) 2014-04-15 19:25:44 UTC
If you want to keep it map-like and track object identity then I would also expect that something like:

formdata.append("foo", blob);
blob.close();
xhr.send(formdata);

would fail. It would be weird if you could see that the blob instance is in the formdata, and thus is closed there, but when you send the formdata the blob would magically work.

So if that's the behavior we want, then you could simply have the formdata grab a reference to the passed-in blob and let the submission fail. So you'd still not need to worry about keepalive counts, as nothing would be kept alive.
Comment 15 Anne 2014-04-15 22:21:27 UTC
You're a bit late to the party :-) In some other bug we reached that conclusion already. I'm not sure if submission should necessarily fail. Whether cloning a closed blob fails, or whether it just returns a new closed blob from which reading returns the empty byte sequence, is still under discussion.
Comment 16 Arun 2014-04-15 23:00:57 UTC
(In reply to Anne from comment #15)
... or in fact whether reading a closed blob simply returns failure, and not 0 bytes: this is Bug 25242. There's an additional point that a closed Blob should only cause read failure, for either asynchronous or synchronous reads, and that read failure should be the only thing that fails on closed blobs. This means that structured cloning works on a closed blob, contrary to

http://www.whatwg.org/specs/web-apps/current-work/multipage/common-dom-interfaces.html#safe-passing-of-structured-data

If you serialize for FormData, then at the point of read you'll get a failure on the closed Blob.

It's been argued that a check for a closed blob in the read operation disrupts the asynchrony of the read operation (the original point of this bug!!!) but I think that if reading a closed blob fails, the generic read operation should return failure and a failure reason.

That means tweaking http://dev.w3.org/2006/webapi/FileAPI/#readOperationSection a bit to do the right thing with failure reasons. But I think it's mostly OK at this point.
Comment 17 Jonas Sicking (Not reading bugmail) 2014-04-16 17:55:52 UTC
*submission* would definitely fail. Any attempts to read from a closed blob should fail and submitting a blob (either standalone or as part of formdata) definitely reads from the blob.
Comment 18 Arun 2014-04-16 17:59:28 UTC
(In reply to Jonas Sicking from comment #17)
Yes, submission should fail, since it is contingent on reading from a Blob.

The Blob closure model we should specify on the platform is in Bug 25302 Comment 9. If you agree with that model, I think we're in good shape to solve this bug about making the read operation "really" async.
Comment 19 Anne 2014-04-16 19:34:40 UTC
Should it fail? Is that what happens with <input type=file>?
Comment 20 Jonas Sicking (Not reading bugmail) 2014-04-16 19:46:57 UTC
<input type=file> doesn't have the concept of submitting closed files. The user could remove the file from the filesystem before submission happens, but I don't think that's equivalent since it happens out of control of the webpage, whereas blob closing is an explicit action from the author.
Comment 21 Anne 2014-04-16 19:54:38 UTC
Fair, but it's a scenario we have to deal with as well. I'd prefer if all the failure scenarios lined up a bit, but I guess we can handle them separately.

There's also the case of reading a file where the file is removed mid-read. We should really define this as a stream concept in the specification. We may already be streaming part of the blob to the server at such a point.
Comment 22 Jonas Sicking (Not reading bugmail) 2014-04-22 19:01:53 UTC
What we need here is an algorithm which does something like the following. Note that this algorithm would be started synchronously from any API which reads from Blobs.


1. Check the closed flag. If the closed flag is set to true, then queue a task which returns an IO error from this algorithm. This step always reports its result asynchronously.
2. Create a clone of the Blob and call the new clone /clone/.
3. If the synchronous flag is true, then synchronously read the contents of /clone/. If an IO error occurs then return an IO error from this algorithm. If no IO error occurs then return the contents of /clone/. Either way, abort this algorithm.
4. (The synchronous flag is at this point always false.) Queue a task to read the contents of /clone/. If reading the contents of /clone/ results in an IO error, then return an error. Otherwise return the contents of /clone/.

There's obviously a lot of handwaving here. But the important part is that we should always synchronously check the closed flag.

For things like reading from a blob URL there is of course the question about when we invoke the algorithm. I.e. does setting

img.src = myblobURL

synchronously invoke the read algorithm, or does it happen at some later undefined asynchronous point in time. I think this is exactly the same question that we've been struggling with when it comes to revoked URLs.

But all APIs that deal with Blob instances should synchronously invoke the above algorithm, and would thus synchronously check the closed flag.
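
Restated as a non-normative sketch (queueTask, succeed, fail, and readBytes are hypothetical helpers, not spec names):

function readOperation(blob, synchronous) {
  if (blob.closed) {                              // always checked synchronously...
    queueTask(function () { fail("IO error"); }); // ...but reported asynchronously
    return;
  }
  var clone = blob.slice();                       // pin the data against close()
  if (synchronous) {
    return readBytes(clone);                      // may raise an IO error
  }
  queueTask(function () {                         // asynchronous read
    try { succeed(readBytes(clone)); } catch (e) { fail(e); }
  });
}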
Comment 23 Arun 2014-04-22 19:08:31 UTC
(In reply to Jonas Sicking from comment #22)
I think http://dev.w3.org/2006/webapi/FileAPI/#readOperationSection already synchronously checks for the closed flag, but it is certainly lacking the asynchronous error reporting part *and* the cloning part, which Bug 25302 calls for. 

There are other synchronous determinations that a read operation might do, in addition to checking for closed: snapshot state, for example (http://dev.w3.org/2006/webapi/FileAPI/#file).

I think the rule is: as long as there's no IO for a particular algorithm step in the read operation, it's ok to be synchronous. Only IO steps have to be asynchronous.
Comment 24 Anne 2014-04-23 11:59:42 UTC
img.src = bloburl is one of the reasons why you want to separate checking flags and cloning from reading.

When you do img.src = bloburl you get a clone of the blob (through parsing the URL; presumably parsing has to check the closed flag too), but reading from that blob happens during fetch.

This is why I think reading should just describe the reading process and the parts that can fail. You could have "asserts" in the form of notes there, saying it is assumed that the blob being closed has either been checked for or been made impossible (by cloning).
Comment 25 Jonas Sicking (Not reading bugmail) 2014-04-23 18:24:19 UTC
When doing img.src = bloburl there are indeed some things that should happen synchronously and some that should happen asynchronously. I'll avoid using the terms "fetch" and "parse" since those are editorial constructs rather than behavioral constructs.

When img.src = bloburl is called we should synchronously
* Find which Blob object, if any, is backing the url. If the url was revoked
  this step should fail.
* Check the closed-flag of the blob.
* Grab a reference to the underlying data so that if the blob is closed or the
  url is revoked, this no longer affects what happens. (The simplest way to do
  this would be to clone the blob, but there are certainly other ways)

And we should asynchronously
* Read the data and return it.


I don't particularly care which of the above steps live in or are triggered by the "fetch" spec and which live in a "parse" spec.

I had assumed that we wanted img.src = X to trigger the fetch spec synchronously, and then the fetch spec would take care of triggering all of the above steps, both the synchronous and the asynchronous ones.

Anne, if you think that the File spec should have separate algorithms for the synchronous and asynchronous parts, what exactly should be in the asynchronous algorithm? Simply saying "read the data from the blob and return it" makes for an awfully short algorithm.

I also sort of like the idea of the file spec ensuring that everyone reading from the blob triggers only a single algorithm which ensures that all of the above steps are performed. I.e. I wouldn't want someone to write a spec which only triggers the "read data and return it" step.
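
A sketch of that split (blobURLStore, queueTask, readBytes, fireErrorEvent, and decodeInto are hypothetical names):

function setImgSrcToBlobURL(img, url) {
  // Synchronous steps:
  var blob = blobURLStore.lookup(url);     // fails if the URL was revoked
  if (!blob || blob.closed) { fireErrorEvent(img); return; }
  var snapshot = blob.slice();             // grab the underlying data; a later
                                           // close() or revoke no longer matters
  // Asynchronous step:
  queueTask(function () { decodeInto(img, readBytes(snapshot)); });
}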
Comment 26 Anne 2014-04-23 18:30:50 UTC
The reason you want to separate the URL parser and fetch is because of things like <img srcset>. There can be multiple URLs that you want to parse synchronously, but not necessarily all fetch at the same time (or indeed not necessarily at any point in the future).

I would expect reading from a blob to also have other failure conditions that might need to be enumerated. E.g. if it was a blob backed by a file that was moved by the user. That's why I thought it would be a longer algorithm.

As for the closed flag, I suppose I need to add a check for that here http://url.spec.whatwg.org/#concept-url-parser and not set a url's object if the blob is closed?
Comment 27 Arun 2014-04-23 20:54:01 UTC
(In reply to Anne from comment #26)
> The reason you want to separate the URL parser and fetch is because of
> things like <img srcset>. There can be multiple URLs that you want to parse
> synchronously, but not necessarily all fetch at the same time (or indeed not
> necessarily at any point in the future).

The way I would expect this to work is:


1. Parser sets URL's object to a structured clone of the entry in the Blob URL store. 

(Note: if the Blob has been closed with a blob.close(), currently this revokes all entries from the Blob URL store too. It's been suggested that this is not the right behavior. Assuming it is, the Parse Algo may obtain the structured clone only if there is an entry in the Blob URL Store; if there isn't, it will simply return without an object. In other words, if there isn't an entry in the Blob URL store, it could mean the Blob has been closed.)

2. Fetch operates on the URL object from step 1. Fetch can invoke a read operation on the URL's object, which is a structured clone that is open. Fetch's invocation of the read operation still works even if the original object has been closed; the structured clone obtained in step 1 is still open, not closed.

So, I'm not sure we'll take a structured clone for the read operation, contrary to Comment 22; the read operation will operate on a Blob, whether it's a structured clone of another Blob or not.


> 
> I would expect reading from a blob to also have other failure conditions
> that might need to be enumerated. E.g. if if it was a blob backing a file
> that was moved by the user. That's why I thought it would be a longer
> algorithm.


It does. If it was a blob backed by a file that was moved by the user, that's still an error that has to be reported through an error event. So is reading a closed blob. It would be nice to have all of these in one algorithm.

 
> As for the closed flag, I suppose I need to add a check for that here
> http://url.spec.whatwg.org/#concept-url-parser and not set a url's object if
> the blob is closed?


If the Blob is closed, it is expected to flush its entries from the Blob URL store now. If that's the model, then checking for closed is redundant for the URL parser.
Comment 28 Jonas Sicking (Not reading bugmail) 2014-04-23 22:20:24 UTC
One thing that we should make sure of is that a URL object which represents a Blob doesn't hold the Blob data alive. As I understand it that would just be a waste of resources since once you do

img.src = myurlobject;

we roundtrip through a string, and so if the blob has been closed, or the url revoked, then the img still wouldn't load successfully. So if myurlobject kept the Blob data alive it would be for naught.
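
In other words, something like this (illustrative only; assume an img element in scope):

var blob = new Blob(["bytes"]);
var u = new URL(URL.createObjectURL(blob)); // per comment 31: a basic parse, no blob handling
URL.revokeObjectURL(u.href);
img.src = u;  // serialized back to a string; the store lookup fails, so the load
              // errors regardless of whether u had kept the Blob data alive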
Comment 29 Arun 2014-04-23 22:24:11 UTC
(In reply to Jonas Sicking from comment #28)
> One thing that we should make sure of is that a URL object which represents
> a Blob doesn't hold the Blob data alive. As I understand it that would just
> be a waste of resources since once you do
> 
> img.src = myurlobject;
> 
> we roundtrip through a string, and so if the blob has been closed, or the
> url revoked, then the img still wouldn't load successfully. So if
> myurlobject kept the Blob data alive it would be for naught.


But the problem is that if you close a Blob synchronously, or revoke the URL using URL.revokeObjectURL, it's still possible that an asynchronous API has need of it and that it should work. That's why parse stores the object for possible asynchronous fetching. Of course, if it is already closed, then parse doesn't store an object.
Comment 30 Jonas Sicking (Not reading bugmail) 2014-04-23 22:29:09 UTC
new URL(bloburl);

should not do any of the synchronous steps in comment 25. However

img.src = bloburl;

should do all of the synchronous steps in comment 25. And it should do so before returning.

So if we define that the synchronous steps in comment 25 are part of a "parse" algorithm, then |new URL(bloburl)| should not run said "parse" algorithm.

Arguably this would be confusing, and I think this is an argument for making the synchronous steps of comment 25 not be part of "parse".
Comment 31 Anne 2014-04-24 08:06:56 UTC
sicking, that is covered. new URL() does a basic parse, which does not include special blob handling.
Comment 32 Jonas Sicking (Not reading bugmail) 2014-04-24 21:59:06 UTC
So we have three separate algorithms? "basic parse", "parse" and "fetch"?

Is "parse" and "fetch" ever done separately from each other? I.e. are there any instances where a spec wants to do "parse" but then not followed by "fetch"?
Comment 33 Anne 2014-04-25 09:08:16 UTC
Yes and yes, as I said before, <img srcset> is such a thing.
Comment 35 Anne 2014-05-16 13:03:56 UTC
1) I think it would be clearer if errors were more closely defined, as they can only occur here.

2) Errors can occur during the read process as well, so catching them upfront does not help.

3) For the asynchronous algorithm you do not want to return a value, but always queue something, I think.
Comment 36 Arun 2014-05-28 19:25:58 UTC
(In reply to Anne from comment #35)
Done.

http://dev.w3.org/2006/webapi/FileAPI/#reading-data-section

Marking fixed.