Bug 17262 - send function should have async interface
Summary: send function should have async interface
Status: RESOLVED WORKSFORME
Alias: None
Product: WebAppsWG
Classification: Unclassified
Component: WebSocket API (editor: Ian Hickson)
Version: unspecified
Hardware: PC All
Importance: P2 major
Target Milestone: ---
Assignee: Ian 'Hixie' Hickson
QA Contact: public-webapps-bugzilla
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-05-31 05:50 UTC by Takashi Toyoshima
Modified: 2012-07-10 21:35 UTC
CC: 8 users

Description Takashi Toyoshima 2012-05-31 05:50:45 UTC
The current spec provides a synchronous send function to transmit a Blob object.
However, a blob involves I/O operations and must be handled asynchronously.

Our current choices are:
 1. Block the JavaScript thread until the requested blob has been read into the internal buffer and is safely reusable.
 2. Queue the requested blob internally as a reference, then return from the send operation immediately.

Choice 1 is seriously problematic for two reasons. One is that it blocks the JavaScript thread.
The other is that JavaScript cannot send a blob larger than the internal buffer size, because the
whole blob must be copied into the internal buffer at once.

Choice 2 has another critical problem. JavaScript has no way to know when the queued blob becomes
reusable, which means JavaScript can never modify the blob after passing it to the WebSocket send
operation. This choice also makes bufferedAmount meaningless, because the limit imposed by the
internal buffer no longer depends on the buffer size.

Either way, JavaScript has no way to safely queue an object into the internal send buffer, whether it
wants to send a blob or anything else. Too many concurrent requests will result in _fail_, and the limit is opaque to JavaScript.

My proposed change is as follows.

<Plan A: Change a series of send interfaces>
void send(DOMString data, optional Function callback);
void send(ArrayBufferView data, optional Function callback);
void send(Blob data, optional Function callback);

<Plan B: Introduce another event handler>
[TreatNonCallableAsNull] attribute Function? onsendend;

SendEndEvent : Event {
  // TBD
  readonly attribute boolean wasClean;
  readonly attribute unsigned long long size;
  readonly attribute object object;
};

In both of my plans, the behavior when no callback (event handler) is given is compatible with the current one.
If a callback is set, another send request before the callback is invoked results in _fail_.
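To make the proposed semantics concrete, here is a minimal sketch of the Plan A fail-on-busy behavior. This is a hypothetical model, not part of any spec: `createChannel` and `flush` are invented names, with `flush` standing in for the internal network write.

```javascript
// Hypothetical model of Plan A: send(data, callback) queues one message and
// fails if called again before the completion callback has been invoked.
function createChannel() {
  let pending = null; // the in-flight send, if any
  return {
    send(data, callback) {
      if (pending !== null) {
        throw new Error("fail: previous send has not completed");
      }
      pending = { data, callback };
    },
    // Stand-in for the internal buffer draining to the network.
    flush() {
      const { data, callback } = pending;
      pending = null;
      if (callback) callback(data.length); // report bytes handed off
    },
  };
}
```

With no callback passed, the model behaves like today's fire-and-forget send(), which is the compatibility the proposal intends.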
Comment 1 Anne 2012-05-31 08:07:16 UTC
This would also be a problem for XMLHttpRequest. We should probably have a single solution to address both.
Comment 2 Takashi Toyoshima 2012-05-31 08:30:52 UTC
As far as I know, XHR allows only one send() call, and its send operation was asynchronous from the start.
The caller can track the sending status through event handlers like onprogress and onloadend.

On the other hand, WebSocket allows repeated calls without any completion notification.
I think this is a WebSocket-specific issue.

I'm not super familiar with XHR, so please correct me if my understanding is wrong.
Comment 3 Anne 2012-05-31 08:33:38 UTC
That makes sense, though manipulating Blob after it has been passed to send() is a problem either way I think.
Comment 4 Jonas Sicking 2012-05-31 08:59:40 UTC
Blobs are read-only and can't be changed, so I don't understand why this is a problem. What we do in Gecko for both these APIs is to store a reference to the Blob object internally and then read lazily from it as needed. All I/O happens on the background network thread.
Comment 5 Takashi Toyoshima 2012-05-31 09:35:56 UTC
> Blobs are read-only
I missed this point. Thank you for letting me know.

Actually, Blob is a read-only interface, but it might be a reference to an actual file under the FileSystem API.
File inherits from Blob, so we can send a File object through these APIs.
Of course, the file may expose another interface, FileWriter, to modify the actual file contents.
We cannot avoid this problem even though Blob is a read-only interface.
Comment 6 Glenn Maynard 2012-05-31 14:04:04 UTC
(In reply to comment #4)
> Blob's are read-only and can't be changed

With the exception of neutering (eg. structured clone transfer and, if it's added, Blob.close).  All that's needed to deal with this exception is for async APIs to synchronously take a reference to the underlying data of a Blob (or File) before returning to the caller, not a reference to the Blob itself.  For example, XHR.send() says "Let the request entity body be the raw data represented by data."

(I'm not sure that this concept is well-defined enough.  Here, XHR wants to take a conceptual reference to the underlying data, not actually read it from disk if it's a File.)

(In reply to comment #0)
>  2. Queue the requested blob as a reference internally, then return from send
> operation immediately.
> 
> Choice 2 has another critical problem. JavaScript has no chance to know when
> the queued blob can be
> reusable. It means that JavaScript never modify the blob after passing to
> WebSocket send operation.

JavaScript can never modify blobs anyway, by design.

This seems like the right thing to do, with the caveat that when the actual Blob read happens, it might fail.  I don't know the WebSocket API and usage enough to know if simply reporting an error is good enough, and what should happen to any data queued afterwards.

> Also, this choice makes bufferedAmount meaningless because limitation by
> internal buffer never depends buffer size.

I'm not sure what you mean.  After calling send(File), bufferedAmount is increased by the file's length, which is always known in advance without any disk I/O.  (Some browsers don't implement this correctly yet, but that's a bug.)  As the file is streamed from disk over the network, bufferedAmount is decreased.  The buffer size is usually not actually that big, but it seems like reasonable behavior.

(In reply to comment #5)
> File inherits Blob, so we can send File object via these APIs.
> Of course, the file may provide another FileWriter interface to modify the
> actual file contents.

Actually, File is read-only, too.  More accurately, it's immutable (again with the exception of neutering).

If the underlying File changes (whether by another application or FileWriter), reads to earlier File instances will fail, not return the changed data.  A File represents a snapshot of a file on disk at the time the File was created, and if that data is no longer available because the file was modified, the read fails.
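The bufferedAmount accounting described above can be sketched as a toy model (an assumption-level sketch; real browsers do this bookkeeping internally, and the method names here are invented):

```javascript
// Sketch of bufferedAmount bookkeeping; sizes are in bytes.
function createSocketModel() {
  let bufferedAmount = 0;
  return {
    get bufferedAmount() { return bufferedAmount; },
    send(size) { bufferedAmount += size; },          // enqueue: size known without disk I/O
    transmitted(bytes) { bufferedAmount -= bytes; }, // bytes written to the network
  };
}
```

The key property is that send(File) can raise bufferedAmount by the file's length immediately, since a File's size is known in advance, and the counter drains as data is streamed out.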
Comment 7 Jonas Sicking 2012-06-01 07:51:02 UTC
Blobs are very intentionally read-only. It's part of their design. We can rely on the fact that no interfaces will be introduced in the future to make them mutable.

For something like a FileSystem API, if the "underlying" data changes, the FileSystem API can deal with this in many ways:

* Neuter the blob such that any reads from it will fail
* Use a copy-on-write scheme such that any already existing Blob objects maintain
  their values (this would likely be prohibitively expensive though).
* Block all writes until any already started reads finish
* Block all writes until all Blob references to the data go away.

This is up to the FileSystem API to define.

For what it's worth, I believe the FileSystem spec proposed by Google uses the first bullet above. The FileHandle API proposed by Mozilla uses the third.
Comment 8 Takashi Toyoshima 2012-06-01 07:53:26 UTC
(In reply to comment #6)
> > Also, this choice makes bufferedAmount meaningless because limitation by
> > internal buffer never depends buffer size.
> 
> I'm not sure what you mean.  After calling send(File), bufferedAmount is
> increased by the file's length, which is always known in advance without any
> disk I/O.  (Some browsers don't implement this correctly yet, but that's a
> bug.)  As the file is streamed from disk over the network, bufferedAmount is
> decreased.  The buffer size is usually not actually that big, but it seems like
> reasonable behavior.

I think bufferedAmount is used as a hint to control sending speed at the JavaScript application level.
Some applications will use this hint like:
"OK, bufferedAmount is now less than 512 kB. This browser can safely queue 1 MB of data, so I can safely send 256 kB of text now."
So what happens when bufferedAmount counts actual file sizes while the browser accepts many Blob objects as references?
"Wow, bufferedAmount is now over 1 GB. When can I safely send the next text?"
This is the point where I said bufferedAmount becomes meaningless.

> Actually, File is read-only, too.  More accurately, it's immutable (again with
> the exception of neutering).
> 
> If the underlying File changes (whether by another application or FileWriter),
> reads to earlier File instances will fail, not return the changed data.  A File
> represents a snapshot of a file on disk at the time the File was created, and
> if that data is no longer available because the file was modified, the read
> fails.

Thank you for the clarification.

So the last point looks important.
If a Blob is stored in the WebSocket send queue, the internal read and send operations will fail when the file is modified. If we want to avoid these unexpected read failures, we must wait for completion before updating the file. Thus, we need a completion callback or a completion event handler.
Comment 9 Takashi Toyoshima 2012-06-01 07:56:09 UTC
> For what it's worth I believe the FileSystem spec proposed by google uses the
> first bullet above. The FileHandle API proposed by mozilla uses the third.

Thank you for the good information. I agree that the third idea works fine for everything.
Comment 10 Jonas Sicking 2012-06-01 16:21:37 UTC
The point of the bufferedAmount property is to be able to send low-latency data. For example, in a game you want to send the game character's current position. bufferedAmount allows you to do this without filling up the send buffer so much that the buffered data causes latency.

I'm marking this bug WORKSFORME as asynchronous reading from the blob is already possible.
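The low-latency pattern Jonas describes can be sketched like this (assumptions: `socket` is an open WebSocket, and the threshold value is an arbitrary tuning choice):

```javascript
// Send a low-latency update only when the send buffer has drained below a
// threshold; otherwise drop it, since a fresher update will follow soon.
const THRESHOLD = 1024; // bytes; tune per application
function sendPosition(socket, x, y) {
  if (socket.bufferedAmount < THRESHOLD) {
    socket.send(JSON.stringify({ x, y }));
    return true;
  }
  return false; // skipped: queued data would add latency
}
```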
Comment 11 Glenn Maynard 2012-06-01 22:55:45 UTC
(In reply to comment #9)
> Thank you for good information. I agreed that the third idea works fine for
> everything.

This is a separate discussion, but actually it doesn't work at all ("Block all writes until any already started reads finish").  The browser can't prevent external applications from modifying files (at least short of using mandatory locks, which are bad).
Comment 12 Takashi Toyoshima 2012-06-04 07:45:18 UTC
I reopened this issue because I think it doesn't work for Mozilla, either.
Comment 13 Jonas Sicking 2012-06-04 07:58:54 UTC
What doesn't work for Mozilla?

The fact that files can be modified by external sources is indeed a problem. However it's a problem that can't be solved with more APIs such as the one suggested by this bug. There is nothing preventing the user from modifying the file even if we have an "async send" method. Having an "async send" method wouldn't affect what behavior the UA has in that case at all.

Am I missing something?
Comment 14 Takashi Toyoshima 2012-06-04 08:44:34 UTC
(In reply to comment #10)
> The point of the bufferedAmount property is to be able to send low-latency
> data. For example in a game you want to send the game characters current
> position. bufferedAmount allows you to do this without filling up the send
> buffer so much that buffered data causes latency.

I don't think so. All we can estimate from bufferedAmount is the ratio between the data sending rate and the actual network throughput.
Usually we should allocate a separate WebSocket channel for low-latency communication, and the application may want to attach an event timestamp. Even if bufferedAmount were useful for knowing the latency, we would have no way to avoid that latency.

Anyway, we have no way to know whether the next send operation will cause an internal buffer overflow. The Firefox implementation might be the better one, but it never resolves the underlying issue of how JavaScript should handle the internal send buffer's capacity.

My original proposal referred to Blob, but it is not only about handling Blob. I think an asynchronous sending API without any polling or completion callback cannot work. And of course we never want to have select() in JavaScript.
Comment 15 Takashi Toyoshima 2012-06-04 09:09:43 UTC
> This is a separate discussion, but actually it doesn't work at all ("Block all
> writes until any already started reads finish").  The browser can't prevent
> external applications from modifying files (at least short of using mandatory
> locks, which are bad).

(Note: I'm not super familiar with the FileSystem API.)
In the case of the FileSystem API, the storage is sandboxed and separated from native file systems. We can assume it is not modified by external applications. This is no different from an OS assuming its disks are not modified behind its back by a VMM.
Comment 16 Takashi Toyoshima 2012-06-04 09:42:40 UTC
(In reply to comment #7)
> * Neuter the blob such that any reads from it will fail
> * Use a copy-on-write scheme such that any already existing Blob objects
> maintain
>   their values (this would likely be prohibitively expensive though).
> * Block all writes until any already started reads finish
> * Block all writes until all Blob references to the data go away.

Now I feel the third one is the complete resolution.
The reasons follow.

How can I use a Blob object that might suddenly become invalid?
Read failures inside the WebSocket implementation lead to the connection being closed.
That is not acceptable behavior, so the first idea looks bad to me.

The fourth idea is also bad, because the implicit dependency easily causes deadlock. An application may hold a read reference to an object while it waits for a write completion on the same object. Of course, this could be forbidden by the API definition.

The third idea might be good for WebSocket, because a Blob queued into the WebSocket internal buffer will be released sooner or later, so it never blocks write requests forever. I'm not sure whether 'any already started reads' includes 'any Blobs queued into a WebSocket', but it must, because read failures lead to the connection being closed, and writers otherwise have no way to know when the read operations by WebSockets have finished.

Anyway, this discussion should happen in another thread with the people responsible for the FileSystem API and the FileHandle API.

My main interest is how to realize back pressure at the JavaScript API level.
Comment 17 Glenn Maynard 2012-06-04 13:42:48 UTC
(In reply to comment #15)
> > This is a separate discussion, but actually it doesn't work at all ("Block all
> > writes until any already started reads finish").  The browser can't prevent
> > external applications from modifying files (at least short of using mandatory
> > locks, which are bad).
> 
> (notes; I'm not super familiar with FileSystem API)
> In the case of FileSystem API, it's sandbox-ed and separated from native file
> systems. We can assume it is not modified by external application. There is no
> difference from that OS assumes disks are not modified by VMM.

It's File API you want, not FileSystem API.  You can easily get a File for a native user file, eg. using <input type=file>.

(FS-API is currently sandboxed, but there will likely be a native interface for it eventually; regardless, it's only one possible way of getting File objects.)

(In reply to comment #16)
> (In reply to comment #7)
> > * Neuter the blob such that any reads from it will fail
> > * Use a copy-on-write scheme such that any already existing Blob objects
> > maintain
> >   their values (this would likely be prohibitively expensive though).
> > * Block all writes until any already started reads finish
> > * Block all writes until all Blob references to the data go away.
> 
> Now, I feel the third one is the complete resolution.

You can't lock external files that can be modified by other non-browser apps for the lifetime of the File object.

> The reason follows.
> 
> How can I use a Blob object which might become invalid suddenly?

By catching errors in the API you're using and handling them.  It'll always be possible for reading a File object to fail; eg. the user might delete the file, eject the DVD it's on, or an NFS error might occur.

The only way to prevent this is to copy the entire File out of native storage when creating the File object, which is obviously not an option.
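Glenn's point, in code: reads from a Blob or File can always fail, so callers must handle the error path rather than assume success. A sketch using the promise-returning Blob.arrayBuffer() method (which File inherits; the function name is invented for illustration):

```javascript
// Reads can fail at any time (file deleted, media ejected, NFS error), so
// surface the error to the caller instead of letting it tear down anything.
async function readBlobSafely(blob) {
  try {
    const buf = await blob.arrayBuffer();
    return { ok: true, bytes: buf.byteLength };
  } catch (err) {
    return { ok: false, error: String(err) };
  }
}
```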
Comment 18 Takashi Toyoshima 2012-06-04 14:10:25 UTC
> > Now, I feel the third one is the complete resolution.
> 
> You can't lock external files that can be modified by other non-browser apps
> for the lifetime of the File object.

I'm sorry; I meant that the *SECOND* one is the best way.
If native files can be used as File objects, we should take an OS-level lock while getting the Blob reference.
Comment 19 Glenn Maynard 2012-06-04 14:20:56 UTC
(In reply to comment #18)
> > > Now, I feel the third one is the complete resolution.
> > 
> > You can't lock external files that can be modified by other non-browser apps
> > for the lifetime of the File object.
> 
> I'm sorry. I meant the *SECOND* one is the best way.
> If native files can be usable as a File object, we should take a lock at OS
> level while getting its Blob reference.

No way.  If I drag a file into a web page, it's completely unacceptable for the browser to hold a hard write lock on a user's file until the page decides to let go of the reference.

(Worse, it'd have to wait for the reference to be GC'd, which might be much longer.  Firefox used to have problems with this when uploading files: after uploading a file with a form, the file stayed locked for a while.  This gave me daily headaches--I couldn't delete files after uploading them.)

Scripts must deal with errors anyway, as there will always be error cases for File.  This is just an error.
Comment 20 Takashi Toyoshima 2012-06-05 07:26:51 UTC
Note: This discussion looks out of scope to this thread.

(In reply to comment #19)
I believe all applications, including browsers, are entitled to use locks to guarantee their internal operations.

Should users be able to delete a file while Explorer is copying it to another place? I don't think so. An application may gracefully provide a "Cancel" button, but that's the best case.

Uploading a file looks like a similar case. I agree that the lazy release caused by GC is problematic, but that's a separate issue.

> Scripts must deal with errors anyway, as there will always be error cases for
> File.  This is just an error.

In our case, scripts have no chance to catch read errors, because the send operation is just queued and the actual operations are processed by the internal implementation. All we can do here is send a close frame with status code 1011 and close the connection. I think we never want to close the connection just to learn that a file is ready for write operations, nor merely to learn the send buffer's capacity.

The only way to send data safely is to wait for bufferedAmount to reach zero before calling send(). Of course, this increases application-level end-to-end communication latency and decreases total message throughput.
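The workaround described above, as a sketch (assumptions: `socket` is an open WebSocket; the function name and the 50 ms polling interval are arbitrary choices):

```javascript
// Poll until the send buffer fully drains, then send. This is safe, but as
// noted above it adds latency and caps throughput.
function sendWhenDrained(socket, data) {
  return new Promise(resolve => {
    const timer = setInterval(() => {
      if (socket.bufferedAmount === 0) {
        clearInterval(timer);
        socket.send(data);
        resolve();
      }
    }, 50);
  });
}
```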
Comment 21 Glenn Maynard 2012-06-05 16:06:46 UTC
On the original point of this bug: It sounds like your basic issue was that you read send(Blob) as though it can be synchronous.  Maybe it should be more explicit that it's always asynchronous.

(In reply to comment #20)
> In our case, scripts have no chance to catch read errors. Because sending
> operation is just queued and actual operations are processed by internal
> implementation. All we can do here is to send close frame with status code 1011
> and to close the connection. 
> I think we never want to close the connection just to know the file is ready for write operations. Also, we never want to close the connection only to know the sending buffer capability.

No matter what you do, there are going to be error paths in file I/O.  Trying to eliminate one of them won't make all of the others go away.  If you want to know when WS is finished with a Blob, address that directly.

For example, add a "blob" attribute to onerror indicating which blob (if any) the error is associated with, and add an onprogress event to WS with a similar attribute.  If you have use cases for something like this, it should probably be opened as a separate bug.

(The rest is a bit off-topic, explaining why locking won't work; reordered to the bottom.)

> I believe all application including a browser are eligible to use lock in order
> to assure their internal operations.

First, this is unimplementable.  Mandatory locking is not allowed in Linux by default; you have to enable it with a special mount option, "mand", which is normally disabled.  You can't prevent external apps from writing to or deleting files.

But there's a more basic issue: what you're suggesting is to lock the file so long as you have a reference to a File (not only while reading from it).  That's the only way to ensure that the file isn't deleted or modified between the file being opened and the script calling send(blob).

It's OK to lock a file *while* you're actively accessing it, but that's not what a File is.  A File is analogous to a pathname in a native application; it can be kept around indefinitely, long before and after actual file access.  A File can even be stored long-term via IndexedDB and the History API (to restore open files across session reloads), though I think nobody implements those features yet.

It's only actually being accessed while an API (FileReader, FileWriter, WebSocket) is active on it--that's the only time locking makes sense.
Comment 22 Ian 'Hixie' Hickson 2012-07-10 21:35:40 UTC
I don't understand what this bug has to do with the WebSockets API. The send() method in WebSockets is already implementable in a completely async manner. It's up to the browser to get a copy-on-write reference to the underlying data, but that's an implementation detail.