28798 – Make BufferSource more convenient to use

This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019.

Status:	NEW

Alias:	None

Product:	WebAppsWG
Classification:	Unclassified
Component:	WebIDL
Version:	unspecified
Hardware:	PC All

Importance:	P2 normal
Target Milestone:	---
Assignee:	Cameron McCormack
QA Contact:	public-webapps-bugzilla

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:	24072 27030

Reported:	2015-06-11 17:57 UTC by Anne
Modified:	2017-03-06 07:07 UTC
CC List:	8 users

See Also:

Description Anne 2015-06-11 17:57:20 UTC

http://heycam.github.io/webidl/#idl-buffer-source-types

[[
 get a reference to the bytes held by the buffer source or get a copy of the bytes held by the buffer source
]]

These algorithms can throw.

It seems for ArrayBuffer the binding layer would have thrown, but this is not the case for e.g. Uint8Array. We might want to make that consistent though and let the binding layer handle all the exceptional cases. I doubt any specification writer is taking this into account.

(I would also somewhat prefer it if the "pointer vs copy" decision was a syntax-decision too. That way it is much easier to detect whether a specification is considering this and whether it is doing the right thing.)

Comment 1 Boris Zbarsky 2015-06-11 18:08:39 UTC

Handling the exceptional cases in the binding layer is not trivial.  Consider this function signature:

  void foo(Uint8Array array, long something)

called like so:

  var arr = new Uint8Array(5);
  var obj = { valueOf: function() { /* detach arr here */ } };
  foo(arr, obj);

You have to check for detached arr after all things that might have side-effects have happened, basically.  In practice, the right time is when you actually want to use the data.
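
For illustration, the hazard described in this comment can be sketched in plain JavaScript. This is only a simulation: `foo` here is an ordinary function standing in for a binding, and the helpers `detach` and `isDetached` are hypothetical names, not part of WebIDL or any engine. It assumes a runtime that offers either `ArrayBuffer.prototype.transfer` or `structuredClone` with a transfer list.

```javascript
// Hypothetical helper: detach a buffer using whichever mechanism the runtime has.
function detach(buffer) {
  if (typeof buffer.transfer === "function") {
    buffer.transfer(); // moves the bytes to a new buffer, detaching this one
  } else {
    structuredClone(buffer, { transfer: [buffer] });
  }
}

// Hypothetical helper: slice() throws a TypeError on a detached buffer.
function isDetached(buffer) {
  if ("detached" in buffer) return buffer.detached;
  try {
    buffer.slice(0, 0);
    return false;
  } catch (e) {
    return true;
  }
}

// Simulates `void foo(Uint8Array array, long something)` with the
// arguments converted left to right, as a binding layer would.
function foo(array, something) {
  const log = [];
  // Argument 1: an eager detachment check passes here.
  log.push("after arg 1: detached=" + isDetached(array.buffer));
  // Argument 2: converting to a number runs script through valueOf.
  const n = Number(something) | 0;
  // By the time the algorithm would use the data, the check no longer holds.
  log.push("after arg 2: detached=" + isDetached(array.buffer));
  return log;
}

const arr = new Uint8Array(5);
const obj = { valueOf() { detach(arr.buffer); return 0; } };
const result = foo(arr, obj);
```

The check made while converting the first argument says nothing about the state of the buffer once the second argument has been converted, which is why the comment places the check at the point of use.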

Comment 2 Anne 2015-06-12 07:33:34 UTC

Also in practice, most specifications today copy the bytes. That does mean the binding layer could be in charge. 

We could then require some special syntax for the more dangerous pointer approach.

Comment 3 Tab Atkins Jr. 2015-06-16 21:51:07 UTC

Strongly agree with something syntactic to indicate the reference case.  I don't want to handle strewing "is detached" checks throughout my algorithms when the binding layer can just copy the bytes for me; I also want it to be clear when this *is* a possibility, so I don't forget it.

Comment 4 Domenic Denicola 2016-01-22 16:51:49 UTC

One way to move this forward would be for someone to open a pull request against HTML that updates it to use a syntactic [CopyBytes] or [ReferenceBytes] on every buffer source argument. We can then critique concrete examples of how this makes things clearer, or how it changes behavior (for example by moving checks earlier than they currently are). That would help us determine whether the annotations are a good idea.

Comment 5 Anne 2016-01-28 01:02:55 UTC

The only thing in HTML that takes a BufferSource-like argument is WebSocket, and we'd copy there. (The other place they come up is structured cloning, which requires custom logic.)

sendBeacon(), fetch(), and XMLHttpRequest would copy.

TextDecoder would copy.

Web Audio API has custom logic (it detaches the ArrayBuffer).

Haven't looked further, but it seems like the pattern is copy or custom logic. Having something for copy in IDL would be great. In particular by letting the input be a BufferSource, and the output an exception or a byte sequence.
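
A minimal sketch of the "BufferSource in, exception or byte sequence out" shape, written as ordinary JavaScript rather than IDL. The function name `copyBytes` is an assumption made for this example; nothing in the thread or in WebIDL defines it. It relies on `ArrayBuffer.prototype.slice` throwing a TypeError when the buffer is detached.

```javascript
// Hypothetical helper: returns a fresh Uint8Array holding a copy of the bytes
// that `source` (an ArrayBuffer or a view onto one) refers to, or throws.
function copyBytes(source) {
  const isView = ArrayBuffer.isView(source);
  if (!isView && !(source instanceof ArrayBuffer)) {
    throw new TypeError("not a BufferSource");
  }
  const buffer = isView ? source.buffer : source;
  const offset = isView ? source.byteOffset : 0;
  const length = source.byteLength;
  // slice() copies the requested range and throws a TypeError if detached.
  return new Uint8Array(buffer.slice(offset, offset + length));
}

// A view covering only the middle two bytes of a four-byte buffer.
const middle = new Uint8Array([1, 2, 3, 4]).subarray(1, 3);
const bytes = copyBytes(middle);
middle[0] = 99; // later writes by the caller do not reach the copy
```

Because the result is a copy, the consuming algorithm never needs its own detachment checks; the only failure mode is the exception raised during the copy itself.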

Comment 6 Joshua Bell 2016-01-28 17:46:14 UTC

Documenting for posterity: IndexedDB has custom logic and explicitly copies via prose (and accounts for throwing)
https://w3c.github.io/IndexedDB/#steps-to-convert-a-value-to-a-key

Comment 7 Tab Atkins Jr. 2016-02-09 20:40:21 UTC

Similarly documenting: Font Loading's FontFace() constructor takes binary data, and would copy.

These examples, and my own intuition, suggest that most functions with binary data arguments are *consuming* those arguments, not sharing them, and so we should default to copying (which is the safer default anyway) and require an annotation to indicate that a reference is being held.

Comment 8 Domenic Denicola 2016-02-09 22:50:24 UTC

(In reply to Tab Atkins Jr. from comment #7)
> Similarly documenting: Font Loading's FontFace() constructor takes binary
> data, and would copy.
> 
> These examples, and my own intuition, suggest that most functions with
> binary data arguments are *consuming* those arguments, not sharing them, and
> so we should default to copying (which is the safer default anyway) and
> require an annotation to indicate that a reference is being held.

The way you phrase this is interesting. I'd think that if something is *consuming* its argument, it should instead transfer it (thus detaching the argument). Then, it has been consumed.

I guess that's probably not what you meant, but regardless it seems unfortunate that there's no easy way for FontFace() users to avoid the copy, even if they never intend to use that binary data again.

Comment 9 Tab Atkins Jr. 2016-02-10 04:09:32 UTC

Ah yeah, I wasn't thinking of the third option - actually stealing ownership of the buffer.  I wasn't intending that - it's unspecified right now, but FontFace *should* copy.  IndexedDB, and all the messaging APIs, are doing the same.

Copying is just the safest of all the options.  We *could* design things such that authors could avoid the copy, by setting some flag or similar on a buffer that indicates it's a "copy-once" buffer, and loses control of its data when any API does a copy.  I'm not comfortable doing that by default, tho.

Comment 10 Anne 2016-02-11 15:17:15 UTC

Here's a strawman:

1. BufferSource (and what it expands to) copies by default. This is safe and the common case. It gives the specification algorithm access to a byte sequence. IDL handles all exceptional cases.
2. [Detach] BufferSource detaches the buffer source and gives the specification algorithm access to a byte sequence. This is useful for Web Audio, maybe elsewhere. IDL handles all exceptional cases.
3. [UnsafeReference] BufferSource gives the specification algorithm access to the JavaScript object. Handle with care. postMessage() and friends will need this. IDL only performs its normal checks.
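
The three options in the strawman can be sketched as three conversion functions. This is an illustration only: `convertCopy`, `convertDetach`, `convertUnsafeReference`, and `toParts` are hypothetical names, and the real behavior would live in the binding layer, not in script. Detaching assumes a runtime with either `ArrayBuffer.prototype.transfer` or `structuredClone` with a transfer list.

```javascript
// Hypothetical helper: normalize an ArrayBuffer or view to buffer/offset/length.
function toParts(source) {
  const isView = ArrayBuffer.isView(source);
  return {
    buffer: isView ? source.buffer : source,
    offset: isView ? source.byteOffset : 0,
    length: source.byteLength,
  };
}

// 1. Default: the algorithm receives an independent copy of the bytes.
function convertCopy(source) {
  const { buffer, offset, length } = toParts(source);
  return new Uint8Array(buffer.slice(offset, offset + length)); // throws if detached
}

// 2. [Detach]: the bytes move to the algorithm and the caller's buffer is detached.
function convertDetach(source) {
  const { buffer, offset, length } = toParts(source);
  const moved = typeof buffer.transfer === "function"
    ? buffer.transfer()
    : structuredClone(buffer, { transfer: [buffer] });
  return new Uint8Array(moved, offset, length);
}

// 3. [UnsafeReference]: the algorithm receives the caller's object itself.
function convertUnsafeReference(source) {
  return source;
}

const original = new Uint8Array([10, 20, 30]);
const copied = convertCopy(original);
const shared = convertUnsafeReference(original);
original[0] = 11;                       // visible through `shared`, not `copied`
const taken = convertDetach(original);  // `original` is unusable afterwards
```

After the last call, `copied` still holds the bytes as they were when it was made, `taken` holds the bytes as they were at detach time, and `shared` is the now-detached caller object, which is the case the strawman labels "handle with care".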

Comment 11 Domenic Denicola 2016-02-11 15:28:02 UTC

That seems reasonable to me.

See also https://www.w3.org/Bugs/Public/show_bug.cgi?id=29388#c2; we may want to think about shared array buffers at the same time. I guess [AllowShared] BufferSource would implicitly not copy? Or maybe we would require [AllowShared] [UnsafeReference] BufferSource.

Comment 12 Boris Zbarsky 2016-02-11 15:36:27 UTC

> 1. BufferSource (and what it expands to) copies by default.

At what point does the copy occur?  At IDL argument conversion time for the BufferSource argument, or after all argument conversions are done but before entry into the algorithm prose?  Does this only apply to BufferSource, or to typed array and ArrayBuffer arguments in general?

I assume WebGL will end up using [UnsafeReference] for performance reasons, unless we make the copy happen at entry-into-prose time; then it could probably use the copying behavior.

Comment 13 Anne 2016-02-11 15:38:00 UTC

Judging from the comments on that bug, it sounds like for shared buffers/views we want an explicit opt-in and to use the reference model (coupled with prose).

That suggests something like [UnsafeReferenceAndShared] BufferSource. Or indeed two attributes, but that seems a little odd since you could never use [AllowShared] on its own.

Comment 14 Domenic Denicola 2016-02-11 15:39:18 UTC

> Does this only apply to BufferSource, or to typed array and ArrayBuffer arguments in general?

I would hope for typed array and ArrayBuffer in general. I'd guess that's what Anne means by "BufferSource (and what it expands to)"

> I assume WebGL will end up using [UnsafeReference] for performance reasons, unless we make the copy happen at entry-into-prose time; then it could probably use the copying behavior.

Can you explain why this would be? It's not clear to me why entry-into-prose time would be more performant and acceptable to WebGL than at IDL argument conversion time.

Comment 15 Anne 2016-02-11 15:40:32 UTC

bz, I don't really care when the conversion takes place as long as it is before it hits the relevant standard. Faster seems better obviously. The idea was that it would apply to ArrayBuffer and all views (i.e., everything BufferSource represents).

Comment 16 Boris Zbarsky 2016-02-11 15:48:14 UTC

> Can you explain why this would be? 

Well, let's take an example WebGL API:

    void bufferData(GLenum target, ArrayBuffer? data, GLenum usage);

If we do the copying at argument conversion time, we MUST make a copy when we process the "data" argument, because processing the "usage" argument can detach the ArrayBuffer (e.g. if the thing passed for the third argument is { valueOf: function() { /* detach stuff here*/ return something; } }).  Then we have to hand over the buffer to the actual graphics driver, which involves another buffer copy.

If we do the copying at entry-into-prose time (which I will be the first to admit is much harder to specify and implement!) then the implementation could make the handoff to the graphics driver be the only copy, as long as it's careful to not enter script in any way between entry into the algorithm and the call into the graphics driver.

If the array is big, having to do an extra copy can be pretty painful, both in terms of memory usage and time....

Comment 17 Boris Zbarsky 2016-02-11 15:50:56 UTC

Oh, and the point is that copying at arg conversion time is quite observably different from copying at entry into prose time.  It's possible to try to add hacks that avoid it in an implementation (e.g. in my example detect that the "usage" argument is a primitive and avoid the copy in that case or something), but relying on implementations doing that is a bit questionable, especially since it's hard to automate that sort of thing.
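
To make the observable difference concrete, here is a small simulation of the two timings. The two `bufferData*` functions are hypothetical stand-ins for a binding, not WebGL's actual implementation, and the example uses a write to the buffer from `valueOf` in place of detachment so that it runs without any detach mechanism. With detachment, the late copy would throw where the early one succeeds.

```javascript
// Copy the full contents of an ArrayBuffer.
function copyOf(buffer) {
  return new Uint8Array(buffer.slice(0));
}

// Timing A: copy while converting "data", before "usage" is converted.
function bufferDataCopyAtConversion(target, data, usage) {
  const t = Number(target);
  const bytes = copyOf(data);
  const u = Number(usage); // may run script, but the copy already exists
  return bytes;
}

// Timing B: convert every argument first, then copy once on entry to the prose.
function bufferDataCopyAtEntry(target, data, usage) {
  const t = Number(target);
  const u = Number(usage); // may run script that changes `data`
  return copyOf(data);
}

// Build fresh arguments whose "usage" conversion modifies the buffer.
function makeArgs() {
  const data = new Uint8Array([1, 2, 3]).buffer;
  const usage = { valueOf() { new Uint8Array(data)[0] = 42; return 0; } };
  return [0, data, usage];
}

const early = bufferDataCopyAtConversion(...makeArgs());
const late = bufferDataCopyAtEntry(...makeArgs());
```

Here `early[0]` is 1 and `late[0]` is 42, so script can tell which timing a binding uses. Timing B is the one that would let an implementation hand the caller's bytes straight to the driver as the only copy.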

Comment 18 Domenic Denicola 2016-02-11 16:47:47 UTC

Thanks for the explanation. I certainly don't want an extra copy. I'm still a little confused though. Is the idea that implementations are only able to call into the graphics driver once? They can't detach/hand to graphics driver, continue running argument conversions (including potential user code), and then go back into the graphics driver and make use of the buffer?

Comment 19 Domenic Denicola 2016-02-11 16:50:25 UTC

(In reply to Domenic Denicola from comment #18)
> They can't detach/hand to graphics driver

Sorry, I meant copy to the graphics driver.

Comment 20 Boris Zbarsky 2016-02-11 17:41:13 UTC

I'm not sure I follow the question... For that specific API (bufferData) the implementation would make a call into the graphics driver, passing in a buffer.  The driver would then make a copy of that buffer and work on uploading it to the GPU and whatnot.

From what I can tell, the OpenGL API doesn't allow handing over ownership of memory to the driver in this situation.  See documentation at https://www.khronos.org/opengles/sdk/docs/man/xhtml/glBufferData.xml -- it explicitly does a copy.

Comment 21 Boris Zbarsky 2016-02-11 17:42:15 UTC

Oh, you were asking whether there's a way to hand over the data before you know the value of "usage"?  The answer seems to be "no".

Comment 22 Domenic Denicola 2016-02-11 17:55:32 UTC

(In reply to Boris Zbarsky from comment #21)
> Oh, you were asking whether there's a way to hand over the data before you
> know the value of "usage"?  The answer seems to be "no".

Right, that was the question. So yeah, since that doesn't work then I agree doing the copy before entering prose makes the most sense.

A bit annoying to spec in IDL, and presumably to implement in bindings, but the end result will be a very usable system for spec authors (and I doubt web developers will notice a difference).

Comment 23 Boris Zbarsky 2016-02-11 18:54:40 UTC

I think it's actually quite annoying to spec and even worse to implement for nontrivial cases like dictionaries that contain sequences of things that might be ArrayBuffers...

Maybe we should think carefully about where large buffers would actually be used, other than WebGL.  If most of the rest are fully under the control of the browser (so that once the first copy has been made the data can get handed off without further copies), then the right solution might be to spec/implement the simple thing, have WebGL use [UnsafeReference] and manual data extraction (which is what it does in UAs right now anyway, right?) and leave it at that.

Comment 24 Tab Atkins Jr. 2016-02-11 19:55:58 UTC

This sounds very reasonable to me.