This is an archived snapshot of W3C's public bugzilla bug tracker, decommissioned in April 2019. Please see the home page for more details.

Bug 23369 - Provide hooks for Typed Arrays (ArrayBuffer and friends)
Summary: Provide hooks for Typed Arrays (ArrayBuffer and friends)
Status: RESOLVED FIXED
Alias: None
Product: WebAppsWG
Classification: Unclassified
Component: WebIDL
Version: unspecified
Hardware: PC All
Importance: P2 normal
Target Milestone: ---
Assignee: Cameron McCormack
QA Contact: public-webapps-bugzilla
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 23332 25966 26153 26154
Reported: 2013-09-26 14:21 UTC by Anne
Modified: 2014-10-20 17:15 UTC
CC List: 11 users

See Also:


Attachments

Description Anne 2013-09-26 14:21:47 UTC
Now that Typed Arrays have moved into ES6, we need new hooks.

I think what most platform APIs need is "byte sequence" as an input type and a way to get the value out of that "byte sequence". I.e. doing the right thing with a %TypedArray%'s buffer, accepting ArrayBuffer(?), etc.

We also need a way to return specific types, see http://encoding.spec.whatwg.org/#textencoder for an example.
Comment 1 Boris Zbarsky 2013-09-26 14:37:04 UTC
I don't see any reason to not keep calling these by their ES names, even in IDL.  No need to make up new names and syntax here.
Comment 2 Joshua Bell 2013-09-26 17:07:15 UTC
Marking as "blocks https://www.w3.org/Bugs/Public/show_bug.cgi?id=23332" since if we decide anything here it will change how that API should be defined. It's another "operation needs a byte sequence" API, like encoding.

Agreed w/ BZ that we do not need new syntax/names here. It seems to me that we should agree on what this (growing?) class of operations should accept so they are consistent.

In these APIs, the type of the view is unimportant and potentially misleading - if a caller passes a Float32Array the floats are ignored and just the underlying bytes are used.

A few possibilities:

#1: operation( (ArrayBuffer or ArrayBufferView) byteSequence );
#2: operation( ArrayBuffer byteSequence );
#3: operation( ArrayBufferView byteSequence );
#4: operation( DataView byteSequence );
#5: operation( Uint8Array byteSequence );
#6: operation( (ArrayBuffer or Uint8Array) byteSequence );

#1 lets you pass in a buffer or a view on a buffer. https://bugzilla.mozilla.org/show_bug.cgi?id=796327 indicates window.ArrayBufferView is being removed (which would complicate polyfills) and ArrayBufferView is not in the ES6 drafts. Can it still be used in IDL? If not, this would need to become a union of all the array types, i.e. (Int8Array or Uint8Array or Uint8ClampedArray or Int16Array or ... or Float64Array).

#2 forces the caller to have a raw "array of bytes". If operating on a subset of a larger buffer this requires a copy. Bleah.

#3 is possibly problematic for the same reasons as #1. (FWIW, this is what TextDecoder accepts today)

#4 uses the type-agnostic DataView, most commonly used for encoding/decoding structs/files. In the abstract this seems more correct than many of the other options, but it's one more concept for developers to learn.

#5 anoints Uint8Array as "this is a byte sequence". It's akin to saying "in this C++ project, we hold a sequence of bytes in a std::vector<unsigned char>", but for the web platform. (And I'm okay with that, but others may object.)

#6 is a slight compromise on #5 that avoids having to new Uint8Array(buffer) if you already have an ArrayBuffer from another source. This seems like API clutter to me. 

If we e.g. agree on #5 for this class of APIs then we don't need anything new in IDL - it's Uint8Array in and out from these operations.
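
As a concrete illustration of the options above (a sketch, not from the bug; `operation` is a hypothetical option-#5 API): two differently typed views over one buffer expose the same underlying byte sequence, which is all a byte-sequence API consumes.

```javascript
// Sketch, not from the bug: "operation" is a hypothetical option-#5 API.
// Two differently typed views over one buffer expose the same underlying
// byte sequence, which is all a byte-sequence API consumes.
const buffer = new ArrayBuffer(8);
const floats = new Float32Array(buffer);
floats[0] = 1.5; // the float interpretation is irrelevant below

const bytes = new Uint8Array(buffer); // the same storage, viewed as bytes

function operation(byteSequence) {
  if (!(byteSequence instanceof Uint8Array)) {
    throw new TypeError("expected a Uint8Array byte sequence");
  }
  return byteSequence.reduce((sum, b) => sum + b, 0); // toy: sum the bytes
}

// A caller holding a Float32Array must re-wrap its storage itself:
const result = operation(
  new Uint8Array(floats.buffer, floats.byteOffset, floats.byteLength)
);
// result is identical to operation(bytes): only the bytes mattered.
```

The re-wrapping boilerplate in the last statement is exactly the cost that option #1/#3 would avoid for callers.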
Comment 3 Tab Atkins Jr. 2013-09-26 17:49:15 UTC
I don't like anything that forces us to be explicit about what kind of view we require when it's irrelevant.  Spec authors will just get this wrong.

For example, my FontFace interface defines a "BinaryData" typedef to allow passing either a raw ArrayBuffer or a view to the constructor, because it doesn't actually matter which one is used.
Comment 4 Anne 2013-09-26 17:52:15 UTC
So I think we should go with APIs accepting /TypedArray/ Objects, ArrayBuffer Objects, and DataView Objects, in terms of ES6. If we introduce these as IDL types as suggested in comment 1, I suppose we could produce a typedef as follows:

typedef (Int8Array or Uint8Array or Uint8ClampedArray or Int16Array or Uint16Array or Int32Array or Uint32Array or Float32Array or Float64Array or ArrayBuffer or DataView) ByteSequence;

It's still not entirely clear to me what the best return value is. XMLHttpRequest uses ArrayBuffer. TextEncoder uses Uint8Array.
Comment 5 Boris Zbarsky 2013-09-26 18:17:32 UTC
For what it's worth, SpiderMonkey internally uses ArrayBufferView to mean "any typed array or DataView" (which is what ArrayBufferView actually used to mean).  Basically anything with a .buffer on it.

That's more or less what your ByteSequence ends up being, modulo the ArrayBuffer bit, but I think the ArrayBufferView terminology actually makes sense for "any object which provides a view of an ArrayBuffer".

I would be fine with having ArrayBufferView as a primitive IDL type defined as above, to simplify things for spec authors, even though it's not an interface type...  The only problem is defining distinguishability: typed array types are distinguishable from each other and from DataView, but ArrayBufferView is not distinguishable from typed array types and DataView.
Comment 6 Allen Wirfs-Brock 2013-09-26 18:29:15 UTC
(In reply to Anne from comment #4)
> So I think we should go with APIs accepting /TypedArray/ Objects,
> ArrayBuffer Objects, and DataView Objects, in terms of ES6. If we introduce
> these as IDL types as suggested in comment 1, I suppose we could produce a
> typedef as follows:
> 
> typedef (Int8Array or Uint8Array or Uint8ClampedArray or Int16Array or
> Uint16Array or Int32Array or Uint32Array or Float32Array or Float64Array or
> ArrayBuffer or DataView) ByteSequence;

But this would unnecessarily preclude ES6 subclasses of any of those named typed array constructors.

The replacement for ArrayBufferView is the function ArrayBuffer.isView(obj), which tests whether its argument is an object that has an ArrayBuffer as its backing store.

It seems to me that what you really want to say (in prose) for that typedef is: any object for which ArrayBuffer.isView(object) is true.

Also, keep in mind that such view objects may only be viewing a subsequence of the bytes in an ArrayBuffer.  If an API is going to accept both views and ArrayBuffers, it needs to be able to map a view to the appropriate subsequence of its buffer.
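
A minimal sketch of the two points above (none of this is spec text): ArrayBuffer.isView() distinguishes views from buffers, and a view's buffer, byteOffset and byteLength identify the subsequence of bytes it actually maps.

```javascript
// Sketch of the two points above (not spec text): ArrayBuffer.isView()
// distinguishes views from buffers, and buffer/byteOffset/byteLength
// identify the subsequence of bytes a view actually maps.
const buffer = new ArrayBuffer(16);
new Uint8Array(buffer).set([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]);

const view = new Uint8Array(buffer, 4, 8); // maps bytes 4..11 only

console.log(ArrayBuffer.isView(view));   // true (any typed array or DataView)
console.log(ArrayBuffer.isView(buffer)); // false (a buffer is not a view)

// Wrong: view.buffer is the whole 16-byte backing store.
// Right: map the view to its subsequence of the buffer:
const held = view.buffer.slice(view.byteOffset, view.byteOffset + view.byteLength);
console.log(new Uint8Array(held)); // the 8 bytes 4..11
```
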
Comment 7 Boris Zbarsky 2013-09-26 18:33:05 UTC
> But this would unnecessarily preclude ES6 subclasses of any of those named
> typed array constructors.

That depends on how this stuff is defined.  If it's defined to check branding bits, ES6 subclasses should Just Work, I would think.

> The replacement for ArrayBufferView is the function ArrayBuffer.isView(obj)

Right.  My proposal is that we have a WebIDL way to say "accept any object for which ArrayBuffer.isView() would return true".  Then we just need a name for that concept.

> If an API is going to accept both views and ArrayBuffers it needs to be able to
> map a view to the appropriate subsequence of its buffer.

Yes.  The point of accepting views is to only use the bytes mapped by the view, not the entire backing store.
Comment 8 Jonas Sicking (Not reading bugmail) 2013-09-26 18:55:23 UTC
I think it depends on what the API does with the passed-in data.

For example, in bug 23332 we want to do a binary sort of two "byte sequences". In that case accepting two Float64Arrays, but then not comparing them float-by-float, feels very unintuitive. So either we should not accept Float64Arrays, or we should compare them as floats.

For something like WebSocket it seems more ok to accept a Float64Array and then simply treat it as a binary blob since all we are doing is streaming it over the wire. Likewise with something like a file-writing API as well as the Blob constructor.
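
The distinction Jonas draws can be made concrete with a sketch (both comparison functions are hypothetical, and the byte layout shown assumes a little-endian platform):

```javascript
// Sketch of the distinction above; both comparison functions are
// hypothetical, and the byte layout shown assumes a little-endian platform.
function compareAsFloats(a, b) {
  return a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0; // toy: one-element arrays
}

function compareAsBytes(a, b) {
  const ba = new Uint8Array(a.buffer, a.byteOffset, a.byteLength);
  const bb = new Uint8Array(b.buffer, b.byteOffset, b.byteLength);
  for (let i = 0; i < ba.length; i++) {
    if (ba[i] !== bb[i]) return ba[i] < bb[i] ? -1 : 1;
  }
  return 0;
}

const one = new Float64Array([1.0]); // bit pattern 0x3FF0000000000000
const two = new Float64Array([2.0]); // bit pattern 0x4000000000000000

// As floats, 1.0 sorts before 2.0; as little-endian bytes, 1.0's byte
// sequence compares greater, so the two orders disagree.
console.log(compareAsFloats(one, two)); // -1
console.log(compareAsBytes(one, two));  // 1 on little-endian hardware
```
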
Comment 9 Allen Wirfs-Brock 2013-09-26 19:11:44 UTC
(In reply to Boris Zbarsky from comment #7)
> > But this would unnecessarily preclude ES6 subclasses of any of those named
> > typed array constructors.
> 
> That depends on how this stuff is defined.  If it's defined to check
> branding bits, ES6 subclasses should Just Work, I would think.
> 
Yes, subclasses should work at runtime.

My question was more about the semantics of the WebIDL you wrote. When (in WebIDL) you write Uint8Array, does that mean direct instances of the corresponding ES constructor, or does it also allow for instances of subclasses of the constructor?

If, when you talk about branding, you mean something other than subclassing, then that probably needs to be made more explicit in the WebIDL ES binding (or maybe it is, and I'm just not up to speed on that...)

In the ES6 spec (currently still a draft) there are two relevant branding-like internal properties of typed array instances. [[ViewedArrayBuffer]] indicates that the object is a view on an ArrayBuffer, and [[TypedArrayName]] indicates that it is a TypedArray instance; its value (which may not be the actual constructor name) provides the array element type.  See http://people.mozilla.org/~jorendorff/es6-draft.html#sec-22.2.7
Comment 10 Anne 2013-09-26 19:15:57 UTC
In IDL if something accepts Node, it also accepts Element, which is a subclass. Whether that can still be true with ES6-style subclassing is something we have to figure out, but given that through @@create we can guarantee some reliable bits in the object, it might be okay.

Indexed DB keys seem like a special case. I was primarily thinking of IO and encoding APIs where anything that represents bytes is fine.
Comment 11 Boris Zbarsky 2013-09-26 19:40:05 UTC
> When (in WebIDL) you write Uint8Array does that mean direct instances of the
> corresponding ES constructor or does it also allow for instances of subclasses
> of the constructor.

I think the only sane thing is allowing subclasses.

Right now WebIDL just talks about "objects implementing the interface" or some such, but we can formalize that as desired.  And I think it's desirable to formalize it in a way that makes subclassing work.  As Anne says, @@create inheritance makes this reasonable.
Comment 12 Anne 2014-04-11 10:08:47 UTC
So can we get a type (maybe DOMBytes?) for objects for which ArrayBuffer.isView(object) returns true or which are ArrayBuffer instances? As stated in comment 0 this should have a concept of "bytes" it holds, taking into account buffer, byteOffset, and byteLength.

It's not entirely clear to me what would be the best output type. The Encoding Standard uses Uint8Array at the moment, but maybe we should just use ArrayBuffer for that?
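
A sketch of what such a type's "bytes it holds" concept might mean in script terms (the helper name bytesHeldBy is hypothetical, not proposed anywhere in this bug):

```javascript
// Sketch of the "bytes it holds" concept for such a type; the helper name
// bytesHeldBy is hypothetical and not proposed anywhere in this bug.
function bytesHeldBy(source) {
  if (source instanceof ArrayBuffer) {
    return new Uint8Array(source.slice(0)); // the whole buffer, copied
  }
  if (ArrayBuffer.isView(source)) { // any typed array or DataView
    return new Uint8Array(
      source.buffer.slice(source.byteOffset, source.byteOffset + source.byteLength)
    );
  }
  throw new TypeError("expected an ArrayBuffer or a view on one");
}

const buf = new ArrayBuffer(4);
new Uint8Array(buf).set([10, 20, 30, 40]);

console.log(bytesHeldBy(buf));                     // all four bytes
console.log(bytesHeldBy(new DataView(buf, 1, 2))); // only bytes 20, 30
```
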
Comment 13 Ryan Sleevi 2014-06-09 06:09:41 UTC
Another reason to place this in WebIDL: Currently, when dealing with Promise-returning specifications, it's ambiguous about how ArrayBuffer/ArrayBufferView data is handled.

That is, the naive approach - "Return a Promise and continue executing these steps asynchronously" (as recommended by https://github.com/w3ctag/promises-guide ) doesn't quite work, because it allows the script environment to manipulate the underlying bytes while the Promise-d operation continues.

Both WebCrypto and WebAudio have had to define how to do this, and in slightly different ways. WebAudio pursued an (invalid) approach of 'temporarily neuter', while WebCrypto pursued a path of 'copy'. Other paths discussed in WebCrypto were 'permanently neuter' (aka Transferable) or 'copy on write' (ruled out by implementors as complexity/performance overhead for the common case of ArrayBuffers).

It's unclear whether WebIDL should have annotations to support this special case, or whether all specs should pursue one or the other. I would expect that, since ArrayBufferViews are seen as valid inputs, a 'copy' approach is desirable for consistency, but I can see specs needing to annotate transferable.
Comment 14 Domenic Denicola 2014-06-09 06:18:29 UTC
I think the best return type is ArrayBuffer in general, when representing "binary data".

If there's a specific reason for 8-bit sequences in a particular case, Uint8Array could be OK. For example if some API was explicitly doing floating-point math it could return a Float32Array instead of an ArrayBuffer.
Comment 15 Boris Zbarsky 2014-06-10 01:50:36 UTC
> because it allows the script environment to manipulate the underlying bytes
> while the Promise-d operation continues

Which may be the right thing in some rare cases, in fact (e.g. if the Promise-d operation is not actually doing anything with those bytes).

> or 'copy on write' (ruled out by implementors as complexity/performance
> overhead for the common case of ArrayBuffers

I'd think that except for performance characteristics "copy on write" is not black-box distinguishable from "copy", no?

> I would expect that, since ArrayBufferViews are seen as valid inputs, a 'copy'
> approach is desirable for consistency

I don't think we should be forcing a copy on people, though we should probably have guidance that says that it's the appropriate thing to do if you plan to actually asynchronously use the data that was passed in.
Comment 16 Adrian Bateman [MSFT] 2014-06-17 15:31:47 UTC
This topic was discussed for Bug 25966 and Anne suggested that I comment. My proposal is to find a good way to allow the following:

1) Methods should accept either ArrayBuffer or ArrayBufferView. If you have an ArrayBuffer you don't want to have to create a view to be able to use it. Alternatively, you might have a view that maps part of a buffer and you don't want to have to make an ArrayBuffer to use this part.

2) For data being provided in attributes or events, provide it as an ArrayBuffer over which you can create a view if you wish or use it as is. It may be that 8-bit interpretation is the only natural way to read the data but developers can do what they want.

From reading the history here, it looks like the key thing is some simple IDL for accepting the ArrayBuffer or ArrayBufferView (including subclassing, though that's a general issue).
Comment 17 Allen Wirfs-Brock 2014-06-17 17:07:27 UTC
(In reply to Ryan Sleevi from comment #13)
> Another reason to place this in WebIDL: Currently, when dealing with
> Promise-returning specifications, it's ambiguous about how
> ArrayBuffer/ArrayBufferView data is handled.
> 
> That is, the naive approach - "Return a Promise and continuing executing
> these steps asynchronously" (as recommended by
> https://github.com/w3ctag/promises-guide ) doesn't quite work, because it
> allows the script environment to manipulate the underlying bytes while the
> Promise-d operation continues.

There is nothing unique to binary data in this regard; the same would be true if you were passing a tree of ordinary JS objects. Also, there is nothing unique to Promises in this regard. JS does not provide any sort of isolation between JS-level micro-tasks or callbacks.

The solution in any specific case is simple: if you don't want anybody other than a 'then' onFulfilled function to have access to the promise's ultimate value, don't pass the value to anybody else.  If the value is already shared, copy it before fulfilling the promise, since any sort of deferred communication presents the possibility of this sort of interference.

> 
> Both WebCrypto and WebAudio have had to define how to do this, and in
> slightly different ways. WebAudio pursued an (invalid) approach of
> 'temporarily neuter', while WebCrypto pursued a path of 'copy'. Other paths
> discussed in WebCrypto were 'permanently neuter' (aka, Transferrable) or
> 'copy on write' (ruled out by implementors as complexity/performance
> overhead for the common case of ArrayBuffers)
> 
> It's unclear whether or not WebIDL should have annotations to support this
> special-case, or whether all specs should pursue one or the other. I would
> expect that, since ArrayBufferViews are seen as valid inputs, a 'copy'
> approach is desirable for consistency, but I can see specs needing to
> annotate transferrable.

These seem like everyday async cases that anybody who is designing async subsystems has to deal with, regardless of whether or not an ArrayBuffer is involved.  A solution that only works with ArrayBuffer-based data structures does not help with the general problem.
Comment 18 Ryan Sleevi 2014-06-17 17:28:36 UTC
(In reply to Allen Wirfs-Brock from comment #17)
> There is nothing unique to binary data in this regard, the same would be
> true if you were passing a tree of ordinary JS objects. Also, there is
> nothing unique to Promises in this regard. JS does not provide any sort of
> isolation between JS-level micro-tasks or callbacks.

This isn't actually correct.

The WebIDL specification defines the conversion routines for ECMAScript objects into WebIDL types. This conversion process is documented as copying (e.g. a DOMString that contains the value equivalent to the ECMAScript string that results from ToString(v)).

A tree of objects is cascaded into IDL types, if using a dictionary/sequence, for example.

The only time that I can see that WebIDL doesn't provide this explicit conversion is for the any/object types, which retain references to their underlying platform type.

That's why this is a request to provide a WebIDL definition for how the ECMAScript %TypedArray% type is handled, within WebIDL, rather than relying on "object" semantics.

> The solution in any specific case is simply, if you don't want anybody other
> than a 'then' onFulfilled function to have access to the promise's ultimate
> value don't pass the value to anybody else.  If the value is already shared,
> copy it before fulfilling the promise, so any sort of deferred
> communications presents the possibility of this sort of interference.  

I fear you've misunderstood this bug.

This has nothing to do with the promise's ultimate value (e.g. the output argument); it has everything to do with the inputs to a Promise-returning function (e.g. the function arguments).

As noted, specifications that wish to perform asynchronous operations on the binary data need to explicitly document that a copy is returned (otherwise, once control returns to the caller, the caller can manipulate the underlying data in parallel to the Promise-returning operation)

Not all specifications will; some may be able to perform their tasks over the data before yielding control to the author.


> These seem like everyday async cases that anybody who is designing async
> subsystems have to deal with, regardless of whether or not an ArrayBuffer is
> involved.  A solution that only worked with ArrayBuffer based data
> structures does not help with the general problem.

WebIDL defines how this works for most other ECMAScript primitives, by providing IDL types that represent and duplicate the data, leading them to be "safe" for asynchronous operations. This is not special-casing ArrayBuffer; this is ensuring that ArrayBuffer has a similar first-class experience.
Comment 19 Allen Wirfs-Brock 2014-06-17 18:28:50 UTC
(In reply to Ryan Sleevi from comment #18)
> (In reply to Allen Wirfs-Brock from comment #17)
> > There is nothing unique to binary data in this regard, the same would be
> > true if you were passing a tree of ordinary JS objects. Also, there is
> > nothing unique to Promises in this regard. JS does not provide any sort of
> > isolation between JS-level micro-tasks or callbacks.
> 
> This isn't actually correct.
> 
> The WebIDL specification defines the conversion routines for ECMAscript
> objects into WebIDL types. 
...

I'm afraid I have to continue to disagree. 

This stuff will all be happening in the context of a standard ECMAScript environment that includes ES-based specifications of Promise, ArrayBuffer, various typed arrays, DataView, etc. None of the conversions and copying you describe are normal ES behaviors. In particular, the copy-the-object-graph-when-passed behavior you describe would be a very rare occurrence in idiomatic ES code.

Sure, you can make your WebIDL-based APIs do whatever you want, but you should get past the perspective that it is WebIDL that defines the entire programming model and the expected idioms of ES programmers. ES programmers will be using Promises for many things, and most of them won't be defined using WebIDL interfaces and semantics.

Web platform APIs should start out as good idiomatic ES API and then get refined, if necessary, to ensure any platform invariants. Not the other way around.

> 
> > The solution in any specific case is simply, if you don't want anybody other
> > than a 'then' onFulfilled function to have access to the promise's ultimate
> > value don't pass the value to anybody else.  If the value is already shared,
> > copy it before fulfilling the promise, so any sort of deferred
> > communications presents the possibility of this sort of interference.  
> 
> I fear you've misunderstood this bug.
> 
> This has nothing to do with the promises ultimate value (eg: the output
> argument); it has everything to do with the inputs to a Promise-returning
> function (eg: function arguments).

I actually don't think it makes a difference.  As long as async tasks are involved, the stability of both inputs and outputs can be an issue, and whether mutable values are accessible from other logically concurrent tasks can be an issue in either case. As long as you have shared state (which you do have, even with ES's concurrency-constrained event-loop tasking) you have to deal with it.
> 
> As noted, specifications that wish to perform asynchronous operations on the
> binary data need to explicitly document that a copy is returned (otherwise,
> once control returns to the caller, the caller can manipulate the underlying
> data in parallel to the Promise-returning operation)

If there are integrity issues involved, you may need to explicitly copy.  But in many other situations such copying will be too expensive, and it will be perfectly adequate to document that the passed data structures should not be accessed or modified by the caller until the result promise has been settled. What if the caller violates that restriction?  It's a programmer bug, just like millions of other possible programming bugs.

> 
> Not all specifications will; some may be able to perform their tasks over
> the data before yielding control to the author.
> 
> 
> > These seem like everyday async cases that anybody who is designing async
> > subsystems have to deal with, regardless of whether or not an ArrayBuffer is
> > involved.  A solution that only worked with ArrayBuffer based data
> > structures does not help with the general problem.
> 
> WebIDL defines how this works for most other ECMAScript primitives, by
> providing IDL types that represent and duplicate the data, leading them to
> to be "safe" for asynchronous operations. This is not special-casing
> ArrayBuffer; this is ensuring that ArrayBuffer has a similar first-class
> experience.

The important thing in ES isn't the primitives, it is the new abstractions (call them "classes" if you want) that ES programmers construct out of graphs of ordinary ES objects. WebIDL doesn't really provide much at all that generalizes over such ES-based abstractions.

You are taking the approach that WebIDL is defining "first-class" behavior and that makes everything else an ES programmer does "second-class".  That's just backwards.
Comment 20 Ryan Sleevi 2014-06-17 19:11:43 UTC
(In reply to Allen Wirfs-Brock from comment #19)
> Web platform APIs should start out as good idiomatic ES API and then get
> refined, if necessary, to ensure any platform invariants. Not the other way
> around.

I think we're actually in violent agreement here; 

WebAudio, as a bad example, uses pseudo-neutering of objects. That doesn't really work, because it's not valid ES.

WebCrypto handles this in terms of ES primitives and ES-language; see https://dvcs.w3.org/hg/webcrypto-api/raw-file/tip/spec/Overview.html#concept-clone-CryptoOperationData

My request is that this sort of language be incorporated (optionally) as part of WebIDL, so that a more meaningful short-hand can emerge, and that conventions can be consistently identified.

That is, imagine an attribute on function arguments that is similar to [TreatNullAs=] that indicates how the incoming %TypedArray% type should be handled, as part of the ES->WebIDL type conversion.

> If there are integrity issues involved, you may need to explicitly copy. 
> But it many other situations such copying will be too expensive and it will
> be perfectly adequate to document that the passed data structures should not
> be accessed or modified by the caller until the result promise has been
> settled. What if the caller violates that restriction?  It's a programmer
> bug just like missions of other possible programming bugs.

Again, more violent agreement; there are times where copying is appropriate, and times when copying is not appropriate.

When copying is appropriate or desired for a general platform API, it's desirable to have that copying behaviour behave consistently across conforming WebIDL specifications. That is, much in the same way that the Object->Dictionary conversion is defined (down to the order of how fields are accessed), it would be "Nice" / "Good" to have a similar method, as part of WebIDL, to indicate how TypedArrays are handled for Promise returning specifications. That's all I was requesting.

> You are taking the approach that WebIDL is defining "first-class" behavior
> and that makes everything else an ES programmer does "second-class".  That's
> just backwards.

I fear you've misunderstood the intent, as it's certainly not the case. My comparison of first and second class has nothing to do with ES-vs-WebIDL, and is solely about WebIDL-vs-prose (as in the example of WebCrypto), where typed annotations are vastly less likely to result in bugs or implementation errors than prose is, and is vastly more likely to be consistent across multiple independent specifications.
Comment 21 Allen Wirfs-Brock 2014-06-17 21:14:54 UTC
(In reply to Ryan Sleevi from comment #20)
> (In reply to Allen Wirfs-Brock from comment #19)
> > Web platform APIs should start out as good idiomatic ES API and then get
> > refined, if necessary, to ensure any platform invariants. Not the other way
> > around.
> 
> I think we're actually in violent agreement here; 

Cool!
> 
> ... 
> My request is that this sort of language be incorporated (optionally) as
> part of WebIDL, so that a more meaningful short-hand can emerge, and that
> conventions can be consistently identified.
> 
> That is, imagine an attribute on function arguments that is similar to
> [TreatNullAs=] that indicates how the incoming %TypedArray% type should be
> handled, as part of the ES->WebIDL type conversion.

I see two issues; the bigger one is how exactly to generalize the sort of cloning you describe in your spec.  More on that below.

The second issue is that, to me, the need to do this sort of cloning should be relatively rare and should be backed up by some sort of analysis that justifies it. If it is too easy for an API designer to say that an argument value should be cloned, that might lead to a tendency for some API designers to clone everything without really having thought through whether the clone is essential.

For example, in the WebCrypto API I notice that the encrypt and decrypt methods are both specified to clone their data parameters. Why? It certainly doesn't make any sense to modify the data buffer after the initial call and before processing starts, but is there any integrity or other essential reason that must be actively guarded against at the cost of forcing a copy of a potentially large buffer? Also, is there a reason that the CryptoKey parameter objects aren't cloned (probably deeply) to ensure that they aren't mutated at the same time a buffer might be mutated?

> 
> > If there are integrity issues involved, you may need to explicitly copy. 
> > But it many other situations such copying will be too expensive and it will
> > be perfectly adequate to document that the passed data structures should not
> > be accessed or modified by the caller until the result promise has been
> > settled. What if the caller violates that restriction?  It's a programmer
> > bug just like missions of other possible programming bugs.
> 
> Again, more violent agreement; there are times where copying is appropriate,
> and times when copying is not appropriate.
> 
> When copying is appropriate or desired for a general platform API, it's
> desirable to have that copying behaviour behave consistently across
> conforming WebIDL specifications. That is, much in the same way that the
> Object->Dictionary conversion is defined (down to the order of how fields
> are accessed), it would be "Nice" / "Good" to have a similar method, as part
> of WebIDL, to indicate how TypedArrays are handled for Promise returning
> specifications. That's all I was requesting.

Which gets us back to my first issue.  It's easy to say what it means to clone a binary buffer, but as soon as you get to anything more complicated, a generalized meaning of clone becomes less clear.  With ES6 we have the possibility of user-defined subclasses of the built-in typed array types, and they can add their own properties to the array instances. Does a clone need to copy them? It probably depends on the usage and the motivation for the clone.
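
A sketch of the subclass wrinkle (TaggedBytes is a hypothetical subclass, not from any spec): a byte-level clone preserves the held bytes but necessarily drops subclass-added own properties.

```javascript
// Sketch of the subclass wrinkle; TaggedBytes is hypothetical. A byte-level
// clone preserves the held bytes but drops subclass-added own properties.
class TaggedBytes extends Uint8Array {
  constructor(length, tag) {
    super(length);
    this.tag = tag; // an own property beyond the byte contents
  }
}

const original = new TaggedBytes(3, "secret");
original.set([1, 2, 3]);

// A binary-data clone copies only the held bytes:
const clone = new Uint8Array(
  original.buffer.slice(original.byteOffset, original.byteOffset + original.byteLength)
);

console.log(clone[0], clone[1], clone[2]); // 1 2 3 -- the bytes survive
console.log(clone.tag);                    // undefined -- the own property does not
```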

If we're talking about any sort of real object graph (for example, a CryptoKey?) a generalized clone operation is much more problematic.

To the degree we are only talking about cloning binary data this seems reasonable, but as soon as we get any object references involved it's much less clear what a general solution might be (or even what the general problem is).

> 
> > You are taking the approach that WebIDL is defining "first-class" behavior
> > and that makes everything else an ES programmer does "second-class".  That's
> > just backwards.
> 
> I fear you've misunderstood the intent, as it's certainly not the case. My
> comparison of first and second class has nothing to do with ES-vs-WebIDL,
> and is solely about WebIDL-vs-prose (as in the example of WebCrypto), where
> typed annotations are vastly less likely to result in bugs or implementation
> errors than prose is, and is vastly more likely to be consistent across
> multiple independent specifications.

Cool.  Sorry I misunderstood your approach.  BTW, on a pretty much separate matter: I'm not really convinced that you need to allow either an ArrayBuffer or a typed array for all of these data arguments.  I understand the desire to be flexible, but that complicates each API to a certain extent and also muddles the abstraction over which the APIs operate. I'd choose one or the other (I'd probably go the typed array route).  It's easy enough for an ES programmer to wrap a typed array instance around an ArrayBuffer if that is what they have in hand.
Comment 22 Ryan Sleevi 2014-06-17 21:39:57 UTC
(In reply to Allen Wirfs-Brock from comment #21)
> For example, in the WebCrypot API I notice that the encrypt and decrypt
> methods both are specified to clone their data parameters. Why? It certainly
> doesn't make any sense to modify the data buffer after the initial call and
> before processing starts, but is there any integrity or other essential
> reasons that must be actively guarded against with the cost of forcing a
> copy of a potentially large buffer? 

Since this is core to the discussion on this bug, I'll address it. There are several questions that are more specific to WebCrypto that, for the sake of those following this bug, I've not answered. I'm more than happy to follow up on public-webcrypto@ if you want to continue the discussion/API concerns there.

As it relates to this: once the Promise is vended by these operations (and it's not just encrypt/decrypt but *all* of the SubtleCrypto operations, save for exportKey), the user agent continues asynchronously processing the data 'in the background'.

As such, we need to ensure that the user does not mutate data at the same time the implementation is reading data, and that implementations afford consistency to script authors. Within Promise-using ES6 there is seemingly no requirement for synchronizing access to data with the task queue; indeed, that could be quite bad for some cases (an example being networking events).

Cryptographic operations can be costly, not just in CPU terms, but in some implementations, may result in IO waits. Many cryptographic APIs are blocking APIs themselves. Even a small buffer (of 20 bytes) can, for some operations, cause multi-second stalls if backed by hardware.

This API was designed to support both hardware and software-backed operations, even if we expect most of the first implementations are software-backed (as that's what the spec currently favours in description). Because of the hardware requirements, we don't want to pause the ES microtask pool / HTML task pool in order to synchronously invoke these methods; we would instead prefer to treat them like networking events (eg: dispatch data to some API, receive asynchronous notification of completion).

The output of the cryptographic operation should be consistent with its inputs. If we allowed arbitrary modification, we can run into situations like:

var a = new Uint8Array(4);
a.set([0, 1, 2, 3]);

var promise = crypto.subtle.encrypt(.., a, ...);
// At this time slice, the UA has not yet begun the encryption operation,
// because the underlying subsystem may only support a single pending operation,
// and the encrypt() is still pending
a[0] = 4

promise.then(result => { 
  // Does result contain ciphertext for [0, 1, 2, 3] or for [4, 1, 2, 3]
});

This is why an explicit copy is noted; the actual transformation of the input, plaintext buffer (A) is, in almost all cases (hardware or software), not going to happen synchronously before the return of the Promise and continuation of the task pool. So callers should have a reasonable expectation of what the transformation of the data will look like. Since objects - such as ArrayBuffer - are passed by reference, callers can fully mutate the object throughout the lifetime of the Promise, so it's necessary, as spec authors, to indicate at what point in execution the state is accessed.

In Web Crypto's case, we try to make it clear which state will be accessed before the return of the Promise object. Other specs (Web Audio and Web Font Loading come to mind) have not, and as such have subtle issues in their specs that can lead to user-observable, non-interoperable behaviours.

> Also, is the a reason that the CryptoKey
> parameter objects aren't cloned (probably deeply) to ensure that it isn't
> mutated at the same time a buffer might be mutated?

Let's follow up on public-webcrypto@. TL;DR: CryptoKeys are immutable holders of an internal object ( [[handle]] ). All ops work on the internal slots, which cannot be mutated.

> If we're talking about any sort of real object graph (for example, a
> CryptKey??) a generalized clone operation is much more problematic. 

We aren't, or more specifically, I'm not, and certainly not in this bug. I agree that other Promise-returning APIs may accept other forms of mutable objects, and those APIs may need to define at what timeslice/task event snapshots (if any) are taken of the object state for working with the API, and when/how object state is accessed.

However, for fundamental ES types - those for which Web IDL spends significant time dealing with (in terms of the ES <-> IDL type conversions) - it would be nice to see %TypedArray% - which is as primitive as %Array%, %String%, %Boolean%, or %Number% are in ES6 - to have specific language in Web IDL to explain how it's used with Web IDL, and to afford Spec Authors ways to hook or customize that behaviour (eg: copies), if necessary.

> BTW, on a pretty much separate matter I'm not really convinced
> that you need to allow either an ArrayBuffer or a typed array
> for all of these data arguments. 

A better fora for this is public-webcrypto@ / public-webcrypto-comments@ (or our WebCrypto bug tracker), where I'd be happy to address this feedback.
Comment 23 Allen Wirfs-Brock 2014-06-17 23:45:56 UTC
(In reply to Ryan Sleevi from comment #22)
... yes, got that.  It's how promises work in ES.
> 
> The output of the cryptographic operation should be consistent with its
> inputs. If we allowed arbitrary modification, we can run into situations
> like:
> 
> var a = new Uint8Array(4);
> a.set([0, 1, 2, 3]);
> 
> var promise = crypto.subtle.encrypt(.., a, ...);
> // At this time slice, the UA has not yet begun the encryption operation,
> // because the underlying subsystem may only support a single pending
> operation,
> // and the encrypt() is still pending
> a[0] = 4

yes, this would be a stupid thing for somebody to do

> 
> promise.then(result => { 
>   // Does result contain ciphertext for [0, 1, 2, 3] or for [4, 1, 2, 3]
> });
> 

Does it matter which ciphertext was processed?  The caller was adequately warned that they were invoking an async operation, which they acknowledged by explicitly coding a 'then' call.  They went ahead and did something stupid anyway.  Do we really need to be nannies protecting them from their stupidity?  Is it worth the copying overhead for the vast majority of non-stupid callers who don't do stupid things like this?  So far, at least for these two operations, it doesn't look like it's an integrity issue.

> This is why an explicit copy is noted; the actual transformation of the
> input, plaintext buffer (A) is, in almost all cases (hardware or software),
> not going to happen synchronously before the return of the Promise and
> continuation of the task pool. So callers should have a reasonable
> expectation of what the transformation of the data will look like. Since
> objects - such as ArrayBuffer - are passed by reference, callers can fully
> mutate the object throughout the lifetime of the Promise, so it's necessary,
> as spec authors, to indicate at what point in execution the state is
> accessed.

Like I said above, yes this is all normal. Every good ES programmer understands these things.  They don't need to be protected from themselves. 


> 
> In Web Crypto's case, we try to make it clear that any and all state that
> will be accessed before the return of the Promise object. Other specs (Web

I didn't catch this as I quickly scanned the Web Crypto spec, but I probably just missed it. At any rate, this answers my 'why' question.

> Audio and Web Font Loading come to mind), have not, and as such, have subtle
> issues in their spec that can lead to user-observable, non-interoperable
> behaviours.

I'm all for spec'ing things at a detailed enough level to ensure consistent, interoperable behavior. But as soon as you have timing variability, you may get observable differences, even within the same implementation. Isn't it adequate to make it clear which objects are subject to deferred access and hence produce timing-sensitive results?

> ...
> 
> However, for fundamental ES types - those for which Web IDL spends
> significant time dealing with (in terms of the ES <-> IDL type conversions)
> - it would be nice to see %TypedArray% - which is as primitive as %Array%,

well, neither %TypedArray% nor Array is primitive in the same sense as Number, Boolean, or String in ES6. But I agree that handling of %TypedArray%s has a place within WebIDL.  I don't think either Array or %TypedArray% instances should be routinely copied as part of the WebIDL parameter passing mechanism.

> %String%, %Boolean%, or %Number% are in ES6 - to have specific language in
> Web IDL to explain how it's used with Web IDL, and to afford Spec Authors
> ways to hook or customize that behaviour (eg: copies), if necessary.
>
Comment 24 Ryan Sleevi 2014-06-18 00:07:21 UTC
(In reply to Allen Wirfs-Brock from comment #23)
> Does it matter which cipertext was processed.  The caller was adequately
> warned that they were invoking an async operation which they acknowledged by
> explicitly coding a 'then' call.  They went ahead and did something stupid
> anyway.  Do we really need to be nannies protecting them from their
> stupidity. Is it worth the copying overhead for the vast majority of
> non-stupid callers who don't do stupid things like this?  So far, at least
> for these two operations it doesn't look like its an integrity issue.

I don't know what you mean by "an integrity issue". Modifying a buffer during verify, for example, can cause a different message to be verified than intended. Modifying a message during decrypt might allow an invalid message to be treated as valid, thus causing all sorts of cryptographic bugs.

This is not about being nannies. This is about being responsible spec authors and defining sensible memory models and pseudo-threading semantics, rather than being blasé and saying "Don't do that". The term "undefined behaviour" should be the last refuge of the spec author, not the first.

> well, neither %TypedArray% or Array are primitive in the same sense as
> Number, Boolean, Number in ES6. But I agree that handling of %TypedArrays%
> has a place within WebIDL.  I don't think either Array or %TypedArray%
> instances should be routinely copied as part of the WebIDL parameter passing
> mechanism.

And again, I'm not arguing that all %TypedArray% parameters should be copied. I've merely requested for the capability of indicating when a copy should happen.

It's clear from multiple specifications (Web Audio and Web Crypto at the forefront), that spec authors and users both have a desire to have reliable, known access patterns for data; patterns that conceptually align with the access patterns synchronous APIs provide, and patterns that encourage consistency across specs.

An annotative capability, for example, much like [TreatNullAs], does provide a consistent way for specifications to indicate certain behaviours, and authors to rely on them.

I'm not sure what use cases you have in mind that would benefit from this, or what design objections you have to the concept of copying, but I think the feedback and experience of implementers so far is that 'copying' is what makes sense for a number of reasons. I can't help but feel like if there is a use case for 'mutable-while-running', it's almost certainly a case that is better served by things like the nascent Streams API ( https://github.com/whatwg/streams ).

As it relates to Web IDL specifically - rather than, I think, some of the discussion here that might be better on public-script-coord@ - having Web IDL support TypedArray - as indicated in this bug report - and a way to indicate what behaviours/treatment the underlying buffer are given, would be useful to authors and users.
Comment 25 Allen Wirfs-Brock 2014-06-18 01:07:29 UTC
(In reply to Ryan Sleevi from comment #24)

I'll be short.  Didn't really mean to have a debate.
> 
> I don't know what you mean by "an integrity issue". Modifying a buffer
> during verify, for example, can cause a different message to be verified
> than intended. Modifying a message during decrypt might allow an invalid
> message to be treated as valid, thus causing all sorts of cryptographic bugs.

By "integrity issue" I primarily mean an issue that exposes otherwise inaccessible information or functionality.

Decrypting the wrong text because of a caller error sounds like a bug, not an integrity issue.  Leaking knowledge that allowed a secret key to be inferred would be an integrity issue.

> 
> This is not about being nannies. This is about being responsible spec
> authors and defining sensible memory models and pseudo-threading semantics,
> rather than being blaise and saying "Don't do that". The term "undefined
> behaviour" should be the last refuge of the spec author, not the first.
> 
Processor and (particularly) memory cycles are valuable things.  I'm not sure that requiring non-essential buffer copying is responsible spec authoring. My experience is that pervasively doing such things bogs down systems.  It's not any single copy or redundant check that's the problem; it's the collective effect of millions of them at runtime.
Comment 26 Domenic Denicola 2014-06-18 02:20:12 UTC
The question is, could interop survive with the undefined behavior Allen suggests? I am not so sure.

For example, let's say that in browser A the code given always produces ciphertext for [0, 1, 2, 3], whereas browser B has implemented an optimization that ~10% of the time produces ciphertext for [4, 1, 2, 3]. Users will find sites that break 10% of the time in browser B, and then browser game theory comes into play, causing browser B to drop their optimization. At that point we may as well have specced to always produce ciphertext for [0, 1, 2, 3]; the undefined behavior has acquired a de-facto required definition.

I am wary of the performance perils of excessive copying as well, but I don't think undefined behavior can survive long on the web.

Furthermore, it's worth pointing out that if browsers would get around to implementing copy-on-write for array buffers, then the copying behavior would be free in the usual case (where the JS programmer does not mutate the array buffer after giving it to web crypto). I have heard implementers say COW is hard and not likely to get done any time soon, but I haven't heard them say that it's impossible or incompatible with the design of typed arrays.
Comment 27 Ryan Sleevi 2014-06-18 02:24:43 UTC
(In reply to Domenic Denicola from comment #26)
> Furthermore, it's worth pointing out that if browsers would get around to
> implementing copy-on-write for array buffers, then the copying behavior
> would be free in the usual case (where the JS programmer does not mutate the
> array buffer after giving it to web crypto). I have heard implementers say
> COW is hard and not likely to get done any time soon, but I haven't heard
> them say that it's impossible or incompatible with the design of typed
> arrays.


http://lists.w3.org/Archives/Public/public-webcrypto/2013Oct/0043.html

"It would not be practical to implement copy-on-write for array buffers"
Comment 28 Boris Zbarsky 2014-06-18 03:52:38 UTC
> Every good ES programmer understands these things.

My (admittedly biased, since it's based on bug reports) sample of web site code suggests that many web site programmers are not good ES programmers, unfortunately.  :(

> "It would not be practical to implement copy-on-write for array buffers"

For large enough ones, if they're page-aligned (as large allocations tend to be), it might in fact be doable, albeit complex, with mprotect.  Whether doing an mprotect is faster than a memcpy is an interesting question, of course....  would need to measure.  For small (sub-page) buffers, copying is correspondingly cheaper, of course.
Comment 29 Allen Wirfs-Brock 2014-06-18 16:32:52 UTC
(In reply to Domenic Denicola from comment #26)
> The question is, could interop survive with the undefined behavior Allen
> suggests? I am not so sure.

I'm concerned about that, too, which is why we go to such effort to specify everything in such detail in the ES spec. However, I do think we are already allowing some non-determinism into the specifications as soon as we have any async behavior. It may be that the case we are talking about falls under that shadow.

But perhaps there is another way to specify this case deterministically that doesn't require copying.

The specific problem that has been described here is the possibility that a caller might change the content of the data buffer after the promise-returning call but before any actual processing of the buffer contents. If it isn't specified at what point the buffer content is used by an asynchronous callee, the result is not deterministically specified.  Copying before returning the promise is the proposed way to make things deterministic; it moves data access to the earliest possible time.  The alternative would be to defer the data access until after the caller has had its opportunities to modify the buffer.  This might be done by specifying that the promise-returning function only captures its raw argument values (the object references, no copying) before immediately returning the promise.  In addition, the specification of the function would say that all processing of the argument data is deferred until after the current turn completes. This could be implemented by the promise-returning function scheduling a microtask to do the actual work.

This seems consistent with the ES concurrency model where there are no data race conditions within a turn, but where anything could happen between turns.

This is not a perfect solution (what if the buffer is modified in a subsequent turn?); but copying everything every time isn't a perfect solution either.
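A minimal sketch of that deferral alternative (`processDeferred` and its `work` callback are hypothetical stand-ins for a real promise-returning API):

```javascript
// Capture the argument reference synchronously, but only read the
// buffer in a microtask, after the caller's current turn completes.
function processDeferred(view, work) {
  return Promise.resolve().then(() => work(Array.from(view)));
}

const a = new Uint8Array([0, 1, 2, 3]);
const p = processDeferred(a, bytes => bytes.join(","));
a[0] = 4; // same-turn mutation: visible by design under this model
p.then(result => console.log(result)); // logs "4,1,2,3"
```

Under this model the access point is deterministic (the end of the caller's turn), so the same-turn write to `a[0]` is always observed; only mutations in later turns remain racy.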

> 
> I am wary of the performance perils of excessive copying as well, but I
> don't think undefined behavior can survive long on the web.
> 
> Furthermore, it's worth pointing out that if browsers would get around to
> implementing copy-on-write for array buffers, then the copying behavior
> would be free in the usual case 

Copy-on-write is only practical for primitive things like ArrayBuffers.  It doesn't address the general problem that arises when sharing complex object graphs among async tasks.

Presumably, if we are talking about WebIDL-level support, we will have situations where we are dealing with object graphs and not just buffers.

Which means it's time for my annual mention of http://www.wirfs-brock.com/allen/posts/379 : which Web APIs need to be treated as OS-like stable, reliable, kernel interfaces, and which can be treated as unreliable, evolvable, framework interfaces? This is where some platform architecture thinking would be helpful.
Comment 30 Allen Wirfs-Brock 2014-06-18 16:59:53 UTC
Of course, an even simpler way around this problem would be to introduce the concept of marking an ArrayBuffer as immutable. Then APIs like these could simply require that any passed buffers be marked as immutable.

Boris, this must have been considered in the past?  Why don't we have immutable ArrayBuffers?

Allen
Comment 31 Boris Zbarsky 2014-06-18 17:02:08 UTC
Marking an ArrayBuffer immutable because one view on it is passed to an API is a bit weird: it makes all the other views read-only as well, which is a pretty nonlocal effect....
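To illustrate the aliasing that makes this nonlocal (plain ES, no hypothetical API needed):

```javascript
// Two views over one ArrayBuffer share the same underlying bytes,
// so any per-buffer flag necessarily affects every view.
const shared = new ArrayBuffer(8);
const left = new Uint8Array(shared, 0, 4);
const right = new Uint8Array(shared, 4, 4);

left[0] = 1;
right[0] = 2; // writes byte 4 of the same buffer

const whole = new Uint8Array(shared);
console.log(whole[0], whole[4]); // 1 2
```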

Past that, I don't know of any reasons it couldn't be done offhand, though the JIT folks may have concerns.
Comment 32 Allen Wirfs-Brock 2014-06-18 17:14:42 UTC
(In reply to Boris Zbarsky from comment #31)
> Marking an ArrayBuffer immutable because one view on it is passed to an API
> is a bit weird: it makes all the other views read-only as well, which is a
> pretty nonlocal effect....

No, it's different from neutering in that regard.  And certainly the most common situation is a single view over the entire buffer.

The advantage is that it places the burden of copying on the specific callers that are dealing with shared buffers, rather than having the callee unilaterally copying everything.

It also chains nicely, avoiding multiple layers of redundant copying.


> 
> Past that, I don't know of any reasons it couldn't be done offhand, though
> the JIT folks may have concerns.

It shouldn't be any worse than neutering, and the same write guard could probably be used for both.
Comment 33 Ryan Sleevi 2014-06-18 17:23:16 UTC
(In reply to Allen Wirfs-Brock from comment #32)
> The advantage, is that  it places the burden of copying on specific callers
> that are dealing with shared buffers rather than having the callee
> unilaterally copying everything.

Which largely means it only works for ArrayBuffer, not ArrayBufferView, as the act of passing a ABV to an API call shouldn't necessarily cause the underlying ArrayBuffer to be frozen/neutered/etc.
Comment 34 Domenic Denicola 2014-06-18 17:44:22 UTC
I think you guys are talking past each other a bit. Allen is envisioning an API that allows you to "mark as immutable" array buffers, and suggesting that these methods should throw whenever given array buffers that aren't marked as such. Whereas Ryan and Boris have taken him to mean that passing a (mutable) array buffer to these methods would suddenly make it immutable.

So I believe Allen is suggesting something like

var ab = new ArrayBuffer(1024);
ArrayBuffer.makeImmutable(ab);
crypto.subtle.encrypt(ab); // works

var ab2 = new ArrayBuffer(1024);
crypto.subtle.encrypt(ab2); // rejects with TypeError saying it needs to be immutable.
Comment 35 Allen Wirfs-Brock 2014-06-18 17:54:17 UTC
(In reply to Domenic Denicola from comment #34)

> 
> So I believe Allen is suggesting something like
> 
> var ab = new ArrayBuffer(1024);
> ArrayBuffer.makeImmutable(ab);
> crypto.subtle.encrypt(ab); // works
> 
> var ab2 = new ArrayBuffer(1024);
> crypto.subtle.encrypt(ab2); // rejects with TypeError saying it needs to be
> immutable.

exactly, except that I would probably design the ArrayBuffer API such that the second line would be:
  ab.makeImmutable();
Comment 36 Jonas Sicking (Not reading bugmail) 2014-06-18 18:44:35 UTC
To make neutering happen you always have to explicitly pass the ArrayBuffer itself as one of the transferrable arguments to postMessage. Passing an ArrayBufferView is not allowed. This was done to avoid the kind of surprising behavior that we're talking about here.
Comment 37 Ryan Sleevi 2014-06-18 19:14:43 UTC
(In reply to Domenic Denicola from comment #34)
> I think you guys are talking past each other a bit. Allen is envisioning an
> API that allows you to "mark as immutable" array buffers, and suggesting
> that these methods should throw whenever given array buffers that aren't
> marked as such. 

No, I understood this.

This has two surprising consequences:
1) It prevents ArrayBufferViews over slices of data from being used with such APIs - that is, you have to mark the underlying buffer as immutable, which means that if you want your use of an ABV to have no non-local effects, you're forcing the application developer to copy.

This can be seen in the neutering behaviour discussions, which is essentially the same thing.

2) It essentially forces copying into the application's responsibility, rather than having it be part of the API. This doesn't seem to do anyone any favours.

Almost certainly, the logical consequence of such an API is that you'd want to 'unmark' something as immutable. This approach (immutable, then mutable again) is what Web Audio tried to do with "temporarily neuter". If you do introduce a markMutable, then what you've really done is introduce a reader/writer lock on ArrayBuffers. Is that really a good API choice? I honestly don't know, but I feel like probably not.



I'm probably biased, because I'm approaching this from the cryptographic operations. Consider something like PBKDF2 or HKDF, which are algorithms that compose on existing algorithms (eg: HMAC) in a somewhat iterative way to 'expand' key material into a suitable derived key.

HKDF is defined somewhat recursively:
   T(0) = empty string (zero length)
   T(1) = HMAC-Hash(PRK, T(0) | info | 0x01)
   T(2) = HMAC-Hash(PRK, T(1) | info | 0x02)
   T(3) = HMAC-Hash(PRK, T(2) | info | 0x03)

The resulting key is the concatenation of T(1), T(2), ... T(N), continuing until you have enough data for the desired length.

If I were writing this in C/C++, I'd typically start by allocating T (aligned to HMAC-Hash length), and an "intermediate" buffer.

Then I'd write a loop

for (i = 1; i <= N; ++i) {
  if (i == 1) {
    // set up intermediate as info | 0x01
    HMAC(PRK, intermediate, intermediateLength, &T[0]);
  } else {
    memcpy(&intermediate[0], &T[(i - 2) * hashLength], hashLength);
    intermediate[intermediateLength - 1] = i;
    HMAC(PRK, intermediate, intermediateLength, &T[(i - 1) * hashLength]);
  }
}

While likely bugged in many ways, the net effect is that my code has performed two allocations - one for the resulting buffer, and one for the intermediate result.

If I were to implement this in ES, with WebCrypto, I'd likely do something similar. I'd have a Uint8Array called T, which had my targetted results. I'd have a Uint8Array, intermediate, that contains my intermediate data.

Now, WebCrypto doesn't do emplacement-writes, it always returns a fresh ArrayBuffer, so I'd do a Promise to get T(1), then copy that into T[0...hashLength]. Then I'd do a Promise to get T(2), and copy that into [hashLength...hashLength*2], then a Promise for T(3), copy that into [hashLength*2...hashLength*3]

As Web Crypto is currently specified, I can use "intermediate" as my scratch, and keep mutating it (updating the count and copying the result from T(i-1) in). Web Crypto will take responsibility for 'snapshotting' intermediate for each state.
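That pattern might look roughly like this in ES (a sketch; `hmac` and the helper name `hkdfExpand` are made up here, with `hmac` standing in for a Promise-returning HMAC such as a crypto.subtle.sign("HMAC", ...) call):

```javascript
// HKDF-expand-style loop reusing one scratch buffer, relying on the
// callee snapshotting its input before each asynchronous operation.
// `hmac(bytes)` must resolve to a Uint8Array of at least hashLength bytes.
async function hkdfExpand(hmac, info, hashLength, n) {
  const T = new Uint8Array(n * hashLength);
  // Reusable scratch: T(i-1) | info | counter.
  const scratch = new Uint8Array(hashLength + info.length + 1);
  scratch.set(info, hashLength);
  for (let i = 1; i <= n; ++i) {
    // T(0) is empty, so the first round skips the leading hash bytes.
    const input = i === 1 ? scratch.subarray(hashLength) : scratch;
    scratch[scratch.length - 1] = i; // counter byte
    const block = new Uint8Array(await hmac(input));
    T.set(block.subarray(0, hashLength), (i - 1) * hashLength);
    scratch.set(block.subarray(0, hashLength), 0); // feed T(i) into next round
  }
  return T;
}
```

The scratch buffer is mutated between calls; the API's copy-on-call semantics are what make that reuse safe.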

If we required .markImmutable(), then I, as the application author, would be forced to create a new copy of the data each time, as part of my algorithm.

That 'smells' to me for two reasons:
1) I assume (perhaps incorrectly) that by keeping the copying on the UA side of things, I'm allowing greater flexibility for memory management strategies
2) As an author, that's more boilerplate for me to write in order for things to "Just Work"
Comment 38 Allen Wirfs-Brock 2014-06-19 00:44:40 UTC
(In reply to Ryan Sleevi from comment #37)

Thanks, this is enlightening.

As an overall comment, I think your starting premise is wrong. ES programmers don't necessarily think about this sort of processing in the same way that a C/C++ programmer would.  In particular, they don't generally think about local memory management/buffer reuse like this. In this case I suspect that incrementally accumulating into a common buffer wouldn't even occur to many ES programmers. They would probably just do concatenations along the way and expect intermediate results that are no longer needed to get garbage collected.

Regarding your approach, I suspect the extra copying (and related memory management overhead) you are proposing on calls into your subsystem would just about balance out the savings you get from using the accumulating buffer. Working with immutable independent buffers seems conceptually much simpler and more along the lines of what a good ES programmer would expect.

This sort of ES based thinking really should be a factor in your API designs. Don't assume that the ES developer will approach problems just like a C++ developer.
Comment 39 Ryan Sleevi 2014-06-19 01:07:12 UTC
(In reply to Allen Wirfs-Brock from comment #38)
> This sort of ES based thinking really should be a factor in your API
> designs. Don't assume that the ES developer will approach problems just like
> a C++ developer.

So, I think we may be a bit off into the woods about what the "normal" ES developer would do. That is, ES developers come from all sorts of backgrounds. They may not approach it like C/C++ - they may come from a C# or Java background, knowing about the immutability of strings and the importance of StringBuilder, and thus be similarly paranoid. They may come from a gaming background, and know how gc stalls can kill you, and thus do everything they can to reduce gc.

As it relates to this specific issue: The makeImmutable() proposal has two key downsides, in my mind:

1) It doesn't have a good solution for ArrayBufferView, short of copying into a newly-minted immutable array
2) It forces authors (such as the example I gave) to explicitly copy, even if they did *want* to optimize.

In the priority of constituencies, it feels like makeImmutable() is probably the wrong approach; it forces this overhead onto callers. As to whether (ArrayBufferView) is a valid input to most APIs - I would just note that it's seemingly in use by most APIs that take binary data as inputs (WebGL, Web Audio, there's an open bug for MSE, Web Crypto)

Now, I'm biased because I'm thinking in the context of cryptographic operations and Web Crypto, and still strongly believe that "don't do that" is a bad answer. From the sampling of the other APIs, though, it does seem that for complex data processing (eg: the APIs that do tend to take ArrayBuffers), if they're going to behave asynchronously (via Promises), they copy.

Indeed, even the synchronous APIs, such as WebGL, seem to result in copying in most implementations. That is, because WebGL calls themselves tend to have synchronous overheads, implementations are deferring the operations (which have void returns, and thus can return immediately), and as long as they ensure consistency in the ordering of WebGL operations, this is perfectly legal!

I agree, there are things unaddressed - complex object trees that are preserved as platform objects (IDL annotation 'object' or 'any'). Frankly, I'm not too bothered by this - we don't need to boil the ocean. We should come up with a good set of prose, though, for the non-simple data types that might be mutated. Strings are immutable, as are Numbers, thus WebIDL is good and fine. ArrayBuffers are mutable, hence it's good to include in WebIDL how authors MAY handle them.
Comment 40 Allen Wirfs-Brock 2014-06-19 20:24:30 UTC
(In reply to Ryan Sleevi from comment #39)

> 
> So, I think we may be a bit off into the woods about what the "normal" ES
> developer would do. ...
> 

We should be trying to understand, encourage, and design APIs to a native JS style rather than picking and choosing among styles derived from other languages. 

> As it relates to this specific issue: The makeImmutable() proposal has two
> key downsides, in my mind:
> 
> 1) It doesn't have a good solution for ArrayBufferView, short of copying
> into a newly-minted immutable array
> 2) It forces authors (such as the example I gave) to explicitly copy, even
> if they did *want* to optimize.

It's an excellent solution for situations where there is a 1:1 correspondence between ArrayBufferViews and ArrayBuffers. My hypothesis is that this will be the most common situation and a very idiomatic JS approach. Particularly if an overall functional style using immutable data is being used.  

> 
> In the priority of constituencies, it feels like makeImmutable() is probably
> the wrong approach; it forces this overhead onto callers.

Which, to me, seems like exactly where it belongs. Your API design always copies, even if the data is already stable and/or logically immutable.  It's the author of the calling code who understands which of the various possible data management policies they are using and hence whether or not copying is needed. You, on the other hand, have to pessimistically assume that copying is always needed.

The ability for an ArrayBuffer to be marked as immutable would allow your API layer to recognize when copying is not needed. 


> As to whether
> (ArrayBufferView) is a valid input to most APIs - I would just note that
> it's seemingly in use by most APIs that take binary data as inputs (Web GL,
> Web Audio, there's an open bug for MSE, Web Crypto)
> 

I don't think any of the examples you list are particularly good exemplars of the natural, idiomatic JS style we should be looking for in new APIs.

> 
> I agree, there are things unaddressed - complex object trees that are
> preserved as platform objects (IDL annotation 'object' or 'any'). Frankly,
> I'm not too bothered by this - we don't need to boil the ocean. We should
> come with a good set of prose though for the non-simple data type that might
> be mutated. Strings are immutable, as are Numbers, thus WebIDL is good and
> fine. ArrayBuffers are mutable, hence it's good to include in WebIDL how
> authors MAY handle them.

Sure, and let's also allow for immutable ArrayBuffers so we can recognize and optimize for that common case.
Comment 41 Boris Zbarsky 2014-06-19 20:27:33 UTC
Note that the common implementation of crypto API in particular will actually perform work on the data on some background thread, or in a separate process entirely.  So I suspect in practice the data will get copied anyway, since keeping alive the original object is quite nontrivial in such a situation.
Comment 42 Allen Wirfs-Brock 2014-06-19 21:15:45 UTC
(In reply to Boris Zbarsky from comment #41)

Sure, but this bug is about possible enhanced WebIDL support for ArrayBuffer arguments. Not every API that has an ArrayBuffer parameter is going to do the sort of asynchronous processing that requires copying.

Unobservable copying at the implementation level is presumably always allowed. My recent point is more about whether possibly observable copying is always required, even in situations where the caller knows it isn't necessary.
Comment 43 Anne 2014-06-20 06:47:42 UTC
Audio, video, crypto, fetching (XMLHttpRequest, fetch()), all potentially have the separate process semantics. The only case I can think of that might not require copying is text encoding. ImageData specifically does not do copying but requires upfront work (only takes Uint8ClampedArray).

At the moment everything that says sequence<> implies a copy too, and that is not always strictly required, I suppose.

It seems like IDL should provide the tools to do both and we should just be careful when designing/reviewing APIs as to what is actually required.
Comment 44 Domenic Denicola 2014-06-20 13:24:49 UTC
Getting down to the business of solving this bug:

As I see it we have two choices. One, be consistent with sequence<>:

func(sequence<any> x); // copies the iterable into an array
func(ArrayBuffer x); // copies the ArrayBuffer into another ArrayBuffer

func([NoCopy] sequence<any> x); // no copying; spec author must be cautious
func([NoCopy] ArrayBuffer x); // no copying; spec author must be cautious

Or two, be inconsistent, but efficient-by-default:

func(sequence<any> x); // copies
func(ArrayBuffer x); // no copying

func([NoCopy] sequence<any> x); // no copying
func([Copy] ArrayBuffer x); // copies

The third alternative is to try to retroactively change the meaning of sequence<> to require a [Copy] when it must be copied, but I have a hard time seeing that as feasible.

In all cases, we will want to write up a good explanation and guidance document explaining the various concerns; in particular we'd need to give concrete guidelines on how to use non-copied objects without causing undefined behavior or other horrible things.
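[Editorial note: the observable difference the [Copy]/[NoCopy] annotations would surface can be sketched in plain JS. withCopy and withReference are illustrative stand-ins for an API defined each way.]

```javascript
// withCopy snapshots the bytes at call time; withReference keeps the
// caller's buffer, so later mutations are observable to the API.
function withCopy(buffer) {
  const snapshot = buffer.slice(0);
  return () => new Uint8Array(snapshot)[0];
}

function withReference(buffer) {
  return () => new Uint8Array(buffer)[0];
}

const bytes = new Uint8Array([42]);
const readCopied = withCopy(bytes.buffer);
const readShared = withReference(bytes.buffer);

bytes[0] = 99; // caller mutates after the call

console.log(readCopied()); // 42: the copy is isolated
console.log(readShared()); // 99: the reference sees the mutation
```

This is exactly the data-race hazard mentioned later in the thread: with reference semantics, the API's view of the bytes can change under it.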
Comment 45 Cameron McCormack 2014-10-03 06:19:23 UTC
(In reply to Domenic Denicola from comment #44)
> Getting down to the business of solving this bug:
> 
> As I see it we have two choices. One, be consistent with sequence<>:
> 
> func(sequence<any> x); // copies the iterable into an array
> func(ArrayBuffer x); // copies the ArrayBuffer into another ArrayBuffer
> 
> func([NoCopy] sequence<any> x); // no copying; spec author must be cautious
> func([NoCopy] ArrayBuffer x); // no copying; spec author must be cautious
> 
> Or two, be inconsistent, but efficient-by-default:
> 
> func(sequence<any> x); // copies
> func(ArrayBuffer x); // no copying
> 
> func([NoCopy] sequence<any> x); // no copying
> func([Copy] ArrayBuffer x); // copies
> 
> The third alternative is to try to retroactively change the meaning of
> sequence<> to require a [Copy] when it must be copied, but I have a hard
> time seeing that as feasible.
> 
> In all cases, we will want to write up a good explanation and guidance
> document explaining the various concerns; in particular we'd need to give
> concrete guidelines on how to use non-copied objects without causing
> undefined behavior or other horrible things.

I don't really like "[NoCopy] sequence<>".  If we want to allow references to Array objects then we should add a new type "Array" which means that.

Do we need to solve the copying-or-not in the IDL syntax?

Here's my proposal:

* stick with sequence<> meaning "generate a new copy of the iterable thing passed in"

* introduce types named ArrayBuffer, DataView, and all of the typed arrays Int8Array, Uint8Array, etc., which all mean "reference to an instance of that class (or a subclass)"

* pre-define

    typedef (Int8Array or Uint8Array or ... or DataView) ArrayBufferView;

* pre-define

    typedef (ArrayBufferView or ArrayBuffer) BufferSource;

  and make ArrayBuffer distinguishable from all the ArrayBufferView types,
  using [[TypedArrayName]], [[ViewedArrayBuffer]] and [[ArrayBufferData]]
  checks to implement the distinguishability needed in the overload resolution
  algorithm and union type conversion algorithm

* suggest that spec writers use BufferSource as the type when they want to
  receive a chunk of data

* suggest that spec writers return an ArrayBuffer object when generating a
  chunk of data (though I'm not convinced about this -- what about for cases
  where we know the author will be interested in inspecting the bytes, not
  just passing the buffer around as an opaque thing?)

* get rid of the ArrayBufferData typedef I recently added in lieu of
  BufferSource

* add a term that means "get a reference to the bytes held by the
  BufferSource" and a term that means "get a copy of the bytes held by the
  BufferSource", both of which work on any of the types in that union;
  the spec author then must be explicit about whether a copy is made
  before getting easy prose access to the bytes (easy access being something a
  bit more high level than generating array index property names and calling
  [[Get]] on the object that came in.)

If the spec author wants to allow plain JS Arrays to be passed in to provide the data, they can include sequence<octet> themselves.
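[Editorial note: a JS approximation of the proposed "get a copy of the bytes held by the BufferSource" operation, normalizing an ArrayBuffer or any ArrayBufferView (typed array or DataView) to a fresh Uint8Array. The name copyBytesFrom is illustrative, not part of the proposal.]

```javascript
function copyBytesFrom(source) {
  if (source instanceof ArrayBuffer) {
    return new Uint8Array(source.slice(0));
  }
  if (ArrayBuffer.isView(source)) {
    // Respect the view's window into its buffer, then copy that range.
    return new Uint8Array(
      source.buffer.slice(source.byteOffset,
                          source.byteOffset + source.byteLength)
    );
  }
  throw new TypeError("Expected a BufferSource");
}

const buf = new ArrayBuffer(4);
new Uint8Array(buf).set([1, 2, 3, 4]);

console.log(copyBytesFrom(buf));                       // bytes 1,2,3,4
console.log(copyBytesFrom(new Float32Array(buf)));     // same underlying bytes
console.log(copyBytesFrom(new Uint8Array(buf, 1, 2))); // bytes 2,3
```

Note how the Float32Array case reflects the earlier point that the view type is ignored and only the underlying bytes matter.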
Comment 46 Anne 2014-10-03 13:03:47 UTC
This plan looks good to me. Encoding Standard is one place I know where we return Uint8Array. HTML has Uint8ClampedArray for <canvas>. XMLHttpRequest and fetch() return ArrayBuffer.
Comment 47 Domenic Denicola 2014-10-03 17:13:42 UTC
That plan also sounds good to me.

You'll need some very strong language so that spec editors don't overuse "get a reference to the bytes held by the BufferSource" since it can be a source of data races. They at least need to understand the consequences.

As for return types, I strongly believe ArrayBuffer is correct where possible. It just makes more sense to give people the lower-level buffer primitive and let them access it however they want. (And a DataView might be more correct than Uint8Array for people who want byte access, BTW.) Kind of like in C# returning IEnumerable<T> instead of ArrayList<T>.
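[Editorial note: a small illustration of the return-type argument above. generateData is a hypothetical stand-in for any spec operation that produces a chunk of bytes.]

```javascript
// Returning the lower-level ArrayBuffer lets each caller pick the view
// that fits its access pattern, instead of the API imposing one.
function generateData() {
  const buffer = new ArrayBuffer(4);
  new Uint8Array(buffer).set([0, 0, 1, 44]);
  return buffer; // no view imposed on the caller
}

const result = generateData();

// One caller wants raw bytes...
const bytes = new Uint8Array(result);

// ...another wants structured multi-byte reads (the DataView point above).
const view = new DataView(result);
console.log(view.getUint32(0)); // 300: big-endian read of [0, 0, 1, 44]
```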
Comment 49 David Dorwin 2014-10-10 23:09:35 UTC
ArrayBufferData still appears in five places, including the new typedef for BufferSource. Was it supposed to be removed as suggested in comment 45?
Comment 50 David Dorwin 2014-10-20 16:58:54 UTC
(In reply to David Dorwin from comment #49)
> ArrayBufferData still appears in five places, including the new typedef for
> BufferSource. Was it supposed to be removed as suggested in comment 45?

Cameron, is this correct?
Comment 51 Anne 2014-10-20 17:15:55 UTC
Cameron is away this month. I filed bug 27110 for the one case where it should say ArrayBufferView. The other cases appear correct to me as they refer to internal slots and ES6 does define an [[ArrayBufferData]] internal slot.