10:20:48 RRSAgent has joined #zerocopy
10:20:48 logging to https://www.w3.org/2020/10/26-zerocopy-irc
10:20:53 Zakim, start meeting
10:20:53 RRSAgent, make logs Public
10:20:54 Meeting: Memory copies & zero-copy operations on the Web
10:22:57 annevk has changed the topic to: https://www.w3.org/2020/10/TPAC/breakout-schedule.html#zerocopy
12:40:15 Zakim has left #zerocopy
12:46:34 Zakim has joined #zerocopy
13:23:49 tidoust has joined #zerocopy
13:56:07 scribe+
13:57:08 -> https://www.w3.org/2020/Talks/TPAC/unconference/zerocopy.pdf Slides
13:57:11 Geunhyung_Kim has joined #zerocopy
13:57:15 Chair: tidoust
13:58:26 Present+ Elad_Alan, Francois_Daoust, Harald_Alvestrand, Myles, Youenn, Yutaka_Hirano, Dan_Sanders, Dominique_Hazael-Massieux
13:58:39 dom has changed the topic to: https://www.w3.org/2020/10/TPAC/breakout-schedule.html#zerocopy Zoomid 814 3680 6430
13:58:47 takio has joined #zerocopy
13:59:44 Present+ Tzviya_Siegman, Chai_Chaoweeraprasit, Anssi_Kostiainen
13:59:54 Chai has joined #zerocopy
13:59:59 Present+ Mehmet_Oguz_Derin
14:00:08 Present+ Adam_Rice
14:01:27 Present+ Ben_Smith, Anne_van_Kesteren
14:01:39 present+ Geunhyung_Kim
14:01:57 Present+ Jan-Ivar_Bruaroey
14:02:04 Present+ Daniel_Ehrenberg
14:02:11 Present+ Ken_Russell
14:02:19 Present+ Anita_Chen
14:02:34 Present+ Carine_Bournez, Ben_Wagner
14:02:57 Present+ Chris_Cunningham, Florent_Castelli
14:02:59 caribou has joined #zerocopy
14:03:03 Present+ Yves_Lafon
14:03:09 chcunningham has joined #zerocopy
14:03:12 present+
14:04:33 anssik has joined #zerocopy
14:05:25 Present+ Austin_Eng
14:05:29 Present+ Anssi_Kostiainen
14:05:42 Francois: at the origin of this breakout session, there was a machine learning workshop organized by Dom in September
14:06:00 ... the topic of efficiency issues with real-time media processing was raised by several speakers
14:06:17 ... Bernard Aboba in particular mentioned the cost of memory copies in that context
14:06:19 Present+ Takio_Yamaoka
14:06:23 jib has joined #zerocopy
14:06:46 ... Likewise, Tero mentioned that in the context of music processing with ML, moving bytes around takes as much processing time as doing the actual processing
14:06:52 ... This is not an ML-specific issue
14:07:10 ... the GPU on the Web group had similar conversations on the topic last week
14:07:12 Yves has joined #zerocopy
14:07:14 ... this issue spans multiple groups
14:07:23 ... as a result, it may not have a clear owner
14:07:30 ... which is why we're convening this conversation
14:07:38 ... I would like to start by introducing the situation as I understand it
14:08:22 ... then I want us to discuss, with a goal of clarifying whether everything is already under control or whether we instead need some coordination effort somewhere
14:08:42 ... to reason about this, I thought I would start by trying to represent the different components involved in memory copies
14:08:52 ... a very rough and incomplete, possibly wrong visualization
14:09:16 ... we can split memory between CPU & GPU (generally physically different)
14:09:31 ... which means that data needs to go from one to the other depending on which unit does the processing
14:10:15 ... if you add the browser to this landscape, it manages JS & WASM on the CPU, while the GPU is under the control of WebGL / WebGPU
14:10:27 ... in JS, there are various threads (incl via workers)
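A minimal sketch of the worker boundary just described, using the transferable mechanism that comes up again below: postMessage() normally structured-clones an ArrayBuffer, i.e. copies it, whereas listing the buffer as a transferable moves ownership to the worker without copying the bytes. The worker.js script and the frame-sized buffer are illustrative assumptions.

    const worker = new Worker('worker.js');          // hypothetical worker script
    const frame = new ArrayBuffer(1920 * 1080 * 4);  // e.g. one uncompressed RGBA frame
    // ... fill the buffer ...
    worker.postMessage({ frame }, [frame]);          // transferred: ownership moves, no byte copy
    console.log(frame.byteLength);                   // 0 - the buffer is now detached on this side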
14:10:53 ... and then browsers will interact with various pieces of hardware and external devices, incl hardware encoders/decoders
14:11:04 Present+ Bernard_Aboba
14:11:18 mehmetoguzderin has joined #zerocopy
14:11:19 ... all of these blocks need to communicate with the browser as a mediator
14:11:29 ... which means memory copies as soon as one of the boundaries gets crossed
14:11:48 ... in a non-optimized version at least
14:12:20 ... Memory copies are needed to transfer across boundaries (some physical, some not, but possibly linked to security checks)
14:12:36 MikeSmith has joined #zerocopy
14:12:40 ... copies may be needed due to differences in structure (e.g. buffers in JS vs WASM, RGBA vs YUV)
14:12:52 Present+ Guido
14:12:57 Present+ Keith_Miller
14:13:06 kim_wooglae has joined #zerocopy
14:13:09 Present+ Paul_Adenot
14:13:15 Present+ Riju
14:13:24 Present+ Stefan_Holmer
14:13:29 Present+ Shuangting Yao
14:13:30 Present+ kim_wooglae
14:13:52 Present+ MikeSmith
14:14:12 Francois: sometimes, there is a need for a copy to preserve a behavior invariant
14:14:22 ... sometimes, a copy leads to a better API design
14:14:22 RRSAgent, make minutes
14:14:22 I have made the request to generate https://www.w3.org/2020/10/26-zerocopy-minutes.html MikeSmith
14:14:31 ... sometimes, a copy doesn't matter from a performance perspective
14:14:57 ... What would it take to reduce memory copies? This would require enabling direct access in a given pipeline
14:15:07 ... e.g. allow full media processing in the GPU
14:15:08 RRSAgent, make logs public
14:15:39 ... there are already mechanisms in place to help: SharedArrayBuffer, transferable interfaces
14:16:17 ... and a number of opaque interfaces where bytes aren't exposed, to allow browsers to optimize memory handling (e.g. MediaStreamTrack, opaque frames in WebCodecs, nodes in WebAudio)
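As a companion to the transferable example above, a sketch of the SharedArrayBuffer mechanism Francois lists: both the page and the worker view the same memory, so handing it to the worker involves neither a copy nor a transfer. Cross-origin isolation (COOP/COEP headers) is required in current browsers; worker.js is again a hypothetical script.

    const sab = new SharedArrayBuffer(1024);
    const view = new Int32Array(sab);
    const worker = new Worker('worker.js');  // hypothetical worker script
    worker.postMessage(sab);                 // shared, not copied or detached
    Atomics.store(view, 0, 42);              // the write is visible to the worker without any copy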
14:16:54 Francois: giving the floor to Bernard to share an example where memory copies show up
14:17:23 Bernard: the use case I wanted to highlight is the gallery view that has become popular in teleconferencing, esp in the context of education
14:17:40 ... native apps go up to 7x7 in gallery views, whereas web apps are limited to 4x4
14:17:59 ... the bandwidth is not the issue - the bottleneck is in the receive -> display path
14:18:20 ... implementing this natively, we've been able to use a full-GPU processing pipeline from reception onwards
14:18:31 ... this enables a 7x7 gallery at 30FPS
14:19:14 ... each copy added to the pipeline reduces the gallery size - 1 copy → 5x5, 2 copies → 4x4, 3 copies (which is what we have with WebTransport today) → 3x3
14:19:32 ... here the memory operation has a direct impact on performance
14:19:40 ... this involves no ML processing
14:19:52 ... (background blur would happen on the sending side, not receiving)
14:19:58 RRSAgent, make logs public
14:20:16 Francois: this illustrates a very practical impact of memory copies
14:20:27 ... There are various discussions & proposals linked to this topic across various groups
14:20:46 ... linked to Streams, WebTransport, WebAssembly, WebGPU, WebCodecs, WebRTC
14:20:54 ... some started a long time ago - may be worth reviewing
14:21:31 ... there may be other ideas to consider - e.g. allow fetching directly into GPU memory? Or allow declaring a media pipeline to enable memory optimization by the UA
14:22:00 ... this concludes my presentation - I think it would be useful to identify scenarios and figure out whether they're being addressed or not
14:22:21 ... it may be that the most interesting scenarios only cross one boundary and can be addressed on a one-to-one group basis
14:22:28 ... or maybe these are just implementation considerations
14:22:50 ... or maybe we need more coordination - which the participation here seems to suggest
14:23:03 q?
14:23:13 q+
14:23:17 padenot_ has joined #zerocopy
14:23:24 present+
14:23:29 present+
14:23:39 present+
14:23:39 ChrisC: thank you for that overview - I want to share a WebCodecs perspective here
14:23:56 ... the ML issues that were the seed of this conversation - WebCodecs will help
14:24:22 riju has joined #zerocopy
14:24:37 ... getting frames out of video elements and the need to go back through canvas, with RGB conversion... WebCodecs makes all of this better and allows skipping canvas altogether, as well as the RGB->YUV conversion
14:25:15 ... WebCodecs is facing hard problems with WASM copies - WebCodecs has a VideoFrame primitive
14:25:39 ... which allows copying the planes out into an ArrayBuffer, which for WASM means wrapping it in a heap and copying it
14:25:54 ... due to security concerns
14:26:15 ... if you can mutate the data, this creates risks for codecs that don't expect mutations
14:26:49 ... Could we have some interface for a buffer which we would read into but which, once done, cannot be modified?
14:27:06 ... We've been told this is very challenging in both the JS & WASM worlds
14:27:16 ... and so probably not coming immediately
14:27:32 RRSAgent, make minutes
14:27:32 I have made the request to generate https://www.w3.org/2020/10/26-zerocopy-minutes.html MikeSmith
14:28:20 KenRussell: (Chrome team) I confirm this is a very hard problem - it requires memory-protecting a segment of the WASM memory
14:28:41 ... this would require rearchitecting WASM engines; slicing up WASM memory to make it read-only is hard, and OS-dependent
14:28:57 ... ArrayBuffers are transferable, by design, to allow zero-copy across Web workers
14:29:13 ... the recycling path would have to be redesigned
14:29:57 BenSmith: in WASM, there is access to one type of memory, with several purposes: memory for the language being run (C++, Rust)
14:30:21 ... adding another memory to WASM would mean adding support in the underlying language
14:30:33 ... Accessing it directly through a static memory index is one way, but there are other ways
14:30:48 ... it's possible that you could access that memory as a dynamic memory object
14:30:53 ... not sure there is a way to do that
14:31:13 ... a third way is to take that one memory and use it as an address space which can then be transferred
14:31:18 ... but that has complexity as well
14:31:32 ... part of it is complexity of implementation, part of it is architecture
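A sketch of the single-copy path Chris describes, assuming the allocationSize() and copyTo() methods that the VideoFrame interface exposes in WebCodecs: the planes are copied once, directly into a view over WASM-visible memory, rather than into a separate ArrayBuffer that then has to be copied into the heap. The byteOffset and the surrounding WASM-side allocation are hypothetical.

    // frame: a WebCodecs VideoFrame; memory: the module's WebAssembly.Memory;
    // byteOffset: a hypothetical offset the WASM side has reserved for the pixels.
    async function copyFrameIntoWasm(frame, memory, byteOffset) {
      const size = frame.allocationSize();                     // bytes needed for all planes
      const dest = new Uint8Array(memory.buffer, byteOffset, size);
      const layout = await frame.copyTo(dest);                 // one copy, straight into WASM memory
      frame.close();                                           // release the underlying (possibly GPU) resource
      return layout;                                           // plane offsets/strides describing dest
    }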
14:33:03 Yutaka: the ArrayBuffer given to the buffer reader is allocated as a shared memory buffer, not local buffer memory - we need some specification that allows optimization (?)
14:33:21 Domenic has joined #zerocopy
14:34:17 Adam_Rice: (Google) I work on streams with Yutaka - if we support SharedArrayBuffer, it will be visible to developers, and therefore needs standardization work
14:34:23 ... but it's difficult to do safely
14:34:32 ... by safely I mean protecting the C++ code from data races
14:34:44 ... (the C++ browser code)
14:35:17 q+
14:35:18 Francois: how much of this is an implementation problem vs a specification problem?
14:35:22 q+
14:35:37 q- chcunningham
14:35:43 ack Chai
14:36:10 Chai: (WebNN API) I have a question wrt WebCodecs and its applicability to ML
14:36:33 ... from the ML perspective, esp on GPU, data is consumed through GPU buffers, not necessarily through textures
14:36:54 ... incl for historical reasons, with the different kinds of swizzling (?) patterns done in their own hardware
14:37:02 myles has joined #zerocopy
14:37:06 ... GPU buffers are the currency of ML data going into the compute engine
14:37:17 keith_miller has joined #zerocopy
14:37:18 ... I heard about WebCodecs implying that the decode process can be done into a texture in memory
14:37:24 youenn has joined #zerocopy
14:37:49 ... I'm wondering whether that can also be done into GPU buffers, because ML processing for video streams or frames can require many kinds of transforms
14:37:56 ... (color space, formats, ...)
14:38:02 ... ML typically uses normalized floats
14:38:12 ... if the conversion is not done the right way, it will cause many copies
14:38:42 ... also, depending on the destination, this can require more copies
14:38:55 ... e.g. for computer vision
14:39:14 ... What is the destination? How does it do it? What are the thoughts around producing the data in reusable forms for ML?
14:39:28 Dan_Sanders: (working on WebCodecs)
14:39:41 ... the current impl of WebCodecs in Chrome uses GPU buffers
14:39:47 ... we don't have a convenient way to expose them
14:39:57 ... we haven't figured out how to do that yet
14:40:11 ... the leading proposal is a texture-based approach, although I realize that's limiting
14:40:18 ... the relevant issue is linked from the slide
14:40:56 PaulAdenot: to reiterate things that were mentioned at previous TPACs and the games workshop, one key API pattern for playing nice with memory and real-time processing
14:41:02 baboba has joined #zerocopy
14:41:08 ... is the concept of memory ownership
14:41:28 ... in native code, if you look at multimedia frameworks, you see APIs where you pass memory in, which is then written into
14:41:51 ... not great ergonomics, but the most sensible way to do it with an unopinionated approach
14:42:02 ... it would be good to take a similar approach for the Web
14:42:08 Question: Is there a way for WebTransport to support receiving into a GPU buffer or sending from a GPU buffer? This would improve performance when WebTransport is used in concert with WebCodecs.
14:42:10 ... memory copies add up pretty quickly
14:42:35 ... for the audio part, it's not so much that we have big objects, but we have an extremely high number of them
14:42:43 ... so we touch memory very often, which also piles up
14:42:58 ... one thing is to have lower-level APIs without fancy syntactic sugar
14:43:19 q+
14:43:25 ... and carefully check where the memory is coming from, and whether it can use float32 / WASM buffers
14:43:28 ack padenot
14:43:32 ack jib
14:43:42 Jan-Ivar: (WebRTC, WebTransport) +1 to Paul
14:43:53 ... we've been talking about sources and sinks for memory copies
14:44:01 ... but we need to look at the full pipe chain
14:44:13 ... decode, modify, play - each of the nodes needs to access the memory
14:44:22 ... do you build the API around the memory optimization path?
14:44:44 ... or do you build a declarative pipe chain and leave it to the browser to optimize?
14:44:57 ... WebCodecs is not using streams, whereas WebTransport and WebRTC are
14:45:18 ... the streams spec has pipeTo, which allows processing to happen in-parallel-but-not-really
14:45:30 ... not clear whether it does allow for the clean API we would like
14:45:45 ... if WebCodecs doesn't participate in the declarative API, will the whole approach still work?
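A sketch of the texture-based direction Dan Sanders mentions, in the shape it later took in WebGPU: a decoded VideoFrame is wrapped as an external texture that shaders can sample directly, avoiding an explicit pixel readback. It assumes device is a configured GPUDevice, bindGroupLayout declares an externalTexture binding at index 0, and renderTile is a hypothetical helper that records the draw; a GPU-buffer equivalent of the kind Chai asks about is not covered by this path.

    function onDecodedFrame(frame) {                               // e.g. a VideoDecoder output callback
      const ext = device.importExternalTexture({ source: frame }); // sample the decoded frame directly
      const bindGroup = device.createBindGroup({
        layout: bindGroupLayout,
        entries: [{ binding: 0, resource: ext }],
      });
      renderTile(bindGroup);                                       // hypothetical draw helper
      frame.close();                                               // return the frame to the decoder promptly
    }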
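The pass-memory-in pattern Paul describes already exists in one corner of the platform: the Streams BYOB ("bring your own buffer") reader, where the application hands a buffer to the source and the source fills it, instead of the source allocating a fresh chunk each time. A sketch, assuming stream is a readable byte stream (e.g. a fetch() response body) and process is a hypothetical consumer of the filled bytes:

    async function readInto(stream) {
      const reader = stream.getReader({ mode: 'byob' });
      let view = new Uint8Array(new ArrayBuffer(64 * 1024));  // caller-owned storage
      for (;;) {
        const { value, done } = await reader.read(view);      // the source writes into our buffer
        if (done) break;
        process(value);                                       // value is a view over the filled bytes
        view = new Uint8Array(value.buffer);                  // read() detached our buffer; reuse the returned one
      }
    }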
14:46:12 Francois: so we've looked at the technical issues
14:46:30 ... I'm hearing we're confident there are scenarios where this needs to be addressed
14:46:45 q+
14:46:48 ... do we need specific coordination to make progress on this, or is it already happening on an ad-hoc basis?
14:46:52 q-
14:47:32 Paul: pair-wise / opportunistic approaches are what we've been doing so far, through overlap of participation
14:47:40 ... the solution might not always be the same
14:47:40 ack padenot_
14:47:53 ... but learning from what has been done in other groups has translated well in the past
14:48:11 ... there would be value in better API consistency, incl for ease of use for developers
14:48:28 ... more could be done
14:48:57 Ken: I've sat in on WebGPU-WebCodecs discussions - they indeed tend to happen pair-wise
14:49:25 ... the pattern of passing memory in - it remains to be seen if it applies to all the use cases we have
14:49:35 ... would it make sense to create a CG to host these cross-group discussions?
14:49:57 https://github.com/WICG/proposals is one possible place
14:50:22 Keith: is there some kind of new primitive - a JS object of some kind - around which we could coordinate?
14:50:40 ... may still be difficult
14:51:17 DanS: +1 - getting clarity from other groups beyond happenstance would be great
14:52:02 MikeSmith: one possible place is the URL I pasted on IRC - we have a repository under the WICG organization for proposals
14:52:16 ... one lightweight way would be to raise an issue there and use that as a coordination place
14:52:26 ... I don't know if that's the best fit, but that's one of the motivations for this repo
14:53:03 Francois: assuming this would work wrt "where", who would be willing to contribute?
14:53:49 +1
14:53:55 [ChrisC, Paul, Ken, DanSanders, Chai volunteer]
14:54:32 Ken: more discussion sounds useful - not sure we're at the stage where we can get to a single primitive
14:54:52 ... Austin has looked a lot into zero-copy into the GPU
14:55:22 ... a single discussion place sounds great, with roads toward new ideas / designs
14:55:38 Francois: I agree that I haven't heard a silver-bullet solution, but there is interest in exchanging on scenarios
14:56:20 Mike: what the WICG proposals repo expects is problem statements rather than solutions - so despite the name, this would be a good fit
14:58:09 Francois: summarizing: multiple needs, no easy solution, cross-group collaboration to share ideas and align API designs is needed
14:58:24 ... I'll follow up with some of you on how to move forward with this
14:58:28 ... thanks a lot for attending
14:58:55 RRSAgent, draft minutes v2
14:58:55 I have made the request to generate https://www.w3.org/2020/10/26-zerocopy-minutes.html dom
14:59:51 I will create a WICG/reducing-memory-copies repo
14:59:58 RRSAgent, draft minutes v2
14:59:58 I have made the request to generate https://www.w3.org/2020/10/26-zerocopy-minutes.html MikeSmith
15:08:30 Yves has left #zerocopy
15:42:52 keith_miller has joined #zerocopy
16:20:21 keith_m__ has joined #zerocopy
16:21:37 keith_miller has joined #zerocopy
17:17:05 padenot_ has joined #zerocopy
17:21:01 keith_miller has joined #zerocopy
17:21:53 dom, tidoust, is there a public link for the minutes of this breakout session already? Thanks!
17:22:19 padenot_, https://www.w3.org/2020/10/26-zerocopy-minutes.html
17:22:38 dom: thanks!
17:23:02 keith_m__ has joined #zerocopy
17:30:00 Zakim has left #zerocopy
18:21:07 caribou has left #zerocopy