10:20:48 RRSAgent has joined #zerocopy
10:20:48 logging to https://www.w3.org/2020/10/26-zerocopy-irc
10:20:53 Zakim, start meeting
10:20:53 RRSAgent, make logs Public
10:20:54 Meeting: Memory copies & zero-copy operations on the Web
10:22:57 annevk has changed the topic to: https://www.w3.org/2020/10/TPAC/breakout-schedule.html#zerocopy
12:40:15 Zakim has left #zerocopy
12:46:34 Zakim has joined #zerocopy
13:23:49 tidoust has joined #zerocopy
13:56:07 scribe+
13:57:08 -> https://www.w3.org/2020/Talks/TPAC/unconference/zerocopy.pdf Slides
13:57:11 Geunhyung_Kim has joined #zerocopy
13:57:15 Chair: tidoust
13:58:26 Present+ Elad_Alan, Francois_Daoust, Harald_Alvestrand, Myles, Youenn, Yutaka_Hirano, Dan_Sanders, Dominique_Hazael-Massieux
13:58:39 dom has changed the topic to: https://www.w3.org/2020/10/TPAC/breakout-schedule.html#zerocopy Zoomid 814 3680 6430
13:58:47 takio has joined #zerocopy
13:59:44 Present+ Tzviya_Siegman, Chai_Chaoweeraprasit, Anssi_Kostiainen
13:59:54 Chai has joined #zerocopy
13:59:59 Present+ Mehmet_Oguz_Derin
14:00:08 Present+ Adam_Rice
14:01:27 Present+ Ben_Smith, Anne_van_Kesteren
14:01:39 present+ Geunhyung_Kim
14:01:57 Present+ Jan-Ivar_Bruaroey
14:02:04 Present+ Daniel_Ehrenberg
14:02:11 Present+ Ken_Russell
14:02:19 Present+ Anita_Chen
14:02:34 Present+ Carine_Bournez, Ben_Wagner
14:02:57 Present+ Chris_Cunningham, Florent_Castelli
14:02:59 caribou has joined #zerocopy
14:03:03 Present+ Yves_Lafon
14:03:09 chcunningham has joined #zerocopy
14:03:12 present+
14:04:33 anssik has joined #zerocopy
14:05:25 Present+ Austin_Eng
14:05:29 Present+ Anssi_Kostiainen
14:05:42 Francois: at the origin of this breakout session, there was a machine learning workshop organized by Dom in September
14:06:00 ... the topic of efficiency issues with real-time media processing was raised by several speakers
14:06:17 ... Bernard Aboba in particular mentioned the cost of memory copies in that context
14:06:19 Present+ Takio_Yamaoka
14:06:23 jib has joined #zerocopy
14:06:46 ... Likewise, Tero mentioned that in the context of music processing with ML, moving bytes around takes as much processing time as doing the actual processing
14:06:52 ... This is not an ML-specific issue
14:07:10 ... the GPU on the Web group had similar conversations on the topic last week
14:07:12 Yves has joined #zerocopy
14:07:14 ... this issue spans multiple groups
14:07:23 ... as a result, it may not have a clear owner
14:07:30 ... which is why we're convening this conversation
14:07:38 ... I would like to start by introducing the situation as I understand it
14:08:22 ... then I want us to discuss, with a goal of clarifying whether everything is already under control or whether we instead need some coordination effort somewhere
14:08:42 ... to reason about this, I thought I would start by trying to represent the different components involved in memory copies
14:08:52 ... a very rough and incomplete, possibly wrong visualization
14:09:16 ... we can split memory between CPU & GPU (generally physically different)
14:09:31 ... which means that data needs to go from one to the other depending on which unit does the processing
14:10:15 ... if you add the browser to this landscape, it manages JS & WASM on the CPU, while the GPU is under the control of WebGL / WebGPU
14:10:27 ... in JS, there are various threads (incl via workers)
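A minimal sketch of the worker boundary just described, using the transferable mechanism that comes up again below: postMessage() normally structured-clones an ArrayBuffer, i.e. copies it, whereas listing the buffer as a transferable moves ownership to the worker without copying the bytes. The worker.js script and the frame-sized buffer are illustrative assumptions.

    const worker = new Worker('worker.js');          // hypothetical worker script
    const frame = new ArrayBuffer(1920 * 1080 * 4);  // e.g. one uncompressed RGBA frame
    // ... fill the buffer ...
    worker.postMessage({ frame }, [frame]);          // transferred: ownership moves, no byte copy
    console.log(frame.byteLength);                   // 0 - the buffer is now detached on this side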
14:10:53 ... and then browsers will interact with various pieces of hardware and external devices, incl hardware encoders/decoders
14:11:04 Present+ Bernard_Aboba
14:11:18 mehmetoguzderin has joined #zerocopy
14:11:19 ... all of these blocks need to communicate with the browser as a mediator
14:11:29 ... which means memory copies as soon as one of the boundaries gets crossed
14:11:48 ... in a non-optimized version at least
14:12:20 ... Memory copies are needed to transfer across boundaries (some physical, some not, but possibly linked to security checks)
14:12:36 MikeSmith has joined #zerocopy
14:12:40 ... copies may be needed due to differences in structure (e.g. buffers in JS vs WASM, RGBA vs YUV)
14:12:52 Present+ Guido
14:12:57 Present+ Keith_Miller
14:13:06 kim_wooglae has joined #zerocopy
14:13:09 Present+ Paul_Adenot
14:13:15 Present+ Riju
14:13:24 Present+ Stefan_Holmer
14:13:29 Present+ Shuangting Yao
14:13:30 Present+ kim_wooglae
14:13:52 Present+ MikeSmith
14:14:12 Francois: sometimes, there is a need for a copy to preserve a behavior invariant
14:14:22 ... sometimes, a copy leads to a better API design
14:14:22 RRSAgent, make minutes
14:14:22 I have made the request to generate https://www.w3.org/2020/10/26-zerocopy-minutes.html MikeSmith
14:14:31 ... sometimes, a copy doesn't matter from a performance perspective
14:14:57 ... What would it take to reduce memory copies? This would require enabling direct access in a given pipeline
14:15:07 ... e.g. allow full media processing in the GPU
14:15:08 RRSAgent, make logs public
14:15:39 ... there are already mechanisms in place to help: SharedArrayBuffer, transferable interfaces
14:16:17 ... and a number of opaque interfaces where bytes aren't exposed, to allow browsers to optimize memory handling (e.g. MediaStreamTrack, opaque frames in WebCodecs, nodes in WebAudio)
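As a companion to the transferable example above, a sketch of the SharedArrayBuffer mechanism Francois lists: both the page and the worker view the same memory, so handing it to the worker involves neither a copy nor a transfer. Cross-origin isolation (COOP/COEP headers) is required in current browsers; worker.js is again a hypothetical script.

    const sab = new SharedArrayBuffer(1024);
    const view = new Int32Array(sab);
    const worker = new Worker('worker.js');  // hypothetical worker script
    worker.postMessage(sab);                 // shared, not copied or detached
    Atomics.store(view, 0, 42);              // the write is visible to the worker without any copy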
14:16:54 Francois: giving the floor to Bernard to share an example where memory copies show up
14:17:23 Bernard: the use case I wanted to highlight is the gallery view that has become popular in teleconferencing, esp in the context of education
14:17:40 ... native apps go up to 7x7 in gallery views, whereas web apps are limited to 4x4
14:17:59 ... the bandwidth is not the issue - the bottleneck is in the receive -> display path
14:18:20 ... implementing this natively, we've been able to use a full-GPU processing pipeline from reception onwards
14:18:31 ... this enables a 7x7 gallery at 30FPS
14:19:14 ... each copy added to the pipeline reduces the gallery size - 1 copy → 5x5, 2 copies → 4x4, 3 copies (which is what we have with WebTransport today) → 3x3
14:19:32 ... here the memory operation has a direct impact on performance
14:19:40 ... this involves no ML processing
14:19:52 ... (background blur would happen on the sending side, not receiving)
14:19:58 RRSAgent, make logs public
14:20:16 Francois: this illustrates a very practical impact of memory copies
14:20:27 ... There are various discussions & proposals linked to this topic across various groups
14:20:46 ... linked to Streams, WebTransport, WebAssembly, WebGPU, WebCodecs, WebRTC
14:20:54 ... some started a long time ago - may be worth reviewing
14:21:31 ... there may be other ideas to consider - e.g. allow fetching directly into GPU memory? Or allow declaring a media pipeline to enable memory optimization by the UA
14:22:00 ... this concludes my presentation - I think it would be useful to identify scenarios and figure out whether they're being addressed or not
14:22:21 ... it may be that the most interesting scenarios only cross one boundary and can be addressed on a one-to-one group basis
14:22:28 ... or maybe these are just implementation considerations
14:22:50 ... or maybe we need more coordination - which the participation here seems to suggest
14:23:03 q?
14:23:13 q+
14:23:17 padenot_ has joined #zerocopy
14:23:24 present+
14:23:29 present+
14:23:39 present+
14:23:39 ChrisC: thank you for that overview - I want to share a WebCodecs perspective here
14:23:56 ... the ML issues that were the seed of this conversation - WebCodecs will help
14:24:22 riju has joined #zerocopy
14:24:37 ... getting frames out of video elements and the need to go back through canvas, with RGB conversion... WebCodecs makes all of this better and allows skipping canvas altogether, as well as the RGB->YUV conversion
14:25:15 ... WebCodecs is facing hard problems with WASM copies - WebCodecs has a VideoFrame primitive
14:25:39 ... which allows copying the planes out into an ArrayBuffer, which for WASM means wrapping it in a heap and copying it
14:25:54 ... due to security concerns
14:26:15 ... if you can mutate the data, this creates risks for codecs that don't expect mutations
14:26:49 ... Could we have some interface for a buffer which we would read into but which, once done, cannot be modified?
14:27:06 ... We've been told this is very challenging in both the JS & WASM worlds
14:27:16 ... and so probably not coming immediately
14:27:32 RRSAgent, make minutes
14:27:32 I have made the request to generate https://www.w3.org/2020/10/26-zerocopy-minutes.html MikeSmith
14:28:20 KenRussell: (Chrome team) I confirm this is a very hard problem - it requires memory-protecting a segment of the WASM memory
14:28:41 ... this would require rearchitecting WASM engines; slicing up WASM memory to make it read-only is hard, and OS-dependent
14:28:57 ... ArrayBuffers are transferable, by design, to allow zero-copy across Web workers
14:29:13 ... the recycling path would have to be redesigned
14:29:57 BenSmith: in WASM, there is access to one type of memory, with several purposes: memory for the language being run (C++, Rust)
14:30:21 ... adding another memory to WASM would mean adding support in the underlying language
14:30:33 ... Accessing it directly through a static memory index is one way, but there are other ways
14:30:48 ... it's possible that you could access that memory as a dynamic memory object
14:30:53 ... not sure there is a way to do that
14:31:13 ... a third way is to take that one memory and use it as an address space which can then be transferred
14:31:18 ... but that has complexity as well
14:31:32 ... part of it is complexity of implementation, part of it is architecture
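A sketch of the single-copy path Chris describes, assuming the allocationSize() and copyTo() methods that the VideoFrame interface exposes in WebCodecs: the planes are copied once, directly into a view over WASM-visible memory, rather than into a separate ArrayBuffer that then has to be copied into the heap. The byteOffset and the surrounding WASM-side allocation are hypothetical.

    // frame: a WebCodecs VideoFrame; memory: the module's WebAssembly.Memory;
    // byteOffset: a hypothetical offset the WASM side has reserved for the pixels.
    async function copyFrameIntoWasm(frame, memory, byteOffset) {
      const size = frame.allocationSize();                     // bytes needed for all planes
      const dest = new Uint8Array(memory.buffer, byteOffset, size);
      const layout = await frame.copyTo(dest);                 // one copy, straight into WASM memory
      frame.close();                                           // release the underlying (possibly GPU) resource
      return layout;                                           // plane offsets/strides describing dest
    }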
14:33:03 Yutaka: the ArrayBuffer given to the buffer reader is allocated as a shared memory buffer, not local buffer memory - we need some specification that allows optimization (?)
14:33:21 Domenic has joined #zerocopy
14:34:17 Adam_Rice: (Google) I work on streams with Yutaka - if we support SharedArrayBuffer, it will be visible to developers, and therefore needs standardization work
14:34:23 ... but it's difficult to do safely
14:34:32 ... by safely I mean protecting the C++ code from data races
14:34:44 ... (the C++ browser code)
14:35:17 q+
14:35:18 Francois: how much of this is an implementation problem vs a specification problem?
14:35:22 q+
14:35:37 q- chcunningham
14:35:43 ack Chai
14:36:10 Chai: (WebNN API) I have a question wrt WebCodecs and its applicability to ML
14:36:33 ... from the ML perspective, esp on GPU, data is consumed through GPU buffers, not necessarily through textures
14:36:54 ... incl for historical reasons, with the different kinds of swizzling (?) patterns done in their own hardware
14:37:02 myles has joined #zerocopy
14:37:06 ... GPU buffers are the currency of ML data going into the compute engine
14:37:17 keith_miller has joined #zerocopy
14:37:18 ... I heard about WebCodecs implying that the decode process can be done into a texture in memory
14:37:24 youenn has joined #zerocopy
14:37:49 ... I'm wondering whether that can also be done into GPU buffers, because ML processing for video streams or frames can require many kinds of transforms
14:37:56 ... (color space, formats, ...)
14:38:02 ... ML typically uses normalized floats
14:38:12 ... if the conversion is not done the right way, it will cause many copies
14:38:42 ... also, depending on the destination, this can require more copies
14:38:55 ... e.g. for computer vision
14:39:14 ... What is the destination? How does it do it? What are the thoughts around producing the data in reusable forms for ML?
14:39:28 Dan_Sanders: (working on WebCodecs)
14:39:41 ... the current impl of WebCodecs in Chrome uses GPU buffers
14:39:47 ... we don't have a convenient way to expose them
14:39:57 ... we haven't figured out how to do that yet
14:40:11 ... the leading proposal is a texture-based approach, although I realize that's limiting
14:40:18 ... the relevant issue is linked from the slide
14:40:56 PaulAdenot: to reiterate things that were mentioned at previous TPACs and the games workshop, one key API pattern for playing nice with memory and real-time processing
14:41:02 baboba has joined #zerocopy
14:41:08 ... is the concept of memory ownership
14:41:28 ... in native code, if you look at multimedia frameworks, you see APIs where you pass memory in, which is then written into
14:41:51 ... not great ergonomics, but the most sensible way to do it with an unopinionated approach
14:42:02 ... it would be good to take a similar approach for the Web
14:42:08 Question: Is there a way for WebTransport to support receiving into a GPU buffer or sending from a GPU buffer? This would improve performance when WebTransport is used in concert with WebCodecs.
14:42:10 ... memory copies add up pretty quickly
14:42:35 ... for the audio part, it's not so much that we have big objects, but we have an extremely high number of them
14:42:43 ... so we touch memory very often, which also piles up
14:42:58 ... one thing is to have lower-level APIs without fancy syntactic sugar
14:43:19 q+
14:43:25 ... and carefully check where the memory is coming from, and whether it can use float32 / WASM buffers
14:43:28 ack padenot
14:43:32 ack jib
14:43:42 Jan-Ivar: (WebRTC, WebTransport) +1 to Paul
14:43:53 ... we've been talking about sources and sinks for memory copies
14:44:01 ... but we need to look at the full pipe chain
14:44:13 ... decode, modify, play - each of the nodes needs to access the memory
14:44:22 ... do you build the API around the memory optimization path?
14:44:44 ... or do you build a declarative pipe chain and leave it to the browser to optimize?
14:44:57 ... WebCodecs is not using streams, whereas WebTransport and WebRTC are
14:45:18 ... the streams spec has pipeTo, which allows processing to happen in-parallel-but-not-really
14:45:30 ... not clear whether it does allow for the clean API we would like
14:45:45 ... if WebCodecs doesn't participate in the declarative API, will the whole approach still work?
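A sketch of the texture-based direction Dan Sanders mentions, in the shape it later took in WebGPU: a decoded VideoFrame is wrapped as an external texture that shaders can sample directly, avoiding an explicit pixel readback. It assumes device is a configured GPUDevice, bindGroupLayout declares an externalTexture binding at index 0, and renderTile is a hypothetical helper that records the draw; a GPU-buffer equivalent of the kind Chai asks about is not covered by this path.

    function onDecodedFrame(frame) {                               // e.g. a VideoDecoder output callback
      const ext = device.importExternalTexture({ source: frame }); // sample the decoded frame directly
      const bindGroup = device.createBindGroup({
        layout: bindGroupLayout,
        entries: [{ binding: 0, resource: ext }],
      });
      renderTile(bindGroup);                                       // hypothetical draw helper
      frame.close();                                               // return the frame to the decoder promptly
    }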
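The pass-memory-in pattern Paul describes already exists in one corner of the platform: the Streams BYOB ("bring your own buffer") reader, where the application hands a buffer to the source and the source fills it, instead of the source allocating a fresh chunk each time. A sketch, assuming stream is a readable byte stream (e.g. a fetch() response body) and process is a hypothetical consumer of the filled bytes:

    async function readInto(stream) {
      const reader = stream.getReader({ mode: 'byob' });
      let view = new Uint8Array(new ArrayBuffer(64 * 1024));  // caller-owned storage
      for (;;) {
        const { value, done } = await reader.read(view);      // the source writes into our buffer
        if (done) break;
        process(value);                                       // value is a view over the filled bytes
        view = new Uint8Array(value.buffer);                  // read() detached our buffer; reuse the returned one
      }
    }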
14:46:12 Francois: so we've looked at the technical issues
14:46:30 ... I'm hearing we're confident there are scenarios where this needs to be addressed
14:46:45 q+
14:46:48 ... do we need specific coordination to make progress on this, or is it already happening on an ad-hoc basis?
14:46:52 q-
14:47:32 Paul: pair-wise / opportunistic approaches are what we've been doing so far, through overlap of participation
14:47:40 ... the solution might not always be the same
14:47:40 ack padenot_
14:47:53 ... but learning from what has been done in other groups has translated well in the past
14:48:11 ... there would be value in better API consistency, incl for ease of use for developers
14:48:28 ... more could be done
14:48:57 Ken: I've sat in on WebGPU-WebCodecs discussions - they indeed tend to happen pair-wise
14:49:25 ... the pattern of passing memory in - it remains to be seen if it applies to all the use cases we have
14:49:35 ... would it make sense to create a CG to host these cross-group discussions?
14:49:57 https://github.com/WICG/proposals is one possible place
14:50:22 Keith: is there some kind of new primitive - a JS object of some kind - around which we could coordinate?
14:50:40 ... may still be difficult
14:51:17 DanS: +1 - getting clarity from other groups beyond happenstance would be great
14:52:02 MikeSmith: one possible place is the URL I pasted on IRC - we have a repository under the WICG organization for proposals
14:52:16 ... one lightweight way would be to raise an issue there and use that as a coordination place
14:52:26 ... I don't know if that's the best fit, but that's one of the motivations for this repo
14:53:03 Francois: assuming this would work wrt "where", who would be willing to contribute?
14:53:49 +1
14:53:55 [ChrisC, Paul, Ken, DanSanders, Chai volunteer]
14:54:32 Ken: more discussion sounds useful - not sure we're at the stage where we can get to a single primitive
14:54:52 ... Austin has looked a lot into zero-copy into the GPU
14:55:22 ... a single discussion place sounds great, with roads toward new ideas / designs
14:55:38 Francois: I agree that I haven't heard a silver-bullet solution, but there is interest in exchanging on scenarios
14:56:20 Mike: what the WICG proposals repo expects is problem statements rather than solutions - so despite the name, this would be a good fit
14:58:09 Francois: summarizing: multiple needs, no easy solution, cross-group collaboration to share ideas and align API designs is needed
14:58:24 ... I'll follow up with some of you on how to move forward with this
14:58:28 ... thanks a lot for attending
14:58:55 RRSAgent, draft minutes v2
14:58:55 I have made the request to generate https://www.w3.org/2020/10/26-zerocopy-minutes.html dom
14:59:51 I will create a WICG/reducing-memory-copies repo
14:59:58 RRSAgent, draft minutes v2
14:59:58 I have made the request to generate https://www.w3.org/2020/10/26-zerocopy-minutes.html MikeSmith
15:08:30 Yves has left #zerocopy
15:42:52 keith_miller has joined #zerocopy
16:20:21 keith_m__ has joined #zerocopy
16:21:37 keith_miller has joined #zerocopy
17:17:05 padenot_ has joined #zerocopy
17:21:01 keith_miller has joined #zerocopy
17:21:53 dom, tidoust, is there a public link for the minutes of this breakout session already? Thanks!
17:22:19 padenot_, https://www.w3.org/2020/10/26-zerocopy-minutes.html
17:22:38 dom: thanks!
17:23:02 keith_m__ has joined #zerocopy
17:30:00 Zakim has left #zerocopy
18:21:07 caribou has left #zerocopy