06:20:46 RRSAgent has joined #webrtc-ic
06:20:46 logging to https://www.w3.org/2021/10/20-webrtc-ic-irc
06:20:52 RRSAgent, make log public
08:26:16 Slideset: https://lists.w3.org/Archives/Public/www-archive/2021Oct/att-0003/TPAC_2021_breakout_WebRTC_-_NV_Use_cases.pdf
14:04:03 [slide 2]
14:04:18 Riju: massive surge of popularity of a number of features in native apps
14:04:49 ... can we make these features available to the Web Platform by leveraging the underlying platform support, often accelerated by dedicated hardware
14:05:00 ... the top features we need are listed in the slide
14:05:03 [slide 3]
14:05:17 Riju: [shows video face detection API]
14:08:08 [slide 4]
14:08:23 Riju: all video platform stacks do some kind of face detection
14:08:40 ... in fact many cameras optimize their processing (e.g. exposure) based on face detection
14:08:53 ... most of that is done via neural network models
14:09:13 ... today, on the Web this would be done with e.g. tensorflow.js with a WASM or GPU backend
14:09:23 ... there are cloud-based solutions for this
14:09:39 ... these aren't restricted by model size - they can deal with blur, emotion, ...
14:09:49 ... some of them are coming to the client side as well
14:10:04 ... it's always better when the data can be processed locally, and ideally without extra compute
14:10:18 ... cameras already tend to do that for optimizing their own behavior
14:10:24 [slide 5]
14:10:40 Riju: the shape detection API includes a face detector
14:11:09 ... we could achieve similar results using proposals like Harald's breakout box
14:11:37 ... making the API work directly on MediaStreamTrack would help in terms of developer ergonomics
14:11:53 [slide 6]
14:12:03 Riju: which knobs to expose on the Web platform?
14:12:12 scribe+
14:12:44 ... this shows a snapshot of the API our implementation is using
14:12:47 [slide 7]
14:13:07 Riju: all new Windows 11 compatible devices have a feature for gaze correction
14:13:15 ... FaceTime on iOS has something similar
14:13:24 ... they point to a genuine user need
14:13:37 ... Harald mentioned the risk of uncanny valley for these features
14:13:58 ... giving somewhat unsettling results - almost but not quite natural
14:14:12 ... we may need more data on whether this is worth pursuing
14:14:21 [slide 8]
14:14:41 Riju: [video demonstrating background blur with standard Windows platform feature]
14:15:08 ... background blur has been one of the most used features in videoconference apps
14:15:45 [slide 9]
14:15:45 ... on the Web this can be done with TF with WASM/GPU (e.g. in Jitsi); Google Meet has that feature as well
14:16:03 ... this could be combined with background replacement
14:16:13 ... the platform APIs don't expose the blur level at the moment
14:16:30 [slide 10]
14:16:51 riju: [showing what the API could look like with blur/replacement separated]
14:17:09 [slide 11]
14:17:57 present+ Dominique_Hazael_Massieux, Lance_Deng, Barbara_Hochgesang, Carine_Bournez, Chris_Needham, Chun_Gao, Cullen_Jennings, Dwalka, Eero_Häkkinen, Eric_Mwobobia, Florent_Castelli, Harald_Alvestrand, Hunter_Loftis, Jan-Ivar_Bruaroey, Jeffrey_Jaffe, juliana, Julien_Rossi, Jungkee_Song, Justin_Pascalides, Kazuhiro_Hoya, Lea_Richard, Louay_Bassbouss, mamatha, Marcelo_Xella, Peipei_Guo, Philippe_Le_Hegaret, Priya_B, Randell_Jesup, Ruinan_Sun, Shengyuan_Feng, Takio_Yamaoka, Tove, Tuukka_Toivonen, Zhibo_Wang, Fu_Xiao, Xueyuan_Jia, Xiaoqian_Wu, Yajun_Chen, youenn, zhenjie, Zoltan_Kis, Shihua_Du, Elad_Alon, Larry_Zhao, Xuan_Li, Anssi_Kostiainen
14:18:10 [slide 12]
14:18:19 riju: this is an on-device speech to text demo
14:18:41 ... there were a few errors, I'm not a native English speaker
14:18:53 ... there is a Web Speech API already available, but we wanted to use local compute capabilities
14:19:11 ... There are live-caption capabilities built into Chromium
14:19:37 ... wondering whether there will be an integration into Web Platform APIs
14:19:45 [slide 13]
14:19:58 riju: we've also looked at providing noise suppression to the Web platform
14:20:09 ... our Proof of Concept isn't ready yet
14:20:19 ... track settings has a boolean setting
14:20:41 ... maybe it could be extended to an enum? or is it an implementation detail transparent to the developer?
14:21:33 ... is it something the UA decides? is it something the end-user decides, to keep their audio local?
14:21:50 Harald: interesting presentation
14:22:00 ... one of the central things that we need a conversation about is
14:22:13 ... what we do when the platform we're running on integrates multiple things in one place
14:22:16 ... e.g. face detection
14:22:30 ... when it happens on the camera, the camera has more info than what it passes on in the video stream
14:22:42 ... which can't be obtained otherwise later
14:22:58 ... but conversely, depending on apps, what you want out of face detection varies a lot
14:23:13 ... e.g. if you want to position a hat vs position objects linked to the direction of the gaze
14:23:27 ... I find it difficult to design an API that creates interoperability for apps
14:23:45 ... we may want to go down a level and talk about allowing video annotations on a frame by frame level
14:23:59 ... so that anyone can adapt to their own needs
14:24:11 ... while still having standardized annotations to allow any kind of processing
14:24:26 riju: great points
14:24:44 ... I was wondering how useful exposing just a rectangle would be
14:24:56 ... platform APIs don't expose masks, which would be needed for replacement
14:25:12 ... the APIs are distinct for face detection (rectangle) vs background blur
14:25:31 ... we were trying to get as many attributes as possible from lower down the stack, from the camera
14:25:48 ... via a getAttributes call, e.g. to get facial landmarks
14:25:54 ... we need to add more
14:26:46 Cullen: native performance exceeds what can be achieved via WASM today for this type of use case
14:27:00 ... ideally we would be able to push a neural network to cameras directly
14:27:13 ... computer vision and audio processing are way ahead on native clients compared to Web apps today
14:27:23 ... not possible to achieve the same on the Web
14:27:41 riju: the Web exposes only CPU & GPU, through WASM and WebGL
14:27:55 ... but there is more and more specialized hardware coming for that kind of workload and available to native apps
14:28:19 ... we're trying to mirror what native platform apps do instead of bringing your own framework
14:28:23 present+ Eric_Carlson, martin_wonsiewicz
14:28:35 ChrisNeedham: thank you for organizing this
14:28:45 ... it seems to me that it opens up a number of privacy and ethical questions
14:28:57 ... e.g. face detection - how well will it work with different skin colors?
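[Editorial aside: Harald's suggestion of standardized frame-by-frame annotations, and riju's point that platform APIs mostly expose a rectangle plus optional landmarks, could be pictured as a small data shape. The names below (FaceAnnotation, normalizeBox) are illustrative assumptions made for this sketch, not proposed spec text.]

```typescript
// Hypothetical shape for a per-frame face annotation, loosely modeled on the
// rectangle-plus-landmarks data the platform face-detection APIs discussed
// above expose. Illustrative only - no spec defines these names.
interface FaceAnnotation {
  boundingBox: { x: number; y: number; width: number; height: number }; // pixels
  landmarks?: { type: "eye" | "mouth" | "nose"; x: number; y: number }[];
}

// Normalize a pixel-space bounding box to [0, 1] frame coordinates, clamping
// to the frame, so downstream consumers (blur, hat overlay, gaze-linked
// objects, ...) can use it regardless of capture resolution.
function normalizeBox(
  box: FaceAnnotation["boundingBox"],
  frameWidth: number,
  frameHeight: number
): { x: number; y: number; width: number; height: number } {
  const clamp = (v: number) => Math.min(1, Math.max(0, v));
  return {
    x: clamp(box.x / frameWidth),
    y: clamp(box.y / frameHeight),
    width: clamp(box.width / frameWidth),
    height: clamp(box.height / frameHeight),
  };
}
```

A normalized, resolution-independent annotation is one way the "standardized annotations, app-specific consumption" split could work: the producer (camera or UA) emits the same shape everywhere, and each app interprets it.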
14:29:12 ChrisNeedham: even more of a concern with additional details (age, gender, emotion)
14:29:35 ... if we're providing lower level APIs where developers build these capabilities, we're not baking possibly biased primitives into the platform
14:30:04 ... That also applies to speech-to-text - will it work well for all users?
14:30:28 ... re privacy, on-device vs in-cloud will be key - people have high awareness of this
14:30:40 Present+ Louay_Bassbouss
14:30:41 ... allowing on-device, and signaling very clearly when in-cloud, would be important
14:31:03 Riju: what I showed was a proof of concept, whose quality depends on the quality of the model
14:31:10 ... in this case, it wasn't state of the art
14:31:22 ... a better one like SODA from Google would give better results
14:31:45 ... biases can be dealt with via better models
14:31:55 ... exposing to users whether the compute is local or cloud
14:32:08 ... would help being transparent with users
14:32:19 ... noise suppression algorithms today are running mostly in the cloud
14:32:37 ... which may make people uncomfortable; we should give them the choice
14:33:47 ... In platform APIs, only "blink" and "smile" are exposed consistently
14:33:59 ... for age/gender, it's mostly based on a "bring your own model" approach
14:34:24 ... the objective of our discussions today is to leverage the underlying platform capabilities instead of bringing up frameworks
14:34:42 ... when the Web platform cannot provide equivalent capabilities, people use other platforms
14:34:56 ... we may need to add additional permission prompts to help with privacy
14:35:11 ... there may be attributes we wouldn't expose (e.g. age, gender)
14:35:30 Jan-ivar: I have some concerns
14:35:52 ... one is redundancy - we're already working on exposing data to JS on MediaStreamTrack
14:36:01 ... also concerns about standardizing across platforms
14:36:43 ... not sure browsers should own these features - how far would we go?
14:36:56 ... e.g. sepia tone, filters, ...
14:37:05 ... Maybe for face detection browsers would be better positioned
14:37:19 ... browsers may do a better job at face tracking, e.g. from a diversity aspect
14:37:51 riju: if we want to bridge the gap with native in terms of capabilities...
14:38:22 ... in terms of WG, I'm happy to look at which groups would be best to incubate this
14:39:48 jan-ivar: Would you expect browsers to polyfill the results in software if they're not available from hardware?
14:39:56 ... e.g. face detection
14:40:42 riju: face detection is available across Windows, ChromeOS, Mac, ...
14:41:08 jan-ivar: then background blur? how to deal with it if not hardware accelerated?
14:41:25 riju: it would depend on the performance
14:42:24 Tuukka: if the platform exposes the API, the browser would expose it, and if not, it would not
14:42:56 ... in some cases, it's not clear where in the camera stack it runs
14:43:06 ... but we should assume that if the platform exposes it, it brings better performance
14:43:30 Youenn: good and exciting topic
14:43:41 ... it's fine if we can expose OS-level data like face detection
14:43:55 ... there will be more and more on-device processing with end-to-end encryption
14:44:12 ... so cloud processing will go down
14:44:25 ... e.g. the speech recognition API should have a local mode
14:44:36 ... wasm can always provide a fallback approach for local processing
14:44:53 ... we need to look at what producers do in terms of generating the data, whether it's consistent across providers
14:44:57 ... what consumers expect
14:45:05 ... and whether there is a sweet spot among all of these
14:45:27 ... even if face detection is not exactly what the app wants, it may be a good starting point for apps
14:45:40 ... I would start with face detection
14:45:59 ... if background blur is possible, maybe - but it can already be done with existing APIs
14:46:23 ... going deeper with e.g. a binary depth map - maybe, but that seems more difficult, not sure we're there yet
14:46:31 ... face detection would be a good way to explore this space
14:46:51 riju: thanks - starting with face detection is indeed what I would suggest
14:47:10 ... even if the app doesn't want exactly the rectangle, it still reduces the size of what you have to search
14:47:43 ... re echo cancellation, it was only supported on Mac via system settings
14:48:19 youenn: I think the WebRTC WG is the right place to do this work given the tie to media capture work
14:48:37 ... raising an issue in mediacapture-extensions
14:48:56 Jeff: thanks, very interesting and exciting to improve the collaboration capabilities of the platform
14:49:11 ... I got the impression that some of these ideas have different levels of maturity
14:49:26 ... I was wondering if you had some thoughts about the right location for the work to be done
14:49:40 ... some of it maybe in the WebRTC WG, in other WGs, in Community Groups for incubation?
14:50:18 riju: Harald advised filing face detection and background blur in mediacapture-extensions
14:50:25 ... initially filed them in image capture
14:50:33 ... they're a mixture of Media/WebRTC WG
14:50:43 ... I'll take advice from their chairs
14:50:58 ... the mediacapture-extensions repo seems to be one of the right places
14:51:13 randell: I'd echo Jan-Ivar's comments
14:51:40 ... one thing that concerned me is the possibility of locking in an API that is tied to specific capabilities on specific hardware
14:51:52 ... removing flexibility for future alternatives
14:52:15 ... e.g. today face recognition provides a rectangle, and the future might provide an oval
14:52:30 ... if you're stuck with a rectangle API, you won't be able to take advantage of that
14:52:43 ... there are many things provided by hardware, drivers, OS
14:53:07 ... we can't add an API for each of those - we need to think about exposing these hardware and OS features in a more general manner
14:53:16 ... one that lets us more easily integrate new capabilities in this framework
14:53:22 ... that's what I would suggest integrating
14:53:31 riju: thank you randell
14:53:46 ... this is based on platform APIs available across all platforms today
14:54:11 ... these features are used by lots of people, with high demand
14:54:34 ... if we don't provide this on the Web, everyone will have to do it on their own with lower performance
14:54:45 ... to keep up, everybody will bring their own framework and infrastructure
14:54:55 ... sometimes it might make sense not to bring everything to the Web
14:55:05 ... but with this level of popularity...
14:55:23 randell: I'm not saying that we shouldn't do this, but that we need to develop it in a way that allows evolution
14:55:38 ... it's too focused on catching up with the current state
14:56:58 RRSAgent, draft minutes
14:56:58 I have made the request to generate https://www.w3.org/2021/10/20-webrtc-ic-minutes.html dom
14:57:21 Cullen: I would discourage face detection
14:57:33 ... lots of variability of needs, big risks of biases
14:57:50 ... it doesn't run on all platforms where browsers ship
14:57:56 ... I think background blur would be much better
14:58:06 ... it's really compute intensive and hard to do well
14:58:19 ... it has much less variation in how it would be used
14:59:05 Harald: in order to do background removal well, you actually have to do foreground/person detection
14:59:21 ... in terms of exploration, what might be useful anyway would be annotating video frames with areas of interest
14:59:37 ... that would assist both features while staying independent of specific applications
15:02:44 present+ Rijubrata_Bhaumik
16:29:38 I see no action items
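[Editorial appendix: the "knobs" discussed in slides 6, 10 and 13 (separate blur/replacement controls, a noise-suppression boolean possibly extended to an enum) could plausibly surface through the existing constrainable-property model of getUserMedia/applyConstraints. The sketch below is a guess at that shape; the names backgroundBlur and noiseSuppressionLevel, and the negotiation helper, are assumptions made for illustration, not anything the breakout adopted.]

```typescript
// Hypothetical constrainable properties - names are illustrative, not spec.
type NoiseSuppressionLevel = "off" | "platform" | "browser";

interface ProposedCapabilities {
  backgroundBlur?: boolean[]; // values the device/platform can deliver
  noiseSuppressionLevel?: NoiseSuppressionLevel[];
}

interface ProposedSettings {
  backgroundBlur?: boolean;
  noiseSuppressionLevel?: NoiseSuppressionLevel;
}

// Pick the requested value when the platform supports it, else fall back to
// the first supported value - roughly the ideal-value matching that
// applyConstraints() performs for existing constrainable properties.
function negotiate(
  caps: ProposedCapabilities,
  wanted: ProposedSettings
): ProposedSettings {
  const settings: ProposedSettings = {};
  if (caps.backgroundBlur?.length) {
    const w = wanted.backgroundBlur ?? false;
    settings.backgroundBlur = caps.backgroundBlur.includes(w)
      ? w
      : caps.backgroundBlur[0];
  }
  if (caps.noiseSuppressionLevel?.length) {
    const w = wanted.noiseSuppressionLevel ?? "platform";
    settings.noiseSuppressionLevel = caps.noiseSuppressionLevel.includes(w)
      ? w
      : caps.noiseSuppressionLevel[0];
  }
  return settings;
}
```

Reusing the capabilities/settings negotiation pattern would let Tuukka's point hold naturally: a browser on a platform without the feature simply omits the capability, and pages feature-detect rather than assume.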