06:20:46 RRSAgent has joined #webrtc-ic
06:20:46 logging to https://www.w3.org/2021/10/20-webrtc-ic-irc
06:20:52 RRSAgent, make log public
08:26:16 Slideset: https://lists.w3.org/Archives/Public/www-archive/2021Oct/att-0003/TPAC_2021_breakout_WebRTC_-_NV_Use_cases.pdf
14:04:03 [slide 2]
14:04:18 Riju: massive surge of popularity of a number of features in native apps
14:04:49 ... can we make these features available to the Web Platform by leveraging the underlying platform support, often accelerated by dedicated hardware
14:05:00 ... the top features we need are listed in the slide
14:05:03 [slide 3]
14:05:17 Riju: [shows video face detection API]
14:08:08 [slide 4]
14:08:23 Riju: all video platform stacks do some kind of face detection
14:08:40 ... in fact many cameras optimize their processing (e.g. exposure) based on face detection
14:08:53 ... most of that is done via neural network models
14:09:13 ... today, on the Web this would be done with e.g. tensorflow.js with a WASM or GPU backend
14:09:23 ... there are cloud-based solutions for this
14:09:39 ... these aren't restricted by model size - they can deal with blur, emotion, ...
14:09:49 ... some of them are coming to the client side as well
14:10:04 ... it's always better when the data can be processed locally, and ideally without extra compute
14:10:18 ... cameras already tend to do that for optimizing their own behavior
14:10:24 [slide 5]
14:10:40 Riju: the shape detection API includes a face detector
14:11:09 ... we could achieve similar results using proposals like Harald's breakout box
14:11:37 ... making the API work directly on MediaStreamTrack would help in terms of developer ergonomics
14:11:53 [slide 6]
14:12:03 Riju: which knobs to expose on the Web platform?
14:12:12 scribe+
14:12:44 ... this shows a snapshot of the API our implementation is using
14:12:47 [slide 7]
14:13:07 Riju: all new Windows 11 compatible devices have a feature for gaze correction
14:13:15 ... FaceTime on iOS has something similar
14:13:24 ... they point to a genuine user need
14:13:37 ... Harald mentioned the risk of uncanny valley for these features
14:13:58 ... giving somewhat unsettling results - almost but not quite natural
14:14:12 ... we may need more data on whether this is worth pursuing
14:14:21 [slide 8]
14:14:41 Riju: [video demonstrating background blur with standard Windows platform feature]
14:15:08 ... background blur has been one of the most used features in videoconference apps
14:15:45 [slide 9]
14:15:45 ... on the Web this can be done with TF with WASM/GPU (e.g. in Jitsi); Google Meet has that feature as well
14:16:03 ... this could be combined with background replacement
14:16:13 ... the platform APIs don't expose the blur level at the moment
14:16:30 [slide 10]
14:16:51 riju: [showing what the API could look like with blur/replacement separated]
14:17:09 [slide 11]
14:17:57 present+ Dominique_Hazael_Massieux, Lance_Deng, Barbara_Hochgesang, Carine_Bournez, Chris_Needham, Chun_Gao, Cullen_Jennings, Dwalka, Eero_Häkkinen, Eric_Mwobobia, Florent_Castelli, Harald_Alvestrand, Hunter_Loftis, Jan-Ivar_Bruaroey, Jeffrey_Jaffe, juliana, Julien_Rossi, Jungkee_Song, Justin_Pascalides, Kazuhiro_Hoya, Lea_Richard, Louay_Bassbouss, mamatha, Marcelo_Xella, Peipei_Guo, Philippe_Le_Hegaret, Priya_B, Randell_Jesup, Ruinan_Sun, Shengyuan_Feng, Takio_Yamaoka, Tove, Tuukka_Toivonen, Zhibo_Wang, Fu_Xiao, Xueyuan_Jia, Xiaoqian_Wu, Yajun_Chen, youenn, zhenjie, Zoltan_Kis, Shihua_Du, Elad_Alon, Larry_Zhao, Xuan_Li, Anssi_Kostiainen
14:18:10 [slide 12]
14:18:19 riju: this is an on-device speech to text demo
14:18:41 ... there were a few errors, I'm not a native English speaker
14:18:53 ... there is a Web Speech API already available, but we wanted to use local compute capabilities
14:19:11 ... There are live-caption capabilities built into Chromium
14:19:37 ... wondering whether there will be an integration into Web Platform APIs
14:19:45 [slide 13]
14:19:58 riju: we've also looked at providing noise suppression to the Web platform
14:20:09 ... our Proof of Concept isn't ready yet
14:20:19 ... track settings has a boolean setting
14:20:41 ... maybe it could be extended to an enum? or is it an implementation detail transparent to the developer?
14:21:33 ... is it something the UA decides? is it something the end-user decides, to keep their audio local?
14:21:50 Harald: interesting presentation
14:22:00 ... one of the central things that we need a conversation about is
14:22:13 ... what we do when the platform we're running on integrates multiple things in one place
14:22:16 ... e.g. face detection
14:22:30 ... when it happens on the camera, the camera has more info than what it passes on in the video stream
14:22:42 ... which can't be obtained otherwise later
14:22:58 ... but conversely, depending on apps, what you want out of face detection varies a lot
14:23:13 ... e.g. if you want to position a hat vs position objects linked to the direction of the gaze
14:23:27 ... I find it difficult to design an API that creates interoperability for apps
14:23:45 ... we may want to go down a level and talk about allowing video annotations on a frame by frame level
14:23:59 ... so that anyone can adapt to their own needs
14:24:11 ... while still having standardized annotations to allow any kind of processing
14:24:26 riju: great points
14:24:44 ... I was wondering how useful exposing just a rectangle would be
14:24:56 ... platform APIs don't expose masks, which would be needed for replacement
14:25:12 ... the APIs are distinct for face detection (rectangle) vs background blur
14:25:31 ... we were trying to get as many attributes as possible from lower down the stack, from the camera
14:25:48 ... via a getAttributes call, e.g. to get facial landmarks
14:25:54 ... we need to add more
14:26:46 Cullen: native performance exceeds what can be achieved via WASM today for this type of use case
14:27:00 ... ideally we would be able to push a neural network to cameras directly
14:27:13 ... computer vision and audio processing are way ahead on native clients compared to Web apps today
14:27:23 ... not possible to achieve the same on the Web
14:27:41 riju: the Web exposes only CPU & GPU, through WASM and WebGL
14:27:55 ... but there is more and more specialized hardware coming for that kind of workload and available to native apps
14:28:19 ... we're trying to mirror what native platform apps do instead of bringing your own framework
14:28:23 present+ Eric_Carlson, martin_wonsiewicz
14:28:35 ChrisNeedham: thank you for organizing this
14:28:45 ... it seems to me that it opens up a number of privacy and ethical questions
14:28:57 ... e.g. face detection - how well will it work with different skin colors?
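[Editorial aside: Harald's suggestion of standardized frame-by-frame annotations, and riju's point that platform APIs mostly expose a rectangle plus optional landmarks, could be pictured as a small data shape. The names below (FaceAnnotation, normalizeBox) are illustrative assumptions made for this sketch, not proposed spec text.]

```typescript
// Hypothetical shape for a per-frame face annotation, loosely modeled on the
// rectangle-plus-landmarks data the platform face-detection APIs discussed
// above expose. Illustrative only - no spec defines these names.
interface FaceAnnotation {
  boundingBox: { x: number; y: number; width: number; height: number }; // pixels
  landmarks?: { type: "eye" | "mouth" | "nose"; x: number; y: number }[];
}

// Normalize a pixel-space bounding box to [0, 1] frame coordinates, clamping
// to the frame, so downstream consumers (blur, hat overlay, gaze-linked
// objects, ...) can use it regardless of capture resolution.
function normalizeBox(
  box: FaceAnnotation["boundingBox"],
  frameWidth: number,
  frameHeight: number
): { x: number; y: number; width: number; height: number } {
  const clamp = (v: number) => Math.min(1, Math.max(0, v));
  return {
    x: clamp(box.x / frameWidth),
    y: clamp(box.y / frameHeight),
    width: clamp(box.width / frameWidth),
    height: clamp(box.height / frameHeight),
  };
}
```

A normalized, resolution-independent annotation is one way the "standardized annotations, app-specific consumption" split could work: the producer (camera or UA) emits the same shape everywhere, and each app interprets it.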
14:29:12 ChrisNeedham: even more of a concern with additional details (age, gender, emotion)
14:29:35 ... if we're providing lower level APIs where developers build these capabilities, we're not baking possibly biased primitives into the platform
14:30:04 ... That also applies to speech-to-text - will it work well for all users?
14:30:28 ... re privacy, on-device vs in-cloud will be key - people have high awareness of this
14:30:40 Present+ Louay_Bassbouss
14:30:41 ... allowing on-device, and signaling very clearly when in-cloud, would be important
14:31:03 Riju: what I showed was a proof of concept, whose quality depends on the quality of the model
14:31:10 ... in this case, it wasn't state of the art
14:31:22 ... a better one like SODA from Google would give better results
14:31:45 ... biases can be dealt with via better models
14:31:55 ... exposing to users whether the compute is local or cloud
14:32:08 ... would help being transparent with users
14:32:19 ... noise suppression algorithms today are running mostly in the cloud
14:32:37 ... which may make people uncomfortable; we should give them the choice
14:33:47 ... In platform APIs, only "blink" and "smile" are exposed consistently
14:33:59 ... for age/gender, it's mostly based on a "bring your own model" approach
14:34:24 ... the objective of our discussions today is to leverage the underlying platform capabilities instead of bringing up frameworks
14:34:42 ... when the Web platform cannot provide equivalent capabilities, people use other platforms
14:34:56 ... we may need to add additional permission prompts to help with privacy
14:35:11 ... there may be attributes we wouldn't expose (e.g. age, gender)
14:35:30 Jan-ivar: I have some concerns
14:35:52 ... one is redundancy - we're already working on exposing data to JS on MediaStreamTrack
14:36:01 ... also concerns about standardizing across platforms
14:36:43 ... not sure browsers should own these features - how far would we go?
14:36:56 ... e.g. sepia tone, filters, ...
14:37:05 ... Maybe for face detection browsers would be better positioned
14:37:19 ... browsers may do a better job at face tracking, e.g. from a diversity aspect
14:37:51 riju: if we want to bridge the gap with native in terms of capabilities...
14:38:22 ... in terms of WG, I'm happy to look at which groups would be best to incubate this
14:39:48 jan-ivar: Would you expect browsers to polyfill the results in software if they're not available from hardware?
14:39:56 ... e.g. face detection
14:40:42 riju: face detection is available across Windows, ChromeOS, Mac, ...
14:41:08 jan-ivar: then background blur? how to deal with it if not hardware accelerated?
14:41:25 riju: it would depend on the performance
14:42:24 Tuukka: if the platform exposes the API, the browser would expose it, and if not, it would not
14:42:56 ... in some cases, it's not clear where in the camera stack it runs
14:43:06 ... but we should assume that if the platform exposes it, it brings better performance
14:43:30 Youenn: good and exciting topic
14:43:41 ... it's fine if we can expose OS-level data like face detection
14:43:55 ... there will be more and more on-device processing with end-to-end encryption
14:44:12 ... so cloud processing will go down
14:44:25 ... e.g. the speech recognition API should have a local mode
14:44:36 ... wasm can always provide a fallback approach for local processing
14:44:53 ... we need to look at what producers do in terms of generating the data, whether it's consistent across providers
14:44:57 ... what consumers expect
14:45:05 ... and whether there is a sweet spot among all of these
14:45:27 ... even if face detection is not exactly what the app wants, it may be a good starting point for apps
14:45:40 ... I would start with face detection
14:45:59 ... if background blur is possible, maybe - but it can already be done with existing APIs
14:46:23 ... going deeper with e.g. a binary depth map - maybe, but that seems more difficult, not sure we're there yet
14:46:31 ... face detection would be a good way to explore this space
14:46:51 riju: thanks - starting with face detection is indeed what I would suggest
14:47:10 ... even if the app doesn't want exactly the rectangle, it still reduces the size of what you have to search
14:47:43 ... re echo cancellation, it was only supported on Mac via system settings
14:48:19 youenn: I think the WebRTC WG is the right place to do this work given the tie to media capture work
14:48:37 ... raising an issue in mediacapture-extensions
14:48:56 Jeff: thanks, very interesting and exciting to improve the collaboration capabilities of the platform
14:49:11 ... I got the impression that some of these ideas have different levels of maturity
14:49:26 ... I was wondering if you had some thoughts about the right location for the work to be done
14:49:40 ... some of it maybe in the WebRTC WG, in other WGs, in Community Groups for incubation?
14:50:18 riju: Harald advised filing face detection and background blur in mediacapture-extensions
14:50:25 ... initially filed them in image capture
14:50:33 ... they're a mixture of Media/WebRTC WG
14:50:43 ... I'll take advice from their chairs
14:50:58 ... the mediacapture-extensions repo seems to be one of the right places
14:51:13 randell: I'd echo Jan-Ivar's comments
14:51:40 ... one thing that concerned me is the possibility of locking in an API that is tied to specific capabilities on specific hardware
14:51:52 ... removing flexibility for future alternatives
14:52:15 ... e.g. today face recognition provides a rectangle, and the future might provide an oval
14:52:30 ... if you're stuck with a rectangle API, you won't be able to take advantage of that
14:52:43 ... there are many things provided by hardware, drivers, OS
14:53:07 ... we can't add an API for each of those - we need to think about exposing these hardware and OS features in a more general manner
14:53:16 ... one that lets us more easily integrate new capabilities in this framework
14:53:22 ... that's what I would suggest integrating
14:53:31 riju: thank you randell
14:53:46 ... this is based on platform APIs available across all platforms today
14:54:11 ... these features are used by lots of people, with high demand
14:54:34 ... if we don't provide this on the Web, everyone will have to do it on their own with lower performance
14:54:45 ... to keep up, everybody will bring their own framework and infrastructure
14:54:55 ... sometimes it might make sense not to bring everything to the Web
14:55:05 ... but with this level of popularity...
14:55:23 randell: I'm not saying that we shouldn't do this, but that we need to develop it in a way that allows evolution
14:55:38 ... it's too focused on catching up with the current state
14:56:58 RRSAgent, draft minutes
14:56:58 I have made the request to generate https://www.w3.org/2021/10/20-webrtc-ic-minutes.html dom
14:57:21 Cullen: I would discourage face detection
14:57:33 ... lots of variability of needs, big risks of biases
14:57:50 ... it doesn't run on all platforms where browsers ship
14:57:56 ... I think background blur would be much better
14:58:06 ... it's really compute intensive and hard to do well
14:58:19 ... it has much less variation in how it would be used
14:59:05 Harald: in order to do background removal well, you actually have to do foreground/person detection
14:59:21 ... in terms of exploration, what might be useful anyway would be annotating video frames with areas of interest
14:59:37 ... that would assist both features while staying independent of specific applications
15:02:44 present+ Rijubrata_Bhaumik
16:29:38 I see no action items
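[Editorial appendix: the "knobs" discussed in slides 6, 10 and 13 (separate blur/replacement controls, a noise-suppression boolean possibly extended to an enum) could plausibly surface through the existing constrainable-property model of getUserMedia/applyConstraints. The sketch below is a guess at that shape; the names backgroundBlur and noiseSuppressionLevel, and the negotiation helper, are assumptions made for illustration, not anything the breakout adopted.]

```typescript
// Hypothetical constrainable properties - names are illustrative, not spec.
type NoiseSuppressionLevel = "off" | "platform" | "browser";

interface ProposedCapabilities {
  backgroundBlur?: boolean[]; // values the device/platform can deliver
  noiseSuppressionLevel?: NoiseSuppressionLevel[];
}

interface ProposedSettings {
  backgroundBlur?: boolean;
  noiseSuppressionLevel?: NoiseSuppressionLevel;
}

// Pick the requested value when the platform supports it, else fall back to
// the first supported value - roughly the ideal-value matching that
// applyConstraints() performs for existing constrainable properties.
function negotiate(
  caps: ProposedCapabilities,
  wanted: ProposedSettings
): ProposedSettings {
  const settings: ProposedSettings = {};
  if (caps.backgroundBlur?.length) {
    const w = wanted.backgroundBlur ?? false;
    settings.backgroundBlur = caps.backgroundBlur.includes(w)
      ? w
      : caps.backgroundBlur[0];
  }
  if (caps.noiseSuppressionLevel?.length) {
    const w = wanted.noiseSuppressionLevel ?? "platform";
    settings.noiseSuppressionLevel = caps.noiseSuppressionLevel.includes(w)
      ? w
      : caps.noiseSuppressionLevel[0];
  }
  return settings;
}
```

Reusing the capabilities/settings negotiation pattern would let Tuukka's point hold naturally: a browser on a platform without the feature simply omits the capability, and pages feature-detect rather than assume.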