WebML WG Teleconference – 19 October 2023

Meeting minutes

Repository: webmachinelearning/webnn

anssik: we again had a number of new folks joining the WG since our last meeting, let me introduce them
… please welcome Reilly Grant from Google Chrome; Reilly co-chairs W3C Devices and Sensors WG with me and works on device-related features in Chromium, he is very knowledgeable in spec authoring having edited many specs
… leading a new team with Phillis that is looking at Web APIs such as WebNN

Reilly: I worked on the Shape Detection API a while ago, and also on APIs for high-level ops such as the Media Capture extensions
… also looking at ML from computation perspective as a new workload, how it works with Wasm, WebGPU, we see devices coming to the market with more purpose-built accelerators

anssik: similarly, please welcome Christian Weyer, CTO of Thinktecture AG, a SW service and consultancy company working in web tech space; I've worked with some of his team in W3C context, e.g. Christian Liebel who I met at TPAC recently to discuss this WG's recent progress
… also please welcome Austin Sullivan from Google who has worked on File System related web features

Austin: worked on File System features, moving into the new space with Reilly

Phillis: Phillis Tang, working with Reilly on this new team

anssik: I want to note BlinkOn conference is currently on
… on Day 1 Oct 17 a breakout session "WebNN implementation on DirectML" was on the BlinkOn agenda, those who attended feel free to share your key takeaways

Chai: BlinkOn is also today, Rafael is there at this very moment
… we had one breakout session on WebNN, it was well-attended
… chatted with Deepti there at BlinkOn
… it went well and the recording is posted on the BlinkOn YouTube channel; in summary, this is a replay of the talk we gave at the Intel Innovation conference a month ago

<Deepti> Enjoyed the talk, thanks for giving it Chai!

Chai: a bit more discussion, after the talks and break we talked with a number of Google people, I'm happy to share the status of the work we're doing here with a number of people
… new faces here, very encouraging!
… this is an active area with interest from many companies in this space
… the keynote was on the challenges of running AI in the browser by Jim Bankoski, I had a good chat with him

Ningxin: Chai did a great job representing WebNN at BlinkOn

WebNN v2: Review transformer ops spec contributions

anssik: issue #375

<gb> Issue 375 Support for transformers (by dontcallmedom) [v2] [operation set]

anssik: the WG has actively worked on transformer op definitions as its priority effort during Q3 and now Q4
… we have followed the guidelines for adding new operations:

https://github.com/webmachinelearning/webnn/blob/main/CONTRIBUTING.md#proposing-and-adding-a-new-operation

anssik: we have identified the use cases and sample models (thanks Dwayne, JoshuaL!):
… - Text-to-image: https://huggingface.co/runwayml/stable-diffusion-v1-5
… - Image segmentation: facebookresearch/segment-anything
… - Speech-to-text: https://huggingface.co/openai/whisper-tiny
… - Text-to-text generation (encoder-decoder): https://huggingface.co/t5-small and https://huggingface.co/facebook/m2m100_418M
… - Text-generation (decoder-only): https://huggingface.co/meta-llama/Llama-2-7b
… we have done the initial op decomposition aka Transformer Models Analysis (thanks Wanming!):

Transformer Models Analysis

anssik: based on Google's feedback this spreadsheet now also includes the TOSA and StableHLO mapping contributed by Dwayne; feedback on this mapping is welcome, in particular from Google's TOSA and StableHLO experts

Dwayne: there's roughly 1:1 mapping for all except one, triangularMatrix
… the next step for the WG is to investigate:
… - Cross-framework support. Is an identical or similar operation supported by multiple popular frameworks? What are they?
… - Cross-platform implementability. Is the operation implementable in more than one platform? What are they?
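
For readers unfamiliar with the one op Dwayne singles out, the following is a minimal reference sketch of what a triangular-matrix operation computes on a 2-D input. The function name `triangular` and its `upper` and `diagonal` parameters are hypothetical and chosen for illustration; they are not taken from the spec or from the analysis spreadsheet.

```typescript
// Illustrative reference only: `triangular`, `upper` and `diagonal` are
// hypothetical names, not the WebNN API surface.
type Matrix = number[][];

// Keeps the upper (or lower) triangle of a 2-D matrix and zeroes the rest.
// `diagonal` shifts the boundary: 0 is the main diagonal, positive values
// move it toward the upper-right corner, negative values toward the lower-left.
function triangular(input: Matrix, upper: boolean = true, diagonal: number = 0): Matrix {
  return input.map((row, i) =>
    row.map((value, j) => {
      const offset = j - i; // column index minus row index
      const keep = upper ? offset >= diagonal : offset <= diagonal;
      return keep ? value : 0;
    })
  );
}

const sample: Matrix = [
  [1, 2, 3],
  [4, 5, 6],
  [7, 8, 9],
];
const upperPart = triangular(sample, true);
const lowerPart = triangular(sample, false);
```

In transformer models this kind of masking is commonly used to build causal attention masks, which may be why it shows up in the op breakdown.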

Chai: I think in the BlinkOn session there were a few good questions re op coverage; I explained the methodology for how we look at these overlaps across frameworks, and the design principles re small and big ops we've talked about in this WG for a while
… ongoing balance, tradeoff, trying to find things that are useful to everyone, discussion is alive and well in the minds of people, transformers have good overlap across frameworks

dom: how are the conversations around transformers and LLMs, should we look at storage and caching of large models?

anssik: we had a dedicated session to discuss storage APIs and caching on our call a month ago

<asully> https://www.w3.org/2023/09/21-webmachinelearning-minutes.html

Reilly: I think that for storage there's potential integrations with some of the storage APIs
… can load constants from buffer views, could also do that from storage objects such as blobs directly
… on the general subject of integrations with other APIs, I saw MLCommandEncoder and it sounds nice for WebGPU integration
… Wasm is getting better support for loading additional ArrayBuffers to share immediate values with the WebNN API possibly

anssik: we are also starting to work on spec definitions for these ops in parallel
… we've also committed to follow the security guidelines for new operations to ensure proper considerations are given to the design

Security guidelines for new operations

anssik: I want to make sure the WG addresses any security issues reported or identified as a top priority
… on today's agenda there's a separate topic to discuss the recently published WebGPU security technical report to distill any learnings to apply to WebNN

anssik: so that's the current status of transformer ops effort
… now I want us to discuss and review any spec PRs or contributions for the proposed transformer ops informed by the op breakdown exercise
… I know Chai is working on a spec PR for op definitions, I think it needs a bit more work before we can review it, Chai?

Chai: it is a big PR, I've been working on it actively over the last few weeks, I hope to be able to share the PR soon
… can push it out and have people look at it
… what is outlined in issue #375 is a pretty complete picture, the expected spec definition PR addresses the issue pretty closely

<gb> Issue 375 Support for transformers (by dontcallmedom) [v2] [operation set]

Chai: maybe we can collapse some of the ops to reduce [no pun] the API surface
… design principles around small and big ops applied to the design
… I've also discussed with Zoltan and he will make time available in Q4 to help the WG with these spec PRs, support reviews, and align new ops with the modern spec conventions as needed

Zoltan: I try to be useful for the WG, I got great help from Joshua Bell

Reilly: some of the earlier documentation on v2 plans, where the group is in terms of removing some ops?

<dom> 3.1. Guidelines for new operations

Chai: great question, versioning is an important topic, v1/v2 is what we coined
… changes to the op set need to be made with care; we've made some name changes, and those were done carefully
… in my day job, working on Windows, we are very sensitive to API breakage, want it not to happen
… if there's something that is redundant we can express its semantics in terms of other ops
… transformers change will be one of the bigger ones, I'll pay a lot of attention to any possibly breaking changes
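
As a rough illustration of expressing one op in terms of others, the sketch below composes softmax from smaller element-wise and reduction primitives on a 1-D input. Softmax is used purely as a familiar example; the minutes do not say which ops are candidates for this treatment, and all helper names here are hypothetical stand-ins rather than calls into any WebNN implementation.

```typescript
// Hypothetical primitive helpers on plain arrays; they only mimic the shape
// of small ops (exp, sub, div, reduceMax, reduceSum) for illustration.
const expOp = (x: number[]): number[] => x.map(Math.exp);
const subScalar = (x: number[], s: number): number[] => x.map((v) => v - s);
const divScalar = (x: number[], s: number): number[] => x.map((v) => v / s);
const reduceMax = (x: number[]): number => x.reduce((a, v) => Math.max(a, v), -Infinity);
const reduceSum = (x: number[]): number => x.reduce((a, v) => a + v, 0);

// A "big" op written purely as a composition of the small ones above.
// Subtracting the maximum first is the usual trick for numerical stability.
function softmaxComposed(x: number[]): number[] {
  const shifted = subScalar(x, reduceMax(x));
  const exponentials = expOp(shifted);
  return divScalar(exponentials, reduceSum(exponentials));
}

const probabilities = softmaxComposed([1, 2, 3]);
```

The tradeoff discussed in the WG applies here: a composed op keeps the API surface smaller, while a dedicated op gives backends an easier target for a fused, optimized kernel.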

ningxin_hu: in prototyping, we work with Google and Microsoft on Chromium implementation, DML and XNNPACK backends
… this implementation experience informs the spec design
… we actively open new spec issues as we find issues in implementation
… op set simplification is informed by implementation experience too, more data points for the WG

Deepti: wanted to touch on the API surface and possible changes to it
… it is my understanding it is still early in browser adoption, we have experimental implementation in Chromium but not yet other public web engine implementations
… the WG should take that into consideration, want to know what the requirements are, how to minimize API breakage

reillyg: I've looked at the CoreML API with Phillis; it takes a bit of digging to find, but they have a list of ops supported by their platform
… the WG should start looking to include those ops into the analysis

Chai: before the spec reached initial CR we talked with our Apple contacts and they reviewed the spec and the latest feedback was they liked the spec design as backend API, graph API, also noting it is implementable on their platform
… for op comparison, we can look at different frameworks, we want to be extensive in our research, want to say op coverage is still a living area, maybe not as fast as earlier
… there will be changes to native OS-level APIs too, but it is a never-ending story

WebGPU security technical report

anssik: Jiewei (thanks!) brought to the WG's attention the recently published WebGPU security technical report:

WebGPU security technical report

anssik: this well-written report (kudos to authors tiszka, bookholt, mattdr) outlines how WebGPU works through the mind of an attacker, Chrome team's vulnerability research methodologies, and thought processes in some of the more difficult research areas.
… I encourage all WG participants to read this report, in particular I'd ask the WG to identify any hardening opportunities for the WebNN API based on these WebGPU security insights.
… from the report I gather WebGPU introduces two unique attack surfaces to Chrome:
… - the WebGPU API implementation which was added to the GPU process & renderer process; and
… - the WGSL (pronounced "wig-sal"?) shader compiler added to the GPU process
… at the end there's a section summarizing systemic concerns

WebGPU systemic concerns

anssik: I'm glad the Chrome Offensive Security team shared this report publicly so other APIs can learn from these insights
… any findings anyone else would like to surface for discussion in the context of WebNN API?

Reilly: I have not yet read the report, but I have worked on similar APIs that enable this type of interaction with renderer and service processes
… the common theme is that anytime you give untrusted content the ability to affect the lifetime of trusted context, you may provide exploitable surface
… e.g. referring to platform resources in privileged processes in WebNN

anssik: any other similar reports?

reillyg: Project Zero has made good publications in this space

ningxin_hu: want to add that because WebNN supports multiple devices, CPU, GPU, NPU, different backends have different security design considerations
… XNNPACK CPU backend runs in renderer sandbox
… different from GPU backend in GPU process

<chai> sorry need to drop.

ningxin_hu: WebNN is HW-agnostic and has different security considerations
