W3C

– DRAFT –
WebML WG Teleconference – 14 December 2023

14 December 2023

Attendees

Present
Anssi_Kostiainen, Bryan_Bernhart, Chai_Chaoweeraprasit, dom, Dwayne_Robinson, Joshua_Bell, Joshua_Lochner, Ningxin_Hu, Rachel_Yager, Rafael_Cintron, Zoltan_Kis
Regrets
-
Chair
Anssi
Scribe
Anssi, anssik

Meeting minutes

Repository: webmachinelearning/webnn

Welcome to new participants

anssik: we had a slew of new folks joining the WG over the past few weeks, let me introduce them:

anssik: Tianqi Chen joined the WG as an Invited Expert, welcome! Tianqi is an Assistant Professor at CMU, creator of WebLLM, Chief Technologist at OctoML. I've invited Tianqi to share his learnings from WebLLM and his proposed use cases for WebNN to consider on our 11 January 2024 call subject to his availability.

anssik: Bryan Bernhart from Intel joined as a WG rep, welcome! Bryan brings in a wealth of GPU expertise, he is an active WebGPU contributor, works with many of our participants in that space already.

Bryan: good intro Anssi!

anssik: Laszlo Gombos from Samsung also joined the WG, welcome! It has been my pleasure to work with Laszlo in many areas of the web platform. I want to note Samsung's web contributions extend beyond mobile devices. For example, Samsung Internet, Samsung's Chromium-powered browser, is available on Windows PCs too. I'm pleased to see more browser vendors join the WG with interest in the WebNN API.

anssik: Christos Bacharakis, Director of Engineering at eyeo, joined the WG to help drive the ML ethical guidelines forward. Welcome and thank you! The ethics effort is expected to accelerate in 2024.

anssik: I may have missed some new folks due to the avalanche of new people joining, my apologies if I missed you.
… the strong momentum behind the WG's work has clearly been recognized in the industry and as a consequence the WG continues to grow.
… please join me in welcoming these new people to the WG
… there are opportunities for everyone to make impactful contributions.

WebNN v2: Transformer ops spec contributions celebration

anssik: issue #375 and PR #478

<gb> Issue 375 Support for transformers (by dontcallmedom) [v2] [operation set]

<gb> CLOSED Pull Request 478 Add support for operations needed for well-known transformers e.g. Segment Anything, Stable Diffusion, etc. (by wchao1115)

Thank You!

anssik: I wanted to use this meeting to celebrate the major milestone the WG hit this week by adding support for operations needed for well-known transformers.
… first I want to acknowledge various individuals for their contributions:
… Chai and Ningxin as the editors diligently worked on this PR, addressing 195 review comments in total! A lot of work to go through all the feedback. Chai's contribution was a big PR, so getting this done was a heavy lift.
… Dwayne and Joshua L provided the initial seeds with their contributions that shaped and helped formulate the initial scope of v2 ops
… your implementation experience informed the WG in significant ways and helped converge on the target models
… Wanming's detailed transformer models analysis formed the basis for the WG's data-driven work mode, which ensured the new ops added to the WebNN API are truly required to satisfy the requirements of the target models
… we also recognize Google Chrome team's directional guidance received earlier this year and reflected that into this analysis
… including consideration for compatibility with other op sets, TOSA and StableHLO
… Bruce, May & co worked to advance WPT tests and webnn-baseline implementation to meet the interop demonstration expectations
… Jiewei continued his careful review of WebNN API implementation and as you've noticed, his Chromium CL review comments have helped this WG improve the WebNN API spec in significant ways
… Jiewei is demonstrating and role-modelling how a tight spec-implementation feedback loop works, identifying numerous edge cases and proposing improvements
… Joshua Bell and Zoltan helped keep this latest PR and the spec in general aligned with the latest spec authoring conventions, an important area of work
… Bin, Shiyi and Rafael have all provided great implementation-informed insights, and Rafael in addition has been our resident GPU expert in this WG, thank you! I expect Rafael to pair with Bryan, who just joined the WG and with whom he has worked on many GPU things in the past.
… and this is not a full list of people who made this possible, everyone's contributions are equally appreciated!

Reflections and learnings

Rachel: I want to say you missed yourself, Anssi; I want to acknowledge your leadership in this WG, thank you for pulling this WG together!

<Ningxin_Hu> +1

Rachel: I also want to ask how we could bring more people to work in this space, on computational models more broadly, not just neural nets, the other side of the ecosystem

Joshua_Lochner: I want to say thank you to Anssi! It is fun to be able to contribute to a project like this, from my side with the Transformers.js library I created in March this year; happy to see people are interested in this technology, with a huge community building around the project
… 75 supported architectures currently in Transformers.js; as WebNN becomes more mainstream, adding new execution providers is a great way to give people more options to run in the browser, a key goal for next year
… getting WebLLM on board this WG is amazing!
… pushing the boundaries, great to be part where the WebML world is going

Ningxin_Hu: I want to acknowledge Alex Gough from Chrome Security Team who provided great inputs from security perspective, e.g. for gather op tightening
… also want to acknowledge S. Raja and Patrick who provided great input on the spec and both helped with the Chromium implementation for new transformer ops

anssik: Acknowledge also Dom our staff contact from W3C
… careful background work in the GH issues, with a description of the problem and careful analysis of solutions prior to the PR, helps reach consensus faster
… close collaboration between the spec effort and implementation effort is a huge plus, can help validate assumptions with running code during the PR review even
… similarly, co-developing tests together with the spec helps uncover underspecified parts

zkis: just a question for the future: in order to make big changes easier to review, how do we make them more digestible?
… I was wondering if we could land the next big change via an integration branch with multiple PRs delivered there, and when we are ready to make an atomic change we merge the integration branch
… multiple smaller PRs may get faster reviews, sometimes retaining the full context is useful

Chai: thank you Anssi, a big PR with 200 comments takes its time to converge; I'm fine with it taking long with many stakeholders involved
… we started working in this space 4 years ago, and this year we've really accelerated, with more and more folks and companies joining
… also want to acknowledge your leadership Anssi
… everyone on the call knows this year 2023 is the year of AI
… I see this WG getting stronger and stronger, super excited for the future
… working in this group is a dream coming true, with experts in this industry coming together in this group
… we have a lot of issues and proposals to go through so 2024 will be even busier than this year, thank you all!

Next steps

anssik: now that we've landed this major PR I would like to discuss what are our shared goals going into 2024
… I believe we want to advance the issues we spun off from the PR and I plan to bring them to our discussions in our future calls
… extending beyond this v2 ops PR, I think we want to work on the NPU support and WebGPU interop
… before getting into W3C Process-level next steps, anyone want to share areas of focus for 2024?

anssik: I have invited Dom to share with us guidance for the expected W3C process next steps

Initial wide review completed Mar 2023

<gb> CLOSED Issue 239 Wide review tracker (by anssiko) [process]

Dom: we reached first CR in March 2023; since we have now landed this major change to the spec that significantly expands the scope and incorporates rewrites, I suggested to Anssi that we target another CR snapshot
… one motivation is that it gives a greater IP grant for the whole scope; it is also an opportunity to ensure all these changes are well aligned with the rest of the platform and with how Web APIs are expected to behave
… when we reached CR in March we went through the wide review process
… if we were to publish this CR snapshot, we'd be expected to do a delta wide review for the changed and new parts
… horizontal groups relevant primarily would be TAG for technical architecture
… any additional short term issues worth fixing prior to that would be good to know, not necessarily WebGPU interop
… any of the review groups could review any part of the spec, but the recommendation is to highlight what is new
… we could point to the list of PRs rather than raw HTML diff

anssik: maybe 2024 Q1 is a good time for snapshot?

RafaelCintron: the WebGPU interop portion of the spec currently has the least implementation experience; we welcome review by TAG or any other group
… before we go ahead we should probably gather more implementation experience on the WebGPU bits

Enhancements

API lacks handling for async ML device errors on the context (revisit)

anssik: issue #477

<gb> Issue 477 API lacks handling for async ML device errors on the context (by bbernhar)

anssik: we discussed this on our 2023-11-16 call: https://www.w3.org/2023/11/16-webmachinelearning-minutes.html#t09
… I wanted to revisit this now with Bryan in the group officially
… to recap Bryan is asking: "What happens if a WebNN operation dispatched through MLContext encounters some internal error which causes the GPU device to get removed?"
… and his expectation was: "I would expect WebNN to provide a spec into how fatal (device) errors are handled so the WebNN developer could respond appropriately. If we want to do more with MLContext (ex. create buffers), I believe we'll need a more robust error mechanism like WebGPU"

WebGPU Errors & Debugging

<RafaelCintron> +1

Bryan: we need to agree on what the spec should say for this; there have been multiple approaches. There can be a driver error, anything that causes a device loss

Rafael: WebGL and WebGPU have a similar way to surface these errors; WebGL used callbacks to signal them, while WebGPU did it in a modern way with a Promise that resolves when the context is lost
… the WebGPU path seems worth following
… in the future we can have multiple adapters, WebGL/WebGPU on different adapters; you lose the context but things continue to work on another adapter
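Rafael's point above can be made concrete with a small sketch. Note this is purely illustrative: the WebNN spec today has no `lost` member on `MLContext`, and `makeLossAwareContext`/`forceLoss` are hypothetical names invented here; a plain object stands in for the context so the Promise pattern can be shown on its own.

```javascript
// Hypothetical sketch of issue #477: surface fatal device errors on an
// MLContext the way WebGPU surfaces them on GPUDevice, via a "lost" Promise.
// None of these members exist in the current WebNN spec.
function makeLossAwareContext() {
  let signalLoss;
  const lost = new Promise((resolve) => { signalLoss = resolve; });
  return {
    lost, // resolves at most once, if and when the underlying device is removed
    // Test hook simulating an internal failure (driver reset, device removal).
    forceLoss(reason) { signalLoss({ reason, message: "ML device was removed" }); },
  };
}

const context = makeLossAwareContext();
context.lost.then((info) => {
  // An app could tear down state, recreate the context, and re-upload
  // weights here, mirroring how WebGPU apps react to GPUDevice.lost.
  console.log(`context lost (${info.reason}): ${info.message}`);
});
```

For comparison, WebGPU exposes this as the `GPUDevice.lost` Promise, which resolves to a `GPUDeviceLostInfo` carrying `reason` and `message` fields.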

New features

Support for device-based tensor storage objects

anssik: issue #482

<gb> Issue 482 Support for device-based tensor storage objects (by bbernhar)

anssik: a proposal from Bryan for device-based tensor storage objects
… problem statement:
… - WebNN and WebGPU lack a way of sharing tensor data on-device directly with each other
… - WebNN does not support chained inferences without copying everything back to the CPU
… proposed solution is MLBuffer, features:
… - Give WebNN developer control of device-storage to avoid round-trips to/from CPU
… - Could be extended to export/import to support WebNN interop with web APIs

anssik: the GH issue proposes new interfaces for:
… - MLBuffer construction/destruction
… - Upload/Download tensor data to/from MLBuffer
… - Binding MLBuffer to an MLGraph

Bryan: WebNN needs a way to share data and avoid round-tripping, which affects some models and inference performance
… MLBuffer gives WebNN a means to source tensor data, initialize itself, and do explicit resource sharing similar to WebGPU
… two birds with one stone: it covers both the interop and non-interop situations
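To make the proposed flow concrete, here is a sketch of chained inference through device-resident buffers. Everything here is hypothetical: `createBuffer`, `writeBuffer`, `readBuffer`, and `dispatch` follow the shape of the issue #482 proposal but are not shipped API, and a tiny in-memory mock stands in for the context so only the data flow is modeled.

```javascript
// Hypothetical issue #482 flow: keep tensor data "on device" between
// chained inferences instead of round-tripping through the CPU.
// The mock context below only models the data flow; none of these
// methods are part of the current WebNN spec.
function createMockContext() {
  return {
    createBuffer({ size }) {
      return { size, data: new Float32Array(size) }; // device storage stand-in
    },
    writeBuffer(buffer, srcData) {
      buffer.data.set(srcData); // upload: CPU -> device
    },
    async readBuffer(buffer) {
      return Float32Array.from(buffer.data); // download: device -> CPU
    },
    dispatch(graph, { inputs, outputs }) {
      // Stand-in "graph" that doubles each element, writing into the bound
      // output buffer so the next dispatch can consume it with no CPU copy.
      outputs.y.data.set(inputs.x.data.map((v) => v * 2));
    },
  };
}

const ctx = createMockContext();
const a = ctx.createBuffer({ size: 4 });
const b = ctx.createBuffer({ size: 4 });
ctx.writeBuffer(a, [1, 2, 3, 4]);                                // one upload
ctx.dispatch("graph1", { inputs: { x: a }, outputs: { y: b } });
ctx.dispatch("graph2", { inputs: { x: b }, outputs: { y: a } }); // chained, no readback
```

The second `dispatch` consumes the first one's output buffer directly, which is the round trip Bryan describes avoiding; only a final `readBuffer` would move results back to the CPU.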

Chai: I do understand the motivation behind this proposal; I'm thinking about its implications on the API side. I understand why encapsulation is beneficial for WebGPU interop, and there are other reasons when we want the MLContext to act like a resource domain

RafaelCintron: near the end, the proposal uses overloading for the compute method; we need to change that, probably to a new method that does not return a promise and only accepts a buffer
… the API does not know whether you assign its return value to something, so the current proposal needs some improvement there
… that said, I see the need for this feature, so each inference does not have to round trip

Chai: a likely implication of adopting MLBuffer is that we may need to scrub WebGPUBuffer
… we would have to redesign how resources are moved to the GPU; we need to think about this more, otherwise we have two ways to do the same thing

Bryan: that matches my understanding

Ningxin_Hu: want to add one point about chained inferences: CPU inference does not need to do re-layout; it can use the internal representation and pass it to the next inference in the chain

Thank you for a transformative 2023!

anssik: Thank You for your major contributions during 2023. Some highlights and milestones from our journey this year:
… - WebNN API hit its Candidate Recommendation milestone in March 2023
… - the WG delivered a substantive spec refresh in Dec 2023, transformers support with v2 ops, an early seasonal gift
… - super strong progress on implementations across multiple backends, platforms, and frameworks
… - the WG's participation grew by 100% YOY and we merged ~80 PRs into the WebNN API spec
… As this year draws to a close, we are accelerating into an exciting 2024 from a position of strength
… I look forward to more great things to come from this WG in 2024, the year of AI PC
… Happy Holidays and a Prosperous New Year!
… relax, recharge, and see you on our next call 11 January 2024

Minutes manually created (not a transcript), formatted by scribe.perl version 221 (Fri Jul 21 14:01:30 2023 UTC).

Diagnostics


Maybe present: anssik, Bryan, Chai, Rachel, Rafael, RafaelCintron, zkis

All speakers: anssik, Bryan, Chai, Dom, Joshua_Lochner, Ningxin_Hu, Rachel, Rafael, RafaelCintron, zkis

Active on IRC: anssik, chai, dom, Joshua_Lochner, Ningxin_Hu, Rachel, RafaelCintron, zkis