Meeting minutes
<vagner_> present Vagner NIC.br
<piwanczak> sorry to bother again - could I kindly ask for the call-in details? The w3c list is inaccessible for me unfortunately
<br-rtbhouse> @piwanczak – same for me, I'm getting either 'waiting...' or 'unauthorised'
Workshop proceedings
<br-rtbhouse> Is the w3c website working fine for anyone else? I still cannot access this page with the call-in details :/
anssik: W3C organized a workshop on Web and Machine Learning over the course of August and September 2020. This workshop brought together web platform and machine learning practitioners to enrich the Open Web Platform with better foundations for machine learning.
<ehsan> I still can't as well
anssik: Today's meeting is to discuss the W3C Workshop on Web and Machine Learning key outcomes with a focus on proposed near-term and long-term standardization directions and next steps.
Materials
https://www.w3.org/2020/06/machine-learning-workshop/report.html Workshop report
Workshop discussions on GitHub
<BarbaraH> Zoom logistics https://mit.zoom.us/j/233956702?pwd=redacted
Next steps
anssik: Next steps from the workshop are broken into three categories:
<ehsan> thanks
anssik: Next Steps in Standardization
… Next Steps in Incubation
… Other exploratory work
anssik: Let's focus on the first one, next steps in standardization:
Web Machine Learning Working Group Charter
<dom> https://github.com/w3c/machine-learning-charter/issues
anssik: some issues raised for the Charter
Web Machine Learning Working Group Charter - issues
anssik: 1) Is a graph of operations the right level of abstraction for a web standard?
https://github.com/w3c/machine-learning-charter/issues/2
… 2) Detailed explainer for Web NN API
https://github.com/w3c/machine-learning-charter/issues/3
anssik: We believe these issues are being addressed by the explainer update
WebNN API Explainer preview (in review, in staging)
anssik: Inviting Chai to respond to the two concerns, starting with 1, Is a graph of operations the right level of abstraction for a web standard?
Chai: A summary of what has been discussed in CG is condensed into the first section of the explainer
… The diagram in the explainer depicts where the WebNN API sits
… the WebNN API sits between the OS and web browsers; similarly to WebGL/WebGPU, it provides a set of ML APIs that the browser can implement and that web apps or frameworks sit on top of
… why do we need this API then? In the past five years there has been a lot of innovation in the hardware ecosystem, e.g. in GPUs, and specialized AI accelerators tailored for ML workloads have emerged
… many web frameworks have also been created, e.g. TF.js and ONNX.js, so the problem is how to bridge this development on the hardware side with the developments on the web, regardless of the OS
… anyone should be able to execute ML models on any browser and on any OS
… also, in our early prototype we've seen that, in terms of performance, WebNN, being able to connect to the hardware, executes much faster than generic APIs such as WebGL/WebGPU
… with this interface, the web browser can support these experiences more efficiently
… another aspect: why does this need to run on the OS?
… as the DirectML lead, I've learned that integrity is also very important
[Chai walking through the example code]
Chai: first non-goal is we do not intend to define a model format
… format can be built on top of the API
… people using ML know there are many formats out there, and the idea of WebNN API is to enable all these formats
… this applies to both the model format and the serialization format, including encryption and packaging
… second, we do not define the delivery mechanism
… third, we do not define media formats; the web is already rich in terms of media type definitions, and we want to interoperate with those types
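[Illustrative sketch, not from the explainer: the kind of graph construction the example code walks through. The method names below (navigator.ml.createContext, MLGraphBuilder, conv2d, build, compute) follow later drafts of the WebNN API and are assumptions here, not quotes from the 2020 explainer.]

  // Sketch only: build, compile, and run a tiny WebNN graph (conv2d + relu).
  // Method names follow later WebNN drafts and are assumptions, not the 2020
  // explainer's exact API. Weights are plain typed arrays, illustrating that
  // the API itself defines no model or serialization format.
  async function runTinyGraph() {
    const context = await navigator.ml.createContext({ deviceType: 'gpu' });
    const builder = new MLGraphBuilder(context);

    // Describe the graph as operations over tensor operands.
    const input = builder.input('input', { dataType: 'float32', shape: [1, 3, 224, 224] });
    const filter = builder.constant(
      { dataType: 'float32', shape: [32, 3, 3, 3] },
      new Float32Array(32 * 3 * 3 * 3)); // weights supplied by the app or framework
    const output = builder.relu(builder.conv2d(input, filter));

    // Compile once, then execute with concrete input/output buffers.
    const graph = await builder.build({ output });
    const outputBuffer = new Float32Array(1 * 32 * 222 * 222);
    await context.compute(graph,
      { input: new Float32Array(1 * 3 * 224 * 224) },
      { output: outputBuffer });
    return outputBuffer;
  }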
<ningxin_hu> The WebNN POC perf num Chai just mentioned: https://www.w3.org/2020/06/machine-learning-workshop/talks/access_purpose_built_ml_hardware_with_web_neural_network_api.html#slide-10
<Jonathan_Bingham> Thanks for the updates to the Explainer, Chai. We'll take a look.
Ehsan: What is the level of explainability and transparency for the ML models here?
<dom> https://github.com/w3c/machine-learning-workshop/issues/108 (in which Ehsan chimed in)
Chai: Not going into all the details, but responding to 1) re the abstraction level
… we spent most of the time in CG discussing this topic
… the rationale is 1) if we look across all ML frameworks, every single one produces a graph
… job #1 is to understand what abstractions people are using
Chai: the second thing is a bit deeper: what would be the abstractions that compose that graph
… there are many ideas around this topic
… looking across all the platforms supporting ML today, iOS, macOS, Windows, Linux, Android ...
… the common currency is the operator level: conv, gemm, reductions, all these important math functions
… in the context of studying these existing platforms and frameworks we found a lot of commonality
… we started with conv and gemm and found other highly reusable functions used everywhere
… this is the common currency across all these platforms
… this looks reasonable for a contract, since #1 it is supported by consumer OSes, #2 it is already produced by the frameworks on the frontend level
… I hope that answers the question raised about why the API is defined as a graph of ops
Jonathan: Thanks Chai for the updates to the explainer, we'll take a look
… also, thanks for the review of how we ended up with a graph API
… this sounds familiar from Google's discussions, having created the NN API
… we have a lot of users for our NN API, we learned something during that process
… the second thing, about the level of abstraction: all ML frameworks do, on some level, construct a graph, but that does not necessarily mean the web platform should expose the same level
… that's not the real point, there are many ops that are common across ML models
… this is the reason we went in this direction with Android NN API
… our experience from the NN API on Android, and from TF, is that there is no end to the number of ops that can be useful
… we have two major sets of ops for TF
… on the one hand we have TF that is research focused
… engaging with the research community, thus adding new ops almost daily, lots of churn
… TF now has over 1000 ops, historically growing double digits per year
Jonathan: TF Lite focuses on a smaller set of ops
… that smaller set of ops has grown from 30-40 ops MVP to 120 ops today
… that's pretty similar to ONNX opset
… could be the outcome of WebNN API too
… but even there, we already feel pulled toward too many operations, and there are so many models people want to run that cannot be expressed
… Google has been working on the next version of the NN API, taking a different approach
… we're trying to find lower-level building blocks for that
… I prefer to call them instructions, where ops are higher level than instructions
… instructions end up in a graph
… so discussion is whether we have a graph of ops, graph of instructions, or a mix of both
… we'd like to be cautious about going too high-level for a web standard
… we want to make sure we come up with a low-level set of instructions
… that could potentially be done by a graph API, maybe also with a model loader API
… as people in this group have said, these approaches are complementary
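[Toy illustration of the ops-vs-instructions distinction, in plain JavaScript on number arrays rather than any proposed API: a higher-level op such as softmax can be expressed as a small composition of lower-level primitives, which is the kind of decomposition an instruction-level set would target.]

  // Plain JavaScript, no proposed API: softmax built from a few lower-level
  // primitives (max, subtract, exp, sum, divide). A high-level op set would
  // expose softmax directly; an instruction-level set would expose only the
  // primitives and let frameworks compose them.
  const vmax = (xs) => Math.max(...xs);
  const vsub = (xs, s) => xs.map((x) => x - s);
  const vexp = (xs) => xs.map(Math.exp);
  const vsum = (xs) => xs.reduce((a, b) => a + b, 0);
  const vdiv = (xs, s) => xs.map((x) => x / s);

  function softmax(xs) {
    const shifted = vsub(xs, vmax(xs)); // subtract the max for numerical stability
    const exps = vexp(shifted);
    return vdiv(exps, vsum(exps));
  }

  console.log(softmax([1, 2, 3])); // ≈ [0.090, 0.245, 0.665]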
Anssi: do we have an explainer doc for the low-level instructions proposal?
Jonathan: we're very early with this proposal
… we have Tensor Compute Primitives
… that's one initiative
… another one is XLA HLO
<ping_yu> TF-RISC
Jonathan: internally we're also working on a TF Reduced Instruction Set (TF-RISC)
Jonathan: implications for WebNN: in the case of TF Lite, it has 120 ops; the current version of TF-RISC only has ~20% of those
… it is quite different
Anssi: How about moving ahead with WebNN API now and see what comes in the future and adapt as needed?
Jonathan: That is a valid question; we would need to find out whether the WebNN API has enough value for web developers to be worth doing
… to answer that question, I'd like to ask for an update from Ningxin on some performance benchmarks, WebNN API vs. WebGPU/WebGL/Wasm: what is the performance benefit of pushing this through now, even if the WebNN API might not be the API that is there forever?
Ningxin: I can drop a link to my workshop talk, that has some PoC data in it, it can be used as a reference for performance gains
… Slide 11 has the performance numbers for a smartphone; we tested on two devices: a PC using OpenVINO and a smartphone using the NN API
[Ningxin shares slide 10]
Ningxin: performance numbers on PC using CPU, GPU, AI accelerator
… if we look at the difference: compared to Wasm with SIMD+threads, there's an 8x speedup with WebNN delegating to the native ML API underneath
… compared to WebGL, WebNN is ~5x faster
<Jonathan_Bingham> Are these performance numbers for latency of the first inference ? Or for subsequent inferences? Because there are startup costs.
ningxin_hu: we use VNNI and int8 with WebNN, so it's not an apples-to-apples comparison to Wasm using float32
Jonathan: Is this latency for the first inference, or running multiple times?
ningxin_hu: multiple inference runs
Jonathan: I'd like to get our engineer to work with you on this data, thanks!
ningxin_hu: On Slide 11, using an Android smartphone: on CPU, 2.4x faster using WebNN; on GPU, 4.5x; with the DSP and lower-precision int8 inference, a 10x speedup
Chai: Listening to Jonathan, a lot of what he said sounds very familiar to me as the lead of WinML, which is widely used by industry
… we are not necessarily making a decision between short vs long-term investment
… we work closely with all the ISVs, and all these vendors have been super focused on improving their hardware
… the performance numbers shared by Ningxin demonstrate what the hardware is capable of and what the frameworks utilize
… the web is already a lot behind in this sense; we can never know what might come in the future
… defining this set of ops to be forwarded to the OS and hardware is the way to close the gap, as the gap gets wider every year
… speaking from my first-hand experience working in this space for many years, with billions of devices out there running Windows and leveraging ML capabilities
… I don't think WebNN will be a stopgap solution
Dom: the short vs. long term is a very complex decision to make, and I'm happy the community is pursuing this question
… in the past we have had cases such as XHR being replaced by Fetch, and WebGL possibly being replaced with WebGPU
… what would be useful for me re WG formation is the timeline it takes to get to agreement on this question
… I'd be grateful if you could share the timeline with me
Jonathan: we saw the explainer today, and we saw the proposal for WG creation during the workshop
… I expect that over the course of the next couple of weeks we'll take a look at the materials available to understand this better
… before we do that prep work, we cannot give a timeline
Dom: Taking two weeks to make plans around this is totally fine.
… the tension is making sure the platform has the right tools for developers
… this is a delicate balance
Adjourn
anssik: Thank you all for attending!