W3C

– DRAFT –
WebML CG Teleconference – 8 August 2019

08 August 2019

Attendees

Present
Anssi_Kostiainen, Daniel_Smilkov, Gabe_Esteven, Ganesan_Ramalingam, Greg_Whitworth, James_Darpinian, Jonathan_Bingham, Kai_Ninomiya, Nikhil_Thorat, Ningxin_Hu, Paul_McDaniel, Rafael_Cintron
Regrets
Thomas_Steiner
Chair
Anssi
Scribe
Anssi (anssik)

Meeting minutes

<Nikhil> :)

Define the set of operations and their specification

Define the set of operations and their specification #17

anssik: we had a review of the proposed resolution and received good feedback we need to resolve, let's discuss that now.
… the objective of this call is to resolve objections raised against the proposed resolution and to clarify the proposed resolution based on feedback where appropriate

To start, I captured the following questions from issue #17 we need to resolve:

nsthorat: "An important part of this specification will be ensuring this set of ops are compatible with the major ML JavaScript frameworks [...] it's not possible for us to move forward with this resolution without understanding compatibility."

jbingham: "what's the plan for dealing with versioning?"

jbingham: "How are custom ops defined and included in the graph?"

walrusmcd: "How many ops?"

jbingham: "Decide if a graph API is the right thing to standardize on"

anssik: To summarize, we need to choose a set of operations to be included in the API that enables adequate compatibility with the major ML frameworks

<Zakim> Nikhil, you wanted to talk about something and to talk about onnx & tf lite compatibility doc: https://docs.google.com/document/d/1RXCkZ9mliWbqSakYvNlWhsRH4yFtnpe1YQQNFAIRZo8/edit

Nikhil: shared a doc on the chat, please take a look
… spent time looking at compat, started with 2 basic ops, tried to understand the diff
… starting with a low number of ops is our preference, growing that over time to understand the compat issues of each op

danielsmilkov: this is about diffing libs, looking into possible compat issues; 1) when comparing with NN API, e.g., some ops allow fusing; we propose separate ops with no fused kernels, under the hood an implementer could fuse the ops so they run great on particular hardware
… ONNX is opinionated regarding the layout, TF Lite wants channels to come last, different hardware prefers channels first or channels last
… which layout is better changes over time
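
As a concrete illustration of the fusing point, a minimal sketch with hypothetical builder-style names (not any agreed Web NN API): the spec would expose separate ops, and an implementation stays free to fuse them under the hood.

  // Hypothetical, illustrative names only; not an agreed Web NN API.
  interface Tensor { dims: number[] }

  // The spec would define conv2d and relu as separate, unfused ops:
  function conv2d(input: Tensor, filter: Tensor): Tensor { return input; } // stub
  function relu(input: Tensor): Tensor { return input; } // stub

  const image: Tensor = { dims: [1, 3, 224, 224] };
  const weights: Tensor = { dims: [8, 3, 3, 3] };

  // An implementation that recognizes the relu(conv2d(...)) pattern in the
  // graph may lower it to a single fused kernel on hardware that has one,
  // without a fused op ever appearing in the spec.
  const activated = relu(conv2d(image, weights));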

Nikhil: would prefer to start very small with a POC that works, and have a plan for how to grow that set of ops
… probably need a way to deal with custom ops, a way in app space to describe custom ops that share memory with matmul

Rafael: I agree with a plan to keep this hardware agnostic

<Nikhil> awesome! that would be great.

<Nikhil> regarding the script to convert onnx / tensorflow

Paul: basically yes to everything Rafael said, the goal is to be hw agnostic
… ONNX has done work on channel formats, hit these same issues, and proposed solutions

<Zakim> Ningxin_Hu, you wanted to talk about op set & use cases

Ningxin_Hu: thanks to Nikhil and Daniel for their efforts, great work! Agree with the approach of starting with a small set of ops and validating compat with JS libs
… a proposal for how to grow the op set: add ops that are needed to implement the identified use cases

https://webmachinelearning.github.io/webnn/#usecases

<Ningxin_Hu> op set and use cases: https://github.com/webmachinelearning/webnn/issues/17#issuecomment-508426036

[silence, agreement]

Paul: we took an approach where we selected the ops that benefit from hw acceleration
… a somewhat similar approach to CUDA

Ningxin_Hu: if we only select expensive ops that benefit from hw, that may impose a perf penalty when context switching

Paul: I agree, it might be worth prototyping that now; the assumption we're proposing is that this hybrid approach (w/ WebGL) is viable

<jonathan> What other ML frameworks should review each op, like Daniel did for TensorFlow, and confirm compatibility before we finalize the definition?

Ningxin_Hu: agree with Paul's comments; when interleaving with Wasm in a POC, the overhead was significant

Rafael: CPU readback is slow, staying with GPU compute shaders should work pretty well

jdarpinian: I'm on the Chrome team and think custom ops based on WebGL can work, but will be very complex to implement

<Nikhil> We think it's important to be able to have custom operations share memory with conv2d / matmul without doing a readback. For CPU accelerators, share the buffer with Wasm; for GPU accelerators, share the buffer with WebGL

jdarpinian: portability of custom ops between different systems, CPU and GPU, is not very good

<Nikhil> this allows us to grow the spec slowly and not have tail-end ops be bottlenecks, and the webnn accelerated ops can get quick wins by accelerating the bottleneck ops (conv2d, matmul, etc.)
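
To make the zero-copy idea concrete, a minimal sketch (every name here is hypothetical, for illustration; no such API surface has been agreed): the output of a built-in accelerated op stays on the GPU and is handed to a custom op without a CPU readback.

  // Hypothetical, illustrative names only; not an agreed Web NN API.
  type TextureHandle = number; // stand-in for a WebGLTexture

  class GPUTensor {
    constructor(readonly texture: TextureHandle) {}
  }

  // Built-in accelerated op: its result lives on the GPU.
  function builtinMatMul(a: GPUTensor, b: GPUTensor): GPUTensor {
    return new GPUTensor(42); // texture written by the GPU kernel
  }

  // Custom op in app space: consumes the same texture directly.
  function customOpWebGL(input: TextureHandle): TextureHandle {
    return input; // a real custom op would run a WebGL shader here
  }

  const out = builtinMatMul(new GPUTensor(1), new GPUTensor(2));
  customOpWebGL(out.texture); // zero-copy handoff: no readback to the CPU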

Paul: I think Ningxin_Hu posted an architecture diagram

arch diagram

Paul: frameworks will do the heavy lifting, web developer won't see the complexity

Nikhil: we think the same, but not all devices have a WebGL backend, so fall back to Wasm for example

Ningxin_Hu: about custom ops, folks talked about memory transfer overhead
… even long SIMD instructions on the CPU can require tensor memory re-layout, an expensive operation
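
For a sense of why re-layout is expensive, a minimal illustrative sketch: converting a tensor from channels-last (NHWC) to channels-first (NCHW) is a full O(n) pass that touches every element, which can dominate the runtime of an otherwise cheap op.

  // Illustrative only: re-laying out NHWC data as NCHW copies every element.
  function nhwcToNchw(
    src: Float32Array, n: number, h: number, w: number, c: number
  ): Float32Array {
    const dst = new Float32Array(src.length);
    for (let i = 0; i < n; i++)
      for (let y = 0; y < h; y++)
        for (let x = 0; x < w; x++)
          for (let k = 0; k < c; k++)
            dst[((i * c + k) * h + y) * w + x] = src[((i * h + y) * w + x) * c + k];
    return dst;
  }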

anssik: it was asked in the issue whether a graph is the right abstraction?

jonathan: what are the other JS frameworks we need to include in the compatibility study?

Paul: in ONNX we considered all frameworks that matter, they have a voice in the ONNX project
… in ONNX we have considered PyTorch, Caffe, Intel's frameworks, Microsoft's frameworks, TensorFlow (we have an ONNX to TF converter), Apple's CoreML
… CoreML was part of the opset 1 compatibility work

Nikhil: specifically interested in JS ML frameworks
… for compatibility
… for example, Brain.js

Paul: we don't want to have two bodies managing op schema, right?

Nikhil: we want to grow slowly, right?
… focus on web stuff to figure out an intersection of JS ML libraries, does that sound reasonable?

Paul: ONNX does have namespace and versioning concepts, so we could create our own ONNX namespace for the ops referenced by the Web NN API
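
For context, ONNX models already declare the operator namespaces and versions they depend on via opset imports; a dedicated domain for the Web NN ops could reuse that mechanism. A sketch of the idea as a plain object (the "org.webmachinelearning" domain name is made up for illustration):

  // Mirrors ONNX's opset_import mechanism: each model lists the operator
  // domains (namespaces) and opset versions it depends on.
  const opsetImports = [
    { domain: "ai.onnx", version: 10 },               // standard ONNX ops
    { domain: "org.webmachinelearning", version: 1 }, // hypothetical Web NN domain
  ];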

Rafael: it is up to us to decide how many ops to adopt; the op definitions themselves would come from the ONNX standards body

danielsmilkov: that makes sense; to be clear, because of portability issues and JS libs as users, some changes to ONNX may be needed, e.g. memory layout

Paul: that's fairly reasonable, ONNX community would certainly welcome that

danielsmilkov: relaxing, not breaking, existing ONNX behaviour
… going to custom ops
… we deal with real models every day, need to add ops to TF; interoperability is important for e.g. pre- and post-processing of media, video

jdarpinian: also need to look into the hardware we want to support, there's a lot of hardware out there and new hardware coming up, e.g. neural engines in ARM chips

Nikhil: that's a good point; e.g. for matmul it would be good to do homework checking how that works across all hardware

anssik: Daniel and Nikhil, could you move your doc https://docs.google.com/document/d/1RXCkZ9mliWbqSakYvNlWhsRH4yFtnpe1YQQNFAIRZo8/edit#heading=h.n1gbg8k8lggq into a GH issue?

Nikhil: yes, we'll do that

danielsmilkov: in GH issue #17 there's a comment where Ningxin_Hu proposed 14 ops; we could do the work to split these 14 ops into 3-4 GH issues with some logical bundling

PROPOSED RESOLUTION: The specification will reference the ONNX operations, and if there are any improvements desired for ONNX, that work should happen there.

<Ningxin_Hu> 14 ops proposal: https://github.com/webmachinelearning/webnn/issues/17#issuecomment-512651711

PROPOSED RESOLUTION: The specification will reference a subset of the ONNX operations, starting small, adding more ops when compatibility with major ML JavaScript frameworks has been validated

kainino: we want to point out it's important to not only understand the current and upcoming hardware; since the browser runs in userspace
… we also need to run on top of the userspace APIs (NNAPI, CoreML, DirectML), so we are constrained by how they expose things

Nikhil: sharing memory with custom ops needs to be better understood
… can you, Ningxin_Hu, do that investigation?

Ningxin_Hu: with help from James or Kai we could make progress on the custom ops issue

Rafael: I have bandwidth to help, but not time to drive

jdarpinian: the same, can help but not drive

Ningxin_Hu: I can take the lead, with help from others

PROPOSED RESOLUTION: The specification will reference a subset of the ONNX operations, starting small, adding more ops when compatibility with major ML JavaScript frameworks has been validated

<kainino> Ningxin_Hu: Please reach out to us as needed

<kainino> oops that's supposed to be @Ningxin_Hu

<Ningxin_Hu> thanks @kainino

https://www.w3.org/2019/09/TPAC/

anssik: any concerns with the amended proposed resolution?

[hearing no concerns]

Resolved: The specification will reference a subset of the ONNX operations, starting small, adding more ops when compatibility with major ML JavaScript frameworks has been validated

Adjourn

Summary of resolutions

  1. The specification will reference a subset of the ONNX operations, starting small, adding more ops when compatibility with major ML JavaScript frameworks has been validated
Minutes manually created (not a transcript), formatted by Bert Bos's scribe.perl version Mon Apr 15 13:11:59 2019 UTC, a reimplementation of David Booth's scribe.perl. See history.

Diagnostics

Maybe present: anssik, danielsmilkov, jbingham, jdarpinian, jonathan, kainino, Nikhil, nsthorat, Paul, walrusmcd