W3C

– DRAFT –
WebML CG F2F Day 2 – 20 September 2019

20 September 2019

Attendees

Present
Alex_Russell, Anssi_Kostiainen, Dave_Singer, Dean_Jackson, Dominique_Hazael-Massieux, dontcallmeDOM, Ehsan_Toreini, Nikhil_Thorat, Ningxin_Hu, Sushanth_Rajasankar, Tatsuya_Igarashi, Thomas_Steiner, wseltzer
Regrets
Chair
Anssi
Scribe
Anssi, anssik

Meeting minutes

Operation set

Compatibility study of 14 operators of Opset 10 #17

anssik: Nikhil filed issues for tracking op compatibility resolution for matMul and conv2d. This is a start, more to come; contributions are welcome!

[op compatibility] matMul #27

Nikhil: the compat study is to make sure the API supports all of the JS libraries we care about
… the API should be compatible with the platform-provided underlying APIs
… filed two issues for the bread-and-butter ops; we want to start small and slowly grow

dom: TF.js is an open source project that can evolve, can it evolve to match what happens here, or the other way around?

nikhil: matMul will not change, hopefully conv2d also does not change, they are such basic primitives
… TF.js's strict requirement is that we have the same API as TF to allow model sharing
… a breaking change to a signature is hard; supersetting TF is a possibility

dom: thanks, that helps, do you know if ONNX has similar constraints?

nikhil: I assume they are more flexible, but cannot speak on behalf of them
… worked with TF and other Google people to figure out how to find an abstraction that will be good for the next 10 years
… not sure ops is the right abstraction, 25% growth in ops YoY in TF alone
… maybe there's an abstraction below ops that could be standardized, with ops layered on top of that layer
… next, I'll introduce matMul op compat study issues findings
… matMul is easier, this signature is from numpy directly
… and much of this is from numpy docs
… two important notes for this op: there are other arguments for it, e.g. transpose
… maybe transposing should be a graph optimizer's task
… that would allow us to make the API surface smaller
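
A minimal sketch (illustrative only, not the proposed WebNN API) of the transposition point above: a plain 2-D matMul can keep a small signature if a graph optimizer folds a preceding transpose into the backend call.

  // Plain 2-D matMul; transposition is deliberately not a parameter.
  type Matrix = number[][];

  function matMul(a: Matrix, b: Matrix): Matrix {
    const out: Matrix = [];
    for (let i = 0; i < a.length; i++) {
      out.push([]);
      for (let j = 0; j < b[0].length; j++) {
        let sum = 0;
        for (let k = 0; k < b.length; k++) sum += a[i][k] * b[k][j];
        out[i].push(sum);
      }
    }
    return out;
  }

  function transpose(a: Matrix): Matrix {
    return a[0].map((_, j) => a.map(row => row[j]));
  }

  // A graph optimizer could detect matMul(transpose(a), b) and lower it to a
  // single backend call that reads `a` column-wise, so no transposeA flag is
  // needed in the public signature.
  const a: Matrix = [[1, 2], [3, 4]];
  const b: Matrix = [[5, 6], [7, 8]];
  console.log(matMul(transpose(a), b)); // [[26, 30], [38, 44]]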

dom: how confident are we that could be done?
… how's the implementation complexity from the implementers' (web engine) perspective?
… the way graph APIs work in general: you stitch together a graph, computation happens only when you feed the values into it, that's actually TF 1.0 style
… doing that from the user's perspective is complicated, TF 2.0 does eager mode instead, i.e. it runs when you call it, losing graph optimizations
… the hybrid approach is better for users, you get graph optimizations also in this case
… discussion ongoing on how to expose these to the browser; underneath there's always a graph
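
A rough sketch of the graph-vs-eager distinction, with invented names (neither TF.js nor the proposed API): a deferred graph records ops and computes only when run, which is what lets an optimizer rewrite it first; eager evaluation computes immediately.

  // Deferred graph: nodes record the op, nothing runs until run() is called.
  type Op = { kind: 'input' | 'add' | 'mul'; inputs: GraphNode[] };

  class GraphNode {
    constructor(public op: Op) {}
  }

  function input(): GraphNode { return new GraphNode({ kind: 'input', inputs: [] }); }
  function add(a: GraphNode, b: GraphNode): GraphNode { return new GraphNode({ kind: 'add', inputs: [a, b] }); }
  function mul(a: GraphNode, b: GraphNode): GraphNode { return new GraphNode({ kind: 'mul', inputs: [a, b] }); }

  // Evaluation happens here; an optimizer could rewrite the graph before this.
  function run(node: GraphNode, feed: Map<GraphNode, number>): number {
    if (node.op.kind === 'input') return feed.get(node)!;
    const [x, y] = node.op.inputs.map(n => run(n, feed));
    return node.op.kind === 'add' ? x + y : x * y;
  }

  const x = input();
  const y = input();
  const z = mul(add(x, y), y);                    // TF 1.x style: build now, run later
  console.log(run(z, new Map([[x, 2], [y, 3]]))); // 15

  const eager = (2 + 3) * 3;                      // TF 2.x style: computed immediately
  console.log(eager);                             // 15, but with no graph to optimize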

sushraja_MSFT: does graph rewriting result in losing the ability to use the API for training?

nikhil: good feedback from Benjamin/WebKit on this issue re CoreML & MPS
… need to understand how other accelerators deal with the concept of accumulator precision

ningxin_hu: it relates to our experiment on Android NN API

dom: should this be a capability that is exposed?

nikhil: the question becomes, what accelerators do we want to support?
… conv2d precision could differ between devices, e.g. mobile vs. laptop, and could lead to severely different results; this is not theory, we see this happen in TF
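
A small illustration of the accumulator-precision point (not tied to any particular backend or device): the same dot product accumulated in single precision (emulated here with Math.fround) vs. double precision gives slightly different answers, which is the kind of device-to-device divergence described above.

  // Double-precision accumulator.
  function dotFp64(a: number[], b: number[]): number {
    let acc = 0;
    for (let i = 0; i < a.length; i++) acc += a[i] * b[i];
    return acc;
  }

  // Single-precision accumulator, emulated by rounding every step to float32.
  function dotFp32(a: number[], b: number[]): number {
    let acc = 0;
    for (let i = 0; i < a.length; i++) acc = Math.fround(acc + Math.fround(a[i] * b[i]));
    return acc;
  }

  const a = Array.from({ length: 10000 }, (_, i) => 1e-3 + i * 1e-7);
  const b = Array.from({ length: 10000 }, () => 1.0001);
  console.log(dotFp64(a, b)); // high-precision accumulation
  console.log(dotFp32(a, b)); // drifts slightly; the gap grows with tensor size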

Gabe: when is the operator going to be its own variant?

ningxin_hu: question is also, do you want to open a quantization issue or is the precision issue enough?

dom: broadly, how do you handle capabilities, do you want to allow different code paths based on underlying capabilities?

nikhil: in TF we let this happen, we don't throw; things just work, but I expect the model to work the same on a phone and a desktop/laptop

ningxin_hu: decision should be done by frameworks, API should expose the capabilities

sushraja_MSFT: something to think about is whether to expose hardware capability or have the UA automatically fall back to a slower code path

ningxin_hu: question: there's a TODO, I want to know why you chose matMul over fully connected

nikhil: to get the discussion started :-)
… matMul is the simplest thing

ningxin_hu: we can contribute our input from POC to the compat study

nikhil: conv2d(x, filter, padding, strides, dilations)
… padding, everyone does this differently
… it gets fun with tensor layout: the shape of x can be represented in many ways, channels can be transposed, different hardware supports different layouts, e.g. CUDA has a different way to transpose
… thought about this with Daniel; from the user's POV, if the web developer does not need to think about the underlying platform we're in a better place, so the proposal is to choose just one format, even if the internal representation is different
… browsers have already chosen a particular format, channels not transposed outside
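
A layout sketch (helper names invented, not part of any proposal) of what is at stake: the same 1x2x2x3 image in NHWC (channels last) vs. NCHW (channels first). Picking one canonical layout for the API means at most one internal re-layout per backend.

  type Shape4D = [number, number, number, number];

  // Flat row-major offset of an index tuple within a shape.
  function offset(idx: number[], shape: number[]): number {
    return idx.reduce((acc, i, d) => acc * shape[d] + i, 0);
  }

  function nhwcToNchw(data: Float32Array, [n, h, w, c]: Shape4D): Float32Array {
    const out = new Float32Array(data.length);
    for (let ni = 0; ni < n; ni++)
      for (let hi = 0; hi < h; hi++)
        for (let wi = 0; wi < w; wi++)
          for (let ci = 0; ci < c; ci++)
            out[offset([ni, ci, hi, wi], [n, c, h, w])] =
              data[offset([ni, hi, wi, ci], [n, h, w, c])];
    return out;
  }

  // 1x2x2x3 NHWC tensor: each pixel stores [r, g, b] contiguously.
  const nhwc = new Float32Array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]);
  console.log(nhwcToNchw(nhwc, [1, 2, 2, 3]));
  // NCHW: all reds, then greens, then blues:
  // [1, 4, 7, 10, 2, 5, 8, 11, 3, 6, 9, 12]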

ningxin_hu: two re-layouts can happen: with constants, and when feeding images into your network, which is done for every frame

dom: there seems to be no benefit in picking one over the other, it's a matter of optimization; exposing a capability is not useful here

ningxin_hu: align with media element layout by the underlying implementation

dom: one mental model is easier to map

ningxin_hu: per earlier discussion, we accept ArrayBuffer; investigating WebGL buffer, texImage2D, video or canvas as inputs to be fed to this API
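
A sketch only, with an invented type name: the input kinds mentioned above, written as a TypeScript union a future compute entry point could accept.

  // Hypothetical union of input sources under discussion; not a proposed interface.
  type TensorInput =
    | ArrayBuffer
    | ArrayBufferView      // e.g. Float32Array
    | WebGLBuffer          // GPU-resident data
    | WebGLTexture         // e.g. filled via texImage2D
    | HTMLVideoElement     // per-frame video input
    | HTMLCanvasElement
    | ImageData;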

ningxin_hu: activation: per our native API experience, fused activation etc. can be done by the underlying graph optimizer; the current direction is a group with a small number of ops, leaving the rest to custom ops; need to investigate if we can optimize more, since optimizers do not work with custom ops (?)
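
A hypothetical sketch (names invented) of the fusion an underlying graph optimizer could perform: a conv2d followed by relu is rewritten into one conv2d with a fused activation, the pattern native APIs commonly accept; an opaque custom op in between would block such a rewrite.

  interface GraphOp {
    kind: string;
    inputs: GraphOp[];
    fusedActivation?: 'relu' | 'none';
  }

  // Bottom-up pass: rewrite relu(conv2d(...)) into conv2d(..., fusedActivation: 'relu').
  function fuseActivations(node: GraphOp): GraphOp {
    const inputs = node.inputs.map(fuseActivations);
    if (node.kind === 'relu' && inputs.length === 1 && inputs[0].kind === 'conv2d') {
      return { ...inputs[0], fusedActivation: 'relu' };
    }
    return { ...node, inputs };
  }

  const graph: GraphOp = {
    kind: 'relu',
    inputs: [{ kind: 'conv2d', inputs: [{ kind: 'input', inputs: [] }] }],
  };
  console.log(JSON.stringify(fuseActivations(graph)));
  // {"kind":"conv2d","inputs":[{"kind":"input","inputs":[]}],"fusedActivation":"relu"}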

nikhil: we need discussion on the underlying APIs

anssik: currently our charter says: "The APIs in scope of this group will not be tied to any particular platform and will be implementable on top of existing major platform APIs, such as Android Neural Networks API, Windows DirectML, and macOS/iOS Metal Performance Shaders and Basic Neural Network Subroutines."

nikhil: we should look at each of those

anssik: ningxin_hu do you think you can help with this part?

ningxin_hu: yes, we've already done this work in a separate spreadsheet

Supported ops

ningxin_hu: let's look at the supported ops table we've collected

ningxin_hu: listing different op types and their compatibility across Wasm, WebGL, NNAPI, MPS, BNNS, clDNN, MKLDNN, DirectML
… NN API and MPS have good coverage, DirectML with some compat issues documented in this table

Native mapping

ningxin_hu: this is a little bit complex, this table tries to map the native capability, API and parameters, with compat issues marked with notes
… e.g. for MPS padding, we have 4 places that need to be padded (illustrated just below), open question how to do the right padding
… for DirectML, we can extract static conv2d information into this table and provide it as input to the compat study in progress
… we want to get data on how ops are defined and which ops are supported by native platforms
… also an up-level compat study, looking at frameworks
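
An illustration of the "4 places to be padded" point, using TF-style SAME padding as an example convention (not necessarily what the spec would pick): per spatial axis a total padding is computed and split into begin/end, yielding explicit top/bottom/left/right values that a backend such as MPS needs.

  // SAME padding for one spatial axis: total pad split into [begin, end].
  function samePadding(inSize: number, filter: number, stride: number, dilation = 1): [number, number] {
    const effectiveFilter = (filter - 1) * dilation + 1;
    const outSize = Math.ceil(inSize / stride);
    const total = Math.max((outSize - 1) * stride + effectiveFilter - inSize, 0);
    const begin = Math.floor(total / 2);
    return [begin, total - begin];
  }

  const [top, bottom] = samePadding(224, 3, 2); // height axis
  const [left, right] = samePadding(224, 3, 2); // width axis
  console.log(top, bottom, left, right); // 0 1 0 1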

Performance

ningxin: WebNN POC perf data for DirectML and WebGL backends
… our POC is open source, code available so you can run these benchmarks yourself
… models used are official TFLite and ONNX models

ningxin: summary, very promising performance speedup, opportunity for better perf with further optimization

ningxin: across platforms we see similarly good speedups, not just Windows

anssik: how much work was it to produce these op compat study items?

nikhil: it was some work, not trivial

Standards track next steps

anssik: Wanted to discuss two things: 1) near-term goal to produce an Explainer document that complements the spec that helps solicit early review from W3C TAG; 2) incubation to standards track transition, invited Dom of W3C Staff to talk about this.

Explainer document #18

anssik: Web specs are expected to be reviewed by W3C's Technical Architecture Group (TAG), and the best practice is to seek such TAG review earlier rather than later in the spec design process.

Web Neural Network API Explained template

anssik: This is a collective group action.

https://github.com/immersive-web/webxr/blob/master/explainer.md

anssik: we could "copy with pride" the WebXR explainer's approach; it includes e.g. a target hardware section

Alex: supporting explainer-driven spec design

Sushanth_Rajasankar: also splitting a spec into modules is one possible design approach

ningxin: what if we have alternative designs and don't yet know which to pick?

alex: explainer is the right place for those, put your alternative designs in the explainer

dom: any discussion in the TAG on the architectural decision record?

alex: my understanding is about 8 months old, not sure at this point

ningxin: what is the process to update explainer?

anssik: PR with review

Towards W3C Standardization (slides)

anssik: Hand over to Dom

dom: W3C Standardization aka Recommendation Track
… build shared understanding when to advance
… Happens in Working Group
… Under strong Royalty-Free policy
… Following a well-defined process to enable:
… - Fairness and consensus
… - Architectural consistency with the platform
… - Proper review from security, privacy, internationalization, and accessibility perspectives (as applicable)

dom: When?
… Incubation in Community Group
… transition to WG when:

W3C Recommendation Track Readiness Best Practices

dom: - Rough agreement on shape & scope of API
… - Some early implementation experience
… - Before it is too late to evolve based on broader input

dom: How?
… Find a W3C Staff contact to help, e.g. Dom :-)
… Draft a charter reflecting the Community Group's view
… Build momentum in the broader W3C community (cf. workshop)
… Iterate based on reviews (from W3C and others)
… Get formal approval

dom: What about the CG then?
… Various possible approaches:
… - Keep CG to incubate new proposals (e.g. Immersive Web, Web Assembly, Web Audio)
… - Pause the CG while the standardization work happens (may relaunch afterwards)
… - DIY

anssik: thanks Dom, any questions?

nikhil: what is a typical timing?

dom: Immersive Web was 4-5 years in incubation
… Wasm incubated for 2 years
… there are no rules really, it depends on where you are in your design process

nikhil: how to evaluate maturity?

dom: interest from the target community; the key thing is making sure whatever you produce gets adoption
… when you see that implementers are behind the rough shape of the API, it is a good time to graduate

W3C Workshop

anssik: asked Dom to talk to us about Workshops and how they help in getting wider community engaged around a web spec proposal

Towards a Web & Machine Learning Workshop

dom: What is a W3C workshop?
… Open to anyone with relevant expertise
… Broader perspective than specific CG/WG
… Opportunity to hear from more relevant communities
… Typically, 2-days event

dom: W3C Workshop examples
… Web & Virtual Reality Workshop in 2016
… Web & Games Workshop in June 2019

W3C workshop archive

dom: Why a W3C Workshop on Machine Learning?
… Lots of energy in WebML CG
… Lots of interest from many connected but not yet involved communities
… Opportunity to put WebNN in broader context of ML in browser

dom: Possible topics
… WebNN in context
… Integrate ML with browser data sources (e.g. sensors)
… Integration with WebGPU, WASM
… Relation to Speech Recognition, Shape detection
… Relation to / integration with cloud-based inference

More topic proposals

nikhil: anyone who implements the underlying APIs would be a great participant in such a workshop
… also the MLIR folks are an important group of people
… anyone from the hardware side would also be a very welcome participant
… what is the format of the workshop?

dom: program committee to decide, can be short talks, discussions, lightning talks

nikhil: lightning talks would work in this context, since this is such a cross-team effort

Dave_Singer: you want to look at opportunities where standards would already help build a market
… Apple would probably be interested in participating

dom: How, where and when
… Define a call for participation
… Establish a small team to do outreach, research and set agenda
… Where? Offer in Berlin - others?
… When? Q1 2020: last 2 weeks of March?

nikhil: TensorFlow Dev Summit is probably in March 2020, we want to avoid overlap

anssik: Thank you all for participating, see you at the W3C Workshop on Web & Machine Learning in Q1 2020 maybe? :-)

Adjourn

Minutes manually created (not a transcript), formatted by Bert Bos's scribe.perl version Mon Apr 15 13:11:59 2019 UTC, a reimplementation of David Booth's scribe.perl. See history.
