anssik: welcome to the WebML CG's 2nd F2F, happy to see both new and old faces around
… on the agenda today on Day 1: intros, custom operations, MLIR (Multi-Level Intermediate Representation) exploration, Operation set
… on Friday Day 2 exploratory topics, standards track next steps, W3C workshop planning
anssik: Let's do a roundtable 30-sec intros: your affiliation & interests toward the group
anssik: I'm the chair working for Intel
nikhil: working for Google, Deeplearn.js co-author, want to bring the ecosystem forward, not familiar with W3C
ningxin_hu: Intel, CV and ML interest, OpenCV.js background
kenneth: Intel architect, W3C TAG rep, overseeing the architecture of the Web
Yongsun: Samsung, interested in ML in general
Dave: Payments network with many members, just interested in ML
Chimiming: University affiliated
Dean: Apple, interested in everything the group does, not ML specialist but I'll do my best connecting Apple experts, I work on WebKit project also Safari
Philip: Omnijar, working with DL for 13 years, with large companies, automotive, NVIDIA, ARM, interested in continuing to move commercial projects to the Web
Riju: Intel, Chromium developer, sensors, NFC, media capture, OpenCV, not using ML currently
Kangchan: ETRI Korea, working on standards in ITU, ML as a Service
Wenson: Apple, WebKit, interest in ML
Diogo: Brazil W3C office, NLP background and interest
Takio: Yahoo Japan, sensor processing, transcoding, interest in CV w/ ML
Sangwhan: TAG member, used to work for Opera, CV startup not affiliated with Web, I also do NLP
Frank: Inria France, curious of the group
Belem: Intel, responsible for the WebML polyfill
James: Google, working on Chrome, WebGL/GPU, interested in ML in Chrome
<Zakim> anssik, you wanted to say something
[ningxin presents the slides]
ningxin_hu: ML field is fast moving. Model architectures and ops are evolving quickly. This leads JS ML frameworks to have big op sets (e.g. TF.js has over 200 ops)
… Today's frameworks' ops are implemented in WebGL, WASM, and WebGPU
… WebNN’s built-in op set that focuses on hardware acceleration will be small and grow slowly
… Problem: this demands a way for library authors to write ops that can interop with built-in ops.
… Options: WebNN built-in ops interop with framework ops in WASM and WebGL/WebGPU (focus of this investigation)
Kenneth: can you mix Wasm and WebNN ops?
sangwhan: there's a GPU-CPU transfer with a performance cost
… WebNN provides a way to write custom op by a domain specific language (e.g. Kai’s proposal) (future exploration)
ningxin_hu: next subtopic, WebNN-WebGPU Interop
[showing example code of Conv + Add + Relu by TF.js WebGPU]
[showing example of compiling a WebNN op for a WebGPU device]
[scribe sees ~30 participants, not all names recorded in minutes]
[showing example of executing a WebNN op with a WebGPU op]
[ningxin showing a demo on his laptop]
ningxin_hu: custom build of Chromium on macOS
<ningxin_hu> conv input dims: [1,100,100,100] and filter dims: [3,3,100,100]
<ningxin_hu> WebGPU conv2d/add/relu elapsed time: 60.81 ms
<ningxin_hu> WebNN conv2d interops with WebGPU add/relu via ArrayBuffer elapsed time: 39.67 ms
<ningxin_hu> WebNN conv2d interops with WebGPU add/relu via WebGPUBuffer elapsed time: 22.11 ms
<ningxin_hu> WebNN conv2d with fused add/relu elapsed time: 21.11 ms
[the pasted text above is the output of a TF.js test case with the backend set to WebGPU]
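[As a rough illustration of the pasted timings, the relative speedups over the pure-WebGPU baseline can be computed directly; the numbers below are the ones ningxin_hu pasted above, the object and function names are the scribe's own:]

```javascript
// Timings (ms) pasted above from the TF.js-on-WebGPU test case.
const timings = {
  webgpuOnly: 60.81,          // WebGPU conv2d/add/relu
  interopArrayBuffer: 39.67,  // WebNN conv2d + WebGPU add/relu via ArrayBuffer
  interopWebGPUBuffer: 22.11, // WebNN conv2d + WebGPU add/relu via WebGPUBuffer
  fused: 21.11,               // WebNN conv2d with fused add/relu
};

// Speedup of each configuration relative to the pure-WebGPU baseline.
const speedup = (ms) => +(timings.webgpuOnly / ms).toFixed(2);

console.log(speedup(timings.interopArrayBuffer));  // 1.53
console.log(speedup(timings.interopWebGPUBuffer)); // 2.75
console.log(speedup(timings.fused));               // 2.88
```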
sangwhan: is the Chromium source available?
ningxin_hu: that's available
nikhil: how fast is the readback?
ningxin_hu: not yet tested that
dino: you can't use MPS, why is that?
ningxin_hu: different memory layout internally
dino: can you show conv operations, what they are doing?
… I was expecting to see a custom op, i.e. shader code
ningxin_hu: shader code is inside TF.js
ningxin_hu: subtopic, POC Implementation on MPS
… Reuse the same MTLDevice associated with WebGPUDevice.
… Get the MTLBuffer associated with input and output WebGPUBuffer.
… Allocate MPSImage for inputs with MTLDevice.
… Create MTLCommandBuffer from MTLQueue associated with WebGPUDevice.
… Encode a compute shader that copies and reorders data from MTLBuffer to MPSImage (MPSImage layout).
dino: is this a custom WebGPU implementation? Where do you decide to use MPS?
… so TF.js is running on top of WebGPU
… and this is an impl of WebNN on a custom build of Chromium
… using WebGPU infra underneath, which has a platform implementation, e.g. MPS
ningxin_hu: Encode MPSNNGraph/MPSCNNKernel to MTLCommandBuffer
… Encode a compute shader that copies and reorders data from output MPSImage to output MTLBuffer.
… Commit MTLCommandBuffer.
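[The copy-and-reorder steps above run as compute shaders on the GPU; as a plain-JS stand-in, a layout reorder between a linear buffer layout and a channel-major layout looks like this. The NHWC→NCHW mapping is illustrative only; the actual MPSImage layout is more involved:]

```javascript
// Reorder a tensor from NHWC (a typical linear buffer layout) to NCHW
// (a channel-major layout, standing in for what an MPSImage expects).
// This mirrors what the encoded copy/reorder compute shader does.
function nhwcToNchw(src, [n, h, w, c]) {
  const dst = new Float32Array(src.length);
  for (let in_ = 0; in_ < n; in_++)
    for (let ih = 0; ih < h; ih++)
      for (let iw = 0; iw < w; iw++)
        for (let ic = 0; ic < c; ic++) {
          const from = ((in_ * h + ih) * w + iw) * c + ic;
          const to = ((in_ * c + ic) * h + ih) * w + iw;
          dst[to] = src[from];
        }
  return dst;
}

// 1x2x2x2 example: NHWC [0,1,2,3,4,5,6,7] -> NCHW [0,2,4,6,1,3,5,7]
const reordered = nhwcToNchw(Float32Array.from([0, 1, 2, 3, 4, 5, 6, 7]), [1, 2, 2, 2]);
console.log(Array.from(reordered)); // [0, 2, 4, 6, 1, 3, 5, 7]
```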
ningxin_hu: Performance Summary
… Inference time (ms)
… WebGPU conv/add/relu 61.31
… WebNN conv interops with WebGPU add/relu via ArrayBuffer 43.42
… WebNN conv interops with WebGPU add/relu via WebGPUBuffer 23.06
… WebNN conv with fused add/relu 21.25
ningxin_hu: Copying/Reordering Optimization
… Inference time (ms)
… WebGPU conv x2 112.96
… WebNN conv + WebGPU conv 67.33
… WebNN conv x2 with reordering 24.53
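[For the two-conv chain, the benefit of keeping data in the device-optimized layout between WebNN ops can again be read off the slide's numbers; the names below are the scribe's, the timings are from the slide:]

```javascript
// Two-conv chain timings (ms) from the slide above.
const chain = {
  webgpuX2: 112.96,       // WebGPU conv x2
  mixed: 67.33,           // WebNN conv + WebGPU conv
  webnnX2Reorder: 24.53,  // WebNN conv x2 with reordering optimization
};

// Speedup over the WebGPU-only baseline; the reordering optimization
// (no layout round-trip between the two WebNN convs) gives ~4.6x.
const speedupOver = (ms) => +(chain.webgpuX2 / ms).toFixed(1);

console.log(speedupOver(chain.mixed));          // 1.7
console.log(speedupOver(chain.webnnX2Reorder)); // 4.6
```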
sangwhan: with this design, vendors targeting a single type of accelerator, what are the implications?
… if you were to implement this in a general browser, not OS bound, you'd have multiple accelerators, what's the story?
… you'd need to have compilers for every accelerator
… implementability question
… if you'd use the platform APIs, it'd be fine, but they can be limited in terms of support
dino: Apple's perspective is we want to offload to the hardware as much as possible
sangwhan: when testing the POC, did the inference affect the ref(?)
dino: same issue with WebGL/GPU
… issue if the background task freezes the computer
… battery and perf benefit for going to ML hardware
sangwhan: would be nice if everyone had these purpose-built accelerators
… curious of implications of that
dino: not sure what Android devices have AI accelerators
sangwhan: based on testing, could be NEON accelerated, or GPU, whatever the vendor had time to do
nikhil: also good to benchmark readback times from those accelerators
[skipping slides to Summary of WebNN-WASM interop slide]
ningxin_hu: WebNN ops allow access to vendor-specific CPU acceleration
… Interop between WASM ops and WebNN ops has overhead:
… - Memory copying between WASM heap and WebNN backend
… - Memory reordering, e.g. MKL-DNN blocked layout
… Executing a WebNN ops chain with opaque operands can avoid unnecessary overhead
… Support key ops that access hardware acceleration (#17), e.g. conv2d and matmul
… Support compiling and executing ops for devices, CPU or GPU (new issue?)
… Support interop with WebAssembly and WebGPU compute shader
… - Sharing ArrayBuffer with WASM op
… - Sharing WebGPUBuffer with WebGPU op (new issue?)
… Support executing ops chain with opaque operands (#11)
… - Leverage device optimized memory layout and avoid unnecessary memory reordering
… Explore custom op support by DSL (new issue?)
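[To make the "opaque operands" point (#11) concrete, here is a toy sketch contrasting per-op readback with executing a chain whose intermediates stay opaque on the device. All names are illustrative, not the WebNN proposal's API:]

```javascript
// Toy model of issue #11: intermediates kept as opaque device handles
// avoid a readback after every op.
let readbacks = 0;
const toDevice = (x) => ({ value: x });                     // "upload"
const fromDevice = (h) => { readbacks++; return h.value; }; // "readback"
const deviceOp = (h, f) => ({ value: f(h.value) });         // runs on "device"

// Eager style: each op's result is read back to script.
function eagerChain(x, ops) {
  let v = x;
  for (const f of ops) v = fromDevice(deviceOp(toDevice(v), f));
  return v;
}

// Opaque style: the chain runs on-device, one readback at the end.
function opaqueChain(x, ops) {
  let h = toDevice(x);
  for (const f of ops) h = deviceOp(h, f);
  return fromDevice(h);
}

const ops = [(v) => v * 2, (v) => v + 1, (v) => Math.max(v, 0)];
readbacks = 0; eagerChain(3, ops);  console.log(readbacks); // 3
readbacks = 0; opaqueChain(3, ops); console.log(readbacks); // 1
```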
dino: how do these numbers compare with true native frameworks, CoreML, TensorFlow native?
ningxin_hu: 10% WebNN overhead over native
nikhil: TensorFlow/WebGL vs. CUDA, CUDA 10x faster
???: what kind of model do you use?
ningxin_hu: we have multiple models for this experiment, we use conv kernels, MobileNet, Inception, ResNet50
… on our website we have bigger models, the model size constraints us
nikhil: CPU and non-CPU accelerators an issue, how to consider them in the context of custom ops, understand readbacks
???: what is the focus in terms of hardware targets of this group?
ningxin_hu: we have experience on Android phone with an AI accelerator, close to native perf
???: what is the scope of this work? Recommendation to define a higher level abstraction to be flexible
[hearing no concerns for the proposed tasks to investigate further]
ningxin_hu: I'm willing to take "Support compiling and executing ops for devices (new issue?)" task
… maybe Kai could help with "Explore custom op support by DSL (new issue?)"
dino: Apple could look at "Support key ops that access hardware acceleration (#17)" and provide feedback for that
nikhil: just filed issues for conv2d and matmul
… will move forward with issues #27 and #28
nikhil: disclaimer, I'm not a compiler person, but talked with Google experts on that field
nikhil: we're not proposing MLIR, just exploring this area
<jdarpinian> do you have a link to the slides?
[nikhil presenting MLIR slides]
???: XLA compiler spits out LLVM IR already?
… Domain specific optimizations, progressive lowering
… The TensorFlow compiler ecosystem has many “Graph” IRs, each with challenges
… Domain Specific IRs, Great! High-level domain-specific optimizations; Progressive lowering encourages reuse between levels
… Not great!
… Huge expense to build this infrastructure
… Reimplementation of all the same stuff:
… pass managers, location tracking, use-def chains, inlining,
… constant folding, CSE, testing tools, ….
… Innovations in one community don’t benefit the others
nikhil: let's talk about what MLIR is
… "An open source machine learning framework for everyone"
… Multi-Level Intermediate Representation
… "An open source program optimization framework for ... everyone"
… Abstraction Building Toolkit
… Reusable set of compiler passes for higher abstractions
… Targeting analysis/program optimization/code generation
… Open governance and part of LLVM
nikhil: MLIR has wide support across industry
nikhil: Extensible Operations Allow Multi-Level IR
… MLIR “Dialects”: Families of defined operations
… Example Dialects:
… TensorFlow, LLVM IR, XLA HLO, TF Lite, Swift SIL…
… Dialects can define:
… Sets of defined operations
… Entirely custom type system
… Customization hooks
… Constant folding, decoding
… Operation can define:
… Invariants on # operands, results, attributes, etc
… Custom parser, printer, verifier, …
nikhil: MLIR Type System - some examples
… f16, bf16, f32, … i1, i8, i16, i32, … i3, i4, i7, i57, …
… vector<4 x f32> vector<4x4 x f16> etc.
… Tensors, including dynamic shape and rank:
… tensor<4x4 x f32> tensor<4x?x?x17x? x f32> tensor<* x f32>
… Others: functions, memory buffers, quantized integers, other ... TensorFlow stuff, ...
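[The dynamic-shape types above, e.g. tensor<4x?x?x17x? x f32>, can be mimicked with a small matcher where '?' is an unknown dimension and '*' an unknown rank. This is an illustrative sketch, not MLIR tooling:]

```javascript
// Match a concrete tensor shape against an MLIR-style shape pattern:
// numbers must match exactly, '?' matches any single dimension,
// and the whole-pattern '*' (tensor<* x f32>) matches any rank.
function shapeMatches(pattern, shape) {
  if (pattern === '*') return true; // unranked tensor
  if (pattern.length !== shape.length) return false;
  return pattern.every((d, i) => d === '?' || d === shape[i]);
}

console.log(shapeMatches([4, 4], [4, 4]));                           // true
console.log(shapeMatches([4, '?', '?', 17, '?'], [4, 3, 3, 17, 8])); // true
console.log(shapeMatches([4, '?'], [5, 3]));                         // false
console.log(shapeMatches('*', [2, 7, 9]));                           // true
```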
nikhil: Applications of MLIR
… TensorFlow Lite Converter
… One of the focuses: usability
… Usability of TOCO is the top complaint among TFLite users
… Report why a model failed to convert
… Dialect types enable more checking & better reporting
… [MLIR] for the Web?
… Some facts from MLIR investigations
… Operator expansion is about 25% YoY for TensorFlow
… Hardware vendors will implement dialects
… Open governance
riju: regarding operator expansion, is there a fallback mechanism, even if with performance penalty?
nikhil: we'd need to use e.g. a Wasm polyfill
… MLIR dialect on the web -- thoughts
… No backwards compatible guarantees today from MLIR
… A dialect could be invented that is backwards compatible
… What does maintaining this look like?
… Web sourcemaps => Python code
… Immediately tells you whether Python code will execute in the browser
kenneth: web needs backwards compat, and we do not really do versioning on the Web
nikhil: how could maintaining backwards compatibility happen?
dino: LLVM IR is NOT well-suited as a web transport format
dino: a lot of lowering, what is the improvement?
dino: what is the scope of the group, all models interop with all devices?
… we could start with a set of ops everyone supports
nikhil: initially we wanted to support all ops
… then understood better growing the set slowly is a better approach
dino: our fear is, and I can be wrong, if the ecosystem becomes skewed toward TF models, so that those get hardware acceleration while some other models might not
nikhil: as a group we can grow that set so that it does not happen
dino: TF is growing fast, how's hardware adding ops?
nikhil: I think hardware vendors add new ops more slowly
kenneth: do any ops go away with time?
riju: any kind of ranking within these ops, what are used the most?
nikhil: TF has it, not sure if can make that data public
Philip: Swift for TF was a good experience from a usability perspective
… ML is no longer a domain of data scientists only, need good dev ergonomics
ningxin_hu: on which level of abstraction would the Web dialect of MLIR sit on?
nikhil: lower level things would evolve more slowly, but not sure at this point on which level the web dialect should be at
dino: generally Apple's position is that a high-level abstraction works well on the Web since it allows implementations to optimize
… we don't have a huge dataset, but JS is a good example
… not enough data yet on how Wasm goes
… if we did a Web dialect, it would be something like that, but we'd make it a bit higher-level than LLVM IR
nikhil: I'm wondering whether there's a level of abstraction between ops and LLVM IR we should target
anssik: what would be good next steps for the group re MLIR tasks?
nikhil: talking to MLIR people, it seems a bit too early still since it's a moving target
… concretely, I can try to figure out which ops are used, how many times an op is called
<HelloFillip> The link to Chris's talk on Swift for TensorFlow can be found here (as an example for other languages): https://www.youtube.com/watch?v=s65BigoMV_I
we'll defer Day 1 3rd topic "operation set" to Day 2 on Friday
thanks for attending, we'll see again on Friday!
<belem> Thanks Anssi!
Maybe present: anssik, Belem, Chimiming, Dave, Dean, Diogo, Frank, James, Kangchan, kenneth, nikhil, Philip, Riju, Sangwhan, Takio, Wenson, Yongsun