SIMD operations in WebGPU for ML

How subgroup operations will help ML in the browser

Mehmet Oguz Derin @mehmetoguzderin

Table of Contents

  • Concepts
  • Impact on ML
  • Web Interface
  • Operations
  • Hardware Support
  • Roadmap

Subgroups

  • Subdivision of threadgroups

  • Also known as simdgroups, warps, or waves

  • Subgroup operations make sharing and reducing data across threads in a subgroup measurably faster

  • Their size varies across GPUs

  • We can have these operations in WGSL (WebGPU Shading Language)
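To give a feel for the idea, here is a minimal sketch (Python, not WGSL) simulating what a subgroup reduction such as subgroupAdd does for one hypothetical subgroup: every thread contributes a value, and every thread receives the combined result without going through shared memory.

```python
# Simulate subgroupAdd semantics for a single subgroup:
# each thread holds one value, and after the operation every
# thread in the subgroup observes the sum of all values.
def subgroup_add(values):
    total = sum(values)           # reduction across the subgroup
    return [total] * len(values)  # every thread gets the same result

# Hypothetical subgroup of size 4, one value per thread.
print(subgroup_add([1.0, 2.0, 3.0, 4.0]))  # every thread sees 10.0
```

On real hardware this exchange happens in registers within the SIMD unit, which is why it is measurably faster than a shared-memory reduction.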

Impact on ML

  • Exploratory Data Analysis
  • Model Fine Tuning
  • Edge Inference

How:

  • ~2x reduced runtime
  • Reduced power consumption
  • Intuitive calculations [0]
[0] It is important to note that GPUs have no atomics or advisable locking mechanism for floating-point numbers.
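Since floating-point atomics are unavailable, the usual pattern is to reduce within each subgroup first and let only the first thread of each subgroup emit a partial sum, shrinking the number of cross-thread combinations needed. A rough Python sketch of that pattern, assuming a hypothetical subgroup size of 4:

```python
# Sketch of a workgroup-level float sum without atomics:
# each subgroup reduces its values (as subgroupAdd would),
# only the first thread of each subgroup keeps the partial
# result, and the partials are combined in a final step.
def workgroup_sum(values, subgroup_size=4):
    partials = []
    for i in range(0, len(values), subgroup_size):
        chunk = values[i:i + subgroup_size]
        partials.append(sum(chunk))  # subgroupAdd result, kept by thread 0
    return sum(partials)             # final combine over partial sums

print(workgroup_sum([1.0] * 8))  # 8.0
```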

Web Interface

Subgroup operations exposed to the web are:

  • Compute stage only
  • Active threads only
  • Non-uniform
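The "active threads only" restriction means that threads which have diverged (e.g. taken a different branch) neither contribute to nor receive the result. A small Python sketch of that behavior for a single hypothetical subgroup:

```python
# Simulate subgroupAdd with an active mask: only threads that are
# active at the call site contribute a value, and only active
# threads observe the reduced result.
def subgroup_add_active(values, active):
    total = sum(v for v, a in zip(values, active) if a)
    return [total if a else None for a in active]

# Thread 2 is inactive (it diverged on a branch), so it neither
# contributes its value nor receives the sum.
print(subgroup_add_active([1.0, 2.0, 3.0, 4.0],
                          [True, True, False, True]))  # [7.0, 7.0, None, 7.0]
```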
Basic Operations

  • subgroup_size
  • subgroup_invocation_idx
  • subgroupIsFirst

Vote Operations

  • subgroupAll
  • subgroupAny

Arithmetic Operations

  • subgroupAdd
  • subgroupMul
  • subgroupMin
  • subgroupMax
  • subgroupAnd
  • subgroupOr
  • subgroupXor

Arithmetic Prefix Operations

  • subgroupPrefixAdd
  • subgroupPrefixMul

Ballot Operations

  • subgroupBallot
  • subgroupBroadcastFirst
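To make the vote and ballot categories concrete, here is a hedged Python sketch of their semantics for one subgroup (the names are simulations of the listed WGSL operations, not real API calls): votes return a single boolean to every thread, a ballot returns a bitmask of which threads' predicates held, and broadcast-first shares the first active thread's value with the whole subgroup.

```python
# Simulated semantics for one subgroup of active threads.

def subgroup_all(preds):
    # True on every thread iff the predicate holds on all active threads.
    return [all(preds)] * len(preds)

def subgroup_any(preds):
    # True on every thread iff the predicate holds on any active thread.
    return [any(preds)] * len(preds)

def subgroup_ballot(preds):
    # Bitmask result: bit i is set when thread i's predicate holds.
    mask = sum(1 << i for i, p in enumerate(preds) if p)
    return [mask] * len(preds)

def subgroup_broadcast_first(values):
    # Every thread receives the first active thread's value.
    return [values[0]] * len(values)

preds = [True, False, True, True]
print(subgroup_ballot(preds))  # 0b1101 = 13 on every thread
```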

Hardware Support

Desktop: available everywhere
Mobile: supported by most next-generation chips

Roadmap

Raised concerns mostly fall out of scope for this PR and are not blockers for adoption as-is

Technically, this can make it into the MVP as a good addition to the WGSL standard library

Thanks for your attention!

You can check out the PR itself: #954