WebML WG Teleconference – 27 January 2022

Meeting minutes

Security review

anssi: we received feedback from the Chrome security team
… this helps us with the wide review of the WebNN spec on our path to CR

General Security Questions - 1. new scripting language

General Security Questions #241

WebNN Security and Privacy Self-Review Questionnaire responses

2.9. Do features in this specification enable new script execution/loading mechanisms?

Anssi: this touches on question 2.9 of the security questionnaire with respect to new script execution mechanisms

anssik: questionnaire section 2.9 reads: "New mechanisms for executing or loading scripts have a risk of enabling novel attack surfaces. Generally, if a new feature needs this you should consult with a wider audience, and think about whether or not an existing mechanism can be used or the feature is really necessary."

Anssi: the Google security reviewer suggests that WebNN introduces a new script execution mechanism that runs in different contexts (CPU, GPU, etc.)
… this creates new attack surface for malicious sites
… any concerns with updating our response to 2.9 to acknowledge that report?

RafaelCintron: I strongly disagree with the characterization of this as a scripting language
… I agree with the risks around out-of-bound access
… that will need good tests and validation in the API to prevent it
… as would be needed with any graph API

Anssi: so you're suggesting we respond by focusing on the out-of-bounds aspects in the spec / security explainer
… while disagreeing with the characterization of a new scripting language

General Security Questions - 2. ops that change shape mid-calculation

> Operations such as split/slice/squeeze that change the shape of tensors mid-calculation may lead to incorrect assumptions in later operations - for instance if eliding bounds checks this could lead to out of bounds accesses. It would be good for there to be operation-level metadata that might be consumed by implementors to help prevent such problems.

anssi: are there effective mitigations against this already?

RafaelCintron: similar to previous question - I'm not sure what they really mean by operation metadata
… we need to clearly spec what each operator ingests
… and mark graphs as invalid when they generate out-of-bounds access
… we should definitely prevent these problems

chai: the question is a bit unclear, but it is true that in some cases the exact shape of the tensor is unknown until runtime
… it's not insecure per se, but implementations need to mitigate the risk of out-of-bounds access
… we should spend a bit more time looking at this
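The validation Rafael describes (spec what each operator ingests and mark graphs invalid on out-of-bounds access) can be sketched as a build-time shape check for a shape-changing op such as slice. This is a minimal sketch assuming fully static shapes; `validate_slice` and `InvalidGraphError` are hypothetical names, not part of the WebNN API, and the dynamic-shape case Chai raises is not covered.

```python
from typing import Sequence, Tuple


class InvalidGraphError(ValueError):
    """Raised when an op would access memory outside its input tensor."""


def validate_slice(input_shape: Sequence[int],
                   starts: Sequence[int],
                   sizes: Sequence[int]) -> Tuple[int, ...]:
    """Return the static output shape of a slice, or raise InvalidGraphError.

    Hypothetical build-time check: assumes every dimension is known when the
    graph is built, so later ops can rely on the returned shape.
    """
    if not len(input_shape) == len(starts) == len(sizes):
        raise InvalidGraphError("rank mismatch between input, starts and sizes")
    for axis, (dim, start, size) in enumerate(zip(input_shape, starts, sizes)):
        if start < 0 or size < 0:
            raise InvalidGraphError(f"axis {axis}: negative start or size")
        # The slice reads indices [start, start + size), so it stays in
        # bounds only when start + size <= dim.
        if start + size > dim:
            raise InvalidGraphError(
                f"axis {axis}: slice [{start}, {start + size}) exceeds size {dim}")
    return tuple(sizes)
```

For example, `validate_slice((4, 6), (1, 2), (3, 4))` returns `(3, 4)`, while `validate_slice((4, 6), (2, 0), (3, 6))` is rejected because it would read row index 4 of a 4-row input.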

General Security Questions - 3. op availability and deprecation

> The universe of operations is likely to vary in future - how will consumers discover which operations are available (short of enumerating them through failures to instantiate)? How will operations be deprecated (for instance if they turn out to be badly implemented?)

anssi: I interpret this as a question on feature detection and deprecation
… can we shape the API to make deprecation easier?
… the spec does a great job of explaining how to polyfill higher-level ops in terms of lower-level ops
… this would help in case higher-level ops need to be deprecated, since they could still be polyfilled
… I'm thinking we should note that concern in our security considerations and develop a clear answer
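The polyfilling argument can be illustrated with explicit feature detection plus a fallback: if a backend no longer provides a higher-level op natively, it is rebuilt from lower-level ops. This is a minimal sketch on plain Python lists; the `Backend` class, its op dictionary and the choice of `clamp` expressed via `max`/`min` are illustrative assumptions, not the WebNN API.

```python
from typing import Callable, Dict, List

Tensor = List[float]
Op = Callable[..., Tensor]


def elementwise_max(x: Tensor, value: float) -> Tensor:
    return [max(v, value) for v in x]


def elementwise_min(x: Tensor, value: float) -> Tensor:
    return [min(v, value) for v in x]


class Backend:
    """Hypothetical backend exposing a dictionary of native ops."""

    def __init__(self, native_ops: Dict[str, Op]) -> None:
        self.native_ops = native_ops

    def has_op(self, name: str) -> bool:
        # Explicit feature detection instead of probing via build failures.
        return name in self.native_ops


def clamp(backend: Backend, x: Tensor, lo: float, hi: float) -> Tensor:
    """Use a native clamp if available, otherwise polyfill it from max/min."""
    if backend.has_op("clamp"):
        return backend.native_ops["clamp"](x, lo, hi)
    # Polyfill: clamp(x, lo, hi) == min(max(x, lo), hi) for lo <= hi.
    clipped_below = backend.native_ops["max"](x, lo)
    return backend.native_ops["min"](clipped_below, hi)


LOW_LEVEL_OPS: Dict[str, Op] = {"max": elementwise_max, "min": elementwise_min}

# One backend that still ships clamp natively, and one where it was deprecated.
with_clamp = Backend({**LOW_LEVEL_OPS,
                      "clamp": lambda x, lo, hi: [min(max(v, lo), hi) for v in x]})
without_clamp = Backend(dict(LOW_LEVEL_OPS))
```

Both backends give the same result for `clamp(..., [-1.0, 0.5, 2.0], 0.0, 1.0)`, which is the property that would let a deprecated op keep working for existing content.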

General Security Questions - 4. async APIs

> It feels like .build and .compute should be asynchronous in all cases?

Should restrict the sync APIs to only exist in Workers? #229

Should WebNN support async APIs? #230

Anssi: we're discussing this in issue #229 and #230

anssi: not sure if there is a security-specific rationale behind that one

General Security Questions - 5. Side channels from shared resources

> New side channels will be made available from shared resources (cpu/gpu). Timeable things should be out of process so incur at least some ipc to achieve anything. Probably not a massive worry when compared with already sharing a cpu between processes running renderers.

RafaelCintron: I don't understand why we would need to run timeable things out of process

dom: worth a clarification given timing attacks have been of interest in the past

RafaelCintron: we could make the timing less precise as a mitigation
… as has been done e.g. in WebGL
… doesn't necessarily need out of proc, but worth getting more information
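Making timing less precise can be sketched as quantizing timestamps before they are exposed to script. This is a minimal sketch assuming integer microsecond timestamps and a 100 µs resolution chosen purely for illustration; `coarsen_timestamp` is hypothetical and does not describe how WebGL or any particular browser implements it.

```python
def coarsen_timestamp(t_us: int, resolution_us: int = 100) -> int:
    """Round a microsecond timestamp down to a multiple of resolution_us.

    Hypothetical helper: operations whose durations differ by less than the
    resolution may become indistinguishable to the page.
    """
    if resolution_us <= 0:
        raise ValueError("resolution_us must be positive")
    return (t_us // resolution_us) * resolution_us
```

Quantization alone can be worked around by repeated measurements near a tick boundary, which is why browsers commonly add jitter to coarsened clocks as well.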

General Security Questions - 6. Permission delegation

> Verify: Sites must delegate permission to host/run models.

anssi: if I understand this correctly, we're covered with the permission policy integration we have in place for the spec
… the top-level Web site needs to delegate permission to iframe to use this feature

RafaelCintron: +1 on permission delegation as important
… permission policy gets us this indeed

dom: my reading is similar to Anssi's, we satisfy the requirement; given the complexity of the security model, let's loop back and confirm

General Security Questions - 7. serialization and caching

> Verify: No serialization or caching yet - although this is likely in future.

<RafaelCintron> +1 to Anssi

Ningxin: that's also an implementation question

anssik: serialization or caching is out of scope for the WebNN API spec

Ningxin: for comparison, does DirectML imply caching, e.g. in the driver?

dom: I think in the context of security reviews, caching creates timing attack surface
… these are likely implementation considerations; what is asked is to call out this risk to implementers

dom: the questionnaire is a tool for us; the responses should be reflected in the spec, either as normative language or in the security considerations

chai: OS security deals with this; it probably doesn't depend specifically on WebNN

dom: I fully trust Chai on that, but I think there could be mitigations on the WebNN side to complement it; e.g. timing attacks would need to be considered and protected against at the browser level, since browsers execute code from anywhere and from anyone

RafaelCintron: following up - why specifically timing attacks on caching? e.g. shaders have the same property

dom: you go to your bank site, which runs a shader; then an iframe on another site runs the same shader, it loads faster, and that reveals your browsing history
… that's an off-the-top-of-my-head attack; the general principle is to understand how much information you can get cross-origin from existing caches using timing attack vectors

dom: we should think about this, not sure if there's a mitigation, worth investigating
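One mitigation that might be worth investigating, borrowed from how browsers partition the HTTP cache, is to key any compiled-model cache by the top-level site as well as by the model, so a cross-site iframe cannot observe a cache hit created by another site. This is a minimal sketch; `CompiledModelCache`, the SHA-256 model key and the string site identifiers are assumptions for illustration, since the WebNN API spec itself defines no caching.

```python
import hashlib
from typing import Callable, Dict, Tuple


class CompiledModelCache:
    """Hypothetical cache of compiled graphs, partitioned by top-level site."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[str, str], object] = {}

    @staticmethod
    def _model_key(model_bytes: bytes) -> str:
        return hashlib.sha256(model_bytes).hexdigest()

    def get_or_compile(self, top_level_site: str, model_bytes: bytes,
                       compile_fn: Callable[[bytes], object]) -> Tuple[object, bool]:
        """Return (compiled_model, was_cache_hit).

        The key includes the top-level site, so the bank site and an
        unrelated site loading the same model never share an entry.
        """
        key = (top_level_site, self._model_key(model_bytes))
        if key in self._entries:
            return self._entries[key], True
        compiled = compile_fn(model_bytes)
        self._entries[key] = compiled
        return compiled, False
```

The trade-off is the same as for HTTP cache partitioning: sites lose cross-site reuse of compiled models, so the first load on each site pays the full compilation cost.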

General Security Questions - 8. Control over how a model is run

> Control over how a model is run - (selecting cpu/gpu/tpu say) - is this too much power for the consuming site - it will for instance make it possible to more directly target a flawed implementation. It's not clear why this is required.

RafaelCintron: goes back to how much choice we should give the developer on where to run
… this is a decision that really matters in terms of performance
… some models really don't run well on GPU, some don't run on CPUs

anssi: one aspect is that this is a hint, it's left to the implementation

dom: I think a hint is a partial answer
… if the UA runs on a platform with a specific, narrow vulnerability in some processing unit, there's a higher risk of exploitation

zkis: I was wondering: hinting could be one thing, but how do you handle errors when something is not available?
… is the following a fingerprinting concern: most devices have a CPU and a GPU, whereas a dedicated accelerator is less common?
… we should not allow enumerating devices, and should let implementations decide whether to respect the hint, but how do we handle errors if that causes the model to fail?
… this is an issue in OpenVINO, where you can run hybrid CPU-GPU models

anssi: I don't think we should allow device enumeration, not sure about how WebGPU deals with this

chai: I'm confused about how this would be a fingerprinting issue
… I'm more worried about not being clear about which device is going to be used to run this
… it has big implications for sync/async

<dom> +1 to chai's point re sync/async impact

dom: I don't think the current API shape is so fingerprintable
… agree with Chai's point that if CPU vs GPU is a hint it impacts greatly our discussion on sync vs async
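Treating the device choice as a hint, as discussed above, can be sketched as a selection function that honors the preference when possible and otherwise falls back silently, without returning a device list to the page or raising an error that reveals which accelerators exist. This is a minimal sketch; the device strings, `FALLBACK_ORDER` and `select_device` are assumptions, not what the spec defines.

```python
from typing import FrozenSet

# Hypothetical fallback order; this sketch assumes every device has a CPU.
FALLBACK_ORDER = ("gpu", "cpu")


def select_device(preference: str, available: FrozenSet[str]) -> str:
    """Pick a device, treating `preference` as a hint rather than a requirement.

    A missing preferred device does not raise, so the page cannot use
    errors to probe which accelerators are present.
    """
    if preference in available:
        return preference
    for device in FALLBACK_ORDER:
        if device in available:
            return device
    raise RuntimeError("no usable device")  # unreachable if a CPU is present
```

Because the page cannot know in advance which device will run the graph, it also cannot know whether a synchronous compute call would block on CPU work, which is the sync/async interaction Chai and dom point out.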

Guidelines/philosophy for new operations, including security principles

Guidelines/philosophy for new operations, including security principles #242

WebNN API MLOperand

Adding new operators, view from ONNX by Michal Karzynski

anssi: this reinforces the value of creating guidance for creating new ops

Rationale/criteria for adding new ops to the WebNN API (TPAC 2021 minutes)

anssi: could be part of the Operand section

<dom> +1

Op metadata that helps avoid implementation mistakes

Op metadata that helps avoid implementation mistakes #243

A conformance suite with disallowed intra-op examples would be helpful for hardening

A conformance suite with disallowed intra-op examples would be helpful for hardening #244

dom: I heard earlier that we need to look into out-of-bounds access; formalizing that into test cases for web-platform-tests makes a lot of sense to me
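A conformance suite with disallowed examples could take the form of a table of invalid inputs that every implementation must reject at build time. This is a minimal sketch in Python rather than the JavaScript that web-platform-tests uses; `squeeze_shape` is a hypothetical validator, and the listed cases (including treating negative axes as unsupported) are illustrative, not taken from the spec.

```python
from typing import List, Optional, Sequence, Tuple

Case = Tuple[Tuple[int, ...], Tuple[int, ...]]


def squeeze_shape(shape: Sequence[int],
                  axes: Optional[Sequence[int]] = None) -> Tuple[int, ...]:
    """Return the squeezed shape, or raise ValueError for a disallowed input."""
    if axes is None:
        return tuple(d for d in shape if d != 1)
    rank = len(shape)
    if len(set(axes)) != len(axes):
        raise ValueError("duplicate axes")
    for axis in axes:
        if not 0 <= axis < rank:
            raise ValueError(f"axis {axis} out of range for rank {rank}")
        if shape[axis] != 1:
            raise ValueError(f"axis {axis} has size {shape[axis]}, expected 1")
    return tuple(d for i, d in enumerate(shape) if i not in axes)


# Disallowed (shape, axes) pairs that a conforming builder must reject.
DISALLOWED_SQUEEZE_CASES: List[Case] = [
    ((1, 3), (1,)),       # squeezing a dimension whose size is not 1
    ((1, 3), (2,)),       # axis beyond the tensor rank
    ((1, 3), (-1,)),      # negative axis, assumed unsupported in this sketch
    ((1, 1, 3), (0, 0)),  # the same axis listed twice
]


def run_disallowed_cases() -> List[Case]:
    """Return the cases that were wrongly accepted (empty list means pass)."""
    accepted = []
    for shape, axes in DISALLOWED_SQUEEZE_CASES:
        try:
            squeeze_shape(shape, axes)
        except ValueError:
            continue
        accepted.append((shape, axes))
    return accepted
```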

anssik: on behalf of the group, I want to thank Alex Gough and Chrome Security team for this security review!

dom: really valuable indeed
… including for wide review

dom: this is well beyond the expectation of a security review, very good feedback for CR horizontal review purposes

Integration with real-time video processing

Review proposed prototype next steps

Proposed prototype next steps

Proposed detailed GPU pipeline processing steps for semantic segmentation prototype

Anssi: Ningxin has proposed a plan to make progress here

dom: Ningxin thanks for formalizing this into concrete next steps!
… is this blocked until WebCodecs proposal is implemented in Chromium?

ningxin_hu: GPU Import of VideoFrame?
… is a dependency

dom: can we proceed without this?

ningxin_hu: another is WebGPU-WebNN interop; per Corentin's request, I opened an issue with the WebGPU WG to investigate this
… that is another dependency in this proposal
… we can look into that in parallel
… so this proposal has these two dependencies and we can work on these in parallel

dom: I'm hearing we need improvements in both specs and implementation to make meaningful progress on the prototype

ningxin_hu: I need to confirm with people working on the WebCodecs import to WebGPU; there's some prototype code for that in Chromium

dom: I followed the great discussion on WebNN-WebGPU integration. Ningxin, is this going in a good direction? Is there a need for a joint meeting?

ningxin_hu: I'm fine with GH discussion on this; I also checked with WebGPU people, and they have a monthly meeting and welcome us to add an agenda item there to discuss it
… this issue also marked as "post v1" in WebGPU

anssi: I understand WebGPU people are focusing on shipping their v1

chai: the outstanding topic remains the support for async
… the control over the GPU timeline matters when integrating with WebGPU
… WebNN isn't clear on this timeline intersection
… since we're also dealing with CPU, this could have a really big impact on the API shape
… we need to resolve that issue sooner rather than later

<chai> https://github.com/webmachinelearning/webnn/issues/230

chai: #230 mentions integration with WebGPU as a consideration in this discussion

Review proposed use case for spec inclusion

Add real-time video processing use case #249

anssi: do we refer to WebGPU/WebCodecs, or remain abstract?

dom: this combines use cases and requirements

dom: I guess if we want to highlight technical aspects, then those would be more like requirements derived from the use case

<ningxin_hu> sgtm

Double-precision baseline implementation of WebNN operations for testing

anssik: Review the double-precision baseline implementation of WebNN operations for web-platform-tests purposes

The baseline implementation of WebNN ops #245

webnn-baseline (staging repo)

anssik: any concerns with proceeding with this implementation work?

ningxin_hu: I would like to get confirmation that it is fine to move this repo to the WebML GH org
… I would like to set up the repo with an initial PR that people can then review
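A double-precision baseline is useful for testing because a test can compute the expected value in float64 and then accept an implementation's float32 result if it lies within a small number of float32 ULPs (units in the last place) of that baseline. This is a minimal sketch using Python floats (IEEE 754 doubles) as the baseline; the `sigmoid_baseline` example and the 2-ULP tolerance are assumptions, not the thresholds used by webnn-baseline or web-platform-tests.

```python
import math
import struct


def to_float32(x: float) -> float:
    """Round a double to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def float32_ordinal(x: float) -> int:
    """Map a float32 value to an integer so adjacent floats differ by 1."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Negative floats use sign-magnitude encoding; negate them so the
    # ordering is monotonic and -0.0 and +0.0 both map to 0.
    return -(bits & 0x7FFFFFFF) if bits & 0x80000000 else bits


def ulp_distance_f32(a: float, b: float) -> int:
    return abs(float32_ordinal(a) - float32_ordinal(b))


def sigmoid_baseline(x: float) -> float:
    """Reference sigmoid computed in double precision."""
    return 1.0 / (1.0 + math.exp(-x))


def check_against_baseline(actual: float, expected_f64: float,
                           max_ulp: int = 2) -> bool:
    """Accept `actual` if it is within max_ulp float32 ULPs of the baseline."""
    expected_f32 = to_float32(expected_f64)
    return ulp_distance_f32(to_float32(actual), expected_f32) <= max_ulp
```

Measuring error in ULPs rather than as an absolute difference keeps the tolerance meaningful across very small and very large output values.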

– DRAFT –

Attendees