W3C

– DRAFT –
WebML CG Teleconference – 7 January 2021

07 January 2021

Attendees

Present
Anssi_Kostiainen, Bruce_Dai, Chai_Chaoweeraprasit, Ganesan_Ramalingam, Geun-Hyung_Kim, Ningxin_Hu, Rafael_Cintron, Zoltan_Kis
Regrets
-
Chair
Anssi
Scribe
Anssi, anssik

Meeting minutes


WebNN API TAG review progress report

anssik: The review appears to be a work in progress and has been triaged. I just pinged the issue.

Web Neural Network API - TAG Spec Review request

anssik: Relatedly, W3C announced today, 7 January 2021, the results of the W3C TAG election: Sangwhan Moon was re-elected for another term on the TAG, congrats to him! https://www.w3.org/blog/news/archives/8846

anssik: I'll work with the TAG to get the review conducted in a timely manner.

anssik: any questions?

Security and Privacy considerations

anssik: Review proposed questionnaire responses, fill in TBDs:

Self-Review Questionnaire: Security and Privacy #119

anssik: In preparation for the TAG review it is recommended to complete the Self-Review Questionnaire. I took a first stab at it, but would appreciate it if the spec editors and other contributors took a look and, in particular, helped address the following questions:

How does this specification distinguish between behavior in first-party and third-party contexts?

How does this specification work in the context of a user agent’s Private Browsing or "incognito" mode?

anssik: The answers will form the basis of the Security and Privacy considerations section of the spec; this assessment is expected when the spec advances to the WG.

Support style transfer models

Support style-transfer models with changes/additions to the following operations (PR #123)

anssik: Reviewed and merged, thanks Chai for the work!
… Chai, could you briefly explain the gist of this PR for folks who have not reviewed it yet?

Chai: style transfer is one of the models that has been used a lot across frameworks
… a good exercise because it allows us to look at what the API is still lacking, specifically a few ops and options that needed to be filled in
… the important ones include extending the conv2d operation to support transposed convolution, an essential upsampling tool for encoder-decoder models
… for those new to these models, look at the first section of the spec; quite a few use cases need this
… I've been looking through those use cases to make sure the API is able to support them
… essentially these models take an input image and an input style image and combine them, using the content of the first image and the stylistic features of the second, to produce an output
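
For readers following along, a minimal sketch of what the added capability might look like with the builder-style API in the current draft; the entry point and option names, in particular the transpose flag, are illustrative assumptions rather than the merged API surface:

  const nn = navigator.ml.getNeuralNetworkContext();

  // NHWC feature map: 1 image, 64x64 spatial, 32 channels.
  const input = nn.input('features', {type: 'float32', dimensions: [1, 64, 64, 32]});

  // Transposed-convolution filter weights (values omitted for brevity).
  const filterWeights = new Float32Array(3 * 3 * 32 * 16);
  const filter = nn.constant({type: 'float32', dimensions: [3, 3, 32, 16]}, filterWeights);

  // Stride 2 doubles the spatial resolution: the upsampling step used in
  // the decoder half of a style-transfer model.
  const upsampled = nn.conv2d(input, filter, {
    transpose: true,        // assumed flag for transposed convolution
    strides: [2, 2],
    padding: [1, 1, 1, 1],
    layout: 'nhwc'          // single layout option, per the current draft
  });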

anssik: any questions?

Dynamic shape inference

WebNN needs to support models requiring dynamic shape inference (issue #124)

anssik: Let's discuss this issue opened by Chai

Chai: I scanned the use cases before Christmas to see what the possible gaps are
… essentially, there are some models that, in the middle of processing the graph, need to get the shape of the tensor that flows in and turn it into another tensor
… a bit unusual behaviour, but it is becoming popular in some models
… it is easier for an ML developer to do this without having to know the shape ahead of time
… not a very good technique if you're concerned about performance
… if you want to construct a model with good perf, you identify the shapes ahead of time and are explicit about the sizes of the tensors that flow in
… bad for GPU, you need to flush the queue, which kills pipelining
… explained in the issue that dynamic shape inference is required by NSNet2, which "requires shape and constant-of-shape operation. Additionally, static shape inference should also be supported as it's a majority case to many models."
… the WebNN API implementer can also do optimization ahead of time; in many cases in a model the shape can be identified ahead of time
… that section of the spec should explain these details so that the execution can be as fast as possible
… if all else fails the implementation should do the hard work of figuring out the shape

anssik: comments?

ningxin_hu: I implemented the NSNet2 sample, so I'd like to add a comment: I agree with Chai that this is required by NSNet2
… I implemented the technique Chai just mentioned using static shapes, since I know the input dimensions
… I can skip the shape and constant-of-shape ops because I know the shapes and can set them statically; with that workaround I can run NSNet2 successfully
… any comments, Chai?

Chai: I looked at the sample and it looks great
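
A minimal sketch of the static-shape workaround Ningxin describes, assuming a builder-style API with a reshape op; the names and the NSNet2-like dimensions are illustrative only:

  const nn = navigator.ml.getNeuralNetworkContext();

  // NSNet2-style input with every dimension fixed at graph-construction time.
  const frames = nn.input('frames', {type: 'float32', dimensions: [1, 1000, 161]});

  // Instead of shape(frames) followed by constant-of-shape at run time,
  // the reshape target is precomputed from the known dimensions.
  const flattened = nn.reshape(frames, [1 * 1000, 161]);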

WebNN conv2d layout parameter TensorFlow incompatibility

TensorFlow conv2d expects channel_last filter layout regardless of input layout format (issue #125)

anssik: Chai explains "From the WebNN conv2d spec, the same layout parameter controls both the input and filter layout (below). In TensorFlow (and presumably TFLite), regardless of the input layout, the filter layout remains in the "channel_last" format i.e. [height, width, input_channels/groups, output_channels]."

Chai: this is one of the issues that we discovered after we put in the conv2d definition
… it turns out TF internally can transpose the input tensor; they do not necessarily have to transpose the filter
… the API needs to be able to support this scenario where the input and filter layouts are specified separately
… Ningxin may have insights on TF.js?

ningxin_hu: IIRC TF.js supports both layouts for input, but for the filter only the channel_last format
… this aligns with TF, IIRC
… checking TFLite, it looks like it uses channel_first for filter
… need input from Google folks

Zoltan: can we encapsulate this layout in the implementation?

Ningxin: filter layout only channel_last, input can be both

Chai: if the filter is channel_last, that is the problematic case; WebNN cannot represent that
… to answer the encapsulation question, it depends on how the WebNN API is used; you can definitely transpose any tensor before calling any API, but normally that is a bad thing to do because of the performance impact, think of trying to convert ResNet-50 with its many layers
… in the conversion process you'd end up with a graph that would be suboptimal
… it would need more work to massage the layout every now and then, and that hurts the performance
… writing a converter between two formats, you'd like to have a very simple conversion
… so to answer Zoltan: it could work, it is just more tedious

Chai: I would love feedback on issue #125 on whether this should be supported
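
A minimal sketch of the workaround Chai refers to, i.e. transposing the filter before conv2d when the source framework's filter layout does not match; the op and option names are assumptions about the draft API, and the weight values are omitted:

  const nn = navigator.ml.getNeuralNetworkContext();

  // NCHW input and a TensorFlow-style [height, width, in_channels, out_channels] filter.
  const input = nn.input('input', {type: 'float32', dimensions: [1, 3, 224, 224]});
  const filterWeights = new Float32Array(3 * 3 * 3 * 16);
  const tfFilter = nn.constant({type: 'float32', dimensions: [3, 3, 3, 16]}, filterWeights);

  // Rearrange the filter to the channel-first convention that the single
  // layout parameter currently implies for both input and filter.
  const nchwFilter = nn.transpose(tfFilter, {permutation: [3, 2, 0, 1]});
  const output = nn.conv2d(input, nchwFilter, {layout: 'nchw'});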

Super-resolution models

WebNN should support super-resolution models (issue #127)

Chai: this is one of the use cases of the spec; super-resolution has been up and coming, it is a model that essentially upsamples an image to a higher resolution
… there is a lot of utilization of this technique in image and photo processing apps etc.
… processing apps like Photoshop take a low-resolution image and uplevel its resolution
… there are also use cases in gaming, where Deep Learning Super Sampling is used to upsample game visuals in real time
… these are useful models with many variants everywhere; in the context of the web I think, when this use case is added, you can upsample the image quality of the video feed on the client while doing video conferencing

Zoltan: does this operate on image or time series level?

Chai: at the image level; a video can be treated as images, just capture the images from the video frame by frame
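
To make the frame-by-frame point concrete, a small sketch using standard canvas APIs; runSuperResolution is a hypothetical placeholder for compiling and running the model, not part of any spec:

  const video = document.querySelector('video');
  const canvas = document.createElement('canvas');
  canvas.width = video.videoWidth;
  canvas.height = video.videoHeight;
  const ctx = canvas.getContext('2d');

  async function processFrame() {
    // Capture the current video frame as an ordinary image.
    ctx.drawImage(video, 0, 0, canvas.width, canvas.height);
    const frame = ctx.getImageData(0, 0, canvas.width, canvas.height);
    await runSuperResolution(frame);      // hypothetical model invocation
    requestAnimationFrame(processFrame);  // schedule the next frame
  }
  requestAnimationFrame(processFrame);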

int8 quantized models

WebNN should support int8 quantized models (issue #128)

Chai: Supporting int8 quantized models is essential for mobile scenarios and in many NPU architectures. TensorFlow (Lite) and ONNX, for instance, have int8 quantization support built in, and WebNN should too. Related https://github.com/webmachinelearning/webnn/issues/93

Chai: int8 quantization is not going to be just one op, so it is a pretty large work item
… essential for NPUs and, more and more, for GPUs as well
… it has been supported in TF and PyTorch etc.; for WebNN to be relevant it should support int8 quantization
… this is a cross-section of new ops being added and current ops being enhanced
… this is a big-ticket issue; for the API to be taken seriously we should add this to the API
… int8 will cut down the intermediate memory needed when you unpack and run the inference
… more and more developers are becoming aware of this technique to get smaller models that run quicker, and that's nice!
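
For context, a small sketch of the affine int8 scheme that TFLite and ONNX quantization use; one byte per value instead of four is where the memory saving comes from, and the scale and zero point below are example values only:

  // Map a float value onto the int8 range using a scale and zero point.
  function quantize(x, scale, zeroPoint) {
    const q = Math.round(x / scale) + zeroPoint;
    return Math.max(-128, Math.min(127, q));   // clamp to the int8 range
  }

  // Recover an approximate float value from its int8 representation.
  function dequantize(q, scale, zeroPoint) {
    return scale * (q - zeroPoint);
  }

  // Example: values in [-1, 1] spread over the 256 int8 levels.
  const scale = 2 / 255;
  const zeroPoint = 0;
  quantize(0.5, scale, zeroPoint);    // 64
  dequantize(64, scale, zeroPoint);   // ~0.502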

<Chai> (need to be away from kbd)

ningxin_hu: I would like to share that in my workshop talk I actually experimented with int8 and got good speedup, e.g. 10X better perf on CPU

Conformance testing of WebNN API

<Chai> (back now)

anssik: Let's discuss conformance testing of operations and the integration of web-platform-tests with the NNAPI CTS
… the proposal is to add compatibility tests for the WebNN first-wave ops by converting existing native Android Neural Networks API tests. Feedback on the general approach?

anssik: Specifically, the request is to review PR for generated NNAPI CTS:

https://github.com/webmachinelearning/webnn-polyfill/pull/29

anssik: and review PR for SqueezeNet model test:

https://github.com/webmachinelearning/webnn-polyfill/pull/32

anssik: Ningxin, anything specific you want to share about these tests?
… I noted a question re the numerical precision difference between the tf.js WebGL and CPU/Wasm backends.

ningxin_hu: I think the precision question is critical and needs the group's feedback
… Chai mentioned in his workshop talk his experience with conformance testing and numerical accuracy
… I want feedback from Chai on that
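
One common way to handle the precision question, sketched here as an assumption rather than an agreed policy, is to compare each output element against the reference within a combined absolute and relative tolerance; the tolerance values are placeholders:

  // Accept a value if it is close to the reference in absolute or relative terms.
  function almostEqual(actual, expected, absTol = 1e-5, relTol = 1e-3) {
    const diff = Math.abs(actual - expected);
    return diff <= absTol || diff <= relTol * Math.abs(expected);
  }

  // Compare a whole (flattened) output tensor against expected values.
  function checkOutput(actual, expected) {
    return actual.length === expected.length &&
           actual.every((v, i) => almostEqual(v, expected[i]));
  }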

NSNet2 sample

anssik: Next, we'll discuss the NSNet2 sample, which is one of the first-wave models and is used in the explainer's key scenarios
… please review the NSNet2 sample PR:

https://github.com/webmachinelearning/webnn-samples/pull/22

anssik: also an update to the explainer:

https://github.com/webmachinelearning/webnn/blob/master/explainer.md#key-scenarios

ningxin_hu: the explainer update is a WIP
… there is one remaining issue before merge
… once the TF.js issue is fixed we can merge this PR; otherwise TF.js will crash Chrome

Proposals for future work

Operation-specific APIs proposal by Jonathan

Jonathan: my main concern in submitting this proposal was to make sure the WG Charter would support this work if we chose to pursue it
… the concept is: what if we take a couple of computationally intensive ops and implement those standalone
… in many DL models those are the expensive ops; per the work Ningxin did earlier, we got a 90% perf boost from those ops alone
… within the implementation of a JS lib, that handful of ops would use the op-specific APIs for better perf
… this proposal would just accelerate the most expensive ops, so a lot of gain with a small number of ops
… maybe this is a stepping stone; it could launch faster due to the smaller API surface
… I brought this proposal back motivated by the charter work
… I want to know if this helps us launch a useful subset faster
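
A minimal sketch of how a JS library might use such op-specific APIs; navigator.ml.matmul and libraryMatmul are hypothetical placeholders used only to illustrate the dispatch, not proposed names:

  // Route the hot op to a platform-provided implementation when available,
  // otherwise fall back to the library's own Wasm/WebGL kernel.
  async function matmul(a, b) {
    if (navigator.ml && typeof navigator.ml.matmul === 'function') {
      return navigator.ml.matmul(a, b);   // accelerated, op-specific path
    }
    return libraryMatmul(a, b);           // hypothetical library fallback
  }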

<Jonathan> i can't hear him either

<Geunhyung_Kim> me too

<Jonathan> great

RafaelCintron: are the ops you wanted to target the same we have already specced as part of WebNN API?
… or are you expecting a new set of ops with new inputs and outputs?
… is this proposal just WebNN with one node in the graph?
… I expect the op definitions to be the same
… the question is whether to wrap this proposal into a graph or not

Jonathan: the motivation is to make this similar to compute shaders
… a complete graph API would still have these primitive ops that JS libs could use for a perf boost
… we could commit to the graph API and not implement the primitive ops

RafaelCintron: I think that clarifies it
… is the motivation that you think we spec too many ops, or that you think a graph API would be too complicated?

Jonathan: in this group the benefits mentioned were the small API surface and the big perf gain; given we've already spec'd some ops, we could launch faster
… maybe also the anxiety some people have with the general direction; this would get a performance boost to the web platform sooner without possible pushback from the Android folks exploring different options

Jonathan: Android NN team is exploring various options

Chai: you mentioned compute platforms; for folks who already have ML frameworks implemented in terms of compute platforms, this could be useful to them

<Jonathan> yes, correct

Chai: I'm wondering, if you're discussing TensorFlow implemented in compute shaders, there's already a WebGPU Working Group that works toward interacting at the atomic compute primitive level
… I'm hearing you want to propose to spec the compute primitives; maybe the WebGPU WG Charter provides a venue to explore that option, while the WebNN Charter is a bit higher level
… not compute primitives, but ML primitives

Chai: I get what you're saying
… I appreciate that there might be a need, when people are writing a library, to tap into a primitive that does matrix multiplication
… also gemm; if you want to reach down to the compute primitive level I can see it could be useful, but it is probably more in scope of the WebGPU WG Charter

Jonathan: not strictly GPU, because it would need to support ML-specific hardware

Chai: is there any group that deals with compute platforms, not graphics specifically?

https://www.w3.org/2020/12/gpu-wg-charter.html

https://www.w3.org/community/gpu/

RafaelCintron: as for whether this proposal belongs in WebNN or WebGPU

<Jonathan> +1

RafaelCintron: even if we have a graph API on the table here, we have use cases for APIs that target the compute platform too

<Zakim> zkis, you wanted to ask could we split the API to conformance classes/namespaces to handle this?

Zoltan: would it make sense to use multiple conformance classes in the API, per Jonathan's proposal?

https://github.com/webmachinelearning/proposals/issues/2

Adjourn

Minutes manually created (not a transcript), formatted by scribe.perl version 127 (Wed Dec 30 17:39:58 2020 UTC).

Diagnostics


Maybe present: anssik, Chai, Jonathan, Ningxin, RafaelCintron, Zoltan