Exploring Distributed DNNs for the mobile web over cloud, edge and end devices.
Presenter: Yakun Huang, Xiuquan Qiao
Demo from Web & Networks Interest Group
Hello, I’m Yakun Huang from the Beijing University of Posts and Telecommunications. The topic of my talk today is ‘Exploring Distributed DNNs for the Mobile Web Over Cloud, Edge and End Devices’.
I am going to start with an overview of how neural networks are implemented on the mobile Web, then cover the implementation of distributed deep neural networks on the mobile Web with edge offloading, and finish with some open questions.
As we all know, deep neural networks are a representative method for realising artificial intelligence in many applications, and they also show great potential in intelligent Web applications.
First of all, I will introduce three typical schemes for executing deep neural networks for the mobile Web.
The first method is to run the complete deep neural network model on the mobile Web itself.
However, this method incurs a high transmission delay, because a DNN model with a large number of parameters has to be loaded.
In addition, the limited computing resources of the mobile Web lead to slow inference, even if WebAssembly or WebGPU is used to accelerate it.
The second method is to use the pure cloud computing mode, that is, the mobile Web transfers the computing task to the remote cloud and the deep neural network computation is completed in the cloud.
This method needs to transmit large amounts of data, such as pictures, audio, video, etc.
At the same time, high concurrency may increase the computing pressure on the cloud.
It may also raise security and privacy issues for users, for example with home security cameras.
The third method is to offload computation to the edge. As mobile edge computing is becoming an important computing infrastructure in the 5G era, the edge cloud has lower communication cost and reduces the burden on the core network compared with offloading computation to a remote cloud.
This method can partition the deep neural network computation layer by layer, and dynamically allocate the computation between the mobile Web and the edge server.
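To make the layer-by-layer partitioning idea concrete, here is a minimal sketch of how a split point could be chosen. It is not the method from the talk: the `Layer` fields, the `best_split` helper and the simple latency model (device time + transfer time + edge time) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Layer:
    """Hypothetical per-layer profile; all numbers would come from measurement."""
    name: str
    device_ms: float   # time to run this layer in the mobile browser
    edge_ms: float     # time to run this layer on the edge server
    output_kb: float   # size of this layer's output if it is sent to the edge


def best_split(layers: List[Layer], input_kb: float,
               bandwidth_kb_per_ms: float) -> Tuple[int, float]:
    """Return (k, latency_ms): the first k layers run on the device, the rest on the edge.

    k == 0 offloads everything; k == len(layers) keeps everything on the device.
    """
    best_k, best_latency = 0, float("inf")
    for k in range(len(layers) + 1):
        device_time = sum(layer.device_ms for layer in layers[:k])
        edge_time = sum(layer.edge_ms for layer in layers[k:])
        if k == len(layers):
            transfer = 0.0  # nothing leaves the device
        else:
            # Send the raw input (k == 0) or the output of the last local layer.
            sent_kb = input_kb if k == 0 else layers[k - 1].output_kb
            transfer = sent_kb / bandwidth_kb_per_ms
        latency = device_time + transfer + edge_time
        if latency < best_latency:
            best_k, best_latency = k, latency
    return best_k, best_latency


# Illustrative (made-up) profile of a three-layer network.
LAYERS = [
    Layer("conv1", device_ms=10.0, edge_ms=2.0, output_kb=50.0),
    Layer("conv2", device_ms=60.0, edge_ms=4.0, output_kb=20.0),
    Layer("fc", device_ms=30.0, edge_ms=1.0, output_kb=1.0),
]
```

With this made-up profile and a 200 KB input, a moderate bandwidth favours splitting after `conv1`, a very fast link favours offloading everything, and a very slow link favours keeping everything on the device. Re-running `best_split` whenever the measured bandwidth changes is one way the allocation could be made dynamic.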
To further improve the efficiency of distributed deep neural networks, we explore fine-grained adaptive DNN partitioning for cooperation among the cloud, the edge and the mobile Web, with the aim of reducing latency and mobile energy consumption, as shown in the figure.
First, the network bandwidth and the computing power of terminal devices are detected periodically. Then, the pretrained multi-branch deep neural network is divided into several parts, which are transmitted to the edge and to mobile Web users respectively.
Finally, inference is completed cooperatively among the cloud, the edge and the mobile Web.
We also propose adding an efficient branch to traditional DNNs so that inference can be executed on the mobile Web independently.
Concretely, we add a binary neural network branch at the first convolutional layer of the traditional neural network, and it has the same structure as the rest of the traditional neural network.
For a given sample, if the binary branch is confident enough in its prediction to satisfy the user, the sample can exit from the binary branch directly.
Otherwise, the output of the first convolutional layer has to be transferred to the edge server for a precise result.
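The early-exit decision can be sketched as follows. The talk does not say how confidence is measured, so this sketch assumes the maximum softmax probability compared against a threshold; `binary_branch`, `offload_to_edge` and the default threshold of 0.8 are hypothetical placeholders, not names from the original work.

```python
import math
from typing import Callable, List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def infer_with_early_exit(
    features: object,
    binary_branch: Callable[[object], Sequence[float]],
    offload_to_edge: Callable[[object], int],
    threshold: float = 0.8,  # assumed value; it would be tuned per application
) -> Tuple[int, str]:
    """Return (predicted_label, where_it_was_computed).

    `features` stands for the output of the first convolutional layer.
    `binary_branch` runs the lightweight branch locally and returns logits.
    `offload_to_edge` sends the features to the edge and returns a label.
    """
    probs = softmax(binary_branch(features))
    confidence = max(probs)
    if confidence >= threshold:
        # The lightweight branch is confident: exit on the mobile Web.
        return probs.index(confidence), "mobile"
    # Not confident enough: fall back to the full network on the edge server.
    return offload_to_edge(features), "edge"
```

A confident sample never leaves the device, so only the hard samples pay the cost of transmitting the first-layer output.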
Furthermore, in actual scenarios, user requirements for delay, network conditions and the computing capabilities of devices may change dynamically, so a constant lightweight branch or a traditionally compressed DNN cannot meet the requirements.
We also provide a context-aware pruning algorithm, which takes execution delay, network conditions and device computing power into account, and is applied to an adaptive inference framework across the mobile Web, the edge server and the cloud server.
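One way to picture how such context could drive the choice of a pruned network at run time is the sketch below. It is only an illustration under stated assumptions: the `Variant` and `Context` types, the latency estimate (download time plus compute time) and the rule "most accurate variant that meets the deadline" are hypothetical and are not taken from the talk.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Variant:
    """A network pruned ahead of time (hypothetical profile)."""
    name: str
    accuracy: float  # validation accuracy measured after pruning
    mflops: float    # compute needed for one inference
    size_kb: float   # download size of the pruned model


@dataclass(frozen=True)
class Context:
    """Run-time conditions, assumed to be measured periodically."""
    deadline_ms: float           # user's latency requirement
    bandwidth_kb_per_ms: float   # current network bandwidth
    device_mflops_per_ms: float  # current device computing power


def estimated_latency_ms(variant: Variant, ctx: Context) -> float:
    """Crude estimate: time to fetch the model plus time to run it."""
    download = variant.size_kb / ctx.bandwidth_kb_per_ms
    compute = variant.mflops / ctx.device_mflops_per_ms
    return download + compute


def pick_variant(variants: Sequence[Variant], ctx: Context) -> Optional[Variant]:
    """Pick the most accurate variant that meets the deadline, or None."""
    feasible = [v for v in variants if estimated_latency_ms(v, ctx) <= ctx.deadline_ms]
    return max(feasible, key=lambda v: v.accuracy, default=None)


# Made-up variants for illustration only.
VARIANTS = [
    Variant("full", accuracy=0.92, mflops=600.0, size_kb=4000.0),
    Variant("half", accuracy=0.89, mflops=300.0, size_kb=2000.0),
    Variant("tiny", accuracy=0.80, mflops=60.0, size_kb=400.0),
]
```

When `pick_variant` returns `None`, no local variant fits the current context, which is the situation where falling back to the edge or cloud would be the natural next step.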
The context-aware pruning approach mainly includes two stages: offline network pruning and online dynamic network inference.
Although we have discussed some ideas for enabling distributed DNN inference for the mobile Web to implement AI services, they still face some problems in actual development.
The first is what role the edge server should play in providing processing support for intelligent Web applications that require heavy computation.
I mean, is there a better computing collaboration or deployment model for accelerating DNNs?
The second is how to seamlessly offload DNN computation from the mobile Web to the edge and the remote cloud with existing technologies (e.g. WebWorker, VM).