Split DNN Inference for Exploiting Near-Edge Accelerators

Abstract

The deployment of increasingly complex deep learning models for inference in real-world settings requires dealing with the constrained computational capabilities of edge devices. Splitting inference between edge and cloud has been proposed to overcome these limitations, but entails significant communication latency. Newer edge accelerator devices can be distributed throughout layers of the network, supporting fine-grained offload. We propose a method for splitting a deep neural network (DNN) across the edge, near-edge accelerator, and cloud to exploit the combined computing capabilities of such devices while minimizing transmission bandwidth and, hence, energy. We formulate an approach to find near-optimal two-split configurations to optimize inference energy and latency. We thoroughly evaluate our approach on the VGG16 and ResNet50 models using the CIFAR-100 and ImageNet datasets to demonstrate that our method can navigate the trade-off space effectively.
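The core idea of a two-split configuration can be illustrated with a minimal sketch: enumerate every pair of split points in a layer sequence, assign the three resulting segments to edge, near-edge accelerator, and cloud, and score each candidate by compute plus transfer latency. This is not the paper's actual formulation; all layer costs, link bandwidths, and the simple additive cost model below are illustrative assumptions.

```python
# Hypothetical sketch of a two-split search over a DNN layer sequence.
# Segments [0,i) run on the edge device, [i,j) on the near-edge
# accelerator, and [j,n) in the cloud. All numbers are placeholders.

# Per-layer compute latency (ms) on each tier, and activation size (KB).
layers = [
    # (edge_ms, accel_ms, cloud_ms, out_kb)
    (12.0, 4.0, 1.0, 800),
    (10.0, 3.5, 0.9, 400),
    (8.0,  3.0, 0.8, 200),
    (6.0,  2.5, 0.7, 100),
    (4.0,  2.0, 0.5, 50),
]

INPUT_KB = 150.0        # assumed input tensor size
BW_EDGE_ACCEL = 100.0   # KB/ms, edge -> near-edge link (assumed)
BW_ACCEL_CLOUD = 10.0   # KB/ms, near-edge -> cloud link (assumed)

def boundary_kb(k):
    """Data transmitted when splitting before layer k."""
    return INPUT_KB if k == 0 else layers[k - 1][3]

def two_split_cost(i, j):
    """End-to-end latency for split points i <= j (simple additive model).
    If i == j, the accelerator segment is empty and the model treats the
    accelerator as a relay, paying both link transfers for that boundary."""
    compute = (sum(l[0] for l in layers[:i])
               + sum(l[1] for l in layers[i:j])
               + sum(l[2] for l in layers[j:]))
    comm = (boundary_kb(i) / BW_EDGE_ACCEL
            + boundary_kb(j) / BW_ACCEL_CLOUD)
    return compute + comm

n = len(layers)
best = min(((i, j) for i in range(n + 1) for j in range(i, n + 1)),
           key=lambda s: two_split_cost(*s))
print(best, round(two_split_cost(*best), 2))
```

Because early convolutional layers often produce large activations, the search tends to favor splits at boundaries where the activation has shrunk, which is the trade-off between compute offload and transmission cost the abstract describes. An energy objective could be handled the same way by swapping the latency terms for per-tier energy costs.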

Publication
In IEEE International Conference on Edge Computing and Communications (EDGE)
Hao Liu
PhD Student

My research interests include distributed robotics, mobile computing and programmable matter.

Suhaib A. Fahmy
Associate Professor of Computer Science

Suhaib is Principal Investigator of the Accelerated Connected Computing Lab (ACCL) at KAUST. His research explores hardware acceleration of complex algorithms and the integration of these accelerators within wider computing infrastructure.
