The deployment of increasingly complex deep learning models for inference in real-world settings requires dealing with the constrained computational capabilities of edge devices. Splitting inference between edge and cloud has been proposed to overcome these limitations, but it entails significant communication latency. Newer edge accelerator devices can be distributed across the tiers of the network, supporting fine-grained offload. We propose a method for splitting a deep neural network (DNN) across the edge, a near-edge accelerator, and the cloud to exploit the combined computing capabilities of these devices while minimizing transmission bandwidth and, hence, energy. We formulate an approach that finds near-optimal two-split configurations optimizing inference energy and latency. We thoroughly evaluate our approach on the VGG16 and ResNet50 models with the CIFAR-100 and ImageNet datasets, demonstrating that our method navigates the trade-off space effectively.
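To give a concrete feel for what a two-split configuration search might look like, the sketch below brute-forces all pairs of cut points over a layer-wise profile and scores each with a weighted latency/energy objective. It is only an illustration, not the paper's formulation: the per-layer compute costs and output sizes in LAYERS, the device speeds in SPEED, the link bandwidths in LINK_MBPS, and the transmission-energy constants in JOULE_PER_MB are all hypothetical placeholders.

```python
# Illustrative brute-force two-split search (assumed cost model, not the paper's).

# Hypothetical per-layer profile: (compute_cost, output_size_MB).
LAYERS = [(1.0, 3.2), (2.5, 1.6), (4.0, 0.8), (6.0, 0.4), (3.0, 0.1)]
INPUT_MB = 0.6                                               # assumed raw input size (MB)
SPEED = {"edge": 1.0, "accel": 4.0, "cloud": 20.0}           # assumed relative throughput
LINK_MBPS = {"edge->accel": 50.0, "accel->cloud": 10.0}      # assumed link bandwidths
JOULE_PER_MB = {"edge->accel": 0.02, "accel->cloud": 0.15}   # assumed tx energy per MB

def out_size(split):
    """Size of the tensor crossing a cut placed after layer index split-1."""
    return INPUT_MB if split == 0 else LAYERS[split - 1][1]

def cost(s1, s2, alpha=0.5):
    """Weighted latency/energy objective for cut points 0 <= s1 <= s2 <= n."""
    n = len(LAYERS)
    parts = [("edge", LAYERS[:s1]), ("accel", LAYERS[s1:s2]), ("cloud", LAYERS[s2:])]
    latency = sum(sum(c for c, _ in seg) / SPEED[tier] for tier, seg in parts)
    energy = 0.0
    # Charge a hop whenever layers remain beyond that cut
    # (traffic is assumed to relay through the near-edge accelerator).
    for split, link, used in ((s1, "edge->accel", s1 < n),
                              (s2, "accel->cloud", s2 < n)):
        if used:
            mb = out_size(split)
            latency += mb / LINK_MBPS[link]
            energy += mb * JOULE_PER_MB[link]
    return alpha * latency + (1 - alpha) * energy

# Enumerate every two-split configuration and keep the cheapest one.
n = len(LAYERS)
best = min(((s1, s2) for s1 in range(n + 1) for s2 in range(s1, n + 1)),
           key=lambda s: cost(*s))
print("best cut points:", best, "objective:", round(cost(*best), 4))
```

For a DNN with L candidate cut points this exhaustive search evaluates O(L^2) configurations, which is cheap for models the size of VGG16 or ResNet50; the trade-off between latency and energy is exposed through the alpha weight in the objective.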