The deployment of increasingly complex deep learning models for inference in real-world settings requires dealing with the constrained computational capabilities of edge devices. Splitting inference between edge and cloud has been proposed to overcome these limitations, but it entails significant communication latency. Newer edge accelerator devices can be distributed across the tiers of the network, supporting fine-grained offload. We propose a method for splitting a deep neural network (DNN) across the edge, a near-edge accelerator, and the cloud to exploit the combined computing capabilities of these devices while minimizing transmission bandwidth and, hence, energy. We formulate an approach that finds near-optimal two-split configurations optimizing inference energy and latency. We thoroughly evaluate our approach on the VGG16 and ResNet50 models with the CIFAR-100 and ImageNet datasets, demonstrating that our method navigates the trade-off space effectively.
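To give a concrete feel for what a two-split configuration search might look like, the sketch below brute-forces all pairs of cut points over a layer-wise profile and scores each with a weighted latency/energy objective. It is only an illustration, not the paper's formulation: the per-layer compute costs and output sizes in LAYERS, the device speeds in SPEED, the link bandwidths in LINK_MBPS, and the transmission-energy constants in JOULE_PER_MB are all hypothetical placeholders.

```python
# Illustrative brute-force two-split search (assumed cost model, not the paper's).

# Hypothetical per-layer profile: (compute_cost, output_size_MB).
LAYERS = [(1.0, 3.2), (2.5, 1.6), (4.0, 0.8), (6.0, 0.4), (3.0, 0.1)]
INPUT_MB = 0.6                                               # assumed raw input size (MB)
SPEED = {"edge": 1.0, "accel": 4.0, "cloud": 20.0}           # assumed relative throughput
LINK_MBPS = {"edge->accel": 50.0, "accel->cloud": 10.0}      # assumed link bandwidths
JOULE_PER_MB = {"edge->accel": 0.02, "accel->cloud": 0.15}   # assumed tx energy per MB

def out_size(split):
    """Size of the tensor crossing a cut placed after layer index split-1."""
    return INPUT_MB if split == 0 else LAYERS[split - 1][1]

def cost(s1, s2, alpha=0.5):
    """Weighted latency/energy objective for cut points 0 <= s1 <= s2 <= n."""
    n = len(LAYERS)
    parts = [("edge", LAYERS[:s1]), ("accel", LAYERS[s1:s2]), ("cloud", LAYERS[s2:])]
    latency = sum(sum(c for c, _ in seg) / SPEED[tier] for tier, seg in parts)
    energy = 0.0
    # Charge a hop whenever layers remain beyond that cut
    # (traffic is assumed to relay through the near-edge accelerator).
    for split, link, used in ((s1, "edge->accel", s1 < n),
                              (s2, "accel->cloud", s2 < n)):
        if used:
            mb = out_size(split)
            latency += mb / LINK_MBPS[link]
            energy += mb * JOULE_PER_MB[link]
    return alpha * latency + (1 - alpha) * energy

# Enumerate every two-split configuration and keep the cheapest one.
n = len(LAYERS)
best = min(((s1, s2) for s1 in range(n + 1) for s2 in range(s1, n + 1)),
           key=lambda s: cost(*s))
print("best cut points:", best, "objective:", round(cost(*best), 4))
```

For a DNN with L candidate cut points this exhaustive search evaluates O(L^2) configurations, which is cheap for models the size of VGG16 or ResNet50; the trade-off between latency and energy is exposed through the alpha weight in the objective.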