The development of neural network architectures continues to drive demand for computing power. While data center scaling continues, inference away from the cloud will increasingly be distributed across multiple devices. Most …
The deployment of increasingly complex deep learning models for inference in real-world settings requires dealing with the constrained computational capabilities of edge devices. Splitting inference between edge and cloud has been proposed to …
Supporting the evolution of wireless communication beyond 5G using high-performance networks requires massive device connectivity. Massive Multiple-Input Multiple-Output (MIMO) systems have been used and proven to increase the data throughput of …
Serverless computing has become a popular cloud computing paradigm. However, its deployment abstraction entails significant performance overheads. We explore the potential for enabling serverless computing on FPGAs and present some early results that …
Federated Learning (FL) enables distributed training by learners using local data, thereby enhancing privacy and reducing communication. However, it presents numerous challenges relating to the heterogeneity of the data distribution, device …
Wireless communication systems rely on aggressive spatial multiplexing Multiple-Input Multiple-Output (MIMO) access points to enhance network throughput. A significant computational hurdle for large MIMO systems is signal detection and decoding, …
We present a high-performance tridiagonal solver library for Xilinx FPGAs optimized for multiple multi-dimensional systems common in real-world applications. An analytical performance model is developed and used to explore the design space and obtain …
We explore the design and development of structured-mesh-based solvers on Intel FPGA hardware using the SYCL programming model. Two classes of applications are targeted: (1) stencil applications based on explicit numerical methods and (2) …
Distributed embedded applications (DEAs) are typically implemented on diverse embedded nodes interconnected through communication network(s) to exchange data and control information to achieve the desired functionality. Conventional approaches of …
We present StressBench, a network benchmarking framework written for testing MPI operations and file I/O concurrently. It is designed specifically to execute MPI communication and file access patterns that are representative of real-world scientific …