High Throughput Massive MIMO Signal Decoding Using Multi-Level Tree Search on FPGAs

Supporting the evolution of wireless communication beyond 5G using high-performance networks requires massive device connectivity. Massive Multiple-Input Multiple-Output (MIMO) systems have been used and proven to increase the data throughput of …

Exploring FPGA Acceleration for Distributed Serverless Computing

Serverless computing has become a popular cloud computing paradigm. However, its deployment abstraction entails significant performance overheads. We explore the potential for enabling serverless computing on FPGAs and present some early results that …

REFL: Resource-Efficient Federated Learning

Federated Learning (FL) enables distributed training by learners using local data, thereby enhancing privacy and reducing communication. However, it presents numerous challenges relating to the heterogeneity of the data distribution, device …

Signal Detection for Large MIMO Systems Using Sphere Decoding on FPGAs

Wireless communication systems rely on aggressive spatial multiplexing Multiple-Input Multiple-Output (MIMO) access points to enhance network throughput. A significant computational hurdle for large MIMO systems is signal detection and decoding, …

High Throughput Multidimensional Tridiagonal System Solvers on FPGAs

We present a high performance tridiagonal solver library for Xilinx FPGAs optimized for multiple multi-dimensional systems common in real-world applications. An analytical performance model is developed and used to explore the design space and obtain …

FPGA Acceleration of Structured-Mesh-Based Explicit and Implicit Numerical Solvers using SYCL

We explore the design and development of structured-mesh-based solvers on Intel FPGA hardware using the SYCL programming model. Two classes of applications are targeted: (1) stencil applications based on explicit numerical methods and (2) …

Heterogeneous Communication Virtualization for Distributed Embedded Applications

Distributed embedded applications (DEAs) are typically implemented on diverse embedded nodes interconnected through communication network(s) to exchange data and control information to achieve the desired functionality. Conventional approaches of …

StressBench: A Configurable Full System Network and I/O Benchmark Framework

We present StressBench, a network benchmarking framework written for testing MPI operations and file I/O concurrently. It is designed specifically to execute MPI communication and file access patterns that are representative of real-world scientific …

High-Level FPGA Accelerator Design for Structured-Mesh-Based Explicit Numerical Solvers

This paper presents a workflow for synthesizing near-optimal FPGA implementations of structured-mesh-based stencil applications for explicit solvers. It leverages key characteristics of the application class and its computation-communication pattern …

Runtime Abstraction for Autonomous Adaptive Systems on Reconfigurable Hardware

Autonomous systems increasingly rely on on-board computation to avoid the latency overheads of offloading to more powerful remote computing. This requires the integration of hardware accelerators to handle the complex computations demanded by …