Leveraging Hardware-Aware Computation in Mixed-Precision Matrix Multiply: A Tile-Centric Approach

2508.14848v1

Title

Leveraging Hardware-Aware Computation in Mixed-Precision Matrix Multiply: A Tile-Centric Approach

Abstract

General Matrix Multiplication (GEMM) is a critical operation underpinning a wide range of applications in high-performance computing (HPC) and artificial intelligence (AI). The emergence of hardware optimized for low-precision arithmetic necessitates a reevaluation of numerical algorithms so that they can leverage mixed-precision computation to improve performance and energy efficiency. This research introduces an adaptive mixed-precision GEMM framework that supports different precision formats at the fine-grained tile/block level. We use the PaRSEC runtime system to balance workloads across various architectures. Performance scales well on the ARM CPU-based Fugaku supercomputer, the NVIDIA GPU-based A100 DGX, and the AMD GPU-based Frontier supercomputer. This research aims to enhance computational efficiency and accuracy by bridging algorithmic advancements and hardware innovations, driving transformative progress in various applications.
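To make the idea of per-tile precision concrete, the following is a minimal sketch in plain Python, not the framework described in the abstract. It assumes that each tile of the input matrices is stored in its own format (FP64, FP32, or FP16) and that products are accumulated in FP64. The function names (`round_to`, `store_tiles`, `tiled_gemm`), the precision maps, and the format labels are all hypothetical and chosen for illustration; the sketch does not use PaRSEC and does not model scheduling or hardware selection.

```python
import struct

# Hypothetical format labels mapped to struct codes ("e" = IEEE half, "f" = IEEE single).
# "fp64" is Python's native float and needs no conversion.
_CODES = {"fp32": "f", "fp16": "e"}


def round_to(x, fmt):
    """Round an FP64 value to the nearest representable value in `fmt`."""
    if fmt == "fp64":
        return x
    code = _CODES[fmt]
    return struct.unpack(code, struct.pack(code, x))[0]


def store_tiles(M, nb, prec):
    """Return a copy of M in which tile (I, J) is rounded to format prec[I][J]."""
    return [
        [round_to(v, prec[i // nb][j // nb]) for j, v in enumerate(row)]
        for i, row in enumerate(M)
    ]


def tiled_gemm(A, B, nb, prec_a, prec_b):
    """Compute C = A @ B tile by tile.

    Inputs are rounded per tile according to the precision maps;
    partial sums are accumulated in FP64 (an assumption of this sketch).
    """
    n, k, m = len(A), len(B), len(B[0])
    At = store_tiles(A, nb, prec_a)
    Bt = store_tiles(B, nb, prec_b)
    C = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, nb):
        for j0 in range(0, m, nb):
            for k0 in range(0, k, nb):
                # Multiply tile A[i0.., k0..] by tile B[k0.., j0..].
                k1 = min(k0 + nb, k)
                for i in range(i0, min(i0 + nb, n)):
                    for j in range(j0, min(j0 + nb, m)):
                        C[i][j] += sum(At[i][p] * Bt[p][j] for p in range(k0, k1))
    return C
```

For example, calling `tiled_gemm(A, B, 2, prec_a, prec_b)` on 4x4 inputs with `prec_a = [["fp16", "fp32"], ["fp64", "fp16"]]` rounds each 2x2 tile of `A` to the listed format before multiplying. In a real framework the precision map would be chosen by some accuracy criterion and the tile products dispatched to hardware kernels; both are outside the scope of this sketch.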
