Title#
Leveraging Hardware-Aware Computation in Mixed-Precision Matrix Multiply: A Tile-Centric Approach
Abstract#
General Matrix Multiplication (GEMM) is a critical operation underpinning a wide range of applications in high-performance computing (HPC) and artificial intelligence (AI). The emergence of hardware optimized for low-precision arithmetic necessitates a reevaluation of numerical algorithms so that they can leverage mixed-precision computation to improve performance and energy efficiency. This research introduces an adaptive mixed-precision GEMM framework that supports different precision formats at the fine-grained tile/block level. We use the PaRSEC runtime system to balance workloads across various architectures. Performance scales well on the ARM CPU-based Fugaku supercomputer, the NVIDIA GPU-based A100 DGX, and the AMD GPU-based Frontier supercomputer. This research aims to enhance computational efficiency and accuracy by bridging algorithmic advancements and hardware innovations, driving transformative progress in various applications.
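To make the tile-level idea concrete, here is a minimal NumPy sketch of a GEMM in which each tile is rounded to its own storage precision before the tile products are accumulated in double precision. It is an illustration under stated assumptions, not the framework described in the abstract: the function names `pick_dtype` and `mixed_precision_gemm`, the norm-ratio thresholds, and the tile size are all hypothetical, and the sketch runs serially on the CPU without PaRSEC or any GPU backend.

```python
import numpy as np


def pick_dtype(tile, ref_norm):
    """Hypothetical selection rule (an assumption, not the paper's criterion):
    the smaller a tile's Frobenius norm relative to the whole matrix,
    the lower the precision used to store it."""
    if ref_norm == 0.0:
        return np.float64
    ratio = np.linalg.norm(tile) / ref_norm
    if ratio < 1e-6:
        return np.float16
    if ratio < 1e-3:
        return np.float32
    return np.float64


def mixed_precision_gemm(A, B, tile=16):
    """Compute C = A @ B tile by tile.

    Each tile of A and B is rounded to its selected storage precision,
    then promoted back to float64 so that accumulation happens in
    double precision."""
    n, k = A.shape
    k2, m = B.shape
    if k != k2:
        raise ValueError("inner dimensions of A and B must match")
    norm_a, norm_b = np.linalg.norm(A), np.linalg.norm(B)
    C = np.zeros((n, m), dtype=np.float64)
    for i in range(0, n, tile):
        for p in range(0, k, tile):
            a = A[i:i + tile, p:p + tile]
            # Emulate low-precision storage of this tile of A.
            a = a.astype(pick_dtype(a, norm_a)).astype(np.float64)
            for j in range(0, m, tile):
                b = B[p:p + tile, j:j + tile]
                # Emulate low-precision storage of this tile of B.
                b = b.astype(pick_dtype(b, norm_b)).astype(np.float64)
                C[i:i + tile, j:j + tile] += a @ b
    return C


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((64, 64))
    B = rng.standard_normal((64, 64))
    A[:16, 16:] *= 1e-4   # small tiles: candidates for float32
    A[16:32, :16] *= 1e-8  # tiny tile: candidate for float16
    C = mixed_precision_gemm(A, B, tile=16)
    ref = A @ B
    print("relative error:", np.linalg.norm(C - ref) / np.linalg.norm(ref))
```

The point of the sketch is that tiles contributing little to the result can be stored in a cheaper format with little effect on overall accuracy. A real implementation would also run each tile product in the matching hardware precision and schedule tiles across devices, which this example does not attempt.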