Tensor Core-accelerated conjugate gradient solver on GPUs


Abstract: Conjugate gradient (CG) and biconjugate gradient stabilized (BiCGSTAB) are two classical and efficient iterative methods for solving sparse linear systems, widely used in scientific computing and engineering. Although GPUs and other parallel processors have improved the parallel performance of these methods, Tensor Cores, the latest hardware units, and their computing power have not yet been exploited for them. This work proposes a Tensor Core-accelerated CG solver that uses Tensor Cores for the key components of the CG and BiCGSTAB methods, namely sparse matrix-vector multiplication (SpMV) and the dot product, thereby harnessing the computational capability of Tensor Cores to improve the overall performance of both methods. Experimental results on NVIDIA A100 and H100 GPUs show that the Tensor Core-accelerated versions of both methods achieve significant speedups over baseline versions built on the official CUDA libraries across a range of sparse matrices.
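For context, each CG iteration is dominated by exactly the two kernels the abstract names: one SpMV and a few dot products. The following minimal CPU sketch in Python with SciPy (not the paper's Tensor Core implementation; function and variable names are illustrative) marks where those kernels occur:

```python
import numpy as np
import scipy.sparse as sp

def cg(A, b, tol=1e-8, max_iter=1000):
    """Plain conjugate gradient. Each iteration costs one sparse
    matrix-vector multiply (SpMV) and a handful of dot products --
    the operations the paper offloads to Tensor Cores."""
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    rs_old = r @ r                 # dot product
    for _ in range(max_iter):
        Ap = A @ p                 # SpMV
        alpha = rs_old / (p @ Ap)  # dot product
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r             # dot product
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

# Symmetric positive definite test matrix: 1-D Poisson (tridiagonal).
n = 100
A = sp.diags([-1, 2, -1], [-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)
x = cg(A, b)
print(np.linalg.norm(A @ x - b))
```

BiCGSTAB follows the same pattern but drops the symmetry requirement on A, at the cost of two SpMVs and more dot products per iteration, so the same two kernels remain the acceleration targets.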

     
