Tensor core accelerated conjugate gradient solver on GPUs

Abstract: Conjugate Gradient (CG) and Biconjugate Gradient Stabilized (BiCGSTAB) are two classical and efficient iterative methods for solving sparse linear systems, widely used in scientific computing and engineering applications. Although GPUs and other parallel processors have enhanced the parallelism of these methods, the computing power of the latest hardware units, Tensor Cores, has not yet been exploited for them. This work proposes a Tensor Core-accelerated CG solver that leverages Tensor Cores for the key components of the CG and BiCGSTAB methods, namely sparse matrix-vector multiplication (SpMV) and the dot product, thereby exploiting the computational capability of Tensor Cores to improve the overall performance of both methods. Experimental results on NVIDIA A100 and H100 GPUs demonstrate that both Tensor Core-accelerated methods achieve significant speedups over baseline versions built on the official CUDA libraries across a range of sparse matrices.
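
The abstract identifies SpMV and the dot product as the key kernels accelerated with Tensor Cores. For context, the standard CG iteration in which those kernels dominate can be sketched in plain NumPy (a reference sketch only, not the authors' Tensor Core implementation; the matrix here is dense and the function name `cg` is illustrative):

```python
import numpy as np

def cg(A, b, tol=1e-10, max_iter=1000):
    """Minimal conjugate gradient for a symmetric positive-definite A.

    The two kernels the paper offloads to Tensor Cores appear here as
    `A @ p` (SpMV, the dominant per-iteration cost) and the dot products.
    """
    x = np.zeros_like(b)
    r = b - A @ x              # initial residual (one SpMV)
    p = r.copy()               # initial search direction
    rs = r @ r                 # dot product
    for _ in range(max_iter):
        Ap = A @ p             # SpMV kernel
        alpha = rs / (p @ Ap)  # dot product kernel
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# Small SPD system for illustration
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = cg(A, b)
```

In the proposed solver these same two operations are reformulated as small matrix-matrix products so that Tensor Cores, which natively compute fused matrix multiply-accumulate, can execute them.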
