Imbalanced graph node classification algorithm based on ensemble learning

    Abstract: Graph neural networks (GNNs) have been widely used for node classification. However, existing research has focused predominantly on balanced datasets, even though imbalanced data are common in practice. Traditional approaches to handling imbalanced datasets, such as resampling and reweighting, often require substantial preprocessing or new network architectures, which can introduce new biases and cause information loss. This paper proposes an improved bootstrap aggregating (Bagging) ensemble learning method for imbalanced graph datasets. The method partitions the dataset into k folds and trains a GNN base model on the resulting sub-datasets to obtain multiple distinct sub-models. The sub-models are then fused to improve node classification accuracy without introducing excessive preprocessing. Experiments on imbalanced graph datasets show that the proposed method outperforms the base classifier in both accuracy and robustness, and that classification accuracy first increases and then decreases as k grows.
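
    The pipeline described above (k-fold partition, one sub-model per sub-dataset, fusion of the sub-models) might be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's implementation: the abstract does not say how the folds are formed, how folds map to sub-datasets, or how the sub-models are fused, so the stratified round-robin folds, the leave-one-fold-out subsets, and the probability averaging used here are all assumptions. A nearest-centroid classifier on mean-aggregated neighbor features stands in for the GNN base model so that the example runs with only the standard library, and every function name is hypothetical.

```python
import math
import random
from collections import defaultdict


def stratified_k_folds(train_nodes, labels, k, seed=0):
    """Split labeled nodes into k folds, dealing each class out round-robin
    so that minority-class nodes are spread across folds (an assumption)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for v in train_nodes:
        by_class[labels[v]].append(v)
    folds = [[] for _ in range(k)]
    for cls in sorted(by_class):
        members = by_class[cls]
        rng.shuffle(members)
        for i, v in enumerate(members):
            folds[i % k].append(v)
    return folds


def aggregate(features, adjacency):
    """One round of mean aggregation over each node and its neighbors."""
    out = {}
    for v, x in features.items():
        group = [x] + [features[u] for u in adjacency.get(v, [])]
        out[v] = [sum(col) / len(group) for col in zip(*group)]
    return out


def fit_centroids(hidden, labels, nodes):
    """Stand-in base model (not a GNN): one centroid per class."""
    sums, counts = {}, defaultdict(int)
    for v in nodes:
        c = labels[v]
        if c not in sums:
            sums[c] = [0.0] * len(hidden[v])
        sums[c] = [s + h for s, h in zip(sums[c], hidden[v])]
        counts[c] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict_proba(centroids, h):
    """Softmax over negative squared distances to the class centroids."""
    scores = {c: -sum((a - b) ** 2 for a, b in zip(h, mu))
              for c, mu in centroids.items()}
    top = max(scores.values())
    exp = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exp.values())
    return {c: e / total for c, e in exp.items()}


def bagging_predict(features, adjacency, labels, train_nodes, k, seed=0):
    """Train k sub-models, each omitting one fold, and average their
    class probabilities (soft voting) to label every node."""
    if k < 2:
        raise ValueError("k must be at least 2")
    hidden = aggregate(features, adjacency)
    folds = stratified_k_folds(train_nodes, labels, k, seed)
    models = []
    for i in range(k):
        subset = [v for j, fold in enumerate(folds) if j != i for v in fold]
        models.append(fit_centroids(hidden, labels, subset))
    classes = sorted({labels[v] for v in train_nodes})
    predictions = {}
    for v in features:
        avg = {c: 0.0 for c in classes}
        for model in models:
            for c, p in predict_proba(model, hidden[v]).items():
                avg[c] += p / len(models)
        predictions[v] = max(classes, key=lambda c: avg[c])
    return predictions
```

    Here `features` maps each node to a feature vector, `adjacency` maps each node to its neighbor list, and `labels` covers at least the training nodes. Rerunning `bagging_predict` with different values of `k` on a held-out set is one way to look for the rise-then-fall trend in accuracy that the abstract reports.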

     
