Data Augmentation Algorithm for miRNA Omics-Based Classifications

Abstract: In recent years, many studies have revealed links between microRNA (miRNA) expression and disease, in particular its close association with the occurrence, progression, and treatment of tumors. However, traditional molecular biology assays are time-consuming and expensive, and disease samples are difficult to obtain. As a result, classifiers trained on the resulting imbalanced data sets recognize disease samples with low accuracy. To address these challenges, we propose OCF (original data-based conditional generative adversarial network for sample generation), a new data augmentation algorithm for distinguishing healthy samples from disease samples and for mining disease biomarkers. OCF first uses a conditional generative adversarial network to augment the data, then applies a feature selection algorithm to reduce the number of features; finally, a machine learning classifier performs the classification, and the selected biomarkers are analyzed. Experimental results show that the proposed algorithm achieves better classification performance and confirm the accuracy of the selected biomarkers.
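The abstract describes OCF as a three-stage pipeline: conditional data augmentation, feature selection, then classification with biomarker screening. The following is a minimal, dependency-free Python sketch of that pipeline shape only, under stated assumptions; it is not the authors' OCF. To stay short and runnable it swaps the conditional GAN for a much simpler class-conditional Gaussian sampler, uses a two-class Fisher score as the feature-selection filter, and uses a nearest-centroid classifier. All function names, the toy data, and parameters such as `k` are hypothetical.

```python
import random
import statistics
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def augment_to_balance(X: Matrix, y: List[int], rng: random.Random) -> Tuple[Matrix, List[int]]:
    """Step 1, a hypothetical stand-in for the CGAN (NOT a GAN): fit a per-class, per-feature
    Gaussian and sample synthetic rows for each minority class until it matches the largest class."""
    counts = {lab: y.count(lab) for lab in set(y)}
    target = max(counts.values())
    X_aug, y_aug = [row[:] for row in X], list(y)
    for label, n in counts.items():
        cols = list(zip(*[x for x, lab in zip(X, y) if lab == label]))
        means = [statistics.fmean(c) for c in cols]
        stds = [statistics.pstdev(c) for c in cols]
        for _ in range(target - n):
            # "Conditioning" on the class label = sampling from that class's Gaussian parameters.
            X_aug.append([rng.gauss(m, s) for m, s in zip(means, stds)])
            y_aug.append(label)
    return X_aug, y_aug


def select_features(X: Matrix, y: List[int], k: int) -> List[int]:
    """Step 2: rank features by a two-class Fisher score and keep the indices of the top k."""
    labels = sorted(set(y))
    assert len(labels) == 2, "this sketch assumes binary labels (healthy vs. disease)"
    scores = []
    for j in range(len(X[0])):
        g0, g1 = ([x[j] for x, lab in zip(X, y) if lab == c] for c in labels)
        gap = statistics.fmean(g1) - statistics.fmean(g0)
        spread = statistics.pvariance(g0) + statistics.pvariance(g1)
        scores.append(gap * gap / (spread + 1e-12))
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def fit_nearest_centroid(X: Matrix, y: List[int], feats: List[int]) -> Dict[int, List[float]]:
    """Step 3: a minimal classifier with one centroid per class over the selected features."""
    centroids = {}
    for label in set(y):
        rows = [[x[j] for j in feats] for x, lab in zip(X, y) if lab == label]
        centroids[label] = [statistics.fmean(c) for c in zip(*rows)]
    return centroids


def predict(centroids: Dict[int, List[float]], x: List[float], feats: List[int]) -> int:
    """Assign x to the class whose centroid is closest (squared Euclidean distance)."""
    v = [x[j] for j in feats]
    return min(centroids, key=lambda lab: sum((a - b) ** 2 for a, b in zip(v, centroids[lab])))


# Made-up toy data: 40 "healthy" (0) vs. 8 "disease" (1) samples with 5 expression features;
# only feature 2 differs between the classes.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(40)]
X += [[rng.gauss(5 if j == 2 else 0, 1) for j in range(5)] for _ in range(8)]
y = [0] * 40 + [1] * 8

X_aug, y_aug = augment_to_balance(X, y, rng)
feats = select_features(X_aug, y_aug, k=1)  # indices of candidate "biomarker" features
model = fit_nearest_centroid(X_aug, y_aug, feats)
accuracy = sum(predict(model, x, feats) == t for x, t in zip(X, y)) / len(y)
print(feats, y_aug.count(0), y_aug.count(1), round(accuracy, 3))
```

Augmenting only the minority class up to the size of the largest class mirrors the imbalance problem the abstract targets. In a realistic setting the Gaussian sampler would be replaced by a trained generator conditioned on the class label, and accuracy would be measured on held-out samples rather than on the training data as in this smoke check.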
