Abstract:
In recent years, many studies have revealed links between microRNA expression and disease, in particular its close association with the occurrence, development and treatment of tumors. However, traditional molecular biology assays are time-consuming and expensive, and disease samples are difficult to obtain. The resulting data sets are therefore imbalanced, and classifiers trained on them recognize disease samples with low accuracy. To address these challenges, we propose OCF (original data-based conditional generative adversarial network for sample generation), a new data augmentation algorithm for distinguishing healthy samples from disease samples and mining disease biomarkers. OCF uses a conditional generative adversarial network for data augmentation, followed by feature selection algorithms that reduce the number of features. Finally, a machine learning classifier performs classification and recognition, and the selected biomarkers are analyzed. Experimental results show that the proposed algorithm achieves better classification performance and verify the accuracy of the selected biomarkers.
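The three-stage pipeline summarized above (augment the minority class, select features, classify) can be sketched as follows. This is a minimal illustration under stated assumptions, not the OCF implementation. The trained CGAN generator is replaced by a hypothetical stand-in, `fit_class_conditional_sampler`, which draws each feature from a per-class Gaussian fitted to the original samples. Feature selection uses a simple two-class separation score, the classifier is a nearest-centroid rule, and the data are synthetic toy values. All function names are hypothetical.

```python
import random
import statistics
from typing import Callable, Dict, List, Tuple

Sample = List[float]
Generator = Callable[[int, int, random.Random], List[Sample]]


def fit_class_conditional_sampler(X: List[Sample], y: List[int]) -> Generator:
    """Hypothetical stand-in for a trained CGAN generator G(z | c): NOT a GAN.
    It samples each feature from a Gaussian fitted to the ORIGINAL samples of the class."""
    params: Dict[int, List[Tuple[float, float]]] = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        params[label] = [(statistics.fmean(c), statistics.pstdev(c)) for c in zip(*rows)]

    def generate(label: int, n: int, rng: random.Random) -> List[Sample]:
        return [[rng.gauss(mu, sd) for mu, sd in params[label]] for _ in range(n)]

    return generate


def balance(X: List[Sample], y: List[int], generate: Generator, rng: random.Random):
    """Add synthetic samples to every minority class until it matches the majority count."""
    counts = {lab: y.count(lab) for lab in set(y)}
    target = max(counts.values())
    X_aug, y_aug = list(X), list(y)
    for lab, count in counts.items():
        synthetic = generate(lab, target - count, rng)
        X_aug.extend(synthetic)
        y_aug.extend([lab] * len(synthetic))
    return X_aug, y_aug


def select_top_features(X: List[Sample], y: List[int], k: int) -> List[int]:
    """Keep the k features with the largest |mean_a - mean_b| / (sd_a + sd_b); assumes two classes."""
    a, b = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        xa = [x[j] for x, lab in zip(X, y) if lab == a]
        xb = [x[j] for x, lab in zip(X, y) if lab == b]
        gap = abs(statistics.fmean(xa) - statistics.fmean(xb))
        scores.append((gap / (statistics.pstdev(xa) + statistics.pstdev(xb) + 1e-9), j))
    return sorted(j for _, j in sorted(scores, reverse=True)[:k])


def fit_nearest_centroid(X: List[Sample], y: List[int]) -> Dict[int, Sample]:
    """Mean vector of each class."""
    return {lab: [statistics.fmean(c) for c in zip(*[x for x, l in zip(X, y) if l == lab])]
            for lab in set(y)}


def predict(centroids: Dict[int, Sample], x: Sample) -> int:
    """Assign x to the class whose centroid is closest in squared Euclidean distance."""
    return min(centroids, key=lambda lab: sum((xi - ci) ** 2 for xi, ci in zip(x, centroids[lab])))


def make_toy_data(n_healthy: int, n_disease: int, n_features: int,
                  rng: random.Random, shift: float = 4.0):
    """Synthetic toy data (NOT from the paper): only features 0-2 are shifted in disease samples."""
    X, y = [], []
    for label, n in ((0, n_healthy), (1, n_disease)):
        for _ in range(n):
            X.append([rng.gauss(shift * label if j < 3 else 0.0, 1.0) for j in range(n_features)])
            y.append(label)
    return X, y


def run_demo(seed: int = 0, k: int = 3) -> dict:
    rng = random.Random(seed)
    X_train, y_train = make_toy_data(40, 8, 20, rng)  # imbalanced: 40 healthy vs 8 disease
    X_test, y_test = make_toy_data(20, 20, 20, rng)
    # Augment the training split only, so synthetic samples never reach the evaluation set.
    generate = fit_class_conditional_sampler(X_train, y_train)
    X_aug, y_aug = balance(X_train, y_train, generate, rng)
    selected = select_top_features(X_aug, y_aug, k)

    def project(rows: List[Sample]) -> List[Sample]:
        return [[row[j] for j in selected] for row in rows]

    centroids = fit_nearest_centroid(project(X_aug), y_aug)
    preds = [predict(centroids, x) for x in project(X_test)]
    accuracy = sum(p == t for p, t in zip(preds, y_test)) / len(y_test)
    return {"class_counts": {lab: y_aug.count(lab) for lab in set(y_aug)},
            "selected": selected, "accuracy": accuracy}


if __name__ == "__main__":
    print(run_demo())
```

Augmentation is applied only to the training split so that generated samples cannot inflate the measured accuracy. In OCF itself, the generation step would be performed by a conditional generative adversarial network conditioned on the class label rather than by the Gaussian stand-in used here.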