An Algorithm for Cross-Dependent Feature Selection of Genetic Data

  • Abstract: Feature selection is an essential data preprocessing step in bioinformatics. Traditional feature selection algorithms ignore the dependency and redundancy among features. This paper therefore proposes a feature selection algorithm based on joint mutual information, called joint feature relevance and redundancy (JFRR). The algorithm uses mutual information to compute the redundancy between features, and joint mutual information to compute the relevance among the selected feature set, the candidate features, and the class labels. JFRR is compared with six other feature selection algorithms on two classifiers and nine gene datasets, using Precision_micro and F1_micro as classification metrics. The experimental results show that JFRR effectively improves classification accuracy.
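    The abstract does not state the exact JFRR scoring formula, so the following is only a minimal sketch of the general idea it describes: greedy forward selection in which redundancy is measured by pairwise mutual information and relevance by joint mutual information with the class label. The function names (`entropy`, `mutual_info`, `joint_mutual_info`, `greedy_select`) and the score "mean joint relevance minus mean redundancy" are assumptions made for illustration, not the authors' method. The sketch also assumes the features have already been discretized, which continuous gene expression values would require.

```python
from collections import Counter
from math import log

import numpy as np


def entropy(*cols):
    """Shannon entropy (in nats) of the joint distribution of discrete columns."""
    joint = list(zip(*cols))
    n = len(joint)
    return -sum((c / n) * log(c / n) for c in Counter(joint).values())


def mutual_info(x, y):
    """I(X; Y) = H(X) + H(Y) - H(X, Y)."""
    return entropy(x) + entropy(y) - entropy(x, y)


def joint_mutual_info(x, s, y):
    """I(X, S; Y) = H(X, S) + H(Y) - H(X, S, Y)."""
    return entropy(x, s) + entropy(y) - entropy(x, s, y)


def greedy_select(X, y, k):
    """Greedily pick k columns of a discretized matrix X (n_samples, n_features).

    Hypothetical score for a candidate f, given the selected set S:
        mean over s in S of I(f, s; y)  -  mean over s in S of I(f; s)
    This is an illustrative stand-in, not the published JFRR criterion.
    """
    X = np.asarray(X)
    y = np.asarray(y).tolist()
    n_features = X.shape[1]
    k = min(k, n_features)
    cols = [X[:, j].tolist() for j in range(n_features)]

    # Seed with the single feature most informative about the label.
    selected = [max(range(n_features), key=lambda j: mutual_info(cols[j], y))]

    while len(selected) < k:
        def score(j):
            relevance = np.mean(
                [joint_mutual_info(cols[j], cols[s], y) for s in selected]
            )
            redundancy = np.mean([mutual_info(cols[j], cols[s]) for s in selected])
            return relevance - redundancy

        candidates = [j for j in range(n_features) if j not in selected]
        selected.append(max(candidates, key=score))
    return selected
```

    In this sketch, a candidate that merely duplicates an already selected feature gains no joint relevance and pays a full redundancy penalty, so it ranks below a candidate that is independent of the selected set. That is the behaviour the abstract attributes to accounting for redundancy and dependency among features.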

     
