Integrating Multiple Feature Representations in the Convolution Neural Network Prediction Algorithm for the Cell-Penetrating Peptides

    Abstract: Cell-penetrating peptides (CPPs) are a special class of peptides with unique medical value, so identifying them efficiently by computational methods is an important problem. The current mainstream approach applies various feature representation algorithms to obtain sequence features and then classifies them with a machine learning classifier. This paper proposes a new identification algorithm, ConvCPP, which uses an improved convolutional neural network to extract protein sequence features. The improvements are an attention layer added before the convolution layer and an optimized pooling method in the pooling layer. Ablation experiments were designed to verify the effectiveness of these improvements. The network's features are then combined with several other sequence-based feature extraction algorithms, and two feature selection algorithms are tested to obtain the optimal vector representation. Based on this representation, protein sequences are classified using a variety of machine learning classifiers. Experiments on a benchmark dataset show that the proposed algorithm achieves better prediction performance than current CPP identification methods.
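The pipeline described in the abstract (attention before convolution, pooling, then fusion with other sequence features) can be illustrated with a minimal NumPy sketch. It is not the ConvCPP implementation: the abstract does not specify the attention form, the pooling scheme, or the additional features, so scaled dot-product self-attention, global max pooling, and amino acid composition are assumptions chosen for illustration, and the weights are random rather than trained. All function and parameter names are hypothetical.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def one_hot(seq):
    """Encode a peptide as an (L, 20) one-hot matrix."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    x = np.zeros((len(seq), len(AMINO_ACIDS)))
    for pos, aa in enumerate(seq):
        x[pos, index[aa]] = 1.0
    return x


def self_attention(x, wq, wk, wv):
    """Scaled dot-product self-attention over sequence positions (assumed form)."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ v


def conv1d_relu(x, kernels):
    """Valid 1-D convolution along the sequence, followed by ReLU.

    kernels has shape (n_filters, width, channels).
    """
    n_filters, width, _ = kernels.shape
    steps = x.shape[0] - width + 1
    out = np.empty((steps, n_filters))
    for t in range(steps):
        window = x[t:t + width]  # (width, channels)
        out[t] = np.tensordot(kernels, window, axes=([1, 2], [0, 1]))
    return np.maximum(out, 0.0)


def conv_features(seq, params):
    """Attention -> convolution -> global max pooling (pooling choice is assumed)."""
    attended = self_attention(one_hot(seq), params["wq"], params["wk"], params["wv"])
    conv_out = conv1d_relu(attended, params["kernels"])
    return conv_out.max(axis=0)  # fixed-length vector, one value per filter


def amino_acid_composition(seq):
    """Fraction of each residue type; stands in for the 'other' sequence features."""
    return one_hot(seq).mean(axis=0)


# Random, untrained weights: the output only demonstrates shapes and data flow.
rng = np.random.default_rng(0)
d = 16
params = {
    "wq": rng.normal(size=(20, d)),
    "wk": rng.normal(size=(20, d)),
    "wv": rng.normal(size=(20, d)),
    "kernels": rng.normal(size=(8, 3, d)),
}

seq = "GRKKRRQRRRPPQ"  # arbitrary example peptide string
features = conv_features(seq, params)
composition = amino_acid_composition(seq)
fused = np.concatenate([features, composition])  # simple feature fusion
```

In a full pipeline the fused vector would pass through feature selection and then to a conventional classifier; those stages are omitted here because the abstract does not name the specific algorithms.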

     
