Abstract:
Cell-penetrating peptides (CPPs) are a special class of peptides with unique medical value, so it is important to identify them efficiently with computational methods. The current mainstream approach extracts sequence features with different feature extraction algorithms and then classifies them with a machine learning classifier. This paper proposes a novel classification method, ConvCPP, which uses an improved convolutional neural network to extract protein sequence features. The improvements include adding an attention layer before the convolution layer and optimizing the pooling layer, and ablation experiments were designed to verify their effectiveness. These features are then combined with those produced by a variety of other feature extraction algorithms based on protein sequence features, and two feature selection algorithms are tested to obtain the optimal vector representation. Based on this vector, the protein sequences are classified with a variety of machine learning classifiers. On the benchmark dataset, the proposed method shows better prediction performance than state-of-the-art CPP predictors.
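The feature-extraction idea summarized above (an attention layer placed before a convolution layer, followed by pooling) can be illustrated with a minimal, dependency-free sketch. This is not the ConvCPP implementation. The abstract does not specify the attention form or how the pooling layer was optimized, so the position-wise softmax attention, the global max pooling, the filter count, the kernel size, and every function name below (`one_hot`, `attention`, `conv1d`, `global_max_pool`, `extract_features`, `init_params`) are hypothetical choices made for illustration. The weights are random and untrained; in practice the network would be trained on labeled CPP data before its pooled outputs are used as features.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def one_hot(seq):
    """Encode a peptide as a list of 20-dim one-hot vectors, one per residue."""
    return [[1.0 if aa == a else 0.0 for a in AMINO_ACIDS] for aa in seq]


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(x, w):
    """Hypothetical position-wise attention applied BEFORE the convolution.

    Each residue gets a score w.x_t, scores are softmax-normalized over positions,
    and each row is rescaled by alpha_t * n (uniform attention leaves x unchanged).
    """
    scores = [sum(xi * wi for xi, wi in zip(row, w)) for row in x]
    alpha = softmax(scores)
    n = len(x)
    weighted = [[a * n * v for v in row] for a, row in zip(alpha, x)]
    return weighted, alpha


def conv1d(x, kernels, bias):
    """'Valid' 1D convolution over positions with ReLU; each kernel is k x d."""
    k = len(kernels[0])
    out = []
    for t in range(len(x) - k + 1):
        row = []
        for f, kern in enumerate(kernels):
            s = bias[f]
            for j in range(k):
                s += sum(a * b for a, b in zip(x[t + j], kern[j]))
            row.append(max(0.0, s))  # ReLU
        out.append(row)
    return out


def global_max_pool(h):
    """Max over positions for each filter, giving a fixed-length vector."""
    return [max(col) for col in zip(*h)]


def init_params(n_filters=8, k=3, d=20, seed=0):
    """Random (untrained) parameters, for illustration only."""
    rng = random.Random(seed)
    return {
        "att": [rng.gauss(0, 0.1) for _ in range(d)],
        "kernels": [[[rng.gauss(0, 0.1) for _ in range(d)] for _ in range(k)]
                    for _ in range(n_filters)],
        "bias": [0.0] * n_filters,
    }


def extract_features(seq, params):
    """one-hot -> attention -> convolution -> pooling -> feature vector."""
    x = one_hot(seq)
    x, _ = attention(x, params["att"])
    h = conv1d(x, params["kernels"], params["bias"])
    return global_max_pool(h)


if __name__ == "__main__":
    params = init_params()
    # TAT-derived peptide, a well-known CPP, used only as sample input.
    print(extract_features("GRKKRRQRRRPPQ", params))
```

The pooled vector has one entry per filter regardless of peptide length, which is what allows it to be concatenated with features from other sequence-based extraction algorithms and passed through feature selection before classification.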