Abstract:
Current speech emotion recognition models suffer from large parameter counts, heavy computation, and slow training. To address these problems, this paper proposes a lightweight network model suited to small data sets. The model is based on the capsule network, and a depthwise separable convolution module replaces the original convolution layer in the capsule network to reduce the amount of computation. Transfer learning is used to extract general low-level image features, and the whole network is then fine-tuned on spectrograms, which mitigates overfitting on small data sets. In the dynamic routing structure, cosine similarity is used to measure the similarity between vectors, which improves the performance of the dynamic routing algorithm. Experimental results show that the lightweight capsule network outperforms the seven deep learning network models used for comparison in both recognition rate and running speed.
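As a rough illustration of why replacing a standard convolution with a depthwise separable one reduces computation, the sketch below compares the weight counts of the two layer types. The layer sizes (a 9x9 kernel with 256 input and 256 output channels) are assumptions chosen only for illustration, and the function names are hypothetical; none of these are taken from the model described in this paper. Biases are ignored.

```python
def standard_conv_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights in a standard 2-D convolution (biases ignored)."""
    return kernel * kernel * c_in * c_out


def separable_conv_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights in a depthwise separable convolution (biases ignored).

    Depthwise step: one kernel x kernel filter per input channel.
    Pointwise step: a 1x1 convolution mixing c_in channels into c_out.
    """
    depthwise = kernel * kernel * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise


if __name__ == "__main__":
    # Illustrative sizes only (assumed, not from the paper).
    k, c_in, c_out = 9, 256, 256
    std = standard_conv_params(k, c_in, c_out)
    sep = separable_conv_params(k, c_in, c_out)
    print(f"standard:  {std} weights")
    print(f"separable: {sep} weights")
    # The ratio equals 1/c_out + 1/k^2 for any layer size.
    print(f"ratio:     {sep / std:.4f}")
```

For these assumed sizes the separable layer needs under 2% of the weights of the standard layer, and the saving grows with kernel size and output channel count.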