Abstract:
At present, Uyghur emotional speech synthesis relies on prosodic boundary prediction to perform emotional speech conversion. Speech synthesized in this way can express the corresponding emotions, but the emotional expression is not ideal. To address this problem, this paper proposes a BiRNN-based attention model for Uyghur emotional prosodic phrases. The model classifies emotion before prosodic conversion, and the classification result is used as an input to prosodic boundary prediction, thereby improving the prosodic conversion method. An improved part-of-speech feature vector and prosodic phrase vectors supplement the word vectors, which effectively improves the accuracy of Uyghur text sentiment classification. Experimental results show that the model achieves its best classification accuracy on the Uyghur five-category sentiment data set when prosodic phrases composed of two words are used as features.