Abstract:
To improve attention-based Chinese-Tibetan neural machine translation (NMT), this paper proposes a Tibetan byte-pair encoding (BPE) algorithm with a maximum byte threshold, which refines the original byte-pair encoding algorithm. We collect one million Chinese-Tibetan sentence pairs and dictionaries containing 200,000 Chinese-Tibetan person and place names, and use them to train an attention-based Chinese-Tibetan NMT model. The model translates named entities better than commercial Chinese-Tibetan online translation services and achieves a bilingual evaluation understudy (BLEU) score of 36.84. It has already been deployed in a web-based Chinese-Tibetan machine translation system, which will promote the spread and application of Chinese-Tibetan NMT.
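
As an illustration of the general idea only, the sketch below shows standard BPE merge learning with a cap on the length of any merged symbol. The abstract does not define the maximum byte threshold, so the cap used here is an assumption: the function name `learn_bpe`, the parameter `max_symbol_len`, and the choice to measure length in Unicode characters rather than bytes are all hypothetical and may differ from the algorithm proposed in the paper.

```python
from collections import Counter


def learn_bpe(word_freqs, num_merges, max_symbol_len):
    """Learn BPE merges, skipping any merge longer than max_symbol_len.

    word_freqs maps each word to its corpus frequency.
    max_symbol_len is a hypothetical stand-in for the paper's
    "maximum byte threshold"; length is counted in characters here.
    """
    # Start with every word split into single characters.
    vocab = {tuple(word): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        # Count adjacent pairs whose merged symbol stays within the cap.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                if len(a) + len(b) <= max_symbol_len:
                    pairs[(a, b)] += freq
        if not pairs:
            break  # no admissible merge is left
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Apply the chosen merge to every word in the vocabulary.
        new_vocab = {}
        for symbols, freq in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            key = tuple(out)
            new_vocab[key] = new_vocab.get(key, 0) + freq
        vocab = new_vocab
    return merges, vocab
```

With a small cap, frequent long sequences are not collapsed into a single token, which keeps subword units short; whether this matches the behavior of the proposed Tibetan BPE variant is not stated in the abstract.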