Abstract:
As globalization deepens, cross-lingual summarization has become an important topic in natural language processing. In low-resource scenarios, existing methods face challenges such as limited representation transfer and insufficient data utilization. To address these issues, this paper proposes a novel method based on joint training and self-training. Specifically, two models handle the translation and cross-lingual summarization tasks, respectively, unifying the language vector space of their outputs and avoiding the problem of limited representation transfer. Joint training is then performed by aligning the output features and probabilities on parallel training pairs, thereby enhancing semantic sharing between the two models. Building on joint training, a self-training technique is further introduced to generate synthetic data from additional monolingual summarization data, effectively mitigating the data scarcity inherent in low-resource scenarios. Experimental results demonstrate that the proposed method outperforms existing approaches in multiple low-resource settings, achieving significant improvements in ROUGE scores.
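
To make the alignment idea concrete, the following is a minimal conceptual sketch (not the authors' implementation) of how the output features and output probabilities of the translation model and the cross-lingual summarization model could be aligned on a parallel training pair. The function name, loss weights, and the specific choice of MSE for features and KL divergence for probabilities are illustrative assumptions.

```python
# Hypothetical sketch of the feature/probability alignment terms described
# in the abstract; the exact losses and weighting used in the paper may differ.
import torch
import torch.nn.functional as F

def joint_alignment_loss(feats_mt, logits_mt, feats_cls, logits_cls,
                         feat_weight=1.0, prob_weight=1.0):
    """Alignment terms added to each model's own task loss.

    feats_*:  decoder hidden states, shape (batch, seq_len, hidden)
    logits_*: output vocabulary logits, shape (batch, seq_len, vocab)
    """
    # Feature-level alignment: pull the two decoders' representations together
    # (the translation model's features are treated as the reference here).
    feat_loss = F.mse_loss(feats_cls, feats_mt.detach())

    # Probability-level alignment: KL divergence between the two models'
    # output distributions over the shared target-language vocabulary.
    log_p_cls = F.log_softmax(logits_cls, dim=-1)
    p_mt = F.softmax(logits_mt, dim=-1).detach()
    prob_loss = F.kl_div(log_p_cls, p_mt, reduction="batchmean")

    return feat_weight * feat_loss + prob_weight * prob_loss
```

In such a setup, this alignment loss would be added to the ordinary cross-entropy losses of both models during joint training; the self-training stage would then reuse the trained translation model to turn monolingual summarization data into synthetic cross-lingual pairs.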