基于异构图和关键词的抽取式文本摘要模型

Extractive Document Summarization Model Based on Heterogeneous Graph and Keywords

  • 摘要: 抽取式文本摘要使用一定的策略从冗长的文本中选择一些句子组成摘要,其关键在于要尽可能多地利用文本的语义信息和结构信息。为了更好地挖掘这些信息,进而利用它们指导摘要的抽取,提出了一种基于异构图和关键词的抽取式文本摘要模型(HGKSum)。该模型首先将文本建模为由句子节点和词语节点构成的异构图,在异构图上使用图注意力网络学习节点的特征,之后将关键词抽取任务作为文本摘要任务的辅助任务,使用多任务学习的方式进行训练,得到候选摘要,最后对候选摘要进行精炼以降低冗余度,得到最终摘要。在基准数据集上的对比实验表明,该模型性能优于基准模型,此外,消融实验也证明了引入异构节点和关键词的必要性。

     

    Abstract: Extractive document summarization uses certain strategies to select some sentences from lengthy texts to form a summary, whose key is to use as much semantic and structural information of the text as possible. In order to better mine such information and then use it to guide the summarization, an extractive document summarization model based on heterogeneous graph and keywords (HGKSum) is proposed, which models the text as a heterogeneous graph composed of sentence nodes and word nodes. The model uses the graph attention networks to learn the features of the nodes in the graph. The multi-task learning is applied to the model, which considers the keywords extraction task as an auxiliary task of the document summarization task. The candidate summary which derived from the prediction of the neural networks in the model is often highly redundant, so the model refines it to create the final summary of low redundancy. The comparative experiment on the document summarization benchmark shows that the proposed model outperforms the baselines. Besides, ablation studies also demonstrate the necessity of introducing heterogeneous nodes and keywords.

     

/

返回文章
返回