Extractive Document Summarization Model Based on Heterogeneous Graph and Keywords

ZHU Qilin; WANG Yu; XU Jian

doi:10.12178/1001-0548.2023019

Extractive document summarization selects sentences from a long text according to some strategy to form a summary; the key is to exploit as much of the text's semantic and structural information as possible. To better mine this information and use it to guide summarization, this paper proposes HGKSum, an extractive document summarization model based on a heterogeneous graph and keywords, which models the text as a heterogeneous graph composed of sentence nodes and word nodes. The model uses graph attention networks to learn node features and applies multi-task learning, treating keyword extraction as an auxiliary task of document summarization. Because the candidate summary derived from the network's predictions is often highly redundant, the model refines it into a final summary with low redundancy. Comparative experiments on a document summarization benchmark show that the proposed model outperforms the baseline models that do not rely on pre-trained language models, and ablation studies demonstrate the necessity of introducing heterogeneous nodes and keywords.

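In the information age, the volume of information is growing exponentially, and people usually need a good summary to help them cope with it. Manual summarization, however, has become increasingly impractical in terms of both time and cost, so automatic text summarization by computer has become an active research topic.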

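Automatic text summarization is the process of compressing and condensing a text while preserving its main ideas, ultimately producing a concise and fluent summary. According to how the summary is produced, automatic text summarization is generally divided into extractive and abstractive summarization [1]. Abstractive summarization analyzes the text in depth and produces a summary made of newly generated sentences, whereas extractive summarization selects important sentences or paragraphs from the original text and combines them into a summary. Compared with abstractive summarization, extractive summarization offers higher semantic correctness, fewer grammatical errors and faster summarization; this paper therefore focuses on extractive summarization.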

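The central problem in extractive text summarization is how to evaluate the importance of sentences. Traditional approaches are usually based on statistical information: they identify the important sentences of a document from statistics such as sentence position and word frequency. Although such methods are simple to implement and fast to compute, they ignore the semantic information of sentences and words, so the summaries they produce are of relatively low quality.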

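Semantic information refers to the meaningful, human-interpretable information carried by the semantic units of a text (such as characters, words and sentences); it is by perceiving this information that humans understand what a text means. Structural information, by contrast, concerns how semantic units are laid out in the text: for example, units at the beginning and end of a text are usually more important, and the distance between two units often reflects how closely they are related.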

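To better capture the semantic and structural information of sentences, deep-learning-based extractive summarization methods have become popular. These methods usually adopt an encoder-decoder framework and encode sentences with a recurrent neural network (RNN). However, RNN-based models usually struggle to capture sentence-level long-range dependencies, so the cross-sentence relations they represent are not rich enough, which degrades summary quality.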

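An intuitive way to capture cross-sentence relations is to represent the text as a graph. Graph-based extractive summarization methods model text units (such as words and sentences) as nodes, connect them according to their relations in the text (such as co-occurrence or syntactic relations), and then run a node-ranking algorithm on the graph to obtain the summary. Most graph-based extractive methods build homogeneous graphs that contain only sentence nodes and ignore other kinds of text units such as words, while the few methods based on heterogeneous graphs rely heavily on external tools and therefore suffer from error propagation.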
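The difference between a homogeneous sentence graph and a heterogeneous sentence-word graph can be illustrated with a minimal sketch. It assumes regex tokenization and a tiny hand-written stop-word list in place of any external parser; the function names and the stop-word list are hypothetical and are not taken from the paper. Every non-stop word becomes a node linked to each sentence that contains it, so two sentences with no direct edge are joined by a two-hop path through a shared word node.

```python
import re
from collections import defaultdict
from itertools import combinations

# Illustrative stop-word list (an assumption of this sketch).
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to"}

def build_sentence_word_graph(sentences):
    """Edges of a bipartite sentence-word graph, stored as word -> set of sentence ids."""
    word_to_sents = defaultdict(set)
    for sid, sent in enumerate(sentences):
        for w in re.findall(r"[a-z]+", sent.lower()):
            if w not in STOP_WORDS:
                word_to_sents[w].add(sid)
    return dict(word_to_sents)

def bridged_sentence_pairs(word_to_sents):
    """Sentence pairs joined by at least one shared word node (a path s - w - s)."""
    pairs = defaultdict(set)
    for w, sids in word_to_sents.items():
        for i, j in combinations(sorted(sids), 2):
            pairs[(i, j)].add(w)
    return dict(pairs)

sentences = [
    "The model builds a graph.",
    "Sentences share words in the graph.",
    "Keywords guide the model.",
]
graph = build_sentence_word_graph(sentences)
bridges = bridged_sentence_pairs(graph)
```

Here sentences 0 and 1 meet through the word node `graph` and sentences 0 and 2 through `model`; sentences 1 and 2 share no word, so they are linked only indirectly via sentence 0.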

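Like a summary, keywords convey the main information of a text; to some extent, keywords can be regarded as a finer-grained summary. Keywords tend to recur throughout a text and appear frequently in its reference summary, so they provide important guidance both for capturing cross-sentence relations and for extracting the summary. Current extractive summarization methods, however, often overlook this information and make little use of it.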

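Motivated by this, this paper proposes HGKSum (Summarization Based on Heterogeneous Graph and Keywords), an extractive text summarization model based on a heterogeneous graph and keywords. To enrich the relations between sentences, the model builds its graph not only from sentence nodes but also from word nodes, yielding a heterogeneous graph. Word nodes act as bridges between sentence nodes, so that sentences that are not directly adjacent become indirectly connected. To learn from the text graph, the model uses a graph attention network to learn node features, which captures the structural information of the graph well. To exploit keyword information, the model uses it to reduce the influence of noisy word nodes on the text structure. In addition, because keyword extraction and text summarization are complementary, the model adopts multi-task learning and treats keyword extraction as an auxiliary task of summarization: during training it makes predictions for both sentence nodes and word nodes, and the two tasks are trained jointly to obtain better summaries.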
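Two ideas in this paragraph, attention-based message passing from word nodes to sentence nodes and joint training of sentence and keyword predictions, can be sketched in plain Python. The sketch assumes one single-head graph-attention step in the style of Ref. [25], where a sentence node attends only over its word neighbours (no self-loop) and the output passes through an ELU; it also assumes a joint objective equal to the sentence-extraction binary cross-entropy plus a weighted keyword-prediction loss on word nodes. The names `gat_sentence_update`, `joint_loss` and `word_weight` are hypothetical; HGKSum's actual layer design and loss weighting are not given in this section.

```python
import math
import random

def matvec(W, x):
    """Matrix-vector product for list-of-lists matrices."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def elu(x):
    return x if x > 0 else math.exp(x) - 1.0

def gat_sentence_update(h_sent, h_words, W, a):
    """One GAT step: a sentence node attends over the word nodes linked to it."""
    z_s = matvec(W, h_sent)                       # projected sentence feature W h_s
    z_w = [matvec(W, h) for h in h_words]         # projected word features W h_j
    # Unnormalized scores e_j = LeakyReLU(a^T [W h_s || W h_j]).
    e = [leaky_relu(sum(ai * v for ai, v in zip(a, z_s + z_j))) for z_j in z_w]
    m = max(e)
    exp_e = [math.exp(x - m) for x in e]          # numerically stable softmax
    alpha = [x / sum(exp_e) for x in exp_e]
    agg = [sum(alpha[j] * z_w[j][k] for j in range(len(z_w))) for k in range(len(z_s))]
    return [elu(x) for x in agg], alpha

def bce(p, y, eps=1e-9):
    """Mean binary cross-entropy between predicted probabilities p and 0/1 labels y."""
    total = 0.0
    for pi, yi in zip(p, y):
        pi = min(max(pi, eps), 1.0 - eps)
        total += -(yi * math.log(pi) + (1 - yi) * math.log(1 - pi))
    return total / len(p)

def joint_loss(p_sent, y_sent, p_word, y_word, word_weight):
    """Sentence-extraction loss plus a weighted auxiliary keyword loss."""
    return bce(p_sent, y_sent) + word_weight * bce(p_word, y_word)

rng = random.Random(0)

def rand_vec(n):
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

# 4-dim inputs, 3-dim outputs, one sentence linked to 5 word nodes.
h_new, alpha = gat_sentence_update(rand_vec(4), [rand_vec(4) for _ in range(5)],
                                   [rand_vec(4) for _ in range(3)], rand_vec(6))
loss = joint_loss([0.9, 0.2, 0.7], [1, 0, 1], [0.6, 0.1], [1, 0], word_weight=0.5)
```

Setting `word_weight` to 0 recovers single-task sentence extraction, which is the natural baseline when checking whether the auxiliary keyword task helps.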

1. Related Work

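Because the proposed model incorporates keyword information into summarization, the related work covers both text summarization and keyword extraction.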

1.1. Text Summarization

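Ref. [2] pioneered the systematic study of text summarization, proposing that, once stop words are excluded, the more high-frequency words a sentence contains, the more likely it is to belong to the summary. Since then, many more summarization methods have been proposed, falling mainly into extractive and abstractive approaches. The mainstream framework for abstractive summarization is currently the sequence-to-sequence (Seq2Seq) model; for example, Ref. [3] introduced an attention mechanism in the encoder and used beam search in the decoder to generate summaries. Since the proposed model is extractive, the rest of this section focuses on extractive summarization methods.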
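The frequency criterion attributed to Ref. [2] can be sketched as follows. This is a simplified version under stated assumptions: a tiny illustrative stop-word list, "high-frequency" taken to mean the `top_k` most frequent content words, and a sentence score equal to the number of such words it contains. Luhn's original method additionally groups significant words into clusters within a sentence, which is omitted here. All function names are hypothetical.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption; real systems use a fuller list).
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "with"}

def content_words(sentence):
    """Lower-cased word tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOP_WORDS]

def frequency_scores(sentences, top_k=3):
    """Score each sentence by how many of the top_k most frequent content words it contains."""
    freq = Counter(w for s in sentences for w in content_words(s))
    significant = {w for w, _ in freq.most_common(top_k)}
    return [sum(w in significant for w in content_words(s)) for s in sentences]

def top_sentences(sentences, n=1, top_k=3):
    """Indices of the n highest-scoring sentences, returned in document order."""
    scores = frequency_scores(sentences, top_k)
    best = sorted(range(len(sentences)), key=lambda i: -scores[i])[:n]
    return sorted(best)

doc = [
    "The graph model ranks sentences.",
    "A graph of sentences and words helps the model.",
    "Weather was sunny.",
]
summary_ids = top_sentences(doc, n=2)
```

The two sentences built around the recurring words `graph`, `model` and `sentences` are selected, while the off-topic third sentence scores zero.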

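Extractive summarization methods can be roughly divided into supervised and unsupervised methods: supervised methods require labeled data, whereas unsupervised methods do not. Traditional statistics-based methods and graph-based methods are both unsupervised. Early methods were mostly statistics-based; Ref. [4] added cue words, title words and sentence position to keyword frequency when computing sentence weights. Because graphs represent structural information well, graph-based methods also attracted researchers' attention early on. Ref. [5] built a sentence graph weighted by the word overlap between sentences (normalized by sentence length) and ranked its nodes following the idea of PageRank [6], proposing the TextRank algorithm. Ref. [7] used semantic role information to improve graph-based ranking for multi-document summarization. Ref. [8] introduced large-scale pre-trained language models to capture semantic information and, using sentence position, converted the traditional undirected graph into a directed graph for summarization.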
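TextRank-style sentence ranking can be sketched in a few lines. The edge weight follows the TextRank overlap measure, the number of shared words divided by log|S_i| + log|S_j|; as assumptions of this sketch, the update uses the normalized PageRank form (scores sum to 1), spreads the mass of isolated sentences uniformly, and runs a fixed number of iterations with damping factor 0.85. Function names are hypothetical.

```python
import math
import re

def tokenize(sentence):
    """Lower-cased alphabetic tokens (no stop-word removal in this sketch)."""
    return re.findall(r"[a-z]+", sentence.lower())

def textrank_similarity(a, b):
    """TextRank overlap: |shared words| / (log|S_i| + log|S_j|)."""
    denom = (math.log(len(a)) + math.log(len(b))) if a and b else 0.0
    return len(set(a) & set(b)) / denom if denom > 0 else 0.0

def textrank(sentences, d=0.85, iters=50):
    """Weighted PageRank over the sentence similarity graph; one score per sentence."""
    toks = [tokenize(s) for s in sentences]
    n = len(toks)
    # Symmetric weighted adjacency matrix without self-loops.
    w = [[textrank_similarity(toks[i], toks[j]) if i != j else 0.0 for j in range(n)]
         for i in range(n)]
    out = [sum(row) for row in w]  # total edge weight of each node
    scores = [1.0 / n] * n
    for _ in range(iters):
        # Mass held by isolated ("dangling") sentences is spread uniformly.
        dangling = sum(scores[j] for j in range(n) if out[j] == 0)
        scores = [
            (1 - d) / n
            + d * (sum(w[j][i] / out[j] * scores[j] for j in range(n) if out[j] > 0)
                   + dangling / n)
            for i in range(n)
        ]
    return scores

sentences = [
    "Graph models capture sentence relations.",
    "Graph attention learns sentence features.",
    "Keywords guide sentence selection in graph models.",
    "The weather is sunny today.",
]
scores = textrank(sentences)
ranked = sorted(range(len(sentences)), key=lambda i: -scores[i])
```

Taking the first few indices of `ranked` and restoring document order gives an extractive summary; the off-topic fourth sentence shares no word with the others and receives the lowest score.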

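Machine-learning-based methods are usually supervised, and among them deep-learning-based methods have become a new research focus. Ref. [9] proposed SummaRuNNer, a two-layer bidirectional gated recurrent unit (GRU) sequence model that extracts summaries by sequential classification: each sentence is visited in its original order and the model decides whether to include it in the summary. Ref. [10] studied reinforcement learning for extractive summarization and proposed a method based on a deep Q-network (DQN) that uses Q-values to model the salience and redundancy of sentences. Ref. [11] addressed the summarization of case-related public opinion with a graph convolutional network (GCN). In addition, methods based on pre-trained models such as BERT [12] have been widely studied and achieve good results. MatchSum [13] treats extractive summarization as a semantic matching problem: it trains a Siamese BERT network at the summary level and selects the best candidate summary as the final summary. BertSum [14] fine-tunes BERT by modifying its input and stacking different classifiers on top of it to produce the summary.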
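The summary-level matching idea behind MatchSum, scoring whole candidate summaries against the document instead of scoring sentences one by one, can be illustrated with a toy sketch. As loudly stated assumptions, bag-of-words count vectors with cosine similarity stand in for the Siamese BERT encoder, and every k-sentence combination is enumerated, which is only feasible for tiny documents. The function names are hypothetical and this is not MatchSum's implementation.

```python
import math
from collections import Counter
from itertools import combinations

def bag_of_words(text):
    """Word-count vector of a whitespace-tokenized, lower-cased text."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[w] for w, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def best_candidate(sentences, k=2):
    """Pick the k-sentence combination whose representation best matches the document."""
    doc = bag_of_words(" ".join(sentences))

    def score(idx):
        return cosine(bag_of_words(" ".join(sentences[i] for i in idx)), doc)

    return max(combinations(range(len(sentences)), k), key=score)

sentences = ["graph neural model", "graph neural network", "weather report"]
choice = best_candidate(sentences, k=2)
```

In this example the pair (0, 1) matches the document best (cosine of about 0.913 versus about 0.904 for either pair containing the third sentence), because together the first two sentences cover the document's dominant words.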

1.2. Keyword Extraction

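As the smallest units for understanding text content, keywords capture the topical and essential content of a text and have received wide attention. Ref. [15] proposed a single-document keyphrase extraction algorithm based on sentence clustering and latent Dirichlet allocation (LDA). The algorithm clusters sentences by the cosine similarity between sentence vectors to highlight semantically related parts of the text, analyzes the sentence clusters that reflect the document's topics to obtain its main topics, and extracts the most important words of these topics as keywords. The YAKE algorithm [16] extracts keywords from statistical features of the document. It considers several properties of each word, such as casing, position, frequency, similarity to stop words and the number of sentences containing the word, combines them in a unified scoring formula, and finally produces keywords with low redundancy. Ref. [17] introduced Rényi entropy to assess word importance and extract keywords.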
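For reference, the Rényi entropy of order α of a discrete distribution P = (p_1, ..., p_n) is defined below; as α → 1 it reduces to the Shannon entropy H(P) = -Σ p_i log p_i. How Ref. [17] obtains a distribution for each word and converts the entropy into a keyword score is not described in this section, so only the general definition is given.

$$
H_\alpha(P) = \frac{1}{1-\alpha}\,\log \sum_{i=1}^{n} p_i^{\alpha}, \qquad \alpha \ge 0,\ \alpha \ne 1
$$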

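As in extractive summarization, graph-based algorithms are an important class of keyword extraction methods. The sCAKE algorithm [18] focuses on the co-occurrence of words in adjacent sentences, derives semantic relations between words through truss decomposition of the graph, and uses these relations to extract keywords. Ref. [19] applied graph degeneracy techniques such as k-core, k-truss and k-shell to shrink the word co-occurrence network and extracted the most influential nodes as document keywords. The FLAKE algorithm [20] uses fuzzy logic to build a fuzzy graph of the document from the positions and frequencies of its words, and selects keywords according to node centrality in the fuzzy graph. The MGRank algorithm [21] models a document as a multigraph, allowing more than one edge between two nodes, which improves the quality of the extracted keywords.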
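The k-core variant of graph-degeneracy keyword extraction [19] can be sketched as follows. The simplifications are assumptions of this sketch: the graph-of-words is unweighted, words are linked when they occur within a sliding window of `window` consecutive tokens after stop-word removal, and the keywords are the words of the main (highest-order) core. Function names and the stop-word list are hypothetical.

```python
import re

STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "with"}  # illustrative only

def graph_of_words(text, window=2):
    """Undirected co-occurrence graph: words within `window` consecutive tokens are linked."""
    tokens = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]
    adj = {}
    for i, w in enumerate(tokens):
        adj.setdefault(w, set())
        for u in tokens[i + 1:i + window]:
            if u != w:
                adj[w].add(u)
                adj.setdefault(u, set()).add(w)
    return adj

def core_numbers(adj):
    """k-core decomposition by repeatedly peeling a minimum-degree node."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core, k = {}, 0
    while remaining:
        v = min(remaining, key=lambda u: (degree[u], u))
        k = max(k, degree[v])  # the current core order never decreases while peeling
        core[v] = k
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                degree[u] -= 1
    return core

def main_core_keywords(text, window=2):
    """Words in the main core of the graph-of-words, sorted alphabetically."""
    core = core_numbers(graph_of_words(text, window))
    if not core:
        return []
    k_max = max(core.values())
    return sorted(w for w, k in core.items() if k == k_max)

keywords = main_core_keywords("graph attention network graph attention summary")
```

In this example `graph`, `attention` and `network` form a triangle (the 2-core), while `summary` hangs off a single edge and is peeled away, so `keywords` is `["attention", "graph", "network"]`.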

4. Conclusion

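This paper proposed an extractive text summarization model based on a heterogeneous graph and keywords. The heterogeneous graph captures complex relations between sentences, and keyword information, introduced through multi-task learning, effectively guides summary extraction. Experiments on a benchmark dataset show that the proposed model outperforms summarization models that do not rely on pre-trained models. To obtain better text representations, future work will explore applying pre-trained models within this model.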

References

[1]	LI J P, ZHANG C, CHEN X J, et al. Survey on automatic text summarization[J]. Journal of Computer Research and Development, 2021, 58(1): 1-21. (in Chinese)
[2]	LUHN H P. The automatic creation of literature abstracts[J]. IBM Journal of Research and Development, 1958, 2(2): 159-165.
[3]	RUSH A M, CHOPRA S, WESTON J. A neural attention model for abstractive sentence summarization[C]//Proc of the 2015 Conf on Empirical Methods in Natural Language Processing. Stroudsburg: ACL, 2015: 379-389.
[4]	EDMUNDSON H P. New methods in automatic extracting[J]. Journal of the ACM, 1969, 16(2): 264-285.
[5]	MIHALCEA R, TARAU P. TextRank: Bringing order into text[C]//Proc of the 2004 Conf on Empirical Methods in Natural Language Processing. Stroudsburg: ACL, 2004: 404-411.
[6]	PAGE L, BRIN S, MOTWANI R, et al. The PageRank citation ranking: Bringing order to the web[R]. Palo Alto: Stanford InfoLab, 1999.
[7]	YAN S, WAN X J. SRRank: Leveraging semantic roles for extractive multi-document summarization[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2014, 22(12): 2048-2058.
[8]	ZHENG H, LAPATA M. Sentence centrality revisited for unsupervised summarization[C]//Proc of the 57th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2019: 6236-6247.
[9]	NALLAPATI R, ZHAI F F, ZHOU B W. SummaRuNNer: A recurrent neural network based sequence model for extractive summarization of documents[C]//Proc of the 31st AAAI Conf on Artificial Intelligence. Palo Alto: AAAI, 2017: 3075-3081.
[10]	YAO K C, ZHANG L B, LUO T J, et al. Deep reinforcement learning for extractive document summarization[J]. Neurocomputing, 2018, 284: 52-62.
[11]	HAN P Y, YU Z T, GAO S X, et al. Case-related public opinion summarization method based on graph convolution of sentence association graph with case elements[J]. Journal of Software, 2021, 32(12): 3829-3838. (in Chinese)
[12]	DEVLIN J, CHANG M W, LEE K, et al. BERT: Pre-training of deep bidirectional transformers for language understanding[C]//Proc of the 2019 Conf of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Stroudsburg: ACL, 2019: 4171-4186.
[13]	ZHONG M, LIU P F, CHEN Y R, et al. Extractive summarization as text matching[C]//Proc of the 58th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2020: 6197-6208.
[14]	LIU Y, LAPATA M. Text summarization with pretrained encoders[C]//Proc of the 2019 Conf on Empirical Methods in Natural Language Processing and the 9th Int Joint Conf on Natural Language Processing (EMNLP-IJCNLP). Stroudsburg: ACL, 2019: 3730-3740.
[15]	PASQUIER C. Single document keyphrase extraction using sentence clustering and latent Dirichlet allocation[C]//Proc of the 5th Int Workshop on Semantic Evaluation. Stroudsburg: ACL, 2010: 154-157.
[16]	CAMPOS R, MANGARAVITE V, PASQUALI A, et al. YAKE! Keyword extraction from single documents using multiple local features[J]. Information Sciences, 2020, 509: 257-289.
[17]	SINGHAL A, SHARMA D K. Keyword extraction using Renyi entropy: A statistical and domain independent method[C]//2021 7th Int Conf on Advanced Computing and Communication Systems (ICACCS). Piscataway, NJ: IEEE, 2021: 1970-1975.
[18]	DUARI S, BHATNAGAR V. sCAKE: Semantic connectivity aware keyword extraction[J]. Information Sciences, 2019, 477: 100-117.
[19]	TIXIER A, MALLIAROS F, VAZIRGIANNIS M. A graph degeneracy-based approach to keyword extraction[C]//Proc of the 2016 Conf on Empirical Methods in Natural Language Processing. Stroudsburg: ACL, 2016: 1860-1870.
[20]	JAIN A, MITTAL K, VAISLA K S. FLAKE: Fuzzy graph centrality-based automatic keyword extraction[J]. The Computer Journal, 2022, 65(4): 926-939.
[21]	GOZ F, MUTLU A. MGRank: A keyword extraction system based on multigraph GoW model and novel edge weighting procedure[J]. Knowledge-Based Systems, 2022, 251: 109292.
[22]	PENNINGTON J, SOCHER R, MANNING C D. GloVe: Global vectors for word representation[C]//Proc of the 2014 Conf on Empirical Methods in Natural Language Processing (EMNLP). Stroudsburg: ACL, 2014: 1532-1543.
[23]	LECUN Y, BOTTOU L, BENGIO Y, et al. Gradient-based learning applied to document recognition[J]. Proceedings of the IEEE, 1998, 86(11): 2278-2324.
[24]	HOCHREITER S, SCHMIDHUBER J. Long short-term memory[J]. Neural Computation, 1997, 9(8): 1735-1780.
[25]	VELIČKOVIĆ P, CUCURULL G, CASANOVA A, et al. Graph attention networks[EB/OL]. [2023-01-03]. https://arxiv.org/abs/1710.10903.
[26]	MAAS A L, HANNUN A Y, NG A Y. Rectifier nonlinearities improve neural network acoustic models[C]//Proc of the 30th Int Conf on Machine Learning. Brookline, MA: Microtome Publishing, 2013: 2104-2109.
[27]	BRODY S, ALON U, YAHAV E. How attentive are graph attention networks?[EB/OL]. [2023-01-03]. https://arxiv.org/abs/2105.14491.
[28]	VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]//Proc of the 31st Int Conf on Neural Information Processing Systems. New York: Curran Associates Inc, 2017: 6000-6010.
[29]	HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition[C]//Proc of the IEEE Conf on Computer Vision and Pattern Recognition. Los Alamitos: IEEE Computer Society, 2016: 770-778.
[30]	BA J L, KIROS J R, HINTON G E. Layer normalization[EB/OL]. [2023-01-05]. https://arxiv.org/abs/1607.06450.
[31]	NAIR V, HINTON G E. Rectified linear units improve restricted Boltzmann machines[C]//Proc of the 27th Int Conf on Machine Learning. Madison, WI: Omnipress, 2010: 807-814.
[32]	PAULUS R, XIONG C M, SOCHER R. A deep reinforced model for abstractive summarization[EB/OL]. [2023-01-03]. https://arxiv.org/abs/1705.04304.
[33]	CARBONELL J, GOLDSTEIN J. The use of MMR, diversity-based reranking for reordering documents and producing summaries[C]//Proc of the 21st Annual Int ACM SIGIR Conf on Research and Development in Information Retrieval. New York: ACM, 1998: 335-336.
[34]	HERMANN K M, KOČISKÝ T, GREFENSTETTE E, et al. Teaching machines to read and comprehend[C]//Proc of the 28th Int Conf on Neural Information Processing Systems-Volume 1. Cambridge: MIT Press, 2015: 1693-1701.
[35]	NALLAPATI R, ZHOU B W, SANTOS C, et al. Abstractive text summarization using sequence-to-sequence RNNs and beyond[C]//Proc of the 20th SIGNLL Conf on Computational Natural Language Learning. Stroudsburg: ACL, 2016: 280-290.
[36]	SEE A, LIU P J, MANNING C D. Get to the point: Summarization with pointer-generator networks[C]//Proc of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Stroudsburg: ACL, 2017: 1073-1083.
[37]	LIN C Y, HOVY E. Automatic evaluation of summaries using n-gram co-occurrence statistics[C]//Proc of the 2003 Conf of the North American Chapter of the Association for Computational Linguistics on Human Language Technology. Stroudsburg: ACL, 2003: 71-78.
[38]	KINGMA D P, BA J. Adam: A method for stochastic optimization[EB/OL]. [2023-01-03]. https://arxiv.org/abs/1412.6980.
[39]	NARAYAN S, COHEN S B, LAPATA M. Ranking sentences for extractive summarization with reinforcement learning[C]//Proc of the 2018 Conf of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Stroudsburg: ACL, 2018: 1747-1759.
[40]	DONG Y, SHEN Y K, CRAWFORD E, et al. BanditSum: Extractive summarization as a contextual bandit[C]//Proc of the 2018 Conf on Empirical Methods in Natural Language Processing. Stroudsburg: ACL, 2018: 3739-3748.
[41]	ZHOU Q Y, YANG N, WEI F R, et al. Neural document summarization by jointly learning to score and select sentences[C]//Proc of the 56th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2018: 654-663.
[42]	ZHONG M, LIU P F, WANG D Q, et al. Searching for effective neural extractive summarization: What works and what’s next[C]//Proc of the 57th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2019: 1049-1058.
[43]	WANG D Q, LIU P F, ZHENG Y N, et al. Heterogeneous graph neural networks for extractive document summarization[C]//Proc of the 58th Annual Meeting of the Association for Computational Linguistics. Stroudsburg: ACL, 2020: 6209-6219.

Metric	γ=0.0	γ=0.1	γ=0.2	γ=0.3	γ=0.4	γ=0.5	γ=0.6	γ=0.7	γ=0.8	γ=0.9	γ=1.0
R-1	41.26	42.08	41.91	42.41	42.54	42.59	42.71	42.75	42.78	42.67	42.60
R-2	18.33	19.06	18.98	19.27	19.43	19.39	19.58	19.64	19.62	19.40	19.52
R-L	37.57	38.39	38.31	38.73	38.86	38.87	39.03	39.13	39.12	39.03	38.92

Metric	λ=0.5	λ=0.6	λ=0.7	λ=0.8	λ=0.9	λ=1.0
R-1	42.76	42.75	42.75	42.74	42.74	42.24
R-2	19.58	19.55	19.64	19.57	19.59	19.15
R-L	39.10	39.08	39.13	39.03	39.13	38.52

Model	R-1	R-2	R-L
ORACLE	52.59	31.24	48.87
LEAD-3	40.34	17.70	36.57
REFRESH	40.00	18.20	36.60
BanditSum	41.50	18.70	37.60
NeuSUM	41.59	19.01	37.98
PNBERT	42.69	19.60	38.85
MatchSum	44.41	20.86	40.55
HSG	42.03	19.44	38.51
HGKSum	42.75	19.64	39.13

Model	R-1	R-2	R-L
HGKSum	42.75	19.64	39.13
M1	42.12	19.04	38.43
M2	42.60	19.52	38.92
M3	41.26	18.33	37.57
M4	42.51	19.40	38.81
M5	42.24	19.15	38.52
M6	41.65	19.08	38.07
M7	42.30	19.20	38.58
M8	42.46	19.35	38.74

Corresponding author: CHEN Bin, bchen63@163.com
