Research on Importance Evaluation of News Based on Nodal Centralities of Complex Network

CAO Kai-chen; CHEN Ming-ren; ZHANG Qian-ming; CAI Shi-min; ZHOU Tao

doi:10.12178/1001-0548.2020355

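1. Southwest Institute of Electronic Technology, Chengdu 610036
2. Big Data Research Center, University of Electronic Science and Technology of China, Chengdu 611731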

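Funding: National Natural Science Foundation of China (61703074, 11975071)

Author biography: CAO Kai-chen (1984−), male, senior engineer; his research focuses on information management systems

Corresponding author: CAI Shi-min, E-mail: shimincai@uestc.edu.cn

CLC number: TP391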

Abstract: Evaluating the importance of news in authoritative newspapers is of great significance for correctly understanding changes in national policy. Taking People's Daily as an example, this paper extracts news published between 1946 and 2008 and constructs news networks from content similarity. From the perspective of complex networks, the more similar a news item is to other news items, the more tightly it is connected in the news network and the larger its nodal centrality. Accordingly, the H-index is introduced into the PageRank ranking algorithm to give the H-PageRank ranking algorithm, whose H-PageRank centrality is used to evaluate news importance. In the experiments, the news is divided into four eras to account for differences in the style and page layout of People's Daily under different leadership cores, and a news network is built for each era based on representation learning. The results show that: 1) the topologies of all four news networks exhibit high clustering and assortativity together with approximately power-law degree distributions, which are general properties of complex networks; 2) when the nodes of each news network are ranked globally by several nodal centrality indices, with appearance on the front page as the criterion of importance, the resulting AUC values are similar across indices, whereas the precision, recall and F1-score obtained from the Top-N evaluation method based on local ranking show that H-PageRank centrality significantly outperforms the centralities of the other algorithms, which validates the effectiveness of the H-PageRank ranking algorithm; 3) for each news network, the precision of the centrality-based Top-N evaluation method is significantly higher than the theoretical baseline under different ranking list lengths, which indicates the robustness of the evaluation method.
Keywords:
- H-PageRank ranking algorithm
- importance evaluation of news
- news network
- nodal centrality
- representation learning

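Fig. 1 Workflow of the news importance evaluation method

Fig. 2 Degree distributions of the news networks

Fig. 4 Performance comparison of the Top-N evaluation method based on seven centralities under different local ranking list lengths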

Table 1 Attributes of the sub-datasets for each era

Period	Sub-dataset	News items	Front-page news items	Front-page share/%	Non-front-page news items	Non-front-page share/%
1946.05.01−1977.07.15	Stage1	445 749	77 021	17.28	368 728	82.72
1977.07.16−1987.06.24	Stage2	237 351	30 542	12.87	206 809	87.13
1987.06.25−2002.11.14	Stage3	476 690	48 142	10.10	428 548	89.90
2002.11.15−2008.12.31	Stage4	207 400	15 697	7.57	191 703	92.43

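Table 2 Topological properties of the news network for each era

Network	Nodes	Edges	Connected components	Largest connected component (share of nodes)	Average degree	Clustering coefficient	Assortativity coefficient
Stage1	136 311	1 436 869	15 259	86 430 (63.4%)	21.08	0.40	0.72
Stage2	86 099	247 532	12 115	48 664 (56.5%)	5.75	0.39	0.82
Stage3	215 671	662 558	23 060	153 080 (71.0%)	6.14	0.41	0.84
Stage4	69 989	172 406	10 719	39 584 (56.6%)	4.93	0.39	0.72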

[1]	WANG Xiao-dong, ZHANG Wei, QIAN Yi-bin. Go hand in hand with the party and the people: Written on the occasion of the 70th anniversary of the People's Daily[J]. News Front, 2018(11): 2-7.
[2]	ZHONG Wei-feng, CHAN J T. Reading China: Predicting policy change with machine learning[M]. [S.l.]: American Enterprise Institute, 2018.
[3]	ZHOU Tao, BAI Wen-jie, WANG Bing-hong, et al. A brief review of complex networks[J]. Physics, 2005, 34(1): 31-36. doi: 10.3321/j.issn:0379-4148.2005.01.007
[4]	ZHOU Tao, ZHANG Zi-ke, CHEN Guan-rong, et al. The opportunities and challenges of complex networks research[J]. Journal of University of Electronic Science and Technology of China, 2014, 43(1): 1-5. doi: 10.3969/j.issn.1001-0548.2014.01.001
[5]	WANG R, LIN P, LIU M, et al. Hierarchical connectome modes and critical state jointly maximize human brain functional diversity[J]. Physical Review Letters, 2019, 123(3): 038301. doi: 10.1103/PhysRevLett.123.038301
[6]	AKSOY S G, PURVINE E, COTILLA-SANCHEZ E, et al. A generative graph model for electrical infrastructure networks[J]. Journal of Complex Networks, 2019, 7(1): 128-162. doi: 10.1093/comnet/cny016
[7]	SCOTT J. Social network analysis[J]. Sociology, 1988, 22(1): 109-127. doi: 10.1177/0038038588022001007
[8]	BOVET A, MAKSE H A. Influence of fake news in Twitter during the 2016 US presidential election[J]. Nature Communications, 2020, 36(12). doi: 10.1038/s41467-020-20274-1
[9]	GU Yi-ran, ZHU Zi-yan. Node ranking in complex networks based on LeaderRank and modes similarity[J]. Journal of University of Electronic Science and Technology of China, 2017, 46(2): 441-448. doi: 10.3969/j.issn.1001-0548.2017.02.020
[10]	HU Xiao-jun, GUO Qiang, YANG Kai, et al. Multi-attribute researcher academic influence ranking based on relative entropy[J]. Journal of University of Electronic Science and Technology of China, 2018, 47(2): 279-285. doi: 10.3969/j.issn.1001-0548.2018.02.019
[11]	GU Yi-ran, XU Meng-xin. Keyword extraction from news articles based on PageRank algorithm[J]. Journal of University of Electronic Science and Technology of China, 2017, 46(5): 777-783. doi: 10.3969/j.issn.1001-0548.2017.05.021
[12]	LÜ L, CHEN D, REN X L, et al. Vital nodes identification in complex networks[J]. Physics Reports, 2016, 650: 1-64. doi: 10.1016/j.physrep.2016.06.007
[13]	ZHU Jun-fang, CHEN Duan-bing, ZHOU Tao, et al. A survey on mining relatively important nodes in network science[J]. Journal of University of Electronic Science and Technology of China, 2019, 48(4): 595-603. doi: 10.3969/j.issn.1001-0548.2019.04.018
[14]	PAGE L, BRIN S, MOTWANI R, et al. The PageRank citation ranking: Bringing order to the web[R]. [S.l.]: Stanford InfoLab, 1999.
[15]	KIM S J, LEE S H. An improved computation of the PageRank algorithm[C]//European Conference on Information Retrieval. [S.l.]: Springer, 2002: 73-85.
[16]	SONG Ju-ping, WANG Yong-cheng, YIN Zhong-hang, et al. Improved PageRank algorithm for ordering web pages[J]. Journal of Shanghai Jiaotong University, 2003, 37(3): 397-400. doi: 10.3321/j.issn:1006-2467.2003.03.024
[17]	HIRSCH J E. An index to quantify an individual's scientific research output[J]. Proceedings of the National Academy of Sciences USA, 2005, 102(46): 16569-16572. doi: 10.1073/pnas.0507655102
[18]	LÜ L, ZHOU T, ZHANG Q M, et al. The H-index of a network node and its relation to degree and coreness[J]. Nature Communications, 2016, 7: 10168. doi: 10.1038/ncomms10168
[19]	FAN Tian-long, LÜ Lin-yuan. The study of the essence of H-index and its derivative indices[J]. Journal of University of Electronic Science and Technology of China, 2019, 48(1): 142-149. doi: 10.3969/j.issn.1001-0548.2019.01.021
[20]	LEVY O, GOLDBERG Y. Neural word embedding as implicit matrix factorization[C]//Proceedings of the 27th International Conference on Neural Information Processing Systems. [S.l.]: ACM, 2014: 2177-2185.
[21]	MIKOLOV T, SUTSKEVER I, CHEN K, et al. Distributed representations of words and phrases and their compositionality[C]//Proceedings of the 26th International Conference on Neural Information Processing Systems. [S.l.]: ACM, 2013: 3111-3119.
[22]	DATAR M, IMMORLICA N, INDYK P, et al. Locality-sensitive hashing scheme based on p-stable distributions[C]//Proceedings of the Twentieth Annual Symposium on Computational Geometry. [S.l.]: ACM, 2004: 253-262.
[23]	CHEN Ling-jiao, CAI Shi-min, ZHANG Qian-ming, et al. Improved research on resource-allocation recommendation algorithm based on trust relationship[J]. Journal of University of Electronic Science and Technology of China, 2019, 48(3): 449-455. doi: 10.3969/j.issn.1001-0548.2019.03.022
[24]	LÜ L, ZHANG Y C, YEUNG C H, et al. Leaders in social networks: The delicious case[J]. PLoS ONE, 2011, 6(6): e21202.
[25]	MACKENZIE K D. Structural centrality in communications networks[J]. Psychometrika, 1966, 31(1): 17-25. doi: 10.1007/BF02289453
[26]	BAVELAS A. Communication patterns in task-oriented groups[J]. The Journal of the Acoustical Society of America, 1950, 22(6): 725-730. doi: 10.1121/1.1906679
[27]	ZAKI M J, MEIRA W Jr. Data mining and analysis: Fundamental concepts and algorithms[M]. Cambridge: Cambridge University Press, 2014.
[28]	ZHOU T, REN J, MEDO M, et al. Bipartite network projection and personal recommendation[J]. Physical Review E, 2007, 76: 046115.

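The front-page news of national (or authoritative) newspapers usually reports important information related to national political and economic policy. Correctly evaluating the importance of news in national newspapers is therefore of great significance for understanding changes in national policy. For example, People's Daily is the official newspaper of the Communist Party of China (CPC) and a publicity organ through which the CPC Central Committee expresses its views to the outside world. The front-page news of People's Daily (page 01, top news) has guided China's political and economic policy in different periods and has even changed the course of China's development^[1]. Besides providing direct information such as the CPC's policies and views, People's Daily publishes editorials that reflect the CPC Central Committee's position on events, and these are regarded externally as one of the important channels for analyzing changes in China's political and economic policy^[2].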

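Many complex systems in nature can be described by complex networks^[3]. A typical complex network consists of many nodes and the edges between them, where nodes represent the entities of a real system and edges represent the relationships between entities^[4]. For example, the brain's nervous system can be viewed as a neural network formed by a large number of nerve cells (or functional units) connected by nerve fibers^[5]; a power system can be viewed as a power network formed by transformer equipment connected by cables^[6]; and a social system can be viewed as a social network formed by people connected through their interactions^[7]. Similarly, a news medium (such as People's Daily) can form a news network through the content similarity of its news items (articles)^[8].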
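To make the idea of a similarity-based news network concrete, the following is a minimal sketch, not the authors' pipeline. It assumes each news item has already been mapped to a numeric vector and links two items when their cosine similarity reaches a threshold. The names `cosine`, `build_similarity_network`, `toy_vectors`, the threshold of 0.8 and the brute-force pairwise loop are all hypothetical choices made for illustration.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def build_similarity_network(vectors, threshold):
    """Return an undirected adjacency dict: news id -> set of linked news ids.

    `vectors` maps a news id to its (assumed, precomputed) content vector.
    Two news items are linked when their cosine similarity is >= threshold.
    """
    adjacency = {node: set() for node in vectors}
    for a, b in combinations(vectors, 2):
        if cosine(vectors[a], vectors[b]) >= threshold:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


# Hypothetical toy vectors: n1 and n2 are similar, n3 is unrelated.
toy_vectors = {
    "n1": [1.0, 0.0],
    "n2": [0.9, 0.1],
    "n3": [0.0, 1.0],
}
network = build_similarity_network(toy_vectors, threshold=0.8)
```

Comparing every pair costs time quadratic in the number of news items, so a corpus of the size in Table 1 would need an approximate nearest-neighbour search rather than this loop.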

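In complex network research, node importance can be used to identify opinion leaders in social networks^[9], to measure the academic influence of researchers^[10], and to extract news keywords^[11], among other tasks. Nodal centrality quantifies how central a node is and thereby reflects its importance in the network^[12-13]. Through an in-depth analysis of the H-index and the PageRank ranking algorithm, this paper proposes an improved PageRank ranking algorithm, H-PageRank. The H-PageRank ranking algorithm retains the H-index's dependence on large-degree nodes within the local neighborhood and combines it with the global nature of PageRank's random walk, making it a node ranking algorithm that takes both local and global centrality into account.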
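The two ingredients named above can be sketched separately. The code below is an illustration under stated assumptions, not the H-PageRank update rule, which is not given in this section. `node_h_index` follows the usual definition of a node's H-index (the largest h such that the node has at least h neighbours of degree at least h), and `pagerank` is a plain power iteration on an undirected graph. The function names, the damping factor of 0.85, the fixed iteration count and the toy graph are hypothetical.

```python
def node_h_index(adjacency, node):
    """Largest h such that `node` has at least h neighbours of degree >= h."""
    degrees = sorted((len(adjacency[nb]) for nb in adjacency[node]), reverse=True)
    h = 0
    for rank, degree in enumerate(degrees, start=1):
        if degree >= rank:
            h = rank
        else:
            break
    return h


def pagerank(adjacency, damping=0.85, iterations=100):
    """Plain PageRank by power iteration on an undirected adjacency dict."""
    n = len(adjacency)
    rank = {node: 1.0 / n for node in adjacency}
    for _ in range(iterations):
        # Score held by isolated nodes is redistributed uniformly.
        dangling = sum(rank[v] for v in adjacency if not adjacency[v])
        new_rank = {}
        for node in adjacency:
            # Each neighbour passes on an equal share of its current score.
            incoming = sum(rank[nb] / len(adjacency[nb]) for nb in adjacency[node])
            new_rank[node] = (1.0 - damping) / n + damping * (incoming + dangling / n)
        rank = new_rank
    return rank


# Hypothetical toy graph: hub "c" linked to "a", "b", "d", plus an edge a-b.
toy_graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
h_values = {node: node_h_index(toy_graph, node) for node in toy_graph}
pr_values = pagerank(toy_graph)
```

In this toy graph the H-index cannot separate the hub `c` from `a` and `b`, while PageRank ranks `c` first, which illustrates why a local index and a global one can carry different information.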

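Taking People's Daily as an example, this paper extracts news published between 1946 and 2008 and, using the similarity of news content, constructs news networks based on representation learning. On this basis, the H-PageRank ranking algorithm is used to measure node importance in the news networks. With appearance on the front page as the criterion of news importance, H-PageRank centrality evaluates news importance more accurately than other nodal centrality indices, which validates the effectiveness of the H-PageRank ranking algorithm.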
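The evaluation criterion can be illustrated with a short sketch: given a centrality score per news item and a front-page label, compute a global AUC and Top-N precision, recall and F1. This assumes the standard definitions of these metrics, which may differ in detail from the paper's exact procedure. The function names and the toy scores and labels are hypothetical and are not results from the paper.

```python
def auc(scores, labels):
    """Probability that a random front-page item outscores a random
    non-front-page item, counting ties as one half."""
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative item")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def top_n_metrics(scores, labels, n):
    """Precision, recall and F1 when the n highest-scoring items are
    predicted to be front-page news."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = sum(1 for i in order[:n] if labels[i])
    total_positive = sum(1 for y in labels if y)
    precision = hits / n
    recall = hits / total_positive if total_positive else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical centrality scores and front-page labels for five news items.
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.1]
toy_labels = [True, False, True, False, False]

toy_auc = auc(toy_scores, toy_labels)
toy_top2 = top_n_metrics(toy_scores, toy_labels, n=2)
```

AUC summarizes the whole ranking, whereas the Top-N metrics look only at the head of the list, so two centralities with similar AUC can still differ in Top-N precision.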