基于共同邻居有效性的复杂网络链路预测算法

Kai WANG; Shu-xin LIU; Hong-tao YU; Xing LI

doi:10.3969/j.issn.1001-0548.2019.03.020

基于共同邻居有效性的复杂网络链路预测算法

doi: 10.3969/j.issn.1001-0548.2019.03.020

国家数字交换系统工程技术研究中心郑州 450002

基金项目: 国家自然科学基金创新研究群体项目（61521003）

详细信息

作者简介:
王凯(1980-), 男, 博士生, 副研究员, 主要从事社会网络分析方面的研究

中图分类号: TP391;N94

Predicting Missing Links of Complex Network via Effective Common Neighbors

National Digital Switching System Engineering and Technological R & D Center Zhengzhou 450002

More Information

Corresponding author: 刘树新, E-mail:liushuxin11@126.com

摘要: 链路预测旨在预测网络中的缺失连边，对于实际网络演化机制的了解具有重要意义。虽然现有研究已经提出了很多相似性指标，但它们都忽视了不同网络结构下共同邻居的有效性，而局部拓扑结构信息尤其是共同邻居结构在计算节点间相似性中发挥重要作用。考虑到共同邻居周围局部拓扑信息，该文提出了一种高效共同邻居指标。该指标首先分析了共同邻居所有连边的有效性，分别从端点两侧量化了节点的有效性；然后，通过分析共同邻居节点拓扑有效性对两侧资源分配过程的影响刻画节点间相似性。15个实际网络数据实验表明，相比现有经典的9种方法，所提方法具有较高的预测精度。
- complex network /
- effective common neighbors /
- link prediction /
- network topology /
- similarity
Abstract: Link prediction can predict the missing links of complex networks, which promotes a better understanding of evolution mechanisms in real networks. Many similarity indices have been proposed based on a topology structure for link prediction. Local topological information, especially common neighbors, plays an important role in calculating the similarity between two endpoints. However, plenty of similarity indices ignore the effectiveness of common neighbors under different topology structures. Considering the local topological information around common neighbors, an effective common neighbor index is proposed. Firstly, we analyze the effectiveness of all neighbor links of common neighbors. Then, based on the local topology on both sides of two endpoints around common neighbors, the effectiveness of two sides of common neighbors is quantified separately. Finally, the similarity between two endpoints is described through the effect of common neighbors' effectiveness on bilateral resource allocation process. Empirical study on 15 real networks shows that the index proposed can achieve higher prediction accuracy, compared with 9 mainstream baselines.
- complex network /
- effective common neighbors /
- link prediction /
- network topology /
- similarity

Figure 1. Effectiveness of common neighbors between two endpoints.

下载: 全尺寸图片幻灯片

Figure 2. Precision curves of ECN index with the change of parameter for 15 real networks. Each precision point is the average of 20 realizations

下载: 全尺寸图片幻灯片

Table 1. Basic topological features of real networks

Data	$\left\| V \right\|$	$\left\| E \right\|$	$\langle k\rangle $	$\langle d\rangle $	r	C	H
Jazz	198	2 742	27.7	2.24	0.02	0.633	1.4
CE	297	2 148	14.46	2.46	-0.163	0.308	1.8
Kohonen	3 704	12 683	3.4	5.64	-0.121	0.252	11.2
USAir	332	2 128	12.81	2.74	-0.208	0.749	3.36
Hamster	1 858	12 534	13.49	3.39	-0.085	0.090	3.36
Metabolic	453	2 025	8.94	2.67	-0.226	0.647	4.49
Email	167	5 784	69.26	1.87	-0.295	0.541	1.66
Yeast	2 375	11 693	9.85	5.10	0.469	0.378	3.48
PB	1 222	16 717	27.36	2.74	-0.221	0.361	2.97
Flight	2 939	30 501	20.75	4.18	0.051	0.255	5.22
SciMet	3 084	10 399	6.74	4.18	-0.033	0.151	2.78
FWEW	69	880	25.51	1.64	-0.298	0.552	1.27
Router	5 022	6 258	2.49	5.99	-0.138	0.033	5.5
Power	4 941	6 594	2.67	18.9	0.003	0.107	1.45
FWFB	128	2 075	32.42	1.78	-0.112	0.335	1.24

下载: 导出CSV

Table 2. Comparison of the precision value between ECN (optimal value) and some similarity indices. Each value is the average of 20 realizations, each of which corresponds to an independent division of ${E^T}$and ${E^P}$

Data	CN	RA	AA	CAR	LP^a	LP^b	Katz^a	Katz^b	ACT	Cos+	SPM	ECN
Jazz	0.810	0.809	0.830	0.851	0.798	0.775	0.798	0.740	0.258	0.356	0.895	0.861
CE	0.129	0.131	0.137	0.129	0.138	0.138	0.138	0.138	0.064	0.069	0.207	0.210
Kohonen	0.157	0.145	0.154	0.211	0.159	0.159	0.159	0.155	0.010	0.020	0.089	0.304
USAir	0.611	0.636	0.632	0.612	0.615	0.612	0.615	0.604	0.492	0.104	0.589	0.676
Hamster	0.015	0.004	0.011	0.037	0.017	0.054	0.017	0.075	0.089	0.020	0.898	0.186
Metabolic	0.206	0.305	0.253	0.195	0.201	0.200	0.201	0.200	0.120	0.110	0.445	0.456
Email	0.724	0.733	0.727	0.691	0.726	0.722	0.726	0.714	0.633	0.637	0.809	0.889
Yeast	0.668	0.495	0.693	0.670	0.671	0.733	0.670	0.721	0.565	0.239	0.699	0.876
PB	0.431	0.253	0.389	0.472	0.436	0.465	0.436	0.467	0.127	0.342	0.669	0.522
Flight	0.511	0.354	0.443	0.631	0.521	0.559	0.522	0.554	0.344	0.030	0.127	0.646
SciMet	0.149	0.130	0.147	0.165	0.149	0.150	0.149	0.147	0.000	0.000	0.079	0.203
FWEW	0.147	0.163	0.154	0.142	0.158	0.189	0.158	0.197	0.271	0.000	0.543	0.387
Router	0.110	0.092	0.113	0.109	0.132	0.132	0.131	0.130	0.000	0.019	0.015	0.306
Power	0.130	0.071	0.097	0.131	0.148	0.148	0.146	0.146	0.000	0.086	0.017	0.178
FWFB	0.090	0.087	0.086	0.089	0.096	0.133	0.096	0.146	0.178	0.030	0.718	0.378
^aParameter $\alpha {\rm{ = 0}}{\rm{.001}}$. ^bParameter$\alpha {\rm{ = 0}}{\rm{.01}}$.

下载: 导出CSV

[1]	BARABÁSI A L. Scale-free networks:A decade and beyond[J]. Science, 2009, 325(5939):412-413. doi: 10.1126/science.1173299
[2]	LI K, LI H J, WANG H. Situation analysis of relationship in social networks based on link entropy[J]. Modern Physics Letters B, 2015, 29(13):1550061. doi: 10.1142/S021798491550061X
[3]	BULLMORE E, SPORNS O. Complex brain networks:Graph theoretical analysis of structural and functional systems[J]. Nature Reviews Neuroscience, 2009, 10(3):186-198. doi: 10.1038/nrn2575
[4]	SCHWEITZER F, FAGIOLO G, SORNETTE D, et al. Economic networks:The new challenges[J]. Science, 2009, 325(5939):422-425. doi: 10.1126/science.1173644
[5]	LIN J, BAN Y. Complex network topology of transportation systems[J]. Transport Reviews, 2013, 33(6):658-685. doi: 10.1080/01441647.2013.848955
[6]	WANG P, XU B W, WU Y R, et al. Link prediction in social networks:The state-of-the-art[J]. Science China Information Sciences, 2015, 58(1):1-38. http://d.old.wanfangdata.com.cn/OAPaper/oai_arXiv.org_1303.7264
[7]	VON M C, JENSEN L J, SNEL B, et al. STRING:Known and predicted protein-protein associations, integrated and transferred across organisms[J]. Nucleic Acids Research, 2005, 33(suppl.1):433-437. http://d.old.wanfangdata.com.cn/Periodical/jsjkxjsxb-e201706006
[8]	SCELLATO S, NOULAS A, MASCOLO C. Exploiting place features in link prediction on location-based social networks[C]//Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. San Diego, CA, USA: ACM, 2011: 1046-1054.
[9]	HOLLAND P W, LASKEY K B, LEINHARDT S. Stochastic blockmodels:First steps[J]. Social Networks, 1983, 5(2):109-137. doi: 10.1016/0378-8733(83)90021-7
[10]	LIU S, XU X F, XIE R G, et al. Anomalous heat conduction and anomalous diffusion in low dimensional nanoscale systems[J]. The European Physical Journal B, 2012, 85(10):337. doi: 10.1140/epjb/e2012-30383-8
[11]	LÜ L, ZHOU T. Link prediction in complex networks:A survey[J]. Physica A:Statistical Mechanics and its Applications, 2011, 390(6):1150-1170. doi: 10.1016/j.physa.2010.11.027
[12]	LORRAIN F, WHITE H C. Structural equivalence of individuals in social networks[J]. Social Networks. 1977(1):67-98.
[13]	ZHOU T, LÜ L, ZHANG Y C. Predicting missing links via local information[J]. The European Physical Journal B, 2009, 71(4):623-630. doi: 10.1140/epjb/e2009-00335-8
[14]	ADAMIC L A, ADAR E. Friends and neighbors on the web[J]. Social Networks, 2003, 25(3):211-230. doi: 10.1016/S0378-8733(03)00009-1
[15]	LÜ L, JIN C H, ZHOU T. Similarity index based on local paths for link prediction of complex networks[J]. Physical Review E, 2009, 80(4):046122. doi: 10.1103/PhysRevE.80.046122
[16]	LIU S, JI X, LIU C, et al. Extended resource allocation index for link prediction of complex network[J]. Physica A:Statistical Mechanics and its Applications, 2017, 479:174-183. doi: 10.1016/j.physa.2017.02.078
[17]	LIU S, JI X, LIU C, et al. Similarity indices based on link weight assignment for link prediction of unweighted complex networks[J]. International Journal of Modern Physics B, 2017, 31(2):1650254. doi: 10.1142/S0217979216502544
[18]	YAO Y, ZHANG R, YANG F, et al. Link prediction based on local weighted paths for complex networks[J]. International Journal of Modern Physics C, 2017, 28(4):1750053. doi: 10.1142/S012918311750053X
[19]	MA Y, CHENG G, LIU Z, et al. Clustering-based link prediction in scientific coauthorship networks[J]. International Journal of Modern Physics C, 2017, 28(06):1750082. doi: 10.1142/S0129183117500826
[20]	FENG X, ZHAO J C, XU K. Link prediction in complex networks:A clustering perspective[J]. The European Physical Journal B, 2012, 85(1):1-3. doi: 10.1140/epjb/e2011-20818-1
[21]	LÜ L, PAN L, ZHOU T, et al. Toward link predictability of complex networks[J]. Proceedings of the National Academy of Sciences, 2015, 112(8):2325-2330. doi: 10.1073/pnas.1424644112
[22]	KATZ L. A new status index derived from sociometric analysis[J]. Psychometrika, 1953, 18(1):39-43. doi: 10.1007/BF02289026
[23]	LEICHT E A, HOLME P, NEWMAN M E J. Vertex similarity in networks[J]. Physical Review E, 2006, 73(2):026120. doi: 10.1103/PhysRevE.73.026120
[24]	KLEIN D J, RANDIĆ M. Resistance distance[J]. Journal of Mathematical Chemistry, 1993, 12(1):81-95. doi: 10.1007/BF01164627
[25]	FOUSS F, PIROTTE A, RENDERS J M, et al. Random-walk computation of similarities between nodes of a graph with application to collaborative recommendation[J]. IEEE Transactions on Knowledge and Data Engineering, 2007, 19(3):355-369. doi: 10.1109/TKDE.2007.46
[26]	CANNISTRACI C V, ALANIS-LOBATO G, RAVASI T. From link-prediction in brain connectomes and protein interactomes to the local-community-paradigm in complex networks[J]. Scientific Reports, 2013, 3:1613-1618. doi: 10.1038/srep01613
[27]	HERLOCKER J L, KONSTAN J A, TERVEEN L G, et al. Evaluating collaborative filtering recommender systems[J]. ACM Transactions on Information Systems (TOIS), 2004, 22(1):5-53. doi: 10.1145/963770
[28]	GLEISER P M, DANON L. Community structure in jazz[J]. Advances in Complex Systems, 2003, 6(4):565-573. doi: 10.1142/S0219525903001067
[29]	WATTS D J, STROGATZ S H. Collective dynamics of 'small-world' networks[J]. Nature, 1998, 393(6684):440-445. doi: 10.1038/30918
[30]	ZHANG Q M, LÜ L, WANG W Q, et al. Potential theory for directed networks[J]. PloS One, 2013, 8(2):e55437. http://d.old.wanfangdata.com.cn/OAPaper/oai_arXiv.org_1202.2709
[31]	BATAGELJ V, MRVAR A. Pajek-program for large network analysis[J]. Connections, 1998, 21(2):47-57. http://cn.bing.com/academic/profile?id=ae97bbc52dc8eb50567885156695e92f&encoded=0&v=paper_preview&mkt=zh-cn
[32]	LÜ L, PAN L, ZHOU T, et al. Toward link predictability of complex networks[J]. Proceedings of the National Academy of Sciences, 2015, 112(8):2325-2330. doi: 10.1073/pnas.1424644112
[33]	DUCH J, ARENAS A. Community detection in complex networks using extremal optimization[J]. Physical Review E, 2005, 72(2):027104. doi: 10.1103/PhysRevE.72.027104
[34]	MICHALSKI R, PALUS S, KAZIENKO P. Matching organizational structure and social network extracted from email communication[C]//International Conference on Business Information Systems. Berlin, Heidelberg: Springer, 2011: 197-206.
[35]	BU D, ZHAO Y, CAI L, et al. Topological structure analysis of the protein-protein interaction network in budding yeast[J]. Nucleic Acids Research, 2003, 31(9):2443-2450. doi: 10.1093/nar/gkg340
[36]	ADAMIC L A, GLANCE N. The political blogosphere and the 2004 US election: Divided they blog[C]//Proceedings of the 3rd International Workshop on Link Discovery. Chicago, USA: ACM, 2005: 36-43.
[37]	OPSAHL T, AGNEESSENS F, SKVORETZ J. Node centrality in weighted networks:Generalizing degree and shortest paths[J]. Social Networks, 2010, 32(3):245-251. doi: 10.1016/j.socnet.2010.03.006
[38]	ULANOWICZ R E, DEANGELIS D L. Network analysis of trophic dynamics in south florida ecosystems[J]. US Geological Survey Program on the South Florida Ecosystem, 2005, 114:45-48.
[39]	SPRING N, MAHAJAN R, WETHERALL D. Measuring ISP topologies with Rocketfuel[J]. ACM SIGCOMM Computer Communication Review, 2002, 32(4):133-145. doi: 10.1145/964725
[40]	NEWMAN M E J. Assortative mixing in networks[J]. Physical Review Letters, 2002, 89(20):208701. doi: 10.1103/PhysRevLett.89.208701

[1]	苏晓萍, 查英华, 曲鸿博. 一种异质图的Lorentz嵌入模型 . 电子科技大学学报, 2023, 52(1): 146-153. doi: 10.12178/1001-0548.2021284
[2]	谢怡燃, 李国华, 杨波. 基于站点线路数的城市公交网络鲁棒性研究 . 电子科技大学学报, 2022, 51(4): 630-640. doi: 10.12178/1001-0548.2021336
[3]	方祺娜, 许小可. 基于异质模体特征的社交网络链路预测 . 电子科技大学学报, 2022, 51(2): 274-281. doi: 10.12178/1001-0548.2021181
[4]	赵娜, 李杰, 王剑, 彭西阳, 景铭, 聂永杰, 郁湧. 基于邻层传播的相对重要节点挖掘方法 . 电子科技大学学报, 2021, 50(1): 121-126. doi: 10.12178/1001-0548.2020283
[5]	王曦, 许爽, 许小可. 融合用户行为同步指标的链路预测研究 . 电子科技大学学报, 2021, 50(2): 276-284. doi: 10.12178/1001-0548.2020241
[6]	李治成, 吉立新, 刘树新, 李星, 李劲松. 基于拓扑有效连通路径的有向网络链路预测方法 . 电子科技大学学报, 2021, 50(1): 127-137. doi: 10.12178/1001-0548.2020220
[7]	李艳丽, 周涛. 链路预测中的局部相似性指标 . 电子科技大学学报, 2021, 50(3): 422-427. doi: 10.12178/1001-0548.2021062
[8]	曹红艳, 许小可, 许爽. 基于多模体特征的科学家合作预测 . 电子科技大学学报, 2020, 49(5): 766-773. doi: 10.12178/1001-0548.2019173
[9]	朱军芳, 陈端兵, 周涛, 张千明, 罗咏劼. 网络科学中相对重要节点挖掘方法综述 . 电子科技大学学报, 2019, 48(4): 595-603. doi: 10.3969/j.issn.1001-0548.2019.04.018
[10]	吴宗柠, 樊瑛. 复杂网络视角下国际贸易研究综述 . 电子科技大学学报, 2018, 47(3): 469-480. doi: 10.3969/j.issn.1001-0548.2018.03.023
[11]	朱为华, 刘凯, 闫小勇, 汪明, 吴金闪. 识别流网络关键节点的虚拟外界投入产出分析法 . 电子科技大学学报, 2018, 47(2): 292-297. doi: 10.3969/j.issn.1001-0548.2018.02.021
[12]	张海霞, 吕振, 张传亭, 袁东风. 一种引入加权异构信息的改进协同过滤推荐算法 . 电子科技大学学报, 2018, 47(1): 112-116, 152. doi: 10.3969/j.issn.1001-0548.2018.01.017
[13]	郭婷婷, 赵承业. 异常链路分析在电力网络恢复中的应用 . 电子科技大学学报, 2016, 45(5): 854-859. doi: 10.3969/j.issn.1001-0548.2016.05.024
[14]	张昌利, 龚建国, 闫茂德. 基于复杂网络的社会化标签语义相似度分析 . 电子科技大学学报, 2012, 41(5): 642-648. doi: 10.3969/j.issn.1001-0548.2012.05.001
[15]	朱大勇, 张新丽, 李树全. 利用局部拓扑信息发现模糊社团结构 . 电子科技大学学报, 2011, 40(1): 73-79. doi: 10.3969/j.issn.1001-0548.2011.01.014
[16]	王文强, 张千明. 链路预测的网络演化模型评价方法 . 电子科技大学学报, 2011, 40(2): 174-179. doi: 10.3969/j.issn.1001-0548.2011.02.003
[17]	吕琳媛. 复杂网络链路预测 . 电子科技大学学报, 2010, 39(5): 651-661. doi: 10.3969/j.issn.1001-0548.2010.05.002
[18]	郑永红, 彭世, 黄小丽, 靳映霞. 流动相似律在气流式倾角传感器中的应用 . 电子科技大学学报, 2007, 36(5): 931-934.
[19]	耿技, 印鉴. 改进的共享型最近邻居聚类算法 . 电子科技大学学报, 2006, 35(1): 70-72.
[20]	杨清夙, 游志胜, 张先玉. 基于豪斯多夫距离的快速多人脸检测算法 . 电子科技大学学报, 2004, 33(4): 407-409.

点击查看大图

图(2) / 表(2)

计量

文章访问数: 4030
HTML全文浏览量: 1357
PDF下载量: 127
被引次数: 0

全文HTML

With the development of complex network research, many more real complex systems have been described as complex network^[1-5]. Link prediction aims to predict the likelihood that a link exists between two endpoints in complex networks^[6]. Predicting missing links plays an important role in discovering the unknown connections in protein-protein interaction networks^[7], forecasting the future relationships in online social networks^[8], and identifying spurious flights in air transportation networks^[9].

Based on evolution mechanisms of complex networks, a lot of link prediction methods have been proposed. Among these methods, topology-based similarity indices have got a great success due to their high performance and low complexity^[10]. Similarity indices can be classified as local indices and global indices according to complexity and length of paths^[11]. Considering the local information including common neighbors and short paths, local indices, such as common neighbor (CN) index^[12]directly using the number of common neighbors, resource allocation (RA) index^[13]and Adamic-Adar (AA) index^[14] weighting the common neighbors by their node degree, local path (LP) index^[15]and extended resource allocation (ERA)^[16]index adding paths with length 3 to CN and RA respectively, perform very well in real networks with lower complexity. Ref.[17] proposed a local weighted method, which can improve the prediction accuracy of several similarity indices (including CN, AA, RA, Salton, Jaccard and LP). Similarly, Ref.[18]proposed a local weighted path (LWP) index to differentiate the contributions between paths. Considering the clustering coefficient of nodes, some clustering-based methods^[19-20]have been proposed to improve the prediction accuracy. Ref.[21]believed that the regularity of a network was reflected in the consistency of structural features before and after a random removal of a small set of links and proposed a universal structural consistency index. To improve the prediction accuracy, many global indices have been proposed based on global paths, including Katz index^[22], LHN-II^[23], average commute time (ACT)^[24] and cosine similarity time (Cos+)^[25]. The global indices can achieve a good performance, but most of them are not suitable for large networks due to their higher complexity. Therefore, how to effectively use the local topological information may be a good way to predict the missing links of large-scale networks.

In the real world, such as human relationship network, if there are more indirect connections between two people and their common friends, they are more likely to be friend. Considering two pairs of endpoints x, y and x', y' with a common neighbor. There are two longer paths between x', y', and two indirect connections between x, y and their common neighbor. According to CN, RA, AA and LP index, the similarity between two pairs of nodes are the same, though the links between endpoints x and y are more likely to exist. Because there are more indirect connections between x, y and their common neighbor, common neighbor of x and y is more effective in similarity calculating. In a word, the topology information related to common neighbors plays an important role in link prediction.

Based on the above discussion, an effective common neighbor index (ECN) is proposed, which weights the common neighbors by the local topology information around them. With a parameter adjusting the strength of effective common neighbors, ECN can achieve the optimal prediction accuracy for different networks. Empirical study has shown that the ECN index proposed can significantly improve the prediction accuracy, compared with 9 similarity indices in 15 real networks.

1. Baselines

Several classical similarity indices are introduced here, including 5 local indices: CN, RA, AA, CAR and LP index, and 4 global indices: Katz, ACT, Cos+ and SPM index. A brief description is shown as follows:

1) CN index^[12]measures the similarity of two endpoints by the number of their common neighbors:

$$s_{xy}^{{\rm{CN}}} = |{\mathit{\Gamma}} (x) \cap {\mathit{\Gamma}} (y)|$$

(1)

where ${\mathit{\Gamma}} (x)$is the set of neighbors of node x, and ${\mathit{\Gamma}} (x) \cap {\mathit{\Gamma}} (y)$denotes the common neighbors between nodes x and y.

2) RA index^[13]publishes the common neighbors with big degree:

$$s_{xy}^{{\rm{RA}}} = \sum\limits_{z \in |{\mathit{\Gamma}} (x) \cap {\mathit{\Gamma}} (y)|} {\frac{1}{{{k_z}}}} $$

(2)

where ${k_z}$ is the node degree of node $z$.

3) AA index^[14]weights the common neighbors by nodes degree:

$$s_{xy}^{{\rm{AA}}} = \sum\limits_{z \in |{\mathit{\Gamma}} (x) \cap {\mathit{\Gamma}} (y)|} {\frac{1}{{\log ({k_z})}}} $$

(3)

4) CAR index^[26]believes that two nodes are more likely to link together if their common-first-neighbors are members of a strongly inner-linked cohort:

$$s_{xy}^{{\rm{CAR}}} = \left| {{\mathit{\Gamma}} (x) \cap {\mathit{\Gamma}} (y)} \right|\sum\limits_{z \in |{\mathit{\Gamma}} (x) \cap {\mathit{\Gamma}} (y)|} {\frac{{\left| {\gamma (z)} \right|}}{2}} $$

(4)

where $\left| {\gamma (z)} \right|$refers to the sub-set of neighbors of $z$ that are also common neighbors of nodes$x$and$y$.

5) LP index^[15]adds the path information with length 3 to CN:

$${\mathit{\boldsymbol{S = }}}{{\mathit{\boldsymbol{A}}}^{\rm{2}}}{\mathit{\boldsymbol{ + }}}\alpha {{\mathit{\boldsymbol{A}}}^{\rm{3}}}$$

(5)

where $\alpha $ is the parameter and A is the adjacency matrix.

6) Katz index^[22]considers all the path information between two endpoints and gives more weights to the shorter paths:

$$s_{xy}^{{\rm{Katz}}} = \sum\limits_{l = 1}^\infty {{\varepsilon ^l}|{\rm{path}}_{xy}^l|} $$

(6)

where ${\rm{path}}_{xy}^l$ is the set of paths with length l between x and y, and $\varepsilon $ is the adjust parameter.

7) ACT^[24]measures similarity by calculating the average steps of random walk between two endpoints:

$$s_{xy}^{{\rm{ACT}}} = \frac{1}{{l_{xx}^ + + l_{yy}^ + - 2l_{xy}^ + }}$$

(7)

where $l_{xy}^ + $ is the corresponding entry in ${{\mathit{\boldsymbol{L}}}^{\mathit{\boldsymbol{ + }}}}$, and ${{\mathit{\boldsymbol{L}}}^{\mathit{\boldsymbol{ + }}}}$ denotes the pseudo-inverse of matrix ${\mathit{\boldsymbol{L}}} = {\mathit{\boldsymbol{D}}} - {\mathit{\boldsymbol{A}}}$.

8) Cos+^[25]is based on ${{\mathit{\boldsymbol{L}}}^{\mathit{\boldsymbol{ + }}}}$ by calculating similarity of two vectors:

$$s_{xy}^{{\rm{Cos + }}} = \frac{{v_x^{\rm{T}}{v_y}}}{{|{v_x}||{v_y}|}} = \frac{{l_{xy}^ + }}{{\sqrt {l_{xx}^ + l_{yy}^ + } }}$$

(8)

9) SPM^[21]measures the structural consistency of a network by perturbing its adjacency matrix and observing the change of eigenvalues provided the fixed eigenvectors:

$$s_{xy}^{{\rm{SPM}}} = \sum\limits_{k = 1}^N {({\lambda _k} + \Delta {\lambda _k})} {x_k}x_k^{\rm{T}}$$

(9)

where ${\lambda _k}$ and ${x_k}$ are the eigenvalue and the corresponding orthogonal and normalized eigenvector for${A^{\rm{R}}}$, respectively.

3. Metric and Datasets

To quantify the prediction accuracy, the standard metric called precision^[27]is introduced. Precision is defined as the ratio of relevant links to the top L predicted links (here we set L=100). If there are m relevant links appeared in the probe set, the precision value is:

$${\rm{Precision}} = \frac{m}{L}$$

(12)

Clearly, the higher precision value means higher prediction accuracy.

Our experiments are performed on fifteen different real networks: 1) Jazz^[28]:the network of Jazz musicians. 2) Caenorhabditis elegans (CE)^[29]: the neural network of the nematode worm. 3) Kohonen^[30]: the citing network of articles with topic "self-organizing maps" or references to "Kohonen T". 4) USAir^[31]: the USA airline network. 5) Hamster^[32]: a friendship network of users on the website hamsterster.com. 6) Metabolic^[33]: the metabolic network of Caenorhabditis elegans. 7) Email network (Email)^[34]: the internal email communication network between employees of a mid-sized manufacturing company. 8) Yeast PPI (Yeast)^[35]: a protein-protein interaction network of yeast. 9) Political blogs (PB)^[36]: a network of US political blogs. 10) Openflights (Flight)^[37]: the network of flights between air-ports of the world. 11) Scientometrics (SciMet)^[30]: the citing network of the articles from scientometrics. 12) Food Web of Everglades ecosystem (FWEW)^[38]: the network of carbon exchanges occurring during the wet season in the Everglades Graminoid Marshes of South Florida. 13) Route topology of internet (Router)^[39]: the topology of internet in router-level. 14) Power Grid (Power)^[29]: an electrical power grid network of western US; 15) Food Web of Florida Bay ecosystem (FWFB)^[38]: the network of carbon exchanges occurring during the wet season in Florida Bay.

Table 1 shows the basic topological features of these networks. $\left| V \right|$is the number of nodes, $\left| E \right|$ denotes the number of links, $\left\langle k \right\rangle $ indicates the average degree. $\left\langle d \right\rangle $denotes the average distance. r is the assortativity coefficient^[40]. C represents the clustering coefficient^[29]. H is the degree heterogeneity.

Table 1. Basic topological features of real networks

Data	$\left\| V \right\|$	$\left\| E \right\|$	$\langle k\rangle $	$\langle d\rangle $	r	C	H
Jazz	198	2 742	27.7	2.24	0.02	0.633	1.4
CE	297	2 148	14.46	2.46	-0.163	0.308	1.8
Kohonen	3 704	12 683	3.4	5.64	-0.121	0.252	11.2
USAir	332	2 128	12.81	2.74	-0.208	0.749	3.36
Hamster	1 858	12 534	13.49	3.39	-0.085	0.090	3.36
Metabolic	453	2 025	8.94	2.67	-0.226	0.647	4.49
Email	167	5 784	69.26	1.87	-0.295	0.541	1.66
Yeast	2 375	11 693	9.85	5.10	0.469	0.378	3.48
PB	1 222	16 717	27.36	2.74	-0.221	0.361	2.97
Flight	2 939	30 501	20.75	4.18	0.051	0.255	5.22
SciMet	3 084	10 399	6.74	4.18	-0.033	0.151	2.78
FWEW	69	880	25.51	1.64	-0.298	0.552	1.27
Router	5 022	6 258	2.49	5.99	-0.138	0.033	5.5
Power	4 941	6 594	2.67	18.9	0.003	0.107	1.45
FWFB	128	2 075	32.42	1.78	-0.112	0.335	1.24

Each original data is randomly divided into training set containing 90% of links, and the probe set containing the remaining 10%.

5. Conclusions

There are a lot of similarity indices considering local information between endpoints such as common neighbors and shorter paths. However, most of them ignore the effectiveness of common neighbors. Based on local information, an ECN index is proposed, which weights the common neighbors by the local topology information around them. Empirical results demonstrate that ECN index can significantly improve the prediction accuracy in 15 real networks, compared with 7 similarity indices. The measurement of effective common neighbors and their enhancement are both beneficial to the improvement of prediction accuracy or description of similarity between two endpoints. The ECN index is more suitable for the link prediction of networks concerning more about the prediction accuracy of the top L predicted links (such as protein-protein interaction networks). In addition, the complexity of ECN is between $O(N{\left\langle k \right\rangle ^2})$ (CN) and ${\rm{2}}O(N{\left\langle k \right\rangle ^2})$. Therefore, it can also be applied to predict missing links of large-scale networks.

参考文献 (40)

姓名
邮箱
手机号码
标题
留言内容
验证码

留言板

基于共同邻居有效性的复杂网络链路预测算法

doi: 10.3969/j.issn.1001-0548.2019.03.020

作者简介:
王凯(1980-), 男, 博士生, 副研究员, 主要从事社会网络分析方面的研究

Predicting Missing Links of Complex Network via Effective Common Neighbors

Corresponding author: 刘树新, E-mail:liushuxin11@126.com

计量

Predicting Missing Links of Complex Network via Effective Common Neighbors

doi: 10.3969/j.issn.1001-0548.2019.03.020

国家数字交换系统工程技术研究中心郑州 450002

通讯作者: 刘树新, E-mail:liushuxin11@126.com

English Abstract

Predicting Missing Links of Complex Network via Effective Common Neighbors

National Digital Switching System Engineering and Technological R & D Center Zhengzhou 450002

Corresponding author: 刘树新, E-mail:liushuxin11@126.com

全文HTML

目录

期刊在线

编辑办公

友情链接

留言板

基于共同邻居有效性的复杂网络链路预测算法

doi: 10.3969/j.issn.1001-0548.2019.03.020

作者简介: 王凯(1980-), 男, 博士生, 副研究员, 主要从事社会网络分析方面的研究

Predicting Missing Links of Complex Network via Effective Common Neighbors

Corresponding author: 刘树新, E-mail:liushuxin11@126.com

计量

出版历程

Predicting Missing Links of Complex Network via Effective Common Neighbors

doi: 10.3969/j.issn.1001-0548.2019.03.020

国家数字交换系统工程技术研究中心 郑州 450002

通讯作者: 刘树新, E-mail:liushuxin11@126.com

English Abstract

Predicting Missing Links of Complex Network via Effective Common Neighbors

National Digital Switching System Engineering and Technological R & D Center Zhengzhou 450002

Corresponding author: 刘树新, E-mail:liushuxin11@126.com

全文HTML

目录

期刊在线

编辑办公

友情链接

作者简介:
王凯(1980-), 男, 博士生, 副研究员, 主要从事社会网络分析方面的研究

国家数字交换系统工程技术研究中心郑州 450002