留言板

尊敬的读者、作者、审稿人, 关于本刊的投稿、审稿、编辑和出版的任何问题, 您可以本页添加留言。我们将尽快给您答复。谢谢您的支持!

姓名
邮箱
手机号码
标题
留言内容
验证码

Predicting Missing Links of Complex Network via Effective Common Neighbors

Kai WANG Shu-xin LIU Hong-tao YU Xing LI

王凯, 刘树新, 于洪涛, 李星. 基于共同邻居有效性的复杂网络链路预测算法[J]. 电子科技大学学报, 2019, 48(3): 432-439. doi: 10.3969/j.issn.1001-0548.2019.03.020
引用本文: 王凯, 刘树新, 于洪涛, 李星. 基于共同邻居有效性的复杂网络链路预测算法[J]. 电子科技大学学报, 2019, 48(3): 432-439. doi: 10.3969/j.issn.1001-0548.2019.03.020
WANG Kai, LIU Shu-xin, YU Hong-tao, LI Xing. Predicting Missing Links of Complex Network via Effective Common Neighbors[J]. Journal of University of Electronic Science and Technology of China, 2019, 48(3): 432-439. doi: 10.3969/j.issn.1001-0548.2019.03.020
Citation: WANG Kai, LIU Shu-xin, YU Hong-tao, LI Xing. Predicting Missing Links of Complex Network via Effective Common Neighbors[J]. Journal of University of Electronic Science and Technology of China, 2019, 48(3): 432-439. doi: 10.3969/j.issn.1001-0548.2019.03.020

基于共同邻居有效性的复杂网络链路预测算法

doi: 10.3969/j.issn.1001-0548.2019.03.020
基金项目: 国家自然科学基金创新研究群体项目(61521003)
详细信息
    作者简介:

    王凯(1980-), 男, 博士生, 副研究员, 主要从事社会网络分析方面的研究

  • 中图分类号: TP391;N94

Predicting Missing Links of Complex Network via Effective Common Neighbors

More Information
  • 摘要: 链路预测旨在预测网络中的缺失连边,对于实际网络演化机制的了解具有重要意义。虽然现有研究已经提出了很多相似性指标,但它们都忽视了不同网络结构下共同邻居的有效性,而局部拓扑结构信息尤其是共同邻居结构在计算节点间相似性中发挥重要作用。考虑到共同邻居周围局部拓扑信息,该文提出了一种高效共同邻居指标。该指标首先分析了共同邻居所有连边的有效性,分别从端点两侧量化了节点的有效性;然后,通过分析共同邻居节点拓扑有效性对两侧资源分配过程的影响刻画节点间相似性。15个实际网络数据实验表明,相比现有经典的9种方法,所提方法具有较高的预测精度。
  • Figure  1.  Effectiveness of common neighbors between two endpoints.

    Figure  2.  Precision curves of ECN index with the change of parameter for 15 real networks. Each precision point is the average of 20 realizations

    Table  1.   Basic topological features of real networks

    Data $\left| V \right|$ $\left| E \right|$ $\langle k\rangle $ $\langle d\rangle $ r C H
    Jazz 198 2 742 27.7 2.24 0.02 0.633 1.4
    CE 297 2 148 14.46 2.46 -0.163 0.308 1.8
    Kohonen 3 704 12 683 3.4 5.64 -0.121 0.252 11.2
    USAir 332 2 128 12.81 2.74 -0.208 0.749 3.36
    Hamster 1 858 12 534 13.49 3.39 -0.085 0.090 3.36
    Metabolic 453 2 025 8.94 2.67 -0.226 0.647 4.49
    Email 167 5 784 69.26 1.87 -0.295 0.541 1.66
    Yeast 2 375 11 693 9.85 5.10 0.469 0.378 3.48
    PB 1 222 16 717 27.36 2.74 -0.221 0.361 2.97
    Flight 2 939 30 501 20.75 4.18 0.051 0.255 5.22
    SciMet 3 084 10 399 6.74 4.18 -0.033 0.151 2.78
    FWEW 69 880 25.51 1.64 -0.298 0.552 1.27
    Router 5 022 6 258 2.49 5.99 -0.138 0.033 5.5
    Power 4 941 6 594 2.67 18.9 0.003 0.107 1.45
    FWFB 128 2 075 32.42 1.78 -0.112 0.335 1.24
    下载: 导出CSV

    Table  2.   Comparison of the precision value between ECN (optimal value) and some similarity indices. Each value is the average of 20 realizations, each of which corresponds to an independent division of ${E^T}$and ${E^P}$

    Data CN RA AA CAR LPa LPb Katza Katzb ACT Cos+ SPM ECN
    Jazz 0.810 0.809 0.830 0.851 0.798 0.775 0.798 0.740 0.258 0.356 0.895 0.861
    CE 0.129 0.131 0.137 0.129 0.138 0.138 0.138 0.138 0.064 0.069 0.207 0.210
    Kohonen 0.157 0.145 0.154 0.211 0.159 0.159 0.159 0.155 0.010 0.020 0.089 0.304
    USAir 0.611 0.636 0.632 0.612 0.615 0.612 0.615 0.604 0.492 0.104 0.589 0.676
    Hamster 0.015 0.004 0.011 0.037 0.017 0.054 0.017 0.075 0.089 0.020 0.898 0.186
    Metabolic 0.206 0.305 0.253 0.195 0.201 0.200 0.201 0.200 0.120 0.110 0.445 0.456
    Email 0.724 0.733 0.727 0.691 0.726 0.722 0.726 0.714 0.633 0.637 0.809 0.889
    Yeast 0.668 0.495 0.693 0.670 0.671 0.733 0.670 0.721 0.565 0.239 0.699 0.876
    PB 0.431 0.253 0.389 0.472 0.436 0.465 0.436 0.467 0.127 0.342 0.669 0.522
    Flight 0.511 0.354 0.443 0.631 0.521 0.559 0.522 0.554 0.344 0.030 0.127 0.646
    SciMet 0.149 0.130 0.147 0.165 0.149 0.150 0.149 0.147 0.000 0.000 0.079 0.203
    FWEW 0.147 0.163 0.154 0.142 0.158 0.189 0.158 0.197 0.271 0.000 0.543 0.387
    Router 0.110 0.092 0.113 0.109 0.132 0.132 0.131 0.130 0.000 0.019 0.015 0.306
    Power 0.130 0.071 0.097 0.131 0.148 0.148 0.146 0.146 0.000 0.086 0.017 0.178
    FWFB 0.090 0.087 0.086 0.089 0.096 0.133 0.096 0.146 0.178 0.030 0.718 0.378
    aParameter $\alpha {\rm{ = 0}}{\rm{.001}}$. bParameter$\alpha {\rm{ = 0}}{\rm{.01}}$.
    下载: 导出CSV
  • [1] BARABÁSI A L. Scale-free networks:A decade and beyond[J]. Science, 2009, 325(5939):412-413. doi:  10.1126/science.1173299
    [2] LI K, LI H J, WANG H. Situation analysis of relationship in social networks based on link entropy[J]. Modern Physics Letters B, 2015, 29(13):1550061. doi:  10.1142/S021798491550061X
    [3] BULLMORE E, SPORNS O. Complex brain networks:Graph theoretical analysis of structural and functional systems[J]. Nature Reviews Neuroscience, 2009, 10(3):186-198. doi:  10.1038/nrn2575
    [4] SCHWEITZER F, FAGIOLO G, SORNETTE D, et al. Economic networks:The new challenges[J]. Science, 2009, 325(5939):422-425. doi:  10.1126/science.1173644
    [5] LIN J, BAN Y. Complex network topology of transportation systems[J]. Transport Reviews, 2013, 33(6):658-685. doi:  10.1080/01441647.2013.848955
    [6] WANG P, XU B W, WU Y R, et al. Link prediction in social networks:The state-of-the-art[J]. Science China Information Sciences, 2015, 58(1):1-38. http://d.old.wanfangdata.com.cn/OAPaper/oai_arXiv.org_1303.7264
    [7] VON M C, JENSEN L J, SNEL B, et al. STRING:Known and predicted protein-protein associations, integrated and transferred across organisms[J]. Nucleic Acids Research, 2005, 33(suppl.1):433-437. http://d.old.wanfangdata.com.cn/Periodical/jsjkxjsxb-e201706006
    [8] SCELLATO S, NOULAS A, MASCOLO C. Exploiting place features in link prediction on location-based social networks[C]//Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. San Diego, CA, USA: ACM, 2011: 1046-1054.
    [9] HOLLAND P W, LASKEY K B, LEINHARDT S. Stochastic blockmodels:First steps[J]. Social Networks, 1983, 5(2):109-137. doi:  10.1016/0378-8733(83)90021-7
    [10] LIU S, XU X F, XIE R G, et al. Anomalous heat conduction and anomalous diffusion in low dimensional nanoscale systems[J]. The European Physical Journal B, 2012, 85(10):337. doi:  10.1140/epjb/e2012-30383-8
    [11] LÜ L, ZHOU T. Link prediction in complex networks:A survey[J]. Physica A:Statistical Mechanics and its Applications, 2011, 390(6):1150-1170. doi:  10.1016/j.physa.2010.11.027
    [12] LORRAIN F, WHITE H C. Structural equivalence of individuals in social networks[J]. Social Networks. 1977(1):67-98.
    [13] ZHOU T, LÜ L, ZHANG Y C. Predicting missing links via local information[J]. The European Physical Journal B, 2009, 71(4):623-630. doi:  10.1140/epjb/e2009-00335-8
    [14] ADAMIC L A, ADAR E. Friends and neighbors on the web[J]. Social Networks, 2003, 25(3):211-230. doi:  10.1016/S0378-8733(03)00009-1
    [15] LÜ L, JIN C H, ZHOU T. Similarity index based on local paths for link prediction of complex networks[J]. Physical Review E, 2009, 80(4):046122. doi:  10.1103/PhysRevE.80.046122
    [16] LIU S, JI X, LIU C, et al. Extended resource allocation index for link prediction of complex network[J]. Physica A:Statistical Mechanics and its Applications, 2017, 479:174-183. doi:  10.1016/j.physa.2017.02.078
    [17] LIU S, JI X, LIU C, et al. Similarity indices based on link weight assignment for link prediction of unweighted complex networks[J]. International Journal of Modern Physics B, 2017, 31(2):1650254. doi:  10.1142/S0217979216502544
    [18] YAO Y, ZHANG R, YANG F, et al. Link prediction based on local weighted paths for complex networks[J]. International Journal of Modern Physics C, 2017, 28(4):1750053. doi:  10.1142/S012918311750053X
    [19] MA Y, CHENG G, LIU Z, et al. Clustering-based link prediction in scientific coauthorship networks[J]. International Journal of Modern Physics C, 2017, 28(06):1750082. doi:  10.1142/S0129183117500826
    [20] FENG X, ZHAO J C, XU K. Link prediction in complex networks:A clustering perspective[J]. The European Physical Journal B, 2012, 85(1):1-3. doi:  10.1140/epjb/e2011-20818-1
    [21] LÜ L, PAN L, ZHOU T, et al. Toward link predictability of complex networks[J]. Proceedings of the National Academy of Sciences, 2015, 112(8):2325-2330. doi:  10.1073/pnas.1424644112
    [22] KATZ L. A new status index derived from sociometric analysis[J]. Psychometrika, 1953, 18(1):39-43. doi:  10.1007/BF02289026
    [23] LEICHT E A, HOLME P, NEWMAN M E J. Vertex similarity in networks[J]. Physical Review E, 2006, 73(2):026120. doi:  10.1103/PhysRevE.73.026120
    [24] KLEIN D J, RANDIĆ M. Resistance distance[J]. Journal of Mathematical Chemistry, 1993, 12(1):81-95. doi:  10.1007/BF01164627
    [25] FOUSS F, PIROTTE A, RENDERS J M, et al. Random-walk computation of similarities between nodes of a graph with application to collaborative recommendation[J]. IEEE Transactions on Knowledge and Data Engineering, 2007, 19(3):355-369. doi:  10.1109/TKDE.2007.46
    [26] CANNISTRACI C V, ALANIS-LOBATO G, RAVASI T. From link-prediction in brain connectomes and protein interactomes to the local-community-paradigm in complex networks[J]. Scientific Reports, 2013, 3:1613-1618. doi:  10.1038/srep01613
    [27] HERLOCKER J L, KONSTAN J A, TERVEEN L G, et al. Evaluating collaborative filtering recommender systems[J]. ACM Transactions on Information Systems (TOIS), 2004, 22(1):5-53. doi:  10.1145/963770
    [28] GLEISER P M, DANON L. Community structure in jazz[J]. Advances in Complex Systems, 2003, 6(4):565-573. doi:  10.1142/S0219525903001067
    [29] WATTS D J, STROGATZ S H. Collective dynamics of 'small-world' networks[J]. Nature, 1998, 393(6684):440-445. doi:  10.1038/30918
    [30] ZHANG Q M, LÜ L, WANG W Q, et al. Potential theory for directed networks[J]. PloS One, 2013, 8(2):e55437. http://d.old.wanfangdata.com.cn/OAPaper/oai_arXiv.org_1202.2709
    [31] BATAGELJ V, MRVAR A. Pajek-program for large network analysis[J]. Connections, 1998, 21(2):47-57. http://cn.bing.com/academic/profile?id=ae97bbc52dc8eb50567885156695e92f&encoded=0&v=paper_preview&mkt=zh-cn
    [32] LÜ L, PAN L, ZHOU T, et al. Toward link predictability of complex networks[J]. Proceedings of the National Academy of Sciences, 2015, 112(8):2325-2330. doi:  10.1073/pnas.1424644112
    [33] DUCH J, ARENAS A. Community detection in complex networks using extremal optimization[J]. Physical Review E, 2005, 72(2):027104. doi:  10.1103/PhysRevE.72.027104
    [34] MICHALSKI R, PALUS S, KAZIENKO P. Matching organizational structure and social network extracted from email communication[C]//International Conference on Business Information Systems. Berlin, Heidelberg: Springer, 2011: 197-206.
    [35] BU D, ZHAO Y, CAI L, et al. Topological structure analysis of the protein-protein interaction network in budding yeast[J]. Nucleic Acids Research, 2003, 31(9):2443-2450. doi:  10.1093/nar/gkg340
    [36] ADAMIC L A, GLANCE N. The political blogosphere and the 2004 US election: Divided they blog[C]//Proceedings of the 3rd International Workshop on Link Discovery. Chicago, USA: ACM, 2005: 36-43.
    [37] OPSAHL T, AGNEESSENS F, SKVORETZ J. Node centrality in weighted networks:Generalizing degree and shortest paths[J]. Social Networks, 2010, 32(3):245-251. doi:  10.1016/j.socnet.2010.03.006
    [38] ULANOWICZ R E, DEANGELIS D L. Network analysis of trophic dynamics in south florida ecosystems[J]. US Geological Survey Program on the South Florida Ecosystem, 2005, 114:45-48.
    [39] SPRING N, MAHAJAN R, WETHERALL D. Measuring ISP topologies with Rocketfuel[J]. ACM SIGCOMM Computer Communication Review, 2002, 32(4):133-145. doi:  10.1145/964725
    [40] NEWMAN M E J. Assortative mixing in networks[J]. Physical Review Letters, 2002, 89(20):208701. doi:  10.1103/PhysRevLett.89.208701
  • [1] 苏晓萍, 查英华, 曲鸿博.  一种异质图的Lorentz嵌入模型 . 电子科技大学学报, 2023, 52(1): 146-153. doi: 10.12178/1001-0548.2021284
    [2] 谢怡燃, 李国华, 杨波.  基于站点线路数的城市公交网络鲁棒性研究 . 电子科技大学学报, 2022, 51(4): 630-640. doi: 10.12178/1001-0548.2021336
    [3] 方祺娜, 许小可.  基于异质模体特征的社交网络链路预测 . 电子科技大学学报, 2022, 51(2): 274-281. doi: 10.12178/1001-0548.2021181
    [4] 赵娜, 李杰, 王剑, 彭西阳, 景铭, 聂永杰, 郁湧.  基于邻层传播的相对重要节点挖掘方法 . 电子科技大学学报, 2021, 50(1): 121-126. doi: 10.12178/1001-0548.2020283
    [5] 王曦, 许爽, 许小可.  融合用户行为同步指标的链路预测研究 . 电子科技大学学报, 2021, 50(2): 276-284. doi: 10.12178/1001-0548.2020241
    [6] 李治成, 吉立新, 刘树新, 李星, 李劲松.  基于拓扑有效连通路径的有向网络链路预测方法 . 电子科技大学学报, 2021, 50(1): 127-137. doi: 10.12178/1001-0548.2020220
    [7] 李艳丽, 周涛.  链路预测中的局部相似性指标 . 电子科技大学学报, 2021, 50(3): 422-427. doi: 10.12178/1001-0548.2021062
    [8] 曹红艳, 许小可, 许爽.  基于多模体特征的科学家合作预测 . 电子科技大学学报, 2020, 49(5): 766-773. doi: 10.12178/1001-0548.2019173
    [9] 朱军芳, 陈端兵, 周涛, 张千明, 罗咏劼.  网络科学中相对重要节点挖掘方法综述 . 电子科技大学学报, 2019, 48(4): 595-603. doi: 10.3969/j.issn.1001-0548.2019.04.018
    [10] 吴宗柠, 樊瑛.  复杂网络视角下国际贸易研究综述 . 电子科技大学学报, 2018, 47(3): 469-480. doi: 10.3969/j.issn.1001-0548.2018.03.023
    [11] 朱为华, 刘凯, 闫小勇, 汪明, 吴金闪.  识别流网络关键节点的虚拟外界投入产出分析法 . 电子科技大学学报, 2018, 47(2): 292-297. doi: 10.3969/j.issn.1001-0548.2018.02.021
    [12] 张海霞, 吕振, 张传亭, 袁东风.  一种引入加权异构信息的改进协同过滤推荐算法 . 电子科技大学学报, 2018, 47(1): 112-116, 152. doi: 10.3969/j.issn.1001-0548.2018.01.017
    [13] 郭婷婷, 赵承业.  异常链路分析在电力网络恢复中的应用 . 电子科技大学学报, 2016, 45(5): 854-859. doi: 10.3969/j.issn.1001-0548.2016.05.024
    [14] 张昌利, 龚建国, 闫茂德.  基于复杂网络的社会化标签语义相似度分析 . 电子科技大学学报, 2012, 41(5): 642-648. doi: 10.3969/j.issn.1001-0548.2012.05.001
    [15] 朱大勇, 张新丽, 李树全.  利用局部拓扑信息发现模糊社团结构 . 电子科技大学学报, 2011, 40(1): 73-79. doi: 10.3969/j.issn.1001-0548.2011.01.014
    [16] 王文强, 张千明.  链路预测的网络演化模型评价方法 . 电子科技大学学报, 2011, 40(2): 174-179. doi: 10.3969/j.issn.1001-0548.2011.02.003
    [17] 吕琳媛.  复杂网络链路预测 . 电子科技大学学报, 2010, 39(5): 651-661. doi: 10.3969/j.issn.1001-0548.2010.05.002
    [18] 郑永红, 彭世, 黄小丽, 靳映霞.  流动相似律在气流式倾角传感器中的应用 . 电子科技大学学报, 2007, 36(5): 931-934.
    [19] 耿技, 印鉴.  改进的共享型最近邻居聚类算法 . 电子科技大学学报, 2006, 35(1): 70-72.
    [20] 杨清夙, 游志胜, 张先玉.  基于豪斯多夫距离的快速多人脸检测算法 . 电子科技大学学报, 2004, 33(4): 407-409.
  • 加载中
图(2) / 表(2)
计量
  • 文章访问数:  4589
  • HTML全文浏览量:  1559
  • PDF下载量:  129
  • 被引次数: 0
出版历程
  • 收稿日期:  2018-03-21
  • 修回日期:  2018-11-15
  • 刊出日期:  2019-05-30

Predicting Missing Links of Complex Network via Effective Common Neighbors

doi: 10.3969/j.issn.1001-0548.2019.03.020
    通讯作者: 刘树新, E-mail:liushuxin11@126.com
  • 中图分类号: TP391;N94

摘要: 链路预测旨在预测网络中的缺失连边,对于实际网络演化机制的了解具有重要意义。虽然现有研究已经提出了很多相似性指标,但它们都忽视了不同网络结构下共同邻居的有效性,而局部拓扑结构信息尤其是共同邻居结构在计算节点间相似性中发挥重要作用。考虑到共同邻居周围局部拓扑信息,该文提出了一种高效共同邻居指标。该指标首先分析了共同邻居所有连边的有效性,分别从端点两侧量化了节点的有效性;然后,通过分析共同邻居节点拓扑有效性对两侧资源分配过程的影响刻画节点间相似性。15个实际网络数据实验表明,相比现有经典的9种方法,所提方法具有较高的预测精度。

English Abstract

王凯, 刘树新, 于洪涛, 李星. 基于共同邻居有效性的复杂网络链路预测算法[J]. 电子科技大学学报, 2019, 48(3): 432-439. doi: 10.3969/j.issn.1001-0548.2019.03.020
引用本文: 王凯, 刘树新, 于洪涛, 李星. 基于共同邻居有效性的复杂网络链路预测算法[J]. 电子科技大学学报, 2019, 48(3): 432-439. doi: 10.3969/j.issn.1001-0548.2019.03.020
WANG Kai, LIU Shu-xin, YU Hong-tao, LI Xing. Predicting Missing Links of Complex Network via Effective Common Neighbors[J]. Journal of University of Electronic Science and Technology of China, 2019, 48(3): 432-439. doi: 10.3969/j.issn.1001-0548.2019.03.020
Citation: WANG Kai, LIU Shu-xin, YU Hong-tao, LI Xing. Predicting Missing Links of Complex Network via Effective Common Neighbors[J]. Journal of University of Electronic Science and Technology of China, 2019, 48(3): 432-439. doi: 10.3969/j.issn.1001-0548.2019.03.020
  • With the development of complex network research, many more real complex systems have been described as complex network[1-5]. Link prediction aims to predict the likelihood that a link exists between two endpoints in complex networks[6]. Predicting missing links plays an important role in discovering the unknown connections in protein-protein interaction networks[7], forecasting the future relationships in online social networks[8], and identifying spurious flights in air transportation networks[9].

    Based on evolution mechanisms of complex networks, a lot of link prediction methods have been proposed. Among these methods, topology-based similarity indices have got a great success due to their high performance and low complexity[10]. Similarity indices can be classified as local indices and global indices according to complexity and length of paths[11]. Considering the local information including common neighbors and short paths, local indices, such as common neighbor (CN) index[12]directly using the number of common neighbors, resource allocation (RA) index[13]and Adamic-Adar (AA) index[14] weighting the common neighbors by their node degree, local path (LP) index[15]and extended resource allocation (ERA)[16]index adding paths with length 3 to CN and RA respectively, perform very well in real networks with lower complexity. Ref.[17] proposed a local weighted method, which can improve the prediction accuracy of several similarity indices (including CN, AA, RA, Salton, Jaccard and LP). Similarly, Ref.[18]proposed a local weighted path (LWP) index to differentiate the contributions between paths. Considering the clustering coefficient of nodes, some clustering-based methods[19-20]have been proposed to improve the prediction accuracy. Ref.[21]believed that the regularity of a network was reflected in the consistency of structural features before and after a random removal of a small set of links and proposed a universal structural consistency index. To improve the prediction accuracy, many global indices have been proposed based on global paths, including Katz index[22], LHN-II[23], average commute time (ACT)[24] and cosine similarity time (Cos+)[25]. The global indices can achieve a good performance, but most of them are not suitable for large networks due to their higher complexity. Therefore, how to effectively use the local topological information may be a good way to predict the missing links of large-scale networks.

    In the real world, such as human relationship network, if there are more indirect connections between two people and their common friends, they are more likely to be friend. Considering two pairs of endpoints x, y and x', y' with a common neighbor. There are two longer paths between x', y', and two indirect connections between x, y and their common neighbor. According to CN, RA, AA and LP index, the similarity between two pairs of nodes are the same, though the links between endpoints x and y are more likely to exist. Because there are more indirect connections between x, y and their common neighbor, common neighbor of x and y is more effective in similarity calculating. In a word, the topology information related to common neighbors plays an important role in link prediction.

    Based on the above discussion, an effective common neighbor index (ECN) is proposed, which weights the common neighbors by the local topology information around them. With a parameter adjusting the strength of effective common neighbors, ECN can achieve the optimal prediction accuracy for different networks. Empirical study has shown that the ECN index proposed can significantly improve the prediction accuracy, compared with 9 similarity indices in 15 real networks.

    • Several classical similarity indices are introduced here, including 5 local indices: CN, RA, AA, CAR and LP index, and 4 global indices: Katz, ACT, Cos+ and SPM index. A brief description is shown as follows:

      1) CN index[12]measures the similarity of two endpoints by the number of their common neighbors:

      $$s_{xy}^{{\rm{CN}}} = |{\mathit{\Gamma}} (x) \cap {\mathit{\Gamma}} (y)|$$ (1)

      where ${\mathit{\Gamma}} (x)$is the set of neighbors of node x, and ${\mathit{\Gamma}} (x) \cap {\mathit{\Gamma}} (y)$denotes the common neighbors between nodes x and y.

      2) RA index[13]publishes the common neighbors with big degree:

      $$s_{xy}^{{\rm{RA}}} = \sum\limits_{z \in |{\mathit{\Gamma}} (x) \cap {\mathit{\Gamma}} (y)|} {\frac{1}{{{k_z}}}} $$ (2)

      where ${k_z}$ is the node degree of node $z$.

      3) AA index[14]weights the common neighbors by nodes degree:

      $$s_{xy}^{{\rm{AA}}} = \sum\limits_{z \in |{\mathit{\Gamma}} (x) \cap {\mathit{\Gamma}} (y)|} {\frac{1}{{\log ({k_z})}}} $$ (3)

      4) CAR index[26]believes that two nodes are more likely to link together if their common-first-neighbors are members of a strongly inner-linked cohort:

      $$s_{xy}^{{\rm{CAR}}} = \left| {{\mathit{\Gamma}} (x) \cap {\mathit{\Gamma}} (y)} \right|\sum\limits_{z \in |{\mathit{\Gamma}} (x) \cap {\mathit{\Gamma}} (y)|} {\frac{{\left| {\gamma (z)} \right|}}{2}} $$ (4)

      where $\left| {\gamma (z)} \right|$refers to the sub-set of neighbors of $z$ that are also common neighbors of nodes$x$and$y$.

      5) LP index[15]adds the path information with length 3 to CN:

      $${\mathit{\boldsymbol{S = }}}{{\mathit{\boldsymbol{A}}}^{\rm{2}}}{\mathit{\boldsymbol{ + }}}\alpha {{\mathit{\boldsymbol{A}}}^{\rm{3}}}$$ (5)

      where $\alpha $ is the parameter and A is the adjacency matrix.

      6) Katz index[22]considers all the path information between two endpoints and gives more weights to the shorter paths:

      $$s_{xy}^{{\rm{Katz}}} = \sum\limits_{l = 1}^\infty {{\varepsilon ^l}|{\rm{path}}_{xy}^l|} $$ (6)

      where ${\rm{path}}_{xy}^l$ is the set of paths with length l between x and y, and $\varepsilon $ is the adjust parameter.

      7) ACT[24]measures similarity by calculating the average steps of random walk between two endpoints:

      $$s_{xy}^{{\rm{ACT}}} = \frac{1}{{l_{xx}^ + + l_{yy}^ + - 2l_{xy}^ + }}$$ (7)

      where $l_{xy}^ + $ is the corresponding entry in ${{\mathit{\boldsymbol{L}}}^{\mathit{\boldsymbol{ + }}}}$, and ${{\mathit{\boldsymbol{L}}}^{\mathit{\boldsymbol{ + }}}}$ denotes the pseudo-inverse of matrix ${\mathit{\boldsymbol{L}}} = {\mathit{\boldsymbol{D}}} - {\mathit{\boldsymbol{A}}}$.

      8) Cos+[25]is based on ${{\mathit{\boldsymbol{L}}}^{\mathit{\boldsymbol{ + }}}}$ by calculating similarity of two vectors:

      $$s_{xy}^{{\rm{Cos + }}} = \frac{{v_x^{\rm{T}}{v_y}}}{{|{v_x}||{v_y}|}} = \frac{{l_{xy}^ + }}{{\sqrt {l_{xx}^ + l_{yy}^ + } }}$$ (8)

      9) SPM[21]measures the structural consistency of a network by perturbing its adjacency matrix and observing the change of eigenvalues provided the fixed eigenvectors:

      $$s_{xy}^{{\rm{SPM}}} = \sum\limits_{k = 1}^N {({\lambda _k} + \Delta {\lambda _k})} {x_k}x_k^{\rm{T}}$$ (9)

      where ${\lambda _k}$ and ${x_k}$ are the eigenvalue and the corresponding orthogonal and normalized eigenvector for${A^{\rm{R}}}$, respectively.

    • Considering an unweighted undirected network G(V, E), V and E are the sets of nodes and links respectively. A score ${s_{xy}}$ is assigned to each pair of nodes x and y by prediction algorithms and it can be a measure of similarity between two endpoints. For each nonexistent link of network, this score represents the likelihood that the link exists.

      Definition 1 Considering a pair of nodes, $x,y \in V$, node z is one of their common neighbors. Among all the links connected to node z, if there are more links on the local paths (here, length $l \leqslant 2$) between endpoints and node z, it will be more effective for common neighbor z in calculating the similarity of two endpoints. From the influence of paths connected to node x, the effectiveness of common neighbors between endpoints x and y can be defined as:

      $${\rm{IF}}(y|x) = {\sum\limits_{z \in {\mathit{\Gamma}} (x) \cap {\mathit{\Gamma}} (y)} {\left( {\frac{{{c_{xz}}}}{{{k_z}}}} \right)} ^\delta }$$ (10)

      where ${c_{xz}}$ denotes the number of different paths between node x and z with length no more than 2, and $\delta $ is the adjust parameter.

      Definition 2 On an unweighted undirected network G(V, E). The effective common neighbor index of endpoints x and y, is defined as:

      $$\begin{gathered} s_{xy}^{{\rm{ECN}}} = {\rm{IF}}(y|x) + {\rm{IF}}(x|y) = \\ \sum\limits_{z \in {\mathit{\Gamma}} (x) \cap {\mathit{\Gamma}} (y)} {\left( {\frac{{c_{xz}^\delta {\rm{ + }}c_{yz}^\delta }}{{k_z^\delta }}} \right)} \\ \end{gathered} $$ (11)

      Here, the $\delta (\delta \geqslant 0)$ aims to adjust the strength of effectiveness. When $\delta {\rm{ = }}0$, the ECN index is the same as the CN index 12. TakeFig.1for example, the similarity of two endpoints is $s_{xy}^{{\rm{ECN}}} = ({3^\delta } + {2^\delta })/{9^\delta }$.

      Figure 1.  Effectiveness of common neighbors between two endpoints.

    • To quantify the prediction accuracy, the standard metric called precision[27]is introduced. Precision is defined as the ratio of relevant links to the top L predicted links (here we set L=100). If there are m relevant links appeared in the probe set, the precision value is:

      $${\rm{Precision}} = \frac{m}{L}$$ (12)

      Clearly, the higher precision value means higher prediction accuracy.

      Our experiments are performed on fifteen different real networks: 1) Jazz[28]:the network of Jazz musicians. 2) Caenorhabditis elegans (CE)[29]: the neural network of the nematode worm. 3) Kohonen[30]: the citing network of articles with topic "self-organizing maps" or references to "Kohonen T". 4) USAir[31]: the USA airline network. 5) Hamster[32]: a friendship network of users on the website hamsterster.com. 6) Metabolic[33]: the metabolic network of Caenorhabditis elegans. 7) Email network (Email)[34]: the internal email communication network between employees of a mid-sized manufacturing company. 8) Yeast PPI (Yeast)[35]: a protein-protein interaction network of yeast. 9) Political blogs (PB)[36]: a network of US political blogs. 10) Openflights (Flight)[37]: the network of flights between air-ports of the world. 11) Scientometrics (SciMet)[30]: the citing network of the articles from scientometrics. 12) Food Web of Everglades ecosystem (FWEW)[38]: the network of carbon exchanges occurring during the wet season in the Everglades Graminoid Marshes of South Florida. 13) Route topology of internet (Router)[39]: the topology of internet in router-level. 14) Power Grid (Power)[29]: an electrical power grid network of western US; 15) Food Web of Florida Bay ecosystem (FWFB)[38]: the network of carbon exchanges occurring during the wet season in Florida Bay.

      Table 1 shows the basic topological features of these networks. $\left| V \right|$is the number of nodes, $\left| E \right|$ denotes the number of links, $\left\langle k \right\rangle $ indicates the average degree. $\left\langle d \right\rangle $denotes the average distance. r is the assortativity coefficient[40]. C represents the clustering coefficient[29]. H is the degree heterogeneity.

      Table 1.  Basic topological features of real networks

      Data $\left| V \right|$ $\left| E \right|$ $\langle k\rangle $ $\langle d\rangle $ r C H
      Jazz 198 2 742 27.7 2.24 0.02 0.633 1.4
      CE 297 2 148 14.46 2.46 -0.163 0.308 1.8
      Kohonen 3 704 12 683 3.4 5.64 -0.121 0.252 11.2
      USAir 332 2 128 12.81 2.74 -0.208 0.749 3.36
      Hamster 1 858 12 534 13.49 3.39 -0.085 0.090 3.36
      Metabolic 453 2 025 8.94 2.67 -0.226 0.647 4.49
      Email 167 5 784 69.26 1.87 -0.295 0.541 1.66
      Yeast 2 375 11 693 9.85 5.10 0.469 0.378 3.48
      PB 1 222 16 717 27.36 2.74 -0.221 0.361 2.97
      Flight 2 939 30 501 20.75 4.18 0.051 0.255 5.22
      SciMet 3 084 10 399 6.74 4.18 -0.033 0.151 2.78
      FWEW 69 880 25.51 1.64 -0.298 0.552 1.27
      Router 5 022 6 258 2.49 5.99 -0.138 0.033 5.5
      Power 4 941 6 594 2.67 18.9 0.003 0.107 1.45
      FWFB 128 2 075 32.42 1.78 -0.112 0.335 1.24

      Each original data is randomly divided into training set containing 90% of links, and the probe set containing the remaining 10%.

    • Firstly, we report the impact of adjust parameter$\delta $ on prediction accuracy in different networks, and each precision value is the average of 20 realizations. As shown in Fig. 2, all the precision curves can achieve the optimal values when $\delta > 1$, and each of them exceeds the precision value at $\delta = 1$ without enhancement. It demonstrates that higher effective neighbors have a much greater influence on the similarity between endpoints than expected and the contribution of different effective common neighbors for similarity is not linear (for different networks, its power exponent is the optimal $\delta $). The precision values show a rising trend with an increase of $\delta $ before reaching the highest point, which indicates that the effective common neighbors can improve the prediction accuracy compared with CN (when $\delta = 0$). The measurement of effective common neighbors and their enhancement are both beneficial to the description of similarity between two endpoints. For some of datasets, the precision curves decline gradually after peak point. However, the precision values are stable around the maximum value in some networks such as metabolic, Email, Yeast, FWEW, Router and FWFB. In these networks, the role that the lower effective common neighbors play is very limited in link prediction of missing links.

      Table 2 shows the comparison between ECN index (optimal values) and 9 similarity indices in 15 networks, and each precision data is the average of 20 realizations. In 10 out of 15 datasets, ECN index obtains the best performance, but worse than the SPM index in Jazz, Hamster, PB, FWEW and FWFB. For some lower clustering network such as Hamster and Router, the improvement of precision is significant for ECN index, which indicates that common neighbors with different topological structures perform different roles in calculating the similarity between endpoints in these lower clustering networks. With the consideration of longer path information, the precision values of path-based indices (LP and Katz) are higher than neighbor-based indices (CN, RA and AA) in most of dataset. However, in some high clustering network such as Jazz, USAir, Metabolic and Email, the neighbor-weighted indices (RA and AA) perform better than most of the global indices. Also, the local- community-based index CAR can perform better than some global indices in Jazz, Kohonen, PB, Flight and SciMet. Having considered the effectiveness of common neighbors, ECN index can get a good performance by fusing some topology information used in neighbor-weighted indices and path-based indices. With the strength parameter, ECN index enhance the performance of effectiveness by adjusting the influence of different effective common neighbors on similarity in different networks. Having considered the structural consistency of a network, SPM performs very well for all the datasets. Surprisingly, some global indices (ACT and Cos+) perform worse than expected, and their precision values are close to zero in SciMet, Router, and Power networks. To summarize, the common neighbors between endpoints seem to be more effective for similarity calculating under the standard metric precision, and the consideration of local information around common neighbors can significantly improve the precision value.

      Figure 2.  Precision curves of ECN index with the change of parameter for 15 real networks. Each precision point is the average of 20 realizations

      Table 2.  Comparison of the precision value between ECN (optimal value) and some similarity indices. Each value is the average of 20 realizations, each of which corresponds to an independent division of ${E^T}$and ${E^P}$

      Data CN RA AA CAR LPa LPb Katza Katzb ACT Cos+ SPM ECN
      Jazz 0.810 0.809 0.830 0.851 0.798 0.775 0.798 0.740 0.258 0.356 0.895 0.861
      CE 0.129 0.131 0.137 0.129 0.138 0.138 0.138 0.138 0.064 0.069 0.207 0.210
      Kohonen 0.157 0.145 0.154 0.211 0.159 0.159 0.159 0.155 0.010 0.020 0.089 0.304
      USAir 0.611 0.636 0.632 0.612 0.615 0.612 0.615 0.604 0.492 0.104 0.589 0.676
      Hamster 0.015 0.004 0.011 0.037 0.017 0.054 0.017 0.075 0.089 0.020 0.898 0.186
      Metabolic 0.206 0.305 0.253 0.195 0.201 0.200 0.201 0.200 0.120 0.110 0.445 0.456
      Email 0.724 0.733 0.727 0.691 0.726 0.722 0.726 0.714 0.633 0.637 0.809 0.889
      Yeast 0.668 0.495 0.693 0.670 0.671 0.733 0.670 0.721 0.565 0.239 0.699 0.876
      PB 0.431 0.253 0.389 0.472 0.436 0.465 0.436 0.467 0.127 0.342 0.669 0.522
      Flight 0.511 0.354 0.443 0.631 0.521 0.559 0.522 0.554 0.344 0.030 0.127 0.646
      SciMet 0.149 0.130 0.147 0.165 0.149 0.150 0.149 0.147 0.000 0.000 0.079 0.203
      FWEW 0.147 0.163 0.154 0.142 0.158 0.189 0.158 0.197 0.271 0.000 0.543 0.387
      Router 0.110 0.092 0.113 0.109 0.132 0.132 0.131 0.130 0.000 0.019 0.015 0.306
      Power 0.130 0.071 0.097 0.131 0.148 0.148 0.146 0.146 0.000 0.086 0.017 0.178
      FWFB 0.090 0.087 0.086 0.089 0.096 0.133 0.096 0.146 0.178 0.030 0.718 0.378
      aParameter $\alpha {\rm{ = 0}}{\rm{.001}}$. bParameter$\alpha {\rm{ = 0}}{\rm{.01}}$.
    • There are a lot of similarity indices considering local information between endpoints such as common neighbors and shorter paths. However, most of them ignore the effectiveness of common neighbors. Based on local information, an ECN index is proposed, which weights the common neighbors by the local topology information around them. Empirical results demonstrate that ECN index can significantly improve the prediction accuracy in 15 real networks, compared with 7 similarity indices. The measurement of effective common neighbors and their enhancement are both beneficial to the improvement of prediction accuracy or description of similarity between two endpoints. The ECN index is more suitable for the link prediction of networks concerning more about the prediction accuracy of the top L predicted links (such as protein-protein interaction networks). In addition, the complexity of ECN is between $O(N{\left\langle k \right\rangle ^2})$ (CN) and ${\rm{2}}O(N{\left\langle k \right\rangle ^2})$. Therefore, it can also be applied to predict missing links of large-scale networks.

参考文献 (40)

目录

    /

    返回文章
    返回