A Higher-Order Community Detection Algorithm Based on Motif-Based Modularity Optimization

XIAO Jing; ZOU Yucheng; WU Shuang; XU Xiaoke

doi:10.12178/1001-0548.2022111

To improve on the performance of existing higher-order community detection algorithms, this paper proposes a higher-order community detection algorithm based on motif-based modularity optimization. By using the number of motif instances that contain a pair of nodes as the weight between them, motif-based higher-order community detection is transformed into edge-based community detection on a lower-order weighted network, and a weighted modularity optimization problem is formulated. With a meta-heuristic algorithm as the optimization strategy, both the lower-order topological structure and the higher-order weight information are exploited to design a neighborhood community modification operation and a local search operation for nodes, which improve the quality of community partitions and help prevent the algorithm from falling into local optima. Experimental results on synthetic and real-world networks show that exploiting motifs improves detection performance when the community structure is fuzzy. The proposed algorithm effectively realizes motif-based community detection and shows advantages in accuracy and partition quality over existing typical motif-based algorithms, which helps deepen the understanding of the higher-order structure and functional characteristics of complex networks.


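Community structure is one of the most important structural properties of complex networks and is widespread in many types of real-world networks, such as biological, financial, and social networks^[1-2]. Community detection not only reveals the mesoscale topological features of a network but also supports in-depth analysis of its functional and dynamical properties^[3]; it therefore has considerable practical value in areas such as identifying important nodes^[4], partitioning neurons by function^[5], and predicting the spread of disease^[6].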

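In recent years, many community detection methods grounded in different disciplines have emerged^[7-8] that can effectively identify the complex community structures hidden in networks. However, existing methods usually consider only lower-order structural information such as nodes and edges, and ignore the higher-order organizational structures that are widespread in networks. Real-world networks contain rich higher-order structures, namely small but statistically significant subgraphs such as network motifs^[9-10]. Motifs occur widely in real networks such as social and biological networks; they are the most common higher-order connection and interaction patterns and are regarded as the basic topological and functional units of complex networks^[9-11]. For example, social and biological networks typically contain large numbers of triangle motifs^[9,11], which strongly influence both the structural and the functional properties of these networks. Motifs reveal a new kind of higher-order community structure in complex networks, in which motif connections are dense within communities and sparse between them^[9,11]. Motif-based higher-order community detection thus offers a new perspective on the mesoscale structure of complex networks: it exposes richer and more meaningful topological relationships between nodes and thereby deepens our understanding of network function. For example, in the nematode (C. elegans) neuronal network, the "bi-fan" motif reveals a module of 20 neurons in the frontal section that is responsible for motor-sensory control^[9]. Community structures with such specific functional properties are difficult to obtain with traditional community detection methods.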
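To make "dense within communities, sparse between them" concrete for the triangle motif, the minimal Python sketch below enumerates the triangles of an undirected graph and counts how many lie entirely inside one community and how many span several. The function names (`build_adj`, `triangles`, `motif_split`) and the toy graph (two 4-cliques joined by a single edge) are hypothetical and serve only as an illustration.

```python
from itertools import combinations


def build_adj(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def triangles(adj):
    """Enumerate each triangle (u, v, w) with u < v < w exactly once."""
    found = []
    for u in adj:
        higher = sorted(n for n in adj[u] if n > u)
        for v, w in combinations(higher, 2):
            if w in adj[v]:
                found.append((u, v, w))
    return found


def motif_split(adj, community):
    """Return (#triangles inside one community, #triangles spanning communities)."""
    inside = spanning = 0
    for tri in triangles(adj):
        if len({community[n] for n in tri}) == 1:
            inside += 1
        else:
            spanning += 1
    return inside, spanning


# Hypothetical toy graph: two 4-cliques {0..3} and {4..7} joined by the edge 3-4.
EDGES = list(combinations(range(4), 2)) + list(combinations(range(4, 8), 2)) + [(3, 4)]
ADJ = build_adj(EDGES)
natural = {n: 0 if n < 4 else 1 for n in range(8)}
shifted = {n: 0 if n < 5 else 1 for n in range(8)}  # node 4 moved to the first group
print(motif_split(ADJ, natural))  # (8, 0)
print(motif_split(ADJ, shifted))  # (5, 3)
```

The partition that follows the cliques keeps all eight triangles inside communities, whereas moving node 4 alone breaks three triangles across the cut, even though the number of cut edges changes only from one to three.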

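Motif-based higher-order community detection in complex networks has attracted growing attention in recent years^[9-16]. Reference [9] integrated network motif analysis with community detection to build a motif-based generalized higher-order clustering framework, and obtained near-optimal higher-order community partitions with an extended spectral graph clustering method. Since then, a variety of motif-based community detection algorithms have been proposed, including the motif-based graph embedding method LinLog-Motif^[13], the motif-based edge enhancement method EdMot^[14], and the motif-based label propagation method MWLP^[15]. These typical methods can all mine, represent, and exploit motif information to identify higher-order community structure in complex networks. However, obtaining a higher-order community partition by optimizing a specific higher-order community quality measure is a typical NP-hard problem^[9] and is difficult to solve. The heuristic or spectral optimization methods used by existing approaches are easily trapped in local optima and therefore yield only locally optimal higher-order partitions, which limits the accuracy of higher-order community analysis. Moreover, motif-based higher-order community detection involves not only higher-order structural information from motifs but also lower-order structural information from nodes and edges, which makes a globally optimal higher-order partition even harder to find. Finally, some methods require the number of communities to be known in advance^[9,12], which is rarely possible for real-world networks.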

To address these problems, this paper proposes a motif-based modularity optimization method for higher-order community detection (MMHCD), aimed at improving the quality of motif-based community partitions.

1. Related Work

1.1. Motif-Based Higher-Order Community Detection

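Motif-based higher-order community detection helps reveal the higher-order organization and functional properties of complex networks^[9]. It has attracted wide attention in recent years, and several representative methods have emerged^[9-16]. For example, reference [9] defined a motif conductance function to evaluate the quality of higher-order community structure and obtained near-optimal higher-order partitions with an extended spectral graph clustering method; the generalized higher-order clustering framework proposed there is widely used in motif-based higher-order community detection. Building on it, reference [12] proposed a triangle motif conductance function and TECTONIC, a spectral clustering algorithm based on triangle motifs. TECTONIC weights each edge by the number of triangle motifs it participates in and obtains a higher-order partition by minimizing the number of triangle motifs that are cut. Beyond these spectral clustering algorithms, reference [13] proposed LinLog-Motif, a motif-based graph embedding method that maps a motif-based weighted network into a low-dimensional space with a motif-aware force-directed embedding and then applies K-means clustering to obtain a higher-order partition. Reference [14] proposed EdMot, an edge-enhancement approach to motif-based community detection: on the top-K largest connected components of the motif hypergraph, it adds edge sets that strengthen the motif-connected parts of the original network, and then applies the classic Louvain method^[12] to obtain motif-based community structure. EdMot effectively overcomes the network fragmentation and isolated-node problems of motif-based community detection. Reference [15] proposed MWLP, a motif-based weighted label propagation algorithm that weights edges by combining higher-order motif structure with lower-order edge structure, and obtains motif-based higher-order communities through label propagation.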
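The motif conductance of [9] for a node set S is φ_M(S) = cut_M(S, S̄) / min(vol_M(S), vol_M(S̄)), where cut_M counts motif instances with nodes on both sides of the cut and vol_M(S) counts motif endpoints that fall in S. For the triangle motif this equals ordinary conductance on the graph whose edge weights are triangle counts, which is the edge weighting described above for TECTONIC^[12]. The Python sketch below checks this equivalence on a toy graph; the helper names and the example are hypothetical, and the code only evaluates conductance for a given set S rather than implementing the spectral clustering of either method.

```python
from collections import Counter
from itertools import combinations


def triangle_list(edges):
    """Return every triangle (u, v, w) with u < v < w exactly once."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    tris = []
    for u in adj:
        for v, w in combinations(sorted(n for n in adj[u] if n > u), 2):
            if w in adj[v]:
                tris.append((u, v, w))
    return tris


def triangle_weights(tris):
    """W[(i, j)] = number of triangles containing edge (i, j), with i < j."""
    W = Counter()
    for u, v, w in tris:
        for pair in ((u, v), (u, w), (v, w)):
            W[pair] += 1
    return W


def motif_conductance(tris, S):
    """phi_M(S) computed directly from the triangle instances."""
    S = set(S)
    cut = sum(1 for t in tris if 0 < len(S.intersection(t)) < 3)
    vol_S = sum(len(S.intersection(t)) for t in tris)
    denom = min(vol_S, 3 * len(tris) - vol_S)
    return cut / denom if denom else float("inf")


def weighted_conductance(W, S):
    """Ordinary conductance of S on the triangle-weighted graph W."""
    S = set(S)
    cut = sum(w for (i, j), w in W.items() if (i in S) != (j in S))
    vol_S = sum(w * ((i in S) + (j in S)) for (i, j), w in W.items())
    denom = min(vol_S, 2 * sum(W.values()) - vol_S)
    return cut / denom if denom else float("inf")


# Hypothetical toy graph: two 4-cliques plus edges 3-4 and 2-4,
# which create exactly one crossing triangle (2, 3, 4).
EDGES = list(combinations(range(4), 2)) + list(combinations(range(4, 8), 2))
EDGES += [(3, 4), (2, 4)]
TRIS = triangle_list(EDGES)
S = {0, 1, 2, 3}
print(motif_conductance(TRIS, S))                       # 1/13, about 0.0769
print(weighted_conductance(triangle_weights(TRIS), S))  # same value
```

Every crossing triangle has exactly two edges across the cut and every triangle contributes two units of weighted degree per node, so both the cut and the volumes on the weighted graph are twice their motif counterparts and the ratio is unchanged.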

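Although these methods process motif information differently, all of them have been shown to detect motif-based higher-order community structure effectively. However, obtaining higher-order community structure by optimizing a motif-based community evaluation function, such as motif conductance^[9,12], is essentially a typical NP-hard optimization problem that is difficult to solve. The heuristic or spectral optimization methods used in existing algorithms are prone to premature convergence and struggle to find globally optimal higher-order partitions. In addition, while exploiting higher-order motif information, existing algorithms tend to neglect lower-order structural information such as nodes and edges, which leads to isolated nodes and disconnected fragments. Finally, some algorithms, such as Motif-SC^[9] and TECTONIC^[12], require the number of communities to be set in advance, which is usually impractical for real networks.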

1.2. Community Detection Based on Modularity Optimization

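Modularity optimization is one of the most typical community detection approaches for complex networks. Although it suffers from the resolution limit, its high accuracy and strong portability have made it widely used for discovering community structure in real-world networks^[16-20]. Because modularity optimization is NP-hard, it is difficult to solve exactly, and over the past decades many meta-heuristic optimization algorithms have been adopted as optimization strategies^[19-20] to improve global modularity optimization. Examples include emerging bio-inspired algorithms such as the bat algorithm^[21] and the fruit fly optimization algorithm^[22], and nature-inspired algorithms such as the state transition algorithm^[23] and the fire propagation algorithm^[24]. With advantages such as population-based parallel search, self-organization, effective use of network structure information, and automatic determination of the number of communities, meta-heuristic algorithms provide a sound optimization strategy for modularity optimization.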
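Most such meta-heuristics embed some form of local refinement of candidate partitions. As a hedged illustration only (plain hill climbing, not the bat, fruit fly, state transition, or fire propagation algorithm), the Python sketch below evaluates modularity Q = Σ_c [L_c/m - (d_c/2m)²], where m is the number of edges, L_c the number of edges inside community c, and d_c the total degree of c, and repeatedly moves a node into a neighbouring community whenever Q strictly increases. The function names and the toy example are hypothetical.

```python
import random
from itertools import combinations


def modularity(edges, comm):
    """Q = sum_c [L_c / m - (d_c / 2m)^2] for an unweighted, undirected graph."""
    m = len(edges)
    inner, degree_sum = {}, {}
    for u, v in edges:
        for node in (u, v):
            degree_sum[comm[node]] = degree_sum.get(comm[node], 0) + 1
        if comm[u] == comm[v]:
            inner[comm[u]] = inner.get(comm[u], 0) + 1
    return sum(inner.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree_sum.items())


def local_search(edges, comm, seed=0):
    """Hill climbing: move a node into a neighbouring community whenever Q strictly increases."""
    rng = random.Random(seed)
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    comm = dict(comm)  # work on a copy of the starting partition
    best = modularity(edges, comm)
    improved = True
    while improved:
        improved = False
        order = list(adj)
        rng.shuffle(order)  # random node visiting order
        for node in order:
            current = comm[node]
            for target in {comm[n] for n in adj[node]} - {current}:
                comm[node] = target
                q = modularity(edges, comm)
                if q > best + 1e-12:
                    best, current, improved = q, target, True  # keep this move
                comm[node] = current  # revert unless the move was kept
    return comm, best


# Hypothetical toy graph: two 4-cliques joined by edge 3-4; node 3 starts in the wrong group.
EDGES = list(combinations(range(4), 2)) + list(combinations(range(4, 8), 2)) + [(3, 4)]
start = {n: "A" if n < 3 else "B" for n in range(8)}
final, q = local_search(EDGES, start)
print(sorted(n for n in final if final[n] == final[0]))  # [0, 1, 2, 3]
print(round(q, 4))  # 0.4231
```

Recomputing Q from scratch for every trial move keeps the sketch short but costs O(m) per move; practical implementations update Q incrementally from the modularity gain of a single node move.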

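Modularity optimization methods based on meta-heuristics have been applied successfully to traditional lower-order community detection, but they have rarely been used to detect motif-based higher-order communities. Applying them involves two key factors: 1) in higher-order community detection the structural unit changes from nodes (or edges) to motifs, so the motif structure of the network must first be mined and represented in order to build a higher-order modularity optimization model; 2) while a meta-heuristic optimizes modularity, it must effectively combine the motif-based higher-order structural information with the node- and edge-based lower-order structural information, so as to keep community partitions complete and improve detection performance.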
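Factor 1) can be made concrete with the construction outlined in the abstract: weight each node pair by the number of motif instances containing both nodes, then optimize weighted modularity Q_w = (1/2W) Σ_ij [w_ij - s_i s_j / (2W)] δ(c_i, c_j)^[18], where s_i is the strength of node i and W the total edge weight. The Python sketch below assumes the triangle motif and shows why factor 2) matters: a node that belongs to no triangle gets zero strength, so Q_w cannot tell which community it belongs to. The optional `alpha` term, which adds the ordinary adjacency back with a hypothetical mixing weight, is only one illustrative way to reintroduce lower-order edges and is not claimed to be the MMHCD operator.

```python
from collections import Counter
from itertools import combinations


def motif_weights(edges, alpha=0.0):
    """w_ij = number of triangles containing edge (i, j), plus alpha * A_ij (hypothetical mixing).

    Keys are node pairs (i, j) with i < j; pairs with zero weight are absent.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    w = Counter()
    for u in adj:
        for v, x in combinations(sorted(n for n in adj[u] if n > u), 2):
            if x in adj[v]:  # u < v < x form a triangle
                for pair in ((u, v), (u, x), (v, x)):
                    w[pair] += 1
    if alpha:
        for u, v in edges:
            w[(min(u, v), max(u, v))] += alpha
    return w


def weighted_modularity(w, comm):
    """Q_w = sum_c [W_c / W - (S_c / 2W)^2]; W_c = weight inside c, S_c = total strength of c."""
    total = sum(w.values())
    inner, strength = Counter(), Counter()
    for (i, j), wij in w.items():
        strength[comm[i]] += wij
        strength[comm[j]] += wij
        if comm[i] == comm[j]:
            inner[comm[i]] += wij
    return sum(inner[c] / total - (s / (2 * total)) ** 2 for c, s in strength.items())


# Hypothetical toy graph: two 4-cliques joined by edge 3-4, plus node 8 hanging off node 0.
EDGES = list(combinations(range(4), 2)) + list(combinations(range(4, 8), 2)) + [(3, 4), (0, 8)]
a8 = {n: "A" if n < 4 or n == 8 else "B" for n in range(9)}  # node 8 with its neighbour 0
b8 = {n: "A" if n < 4 else "B" for n in range(9)}            # node 8 in the other community
w0 = motif_weights(EDGES)           # triangles only: node 8 gets zero strength
w1 = motif_weights(EDGES, alpha=1)  # triangles plus ordinary edges
print(weighted_modularity(w0, a8), weighted_modularity(w0, b8))  # 0.5 0.5
print(round(weighted_modularity(w1, a8), 4), round(weighted_modularity(w1, b8), 4))  # 0.4733 0.4474
```

With triangle weights alone, both placements of node 8 score exactly 0.5, so any optimizer is free to leave it stranded; once the ordinary edges are added back, keeping node 8 with its only neighbour scores higher.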

5. Conclusion

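The community detection performance of MMHCD was tested on typical synthetic networks and on 10 real-world networks of different sizes and characteristics, and compared with typical motif-based higher-order community detection algorithms, namely Motif-SC, LinLog-Motif, EdMot, and MWLP. The experimental results show that, compared with these algorithms, MMHCD obtains more accurate and more stable detection results on GN and LFR networks, and also achieves relatively good partition quality and stability on real-world networks. In addition, MMHCD does not need to know the true number of communities in advance and requires relatively few control parameters.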

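This study improves the quality of motif-based network community detection, broadens the application of motif theory, and helps deepen the understanding of network structure and function. Future work will extend the method to directed and signed networks, so that it can meet the needs of different types of network applications and obtain high-quality motif-based community structures.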

References

[1]	NEWMAN M E J. Detecting community structure in networks[J]. The European Physical Journal B, 2004, 38(2): 321-330.
[2]	BEDI P, SHARMA C. Community detection in social networks[J]. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 2016, 6(3): 115-135.
[3]	FORTUNATO S, HRIC D. Community detection in networks: A user guide[J]. Physics Reports, 2016, 659: 1-44.
[4]	LUO J W, WU J, YANG W Y. A relationship matrix resolving model for identifying vital nodes based on community in opportunistic social networks[J]. Transactions on Emerging Telecommunications Technologies, 2021, 33(1): e4389.
[5]	ZHU H, JIN W, ZHOU J, et al. Nodal memberships to communities of functional brain networks reveal functional flexibility and individualized connectome[J]. Cerebral Cortex, 2021, 31(11): 5090-5106.
[6]	PENG X L, SMALL M, XU X J. Temporal prediction of epidemic patterns in community networks[J]. New Journal of Physics, 2013, 15(11): 113033.
[7]	JAVED M, YOUNIS M, LATIF S, et al. Community detection in networks: A multidisciplinary review[J]. Journal of Network and Computer Applications, 2018, 108: 87-111.
[8]	MITTAL R, BHATIA M. Classification and comparative evaluation of community detection algorithms[J]. Archives of Computational Methods in Engineering, 2021, 28(3): 1417-1428.
[9]	BENSON A R, GLEICH D F, LESKOVEC J. Higher-order organization of complex networks[J]. Science, 2016, 353(6295): 163-166.
[10]	MILO R, SHEN-ORR S, ITZKOVITZ S, et al. Network motifs: Simple building blocks of complex networks[J]. Science, 2002, 298(5594): 824-827.
[11]	HUANG J Y, HOU Y, LI Y S. Efficient community detection algorithm based on higher-order structures in complex networks[J]. Chaos, 2020, 30(2): 023114.
[12]	TSOURAKAKIS C E, PACHOCKI J, MITZENMACHER M. Scalable motif-aware graph clustering[C]//Proc of the 26th Int Conf on World Wide Web. New York: ACM, 2017: 1451-1460.
[13]	LIM S, LEE J G. Motif-based embedding for graph clustering[J]. Journal of Statistical Mechanics: Theory and Experiment, 2016, 12(12): 123401.
[14]	LI P Z, HUANG L, WANG C D, et al. EdMot: An edge enhancement approach for motif-aware community detection[C]//Proc of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. New York: ACM, 2019: 479-487.
[15]	LI P Z, HUANG L, WANG C D, et al. Community detection by motif-aware label propagation[J]. ACM Trans on Knowledge Discovery from Data, 2020, 14(2): 1-19.
[16]	CAO J, BU Z, GAO G, et al. Weighted modularity optimization for crisp and fuzzy community detection in large-scale networks[J]. Physica A: Statistical Mechanics and Its Applications, 2016, 462: 386-395.
[17]	HAQ N F, MORADI M, WANG Z J, et al. Community structure detection from networks with weighted modularity[J]. Pattern Recognition Letters, 2019, 122: 14-22.
[18]	NEWMAN M E J. Analysis of weighted networks[J]. Physical Review E, 2004, 70(5): 056131.
[19]	PIZZUTI C. Evolutionary computation for community detection in networks: A review[J]. IEEE Transactions on Evolutionary Computation, 2017, 22(3): 464-483.
[20]	ATTEA B A, ABBOOD A D, HASAN A A, et al. A review of heuristics and metaheuristics for community detection in complex networks: Current usage, emerging development and future directions[J]. Swarm and Evolutionary Computation, 2021, 63: 100885.
[21]	SONG A, LI M B, DING X H, et al. Community detection using discrete bat algorithm[J]. IAENG International Journal of Computer Science, 2016, 43(1): 37-43.
[22]	LIU Q, ZHOU B, LI S D, et al. Community detection utilizing a novel multi-swarm fruit fly optimization algorithm with hill-climbing strategy[J]. Arabian Journal for Science and Engineering, 2015, 41(3): 807-826.
[23]	ZHOU X J, YANG K, XIE Y F, et al. A novel modularity-based discrete state transition algorithm for community detection in networks[J]. Neurocomputing, 2019, 334: 89-99.
[24]	PATTANAYAK H S, SANGAL A L, VERMA H K. Community detection in social networks based on fire propagation[J]. Swarm and Evolutionary Computation, 2019, 44: 31-48.
[25]	CHENG M Y, PRAYOGO D. Symbiotic organisms search: A new metaheuristic optimization algorithm[J]. Computers & Structures, 2014, 139(7): 98-112.
[26]	AL-SHARHAN S, OMRAN M G H. An enhanced symbiosis organisms search algorithm: An empirical study[J]. Neural Computing & Applications, 2016, 29(11): 1025-1043.
[27]	AYALA H, KLEIN C, MARIANI V, et al. Multi-Objective symbiotic search algorithm approaches for electromagnetic optimization[J]. IEEE Trans on Magnetics, 2017, 53(6): 7205504.
[28]	YU V F, REDI P, RUSKARTINA E, et al. Symbiotic organisms search and two solution representations for solving the capacitated vehicle routing problem[J]. Applied Soft Computing, 2017, 52: 657-672.
[29]	JIA G B, CAI Z X, MUSOLESI M, et al. Community detection in social and biological networks using differential evolution[C]//Proc of the Int Conf on Learning and Intelligent Optimization. Berlin: Springer, 2012: 71-85.
[30]	KUNEGIS J. Konect: The koblenz network collection [EB/OL]. [2022-04-05]. http://konect.cc/networks.
[31]	ROSSI R A, AHMED N K. An interactive scientific network data repository[EB/OL]. [2022-04-05]. https://networkrepository.com/index.php.
[32]	LANCICHINETTI A, FORTUNATO S, RADICCHI F. Benchmark graphs for testing community detection algorithms[J]. Physical Review E, 2008, 78(4): 046110.
[33]	DERRAC J, GARCÍA S, MOLINA D, et al. A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms[J]. Swarm and Evolutionary Computation, 2011, 1(1): 3-18.

Dataset	Nodes	Edges	Average degree	Network type
Karate	34	78	4.69	Social
Macaque	47	505	13.32	Biological
Dolphins	62	159	5.13	Social
Polbooks	105	441	8.40	Social
Football	115	613	10.66	Social
Email	1133	5451	4.81	Social
Cora	2708	5429	3.90	Citation
Facebook	2888	2981	2.06	Social
PowerGrid	4941	6594	2.67	Engineering
PGP	10680	24316	4.55	Communication

Dataset	Motif	MWLP	EdMot	LinLog-Motif	Motif-SC	Motif-DECD	MMHCD
Karate	M₁	0.362(1.8×10⁻²)	0.456(2.63×10⁻²)	0.484(1×10⁰)	0.484(1×10⁰)	0.484(1×10⁰)	0.484(1×10⁰)
Macaque	M₁	0.061(1.53×10⁻²)	0.258(1×10⁰)	0.256(1.19×10⁻⁴)	0.244(1×10⁰)	0.259(2.1×10⁻³)	0.265(1×10⁰)
Dolphins	M₁	0.478(1.3×10⁻²)	0.637(5.98×10⁻³)	0.641(1×10⁰)	0.637(1×10⁰)	0.647(1×10⁰)	0.647(1×10⁰)
Polbooks	M₁	0.323(1.6×10⁻²)	0.544(5.74×10⁻⁴)	0.546(1×10⁰)	0.544(1×10⁰)	0.545(5×10⁻³)	0.548(1×10⁰)
Football	M₁	0.388(1.5×10⁻²)	0.847(4.87×10⁻³)	0.853(1×10⁰)	0.852(1×10⁰)	0.853(1×10⁰)	0.853(1×10⁰)
Email	M₁	0.423(4×10⁻⁴)	0.673(5.11×10⁻³)	0.655(6.66×10⁻⁴)	0.677(3×10⁻⁴)	0.658(1.9×10⁻³)	0.673(1.1×10⁻³)
Cora	M₁	0.638(8.18×10⁻³)	0.881(2.46×10⁻²)	0.879(5.02×10⁻³)	0.863(2.15×10⁻⁴)	0.816(6.5×10⁻²)	0.890(2.34×10⁻³)
Facebook	M₁	0.439(1.5×10⁻²)	0.386(1×10⁰)	0.391(1×10⁰)	0.235(1×10⁻³)	0.421(1.4×10⁻²)	0.477(1×10⁰)
PowerGrid	M₁	0.727(1.7×10⁻²)	0.908(2.94×10⁻³)	0.920(1×10⁰)	0.619(1.5×10⁻²)	0.817(1.3×10⁻²)	0.928(1×10⁰)
PGP	M₁	0.433(2.53×10⁻²)	0.803(1.17×10⁻³)	0.755(1×10⁰)	0.748(4.68×10⁻³)	0.767(2×10⁻³)	0.798(1×10⁰)
Friedman Rank		6	4	3	5	2	1
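The Friedman Rank row can be checked directly against the table. The Python sketch below ranks the six algorithms on each network (rank 1 for the highest value, tied values sharing the mean of their ranks), averages the ranks over the ten networks, and orders the averages. It assumes that the leading number in each cell is the score being ranked and ignores the parenthesized values, whose meaning this excerpt does not state; it computes only the average ranks used by the Friedman test^[33], not the test statistic or a p-value. The names `ALGS`, `SCORES`, and `average_ranks` are hypothetical.

```python
ALGS = ["MWLP", "EdMot", "LinLog-Motif", "Motif-SC", "Motif-DECD", "MMHCD"]
SCORES = {  # leading value of each cell in the table above (motif M1)
    "Karate":    [0.362, 0.456, 0.484, 0.484, 0.484, 0.484],
    "Macaque":   [0.061, 0.258, 0.256, 0.244, 0.259, 0.265],
    "Dolphins":  [0.478, 0.637, 0.641, 0.637, 0.647, 0.647],
    "Polbooks":  [0.323, 0.544, 0.546, 0.544, 0.545, 0.548],
    "Football":  [0.388, 0.847, 0.853, 0.852, 0.853, 0.853],
    "Email":     [0.423, 0.673, 0.655, 0.677, 0.658, 0.673],
    "Cora":      [0.638, 0.881, 0.879, 0.863, 0.816, 0.890],
    "Facebook":  [0.439, 0.386, 0.391, 0.235, 0.421, 0.477],
    "PowerGrid": [0.727, 0.908, 0.920, 0.619, 0.817, 0.928],
    "PGP":       [0.433, 0.803, 0.755, 0.748, 0.767, 0.798],
}


def average_ranks(scores):
    """Per-dataset ranks (1 = highest score, ties share the mean rank), averaged over datasets."""
    n_alg = len(next(iter(scores.values())))
    totals = [0.0] * n_alg
    for row in scores.values():
        order = sorted(range(n_alg), key=lambda k: -row[k])
        pos = 0
        while pos < n_alg:
            end = pos
            while end + 1 < n_alg and row[order[end + 1]] == row[order[pos]]:
                end += 1
            for k in order[pos:end + 1]:
                totals[k] += (pos + end) / 2 + 1  # mean of ranks pos+1 .. end+1
            pos = end + 1
    return [t / len(scores) for t in totals]


avg = average_ranks(SCORES)
final = [0] * len(avg)
for place, k in enumerate(sorted(range(len(avg)), key=lambda k: avg[k]), start=1):
    final[k] = place
for name, a, r in zip(ALGS, avg, final):
    print(f"{name:13s} average rank {a:.2f} -> {r}")
# final ranks: [6, 4, 3, 5, 2, 1], matching the Friedman Rank row
```

The average ranks come out as MMHCD 1.55, Motif-DECD 3.00, LinLog-Motif 3.15, EdMot 3.55, Motif-SC 4.25, and MWLP 5.50, which reproduces the reported ordering.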


Corresponding author: CHEN Bin, bchen63@163.com
