Q-Learning Based Distributed Adaptive Algorithm for Topological Stability

HUANG Qing-dong; SHI Bin-yu; GUO Min-peng; YUAN Run-zhi; CHEN Chen

doi:10.12178/1001-0548.2019076

To address the impact of mobile nodes on network topology stability, an adaptive distributed reinforcement learning algorithm is proposed to predict stable connections between adjacent nodes. Each node combines reinforcement learning with adaptive partitioning of learning intervals, uses the received signal strength between adjacent nodes to determine their connection state, and finally predicts the set of neighbor nodes that can maintain a stable connection. Simulations with the random walk model under various conditions show a prediction accuracy of about 95%, which verifies the effectiveness and stability of the algorithm.


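A mobile ad hoc network (MANET) is a complex distributed system composed of mobile nodes. The nodes can freely and dynamically self-organize into temporary network topologies to transmit the information each node collects. MANETs are characterized by limited storage resources, limited processing capability, and high mobility. Nodes can join or leave the network dynamically, which causes frequent and unpredictable topology changes, increases the complexity of network tasks, and degrades communication quality. Because the network topology changes continuously [1-2], wireless links break frequently in high-speed mobile environments, and keeping communication links alive becomes a major challenge. Selecting nodes with stable links for transmission while information is exchanged over a temporary topology is therefore important for link continuity.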

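Currently, the most effective way to improve network performance is to use the mobility characteristics of nodes to predict link stability and the network topology. Ref. [3] proposed an adaptive neuro-fuzzy system that predicts node trajectories and selects relay nodes according to the predicted trajectories. Ref. [1] collects the received signal strength indication (RSSI) of nodes and trains a deep learning model on it to predict node trajectories. Refs. [4-5] use deep learning or machine learning to predict node positions or link quality and then select the shortest reliable path for transmission. Ref. [6] proposed selecting stable paths based on received signal strength: links are divided into strong and weak links according to the average received signal strength over a period of time, and only links within a set threshold are used for routing.

These algorithms take different approaches, but each has limitations. Most existing link-stability prediction algorithms consider only the relative mobility of nodes, or collect motion parameters over only a certain period; such parameters cannot reflect changes in node mobility in time, and the combined effect of these factors on link stability is ignored. Predicting future node mobility usually requires large amounts of measurement data and control information, which create heavy overhead, cause network congestion, and degrade network performance. Moreover, these methods assume that node mobility characteristics stay constant during prediction, whereas in real networks they change in real time, so the algorithms do not adapt well to environmental changes. This paper therefore proposes a distributed adaptive topology stability method based on reinforcement learning: by adaptively learning the received signal strength of each neighbor, every node obtains a basis for judging future link stability and topology, which improves network performance.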
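For comparison, the strong/weak link classification attributed to Ref. [6] above can be sketched as a windowed RSSI average compared against a threshold. This is a minimal illustration, not the method of Ref. [6]: the window length, the reuse of the −77 dBm threshold from the simulation table, and the names `classify_links` and `stable_route_candidates` are hypothetical.

```python
from statistics import mean


def classify_links(rssi_history, threshold_dbm=-77.0, window=5):
    """Label each neighbor link 'strong' or 'weak' by its windowed mean RSSI.

    rssi_history maps a neighbor id to its RSSI samples in dBm, oldest first.
    Only the last `window` samples are averaged; neighbors with no samples
    are skipped. Threshold and window are hypothetical defaults.
    """
    labels = {}
    for neighbor, samples in rssi_history.items():
        recent = samples[-window:]
        if not recent:
            continue
        labels[neighbor] = "strong" if mean(recent) >= threshold_dbm else "weak"
    return labels


def stable_route_candidates(rssi_history, threshold_dbm=-77.0, window=5):
    """Return the neighbors whose link is classified as strong, sorted by id."""
    labels = classify_links(rssi_history, threshold_dbm, window)
    return sorted(n for n, label in labels.items() if label == "strong")
```

With `history = {"B": [-60, -62, -65, -70, -72], "C": [-80, -78, -79, -81, -76]}`, neighbor B averages −65.8 dBm and is labeled strong, while C averages −78.8 dBm and is labeled weak. Because the label depends only on a sliding average, it lags behind sudden changes in a node's speed or direction, which is the kind of limitation raised above.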

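This paper combines received signal strength with reinforcement learning. Each distributed node performs reinforcement learning on the signal strength values of its neighbors and adaptively partitions the interval boundaries for graded processing, forming a direct-decision interval and an adaptive reinforcement-learning interval; node connection states under different conditions are thus judged in grades, and learning is updated in real time. Through continual learning, each node obtains an optimal connection policy table and uses its values to predict the connection status of its neighbors in the next state, which addresses the combined influence of multiple factors on link stability.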
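The graded processing described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the boundary values, the step size, the rule in `adapt` that moves the boundaries, and the names `IntervalClassifier` and `predict_stable_neighbors` are all hypothetical. RSSI values well above or below the connection threshold are decided directly, and only values between the two boundaries are handed to the learned policy table.

```python
from dataclasses import dataclass

CONNECTED, DISCONNECTED = "connected", "disconnected"


@dataclass
class IntervalClassifier:
    """Graded decision on one neighbor link from its RSSI in dBm.

    RSSI >= upper  -> direct decision: CONNECTED
    RSSI <= lower  -> direct decision: DISCONNECTED
    otherwise      -> learning interval, defer to the learned policy
    The default boundaries and step are hypothetical values.
    """
    lower: float = -85.0
    upper: float = -69.0
    step: float = 1.0

    def direct_decision(self, rssi):
        if rssi >= self.upper:
            return CONNECTED
        if rssi <= self.lower:
            return DISCONNECTED
        return None  # inside the reinforcement-learning interval

    def adapt(self, predicted, actual):
        """Placeholder rule: widen the learning interval after a wrong direct decision."""
        if predicted == actual:
            return
        if predicted == CONNECTED:
            self.upper += self.step
        elif predicted == DISCONNECTED:
            self.lower -= self.step


def predict_stable_neighbors(rssi_by_neighbor, classifier, learned_policy):
    """Return the set of neighbors predicted to stay connected at the next step.

    learned_policy maps a neighbor id to the state predicted by its learned
    policy table; neighbors in the learning interval without an entry are
    conservatively treated as DISCONNECTED.
    """
    stable = set()
    for neighbor, rssi in rssi_by_neighbor.items():
        decision = classifier.direct_decision(rssi)
        if decision is None:
            decision = learned_policy.get(neighbor, DISCONNECTED)
        if decision == CONNECTED:
            stable.add(neighbor)
    return stable
```

For example, with RSSI readings `{"A": -60, "B": -90, "C": -77, "D": -75}` and a learned policy that predicts only C as connected, the sketch returns `{"A", "C"}`: A is decided directly, B is excluded directly, and C and D fall into the learning interval. Splitting the decision this way keeps the learning effort on the ambiguous links near the threshold.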

4. Conclusion

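By studying the impact of mobile nodes on MANET topology, this paper proposes a distributed adaptive algorithm based on reinforcement learning. Each node learns the mobility characteristics of other nodes to obtain the set of neighbors that will remain stably connected at the next transmission time, and these stable-connection sets are used to predict the stable connectivity among mobile nodes, so the algorithm adapts better to topology changes. Stable topological connectivity in a MANET greatly improves route selection and also raises the quality of service of network communication. Experimental results show that the Q-learning based distributed adaptive topology stability algorithm is efficient, stable, and accurate, and effectively selects stable topological connections.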

References

[1]	YAYEH Y, LIN H, BERIE G, et al. Mobility prediction in mobile ad-hoc network using deep learning[C]//The 2018 IEEE International Conference on Applied System Invention. Japan: IEEE, 2018: 1203-1206.
[2]	MAYADUNNA H, SILVA S L D, WEDAGE I, et al. Improving trusted routing by identifying malicious nodes in a MANET using reinforcement learning[C]//The 2017 Seventeenth International Conference on Advances in ICT for Emerging Regions. Colombo, Sri Lanka: IEEE, 2017: 1-8.
[3]	ELLEUCH M, KAANICHE H, AYADI M. Exploiting neuro-fuzzy system for mobility prediction in wireless ad-hoc networks[C]//International Work-Conference on Artificial Neural Networks. Palma de Mallorca, Spain: Springer, 2015: 536-548.
[4]	KAANICHE H, KAMOUN F. Mobility prediction in wireless ad hoc networks using neural networks[J]. Computer Science, 2010, 8(1): 95-97.
[5]	LIU L, CHENG Y, CAI L, et al. Deep learning based optimization in wireless network[C]//The 2017 IEEE International Conference on Communications. Paris, France: IEEE, 2017: 1-6.
[6]	XIA Hui, JIA Zhi-ping, ZHANG Zhi-yong, et al. A link stability prediction-based multicast routing protocol in mobile Ad Hoc networks[J]. Chinese Journal of Computers, 2013, 36(5): 926-936.
[7]	LIANG Zhi-wei, ZHU Song-hao. Walking parameters training algorithm of humanoid robot based on reinforcement learning[J]. Computer Engineering, 2012, 38(8): 13-15.
[8]	MAMMERI Z. Reinforcement learning based routing in networks: review and classification of approaches[J]. IEEE Access, 2019, 7: 55916-55950.
[9]	KAVALEROV M, LIKHACHEVA Y, SHILOVA Y. A reinforcement learning approach to network routing based on adaptive learning rates and route memory[C]//SoutheastCon 2017. [S.l.]: IEEE, 2017: 1-6.
[10]	ALHARBI A, ALDHALAAN A, ALRODHAAN M. A mobile ad hoc network Q-routing algorithm: Self-aware approach[J]. International Journal of Computer Applications, 2015, 127(7): 1-6.
[11]	CAMP T, BOLENG J, DAVIES V. A survey of mobility models for ad hoc network research[J]. Wireless Communications and Mobile Computing, 2002, 2(5): 483-502.
[12]	XIONG Hao. Radio propagation[M]. Beijing: Publishing House of Electronics Industry, 2000.

$R_{{s_i} \to {{s'}_i}}^{{a_j}}$	${a_1}$	${a_2}$
${s_1} \to {s_1}$	+1	−1
${s_1} \to {s_2}$	−5	+1
${s_2} \to {s_1}$	+1	−1
${s_2} \to {s_2}$	−5	+1
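In the table above, action $a_1$ earns +1 when the next state is $s_1$ and −5 when it is $s_2$, while $a_2$ earns +1 when the next state is $s_2$ and −1 when it is $s_1$. The sketch below plugs this reward table into a standard tabular Q-learning update, $Q(s,a) \leftarrow Q(s,a) + \alpha\,[r + \gamma \max_{a'} Q(s',a') - Q(s,a)]$. It is an illustration, not the authors' code: the learning rate, discount factor, ε-greedy exploration, the function names, and the reading of $s_1$/$s_2$ as "connected"/"disconnected" with $a_1$/$a_2$ as predicting those states are assumptions.

```python
import random

STATES = ("s1", "s2")
ACTIONS = ("a1", "a2")

# Reward R[(s, s_next)][a], copied from the reward table.
REWARD = {
    ("s1", "s1"): {"a1": 1, "a2": -1},
    ("s1", "s2"): {"a1": -5, "a2": 1},
    ("s2", "s1"): {"a1": 1, "a2": -1},
    ("s2", "s2"): {"a1": -5, "a2": 1},
}


def q_update(q, s, a, s_next, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step; alpha and gamma are hypothetical defaults."""
    r = REWARD[(s, s_next)][a]
    target = r + gamma * max(q[(s_next, b)] for b in ACTIONS)
    q[(s, a)] += alpha * (target - q[(s, a)])
    return q[(s, a)]


def greedy_action(q, s):
    """Action with the largest Q-value in state s (ties go to a1)."""
    return max(ACTIONS, key=lambda a: q[(s, a)])


def train(state_sequence, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Learn a Q-table from an observed sequence of link states."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    for s, s_next in zip(state_sequence, state_sequence[1:]):
        # Epsilon-greedy: mostly exploit, occasionally try the other action.
        a = rng.choice(ACTIONS) if rng.random() < epsilon else greedy_action(q, s)
        q_update(q, s, a, s_next, alpha, gamma)
    return q
```

Training on a link that stays in $s_1$ drives the greedy choice in $s_1$ to $a_1$, and a link that stays in $s_2$ drives the choice in $s_2$ to $a_2$. Under the assumed interpretation, the −5 entry makes a wrong "stays connected" prediction five times as costly as the opposite error, which biases the learned policy toward conservative link selection.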

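Parameter	Value
Mobility model	RWM (random walk model)
Simulation area/m²	150×150
Number of nodes	15
Node movement speed/m·s⁻¹	[0, 10]
Node pause time range/s	[0, 10]
Node movement direction range/rad	[0, 2π]
Maximum communication range/m	70
Simulation time/s	1 000
Data sampling interval/s	1
Connection threshold signal strength/dBm	−77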
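The parameters above can drive a simple random walk simulation that produces per-pair RSSI traces. The sketch below is a minimal version under several assumptions not stated in the text: nodes reflect off the area borders, each movement leg lasts a hypothetical `LEG_TIME` of 10 s before a random pause, and RSSI follows a log-distance path-loss model with a hypothetical exponent `PATH_LOSS_EXP`, calibrated so that the −77 dBm threshold coincides with the 70 m maximum range. The names `RandomWalkNode`, `simulate`, and `rssi_dbm` are illustrative.

```python
import math
import random

# Values from the simulation parameter table.
AREA = 150.0           # side of the square simulation area, m
N_NODES = 15
SPEED = (0.0, 10.0)    # node speed range, m/s
PAUSE = (0.0, 10.0)    # pause time range, s
MAX_RANGE = 70.0       # maximum communication distance, m
DT = 1.0               # sampling interval, s
THRESHOLD_DBM = -77.0  # connection threshold signal strength, dBm

# Hypothetical values, not given in the text.
LEG_TIME = 10.0        # duration of each movement leg, s
PATH_LOSS_EXP = 3.0    # log-distance path-loss exponent


def rssi_dbm(distance):
    """Log-distance RSSI calibrated so that rssi_dbm(MAX_RANGE) == THRESHOLD_DBM."""
    d = max(distance, 1.0)  # avoid log10(0) for co-located nodes
    return THRESHOLD_DBM - 10.0 * PATH_LOSS_EXP * math.log10(d / MAX_RANGE)


class RandomWalkNode:
    """Random walk with pauses inside the square area; borders reflect."""

    def __init__(self, rng):
        self.rng = rng
        self.x = rng.uniform(0.0, AREA)
        self.y = rng.uniform(0.0, AREA)
        self._start_leg()

    def _start_leg(self):
        self.speed = self.rng.uniform(*SPEED)
        self.angle = self.rng.uniform(0.0, 2.0 * math.pi)
        self.move_left = LEG_TIME
        self.pause_left = 0.0

    def _move(self, dt):
        self.x += self.speed * math.cos(self.angle) * dt
        self.y += self.speed * math.sin(self.angle) * dt
        # One reflection suffices because a step covers at most 10 m.
        if self.x < 0.0 or self.x > AREA:
            self.x = -self.x if self.x < 0.0 else 2.0 * AREA - self.x
            self.angle = math.pi - self.angle
        if self.y < 0.0 or self.y > AREA:
            self.y = -self.y if self.y < 0.0 else 2.0 * AREA - self.y
            self.angle = -self.angle

    def step(self):
        """Advance the node by one sampling interval DT."""
        if self.pause_left > 0.0:
            self.pause_left -= DT
            if self.pause_left <= 0.0:
                self._start_leg()
            return
        self._move(DT)
        self.move_left -= DT
        if self.move_left <= 0.0:
            self.pause_left = self.rng.uniform(*PAUSE)
            if self.pause_left <= 0.0:
                self._start_leg()


def simulate(duration=1000.0, seed=1):
    """Return one dict per sample: {(i, j): RSSI in dBm} for pairs within MAX_RANGE."""
    rng = random.Random(seed)
    nodes = [RandomWalkNode(rng) for _ in range(N_NODES)]
    samples = []
    for _ in range(int(duration / DT)):
        for node in nodes:
            node.step()
        links = {}
        for i in range(N_NODES):
            for j in range(i + 1, N_NODES):
                d = math.dist((nodes[i].x, nodes[i].y), (nodes[j].x, nodes[j].y))
                if d <= MAX_RANGE:
                    links[(i, j)] = rssi_dbm(d)
        samples.append(links)
    return samples
```

With its defaults, `simulate()` returns 1 000 samples, matching the 1 000 s simulation time and 1 s sampling interval in the table. The per-pair RSSI sequences it produces are the kind of input each node would learn from; swapping in a different propagation model only requires changing `rssi_dbm`.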

Corresponding author: CHEN Bin, bchen63@163.com