基于强化学习的多阶段网络分组路由方法

高远翔; 罗龙; 孙罡

doi:10.12178/1001-0548.2021260

基于强化学习的多阶段网络分组路由方法

Packet Routing Method for Multi-Stage Networks Based on Reinforcement Learning

摘要

摘要: 多阶段网络被广泛应用于机器学习集群，由于多阶段网络中可用路径多，分组的路由是一个组合优化难题。现有基于启发式的路由算法由于缺乏性能保证，严重影响分组传输延迟。提出了基于强化学习的多阶段网络分组路由方法，使用一个新颖的策略迭代算法，通过学习的方式计算出最佳路由策略。算法通过在策略评估步骤中使用价值函数的最大似然估计器，克服了强化学习方法中蒙泰卡罗(MC)或时间差分(TD)价值估计器样本效率低的问题。为了应对组合优化时计算复杂度高的问题，算法在策略改进步骤中将组合动作空间上的优化分解为各组成动作的序列优化，以提高求解效率。基于NS-3网络模拟器的仿真实验结果表明，相较于现有最优的启发式路由策略，该算法学习到的路由策略降低了13.9%的平均分组延迟。

Abstract: Multi-stage networks are widely used in machine learning clusters. Due to the large number of available paths in a multi-stage network, packet routing is a combinatorial optimization problem. Existing routing algorithms based on heuristics lack performance guarantee, which seriously affects the packet transmission delay. This paper proposes a packet routing method based on reinforcement learning for multi-stage networks, using a novel policy iteration algorithm to compute an optimal routing policy by learning. In the policy evaluation step, this algorithm uses the maximum likelihood estimator of the value function, which overcomes the low sample efficiency problem of Monte Carlo (MC) or Temporal-Difference (TD) value function estimators in reinforcement learning. To deal with the high computational complexity of the combinatorial optimization problem in the policy improvement step, this algorithm decomposes the optimization over a combinatorial action space into a sequential optimization of each action. Experiments based on NS-3 network simulator show that the routing policy learnt by the algorithm reduces 13.9% of the average packet transmission delay compared to existing best routing heuristics.

HTML全文

参考文献(18)

施引文献

资源附件(0)

基于强化学习的多阶段网络分组路由方法

Packet Routing Method for Multi-Stage Networks Based on Reinforcement Learning

期刊在线

编辑办公

友情链接