Dynamic stepwise reinforcement learning path planning algorithm based on suboptimal policy

Abstract: Reinforcement learning enables an agent to perform path planning in an unknown environment: using information obtained from its interaction with the environment, the agent autonomously adjusts its policy and finds the optimal path. However, most reinforcement-learning-based path planning tasks suffer from sparse rewards, meaning that external rewards and effective training samples are difficult to obtain, which makes the algorithm iterate slowly and can even prevent it from converging. To address this, this paper proposes a dynamic stepwise reinforcement learning path planning algorithm based on a suboptimal policy. The algorithm introduces the suboptimal policy into the reinforcement learning framework through a dynamic stepwise method, and designs an intrinsic reward that encourages the agent to explore new policies better than the suboptimal one. Experimental results show that, compared with baseline algorithms, the proposed algorithm performs better: the agent obtains higher rewards and its policy converges faster.
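
The paper's implementation is not given on this page; the following Python sketch only illustrates, under our own assumptions, the two mechanisms the abstract names: a "dynamic stepwise" rollout that follows the given suboptimal policy for the first k steps of each episode (with k shrinking over training), and an intrinsic bonus paid only when the agent's episode return beats the suboptimal policy's. CorridorEnv, suboptimal_policy, the k decay schedule, bonus_scale, and the suboptimal_return estimate are all hypothetical names and values, not the authors'.

    import random

    class CorridorEnv:
        """Toy 1-D corridor: start at 0, sparse +1 reward at cell n,
        small step cost so shorter paths yield higher returns.
        (Illustrative stand-in for the paper's planning environment.)"""
        def __init__(self, n=10, max_steps=50):
            self.n, self.max_steps, self.pos = n, max_steps, 0

        def reset(self):
            self.pos = 0
            return self.pos

        def step(self, action):  # action: +1 (right) or -1 (left)
            self.pos = max(0, min(self.n, self.pos + action))
            done = self.pos == self.n
            return self.pos, (1.0 if done else -0.01), done

    def suboptimal_policy(state):
        """A weak reference policy: mostly moves right, sometimes wanders."""
        return 1 if random.random() < 0.7 else -1

    def run_episode(env, learned_policy, k, suboptimal_return, bonus_scale=1.0):
        """Dynamic stepwise rollout: guided by the suboptimal policy for the
        first k steps, then controlled by the learned policy."""
        state, total = env.reset(), 0.0
        for step in range(env.max_steps):
            action = suboptimal_policy(state) if step < k else learned_policy(state)
            state, reward, done = env.step(action)
            total += reward
            if done:
                break
        # Intrinsic bonus only when the agent outperforms the reference return,
        # pushing exploration toward policies strictly better than the guide.
        return total + bonus_scale * max(0.0, total - suboptimal_return)

    env = CorridorEnv()
    for episode in range(100):
        k = max(0, 20 - episode // 5)  # illustrative decay: guidance fades out
        shaped = run_episode(env, lambda s: 1, k, suboptimal_return=0.7)

The decaying k is one plausible reading of "dynamic stepwise": early episodes lean on the suboptimal policy to reach reward-bearing states despite sparse external rewards, while later episodes hand control to the learned policy.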

     
