Abstract:
Reinforcement learning enables an agent to complete path planning tasks in unknown environments: the agent uses information gathered through interaction with the environment to autonomously adjust its policy and find the optimal path. However, most path planning tasks suffer from sparse rewards. In such sparse-reward settings, external rewards and valid training data are difficult to obtain, which slows iteration and can even prevent the algorithm from converging. To address this, we propose a dynamic stepwise reinforcement learning path planning algorithm based on a suboptimal policy. The algorithm introduces the suboptimal policy into the reinforcement learning framework through a dynamic stepwise method and designs an intrinsic reward that encourages the agent to explore policies better than the suboptimal one. Experimental results show that, compared with the baseline algorithms, our algorithm achieves better performance, obtaining higher rewards and converging faster.