A Particle Swarm Optimization Algorithm Based on Deep Deterministic Policy Gradient
-
摘要: 在传统的粒子群优化算法(PSO)中,所有粒子都遵循最初设定的一些参数进行自我探索,这种方案容易导致过早成熟,且易被困于局部最优点。针对以上问题,该文提出了一种基于深度确定性策略梯度的粒子群优化算法(DDPGPSO),通过构造神经网络分别实现了动作函数和动作价值函数,且利用神经网络可以动态地生成算法运行所需要的参数,降低了人工配置算法的难度。实验表明DDPGPSO相比9种同类算法在收敛速度和寻优精度上均有较大的提升。Abstract: In the traditional particle swarm optimization (PSO) algorithm, all particles follow some initial parameters to explore themselves. This scheme is easy to lead to premature maturity, and easy to be trapped in the local optimum. To solve the above problems, a particle swarm optimization algorithm based on deep deterministic policy gradient (DDPGPSO) is proposed. The action function and action value function are realized by constructing neural network. The parameters required by the algorithm can be generated dynamically by using the neural network, which reduces the difficulty of manual configuration of the algorithm. The experimental results show that DDPGPSO has a great improvement in convergence speed and optimization accuracy compared with nine similar algorithms.
-
-
表 1 动作网络结构
层名称 输出维度 输入 输入层 6(状态向量) L0层 24 输入层 L1层 16 L0层 输出层 3(动作向量) L1层 表 2 动作价值网络结构
层名称 输出维度 输入 输入层1 6(状态向量) 输入层2 3(动作向量) 拼接层 9 输入层1, 2 L0层 24 拼接层 L1层 16 L0层 输出层 1(动作价值) L1层 表 3 算法参数设置
算法 参数设置 DDPGPSO 无需参数设置,惯性因子搜索范围0.1~0.9;加速因子
搜索范围0.5~2.5CPSOS 惯性因子初值0.9,末值0.4;全局加速因子初值0.5,
末值2.5;自身加速因子初值2.5,末值0.5;PSO 惯性因子w = 0.5;加速因子c1 = 1, c2 = 2 WOA 定义螺旋形状的常数b = 1 MFO 定义螺旋形状的常数b = 1 CUCKOO 布谷鸟蛋发现概率pa = 0.25, 步长控制量α = 1 BAT 响度A=0.5, 脉冲发射率r=0.5, 最小频率Qmin= 0,
最大频率Qmax= 2BFA 探索步长1 MVO 虫洞存在率最大值WEPmax= 1和最小值WEPmin= 0.2,
开发准确率p= 0.6CFA 伸展度常数: r1 = 2,r2 =−1, 可见度常数:v1 =−1.5,v2 = 1.5 表 4 基准测试函数
名称 测试函数 公式(目标均为最小化) 初始范围 维数 F1 Ackley $- 20\exp \left( - 0.2\sqrt {\dfrac{1}{D}\displaystyle\sum\limits_{i = 1}^D {x_i^2} } \right) - \exp \left(\dfrac{1}{D}\displaystyle\sum\limits_{i = 1}^D {\cos (2{\text π} {x_i})} \right) + 20 + \exp (1)$ [−100, 100] 100 F2 Griewank $\displaystyle\sum\limits_{i = 1}^D {\dfrac{ {x_i^2} }{ {4\;000} } - \prod\limits_{i = 1}^D {\cos \left(\dfrac{ { {x_i} } }{ {\sqrt i } }\right) + 1} }$ [−600, 600] 100 F3 Quartic $\displaystyle\sum\limits_{i = 1}^D {ix_i^4 + {\rm{random}}[0,1)} $ [−1.28,1.28] 100 F4 Rastrigrin $\displaystyle\sum\limits_{i = 1}^D {[x_i^2 - 10\cos (2{\text π} {x_i}) + 10]}$ [−5.12,5.12] 100 F5 RosenBrock $\displaystyle\sum\limits_{i = 1}^D {[100{{(x_i^2 - {x_{i + 1}})}^2} + {{({x_i} - 1)}^2}]} $ [−30, 30] 100 F6 Schwefel $\min (|{x_i}|,1 \leqslant i \leqslant D)$ [−10, 10] 100 F7 Sphere $\displaystyle\sum\limits_{i = 1}^D {x_i^2} $ [−100, 100] 100 F8 Step $\displaystyle\sum\limits_{i = 1}^D {{{({x_i} + 0.5)}^2}} $ [−100, 100] 100 表 5 实验结果对比
函数 数据名 DDPGPSO BAT BFA CPSOS CUCKOO FA MFO MVO PSO WOA F1 mean 2.35×10−10 2.00×10 2.11×10 4.81×10−5 2.07×10 2.12×10 2.03×10 1.45×10 2.34×10−2 1.77×10 best 1.78×10−10 2.00×10 2.06×10 3.28×10−7 2.01×10 2.08×10 2.00×10 1.44×10 5.87×10−6 1.73×10 std 3.84×10−11 1.45×10−4 1.77×10−1 7.74×10−5 2.20×10−1 1.16×10−1 9.87×10−2 2.00×10−2 4.53×10−2 4.57×10−1 F2 mean 5.55×10−19 1.52×103 2.33×103 1.24×10−6 6.49×103 2.99×103 8.09×103 1.12×102 1.21×10−1 4.89×10 best 0.00 1.52×103 1.37×103 1.97×10−10 2.36×103 2.45×103 2.44×103 1.11×102 2.67×10−7 4.73×10 std 2.42×10−18 2.13 6.70×102 6.43×10−6 1.08×103 2.66×102 2.29×102 2.11×10−1 2.79×10−1 1.35 F3 mean 5.17×10−1 2.38×103 2.42×103 5.15×10−1 7.03×103 2.95×103 1.19×104 1.53×10 5.29×10−1 5.78 best 6.79×10−4 1.80×103 1.37×103 1.23×10−3 1.78×103 1.33×103 1.82×103 1.48×10 3.93×10−3 5.26 std 2.85×10−1 2.38×102 4.17×102 2.81×10−1 1.52×103 4.76×102 4.91×102 2.87×10−1 2.75×10−1 3.36×10−1 F4 mean 0.00 1.58×103 1.75×103 1.93×10−7 2.37×103 1.85×103 2.74×103 9.72×102 1.29×10−1 9.80×102 best 0.00 1.47×103 1.42×103 2.31×10−11 1.59×103 1.52×103 1.66×103 9.67×102 6.28×10−9 9.63×102 std 0.00 4.45×10 1.27×102 6.00×10−7 1.96×102 1.04×102 5.47×10 2.05 4.55×10−1 2.35×10 F5 mean 1.00×102 4.54×108 1.11×109 9.99×10 5.17×109 1.62×109 7.17×109 8.55×106 1.32×102 3.77×106 best 9.99×10 4.27×108 4.38×108 9.98×10 1.18×109 1.14×109 1.16×109 8.50×106 9.99×10 3.65×106 std 2.24×10−3 3.27×107 5.28×108 4.16×10−2 1.04×109 2.25×108 2.55×108 2.55×104 1.73×102 7.84×104 F6 mean 5.19×10−9 8.21 9.76 3.00×10−3 9.97 9.93 1.00×10 8.32 3.15×10−1 8.72 best 3.09×10−9 7.89 8.88 1.35×10−3 9.52 9.41 9.62 8.30 7.09×10−2 8.59 std 1.66×10−9 1.89×10−1 1.90×10−1 1.41×10−3 1.06×10−1 9.70×10−2 0.00 8.67×10−3 1.57×10−1 1.62×10−1 F7 mean 2.66×10−18 1.51×105 2.60×105 1.95×10−6 7.08×105 3.34×105 9.02×105 1.32×104 1.84×10−1 4.56×103 best 1.28×10−18 1.49×105 1.56×105 1.66×10−8 2.74×105 2.66×105 2.70×105 1.32×104 6.14×10−8 4.45×103 std 1.16×10−18 1.54×103 7.01×104 2.66×10−6 1.26×105 3.09×104 2.39×104 2.39×10 8.43×10−1 9.98×10 F8 mean 2.44×10 1.74×105 2.60×105 2.33×10 7.13×105 3.30×105 8.99×105 1.10×104 4.91×10 4.53×103 best 2.36×10 1.73×105 1.50×105 2.16×10 2.66×105 2.72×105 2.65×105 1.09×104 2.34×10 4.29×103 std 4.36×10−2 1.53×103 7.25×104 4.57×10−1 1.08×105 2.98×104 3.27×104 2.17×10 2.69×10 2.49×102 表 6 不同迭代次数下DDPGPSO排名
迭代次数 平均排名 排名第一百分比/% 100 1.375 75 500 1.625 62.5 1000 1.75 62.5 -
[1] KENNEDY J, EBERHART R. Particle swarm optimization[C]//International Conference on Neural Networks. Perth, Australia: IEEE, 1995: 1942-1948.
[2] MIRJALILI S, LEWIS A. The whale optimization algorithm[J]. Advances in Engineering Software, 2016, 95: 51-67. DOI: 10.1016/j.advengsoft.2016.01.008
[3] MIRJALILI S, MIRJALILI S M, LEWIS A. Grey wolf optimizer[J]. Advances in Engineering Software, 2014, 69: 46-61. DOI: 10.1016/j.advengsoft.2013.12.007
[4] YANG X S. A new metaheuristic bat-inspired algorithm[J]. Computer Knowledge & Technology, 2010, 284: 65-74.
[5] ZHAO Fu-qing, QIN Shuo, YANG Guo-qiang, et al. A factorial based particle swarm optimization with a population adaptation mechanism for the no-wait flow shop scheduling problem with the makespan objective[J]. Expert Systems with Application, 2019, 126: 41-53. DOI: 10.1016/j.eswa.2019.01.084
[6] SANG Jin-guo, CUI Huan-qiang. Energy saving schedule of mine drainage system based on particle swarm optimization[J]. Journal of Physics: Conference Series, 2019, DOI: 10.1088/1742-6596/1168/2/022001.
[7] CUI Zhi-hua, ZHANG Jiang-jiang, WU Di, et al. Hybrid many-objective particle swarm optimization algorithm for green coal production problem[J]. Information Sciences, 2020, 518: 256-271. DOI: 10.1016/j.ins.2020.01.018
[8] WU Yu-qiang, WU Yong-gang, LIU Xing-long. Couple-based particle swarm optimization for short-term hydrothermal scheduling[J]. Applied Soft Computing, 2019, 74: 440-450. DOI: 10.1016/j.asoc.2018.10.041
[9] WANG Yu, LI Bin, THOMAS W, et al. Self-adaptive learning based particle swarm optimization[J]. Information Sciences, 2011, 181(20): 4515-4538. DOI: 10.1016/j.ins.2010.07.013
[10] SHI Y. A modified particle swarm optimizer[C]//1998 IEEE World Congress on Computational Intelligence. [S.l.]: IEEE, 1998: 69-73.
[11] CLERC M, KENNEDY J. The particle swarm - explosion, stability, and convergence in a multidimensional complex space[J]. IEEE Transactions on Evolutionary Computation, 2002, 6(1): 58-73. DOI: 10.1109/4235.985692
[12] RATNAWEERA A, HALGAMUGE S K, WATSON H C. Self-organizing hierarchical particle swarm optimizer with time-varying acceleration coefficients[J]. IEEE Transactions on Evolutionary Computation, 2004, 8(3): 240-255. DOI: 10.1109/TEVC.2004.826071
[13] ZHAN Zhi-hui, ZHANG Jun, LI Yun, et al. Adaptive particle swarm optimization[J]. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 2009, 39(6): 1362-1381. DOI: 10.1109/TSMCB.2009.2015956
[14] XU Gui-ping, CUI Quan-long, SHI Xiao-hu, et al. Particle swarm optimization based on dimensional learning strategy[J]. Swarm and Evolutionary Computation, 2019, 45: 33-51. DOI: 10.1016/j.swevo.2018.12.009
[15] ANG K M, LIM W H, ISA N A M, et al. A constrained multi-swarm particle swarm optimization without velocity for constrained optimization problems[J]. Expert Systems with Applications, 2020, DOI: 10.1016/j.eswa.2019.112882.
[16] ZHANG Xu-wei, LIU Hao, ZHANG Tong, et al. Terminal crossover and steering-based particle swarm optimization algorithm with disturbance[J]. Applied Soft Computing, 2019, DOI: 10.1016/j.asoc.2019.105841.
[17] WANG Sheng-liang, LIU Gen-you, GAO Ming, et al. Heterogeneous comprehensive learning and dynamic multi-swarm particle swarm optimizer with two mutation operators[J]. Information Sciences, 2020, DOI: 10.1016/j.ins.2020.06.027.
[18] LILLICRAP T P, HUNT J J, PRITZEL A, et al. Continuous control with deep reinforcement learning[J]. Computer Science, 2016, 8(6): A187.
[19] SILVER D, LEVER G, HEESS N, et al. Deterministic policy gradient algorithms[C]//Proceedings of the International Conference on Machine Learning. [S.l.]: ACM, 2014: 387-395.
[20] TIAN Dong-qiang, ZHAO Xiao-fei, SHI Zhong-zhi. Chaotic particle swarm optimization with sigmoid-based acceleration coefficients for numerical function optimization[J]. Swarm and Evolutionary Computation, 2019, 51: 100573. DOI: 10.1016/j.swevo.2019.100573
[21] 张强, 郭玉洁, 王颖, 等. 一种离散鲸鱼算法及其应用[J]. 电子科技大学学报, 2020, 49(4): 622-630. DOI: 10.12178/1001-0548.2019116 ZHANG Qiang, GUO Yu-jie, WANG Ying, et al. A discrete whale optimization algorithm and application[J]. Journal of the University of Electronic Science and Technology of China, 2020, 49(4): 622-630. DOI: 10.12178/1001-0548.2019116