基于深度确定性策略梯度的粒子群算法

鲁华祥, 尹世远, 龚国良, 刘毅, 陈刚

鲁华祥, 尹世远, 龚国良, 刘毅, 陈刚. 基于深度确定性策略梯度的粒子群算法[J]. 电子科技大学学报, 2021, 50(2): 199-206. DOI: 10.12178/1001-0548.2020420
引用本文: 鲁华祥, 尹世远, 龚国良, 刘毅, 陈刚. 基于深度确定性策略梯度的粒子群算法[J]. 电子科技大学学报, 2021, 50(2): 199-206. DOI: 10.12178/1001-0548.2020420
LU Hua-xiang, YIN Shi-yuan, GONG Guo-liang, LIU Yi, CHENG Gang. A Particle Swarm Optimization Algorithm Based on Deep Deterministic Policy Gradient[J]. Journal of University of Electronic Science and Technology of China, 2021, 50(2): 199-206. DOI: 10.12178/1001-0548.2020420
Citation: LU Hua-xiang, YIN Shi-yuan, GONG Guo-liang, LIU Yi, CHENG Gang. A Particle Swarm Optimization Algorithm Based on Deep Deterministic Policy Gradient[J]. Journal of University of Electronic Science and Technology of China, 2021, 50(2): 199-206. DOI: 10.12178/1001-0548.2020420

基于深度确定性策略梯度的粒子群算法

基金项目: 国家自然科学基金(U19A2080, U1936106);北京市科技计划(Z181100001518006);高技术项目(31513070501, 1916312ZD00902201, XDA27040303)
详细信息
    作者简介:

    鲁华祥(1965-),男,博士,研究员,博士生导师,主要从事神经计算芯片、类脑神经计算技术和应用系统方面的研究

    通讯作者:

    龚国良,E-mail:gongmianjie@semi.ac.cn

  • 中图分类号: TP18

A Particle Swarm Optimization Algorithm Based on Deep Deterministic Policy Gradient

  • 摘要: 在传统的粒子群优化算法(PSO)中,所有粒子都遵循最初设定的一些参数进行自我探索,这种方案容易导致过早成熟,且易被困于局部最优点。针对以上问题,该文提出了一种基于深度确定性策略梯度的粒子群优化算法(DDPGPSO),通过构造神经网络分别实现了动作函数和动作价值函数,且利用神经网络可以动态地生成算法运行所需要的参数,降低了人工配置算法的难度。实验表明DDPGPSO相比9种同类算法在收敛速度和寻优精度上均有较大的提升。
    Abstract: In the traditional particle swarm optimization (PSO) algorithm, all particles follow some initial parameters to explore themselves. This scheme is easy to lead to premature maturity, and easy to be trapped in the local optimum. To solve the above problems, a particle swarm optimization algorithm based on deep deterministic policy gradient (DDPGPSO) is proposed. The action function and action value function are realized by constructing neural network. The parameters required by the algorithm can be generated dynamically by using the neural network, which reduces the difficulty of manual configuration of the algorithm. The experimental results show that DDPGPSO has a great improvement in convergence speed and optimization accuracy compared with nine similar algorithms.
  • 图  1   F1迭代图

    图  2   F2迭代图

    图  3   F3迭代图

    图  4   F4迭代图

    图  5   F5迭代图

    图  6   F6迭代图

    图  7   F7迭代图

    图  8   F8迭代图

    表  1   动作网络结构

    层名称输出维度输入
    输入层6(状态向量)
    L0层24输入层
    L1层16L0层
    输出层3(动作向量)L1层
    下载: 导出CSV

    表  2   动作价值网络结构

    层名称输出维度输入
    输入层16(状态向量)
    输入层23(动作向量)
    拼接层9输入层1, 2
    L0层24拼接层
    L1层16L0层
    输出层1(动作价值)L1层
    下载: 导出CSV

    表  3   算法参数设置

    算法参数设置
    DDPGPSO无需参数设置,惯性因子搜索范围0.1~0.9;加速因子
    搜索范围0.5~2.5
    CPSOS惯性因子初值0.9,末值0.4;全局加速因子初值0.5,
    末值2.5;自身加速因子初值2.5,末值0.5;
    PSO惯性因子w = 0.5;加速因子c1 = 1, c2 = 2
    WOA定义螺旋形状的常数b = 1
    MFO定义螺旋形状的常数b = 1
    CUCKOO布谷鸟蛋发现概率pa = 0.25, 步长控制量α = 1
    BAT响度A=0.5, 脉冲发射率r=0.5, 最小频率Qmin= 0,
    最大频率Qmax= 2
    BFA探索步长1
    MVO虫洞存在率最大值WEPmax= 1和最小值WEPmin= 0.2,
    开发准确率p= 0.6
    CFA伸展度常数: r1 = 2,r2 =−1, 可见度常数:v1 =−1.5,v2 = 1.5
    下载: 导出CSV

    表  4   基准测试函数

    名称测试函数公式(目标均为最小化)初始范围维数
    F1Ackley$- 20\exp \left( - 0.2\sqrt {\dfrac{1}{D}\displaystyle\sum\limits_{i = 1}^D {x_i^2} } \right) - \exp \left(\dfrac{1}{D}\displaystyle\sum\limits_{i = 1}^D {\cos (2{\text π} {x_i})} \right) + 20 + \exp (1)$[−100, 100]100
    F2Griewank$\displaystyle\sum\limits_{i = 1}^D {\dfrac{ {x_i^2} }{ {4\;000} } - \prod\limits_{i = 1}^D {\cos \left(\dfrac{ { {x_i} } }{ {\sqrt i } }\right) + 1} }$[−600, 600]100
    F3Quartic$\displaystyle\sum\limits_{i = 1}^D {ix_i^4 + {\rm{random}}[0,1)} $[−1.28,1.28]100
    F4Rastrigrin$\displaystyle\sum\limits_{i = 1}^D {[x_i^2 - 10\cos (2{\text π} {x_i}) + 10]}$[−5.12,5.12]100
    F5RosenBrock$\displaystyle\sum\limits_{i = 1}^D {[100{{(x_i^2 - {x_{i + 1}})}^2} + {{({x_i} - 1)}^2}]} $[−30, 30]100
    F6Schwefel$\min (|{x_i}|,1 \leqslant i \leqslant D)$[−10, 10]100
    F7Sphere$\displaystyle\sum\limits_{i = 1}^D {x_i^2} $[−100, 100]100
    F8Step$\displaystyle\sum\limits_{i = 1}^D {{{({x_i} + 0.5)}^2}} $[−100, 100]100
    下载: 导出CSV

    表  5   实验结果对比

    函数数据名DDPGPSOBATBFACPSOSCUCKOOFAMFOMVOPSOWOA
    F1mean2.35×10−102.00×102.11×104.81×10−52.07×102.12×102.03×101.45×102.34×10−21.77×10
    best1.78×10−102.00×102.06×103.28×10−72.01×102.08×102.00×101.44×105.87×10−61.73×10
    std3.84×10−111.45×10−41.77×10−17.74×10−52.20×10−11.16×10−19.87×10−22.00×10−24.53×10−24.57×10−1
    F2mean5.55×10−191.52×1032.33×1031.24×10−66.49×1032.99×1038.09×1031.12×1021.21×10−14.89×10
    best0.00 1.52×1031.37×1031.97×10−102.36×1032.45×1032.44×1031.11×1022.67×10−74.73×10
    std2.42×10−182.13 6.70×1026.43×10−61.08×1032.66×1022.29×1022.11×10−12.79×10−11.35
    F3mean5.17×10−12.38×1032.42×1035.15×10−17.03×1032.95×1031.19×1041.53×105.29×10−15.78
    best6.79×10−41.80×1031.37×1031.23×10−31.78×1031.33×1031.82×1031.48×103.93×10−35.26
    std2.85×10−12.38×1024.17×1022.81×10−11.52×1034.76×1024.91×1022.87×10−12.75×10−13.36×10−1
    F4mean0.00 1.58×1031.75×1031.93×10−72.37×1031.85×1032.74×1039.72×1021.29×10−19.80×102
    best0.00 1.47×1031.42×1032.31×10−111.59×1031.52×1031.66×1039.67×1026.28×10−99.63×102
    std0.004.45×101.27×1026.00×10−71.96×1021.04×1025.47×102.054.55×10−12.35×10
    F5mean1.00×1024.54×1081.11×1099.99×105.17×1091.62×1097.17×1098.55×1061.32×1023.77×106
    best9.99×104.27×1084.38×1089.98×101.18×1091.14×1091.16×1098.50×1069.99×103.65×106
    std2.24×10−33.27×1075.28×1084.16×10−21.04×1092.25×1082.55×1082.55×1041.73×1027.84×104
    F6mean5.19×10−98.21 9.76 3.00×10−39.97 9.93 1.00×108.32 3.15×10−18.72
    best3.09×10−97.89 8.88 1.35×10−39.52 9.41 9.62 8.30 7.09×10−28.59
    std1.66×10−91.89×10−11.90×10−11.41×10−31.06×10−19.70×10−20.00 8.67×10−31.57×10−11.62×10−1
    F7mean2.66×10−181.51×1052.60×1051.95×10−67.08×1053.34×1059.02×1051.32×1041.84×10−14.56×103
    best1.28×10−181.49×1051.56×1051.66×10−82.74×1052.66×1052.70×1051.32×1046.14×10−84.45×103
    std1.16×10−181.54×1037.01×1042.66×10−61.26×1053.09×1042.39×1042.39×108.43×10−19.98×10
    F8mean2.44×101.74×1052.60×1052.33×107.13×1053.30×1058.99×1051.10×1044.91×104.53×103
    best2.36×101.73×1051.50×1052.16×102.66×1052.72×1052.65×1051.09×1042.34×104.29×103
    std4.36×10−21.53×1037.25×1044.57×10−11.08×1052.98×1043.27×1042.17×102.69×102.49×102
    下载: 导出CSV

    表  6   不同迭代次数下DDPGPSO排名

    迭代次数平均排名排名第一百分比/%
    1001.37575
    5001.62562.5
    10001.7562.5
    下载: 导出CSV
  • [1]

    KENNEDY J, EBERHART R. Particle swarm optimization[C]//International Conference on Neural Networks. Perth, Australia: IEEE, 1995: 1942-1948.

    [2]

    MIRJALILI S, LEWIS A. The whale optimization algorithm[J]. Advances in Engineering Software, 2016, 95: 51-67. DOI: 10.1016/j.advengsoft.2016.01.008

    [3]

    MIRJALILI S, MIRJALILI S M, LEWIS A. Grey wolf optimizer[J]. Advances in Engineering Software, 2014, 69: 46-61. DOI: 10.1016/j.advengsoft.2013.12.007

    [4]

    YANG X S. A new metaheuristic bat-inspired algorithm[J]. Computer Knowledge & Technology, 2010, 284: 65-74.

    [5]

    ZHAO Fu-qing, QIN Shuo, YANG Guo-qiang, et al. A factorial based particle swarm optimization with a population adaptation mechanism for the no-wait flow shop scheduling problem with the makespan objective[J]. Expert Systems with Application, 2019, 126: 41-53. DOI: 10.1016/j.eswa.2019.01.084

    [6]

    SANG Jin-guo, CUI Huan-qiang. Energy saving schedule of mine drainage system based on particle swarm optimization[J]. Journal of Physics: Conference Series, 2019, DOI: 10.1088/1742-6596/1168/2/022001.

    [7]

    CUI Zhi-hua, ZHANG Jiang-jiang, WU Di, et al. Hybrid many-objective particle swarm optimization algorithm for green coal production problem[J]. Information Sciences, 2020, 518: 256-271. DOI: 10.1016/j.ins.2020.01.018

    [8]

    WU Yu-qiang, WU Yong-gang, LIU Xing-long. Couple-based particle swarm optimization for short-term hydrothermal scheduling[J]. Applied Soft Computing, 2019, 74: 440-450. DOI: 10.1016/j.asoc.2018.10.041

    [9]

    WANG Yu, LI Bin, THOMAS W, et al. Self-adaptive learning based particle swarm optimization[J]. Information Sciences, 2011, 181(20): 4515-4538. DOI: 10.1016/j.ins.2010.07.013

    [10]

    SHI Y. A modified particle swarm optimizer[C]//1998 IEEE World Congress on Computational Intelligence. [S.l.]: IEEE, 1998: 69-73.

    [11]

    CLERC M, KENNEDY J. The particle swarm - explosion, stability, and convergence in a multidimensional complex space[J]. IEEE Transactions on Evolutionary Computation, 2002, 6(1): 58-73. DOI: 10.1109/4235.985692

    [12]

    RATNAWEERA A, HALGAMUGE S K, WATSON H C. Self-organizing hierarchical particle swarm optimizer with time-varying acceleration coefficients[J]. IEEE Transactions on Evolutionary Computation, 2004, 8(3): 240-255. DOI: 10.1109/TEVC.2004.826071

    [13]

    ZHAN Zhi-hui, ZHANG Jun, LI Yun, et al. Adaptive particle swarm optimization[J]. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 2009, 39(6): 1362-1381. DOI: 10.1109/TSMCB.2009.2015956

    [14]

    XU Gui-ping, CUI Quan-long, SHI Xiao-hu, et al. Particle swarm optimization based on dimensional learning strategy[J]. Swarm and Evolutionary Computation, 2019, 45: 33-51. DOI: 10.1016/j.swevo.2018.12.009

    [15]

    ANG K M, LIM W H, ISA N A M, et al. A constrained multi-swarm particle swarm optimization without velocity for constrained optimization problems[J]. Expert Systems with Applications, 2020, DOI: 10.1016/j.eswa.2019.112882.

    [16]

    ZHANG Xu-wei, LIU Hao, ZHANG Tong, et al. Terminal crossover and steering-based particle swarm optimization algorithm with disturbance[J]. Applied Soft Computing, 2019, DOI: 10.1016/j.asoc.2019.105841.

    [17]

    WANG Sheng-liang, LIU Gen-you, GAO Ming, et al. Heterogeneous comprehensive learning and dynamic multi-swarm particle swarm optimizer with two mutation operators[J]. Information Sciences, 2020, DOI: 10.1016/j.ins.2020.06.027.

    [18]

    LILLICRAP T P, HUNT J J, PRITZEL A, et al. Continuous control with deep reinforcement learning[J]. Computer Science, 2016, 8(6): A187.

    [19]

    SILVER D, LEVER G, HEESS N, et al. Deterministic policy gradient algorithms[C]//Proceedings of the International Conference on Machine Learning. [S.l.]: ACM, 2014: 387-395.

    [20]

    TIAN Dong-qiang, ZHAO Xiao-fei, SHI Zhong-zhi. Chaotic particle swarm optimization with sigmoid-based acceleration coefficients for numerical function optimization[J]. Swarm and Evolutionary Computation, 2019, 51: 100573. DOI: 10.1016/j.swevo.2019.100573

    [21] 张强, 郭玉洁, 王颖, 等. 一种离散鲸鱼算法及其应用[J]. 电子科技大学学报, 2020, 49(4): 622-630. DOI: 10.12178/1001-0548.2019116

    ZHANG Qiang, GUO Yu-jie, WANG Ying, et al. A discrete whale optimization algorithm and application[J]. Journal of the University of Electronic Science and Technology of China, 2020, 49(4): 622-630. DOI: 10.12178/1001-0548.2019116

图(8)  /  表(6)
计量
  • 文章访问数:  7415
  • HTML全文浏览量:  3066
  • PDF下载量:  132
  • 被引次数: 0
出版历程
  • 收稿日期:  2020-11-20
  • 修回日期:  2020-12-22
  • 网络出版日期:  2021-01-12
  • 刊出日期:  2021-03-21

目录

    /

    返回文章
    返回