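A Particle Swarm Optimization Algorithm Based on Deep Deterministic Policy Gradient

LU Hua-xiang; YIN Shi-yuan; GONG Guo-liang; LIU Yi; CHEN Gang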

doi:10.12178/1001-0548.2020420


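LU Hua-xiang^{1, 2, 3, 4},
YIN Shi-yuan^{1},
GONG Guo-liang^{1},
LIU Yi^{1},
CHEN Gang^{1}

1. Institute of Semiconductors, Chinese Academy of Sciences, Haidian, Beijing 100083
2. School of Microelectronics, University of Chinese Academy of Sciences, Haidian, Beijing 100089
3. Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Songjiang, Shanghai 200031
4. Beijing Key Laboratory of Semiconductor Neural Network Intelligent Sensing and Computing Technology, Haidian, Beijing 100083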

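Funding: National Natural Science Foundation of China (U19A2080, U1936106); Beijing Municipal Science and Technology Program (Z181100001518006); High-Technology Projects (31513070501, 1916312ZD00902201, XDA27040303)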


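About the author:
LU Hua-xiang (born 1965), male, PhD, research professor and doctoral supervisor. His research covers neural computing chips, brain-inspired neural computing technology, and application systems.

Corresponding author:
GONG Guo-liang, E-mail: gongmianjie@semi.ac.cn

CLC number: TP18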
Publication history
- Received: 2020-11-20
- Revised: 2020-12-22
- Published online: 2021-01-12
- Issue date: 2021-03-21


Abstract

Abstract: In the traditional particle swarm optimization (PSO) algorithm, every particle explores the search space using a fixed set of parameters chosen at initialization. This scheme tends to cause premature convergence and leaves the swarm easily trapped in local optima. To address these problems, this paper proposes a particle swarm optimization algorithm based on deep deterministic policy gradient (DDPGPSO). Neural networks are constructed to implement the action function and the action-value function, and these networks dynamically generate the parameters the algorithm needs at run time, which reduces the difficulty of configuring the algorithm by hand. Experiments show that DDPGPSO achieves considerably better convergence speed and optimization accuracy than nine comparable algorithms.

Keywords:
- adaptive inertia weight
- convergence factor
- deep deterministic policy gradient
- reinforcement learning
- swarm intelligence
- particle swarm optimization
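The idea in the abstract, replacing fixed PSO parameters with values produced at run time, can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the standard velocity update with inertia weight `w`, a cognitive coefficient `c1`, and a social coefficient `c2`, and it uses a hypothetical `param_policy` callback where a trained action network would sit. The `fixed_policy` shown simply returns constants.

```python
import random


def sphere(x):
    """Sphere benchmark: sum of squares, minimum 0 at the origin."""
    return sum(v * v for v in x)


def fixed_policy(iteration, best_value):
    """Hypothetical stand-in for a learned policy: returns (w, c1, c2)."""
    return 0.5, 1.0, 2.0


def pso(objective, dim, bounds, param_policy, n_particles=20, n_iters=100, seed=0):
    """Standard PSO loop that asks param_policy for (w, c1, c2) every iteration."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    history = [gbest_val]

    for it in range(n_iters):
        # Parameters are queried per iteration instead of being fixed up front.
        w, c1, c2 = param_policy(it, gbest_val)
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp the new position to the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
        history.append(gbest_val)
    return gbest, gbest_val, history


best_x, best_val, history = pso(sphere, dim=5, bounds=(-100.0, 100.0),
                                param_policy=fixed_policy)
print(best_val)
```

Swapping `fixed_policy` for a function that maps a state description of the swarm to `(w, c1, c2)` is the only change needed to plug in a learned parameter generator.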

Figure 1 Iteration curve for F1

Figure 2 Iteration curve for F2

Figure 3 Iteration curve for F3

Figure 4 Iteration curve for F4

Figure 5 Iteration curve for F5

Figure 6 Iteration curve for F6

Figure 7 Iteration curve for F7

Figure 8 Iteration curve for F8

Table 1 Action network structure

Layer	Output dimension	Input
Input layer	6 (state vector)	-
L0	24	Input layer
L1	16	L0
Output layer	3 (action vector)	L1
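The layer sizes in Table 1 can be read as a small fully connected network. The sketch below follows only those sizes (6 → 24 → 16 → 3). Everything else is an assumption made for illustration: ReLU hidden activations, a sigmoid output, random untrained weights, the contents of the 6-dimensional state, and the interpretation of the three outputs as `(w, c1, c2)` rescaled to the DDPGPSO search ranges listed in Table 3.

```python
import math
import random

# Assumed output order (w, c1, c2); the ranges are the DDPGPSO search ranges from Table 3.
ACTION_RANGES = [(0.1, 0.9), (0.5, 2.5), (0.5, 2.5)]


def make_layer(n_in, n_out, rng):
    """Random weights and zero biases for one dense layer (illustrative init only)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, layer, activation):
    """Apply one fully connected layer followed by an element-wise activation."""
    weights, biases = layer
    return [activation(sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    # tanh form avoids overflow for large |z|.
    return 0.5 * (1.0 + math.tanh(0.5 * z))


def actor_forward(state, layers):
    """Map a 6-dim state to 3 PSO parameters inside ACTION_RANGES."""
    h = dense(state, layers[0], relu)       # L0: 24 units
    h = dense(h, layers[1], relu)           # L1: 16 units
    unit = dense(h, layers[2], sigmoid)     # output: 3 values in (0, 1)
    return [lo + u * (hi - lo) for u, (lo, hi) in zip(unit, ACTION_RANGES)]


rng = random.Random(0)
actor_layers = [make_layer(6, 24, rng), make_layer(24, 16, rng), make_layer(16, 3, rng)]
state = [0.0, 0.5, 1.0, -0.5, 0.2, 0.8]    # hypothetical state vector
w, c1, c2 = actor_forward(state, actor_layers)
print(w, c1, c2)
```

Bounding the output with a sigmoid and rescaling is one simple way to keep generated parameters inside a fixed search range; other squashing functions would serve equally well.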

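Table 2 Action-value network structure

Layer	Output dimension	Input
Input layer 1	6 (state vector)	-
Input layer 2	3 (action vector)	-
Concatenation layer	9	Input layers 1 and 2
L0	24	Concatenation layer
L1	16	L0
Output layer	1 (action value)	L1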
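Table 2 describes a network that concatenates the state and the action before the hidden layers. A minimal sketch of that shape (6 + 3 → 9 → 24 → 16 → 1) follows. The activations, the random untrained weights, and the example inputs are assumptions; `td_target` shows the usual one-step bootstrapped target used when training a DDPG critic, with a discount factor `gamma` that the source does not specify.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Random weights and zero biases for one dense layer (illustrative init only)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, layer, activation):
    """Apply one fully connected layer followed by an element-wise activation."""
    weights, biases = layer
    return [activation(sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def relu(z):
    return max(0.0, z)


def identity(z):
    return z


def critic_forward(state, action, layers):
    """Estimate Q(state, action) with the layer sizes from Table 2."""
    x = list(state) + list(action)          # concatenation layer: 6 + 3 = 9
    h = dense(x, layers[0], relu)           # L0: 24 units
    h = dense(h, layers[1], relu)           # L1: 16 units
    return dense(h, layers[2], identity)[0]  # single linear output


def td_target(reward, next_q, gamma=0.99):
    """One-step target r + gamma * Q'(s', a'); gamma is an assumed value."""
    return reward + gamma * next_q


rng = random.Random(1)
critic_layers = [make_layer(9, 24, rng), make_layer(24, 16, rng), make_layer(16, 1, rng)]
state = [0.0, 0.5, 1.0, -0.5, 0.2, 0.8]    # hypothetical state vector
action = [0.5, 1.0, 2.0]                   # hypothetical (w, c1, c2)
q_value = critic_forward(state, action, critic_layers)
print(q_value, td_target(1.0, q_value))
```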

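Table 3 Algorithm parameter settings

Algorithm	Parameter settings
DDPGPSO	No manual parameter settings required; inertia weight search range 0.1 to 0.9; acceleration coefficient search range 0.5 to 2.5
CPSOS	Inertia weight: initial 0.9, final 0.4; global acceleration coefficient: initial 0.5, final 2.5; self acceleration coefficient: initial 2.5, final 0.5
PSO	Inertia weight w = 0.5; acceleration coefficients c₁ = 1, c₂ = 2
WOA	Spiral shape constant b = 1
MFO	Spiral shape constant b = 1
CUCKOO	Egg discovery probability pa = 0.25; step-size control α = 1
BAT	Loudness A = 0.5; pulse emission rate r = 0.5; minimum frequency Qmin = 0; maximum frequency Qmax = 2
BFA	Exploration step size 1
MVO	Wormhole existence probability: maximum WEPmax = 1, minimum WEPmin = 0.2; exploitation accuracy p = 0.6
CFA	Stretch constants r1 = 2, r2 = −1; visibility constants v1 = −1.5, v2 = 1.5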
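Several baselines in Table 3 use parameters that move from an initial to a final value over the run, in contrast to DDPGPSO's generated parameters. The table gives only the endpoints, not the interpolation law, so the sketch below assumes a plain linear ramp purely for illustration; the function names are hypothetical and the endpoints are the CPSOS values from the table.

```python
def linear_schedule(start, end, iteration, max_iters):
    """Linearly interpolate from start (iteration 0) to end (last iteration)."""
    frac = iteration / max(1, max_iters - 1)
    return start + (end - start) * frac


def scheduled_params(iteration, max_iters):
    """Return (w, c_self, c_global) using the CPSOS endpoints from Table 3."""
    w = linear_schedule(0.9, 0.4, iteration, max_iters)
    c_self = linear_schedule(2.5, 0.5, iteration, max_iters)
    c_global = linear_schedule(0.5, 2.5, iteration, max_iters)
    return w, c_self, c_global


for it in (0, 50, 99):
    print(it, scheduled_params(it, max_iters=100))
```

Early iterations favor each particle's own experience (large self coefficient, large inertia), while late iterations favor the swarm's best position; a learned policy can instead pick these values from the current state of the search.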

Table 4 Benchmark test functions

Name	Test function	Formula (all objectives are minimized)	Initial range	Dimension
F1	Ackley	$-20\exp\left(-0.2\sqrt{\dfrac{1}{D}\sum_{i=1}^{D} x_i^2}\right) - \exp\left(\dfrac{1}{D}\sum_{i=1}^{D}\cos(2\pi x_i)\right) + 20 + \exp(1)$	[−100, 100]	100
F2	Griewank	$\sum_{i=1}^{D}\dfrac{x_i^2}{4000} - \prod_{i=1}^{D}\cos\left(\dfrac{x_i}{\sqrt{i}}\right) + 1$	[−600, 600]	100
F3	Quartic	$\sum_{i=1}^{D} i x_i^4 + \mathrm{random}[0,1)$	[−1.28, 1.28]	100
F4	Rastrigin	$\sum_{i=1}^{D}\left[x_i^2 - 10\cos(2\pi x_i) + 10\right]$	[−5.12, 5.12]	100
F5	Rosenbrock	$\sum_{i=1}^{D-1}\left[100(x_i^2 - x_{i+1})^2 + (x_i - 1)^2\right]$	[−30, 30]	100
F6	Schwefel	$\min(|x_i|,\ 1 \leqslant i \leqslant D)$	[−10, 10]	100
F7	Sphere	$\sum_{i=1}^{D} x_i^2$	[−100, 100]	100
F8	Step	$\sum_{i=1}^{D}(x_i + 0.5)^2$	[−100, 100]	100
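Six of the benchmark formulas in Table 4 translate directly into code. The sketch below implements them as written in the table, with two caveats: Rosenbrock is summed over i = 1..D−1 so that x_{i+1} is always defined, and Step follows the table's formula without any rounding. F3 (which adds random noise) and F6 are left out of this illustration.

```python
import math


def ackley(x):
    d = len(x)
    mean_sq = sum(v * v for v in x) / d
    mean_cos = sum(math.cos(2 * math.pi * v) for v in x) / d
    return -20 * math.exp(-0.2 * math.sqrt(mean_sq)) - math.exp(mean_cos) + 20 + math.e


def griewank(x):
    total = sum(v * v for v in x) / 4000
    product = math.prod(math.cos(v / math.sqrt(i)) for i, v in enumerate(x, start=1))
    return total - product + 1


def rastrigin(x):
    return sum(v * v - 10 * math.cos(2 * math.pi * v) + 10 for v in x)


def rosenbrock(x):
    # Summed over consecutive pairs, i.e. i = 1..D-1.
    return sum(100 * (x[i] ** 2 - x[i + 1]) ** 2 + (x[i] - 1) ** 2
               for i in range(len(x) - 1))


def sphere(x):
    return sum(v * v for v in x)


def step(x):
    # As written in Table 4 (no floor operation).
    return sum((v + 0.5) ** 2 for v in x)


D = 100  # dimension used in Table 4
zeros, ones, neg_half = [0.0] * D, [1.0] * D, [-0.5] * D
print(ackley(zeros), griewank(zeros), rastrigin(zeros))
print(rosenbrock(ones), sphere(zeros), step(neg_half))
```

Evaluating each function at its known minimizer is a quick sanity check before handing it to an optimizer.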

Table 5 Comparison of experimental results

Function	Statistic	DDPGPSO	BAT	BFA	CPSOS	CUCKOO	FA	MFO	MVO	PSO	WOA
F1	mean	2.35×10⁻¹⁰	2.00×10	2.11×10	4.81×10⁻⁵	2.07×10	2.12×10	2.03×10	1.45×10	2.34×10⁻²	1.77×10
	best	1.78×10⁻¹⁰	2.00×10	2.06×10	3.28×10⁻⁷	2.01×10	2.08×10	2.00×10	1.44×10	5.87×10⁻⁶	1.73×10
	std	3.84×10⁻¹¹	1.45×10⁻⁴	1.77×10⁻¹	7.74×10⁻⁵	2.20×10⁻¹	1.16×10⁻¹	9.87×10⁻²	2.00×10⁻²	4.53×10⁻²	4.57×10⁻¹
F2	mean	5.55×10⁻¹⁹	1.52×10³	2.33×10³	1.24×10⁻⁶	6.49×10³	2.99×10³	8.09×10³	1.12×10²	1.21×10⁻¹	4.89×10
	best	0.00	1.52×10³	1.37×10³	1.97×10⁻¹⁰	2.36×10³	2.45×10³	2.44×10³	1.11×10²	2.67×10⁻⁷	4.73×10
	std	2.42×10⁻¹⁸	2.13	6.70×10²	6.43×10⁻⁶	1.08×10³	2.66×10²	2.29×10²	2.11×10⁻¹	2.79×10⁻¹	1.35
F3	mean	5.17×10⁻¹	2.38×10³	2.42×10³	5.15×10⁻¹	7.03×10³	2.95×10³	1.19×10⁴	1.53×10	5.29×10⁻¹	5.78
	best	6.79×10⁻⁴	1.80×10³	1.37×10³	1.23×10⁻³	1.78×10³	1.33×10³	1.82×10³	1.48×10	3.93×10⁻³	5.26
	std	2.85×10⁻¹	2.38×10²	4.17×10²	2.81×10⁻¹	1.52×10³	4.76×10²	4.91×10²	2.87×10⁻¹	2.75×10⁻¹	3.36×10⁻¹
F4	mean	0.00	1.58×10³	1.75×10³	1.93×10⁻⁷	2.37×10³	1.85×10³	2.74×10³	9.72×10²	1.29×10⁻¹	9.80×10²
	best	0.00	1.47×10³	1.42×10³	2.31×10⁻¹¹	1.59×10³	1.52×10³	1.66×10³	9.67×10²	6.28×10⁻⁹	9.63×10²
	std	0.00	4.45×10	1.27×10²	6.00×10⁻⁷	1.96×10²	1.04×10²	5.47×10	2.05	4.55×10⁻¹	2.35×10
F5	mean	1.00×10²	4.54×10⁸	1.11×10⁹	9.99×10	5.17×10⁹	1.62×10⁹	7.17×10⁹	8.55×10⁶	1.32×10²	3.77×10⁶
	best	9.99×10	4.27×10⁸	4.38×10⁸	9.98×10	1.18×10⁹	1.14×10⁹	1.16×10⁹	8.50×10⁶	9.99×10	3.65×10⁶
	std	2.24×10⁻³	3.27×10⁷	5.28×10⁸	4.16×10⁻²	1.04×10⁹	2.25×10⁸	2.55×10⁸	2.55×10⁴	1.73×10²	7.84×10⁴
F6	mean	5.19×10⁻⁹	8.21	9.76	3.00×10⁻³	9.97	9.93	1.00×10	8.32	3.15×10⁻¹	8.72
	best	3.09×10⁻⁹	7.89	8.88	1.35×10⁻³	9.52	9.41	9.62	8.30	7.09×10⁻²	8.59
	std	1.66×10⁻⁹	1.89×10⁻¹	1.90×10⁻¹	1.41×10⁻³	1.06×10⁻¹	9.70×10⁻²	0.00	8.67×10⁻³	1.57×10⁻¹	1.62×10⁻¹
F7	mean	2.66×10⁻¹⁸	1.51×10⁵	2.60×10⁵	1.95×10⁻⁶	7.08×10⁵	3.34×10⁵	9.02×10⁵	1.32×10⁴	1.84×10⁻¹	4.56×10³
	best	1.28×10⁻¹⁸	1.49×10⁵	1.56×10⁵	1.66×10⁻⁸	2.74×10⁵	2.66×10⁵	2.70×10⁵	1.32×10⁴	6.14×10⁻⁸	4.45×10³
	std	1.16×10⁻¹⁸	1.54×10³	7.01×10⁴	2.66×10⁻⁶	1.26×10⁵	3.09×10⁴	2.39×10⁴	2.39×10	8.43×10⁻¹	9.98×10
F8	mean	2.44×10	1.74×10⁵	2.60×10⁵	2.33×10	7.13×10⁵	3.30×10⁵	8.99×10⁵	1.10×10⁴	4.91×10	4.53×10³
	best	2.36×10	1.73×10⁵	1.50×10⁵	2.16×10	2.66×10⁵	2.72×10⁵	2.65×10⁵	1.09×10⁴	2.34×10	4.29×10³
	std	4.36×10⁻²	1.53×10³	7.25×10⁴	4.57×10⁻¹	1.08×10⁵	2.98×10⁴	3.27×10⁴	2.17×10	2.69×10	2.49×10²
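The three statistics reported per function in Table 5 (mean, best, std) summarize the final objective values of repeated independent runs. A minimal sketch of that summary is shown below. The run values are made up for illustration, and the use of the sample standard deviation is an assumption, since the source does not say which estimator or how many runs were used.

```python
import statistics


def summarize(final_values):
    """Summarize final objective values from repeated runs (minimization)."""
    return {
        "mean": statistics.mean(final_values),
        "best": min(final_values),                 # smallest value is best
        "std": statistics.stdev(final_values),     # sample standard deviation (assumed)
    }


# Hypothetical final values from five runs of one algorithm on one function.
runs = [1.0, 2.0, 3.0, 2.0, 2.0]
summary = summarize(runs)
print({name: f"{value:.2e}" for name, value in summary.items()})
```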


Table 6 DDPGPSO ranking at different iteration counts

Iterations	Average rank	Ranked first/%
100	1.375	75
500	1.625	62.5
1000	1.75	62.5
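The two columns of Table 6 are simple aggregates over per-function ranks. The sketch below computes them from the "best" rows of Table 5, ranking DDPGPSO as one plus the number of algorithms with a strictly smaller value. The tie rule and the choice of the "best" statistic are assumptions: the source does not state how the ranks in Table 6 were obtained, although this particular choice happens to reproduce the 100-iteration row.

```python
ALGORITHMS = ["DDPGPSO", "BAT", "BFA", "CPSOS", "CUCKOO", "FA", "MFO", "MVO", "PSO", "WOA"]

# "best" rows transcribed from Table 5, one list per function, in ALGORITHMS order.
BEST = {
    "F1": [1.78e-10, 2.00e1, 2.06e1, 3.28e-7, 2.01e1, 2.08e1, 2.00e1, 1.44e1, 5.87e-6, 1.73e1],
    "F2": [0.00, 1.52e3, 1.37e3, 1.97e-10, 2.36e3, 2.45e3, 2.44e3, 1.11e2, 2.67e-7, 4.73e1],
    "F3": [6.79e-4, 1.80e3, 1.37e3, 1.23e-3, 1.78e3, 1.33e3, 1.82e3, 1.48e1, 3.93e-3, 5.26],
    "F4": [0.00, 1.47e3, 1.42e3, 2.31e-11, 1.59e3, 1.52e3, 1.66e3, 9.67e2, 6.28e-9, 9.63e2],
    "F5": [9.99e1, 4.27e8, 4.38e8, 9.98e1, 1.18e9, 1.14e9, 1.16e9, 8.50e6, 9.99e1, 3.65e6],
    "F6": [3.09e-9, 7.89, 8.88, 1.35e-3, 9.52, 9.41, 9.62, 8.30, 7.09e-2, 8.59],
    "F7": [1.28e-18, 1.49e5, 1.56e5, 1.66e-8, 2.74e5, 2.66e5, 2.70e5, 1.32e4, 6.14e-8, 4.45e3],
    "F8": [2.36e1, 1.73e5, 1.50e5, 2.16e1, 2.66e5, 2.72e5, 2.65e5, 1.09e4, 2.34e1, 4.29e3],
}


def rank_of(values, index):
    """Rank of values[index] for minimization: 1 + count of strictly smaller entries."""
    return 1 + sum(v < values[index] for v in values)


ddpg = ALGORITHMS.index("DDPGPSO")
ranks = [rank_of(row, ddpg) for row in BEST.values()]
average_rank = sum(ranks) / len(ranks)
first_place_pct = 100 * ranks.count(1) / len(ranks)
print(ranks, average_rank, first_place_pct)
# prints [1, 1, 1, 1, 2, 1, 1, 3] 1.375 75.0
```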

