Reinforcement Learning for Model Selection and Hyperparameter Optimization

WU Jia; CHEN Sen-peng; CHEN Xiu-yun; ZHOU Rui

doi:10.12178/1001-0548.2018279

Volume 49 Issue 2

Mar. 2020


Journal of University of Electronic Science and Technology of China, 2020, 49(2): 255-261

WU Jia, CHEN Sen-peng, CHEN Xiu-yun, ZHOU Rui. Reinforcement Learning for Model Selection and Hyperparameter Optimization[J]. Journal of University of Electronic Science and Technology of China, 2020, 49(2): 255-261. doi: 10.12178/1001-0548.2018279



School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu 610054

Received Date: 2018-10-31
Revised Date: 2019-09-04

Available Online: 2020-03-06

Publish Date: 2020-03-01

Abstract

As machine learning technology develops, the number of machine learning algorithms is growing rapidly and models are becoming increasingly complex. This raises two major practical problems: model selection and hyperparameter optimization. To tackle these issues, this paper proposes a new method based on deep reinforcement learning. A long short-term memory (LSTM) network is used to build an agent that automatically selects a machine learning model and optimizes its hyperparameters for a given dataset. The agent aims to maximize the accuracy of the selected model on the validation set. At each iteration, it uses the validation accuracy of the selected model as a reward signal to improve its next decision, and a policy-gradient reinforcement learning algorithm guides the agent's learning process. To verify the idea, the proposed method is compared on UCI datasets with two widely used optimization methods, the tree-structured Parzen estimator (TPE) and random search. The results show that the proposed method outperforms these methods in stability, time efficiency, and accuracy.
Keywords: deep reinforcement learning; hyperparameter optimization; LSTM network; machine learning; model selection

References

[1]	WU Y, SCHUSTER M, CHEN Z, et al. Google's neural machine translation system: Bridging the gap between human and machine translation[J]. CoRR, 2016, abs/1609.08144.
[2]	ZHOU Shi-yu, DONG Lin-hao, XU Shuang, et al. Syllable-based sequence-to-sequence speech recognition with the transformer in Mandarin Chinese[J]. Interspeech, 2018, 10: 791-795.
[3]	SAINATH T N, WEISS R J, WILSON K W, et al. Multichannel signal processing with deep neural networks for automatic speech recognition[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2017, 25(5): 965-979. doi: 10.1109/TASLP.2017.2672401
[4]	KIM J, EL-KHAMY M, LEE J. Residual LSTM: Design of a deep recurrent architecture for distant speech recognition[J]. Interspeech, 2017, 12: 1591-1595.
[5]	BIGDELI S A, JIN M, FAVARO P, et al. Deep mean-shift priors for image restoration[C]//Advances in Neural Information Processing Systems 30. Long Beach, CA: DBLP, 2017: 763-772.
[6]	LIN C H, LUCEY S. Inverse compositional spatial transformer networks[C]//2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, HI, USA: IEEE Computer Society, 2017: 2252-2260.
[7]	SILVER D, SCHRITTWIESER J, SIMONYAN K, et al. Mastering the game of Go without human knowledge[J]. Nature, 2017, 550(7676): 354-359. doi: 10.1038/nature24270
[8]	HOCHREITER S, SCHMIDHUBER J. Long short-term memory[J]. Neural Computation, 1997, 9(8): 1735-1780. doi: 10.1162/neco.1997.9.8.1735
[9]	WILLIAMS R J. Simple statistical gradient-following algorithms for connectionist reinforcement learning[J]. Machine Learning, 1992, 8(3-4): 229-256. doi: 10.1007/BF00992696
[10]	BERGSTRA J, BENGIO Y. Random search for hyper-parameter optimization[J]. Journal of Machine Learning Research, 2012, 13(1): 281-305.
[11]	HALL M, FRANK E, HOLMES G, et al. The WEKA data mining software: An update[J]. ACM SIGKDD Explorations Newsletter, 2009, 11(1): 10-21. doi: 10.1145/1656274.1656278
[12]	THORNTON C, HUTTER F, HOOS H H, et al. Auto-WEKA: Combined selection and hyperparameter optimization of classification algorithms[C]//International Conference on Knowledge Discovery and Data Mining. Chicago, IL, USA: ACM, 2013: 847-855.
[13]	SNOEK J, LAROCHELLE H, ADAMS R P. Practical Bayesian optimization of machine learning algorithms[C]//Advances in Neural Information Processing Systems 25. Lake Tahoe, NV, USA, 2012: 2951-2959.
[14]	HUTTER F, HOOS H H, LEYTON-BROWN K. Sequential model-based optimization for general algorithm configuration[C]//Learning and Intelligent Optimization - International Conference. Rome, Italy: DBLP, 2011: 507-523.
[15]	BERGSTRA J, BARDENET R, BENGIO Y, et al. Algorithms for hyper-parameter optimization[C]//Advances in Neural Information Processing Systems 24. Granada, Spain: DBLP, 2011: 2546-2554.
[16]	LINDAUER M, HUTTER F. Warmstarting of model-based algorithm configuration[C]//Association for the Advancement of Artificial Intelligence(AAAI). Louisiana, USA: DBLP, 2018: 1355-1362.
[17]	HANSEN N. The CMA evolution strategy: A comparing review[J]. Studies in Fuzziness & Soft Computing, 2007, 192: 75-102.
[18]	FALKNER S, KLEIN A, HUTTER F. BOHB: Robust and efficient hyperparameter optimization at scale[C]//Proceedings of the 35th International Conference on Machine Learning. Stockholm, Sweden: PMLR, 2018, 80: 1437-1446.
[19]	SZEPESVÁRI C. Algorithms for reinforcement learning[M]. San Rafael, CA, USA: Morgan & Claypool, 2010.
[20]	MNIH V, KAVUKCUOGLU K, SILVER D, et al. Human-level control through deep reinforcement learning[J]. Nature, 2015, 518(7540): 529-533. doi: 10.1038/nature14236
[21]	SILVER D, HUANG A, MADDISON C J, et al. Mastering the game of Go with deep neural networks and tree search[J]. Nature, 2016, 529(7587): 484-489. doi: 10.1038/nature16961
[22]	LEVINE S, FINN C, DARRELL T, et al. End-to-end training of deep visuomotor policies[J]. Journal of Machine Learning Research, 2016, 17(1): 1334-1373.
[23]	MNIH V, KAVUKCUOGLU K, SILVER D, et al. Playing Atari with deep reinforcement learning[J]. CoRR, 2013, abs/1312.5602.
[24]	GU S, HOLLY E, LILLICRAP T, et al. Deep reinforcement learning for robotic manipulation with asynchronous off-policy updates[C]//International Conference on Robotics and Automation. Singapore: DBLP, 2017: 3389-3396.
[25]	FERNÁNDEZ-DELGADO M, CERNADAS E, BARRO S, et al. Do we need hundreds of classifiers to solve real world classification problems?[J]. Journal of Machine Learning Research, 2014, 15(1): 3133-3181.
[26]	KINGMA D P, BA J. Adam: A method for stochastic optimization[J]. CoRR, 2014, abs/1412.6980.


Corresponding author: CHEN Bin, bchen63@163.com

School of Materials Science and Engineering, Shenyang University of Chemical Technology, Shenyang 110142

Figures(4) / Tables(3)


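In recent years, machine learning has been widely applied in fields such as machine translation[1-2], speech recognition[3-4], image recognition[5-6], and games[7]. For a given problem, quickly building a mature and reliable machine learning model is therefore especially important. To meet industry demand and make machine learning algorithms fast and efficient to use, many companies have developed application systems for non-expert users, such as DataRobot.com[8], BigML.com[9], and Wise.io[10]. Applying machine learning algorithms inevitably involves two important problems: model selection and hyperparameter optimization.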

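Many machine learning algorithms exist; representative ones include logistic regression, support vector machines, decision trees, and random forests. No single model suits every problem, and on the same problem different methods reach noticeably different levels of performance. This makes things difficult for practitioners, and model selection has become a major obstacle to the wide application of machine learning.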

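Hyperparameter optimization is another difficulty in applying machine learning algorithms. Unlike a model's internal parameters, which are learned during training, hyperparameters are set before training starts. Before training, one usually wants to find a set of hyperparameter values, i.e., a hyperparameter configuration, that lets the model achieve the best classification or fitting performance on a given dataset within a reasonable time. This process is called hyperparameter optimization, and it is crucial to the performance of machine learning algorithms. In practice, hyperparameter values are usually adjusted repeatedly before the best configuration is chosen, which becomes very time-consuming when the hyperparameter search space is large.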
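For contrast with the approach proposed in this paper, the minimal Python sketch below shows the simplest way to automate this trial-and-error loop: random search [10], one of the baselines compared in this article. It assumes a discrete search space and a hypothetical `evaluate` callback that trains a model with a given configuration and returns its validation accuracy; the toy objective is invented for illustration and is not the authors' experimental setup.

```python
import random
from typing import Callable, Dict, List, Tuple

# A discrete search space: each hyperparameter maps to its candidate values.
SearchSpace = Dict[str, List[object]]
Config = Dict[str, object]


def random_search(space: SearchSpace,
                  evaluate: Callable[[Config], float],
                  budget: int,
                  seed: int = 0) -> Tuple[Config, float]:
    """Evaluate `budget` uniformly sampled configurations and return the best one.

    `evaluate` is a hypothetical callback that trains a model with the given
    configuration and returns its validation accuracy (higher is better).
    """
    rng = random.Random(seed)
    best_config: Config = {}
    best_score = float("-inf")
    for _ in range(budget):
        # Draw one value independently for every hyperparameter.
        config = {name: rng.choice(values) for name, values in space.items()}
        score = evaluate(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


if __name__ == "__main__":
    # Toy objective standing in for "train, then score on the validation set".
    def toy_evaluate(config: Config) -> float:
        penalty = 0.05 if config["criterion"] == "gini" else 0.0
        return 1.0 - abs(config["max_depth"] - 12) / 30 - penalty

    toy_space = {"max_depth": list(range(2, 31, 2)), "criterion": ["gini", "entropy"]}
    print(random_search(toy_space, toy_evaluate, budget=50))
```

Each trial is independent, so nothing learned from earlier evaluations guides later ones; model-based methods such as TPE, and the agent proposed here, aim to use that history to spend fewer evaluations.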

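Therefore, for a given problem (or dataset), the final result is largely determined jointly by the machine learning model and its hyperparameter configuration. This paper proposes a method based on deep reinforcement learning that automatically selects the machine learning algorithm and optimizes its hyperparameters. The method uses a long short-term memory (LSTM) network[8] to build an agent that takes the place of the human user in choosing the best machine learning algorithm and hyperparameters. The agent trains the model defined by the selected algorithm and hyperparameter configuration on the training set and evaluates it on the validation set; using the validation accuracy as the reward, it improves its decisions with the policy gradient algorithm[9]. After many iterations, the agent selects the best model and corresponding hyperparameters for the problem. Because the gradient variance is large during agent training, this paper proposes a guided data pool to address the issue. The main contributions of this paper are threefold: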

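1) A reinforcement learning framework is used to solve model selection and hyperparameter optimization;

2) A guided data pool structure is proposed to improve the stability of the method;

3) In experiments that optimize eight machine learning algorithms on standard datasets, the proposed method achieves the best optimization results among the compared methods.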
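The core learning loop described above (sample a model and its hyperparameters, use validation accuracy as the reward, update with the policy gradient [9]) can be sketched in Python as follows. This is a simplified sketch under stated assumptions, not the authors' implementation: it replaces the LSTM agent with one independent softmax distribution per decision, so later choices are not conditioned on earlier ones; it uses an exponential moving-average reward baseline, a standard variance-reduction trick that is not the guided data pool proposed in this paper; and `reward_fn` is a hypothetical stand-in for training the selected model and measuring its validation accuracy.

```python
import math
import random
from typing import Callable, Dict, List, Optional, Tuple

# Ordered decisions as (name, candidate values). Assumption for this sketch:
# the first decision is the model and the rest are hyperparameters.
Decisions = List[Tuple[str, List[object]]]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class ReinforceController:
    """Simplified policy: one independent softmax per decision (no LSTM)."""

    def __init__(self, decisions: Decisions, lr: float = 0.1,
                 baseline_decay: float = 0.9, seed: int = 0) -> None:
        self.decisions = decisions
        self.logits = [[0.0] * len(values) for _, values in decisions]
        self.lr = lr
        self.baseline_decay = baseline_decay
        self.baseline: Optional[float] = None  # moving average of past rewards
        self.rng = random.Random(seed)

    def sample(self) -> List[int]:
        """Sample one candidate index per decision from the current policy."""
        return [self.rng.choices(range(len(step_logits)), weights=softmax(step_logits))[0]
                for step_logits in self.logits]

    def decode(self, actions: List[int]) -> Dict[str, object]:
        """Map sampled indices back to a named configuration."""
        return {name: values[a] for (name, values), a in zip(self.decisions, actions)}

    def update(self, actions: List[int], reward: float) -> None:
        """REINFORCE step: logits += lr * (R - b) * grad log pi(actions)."""
        if self.baseline is None:
            self.baseline = reward
        advantage = reward - self.baseline
        for step_logits, action in zip(self.logits, actions):
            probs = softmax(step_logits)
            for k, p in enumerate(probs):
                # Gradient of log softmax(action) w.r.t. logit k: 1[k == action] - p_k.
                step_logits[k] += self.lr * advantage * ((1.0 if k == action else 0.0) - p)
        self.baseline = (self.baseline_decay * self.baseline
                         + (1 - self.baseline_decay) * reward)


def search(controller: ReinforceController,
           reward_fn: Callable[[Dict[str, object]], float],
           iterations: int) -> Tuple[Dict[str, object], float]:
    """Repeat sample -> evaluate -> update, keeping the best configuration seen."""
    best: Dict[str, object] = {}
    best_reward = float("-inf")
    for _ in range(iterations):
        actions = controller.sample()
        config = controller.decode(actions)
        reward = reward_fn(config)  # e.g. validation accuracy of the trained model
        controller.update(actions, reward)
        if reward > best_reward:
            best, best_reward = config, reward
    return best, best_reward
```

An LSTM agent, as used in the paper, would let each hyperparameter choice depend on the model chosen earlier in the sequence; the independent softmaxes here keep the sketch short at the cost of that conditioning.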

4. Conclusion

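This paper proposes a hyperparameter optimization method based on deep reinforcement learning. The method uses a long short-term memory network to build an agent that automatically performs algorithm selection and hyperparameter optimization for different problems (datasets). The agent aims to maximize the model's accuracy on the validation set: the validation accuracy of the model corresponding to each of the agent's choices serves as the reward, and a policy gradient algorithm updates the agent's parameters. After many iterations, the agent converges and selects the best model and hyperparameter configuration. To verify the feasibility and performance of the method, the agent was used to run optimization experiments on standard UCI datasets. Compared with two representative hyperparameter optimization methods, TPE and random search, the proposed method performs better in accuracy, runtime efficiency, and stability, with a clear advantage on larger problems: its optimization time is as low as about 12% of that of random search and 19% of that of TPE.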
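The time ratios quoted above can be checked against the optimization times in the result tables of this article. The short Python sketch below recomputes, for each baseline, the lowest ratio of the agent's time to the baseline's time across the three UCI datasets; the times are copied from the tables, and the helper name is illustrative only.

```python
from typing import Dict, Tuple

# Optimization times in minutes, copied from the three result tables.
TIMES_MIN: Dict[str, Dict[str, float]] = {
    "UCI Handwritten Digits": {"Agent": 219.4, "TPE": 1167.5, "Rand": 1825.9},
    "UCI Spambase": {"Agent": 199.5, "TPE": 462.1, "Rand": 870.5},
    "UCI Car Evaluation": {"Agent": 60.7, "TPE": 78.3, "Rand": 80.6},
}


def lowest_time_ratio(baseline: str) -> Tuple[str, float]:
    """Return the dataset where Agent time / baseline time is smallest, and that ratio."""
    ratios = {name: t["Agent"] / t[baseline] for name, t in TIMES_MIN.items()}
    dataset = min(ratios, key=ratios.get)
    return dataset, ratios[dataset]


for baseline in ("Rand", "TPE"):
    dataset, ratio = lowest_time_ratio(baseline)
    print(f"Agent vs {baseline}: {ratio:.1%} of the time ({dataset})")
# prints:
# Agent vs Rand: 12.0% of the time (UCI Handwritten Digits)
# Agent vs TPE: 18.8% of the time (UCI Handwritten Digits)
```

Both minima occur on the handwritten digits dataset, the largest of the three, and 18.8% rounds to the roughly 19% stated in the conclusion.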


Model	Hyperparameter	Candidate values	Step
RandomForestClassifier	n_estimators	[100~1200]	100
	max_depth	[2~30]	2
	min_samples_split	[1~99]	2
	min_samples_leaf	[1~99]	2
	max_features	[sqrt,log2,None]	n/a
	criterion	[gini,entropy]	n/a
	bootstrap	[True,False]	n/a
XGBClassifier	max_depth	[3~25]	2
	gamma	[0.05~0.9]	0.05
	min_child_weight	[1~9]	2
	subsample	[0.1~0.9]	0.1
	colsample_bytree	[0.1~0.9]	0.1
	reg_alpha	[0.0~1.0]	0.1
	reg_lambda	[0.01~0.1]	0.01
	learning_rate	[0.005~0.1]	0.005
DecisionTreeClassifier	criterion	[gini,entropy]	n/a
	splitter	[best,random]	n/a
	max_depth	[2~30]	2
	min_samples_split	[1~99]	2
	min_samples_leaf	[1~99]	2
	max_features	[sqrt,log2,None]	n/a
SVC	C	[0.0005~0.01]	0.0005
	kernel	[linear,poly,rbf,sigmoid]	n/a
	class_weight	[balanced,None]	n/a
KNeighborsClassifier	n_neighbors	[2~100]	2
	weights	[uniform,distance]	n/a
	algorithm	[auto,ball_tree,kd_tree,brute]	n/a
	leaf_size	[5~50]	5
	p	[1~5]	1
AdaBoostClassifier	n_estimators	[100~1200]	100
	learning_rate	[0.1~1.0]	0.1
	algorithm	[SAMME,SAMME.R]	n/a
ExtraTreesClassifier	n_estimators	[100~1200]	100
	criterion	[gini,entropy]	n/a
	max_features	[sqrt,log2,None]	n/a
	max_depth	[1~29]	2
	min_samples_split	[1~99]	2
	min_samples_leaf	[1~99]	2
BaggingClassifier	n_estimators	[100~1200]	100
	max_samples	[0.1~0.9]	0.1
	max_features	[0.1~0.9]	0.1
	bootstrap	[True,False]	n/a
	bootstrap_features	[True,False]	n/a
	warm_start	[True,False]	n/a
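To give a sense of how large these grids are, the Python sketch below encodes three rows of the hyperparameter table (random forest, SVC, and k-nearest neighbors) and counts the distinct configurations each defines. The encoding, with numeric ranges written as `(low, high, step)` tuples, and the helper names are assumptions made for illustration; only the ranges and steps come from the table.

```python
import math
from typing import Dict, List, Tuple, Union

# Numeric ranges are (low, high, step); categorical ranges are lists of options.
Spec = Union[Tuple[float, float, float], List[object]]

SEARCH_SPACE: Dict[str, Dict[str, Spec]] = {
    "RandomForestClassifier": {
        "n_estimators": (100, 1200, 100),
        "max_depth": (2, 30, 2),
        "min_samples_split": (1, 99, 2),
        "min_samples_leaf": (1, 99, 2),
        "max_features": ["sqrt", "log2", None],
        "criterion": ["gini", "entropy"],
        "bootstrap": [True, False],
    },
    "SVC": {
        "C": (0.0005, 0.01, 0.0005),
        "kernel": ["linear", "poly", "rbf", "sigmoid"],
        "class_weight": ["balanced", None],
    },
    "KNeighborsClassifier": {
        "n_neighbors": (2, 100, 2),
        "weights": ["uniform", "distance"],
        "algorithm": ["auto", "ball_tree", "kd_tree", "brute"],
        "leaf_size": (5, 50, 5),
        "p": (1, 5, 1),
    },
}


def num_values(spec: Spec) -> int:
    """Number of candidate values for one hyperparameter."""
    if isinstance(spec, list):
        return len(spec)
    low, high, step = spec
    # round() absorbs floating-point error in ratios such as (0.01 - 0.0005) / 0.0005.
    return round((high - low) / step) + 1


def expand(spec: Spec) -> List[object]:
    """All candidate values for one hyperparameter, in order."""
    if isinstance(spec, list):
        return list(spec)
    low, high, step = spec
    # Rounding to 10 decimals trims float noise such as 0.30000000000000004.
    return [round(low + i * step, 10) for i in range(num_values(spec))]


def space_size(params: Dict[str, Spec]) -> int:
    """Number of distinct configurations in one model's grid."""
    return math.prod(num_values(spec) for spec in params.values())


for model, params in SEARCH_SPACE.items():
    print(f"{model}: {space_size(params):,} configurations")
# prints 5,400,000 for RandomForestClassifier, 160 for SVC, 20,000 for KNeighborsClassifier
```

The random forest grid alone has 12 × 15 × 50 × 50 × 3 × 2 × 2 = 5,400,000 configurations, which is why exhaustive grid search is impractical here. Note also that current scikit-learn versions require an integer `min_samples_split` of at least 2, so the value 1 at the bottom of that range would be rejected when the model is fitted.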

Dataset	UCI Handwritten Digits	UCI Spambase	UCI Car Evaluation
Task type	Multiclass	Binary	Multiclass
Classes	10 (integers 0-9)	2 (0, 1)	4
Number of features	64	57	6
Missing values	No	No	No
Number of instances	5620	4601	1728

Dataset	Experimental results on UCI Handwritten Digits
Method	Accuracy	Time/min	Std. dev.
Agent	0.9875	219.4	0.00936
TPE	0.9839	1167.5	0.02643
Rand	0.9853	1825.9	0.03862
CMAES	0.9856	776.3	0.00671

Dataset	Experimental results on UCI Spambase
Method	Accuracy	Time/min	Std. dev.
Agent	0.9552	199.5	0.0122
TPE	0.9535	462.1	0.0194
Rand	0.9527	870.5	0.0280
CMAES	0.9540	355.6	0.0134

Dataset	Experimental results on UCI Car Evaluation
Method	Accuracy	Time/min	Std. dev.
Agent	0.9754	60.7	0.0034
TPE	0.9482	78.3	0.0106
Rand	0.9216	80.6	0.0334
CMAES	0.9521	66.3	0.0097
