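In 2010, about 1.95 billion people worldwide were myopic, accounting for 28.3% of the world's population [1]. In 2018, the overall myopia rate among children and adolescents in China was 53.6% [2], far above the international level [3]. Countries such as the United States [4-5], Singapore [6-7], and Australia [8] have used cohort studies to investigate the factors that influence childhood myopia. Many analyses have also addressed adolescent myopia in individual Chinese cities, including Shanghai [9-10], Anyang [11], Guangzhou [12], Wenzhou [13], and Beijing [14]. Reference [15] analyzed MYOPIA, a data subset from the Orinda Longitudinal Study of Myopia, and concluded that parental myopia, time spent on outdoor sports, reading time, and sex strongly affect myopia. The Orinda Longitudinal Study of Myopia [15-17], the CLEERE study of parental history of myopia [18], and a study of myopia risk factors [19] all concluded that refraction in early adolescence can be used to predict the future onset of myopia. Reference [20] analyzed clinical refraction data spanning 10 years and found that machine learning methods can effectively predict the probability of developing high myopia.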
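Cohort analysis and logistic regression [21] are the most widely used methods in existing studies. Cohort studies are a common way to investigate disease etiology and can reveal causal relationships between two events reasonably well, but they are difficult to design and organize, and their data are complex to collect and analyze. Logistic regression has a linear decision boundary and handles imbalanced data poorly. This paper uses Spearman's rank correlation coefficient [22] to analyze the correlation between each influencing factor and future vision, and examines separately how high myopia and ordinary myopia change with age. Spearman's rank correlation coefficient is a nonparametric measure of the dependence between two variables; it assesses how well their relationship can be described by a monotonic function, and it is suited to cases where the population distribution is unknown or the variables are ordinal. This paper builds ensemble learning models suited to small datasets and single-examination data: given data from only one examination, the model produces a quantitative prediction of vision at any future time. A comparison of five common ensemble learning algorithms shows that the random forest model performs best overall. This work provides a useful reference for myopia prediction, prevention, and control.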
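As an illustration of the rank-based measure described above, the following minimal sketch computes Spearman's coefficient as the Pearson correlation of the ranks, with tied values sharing their average rank. It is not the code used in this study; the variable names and the sample values (`outdoor_hours`, `spherical_equivalent`) are hypothetical and chosen only to show a perfectly monotonic relationship.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical values, for illustration only (not data from this study).
outdoor_hours = [2, 5, 7, 10, 14, 21]
spherical_equivalent = [-3.0, -2.5, -1.0, -0.75, 0.0, 0.25]
rho = spearman(outdoor_hours, spherical_equivalent)
print(f"Spearman rho = {rho:.3f}")
```

Because only the ordering of the values matters, any monotonic but nonlinear relationship yields the same coefficient as a linear one, which is why the measure suits ordinal variables and unknown distributions.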
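This paper selects five ensemble learning models: random forest [23], adaptive boosting (AdaBoost) [24], bagging [25], gradient boosting [26-27], and extreme gradient boosting (XGBoost) [28]. For each examined subject in dataset B, the data from an earlier time point (including personal information) together with the time interval serve as training input, and the model predicts the myopia status at the later time point. The samples are randomly split into a 70% training set and a 30% test set. The prediction targets are uncorrected visual acuity and spherical equivalent, and the absolute difference between the predicted and true values is used as the error. Figure 5 shows the structure of the prediction model.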
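The construction of training samples described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in this study: the record fields (`subject`, `age`, `ucva`, `se`) are hypothetical, only consecutive examinations of the same subject are paired, and the helper names `build_pairs` and `split_70_30` are introduced here for illustration.

```python
import random
from collections import defaultdict


def build_pairs(exams):
    """Pair each examination with the same subject's next examination.

    Features: earlier-visit measurements plus the time interval (years).
    Targets: uncorrected visual acuity and spherical equivalent at the later visit.
    """
    by_subject = defaultdict(list)
    for exam in exams:
        by_subject[exam["subject"]].append(exam)

    pairs = []
    for visits in by_subject.values():
        visits.sort(key=lambda e: e["age"])
        for earlier, later in zip(visits, visits[1:]):
            interval = later["age"] - earlier["age"]
            features = [earlier["age"], earlier["ucva"], earlier["se"], interval]
            targets = [later["ucva"], later["se"]]
            pairs.append((features, targets))
    return pairs


def split_70_30(samples, seed=0):
    """Randomly split samples into a 70% training set and a 30% test set."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical examination records, for illustration only.
exams = [
    {"subject": "a", "age": 8.0, "ucva": 1.0, "se": 0.25},
    {"subject": "a", "age": 8.5, "ucva": 0.8, "se": -0.25},
    {"subject": "b", "age": 10.0, "ucva": 0.6, "se": -1.0},
    {"subject": "a", "age": 9.5, "ucva": 0.6, "se": -0.75},
    {"subject": "b", "age": 11.0, "ucva": 0.5, "se": -1.5},
]
pairs = build_pairs(exams)
train, test = split_70_30(pairs)
print(len(pairs), len(train), len(test))
```

Including the interval as a feature is what lets a model trained this way be queried for different future horizons from a single examination. The sketch splits at the level of sample pairs; splitting by subject instead would be a stricter alternative.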
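As Table 1 shows, random forest and gradient boosting have the lowest errors overall. However, over longer time intervals the gradient boosting predictions change abruptly and become unstable; in particular, they suddenly show large improvements in vision, which contradicts medical practice and was not observed in other real data with longer time intervals. Weighing both accuracy and robustness, we therefore consider random forest the best model; for predicting vision changes over a short horizon (for example, within half a year), gradient boosting also performs well. When random forest is used only to predict whether a subject is myopic at the next time point, the accuracy reaches 92.8%.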
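The two kinds of score reported here, an absolute-error figure for the regression targets and an accuracy for the binary question "myopic or not", can be computed as in the sketch below. The averaging of absolute errors and the threshold of -0.5 D on spherical equivalent are assumptions made for illustration; the text does not state how errors were aggregated or which definition of myopia was used. The sample values are hypothetical.

```python
# Assumption: myopia defined as spherical equivalent <= -0.5 D.
# The source text does not state the threshold used in the study.
MYOPIA_SE_THRESHOLD = -0.5


def mean_absolute_error(predicted, actual):
    """Average of |predicted - actual| over all samples."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def is_myopic(spherical_equivalent, threshold=MYOPIA_SE_THRESHOLD):
    """Classify one spherical-equivalent value under the assumed threshold."""
    return spherical_equivalent <= threshold


def myopia_accuracy(predicted_se, actual_se):
    """Share of samples whose predicted and true myopia labels agree."""
    hits = sum(is_myopic(p) == is_myopic(a) for p, a in zip(predicted_se, actual_se))
    return hits / len(actual_se)


# Hypothetical predictions and ground truth, for illustration only.
predicted_se = [-1.0, -0.25, -2.5, 0.0]
actual_se = [-1.5, -0.75, -2.0, 0.25]
mae = mean_absolute_error(predicted_se, actual_se)
acc = myopia_accuracy(predicted_se, actual_se)
print(f"MAE = {mae}, accuracy = {acc}")
```

A prediction can have a small absolute error and still fall on the wrong side of the threshold, as the second sample does, so the two scores are complementary.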
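Table 1 Comparison of prediction errors of different models (error: absolute difference between predicted and true values)

| Model | Right-eye uncorrected visual acuity | Left-eye uncorrected visual acuity | Right-eye spherical equivalent | Left-eye spherical equivalent |
|---|---|---|---|---|
| Random forest | 0.111 | 0.120 | 0.379 | 0.387 |
| Gradient boosting | 0.115 | 0.118 | 0.368 | 0.364 |
| XGBoost | 0.132 | 0.132 | 0.379 | 0.387 |
| Bagging | 0.119 | 0.123 | 0.381 | 0.396 |
| AdaBoost | 0.147 | 0.125 | 0.396 | 0.374 |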
Myopia Contributing Factors and Myopia Prediction Based on Vision Examination Data
Abstract: This paper analyzes myopia examination data from China and abroad. The statistics show that the incidence of myopia among Chinese adolescents far exceeds the international level. Ages 8 to 12 are the peak period for new cases of myopia, with about 20% of non-myopic students becoming myopic each year on average, while ages 10 to 14 are the high-risk period for new cases of high myopia. Parental myopia and time spent on outdoor activities have the greatest influence on the development of myopia, greater than time spent using computers or watching TV. Five ensemble learning methods are used to predict future vision. Considering both robustness and accuracy, the random forest model gives the best predictions; with a 70% training set and a 30% test set, its accuracy in predicting myopia is 92.8%.
Key words: ensemble learning; myopia; prediction; random forest
[1] HOLDEN B A, FRICKE T R, WILSON D A, et al. Global prevalence of myopia and high myopia and temporal trends from 2000 through 2050[J]. Ophthalmology, 2016, 123(5): 1036-1042.
[2] Publicity Department, National Health Commission of the People's Republic of China. Overall myopia rate among children and adolescents is 53.6%; China will carry out more targeted myopia interventions[EB/OL]. (2019-05-18). [2020-10-20]. http://www.nhc.gov.cn/xcs/s7847/201905/11c679a40eb3494cade977f65f1c3740.shtml. (in Chinese)
[3] World Health Organization. The impact of myopia and high myopia[EB/OL]. (2015-03-16). [2020-10-20]. https://www.who.int/blindness/causes/MyopiaReportforWeb.pdf.
[4] ZADNIK K, SINNOTT L T, COTTER S A, et al. Prediction of juvenile-onset myopia[J]. JAMA Ophthalmol, 2015, 133(6): 683-689. doi: 10.1001/jamaophthalmol.2015.0471
[5] MUTTI D O, SINNOTT L T, MITCHELL G L, et al. Relative peripheral refractive error and the risk of onset and progression of myopia in children[J]. Invest Ophthalmol Vis Sci, 2011, 52(1): 199-205. doi: 10.1167/iovs.09-4826
[6] SAW S M, SHANKAR A, TAN S B, et al. A cohort study of incident myopia in Singaporean children[J]. Invest Ophthalmol Vis Sci, 2006, 47(5): 1839-1844. doi: 10.1167/iovs.05-1081
[7] TONG L, CHAN Y H, GAZZARD G, et al. Longitudinal study of anisometropia in Singaporean school children[J]. Invest Ophthalmol Vis Sci, 2006, 47(8): 3247-3252. doi: 10.1167/iovs.05-0906
[8] OJAIMI E, ROSE K A, SMITH W, et al. Methods for a population-based study of myopia and other eye conditions in school children: The Sydney myopia study[J]. Ophthalmic Epidemiol, 2005, 12(1): 59-69. doi: 10.1080/09286580490921296
[9] MA Ying-yan. Epidemiological studies of myopia in Shanghai children and relevant methods for myopia prediction[D]. Shanghai: Shanghai Jiao Tong University, 2016. (in Chinese)
[10] MA Ying-yan, ZOU Hai-dong, LIN Sen-lin, et al. Cohort study with 4-year follow-up of myopia and refractive parameters in primary schoolchildren in Baoshan district, Shanghai[J]. Clin Exp Ophthalmol, 2018, 46(8): 861-872. doi: 10.1111/ceo.13195
[11] LI Shi-ming, LI He, LI Si-yuan, et al. Time outdoors and myopia progression over 2 years in Chinese children: The Anyang childhood eye study[J]. Invest Ophthalmol Vis Sci, 2015, 56(8): 4734-4740. doi: 10.1167/iovs.14-15474
[12] HE Ming-guang, ZENG Jun-wen, LIU Yi-zhi, et al. Refractive error and visual impairment in urban children in southern China[J]. Invest Ophthalmol Vis Sci, 2004, 45(3): 793-799. doi: 10.1167/iovs.03-1051
[13] ZHANG Jia-yu, WANG Qiang, LIN Si-si, et al. Analysis of myopia and axial length changes and relevant factors of children aged 7 to 14 years in Wenzhou[J]. Chinese Journal of Ophthalmology, 2016, 52(7): 514-519. (in Chinese)
[14] YUAN Li, WAN Bo, BAO Yong-zhen. Association between ocular dominance and refraction in myopic subjects[J]. Chinese Journal of Ophthalmology, 2020, 56(9): 693-698. (in Chinese)
[15] GIANNATOU E. Myopia study, statistics for business analytics II[EB/OL]. [2017-02-01]. https://github.com/evagian/Myopia-Study-classification-logistic-regression-R.
[16] ZADNIK K, FRIEDMAN N E, QUALLEY P A, et al. Ocular predictors of the onset of juvenile myopia[J]. Invest Ophthalmol Vis Sci, 1999, 40(9): 1936-1943.
[17] HOSMER D W, LEMESHOW S, STURDIVANT R X. Applied logistic regression[M]. 3rd ed. [S.l.]: John Wiley & Sons Inc, 2013.
[18] JONES-JORDAN L A, MANNY R E, COTTER S A, et al. Early childhood refractive error and parental history of myopia as predictors of myopia[J]. Invest Ophthalmol Vis Sci, 2010, 51: 115-121. doi: 10.1167/iovs.08-3210
[19] FRENCH A N, MITCHELL P, ROSE K A. Risk factors for incident myopia in Australian schoolchildren: The Sydney adolescent vascular and eye study[J]. Ophthalmology, 2013, 120(10): 2100-2108. doi: 10.1016/j.ophtha.2013.02.035
[20] LIN Hao-tian, LONG E, DING Xiao-hu, et al. Prediction of myopia development among Chinese school-aged children using refraction data from electronic medical records: A retrospective, multicentre machine learning study[J]. PLOS Medicine, 2018, 15(11): 1-17.
[21] COX D R. The regression analysis of binary sequences[J]. Journal of the Royal Statistical Society: Series B (Methodological), 1958, 20(2): 215-232. doi: 10.1111/j.2517-6161.1958.tb00292.x
[22] SPEARMAN C. The proof and measurement of association between two things[J]. Am J Psychol, 1904, 15(1): 72-101. doi: 10.2307/1412159
[23] HO T K. Random decision forests[C]//Proceedings of the 3rd International Conference on Document Analysis and Recognition. Montreal, Canada: [s.n.], 1995, 1416: 278-282.
[24] FREUND Y, SCHAPIRE R E. A decision-theoretic generalization of on-line learning and an application to boosting[J]. Journal of Computer and System Sciences, 1997, 55(1): 119-139. doi: 10.1006/jcss.1997.1504
[25] BREIMAN L. Bagging predictors[J]. Machine Learning, 1996, 24(2): 123-140.
[26] FRIEDMAN J H. Greedy function approximation: A gradient boosting machine[J]. Annals of Statistics, 2001: 1189-1232.
[27] FRIEDMAN J H. Stochastic gradient boosting[J]. Computational Statistics & Data Analysis, 2002, 38(4): 367-378.
[28] CHEN T, GUESTRIN C. XGBoost: A scalable tree boosting system[C]//Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. [S.l.]: ACM, 2016: 785-794.