基于XGBoost特征选择的幕课翘课指数建立及应用

The Establishment and Application of Drop-Out-Index of MOOCs Based on XGBoost Feature Selection

  • 摘要: 翘课行为反应了幕课的质量问题,也是在线教育的核心问题之一。该文通过对真实的在线教育数据进行分析,结合在线教育领域的先验知识,针对数据中的丰富海量的特征问题,提出了基于XGBoost特征重要度计算和分类的翘课特征选择方法,并建立了在线教育的翘课指数(DOI)。基于学堂在线数据集提取的海量特征的实证分析表明,基于XGBoost的特征选择方法比其他经典特征选择方法具有更好的效果。在数据集的不同时间点上使用翘课指数模型作翘课预测,验证了翘课指数的有效性。

     

    Abstract: Dropout of classes reflects the quality of MOOCs, which is the key issue of online education. In order to predict the dropout rate in advance, this paper presents an efficient prediction framework based on the analysis on real online education data and the prior knowledge of online education. This presented framework combines the feature importance learning and the selection by the classification algorithm of XGBoost, and establishes a Drop-Out-Index (DOI) for online courses. Experiments analysis on massive features extracted from the online-data of XueTang website shows that the feature selection method based on XGBoost achieves better results than other feature selection methods. The validity of DOI has also been verified by testing on different time points in the data set.

     

/

返回文章
返回