Prediction model and scoring tool for hospitalization costs in adult patients with lung cancer

  • Abstract: Analyzing the factors that influence hospitalization costs for lung cancer patients helps clarify hospitalization expenditure and disease burden, and provides a useful reference for optimizing medical payment policies. This study included 12 117 hospitalization records of adult lung cancer patients from multiple hospitals in one province, covering January 2020 to September 2023. Hospitalization costs were first discretized with K-means clustering, and univariate logistic regression was then used to screen 25 potential influencing factors from 42 candidates. Prediction models for hospitalization costs were built with CatBoost and XGBoost, respectively, and their performance was evaluated; the feature importance score of each variable was used to gauge its influence on hospitalization costs. Drawing on the significant factors identified by the prediction models, the study also developed a scoring tool for high hospitalization costs using multivariable logistic regression. Both CatBoost and XGBoost showed good predictive performance (AUC > 0.95), with CatBoost performing slightly better than XGBoost. Based on the CatBoost model, the study identified nine important factors affecting hospitalization costs: length of hospital stay, surgical grade, radiotherapy, number of rescue episodes, histological type of lung cancer, age, chemotherapy, first hospitalization, and neutrophil count. Seven of these were included in the scoring tool according to the point-assignment criteria. The discrimination and calibration of the scoring tool were validated on the test set, where it reached an AUC of 0.958.
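Two steps named in the abstract, discretizing costs with K-means and judging a predictor by its AUC, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not state the number of clusters (k = 2 is used, giving "high" versus "not high" cost), the data are invented toy values, and the function names `kmeans_1d` and `auc` are hypothetical. It is not the study's implementation, which relied on CatBoost and XGBoost models.

```python
# Illustrative sketch only: toy data, hypothetical names, k = 2 assumed.

def assign(values, centers):
    """Label each value with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: abs(v - centers[j]))
            for v in values]

def kmeans_1d(values, k=2, iters=100):
    """Lloyd's algorithm on scalars; returns (centers, labels)."""
    srt = sorted(values)
    # Deterministic start: spread the initial centers over the sorted values.
    centers = [srt[(2 * i + 1) * len(srt) // (2 * k)] for i in range(k)]
    for _ in range(iters):
        labels = assign(values, centers)
        new_centers = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            # Keep the old center if a cluster happens to be empty.
            new_centers.append(sum(members) / len(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(values, centers)

def auc(scores, labels):
    """Share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Invented example values (not from the study).
costs = [8000, 9500, 11000, 12500, 10000, 52000, 61000, 58000]
length_of_stay = [3, 5, 4, 6, 9, 12, 15, 8]

centers, high_cost = kmeans_1d(costs, k=2)      # label 1 = higher-cost cluster
auc_value = auc(length_of_stay, high_cost)      # one factor used as a crude score
print(centers, high_cost, round(auc_value, 3))
```

Here a single factor (length of stay) stands in for a model's predicted score; in a pipeline like the one described, the same AUC computation would be applied to the predicted probabilities from the fitted model on a held-out test set.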

     
