Abstract:
Predicting the hospitalization costs of lung cancer patients and analyzing the factors that influence them helps clarify the expenses and economic burden these patients face, and provides a reference for optimizing medical payment policies. This study included records of 12 117 adult lung cancer patients hospitalized between January 2020 and September 2023 at multiple hospitals in one province. First, K-means clustering was used to categorize hospitalization costs, and univariate logistic regression was used to select 25 potential influencing factors from 42 candidates. Hospitalization cost prediction models were then constructed and evaluated using CatBoost and XGBoost, and the influence of each factor on hospitalization costs was measured by its feature importance value. Finally, using the significant factors identified by the prediction models, a scoring tool for high hospitalization costs was developed with multivariable logistic regression. Both CatBoost and XGBoost showed good predictive performance (AUC > 0.95), with CatBoost performing slightly better than XGBoost. Based on the CatBoost model, nine factors affecting hospitalization costs were identified: length of hospital stay, type of surgery, radiotherapy, number of rescues, histological classification of lung cancer, age, chemotherapy, first hospitalization, and neutrophil count level. Seven of these were included in the scoring tool according to the assignment criteria. The discrimination and calibration of the scoring tool were validated on the test set, where it achieved an AUC of 0.958.
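As an illustration of the first and last steps of this pipeline (categorizing costs by clustering, then assessing discrimination by AUC), the following is a minimal sketch and not the study's implementation. The abstract does not state the number of clusters, so k = 2 (low versus high cost) is an assumption; the cost and length-of-stay values are made up, and the function names `kmeans_1d` and `auc` are hypothetical.

```python
def kmeans_1d(values, k=2, max_iter=100):
    """Lloyd's algorithm on scalar data (k >= 2); returns (centers, labels)."""
    lo, hi = min(values), max(values)
    # Deterministic initialization: spread centers evenly over the observed range.
    centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    labels = [0] * len(values)
    for _ in range(max_iter):
        # Assignment step: each value goes to its nearest center.
        labels = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
        # Update step: each center becomes the mean of its members.
        new_centers = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new_centers.append(sum(members) / len(members) if members else centers[j])
        if new_centers == centers:  # converged
            break
        centers = new_centers
    return centers, labels


def auc(y_true, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up hospitalization costs (arbitrary currency units) for seven stays.
costs = [8.0, 9.5, 10.2, 11.0, 41.0, 45.5, 52.0]
centers, labels = kmeans_1d(costs, k=2)

# Treat the cluster with the largest center as the "high cost" class.
high_cluster = max(range(len(centers)), key=lambda j: centers[j])
is_high = [1 if lab == high_cluster else 0 for lab in labels]

# Made-up length of stay (days), used here as a single-feature risk score.
length_of_stay = [3, 4, 10, 5, 12, 9, 15]
print("high-cost flags:", is_high)
print("AUC of length of stay as a score:", auc(is_high, length_of_stay))
```

In practice the cluster labels would serve as the target for the classifiers, and the AUC would be computed from model-predicted probabilities on a held-out test set rather than from a single raw feature.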