Epidemiological Characteristics of Novel Coronavirus COVID-19 Based on Web Data Mining

Abstract: Using Selenium-based web data mining, this study analyzed 690 valid cases posted to the Sina Weibo “Pneumonia Patients Asking for Help” super topic between February 4 and February 22, 2020, and derived the epidemiological characteristics of real help-seeking cases on Sina Weibo. Of the help-seeking patients, 97.6% were in Wuhan, concentrated in central urban districts such as Wuchang, Qiaokou, and Hanyang, in proportion to local medical resources and population density. Help requests were posted mainly between February 4 and February 7, 2020; as the strain on medical resources eased, the number of cases seeking help through Weibo fell markedly. Diagnosis dates of the help-seeking patients fell mainly between January 16 and February 6, 2020, broadly consistent with the case distribution released by the Chinese Center for Disease Control and Prevention (China CDC). The median age of the help-seeking patients was 60 years, notably higher than in the China CDC data but broadly in line with data from the Central Hospital of Wuhan. These results indicate that, in major sudden outbreaks of infectious disease, social media such as Weibo matter not only for the spread of public opinion but also for epidemiological analysis. Because social media are real-time and widely used, combining them with data mining and big data analysis can help decision-makers quickly grasp the actual situation on the front line.
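The summary statistics reported in the abstract (share of cases by city, median age) can be illustrated with a minimal Python sketch. This is not the authors' code: the record layout (`city`, `district`, `age`), the function name `summarize_cases`, and the toy records are all assumptions made for illustration, and the toy values are invented and are not data from the study. It assumes the posts have already been collected and parsed into structured records.

```python
from collections import Counter
from statistics import median


def summarize_cases(cases):
    """Return the case count, the share of cases per city, and the median age.

    `cases` is a list of dicts with hypothetical keys "city" and "age";
    records with a missing age are skipped when computing the median.
    """
    if not cases:
        raise ValueError("no cases to summarize")

    total = len(cases)
    city_counts = Counter(case["city"] for case in cases)
    city_share = {city: n / total for city, n in city_counts.items()}

    ages = [case["age"] for case in cases if case.get("age") is not None]
    median_age = median(ages) if ages else None

    return {"n": total, "city_share": city_share, "median_age": median_age}


# Invented toy records for illustration only (NOT data from the study).
toy_cases = [
    {"city": "Wuhan", "district": "Wuchang", "age": 67},
    {"city": "Wuhan", "district": "Qiaokou", "age": 58},
    {"city": "Wuhan", "district": "Hanyang", "age": None},
    {"city": "Xiaogan", "district": None, "age": 45},
]

summary = summarize_cases(toy_cases)
print(summary)
```

Skipping records with a missing age, rather than imputing a value, keeps the median tied to what patients actually reported, which seems the safer choice for self-reported social media posts.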

     
