融合语义及边界信息的中文电子病历命名实体识别

Named Entity Recognition for Chinese Electronic Medical Record by Fusing Semantic and Boundary Information

  • 摘要: 中文电子病历数据专业性强,语法结构复杂,用于自然语言处理(NLP)的命名实体识别(NER)难度大。为了从电子病历数据中精确识别出医疗实体,提出了一种融合语义及边界信息的命名实体识别算法。首先,利用卷积神经网络(CNN)结构提取汉字图形信息,并与五笔特征拼接来丰富汉字的语义信息;然后,利用FLAT模型中的Lattice将医学词典作为字符潜在词组匹配文本信息;最后,将融入语义信息的Lattice模型用于中文电子病历命名实体识别。实验结果表明,该方法在Yidu-S4K数据集上的识别性能超过现有多种算法,且在Resume数据集上F1值可达到96.06%。

     

    Abstract: Chinese electronic medical record texts are highly professional, with complex grammar,it is difficult to use named entity recognition (NER) for natural language processing (NLP). In order to accurately identify medical entities from electronic medical record data, a named entity recognition algorithm combining semantic and boundary information is proposed. In this algorithm, the graphic information of Chinese characters is extracted by using the convolutional neural network (CNN) structure and the semantic information of the Chinese characters is enriched with Wubi features. And then the text information is matched with medical dictionary as a potential phrase of characters by using the Lattice in the FLAT model. Finally, the Lattice model incorporating semantic information is used for named entity recognition in Chinese electronic medical records. The experimental results show that this method has better recognition performance than other existing methods on the Yidu-S4K data set, and the F1 value on the Resume dataset is 96.06%.

     

/

返回文章
返回