Entity Relationship Extraction from Text Data Based on Deep Reinforcement Learning

  • Abstract: Extracting entity relations from large text corpora quickly and accurately is key to building knowledge graphs. Mainstream distantly supervised relation extraction methods often ignore the type information of entity pairs and the syntactic information of sentences. To address this, this paper proposes a text entity relation extraction method based on deep reinforcement learning. First, a bidirectional long short-term memory (BiLSTM) network combined with an attention mechanism over the words surrounding the entities serves as the first sentence-encoding module. Second, an entity type embedding module is added so that entity types enrich the sentence encoding. Finally, a dependency parsing module is incorporated; together, the three modules form the relation extractor. In addition, most distantly supervised relation extraction models are designed to reduce noise at the bag and sentence levels and ignore the impact of noisy labels on model performance. To denoise at the label level, this paper designs a label learner that uses reinforcement learning to learn soft labels for sentences and thereby correct wrong labels. The label learner and the relation extractor together constitute the proposed deep reinforcement learning framework for text relation extraction. Experiments on two public datasets, ACE2005 and Chinese-Literature-NER-RE-Dataset, and on a self-constructed dataset show that the proposed method outperforms several current mainstream models in both precision and recall.
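The abstract gives no formulas, so the following is only a minimal sketch of how the first two encoding modules might fit together, under stated assumptions. It takes per-token hidden states as given (standing in for BiLSTM outputs), scores each token against a query built from the two entity tokens, pools the states with the resulting attention weights, and appends entity type vectors. The function name `encode_sentence`, the mean-of-entities query, the dot-product scoring, and the toy type vectors are all hypothetical choices for illustration, not the authors' formulation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def encode_sentence(hidden_states, head_idx, tail_idx, head_type_vec, tail_type_vec):
    """Attention-pool token states around two entities and append type vectors.

    hidden_states: list of equal-length float lists, one per token
                   (assumed to come from a BiLSTM; not computed here).
    head_idx, tail_idx: token positions of the two entities.
    head_type_vec, tail_type_vec: entity type embeddings (hypothetical values).
    Returns (sentence_vector, attention_weights).
    """
    dim = len(hidden_states[0])
    # Assumption: the attention query is the mean of the two entity token states.
    query = [
        (hidden_states[head_idx][d] + hidden_states[tail_idx][d]) / 2.0
        for d in range(dim)
    ]
    # Assumption: dot-product scoring between each token state and the query.
    scores = [sum(h[d] * query[d] for d in range(dim)) for h in hidden_states]
    weights = softmax(scores)
    # Weighted sum of token states gives the pooled sentence representation.
    pooled = [
        sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)
    ]
    # Entity type information is appended to enrich the sentence encoding.
    return pooled + list(head_type_vec) + list(tail_type_vec), weights


# Toy usage: four tokens with 2-dimensional states, entities at positions 1 and 3.
hidden = [[0.1, 0.0], [1.0, 0.5], [0.2, 0.1], [0.9, 0.6]]
person_type = [1.0, 0.0]
org_type = [0.0, 1.0]
sentence_vec, attn = encode_sentence(hidden, 1, 3, person_type, org_type)
```

In this toy input the two entity tokens receive the largest attention weights, which is the intended effect of attending to words near the entities. The dependency parsing module and the reinforcement learning label learner are not covered by this sketch.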

     
