Patent Entity Extraction Based on Improved BERT Algorithms: A Case Study of Graphene

Abstract: Entity relation extraction is a core step in assessing the novelty of patents. Traditional entity relation extraction is carried out serially, which has significant limitations. This paper uses two improved BERT algorithms to study the technological evolution revealed by patent entity relation extraction. The first is a named entity recognition algorithm based on an improved BERT-BiLSTM-CRF model, which combines Chinese-language features with syntactic and semantic features. The second is an entity relation extraction algorithm that combines an attention mechanism with syntactic and semantic features. Taking graphene preparation technology as a case study, numerical experiments show that the two improved algorithms can analyze patent content efficiently and reveal the dynamic evolution of graphene firms' technology.