JIA Zhen, YANG Yan, HE Da-ke. Attribute Extraction of Chinese Online Encyclopedia Based on Weakly Supervised Learning[J]. Journal of University of Electronic Science and Technology of China, 2014, 43(5): 758-763. DOI: 10.3969/j.issn.1001-0548.2014.05.022

Attribute Extraction of Chinese Online Encyclopedia Based on Weakly Supervised Learning

  • This paper proposes an attribute extraction method based on weakly supervised learning. The training corpus is acquired automatically from natural language texts by using structured attribute information from a knowledge base. Because the automatically acquired training corpus contains noise, an optimization method based on keyword filtering is proposed. An n-pattern feature extraction method is also proposed, which partly alleviates the data sparsity problem of traditional n-gram features. The experimental data were downloaded from Hudong Baike: structured attribute information was extracted from Hudong Baike infoboxes and used to construct the knowledge base, and the training and test data were acquired from the texts of encyclopedia entries. Experimental results show that keyword filtering effectively improves the quality of the training corpus, and that n-pattern features achieve better attribute extraction performance than traditional n-gram features.
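The abstract does not give implementation details. As a rough illustration only, the sketch below shows the general idea of building a weakly supervised training corpus in the style of distant supervision: a sentence from an entry's text that contains an infobox attribute value is labeled as a (noisy) example of that attribute, and a keyword filter then discards labeled sentences that contain none of the attribute's keywords. It also shows plain character n-gram extraction, i.e. the baseline features the paper compares against. The knowledge base contents, the keyword lists, the sample sentences, and the helper names `label_sentences`, `keyword_filter`, and `char_ngrams` are all hypothetical; the paper's actual keyword selection and its n-pattern features are not reproduced here.

```python
# Hypothetical knowledge base built from an infobox: entry -> {attribute: value}
KB = {
    "刘德华": {"出生地": "香港", "妻子": "朱丽倩"},
}

# Hypothetical keywords per attribute (assumption: chosen by hand for illustration)
ATTRIBUTE_KEYWORDS = {
    "出生地": ["出生", "生于"],
    "妻子": ["妻子", "结婚", "婚"],
}


def label_sentences(entry, sentences, kb):
    """Label each sentence of the entry's text with every attribute whose
    infobox value occurs in it. The result is noisy by construction."""
    examples = []
    for sent in sentences:
        for attr, value in kb.get(entry, {}).items():
            if value in sent:
                examples.append((sent, attr, value))
    return examples


def keyword_filter(examples, keywords):
    """Keep only examples whose sentence contains at least one keyword
    associated with the labeled attribute."""
    return [
        (sent, attr, value)
        for sent, attr, value in examples
        if any(k in sent for k in keywords.get(attr, []))
    ]


def char_ngrams(text, n):
    """Baseline character n-gram features of a sentence."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


if __name__ == "__main__":
    sentences = [
        "刘德华1961年出生于香港。",    # true birthplace mention
        "刘德华在香港举办演唱会。",    # value matches, but not a birthplace statement
        "2008年刘德华与朱丽倩结婚。",  # true spouse mention
        "朱丽倩出席了颁奖典礼。",      # value matches, but no spouse relation stated
    ]
    labeled = label_sentences("刘德华", sentences, KB)
    kept = keyword_filter(labeled, ATTRIBUTE_KEYWORDS)
    for sent, attr, value in kept:
        print(attr, value, char_ngrams(sent, 2)[:5])
```

In this toy run, all four sentences are labeled because each contains an infobox value, but only the first and third survive the keyword filter. This mirrors the kind of noise reduction the abstract attributes to keyword filtering, without claiming to match the paper's procedure.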