Semi-Supervised Semantic Dynamic Text Clustering Algorithm

  • Abstract: Traditional dynamic text clustering has two notable problems: texts of the same category that are worded differently are assigned to different groups, and the number of clusters produced differs markedly from the true number of categories. To address these problems, this paper proposes a semi-supervised semantic dynamic text clustering algorithm (SDCS). The algorithm represents texts semantically to capture the semantic relationships between them, and dynamically learns the semantics of each category during clustering, so that texts are clustered accurately according to their semantics. In addition, the algorithm uses semi-supervised clustering to supervise the generation of new classes, so that the clustering results match the actual situation. Experimental results show that the proposed algorithm is effective and feasible.
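The abstract does not give the details of SDCS, so the following is only a minimal sketch of the general idea it describes, not the authors' implementation. It assumes that texts have already been mapped to semantic vectors, that cosine similarity is the similarity measure, that category semantics are modelled as running-mean centroids, and that supervision consists of labelled seed vectors plus a cap on the number of clusters. The names `DynamicClusterer`, `threshold`, and `max_clusters` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class DynamicClusterer:
    """Incremental clustering over semantic vectors (illustrative only).

    Assumptions not taken from the paper: running-mean centroids,
    a fixed similarity threshold, and a cap on the cluster count.
    """

    def __init__(self, seeds, threshold=0.7, max_clusters=None):
        # seeds: dict mapping a known label to a list of labelled vectors
        self.threshold = threshold
        self.max_clusters = max_clusters
        self.centroids = {}
        self.counts = {}
        for label, vectors in seeds.items():
            for vec in vectors:
                self._absorb(label, vec)

    def _absorb(self, label, vec):
        # Update the running mean so the category representation
        # keeps adapting as texts arrive.
        n = self.counts.get(label, 0)
        old = self.centroids.get(label, [0.0] * len(vec))
        self.centroids[label] = [(c * n + v) / (n + 1) for c, v in zip(old, vec)]
        self.counts[label] = n + 1

    def assign(self, vec):
        # Find the most similar existing cluster.
        best_label, best_sim = None, -1.0
        for label, centroid in self.centroids.items():
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best_label, best_sim = label, sim
        # A new cluster is opened only when no cluster is similar
        # enough and the cap on the cluster count allows it.
        may_open = self.max_clusters is None or len(self.centroids) < self.max_clusters
        if best_label is None or (best_sim < self.threshold and may_open):
            best_label = f"new_{len(self.centroids)}"
        self._absorb(best_label, vec)
        return best_label
```

In this sketch, each incoming vector is passed to `assign`, which returns either an existing label or a freshly created one. Once `max_clusters` is reached, every vector falls back to its nearest existing cluster, which is one simple way to keep the number of clusters close to an expected number of categories.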

     
