Text Clustering Method with Improved Fitness Function and Cosine Similarity Measure

SHI Kan-sheng, LIU Hai-tao, BAI Yin-cai, SONG Wen-tao, HONG Liang-liang

Citation: SHI Kan-sheng, LIU Hai-tao, BAI Yin-cai, SONG Wen-tao, HONG Liang-liang. Text Clustering Method with Improved Fitness Function and Cosine Similarity Measure[J]. Journal of University of Electronic Science and Technology of China, 2013, 42(4): 621-624. DOI: 10.3969/j.issn.1001-0548.2013.04.017


Funding: National Natural Science Foundation of China (61073150)

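Details
    Author biography:

    SHI Kan-sheng (born 1966), male, professor. His research covers information mining, cloud computing, and the Internet of Things.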

  • Chinese Library Classification number: TP18

Text Clustering Method with Improved Fitness Function and Cosine Similarity Measure

  • Abstract: The K-means algorithm holds an important place in text clustering because of its simplicity and efficiency. The traditional K-means algorithm, however, is sensitive to its initial points and easily falls into a local optimum, so combining it with a genetic algorithm has become a trend. While keeping the efficiency of K-means, this paper uses the global adaptive optimization of the genetic algorithm to overcome the sensitivity to initial points. It also uses the cosine measure to evaluate the similarity between objects and, on that basis, constructs a new fitness function for the genetic algorithm, a new convergence criterion, and a new population update scheme. Experimental results show that the new method improves the clustering accuracy and the stability of the combined K-means and genetic algorithm.
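
The abstract names the ingredients (cosine measure, a fitness function built on it, a convergence criterion, a population update) but does not give their formulas here. The following Python sketch shows one way such a combination could look; it is not the authors' method. The fitness (mean cosine similarity of each document to its best centroid), the stopping rule (improvement below `tol`), the mutation step, and all function names are assumptions made for illustration.

```python
import math
import random


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def assign(docs, centroids):
    """Label each document with the index of its most similar centroid."""
    return [
        max(range(len(centroids)), key=lambda k: cosine_similarity(d, centroids[k]))
        for d in docs
    ]


def fitness(docs, centroids):
    """Assumed fitness: mean cosine similarity between each document and its best centroid."""
    total = sum(max(cosine_similarity(d, c) for c in centroids) for d in docs)
    return total / len(docs)


def kmeans_step(docs, centroids):
    """One K-means refinement: reassign documents, then average the members of each cluster."""
    labels = assign(docs, centroids)
    updated = []
    for k, old in enumerate(centroids):
        members = [d for d, lab in zip(docs, labels) if lab == k]
        if not members:
            updated.append(list(old))  # an empty cluster keeps its previous centroid
            continue
        updated.append([sum(m[i] for m in members) / len(members) for i in range(len(old))])
    return updated


def evolve(docs, k, pop_size=6, generations=20, tol=1e-6, seed=0):
    """Hypothetical GA loop: each individual is a set of k centroids refined by K-means."""
    rng = random.Random(seed)
    population = [[list(d) for d in rng.sample(docs, k)] for _ in range(pop_size)]
    best, best_fit = None, float("-inf")
    for _ in range(generations):
        population = [kmeans_step(docs, ind) for ind in population]
        population.sort(key=lambda ind: fitness(docs, ind), reverse=True)
        top_fit = fitness(docs, population[0])
        improvement = top_fit - best_fit
        if top_fit > best_fit:
            best, best_fit = population[0], top_fit
        if improvement < tol:  # assumed convergence criterion
            break
        survivors = population[: pop_size // 2]
        children = []
        for ind in survivors:
            # Assumed mutation: swap one centroid of a copy for a randomly chosen document.
            child = [list(c) for c in ind]
            child[rng.randrange(k)] = list(rng.choice(docs))
            children.append(child)
        population = survivors + children
    return best, best_fit
```

Because cosine similarity ignores vector length, the plain member mean is used as the centroid without normalizing it. A call such as `evolve(docs, 2)` returns the best centroid set found and its fitness, and `assign(docs, best)` then yields the cluster labels.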
Publication history
  • Received:  2011-08-28
  • Revised:  2012-04-17
  • Published:  2013-08-14
