New Unsupervised Feature Selection Method

  • Abstract: Supervised feature selection methods require class label information and therefore cannot be applied to text clustering. To address this, a new unsupervised feature selection method that combines document frequency with K-Means is proposed. The method first uses document frequency to make an initial unsupervised selection of features, and then achieves unsupervised feature selection by applying supervised feature selection methods to different K-Means clustering results. Experiments show that the method not only selects the small subset of most important features successfully, but also improves clustering quality.
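The two-stage procedure described in the abstract can be illustrated with a minimal sketch. Several details here are assumptions, not taken from the paper: the abstract does not name the supervised criterion, so chi-square is used as a stand-in; summing scores over the K-Means runs is one plausible way to combine "different clustering results"; and all function names and parameters (`df_filter`, `kmeans`, `chi2`, `select_features`, `min_df`, `ks`, `top_n`) are hypothetical.

```python
import random
from collections import Counter


def df_filter(docs, min_df):
    """Step 1: keep terms that occur in at least min_df documents."""
    df = Counter(t for d in docs for t in set(d))
    return sorted(t for t, n in df.items() if n >= min_df)


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, seed, iters=20):
    """Plain Lloyd's K-Means with farthest-point initialisation (assumed)."""
    rng = random.Random(seed)
    centers = [list(rng.choice(vectors))]
    while len(centers) < k:
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centers))
        centers.append(list(far))
    labels = []
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: sq_dist(v, centers[j]))
                      for v in vectors]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [v for v, l in zip(vectors, labels) if l == j]
            if members:  # an empty cluster keeps its previous centre
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def chi2(present, labels, k):
    """Chi-square statistic between a term's presence and cluster labels."""
    n = len(labels)
    score = 0.0
    for flag in (0, 1):
        row = sum(1 for p in present if p == flag)
        for j in range(k):
            expected = row * labels.count(j) / n
            if expected > 0:  # skip cells of empty rows or clusters
                observed = sum(1 for p, l in zip(present, labels)
                               if p == flag and l == j)
                score += (observed - expected) ** 2 / expected
    return score


def select_features(docs, min_df=2, ks=(2, 3), top_n=4):
    """Step 2: treat cluster labels as pseudo-classes and rank terms."""
    vocab = df_filter(docs, min_df)
    doc_sets = [set(d) for d in docs]
    vectors = [[1 if t in s else 0 for t in vocab] for s in doc_sets]
    totals = {t: 0.0 for t in vocab}
    for seed, k in enumerate(ks):  # one K-Means run per value of k
        labels = kmeans(vectors, k, seed)
        for i, t in enumerate(vocab):
            totals[t] += chi2([v[i] for v in vectors], labels, k)
    return sorted(vocab, key=lambda t: (-totals[t], t))[:top_n]
```

Here `docs` is a list of tokenised documents (lists of strings). A term that appears in every document scores zero under chi-square regardless of the clustering, so it ranks last even though it passes the document-frequency filter.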

     
