Abstract:
Because class label information is unavailable, supervised feature selection methods cannot be applied to text clustering. To address this, a new unsupervised feature selection method that combines Document Frequency with K-Means is proposed. The method first uses document frequency to select an initial feature set, and then performs unsupervised feature selection by applying effective supervised feature selection methods to the results of different K-Means clusterings, treating the cluster assignments as class labels. Experimental results show that the new method not only selects a small subset of the best features but also significantly improves clustering performance.
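
The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the chi-square statistic stands in for the unspecified "supervised feature selection methods", scores are simply summed across K-Means runs with different seeds, and all function names, thresholds (`min_df`, `max_df_ratio`), and the binary term representation are hypothetical choices made for the example.

```python
import random
from collections import Counter

def df_filter(docs, min_df=2, max_df_ratio=0.9):
    """Step 1: keep terms whose document frequency lies in [min_df, max_df_ratio * N]."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    return sorted(t for t, c in df.items() if min_df <= c <= max_df_ratio * n)

def vectorize(docs, vocab):
    """Binary term-presence vectors over the filtered vocabulary (an assumption)."""
    index = {t: i for i, t in enumerate(vocab)}
    vecs = []
    for d in docs:
        v = [0.0] * len(vocab)
        for t in d:
            if t in index:
                v[index[t]] = 1.0
        vecs.append(v)
    return vecs

def kmeans(vecs, k, seed, iters=20):
    """Plain Lloyd's K-Means with squared Euclidean distance; returns cluster labels."""
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(vecs, k)]
    labels = [0] * len(vecs)
    for _ in range(iters):
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centers[c])))
            for v in vecs
        ]
        for c in range(k):
            members = [v for v, l in zip(vecs, labels) if l == c]
            if members:  # leave the center unchanged if the cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def chi2_scores(vecs, labels, k):
    """Chi-square of each binary term against the cluster labels (max over clusters)."""
    n = len(vecs)
    scores = []
    for j in range(len(vecs[0])):
        best = 0.0
        for c in range(k):
            a = sum(1 for v, l in zip(vecs, labels) if v[j] and l == c)
            b = sum(1 for v, l in zip(vecs, labels) if v[j] and l != c)
            cc = sum(1 for v, l in zip(vecs, labels) if not v[j] and l == c)
            d = n - a - b - cc
            denom = (a + b) * (cc + d) * (a + cc) * (b + d)
            if denom:
                best = max(best, n * (a * d - b * cc) ** 2 / denom)
        scores.append(best)
    return scores

def select_features(docs, k, top_n, runs=5):
    """Step 2: score terms on several K-Means results and keep the top_n terms."""
    vocab = df_filter(docs)
    vecs = vectorize(docs, vocab)
    total = [0.0] * len(vocab)
    for seed in range(runs):  # "different K-Means clustering results"
        labels = kmeans(vecs, k, seed)
        for j, s in enumerate(chi2_scores(vecs, labels, k)):
            total[j] += s  # summing across runs is an assumed aggregation rule
    ranked = sorted(range(len(vocab)), key=lambda j: -total[j])
    return [vocab[j] for j in ranked[:top_n]]
```

Here `docs` is assumed to be a list of tokenized documents (lists of strings), so a call such as `select_features(docs, k=2, top_n=3)` returns the three highest-scoring terms. Any other supervised criterion, such as information gain, could replace `chi2_scores` without changing the overall structure.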