Abstract:
This paper proposes a new algorithm for document-word co-clustering. After mining semantics with word hyperclique patterns, the document dataset is represented as a bipartite graph. An efficient graph partitioning algorithm is then used to partition this graph, avoiding the high computational overhead that traditional clustering algorithms incur on huge document datasets. During clustering, word hyperclique patterns, which carry rich document semantics, are preserved. In this way, our algorithm partially circumvents the problem of losing document semantics, which frequently occurs in traditional clustering algorithms based on pairwise document similarity alone. Finally, extensive experimental results demonstrate the effectiveness of this algorithm in terms of document clustering accuracy and cluster topic detection.
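
To make the two building blocks named above concrete, the following is a minimal sketch and not the paper's implementation. It assumes the standard definition of h-confidence for a hyperclique pattern (the support of the pattern divided by the largest support of any single word in it), and it assumes a bipartite graph with documents on one side and words on the other. The function names, the toy corpus, and the 0.8 threshold are all hypothetical, and the graph partitioning step itself is omitted.

```python
from itertools import combinations


def support(pattern, docs):
    """Fraction of documents that contain every word in `pattern`."""
    pattern = set(pattern)
    return sum(1 for d in docs if pattern <= d) / len(docs)


def h_confidence(pattern, docs):
    """Pattern support divided by the largest single-word support (assumed definition)."""
    max_single = max(support({w}, docs) for w in pattern)
    return support(pattern, docs) / max_single if max_single else 0.0


def hyperclique_pairs(docs, min_hconf):
    """Brute-force search over word pairs only; a real miner would prune and go beyond pairs."""
    vocab = sorted(set().union(*docs))
    return [p for p in combinations(vocab, 2) if h_confidence(p, docs) >= min_hconf]


def bipartite_edges(docs):
    """Document-word edges of the assumed bipartite graph (unweighted for simplicity)."""
    return [(f"d{i}", w) for i, d in enumerate(docs) for w in sorted(d)]


# Hypothetical toy corpus: each document is a set of words.
docs = [
    {"graph", "partition", "cluster"},
    {"graph", "partition"},
    {"word", "cluster"},
    {"word", "semantics"},
]

patterns = hyperclique_pairs(docs, min_hconf=0.8)  # hypothetical threshold
edges = bipartite_edges(docs)
```

On this toy corpus, "graph" and "partition" always occur together, so they form the only pair that passes the threshold. The edge list would then be handed to a graph partitioner, which this sketch leaves out.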