High-dimensional text clustering by dimensionality reduction and improved density peak

Publisher

Hindawi and Wiley

Abstract

This study addresses the clustering of high-dimensional text data. K-means cannot handle high-dimensional data well, requires the number of clusters to be specified in advance, and selects its initial centers at random. We propose a Stacked-Random Projection dimensionality reduction framework and DPC-K-means, an enhanced K-means algorithm based on an improved density peaks algorithm. The improved density peaks algorithm determines both the number of clusters and the initial cluster centers for K-means. The proposed algorithm is validated on seven text datasets. Experimental results show that, by correcting these defects of K-means, it is suitable for clustering text data.
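
The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. The abstract gives no algorithmic detail, so several parts are assumptions rather than the authors' method: "stacked" is read as chaining Gaussian random projections, the density peaks step uses the standard scores (local density rho and distance delta to the nearest denser point) rather than the paper's improved variant, and centers are picked by a simple relative threshold on rho * delta. All function names and the parameters `cutoff_quantile` and `ratio` are hypothetical.

```python
import math
import random

def random_projection(X, out_dim, rng):
    """Map each row of X to out_dim dimensions with a Gaussian random matrix."""
    in_dim = len(X[0])
    scale = 1.0 / math.sqrt(out_dim)  # keeps squared norms unchanged in expectation
    R = [[rng.gauss(0.0, 1.0) * scale for _ in range(out_dim)]
         for _ in range(in_dim)]
    return [[sum(x[i] * R[i][j] for i in range(in_dim)) for j in range(out_dim)]
            for x in X]

def stacked_random_projection(X, dims, seed=0):
    """ASSUMPTION: 'stacked' means chaining one projection per entry of dims."""
    rng = random.Random(seed)
    for d in dims:
        X = random_projection(X, d, rng)
    return X

def density_peak_centers(X, cutoff_quantile=0.2, ratio=0.5):
    """Standard density peaks scores; the selection rule is an assumed heuristic."""
    n = len(X)
    D = [[math.dist(a, b) for b in X] for a in X]
    pairs = sorted(D[i][j] for i in range(n) for j in range(i + 1, n))
    d_c = pairs[int(cutoff_quantile * len(pairs))] or 1.0  # cutoff distance
    # rho: Gaussian-kernel local density around each point.
    rho = [sum(math.exp(-(D[i][j] / d_c) ** 2) for j in range(n) if j != i)
           for i in range(n)]
    # delta: distance to the nearest point that ranks higher in density.
    order = sorted(range(n), key=lambda i: -rho[i])
    delta = [0.0] * n
    delta[order[0]] = max(D[order[0]])  # densest point gets its largest distance
    for pos in range(1, n):
        i = order[pos]
        delta[i] = min(D[i][j] for j in order[:pos])
    gamma = [rho[i] * delta[i] for i in range(n)]
    top = max(gamma)
    # ASSUMPTION: a point is a center if its score is within `ratio` of the best.
    return [i for i in range(n) if gamma[i] >= ratio * top]

def kmeans(X, centers, iters=100):
    """Lloyd's algorithm started from the given centers (no random initialisation)."""
    centers = [list(c) for c in centers]
    labels = None
    for _ in range(iters):
        new_labels = [min(range(len(centers)),
                          key=lambda c: math.dist(x, centers[c])) for x in X]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(len(centers)):
            members = [x for x, l in zip(X, labels) if l == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

def dpc_kmeans(X):
    """Density peaks choose k and the initial centers; K-means refines them."""
    seeds = density_peak_centers(X)
    labels, _ = kmeans(X, [X[i] for i in seeds])
    return len(seeds), labels

# Toy data: two tight groups in 2-D, so no projection is needed here.
square = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5)]
toy = square + [(x + 10, y + 10) for x, y in square]
k, labels = dpc_kmeans(toy)
print(k, labels)  # 2 [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

# Hypothetical high-dimensional input: 6 vectors of length 50, reduced 50 -> 20 -> 8.
rng = random.Random(1)
high = [[rng.random() for _ in range(50)] for _ in range(6)]
low = stacked_random_projection(high, dims=[20, 8], seed=0)
print(len(low), len(low[0]))  # 6 8
```

On real text data the input rows would be document vectors such as TF-IDF weights, and the projected output `low` would be passed to `dpc_kmeans`. The relative threshold is only a convenient stand-in for choosing centers automatically and would likely need tuning or replacement on less cleanly separated data.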

Citation

Wireless Communications and Mobile Computing. 2020, vol. 2020, art. no. 8881112.