Scalable Parallel K-Means Clustering on GPU and CPU Clusters

Rahmani, Saeid; Ahmadzadeh, Armin; Hajihassani, Omid; Rahmati, Dara; Gorgin, Saeid

doi:10.48308/jicse.2023.231121.1019

Document Type : Original Article

Authors

¹ Institute for Research in Fundamental Sciences (IPM)

² Sharif University of Technology, Department of Computer Engineering

³ Faculty of Computer Science and Engineering, Shahid Beheshti University

⁴ Department of Iranian Research Organization for Science and Technology (IROST), Electrical Engineering and Information Technology

https://doi.org/10.48308/jicse.2023.231121.1019

Abstract

K-means clustering is one of the most prominent clustering methods and is used in many applications. Given how widely k-means is applied, redesigning it for high-performance computing can have a considerable impact. In this paper, we focus on scalability and exploit the available resources at different levels of parallelism. We propose novel techniques for different hardware platforms and evaluate them separately on uniformly random generated datasets of different sizes. We recast the classic two-stage Lloyd’s formulation as a three-stage one and apply different techniques to each stage. In addition, we use an algebraic technique to reduce the amount of computation and to lay the foundation for the subsequent ideas. On CPUs, we propose a parallel architecture based on OpenMP and the AVX2 instruction set. On GPUs, we utilize atomic operations and shared memory without considering GPU memory and shared memory capabilities. The proposed method extends to multiple GPUs. We merge these techniques and use MPI to scale the method to multi-node platforms.
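
The abstract does not say which algebraic technique is used. One identity commonly applied in k-means is ||x - c||^2 = ||x||^2 - 2 x·c + ||c||^2: since ||x||^2 is the same for every centroid, the nearest centroid can be found by minimizing ||c||^2 - 2 x·c, with ||c||^2 computed once per iteration. The following is a minimal serial sketch under that assumption, not the authors' implementation. It keeps the classic two-step Lloyd's form (the three-stage split is not detailed in the abstract), and the function names assign, update, and kmeans are hypothetical.

```python
def assign(points, centroids):
    """Assignment step: pick the centroid minimizing ||c||^2 - 2 x.c.

    This has the same argmin as ||x - c||^2 because ||x||^2 does not
    depend on the centroid (assumed identity, see the text above).
    """
    # Squared centroid norms are computed once and reused for every point.
    c_norms = [sum(v * v for v in c) for c in centroids]
    labels = []
    for x in points:
        scores = [cn - 2.0 * sum(a * b for a, b in zip(x, c))
                  for c, cn in zip(centroids, c_norms)]
        labels.append(min(range(len(centroids)), key=scores.__getitem__))
    return labels


def update(points, labels, centroids):
    """Update step: move each centroid to the mean of its assigned points."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for x, j in zip(points, labels):
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += x[d]
    # A centroid with no assigned points keeps its previous position.
    return [[s / counts[j] for s in sums[j]] if counts[j] else list(centroids[j])
            for j in range(len(centroids))]


def kmeans(points, centroids, iterations=10):
    """Run a fixed number of Lloyd iterations; return centroids and labels."""
    centroids = [list(c) for c in centroids]
    for _ in range(iterations):
        labels = assign(points, centroids)
        centroids = update(points, labels, centroids)
    return centroids, assign(points, centroids)
```

For example, kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], [(0, 0), (10, 10)]) separates the two lower-left points from the two upper-right ones. In a parallel setting the inner dot products are the part that would typically map onto SIMD lanes or GPU threads, but how the paper distributes this work is not stated in the abstract.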

Main Subjects

Artificial Intelligence, Robotics, Cognitive Computing

Journal of Innovations in Computer Science and Engineering (JICSE)

Volume 1, Issue 1 - Serial Number 1
June 2023
Pages 102-119
