<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE ArticleSet PUBLIC "-//NLM//DTD PubMed 2.7//EN" "https://dtd.nlm.nih.gov/ncbi/pubmed/in/PubMed.dtd">
<ArticleSet>
<Article>
<Journal>
				<PublisherName>Shahid Beheshti University</PublisherName>
				<JournalTitle>Journal of Innovations in Computer Science and Engineering (JICSE)</JournalTitle>
				<Issn>2981-2135</Issn>
				<Volume>1</Volume>
				<Issue>1</Issue>
				<PubDate PubStatus="epublish">
					<Year>2023</Year>
					<Month>06</Month>
					<Day>01</Day>
				</PubDate>
			</Journal>
<ArticleTitle>Scalable Parallel K-Means Clustering on GPU and CPU Clusters</ArticleTitle>
<VernacularTitle></VernacularTitle>
			<FirstPage>102</FirstPage>
			<LastPage>119</LastPage>
			<ELocationID EIdType="pii">104047</ELocationID>
			
<ELocationID EIdType="doi">10.48308/jicse.2023.231121.1019</ELocationID>
			
			<Language>EN</Language>
<AuthorList>
<Author>
					<FirstName>Saeid</FirstName>
					<LastName>Rahmani</LastName>
<Affiliation>Institute for Research in Fundamental Sciences (IPM)</Affiliation>

</Author>
<Author>
					<FirstName>Armin</FirstName>
					<LastName>Ahmadzadeh</LastName>
<Affiliation>Department of Computer Engineering, Sharif University of Technology</Affiliation>

</Author>
<Author>
					<FirstName>Omid</FirstName>
					<LastName>Hajihassani</LastName>
<Affiliation>Institute for Research in Fundamental Sciences (IPM)</Affiliation>

</Author>
<Author>
					<FirstName>Dara</FirstName>
					<LastName>Rahmati</LastName>
<Affiliation>Faculty of Computer Science and Engineering, Shahid Beheshti University</Affiliation>

</Author>
<Author>
					<FirstName>Saeid</FirstName>
					<LastName>Gorgin</LastName>
<Affiliation>Department of Electrical Engineering and Information Technology, Iranian Research Organization for Science and Technology (IROST)</Affiliation>

</Author>
</AuthorList>
				<PublicationType>Journal Article</PublicationType>
			<History>
				<PubDate PubStatus="received">
					<Year>2023</Year>
					<Month>03</Month>
					<Day>18</Day>
				</PubDate>
			</History>
		<Abstract>K-means clustering is one of the most prominent clustering methods and is used in many applications. Given how widely k-means is applied, redesigning it for high-performance computing has considerable impact. In this paper, we focus on scalability and exploit the available resources at different levels of parallelism. We propose novel techniques for different hardware platforms and evaluate each separately on uniformly random generated datasets of different sizes. We recast the classic two-stage Lloyd’s formulation as a three-stage formulation that applies a different technique to each stage. In addition, we use an algebraic technique to reduce the amount of computation and to lay the foundation for the subsequent ideas. On CPUs, we propose a parallel architecture based on OpenMP and the AVX2 instruction set. On GPUs, we use atomic operations and shared memory, without depending on specific GPU memory or shared memory capabilities. The proposed method extends to multiple GPUs. We merge these techniques and use MPI to scale the method to multi-node platforms.</Abstract>
		<ObjectList>
			<Object Type="keyword">
			<Param Name="value">K-Means</Param>
			</Object>
			<Object Type="keyword">
			<Param Name="value">CUDA</Param>
			</Object>
			<Object Type="keyword">
			<Param Name="value">GPU</Param>
			</Object>
			<Object Type="keyword">
			<Param Name="value">Multi-GPU</Param>
			</Object>
			<Object Type="keyword">
			<Param Name="value">MPI</Param>
			</Object>
			<Object Type="keyword">
			<Param Name="value">OpenMP</Param>
			</Object>
			<Object Type="keyword">
			<Param Name="value">AVX2</Param>
			</Object>
		</ObjectList>
<ArchiveCopySource DocType="pdf">https://jicse.sbu.ac.ir/article_104047_68992ec4ebf6a015151ea33f51320016.pdf</ArchiveCopySource>
</Article>
</ArticleSet>
