Some techniques to speed-up k-means and kernel k-means clustering methods for large datasets

Some techniques to speed-up k-means and kernel k-means clustering methods for large datasets

dc.contributor.guide	Viswanath P	en_US
dc.contributor.guide	Reddy, B Eswara
dc.coverage.spatial	Computer Science	en_US
dc.creator.researcher	Sarma, T Hitendra	en_US
dc.date.accessioned	2013-12-11T07:36:17Z
dc.date.available	2013-12-11T07:36:17Z
dc.date.awarded	04/12/2013	en_US
dc.date.completed	05/07/2012	en_US
dc.date.issued	2013-12-11
dc.date.registered	20/08/2009	en_US
dc.description.abstract	Data clustering is an unsupervised learning activity which is a process of finding natural groups (clusters) present in the given dataset (i.e., the given set of patterns). It has several applications, like image segmentation, video analysis, bio-informatics, intrusion detection, outlier detection, etc. New application domains and amassed data poses new challenges in the area of data clustering. Different types of data clustering methods have been evolved to cater these upcoming challenges. Among the existing clustering methods, the simplest and efficient clustering method is the k-means clustering method. It has been shown to produce good clustering results in various applications. The time complexity of the k-means method linearly grows with respect to the size of the dataset. In the iterative process of the k-means method, the entire dataset has to be scanned once in each iteration, which is a time consuming process in case of large datasets. Hence, the k-means method is not a suitable one to work with large datasets which do not fit in the main memory. Further, the method fails in identifying non-convex shaped and linearly inseparable clusters in the input space. The kernel k-means clustering method is a nonlinear extension of the k-means method. By implicitly mapping data points to a higher-dimensional feature space (induced space)using a non linear transformation, the kernel k-means method can discover clusters that are linearly inseparable in the input space. But, the time complexity of this method grows equadratically with respect to the size of the dataset. Hence, the kernel k-means clustering method is also not a suitable one for large datasets. The present thesis is about speeding-up he k-means and kernel k-means clustering methods to work with large datasets. In order to speed-up the k-means method, the thesis proposes two prototype based hybrid approaches, which give the same result as that obtained by using the conventional kmeans method.	en_US
dc.description.note	Summary p. 146-147, References p. 148-172	en_US
dc.format.accompanyingmaterial	None	en_US
dc.format.dimensions	--	en_US
dc.format.extent	172p.	en_US
dc.identifier.uri	http://hdl.handle.net/10603/13964
dc.language	English	en_US
dc.publisher.institution	Department of Computer Science and Engineering	en_US
dc.publisher.place	Anantapuram	en_US
dc.publisher.university	Jawaharlal Nehru Technological University, Anantapuram	en_US
dc.relation	206	en_US
dc.rights	university	en_US
dc.source.university	University	en_US
dc.subject.keyword	Kernel k-means clustering methods	en_US
dc.subject.keyword	Speed-Up K-Means	en_US
dc.title	Some techniques to speed-up k-means and kernel k-means clustering methods for large datasets	en_US
dc.type.degree	Ph.D.	en_US