Some techniques to speed-up k-means and kernel k-means clustering methods for large datasets

dc.contributor.guide: Viswanath P
dc.contributor.guide: Reddy, B Eswara
dc.coverage.spatial: Computer Science
dc.creator.researcher: Sarma, T Hitendra
dc.date.accessioned: 2013-12-11T07:36:17Z
dc.date.available: 2013-12-11T07:36:17Z
dc.date.awarded: 04/12/2013
dc.date.completed: 05/07/2012
dc.date.issued: 2013-12-11
dc.date.registered: 20/08/2009
dc.description.abstract: Data clustering is an unsupervised learning activity: the process of finding natural groups (clusters) present in a given dataset (i.e., the given set of patterns). It has several applications, such as image segmentation, video analysis, bioinformatics, intrusion detection, and outlier detection. New application domains and amassed data pose new challenges in the area of data clustering, and different types of clustering methods have evolved to meet these challenges. Among the existing clustering methods, the k-means method is among the simplest and most efficient, and it has been shown to produce good clustering results in various applications. The time complexity of the k-means method grows linearly with the size of the dataset. Because the method is iterative, the entire dataset has to be scanned once in each iteration, which is time-consuming for large datasets. Hence, the k-means method is not suitable for large datasets that do not fit in main memory. Further, the method fails to identify non-convex and linearly inseparable clusters in the input space. The kernel k-means clustering method is a nonlinear extension of the k-means method. By implicitly mapping data points to a higher-dimensional feature space (the induced space) using a nonlinear transformation, the kernel k-means method can discover clusters that are linearly inseparable in the input space. However, the time complexity of this method grows quadratically with the size of the dataset. Hence, the kernel k-means clustering method is also not suitable for large datasets. The present thesis is about speeding up the k-means and kernel k-means clustering methods so that they work with large datasets.
In order to speed up the k-means method, the thesis proposes two prototype-based hybrid approaches, which give the same result as that obtained by using the conventional k-means method.
dc.description.note: Summary p. 146-147, References p. 148-172
dc.format.accompanyingmaterial: None
dc.format.dimensions: --
dc.format.extent: 172p.
dc.identifier.uri: http://hdl.handle.net/10603/13964
dc.language: English
dc.publisher.institution: Department of Computer Science and Engineering
dc.publisher.place: Anantapuram
dc.publisher.university: Jawaharlal Nehru Technological University, Anantapuram
dc.relation: 206
dc.rights: university
dc.source.university: University
dc.subject.keyword: Kernel k-means clustering methods
dc.subject.keyword: Speed-Up K-Means
dc.title: Some techniques to speed-up k-means and kernel k-means clustering methods for large datasets
dc.type.degree: Ph.D.
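
The abstract contrasts the linear per-iteration cost of k-means with the quadratic cost of kernel k-means. The sketch below illustrates only that contrast for the two conventional baseline methods; it is not the thesis's prototype-based speed-up, which this record does not describe. The function names, the Gaussian (RBF) kernel, and the `gamma` parameter are assumptions chosen for illustration.

```python
import math


def rbf(x, y, gamma=1.0):
    """Gaussian (RBF) kernel; kernel choice and gamma are illustrative assumptions."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def kmeans_assign(points, centers):
    """One k-means assignment pass: n * k distance evaluations (linear in n)."""
    labels = []
    for p in points:
        dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
        labels.append(dists.index(min(dists)))
    return labels


def kernel_kmeans_assign(points, labels, k, kernel=rbf):
    """One kernel k-means assignment pass.

    The squared feature-space distance from point i to the mean of cluster C is
        K[i][i] - (2/|C|) * sum_{j in C} K[i][j]
                + (1/|C|^2) * sum_{j,l in C} K[j][l]
    so the pass touches the full n x n kernel matrix (quadratic in n).
    """
    n = len(points)
    K = [[kernel(points[i], points[j]) for j in range(n)] for i in range(n)]
    clusters = [[i for i in range(n) if labels[i] == c] for c in range(k)]
    # The third term depends only on the cluster, so compute it once per cluster.
    within = [
        sum(K[j][l] for j in C for l in C) / len(C) ** 2 if C else 0.0
        for C in clusters
    ]
    new_labels = []
    for i in range(n):
        best, best_d = labels[i], math.inf
        for c, C in enumerate(clusters):
            if not C:
                continue  # skip empty clusters; the point keeps its old label if all are empty
            d = K[i][i] - 2.0 * sum(K[i][j] for j in C) / len(C) + within[c]
            if d < best_d:
                best, best_d = c, d
        new_labels.append(best)
    return new_labels


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
km_labels = kmeans_assign(points, centers=[(0.0, 0.0), (5.0, 5.0)])
kkm_labels = kernel_kmeans_assign(points, labels=km_labels, k=2)
```

In this sketch the k-means pass needs only the `k` cluster centers plus one scan of the data, whereas the kernel k-means pass builds all `n * n` kernel values, which is the reason the abstract gives for the latter being unsuitable for large datasets.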

Files

Original bundle

Now showing 1 - 5 of 13

Name: 01_title.pdf
Size: 38.76 KB
Format: Adobe Portable Document Format
Description: Attached File

Name: 02_certificate.pdf
Size: 25.21 KB
Format: Adobe Portable Document Format

Name: 03_acknowledgements.pdf
Size: 15.23 KB
Format: Adobe Portable Document Format

Name: 04_contents.pdf
Size: 34.75 KB
Format: Adobe Portable Document Format

Name: 05_abstract.pdf
Size: 22.67 KB
Format: Adobe Portable Document Format

License bundle

Now showing 1 - 1 of 1

Name: license.txt
Size: 1.79 KB
Format: Plain Text