DESIGN AND DEVELOPMENT OF EFFICIENT CLUSTERING TECHNIQUES IN DATA MINING

Abstract

Data mining is the process of extracting useful patterns or knowledge from the large volumes of data collected in information systems, and using these patterns to make safe and smart decisions. The predefined methods and algorithms used to extract these patterns are called data mining techniques.

Clustering is a data mining technique that divides a given dataset into groups, or clusters, so that objects in the same group are more similar to each other than to objects in other groups. Many clustering algorithms have been proposed in the literature. They are broadly classified into two categories: hierarchical and partitional. The K-Means algorithm is one of the most commonly used techniques in the partitional category.

K-Means is a simple algorithm known for its speed. It is inexpensive in terms of computational cost and works well with high-dimensional and large datasets. However, the algorithm has some limitations. A major one is that the number of clusters (K) must be specified as input in advance. The value of K is domain specific, and it is sometimes difficult to predict the number of clusters required, for example when the dataset is unknown or new. In such cases the data may be grouped inefficiently. These limitations of K-Means carry over to its extensions, K-Modes and K-Prototype. Various extensions of K-Means for numerical, categorical and mixed datasets have been proposed in the literature to remove the need to provide K as input, but these algorithms either require some input parameter other than K or are computationally complex.

K-Modes, an extension of K-Means for categorical data, is known for its simplicity and speed. Because it is designed for categorical data, K-Modes uses the Simple Matching Dissimilarity measure instead of Euclidean distance, and the modes of clusters instead of their means.
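The K-Modes procedure described above can be illustrated with a short Python sketch. It is a minimal, simplified batch variant (assign every record, then recompute every mode), not the implementation used in this work. The function names `matching_dissimilarity`, `mode_of` and `k_modes`, the choice of K distinct random records as initial modes, and the rule of keeping an empty cluster's previous mode are all illustrative assumptions. The Simple Matching Dissimilarity between two records is the number of attributes on which their categories differ, and a cluster's mode takes the most frequent category of each attribute.

```python
from collections import Counter
import random


def matching_dissimilarity(a, b):
    """Simple Matching Dissimilarity: count of attributes where a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def mode_of(records):
    """Attribute-wise mode: most frequent category in each column.

    Counter.most_common orders equal counts by first occurrence,
    so ties go to the category seen first.
    """
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*records))


def k_modes(data, k, max_iter=100, seed=0):
    """Simplified batch K-Modes (illustrative sketch, not the thesis's method).

    data: list of hashable tuples of categorical values.
    k: number of clusters; like K-Means, it must be supplied in advance.
    Returns (labels, modes).
    """
    rng = random.Random(seed)
    # Assumption: initial modes are k distinct records chosen at random.
    distinct = list(dict.fromkeys(data))
    modes = rng.sample(distinct, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each record joins the cluster whose mode is closest.
        new_labels = [
            min(range(k), key=lambda j: matching_dissimilarity(x, modes[j]))
            for x in data
        ]
        if new_labels == labels:  # no reassignment, so the clustering has converged
            break
        labels = new_labels
        # Update step: replace each cluster's mode with the mode of its members.
        for j in range(k):
            members = [x for x, label in zip(data, labels) if label == j]
            if members:  # assumption: an empty cluster keeps its previous mode
                modes[j] = mode_of(members)
    return labels, modes


if __name__ == "__main__":
    records = [
        ("red", "small", "round"),
        ("red", "small", "round"),
        ("red", "small", "square"),
        ("blue", "large", "square"),
        ("blue", "large", "square"),
        ("blue", "large", "round"),
    ]
    labels, modes = k_modes(records, k=2)
    print(labels, modes)
```

Note that `k` remains a required argument, which is exactly the limitation of K-Means and its extensions that the abstract points out.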
