An efficient sentiment classification technique in Hadoop framework using optimized tree

Abstract

Big data analysis requires fast mining of large-scale data sets, i.e., an immense amount of data must be processed in a limited time to reveal useful information. As computing power improves, larger volumes of data can be processed, and the more data are retrieved and processed, the better the understanding of the problem that can be obtained. Feature extraction is the process whereby subsets of features are extracted from the data for use by a learning algorithm. Classification is the problem of identifying to which of a set of categories a new observation belongs, on the basis of a training set of data containing observations whose category membership is known. Feature selection can address the curse of dimensionality by selecting only the features relevant to classification. By eliminating irrelevant and redundant features, feature selection can reduce the number of features, cut down training time, simplify the learned classifiers and improve classification performance. For big data, Hadoop provides a platform on which users can develop their own sentiment analysis with the help of a lexicon dictionary, available application programming interfaces (APIs) or external programs. The aim of classifying data is to analyze large data sets and develop an appropriate description or model for each class from the features present in the data. This work uses the Hadoop framework to obtain an effective classification with the Random Forest (RF) technique, with feature extraction performed using the Term Frequency-Inverse Document Frequency (TF-IDF) technique. Term frequency (TF) is the number of times a term appears in a document. Document frequency (DF) is the number of documents that include a term. Inverse document frequency (IDF) measures how much information a term provides, i.e., whether the term is rare or common across the document collection. TF-IDF is the product of TF and IDF.
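The TF, DF, IDF and TF-IDF definitions above can be illustrated with a small single-process sketch. It is not the authors' Hadoop implementation. TF is the raw count of a term in a document, as described above. The abstract does not give the IDF formula, so this sketch assumes the common variant idf(t) = ln(N / df(t)), where N is the number of documents. The whitespace tokenizer `tokenize` and the function name `tf_idf` are hypothetical simplifications for illustration only.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer: lowercase + whitespace split only.
    # A real sentiment pipeline would also strip punctuation, stop words, etc.
    return text.lower().split()


def tf_idf(docs: dict[str, str]) -> dict[str, dict[str, float]]:
    """Return {doc_id: {term: tf * idf}}.

    Assumptions: TF is the raw term count in a document, and
    IDF = ln(N / DF), where N is the number of documents.
    """
    # TF: how many times each term appears in each document.
    tf = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}

    # DF: number of documents that contain each term (each document counts once).
    df: Counter = Counter()
    for counts in tf.values():
        df.update(counts.keys())

    # IDF: rare terms get a large weight; a term found in every document gets 0.
    n_docs = len(docs)
    idf = {term: math.log(n_docs / d) for term, d in df.items()}

    # TF-IDF: product of TF and IDF for every (document, term) pair.
    return {
        doc_id: {term: count * idf[term] for term, count in counts.items()}
        for doc_id, counts in tf.items()
    }


if __name__ == "__main__":
    docs = {"d1": "good movie good plot", "d2": "bad movie", "d3": "good acting"}
    scores = tf_idf(docs)
    print(round(scores["d1"]["good"], 4))  # 0.8109  (2 * ln(3/2))
    print(round(scores["d1"]["plot"], 4))  # 1.0986  (1 * ln(3/1))
```

Counting terms in one document does not depend on any other document, so that step parallelizes naturally. Computing DF, by contrast, requires aggregating by term across all documents. In a MapReduce setting such as Hadoop, this split would roughly correspond to a map phase followed by a reduce phase, although the exact job layout used in this work is not described here.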
RF is a non-parametric supervised classification method that can be considered as a Classification and Regression
