Analyzing Big Data Originated from Social Networks and Data Communication Networks
Abstract
The dissertation is divided into two parts. The first part discusses sentiment analysis of tweets generated in Big Data form, using machine learning algorithms. The second part discusses fine-tuning the resource allocation mechanism of a distributed, multinode Apache Spark cluster. This results in faster processing and analysis of Big Data originating from data communication networks, using the K-means machine learning algorithm.
In the first part of the dissertation, a multi-tier architecture is proposed for performing sentiment classification. It includes several modules: preprocessing, data cleaning, tokenization, stemming, an updated set of stopwords, lexicon and emoticon dictionaries, and mechanisms for selecting the best features.
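The module chain listed above can be pictured with a minimal sketch. It is not the dissertation's implementation. The stopword set, the emoticon dictionary, the suffix list and every function name are hypothetical placeholders, since the abstract does not give the actual resources, and the feature-selection stage is left out.

```python
import re

# Illustrative resources only (assumptions): the dissertation's actual stopword
# list and lexicon/emoticon dictionaries are not given in the abstract.
STOPWORDS = {"a", "an", "the", "is", "it", "to", "and", "of", "this", "so"}
EMOTICONS = {":)": "emo_pos", ":-)": "emo_pos", ":(": "emo_neg", ":-(": "emo_neg"}
SUFFIXES = ("ing", "ed", "ly", "s")  # crude stand-in for a real stemmer


def clean(text: str) -> str:
    """Lowercase the tweet and drop URLs, @mentions and the '#' of hashtags."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"@\w+", " ", text)
    return text.replace("#", " ")


def tokenize(text: str) -> list[str]:
    """Split on whitespace, map emoticons to placeholder tokens, strip punctuation."""
    tokens = []
    for raw in text.split():
        if raw in EMOTICONS:
            tokens.append(EMOTICONS[raw])
            continue
        word = re.sub(r"[^a-z0-9_]", "", raw)
        if word:
            tokens.append(word)
    return tokens


def stem(token: str) -> str:
    """Strip one common suffix, keeping at least three characters of the stem."""
    if token.startswith("emo_"):
        return token  # emoticon placeholders are never stemmed
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(tweet: str) -> list[str]:
    """Run cleaning, tokenization, stopword removal and stemming in order."""
    tokens = tokenize(clean(tweet))
    return [stem(t) for t in tokens if t not in STOPWORDS]


print(preprocess("Loving the new phone :) http://t.co/x @bob #happy"))
```

Mapping emoticons to placeholder tokens before punctuation is stripped keeps their sentiment signal available to later stages. A production pipeline would likely replace the suffix stripper with an established stemmer.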
In sentiment analysis, before training a Machine Learning (ML) model, the user of the software tool must manually select an ML algorithm and tune the model parameters, since the algorithm and its parameter values greatly affect the model's performance. Selecting and fine-tuning them, however, requires considerable expertise and labor-intensive iterations. Automating this process is therefore needed to make ML accessible to lay users with limited computing and programming expertise.
In particular, there is no single model that achieves the highest accuracy for every dataset in a specific application domain. Trying out several ML algorithms with varying parameter configurations is tedious, time-consuming and inefficient. Hence, automating the ML modelling process is of great importance. In the proposed method, the algorithm automatically selects the best-performing ML algorithm for a particular dataset by optimizing the parameter settings for the selected algorithm. This yields much better performance than selecting an algorithm with its default settings.
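One simple way to picture automated selection is an exhaustive search over candidate algorithms and their parameter grids, scored by accuracy on held-out data. The sketch below assumes that strategy; the abstract does not say which search method the dissertation uses. The two toy classifiers, the one-dimensional "lexicon score" feature and all names are hypothetical stand-ins for real ML learners.

```python
from itertools import product


def accuracy(model, samples):
    """Fraction of (feature, label) pairs that the model labels correctly."""
    correct = sum(1 for x, y in samples if model(x) == y)
    return correct / len(samples)


# Two toy "algorithms" (hypothetical). Each factory takes the training data
# plus hyperparameters and returns a predictor function.
def threshold_classifier(train, cutoff=0.0):
    """Label a tweet positive when its lexicon score exceeds `cutoff`."""
    return lambda score: "pos" if score > cutoff else "neg"


def nearest_mean_classifier(train, shrink=1.0):
    """Assign the class whose (shrunken) mean training score is closest."""
    means = {}
    for label in {y for _, y in train}:
        scores = [x for x, y in train if y == label]
        means[label] = shrink * sum(scores) / len(scores)
    return lambda score: min(sorted(means), key=lambda lab: abs(score - means[lab]))


def select_best(candidates, train, valid):
    """Try every algorithm with every parameter combination; keep the most accurate."""
    best = None
    for name, (factory, grid) in candidates.items():
        keys = sorted(grid)
        for values in product(*(grid[k] for k in keys)):
            params = dict(zip(keys, values))
            score = accuracy(factory(train, **params), valid)
            if best is None or score > best[0]:
                best = (score, name, params)
    return best


# Made-up data: (lexicon score, sentiment label).
train = [(-2.0, "neg"), (-1.0, "neg"), (1.0, "pos"), (3.0, "pos")]
valid = [(-0.5, "neg"), (0.4, "neg"), (0.8, "pos"), (2.0, "pos")]
candidates = {
    "threshold": (threshold_classifier, {"cutoff": [0.0, 0.5]}),
    "nearest_mean": (nearest_mean_classifier, {"shrink": [0.5, 1.0]}),
}

default_accuracy = accuracy(threshold_classifier(train), valid)
best = select_best(candidates, train, valid)
print(default_accuracy, best)
```

On this made-up data the tuned threshold classifier scores higher on the validation set than the same classifier with its default cutoff, which mirrors the claim about default settings. Exhaustive search grows quickly with the number of parameters, so larger search spaces usually call for a cheaper strategy.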
The proposed model is a multi-layered ML architecture, with accuracy as the evaluation criterion for analyzing the tweets.
Tuning