Study and Analysis of Document Mining Using Optimization Techniques
Abstract
In this digital era, the internet is an important medium for communication. Every day, internet users generate a vast amount of data in the WWW repository in the form of text such as emails, tweets, product and movie reviews, discussion threads, chats, and personal or technical blogs. Finding knowledge in such a vast data pool is a challenging task. Document mining techniques are used to extract the needed information from an unstructured text corpus with minimal effort. Techniques such as text summarization, topic modeling, text clustering, text feature selection, text classification, and sentiment analysis are used to manage and retrieve the needed information from unstructured text. This research work enhances document classification and document clustering techniques by using the Jaya optimization algorithm. The work is divided into two phases.
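The abstract does not restate the Jaya update rule, so for orientation here is a minimal sketch of the algorithm in its commonly described continuous form: each candidate moves toward the current best solution and away from the current worst one, and a move is kept only if it improves the objective. The function names (`jaya`, `sphere`), the clipping to search bounds, and the sphere test objective are illustrative assumptions, not the thesis implementation.

```python
import random

def sphere(x):
    """Illustrative test objective: sum of squares, minimum 0 at the origin."""
    return sum(v * v for v in x)

def jaya(objective, dim, lower, upper, pop_size=20, iterations=200, seed=0):
    """Minimal continuous Jaya minimiser (sketch); returns (best_solution, best_value)."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lower, upper) for _ in range(dim)] for _ in range(pop_size)]
    fit = [objective(x) for x in pop]
    for _ in range(iterations):
        # Best and worst candidates are fixed at the start of each iteration.
        best = pop[min(range(pop_size), key=fit.__getitem__)]
        worst = pop[max(range(pop_size), key=fit.__getitem__)]
        for k in range(pop_size):
            cand = []
            for j in range(dim):
                x = pop[k][j]
                # Jaya step: toward the best solution, away from the worst one.
                v = x + rng.random() * (best[j] - abs(x)) - rng.random() * (worst[j] - abs(x))
                cand.append(min(max(v, lower), upper))  # clip to the search bounds
            f = objective(cand)
            if f < fit[k]:  # greedy acceptance: keep the move only if it improves
                pop[k], fit[k] = cand, f
    i = min(range(pop_size), key=fit.__getitem__)
    return pop[i], fit[i]
```

Apart from the population size and the number of iterations, the update has no algorithm-specific tuning parameters, which is part of what makes Jaya convenient as a wrapper around a classifier.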
In the first phase, the proposed research work deploys a novel hybrid feature selection method based on the binary Jaya optimization algorithm to obtain an appropriate subset of optimal features for the document classification problem.
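The abstract does not say how the binary variant maps Jaya's continuous update onto 0/1 feature masks. One common approach in binary metaheuristics, assumed here, is to pass the continuous step through a sigmoid (S-shaped) transfer function and sample each bit from the result. In this sketch a mask bit of 1 means the corresponding feature is kept; `binary_jaya`, the repair rule that forces at least one selected feature, and the tie-accepting greedy step are hypothetical choices made for illustration.

```python
import math
import random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def binary_jaya(fitness, n_bits, pop_size=20, iterations=100, seed=0):
    """Minimise fitness(mask) over 0/1 masks with a sigmoid-binarised Jaya step (sketch)."""
    rng = random.Random(seed)

    def repair(mask):
        # Feature selection needs at least one feature: switch one on if the mask is empty.
        if not any(mask):
            mask[rng.randrange(n_bits)] = 1
        return mask

    pop = [repair([rng.randint(0, 1) for _ in range(n_bits)]) for _ in range(pop_size)]
    fit = [fitness(m) for m in pop]
    for _ in range(iterations):
        best = pop[min(range(pop_size), key=fit.__getitem__)]
        worst = pop[max(range(pop_size), key=fit.__getitem__)]
        for k in range(pop_size):
            cand = []
            for j in range(n_bits):
                x = pop[k][j]
                # Continuous Jaya step (|x| == x because bits are non-negative) ...
                v = x + rng.random() * (best[j] - x) - rng.random() * (worst[j] - x)
                # ... mapped back to a bit through an S-shaped transfer function.
                cand.append(1 if rng.random() < sigmoid(v) else 0)
            cand = repair(cand)
            f = fitness(cand)
            if f <= fit[k]:  # greedy acceptance (ties accepted to keep exploring)
                pop[k], fit[k] = cand, f
    i = min(range(pop_size), key=fit.__getitem__)
    return pop[i], fit[i]
```

Any callable that maps a mask to a value to be minimised can be plugged in as `fitness`; in the setting described here, that value would be the classification error rate on the selected features.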
Feature selection plays a vital role in reducing the high dimensionality of the feature space in text document classification. Reducing the dimension of the feature space lowers the computational cost and improves text classification accuracy. Hence, a proper subset of the significant features of the text corpus must be identified to classify the data in less computational time with higher accuracy. This work introduces a new hybrid feature selection method based on the normalized difference measure (NDM) and the binary Jaya optimization algorithm to obtain an appropriate subset of optimal features from the text corpus. The error rate is used as the objective function to be minimized when measuring the fitness of a solution. The selected optimal feature subsets are evaluated using Naive Bayes and Support Vector Machine classifiers on various popular benchmark text corpus datasets.
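Here is a minimal sketch of the two ingredients of the hybrid method described above, under stated assumptions. NDM is computed in the two-class form usually given for it, |tpr - fpr| / min(tpr, fpr) with document-frequency rates, plus a small `eps` that guards against division by zero, and it keeps the top-ranked terms. The fitness of a 0/1 mask over those terms is the error rate on held-out documents. A small multinomial Naive Bayes written from scratch stands in for the Naive Bayes and SVM classifiers named in the abstract. The toy corpus, the cut-off of four terms, and all function names are hypothetical.

```python
import math
from collections import Counter

def ndm_scores(docs, labels, vocab, eps=1e-6):
    """Two-class NDM per term: |tpr - fpr| / min(tpr, fpr), using document frequencies."""
    pos = [set(d) for d, y in zip(docs, labels) if y == 1]
    neg = [set(d) for d, y in zip(docs, labels) if y == 0]
    scores = {}
    for t in vocab:
        tpr = sum(t in d for d in pos) / len(pos)  # share of positive docs containing t
        fpr = sum(t in d for d in neg) / len(neg)  # share of negative docs containing t
        scores[t] = abs(tpr - fpr) / max(min(tpr, fpr), eps)  # eps avoids division by zero
    return scores

def train_nb(docs, labels, features):
    """Multinomial Naive Bayes with Laplace smoothing, restricted to `features`."""
    feats = set(features)
    prior, cond = {}, {}
    for c in sorted(set(labels)):
        cdocs = [d for d, y in zip(docs, labels) if y == c]
        prior[c] = math.log(len(cdocs) / len(docs))
        counts = Counter(w for d in cdocs for w in d if w in feats)
        total = sum(counts.values()) + len(feats)
        cond[c] = {w: math.log((counts[w] + 1) / total) for w in feats}
    return prior, cond

def predict_nb(model, doc):
    prior, cond = model
    # Terms outside the selected feature set are ignored.
    return max(prior, key=lambda c: prior[c] + sum(cond[c][w] for w in doc if w in cond[c]))

def error_rate_fitness(mask, features, train, heldout):
    """Fitness to minimise: held-out error rate of NB trained on the features kept by `mask`."""
    selected = [f for f, bit in zip(features, mask) if bit]
    if not selected:
        return 1.0  # an empty subset cannot classify anything: treat it as the worst case
    model = train_nb(train[0], train[1], selected)
    docs, labels = heldout
    wrong = sum(predict_nb(model, d) != y for d, y in zip(docs, labels))
    return wrong / len(labels)

# Hypothetical toy corpus: label 1 = spam-like, 0 = work-related.
train_docs = [["cheap", "offer", "win"], ["win", "prize", "cheap"],
              ["meeting", "report", "team"], ["team", "project", "report"]]
train_labels = [1, 1, 0, 0]
heldout_docs = [["cheap", "prize"], ["report", "meeting"]]
heldout_labels = [1, 0]

vocab = sorted({w for d in train_docs for w in d})
scores = ndm_scores(train_docs, train_labels, vocab)
top_k = sorted(vocab, key=scores.get, reverse=True)[:4]  # NDM filter stage
full = error_rate_fitness([1] * len(top_k), top_k,
                          (train_docs, train_labels), (heldout_docs, heldout_labels))
```

In the full hybrid scheme, a binary optimizer such as binary Jaya would search over masks of the NDM-ranked terms and call `error_rate_fitness` for each candidate. The final subset would then be checked with stronger classifiers such as Naive Bayes and SVM.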