Design and Analysis of Supervised Machine Learning Systems for Automatic Text Mining
Abstract
Classification of text documents has become an important research issue due to the
explosion of digital and online text transactions. The increased popularity of the internet
and the World Wide Web has made text data a common platform for information exchange,
which results in the production of large quantities of text data. Though the volume is
increasing exponentially, the algorithms and data structures used to process it remain the
same. This situation motivated us to work on the classification of text data. Our
research work contains a systematic study of text representation and proximity measures
for classification as well as for clustering problems. We have studied the existing text
representation models and text classification algorithms for the development of robust
and efficient text mining applications.
Text documents, being unstructured in nature, require a well-organized representation
model for building text mining applications. In this work, we propose an effective
document vector space representation model, a sentence vector space representation
model, and uni-gram representation models for text documents to tackle the text
classification problem. The proposed representation models can represent text documents
in a lower-dimensional feature space, which requires less computational time for
processing. Two models for classification of text documents based on the proposed
representations are also presented.
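As a rough illustration of the general idea, and not the models proposed in this work, the
sketch below builds a plain uni-gram term-frequency vector space and classifies a document
by cosine similarity to per-class centroids. The abstract does not specify the proposed
representations or classifiers, so every function name, the toy corpus, and the
nearest-centroid rule here are assumptions made only for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple uni-gram tokenizer)."""
    return text.lower().split()


def build_vocabulary(documents):
    """Map each distinct uni-gram in the corpus to a vector index."""
    terms = sorted({tok for doc in documents for tok in tokenize(doc)})
    return {term: i for i, term in enumerate(terms)}


def to_vector(text, vocab):
    """Represent a document as a term-frequency vector; unseen terms are ignored."""
    vec = [0.0] * len(vocab)
    for term, count in Counter(tokenize(text)).items():
        if term in vocab:
            vec[vocab[term]] = float(count)
    return vec


def cosine(a, b):
    """Cosine similarity, used here as the proximity measure."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train_centroids(documents, labels, vocab):
    """Average the document vectors of each class (hypothetical stand-in classifier)."""
    sums = {}
    counts = Counter(labels)
    for doc, label in zip(documents, labels):
        acc = sums.setdefault(label, [0.0] * len(vocab))
        for i, value in enumerate(to_vector(doc, vocab)):
            acc[i] += value
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def classify(text, centroids, vocab):
    """Assign the class whose centroid is most similar to the document vector."""
    vec = to_vector(text, vocab)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Toy corpus, invented for this example only.
train_docs = [
    "the match ended with a late goal",
    "the team won the league match",
    "the bank raised interest rates",
    "markets fell as interest rates rose",
]
train_labels = ["sport", "sport", "finance", "finance"]

vocab = build_vocabulary(train_docs)
centroids = train_centroids(train_docs, train_labels, vocab)
prediction = classify("interest rates and the bank", centroids, vocab)
print(prediction)
```

In this baseline the feature space has one dimension per distinct term, which is the
high-dimensional setting that lower-dimensional representations aim to avoid.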
An integer representation for text data is proposed to tackle the text classification
problem. The newly proposed representation technique drastically decreases the dimension
of the feature space by representing the text as integer data. Two different
classification techniques are designed based on the proposed representation technique.
A series of experiments is conducted on different data corpora to demonstrate the
efficiency and effectiveness of the proposed representation and classification techniques.
Novel uncompres