Design and Analysis of Supervised Machine Learning Systems for Automatic Text Mining

Abstract

Classification of text documents has become an important research issue due to the explosion of digital and online text transactions. The increased popularity of the internet and the World Wide Web has made text data a common platform for information exchange, which results in the production of large quantities of text data. Though this data is growing exponentially, the algorithms and data structures used to process it remain the same. This situation motivated us to work on the classification of text data. Our research work contains a systematic study of text representation and proximity measures for classification as well as clustering problems. We have studied the existing text representation models and text classification algorithms for the development of robust and efficient text mining applications.

Text documents, being unstructured in nature, require a well-organized representation model for building text mining applications. In this work, we propose an effective document vector space representation model, a sentence vector space representation model, and uni-gram representation models for text documents to tackle the text classification problem. The proposed representation models can represent text documents in a lower-dimensional feature space, which requires low computational time for processing. Two models for classification of text documents based on the proposed representations are also presented.

An integer representation for text data is proposed to tackle the text classification problem. The newly proposed representation technique drastically decreases the dimension of the feature space by representing the text as integer data. Two different classification techniques are designed based on the proposed representation technique. A series of experiments is conducted on different data corpora to demonstrate the efficiency and effectiveness of the proposed representation and classification techniques. Novel uncompres
