Efficient Unsupervised Learning Technique Based Automatic Text Categorization
Abstract
Automatic text categorization can play an important role in a wide variety of more
flexible, dynamic, and personalized information management tasks, such as real-time
sorting of email or files into folder hierarchies; topic identification to support
topic-specific processing operations; structured search and/or browsing; or finding
documents that match long-term standing interests or more dynamic task-based interests.
In many contexts, textual information is the most important form of communication on
the World Wide Web, and trained professionals are employed to categorize new items.
This process is very time-consuming and costly, which limits its applicability.
Consequently, there is increased interest in developing technologies for automatic
text categorization.

The main focus of this research is to study the problem of automatic text
categorization and to develop an efficient text categorization mechanism based on
unsupervised learning. This thesis attempts to overcome the challenges of various
classifiers in terms of learning speed, real-time classification speed, and accuracy.
Three new algorithms are implemented, and their results are analyzed to assess
performance on two different types of datasets, DS0 and DS1 (20-Newsgroups and
Reuters-21578 WebPages). The proposed algorithms are evaluated on different
combinations of classifiers (Naïve Bayes and J48) and datasets (DS0 and DS1).

The first algorithm describes a novel approach based on unsupervised learning that
uses frequent item (term) sets for text clustering in order to drastically reduce the
dimensionality of the data. Throughout the performance analysis, it provides better
classification accuracy than direct classification.
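
The abstract does not give the details of the first algorithm, so the following is only a minimal Python sketch of the general idea of clustering documents by frequent term sets. It is not the thesis's method. The function names, the whitespace tokenization, the `min_support` threshold, and the rule of assigning each document to the largest frequent term set it contains are all illustrative assumptions.

```python
from itertools import combinations


def frequent_term_sets(docs, min_support, max_size=2):
    """Map each frequent term set to the indices of the documents containing it.

    A term set is "frequent" if at least `min_support` documents contain
    every term in it. Hypothetical helper, not taken from the thesis.
    """
    doc_terms = [set(d.lower().split()) for d in docs]
    result = {}
    for size in range(1, max_size + 1):
        cover = {}
        for i, terms in enumerate(doc_terms):
            for combo in combinations(sorted(terms), size):
                cover.setdefault(frozenset(combo), set()).add(i)
        for term_set, ids in cover.items():
            if len(ids) >= min_support:
                result[term_set] = ids
    return result


def cluster_by_term_sets(docs, min_support=2, max_size=2):
    """Label each document with the largest frequent term set it contains.

    Ties are broken by support and then alphabetically. Documents that
    contain no frequent term set get the label None.
    """
    fts = frequent_term_sets(docs, min_support, max_size)
    labels = []
    for i in range(len(docs)):
        candidates = [ts for ts, ids in fts.items() if i in ids]
        if not candidates:
            labels.append(None)
            continue
        best = max(candidates, key=lambda ts: (len(ts), len(fts[ts]), sorted(ts)))
        labels.append(tuple(sorted(best)))
    return labels


if __name__ == "__main__":
    docs = [
        "stock market falls",
        "stock market rallies",
        "hockey team wins",
        "hockey team loses",
    ]
    for doc, label in zip(docs, cluster_by_term_sets(docs)):
        print(label, "<-", doc)
```

In this sketch each document is described by a single cluster label rather than by a full vocabulary-sized term vector, which is one way the dimensionality reduction mentioned above could come about. The cluster labels could then serve as training labels or features for a classifier such as Naïve Bayes.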