Applications of Web Log Mining in Exploratory Web Log Big Data Analysis: A Study
Abstract
Advances in the internet and the exponential growth of web site usage
have led to a focus on web site development and maintenance. A web site is an important part of
any organization and should be handled carefully. The servers that host web sites
store users' activity as web log records, which have to be analysed to understand that
activity. The web log records stored by a web server are administered by the webmaster, who is
authorised to access those files. For effective management of any web server, it becomes
imperative to analyse user feedback, user activity, web server performance and other
crucial issues. Web Log Mining, or Web Usage Mining, denotes the process of
analysing the behavioural patterns and interests of consumers.
The need for an algorithm that expedites preprocessing is evident. Further,
the mining of frequent link sets from the pre-processed data has to be fast and accurate. There
is a growing need for algorithms that mine frequent link sets. There is also considerable
scope for obtaining statistical results by subjecting the data to exploratory analysis. Hence,
the demand for efficient preprocessing and data mining algorithms, together with the
scope of exploratory data analysis, motivated this work.
This work concentrates on Web Usage Mining of the web log taken from the web server
of the Annamalai University website. The mining is done to understand the user patterns that
emerge from careful analysis of the web log data. It begins with Web Log Data Collection,
in which the log is downloaded from the web server. The downloaded data is then subjected to
preprocessing. An SQL-based algorithm, wwwLogMiner, is employed to preprocess the data.
A total of one lakh (100,000) log entries are taken for preprocessing, of which 85,156
(85%) are marked as noisy log entries. Only 14,844 (15%) are considered cleaned log entries.
On evaluation, it is found that the size of the web server log file is reduced by 85%
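
The abstract does not describe how wwwLogMiner decides which entries are noisy, so the following is only a minimal sketch of SQL-based log cleaning under common assumptions, not the thesis's algorithm. The table name `weblog`, its columns, the sample records and the filter rules (drop failed requests, crawler traffic and embedded resources such as images and stylesheets) are all hypothetical and chosen for illustration. The helper `noise_percentage` simply reproduces the arithmetic behind the reported figures.

```python
import sqlite3

# Hypothetical sample records: (client IP, requested URL, HTTP status, user agent).
SAMPLE_LOG = [
    ("10.0.0.1", "/index.html", 200, "Mozilla/5.0"),
    ("10.0.0.1", "/images/logo.png", 200, "Mozilla/5.0"),
    ("10.0.0.2", "/admissions.php", 200, "Mozilla/5.0"),
    ("10.0.0.3", "/robots.txt", 200, "Googlebot/2.1"),
    ("10.0.0.4", "/missing.html", 404, "Mozilla/5.0"),
    ("10.0.0.2", "/css/site.css", 200, "Mozilla/5.0"),
]


def clean_log(rows):
    """Load rows into an in-memory SQLite table and return the entries kept.

    The WHERE clause encodes assumed noise rules; the rules actually used
    by wwwLogMiner are not stated in the abstract.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE weblog (ip TEXT, url TEXT, status INTEGER, agent TEXT)"
    )
    conn.executemany("INSERT INTO weblog VALUES (?, ?, ?, ?)", rows)
    cleaned = conn.execute(
        """
        SELECT ip, url, status, agent
        FROM weblog
        WHERE status BETWEEN 200 AND 299          -- keep successful requests only
          AND lower(agent) NOT LIKE '%bot%'       -- drop crawler traffic
          AND lower(url) NOT LIKE '%.png'         -- drop embedded resources
          AND lower(url) NOT LIKE '%.jpg'
          AND lower(url) NOT LIKE '%.gif'
          AND lower(url) NOT LIKE '%.css'
          AND lower(url) NOT LIKE '%.js'
          AND lower(url) NOT LIKE '%robots.txt'
        ORDER BY rowid
        """
    ).fetchall()
    conn.close()
    return cleaned


def noise_percentage(total, cleaned):
    """Share of entries removed as noise, in percent."""
    return 100.0 * (total - cleaned) / total


if __name__ == "__main__":
    kept = clean_log(SAMPLE_LOG)
    print(len(kept), "of", len(SAMPLE_LOG), "sample entries kept")
    # Figures reported in the abstract: 100,000 entries, 14,844 kept.
    print(round(noise_percentage(100_000, 14_844), 3), "percent noise")
```

With the counts reported above, 100,000 - 14,844 = 85,156 entries are removed, which is 85.156% of the log and is rounded to 85% in the text.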