Applications of Web Log Mining in Exploratory Web Log Big Data Analysis: A Study

Abstract

Advances in the internet and the exponential growth of website usage have led to a focus on website development and maintenance. A website is an important part of any organization and should be handled carefully. The servers that host websites store user activity as web log records, which have to be analysed to understand that activity. The web log records stored by a web server are administered by the webmaster, who is authorised to access those files. For effective management of any web server, it becomes imperative to analyse user feedback, user activity, web server performance and other crucial issues. Web Log Mining, or Web Usage Mining, denotes the process of analysing the behavioural patterns and interests of consumers.

There is a clear demand for an algorithm that expedites preprocessing. Further, the mining of frequent link sets from the preprocessed data has to be fast and accurate. There is a growing need for algorithms that mine frequent data link sets. There is also considerable scope for obtaining statistical results by subjecting the data to exploratory analysis. Hence the demand for efficient preprocessing and data mining algorithms, together with the scope for exploratory data analysis, motivated this work.

This work concentrates on Web Usage Mining of the web log taken from the web server of the Annamalai University website. The mining is done to understand the user patterns that emerge from careful analysis of the web log data. This involves Web Log Data Collection, in which the log is downloaded from the web server. The downloaded data is then subjected to preprocessing. An SQL-based algorithm, wwwLogMiner, is employed to preprocess the data. A total of one lakh (100,000) log entries are taken for preprocessing, out of which 85,156 (85%) are marked as noisy and only 14,844 (15%) are considered cleaned log data. On evaluation, it is found that the size of the web server log file is reduced by 85%.
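As an illustration of what SQL-based log cleaning can look like, the following is a minimal sketch and not the wwwLogMiner algorithm itself, whose rules are not given in this abstract. It assumes the log is in Common Log Format and assumes a typical set of noise rules (non-2xx status codes, non-GET requests, and requests for images, stylesheets, scripts and `robots.txt`). The table name `weblog`, the function `clean_log` and the sample entries are hypothetical.

```python
import re
import sqlite3

# Assumed input format: Common Log Format
# host ident user [time] "METHOD url PROTOCOL" status bytes
CLF = re.compile(r'(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3})')

# Assumed noise rules (illustrative only, not taken from the thesis).
NOISE_SQL = """
DELETE FROM weblog
WHERE status NOT BETWEEN 200 AND 299
   OR method <> 'GET'
   OR lower(url) LIKE '%.gif'
   OR lower(url) LIKE '%.jpg'
   OR lower(url) LIKE '%.png'
   OR lower(url) LIKE '%.css'
   OR lower(url) LIKE '%.js'
   OR lower(url) LIKE '%.ico'
   OR lower(url) = '/robots.txt'
"""

def clean_log(lines):
    """Load parsed log lines into SQLite, delete noisy rows, return (total, kept)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE weblog "
        "(host TEXT, ts TEXT, method TEXT, url TEXT, status INTEGER)"
    )
    rows = []
    for line in lines:
        m = CLF.match(line)
        if m:  # lines that do not parse are skipped
            host, ts, method, url, status = m.groups()
            rows.append((host, ts, method, url, int(status)))
    conn.executemany("INSERT INTO weblog VALUES (?, ?, ?, ?, ?)", rows)
    total = conn.execute("SELECT COUNT(*) FROM weblog").fetchone()[0]
    conn.execute(NOISE_SQL)
    kept = conn.execute("SELECT host, url FROM weblog ORDER BY rowid").fetchall()
    conn.close()
    return total, kept

# Hypothetical sample entries, for illustration only.
SAMPLE = [
    '10.0.0.1 - - [01/Jan/2015:10:00:00 +0530] "GET /index.html HTTP/1.1" 200 5120',
    '10.0.0.1 - - [01/Jan/2015:10:00:01 +0530] "GET /images/logo.gif HTTP/1.1" 200 2048',
    '10.0.0.2 - - [01/Jan/2015:10:00:05 +0530] "GET /missing.html HTTP/1.1" 404 512',
    '10.0.0.3 - - [01/Jan/2015:10:00:09 +0530] "POST /login HTTP/1.1" 200 128',
    '10.0.0.4 - - [01/Jan/2015:10:00:12 +0530] "GET /courses.html HTTP/1.1" 200 4096',
]
```

Calling `clean_log(SAMPLE)` keeps the two successful page requests and discards the other three entries. The reduction ratio is then `1 - len(kept) / total`; with the figures reported above, 85,156 of 100,000 entries removed corresponds to a reduction of about 85%.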
