Web activity analysis using sequential pattern mining
Abstract
The sequence of web pages visited by a client over a particular
timeframe is called a session, or pageset. Web log mining analyzes the
behavior of users through their web access patterns. Sessions are identified as
a significant part of the construction of the recommendation model. The novel part
of this work makes use of the backward moves made by the user, considering both
the referrer URL and the requested URL extracted from the extended web log for
session identification. The length of the sessions is maximized using a
split-and-merge technique, and the time taken for session identification is
reduced using thread parallelization. A hash map data structure is used for
efficient storage and retrieval of information. The proposed approach outperforms
the existing approach in terms of standard error and correlation coefficient.
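The abstract does not give the details of the session identification procedure, so the following is only a minimal sketch of a referrer-based heuristic of the kind described. It assumes log records of the form (user, timestamp, referrer, url) sorted by time, a hypothetical 30-minute timeout, and it records a backward move by re-appending the referrer page when the referrer is not the last page of the session. The function name and record layout are illustrative assumptions, and the split-and-merge and thread parallelization steps are not shown.

```python
from collections import defaultdict

def identify_sessions(log, timeout=1800):
    """Group requests into sessions using the referrer and requested URLs.

    log: iterable of (user, timestamp, referrer, url), sorted by timestamp.
    timeout: assumed inactivity limit in seconds (not taken from the source).
    """
    sessions = defaultdict(list)  # hash map: user -> list of open sessions
    for user, ts, referrer, url in log:
        target = None
        # Look for the most recent session that already contains the referrer.
        for session in reversed(sessions[user]):
            if ts - session["last_ts"] <= timeout and referrer in session["pages"]:
                target = session
                break
        if target is None:
            # No matching session: start a new one.
            target = {"pages": [], "last_ts": ts}
            sessions[user].append(target)
        elif target["pages"][-1] != referrer:
            # The user went back to an earlier page before this request,
            # so the backward move is recorded explicitly.
            target["pages"].append(referrer)
        target["pages"].append(url)
        target["last_ts"] = ts
    return {user: [s["pages"] for s in user_sessions]
            for user, user_sessions in sessions.items()}

example_log = [
    ("u1", 0, "-", "A"),
    ("u1", 10, "A", "B"),
    ("u1", 20, "A", "C"),    # referrer A is not the last page: backward move
    ("u1", 5000, "-", "A"),  # beyond the timeout: new session
    ("u2", 3, "-", "X"),
]
result = identify_sessions(example_log)
```

Keeping the backward move in the path makes the session longer and preserves the navigation order, which is one plausible reading of how considering both URLs helps session identification.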
Two different users may have sessions with a similar set of visited
pages, but the interest with which they visited those pages may be
different. By augmenting the pages with the interest of the users, the clustered
sessions can be used to recommend pages that are of actual interest to
the users. The initial number of clusters is identified using the Discounted
Fuzzy Relational Clustering (DFRC) algorithm, which reduces the overhead of
setting the number of clusters. A non-Euclidean distance metric is used to
determine how close the sessions are and to cluster them.
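The abstract does not name the distance metric or describe how interest is measured. As an illustration only, the sketch below uses cosine dissimilarity (a common non-Euclidean choice, assumed here as a stand-in) on sessions represented as hypothetical page-to-interest-weight maps, and builds the pairwise dissimilarity matrix that a relational clustering algorithm would take as input. The DFRC algorithm itself is not reproduced.

```python
import math

def cosine_dissimilarity(a, b):
    """Return 1 - cosine similarity of two sessions.

    a, b: dict mapping page -> interest weight (an assumed representation).
    """
    dot = sum(w * b[page] for page, w in a.items() if page in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an empty session as maximally dissimilar
    return 1.0 - dot / (norm_a * norm_b)

def dissimilarity_matrix(sessions):
    """Pairwise dissimilarities between all sessions."""
    return [[cosine_dissimilarity(s, t) for t in sessions] for s in sessions]

# Two sessions over the same pages, but with opposite interest.
s1 = {"A": 0.9, "B": 0.1}
s2 = {"A": 0.1, "B": 0.9}
weighted = cosine_dissimilarity(s1, s2)

# The same sessions without interest weights look identical.
unweighted = cosine_dissimilarity({"A": 1, "B": 1}, {"A": 1, "B": 1})
```

In this toy case the unweighted sessions are indistinguishable, while the interest-weighted ones are far apart, which is the situation the paragraph above describes.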
An approach is proposed for identifying the frequent pagesets from the
sequence database without candidate generation. The sequence hashmap is
used to find the support of the sequences without scanning the entire
database. The ordered sequence position hashmap is used to construct
length-(k+1) sequences from length-k sequences faster than
other approaches. A model-based collaborative filtering approach is proposed to
recommend the pages of interest to the user.
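The data structures are named but not specified in the abstract, so the following is a hedged sketch of one way a position-indexed hash map could support this kind of pattern growth. It assumes contiguous page sequences, and the function name and the layout of the map (pattern -> session id -> end positions) are illustrative assumptions rather than the thesis's actual design. Support is read from the map as the number of distinct sessions, and each length-(k+1) sequence is built by looking only at the page that follows a recorded occurrence of a frequent length-k sequence.

```python
from collections import defaultdict

def frequent_sequences(sessions, min_support):
    """Mine frequent contiguous page sequences by extending known occurrences.

    sessions: list of page lists. Returns {pattern tuple: support}.
    """
    # Position map for length-1 patterns: pattern -> {session id: [end positions]}
    level = defaultdict(lambda: defaultdict(list))
    for sid, pages in enumerate(sessions):
        for pos, page in enumerate(pages):
            level[(page,)][sid].append(pos)

    frequent = {}
    while level:
        # Support is the number of distinct sessions holding the pattern,
        # read directly from the hash map.
        kept = {p: occ for p, occ in level.items() if len(occ) >= min_support}
        frequent.update({p: len(occ) for p, occ in kept.items()})

        # Grow each frequent length-k pattern by the page that follows it.
        next_level = defaultdict(lambda: defaultdict(list))
        for pattern, occ in kept.items():
            for sid, ends in occ.items():
                pages = sessions[sid]
                for end in ends:
                    if end + 1 < len(pages):
                        extended = pattern + (pages[end + 1],)
                        next_level[extended][sid].append(end + 1)
        level = next_level
    return frequent

example_sessions = [["A", "B", "C"], ["A", "B", "D"], ["B", "C"]]
patterns = frequent_sequences(example_sessions, min_support=2)
```

Because extensions are generated only from positions where a frequent pattern actually occurs, no candidate set is enumerated and tested against the whole database.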