Cloud data storage and comparison using advanced enhanced position aware sampling AEPAS algorithm
Abstract
Cloud computing is a technology that may change a large portion of the IT
business, making software considerably more attractive as a service and shaping
the way IT hardware is designed and acquired. With the growth of information
technology, massive volumes of data have become very difficult to manage, so
users depend on service providers to store and manage their data. As a result,
there is a chance of data redundancy in the cloud. According to an IDC report,
almost 75% of the data stored across the world is redundant, because many users
save the same files in the cloud. This redundant data consumes information
technology resources and also network bandwidth, since it is accessed through
the internet.
A few techniques are available in the literature to address deduplication, but
those that eliminate redundant data operate mostly at the file level. This
thesis proposes a novel similarity detection algorithm based on a sampling
technique, called Advanced Enhanced Position Aware Sampling (AEPAS). The
algorithm detects similarity among files in the cloud using the concept of file
modulo length. In existing techniques, a slight modification to a file has a
significant impact on the shifting of the sampling bit positions. The proposed
AEPAS algorithm samples data blocks from both the beginning and the end of each
file. Furthermore, this thesis describes a query algorithm that reduces the time
overhead incurred in detecting similarity.
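
The idea of sampling from both ends of a file can be illustrated with a minimal sketch. This is not the thesis's AEPAS implementation: the function names `sample_fingerprints` and `similarity`, the parameters `n_samples`, `block_len`, and `stride`, and the use of SHA-256 are all assumptions made for illustration, and the sketch does not model the file modulo length step or the query algorithm. It only shows why anchoring samples to both the head and the tail keeps more sample positions stable after a small edit.

```python
import hashlib


def sample_fingerprints(data, n_samples=4, block_len=8, stride=32):
    """Hash small blocks anchored at the head and at the tail of `data`.

    All parameters are hypothetical defaults chosen for illustration.
    """
    head, tail = [], []
    for i in range(n_samples):
        offset = i * stride
        if offset + block_len > len(data):
            break  # file too short for further samples
        # Head sample: offset counted forward from the start of the file.
        head_block = data[offset:offset + block_len]
        # Tail sample: same offset counted backward from the end of the file.
        tail_end = len(data) - offset
        tail_block = data[tail_end - block_len:tail_end]
        head.append(hashlib.sha256(head_block).hexdigest())
        tail.append(hashlib.sha256(tail_block).hexdigest())
    return head, tail


def similarity(a, b, **params):
    """Fraction of position-matched sample fingerprints shared by a and b."""
    head_a, tail_a = sample_fingerprints(a, **params)
    head_b, tail_b = sample_fingerprints(b, **params)
    total = max(len(head_a) + len(tail_a), len(head_b) + len(tail_b))
    if total == 0:
        return 0.0
    matches = sum(x == y for x, y in zip(head_a, head_b))
    matches += sum(x == y for x, y in zip(tail_a, tail_b))
    return matches / total
```

Under these assumptions, inserting a few bytes in the middle of a file leaves both the head and the tail samples unchanged, while inserting bytes at the very start shifts only the head samples, so the tail samples still match.
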
Various metrics, such as query time, CPU utilization, and memory utilization,
are used to evaluate the performance of the proposed algorithm against existing
similarity detection algorithms such as Shingling, Simhash, Traits, TSA, PAS,
and EPAS. The precision, recall, accuracy, and F-measure metrics demonstrate
that AEPAS is very efficient at identifying file similarity in comparison to
the existing algorithms. The experimental results also imply that the time
overhead and the CPU and memory utilization of AEPAS are minimal when
compared to