Efficient Data Deduplication for Big Data Storage System

dc.contributor.guide: S.C. JAIN
dc.coverage.spatial:
dc.creator.researcher: NARESH KUMAR
dc.date.accessioned: 2020-11-04T06:08:30Z
dc.date.available: 2020-11-04T06:08:30Z
dc.date.awarded: 2018
dc.date.completed: 2018
dc.date.registered: 2013
dc.description.abstract: In this thesis, the prime focus is optimizing the deduplication system by adjusting pertinent factors in content-defined chunking (CDC), identified as key ingredients in declaring chunk cut-points, and by efficient fingerprint lookup using on-disk secure bucket-based index partitioning. First, a Differential Evolution (DE) based efficient chunking approach, TTTD-P, is proposed to optimize the Two Thresholds Two Divisors (TTTD) CDC algorithm; it significantly reduces the number of computing operations by using a single dynamic optimal divisor D with an optimal threshold value T, exploiting the multi-operation nature of TTTD. To reduce chunk-size variance, the TTTD algorithm introduces an additional backup divisor D′ that has a higher probability of finding cut-points. However, the additional divisor D′ decreases chunking throughput, so the TTTD algorithm aggravates the performance bottleneck of Rabin's CDC. To this end, Asymmetric Extremum (AE) CDC significantly improves chunking throughput while providing comparable deduplication efficiency by using the local extreme value in a variable-sized asymmetric window, overcoming the problems of Rabin CDC and TTTD chunking. After AE, the efficient FastCDC approach was developed using fast gear-based hashing. The AE and FastCDC approaches, however, increase chunking throughput only and suffer from a lower deduplication ratio (DR); maximizing storage-space savings by eliminating redundant data is the prime objective of today's big data storage systems, which use Hadoop technology in cloud computing to accommodate massive volumes of data. Second, the fingerprint-generation stage of data deduplication uses the cryptographic secure hash function SHA-1 to secure big data storage using a key-value store.
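The gear-based content-defined chunking that FastCDC builds on can be sketched as follows. This is an illustrative sketch only, not the thesis's algorithm: the mask width, minimum/maximum chunk sizes, and the random gear table are all assumed parameters chosen for demonstration.

```python
import random

# Illustrative parameters (assumptions, not values from the thesis):
AVG_SIZE_BITS = 13                      # target average chunk size ~8 KiB
MASK = (1 << AVG_SIZE_BITS) - 1        # cut-point test mask
MIN_SIZE, MAX_SIZE = 2048, 65536       # hard bounds on chunk size

random.seed(42)
GEAR = [random.getrandbits(64) for _ in range(256)]  # per-byte random gear table

def gear_chunks(data: bytes):
    """Yield variable-sized chunks using a gear-based rolling hash (FastCDC-style sketch)."""
    start, fp, i, n = 0, 0, 0, len(data)
    while i < n:
        # Gear hash update: shift left and add the table entry for this byte.
        fp = ((fp << 1) + GEAR[data[i]]) & 0xFFFFFFFFFFFFFFFF
        i += 1
        size = i - start
        if size < MIN_SIZE:
            continue                    # enforce minimum chunk size
        if (fp & MASK) == 0 or size >= MAX_SIZE:
            yield data[start:i]         # cut-point found (or max size forced)
            start, fp = i, 0
    if start < n:
        yield data[start:]              # trailing residual chunk
```

Because boundaries depend only on local content, inserting bytes early in a stream shifts at most a few nearby cut-points, which is what makes CDC robust for deduplication.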
The key is a fingerprint and the value points to the data chunk. Moreover, deduplication technology also faces the technical challenge of the duplicate-lookup disk bottleneck when storing a complete index of data chunks with their fingerprints.
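The fingerprint-as-key, chunk-as-value scheme can be illustrated with a minimal in-memory sketch. The class name `DedupStore` and its methods are hypothetical, introduced here for illustration; a real system keeps the index on disk (hence the duplicate-lookup bottleneck the abstract mentions) and would partition it, e.g. into buckets.

```python
import hashlib

class DedupStore:
    """Minimal in-memory sketch of fingerprint-indexed chunk storage."""

    def __init__(self):
        self.index = {}    # fingerprint -> chunk bytes (the key-value store)
        self.logical = 0   # total bytes submitted by writers
        self.physical = 0  # bytes actually stored after deduplication

    def put(self, chunk: bytes) -> str:
        fp = hashlib.sha1(chunk).hexdigest()  # SHA-1 fingerprint as the key
        self.logical += len(chunk)
        if fp not in self.index:              # store only previously unseen chunks
            self.index[fp] = chunk
            self.physical += len(chunk)
        return fp

    def get(self, fp: str) -> bytes:
        return self.index[fp]

    def dedup_ratio(self) -> float:
        """Deduplication ratio (DR): logical bytes over physical bytes."""
        return self.logical / self.physical if self.physical else 0.0
```

Writing the same chunk twice stores it once: the second `put` only touches the index, which is why index lookups, not data writes, dominate at scale.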
dc.description.note:
dc.format.accompanyingmaterial: CD
dc.format.dimensions:
dc.format.extent: 5094
dc.identifier.uri: http://hdl.handle.net/10603/305313
dc.language: English
dc.publisher.institution: Computer Engineering
dc.publisher.place: Kota
dc.publisher.university: Rajasthan Technical University, Kota
dc.relation:
dc.rights: university
dc.source.university: University
dc.subject.keyword: Computer Science
dc.subject.keyword: Computer Science Cybernetics
dc.subject.keyword: Engineering and Technology
dc.title: Efficient Data Deduplication for Big Data Storage System
dc.title.alternative:
dc.type.degree: Ph.D.

Files

Original bundle

Now showing 1 - 5 of 10

- 01_title.pdf (104.85 KB, Adobe Portable Document Format, Attached File)
- 02_certificate.pdf (1.82 MB, Adobe Portable Document Format)
- 03_preliminary pages.pdf (400.2 KB, Adobe Portable Document Format)
- 04_chapter01.pdf (583.66 KB, Adobe Portable Document Format)
- 04_chapter02.pdf (1.71 MB, Adobe Portable Document Format)

License bundle

Now showing 1 - 1 of 1

- license.txt (1.79 KB, Plain Text)