Efficient Data Deduplication for Big Data Storage System

dc.contributor.guide: S.C. JAIN
dc.coverage.spatial:
dc.creator.researcher: NARESH KUMAR
dc.date.accessioned: 2020-11-04T06:08:30Z
dc.date.available: 2020-11-04T06:08:30Z
dc.date.awarded: 2018
dc.date.completed: 2018
dc.date.registered: 2013
dc.description.abstract: In this thesis, the prime focus is optimizing the deduplication system by adjusting pertinent factors in content-defined chunking (CDC), identified as key ingredients in declaring chunk cut-points, and by efficient fingerprint lookup using on-disk secure bucket-based index partitioning. First, a Differential Evolution (DE) based efficient chunking approach, TTTD-P, is proposed to optimize the Two Thresholds Two Divisors (TTTD) CDC algorithm; it significantly reduces the number of computing operations by using a single dynamic optimal divisor D with an optimal threshold value T, exploiting the multi-operation nature of TTTD. To reduce chunk-size variance, the TTTD algorithm introduces an additional backup divisor D′ that has a higher probability of finding cut-points. However, the additional divisor D′ decreases chunking throughput, so the TTTD algorithm aggravates the performance bottleneck of Rabin's CDC. To this end, Asymmetric Extremum (AE) CDC significantly improves chunking throughput while providing comparable deduplication efficiency by using the local extreme value in a variable-sized asymmetric window, overcoming the problems of Rabin CDC and TTTD chunking. After AE, the efficient FastCDC approach was developed using fast gear-based hashing. The AE and FastCDC approaches, however, increase chunking throughput only and suffer from a lower deduplication ratio (DR); maximizing storage-space savings by eliminating redundant data is the prime objective of today's big data storage systems, which use Hadoop technology in cloud computing to accommodate massive volumes of data. Second, the fingerprint-generation stage of data deduplication uses the cryptographic secure hash function SHA-1 to secure big data storage using a key-value store.
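The gear-based content-defined chunking that FastCDC builds on can be sketched as follows. This is an illustrative sketch only, not the thesis's algorithm: the mask width, minimum/maximum chunk sizes, and the random gear table are all assumed parameters chosen for demonstration.

```python
import random

# Illustrative parameters (assumptions, not values from the thesis):
AVG_SIZE_BITS = 13                      # target average chunk size ~8 KiB
MASK = (1 << AVG_SIZE_BITS) - 1        # cut-point test mask
MIN_SIZE, MAX_SIZE = 2048, 65536       # hard bounds on chunk size

random.seed(42)
GEAR = [random.getrandbits(64) for _ in range(256)]  # per-byte random gear table

def gear_chunks(data: bytes):
    """Yield variable-sized chunks using a gear-based rolling hash (FastCDC-style sketch)."""
    start, fp, i, n = 0, 0, 0, len(data)
    while i < n:
        # Gear hash update: shift left and add the table entry for this byte.
        fp = ((fp << 1) + GEAR[data[i]]) & 0xFFFFFFFFFFFFFFFF
        i += 1
        size = i - start
        if size < MIN_SIZE:
            continue                    # enforce minimum chunk size
        if (fp & MASK) == 0 or size >= MAX_SIZE:
            yield data[start:i]         # cut-point found (or max size forced)
            start, fp = i, 0
    if start < n:
        yield data[start:]              # trailing residual chunk
```

Because boundaries depend only on local content, inserting bytes early in a stream shifts at most a few nearby cut-points, which is what makes CDC robust for deduplication.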
The key is a fingerprint and the value points to the data chunk. Moreover, deduplication technology also faces the technical challenge of the duplicate-lookup disk bottleneck when storing a complete index of data chunks with their fingerprints.
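The fingerprint-as-key, chunk-as-value scheme can be illustrated with a minimal in-memory sketch. The class name `DedupStore` and its methods are hypothetical, introduced here for illustration; a real system keeps the index on disk (hence the duplicate-lookup bottleneck the abstract mentions) and would partition it, e.g. into buckets.

```python
import hashlib

class DedupStore:
    """Minimal in-memory sketch of fingerprint-indexed chunk storage."""

    def __init__(self):
        self.index = {}    # fingerprint -> chunk bytes (the key-value store)
        self.logical = 0   # total bytes submitted by writers
        self.physical = 0  # bytes actually stored after deduplication

    def put(self, chunk: bytes) -> str:
        fp = hashlib.sha1(chunk).hexdigest()  # SHA-1 fingerprint as the key
        self.logical += len(chunk)
        if fp not in self.index:              # store only previously unseen chunks
            self.index[fp] = chunk
            self.physical += len(chunk)
        return fp

    def get(self, fp: str) -> bytes:
        return self.index[fp]

    def dedup_ratio(self) -> float:
        """Deduplication ratio (DR): logical bytes over physical bytes."""
        return self.logical / self.physical if self.physical else 0.0
```

Writing the same chunk twice stores it once: the second `put` only touches the index, which is why index lookups, not data writes, dominate at scale.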
dc.description.note:
dc.format.accompanyingmaterial: CD
dc.format.dimensions:
dc.format.extent: 5094
dc.identifier.uri: http://hdl.handle.net/10603/305313
dc.language: English
dc.publisher.institution: Computer Engineering
dc.publisher.place: Kota
dc.publisher.university: Rajasthan Technical University, Kota
dc.relation:
dc.rights: university
dc.source.university: University
dc.subject.keyword: Computer Science
dc.subject.keyword: Computer Science Cybernetics
dc.subject.keyword: Engineering and Technology
dc.title: Efficient Data Deduplication for Big Data Storage System
dc.title.alternative:
dc.type.degree: Ph.D.

Files

Original bundle

Now showing 1 - 5 of 10

- 01_title.pdf (104.85 KB, Adobe Portable Document Format, Attached File)
- 02_certificate.pdf (1.82 MB, Adobe Portable Document Format)
- 03_preliminary pages.pdf (400.2 KB, Adobe Portable Document Format)
- 04_chapter01.pdf (583.66 KB, Adobe Portable Document Format)
- 04_chapter02.pdf (1.71 MB, Adobe Portable Document Format)

License bundle

Now showing 1 - 1 of 1

- license.txt (1.79 KB, Plain Text)