Performance enhancement of hash based parallel deduplication model
Abstract
In recent years, a man-made digital universe has been created by millions of devices such as mobile phones, digital cameras, surveillance cameras, and embedded systems, with organizations providing solutions for handling this enormous amount of data. This digital universe is doubling every two years and is expected to reach 44 trillion gigabytes by the year 2020. In order to protect and preserve this voluminous data, backup solutions are provided. However, a large proportion of this data, as much as 75%, consists of duplicates. This leads to the need for data reduction techniques that can optimize storage requirements. Deduplication is an effective data reduction technique that not only removes inter-file and intra-file redundancy but also removes duplicates among the files and file constituents present across various users and even across organizations. A hash-based deduplication system splits the incoming data stream into fragments called chunks. An identity signature, also called a fingerprint, is created for each chunk using a cryptographic hash algorithm. A hash indexing structure is used to store this metadata, the fingerprints. The fingerprint insertion and lookup operations are CPU intensive in nature. Moreover, as the size of the incoming data stream increases, the indexing structure also grows, leading to frequent disk lookups to access the metadata. Hence, maintaining the indexing structure, improving the fingerprint insertion and lookup operations on it, and addressing the disk lookup bottleneck remain open issues in hash-based deduplication.
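The chunk-and-fingerprint pipeline described above can be sketched as follows. This is a minimal illustration, not the thesis's actual model: it assumes fixed-size chunking, SHA-256 as the cryptographic hash, and an in-memory dictionary as the index, whereas the work itself concerns more elaborate indexing structures.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed chunk size; real systems often use content-defined chunking


def deduplicate(stream: bytes, index: dict) -> tuple:
    """Split `stream` into fixed-size chunks, fingerprint each chunk with
    SHA-256, and store only chunks whose fingerprint is not yet in `index`.

    Returns the ordered list of fingerprints (needed to reconstruct the
    stream) and the number of duplicate chunks that were skipped.
    """
    recipe, duplicates = [], 0
    for offset in range(0, len(stream), CHUNK_SIZE):
        chunk = stream[offset:offset + CHUNK_SIZE]
        fp = hashlib.sha256(chunk).hexdigest()  # the identity signature (fingerprint)
        if fp in index:        # lookup hit: chunk already stored, skip it
            duplicates += 1
        else:                  # lookup miss: insert the new unique chunk
            index[fp] = chunk
        recipe.append(fp)
    return recipe, duplicates


index = {}
# 4 chunks total; the 2nd and 4th duplicate the 1st
data = b"A" * 8192 + b"B" * 4096 + b"A" * 4096
recipe, dups = deduplicate(data, index)
print(len(recipe), dups, len(index))  # prints: 4 2 2
```

Even this toy version exhibits the bottleneck the abstract identifies: every chunk triggers a fingerprint computation plus an index lookup or insertion, and the index grows with the number of unique chunks, which is what forces large-scale systems to spill it to disk.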