Scalable Technique for Privacy Preserving Big Data Publishing and Mining For Data Stored in Cloud

Abstract

Data is now referred to as Big Data because of advances in technology. A major cause is the heavy use of social media, which generates huge amounts of data every second: platforms such as Facebook and Twitter have very large user bases, and these users share an enormous amount of data every day. With the evolution of Big Data, data owners require the assistance of a third party (e.g., a cloud provider) to store and analyze their data and to obtain information at a lower cost. However, maintaining privacy is a challenge in such scenarios, because the outsourced data may reveal sensitive information about the users.

Existing research discusses different techniques for implementing privacy on the original data using anonymization, randomization, and suppression. However, those techniques are not scalable, suffer from information loss, and do not support real-time data, and hence are not suitable for privacy-preserving Big Data mining. In this research, a novel two-level privacy approach is proposed that uses pseudonymization and homomorphic encryption in the Apache Spark framework. Several simulations were carried out on the collected dataset. From the results obtained, we observed that execution time is reduced by 20% and privacy is enhanced by 10%. This scheme is suitable for both privacy-preserving Big Data publishing and mining.

This thesis has three contributions: quasi-identifier detection, privacy-preserving Big Data mining, and real-time privacy-preserving Big Data mining.

The first contribution is a novel approach for finding quasi-identifiers in a dataset. Quasi-identifiers can disclose sensitive information about the individuals behind explicit identifiers, so both explicit identifiers and quasi-identifiers must be protected from unauthorised access.
This technique detects an adequate number of quasi-identifiers efficiently and takes less time for detection.

The second contribution is the design and development of a scalable privacy model that anonymizes huge datasets by employing a map-reduce structure on a public cloud through parallel computation…
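The two-level privacy idea in the abstract (pseudonymization first, then homomorphic encryption) can be illustrated with a minimal, hypothetical sketch. The abstract does not say which pseudonymization function or homomorphic scheme the thesis uses, so this example assumes keyed HMAC-SHA256 pseudonyms for the first level and textbook Paillier encryption (additively homomorphic) for the second. The function names, the key, and the tiny primes are illustrative assumptions only and are far too small for real use. The point it shows is that a cloud could add encrypted values without ever seeing the plaintexts.

```python
import hashlib
import hmac
import math
import secrets


# Level 1 (assumed): keyed pseudonymization with HMAC-SHA256.
# The same identifier and key always map to the same pseudonym,
# so records stay linkable without exposing the raw identifier.
def pseudonymize(identifier: str, key: bytes, length: int = 16) -> str:
    digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]


# Level 2 (assumed): textbook Paillier encryption, which is additively
# homomorphic. Toy primes only; real keys need primes of 1024+ bits.
def paillier_keygen(p: int, q: int):
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    # With g = n + 1, L(g^lam mod n^2) = lam mod n, so mu = lam^-1 mod n.
    mu = pow(lam, -1, n)
    return (n, n + 1), (lam, mu, n)


def encrypt(public_key, m: int) -> int:
    n, g = public_key
    n_sq = n * n
    # A random r in [1, n-1] coprime to n makes encryption probabilistic.
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n_sq) * pow(r, n, n_sq)) % n_sq


def decrypt(private_key, c: int) -> int:
    lam, mu, n = private_key
    n_sq = n * n
    l_value = (pow(c, lam, n_sq) - 1) // n
    return (l_value * mu) % n


def add_encrypted(public_key, c1: int, c2: int) -> int:
    # Multiplying ciphertexts adds the underlying plaintexts (mod n).
    n, _ = public_key
    return (c1 * c2) % (n * n)


# Demo with a hypothetical key and hypothetical record values.
key = b"hypothetical-secret-key"
token = pseudonymize("alice@example.com", key)
public_key, private_key = paillier_keygen(293, 433)
c_total = add_encrypted(public_key, encrypt(public_key, 5200), encrypt(public_key, 4800))
print(token, decrypt(private_key, c_total))  # second value is 10000
```

A production system would rely on a vetted cryptographic library rather than this textbook construction; the sketch only demonstrates the homomorphic property.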
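The abstract does not describe how its quasi-identifier detection works. As a purely hypothetical illustration, one common heuristic flags the smallest attribute combinations whose values single out a large fraction of records. The uniqueness threshold, the maximum combination size, the function names, and the toy records below are all assumptions, not the thesis's method.

```python
from collections import Counter
from itertools import combinations


def uniqueness_ratio(records, attributes):
    """Fraction of records whose combined values on `attributes` occur exactly once."""
    keys = [tuple(record[a] for a in attributes) for record in records]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(records)


def detect_quasi_identifiers(records, candidates, max_size=3, threshold=0.6):
    """Hypothetical heuristic: return minimal attribute combinations that
    single out at least `threshold` of the records."""
    found = []
    for size in range(1, max_size + 1):
        for combo in combinations(candidates, size):
            # Skip supersets of an already-flagged combination to keep results minimal.
            if any(set(f) <= set(combo) for f in found):
                continue
            if uniqueness_ratio(records, combo) >= threshold:
                found.append(combo)
    return found


# Toy records: "name" is an explicit identifier and "disease" is sensitive,
# so only the remaining attributes are treated as candidates.
records = [
    {"name": "A", "zip": "47677", "age": 30, "gender": "F", "disease": "flu"},
    {"name": "B", "zip": "47677", "age": 30, "gender": "M", "disease": "cold"},
    {"name": "C", "zip": "47677", "age": 40, "gender": "F", "disease": "flu"},
    {"name": "D", "zip": "47602", "age": 40, "gender": "M", "disease": "asthma"},
    {"name": "E", "zip": "47602", "age": 30, "gender": "F", "disease": "cold"},
    {"name": "F", "zip": "47602", "age": 30, "gender": "F", "disease": "flu"},
]
print(detect_quasi_identifiers(records, ["zip", "age", "gender"]))
# [('zip', 'age', 'gender')]
```

No single attribute or pair reaches the 0.6 threshold here, but ZIP code, age, and gender together identify four of the six records, which is why that combination is flagged.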
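The second contribution anonymizes data with a map-reduce structure. The following is a hypothetical, single-process sketch of that pattern, assuming generalization of age and ZIP code in the map phase and k-anonymity suppression in the reduce phase. The generalization rules, the function names, and k = 2 are illustrative choices; in Spark or Hadoop the map and reduce phases would run in parallel across data partitions.

```python
from collections import defaultdict


def map_generalize(record):
    """Map phase: drop the explicit identifier, generalize quasi-identifiers,
    and emit (equivalence-class key, generalized record)."""
    low = (record["age"] // 10) * 10
    age_range = f"{low}-{low + 9}"
    zip_prefix = record["zip"][:3] + "**"
    generalized = {"age": age_range, "zip": zip_prefix, "disease": record["disease"]}
    return (age_range, zip_prefix), generalized


def shuffle(pairs):
    """Shuffle phase: group emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_suppress(values, k):
    """Reduce phase: release an equivalence class only if it holds at least k records."""
    return values if len(values) >= k else []


def anonymize(records, k=2):
    groups = shuffle(map(map_generalize, records))
    released = []
    for values in groups.values():
        released.extend(reduce_suppress(values, k))
    return released


records = [
    {"name": "A", "zip": "47677", "age": 29, "disease": "flu"},
    {"name": "B", "zip": "47602", "age": 22, "disease": "cold"},
    {"name": "C", "zip": "47678", "age": 27, "disease": "flu"},
    {"name": "D", "zip": "47905", "age": 43, "disease": "asthma"},
    {"name": "E", "zip": "47909", "age": 52, "disease": "cold"},
]
for row in anonymize(records):
    print(row)
```

Records D and E fall into singleton classes and are suppressed, which illustrates the information loss that the abstract attributes to suppression-based techniques.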