Scalable Technique for Privacy Preserving Big Data Publishing and Mining For Data Stored in Cloud
Abstract
Advances in technology have made data volumes large enough to be referred to as Big Data.
This growth is driven by heavy use of social media, which generates huge amounts of data
every second. Social media platforms such as Facebook and Twitter have a huge number
of users, and these users share a huge amount of data every day. With
the evolution of Big Data, data owners require the assistance of a third party
(e.g., the cloud) to store and analyze the data and to obtain information at a lower cost.
However, maintaining privacy is a challenge in such scenarios, because handing the data
to a third party may reveal sensitive information about the users.
Existing research discusses different techniques for protecting the privacy of
original data, including anonymization, randomization, and suppression. However,
those techniques are not scalable, suffer from information loss, and do not support
real-time data, and hence are not suitable for privacy-preserving Big Data mining. In this
research, a novel two-level privacy approach is proposed using pseudonymization
and homomorphic encryption in the Apache Spark framework. Several
simulations are carried out on the collected dataset. The results show
that execution time is reduced by 20% and privacy is enhanced by 10%. This
scheme is suitable for both privacy-preserving Big Data publishing and mining. This
thesis makes three contributions: quasi-identifier detection, privacy-preserving
Big Data mining, and real-time privacy-preserving Big Data mining.
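The two-level idea (pseudonymize identifiers, then encrypt sensitive values so that they can still be aggregated) can be illustrated with a minimal sketch. The abstract does not name the pseudonymization function or the homomorphic scheme, so everything here is an assumption: keyed HMAC-SHA256 stands in for pseudonymization, a toy Paillier cryptosystem stands in for homomorphic encryption, and the names `pseudonymize` and `ToyPaillier` are hypothetical. The primes are far too small for real use, and the sketch runs in plain Python rather than on Spark.

```python
import hashlib
import hmac
import math
import secrets


def pseudonymize(value: str, key: bytes, length: int = 16) -> str:
    """Level 1 (assumed): replace an identifier with a keyed, deterministic pseudonym."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:length]


class ToyPaillier:
    """Level 2 (assumed): additively homomorphic encryption with g = n + 1.

    Illustration only: real deployments need primes of 1024 bits or more.
    """

    def __init__(self, p: int, q: int) -> None:
        self.n = p * q
        self.n2 = self.n * self.n
        # lambda = lcm(p - 1, q - 1)
        self.lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
        # With g = n + 1, mu is the inverse of lambda modulo n.
        self.mu = pow(self.lam, -1, self.n)

    def encrypt(self, m: int) -> int:
        # Pick a random r that is coprime to n.
        while True:
            r = secrets.randbelow(self.n - 1) + 1
            if math.gcd(r, self.n) == 1:
                break
        # (n + 1)^m mod n^2 equals 1 + m*n mod n^2.
        return ((1 + m * self.n) * pow(r, self.n, self.n2)) % self.n2

    def add(self, c1: int, c2: int) -> int:
        # Multiplying ciphertexts adds the underlying plaintexts.
        return (c1 * c2) % self.n2

    def decrypt(self, c: int) -> int:
        u = pow(c, self.lam, self.n2)
        return ((u - 1) // self.n * self.mu) % self.n


# Hypothetical records: a name (explicit identifier) and an age (sensitive value).
key = b"demo-secret-key"
scheme = ToyPaillier(1009, 1013)
records = [("alice", 34), ("bob", 29), ("carol", 41)]

protected = [(pseudonymize(name, key), scheme.encrypt(age)) for name, age in records]

# A third party can sum the encrypted ages without seeing any of them.
encrypted_total = protected[0][1]
for _, ciphertext in protected[1:]:
    encrypted_total = scheme.add(encrypted_total, ciphertext)

total_age = scheme.decrypt(encrypted_total)
```

Only the holder of the private values (`lam`, `mu`) can decrypt the aggregate, while the party doing the summation sees pseudonyms and ciphertexts only.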
The first contribution is a novel approach for finding quasi-identifiers in the
dataset. These quasi-identifiers can disclose sensitive information associated with explicit
identifiers. Therefore, both explicit identifiers and quasi-identifiers must be protected from
unauthorised access. The technique detects an adequate number of
quasi-identifiers and takes less time for detection.
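The abstract does not describe how quasi-identifiers are detected. One common heuristic, shown here purely as an assumed illustration and not as the thesis's method, is to flag the smallest column combinations whose values single out most records. The function names, the `threshold` and `max_size` parameters, and the sample records are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def uniqueness(records, columns):
    """Fraction of records whose values on `columns` occur exactly once."""
    keys = [tuple(rec[c] for c in columns) for rec in records]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(keys)


def find_quasi_identifiers(records, candidates, threshold=0.8, max_size=2):
    """Return minimal column combinations that single out most records."""
    found = []
    for size in range(1, max_size + 1):
        for combo in combinations(candidates, size):
            # Skip supersets of combinations that were already flagged.
            if any(set(f) <= set(combo) for f in found):
                continue
            if uniqueness(records, combo) >= threshold:
                found.append(combo)
    return found


# Hypothetical sample: no single column is identifying, but some pairs are.
sample = [
    {"zip": "560001", "age": 34, "gender": "F"},
    {"zip": "560001", "age": 29, "gender": "F"},
    {"zip": "560002", "age": 34, "gender": "M"},
    {"zip": "560002", "age": 29, "gender": "M"},
]

quasi_identifiers = find_quasi_identifiers(sample, ["zip", "age", "gender"])
```

In this sample, `zip` alone never isolates a record, yet `zip` together with `age` isolates every record, which is why such combinations need the same protection as explicit identifiers.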
The second contribution is the design and development of a scalable privacy model that
anonymizes huge datasets by employing the MapReduce structure on a public cloud
through parallel computa