Popis: |
A large amount of digital data is generated rapidly all around the globe, and securing it is a crucial issue for almost all types of organizations. According to the Identity Theft Resource Center, there were 8069 data breaches between January 2005 and November 2017 [1]. In 2018 alone, 477 data breach cases were registered [2], and in just three months of 2019, 145 such cases had already been reported [3]. The number continues to grow. Protecting sensitive digital data from breaches is therefore an urgent need. The main objective is to protect the privacy of individuals and society, which is becoming crucial for the effective functioning of businesses. Today, privacy is enforced primarily through government-monitored regulations and compliance requirements. To overcome the limitations of existing masking methods, the researcher proposed a non-zero random replacement masking method and developed a scalable data masking model that can be used with various data formats: CSV, JSON, XML, and relational databases. To evaluate the proposed method, the researcher used the internationally recognized UCI repository, an open source of secondary data. Out of the 436 datasets available on the site, five datasets from different business domains were selected: healthcare, social media, bank marketing, bank finance, and stock market. Dataset volume was also taken into account in the selection. Three types of masking were applied to the selected datasets: substitution, shuffling, and the proposed method. The original and masked datasets were then classified, and the results were compared using classification metrics. Performance parameters measured on four different classifiers showed sizeable variations.
For the data samples used in the analysis, the results strongly indicate that the proposed data masking method can be used across business-critical domains. They suggest that the proposed model not only protects sensitive data but also preserves usability, accuracy, and sensitivity.
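The two baseline techniques named above, substitution and shuffling, can be illustrated on a single column with the minimal Python sketch below. It is not the researcher's implementation. The function names `substitution_mask` and `shuffle_mask`, the `pool` of substitute values, and the `seed` parameter are hypothetical choices made for illustration. The proposed non-zero random replacement method is not sketched because its algorithm is not described here.

```python
import random
from typing import Dict, List, Sequence, TypeVar

# Hypothetical helper names for illustration; not taken from the study.
T = TypeVar("T")


def substitution_mask(values: Sequence[T], pool: Sequence[T], seed: int = 0) -> List[T]:
    """Replace each distinct value with a stand-in drawn from `pool`.

    The mapping is consistent: equal inputs always receive the same
    substitute, so joins and group-bys on the masked column still work.
    """
    rng = random.Random(seed)
    distinct = list(dict.fromkeys(values))  # distinct values in first-seen order
    if len(pool) < len(distinct):
        raise ValueError("substitution pool is smaller than the number of distinct values")
    # Sampling without replacement: no two originals share a stand-in.
    substitutes = rng.sample(list(pool), len(distinct))
    mapping: Dict[T, T] = dict(zip(distinct, substitutes))
    return [mapping[v] for v in values]


def shuffle_mask(values: Sequence[T], seed: int = 0) -> List[T]:
    """Permute a column's values across rows.

    The column's overall distribution is unchanged, but each value is
    detached from the row (individual) it originally belonged to.
    """
    rng = random.Random(seed)
    masked = list(values)  # copy so the caller's data is not modified
    rng.shuffle(masked)
    return masked


if __name__ == "__main__":
    names = ["Alice", "Bob", "Alice", "Carol"]
    fake_pool = ["P1", "P2", "P3", "P4", "P5"]
    print(substitution_mask(names, fake_pool))  # both "Alice" rows get the same stand-in
    ages = [34, 51, 29, 42]
    print(shuffle_mask(ages))  # the same four ages, possibly reordered
```

In this sketch, substitution hides the original values but keeps their frequency and join structure. Shuffling keeps every original value but breaks the link between a value and its row. This trade-off between protection and usability is the kind of effect that comparing classifier performance on original and masked data is meant to expose.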