Data Leaks And Its Prevention In Mobile Application

DOI : 10.17577/NCRTCA-PID-047

Download Full-Text PDF Cite this Publication

Text Only Version

Data Leaks And Its Prevention In Mobile Application

A S Shridhar Kakade Post Graduate Student Dept. of MCA, DSCE

Bengaluru, India

Dr, Samitha Khaiyum

Professor and Head of Department Dept. of MCA, DSCE

Bengaluru, India


The fast expansion of mobile applications (apps) has raised serious questions about data security and privacy. This study investigates the topic of data leaks in mobile applications, emphasizing the dangers or the risks they provide, the effects they have on users and businesses, and the mitigation tactics that may be used to protect private information. This study offers insights into the variables causing information leaks and highlights the significance of strong security measures and legislative frameworks through a thorough assessment of the literature and analysis of real-world case studies.

This paper also focuses on prevention techniques using two main approaches such as Machine learning approach using K-Anonymity Algorithm that is K-Anonymity algorithm is used to create a sensitive data, so data set will be hidden and third parties will not be able to view the original data sets. And another approach is Mobile Application Penetration Testing. It is a security assessment procedure carried out to assess the security posture of mobile applications. In order to find potential flaws in the design, implementation, and configuration of the application, it involves simulating actual attacks and exploiting vulnerabilities. The main objective of mobile app penetration testing is to find security flaws that attackers might exploit, potentially resulting in unauthorized access, data breaches, or other illegal activities.

The research results are intended to increase awareness among app creators, consumers, and policymakers as well, allowing the creation of more trustworthy and privacy-conscious mobile app ecosystem.

Keywords: Mobile Applications, Security measures, Machine Learning, K-Anonymity Algorithm, Sensitive data, Mobile Application Penetration Testing.


    Mobile applications(Apps) has set a great impact on revolutionalising and engaging the way we interact with technology, The way we engage with technology has been revolutionized by mobile applications (apps), which put ease, connectivity, and a wealth of features at our fingertips. However, worries regarding data privacy and security have considerably increased as a result of the development of

    mobile applications and the growing quantity of personal data exchanged through these platforms[4]. The incidence of data breaches, when sensitive user information is mistakenly or purposefully exposed to unauthorized parties, is one of the most important issues facing the mobile app industry.

    A data leak is the unauthorized disclosure of sensitive information to an unintended recipient. This can happen through a variety of ways, such as Human error, Malicious attacks, System misconfigurations, Physical theft. This can be considered as a general meaning. But specifically in mobile application the meaning is almost same but slightly different as it involves some different approaches of causing it.

    So, A data leak in a mobile application is the unauthorized disclosure of sensitive information from a mobile device to an unintended recipient. This can happen through a variety of ways, such as Insecure coding, Malicious apps, User errors, Data storage.

    If we consider that mobile devices represent the future, then apps can be seen as essential and supportive elements in this process. Not too long ago, the mobile industry brought significant changes to the global technological landscape by making smartphones widely available and fueling the development of an ecosystem based on mobile applications. While the future looks promising, there is a concerning issue hidden behind the surge in smartphone app usage: the leakage of data. This advanced technology, driven by apps and held in the palm of our hands, has become a fertile ground for cybercriminals and organizations with malicious intent, aiming to steal and exploit our personal information.

    An analysis of security trends for the upcoming year revealed that 93% of malware attacks targeting organizations originated from device networks. The study further indicated that mobile devices were susceptible to phishing emails, which accounted for 52% of compromised sensitive credentials. These findings suggest that attackers' focus on mobile devices for accessing sensitive information is expected to grow in the future [2].

    To grasp the significance of potential threats that could result in data leakage, it is crucial to have a clear understanding of mobile application security and the potentially devastating consequences of such leaks.

    The problem statement says- The vulnerability of sensitive data stored or transmitted through these apps is the core issue with data leaks in mobile applications. Our daily lives have integrated mobile applications, which store and process personal data like financial information, login credentials, location data, and more. But the growing reliance on mobile apps has also drawn the attention of malicious organizations and cybercriminals, who use security flaws to gain unauthorized access to this important data. People are at serious risk from data leaks in mobile applications because their private information may be made public, which could result in identity theft, financial loss, privacy breaches, and reputational damage. In order to ensure the security of user information and preserve confidence in the digital ecosystem[4,7-9], it is essential to address and mitigate the risks connected with data leaks in mobile applications.

    Figure 1

    It is a graph that indicates Apps by category and its percentage of data leakage over the years.


    This research paper adopts a mixed-methods approach to investigate data leaks in mobile apps along with prevention of it. The methodology encompasses both qualitative and quantitative techniques to gather and analyze relevant data. The following sections provide a detailed explanation of the research design, data collection methods, and data analysis techniques employed in this study.

    The runtime data within the application should be watched while it is active in order to perform our measurement on the application in question. Therefore, a key component of our approach is to keep track of and log all activities that occur within the application before letting the analyser review the data gathered. We suggest a platform architecture with two different checkpoints for the collection of runtime data based on three key design tenets. As a first step in tracing the flow of sensitive data, tracking the data flow inside the application is necessary. When the monitored data is transmitted through a network or another application, this could be a leak that needs to be reported along with more details. This task requires that the operating system's base platform, on which the application is running, provide the operational information needed to complete it. Second, external data flow, like network traffic, should be taken into

    account in addition to local data flow. According to our initial analysis, a significant amount of malware interacts with a third-party server posing as a CandC server[3]. Additionally, studying the application's network traffic enables a better comprehension of the strategies employed by malware. Last but not least, the measurement architecture is deployable in a cloud computing environment, so all processes should be carried out automatically without human involvement.

    Figure 2

    Figure 2 depicts a suggested architecture that blends fundamental construction components with the surrounding environment. The necessary functionality from the preceding design ideas are combined into basic building components. And the operating platform is envisioned on the virtual world of cloud computing. Overall, the architecture includes three virtual execution environment instance layers: host system, mobile device, and software execution platform. Two runtime loggers are placed in the device and software platform instance, respectively, to capture operational information inside and outside of the programme.[2] Using Android as an example, a mobile device and the Android software platform may be emulated on a host system with an x86 CPU. Furthermore, the host system may be virtualized into many cloud instances.

    1. Methods of data leak in Mobile apps

      1. Electronic communication

        Employees frequently receive access to the Internet, email, and instant messaging credentials as part of their job duties. The problem is that hackers view all of these media outlets as their main targets, which is problematic. Utilizing a variety of strategies, including phishing, spoofing, and attacking target victims with malware to infiltrate the system, the data is breached or leaked.

      2. Accidental leakage

        Unintentional or intentional data leaks can happen at any time. Such illegal activities frequently take place sporadically and under unusual circumstances. Most data leaks are the result of human error and are unintentional. An employee once sent an email to the incorrect person by error, unintentionally giving that person access to private data.

        These unintentional data leaks have grave repercussions, including fines and damage to reputation.

      3. Disgruntled Employees

      Employees who are unhappy with their jobs are the ones who either plan or participate in data leaks. The majority of businesses think that email leaks and lost or stolen laptops are the main sources of data loss. On the other hand, devices like printers, cameras, photocopiers, and portable USB drives are where data leakage happens the most frequently. Despite having signed a strict employment contract, there is nothing to stop a disgruntled employee from disclosing the data if they are unhappy or have been promised sizable payouts by hackers.

    2. Data leakage's causes.

      1. Legacy Techniques and Tools

        Although there are many new threats to your data, it's still important to deal with older attack strategies that take advantage of outdated systems and tools. Not only cloud- based tools and outside SaaS offerings, but also physical devices like desktops, USBs, and printers are frequently used in modern organizations.

        Although you might require these tools to carry out legal actions (enabling staff members to print out presentations at home), they also present a significant risk. A USB or external storage device that contains sensitive information, for instance, might be lost by employees. The device might be taken by a bad guy who wants to get past the company's perimeter security.

      2. Misconfiguration Issues

        Complexity increases when application software, cloud services, and machine learning tools are used in a networked data system's configuration. For ML algorithms to have access to the data they require while preventing unnecessary data exposure, data configuration processes are crucial. Configuration errors are a common occurrence as the system becomes more complex.[5]

        You can use tools that automate many configuration processes to help lower the risk of misconfiguration (although you must also make sure these tools are configured properly)[1]. Depending on the network, a single misconfigured router may cause a data leak.

      3. Social Engineering Attacks

        In order to trick privileged users, like employees, into providing sensitive information, malicious actors frequently use social engineering techniques. Cybercriminals frequently use deception, such as by pretending to be a co worker or an employee of the IT department and inventing a justification for requesting access credentials.

        Social engineering attempts frequently aim to obtain phone numbers, login information, or the names of individuals with privileged access. To stop employees and malicious actors from accessing data they shouldn't, users must avoid disclosing sensitive information to authorized users.

      4. Zero-Day Vulnerabilities

      Software frequently has zero-day vulnerabilities, putting your company at risk without your knowledge. Zero-day flaws can lead to persistent threats that leak data covertly for months or even years before they are found. When a significant breach is reported in the news, many organizations only learn about these threats.


      The incident involving the mobile dating app Coffee Meets Bagel that happened in 2019 is one actual illustration of a data leak in a mobile application. A well-known dating app called Coffee Meets Bagel matches users based on their social media profiles and shared interests.In this data breach, it was found that Coffee Meets Bagel had experienced a breach that exposed millions of user records. Unauthorized entry into a database containing user data, including names, email addresses, and encrypted passwords, was the cause of the breach.

      Coffee Meets Bagel received a notification from a third party who claimed to have accessed their systems, and this is how they learned about the data leak. Investigations showed that a hacker had gained access to the app's systems and obtained user information.

      Due to the exposure of private user information that could be used by malicious parties, this data leak had a significant impact. Email addresses among the compromised data may be used in spamming, identity theft, or phishing scams.[7]

      Types of Information Exposed in a Data Leak

      Financial data: This data includes any information about an individual's finances or banking, including credit card numbers, tax details, bank records, invoices, and receipts.

      Emails and communication: Data leaks can expose email content, instant messaging conversations, or other forms of electronic communications. This can result in privacy breaches, compromised sensitive discussions, or the exposure of confidential business communications.[6]

      Along with these, Trade secrets and intellectual property (IP), Personally identifiable information (PII), Company, federal, or business information are some of the types of data that gets exposed in a data leak.[2][6]

      Trade secrets and intellectual property (IP)

      Information that must be kept secret and under strict security because it could risk a company's ability to continue operating, such as classified research, patents, plans, testing data, documentation for abandoned or unfinished products, designs for upcoming projects, source code for proprietary technology, and strategic company data.

      Personally identifiable information (PII)

      Information or documents that make it possible to locate or identify a person. Names, phone numbers, street addresses, social security numbers, and email addresses are examples of common PII. Cybercriminals use PIIs for fraud, identity theft, and frauds. PII frequently shows up in data leaks.[1]

      Company, federal, or business information

      Information that a company or federal agency creates and stores that is private and internal. It often contains crucial corporate data including meeting notes, HR records,

      confidential documents, performance indicators, internal communications, and company roadmaps.



    Machine learning Approach in etection of data leaks in mobile Application illustrates There are many accurate techniques and algorithms but here, we will discuss about the K-Anonymity Algorithm which is strong and widely used nowadays.[8]

    The proposed system of "Data Leakage Detection with K- anonymity Algorithm" is a method for detecting data leaks that addresses the limitations of the watermarking technique.[8] The watermarking technique involves embedding a unique code in each copy of the data that is distributed. If that copy is later discovered in the hands of an unauthorized party, the watermark can be used to identify the leaker. However, the watermarking technique has several limitations.[9]

    The proposed system is a more comprehensive approach to data leakage detection than the watermarking technique. By using a combination of techniques, the proposed system can help to detect data leaks while also minimizing the risk of false positives.

    K-Anonymity Algorithm:

    The k-anonymity algorithm is a privacy-preserving data publishing technique that makes sure that no single record in a dataset can be re-identified, even if the dataset is linked with other datasets. To do this, records with the same values for a number of quasi-identifier attributessuch as age, gender, and zip codeare grouped together. So that no single record can be uniquely identified, each group must contain at least k records.

    The k-anonymity algorithm is a privacy preserving data publishing technique that ensures that each individual record in a dataset cannot be re-identified, even if the dataset is linked with other datasets. This is achieved by grouping together records that have the same values for a set of quasi- identifier attributes, such as age, gender, and zip code. Each group must contain at least k records, so that no individual record can be uniquely identified.[9]

    The k-anonymity algorithm works as follows:

    1. Identify the quasi-identifier attributes in the dataset.

    2. Group together records that have the same values for all of the quasi-identifier attributes.

    3. If any group contains fewer than k records, then generalize the values of the quasi-identifier attributes in that group.

    4. Repeat steps 2 and 3 until all groups contain at least k records.

    The k-anonymity algorithm can be used to publish private information, like financial or medical records, while still maintaining individual privacy.

    K-anonymity does not, however, ensure perfect privacy, which must be taken into consideration.

    Even after the dataset has been k-anonymized, it may occasionally still be possible to re-identify specific individuals.[10]

    Example below:

    Name, Postcode, Age, and Gender are attributes that could all be used to help narrow down the record to an individual; these are considered quasi-identifiers as they could be found in other data sources. Disease is the sensitive attribute that we wish to study and which we assume the individual has an interest in keeping private.

    Table 1

    This second table shows the data anonymised to achieve k- anonymity of k = 3, as you can see this was achieved by generalising some quasi-identifier attributes and redacting some others. In this small example the data has been distorted quite significantly, but the larger the dataset, the less distortion is required to reach the desired level of k. [10]

    Table 2

    While k-anonymity can provide some useful guarantees, the technique comes with the following conditions:.

    1. Information that was redacted in the generalized columns must not be disclosed in the sensitive columns of interest.

    2. For a specific group of k, the values in the sensitive columns are not all the same.

    3. The dimensionality of the data must be sufficiently low. k-anonymity algorithm in Python code:

    import random

    def k_anonymity(dataset, k): """

    Performs the k-anonymity algorithm on a dataset.


    dataset: The dataset to be anonymized. k: The number of records in each group.


    A list of anonymized groups. """

    groups = []

    for record in dataset: group = []

    for attribute in record: group.append(attribute) groups.append(group)

    while not all(len(group) >= k for group in groups):

    for group in groups: if len(group) < k:

    group[random.randint(0, len(group) – 1)] =


    return groups

    if name__ == "__main ":

    dataset = [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6],

    [5, 6, 7]]

    k = 3

    anonymized_groups = k_anonymity(dataset, k) for group in anonymized_groups:



    This code first generates a list of groups, each containing a list of records. The program then loops through the groups to see if any of them have fewer records than k. If so, the code picks one of the group's attributes at random and replaces it with the wildcard character (*). This guarantees that there are at least k records in each group. The code then continues in this manner until each group has at least k records. The list of anonymized groups is returned. The goal of this approach is whether the fake objects in distributed data sets yield significant improvement in the chance of detecting guilty agents.



      Penetration testing for mobile applications, also referred to as mobile app pen testing, is a security assessment procedure carried out to assess the security posture of mobile

      applications. In order to find potential flaws in the design, implementation, and configuration of the application, it involves simulating actual attacks and exploiting vulnerabilities. The main objective of mobile app penetration testing is to find security flaws that attackers might exploit, potentially resulting in unauthorized access, data breaches, or other illegal activities.[11]

      Penetration testers, also known as ethical hackers, are knowledgeable security specialists who thoroughly examine the mobile application for security flaws during the penetration testing procedure.[13] They use a variety of methodologies, tools, and techniques to find flaws and evaluate the application's defense against attacks. The frontend, backend, server-side APIs, data storage, and network communications may all be thoroughly examined during testing using both manual and automated methods.

      The penetration testing process typically covers several areas including: Authentication and authorization mechanisms, Data storage and transmission, Network communication , Code and application logic, Reverse Engineering, Third party integrations.

      Mobile Penetration Testing Methodology:

      This involves using automated tools to simulate attacks on the mobile application in order to identify security vulnerabilities, including data leakage vulnerabilities.

      Pen testing should ideally be scheduled in advance as part of the dev team's major update planning process and integrated into their timelines to be finished before the mobile app enters production. The most effective pen testing projects start with the involvement of the developer from the beginning. The process should begin with a meeting between the developer lead and pen testers to discuss the mobile app, comprehend its features and risks, and establish trust. Additionally, the group will decide on logistics like how the mobile pen testing team will get access to the mobile app binaries and credentials.

      Every mobile app pen test should adhere to the process outlined below, regardless of whether it is carried out by your company's security analyst or a professional mobile pen testing service like NowSecure.[12]

      The team will first go over the specific risk profile and security needs of the mobile app, as well as its threat profile, sensitive data, intellectual property, and potential vulnerabilities. Corporatesecurity procedures and industry compliance issues are other topics that will be covered.

      Figure 3

      It is a figure showing the penetration testing methodology.

      The mobile application penetration testing methodology evaluates an app to identify any potential problems with these attack vectors:

      • information recovery and mobile forensics.

      • API, network, and web service testing.

      • analysis carried out on the server.

      • Analyzing code and reverse engineering.


      Operational, technical, and human errors are to blame for the majority of data leaks. To stop this, organizations must implement a variety of technical and administrative controls. There are numerous steps in this. By adhering to them, the organization can lessen the possibility of data leaks and keep the information away from unauthorized parties[2].

      Security audits and assessment, Restrictions on access to data, Training on cybersecurity awareness, Trust no one & always verify, Employ multi-factor authentication, Monitoring of third-party risks are some of the most effective methods of Prevention of data leaks.

      1. Data loss prevention (DLP) software Usage:

        To prevent the leak, loss, or abuse of sensitive data, an effective DLP system combines technology and procedures. This kind of software begins by figuring out the various data types that an organization deals with. Data identification is often aided by machine learning and artificial intelligence (AI). Following identification of all the data, it can be divided into groups based on how sensitive it is. Following that, the program will be able to effectively tailor defenses for each data category. Preventative measures, however, occasionally fail. DLP software typically has tools to find data leaks and plug them in as quickly as possible in this situation.[3]

      2. Encryption of data:

        Data leaks are more difficult for cybercriminals to take advantage of when they are secured with an encryption key. Data can be encrypted using encryption keys, which are arbitrary collections of numbers or letters that scramble the data. This scrambling produces something known as ciphertext. Only with the aid of a decryption key can you make sense of this output.

        There are two types of encryption algorithms such as symmetric (where the encryption and decryption keys are the same) and asymmetric (where they are different). Asymmetric encryption is more secure than symmetric encryption, but symmetric encryption is faster. Nevertheless, skilled online attackers are able to decrypt highly encrypted data, sometimes even without a key. Because of this, you cannot completely rely on encryption to stop data leakage in an app.

      3. Network Access Monitoring:

    The network traffic that passes through your system should be regularly observed. By doing this, you'll be better able to spot security leaks as soon as they occur. In order to get around your security measures, someone who is planning a cyberattack must first understand them. It will be simpler to

    protect your app's data if you carefully monitor who is accessing what information and are alert to any suspicious activity.

    Third-party risk Evaluation:

    Make sure the vendors you choose to work with have robust data security procedures as well. You must carry out vendor risk analyses in order to ensure that all of your third parties abide by legal requirements[14].

    You can conduct this assessment by first compiling a list of risk criteria that includes the worst risks that your organization could encounter. Once you have this list, you can use it to evaluate potential third-party vendors and determine whether they can adequately safeguard your application. Your risk assessment must continue even after you start working with a third-party vendor. A solid framework for risk assessment and management will be very helpful.


    This research paper examined the crucial problem of data leaks in mobile apps, looking at the risks, implications, and mitigation tactics that are involved. This study has advanced knowledge of data leaks in the mobile app ecosystem and highlighted the significance of protecting data privacy and security through a thorough analysis of the body of prior research, actual case studies, and user surveys.

    The results of this study show how seriously dangerous data leaks in mobile apps are. Identity theft, fraud, and various cyberattacks can result from the compromise of personal data, including names, addresses, and financial information. Companies that do not adequately protect user data risk not only losing customers but also facing legal and financial repercussions. It is clear that data leaks have serious repercussions for both people and organizations, which emphasizes the pressing need to resolve this issue.

    Data leaks in mobile apps are caused by a number of things, such as security flaws in the development of the app, poor security procedures, and a lack of user privacy awareness[3]. The security of mobile apps is a challenge for developers, who deal with issues like coding flaws and unsafe data storage. Users themselves may also unintentionally reveal private information or grant a great deal of access without fully comprehending the risks. Addressing these issues is essential to reducing the likelihood of data leaks.[5]

    Various security precautions and best practices have been identified to lessen data leaks. Protecting user data requires a number of different measures, including encryption, secure data storage and transmission, user authentication, and open privacy policies. Additionally, legal implications and regulatory frameworks offer direction and enforce accountability in the mobile app sector. For the purpose of preventing data leaks and preserving user privacy, compliance with data protection laws and ongoing security practice monitoring are essential.

    The paper also discusses about Penetration testing and machine learning approach of detecting data leaks in mobile application through K-Anonimity Algorithm which is widely used in data security and its result. And this paper also discusses about Mobile application penetration testing.

    This research paper emphasizes the significance of strong security measures, user privacy awareness, and regulatory frameworks and serves as a valuable resource for app developers, users, and policymakers. A more safe and privacy-conscious mobile app ecosystem can be developed by promoting best practices, increasing awareness, and encouraging stakeholder collaboration.

    Recognizing that the subject of data leaks in mobile apps is a dynamic one where new issues and dangers occasionally surface. Therefore, additional study is necessary to examine current trends, examine the efficacy of mitigation measures, and evaluate the changing regulatory environment. We can make sure that mobile apps become a trusted and secure platform for users around the world by keeping up with these developments and consistently working to improve data privacy and security.


[1] Joo Keng, J. C., Wee, T. K., Jiang, L., & Balan, R.

K. (n.d.). The Case for Mobile Forensics of Private Data Leaks: Towards Large-Scale User-Oriented Privacy Protection. (PDF) the Case for Mobile Forensics of Private Data Leaks: Towards Large-Scale User-Oriented Privacy Protection | Joseph Chan Joo Keng – Forensics_of_Private_Data_Leaks_Towards_Large_Scale_ User_Oriented_Privacy_Protection

[2] H., Kim, Y., Oh, T., & Kim, J. (2015, December 27).

Analyzing User Awareness of Privacy Data Leak in Mobile Applications. Analyzing User Awareness of Privacy Data Leak in Mobile Applications.

[3] Packetlabs –

[4] Secure and flexible message-based communication for mobile apps within and across devices. (2022, August 4). Secure and Flexible Message-based Communication for Mobile Apps Within and Across Devices – ScienceDirect.

[5] Why Does Your Data Leak? Uncovering the Data Leakage in Cloud from Mobile Apps. (n.d.). Why Does Your Data Leak? Uncovering the Data Leakage in Cloud From Mobile Apps | IEEE Conference Publication | IEEE Xplore.

[6] An SDN/NFV-Enabled Architecture for Detecting Personally Identifiable Information Leaks on Network Traffic. (n.d.). An SDN/NFV-Enabled Architecture for Detecting Personally Identifiable Information Leaks on Network Traffic | IEEE Conference Publication | IEEE Xplore.

[7] ACM Digital Library. (n.d.). ACM Digital Library.

[8] 50200165X. (n.d.). 50200165X



[10] Mahanan, W., Chaovalitwongse, W. A., & Natwichai, J. (2021, July 28). Data privacy preservation algorithm with k-anonymity – World Wide Web.


[11] Alanda, A., Satria, D., Mooduto, H., & Kurniawan,

B. (2020, May 1). Mobile Application Security Penetration Testing Based on OWASP – IOPscience. Mobile Application Security Penetration Testing Based on OWASP – IOPscience. 899X/846/1/012036

[12] Reddy, S., & S. (2023, February 7). Mobile Application Penetration Testing Methodology. Penetration Testing and CyberSecurity Solution – SecureLayer7. testing-methodology/

[13] Systematic Literature Review on Penetration Testing for Mobile Cloud Computing Applications. (n.d.). Systematic Literature Review on Penetration Testing for Mobile Cloud Computing Applications | IEEE Journals & Magazine | IEEE Xplore.

[14] Detection of mobile applications leaking sensitive data. (n.d.). Detection of Mobile Applications Leaking Sensitive Data | IEEE Conference Publication | IEEE Xplore.