Global Research Press
Serving Researchers Since 2012

Deepfake Video Detection using Deep Learning and Ant Colony Optimization

DOI : https://doi.org/10.5281/zenodo.18983781

Dr. R. Kaviarasan
Associate Professor, Dept. of CSE (CS), RGM College of Engineering and Technology, Nandyal, AP

S. Sireesha
UG Scholar, Dept. of CSE (CS), RGM College of Engineering and Technology, Nandyal, AP

G. Dinesh Karthikeyan
UG Scholar, Dept. of CSE (CS), RGM College of Engineering and Technology, Nandyal, AP

K. Sindhura Reddy
UG Scholar, Dept. of CSE (CS), RGM College of Engineering and Technology, Nandyal, AP

Abstract: Deepfake videos are a major threat to the authenticity of digital media because manipulated videos can be easily created and spread. State-of-the-art deepfake video detection based on deep learning has achieved promising results but faces several challenges, including overfitting, high computational complexity, and dependence on the dataset used. To overcome these challenges, this paper proposes a deepfake video detection system that combines deep learning with Ant Colony Optimization (ACO): deep learning is employed for feature extraction, and ACO is used for optimal feature selection and hyperparameter optimization. The proposed method improves the accuracy of deepfake video detection while reducing the computational complexity. The experimental results show that the proposed model achieves an accuracy of about 97%, a precision of 96%, a recall of 95%, and an F1-score of 96%, which compare favorably with the identified benchmarks.

Keywords: Deepfake videos; Ant Colony Optimization (ACO); precision

  1. INTRODUCTION

    Deepfake videos are typically produced with advanced artificial intelligence and deep learning approaches that synthesize realistic faces, lip movements, and speech. Commonly used models include Generative Adversarial Networks (GANs), autoencoders, and other deep learning models such as CNNs and transformers. Most of these models are trained on large datasets of genuine videos, which makes believable videos easier to produce and harder to distinguish from the originals.

    The accessibility and availability of tools for deepfake generation have had a significant impact on society, media, and cybersecurity. Potential misuses include misinformation, political manipulation, identity theft, online fraud, and social engineering attacks. Such misuse can damage reputations, distort opinions, and lower the general level of trust in media. Critical fields like journalism, law enforcement, and even national security could also be affected by the risks associated with deepfakes.

    Despite advances in detection methods, detecting deepfake videos remains a challenge. The limitations of previously used detection methods include overfitting, computational complexity, and dependence on the datasets used. In addition, compressed media, low-quality and real-world videos, and changes in body pose and image size make detection harder. The continual evolution of the models used to produce deepfake videos is a further challenge.

    The highlights of this paper include:

    • Deepfake videos can be identified effectively using a meta-heuristic approach, namely Ant Colony Optimization (ACO).
    • An ACO-based approach is proposed to detect deepfake videos, with ACO used for feature selection and hyperparameter optimization.
    • This, in turn, assists in identifying relevant features, simplifying the problem, and improving the efficiency of detection.
    • The suggested approach may address the challenges of existing detection techniques by generalizing well across data.

    The remaining sections of the paper are organized as follows. Section II reviews the literature on existing deepfake detection methods. Section III describes the proposed methodology for deepfake detection. Section IV explains the experimentation in terms of the dataset and analysis, and presents the results. Finally, Section V provides the conclusion and the future scope for deepfake detection.

  2. LITERATURE SURVEY

    Aryaf Al-Adwan et al. proposed a hybrid deep learning model that combines a convolutional neural network (CNN) and a recurrent neural network (RNN) to detect deepfake videos. The weight and bias values of the CNN and RNN are tuned using particle swarm optimization (PSO), a bio-inspired optimization algorithm. In this method, video frames are first extracted and preprocessed, then converted into a suitable format for input to the CNN. The CNN and RNN are pre-trained on a large dataset of real and deepfake videos to extract features from the video frames, and are then fine-tuned on the deepfake video detection task. The approach attained high accuracy, sensitivity, specificity, and F1 score when tested on two publicly available datasets: Celeb-DF and the Deepfake Detection Challenge Dataset (DFDC). Specifically, it achieved an average accuracy of 97.26% on Celeb-DF and 94.2% on DFDC, outperforming many state-of-the-art methods. Drawbacks of this approach: the CNN carries a risk of overfitting and is sensitive to input changes, the RNN is computationally intensive and threshold-sensitive, and PSO depends on its parameters and adds iterative complexity.

    Deressa Wodajo Deressa et al. proposed a Generative Convolutional Vision Transformer (GenConViT) for deepfake video detection. It combines ConvNeXt and Swin Transformer models for feature extraction and uses an Autoencoder (AE) and a Variational Autoencoder (VAE) to learn from latent data distributions. By learning from both visual artifacts and the latent data distribution, GenConViT achieves improved performance in detecting a wide range of deepfake videos. The model is trained and evaluated on the DFDC, FF++, TM, DeepfakeTIMIT, and Celeb-DF (v2) datasets. It transforms the input facial images to latent spaces and extracts visual clues and hidden patterns from them to determine whether a video is real or fake. GenConViT has two independently trained networks and four main modules: an AE, a VAE, a ConvNeXt layer, and a Swin Transformer. The first network includes an AE, a ConvNeXt layer, and a Swin Transformer, while the second network includes a VAE, a ConvNeXt layer, and a Swin Transformer. The first network uses the AE to transform images to a Latent Feature (LF) space, maximizing the model's class prediction probability, which indicates the likelihood that a given input is a deepfake. The second network uses the VAE to maximize the probability of correct class prediction and minimize the reconstruction loss between the input image and the reconstructed image. Both the AE and the VAE extract LFs from the input facial images (extracted from video frames), which capture hidden patterns and correlations present in the learned deepfake visual artifacts. Pros of this approach: strong performance in deepfake video detection, high accuracy across the tested datasets, and the ability to identify a wide range of fake videos while preserving the integrity of media. Cons of this approach: computational intensity, performance drops on specific datasets, and manual pre-processing requirements.

    Leandro Cunha et al. propose a hybrid EfficientNet-Gated Recurrent Unit (GRU) network as well as EfficientNet-B0-based transfer learning for video forgery classification. A new PSO algorithm is proposed for hyperparameter search, which incorporates composite leaders and reinforcement learning-based search strategy allocation to mitigate premature convergence. The proposed deepfake detection system consists of three key steps: data preprocessing for the extraction of cropped facial regions; the proposed PSO-based hyperparameter optimization during the network training stage; and model establishment using the selected optimal settings, followed by evaluation on unseen test samples. In particular, transfer learning with EfficientNet as the backbone, as well as a hybrid EfficientNet-GRU model, is studied in conjunction with PSO-based hyperparameter search for synthetic video classification. Pros of this method: the PSO-based EfficientNet-GRU and EfficientNet-B0 networks outperform counterparts with manual configurations and with optimal learning configurations yielded by other search methods on several deepfake datasets. Cons of this approach: high computational cost and scalability issues with large datasets.

    Hanan Saleh Alhaji et al. proposed an approach to deepfake video detection that integrates features derived from ant colony optimization-particle swarm optimization (ACO-PSO) with deep learning techniques to enhance detection accuracy and robustness. The ACO-PSO features are extracted from the spatial and temporal characteristics of video frames, capturing subtle patterns indicative of deepfake manipulation. These features are then used to train a deep learning classifier to automatically distinguish between authentic and deepfake videos. Pros of this method: it achieved an accuracy of 98.91% and an F1 score of 99.12%. Cons: sensitivity to image quality, and difficulty handling unreadable or low-quality images.

    Tackhyun Jung et al. proposed a new approach to detect deepfakes generated by generative adversarial network (GAN) models, using an algorithm called DeepVision to analyze significant changes in the pattern of blinking, which is a voluntary and spontaneous action that does not require conscious effort. It performs integrity verification by tracking significant changes in the eye blinking pattern of a subject in a video. DeepVision is implemented as a measure to verify an anomaly based on the period, repeated number, and elapsed eye blink time when eye blinks are continuously repeated within a very short period of time. Advantages: it detected deepfakes in seven out of eight types of videos, an accuracy rate of 87.5%. Disadvantages: dataset and benchmark limitations, and the influence of biological and psychological factors.

    Andreas Rössler et al. propose an automated benchmark for facial manipulation detection. The benchmark is based on DeepFakes, Face2Face, FaceSwap and NeuralTextures as prominent representatives of facial manipulations at random compression levels and sizes. It is publicly available and contains a hidden test set as well as a database of over 1.8 million manipulated images. The authors performed a thorough analysis of data-driven forgery detectors. Current facial manipulation methods can be separated into two categories: facial expression manipulation and facial identity manipulation. Face2Face, an expression manipulation technique, enables the transfer of facial expressions of one person to another person in real time using only commodity hardware. Advantage: high accuracy even with strong video compression. Disadvantage: dataset dependency.

    Hessen Bougueffa Eutamene et al. propose a multimodal framework for deepfake detection with reliable accuracy. The method acquires low-level perceptual features from video frames, including contrast, brightness, and sharpness, and computes artifact scores for detecting artificial anomalies. In parallel, it produces descriptive frame-level captions, which are aggregated into video-level summaries that capture contextual coherence. For training, the method uses the FaceForensics++ dataset, which comprises several deepfake manipulation techniques, including DeepFake, Face2Face, FaceSwap, and NeuralTextures. Metadata consisting of the quality measurements, the artifact scores, and the textual captions is tokenized and processed by a DeepSeek V2 Lite model, fine-tuned with Low-Rank Adaptation (LoRA), as the backbone for classification. Pros: it achieved 96.51% classification accuracy by integrating low-level perceptual features with high-level semantic reasoning. Cons: high computational complexity and sensitivity to compression artifacts.

    Reshma Sunil et al. conducted a comprehensive survey. Deepfakes, which involve the manipulation of images, audio and video to produce highly convincing yet completely fabricated content, present significant risks to media, politics, and personal well-being. To address this growing problem, the survey investigates the advancement and evaluation of autonomous techniques for identifying and evaluating deepfake media. It provides an in-depth analysis of state-of-the-art techniques and tools for identifying deepfakes across image, video, and audio content, and explores the fundamental technologies, such as deep learning models, evaluating their efficacy in differentiating real from manipulated media. Advantage: an in-depth analysis of state-of-the-art tools and foundational deep learning techniques across image, video, and audio. Disadvantage: as a review paper, it does not propose a new method but highlights the broad challenge of real-time deepfake evolution.

    Andry Chowanda et al. examined the effectiveness of combining a conventional optimization technique (gradient-based optimization) with a metaheuristic search approach (swarm intelligence) to improve model performance. An Inception-based architecture is used to model emotion recognition from facial cues, and the hybrid optimization approach is employed to improve the architecture. Advantages: it combines gradient-based and metaheuristic search to improve performance under varying illumination and facial variances, attaining a training accuracy of 99.15% and a validation accuracy of 100%. Disadvantage: the models can be significantly impacted by extreme environmental variances and illumination changes.

    Sarah Abdulkarem Al-shalif et al. systematically analyzed the metaheuristic (MH) techniques used for feature selection (FS) between 2015 and 2022, focusing on 108 primary studies from three databases (Scopus, Science Direct, and Google Scholar) to identify the techniques used as well as their strengths and weaknesses. They report that MH techniques are efficient and outperform traditional techniques, with potential for further exploration of MH techniques such as Ringed Seal Search (RSS) to improve FS in several applications. Pros: MH techniques outperform traditional statistical methods in finding optimal, reduced feature subsets. Cons: high computational complexity compared to simple filter-based methods.

  3. OVERVIEW OF ANT COLONY OPTIMIZATION

    The Ant Colony Optimization Algorithm for Deep Fake Detection (ACO_DFD) is inspired by the behavior of ants searching for food. Ants usually live in groups, and the food found by individual ants benefits the entire colony. There are three types of ants in the colony: the queen ant, whose prime objective is colony reproduction; the male ant, which is used for reproduction; and the worker ant, which is responsible for foraging. This worker ant behavior is mimicked in ACO_DFD for effective deepfake detection. The proposed method falls in the category of meta-heuristic, nature-inspired algorithms, which explore the search space broadly and are therefore less likely to become stuck in local optima, although reaching the global solution is not guaranteed.

    Worker ants search for food and often have to travel long distances. The collected food is stored and used during emergency situations in which the ants cannot leave their nest. Multiple worker ants go out in search of food, and as they travel they deposit a chemical called pheromone. Other ants use this chemical trail to travel to the food and bring it back. The shortest and most frequently used path has a high intensity of pheromone, while on less used paths the pheromone starts to evaporate. From the possible solutions, the best (optimal) solution is selected.
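    The pheromone behavior described above can be illustrated with a small toy simulation. This is only a sketch under stated assumptions: the two path lengths, the evaporation rate `rho`, the deposit constant `q`, and the number of ants are hypothetical values chosen for illustration and do not come from the paper.

```python
import numpy as np

def update_pheromone(pheromone, path_lengths, chosen_paths, rho=0.1, q=1.0):
    """Evaporate all trails, then deposit on the paths the ants actually used."""
    pheromone = (1.0 - rho) * pheromone          # evaporation on every path
    for p in chosen_paths:                       # one deposit per ant trip
        pheromone[p] += q / path_lengths[p]      # shorter path -> larger deposit
    return pheromone

# Assumed toy setting: two routes to the food, path 0 is the shorter one.
path_lengths = np.array([2.0, 5.0])
pheromone = np.ones(2)                           # equal pheromone at the start
rng = np.random.default_rng(0)
for _ in range(50):
    probs = pheromone / pheromone.sum()          # ants prefer stronger trails
    chosen = rng.choice(2, size=10, p=probs)     # 10 ants each pick a path
    pheromone = update_pheromone(pheromone, path_lengths, chosen)
```

    In this toy loop the shorter path receives a larger deposit per trip, so its trail tends to strengthen while the less used trail decays through evaporation.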

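    The image or video frame $x$ is taken as input, and the detector $f$ gives either 0 or 1:

    $$y = f(x), \quad y \in \{0, 1\} \qquad (1)$$

    The value 0 denotes real and 1 denotes fake.

    The dataset has a total of $N$ samples, $D = \{(x_i, y_i)\}_{i=1}^{N}$, where $x_i$ is the input and $y_i$ is the output obtained for the given frame using Equ. (1).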

    The dataset used in this work is taken from the public repository at https://zenodo.org/record/5528418#.YpdlS2hBzDd.

    The next step involves preprocessing, where the raw input frames are extracted from the video, $V = \{F_1, F_2, \ldots, F_n\}$, and each frame is resized to a dimension of $256 \times 256$.

    Normalization is then carried out to transform the pixel intensity values into standard numerical values for better convergence. Here min-max normalization is used, where $x$ is a pixel intensity and $x_{\min}$ and $x_{\max}$ are the minimum and maximum intensities:

    $$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$

    The normalization performs linear scaling, so the values are fixed in the range 0 to 1, which helps convergence and stable learning.
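    The min-max normalization step described above can be sketched in a few lines of NumPy. The function name `min_max_normalize` and the randomly generated 256×256 frame are assumptions for illustration; in the actual pipeline the frame would come from the video after face detection and resizing.

```python
import numpy as np

def min_max_normalize(frame):
    """Linearly rescale pixel intensities to the range [0, 1]."""
    frame = frame.astype(np.float32)
    lo, hi = frame.min(), frame.max()
    if hi == lo:                       # constant frame: avoid division by zero
        return np.zeros_like(frame)
    return (frame - lo) / (hi - lo)

# Hypothetical 256x256 RGB frame with 8-bit pixel values.
rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(256, 256, 3), dtype=np.uint8)
normalized = min_max_normalize(frame)
```

    The guard for a constant frame is a design choice of this sketch, since the formula is undefined when the maximum and minimum intensities coincide.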

    The final preprocessed output is given in Equ. (4).

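    The pheromone matrix is initialized as shown in Equ. (5), where $\tau_j$ is the pheromone value of feature $j$ and $\tau_0$ is a constant initial value:

    $$\tau_j(0) = \tau_0, \quad j = 1, \ldots, n \qquad (5)$$

    All the features get an equal pheromone value at the start. Each ant then starts to select features.

    The best solution, Best, is initialized as given in Equ. (6).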

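    Each ant tries to build a subset solution based on probabilistic selection (Equ. 7) using pheromone strength and heuristic desirability. Here $\tau_j$ is the pheromone strength and $\eta_j$ is the heuristic desirability of feature $j$, $a$ is the pheromone importance factor, and $b$ is the heuristic importance factor. The sum runs over all candidate features $l$.

    $$P_j = \frac{\tau_j^{a}\,\eta_j^{b}}{\sum_{l} \tau_l^{a}\,\eta_l^{b}} \qquad (7)$$

    A feature with higher pheromone and a stronger heuristic value has a higher probability of being selected.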
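    The probabilistic selection step can be sketched as follows. The heuristic scores in `eta`, the default values of `a` and `b`, and the fixed `subset_size` are hypothetical; drawing the subset in a single weighted sample without replacement is a simplification chosen for this sketch, not necessarily how the authors construct subsets.

```python
import numpy as np

def selection_probabilities(tau, eta, a=1.0, b=2.0):
    """Probability of picking each feature from pheromone (tau) and heuristic (eta)."""
    weights = (tau ** a) * (eta ** b)
    return weights / weights.sum()

def build_subset(tau, eta, subset_size, rng, a=1.0, b=2.0):
    """One ant draws `subset_size` distinct features according to the probabilities."""
    probs = selection_probabilities(tau, eta, a, b)
    return rng.choice(len(tau), size=subset_size, replace=False, p=probs)

tau = np.ones(5)                               # equal pheromone at the start
eta = np.array([0.9, 0.1, 0.5, 0.5, 0.2])      # hypothetical heuristic scores
rng = np.random.default_rng(0)
probs = selection_probabilities(tau, eta)
subset = build_subset(tau, eta, subset_size=3, rng=rng)
```

    With equal pheromone, the probabilities are driven by the heuristic alone, so the feature with the largest `eta` is the most likely to be picked. As pheromone accumulates on good features, the balance shifts according to `a` and `b`.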

    After the construction of the subset, the CNN is applied to the selected features and the fitness is computed using Equ. (8), where $S$ is the set of selected features, $\lambda$ is the penalty variable, and $n$ is the total number of features.

    If the fitness of the current subset is better than the best fitness found so far, then update Best and the best fitness value.

    Update the pheromone trail so that a better subset receives a higher pheromone value.

    These steps (feature selection, classification, fitness evaluation and pheromone update) are repeated until the maximum number of iterations is reached or the solution converges.

    Algorithm of ACO_DFD

    Input: Dataset

    Output: Deepfake Detection D*

    1. Preprocess images in Dataset
      • Detect face
      • Resize
      • Normalize pixel values
    2. Extract CNN features F
    3. Initialize pheromone values P
    4. FOR t = 1 to T do
         FOR each ant k do
           Select feature subset Sk using P
           Train detector using Sk
           Evaluate fitness
         END FOR
         Update pheromone values P
       END FOR
    5. Select best feature subset S*
    6. Train final detector D* using S*
    7. Return D*

  4. EXPERIMENTAL RESULTS

    The proposed deepfake detection model based on deep learning and Ant Colony Optimization (ACO) was developed using Python because of its rich set of libraries and tools for machine learning and image processing tasks. The simulation setup used common deep learning and data processing libraries: OpenCV was used for video frame extraction and preprocessing, including face detection, resizing, and normalization. The CNN model was built with deep learning libraries like TensorFlow/Keras to extract spatial features from video frames, and the ACO algorithm was incorporated for feature selection and hyperparameter tuning. The experiments were carried out on a system with common computational capabilities, and the model training and testing were done using common deepfake datasets. Performance evaluation metrics like accuracy, F1-score, and AUC were calculated to assess the efficacy of the proposed method over various runs.
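    To make the ACO feature selection loop from Section 3 concrete, the following is a minimal, self-contained sketch. Several pieces are assumptions for illustration and are not taken from the paper: a nearest-centroid classifier stands in for the CNN-based detector, the fitness is assumed to be validation accuracy minus a size penalty `lam * |S| / n`, the heuristic `eta` is the gap between class means, the data is synthetic, and all parameter values are hypothetical.

```python
import numpy as np

def nearest_centroid_accuracy(X_train, y_train, X_val, y_val):
    """Stand-in detector: classify by the closer class centroid (not a CNN)."""
    c0 = X_train[y_train == 0].mean(axis=0)
    c1 = X_train[y_train == 1].mean(axis=0)
    d0 = np.linalg.norm(X_val - c0, axis=1)
    d1 = np.linalg.norm(X_val - c1, axis=1)
    pred = (d1 < d0).astype(int)
    return float((pred == y_val).mean())

def fitness(subset, data, lam=0.1):
    """Assumed form: validation accuracy minus a penalty on subset size."""
    X_train, y_train, X_val, y_val = data
    acc = nearest_centroid_accuracy(X_train[:, subset], y_train,
                                    X_val[:, subset], y_val)
    return acc - lam * len(subset) / X_train.shape[1]

def aco_feature_selection(data, n_ants=10, n_iter=20, subset_size=4,
                          a=1.0, b=1.0, rho=0.2, seed=0):
    rng = np.random.default_rng(seed)
    X_train, y_train = data[0], data[1]
    n = X_train.shape[1]
    tau = np.ones(n)                                # equal pheromone at the start
    # Heuristic desirability: gap between class means per feature (an assumption).
    eta = np.abs(X_train[y_train == 1].mean(axis=0)
                 - X_train[y_train == 0].mean(axis=0)) + 1e-6
    best_subset, best_fit = None, -np.inf
    for _ in range(n_iter):                         # outer loop over iterations
        solutions = []
        for _ in range(n_ants):                     # each ant builds a subset
            w = tau ** a * eta ** b
            subset = rng.choice(n, size=subset_size, replace=False, p=w / w.sum())
            fit = fitness(subset, data)
            solutions.append((subset, fit))
            if fit > best_fit:                      # keep the best subset so far
                best_subset, best_fit = np.sort(subset), fit
        tau *= (1.0 - rho)                          # evaporation
        for subset, fit in solutions:               # better subsets deposit more
            tau[subset] += max(fit, 0.0)
    return best_subset, best_fit

# Synthetic "features": only the first 3 of 12 columns separate the classes.
rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=400)
X = rng.normal(size=(400, 12))
X[:, :3] += 2.0 * y[:, None]
data = (X[:300], y[:300], X[300:], y[300:])
best_subset, best_fit = aco_feature_selection(data)
```

    In a real pipeline, `X` would hold CNN features extracted from preprocessed frames and `nearest_centroid_accuracy` would be replaced by training and validating the actual detector, which is far more expensive per fitness evaluation.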

    Figure 1. Accuracy Vs Iterations

    Accuracy measures the proportion of inputs that the model classifies correctly. In Figure 1, ACO_DFD obtains an accuracy improvement of 5.43% compared with GA.

    Figure 2. F1 score Vs Iterations

    The F1 score balances precision and recall, which matters because deepfake detection datasets are often imbalanced. Figure 2 shows that ACO_DFD improves the F1 score by 6.67% compared with the existing method, GA.
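    As a reminder of how the F1 score combines precision and recall, the following sketch computes all three from binary labels (1 = fake, 0 = real). The label lists are made-up toy values, not results from the paper.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for the positive (fake = 1) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Toy, imbalanced example: 3 fakes among 8 samples.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```

    Here two of the three fakes are found and one real sample is flagged, so precision, recall and F1 are all 2/3 even though 6 of 8 predictions (75%) are correct, which illustrates why F1 is reported alongside accuracy.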

    Figure 3. AUC Vs Iterations

    The Area Under the ROC Curve (AUC) indicates the model's ability to distinguish fake from real images. In Figure 3, ACO_DFD shows an improvement of 5.38% compared with the existing method, GA.
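    One way to read the AUC is as the probability that a randomly chosen fake sample receives a higher score than a randomly chosen real one. The sketch below computes it directly from that pairwise definition; the function name `roc_auc` and the four toy scores are assumptions for illustration.

```python
import numpy as np

def roc_auc(y_true, scores):
    """AUC as the fraction of (fake, real) pairs ranked correctly; ties count half."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]                  # scores of fake samples
    neg = scores[y_true == 0]                  # scores of real samples
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(y_true, scores)                  # 3 of 4 pairs ranked correctly -> 0.75
```

    The pairwise comparison uses memory proportional to the number of fake-real pairs, so for large evaluation sets a rank-based or library implementation would be preferable.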

  5. CONCLUSION

The proposed system, Deepfake Video Detection using Deep Learning and Ant Colony Optimization (ACO), proves to be an effective solution for the detection of manipulated video content by leveraging the strong feature extraction capability of deep learning models along with the optimization capabilities of Ant Colony Optimization. Deep learning models, specifically Convolutional Neural Networks (CNNs), are efficient in extracting spatial and temporal irregularities of facial expressions, texture, and frame-level artifacts, which are further optimized by Ant Colony Optimization for better performance. The proposed system is an effective solution for improving the accuracy of detection, minimizing false positives, and maximizing computational efficiency compared to traditional deepfake detection systems.

The proposed system of combining deep learning with Ant Colony Optimization is an effective solution for providing a robust and efficient framework for countering the increasing threat of deepfake technology. The proposed system can be applied to various domains, including digital forensics, social media surveillance, cybersecurity, and media validation, for ensuring improved trust and reliability of digital content.

In the future, the proposed approach can be extended by using the latest deep learning models like transformers and attention models to improve the accuracy of detection against highly sophisticated deepfakes. Real-time detection approaches can also be designed for use on social media platforms and live streaming services. Moreover, multimodal analysis approaches that use video, audio, and metadata features together can be used to improve the robustness of the approach against the latest deepfake generation methods.

Future work can be done to improve the generalization capabilities of the approach on various datasets and to reduce the complexity of the approach for implementation on edge devices. The approach can also be made adaptive to next-generation AI-generated media attacks using reinforcement learning and hybrid optimization approaches, which are not limited to ACO.

REFERENCES

  1. Z. Pan, W. Yu, X. Yi, A. Khan, F. Yuan, and Y. Zheng, "Recent Progress on Generative Adversarial Networks (GANs): a survey," IEEE Access, vol. 7, pp. 36322–36333, Jan. 2019, doi: 10.1109/access.2019.2905015.
  2. A. H. Soudy et al., "Deepfake detection using convolutional vision transformers and convolutional neural networks," Neural Computing and Applications, vol. 36, no. 31, pp. 19759–19775, Aug. 2024, doi: 10.1007/s00521-024-10181-7.
  3. Y. Patel et al., "Deepfake Generation and detection: case study and challenges," IEEE Access, vol. 11, pp. 143296–143323, Jan. 2023, doi: 10.1109/access.2023.3342107.
  4. F. Folorunsho and B. F. Boamah, "Deepfake technology and its impact: ethical considerations, societal disruptions, and security threats in AI-generated media," International Journal of Information Technology and Management Information Systems, vol. 16, no. 1, pp. 1060–1080, Feb. 2025, doi: 10.34218/ijitmis_16_01_076.
  5. K. T. Pedersen, L. Pepke, T. Stærmose, M. Papaioannou, G. Choudhary, and N. Dragoni, "Deepfake-Driven Social Engineering: threats, detection techniques, and defensive strategies in corporate environments," Journal of Cybersecurity and Privacy, vol. 5, no. 2, p. 18, Apr. 2025, doi: 10.3390/jcp5020018.
  6. A. Kaur, A. N. Hoshyar, V. Saikrishna, S. Firmin, and F. Xia, "Deepfake video detection: challenges and opportunities," Artificial Intelligence Review, vol. 57, no. 6, May 2024, doi: 10.1007/s10462-024-10810-6.
  7. A. Al-Adwan, H. Alazzam, N. Al-Anbaki, and E. Alduweib, "Detection of deepfake media using a hybrid CNN-RNN model and particle swarm optimization (PSO) algorithm," Computers, vol. 13, no. 4, p. 99, Apr. 2024, doi: 10.3390/computers13040099.
  8. D. W. Deressa, H. Mareen, P. Lambert, S. Atnafu, Z. Akhtar, and G. Van Wallendael, "GENCONVIT: Deepfake video detection using Generative Convolutional Vision Transformer," Applied Sciences, vol. 15, no. 12, p. 6622, Jun. 2025, doi: 10.3390/app15126622.
  9. L. Cunha, L. Zhang, B. Sowan, C. P. Lim, and Y. Kong, "Video deepfake detection using Particle Swarm Optimization improved deep neural networks," Neural Computing and Applications, vol. 36, no. 15, pp. 8417–8453, Feb. 2024, doi: 10.1007/s00521-024-09536-x.
  10. H. S. Alhaji, Y. Celik, and S. Goel, "An approach to deepfake video detection based on ACO-PSO features and deep learning," Electronics, vol. 13, no. 12, p. 2398, Jun. 2024, doi: 10.3390/electronics13122398.
  11. T. Jung, S. Kim, and K. Kim, "DeepVision: DeepFakes detection using human eye blinking pattern," IEEE Access, vol. 8, pp. 83144–83154, Jan. 2020, doi: 10.1109/access.2020.2988660.
  12. A. Rössler, D. Cozzolino, L. Verdoliva, C. Riess, J. Thies, and M. Nießner, "FaceForensics++: Learning to Detect Manipulated Facial Images," ICCV, 2019.
  13. H. B. Eutamene, W. Hamidouche, M. Keita, A. Taleb-Ahmed, and A. Hadid, "Integrating perceptual quality analysis and caption-based features for robust deepfake video detection," Computers and Electrical Engineering, vol. 128, Art. no. 110699, doi: 10.1016/j.compeleceng.2025.110699.
  14. R. Sunil, P. Mer, A. Diwan, R. Mahadeva, and A. Sharma, "Exploring autonomous methods for deepfake detection: A detailed survey on techniques and evaluation," Heliyon, vol. 11, Art. no. e42273, Jan. 2025.
  15. A. Chowanda and M. I. B. M. Ariff, "CNN-swarm intelligence hybrid model for facial expression recognition," Procedia Computer Science, vol. 269, pp. 844–852, Jan. 2025, doi: 10.1016/j.procs.2025.09.027.
  16. S. A. Al-Shalif et al., "A systematic literature review on meta-heuristic based feature selection techniques for text classification," PeerJ Computer Science, vol. 10, p. e2084, Jun. 2024, doi: 10.7717/peerj-cs.2084.