A Survey on AI-Powered Drug Repurposing using Graph Neural Networks

doi:10.5281/zenodo.19511563

Volume 15, Issue 03 (March 2026)


Open Access
Authors : Ms. Preethi N P, Alan Antony, Alen Bigi, Alen Chacko, Anju Augustine
Paper ID : IJERTV15IS031550
Volume & Issue : Volume 15, Issue 03 , March – 2026
Published (First Online): 11-04-2026
ISSN (Online) : 2278-0181
Publisher Name : IJERT
License: This work is licensed under a Creative Commons Attribution 4.0 International License


Alan Antony

Department of Computer Science and Engineering FISAT, Angamaly, India

Alen Chacko

Department of Computer Science and Engineering FISAT, Angamaly, India

Ms. Preethi NP

Department of Computer Science and Engineering FISAT, Angamaly, India

Alen Bigi

Department of Computer Science and Engineering FISAT, Angamaly, India

Anju Augustine

Department of Computer Science and Engineering FISAT, Angamaly, India

Abstract – Drug repurposing has emerged as a cost-effective alternative to traditional drug discovery, particularly for rare and orphan diseases. With the rapid growth of biomedical data, artificial intelligence (AI) techniques have gained prominence in identifying novel drug-disease associations. Among these, Graph Neural Networks (GNNs) have shown remarkable potential due to their ability to model complex biological relationships. This survey presents a comprehensive review of AI-powered drug repurposing approaches, with a particular focus on graph-based and deep learning methods.

General Terms

Drug Repurposing, Graph Neural Networks, Artificial Intelligence

Keywords – Biomedical Graphs, Deep Learning

INTRODUCTION

Drug discovery is a lengthy, expensive, and high-risk process, often requiring more than ten years and billions of dollars to develop a single successful drug. This challenge is amplified for rare and orphan diseases, where limited patient populations and low commercial incentives discourage pharmaceutical investment. As a result, a significant proportion of rare diseases lack approved therapeutic options.

Drug repurposing aims to identify new clinical indications for existing drugs, thereby reducing development time, cost, and safety risks. With the rapid growth of biomedical data, artificial intelligence (AI) has emerged as a powerful tool to automate and scale drug repurposing pipelines. Machine learning and deep learning models can analyze large volumes of molecular, genomic, and clinical data to uncover hidden drug-disease relationships.

Recently, Graph Neural Networks (GNNs) have gained prominence due to their ability to operate on graph-structured data. Biological systems are inherently graph-based, involving complex networks of interactions between drugs, targets, genes, and diseases. This survey focuses on AI-powered drug repurposing methods with a particular emphasis on graph-based learning techniques.
BACKGROUND AND MOTIVATION

Biomedical data has grown exponentially due to advances in high-throughput sequencing, electronic health records, and large-scale molecular profiling initiatives. These heterogeneous data sources capture complex relationships between drugs, genes, proteins, pathways, and diseases. However, extracting meaningful insights from such interconnected data remains challenging using conventional machine learning techniques.

Graph-based representations naturally model these interactions, where nodes represent biological entities and edges encode relationships such as binding, regulation, or association. Graph Neural Networks (GNNs) extend traditional neural networks to graph-structured data, enabling effective learning from both node features and graph topology. This capability makes GNNs particularly suitable for drug repurposing tasks, where therapeutic effects often arise from indirect or multi-hop biological interactions.

The motivation for this survey is threefold: (i) to consolidate existing research into a structured overview of AI-driven drug repurposing methods, (ii) to highlight the role of GNNs and analyze how they outperform traditional approaches in relational learning, and (iii) to identify research gaps related to scalability, interpretability, and clinical translation.

DATA SOURCES AND GRAPH CONSTRUCTION

Biomedical Data Sources

AI-driven drug repurposing relies fundamentally on the availability of high-quality and well-curated biomedical datasets that describe drugs, targets, diseases, and their interrelationships. Drug-related information is primarily obtained from publicly available chemical and pharmacological databases such as DrugBank and ChEMBL. These resources provide structured data on approved and experimental drugs, including canonical SMILES representations, molecular properties, bioactivity measurements, and known drug-target interactions. Such information is critical for learning meaningful chemical representations and for supervising drug-target interaction prediction models.

Disease-gene association data are sourced from repositories such as DisGeNET and Orphanet. DisGeNET aggregates gene-disease associations from curated databases, genome-wide association studies, and scientific literature, offering broad coverage across common and rare diseases. Orphanet, in particular, focuses on rare and orphan diseases and provides expert-curated disease definitions along with associated genes and external cross-references. These associations enable the mapping of disease phenotypes to their underlying molecular mechanisms.

Protein-centric information is obtained from databases such as UniProt, STRING, and BioGRID. UniProt provides stable protein identifiers and functional annotations, which are essential for unifying data from multiple sources. STRING and BioGRID offer protein-protein interaction (PPI) networks that describe functional and physical interactions between proteins, allowing the modeling of biological pathways and molecular complexes involved in disease progression.

In advanced repurposing pipelines, transcriptomic datasets from GEO and LINCS L1000 are often incorporated to capture gene expression changes associated with disease states or drug perturbations. Although optional in this work, such datasets can further enhance model performance by integrating functional cellular responses into the learning framework.

Graph Construction Strategies

Graph-based drug repurposing approaches represent biomedical knowledge as interconnected graphs, where entities are modeled as nodes and their relationships as edges. This representation allows complex biological systems to be analyzed using graph neural networks (GNNs), which are capable of learning both local and global structural patterns.

At the molecular level, drugs are represented as molecular graphs, where atoms correspond to nodes and chemical bonds correspond to edges. Each atom node is associated with feature vectors encoding physicochemical properties such as atomic number, valence, aromaticity, and hybridization state. Bond features capture bond type, conjugation, and ring membership. These molecular graphs preserve the structural and chemical context of compounds, enabling GNNs to learn expressive drug embeddings directly from chemical structure.
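The molecular-graph representation described above can be illustrated with a minimal sketch in plain Python. It writes out ethanol (SMILES "CCO") by hand; in practice a cheminformatics toolkit would derive atoms and bonds from the SMILES string. The chosen feature set, the `HYBRIDIZATIONS` vocabulary, and the helper names `atom_features` and `build_graph` are assumptions made for illustration, not a prescribed featurization.

```python
# Minimal sketch: ethanol ("CCO") written out by hand as a molecular graph.
# Only heavy atoms are modeled; hydrogens are left implicit.

# One dict per atom (node). The feature set is illustrative only.
atoms = [
    {"symbol": "C", "atomic_num": 6, "valence": 4, "aromatic": False, "hybridization": "sp3"},
    {"symbol": "C", "atomic_num": 6, "valence": 4, "aromatic": False, "hybridization": "sp3"},
    {"symbol": "O", "atomic_num": 8, "valence": 2, "aromatic": False, "hybridization": "sp3"},
]

# One tuple per bond (edge): (atom index, atom index, bond features).
bonds = [
    (0, 1, {"type": "single", "conjugated": False, "in_ring": False}),
    (1, 2, {"type": "single", "conjugated": False, "in_ring": False}),
]

HYBRIDIZATIONS = ["sp", "sp2", "sp3"]  # hypothetical one-hot vocabulary


def atom_features(atom):
    """Encode one atom as a flat numeric vector."""
    one_hot = [1.0 if atom["hybridization"] == h else 0.0 for h in HYBRIDIZATIONS]
    return [float(atom["atomic_num"]), float(atom["valence"]), float(atom["aromatic"])] + one_hot


def build_graph(atoms, bonds):
    """Return (node feature matrix, undirected adjacency list)."""
    x = [atom_features(a) for a in atoms]
    adj = {i: [] for i in range(len(atoms))}
    for i, j, _ in bonds:
        adj[i].append(j)  # bonds are undirected, so store both directions
        adj[j].append(i)
    return x, adj


x, adj = build_graph(atoms, bonds)
```

The node feature matrix `x` and adjacency list `adj` are the two inputs a GNN layer typically consumes; bond features are kept alongside the edge list and could be passed to an edge-aware layer.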

At the biological level, heterogeneous graphs are constructed by integrating multiple entity types, including drugs, proteins, genes, and diseases.

Node features in the heterogeneous graph may include molecular embeddings for drugs, learnable embedding vectors for proteins, and identifier-based representations for diseases. Protein nodes are indexed and embedded using trainable embedding layers, enabling the model to learn functional protein representations based on observed interaction patterns. Disease nodes are connected to proteins through curated gene-protein mappings derived from Orphanet, thereby linking disease phenotypes to molecular targets.

By combining molecular graphs with higher-level biomedical interaction networks, the resulting graph representation captures both fine-grained chemical structure and system-level biological context. This unified representation enables graph neural networks to model complex drug-target-disease relationships and supports the identification of potential repurposing candidates for orphan diseases.
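A heterogeneous graph of this kind can be sketched with typed edge lists. The example below is a simplified illustration in plain Python: every identifier (`drug_A`, `prot_1`, `disease_X`, and so on) is a made-up placeholder rather than a database record, edges are treated as directed exactly as written, and the helpers `neighbors_by_relation` and `paths` are hypothetical names.

```python
from collections import defaultdict

# Typed edges: (source type, relation, target type) -> list of (source, target).
# All identifiers are placeholders, not real database entries.
edges = {
    ("drug", "targets", "protein"): [("drug_A", "prot_1"), ("drug_B", "prot_2")],
    ("protein", "interacts_with", "protein"): [("prot_1", "prot_3"), ("prot_2", "prot_3")],
    ("protein", "associated_with", "disease"): [("prot_3", "disease_X")],
}


def neighbors_by_relation(edges):
    """Index outgoing edges as node -> list of (relation, neighbor)."""
    out = defaultdict(list)
    for (_, relation, _), pairs in edges.items():
        for src, dst in pairs:
            out[src].append((relation, dst))
    return out


def paths(out, start, goal, max_hops=3):
    """Enumerate simple paths from start to goal using at most max_hops edges."""
    found = []
    stack = [(start, [start], {start}, 0)]
    while stack:
        node, path, visited, hops = stack.pop()
        if node == goal:
            found.append(path)
            continue
        if hops >= max_hops:
            continue
        for relation, nxt in out.get(node, []):
            if nxt not in visited:
                stack.append((nxt, path + [relation, nxt], visited | {nxt}, hops + 1))
    return found


out = neighbors_by_relation(edges)
three_hop = paths(out, "drug_A", "disease_X", max_hops=3)
two_hop = paths(out, "drug_A", "disease_X", max_hops=2)
```

Here `drug_A` reaches `disease_X` only through a three-edge chain (target, protein interaction, disease association), which is the kind of indirect relationship a heterogeneous GNN is meant to exploit.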

Table 1: Advantages of Graph Neural Networks for Drug Repurposing

Feature	Benefit
Graph structure	Natural representation of biomedical data
Message passing	Captures multi-hop biological interactions
Heterogeneous graphs	Integrates drugs, genes, and diseases
Explainability tools	Subgraph-level interpretation
Scalability	Suitable for large biomedical networks

COMPREHENSIVE SURVEY OF EXISTING LITERATURE

This section presents an in-depth survey of ten representative research works in AI-powered drug repurposing. Each study is analyzed based on its methodology, data sources, key contributions, and limitations, illustrating the evolution of computational approaches in this domain.

Singh (2024): AI for Drug Repurposing Against Infectious Diseases

Singh (2024) proposed a multi-stage artificial intelligence framework aimed at accelerating drug repurposing during infectious disease outbreaks such as COVID-19 and Ebola. The framework begins with large-scale biomedical literature mining using natural language processing techniques, including named entity recognition and relation extraction, to identify candidate drug-target-disease relationships from sources such as PubMed and CORD-19. These candidates are refined using deep learning models, primarily convolutional and recurrent neural networks, to predict drug-target interactions based on molecular fingerprints and protein sequence features.

In the final stage, shortlisted candidates undergo molecular docking and binding affinity estimation to validate structural feasibility. The key contribution of this work lies in demonstrating how AI-driven pipelines can significantly reduce hypothesis generation time during public health emergencies. However, the approach relies heavily on curated literature and high computational resources, limiting scalability and explainability.
Prasad and Kumar (2021): AI-driven Drug Repurposing for SARS-CoV-2

Prasad and Kumar (2021) developed a hybrid AI-based drug repurposing pipeline targeting SARS-CoV-2 viral proteins. The methodology integrates deep learning models trained on molecular descriptors with structure-based validation using molecular docking and molecular dynamics simulations. Convolutional neural networks extract spatial molecular features, while recurrent neural networks capture sequential patterns from SMILES representations.

The framework achieved biologically meaningful predictions and demonstrated improved confidence through structural validation. However, its dependence on high-quality protein structures and intensive computation limits generalizability and scalability beyond viral diseases.
Jin and Wong (2014): Toward Better Drug Repositioning

Jin and Wong (2014) presented a foundational review categorizing drug repositioning strategies into phenotypic screening, target-based approaches, molecular docking, and network-based techniques. The study emphasized the advantages of integrative computational frameworks and discussed the limitations of isolated methodologies.

Although the work does not introduce a predictive model, it provides a conceptual foundation that influenced subsequent research, particularly the development of network-based and graph-driven repurposing approaches.
Pan et al. (2023): AI-DrugNet

Pan et al. (2023) introduced AI-DrugNet, a graph neural network framework for predicting drug-disease associations, with a focus on neurological disorders. The model constructs a heterogeneous biomedical graph integrating drug-target, disease-gene, and protein-protein interaction data. Graph convolutional networks are used to learn node embeddings that capture both topological structure and biological context.

The results demonstrate superior performance over traditional machine learning methods. However, the framework requires extensive curated interaction data and faces scalability challenges when applied to large biomedical networks.
Iorio et al. (2013): Connectivity Map

Iorio et al. (2013) pioneered transcriptomics-based drug repurposing through the Connectivity Map approach. The methodology compares disease-induced gene expression signatures with drug-induced transcriptional profiles to identify compounds capable of reversing disease states. Drugs exhibiting strong negative correlation with disease signatures are prioritized as repurposing candidates.

The primary advantage of this approach is its interpretability and independence from prior drug-target knowledge. However, it does not incorporate molecular structure or protein interaction networks, limiting its predictive scope.
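The signature-reversal idea can be sketched as a ranking by correlation. The example below uses Pearson correlation as a simple stand-in for the scoring step; the actual statistic used in connectivity-map style methods may differ, and the expression values, drug names, and the helper `rank_candidates` are invented for illustration.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equal-length expression vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    sd_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    sd_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return cov / (sd_a * sd_b)


# Toy differential-expression values over the same four genes (made up).
disease_signature = [2.0, 1.5, -1.0, -2.0]
drug_profiles = {
    "drug_reverser": [-1.8, -1.2, 0.9, 2.1],   # roughly opposite to the disease
    "drug_mimic": [1.9, 1.4, -0.8, -2.2],      # roughly the same as the disease
    "drug_unrelated": [0.5, -0.5, 0.5, -0.5],
}


def rank_candidates(signature, profiles):
    """Sort drugs from most negative to most positive correlation."""
    scores = {name: pearson(signature, profile) for name, profile in profiles.items()}
    return sorted(scores.items(), key=lambda item: item[1])


ranking = rank_candidates(disease_signature, drug_profiles)
```

Drugs at the top of `ranking` have profiles that most strongly oppose the disease signature, so they would be prioritized as candidates under this scheme.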
Amiri et al. (2023): IDDI-DNN

Amiri et al. (2023) proposed IDDI-DNN, a hybrid framework combining Similarity Network Fusion with convolutional neural networks to predict drug-disease associations. Multiple similarity matrices derived from chemical and biological data are fused into a unified representation and processed by a CNN classifier.

While the model achieves high predictive accuracy, it treats similarity matrices as images, resulting in loss of explicit relational structure and reduced biological interpretability compared to graph-based approaches.
Wen et al. (2021): Clinical Connectivity Map

Wen et al. (2021) developed a clinical connectivity map using real-world electronic health record laboratory data. Drug and disease signatures were constructed from longitudinal laboratory measurements, and complementarity-based scoring was applied to identify repurposable drugs.

This approach offers strong clinical relevance and interpretability; however, it is constrained by data sparsity, confounding clinical factors, and limited availability of high-quality EHR datasets.
Kulkarni (2023): Review on Computational Drug Repurposing

Kulkarni (2023) presented a comprehensive survey of computational drug repurposing techniques, covering similarity-based methods, network-based approaches, and deep learning models. The review benchmarked representative studies and highlighted challenges such as data heterogeneity and lack of standardized evaluation protocols.

Although the study does not propose new predictive models, it provides valuable insights into methodological trends and open research problems.
Wang (2022): DrugRepo

Wang (2022) introduced DrugRepo, a scoring-based framework that integrates chemical similarity, target overlap, and protein-protein interaction distance to rank drug-disease associations. The framework emphasizes scalability and interpretability, enabling large-scale screening.

However, its reliance on linear scoring limits the ability to capture complex nonlinear biological interactions, motivating the adoption of learning-based graph models.
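A linear scoring scheme of the kind described above might look like the following sketch. The weights, the conversion of PPI distance into a proximity term, and the function name `linear_score` are assumptions chosen for illustration; they are not taken from the DrugRepo paper.

```python
def linear_score(chem_sim, target_overlap, ppi_distance, weights=(0.4, 0.4, 0.2)):
    """Weighted sum of three evidence sources (illustrative weights).

    chem_sim and target_overlap are assumed to lie in [0, 1];
    ppi_distance is a shortest-path length in a PPI network (0 = same protein).
    """
    proximity = 1.0 / (1.0 + ppi_distance)  # shorter distance -> higher score
    w_chem, w_target, w_ppi = weights
    return w_chem * chem_sim + w_target * target_overlap + w_ppi * proximity


# Hypothetical drug-disease candidates: (chem_sim, target_overlap, ppi_distance).
candidates = {
    "drug_A": (0.8, 0.5, 1),
    "drug_B": (0.3, 0.2, 4),
}

scores = {name: linear_score(*features) for name, features in candidates.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
```

Because each term contributes independently, the score is easy to inspect, but it cannot represent interactions between evidence sources, which is the limitation noted above.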
Firoozbakht et al. (2021): Network-based Repurposing

Firoozbakht et al. (2021) proposed a network-based drug repurposing approach for breast cancer subtypes. The framework integrates gene expression data with protein interaction networks to identify drugs capable of reversing subtype-specific molecular signatures.

The study demonstrated biologically meaningful predictions but requires extensive omics data and lacks large-scale clinical validation, limiting broader applicability.

Table 2: Comparison of Drug Repurposing Approaches

Approach Type	Strengths	Limitations	Representative Works
Similarity-Based	Simple, interpretable	Ignores complex relations	Wang (2022)
Transcriptomics-Based	Biologically meaningful	Limited to gene expression	Iorio et al. (2013)
Deep Learning (CNN/RNN)	Learns complex features	Black-box, no topology	Prasad and Kumar (2021)
Network-Based	Captures indirect effects	Data-dependent	Firoozbakht et al. (2021)
Graph Neural Networks	Relational learning, scalable	Computationally expensive	Pan et al. (2023)

AI AND GNN-BASED DRUG REPURPOSING
1. Similarity-Based and Machine Learning Approaches
  
  Early computational drug repurposing methods primarily relied on similarity-based inference, where drugs were compared based on chemical structure, shared targets, side-effect profiles, or phenotypic similarities. Chemical similarity was often computed using molecular fingerprints such as ECFP or MACCS keys, while biological similarity relied on target overlap or pathway co-membership. These similarity measures were then used to infer potential drug-disease associations through nearest-neighbor or scoring-based techniques.
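Fingerprint-based nearest-neighbor inference can be sketched as follows. Fingerprints are represented as sets of on-bit indices and compared with the Tanimoto coefficient; the bit sets, drug names, and indications are invented, and `nearest_neighbor_indication` is a hypothetical helper.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    if not fp_a and not fp_b:
        return 0.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)


# Toy library: drug -> (fingerprint bits, known indication). Values are made up.
known = {
    "drug_A": ({1, 4, 7, 9}, "disease_X"),
    "drug_B": ({2, 3, 8}, "disease_Y"),
}

query_fp = {1, 4, 7, 10}  # fingerprint of a drug with no known indication


def nearest_neighbor_indication(query, library):
    """Return (most similar drug, its indication, similarity)."""
    best = max(library, key=lambda name: tanimoto(query, library[name][0]))
    return best, library[best][1], tanimoto(query, library[best][0])


result = nearest_neighbor_indication(query_fp, known)
```

The query shares three of five distinct bits with `drug_A`, so the sketch proposes `disease_X` as its candidate indication with similarity 0.6.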
  
  Traditional machine learning models, including support vector machines (SVMs), random forests, and logistic regression, were later employed to improve predictive accuracy. These models treated drug repurposing as a binary classification or ranking problem using handcrafted features derived from similarity matrices. Although such approaches were computationally efficient and interpretable, they suffered from several limitations. In particular, they struggled to generalize to unseen drugs or diseases, failed to capture nonlinear biological interactions, and were highly sensitive to feature engineering and data imbalance.
  Graph Neural Networks (GNNs) represent a paradigm shift in AI-based drug repurposing by explicitly modeling biomedical systems as graphs. In such graphs, nodes correspond to entities such as drugs, proteins, genes, or diseases, while edges represent interactions including binding, regulation, or association. GNNs learn node embeddings through iterative message-passing mechanisms, allowing information to propagate across local and global neighborhoods.
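The message-passing mechanism can be illustrated with a deliberately stripped-down sketch: each node averages its own feature vector with those of its neighbors. Real GCN or GAT layers add learned weight matrices, normalization, and nonlinearities, all omitted here; the three-node graph and the function `message_passing_step` are assumptions for illustration.

```python
def message_passing_step(features, adj):
    """One round: each node averages its own vector with its neighbors' vectors."""
    updated = {}
    for node, vec in features.items():
        group = [vec] + [features[n] for n in adj[node]]
        updated[node] = [sum(vals) / len(group) for vals in zip(*group)]
    return updated


# Toy chain graph: drug - protein - disease (undirected).
adj = {
    "drug": ["protein"],
    "protein": ["drug", "disease"],
    "disease": ["protein"],
}

# Two-dimensional features: dimension 0 marks "drug", dimension 1 marks "disease".
features = {"drug": [1.0, 0.0], "protein": [0.0, 0.0], "disease": [0.0, 1.0]}

h1 = message_passing_step(features, adj)  # information travels one hop
h2 = message_passing_step(h1, adj)        # information travels two hops
```

After one round the drug node still has no disease signal, because the disease is two hops away; after the second round its disease dimension becomes nonzero, showing how stacked layers capture multi-hop relationships.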
  
  Popular architectures such as Graph Convolutional Networks (GCNs), Graph Attention Networks (GATs), and heterogeneous GNNs have been successfully applied to drug repurposing tasks. These models are particularly effective in capturing multi-hop biological relationships, integrating heterogeneous data sources, and providing a unified framework for relational learning. The surveyed literature generally reports that GNN-based approaches outperform similarity-based and non-graph deep learning models in predicting novel drug-disease associations. Moreover, the integration of explainability techniques, such as attention mechanisms and subgraph extraction, enhances interpretability and supports clinical trust.
EVALUATION METRICS AND BENCHMARKING

Evaluation of drug repurposing models is typically framed as a link prediction or ranking problem. Due to class imbalance and the absence of true negative samples, careful metric selection is critical. Widely used metrics include Area Under the Receiver Operating Characteristic Curve (AUC), Area Under the Precision-Recall Curve (AUPR), Precision, Recall, and Hit Rate.

Table 3: Evaluation Metrics

Metric	Description
Accuracy	Overall correctness
Precision	Correct positive predictions
Recall	Sensitivity to positives
AUC	Ranking capability
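For concreteness, the ranking metrics named in this section can be computed as in the sketch below. It uses plain Python rather than a metrics library, approximates AUPR by average precision, and treats Hit Rate as a hit-at-k indicator for a single ranked list; these choices, the toy labels and scores, and the function names are assumptions.

```python
def roc_auc(labels, scores):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of precision values at the rank of each positive (approximates AUPR)."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def hit_at_k(labels, scores, k):
    """1.0 if any positive appears among the k highest-scored items, else 0.0."""
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    return 1.0 if any(y == 1 for _, y in ranked[:k]) else 0.0


# Toy predictions for five drug-disease pairs (1 = known association).
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.2]

auc = roc_auc(labels, scores)
ap = average_precision(labels, scores)
hit1 = hit_at_k(labels, scores, k=1)
```

On heavily imbalanced data, average precision is usually more informative than AUC because it is computed only over the positions of positive examples in the ranking.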

Benchmark datasets are commonly derived from curated repositories such as DrugBank and CTD. Cross-validation and temporal split strategies are employed to evaluate generalization. However, lack of standardized benchmarks remains a major limitation in the field.
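A temporal split can be sketched as follows: associations first reported before a cutoff year are used for training, and later ones for testing, which mimics predicting future discoveries. The records, years, and the helper `temporal_split` are hypothetical.

```python
# Each record: (drug, disease, year the association was first reported).
# All values are invented for illustration.
associations = [
    ("drug_A", "disease_X", 2012),
    ("drug_B", "disease_Y", 2016),
    ("drug_A", "disease_Z", 2020),
    ("drug_C", "disease_X", 2022),
]


def temporal_split(records, cutoff_year):
    """Train on records before cutoff_year, test on records from cutoff_year on."""
    train = [r for r in records if r[2] < cutoff_year]
    test = [r for r in records if r[2] >= cutoff_year]
    return train, test


train, test = temporal_split(associations, cutoff_year=2019)
```

Unlike random cross-validation, this split prevents information about later associations from leaking into training.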
CHALLENGES AND OPEN ISSUES

Despite strong progress in AI-powered drug repurposing, several open challenges remain.

Data quality and sparsity are a primary concern, as biomedical databases are often incomplete, noisy, and biased toward well-studied drugs and diseases. The lack of reliable negative drug-disease associations further complicates supervised learning and leads to overly optimistic performance estimates during evaluation.
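A common workaround for missing negatives is to sample unlabeled drug-disease pairs and treat them as negatives. The sketch below shows this under the explicit assumption that unlabeled pairs are only presumed negative; some may be undiscovered true associations. The entity names and the helper `sample_negatives` are hypothetical.

```python
import random


def sample_negatives(drugs, diseases, positives, n, seed=0):
    """Sample n unlabeled (drug, disease) pairs to serve as presumed negatives.

    Pairs are drawn uniformly from all combinations not listed in positives.
    They are unverified: some may be true but undiscovered associations.
    """
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    candidates = [(d, s) for d in drugs for s in diseases if (d, s) not in positives]
    return rng.sample(candidates, min(n, len(candidates)))


drugs = ["drug_A", "drug_B", "drug_C"]
diseases = ["disease_X", "disease_Y"]
positives = {("drug_A", "disease_X"), ("drug_B", "disease_Y")}

negatives = sample_negatives(drugs, diseases, positives, n=2)
```

Because the sampled pairs are not confirmed negatives, metrics computed against them should be read as estimates rather than ground truth.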

Scalability poses another significant challenge, since large heterogeneous biomedical graphs may contain millions of nodes and edges. Training deep Graph Neural Networks on such large-scale graphs requires substantial computational resources, efficient sampling strategies, and memory optimization techniques. This limits the applicability of complex models in real-world clinical settings.

Interpretability and explainability are critical issues for the adoption of AI-based drug repurposing methods. Many deep learning and GNN-based models operate as black boxes, making it difficult for clinicians and researchers to understand the rationale behind predictions. Although recent explainable AI techniques such as attention mechanisms and subgraph-based explanations have been proposed, they are not yet standardized or widely validated.

Another important challenge is evaluation inconsistency. Existing studies often use different datasets, validation strategies, and performance metrics, making direct comparison across models difficult. The absence of standardized benchmarks and gold-standard datasets remains a major barrier to fair assessment.

Finally, clinical translation and validation remain limited. Most AI-driven drug repurposing studies rely on retrospective data and computational validation, with few predictions progressing to experimental studies or clinical trials. Bridging the gap between computational predictions and real-world therapeutic applications requires closer collaboration between computational scientists, biologists, and clinicians.
FUTURE RESEARCH DIRECTIONS

Future research in AI-powered drug repurposing is expected to move toward multimodal and multi-scale learning, integrating molecular structure, gene expression, protein interaction networks, clinical records, and biomedical literature. Such holistic representations can improve robustness and biological relevance.

The integration of large language models (LLMs) with GNNs is an emerging direction, enabling automated knowledge extraction from scientific literature and enhanced reasoning over biomedical graphs. Explainable AI will play a key role in clinical adoption, necessitating transparent and faithful interpretation mechanisms.

Federated and privacy-preserving learning frameworks may enable cross-institutional collaboration without sharing sensitive patient data. Additionally, closer integration with experimental validation and clinical trials will be essential to translate computational predictions into real-world therapies.
CONCLUSION

This survey presented an extensive review of AI-powered drug repurposing approaches, with a strong emphasis on graph-based learning techniques. By analyzing ten representative studies, we highlighted methodological trends, strengths, and limitations across traditional machine learning, deep learning, and Graph Neural Network models. GNN-based approaches demonstrate superior capability in modeling complex biomedical relationships and integrating heterogeneous data sources.

Despite promising results, challenges related to data quality, scalability, interpretability, and clinical validation remain open. Addressing these issues will be critical for translating computational predictions into real-world therapeutic outcomes. Future research integrating multimodal data, explainable AI, and large-scale validation holds significant promise for accelerating drug discovery and improving global healthcare accessibility.
REFERENCES

Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, and P. Yu, A Comprehensive Survey on Graph Neural Networks, IEEE Transactions on Neural Networks and Learning Systems, vol. 32, no. 1, pp. 4-24, 2021.
X. Pan, Y. Liu, J. Guan, and S. Zhou, AI-DrugNet: A Graph Neural Network Framework for Drug-Disease Association Prediction, IEEE Journal of Biomedical and Health Informatics, 2023.
A. Singh, Artificial Intelligence for Drug Repurposing Against Infectious Diseases, Briefings in Bioinformatics, 2024.
K. Prasad and R. Kumar, AI-driven Drug Repurposing for SARS-CoV-2 Using Deep Learning and Molecular Docking, Journal of Biomedical Informatics, vol. 115, 2021.
G. Jin and S. T. C. Wong, Toward Better Drug Repositioning: Prioritizing and Integrating Existing Approaches, Drug Discovery Today, vol. 19, no. 5, pp. 637-644, 2014.
F. Iorio, R. Tagliaferri, and D. di Bernardo, Identifying Network of Drug Mode of Action by Gene Expression Profiling, Nature Methods, vol. 6, pp. 761-767, 2009.
M. Amiri, H. R. Rabiee, and M. Jalili, IDDI-DNN: A Deep Learning Framework for Drug-Disease Interaction Prediction, IEEE Access, vol. 11, pp. 24561-24572, 2023.
Q. Wen, Z. Wang, and Y. Li, A Clinical Connectivity Map for Drug Repurposing Using Electronic Health Records, Nature Communications, vol. 12, 2021.
V. Kulkarni, A Review on Computational Drug Repurposing Approaches, Computers in Biology and Medicine, vol. 155, 2023.
H. Wang, DrugRepo: A Scalable Scoring Framework for Drug Repurposing, Bioinformatics, vol. 38, no. 4, pp. 1031-1039, 2022.
M. Firoozbakht, S. Ahmed, and J. Loscalzo, Network-based Drug Repurposing for Breast Cancer Subtypes, PLOS Computational Biology, vol. 17, no. 6, 2021.
M. Zhang, Z. Cui, M. Neumann, and Y. Chen, An End-to-End Deep Learning Architecture for Graph Classification, AAAI Conference on Artificial Intelligence, 2018.
R. Ying, D. Bourgeois, J. You, M. Zitnik, and J. Leskovec, GNNExplainer: Generating Explanations for Graph Neural Networks, NeurIPS, 2019.
Y. Lin et al., Drug Repurposing for COVID-19 Using AI and Network-based Methods, IEEE Access, vol. 9, pp. 123047-123057, 2021.
M. Zitnik, M. Agrawal, and J. Leskovec, Modeling Polypharmacy Side Effects with Graph Convolutional Networks, Bioinformatics, vol. 34, no. 13, pp. i457-i466, 2018.
