
AI Powered Multimodal Resume Ranking Web Application for Wide Scale Hiring

DOI: https://doi.org/10.5281/zenodo.18876420


Mr. Prathap Sathyavedu

Assistant Professor, Department of CSE, Annamacharya Institute of Technology and Sciences, Tirupati 517520, A.P., India

P Hinduja

UG Scholar, Department of CSE, Annamacharya Institute of Technology and Sciences, Tirupati 517520, A.P., India

M Deena

UG Scholar, Department of CSE, Annamacharya Institute of Technology and Sciences, Tirupati 517520, A.P., India

M Lakshmi Chaithanya Sai

UG Scholar, Department of CSE, Annamacharya Institute of Technology and Sciences, Tirupati 517520, A.P., India

Y Dilli Sravani

UG Scholar, Department of CSE, Annamacharya Institute of Technology and Sciences, Tirupati 517520, A.P., India

Abstract – This work presents a web-based resume ranking system designed to support large-scale recruitment by automating resume analysis and candidate matching. The system combines modern deep learning models with traditional information retrieval techniques to extract, organize, and compare resume content against job descriptions. Resume layouts are first segmented, after which textual information is extracted and classified into meaningful sections such as skills, education, and work experience. Relevant entities are then identified to enrich the extracted information. A hybrid matching strategy is applied, combining semantic similarity and keyword relevance to generate accurate and interpretable rankings. The application enables HR professionals to upload resumes, configure job requirements, and view ranked candidates through an intuitive interface. Experimental results indicate that the system performs reliably across diverse resume formats, demonstrating its potential to reduce manual workload and improve the efficiency of recruitment processes.

Index Terms – Artificial Intelligence, Natural Language Processing, Optical Character Recognition, Named Entity Recognition, Text Recognition

  1. INTRODUCTION

    The rapid expansion of online recruitment platforms and remote work opportunities has dramatically increased the number of job applications received by organizations. As a result, Human Resources (HR) teams face growing pressure to evaluate resumes quickly and accurately. Traditional manual screening methods are time-consuming, difficult to scale, and prone to human bias, which can lead to inconsistent decisions or the unintentional exclusion of qualified candidates. These challenges become more severe when resumes are submitted in diverse formats and layouts, making information extraction and comparison even harder.

    To address these issues, automated resume analysis systems have gained attention in recent years. By leveraging advances in artificial intelligence, particularly in computer vision and natural language processing, such systems aim to standardize resume interpretation and improve candidate-job alignment. In this work, we introduce a resume ranking web application that automates the extraction, organization, and comparison of resume content with job descriptions. The goal is to assist HR professionals by reducing manual workload while improving fairness, consistency, and efficiency in the recruitment process. General motivations for digital and AI-supported recruitment are well established in prior studies on e-recruitment and hiring challenges [1] [2].

  2. RELATED WORK

    Resume parsing and ranking have been widely studied using a variety of machine learning and natural language processing techniques. Earlier approaches relied heavily on optical character recognition (OCR) combined with rule-based methods to extract key information from resumes. With the emergence of transformer-based language models, more recent systems employ models such as BERT for text classification and named entity recognition to better capture contextual information.

    Several studies have explored automated resume screening by matching candidate profiles with job descriptions using keyword-based similarity, semantic embeddings, or hybrid approaches. Machine learning classifiers and deep learning architectures, including convolutional neural networks and recurrent models, have also been applied to resume segmentation and ranking tasks. While these systems demonstrate promising accuracy, many struggle with layout variability, multilingual resumes, or limited generalization across domains.

    Compared to prior work, our approach emphasizes a multimodal pipeline that first understands the visual structure of resumes before applying OCR and language models. This design improves robustness across different resume formats and supports more accurate downstream text analysis. The discussion in this section is based on a synthesis of established resume parsing and ranking literature rather than direct reuse of specific implementations [3] [4].

  3. METHODOLOGY
    1. System Overview

      The proposed resume ranking system adopts a multi-layer architecture that combines deep learning models with traditional information retrieval techniques. This hybrid design enables efficient processing, structured information extraction, and accurate matching between resumes and job descriptions.

      1. Resume Information Extraction: HR users upload resumes in PDF, DOCX, or image formats. These files are converted into images and processed using a YOLOv9-based model to detect and segment textual regions. Text recognition is performed on the segmented regions using EasyOCR. The extracted content is then classified into structured resume sections, such as personal details, education, experience, and skills, using a fine-tuned multilingual BERT model. Named entities, including locations and languages, are identified using a zero-shot NER model supported by regular expression rules.
      2. Job Description Definition: Recruiters define job descriptions through the system interface and assign weights to key attributes such as skills, experience, and education. This allows the matching process to reflect specific hiring priorities.
      3. Matching and Ranking: A hybrid matching strategy is applied to compare resumes with job descriptions. Semantic relevance is computed using cosine similarity over dense text embeddings, while keyword relevance is measured using BM25. Weighted scores are combined to generate a final ranking, and the most relevant candidates are presented to the HR user.
    2. Overview of the dataset

      The dataset was created by collecting resume templates from publicly available and royalty-free online sources, including resume websites, professional networking platforms, open-source resume builders, and university career portals. This process resulted in 2,751 resumes in PDF and DOCX formats, covering a wide range of layouts, styles, and content structures. All resumes were annotated using the Roboflow platform for object detection, with a single class labeled "segment" to represent text regions. The dataset was split into training (75%), validation (19%), and test (6%) sets. Preprocessing and data augmentation techniques were applied to improve model robustness and generalization [5].

    3. Implementation
      1. Object Detection: Multiple object detection models were evaluated to analyze resume layouts. DETR and Detectron2 introduce transformer-based and multi-architecture frameworks that improve detection performance on complex documents [7] [8]. YOLOv9, the latest model in the YOLO family, incorporates the GELAN architecture and programmable gradient information, enabling efficient learning with reduced computational cost [9]. After training and testing all models on a custom resume dataset, YOLOv9 demonstrated superior accuracy, recall, and mAP across diverse resume formats. These results are consistent with recent studies highlighting YOLO's effectiveness in document layout analysis [6]. Accurate layout detection prior to OCR significantly improves overall resume parsing and ranking performance.
      2. Text Recognition with OCR: Text recognition is performed using EasyOCR, which reliably extracts multilingual text from detected resume segments and prepares structured content for subsequent classification stages [10] [11] [12].
      3. Text Classification: The extracted resume text is organized using a fine-tuned multilingual BERT model from the Hugging Face Transformers library. This model groups each text segment into meaningful resume sections, making the overall content easier to analyze and compare [13] [14].
      4. Zero-shot NER: Named entities are extracted using GLiNER, a zero-shot NER model that supports flexible, on-the-fly label definitions without retraining. It efficiently identifies key details from classified resume sections, complemented by rule-based patterns for contact information [15].
      5. Embedding Model and Hybrid Matching: Resume and job description texts are embedded using the Sentence Transformers library with the gte-large-en-v1.5 model. Section-level weights for skills, experience, and education can be adjusted, and final rankings combine semantic similarity and keyword matching for precise matching.
        • Cosine Similarity Matching with Resume Parts: To measure how closely each resume section aligns with the job description, cosine similarity is computed between their respective embeddings, as defined in Equation (1), where E_job is the embedding vector of the job description and E_res,i is the embedding vector of the i-th resume section:

          C_i = (E_job · E_res,i) / (||E_job|| ||E_res,i||)   (1)

        • Weighted Cosine Similarity Score: The per-section similarity values are weighted and combined to obtain the final cosine similarity score, as shown in Equation (2):

          C = W_s · C_skills + W_e · C_experience + W_ed · C_education + W_m · C_miscellaneous   (2)

        • Keyword Matching: The BM25 algorithm [16] is employed to match keywords, with particular attention given to location and language. Let K_loc and K_lang represent the BM25 scores for location and language keywords, respectively. The overall keyword matching score K is then computed as the average of these two scores, as shown in Equation (3):

          K = 0.5 · (K_loc + K_lang)   (3)

        • Overall Matching Score: Finally, the overall matching score S is obtained by combining the weighted cosine similarity score and the keyword matching score. Let C denote the cosine similarity score and K the keyword matching score, with W_k as the weight for keyword importance. The combined score is computed as shown in Equation (4):

          S = C + W_k · K   (4)
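The matching equations (1)-(4) above can be traced end to end in a short script. The sketch below is a minimal plain-Python illustration (the deployed system uses Sentence Transformers embeddings and a BM25 library); the vectors, section names, and weight values are hypothetical toy inputs.

```python
import math

def cosine_similarity(a, b):
    """Equation (1): cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def overall_score(e_job, sections, weights, k_loc, k_lang, w_k):
    """Equations (2)-(4): weighted section similarities plus keyword score.

    sections: maps a section name (e.g. "skills") to its embedding vector
    weights:  maps the same section names to their weights W_s, W_e, ...
    """
    # Equation (2): weighted sum of per-section cosine similarities.
    c = sum(weights[name] * cosine_similarity(e_job, vec)
            for name, vec in sections.items())
    # Equation (3): average of the location and language BM25 scores.
    k = 0.5 * (k_loc + k_lang)
    # Equation (4): final matching score.
    return c + w_k * k

# Toy example: the "skills" section matches the job embedding exactly,
# while "experience" is orthogonal to it.
score = overall_score(
    e_job=[1.0, 0.0],
    sections={"skills": [1.0, 0.0], "experience": [0.0, 1.0]},
    weights={"skills": 0.5, "experience": 0.5},
    k_loc=0.4, k_lang=0.6, w_k=0.1,
)
```

With these toy values, C = 0.5, K = 0.5, and S = 0.5 + 0.1 · 0.5 = 0.55, matching a hand calculation of the four equations.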

      6. Web Application: The web application developed in this study prioritizes ease of use and thorough review through a clean and intuitive interface. It allows users to upload up to 200 resumes at a time, directly observe the model's outputs, and modify job descriptions, including the assignment of weights and specification of job components. The system accommodates weight scores, job titles, company locations, job types, job details, and required skills. Additionally, it provides a detailed view of the matching results, enabling users to examine how each resume corresponds to the job description in a comprehensive manner.

  4. EXPERIMENTAL RESULTS

    We first present the training performance of our object detection models, evaluated on a custom dataset designed for resume layout analysis. This dataset, featuring various resume formats, enabled comparison of YOLOv9, DETR, and Detectron2 in detecting structural elements. Next, we discuss OCR-based text extraction results, followed by text classification and NER outcomes, enhanced with regular expressions. Finally, we illustrate how embedding models match resumes with job descriptions, highlighting the effectiveness of our multi-stage resume ranking approach.

      1. Custom Dataset Analysis

        The study uses a custom dataset of 2,694 images, expanded to 4,304 via augmentation, containing 19,111 annotated instances across training, validation, and testing subsets. Each image averages 7.1 annotations, supporting thorough model training and evaluation.

      2. Object Detection Model Training Results

        Training parameters, including learning rate, batch size, optimizer, and other hyperparameters for DETR, Detectron2, and YOLOv9, were set based on their respective studies [7] [8] [9]. Model performance was evaluated using Average Precision (AP) and Average Recall (AR) across multiple IoU thresholds. AP reflects detection accuracy, AR measures coverage of essential resume sections, and IoU indicates localization precision. Inference times on an L4 GPU were 1.05 s (DETR), 1.17 s (Detectron2), and 0.24 s (YOLOv9), highlighting YOLOv9's speed and precision.
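As a companion to the IoU-based evaluation described above, the snippet below shows how intersection-over-union is computed for two axis-aligned boxes. This is a generic sketch of the standard metric, not code from the paper's training pipeline.

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0
```

A detection counts as correct at, say, the 0.5 threshold when `iou(predicted_box, ground_truth_box) >= 0.5`; AP and AR are then aggregated over such matches at each threshold.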

      3. Text Processing and Analysis Results
        1. OCR Results: EasyOCR generally achieved high text recognition accuracy across resume sections, but errors were observed that could affect downstream processing. Common issues included character confusions (e.g., O vs. 0, l vs. 1), misreads of words such as "Manager" as "Mana9er", punctuation mistakes, and spacing inconsistencies such as a stray space appearing inside "high-quality". Font variations, especially in headers or logos, occasionally caused complete misinterpretations. These errors may impact job title recognition, skill classification, and NER accuracy. Our findings emphasize the importance of post-processing corrections or manual review and highlight the need for diverse resume datasets to improve robustness.
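The post-processing corrections suggested above can be prototyped with simple substitution rules. The sketch below is a hypothetical starting point, not the system's actual correction logic: the confusion table and regex are illustrative, and the digit rule would over-correct legitimate mixed tokens such as "B2B".

```python
import re

# Hypothetical confusion table: digits that OCR sometimes emits inside
# alphabetic words (O/0, l/1, g/9, S/5). A real system would tune this.
DIGIT_TO_LETTER = {"0": "O", "1": "l", "9": "g", "5": "S"}

def fix_digit_confusions(token):
    """Map stray digits back to letters inside mostly-alphabetic tokens.

    Purely numeric tokens (years, postal codes) are left untouched.
    """
    if any(ch.isalpha() for ch in token):
        return "".join(DIGIT_TO_LETTER.get(ch, ch) if ch.isdigit() else ch
                       for ch in token)
    return token

def post_process(text):
    # Rejoin words split around a hyphen: "high- quality" -> "high-quality".
    text = re.sub(r"-\s+", "-", text)
    return " ".join(fix_digit_confusions(tok) for tok in text.split())
```

For example, `post_process("Mana9er of high- quality teams")` repairs both error classes named above, while a purely numeric token such as "2022" passes through unchanged.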

        2. Text Classification Results: The text classification stage follows OCR-based text extraction. We employed a pre-trained Hugging Face model [17] to categorize resume sections. Its performance was evaluated on an unseen resume dataset, with metrics summarized in Table IV, showing high accuracy across categories such as certificates, contact details, education, languages, work experience, and skills. Leveraging a pre-trained model allows the use of knowledge from large resume corpora, improving classification across diverse formats. Performance may vary depending on dataset similarity to the original training data. The classified text serves as input for the subsequent named entity recognition step, ensuring structured and accurate information extraction.

        3. NER Results: Following text classification, specific entities are extracted from the classified text using the GLiNER model. The NER performance depends on prior stages, particularly OCR output. Our analysis shows GLiNER achieves generally good entity recognition but often produces lower confidence scores. This is expected, as the model uses a zero-shot approach and has not been trained specifically on resumes. Lower confidence may also result from resume format variability or OCR errors. Despite these limitations, GLiNER demonstrates the potential of zero-shot learning for specialized tasks, such as parsing resumes, highlighting its usefulness even in domains with limited training data.
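The rule-based patterns that complement GLiNER for contact details (noted in the methodology) can be approximated with regular expressions. The patterns below are illustrative assumptions, not the system's published rules; real-world e-mail and phone formats would need broader coverage.

```python
import re

# Illustrative patterns only; the paper does not publish its actual rules.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Optional country code followed by 10-13 digits, allowing spaces/hyphens.
PHONE_RE = re.compile(r"(?:\+\d{1,3}[\s-]?)?(?:\d[\s-]?){9,12}\d")

def extract_contacts(text):
    """Pull e-mail addresses and phone-like digit runs from resume text."""
    return {
        "emails": EMAIL_RE.findall(text),
        "phones": [m.strip() for m in PHONE_RE.findall(text)],
    }
```

Such deterministic rules are a natural complement to a zero-shot NER model: they are cheap, auditable, and unaffected by the confidence-score issues discussed above, at the cost of covering only rigidly formatted fields.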
        4. Matching Results: Choosing an appropriate embedding model is key for effective resume-job matching. Using the MTEB Benchmark, which evaluates long-sequence performance, we selected the gte-large-en-v1.5 model [?], [?] for its strong handling of lengthy texts and balanced size-performance tradeoff. This model efficiently processes resumes and job descriptions, generating accurate embeddings for semantic similarity and BM25 keyword matching. Table VII illustrates flexible weighting, showing how candidate qualifications align with job requirements.
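For readers unfamiliar with BM25, the keyword-relevance side of the hybrid matcher can be sketched as a minimal Okapi BM25 scorer over tokenized documents. This is a generic textbook implementation with default k1 and b parameters, not the system's production code, and the toy corpus below is hypothetical.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized document in `docs` against `query_terms`."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency of each query term across the corpus.
    df = {t: sum(1 for d in docs if t in d) for t in query_terms}
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for t in query_terms:
            if df[t] == 0:
                continue
            # Smoothed inverse document frequency.
            idf = math.log((n - df[t] + 0.5) / (df[t] + 0.5) + 1.0)
            # Term-frequency saturation with document-length normalization.
            score += idf * tf[t] * (k1 + 1) / (
                tf[t] + k1 * (1 - b + b * len(doc) / avgdl))
        scores.append(score)
    return scores

# Toy corpus: two "resumes" reduced to keyword tokens.
docs = [["python", "developer", "tirupati"],
        ["java", "developer", "delhi"]]
loc_scores = bm25_scores(["tirupati"], docs)
```

A location query such as `["tirupati"]` scores the first document positively and the second zero, which is the behavior K_loc relies on in Equation (3).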
  5. CONCLUSION AND FUTURE WORK

In this study, we developed a resume ranking web application that employs advanced deep learning techniques to streamline recruitment. The system integrates YOLOv9 for object detection, EasyOCR for text extraction, fine-tuned mBERT for text classification, and GLiNER for named entity recognition, enabling accurate extraction, categorization, and matching of resumes with job descriptions. Key contributions include a multi-model approach for comprehensive parsing, combined cosine similarity and BM25 scoring for precise matching, and an intuitive interface with adjustable weights for HR professionals. Challenges encountered involved OCR errors due to diverse fonts and layouts, lower confidence in zero-shot NER, and limitations in semantic representation using dense embeddings. Future work aims to enhance OCR performance, develop a resume-specific NER dataset, refine information extraction, adopt hybrid embedding techniques, expand features for job seekers, update datasets continuously, and optimize models for real-time processing. These improvements will create a more accurate, adaptable, and efficient recruitment tool.

TABLE I

Hybrid Matching Scores for an Example Scenario

Resume ID | Skills (0.2) | Experience (0.4) | Education (0.1) | Misc. (0.2) | Keyword (0.1) | Final Score
----------|--------------|------------------|-----------------|-------------|---------------|------------
    4     |    0.14      |      0.29        |      0.07       |    0.146    |      0        |    0.65
    5     |    0.12      |      0.29        |      0.047      |    0.146    |      0        |    0.61
    3     |    0.118     |      0.27        |      0.00       |    0.141    |      0.05     |    0.59
    6     |    0.148     |      0.27        |      0.062      |    0.108    |      0        |    0.59
    7     |    0.146     |      0.27        |      0.00       |    0.144    |      0        |    0.56
    1     |    0.10      |      0.20        |      0.03       |    0.108    |      0.05     |    0.44
    2     |    0.103     |      0.20        |      0.05       |    0.078    |      0        |    0.44
    8     |    0.121     |      0.27        |      0.048      |    0.00     |      0        |    0.44

Acknowledgment

The authors would like to express their sincere gratitude to the faculty advisors and mentors for their continuous guidance, encouragement, and valuable feedback throughout the course of this work. We also thank our peers and colleagues for their constructive discussions and support, which greatly contributed to the completion of this document.

REFERENCES

  1. Baykal, E. (2020). Digital Era and New Methods for Employee Recruitment. 10.4018/978-1-7998-1125-1.ch018.
  2. Solanki, S., & Gujarati, D. (2024). The Digital Revolution in Recruitment: Unraveling the Impact and Challenges of E-Recruitment. Educational Administration Theory and Practices, 30. doi: 10.53555/kuey.v30i6(S).5362
  3. Rozario, S. D., Venkatraman, S., & Abbas, A. (2019). Challenges in recruitment and selection process: An empirical study. Challenges, 10(2), 35.
  4. Palshikar, G. K., Pawar, S., Banerjee, A. S., Srivastava, R., Ramrakhiyani, N., Patil, S., … & Chalavadi, D. (2023). RINX: A system for information and knowledge extraction from resumes. Data & Knowledge Engineering, 147, 102202.
  5. Kinge, B., Mandhare, S., Chavan, P., & Chaware, S. M. (2022). Resume Screening using Machine Learning and NLP: A proposed system. International Journal of Scientific Research in Computer Science, Engineering and Information Technology, 8(2), 253–258.
  6. Tanberk, S., Helli, S. S., Kesim, E., & Cavsak, S. N. (2023, September). Resume Matching Framework via Ranking and Sorting Using NLP and Deep Learning. In 2023 8th International Conference on Computer Science and Engineering (UBMK) (pp. 453–458). IEEE.
  7. Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., & Zagoruyko, S. (2020, August). End-to-end object detection with transformers. In European Conference on Computer Vision (pp. 213–229). Cham: Springer International Publishing.
  8. Kirillov, A., Wu, Y., He, K., & Girshick, R. (2020). PointRend: Image segmentation as rendering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 9799–9808).
  9. Wang, C. Y., Yeh, I. H., & Liao, H. Y. M. (2024). YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information. arXiv preprint arXiv:2402.13616.
  10. JaidedAI. (2023). EasyOCR. GitHub. Retrieved from https://github.com/JaidedAI/EasyOCR
  11. Vedhaviyassh, D. R., Sudhan, R., Saranya, G., Safa, M., & Arun, D. (2022, December). Comparative analysis of EasyOCR and TesseractOCR for automatic license plate recognition using a deep learning algorithm. In 2022 6th International Conference on Electronics, Communication and Aerospace Technology (pp. 966–971). IEEE.
  12. Wu, X., Luo, C., Zhang, Q., Zhou, J., Yang, H., & Li, Y. (2019). Text Detection and Recognition for Natural Scene Images Using Deep Convolutional Neural Networks. Computers, Materials & Continua, 61(1).
  13. Wolf, T., Debut, L., Sanh, V., Chaumond, J., Delangue, C., Moi, A., … & Rush, A. M. (2019). HuggingFace's Transformers: State-of-the-art natural language processing. arXiv preprint arXiv:1910.03771.
  14. Abdaoui, A., Pradel, C., & Sigel, G. (2020). Load what you need: Smaller versions of multilingual BERT. In Proceedings of SustaiNLP / EMNLP.
  15. Zaratiana, U., Tomeh, N., Holat, P., & Charnois, T. (2023). GLiNER: Generalist model for named entity recognition using a bidirectional transformer. arXiv preprint arXiv:2311.08526.
  16. Li, Z., Zhang, X., Zhang, Y., Long, D., Xie, P., & Zhang, M. (2023). Towards general text embeddings with multi-stage contrastive learning. arXiv preprint arXiv:2308.03281.
  17. Abida, H. (2022). distilBERT-finetuned-resumes-sections. Hugging Face. Retrieved from https://huggingface.co/has-abi/distilBERT-finetuned-resumes-sections
  18. Hugging Face. DistilBERT Authorized. Retrieved from https://huggingface.co/has-abi/distilBERTAuthorized