Global Research Press
Serving Researchers Since 2012

CareerSmart: An Intelligent Recruitment Framework Using TF-IDF, BERT, and Named Entity Recognition for Resume-Job Matching

DOI : https://doi.org/10.5281/zenodo.19033816

Srinivasa Rao Pendela

Professor, Department of Computer Science and Engineering, Vasireddy Venkatadri Institute of Technology, Nambur 522508, Andhra Pradesh, India.

A. Dharma Teja

Department of Computer Science and Technology, Vasireddy Venkatadri Institute of Technology, Nambur 522508, Andhra Pradesh, India.

Sasideepika Dharanikota

Department of Computer Science and Technology, Vasireddy Venkatadri Institute of Technology, Nambur 522508, Andhra Pradesh, India.

A. Gaurav Vandan

Department of Computer Science and Engineering, Vasireddy Venkatadri Institute of Technology, Nambur 522508, Andhra Pradesh, India.

Abstract – This paper introduces CareerSmart, an intelligent recruitment system that combines rule-based and semantic natural language processing to screen resumes and rank candidates in a deployable web-based application. The framework ingests multi-format resumes (PDF, DOCX/DOC, TXT/RTF/ODT, and image files), falls back to OCR for scanned documents, and then applies text normalization and feature extraction. Candidate-job alignment is computed by a hybrid scoring pipeline that combines keyword-based skill coverage, lexical relevance measured by TF-IDF cosine similarity, and semantic similarity from transformer-based Sentence-BERT embeddings; adaptive weighting keeps the scores stable when the deep semantic component is unavailable. The system improves interpretability through named entity recognition and automatic feedback generation, which reports matched competencies, missing competencies, and constructive recommendations that applicants can act on. Built on Flask and SQLite, the platform implements an end-to-end hiring workflow, including job posting, applications, automatic resume analysis, ranking, and shortlist and rejection notifications, and is suited to small and medium-sized hiring organizations. Use in real working conditions indicates that the hybrid method matches more robustly than any single-method baseline because it captures both explicit skill references and context-based semantic cues, reducing recruiter effort and giving more transparent, evidence-based guidance toward fairer and more efficient hiring.

Keywords – Resume Screening, Recruitment Automation, Job-Candidate Matching, Natural Language Processing (NLP), TF-IDF, Sentence-BERT, Semantic Similarity, Named Entity Recognition (NER), Skill Gap Analysis, Explainable AI.

  1. INTRODUCTION

    The rapid growth of online job marketplaces, large applicant volumes, and constantly changing skill demands across industries have made recruitment more complex. Companies often receive hundreds or even thousands of resumes for a single open position, which makes manual screening slow, inconsistent, and susceptible to human error and fatigue. Meanwhile, applicants expect faster and more transparent feedback, while recruiters must make high-quality decisions in very limited time. This mismatch between recruiting workload and decision quality has created strong demand for intelligent systems that support screening, ranking, and communication in a scalable and reliable manner. As hiring becomes more competitive, automated decision support is becoming a necessity for most companies, particularly those with small HR departments.

    Conventional applicant tracking systems rely largely on keyword filtering, which can overlook strong candidates whose resumes use different wording from the job description. Lexical methods can also wrongly penalize candidates who demonstrate relevant skills through project descriptions, domain language, or other technical vocabulary. Conversely, resumes stuffed with keywords may be ranked too highly even when job-role fit is weak. These limitations are especially apparent in technical recruiting, where skill expression varies across frameworks, toolchains, and descriptions of experience. Contemporary resume screening therefore needs techniques that combine explicit skill evidence with contextual semantic understanding rather than relying on a single matching technique.

    NLP offers a solid basis for converting unstructured resumes and job descriptions into a machine-readable form. Techniques such as text cleaning, tokenization, and vectorization make it possible to compare candidate profiles with position requirements robustly.
    In practice, TF-IDF and cosine similarity remain useful for measuring lexical overlap and relevance, especially where domain vocabularies are explicit. However, purely lexical models still struggle with meaning-level similarity, where different words express the same capability. This motivates the incorporation of transformer-based semantic models, which can capture contextual relationships between candidate experience and job requirements beyond direct word matches.

    To address these issues, this paper presents CareerSmart, an intelligent recruiting platform that deploys a hybrid resume-job matching pipeline in a production-oriented web application. The system fuses three complementary dimensions: skill coverage, TF-IDF-based document similarity, and Sentence-BERT semantic similarity. This structure yields reliable scores when exact skills are present while also identifying semantically related evidence in resumes that does not appear verbatim in the job text. The architecture also adapts so that matching remains stable when the more advanced semantic components are unavailable in constrained environments. The platform thus balances practical deployability with strong NLP capability.

    Another essential requirement for real-world hiring tools is broad document compatibility, since applicants upload resumes in many formats and levels of quality. CareerSmart supports PDF, DOCX/DOC, plain text, and image-based resumes, and falls back to OCR for scanned or non-selectable documents. This flexible ingestion reduces candidate exclusion due to formatting differences and improves equity at the input stage of screening. The extracted text is then preprocessed to clean it up and improve the quality of downstream analysis. Because the system is designed to accommodate heterogeneous inputs rather than assuming clean, standardized documents, it better reflects real hiring conditions.

    Beyond score generation, interpretability is necessary to build recruiter trust and guide candidates.
    Black-box ranking without explanation can create doubt and reduce adoption in practical HR processes. CareerSmart addresses this by reporting matched and missing skills, extracting named entities and technical terms, and producing structured feedback summaries for users. This feedback helps recruiters justify shortlist decisions and helps applicants identify specific areas for improvement. The focus on explainable outputs makes the platform a decision-support assistant rather than a mere filtering tool, and promotes more open recruitment communication.

    From a systems viewpoint, the platform is built with Flask and SQLite and provides an end-to-end workflow covering authentication, job posting, application intake, automatic analysis, ranking, and notification handling. This full-stack implementation shows that state-of-the-art NLP-based matching can be delivered in lightweight, maintainable code suitable for academic prototypes and small to medium deployments. The design emphasizes modularity so that the parsing, scoring, and feedback components can evolve independently as better models become available. It also allows easy experimentation with weighting and threshold-tuning strategies for different hiring situations. The work therefore contributes both an algorithmic approach and a practical system implementation.

    Overall, the paper positions hybrid semantic recruitment intelligence as a viable way to improve the efficiency, consistency, and transparency of hiring. By combining keyword accuracy, semantic similarity, and contextual embeddings, CareerSmart aligns candidates with job opportunities better than single-method screening strategies. The platform further improves the recruitment process by handling multi-format resumes and generating explainable feedback, addressing a significant weakness of many current applications. The approach is particularly applicable when an organization wants scalable hiring assistance without sacrificing interpretability and ease of implementation. The rest of the paper describes the system design, the matching methodology, the implementation, and the results observed in a real-life recruitment application.
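To make the lexical component concrete, the following is a minimal sketch of TF-IDF weighting with cosine similarity, written with the Python standard library only. It is not the CareerSmart implementation: the tokenizer, the smoothed IDF formula, and the sample job and resume strings are assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into tokens; '+' and '#' are kept so 'c++' and 'c#' survive."""
    return re.findall(r"[a-z0-9+#]+", text.lower())


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, using raw counts and a smoothed IDF."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Illustrative (made-up) texts: one job description and two resumes.
job = "Python developer with Flask and SQL experience"
resumes = [
    "Built REST services in Python using Flask and SQL databases",
    "Managed retail store inventory and customer service staff",
]
vectors = tfidf_vectors([job] + resumes)
scores = [cosine(vectors[0], v) for v in vectors[1:]]
```

The first resume shares several terms with the job text and therefore scores higher than the second, which overlaps only on a stop word. A resume that described the same skills in different words would score low here, which is the gap the semantic component is meant to close.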

  2. LITERATURE SURVEY

    Digitalization and disruption of the global workforce have accelerated change in human resource management and recruitment. Stone and Deadrick [1] covered future HR challenges and opportunities, while Sharma and Kumar [2] explored post-pandemic HR practices with a focus on technology-enabled recruitment. Strategic HR shifts during COVID-19 were highlighted by Collings et al. [3] and Radulescu et al. [4], and large-scale employment disruptions were reported by Albanesi and Kim [5] and Soucheray [6], underlining the need for intelligent job-matching systems. Broader workforce restructuring and data-driven talent management requirements were also highlighted by Shi et al. [7] and the National Skills Commission [8].

    With the growing digitalization of organizations, intelligent recruitment technologies have become central to HR systems. Almeida et al. [9] and Luburi and Vuini [10] emphasized automation and digital platforms in talent acquisition, while Bierema et al. [11] and Margherita [12] identified HR analytics and workforce matching as key research areas. Early intelligent job recommendation systems were based on data mining and machine learning methods, e.g., preference-based job recommendation by Gupta and Garg [13] and machine-learned job matching by Paparrizos et al. [14]. Collaborative and semantic approaches to recommendation were discussed by Zhang et al. [15], Bakar and Ting [16], and Patel et al. [17], using collaborative filtering, Bayesian modeling, and career path fitting, respectively.

    Content-based résumé-job matching approaches were advanced by Guo et al. [18], who developed the RésuMatcher system, and by Almalis et al. [23] and Diaby and Viennet [24], who used text similarity and social network profile information for job recommendation. A reciprocal mobile job matching framework was proposed by Wenxing et al. [25], while Mirza et al. [26] proposed personalized job role recommendation based on predictive analytics and personality traits. Nguyen et al. [27] and Özcan and Ögüdücü [28] studied clustering- and classification-based improvements in recommendation accuracy, and Razak et al. [29] contributed fuzzy-logic-based career recommendation. Malherbe et al. [30] analyzed textual features and job categorization using textual and social network data, showing the importance of how text features are extracted in recruitment systems.

    From an organizational perspective, recruitment effectiveness and the alignment of job seekers with organizations have also been widely studied. Gee et al. [19] identified risks such as recruitment fraud, Pan et al. [20] reported efficiency gains from AI-driven hiring systems, and Karemu et al. [21] and Smither [22] showed that recruitment strategies significantly affect employee retention and the employee life cycle.

  3. PROPOSED WORK

    The proposed work develops an intelligent recruitment platform that automates resume screening and candidate ranking with a hybrid natural language processing approach grounded in real hiring processes. The system aims to minimize manual work, improve consistency in applicant evaluation, and offer clear decision support to both recruiters and applicants. Rather than relying on keyword matching alone, it integrates direct skill evidence with contextual understanding so that qualified applicants are not screened out because of linguistic variation. This makes the method more dependable in diverse, high-volume recruitment environments where manual inspection is inconvenient and time consuming.

    A major module of the system is robust ingestion of resumes in the file types candidates commonly submit. It handles common document formats and image-based resumes, so organizations can process applicants without forcing them into a particular template. For non-selectable or scanned documents, optical character recognition is used to extract the text for analysis. This improves inclusiveness and reduces rejections caused by formatting problems rather than candidate quality. Because it accepts varied input sources, the system is more representative of real hiring conditions than rigid screening tools.

    After extraction, resume content is cleaned and normalized to remove noise and improve the accuracy of downstream analysis. On the job side, every posting is transformed into a single profile that combines the title, description, required skills, education, experience, and certifications.
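The cleaning step and the single job profile described above might look like the following sketch. The specific cleaning rules and the field names (`title`, `required_skills`, and so on) are hypothetical; the paper does not list the exact normalization rules or the database schema.

```python
import re
import unicodedata

# Hypothetical field names for a job posting record.
JOB_FIELDS = ("title", "description", "required_skills",
              "education", "experience", "certifications")


def normalize_text(raw):
    """Clean extracted text: unify Unicode forms, drop control characters, collapse whitespace."""
    text = unicodedata.normalize("NFKC", raw)
    # Replace non-printable characters that PDF or OCR extraction can leave behind.
    text = "".join(ch if ch.isprintable() or ch.isspace() else " " for ch in text)
    # Rejoin words split by a hyphen at a line break ("manage-\nment" -> "management").
    # Limitation: a genuine compound broken at the hyphen is also merged.
    text = re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", text)
    text = re.sub(r"\s+", " ", text)
    return text.strip().lower()


def build_job_profile(job):
    """Concatenate the non-empty job fields into one normalized text profile."""
    parts = []
    for name in JOB_FIELDS:
        value = job.get(name)
        if not value:
            continue
        if isinstance(value, (list, tuple)):
            value = ", ".join(value)
        parts.append(str(value))
    return normalize_text("\n".join(parts))


# Illustrative posting with one empty field.
profile = build_job_profile({
    "title": "Data Engineer",
    "description": "Build ETL  pipelines.",
    "required_skills": ["Python", "SQL"],
    "education": None,
    "experience": "2+ years",
})
```

Running the resume text and the job profile through the same `normalize_text` function keeps the two sides comparable before any similarity is computed.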
    Combining the job fields into one profile ensures that similarity decisions consider the entire context of the role rather than individual fields. The pipeline thus produces comparable text representations of job expectations and candidate profiles. This alignment improves scoring quality and makes the resulting rankings more meaningful.

    The matching strategy is layered to capture different types of relevance. First, the system identifies the competencies required by the organization and matches them against those the candidate presents. Second, lexical similarity measures direct textual correspondence between resumes and job profiles. Third, semantic analysis captures contextual similarity where the same capability is expressed in different language. This combination reduces false negatives caused by sole dependence on keywords and improves robustness across different writing styles.

    Practical reliability is also addressed through adaptive behavior. If the higher-level semantic modules are not available in a particular deployment environment, the pipeline falls back to the available modules with adjusted scoring logic. This prevents a system failure and still delivers a usable ranking under resource or dependency constraints. The objective is continuity of operation rather than all-or-nothing model behavior. Such a fallback matters for real deployments in academic laboratories and small enterprises with small-scale infrastructure.

    Interpretability is treated as a requirement, not a side component. In addition to ranking scores, the system gives structured feedback listing matched strengths, missing competencies, and directions for improvement. It also extracts useful contextual entities such as organizations and technical terms to improve the quality of the explanation. This helps recruiters defend their judgments and shows candidates how they can improve next time. Transparent output builds credibility and promotes responsible use of automated screening.

    The platform integrates these analytics into the hiring process, including job posting, application, automatic analysis, ranking, and status updates. Recruiters can shortlist on the basis of evidence-based scores without losing sight of the reasoning. Both candidates and recruiters receive timely acknowledgements and feedback-based communication, which improves the user experience. By connecting analysis to workflow actions, the system reduces coordination overhead and shortens recruitment turnaround. This end-to-end design improves practical usability compared with standalone model demos.

    Overall, the proposed work provides a deployable and explainable recruitment intelligence system that is both technically capable and easy to implement. Its value lies in integrating multi-format resume processing, hybrid lexical-semantic matching, adaptive reliability, and feedback-based transparency within a unified framework. The system can be adopted in real-life settings where hiring teams need faster screening without sacrificing fairness and interpretability. By mitigating the drawbacks of manual review and keyword-only filters, the proposed solution offers a solid basis for efficient and responsible AI-driven recruitment.
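One way to realize the layered score with the fallback described in this section is a weighted average that drops unavailable components and renormalizes the remaining weights. The sketch below assumes each component score lies in [0, 1]; the weight values and all function and key names are hypothetical, since the paper does not publish them.

```python
# Hypothetical weights; the paper does not state the values it uses.
DEFAULT_WEIGHTS = {"skill": 0.4, "tfidf": 0.3, "semantic": 0.3}


def hybrid_score(components, weights=None):
    """Weighted average of the available component scores.

    A component whose value is None (for example, the semantic score when the
    embedding model cannot be loaded) is skipped, and the remaining weights are
    renormalized so the result stays on the same 0-1 scale.
    """
    weights = DEFAULT_WEIGHTS if weights is None else weights
    available = {k: v for k, v in components.items()
                 if v is not None and k in weights}
    total_weight = sum(weights[k] for k in available)
    if total_weight == 0:
        return 0.0
    # Clamp each component to [0, 1] before weighting.
    weighted = sum(weights[k] * min(max(v, 0.0), 1.0) for k, v in available.items())
    return weighted / total_weight


def rank_candidates(candidates):
    """Sort candidate records by hybrid score, best first."""
    return sorted(candidates, key=lambda c: hybrid_score(c["components"]), reverse=True)


# Illustrative values only.
full = hybrid_score({"skill": 0.8, "tfidf": 0.5, "semantic": 0.6})
degraded = hybrid_score({"skill": 0.8, "tfidf": 0.5, "semantic": None})
ranked = rank_candidates([
    {"name": "B", "components": {"skill": 0.5, "tfidf": 0.5, "semantic": 0.9}},
    {"name": "A", "components": {"skill": 0.9, "tfidf": 0.4, "semantic": None}},
])
```

Renormalizing keeps degraded scores on the same scale as full scores, so one shortlist threshold can still be applied. Scores computed with and without the semantic component are not strictly comparable, so a deployment would probably want all candidates for a job scored in the same mode.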

    Fig 1: Architecture of the proposed system.
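The structured feedback described in this section (matched strengths, missing competencies, and improvement directions) could be produced along the following lines. This is a simplified sketch based on whole-term keyword matching; the function names, the output keys, and the suggestion wording are hypothetical, and the named entity extraction the paper mentions is not shown.

```python
import re


def contains_skill(skill, text):
    """Case-insensitive whole-term match, so 'java' does not fire on 'javascript'."""
    pattern = r"(?<![a-z0-9+#])" + re.escape(skill.lower()) + r"(?![a-z0-9+#])"
    return re.search(pattern, text.lower()) is not None


def skill_feedback(required_skills, resume_text):
    """Split required skills into matched and missing, and draft suggestions."""
    matched = [s for s in required_skills if contains_skill(s, resume_text)]
    missing = [s for s in required_skills if s not in matched]
    coverage = len(matched) / len(required_skills) if required_skills else 0.0
    suggestions = [
        f"Add evidence of {s} (projects, coursework, or certifications) if you have it."
        for s in missing
    ]
    return {"matched": matched, "missing": missing,
            "coverage": coverage, "suggestions": suggestions}


# Illustrative input only.
feedback = skill_feedback(
    ["Java", "Flask", "SQL", "C++", "Docker"],
    "Developed JavaScript dashboards and Flask APIs; strong SQL and C++ background.",
)
```

The `coverage` value is the kind of skill-coverage signal that can feed a hybrid score, while the `matched` and `missing` lists give recruiters and applicants the reasons behind it.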

  4. EXPERIMENTATION ANALYSIS AND RESULTS

    Fig 2: This graph compares Keyword, TF-IDF, BERT, and Hybrid models by plotting true positive rate against false positive rate, where higher AUC indicates better discrimination.
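For reference, the AUC summarized in an ROC plot equals the probability that a randomly chosen shortlisted candidate receives a higher score than a randomly chosen non-shortlisted one, with ties counted as one half. The sketch below computes it directly from that definition. The labels and scores are made-up toy values, not the paper's data.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data: 1 = shortlisted by a recruiter, 0 = not shortlisted.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc = roc_auc(labels, scores)
```

The pairwise loop is quadratic in the number of candidates, which is fine for illustration; a sort-based rank computation would be preferable for large evaluation sets.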

    Fig 3: This graph shows each model's precision-recall tradeoff for shortlist prediction, highlighting performance on imbalanced recruitment data.

    Fig 4: This graph compares Accuracy, Precision, Recall, and F1-score of all models at their best thresholds to identify overall best performance.

    Fig 5: This graph visualizes correct and incorrect shortlist predictions (TP, TN, FP, FN) for the top-performing model.
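The quantities shown in Figs 4 and 5 follow the standard definitions based on confusion counts at a decision threshold. The sketch below computes them and selects a threshold by maximizing F1. That selection rule is an assumption, since the paper does not state how its best thresholds were chosen, and the data are toy values.

```python
def confusion_counts(labels, scores, threshold):
    """Return (TP, TN, FP, FN) when candidates scoring >= threshold are shortlisted."""
    tp = tn = fp = fn = 0
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        elif predicted == 1 and y == 0:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1, with zero-division guarded as 0.0."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def best_threshold(labels, scores):
    """Pick the observed score that maximizes F1 (assumed selection rule)."""
    return max(sorted(set(scores)),
               key=lambda t: metrics(*confusion_counts(labels, scores, t))["f1"])


# Toy data: 1 = shortlisted by a recruiter, 0 = not shortlisted.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
threshold = best_threshold(labels, scores)
counts = confusion_counts(labels, scores, threshold)
report = metrics(*counts)
```

On imbalanced shortlist data, accuracy can look high even for a weak model, which is why precision, recall, and F1 are reported alongside it.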

  5. CONCLUSION

    This work proposed an intelligent recruitment system that automates resume screening and candidate ranking using a hybrid NLP pipeline. By integrating skill-based matching, lexical similarity, and semantic understanding, the platform matches candidates to jobs better than conventional keyword-only filters. The pipeline is designed for real-world use and covers the end-to-end hiring process, including application intake, processing, ranking, and notification. Consequently, the system saves recruiters effort on repetitive tasks without compromising decision quality in high-volume recruiting.

    A key advantage of the framework is its robustness under real-life conditions. The system supports a variety of resume formats and falls back to OCR for scanned or image-based documents, which opens it to a broader range of candidates and reduces rejections caused by file-type restrictions. It also produces explainable results, including matched strengths, missing skills, and improvement recommendations, which increases transparency for recruiters and candidates. This interpretability makes the platform more likely to be accepted and adopted in operational environments.

    Altogether, CareerSmart shows that an effective and understandable practical application of AI in recruitment is possible. Integrating modern NLP into a small web architecture demonstrates that advanced screening functionality can be achieved without sophisticated infrastructure. The resulting system makes hiring decisions faster and more consistent and provides a clear rationale for its rankings. This research is thus a useful contribution to scalable, data-driven, and user-centered recruitment automation.

  6. FUTURE WORK

Future versions could strengthen model intelligence by introducing domain-adaptive language models trained on recruitment-specific datasets. This would help the system learn role-specific vocabulary, industry terms, and context-sensitive descriptions of experience across sectors. Multilingual support could extend the platform to more applicant groups and hiring scenarios worldwide. These improvements would increase semantic accuracy and reduce bias caused by differences in language and phrasing.

Improving fairness, reliability, and depth of evaluation is another important direction. Future versions can add formal bias audits covering gendered language, educational background, and nontraditional careers, together with fairness-aware ranking calibration. A large-scale benchmark against human recruiter agreement metrics would better validate real hiring value. Confidence estimation and uncertainty-aware feedback could also help recruiters identify borderline profiles that need manual review rather than hard automatic decisions.

From a systems perspective, future work could focus on enterprise-level integration and continuous-learning workflows. Integrating the platform with ATS/HRMS ecosystems, interview schedulers, and recruiter analytics dashboards would support more realistic deployment in production environments. An organization could gradually improve ranking quality through a feedback loop in which recruiter actions, interview results, and successful hires are taken into account over time. These extensions would take the platform beyond automated screening support toward a more intelligent and adaptable recruitment intelligence system.

REFERENCES

  1. D. L. Stone and D. L. Deadrick, "Challenges and opportunities affecting the future of human resource management," Human Resource Management Review, vol. 25, no. 2, pp. 139–145, Jun. 2015.
  2. D. N. K. Sharma and N. Kumar, "Post-pandemic human resource management: Challenges and opportunities," SSRN, Singhania Univ., Jhunjhunu, India, Tech. Rep., 2022.
  3. D. G. Collings, J. McMackin, A. J. Nyberg, and P. M. Wright, "Strategic human resource management and COVID-19: Emerging challenges and research opportunities," Journal of Management Studies, vol. 58, no. 5, pp. 1378–1382, Jul. 2021.
  4. C. V. Radulescu, G.-R. Ladaru, S. Burlacu, F. Constantin, C. Ioan, and I. L. Petre, "Impact of the COVID-19 pandemic on the Romanian labor market," Sustainability, vol. 13, no. 1, p. 271, Dec. 2020.
  5. S. Albanesi and J. Kim, "Effects of the COVID-19 recession on the U.S. labor market: Occupation, family, and gender," Journal of Economic Perspectives, vol. 35, no. 3, pp. 3–24, Aug. 2021.
  6. S. Soucheray, "U.S. job losses due to COVID-19 highest since Great Depression." Accessed: Aug. 7, 2023. [Online]. Available: https://www.cidrap.umn.edu/covid-19/us-job-losses-due-covid-19-highest-great-depression
  7. Z. Shi, M. Chakraborty, and S. Kar, Intelligence Science III. Durgapur, India: Springer, Feb. 2021.
  8. The Shape of Australia's Post COVID-19 Workforce, National Skills Commission, Australia, 2021.
  9. F. Almeida, J. D. Santos, and J. A. Monteiro, "The challenges and opportunities in the digitalization of companies in a post-COVID-19 world," IEEE Engineering Management Review, vol. 48, no. 3, pp. 97–103, 3rd Quart., 2020.
  10. R. Luburi and M. Vuini, "The challenges and opportunities of human resource management in the post-pandemic era," HR Technol., Creative Space Assoc., vol. 1, no. 1, pp. 26–37, 2021.
  11. L. L. Bierema, J. L. Callahan, C. J. Elliott, T. W. Greer, and J. C. Collins, Human Resource Development: Critical Perspectives and Practices. London, U.K.: Routledge, 2023.
  12. A. Margherita, "Human resources analytics: A systematization of research topics and directions for future research," Human Resource Management Review, vol. 32, no. 2, Jun. 2022, Art. no. 100795.
  13. A. Gupta and D. Garg, "Applying data mining techniques in job recommender system for considering candidate job preferences," in Proc. Int. Conf. Advances in Computing, Communications and Informatics (ICACCI), Sep. 2014, pp. 1458–1465. doi: 10.1109/ICACCI.2014.6968361.
  14. I. Paparrizos, B. B. Cambazoglu, and A. Gionis, "Machine learned job recommendation," presented at the 5th ACM Conf. Recommender Systems, Chicago, IL, USA, Oct. 2011.
  15. Y. Zhang, C. Yang, and Z. Niu, "A research of job recommendation system based on collaborative filtering," in Proc. 7th Int. Symp. Computer Intelligence and Design, vol. 1, Dec. 2014, pp. 533–538. doi: 10.1109/ISCID.2014.228.
  16. A. A. Bakar and C.-Y. Ting, "Soft skills recommendation systems for IT jobs: A Bayesian network approach," in Proc. 3rd Conf. Data Mining and Optimization (DMO), Jun. 2011, pp. 82–87.
  17. B. Patel, V. Kakuste, and M. Eirinaki, "CaPaR: A career path recommendation framework," in Proc. IEEE 3rd Int. Conf. Big Data Computing Service and Applications (BigDataService), Apr. 2017, pp. 23–30. doi: 10.1109/BigDataService.2017.31.
  18. S. Guo, F. Alamudun, and T. Hammond, "RésuMatcher: A personalized résumé-job matching system," Expert Systems with Applications, vol. 60, pp. 169–182, Oct. 2016. doi: 10.1016/j.eswa.2016.04.013.
  19. J. Gee, M. Button, V. Wang, D. Blackbourn, and D. Shepherd, The Real Cost of Recruitment Fraud. London, U.K.: Crowe, 2019.
  20. Y. Pan, F. Froese, N. Liu, Y. Hu, and M. Ye, "The adoption of artificial intelligence in employee recruitment: The influence of contextual factors," International Journal of Human Resource Management, vol. 33, no. 6, pp. 1125–1147, Mar. 2022.
  21. G. Karemu, K. Gikera, and M. J. Veronese, "An analysis of the effect of employee recruitment strategies on employee retention at Equity Bank, Kenya," European Journal of Business and Management, vol. 6, no. 17, 2014.
  22. L. Smither, "Managing employee life cycles to improve labor retention," Leadership and Management in Engineering, vol. 3, no. 1, pp. 19–23, Jan. 2003.
  23. N. D. Almalis, G. A. Tsihrintzis, N. Karagiannis, and A. D. Strati, "FoDRA – A new content-based job recommendation algorithm for job seeking and recruiting," in Proc. 6th Int. Conf. Information, Intelligence, Systems and Applications (IISA), Jul. 2015, pp. 1–7. doi: 10.1109/IISA.2015.7388018.
  24. M. Diaby and E. Viennet, "Taxonomy-based job recommender systems on Facebook and LinkedIn profiles," in Proc. IEEE 8th Int. Conf. Research Challenges in Information Science (RCIS), May 2014, pp. 1–6. doi: 10.1109/RCIS.2014.6861048.
  25. H. Wenxing, C. Yiwei, Q. Jianwei, and H. Yin, "IHR+: A mobile reciprocal job recommender system," in Proc. 10th Int. Conf. Computer Science and Education (ICCSE), Jul. 2015, pp. 492–495. doi: 10.1109/ICCSE.2015.7250296.
  26. I. A. Mirza, S. Mulla, R. Parekh, S. Sawant, and K. M. Singh, "Generating personalized job role recommendations for the IT sector through predictive analytics and personality traits," in Proc. Int. Conf. Technology for Sustainable Development (ICTSD), Feb. 2015, pp. 1–4. doi: 10.1109/ICTSD.2015.7095894.
  27. Q.-D. Nguyen, T. Huynh, and T.-A. Nguyen-Hoang, "Adaptive methods for job recommendation based on user clustering," in Proc. 3rd Nat. Found. Sci. Technol. Develop. Conf. Information and Computer Science (NICS), Sep. 2016, pp. 165–170. doi: 10.1109/NICS.2016.7725643.
  28. G. Özcan and S. G. Ögüdücü, "Applying different classification techniques in reciprocal job recommender system for considering job candidate preferences," in Proc. 11th Int. Conf. Internet Technologies and Secured Transactions (ICITST), Dec. 2016, pp. 235–240. doi: 10.1109/ICITST.2016.7856703.
  29. T. R. Razak, M. A. Hashim, N. M. Noor, I. H. A. Halim, and N. F. F. Shamsul, "Career path recommendation system for UiTM Perlis students using fuzzy logic," in Proc. 5th Int. Conf. Intelligent and Advanced Systems (ICIAS), Jun. 2014, pp. 1–5. doi: 10.1109/ICIAS.2014.6869553.
  30. E. Malherbe, M. Diaby, M. Cataldi, E. Viennet, and M.-A. Aufaure, "Field selection for job categorization and recommendation to social network users," in Proc. IEEE/ACM Int. Conf. Advances in Social Networks Analysis and Mining (ASONAM), Aug. 2014, pp. 588–595. doi: 10.1109/ASONAM.2014.6921646.