DOI : https://doi.org/10.5281/zenodo.19997282
- Open Access
- Authors : Prof. Shraddha Kashid, Pradnya Pandhare, Sayali Pinge, Shivam Kolhe, Kartik Davhale
- Paper ID : IJERTV15IS042961
- Volume & Issue : Volume 15, Issue 04 , April – 2026
- Published (First Online): 03-05-2026
- ISSN (Online) : 2278-0181
- Publisher Name : IJERT
- License:
This work is licensed under a Creative Commons Attribution 4.0 International License
RAG Agent: AI with Answers You Can Trust
Prof. Shraddha Kashid
Dept. of Computer Science and Engineering MIT Art Design and Technology University Pune, India
Sayali Pinge
Dept. of Computer Science and Engineering MIT Art Design and Technology University Pune, India
Pradnya Pandhare
Dept. of Computer Science and Engineering MIT Art Design and Technology University Pune, India
Shivam Kolhe
Dept. of Computer Science and Engineering MIT Art Design and Technology University Pune, India
Kartik Davhale
Dept. of Computer Science and Engineering MIT Art Design and Technology University Pune, India
Abstract: Large Language Models (LLMs) have shown impressive skills in understanding and generating natural language. However, they often give incorrect or incomplete answers when relying only on pre-trained knowledge. These issues lower factual accuracy, consistency, and user trust in AI systems. To tackle these challenges, this research introduces a Retrieval-Augmented Generation (RAG) Agent, an AI framework that combines LLMs with trusted external knowledge sources. The system uses effective similarity search methods to find relevant context from specific databases or document repositories. It then combines this information to create responses that are backed by citations and take context into account. This hybrid approach helps ensure factual correctness, transparency, and reliability for various queries. Additionally, the model includes a trust layer that explains and verifies where each output comes from. This reduces errors and makes the information easier to understand. The proposed RAG Agent shows great promise for improving LLM performance in key areas like education, healthcare, and finance. It aims to provide users with accurate, clear, and explainable AI responses.
Index Terms: Retrieval-Augmented Generation (RAG), Large Language Models (LLMs), Explainable AI, Information Retrieval, Knowledge Integration, Transparency, Accuracy.
-
INTRODUCTION
Artificial Intelligence (AI) has quickly progressed in recent years, with Large Language Models (LLMs) like GPT and BERT transforming natural language understanding and generation. These models show impressive skills in answering questions, summarizing text, translating languages, and synthesizing knowledge. However, despite their strong language abilities and reasoning skills, LLMs have a major drawback known as hallucination, which involves creating incorrect, unverified, or made-up information. These mistakes happen because LLMs rely mostly on pre-trained data and do not have direct access to updated or verified external sources. This limitation raises serious concerns in areas that require factual accuracy and accountability, including education, healthcare, law, and finance.
In practical use, users need AI systems that can deliver trustworthy, clear, and contextually accurate answers, not vague or uncertain responses. The lack of source verification and clarity in LLM-generated outputs often leads to lower confidence among users. Therefore, improving LLMs with features that ensure factual reliability, context awareness, and traceability of information has become an important research goal.
To tackle these issues, Retrieval-Augmented Generation (RAG) has emerged as a promising approach. RAG merges the generative strength of LLMs with factual grounding from external retrieval systems. Instead of relying only on the model's internal knowledge, the RAG process retrieves relevant information from verified sources, specialized documents, or databases before forming a response. This method ensures that the model's outputs are not only coherent but also factually correct and supported by evidence that can be verified.
The proposed RAG Agent builds on this idea by creating a solid pipeline that links LLMs with retrieval tools like vector databases (e.g., Pinecone, FAISS, or Weaviate). When a user asks a question, the system retrieves semantically similar content from external sources, sends it to the LLM, and produces a response backed by citations and relevant context. This process improves accuracy, consistency, and clarity across various application areas. Adding citation tracking also builds user trust by allowing them to verify the sources of the model's information.
Furthermore, the RAG Agent prioritizes explainability and transparency, which are essential for ethical AI. Users can view the generated response and trace where the supporting data comes from. By combining retrieval and generation, the system reduces hallucination, lessens bias, and keeps relevant context throughout the question-and-answer process. Using specialized datasets enables the model to adjust to particular fields, ensuring reliability in professional and academic settings.
The aim of this research is to create an AI system that connects intelligence with trustworthiness. As industries increasingly depend on AI-driven solutions, the need for verifiable, transparent, and context-aware models becomes crucial. The RAG Agent seeks to change how AI interacts with knowledge, turning static, pre-trained systems into dynamic, evidence-based assistants capable of providing accurate, understandable, and timely answers.
In summary, this research aids in developing a transparent, retrieval-augmented AI system that improves factual accuracy, raises user confidence, and offers a scalable framework for reliable knowledge generation. The proposed RAG Agent illustrates how combining retrieval systems with LLMs can effectively address the shortcomings of traditional AI models, paving the way for responsible and dependable AI applications across various fields.
-
LITERATURE REVIEW
The development of Large Language Models (LLMs) has greatly changed natural language processing and information retrieval. However, even with their strong generative abilities, these models often have issues with factual accuracy, transparency, and reliability. To address these problems, researchers have suggested Retrieval-Augmented Generation (RAG). This hybrid approach mixes retrieval-based evidence with generative modeling to create responses that are factually grounded and easy to understand. Recent studies have looked into combining RAG systems with dense retrieval techniques, vector databases, and explainable AI (XAI) frameworks. This combination improves contextual understanding and reduces hallucination. This section reviews existing literature on RAG architectures, retrieval methods, ways to reduce hallucination, and the importance of explainability in creating trustworthy AI systems.
-
Retrieval-Augmented Generation (RAG) and Early Work
Retrieval-Augmented Generation (RAG) is a method that combines language models with the retrieval of external documents during inference. This approach allows generated outputs to be based on clear, current evidence instead of relying solely on the model's internal parameters. The original RAG paper defined a set of architectures that retrieve relevant passages and use them to guide a seq2seq generator. This results in citation-backed answers and shows significant improvements on knowledge-intensive tasks when compared to purely parametric models.
An earlier line of work that inspired RAG is REALM. This project introduced the concept of adding a learned retrieval component during pre-training and fine-tuning. It showed that directly retrieving documents enhances both performance and clarity in open-domain QA. REALM demonstrated that retrieval can be trained as part of the LM pipeline and used during inference to reveal the knowledge applied. [?], [?].
-
Dense Retrieval and Passage Indexing
Retrieval quality is key to RAG performance. Traditional lexical methods, like BM25, serve as strong baselines. However, dense neural retrievers that map queries and passages into a shared embedding space have become standard for RAG pipelines. Dense Passage Retrieval (DPR) uses bi-encoder architectures trained with contrastive objectives to create embeddings that greatly outperform BM25 in open-domain QA retrieval tasks. DPR is widely used as the retrieval backbone in modern RAG systems.
To handle large collections of embeddings efficiently, scalable approximate nearest neighbor (ANN) libraries and vector databases are used. FAISS, developed by Meta, is a popular library for fast similarity search at billion scale. Managed vector database services, such as Pinecone, and open-source alternatives, like Weaviate, build on these retrieval tools to provide index management, scalability, and other production features needed by RAG systems. [?], [?].
-
Hallucinations, Reliability and Explainability
One main motivation for RAG is to reduce hallucination, which is when LLMs create fluent but false or made-up statements. Recent surveys and studies show that hallucination is a persistent issue, regardless of model size or task. Retrieval grounding helps cut down hallucinations by limiting the model to evidence, but it doesn't completely remove them. This is because retrieval can bring back irrelevant or outdated information, and generation can still misattribute or overgeneralize what it retrieves. Ways to reduce this problem include improved retriever training, passage re-ranking, answer verification modules, and clear citation methods.
Explainable AI (XAI) methods and source tracking are related approaches. Showing which documents or passages influenced a generated answer helps users verify claims and build trust. RAG pipelines that return supporting passages or inline citations represent both a technical and ethical move toward responsible AI. [?], [?].
-
Advances, Variants and System-Level Considerations
Since the original RAG and REALM works, the literature has grown to include many variants and improvements at the system level. These include tighter integration between the retriever and generator through end-to-end training, multi-stage retrieval processes like coarse retrieval followed by re-ranking and fusion, the use of document chunking and context windows, and hybrid search that combines lexical and semantic signals. A recent review summarizes these trends and points out ongoing challenges, such as the tradeoffs between latency and accuracy, the freshness of indexes, and effective retrieval in the face of domain shifts.
From an engineering perspective, production RAG systems must address additional issues. These include updating indexes efficiently to provide fresh knowledge, securely storing proprietary documents while maintaining privacy, complying with regulations like HIPAA in healthcare, and monitoring failures in retrieval and generation. Features of vector databases, such as consistency guarantees, metadata filtering, and scalable approximate nearest neighbor (ANN) backends, are critical for real-world deployment. [?], [?].
-
Gaps and Open Problems (Motivation for this Work)
Despite progress, important gaps remain. These include retrieval relevance for unclear queries, reasoning with long contexts that involve many retrieved documents, robust detection of retrieval failures, automated citation formatting, and standard benchmarks for end-to-end factuality in RAG systems. These gaps inspire the design of the RAG Agent. It combines strong dense retrieval, citation-aware generation, and explainability to improve trust and transparency in critical applications. [?], [?].
-
Application and Domain Studies
RAG has been successfully applied to many knowledge-intensive domains (open-domain QA, customer support, enterprise search, and specialized fields like medicine and law). Domain adaptation (indexing domain-specific corpora and fine-tuning components) improves factuality for specialized queries. However, domain use also raises the bar for verification and auditing: high-risk domains require stricter provenance, human-in-the-loop checks, and evaluation metrics beyond standard NLP benchmarks. [?], [?].
-
RAG AGENT Thinking: Empathy
Fig. 1: Empathy Map for understanding RAG AGENT
Through empathy-first design, the RAG Agent remains intuitive, inclusive, and user-centered, aligning with real-world information-seeking challenges.
-
MAJOR ALGORITHMS USED TO SOLVE THE PROBLEM
The proposed RAG Agent uses a mix of retrieval and generative algorithms to ensure factual accuracy, transparency, and trust in AI-driven responses. Unlike traditional Large Language Models (LLMs) that depend only on pre-trained data, the RAG Agent combines external knowledge retrieval with generative reasoning to reduce hallucination and improve interpretability.
-
Embedding and Similarity Computation
Text documents and user queries are first converted into dense vector representations using Sentence Transformers or BERT-based encoders. Each document chunk is embedded into an n-dimensional vector space R^n, where semantically similar chunks have minimal cosine distance.
Embedding:
e_i = f(d_i)  (1)
Where:
- e_i = embedding vector of document chunk d_i
- f = embedding model (e.g., BERT, OpenAI, SBERT)
The similarity between a query and a document is calculated using Cosine Similarity:
sim(q, d_i) = (q · e_i) / (||q|| ||e_i||)  (2)
The top-K documents with the highest similarity scores are selected as relevant context for the generation phase. [?].
-
Approximate Nearest Neighbor (ANN) Search
For efficiency, large document embeddings are indexed using FAISS, Weaviate, or Pinecone, enabling sub-linear time similarity search. ANN search identifies the nearest neighbors using Hierarchical Navigable Small World (HNSW) or Inverted File Index (IVF) techniques.
TopK(q) = arg max_{d_i ∈ D} sim(q, d_i)  (3)
This step ensures efficient retrieval even for millions of documents, maintaining scalability and responsiveness in real-world retrieval-augmented systems.
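A minimal sketch of the IVF idea (coarse clustering plus probing only the nearest cells), written in plain NumPy rather than FAISS, to show why search cost becomes sub-linear; a production system would use FAISS's IVF or HNSW indexes instead. The cell counts and data are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def build_ivf(vectors, n_cells=4, iters=5):
    """Minimal IVF-style index: k-means-like centroids plus inverted
    lists mapping each cell to the ids of the vectors assigned to it."""
    centroids = vectors[rng.choice(len(vectors), n_cells, replace=False)].copy()
    for _ in range(iters):
        assign = np.argmin(((vectors[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
        for c in range(n_cells):
            members = vectors[assign == c]
            if len(members):
                centroids[c] = members.mean(axis=0)
    # Final assignment against the converged centroids.
    assign = np.argmin(((vectors[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
    lists = {c: np.where(assign == c)[0] for c in range(n_cells)}
    return centroids, lists

def ivf_search(query, vectors, centroids, lists, n_probe=1, k=3):
    """Probe only the n_probe nearest cells, then scan just their lists,
    so only a fraction of the collection is compared exactly."""
    cells = np.argsort(((centroids - query) ** 2).sum(-1))[:n_probe]
    cand = np.concatenate([lists[int(c)] for c in cells])
    dists = ((vectors[cand] - query) ** 2).sum(-1)
    return cand[np.argsort(dists)[:k]]

vectors = rng.normal(size=(200, 16))
centroids, lists = build_ivf(vectors)
query = vectors[7] + 0.01 * rng.normal(size=16)
print(ivf_search(query, vectors, centroids, lists, n_probe=2, k=3))
```

Raising `n_probe` trades latency for recall, the same knob FAISS exposes on its IVF indexes.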
-
Prompt Augmentation and Context Fusion
Once the relevant documents are retrieved, they are fused with the user's query to form an enriched prompt. This ensures that the Language Model (LLM), such as GPT or T5, receives contextually relevant and factually grounded information before generation.
P_aug = [Q; R_1; R_2; …; R_k]  (4)
Where P_aug is the augmented prompt combining the user query (Q) with the retrieved contexts (R_i).
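In practice Eq. (4) amounts to string concatenation under a context budget. A minimal sketch follows; the prompt wording and the `max_chars` cutoff are illustrative choices, not the paper's exact template.

```python
def build_augmented_prompt(query, retrieved, max_chars=2000):
    """Eq. (4): P_aug = [Q; R_1; ...; R_k]. Concatenates the query with
    numbered context chunks; truncates to respect a context budget."""
    parts = [f"[{i + 1}] {chunk}" for i, chunk in enumerate(retrieved)]
    context = "\n".join(parts)[:max_chars]
    return ("Answer using ONLY the context below and cite sources as [n].\n"
            f"Context:\n{context}\n\nQuestion: {query}\nAnswer:")

prompt = build_augmented_prompt(
    "What does RAG stand for?",
    ["RAG stands for Retrieval-Augmented Generation.",
     "RAG grounds LLM outputs in retrieved evidence."])
print(prompt)
```

Numbering the chunks gives the generator stable handles ([1], [2], ...) to cite, which the later citation-linking stage can resolve back to documents.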
-
Generative Model (LLM Response Generation)
The Language Model (LLM) processes the augmented prompt and generates a factual, coherent, and contextually grounded response:
Y = LLM(P_aug; θ)  (5)
Here, θ represents the model parameters. The LLM applies autoregressive decoding to generate the output token-by-token as follows:
P(y_t | y_<t, P_aug) = softmax(W h_t)  (6)
Where h_t denotes the hidden state at time step t, and W is the output projection matrix mapping hidden representations to token probabilities.
-
Flowchart of the Proposed System
Fig. 2 illustrates the step-by-step workflow of the proposed system, beginning from the user query input to the generation of an explainable and source-linked response: User Query (Q) → Embed Query Vector → Retrieve Top-K Docs (Vector Database) → Fuse Query + Docs (Prompt Augment) → LLM Generator (Response Output) → Explainability + Citation Mapping.
-
Explainability and Attribution
To ensure transparency and interpretability, explainable AI techniques such as SHAP (SHapley Additive exPlanations) and LIME are employed to identify which retrieved contexts contributed most to the generated response.
The Shapley value for each input token or chunk is computed as:
φ_i = Σ_{S ⊆ N\{i}} [ |S|! (|N| - |S| - 1)! / |N|! ] [ f(S ∪ {i}) - f(S) ]  (7)
This quantifies the contribution of document chunk i to the final model output f, thereby enhancing interpretability and user trust in model decisions.
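For a handful of chunks, Eq. (7) can be evaluated exactly by enumerating all subsets; the sketch below does so with a hypothetical additive scoring function `f` (libraries like SHAP approximate this sum by sampling instead).

```python
from itertools import combinations
from math import factorial

def shapley_values(chunks, f):
    """Exact Shapley values (Eq. 7) by enumerating all subsets S of
    N \\ {i}; feasible only for a small number of chunks."""
    n = len(chunks)
    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for r in range(len(others) + 1):
            for S in combinations(others, r):
                # Weight |S|! (|N| - |S| - 1)! / |N|! from Eq. (7).
                w = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                phi += w * (f(set(S) | {i}) - f(set(S)))
        phis.append(phi)
    return phis

# Hypothetical model score: chunk 0 alone answers the question,
# chunk 2 adds half a point, chunk 1 contributes nothing.
def f(S):
    return (1.0 if 0 in S else 0.0) + (0.5 if 2 in S else 0.0)

print(shapley_values(["c0", "c1", "c2"], f))
```

Because this toy `f` is additive, each chunk's Shapley value equals its individual contribution, and the values sum to f(N) - f(∅), the efficiency property.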
-
Citation Linking and Source Validation
To promote verifiability and user trust, each generated response is linked to its supporting documents through a citation map:
C = {(y_i, d_j) | y_i derived from d_j}  (8)
This ensures that every generated claim or answer element y_i has an attributed source d_j, enabling transparent citation and factual traceability in the system output.
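Eq. (8) can be approximated with a simple lexical-overlap heuristic, sketched below; this is an illustrative stand-in for the embedding-similarity or entailment-based attribution a production system would use, and the threshold is an assumption.

```python
def build_citation_map(answer_sentences, retrieved_chunks, min_overlap=0.5):
    """Eq. (8): C = {(y_i, d_j) | y_i derived from d_j}. A sentence is
    linked to a chunk when enough of its content words appear there."""
    def words(t):
        return {w.strip(".,").lower() for w in t.split() if len(w) > 3}
    cmap = []
    for i, sent in enumerate(answer_sentences):
        sw = words(sent)
        for j, chunk in enumerate(retrieved_chunks):
            if sw and len(sw & words(chunk)) / len(sw) >= min_overlap:
                cmap.append((i, j))
    return cmap

chunks = ["FAISS supports billion-scale similarity search.",
          "Pinecone is a managed vector database."]
answer = ["FAISS supports billion-scale search.",
          "Pinecone offers a managed vector database service."]
print(build_citation_map(answer, chunks))  # → [(0, 0), (1, 1)]
```

Each answer sentence ends up paired with the chunk that supports it, which is what the trust layer surfaces to the user.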
-
Algorithm Summary
Table I summarizes the key stages of the proposed system pipeline, outlining each algorithmic component, its purpose, and the primary method employed.
TABLE I: Algorithm Summary of the Proposed System
Algorithm | Purpose | Method Used
Text Embedding | Convert text into numerical vectors | Transformer encoders (BERT, SBERT)
Similarity Search | Retrieve most relevant contexts | Cosine similarity + ANN (FAISS, Pinecone)
Prompt Fusion | Combine query with context | Context concatenation
LLM Generation | Produce factual, contextual answers | Transformer decoder (GPT/T5)
Explainability | Attribute responses to data | SHAP, LIME
Citation Linking | Ensure source transparency | Document mapping
Fig. 2: Flowchart of the Proposed System.
Fig. 3: Semantic RAG Agent.
Fig. 4: System Architecture of the Proposed Model.
Fig. 5: RAG Pipeline Process Flow.
-
Major Challenges Faced by Other Researchers
Despite the promising advantages of Retrieval-Augmented Generation (RAG) pipelines, many researchers report several recurring challenges. Understanding these helps inform how the RAG Agent design must respond to avoid or mitigate them.
-
Retrieval Quality, Relevance, and Noise
A RAG system's performance depends heavily on how well it retrieves relevant documents. However, several retrieval-related challenges exist:
-
Retrieval components often bring in documents that are partially relevant, outdated, or noisy. This leads to context that confuses the LLM rather than helps. (TechTarget, Artoon Solutions)
-
For domain-specific queries (e.g., medical, legal), standard retriever models may not understand domain terminology or nuances, leading to missing or unrelated content. (PMC, simg.baai.ac.cn)
-
Problems in document chunking (chunks that are too small lose context; chunks that are too large include irrelevant information) degrade retrieval relevance. (haohoang.is-a.dev)
-
-
Hallucinations and Factual Inconsistencies
Even when retrieval is present, LLMs sometimes generate answers that are not grounded in the retrieved content:
-
Retrieved texts may be incomplete, ambiguous, or conflicting, forcing the LLM to fill gaps using its internal memory, which may be outdated or incorrect. (haohoang.is-a.dev, simg.baai.ac.cn)
-
Systems may misattribute or misphrase retrieved facts. In high-stakes domains like medicine, this can lead to unsafe or misleading outputs. (PMC)
-
-
Query Understanding and Prompting Challenges
-
If a users query is vague or poorly framed, retrieval may fetch irrelevant chunks, hurting downstream generation. Effective query expansion is crucial. (haohoang.is-a.dev)
-
Prompt templates that integrate retrieved context may suffer from redundancy, conflicting information, or excessive length, causing truncation or inconsistency. (Educative)
-
-
Scalability, Latency, and System Overhead
-
As the knowledge base grows, embedding and index maintenance become resource-intensive. Large similarity searches cause latency. (TechTarget)
-
The multi-stage nature of RAG (retrieval, re-ranking, gen-eration, post-processing) increases compute and memory overhead, degrading real-time performance. (TechTarget, simg.baai.ac.cn)
-
-
Freshness, Version Drift, and Data Maintenance
-
Knowledge sources can become stale, leading to outdated responses if the index isn't updated. (PMC, TechTarget)
-
Multiple versions or duplicates of documents cause inconsistent retrieval and responses. Poor metadata worsens this issue. (TechRadar)
-
-
Bias, Fairness, and Ethical Transparency
-
Retrieved documents may carry bias. If uncorrected, the generated output may amplify it. (Arbisoft)
-
Lack of explainability: users cannot see why certain documents were retrieved or how the LLM used them, reducing transparency and trust. (LinkedIn)
-
-
Security, Poisoning, and Adversarial Attacks
-
RAG systems are vulnerable to knowledge base poisoning: inserting misleading or malicious documents that influence output. (arXiv)
-
Manipulation of retrieval ranking or document sources can shift generated content adversarially. (arXiv)
-
-
Suggested Diagrams and Flowcharts to Illustrate Challenges
- Flowchart of Failure Points in RAG Pipeline: the pipeline runs User Query → Query Embedding → Retrieval → (Re-ranking) → Prompt Augmentation → Generation → Post-processing / Validation, with failure points annotated at each stage: Query Issues, Irrelevant Docs, Stale / Biased Data, and Hallucinated Output.
Fig. 6: Flowchart of Failure Points in the RAG Pipeline
- Diagram: Retrieval Quality vs. Generation Accuracy: a side-by-side diagram showing Retrieval issues (noise, low recall, outdated documents, chunking problems) against Generation issues (hallucination, misattribution, bias, lack of coherence). Arrows can illustrate how retrieval errors propagate into generation inaccuracies.
- Bar or Pie Chart of Error Frequency: if empirical data are available, a bar or pie chart can show the relative frequency of key error types (e.g., retrieval relevance errors, hallucinations, latency issues).
-
Proposed Mitigation Strategies
- Hybrid retrieval (dense + sparse) with re-ranking.
- Versioned document sources and metadata filtering.
- Explainability tools (e.g., SHAP, LIME) with provenance tracking.
- Efficient chunking and dynamic context windows.
- Performance optimizations: caching, parallel retrieval, and prompt size tuning.
-
STRATEGIES TO OVERCOME MAJOR CHALLENGES IN RAG AGENT
Below are several strategies that the RAG Agent can employ to mitigate known issues and enhance reliability, transparency, and performance.
-
A. Improve Retrieval Quality & Relevance: To improve the relevance of retrieved documents and reduce noise, use a combination of methods:
- Hybrid Retrieval: Combine sparse (e.g., BM25) and dense retrieval models. The sparse model helps with keyword matching, while the dense model captures semantic similarity. Hybrid scoring often yields better precision.
- Domain-Tuned Embeddings: Fine-tune embedding models on domain-specific corpora so that representations align with domain terminology.
- Smart Chunking: Use adaptive chunking where document chunks respect semantic boundaries (sections, paragraphs) rather than fixed sizes. Employ overlap between chunks to preserve context.
- Re-ranking / Cross-Encoder: After initial retrieval, use a cross-encoder or more expensive scoring model to re-rank the top-K candidates, helping weed out irrelevant or weakly related documents.
-
B. Mitigate Hallucinations & Enforce Factual Integrity: Since generation errors are serious, especially in high-stakes domains:
- Grounded Generation: Force the LLM to produce citations referencing the retrieved text so users can verify the output.
- Answer Verification Module: Implement a post-generation check that compares generated claims to retrieved passages to detect contradictions or fabrications.
- Constrained Decoding: Use generation techniques that limit the model's freedom (e.g., retrieval-guided or constrained decoding) to ensure evidence-based generation.
-
C. Better Query Understanding & Prompting: Poor query design can degrade the entire pipeline; strategies include:
- Query Expansion / Reformulation: Automatically expand vague queries (e.g., via synonyms or similar past queries) to be more specific.
- Template-Based Prompts: Use structured templates that organize retrieved context neatly to reduce confusion.
- Context Window Management: Limit the amount of retrieved text passed to the model to avoid overwhelming or conflicting prompts.
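The hybrid retrieval strategy (Strategy A) can be sketched as a convex combination of a simplified BM25-style sparse score and bi-encoder similarities; the `bm25_lite` scorer, the supplied `dense` similarities, and the mixing weight `alpha` are all illustrative stand-ins.

```python
import math

def bm25_lite(query_terms, doc_terms, docs, k1=1.5, b=0.75):
    """Simplified BM25 scoring over tokenized documents (sparse signal)."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    score = 0.0
    for t in set(query_terms):
        df = sum(1 for d in docs if t in d)
        if df == 0:
            continue
        idf = math.log(1 + (N - df + 0.5) / (df + 0.5))
        tf = doc_terms.count(t)
        score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc_terms) / avgdl))
    return score

def hybrid_scores(query, corpus, dense_sims, alpha=0.5):
    """Convex combination of a normalized sparse score and a dense
    similarity; dense_sims would come from a bi-encoder in practice."""
    q = query.lower().split()
    docs = [d.lower().split() for d in corpus]
    sparse = [bm25_lite(q, d, docs) for d in docs]
    mx = max(sparse) or 1.0
    return [alpha * s / mx + (1 - alpha) * dns for s, dns in zip(sparse, dense_sims)]

corpus = ["retrieval augmented generation explained",
          "cooking pasta at home",
          "dense retrieval models"]
dense = [0.9, 0.1, 0.6]  # stand-in bi-encoder similarities
scores = hybrid_scores("retrieval generation", corpus, dense)
print(scores)
```

The document matching both signals ranks first; a cross-encoder re-ranker would then rescore just these top candidates.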
-
D. Scalability, Latency & System Overhead: To make the system practical in real-world use:
-
Efficient Indexing and Caching: Use fast ANN algorithms; cache embeddings of popular queries; reuse responses where possible.
-
Parallel Processing & Asynchronous Pipelines: Fetch retrieval and re-ranking in parallel; overlap retrieval and embedding computation.
-
Incremental Updates: Instead of rebuilding entire indices, use incremental embedding updates or append-only indexes.
-
-
E. Maintain Freshness & Data Integrity: To keep knowledge up-to-date and trustworthy:
-
Periodic Index Refresh / Version Control: Schedule regular updates of document sources; maintain metadata (timestamps, version numbers) so retrieval favors fresh documents.
-
Source Filtering & Metadata Vetting: Include only credible sources; discard or down-weight documents with poor provenance or uncertain authorship.
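The freshness and vetting rules above can be sketched as a filter over document metadata; the field names (`id`, `version`, `timestamp`, `vetted`), the fixed reference date, and the one-year cutoff are assumptions for illustration.

```python
from datetime import datetime, timedelta

def filter_and_rank_fresh(docs, max_age_days=365, now=None):
    """Drops documents that are too old or from unvetted sources,
    then keeps only the newest version of each document id and
    returns the survivors newest-first."""
    now = now or datetime(2026, 1, 1)
    fresh = [d for d in docs
             if d["vetted"] and now - d["timestamp"] <= timedelta(days=max_age_days)]
    latest = {}
    for d in fresh:
        cur = latest.get(d["id"])
        if cur is None or d["version"] > cur["version"]:
            latest[d["id"]] = d
    return sorted(latest.values(), key=lambda d: d["timestamp"], reverse=True)

docs = [
    {"id": "a", "version": 1, "timestamp": datetime(2025, 1, 10), "vetted": True},
    {"id": "a", "version": 2, "timestamp": datetime(2025, 6, 1), "vetted": True},
    {"id": "b", "version": 1, "timestamp": datetime(2020, 3, 1), "vetted": True},
    {"id": "c", "version": 1, "timestamp": datetime(2025, 9, 9), "vetted": False},
]
print(filter_and_rank_fresh(docs))
```

Only the latest vetted, in-date version survives; the stale document and the unvetted one are discarded before retrieval ever sees them.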
-
-
F. Address Bias, Fairness & Ethical Transparency: To ensure fairness and maintain trust:
-
Bias Auditing: Regularly audit the sources and generated outputs for demographic or ideological bias; track source representation.
-
Explainability Tools: Use SHAP, LIME, or attention-based visualization to show which sources influenced outputs; provide user-friendly provenance for claims.
-
User-in-the-loop: Allow feedback from users about answers, which can feed into retraining or system warnings.
-
-
G. Secure System & Defense Against Adversarial Inputs:
Since RAG systems are vulnerable:
-
Source Authentication & Integrity Checks: Use cryptographic signatures or checksums for documents to ensure authenticity.
-
Anomaly Detection: Monitor retrieval outputs and detect suspicious or malicious documents using unsupervised methods (e.g., Isolation Forests, Autoencoders).
-
Access Controls & Audit Trails: Maintain logs of queries, retrieved documents, and generation steps; enforce permissions for sensitive data.
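As a lightweight stand-in for the Isolation Forest or autoencoder detectors mentioned above, a robust z-score on embedding distances can flag suspicious retrieved documents. The sketch below is a heuristic illustration, not the proposed system's detector; data and thresholds are assumptions.

```python
import numpy as np

def flag_anomalous(embeddings, z_thresh=3.0):
    """Flags documents whose embedding lies unusually far from the
    batch median, via a robust (MAD-based) z-score on distances."""
    X = np.asarray(embeddings, dtype=float)
    dists = np.linalg.norm(X - np.median(X, axis=0), axis=1)
    med = np.median(dists)
    mad = np.median(np.abs(dists - med)) or 1.0
    robust_z = 0.6745 * (dists - med) / mad
    return np.where(robust_z > z_thresh)[0]

rng = np.random.default_rng(1)
normal_docs = rng.normal(0, 0.1, size=(50, 8))  # typical embeddings
poisoned = np.full((1, 8), 5.0)                 # injected outlier
flags = flag_anomalous(np.vstack([normal_docs, poisoned]))
print(flags)
```

Median and MAD are used instead of mean and standard deviation so that the poisoned point cannot mask itself by inflating the statistics it is judged against.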
Diagrams and Flowcharts: 1) Pipeline with Mitigation Layers: User Query → Query Expansion → Retrieval → Re-ranking → Prompt Augmentation → LLM Generation → Verification & Attribution → Final Output, with Metadata Filtering and Bias Audit / Explainability applied as supporting layers along the pipeline.
Fig. 7: RAG Pipeline with Mitigation Layers
Fig. 8: RAG Pipeline Process Flow.
2) Table: Challenge vs. Strategy:
TABLE II: Challenges and Corresponding Mitigation Strategies
Challenge | Proposed Strategy
Irrelevant retrieval | Hybrid retrieval, domain-tuned embeddings, re-ranking
Hallucination | Grounded generation, answer verification, constrained decoding
Query vagueness | Query expansion, template-based prompts
Latency | Caching, ANN, parallel processing
Data drift / staleness | Periodic refresh, version control, metadata vetting
Bias & ethics | Audit tools, provenance, user feedback
Adversarial sources | Source authentication, anomaly detection
-
-
-
Layered Architecture Diagram: This diagram can illustrate how mitigation strategies apply at each layer of the RAG architecture (Data Sources, Retrieval, Generation, Post-Processing).
-
-
LIMITATIONS OF EXISTING TECHNIQUES
The rapid evolution of Large Language Models (LLMs) has significantly advanced natural language understanding and generation capabilities across various domains. However, traditional LLMs still suffer from major drawbacks such as hallucinations, lack of factual grounding, and limited explainability. To address these shortcomings, researchers have introduced Retrieval-Augmented Generation (RAG), an approach that integrates external knowledge retrieval with generative modeling. Despite the promise of improving factual accuracy and contextual relevance, current RAG frameworks are not without their limitations.
Existing RAG systems face a variety of technical, architectural, and ethical challenges, ranging from low retrieval precision and context-window constraints to scalability issues and computational overhead. Moreover, inconsistencies between retrieved evidence and generated text can lead to factual inaccuracies or misleading responses. Many studies have also highlighted persistent issues such as bias propagation, lack of interpretability, and inefficient evaluation frameworks. As RAG adoption expands across real-world applications like medical assistance, legal reasoning, and question answering, these limitations become more critical to address.
This section provides a comprehensive overview of the key limitations identified in existing RAG implementations, supported by recent research findings, comparative analyses, and performance evaluations. The following subsections highlight the most prominent constraints and their corresponding impact on the reliability and performance of RAG-based systems.
-
Key Limitations of RAG Systems
-
Retrieval Relevance and Quality: RAG systems heavily depend on the relevance of retrieved documents. Limitations include:
-
Retrieved documents may be partially relevant or tangential, introducing noise that impairs the final generated answers.
-
Knowledge bases may be outdated, incomplete, or inconsistent, leading to hallucinations or answers based on missing or incorrect context.
-
-
Context Length and Integration Constraints:
-
LLMs have fixed context windows; when many retrieved passages are appended, important parts may be truncated or lost, reducing coherence.
-
Redundancy and irrelevant content can crowd the context, causing the generative model to attend less to critical parts
-
-
Computational Latency and Infrastructure Complexity:
-
Retrieval, embedding, re-ranking, and generation increase computation per query, adding latency, especially for real-time systems
-
Complex system architecture requires monitoring, maintenance, and knowledge base versioning, which is challenging for smaller teams.
-
-
Hallucinations, Contradictions, and Misinterpretation:
-
Generative models may produce statements not supported by context, due to ambiguous or conflicting retrieved text.
-
Conflicting sources may cause output contradictions.
-
-
Bias, Fairness, and Ethical Issues:
-
Retrieval may fetch biased sources, which generative models can amplify
-
Lack of transparency in source selection reduces trust and auditability
-
-
Scalability and Maintenance Issues:
-
As knowledge bases grow, vector indexing and retrieval performance can degrade
-
Document updates, version control, and consistency are often neglected
-
-
Lack of Standard Metrics and Benchmarking:
-
Different RAG systems use varying metrics and datasets, making comparisons difficult.
-
Claims of improvement are sometimes not reproducible
-
-
-
Diagram and Flowchart Suggestions
Fig. 9: RAG pipeline with highlighted limitations at each stage: embedding, retrieval, re-ranking, context integration, and generation.
-
RAG Pipeline Flowchart with Failure Points:
-
Table: Limitation vs. Impact vs. Example:
-
Table: Limitations and Sources:
-
TABLE III: Impact of RAG System Limitations on Output
Limitation | Impact on Output | Real-World Example
Irrelevant retrieval | Wrong or misleading answers | Legal query citing wrong jurisdiction
Context window overflow | Truncation of key info | Omitting crucial clause in summary
Hallucinations | Fabricated or incorrect facts | Medical advice with wrong dosage
Bias in sources | Discriminatory outputs | Hiring assistant favoring certain demographics
Latency | Poor user experience | Slow response in customer support
Maintenance cost | High cost, outdated knowledge | KB not updated, users get old specs
Lack of benchmarking | Difficult to assess improvements | Non-comparable system claims
TABLE IV: RAG System Limitations and Sources
Limitation | Description | Sources / Links
Retrieval relevance | Irrelevant, outdated, or partial documents; conflicting info | [?], [?], [?]
Context length | Key info truncated; redundancy; conflicting context | [?], [?]
Latency | Slower responses; high computational cost; complex stack | [?], [?]
Hallucinations | Generated text not factually grounded | [?], [?]
Bias | Outputs reinforce biases | [?], [?]
Scalability | Degradation with large KB; updates needed | [?], [?]
Lack of benchmarking | Difficult to compare system performance | [?], [?]
-
-
Proposed Solutions for Improving RAG Systems
-
Introduction: To overcome the limitations identified in RAG (Retrieval-Augmented Generation) systems, a combination of architectural, algorithmic, and operational strategies is required. These solutions aim to enhance retrieval relevance, reduce hallucinations, improve latency, maintain fairness, and ensure system scalability. The proposed methods are structured around key challenges such as retrieval quality, context integration, computational efficiency, bias mitigation, and standardization.
-
Improving Retrieval Relevance and Knowledge Quality:
-
Hybrid Retrieval: Combine sparse retrieval (e.g., BM25) and dense retrieval (e.g., embeddings) to improve semantic coverage while maintaining keyword precision [?], [?].
-
Domain-Specific Fine-Tuning: Fine-tune embedding models on domain-specific corpora to better capture terminology and reduce irrelevant matches [?].
-
Dynamic Knowledge Base Updating: Implement automated pipelines to continuously update, clean, and validate knowledge sources, ensuring completeness and reducing outdated information [?].
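As an illustration of the hybrid retrieval idea above, the sketch below fuses a keyword-overlap score (a stand-in for BM25) with a cosine similarity over bag-of-words counts (a stand-in for dense embeddings). The toy corpus, the scoring functions, and the fusion weight alpha are all simplified assumptions, not a production retriever.

```python
# Hybrid retrieval sketch: weighted fusion of a sparse (keyword) score
# and a dense (similarity) score. All components are illustrative proxies.
from collections import Counter
import math

CORPUS = [
    "BM25 ranks documents by keyword overlap and term rarity",
    "Dense embeddings capture semantic similarity between texts",
    "Hybrid retrieval combines sparse and dense signals",
]

def sparse_score(query, doc):
    """Keyword-overlap score (stand-in for BM25)."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / max(len(q), 1)

def dense_score(query, doc):
    """Cosine similarity over bag-of-words counts (stand-in for embeddings)."""
    qv, dv = Counter(query.lower().split()), Counter(doc.lower().split())
    dot = sum(qv[t] * dv[t] for t in qv)
    nq = math.sqrt(sum(v * v for v in qv.values()))
    nd = math.sqrt(sum(v * v for v in dv.values()))
    return dot / (nq * nd) if nq and nd else 0.0

def hybrid_search(query, corpus, alpha=0.5):
    """Rank documents by a weighted sum of sparse and dense scores."""
    scored = [(alpha * sparse_score(query, doc)
               + (1 - alpha) * dense_score(query, doc), doc)
              for doc in corpus]
    return sorted(scored, reverse=True)

results = hybrid_search("sparse and dense retrieval", CORPUS)
```

In a real system, the sparse side would come from an inverted index and the dense side from a trained embedding model; only the fusion step would look like the above.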
-
-
Context Integration and Management:
-
Adaptive Chunking: Break large documents into semantically coherent chunks to maximize information retention within LLM context windows [?].
-
Relevance Re-Ranking: Use neural or hybrid re-ranking techniques to prioritize highly relevant passages, reducing redundancy and irrelevant context [?].
-
Vector Index Optimization: Use approximate nearest neighbor (ANN) search libraries such as FAISS or Milvus to speed up retrieval over large corpora [?].
-
Caching Frequent Queries: Store embeddings and top results for commonly asked queries to reduce redundant computations [?].
-
Modular Microservice Architecture: Separate retrieval, re-ranking, and generation into optimized microservices, allowing horizontal scaling and easier maintenance [?].
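The adaptive chunking strategy above can be sketched as greedy sentence packing under a word budget; the regex sentence splitter and the budget value are simplified assumptions standing in for a real tokenizer and semantic segmenter.

```python
# Adaptive chunking sketch: pack whole sentences into chunks without
# exceeding a word budget, so sentences are never cut mid-way.
import re

def adaptive_chunks(text, max_words=30):
    """Greedily group sentences into chunks of at most max_words words."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    chunks, current, count = [], [], 0
    for sent in sentences:
        n = len(sent.split())
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sent)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks

doc = ("RAG systems retrieve documents before generating answers. "
       "Long documents must be split into chunks. "
       "Each chunk should fit within the model context window. "
       "Coherent chunks preserve more meaning than fixed-size cuts.")
chunks = adaptive_chunks(doc, max_words=15)
```

Keeping sentence boundaries intact is what distinguishes this from fixed-size character splitting, which can sever a clause from its subject.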
-
Reducing Hallucinations and Contradictions:
-
Source Attribution and Verification: Include retrieved source references in the generation prompt and implement fact-checking layers to ensure grounded outputs [?].
-
Conflict Resolution Mechanisms: Detect conflicting information from multiple sources and apply consensus or weighted voting strategies [?].
-
Instruction-Tuned LLMs: Fine-tune models with retrieval-aware instructions to minimize unsupported statements [?].
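A minimal sketch of source attribution at prompt-construction time: each retrieved passage is numbered so the model can cite it and a downstream fact-checking layer can trace claims back to a source. The prompt wording and the example passages are hypothetical.

```python
# Source attribution sketch: build a generation prompt in which every
# retrieved passage carries a citation tag [n] the model must reuse.
def build_grounded_prompt(question, passages):
    """Assemble a prompt that pairs the question with numbered sources."""
    cited = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer using ONLY the sources below and cite them as [n].\n"
        f"Sources:\n{cited}\n"
        f"Question: {question}\n"
        "If the sources do not contain the answer, say so."
    )

prompt = build_grounded_prompt(
    "What is the warranty period?",
    ["The warranty period is 24 months.", "Repairs require proof of purchase."],
)
```

The explicit fallback instruction ("say so") is what discourages the model from fabricating an answer when retrieval comes back empty.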
-
-
Bias Mitigation and Fairness:
-
Diverse Knowledge Sources: Include multiple perspectives and unbiased sources in the knowledge base to reduce skewed outputs [?].
-
Bias Auditing Tools: Periodically evaluate outputs for fairness and adjust retrieval or re-ranking mechanisms accordingly [?].
-
Transparent Retrieval Logs: Maintain logs of retrieved documents for auditing and traceability [?].
-
-
Scalability and Maintenance Solutions:
-
Context Window Optimization: Selectively include retrieved content based on query importance to avoid truncation of critical information [?].
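One way to realize this selective inclusion, sketched under simplified assumptions, is a greedy budget-constrained selection: keep the highest-relevance passages that still fit the window. The relevance scores and the word-count proxy for tokens are illustrative, not taken from the paper.

```python
# Context window optimization sketch: fill a token budget with the most
# relevant passages first, rather than truncating the tail blindly.
def select_context(passages, budget):
    """passages: list of (relevance_score, text); returns kept texts, best first."""
    kept, used = [], 0
    for score, text in sorted(passages, key=lambda p: p[0], reverse=True):
        cost = len(text.split())  # word count as a crude token proxy
        if used + cost <= budget:
            kept.append(text)
            used += cost
    return kept

passages = [
    (0.9, "Clause 7 limits liability to direct damages."),
    (0.4, "The agreement is governed by the laws of the state."),
    (0.7, "Termination requires thirty days written notice."),
]
context = select_context(passages, budget=15)
```

With a budget of 15 words, the low-relevance passage is dropped whole instead of a high-relevance one being cut mid-sentence.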
-
-
-
-
Reducing Latency and Improving Infrastructure Efficiency:
-
Incremental Indexing: Implement incremental updates in vector databases to avoid re-indexing the entire corpus [?].
-
Version Control for Knowledge Base: Track document versions to ensure consistency and easy rollback [?].
-
Performance Monitoring: Continuously monitor retrieval precision, latency, and KB growth to maintain system quality [?].
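The incremental indexing and version control ideas above can be combined in a minimal in-memory sketch, assuming a simple doc-id keyed store; a real vector database would also recompute embeddings on each upsert, but the bookkeeping pattern is the same.

```python
# Incremental indexing sketch: documents are added or updated one at a
# time (no full re-index), and every change bumps a version and is
# logged so the knowledge base can be audited or rolled back.
class IncrementalIndex:
    def __init__(self):
        self.docs = {}      # doc_id -> text
        self.version = 0
        self.history = []   # (version, action, doc_id)

    def upsert(self, doc_id, text):
        """Add or update one document without rebuilding the index."""
        action = "update" if doc_id in self.docs else "add"
        self.docs[doc_id] = text
        self.version += 1
        self.history.append((self.version, action, doc_id))

    def remove(self, doc_id):
        """Delete a document and record the change."""
        if doc_id in self.docs:
            del self.docs[doc_id]
            self.version += 1
            self.history.append((self.version, "remove", doc_id))

index = IncrementalIndex()
index.upsert("spec-1", "Model A supports 8k context.")
index.upsert("spec-1", "Model A supports 32k context.")  # spec updated in place
```

The history list is the hook for the monitoring bullet above: KB growth and update frequency can be read straight off it.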
-
-
Standardization and Benchmarking:
-
Unified Metrics: Adopt standard evaluation metrics like Precision@K, F1-score, hallucination rate, and source attribution accuracy [?].
-
Benchmark Datasets: Use open datasets for reproducible comparisons of RAG system performance [?].
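Precision@K, one of the unified metrics mentioned above, can be computed directly from a ranked result list and a relevance set; the document IDs below are hypothetical.

```python
# Precision@K sketch: the fraction of the top-K retrieved documents
# that appear in the set of relevant documents.
def precision_at_k(retrieved, relevant, k):
    """retrieved: ranked list of doc ids; relevant: set of relevant doc ids."""
    top_k = retrieved[:k]
    if not top_k:
        return 0.0
    return sum(1 for d in top_k if d in relevant) / len(top_k)

# One relevant document (d1) appears in the top 3 retrieved results.
p = precision_at_k(["d3", "d1", "d7", "d2"], {"d1", "d2", "d5"}, k=3)
```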
-
-
Fig. 10: Proposed RAG pipeline incorporating solutions: hybrid retrieval, adaptive chunking, re-ranking, bias mitigation, and source attribution.
Fig. 11: Relationship between knowledge base size, retrieval latency, and precision in a RAG system.
-
Proposed RAG Pipeline with Improvements (Flowchart):
The proposed solutions aim to improve relevance, reduce hallucinations, enhance fairness, and maintain system scalability. Implementation of these strategies can make RAG systems more reliable, accurate, and efficient for real-world applications.
-
Summary Table of Limitations and Solutions
-
Proposed RAG Pipeline (Flowchart)
Fig. 12: Proposed RAG pipeline with improvements including hybrid retrieval, adaptive chunking, re-ranking, source attribution, and bias mitigation.
In conclusion, adopting these strategies can substantially enhance the reliability, relevance, and efficiency of RAG systems, making them suitable for diverse applications such as legal assistance, medical advice, education, and customer support.
-
-
Conclusion
Retrieval-Augmented Generation (RAG) systems offer significant advantages by combining retrieval-based knowledge with large language models, enhancing answer relevance and domain coverage. However, challenges such as retrieval quality, context limitations, latency, hallucinations, bias, and scalability can impact performance. This research identifies these limitations and proposes solutions including hybrid retrieval, domain-specific fine-tuning, adaptive chunking, re-ranking, source verification, bias auditing, and scalable infrastructure.
TABLE V: Proposed Solutions for RAG Systems: Benefits and Challenges
Proposed Solution | Expected Benefit | Implementation Challenge
Hybrid retrieval (sparse + dense) | Improved relevance and semantic coverage | Integration complexity, additional computation
Domain-specific fine-tuning | Reduced irrelevant retrieval | Requires labeled domain corpus
Adaptive chunking | Better context utilization | Complexity in chunk management
Neural re-ranking | Prioritized important info | Additional computation and latency
Vector index optimization (ANN) | Faster retrieval, scalable | Tuning ANN parameters, memory overhead
Source attribution and verification | Reduced hallucinations | Implementation of fact-checking layer
Bias auditing and diverse KB | Fairer outputs | Continuous monitoring and updates
Incremental indexing | Efficient KB updates | Requires versioning and change tracking
TABLE VI: Summary of RAG Limitations and Proposed Solutions
Limitation | Impact | Proposed Solution
Irrelevant retrieval | Wrong or misleading answers | Hybrid retrieval, domain-specific fine-tuning, re-ranking
Context truncation | Loss of critical information | Adaptive chunking, context optimization
Hallucinations | Fabricated or inconsistent outputs | Source attribution, fact-checking, instruction-tuned LLMs
Bias in sources | Discriminatory outputs | Diverse knowledge sources, bias auditing
High latency | Poor user experience | Vector index optimization, caching, microservice architecture
Scalability issues | Degradation of performance | Incremental indexing, version control, performance monitoring
Lack of benchmarking | Difficult to compare systems | Standard metrics, benchmark datasets
References
-
TechTarget, Understanding the Limitations and Challenges of RAG Systems. [Online]. Available: https://www.techtarget.com/searchenterpriseai/tip/Understanding-the-limitations-and-challenges-of-RAG-systems
-
P. Lewis, et al., Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks, NeurIPS, 2020. [Online]. Available: https://arxiv.org/abs/2005.11401
-
G. Izacard and E. Grave, Leveraging Passage Retrieval with Generative Models for Open-Domain QA, arXiv preprint, 2020. [Online]. Available: https://arxiv.org/abs/2007.01282
-
S. Yao, et al., RAG-as-a-Service: Retrieval-Augmented Generation in Production, IBM Research Blog, 2023. [Online]. Available: https://www.ibm.com/architectures/patterns/genai-rag
-
N. Rajani, et al., Explainable AI for Language Models using SHAP, ACL Workshop on XAI, 2022. [Online]. Available: https://aclanthology.org/2020.emnlp-main.550/
-
Educative Blog, RAG Challenges and Limitations. [Online]. Available: https://www.educative.io/blog/rag-challenges
-
Anonymous, No Free Lunch: RAG Fairness in LLMs, arXiv preprint, 2025. [Online]. Available: https://arxiv.org/abs/2504.03957
-
DigitalDefynd, Pros and Cons of Retrieval-Augmented Generation. [Online]. Available: https://digitaldefynd.com/IQ/pros-cons-of-retrieval-augmented-generation/
-
Cloudkitect, RAG: How It Works, Limitations and Strategies for Accurate Generation, Medium, 2023. [Online]. Available: https://medium.com/@cloudkitect/rag-retrieval-augmented-generation-how-it-works-its-limitations-and-strategies-for-
-
V. Karpukhin, et al., Dense Passage Retrieval for Open-Domain Question Answering, EMNLP, 2020. [Online]. Available: https://arxiv.org/abs/2004.04906
-
Facebook Research, FAISS: A Library for Efficient Similarity Search. [Online]. Available: https://github.com/facebookresearch/faiss
-
Facebook Engineering, FAISS: A Library for Efficient Similarity Search, 2017. [Online]. Available: https://engineering.fb.com/2017/03/29/data-infrastructure/faiss-a-library-for-efficient-similarity-search/
-
Pinecone Docs, Getting Started with Pinecone. [Online]. Available: https://docs.pinecone.io/guides/get-started/overview
-
Weaviate Docs, Weaviate Documentation. [Online]. Available: https://docs.weaviate.io/weaviate
-
Anonymous, Enhancing Retrieval-Augmented Large Language Models with Iterative Retrieval-Generation Synergy, arXiv preprint, 2023. [Online]. Available: https://arxiv.org/abs/2311.05232
-
Anonymous, Recent Advances in Retrieval-Augmented Generation, arXiv preprint, 2023. [Online]. Available: https://arxiv.org/pdf/2312.10997
-
Anonymous, A Survey on Retrieval And Structuring Augmented Generation with Large Language Models, arXiv preprint, 2024. [Online]. Available: https://arxiv.org/abs/2411.01751
-
Anonymous, Iterative RAG Strategies for Knowledge-Intensive Tasks, arXiv preprint, 2021. [Online]. Available: https://arxiv.org/abs/2106.11517
-
Anonymous, No Free Lunch: Retrieval-Augmented Generation Undermines Fairness in LLMs, arXiv preprint, 2025. [Online]. Available: https://arxiv.org/abs/2504.12330
