Standard RAG pipelines have a silent failure mode: they generate confident, fluent answers from chunks that don’t actually answer the question. The retriever returns something — cosine similarity found a match — but “related document” and “relevant answer” are not the same thing. Without a mechanism to verify this distinction before the response reaches the user, your pipeline will hallucinate citations, fabricate specifics, and erode trust in exactly the cases where accuracy matters most.
The fix is a Critic Agent — a stateful verification layer wired into the pipeline via LangGraph conditional edges. It grades retrieval quality, checks answer groundedness, and reroutes on failure. Not just once: it loops, rewrites, and retries with a hard escape condition. This post walks through the full implementation, including the state schema, the three-grade evaluation chain, and the loop-escape pattern that prevents token budget exhaustion on adversarial queries.
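Before wiring any nodes, it helps to see the shape of the shared state the critic reads and writes. Here is a minimal sketch of such a schema — field names are illustrative, not a definitive implementation:

```python
from typing import List, TypedDict

class PipelineState(TypedDict):
    """Shared state passed between graph nodes (illustrative field names)."""
    question: str         # original user question, never mutated
    active_query: str     # current (possibly rewritten) retrieval query
    chunks: List[str]     # retrieved context chunks
    answer: str           # latest generated answer
    retrieval_grade: str  # "relevant" | "partial" | "irrelevant"
    grounded: bool        # did the critic verify the answer against chunks?
    retries: int          # incremented on every loop back through rewrite

# A fresh state for an incoming question:
state: PipelineState = {
    "question": "What is our refund window?",
    "active_query": "What is our refund window?",
    "chunks": [],
    "answer": "",
    "retrieval_grade": "",
    "grounded": False,
    "retries": 0,
}
```

Keeping the original `question` separate from `active_query` is what makes rewrite drift detectable later: you can always diff the current query against the user's intent.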
Diagram 1: Self-correcting RAG pipeline state machine — retrieval grading, generation, critic evaluation, and conditional rerouting paths.
What Breaks at Scale
Three failure modes we’ve observed in production after deploying the initial critic loop:

Critic Hallucination: The critic itself can hallucinate. A claude-haiku-4-5-based groundedness checker occasionally marks a grounded answer as ungrounded when the answer paraphrases rather than quotes the source chunk verbatim. This triggers unnecessary rewrites. The fix is to tune the groundedness prompt to explicitly permit paraphrase: “A claim is grounded if it is a reasonable paraphrase of information in the chunks, not only if it uses identical wording.” This single prompt change reduced false-positive hallucination flags in our deployments from occasional to rare.
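A hedged sketch of what that prompt change looks like in practice — the exact template wording is illustrative, but the paraphrase-permitting sentence is the load-bearing part:

```python
# Groundedness prompt that explicitly permits paraphrase.
# Template wording is illustrative; adapt to your critic model.
GROUNDEDNESS_PROMPT = """\
You are a strict fact-checker. Given CHUNKS and an ANSWER, decide whether
every claim in the ANSWER is supported by the CHUNKS.

A claim is grounded if it is a reasonable paraphrase of information in the
chunks, not only if it uses identical wording.

CHUNKS:
{chunks}

ANSWER:
{answer}

Reply with exactly one word: "grounded" or "ungrounded".
"""

def build_groundedness_prompt(chunks: list, answer: str) -> str:
    """Join chunks with a separator and fill the template."""
    return GROUNDEDNESS_PROMPT.format(
        chunks="\n---\n".join(chunks), answer=answer
    )
```

Constraining the critic's output to a single token-like label also makes the grade trivial to parse downstream.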
Rewrite Drift: After two rewrites, the active query can drift so far from the user’s original intent that the third retrieval returns completely unrelated content. Cap rewrites at 2 and log the full rewrite chain in your observability layer — LangSmith traces capture this automatically if you instrument the rewrite node with @traceable. See our LangSmith observability guide for the tracing setup.
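A sketch of a rewrite node that enforces the cap and records the full chain — the placeholder rewrite stands in for a real LLM call, and in production you would decorate this function with LangSmith's `@traceable` so the chain appears in traces:

```python
MAX_REWRITES = 2  # hard cap; beyond this, drift outweighs any retrieval gain

def rewrite_query(state: dict) -> dict:
    """Rewrite node: records the full rewrite chain so drift is auditable.

    The rewrite below is a placeholder -- a real node would call an LLM.
    """
    if state["retries"] >= MAX_REWRITES:
        raise RuntimeError("rewrite cap exceeded; route to fallback instead")
    new_query = f"{state['active_query']} (reworded attempt {state['retries'] + 1})"
    chain = state.get("rewrite_chain", [state["active_query"]])
    return {
        **state,
        "active_query": new_query,
        "rewrite_chain": chain + [new_query],  # full history, oldest first
        "retries": state["retries"] + 1,
    }
```

Logging the chain as a list (rather than overwriting the query in place) is what lets you spot drift in a trace at a glance.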
Chunk Boundary Failures Persist: The critic loop improves single-chunk retrieval failures but does not solve chunk boundary hallucination — cases where the answer genuinely requires information spanning multiple chunks that were split at ingestion time. If chunk boundary failures are frequent in your corpus, the fix is upstream in your chunking strategy (larger chunks with overlap, or parent-document retrieval), not in the critic loop. Adding more retry cycles on a chunk boundary failure will exhaust retries and degrade to fallback every time.
A critic agent loop addresses retrieval-time and generation-time failures, but it cannot compensate for indexing-time failures. Systematic hallucinations that persist across correction cycles are almost always a signal that your chunking strategy or embedding model is misaligned with your query distribution.
Frequently Asked Questions
What is a self-correcting RAG pipeline?
A self-correcting RAG pipeline adds a critic agent that evaluates retrieval quality and answer groundedness after each generation step. If the retrieved chunks are irrelevant or the generated answer is not supported by the context, the pipeline reroutes — rewriting the query, re-ranking, or falling back to a web search — rather than returning a bad answer to the user.
How does a LangGraph critic agent differ from a simple prompt-based hallucination check?
A prompt-based check is stateless — it inspects one output and discards the result. A LangGraph critic agent is a stateful graph node with conditional edges. It can set flags in shared state, increment a retry counter, trigger upstream query rewrites, or halt the pipeline entirely, all based on accumulated context across multiple correction cycles.
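The conditional-edge function itself is ordinary Python: it inspects state and returns the name of the next node. A sketch, with illustrative node names (in LangGraph you would pass a function like this to `add_conditional_edges` and map its return values to nodes):

```python
def route_after_critic(state: dict) -> str:
    """Conditional-edge function: returns the name of the next node.

    Checked in priority order: escape hatch first, then retrieval
    quality, then groundedness.
    """
    if state["retries"] >= 2:
        return "fallback"        # hard escape: stop looping
    if state["retrieval_grade"] == "irrelevant":
        return "rewrite_query"   # chunks don't match the question
    if not state["grounded"]:
        return "regenerate"      # answer not supported by chunks
    return "respond"             # grounded answer, ship it
```

Note that the retry check comes first: every looping path must pass through the escape condition before any rerouting logic runs.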
What model should I use for the critic node?
Use the smallest model that achieves reliable grading for your domain. Claude Haiku 4.5 and Gemma 3n work well because grading is a structured classification task — binary or 3-class — not open-ended generation. Frontier models like Claude Opus 4.6 are overkill for a critic and add unnecessary latency and cost to every pipeline invocation.
How do I prevent infinite loops in a self-correcting RAG pipeline?
Maintain a retry counter in LangGraph shared state and define a maximum iteration threshold (typically 2–3 retries). Every conditional edge that loops back must check this counter first. When the counter is exceeded, the pipeline should route to a graceful degradation node that returns a transparent “I could not find a reliable answer” response rather than silently failing or continuing indefinitely.
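The graceful degradation node itself can be trivial — its value is that it exists as an explicit terminal path. A sketch, with illustrative wording:

```python
def fallback_node(state: dict) -> dict:
    """Graceful-degradation node: a transparent failure beats a confident guess.

    Reached only when the conditional edges have exhausted the retry budget.
    """
    return {
        **state,
        "answer": (
            "I could not find a reliable answer to this question "
            "in the available documents."
        ),
    }
```

Preserving the rest of the state (rather than returning only the answer) keeps the failed run fully inspectable in your traces.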
Engineer Intelligence with ActiveWizards
If your RAG pipeline is producing confident wrong answers and you need a production-grade critic architecture designed for your document corpus and query distribution, our AI engineering team has deployed this pattern across regulated industries where hallucinations are not acceptable.