RAG guardrails: being useful without hallucinating
Retrieval-augmented generation sounds straightforward: retrieve relevant documents, feed them to a language model, get a grounded answer. In practice, the gap between "demo-ready" and "production-ready" is where most RAG systems fall apart.
A good RAG system is as much about "when to refuse" as it is about retrieval.
The hallucination problem isn't just about the LLM
It's tempting to blame hallucinations on the language model alone, but in a RAG pipeline, the retrieval step introduces its own failure modes:
- False relevance: A document that matches keywords but answers a different question.
- Partial context: Chunks that contain related information but miss critical qualifiers.
- Stale data: Documents that were accurate when indexed but are now outdated.
- Missing evidence: The answer genuinely isn't in the corpus, but the model generates one anyway.
Each of these produces a confident-sounding answer that's wrong in ways the user can't easily verify.
Principles I follow
1. Deterministic indexing
The index should be reproducible. Same documents in, same index out. This means pinning embedding model versions, using deterministic chunking strategies, and versioning the index alongside the source documents. If you can't reproduce the index, you can't debug retrieval failures.
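As a minimal sketch of what "same documents in, same index out" can mean in code: deterministic chunking plus a content fingerprint that changes whenever the documents, the chunking config, or the pinned model version change. The model name, chunk size, and fingerprint scheme here are illustrative assumptions, not the project's actual values.

```python
import hashlib

EMBEDDING_MODEL = "example-embedder-v1"  # pinned model version (assumed name)
CHUNK_SIZE = 200  # characters; fixed-size chunking, no randomness

def chunk(text: str, size: int = CHUNK_SIZE) -> list[str]:
    """Split text into fixed-size chunks; same input always yields same output."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def index_fingerprint(documents: list[str]) -> str:
    """Hash documents + chunking config + model version into an index ID."""
    h = hashlib.sha256()
    h.update(f"{EMBEDDING_MODEL}|{CHUNK_SIZE}".encode())
    for doc in sorted(documents):  # sorted so document order doesn't matter
        h.update(hashlib.sha256(doc.encode()).digest())
    return h.hexdigest()[:16]
```

Versioning the fingerprint alongside the index means a retrieval bug can always be traced back to the exact documents, chunking, and model that produced it.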
2. Conservative thresholds
Not every retrieved document is worth using. I set similarity thresholds that err on the side of returning "I don't have enough evidence to answer that" rather than surfacing weakly related content. The threshold is tuned per use case—a knowledge base query needs higher confidence than a discovery/browsing query.
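The per-use-case tuning can be sketched as a simple filter over `(chunk, score)` pairs; the threshold values and the tuple shape are assumptions for illustration.

```python
# Assumed per-use-case cutoffs: knowledge-base queries demand higher
# confidence than discovery/browsing queries.
THRESHOLDS = {
    "knowledge_base": 0.80,
    "discovery": 0.55,
}

def filter_hits(hits: list[tuple[str, float]], use_case: str) -> list[tuple[str, float]]:
    """Keep only chunks whose similarity clears the use-case threshold."""
    cutoff = THRESHOLDS[use_case]
    return [(chunk, score) for chunk, score in hits if score >= cutoff]

hits = [("pricing policy v2", 0.91), ("old blog post", 0.62)]
filter_hits(hits, "knowledge_base")  # only the 0.91 hit survives
filter_hits(hits, "discovery")       # both survive
```

If the filtered list comes back empty, the correct behavior is a refusal rather than falling back to the weakly related content.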
3. Provenance by default
Every answer should link back to its source. This means including document references, chunk identifiers, and similarity scores in the response metadata. Users (and developers) need to be able to verify where an answer came from, not just what it says.
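A sketch of what "provenance by default" can look like at the data-model level; the field names (`doc_id`, `chunk_id`, `score`) are assumptions, not a real schema.

```python
from dataclasses import dataclass, field

@dataclass
class SourceRef:
    doc_id: str    # which document the chunk came from
    chunk_id: str  # which chunk within that document
    score: float   # similarity score at retrieval time

@dataclass
class Answer:
    text: str
    sources: list[SourceRef] = field(default_factory=list)

    def cited(self) -> bool:
        """An answer without provenance should never reach the user."""
        return bool(self.sources)

ans = Answer(
    text="Refunds are processed within 14 days.",
    sources=[SourceRef(doc_id="policy.md", chunk_id="policy.md#3", score=0.88)],
)
```

Making `sources` a first-class field (rather than an afterthought in logs) means both users and developers can trace every claim back to a chunk and its score.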
4. Explicit refusal
If retrieved evidence is below the confidence threshold, the system says so. "I don't have enough information to answer that" is a better response than a plausible-sounding guess. This is especially important in domains where wrong answers have real consequences.
5. Guardrails on generation
Even with good retrieval, the LLM can still drift. Guardrails include:
- Constraining the model to answer only from provided context.
- Detecting when the model's response contradicts or extends beyond the retrieved evidence.
- Input validation to reduce prompt-injection risk, plus rate limiting against abuse.
What this looks like in practice
In the personal-codex-agent project, the pipeline runs ingestion → chunking → embeddings → persistent index → retrieval → response generation. The emphasis is on reproducibility at every step and conservative behavior when evidence is weak.
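The conservative end of that flow, retrieve → filter → answer-or-refuse, can be condensed into a few lines. The retriever, threshold, and refusal string below are illustrative assumptions, not the project's actual implementation.

```python
REFUSAL = "I don't have enough information to answer that."
THRESHOLD = 0.75  # assumed similarity cutoff

def answer(question: str, retrieve) -> str:
    """Generate only when evidence clears the threshold; otherwise refuse."""
    hits = [(c, s) for c, s in retrieve(question) if s >= THRESHOLD]
    if not hits:
        return REFUSAL
    context = "\n".join(c for c, _ in hits)
    return f"Based on the indexed documents: {context}"  # stand-in for the LLM call

# Toy retriever: strong match for billing questions, weak match otherwise.
def fake_retrieve(q: str) -> list[tuple[str, float]]:
    if "billing" in q:
        return [("chunk about billing", 0.90)]
    return [("weak match", 0.30)]

answer("how does billing work?", fake_retrieve)          # grounded answer
answer("what is the CEO's favorite color?", fake_retrieve)  # refusal
```

The point of the structure is that refusal is the default path: generation only happens after evidence has explicitly cleared the bar.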
The portfolio chatbot on this site follows similar principles at a smaller scale—it answers only from a local knowledge base and refuses sensitive or out-of-scope queries outright.
What I'd improve next
- Add evaluation datasets for retrieval quality (relevance, recall, answer faithfulness).
- Implement re-ranking to improve retrieval precision beyond raw similarity.
- Build monitoring for retrieval distribution drift over time.
- Experiment with hybrid retrieval (keyword + semantic) for better coverage.
Takeaway
The bar for a useful RAG system isn't "can it generate answers?"—it's "can you trust the answers it gives, and can you tell when it doesn't know?" Deterministic indexing, conservative thresholds, provenance, and explicit refusal are the foundations that make the difference.