RAG (Retrieval-Augmented Generation)
A technique that combines document retrieval with AI generation to ground responses in factual data.
RAG reduces hallucination by retrieving relevant documents and grounding the generated response in them.
The workflow:
1. User asks a question.
2. Embed the question and search a vector database of documents.
3. Retrieve the k most relevant documents.
4. Pass the question plus the retrieved documents to the language model.
5. The model generates an answer grounded in those documents.
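The steps above can be sketched end to end with a toy bag-of-words embedding. Everything here is illustrative, a minimal sketch rather than a production recipe: the documents, the helper names, and the prompt format are made up, and a real system would use a learned embedding model and a vector database instead of word counts.

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy embedding: a bag-of-words count vector. A production
    # system would call a learned embedding model instead.
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(question, documents, k=2):
    # Steps 2-3: embed the question, rank documents by similarity,
    # keep the k most relevant.
    q = embed(question)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

documents = [
    "Our return policy: items may be returned within 30 days.",
    "Shipping is free on orders over $50.",
    "Our support line is open weekdays 9 to 5.",
]

question = "What is the return policy?"
context = retrieve(question, documents, k=1)

# Step 4: the question plus retrieved context becomes the model prompt;
# step 5 would send this prompt to the language model.
prompt = f"Answer using only this context:\n{context[0]}\n\nQuestion: {question}"
```

The word-overlap similarity here retrieves the return-policy document because it shares "return" and "policy" with the question; real embeddings would also match paraphrases with no shared words.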
Why it matters: RAG decouples the model's knowledge from its parameters. You can update the document database without retraining. It works for company-specific Q&A, research synthesis, and any task where accuracy matters more than creativity.
RAG is widely used in production: Perplexity, many enterprise AI assistants, and other retrieval-heavy products rely on it. The main tradeoff is latency, since retrieval adds a round trip to the vector database before generation.
Example
Without RAG: Ask "What's our return policy?" and the model, having never seen your policy, may hallucinate one. With RAG: Retrieve your actual return policy document, then generate an answer from it.
Related terms
Grounding
Anchoring an AI's responses to factual data to reduce hallucination.
Embedding
A numerical representation of text (or other data) that captures meaning, enabling semantic search and comparison.
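A small sketch of how embeddings enable semantic comparison: vectors for related concepts point in similar directions, so their cosine similarity is high. The three-dimensional vectors and their values below are invented for illustration; real embedding models produce vectors with hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors; closer to 1.0
    # means the vectors point in more similar directions.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hand-made 3-dimensional vectors standing in for real embeddings.
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

cat_kitten = cosine_similarity(cat, kitten)
cat_invoice = cosine_similarity(cat, invoice)
```

With these toy values, "cat" is far more similar to "kitten" than to "invoice", which is exactly the property semantic search relies on.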
Vector Database
A specialized database optimized for storing and searching embeddings by similarity.
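A minimal in-memory sketch of what a vector database does: store (id, vector) pairs and return the ids most similar to a query vector. The class name, two-dimensional vectors, and document ids are all illustrative; real vector databases use approximate-nearest-neighbor indexes to keep search fast over millions of embeddings.

```python
import math

class InMemoryVectorStore:
    # Illustrative brute-force store: real systems replace the
    # linear scan in search() with an approximate index.
    def __init__(self):
        self.items = []  # list of (doc_id, vector) pairs

    def add(self, doc_id, vector):
        self.items.append((doc_id, vector))

    def search(self, query, k=3):
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            norm_a = math.sqrt(sum(x * x for x in a))
            norm_b = math.sqrt(sum(y * y for y in b))
            return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
        # Rank every stored vector by similarity to the query.
        ranked = sorted(self.items, key=lambda it: cosine(query, it[1]), reverse=True)
        return [doc_id for doc_id, _ in ranked[:k]]

store = InMemoryVectorStore()
store.add("returns-doc", [1.0, 0.0])
store.add("shipping-doc", [0.0, 1.0])
nearest = store.search([0.9, 0.1], k=1)
```

The query vector [0.9, 0.1] is closest to the "returns-doc" vector, so that id comes back first.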