RAG (Retrieval-Augmented Generation)
A technique that combines document retrieval with AI generation to ground responses in factual data.
RAG reduces hallucination by retrieving relevant documents and grounding the generated response in them.
The workflow:
1. User asks a question.
2. Embed the question and search a vector database of documents.
3. Retrieve the k most relevant documents.
4. Pass the question plus the retrieved documents to the language model.
5. The model generates an answer grounded in those documents.
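The steps above can be sketched end to end with a toy bag-of-words embedding. Everything here is illustrative, a minimal sketch rather than a production recipe: the documents, the helper names, and the prompt format are made up, and a real system would use a learned embedding model and a vector database instead of word counts.

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy embedding: a bag-of-words count vector. A production
    # system would call a learned embedding model instead.
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(question, documents, k=2):
    # Steps 2-3: embed the question, rank documents by similarity,
    # keep the k most relevant.
    q = embed(question)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

documents = [
    "Our return policy: items may be returned within 30 days.",
    "Shipping is free on orders over $50.",
    "Our support line is open weekdays 9 to 5.",
]

question = "What is the return policy?"
context = retrieve(question, documents, k=1)

# Step 4: the question plus retrieved context becomes the model prompt;
# step 5 would send this prompt to the language model.
prompt = f"Answer using only this context:\n{context[0]}\n\nQuestion: {question}"
```

The word-overlap similarity here retrieves the return-policy document because it shares "return" and "policy" with the question; real embeddings would also match paraphrases with no shared words.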
Why it matters: RAG decouples the model's knowledge from its parameters. You can update the document database without retraining. It works for company-specific Q&A, research synthesis, and any task where accuracy matters more than creativity.
RAG is widely used in production: Perplexity, many enterprise AI assistants, and other retrieval-heavy products rely on it. The main tradeoff is latency, since retrieval adds a round trip to the vector database before generation.
Example
Without RAG: Ask "What's our return policy?" and the model, having never seen your policy, may hallucinate one. With RAG: Retrieve your actual return policy document, then generate an answer from it.
Related terms
Grounding
Anchoring an AI's responses to factual data to reduce hallucination.
Embedding
A numerical representation of text (or other data) that captures meaning, enabling semantic search and comparison.
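A small sketch of how embeddings enable semantic comparison: vectors for related concepts point in similar directions, so their cosine similarity is high. The three-dimensional vectors and their values below are invented for illustration; real embedding models produce vectors with hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors; closer to 1.0
    # means the vectors point in more similar directions.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hand-made 3-dimensional vectors standing in for real embeddings.
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

cat_kitten = cosine_similarity(cat, kitten)
cat_invoice = cosine_similarity(cat, invoice)
```

With these toy values, "cat" is far more similar to "kitten" than to "invoice", which is exactly the property semantic search relies on.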
Vector Database
A specialized database optimized for storing and searching embeddings by similarity.
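A minimal in-memory sketch of what a vector database does: store (id, vector) pairs and return the ids most similar to a query vector. The class name, two-dimensional vectors, and document ids are all illustrative; real vector databases use approximate-nearest-neighbor indexes to keep search fast over millions of embeddings.

```python
import math

class InMemoryVectorStore:
    # Illustrative brute-force store: real systems replace the
    # linear scan in search() with an approximate index.
    def __init__(self):
        self.items = []  # list of (doc_id, vector) pairs

    def add(self, doc_id, vector):
        self.items.append((doc_id, vector))

    def search(self, query, k=3):
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            norm_a = math.sqrt(sum(x * x for x in a))
            norm_b = math.sqrt(sum(y * y for y in b))
            return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
        # Rank every stored vector by similarity to the query.
        ranked = sorted(self.items, key=lambda it: cosine(query, it[1]), reverse=True)
        return [doc_id for doc_id, _ in ranked[:k]]

store = InMemoryVectorStore()
store.add("returns-doc", [1.0, 0.0])
store.add("shipping-doc", [0.0, 1.0])
nearest = store.search([0.9, 0.1], k=1)
```

The query vector [0.9, 0.1] is closest to the "returns-doc" vector, so that id comes back first.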