Context Window
The maximum amount of text, measured in tokens, that an AI model can process in a single request.
The context window is the model's memory limit. GPT-4o and GPT-4 Turbo each have a 128,000-token window; Claude 3.5 Sonnet has 200,000. Tokens include both your input (the prompt plus any conversation history) and the model's output (the response). A larger window means more conversation history, longer documents, or more examples fit in a single request. When a request exceeds the context window, older messages must be dropped or the input truncated before the model can process it.
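To make the accounting concrete, here is a minimal sketch in Python using the tiktoken tokenizer. The 128k window matches GPT-4o / GPT-4 Turbo, but the cl100k_base encoding and the reserved response budget are assumptions for illustration, not any model's documented configuration:

```python
# Minimal sketch: check whether a prompt leaves room for a response
# within a 128k-token context window. Assumes the cl100k_base encoding;
# the reserved response budget is an arbitrary illustrative value.
import tiktoken

WINDOW = 128_000  # GPT-4o / GPT-4 Turbo window size, in tokens

def fits_in_window(prompt: str, response_budget: int = 4_096) -> bool:
    """True if the prompt plus a reserved response budget fits the window."""
    encoding = tiktoken.get_encoding("cl100k_base")
    prompt_tokens = len(encoding.encode(prompt))
    return prompt_tokens + response_budget <= WINDOW
```

Counting tokens up front like this avoids silent truncation once the request reaches the model.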
The context window is critical for workflows like retrieval-augmented generation (RAG) and long-document analysis. If you're working with a 100-page PDF, a small window forces you to split it into chunks; a large window lets you pass the whole document in a single request.
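When a document does not fit, chunking is the usual workaround. A sketch under the same assumptions as above (tiktoken with cl100k_base; the chunk size and overlap are arbitrary illustrative values, not recommendations):

```python
# Sketch: split a long document into token-bounded chunks, with a small
# overlap so text cut at a chunk boundary still appears in context.
import tiktoken

def chunk_document(text: str, chunk_tokens: int = 2_000, overlap: int = 200) -> list[str]:
    encoding = tiktoken.get_encoding("cl100k_base")
    tokens = encoding.encode(text)
    step = chunk_tokens - overlap
    return [
        encoding.decode(tokens[start:start + chunk_tokens])
        for start in range(0, len(tokens), step)
    ]
```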
Example
If GPT-4 Turbo has a 128k-token window and your prompt is 10k tokens, 118k tokens of the window remain for the model's response (in practice, providers often impose a separate, smaller cap on output tokens).
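The same arithmetic as a runnable sketch; the output cap below is a hypothetical provider limit used for illustration, not a documented figure for any specific model:

```python
# Worked version of the example: window minus prompt leaves the response budget.
window = 128_000
prompt_tokens = 10_000

response_budget = window - prompt_tokens
print(response_budget)  # 118000

# Many providers also cap output length separately; request the smaller.
provider_output_cap = 4_096  # hypothetical cap, for illustration only
max_response_tokens = min(response_budget, provider_output_cap)
```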
Related terms
Token
The smallest unit of text an AI processes—usually a word fragment, character, or subword.
Inference
The process of a trained model generating a response to an input. When you chat with ChatGPT, that's inference.
RAG (Retrieval-Augmented Generation)
A technique that combines document retrieval with AI generation to ground responses in factual data.