Context Window
The maximum amount of text, measured in tokens, that an AI model can process in a single request.
The context window is the model's memory limit. GPT-4o and GPT-4 Turbo each have a 128,000-token window; Claude 3.5 Sonnet has 200,000. Tokens include both your input (the prompt plus any conversation history) and the model's output (the response). A larger window means more conversation history, longer documents, or more examples fit in a single request. When a request exceeds the context window, older messages must be dropped or the input truncated before the model can process it.
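To make the accounting concrete, here is a minimal sketch in Python using the tiktoken tokenizer. The 128k window matches GPT-4o / GPT-4 Turbo, but the cl100k_base encoding and the reserved response budget are assumptions for illustration, not any model's documented configuration:

```python
# Minimal sketch: check whether a prompt leaves room for a response
# within a 128k-token context window. Assumes the cl100k_base encoding;
# the reserved response budget is an arbitrary illustrative value.
import tiktoken

WINDOW = 128_000  # GPT-4o / GPT-4 Turbo window size, in tokens

def fits_in_window(prompt: str, response_budget: int = 4_096) -> bool:
    """True if the prompt plus a reserved response budget fits the window."""
    encoding = tiktoken.get_encoding("cl100k_base")
    prompt_tokens = len(encoding.encode(prompt))
    return prompt_tokens + response_budget <= WINDOW
```

Counting tokens up front like this avoids silent truncation once the request reaches the model.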
The context window is critical for workflows like retrieval-augmented generation (RAG) and long-document analysis. If you're working with a 100-page PDF, a small window forces you to split it into chunks; a large window lets you pass the whole document in a single request.
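When a document does not fit, chunking is the usual workaround. A sketch under the same assumptions as above (tiktoken with cl100k_base; the chunk size and overlap are arbitrary illustrative values, not recommendations):

```python
# Sketch: split a long document into token-bounded chunks, with a small
# overlap so text cut at a chunk boundary still appears in context.
import tiktoken

def chunk_document(text: str, chunk_tokens: int = 2_000, overlap: int = 200) -> list[str]:
    encoding = tiktoken.get_encoding("cl100k_base")
    tokens = encoding.encode(text)
    step = chunk_tokens - overlap
    return [
        encoding.decode(tokens[start:start + chunk_tokens])
        for start in range(0, len(tokens), step)
    ]
```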
Example
If GPT-4 Turbo has a 128k-token window and your prompt is 10k tokens, 118k tokens of the window remain for the model's response (in practice, providers often impose a separate, smaller cap on output tokens).
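The same arithmetic as a runnable sketch; the output cap below is a hypothetical provider limit used for illustration, not a documented figure for any specific model:

```python
# Worked version of the example: window minus prompt leaves the response budget.
window = 128_000
prompt_tokens = 10_000

response_budget = window - prompt_tokens
print(response_budget)  # 118000

# Many providers also cap output length separately; request the smaller.
provider_output_cap = 4_096  # hypothetical cap, for illustration only
max_response_tokens = min(response_budget, provider_output_cap)
```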
Related terms
Token
The smallest unit of text an AI processes—usually a word fragment, character, or subword.
Inference
The process of a trained model generating a response to an input. When you chat with ChatGPT, that's inference.
RAG (Retrieval-Augmented Generation)
A technique that combines document retrieval with AI generation to ground responses in factual data.