All terms · Model Architecture

Large Language Model (LLM)

A neural network trained on massive amounts of text to predict and generate human language.

An LLM (Large Language Model) is an AI system trained to understand and generate text. "Large" refers to the number of parameters (internal learnable weights): GPT-3 has 175 billion, and current frontier models are widely estimated to have hundreds of billions or more, though exact counts are rarely disclosed. "Language Model" means it predicts the next token (roughly, a word or word fragment) given the previous ones.

LLMs are trained in two phases: (1) Pre-training on trillions of tokens of text to learn language patterns, then (2) Fine-tuning, typically supervised instruction tuning followed by reinforcement learning from human feedback (RLHF), to align outputs with human preferences. Once trained, an LLM can solve diverse downstream tasks—writing, coding, reasoning, translation, summarization—without task-specific retraining.

All major AI tools (ChatGPT, Claude, Gemini, Copilot) are powered by LLMs.

Example

ChatGPT, Claude, and Llama are all LLMs. They all use the Transformer architecture and generate text one token at a time.
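The generation loop above can be sketched with a toy next-token model. This is an illustrative simplification—a bigram frequency table stands in for a Transformer's billions of learned parameters—but the loop has the same shape: repeatedly predict the most likely next token and append it.

```python
from collections import Counter, defaultdict

# Tiny corpus standing in for the trillions of tokens used in real pre-training.
corpus = "the cat sat on the mat and the cat ran".split()

# "Train": count which token follows each token (a bigram table — a drastically
# simplified stand-in for a Transformer's learned weights).
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def generate(prompt: str, max_tokens: int = 5) -> str:
    """Generate one token at a time, greedily picking the most likely next token."""
    tokens = prompt.split()
    for _ in range(max_tokens):
        candidates = following.get(tokens[-1])
        if not candidates:  # no known continuation — stop generating
            break
        tokens.append(candidates.most_common(1)[0][0])
    return " ".join(tokens)

print(generate("the"))  # continues the prompt token by token
```

Real LLMs work the same way at inference time, except the "most likely next token" comes from a neural network conditioned on the entire preceding context, and sampling strategies (temperature, top-p) often replace the greedy choice shown here.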