All terms · Model Architecture

Large Language Model (LLM)

A neural network trained on massive amounts of text to predict and generate human language.

An LLM (Large Language Model) is an AI system trained to understand and generate text. "Large" refers to the number of parameters (internal learnable weights): GPT-3 has 175 billion, and current frontier models are widely estimated to have hundreds of billions or more, though exact counts are rarely disclosed. "Language Model" means it predicts the next token (roughly, a word or word fragment) given the previous ones.

LLMs are trained in two phases: (1) Pre-training on trillions of tokens of text to learn language patterns, then (2) Fine-tuning, typically supervised instruction tuning followed by reinforcement learning from human feedback (RLHF), to align outputs with human preferences. Once trained, an LLM can solve diverse downstream tasks—writing, coding, reasoning, translation, summarization—without task-specific retraining.

All major AI tools (ChatGPT, Claude, Gemini, Copilot) are powered by LLMs.

Example

ChatGPT, Claude, and Llama are all LLMs. They all use the Transformer architecture and generate text one token at a time.
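The generation loop above can be sketched with a toy next-token model. This is an illustrative simplification—a bigram frequency table stands in for a Transformer's billions of learned parameters—but the loop has the same shape: repeatedly predict the most likely next token and append it.

```python
from collections import Counter, defaultdict

# Tiny corpus standing in for the trillions of tokens used in real pre-training.
corpus = "the cat sat on the mat and the cat ran".split()

# "Train": count which token follows each token (a bigram table — a drastically
# simplified stand-in for a Transformer's learned weights).
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def generate(prompt: str, max_tokens: int = 5) -> str:
    """Generate one token at a time, greedily picking the most likely next token."""
    tokens = prompt.split()
    for _ in range(max_tokens):
        candidates = following.get(tokens[-1])
        if not candidates:  # no known continuation — stop generating
            break
        tokens.append(candidates.most_common(1)[0][0])
    return " ".join(tokens)

print(generate("the"))  # continues the prompt token by token
```

Real LLMs work the same way at inference time, except the "most likely next token" comes from a neural network conditioned on the entire preceding context, and sampling strategies (temperature, top-p) often replace the greedy choice shown here.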