Pre-Training
The initial large-scale training phase where a model learns language patterns from massive text datasets.
Pre-training is the expensive, foundational step where a model learns to predict the next word (token) from trillions of words of text (books, web pages, research papers). That is the "pre": it happens before any task-specific fine-tuning.
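As a toy illustration of the next-word objective (real LLMs use neural networks over token sequences, not word counts), the sketch below builds the simplest possible next-word predictor, a bigram counter, over a tiny made-up corpus:

```python
from collections import Counter, defaultdict

# Toy corpus standing in for "trillions of words" -- purely illustrative.
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count how often each word follows each preceding word.
# This is a bigram model: the crudest form of next-word prediction.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict_next(word):
    """Return the word most often seen after `word` in the corpus."""
    return follows[word].most_common(1)[0][0]

print(predict_next("the"))  # "cat" follows "the" most often in this corpus
```

A real pre-training run does the same thing at scale: adjust model parameters so that the probability assigned to the actual next token goes up, repeated over the whole dataset.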
During pre-training, the model learns grammar, facts, reasoning skills, and semantic relationships. Pre-training is so resource-intensive that only well-funded labs (OpenAI, Google, Meta, Anthropic) can afford it. Pre-trained models are released publicly or via APIs for others to fine-tune on specific tasks.
The scale tradeoff: bigger models trained on more data perform better, but are slower and more expensive to run.
Once pre-training is complete, the model is "frozen" for most users—we fine-tune or prompt it, but rarely retrain from scratch.
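To make "frozen" concrete, here is a hand-rolled sketch (the numbers and the one-neuron "model" are invented for illustration): the pre-trained base weights stay fixed, and fine-tuning updates only a small task-specific parameter on top.

```python
# Pre-trained base weights: frozen, never updated by fine-tuning.
base_weights = {"w1": 0.5, "w2": -0.3}
# Small task-specific parameter that fine-tuning is allowed to train.
head_weight = 0.0

def model(x):
    # The frozen base produces a feature; the trainable head maps it to an output.
    hidden = base_weights["w1"] * x + base_weights["w2"]
    return head_weight * hidden

# One gradient-descent step on squared error, touching only the head.
x, target, lr = 1.0, 1.0, 0.1
hidden = base_weights["w1"] * x + base_weights["w2"]
error = model(x) - target
head_weight -= lr * error * hidden  # base_weights are left untouched
```

This mirrors common fine-tuning setups (e.g. training a classification head, or adapter layers, on a frozen base model): cheap to run, because only a small fraction of the parameters receive gradients.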
Example
GPT-4 was pre-trained on trillions of tokens of text (OpenAI has not disclosed the exact count). OpenAI then fine-tuned it with human feedback (RLHF) to power ChatGPT.
Related terms
Fine-Tuning
The process of updating a pre-trained model with task-specific or domain-specific data to improve performance.
Large Language Model (LLM)
A neural network trained on massive amounts of text to predict and generate human language.
RLHF (Reinforcement Learning from Human Feedback)
A training technique where human reviewers rate AI outputs, and the model learns to generate outputs that score high.