Pre-Training
The initial large-scale training phase where a model learns language patterns from massive text datasets.
Pre-training is the expensive, foundational step where a model learns to predict the next word (token) from trillions of words of text (books, web pages, research papers). That is the "pre": it happens before any task-specific fine-tuning.
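As a toy illustration of the next-word objective (real LLMs use neural networks over token sequences, not word counts), the sketch below builds the simplest possible next-word predictor, a bigram counter, over a tiny made-up corpus:

```python
from collections import Counter, defaultdict

# Toy corpus standing in for "trillions of words" -- purely illustrative.
corpus = "the cat sat on the mat the cat ate the fish".split()

# Count how often each word follows each preceding word.
# This is a bigram model: the crudest form of next-word prediction.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def predict_next(word):
    """Return the word most often seen after `word` in the corpus."""
    return follows[word].most_common(1)[0][0]

print(predict_next("the"))  # "cat" follows "the" most often in this corpus
```

A real pre-training run does the same thing at scale: adjust model parameters so that the probability assigned to the actual next token goes up, repeated over the whole dataset.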
During pre-training, the model learns grammar, facts, reasoning skills, and semantic relationships. Pre-training is so resource-intensive that only well-funded labs (OpenAI, Google, Meta, Anthropic) can afford it. Pre-trained models are released publicly or via APIs for others to fine-tune on specific tasks.
The scale tradeoff: bigger models trained on more data perform better, but are slower and more expensive to run.
Once pre-training is complete, the model is "frozen" for most users—we fine-tune or prompt it, but rarely retrain from scratch.
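To make "frozen" concrete, here is a hand-rolled sketch (the numbers and the one-neuron "model" are invented for illustration): the pre-trained base weights stay fixed, and fine-tuning updates only a small task-specific parameter on top.

```python
# Pre-trained base weights: frozen, never updated by fine-tuning.
base_weights = {"w1": 0.5, "w2": -0.3}
# Small task-specific parameter that fine-tuning is allowed to train.
head_weight = 0.0

def model(x):
    # The frozen base produces a feature; the trainable head maps it to an output.
    hidden = base_weights["w1"] * x + base_weights["w2"]
    return head_weight * hidden

# One gradient-descent step on squared error, touching only the head.
x, target, lr = 1.0, 1.0, 0.1
hidden = base_weights["w1"] * x + base_weights["w2"]
error = model(x) - target
head_weight -= lr * error * hidden  # base_weights are left untouched
```

This mirrors common fine-tuning setups (e.g. training a classification head, or adapter layers, on a frozen base model): cheap to run, because only a small fraction of the parameters receive gradients.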
Example
GPT-4 was pre-trained on trillions of tokens of text (OpenAI has not disclosed the exact count). OpenAI then fine-tuned it with human feedback (RLHF) to power ChatGPT.
Related terms
Fine-Tuning
The process of updating a pre-trained model with task-specific or domain-specific data to improve performance.
Large Language Model (LLM)
A neural network trained on massive amounts of text to predict and generate human language.
RLHF (Reinforcement Learning from Human Feedback)
A training technique where human reviewers rate AI outputs, and the model learns to generate outputs that score high.