Parameters
The learnable numerical weights inside a neural network — often cited as billions ("7B", "70B") as a rough proxy for model size and capability.
Parameters are the numerical values inside a neural network that are adjusted during training. Every connection between neurons has a weight (a parameter); every layer has biases (more parameters). When you see "Llama 3 70B" or "GPT-4 is estimated at ~1.8 trillion parameters," that count refers to these weights.
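The weight-plus-bias accounting above can be sketched with a toy multilayer perceptron. This is an illustrative example (the layer sizes are made up, not taken from any real model): each fully connected layer contributes `in_features × out_features` weights plus `out_features` biases.

```python
# Illustrative sketch: counting parameters in a tiny feed-forward network.
# Each fully connected layer has in_features * out_features weights
# plus out_features biases -- these are its parameters.

def linear_param_count(in_features: int, out_features: int) -> int:
    weights = in_features * out_features  # one weight per connection
    biases = out_features                 # one bias per output neuron
    return weights + biases

# A hypothetical 3-layer MLP: 512 -> 2048 -> 2048 -> 512
layers = [(512, 2048), (2048, 2048), (2048, 512)]
total = sum(linear_param_count(i, o) for i, o in layers)
print(total)  # 6,296,064 -- a "6M-parameter" model
```

Scaling this bookkeeping up across the attention and feed-forward layers of a transformer is what yields headline figures like "7B" or "70B".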
Why parameter count matters (and its limits):
- More parameters generally means more capacity to store patterns from training data
- Larger models tend to be more capable at reasoning, instruction-following, and knowledge recall
- But parameter count alone does not determine quality — training data quality, training methodology, and architecture choices matter as much
- A well-trained smaller model (e.g. Mistral 7B) can outperform a poorly-trained larger one
Common parameter scales:
- ~7B: fast, runs on consumer hardware, limited reasoning
- ~13B–30B: mid-range, runs on good consumer hardware with quantisation
- ~70B: powerful, requires significant GPU memory
- 100B+: frontier models, run on data centre infrastructure
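A rough rule of thumb connects these scales to the hardware requirements: the memory needed just to hold the weights is approximately the parameter count multiplied by bytes per parameter (this back-of-envelope estimate is an assumption that ignores activation and KV-cache overhead, which add more on top).

```python
# Back-of-envelope estimate (assumption: weights only, no activation
# or KV-cache overhead): memory = parameter count * bytes per parameter.

def weight_memory_gb(params_billions: float, bytes_per_param: float) -> float:
    # billions of parameters * bytes each = gigabytes (using 1 GB = 1e9 bytes)
    return params_billions * bytes_per_param

print(weight_memory_gb(7, 2.0))    # 14.0 GB at fp16 -- near the limit of consumer GPUs
print(weight_memory_gb(70, 2.0))   # 140.0 GB at fp16 -- multiple data-centre GPUs
print(weight_memory_gb(70, 0.5))   # 35.0 GB with 4-bit quantisation
```

This is why quantisation (using fewer bytes per parameter) is what brings the mid-range and 70B scales within reach of consumer hardware.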
Most hosted model providers (Anthropic, OpenAI, Google) do not disclose exact parameter counts for their frontier models.
Example
Claude Opus 4.7 is, judging by capability, a very large model — likely 100B+ parameters — though Anthropic does not publish the exact count. Its parameter scale contributes to its ability to handle complex multi-step reasoning.
Related terms
Large Language Model (LLM)
A neural network trained on massive amounts of text to predict and generate human language.
Transformer
The neural network architecture that powers modern large language models.
Pre-Training
The initial large-scale training phase where a model learns language patterns from massive text datasets.