All terms · Model Architecture

Transformer

The neural network architecture that powers modern large language models.

A Transformer is a specific type of neural network designed to process sequences of data (like text) efficiently. Introduced by Google researchers in 2017 in the paper "Attention Is All You Need", Transformers use an "attention mechanism" to weigh which parts of the input are most relevant when generating each output token. This lets the model capture long-range relationships in text, such as recognizing that "Alice" and "she" refer to the same person even when they appear far apart.
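At its core, the attention mechanism is a weighted average: each token scores every other token for relevance, turns those scores into weights, and mixes the inputs accordingly. Here is a minimal sketch of scaled dot-product attention using NumPy; the array shapes and random toy data are illustrative, not from any real model.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: each query scores every key,
    and the scores (after softmax) weight the values."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)    # similarity of each query to each key
    weights = softmax(scores)          # each row is a distribution summing to 1
    return weights @ V, weights

# Toy "sentence" of 3 tokens, each a 4-dimensional embedding
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))

# Self-attention: queries, keys, and values all come from the same sequence
out, w = attention(x, x, x)
print(w)  # row i shows how much token i attends to each token in the sequence
```

In a real Transformer, Q, K, and V are produced by learned linear projections of the input, and many such attention "heads" run in parallel, but the weighting logic is the same.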

Transformers are why modern AI is so good at language. Every major language model (GPT, Claude, Gemini, Llama) is built on the Transformer architecture; the "T" in GPT stands for Transformer. Transformers are efficient to train, parallel-friendly, and scale well to billions of parameters.

Example

When ChatGPT processes "Alice went to the store and she bought milk", the Transformer's attention mechanism links "she" back to "Alice" even though five words separate them.