All terms · Model Architecture

Neural Network

A computational system loosely modelled on biological neurons, consisting of layers of mathematical functions that learn patterns from data.

A neural network is the foundational architecture underlying modern AI, including all large language models. It consists of:

- Input layer: receives raw data (token embeddings for language models)
- Hidden layers: transform the data through learnable mathematical operations (weights and biases)
- Output layer: produces predictions, probabilities, or generated tokens
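The three-part structure above can be sketched in a few lines. This is a minimal illustration (using NumPy, with arbitrary random weights; no specific framework or model is implied): an input vector flows through one hidden layer and an output layer.

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.normal(size=3)                         # input layer: a 3-dimensional input
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)  # hidden layer weights and biases
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)  # output layer weights and biases

hidden = np.maximum(0, W1 @ x + b1)            # hidden layer with ReLU activation
logits = W2 @ hidden + b2                      # output layer: raw scores
probs = np.exp(logits) / np.exp(logits).sum()  # softmax turns scores into probabilities

print(probs)  # two probabilities that sum to 1
```

A real language model follows the same shape, only with many more layers and the input/output dimensions set by the vocabulary and embedding size.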

During training, the network adjusts its internal weights to minimise the difference between its outputs and the correct answers. Over billions of training examples, it learns to recognise patterns.
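The weight-adjustment loop described above can be shown at its smallest possible scale: one weight, one training example, squared error as the measure of "difference", and plain gradient descent. All values here are illustrative.

```python
w = 0.0                 # learnable weight, starts arbitrary
x, target = 2.0, 6.0    # one training example: we want w * x to approach target
lr = 0.05               # learning rate: how far to move each step

for _ in range(200):
    output = w * x
    error = output - target
    grad = 2 * error * x   # derivative of (w*x - target)**2 with respect to w
    w -= lr * grad         # nudge the weight downhill on the error surface

print(round(w, 3))  # → 3.0, since 3.0 * 2.0 = 6.0
```

A real model repeats this same idea across billions of weights and examples, with the gradients computed by backpropagation.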

Modern large language models use a specific neural network architecture called the Transformer, which is built around attention mechanisms that allow the model to relate different parts of the input to each other regardless of distance.
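The attention idea can be sketched as scaled dot-product attention (an illustrative toy, not any particular model's code): every position scores every other position, then takes a weighted average of their values, so distance in the sequence is irrelevant.

```python
import numpy as np

rng = np.random.default_rng(1)
seq_len, d = 5, 8                    # toy sequence length and feature dimension
Q = rng.normal(size=(seq_len, d))    # queries: what each position is looking for
K = rng.normal(size=(seq_len, d))    # keys: what each position offers
V = rng.normal(size=(seq_len, d))    # values: the content to mix together

scores = Q @ K.T / np.sqrt(d)        # all-pairs similarity, one row per position
weights = np.exp(scores)
weights /= weights.sum(axis=-1, keepdims=True)  # softmax: each row sums to 1
out = weights @ V                    # each position's output is a weighted mix

print(out.shape)  # (5, 8): one mixed vector per position
```

Note that position 1 can attend to position 5 just as easily as to position 2; nothing in the computation penalises distance.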

Key properties:

- Depth: more layers generally means more representational capacity
- Parameters: the learnable weights; more parameters means larger model capacity
- Activation functions: mathematical functions applied at each neuron that introduce non-linearity
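The last property is easy to demonstrate with a small deterministic example (the matrices here are made up for illustration): without a non-linearity, two stacked linear layers collapse into a single linear layer, so depth adds nothing; inserting a ReLU between them breaks that collapse.

```python
import numpy as np

W1 = np.array([[1.0, -1.0],
               [1.0,  1.0]])
W2 = np.array([[1.0, 1.0]])
x = np.array([1.0, 2.0])

no_activation = W2 @ (W1 @ x)   # two linear layers applied in sequence...
collapsed = (W2 @ W1) @ x       # ...equal one merged linear layer
print(no_activation, collapsed)  # → [2.] [2.]

with_relu = W2 @ np.maximum(0, W1 @ x)  # ReLU between the layers
print(with_relu)                         # → [3.]: no single linear layer gives this
```

This is why every practical network interleaves activation functions with its layers.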

Understanding neural networks is the entry point to understanding why AI models behave as they do — pattern-matching machines that generalise from training data.

Example

GPT-4 is a neural network reported to have hundreds of billions of parameters, trained on trillions of tokens of text. Its ability to answer questions, write code, and reason emerges from patterns learned during that training process.