All terms · Model Architecture

Neural Network

A computational system loosely modelled on biological neurons, consisting of layers of mathematical functions that learn patterns from data.

A neural network is the foundational architecture underlying modern AI, including all large language models. It consists of:

- Input layer: receives raw data (token embeddings for language models)
- Hidden layers: transform the data through learnable mathematical operations (weights and biases)
- Output layer: produces predictions, probabilities, or generated tokens
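The three-part structure above can be sketched in a few lines. This is a minimal illustration (using NumPy, with arbitrary random weights; no specific framework or model is implied): an input vector flows through one hidden layer and an output layer.

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.normal(size=3)                         # input layer: a 3-dimensional input
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)  # hidden layer weights and biases
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)  # output layer weights and biases

hidden = np.maximum(0, W1 @ x + b1)            # hidden layer with ReLU activation
logits = W2 @ hidden + b2                      # output layer: raw scores
probs = np.exp(logits) / np.exp(logits).sum()  # softmax turns scores into probabilities

print(probs)  # two probabilities that sum to 1
```

A real language model follows the same shape, only with many more layers and the input/output dimensions set by the vocabulary and embedding size.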

During training, the network adjusts its internal weights to minimise the difference between its outputs and the correct answers. Over billions of training examples, it learns to recognise patterns.
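The weight-adjustment loop described above can be shown at its smallest possible scale: one weight, one training example, squared error as the measure of "difference", and plain gradient descent. All values here are illustrative.

```python
w = 0.0                 # learnable weight, starts arbitrary
x, target = 2.0, 6.0    # one training example: we want w * x to approach target
lr = 0.05               # learning rate: how far to move each step

for _ in range(200):
    output = w * x
    error = output - target
    grad = 2 * error * x   # derivative of (w*x - target)**2 with respect to w
    w -= lr * grad         # nudge the weight downhill on the error surface

print(round(w, 3))  # → 3.0, since 3.0 * 2.0 = 6.0
```

A real model repeats this same idea across billions of weights and examples, with the gradients computed by backpropagation.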

Modern large language models use a specific neural network architecture called the Transformer, which is built around attention mechanisms that allow the model to relate different parts of the input to each other regardless of distance.
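The attention idea can be sketched as scaled dot-product attention (an illustrative toy, not any particular model's code): every position scores every other position, then takes a weighted average of their values, so distance in the sequence is irrelevant.

```python
import numpy as np

rng = np.random.default_rng(1)
seq_len, d = 5, 8                    # toy sequence length and feature dimension
Q = rng.normal(size=(seq_len, d))    # queries: what each position is looking for
K = rng.normal(size=(seq_len, d))    # keys: what each position offers
V = rng.normal(size=(seq_len, d))    # values: the content to mix together

scores = Q @ K.T / np.sqrt(d)        # all-pairs similarity, one row per position
weights = np.exp(scores)
weights /= weights.sum(axis=-1, keepdims=True)  # softmax: each row sums to 1
out = weights @ V                    # each position's output is a weighted mix

print(out.shape)  # (5, 8): one mixed vector per position
```

Note that position 1 can attend to position 5 just as easily as to position 2; nothing in the computation penalises distance.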

Key properties:

- Depth: more layers generally means more representational capacity
- Parameters: the learnable weights; more parameters means larger model capacity
- Activation functions: mathematical functions applied at each neuron that introduce non-linearity
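The last property is easy to demonstrate with a small deterministic example (the matrices here are made up for illustration): without a non-linearity, two stacked linear layers collapse into a single linear layer, so depth adds nothing; inserting a ReLU between them breaks that collapse.

```python
import numpy as np

W1 = np.array([[1.0, -1.0],
               [1.0,  1.0]])
W2 = np.array([[1.0, 1.0]])
x = np.array([1.0, 2.0])

no_activation = W2 @ (W1 @ x)   # two linear layers applied in sequence...
collapsed = (W2 @ W1) @ x       # ...equal one merged linear layer
print(no_activation, collapsed)  # → [2.] [2.]

with_relu = W2 @ np.maximum(0, W1 @ x)  # ReLU between the layers
print(with_relu)                         # → [3.]: no single linear layer gives this
```

This is why every practical network interleaves activation functions with its layers.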

Understanding neural networks is the entry point to understanding why AI models behave as they do — pattern-matching machines that generalise from training data.

Example

GPT-4 is a neural network reported to have hundreds of billions of parameters, trained on trillions of tokens of text. Its ability to answer questions, write code, and reason emerges from patterns learned during that training process.