GPU

Graphics Processing Unit — a chip originally built for rendering graphics, now the primary hardware for training and running AI models due to its ability to perform thousands of calculations simultaneously.

A GPU (Graphics Processing Unit) is a specialised processor designed to run thousands of calculations in parallel. Originally built to render graphics in games and video, GPUs turned out to be perfectly suited for the matrix multiplication operations that power neural networks.
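
To make that connection concrete, here is a minimal sketch in Python with NumPy; the shapes are arbitrary illustrative values, not taken from any real model:

```python
import numpy as np

# One neural-network layer is a matrix multiplication plus a bias.
# Shapes are arbitrary illustrative values, not from any real model.
batch, d_in, d_out = 32, 4096, 4096

x = np.random.randn(batch, d_in).astype(np.float32)  # input activations
W = np.random.randn(d_in, d_out).astype(np.float32)  # layer weights
b = np.zeros(d_out, dtype=np.float32)                # bias

y = x @ W + b  # forward pass for the whole batch in one matmul

# Every one of the batch * d_out outputs is an independent dot product,
# which is why this work maps so naturally onto thousands of GPU cores.
print(y.shape)  # (32, 4096)
```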

CPU vs GPU for AI: A CPU (the standard computer chip) executes tasks sequentially; it's fast at one thing at a time. A GPU executes thousands of tasks simultaneously: slower per individual task, but vastly faster in aggregate for AI workloads. Training GPT-3 required roughly 3×10²³ floating-point operations, on the order of hundreds of GPU-years of compute; on CPUs alone it would have been effectively impossible.
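
A rough timing sketch of the same idea, assuming PyTorch and a CUDA-capable GPU (the matrix size is illustrative, and absolute times vary widely by hardware):

```python
import time
import torch

# Same matrix multiplication on CPU and on GPU.
n = 4096
a, b = torch.randn(n, n), torch.randn(n, n)

start = time.perf_counter()
_ = a @ b
cpu_s = time.perf_counter() - start

if torch.cuda.is_available():
    a_gpu, b_gpu = a.cuda(), b.cuda()
    _ = a_gpu @ b_gpu                # warm-up: first CUDA call loads kernels
    torch.cuda.synchronize()
    start = time.perf_counter()
    _ = a_gpu @ b_gpu
    torch.cuda.synchronize()         # GPU calls are async; wait before timing
    gpu_s = time.perf_counter() - start
    print(f"CPU {cpu_s:.3f}s, GPU {gpu_s:.4f}s, ~{cpu_s / gpu_s:.0f}x faster")
else:
    print(f"CPU {cpu_s:.3f}s (no CUDA GPU found)")
```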

Why the AI compute shortage is a GPU shortage: Training and running frontier AI models requires tens of thousands of GPUs in parallel. Nvidia dominates this market with its H100 and A100 data centre chips. Demand from OpenAI, Google, Meta, and Microsoft has vastly outstripped supply — which is why orbital data centre companies like Cowboy Space are raising $275M to put GPUs in orbit.

VRAM determines what you can run locally: GPU video memory (VRAM) sets the maximum model size you can run. At 16-bit precision, a 7B parameter model needs ~14GB of VRAM and a 70B model needs ~140GB; 4-bit quantisation cuts those figures to roughly a quarter. Consumer GPUs (RTX 4090: 24GB VRAM) handle smaller models; frontier models require data-centre hardware.
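
A back-of-envelope estimator matching the figures above; the byte counts per parameter are standard approximations, and the function ignores activation and KV-cache memory, so real usage runs somewhat higher:

```python
def vram_gb(params_billion: float, bytes_per_param: float = 2.0) -> float:
    """Rough VRAM needed just to hold the weights, in GB.

    bytes_per_param: 2.0 for 16-bit weights, 0.5 for 4-bit quantisation.
    Ignores activations and the KV cache, so real usage runs higher.
    """
    return params_billion * bytes_per_param  # billions of params x bytes each

print(vram_gb(7))        # 14.0  -- 7B model at 16-bit
print(vram_gb(70))       # 140.0 -- 70B model at 16-bit
print(vram_gb(70, 0.5))  # 35.0  -- 70B model at 4-bit fits far smaller hardware
```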

Example

A 7-billion-parameter open-source model runs on a consumer laptop at around 50 tokens/second. The same model on an H100 data-centre GPU can serve 3,000+ tokens/second in aggregate across batched requests; the difference is raw GPU parallelism and memory bandwidth.
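
A back-of-envelope sketch of where those throughput numbers come from: single-stream generation is typically memory-bandwidth bound, because producing each new token re-reads every weight once. The bandwidth figures below are approximate public specs, not measurements:

```python
# Single-stream decode ceiling = memory bandwidth / model size in bytes,
# i.e. how many times per second the weights can be re-read.
model_gb = 14.0  # 7B model held in 16-bit weights

bandwidth_gb_s = {
    "consumer laptop (~100 GB/s)": 100,
    "RTX 4090 (~1,000 GB/s)": 1_000,
    "H100 (~3,350 GB/s)": 3_350,
}

for name, bw in bandwidth_gb_s.items():
    print(f"{name}: ~{bw / model_gb:.0f} tokens/s single-stream ceiling")
```

By this estimate a single stream tops out well below 3,000 tokens/second even on an H100; data-centre servers reach aggregate figures that high by batching many requests, so each pass over the weights produces tokens for dozens of users at once. Quantised weights shrink model_gb and raise the ceiling, which is how a laptop can reach ~50 tokens/second.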