Text-to-Image
AI generation of images from a text description, where the model interprets the prompt and synthesises a visual output.
Text-to-image models take a written description (called a prompt) and generate an image that matches it. These models are trained on billions of image-caption pairs and learn to associate visual patterns with language.
How it works: Most modern text-to-image models use a technique called diffusion, which starts from random noise and progressively refines it toward an image that matches the prompt. A guidance parameter controls how closely the output follows the prompt versus how freely the model generates.
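As a rough illustration, the loop below sketches how that refinement works with classifier-free guidance, the common mechanism behind guidance parameters. This is a toy sketch only: the `denoise` function stands in for a trained noise-prediction network, and the tensor shapes, step count, and update rule are placeholders rather than a real sampler.

```python
# Toy sketch of a diffusion sampling loop with classifier-free guidance.
# `denoise` is a stand-in for a trained noise-prediction network; shapes,
# step count, and the update rule are illustrative, not a real scheduler.
import torch

def denoise(x, t, cond):
    """Placeholder for a trained model that predicts the noise in x."""
    return torch.zeros_like(x)

x = torch.randn(1, 3, 64, 64)     # start from pure random noise
guidance_scale = 7.5              # higher = follow the prompt more closely

for t in reversed(range(50)):     # progressively refine over 50 steps
    eps_uncond = denoise(x, t, cond=None)     # "generate freely"
    eps_cond = denoise(x, t, cond="prompt")   # "match the prompt"
    # Guidance pushes the prediction toward the prompt-conditioned direction.
    eps = eps_uncond + guidance_scale * (eps_cond - eps_uncond)
    x = x - 0.02 * eps            # simplified denoising update
```

A low guidance scale lets the two predictions stay close (freer, more varied images); a high one amplifies the prompt-conditioned direction, trading diversity for prompt adherence.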
Key models in 2026: Midjourney, DALL-E 3 (used in ChatGPT), Stable Diffusion (open source), Adobe Firefly, and Flux dominate the field.
Prompt structure matters: Text-to-image prompts typically describe the subject, style, lighting, composition, and quality modifiers. "A golden retriever in a sunlit meadow, oil painting, warm tones, high detail" produces a very different result from "golden retriever photo".
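One lightweight way to keep those elements explicit is to assemble the prompt from named parts. The helper below is a hypothetical convention for illustration, not any tool's API; the field names simply mirror the categories listed above.

```python
# Hypothetical helper that assembles a prompt from the common elements
# above; the field names and comma-joined format are conventions, not
# a formal API of any model.
def build_prompt(subject: str, style: str = "", lighting: str = "",
                 composition: str = "", quality: str = "") -> str:
    parts = [subject, style, lighting, composition, quality]
    return ", ".join(p for p in parts if p)

prompt = build_prompt(
    subject="a golden retriever in a sunlit meadow",
    style="oil painting, warm tones",
    lighting="soft morning light",
    composition="centred, shallow depth of field",
    quality="high detail",
)
# -> "a golden retriever in a sunlit meadow, oil painting, warm tones, ..."
```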
Commercial use: Outputs from Midjourney (on paid plans), Adobe Firefly, and DALL-E 3 are generally cleared for commercial use under each provider's terms. Stable Diffusion outputs depend on the licence of the specific model weights used, so check the licence before shipping.
Example
You type: "A flat-design icon of a robot reading a book, blue and white colour palette, minimal, suitable for a tech startup logo." The model outputs a vector-style illustration in seconds.
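To reproduce something like this locally, the snippet below shows one way to do it with Hugging Face's diffusers library against an open Stable Diffusion checkpoint. It assumes `pip install diffusers transformers torch` and a GPU; the checkpoint ID and parameter values are illustrative choices, not the only ones that work.

```python
# One way to run the example prompt locally with the diffusers library.
# Assumes `pip install diffusers transformers torch`; the checkpoint and
# parameter values here are illustrative choices.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1",
    torch_dtype=torch.float16,
).to("cuda")  # use "cpu" without a GPU (much slower)

image = pipe(
    "A flat-design icon of a robot reading a book, blue and white "
    "colour palette, minimal, suitable for a tech startup logo",
    num_inference_steps=30,   # denoising steps: quality vs. speed trade-off
    guidance_scale=7.5,       # how strongly to follow the prompt
    negative_prompt="photo, realistic, cluttered",  # things to steer away from
).images[0]
image.save("robot_icon.png")
```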
Related terms
Inference
The process of a trained model generating a response to an input. When you chat with ChatGPT, that's inference.
Open Source
Software or AI models whose underlying code or weights are publicly available for anyone to inspect, modify, and use.
Parameters
The learnable numerical weights inside a neural network — often cited as billions ("7B", "70B") as a rough proxy for model size and capability.