NVIDIA launches Cosmos 3, an open frontier omnimodel for physical AI — with a creative-tools coalition behind it
TL;DR
NVIDIA launched Cosmos 3 on May 31, 2026 at GTC Taipei — an open frontier omnimodel for physical AI built on a mixture-of-transformers architecture that generates text, images, video, ambient sound, and action trajectories in a single model. It ships with open weights on Hugging Face, is deployable via NVIDIA NIM microservices, and is backed by the Cosmos Coalition — a founding group that includes Black Forest Labs, LTX, and Runway alongside robotics firms.
- May 31, 2026: Cosmos 3 launched at NVIDIA GTC Taipei, with open weights available the same day on Hugging Face
- Super / Nano / Edge: three variants. Super for highest-accuracy robotics/AV post-training, Nano for fast inference, and Edge (coming soon) for real-time edge deployment
- 5 modalities: Cosmos 3 generates text, images, video, ambient sound, and action trajectories in a single model
- Cosmos Coalition: founding partners include Black Forest Labs, LTX, Runway, Agile Robots, Generalist, and Skild AI
NVIDIA released Cosmos 3 on May 31, 2026 at GTC Taipei. It is an open frontier omnimodel for physical AI, built on a mixture-of-transformers architecture that combines vision reasoning and multimodal generation in one model. Cosmos 3 generates text, images, video, ambient sound, and action trajectories together, which NVIDIA calls "the world's first fully open omnimodel." The weights are on Hugging Face, and the model can be deployed as NVIDIA NIM microservices or accessed via build.nvidia.com. Frontier video generation has largely stayed proprietary, so the open licensing is the structural signal here: NVIDIA is positioning Cosmos 3 as infrastructure, not as a gated API product.
Three variants, two available now. Cosmos 3 Super is the highest-accuracy tier, optimised for post-training robotics and autonomous vehicle (AV) models, where physics fidelity matters most. Cosmos 3 Nano is the fast tier: high-quality video and action reasoning in fractions of a second, designed for real-time use cases. Cosmos 3 Edge, which targets real-time, on-device inference in robots and autonomous systems, is coming soon but has no confirmed date.
What the model actually does. Cosmos 3 operates across five modalities at once: text, images, video, ambient sound, and action. The action outputs are numerical (joint angles, gripper positions, trajectory points), which makes the model usable as the reasoning and generation layer for robotic policy development, not just for video creation. This is what separates Cosmos 3 from commercial video tools: it can generate a video of a manipulation task and, in the same pass, output the action data a physical robot would need to execute that motion. NVIDIA reports that Cosmos 3 ranks first among open models on Physics-IQ, PAI-Bench, and R-Bench for world generation, first on RoboLab and RoboArena for action policy, and first on VANTAGE-Bench and TAR for vision understanding. Jensen Huang's framing at the launch: "The Cosmos 3 family of open, frontier omnimodels gives developers a generational leap in ability to build robots, autonomous vehicles and vision AI."
The Cosmos Coalition, and why creative-AI operators should pay attention. Alongside the model launch, NVIDIA announced the Cosmos Coalition, a founding group of companies building on Cosmos 3. The robotics members (Agile Robots, Generalist, and Skild AI) are expected. The media and creative-AI members matter more for ToolNav operators: Black Forest Labs (the team behind the FLUX image generation models), LTX (AI video generation), and Runway (which shipped its own GWM-1 world model in December 2025 and now carries a $5.3B valuation). All three make tools that overlap directly with AI content creation workflows. The likely path is that these companies build world-model-powered features on top of Cosmos 3: more physically accurate generated video, better motion coherence, and synthetic scene generation for training. For operators using FLUX- or LTX-based tools today, Cosmos 3 is upstream infrastructure that may power the next generation of those capabilities.
What Cosmos 3 is not, yet. Cosmos 3 is not a turnkey creative tool; there is no Cosmos 3 interface comparable to Midjourney or InVideo. Accessing it requires Hugging Face, a NIM microservice, or a supported infrastructure provider (Baseten, CoreWeave, Microsoft Azure, Nebius, Deep Infra, Classmethod). Its creative-AI impact will arrive through Coalition members' products, not through a direct consumer experience. NVIDIA's benchmarks compare open models only, so they do not establish how Cosmos 3 stacks up against Sora, Veo, or Kling at the product level.
Where it sits in the landscape. For AI video and image generation, Cosmos 3's significance is indirect but real. The tools creators pay for are built on foundation models like this one, and NVIDIA releasing open weights at the world-model layer sets a public capability baseline the rest of the category will be measured against. The Coalition is the mechanism through which Cosmos 3 is most likely to show up in the products creators actually use.
Why It Matters
Open world models are arriving in the creative stack, and NVIDIA just set the baseline. Cosmos 3 is not a consumer product, but it is the kind of infrastructure layer that could power the next generation of tools that are. The Coalition's creative-AI members (Black Forest Labs / FLUX, LTX, Runway) are likely to build Cosmos 3's physics-accurate generation into their products over the next product cycle. For operators using AI video and image tools today, this upstream development will shape how good those tools get. Because the weights are open, the entire developer community can build on Cosmos 3, not just NVIDIA's partners. The honest caveat: Cosmos 3 is robotics and AV first, and its creative-AI applications will arrive through the Coalition and third-party builders, not from NVIDIA directly.
Who's Affected
- AI video and image tool users: not directly affected today, but Runway, Black Forest Labs (FLUX), and LTX are in the Cosmos Coalition. Their products may gain physics-accurate generation capabilities built on Cosmos 3 over the next product cycle. Watch Runway's next model announcement and FLUX updates for signs of world-model integration.
- Developers building on open models: Cosmos 3's open weights are available on Hugging Face now. If you already build pipelines on FLUX or other Black Forest Labs models, Cosmos 3's video and action generation sits in the same open-model ecosystem and is worth evaluating.
- AI video service sellers: Cosmos 3 does not change your tool costs or workflow today. The relevant window is the next product cycle, when Coalition members start shipping Cosmos 3-powered features. For current tool choices, see our AI video tools roundup.
- Operators tracking the physical-AI trend: Cosmos 3 was one of two major physical-AI announcements on May 31, arriving the same day as OpenAI's rebuild of its robotics division. Multiple labs converging on physical AI is a directional signal worth tracking, even if the consumer product implications are still one to two product cycles out.
What To Do Now
1. No immediate action required on Cosmos 3 itself. It is not a consumer product; it is open-weights infrastructure for developers and researchers. Your current video and image tools are unaffected today.
2. Watch the Coalition members' product updates. Cosmos 3's creative-AI relevance will land through Runway, Black Forest Labs (FLUX), and LTX products, not through a Cosmos 3 interface. When those tools ship next-generation features citing improved physics or world-model capabilities, Cosmos 3 may well be the infrastructure behind them.
3. If you build on open models, evaluate Cosmos 3 for synthetic video and scene generation. The weights are on Hugging Face, and the model is also available as NIM microservices, via build.nvidia.com, and through infrastructure providers including CoreWeave, Microsoft Azure, Baseten, Nebius, Deep Infra, and Classmethod.
4. File it as a 12–18 month upstream signal. For content creator tool decisions this quarter, Cosmos 3 does not change your stack. For decisions about where the video AI category is heading structurally (physics accuracy, world-consistent generation, open licensing), it is the most important foundation model release of the week.