· By the ToolNav Team · 6 min read NVIDIA Open Source AI Models AI Agents AI Coding Developer Tools

NVIDIA's Nemotron 3 Ultra is now the strongest open US model — 550B parameters, built for agentic coding workflows

TL;DR

At Computex 2026 on June 1, Jensen Huang announced Nemotron 3 Ultra — a 550-billion-parameter mixture-of-experts open model designed specifically for agentic coding, research, and enterprise workflows. It ranks first among US open-weights models on Artificial Analysis's Intelligence Index with a score of 48, though it trails China's Kimi K2.6 (54) and closed models like Claude Opus 4.8 (61). Weights ship June 4 on Hugging Face, OpenRouter, and NVIDIA NIM.

June 1, 2026

Announced by Jensen Huang at Computex 2026 in Taipei; weights available June 4

550B / 55B active

Total parameters / active parameters per token — mixture-of-experts, ~90% sparse

48

Artificial Analysis Intelligence Index score — #1 US open-weights model; trails Kimi K2.6 (54) and Claude Opus 4.8 (61)

300+ tok/s

Inference speed on DeepInfra pre-release endpoint, vs 50–100 tok/s for comparable Chinese open models

NVIDIA announced Nemotron 3 Ultra at Computex 2026 on June 1, as part of Jensen Huang's keynote in Taipei. It is a 550-billion-parameter mixture-of-experts model with approximately 55 billion active parameters per token — meaning roughly 90% of parameters are inactive at inference time, which is the architectural reason for its speed. Weights are available from June 4, 2026 on Hugging Face, ModelScope, OpenRouter, and build.nvidia.com as NVIDIA NIM microservices, with additional availability through cloud partners including DeepInfra.

Where it ranks. Artificial Analysis's independent evaluation — conducted in partnership with NVIDIA — places Nemotron 3 Ultra at 48 on its Intelligence Index, making it the highest-scoring US open-weights model on that leaderboard. For context: Gemma 4 31B scores 39, Nemotron 3 Super scores 36, and gpt-oss-120b scores 33. The honest caveat: it does not top the global open-model chart. China's Kimi K2.6 scores 54 on the same index, and the closed Claude Opus 4.8 scores 61. Nemotron 3 Ultra is the best open US model — not the best open model overall.

Speed. Pre-release testing on DeepInfra shows Nemotron 3 Ultra delivering over 300 tokens per second — materially faster than comparable Chinese open models like DeepSeek and Moonshot Kimi at 50–100 tokens per second. NVIDIA also claims 5× faster inference and up to 30% lower cost for complex agentic tasks versus "open frontier models in its class," though the specific comparison models and methodology behind those figures are not published.

What NVIDIA optimised it for. Nemotron 3 Ultra was post-trained specifically for agentic harnesses rather than general-purpose chat. NVIDIA lists compatibility with Hermes Agent, LangChain Deep Agents, OpenClaw, OpenHands, and OpenCode — the orchestration layers that developers use to run autonomous multi-step coding and research workflows. The practical implication: if you already run agents on one of those harnesses, Nemotron 3 Ultra is a drop-in model swap. Weights ship in BF16 and NVFP4 quantization formats. Pricing is not yet disclosed.

The open-model context. Nemotron 3 Ultra arrives alongside NVIDIA's other Computex launch, Cosmos 3 (announced May 31), which targets physical-AI and video generation. Nemotron 3 Ultra is the complementary language-and-reasoning bet — NVIDIA shipping open weights at both ends of the capability stack. For developers evaluating open models for agentic pipelines, it raises the ceiling for what is achievable without paying closed-model API rates. Whether the intelligence gap versus Kimi K2.6 matters depends on the task: at 300+ tokens per second and open weights, the economics of running high-volume agent loops favour Nemotron 3 Ultra even if raw intelligence benchmarks are not first globally.

Who it is for. Nemotron 3 Ultra is a developer and infrastructure tool, not a consumer product. It competes with other open models in coding-agent pipelines — the same tier as Claude Code (which uses closed Anthropic models) and Cursor (which uses a mix of models). The direct comparison is not product-to-product — it is model-to-model for developers who self-host or route through OpenRouter. If your workflow is built on an IDE like Cursor or Claude Code, nothing changes today. If you build agentic pipelines on OpenRouter or HuggingFace and want a fast, open US alternative to closed frontier models, Nemotron 3 Ultra is now the strongest candidate. See our AI coding tools roundup for the broader category landscape.

Why It Matters

Developers building open agentic pipelines now have a fast, US open-weights model that tops the domestic leaderboard. Nemotron 3 Ultra's 300+ tokens per second and compatibility with LangChain, OpenHands, OpenCode, and Hermes Agent harnesses make it a practical option for high-volume agent loops where closed-model API costs scale uncomfortably. The honest limitation: it is not the global leader — China's Kimi K2.6 still scores higher on the Artificial Analysis index, and closed frontier models like Claude Opus 4.8 outrank it significantly. The value proposition is the combination of open weights, US model provenance, and raw inference speed — not benchmark supremacy.

Who's Affected

  • Developers running agentic coding pipelines on OpenRouter or Hugging Face — Nemotron 3 Ultra is available June 4 on both platforms. If you are already routing through these infra layers, this is worth benchmarking against your current model for high-throughput or cost-sensitive agentic workloads.
  • Teams self-hosting LLMs for agent workflows — compatibility with LangChain Deep Agents, OpenHands, OpenCode, OpenClaw, and Hermes Agent means integration effort should be low if you are already on one of these harnesses.
  • Cursor, Claude Code, and Codex users — no direct change to your IDE experience. These products use their own model routing. The Nemotron 3 Ultra relevance is at the infrastructure layer, not the IDE layer.
  • Operators tracking the open vs. closed model balance — NVIDIA shipping a competitive open model at the top of the US leaderboard continues to widen the practical gap between open and closed pricing. If you have been waiting for open weights to catch up to closed models for agentic use, the gap is narrowing.

What To Do Now

  1. 1. If you run agents on OpenRouter, benchmark this before committing to a closed-model default. Nemotron 3 Ultra is available June 4 on OpenRouter. The 300+ tok/s speed advantage over comparable open models is meaningful for agent loops where latency compounds across many steps.
  2. 2. Do not switch IDE-based coding tools on the basis of this launch. Cursor, Claude Code, and Codex are product decisions; Nemotron 3 Ultra is an infrastructure decision. They are not directly competing in the same purchase decision.
  3. 3. Note the benchmark caveat before recommending it to clients or teams. It is #1 US open model — not #1 open model globally. Kimi K2.6 scores higher. For tasks requiring maximum reasoning depth, the closed frontier models still lead.
  4. 4. Pricing is not yet public. NVIDIA has not disclosed API or NIM pricing for Nemotron 3 Ultra as of the June 1 announcement. Factor this into any cost comparison until confirmed rates are published.

More on this topic — Best AI Coding Tools

Independent Review

Claude Code

Pricing, pros and cons, real-world verdict — no affiliate spin.

Read the Claude Code review

The AI Hustle Playbook Newsletter

Get one practical AI playbook each week.

Tools, workflows, and side-income ideas — curated for people who want to build, not browse forever.

No spam. Unsubscribe anytime. We respect your privacy.