Nvidia has launched Nemotron 3 Super, a 120-billion-parameter open-weight model with a 1-million-token context window, designed to power complex agentic AI systems at scale.
The model is now available on build.nvidia.com, Perplexity, OpenRouter, and Hugging Face. Enterprises can also access it through Google Cloud Vertex AI and Oracle Cloud Infrastructure, with support for Amazon Bedrock and Microsoft Azure forthcoming.
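Most of these hosted endpoints expose an OpenAI-compatible chat API, so trying the model takes a few lines of code. Below is a minimal sketch of querying it through OpenRouter; the model identifier nvidia/nemotron-3-super is an assumption, so check the provider's model list for the actual name.

```python
# Minimal sketch: querying Nemotron 3 Super through OpenRouter's
# OpenAI-compatible endpoint. The model ID "nvidia/nemotron-3-super"
# is an assumption; check the provider's model list for the real name.
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_API_KEY",
)

response = client.chat.completions.create(
    model="nvidia/nemotron-3-super",  # hypothetical identifier
    messages=[{"role": "user", "content": "Plan a three-step web research task."}],
)
print(response.choices[0].message.content)
```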
Nemotron 3 Super uses a hybrid latent mixture-of-experts and Mamba-Transformer architecture, which lets it activate four times as many experts during inference at the same cost as its predecessors. The model was trained on synthetic data generated by other frontier reasoning models.
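Nvidia has not published the routing details here, but a toy top-k mixture-of-experts layer illustrates what "activating more experts per token" means: a router scores every expert for each token, and only the k best experts actually run. Everything below is an illustrative sketch, not Nemotron's actual implementation.

```python
# Toy illustration of top-k mixture-of-experts routing (not Nvidia's
# implementation). With a fixed compute budget, a model can afford a
# larger k if each active expert is proportionally cheaper, which is
# the general idea behind running more experts at the same cost.
import torch
import torch.nn as nn

class TopKMoE(nn.Module):
    def __init__(self, dim: int, num_experts: int = 32, k: int = 4):
        super().__init__()
        self.k = k
        self.router = nn.Linear(dim, num_experts)  # scores each expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, dim). Keep only the k highest-scoring experts per token.
        weights, idx = self.router(x).topk(self.k, dim=-1)
        weights = weights.softmax(dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e in idx[:, slot].unique():
                mask = idx[:, slot] == e
                out[mask] += weights[mask, slot, None] * self.experts[e](x[mask])
        return out

moe = TopKMoE(dim=64)
print(moe(torch.randn(8, 64)).shape)  # torch.Size([8, 64])
```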
Nvidia has published over 10 trillion tokens of pre- and post-training data, along with 15 reinforcement learning training environments and evaluation recipes. Previous Nemotron variants have been used by enterprises such as ServiceNow to fine-tune their own models.
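For teams that want to inspect the released data before committing to a fine-tune, it can be sampled without downloading trillions of tokens. A minimal sketch, assuming the datasets are hosted on Hugging Face; the dataset ID nvidia/nemotron-3-post-training is a hypothetical placeholder, not a confirmed repository name:

```python
# Sketch: streaming a slice of the released training data. Assumes the
# datasets live on Hugging Face; the ID "nvidia/nemotron-3-post-training"
# is a hypothetical placeholder, not a confirmed repository name.
from datasets import load_dataset

ds = load_dataset("nvidia/nemotron-3-post-training", split="train", streaming=True)
for example in ds.take(3):  # inspect a few records without a full download
    print(example)
```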
Benchmarks from Artificial Analysis show Nemotron 3 Super scoring 36 on the firm's overall intelligence index, ahead of gpt-oss-120B at 33 but behind Gemini 3.1 Pro and GPT-5.4, both of which score 57. The model generates 478 output tokens per second, making it the fastest in its class.
In comparison, gpt-oss-120B is the second-fastest model in the class at 264 output tokens per second. Nvidia also claims that Nemotron 3 Super delivers 7.5 times the inference throughput of Qwen3.5-122B.
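To put those decode speeds in perspective, a quick back-of-envelope calculation using only the figures reported above:

```python
# Back-of-envelope timing for a 10,000-token generation, using only the
# decode speeds reported above.
nemotron_tps = 478  # Nemotron 3 Super, output tokens per second
gpt_oss_tps = 264   # gpt-oss-120B, output tokens per second

tokens = 10_000
print(f"Nemotron 3 Super: {tokens / nemotron_tps:.1f} s")     # ~20.9 s
print(f"gpt-oss-120B:     {tokens / gpt_oss_tps:.1f} s")      # ~37.9 s
print(f"Relative speed:   {nemotron_tps / gpt_oss_tps:.2f}x") # ~1.81x
```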
Nvidia previously introduced Nemotron 3 Nano, a 30-billion-parameter open-weight model optimized for smaller, targeted tasks, in December 2024. The company has also teased Nemotron 3 Ultra, the largest model in the family at 500 billion parameters, but has not provided a release timeline.