Introducing Nemotron 3 Super: A Day 0 Launch with SGLang
We are thrilled to announce that SGLang supports the groundbreaking NVIDIA Nemotron 3 Super on Day 0. This latest addition to the Nemotron 3 family is designed for sophisticated multi-agent interactions, enabling seamless collaboration between agents that plan, reason, and execute tasks together.
What Makes Nemotron 3 Super Stand Out?
Advanced Architecture
The Nemotron 3 Super employs a Mixture of Experts (MoE) structure combined with a hybrid Transformer-Mamba architecture. This design is engineered for efficiency, achieving up to 5x higher throughput than previous models such as Llama Nemotron Super 1.5. Additionally, its Multi-Token Prediction (MTP) capability lets the model propose multiple tokens per forward pass, dramatically speeding up long-form text generation.
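To see why predicting several tokens per step pays off, here is a toy sketch of draft-then-verify acceptance, the mechanism behind MTP-style speedups. This is an illustration only, not SGLang's or Nemotron's actual implementation; `verify` stands in for a single base-model check:

```python
def accept_longest_prefix(draft, verify):
    """Keep the longest prefix of drafted tokens that the verifier agrees with.

    `draft` is a list of tokens proposed in one multi-token step;
    `verify(prefix, tok)` stands in for one base-model check. Every accepted
    token beyond the first is a sequential decoding step saved.
    """
    accepted = []
    for tok in draft:
        if verify(accepted, tok):
            accepted.append(tok)
        else:
            break  # first mismatch invalidates the rest of the draft
    return accepted


# Toy verifier: the "base model" wants to produce exactly this sequence.
target = ["The", "quick", "brown", "fox"]
verify = lambda prefix, tok: len(prefix) < len(target) and target[len(prefix)] == tok

# A draft whose third token is wrong: only the first two tokens are kept,
# but both came out of a single draft step instead of two sequential ones.
print(accept_longest_prefix(["The", "quick", "red", "fox"], verify))  # → ['The', 'quick']
```

In the best case (a fully correct draft) the whole draft is accepted in one verification pass, which is where the long-form generation speedup comes from.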
Unmatched Accuracy
On the Artificial Analysis Intelligence Index, the Nemotron 3 Super boasts leading accuracy metrics within its size category. It achieves up to 2x higher accuracy than its predecessor through its innovative latent MoE feature, which enables the model to utilize four experts for the inference cost of just one.
Optimized Model Specifications
- Parameter Count: 120B total parameters, with only 12B active parameters during each inference run.
- Context Length: Capable of handling contexts up to 1M tokens, providing a broader scope for conversation and workflow management.
- Input/Output: Simple text input with text output, making it user-friendly for various applications.
- Supported Hardware: The model efficiently runs on top-tier GPUs including B200, H100, H200, DGX Spark, and RTX 6000.
Fully Open Model
As demonstrated in our accompanying chart on the Artificial Analysis Openness Index, Nemotron 3 Super sets itself apart with its fully open framework. It offers open weights, datasets, and configuration recipes, allowing developers the freedom to customize, optimize, and deploy as per their needs, ensuring maximum privacy and security.
Installation: Getting Started with SGLang and Nemotron 3 Super
If you’re looking to integrate Nemotron 3 Super into your pipeline, the first step is installing SGLang. For detailed guidance, you can consult our comprehensive getting started cookbook.
Run the following command to install the necessary dependencies:
```bash
pip install 'git+https://github.com/sgl-project/sglang.git#subdirectory=python'
```
After installation, serving the model is straightforward. The example below is tuned for a 4x H200 setup; detailed instructions are available in our cookbooks.
```bash
python3 -m sglang.launch_server \
  --model-path nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16 \
  --host 0.0.0.0 \
  --port 5000 \
  --trust-remote-code \
  --tp 4 \
  --tool-call-parser qwen3_coder \
  --reasoning-parser nemotron_3
```
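Loading a 120B model takes a while after the launch command returns, so it can help to poll the server before sending traffic. A minimal readiness probe against the OpenAI-compatible `/v1/models` endpoint, assuming the host/port flags above (this is a generic HTTP check, not an SGLang-specific API):

```python
import time
import urllib.error
import urllib.request


def wait_for_server(base_url, timeout_s=120, poll_s=2):
    """Poll the OpenAI-compatible /v1/models endpoint until the server answers.

    `base_url` should match the launch flags, e.g. "http://localhost:5000/v1".
    Returns True once the endpoint responds with HTTP 200, False on timeout.
    """
    url = base_url.rstrip("/") + "/models"
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            time.sleep(poll_s)  # server not up yet; retry
    return False
```

Call `wait_for_server("http://localhost:5000/v1")` before the first request to avoid connection errors during model load.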
Once your server is operational, you can begin prompting the model with simple code snippets as shown below:
```python
from openai import OpenAI

SERVED_MODEL_NAME = "nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16"
BASE_URL = "http://localhost:5000/v1"
API_KEY = "EMPTY"

client = OpenAI(base_url=BASE_URL, api_key=API_KEY)

resp = client.chat.completions.create(
    model=SERVED_MODEL_NAME,
    messages=[
        {"role": "system", "content": "You are a helpful AI assistant."},
        {"role": "user", "content": "Give me 3 bullet points about SGLang."},
    ],
    temperature=0.6,
    max_tokens=512,
)
print("Reasoning:", resp.choices[0].message.reasoning_content, "\nContent:", resp.choices[0].message.content)
```
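Since the server is launched with a tool-call parser, the model can also emit OpenAI-style tool calls. A hedged sketch of the local side of that round trip, where `get_weather` is a hypothetical stand-in for your own functions:

```python
import json

# Hypothetical local tool, used only to illustrate the round trip.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # stand-in for a real lookup


# Tool schema in the OpenAI function-calling format.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

REGISTRY = {"get_weather": get_weather}


def dispatch(tool_call):
    """Execute the local function named by a parsed tool call, return its output."""
    name = tool_call["function"]["name"]
    args = json.loads(tool_call["function"]["arguments"])
    return REGISTRY[name](**args)
```

In practice you would pass `tools=TOOLS` to `client.chat.completions.create(...)` and feed each entry of `resp.choices[0].message.tool_calls` through `dispatch` (reading `.function.name` and `.function.arguments` from the SDK objects, or converting them to dicts as above).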
The Ideal Solution for Multi-Agent Workloads
Nemotron 3 Super shines particularly in scenarios requiring multi-agent capabilities and complex reasoning workloads.
Efficiency and Performance
As illustrated in the accompanying chart, the model excels not just in accuracy but also in efficiency, making it an attractive choice for multi-agent systems. The expansive 1M-token context lets agents retain full conversation histories, improving their ability to plan and execute tasks. It is also a natural fit for Retrieval-Augmented Generation (RAG), since large document sets can be ingested in a single pass, minimizing fragmentation and reducing the risk of goal drift across multi-step workflows.
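One concrete way the long context changes RAG: documents can be packed whole instead of chunked. A rough sketch of budget-aware packing, where the characters-to-tokens ratio is a crude heuristic rather than the model's real tokenizer:

```python
def pack_documents(docs, max_tokens=1_000_000, chars_per_token=4):
    """Greedily pack whole documents into one prompt under a token budget.

    The 1M default mirrors the model's context length. Documents are kept
    whole, which is the point of a long context: no chunking, so less
    fragmentation across retrieval boundaries.
    """
    packed, used = [], 0
    for doc in docs:
        cost = len(doc) // chars_per_token + 1  # rough token estimate
        if used + cost > max_tokens:
            break  # whole-document granularity: stop rather than split
        packed.append(doc)
        used += cost
    return "\n\n".join(packed)
```

With a real deployment you would estimate cost with the model's tokenizer instead of a character ratio, but the whole-document packing strategy is the same.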
Diverse Applications
The capabilities of Nemotron 3 Super extend across a range of applications—from code generation and debugging to research summarization, alert triage, and document analysis. Its design enables users to orchestrate multiple agents efficiently within a single node, making it a versatile tool in any developer’s arsenal.
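At its simplest, single-node multi-agent orchestration is just several roles sharing one served model. An illustrative skeleton (not an SGLang API), where `llm(system_prompt, user_msg) -> str` is any chat-completion callable, such as a wrapper around the OpenAI client shown earlier:

```python
def plan_and_execute(task, llm):
    """Minimal planner/executor loop over one model endpoint.

    A planner agent drafts a step list, then an executor agent completes each
    step. Both roles share the same served model, differing only in their
    system prompts.
    """
    plan = llm("You are a planner. Reply with one step per line.", task)
    steps = [s.strip() for s in plan.splitlines() if s.strip()]
    results = [llm("You are an executor. Complete the step.", step) for step in steps]
    return steps, results
```

Richer setups add tool use, shared memory, or parallel executors, but they follow this same shape: role-specific prompts multiplexed onto a single Nemotron 3 Super endpoint.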
Get Started Today!
With Nemotron 3 Super, developers are equipped to build scalable, cost-effective multi-agent AI systems without sacrificing accuracy. Its open-source framework gives you the flexibility to tailor your deployment, whether on local infrastructure or cloud environments.
Eager to revolutionize your multi-agent AI projects? Dive into the potential of Nemotron 3 Super today!
Acknowledgments
We extend our heartfelt thanks to everyone who contributed to implementing Nemotron 3 Super into SGLang. Special thanks go to the NVIDIA team—Nirmal Kumar Juluru, Anusha Pant, Max Xu, Daniel Afrimi, Shahar Mor, Roi Koren, and Ann Guan—along with the SGLang team and community members Baizhou Zhang, Jiajun Li, Ke Bao, Lingyan Hao, and Mingyi Lu for their invaluable efforts.
Inspired by: Source

