NVIDIA's Nemotron 3 Super Technical Report Highlights 25 Trillion Token Synthetic Pretraining

Alexander Doria has drawn attention to a new reference report on synthetic pretraining: the updated Nemotron 3 Super technical report from NVIDIA. Doria, a prominent voice in AI, wrote in a recent tweet, "New reference report for synthetic pretraining (and since it's fully recursive: on a very good model to create synthetic data)." The report documents NVIDIA's latest large language model, Nemotron 3 Super.

The technical report, co-authored by Eric W. Tramel and others, introduces Nemotron 3 Super, a hybrid Mamba-Attention Mixture-of-Experts (MoE) model with 120 billion total parameters and 12 billion active parameters. The model is engineered for agentic reasoning and efficiency, delivering up to 2.2 times higher inference throughput than models such as GPT-OSS-120B. A core part of the recipe is pre-training on 25 trillion tokens, a substantial portion of which is synthetically generated.
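
To make the total-versus-active distinction concrete, the sketch below shows how a top-k MoE router only executes a fraction of the expert weights for each token. The class, field names, and all numbers are illustrative assumptions for explanation, not values from the Nemotron 3 Super report.

```python
from dataclasses import dataclass

# Illustrative sketch only: every value here is an assumption used to explain
# the total-vs-active parameter distinction in a top-k MoE; none of these are
# the configuration published in the Nemotron 3 Super report.
@dataclass
class MoEConfig:
    n_experts: int = 64                  # experts per MoE feed-forward layer (assumed)
    top_k: int = 8                       # experts the router activates per token (assumed)
    expert_params: int = 500_000_000     # parameters per expert, summed over layers (assumed)
    shared_params: int = 8_000_000_000   # always-on parameters: Mamba/attention/embeddings (assumed)

def total_params(cfg: MoEConfig) -> int:
    # Every expert is stored in the checkpoint, so all of them count here.
    return cfg.shared_params + cfg.n_experts * cfg.expert_params

def active_params(cfg: MoEConfig) -> int:
    # Only the top_k routed experts execute for a given token, which is why a
    # model can be very large in total yet relatively cheap per token.
    return cfg.shared_params + cfg.top_k * cfg.expert_params

if __name__ == "__main__":
    cfg = MoEConfig()
    print(f"total parameters:  {total_params(cfg) / 1e9:.0f}B")
    print(f"active per token:  {active_params(cfg) / 1e9:.0f}B")
```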

NVIDIA's strategy heavily leverages synthetic data across various stages of the model's development. During pre-training, specialized synthetic datasets were crucial for enhancing capabilities in areas such as code concepts, algorithms, economics, formal logic, and multiple-choice questions. This approach extends into supervised fine-tuning (SFT) and reinforcement learning (RL), covering complex tasks like software engineering, financial reasoning, and long-context understanding. The model's architecture also integrates LatentMoE for improved accuracy per FLOP and Multi-Token Prediction (MTP) layers for accelerated inference through speculative decoding.
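
The report credits the MTP layers with accelerating inference via speculative decoding, in which a cheap draft proposes several tokens and the full model verifies them in a single pass. Below is a minimal, self-contained sketch of that accept/reject loop; the interfaces (`draft_next`, `verify`) and the toy models are assumptions for illustration, not the report's implementation.

```python
from typing import Callable, List

def speculative_step(
    prefix: List[int],
    draft_next: Callable[[List[int], int], List[int]],
    verify: Callable[[List[int], List[int]], List[int]],
    k: int = 4,
) -> List[int]:
    """One greedy speculative-decoding step: draft k tokens cheaply, then keep
    the longest prefix the full model agrees with, plus its correction."""
    draft = draft_next(prefix, k)        # k cheap draft tokens
    targets = verify(prefix, draft)      # full model's token at each draft position
    accepted: List[int] = []
    for d, t in zip(draft, targets):
        if d == t:
            accepted.append(d)           # draft matches the full model: keep it
        else:
            accepted.append(t)           # first mismatch: take the full model's token, stop
            break
    return prefix + accepted

if __name__ == "__main__":
    # Toy "full model": next token is (last token + 1) mod 10.
    full = lambda ctx: (ctx[-1] + 1) % 10

    def verify(prefix: List[int], draft: List[int]) -> List[int]:
        # Parallel verification: what the full model would emit at each position
        # given the prefix plus the preceding draft tokens.
        out, ctx = [], list(prefix)
        for d in draft:
            out.append(full(ctx))
            ctx.append(d)
        return out

    def draft_next(prefix: List[int], k: int) -> List[int]:
        # Draft head that mimics the full model but makes one mistake.
        toks, ctx = [], list(prefix)
        for i in range(k):
            t = full(ctx) if i != 2 else 0
            toks.append(t)
            ctx.append(t)
        return toks

    print(speculative_step([1, 2, 3], draft_next, verify))  # -> [1, 2, 3, 4, 5, 6]
```

In this greedy variant, every draft token that matches the full model's prediction is committed, so an accurate draft head lets several tokens be emitted per expensive forward pass.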

The Nemotron 3 Super model and its training recipes are released under the NVIDIA Nemotron Open Model License, continuing NVIDIA's push into open model development. This transparency, including the release of model weights and training corpora, aims to foster a broader developer ecosystem and address growing demand for auditable AI, particularly in regulated industries. A context window of up to 1 million tokens further extends the model's utility for long-form documents and complex multi-step reasoning.