Revolutionizing Autonomous Vehicle Development with NVIDIA Cosmos Predict-2
Autonomous vehicles (AVs) are shifting from collections of isolated, task-specific models to unified, end-to-end architectures that map sensor data directly to driving actions. This shift is sharply increasing the demand for high-quality, physics-based sensor data for training, testing, and validating AV systems.
To support next-generation AV architectures, NVIDIA has introduced NVIDIA Cosmos Predict-2, a world foundation model with improved future world state prediction, enabling the generation of high-quality synthetic data essential for AV development. Together with new developer tools, Cosmos Predict-2 gives developers a key building block for building and validating autonomous vehicle systems.
Enhancing AV Training with Cosmos Predict-2
Building on its predecessor, Cosmos Predict-1, which pioneered the prediction of future world states via text, images, and video prompts, Cosmos Predict-2 boasts superior contextual understanding. This advancement reduces hallucinations and enriches details in generated videos, making the synthetic data more applicable and realistic for AV training.
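Conceptually, a world foundation model like Cosmos Predict-2 takes a multimodal prompt (text plus optional conditioning image or video frames) and returns predicted future frames. The sketch below is purely illustrative of that interface shape: the `Prompt` dataclass and `predict_future` function are hypothetical stand-ins, not the actual Cosmos API.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Prompt:
    """Multimodal prompt: text plus optional conditioning frames."""
    text: str
    frames: List[list] = field(default_factory=list)  # prior video frames, if any

def predict_future(prompt: Prompt, num_frames: int = 8) -> List[list]:
    """Placeholder world-state predictor returning `num_frames` future frames.

    A real model would run learned inference here; this stub simply echoes
    the last conditioning frame (or an empty frame) to show that the output
    is a sequence of frames continuing the prompt.
    """
    last = prompt.frames[-1] if prompt.frames else []
    return [list(last) for _ in range(num_frames)]

prompt = Prompt(text="A car approaches a stop sign at a foggy intersection",
                frames=[[0, 0, 0]])
future = predict_future(prompt, num_frames=4)
```

The key idea is that text, image, and video conditioning all feed a single model, which is what lets one prompt steer details like the stop sign behavior described above.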
Cosmos Predict-2 enhances text adherence and common sense for a stop sign at the intersection.
Utilizing state-of-the-art optimization techniques, Cosmos Predict-2 accelerates synthetic data generation on NVIDIA GB200 NVL72 systems and NVIDIA DGX Cloud. This speed-up is pivotal for developers who require vast amounts of data for training robust AV systems.
Unlocking New Training Data Sources with Post-Training Cosmos
By post-training Cosmos models on AV data, developers can now generate videos that accurately mirror existing physical environments and vehicle trajectories. A particularly notable feature is the capability to transform single-view dashcam footage into multi-view videos. This innovation opens up new avenues for AV training, particularly by utilizing readily available dashcam data to create rich, multi-camera datasets. Such datasets are instrumental in scenarios where camera data might be missing due to broken or occluded sensors.
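The single-view-to-multi-view workflow can be pictured as expanding one camera stream into a rig of target camera poses. In the sketch below, the rig layout and the `expand_to_rig` helper are hypothetical illustrations of the data flow, not Cosmos APIs; a post-trained model would synthesize each target view, while this stub only tags frames to show the resulting dataset layout.

```python
from typing import Dict, List

# An illustrative surround rig: camera name -> yaw offset (degrees) from the
# front camera. These poses are examples, not an NVIDIA-specified rig.
TARGET_RIG: Dict[str, float] = {
    "front": 0.0,
    "front_left": -60.0,
    "front_right": 60.0,
    "rear": 180.0,
}

def expand_to_rig(front_frames: List[str],
                  rig: Dict[str, float]) -> Dict[str, List[str]]:
    """Stub for single-view -> multi-view expansion.

    Returns one frame sequence per target camera; a real pipeline would
    generate novel views here instead of tagging the input frames.
    """
    return {cam: [f"{cam}:{frame}" for frame in front_frames] for cam in rig}

views = expand_to_rig(["t0.jpg", "t1.jpg"], TARGET_RIG)
```

This layout also makes the sensor-failure use case concrete: a missing or occluded camera's slot in the rig can be filled with a generated view.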
Post-trained Cosmos models generate multi-view videos, significantly augmenting AV training datasets.
The NVIDIA Research team post-trained these models on an impressive 20,000 hours of real-world driving footage. The outcome? Enhanced model performance in challenging weather conditions like fog and rain, critical for ensuring safety and reliability in autonomous driving.
Boosting the AV Ecosystem with Cosmos Predict
Numerous AV companies have already integrated Cosmos Predict into their development frameworks, accelerating the path to vehicle commercialization. For instance, Plus, a leader in autonomous trucking, uses the NVIDIA DRIVE AGX platform together with Cosmos Predict to generate realistic synthetic driving scenarios, significantly speeding the development and deployment of its autonomous solutions.
Another innovator, Oxa, leverages Cosmos Predict to facilitate the creation of high-fidelity multi-camera videos, ensuring temporal consistency across varied scenarios.
Empowering AV Developers with New NVIDIA Models and NIM Microservices
In addition to the advancements introduced with Cosmos Predict-2, NVIDIA has also launched Cosmos Transfer, an NVIDIA NIM microservice preview designed for streamlined deployment on data center GPUs. This microservice augments datasets and produces photorealistic videos from structured inputs or ground-truth simulations sourced from the NVIDIA Omniverse platform. Alongside it, the NuRec Fixer model inpaints gaps in reconstructed AV data, further increasing the fidelity of training datasets.
NuRec Fixer fills in gaps in driving data to enhance neural reconstructions.
The integration of Cosmos Transfer and NVIDIA NuRec into CARLA, a leading open-source AV simulator, has expanded its capabilities significantly. CARLA’s user base—over 150,000 AV developers—can now render synthetic simulation scenes and viewpoints with remarkable fidelity, crafting endless variations of lighting, weather, and terrain with straightforward prompts.
Developers can explore this robust data generation pipeline using open-source data from the NVIDIA Physical AI Dataset. This latest release includes a staggering 40,000 clips generated through Cosmos Predict, along with sample reconstructed scenes for neural rendering. With CARLA’s updated version, developers enjoy the ability to author new trajectories, reposition sensors, and simulate realistic drives, all of which are invaluable for developing versatile AV systems.
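The "endless variations" workflow above boils down to sweeping a small set of scene parameters. The sketch below enumerates lighting, weather, and terrain combinations as natural-language prompt strings; the specific parameter values are illustrative assumptions, and in practice these variations would drive Cosmos Transfer or CARLA's weather settings rather than remain plain strings.

```python
from itertools import product

# Illustrative scene parameters; a real sweep would use the simulator's
# actual presets and value ranges.
lighting = ["dawn", "noon", "dusk", "night"]
weather = ["clear", "rain", "fog"]
terrain = ["urban", "highway"]

def scenario_prompts():
    """Yield one natural-language prompt per scene variation."""
    for light, wx, ter in product(lighting, weather, terrain):
        yield f"{ter} drive at {light} in {wx} conditions"

prompts = list(scenario_prompts())
# 4 lighting x 3 weather x 2 terrain = 24 distinct scene variations
```

Even this tiny grid yields 24 variations of one base scene; adding a handful more parameters (traffic density, sensor pose, time of year) multiplies coverage combinatorially, which is why prompt-driven variation is so effective for training data diversity.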
Advancing End-to-End AV Safety with NVIDIA Halos
In line with its commitment to operational safety, NVIDIA earlier announced NVIDIA Halos, a comprehensive safety platform designed to integrate the full automotive hardware and software safety stack with cutting-edge AI research focused on autonomous driving safety.
Esteemed automotive leaders such as Bosch, Easyrain, and Nuro have joined the NVIDIA Halos AI Systems Inspection Lab. This initiative aims to verify the safety of their products when integrated with NVIDIA technologies, thereby enhancing the overall safety of AV solutions. Existing members include major players like Continental, Ficosa, OMNIVISION, onsemi, and Sony Semiconductor Solutions, illustrating a united effort towards improving AV safety standards.
Catch the highlights from the NVIDIA GTC Paris keynote given by Jensen Huang, NVIDIA founder and CEO, at VivaTech, and explore additional GTC Paris sessions for further insights.