We optimized NVIDIA Cosmos World Model based Hazard Predictor increasing efficiency by the factor of 1800 compared to Base NVIDIA Cosmos.
Sentinel AI is a future-aware safety system built on top of NVIDIA Cosmos that detects unsafe situations before they occur, rather than reacting after the fact.
Most industrial safety systems today rely on vision or vision-language models that classify the current frame as safe or unsafe. By the time a hazard is detected, the unsafe condition often already exists.
ForeSight Safety takes a fundamentally different approach:
we use predictive world modeling to reason about future states of the environment, extract future-aware latent representations, and classify risk ahead of time — enabling proactive mitigation.
-
Baseline industry approach
Vision (or Vision-Language) models classify the current frame as safe or unsafe. -
Our approach
Use NVIDIA Cosmos as a predictive world model to encode future dynamics into compact latent representations and perform early hazard classification — before a collision, near-miss, or unsafe interaction happens.
This shifts safety from:
reactive perception → predictive prevention
Sentinel AI is intentionally designed as a systems-level optimization, not a pixel-generation demo.
We remove the slow diffusion / video generation head and operate purely in representation space:
- Images or short video clips are encoded using Cosmos’ tokenizer / VAE
encode() - No future video synthesis
- No pixel-level decoding
This preserves predictive signal while dramatically reducing latency.
Encoded representations are:
- Pooled into compact vectors
- Passed to classical ML classifiers (Logistic Regression, SVM, MLP, XGBoost)
- Cached and reused across runs
This enables sub-minute inference on edge-class GPUs.
Instead of generating future frames, we retain temporal structure by:
- Sampling short input snippets (e.g., last 3–5 seconds at low FPS)
- Aggregating latent features across time
- Learning risk trajectories directly in embedding space
video → latent embeddings → classifier → (risk score + confidence)
- Supports model saving and reuse
- Designed for real-time or near-real-time deployment
- Optimized for actionable safety decisions, not visual output
Sentinel AI makes an explicit engineering tradeoff:
- Less fidelity
- Much lower latency
- More actionable output
Instead of spending compute on generating pixels, we spend it on predictive risk scoring — the part that actually triggers mitigation (alerts, slowdowns, rerouting, human intervention).
This design choice turns world models from research artifacts into deployable safety systems.
- Predicts hazards before they happen
- Reduces near-miss incidents and injuries
- Enables safer human–robot interaction
- Generalizes beyond warehouses to factories, construction sites, and autonomous environments
ForeSight Safety demonstrates how **world models can be adapted for real-world, safety-critical decision-making**.
- Reads like a YC technical founder README
- Respects NVIDIA Cosmos without competing with it
- Clearly explains why you deviated architecturally
- Signals long-term product thinking, not a hack
Product Website | 🤗 Hugging Face | Paper | Paper Website | Cosmos Cookbook
NVIDIA Cosmos™ is a platform purpose-built for physical AI, featuring state-of-the-art generative world foundation models (WFMs), robust guardrails, and an accelerated data processing and curation pipeline. Designed specifically for real-world systems, Cosmos enables developers to rapidly advance physical AI applications such as autonomous vehicles (AVs), robots, and video analytics AI agents.Cosmos World Foundation Models come in three model types which can all be customized in post-training: cosmos-predict, cosmos-transfer, and cosmos-reason.
- [December 19, 2025] Released Cosmos-Predict2.5-2B Diffusers support via Hugging Face, Cosmos-Predict2.5-2B Text2World distilled checkpoint on Hugging Face and Distillation guide.
- [December 5, 2025] Released Cosmos-Predict2.5-14B base models, inference and post training for DreamGen. Also added the Cosmos-Predict2.5B robot/multiview-agibot model, and inference.
- [November 25, 2025] Added Blackwell + ARM inference support, along with fixes for the help menu and CLI overrides, improved guardrail offloading, and LFS enablement for large assets.
- [November 11, 2025] Refactored the Cosmos-Predict2.5-2B Auto/Multiview code, updated the Auto/Multiview checkpoints in Hugging Face, and added inference example notebooks under examples/notebook/ to make testing and onboarding easier.
- [November 8, 2025] Added a new pedagogical README in docs/ detailing the Rectified Flow formulation and its integration with the UniPC solver.
- [November 7, 2025] We released support for DMD2 distillation for model compression, autoregressive sliding window generation mode for generating longer videos, and a new multiview cross-attention module. We improved inference examples and documentation, upgraded dependencies to improve support for Blackwell, and made various infrastructure improvements.
- [October 28, 2025] We added Cosmos Cookbook, a collection of step-by-step recipes and post-training scripts to quickly build, customize, and deploy NVIDIA’s Cosmos world foundation models for robotics and autonomous systems.
- [October 28, 2025] We fixed action-conditioned inference bug, improved LoRA post-training and unified across text2world, image2world, video2world, sped up tokenization with CP + torch.compile for Transfer2, updated guardrails, added multi-storage support, and introduced the cosmos-oss package.
- [October 21, 2025] We added LoRA (Low-Rank Adaptation) post-training for both Video2World and Text2World, and gr00t-dreams dataset for post-training. Also, updated Docker base image version, and Gradio related documentation.
- [October 14, 2025] We released the Cosmos-Predict2.5 robot/action-cond: Inference Guide and Post-Training Guide. Also released Auto Multview Post-Training.
- [October 6, 2025] We released Cosmos-Predict2.5 and Cosmos-Transfer2.5 - the next generation of our world simulation models!
We introduce Cosmos-Predict2.5, the latest version of the Cosmos World Foundation Models (WFMs) family, specialized for simulating and predicting the future state of the world in the form of video. Cosmos-Predict2.5 is a flow based model that unifies Text2World, Image2World, and Video2World into a single model and utilizes Cosmos-Reason1, a Physical AI reasoning vision language model (VLM), as the text encoder. Cosmos-Predict2.5 significantly improves upon Cosmos-Predict1 in both quality and prompt alignment.
Input prompt
A nighttime city bus terminal gradually shifts from stillness to subtle movement. At first, multiple double-decker buses are parked under the glow of overhead lights, with a central bus labeled '87D' facing forward and stationary. As the video progresses, the bus in the middle moves ahead slowly, its headlights brightening the surrounding area and casting reflections onto adjacent vehicles. The motion creates space in the lineup, signaling activity within the otherwise quiet station. It then comes to a smooth stop, resuming its position in line. Overhead signage in Chinese characters remains illuminated, enhancing the vibrant, urban night scene.| Input image | Output video |
|---|---|
![]() |
bus_terminal.mp4 |
Input prompt
A robotic arm, primarily white with black joints and cables, is shown in a clean, modern indoor setting with a white tabletop. The arm, equipped with a gripper holding a small, light green pitcher, is positioned above a clear glass containing a reddish-brown liquid and a spoon. The robotic arm is in the process of pouring a transparent liquid into the glass. To the left of the pitcher, there is an opened jar with a similar reddish-brown substance visible through its transparent body. In the background, a vase with white flowers and a brown couch are partially visible, adding to the contemporary ambiance. The lighting is bright, casting soft shadows on the table. The robotic arm's movements are smooth and controlled, demonstrating precision in its task. As the video progresses, the robotic arm completes the pour, leaving the glass half-filled with the reddish-brown liquid. The jar remains untouched throughout the sequence, and the spoon inside the glass remains stationary. The other robotic arm on the right side also stays stationary throughout the video. The final frame captures the robotic arm with the pitcher finishing the pour, with the glass now filled to a higher level, while the pitcher is slightly tilted but still held securely by the gripper.| Input Video | Output Video |
|---|---|
robot_pouring.mp4 |
robot_pouring.mp4 |
Our world simulation models, Cosmos-Predict's fundamental capability is predicting future world states in video form supporting multimodal inputs. We have open sourced both pre-trained foundation models as well as post-trained models accelerating multiple domains. Please check back as we continue to add more specialized models and capabilities to the Predict family!
Cosmos-Predict2.5: Base 2B checkpoints and 14B checkpoints, trained from the ground up for Physical AI and robotics.
Cosmos-Predict2.5/auto/multiview: Specialized checkpoints, post-trained for Autonomous Vehicle applications.
| Model Name | Capability | Input |
|---|---|---|
| Cosmos-Predict2.5 base | ||
| Cosmos-Predict2.5-2B/pre-trained | pre-trained base | text + image or video |
| Cosmos-Predict2.5-2B/post-trained | post-trained base | text + image or video |
| Cosmos-Predict2.5-2B/distilled | distilled base | text |
| Cosmos-Predict2.5-14B/pre-trained | pre-trained base | text + image or video |
| Cosmos-Predict2.5-14B/post-trained | post-trained base | text + image or video |
| Cosmos-Predict2.5 auto | ||
| Cosmos-Predict2.5-2B/auto/multiview | driving, 7-camera view | text + image or video |
| Cosmos-Predict2.5-2B robot | ||
| Cosmos-Predict2.5-2B/robot/action-cond | robotic, action-conditioned | action |
| Cosmos-Predict2.5-2B/robot/multiview-agibot | robotic, AgiBot data, 3-camera view | text + image |
We thrive on community collaboration! NVIDIA-Cosmos wouldn't be where it is without contributions from developers like you. Check out our Contributing Guide to get started, and share your feedback through issues.
Big thanks 🙏 to everyone helping us push the boundaries of open-source physical AI!
This project will download and install additional third-party open source software projects. Review the license terms of these open source projects before use.
NVIDIA Cosmos source code is released under the Apache 2 License.
NVIDIA Cosmos models are released under the NVIDIA Open Model License. For a custom license, please contact cosmos-license@nvidia.com.

