Sentinel AI

Predicting hazards before they happen.

We optimized an NVIDIA Cosmos world-model-based hazard predictor, increasing efficiency by a factor of 1800 compared to base NVIDIA Cosmos.

Sentinel AI is a future-aware safety system built on top of NVIDIA Cosmos that detects unsafe situations before they occur, rather than reacting after the fact.

Most industrial safety systems today rely on vision or vision-language models that classify the current frame as safe or unsafe. By the time a hazard is detected, the unsafe condition often already exists.

Sentinel AI takes a fundamentally different approach: we use predictive world modeling to reason about future states of the environment, extract future-aware latent representations, and classify risk ahead of time, enabling proactive mitigation.

High-Level Idea

Baseline industry approach: vision (or vision-language) models classify the current frame as safe or unsafe.
Our approach: use NVIDIA Cosmos as a predictive world model to encode future dynamics into compact latent representations and perform early hazard classification before a collision, near-miss, or unsafe interaction happens.

This shifts safety from:

reactive perception → predictive prevention

What We Engineered

Sentinel AI is intentionally designed as a systems-level optimization, not a pixel-generation demo.

1. Representation-Only Inference

We remove the slow diffusion / video generation head and operate purely in representation space:

Images or short video clips are encoded using Cosmos’ tokenizer / VAE encode()
No future video synthesis
No pixel-level decoding

This preserves predictive signal while dramatically reducing latency.
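
As a rough illustration of the encode-only path described above, the sketch below passes a clip through an encoder and stops there. `LatentEncoder`, `ToyEncoder`, and `clip_to_latent` are hypothetical names introduced for this example. `ToyEncoder` is a stand-in that only averages pixel chunks; it is not the Cosmos tokenizer / VAE, whose real interface may differ. Plain Python lists keep the example dependency-free; a real pipeline would presumably use tensors.

```python
from typing import Protocol, Sequence

Frame = Sequence[float]      # one flattened frame (hypothetical layout)
Latent = list[list[float]]   # [time][channels]


class LatentEncoder(Protocol):
    """Anything exposing encode(); stands in for the Cosmos tokenizer / VAE."""

    def encode(self, frames: Sequence[Frame]) -> Latent: ...


class ToyEncoder:
    """Hypothetical stand-in: averages each frame in fixed-size chunks."""

    def __init__(self, channels: int = 4) -> None:
        self.channels = channels

    def encode(self, frames: Sequence[Frame]) -> Latent:
        latent = []
        for frame in frames:
            # Pixels left over after the last full chunk are ignored.
            chunk = max(1, len(frame) // self.channels)
            latent.append([
                sum(frame[i * chunk:(i + 1) * chunk]) / chunk
                for i in range(self.channels)
            ])
        return latent


def clip_to_latent(frames: Sequence[Frame], encoder: LatentEncoder) -> Latent:
    # Only encode() is called: no future-frame synthesis, no pixel decoding.
    return encoder.encode(frames)


clip = [[0.0] * 8, [1.0] * 8, [2.0] * 8]   # three toy frames of 8 "pixels"
latent = clip_to_latent(clip, ToyEncoder(channels=4))
print(len(latent), len(latent[0]))         # 3 4
```

Because the pipeline depends only on `encode()`, the generation head never has to be loaded or run for hazard scoring.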

2. Fast, Reusable Embeddings

Encoded representations are:

Pooled into compact vectors
Passed to classical ML classifiers (Logistic Regression, SVM, MLP, XGBoost)
Cached and reused across runs

This enables sub-minute inference on edge-class GPUs.
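
The pooling and caching steps above can be sketched as follows. This is a minimal illustration under stated assumptions: `mean_pool`, `cached_embedding`, and `fake_embed` are hypothetical helpers, the cache is keyed by a hash of the clip path (so it would not notice a file whose contents change), and JSON is used only to keep the example dependency-free. The project itself may store embeddings in a different format.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Callable

Latent = list[list[float]]  # [tokens][channels]; hypothetical layout


def mean_pool(latent: Latent) -> list[float]:
    """Average over the token axis to get one compact vector per clip."""
    n = len(latent)
    return [sum(col) / n for col in zip(*latent)]


def cached_embedding(
    clip_path: str,
    embed: Callable[[str], list[float]],
    cache_dir: Path,
) -> list[float]:
    """Return the cached vector for clip_path, computing it on first use."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    key = hashlib.sha256(clip_path.encode("utf-8")).hexdigest()
    cache_file = cache_dir / f"{key}.json"
    if cache_file.exists():
        return json.loads(cache_file.read_text())
    vector = embed(clip_path)
    cache_file.write_text(json.dumps(vector))
    return vector


calls = []  # records how often the (expensive) encoder actually runs


def fake_embed(path: str) -> list[float]:
    # Stand-in for "encode the clip, then pool the latent".
    calls.append(path)
    return mean_pool([[1.0, 2.0], [3.0, 4.0]])


with tempfile.TemporaryDirectory() as tmp:
    first = cached_embedding("clip_001.mp4", fake_embed, Path(tmp))
    second = cached_embedding("clip_001.mp4", fake_embed, Path(tmp))

print(first, len(calls))  # [2.0, 3.0] 1
```

The second call is served from disk, so trying a different classifier on the same clips does not require re-running the encoder.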

3. Temporal Signal Without Full Video Generation

Instead of generating future frames, we retain temporal structure by:

Sampling short input snippets (e.g., last 3–5 seconds at low FPS)
Aggregating latent features across time
Learning risk trajectories directly in embedding space
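
A minimal sketch of the first two steps, assuming a fixed native frame rate. `sample_tail_indices` and `aggregate_over_time` are hypothetical helpers. Concatenating the temporal mean with the last-minus-first change is just one simple way to keep a trajectory signal, and is not necessarily the aggregation used in this project.

```python
def sample_tail_indices(num_frames, native_fps, seconds=5.0, target_fps=1.0):
    """Indices of frames covering the last `seconds` at roughly `target_fps`."""
    if num_frames <= 0:
        return []
    step = max(1, round(native_fps / target_fps))
    window = min(num_frames, int(seconds * native_fps))
    start = num_frames - window
    # Walk backwards from the final frame so the most recent frame is kept.
    indices = list(range(num_frames - 1, start - 1, -step))
    return indices[::-1]


def aggregate_over_time(per_frame):
    """Concatenate the temporal mean with the last-minus-first change."""
    mean = [sum(col) / len(per_frame) for col in zip(*per_frame)]
    delta = [last - first for first, last in zip(per_frame[0], per_frame[-1])]
    return mean + delta


# A 10-second clip at 30 FPS, keeping the last 5 seconds at 1 FPS.
tail = sample_tail_indices(num_frames=300, native_fps=30)
print(tail)  # [179, 209, 239, 269, 299]

# Three per-frame embeddings with two channels each (toy values).
features = aggregate_over_time([[0.0, 1.0], [2.0, 1.0], [4.0, 1.0]])
print(features)  # [2.0, 1.0, 4.0, 0.0]
```

The delta term is what lets a downstream classifier tell a steadily worsening scene from a static one with the same average embedding.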


video → latent embeddings → classifier → (risk score + confidence)

Supports model saving and reuse
Designed for real-time or near-real-time deployment
Optimized for actionable safety decisions, not visual output
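
The last stage of the pipeline above can be illustrated with a hand-rolled logistic scorer. This is a sketch under assumptions, not the project's classifier: the weights and bias are made-up values, and defining confidence as the probability of the predicted class is one common convention chosen here for illustration.

```python
import math


def risk_score(embedding, weights, bias):
    """Logistic risk score in [0, 1] from a pooled embedding."""
    z = sum(w * x for w, x in zip(weights, embedding)) + bias
    # Numerically stable sigmoid: avoids overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score_with_confidence(embedding, weights, bias):
    """Return (risk, confidence); confidence is the predicted-class probability."""
    p = risk_score(embedding, weights, bias)
    return p, max(p, 1.0 - p)


# Hypothetical parameters for a 2-dimensional embedding.
weights, bias = [1.0, -1.0], 0.0

risk, confidence = score_with_confidence([2.0, 1.0], weights, bias)
print(f"risk={risk:.3f} confidence={confidence:.3f}")  # risk=0.731 confidence=0.731
```

In practice the parameters would come from a trained model (for example a saved Logistic Regression), and the risk score would be compared against a threshold tuned for the deployment.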

Why This Is Systems Thinking

Sentinel AI makes an explicit engineering tradeoff:

Less fidelity
Much lower latency
More actionable output

Instead of spending compute on generating pixels, we spend it on predictive risk scoring — the part that actually triggers mitigation (alerts, slowdowns, rerouting, human intervention).

This design choice turns world models from research artifacts into deployable safety systems.

Why This Matters

Predicts hazards before they happen
Reduces near-miss incidents and injuries
Enables safer human–robot interaction
Generalizes beyond warehouses to factories, construction sites, and autonomous environments

Sentinel AI demonstrates how **world models can be adapted for real-world, safety-critical decision-making**.


Product Website | 🤗 Hugging Face | Paper | Paper Website | Cosmos Cookbook

NVIDIA Cosmos™ is a platform purpose-built for physical AI, featuring state-of-the-art generative world foundation models (WFMs), robust guardrails, and an accelerated data processing and curation pipeline. Designed specifically for real-world systems, Cosmos enables developers to rapidly advance physical AI applications such as autonomous vehicles (AVs), robots, and video analytics AI agents.

Cosmos World Foundation Models come in three model types which can all be customized in post-training: cosmos-predict, cosmos-transfer, and cosmos-reason.

News!

[December 19, 2025] Released Cosmos-Predict2.5-2B Diffusers support via Hugging Face, Cosmos-Predict2.5-2B Text2World distilled checkpoint on Hugging Face and Distillation guide.
[December 5, 2025] Released Cosmos-Predict2.5-14B base models, inference and post-training for DreamGen. Also added the Cosmos-Predict2.5-2B/robot/multiview-agibot model and inference.
[November 25, 2025] Added Blackwell + ARM inference support, along with fixes for the help menu and CLI overrides, improved guardrail offloading, and LFS enablement for large assets.
[November 11, 2025] Refactored the Cosmos-Predict2.5-2B Auto/Multiview code, updated the Auto/Multiview checkpoints in Hugging Face, and added inference example notebooks under examples/notebook/ to make testing and onboarding easier.
[November 8, 2025] Added a new pedagogical README in docs/ detailing the Rectified Flow formulation and its integration with the UniPC solver.
[November 7, 2025] We released support for DMD2 distillation for model compression, autoregressive sliding window generation mode for generating longer videos, and a new multiview cross-attention module. We improved inference examples and documentation, upgraded dependencies to improve support for Blackwell, and made various infrastructure improvements.
[October 28, 2025] We added Cosmos Cookbook, a collection of step-by-step recipes and post-training scripts to quickly build, customize, and deploy NVIDIA’s Cosmos world foundation models for robotics and autonomous systems.
[October 28, 2025] We fixed an action-conditioned inference bug, improved LoRA post-training and unified it across text2world, image2world, and video2world, sped up tokenization with CP + torch.compile for Transfer2, updated guardrails, added multi-storage support, and introduced the cosmos-oss package.
[October 21, 2025] We added LoRA (Low-Rank Adaptation) post-training for both Video2World and Text2World, and the gr00t-dreams dataset for post-training. We also updated the Docker base image version and the Gradio-related documentation.
[October 14, 2025] We released the Cosmos-Predict2.5 robot/action-cond Inference Guide and Post-Training Guide. We also released Auto Multiview Post-Training.
[October 6, 2025] We released Cosmos-Predict2.5 and Cosmos-Transfer2.5 - the next generation of our world simulation models!

Cosmos-Predict2.5

We introduce Cosmos-Predict2.5, the latest version of the Cosmos World Foundation Models (WFMs) family, specialized for simulating and predicting the future state of the world in the form of video. Cosmos-Predict2.5 is a flow-based model that unifies Text2World, Image2World, and Video2World into a single model and utilizes Cosmos-Reason1, a Physical AI reasoning vision language model (VLM), as the text encoder. Cosmos-Predict2.5 significantly improves upon Cosmos-Predict1 in both quality and prompt alignment.

Image2World

Input prompt

A nighttime city bus terminal gradually shifts from stillness to subtle movement. At first, multiple double-decker buses are parked under the glow of overhead lights, with a central bus labeled '87D' facing forward and stationary. As the video progresses, the bus in the middle moves ahead slowly, its headlights brightening the surrounding area and casting reflections onto adjacent vehicles. The motion creates space in the lineup, signaling activity within the otherwise quiet station. It then comes to a smooth stop, resuming its position in line. Overhead signage in Chinese characters remains illuminated, enhancing the vibrant, urban night scene.

Input image	Output video
	bus_terminal.mp4

Video2World

Input prompt

A robotic arm, primarily white with black joints and cables, is shown in a clean, modern indoor setting with a white tabletop. The arm, equipped with a gripper holding a small, light green pitcher, is positioned above a clear glass containing a reddish-brown liquid and a spoon. The robotic arm is in the process of pouring a transparent liquid into the glass. To the left of the pitcher, there is an opened jar with a similar reddish-brown substance visible through its transparent body. In the background, a vase with white flowers and a brown couch are partially visible, adding to the contemporary ambiance. The lighting is bright, casting soft shadows on the table. The robotic arm's movements are smooth and controlled, demonstrating precision in its task. As the video progresses, the robotic arm completes the pour, leaving the glass half-filled with the reddish-brown liquid. The jar remains untouched throughout the sequence, and the spoon inside the glass remains stationary. The other robotic arm on the right side also stays stationary throughout the video. The final frame captures the robotic arm with the pitcher finishing the pour, with the glass now filled to a higher level, while the pitcher is slightly tilted but still held securely by the gripper.

Input Video	Output Video
robot_pouring.mp4	robot_pouring.mp4

Cosmos-Predict2.5 Model Family

Cosmos-Predict, our family of world simulation models, has one fundamental capability: predicting future world states in video form from multimodal inputs. We have open-sourced both pre-trained foundation models and post-trained models that accelerate multiple domains. Please check back as we continue to add more specialized models and capabilities to the Predict family!

Cosmos-Predict2.5: Base 2B checkpoints and 14B checkpoints, trained from the ground up for Physical AI and robotics.

Cosmos-Predict2.5/auto/multiview: Specialized checkpoints, post-trained for Autonomous Vehicle applications.

Model Name	Capability	Input
Cosmos-Predict2.5 base
Cosmos-Predict2.5-2B/pre-trained	pre-trained base	text + image or video
Cosmos-Predict2.5-2B/post-trained	post-trained base	text + image or video
Cosmos-Predict2.5-2B/distilled	distilled base	text
Cosmos-Predict2.5-14B/pre-trained	pre-trained base	text + image or video
Cosmos-Predict2.5-14B/post-trained	post-trained base	text + image or video
Cosmos-Predict2.5 auto
Cosmos-Predict2.5-2B/auto/multiview	driving, 7-camera view	text + image or video
Cosmos-Predict2.5-2B robot
Cosmos-Predict2.5-2B/robot/action-cond	robotic, action-conditioned	action
Cosmos-Predict2.5-2B/robot/multiview-agibot	robotic, AgiBot data, 3-camera view	text + image

User Guide

Contributing

We thrive on community collaboration! NVIDIA-Cosmos wouldn't be where it is without contributions from developers like you. Check out our Contributing Guide to get started, and share your feedback through issues.

Big thanks 🙏 to everyone helping us push the boundaries of open-source physical AI!

License and Contact

This project will download and install additional third-party open source software projects. Review the license terms of these open source projects before use.

NVIDIA Cosmos source code is released under the Apache 2 License.

NVIDIA Cosmos models are released under the NVIDIA Open Model License. For a custom license, please contact cosmos-license@nvidia.com.

