Demo is here: https://timsinashok.github.io/Sentinel-AI/
This project was submitted by Ashok, Nils, and Parth to TreeHacks 2026 (Feb 13-16, Stanford).
Sponsors Used:
- NVIDIA
- Gemini AI
Backend available at: https://github.com/timsinashok/cosmos-predict2.5
Sentinel AI is a future-aware safety system built on top of NVIDIA Cosmos. Instead of labeling the current frame as safe/unsafe, it uses predictive world modeling to reason about future states of the environment and produce an early risk score — enabling proactive mitigation (alerts, slowdowns, reroutes, kill-switches) before a near-miss becomes an incident.
Major contribution: we optimized the NVIDIA Cosmos world-model-based hazard predictor and increased inference efficiency by ~1800x compared to the base Cosmos pipeline by operating entirely in representation space and removing pixel-level generation.
Safety shift: reactive perception → predictive prevention
This repository contains an interactive web demo UI (Vite + React) that shows the Sentinel AI operator experience:
- Without Sentinel: the system only “understands” what happened after the event (reactive VLM-style explanation).
- With Sentinel: the system surfaces an early warning (time-to-hazard + severity) and a sequence of mitigation actions before the collision.
The demo is driven by a short clip (/0.0-14.0.mp4) plus a lightweight simulation timeline that illustrates the intended end-to-end behavior.
Most industrial safety stacks today follow a reactive pattern:
- Vision / VLM → classify current frame as safe/unsafe
Sentinel AI instead aims for a predictive pattern:
- World model (Cosmos) → encode dynamics and forecast future states
- Extract future-aware latent representations
- Early hazard classification in representation space
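The last step above, classification in representation space, can be as small as a linear model over a pooled embedding. The following is a minimal sketch, not the project's actual classifier: `risk_score`, `confidence`, the weights, and the bias are hypothetical names and values chosen for illustration, and a trained model would supply the real parameters.

```python
import math
from typing import Sequence


def risk_score(embedding: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Logistic-regression style hazard probability for one pooled embedding."""
    if len(embedding) != len(weights):
        raise ValueError("embedding and weights must have the same length")
    z = bias + sum(w * x for w, x in zip(weights, embedding))
    # Numerically stable sigmoid: never exponentiate a large positive number.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def confidence(p: float) -> float:
    """Distance of the probability from the 0.5 decision boundary, rescaled to [0, 1]."""
    return abs(p - 0.5) * 2.0


if __name__ == "__main__":
    # Toy 3-dimensional embedding with made-up parameters (assumption).
    p = risk_score([0.2, -0.1, 0.7], weights=[1.5, -2.0, 3.0], bias=-1.0)
    print(f"risk={p:.3f} confidence={confidence(p):.3f}")
```

Because the score is a single dot product plus a sigmoid, the per-clip cost after encoding is negligible next to the encoder itself.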
Conceptual pipeline:
video → latent embeddings (future-aware) → classifier → risk score + confidence → mitigation
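The final arrow of the pipeline, from risk score and confidence to mitigation, could be implemented as a simple escalation policy. The sketch below is an assumption for illustration only: the `RiskEstimate` type, the `choose_mitigations` function, and every threshold are hypothetical. Only the action names (alert, slowdown, reroute, kill switch) come from the description above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RiskEstimate:
    score: float             # hazard probability in [0, 1]
    confidence: float        # classifier confidence in [0, 1]
    time_to_hazard_s: float  # estimated seconds until the hazard


def choose_mitigations(est: RiskEstimate) -> List[str]:
    """Return an escalating list of actions; all thresholds are illustrative."""
    # Low-confidence predictions only ever raise an alert, never actuate.
    if est.confidence < 0.3:
        return ["alert"] if est.score >= 0.5 else []
    actions: List[str] = []
    if est.score >= 0.3:
        actions.append("alert")
    if est.score >= 0.5:
        actions.append("slowdown")
    if est.score >= 0.7:
        actions.append("reroute")
    # The kill switch requires both very high risk and an imminent hazard.
    if est.score >= 0.9 and est.time_to_hazard_s <= 2.0:
        actions.append("kill_switch")
    return actions


if __name__ == "__main__":
    print(choose_mitigations(RiskEstimate(score=0.95, confidence=0.9, time_to_hazard_s=1.0)))
```

Gating the most disruptive action on time-to-hazard as well as score is one way to keep false positives from halting operations; a real deployment would tune these cutoffs against logged data.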
Sentinel AI is designed as a systems-level optimization, not a pixel-generation demo.
We keep the predictive signal but skip expensive pixel synthesis:
- Encode frames/clips using the Cosmos tokenizer / VAE encode()
- No diffusion head
- No future video decoding
Latents are pooled into compact vectors that can be:
- Fed into lightweight classifiers (LogReg / SVM / MLP / XGBoost)
- Cached and reused to reduce repeated compute
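As a rough illustration of pooling and caching, the sketch below mean-pools a latent grid into one vector and memoizes the result by content hash. It is written under stated assumptions: `mean_pool`, `EmbeddingCache`, and the `encode` callable are hypothetical stand-ins, and the real Cosmos encoder call is deliberately left out because its interface is not described here.

```python
import hashlib
from typing import Callable, Dict, List, Sequence

# A latent is modeled as a list of token vectors (tokens x channels). This is an assumption.
Latent = Sequence[Sequence[float]]


def mean_pool(latent: Latent) -> List[float]:
    """Average over the token axis to get one compact vector per clip."""
    if not latent:
        raise ValueError("latent must contain at least one token")
    dim = len(latent[0])
    sums = [0.0] * dim
    for token in latent:
        if len(token) != dim:
            raise ValueError("all tokens must have the same dimension")
        for i, value in enumerate(token):
            sums[i] += value
    return [s / len(latent) for s in sums]


class EmbeddingCache:
    """Memoize pooled embeddings so identical clips are encoded only once."""

    def __init__(self, encode: Callable[[bytes], Latent]) -> None:
        self._encode = encode  # placeholder for the real encoder (assumption)
        self._store: Dict[str, List[float]] = {}
        self.misses = 0

    def get(self, clip_bytes: bytes) -> List[float]:
        key = hashlib.sha256(clip_bytes).hexdigest()
        if key not in self._store:
            self.misses += 1
            self._store[key] = mean_pool(self._encode(clip_bytes))
        return self._store[key]


if __name__ == "__main__":
    def fake_encode(clip: bytes) -> Latent:
        # Stand-in encoder for demonstration only.
        return [[float(len(clip)), 0.0], [0.0, 2.0]]

    cache = EmbeddingCache(fake_encode)
    print(cache.get(b"abc"), cache.get(b"abc"), cache.misses)
```

Keying on a content hash means a clip that reappears (for example, a replayed camera segment) skips the encoder entirely and goes straight to the classifier.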
This supports model saving and reuse, and enables near-real-time deployment on edge-class GPUs.
Temporal context comes from short snippets (e.g., last 3–5 seconds at low FPS), aggregated directly in embedding space to learn risk trajectories.
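One simple way to aggregate a short snippet in embedding space is a sliding window that emits the window mean plus the change from the oldest to the newest embedding. This is only a sketch of the idea, not the project's method: `SnippetAggregator`, its default window length and frame rate, and the mean-plus-delta feature are all assumptions.

```python
from collections import deque
from typing import Deque, List, Sequence


class SnippetAggregator:
    """Keep the most recent embeddings and summarize them as [mean, last - first]."""

    def __init__(self, window_seconds: float = 4.0, fps: float = 2.0) -> None:
        # Number of embeddings retained, e.g. 4 s at 2 FPS gives 8 slots.
        self.capacity = max(1, int(round(window_seconds * fps)))
        self._window: Deque[List[float]] = deque(maxlen=self.capacity)

    def push(self, embedding: Sequence[float]) -> None:
        if self._window and len(embedding) != len(self._window[0]):
            raise ValueError("embedding dimension changed mid-stream")
        self._window.append(list(embedding))  # oldest entry drops automatically

    def features(self) -> List[float]:
        if not self._window:
            raise ValueError("no embeddings pushed yet")
        n = len(self._window)
        dim = len(self._window[0])
        mean = [sum(e[i] for e in self._window) / n for i in range(dim)]
        # The delta captures the direction the scene is moving in embedding space.
        delta = [self._window[-1][i] - self._window[0][i] for i in range(dim)]
        return mean + delta


if __name__ == "__main__":
    agg = SnippetAggregator(window_seconds=1.0, fps=2.0)
    for emb in ([0.0, 0.0], [2.0, 4.0], [4.0, 8.0]):
        agg.push(emb)
    print(agg.features())
```

The resulting feature vector has twice the embedding dimension and can be fed to the same lightweight classifiers listed above.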
Tradeoff (intentional):
- Less fidelity
- Much lower latency
- More actionable output
Sentinel AI makes an explicit engineering tradeoff: we spend compute on predictive risk scoring (the part that triggers mitigation) instead of generating pixels.
This makes world models deployable for safety, not just compelling for demos.
- Node.js (recommended: 18+)
npm install
npm run dev
Then open http://localhost:3000.
npm run build
npm run preview
- Video: the demo expects the clip to be available at /0.0-14.0.mp4.
  - For Vite, the simplest approach is to place it in public/0.0-14.0.mp4 so it’s copied into the build output.
- Logo: the header loads /logo.png (place in public/logo.png).
- App.tsx: main UI + “With/Without Sentinel” mode logic
- components/ScenePanel.tsx: video stage + overlays and analysis pause behavior
- components/RiskSummary.tsx: risk level + time-to-hazard display
- components/AgentActionPanel.tsx: mitigation action timeline
- constants.ts: demo timings + initial entities
- types.ts: shared types and enums
Predicting hazards before they happen can:
- Reduce near-misses and injuries
- Improve human–robot / forklift–pedestrian safety
- Extend from warehouses to factories, construction sites, and autonomous environments
Sentinel AI demonstrates how world models can be adapted into deployable, safety-critical decision systems by prioritizing low-latency risk scoring over pixel generation.
- Built for TreeHacks as a prototype UI + systems concept.
- Inspired by NVIDIA Cosmos and the broader world-modeling ecosystem.