
Sentinel.AI — Devpost Submission

Inspiration

We're three researchers who walked into TreeHacks and picked the hardest problem we could find: predict the future.

Not metaphorically. Literally. We wanted to build a system that watches the physical world and knows what's about to go wrong — before it happens. Every year, preventable accidents in warehouses, factories, and autonomous systems cost lives and billions of dollars. The technology to prevent them exists — it's just trapped inside massive research models that are too slow to matter. We decided to fix that.

What it does

Sentinel.AI is a predictive safety engine for the physical world. It takes live video from any environment — a warehouse floor, a construction site, an autonomous vehicle's feed — and forecasts hazards seconds into the future. When danger is forming, Gemini-powered agents take real, grounded actions: slowing robots, rerouting vehicles, alerting humans.

Current systems react. Sentinel prevents.

How we built it

We forked NVIDIA Cosmos — a state-of-the-art world model — and did something NVIDIA hasn't done yet: we made it fast enough to save lives.

The original Cosmos pipeline generates full future video. Beautiful, but 30 minutes per inference on a DGX Spark (we call him Sparky✨). Useless for real-time safety.

Our breakthrough: we ripped out the video generation head entirely. The model's internal representations — its compressed understanding of physics, motion, and spatial relationships — already contain everything needed to predict danger. We don't need to see the future. We just need to understand it.

We attached a lightweight XGBoost classifier directly onto these future-aware embeddings. Then we connected Gemini agents as the decision layer — taking the predicted risk and choosing world-grounded actions: stop, slow, reroute, alert.
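The hand-off from risk score to action can be sketched roughly as below. This is a minimal illustration, not our production code: the thresholds, the `Action` and `Decision` types, `choose_action`, `decide`, and `toy_scorer` are all hypothetical names invented for this example. The rule table stands in for the Gemini agent, and `score_risk` stands in for the classifier's predicted probability.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Sequence


class Action(Enum):
    NONE = "none"
    ALERT = "alert"
    REROUTE = "reroute"
    SLOW = "slow"
    STOP = "stop"


@dataclass(frozen=True)
class Decision:
    risk: float
    action: Action


# Hypothetical thresholds (assumed for illustration), most severe first.
THRESHOLDS = (
    (0.90, Action.STOP),
    (0.70, Action.SLOW),
    (0.50, Action.REROUTE),
    (0.30, Action.ALERT),
)


def choose_action(risk: float) -> Action:
    """Rule-based stand-in for the agent layer: map a risk score to one action."""
    if not 0.0 <= risk <= 1.0:
        raise ValueError(f"risk must be in [0, 1], got {risk}")
    for threshold, action in THRESHOLDS:
        if risk >= threshold:
            return action
    return Action.NONE


def decide(
    embedding: Sequence[float],
    score_risk: Callable[[Sequence[float]], float],
) -> Decision:
    """Score one embedding, then pick an action for that score."""
    risk = score_risk(embedding)
    return Decision(risk=risk, action=choose_action(risk))


def toy_scorer(embedding: Sequence[float]) -> float:
    """Placeholder scorer: clamps the mean of the embedding into [0, 1]."""
    return min(1.0, max(0.0, sum(embedding) / len(embedding)))
```

Calling `decide([0.8, 0.8], toy_scorer)` yields a `Decision` whose action is `Action.SLOW` under these assumed thresholds. Keeping the scorer as a plain callable means the classifier and the decision layer can be swapped or tested independently.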

End-to-end latency: under 1 second. Down from 30 minutes on Sparky.

Challenges we ran into

Cosmos was never designed for real-time inference. Making a research-grade world model run at edge speed meant restructuring the entire pipeline, not just fine-tuning it. We also had to prove that embeddings alone (without generated video) retain enough predictive signal to classify risk accurately. They do!

Cross-domain generalization was another battle. We trained and tested on automotive collision data and deployed on warehouse scenarios — two very different visual worlds. Getting the representations to transfer required careful temporal feature design.
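To give a flavor of what temporal feature design can look like, here is a small sketch that collapses a window of per-frame embeddings into one fixed-length vector. The specific statistics (mean, net change, largest frame-to-frame step) and the function name `temporal_features` are assumptions chosen for illustration, not necessarily the features we shipped. The idea is that motion-style summaries may depend less on what a scene looks like than raw embeddings do.

```python
from typing import List, Sequence


def temporal_features(frames: Sequence[Sequence[float]]) -> List[float]:
    """Summarize a window of per-frame embeddings as one fixed-length vector.

    For each embedding dimension, emits three values: the mean over the
    window, the net change (last minus first), and the largest absolute
    frame-to-frame step.
    """
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    if any(len(frame) != dim for frame in frames):
        raise ValueError("all frames must share one dimension")

    features: List[float] = []
    for d in range(dim):
        series = [frame[d] for frame in frames]
        mean = sum(series) / len(series)
        delta = series[-1] - series[0]
        # default=0.0 covers a single-frame window, which has no steps.
        max_step = max(
            (abs(b - a) for a, b in zip(series, series[1:])),
            default=0.0,
        )
        features.extend([mean, delta, max_step])
    return features
```

A window of N frames with D dimensions each always produces 3 × D features, so the downstream classifier sees the same input shape regardless of window length.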

Accomplishments that we're proud of

30 minutes → under 1 second. That's not optimization. That's a paradigm shift.

We proved that world models — the most powerful spatial reasoning systems in AI — can run on edge GPUs for safety-critical decisions. And we showed that Gemini agents can take those predictions and act on them in the real world, not just narrate what they see.

What we learned

The most expensive part of a world model — generating pixels — is also the part you don't need. Representations are the product. This insight applies far beyond safety: any application where you need fast, physics-aware reasoning (robotics, autonomous driving, industrial automation) can benefit from this pattern.

We also learned that the gap between a research breakthrough and a deployable system is an engineering problem — and sometimes, the best engineering is knowing what to throw away.

What's next for Sentinel.AI

This isn't a warehouse tool. It's infrastructure for any system that operates in the physical world. Autonomous vehicles. Robotic fleets. Construction sites. Surgical robots. Anywhere humans and machines share space, Sentinel can predict what's about to go wrong and prevent it.

The future of safety isn't faster reactions. It's prediction.