Inspiration
Firefighters enter burning buildings with near-zero visibility, limited air, and extreme heat, and even experienced responders can miss early warning signs when every second counts. Modern AI systems can recognize complex risk patterns instantly; inside a structure fire, firefighters must do the same under intense physical and cognitive strain. In those conditions, detecting imminent danger in time can mean the difference between escape and catastrophe.
So we asked:
What if an AI system could continuously monitor the immediate environment and alert firefighters to danger before it becomes critical?
That question led us to focus on a practical and achievable first step: real-time, localized hazard awareness.
What it does
We built a computer vision system that doesn't just detect hazards: it interprets them in context and surfaces recommended actions based on expert firefighting protocols. In real time, it detects smoke, fire, and debris hazards that could harm firefighters or the people they are rescuing. The system also prioritizes critical radio communications, ensuring the command center receives vital information instantly.
How we built it
Detection On Edge: a sub-30 ms, air-gapped safety loop running on the NVIDIA Jetson Orin. We engineered a multi-sensor pipeline that combines thermal and smoke sensor data with YOLOv8 computer vision, letting the system detect hazards, track people, and identify flashover risks in real time, processing frames faster than the human eye, even without internet access. We also fine-tuned our YOLO model on hazards and objects of interest specific to firefighting.
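To illustrate the fusion step, here is a minimal stdlib sketch of how detector confidence, thermal readings, and smoke readings can be blended into one hazard score. The thresholds and weights below are illustrative placeholders, not our calibrated values.

```python
# Sketch of the multi-sensor fusion step (weights and thresholds are
# illustrative placeholders, not our calibrated on-device values).
def fuse_hazard_score(vision_conf: float, thermal_c: float, smoke_ppm: float) -> float:
    """Combine YOLO detection confidence with thermal (Celsius) and
    smoke (ppm) readings into a single 0..1 hazard score."""
    # Map temperature above ~60 C linearly onto 0..1, clamped.
    thermal_risk = min(max((thermal_c - 60.0) / 200.0, 0.0), 1.0)
    # Map smoke concentration onto 0..1, clamped at an assumed ceiling.
    smoke_risk = min(smoke_ppm / 400.0, 1.0)
    # Weighted blend: vision dominates, the sensors corroborate.
    return round(0.5 * vision_conf + 0.3 * thermal_risk + 0.2 * smoke_risk, 3)

print(fuse_hazard_score(0.9, 260.0, 300.0))  # strong corroboration -> 0.9
```

A score like this is cheap enough to recompute per frame, which keeps the fusion step well inside the sub-30 ms loop.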
Knowledge Access: We built a low-latency retrieval-augmented generation (RAG) system using Actian Vector. Our computer vision pipeline, with BoT-SORT object persistence, identifies hazards frame by frame and converts observations into natural language. A sliding three-second window is passed through an LLM to generate real-time scene understanding, which is stored in a mission log.
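The sliding-window idea can be sketched with the standard library alone: collect per-frame detection labels into a bounded window and summarize them as text for the LLM. The frame rate and labels below are assumptions for illustration.

```python
# Sketch of the three-second sliding window that turns per-frame
# detections into a text observation (frame rate and labels assumed).
from collections import Counter, deque

WINDOW_FRAMES = 3 * 10  # 3 s at an assumed 10 fps
window: deque = deque(maxlen=WINDOW_FRAMES)

def observe(frame_labels: list[str]) -> str:
    """Add one frame's detections and summarize the current window."""
    window.append(frame_labels)
    counts = Counter(label for frame in window for label in frame)
    parts = [f"{n}x {label}" for label, n in counts.most_common()]
    return "Last 3s: " + (", ".join(parts) if parts else "no hazards")

observe(["fire", "smoke"])
print(observe(["fire", "person"]))  # -> "Last 3s: 2x fire, 1x smoke, 1x person"
```

Summarizing counts rather than raw boxes keeps the prompt small, so the LLM call stays on the latency budget.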
Short-term memory is maintained in Redis, allowing the system to quickly recall recent objects and contextual events. Fire intensity from the YOLO models is used to structure the vector database, enabling fast, dynamic retrieval of relevant safety protocols. The result: our system understands the story of the fire, distinguishing stable situations from rapidly escalating threats, and provides context-aware guidance to responders in under 200 ms.
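The intensity-keyed retrieval can be sketched as a simple tiering scheme. The real system queries Actian Vector; this stdlib stand-in only shows the keying idea, and the tier cutoffs and protocol entries are illustrative, not official procedures.

```python
# Sketch of intensity-keyed protocol retrieval. The real system queries
# Actian Vector; tiers and protocol text here are illustrative only.
PROTOCOLS = {
    "low": ["Monitor and continue sweep", "Log conditions to mission record"],
    "medium": ["Check egress routes", "Report fire growth to command"],
    "high": ["Flashover precautions", "Prepare for emergency withdrawal"],
}

def intensity_tier(score: float) -> str:
    """Bucket a 0..1 YOLO fire-intensity score into a retrieval tier."""
    if score < 0.33:
        return "low"
    if score < 0.66:
        return "medium"
    return "high"

def retrieve_protocols(score: float) -> list[str]:
    """Fetch the safety protocols relevant to the current intensity."""
    return PROTOCOLS[intensity_tier(score)]

print(retrieve_protocols(0.8))  # high-intensity tier
```

Keying the database by intensity narrows the candidate set before any vector similarity search runs, which is what keeps retrieval fast.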
The Command: a tactical dashboard built with React and WebSockets that transforms complex hazard data into actionable procedure checklists. It models multiple firefighters at once, detecting hazards, tracking people, and flagging flashover risks in real time so responders and commanders can make informed decisions instantly.
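The hazard-to-checklist mapping on the dashboard can be sketched as a lookup that prepends an escalation step when the scene is deteriorating. The hazard names and checklist entries below are illustrative placeholders, not official standard operating procedures.

```python
# Sketch of the hazard-to-checklist mapping behind the dashboard
# (hazard names and steps are illustrative, not official SOPs).
def checklist_for(hazard: str, escalating: bool) -> list[str]:
    """Return the procedure checklist for a detected hazard,
    escalated when the scene is rapidly deteriorating."""
    base = {
        "flashover_risk": ["Alert all crews on channel", "Begin immediate egress"],
        "smoke": ["Verify SCBA air levels", "Maintain low profile"],
        "victim_detected": ["Radio location to command", "Begin rescue protocol"],
    }
    steps = base.get(hazard, ["Continue sweep"])
    if escalating:
        # Escalating scenes get a priority notification step first.
        steps = ["PRIORITY: notify command center"] + steps
    return steps

print(checklist_for("flashover_risk", escalating=True))
```

In the actual system a structure like this would be serialized and pushed to the React dashboard over the WebSocket connection.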
Challenges we ran into
- Fragile and limited sensors: Smoke detectors are delicate, and thermal cameras have low resolution, making reliable detection difficult.
- Contextual hazard understanding: We needed to build a computer vision memory that tracks environmental changes and interprets data as semantic intelligence in a firefighting context.
- Low-latency intelligence: The system must analyze hazards and deliver actionable guidance instantly, without delay.
- Operational constraints: It has to function reliably in extreme heat, near-zero visibility, and high-stress environments.
Accomplishments that we're proud of
- Getting reliable detection out of fragile smoke detectors and low-resolution thermal cameras
- Building a stateful RAG intelligence, heavily optimized under tight time constraints
- Delivering actionable guidance at low latency
- Designing for extreme heat, low visibility, and high-stress environments
What we learned
Integrating physical sensors adds significant complexity. Training and tuning models on live hardware requires more time, careful calibration, and repeated testing compared to pure software simulations.
What's next for Firewatch
Enhance indoor navigation by replacing rope markers and heuristic methods with more reliable real-time positioning. Enable GPS-independent tracking to maintain situational awareness in complex or obstructed structures.
Built With
- computervision
- embedded
- inference
- jetsonnano
- model-tuning
- opencv
- pytorch
- rag
- react
- redis
- vectordb
- websockets
- yolo

