Inspiration
Our inspiration came from a recent fire in Amherst that displaced many students living close to campus. We saw a critical gap between the moment a fire starts and the moment reliable information reaches the people affected: in those first crucial minutes, confusion, panic, and misinformation often spread faster than help can arrive, putting lives at risk. We wanted to leverage machine learning and real-time visual intelligence to bridge that gap. By analyzing readily available drone, CCTV, or street camera feeds, our system provides instant, automated insights that help authorities and the public make faster, safer decisions.
What it does
FireFlee is an intelligent, real-time emergency routing platform. It processes live video feeds from street-level cameras and sends the extracted frames to the Gemini 2.5 vision-language model. The AI continuously performs two key tasks:
- It identifies nodes (like intersections or pathways) that are blocked by fire.
- In safe areas, it estimates crowd density by generating a segmentation mask that highlights congested, high-panic zones (a minimal sketch of the per-frame call follows this list).
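To make this concrete, here is a minimal sketch of a single per-frame call, assuming the `google-generativeai` Python SDK; the model name, prompt wording, and JSON schema are simplified illustrations rather than our exact production values.

```python
# Minimal per-frame analysis sketch (prompt and schema are illustrative).
import json
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder key
model = genai.GenerativeModel("gemini-2.5-flash")

PROMPT = (
    "You are analyzing a street camera frame for emergency routing. "
    "Respond with JSON only, no prose: "
    '{"fire": bool, "crowd_density": "low" | "medium" | "high"}'
)

def analyze_frame(jpeg_bytes: bytes) -> dict:
    """Send one JPEG frame to Gemini and parse its structured verdict."""
    response = model.generate_content(
        [PROMPT, {"mime_type": "image/jpeg", "data": jpeg_bytes}]
    )
    # Strip markdown fences the model sometimes wraps around JSON.
    text = response.text.strip().removeprefix("```json").removesuffix("```")
    return json.loads(text)
```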
Our website displays this information on a live, interactive map, marking impassable fire zones with a "red beam." When a user inputs their current location (either by selecting it on the map or by speaking to our voice agent, which is triggered in an emergency), our backend's greedy A* search instantly calculates the optimal escape route. The route dynamically avoids fire-blocked paths and heavily penalizes high-congestion areas, guiding users to the safest, fastest, and least crowded exit.
How we built it
We designed our system end-to-end to analyze real-time video feeds, detect fire and pedestrian hazards, and communicate critical insights instantly. Our goal was to combine the strengths of vision-language models (VLMs), cloud infrastructure, and geospatial APIs to deliver actionable intelligence within seconds.
- Data Ingestion Layer: We sourced real-time and recorded drone and CCTV footage from open datasets such as the Stanford Drone Dataset and CERN street views (including some from the UCY dataset). The ingestion layer uploads live or recorded video streams, which are sent to our backend for frame-by-frame analysis.
- OpenCV + FastAPI (Python Backend): Extracts and batches video frames for Gemini inference and manages data flow between APIs (a simplified frame-sampling sketch follows this list).
- ElevenLabs API: Converts auto-detected alerts (e.g., “Fire detected near Lot 32”) into real-time voice announcements for immediate response and records the user's location to guide them to the safest exit.
- Frontend: We used React, HTML/CSS, and Mapbox/Leaflet.js to build a responsive and interactive user interface that displays live hazard maps, camera feeds, and rerouting suggestions with real-time WebSocket updates.
- Vultr Cloud: Hosts both the frontend and backend behind an Nginx reverse proxy on our .tech domain; object storage handles large media files.
- Routing Engine: We modeled the local area as a graph of nodes (intersections) and edges (paths) and implemented the A* pathfinding algorithm in Python. The algorithm's cost function is updated in real time by the AI's output: nodes with `fire: true` are given an infinite cost, making them impassable, while nodes with `crowd_density: "high"` incur a heavy penalty, steering the search toward less congested routes (a minimal A* sketch follows this list).
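As a concrete illustration of the OpenCV + FastAPI bullet, a simplified frame-sampling loop might look like the following; the function name and sampling rate are illustrative choices, not our exact pipeline.

```python
# Simplified frame-sampling loop (function name and rate are illustrative).
import cv2

def extract_frames(source: str, every_n: int = 30):
    """Yield JPEG-encoded frames from a video file or stream URL,
    keeping one frame in every `every_n` to limit inference cost."""
    cap = cv2.VideoCapture(source)
    index = 0
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            if index % every_n == 0:
                encoded, jpeg = cv2.imencode(".jpg", frame)
                if encoded:
                    yield jpeg.tobytes()
            index += 1
    finally:
        cap.release()
```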
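And to show how the AI's per-node verdicts drive the routing engine, here is a minimal A* sketch over the intersection graph. The graph shape and the `fire` / `crowd_density` fields follow the description above; the crowd-penalty weights are illustrative, not our tuned values.

```python
# Minimal A* over an intersection graph, with node costs driven by the
# AI's per-node verdicts. Penalty weights here are illustrative.
import heapq
import math

CROWD_PENALTY = {"low": 0.0, "medium": 25.0, "high": 100.0}

def node_cost(ai_state: dict) -> float:
    """Infinite cost for fire-blocked nodes; heavy penalty for crowds."""
    if ai_state.get("fire"):
        return math.inf
    return CROWD_PENALTY.get(ai_state.get("crowd_density", "low"), 0.0)

def a_star(graph, ai_states, start, goal, heuristic):
    """graph: {node: [(neighbor, edge_length), ...]};
    ai_states: {node: {"fire": bool, "crowd_density": str}}."""
    frontier = [(heuristic(start, goal), 0.0, start, [start])]
    best = {start: 0.0}
    while frontier:
        _, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        for neighbor, length in graph.get(node, []):
            extra = node_cost(ai_states.get(neighbor, {}))
            if math.isinf(extra):
                continue  # fire-blocked node: skip entirely
            g2 = g + length + extra
            if g2 < best.get(neighbor, math.inf):
                best[neighbor] = g2
                f2 = g2 + heuristic(neighbor, goal)
                heapq.heappush(frontier, (f2, g2, neighbor, path + [neighbor]))
    return None  # no safe route found
```

Because fire-blocked nodes simply never enter the search frontier, a fresh verdict from the AI reroutes the next query immediately, without rebuilding the graph.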
Challenges we ran into
Our biggest challenge was data simulation and integration. Since we couldn't legally or practically access live city camera feeds during the hackathon, we had to build a robust simulator that "plays" different scenarios (like a fire starting or a crowd forming) into our application; a stripped-down version of the idea is sketched below.
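In essence, the simulator replays pre-recorded, labeled frames as if they were a live camera feed. The folder layout and frame rate below are hypothetical; our actual scenarios were richer.

```python
# Stripped-down scenario player: replays stored frames as a "live" feed.
# The folder layout and frame rate here are hypothetical.
import time
from pathlib import Path

def play_scenario(frame_dir: str, fps: float = 1.0):
    """Yield frame bytes from a scenario folder at a fixed rate."""
    for frame_path in sorted(Path(frame_dir).glob("*.jpg")):
        yield frame_path.read_bytes()
        time.sleep(1.0 / fps)
```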
Another significant hurdle was data ingestion and prompt tuning for the Gemini vision-language model. Getting it to reliably and quickly return accurate, structured JSON from a chaotic video frame was a major challenge; we iterated many times to find prompts that were descriptive enough for the AI but simple enough for fast processing.
Accomplishments that we're proud of
We are incredibly proud of building a fully functional, end-to-end prototype in such a short time. Our biggest accomplishment is the "live" integration: we take a raw image, have an AI analyze it, and use that analysis to dynamically change the logic of a pathfinding algorithm, all in one seamless flow. Seeing the A* algorithm correctly route a user around a "red beam" fire zone and away from a "crowded" node on our map for the first time was the moment we knew our concept was viable, especially given how valuable it could be to the student community.

We are also proud that a project born from a major campus fire gave us the chance to work toward a more robust, proactive safety system that could help safeguard lives in the future. Finally, working with VLMs, masking, and segmentation for crowd density proved challenging, especially since some of us were new to the domain, but seeing it work as a strong proof of concept in its early stages was extremely encouraging.
What we learned
This project was a crash course in practical, multi-modal AI integration. We learned how to use a cutting-edge vision model like Gemini 2.5 Flash not just for simple classification, but for extracting complex, real-world data from a scene. Combining Google Maps, ElevenLabs, and Gemini let us prototype a fully functional, end-to-end early-warning system in a very short timeframe.
We also gained a much deeper understanding of the algorithms we learn in courses: we debated greedy approaches versus A*, experimented with network-flow formulations in early iterations, and discovered how much value can be gained by modifying simple arguments to a cost function. We learned how to extend A* to account for multiple dynamic variables (fire and crowds) beyond simple distance, which is crucial for real-world routing, and it was great to apply graph-based approaches to a real-world scenario.
Built With
- elevenlabs
- gemini
- geminiapi
- javascript
- opencv
- python
- vultr