💡 Inspiration
We identified a critical flaw in modern security infrastructure: the Human Latency Gap. Traditional CCTV is passive, and data suggests that operators miss up to 95% of security events after just 20 minutes of continuous monitoring due to fatigue.
We viewed this not as a security problem, but as a data throughput and latency problem. Our goal with SecureVista was to transition from reactive forensics to deterministic real-time prevention, minimizing the time delta between event and response (Δt) to near zero.
🛡️ What it does
SecureVista is an autonomous surveillance pipeline that transforms standard IP cameras into intelligent guardians.
- Automated Threat Detection: Instantly flags weapons, unauthorized entry, and aggressive behavior.
- Behavioral Analytics: Detects loitering in restricted zones and analyzes shadow/motion patterns to reduce false positives.
- Health & Safety: Uses pose estimation to detect accidental falls (e.g., for elderly care or hospitals) and triggers immediate medical alerts.
- Real-Time Dashboard: A low-latency React dashboard streams alerts and evidence snapshots to security personnel, with escalation notifications delivered via Twilio SMS/WhatsApp.
⚙️ How we built it
We engineered a Distributed Edge-Inference Pipeline designed for high throughput.
1. The Vision Backbone (Detection)
We used a customized YOLOv8 architecture, fine-tuned on a custom dataset of security-specific classes. To improve localization accuracy in crowded campus scenes, we trained with the Complete IoU (CIoU) box loss:
$$ \mathcal{L}_{CIoU} = 1 - IoU + \frac{\rho^2(b, b^{gt})}{c^2} + \alpha v $$
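As a rough sketch of what the fine-tuning step can look like with the Ultralytics API (the `security.yaml` dataset config, model size, and hyperparameters below are illustrative placeholders, not our exact training recipe):

```python
# Hedged sketch of the fine-tuning step using the Ultralytics YOLOv8 API.
# "security.yaml" (e.g. weapon / intruder / loiterer classes) and the
# hyperparameters are illustrative placeholders, not our exact recipe.
from ultralytics import YOLO

model = YOLO("yolov8s.pt")      # start from a pretrained checkpoint
model.train(
    data="security.yaml",       # custom dataset config (assumed name)
    epochs=50,
    imgsz=640,
    batch=16,
)
```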
2. State Estimation (Tracking)
Detection alone is jittery. To maintain object identity across frames (and handle occlusion), we implemented a robust state estimation model using a Kalman Filter. The tracker predicts each subject's next position and corrects that prediction with the incoming detection via the state update equation:
$$ \hat{x}_{k|k} = \hat{x}_{k|k-1} + K_k \left( z_k - H_k \hat{x}_{k|k-1} \right) $$
This allows SecureVista to "remember" a threat even if they briefly pass behind a pillar. Simultaneously, we run MediaPipe for pose estimation to detect specific actions like falling or crouching.
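A minimal sketch of that predict/correct cycle, assuming a constant-velocity state [x, y, vx, vy] and hand-picked noise covariances (the production tracker's exact parameters may differ):

```python
# Minimal constant-velocity Kalman filter for one tracked bounding-box centre.
# Matrices and noise values are illustrative assumptions.
import numpy as np

dt = 1.0                                    # one frame step
F = np.array([[1, 0, dt, 0],                # state transition for [x, y, vx, vy]
              [0, 1, 0, dt],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)
H = np.array([[1, 0, 0, 0],                 # we only observe (x, y)
              [0, 1, 0, 0]], dtype=float)
Q = np.eye(4) * 1e-2                        # process noise (assumed)
R = np.eye(2) * 1.0                         # measurement noise (assumed)

def predict(x, P):
    """Propagate the state one frame ahead: x̂(k|k-1), P(k|k-1)."""
    return F @ x, F @ P @ F.T + Q

def update(x_pred, P_pred, z):
    """Correct the prediction with a new detection z = [cx, cy]."""
    y = z - H @ x_pred                      # innovation
    S = H @ P_pred @ H.T + R                # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)     # Kalman gain K_k
    x_new = x_pred + K @ y                  # the state update equation above
    P_new = (np.eye(4) - K @ H) @ P_pred
    return x_new, P_new
```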
3. Asynchronous Orchestration
To prevent the inference engine from blocking the video stream, we used Python's threading module to decouple the frame capture from the processing logic.
- Frame Buffer: A LIFO (Last-In-First-Out) queue ensures the model always processes the latest frame, dropping older ones if the GPU is saturated (see the first sketch after this list).
- Communication: We used FastAPI with WebSockets to push alerts to the frontend in real time (<100ms latency), rather than having the client poll the server (see the second sketch below).
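A simplified sketch of the capture/inference decoupling: the "latest frame wins" buffer is modelled here as a size-1 queue that discards stale frames; the camera source and the model call are placeholders.

```python
# Simplified version of the capture/inference decoupling described above.
# The "latest frame wins" buffer is modelled as a size-1 queue; the camera
# source and the model call are placeholders.
import queue
import threading
import cv2

frame_buffer = queue.Queue(maxsize=1)

def capture_loop(source=0):
    cap = cv2.VideoCapture(source)
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        try:
            frame_buffer.get_nowait()    # drop the stale frame if inference lags
        except queue.Empty:
            pass
        frame_buffer.put(frame)

def inference_loop(model):
    while True:
        frame = frame_buffer.get()       # always the most recent frame
        results = model(frame)           # YOLOv8 / MediaPipe stage (placeholder)
        # ...hand detections to the alerting layer...

threading.Thread(target=capture_loop, daemon=True).start()
```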
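And a hedged sketch of the push path: a FastAPI WebSocket endpoint that forwards alert payloads to connected dashboard clients (the route name and alert schema are assumptions):

```python
# Hedged sketch of server-push alerts over FastAPI WebSockets. The route
# name "/ws/alerts" and the alert payload shape are assumptions.
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()
clients: list[WebSocket] = []

@app.websocket("/ws/alerts")
async def alerts_socket(ws: WebSocket):
    await ws.accept()
    clients.append(ws)
    try:
        while True:
            await ws.receive_text()      # keep the connection open
    except WebSocketDisconnect:
        clients.remove(ws)

async def broadcast_alert(alert: dict):
    """Called by the pipeline once a detection passes all consistency checks."""
    for ws in list(clients):
        await ws.send_json(alert)
```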
🚧 Challenges we ran into
- The "Flicker" False Positive: Initial models would flag a threat for 1 frame due to lighting noise. We implemented a Temporal Consistency Algorithm that only triggers an alert if detection confidence exceeds a threshold τ for k consecutive frames.
- Shadow Noise: In outdoor settings, moving shadows were often misclassified as intruders. We implemented a Shadow Analysis module (using cv2.createBackgroundSubtractorMOG2) to differentiate between solid objects and transient light artifacts (see the second sketch below).
- Dependency Hell: Integrating MediaPipe (CPU-bound) with YOLOv8 (GPU-bound) caused resource contention. We solved this by containerizing the services and optimizing the thread allocation.
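A minimal sketch of that temporal-consistency gate (the τ and k values are illustrative, not our tuned thresholds):

```python
# Temporal-consistency gate: alert only when a class stays above the
# confidence threshold for K consecutive frames. TAU and K are assumed values.
from collections import defaultdict

TAU = 0.6            # confidence threshold τ (assumed)
K = 5                # required consecutive frames k (assumed)

streaks = defaultdict(int)   # class name -> consecutive confident frames

def should_alert(detections):
    """detections: list of (class_name, confidence) for the current frame.
    Returns the classes whose streak has just reached K frames."""
    confident = {cls for cls, conf in detections if conf > TAU}
    fired = []
    for cls in set(streaks) | confident:
        if cls in confident:
            streaks[cls] += 1
            if streaks[cls] == K:
                fired.append(cls)    # survived K frames: raise the alert
        else:
            streaks[cls] = 0         # a one-frame flicker resets the streak
    return fired
```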
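For the shadow analysis, OpenCV's MOG2 subtractor can already separate shadow pixels from solid foreground when detectShadows is enabled; a minimal sketch of that underlying mechanism, not the full module:

```python
# Shadow-aware background subtraction with OpenCV's MOG2. With
# detectShadows=True, shadow pixels are marked as 127 in the mask and solid
# foreground as 255, so transient shadows can be filtered out.
import cv2

subtractor = cv2.createBackgroundSubtractorMOG2(history=500, detectShadows=True)

def solid_foreground(frame):
    mask = subtractor.apply(frame)
    return (mask == 255).astype("uint8") * 255   # keep objects, drop shadows
```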
🏅 Accomplishments that we're proud of
- 🏆 Winner of CodeVeda: Secured the top spot in the 48-hour hackathon organized by IIT Madras BS and Manipal University Jaipur.
- Real-Time Performance: Achieved <150ms end-to-end latency (Camera → Server → Dashboard).
- Scalability: The system successfully runs 4 concurrent 1080p streams on a single node without significant frame drops.
- Privacy-First: By processing at the edge, no raw video footage needs to be sent to the cloud—only the metadata and alert snapshots are transmitted.
🧠 What we learned
- Data Drift: Models trained on well-lit datasets fail in low-light corridors. We learned the importance of Gamma Correction preprocessing to normalize input feeds (see the sketch after this list).
- Bottlenecks: We discovered that in high-FPS computer vision, the bottleneck is often not the GPU compute, but the CPU-bound video decoding and memory copying between RAM and VRAM.
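A small sketch of that gamma-correction step (LUT-based, as is common with OpenCV; the gamma value is illustrative and would be tuned per camera):

```python
# Minimal LUT-based gamma correction for dark feeds. Under this convention
# gamma > 1 brightens the frame; 1.5 is an illustrative value.
import cv2
import numpy as np

def gamma_correct(frame, gamma=1.5):
    inv_gamma = 1.0 / gamma
    table = np.array([((i / 255.0) ** inv_gamma) * 255 for i in range(256)],
                     dtype=np.uint8)
    return cv2.LUT(frame, table)
```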
🚀 What's next for SecureVista
- Action Recognition: Moving beyond bounding boxes to Video Vision Transformers (ViViT) to detect complex interactions like fights.
- Federated Learning: Implementing a decentralized training loop where edge nodes update the global model weights without sharing privacy-sensitive video data.
- Blockchain Identity: Integrating a blockchain-based identity management system for tamper-proof access logs.