Inspiration
Most camera analytics systems have at least one of these problems:
- expensive
- over-engineered
- slow to customize
- dependent on ML engineers to operate
Small businesses can’t afford that. Even enterprises routinely deploy the wrong models — wasting compute, increasing latency, and locking themselves into bloated pipelines.
We asked a simple question:
Why can’t you just describe what you want monitored… and have the system build itself?
That became S.E.E.R — a prompt-to-deployment camera intelligence platform with built-in emergency response.
What it does
S.E.E.R turns natural language into a live, production-style camera pipeline.
Example prompt:
“Detect shoplifting in high-value aisles and call for help if a fight breaks out.”
S.E.E.R automatically:
- selects appropriate computer vision models
- wires them into an end-to-end pipeline
- deploys real-time inference workers
- configures alerts and analytics
- enables emergency escalation via outbound voice calls
Supported capabilities:
- camera ingestion & stream management
- model discovery + selection
- detection, tracking, OCR, activity recognition
- dwell time & behavior analytics
- event routing (webhooks, notifications, clips)
- emergency escalation (Twilio outbound calling)
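As an illustration of the dwell-time capability, here is a minimal sketch of how time-in-zone could be accumulated per tracked object. The `Observation` type, the `in_zone` flag, and the `max_gap` parameter are assumptions made for this example, not S.E.E.R's actual data model.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    track_id: int     # ID assigned by the tracker (hypothetical field)
    timestamp: float  # seconds since stream start
    in_zone: bool     # True if the object is inside the zone of interest


def dwell_times(observations, max_gap=1.0):
    """Return {track_id: seconds spent in the zone}.

    Time is only counted between consecutive in-zone observations of the
    same track that are at most `max_gap` seconds apart, so a track that
    leaves the zone (or is lost) does not keep accumulating time.
    """
    last_seen = {}  # track_id -> timestamp of the previous in-zone observation
    totals = {}     # track_id -> accumulated dwell time in seconds
    for obs in sorted(observations, key=lambda o: o.timestamp):
        if not obs.in_zone:
            # Leaving the zone ends the current visit.
            last_seen.pop(obs.track_id, None)
            continue
        totals.setdefault(obs.track_id, 0.0)
        prev = last_seen.get(obs.track_id)
        if prev is not None and obs.timestamp - prev <= max_gap:
            totals[obs.track_id] += obs.timestamp - prev
        last_seen[obs.track_id] = obs.timestamp
    return totals
```

A dwell-time alert could then be a simple threshold on the returned totals, for example flagging any track that stays in a high-value aisle longer than a configured number of seconds.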
Use cases include:
- Retail – shoplifting + violence detection
- Factories – worker injury detection
- Parking lots – collision / hit-and-run detection
- Healthcare rooms – fall & immobility detection
How we built it
We designed S.E.E.R as an orchestration system:
- Prompt → structured monitoring plan
- Task decomposition (detection, tracking, analytics, alerts)
- Model discovery for each task
- Cost/latency/accuracy-aware model selection
- Pipeline compilation
- Deployment across inference workers
- Real-time event routing + escalation actions
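The model selection and pipeline compilation steps above can be sketched roughly as follows. This is a simplified illustration under stated assumptions: the `ModelCandidate` fields, the hard latency and cost budgets, and the "most accurate model within budget" rule are choices made for the example, and the real selection logic may weigh these factors differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelCandidate:
    name: str             # hypothetical model identifier
    task: str             # e.g. "detection", "tracking", "ocr"
    accuracy: float       # 0..1, higher is better
    latency_ms: float     # per-frame inference latency
    cost_per_hour: float  # estimated compute cost


def select_model(candidates, task, max_latency_ms, max_cost_per_hour):
    """Pick the most accurate candidate for `task` that fits both budgets.

    Ties on accuracy are broken by lower latency, then lower cost.
    Returns None when no candidate satisfies the constraints.
    """
    eligible = [
        c for c in candidates
        if c.task == task
        and c.latency_ms <= max_latency_ms
        and c.cost_per_hour <= max_cost_per_hour
    ]
    if not eligible:
        return None
    return max(eligible, key=lambda c: (c.accuracy, -c.latency_ms, -c.cost_per_hour))


def compile_plan(tasks, candidates, max_latency_ms, max_cost_per_hour):
    """Map each task in the monitoring plan to a selected model name (or None)."""
    plan = {}
    for task in tasks:
        chosen = select_model(candidates, task, max_latency_ms, max_cost_per_hour)
        plan[task] = chosen.name if chosen else None
    return plan
```

A task that maps to `None` signals that no known model fits the budget, which is a natural point to relax a constraint or report the gap back to the user.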
We used modular MCP servers for:
- camera tools
- model tools
- inference tools
- alerting tools
- analytics tools
- storage tools
- voice escalation tools
The result: intent → deployed system in minutes.
Challenges we ran into
Live emergency calling
Connecting LiveKit to Twilio required:
- SIP trunking setup
- call routing & termination logic
- real-time audio bridging
- handling silent failures from misconfiguration
This was one of the hardest parts to get stable.
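One way to guard against silent failures is to treat any call that has not been confirmed as answered within a deadline as failed, then move on to the next contact. The sketch below assumes this approach. `place_call` and `get_status` are hypothetical callables standing in for whatever telephony client is used, and the status strings are illustrative rather than taken from the Twilio or LiveKit APIs.

```python
import time


def escalate(contacts, place_call, get_status, timeout_s=20.0, poll_s=1.0,
             clock=time.monotonic, sleep=time.sleep):
    """Try each contact in order; return the first number that answers, else None.

    place_call(number) -> call_id and get_status(call_id) -> str are injected
    (hypothetical) functions. A call that never reports "answered" before the
    deadline is treated as failed, which catches misconfigurations that raise
    no error but also never connect.
    """
    for number in contacts:
        try:
            call_id = place_call(number)
        except Exception:
            continue  # explicit failure: try the next contact
        deadline = clock() + timeout_s
        while clock() < deadline:
            status = get_status(call_id)
            if status == "answered":
                return number
            if status in ("failed", "busy", "no-answer"):
                break  # terminal status: stop polling this call
            sleep(poll_s)
    return None
```

Injecting `clock` and `sleep` keeps the timeout logic testable without waiting in real time.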
Model deployment at runtime
Selecting models is easy. Running them reliably is not.
We had to solve:
- dependency packaging
- GPU scheduling
- inference latency under multiple streams
- scaling workers dynamically
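As a rough illustration of the scheduling problem, the sketch below assigns each camera stream to the least-loaded worker and reports streams that do not fit. The per-worker stream cap and the worker and stream names are assumptions made for the example, and a real scheduler would also account for GPU memory and per-model latency.

```python
import heapq


def assign_streams(stream_ids, worker_ids, max_streams_per_worker):
    """Greedy least-loaded assignment of streams to inference workers.

    Returns (assignment, unassigned): a {stream_id: worker_id} dict and a list
    of streams that could not be placed because every worker is at capacity.
    """
    # Min-heap of (current_load, worker_id); the least-loaded worker is on top.
    heap = [(0, w) for w in worker_ids]
    heapq.heapify(heap)
    assignment, unassigned = {}, []
    for stream in stream_ids:
        if not heap or heap[0][0] >= max_streams_per_worker:
            unassigned.append(stream)
            continue
        load, worker = heapq.heappop(heap)
        assignment[stream] = worker
        heapq.heappush(heap, (load + 1, worker))
    return assignment, unassigned
```

A non-empty `unassigned` list is a simple signal that more workers need to be started.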
Model documentation ingestion
We used Firecrawl to scrape model documentation, but ran into:
- limited credits
- inconsistent formatting
- oversized pages
We added caching, targeted crawling, and fallback strategies.
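A minimal sketch of the caching, size-capping, and fallback ideas is shown below. The `scrape` and `fallback` arguments are hypothetical callables that take a URL and return text (they do not represent the Firecrawl API), and the on-disk JSON cache layout and `max_chars` limit are assumptions for illustration.

```python
import hashlib
import json
from pathlib import Path


def fetch_docs(url, scrape, fallback, cache_dir, max_chars=20000):
    """Return documentation text for `url`, using a disk cache to save credits.

    scrape(url) -> str is the primary (credit-consuming) fetcher and
    fallback(url) -> str is used if it raises. Oversized pages are truncated
    to `max_chars` before being cached.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    key = hashlib.sha256(url.encode("utf-8")).hexdigest()
    path = cache_dir / f"{key}.json"

    # Cache hit: skip the scraper entirely.
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))["text"]

    try:
        text = scrape(url)
    except Exception:
        text = fallback(url)

    text = text[:max_chars]  # cap oversized pages
    path.write_text(json.dumps({"url": url, "text": text}), encoding="utf-8")
    return text
```

Keying the cache on a hash of the URL means repeated lookups for the same model page cost one scrape at most.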
Accomplishments that we're proud of
- Prompt → deployed camera pipeline working end-to-end
- Automatic model selection based on cost & latency
- Real-time multi-service orchestration
- Working emergency voice escalation (actual phone calls)
- Fully built during the hackathon timeframe
What we learned
- Orchestration is harder than model accuracy
- Real-time systems fail in non-obvious ways
- Emergency workflows require extreme reliability
- Most businesses don’t need “better models” — they need better deployment
What's next for S.E.E.R
- Behavior-based shoplifting detection (concealment gestures, distraction teams, rapid exits)
- Human-in-the-loop verification (confirm / dismiss alerts before escalation)
- Multi-camera re-identification (track behavior across entrances, aisles, and exits)
- Cost-aware auto-optimization (continuously swap models as usage patterns change)
Built With
- claudecode
- fastapi
- huggingface
- livekit
- mcps
- postgresql
- python
- vlms