A multi-agent system for maintenance operations, featuring:
- Vision Agent
  - Multi-camera feed support with Nemotron multimodal reasoning
  - Real-time frame analysis for safety violations, equipment issues, and technician errors
  - Contextual memory for each camera feed
- Supervisor Agent
  - Monitors all feeds and summarizes task progress
  - Updates tickets automatically based on observations
  - Provides system-wide oversight
- RAG Agent
  - Retrieves relevant SOPs, safety docs, and repair guides
  - Dynamic document search based on detected issues
  - Context-aware guidance generation
- Coordinator Agent
  - Prioritizes tickets based on urgency and technician availability
  - Intelligent routing based on skills and location
  - Automatic assignment optimization
- Voice Agent
  - Natural, low-frequency verbal feedback
  - ReAct-based decision making to avoid annoyance
  - Focuses on safety alerts and critical corrections
- ReAct Agent
  - Automatic ticket creation when errors are detected
  - Part reordering workflows
  - Self-healing system responses
- Vision Agent - Analyzes camera frames using Nemotron
- ReAct Agent - Implements the Reason → Act → Observe workflow
- Voice Agent - Manages ElevenLabs TTS with smart triggering
- RAG Agent - Retrieves documentation dynamically
- Coordinator Agent - Assigns and prioritizes work
- Supervisor Agent - Monitors and summarizes operations
- Tickets - Work orders with priority, status, and metadata
- Technicians - Skills, status, and current assignments
- Camera Feeds - Stream URLs and active monitoring
- Vision Analysis - AI-detected issues and confidence scores
- Agent Memory - Shared state and conversation history
- Documents - SOPs, safety guides, troubleshooting steps
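For orientation, the entities above could be typed roughly as follows. This is a hypothetical sketch: every field name and the `topIssue` helper are assumptions for illustration, not the project's actual Convex schema.

```typescript
// Hypothetical shapes for the stored entities; field names are illustrative only.
type Severity = "low" | "medium" | "high" | "critical";

interface Ticket {
  id: string;
  title: string;
  priority: number; // assumed: higher = more urgent
  status: "open" | "assigned" | "in_progress" | "resolved";
  assignedTechnicianId?: string;
  metadata: Record<string, string>;
}

interface Technician {
  id: string;
  name: string;
  skills: string[];
  status: "available" | "busy" | "offline";
  currentTicketIds: string[];
}

interface CameraFeed {
  id: string;
  streamUrl: string; // e.g. a YouTube URL
  active: boolean;
}

interface Issue {
  description: string;
  severity: Severity;
  confidence: number; // 0..1
}

interface VisionAnalysis {
  cameraId: string;
  issues: Issue[];
  timestamp: number; // ms since epoch
}

interface AgentMemory {
  agent: string;
  sharedState: Record<string, unknown>;
  history: { role: string; content: string; timestamp: number }[];
}

interface DocumentEntry {
  id: string;
  kind: "sop" | "safety_guide" | "troubleshooting";
  title: string;
  content: string;
}

// Example helper: the highest-confidence issue in an analysis, if any.
function topIssue(analysis: VisionAnalysis): Issue | undefined {
  return analysis.issues.reduce<Issue | undefined>(
    (best, issue) => (best === undefined || issue.confidence > best.confidence ? issue : best),
    undefined,
  );
}
```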
`POST /api/analyze-frame`: send camera frames for vision analysis, with caching and batching. Request body:

```json
{
  "cameraId": "camera_id",
  "frameData": "base64_encoded_image",
  "priority": 5
}
```

Features:
- Automatic deduplication using SHA-256 hashing
- Smart caching (24-hour cache lifetime)
- Batch processing (5 frames at a time)
- Similarity detection (skips near-identical frames)
- Priority queue for urgent frames
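The intake features above can be sketched in memory as follows, assuming frames are keyed by the SHA-256 of their base64 payload and that a higher `priority` number means more urgent. The class `FrameIntake` and its methods are hypothetical; near-identical-frame similarity detection is not shown, and a real deployment would keep this state in the database rather than in one process. The optional `cameraId` argument to `takeBatch` and the `stats()` method loosely correspond to the manual batch trigger and statistics described below.

```typescript
import { createHash } from "node:crypto";

// Constants mirror the README: 24-hour cache lifetime, batches of 5 frames.
const CACHE_TTL_MS = 24 * 60 * 60 * 1000;
const BATCH_SIZE = 5;

interface QueuedFrame {
  cameraId: string;
  frameData: string; // base64-encoded image
  priority: number; // assumed: higher = more urgent
  hash: string; // SHA-256 of frameData, used for deduplication
}

class FrameIntake {
  private cache = new Map<string, { result: unknown; expiresAt: number }>();
  private queue: QueuedFrame[] = [];
  private queuedHashes = new Set<string>(); // frames waiting for analysis
  hits = 0;
  misses = 0;

  static hash(frameData: string): string {
    return createHash("sha256").update(frameData).digest("hex");
  }

  // Returns a cached result for an identical frame, or queues the frame and returns undefined.
  submit(cameraId: string, frameData: string, priority: number, now = Date.now()): unknown {
    const hash = FrameIntake.hash(frameData);
    const cached = this.cache.get(hash);
    if (cached && cached.expiresAt > now) {
      this.hits++;
      return cached.result;
    }
    this.misses++;
    if (!this.queuedHashes.has(hash)) {
      this.queuedHashes.add(hash); // duplicates already in the queue are dropped
      this.queue.push({ cameraId, frameData, priority, hash });
    }
    return undefined;
  }

  // Removes and returns up to BATCH_SIZE frames, highest priority first,
  // optionally restricted to one camera.
  takeBatch(cameraId?: string): QueuedFrame[] {
    const batch = this.queue
      .filter((f) => cameraId === undefined || f.cameraId === cameraId)
      .sort((a, b) => b.priority - a.priority)
      .slice(0, BATCH_SIZE);
    const taken = new Set(batch);
    this.queue = this.queue.filter((f) => !taken.has(f));
    return batch;
  }

  // Caches an analysis result so identical frames are served from cache for 24 hours.
  complete(frame: QueuedFrame, result: unknown, now = Date.now()): void {
    this.cache.set(frame.hash, { result, expiresAt: now + CACHE_TTL_MS });
    this.queuedHashes.delete(frame.hash);
  }

  stats(): { cacheHitRate: number; queueDepth: number } {
    const total = this.hits + this.misses;
    return { cacheHitRate: total === 0 ? 0 : this.hits / total, queueDepth: this.queue.length };
  }
}
```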
Trigger the coordinator to assign pending tickets to available technicians.
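A greedy version of this assignment step might look like the sketch below. It assumes each ticket lists its `requiredSkills` and that a technician takes at most one ticket per pass; all names are hypothetical, and location-based routing is left out.

```typescript
interface PendingTicket {
  id: string;
  priority: number; // assumed: higher = more urgent
  requiredSkills: string[];
}

interface AvailableTech {
  id: string;
  skills: string[];
  available: boolean;
}

// Greedy assignment: most urgent tickets first, each to the first free
// technician who has every required skill. Returns ticketId -> technicianId.
function assignTickets(tickets: PendingTicket[], techs: AvailableTech[]): Map<string, string> {
  const assignments = new Map<string, string>();
  const free = techs.filter((t) => t.available);
  const byUrgency = [...tickets].sort((a, b) => b.priority - a.priority);
  for (const ticket of byUrgency) {
    const idx = free.findIndex((t) => ticket.requiredSkills.every((s) => t.skills.includes(s)));
    if (idx === -1) continue; // nobody qualified is free; ticket stays pending
    assignments.set(ticket.id, free[idx].id);
    free.splice(idx, 1); // one ticket per technician in this pass
  }
  return assignments;
}
```

Greedy matching is easy to reason about but not globally optimal; a matching or scoring approach could replace it if assignments conflict often.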
Manually trigger batch processing for a specific camera:

```json
{
  "cameraId": "camera_id"
}
```

Get frame processing statistics:
- Cache hit rate
- Queue depth
- Batch processing status
Environment variables:
- `OPENROUTER_API_KEY` - Required, for Nemotron vision analysis
- `ELEVENLABS_API_KEY` - Optional, for voice synthesis
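A startup check matching these two variables could be as simple as the following hypothetical `loadConfig` helper, which fails fast without the OpenRouter key and disables voice alerts when no ElevenLabs key is set.

```typescript
interface AppConfig {
  openRouterApiKey: string;
  elevenLabsApiKey?: string;
  voiceEnabled: boolean;
}

// Hypothetical config loader: OPENROUTER_API_KEY is mandatory,
// ELEVENLABS_API_KEY is optional and only toggles voice synthesis.
function loadConfig(env: Record<string, string | undefined>): AppConfig {
  const openRouterApiKey = env.OPENROUTER_API_KEY;
  if (!openRouterApiKey) {
    throw new Error("OPENROUTER_API_KEY is required for Nemotron vision analysis");
  }
  const elevenLabsApiKey = env.ELEVENLABS_API_KEY || undefined; // treat "" as unset
  return { openRouterApiKey, elevenLabsApiKey, voiceEnabled: elevenLabsApiKey !== undefined };
}

// Usage in Node.js: const config = loadConfig(process.env);
```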
- Add Technicians - Create technician profiles with skills
- Add Cameras - Register camera feeds with stream URLs (YouTube URLs work!)
- Add Documentation - Upload SOPs and safety guides
- Monitor Dashboard - View tickets, technicians, and camera status
The repository includes frame extraction scripts (Python and Node.js) that pull frames from video and send them to the analysis endpoint, which handles caching and batching.
Python extraction script:

```bash
pip install opencv-python requests yt-dlp pillow
python scripts/extract-frames.py \
  --youtube "https://youtube.com/watch?v=..." \
  --camera-id "k17abc123..." \
  --api-url "https://accurate-marlin-326.convex.site" \
  --fps 0.5 \
  --priority 5
```

Node.js extraction script:

```bash
npm install fluent-ffmpeg axios
node scripts/extract-frames.js \
  --video "datacenter-footage.mp4" \
  --camera-id "k17abc123..." \
  --api-url "https://accurate-marlin-326.convex.site"
```

Send a frame directly with curl:

```bash
curl -X POST https://accurate-marlin-326.convex.site/api/analyze-frame \
  -H "Content-Type: application/json" \
  -d '{
    "cameraId": "your_camera_id_here",
    "frameData": "base64_encoded_image_data",
    "priority": 5
  }'
```

Detailed guides:
- `QUICK_START_FRAMES.md` - Get started in 5 minutes
- `FRAME_PROCESSING.md` - Complete documentation
YouTube Videos: You can add YouTube URLs as camera feeds. The video will play in the Vision Analysis modal. Use the extraction scripts to send frames for AI analysis.
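The curl request above can also be issued from TypeScript. This minimal sketch assumes Node.js 18 or later (global `fetch` and `Buffer`); `buildAnalyzeFrameRequest` and `sendFrame` are hypothetical helpers, and the endpoint path and payload shape are taken from the curl example.

```typescript
// Builds the same POST request as the curl example (pure, no network access).
function buildAnalyzeFrameRequest(apiUrl: string, cameraId: string, image: Uint8Array, priority = 5) {
  const body = JSON.stringify({
    cameraId,
    frameData: Buffer.from(image).toString("base64"), // base64-encode the raw image bytes
    priority,
  });
  return {
    url: `${apiUrl.replace(/\/+$/, "")}/api/analyze-frame`, // tolerate a trailing slash
    init: { method: "POST", headers: { "Content-Type": "application/json" }, body },
  };
}

// Sends one frame and returns the parsed JSON response.
async function sendFrame(apiUrl: string, cameraId: string, image: Uint8Array, priority = 5): Promise<unknown> {
  const { url, init } = buildAnalyzeFrameRequest(apiUrl, cameraId, image, priority);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`analyze-frame failed with HTTP ${res.status}`);
  return res.json();
}
```

For example, `await sendFrame("https://accurate-marlin-326.convex.site", "your_camera_id_here", readFileSync("frame.jpg"))`, with `readFileSync` imported from `node:fs` and `frame.jpg` as a placeholder path.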
When the vision agent detects critical issues:
- Reason - AI analyzes severity and required action
- Act - Creates ticket and/or sends voice alert
- Observe - Logs outcome and updates system state
The system uses a ReAct loop for autonomous decision-making:
- REASON: Analyze detected issues and determine severity
- ACT: Create tickets, send voice alerts, or request assistance
- OBSERVE: Monitor outcomes and adjust future actions
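One Reason → Act → Observe step could be structured as below. This is an illustrative sketch under stated assumptions: the rule-based `reason` function stands in for whatever model-driven reasoning the agent actually performs, and the severity-to-priority mapping and action names are invented for the example.

```typescript
type Severity = "low" | "medium" | "high" | "critical";

interface Finding {
  cameraId: string;
  description: string;
  severity: Severity;
  safetyViolation: boolean;
}

type Action =
  | { kind: "create_ticket"; priority: number }
  | { kind: "voice_alert" }
  | { kind: "log_only" };

interface Observation {
  finding: Finding;
  actions: Action[];
  timestamp: number;
}

type Handlers = Record<Action["kind"], (action: Action, finding: Finding) => void>;

// Assumed mapping from severity to ticket priority (higher = more urgent).
const PRIORITY: Record<Severity, number> = { low: 1, medium: 3, high: 7, critical: 10 };

// REASON: decide which actions a finding warrants.
function reason(f: Finding): Action[] {
  const actions: Action[] = [];
  if (f.safetyViolation || f.severity === "high" || f.severity === "critical") {
    actions.push({ kind: "create_ticket", priority: PRIORITY[f.severity] });
  }
  if (f.safetyViolation || f.severity === "critical") {
    actions.push({ kind: "voice_alert" });
  }
  return actions.length > 0 ? actions : [{ kind: "log_only" }];
}

// ACT, then OBSERVE: run each action through injected handlers and record the
// outcome so later decisions (such as alert throttling) can consult history.
function reactStep(f: Finding, handlers: Handlers, history: Observation[], now = Date.now()): Observation {
  const actions = reason(f);
  for (const a of actions) handlers[a.kind](a, f);
  const observation: Observation = { finding: f, actions, timestamp: now };
  history.push(observation);
  return observation;
}
```

Injecting the handlers keeps the decision logic testable without touching the ticket store or the TTS service.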
Voice alerts are triggered only when both conditions hold:
- A safety violation or critical error is detected
- Fewer than 2 alerts were sent in the past 5 minutes (to avoid annoyance)
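This alert rule amounts to a sliding-window rate limit. A minimal in-memory sketch, assuming the caller already knows whether an issue is a safety violation or critical; the class name `VoiceAlertGate` is hypothetical, and the ElevenLabs call itself is omitted.

```typescript
// Window and limit mirror the README: at most 2 alerts per 5 minutes.
const WINDOW_MS = 5 * 60 * 1000;
const MAX_ALERTS_PER_WINDOW = 2;

class VoiceAlertGate {
  private sent: number[] = []; // timestamps (ms) of alerts already spoken

  // Returns true (and records the alert) only for safety/critical issues
  // when fewer than MAX_ALERTS_PER_WINDOW alerts fall inside the window.
  shouldSpeak(issue: { safetyViolation: boolean; critical: boolean }, now = Date.now()): boolean {
    if (!issue.safetyViolation && !issue.critical) return false;
    this.sent = this.sent.filter((t) => now - t < WINDOW_MS); // forget alerts older than 5 minutes
    if (this.sent.length >= MAX_ALERTS_PER_WINDOW) return false;
    this.sent.push(now);
    return true;
  }
}
```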
The modular design allows easy addition of:
- New camera feeds (just add to database)
- Additional agent types (extend agent framework)
- Custom workflows (modify ReAct logic)
- New voice channels (add to voice agent)
Built with:
- Convex - Realtime database and backend
- React - Frontend dashboard
- OpenRouter - Nemotron vision AI
- ElevenLabs - Text-to-speech
- TypeScript - Type-safe development