Technician Vision AI - Multi-Agent Maintenance System

A multi-agent system for maintenance operations. It watches camera feeds, retrieves relevant documentation, assigns tickets to technicians, and gives spoken guidance.

Core Features

🎥 Vision Agents

Multi-camera feed support with Nemotron multimodal reasoning
Real-time frame analysis for safety violations, equipment issues, and technician errors
Contextual memory for each camera feed

🎯 Supervisor Agent

Monitors all feeds and summarizes task progress
Updates tickets automatically based on observations
Provides system-wide oversight

📚 RAG Agent

Retrieves relevant SOPs, safety docs, and repair guides
Dynamic document search based on detected issues
Context-aware guidance generation

🎛️ Coordinator Agent

Prioritizes tickets based on urgency and technician availability
Intelligent routing based on skills and location
Automatic assignment optimization

🔊 Voice Guidance (ElevenLabs)

Natural, infrequent verbal feedback
ReAct-based decision making to avoid annoyance
Focuses on safety alerts and critical corrections

🤖 Autonomous Actions

Automatic ticket creation when errors are detected
Part reordering workflows
Self-healing system responses

Architecture

Agent Types

Vision Agent - Analyzes camera frames using Nemotron
ReAct Agent - Implements Reason→Act→Observe workflow
Voice Agent - Manages ElevenLabs TTS with smart triggering
RAG Agent - Retrieves documentation dynamically
Coordinator Agent - Assigns and prioritizes work
Supervisor Agent - Monitors and summarizes operations

Data Models

Tickets - Work orders with priority, status, and metadata
Technicians - Skills, status, and current assignments
Camera Feeds - Stream URLs and active monitoring
Vision Analysis - AI-detected issues and confidence scores
Agent Memory - Shared state and conversation history
Documents - SOPs, safety guides, troubleshooting steps

API Endpoints

`/api/analyze-frame` (POST)

Send camera frames for vision analysis with intelligent caching and batching:

{
  "cameraId": "camera_id",
  "frameData": "base64_encoded_image",
  "priority": 5
}
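A minimal sketch of building this request body from raw image bytes. The helper name `build_frame_payload` is hypothetical, and it assumes `frameData` takes plain base64 text rather than a data URL, which the endpoint description does not specify.

```python
import base64
import json


def build_frame_payload(camera_id: str, image_bytes: bytes, priority: int = 5) -> str:
    """Return a JSON body shaped like the /api/analyze-frame example."""
    # Assumption: plain base64 text, with no "data:image/..." prefix.
    frame_data = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps(
        {"cameraId": camera_id, "frameData": frame_data, "priority": priority}
    )


if __name__ == "__main__":
    # Placeholder bytes stand in for a real JPEG or PNG frame.
    print(build_frame_payload("camera_id", b"placeholder image bytes"))
```

The returned string can be sent as the body of a POST request with `Content-Type: application/json`.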

Features:

✅ Automatic deduplication using SHA-256 hashing
✅ Smart caching (24-hour cache lifetime)
✅ Batch processing (5 frames at a time)
✅ Similarity detection (skips near-identical frames)
✅ Priority queue for urgent frames
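The deduplication and caching features above can be illustrated with a small in-memory sketch. This is not the backend's implementation: the class `FrameCache` and its methods are hypothetical, and only the SHA-256 keying and the 24-hour lifetime come from the list above.

```python
import hashlib
import time
from typing import Dict, Optional, Tuple

CACHE_TTL_SECONDS = 24 * 60 * 60  # 24-hour cache lifetime, as listed above


class FrameCache:
    """Hypothetical in-memory cache keyed by the SHA-256 digest of a frame."""

    def __init__(self, ttl_seconds: float = CACHE_TTL_SECONDS) -> None:
        self.ttl_seconds = ttl_seconds
        self._entries: Dict[str, Tuple[float, dict]] = {}  # digest -> (stored_at, result)

    @staticmethod
    def digest(frame_bytes: bytes) -> str:
        return hashlib.sha256(frame_bytes).hexdigest()

    def get(self, frame_bytes: bytes, now: Optional[float] = None) -> Optional[dict]:
        """Return the cached analysis for identical bytes, or None if absent or expired."""
        now = time.time() if now is None else now
        key = self.digest(frame_bytes)
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, result = entry
        if now - stored_at > self.ttl_seconds:
            del self._entries[key]  # expired: drop it so the frame is analyzed again
            return None
        return result

    def put(self, frame_bytes: bytes, result: dict, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        self._entries[self.digest(frame_bytes)] = (now, result)
```

A hash match only catches byte-identical frames. Skipping near-identical frames needs a separate similarity measure, which this sketch does not attempt.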

`/api/assign-tickets` (POST)

Trigger coordinator to assign pending tickets to available technicians.
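To make the assignment step concrete, here is a rough sketch of one possible strategy: a greedy match of the most urgent tickets to available technicians with the right skill. The field names, the "larger number means more urgent" convention, and the one-ticket-per-technician rule are all assumptions; the coordinator's real logic (including location-based routing) may differ.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Ticket:
    ticket_id: str
    priority: int        # assumption: larger number = more urgent
    required_skill: str  # hypothetical field


@dataclass
class Technician:
    name: str
    skills: Set[str] = field(default_factory=set)
    available: bool = True


def assign_tickets(tickets: List[Ticket], technicians: List[Technician]) -> Dict[str, str]:
    """Greedily map ticket_id -> technician name, most urgent ticket first."""
    assignments: Dict[str, str] = {}
    for ticket in sorted(tickets, key=lambda t: t.priority, reverse=True):
        for tech in technicians:
            if tech.available and ticket.required_skill in tech.skills:
                assignments[ticket.ticket_id] = tech.name
                tech.available = False  # one ticket per technician in this sketch
                break
    return assignments
```

Tickets with no matching available technician are simply left unassigned, which mirrors the idea of tickets staying pending until someone suitable is free.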

`/api/process-batch` (POST)

Manually trigger batch processing for a specific camera:

{
  "cameraId": "camera_id"
}
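As an illustration of how a priority queue and a batch size of 5 could fit together, the sketch below drains the most urgent frames first. The class `FrameQueue` is hypothetical, and it assumes a larger `priority` value means more urgent, which the API description does not state.

```python
import heapq
import itertools
from typing import List, Tuple

BATCH_SIZE = 5  # frames per batch, as listed under /api/analyze-frame


class FrameQueue:
    """Hypothetical per-camera queue that releases frames in priority order."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        self._counter = itertools.count()  # tie-breaker keeps equal priorities in arrival order

    def push(self, frame_id: str, priority: int) -> None:
        # heapq is a min-heap, so negate the priority to pop the largest first.
        heapq.heappush(self._heap, (-priority, next(self._counter), frame_id))

    def next_batch(self, size: int = BATCH_SIZE) -> List[str]:
        """Remove and return up to `size` frame ids, most urgent first."""
        batch: List[str] = []
        while self._heap and len(batch) < size:
            _, _, frame_id = heapq.heappop(self._heap)
            batch.append(frame_id)
        return batch

    def __len__(self) -> int:
        return len(self._heap)
```

Calling `next_batch()` repeatedly until the queue is empty corresponds loosely to triggering batch processing by hand.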

`/api/cache-stats` (GET)

Get frame processing statistics:

Cache hit rate
Queue depth
Batch processing status
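The response shape of this endpoint is not documented here, so the following is only a sketch of how a hit rate is typically derived from raw counters. The `FrameStats` class and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FrameStats:
    """Hypothetical counters behind a cache-stats style report."""

    cache_hits: int = 0
    cache_misses: int = 0
    queue_depth: int = 0

    @property
    def hit_rate(self) -> float:
        """Fraction of lookups served from cache; 0.0 when nothing has been looked up."""
        total = self.cache_hits + self.cache_misses
        return self.cache_hits / total if total else 0.0
```

A high hit rate suggests the camera is mostly producing frames that were already analyzed, so fewer vision calls are needed.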

Environment Variables

OPENROUTER_API_KEY - Required. For Nemotron vision analysis
ELEVENLABS_API_KEY - Optional. For voice synthesis
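A helper script could check these variables at startup, along the lines of the sketch below. The function `load_api_keys` is hypothetical and is not part of the project; it only reflects that the OpenRouter key is needed for vision analysis while the ElevenLabs key is optional.

```python
import os
from typing import Dict, Mapping, Optional


def load_api_keys(env: Optional[Mapping[str, str]] = None) -> Dict[str, Optional[str]]:
    """Read the two API keys, failing fast if the vision key is missing."""
    env = os.environ if env is None else env
    openrouter_key = env.get("OPENROUTER_API_KEY")
    if not openrouter_key:
        raise RuntimeError("OPENROUTER_API_KEY is required for vision analysis")
    return {
        "openrouter": openrouter_key,
        # Voice synthesis is optional, so a missing key is reported as None.
        "elevenlabs": env.get("ELEVENLABS_API_KEY") or None,
    }
```

Passing a plain dictionary as `env` makes the check easy to exercise without touching the real environment.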

Usage

Add Technicians - Create technician profiles with skills
Add Cameras - Register camera feeds with stream URLs (YouTube URLs work!)
Add Documentation - Upload SOPs and safety guides
Monitor Dashboard - View tickets, technicians, and camera status

Camera Integration & Frame Extraction

The repository includes Python and Node.js scripts that extract frames from a video and send them to the API, which caches and batches them.

🚀 Quick Start (Python)

pip install opencv-python requests yt-dlp pillow

python scripts/extract-frames.py \
  --youtube "https://youtube.com/watch?v=..." \
  --camera-id "k17abc123..." \
  --api-url "https://accurate-marlin-326.convex.site" \
  --fps 0.5 \
  --priority 5
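Assuming `--fps` is the rate at which frames are sampled from the video (the script's actual behavior is not described here), `--fps 0.5` would mean one frame every two seconds. The hypothetical helper below shows that arithmetic for a video with a known native frame rate.

```python
from typing import List


def sample_frame_indices(total_frames: int, video_fps: float, target_fps: float) -> List[int]:
    """Return the indices of frames to keep when downsampling to target_fps."""
    if video_fps <= 0 or target_fps <= 0:
        raise ValueError("frame rates must be positive")
    # Keep every `step`-th frame; never less than 1 so upsampling requests keep all frames.
    step = max(1, round(video_fps / target_fps))
    return list(range(0, total_frames, step))
```

For a 10-second clip at 30 fps (300 frames), sampling at 0.5 fps keeps every 60th frame, so five frames are sent for analysis.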

🚀 Quick Start (Node.js)

npm install fluent-ffmpeg axios

node scripts/extract-frames.js \
  --video "datacenter-footage.mp4" \
  --camera-id "k17abc123..." \
  --api-url "https://accurate-marlin-326.convex.site"

Manual API Call

curl -X POST https://accurate-marlin-326.convex.site/api/analyze-frame \
  -H "Content-Type: application/json" \
  -d '{
    "cameraId": "your_camera_id_here",
    "frameData": "base64_encoded_image_data",
    "priority": 5
  }'

📖 Detailed Guides:

QUICK_START_FRAMES.md - Get started in 5 minutes
FRAME_PROCESSING.md - Complete documentation

YouTube Videos: You can add YouTube URLs as camera feeds. The video will play in the Vision Analysis modal. Use the extraction scripts to send frames for AI analysis.

Automatic Ticket Creation

When the vision agent detects critical issues:

Reason - AI analyzes severity and required action
Act - Creates ticket and/or sends voice alert
Observe - Logs outcome and updates system state

ReAct Workflow

The system uses a ReAct loop for autonomous decision-making:

REASON: Analyze detected issues and determine severity
ACT: Create tickets, send voice alerts, or request assistance
OBSERVE: Monitor outcomes and adjust future actions
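The three stages can be sketched as plain functions. Everything here is illustrative: the severity labels, the `POLICY` table, and the handler names are assumptions, and the real agent presumably relies on model reasoning rather than a fixed lookup.

```python
from typing import Callable, Dict, List

# Hypothetical severity-to-action policy; the real thresholds are not documented here.
POLICY: Dict[str, List[str]] = {
    "critical": ["create_ticket", "voice_alert"],
    "warning": ["create_ticket"],
    "info": [],
}

Handler = Callable[[dict], str]


def reason(issue: dict) -> List[str]:
    """REASON: map the issue's severity to a list of action names."""
    return POLICY.get(issue.get("severity", "info"), [])


def act(actions: List[str], handlers: Dict[str, Handler], issue: dict) -> Dict[str, str]:
    """ACT: run each chosen action and collect its outcome."""
    return {name: handlers[name](issue) for name in actions}


def observe(log: List[dict], issue: dict, outcomes: Dict[str, str]) -> None:
    """OBSERVE: record what happened so later steps can consult it."""
    log.append({"issue": issue["description"], "outcomes": outcomes})


def react_step(issue: dict, handlers: Dict[str, Handler], log: List[dict]) -> Dict[str, str]:
    """Run one Reason -> Act -> Observe cycle for a single detected issue."""
    outcomes = act(reason(issue), handlers, issue)
    observe(log, issue, outcomes)
    return outcomes
```

Keeping the log separate from the handlers means the observe stage records even the cycles in which no action was taken.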

Voice Guidance Rules

Voice alerts are triggered only when:

A safety violation is detected
A critical error is observed
Fewer than 2 alerts were sent in the past 5 minutes (to avoid annoyance)
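One way to read these rules is "a safety violation or a critical error, and the rate limit has not been reached". The sketch below encodes that reading with a sliding window; the class `VoiceAlertGate` is hypothetical, and the way the conditions combine is an assumption rather than something the rules state outright.

```python
from collections import deque
from typing import Deque

MAX_ALERTS = 2          # at most 2 alerts per window, per the rules above
WINDOW_SECONDS = 5 * 60  # 5-minute window


class VoiceAlertGate:
    """Hypothetical gate that decides whether a voice alert may be sent."""

    def __init__(self) -> None:
        self._sent: Deque[float] = deque()  # timestamps of recently sent alerts

    def should_alert(self, is_safety_violation: bool, is_critical_error: bool, now: float) -> bool:
        # Forget alerts that have aged out of the 5-minute window.
        while self._sent and now - self._sent[0] >= WINDOW_SECONDS:
            self._sent.popleft()
        if not (is_safety_violation or is_critical_error):
            return False
        if len(self._sent) >= MAX_ALERTS:
            return False
        self._sent.append(now)  # record the alert so it counts toward the limit
        return True
```

Timestamps are passed in explicitly (in seconds) so the gate can be exercised without waiting for real time to pass.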

Scalability

The modular design allows easy addition of:

New camera feeds (just add to database)
Additional agent types (extend agent framework)
Custom workflows (modify ReAct logic)
New voice channels (add to voice agent)

Development

Built with:

Convex - Realtime database and backend
React - Frontend dashboard
OpenRouter - Nemotron vision AI
ElevenLabs - Text-to-speech
TypeScript - Type-safe development
