# YOLO-AI: High-Performance Real-Time Object Detection Framework

YOLO-AI is an end-to-end framework for real-time object detection, covering the full path from model training to production deployment. Built on YOLOv8, BentoML, and WebSocket streaming, it delivers high-quality detection results with minimal latency.
## Complete Workflow: Train → Convert → Deploy
| Stage | Description | Output |
|-------|-------------|--------|
| 1. Train | Train YOLOv8 model on custom dataset | `.pt` model weights |
| 2. Convert | Convert PyTorch model to ONNX format | `.onnx` optimized model |
| 3. Deploy | Deploy ONNX model to BentoML service | Production-ready API service |
```python
# Train YOLOv8 model on your dataset
from ultralytics import YOLO

model = YOLO('yolov8n.pt')
model.train(data='your_dataset.yaml', epochs=100, imgsz=640)
# Output: weights/best.pt
```
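The `your_dataset.yaml` file referenced above follows the Ultralytics dataset format. A hedged example using the PPE classes from later in this README (all paths are placeholders):

```yaml
path: datasets/ppe    # dataset root (placeholder)
train: images/train   # training images, relative to path
val: images/val       # validation images, relative to path
names:
  0: person
  1: helmet
  2: vest
  3: shoes
```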
```bash
# Convert trained model to ONNX for optimized inference
python -m src.quantization.onnx_model \
    --model_path weights/best.pt \
    --output_path weights/
```
```bash
# Deploy ONNX model to BentoML
python -m src.deploy.deploy \
    --onnx_path weights/best.onnx

# Build the BentoML service
bentoml build

# Serve locally
bentoml serve yolov8-service:latest --port 3000
```
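Clients of the deployed service need to match the model's fixed input size (640×640, per the training command above). Below is a pure-numpy letterboxing sketch; a real pipeline would use `cv2.resize`, and the exact normalization depends on how the model was exported:

```python
import numpy as np

def letterbox(frame: np.ndarray, size: int = 640) -> tuple[np.ndarray, float]:
    """Resize a BGR frame into a size x size square, preserving aspect ratio,
    and return the NCHW float tensor plus the scale factor used."""
    h, w = frame.shape[:2]
    scale = size / max(h, w)
    new_h, new_w = int(round(h * scale)), int(round(w * scale))
    # Nearest-neighbour resize in pure numpy (cv2.resize is typical in practice)
    rows = (np.arange(new_h) / scale).astype(int).clip(0, h - 1)
    cols = (np.arange(new_w) / scale).astype(int).clip(0, w - 1)
    resized = frame[rows][:, cols]
    canvas = np.full((size, size, 3), 114, dtype=np.uint8)  # grey padding, YOLO convention
    canvas[:new_h, :new_w] = resized
    # HWC uint8 -> NCHW float32 in [0, 1], as YOLOv8 ONNX exports expect
    tensor = canvas.transpose(2, 0, 1)[None].astype(np.float32) / 255.0
    return tensor, scale

tensor, scale = letterbox(np.zeros((480, 640, 3), dtype=np.uint8))
print(tensor.shape)  # (1, 3, 640, 640)
```

The returned `scale` is needed later to map detected boxes back to the original frame's coordinates.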
## Application Flow: UI → Backend → UI
```
┌─────────────┐
│  Frontend   │
│   (React)   │
└──────┬──────┘
       │
       │ 1. User Action (Upload/Stream)
       │
       ▼
┌─────────────────────────────────────┐
│         API Server (aiohttp)        │
│            Port: 8005               │
└──────┬───────────────────┬──────────┘
       │                   │
       │ 2. Process        │ 3. WebSocket
       │    Request        │    Stream
       │                   │
       ▼                   ▼
┌──────────────┐  ┌─────────────────┐
│   BentoML    │  │  YouTube Stream │
│   Service    │  │ (yt-dlp+ffmpeg) │
│  Port: 3000  │  │                 │
└──────┬───────┘  └────────┬────────┘
       │                   │
       │ 4. YOLO Inference │ 5. Frame Processing
       │  (ONNX Runtime)   │    (YOLO Detection)
       │                   │
       └──────────┬────────┘
                  │
                  │ 6. Annotated Frame
                  │
                  ▼
         ┌─────────────────┐
         │    WebSocket    │
         │    Response     │
         └────────┬────────┘
                  │
                  │ 7. Display Result
                  │
                  ▼
         ┌─────────────────┐
         │   Frontend UI   │
         │    (Canvas)     │
         └─────────────────┘
```
| Step | Component | Action | Data Flow |
|------|-----------|--------|-----------|
| 1 | Frontend | User uploads image/YouTube URL | Image/URL → API Server |
| 2 | API Server | Receives request, processes frame | Frame → BentoML Service |
| 3 | BentoML | YOLO inference on frame | Frame → Detections |
| 4 | API Server | Annotates frame with bounding boxes | Detections → Annotated Frame |
| 5 | WebSocket | Streams annotated frames | Annotated Frame → Frontend |
| 6 | Frontend | Displays result on canvas | Annotated Frame → UI Display |
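In step 3 the model returns raw predictions, which step 4 must convert into drawable corner boxes. The decoding sketch below assumes the standard YOLOv8 ONNX output layout (rows `cx, cy, w, h` followed by per-class scores); the repo's actual post-processing may differ, and NMS is omitted for brevity:

```python
import numpy as np

CONF_THRES = 0.20  # matches the CONF_THRES default in the configuration table

def decode(pred: np.ndarray, conf_thres: float = CONF_THRES) -> list[dict]:
    """Turn raw output of shape (4 + num_classes, N) into a list of
    {'box': (x1, y1, x2, y2), 'conf': float, 'class_id': int} dicts."""
    boxes = []
    for col in pred.T:  # one candidate per column
        cx, cy, w, h = col[:4]
        scores = col[4:]
        class_id = int(scores.argmax())
        conf = float(scores[class_id])
        if conf < conf_thres:
            continue  # drop low-confidence candidates
        x1, y1 = float(cx - w / 2), float(cy - h / 2)
        x2, y2 = float(cx + w / 2), float(cy + h / 2)
        boxes.append({"box": (x1, y1, x2, y2), "conf": conf, "class_id": class_id})
    return boxes

# Two fake candidates over 4 classes: one confident helmet, one below threshold
pred = np.array([[100.0, 300.0],   # cx
                 [100.0, 300.0],   # cy
                 [ 40.0,  40.0],   # w
                 [ 80.0,  80.0],   # h
                 [  0.0,   0.1],   # class 0: person
                 [  0.9,   0.1],   # class 1: helmet
                 [  0.0,   0.0],   # class 2: vest
                 [  0.0,   0.0]])  # class 3: shoes
print(decode(pred))  # [{'box': (80.0, 60.0, 120.0, 140.0), 'conf': 0.9, 'class_id': 1}]
```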
## Features

| Feature | Description | Component |
|---------|-------------|-----------|
| Image Upload | Upload a single image for detection | `ImageUpload.tsx` |
| YouTube Streaming | Stream YouTube videos with real-time detection | `VideoStreamUpload.tsx` |
| IP Camera | Connect to IP cameras for live detection | `IPCameraStream.tsx` |
### 1. Image Upload & Detection

- Drag-and-drop image upload interface
- Real-time annotation with bounding boxes
- Download annotated results
- Supported formats: JPEG, PNG
- Displays confidence scores and class labels
### 2. YouTube Video Streaming

- URL input for YouTube videos
- Real-time streaming via WebSocket
- Frame-by-frame detection processing
- FPS counter and detection statistics
- Play/Stop controls
### 3. IP Camera Streaming

- IP camera connection support
- Local device camera access
- Live streaming with real-time detection
- Connection status indicator
## Tech Stack

| Feature | Technology | Description |
|---------|------------|-------------|
| Model Serving | BentoML | Production-ready ML model serving |
| Real-time Streaming | WebSocket | Low-latency video streaming |
| Video Processing | yt-dlp + ffmpeg | YouTube stream extraction and decoding |
| Object Detection | YOLOv8 + ONNX | High-performance inference |
| Image Processing | OpenCV + PIL | Frame annotation and encoding |
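The yt-dlp + ffmpeg path typically works by resolving the YouTube page URL to a direct media URL with yt-dlp, then decoding that URL into raw frames with ffmpeg. The sketch below only builds the decode command; the flag set is illustrative, not necessarily this repo's exact invocation:

```python
def build_ffmpeg_cmd(stream_url: str, width: int, height: int) -> list[str]:
    """Build an ffmpeg command that decodes a direct media URL (as resolved
    by yt-dlp) into raw BGR frames on stdout, width*height*3 bytes per frame."""
    return [
        "ffmpeg",
        "-i", stream_url,           # input: direct media URL from yt-dlp
        "-f", "rawvideo",           # output format: raw frames, no container
        "-pix_fmt", "bgr24",        # OpenCV-compatible pixel layout
        "-vf", f"scale={width}:{height}",
        "pipe:1",                   # write to stdout for subprocess reading
    ]

cmd = build_ffmpeg_cmd("https://example.com/stream", 640, 360)
# In a service, frames would then be read from the pipe, e.g.:
#   proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
#   frame_bytes = proc.stdout.read(640 * 360 * 3)
```

Reading fixed-size chunks from the pipe avoids writing intermediate files and keeps latency low.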
## API Endpoints

| Endpoint | Method | Description | Input | Output |
|----------|--------|-------------|-------|--------|
| `/api/v1/upload` | POST | Upload image for detection | Image file | Annotated JPEG |
| `/ws/youtube` | WebSocket | YouTube video streaming | YouTube URL | Annotated frames (base64) |
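`/ws/youtube` sends annotated frames as base64 text, which keeps WebSocket messages text-safe (e.g. embeddable in JSON) at roughly 33% size overhead. A minimal sketch of the encode/decode pair; the actual message framing and field names are not specified here:

```python
import base64

def encode_frame(jpeg_bytes: bytes) -> str:
    """Encode an annotated JPEG frame as base64 text for a WebSocket message."""
    return base64.b64encode(jpeg_bytes).decode("ascii")

def decode_frame(payload: str) -> bytes:
    """Inverse operation, as a client performs before rendering the frame."""
    return base64.b64decode(payload)

frame = b"\xff\xd8\xff\xe0fake-jpeg-data"  # placeholder for real JPEG bytes
payload = encode_frame(frame)
assert decode_frame(payload) == frame  # lossless round trip
```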
---
## Example: PPE (Personal Protective Equipment) Detection

Detect Personal Protective Equipment (PPE) using the following classes:

```python
CLASS_NAMES = {
    0: "person",
    1: "helmet",
    2: "vest",
    3: "shoes",
}
```
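With these classes, a downstream consumer can turn raw detections into labels or a simple compliance check. The helpers below are illustrative, not part of the repo (the mapping is repeated so the snippet is self-contained):

```python
CLASS_NAMES = {0: "person", 1: "helmet", 2: "vest", 3: "shoes"}

def label_detections(detections: list[dict]) -> list[str]:
    """Map numeric class IDs from the model to human-readable PPE labels."""
    return [CLASS_NAMES.get(d["class_id"], "unknown") for d in detections]

def missing_ppe(detections: list[dict]) -> set[str]:
    """If a person is present, report which PPE classes were not detected."""
    seen = set(label_detections(detections))
    if "person" not in seen:
        return set()  # nobody in frame, nothing to flag
    return {"helmet", "vest", "shoes"} - seen

dets = [{"class_id": 0, "conf": 0.91}, {"class_id": 1, "conf": 0.84}]
print(label_detections(dets))  # ['person', 'helmet']
print(missing_ppe(dets))       # {'vest', 'shoes'} (set order may vary)
```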
## Performance

| Metric | Value | Description |
|--------|-------|-------------|
| FPS | 18-20 | Frames per second processed |
| Latency | <100 ms | End-to-end detection time |
| Accuracy | High | YOLOv8-based detection |
| Frame Skip | Every 3rd frame | Optimized processing |
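The frame-skip row means only every 3rd frame goes through the detector; the rest are passed through without fresh annotations. The sketch below shows the selection logic and how it lines up with the 18-20 FPS figure if the source runs near 60 fps:

```python
def processed_frames(total_frames: int, frame_skip: int = 3) -> list[int]:
    """Return indices of frames that would be sent to the detector when
    processing every Nth frame (indices 0, 3, 6, ... for frame_skip=3)."""
    return [i for i in range(total_frames) if i % frame_skip == 0]

# One second of a 60 fps source yields ~20 detector invocations
per_second = processed_frames(60)
print(len(per_second))  # 20
```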
## Requirements

| Requirement | Version | Purpose |
|-------------|---------|---------|
| Python | 3.10+ | Backend runtime |
| Node.js | 18+ | Frontend runtime |
| CUDA | 11.8+ (optional) | GPU acceleration |
| ffmpeg | Latest | Video processing |
| yt-dlp | Latest | YouTube extraction |
## Installation

```bash
# Clone repository
git clone <repository-url>
cd yolo-ai

# Install backend dependencies
pip install -r requirements.txt
```
```bash
# Install frontend dependencies
cd app && npm install && cd ..
```
```bash
# Start the BentoML service
bentoml serve yolov8-service:latest --port 3000

# Start the API server (in a separate terminal)
python -m src.api.v1 --host 0.0.0.0 --port 8005
```
- Frontend: http://localhost:8081
- API Server: http://localhost:8005
- BentoML Service: http://localhost:3000
## Configuration

| Variable | Default | Description |
|----------|---------|-------------|
| `BENTO_ENDPOINT_URL` | `http://localhost:3000` | BentoML service URL |
| `FPS_LIMIT` | `20` | Maximum frames per second |
| `FRAME_SKIP` | `3` | Process every Nth frame |
| `CONF_THRES` | `0.20` | Confidence threshold |
| `IOU_THRES` | `0.3` | IoU threshold for NMS |
| `CUDA_VISIBLE_DEVICES` | `1` | GPU device ID |
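A sketch of reading these variables with their documented defaults. The helper is illustrative; the project's actual loader lives in `src/config.py`, whose exact shape is not shown here:

```python
import os

def load_config() -> dict:
    """Read tunables from the environment, falling back to the documented
    defaults from the configuration table above."""
    return {
        "bento_endpoint_url": os.environ.get("BENTO_ENDPOINT_URL", "http://localhost:3000"),
        "fps_limit": int(os.environ.get("FPS_LIMIT", "20")),
        "frame_skip": int(os.environ.get("FRAME_SKIP", "3")),
        "conf_thres": float(os.environ.get("CONF_THRES", "0.20")),
        "iou_thres": float(os.environ.get("IOU_THRES", "0.3")),
    }

cfg = load_config()
print(cfg["frame_skip"])  # 3 unless FRAME_SKIP is set in the environment
```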
## Project Structure

```
yolo-ai/
├── src/
│   ├── api/              # API endpoints (WebSocket, REST)
│   ├── deploy/           # BentoML deployment
│   ├── quantization/     # Model conversion (ONNX, TensorRT)
│   └── config.py         # Configuration
├── app/                  # Frontend (React + TypeScript)
│   ├── src/
│   │   ├── components/   # UI components
│   │   ├── hooks/        # React hooks
│   │   └── lib/          # Utilities
├── weights/              # Model weights
├── scripts/              # Utility scripts
└── requirements.txt      # Python dependencies
```
## Key Advantages

| Advantage | Description |
|-----------|-------------|
| Easy Deployment | One-command deployment with BentoML |
| High Quality | YOLOv8 state-of-the-art detection |
| Fast Performance | WebSocket streaming, ONNX optimization |
| Production Ready | Scalable, with error handling and logging |
| Developer Friendly | Clear documentation, simple API |
## License

[Add your license information here]

## Contributing

[Add contribution guidelines here]