Real-time hybrid AR object detection - Fast local COCO-SSD + Accurate Gemini refinement!
```
┌───────────────────────────────────────────┐
│ COCO-SSD (Local)  →  Instant AR Overlays  │
│ ⚡ 10-30 FPS          Always visible       │
└───────────────────────────────────────────┘
                     ↓
┌───────────────────────────────────────────┐
│ Gemini (Backend)  →  Label Refinement     │
│ 🧠 0.5 FPS           ✨ Better accuracy    │
└───────────────────────────────────────────┘
```
Best of both worlds:

- ⚡ Fast: COCO-SSD gives instant visual feedback
- 🎯 Accurate: Gemini refines labels in the background
- 🎨 AR: Labels stick smoothly to objects in 3D space
- ✅ Instant Detection - COCO-SSD shows AR overlays immediately
- ✨ Smart Refinement - Gemini upgrades labels for accuracy
- 🎯 Advanced AR Tracking - Labels stick to objects (Google Lens style)
- 📱 Mobile Optimized - Works on iOS and Android
- 🔊 Voice Feedback - Audio alerts for important signs
- 📹 Dashcam Mode - Record video with detections
- 📲 PWA - Install as a native app
- 🚶 Fall Detection - Emergency pause and recording
```bash
git clone https://github.com/YOUR_USERNAME/SignVision-AR.git
cd SignVision-AR

# Install Python dependencies
pip install -r requirements.txt

# Set up Gemini API key
cp .env.example .env
# Edit .env and add your GEMINI_API_KEY
```

To get a Gemini API key:

- Go to Google AI Studio
- Click "Create API Key"
- Copy and paste it into the `.env` file

```bash
# Start backend (Gemini refinement)
python server.py

# In a new terminal, serve frontend
python -m http.server 8080

# Open http://localhost:8080
```

- Camera captures a frame (1920x1080)
- COCO-SSD detects objects (50-150ms)
- Shows AR overlays immediately
- Labels: Traffic lights, stop signs, vehicles, pedestrians
- Gemini refines labels (background, every 2 seconds)
- More accurate classification
- Detects walk/no walk signals
- Identifies specific sign types
- Upgrades COCO labels with a ✨ sparkle
- AR tracking keeps labels stuck (Google Lens style)
- IoU matching
- Motion prediction
- Camera motion compensation
- Exponential smoothing
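The tracking steps above can be sketched in Python. This is an illustrative sketch, not the actual `script.js` implementation; box shape `(x, y, w, h)` and all function names are assumptions:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x, y, w, h)."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def smooth_box(prev, new, alpha=0.3):
    """Exponential smoothing: move each coordinate a fraction alpha toward the new value."""
    return tuple(p + alpha * (n - p) for p, n in zip(prev, new))

def match_detection(tracked_box, detections, threshold=0.5):
    """Greedy IoU matching: return the detection that best overlaps the tracked box."""
    best = max(detections, key=lambda d: iou(tracked_box, d), default=None)
    if best is not None and iou(tracked_box, best) >= threshold:
        return best
    return None  # no match: caller keeps the motion-predicted box (drawn dashed)
```

A matched box is blended with `smooth_box` to suppress jitter; an unmatched track keeps its predicted position until it reappears or times out.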
- Regular box (3px): COCO-SSD detection
- Thick box (4px) + ✨: Gemini-refined label
- Dashed box: Predicted position (object not currently detected)
- Glow effect: Active detection
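The upgrade from a plain COCO-SSD box to a refined one can be sketched as a simple merge. The detection dict shape and field names here are assumptions for illustration, not the project's actual data structure:

```python
def apply_refinement(detection, refined_label, refined_confidence):
    """Upgrade a COCO-SSD detection with a Gemini label; the 'refined' flag
    tells the renderer to draw the thicker box and sparkle."""
    if refined_confidence < detection["confidence"]:
        return detection  # keep the local label if Gemini is less confident
    return {**detection,
            "label": refined_label,
            "confidence": refined_confidence,
            "refined": True}

coco = {"label": "traffic light", "confidence": 0.6, "box": (120, 80, 40, 90)}
upgraded = apply_refinement(coco, "walk signal (green)", 0.95)
print(upgraded["label"], upgraded["refined"])  # walk signal (green) True
```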
- 🚦 Traffic lights
- 🛑 Stop signs
- 🚗 Vehicles (cars, trucks, buses)
- 🚶 Pedestrians
- 🚦 Walk/Don't Walk signals
- 🪧 All traffic signs (stop, yield, speed limit, etc.)
- ⚠️ Road hazards
- 🚧 Construction zones
- More accurate labels
```bash
python server.py              # Backend on :8000
python -m http.server 8080    # Frontend on :8080
```

Option 1: Backend on Render + Frontend on Vercel
- Deploy Backend (Render/Railway/Heroku):

```bash
# Push to GitHub
git push origin main

# On Render.com:
# - New Web Service
# - Connect repo
# - Build: pip install -r requirements.txt
# - Start: python server.py
# - Add environment variable: GEMINI_API_KEY
```

- Deploy Frontend (Vercel/Netlify):

```bash
# Update script.js config.apiEndpoint to your backend URL
# Then deploy to Vercel
vercel
```

Option 2: Single Server

- Deploy the entire app to one server
- Backend serves API + static files
- Simpler but less scalable
Edit `script.js`:

```javascript
config: {
    apiEndpoint: 'https://your-backend.onrender.com/analyze',
    processingInterval: 100,   // COCO-SSD speed (10 FPS)
    geminiInterval: 2000,      // Gemini frequency (0.5 FPS)
    minConfidence: 0.3         // Detection threshold
}
```

Adjust for your needs:

- Faster COCO: lower `processingInterval` (more CPU)
- More Gemini: lower `geminiInterval` (more API calls)
- Less noise: increase `minConfidence`
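The two intervals translate directly into frame and request rates; a quick sanity check for a setting (plain arithmetic, not project code):

```python
def rates(processing_interval_ms, gemini_interval_ms):
    """Convert millisecond intervals into per-second rates."""
    coco_fps = 1000 / processing_interval_ms
    gemini_rps = 1000 / gemini_interval_ms
    return coco_fps, gemini_rps

# Defaults from the config above
print(rates(100, 2000))  # (10.0, 0.5): 10 FPS local detection, 0.5 Gemini requests/s
```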
```
┌─────────────────────────────────────────┐
│ 📱 Camera View                          │
│                                         │
│   ┌──────────────┐                      │
│   │ 🚦 Traffic   │  ← COCO-SSD          │
│   │    Signal    │                      │
│   └──────────────┘                      │
│                                         │
│   ┌────────────────┐                    │
│   │ ✨ Walk Signal │  ← Gemini refined  │
│   │    - Green     │    (thicker glow)  │
│   └────────────────┘                    │
│                                         │
│   ╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌                      │
│   ╎ 🛑 Stop Sign ╎  ← Predicted         │
│   ╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌    (dashed)          │
└─────────────────────────────────────────┘
```
| Metric | Value |
|---|---|
| COCO-SSD Latency | 50-150ms |
| COCO-SSD FPS | 10-30 FPS |
| Gemini Latency | 500-2000ms |
| Gemini Frequency | 0.5 FPS (every 2s) |
| AR Tracking | Smooth 60 FPS |
| Total Model Size | ~13 MB (COCO-SSD only) |
Gemini API (Free tier):

- 15 requests per minute
- 1,500 requests per day
- ~$0.01 per 100 requests after the free tier

Usage at the default 2-second `geminiInterval`:

- 0.5 requests/second = 30 requests/minute
- ~1,800 requests/hour
- Note: 30 requests/minute is double the free tier's 15 RPM limit, and ~1,800 requests/hour would exhaust the 1,500/day cap in under an hour. For sustained sessions, raise `geminiInterval` to 4000 ms (15 requests/minute) or higher; short test runs are less likely to hit the daily cap.
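The budget arithmetic above can be checked in a couple of lines; `interval_for_rpm` is a helper name invented here for illustration:

```python
def requests_per_minute(gemini_interval_ms):
    """Gemini requests per minute at a given geminiInterval setting."""
    return 60_000 / gemini_interval_ms

def interval_for_rpm(rpm):
    """Smallest geminiInterval (ms) that stays at or under a requests-per-minute cap."""
    return 60_000 / rpm

print(requests_per_minute(2000))  # 30.0 -- double the 15 RPM free-tier limit
print(interval_for_rpm(15))       # 4000.0 ms keeps you exactly at the limit
```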
- Check the backend is running (`python server.py`)
- Verify `GEMINI_API_KEY` in `.env`
- Check the browser console for API errors
- Confirm `config.apiEndpoint` is correct

- Increase `processingInterval` (lower FPS)
- Increase `geminiInterval` (less refinement)
- Use a better device/browser

- Enable device motion sensors in settings
- Keep the device steady during initial detection
- Check the AR tracking parameters in the code
| Aspect | Pure COCO-SSD | Pure Gemini | Hybrid (This!) |
|---|---|---|---|
| Speed | ⚡ Instant | 🐢 Slow | ⚡ Instant |
| Accuracy | ✅ Good (70%) | 🎯 Excellent (95%) | 🎯 Excellent (95%) |
| Offline | ✅ Yes | ❌ No | ✅ Yes (COCO-SSD only) |
| Cost | 🆓 Free | 💰 Paid | 🆓 Mostly Free |
| UX | ⚡ Instant | ⏰ Laggy | ⚡ Instant + Refined |
- Frontend: Vanilla JS (PWA)
- Fast Detection: TensorFlow.js + COCO-SSD
- Accurate Refinement: Google Gemini 2.0 Flash
- Backend: FastAPI (Python)
- AR Tracking: Custom (IoU, Kalman-like prediction)
- Camera: WebRTC getUserMedia
- Audio: Web Speech Synthesis
- Storage: IndexedDB
- COCO-SSD: 100% local, no data sent
- Gemini: Images sent to Google for refinement (optional)
- No Tracking: No analytics, no user data collection
- Works Offline: COCO-SSD continues without internet
```bash
# Install dependencies
pip install -r requirements.txt

# Run with auto-reload
uvicorn server:app --reload --port 8000

# Run frontend
python -m http.server 8080
```

Made for accessibility. Empowering visually impaired users with real-time hybrid AR detection.

⭐ Star this repo if you find it useful!