A robotic book reader. Point a webcam at a book, press a button, and Flipper captures the page, extracts the text via VLM OCR, and reads it aloud. An optional robotic arm (SO-100) automatically turns the pages.
- Camera preview — live MJPEG feed from your webcam in the browser
- Capture — snapshot is sent to a vision model (OpenRouter) for OCR
- TTS — extracted text is synthesized to audio and played in the browser
- Flip — triggers the SO-100 robotic arm to turn the page, then captures automatically
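The capture → OCR step boils down to one VLM request: the page snapshot is base64-encoded and sent to an OpenAI-compatible chat-completions endpoint on OpenRouter. A minimal sketch of that payload (function name and prompt text are illustrative, not the project's actual code):

```python
import base64

def build_ocr_request(jpeg_bytes: bytes, model: str = "google/gemini-flash-1.5") -> dict:
    """Build an OpenAI-style chat payload asking a VLM to transcribe a page image."""
    data_url = "data:image/jpeg;base64," + base64.b64encode(jpeg_bytes).decode()
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "Transcribe all readable text on this book page."},
                {"type": "image_url", "image_url": {"url": data_url}},
            ],
        }],
    }

# The payload is POSTed to {OCR_BASE_URL}/chat/completions with an
# "Authorization: Bearer {OCR_API_KEY}" header; the model's reply text
# is what gets handed to the TTS step.
```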
```
flipper/
├── backend/                  # FastAPI server + Python modules
│   ├── server.py             # HTTP API (capture, flip, stream, tts)
│   ├── capture.py            # Webcam capture with shared frame reader
│   ├── ocr.py                # VLM-based OCR via OpenRouter
│   ├── tts.py                # Text-to-speech via SmallestAI
│   ├── main.py               # CLI entry point (keyboard-driven)
│   └── tmp/                  # Captured images (gitignored)
├── frontend/                 # Next.js app
│   └── app/
│       ├── page.tsx          # Main UI
│       └── api/              # Proxy routes → backend
└── arm/                      # SO-100 arm motion scripts
    └── SO100-Motion-Recorder-Playback/
        ├── record_poses.py   # Record waypoints to poses.json
        └── replay_poses.py   # Replay saved waypoints
```
- Python 3.12+
- uv package manager
- Node.js 18+
- A webcam
- OpenRouter API key (for OCR)
- SmallestAI API key (for TTS)
- (Optional) SO-100 arm + `lerobot` conda environment
```bash
git clone https://github.com/Shreyas-Yadav/flipper.git
cd flipper

# Backend
uv sync

# Frontend
cd frontend && npm install
```

Create a `.env` file at the project root:
```env
# OCR
OCR_API_KEY=your_openrouter_key
OCR_BASE_URL=https://openrouter.ai/api/v1
OCR_MODEL=google/gemini-flash-1.5

# TTS
SMALLEST_API_KEY=your_smallestai_key
SMALLEST_VOICE_ID=magnus

# Camera (0 = default webcam)
CAMERA_INDEX=0

# Arm (optional)
FLIP_SCRIPT=/path/to/flipper/arm/SO100-Motion-Recorder-Playback/replay_poses.py
FLIP_CONDA_ENV=lerobot
```

Terminal 1 — backend:

```bash
cd backend
uv run uvicorn server:app --reload
```

Terminal 2 — frontend:

```bash
cd frontend
npm run dev
```

Open http://localhost:3000.
| Method | Path | Description |
|---|---|---|
| GET | `/stream?camera=0` | MJPEG live camera feed |
| GET | `/cameras` | List available cameras |
| POST | `/capture?camera=0` | Capture page + OCR |
| POST | `/flip?camera=0` | Trigger arm flip + capture + OCR |
| POST | `/tts` | Synthesize text → WAV audio |
| GET | `/image?path=...` | Serve a captured image |
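`/stream` serves motion JPEG, which is just a sequence of JPEG frames separated by multipart boundaries. A sketch of that framing (the boundary name and helper are illustrative, not the actual `server.py` code):

```python
BOUNDARY = b"frame"  # illustrative boundary name

def mjpeg_part(jpeg_bytes: bytes) -> bytes:
    """Wrap one JPEG frame as a chunk of a multipart/x-mixed-replace stream."""
    return (
        b"--" + BOUNDARY + b"\r\n"
        b"Content-Type: image/jpeg\r\n"
        b"Content-Length: " + str(len(jpeg_bytes)).encode() + b"\r\n\r\n"
        + jpeg_bytes + b"\r\n"
    )

# A FastAPI StreamingResponse with media_type
# "multipart/x-mixed-replace; boundary=frame" would yield
# mjpeg_part(frame) for each frame from the shared camera reader.
```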
The arm scripts live in `arm/SO100-Motion-Recorder-Playback/`.

Record a flip motion:

```bash
conda run -n lerobot python3 arm/SO100-Motion-Recorder-Playback/record_poses.py
```

Test replay manually:

```bash
conda run -n lerobot python3 arm/SO100-Motion-Recorder-Playback/replay_poses.py
```

Once verified, set `FLIP_SCRIPT` in `.env` to the absolute path of `replay_poses.py` and the Flip button will trigger it automatically.
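The Flip button presumably shells out to the same `conda run` invocation shown above, using `FLIP_SCRIPT` and `FLIP_CONDA_ENV` from `.env`. A sketch of that wiring (helper names are hypothetical; the real logic lives in `server.py`):

```python
import subprocess

def build_flip_command(script: str, conda_env: str) -> list[str]:
    """Compose the `conda run` command that replays the recorded flip motion."""
    return ["conda", "run", "-n", conda_env, "python3", script]

def flip_page(script: str, conda_env: str = "lerobot") -> None:
    # Blocks until the arm finishes; raises CalledProcessError on failure.
    subprocess.run(build_flip_command(script, conda_env), check=True)
```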
Run Flipper without the frontend:

```bash
cd backend
uv run python main.py        # normal mode
uv run python main.py --logs # with debug logging
```

Keyboard shortcuts: `c` capture · `p` play · `q` quit