A robotic book reader. Point a webcam at a book, press a button, and Flipper captures the page, extracts the text via VLM OCR, and reads it aloud. An optional robotic arm (SO-100) automatically turns the pages.
- Camera preview — live MJPEG feed from your webcam in the browser
- Capture — snapshot is sent to a vision model (OpenRouter) for OCR
- TTS — extracted text is synthesized to audio and played in the browser
- Flip — triggers the SO-100 robotic arm to turn the page, then captures automatically
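The capture → OCR step boils down to one VLM request: the page snapshot is base64-encoded and sent to an OpenAI-compatible chat-completions endpoint on OpenRouter. A minimal sketch of that payload (function name and prompt text are illustrative, not the project's actual code):

```python
import base64

def build_ocr_request(jpeg_bytes: bytes, model: str = "google/gemini-flash-1.5") -> dict:
    """Build an OpenAI-style chat payload asking a VLM to transcribe a page image."""
    data_url = "data:image/jpeg;base64," + base64.b64encode(jpeg_bytes).decode()
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "Transcribe all readable text on this book page."},
                {"type": "image_url", "image_url": {"url": data_url}},
            ],
        }],
    }

# The payload is POSTed to {OCR_BASE_URL}/chat/completions with an
# "Authorization: Bearer {OCR_API_KEY}" header; the model's reply text
# is what gets handed to the TTS step.
```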
```
flipper/
├── backend/                  # FastAPI server + Python modules
│   ├── server.py             # HTTP API (capture, flip, stream, tts)
│   ├── capture.py            # Webcam capture with shared frame reader
│   ├── ocr.py                # VLM-based OCR via OpenRouter
│   ├── tts.py                # Text-to-speech via SmallestAI
│   ├── main.py               # CLI entry point (keyboard-driven)
│   └── tmp/                  # Captured images (gitignored)
├── frontend/                 # Next.js app
│   └── app/
│       ├── page.tsx          # Main UI
│       └── api/              # Proxy routes → backend
└── arm/                      # SO-100 arm motion scripts
    └── SO100-Motion-Recorder-Playback/
        ├── record_poses.py   # Record waypoints to poses.json
        └── replay_poses.py   # Replay saved waypoints
```
- Python 3.12+
- uv package manager
- Node.js 18+
- A webcam
- OpenRouter API key (for OCR)
- SmallestAI API key (for TTS)
- (Optional) SO-100 arm + `lerobot` conda environment
```bash
git clone https://github.com/Shreyas-Yadav/flipper.git
cd flipper

# Backend
uv sync

# Frontend
cd frontend && npm install
```

Create a `.env` file at the project root:
```env
# OCR
OCR_API_KEY=your_openrouter_key
OCR_BASE_URL=https://openrouter.ai/api/v1
OCR_MODEL=google/gemini-flash-1.5

# TTS
SMALLEST_API_KEY=your_smallestai_key
SMALLEST_VOICE_ID=magnus

# Camera (0 = default webcam)
CAMERA_INDEX=0

# Arm (optional)
FLIP_SCRIPT=/path/to/flipper/arm/SO100-Motion-Recorder-Playback/replay_poses.py
FLIP_CONDA_ENV=lerobot
```

Terminal 1 — backend:

```bash
cd backend
uv run uvicorn server:app --reload
```

Terminal 2 — frontend:

```bash
cd frontend
npm run dev
```

Open http://localhost:3000.
| Method | Path | Description |
|---|---|---|
| GET | `/stream?camera=0` | MJPEG live camera feed |
| GET | `/cameras` | List available cameras |
| POST | `/capture?camera=0` | Capture page + OCR |
| POST | `/flip?camera=0` | Trigger arm flip + capture + OCR |
| POST | `/tts` | Synthesize text → WAV audio |
| GET | `/image?path=...` | Serve a captured image |
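`/stream` serves motion JPEG, which is just a sequence of JPEG frames separated by multipart boundaries. A sketch of that framing (the boundary name and helper are illustrative, not the actual `server.py` code):

```python
BOUNDARY = b"frame"  # illustrative boundary name

def mjpeg_part(jpeg_bytes: bytes) -> bytes:
    """Wrap one JPEG frame as a chunk of a multipart/x-mixed-replace stream."""
    return (
        b"--" + BOUNDARY + b"\r\n"
        b"Content-Type: image/jpeg\r\n"
        b"Content-Length: " + str(len(jpeg_bytes)).encode() + b"\r\n\r\n"
        + jpeg_bytes + b"\r\n"
    )

# A FastAPI StreamingResponse with media_type
# "multipart/x-mixed-replace; boundary=frame" would yield
# mjpeg_part(frame) for each frame from the shared camera reader.
```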
The arm scripts live in `arm/SO100-Motion-Recorder-Playback/`.

Record a flip motion:

```bash
conda run -n lerobot python3 arm/SO100-Motion-Recorder-Playback/record_poses.py
```

Test replay manually:

```bash
conda run -n lerobot python3 arm/SO100-Motion-Recorder-Playback/replay_poses.py
```

Once verified, set `FLIP_SCRIPT` in `.env` to the absolute path of `replay_poses.py` and the Flip button will trigger it automatically.
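The Flip button presumably shells out to the same `conda run` invocation shown above, using `FLIP_SCRIPT` and `FLIP_CONDA_ENV` from `.env`. A sketch of that wiring (helper names are hypothetical; the real logic lives in `server.py`):

```python
import subprocess

def build_flip_command(script: str, conda_env: str) -> list[str]:
    """Compose the `conda run` command that replays the recorded flip motion."""
    return ["conda", "run", "-n", conda_env, "python3", script]

def flip_page(script: str, conda_env: str = "lerobot") -> None:
    # Blocks until the arm finishes; raises CalledProcessError on failure.
    subprocess.run(build_flip_command(script, conda_env), check=True)
```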
Run Flipper without the frontend:

```bash
cd backend
uv run python main.py        # normal mode
uv run python main.py --logs # with debug logging
```

Keyboard shortcuts: `c` capture · `p` play · `q` quit