Paste a URL or upload a PDF. Yapit renders the document and reads it aloud.
- Handles the documents other TTS tools can't: academic papers with math, citations, figures, tables, and messy formatting. Equations get spoken descriptions, citations become prose, and page noise is skipped. The original content displays faithfully.
- 170+ voices across 15 languages. Premium voices or free local synthesis that runs entirely in your browser, no account needed.
- Vim-style keyboard shortcuts, document outliner, media key support, adjustable speed, dark mode, share by link.
Powered by Gemini, Kokoro, Inworld TTS, DocLayout-YOLO, and defuddle.
git clone https://github.com/yapit-tts/yapit.git && cd yapit
cp .env.selfhost.example .env.selfhost
make self-host

Open http://localhost and create an account. Data persists across restarts.
.env.selfhost is self-documenting — see the comments for optional features (Gemini extraction, Inworld voices, RunPod overflow).
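To sanity-check the stack after startup, you can poll the frontend until it responds. A minimal sketch, assuming the frontend is served on port 80 as above:

```shell
# Wait up to ~60s for the frontend to start answering on http://localhost
for i in $(seq 1 30); do
  if curl -fsS -o /dev/null http://localhost; then
    echo "yapit is up"
    break
  fi
  sleep 2
done
```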
Multi-worker GPU setup:
Workers are pull-based — any machine with Redis access can run them. Connect from the local network or via Tailscale, for example. GPU and CPU workers run side-by-side; faster workers naturally pull more jobs. Scale by running more containers on any machine that can reach Redis.
Prereq: Docker 25+, nvidia-container-toolkit with CDI enabled, network access to the Redis instance.
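Before starting workers on a new machine, it can help to confirm that it can actually reach Redis. A quick check, not part of the official setup (replace `<host>` with your Redis host, a LAN IP or Tailscale name):

```shell
# With redis-cli installed: expect the reply "PONG"
redis-cli -h <host> -p 6379 ping

# Without redis-cli: a raw TCP reachability check
nc -zv <host> 6379
```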
# One-time GPU setup: generate CDI spec + enable CDI in Docker
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
# Add {"features": {"cdi": true}} to /etc/docker/daemon.json, then:
sudo systemctl restart docker
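To verify the CDI setup before starting workers (a quick smoke test, not part of the required steps):

```shell
# List the generated CDI devices; should print entries like nvidia.com/gpu=0
nvidia-ctk cdi list

# Confirm Docker can hand a GPU to a container via CDI
docker run --rm --device nvidia.com/gpu=all ubuntu nvidia-smi
```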
git clone --depth 1 https://github.com/yapit-tts/yapit.git && cd yapit
# Pull only the images you need
docker compose -f docker-compose.worker.yml pull kokoro-gpu yolo-gpu
# Start 2 Kokoro + 1 YOLO worker
REDIS_URL=redis://<host>:6379/0 docker compose -f docker-compose.worker.yml up -d \
  --scale kokoro-gpu=2 --scale yolo-gpu=1 kokoro-gpu yolo-gpu

Adjust --scale to your GPU. A 4GB card fits 2 Kokoro + 1 YOLO comfortably.
NVIDIA MPS (recommended for multiple workers per GPU)
MPS lets multiple workers share one GPU context — less VRAM overhead, no context switching. Without MPS, each worker gets its own CUDA context (~300MB each). The compose file mounts the MPS pipe automatically; just start the daemon.
sudo tee /etc/systemd/system/nvidia-mps.service > /dev/null <<'EOF'
[Unit]
Description=NVIDIA Multi-Process Service (MPS)
After=nvidia-persistenced.service
[Service]
Type=forking
ExecStart=/usr/bin/nvidia-cuda-mps-control -d
ExecStop=/bin/sh -c 'echo quit | /usr/bin/nvidia-cuda-mps-control'
Restart=on-failure
[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload
sudo systemctl enable --now nvidia-mps

To stop: make self-host-down.
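To confirm the MPS control daemon is running, you can query it directly (a sketch using the standard nvidia-cuda-mps-control tooling):

```shell
# Ask the MPS control daemon for active server PIDs
# (empty output is normal until a CUDA client attaches)
echo get_server_list | sudo nvidia-cuda-mps-control

# Once workers attach through MPS, their processes show type "M+C" here
nvidia-smi
```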
Now:
- Launch
Next:
- Support uploading images, EPUB.
- Support AI-transform for websites.
- Support exporting audio as MP3.
Later:
- Better support for self-hosting (more modular voice and extraction backends, improved documentation)
- Support thinking parameter for Gemini
- Support temperature parameter for Inworld
make dev-cpu # start backend services (Docker Compose)
cd frontend && npm run dev # start frontend
make test-local # run tests

See agent/knowledge/dev-setup.md for full setup instructions.
The agent/knowledge/ directory is the project's in-depth knowledge base, maintained jointly with Claude during development.