caw is a pip-installable CLI TTS tool.
It is designed to:
- take text from the command line
- optionally select a scene-specific reference voice with --scene
- call a configurable OpenAI-compatible TTS endpoint
- write WAV audio bytes to stdout by default
- optionally play or save the generated audio
- discover local reference voices from src/caw/assets/*.wav
pipx install caw-tts
Or:
python3 -m pip install caw-tts
For local development, run pipx install . or python3 -m pip install . from
the repository root.
caw help
caw setup
caw upgrade
caw scenes
caw "你好,这是一段测试语音。" > out.wav
caw --play "你好,这是一段测试语音。"
caw --play /usr/bin/aplay "你好,这是一段测试语音。"
caw -o out.wav "你好,这是一段测试语音。"
caw setup uses questionary + rich for an interactive configuration flow. It
sets the API key environment variable name, base URL, TTS model, and default
scene. It can also store an optional PULSE_SERVER value for playback, for
example unix:/tmp/pulse-socket.
If no config file exists, running caw "text" opens caw setup first.
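Since setup records the name of an environment variable rather than the key itself, a client presumably looks the key up at call time. The following is a minimal sketch of that idea, not caw's actual implementation. It assumes the endpoint follows the OpenAI-style `/audio/speech` path and accepts `model`, `input`, and `response_format` fields. The function name `build_speech_request` is hypothetical, and how caw passes the reference voice to the endpoint is not described in this README, so it is left out.

```python
import json
import os
import urllib.request


def build_speech_request(config: dict, text: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style text-to-speech request."""
    tts = config["tts"]
    # The config stores the *name* of the environment variable, not the key.
    api_key = os.environ.get(tts["api_key_env"])
    if not api_key:
        raise RuntimeError(f"environment variable {tts['api_key_env']} is not set")
    # Assumption: the endpoint uses the OpenAI path layout under base_url.
    url = tts["base_url"].rstrip("/") + "/audio/speech"
    # Assumption: these field names follow the OpenAI request shape.
    payload = {"model": tts["model"], "input": text, "response_format": "wav"}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Stripping the trailing slash before joining keeps a base URL such as `https://example.com/v1/` from producing a double slash in the final path.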
Reference voices are local-only: put .wav files under src/caw/assets/,
using filenames without spaces. The filename stem is the scene name. If no
reference audio is found, caw prints a prompt explaining where to add it.
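The stem-to-scene rule above can be sketched in a few lines. This is an illustration under stated assumptions rather than caw's own code: the function name `discover_scenes` is hypothetical, and silently skipping filenames that contain whitespace is a guess at how the "no spaces" rule might be handled.

```python
from pathlib import Path


def discover_scenes(assets_dir: Path) -> dict[str, Path]:
    """Map scene names (filename stems) to reference .wav files."""
    scenes: dict[str, Path] = {}
    if not assets_dir.is_dir():
        # No assets directory means no scenes; the caller can print a hint.
        return scenes
    for wav in sorted(assets_dir.glob("*.wav")):
        # Assumption: files whose names contain whitespace are ignored.
        if any(ch.isspace() for ch in wav.name):
            continue
        scenes[wav.stem] = wav
    return scenes
```

Under these assumptions, a file such as `src/caw/assets/calm.wav` would be exposed as the scene `calm`, and an empty result is the signal to explain where reference audio should be added.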
caw stores config at:
$XDG_CONFIG_HOME/caw/config.json
Default fallback:
~/.config/caw/config.json
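The lookup order above can be expressed as a small helper. This is a sketch, assuming that an empty XDG_CONFIG_HOME is treated the same as an unset one (as the XDG Base Directory convention suggests); the function name `config_path` is hypothetical.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def config_path(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the expected location of caw's config.json."""
    env = os.environ if env is None else env
    base = env.get("XDG_CONFIG_HOME")
    # Assumption: an empty value falls back to ~/.config, like an unset one.
    root = Path(base) if base else Path.home() / ".config"
    return root / "caw" / "config.json"
```

Passing the environment as a parameter keeps the helper easy to test without modifying the real process environment.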
Generated audio is not saved unless --output/-o is provided. Without
--play or --output, caw writes WAV bytes to stdout.
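The save-or-stdout behavior can be illustrated as follows. This is a hedged sketch, not caw's actual code: `emit_audio` is a hypothetical name, and playback is left out because this README does not specify how --play combines with the other options.

```python
import sys
from pathlib import Path
from typing import BinaryIO, Optional


def emit_audio(
    wav_bytes: bytes,
    output: Optional[Path] = None,
    stdout: Optional[BinaryIO] = None,
) -> None:
    """Save WAV bytes to a file if requested, otherwise write them to stdout."""
    if output is not None:
        # Corresponds to --output/-o: the audio is saved only on request.
        output.write_bytes(wav_bytes)
        return
    # Audio is binary data, so use the byte-level stream rather than text stdout.
    stream = stdout if stdout is not None else sys.stdout.buffer
    stream.write(wav_bytes)
    stream.flush()
```

Writing through `sys.stdout.buffer` is what makes a redirect such as `caw "text" > out.wav` produce a valid file, because the text-mode `sys.stdout` would reject raw bytes.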
The TTS config stores:
{
  "tts": {
    "api_key_env": "TTS_API_KEY",
    "base_url": "https://example.com/v1/",
    "model": "your-model"
  },
  "audio": {
    "pulse_server": "unix:/tmp/pulse-socket"
  }
}
If you use caw on NVIDIA DGX Spark, a driverless USB speaker is the simplest playback option.
DGX Spark exposes USB-C ports and commonly works with USB Audio Class output devices without extra vendor drivers. If caw generates a .wav file correctly but you do not hear audio, verify that the USB speaker is detected and selected as the active output device.
Useful checks:
aplay -l
lsusb
pactl list short sinks
speaker-test -c 2 -t wav
aplay /usr/share/sounds/alsa/Front_Center.wav
If the USB speaker appears in pactl list short sinks, set it as the default output:
pactl set-default-sink <sink_name>
Local quality checks are enforced with pre-commit.
Run:
pre-commit run --all-files
The active checks are:
- ruff-check
- ruff-format
- pyrefly-check
The repository keeps test code and CI command code separate:
- application source: src/caw/
- tests: tests/
The type check is executed through uv from inside the package directory:
uv run --frozen --with pyrefly==0.47.0 pyrefly check src/caw
GitHub Actions CI is split into focused workflows:
- CI Tests
- CI Ruff
- CI Pyrefly
- CI Smoke
- Publish
Each workflow runs the tool command directly instead of wrapping CI steps in custom Python helper scripts.
If you change packaging, CLI entrypoints, Python source under src/caw, or tests under tests/, keep all workflows passing.
Publishing uses .github/workflows/publish.yml.
Configure this repository secret before publishing:
PYPI_API_TOKEN
The secret value should be a PyPI API token for the caw-tts project. Publishing
runs when a GitHub Release is published, and can also be started manually from
the Publish workflow page.