Phil-Fan/caw

caw

caw is a pip-installable CLI TTS tool.

It is designed to:

  • take text from the command line
  • optionally select a scene-specific reference voice with --scene
  • call a configurable OpenAI-compatible TTS endpoint
  • write WAV audio bytes to stdout by default
  • optionally play or save the generated audio
  • discover local reference voices from src/caw/assets/*.wav

Install

pipx install caw-tts

Or:

python3 -m pip install caw-tts

For local development, run pipx install . or python3 -m pip install . from the repository root.

Commands

caw help
caw setup
caw upgrade
caw scenes
caw "你好,这是一段测试语音。" > out.wav
caw --play "你好,这是一段测试语音。"
caw --play /usr/bin/aplay "你好,这是一段测试语音。"
caw -o out.wav "你好,这是一段测试语音。"

caw setup uses questionary + rich for an interactive configuration flow. It sets the name of the environment variable that holds the API key, the base URL, the TTS model, and the default scene. It can also store an optional PULSE_SERVER value for playback, for example unix:/tmp/pulse-socket.

If no config file exists, running caw "text" opens caw setup first. Reference voices are local-only: put .wav files under src/caw/assets/, using filenames without spaces. The filename stem is the scene name. If no reference audio is found, caw prints a prompt explaining where to add it.
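The scene naming rule above (filename stem equals scene name, no spaces in filenames) can be sketched in a few lines of Python. This is an illustration only, not caw's actual implementation: the function name discover_scenes is hypothetical, and skipping files whose names contain spaces is an assumption about how such files might be handled.

```python
from pathlib import Path


def discover_scenes(assets_dir: Path) -> dict[str, Path]:
    """Map scene names (filename stems) to .wav reference files.

    Hypothetical helper: caw's real discovery logic may differ.
    """
    scenes: dict[str, Path] = {}
    for wav in sorted(assets_dir.glob("*.wav")):
        # The README asks for filenames without spaces; this sketch
        # simply skips any file that violates that rule (an assumption).
        if " " in wav.name:
            continue
        scenes[wav.stem] = wav
    return scenes
```

With a file src/caw/assets/meeting.wav in place, discover_scenes(Path("src/caw/assets")) would contain the key "meeting", which is the value to pass to --scene.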

Config

caw stores config at:

$XDG_CONFIG_HOME/caw/config.json

Default fallback:

~/.config/caw/config.json
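As a rough sketch of the lookup order described above, the following Python resolves the config path from XDG_CONFIG_HOME and falls back to ~/.config. The function name config_path is hypothetical, and treating an empty XDG_CONFIG_HOME as unset is an assumption borrowed from the XDG Base Directory convention rather than from caw itself.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def config_path(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the expected location of caw's config.json.

    Hypothetical helper; `env` defaults to the process environment.
    """
    env = os.environ if env is None else env
    base = env.get("XDG_CONFIG_HOME")
    # An unset or empty XDG_CONFIG_HOME falls back to ~/.config (assumption).
    root = Path(base) if base else Path.home() / ".config"
    return root / "caw" / "config.json"
```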

Generated audio is not saved unless --output/-o is provided. Without --play or --output, caw writes WAV bytes to stdout.
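When piping caw's stdout into another program, it can help to confirm that the bytes are a readable WAV stream. The sketch below uses only Python's standard wave module. It assumes the endpoint returns PCM WAV data with a complete header, which the README does not guarantee, and the function name wav_summary is hypothetical.

```python
import io
import wave


def wav_summary(data: bytes) -> dict:
    """Return channel count, sample rate, and duration for WAV bytes.

    Assumes PCM WAV data; wave.open raises wave.Error on anything else.
    """
    with wave.open(io.BytesIO(data), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "seconds": frames / rate,
        }
```

For example, the bytes captured from caw "text" (via a subprocess pipe or by reading out.wav in binary mode) could be passed to wav_summary before being handed to a player.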

The config file stores the TTS settings and the optional audio settings:

{
  "tts": {
    "api_key_env": "TTS_API_KEY",
    "base_url": "https://example.com/v1/",
    "model": "your-model"
  },
  "audio": {
    "pulse_server": "unix:/tmp/pulse-socket"
  }
}
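Note that the config stores the name of an environment variable (api_key_env), not the key itself. A minimal sketch of how a client might resolve the key from such a config is shown below. The function name resolve_api_key and the error message are hypothetical; caw's own handling may differ.

```python
import json
import os
from typing import Mapping, Optional


def resolve_api_key(config_text: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Look up the API key via the variable named in tts.api_key_env.

    Hypothetical helper; `env` defaults to the process environment.
    """
    env = os.environ if env is None else env
    config = json.loads(config_text)
    var_name = config["tts"]["api_key_env"]
    key = env.get(var_name)
    if not key:
        # The secret lives only in the environment, never in config.json.
        raise RuntimeError(f"environment variable {var_name} is not set")
    return key
```

With the example config above, exporting TTS_API_KEY in the shell before running caw is what makes the key available.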

DGX Spark Audio Output

If you use caw on NVIDIA DGX Spark, a driverless USB speaker is the simplest playback option.

DGX Spark exposes USB-C ports and commonly works with USB Audio Class output devices without extra vendor drivers. If caw generates a .wav file correctly but you do not hear audio, verify that the USB speaker is detected and selected as the active output device.

Useful checks:

aplay -l
lsusb
pactl list short sinks
speaker-test -c 2 -t wav
aplay /usr/share/sounds/alsa/Front_Center.wav

If the USB speaker appears in pactl list short sinks, set it as the default output:

pactl set-default-sink <sink_name>
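To find the right <sink_name> without reading the list by eye, the output of pactl list short sinks can be parsed. The sketch below assumes the usual tab-separated layout (index, name, driver, sample spec, state) and assumes that USB sinks carry "usb" in their name, which is common for ALSA-backed sinks but not guaranteed. Both function names are hypothetical.

```python
from typing import Optional


def parse_sinks(output: str) -> list[dict]:
    """Parse `pactl list short sinks` output into index/name/state records."""
    sinks = []
    for line in output.splitlines():
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # skip blank or unexpected lines
        sinks.append({"index": fields[0], "name": fields[1], "state": fields[-1]})
    return sinks


def find_usb_sink(output: str) -> Optional[str]:
    """Return the first sink whose name contains 'usb' (a naming assumption)."""
    for sink in parse_sinks(output):
        if "usb" in sink["name"].lower():
            return sink["name"]
    return None
```

The returned name can then be passed to pactl set-default-sink, either by hand or through subprocess.run.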

Development

Local quality checks are enforced with pre-commit.

Run:

pre-commit run --all-files

The active checks are:

  • ruff-check
  • ruff-format
  • pyrefly-check

The repository keeps application code and test code separate:

  • application source: src/caw/
  • tests: tests/

The type check is executed through uv from inside the package directory:

uv run --frozen --with pyrefly==0.47.0 pyrefly check src/caw

CI

GitHub Actions CI is split into focused workflows:

  • CI Tests
  • CI Ruff
  • CI Pyrefly
  • CI Smoke
  • Publish

Each workflow runs the tool command directly instead of wrapping CI steps in custom Python helper scripts.

If you change packaging, CLI entrypoints, Python source under src/caw, or tests under tests/, keep all workflows passing.

Publish

Publishing uses .github/workflows/publish.yml.

Configure this repository secret before publishing:

PYPI_API_TOKEN

The secret value should be a PyPI API token for the caw-tts project. Publishing runs when a GitHub Release is published, and can also be started manually from the Publish workflow page.

About

CLI tool for voice cloning and playback using the XIAOMI MiMo API.
