feat(tts): support per-request instructions and params by localai-bot · Pull Request #10172 · mudler/LocalAI

localai-bot · 2026-06-04T08:30:21Z

Summary

The OpenAI-compatible TTS endpoint (`POST /v1/audio/speech`) accepts an `instructions` field, but the field was silently dropped at the HTTP→gRPC boundary. Neither `schema.TTSRequest` nor the gRPC `TTSRequest` proto message carried it, so backends could only read such a value from static YAML options, which are identical for every request. This blocked:

- Per-line emotion/style (Qwen3-TTS CustomVoice, Chatterbox): the same speaker with a different tone per request.
- VoiceDesign / describe-a-voice (Qwen3-TTS VoiceDesign): the instruction string is the voice, so a model config was limited to a single designed voice.

This PR plumbs a generic per-request instruction string end to end, plus an optional backend-specific params map.

Changes

- proto (`backend/backend.proto`): add `optional string instructions = 6` and `map<string,string> params = 7` to `TTSRequest`.
- schema (`core/schema/localai.go`): add `Instructions` (mapped to the OpenAI `instructions` field) and `Params` (a LocalAI extension) to `schema.TTSRequest`.
- core (`core/backend/tts.go`): thread both through `ModelTTS`/`ModelTTSStream` via a `newTTSRequest` helper that attaches instructions only when non-empty, so backends can fall back to YAML when the value is unset. Both are forwarded from the `/v1/audio/speech` handler; other callers (cli, elevenlabs, realtime) pass empty values.
- qwen-tts (`backend/python/qwen-tts/backend.py`): prefer the per-request instruction over the YAML `instruct` option (used by both mode detection and generation) and merge per-request params.
- chatterbox (`backend/python/chatterbox/backend.py`): merge per-request params (coerced to float/int/bool) over YAML options into the `generate()` kwargs.
- docs: updated documentation and a regenerated Swagger spec.
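A minimal Python sketch of the instruction flow described above, assuming a simplified request object. The real helper is Go (`newTTSRequest` in `core/backend/tts.go`) and the backends read the generated `backend_pb2` message; `TTSRequestSketch`, `new_tts_request`, and `resolve_instruct` are hypothetical names that only illustrate the described semantics: an empty instruction stays unset, and the backend then falls back to the YAML `instruct` option.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class TTSRequestSketch:
    """Hypothetical stand-in for the gRPC TTSRequest message (not the generated class)."""
    text: str
    instructions: Optional[str] = None  # None models an unset proto3 `optional` field
    params: Dict[str, str] = field(default_factory=dict)


def new_tts_request(
    text: str, instructions: str = "", params: Optional[Dict[str, str]] = None
) -> TTSRequestSketch:
    """Hypothetical mirror of newTTSRequest: attach instructions only when non-empty."""
    return TTSRequestSketch(
        text=text,
        instructions=instructions or None,  # "" becomes unset
        params=dict(params or {}),
    )


def resolve_instruct(
    request: TTSRequestSketch, yaml_options: Dict[str, str]
) -> Optional[str]:
    """Hypothetical backend logic: prefer the per-request instruction, else the YAML option."""
    if request.instructions:
        return request.instructions
    return yaml_options.get("instruct")
```

Leaving the field unset, rather than sending an empty string, is what lets a backend tell "no per-request instruction" apart from a deliberate value. With a proto3 `optional` field, a Python backend can check this via `request.HasField("instructions")`.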

Backward compatibility

Fully compatible: an empty `instructions` value falls back to the YAML option, and backends that don't support style/voice instructions simply ignore the field. Values in the `params` map arrive as strings; each backend coerces them to the types it needs.
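Because `params` is a `map<string,string>`, every value reaches the backend as text. The following is a hedged Python sketch of coerce-then-merge in the spirit of the chatterbox change (values coerced to float/int/bool, per-request values overriding YAML options). The function names and the bool, int, float, string attempt order are assumptions, not the backend's actual code.

```python
from typing import Dict, Union

Scalar = Union[bool, int, float, str]


def coerce_value(raw: str) -> Scalar:
    """Assumed coercion order: bool literal, then int, then float, else the original string."""
    lowered = raw.strip().lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    try:
        return int(raw)
    except ValueError:
        pass
    try:
        return float(raw)
    except ValueError:
        return raw  # leave non-numeric strings untouched


def merge_params(
    yaml_options: Dict[str, Scalar], request_params: Dict[str, str]
) -> Dict[str, Scalar]:
    """Per-request params override YAML options key by key; the result could feed generate(**kwargs)."""
    merged = dict(yaml_options)
    for key, raw in request_params.items():
        merged[key] = coerce_value(raw)
    return merged
```

Trying `int` before `float` keeps a value such as `"42"` integral instead of turning it into `42.0`; strings that match none of the types are passed through unchanged so the backend can decide what to do with them.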

Example

```bash
curl http://localhost:8080/v1/audio/speech -H "Content-Type: application/json" -d '{
  "model": "qwen-tts-design",
  "input": "Hello world, this is a test.",
  "instructions": "A calm, low-pitched elderly storyteller with a warm tone."
}'
```
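The same request can be built with Python's standard library, optionally carrying backend-specific params. This is a sketch: `build_speech_request` is a hypothetical helper, and serializing `Params` under the JSON key `"params"` is an assumption based on the schema change, not a confirmed wire format. The request is only constructed here; sending it requires a running LocalAI instance.

```python
import json
import urllib.request
from typing import Dict, Optional


def build_speech_request(
    base_url: str,
    model: str,
    text: str,
    instructions: str = "",
    params: Optional[Dict[str, object]] = None,
) -> urllib.request.Request:
    """Hypothetical helper: build (but do not send) a POST /v1/audio/speech request."""
    body: Dict[str, object] = {"model": model, "input": text}
    if instructions:
        # Omit the field entirely when empty so the backend falls back to its YAML option.
        body["instructions"] = instructions
    if params:
        # Assumption: the LocalAI `Params` extension uses the JSON key "params".
        # Values are stringified to match the map<string,string> proto field.
        body["params"] = {key: str(value) for key, value in params.items()}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/audio/speech",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` sends it; the synthesized audio comes back as the response body.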

Testing

- New Ginkgo specs for `newTTSRequest` (instructions attached/omitted, params forwarded/nil); existing ctx-propagation specs updated for the new signature.
- The `core/backend`, `core/schema`, and `core/http/endpoints/localai` test suites pass.
- `go vet` and `golangci-lint --new-from-merge-base=master` are clean (0 issues); both Python backends pass `py_compile`.

Generated proto bindings (pkg/grpc/proto/*) are gitignored and rebuilt by CI; Python backend_pb2 is regenerated from backend.proto at backend build time.

Assisted-by: Claude:claude-opus-4-8 [Claude Code]

The OpenAI-compatible TTS endpoint accepts an `instructions` field, but it was silently dropped at the HTTP->gRPC boundary: neither schema.TTSRequest nor the gRPC TTSRequest proto carried it, so backends could only read such a value from static YAML options (identical for every request). This blocked per-line emotion/style and, for Qwen3-TTS VoiceDesign, limited a model config to a single designed voice.

Plumb a generic per-request instruction string end to end, plus an optional backend-specific params map:

- proto: add `optional string instructions` and `map<string,string> params` to TTSRequest.
- schema: add Instructions (maps OpenAI `instructions`) and Params (LocalAI extension) to schema.TTSRequest.
- core: thread both through ModelTTS/ModelTTSStream via a newTTSRequest helper that attaches instructions only when non-empty (so backends can fall back to YAML when unset); forward them from the /v1/audio/speech handler.
- qwen-tts: prefer the per-request instruction over the YAML `instruct` option (used by both mode detection and generation) and merge per-request params.
- chatterbox: merge per-request params (coerced to float/int/bool) over YAML options into generate() kwargs.

Fully backward compatible: empty instructions fall back to the YAML option and backends that don't support style/voice instructions ignore the field.

Closes #10164

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude:claude-opus-4-8 [Claude Code]

mudler merged commit 27e63b9 into master Jun 4, 2026
73 checks passed

mudler deleted the worktree-tts-per-request-instructions branch June 4, 2026 09:45

localai-bot added the enhancement New feature or request label Jun 10, 2026

BrewTestBot mentioned this pull request Jun 10, 2026

localai 4.4.0 Homebrew/homebrew-core#287347
