Description
When selecting Qwen3-TTS 0.6B from the model dropdown and generating speech, the app ignores the selection and downloads/loads the 1.7B model instead. There is also no way to cancel the unwanted download from the UI.
Steps to reproduce
- Install Voicebox v0.1.12
- Download only the Qwen TTS 0.6B model from Model Management
- Select Qwen3-TTS 0.6B in the generation dropdown
- Click generate
- Result: the app starts downloading Qwen TTS 1.7B (3.6 GB) instead of using the already-downloaded 0.6B model
Root cause
In backend/main.py, the /generate endpoint calls create_voice_prompt_for_profile() (line 546) before it calls load_model_async(model_size) (line 585).
create_voice_prompt() in pytorch_backend.py (line 235) calls load_model_async(None), and None falls back to self.model_size, which the constructor defaults to "1.7B". As a result, the 1.7B model is loaded (and downloaded if it is missing) before the user's selection is ever applied.
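The call order described above can be reproduced with a minimal sketch. This is not Voicebox's actual code: `FakeBackend`, `generate_buggy`, and the `loaded` list are hypothetical names used only for illustration, and the sketch assumes that a `None` argument resolves to the instance default, as the issue describes.

```python
import asyncio


class FakeBackend:
    """Hypothetical stand-in for the TTS backend (not the real Voicebox class)."""

    def __init__(self, model_size="1.7B"):
        # Constructor default, as described in the issue.
        self.model_size = model_size
        # Records every model size that gets loaded, in order.
        self.loaded = []

    async def load_model_async(self, model_size=None):
        # None falls back to the instance default.
        if model_size is None:
            model_size = self.model_size
        self.loaded.append(model_size)

    async def create_voice_prompt(self):
        # Loads "whatever the default is" before building the prompt.
        await self.load_model_async(None)
        return "voice-prompt"


async def generate_buggy(backend, model_size):
    # Buggy order: the voice prompt is created before the requested model is loaded.
    prompt = await backend.create_voice_prompt()
    await backend.load_model_async(model_size)
    return prompt


backend = FakeBackend()
asyncio.run(generate_buggy(backend, "0.6B"))
print(backend.loaded)  # ['1.7B', '0.6B']
```

In this sketch the first recorded load is "1.7B" even though "0.6B" was requested, which mirrors the unwanted download reported in the steps to reproduce.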
Fix
PR #95 reorders the operations so the user's requested model is loaded before the voice prompt is created.
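A rough sketch of the reordering idea follows. It is not the code from the PR: `SketchBackend`, `generate_fixed`, and `load_calls` are hypothetical names, and the sketch assumes that loading a model also updates `self.model_size`, so that a later `load_model_async(None)` resolves to the model the user requested. Whether the real backend behaves this way is not stated in the issue.

```python
import asyncio


class SketchBackend:
    """Hypothetical stand-in for the TTS backend (not the real Voicebox class)."""

    def __init__(self, model_size="1.7B"):
        self.model_size = model_size
        # Records every model size that gets loaded, in order.
        self.load_calls = []

    async def load_model_async(self, model_size=None):
        # None falls back to the current instance value.
        resolved = model_size if model_size is not None else self.model_size
        # Assumption: a load updates the instance's current model size.
        self.model_size = resolved
        self.load_calls.append(resolved)

    async def create_voice_prompt(self):
        await self.load_model_async(None)
        return f"prompt-from-{self.model_size}"


async def generate_fixed(backend, model_size):
    # Fixed order: load the requested model first, then create the voice prompt.
    await backend.load_model_async(model_size)
    return await backend.create_voice_prompt()


backend = SketchBackend()
prompt = asyncio.run(generate_fixed(backend, "0.6B"))
print(backend.load_calls)  # ['0.6B', '0.6B']
print(prompt)              # prompt-from-0.6B
```

Under these assumptions the 1.7B default is never loaded, because the fallback inside `create_voice_prompt()` now resolves to the model that was already loaded.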
Environment
- Voicebox v0.1.12
- Windows 11
- NVIDIA GeForce GTX 1660 SUPER