fix: Chatterbox float64 dtype mismatch + model unload button #264
Conversation

Caution: Review failed. The pull request is closed.
📝 Walkthrough

Adds an unload-by-name API and UI flow, moves the chatterbox model assignment to a local variable, and applies post-load monkey-patches (S3Tokenizer.log_mel_spectrogram, VoiceEncoder.forward) before publishing the model to avoid half-initialized state.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User as User (UI)
    participant Frontend as Frontend (ModelManagement)
    participant Api as ApiClient / HTTP
    participant Server as Backend API (/models/{name}/unload)
    participant Loader as Model Backend (e.g., chatterbox_turbo)
    participant Model as In-memory Model
    User->>Frontend: Click "Unload"
    Frontend->>Api: POST /models/{name}/unload
    Api->>Server: HTTP request
    Server->>Loader: dispatch unload for model_name
    Loader->>Model: perform unload / release resources
    Model-->>Loader: success
    Loader-->>Server: unloaded
    Server-->>Api: 200 { message }
    Api-->>Frontend: response
    Frontend->>Frontend: invalidate model status queries / update UI
```

```mermaid
sequenceDiagram
    participant Loader as Model Loader
    participant ModelLib as Chatterbox Library
    participant S3Tok as S3Tokenizer
    participant VEnc as VoiceEncoder
    Loader->>ModelLib: load model instance (into local `model`)
    ModelLib-->>Loader: model instance
    Loader->>S3Tok: wrap log_mel_spectrogram -> cast audio to float32
    Loader->>VEnc: wrap forward -> cast mel inputs to float32
    Loader-->>ModelLib: bind patched methods to local `model`
    Loader->>Loader: assign `self.model = model` (publish) and complete load
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
🧹 Nitpick comments (1)
backend/backends/chatterbox_backend.py (1)
186-194: Consider extracting this monkey-patch into a shared helper. This same block appears in both Chatterbox backends; centralizing it would reduce drift and keep behavior consistent across model variants.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/backends/chatterbox_backend.py` around lines 186 - 194, Extract the repeated monkey-patch into a shared helper function (e.g., patch_ve_forward_to_float) and replace the inline block in Chatterbox backends with a call to that helper; the helper should accept the model.ve instance, capture the original forward (using the same getattr/hasattr check you currently use to set _orig_ve_forward), define the wrapper (currently _f32_forward) that calls the original with mels.float(), and attach it back with types.MethodType so the behavior is identical across model variants.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@backend/backends/chatterbox_backend.py`:
- Around line 186-194: Extract the repeated monkey-patch into a shared helper
function (e.g., patch_ve_forward_to_float) and replace the inline block in
Chatterbox backends with a call to that helper; the helper should accept the
model.ve instance, capture the original forward (using the same getattr/hasattr
check you currently use to set _orig_ve_forward), define the wrapper (currently
_f32_forward) that calls the original with mels.float(), and attach it back with
types.MethodType so the behavior is identical across model variants.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: ae0b9a43-c070-47fa-82a4-bd2f62d18052
📒 Files selected for processing (2)
- backend/backends/chatterbox_backend.py
- backend/backends/chatterbox_turbo_backend.py
The previous approach of patching librosa.load didn't work because melspectrogram itself performs float64 math (numpy dot, signal.lfilter) regardless of input dtype. The actual mismatch happens when pack() creates a float64 tensor from the mel arrays and passes it into the float32 LSTM weights in VoiceEncoder.forward(). Fix by monkey-patching VoiceEncoder.forward() to call mels.float() before the LSTM, ensuring the input always matches the model dtype.
2ec8ac1 to 7abdd7c (Compare)
- POST /models/{model_name}/unload — unloads a specific model from
memory without deleting from disk, supports all engine types
- Frontend: Unload button in model detail dialog when model is loaded
- Delete button remains disabled while loaded (unload first)
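The branch logic of the unload endpoint can be sketched in plain Python. `DummyBackend` and the `BACKENDS` registry below are hypothetical stand-ins for the project's real backend lookup; only the decision shape (no-op unload returns a normal message rather than an error) follows the PR:

```python
class DummyBackend:
    """Stand-in for a TTS backend; the real ones live in backend/backends/."""
    def __init__(self):
        self._model = object()
    def is_loaded(self):
        return self._model is not None
    def unload_model(self):
        self._model = None  # real code would also release GPU memory

BACKENDS = {"chatterbox": DummyBackend()}  # hypothetical registry

def unload_model(model_name: str) -> dict:
    # A stale unload is a 200 no-op with a message, not an error --
    # which is why the frontend toast should echo the message field.
    backend = BACKENDS.get(model_name)
    if backend is None or not backend.is_loaded():
        return {"message": f"Model {model_name} is not loaded"}
    backend.unload_model()
    return {"message": f"Model {model_name} unloaded"}
```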
7abdd7c to cac80f6 (Compare)
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@app/src/components/ServerSettings/ModelManagement.tsx`:
- Around line 308-315: The onSuccess handler for the unload call currently
ignores the API response and always shows "Model unloaded"; change the signature
in the onSuccess callback to use the real response (replace _data with e.g.
data) and use data.message (or branch on it) when calling toast so the toast
reflects the API's actual success/no-op message; keep the existing
queryClient.invalidateQueries/refetchQueries calls (they can remain as-is) and
ensure you still pass modelName where needed for context in the toast if
desired.
In `@backend/backends/chatterbox_backend.py`:
- Around line 181-195: The patching of VoiceEncoder.forward should happen on a
local model variable before assigning to self.model so a failure doesn't leave
self.model partially initialized; in load_model() create a local variable (e.g.,
model), retrieve model.ve as in the diff (_ve), capture the original forward
(_orig_ve_forward), bind the f32 wrapper (_f32_forward) with types.MethodType to
model.ve, and only after successful patching assign self.model = model (leaving
existing logic that checks self.model at Lines 97/100 unchanged). Ensure you
reference and modify model.ve.forward (not self.model) and preserve the existing
_ve, _orig_ve_forward, and _f32_forward naming/behavior when moving the code.
In `@backend/main.py`:
- Around line 1526-1539: The unload path for Chatterbox calls
backend.unload_model() directly while background threads in
chatterbox_backend.py (_generate_sync and _create_prompt_sync) may still access
self.model, causing None derefs; modify the Chatterbox backend (referencing
class methods _generate_sync, _create_prompt_sync and unload_model) to serialize
unloads against in-flight ops by adding either an asyncio.Lock or an atomic
active-ops counter on the backend instance: increment the counter or acquire the
lock at the start of generation entry points, decrement/release when finished
(including error paths), and make unload_model wait for zero active ops or
acquire the same lock before clearing self.model; then update the API call sites
that use get_tts_backend_for_engine(...).unload_model() to rely on the backend’s
guarded unload (no changes to the API surface), ensuring unload is blocked until
current generations complete.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 416b06e1-816c-41c5-b2dc-2d428fd0965d
📒 Files selected for processing (5)
- app/src/components/ServerSettings/ModelManagement.tsx
- app/src/lib/api/client.ts
- backend/backends/chatterbox_backend.py
- backend/backends/chatterbox_turbo_backend.py
- backend/main.py
🚧 Files skipped from review as they are similar to previous changes (1)
- backend/backends/chatterbox_turbo_backend.py
```tsx
onSuccess: async (_data, modelName) => {
  toast({
    title: 'Model unloaded',
    description: `${modelName} has been unloaded from memory.`,
  });
  await queryClient.invalidateQueries({ queryKey: ['modelStatus'], refetchType: 'all' });
  await queryClient.refetchQueries({ queryKey: ['modelStatus'] });
},
```
Use the API response message for unload toasts.
The new unload endpoint returns a normal success payload for stale/no-op unloads, but this handler ignores _data.message, so the UI still says the unload succeeded even when nothing changed. Toast the returned message (or branch on it) so the modal stays truthful.
Suggested fix

```diff
- onSuccess: async (_data, modelName) => {
+ onSuccess: async (data, modelName) => {
    toast({
-     title: 'Model unloaded',
-     description: `${modelName} has been unloaded from memory.`,
+     title: data.message.includes('is not loaded') ? 'Model already unloaded' : 'Model unloaded',
+     description: data.message,
    });
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@app/src/components/ServerSettings/ModelManagement.tsx` around lines 308 -
315, The onSuccess handler for the unload call currently ignores the API
response and always shows "Model unloaded"; change the signature in the
onSuccess callback to use the real response (replace _data with e.g. data) and
use data.message (or branch on it) when calling toast so the toast reflects the
API's actual success/no-op message; keep the existing
queryClient.invalidateQueries/refetchQueries calls (they can remain as-is) and
ensure you still pass modelName where needed for context in the toast if
desired.
```python
elif model_type == "chatterbox":
    from .backends import get_tts_backend_for_engine
    backend = get_tts_backend_for_engine("chatterbox")
    if backend.is_loaded():
        backend.unload_model()
    else:
        return {"message": f"Model {model_name} is not loaded"}
elif model_type == "chatterbox_turbo":
    from .backends import get_tts_backend_for_engine
    backend = get_tts_backend_for_engine("chatterbox_turbo")
    if backend.is_loaded():
        backend.unload_model()
    else:
        return {"message": f"Model {model_name} is not loaded"}
```
Serialize Chatterbox unloads against active generation.
These branches call backend.unload_model() directly even though backend/backends/chatterbox_backend.py, Lines 310-339 still uses self.model from a worker thread during generation. If the user clicks Unload while a Chatterbox request is in flight, self.model can become None mid-run and the request falls over with a 500. Please gate unload behind a per-backend active-ops counter or unload lock before exposing it from the API.
Based on learnings, unloading self.model while _generate_sync or _create_prompt_sync threads hold a direct reference requires an asyncio.Lock or active-ops counter to serialize model swap against in-flight inference.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@backend/main.py` around lines 1526 - 1539, The unload path for Chatterbox
calls backend.unload_model() directly while background threads in
chatterbox_backend.py (_generate_sync and _create_prompt_sync) may still access
self.model, causing None derefs; modify the Chatterbox backend (referencing
class methods _generate_sync, _create_prompt_sync and unload_model) to serialize
unloads against in-flight ops by adding either an asyncio.Lock or an atomic
active-ops counter on the backend instance: increment the counter or acquire the
lock at the start of generation entry points, decrement/release when finished
(including error paths), and make unload_model wait for zero active ops or
acquire the same lock before clearing self.model; then update the API call sites
that use get_tts_backend_for_engine(...).unload_model() to rely on the backend’s
guarded unload (no changes to the API surface), ensuring unload is blocked until
current generations complete.
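The guarded-unload idea from this comment can be sketched with a condition variable and an active-ops counter. Class and method names mirror the Chatterbox backend, but the bodies are placeholders, not the project's actual code:

```python
import threading

class GuardedBackend:
    """Sketch: serialize unload_model against in-flight generation."""
    def __init__(self):
        self.model = object()
        self._cond = threading.Condition()
        self._active_ops = 0

    def _generate_sync(self):
        # Entry point for generation: register the op under the lock so
        # unload cannot clear self.model between the check and the work.
        with self._cond:
            if self.model is None:
                raise RuntimeError("model not loaded")
            self._active_ops += 1
        try:
            return id(self.model)  # placeholder for real inference
        finally:
            with self._cond:
                self._active_ops -= 1
                self._cond.notify_all()

    def unload_model(self):
        with self._cond:
            # Block until every in-flight generation has finished.
            while self._active_ops > 0:
                self._cond.wait()
            self.model = None
```

An asyncio.Lock would work the same way for the async entry points; the counter variant has the advantage of allowing concurrent generations while still blocking unload.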
Actionable comments posted: 1
♻️ Duplicate comments (1)
app/src/components/ServerSettings/ModelManagement.tsx (1)
308-312: ⚠️ Potential issue | 🟡 Minor: Use the unload API response message in the toast.

Line 308 ignores the returned payload, so the toast can claim success even when the backend responds with "is not loaded".

Suggested fix

```diff
- onSuccess: async (_data, modelName) => {
+ onSuccess: async (data, modelName) => {
    toast({
-     title: 'Model unloaded',
-     description: `${modelName} has been unloaded from memory.`,
+     title: data.message.includes('is not loaded') ? 'Model already unloaded' : 'Model unloaded',
+     description: data.message || `${modelName} unload request completed.`,
    });
    await queryClient.invalidateQueries({ queryKey: ['modelStatus'], refetchType: 'all' });
    await queryClient.refetchQueries({ queryKey: ['modelStatus'] });
  },
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@app/src/components/ServerSettings/ModelManagement.tsx` around lines 308 - 312, The onSuccess handler in ModelManagement.tsx currently ignores the API response and always shows "Model unloaded" using the modelName param; change it to read the response payload (the _data argument) from the unload API and use its message (e.g., _data.message) for the toast title/description so the UI reflects responses like "is not loaded"; keep a fallback to the modelName if the message is missing and ensure the same toast call (toast({...})) uses the response-derived text instead of assuming success.
🧹 Nitpick comments (1)
backend/main.py (1)
1493-1505: Consider centralizing model-name metadata. The `model_types` map here duplicates model metadata that is also maintained in other model-management routes, which makes drift easy over time. Extracting one shared constant/helper would reduce mismatch risk.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/main.py` around lines 1493 - 1505, The model_types dict is duplicated across model-management code; extract it into a single shared constant or helper (e.g., MODEL_TYPES or get_model_metadata()) and replace local occurrences of model_types in functions/routes (referencing model_types, model_name lookups) to import and use that centralized symbol; ensure the new module exports the same mapping shape (model_name -> (model_type, model_size)) and update any callers that expect this data so they use the centralized constant/helper to avoid drift.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/main.py`:
- Around line 1513-1518: The unload branch uses tts_model.model_size which is
inconsistent with the rest of the file's authoritative loaded-size signal;
change the comparison to use the module's current loaded-size field (e.g.
tts._current_model_size or the getter used by the status/health path) so the
check becomes tts.get_tts_model().is_loaded() and compare against
tts._current_model_size, then call tts.unload_tts_model() when they match;
update any references in the same block to use the same symbol used by
status/health (tts._current_model_size or its getter) to avoid misreporting Qwen
as not loaded.
---
Duplicate comments:
In `@app/src/components/ServerSettings/ModelManagement.tsx`:
- Around line 308-312: The onSuccess handler in ModelManagement.tsx currently
ignores the API response and always shows "Model unloaded" using the modelName
param; change it to read the response payload (the _data argument) from the
unload API and use its message (e.g., _data.message) for the toast
title/description so the UI reflects responses like "is not loaded"; keep a
fallback to the modelName if the message is missing and ensure the same toast
call (toast({...})) uses the response-derived text instead of assuming success.
---
Nitpick comments:
In `@backend/main.py`:
- Around line 1493-1505: The model_types dict is duplicated across
model-management code; extract it into a single shared constant or helper (e.g.,
MODEL_TYPES or get_model_metadata()) and replace local occurrences of
model_types in functions/routes (referencing model_types, model_name lookups) to
import and use that centralized symbol; ensure the new module exports the same
mapping shape (model_name -> (model_type, model_size)) and update any callers
that expect this data so they use the centralized constant/helper to avoid
drift.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 485c65d3-d13a-4e66-ae25-04bc4ec5f34e
📒 Files selected for processing (5)
- app/src/components/ServerSettings/ModelManagement.tsx
- app/src/lib/api/client.ts
- backend/backends/chatterbox_backend.py
- backend/backends/chatterbox_turbo_backend.py
- backend/main.py
🚧 Files skipped from review as they are similar to previous changes (2)
- backend/backends/chatterbox_backend.py
- app/src/lib/api/client.ts
The actual dtype mismatch was in S3Tokenizer.log_mel_spectrogram, not VoiceEncoder.forward. librosa.load returns float64 numpy, which torch.from_numpy preserves as double. The STFT output (double) then hits _mel_filters (float32) in a matmul at s3tokenizer.py:163. Now patching both entry points after model load:

1. S3Tokenizer.log_mel_spectrogram — cast audio to float32 before STFT
2. VoiceEncoder.forward — cast mels to float32 before LSTM

Remove debug traceback logging (no longer needed).
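Both entry points can be shimmed by one shared helper, as the review nitpick suggests. This sketch assumes the attribute paths used in the PR (`model.s3gen.tokenizer` and `model.ve`); everything else is generic wrapping logic, not the actual backend code:

```python
import types

def apply_f32_patches(model):
    """Cast float64 inputs to float32 at both chatterbox entry points.
    Idempotent: each original method is captured exactly once."""
    tok = model.s3gen.tokenizer
    if not hasattr(tok, "_orig_log_mel"):
        tok._orig_log_mel = tok.log_mel_spectrogram

        def _f32_log_mel(self, audio, *args, **kwargs):
            # Cast audio before the STFT so the mel-filter matmul stays float32.
            return self._orig_log_mel(audio.float(), *args, **kwargs)

        tok.log_mel_spectrogram = types.MethodType(_f32_log_mel, tok)

    ve = model.ve
    if not hasattr(ve, "_orig_ve_forward"):
        ve._orig_ve_forward = ve.forward

        def _f32_forward(self, mels):
            # Cast mels before the LSTM so weights and input dtypes match.
            return self._orig_ve_forward(mels.float())

        ve.forward = types.MethodType(_f32_forward, ve)
```

Calling this once from each backend's load path would keep the two copies of the shim from drifting apart.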
2cbd56f to bfd7b81 (Compare)
♻️ Duplicate comments (2)
backend/backends/chatterbox_turbo_backend.py (1)
194-212: ⚠️ Potential issue | 🟠 Major: Keep the loaded Turbo model local until the patching succeeds.

This has the same failure mode as the multilingual backend: self.model is assigned at Line 167/Line 173, then the new patch block can still fail on Line 194 through Line 212. If that happens, retries short-circuit at Line 97 and Line 100, and the cold-download flow may already be marked complete at Line 178 and Line 179. Please patch a local model, then publish it to self.model and complete the task state afterward.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/backends/chatterbox_turbo_backend.py` around lines 194 - 212, The patching block mutates self.model in-place and can raise after self.model is already set; instead, load the model into a local variable (e.g., local_model), apply all patches to local_model (patching _tokzr.log_mel_spectrogram via _orig_log_mel/_f32_log_mel and _ve.forward via _orig_ve_forward/_f32_ve_forward), and only after all patches succeed assign local_model to self.model and update whatever "task state" completion code currently runs after model assignment; this ensures retries and cold-download markers (currently guarded around self.model assignment) are not short-circuited by a failed patch.

backend/backends/chatterbox_backend.py (1)
188-206: ⚠️ Potential issue | 🟠 Major: Patch a local model instance before assigning self.model.

These bindings run after self.model has already been published, so any exception from Line 188 through Line 206 leaves the backend stuck with a partially initialized model. The next load_model() call then returns early at Line 97 and Line 100, and the cold-download path may already have been marked complete at Line 178 and Line 179.

🛠️ Proposed fix

```diff
- self.model = ChatterboxMultilingualTTS.from_pretrained(
+ model = ChatterboxMultilingualTTS.from_pretrained(
      device=device,
  )
@@
- self.model = ChatterboxMultilingualTTS.from_pretrained(
+ model = ChatterboxMultilingualTTS.from_pretrained(
      device=device,
  )
@@
- t3_tfmr = self.model.t3.tfmr
+ t3_tfmr = model.t3.tfmr
@@
- if not is_cached:
-     progress_manager.mark_complete(model_name)
-     task_manager.complete_download(model_name)
  # Patch float64 → float32 dtype mismatches in upstream chatterbox.
@@
- _tokzr = self.model.s3gen.tokenizer
+ _tokzr = model.s3gen.tokenizer
@@
- _ve = self.model.ve
+ _ve = model.ve
@@
+ self.model = model
+ if not is_cached:
+     progress_manager.mark_complete(model_name)
+     task_manager.complete_download(model_name)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/backends/chatterbox_backend.py` around lines 188 - 206, The patching of tokenizer and VoiceEncoder is being done directly on self.model which can leave self.model partially initialized if an exception occurs; instead, load the model into a local variable (e.g., model = <loaded object>), apply the patches to model.s3gen.tokenizer.log_mel_spectrogram and model.ve.forward (use the existing _f32_log_mel and _f32_ve_forward patterns) and only assign self.model = model after all patches succeed; this ensures load_model’s early-return checks and cold-download flags are not left pointing at a partially-initialized object.
🧹 Nitpick comments (1)
backend/backends/chatterbox_turbo_backend.py (1)
191-212: Extract the dtype monkey-patch into one helper. This block is now duplicated almost verbatim with backend/backends/chatterbox_backend.py. A shared helper that accepts the loaded model and applies the dtype shims would reduce drift the next time upstream Chatterbox changes these method names or signatures.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/backends/chatterbox_turbo_backend.py` around lines 191 - 212, Extract the duplicated dtype monkey-patch into a single helper function (e.g., apply_dtype_monkeypatch) and call it from both chatterbox_turbo_backend and chatterbox_backend after the model is loaded; the helper should accept the loaded model, patch model.s3gen.tokenizer.log_mel_spectrogram to coerce audio to float (preserving the original via _orig_log_mel) and patch model.ve.forward to call the original with mels.float() (preserving _orig_ve_forward), using types.MethodType to bind the wrappers, and keep the original symbol names (S3Tokenizer / s3gen.tokenizer, log_mel_spectrogram, VoiceEncoder / ve, forward) so future upstream changes are localized to that helper.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@backend/backends/chatterbox_backend.py`:
- Around line 188-206: The patching of tokenizer and VoiceEncoder is being done
directly on self.model which can leave self.model partially initialized if an
exception occurs; instead, load the model into a local variable (e.g., model =
<loaded object>), apply the patches to model.s3gen.tokenizer.log_mel_spectrogram
and model.ve.forward (use the existing _f32_log_mel and _f32_ve_forward
patterns) and only assign self.model = model after all patches succeed; this
ensures load_model’s early-return checks and cold-download flags are not left
pointing at a partially-initialized object.
In `@backend/backends/chatterbox_turbo_backend.py`:
- Around line 194-212: The patching block mutates self.model in-place and can
raise after self.model is already set; instead, load the model into a local
variable (e.g., local_model), apply all patches to local_model (patching
_tokzr.log_mel_spectrogram via _orig_log_mel/_f32_log_mel and _ve.forward via
_orig_ve_forward/_f32_ve_forward), and only after all patches succeed assign
local_model to self.model and update whatever "task state" completion code
currently runs after model assignment; this ensures retries and cold-download
markers (currently guarded around self.model assignment) are not short-circuited
by a failed patch.
---
Nitpick comments:
In `@backend/backends/chatterbox_turbo_backend.py`:
- Around line 191-212: Extract the duplicated dtype monkey-patch into a single
helper function (e.g., apply_dtype_monkeypatch) and call it from both
chatterbox_turbo_backend and chatterbox_backend after the model is loaded; the
helper should accept the loaded model, patch
model.s3gen.tokenizer.log_mel_spectrogram to coerce audio to float (preserving
the original via _orig_log_mel) and patch model.ve.forward to call the original
with mels.float() (preserving _orig_ve_forward), using types.MethodType to bind
the wrappers, and keep the original symbol names (S3Tokenizer / s3gen.tokenizer,
log_mel_spectrogram, VoiceEncoder / ve, forward) so future upstream changes are
localized to that helper.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: c178db28-21ff-4cdd-86bb-2041ce15e2c0
📒 Files selected for processing (2)
- backend/backends/chatterbox_backend.py
- backend/backends/chatterbox_turbo_backend.py
…sed state Apply local-var-then-assign pattern to chatterbox_backend.py (multilingual) to match the turbo backend. Also use _current_model_size fallback in unload, delete, and status endpoints for consistent Qwen model size checks.
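The local-var-then-assign pattern described in this commit can be reduced to a small sketch. The class and method names below are generic stand-ins, not the actual backend; the point is that self.model is published only after every post-load step has succeeded:

```python
class Backend:
    """Sketch of the publish-last loading pattern."""
    def __init__(self):
        self.model = None

    def load_model(self):
        if self.model is not None:
            return  # already loaded; retries short-circuit here
        model = self._from_pretrained()  # stand-in for the real loader
        self._apply_patches(model)       # may raise
        # Publish only after patching succeeded, so a failure above
        # leaves self.model as None and the next call retries cleanly.
        self.model = model

    def _from_pretrained(self):
        return object()

    def _apply_patches(self, model):
        pass
```

If _apply_patches raises, the early-return check at the top still sees None on the next call, so a retry re-runs the full load instead of returning a half-initialized model.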
Summary
Fixes `expected m1 and m2 to have the same dtype, but got: float != double` when generating with Chatterbox (both multilingual and turbo), and adds a model unload button so users can free memory without restarting the backend.

Root cause

librosa.load() returns float64 numpy arrays. The upstream chatterbox package converts these to torch tensors via torch.from_numpy() without casting to float32, creating dtype mismatches in two places: the STFT output hitting the float32 _mel_filters in a matmul inside S3Tokenizer.log_mel_spectrogram, and the float64 mel tensor hitting the float32 LSTM weights in VoiceEncoder.forward.

Fix
Monkey-patch both entry points after model load to cast inputs to float32:

- S3Tokenizer.log_mel_spectrogram — audio.float() before STFT
- VoiceEncoder.forward — mels.float() before LSTM

Applied to both Chatterbox Multilingual and Chatterbox Turbo backends.

Model unload feature

POST /models/{model_name}/unload — unloads any model from memory without deleting from disk

Summary by CodeRabbit
New Features
Bug Fixes