
fix: batch of bug fixes from issue tracker #295

Merged
jamiepine merged 9 commits into main from fix/misc-bugs
Mar 17, 2026

Conversation

jamiepine (Owner) commented Mar 17, 2026

Summary

Fixes 7 bugs from the issue tracker in one pass.

Issue | Fix
#233 | Add model and expanded language params to /transcribe endpoint
#290 | Fix [object Object] error messages by normalizing error detail objects
#231 | Fix generation list not updating on completion (use refetchQueries, handle SSE onerror, reset page)
#286 | Upgrade pip in Docker build to fix hash mismatch on Qwen3-TTS deps
#248 | Handle client disconnects in SSE/streaming endpoints to suppress broken pipe errors
#289 | Upgrade CUDA build from cu121 to cu126 for Blackwell (RTX 50-series) GPU support
#278 | Move sample upload audio processing to a thread pool so it doesn't block the event loop and crash the server

Notes

Closes #233, #290, #231, #286, #248, #289, #278

Summary by CodeRabbit

  • New Features

    • Choose Whisper model size for transcription (base, small, medium, large, turbo)
    • Added support for Japanese, Korean, German, French, Russian, Portuguese, Spanish, and Italian
  • Bug Fixes

    • Reset history table pagination when a generation completes
    • Improved handling of client disconnects during streaming
  • Improvements

    • More robust sample upload with chunked transfers and 50 MB max size

Use refetchQueries instead of invalidateQueries for more reliable history
refresh. Add history refetch to SSE onerror handler so dropped connections
don't leave the list stale. Reset page to 0 in HistoryTable when a pending
generation completes.

Closes #231
Fixes hash mismatch when pip resolves Qwen3-TTS transitive deps.

Closes #286
Wrap SSE generators with BrokenPipeError/ConnectionResetError handling
so client disconnects during generation status polling, download progress,
or audio streaming don't produce unhandled Errno 32 errors.

Closes #248
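The disconnect handling described above can be sketched as a generic wrapper around an SSE generator. This is a minimal illustration of the pattern, not the PR's actual code; the `guarded_stream` name is hypothetical, and it assumes a FastAPI-style async generator feeding a StreamingResponse.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


async def guarded_stream(inner):
    """Yield events from `inner`, treating client disconnects as normal termination."""
    try:
        async for event in inner:
            yield event
    except (BrokenPipeError, ConnectionResetError):
        # Errno 32 territory: the client went away mid-stream.
        # Log it and end the stream quietly instead of raising.
        logger.info("Client disconnected during SSE stream")
    except asyncio.CancelledError:
        # Server-side cancellation (e.g. shutdown): re-raise so cleanup runs.
        raise
```

Wrapping each endpoint's generator this way keeps the suppression logic in one place instead of duplicating try/except blocks across the status-polling, download-progress, and audio-streaming loops.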
cu121 only ships kernels up to SM 9.0 (Ada Lovelace). RTX 50-series
(Blackwell, SM 12.0) and RTX 6000 Pro need cu126 which includes SM 12.0
support while remaining backward compatible with older GPUs.

Closes #289
Move audio validation and saving to thread pool so librosa/ffmpeg decoding
doesn't block the async event loop. Combine validate + load into a single
pass to avoid decoding the file twice. Add 50 MB upload limit and chunked
reads to prevent unbounded memory allocation.

Closes #278
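The upload changes above combine two ideas: a chunked, size-capped read, and pushing the blocking decode off the event loop. A rough sketch under those assumptions (the function names and the `_decode_audio` placeholder are illustrative, not the PR's actual helpers):

```python
import asyncio

MAX_UPLOAD_BYTES = 50 * 1024 * 1024   # 50 MB cap from the PR description
CHUNK_BYTES = 1024 * 1024             # read uploads in 1 MB chunks


def _decode_audio(path):
    # Stand-in for the blocking librosa/ffmpeg decode; runs in a worker thread.
    with open(path, "rb") as f:
        return f.read()


async def read_limited(file, dest_path):
    """Copy an upload to disk in chunks, rejecting anything over the size cap."""
    total = 0
    with open(dest_path, "wb") as dest:
        while chunk := await file.read(CHUNK_BYTES):
            total += len(chunk)
            if total > MAX_UPLOAD_BYTES:
                raise ValueError("File too large")
            dest.write(chunk)


async def process_upload(file, dest_path):
    await read_limited(file, dest_path)
    # The heavy decode runs off the event loop, so other requests keep flowing.
    return await asyncio.to_thread(_decode_audio, dest_path)
```

Reading in bounded chunks avoids loading the whole upload into memory, and `asyncio.to_thread` keeps the CPU-bound decode from starving concurrent requests.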
coderabbitai bot commented Mar 17, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 50197c67-bcd0-44be-8a01-01c9498a1194

📥 Commits

Reviewing files that changed from the base of the PR and between d35e6f0 and df50b8a.

⛔ Files ignored due to path filters (1)
  • tauri/src-tauri/gen/voicebox.icns is excluded by !**/gen/**
📒 Files selected for processing (2)
  • .github/workflows/release.yml
  • .gitignore

📝 Walkthrough

Walkthrough

This PR wires an optional Whisper model-size selection end-to-end (frontend → API → backends), expands supported transcription languages, adds chunked profile uploads with size limits, improves SSE/client-disconnect handling, upgrades CUDA wheel targets to 12.6, and adds a pip upgrade step in the backend Docker build.

Changes

Cohort / File(s) Summary
CI / Docker
.github/workflows/release.yml, Dockerfile
Windows CUDA wheel targets updated from cu121→cu126; torchaudio install flags adjusted; added pip install --upgrade pip step in backend builder.
Frontend API & Types
app/src/lib/api/types.ts, app/src/lib/api/client.ts
Added WhisperModelSize type; transcribeAudio accepts optional model; introduced formatErrorDetail for consistent error formatting.
Frontend Hooks
app/src/lib/hooks/useTranscription.ts, app/src/lib/hooks/useGenerationProgress.ts
useTranscription passes optional model through to API; replaced invalidateQueries with refetchQueries in history-refresh locations.
Frontend UI
app/src/components/History/HistoryTable.tsx
Detects generation completion (pending count decrease) to reset history pagination to page 0.
Backend API Types
backend/models.py
Expanded TranscriptionRequest.language whitelist to include: en, zh, ja, ko, de, fr, ru, pt, es, it; added optional model field validated against (base, small, medium, large, turbo).
Backend Transcription Path
backend/backends/__init__.py, backend/backends/mlx_backend.py, backend/backends/pytorch_backend.py, backend/routes/transcription.py
Extended transcribe(..., model_size=None) signatures and wired model_size through model loading and transcription; endpoint accepts and validates model and can trigger background model download / 202 response when needed.
Backend Streaming & Progress
backend/routes/generations.py, backend/utils/progress.py
Added module logger and robust handling for BrokenPipeError, ConnectionResetError, asyncio.CancelledError in SSE and audio streaming loops to log and gracefully handle client disconnects.
Backend Profiles / Audio
backend/routes/profiles.py, backend/services/profiles.py, backend/utils/audio.py
Implemented chunked profile uploads with SAMPLE_MAX_FILE_SIZE = 50 MB and SAMPLE_UPLOAD_CHUNK_SIZE = 1 MB; added validate_and_load_reference_audio(...) that returns (is_valid, error_msg, audio, sr); refactored profile sample handling to use async validation and cleanup on failures.
Repository
.gitignore
Added ignore for tauri/src-tauri/gen/voicebox.icns.

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant Frontend as Frontend
    participant API as Backend API
    participant Backend as STT Backend
    participant ModelStore as Model Loader/Cache

    User->>Frontend: Upload audio + select model
    Frontend->>API: transcribeAudio(file, language, model)
    activate API
    API->>ModelStore: validate model against WHISPER_HF_REPOS
    alt Model already loaded
        API->>Backend: transcribe(audio_path, language, model_size)
    else Model not loaded
        API-->>Frontend: 202 Accepted (model download in background)
        API->>ModelStore: trigger load_model_async(model_size)
    end
    Backend->>ModelStore: ensure model available
    Backend->>Backend: run transcription using model_size
    Backend-->>API: transcription result
    API-->>Frontend: transcription result
    deactivate API

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I hopped through code to add a choice so fine,
Pick base or turbo and let the models shine.
Chunks guarded uploads, streams no longer fall,
CUDA wheels upgraded — I celebrate it all! 🎶

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

Title check — ❓ Inconclusive
Explanation: The title 'fix: batch of bug fixes from issue tracker' is vague and uses a non-descriptive term ('batch of bug fixes') that doesn't convey meaningful information about the specific changes, despite the PR addressing seven distinct issues across multiple systems.
Resolution: Consider using a more specific title that highlights the primary changes, such as 'fix: add Whisper model selection and improve error handling', to better reflect the main objectives of this changeset.
✅ Passed checks (2 passed)
Description Check — ✅ Passed (check skipped: CodeRabbit's high-level summary is enabled)
Docstring Coverage — ✅ Passed (docstring coverage is 80.00%, meeting the 80.00% required threshold)


coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
backend/routes/profiles.py (1)

121-147: ⚠️ Potential issue | 🟡 Minor

Temp file leak if an exception occurs during chunked upload.

If an exception is raised inside the with block (e.g., IOError on tmp.write() or network error on file.read()) before the explicit size-limit check, the temp file won't be cleaned up because the try/finally block only starts after the with block exits.

🛡️ Proposed fix: wrap the entire operation in a single try/finally
-    with tempfile.NamedTemporaryFile(suffix=file_suffix, delete=False) as tmp:
+    tmp_path = None
+    try:
+      with tempfile.NamedTemporaryFile(suffix=file_suffix, delete=False) as tmp:
+        tmp_path = tmp.name
         total_size = 0
         while chunk := await file.read(SAMPLE_UPLOAD_CHUNK_SIZE):
             total_size += len(chunk)
             if total_size > SAMPLE_MAX_FILE_SIZE:
-                Path(tmp.name).unlink(missing_ok=True)
                 raise HTTPException(
                     status_code=413,
                     detail=f"File too large (max {SAMPLE_MAX_FILE_SIZE // (1024 * 1024)} MB)",
                 )
             tmp.write(chunk)
-        tmp_path = tmp.name
 
-    try:
-        sample = await profiles.add_profile_sample(
+      sample = await profiles.add_profile_sample(
             profile_id,
             tmp_path,
             reference_text,
             db,
-        )
-        return sample
+      )
+      return sample
     except ValueError as e:
         raise HTTPException(status_code=400, detail=str(e))
     except Exception as e:
         raise HTTPException(status_code=500, detail=f"Failed to process audio file: {str(e)}")
     finally:
-        Path(tmp_path).unlink(missing_ok=True)
+        if tmp_path:
+            Path(tmp_path).unlink(missing_ok=True)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/routes/profiles.py` around lines 121 - 147, The temp file can leak if
an exception occurs inside the with tempfile.NamedTemporaryFile block because
the current try/finally starts after the with; wrap the whole read/write and
processing in a single try/finally so cleanup always runs: declare tmp_path =
None before the try, move the with tempfile.NamedTemporaryFile(...) as tmp and
the chunked read loop (using SAMPLE_UPLOAD_CHUNK_SIZE and SAMPLE_MAX_FILE_SIZE)
and the call to profiles.add_profile_sample(...) into the try, catch specific
exceptions (ValueError) and general exceptions as currently done, and in the
finally call Path(tmp_path).unlink(missing_ok=True) only if tmp_path is set to
ensure the temp file is always removed even if file.read() or tmp.write()
raises.
app/src/lib/hooks/useGenerationProgress.ts (1)

16-20: ⚠️ Potential issue | 🟡 Minor

Docstring still references invalidateQueries but implementation now uses refetchQueries.

The comment on line 18 says "invalidates the history query" but the code now calls refetchQueries. Update the docstring to reflect the new behavior.

📝 Suggested fix
 /**
  * Subscribes to SSE for all pending generations. When a generation completes,
- * invalidates the history query, removes it from pending, and auto-plays
+ * refetches the history query, removes it from pending, and auto-plays
  * if the player is idle.
  */
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/src/lib/hooks/useGenerationProgress.ts` around lines 16 - 20, Update the
docstring in useGenerationProgress.ts to reflect that the hook now calls
refetchQueries rather than invalidating the cache: change the description that
currently says "invalidates the history query" to state that it "refetches the
history query" (or similar), and ensure the comment mentions refetching the
history query and removing pending entries and auto-playing when the player is
idle to match the behavior implemented in useGenerationProgress.
🧹 Nitpick comments (5)
backend/services/profiles.py (2)

120-121: Consider moving asyncio import to module level.

While local imports can reduce startup overhead, asyncio is a standard library module with negligible import cost. Moving it to the top of the file with other imports would follow the conventional Python import style.

✨ Optional: Move asyncio to top-level imports
 from typing import List, Optional
 from datetime import datetime
+import asyncio
 import uuid
 import shutil

Then remove the local import:

-    import asyncio
-
     profile = db.query(DBVoiceProfile).filter_by(id=profile_id).first()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/services/profiles.py` around lines 120 - 121, Move the local "import
asyncio" out of the function and add a top-level "import asyncio" with the other
module imports; then delete the local "import asyncio" statement (the one shown
in the diff) so all uses in this module rely on the single module-level asyncio
import.

25-25: Remove unused load_audio import from line 25.

The load_audio function is no longer called in this file after the refactor to use validate_and_load_reference_audio. Remove it from the import statement.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/services/profiles.py` at line 25, The import statement currently
brings in load_audio which is unused after switching to
validate_and_load_reference_audio; update the import on the line that references
validate_reference_audio, validate_and_load_reference_audio, load_audio,
save_audio by removing load_audio so only the used symbols
(validate_reference_audio, validate_and_load_reference_audio, save_audio) are
imported.
backend/utils/audio.py (1)

226-256: Good refactor for single-pass validation and loading.

This eliminates redundant audio decoding by combining validation and loading into one function. The 4-tuple return type is consistently applied across all code paths.

One minor suggestion from static analysis: prefer f-string conversion flag over explicit str() call.

✨ Optional: Use explicit conversion flag
     except Exception as e:
-        return False, f"Error validating audio: {str(e)}", None, None
+        return False, f"Error validating audio: {e!s}", None, None
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/utils/audio.py` around lines 226 - 256, In
validate_and_load_reference_audio, update the exception formatting to use the
f-string conversion flag instead of calling str() directly; replace the current
return in the except block that builds the error message using str(e) with an
f-string that uses the conversion flag (e.g., {e!s}) so the message becomes
f"Error validating audio: {e!s}" while leaving the rest of the return tuple
unchanged.
backend/routes/profiles.py (1)

123-131: Consider offloading blocking file writes to a thread pool.

The PR objective mentions moving audio processing to a thread pool to avoid blocking the async event loop. While file.read() is async, tmp.write(chunk) is a synchronous blocking call. For a 50 MB upload with 1 MB chunks, this results in 50 blocking write operations on the event loop.

♻️ Proposed fix using run_in_executor
+import asyncio
+from functools import partial
+
 # In the upload handler:
-            tmp.write(chunk)
+            loop = asyncio.get_running_loop()
+            await loop.run_in_executor(None, tmp.write, chunk)

Alternatively, consider using aiofiles for fully async file I/O if it's already a project dependency.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/routes/profiles.py` around lines 123 - 131, The sync tmp.write(chunk)
inside the upload loop blocks the async event loop; change it to run the write
in the threadpool (use loop.run_in_executor or
asyncio.get_running_loop().run_in_executor) so writes are executed off the event
loop while keeping the async file.read(...) loop and the total_size checks
intact (refer to SAMPLE_UPLOAD_CHUNK_SIZE, SAMPLE_MAX_FILE_SIZE, file.read,
tmp.write and Path(tmp.name).unlink). Alternatively, replace the temp file usage
with aiofiles to perform async writes if the project already depends on
aiofiles; ensure the error/cleanup path (unlink on oversized) still runs
correctly.
backend/routes/transcription.py (1)

48-49: Consider exposing a public method for cache checking.

Line 49 accesses the private method _is_model_cached. While this works, accessing private methods directly couples this route to implementation details of the whisper model class. Consider adding a public is_model_cached(model_size) method to the whisper model interface.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@backend/routes/transcription.py` around lines 48 - 49, The route currently
inspects a private member by calling whisper_model._is_model_cached(model_size);
instead add and use a public method like
whisper_model.is_model_cached(model_size) on the whisper model class and update
the route to call that instead of the private method, keeping the existing logic
that checks whisper_model.is_loaded() and whisper_model.model_size; implement
the new public is_model_cached(model_size) method to wrap the existing private
check so callers no longer rely on _is_model_cached.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/release.yml:
- Around line 192-195: The torchaudio install in the "Install PyTorch with CUDA
12.6" workflow step omits --force-reinstall, causing potential pip cache
inconsistency with torch; update the pip install command for torchaudio (the
line starting with "pip install torchaudio --index-url
https://download.pytorch.org/whl/cu126") to include --force-reinstall so both
packages are installed deterministically from the cu126 index.

---

Outside diff comments:
In `@app/src/lib/hooks/useGenerationProgress.ts`:
- Around line 16-20: Update the docstring in useGenerationProgress.ts to reflect
that the hook now calls refetchQueries rather than invalidating the cache:
change the description that currently says "invalidates the history query" to
state that it "refetches the history query" (or similar), and ensure the comment
mentions refetching the history query and removing pending entries and
auto-playing when the player is idle to match the behavior implemented in
useGenerationProgress.

In `@backend/routes/profiles.py`:
- Around line 121-147: The temp file can leak if an exception occurs inside the
with tempfile.NamedTemporaryFile block because the current try/finally starts
after the with; wrap the whole read/write and processing in a single try/finally
so cleanup always runs: declare tmp_path = None before the try, move the with
tempfile.NamedTemporaryFile(...) as tmp and the chunked read loop (using
SAMPLE_UPLOAD_CHUNK_SIZE and SAMPLE_MAX_FILE_SIZE) and the call to
profiles.add_profile_sample(...) into the try, catch specific exceptions
(ValueError) and general exceptions as currently done, and in the finally call
Path(tmp_path).unlink(missing_ok=True) only if tmp_path is set to ensure the
temp file is always removed even if file.read() or tmp.write() raises.

---

Nitpick comments:
In `@backend/routes/profiles.py`:
- Around line 123-131: The sync tmp.write(chunk) inside the upload loop blocks
the async event loop; change it to run the write in the threadpool (use
loop.run_in_executor or asyncio.get_running_loop().run_in_executor) so writes
are executed off the event loop while keeping the async file.read(...) loop and
the total_size checks intact (refer to SAMPLE_UPLOAD_CHUNK_SIZE,
SAMPLE_MAX_FILE_SIZE, file.read, tmp.write and Path(tmp.name).unlink).
Alternatively, replace the temp file usage with aiofiles to perform async writes
if the project already depends on aiofiles; ensure the error/cleanup path
(unlink on oversized) still runs correctly.

In `@backend/routes/transcription.py`:
- Around line 48-49: The route currently inspects a private member by calling
whisper_model._is_model_cached(model_size); instead add and use a public method
like whisper_model.is_model_cached(model_size) on the whisper model class and
update the route to call that instead of the private method, keeping the
existing logic that checks whisper_model.is_loaded() and
whisper_model.model_size; implement the new public is_model_cached(model_size)
method to wrap the existing private check so callers no longer rely on
_is_model_cached.

In `@backend/services/profiles.py`:
- Around line 120-121: Move the local "import asyncio" out of the function and
add a top-level "import asyncio" with the other module imports; then delete the
local "import asyncio" statement (the one shown in the diff) so all uses in this
module rely on the single module-level asyncio import.
- Line 25: The import statement currently brings in load_audio which is unused
after switching to validate_and_load_reference_audio; update the import on the
line that references validate_reference_audio,
validate_and_load_reference_audio, load_audio, save_audio by removing load_audio
so only the used symbols (validate_reference_audio,
validate_and_load_reference_audio, save_audio) are imported.

In `@backend/utils/audio.py`:
- Around line 226-256: In validate_and_load_reference_audio, update the
exception formatting to use the f-string conversion flag instead of calling
str() directly; replace the current return in the except block that builds the
error message using str(e) with an f-string that uses the conversion flag (e.g.,
{e!s}) so the message becomes f"Error validating audio: {e!s}" while leaving the
rest of the return tuple unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 43986149-0c4d-4e66-9d61-c336266fb382

📥 Commits

Reviewing files that changed from the base of the PR and between a2adc3b and d35e6f0.

📒 Files selected for processing (17)
  • .github/workflows/release.yml
  • Dockerfile
  • app/src/components/History/HistoryTable.tsx
  • app/src/lib/api/client.ts
  • app/src/lib/api/types.ts
  • app/src/lib/hooks/useGenerationProgress.ts
  • app/src/lib/hooks/useTranscription.ts
  • backend/backends/__init__.py
  • backend/backends/mlx_backend.py
  • backend/backends/pytorch_backend.py
  • backend/models.py
  • backend/routes/generations.py
  • backend/routes/profiles.py
  • backend/routes/transcription.py
  • backend/services/profiles.py
  • backend/utils/audio.py
  • backend/utils/progress.py
