fix(whisper): enable Vulkan GPU backend on Linux #319
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3027094fc7
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@server/services/whisper-local.ts`:
- Around line 393-397: The transcribeOpts currently sets maxThreads =
cpus().length which ignores GPU presence and container CPU quotas; update the
code that builds TranscribeOptions (transcribeOpts) to compute maxThreads using
detectGpu() and the existing config: if detectGpu() returns true default to ~4
threads, otherwise use a clamped value (e.g., min(availableCpusFromCgroupOrOs(),
8)); allow an override from the config system (e.g., config.whisper.maxThreads)
and ensure you read container-aware CPU count (cgroup/quota-aware) when falling
back to OS CPU count before clamping.
- Around line 73-76: The hardcoded 'vulkan' variant passed to initWhisper (in
the contextInitializing assignment that calls initWhisper({ filePath:
modelPath(), useGpu: true }, 'vulkan')) breaks macOS Metal support and can fail
on systems without Vulkan; change this to pick the variant conditionally: use
'default' on macOS to enable Metal, and choose between 'vulkan' or 'cuda' on
Windows/Linux based on detected GPU/drivers (or fall back to 'default' if
detection fails), update the inline comment to reflect auto-selection, and
ensure the initWhisper call and any surrounding error handling handle fallback
gracefully when the chosen variant cannot be loaded.
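The graceful fallback requested in the last comment could look roughly like the sketch below. `initWithFallback` is a hypothetical helper, not part of the PR or of `@fugood/whisper.node`; it only assumes that the init function takes an optional variant string, as the `initWhisper(..., 'vulkan')` call quoted above suggests.

```typescript
// Hypothetical helper: try the preferred backend variant first and, if
// loading it throws, retry once with the library's default variant.
async function initWithFallback<T>(
  init: (variant?: string) => Promise<T>,
  preferred?: string,
): Promise<T> {
  // No preference: let the library choose its default backend.
  if (preferred === undefined) return init();
  try {
    return await init(preferred);
  } catch (err) {
    console.warn(`variant '${preferred}' failed to load, retrying with the default`, err);
    return init();
  }
}

// Assumed wiring (names taken from the review comment, not verified here):
//   initWithFallback((v) => initWhisper({ filePath: modelPath(), useGpu: true }, v), 'vulkan')
```

The retry keeps local STT available on hosts where the requested variant cannot be loaded, at the cost of one failed load attempt at startup.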
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: d4af591c-8a36-40b5-86a8-752c5f984d16
📒 Files selected for processing (1)
server/services/whisper-local.ts
…acOS
- Pass 'vulkan' explicitly on Linux for AMD/Intel GPU acceleration
- Use 'default' (Metal) on macOS and other platforms
- Add maxThreads: cpus().length for full CPU utilization
ccecc6f to 38448bc
💡 Codex Review
Reviewed commit: 38448bc76f
💡 Codex Review
Reviewed commit: bf6b8b771e
Force-passing 'vulkan' to initWhisper on every Linux host took local STT offline on CPU-only Linux/containers where no Vulkan ICD is present — initWhisper would throw and never load the model. Reuse the existing detectGpu() probe (which already runs nvidia-smi/vulkaninfo) and only request the Vulkan backend when a GPU is actually available; otherwise let the library pick its default backend so CPU fallback works. Addresses the Codex P1 follow-up from daggerhashimoto#319 review.
@coderabbitai full review
✅ Actions performed: Full review triggered.
💡 Codex Review
Reviewed commit: af90a50461
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@server/services/whisper-local.ts`:
- Around line 73-78: The backend selection currently uses detectGpu() which
returns true for any GPU (including CUDA-only NVIDIA), causing Linux hosts
without a Vulkan ICD to be forced to 'vulkan' and fail: change the check used
when setting the backend constant so it only picks 'vulkan' when a Vulkan ICD is
actually available (e.g., replace detectGpu() with a Vulkan-specific probe like
detectVulkan() or isVulkanAvailable()); keep detectGpu() for broader GPU
reporting, and ensure initWhisper() still receives undefined when Vulkan is not
available so the library can pick the appropriate default.
Pushed a follow-up commit (af90a50) addressing the Codex P1 about Vulkan availability. The change gates the Linux I left the P2 (
💡 Codex Review
Reviewed commit: d88a4f821e
Codex/CodeRabbit pointed out the prior detectGpu() gate is still too permissive: detectGpu() returns true on CUDA-only NVIDIA containers (where nvidia-smi works but no Vulkan ICD is installed), so the original failure mode — initWhisper(..., 'vulkan') throwing at context init — survives on those hosts. Split the check: hasVulkanBackend() probes `vulkaninfo --summary` specifically and is the only gate for selecting the Vulkan backend; detectGpu() stays as the broader "has any GPU?" reporter.
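A minimal sketch of the split described in this commit message, assuming a cached probe. The `vulkaninfo --summary` command comes from the commit message; `cachedProbe` and `pickVariant` are hypothetical names, and the `execSync` options are an assumption rather than the PR's actual values.

```typescript
import { execSync } from "node:child_process";

// Wrap a probe so it runs at most once; later calls return the cached result.
function cachedProbe(probe: () => void): () => boolean {
  let cached: boolean | undefined;
  return () => {
    if (cached !== undefined) return cached;
    let ok: boolean;
    try {
      probe();
      ok = true;
    } catch {
      ok = false; // probe threw: treat the backend as unavailable
    }
    cached = ok;
    return ok;
  };
}

// Vulkan-specific gate. The stdio/timeout options are assumed, not taken from the PR.
const hasVulkanBackend = cachedProbe(() => {
  execSync("vulkaninfo --summary", { stdio: "pipe", timeout: 3000 });
});

// Request 'vulkan' only on Linux and only when the Vulkan probe succeeds;
// undefined lets the library pick its default backend.
function pickVariant(
  platform: string,
  vulkanOk: () => boolean = hasVulkanBackend,
): "vulkan" | undefined {
  return platform === "linux" && vulkanOk() ? "vulkan" : undefined;
}
```

Keeping the Vulkan probe separate from the broader "has any GPU?" check means a CUDA-only host, where `nvidia-smi` works but no Vulkan ICD is installed, is never asked to load the Vulkan backend.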
Pushed bfefc55 — addresses the follow-up from both Codex P1 and CodeRabbit Major. You're both right that the @coderabbitai full review
💡 Codex Review
Reviewed commit: bfefc554c3
🧹 Nitpick comments (2)
server/services/whisper-local.ts (2)
414-418: ⚡ Quick win
Consider capping `maxThreads` based on GPU presence and container limits.
`cpus().length` ignores two scenarios:
- GPU active: whisper.cpp recommends ~4 threads when GPU handles compute; extra threads add scheduling overhead.
- Container quotas: `os.cpus()` returns host core count, not cgroup limits (`--cpus=2` on a 32-core host still returns 32).
Since `hasVulkanBackend()` is now available, a simple improvement:
    - maxThreads: cpus().length,
    + maxThreads: hasVulkanBackend() ? 4 : Math.min(cpus().length, 8),
For full cgroup awareness, consider reading `/sys/fs/cgroup/cpu.max` or using a library like `physical-cpu-count` in a follow-up.
298-299: 💤 Low value
`detectGpu()` spawns `vulkaninfo` again despite `hasVulkanBackend()` caching it.
On Vulkan-enabled Linux hosts, both probes run independently. `detectGpu()` could reuse the cached result:
      // Try vulkaninfo
    - try {
    -   execSync('vulkaninfo --summary', { stdio: 'pipe', timeout: 3000 });
    -   gpuDetected = true;
    -   return true;
    - } catch { /* no vulkan */ }
    + if (hasVulkanBackend()) { gpuDetected = true; return true; }
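The thread cap from the first nitpick, extended with the cgroup check it defers to a follow-up, might be sketched as follows. `cpusFromCgroupMax` and `pickMaxThreads` are hypothetical helpers; the caps of 4 and 8 are the reviewer's suggested values, and the parser assumes the cgroup v2 `cpu.max` format of `<quota> <period>` or `max <period>`.

```typescript
// Derive an effective CPU count from the contents of cgroup v2's cpu.max.
// Falls back to the host count when no quota is set or the text is unparseable.
function cpusFromCgroupMax(cpuMax: string, hostCpus: number): number {
  const [quota, period] = cpuMax.trim().split(/\s+/);
  if (quota === undefined || quota === "max") return hostCpus;
  const q = Number(quota);
  const p = Number(period);
  if (!Number.isFinite(q) || !Number.isFinite(p) || q <= 0 || p <= 0) return hostCpus;
  // A fractional quota (e.g. 1.5 CPUs) rounds up, but never above the host count.
  return Math.max(1, Math.min(hostCpus, Math.ceil(q / p)));
}

// Cap threads at 4 when the GPU does the heavy lifting, otherwise at 8.
function pickMaxThreads(gpuActive: boolean, availableCpus: number): number {
  return gpuActive ? Math.min(4, availableCpus) : Math.min(availableCpus, 8);
}
```

In the service this would presumably be fed with the text of `/sys/fs/cgroup/cpu.max` (when readable) and `cpus().length`, with the result passed as `maxThreads`; that wiring is left out because the surrounding code is not shown here.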
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@server/services/whisper-local.ts`:
- Around line 414-418: The transcribe options currently set maxThreads to
cpus().length which ignores GPU usage and container CPU quotas; update the code
that builds transcribeOpts (the TranscribeOptions object) to cap maxThreads: if
hasVulkanBackend() is true set maxThreads = Math.min(4, cpus().length) to avoid
excessive threads when GPU is active, and additionally constrain threads by a
cgroup-aware CPU count (read /sys/fs/cgroup/cpu.max or use a helper like
physical-cpu-count) falling back to cpus().length if cgroup info is unavailable;
ensure you reference transcribeOpts and hasVulkanBackend() when making this
change.
- Around line 298-299: detectGpu() is redundantly spawning 'vulkaninfo' even
though hasVulkanBackend() already caches that probe; update detectGpu() to use
the cached result instead of running vulkaninfo again by calling
hasVulkanBackend() (or checking the shared gpuDetected cache) first and
returning its boolean if present, only falling back to running the probe when
the cached state is undefined; reference detectGpu(), hasVulkanBackend(), and
the gpuDetected cache when making the change.
Two follow-ups from the second CR pass on this PR:
* Derive useGpu from the same hasVulkanBackend() check that gates the backend selection. Previously useGpu stayed `true` on Linux when Vulkan was absent, which left the library in "useGpu=true + backend=undefined" territory, the exact undefined behaviour this PR is trying to avoid. Now useGpu only stays true when there is a real backend to use (Metal on macOS, Vulkan on Linux with ICD present).
* Have detectGpu() reuse hasVulkanBackend() instead of running its own `vulkaninfo --summary` exec. With both getContext() and getSystemInfo() called during startup, vulkaninfo would otherwise be spawned twice on a Vulkan Linux host; the shared cache eliminates the duplicate probe.
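The first follow-up, keeping `useGpu` and the backend variant in sync, could be expressed as one pure function. This is a sketch under assumptions: `chooseInit` and `WhisperInitChoice` are hypothetical names, and leaving `useGpu` true on non-Linux platforms is inferred from the commit message, which only describes a change for Linux.

```typescript
interface WhisperInitChoice {
  useGpu: boolean;
  variant: "vulkan" | undefined;
}

// Decide both init parameters from the same inputs so they cannot disagree.
function chooseInit(platform: string, vulkanAvailable: boolean): WhisperInitChoice {
  if (platform === "linux") {
    return vulkanAvailable
      ? { useGpu: true, variant: "vulkan" }
      : { useGpu: false, variant: undefined }; // CPU only, default variant
  }
  // macOS and other platforms: default variant (Metal on macOS).
  // Assumption: useGpu stays true here, as before the change.
  return { useGpu: true, variant: undefined };
}
```

Returning both values from a single decision avoids the "useGpu=true + backend=undefined" combination that the commit message calls out.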
Pushed d4530db addressing the two CR follow-ups from the bfefc55 review:
@coderabbitai yes, please open the follow-up issue for the cgroup-aware @coderabbitai full review
Great, I'll open the cgroup-aware
The cgroup-aware Full re-review is running now. 🐇 ✨
✅ Actions performed: Review triggered.
💡 Codex Review
Reviewed commit: d4530dba6a
@coderabbitai re: Codex's new P2: that's a fair call-out (Vulkan runtime can be present without the Please trigger a fresh review pass once those follow-ups are filed.
Agreed on the conservative stance: skipping GPU when
Triggering the fresh review pass now. 🐇
✅ Actions performed: Review triggered.
kimbotai1337 left a comment
Approving on the merits. The original P1 (initWhisper crash on Linux hosts without a Vulkan ICD) is addressed, useGpu is now in sync with the backend selection, and vulkaninfo is only probed once per process. The two remaining concerns from this review cycle are tracked as follow-up issues #346 (cgroup-aware maxThreads) and #347 (direct Vulkan ICD probe when vulkaninfo CLI is absent) — both P2-level polish on top of the working fix this PR delivers.
UPDATED SUMMARY:
Summary
Enables the Vulkan GPU backend for local Whisper transcription on Linux, and utilizes all available CPU threads for transcription.
Problem
On Linux systems with AMD or Intel GPUs, the @fugood/whisper.node binding silently falls back to CPU-only inference even when useGpu: true is passed to initWhisper. This results in transcription times of 14–34 seconds for short audio clips on an AMD RX 6600 XT.
The root cause is that the binding requires an explicit backend variant string ('vulkan') on Linux to load the GPU backend, unlike macOS where Metal is auto-detected via the 'default' variant.
Additionally, the transcription options did not set maxThreads, leaving performance on the table on multi-core systems.
Changes
Platform-conditional backend selection in server/services/whisper-local.ts:
- Linux → 'vulkan' (enables GPU acceleration on AMD/Intel/NVIDIA through their Vulkan drivers, e.g. Mesa's RADV on AMD)
- macOS → undefined (uses the 'default' variant, enabling Metal on Apple Silicon with CPU fallback on x86_64)
- Windows/others → undefined (uses the 'default' variant)
Set maxThreads: cpus().length in the TranscribeOptions passed to transcribeFile, so transcription uses all available logical CPU cores.
[Whether this is wise is left to the developers; see the CodeRabbit review.]
Performance Impact
Tested on AMD Radeon RX 6600 XT (RADV NAVI23, Mesa Vulkan drivers):
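| Model | Before (CPU) | After (Vulkan) | Improvement |
|-------|--------------|----------------|-------------|
| tiny  | ~14 s        | ~0.9 s         | 15.6×       |
| base  | ~34 s        | ~0.58 s        | 58.6×       |
| small | n/a          | ~1.77 s        | n/a         |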
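The Improvement column is the ratio of the two timings. A small sketch of that arithmetic, with `speedup` as a hypothetical helper and the approximate timings above as inputs:

```typescript
// Speedup factor = time before / time after, rounded to one decimal place.
function speedup(beforeSec: number, afterSec: number): string {
  return `${(beforeSec / afterSec).toFixed(1)}×`;
}

console.log(speedup(14, 0.9));  // tiny model
console.log(speedup(34, 0.58)); // base model
```

Since the inputs are approximate ("~14 s", "~0.9 s"), the factors should be read as rough estimates rather than precise benchmarks.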
Verification
vkcube runs successfully, confirming Vulkan driver availability.
Whisper logs show using Vulkan0 backend after the change.
Rebuild (npm run build:server) and server restart confirmed working.
Compatibility Notes
macOS: Preserved. Metal is still auto-detected via the 'default' variant on Apple Silicon.
Linux + NVIDIA: The 'vulkan' backend should also work for NVIDIA GPUs with proprietary or Nouveau Vulkan drivers, but has not been explicitly tested.
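Linux without GPU: As of the final revision, 'vulkan' is requested only when hasVulkanBackend() succeeds; otherwise the library picks its default backend and transcription runs on CPU, maintaining existing behavior.
Non-Vulkan Linux: The first revision threw at context init when Vulkan drivers were absent; the hasVulkanBackend() gate avoids this. Users who want GPU acceleration should install Mesa/Vulkan ICDs.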
Related
This fix was initially documented in the agent's session memory on 2026-04-24 after debugging slow transcription on an Arch Linux workstation.