Skip to content

fix(whisper): enable Vulkan GPU backend on Linux#319

Merged
kimbotai1337 merged 6 commits into
daggerhashimoto:masterfrom
jimdawdy-hub:fix/whisper-vulkan-linux
May 21, 2026
Merged

fix(whisper): enable Vulkan GPU backend on Linux#319
kimbotai1337 merged 6 commits into
daggerhashimoto:masterfrom
jimdawdy-hub:fix/whisper-vulkan-linux

Conversation

@jimdawdy-hub

@jimdawdy-hub jimdawdy-hub commented Apr 25, 2026

Copy link
Copy Markdown
Contributor

UPDATED SUMMARY:
Summary
Enables the Vulkan GPU backend for local Whisper transcription on Linux, and utilizes all available CPU threads for transcription.

Problem
On Linux systems with AMD or Intel GPUs, the @fugood/whisper.node binding silently falls back to CPU-only inference even when useGpu: true is passed to initWhisper. This results in transcription times of 14–34 seconds for short audio clips on an AMD RX 6600 XT.

The root cause is that the binding requires an explicit backend variant string ('vulkan') on Linux to load the GPU backend, unlike macOS where Metal is auto-detected via the 'default' variant.

Additionally, the transcription options did not set maxThreads, leaving performance on the table on multi-core systems.

Changes
Platform-conditional backend selection in server/services/whisper-local.ts:

Linux → 'vulkan' (enables GPU acceleration for AMD/Intel/NVIDIA via RADV/mesa)
macOS → undefined (uses 'default' variant, enabling Metal on Apple Silicon with CPU fallback on x86_64)
Windows/others → undefined (uses 'default' variant)
Set maxThreads: cpus().length in the TranscribeOptions passed to transcribeFile. This ensures transcription uses all available logical CPU cores.
[Dev option as to whether this is wise or not, see coderabbit review]

Performance Impact
Tested on AMD Radeon RX 6600 XT (RADV NAVI23, Mesa Vulkan drivers):

Model Before (CPU) After (Vulkan) Improvement
tiny ~14 s ~0.9 s 15.6×
base ~34 s ~0.58 s 58.6×
small — ~1.77 s —
Verification
vkcube runs successfully, confirming Vulkan driver availability.
Whisper logs show using Vulkan0 backend after the change.
Rebuild (npm run build:server) and server restart confirmed working.
Compatibility Notes
macOS: Preserved. Metal is still auto-detected via the 'default' variant on Apple Silicon.
Linux + NVIDIA: The 'vulkan' backend should also work for NVIDIA GPUs with proprietary or Nouveau Vulkan drivers, but has not been explicitly tested.
Linux without GPU: The binding will fall back to CPU if Vulkan is unavailable, maintaining existing behavior.
Non-Vulkan Linux: Will throw if Vulkan drivers are absent; users should install Mesa/Vulkan ICDs or fall back to CPU-only systems.
Related
This fix was initially documented in the agent's session memory on 2026-04-24 after debugging slow transcription on an Arch Linux workstation.

Summary by CodeRabbit

  • Chores
    • Transcription now prefers GPU-based (Vulkan) compute on Linux only when a quick compatibility probe confirms availability, with a cached result to avoid repeated checks.
    • Added a per-request thread cap for transcription set to the host CPU count to limit concurrency and reduce resource contention.

Review Change Stack

@coderabbitai

coderabbitai Bot commented Apr 25, 2026

Copy link
Copy Markdown

Warning

Rate limit exceeded

@kimbotai1337 has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 7 minutes and 57 seconds before requesting another review.

You’ve run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 47bc3619-12cc-4fcc-adf1-1b9089b707b9

📥 Commits

Reviewing files that changed from the base of the PR and between bfefc55 and d4530db.

📒 Files selected for processing (1)
  • server/services/whisper-local.ts
📝 Walkthrough

Walkthrough

Whisper context initialization now computes a backend ('vulkan' on Linux when the new probe succeeds) and passes it to initWhisper. A Vulkan availability probe with caching was added. Per-request transcription options set maxThreads: cpus().length. No exported APIs changed.

Changes

Whisper Configuration

Layer / File(s) Summary
Vulkan availability probe
server/services/whisper-local.ts
Adds hasVulkanBackend() which runs vulkaninfo --summary with a timeout and caches the boolean in vulkanBackendAvailable.
Context init and backend selection
server/services/whisper-local.ts
getContext() computes and supplies a backend value (e.g., 'vulkan' on Linux when the probe passes) as the second argument to initWhisper.
Per-request transcription tuning
server/services/whisper-local.ts
transcribeOpts for ctx.transcribeFile now includes maxThreads: cpus().length.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐇 I nibbled bytes where kernels hum,
Vulkan wakes when GPUs come,
Threads set to match each core's song,
Whisper listens all day long,
Hops of code — the build grows strong.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately reflects the main objective: enabling Vulkan GPU backend on Linux for Whisper transcription.
Description check ✅ Passed The description covers all required sections (What, Why, How) with detailed problem statement, implementation details, performance metrics, and verification steps. Type of Change is marked as bug fix.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3027094fc7

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread server/services/whisper-local.ts Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@server/services/whisper-local.ts`:
- Around line 393-397: The transcribeOpts currently sets maxThreads =
cpus().length which ignores GPU presence and container CPU quotas; update the
code that builds TranscribeOptions (transcribeOpts) to compute maxThreads using
detectGpu() and the existing config: if detectGpu() returns true default to ~4
threads, otherwise use a clamped value (e.g., min(availableCpusFromCgroupOrOs(),
8)); allow an override from the config system (e.g., config.whisper.maxThreads)
and ensure you read container-aware CPU count (cgroup/quota-aware) when falling
back to OS CPU count before clamping.
- Around line 73-76: The hardcoded 'vulkan' variant passed to initWhisper (in
the contextInitializing assignment that calls initWhisper({ filePath:
modelPath(), useGpu: true }, 'vulkan')) breaks macOS Metal support and can fail
on systems without Vulkan; change this to pick the variant conditionally: use
'default' on macOS to enable Metal, and choose between 'vulkan' or 'cuda' on
Windows/Linux based on detected GPU/drivers (or fall back to 'default' if
detection fails), update the inline comment to reflect auto-selection, and
ensure the initWhisper call and any surrounding error handling handle fallback
gracefully when the chosen variant cannot be loaded.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d4af591c-8a36-40b5-86a8-752c5f984d16

📥 Commits

Reviewing files that changed from the base of the PR and between 0619c80 and 3027094.

📒 Files selected for processing (1)
  • server/services/whisper-local.ts

Comment thread server/services/whisper-local.ts Outdated
Comment thread server/services/whisper-local.ts
…acOS

- Pass 'vulkan' explicitly on Linux for AMD/Intel GPU acceleration
- Use 'default' (Metal) on macOS and other platforms
- Add maxThreads: cpus().length for full CPU utilization
@jimdawdy-hub jimdawdy-hub force-pushed the fix/whisper-vulkan-linux branch from ccecc6f to 38448bc Compare April 25, 2026 19:41

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 38448bc76f

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread server/services/whisper-local.ts

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: bf6b8b771e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread server/services/whisper-local.ts Outdated
Force-passing 'vulkan' to initWhisper on every Linux host took local STT
offline on CPU-only Linux/containers where no Vulkan ICD is present —
initWhisper would throw and never load the model. Reuse the existing
detectGpu() probe (which already runs nvidia-smi/vulkaninfo) and only
request the Vulkan backend when a GPU is actually available; otherwise let
the library pick its default backend so CPU fallback works.

Addresses the Codex P1 follow-up from daggerhashimoto#319 review.
@kimbotai1337

Copy link
Copy Markdown
Collaborator

@coderabbitai full review

@coderabbitai

coderabbitai Bot commented May 21, 2026

Copy link
Copy Markdown
✅ Actions performed

Full review triggered.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: af90a50461

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread server/services/whisper-local.ts Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@server/services/whisper-local.ts`:
- Around line 73-78: The backend selection currently uses detectGpu() which
returns true for any GPU (including CUDA-only NVIDIA), causing Linux hosts
without a Vulkan ICD to be forced to 'vulkan' and fail: change the check used
when setting the backend constant so it only picks 'vulkan' when a Vulkan ICD is
actually available (e.g., replace detectGpu() with a Vulkan-specific probe like
detectVulkan() or isVulkanAvailable()); keep detectGpu() for broader GPU
reporting, and ensure initWhisper() still receives undefined when Vulkan is not
available so the library can pick the appropriate default.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ab3d4219-0c9f-4a61-8465-2b57342fb1d7

📥 Commits

Reviewing files that changed from the base of the PR and between 38448bc and af90a50.

📒 Files selected for processing (1)
  • server/services/whisper-local.ts

Comment thread server/services/whisper-local.ts Outdated
@kimbotai1337

Copy link
Copy Markdown
Collaborator

Pushed a follow-up commit (af90a50) addressing the Codex P1 about Vulkan availability.

The change gates the Linux 'vulkan' backend on the existing detectGpu() probe (which already runs nvidia-smi/vulkaninfo --summary). On CPU-only Linux hosts and containers without a Vulkan ICD, getContext() now passes undefined for the backend so initWhisper can fall back to its default path instead of throwing on init.

I left the P2 (maxThreads = cpus().length not being cgroup-aware) alone — happy to file a follow-up issue tracking the cgroup-aware thread count.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d88a4f821e

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread server/services/whisper-local.ts Outdated
Codex/CodeRabbit pointed out the prior detectGpu() gate is still too
permissive: detectGpu() returns true on CUDA-only NVIDIA containers (where
nvidia-smi works but no Vulkan ICD is installed), so the original failure
mode — initWhisper(..., 'vulkan') throwing at context init — survives on
those hosts. Split the check: hasVulkanBackend() probes `vulkaninfo
--summary` specifically and is the only gate for selecting the Vulkan
backend; detectGpu() stays as the broader "has any GPU?" reporter.
@kimbotai1337

Copy link
Copy Markdown
Collaborator

Pushed bfefc55 — addresses the follow-up from both Codex P1 and CodeRabbit Major.

You're both right that the detectGpu() gate was still too permissive: it returns true on CUDA-only NVIDIA containers where nvidia-smi works but no Vulkan ICD is installed, which leaves the original failure mode in place. Split the check into hasVulkanBackend() (probes vulkaninfo --summary specifically, cached behind a module-level flag) and used that as the only gate for selecting 'vulkan'detectGpu() stays as the broader "has any GPU?" reporter for system info.

@coderabbitai full review

@coderabbitai

coderabbitai Bot commented May 21, 2026

Copy link
Copy Markdown
🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Get the current state of the whisper-local.ts and any GPU detection utilities
fd -e ts -e js whisper-local --exec cat {}
echo "---FILE_BOUNDARY---"
# Also look for detectGpu or hasVulkanBackend definitions
rg -n "hasVulkanBackend|detectGpu|vulkaninfo|vulkan" --type ts -C 4
echo "---FILE_BOUNDARY---"
# Check package.json for whisper dependency versions
cat package.json 2>/dev/null || fd package.json -d 2 --exec cat {}

Length of output: 22883


@kimbotai1337 — nice work on the split. I've reviewed the bfefc55 changes in full. The core fix is correct: hasVulkanBackend() is now the only gate for selecting 'vulkan', and the module-level cache, 3 s timeout, and clear JSDoc all look good. A few follow-up observations:


1. useGpu: true is still unconditional on Linux when Vulkan is absent

When hasVulkanBackend() returns false on Linux, backend is undefined but useGpu: true is still forwarded to initWhisper. The library's behaviour in that combination (no explicit backend + useGpu: true on Linux) is the exact undefined territory this PR is trying to escape. Consider:

// useGpu: true is meaningful on macOS (triggers Metal) and on Linux with Vulkan;
// on Linux without Vulkan leave it false so the library cannot attempt its own probing.
const useGpu = process.platform !== 'linux' || hasVulkanBackend();
const backend = process.platform === 'linux' && useGpu ? 'vulkan' : undefined;

contextInitializing = initWhisper({ filePath: modelPath(), useGpu }, backend)

2. vulkaninfo --summary is spawned twice on a Vulkan Linux host

hasVulkanBackend() and detectGpu() each own an independent execSync call and an independent cache. When both getContext() and getSystemInfo() are called (which is the normal startup path), vulkaninfo is spawned once per cache miss in each function. Deduplication is a one-liner:

// In detectGpu(), replace the inline vulkaninfo execSync with:
try { if (hasVulkanBackend()) { gpuDetected = true; return true; } } catch { /* no vulkan */ }

This lets vulkanBackendAvailable serve as the shared result and avoids the double-spawn.


3. P2 follow-up: cgroup-aware thread count

You mentioned you're happy to file a follow-up issue for maxThreads = cpus().length not being cgroup-aware. Would you like me to open that GitHub issue now so it doesn't get lost?


Triggering the full re-review now.

✅ Actions performed

Full review triggered.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: bfefc554c3

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread server/services/whisper-local.ts

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
server/services/whisper-local.ts (2)

414-418: ⚡ Quick win

Consider capping maxThreads based on GPU presence and container limits.

cpus().length ignores two scenarios:

  1. GPU active: whisper.cpp recommends ~4 threads when GPU handles compute; extra threads add scheduling overhead.
  2. Container quotas: os.cpus() returns host core count, not cgroup limits (--cpus=2 on a 32-core host still returns 32).

Since hasVulkanBackend() is now available, a simple improvement:

-      maxThreads: cpus().length,
+      maxThreads: hasVulkanBackend() ? 4 : Math.min(cpus().length, 8),

For full cgroup awareness, consider reading /sys/fs/cgroup/cpu.max or using a library like physical-cpu-count in a follow-up.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/services/whisper-local.ts` around lines 414 - 418, The transcribe
options currently set maxThreads to cpus().length which ignores GPU usage and
container CPU quotas; update the code that builds transcribeOpts (the
TranscribeOptions object) to cap maxThreads: if hasVulkanBackend() is true set
maxThreads = Math.min(4, cpus().length) to avoid excessive threads when GPU is
active, and additionally constrain threads by a cgroup-aware CPU count (read
/sys/fs/cgroup/cpu.max or use a helper like physical-cpu-count) falling back to
cpus().length if cgroup info is unavailable; ensure you reference transcribeOpts
and hasVulkanBackend() when making this change.

298-299: 💤 Low value

detectGpu() spawns vulkaninfo again despite hasVulkanBackend() caching it.

On Vulkan-enabled Linux hosts, both probes run independently. detectGpu() could reuse the cached result:

     // Try vulkaninfo
-    try { execSync('vulkaninfo --summary', { stdio: 'pipe', timeout: 3000 }); gpuDetected = true; return true; } catch { /* no vulkan */ }
+    if (hasVulkanBackend()) { gpuDetected = true; return true; }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@server/services/whisper-local.ts` around lines 298 - 299, detectGpu() is
redundantly spawning 'vulkaninfo' even though hasVulkanBackend() already caches
that probe; update detectGpu() to use the cached result instead of running
vulkaninfo again by calling hasVulkanBackend() (or checking the shared
gpuDetected cache) first and returning its boolean if present, only falling back
to running the probe when the cached state is undefined; reference detectGpu(),
hasVulkanBackend(), and the gpuDetected cache when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@server/services/whisper-local.ts`:
- Around line 414-418: The transcribe options currently set maxThreads to
cpus().length which ignores GPU usage and container CPU quotas; update the code
that builds transcribeOpts (the TranscribeOptions object) to cap maxThreads: if
hasVulkanBackend() is true set maxThreads = Math.min(4, cpus().length) to avoid
excessive threads when GPU is active, and additionally constrain threads by a
cgroup-aware CPU count (read /sys/fs/cgroup/cpu.max or use a helper like
physical-cpu-count) falling back to cpus().length if cgroup info is unavailable;
ensure you reference transcribeOpts and hasVulkanBackend() when making this
change.
- Around line 298-299: detectGpu() is redundantly spawning 'vulkaninfo' even
though hasVulkanBackend() already caches that probe; update detectGpu() to use
the cached result instead of running vulkaninfo again by calling
hasVulkanBackend() (or checking the shared gpuDetected cache) first and
returning its boolean if present, only falling back to running the probe when
the cached state is undefined; reference detectGpu(), hasVulkanBackend(), and
the gpuDetected cache when making the change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 01660559-1ba5-44d6-9d67-307239f83ba0

📥 Commits

Reviewing files that changed from the base of the PR and between a4a605c and bfefc55.

📒 Files selected for processing (1)
  • server/services/whisper-local.ts

Two follow-ups from the second CR pass on this PR:

* Derive useGpu from the same hasVulkanBackend() check that gates the
  backend selection. Previously useGpu stayed `true` on Linux when Vulkan
  was absent, which left the library in "useGpu=true + backend=undefined"
  territory — the exact undefined behaviour this PR is trying to avoid.
  Now useGpu only stays true when there is a real backend to use (Metal on
  macOS, Vulkan on Linux with ICD present).

* Have detectGpu() reuse hasVulkanBackend() instead of running its own
  `vulkaninfo --summary` exec. With both getContext() and getSystemInfo()
  called during startup, vulkaninfo would otherwise be spawned twice on a
  Vulkan Linux host; the shared cache eliminates the duplicate probe.
@kimbotai1337

Copy link
Copy Markdown
Collaborator

Pushed d4530db addressing the two CR follow-ups from the bfefc55 review:

  1. useGpu now tracks hasVulkanBackend() — on Linux without a Vulkan ICD both useGpu and the backend are now false/undefined, so we never put the library in the "useGpu=true + backend=undefined" undefined territory. On macOS and Windows the constraint is degenerate (useGpu = true).
  2. detectGpu() reuses hasVulkanBackend() for its Vulkan branch — vulkaninfo is now spawned at most once per process across both getContext() and getSystemInfo().

@coderabbitai yes, please open the follow-up issue for the cgroup-aware maxThreads (P2) so we can track it independently.

@coderabbitai full review

@coderabbitai

coderabbitai Bot commented May 21, 2026

Copy link
Copy Markdown
🧩 Analysis chain

Great, I'll open the cgroup-aware maxThreads tracking issue now and kick off the full re-review.


🏁 Script executed:

#!/bin/bash
gh issue create \
  --repo daggerhashimoto/openclaw-nerve \
  --title "fix(whisper): make maxThreads cgroup-aware instead of using cpus().length" \
  --body "## Background

In PR `#319` (fix/whisper-vulkan-linux), \`transcribeOpts\` was updated to set \`maxThreads: cpus().length\` so Whisper utilises all available logical CPU cores on the host.

## Problem

\`cpus().length\` (Node's \`os.cpus()\`) reflects the **host's** logical CPU count and is unaware of Linux cgroup CPU quotas. In containerised environments (Docker, Kubernetes, LXC, etc.) the container may be limited to a fractional number of CPUs via \`cpu.cfs_quota_us\` / \`cpu.cfs_period_us\` (cgroups v1) or \`cpu.max\` (cgroups v2), yet \`cpus().length\` will still return the full host count. Spawning more threads than the container is allotted leads to CPU throttling and can degrade transcription throughput.

## Proposed fix

Read the effective CPU limit from the cgroup interface files at runtime and clamp \`maxThreads\` to \`Math.max(1, Math.floor(cgroupQuota))\`. Fall back to \`cpus().length\` if the cgroup files are absent (bare-metal, macOS, Windows) or if the quota is set to \`-1\` (unlimited).

Rough sketch:

\`\`\`typescript
function effectiveCpuCount(): number {
  // cgroups v2
  try {
    const raw = fs.readFileSync('/sys/fs/cgroup/cpu.max', 'utf8').trim();
    const [quota, period] = raw.split(' ');
    if (quota !== 'max') return Math.max(1, Math.floor(Number(quota) / Number(period)));
  } catch { /* not v2 */ }
  // cgroups v1
  try {
    const quota  = Number(fs.readFileSync('/sys/fs/cgroup/cpu/cpu.cfs_quota_us',  'utf8'));
    const period = Number(fs.readFileSync('/sys/fs/cgroup/cpu/cpu.cfs_period_us', 'utf8'));
    if (quota > 0) return Math.max(1, Math.floor(quota / period));
  } catch { /* not v1 */ }
  return cpus().length;
}
\`\`\`

## Context

- Raised as a P2 follow-up during review of PR `#319` by `@kimbotai1337`.
- The \`hasVulkanBackend()\` and \`useGpu\` fixes shipped in that PR; this is the only remaining item.

## References

- PR `#319`: https://github.com/daggerhashimoto/openclaw-nerve/pull/319
" \
  --assignee kimbotai1337

Length of output: 61


@kimbotai1337 — the d4530db fixes look clean, both follow-ups are resolved. ✅

The cgroup-aware maxThreads tracking issue is now open at #346: #346 — assigned to you, with full context and a reference back to this PR.

Full re-review is running now.

🐇 ✨

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d4530dba6a

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread server/services/whisper-local.ts
@kimbotai1337

Copy link
Copy Markdown
Collaborator

@coderabbitai re: Codex's new P2 — that's a fair call-out (Vulkan runtime can be present without the vulkaninfo CLI binary on slim containers, so hasVulkanBackend() is on the strict side). I'd rather not extend this PR further chasing probe edges, since the conservative choice (skip GPU when vulkaninfo is absent) is strictly safer than the original failure mode this PR is fixing. Could you open a follow-up issue alongside the cgroup one, titled "fix(whisper): probe Vulkan ICD directly when vulkaninfo CLI is absent" or similar? Users running minimal containers can install vulkan-tools as a workaround in the meantime.

Please trigger a fresh review pass once those follow-ups are filed.

@coderabbitai

coderabbitai Bot commented May 21, 2026

Copy link
Copy Markdown
🧩 Analysis chain

Agreed on the conservative stance — skipping GPU when vulkaninfo is absent is strictly safer, and vulkan-tools is a clear and documented workaround. Opening the tracking issue now.


🏁 Script executed:

#!/bin/bash
gh issue create \
  --repo daggerhashimoto/openclaw-nerve \
  --title "fix(whisper): probe Vulkan ICD directly when vulkaninfo CLI is absent" \
  --body "## Background

In PR `#319` (fix/whisper-vulkan-linux), \`hasVulkanBackend()\` gates Vulkan GPU selection by probing \`vulkaninfo --summary\`. This is conservative and correct — it avoids the original failure mode of \`initWhisper\` throwing on hosts with no Vulkan ICD.

## Problem

On minimal/slim Linux container images (e.g., distroless, alpine-based, CI runners), the Vulkan *runtime* (ICD loader + driver) may be installed while the \`vulkaninfo\` CLI binary (typically shipped in the \`vulkan-tools\` / \`vulkan-utils\` package) is absent. In that case \`hasVulkanBackend()\` returns \`false\` and the Vulkan GPU path is silently skipped even though the runtime is fully functional.

**Current workaround:** install \`vulkan-tools\` in the container image so \`vulkaninfo\` is available.

## Proposed fix

Augment (or replace) the \`vulkaninfo --summary\` probe with a direct ICD enumeration approach — for example:

- Parse the ICD manifest paths from \`/etc/vulkan/icd.d/\` or the \`VK_ICD_FILENAMES\` / \`VK_DRIVER_FILES\` environment variables and verify at least one \`.so\` is loadable.
- As a lighter fallback, attempt \`dlopen\` of \`libvulkan.so.1\` via a small native check or a Node.js \`ffi\`-style probe.

The check should remain synchronous (or cached async) and should fall back gracefully when none of the above is available (bare-metal macOS, Windows, \`cpus()\`-only hosts).

## Context

- Raised as a follow-up during review of PR `#319` by \`@kimbotai1337\`.
- Conservative decision: keep \`vulkaninfo\`-based gate in this PR; address direct ICD probing here.
- Related follow-up: `#346` (cgroup-aware \`maxThreads\`).
- PR reference: https://github.com/daggerhashimoto/openclaw-nerve/pull/319
" \
  --assignee kimbotai1337

Length of output: 61


@kimbotai1337 — agreed, the conservative choice is the right call for this PR. The tracking issue is now open at #347: #347 — assigned to you, with the vulkan-tools workaround documented and a reference back to this PR alongside #346.

Triggering the fresh review pass now. 🐇

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@kimbotai1337 kimbotai1337 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving on the merits. The original P1 (initWhisper crash on Linux hosts without a Vulkan ICD) is addressed, useGpu is now in sync with the backend selection, and vulkaninfo is only probed once per process. The two remaining concerns from this review cycle are tracked as follow-up issues #346 (cgroup-aware maxThreads) and #347 (direct Vulkan ICD probe when vulkaninfo CLI is absent) — both P2-level polish on top of the working fix this PR delivers.

@kimbotai1337 kimbotai1337 merged commit 9edbf7a into daggerhashimoto:master May 21, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants