fix(inference): prove WSL Docker Desktop GPUs and report sandbox CUDA proof state#4599
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (19)
🚧 Files skipped from review as they are similar to previous changes (18)
📝 WalkthroughWalkthroughAdds structured CUDA-usability proof results, runs bounded Docker/WSL proofs (ARM64), persists proof outcomes in the registry, integrates optional proof injection into GPU detection, captures proofs during onboarding, and renders proof status (verified/failed/unverified) in sandbox status output with tests and e2e update. ChangesSandbox GPU proof state and verification
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
src/lib/inference/nim.ts (1)
96-100: ⚡ Quick winDon't swallow internal prover initialization failures.
This
catchturns bugs inside../onboard/wsl-docker-desktop-gpuinto a silent CPU fallback, which makes supported N1X hosts look like “no GPU” instead of surfacing the regression. Only the optional module-resolution case should returnnull; other failures should bubble.Based on learnings: "avoid adding defensive error handling (e.g., try/catch wrappers, fallbacks, or extra validation) around internal helper logic when there is no realistic throwing path or failure mode."Suggested change
function defaultArm64WslDockerDesktopGpuProver(): Arm64WslDockerDesktopGpuProver | null { try { return require("../onboard/wsl-docker-desktop-gpu").createArm64WslDockerDesktopGpuProver(); - } catch { - return null; + } catch (error) { + if ( + error && + typeof error === "object" && + "code" in error && + (error as NodeJS.ErrnoException).code === "MODULE_NOT_FOUND" + ) { + return null; + } + throw error; } }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/lib/inference/nim.ts` around lines 96 - 100, The function defaultArm64WslDockerDesktopGpuProver currently swallows all errors from requiring or initializing ../onboard/wsl-docker-desktop-gpu; change it so only the optional-module-resolution case returns null and any other initialization errors bubble up: call require("../onboard/wsl-docker-desktop-gpu") and attempt to createArm64WslDockerDesktopGpuProver inside a try/catch that only returns null when the caught error is a module-not-found resolution (e.g., error.code === 'MODULE_NOT_FOUND' or equivalent), otherwise rethrow the error so failures inside createArm64WslDockerDesktopGpuProver are not silenced.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@src/lib/inference/nim.ts`:
- Around line 96-100: The function defaultArm64WslDockerDesktopGpuProver
currently swallows all errors from requiring or initializing
../onboard/wsl-docker-desktop-gpu; change it so only the
optional-module-resolution case returns null and any other initialization errors
bubble up: call require("../onboard/wsl-docker-desktop-gpu") and attempt to
createArm64WslDockerDesktopGpuProver inside a try/catch that only returns null
when the caught error is a module-not-found resolution (e.g., error.code ===
'MODULE_NOT_FOUND' or equivalent), otherwise rethrow the error so failures
inside createArm64WslDockerDesktopGpuProver are not silenced.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: a97cb9ac-bff2-4a0c-b99c-99c28f76ae2b
📒 Files selected for processing (15)
src/lib/actions/sandbox/status.test.tssrc/lib/actions/sandbox/status.tssrc/lib/inference/gpu-trust.tssrc/lib/inference/nim.test.tssrc/lib/inference/nim.tssrc/lib/onboard.tssrc/lib/onboard/sandbox-gpu-mode.tssrc/lib/onboard/sandbox-gpu-preflight.test.tssrc/lib/onboard/sandbox-gpu-preflight.tssrc/lib/onboard/sandbox-registry-metadata.tssrc/lib/onboard/wsl-docker-desktop-gpu.test.tssrc/lib/onboard/wsl-docker-desktop-gpu.tssrc/lib/runner.tssrc/lib/state/registry.tstest/e2e/test-gpu-e2e.sh
|
@coderabbitai review The incremental commit addresses your nitpick on |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: #!/bin/bash
# Check the current state of wsl-docker-desktop-gpu.ts for MODULE_NOT_FOUND handling
rg -n "MODULE_NOT_FOUND\|defaultArm64Wsl\|require.*runCaptureEx\|catch" src/lib/onboard/wsl-docker-desktop-gpu.ts -A3 -B3
echo "---nim.ts defaultArm64WslDockerDesktopGpuProver---"
rg -n "defaultArm64Wsl\|MODULE_NOT_FOUND" src/lib/inference/nim.ts -A5 -B3Length of output: 646 🏁 Script executed: #!/bin/bash
# Find the relevant files
fd "wsl-docker-desktop-gpu.ts" --type f
fd "nim.ts" --type f | grep -v test | grep -v node_modules
echo "---"
# Try broader search
rg -rn "MODULE_NOT_FOUND" --type ts -C3Length of output: 1760 🏁 Script executed: #!/bin/bash
echo "=== nim.ts defaultArm64WslDockerDesktopGpuProver / lazy require section ==="
cat src/lib/inference/nim.ts | grep -n "MODULE_NOT_FOUND\|require\|catch\|proveArm64\|defaultArm64\|lazy" | head -60
echo ""
echo "=== Full nim.ts catch blocks ==="
rg -n "catch" src/lib/inference/nim.ts -A 8 -B 2
echo ""
echo "=== wsl-docker-desktop-gpu.ts full file ==="
cat src/lib/onboard/wsl-docker-desktop-gpu.tsLength of output: 16786 🏁 Script executed: #!/bin/bash
# Read the exact lines around the defaultArm64WslDockerDesktopGpuProver function to confirm rethrow
awk 'NR>=92 && NR<=125' src/lib/inference/nim.tsLength of output: 1920
function defaultArm64WslDockerDesktopGpuProver(): Arm64WslDockerDesktopGpuProver | null {
try {
return require("../onboard/wsl-docker-desktop-gpu").createArm64WslDockerDesktopGpuProver();
} catch (error) {
// Only the optional module-resolution case should degrade to "no prover";
// a real bug inside the prover module must bubble up rather than masquerade
// as a missing GPU on an otherwise-supported N1X host.
if (
error &&
typeof error === "object" &&
(error as NodeJS.ErrnoException).code === "MODULE_NOT_FOUND"
) {
return null;
}
throw error; // ← all other errors rethrown
}
}The narrowed catch is exactly right:
The previous nitpick is fully addressed. The rest of the PR also looks solid — the [approve] |
|
@yimoj this one's showing |
|
@yimoj Did you see this error when you tested it on Windows-ARM with WSL? |
cbdf60f to
eef7250
Compare
|
@zyang-dev Thank you — that To be upfront: I do not have Windows-ARM + WSL hardware, so I did not reproduce your run live. I diagnosed it from the image itself, which is conclusive here: So on an aarch64 host Docker pulls the "arm64" variant whose binary is x86-64, and execve fails with Fix (pushed):
If you're able to re-run |
|
@coderabbitai review Rebased onto current |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: #!/bin/bash
echo "=== wsl-docker-desktop-gpu.ts: proof command + timeout + isExecFormatError ==="
cat src/lib/onboard/wsl-docker-desktop-gpu.ts
echo ""
echo "=== sandbox-gpu-preflight.ts: cuInit(0)=0 requirement + jetson remediation ==="
cat src/lib/onboard/sandbox-gpu-preflight.ts
echo ""
echo "=== snapshot.ts: sandboxGpuProof cleared on clone ==="
rg -n "sandboxGpuProof" src/lib/actions/sandbox/snapshot.ts -C5
echo ""
echo "=== status.ts: proof suffix rendering ==="
cat src/lib/actions/sandbox/status.ts
echo ""
echo "=== registry.ts: SandboxGpuProofResult + SandboxEntry ==="
rg -n "SandboxGpuProof\|sandboxGpuProof" src/lib/state/registry.ts -C3
echo ""
echo "=== nim.ts: MODULE_NOT_FOUND narrowed catch (confirm still in place) ==="
awk 'NR>=88 && NR<=125' src/lib/inference/nim.tsLength of output: 50273
Snapshot clone
MODULE_NOT_FOUND narrowed catch (from previous review)
One cosmetic nit — not blocking: Line ~188 in Everything is correct end-to-end. Approving. [approve] |
… proof state Two grouped GPU trust/proof/status fixes, rebased onto current main. NVIDIA#4565 — accept real Windows-ARM N1X (WSL2 + Docker Desktop) GPUs without reopening the Snapdragon false positive (NVIDIA#3988/NVIDIA#4424). detectGpu() still rejects a denylisted JMJWOA-Generic-* name by default; the only escape is the ARM64 WSL Docker Desktop prover, which runs one bounded Docker --gpus CUDA workload. The proof image is now the arch-correct cuda-sample:vectoradd-cuda12.5.0 (a genuine aarch64 binary running a real CUDA kernel) instead of cuda-sample:nbody, whose arm64 manifest entry actually ships an x86-64 ELF and therefore fails with "exec format error" on the very N1X target this feature accepts. An explicit exec-format-error diagnostic now distinguishes an image-architecture problem from a missing GPU. A real GPU passes; the Snapdragon nvidia-smi shim (no usable CUDA device) stays fail-closed. NVIDIA#4231 — nemoclaw status reflects CUDA proof, not just config. The direct sandbox GPU verifier returns a SandboxGpuProofResult (verified/unverified/failed) keyed on cuInit(0)=0, persisted to the registry and rendered by status as "(CUDA verified)" / "(CUDA unverified)" / "(last CUDA proof failed: …)". A zero exit that printed a non-zero cuInit code (swallowed exit) is treated as failed, not verified. The proof is captured by the verifyGpuSandboxAfterReady wrapper (net-zero onboard.ts) and cleared on snapshot clone so a restored sandbox cannot inherit another sandbox's "CUDA verified" state. CUDA failures print Jetson /dev/nvmap + video/render group remediation. Fail-closed CPU fallback with explicit --no-gpu guidance is preserved on every proof-failure path. Captured stderr in runCaptureEx so Docker/CUDA diagnostics are no longer dropped. The default ARM64 prover only swallows MODULE_NOT_FOUND and rethrows internal initialization errors. Fixes NVIDIA#4565 Fixes NVIDIA#4231 Signed-off-by: Yimo Jiang <yimoj@nvidia.com>
eef7250 to
a56e438
Compare
|
@coderabbitai review Latest head only updates one stale code comment (the |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: #!/bin/bash
echo "=== Comment around line 188 in wsl-docker-desktop-gpu.ts ==="
awk 'NR>=183 && NR<=200' src/lib/onboard/wsl-docker-desktop-gpu.tsLength of output: 1101 🏁 Script executed: #!/bin/bash
echo "=== Search for vectorAdd / nbody / diagnostic comment in wsl-docker-desktop-gpu.ts ==="
rg -n "vectorAdd\|nbody\|prefer.*diagnostic\|fall back.*stdout" src/lib/onboard/wsl-docker-desktop-gpu.ts -C2Length of output: 545 🏁 Script executed: #!/bin/bash
echo "=== Lines 160-192 in wsl-docker-desktop-gpu.ts ==="
awk 'NR>=160 && NR<=192' src/lib/onboard/wsl-docker-desktop-gpu.tsLength of output: 1747
// prefer it for the diagnostic and fall back to stdout (vectorAdd output).The [approve] |
…n init (NVIDIA#4231) On Jetson Orin the sandbox saw the GPU devices mounted but CUDA failed with `NvRmMemInitNvmap ... Permission denied` / `cuInit(0)=999` because the unprivileged sandbox user was not a member of the host group (`video`) that owns `/dev/nvmap` (`crw-rw---- root video`). PR NVIDIA#4599 improved the status/proof semantics but did not propagate that group, so QA reopened: CUDA stayed unusable and status still read "enabled". The Jetson Docker GPU recreate now detects the host group(s) owning the Tegra device nodes (`/dev/nvmap`, `/dev/nvhost-*`, `/dev/nvgpu/*`) and grants the sandbox user matching `--group-add <gid>` membership, so CUDA's nvmap init can open them. The existing post-recreate `cuInit(0)` proof then passes and `nemoclaw status` reports `(CUDA verified)`; if the group cannot be resolved, onboard warns and the proof still gates success, so status falls back to the honest `(last CUDA proof failed)` with `/dev/nvmap` remediation instead of a misleading "enabled". This automates the remediation the existing `jetsonGpuProofRemediationLines()` already documents. - detectTegraDeviceGroupGids(): stat the Tegra device nodes, return owning numeric GIDs (skip missing and root-owned); numeric GIDs work even when the sandbox image has no matching video/render group entry. - recreateOpenShellDockerSandboxWithGpu(): for the jetson backend, thread the detected GIDs into DockerGpuCloneRunOptions.extraGroupGids; buildDockerGpu CloneRunArgs emits --group-add (deduped vs baseline GroupAdd). - applyDockerGpuPatchOrExit(): thread `backend` explicitly so the fallback create path also grants the group. - Regression tests for GID detection, --group-add emission/dedupe, and the Jetson-vs-generic recreate plumbing. - Reporter-workflow E2E (test/e2e/test-jetson-nvmap-gpu.sh): onboard with GPU, inspect sandbox groups + /dev/nvmap, run cuInit(0) in-sandbox, assert status reports (CUDA verified). Wired as gpu-jetson-nvmap-e2e (Jetson-gated) and inventoried; skips cleanly on non-Jetson hosts. Signed-off-by: Yimo Jiang <yimoj@nvidia.com>
…n init (#4231) (#5018) ## Summary On Jetson Orin the sandbox saw the GPU devices mounted but CUDA failed with `NvRmMemInitNvmap ... Permission denied` / `cuInit(0)=999` because the unprivileged sandbox user was not a member of the host group (`video`) that owns `/dev/nvmap` (`crw-rw---- root video`). This grants the sandbox user that group on the Jetson Docker GPU recreate so CUDA actually initializes, and the existing post-recreate `cuInit(0)` proof makes `nemoclaw status` report proven CUDA usability instead of a misleading bare "enabled". ## Related Issue Fixes #4231 PR #4599 improved status/proof semantics but did not propagate Jetson `/dev/nvmap` group access, so QA reopened: CUDA stayed unusable inside the sandbox. This PR fixes the device-permission root cause. ## Changes - **`docker-gpu-patch.ts` — grant the Tegra device-node group on Jetson recreate (the fix):** new `detectTegraDeviceGroupGids()` stats the Jetson Tegra device nodes (`/dev/nvmap`, `/dev/nvhost-*`, `/dev/nvgpu/*`) on the host and returns the owning numeric GID(s) (skipping missing and root-owned nodes). `recreateOpenShellDockerSandboxWithGpu` passes those through `DockerGpuCloneRunOptions.extraGroupGids` into `buildDockerGpuCloneRunArgs`, which emits `--group-add <gid>` (deduped against any baseline `GroupAdd`). Numeric GIDs are used on purpose — the sandbox image need not define a matching `video`/`render` group. Only runs for the `jetson` backend; `backend` is now threaded explicitly through `applyDockerGpuPatchOrExit` so the fallback create path is covered too. This automates the exact remediation the existing `jetsonGpuProofRemediationLines()` already documents. - **Status correctness:** the existing post-recreate `cuInit(0)` proof from #4599 now passes once the device group is granted, so `nemoclaw status` shows `(CUDA verified)`. If the group cannot be resolved, onboard warns and the proof still gates success, so status falls back to the honest `(last CUDA proof failed: …)` with `/dev/nvmap` remediation rather than a misleading "enabled". - **Regression tests** (`docker-gpu-patch.test.ts`): GID detection (dedupe, skip missing/root), `--group-add` emission + dedupe, and end-to-end plumbing through the Jetson recreate; plus a guard that the generic backend never adds Tegra groups. - **Reporter-workflow E2E** (`test/e2e/test-jetson-nvmap-gpu.sh`, `gpu-jetson-nvmap-e2e` in `nightly-e2e.yaml`): runs the reporter's exact Jetson steps and inventoried in `legacy-inventory.json` + `.coderabbit.yaml`. ## Type of Change - [x] Code change (feature, bug fix, or refactor) ## Verification - [x] `npm test` (CLI project) passes — full `vitest --project cli` green on this PR head after rebase (the only 2 reds are the pre-existing `snapshot-shields` / `e2e-fixture-context` flakes, confirmed failing on base with my changes stashed). - [x] `npm run typecheck:cli` passes. - [x] `codex review --uncommitted` clean (two flagged CI-integration gaps fixed: aggregate `needs` lists + migration inventory). - [x] Tests added for new/changed behavior. - [x] No secrets, API keys, or credentials committed. ### Reporter-workflow E2E evidence This is verified at two levels that together cover the exact reporter workflow: 1. **Deterministic regression of the exact failure mode** — the unit suite `src/lib/onboard/docker-gpu-patch.test.ts` (describe `Jetson /dev/nvmap group propagation (#4231)`) reproduces the precise reporter condition hermetically: a sandbox user lacking the `/dev/nvmap` owning group, and asserts the Jetson recreate now emits `--group-add <gid>` for the Tegra device-node group so the proof can pass. **56/56 pass on this PR head.** 2. **Reporter-workflow pipeline E2E** — `test/e2e/test-jetson-nvmap-gpu.sh`, wired as the `gpu-jetson-nvmap-e2e` job in `nightly-e2e.yaml`, performs the reporter's exact steps on a Jetson host: onboard with GPU, inspect the sandbox user's groups and `/dev/nvmap`, run the in-sandbox `cuInit(0)` CUDA proof, and assert `nemoclaw status` reports `(CUDA verified)` (a bare "enabled" fails the job). Trigger it on a Jetson runner with: ``` gh workflow run nightly-e2e.yaml --ref fix/4231-jetson-nvmap-gpu-status -f jobs=gpu-jetson-nvmap-e2e ``` All required CI checks are green on this PR head (`cli-test-shards`, `build-typecheck`, `codebase-growth-guardrails`, `ShellCheck`, `dco-check`, `CodeRabbit`); see the PR Checks tab for the run ids and job logs. ## Merge gate / remaining work The live `gpu-jetson-nvmap-e2e` job is gated behind `vars.JETSON_E2E_ENABLED` and a Jetson/Tegra GPU runner label (`vars.JETSON_E2E_RUNNER_LABEL`). The project does not yet host an arm64/Jetson GPU runner, so a live green log on real Jetson hardware is pending that runner being provisioned — set the variable and label, then dispatch the job above. Issue #4231 stays assigned to @yimoj until that live log is captured. --- Signed-off-by: Yimo Jiang <yimoj@nvidia.com> 🤖 Generated with [Claude Code](https://claude.com/claude-code) ## Summary by CodeRabbit * **New Features** * Jetson/Tegra GPU group-permission handling added to improve CUDA initialization on Jetson hardware. * **Tests** * New end-to-end Jetson nvmap GPU test validating group permissions, CUDA initialization, and status reporting. * Nightly E2E job added to run the Jetson GPU test, with configurable enablement and runner selection. * **Chores** * CI reporting updated to include the new Jetson GPU job in failure notifications and reports. <!-- end of auto-generated comment: release notes by coderabbit.ai --> <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Added nightly end-to-end testing for Jetson Orin GPU support, validating CUDA usability and device access configuration. * Improved GPU sandbox group permissions handling for Jetson devices to ensure proper GPU device access. * **Tests** * Added comprehensive E2E test script for Jetson `/dev/nvmap` GPU validation. * Extended test coverage for GPU sandbox group permission detection and application. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Signed-off-by: Yimo Jiang <yimoj@nvidia.com>
Summary
Two grouped GPU trust/proof/status fixes.
nemoclawnow accepts real Windows-ARM N1X (WSL2 + Docker Desktop) GPUs when a bounded Docker--gpusCUDA proof succeeds (#4565), andnemoclaw statusreports proven CUDA usability instead of treating any configured GPU as healthy (#4231).Related Issue
Fixes #4565
Fixes #4231
Changes
detectGpu()still rejects a denylistedJMJWOA-Generic-*name by default; the only escape iscreateArm64WslDockerDesktopGpuProver, which runs one boundeddocker run --gpus all …CUDA workload on ARM64 Docker Desktop WSL hosts. The proof image isnvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0(a genuine aarch64 binary running a real CUDA kernel — device alloc + add + result verification). The previouscuda-sample:nbodyimage was wrong for this ARM64-only path: its arm64 manifest entry actually ships an x86-64 ELF, so it fails withexec format erroron the exact N1X hardware this feature targets (reported in-thread). Only a real GPU passes, so N1X is accepted while the Snapdragon nvidia-smi shim (no usable CUDA device) stays fail-closed. The proof timeout is bounded (default 180s,NEMOCLAW_WSL_GPU_PROOF_TIMEOUT_MSoverride) and failures keep the CPU fallback with--no-gpuguidance. An explicitexec format errordiagnostic now distinguishes an image-architecture problem from a missing GPU.SandboxGpuProofResult(verified/unverified/failed) keyed on thecuInit(0)=0usability proof instead of silently swallowing optional-proof failures. A zero exit that still printed a non-zerocuInit(0)code (a wrapper that swallowed the real exit) is treated as failed, not verified. The result is persisted to the sandbox registry and rendered bynemoclaw statusas(CUDA verified)/(CUDA unverified)/(last CUDA proof failed: …). CUDA failures print Jetson/dev/nvmap+video/rendergroup remediation. The proof is captured by the existingverifyGpuSandboxAfterReadywrapper (sosrc/lib/onboard.tsis unchanged / net-zero), and cleared on snapshot clone so a restored sandbox cannot inherit another sandbox'sCUDA verifiedstate.--no-gpuguidance preserved on every proof-failure path.runCaptureExso Docker/CUDA failure diagnostics are no longer dropped.MODULE_NOT_FOUNDand rethrows internal initialization errors (earlier CodeRabbit nit).Type of Change
Verification
upstream/main; resolved conflicts instatus.ts/status-snapshot.ts/status.test.ts(upstream extracted the snapshot/report code intostatus-snapshot.ts) and threaded the proof result through the#4509verifyGpuSandboxAfterReadywrapper.wsl-docker-desktop-gpu,nim,sandbox-gpu-preflight,docker-gpu-local-inference,status,registry,snapshot*).npm test(cli project): only pre-existing, environment-only failures remain (test/cli.test.ts,test/ssrf-parity.test.ts,config-sync/nemoclaw-startroot-ownership tests — file-mode/ownership/network checks unrelated to this change; none touch the modified files).codex review --base upstream/mainclean after addressing two P2 findings (stale proof on snapshot clone; requirecuInit(0)=0before verifying).npx prekpre-commit/pre-push hooks pass (format, lint, typecheck).Notes
fileon the extracted binary (thenbodyarm64 tag contains an x86-64 ELF; thevectoradd-cuda12.5.0arm64 tag contains a real aarch64 binary). No live Windows-ARM/WSL GPU hardware was available on the triage host, so the N1X run was not reproduced live — see the in-thread reply for the exact commands and evidence.detectGpuproof gating via injected prover, and the verifier/status proof classification via fixtures, confirming the pre-fix reject ([Windows ARM][Onboard] NemoClaw preflight reports 'no NVIDIA GPU detected' despite nvidia-smi and Docker --gpus confirming 65GB GPU #4565) and misleading "enabled" ([Jetson Orin][CLI&UX] nemoclaw status shows "Sandbox GPU: enabled" but CUDA is unusable inside sandbox — misleading status #4231) before fixing.Signed-off-by: Yimo Jiang yimoj@nvidia.com
Summary by CodeRabbit