fix(status): require verified gateway before healthy inference#2884
Conversation
Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
📝 WalkthroughWalkthrough
ChangesStatus: Verify Gateway/Sandbox Before Inference Probe
Sequence Diagram(s)sequenceDiagram
participant User as User
participant CLI as CLI
participant Gateway as Gateway
participant Provider as Provider
User->>CLI: run "nemoclaw <sandbox> status"
CLI->>Gateway: getReconciledSandboxGatewayState(sandbox, getState=...)
alt lookup.state == "present"
CLI->>Gateway: captureOpenshellForStatus(["inference","get"])
Gateway->>Provider: forward inference probe
Provider-->>Gateway: inference response
Gateway-->>CLI: inference response
CLI-->>User: print Inference: healthy (model/provider)
CLI-->>CLI: exit 0
else lookup.state != "present"
CLI-->>User: print Inference: not verified (gateway/sandbox state not verified)
CLI--xGateway: skip inference probe
CLI-->>User: print gateway reconciliation/guidance
CLI-->>CLI: exit 1
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (2)
test/cli.test.ts (2)
4464-4497: ⚡ Quick winMake the degraded-inference test more specific to inference failure.
The fallback script currently returns
exit 1for all non-inference commands, so this can pass under broader runtime failure modes. Keep non-inference commands successful and assertinference getwas attempted.Proposed tightening for path specificity
it("lists registered sandboxes when runtime inference probing is degraded", () => { const home = fs.mkdtempSync(path.join(os.tmpdir(), "nemoclaw-cli-list-runtime-degraded-")); const localBin = path.join(home, "bin"); + const markerFile = path.join(home, "openshell-calls"); @@ fs.writeFileSync( path.join(localBin, "openshell"), [ "#!/usr/bin/env bash", + `marker_file=${JSON.stringify(markerFile)}`, + 'printf \'%s\\n\' "$*" >> "$marker_file"', 'if [ "$1" = "inference" ] && [ "$2" = "get" ]; then', " echo 'Error: client error (Connect)' >&2", " exit 1", "fi", - "exit 1", + "exit 0", ].join("\n"), { mode: 0o755 }, ); @@ expect(r.out).toContain("model: configured-model"); expect(r.out).toContain("provider: nvidia-prod"); + expect(fs.readFileSync(markerFile, "utf8")).toContain("inference get"); });🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/cli.test.ts` around lines 4464 - 4497, The test's fallback openshell stub currently fails all commands, making the scenario ambiguous; update the stub used in the "lists registered sandboxes when runtime inference probing is degraded" test so only the inference probe fails and all other commands succeed. Specifically, change the openshell script built in the test (the block written to path.join(localBin, "openshell")) so the branch that matches '"$1" = "inference" && "$2" = "get"' prints the error and exits non‑zero, while any other invocation returns success (exit 0). Then tighten assertions in the test (the it block using runWithEnv) to assert that the inference probe was actually attempted by checking that the error output from that command is present (e.g., expect the command's stderr/out to contain the 'Error: client error (Connect)' marker) while preserving the existing checks (Sandboxes, alpha *, model, provider) so the test specifically verifies degraded inference probing rather than a generic runtime failure.
4177-4208: ⚡ Quick winStrengthen this case by asserting inference probe is skipped when state is unverified.
Current assertions validate output, but they don’t guarantee
openshell inference getwas not called. Adding a call marker closes that regression gap.Proposed test hardening
const registryDir = path.join(home, ".nemoclaw"); + const markerFile = path.join(home, "openshell-calls"); @@ fs.writeFileSync( path.join(localBin, "openshell"), [ "#!/usr/bin/env bash", + `marker_file=${JSON.stringify(markerFile)}`, + 'printf \'%s\\n\' "$*" >> "$marker_file"', 'if [ "$1" = "sandbox" ] && [ "$2" = "get" ] && [ "$3" = "alpha" ]; then', @@ expect(statusResult.out).toContain( "Inference: not verified (gateway/sandbox state not verified)", ); + expect(fs.readFileSync(markerFile, "utf8")).not.toContain("inference get");🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/cli.test.ts` around lines 4177 - 4208, The test should assert that the inference probe command was not invoked by instrumenting the fake curl used in the test; modify the curl stub written in the test (the script created under localBin named "curl") to detect requests that correspond to the CLI's inference probe (the HTTP path or arguments used by the CLI when calling openshell inference get) and write a call-marker (e.g., touch a marker file in HOME or print a unique token) when such a request occurs; then after invoking runWithEnv("alpha status", ...) assert that the marker was NOT created/printed (using statusResult and filesystem checks) to ensure openshell inference get was not called. Ensure you update the assertions around statusResult (the existing expects) to include the new negative assertion for the marker.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@test/cli.test.ts`:
- Around line 4464-4497: The test's fallback openshell stub currently fails all
commands, making the scenario ambiguous; update the stub used in the "lists
registered sandboxes when runtime inference probing is degraded" test so only
the inference probe fails and all other commands succeed. Specifically, change
the openshell script built in the test (the block written to path.join(localBin,
"openshell")) so the branch that matches '"$1" = "inference" && "$2" = "get"'
prints the error and exits non‑zero, while any other invocation returns success
(exit 0). Then tighten assertions in the test (the it block using runWithEnv) to
assert that the inference probe was actually attempted by checking that the
error output from that command is present (e.g., expect the command's stderr/out
to contain the 'Error: client error (Connect)' marker) while preserving the
existing checks (Sandboxes, alpha *, model, provider) so the test specifically
verifies degraded inference probing rather than a generic runtime failure.
- Around line 4177-4208: The test should assert that the inference probe command
was not invoked by instrumenting the fake curl used in the test; modify the curl
stub written in the test (the script created under localBin named "curl") to
detect requests that correspond to the CLI's inference probe (the HTTP path or
arguments used by the CLI when calling openshell inference get) and write a
call-marker (e.g., touch a marker file in HOME or print a unique token) when
such a request occurs; then after invoking runWithEnv("alpha status", ...)
assert that the marker was NOT created/printed (using statusResult and
filesystem checks) to ensure openshell inference get was not called. Ensure you
update the assertions around statusResult (the existing expects) to include the
new negative assertion for the marker.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: fb9cb7e4-90a1-4b9e-8bdf-32c2a9ecf151
📒 Files selected for processing (2)
src/nemoclaw.tstest/cli.test.ts
Selective E2E Results — ✅ All requested jobs passedRun: 25256767831
|
Selective E2E Results — ✅ All requested jobs passedRun: 25257342813
|
There was a problem hiding this comment.
🧹 Nitpick comments (1)
test/cli.test.ts (1)
3767-3837: ⚡ Quick winStrengthen the healthy-path test with probe-order assertions.
Line 3835 currently validates output only. It would be stronger to also assert
inference getis invoked after sandbox verification, so this test guards the gating behavior directly.Proposed test hardening
it("prints healthy inference only after the sandbox and gateway are verified", () => { const home = fs.mkdtempSync(path.join(os.tmpdir(), "nemoclaw-cli-status-healthy-")); const localBin = path.join(home, "bin"); + const markerFile = path.join(home, "openshell-calls"); fs.mkdirSync(localBin, { recursive: true }); @@ [ "#!/usr/bin/env bash", + `printf '%s\\n' "$*" >> ${JSON.stringify(markerFile)}`, 'if [ "$1" = "sandbox" ] && [ "$2" = "get" ] && [ "$3" = "alpha" ]; then', @@ expect(r.out).toContain("Inference:"); expect(r.out).toContain("healthy"); expect(r.out).not.toContain("not verified"); + const calls = fs.readFileSync(markerFile, "utf8").trim().split("\n").filter(Boolean); + const sandboxGetIdx = calls.indexOf("sandbox get alpha"); + const inferenceGetIdx = calls.indexOf("inference get"); + expect(sandboxGetIdx).toBeGreaterThanOrEqual(0); + expect(inferenceGetIdx).toBeGreaterThan(sandboxGetIdx); });🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/cli.test.ts` around lines 3767 - 3837, Update the test to record the order of openshell invocations and assert that "sandbox get" happens before "inference get": modify the localBin/openshell stub (used by runWithEnv in this test) to append distinct markers (e.g., "SANDBOX_GET" and "INFERENCE_GET") to a temp file in HOME each time those branches run, then after runWithEnv("alpha status") read that file and assert the marker order (indexOf SANDBOX_GET < indexOf INFERENCE_GET). Keep references to runWithEnv and the localBin/openshell stub so the gating behavior is directly verified.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@test/cli.test.ts`:
- Around line 3767-3837: Update the test to record the order of openshell
invocations and assert that "sandbox get" happens before "inference get": modify
the localBin/openshell stub (used by runWithEnv in this test) to append distinct
markers (e.g., "SANDBOX_GET" and "INFERENCE_GET") to a temp file in HOME each
time those branches run, then after runWithEnv("alpha status") read that file
and assert the marker order (indexOf SANDBOX_GET < indexOf INFERENCE_GET). Keep
references to runWithEnv and the localBin/openshell stub so the gating behavior
is directly verified.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: ed01d313-bd91-4e84-85f3-feed209bfdde
📒 Files selected for processing (2)
src/nemoclaw.tstest/cli.test.ts
Selective E2E Results — ❌ Some jobs failedRun: 25257807467
|
Selective E2E Results — ✅ All requested jobs passedRun: 25257807467
|
Selective E2E Results — ✅ All requested jobs passedRun: 25259840292
|
There was a problem hiding this comment.
🧹 Nitpick comments (1)
test/policies.test.ts (1)
617-638: ⚡ Quick winAdd one edge-case test for duplicate + missing preset names.
Please add a case like
["slack", "missing", "slack"]and assert de-dup + missing reporting (appliedPresets: ["slack"],missingPresets: ["missing"]) to lock the helper contract.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/policies.test.ts` around lines 617 - 638, Add a new test in the "mergePresetNamesIntoPolicy" describe block that calls policies.mergePresetNamesIntoPolicy with an array containing duplicates and a non-existent preset (e.g. ["slack","missing","slack"]); assert that result.appliedPresets de-duplicates to ["slack"], result.missingPresets lists ["missing"], and result.policy still contains the expected policy content for the applied preset. Locate the test near the existing spec that uses mergePresetNamesIntoPolicy and mirror its style/expectations for appliedPresets, missingPresets, and policy string checks.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@test/policies.test.ts`:
- Around line 617-638: Add a new test in the "mergePresetNamesIntoPolicy"
describe block that calls policies.mergePresetNamesIntoPolicy with an array
containing duplicates and a non-existent preset (e.g.
["slack","missing","slack"]); assert that result.appliedPresets de-duplicates to
["slack"], result.missingPresets lists ["missing"], and result.policy still
contains the expected policy content for the applied preset. Locate the test
near the existing spec that uses mergePresetNamesIntoPolicy and mirror its
style/expectations for appliedPresets, missingPresets, and policy string checks.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: fc6a6e90-383b-475f-9b55-a0648bdcfc8f
📒 Files selected for processing (5)
src/lib/onboard.tssrc/lib/policies.tssrc/nemoclaw.tstest/onboard.test.tstest/policies.test.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- src/nemoclaw.ts
## Summary Catch up the docs for user-facing changes that landed over the weekend and today, so the published guidance matches current installer, onboarding, status, logs, local inference, rebuild backup behavior, the next docs version selector, and refreshed generated user skills. ## Related Issue None. ## Changes - Document WSL Windows-host Ollama onboarding actions, including use, start, restart, install, and `host.docker.internal` model pulls from #2800; clarify that onboard owns Ollama install and model pulls from #2952. - Document installer fail-fast behavior for non-TTY third-party software acceptance from #2706. - Update sandbox-name guidance and Brev deploy validation behavior from #2948. - Add `nemoclaw <name> logs --tail/--since` coverage from #2825. - Add global `nemoclaw status --json` coverage from #2822. - Document verified-gateway status behavior and non-zero degraded exits from #2884. - Clarify that rebuild stops before deleting the original sandbox when backup fails, including unreadable or root-owned state paths. - Bump docs switcher metadata from 0.0.33 to 0.0.34 without changing package versions or creating release tags. - Regenerate `.agents/skills/nemoclaw-user-*` from docs, including the new `nemoclaw-user-manage-sandboxes` generated skill and removal of the stale `nemoclaw-user-workspace` output. ## Type of Change - [ ] Code change (feature, bug fix, or refactor) - [ ] Code change with doc updates - [ ] Doc only (prose changes, no code sample modifications) - [x] Doc only (includes code sample changes) ## Verification - [ ] `npx prek run --all-files` passes - [ ] `npm test` passes - [ ] Tests added or updated for new or changed behavior - [x] No secrets, API keys, or credentials committed - [x] Docs updated for user-facing behavior changes - [x] `make docs` builds without warnings (doc changes only) - [x] Doc pages follow the [style guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md) (doc changes only) - [ ] New doc pages include SPDX header and frontmatter (new pages only) --- <!-- DCO sign-off required by CI. Run: git config user.name && git config user.email --> Signed-off-by: Miyoung Choi <miyoungc@nvidia.com> Made with [Cursor](https://cursor.com) <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Documentation** * Expanded Ollama/local inference guidance with detailed WSL and Windows-host workflows, proxy/token behavior, and onboarding options * Standardized sandbox name validation and updated troubleshooting, CLI, and deploy docs to surface the rules and validation timing * Added/rewrote Manage Sandboxes, policy management, backup/restore, messaging channels, workspace persistence, and CLI selection guides * Refreshed quickstart/Hermes guidance, skill mappings, and bumped docs version to 0.0.34 <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: Cursor <cursoragent@cursor.com>
…/Inference fields (#3537) > Re-signed replacement for #3534. Same code, single squashed commit, this time SSH-signed (the original branch had `commit.gpgsign=false` set as a local override on my clone — the prior PR's commits ended up unverified). Force-push is blocked on the original branch, so this is a fresh branch + new PR; #3534 will be closed in favour of this one. ## Summary Fixes #2604. Both @wangericnv (2026-05-11) and @cv (2026-05-14) re-reported the same symptom on v0.0.38 / v0.0.41 — `nemoclaw status` prints `● cloudflared (stopped)` in all three failure modes (no PID file, garbage PID, dead/wrong-process PID) with no cause and no remediation. The bare command also omits the `Connected:` / `Inference:` labeled fields the original bug requested. Both symptoms are addressed here. ## Root cause - **Cloudflared diagnostic.** `src/lib/actions/sandbox/doctor.ts` already distinguished three states and emitted matching hints. `showStatus()` in `src/lib/tunnel/services.ts` was written before that and only had `isRunning() ? ok : "(stopped)"` — every failure mode collapsed into the same un-actionable line. - **Bare-status fields.** PR #2884 added `Inference:` / `Connected:` labels to the per-sandbox `nemoclaw <name> status` but left bare `nemoclaw status` showing only the model in parens. ## What this PR does 1. **Share the cloudflared state machine.** New `readCloudflaredState(pidDir)` in `src/lib/tunnel/services.ts` returns a discriminated union `{ kind: "running" | "stopped" | "stale-pid-file" | "stale-pid-process" }`. `showStatus()` switches on it and emits a coloured marker + one-line remediation. `cloudflaredDoctorCheck()` consumes the same function and translates each state into its `DoctorCheck`, removing duplicated PID-file / `/proc/<pid>/cmdline` logic. 2. **Remediation wording — `no cloudflared process; restart with ...`.** All three failure modes lead with the cause and point at `nemoclaw tunnel start`. Both reporters asked for that exact shape. `nemoclaw tunnel start` already handles all three states because `isRunning()` returns false and `startService` proceeds and overwrites any stale PID file — so one command recovers in every case. The same wording is used in `doctor.ts` for consistency. 3. **Bare-status `Inference:` and `Connected:` lines.** `showStatusCommand` in `src/lib/inventory/index.ts` now prints labeled `Inference: <provider> / <model>` and `Connected: yes (N session) / no` under each sandbox row. Provider/model prefer live gateway values for the default sandbox (consistent with the existing `(model)` rendering, #2369). `getActiveSessionCount` is wired through `buildStatusCommandDeps`, mirroring the cached SSH-process probe already used by `buildListCommandDeps`. 4. **Tests** (15 new). Three cases each for `showStatus` failure-mode rendering, five cases for `readCloudflaredState`, seven for bare-status `Inference:` / `Connected:` rendering across live/stored/missing dep variants. All existing cli-doctor tests still pass against the refactored shared function. ## Behavioural diff Before (v0.0.41 baseline): ``` ● cloudflared (stopped) # identical output for all three failure modes ``` After: ``` ● cloudflared (stopped) no cloudflared process; run `nemoclaw tunnel start` to start it ● cloudflared (stale PID file) no cloudflared process (stored PID is invalid); run `nemoclaw tunnel start` to restart it ● cloudflared (stale PID 999999999) no cloudflared process (PID 999999999 is dead or not cloudflared); run `nemoclaw tunnel start` to restart it ``` Bare status sandbox row: ``` # Before test-sandbox * (qwen2.5:7b) :18789 # After test-sandbox * (qwen2.5:7b) :18789 Inference: ollama-local / qwen2.5:7b Connected: no ``` ## Brev reproduction — 3× per case, baseline and fix (Run on the prior PR #3534 branch, same code as here.) Fresh `n2d-standard-2` (Ubuntu 22.04 / Linux 6.8 GCP), v0.0.41 from tag, faked registry, three PID-dir states 3× each. Re-installed from this branch. <details> <summary><b>Baseline v0.0.41 — 9/9 runs reproduce the bug</b></summary> ``` ### Case: stopped — no PID file --- Run 1 --- ● cloudflared (stopped) --- Run 2 --- ● cloudflared (stopped) --- Run 3 --- ● cloudflared (stopped) ### Case: stale-pid-file — garbage PID contents --- Run 1 --- ● cloudflared (stopped) --- Run 2 --- ● cloudflared (stopped) --- Run 3 --- ● cloudflared (stopped) ### Case: stale-pid-process — PID is dead --- Run 1 --- ● cloudflared (stopped) --- Run 2 --- ● cloudflared (stopped) --- Run 3 --- ● cloudflared (stopped) ``` </details> <details> <summary><b>Fix branch — 9/9 runs emit a state-specific remediation</b></summary> Same three-case 3× harness, with the new wording. Identical output across reruns within each case — no flake. </details> ## Out of scope (filed/to-file as follow-ups) - **Auto-restart on crash.** Needs a watchdog daemon / systemd unit / `while true; cloudflared || sleep 5` shim. Real design work, separate PR. Manual `pkill cloudflared` is not preventable from inside the CLI either way; the right answer there is actionable status output, which this PR provides. - **Intermittent cloudflared exits in nightly E2E (#3494).** Different scope — CI-infra fault attribution. #3517 covers it; touches disjoint files. ## Test plan - [x] `npm run build:cli` clean - [x] `npx tsc -p tsconfig.cli.json` clean - [x] `vitest run src/lib/tunnel/services.test.ts` — 23 pass - [x] `vitest run src/lib/inventory/index.test.ts` — 31 pass - [x] `vitest run test/cli.test.ts -t doctor` — 4/4 pass (covers refactored `cloudflaredDoctorCheck`) - [x] Brev `n2d-standard-2` repro: 9/9 baseline reproduces; 9/9 fix shows the new diagnostic + remediation lines, no flake - [x] Commit SSH-signed and verified by GitHub (`reason: valid`) Signed-off-by: Prekshi Vyas <prekshiv@nvidia.com> <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit ## Release Notes * **New Features** * Status command now displays inference provider and model per sandbox. * Status command includes active SSH session count when available. * **Bug Fixes** * Enhanced cloudflared health checks with refined state detection (running, stopped, stale PID). * Improved remediation hints for cloudflared diagnostic messages. * **Tests** * Added tests for expanded status output with provider, model, and session information. * Added tests for cloudflared state detection across various scenarios. <!-- review_stack_entry_start --> [](https://app.coderabbit.ai/change-stack/NVIDIA/NemoClaw/pull/3537) <!-- review_stack_entry_end --> <!-- end of auto-generated comment: release notes by coderabbit.ai --> Signed-off-by: Prekshi Vyas <prekshiv@nvidia.com>
Summary
Ports the useful behavior from #2626 onto current main without depending on #2881 or #2822.
Inference: healthyunless the selected sandbox/gateway is verified presentnemoclaw listrendering registered sandboxes when runtime inference probing is degradedFixes #2595
Refs #2666
Refs #2604
Validation
npm run build:clinpx vitest run test/cli.test.ts -t "status|runtime inference probing"npx vitest run test/cli.test.ts -t "gateway|inference|orphan|verified"npx vitest run test/cli.test.tsnpm run typecheck:clinpm run typechecknpx biome lint src/nemoclaw.ts test/cli.test.tsgit diff --checkSummary by CodeRabbit
New Features
Bug Fixes
Tests