fix(status): exit 1 and suppress healthy when gateway is down#2626
fix(status): exit 1 and suppress healthy when gateway is down#2626rluo8 wants to merge 2 commits into
Conversation
📝 WalkthroughWalkthrough
Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant CLI as "CLI\n(nemoclaw status)"
participant Reconciler as "Reconciler\n(getReconciledSandboxGatewayState)"
participant Gateway as "Live Gateway\n(OpenClaw/OpenShell)"
participant Provider as "Inference Provider\n(external API)"
participant Registry as "Local Registry\n(sandboxes.json)"
CLI->>Reconciler: request sandbox vs gateway state
Reconciler-->>CLI: lookup.state (present / missing / wrong_gateway_active / ...)
alt lookup.state == "present"
CLI->>Gateway: verify gateway reachable
Gateway-->>CLI: reachable
CLI->>Provider: probeProviderHealth / parse inference get
Provider-->>CLI: probe result (healthy / unhealthy)
CLI->>CLI: print healthy/inference info\nexit 0 (if healthy)
else lookup.state != "present"
CLI->>CLI: print guidance/error for specific state
alt state indicates stale registry entry
CLI->>Registry: remove orphaned sandbox\nset defaultSandbox=null
Registry-->>CLI: write ack
end
CLI-->>CLI: exit 1
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Review rate limit: 9/10 reviews remaining, refill in 6 minutes. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
src/nemoclaw.ts (1)
1695-1875: Recommended validation run: execute the targeted E2E recovery suites for this status behavior change.Given this touches sandbox lifecycle/status correctness around gateway recovery, running the recommended E2E jobs would reduce regression risk.
As per coding guidelines, "E2E test recommendation: sandbox-survival-e2e — gateway restart recovery; sandbox-operations-e2e — process recovery after gateway kill; skip-permissions-e2e — permissive policy activation."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/nemoclaw.ts` around lines 1695 - 1875, This change affects sandbox lifecycle logic in sandboxStatus (and related calls like getReconciledSandboxGatewayState and parseSandboxPhase), so run the recommended E2E suites locally/CI: sandbox-survival-e2e (gateway restart recovery), sandbox-operations-e2e (process recovery after gateway kill), and skip-permissions-e2e (permissive policy activation); capture failures, fix any regressions in the sandboxStatus flows (recovery handling, lookup.state branches, and registry/session updates), and attach the E2E results to the PR (or add/enable these suites in CI) before merging.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@src/nemoclaw.ts`:
- Around line 1695-1875: This change affects sandbox lifecycle logic in
sandboxStatus (and related calls like getReconciledSandboxGatewayState and
parseSandboxPhase), so run the recommended E2E suites locally/CI:
sandbox-survival-e2e (gateway restart recovery), sandbox-operations-e2e (process
recovery after gateway kill), and skip-permissions-e2e (permissive policy
activation); capture failures, fix any regressions in the sandboxStatus flows
(recovery handling, lookup.state branches, and registry/session updates), and
attach the E2E results to the PR (or add/enable these suites in CI) before
merging.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: ef0efda6-7be3-4cda-be9c-392fdc9b9292
📒 Files selected for processing (2)
src/nemoclaw.tstest/cli.test.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- test/cli.test.ts
## Summary Ports the useful behavior from #2626 onto current main without depending on #2881 or #2822. - reconcile sandbox/gateway state before probing or printing inference provider health - suppress `Inference: healthy` unless the selected sandbox/gateway is verified present - exit non-zero for degraded or unverified sandbox/gateway status states - remove only the selected stale registry entry when status verifies that named sandbox is absent from a healthy NemoClaw gateway; status does not sweep unrelated registry-only entries - keep `nemoclaw list` rendering registered sandboxes when runtime inference probing is degraded Fixes #2595 Refs #2666 Refs #2604 ## Validation - `npm run build:cli` - `npx vitest run test/cli.test.ts -t "status|runtime inference probing"` - `npx vitest run test/cli.test.ts -t "gateway|inference|orphan|verified"` - `npx vitest run test/cli.test.ts` - `npm run typecheck:cli` - `npm run typecheck` - `npx biome lint src/nemoclaw.ts test/cli.test.ts` - `git diff --check` <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Automatic generation and application of an initial sandbox create policy based on enabled messaging channels; merged policy presets are applied at sandbox boot. * **Bug Fixes** * Status now only reports inference when gateway/sandbox state is verified; otherwise shows "Inference: not verified..." and exits non-zero for unverified states. * Status removes the selected stale/orphaned registry entry only when that named sandbox is verified absent from a healthy NemoClaw gateway; it does not sweep unrelated registry-only entries. List/status gracefully fall back when inference probing is degraded. * **Tests** * CLI, e2e, and unit tests updated to expect non-zero exits and verification-gated inference reporting; policy merge behavior covered. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
|
Thank you for the fix here. This PR correctly identified that The maintained version of this behavior has now landed in #2884, which closed #2595. That merged path ports the useful pieces from this PR onto current I am closing this as superseded by #2884 so we do not keep a conflicting older branch open, but the issue and behavior were materially advanced by this PR. Thanks again for the contribution. |
Summary
nemoclaw <sandbox> statuscould printInference: healthybefore verifying that the selected NemoClaw gateway and sandbox were actually reachable. When the OpenShell gateway was down/refusing connections, the command later printed recovery/error guidance but still exited0, misleading users and scripts.This PR checks the reconciled sandbox/gateway state before printing inference health. If the live sandbox/gateway state cannot be verified,
statusno longer printsInference: healthyand exits non-zero.Related Issue
Fixes #2595
Changes
src/nemoclaw.ts— MovedgetReconciledSandboxGatewayState()ahead of inference/provider health rendering insandboxStatus().src/nemoclaw.ts— Only printsInference: healthywhenlookup.state === "present".src/nemoclaw.ts— Makes degraded/unverifiedstatuspaths exit1instead of silently returning success.test/cli.test.ts— Updated degraded gateway/status tests to expect non-zero exit codes.test/cli.test.ts— Added regression assertion that gateway-refused status output does not containInference: healthy.Type of Change
Verification
npx prek run --all-filespassesnpm testpassesmake docsbuilds without warnings (doc changes only)AI Disclosure
Signed-off-by: rluo8 ruluo@nvidia.com
Summary by CodeRabbit
Bug Fixes
Tests