fix(connect): fail fast when gateway is down#3853
Conversation
Signed-off-by: Chengjie Wang <chengjiew@nvidia.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughWhen the OpenShell gateway is disconnected or unreachable, readiness probing now classifies that condition and fails immediately with recovery guidance. Sandbox readiness polling captures both ChangesGateway Availability Detection and Recovery
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~22 minutes Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 ESLint
ESLint skipped: no ESLint configuration detected in root package.json. To enable, add Comment |
E2E Advisor RecommendationRequired E2E: Dispatch hint: Full advisor summaryE2E Recommendation AdvisorBase: Required E2E
Optional E2E
New E2E recommendations
Dispatch hint
|
PR Review AdvisorRecommendation: blocked This is an automated advisory review. A human maintainer must make the final merge decision. Limitations: Review used provided trusted metadata and the supplied diff; no tests, package-manager commands, scripts, workflows, or PR-provided instructions were executed.; CI and E2E conclusions were evaluated from the provided GitHub/status context only.; No live OpenShell gateway, Docker gateway container, or real sandbox was available to validate the docker-kill scenario.; Issue, PR, and bot text were treated as untrusted evidence and used only for acceptance mapping.; The full current contents of changed files were not independently re-read beyond the supplied diff/context.; Selective E2E comments mention sandbox-operations-e2e success for the branch/ref, but the supplied evidence does not prove that job passed for the exact current head SHA. Full advisor summaryPR Review AdvisorBase: Directionally fixes gateway-down connect guidance, but merge is blocked by mergeStateStatus=BLOCKED, missing required sandbox E2E evidence for this head SHA, and enforced connect.ts monolith growth. Gate status
🔴 Blockers
🟡 Warnings
🔵 Suggestions
Acceptance coverage
Security review
Test / E2E status
✅ What looks good
Review completeness
|
Selective E2E Results — ❌ Some jobs failedRun: 26258188011
|
Selective E2E Results — ❌ Some jobs failedRun: 26259194249
|
|
@chengjiew thanks for the fix here. This PR has accumulated commit signature/merge-commit issues while we were trying to get it current with main, so I'm going to preserve your change by opening a clean squashed replacement PR and then close this one. For your next PR, could you please make sure the branch has only signed commits and avoid merge commits from updating with main? That should keep the repo signature checks and merge queue happier. |
|
Closing in favor of clean squashed replacement PR #4022 to avoid the commit signature/merge-commit issues on this branch. @chengjiew please make sure future PR branches keep commits signed and avoid merge commits when updating with main. |
## Summary - fail fast during `nemoclaw <sandbox> connect` readiness polling when the named NemoClaw/OpenShell gateway is down or unreachable - include recovery guidance to restart the named gateway or rerun `nemoclaw onboard` - preserve stuck-sandbox timeout behavior when readiness has a concrete non-terminal phase such as `Provisioning` - add a regression test for `unknown` readiness status plus disconnected gateway lifecycle Supersedes #3853 with the same patch squashed onto current `main` to avoid the commit signature/merge-commit issues on that branch. Fixes #3821 ## Test Plan - [x] `git diff --check` - [ ] `npm test -- test/cli.test.ts -t "fails fast with gateway recovery guidance"` (not run locally: dependencies are not installed in this worktree; CI will run) - Prior PR #3853 evidence before the squash: PR CI green; `sandbox-operations-e2e` passed on previous head; latest E2E advisor requires `sandbox-operations-e2e` for current head and it will need to be re-run here. Signed-off-by: Julie Yaunches <jyaunches@nvidia.com> <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Bug Fixes** * Improved sandbox connection readiness to detect OpenShell gateway unavailability earlier and fail fast with clearer, actionable error messages and recovery steps. * **Tests** * Added CLI test coverage ensuring connect attempts fail promptly when the gateway is reported disconnected, validating exit behavior and user guidance without entering the normal connect flow. <!-- review_stack_entry_start --> [](https://app.coderabbit.ai/change-stack/NVIDIA/NemoClaw/pull/4022?utm_source=github_walkthrough&utm_medium=github&utm_campaign=change_stack) <!-- review_stack_entry_end --> <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Signed-off-by: Julie Yaunches <jyaunches@nvidia.com>
Summary
nemoclaw <sandbox> connectreadiness polling when the named NemoClaw/OpenShell gateway is down or unreachablenemoclaw onboardProvisioningunknownreadiness status plus disconnected gateway lifecycleFixes #3821
Test Plan
npm run build:clinpm run typecheck:clinpm test -- test/cli.test.ts -t "fails fast with gateway recovery guidance"npm test -- test/sandbox-stuck-recovery.test.tsnpm test -- test/cli.test.ts -t "connect|gateway metadata exists|gateway is no longer configured|gateway recovery guidance|sandbox readiness"git diff --checkSummary by CodeRabbit
Bug Fixes
Tests
Signed-off-by: Chengjie Wang chengjiew@nvidia.com