refactor(arch): add named timeout constants for openshell execution#2683
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughAdds a new openshell timeouts module and tests, and applies the probe timeout constant to multiple openshell probe calls across the codebase. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Review rate limit: 8/10 reviews remaining, refill in 10 minutes and 44 seconds. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/lib/openshell-timeouts.test.ts`:
- Around line 29-33: The test's last assertion allows DOWNLOAD and HEAVY
timeouts to be equal, weakening the contract; update the assertion in the test
that checks OPENSHELL_PROBE_TIMEOUT_MS, OPENSHELL_OPERATION_TIMEOUT_MS,
OPENSHELL_DOWNLOAD_TIMEOUT_MS, and OPENSHELL_HEAVY_TIMEOUT_MS so that
OPENSHELL_DOWNLOAD_TIMEOUT_MS is strictly less than OPENSHELL_HEAVY_TIMEOUT_MS
(use a strict comparison matcher instead of the current less-than-or-equal
matcher) to enforce DOWNLOAD < HEAVY.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 8e3252e5-8f24-430e-9643-aecf17da9486
📒 Files selected for processing (2)
src/lib/openshell-timeouts.test.tssrc/lib/openshell-timeouts.ts
Introduce src/lib/openshell-timeouts.ts with categorized timeout constants for all openshell child-process calls: - OPENSHELL_PROBE_TIMEOUT_MS (15s) — read-only queries - OPENSHELL_OPERATION_TIMEOUT_MS (30s) — mutating commands - OPENSHELL_HEAVY_TIMEOUT_MS (60s) — destructive/long-running ops - OPENSHELL_DOWNLOAD_TIMEOUT_MS (30s) — file transfers over SSH Uses the same OPENSHELL_PROBE_TIMEOUT_MS name introduced locally in PR NVIDIA#2454 so that PR can import from the shared module on rebase. This is Phase 1 of NVIDIA#2562 — pure constants, no behavioral change. Refs: NVIDIA#2562
… < HEAVY ordering
Add timeout: OPENSHELL_PROBE_TIMEOUT_MS to 15 unbounded captureOpenshell/runOpenshell calls in the status, recovery, and connect paths of nemoclaw.ts: - hasNoLiveSandboxes → sandbox list - executeSandboxCommand → sandbox ssh-config - recoverRegistryFromLiveGateway → sandbox list - getNamedGatewayLifecycleState → status + gateway info - recoverNamedGatewayRuntime → gateway select (x2) - getSandboxGatewayState → sandbox get + policy get - reconcileMissingAgainstNamedGateway → gateway select - recoverRegistryEntries → sandbox list - showStatus / listSandboxes / sandboxStatus → inference get (x4) - isGatewayAlive → sandbox list - providerExists → provider get This is Phase 2 of NVIDIA#2562 — the critical path that caused the NVIDIA#2398 E2E hang and blocks re-landing NVIDIA#2390. Refs: NVIDIA#2562
ddc1343 to
6e67943
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/nemoclaw.ts (1)
515-518: Run the lifecycle-focused E2E suites for this CLI fileSince these changes touch sandbox/gateway lifecycle paths in
src/nemoclaw.ts, run the recommended E2E jobs before merge:
gh workflow run nightly-e2e.yaml --ref <branch> -f jobs=sandbox-survival-e2e,sandbox-operations-e2e,skip-permissions-e2eAs per coding guidelines,
src/nemoclaw.tschanges affecting sandbox lifecycle commands should be validated withsandbox-survival-e2e,sandbox-operations-e2e, andskip-permissions-e2e.Also applies to: 638-662
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/nemoclaw.ts` around lines 515 - 518, Changes touching sandbox/gateway lifecycle (e.g., the captureOpenshell(["sandbox", "list"]) probe and related sandbox lifecycle paths) require running lifecycle-focused E2E suites before merge; run the GitHub workflow with: gh workflow run nightly-e2e.yaml --ref <branch> -f jobs=sandbox-survival-e2e,sandbox-operations-e2e,skip-permissions-e2e, wait for those jobs to pass, and only merge after they succeed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/nemoclaw.ts`:
- Line 97: The gateway "openshell gateway select" calls are using the probe
timeout constant but must use the operation timeout; update the import to
include OPENSHELL_OPERATION_TIMEOUT_MS (replace or add alongside
OPENSHELL_PROBE_TIMEOUT_MS) and change the timeout argument passed to each
openshell gateway select invocation (the calls around the blocks referencing
gateway select at the locations handling gateway selection/recovery) to
OPENSHELL_OPERATION_TIMEOUT_MS instead of OPENSHELL_PROBE_TIMEOUT_MS so mutating
gateway select uses the correct operation timeout.
---
Nitpick comments:
In `@src/nemoclaw.ts`:
- Around line 515-518: Changes touching sandbox/gateway lifecycle (e.g., the
captureOpenshell(["sandbox", "list"]) probe and related sandbox lifecycle paths)
require running lifecycle-focused E2E suites before merge; run the GitHub
workflow with: gh workflow run nightly-e2e.yaml --ref <branch> -f
jobs=sandbox-survival-e2e,sandbox-operations-e2e,skip-permissions-e2e, wait for
those jobs to pass, and only merge after they succeed.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: d0748668-80bd-4db2-98be-724d54d2015b
📒 Files selected for processing (3)
src/lib/openshell-timeouts.test.tssrc/lib/openshell-timeouts.tssrc/nemoclaw.ts
✅ Files skipped from review due to trivial changes (1)
- src/lib/openshell-timeouts.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- src/lib/openshell-timeouts.test.ts
Address CodeRabbit review feedback: gateway select is a mutating operation, not a read-only probe. Switch the 3 gateway select calls from OPENSHELL_PROBE_TIMEOUT_MS (15s) to OPENSHELL_OPERATION_TIMEOUT_MS (30s) for correct timeout classification. Refs: NVIDIA#2562
…VIDIA#2683) ## Summary Introduces a shared module of named timeout constants for all openshell child-process calls. This is Phase 1 of NVIDIA#2562 — pure constants with no behavioral change, establishing the vocabulary that subsequent PRs will use to bound every unbounded call site. ## Related Issue Refs NVIDIA#2562 (partial — Phase 1 of 4) ## Changes - Add `src/lib/openshell-timeouts.ts` with four categorized constants: - `OPENSHELL_PROBE_TIMEOUT_MS` (15s) — read-only queries (list, status, info, ssh-config) - `OPENSHELL_OPERATION_TIMEOUT_MS` (30s) — mutating commands (provider CRUD, forward, gateway select) - `OPENSHELL_HEAVY_TIMEOUT_MS` (60s) — destructive/long-running (sandbox delete, gateway destroy) - `OPENSHELL_DOWNLOAD_TIMEOUT_MS` (30s) — file transfers over SSH - Add `src/lib/openshell-timeouts.test.ts` with unit tests for positivity, ordering, and PR NVIDIA#2454 alignment - Uses the same `OPENSHELL_PROBE_TIMEOUT_MS` name from PR NVIDIA#2454 so it can import from this shared module on rebase ## Type of Change - [x] Code change (feature, bug fix, or refactor) ## Verification - [ ] `npx prek run --all-files` passes - [x] `npm test` passes - [x] Tests added or updated for new or changed behavior - [x] No secrets, API keys, or credentials committed - [ ] Docs updated for user-facing behavior changes ## AI Disclosure - [x] AI-assisted — tool: Claude Code (pi) --- Signed-off-by: Jessica Yaunches <jyaunches@nvidia.com> <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Chores** * Centralized timeout settings for shell interactions to standardize behavior and improve reliability across operations. * **Tests** * Added tests that validate timeout values are numeric, positive, and follow the expected ordering to prevent regressions. * **Stability** * Applied the refined probe-level timeout to quick checks to make probes and lightweight operations more consistent and predictable. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
Summary
Introduces a shared module of named timeout constants for all openshell child-process calls. This is Phase 1 of #2562 — pure constants with no behavioral change, establishing the vocabulary that subsequent PRs will use to bound every unbounded call site.
Related Issue
Refs #2562 (partial — Phase 1 of 4)
Changes
src/lib/openshell-timeouts.tswith four categorized constants:OPENSHELL_PROBE_TIMEOUT_MS(15s) — read-only queries (list, status, info, ssh-config)OPENSHELL_OPERATION_TIMEOUT_MS(30s) — mutating commands (provider CRUD, forward, gateway select)OPENSHELL_HEAVY_TIMEOUT_MS(60s) — destructive/long-running (sandbox delete, gateway destroy)OPENSHELL_DOWNLOAD_TIMEOUT_MS(30s) — file transfers over SSHsrc/lib/openshell-timeouts.test.tswith unit tests for positivity, ordering, and PR fix(onboard): auto-allocate dashboard port on multi-sandbox conflict #2454 alignmentOPENSHELL_PROBE_TIMEOUT_MSname from PR fix(onboard): auto-allocate dashboard port on multi-sandbox conflict #2454 so it can import from this shared module on rebaseType of Change
Verification
npx prek run --all-filespassesnpm testpassesAI Disclosure
Signed-off-by: Jessica Yaunches jyaunches@nvidia.com
Summary by CodeRabbit