Skip to content

refactor(arch): add named timeout constants for openshell execution#2683

Merged
jyaunches merged 4 commits into
NVIDIA:mainfrom
jyaunches:issue-2562-unified-timeout-openshell
Apr 29, 2026
Merged

refactor(arch): add named timeout constants for openshell execution#2683
jyaunches merged 4 commits into
NVIDIA:mainfrom
jyaunches:issue-2562-unified-timeout-openshell

Conversation

@jyaunches

@jyaunches jyaunches commented Apr 29, 2026

Copy link
Copy Markdown
Contributor

Summary

Introduces a shared module of named timeout constants for all openshell child-process calls. This is Phase 1 of #2562 — pure constants with no behavioral change, establishing the vocabulary that subsequent PRs will use to bound every unbounded call site.

Related Issue

Refs #2562 (partial — Phase 1 of 4)

Changes

  • Add src/lib/openshell-timeouts.ts with four categorized constants:
    • OPENSHELL_PROBE_TIMEOUT_MS (15s) — read-only queries (list, status, info, ssh-config)
    • OPENSHELL_OPERATION_TIMEOUT_MS (30s) — mutating commands (provider CRUD, forward, gateway select)
    • OPENSHELL_HEAVY_TIMEOUT_MS (60s) — destructive/long-running (sandbox delete, gateway destroy)
    • OPENSHELL_DOWNLOAD_TIMEOUT_MS (30s) — file transfers over SSH
  • Add src/lib/openshell-timeouts.test.ts with unit tests for positivity, ordering, and PR fix(onboard): auto-allocate dashboard port on multi-sandbox conflict #2454 alignment
  • Uses the same OPENSHELL_PROBE_TIMEOUT_MS name from PR fix(onboard): auto-allocate dashboard port on multi-sandbox conflict #2454 so it can import from this shared module on rebase

Type of Change

  • Code change (feature, bug fix, or refactor)

Verification

  • npx prek run --all-files passes
  • npm test passes
  • Tests added or updated for new or changed behavior
  • No secrets, API keys, or credentials committed
  • Docs updated for user-facing behavior changes

AI Disclosure

  • AI-assisted — tool: Claude Code (pi)

Signed-off-by: Jessica Yaunches jyaunches@nvidia.com

Summary by CodeRabbit

  • Chores
    • Centralized timeout settings for shell interactions to standardize behavior and improve reliability across operations.
  • Tests
    • Added tests that validate timeout values are numeric, positive, and follow the expected ordering to prevent regressions.
  • Stability
    • Applied the refined probe-level timeout to quick checks to make probes and lightweight operations more consistent and predictable.

@coderabbitai

coderabbitai Bot commented Apr 29, 2026

Copy link
Copy Markdown
Contributor

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 96352f97-8084-4a1d-995a-95352c6f6cdf

📥 Commits

Reviewing files that changed from the base of the PR and between 6e67943 and fdc59a0.

📒 Files selected for processing (1)
  • src/nemoclaw.ts

📝 Walkthrough

Walkthrough

Adds a new openshell timeouts module and tests, and applies the probe timeout constant to multiple openshell probe calls across the codebase.

Changes

Cohort / File(s) Summary
Openshell timeout module & tests
src/lib/openshell-timeouts.ts, src/lib/openshell-timeouts.test.ts
New module exporting four millisecond timeout constants: probe, operation, download, heavy. Tests validate positive-integer values, enforce ordering (probe < operation ≤ download < heavy) and exact probe value 15000.
Apply probe timeout to openshell calls
src/nemoclaw.ts
Multiple captureOpenshell / runOpenshell probe-style calls updated to pass timeout: OPENSHELL_PROBE_TIMEOUT_MS (preserving existing ignoreError behavior) across sandbox listing, ssh-config reads, gateway lifecycle/status checks, state reads, messaging/inference probes, and recovery selection.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐇 I nibbled at time, then set it right,
Four little clocks in tidy light.
Probe stands steady, numbers true—
Tests clap softly, code says "pursue." ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 40.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'refactor(arch): add named timeout constants for openshell execution' is directly aligned with the main changeset, which introduces a new module with shared timeout constants and threads them through openshell interactions.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Review rate limit: 8/10 reviews remaining, refill in 10 minutes and 44 seconds.

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/lib/openshell-timeouts.test.ts`:
- Around line 29-33: The test's last assertion allows DOWNLOAD and HEAVY
timeouts to be equal, weakening the contract; update the assertion in the test
that checks OPENSHELL_PROBE_TIMEOUT_MS, OPENSHELL_OPERATION_TIMEOUT_MS,
OPENSHELL_DOWNLOAD_TIMEOUT_MS, and OPENSHELL_HEAVY_TIMEOUT_MS so that
OPENSHELL_DOWNLOAD_TIMEOUT_MS is strictly less than OPENSHELL_HEAVY_TIMEOUT_MS
(use a strict comparison matcher instead of the current less-than-or-equal
matcher) to enforce DOWNLOAD < HEAVY.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 8e3252e5-8f24-430e-9643-aecf17da9486

📥 Commits

Reviewing files that changed from the base of the PR and between 24e4ff3 and ddc1343.

📒 Files selected for processing (2)
  • src/lib/openshell-timeouts.test.ts
  • src/lib/openshell-timeouts.ts

Comment thread src/lib/openshell-timeouts.test.ts
Introduce src/lib/openshell-timeouts.ts with categorized timeout
constants for all openshell child-process calls:

- OPENSHELL_PROBE_TIMEOUT_MS (15s) — read-only queries
- OPENSHELL_OPERATION_TIMEOUT_MS (30s) — mutating commands
- OPENSHELL_HEAVY_TIMEOUT_MS (60s) — destructive/long-running ops
- OPENSHELL_DOWNLOAD_TIMEOUT_MS (30s) — file transfers over SSH

Uses the same OPENSHELL_PROBE_TIMEOUT_MS name introduced locally
in PR NVIDIA#2454 so that PR can import from the shared module on rebase.

This is Phase 1 of NVIDIA#2562 — pure constants, no behavioral change.

Refs: NVIDIA#2562
Add timeout: OPENSHELL_PROBE_TIMEOUT_MS to 15 unbounded
captureOpenshell/runOpenshell calls in the status, recovery, and
connect paths of nemoclaw.ts:

- hasNoLiveSandboxes → sandbox list
- executeSandboxCommand → sandbox ssh-config
- recoverRegistryFromLiveGateway → sandbox list
- getNamedGatewayLifecycleState → status + gateway info
- recoverNamedGatewayRuntime → gateway select (x2)
- getSandboxGatewayState → sandbox get + policy get
- reconcileMissingAgainstNamedGateway → gateway select
- recoverRegistryEntries → sandbox list
- showStatus / listSandboxes / sandboxStatus → inference get (x4)
- isGatewayAlive → sandbox list
- providerExists → provider get

This is Phase 2 of NVIDIA#2562 — the critical path that caused the
NVIDIA#2398 E2E hang and blocks re-landing NVIDIA#2390.

Refs: NVIDIA#2562
@jyaunches jyaunches force-pushed the issue-2562-unified-timeout-openshell branch from ddc1343 to 6e67943 Compare April 29, 2026 20:45

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/nemoclaw.ts (1)

515-518: Run the lifecycle-focused E2E suites for this CLI file

Since these changes touch sandbox/gateway lifecycle paths in src/nemoclaw.ts, run the recommended E2E jobs before merge:
gh workflow run nightly-e2e.yaml --ref <branch> -f jobs=sandbox-survival-e2e,sandbox-operations-e2e,skip-permissions-e2e

As per coding guidelines, src/nemoclaw.ts changes affecting sandbox lifecycle commands should be validated with sandbox-survival-e2e, sandbox-operations-e2e, and skip-permissions-e2e.

Also applies to: 638-662

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/nemoclaw.ts` around lines 515 - 518, Changes touching sandbox/gateway
lifecycle (e.g., the captureOpenshell(["sandbox", "list"]) probe and related
sandbox lifecycle paths) require running lifecycle-focused E2E suites before
merge; run the GitHub workflow with: gh workflow run nightly-e2e.yaml --ref
<branch> -f
jobs=sandbox-survival-e2e,sandbox-operations-e2e,skip-permissions-e2e, wait for
those jobs to pass, and only merge after they succeed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/nemoclaw.ts`:
- Line 97: The gateway "openshell gateway select" calls are using the probe
timeout constant but must use the operation timeout; update the import to
include OPENSHELL_OPERATION_TIMEOUT_MS (replace or add alongside
OPENSHELL_PROBE_TIMEOUT_MS) and change the timeout argument passed to each
openshell gateway select invocation (the calls around the blocks referencing
gateway select at the locations handling gateway selection/recovery) to
OPENSHELL_OPERATION_TIMEOUT_MS instead of OPENSHELL_PROBE_TIMEOUT_MS so mutating
gateway select uses the correct operation timeout.

---

Nitpick comments:
In `@src/nemoclaw.ts`:
- Around line 515-518: Changes touching sandbox/gateway lifecycle (e.g., the
captureOpenshell(["sandbox", "list"]) probe and related sandbox lifecycle paths)
require running lifecycle-focused E2E suites before merge; run the GitHub
workflow with: gh workflow run nightly-e2e.yaml --ref <branch> -f
jobs=sandbox-survival-e2e,sandbox-operations-e2e,skip-permissions-e2e, wait for
those jobs to pass, and only merge after they succeed.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: d0748668-80bd-4db2-98be-724d54d2015b

📥 Commits

Reviewing files that changed from the base of the PR and between ddc1343 and 6e67943.

📒 Files selected for processing (3)
  • src/lib/openshell-timeouts.test.ts
  • src/lib/openshell-timeouts.ts
  • src/nemoclaw.ts
✅ Files skipped from review due to trivial changes (1)
  • src/lib/openshell-timeouts.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/lib/openshell-timeouts.test.ts

Comment thread src/nemoclaw.ts Outdated
Address CodeRabbit review feedback: gateway select is a mutating
operation, not a read-only probe. Switch the 3 gateway select calls
from OPENSHELL_PROBE_TIMEOUT_MS (15s) to OPENSHELL_OPERATION_TIMEOUT_MS
(30s) for correct timeout classification.

Refs: NVIDIA#2562
@jyaunches jyaunches merged commit 23699c9 into NVIDIA:main Apr 29, 2026
11 checks passed
DemianHeyGen pushed a commit to DemianHeyGen/NemoClaw that referenced this pull request Apr 30, 2026
…VIDIA#2683)

## Summary

Introduces a shared module of named timeout constants for all openshell
child-process calls. This is Phase 1 of NVIDIA#2562 — pure constants with no
behavioral change, establishing the vocabulary that subsequent PRs will
use to bound every unbounded call site.

## Related Issue

Refs NVIDIA#2562 (partial — Phase 1 of 4)

## Changes

- Add `src/lib/openshell-timeouts.ts` with four categorized constants:
- `OPENSHELL_PROBE_TIMEOUT_MS` (15s) — read-only queries (list, status,
info, ssh-config)
- `OPENSHELL_OPERATION_TIMEOUT_MS` (30s) — mutating commands (provider
CRUD, forward, gateway select)
- `OPENSHELL_HEAVY_TIMEOUT_MS` (60s) — destructive/long-running (sandbox
delete, gateway destroy)
  - `OPENSHELL_DOWNLOAD_TIMEOUT_MS` (30s) — file transfers over SSH
- Add `src/lib/openshell-timeouts.test.ts` with unit tests for
positivity, ordering, and PR NVIDIA#2454 alignment
- Uses the same `OPENSHELL_PROBE_TIMEOUT_MS` name from PR NVIDIA#2454 so it
can import from this shared module on rebase

## Type of Change

- [x] Code change (feature, bug fix, or refactor)

## Verification

- [ ] `npx prek run --all-files` passes
- [x] `npm test` passes
- [x] Tests added or updated for new or changed behavior
- [x] No secrets, API keys, or credentials committed
- [ ] Docs updated for user-facing behavior changes

## AI Disclosure

- [x] AI-assisted — tool: Claude Code (pi)

---
Signed-off-by: Jessica Yaunches <jyaunches@nvidia.com>

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Chores**
* Centralized timeout settings for shell interactions to standardize
behavior and improve reliability across operations.
* **Tests**
* Added tests that validate timeout values are numeric, positive, and
follow the expected ordering to prevent regressions.
* **Stability**
* Applied the refined probe-level timeout to quick checks to make probes
and lightweight operations more consistent and predictable.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
@wscurran wscurran added the refactor PR restructures code without intended behavior change label Jun 8, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

refactor PR restructures code without intended behavior change

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants