fix(onboard): pre-approve gateway scope upgrades after onboard and recover by TonyLuo-NV · Pull Request #4763 · NVIDIA/NemoClaw

TonyLuo-NV · 2026-06-04T09:21:57Z

Summary

Fixes #4504. After a fresh nemoclaw onboard, the first openclaw agent run printed gateway connect failed: scope upgrade pending approval and silently fell back to embedded mode (exit 0), and the same pending requestId survived across every subsequent run — the approval was never auto-resolved (regression of NVB#6041401).

Root cause

Two defects in the connect-time auto-pair approval pass (runConnectAutoPairApprovalPass):

Approval was silently failing. The host openclaw devices approve ran with timeout=1s and stripped only OPENCLAW_GATEWAY_URL from the child env. Leaving OPENCLAW_GATEWAY_PORT + OPENCLAW_GATEWAY_TOKEN set re-pinned the approve to the gateway, hitting the OpenClaw CLI scope-upgrade approval deadlocks and forces openclaw agent into embedded fallback #4462 self-defeat where a gateway-pinned scope-upgrade approve requests the very scopes it is trying to grant and fails. So the request was never cleared and the same requestId persisted.
The pass never ran post-onboard. It was wired only into the interactive connect path, not onboard finalization or nemoclaw recover.

Fix

Match the in-sandbox watcher: approve timeout 1s → 10s; strip OPENCLAW_GATEWAY_URL + PORT + TOKEN. MAX_APPROVALS 8 → 1 (single realistic pending upgrade); outer cap 12s → 15s so a slow 10s approve is never SIGKILLed mid-loop (2 + 10×1 = 12s < 15s, ~3s startup slack).
Run the corrected pass in onboard finalization (after process recovery, before verification) and the nemoclaw recover / probe-only path (gateway-up branches only, not the failure exit).
Budget constants extracted to a dependency-free connect-autopair-budget.ts leaf so the invariant is asserted on real values, not source text.

Known limitation

The CLI scope-upgrade request is only created by the user's first openclaw agent run (after onboard returns), so that first run can still fall back once — onboard has nothing to pre-approve yet. The deterministic remedy is nemoclaw recover (now actually clears it) or one ~30s in-sandbox watcher cycle. A docs/changelog note pointing users at nemoclaw recover on first-run fallback is recommended as follow-up.

Testing

34 tests across test/sandbox-connect-inference.test.ts (5 new: probe-path positive/negative/best-effort, env-triplet strip, budget invariant) and src/lib/onboard/machine/handlers/finalization.test.ts (ordering, agent-agnostic, best-effort) — all pass.
npm run typecheck:cli clean.

🤖 Generated with Claude Code

Signed-off-by: Tony Luo xialuo@nvidia.com

Summary by CodeRabbit

New Features
- Added configurable auto-pair budgeting (approval count and timeouts) for sandbox connect flows.
- Introduced a dedicated connect-path auto-pair approval helper to enforce the tighter connect budget.
Improvements
- Connect probe and onboarding finalization now run a best-effort auto-pair approval sweep to clear pending pairing requests.
- Centralized finalization helpers with lazy-loading to reduce import surface.
Tests
- New and updated tests covering approval limits, probe-only behaviors, and budget/timeouts.

coderabbitai · 2026-06-04T09:22:12Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Adds connect-time auto-pair budgeting constants, parameterizes and exposes a budgeted sandbox approval pass, invokes it best-effort from connect probe paths and onboarding finalization (via lazy deps), and adds tests plus a sandbox exec helper for probe/gateway-down scenarios.

Changes

Auto-pair scope approval integration

Layer / File(s)	Summary
Auto-pair approval budget constants `src/lib/actions/sandbox/connect-autopair-budget.ts`	New leaf module exporting `CONNECT_AUTO_PAIR_MAX_APPROVALS`, `CONNECT_AUTO_PAIR_LIST_TIMEOUT_S`, `CONNECT_AUTO_PAIR_APPROVE_TIMEOUT_S`, and `CONNECT_AUTO_PAIR_TIMEOUT_MS` used by the in-sandbox approval script and outer spawn cap.
Auto-pair approval script budgeting and API `src/lib/actions/sandbox/auto-pair-approval.ts`	Adds `AUTO_PAIR_APPROVAL_TIMEOUT_MS` and `AutoPairApprovalBudget`; `buildAutoPairApprovalScript` and `runSandboxAutoPairApprovalPass` accept `budget` to parameterize max approvals, list/approve timeouts, and the outer spawn timeout.
Sandbox connect: budgeted approval pass and probe hooks `src/lib/actions/sandbox/connect.ts`	Imports connect-budget constants, derives `CONNECT_AUTO_PAIR_BUDGET`, exports `runConnectAutoPairApprovalPass(sandboxName)` that calls the sandbox approval with the connect budget, and best-effort invokes it in both wasRunning and recovered probe paths; replaces direct connect-phase approval call with the wrapper.
Lazy finalization deps module `src/lib/onboard/finalization-deps.ts`	Introduces `finalizationHandlerDeps` with lazy-required helpers that delegate to `process-recovery` and `connect` action functions to avoid static import cycles.
Finalization deps injection `src/lib/onboard.ts`	Imports `finalizationHandlerDeps` and spreads it into the `deps` object passed to `handleFinalizationState`, replacing prior inline wiring.
Finalization handler auto-pair approval integration `src/lib/onboard/machine/handlers/finalization.ts`	Adds `autoPairScopeApproval(sandboxName)` to the finalization deps contract and invokes it immediately after `checkAndRecoverSandboxProcesses(...)` as a best-effort sweep before verification/handoff.
Finalization handler test coverage `src/lib/onboard/machine/handlers/finalization.test.ts`	Mocks and forwards `autoPairScopeApproval` in test deps; adds tests validating invocation ordering (after process recovery, before verify), agent-type independence, and best-effort non-throwing behavior.
Connect approval tests and helper `test/sandbox-connect-inference/auto-pair-approval.test.ts`, `test/sandbox-connect-inference/helpers.ts`	Updates tests to import budget constants, add probe-only integration scenarios (`#4504`), assert MAX_APPROVALS and timeout invariants, and add a gateway-down stub branch to model probe failure without running approval.

Sequence Diagram

sequenceDiagram
  participant Onboard as handleFinalizationState
  participant Recovery as checkAndRecoverSandboxProcesses
  participant Approval as autoPairScopeApproval / runConnectAutoPairApprovalPass
  participant Verify as deploymentVerification
  Onboard->>Recovery: recover sandbox processes
  Recovery-->>Onboard: recovery complete
  Onboard->>Approval: run approval sweep (best-effort)
  Approval-->>Onboard: returns (non-throwing)
  Onboard->>Verify: verify deployment
  Verify-->>Onboard: verification complete
  Onboard->>Onboard: record session & print dashboard

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

NVIDIA/NemoClaw#4786: Modifies the same src/lib/actions/sandbox/connect.ts approval pass script for environment sanitization and approval filtering.
NVIDIA/NemoClaw#4788: Updates embedded Python approval logic in the connect approval pass and is directly related to these changes.
NVIDIA/NemoClaw#4936: Also touches the in-sandbox auto-pair approval flow and recovery usage of the connect approval pass.

Suggested labels

area: sandbox, Integration: OpenClaw, v0.0.61

Suggested reviewers

prekshivyas
cv
ericksoa

Poem

🐰 I hopped through code at break of day,

Counting approvals in a tidy array.
One timeout set, one approval true,
Probe and recover — best-effort through.
The rabbit smiles: tests all green and gay.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 20.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The PR title 'fix(onboard): pre-approve gateway scope upgrades after onboard and recover' clearly and specifically describes the main objective of adding pre-approval of gateway scope upgrades during onboard finalization and recover flows.
Linked Issues check	✅ Passed	All linked issue `#4504` requirements are met: scope upgrades are pre-approved during onboard finalization, the approval pass runs in recover/probe paths, timeouts are corrected, and MAX_APPROVALS is reduced to 1 with proper budget handling.
Out of Scope Changes check	✅ Passed	All changes are directly scoped to fixing `#4504`: adding auto-pair approval budgeting infrastructure, wiring the approval pass into onboard finalization and connect probe paths, updating tests, and supporting the gateway scope-upgrade pre-approval workflow.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

src/lib/onboard/machine/handlers/finalization.ts (1)
105-108: Run the onboarding/sandbox E2E slice for this integration point.

Since this hook changes finalization handoff behavior, run the recommended selective nightly E2E jobs before merge to validate cross-layer behavior under real gateway/sandbox lifecycle conditions.

As per coding guidelines: src/lib/onboard.ts changes affect full sandbox creation/configuration flow and include explicit E2E test recommendations.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/lib/onboard/machine/handlers/finalization.ts` around lines 105 - 108,
This change to finalization (deps.autoPairScopeApproval in
src/lib/onboard/machine/handlers/finalization.ts) alters the sandbox/gateway
handoff path and must be validated by running the selective nightly E2E slice
for onboarding/sandbox lifecycle; before merging, execute the recommended E2E
jobs that cover the sandbox creation/configuration flow (as referenced by
src/lib/onboard.ts), verify there are no regressions in gateway recovery,
pairing handoff, or stuck pairing requests, and attach the E2E run IDs and any
failing test traces to the PR for review.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/lib/onboard.ts`:
- Around line 7032-7036: The autoPairScopeApproval callback wiring currently
defined as autoPairScopeApproval: (name) => { const connect: typeof
import("./actions/sandbox/connect") = require("./actions/sandbox/connect");
connect.runConnectAutoPairApprovalPass(name); } should be moved out of
src/lib/onboard.ts into a new module under src/lib/onboard/ (for example
src/lib/onboard/autoPair.ts) or into the file that implements the finalization
handler; export a function (e.g., initAutoPairScopeApproval) that registers the
callback there and import/invoke that initializer from the finalization handler
so onboard.ts remains growth-neutral while still calling
connect.runConnectAutoPairApprovalPass(name) via the extracted initializer.

---

Nitpick comments:
In `@src/lib/onboard/machine/handlers/finalization.ts`:
- Around line 105-108: This change to finalization (deps.autoPairScopeApproval
in src/lib/onboard/machine/handlers/finalization.ts) alters the sandbox/gateway
handoff path and must be validated by running the selective nightly E2E slice
for onboarding/sandbox lifecycle; before merging, execute the recommended E2E
jobs that cover the sandbox creation/configuration flow (as referenced by
src/lib/onboard.ts), verify there are no regressions in gateway recovery,
pairing handoff, or stuck pairing requests, and attach the E2E run IDs and any
failing test traces to the PR for review.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: a2e6af1d-6a39-45c7-9c8b-c3ec6723a137

📥 Commits

Reviewing files that changed from the base of the PR and between 17734b1 and 7e5e79e.

📒 Files selected for processing (6)

src/lib/actions/sandbox/connect-autopair-budget.ts
src/lib/actions/sandbox/connect.ts
src/lib/onboard.ts
src/lib/onboard/machine/handlers/finalization.test.ts
src/lib/onboard/machine/handlers/finalization.ts
test/sandbox-connect-inference.test.ts

wscurran · 2026-06-05T14:20:17Z

✨
Related open issues:

#4504 [Ubuntu 24.04][Onboard][GitHub Issue #4504] openclaw agent after fresh onboard silently falls to embedded mode — scope upgrade pending approval (regression from NVB#6041401)
#4462 OpenClaw CLI scope-upgrade approval deadlocks and forces openclaw agent into embedded fallback

prekshivyas

Reviewed the logic and it's sound — the env-triplet strip and the budget invariant (2s + 10s×1 = 12s < 15s, extracted to the dep-free connect-autopair-budget leaf) both check out, and the first-run limitation is honestly documented. The blocker is that this now conflicts semantically with main, which independently reworked the same runConnectAutoPairApprovalPass path. Mapping the overlap so the rebase is targeted:

Env-strip is already on main (now redundant here). main's scripts/lib/openclaw_device_approval_policy.py defines gateway_approval_env(), which pops GATEWAY_APPROVAL_ENV_KEYS = ("OPENCLAW_GATEWAY_URL", "OPENCLAW_GATEWAY_PORT", "OPENCLAW_GATEWAY_TOKEN") — the exact triplet this PR strips inline. So the #4462 self-defeat fix is covered on main; the inline approve_env.pop(...) block can drop in favor of gateway_approval_env(os.environ).
runConnectAutoPairApprovalPass was restructured on main — it now reads the external openclaw_device_approval_policy.py module and uses inline MAX_APPROVALS=8 / TIMEOUT_MS=12_000. That collides head-on with this PR's budget-leaf extraction and MAX_APPROVALS=1 / 15s. Worth a deliberate call on which budget wins (the 1/15s rationale here is reasonable, but it needs to sit on top of main's policy-module structure, not replace it).
The wiring is the part still genuinely missing on main. finalization.ts on main has zero references to the approval pass, so running it in onboard finalization and in recover is the unique, still-wanted contribution.

Suggested path: rebase on main, drop the now-redundant env-strip, reconcile the budget constants with main's structure, and keep the finalization/recover wiring (+ its tests). That should shrink the diff considerably. Happy to re-review once it's rebased.

…cover After a fresh onboard, the first `openclaw agent` run emitted "scope upgrade pending approval" and silently fell back to embedded mode, and the same pending requestId persisted across every subsequent run — the approval never auto-resolved. Root cause was a divergence in the host one-shot auto-pair approval pass: it ran `openclaw devices approve` with a 1s timeout and stripped only OPENCLAW_GATEWAY_URL from the child env. Leaving PORT and TOKEN set re-pinned the approve to the gateway, hitting the NVIDIA#4462 self-defeat where a gateway-pinned scope-upgrade approve requests the very scopes it is trying to grant and fails — so the request was never cleared. - Match the in-sandbox watcher: approve timeout 1s -> 10s, strip OPENCLAW_GATEWAY_URL + PORT + TOKEN. MAX_APPROVALS 8 -> 1 (single realistic pending upgrade) and outer cap 12s -> 15s so a slow 10s approve is never SIGKILLed mid-loop (2 + 10x1 = 12s < 15s). - Wire the now-correct pass into onboard finalization (after process recovery, before verification) and the `nemoclaw recover` / probe-only path (gateway-up branches only, not the failure exit), so `nemoclaw recover` deterministically clears a pending scope upgrade. The CLI scope-upgrade request is only created by the user's first `openclaw agent` run, so that first run can still fall back once; the deterministic remedy is `nemoclaw recover`. Closes NVIDIA#4504 Signed-off-by: Tony Luo <xialuo@nvidia.com> Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

TonyLuo-NV · 2026-06-08T02:33:45Z

@coderabbitai full review

coderabbitai · 2026-06-08T02:33:51Z

✅ Action performed

Full review finished.

TonyLuo-NV · 2026-06-08T02:37:50Z

@prekshivyas thanks — rebased onto main (0c25fc185), and the diff now follows exactly the path you mapped:

Dropped the redundant inline env-strip. The approve child now uses main's shared gateway_approval_env(os.environ) from scripts/lib/openclaw_device_approval_policy.py (connect.ts), so the OpenClaw CLI scope-upgrade approval deadlocks and forces openclaw agent into embedded fallback #4462 triplet strip is no longer duplicated here.
Budget reconciled on top of main's policy-module structure. Kept the MAX_APPROVALS=1 / 15s rationale (2 + 10×1 = 12s < 15s, ~3s startup slack so a slow 10s approve isn't SIGKILLed mid-loop) but extracted the four constants into the dep-free connect-autopair-budget.ts leaf and left main's policy-module wiring intact — did not replace it.
Kept the unique contribution: the approval pass now runs in onboard finalization and in recover/probe-only (gateway-up branches only), plus the tests, ported into the new test/sandbox-connect-inference/auto-pair-approval.test.ts split using the upstream helpers.

The earlier CodeRabbit onboard.ts growth comment is also resolved — the wiring moved into src/lib/onboard/finalization-deps.ts, so onboard.ts is now net-negative and codebase-growth-guardrails passes. All CI green. Ready for re-review.

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

src/lib/actions/sandbox/connect.ts (1)
767-798: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Stale comment: update "OPENCLAW_GATEWAY_URL only" to reflect the full triplet fix.

Line 775 still says "removing OPENCLAW_GATEWAY_URL only from the child env" — this describes the OLD buggy behavior that caused #4462. The PR summary states the fix strips all three env vars: OPENCLAW_GATEWAY_URL, OPENCLAW_GATEWAY_PORT, and OPENCLAW_GATEWAY_TOKEN. The test at line 386 of auto-pair-approval.test.ts validates all three are stripped ("unset:unset:unset").
📝 Proposed fix to update the comment
-// in-sandbox `openclaw devices list` invocation targets the running gateway
-// with its token. Approvals then use OpenClaw's local fallback by removing
-// OPENCLAW_GATEWAY_URL only from the child env, and apply the same allowlist
+// in-sandbox `openclaw devices list` invocation targets the running gateway  
+// with its token. Approvals then use OpenClaw's local fallback by removing  
+// OPENCLAW_GATEWAY_URL, OPENCLAW_GATEWAY_PORT, and OPENCLAW_GATEWAY_TOKEN  
+// from the child env, and apply the same allowlist
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/lib/actions/sandbox/connect.ts` around lines 767 - 798, Update the stale
inline comment that currently says "removing OPENCLAW_GATEWAY_URL only from the
child env" to reflect the real fix which strips the full triplet:
OPENCLAW_GATEWAY_URL, OPENCLAW_GATEWAY_PORT, and OPENCLAW_GATEWAY_TOKEN; locate
the comment block in src/lib/actions/sandbox/connect.ts around the one-shot /
workaround description (the big comment starting above the approval logic and
referenced by the auto-pair watcher behavior) and change that single sentence so
it accurately states that all three env vars are removed from the child
environment (matching the test in auto-pair-approval.test.ts that expects
"unset:unset:unset").
src/lib/onboard/finalization-deps.ts (1)
1-24: ⚠️ Potential issue | 🔴 Critical

Update: npm run typecheck:cli currently fails (tsc -p tsconfig.cli.json).
The run exits with code 2 and reports many TypeScript errors (e.g., missing ../dist/... modules and missing @oclif/core type declarations), so it’s not possible to tell whether src/lib/onboard/finalization-deps.ts—or any suggested return-type change—compiles cleanly until the underlying CLI typecheck setup/inputs are fixed.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/lib/onboard/finalization-deps.ts` around lines 1 - 24, The CLI typecheck
errors come from using typeof import("...") type annotations that force tsc to
resolve non-CLI modules (causing missing ../dist and `@oclif` types); update
finalizationHandlerDeps to avoid those module-resolving type annotations: remove
the typeof import(...) types and instead cast the require() results to any (or
annotate the functions as using any), e.g. in checkAndRecoverSandboxProcesses
and autoPairScopeApproval replace the typed declarations around
require("../actions/sandbox/process-recovery") and
require("../actions/sandbox/connect") with an any-typed const and call the
exported functions (checkAndRecoverSandboxProcesses,
runConnectAutoPairApprovalPass) through that any-typed value so tsc no longer
tries to resolve the referenced modules during the CLI typecheck.
Source: Coding guidelines

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/lib/onboard/finalization-deps.ts`:
- Around line 10-14: The wrapper checkAndRecoverSandboxProcesses in
src/lib/onboard/finalization-deps.ts currently declares a void return and
discards the result from require("../actions/sandbox/process-recovery"); change
the wrapper to preserve and return the upstream function's structured result ({
checked, wasRunning, recovered, forwardRecovered }) by updating the wrapper
signature to return that same type and returning the value from
processRecovery.checkAndRecoverSandboxProcesses(name, options) so callers get
the original recovery object and API contract remains consistent with
src/lib/actions/sandbox/process-recovery.ts and usages such as
src/lib/actions/sandbox/connect.ts.

---

Outside diff comments:
In `@src/lib/actions/sandbox/connect.ts`:
- Around line 767-798: Update the stale inline comment that currently says
"removing OPENCLAW_GATEWAY_URL only from the child env" to reflect the real fix
which strips the full triplet: OPENCLAW_GATEWAY_URL, OPENCLAW_GATEWAY_PORT, and
OPENCLAW_GATEWAY_TOKEN; locate the comment block in
src/lib/actions/sandbox/connect.ts around the one-shot / workaround description
(the big comment starting above the approval logic and referenced by the
auto-pair watcher behavior) and change that single sentence so it accurately
states that all three env vars are removed from the child environment (matching
the test in auto-pair-approval.test.ts that expects "unset:unset:unset").

In `@src/lib/onboard/finalization-deps.ts`:
- Around line 1-24: The CLI typecheck errors come from using typeof
import("...") type annotations that force tsc to resolve non-CLI modules
(causing missing ../dist and `@oclif` types); update finalizationHandlerDeps to
avoid those module-resolving type annotations: remove the typeof import(...)
types and instead cast the require() results to any (or annotate the functions
as using any), e.g. in checkAndRecoverSandboxProcesses and autoPairScopeApproval
replace the typed declarations around
require("../actions/sandbox/process-recovery") and
require("../actions/sandbox/connect") with an any-typed const and call the
exported functions (checkAndRecoverSandboxProcesses,
runConnectAutoPairApprovalPass) through that any-typed value so tsc no longer
tries to resolve the referenced modules during the CLI typecheck.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 8d163a73-5e2d-462e-bdfb-d0ba9e81c8cc

📥 Commits

Reviewing files that changed from the base of the PR and between 4307f43 and 0c25fc1.

📒 Files selected for processing (8)

src/lib/actions/sandbox/connect-autopair-budget.ts
src/lib/actions/sandbox/connect.ts
src/lib/onboard.ts
src/lib/onboard/finalization-deps.ts
src/lib/onboard/machine/handlers/finalization.test.ts
src/lib/onboard/machine/handlers/finalization.ts
test/sandbox-connect-inference/auto-pair-approval.test.ts
test/sandbox-connect-inference/helpers.ts

coderabbitai · 2026-06-08T02:40:26Z

+  checkAndRecoverSandboxProcesses(name: string, options: { quiet: boolean }): void {
+    const processRecovery: typeof import("../actions/sandbox/process-recovery") =
+      require("../actions/sandbox/process-recovery");
+    processRecovery.checkAndRecoverSandboxProcesses(name, options);
+  },


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Return type mismatch breaks the upstream contract.

The upstream checkAndRecoverSandboxProcesses function returns a structured object with { checked, wasRunning, recovered, forwardRecovered } fields (see src/lib/actions/sandbox/process-recovery.ts:421-545), but this wrapper changes the return type to void. This contract break prevents the finalization handler from:

Knowing whether the sandbox exists (checked: false)

Determining if recovery succeeded or failed

Inspecting whether the port forward was restored

Making decisions or logging based on recovery outcomes

In src/lib/actions/sandbox/connect.ts:203-241, the same function's return value is actively used to control probe behavior and user messaging. Even if the current finalization handler doesn't need this information, preserving the return type:

Maintains API consistency with the upstream function

Enables future code to inspect recovery outcomes without modifying the wrapper

Provides valuable debugging information for onboard failures

🔧 Proposed fix to preserve the upstream return type

- checkAndRecoverSandboxProcesses(name: string, options: { quiet: boolean }): void { + checkAndRecoverSandboxProcesses( + name: string, + options: { quiet: boolean }, + ): { checked: boolean; wasRunning: boolean | null; recovered: boolean; forwardRecovered: boolean } { const processRecovery: typeof import("../actions/sandbox/process-recovery") = require("../actions/sandbox/process-recovery"); - processRecovery.checkAndRecoverSandboxProcesses(name, options); + return processRecovery.checkAndRecoverSandboxProcesses(name, options); },

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

checkAndRecoverSandboxProcesses(name: string, options: { quiet: boolean }): void {

const processRecovery: typeof import("../actions/sandbox/process-recovery") =

require("../actions/sandbox/process-recovery");

processRecovery.checkAndRecoverSandboxProcesses(name, options);

},

checkAndRecoverSandboxProcesses(

name: string,

options: { quiet: boolean },

): { checked: boolean; wasRunning: boolean | null; recovered: boolean; forwardRecovered: boolean } {

const processRecovery: typeof import("../actions/sandbox/process-recovery") =

require("../actions/sandbox/process-recovery");

return processRecovery.checkAndRecoverSandboxProcesses(name, options);

},

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/lib/onboard/finalization-deps.ts` around lines 10 - 14, The wrapper checkAndRecoverSandboxProcesses in src/lib/onboard/finalization-deps.ts currently declares a void return and discards the result from require("../actions/sandbox/process-recovery"); change the wrapper to preserve and return the upstream function's structured result ({ checked, wasRunning, recovered, forwardRecovered }) by updating the wrapper signature to return that same type and returning the value from processRecovery.checkAndRecoverSandboxProcesses(name, options) so callers get the original recovery object and API contract remains consistent with src/lib/actions/sandbox/process-recovery.ts and usages such as src/lib/actions/sandbox/connect.ts.

The auto-pair approval-pass comment still described the old behavior (removing OPENCLAW_GATEWAY_URL only). Update it to reflect the shared gateway_approval_env() policy helper, which strips URL + PORT + TOKEN — the full triplet that avoids the NVIDIA#4462 self-defeat. Comment-only. Signed-off-by: Tony Luo <xialuo@nvidia.com> Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

TonyLuo-NV · 2026-06-08T02:45:44Z

@coderabbitai addressed your review (6ab4c05f9):

connect.ts stale comment ("OPENCLAW_GATEWAY_URL only") — ✅ Fixed. Updated to state the approve child uses the shared gateway_approval_env() helper, which strips the full URL + PORT + TOKEN triplet (the OpenClaw CLI scope-upgrade approval deadlocks and forces openclaw agent into embedded fallback #4462 fix), matching the unset:unset:unset test assertion.
finalization-deps.ts return type void — Skipped (not a real contract break). The finalization handler's deps interface declares checkAndRecoverSandboxProcesses(sandboxName, options): void and the handler discards the result; the original inline closure in onboard.ts was also void. The wrapper intentionally matches the consumer's contract, not the raw upstream signature.
npm run typecheck:cli Critical — Skipped (false positive). Those errors (missing ../dist/..., @oclif/core types) are the pre-existing baseline on a clean checkout before npm run build:cli; none reference the new files. typecheck:cli is clean for finalization-deps.ts and connect-autopair-budget.ts, and CI static-checks + cli-tests are green.

coderabbitai · 2026-06-08T02:46:10Z

Tip

For best results, initiate chat on the files or code changes.

@TonyLuo-NV Thanks for the detailed follow-up.

connect.ts stale comment — Acknowledged, the updated comment accurately reflecting gateway_approval_env() stripping the full URL + PORT + TOKEN triplet is the right call. ✅
finalization-deps.ts void return — Agreed; the wrapper correctly matches the deps interface contract declared by the finalization handler (void), and the handler discards the result anyway. No issue there.
npm run typecheck:cli Critical — Understood; pre-existing baseline errors on a clean checkout (missing dist/ artifacts, unresolved @oclif/core types) that don't touch the new files. With CI static-checks + cli-tests green, this is a false positive.

Everything looks good. 🐇

…topair Resolve connect.ts conflicts against upstream's new shared auto-pair approval module (auto-pair-approval.ts, NVIDIA#4263/NVIDIA#4616). The inline connect-time approval pass is replaced by the shared, base64-wrapped runSandboxAutoPairApprovalPass; the NVIDIA#4504 connect/probe/finalization budget (maxApprovals=1, list=2s, approve=10s, outer=15s) is re-applied as a per-surface override threaded through buildAutoPairApprovalScript, keeping the doctor recovery surface on its wider default budget (8/12s). Signed-off-by: Tony Luo <xialuo@nvidia.com> Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

prekshivyas

Approving. This is a clean reconciliation onto main's extracted auto-pair-approval.ts — verified at head 7c97388:

Reconciled, not reimplemented: the inline approval heredoc is gone; the module now takes an optional budget, and connect/onboard pass a tighter CONNECT_AUTO_PAIR_BUDGET (10s approve, 1 approval, 15s cap) while doctor keeps main's 8/12s default. Budget is optional (?? AUTO_PAIR_*), so main's other callers are unaffected.
The genuine fix lands: budget.approveTimeoutS threads to the module's timeout=${approveTimeoutS}, so a single slow scope-upgrade approve now gets 10s instead of 1s — the actual #4504 fix. Invariant holds: 2s list + 10s×1 = 12s < 15s cap.
Onboard sweep is wired correctly: best-effort, never throws, runs after process recovery (so it targets the freshly-recovered gateway, #3573) and before verification. Lazy-require in finalization-deps avoids the connect↔onboard import cycle and keeps onboard.ts lean.
Env-strip uses main's gateway_approval_env; the earlier redundancy is resolved.

Tests are substantial (auto-pair-approval +244, finalization +52) and CI is green. Nice job landing this on top of main's refactor rather than fighting it.

## Summary - Add v0.0.62 release notes from Discussion #5100 and link release highlights to the relevant docs pages. - Document the release's GPU sandbox recreation, sandbox-side local inference verification, and Hermes dashboard port guard in the command and inference references. - Refresh generated NemoClaw user skills for the release-prep docs set. ## Source Summary - #4956 -> `docs/reference/commands.mdx`: Document CDI-first Docker GPU recreation behavior for Linux Docker-driver sandboxes. - #5024 -> `docs/inference/use-local-inference.mdx`: Document sandbox-runtime verification of the `inference.local` local inference route. - #5018 -> `docs/reference/commands.mdx`: Document Jetson/Tegra device-node group propagation for sandbox CUDA initialization. - #5012, #4763, #4706, #5030, #5015 -> `docs/about/release-notes.mdx`: Summarize onboarding and recovery reliability fixes, including the reserved Hermes API port guard. - #5017 and #5043 -> `docs/about/release-notes.mdx`, `docs/reference/commands.mdx`: Summarize mutable OpenClaw config recovery and host-side `agents list` coverage. - #5010 and #5016 -> `docs/about/release-notes.mdx`: Summarize Hermes upstream metadata visibility and WhatsApp QR rendering reliability. - #5045 and prior source docs in the v0.0.62 range -> `.agents/skills/`: Refresh generated user-skill references from the current docs source. ## Skipped - #5019 -> skipped for new prose because it touched `openclaw-sandbox-permissive.yaml`, which matches `docs/.docs-skip`. Existing source docs remain the source for generated skill synchronization. ## Verification - `python3 scripts/docs-to-skills.py docs/ .agents/skills/ --prefix nemoclaw-user --doc-platform fern-mdx` - `npm run docs` (passes; Fern reports 0 errors and 1 hidden warning) - Pre-commit hooks passed during commit, including docs-to-skills verification, markdown lint, gitleaks, and skills YAML tests.  ## Summary by CodeRabbit * **New Features** * Added `nemoclaw <name> agents list` command. * v0.0.62 release notes added summarizing onboarding and recovery improvements. * **Bug Fixes** * Improved GPU sandbox onboarding reliability (NVIDIA CDI path, Jetson/Tegra device handling). * Better local inference verification and recovery for Linux Docker-driver GPU sandboxes. * Quieter/earlier handling of onboarding drift and port collisions. * **Documentation** * Expanded GPU passthrough, inference verification, writable paths (`/dev/pts`), port 8642 restriction, and command examples.  --------- Co-authored-by: Prekshi Vyas <34834085+prekshivyas@users.noreply.github.com>

#5323) ## Summary Reopened #4504. The earlier fix (#4763) corrected the approval-pass mechanics (timeout, full env-triplet strip, `recover`/finalization wiring) but the reporter verified on v0.0.63 that the **very first** `openclaw agent` run after a fresh onboard still silently falls back to embedded mode (exit 0). ## Root cause (verified against the reporter's gateway log) On a fresh onboard the device is auto-paired with **only** `operator.pairing`. The first `openclaw agent --agent main` run immediately requests `operator.write`, which is not pre-approved → gateway rejects with `1008 pairing required` → silent embedded fallback. The connect-time approval pass cannot help the first run: the upgrade request does not exist until that run happens, *after* finalization's approval sweep already found nothing pending. OpenClaw's `devices` CLI exposes only `list`/`get`/`approve`, so there is no way to pre-grant a scope the device has not yet requested. ## Fix — warm up the upgrade during finalization Run a bounded, best-effort throwaway in-sandbox `openclaw agent --agent main -m "ping"` during onboard finalization to provoke the exact `operator.write` scope-upgrade request, poll `devices list` until that allowlisted upgrade is pending, then let the existing approval pass clear it. `operator.write` is persisted before handoff, so the user's first real run connects via the gateway with no fallback. - New best-effort leaf `src/lib/actions/sandbox/auto-pair-warmup.ts` (`runSandboxScopeWarmupRun`): 30s outer `spawnSync` cap, ~10s bounded pending-upgrade poll, all errors swallowed (never throws / never blocks onboard). - New `warmupScopeUpgrade` finalization dep, called after process recovery and **before** the existing approval pass — order is load-bearing (provoke → approve). - Builds on the merged #4763 approval-pass / `recover` wiring; if the gateway never registers the upgrade within the budget it degrades cleanly to that behavior (one fallback, then `recover` fixes it) — no regression. ## Testing - `npx vitest --project cli` on affected files: 25 pass — finalization ordering (recover → warmup → approval), best-effort/non-blocking, agent-agnostic; broad regression 1659 pass on the v2 base. - `npm run typecheck:cli` clean for the changed files; `npm run build:cli` clean. - `src/lib/onboard.ts` unchanged (dep flows through the existing `finalizationHandlerDeps` spread) → `codebase-growth-guardrails` neutral. ## Note The long-term home for this is OpenClaw pre-approving the full operator scope set at pairing time; this warm-up is the NemoClaw-side mitigation until then. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Signed-off-by: Tony Luo <xialuo@nvidia.com>  ## Summary by CodeRabbit * **New Features** * Finalization adds a best-effort, idempotent scope-warmup step that runs after sandbox/process recovery and before scope approval to reduce first-run timing races; it is non-throwing, agent-agnostic, and does not block onboarding. * **Tests** * Added unit tests validating warm-up timing/timeout behavior, payload wrapping and execution, finalization ordering, best-effort semantics, and invocation across agent types.  --------- Signed-off-by: Tony Luo <xialuo@nvidia.com> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Co-authored-by: Carlos Villela <cvillela@nvidia.com>

coderabbitai Bot reviewed Jun 4, 2026

View reviewed changes

Comment thread src/lib/onboard.ts Outdated

TonyLuo-NV force-pushed the fix/4504-onboard-scope-upgrade-autopair branch from 7e5e79e to 6767c9b Compare June 4, 2026 09:30

TonyLuo-NV added the v0.0.60 Release target label Jun 4, 2026

wscurran added area: security Security controls, permissions, secrets, or hardening bug-fix PR fixes a bug or regression labels Jun 5, 2026

prekshivyas self-assigned this Jun 5, 2026

prekshivyas reviewed Jun 5, 2026

View reviewed changes

cv added v0.0.61 Release target and removed v0.0.60 Release target labels Jun 6, 2026

TonyLuo-NV force-pushed the fix/4504-onboard-scope-upgrade-autopair branch from 6767c9b to 0c25fc1 Compare June 8, 2026 02:27

TonyLuo-NV requested a review from prekshivyas June 8, 2026 02:38

coderabbitai Bot reviewed Jun 8, 2026

View reviewed changes

cv added v0.0.62 Release target and removed v0.0.61 Release target labels Jun 8, 2026

TonyLuo-NV and others added 3 commits June 9, 2026 10:32

Merge branch 'main' into fix/4504-onboard-scope-upgrade-autopair

8bbef2d

Merge branch 'main' into fix/4504-onboard-scope-upgrade-autopair

7c97388

prekshivyas approved these changes Jun 9, 2026

View reviewed changes

jyaunches self-requested a review June 9, 2026 19:15

jyaunches merged commit 57af84e into NVIDIA:main Jun 9, 2026
29 checks passed

TonyLuo-NV deleted the fix/4504-onboard-scope-upgrade-autopair branch June 10, 2026 01:51

miyoungc mentioned this pull request Jun 10, 2026

docs: refresh v0.0.62 release docs #5157

Merged

TonyLuo-NV mentioned this pull request Jun 12, 2026

fix(onboard): warm up scope upgrade so first agent run avoids fallback #5323

Merged

Conversation

TonyLuo-NV commented Jun 4, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Root cause

Fix

Known limitation

Testing

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

wscurran commented Jun 5, 2026

Uh oh!

prekshivyas left a comment

Choose a reason for hiding this comment

Uh oh!

TonyLuo-NV commented Jun 8, 2026

Uh oh!

coderabbitai Bot commented Jun 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

TonyLuo-NV commented Jun 8, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Jun 8, 2026

Choose a reason for hiding this comment

Uh oh!

TonyLuo-NV commented Jun 8, 2026

Uh oh!

coderabbitai Bot commented Jun 8, 2026

Uh oh!

prekshivyas left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

TonyLuo-NV commented Jun 4, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 4, 2026 •

edited

Loading

coderabbitai Bot commented Jun 8, 2026 •

edited

Loading