Skip to content

fix(onboard): pre-approve gateway scope upgrades after onboard and recover#4763

Merged
jyaunches merged 5 commits into
NVIDIA:mainfrom
TonyLuo-NV:fix/4504-onboard-scope-upgrade-autopair
Jun 9, 2026
Merged

fix(onboard): pre-approve gateway scope upgrades after onboard and recover#4763
jyaunches merged 5 commits into
NVIDIA:mainfrom
TonyLuo-NV:fix/4504-onboard-scope-upgrade-autopair

Conversation

@TonyLuo-NV

@TonyLuo-NV TonyLuo-NV commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

Summary

Fixes #4504. After a fresh nemoclaw onboard, the first openclaw agent run printed gateway connect failed: scope upgrade pending approval and silently fell back to embedded mode (exit 0), and the same pending requestId survived across every subsequent run — the approval was never auto-resolved (regression of NVB#6041401).

Root cause

Two defects in the connect-time auto-pair approval pass (runConnectAutoPairApprovalPass):

  • Approval was silently failing. The host openclaw devices approve ran with timeout=1s and stripped only OPENCLAW_GATEWAY_URL from the child env. Leaving OPENCLAW_GATEWAY_PORT + OPENCLAW_GATEWAY_TOKEN set re-pinned the approve to the gateway, hitting the OpenClaw CLI scope-upgrade approval deadlocks and forces openclaw agent into embedded fallback #4462 self-defeat where a gateway-pinned scope-upgrade approve requests the very scopes it is trying to grant and fails. So the request was never cleared and the same requestId persisted.
  • The pass never ran post-onboard. It was wired only into the interactive connect path, not onboard finalization or nemoclaw recover.

Fix

  • Match the in-sandbox watcher: approve timeout 1s → 10s; strip OPENCLAW_GATEWAY_URL + PORT + TOKEN. MAX_APPROVALS 8 → 1 (single realistic pending upgrade); outer cap 12s → 15s so a slow 10s approve is never SIGKILLed mid-loop (2 + 10×1 = 12s < 15s, ~3s startup slack).
  • Run the corrected pass in onboard finalization (after process recovery, before verification) and the nemoclaw recover / probe-only path (gateway-up branches only, not the failure exit).
  • Budget constants extracted to a dependency-free connect-autopair-budget.ts leaf so the invariant is asserted on real values, not source text.

Known limitation

The CLI scope-upgrade request is only created by the user's first openclaw agent run (after onboard returns), so that first run can still fall back once — onboard has nothing to pre-approve yet. The deterministic remedy is nemoclaw recover (now actually clears it) or one ~30s in-sandbox watcher cycle. A docs/changelog note pointing users at nemoclaw recover on first-run fallback is recommended as follow-up.

Testing

  • 34 tests across test/sandbox-connect-inference.test.ts (5 new: probe-path positive/negative/best-effort, env-triplet strip, budget invariant) and src/lib/onboard/machine/handlers/finalization.test.ts (ordering, agent-agnostic, best-effort) — all pass.
  • npm run typecheck:cli clean.

🤖 Generated with Claude Code

Signed-off-by: Tony Luo xialuo@nvidia.com

Summary by CodeRabbit

  • New Features

    • Added configurable auto-pair budgeting (approval count and timeouts) for sandbox connect flows.
    • Introduced a dedicated connect-path auto-pair approval helper to enforce the tighter connect budget.
  • Improvements

    • Connect probe and onboarding finalization now run a best-effort auto-pair approval sweep to clear pending pairing requests.
    • Centralized finalization helpers with lazy-loading to reduce import surface.
  • Tests

    • New and updated tests covering approval limits, probe-only behaviors, and budget/timeouts.

@coderabbitai

coderabbitai Bot commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds connect-time auto-pair budgeting constants, parameterizes and exposes a budgeted sandbox approval pass, invokes it best-effort from connect probe paths and onboarding finalization (via lazy deps), and adds tests plus a sandbox exec helper for probe/gateway-down scenarios.

Changes

Auto-pair scope approval integration

Layer / File(s) Summary
Auto-pair approval budget constants
src/lib/actions/sandbox/connect-autopair-budget.ts
New leaf module exporting CONNECT_AUTO_PAIR_MAX_APPROVALS, CONNECT_AUTO_PAIR_LIST_TIMEOUT_S, CONNECT_AUTO_PAIR_APPROVE_TIMEOUT_S, and CONNECT_AUTO_PAIR_TIMEOUT_MS used by the in-sandbox approval script and outer spawn cap.
Auto-pair approval script budgeting and API
src/lib/actions/sandbox/auto-pair-approval.ts
Adds AUTO_PAIR_APPROVAL_TIMEOUT_MS and AutoPairApprovalBudget; buildAutoPairApprovalScript and runSandboxAutoPairApprovalPass accept budget to parameterize max approvals, list/approve timeouts, and the outer spawn timeout.
Sandbox connect: budgeted approval pass and probe hooks
src/lib/actions/sandbox/connect.ts
Imports connect-budget constants, derives CONNECT_AUTO_PAIR_BUDGET, exports runConnectAutoPairApprovalPass(sandboxName) that calls the sandbox approval with the connect budget, and best-effort invokes it in both wasRunning and recovered probe paths; replaces direct connect-phase approval call with the wrapper.
Lazy finalization deps module
src/lib/onboard/finalization-deps.ts
Introduces finalizationHandlerDeps with lazy-required helpers that delegate to process-recovery and connect action functions to avoid static import cycles.
Finalization deps injection
src/lib/onboard.ts
Imports finalizationHandlerDeps and spreads it into the deps object passed to handleFinalizationState, replacing prior inline wiring.
Finalization handler auto-pair approval integration
src/lib/onboard/machine/handlers/finalization.ts
Adds autoPairScopeApproval(sandboxName) to the finalization deps contract and invokes it immediately after checkAndRecoverSandboxProcesses(...) as a best-effort sweep before verification/handoff.
Finalization handler test coverage
src/lib/onboard/machine/handlers/finalization.test.ts
Mocks and forwards autoPairScopeApproval in test deps; adds tests validating invocation ordering (after process recovery, before verify), agent-type independence, and best-effort non-throwing behavior.
Connect approval tests and helper
test/sandbox-connect-inference/auto-pair-approval.test.ts, test/sandbox-connect-inference/helpers.ts
Updates tests to import budget constants, add probe-only integration scenarios (#4504), assert MAX_APPROVALS and timeout invariants, and add a gateway-down stub branch to model probe failure without running approval.

Sequence Diagram

sequenceDiagram
  participant Onboard as handleFinalizationState
  participant Recovery as checkAndRecoverSandboxProcesses
  participant Approval as autoPairScopeApproval / runConnectAutoPairApprovalPass
  participant Verify as deploymentVerification
  Onboard->>Recovery: recover sandbox processes
  Recovery-->>Onboard: recovery complete
  Onboard->>Approval: run approval sweep (best-effort)
  Approval-->>Onboard: returns (non-throwing)
  Onboard->>Verify: verify deployment
  Verify-->>Onboard: verification complete
  Onboard->>Onboard: record session & print dashboard
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • NVIDIA/NemoClaw#4786: Modifies the same src/lib/actions/sandbox/connect.ts approval pass script for environment sanitization and approval filtering.
  • NVIDIA/NemoClaw#4788: Updates embedded Python approval logic in the connect approval pass and is directly related to these changes.
  • NVIDIA/NemoClaw#4936: Also touches the in-sandbox auto-pair approval flow and recovery usage of the connect approval pass.

Suggested labels

area: sandbox, Integration: OpenClaw, v0.0.61

Suggested reviewers

  • prekshivyas
  • cv
  • ericksoa

Poem

🐰 I hopped through code at break of day,

Counting approvals in a tidy array.
One timeout set, one approval true,
Probe and recover — best-effort through.
The rabbit smiles: tests all green and gay.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 20.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title 'fix(onboard): pre-approve gateway scope upgrades after onboard and recover' clearly and specifically describes the main objective of adding pre-approval of gateway scope upgrades during onboard finalization and recover flows.
Linked Issues check ✅ Passed All linked issue #4504 requirements are met: scope upgrades are pre-approved during onboard finalization, the approval pass runs in recover/probe paths, timeouts are corrected, and MAX_APPROVALS is reduced to 1 with proper budget handling.
Out of Scope Changes check ✅ Passed All changes are directly scoped to fixing #4504: adding auto-pair approval budgeting infrastructure, wiring the approval pass into onboard finalization and connect probe paths, updating tests, and supporting the gateway scope-upgrade pre-approval workflow.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/lib/onboard/machine/handlers/finalization.ts (1)

105-108: Run the onboarding/sandbox E2E slice for this integration point.

Since this hook changes finalization handoff behavior, run the recommended selective nightly E2E jobs before merge to validate cross-layer behavior under real gateway/sandbox lifecycle conditions.

As per coding guidelines: src/lib/onboard.ts changes affect full sandbox creation/configuration flow and include explicit E2E test recommendations.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/lib/onboard/machine/handlers/finalization.ts` around lines 105 - 108,
This change to finalization (deps.autoPairScopeApproval in
src/lib/onboard/machine/handlers/finalization.ts) alters the sandbox/gateway
handoff path and must be validated by running the selective nightly E2E slice
for onboarding/sandbox lifecycle; before merging, execute the recommended E2E
jobs that cover the sandbox creation/configuration flow (as referenced by
src/lib/onboard.ts), verify there are no regressions in gateway recovery,
pairing handoff, or stuck pairing requests, and attach the E2E run IDs and any
failing test traces to the PR for review.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/lib/onboard.ts`:
- Around line 7032-7036: The autoPairScopeApproval callback wiring currently
defined as autoPairScopeApproval: (name) => { const connect: typeof
import("./actions/sandbox/connect") = require("./actions/sandbox/connect");
connect.runConnectAutoPairApprovalPass(name); } should be moved out of
src/lib/onboard.ts into a new module under src/lib/onboard/ (for example
src/lib/onboard/autoPair.ts) or into the file that implements the finalization
handler; export a function (e.g., initAutoPairScopeApproval) that registers the
callback there and import/invoke that initializer from the finalization handler
so onboard.ts remains growth-neutral while still calling
connect.runConnectAutoPairApprovalPass(name) via the extracted initializer.

---

Nitpick comments:
In `@src/lib/onboard/machine/handlers/finalization.ts`:
- Around line 105-108: This change to finalization (deps.autoPairScopeApproval
in src/lib/onboard/machine/handlers/finalization.ts) alters the sandbox/gateway
handoff path and must be validated by running the selective nightly E2E slice
for onboarding/sandbox lifecycle; before merging, execute the recommended E2E
jobs that cover the sandbox creation/configuration flow (as referenced by
src/lib/onboard.ts), verify there are no regressions in gateway recovery,
pairing handoff, or stuck pairing requests, and attach the E2E run IDs and any
failing test traces to the PR for review.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: a2e6af1d-6a39-45c7-9c8b-c3ec6723a137

📥 Commits

Reviewing files that changed from the base of the PR and between 17734b1 and 7e5e79e.

📒 Files selected for processing (6)
  • src/lib/actions/sandbox/connect-autopair-budget.ts
  • src/lib/actions/sandbox/connect.ts
  • src/lib/onboard.ts
  • src/lib/onboard/machine/handlers/finalization.test.ts
  • src/lib/onboard/machine/handlers/finalization.ts
  • test/sandbox-connect-inference.test.ts

Comment thread src/lib/onboard.ts Outdated
@TonyLuo-NV TonyLuo-NV force-pushed the fix/4504-onboard-scope-upgrade-autopair branch from 7e5e79e to 6767c9b Compare June 4, 2026 09:30
@TonyLuo-NV TonyLuo-NV added the v0.0.60 Release target label Jun 4, 2026
@wscurran wscurran added area: security Security controls, permissions, secrets, or hardening bug-fix PR fixes a bug or regression labels Jun 5, 2026
@prekshivyas prekshivyas self-assigned this Jun 5, 2026

@prekshivyas prekshivyas left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed the logic and it's sound — the env-triplet strip and the budget invariant (2s + 10s×1 = 12s < 15s, extracted to the dep-free connect-autopair-budget leaf) both check out, and the first-run limitation is honestly documented. The blocker is that this now conflicts semantically with main, which independently reworked the same runConnectAutoPairApprovalPass path. Mapping the overlap so the rebase is targeted:

  • Env-strip is already on main (now redundant here). main's scripts/lib/openclaw_device_approval_policy.py defines gateway_approval_env(), which pops GATEWAY_APPROVAL_ENV_KEYS = ("OPENCLAW_GATEWAY_URL", "OPENCLAW_GATEWAY_PORT", "OPENCLAW_GATEWAY_TOKEN") — the exact triplet this PR strips inline. So the #4462 self-defeat fix is covered on main; the inline approve_env.pop(...) block can drop in favor of gateway_approval_env(os.environ).
  • runConnectAutoPairApprovalPass was restructured on main — it now reads the external openclaw_device_approval_policy.py module and uses inline MAX_APPROVALS=8 / TIMEOUT_MS=12_000. That collides head-on with this PR's budget-leaf extraction and MAX_APPROVALS=1 / 15s. Worth a deliberate call on which budget wins (the 1/15s rationale here is reasonable, but it needs to sit on top of main's policy-module structure, not replace it).
  • The wiring is the part still genuinely missing on main. finalization.ts on main has zero references to the approval pass, so running it in onboard finalization and in recover is the unique, still-wanted contribution.

Suggested path: rebase on main, drop the now-redundant env-strip, reconcile the budget constants with main's structure, and keep the finalization/recover wiring (+ its tests). That should shrink the diff considerably. Happy to re-review once it's rebased.

@cv cv added v0.0.61 Release target and removed v0.0.60 Release target labels Jun 6, 2026
…cover

After a fresh onboard, the first `openclaw agent` run emitted "scope upgrade
pending approval" and silently fell back to embedded mode, and the same
pending requestId persisted across every subsequent run — the approval never
auto-resolved.

Root cause was a divergence in the host one-shot auto-pair approval pass: it
ran `openclaw devices approve` with a 1s timeout and stripped only
OPENCLAW_GATEWAY_URL from the child env. Leaving PORT and TOKEN set re-pinned
the approve to the gateway, hitting the NVIDIA#4462 self-defeat where a
gateway-pinned scope-upgrade approve requests the very scopes it is trying to
grant and fails — so the request was never cleared.

- Match the in-sandbox watcher: approve timeout 1s -> 10s, strip
  OPENCLAW_GATEWAY_URL + PORT + TOKEN. MAX_APPROVALS 8 -> 1 (single realistic
  pending upgrade) and outer cap 12s -> 15s so a slow 10s approve is never
  SIGKILLed mid-loop (2 + 10x1 = 12s < 15s).
- Wire the now-correct pass into onboard finalization (after process recovery,
  before verification) and the `nemoclaw recover` / probe-only path (gateway-up
  branches only, not the failure exit), so `nemoclaw recover` deterministically
  clears a pending scope upgrade.

The CLI scope-upgrade request is only created by the user's first
`openclaw agent` run, so that first run can still fall back once; the
deterministic remedy is `nemoclaw recover`.

Closes NVIDIA#4504

Signed-off-by: Tony Luo <xialuo@nvidia.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@TonyLuo-NV TonyLuo-NV force-pushed the fix/4504-onboard-scope-upgrade-autopair branch from 6767c9b to 0c25fc1 Compare June 8, 2026 02:27
@TonyLuo-NV

Copy link
Copy Markdown
Contributor Author

@coderabbitai full review

@coderabbitai

coderabbitai Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor
✅ Action performed

Full review finished.

@TonyLuo-NV

Copy link
Copy Markdown
Contributor Author

@prekshivyas thanks — rebased onto main (0c25fc185), and the diff now follows exactly the path you mapped:

  • Dropped the redundant inline env-strip. The approve child now uses main's shared gateway_approval_env(os.environ) from scripts/lib/openclaw_device_approval_policy.py (connect.ts), so the OpenClaw CLI scope-upgrade approval deadlocks and forces openclaw agent into embedded fallback #4462 triplet strip is no longer duplicated here.
  • Budget reconciled on top of main's policy-module structure. Kept the MAX_APPROVALS=1 / 15s rationale (2 + 10×1 = 12s < 15s, ~3s startup slack so a slow 10s approve isn't SIGKILLed mid-loop) but extracted the four constants into the dep-free connect-autopair-budget.ts leaf and left main's policy-module wiring intact — did not replace it.
  • Kept the unique contribution: the approval pass now runs in onboard finalization and in recover/probe-only (gateway-up branches only), plus the tests, ported into the new test/sandbox-connect-inference/auto-pair-approval.test.ts split using the upstream helpers.

The earlier CodeRabbit onboard.ts growth comment is also resolved — the wiring moved into src/lib/onboard/finalization-deps.ts, so onboard.ts is now net-negative and codebase-growth-guardrails passes. All CI green. Ready for re-review.

@TonyLuo-NV TonyLuo-NV requested a review from prekshivyas June 8, 2026 02:38

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/lib/actions/sandbox/connect.ts (1)

767-798: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Stale comment: update "OPENCLAW_GATEWAY_URL only" to reflect the full triplet fix.

Line 775 still says "removing OPENCLAW_GATEWAY_URL only from the child env" — this describes the OLD buggy behavior that caused #4462. The PR summary states the fix strips all three env vars: OPENCLAW_GATEWAY_URL, OPENCLAW_GATEWAY_PORT, and OPENCLAW_GATEWAY_TOKEN. The test at line 386 of auto-pair-approval.test.ts validates all three are stripped ("unset:unset:unset").

📝 Proposed fix to update the comment
-// in-sandbox `openclaw devices list` invocation targets the running gateway
-// with its token. Approvals then use OpenClaw's local fallback by removing
-// OPENCLAW_GATEWAY_URL only from the child env, and apply the same allowlist
+// in-sandbox `openclaw devices list` invocation targets the running gateway  
+// with its token. Approvals then use OpenClaw's local fallback by removing  
+// OPENCLAW_GATEWAY_URL, OPENCLAW_GATEWAY_PORT, and OPENCLAW_GATEWAY_TOKEN  
+// from the child env, and apply the same allowlist
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/lib/actions/sandbox/connect.ts` around lines 767 - 798, Update the stale
inline comment that currently says "removing OPENCLAW_GATEWAY_URL only from the
child env" to reflect the real fix which strips the full triplet:
OPENCLAW_GATEWAY_URL, OPENCLAW_GATEWAY_PORT, and OPENCLAW_GATEWAY_TOKEN; locate
the comment block in src/lib/actions/sandbox/connect.ts around the one-shot /
workaround description (the big comment starting above the approval logic and
referenced by the auto-pair watcher behavior) and change that single sentence so
it accurately states that all three env vars are removed from the child
environment (matching the test in auto-pair-approval.test.ts that expects
"unset:unset:unset").
src/lib/onboard/finalization-deps.ts (1)

1-24: ⚠️ Potential issue | 🔴 Critical

Update: npm run typecheck:cli currently fails (tsc -p tsconfig.cli.json).
The run exits with code 2 and reports many TypeScript errors (e.g., missing ../dist/... modules and missing @oclif/core type declarations), so it’s not possible to tell whether src/lib/onboard/finalization-deps.ts—or any suggested return-type change—compiles cleanly until the underlying CLI typecheck setup/inputs are fixed.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/lib/onboard/finalization-deps.ts` around lines 1 - 24, The CLI typecheck
errors come from using typeof import("...") type annotations that force tsc to
resolve non-CLI modules (causing missing ../dist and `@oclif` types); update
finalizationHandlerDeps to avoid those module-resolving type annotations: remove
the typeof import(...) types and instead cast the require() results to any (or
annotate the functions as using any), e.g. in checkAndRecoverSandboxProcesses
and autoPairScopeApproval replace the typed declarations around
require("../actions/sandbox/process-recovery") and
require("../actions/sandbox/connect") with an any-typed const and call the
exported functions (checkAndRecoverSandboxProcesses,
runConnectAutoPairApprovalPass) through that any-typed value so tsc no longer
tries to resolve the referenced modules during the CLI typecheck.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/lib/onboard/finalization-deps.ts`:
- Around line 10-14: The wrapper checkAndRecoverSandboxProcesses in
src/lib/onboard/finalization-deps.ts currently declares a void return and
discards the result from require("../actions/sandbox/process-recovery"); change
the wrapper to preserve and return the upstream function's structured result ({
checked, wasRunning, recovered, forwardRecovered }) by updating the wrapper
signature to return that same type and returning the value from
processRecovery.checkAndRecoverSandboxProcesses(name, options) so callers get
the original recovery object and API contract remains consistent with
src/lib/actions/sandbox/process-recovery.ts and usages such as
src/lib/actions/sandbox/connect.ts.

---

Outside diff comments:
In `@src/lib/actions/sandbox/connect.ts`:
- Around line 767-798: Update the stale inline comment that currently says
"removing OPENCLAW_GATEWAY_URL only from the child env" to reflect the real fix
which strips the full triplet: OPENCLAW_GATEWAY_URL, OPENCLAW_GATEWAY_PORT, and
OPENCLAW_GATEWAY_TOKEN; locate the comment block in
src/lib/actions/sandbox/connect.ts around the one-shot / workaround description
(the big comment starting above the approval logic and referenced by the
auto-pair watcher behavior) and change that single sentence so it accurately
states that all three env vars are removed from the child environment (matching
the test in auto-pair-approval.test.ts that expects "unset:unset:unset").

In `@src/lib/onboard/finalization-deps.ts`:
- Around line 1-24: The CLI typecheck errors come from using typeof
import("...") type annotations that force tsc to resolve non-CLI modules
(causing missing ../dist and `@oclif` types); update finalizationHandlerDeps to
avoid those module-resolving type annotations: remove the typeof import(...)
types and instead cast the require() results to any (or annotate the functions
as using any), e.g. in checkAndRecoverSandboxProcesses and autoPairScopeApproval
replace the typed declarations around
require("../actions/sandbox/process-recovery") and
require("../actions/sandbox/connect") with an any-typed const and call the
exported functions (checkAndRecoverSandboxProcesses,
runConnectAutoPairApprovalPass) through that any-typed value so tsc no longer
tries to resolve the referenced modules during the CLI typecheck.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 8d163a73-5e2d-462e-bdfb-d0ba9e81c8cc

📥 Commits

Reviewing files that changed from the base of the PR and between 4307f43 and 0c25fc1.

📒 Files selected for processing (8)
  • src/lib/actions/sandbox/connect-autopair-budget.ts
  • src/lib/actions/sandbox/connect.ts
  • src/lib/onboard.ts
  • src/lib/onboard/finalization-deps.ts
  • src/lib/onboard/machine/handlers/finalization.test.ts
  • src/lib/onboard/machine/handlers/finalization.ts
  • test/sandbox-connect-inference/auto-pair-approval.test.ts
  • test/sandbox-connect-inference/helpers.ts

Comment on lines +10 to +14
checkAndRecoverSandboxProcesses(name: string, options: { quiet: boolean }): void {
const processRecovery: typeof import("../actions/sandbox/process-recovery") =
require("../actions/sandbox/process-recovery");
processRecovery.checkAndRecoverSandboxProcesses(name, options);
},

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Return type mismatch breaks the upstream contract.

The upstream checkAndRecoverSandboxProcesses function returns a structured object with { checked, wasRunning, recovered, forwardRecovered } fields (see src/lib/actions/sandbox/process-recovery.ts:421-545), but this wrapper changes the return type to void. This contract break prevents the finalization handler from:

  • Knowing whether the sandbox exists (checked: false)
  • Determining if recovery succeeded or failed
  • Inspecting whether the port forward was restored
  • Making decisions or logging based on recovery outcomes

In src/lib/actions/sandbox/connect.ts:203-241, the same function's return value is actively used to control probe behavior and user messaging. Even if the current finalization handler doesn't need this information, preserving the return type:

  • Maintains API consistency with the upstream function
  • Enables future code to inspect recovery outcomes without modifying the wrapper
  • Provides valuable debugging information for onboard failures
🔧 Proposed fix to preserve the upstream return type
-  checkAndRecoverSandboxProcesses(name: string, options: { quiet: boolean }): void {
+  checkAndRecoverSandboxProcesses(
+    name: string,
+    options: { quiet: boolean },
+  ): { checked: boolean; wasRunning: boolean | null; recovered: boolean; forwardRecovered: boolean } {
     const processRecovery: typeof import("../actions/sandbox/process-recovery") =
       require("../actions/sandbox/process-recovery");
-    processRecovery.checkAndRecoverSandboxProcesses(name, options);
+    return processRecovery.checkAndRecoverSandboxProcesses(name, options);
   },
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
checkAndRecoverSandboxProcesses(name: string, options: { quiet: boolean }): void {
const processRecovery: typeof import("../actions/sandbox/process-recovery") =
require("../actions/sandbox/process-recovery");
processRecovery.checkAndRecoverSandboxProcesses(name, options);
},
checkAndRecoverSandboxProcesses(
name: string,
options: { quiet: boolean },
): { checked: boolean; wasRunning: boolean | null; recovered: boolean; forwardRecovered: boolean } {
const processRecovery: typeof import("../actions/sandbox/process-recovery") =
require("../actions/sandbox/process-recovery");
return processRecovery.checkAndRecoverSandboxProcesses(name, options);
},
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/lib/onboard/finalization-deps.ts` around lines 10 - 14, The wrapper
checkAndRecoverSandboxProcesses in src/lib/onboard/finalization-deps.ts
currently declares a void return and discards the result from
require("../actions/sandbox/process-recovery"); change the wrapper to preserve
and return the upstream function's structured result ({ checked, wasRunning,
recovered, forwardRecovered }) by updating the wrapper signature to return that
same type and returning the value from
processRecovery.checkAndRecoverSandboxProcesses(name, options) so callers get
the original recovery object and API contract remains consistent with
src/lib/actions/sandbox/process-recovery.ts and usages such as
src/lib/actions/sandbox/connect.ts.

The auto-pair approval-pass comment still described the old behavior
(removing OPENCLAW_GATEWAY_URL only). Update it to reflect the shared
gateway_approval_env() policy helper, which strips URL + PORT + TOKEN —
the full triplet that avoids the NVIDIA#4462 self-defeat. Comment-only.

Signed-off-by: Tony Luo <xialuo@nvidia.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@TonyLuo-NV

Copy link
Copy Markdown
Contributor Author

@coderabbitai addressed your review (6ab4c05f9):

  1. connect.ts stale comment ("OPENCLAW_GATEWAY_URL only") — ✅ Fixed. Updated to state the approve child uses the shared gateway_approval_env() helper, which strips the full URL + PORT + TOKEN triplet (the OpenClaw CLI scope-upgrade approval deadlocks and forces openclaw agent into embedded fallback #4462 fix), matching the unset:unset:unset test assertion.

  2. finalization-deps.ts return type void — Skipped (not a real contract break). The finalization handler's deps interface declares checkAndRecoverSandboxProcesses(sandboxName, options): void and the handler discards the result; the original inline closure in onboard.ts was also void. The wrapper intentionally matches the consumer's contract, not the raw upstream signature.

  3. npm run typecheck:cli Critical — Skipped (false positive). Those errors (missing ../dist/..., @oclif/core types) are the pre-existing baseline on a clean checkout before npm run build:cli; none reference the new files. typecheck:cli is clean for finalization-deps.ts and connect-autopair-budget.ts, and CI static-checks + cli-tests are green.

@coderabbitai

coderabbitai Bot commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

Tip

For best results, initiate chat on the files or code changes.

@TonyLuo-NV Thanks for the detailed follow-up.

  1. connect.ts stale comment — Acknowledged, the updated comment accurately reflecting gateway_approval_env() stripping the full URL + PORT + TOKEN triplet is the right call. ✅

  2. finalization-deps.ts void return — Agreed; the wrapper correctly matches the deps interface contract declared by the finalization handler (void), and the handler discards the result anyway. No issue there.

  3. npm run typecheck:cli Critical — Understood; pre-existing baseline errors on a clean checkout (missing dist/ artifacts, unresolved @oclif/core types) that don't touch the new files. With CI static-checks + cli-tests green, this is a false positive.

Everything looks good. 🐇

@cv cv added v0.0.62 Release target and removed v0.0.61 Release target labels Jun 8, 2026
TonyLuo-NV and others added 3 commits June 9, 2026 10:32
…topair

Resolve connect.ts conflicts against upstream's new shared auto-pair
approval module (auto-pair-approval.ts, NVIDIA#4263/NVIDIA#4616). The inline
connect-time approval pass is replaced by the shared, base64-wrapped
runSandboxAutoPairApprovalPass; the NVIDIA#4504 connect/probe/finalization
budget (maxApprovals=1, list=2s, approve=10s, outer=15s) is re-applied
as a per-surface override threaded through buildAutoPairApprovalScript,
keeping the doctor recovery surface on its wider default budget (8/12s).

Signed-off-by: Tony Luo <xialuo@nvidia.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@prekshivyas prekshivyas left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving. This is a clean reconciliation onto main's extracted auto-pair-approval.ts — verified at head 7c97388:

  • Reconciled, not reimplemented: the inline approval heredoc is gone; the module now takes an optional budget, and connect/onboard pass a tighter CONNECT_AUTO_PAIR_BUDGET (10s approve, 1 approval, 15s cap) while doctor keeps main's 8/12s default. Budget is optional (?? AUTO_PAIR_*), so main's other callers are unaffected.
  • The genuine fix lands: budget.approveTimeoutS threads to the module's timeout=${approveTimeoutS}, so a single slow scope-upgrade approve now gets 10s instead of 1s — the actual #4504 fix. Invariant holds: 2s list + 10s×1 = 12s < 15s cap.
  • Onboard sweep is wired correctly: best-effort, never throws, runs after process recovery (so it targets the freshly-recovered gateway, #3573) and before verification. Lazy-require in finalization-deps avoids the connect↔onboard import cycle and keeps onboard.ts lean.
  • Env-strip uses main's gateway_approval_env; the earlier redundancy is resolved.

Tests are substantial (auto-pair-approval +244, finalization +52) and CI is green. Nice job landing this on top of main's refactor rather than fighting it.

@jyaunches jyaunches self-requested a review June 9, 2026 19:15
@jyaunches jyaunches merged commit 57af84e into NVIDIA:main Jun 9, 2026
29 checks passed
@TonyLuo-NV TonyLuo-NV deleted the fix/4504-onboard-scope-upgrade-autopair branch June 10, 2026 01:51
jyaunches pushed a commit that referenced this pull request Jun 10, 2026
## Summary
- Add v0.0.62 release notes from Discussion #5100 and link release
highlights to the relevant docs pages.
- Document the release's GPU sandbox recreation, sandbox-side local
inference verification, and Hermes dashboard port guard in the command
and inference references.
- Refresh generated NemoClaw user skills for the release-prep docs set.

## Source Summary
- #4956 -> `docs/reference/commands.mdx`: Document CDI-first Docker GPU
recreation behavior for Linux Docker-driver sandboxes.
- #5024 -> `docs/inference/use-local-inference.mdx`: Document
sandbox-runtime verification of the `inference.local` local inference
route.
- #5018 -> `docs/reference/commands.mdx`: Document Jetson/Tegra
device-node group propagation for sandbox CUDA initialization.
- #5012, #4763, #4706, #5030, #5015 -> `docs/about/release-notes.mdx`:
Summarize onboarding and recovery reliability fixes, including the
reserved Hermes API port guard.
- #5017 and #5043 -> `docs/about/release-notes.mdx`,
`docs/reference/commands.mdx`: Summarize mutable OpenClaw config
recovery and host-side `agents list` coverage.
- #5010 and #5016 -> `docs/about/release-notes.mdx`: Summarize Hermes
upstream metadata visibility and WhatsApp QR rendering reliability.
- #5045 and prior source docs in the v0.0.62 range -> `.agents/skills/`:
Refresh generated user-skill references from the current docs source.

## Skipped
- #5019 -> skipped for new prose because it touched
`openclaw-sandbox-permissive.yaml`, which matches `docs/.docs-skip`.
Existing source docs remain the source for generated skill
synchronization.

## Verification
- `python3 scripts/docs-to-skills.py docs/ .agents/skills/ --prefix
nemoclaw-user --doc-platform fern-mdx`
- `npm run docs` (passes; Fern reports 0 errors and 1 hidden warning)
- Pre-commit hooks passed during commit, including docs-to-skills
verification, markdown lint, gitleaks, and skills YAML tests.

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
  * Added `nemoclaw <name> agents list` command.
* v0.0.62 release notes added summarizing onboarding and recovery
improvements.

* **Bug Fixes**
* Improved GPU sandbox onboarding reliability (NVIDIA CDI path,
Jetson/Tegra device handling).
* Better local inference verification and recovery for Linux
Docker-driver GPU sandboxes.
  * Quieter/earlier handling of onboarding drift and port collisions.

* **Documentation**
* Expanded GPU passthrough, inference verification, writable paths
(`/dev/pts`), port 8642 restriction, and command examples.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Prekshi Vyas <34834085+prekshivyas@users.noreply.github.com>
cv added a commit that referenced this pull request Jun 13, 2026
#5323)

## Summary

Reopened #4504. The earlier fix (#4763) corrected the approval-pass
mechanics (timeout, full env-triplet strip, `recover`/finalization
wiring) but the reporter verified on v0.0.63 that the **very first**
`openclaw agent` run after a fresh onboard still silently falls back to
embedded mode (exit 0).

## Root cause (verified against the reporter's gateway log)

On a fresh onboard the device is auto-paired with **only**
`operator.pairing`. The first `openclaw agent --agent main` run
immediately requests `operator.write`, which is not pre-approved →
gateway rejects with `1008 pairing required` → silent embedded fallback.
The connect-time approval pass cannot help the first run: the upgrade
request does not exist until that run happens, *after* finalization's
approval sweep already found nothing pending. OpenClaw's `devices` CLI
exposes only `list`/`get`/`approve`, so there is no way to pre-grant a
scope the device has not yet requested.

## Fix — warm up the upgrade during finalization

Run a bounded, best-effort throwaway in-sandbox `openclaw agent --agent
main -m "ping"` during onboard finalization to provoke the exact
`operator.write` scope-upgrade request, poll `devices list` until that
allowlisted upgrade is pending, then let the existing approval pass
clear it. `operator.write` is persisted before handoff, so the user's
first real run connects via the gateway with no fallback.

- New best-effort leaf `src/lib/actions/sandbox/auto-pair-warmup.ts`
(`runSandboxScopeWarmupRun`): 30s outer `spawnSync` cap, ~10s bounded
pending-upgrade poll, all errors swallowed (never throws / never blocks
onboard).
- New `warmupScopeUpgrade` finalization dep, called after process
recovery and **before** the existing approval pass — order is
load-bearing (provoke → approve).
- Builds on the merged #4763 approval-pass / `recover` wiring; if the
gateway never registers the upgrade within the budget it degrades
cleanly to that behavior (one fallback, then `recover` fixes it) — no
regression.

## Testing

- `npx vitest --project cli` on affected files: 25 pass — finalization
ordering (recover → warmup → approval), best-effort/non-blocking,
agent-agnostic; broad regression 1659 pass on the v2 base.
- `npm run typecheck:cli` clean for the changed files; `npm run
build:cli` clean.
- `src/lib/onboard.ts` unchanged (dep flows through the existing
`finalizationHandlerDeps` spread) → `codebase-growth-guardrails`
neutral.

## Note

The long-term home for this is OpenClaw pre-approving the full operator
scope set at pairing time; this warm-up is the NemoClaw-side mitigation
until then.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Signed-off-by: Tony Luo <xialuo@nvidia.com>


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Finalization adds a best-effort, idempotent scope-warmup step that
runs after sandbox/process recovery and before scope approval to reduce
first-run timing races; it is non-throwing, agent-agnostic, and does not
block onboarding.

* **Tests**
* Added unit tests validating warm-up timing/timeout behavior, payload
wrapping and execution, finalization ordering, best-effort semantics,
and invocation across agent types.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Tony Luo <xialuo@nvidia.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: Carlos Villela <cvillela@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area: security Security controls, permissions, secrets, or hardening bug-fix PR fixes a bug or regression v0.0.62 Release target

Projects

None yet

5 participants