fix(onboard): pre-approve gateway scope upgrades after onboard and recover#4763
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds connect-time auto-pair budgeting constants, parameterizes and exposes a budgeted sandbox approval pass, invokes it best-effort from connect probe paths and onboarding finalization (via lazy deps), and adds tests plus a sandbox exec helper for probe/gateway-down scenarios. ChangesAuto-pair scope approval integration
Sequence DiagramsequenceDiagram
participant Onboard as handleFinalizationState
participant Recovery as checkAndRecoverSandboxProcesses
participant Approval as autoPairScopeApproval / runConnectAutoPairApprovalPass
participant Verify as deploymentVerification
Onboard->>Recovery: recover sandbox processes
Recovery-->>Onboard: recovery complete
Onboard->>Approval: run approval sweep (best-effort)
Approval-->>Onboard: returns (non-throwing)
Onboard->>Verify: verify deployment
Verify-->>Onboard: verification complete
Onboard->>Onboard: record session & print dashboard
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/lib/onboard/machine/handlers/finalization.ts (1)
105-108: Run the onboarding/sandbox E2E slice for this integration point.Since this hook changes finalization handoff behavior, run the recommended selective nightly E2E jobs before merge to validate cross-layer behavior under real gateway/sandbox lifecycle conditions.
As per coding guidelines:
src/lib/onboard.tschanges affect full sandbox creation/configuration flow and include explicit E2E test recommendations.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/lib/onboard/machine/handlers/finalization.ts` around lines 105 - 108, This change to finalization (deps.autoPairScopeApproval in src/lib/onboard/machine/handlers/finalization.ts) alters the sandbox/gateway handoff path and must be validated by running the selective nightly E2E slice for onboarding/sandbox lifecycle; before merging, execute the recommended E2E jobs that cover the sandbox creation/configuration flow (as referenced by src/lib/onboard.ts), verify there are no regressions in gateway recovery, pairing handoff, or stuck pairing requests, and attach the E2E run IDs and any failing test traces to the PR for review.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/lib/onboard.ts`:
- Around line 7032-7036: The autoPairScopeApproval callback wiring currently
defined as autoPairScopeApproval: (name) => { const connect: typeof
import("./actions/sandbox/connect") = require("./actions/sandbox/connect");
connect.runConnectAutoPairApprovalPass(name); } should be moved out of
src/lib/onboard.ts into a new module under src/lib/onboard/ (for example
src/lib/onboard/autoPair.ts) or into the file that implements the finalization
handler; export a function (e.g., initAutoPairScopeApproval) that registers the
callback there and import/invoke that initializer from the finalization handler
so onboard.ts remains growth-neutral while still calling
connect.runConnectAutoPairApprovalPass(name) via the extracted initializer.
---
Nitpick comments:
In `@src/lib/onboard/machine/handlers/finalization.ts`:
- Around line 105-108: This change to finalization (deps.autoPairScopeApproval
in src/lib/onboard/machine/handlers/finalization.ts) alters the sandbox/gateway
handoff path and must be validated by running the selective nightly E2E slice
for onboarding/sandbox lifecycle; before merging, execute the recommended E2E
jobs that cover the sandbox creation/configuration flow (as referenced by
src/lib/onboard.ts), verify there are no regressions in gateway recovery,
pairing handoff, or stuck pairing requests, and attach the E2E run IDs and any
failing test traces to the PR for review.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: a2e6af1d-6a39-45c7-9c8b-c3ec6723a137
📒 Files selected for processing (6)
src/lib/actions/sandbox/connect-autopair-budget.tssrc/lib/actions/sandbox/connect.tssrc/lib/onboard.tssrc/lib/onboard/machine/handlers/finalization.test.tssrc/lib/onboard/machine/handlers/finalization.tstest/sandbox-connect-inference.test.ts
7e5e79e to
6767c9b
Compare
|
✨ |
prekshivyas
left a comment
There was a problem hiding this comment.
Reviewed the logic and it's sound — the env-triplet strip and the budget invariant (2s + 10s×1 = 12s < 15s, extracted to the dep-free connect-autopair-budget leaf) both check out, and the first-run limitation is honestly documented. The blocker is that this now conflicts semantically with main, which independently reworked the same runConnectAutoPairApprovalPass path. Mapping the overlap so the rebase is targeted:
- Env-strip is already on main (now redundant here).
main'sscripts/lib/openclaw_device_approval_policy.pydefinesgateway_approval_env(), which popsGATEWAY_APPROVAL_ENV_KEYS = ("OPENCLAW_GATEWAY_URL", "OPENCLAW_GATEWAY_PORT", "OPENCLAW_GATEWAY_TOKEN")— the exact triplet this PR strips inline. So the#4462self-defeat fix is covered on main; the inlineapprove_env.pop(...)block can drop in favor ofgateway_approval_env(os.environ). runConnectAutoPairApprovalPasswas restructured on main — it now reads the externalopenclaw_device_approval_policy.pymodule and uses inlineMAX_APPROVALS=8/TIMEOUT_MS=12_000. That collides head-on with this PR's budget-leaf extraction andMAX_APPROVALS=1/15s. Worth a deliberate call on which budget wins (the1/15srationale here is reasonable, but it needs to sit on top of main's policy-module structure, not replace it).- The wiring is the part still genuinely missing on main.
finalization.tson main has zero references to the approval pass, so running it in onboard finalization and inrecoveris the unique, still-wanted contribution.
Suggested path: rebase on main, drop the now-redundant env-strip, reconcile the budget constants with main's structure, and keep the finalization/recover wiring (+ its tests). That should shrink the diff considerably. Happy to re-review once it's rebased.
…cover After a fresh onboard, the first `openclaw agent` run emitted "scope upgrade pending approval" and silently fell back to embedded mode, and the same pending requestId persisted across every subsequent run — the approval never auto-resolved. Root cause was a divergence in the host one-shot auto-pair approval pass: it ran `openclaw devices approve` with a 1s timeout and stripped only OPENCLAW_GATEWAY_URL from the child env. Leaving PORT and TOKEN set re-pinned the approve to the gateway, hitting the NVIDIA#4462 self-defeat where a gateway-pinned scope-upgrade approve requests the very scopes it is trying to grant and fails — so the request was never cleared. - Match the in-sandbox watcher: approve timeout 1s -> 10s, strip OPENCLAW_GATEWAY_URL + PORT + TOKEN. MAX_APPROVALS 8 -> 1 (single realistic pending upgrade) and outer cap 12s -> 15s so a slow 10s approve is never SIGKILLed mid-loop (2 + 10x1 = 12s < 15s). - Wire the now-correct pass into onboard finalization (after process recovery, before verification) and the `nemoclaw recover` / probe-only path (gateway-up branches only, not the failure exit), so `nemoclaw recover` deterministically clears a pending scope upgrade. The CLI scope-upgrade request is only created by the user's first `openclaw agent` run, so that first run can still fall back once; the deterministic remedy is `nemoclaw recover`. Closes NVIDIA#4504 Signed-off-by: Tony Luo <xialuo@nvidia.com> Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
6767c9b to
0c25fc1
Compare
|
@coderabbitai full review |
✅ Action performedFull review finished. |
|
@prekshivyas thanks — rebased onto main (
The earlier CodeRabbit |
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/lib/actions/sandbox/connect.ts (1)
767-798:⚠️ Potential issue | 🟠 Major | ⚡ Quick winStale comment: update "OPENCLAW_GATEWAY_URL only" to reflect the full triplet fix.
Line 775 still says "removing OPENCLAW_GATEWAY_URL only from the child env" — this describes the OLD buggy behavior that caused
#4462. The PR summary states the fix strips all three env vars:OPENCLAW_GATEWAY_URL,OPENCLAW_GATEWAY_PORT, andOPENCLAW_GATEWAY_TOKEN. The test at line 386 ofauto-pair-approval.test.tsvalidates all three are stripped ("unset:unset:unset").📝 Proposed fix to update the comment
-// in-sandbox `openclaw devices list` invocation targets the running gateway -// with its token. Approvals then use OpenClaw's local fallback by removing -// OPENCLAW_GATEWAY_URL only from the child env, and apply the same allowlist +// in-sandbox `openclaw devices list` invocation targets the running gateway +// with its token. Approvals then use OpenClaw's local fallback by removing +// OPENCLAW_GATEWAY_URL, OPENCLAW_GATEWAY_PORT, and OPENCLAW_GATEWAY_TOKEN +// from the child env, and apply the same allowlist🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/lib/actions/sandbox/connect.ts` around lines 767 - 798, Update the stale inline comment that currently says "removing OPENCLAW_GATEWAY_URL only from the child env" to reflect the real fix which strips the full triplet: OPENCLAW_GATEWAY_URL, OPENCLAW_GATEWAY_PORT, and OPENCLAW_GATEWAY_TOKEN; locate the comment block in src/lib/actions/sandbox/connect.ts around the one-shot / workaround description (the big comment starting above the approval logic and referenced by the auto-pair watcher behavior) and change that single sentence so it accurately states that all three env vars are removed from the child environment (matching the test in auto-pair-approval.test.ts that expects "unset:unset:unset").src/lib/onboard/finalization-deps.ts (1)
1-24:⚠️ Potential issue | 🔴 CriticalUpdate:
npm run typecheck:clicurrently fails (tsc -p tsconfig.cli.json).
The run exits with code 2 and reports many TypeScript errors (e.g., missing../dist/...modules and missing@oclif/coretype declarations), so it’s not possible to tell whethersrc/lib/onboard/finalization-deps.ts—or any suggested return-type change—compiles cleanly until the underlying CLI typecheck setup/inputs are fixed.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/lib/onboard/finalization-deps.ts` around lines 1 - 24, The CLI typecheck errors come from using typeof import("...") type annotations that force tsc to resolve non-CLI modules (causing missing ../dist and `@oclif` types); update finalizationHandlerDeps to avoid those module-resolving type annotations: remove the typeof import(...) types and instead cast the require() results to any (or annotate the functions as using any), e.g. in checkAndRecoverSandboxProcesses and autoPairScopeApproval replace the typed declarations around require("../actions/sandbox/process-recovery") and require("../actions/sandbox/connect") with an any-typed const and call the exported functions (checkAndRecoverSandboxProcesses, runConnectAutoPairApprovalPass) through that any-typed value so tsc no longer tries to resolve the referenced modules during the CLI typecheck.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/lib/onboard/finalization-deps.ts`:
- Around line 10-14: The wrapper checkAndRecoverSandboxProcesses in
src/lib/onboard/finalization-deps.ts currently declares a void return and
discards the result from require("../actions/sandbox/process-recovery"); change
the wrapper to preserve and return the upstream function's structured result ({
checked, wasRunning, recovered, forwardRecovered }) by updating the wrapper
signature to return that same type and returning the value from
processRecovery.checkAndRecoverSandboxProcesses(name, options) so callers get
the original recovery object and API contract remains consistent with
src/lib/actions/sandbox/process-recovery.ts and usages such as
src/lib/actions/sandbox/connect.ts.
---
Outside diff comments:
In `@src/lib/actions/sandbox/connect.ts`:
- Around line 767-798: Update the stale inline comment that currently says
"removing OPENCLAW_GATEWAY_URL only from the child env" to reflect the real fix
which strips the full triplet: OPENCLAW_GATEWAY_URL, OPENCLAW_GATEWAY_PORT, and
OPENCLAW_GATEWAY_TOKEN; locate the comment block in
src/lib/actions/sandbox/connect.ts around the one-shot / workaround description
(the big comment starting above the approval logic and referenced by the
auto-pair watcher behavior) and change that single sentence so it accurately
states that all three env vars are removed from the child environment (matching
the test in auto-pair-approval.test.ts that expects "unset:unset:unset").
In `@src/lib/onboard/finalization-deps.ts`:
- Around line 1-24: The CLI typecheck errors come from using typeof
import("...") type annotations that force tsc to resolve non-CLI modules
(causing missing ../dist and `@oclif` types); update finalizationHandlerDeps to
avoid those module-resolving type annotations: remove the typeof import(...)
types and instead cast the require() results to any (or annotate the functions
as using any), e.g. in checkAndRecoverSandboxProcesses and autoPairScopeApproval
replace the typed declarations around
require("../actions/sandbox/process-recovery") and
require("../actions/sandbox/connect") with an any-typed const and call the
exported functions (checkAndRecoverSandboxProcesses,
runConnectAutoPairApprovalPass) through that any-typed value so tsc no longer
tries to resolve the referenced modules during the CLI typecheck.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 8d163a73-5e2d-462e-bdfb-d0ba9e81c8cc
📒 Files selected for processing (8)
src/lib/actions/sandbox/connect-autopair-budget.tssrc/lib/actions/sandbox/connect.tssrc/lib/onboard.tssrc/lib/onboard/finalization-deps.tssrc/lib/onboard/machine/handlers/finalization.test.tssrc/lib/onboard/machine/handlers/finalization.tstest/sandbox-connect-inference/auto-pair-approval.test.tstest/sandbox-connect-inference/helpers.ts
| checkAndRecoverSandboxProcesses(name: string, options: { quiet: boolean }): void { | ||
| const processRecovery: typeof import("../actions/sandbox/process-recovery") = | ||
| require("../actions/sandbox/process-recovery"); | ||
| processRecovery.checkAndRecoverSandboxProcesses(name, options); | ||
| }, |
There was a problem hiding this comment.
Return type mismatch breaks the upstream contract.
The upstream checkAndRecoverSandboxProcesses function returns a structured object with { checked, wasRunning, recovered, forwardRecovered } fields (see src/lib/actions/sandbox/process-recovery.ts:421-545), but this wrapper changes the return type to void. This contract break prevents the finalization handler from:
- Knowing whether the sandbox exists (
checked: false) - Determining if recovery succeeded or failed
- Inspecting whether the port forward was restored
- Making decisions or logging based on recovery outcomes
In src/lib/actions/sandbox/connect.ts:203-241, the same function's return value is actively used to control probe behavior and user messaging. Even if the current finalization handler doesn't need this information, preserving the return type:
- Maintains API consistency with the upstream function
- Enables future code to inspect recovery outcomes without modifying the wrapper
- Provides valuable debugging information for onboard failures
🔧 Proposed fix to preserve the upstream return type
- checkAndRecoverSandboxProcesses(name: string, options: { quiet: boolean }): void {
+ checkAndRecoverSandboxProcesses(
+ name: string,
+ options: { quiet: boolean },
+ ): { checked: boolean; wasRunning: boolean | null; recovered: boolean; forwardRecovered: boolean } {
const processRecovery: typeof import("../actions/sandbox/process-recovery") =
require("../actions/sandbox/process-recovery");
- processRecovery.checkAndRecoverSandboxProcesses(name, options);
+ return processRecovery.checkAndRecoverSandboxProcesses(name, options);
},📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| checkAndRecoverSandboxProcesses(name: string, options: { quiet: boolean }): void { | |
| const processRecovery: typeof import("../actions/sandbox/process-recovery") = | |
| require("../actions/sandbox/process-recovery"); | |
| processRecovery.checkAndRecoverSandboxProcesses(name, options); | |
| }, | |
| checkAndRecoverSandboxProcesses( | |
| name: string, | |
| options: { quiet: boolean }, | |
| ): { checked: boolean; wasRunning: boolean | null; recovered: boolean; forwardRecovered: boolean } { | |
| const processRecovery: typeof import("../actions/sandbox/process-recovery") = | |
| require("../actions/sandbox/process-recovery"); | |
| return processRecovery.checkAndRecoverSandboxProcesses(name, options); | |
| }, |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/lib/onboard/finalization-deps.ts` around lines 10 - 14, The wrapper
checkAndRecoverSandboxProcesses in src/lib/onboard/finalization-deps.ts
currently declares a void return and discards the result from
require("../actions/sandbox/process-recovery"); change the wrapper to preserve
and return the upstream function's structured result ({ checked, wasRunning,
recovered, forwardRecovered }) by updating the wrapper signature to return that
same type and returning the value from
processRecovery.checkAndRecoverSandboxProcesses(name, options) so callers get
the original recovery object and API contract remains consistent with
src/lib/actions/sandbox/process-recovery.ts and usages such as
src/lib/actions/sandbox/connect.ts.
The auto-pair approval-pass comment still described the old behavior (removing OPENCLAW_GATEWAY_URL only). Update it to reflect the shared gateway_approval_env() policy helper, which strips URL + PORT + TOKEN — the full triplet that avoids the NVIDIA#4462 self-defeat. Comment-only. Signed-off-by: Tony Luo <xialuo@nvidia.com> Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
@coderabbitai addressed your review (
|
|
Tip For best results, initiate chat on the files or code changes.
Everything looks good. 🐇 |
…topair Resolve connect.ts conflicts against upstream's new shared auto-pair approval module (auto-pair-approval.ts, NVIDIA#4263/NVIDIA#4616). The inline connect-time approval pass is replaced by the shared, base64-wrapped runSandboxAutoPairApprovalPass; the NVIDIA#4504 connect/probe/finalization budget (maxApprovals=1, list=2s, approve=10s, outer=15s) is re-applied as a per-surface override threaded through buildAutoPairApprovalScript, keeping the doctor recovery surface on its wider default budget (8/12s). Signed-off-by: Tony Luo <xialuo@nvidia.com> Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
prekshivyas
left a comment
There was a problem hiding this comment.
Approving. This is a clean reconciliation onto main's extracted auto-pair-approval.ts — verified at head 7c97388:
- Reconciled, not reimplemented: the inline approval heredoc is gone; the module now takes an optional
budget, and connect/onboard pass a tighterCONNECT_AUTO_PAIR_BUDGET(10s approve, 1 approval, 15s cap) whiledoctorkeeps main's 8/12s default. Budget is optional (?? AUTO_PAIR_*), so main's other callers are unaffected. - The genuine fix lands:
budget.approveTimeoutSthreads to the module'stimeout=${approveTimeoutS}, so a single slow scope-upgrade approve now gets 10s instead of 1s — the actual #4504 fix. Invariant holds: 2s list + 10s×1 = 12s < 15s cap. - Onboard sweep is wired correctly: best-effort, never throws, runs after process recovery (so it targets the freshly-recovered gateway, #3573) and before verification. Lazy-require in
finalization-depsavoids the connect↔onboard import cycle and keepsonboard.tslean. - Env-strip uses main's
gateway_approval_env; the earlier redundancy is resolved.
Tests are substantial (auto-pair-approval +244, finalization +52) and CI is green. Nice job landing this on top of main's refactor rather than fighting it.
## Summary - Add v0.0.62 release notes from Discussion #5100 and link release highlights to the relevant docs pages. - Document the release's GPU sandbox recreation, sandbox-side local inference verification, and Hermes dashboard port guard in the command and inference references. - Refresh generated NemoClaw user skills for the release-prep docs set. ## Source Summary - #4956 -> `docs/reference/commands.mdx`: Document CDI-first Docker GPU recreation behavior for Linux Docker-driver sandboxes. - #5024 -> `docs/inference/use-local-inference.mdx`: Document sandbox-runtime verification of the `inference.local` local inference route. - #5018 -> `docs/reference/commands.mdx`: Document Jetson/Tegra device-node group propagation for sandbox CUDA initialization. - #5012, #4763, #4706, #5030, #5015 -> `docs/about/release-notes.mdx`: Summarize onboarding and recovery reliability fixes, including the reserved Hermes API port guard. - #5017 and #5043 -> `docs/about/release-notes.mdx`, `docs/reference/commands.mdx`: Summarize mutable OpenClaw config recovery and host-side `agents list` coverage. - #5010 and #5016 -> `docs/about/release-notes.mdx`: Summarize Hermes upstream metadata visibility and WhatsApp QR rendering reliability. - #5045 and prior source docs in the v0.0.62 range -> `.agents/skills/`: Refresh generated user-skill references from the current docs source. ## Skipped - #5019 -> skipped for new prose because it touched `openclaw-sandbox-permissive.yaml`, which matches `docs/.docs-skip`. Existing source docs remain the source for generated skill synchronization. ## Verification - `python3 scripts/docs-to-skills.py docs/ .agents/skills/ --prefix nemoclaw-user --doc-platform fern-mdx` - `npm run docs` (passes; Fern reports 0 errors and 1 hidden warning) - Pre-commit hooks passed during commit, including docs-to-skills verification, markdown lint, gitleaks, and skills YAML tests. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Added `nemoclaw <name> agents list` command. * v0.0.62 release notes added summarizing onboarding and recovery improvements. * **Bug Fixes** * Improved GPU sandbox onboarding reliability (NVIDIA CDI path, Jetson/Tegra device handling). * Better local inference verification and recovery for Linux Docker-driver GPU sandboxes. * Quieter/earlier handling of onboarding drift and port collisions. * **Documentation** * Expanded GPU passthrough, inference verification, writable paths (`/dev/pts`), port 8642 restriction, and command examples. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: Prekshi Vyas <34834085+prekshivyas@users.noreply.github.com>
#5323) ## Summary Reopened #4504. The earlier fix (#4763) corrected the approval-pass mechanics (timeout, full env-triplet strip, `recover`/finalization wiring) but the reporter verified on v0.0.63 that the **very first** `openclaw agent` run after a fresh onboard still silently falls back to embedded mode (exit 0). ## Root cause (verified against the reporter's gateway log) On a fresh onboard the device is auto-paired with **only** `operator.pairing`. The first `openclaw agent --agent main` run immediately requests `operator.write`, which is not pre-approved → gateway rejects with `1008 pairing required` → silent embedded fallback. The connect-time approval pass cannot help the first run: the upgrade request does not exist until that run happens, *after* finalization's approval sweep already found nothing pending. OpenClaw's `devices` CLI exposes only `list`/`get`/`approve`, so there is no way to pre-grant a scope the device has not yet requested. ## Fix — warm up the upgrade during finalization Run a bounded, best-effort throwaway in-sandbox `openclaw agent --agent main -m "ping"` during onboard finalization to provoke the exact `operator.write` scope-upgrade request, poll `devices list` until that allowlisted upgrade is pending, then let the existing approval pass clear it. `operator.write` is persisted before handoff, so the user's first real run connects via the gateway with no fallback. - New best-effort leaf `src/lib/actions/sandbox/auto-pair-warmup.ts` (`runSandboxScopeWarmupRun`): 30s outer `spawnSync` cap, ~10s bounded pending-upgrade poll, all errors swallowed (never throws / never blocks onboard). - New `warmupScopeUpgrade` finalization dep, called after process recovery and **before** the existing approval pass — order is load-bearing (provoke → approve). - Builds on the merged #4763 approval-pass / `recover` wiring; if the gateway never registers the upgrade within the budget it degrades cleanly to that behavior (one fallback, then `recover` fixes it) — no regression. ## Testing - `npx vitest --project cli` on affected files: 25 pass — finalization ordering (recover → warmup → approval), best-effort/non-blocking, agent-agnostic; broad regression 1659 pass on the v2 base. - `npm run typecheck:cli` clean for the changed files; `npm run build:cli` clean. - `src/lib/onboard.ts` unchanged (dep flows through the existing `finalizationHandlerDeps` spread) → `codebase-growth-guardrails` neutral. ## Note The long-term home for this is OpenClaw pre-approving the full operator scope set at pairing time; this warm-up is the NemoClaw-side mitigation until then. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Signed-off-by: Tony Luo <xialuo@nvidia.com> <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Finalization adds a best-effort, idempotent scope-warmup step that runs after sandbox/process recovery and before scope approval to reduce first-run timing races; it is non-throwing, agent-agnostic, and does not block onboarding. * **Tests** * Added unit tests validating warm-up timing/timeout behavior, payload wrapping and execution, finalization ordering, best-effort semantics, and invocation across agent types. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Signed-off-by: Tony Luo <xialuo@nvidia.com> Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Co-authored-by: Carlos Villela <cvillela@nvidia.com>
Summary
Fixes #4504. After a fresh
nemoclaw onboard, the firstopenclaw agentrun printedgateway connect failed: scope upgrade pending approvaland silently fell back to embedded mode (exit 0), and the same pendingrequestIdsurvived across every subsequent run — the approval was never auto-resolved (regression of NVB#6041401).Root cause
Two defects in the connect-time auto-pair approval pass (
runConnectAutoPairApprovalPass):openclaw devices approveran withtimeout=1sand stripped onlyOPENCLAW_GATEWAY_URLfrom the child env. LeavingOPENCLAW_GATEWAY_PORT+OPENCLAW_GATEWAY_TOKENset re-pinned the approve to the gateway, hitting the OpenClaw CLI scope-upgrade approval deadlocks and forces openclaw agent into embedded fallback #4462 self-defeat where a gateway-pinned scope-upgrade approve requests the very scopes it is trying to grant and fails. So the request was never cleared and the samerequestIdpersisted.connectpath, not onboard finalization ornemoclaw recover.Fix
timeout1s → 10s; stripOPENCLAW_GATEWAY_URL+PORT+TOKEN.MAX_APPROVALS8 → 1 (single realistic pending upgrade); outer cap 12s → 15s so a slow 10s approve is never SIGKILLed mid-loop (2 + 10×1 = 12s < 15s, ~3s startup slack).nemoclaw recover/ probe-only path (gateway-up branches only, not the failure exit).connect-autopair-budget.tsleaf so the invariant is asserted on real values, not source text.Known limitation
The CLI scope-upgrade request is only created by the user's first
openclaw agentrun (after onboard returns), so that first run can still fall back once — onboard has nothing to pre-approve yet. The deterministic remedy isnemoclaw recover(now actually clears it) or one ~30s in-sandbox watcher cycle. A docs/changelog note pointing users atnemoclaw recoveron first-run fallback is recommended as follow-up.Testing
test/sandbox-connect-inference.test.ts(5 new: probe-path positive/negative/best-effort, env-triplet strip, budget invariant) andsrc/lib/onboard/machine/handlers/finalization.test.ts(ordering, agent-agnostic, best-effort) — all pass.npm run typecheck:cliclean.🤖 Generated with Claude Code
Signed-off-by: Tony Luo xialuo@nvidia.com
Summary by CodeRabbit
New Features
Improvements
Tests