Skip to content

fix(onboard): warm up scope upgrade so first agent run avoids fallback#5323

Merged
cv merged 6 commits into
NVIDIA:mainfrom
TonyLuo-NV:fix/4504-v2-onboard-scope-warmup
Jun 13, 2026
Merged

fix(onboard): warm up scope upgrade so first agent run avoids fallback#5323
cv merged 6 commits into
NVIDIA:mainfrom
TonyLuo-NV:fix/4504-v2-onboard-scope-warmup

Conversation

@TonyLuo-NV

@TonyLuo-NV TonyLuo-NV commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Summary

Reopened #4504. The earlier fix (#4763) corrected the approval-pass mechanics (timeout, full env-triplet strip, recover/finalization wiring) but the reporter verified on v0.0.63 that the very first openclaw agent run after a fresh onboard still silently falls back to embedded mode (exit 0).

Root cause (verified against the reporter's gateway log)

On a fresh onboard the device is auto-paired with only operator.pairing. The first openclaw agent --agent main run immediately requests operator.write, which is not pre-approved → gateway rejects with 1008 pairing required → silent embedded fallback. The connect-time approval pass cannot help the first run: the upgrade request does not exist until that run happens, after finalization's approval sweep already found nothing pending. OpenClaw's devices CLI exposes only list/get/approve, so there is no way to pre-grant a scope the device has not yet requested.

Fix — warm up the upgrade during finalization

Run a bounded, best-effort throwaway in-sandbox openclaw agent --agent main -m "ping" during onboard finalization to provoke the exact operator.write scope-upgrade request, poll devices list until that allowlisted upgrade is pending, then let the existing approval pass clear it. operator.write is persisted before handoff, so the user's first real run connects via the gateway with no fallback.

  • New best-effort leaf src/lib/actions/sandbox/auto-pair-warmup.ts (runSandboxScopeWarmupRun): 30s outer spawnSync cap, ~10s bounded pending-upgrade poll, all errors swallowed (never throws / never blocks onboard).
  • New warmupScopeUpgrade finalization dep, called after process recovery and before the existing approval pass — order is load-bearing (provoke → approve).
  • Builds on the merged fix(onboard): pre-approve gateway scope upgrades after onboard and recover #4763 approval-pass / recover wiring; if the gateway never registers the upgrade within the budget it degrades cleanly to that behavior (one fallback, then recover fixes it) — no regression.

Testing

  • npx vitest --project cli on affected files: 25 pass — finalization ordering (recover → warmup → approval), best-effort/non-blocking, agent-agnostic; broad regression 1659 pass on the v2 base.
  • npm run typecheck:cli clean for the changed files; npm run build:cli clean.
  • src/lib/onboard.ts unchanged (dep flows through the existing finalizationHandlerDeps spread) → codebase-growth-guardrails neutral.

Note

The long-term home for this is OpenClaw pre-approving the full operator scope set at pairing time; this warm-up is the NemoClaw-side mitigation until then.

🤖 Generated with Claude Code

Signed-off-by: Tony Luo xialuo@nvidia.com

Summary by CodeRabbit

  • New Features

    • Finalization adds a best-effort, idempotent scope-warmup step that runs after sandbox/process recovery and before scope approval to reduce first-run timing races; it is non-throwing, agent-agnostic, and does not block onboarding.
  • Tests

    • Added unit tests validating warm-up timing/timeout behavior, payload wrapping and execution, finalization ordering, best-effort semantics, and invocation across agent types.

NVIDIA#4504 was reopened: on a fresh onboard the device is auto-paired with
only `operator.pairing`, but the first `openclaw agent --agent main` run
immediately requests `operator.write`. That upgrade is not pre-approved,
so the gateway rejects with `1008 pairing required` and the agent
silently falls back to embedded mode (exit 0). The connect-time approval
pass added earlier cannot help the first run because the upgrade request
does not exist until that run happens — after finalization's approval
sweep already found nothing pending.

OpenClaw's `devices` CLI only exposes list/get/approve, so there is no
way to pre-grant a scope the device has not yet requested. Instead, warm
up the upgrade during finalization: run a bounded throwaway in-sandbox
`openclaw agent --agent main -m "ping"` to provoke the exact
`operator.write` request, poll `devices list` until that allowlisted
upgrade is pending, then let the existing approval pass clear it — so
`operator.write` is persisted before handoff and the user's first real
run connects via the gateway with no fallback.

- New best-effort leaf `auto-pair-warmup.ts` (`runSandboxScopeWarmupRun`,
  30s outer cap, ~10s bounded pending-upgrade poll, swallows all errors).
- New `warmupScopeUpgrade` finalization dep, run after process recovery
  and before the approval pass (order is load-bearing: provoke -> approve).
- Builds on the merged NVIDIA#4763 approval-pass/recover wiring; degrades
  cleanly to that behavior if the gateway never registers the upgrade.

Closes NVIDIA#4504

Signed-off-by: Tony Luo <xialuo@nvidia.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 35d98c8d-dace-4f68-aae4-223574bb65ae

📥 Commits

Reviewing files that changed from the base of the PR and between a0a8e66 and ae22d53.

📒 Files selected for processing (2)
  • src/lib/onboard/machine/handlers/finalization.test.ts
  • test/helpers/onboard-final-flow-phases.ts
💤 Files with no reviewable changes (1)
  • test/helpers/onboard-final-flow-phases.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/lib/onboard/machine/handlers/finalization.test.ts

📝 Walkthrough

Walkthrough

Adds a best-effort sandbox scope-upgrade warm-up that runs an in‑sandbox probe and polls device scopes before finalization scope approval; wires the helper into finalization deps and adds unit and integration tests for behavior and ordering.

Changes

Sandbox scope-upgrade warm-up integration

Layer / File(s) Summary
Sandbox warm-up module with timing & execution
src/lib/actions/sandbox/auto-pair-warmup.ts
Exports timing constants (WARMUP_TIMEOUT_MS, WARMUP_POLL_ATTEMPTS, WARMUP_POLL_LIST_TIMEOUT_S) and runSandboxScopeWarmupRun(sandboxName). Implements a bounded in-sandbox script that probes openclaw agent, then polls openclaw devices list --json (parsed via embedded Python) to detect operator.write; execution is time‑bounded and failures are swallowed.
Finalization handler contract and dependency registration
src/lib/onboard/machine/handlers/finalization.ts, src/lib/onboard/finalization-deps.ts
Adds warmupScopeUpgrade(sandboxName: string): void to finalization deps, registers a lazy implementation in finalizationHandlerDeps that calls the warm-up run, and invokes it in handleFinalizationState after sandbox process recovery and before autoPairScopeApproval.
Warm-up module unit tests
src/lib/actions/sandbox/auto-pair-warmup.test.ts
Tests assert WARMUP_TIMEOUT_MS === 30_000 and verify wrapSandboxShellScript produces a newline-free base64+mktemp wrapper that executes with overall exit code 0.
Finalization integration tests and harness update
src/lib/onboard/machine/handlers/finalization.test.ts, test/helpers/onboard-final-flow-phases.ts
Adds a warmupScopeUpgrade mock to the test harness and three scenarios checking invocation ordering (after recovery, before approval), best-effort non-blocking behavior, and agent-agnostic invocation.

Sequence Diagram(s)

sequenceDiagram
  participant Host
  participant OpenShell
  participant Sandbox
  participant OpenClaw
  participant PythonParser

  Host->>OpenShell: sandbox exec "wrapped warm-up script"
  OpenShell->>Sandbox: run wrapped script
  Sandbox->>OpenClaw: openclaw agent --agent main -m "ping"
  OpenClaw-->>Sandbox: agent response
  loop poll attempts
    Sandbox->>OpenClaw: openclaw devices list --json (timeout)
    OpenClaw-->>Sandbox: JSON devices
    Sandbox->>PythonParser: parse JSON for operator.write in scopes/requestedScopes
    PythonParser-->>Sandbox: found? true/false
  end
  Sandbox-->>Host: exit 0 (best-effort, errors swallowed)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related issues

Suggested labels

area: sandbox

Suggested reviewers

  • cv

Poem

🐇 I hop in code to nudge the sleepy scope,
I ping and poll, a tiny bootstrap loop,
Best-effort and quiet, I run then stop,
Leaving approval paths a little less aloof,
Hooray — a rabbit's gentle setup hop.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: it describes the addition of scope upgrade warm-up functionality during onboarding to prevent fallback on the first agent run.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/lib/actions/sandbox/auto-pair-warmup.ts`:
- Around line 132-136: The current call to getOpenshellBinary() is unsafe
because that helper calls process.exit on failure and will abort the whole
process; instead avoid invoking getOpenshellBinary() here and invoke the CLI in
a non-terminating way so ENOENT/timeouts remain catchable by the surrounding
try/catch. Concretely, replace the direct getOpenshellBinary() call used with
spawnSync in auto-pair-warmup.ts and call the openshell binary by name (e.g.,
'openshell') or use a local non-fatal resolver so that spawnSync('openshell',
...) runs and any ENOENT/timeout errors are thrown into the existing try/catch;
keep using spawnSync and preserve the current swallow-and-degrade behavior
rather than calling the terminating getOpenshellBinary().
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 016b820c-be47-48f1-987a-4cbe0aaa3ea7

📥 Commits

Reviewing files that changed from the base of the PR and between 25c19bc and 9e0b7f7.

📒 Files selected for processing (6)
  • src/lib/actions/sandbox/auto-pair-warmup.test.ts
  • src/lib/actions/sandbox/auto-pair-warmup.ts
  • src/lib/onboard/finalization-deps.ts
  • src/lib/onboard/machine/handlers/finalization.test.ts
  • src/lib/onboard/machine/handlers/finalization.ts
  • test/helpers/onboard-final-flow-phases.ts

Comment thread src/lib/actions/sandbox/auto-pair-warmup.ts Outdated
@TonyLuo-NV

Copy link
Copy Markdown
Contributor Author

@coderabbitai full review

@coderabbitai

coderabbitai Bot commented Jun 12, 2026

Copy link
Copy Markdown
Contributor
✅ Action performed

Full review finished.

runSandboxScopeWarmupRun resolved the CLI via getOpenshellBinary(), which
process.exit(1)s when OpenShell cannot be found. That fail-fast escapes the
function's try/catch and would turn the best-effort warm-up into a hard
onboard-finalization exit. Resolve with resolveOpenshell() (returns null)
and no-op when the CLI is absent, preserving the never-throw/never-block
contract.

Signed-off-by: Tony Luo <xialuo@nvidia.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@TonyLuo-NV

Copy link
Copy Markdown
Contributor Author

@coderabbitai addressed (3ede0cb89): fixed the fail-fast. runSandboxScopeWarmupRun now resolves the CLI via resolveOpenshell() (returns null) and no-ops when OpenShell is absent, instead of getOpenshellBinary() whose process.exit(1) escaped the try/catch. The never-throw / never-block-finalization contract now holds even with a missing/corrupt OpenShell install. Good catch.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
src/lib/actions/sandbox/auto-pair-warmup.ts (1)

132-136: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't resolve OpenShell through the fatal helper in this best-effort path.

Line 136 still goes through getOpenshellBinary(), and that helper exits the whole process when resolution fails. That bypasses the try/catch here, so a missing/corrupt OpenShell install turns this documented swallow-and-degrade warm-up into a hard onboard abort.

Minimal fix
-  const { getOpenshellBinary } =
-    require("../../adapters/openshell/runtime") as typeof import("../../adapters/openshell/runtime");
   try {
     spawnSync(
-      getOpenshellBinary(),
+      "openshell",
       [
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/lib/actions/sandbox/auto-pair-warmup.ts` around lines 132 - 136, The
warm-up currently calls getOpenshellBinary() which uses the fatal helper and can
exit the process; change this to a non-fatal resolution so the warm-up can
silently degrade: add (or use) a safe resolver like tryGetOpenshellBinary in the
"../../adapters/openshell/runtime" module that returns the binary path or
null/undefined without calling fatal, then replace the direct
getOpenshellBinary() call in auto-pair-warmup.ts with tryGetOpenshellBinary()
and only call spawnSync when a path is returned (log a debug message and skip
spawnSync when null).
🧹 Nitpick comments (1)
src/lib/actions/sandbox/auto-pair-warmup.test.ts (1)

63-72: ⚡ Quick win

Keep this unit case fully mocked/in-process.

Line 71 spawns a real sh process and depends on host shell utilities, so this test is no longer isolated to the TS contract. Please either mock the execution boundary here or move this round-trip check to the existing integration harness the file comment already references.

As per coding guidelines, "Mock external dependencies in unit tests; do not call real NVIDIA APIs in unit tests."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/lib/actions/sandbox/auto-pair-warmup.test.ts` around lines 63 - 72, The
test currently calls the real shell via spawnSync which makes the unit test
non-isolated; change the test to mock the execution boundary instead of invoking
the host shell: in this test replace direct use of child_process.spawnSync (or
the top-level spawnSync import) with a Jest/ Sinon stub that returns a simulated
{ status: 0, stdout: "...", stderr: "" } for the wrapped script, and keep the
same assertions against result.status and any stdout/stderr expectations;
alternatively, move this “round-trip” scenario into the integration harness
referenced in the file comment if you need to exercise a real shell. Ensure you
reference wrapSandboxShellScript and stub/mock spawnSync consistently so the
test remains fully in-process and isolated.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/lib/actions/sandbox/auto-pair-warmup.ts`:
- Around line 49-59: The poll retry budget can consume the entire outer
WARMUP_TIMEOUT_MS because WARMUP_POLL_ATTEMPTS (10) ×
(WARMUP_POLL_LIST_TIMEOUT_S (2s) + 1s sleep) ≈ 30s, leaving no headroom for
startup or the provoke agent run; update the logic so the worst-case poll
duration stays safely below WARMUP_TIMEOUT_MS by either reducing
WARMUP_POLL_ATTEMPTS or making it derived from WARMUP_TIMEOUT_MS and
WARMUP_POLL_LIST_TIMEOUT_S (e.g. attempts = Math.floor((WARMUP_TIMEOUT_MS -
HEADROOM_MS) / ((WARMUP_POLL_LIST_TIMEOUT_S*1000) + SLEEP_MS))), or stop doing
the unconditional 1s sleep after the final attempt; adjust the constants or loop
in auto-pair-warmup.ts (referencing WARMUP_POLL_ATTEMPTS,
WARMUP_POLL_LIST_TIMEOUT_S, WARMUP_TIMEOUT_MS and the sleep between retries) so
the computed worst-case poll time plus headroom is always < WARMUP_TIMEOUT_MS.

---

Duplicate comments:
In `@src/lib/actions/sandbox/auto-pair-warmup.ts`:
- Around line 132-136: The warm-up currently calls getOpenshellBinary() which
uses the fatal helper and can exit the process; change this to a non-fatal
resolution so the warm-up can silently degrade: add (or use) a safe resolver
like tryGetOpenshellBinary in the "../../adapters/openshell/runtime" module that
returns the binary path or null/undefined without calling fatal, then replace
the direct getOpenshellBinary() call in auto-pair-warmup.ts with
tryGetOpenshellBinary() and only call spawnSync when a path is returned (log a
debug message and skip spawnSync when null).

---

Nitpick comments:
In `@src/lib/actions/sandbox/auto-pair-warmup.test.ts`:
- Around line 63-72: The test currently calls the real shell via spawnSync which
makes the unit test non-isolated; change the test to mock the execution boundary
instead of invoking the host shell: in this test replace direct use of
child_process.spawnSync (or the top-level spawnSync import) with a Jest/ Sinon
stub that returns a simulated { status: 0, stdout: "...", stderr: "" } for the
wrapped script, and keep the same assertions against result.status and any
stdout/stderr expectations; alternatively, move this “round-trip” scenario into
the integration harness referenced in the file comment if you need to exercise a
real shell. Ensure you reference wrapSandboxShellScript and stub/mock spawnSync
consistently so the test remains fully in-process and isolated.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 4bb07985-601c-47da-8890-2004f65badf9

📥 Commits

Reviewing files that changed from the base of the PR and between 25c19bc and 62670c5.

📒 Files selected for processing (6)
  • src/lib/actions/sandbox/auto-pair-warmup.test.ts
  • src/lib/actions/sandbox/auto-pair-warmup.ts
  • src/lib/onboard/finalization-deps.ts
  • src/lib/onboard/machine/handlers/finalization.test.ts
  • src/lib/onboard/machine/handlers/finalization.ts
  • test/helpers/onboard-final-flow-phases.ts

Comment thread src/lib/actions/sandbox/auto-pair-warmup.ts
@coderabbitai

coderabbitai Bot commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Tip

For best results, initiate chat on the files or code changes.

@TonyLuo-NV: Thanks for the update — the resolveOpenshell() approach is exactly right. Returning null on a missing/corrupt install keeps the entire warm-up path inert without touching the process exit path, so the never-throw contract is solid. Good fix.

🐇 ✨

The pending-upgrade poll used WARMUP_POLL_ATTEMPTS=10 with a 2s list
timeout and an unconditional 1s sleep per iteration, so the poll alone
could consume ~30s — the entire WARMUP_TIMEOUT_MS — leaving no headroom
for shell startup or the throwaway agent run, exactly when devices list
is slow. Drop to 5 attempts and skip the trailing sleep (worst case
5x2 + 4x1 = 14s), comfortably under the 30s outer cap. The comment now
states the real worst case.

Signed-off-by: Tony Luo <xialuo@nvidia.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@TonyLuo-NV

Copy link
Copy Markdown
Contributor Author

@coderabbitai addressed both findings:

  • Poll budget (line 59): dropped WARMUP_POLL_ATTEMPTS 10→5 and skip the trailing sleep, so worst-case poll is now 5×2 + 4×1 = 14s — well under the 30s WARMUP_TIMEOUT_MS, leaving headroom for shell startup + the provoke agent run. Comment corrected to the real worst case.
  • Fail-fast resolution (line 136): fixed in 3ede0cb — now uses resolveOpenshell() (returns null, no-op) instead of getOpenshellBinary() (process.exit).

@TonyLuo-NV

Copy link
Copy Markdown
Contributor Author

@coderabbitai full review

@coderabbitai

coderabbitai Bot commented Jun 12, 2026

Copy link
Copy Markdown
Contributor
✅ Action performed

Full review finished.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/lib/actions/sandbox/auto-pair-warmup.ts`:
- Around line 77-118: The poll loop currently only returns success when a
pending device request includes 'operator.write', which delays re-onboard when
the scope is already granted; update the Python probe (the OPENCLAW 'devices
list --json' handler) to treat already-granted scopes as terminal success too by
inspecting the response for granted/active devices and their scopes in addition
to 'pending' (check keys like 'granted' or device entries under the top-level
response and examine 'scopes'/'requestedScopes' for 'operator.write'), and
short-circuit to sys.exit(0) when any device already has 'operator.write' (also
treat empty/no pending upgrade as success when no upgrade is needed); ensure the
check still falls back to the existing pending-scan behavior if no
granted/active entries are present.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 2ea9d4bb-26ee-4df4-ab8d-c195fd1d7e3b

📥 Commits

Reviewing files that changed from the base of the PR and between 25c19bc and a422d46.

📒 Files selected for processing (6)
  • src/lib/actions/sandbox/auto-pair-warmup.test.ts
  • src/lib/actions/sandbox/auto-pair-warmup.ts
  • src/lib/onboard/finalization-deps.ts
  • src/lib/onboard/machine/handlers/finalization.test.ts
  • src/lib/onboard/machine/handlers/finalization.ts
  • test/helpers/onboard-final-flow-phases.ts

Comment thread src/lib/actions/sandbox/auto-pair-warmup.ts
@coderabbitai

coderabbitai Bot commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Tip

For best results, initiate chat on the files or code changes.

@TonyLuo-NV: Both fixes look solid:

  • Poll budget: WARMUP_POLL_ATTEMPTS 10 → 5, trailing sleep dropped — worst-case poll at 5×2s + 4×1s = 14s leaves a comfortable margin under the 30s outer cap for shell startup and the provoke run. The updated comment now accurately reflects the real worst case. ✅
  • Fail-fast resolution: resolveOpenshell() returning null keeps the warm-up path fully inert when OpenShell is absent or corrupt, no process.exit escape hatch possible. ✅

The never-throw / never-block-finalization contract is clean on both fronts. Thanks for the thorough follow-through!

🐇 ✨

…nted

The pending-upgrade poll only succeeded on a PENDING operator.write
request. On a re-onboard where operator.write is already granted there is
no new pending request, so the poll burned its full budget before
finalization continued, contradicting the idempotent no-op contract. Scan
every device collection the devices-list response exposes (not just
pending) and treat operator.write present anywhere — pending or already
granted — as terminal success, so the already-paired path returns
immediately.

Signed-off-by: Tony Luo <xialuo@nvidia.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@TonyLuo-NV

Copy link
Copy Markdown
Contributor Author

@coderabbitai addressed (a0a8e661f): the poll now scans every device collection the devices list --json response exposes (not just pending) and treats operator.write present anywhere — pending OR already granted — as terminal success. The re-onboard already-paired path short-circuits on the first poll instead of burning the budget.

On the regression test: the warm-up leaf's spawn path can't be exercised in-process (the lazy require of the openshell resolver pulls in runnerplatform, which Vitest's TS loader can't resolve — the same constraint that keeps the sibling auto-pair-approval spawn path out of unit tests; it's covered by the integration harness). The idempotent re-onboard behavior is exercised at the finalization-dep level instead.

@TonyLuo-NV

Copy link
Copy Markdown
Contributor Author

@coderabbitai full review

@coderabbitai

coderabbitai Bot commented Jun 12, 2026

Copy link
Copy Markdown
Contributor
✅ Action performed

Full review finished.

@coderabbitai

coderabbitai Bot commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

Tip

For best results, initiate chat on the files or code changes.

@TonyLuo-NV: Both points look solid:

  • Scan-all-collections poll (a0a8e661f): Checking every device collection (not just pending) for operator.write is the correct approach — it's idempotent on re-onboard and short-circuits on the first poll when the scope is already granted, so no poll budget is wasted on the happy path. ✅
  • Spawn path coverage: The Vitest / lazy-requirerunnerplatform resolution constraint is a well-understood limitation (same as the sibling auto-pair-approval path), and covering idempotent re-onboard behavior at the finalization-dep level while leaving the spawn path to the integration harness is the right split. ✅

The warm-up contract — best-effort, never-throw, idempotent, bounded — holds cleanly across all the cases now. Nice work threading this through.

🐇 ✨

@TonyLuo-NV

Copy link
Copy Markdown
Contributor Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jun 12, 2026

Copy link
Copy Markdown
Contributor
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@wscurran wscurran added area: onboarding Onboarding FSM, provider setup, sandbox launch, or first-run flow bug-fix PR fixes a bug or regression labels Jun 12, 2026
@cv cv merged commit 293d172 into NVIDIA:main Jun 13, 2026
42 checks passed
@cv cv added the v0.0.65 Release target label Jun 13, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area: onboarding Onboarding FSM, provider setup, sandbox launch, or first-run flow bug-fix PR fixes a bug or regression v0.0.65 Release target

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants