fix(onboard): pin OpenClaw home/state/workspace env in sandbox#4766
Conversation
Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughPins OpenClaw filesystem paths into sandbox runtime env by adding OPENCLAW_HOME, OPENCLAW_STATE_DIR, and OPENCLAW_WORKSPACE_DIR via a new helper; updates createSandbox call site, runtime env script, unit/integration tests, adds a nightly E2E job, and an end-to-end test script validating install/list behavior. ChangesOpenClaw Directory Pinning
Nightly E2E job and OpenClaw E2E script
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
E2E Advisor RecommendationRequired E2E: Dispatch hint: Auto-dispatched E2E: Full advisor summaryE2E Recommendation AdvisorBase: Required E2E
Optional E2E
New E2E recommendations
Dispatch hint
|
E2E Scenario Advisor RecommendationRequired scenario E2E: Dispatch required scenario E2E:
Full scenario advisor summaryE2E Scenario AdvisorBase: Required scenario E2E
Optional scenario E2E
Relevant changed files
|
PR Review AdvisorFindings: 0 needs attention, 1 worth checking, 0 nice ideas Review findings🛠️ Needs attention
🔎 Worth checking
🌱 Nice ideas
Since last review detailsCurrent findings:
This is an automated advisory review. A human maintainer must make the final merge decision. |
Selective E2E Results — ✅ All requested jobs passedRun: 26950254106
|
…LAW_WORKSPACE_DIR in runtime shell rc Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/lib/onboard.ts (1)
3566-3571:⚠️ Potential issue | 🟠 Major | ⚡ Quick winLine-budget guardrail is currently blocking this change
Line 3566 is fine functionally, but this PR is failing CI because
src/lib/onboard.tsis net+1line. Please reclaim one line in this file before merge (for example by collapsing the adjacent multi-lineappendHermesDashboardEnvArgscall).Possible net-neutral adjustment
- onboardHermesDashboard.appendHermesDashboardEnvArgs( - envArgs, - hermesDashboardState, - formatEnvAssignment, - ); + onboardHermesDashboard.appendHermesDashboardEnvArgs(envArgs, hermesDashboardState, formatEnvAssignment);Based on learnings: onboarding changes in
src/lib/onboard.tsare constrained by a CI “onboard-entrypoint-budget” line budget; prefer collapsing adjacent calls to stay net-neutral.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/lib/onboard.ts` around lines 3566 - 3571, The file has a +1 line net change due to the two adjacent calls; collapse the multi-line call to onboardHermesDashboard.appendHermesDashboardEnvArgs into the previous line (or combine both calls onto a single line) so that require("./onboard/openclaw-runtime-env").appendOpenClawRuntimeEnvArgs(envArgs, agent); and onboardHermesDashboard.appendHermesDashboardEnvArgs(envArgs, hermesDashboardState, formatEnvAssignment); occupy fewer lines and restore net-neutral line budget; keep the same argument order and names (envArgs, agent, hermesDashboardState, formatEnvAssignment) and do not change any functionality.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@src/lib/onboard.ts`:
- Around line 3566-3571: The file has a +1 line net change due to the two
adjacent calls; collapse the multi-line call to
onboardHermesDashboard.appendHermesDashboardEnvArgs into the previous line (or
combine both calls onto a single line) so that
require("./onboard/openclaw-runtime-env").appendOpenClawRuntimeEnvArgs(envArgs,
agent); and onboardHermesDashboard.appendHermesDashboardEnvArgs(envArgs,
hermesDashboardState, formatEnvAssignment); occupy fewer lines and restore
net-neutral line budget; keep the same argument order and names (envArgs, agent,
hermesDashboardState, formatEnvAssignment) and do not change any functionality.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 482ce9b9-162a-4a0b-9fab-fbb7770c16fe
📒 Files selected for processing (5)
scripts/nemoclaw-start.shsrc/lib/onboard.tssrc/lib/onboard/openclaw-runtime-env.test.tssrc/lib/onboard/openclaw-runtime-env.tstest/onboard.test.ts
✅ Files skipped from review due to trivial changes (1)
- scripts/nemoclaw-start.sh
🚧 Files skipped from review as they are similar to previous changes (1)
- test/onboard.test.ts
…collapse Hermes dashboard call to keep onboard.ts net-neutral Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
Selective E2E Results — ✅ All requested jobs passedRun: 26951755601
|
Selective E2E Results — ✅ All requested jobs passedRun: 26952229228
|
… list roundtrip Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@test/e2e/test-openclaw-skill-cli-e2e.sh`:
- Line 1: The new E2E test script test-openclaw-skill-cli-e2e.sh was added under
the legacy test/e2e/ boundary which is blocked by the repository growth
guardrail; move this new test file out of test/e2e/ into the approved location
(e.g., tests/e2e-scenario/) or, if intentionally adding it under test/e2e/,
include the necessary guardrail exception/update in
.github/workflows/codebase-growth-guardrails.yaml as part of the same PR so the
repository growth checks will allow the added lines.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 14e74f21-c2aa-4f6a-ad18-ad40fea89294
📒 Files selected for processing (2)
.github/workflows/nightly-e2e.yamltest/e2e/test-openclaw-skill-cli-e2e.sh
Selective E2E Results — ✅ All requested jobs passedRun: 26953084892
|
Selective E2E Results — ❌ Some jobs failedRun: 26953366040
|
Selective E2E Results — ❌ Some jobs failedRun: 26953484481
|
…c arg newline rejection Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
Selective E2E Results — ✅ All requested jobs passedRun: 26953911689
|
Selective E2E Results — ✅ All requested jobs passedRun: 26954366805
|
…k phases, strip issue refs from comments Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
Selective E2E Results — ✅ All requested jobs passedRun: 26955219528
|
Selective E2E Results — ✅ All requested jobs passedRun: 26955457721
|
prekshivyas
left a comment
There was a problem hiding this comment.
LGTM — approving. Verified against source:
appendOpenClawRuntimeEnvArgsderives the three pins fromagent.configPaths.dir(real field —AgentDefinitioninagent/defs.ts:88/:404), and theagent?.configPaths?.dir || "/sandbox/.openclaw"fallback mirrors the existing installer convention inskill-install.ts:119exactly. That's the crux of the #4709 fix:OPENCLAW_STATE_DIR = configDirandOPENCLAW_WORKSPACE_DIR = configDir/workspacenow resolve to the same root the installer writes to, soinstallandliststop drifting. Non-OpenClaw agents (Hermes, name !== "openclaw") are correctly skipped; null → OpenClaw defaults, consistent with skill-install.nemoclaw-start.sh:OPENCLAW_HOME/OPENCLAW_STATE_DIRwere already in the runtime-shell propagation list; addingOPENCLAW_WORKSPACE_DIRreuses the same single-quote-escaped emission, so all three reachopenshell sandbox execshells.onboard.tsintegration is behavior-preserving — theappendHermesDashboardEnvArgscollapse is just the net-neutral line-budget adjustment (CodeRabbit's "Major" finding is already satisfied;codebase-growth-guardrailspasses). Same args, same order.
Unit tests cover default/configured/missing-configPaths/append/Hermes-skip; CI is green. CodeRabbit's e2e-location finding was correctly refuted by the author.
One note, not blocking: the cross-component contract (OpenClaw honoring these vars so install/list converge) is exercised only by the new test-openclaw-skill-cli-e2e.sh, which is a manually-dispatched nightly needing NVIDIA_API_KEY — so it isn't covered by PR CI. The NemoClaw-side arg construction and propagation are unit-tested, which is the part this repo owns. Good to merge.
## Summary
- Add the v0.0.59 release notes from the GitHub announcement discussion.
- Refresh local inference and credential-storage guidance for the
current release behavior.
- Regenerate the user skills from the updated Fern docs.
- Tighten release-prep and docs review guidance for generated skills, PR
labels, and shared `$$nemoclaw` command placeholders.
## Verification
- `python3 scripts/docs-to-skills.py docs/ .agents/skills/ --prefix
nemoclaw-user --doc-platform fern-mdx`
- `rg "permissive mode|shields down|shields up|shields status|config
rotate-token|rotate-token" --glob '*.{md,mdx}'`
- `git diff --check`
- `npm run docs` (rerun outside sandbox after sandbox-only `tsx` IPC
permission failure)
- `npm run typecheck:cli`
- Pre-commit hooks during commit passed, including markdownlint,
docs-to-skills verification, gitleaks, commitlint, and skills YAML
tests.
## Source Summary
- #3679, #4437, #4681, #4766, #4772, #4775, #4786 ->
`docs/about/release-notes.mdx`, `docs/reference/commands.mdx`,
`docs/reference/troubleshooting.mdx`: Summarize OpenClaw 2026.5.27
compatibility, runtime path pinning, plugin registry recovery, live
gateway reconciliation, and clearer host-alias/startup diagnostics.
- #4332, #4402, #4769, #4776, #4779 -> `docs/about/release-notes.mdx`,
`docs/inference/inference-options.mdx`,
`docs/inference/use-local-inference.mdx`,
`docs/inference/switch-inference-providers.mdx`: Document the release
inference changes covering Local NIM waits, Hermes Anthropic routing,
Nemotron 3 Ultra, the current Ollama starter fallback, and Spark
managed-vLLM context length.
- #4628, #4652, #4733, #4745 -> `docs/about/release-notes.mdx`,
`docs/security/credential-storage.mdx`,
`docs/manage-sandboxes/messaging-channels.mdx`,
`docs/reference/troubleshooting.mdx`: Capture permission healing,
gateway-stored credential reuse, cross-sandbox messaging credential
conflict checks, and CDI preflight diagnostics.
- #4728, #4737, #4743, #4744, #4782 -> `.agents/skills/nemoclaw-user-*`:
Regenerate the user skill references from the updated source docs.
- Follow-up maintenance ->
`.agents/skills/nemoclaw-contributor-update-docs/SKILL.md`,
`.coderabbit.yaml`: Add release-prep area labels for docs and skills
PRs, and teach docs review guidance that `$$nemoclaw` is the correct
shared command placeholder for examples that work across agent aliases.
Note: the `documentation` label was not present in the repository, so
this PR is labeled with `v0.0.59` only.
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Documentation**
* Updated default model for local Ollama inference setup to qwen3.5:9b
* Added Nemotron 3 Ultra 550B as an NVIDIA Endpoints model option
* Clarified credential storage and reuse behavior for post-deployment
(day-two) operations
* Added v0.0.59 release notes covering OpenClaw compatibility, inference
options, Hermes messaging sync, and troubleshooting
* Clarified CLI selection guidance and updated OpenClaw version example
in status output
* Revised release-prep instructions and docs review guidance for CLI
alias usage
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
…ng onboard (#4598) ## Summary Two preflight cleanup paths assumed the OpenShell gateway and dashboard forward were process-wide singletons. When a second NemoClaw onboard ran with `NEMOCLAW_GATEWAY_PORT=N`, the preflight retired the existing per-port gateway as "legacy" and killed the first sandbox's dashboard SSH forward — leaving the first sandbox unreachable. This PR scopes both cleanups so the second instance starts its own gateway alongside the first instead of replacing it. ## Related Issue Fixes #4422 · Refs #3053 #4422 is the specific SIGKILL-on-second-onboard symptom: a second onboard with `NEMOCLAW_GATEWAY_PORT=N` destroyed the previous instance's per-port gateway and dashboard forward. This PR fixes both preflight cleanups so concurrent onboards no longer step on each other. #3053 is the broader ask — full multi-instance segregation of registry, credentials, snapshots, messaging, and lifecycle behind a configurable `NEMOCLAW_INSTANCE` identity. That work is out of scope here and tracked separately; this PR removes the destructive cross-talk that previously prevented two NemoClaw-managed sandboxes from coexisting at all, but does not yet introduce the instance identity primitive. ## Changes - `src/lib/onboard/machine/handlers/gateway.ts`: skip `retireLegacyGatewayForDockerDriverUpgrade` when `gatewayReuseState === "foreign-active"`. A foreign-active gateway is another sandbox's per-port `nemoclaw-<port>` — not legacy state to retire. Normalises to `"missing"` so the current onboard proceeds with its own per-port gateway alongside. - `src/lib/onboard.ts`: dashboard-port preflight no longer kills an "orphaned SSH port-forward" when `openshell forward list` shows the port is held by another live sandbox. The runtime allocator picks a different dashboard port for this sandbox at create time instead. - `src/lib/onboard/machine/handlers/gateway.test.ts`: unit test for the foreign-active no-retire branch. - `test/e2e/test-concurrent-gateway-ports.sh`: new E2E that onboards two sandboxes (default + `NEMOCLAW_GATEWAY_PORT=18080`), asserts both reach `Ready`, distinct gateway ports (8080 + 18080), distinct dashboard ports (18789 + 18790), and that destroying one leaves the other intact. Each sandbox is queried via its own gateway with `openshell sandbox list -g <gateway-name>` so the global active-gateway pointer does not flip the read. - `.github/workflows/nightly-e2e.yaml`: registers `concurrent-gateway-ports-e2e` in the dispatchable-jobs catalog, `needs` lists, and the advisor comment block. Also documents existing `openclaw-skill-cli-e2e` and `channels-add-remove-e2e` in the catalog so the PR-review E2E advisor surfaces them when relevant changes land — catches up leftover automation from PRs #4766 (#4709 OpenClaw skill CLI) and #4745 (#3895 channels add/remove) where the tests shipped but were never advertised to the advisor. ## Type of Change - [x] Code change (feature, bug fix, or refactor) - [ ] Code change with doc updates - [ ] Doc only (prose changes, no code sample modifications) - [ ] Doc only (includes code sample changes) ## Verification - [x] `npx prek run --all-files` passes - [x] `npm test` passes - [x] Tests added or updated for new or changed behavior - [x] No secrets, API keys, or credentials committed - [ ] Docs updated for user-facing behavior changes - [ ] `npm run docs` builds without warnings (doc changes only) - [ ] Doc pages follow the [style guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md) (doc changes only) - [ ] New doc pages include SPDX header and frontmatter (new pages only) --- Signed-off-by: Tinson Lai <tinsonl@nvidia.com> <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Manage concurrent gateway ports safely across multiple sandboxes on the same host. * **Bug Fixes** * Improved cleanup for orphaned SSH port-forwards that block dashboard ports. * **Tests** * Added E2E test validating concurrent gateway-port scenarios. * Added/updated unit tests for gateway-state and orphaned-forward handling. * **Chores** * Added nightly E2E workflow job for concurrent gateway port testing and integrated it into reporting. * **Documentation** * Expanded nightly E2E job documentation for related tests. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Signed-off-by: Tinson Lai <tinsonl@nvidia.com> Co-authored-by: Carlos Villela <cvillela@nvidia.com>
Summary
Workspace-installed OpenClaw skills disappear from
openclaw skills listinside the sandbox because the upstream skill loader can fall back to a hardcodedDEFAULT_AGENT_WORKSPACE_DIRthat drifts from where install wrote. InjectOPENCLAW_HOME,OPENCLAW_STATE_DIR, andOPENCLAW_WORKSPACE_DIRat sandbox creation and propagate them into the sandbox's runtime shell rc so install and list resolve identical paths regardless of the base-image HOME.Related Issue
Fixes #4709
Changes
src/lib/onboard/openclaw-runtime-env.ts: new helperappendOpenClawRuntimeEnvArgs(envArgs, agent)that derivesOPENCLAW_HOME,OPENCLAW_STATE_DIR, andOPENCLAW_WORKSPACE_DIRfromagent.configPaths.dirand gates injection to OpenClaw agents only so non-OpenClaw sandboxes (e.g. Hermes) do not pick up cross-agent state.src/lib/onboard/openclaw-runtime-env.test.ts: unit tests for the helper covering default dir, OpenClaw configured dir, missingconfigPaths, append semantics, and Hermes-skip.src/lib/onboard.ts: call the new helper fromcreateSandbox; collapse the adjacentappendHermesDashboardEnvArgscall to keeponboard.tsnet-neutral against thecodebase-growth-guardrailsbudget.scripts/nemoclaw-start.sh: addOPENCLAW_WORKSPACE_DIRto the env-name list thatwrite_runtime_shell_envemits into/tmp/nemoclaw-proxy-env.sh, so the new var reaches every shell spawned byopenshell sandbox exec(not onlynemoclaw-start).test/onboard.test.ts: extend the existing dashboard-port envArgs test with regex assertions on the three new pins in thesandbox createcommand.test/e2e/test-openclaw-skill-cli-e2e.sh: new deterministic E2E with 7 phases — pre-flight runtime env propagation, write SKILL.md inside sandbox/tmp,openclaw skills install <path>, on-disk check at$OPENCLAW_WORKSPACE_DIR/skills/<id>,openclaw skills list --jsonenumeration + source label,openclaw skills infoworkspace path resolution, andopenclaw skills checkeligibility..github/workflows/nightly-e2e.yaml: registeropenclaw-skill-cli-e2eas a manually dispatchable nightly job. It needsNVIDIA_API_KEYfor the same reasonskill-agent-e2edoes — the install phase onboards a real sandbox viainstall.sh, which goes through the full inference-provider preflight; there is no fake-provider seam at that layer today.Type of Change
Verification
npx prek run --all-filespassesnpm testpassesnpm run docsbuilds without warnings (doc changes only)Signed-off-by: Tinson Lai tinsonl@nvidia.com
Summary by CodeRabbit
New Features
Tests
Chores