fix(onboard): pin openshell fetch to blueprint max_openshell_version (#3404)#3446
Conversation
…VIDIA#3404) The reactive blueprint bump in main only kicks the can down the road — the installer would re-trip the compatibility gate on the next OpenShell release bump. This makes the installer self-correct by querying published OpenShell releases and pinning the fetch to the highest release ≤ the blueprint's max_openshell_version. If no compatible release exists, the install fails with a clear error naming both the published latest and the configured max. - Adds a pure resolver (resolveOpenshellInstallVersion) plus boundary-safe tag parser (parseOpenshellReleaseTag) to onboard/openshell-install.ts. - installOpenshell() in onboard.ts lists release tags via gh/curl and passes the resolved version through NEMOCLAW_OPENSHELL_PIN_VERSION. - install-openshell.sh honours that env (validated against ^X.Y.Z$) and falls back to the hardcoded pin only when the resolver was unable to reach GitHub. - Unit tests cover the four required cases: latest > max → highest ≤ max; latest ≤ max → latest unchanged; no release ≤ max → clear error; max missing → no-max fallback. Plus malformed-input boundary cases and the QA repro (latest=0.0.38, max=0.0.36 → 0.0.36). Signed-off-by: Dongni Yang <dongniy@nvidia.com> Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughThis PR implements OpenShell version resolution during installation: it parses GitHub release tags, resolves a compatible version per blueprint min/max constraints, composes MIN/MAX/PIN env overlays, validates blueprint overrides in the installer script, and runs the installer with the computed environment while updating host PATH and cached bin path. ChangesOpenShell Install Version Resolution
Sequence Diagram(s)sequenceDiagram
participant Onboard as Onboard.installOpenshell
participant PinFlow as openshellPinFlow
participant TagFetcher as listOpenshellReleaseTags
participant GitHub as GitHub (gh / curl)
participant Resolver as resolveOpenshellInstallVersion
participant Installer as scripts/install-openshell.sh
Onboard->>PinFlow: runOpenshellInstall(deps)
PinFlow->>TagFetcher: fetch tags
TagFetcher->>GitHub: gh release list / curl pages
GitHub-->>TagFetcher: tag list or failure
PinFlow->>Resolver: available tags + blueprint max
Resolver-->>PinFlow: pin / no-max / incompatible
PinFlow->>Installer: execute with computed env (MIN/MAX/PIN or base)
Installer-->>PinFlow: exit code + output
PinFlow-->>Onboard: OpenShellInstallResult (installed, env, messages)
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~22 minutes Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@scripts/install-openshell.sh`:
- Around line 47-53: The script currently validates
NEMOCLAW_OPENSHELL_PIN_VERSION and fails early even for a dev install; update
the logic so validation of NEMOCLAW_OPENSHELL_PIN_VERSION (and assignment to
PIN_VERSION) is skipped when the dev channel/RELEASE_TAG is selected (i.e., when
RELEASE_TAG == "dev" or equivalent channel flag), by moving the validation block
to after RELEASE_TAG/channel resolution or by adding a guard like if [
"$RELEASE_TAG" != "dev" ]; then ... fi; reference the
NEMOCLAW_OPENSHELL_PIN_VERSION variable, PIN_VERSION assignment, and RELEASE_TAG
check to locate and modify the conditional so malformed pin values do not abort
dev installs.
In `@src/lib/onboard.ts`:
- Around line 2777-2838: The current listOpenshellReleaseTags implementation
only fetches up to 100 releases via a single gh (--limit 100) or curl
(per_page=100) call; update listOpenshellReleaseTags to paginate both gh and
curl calls (use page numbers or gh's paging flags and curl's
?per_page=100&page=N) and accumulate tagName/tag_name results until an empty
page is returned or no more results, then return the full array so
resolveOpenshellInstallVersion gets the complete release list; ensure you keep
the same JSON parsing/filtering logic (tagName for gh, tag_name for curl) and
preserve timeouts/env behavior.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: c57c3db3-031a-4de2-90e1-4d3d024bdd0a
📒 Files selected for processing (4)
scripts/install-openshell.shsrc/lib/onboard.tssrc/lib/onboard/openshell-install.tstest/onboard.test.ts
) Addresses CodeRabbit review feedback and the onboard-entrypoint-budget check on NVIDIA#3446, and widens scope so the shell installer's MIN/MAX constants are no longer a second source of truth: - Move installOpenshell() body to src/lib/onboard/openshell-pin.ts so src/lib/onboard.ts shrinks by 19 net lines (entrypoint budget passes). - Paginate published OpenShell release lookups: gh release list uses --limit 1000; the curl fallback walks per_page=100&page=N until the page is short. Older compatible versions are reachable once NVIDIA/OpenShell exceeds one page. - Skip NEMOCLAW_OPENSHELL_PIN_VERSION (and now MIN/MAX) validation on the dev channel — that path installs from RELEASE_TAG=dev and never consults PIN_VERSION, so a malformed override should not abort a dev install. - TS pin module overlays NEMOCLAW_OPENSHELL_MIN_VERSION and NEMOCLAW_OPENSHELL_MAX_VERSION from the blueprint onto spawn env; install-openshell.sh validates each against ^X.Y.Z$ and overrides its hardcoded MIN_VERSION / MAX_VERSION when the env is present and the channel is stable. - Unit tests cover the new MIN/MAX/PIN overlay and the orchestrator fallback paths (offline, no max, no release ≤ max). Signed-off-by: Dongni Yang <dongniy@nvidia.com> Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@scripts/install-openshell.sh`:
- Around line 78-84: The current block overwrites PIN_VERSION with MAX_VERSION
even when the resolver returned the sentinel "no-max", losing the hardcoded
legacy pin; change the assignment so PIN_VERSION is only set from MAX_VERSION
when MAX_VERSION is a real version (e.g. non-empty and not "no-max"). In the
block that computes MAX_VERSION (using apply_blueprint_override and the
NEMOCLAW_OPENSHELL_MAX_VERSION variable), wrap the PIN_VERSION="$MAX_VERSION"
assignment in a conditional that checks MAX_VERSION != "no-max" (and non-empty)
so legacy PIN_VERSION remains when resolver signals no-max.
In `@src/lib/onboard/openshell-pin.ts`:
- Around line 100-107: The current logic in openshell-pin.ts returns accumulated
tags on non-first pages when a paginated fetch (result from the shell command
parsed in the block handling `result.status` and `JSON.parse(result.stdout)`)
fails, which yields incomplete release lists; change both failure branches so
they return null instead of `page === 1 ? null : tags`, ensuring
`resolveOpenshellInstallPin()` receives `null` on any page-level failure and
avoids treating partial `tags` as authoritative.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: bfe1494e-120e-4879-ba99-c5c39c9d4f70
📒 Files selected for processing (4)
scripts/install-openshell.shsrc/lib/onboard.tssrc/lib/onboard/openshell-pin.tstest/onboard.test.ts
…NVIDIA#3404) Address two CodeRabbit major findings on NVIDIA#3446: - scripts/install-openshell.sh no longer defaults PIN_VERSION to a blueprint MAX env override. When the TS resolver couldn't reach GitHub it only sets MIN/MAX (never PIN), and the script's hardcoded PIN_VERSION is the known-good safe fallback. Letting the blueprint MAX silently drive the pin meant a bumped blueprint + rate-limited install would probe an unreleased tag. - src/lib/onboard/openshell-pin.ts paginated curl fetch returns null on any per-page failure rather than the partial accumulated list. A page-2+ failure can hide older compatible releases, which would cause the resolver to falsely return "incompatible" and abort the install — null preserves the legacy fallback. Signed-off-by: Dongni Yang <dongniy@nvidia.com> Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
NVIDIA#3404) While running the issue NVIDIA#3404 repro locally with NEMOCLAW_OPENSHELL_PIN_VERSION set to a malformed value, the script exited 1 with no visible error: the `apply_blueprint_override` helper called `fail` inside a `$(...)` substitution, which captured the stderr-less error message into the variable assignment instead of letting it reach the user. Inline the regex check at each call site so `fail`'s message reaches stderr when an override is malformed. Behaviour for valid overrides is unchanged. Verified end-to-end against the real install-openshell.sh: - A. issue NVIDIA#3404 repro (resolver picked 0.0.36) → "Installing OpenShell from release 'v0.0.36'..." - B. today (resolver picked 0.0.39) → "Installing OpenShell from release 'v0.0.39'..." - C. offline (TS only set MIN/MAX, no PIN) → script keeps hardcoded 'v0.0.39' - D. dev channel with garbage overrides → "Installing OpenShell from release 'dev'..." - E. malformed PIN on stable → fails loudly with the env-name in the message Signed-off-by: Dongni Yang <dongniy@nvidia.com> Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@scripts/install-openshell.sh`:
- Around line 61-69: The fail() helper currently uses echo which writes to
stdout, contradicting the block comment claiming errors reach stderr; update the
fail() implementation to send its message(s) to stderr (e.g., use echo ... >&2
or printf ... >&2) and preserve the existing exit behavior so all callers
(including the validation block around MIN/MAX/PIN) emit errors on stderr as
intended.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 6305b173-6b02-4a15-9bee-6e3396ece353
📒 Files selected for processing (1)
scripts/install-openshell.sh
…VIDIA#3404) CodeRabbit minor on NVIDIA#3446: the comment introduced in 18fe705 claimed fail()'s output reaches stderr, but fail() (Line 14-17) still echoed to stdout. Today the inlined validation avoids $(...) capture, so the user does see the message, but the stderr claim itself was false. Redirect fail()'s echo to stderr (>&2) so it matches both the comment and conventional shell error reporting. The non-error path (info/warn) is untouched. The validation continues to live outside $(...) so a future caller cannot accidentally capture it. Signed-off-by: Dongni Yang <dongniy@nvidia.com> Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ect (NVIDIA#3404) CI fixup. The three pre-existing version-check tests asserted that fail()'s error string appeared on stdout. Commit 6c4187c redirected fail() to stderr (per CodeRabbit minor on NVIDIA#3446), so the assertions have to move with it. Behaviour is unchanged — only the stream the error lands on. Signed-off-by: Dongni Yang <dongniy@nvidia.com> Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
The #3478 regression guard is now on main and I merged main into this branch (maintainer edits are enabled) so the PR branch can exercise it. Targeted dispatch against the updated PR-A preview branch is still red:
So #3446 looks good for the fresh-install resolver path, but it does not satisfy Margaret's sticky/previously-installed OpenShell path from #3474 yet. The remaining fix is in |
If a host already has an OpenShell release newer than this NemoClaw release supports, reinstall the pinned compatible release instead of hard-failing before the download path. This lets the NVIDIA#3474 regression guard pass while preserving the post-install blueprint max gate. Related: NVIDIA#3474
✅ Coverage guard verified
This verifies the sticky OpenShell path from #3474: an already-installed Local validation also passed: |
## Summary - Bump the docs metadata and version switcher to `0.0.41`. - Add v0.0.41 release notes plus operator guidance for OpenShell pinning, Docker bridge reachability, Local Ollama proxy reachability, and Docker GPU onboarding diagnostics. - Refresh generated `nemoclaw-user-*` skills from the updated docs. ## Source summary - #3434 -> `docs/reference/commands.md`, `docs/reference/troubleshooting.md`, `docs/about/release-notes.md`: Document Linux Docker-driver GPU onboarding behavior, diagnostics, cleanup guidance, and the `NEMOCLAW_DOCKER_GPU_PATCH` troubleshooting escape hatch. - #3483 -> `docs/about/release-notes.md`: Note that `nemoclaw uninstall` removes all installer-managed OpenShell helper binaries unless `--keep-openshell` is passed. - #3446 -> `docs/reference/commands.md`, `docs/reference/troubleshooting.md`, `docs/about/release-notes.md`: Document blueprint-driven OpenShell install pin resolution and fallback behavior. - #3472 -> `docs/inference/use-local-inference.md`, `docs/reference/troubleshooting.md`, `docs/about/release-notes.md`: Document sandbox-side Local Ollama auth proxy reachability checks and firewall remediation. - #3459 -> `docs/reference/commands.md`, `docs/reference/troubleshooting.md`, `docs/about/release-notes.md`: Document Docker-driver sandbox-to-gateway reachability checks and firewall remediation. ## Test plan - `python3 scripts/docs-to-skills.py docs/ .agents/skills/ --prefix nemoclaw-user` - `make docs` - `git diff --check` - `npm run build:cli` - `npm run typecheck:cli` - pre-commit hooks during `git commit` <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Added `nemoclaw inference get` command to check current inference settings * Improved gateway health validation with Linux firewall remediation guidance * **Bug Fixes** * Enhanced proxy readiness validation with sandbox network path probes * Improved local Ollama route onboarding with rerun-safe fixes * Better sandbox-to-gateway connectivity detection * **Documentation** * Expanded troubleshooting guidance for firewall and connectivity issues * Updated CLI reference with new command and environment variable documentation * Added gateway binding and Docker-driver GPU compatibility guidance <!-- review_stack_entry_start --> [](https://app.coderabbit.ai/change-stack/NVIDIA/NemoClaw/pull/3531) <!-- review_stack_entry_end --> <!-- end of auto-generated comment: release notes by coderabbit.ai -->
Summary
Fixes #3404. The reactive bump of
max_openshell_versioninnemoclaw-blueprint/blueprint.yamlkeeps QA installs working today, but the same bug recurs on every subsequent OpenShell release because both the blueprint pin and the shell installer's hardcoded version constants would need to be bumped in lockstep.This PR makes
nemoclaw-blueprint/blueprint.yamlthe single source of truth for the OpenShell version envelope:src/lib/onboard/openshell-install.ts:parseOpenshellReleaseTag(tag)— boundary-safe parse (strips leadingv, rejects empty / leading--/ non-X.Y.Z).resolveOpenshellInstallVersion(available, { max }, { versionGte })— pure picker returning one ofpin/no-max/incompatible.src/lib/onboard/openshell-pin.ts:listOpenshellReleaseTags()— paginated GitHub release fetch (gh release list --limit 1000, falling back to a paginatedcurl https://api.github.com/repos/NVIDIA/OpenShell/releases?per_page=100&page=N). Pagination addresses CodeRabbit's "once the repo exceeds 100 releases" concern. Any per-page failure returnsnull(not a partial list), so a page-2+ outage cannot hide an older compatible release and falsely produceincompatible.resolveOpenshellInstallPin(deps)— wires the fetch into the resolver. On any fetch failure (offline, gh missing, rate-limited) it returnsno-maxso the shell installer falls back to its hardcoded pin — no regression vs. main.computeOpenshellInstallEnv(baseEnv, deps)— overlaysNEMOCLAW_OPENSHELL_MIN_VERSION/NEMOCLAW_OPENSHELL_MAX_VERSION/NEMOCLAW_OPENSHELL_PIN_VERSIONonto the spawn env from the blueprint.runOpenshellInstall(deps)— the body of the former in-lineinstallOpenshelllives here sosrc/lib/onboard.tsshrinks rather than grows.scripts/install-openshell.sh:MIN_VERSION,MAX_VERSION,PIN_VERSION) only on the stable channel — dev installs useRELEASE_TAG=devandDEV_MIN_VERSION, so a malformed override no longer aborts a dev install (CodeRabbit minor).^[0-9]+\.[0-9]+\.[0-9]+$and fails fast with a clear error naming the offending env var.NEMOCLAW_OPENSHELL_MAX_VERSIONoverride does not silently updatePIN_VERSION. The TS resolver setsPIN_VERSIONonly when it confidently picked an existing published release; if the fetch failed, the script's hardcodedPIN_VERSIONstays as the known-good fallback (CodeRabbit major).max_openshell_version: "0.0.39"in main is the source of truth.Verification matrix
latest 0.0.42 exceeds blueprint max 0.0.39latest=…, max=…in the messageno-max→ shell uses hardcoded — same as mainghmissing / rate-limitednull→ resolver returnsno-max→ shell uses hardcoded — same as mainNEMOCLAW_OPENSHELL_CHANNEL=dev)RELEASE_TAG=devUnit tests cover the four required cases plus the widened MIN/MAX overlay and the
runOpenshellInstallorchestrator (test/onboard.test.ts→resolveOpenshellInstallVersion (#3404),resolveOpenshellInstallPin (#3404 orchestrator),computeOpenshellInstallEnv overlays MIN/MAX/PIN (#3404 widening)).Entrypoint budget
src/lib/onboard.tsshrinks by 19 net lines (the bulk ofinstallOpenshellmoves intosrc/lib/onboard/openshell-pin.ts). Theonboard-entrypoint-budgetcheck is satisfied.Test plan
npm run build:clinpm run typecheck+npm run typecheck:clinpx vitest run test/onboard.test.ts— 235 passed (17 new)shellcheck scripts/install-openshell.shshfmt -i 2 -ci -bn -d scripts/install-openshell.shsrc/lib/onboard.tsnet delta -19 lines (entrypoint budget passes)v0.0.36shipped 2026-04-23,v0.0.38shipped 2026-05-11). Withmax=0.0.36, the resolver returns{ kind: "pin", version: "0.0.36", reason: "max-cap" }andinstall-openshell.sh(gh/curl stubbed to abort the download) printsInstalling OpenShell from release 'v0.0.36'— the version the old hardcoded path could not select. Pre-PR, that same install would have fetchedv0.0.38and tripped the post-install gate.scripts/install-openshell.sh:v0.0.36v0.0.39v0.0.39(no regression)devtag, overrides ignoredNEMOCLAW_OPENSHELL_PIN_VERSIONon stable → fails loudly with the env-var name in the error (regression caught and fixed mid-PR: the originalapply_blueprint_overridehelper swallowedfail's message into the$(...)capture; commit 18fe705 inlines the regex check so the error reaches stderr).v0.0.40/v0.0.42published on GitHub but blueprint still at0.0.39, resolver auto-caps to0.0.39(same observable install as today). When the maintainer is ready to accept the newer version, a one-line blueprint bump is enough — noinstall-openshell.shedit required.Signed-off-by: Dongni Yang dongniy@nvidia.com
🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Behavior Changes
Tests