fix(ci): stabilize platform vitest main workflow by cv · Pull Request #4715 · NVIDIA/NemoClaw

cv · 2026-06-03T15:27:52Z

Summary

Stabilizes the Platform Vitest Main Watch workflow by making platform-sensitive tests hermetic across macOS and WSL. The fix addresses Bash 3 empty-array behavior on macOS, WSL root permission behavior, and WSL host detection leaking into generic Linux GPU preflight tests.

Changes

Avoid empty-array command expansion in scripts/nemoclaw-start.sh when refreshing the OpenClaw config hash as non-root.
Make test/nemoclaw-start.test.ts fixtures tolerate macOS Bash 3 and WSL root runners.
Pin generic Linux GPU preflight tests to a non-WSL kernel fixture.
Keep the Docker group re-exec installer test Linux-only and use a focused ensure_docker harness.
Verified the fixed workflow with manual run https://github.com/NVIDIA/NemoClaw/actions/runs/26894436087.

Type of Change

Code change (feature, bug fix, or refactor)
Code change with doc updates
Doc only (prose changes, no code sample modifications)
Doc only (includes code sample changes)

Verification

npx prek run --all-files passes
npm test passes
Tests added or updated for new or changed behavior
No secrets, API keys, or credentials committed
Docs updated for user-facing behavior changes
npm run docs builds without warnings (doc changes only)
Doc pages follow the style guide (doc changes only)
New doc pages include SPDX header and frontmatter (new pages only)

Signed-off-by: Carlos Villela cvillela@nvidia.com

Summary by CodeRabbit

Tests
- Enhanced sandbox startup and GPU preflight validation test coverage
- Improved Docker installation re-execution test to better handle cross-platform scenarios
Chores
- Refined sandbox initialization process for improved reliability across execution environments

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

coderabbitai · 2026-06-03T15:28:06Z

📝 Walkthrough

Walkthrough

This PR hardens sandbox startup security by adding explicit root/non-root branching to the config hash refresh function, ensuring root-mode operations step-down with proper permission handling. Corresponding test updates comprehensively verify step-down behavior, handle CI edge cases where uid-0 processes may bypass restrictions, isolate Docker installer tests to a minimal harness, and enhance GPU preflight tests with kernel version context.

Changes

Sandbox Step-Down Security and Test Coverage

Layer / File(s)	Summary
Step-down security in config hash refresh `scripts/nemoclaw-start.sh`	`ensure_mutable_openclaw_config_hash` replaces the `run_prefix` array pattern with explicit `id -u == 0` branching: when running as root, the hash/chmod logic executes through `STEP_DOWN_PREFIX_SANDBOX` and fails with a security error if step-down fails; non-root path runs the same logic directly.
Step-down behavior and CI edge case testing `test/nemoclaw-start.test.ts`	Test suite updates cover step-down wrapper invocation with environment preservation (`env` stub) and handle CI environments where uid-0 processes may unexpectedly bypass restrictions; conditional fixture reset allows test to verify step-down path repairs hash even when direct probe succeeds in CI root context. Helper-chain redirect configuration is updated to include test redirect stub.
Docker installer test refactoring to harness-based function extraction `test/install-docker-group-reexec.test.ts`	Test extracts the `ensure_docker` shell function from `scripts/install.sh` into memory, builds a minimal Bash harness with strict settings and environment shim, and sources that harness instead of the full installer script. Suite is now Linux-only via `describeLinux` wrapper, isolating assertions about `sg docker` re-exec behavior and loop guards from platform-specific Bash compatibility concerns.
GPU preflight test with explicit kernel version context `src/lib/onboard/sandbox-gpu-preflight.test.ts`	Test case "keeps generic Linux sandbox GPU preflight on the CDI path" now passes explicit `release` and `procVersion` fields to `validateSandboxGpuPreflight`, providing concrete Linux kernel version (6.8.0-generic) as input context.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

NVIDIA/NemoClaw#4528: Earlier iteration of the step-down refactor to ensure_mutable_openclaw_config_hash with corresponding test suite updates.
NVIDIA/NemoClaw#4650: Related changes to sandbox-gpu-preflight.test.ts and install-docker-group-reexec.test.ts involving test input and platform-gating updates.

Suggested labels

fix, v0.0.57

Suggested reviewers

prekshivyas
jyaunches

Poem

🐰 A rabbit hops through sandbox paths with care,
Step-down security takes the proper stair,
Root and non-root branches, tested right,
Linux harnesses shine with platform light! 🌙

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly identifies the PR as a fix targeting CI/platform issues in the vitest workflow, matching the substantial changes across shell scripts and test files addressing platform compatibility.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/platform-vitest-main

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2026-06-03T15:31:13Z

E2E Advisor Recommendation

Required E2E: runtime-overrides-e2e, shields-config-e2e, cloud-e2e
Optional E2E: sandbox-operations-e2e, gpu-e2e

Dispatch hint: runtime-overrides-e2e,shields-config-e2e,cloud-e2e

Auto-dispatched E2E: runtime-overrides-e2e, shields-config-e2e, cloud-e2e via nightly-e2e.yaml at 2119821c7b591566bd90e6d21b3ccdf84640a732 — nightly run

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

runtime-overrides-e2e (medium): Directly exercises container entrypoint startup, runtime model/provider override patching, and verifies openclaw.json .config-hash remains valid after startup mutations. This is the closest existing E2E coverage for the changed hash-refresh logic.
shields-config-e2e (medium): Validates live sandbox mutable-default vs shields-up config behavior, content-seal drift detection, and config hash/security lifecycle. The PR changes how the mutable config hash is refreshed at sandbox start.
cloud-e2e (high): Runs the primary real user flow: install, onboard, sandbox start, and live OpenClaw inference. Because the changed entrypoint code can prevent sandbox startup or assistant readiness, this should be merge-blocking.

Optional E2E

sandbox-operations-e2e (high): Useful broader confidence for sandbox lifecycle operations and recovery, but the targeted config-hash/startup behavior is already covered more directly by runtime-overrides, shields-config, and full cloud E2E.
gpu-e2e (high): Optional only: a GPU preflight test file changed, but no GPU preflight runtime source changed. Run this if reviewers want platform/GPU confidence beyond unit coverage.

New E2E recommendations

sandbox startup config hash under dropped capabilities (medium): Existing E2E validates hash correctness, but the specific production failure mode described by the tests—root entrypoint lacking CAP_DAC_OVERRIDE and needing sandbox-owner step-down to refresh .config-hash—is mostly simulated in unit tests. A live E2E would reduce risk for future entrypoint privilege regressions.
- Suggested test: Add an E2E that starts a real OpenClaw sandbox with mutable /sandbox/.openclaw, forces a stale or read-only .config-hash condition, restarts the sandbox entrypoint, and asserts startup succeeds with a freshly valid .config-hash and working inference.

Dispatch hint

Workflow: .github/workflows/nightly-e2e.yaml
jobs input: runtime-overrides-e2e,shields-config-e2e,cloud-e2e

github-actions · 2026-06-03T15:31:15Z

E2E Scenario Advisor Recommendation

Required scenario E2E: ubuntu-repo-cloud-openclaw
Optional scenario E2E: wsl-repo-cloud-openclaw

Dispatch required scenario E2E:

gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=ubuntu-repo-cloud-openclaw

Workflow run

Full scenario advisor summary

E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required scenario E2E

ubuntu-repo-cloud-openclaw: scripts/nemoclaw-start.sh changes sandbox startup/config-hash/auth-profile behavior. The canonical Ubuntu repo OpenClaw scenario exercises the normal Docker sandbox startup path and smoke/baseline onboarding coverage on the primary hosted runner.
- Dispatch: gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=ubuntu-repo-cloud-openclaw

Optional scenario E2E

wsl-repo-cloud-openclaw: Optional adjacent platform coverage for the same repo/cloud OpenClaw startup path under WSL, where UID/permission and Docker Desktop behavior can differ. This is a special-runner scenario, so it is optional rather than primary.
- Dispatch: gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=wsl-repo-cloud-openclaw

Relevant changed files

scripts/nemoclaw-start.sh

github-actions · 2026-06-03T15:31:34Z

PR Review Advisor

Findings: 0 needs attention, 1 worth checking, 0 nice ideas
Top item: Validate mutable config hash refresh under the real startup privilege boundary

Review findings

🛠️ Needs attention

None.

🔎 Worth checking

Validate mutable config hash refresh under the real startup privilege boundary (scripts/nemoclaw-start.sh:678): The production change is in `ensure_mutable_openclaw_config_hash()`, a sandbox startup path that depends on the real `STEP_DOWN_PREFIX_SANDBOX` behavior and CAP_DAC_OVERRIDE-dropped root semantics. The updated tests exercise the root/non-root branches with shell stubs and a portable EACCES surrogate, but they do not prove the actual entrypoint/setpriv behavior in a runtime environment.
- Recommendation: Add or identify a targeted runtime/integration validation that starts the entrypoint with the real step-down prefix and verifies a mutable sandbox-owned `.config-hash` is refreshed in both root and non-root startup modes without regressing the locked/root-owned config path.
- Evidence: The diff changes `scripts/nemoclaw-start.sh` to branch at uid=0 and invoke `"${STEP_DOWN_PREFIX_SANDBOX[@]}" sh -c ...`; `test/nemoclaw-start.test.ts` verifies this through stubbed `id`, `openclaw_config_dir_owner`, and `STEP_DOWN_PREFIX_SANDBOX`, and the deterministic test-depth context flags `scripts/nemoclaw-start.sh` for runtime validation.

🌱 Nice ideas

None.

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

coderabbitai

🧹 Nitpick comments (1)

test/install-docker-group-reexec.test.ts (1)
37-52: 💤 Low value

Consider clarifying the macOS comment.

The comment on lines 41-42 mentions "macOS platform tests" but the suite is now Linux-only via describeLinux (line 24), so these tests won't run on macOS. The comment is explaining the design rationale (avoiding Bash 3 compatibility issues discovered during development), but it reads as if macOS might still execute these tests. Consider rephrasing to: "Keep the harness focused on ensure_docker to avoid Bash 3 compatibility issues (originally discovered on macOS) and isolate test behavior from the installer's full top-level state."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/install-docker-group-reexec.test.ts` around lines 37 - 52, Update the
inline comment above the test harness creation to clarify that the harness is
focused on ensure_docker to avoid Bash 3 compatibility issues (originally
discovered on macOS) and to isolate test behavior from the installer's top-level
Bash state; change the existing macOS-focused wording to the suggested phrasing
and keep references around harnessDir, installHarness,
installer_non_interactive, and ENSURE_DOCKER_FUNCTION intact so only the comment
text is modified.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@test/install-docker-group-reexec.test.ts`:
- Around line 37-52: Update the inline comment above the test harness creation
to clarify that the harness is focused on ensure_docker to avoid Bash 3
compatibility issues (originally discovered on macOS) and to isolate test
behavior from the installer's top-level Bash state; change the existing
macOS-focused wording to the suggested phrasing and keep references around
harnessDir, installHarness, installer_non_interactive, and
ENSURE_DOCKER_FUNCTION intact so only the comment text is modified.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 2302a212-1900-4390-9b62-0130c79cb2f0

📥 Commits

Reviewing files that changed from the base of the PR and between 325ed77 and 2119821.

📒 Files selected for processing (4)

scripts/nemoclaw-start.sh
src/lib/onboard/sandbox-gpu-preflight.test.ts
test/install-docker-group-reexec.test.ts
test/nemoclaw-start.test.ts

github-actions · 2026-06-03T15:38:43Z

Selective E2E Results — ✅ All requested jobs passed

Run: 26895200572
Target ref: 2119821c7b591566bd90e6d21b3ccdf84640a732
Workflow ref: main
Requested jobs: runtime-overrides-e2e,shields-config-e2e,cloud-e2e
Summary: 3 passed, 0 failed, 0 skipped

Job	Result
cloud-e2e	✅ success
runtime-overrides-e2e	✅ success
shields-config-e2e	✅ success

cv added 2 commits June 3, 2026 07:47

fix(ci): stabilize platform vitest main workflow

1d802a0

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

test(ci): limit docker group reexec test to linux

2119821

Signed-off-by: Carlos Villela <cvillela@nvidia.com>

cv self-assigned this Jun 3, 2026

coderabbitai Bot reviewed Jun 3, 2026

View reviewed changes

cv merged commit b61ed2b into main Jun 3, 2026
37 of 38 checks passed

cv deleted the fix/platform-vitest-main branch June 3, 2026 15:57

wscurran added the bug-fix PR fixes a bug or regression label Jun 8, 2026

coderabbitai Bot mentioned this pull request Jun 9, 2026

fix(sandbox): restore mutable OpenClaw config perms after raw in-sandbox doctor --fix (#4538) #5017

Merged

4 tasks

Conversation

cv commented Jun 3, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Type of Change

Verification

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

github-actions Bot commented Jun 3, 2026

E2E Advisor Recommendation

E2E Recommendation Advisor

Required E2E

Optional E2E

New E2E recommendations

Dispatch hint

Uh oh!

github-actions Bot commented Jun 3, 2026

E2E Scenario Advisor Recommendation

E2E Scenario Advisor

Required scenario E2E

Optional scenario E2E

Relevant changed files

Uh oh!

github-actions Bot commented Jun 3, 2026

PR Review Advisor

🛠️ Needs attention

🔎 Worth checking

🌱 Nice ideas

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented Jun 3, 2026

Selective E2E Results — ✅ All requested jobs passed

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

cv commented Jun 3, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 3, 2026 •

edited

Loading