Skip to content

fix(ci): stabilize platform vitest main workflow#4715

Merged
cv merged 2 commits into
mainfrom
fix/platform-vitest-main
Jun 3, 2026
Merged

fix(ci): stabilize platform vitest main workflow#4715
cv merged 2 commits into
mainfrom
fix/platform-vitest-main

Conversation

@cv

@cv cv commented Jun 3, 2026

Copy link
Copy Markdown
Collaborator

Summary

Stabilizes the Platform Vitest Main Watch workflow by making platform-sensitive tests hermetic across macOS and WSL. The fix addresses Bash 3 empty-array behavior on macOS, WSL root permission behavior, and WSL host detection leaking into generic Linux GPU preflight tests.

Changes

  • Avoid empty-array command expansion in scripts/nemoclaw-start.sh when refreshing the OpenClaw config hash as non-root.
  • Make test/nemoclaw-start.test.ts fixtures tolerate macOS Bash 3 and WSL root runners.
  • Pin generic Linux GPU preflight tests to a non-WSL kernel fixture.
  • Keep the Docker group re-exec installer test Linux-only and use a focused ensure_docker harness.
  • Verified the fixed workflow with manual run https://github.com/NVIDIA/NemoClaw/actions/runs/26894436087.

Type of Change

  • Code change (feature, bug fix, or refactor)
  • Code change with doc updates
  • Doc only (prose changes, no code sample modifications)
  • Doc only (includes code sample changes)

Verification

  • npx prek run --all-files passes
  • npm test passes
  • Tests added or updated for new or changed behavior
  • No secrets, API keys, or credentials committed
  • Docs updated for user-facing behavior changes
  • npm run docs builds without warnings (doc changes only)
  • Doc pages follow the style guide (doc changes only)
  • New doc pages include SPDX header and frontmatter (new pages only)

Signed-off-by: Carlos Villela cvillela@nvidia.com

Summary by CodeRabbit

  • Tests

    • Enhanced sandbox startup and GPU preflight validation test coverage
    • Improved Docker installation re-execution test to better handle cross-platform scenarios
  • Chores

    • Refined sandbox initialization process for improved reliability across execution environments

cv added 2 commits June 3, 2026 07:47
Signed-off-by: Carlos Villela <cvillela@nvidia.com>
Signed-off-by: Carlos Villela <cvillela@nvidia.com>
@cv cv self-assigned this Jun 3, 2026
@coderabbitai

coderabbitai Bot commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

📝 Walkthrough

Walkthrough

This PR hardens sandbox startup security by adding explicit root/non-root branching to the config hash refresh function, ensuring root-mode operations step-down with proper permission handling. Corresponding test updates comprehensively verify step-down behavior, handle CI edge cases where uid-0 processes may bypass restrictions, isolate Docker installer tests to a minimal harness, and enhance GPU preflight tests with kernel version context.

Changes

Sandbox Step-Down Security and Test Coverage

Layer / File(s) Summary
Step-down security in config hash refresh
scripts/nemoclaw-start.sh
ensure_mutable_openclaw_config_hash replaces the run_prefix array pattern with explicit id -u == 0 branching: when running as root, the hash/chmod logic executes through STEP_DOWN_PREFIX_SANDBOX and fails with a security error if step-down fails; non-root path runs the same logic directly.
Step-down behavior and CI edge case testing
test/nemoclaw-start.test.ts
Test suite updates cover step-down wrapper invocation with environment preservation (env stub) and handle CI environments where uid-0 processes may unexpectedly bypass restrictions; conditional fixture reset allows test to verify step-down path repairs hash even when direct probe succeeds in CI root context. Helper-chain redirect configuration is updated to include test redirect stub.
Docker installer test refactoring to harness-based function extraction
test/install-docker-group-reexec.test.ts
Test extracts the ensure_docker shell function from scripts/install.sh into memory, builds a minimal Bash harness with strict settings and environment shim, and sources that harness instead of the full installer script. Suite is now Linux-only via describeLinux wrapper, isolating assertions about sg docker re-exec behavior and loop guards from platform-specific Bash compatibility concerns.
GPU preflight test with explicit kernel version context
src/lib/onboard/sandbox-gpu-preflight.test.ts
Test case "keeps generic Linux sandbox GPU preflight on the CDI path" now passes explicit release and procVersion fields to validateSandboxGpuPreflight, providing concrete Linux kernel version (6.8.0-generic) as input context.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

  • NVIDIA/NemoClaw#4528: Earlier iteration of the step-down refactor to ensure_mutable_openclaw_config_hash with corresponding test suite updates.
  • NVIDIA/NemoClaw#4650: Related changes to sandbox-gpu-preflight.test.ts and install-docker-group-reexec.test.ts involving test input and platform-gating updates.

Suggested labels

fix, v0.0.57

Suggested reviewers

  • prekshivyas
  • jyaunches

Poem

🐰 A rabbit hops through sandbox paths with care,
Step-down security takes the proper stair,
Root and non-root branches, tested right,
Linux harnesses shine with platform light! 🌙

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly identifies the PR as a fix targeting CI/platform issues in the vitest workflow, matching the substantial changes across shell scripts and test files addressing platform compatibility.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/platform-vitest-main

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions

github-actions Bot commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

E2E Advisor Recommendation

Required E2E: runtime-overrides-e2e, shields-config-e2e, cloud-e2e
Optional E2E: sandbox-operations-e2e, gpu-e2e

Dispatch hint: runtime-overrides-e2e,shields-config-e2e,cloud-e2e

Auto-dispatched E2E: runtime-overrides-e2e, shields-config-e2e, cloud-e2e via nightly-e2e.yaml at 2119821c7b591566bd90e6d21b3ccdf84640a732nightly run

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

  • runtime-overrides-e2e (medium): Directly exercises container entrypoint startup, runtime model/provider override patching, and verifies openclaw.json .config-hash remains valid after startup mutations. This is the closest existing E2E coverage for the changed hash-refresh logic.
  • shields-config-e2e (medium): Validates live sandbox mutable-default vs shields-up config behavior, content-seal drift detection, and config hash/security lifecycle. The PR changes how the mutable config hash is refreshed at sandbox start.
  • cloud-e2e (high): Runs the primary real user flow: install, onboard, sandbox start, and live OpenClaw inference. Because the changed entrypoint code can prevent sandbox startup or assistant readiness, this should be merge-blocking.

Optional E2E

  • sandbox-operations-e2e (high): Useful broader confidence for sandbox lifecycle operations and recovery, but the targeted config-hash/startup behavior is already covered more directly by runtime-overrides, shields-config, and full cloud E2E.
  • gpu-e2e (high): Optional only: a GPU preflight test file changed, but no GPU preflight runtime source changed. Run this if reviewers want platform/GPU confidence beyond unit coverage.

New E2E recommendations

  • sandbox startup config hash under dropped capabilities (medium): Existing E2E validates hash correctness, but the specific production failure mode described by the tests—root entrypoint lacking CAP_DAC_OVERRIDE and needing sandbox-owner step-down to refresh .config-hash—is mostly simulated in unit tests. A live E2E would reduce risk for future entrypoint privilege regressions.
    • Suggested test: Add an E2E that starts a real OpenClaw sandbox with mutable /sandbox/.openclaw, forces a stale or read-only .config-hash condition, restarts the sandbox entrypoint, and asserts startup succeeds with a freshly valid .config-hash and working inference.

Dispatch hint

  • Workflow: .github/workflows/nightly-e2e.yaml
  • jobs input: runtime-overrides-e2e,shields-config-e2e,cloud-e2e

@github-actions

github-actions Bot commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

E2E Scenario Advisor Recommendation

Required scenario E2E: ubuntu-repo-cloud-openclaw
Optional scenario E2E: wsl-repo-cloud-openclaw

Dispatch required scenario E2E:

  • gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=ubuntu-repo-cloud-openclaw

Workflow run

Full scenario advisor summary

E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required scenario E2E

  • ubuntu-repo-cloud-openclaw: scripts/nemoclaw-start.sh changes sandbox startup/config-hash/auth-profile behavior. The canonical Ubuntu repo OpenClaw scenario exercises the normal Docker sandbox startup path and smoke/baseline onboarding coverage on the primary hosted runner.
    • Dispatch: gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=ubuntu-repo-cloud-openclaw

Optional scenario E2E

  • wsl-repo-cloud-openclaw: Optional adjacent platform coverage for the same repo/cloud OpenClaw startup path under WSL, where UID/permission and Docker Desktop behavior can differ. This is a special-runner scenario, so it is optional rather than primary.
    • Dispatch: gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=wsl-repo-cloud-openclaw

Relevant changed files

  • scripts/nemoclaw-start.sh

@github-actions

github-actions Bot commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

PR Review Advisor

Findings: 0 needs attention, 1 worth checking, 0 nice ideas
Top item: Validate mutable config hash refresh under the real startup privilege boundary

Review findings

🛠️ Needs attention

  • None.

🔎 Worth checking

  • Validate mutable config hash refresh under the real startup privilege boundary (scripts/nemoclaw-start.sh:678): The production change is in `ensure_mutable_openclaw_config_hash()`, a sandbox startup path that depends on the real `STEP_DOWN_PREFIX_SANDBOX` behavior and CAP_DAC_OVERRIDE-dropped root semantics. The updated tests exercise the root/non-root branches with shell stubs and a portable EACCES surrogate, but they do not prove the actual entrypoint/setpriv behavior in a runtime environment.
    • Recommendation: Add or identify a targeted runtime/integration validation that starts the entrypoint with the real step-down prefix and verifies a mutable sandbox-owned `.config-hash` is refreshed in both root and non-root startup modes without regressing the locked/root-owned config path.
    • Evidence: The diff changes `scripts/nemoclaw-start.sh` to branch at uid=0 and invoke `"${STEP_DOWN_PREFIX_SANDBOX[@]}" sh -c ...`; `test/nemoclaw-start.test.ts` verifies this through stubbed `id`, `openclaw_config_dir_owner`, and `STEP_DOWN_PREFIX_SANDBOX`, and the deterministic test-depth context flags `scripts/nemoclaw-start.sh` for runtime validation.

🌱 Nice ideas

  • None.

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
test/install-docker-group-reexec.test.ts (1)

37-52: 💤 Low value

Consider clarifying the macOS comment.

The comment on lines 41-42 mentions "macOS platform tests" but the suite is now Linux-only via describeLinux (line 24), so these tests won't run on macOS. The comment is explaining the design rationale (avoiding Bash 3 compatibility issues discovered during development), but it reads as if macOS might still execute these tests. Consider rephrasing to: "Keep the harness focused on ensure_docker to avoid Bash 3 compatibility issues (originally discovered on macOS) and isolate test behavior from the installer's full top-level state."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/install-docker-group-reexec.test.ts` around lines 37 - 52, Update the
inline comment above the test harness creation to clarify that the harness is
focused on ensure_docker to avoid Bash 3 compatibility issues (originally
discovered on macOS) and to isolate test behavior from the installer's top-level
Bash state; change the existing macOS-focused wording to the suggested phrasing
and keep references around harnessDir, installHarness,
installer_non_interactive, and ENSURE_DOCKER_FUNCTION intact so only the comment
text is modified.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@test/install-docker-group-reexec.test.ts`:
- Around line 37-52: Update the inline comment above the test harness creation
to clarify that the harness is focused on ensure_docker to avoid Bash 3
compatibility issues (originally discovered on macOS) and to isolate test
behavior from the installer's top-level Bash state; change the existing
macOS-focused wording to the suggested phrasing and keep references around
harnessDir, installHarness, installer_non_interactive, and
ENSURE_DOCKER_FUNCTION intact so only the comment text is modified.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 2302a212-1900-4390-9b62-0130c79cb2f0

📥 Commits

Reviewing files that changed from the base of the PR and between 325ed77 and 2119821.

📒 Files selected for processing (4)
  • scripts/nemoclaw-start.sh
  • src/lib/onboard/sandbox-gpu-preflight.test.ts
  • test/install-docker-group-reexec.test.ts
  • test/nemoclaw-start.test.ts

@github-actions

github-actions Bot commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

Selective E2E Results — ✅ All requested jobs passed

Run: 26895200572
Target ref: 2119821c7b591566bd90e6d21b3ccdf84640a732
Workflow ref: main
Requested jobs: runtime-overrides-e2e,shields-config-e2e,cloud-e2e
Summary: 3 passed, 0 failed, 0 skipped

Job Result
cloud-e2e ✅ success
runtime-overrides-e2e ✅ success
shields-config-e2e ✅ success

@cv cv merged commit b61ed2b into main Jun 3, 2026
37 of 38 checks passed
@cv cv deleted the fix/platform-vitest-main branch June 3, 2026 15:57
@wscurran wscurran added the bug-fix PR fixes a bug or regression label Jun 8, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug-fix PR fixes a bug or regression

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants