feat(ci): add non-root sandbox smoke test as PR gate by jyaunches · Pull Request #2711 · NVIDIA/NemoClaw

jyaunches · 2026-04-29T22:19:32Z

Summary

Add a lightweight E2E job to pr-self-hosted.yaml that validates the sandbox container starts successfully under --security-opt=no-new-privileges — the constraint applied by OpenShell on Brev Launchable and DGX Spark.

What This Tests

Starts the production image as the sandbox user with no-new-privileges
Polls /health until the gateway responds (HTTP 200 or 401) within 60s
Verifies openclaw tui --help succeeds (gateway auth token available)
Confirms the non-root fallback path is exercised in logs

What This Would Have Caught

Bug 1 (fix(sandbox): fix non-root gateway startup and add crash safety net #2472): install_configure_guard writes to .bashrc/.profile crash under Landlock + set -e → gateway never starts → HTTP probe fails immediately
Bug 2 (fix(security): revert gateway auth token externalization #2482): token removed from openclaw.json → openclaw tui fails with "Missing gateway auth token"

Implementation

test/e2e/test-non-root-smoke.sh — standalone E2E script following existing patterns (e2e-gateway-isolation.sh, e2e-port-overrides.sh)
pr-self-hosted.yaml — new test-non-root-sandbox-smoke job after build-sandbox-images (reuses isolation-image artifact)

CI Impact

Runtime: ~2 minutes on top of existing image build (health poll dominates)
Runner: linux-amd64-cpu4 (same as other E2E jobs)
Secrets: None required (no API key needed)
Timeout: 5 minutes

Strategy Gap Addressed

GAP 1 from the E2E strategy doc: non-root / Landlock coverage
Issue Add Brev E2E test: verify sandbox container starts under OpenShell no-new-privileges #851 (closed): previously proposed but never implemented

Closes #2571

Summary by CodeRabbit

Tests
- Implemented comprehensive smoke testing for sandbox functionality to validate proper operation in non-root execution environments, ensuring correct security isolation and behavior when running with restricted privileges and limited system access.
Chores
- Enhanced continuous integration pipeline with automated sandbox validation testing to systematically improve quality assurance and ensure consistent deployment reliability.

Add a lightweight E2E job to pr-self-hosted.yaml that validates the sandbox container starts successfully under --security-opt=no-new-privileges (the constraint applied by OpenShell on Brev Launchable and DGX Spark). The test: 1. Starts the production image as the sandbox user with no-new-privileges 2. Polls /health until the gateway responds (HTTP 200 or 401) 3. Verifies 'openclaw tui --help' succeeds (gateway auth token available) 4. Confirms the non-root fallback path is exercised in logs This would have caught Bug 1 from the Apr 20-25 regression (#2472) where install_configure_guard crashed under Landlock + set -e, preventing the gateway from starting — a 5-day outage invisible to all existing CI gates. Estimated CI overhead: ~2 minutes (health poll dominates). Closes #2571

coderabbitai · 2026-04-29T22:19:44Z

📝 Walkthrough

Walkthrough

This PR introduces a new CI job and supporting test script to validate sandbox startup under non-root constraints with security restrictions, catching potential entrypoint failures early in the development cycle.

Changes

Cohort / File(s)	Summary
CI Workflow Configuration `.github/workflows/pr-self-hosted.yaml`	Added new `test-non-root-sandbox-smoke` job that downloads the isolation image artifact, loads it into Docker, and executes a smoke test to validate non-root sandbox behavior with security restrictions.
Non-Root Smoke Test Script `test/e2e/test-non-root-smoke.sh`	New executable Bash script that validates Docker setup, starts container as non-root user with `no-new-privileges`, polls gateway `/health` endpoint with timeout, and verifies `openclaw tui --help` succeeds without auth token errors. Includes cleanup and error handling.

Sequence Diagram

sequenceDiagram
    actor Runner as GitHub Runner
    participant Script as test-non-root-smoke.sh
    participant Docker as Docker Engine
    participant Container as Container (non-root)
    participant Gateway as Gateway Service
    participant CLI as OpenCLAW CLI

    Runner->>Script: Execute smoke test
    Script->>Docker: Check Docker availability
    Script->>Docker: Verify image exists
    Script->>Docker: Extract UID/GID from image
    Script->>Docker: Start container (--security-opt no-new-privileges)
    Docker->>Container: Launch with security constraints
    Container->>Gateway: Entrypoint starts gateway
    Script->>Container: Wait for startup
    Script->>Gateway: Poll /health endpoint (timeout: 60s)
    Gateway-->>Script: Response (200 or 401)
    Script->>Container: Execute openclaw tui --help
    Container->>CLI: Start CLI tool
    CLI-->>Container: Output result
    Script->>Script: Verify auth token not missing
    Script->>Docker: Capture logs
    Script->>Docker: Remove container
    Script-->>Runner: Test result

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 A sandbox runs without the root,
Beneath security's watchful boot,
Health checks poll through the night,
CLI spins up just right!
Non-root bugs won't slip past our sight!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'feat(ci): add non-root sandbox smoke test as PR gate' clearly and concisely describes the main change—introducing a new CI job for non-root sandbox testing.
Linked Issues check	✅ Passed	The PR fully implements all coding requirements from issue `#2571`: adds the test-non-root-sandbox-smoke job to pr-self-hosted.yaml, creates test-non-root-smoke.sh with required functionality (no-new-privileges enforcement, /health polling with 60s timeout, openclaw tui verification).
Out of Scope Changes check	✅ Passed	All changes are directly scoped to implementing the non-root sandbox smoke test: workflow job configuration and test script. No unrelated modifications detected.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch issue-2571-non-root-sandbox-smoke-test

_{Review rate limit: 8/10 reviews remaining, refill in 10 minutes and 53 seconds.}

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/e2e/test-non-root-smoke.sh`:
- Around line 153-158: The curl in the health probe loop (where HTTP_CODE is
set) lacks per-request timeouts so a single stalled connection can block the
whole polling loop; update the curl invocation that sets HTTP_CODE to include
--connect-timeout 1 and --max-time "$HEALTH_INTERVAL" (using the existing
HEALTH_INTERVAL variable) so each probe respects the polling interval and the
overall HEALTH_TIMEOUT behavior remains effective.
- Around line 118-124: The docker run invocation currently redirects both stdout
and stderr to /dev/null which hides failures and causes the script to exit
before the diagnostic block can run; change the invocation that starts the
container (the docker run that uses CONTAINER_NAME, SB_UID, SB_GID, HOST_PORT,
DASHBOARD_PORT, and IMAGE) so that you capture stderr (e.g., to a temp file or
variable) instead of discarding it, check the docker run exit status, and on
failure print the captured stderr (and any contextual info) before exiting so
the existing crash-diagnostic block can run; ensure you still respect set -e by
handling the failure explicitly (checking $? or using ||) rather than relying on
silent redirection.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: bd0fc978-3cd1-4405-acff-45727c88d0a1

📥 Commits

Reviewing files that changed from the base of the PR and between 02c55e7 and 1cb8132.

📒 Files selected for processing (2)

.github/workflows/pr-self-hosted.yaml
test/e2e/test-non-root-smoke.sh

coderabbitai · 2026-04-29T22:25:18Z

+info "2. Start container with --security-opt no-new-privileges --user $SB_UID:$SB_GID"
+docker run -d \
+  --name "$CONTAINER_NAME" \
+  --security-opt no-new-privileges \
+  --user "${SB_UID}:${SB_GID}" \
+  -p "${HOST_PORT}:${DASHBOARD_PORT}" \
+  "$IMAGE" >/dev/null 2>&1


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n test/e2e/test-non-root-smoke.sh | sed -n '100,150p'

Repository: NVIDIA/NemoClaw

Length of output: 2179

🏁 Script executed:

head -30 test/e2e/test-non-root-smoke.sh

Repository: NVIDIA/NemoClaw

Length of output: 1258

🏁 Script executed:

grep -n "set -e" test/e2e/test-non-root-smoke.sh

Repository: NVIDIA/NemoClaw

Length of output: 156

Expose docker run failure messages for debugging.

With set -euo pipefail enabled and both stdout and stderr redirected to /dev/null on line 124, if the docker run command fails, the error message is hidden and the script exits immediately due to set -e—preventing the crash-diagnostic block (lines 126-134) from executing. This makes it difficult to diagnose why the container failed to start.

Wrap the command to catch failures and preserve stderr visibility:

Suggested fix

info "2. Start container with --security-opt no-new-privileges --user $SB_UID:$SB_GID" -docker run -d \ +if ! docker run -d \ --name "$CONTAINER_NAME" \ --security-opt no-new-privileges \ --user "${SB_UID}:${SB_GID}" \ -p "${HOST_PORT}:${DASHBOARD_PORT}" \ - "$IMAGE" >/dev/null 2>&1 + "$IMAGE" >/dev/null; then + fail "Container failed to start" + exit 1 +fi

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

info "2. Start container with --security-opt no-new-privileges --user $SB_UID:$SB_GID"

docker run -d \

--name "$CONTAINER_NAME" \

--security-opt no-new-privileges \

--user "${SB_UID}:${SB_GID}" \

-p "${HOST_PORT}:${DASHBOARD_PORT}" \

"$IMAGE" >/dev/null 2>&1

info "2. Start container with --security-opt no-new-privileges --user $SB_UID:$SB_GID"

if ! docker run -d \

--name "$CONTAINER_NAME" \

--security-opt no-new-privileges \

--user "${SB_UID}:${SB_GID}" \

-p "${HOST_PORT}:${DASHBOARD_PORT}" \

"$IMAGE" >/dev/null; then

fail "Container failed to start"

exit 1

fi

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@test/e2e/test-non-root-smoke.sh` around lines 118 - 124, The docker run invocation currently redirects both stdout and stderr to /dev/null which hides failures and causes the script to exit before the diagnostic block can run; change the invocation that starts the container (the docker run that uses CONTAINER_NAME, SB_UID, SB_GID, HOST_PORT, DASHBOARD_PORT, and IMAGE) so that you capture stderr (e.g., to a temp file or variable) instead of discarding it, check the docker run exit status, and on failure print the captured stderr (and any contextual info) before exiting so the existing crash-diagnostic block can run; ensure you still respect set -e by handling the failure explicitly (checking $? or using ||) rather than relying on silent redirection.

coderabbitai · 2026-04-29T22:25:18Z

+info "4. Poll gateway /health (timeout: ${HEALTH_TIMEOUT}s)"
+ELAPSED=0
+HEALTH_OK=false
+
+while [ "$ELAPSED" -lt "$HEALTH_TIMEOUT" ]; do
+  HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" "http://localhost:${HOST_PORT}/health" 2>/dev/null || echo "000")


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Check file existence and read the relevant section wc -l test/e2e/test-non-root-smoke.sh

Repository: NVIDIA/NemoClaw

Length of output: 95

🏁 Script executed:

# Read the file around lines 153-158 with context sed -n '140,170p' test/e2e/test-non-root-smoke.sh

Repository: NVIDIA/NemoClaw

Length of output: 1008

🏁 Script executed:

# Search for HEALTH_INTERVAL and HEALTH_TIMEOUT definitions rg 'HEALTH_(INTERVAL|TIMEOUT)' test/e2e/test-non-root-smoke.sh -B 2 -A 2

Repository: NVIDIA/NemoClaw

Length of output: 981

🏁 Script executed:

# Check if SPDX license header is present head -5 test/e2e/test-non-root-smoke.sh

Repository: NVIDIA/NemoClaw

Length of output: 249

Add per-request timeouts to the curl health check.

Line 158 calls curl without timeout flags. If the socket accepts but never responds, a single probe can block indefinitely, defeating the 60s timeout guarantee. Add --connect-timeout 1 --max-time "$HEALTH_INTERVAL" to ensure each request times out within the polling interval:

Suggested fix

- HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" "http://localhost:${HOST_PORT}/health" 2>/dev/null || echo "000") + HTTP_CODE=$(curl -s --connect-timeout 1 --max-time "$HEALTH_INTERVAL" -o /dev/null -w "%{http_code}" "http://localhost:${HOST_PORT}/health" 2>/dev/null || echo "000")

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

info "4. Poll gateway /health (timeout: ${HEALTH_TIMEOUT}s)"

ELAPSED=0

HEALTH_OK=false

while [ "$ELAPSED" -lt "$HEALTH_TIMEOUT" ]; do

HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" "http://localhost:${HOST_PORT}/health" 2>/dev/null || echo "000")

info "4. Poll gateway /health (timeout: ${HEALTH_TIMEOUT}s)"

ELAPSED=0

HEALTH_OK=false

while [ "$ELAPSED" -lt "$HEALTH_TIMEOUT" ]; do

HTTP_CODE=$(curl -s --connect-timeout 1 --max-time "$HEALTH_INTERVAL" -o /dev/null -w "%{http_code}" "http://localhost:${HOST_PORT}/health" 2>/dev/null || echo "000")

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@test/e2e/test-non-root-smoke.sh` around lines 153 - 158, The curl in the health probe loop (where HTTP_CODE is set) lacks per-request timeouts so a single stalled connection can block the whole polling loop; update the curl invocation that sets HTTP_CODE to include --connect-timeout 1 and --max-time "$HEALTH_INTERVAL" (using the existing HEALTH_INTERVAL variable) so each probe respects the polling interval and the overall HEALTH_TIMEOUT behavior remains effective.

jyaunches · 2026-05-07T23:32:21Z

Superseded by #3166 (@hunglp6d) which implements the same issue #2571 with the agreed-upon scope: Tests 1–3 as PR gate now, Test 4 (openclaw tui gateway token) deferred to ride with #2485.

See #2571 for the updated acceptance checklist.

coderabbitai Bot reviewed Apr 29, 2026

View reviewed changes

jyaunches mentioned this pull request Apr 29, 2026

refactor(sandbox): default to mutable config, make shields opt-in #2227

Merged

8 tasks

wscurran added Platform: DGX Spark platform: brev Affects Brev hosted development environments labels Apr 30, 2026

jyaunches mentioned this pull request May 7, 2026

test(e2e): add non-root sandbox smoke test #3166

Merged

12 tasks

jyaunches closed this May 7, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(ci): add non-root sandbox smoke test as PR gate#2711

feat(ci): add non-root sandbox smoke test as PR gate#2711
jyaunches wants to merge 1 commit into
mainfrom
issue-2571-non-root-sandbox-smoke-test

jyaunches commented Apr 29, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Apr 29, 2026 •

edited

Loading

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Apr 29, 2026

Uh oh!

coderabbitai Bot Apr 29, 2026

Uh oh!

jyaunches commented May 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

jyaunches commented Apr 29, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What This Tests

What This Would Have Caught

Implementation

CI Impact

Strategy Gap Addressed

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Apr 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 29, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 29, 2026

Choose a reason for hiding this comment

Uh oh!

jyaunches commented May 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

jyaunches commented Apr 29, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Apr 29, 2026 •

edited

Loading