Skip to content

feat(ci): add non-root sandbox smoke test as PR gate#2711

Closed
jyaunches wants to merge 1 commit into
mainfrom
issue-2571-non-root-sandbox-smoke-test
Closed

feat(ci): add non-root sandbox smoke test as PR gate#2711
jyaunches wants to merge 1 commit into
mainfrom
issue-2571-non-root-sandbox-smoke-test

Conversation

@jyaunches

@jyaunches jyaunches commented Apr 29, 2026

Copy link
Copy Markdown
Contributor

Summary

Add a lightweight E2E job to pr-self-hosted.yaml that validates the sandbox container starts successfully under --security-opt=no-new-privileges — the constraint applied by OpenShell on Brev Launchable and DGX Spark.

What This Tests

  1. Starts the production image as the sandbox user with no-new-privileges
  2. Polls /health until the gateway responds (HTTP 200 or 401) within 60s
  3. Verifies openclaw tui --help succeeds (gateway auth token available)
  4. Confirms the non-root fallback path is exercised in logs

What This Would Have Caught

Implementation

  • test/e2e/test-non-root-smoke.sh — standalone E2E script following existing patterns (e2e-gateway-isolation.sh, e2e-port-overrides.sh)
  • pr-self-hosted.yaml — new test-non-root-sandbox-smoke job after build-sandbox-images (reuses isolation-image artifact)

CI Impact

  • Runtime: ~2 minutes on top of existing image build (health poll dominates)
  • Runner: linux-amd64-cpu4 (same as other E2E jobs)
  • Secrets: None required (no API key needed)
  • Timeout: 5 minutes

Strategy Gap Addressed

Closes #2571

Summary by CodeRabbit

  • Tests

    • Implemented comprehensive smoke testing for sandbox functionality to validate proper operation in non-root execution environments, ensuring correct security isolation and behavior when running with restricted privileges and limited system access.
  • Chores

    • Enhanced continuous integration pipeline with automated sandbox validation testing to systematically improve quality assurance and ensure consistent deployment reliability.

Add a lightweight E2E job to pr-self-hosted.yaml that validates the
sandbox container starts successfully under --security-opt=no-new-privileges
(the constraint applied by OpenShell on Brev Launchable and DGX Spark).

The test:
1. Starts the production image as the sandbox user with no-new-privileges
2. Polls /health until the gateway responds (HTTP 200 or 401)
3. Verifies 'openclaw tui --help' succeeds (gateway auth token available)
4. Confirms the non-root fallback path is exercised in logs

This would have caught Bug 1 from the Apr 20-25 regression (#2472) where
install_configure_guard crashed under Landlock + set -e, preventing the
gateway from starting — a 5-day outage invisible to all existing CI gates.

Estimated CI overhead: ~2 minutes (health poll dominates).

Closes #2571
@coderabbitai

coderabbitai Bot commented Apr 29, 2026

Copy link
Copy Markdown
Contributor
📝 Walkthrough

Walkthrough

This PR introduces a new CI job and supporting test script to validate sandbox startup under non-root constraints with security restrictions, catching potential entrypoint failures early in the development cycle.

Changes

Cohort / File(s) Summary
CI Workflow Configuration
.github/workflows/pr-self-hosted.yaml
Added new test-non-root-sandbox-smoke job that downloads the isolation image artifact, loads it into Docker, and executes a smoke test to validate non-root sandbox behavior with security restrictions.
Non-Root Smoke Test Script
test/e2e/test-non-root-smoke.sh
New executable Bash script that validates Docker setup, starts container as non-root user with no-new-privileges, polls gateway /health endpoint with timeout, and verifies openclaw tui --help succeeds without auth token errors. Includes cleanup and error handling.

Sequence Diagram

sequenceDiagram
    actor Runner as GitHub Runner
    participant Script as test-non-root-smoke.sh
    participant Docker as Docker Engine
    participant Container as Container (non-root)
    participant Gateway as Gateway Service
    participant CLI as OpenCLAW CLI

    Runner->>Script: Execute smoke test
    Script->>Docker: Check Docker availability
    Script->>Docker: Verify image exists
    Script->>Docker: Extract UID/GID from image
    Script->>Docker: Start container (--security-opt no-new-privileges)
    Docker->>Container: Launch with security constraints
    Container->>Gateway: Entrypoint starts gateway
    Script->>Container: Wait for startup
    Script->>Gateway: Poll /health endpoint (timeout: 60s)
    Gateway-->>Script: Response (200 or 401)
    Script->>Container: Execute openclaw tui --help
    Container->>CLI: Start CLI tool
    CLI-->>Container: Output result
    Script->>Script: Verify auth token not missing
    Script->>Docker: Capture logs
    Script->>Docker: Remove container
    Script-->>Runner: Test result
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 A sandbox runs without the root,
Beneath security's watchful boot,
Health checks poll through the night,
CLI spins up just right!
Non-root bugs won't slip past our sight!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'feat(ci): add non-root sandbox smoke test as PR gate' clearly and concisely describes the main change—introducing a new CI job for non-root sandbox testing.
Linked Issues check ✅ Passed The PR fully implements all coding requirements from issue #2571: adds the test-non-root-sandbox-smoke job to pr-self-hosted.yaml, creates test-non-root-smoke.sh with required functionality (no-new-privileges enforcement, /health polling with 60s timeout, openclaw tui verification).
Out of Scope Changes check ✅ Passed All changes are directly scoped to implementing the non-root sandbox smoke test: workflow job configuration and test script. No unrelated modifications detected.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch issue-2571-non-root-sandbox-smoke-test

Review rate limit: 8/10 reviews remaining, refill in 10 minutes and 53 seconds.

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/e2e/test-non-root-smoke.sh`:
- Around line 153-158: The curl in the health probe loop (where HTTP_CODE is
set) lacks per-request timeouts so a single stalled connection can block the
whole polling loop; update the curl invocation that sets HTTP_CODE to include
--connect-timeout 1 and --max-time "$HEALTH_INTERVAL" (using the existing
HEALTH_INTERVAL variable) so each probe respects the polling interval and the
overall HEALTH_TIMEOUT behavior remains effective.
- Around line 118-124: The docker run invocation currently redirects both stdout
and stderr to /dev/null which hides failures and causes the script to exit
before the diagnostic block can run; change the invocation that starts the
container (the docker run that uses CONTAINER_NAME, SB_UID, SB_GID, HOST_PORT,
DASHBOARD_PORT, and IMAGE) so that you capture stderr (e.g., to a temp file or
variable) instead of discarding it, check the docker run exit status, and on
failure print the captured stderr (and any contextual info) before exiting so
the existing crash-diagnostic block can run; ensure you still respect set -e by
handling the failure explicitly (checking $? or using ||) rather than relying on
silent redirection.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: bd0fc978-3cd1-4405-acff-45727c88d0a1

📥 Commits

Reviewing files that changed from the base of the PR and between 02c55e7 and 1cb8132.

📒 Files selected for processing (2)
  • .github/workflows/pr-self-hosted.yaml
  • test/e2e/test-non-root-smoke.sh

Comment on lines +118 to +124
info "2. Start container with --security-opt no-new-privileges --user $SB_UID:$SB_GID"
docker run -d \
--name "$CONTAINER_NAME" \
--security-opt no-new-privileges \
--user "${SB_UID}:${SB_GID}" \
-p "${HOST_PORT}:${DASHBOARD_PORT}" \
"$IMAGE" >/dev/null 2>&1

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n test/e2e/test-non-root-smoke.sh | sed -n '100,150p'

Repository: NVIDIA/NemoClaw

Length of output: 2179


🏁 Script executed:

head -30 test/e2e/test-non-root-smoke.sh

Repository: NVIDIA/NemoClaw

Length of output: 1258


🏁 Script executed:

grep -n "set -e" test/e2e/test-non-root-smoke.sh

Repository: NVIDIA/NemoClaw

Length of output: 156


Expose docker run failure messages for debugging.

With set -euo pipefail enabled and both stdout and stderr redirected to /dev/null on line 124, if the docker run command fails, the error message is hidden and the script exits immediately due to set -e—preventing the crash-diagnostic block (lines 126-134) from executing. This makes it difficult to diagnose why the container failed to start.

Wrap the command to catch failures and preserve stderr visibility:

Suggested fix
 info "2. Start container with --security-opt no-new-privileges --user $SB_UID:$SB_GID"
-docker run -d \
+if ! docker run -d \
   --name "$CONTAINER_NAME" \
   --security-opt no-new-privileges \
   --user "${SB_UID}:${SB_GID}" \
   -p "${HOST_PORT}:${DASHBOARD_PORT}" \
-  "$IMAGE" >/dev/null 2>&1
+  "$IMAGE" >/dev/null; then
+  fail "Container failed to start"
+  exit 1
+fi
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
info "2. Start container with --security-opt no-new-privileges --user $SB_UID:$SB_GID"
docker run -d \
--name "$CONTAINER_NAME" \
--security-opt no-new-privileges \
--user "${SB_UID}:${SB_GID}" \
-p "${HOST_PORT}:${DASHBOARD_PORT}" \
"$IMAGE" >/dev/null 2>&1
info "2. Start container with --security-opt no-new-privileges --user $SB_UID:$SB_GID"
if ! docker run -d \
--name "$CONTAINER_NAME" \
--security-opt no-new-privileges \
--user "${SB_UID}:${SB_GID}" \
-p "${HOST_PORT}:${DASHBOARD_PORT}" \
"$IMAGE" >/dev/null; then
fail "Container failed to start"
exit 1
fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/test-non-root-smoke.sh` around lines 118 - 124, The docker run
invocation currently redirects both stdout and stderr to /dev/null which hides
failures and causes the script to exit before the diagnostic block can run;
change the invocation that starts the container (the docker run that uses
CONTAINER_NAME, SB_UID, SB_GID, HOST_PORT, DASHBOARD_PORT, and IMAGE) so that
you capture stderr (e.g., to a temp file or variable) instead of discarding it,
check the docker run exit status, and on failure print the captured stderr (and
any contextual info) before exiting so the existing crash-diagnostic block can
run; ensure you still respect set -e by handling the failure explicitly
(checking $? or using ||) rather than relying on silent redirection.

Comment on lines +153 to +158
info "4. Poll gateway /health (timeout: ${HEALTH_TIMEOUT}s)"
ELAPSED=0
HEALTH_OK=false

while [ "$ELAPSED" -lt "$HEALTH_TIMEOUT" ]; do
HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" "http://localhost:${HOST_PORT}/health" 2>/dev/null || echo "000")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Check file existence and read the relevant section
wc -l test/e2e/test-non-root-smoke.sh

Repository: NVIDIA/NemoClaw

Length of output: 95


🏁 Script executed:

# Read the file around lines 153-158 with context
sed -n '140,170p' test/e2e/test-non-root-smoke.sh

Repository: NVIDIA/NemoClaw

Length of output: 1008


🏁 Script executed:

# Search for HEALTH_INTERVAL and HEALTH_TIMEOUT definitions
rg 'HEALTH_(INTERVAL|TIMEOUT)' test/e2e/test-non-root-smoke.sh -B 2 -A 2

Repository: NVIDIA/NemoClaw

Length of output: 981


🏁 Script executed:

# Check if SPDX license header is present
head -5 test/e2e/test-non-root-smoke.sh

Repository: NVIDIA/NemoClaw

Length of output: 249


Add per-request timeouts to the curl health check.

Line 158 calls curl without timeout flags. If the socket accepts but never responds, a single probe can block indefinitely, defeating the 60s timeout guarantee. Add --connect-timeout 1 --max-time "$HEALTH_INTERVAL" to ensure each request times out within the polling interval:

Suggested fix
-  HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" "http://localhost:${HOST_PORT}/health" 2>/dev/null || echo "000")
+  HTTP_CODE=$(curl -s --connect-timeout 1 --max-time "$HEALTH_INTERVAL" -o /dev/null -w "%{http_code}" "http://localhost:${HOST_PORT}/health" 2>/dev/null || echo "000")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
info "4. Poll gateway /health (timeout: ${HEALTH_TIMEOUT}s)"
ELAPSED=0
HEALTH_OK=false
while [ "$ELAPSED" -lt "$HEALTH_TIMEOUT" ]; do
HTTP_CODE=$(curl -s -o /dev/null -w "%{http_code}" "http://localhost:${HOST_PORT}/health" 2>/dev/null || echo "000")
info "4. Poll gateway /health (timeout: ${HEALTH_TIMEOUT}s)"
ELAPSED=0
HEALTH_OK=false
while [ "$ELAPSED" -lt "$HEALTH_TIMEOUT" ]; do
HTTP_CODE=$(curl -s --connect-timeout 1 --max-time "$HEALTH_INTERVAL" -o /dev/null -w "%{http_code}" "http://localhost:${HOST_PORT}/health" 2>/dev/null || echo "000")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/e2e/test-non-root-smoke.sh` around lines 153 - 158, The curl in the
health probe loop (where HTTP_CODE is set) lacks per-request timeouts so a
single stalled connection can block the whole polling loop; update the curl
invocation that sets HTTP_CODE to include --connect-timeout 1 and --max-time
"$HEALTH_INTERVAL" (using the existing HEALTH_INTERVAL variable) so each probe
respects the polling interval and the overall HEALTH_TIMEOUT behavior remains
effective.

@jyaunches

Copy link
Copy Markdown
Contributor Author

Superseded by #3166 (@hunglp6d) which implements the same issue #2571 with the agreed-upon scope: Tests 1–3 as PR gate now, Test 4 (openclaw tui gateway token) deferred to ride with #2485.

See #2571 for the updated acceptance checklist.

@jyaunches jyaunches closed this May 7, 2026
@wscurran wscurran added area: ci CI workflows, checks, release automation, or GitHub Actions area: e2e End-to-end tests, nightly failures, or validation infrastructure chore Build, CI, dependency, or tooling maintenance feature PR adds or expands user-visible functionality platform: dgx-spark Affects DGX Spark hardware or workflows and removed priority: high chore Build, CI, dependency, or tooling maintenance labels Jun 3, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area: ci CI workflows, checks, release automation, or GitHub Actions area: e2e End-to-end tests, nightly failures, or validation infrastructure feature PR adds or expands user-visible functionality platform: brev Affects Brev hosted development environments platform: dgx-spark Affects DGX Spark hardware or workflows

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat(ci): add non-root sandbox smoke test as PR gate

2 participants