fix: skip gateway marker for docker-driver sandboxes#4748
Conversation
Signed-off-by: Glenn-Agent <glenn_agent@163.com>
|
Warning Review limit reached
More reviews will be available in 7 minutes and 12 seconds. Learn how PR review limits work. Your organization has run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans include higher PR review limits than trial, open-source, and free plans. In all cases, reviews become available again over time. During sustained high-volume PR review activity, CodeRabbit may temporarily slow when the next review becomes available. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughThis PR fixes a bug where the ChangesDocker-driver gateway marker fix
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
scripts/nemoclaw-start.sh (1)
218-221: ⚡ Quick winHarden docker-driver detection against whitespace/case in
OPENSHELL_DRIVERS
scripts/nemoclaw-start.shuses a raw, case-sensitive match (case ",${OPENSHELL_DRIVERS:-}," in *,docker,*) ...) with no normalization. This repo’s own/tested values are always"docker"or"vm,docker"(no spaces), so the current behavior aligns with in-repo expectations; however, ifOPENSHELL_DRIVERSever comes in as"vm, docker"or"Docker", the match will fail and marker gating could be wrong—normalize (trim, lowercase, remove whitespace) before thecase.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/nemoclaw-start.sh` around lines 218 - 221, Normalize OPENSHELL_DRIVERS before the case match so whitespace and case won’t break detection: create a normalized var (derived from OPENSHELL_DRIVERS) that is lowercased and has all whitespace removed or trimmed around commas, then run the existing case on that normalized value to set _NEMOCLAW_DOCKER_DRIVER; update the snippet that currently uses case ",${OPENSHELL_DRIVERS:-}," in to reference the normalized variable and keep the same pattern matching (e.g., look for ",docker,") so both "vm, docker" and "Docker" will be detected correctly.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@scripts/nemoclaw-start.sh`:
- Around line 218-221: Normalize OPENSHELL_DRIVERS before the case match so
whitespace and case won’t break detection: create a normalized var (derived from
OPENSHELL_DRIVERS) that is lowercased and has all whitespace removed or trimmed
around commas, then run the existing case on that normalized value to set
_NEMOCLAW_DOCKER_DRIVER; update the snippet that currently uses case
",${OPENSHELL_DRIVERS:-}," in to reference the normalized variable and keep the
same pattern matching (e.g., look for ",docker,") so both "vm, docker" and
"Docker" will be detected correctly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 85063e10-623d-4e68-b646-0925e5e317c9
📒 Files selected for processing (2)
scripts/nemoclaw-start.shtest/nemoclaw-start.test.ts
|
✨ Thanks for submitting this detailed PR about skipping the gateway marker for Docker-driver sandboxes, which improves the healthcheck behavior in these environments. This proposes a bug fix for the sandbox that affects the gateway marker detection. Related open issues: |
## Summary - Adds the `v0.0.60` section to `docs/about/release-notes.mdx` using the dev announcement from discussion #4877. - Fills the source-doc gaps found during release-prep review across inference, policy tiers, command behavior, security boundaries, Hermes dashboard/tooling, runtime context, and troubleshooting. - Refreshes generated agent skills under `.agents/skills/` from the current Fern docs output and upgrades Fern from `5.44.3` to `5.45.0`. ## Source summary - #4037 -> `docs/reference/architecture.mdx`, `docs/about/how-it-works.mdx`, `docs/about/release-notes.mdx`: Documents system-only runtime context that stays out of visible chat. - #4875 -> `docs/reference/architecture.mdx`, `docs/about/how-it-works.mdx`, `docs/about/release-notes.mdx`: Documents try-first sandbox network/filesystem guidance and clearer failure classification. - #4788 -> `docs/security/best-practices.mdx`, `docs/about/release-notes.mdx`: Documents shared OpenClaw device-approval policy for startup and connect. - #4768 -> `docs/reference/network-policies.mdx`, `docs/network-policy/integration-policy-examples.mdx`, `docs/get-started/quickstart.mdx`, `docs/get-started/quickstart-hermes.mdx`, `docs/reference/commands.mdx`: Documents `weather`, `public-reference`, and Hermes managed-tool gateway preset behavior. - #3788 and #4864 -> `docs/reference/network-policies.mdx`, `docs/reference/commands.mdx`: Documents non-interactive policy-tier fail-fast behavior and interactive prompt fallback. - #4756 and #4866 -> `docs/reference/commands.mdx`: Documents env-aware default sandbox resolution for `list`, `status`, and `tunnel` commands. - #4320 -> `docs/reference/commands.mdx`: Documents `$$nemoclaw tunnel status` behavior. - #4328 -> `docs/reference/commands.mdx`: Documents line-scoped policy preset descriptions in `policy-list`. - #4580 and #4748 -> `docs/reference/architecture.mdx`: Documents package-managed OpenShell gateway service and Docker-driver gateway-marker behavior. - #4598 -> `docs/manage-sandboxes/lifecycle.mdx`: Documents concurrent gateway/dashboard cleanup isolation by sandbox name and port. - #4777 -> `docs/reference/troubleshooting.mdx`: Documents Docker GPU patch rollback behavior. - #4610 -> `docs/reference/troubleshooting.mdx`, `docs/reference/commands.mdx`: Keeps mutable OpenClaw config permission guidance aligned and removes skipped experimental wording. - #4868 -> `docs/reference/commands.mdx`: Keeps `.dockerignore` handling for custom `onboard --from <Dockerfile>` contexts in generated skills. - #4870 -> `docs/reference/commands.mdx`, `docs/manage-sandboxes/runtime-controls.mdx`: Documents `NEMOCLAW_MINIMAL_BOOTSTRAP` and generated skill coverage. - #4641 -> `docs/inference/inference-options.mdx`, `docs/reference/troubleshooting.mdx`: Documents local NVIDIA NIM platform-digest pulls and served-model id adoption. - #4810 and #4867 -> `docs/inference/inference-options.mdx`: Documents stable NGC managed-vLLM image lineage and DGX Station DeepSeek V4 Flash coverage. - #4852 -> `docs/inference/use-local-inference.mdx`, `docs/reference/troubleshooting.mdx`: Documents Ollama model fit filtering, 16K context floor, cold-load retry, and failed-model exclusion. - #4847 -> `docs/inference/switch-inference-providers.mdx`: Documents API-family sync, Hermes `api_mode`, and Bedrock Runtime exception. - #4800 -> `docs/inference/tool-calling-reliability.mdx`: Documents Nemotron managed-inference native tool-search fallback. - #4333 -> `docs/inference/switch-inference-providers.mdx`: Documents interactive multimodal input prompting. - #4086 -> `docs/reference/troubleshooting.mdx`: Keeps proxy bypass normalization in generated troubleshooting coverage. - #4811 and #4855 -> `docs/get-started/quickstart-hermes.mdx`: Documents prebuilt Hermes dashboard assets and TUI recovery without runtime rebuilds. - #4854 -> `docs/inference/switch-inference-providers.mdx`, `docs/reference/commands.mdx`: Documents Hermes proxy API-key placeholder preservation during inference switches. - #4248 -> `docs/manage-sandboxes/messaging-channels.mdx`, `.agents/skills/`: Keeps messaging enrollment behavior aligned with manifest-hook implementation. - #4771 -> `docs/security/best-practices.mdx`, `docs/security/credential-storage.mdx`: Documents Hermes placeholder-only secret boundary for sandbox-visible runtime files. - #4787 -> `docs/security/best-practices.mdx`, `docs/about/release-notes.mdx`: Documents expanded memory scanner examples for OpenAI project keys and Slack app-level tokens. - #4848 -> `docs/reference/commands.mdx`: Documents OpenClaw skill install mirroring into the agent home directory. - #4790 -> `docs/about/release-notes.mdx`: Uses the prior release-prep structure and generated `.agents/skills/` refresh as the template for this release. ## Verification - `python3 scripts/docs-to-skills.py docs/ .agents/skills/ --prefix nemoclaw-user --doc-platform fern-mdx` - `python3 scripts/docs-to-skills.py docs/ .agents/skills/ skills/ --prefix nemoclaw-user --doc-platform fern-mdx --dry-run` - `npm run docs` - `git diff --check` - skip-term scan across `docs/`, `.agents/skills/`, and `skills/` - `npm run build:cli` - `npm run typecheck:cli` - Commit and pre-push hook suites, including markdownlint, gitleaks, env-var docs gate, docs-to-skills verification, and skills YAML tests <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit ## Release Notes * **New Features** * DeepSeek-V4-Flash now available as default inference model for DGX Station. * Hermes dashboard improved with dedicated port and OAuth-authenticated tool gateway selection. * Added weather and public-reference policy presets for expanded agent capabilities. * Enhanced Ollama model selection with GPU memory filtering and automatic retry for timeouts. * **Bug Fixes** * Improved policy tier validation to prevent invalid configurations. * Better sandbox cleanup scoping by port to prevent conflicts across deployments. * Added GPU patch failure recovery with automatic rollback. * **Documentation** * Expanded troubleshooting guides for inference, security, and sandbox lifecycle. * Added .dockerignore best practices for custom deployments. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: Carlos Villela <cvillela@nvidia.com>
…hint (#4710) (#5116) ## Summary - The early conditional gating `mark_in_container_gateway` on `OPENSHELL_DRIVERS=docker` (added in #4748) never fires inside the sandbox container, because OpenShell 0.0.44 does not export `OPENSHELL_DRIVERS` into the sandbox env. @hulynn's 2026-06-08 re-verification on v0.0.60 confirms the symptom is unchanged: marker is created on every empty-`NEMOCLAW_CMD` container, the Dockerfile HEALTHCHECK short-circuit (`[ -f /tmp/nemoclaw-gateway-local ] || exit 0`) is unreachable for docker-driver sandboxes, and the container is marked `(unhealthy)` on every fresh onboard. - This PR drops the early env-gated write and calls `mark_in_container_gateway()` directly before each `openclaw gateway run --port ...` launch (both non-root and root-mode paths). The marker is now true-by-construction: it exists if-and-only-if this container is about to start the gateway, regardless of whether the deployment-mode signal is plumbed through the sandbox env. This is hulynn's fix-direction (c) from #4710: > Move mark_in_container_gateway into the gateway-launching codepath itself, so it only runs when this container will actually start the gateway. ## Related Issue Fixes #4710. Supersedes the no-op env-var guard introduced by #4748. ## Changes - `scripts/nemoclaw-start.sh`: remove the early `case ",${OPENSHELL_DRIVERS:-},"` block and the `_NEMOCLAW_DOCKER_DRIVER` gate that wrapped the early `mark_in_container_gateway` call; add `mark_in_container_gateway` immediately before both first-launch invocations of `openclaw gateway run --port "${_DASHBOARD_PORT}"` (non-root at line ~3324, root-mode at line ~3550). Restart-loop fallback launches inherit the marker from the first launch — the function is idempotent (`: >file`), so calling it again on retry is a no-op. - `scripts/nemoclaw-start.sh`: rewrite the function-definition comment block to explain the launch-site invariant and the #4710 root cause, so future regressions cannot reintroduce an env-gated form without removing the warning. - `test/nemoclaw-start.test.ts`: replace the old startup-snippet test with two new structural tests: - **`ties the in-container gateway healthcheck marker to the gateway launch site (#4503, #4710)`** — asserts (a) the region immediately after the function definition contains no early `mark_in_container_gateway` call and no `OPENSHELL_DRIVERS` conditional; (b) every `openclaw gateway run --port` invocation in the script is preceded (within 6 lines) by `mark_in_container_gateway`, OR sits inside a restart-loop body that inherits from the first launch. The first invariant blocks a regression back to the env-gated form; the second locks the new contract. - **`mark_in_container_gateway writes the marker file idempotently (#4710)`** — behavioral check on the helper itself. - `test/nemoclaw-start.test.ts`: the two existing launch-block tests (`registers child PIDs ...` and `launches the root gateway through gosu ...`) now stub `mark_in_container_gateway() { :; }` in their preamble so the extracted shell snippet remains self-contained after the new call is introduced. ## Type of Change - [x] Code change (feature, bug fix, or refactor) ## Verification - `npx vitest run --project cli test/nemoclaw-start.test.ts` — **120 passing** (vs. 119 baseline before this PR; 19 pre-existing failures on `main` are unchanged and unrelated to this change). - Manual diff review of `scripts/nemoclaw-start.sh`: confirmed both first-launch sites (lines 3328 + 3553 after edit) have `mark_in_container_gateway` immediately above the `nohup ... "$OPENCLAW" gateway run --port ...` invocation; the two restart-loop fallbacks at lines ~3382 / ~3639 sit inside `while`/`until` blocks downstream of those first launches and reuse the marker that's already there. ## Scope note A separate OpenClaw-side concern surfaced in @Dongni-Yang's 2026-06-04 thread (in-sandbox gateway loses its HTTP listener while the process stays alive) is out of scope for this PR — that fix has to live in the OpenClaw gateway codebase. This PR only addresses the **marker-file regression hulynn re-confirmed in #4710** for docker-driver sandboxes where the gateway legitimately runs on the host. Standalone deployments where the gateway runs in-container are unaffected: the marker is still created (just at the launch site instead of at startup), so the existing pgrep healthcheck behavior is preserved bit-for-bit. Signed-off-by: Shawn Xie <shaxie@nvidia.com> 🤖 Generated with [Claude Code](https://claude.com/claude-code) <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Chores** * Improved container health-status handling so the in-container gateway liveness marker is only created when the gateway actually runs inside the container; host-delivered gateways skip marker creation and in-container path. * Adjusted test file-size budget for the related tests. * **Tests** * Added/updated tests verifying marker creation, idempotence, and presence at gateway start across launch modes; added host-delivery skip scenarios and harness stubs to isolate marker behavior. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Signed-off-by: Shawn Xie <shaxie@nvidia.com> Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com> Co-authored-by: Carlos Villela <cvillela@nvidia.com>
Summary
/tmp/nemoclaw-gateway-localabsent for OpenShell Docker-driver sandboxesOPENSHELL_DRIVERS, including mixed driver listsFixes #4710.
Validation
npm test -- --run test/nemoclaw-start.test.ts -t "gateway healthcheck marker"git diff --checkgit verify-commit HEADNote: running the full
test/nemoclaw-start.test.tsfile locally currently fails in unrelated baseline fixture cases becausenemoclaw/node_modules/json5is absent in this checkout; the targeted marker regression passes.Signed-off-by: Glenn-Agent glenn_agent@163.com
Summary by CodeRabbit
Bug Fixes