fix(host-aliases): probe legacy gateway container before kubectl exec#4772
Conversation
Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
📝 WalkthroughWalkthroughAdds a bounded Docker probe that classifies legacy gateway container presence (present/absent/unknown), centralizes host-alias add/remove validation to run before probing, and updates tests/stubs to assert probe ordering and probe-failure/absent behaviors. ChangesLegacy Gateway Container Availability Check & Host-alias Validation
Sequence DiagramsequenceDiagram
participant NemoclawCLI
participant DockerDaemon as Docker
participant Kubectl
NemoclawCLI->>Docker: docker ps --format {{.Names}} (probe openshell-cluster-nemoclaw)
Docker-->>NemoclawCLI: present / absent / error
alt present
NemoclawCLI->>Docker: docker exec openshell-cluster-nemoclaw kubectl ... (host-alias flow)
Docker->>Kubectl: run kubectl
Kubectl-->>NemoclawCLI: response
else absent
NemoclawCLI-->>NemoclawCLI: throw legacy gateway missing error
else error
NemoclawCLI-->>NemoclawCLI: throw docker probe failed error with docker info guidance
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
E2E Advisor RecommendationRequired E2E: Dispatch hint: Full advisor summaryE2E Recommendation AdvisorBase: Required E2E
Optional E2E
New E2E recommendations
Dispatch hint
|
E2E Scenario Advisor RecommendationRequired scenario E2E: None Full scenario advisor summaryE2E Scenario AdvisorBase: Required scenario E2E
Optional scenario E2E
Relevant changed files
|
PR Review AdvisorFindings: 1 needs attention, 1 worth checking, 0 nice ideas Review findings🛠️ Needs attention
🔎 Worth checking
🌱 Nice ideas
Since last review detailsCurrent findings:
This is an automated advisory review. A human maintainer must make the final merge decision. |
Selective E2E Results — ✅ All requested jobs passedRun: 26959715788
|
…s, drop fragile filter Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
Selective E2E Results — ✅ All requested jobs passedRun: 26960995428
|
Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
prekshivyas
left a comment
There was a problem hiding this comment.
LGTM — approving. Verified the logic against source:
probeLegacyGatewayContainer()correctly uses the unfiltereddocker ps --format '{{.Names}}'+ exact in-code match, sidestepping the fragile--filter name=^...$anchor.dockerSpawnSync(src/lib/adapters/docker/exec.ts) is justspawnSync("docker", ...), so.error/.code,.status,.stdout/.stderrare all valid, andencoding: "utf-8"makes the streams strings. Thepresent | absent | unknowndiscriminator cleanly separates a missing gateway from a docker outage/timeout/permission error — which is exactly whydockerSpawnSyncis the right call here overdockerCapture(you need to classify, not just capture).- The driver fast-path still short-circuits before any docker call, so direct-container drivers don't pay for a probe. Kubernetes/pre-
openshellDriverentries with a running gateway returnpresent→ no behavior change; the only regression surface is the opaque "No such container" error, which is what this fixes. - Reordering arg validation ahead of the gateway probe is a real UX win (bad hostname/IP no longer masked by a missing gateway), and the test pins it (
docker.lognever created when validation fails).
Test coverage is excellent: pre-feature registry + gateway-down, validate-before-probe, ENOENT, timeout, and non-zero-exit are each asserted distinctly, and the argv-shape checks pin the no---filter contract. CI is fully green including dco-check and the e2e suite.
Nice, careful fix.
## Summary
- Add the v0.0.59 release notes from the GitHub announcement discussion.
- Refresh local inference and credential-storage guidance for the
current release behavior.
- Regenerate the user skills from the updated Fern docs.
- Tighten release-prep and docs review guidance for generated skills, PR
labels, and shared `$$nemoclaw` command placeholders.
## Verification
- `python3 scripts/docs-to-skills.py docs/ .agents/skills/ --prefix
nemoclaw-user --doc-platform fern-mdx`
- `rg "permissive mode|shields down|shields up|shields status|config
rotate-token|rotate-token" --glob '*.{md,mdx}'`
- `git diff --check`
- `npm run docs` (rerun outside sandbox after sandbox-only `tsx` IPC
permission failure)
- `npm run typecheck:cli`
- Pre-commit hooks during commit passed, including markdownlint,
docs-to-skills verification, gitleaks, commitlint, and skills YAML
tests.
## Source Summary
- #3679, #4437, #4681, #4766, #4772, #4775, #4786 ->
`docs/about/release-notes.mdx`, `docs/reference/commands.mdx`,
`docs/reference/troubleshooting.mdx`: Summarize OpenClaw 2026.5.27
compatibility, runtime path pinning, plugin registry recovery, live
gateway reconciliation, and clearer host-alias/startup diagnostics.
- #4332, #4402, #4769, #4776, #4779 -> `docs/about/release-notes.mdx`,
`docs/inference/inference-options.mdx`,
`docs/inference/use-local-inference.mdx`,
`docs/inference/switch-inference-providers.mdx`: Document the release
inference changes covering Local NIM waits, Hermes Anthropic routing,
Nemotron 3 Ultra, the current Ollama starter fallback, and Spark
managed-vLLM context length.
- #4628, #4652, #4733, #4745 -> `docs/about/release-notes.mdx`,
`docs/security/credential-storage.mdx`,
`docs/manage-sandboxes/messaging-channels.mdx`,
`docs/reference/troubleshooting.mdx`: Capture permission healing,
gateway-stored credential reuse, cross-sandbox messaging credential
conflict checks, and CDI preflight diagnostics.
- #4728, #4737, #4743, #4744, #4782 -> `.agents/skills/nemoclaw-user-*`:
Regenerate the user skill references from the updated source docs.
- Follow-up maintenance ->
`.agents/skills/nemoclaw-contributor-update-docs/SKILL.md`,
`.coderabbit.yaml`: Add release-prep area labels for docs and skills
PRs, and teach docs review guidance that `$$nemoclaw` is the correct
shared command placeholder for examples that work across agent aliases.
Note: the `documentation` label was not present in the repository, so
this PR is labeled with `v0.0.59` only.
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **Documentation**
* Updated default model for local Ollama inference setup to qwen3.5:9b
* Added Nemotron 3 Ultra 550B as an NVIDIA Endpoints model option
* Clarified credential storage and reuse behavior for post-deployment
(day-two) operations
* Added v0.0.59 release notes covering OpenClaw compatibility, inference
options, Hermes messaging sync, and troubleshooting
* Clarified CLI selection guidance and updated OpenClaw version example
in status output
* Revised release-prep instructions and docs review guidance for CLI
alias usage
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Summary
Host alias commands depend on the legacy
openshell-cluster-nemoclawk3s gateway container, but the existing driver-based gate at src/lib/actions/sandbox/host-aliases.ts:83 only fires when the registry recordsopenshellDriverasdockerorvm. Sandboxes onboarded by older NemoClaw releases (registry pre-dates the field) and kubernetes-driver sandboxes whose gateway never came up both slip past the gate and fall through todocker exec openshell-cluster-nemoclaw kubectl ..., surfacing an opaqueError response from daemon: No such container: openshell-cluster-nemoclawto the user. Add a runtime probe that runsdocker ps --format '{{.Names}}'and exact-matches the legacy container, with a discriminatedpresent | absent | unknownstate so docker-down / timeout / spawn failures get a distinct error.Related Issue
Fixes #4317.
Changes
src/lib/actions/sandbox/host-aliases.ts: addprobeLegacyGatewayContainer()(unfiltereddocker ps --format '{{.Names}}'viadockerSpawnSync, exact post-match on container name) returningpresent | absent | unknown.assertLegacyGatewayHostAliasSupportconsumes the probe after the registry-driver fast path:absentraises the host-aliases-not-supported message;unknownraises a docker-probe-failed message that points atdocker info. The driver branch keeps its short-circuit so no docker call runs when the registry already records a direct-container driver.src/lib/actions/sandbox/host-aliases.ts: extractvalidateAddOptions/validateRemoveOptionsand call them before the gateway probe inaddSandboxHostAlias/removeSandboxHostAlias, so invalid hostname / IP / missing-arg errors are never masked by a missing legacy gateway.test/cli.test.ts:writeHostAliasDockerStubnow answersdocker psfor the legacy gateway name (default running, opt-out viagatewayRunning: false); the inline stubs for hosts-add / hosts-list / retry gain the samepsbranch. The threelog[0] === "exec"regression asserts switch to akubectl-index sequence check that survives the new leading probe and additionally pins the unfiltered probe argv shape (ps,--format,{{.Names}}, no--filter). New tests cover: legacy gateway missing on a pre-feature registry entry; argument validation running before the gateway probe; docker spawn ENOENT classified distinctly from missing gateway; docker probe timeout classified distinctly; docker probe non-zero exit classified distinctly.Type of Change
Verification
npx prek run --all-filespassesnpm testpassesnpm run docsbuilds without warnings (doc changes only)Signed-off-by: Tinson Lai tinsonl@nvidia.com