fix(inference): improve managed vLLM download and launch progress#4676

Merged
cv merged 2 commits into main from fix/vllm-progress-report-improve
Jun 2, 2026

Conversation

@zyang-dev

@zyang-dev zyang-dev commented Jun 2, 2026

Contributor

Summary

Improves managed local vLLM setup progress visibility and readiness handling. Model downloads now stream native Hugging Face progress output, and vLLM launch waits quietly on the /v1/models readiness endpoint with bounded probes and failure log tails.

Changes

  • Stream hf download output directly with Docker TTY enabled for live progress.
  • Add a silence heartbeat for model downloads without imposing a hard download timeout.
  • Replace vLLM startup log-marker parsing with quiet /v1/models readiness polling.
  • Print recent vLLM container logs only when launch readiness fails.
  • Add curl prereq and bounded curl timeouts for readiness checks.
  • Clean up vLLM install and launch progress copy.
  • Update focused vLLM profile tests after removing startup marker fields.
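
The silence heartbeat from the list above could look roughly like the sketch below. It is an illustration only, not the code in src/lib/inference/vllm.ts: the `SilenceHeartbeat` class, the 30-second interval, and the message wording are all assumptions. `formatElapsed` is named in this PR, but its real output format is not shown here, so the format below is a guess.

```typescript
// Hypothetical sketch: names, interval, and message text are assumptions.
const HEARTBEAT_MS = 30_000; // assumed value; the PR does not state the real interval

// Render elapsed milliseconds as "Xm Ys" (this format is an assumption).
function formatElapsed(ms: number): string {
  const totalSec = Math.max(0, Math.floor(ms / 1000));
  const min = Math.floor(totalSec / 60);
  const sec = totalSec % 60;
  return min > 0 ? `${min}m ${sec}s` : `${sec}s`;
}

// Tracks when output was last seen and decides whether a heartbeat is due.
// It never fails the download: there is deliberately no hard timeout here.
class SilenceHeartbeat {
  private readonly startedAt: number;
  private readonly intervalMs: number;
  private lastOutputAt: number;
  private lastBeatAt: number;

  constructor(startedAt: number, intervalMs: number = HEARTBEAT_MS) {
    this.startedAt = startedAt;
    this.intervalMs = intervalMs;
    this.lastOutputAt = startedAt;
    this.lastBeatAt = startedAt;
  }

  // Call whenever the child process writes to stdout or stderr.
  noteOutput(now: number): void {
    this.lastOutputAt = now;
  }

  // Call periodically. Returns a message only when output has been silent for
  // a full interval and no heartbeat was printed within that same interval.
  tick(now: number): string | null {
    const silentFor = now - this.lastOutputAt;
    if (silentFor < this.intervalMs || now - this.lastBeatAt < this.intervalMs) {
      return null;
    }
    this.lastBeatAt = now;
    return `Model download still running (${formatElapsed(now - this.startedAt)} elapsed)`;
  }
}
```

Passing the clock in as an argument keeps the logic testable without real timers; a caller would typically drive `tick(Date.now())` from `setInterval` and call `noteOutput(Date.now())` from the stdout/stderr handlers.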

Type of Change

  • Code change (feature, bug fix, or refactor)
  • Code change with doc updates
  • Doc only (prose changes, no code sample modifications)
  • Doc only (includes code sample changes)

Verification

  • npx prek run --all-files passes
  • npm test passes
  • Tests added or updated for new or changed behavior
  • No secrets, API keys, or credentials committed
  • Docs updated for user-facing behavior changes
  • npm run docs builds without warnings (doc changes only)
  • Doc pages follow the style guide (doc changes only)
  • New doc pages include SPDX header and frontmatter (new pages only)

Signed-off-by: zyang-dev <267119621+zyang-dev@users.noreply.github.com>

Summary by CodeRabbit

  • Bug Fixes

    • More reliable vLLM readiness via API endpoint polling.
    • Improved error reporting showing container log tail on launch failures.
  • Improvements

    • Simplified install/profile behavior to rely on timeout controls.
    • Periodic heartbeat messages during image pull, model download, and readiness waiting.
    • Docker prerequisite now requires curl for readiness checks.
  • Tests

    • Updated tests to validate timeout-based profile behavior.

Signed-off-by: zyang-dev <267119621+zyang-dev@users.noreply.github.com>
@coderabbitai

coderabbitai Bot commented Jun 2, 2026

Contributor

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 95c922ba-9faa-4205-ab86-a9b6d5e48960

📥 Commits

Reviewing files that changed from the base of the PR and between ee05b87 and 505dad7.

📒 Files selected for processing (2)
  • src/lib/inference/vllm.ts
  • test/detect-vllm-profile.test.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/lib/inference/vllm.ts

📝 Walkthrough

Replaces log-stream readiness detection with polling of the vLLM OpenAI-compatible /v1/models endpoint, simplifies VllmProfile to timeout-based fields, adds heartbeat messages during download and launch, requires curl for readiness checks, and prints container log tails on launch failures.
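
A bounded readiness probe of the kind described above might build its curl arguments as in this sketch. The helper name and the timeout values are assumptions made for illustration; only the `/v1/models` path and the loopback host come from this thread. The curl flags themselves (`--silent`, `--fail`, `--connect-timeout`, `--max-time`) are standard.

```typescript
// Hypothetical helper: the function name and default timeouts are assumptions.
function buildReadinessCurlArgs(
  port: number,
  connectTimeoutSec: number = 2,
  maxTimeSec: number = 5,
): string[] {
  return [
    "--silent", // no progress meter, so the wait stays quiet
    "--fail", // exit non-zero on HTTP error statuses
    "--connect-timeout",
    String(connectTimeoutSec), // bound the connection phase
    "--max-time",
    String(maxTimeSec), // bound the whole request
    `http://127.0.0.1:${port}/v1/models`,
  ];
}
```

The resulting array can be handed to a process runner such as Node's `spawnSync("curl", args)`. Bounding both the connect phase and the total request time means a single probe cannot block the polling loop for longer than `maxTimeSec`.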

Changes

vLLM Readiness Mechanism Replacement

| Layer / File(s) | Summary |
| --- | --- |
| VllmProfile contract and profile data (src/lib/inference/vllm.ts) | VllmProfile removes installation log-pattern fields and clarifies timeout control (pullTimeoutSec, loadTimeoutSec). SPARK, STATION, and GENERIC_LINUX profiles updated to use explicit timeouts. |
| Infrastructure helpers and Docker prerequisites (src/lib/inference/vllm.ts) | Adds heartbeat interval constants and formatElapsed; Docker prereq checks now require curl. |
| Download phase with heartbeat messaging (src/lib/inference/vllm.ts) | downloadModel gains resolved-once guarding, a heartbeat timer that emits "still running" messages when output stalls, improved stdout/stderr tail capture, adds -t to hf download run, and standardizes failure reasons. |
| Endpoint-based readiness polling and diagnostics (src/lib/inference/vllm.ts) | Adds helpers to query /v1/models, parses its JSON, implements waitForVllmReady polling loop with 5s heartbeats and loadTimeoutSec enforcement, and adds container log-tail capture for diagnostics. |
| Install/launch orchestration and messaging (src/lib/inference/vllm.ts) | Updates user messaging to "Installing vLLM. Progress will print below.", emits "Launching vLLM", waits via waitForVllmReady, and prints container logs on failure before returning { ok: false }. |
| Profile timeout assertion updates (test/detect-vllm-profile.test.ts) | Test now asserts generic NVIDIA profile reuses Spark's pullTimeoutSec and loadTimeoutSec timeout values. |
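
The "parses its JSON" step in the table can be illustrated with a small, pure readiness check. This is a sketch under stated assumptions, not the PR's `vllmEndpointReady()`: the function name is hypothetical, and treating any `data` array (including an empty one) as ready is an assumption about the intended behavior.

```typescript
// Hypothetical sketch of the readiness decision; not the PR's actual code.
// An OpenAI-compatible /v1/models response has the shape
// {"object": "list", "data": [...]}.
function modelsResponseLooksReady(body: string): boolean {
  if (body.trim() === "") return false; // empty output: server not answering yet

  let parsed: unknown;
  try {
    parsed = JSON.parse(body);
  } catch {
    return false; // malformed JSON is treated as "not ready", not as a fatal error
  }

  if (typeof parsed !== "object" || parsed === null) return false;
  const data = (parsed as { data?: unknown }).data;
  return Array.isArray(data); // assumption: any array counts as ready
}
```

Because the function takes the response body as a string, the cases listed by the reviewers (empty output, malformed JSON, JSON without a data array, a valid data array) can each be covered by a unit test without Docker or curl.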

Sequence Diagram(s)

sequenceDiagram
  participant Install as installVllm
  participant Poll as waitForVllmReady
  participant Endpoint as /v1/models Endpoint
  participant Container as vLLM Container
  participant Logs as Container Logs
  
  Install->>Poll: start polling with timeout
  Poll->>Container: docker inspect (running?)
  Container-->>Poll: running or not
  Poll->>Endpoint: curl /v1/models
  Endpoint-->>Poll: success or network error
  alt ready
    Poll-->>Install: {ok: true}
  else timeout reached
    Poll->>Logs: tail container logs
    Logs-->>Poll: recent log lines
    Poll-->>Install: {ok: false} + log tail
  else container stopped
    Poll-->>Install: {ok: false}
  else heartbeat interval
    Poll-->>Poll: emit "API not ready" message
    Poll->>Endpoint: retry after interval
  end
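
The polling flow in the diagram can be sketched as a loop with injected dependencies. Everything here except the `loadTimeoutSec` name and the 5-second heartbeat is an assumption: the `waitForReady` name, the `ReadyDeps` interface, the 1-second poll interval, and the message text are hypothetical and are not taken from src/lib/inference/vllm.ts.

```typescript
// Hypothetical sketch of the polling loop; dependencies are injected so the
// loop can be exercised without Docker or curl.
type ReadyResult = { ok: true } | { ok: false; reason: string };

interface ReadyDeps {
  containerRunning: () => boolean; // e.g. backed by `docker inspect`
  endpointReady: () => boolean; // e.g. backed by a bounded curl probe
  sleep: (ms: number) => Promise<void>;
  now: () => number;
  emit: (msg: string) => void;
}

async function waitForReady(
  loadTimeoutSec: number,
  deps: ReadyDeps,
  pollMs: number = 1000, // assumed poll interval
  heartbeatMs: number = 5000,
): Promise<ReadyResult> {
  const start = deps.now();
  let lastBeat = start;
  for (;;) {
    // A stopped container can never become ready, so fail fast.
    if (!deps.containerRunning()) {
      return { ok: false, reason: "container stopped before the API was ready" };
    }
    if (deps.endpointReady()) return { ok: true };

    const now = deps.now();
    if (now - start >= loadTimeoutSec * 1000) {
      return { ok: false, reason: `API not ready after ${loadTimeoutSec}s` };
    }
    if (now - lastBeat >= heartbeatMs) {
      deps.emit(`API not ready (${Math.floor((now - start) / 1000)}s elapsed)`);
      lastBeat = now;
    }
    await deps.sleep(pollMs);
  }
}
```

In this sketch the caller is responsible for printing the container log tail when the result is `{ ok: false }`, which keeps the loop itself free of I/O beyond the injected callbacks.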

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

  • NVIDIA/NemoClaw#4619: Modifies src/lib/inference/vllm.ts vLLM installation/launch flow and image/model selection; relates to endpoint-based readiness and docker invocation changes.

Suggested labels

Provider: vLLM

Poem

🐰 I watched the logs and hopped about the glen,
Now I poll the endpoint, again and again—
A heartbeat ping, a gentle cheer,
Tail the logs when doom is near,
Hop, install, vLLM's live—let's run!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 15.79% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly summarizes the main change: improving vLLM download and launch progress visibility by replacing log parsing with endpoint polling and adding heartbeat messaging. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@github-actions

github-actions Bot commented Jun 2, 2026

Contributor

E2E Advisor Recommendation

Required E2E: gpu-e2e, inference-routing-e2e
Optional E2E: onboard-inference-smoke-e2e, cloud-onboard-e2e

Dispatch hint: gpu-e2e,inference-routing-e2e

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: medium

Required E2E

  • gpu-e2e (high): Closest existing GPU/local-inference E2E. It exercises a real NVIDIA GPU host, Docker, install/onboard, local inference startup, sandbox creation, and live inference through the sandbox. Although it uses Ollama rather than vLLM, this PR changes the managed GPU local-inference install/readiness path and should receive GPU-host coverage.
  • inference-routing-e2e (medium): Validates inference.local routing, provider route health, credential isolation, and error classification against a live sandbox. Managed vLLM install ultimately registers vllm-local as an OpenAI-compatible provider and relies on the same gateway/sandbox inference route behavior.

Optional E2E

  • onboard-inference-smoke-e2e (low): Hermetic regression guard that onboarding must not report success until a configured inference route serves a real request. Adjacent to this PR’s switch from log-marker readiness to endpoint readiness, but it does not run managed vLLM itself.
  • cloud-onboard-e2e (medium): Useful broad confidence that the onboarding flow still installs and configures a sandbox from source after changes in an onboarding-imported inference module, though it does not exercise local vLLM.

New E2E recommendations

  • managed vLLM install on GPU/Spark (high): No existing E2E appears to run NEMOCLAW_PROVIDER=install-vllm or interactively select install-vllm on a GPU/Spark/Station host. The changed code path covers Docker pull, hf download, container startup, /v1/models readiness polling, model detection, and vllm-local route setup, but current GPU E2E is Ollama-only.
    • Suggested test: Add a managed-vllm-install-e2e job on a GPU runner using a small supported vLLM model/cache, then assert /v1/models readiness, vllm-local provider registration, and a sandbox inference.local chat completion.
  • vLLM readiness failure diagnostics (medium): The diff replaces docker log streaming/fatal-marker detection with endpoint polling plus tail-on-failure. There is no apparent E2E/regression job that simulates curl missing, container exit before readiness, timeout, or malformed /v1/models response and asserts actionable diagnostics.
    • Suggested test: Add a hermetic regression E2E for vLLM readiness failure modes with mocked docker/curl commands, verifying non-zero exit, container cleanup, and printed log tail without leaking HF tokens.

Dispatch hint

  • Workflow: .github/workflows/nightly-e2e.yaml
  • jobs input: gpu-e2e,inference-routing-e2e

@github-actions

github-actions Bot commented Jun 2, 2026

Contributor

E2E Scenario Advisor Recommendation

Required scenario E2E: None
Optional scenario E2E: None

Workflow run

Full scenario advisor summary

E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required scenario E2E

  • None. The PR changes the vLLM install/profile helper and its unit test, but the current scenario E2E catalog and ROUTES table do not include a vLLM onboarding scenario. Existing dispatchable scenarios cover cloud NVIDIA/OpenAI-compatible and local Ollama paths, so no scenario E2E job directly exercises this changed surface.

Optional scenario E2E

  • None.

Relevant changed files

  • src/lib/inference/vllm.ts

@github-actions

github-actions Bot commented Jun 2, 2026

Contributor

PR Review Advisor

Findings: 0 needs attention, 4 worth checking, 0 nice ideas
Since last review: 0 prior items resolved, 3 still apply, 0 new items found

Review findings

🛠️ Needs attention

  • None.

🔎 Worth checking

  • Source-of-truth review needed: src/lib/inference/vllm.ts vllmEndpointReady() tolerant JSON parse: The advisor marked localized patch analysis as needs_followup.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: vllmEndpointReady() catches JSON.parse failures and returns false; test/detect-vllm-profile.test.ts does not exercise this path.
  • Redact HF and vLLM logs before printing (src/lib/inference/vllm.ts:286): The changed progress and failure paths write Hugging Face download stdout/stderr and Docker log tails directly to the terminal. Those streams originate from third-party/container processes that receive forwarded Hugging Face credentials, so bearer tokens, signed URLs, proxy credentials, or environment dumps could bypass the repository's existing runner redaction.
    • Recommendation: Pass streamed chunks and stored tail/log-tail lines through the existing redaction utility, or route these outputs through a shared redacting logger. Add a regression test proving token-shaped strings in HF/vLLM output are not emitted verbatim.
    • Evidence: downloadModel() calls stream.write(buf) for stdout/stderr chunks and later prints raw hf tail lines; printContainerLogTail() writes each docker logs line directly with process.stderr.write(...). Existing redaction helpers are available in src/lib/security/redact.ts but are not used here.
  • Add coverage for the new readiness and failure paths (test/detect-vllm-profile.test.ts:69): The PR replaces log-marker readiness detection with /v1/models polling and adds tolerant JSON parsing, container-exit handling, load timeouts, download heartbeats, and failure log tails, but the test change only updates profile-shape assertions after removing marker fields. The main new behavior and negative paths are unverified.
    • Recommendation: Add focused tests or injectable seams for vllmEndpointReady()/waitForVllmReady() covering curl failure, empty output, malformed JSON, JSON without a data array, successful data-array responses, container exit before readiness, timeout, and failure log-tail behavior. Include runtime/integration validation for the Docker + vLLM installer path.
    • Evidence: test/detect-vllm-profile.test.ts changes the removed ready/fatal marker assertion into a timeout-budget assertion, but no tests exercise the new curl readiness probe, wait loop, timeout handling, container-exit handling, or failure log printing in src/lib/inference/vllm.ts.
  • Source-of-truth review needed for tolerant readiness JSON parsing (src/lib/inference/vllm.ts:376): vllmEndpointReady() treats malformed non-empty /v1/models output as simply not ready by catching JSON.parse failures and returning false. That localized tolerance may be appropriate for transient startup responses, but the code does not establish what invalid state creates the malformed response, why the source cannot be fixed, what regression test protects the behavior, or when the workaround can be removed.
    • Recommendation: Document or encode the expected invalid state at the source boundary, and add a regression test showing malformed readiness output is handled intentionally and can recover to a valid response. If the malformed response comes from a controllable source in this PR, prefer making that invalid state impossible instead of silently tolerating it.
    • Evidence: vllmEndpointReady() parses curl output from http://127.0.0.1:<VLLM_PORT>/v1/models and catches JSON.parse errors with catch { return false; }, while the changed test file does not cover this path.
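
The redaction finding above could be addressed by passing each line through a redactor before it is written to the terminal. The sketch below is a stand-in for illustration: the repository's real helpers live in src/lib/security/redact.ts and their API is not shown in this thread, so the `redactLine` name and both patterns are assumptions. The `hf_` prefix matches the usual form of Hugging Face access tokens.

```typescript
// Hypothetical redactor; not the API of src/lib/security/redact.ts.
function redactLine(line: string): string {
  return (
    line
      // Authorization-style bearer values, whatever the token format.
      .replace(/\b(Bearer\s+)[A-Za-z0-9._~+\/=_-]+/gi, "$1***")
      // Hugging Face style access tokens appearing anywhere else in the line.
      .replace(/\bhf_[A-Za-z0-9]{8,}\b/g, "hf_***")
  );
}

// Redact a captured log tail line by line before printing it.
function redactTail(lines: string[]): string[] {
  return lines.map(redactLine);
}
```

One caveat for streamed output: a token can be split across two chunks, so a per-chunk redactor may miss it. Buffering until a newline before redacting, at the cost of slightly delayed progress output, is one way to handle that.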

🌱 Nice ideas

  • None.

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/lib/inference/vllm.ts (1)

243-325: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Heartbeat-only hf download can hang install forever.

Unlike pullImage(), this external download path has no watchdog or timeout at all. If docker run hf download ... wedges without exiting, the Promise never settles and the new heartbeat will print indefinitely. Please add a no-output stall cutoff and tear down the one-shot download container when it trips.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/lib/inference/vllm.ts` around lines 243 - 325, Add a stall timeout that
force-terminates the docker download if there’s no output for too long:
introduce a STALL_CUTOFF_MS (e.g. a multiple of MODEL_DOWNLOAD_HEARTBEAT_MS or a
configured constant), and in the existing heartbeat callback check if now -
lastOutputAt >= STALL_CUTOFF_MS; if so, call proc.kill() (or
proc.kill('SIGKILL') if needed), emit a descriptive message, and call done({ ok:
false, reason: `hf download stalled (no output for ${STALL_CUTOFF_MS}ms)` }).
Ensure you only kill if proc exists, rely on the existing done() guard to avoid
double-resolve, and keep clearing the heartbeat interval as done() already does;
reference symbols: dockerSpawn, proc, heartbeat, lastOutputAt,
MODEL_DOWNLOAD_HEARTBEAT_MS, done.
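
The stall cutoff suggested above might be structured as a check that the heartbeat timer calls on every tick. This is a sketch of the reviewer's suggestion, not merged code: the constant values, the `makeStallCheck` name, and the `Killable` interface are assumptions. `proc`, `lastOutputAt`, `done`, `STALL_CUTOFF_MS`, and `MODEL_DOWNLOAD_HEARTBEAT_MS` are the names used in the review comment.

```typescript
// Hypothetical sketch of the suggested stall cutoff; values are assumptions.
const MODEL_DOWNLOAD_HEARTBEAT_MS = 30_000; // assumed value
const STALL_CUTOFF_MS = 20 * MODEL_DOWNLOAD_HEARTBEAT_MS; // assumed multiple

type DownloadResult = { ok: true } | { ok: false; reason: string };

// Minimal shape of a child process for this sketch.
interface Killable {
  kill(signal?: string): boolean;
}

// Builds the function a heartbeat timer would call on each tick.
// Returns true once the cutoff has tripped and the download was failed.
function makeStallCheck(
  proc: Killable | null,
  getLastOutputAt: () => number,
  done: (result: DownloadResult) => void,
  cutoffMs: number = STALL_CUTOFF_MS,
): (now: number) => boolean {
  return (now: number): boolean => {
    if (now - getLastOutputAt() < cutoffMs) return false;
    if (proc) proc.kill("SIGKILL"); // only kill when a process actually exists
    // done() is expected to guard against double resolution and to clear
    // the heartbeat interval, as the review comment describes.
    done({ ok: false, reason: `hf download stalled (no output for ${cutoffMs}ms)` });
    return true;
  };
}
```

Killing the `docker run` client does not necessarily stop the container it started, so a real implementation would probably also need to remove the one-shot container, for example with `docker rm -f`, as the reviewer's "tear down" wording implies.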
🧹 Nitpick comments (1)
test/detect-vllm-profile.test.ts (1)

75-75: 💤 Low value

Consider adding type: "nvidia" for consistency.

Other tests that retrieve the Spark profile explicitly include type: "nvidia" alongside spark: true (lines 24, 47, 52). For consistency and clarity, consider:

-    const spark = detectVllmProfile({ spark: true });
+    const spark = detectVllmProfile({ spark: true, type: "nvidia" });
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/detect-vllm-profile.test.ts` at line 75, The test calling
detectVllmProfile currently uses detectVllmProfile({ spark: true }) which is
inconsistent with other tests; update the call to pass the explicit GPU type by
using detectVllmProfile({ spark: true, type: "nvidia" }) so the Spark profile
assertion matches the other cases and clarifies the intended GPU backend (refer
to detectVllmProfile and the spark variable in this test).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/lib/inference/vllm.ts`:
- Around line 524-525: The hard-coded ETA message emitted by emit("Launch can
take 5-20 minutes; this is normal") under vllm startup should be derived from
the profile's loadTimeoutSec (or made open-ended); update the code that calls
emit (the emit invocations in this vllm launch flow) to compute a minutes value
from profile.loadTimeoutSec (e.g., Math.ceil(profile.loadTimeoutSec / 60)) and
emit a message that reflects that value (or emit a generic "Launch can take
several minutes to up to X minutes" or "Launch can take several minutes; this
can take longer for large models"), ensuring you reference
profile.loadTimeoutSec and the existing emit call/site when making the change.
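
Deriving the launch message from the timeout, as the comment above suggests, could look like this sketch. Only `loadTimeoutSec` comes from the PR; the `launchEtaMessage` name and the exact wording are assumptions.

```typescript
// Hypothetical helper; the message wording is an assumption.
function launchEtaMessage(loadTimeoutSec: number): string {
  // Fall back to an open-ended message when no usable timeout is configured.
  if (!Number.isFinite(loadTimeoutSec) || loadTimeoutSec <= 0) {
    return "Launch can take several minutes; this can take longer for large models";
  }
  const maxMin = Math.ceil(loadTimeoutSec / 60);
  return `Launch can take several minutes (up to ${maxMin} min); this is normal`;
}
```

A call site would then read `emit(launchEtaMessage(profile.loadTimeoutSec))`, so the upper bound shown to the user always matches the timeout the readiness loop actually enforces.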

---

Outside diff comments:
In `@src/lib/inference/vllm.ts`:
- Around line 243-325: Add a stall timeout that force-terminates the docker
download if there’s no output for too long: introduce a STALL_CUTOFF_MS (e.g. a
multiple of MODEL_DOWNLOAD_HEARTBEAT_MS or a configured constant), and in the
existing heartbeat callback check if now - lastOutputAt >= STALL_CUTOFF_MS; if
so, call proc.kill() (or proc.kill('SIGKILL') if needed), emit a descriptive
message, and call done({ ok: false, reason: `hf download stalled (no output for
${STALL_CUTOFF_MS}ms)` }). Ensure you only kill if proc exists, rely on the
existing done() guard to avoid double-resolve, and keep clearing the heartbeat
interval as done() already does; reference symbols: dockerSpawn, proc,
heartbeat, lastOutputAt, MODEL_DOWNLOAD_HEARTBEAT_MS, done.

---

Nitpick comments:
In `@test/detect-vllm-profile.test.ts`:
- Line 75: The test calling detectVllmProfile currently uses detectVllmProfile({
spark: true }) which is inconsistent with other tests; update the call to pass
the explicit GPU type by using detectVllmProfile({ spark: true, type: "nvidia"
}) so the Spark profile assertion matches the other cases and clarifies the
intended GPU backend (refer to detectVllmProfile and the spark variable in this
test).
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 07d2389c-03a9-4970-aa4b-b949ffa6beed

📥 Commits

Reviewing files that changed from the base of the PR and between 5002ff8 and ee05b87.

📒 Files selected for processing (2)
  • src/lib/inference/vllm.ts
  • test/detect-vllm-profile.test.ts

Comment thread src/lib/inference/vllm.ts Outdated
Signed-off-by: zyang-dev <267119621+zyang-dev@users.noreply.github.com>
@zyang-dev zyang-dev added the provider: vllm (vLLM local or hosted provider behavior), Platform: DGX Spark, and v0.0.57 (Release target) labels Jun 2, 2026
@cv cv merged commit 6240732 into main Jun 2, 2026
31 checks passed
@cv cv deleted the fix/vllm-progress-report-improve branch June 2, 2026 23:26
@wscurran wscurran added the platform: dgx-spark (Affects DGX Spark hardware or workflows) and platform: dgx-station (Affects DGX Station hardware or workflows) labels Jun 3, 2026
cv pushed a commit that referenced this pull request Jun 3, 2026
## Summary
- Add the missing `v0.0.57` release-notes section with links to the
detailed docs pages for command, inference, onboarding, messaging,
status, installer, and policy changes.
- Remove public references to docs-skip terms from source docs and
regenerate the NemoClaw user skills from the current Fern MDX docs.
- Carry forward generated references for the per-agent documentation
split, including Hermes-specific reference files.

## Source summary
- #4615 and #4653 -> `docs/about/release-notes.mdx`,
`docs/reference/commands.mdx`: Release notes now cover host-side
`sessions` and `agents` commands plus `NEMOCLAW_EXTRA_AGENTS_JSON`
secondary-agent baking.
- #4163, #4204, #4611, #4619, and #4676 ->
`docs/about/release-notes.mdx`,
`docs/inference/use-local-inference.mdx`: Release notes now cover
managed vLLM progress/readiness, DGX Spark model default changes, local
Ollama streaming usage, and inference route divergence warnings.
- #4267, #4601, #4609, #4642, #4645, and #4661 ->
`docs/about/release-notes.mdx`, `docs/reference/commands.mdx`: Release
notes now cover UFW auto-remediation, local-inference reachability
gates, gateway reuse/binding, cancel rollback, and policy selection
persistence.
- #4577, #4582, #4607, and #4660 -> `docs/about/release-notes.mdx`,
`docs/manage-sandboxes/messaging-channels.mdx`: Release notes now cover
Slack validation, atomic `channels add`, WhatsApp QR diagnostics, and
Slack placeholder normalization.
- #4388, #4600, #4646, and #4647 -> `docs/about/release-notes.mdx`,
`docs/reference/commands.mdx`: Release notes now cover status failure
layers, paused-container hints, Docker-driver doctor behavior, and
non-destructive stale-registry recovery.
- #4569, #4579, and #4678 -> `docs/about/release-notes.mdx`,
`docs/manage-sandboxes/lifecycle.mdx`,
`docs/network-policy/integration-policy-examples.mdx`: Release notes now
cover installer tag pinning, PyPI `uv` policy access, and observable
Jira validation.
- #4632 -> `.agents/skills/`: Regenerated user skills from the current
per-agent docs source, including newly generated Hermes reference files.

## Verification
- `python3 scripts/docs-to-skills.py docs/ .agents/skills/ --prefix
nemoclaw-user --doc-platform fern-mdx`
- `rg "permissive mode|shields down|shields up|shields status|config
rotate-token|rotate-token" docs --glob "*.mdx"`
- `rg "permissive mode|shields down|shields up|shields status|config
rotate-token|rotate-token" .agents/skills --glob "*.md"`
- `npm run docs`
- `npm run build:cli`
- Commit hooks: markdownlint, docs-to-skills verification, gitleaks,
skills YAML, commitlint

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Documentation**
  * Restructured documentation to clearly distinguish OpenClaw and Hermes
    agent variants throughout user guides.
  * Enhanced security, credential storage, and deployment guidance with
    clearer setup flows.
  * Added Hermes plugin installation and ecosystem documentation.
  * Improved workspace, messaging, and policy management references with
    variant-specific command examples.
  * Refined troubleshooting and CLI reference sections for clarity.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->
@wscurran wscurran added the bug-fix (PR fixes a bug or regression) label Jun 8, 2026
Labels

  • bug-fix (PR fixes a bug or regression)
  • platform: dgx-spark (Affects DGX Spark hardware or workflows)
  • platform: dgx-station (Affects DGX Station hardware or workflows)
  • provider: vllm (vLLM local or hosted provider behavior)
  • v0.0.57 (Release target)
