fix(inference): improve managed vLLM download and launch progress by zyang-dev · Pull Request #4676 · NVIDIA/NemoClaw

zyang-dev · 2026-06-02T19:46:56Z

Summary

Improves progress visibility and readiness handling for managed local vLLM setup. Model downloads now stream native Hugging Face progress output. vLLM launch quietly polls the /v1/models readiness endpoint with bounded probes and prints a container log tail only if launch fails.

Changes

- Stream `hf download` output directly, with Docker TTY enabled, for live progress.
- Add a silence heartbeat for model downloads without imposing a hard download timeout.
- Replace vLLM startup log-marker parsing with quiet /v1/models readiness polling.
- Print recent vLLM container logs only when launch readiness fails.
- Add a curl prerequisite and bounded curl timeouts for readiness checks.
- Clean up vLLM install and launch progress copy.
- Update focused vLLM profile tests after removing startup marker fields.
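The readiness check described above can be sketched as a pure function over the body that curl returns from `/v1/models`. This is a minimal illustration, not the PR's `vllmEndpointReady()`. The name `isModelsResponseReady` is hypothetical. Treating a non-empty `data` array as "ready" is an assumption based on the OpenAI-compatible list format (`{"object": "list", "data": [...]}`) that vLLM serves. Empty or malformed output is deliberately mapped to "not ready" so the caller keeps polling.

```typescript
// Hypothetical helper, not the PR's vllmEndpointReady().
// Assumption: "ready" means the body parses as JSON and has a non-empty
// `data` array of model entries (OpenAI-compatible list format).
interface ModelsResponse {
  data?: Array<{ id?: string }>;
}

function isModelsResponseReady(body: string): boolean {
  const trimmed = body.trim();
  if (trimmed === "") {
    // curl failed, or the server closed the connection without a body.
    return false;
  }
  let parsed: unknown;
  try {
    parsed = JSON.parse(trimmed);
  } catch {
    // Malformed output counts as "not ready yet" so the caller retries.
    return false;
  }
  if (typeof parsed !== "object" || parsed === null) {
    return false;
  }
  const data = (parsed as ModelsResponse).data;
  return Array.isArray(data) && data.length > 0;
}
```

Keeping the parser pure makes the cases the review advisor lists unit-testable without Docker: empty output, malformed JSON, JSON without `data`, and a valid list.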

Type of Change

Code change (feature, bug fix, or refactor)
Code change with doc updates
Doc only (prose changes, no code sample modifications)
Doc only (includes code sample changes)

Verification

npx prek run --all-files passes
npm test passes
Tests added or updated for new or changed behavior
No secrets, API keys, or credentials committed
Docs updated for user-facing behavior changes
npm run docs builds without warnings (doc changes only)
Doc pages follow the style guide (doc changes only)
New doc pages include SPDX header and frontmatter (new pages only)

Signed-off-by: zyang-dev <267119621+zyang-dev@users.noreply.github.com>

Summary by CodeRabbit

Bug Fixes
- More reliable vLLM readiness via API endpoint polling.
- Improved error reporting showing container log tail on launch failures.
Improvements
- Simplified install/profile behavior to rely on timeout controls.
- Periodic heartbeat messages during image pull, model download, and readiness waiting.
- Docker prerequisite now requires curl for readiness checks.
Tests
- Updated tests to validate timeout-based profile behavior.
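The bounded curl timeouts for readiness checks might look like the following argument builder. `buildReadinessCurlArgs` and its default budgets (2 s to connect, 5 s overall) are hypothetical. The flags themselves (`-s`, `--fail`, `--connect-timeout`, `--max-time`) are standard curl options.

```typescript
// Hypothetical helper: curl arguments for one bounded readiness probe.
//   -s                   suppress the progress meter
//   --fail               exit non-zero on HTTP status >= 400
//   --connect-timeout N  cap the connect phase at N seconds
//   --max-time N         cap the whole transfer at N seconds
function buildReadinessCurlArgs(
  port: number,
  connectTimeoutSec = 2,
  maxTimeSec = 5,
): string[] {
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new RangeError(`invalid port: ${port}`);
  }
  return [
    "-s",
    "--fail",
    "--connect-timeout",
    String(connectTimeoutSec),
    "--max-time",
    String(maxTimeSec),
    `http://127.0.0.1:${port}/v1/models`,
  ];
}
```

The array can be passed to Node's `child_process.execFile("curl", args, callback)`. A non-zero exit status then means "not ready yet" for that probe. Port 8000 in the test is only an example, not NemoClaw's `VLLM_PORT`.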


coderabbitai · 2026-06-02T19:47:12Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 95c922ba-9faa-4205-ab86-a9b6d5e48960

📥 Commits

Reviewing files that changed from the base of the PR and between ee05b87 and 505dad7.

📒 Files selected for processing (2)

src/lib/inference/vllm.ts
test/detect-vllm-profile.test.ts

🚧 Files skipped from review as they are similar to previous changes (1)

src/lib/inference/vllm.ts

📝 Walkthrough

Walkthrough

Replaces log-stream readiness detection with polling of the vLLM OpenAI-compatible /v1/models endpoint, simplifies VllmProfile to timeout-based fields, adds heartbeat messages during download and launch, requires curl for readiness checks, and prints container log tails on launch failures.

Changes

vLLM Readiness Mechanism Replacement

| Layer | File(s) | Summary |
|---|---|---|
| VllmProfile contract and profile data | `src/lib/inference/vllm.ts` | `VllmProfile` removes installation log-pattern fields and clarifies timeout control (`pullTimeoutSec`, `loadTimeoutSec`). SPARK, STATION, and GENERIC_LINUX profiles updated to use explicit timeouts. |
| Infrastructure helpers and Docker prerequisites | `src/lib/inference/vllm.ts` | Adds heartbeat interval constants and `formatElapsed`; Docker prereq checks now require `curl`. |
| Download phase with heartbeat messaging | `src/lib/inference/vllm.ts` | `downloadModel` gains resolved-once guarding, a heartbeat timer that emits "still running" messages when output stalls, improved stdout/stderr tail capture, adds `-t` to `hf download` run, and standardizes failure reasons. |
| Endpoint-based readiness polling and diagnostics | `src/lib/inference/vllm.ts` | Adds helpers to query `/v1/models`, parses its JSON, implements `waitForVllmReady` polling loop with 5s heartbeats and `loadTimeoutSec` enforcement, and adds container log-tail capture for diagnostics. |
| Install/launch orchestration and messaging | `src/lib/inference/vllm.ts` | Updates user messaging to "Installing vLLM. Progress will print below.", emits "Launching vLLM", waits via `waitForVllmReady`, and prints container logs on failure before returning `{ ok: false }`. |
| Profile timeout assertion updates | `test/detect-vllm-profile.test.ts` | Test now asserts generic NVIDIA profile reuses Spark's `pullTimeoutSec` and `loadTimeoutSec` timeout values. |
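For illustration, a `formatElapsed`-style helper and the heartbeat line it feeds could look like this. The names echo the walkthrough. The signatures and the `1m 05s` output format are assumptions, and the real helper in `src/lib/inference/vllm.ts` may differ.

```typescript
// Hypothetical sketch; the real formatElapsed may use another signature/format.
function formatElapsed(ms: number): string {
  const totalSec = Math.max(0, Math.floor(ms / 1000));
  const minutes = Math.floor(totalSec / 60);
  const seconds = totalSec % 60;
  if (minutes === 0) {
    return `${seconds}s`;
  }
  return `${minutes}m ${String(seconds).padStart(2, "0")}s`;
}

// The "still running" line printed when a phase has produced no output
// for at least one heartbeat interval.
function heartbeatMessage(phase: string, startedAt: number, now: number): string {
  return `${phase} still running (${formatElapsed(now - startedAt)} elapsed)`;
}
```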

Sequence Diagram(s)

```mermaid
sequenceDiagram
  participant Install as installVllm
  participant Poll as waitForVllmReady
  participant Endpoint as /v1/models Endpoint
  participant Container as vLLM Container
  participant Logs as Container Logs

  Install->>Poll: start polling with timeout
  Poll->>Container: docker inspect (running?)
  Container-->>Poll: running or not
  Poll->>Endpoint: curl /v1/models
  Endpoint-->>Poll: success or network error
  alt ready
    Poll-->>Install: {ok: true}
  else timeout reached
    Poll->>Logs: tail container logs
    Logs-->>Poll: recent log lines
    Poll-->>Install: {ok: false} + log tail
  else container stopped
    Poll-->>Install: {ok: false}
  else heartbeat interval
    Poll-->>Poll: emit "API not ready" message
    Poll->>Endpoint: retry after interval
  end
```
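The polling loop in the diagram can be sketched with injected dependencies, which also provides the test seams the review advisor asks for. Everything here is hypothetical: `waitForReadySketch`, `ReadyDeps`, and the 2 s poll interval. The 5 s heartbeat default follows the walkthrough. The sketch mirrors only the diagram's branches: container stopped, endpoint ready, timeout reached, and a heartbeat between probes. Printing the container log tail is left to the caller on `{ ok: false }`.

```typescript
// Hypothetical injection seams, not the real waitForVllmReady signature.
interface ReadyDeps {
  isContainerRunning: () => Promise<boolean>; // e.g. docker inspect
  probeEndpoint: () => Promise<boolean>; // e.g. curl /v1/models + JSON check
  sleep: (ms: number) => Promise<void>;
  now: () => number;
  emit: (msg: string) => void;
}

type ReadyResult =
  | { ok: true }
  | { ok: false; reason: "container-exited" | "timeout" };

async function waitForReadySketch(
  deps: ReadyDeps,
  loadTimeoutSec: number,
  pollMs = 2000,
  heartbeatMs = 5000,
): Promise<ReadyResult> {
  const start = deps.now();
  let lastHeartbeat = start;
  for (;;) {
    // A stopped container can never become ready, so fail fast.
    if (!(await deps.isContainerRunning())) {
      return { ok: false, reason: "container-exited" };
    }
    if (await deps.probeEndpoint()) {
      return { ok: true };
    }
    const now = deps.now();
    if (now - start >= loadTimeoutSec * 1000) {
      return { ok: false, reason: "timeout" };
    }
    // Print at most one progress line per heartbeat interval.
    if (now - lastHeartbeat >= heartbeatMs) {
      deps.emit(`vLLM API not ready yet (${Math.floor((now - start) / 1000)}s elapsed)`);
      lastHeartbeat = now;
    }
    await deps.sleep(pollMs);
  }
}
```

Because `now` and `sleep` are injected, a test can drive the loop with a fake clock and cover the timeout and container-exit paths in milliseconds.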

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

NVIDIA/NemoClaw#4619: Modifies src/lib/inference/vllm.ts vLLM installation/launch flow and image/model selection; relates to endpoint-based readiness and docker invocation changes.

Suggested labels

Provider: vLLM

Poem

🐰 I watched the logs and hopped about the glen,
Now I poll the endpoint, again and again—
A heartbeat ping, a gentle cheer,
Tail the logs when doom is near,
Hop, install, vLLM's live—let's run!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 15.79% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly summarizes the main change: improving vLLM download and launch progress visibility by replacing log parsing with endpoint polling and adding heartbeat messaging. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

github-actions · 2026-06-02T19:49:32Z

E2E Advisor Recommendation

Required E2E: gpu-e2e, inference-routing-e2e
Optional E2E: onboard-inference-smoke-e2e, cloud-onboard-e2e

Dispatch hint: gpu-e2e,inference-routing-e2e

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: medium

Required E2E

gpu-e2e (high): Closest existing GPU/local-inference E2E. It exercises a real NVIDIA GPU host, Docker, install/onboard, local inference startup, sandbox creation, and live inference through the sandbox. Although it uses Ollama rather than vLLM, this PR changes the managed GPU local-inference install/readiness path and should receive GPU-host coverage.
inference-routing-e2e (medium): Validates inference.local routing, provider route health, credential isolation, and error classification against a live sandbox. Managed vLLM install ultimately registers vllm-local as an OpenAI-compatible provider and relies on the same gateway/sandbox inference route behavior.

Optional E2E

onboard-inference-smoke-e2e (low): Hermetic regression guard that onboarding must not report success until a configured inference route serves a real request. Adjacent to this PR’s switch from log-marker readiness to endpoint readiness, but it does not run managed vLLM itself.
cloud-onboard-e2e (medium): Useful broad confidence that the onboarding flow still installs and configures a sandbox from source after changes in an onboarding-imported inference module, though it does not exercise local vLLM.

New E2E recommendations

managed vLLM install on GPU/Spark (high): No existing E2E appears to run NEMOCLAW_PROVIDER=install-vllm or interactively select install-vllm on a GPU/Spark/Station host. The changed code path covers Docker pull, hf download, container startup, /v1/models readiness polling, model detection, and vllm-local route setup, but current GPU E2E is Ollama-only.
- Suggested test: Add a managed-vllm-install-e2e job on a GPU runner using a small supported vLLM model/cache, then assert /v1/models readiness, vllm-local provider registration, and a sandbox inference.local chat completion.
vLLM readiness failure diagnostics (medium): The diff replaces docker log streaming/fatal-marker detection with endpoint polling plus tail-on-failure. There is no apparent E2E/regression job that simulates curl missing, container exit before readiness, timeout, or malformed /v1/models response and asserts actionable diagnostics.
- Suggested test: Add a hermetic regression E2E for vLLM readiness failure modes with mocked docker/curl commands, verifying non-zero exit, container cleanup, and printed log tail without leaking HF tokens.

Dispatch hint

Workflow: .github/workflows/nightly-e2e.yaml
jobs input: gpu-e2e,inference-routing-e2e

github-actions · 2026-06-02T19:49:33Z

E2E Scenario Advisor Recommendation

Required scenario E2E: None
Optional scenario E2E: None

Workflow run

Full scenario advisor summary

E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required scenario E2E

None. The PR changes the vLLM install/profile helper and its unit test, but the current scenario E2E catalog and ROUTES table do not include a vLLM onboarding scenario. Existing dispatchable scenarios cover cloud NVIDIA/OpenAI-compatible and local Ollama paths, so no scenario E2E job directly exercises this changed surface.

Optional scenario E2E

None.

Relevant changed files

src/lib/inference/vllm.ts

github-actions · 2026-06-02T19:50:47Z

PR Review Advisor

Findings: 0 needs attention, 4 worth checking, 0 nice ideas
Since last review: 0 prior items resolved, 3 still apply, 0 new items found

Review findings

🛠️ Needs attention

None.

🔎 Worth checking

Source-of-truth review needed: src/lib/inference/vllm.ts vllmEndpointReady() tolerant JSON parse: The advisor marked localized patch analysis as needs_followup.
- Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
- Evidence: vllmEndpointReady() catches JSON.parse failures and returns false; test/detect-vllm-profile.test.ts does not exercise this path.
Redact HF and vLLM logs before printing (src/lib/inference/vllm.ts:286): The changed progress and failure paths write Hugging Face download stdout/stderr and Docker log tails directly to the terminal. Those streams originate from third-party/container processes that receive forwarded Hugging Face credentials, so bearer tokens, signed URLs, proxy credentials, or environment dumps could bypass the repository's existing runner redaction.
- Recommendation: Pass streamed chunks and stored tail/log-tail lines through the existing redaction utility, or route these outputs through a shared redacting logger. Add a regression test proving token-shaped strings in HF/vLLM output are not emitted verbatim.
- Evidence: downloadModel() calls stream.write(buf) for stdout/stderr chunks and later prints raw hf tail lines; printContainerLogTail() writes each docker logs line directly with process.stderr.write(...). Existing redaction helpers are available in src/lib/security/redact.ts but are not used here.
Add coverage for the new readiness and failure paths (test/detect-vllm-profile.test.ts:69): The PR replaces log-marker readiness detection with /v1/models polling and adds tolerant JSON parsing, container-exit handling, load timeouts, download heartbeats, and failure log tails, but the test change only updates profile-shape assertions after removing marker fields. The main new behavior and negative paths are unverified.
- Recommendation: Add focused tests or injectable seams for vllmEndpointReady()/waitForVllmReady() covering curl failure, empty output, malformed JSON, JSON without a data array, successful data-array responses, container exit before readiness, timeout, and failure log-tail behavior. Include runtime/integration validation for the Docker + vLLM installer path.
- Evidence: test/detect-vllm-profile.test.ts changes the removed ready/fatal marker assertion into a timeout-budget assertion, but no tests exercise the new curl readiness probe, wait loop, timeout handling, container-exit handling, or failure log printing in src/lib/inference/vllm.ts.
Source-of-truth review needed for tolerant readiness JSON parsing (src/lib/inference/vllm.ts:376): vllmEndpointReady() treats malformed non-empty /v1/models output as simply not ready by catching JSON.parse failures and returning false. That localized tolerance may be appropriate for transient startup responses, but the code does not establish what invalid state creates the malformed response, why the source cannot be fixed, what regression test protects the behavior, or when the workaround can be removed.
- Recommendation: Document or encode the expected invalid state at the source boundary, and add a regression test showing malformed readiness output is handled intentionally and can recover to a valid response. If the malformed response comes from a controllable source in this PR, prefer making that invalid state impossible instead of silently tolerating it.
- Evidence: vllmEndpointReady() parses curl output from `http://127.0.0.1:<VLLM_PORT>/v1/models` and catches JSON.parse errors with `catch { return false; }`, while the changed test file does not cover this path.
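As a sketch of the redaction finding, the filter below masks a few token shapes line by line before any `hf download` or `docker logs` output reaches the terminal. It is a hypothetical stand-in, not the API of `src/lib/security/redact.ts`. Its patterns are assumptions: `hf_` tokens of at least 20 characters, `Bearer` values, `user:password@` credentials in URLs, and `HF_TOKEN=`-style environment assignments.

```typescript
// Hypothetical line redactor; NOT the API of src/lib/security/redact.ts.
const REDACTION_RULES: Array<[RegExp, string]> = [
  // Hugging Face user access tokens start with "hf_".
  [/\bhf_[A-Za-z0-9]{20,}\b/g, "hf_[REDACTED]"],
  // Authorization header values such as "Bearer <token>".
  [/(Bearer\s+)[A-Za-z0-9._~+\/-]+=*/gi, "$1[REDACTED]"],
  // Credentials embedded in URLs, e.g. http://user:pass@proxy:3128
  [/(\b[a-z][a-z0-9+.-]*:\/\/)[^\s\/:@]+:[^\s\/@]+@/gi, "$1[REDACTED]@"],
  // Environment dumps such as HF_TOKEN=...
  [/\b(HF_TOKEN|HUGGING_FACE_HUB_TOKEN)=\S+/g, "$1=[REDACTED]"],
];

function redactLine(line: string): string {
  let out = line;
  for (const [pattern, replacement] of REDACTION_RULES) {
    out = out.replace(pattern, replacement);
  }
  return out;
}
```

Streamed chunks can split a token across two `data` events. A streaming caller would therefore need to buffer up to a newline or carriage return before calling `redactLine`, or a partial token could slip past the patterns.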

🌱 Nice ideas

None.


Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/lib/inference/vllm.ts (1)
243-325: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Heartbeat-only hf download can hang install forever.

Unlike pullImage(), this external download path has no watchdog or timeout at all. If docker run hf download ... wedges without exiting, the Promise never settles and the new heartbeat will print indefinitely. Please add a no-output stall cutoff and tear down the one-shot download container when it trips.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate.

In `@src/lib/inference/vllm.ts` around lines 243 - 325, Add a stall timeout that force-terminates the docker download if there’s no output for too long: introduce a STALL_CUTOFF_MS (e.g. a multiple of MODEL_DOWNLOAD_HEARTBEAT_MS or a configured constant), and in the existing heartbeat callback check if now - lastOutputAt >= STALL_CUTOFF_MS; if so, call proc.kill() (or proc.kill('SIGKILL') if needed), emit a descriptive message, and call done({ ok: false, reason: `hf download stalled (no output for ${STALL_CUTOFF_MS}ms)` }). Ensure you only kill if proc exists, rely on the existing done() guard to avoid double-resolve, and keep clearing the heartbeat interval as done() already does; reference symbols: dockerSpawn, proc, heartbeat, lastOutputAt, MODEL_DOWNLOAD_HEARTBEAT_MS, done.
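CodeRabbit's stall-cutoff suggestion can be kept as a small pure state machine, so it is unit-testable without Docker. `StallWatchdog` and the cutoff value are hypothetical. The heartbeat interval is passed in rather than assuming the value of `MODEL_DOWNLOAD_HEARTBEAT_MS`.

```typescript
// Hypothetical sketch of the suggested stall cutoff; names are illustrative.
type StallVerdict = "ok" | "heartbeat" | "stalled";

class StallWatchdog {
  private lastOutputAt: number;
  private lastHeartbeatAt: number;
  private readonly heartbeatMs: number;
  private readonly stallCutoffMs: number;

  constructor(startedAt: number, heartbeatMs: number, stallCutoffMs: number) {
    if (stallCutoffMs <= heartbeatMs) {
      throw new RangeError("stall cutoff should exceed the heartbeat interval");
    }
    this.lastOutputAt = startedAt;
    this.lastHeartbeatAt = startedAt;
    this.heartbeatMs = heartbeatMs;
    this.stallCutoffMs = stallCutoffMs;
  }

  // Call whenever the download process writes to stdout or stderr.
  noteOutput(now: number): void {
    this.lastOutputAt = now;
    this.lastHeartbeatAt = now;
  }

  // Call from the periodic timer; the caller tears down on "stalled".
  check(now: number): StallVerdict {
    if (now - this.lastOutputAt >= this.stallCutoffMs) {
      return "stalled";
    }
    if (now - this.lastHeartbeatAt >= this.heartbeatMs) {
      this.lastHeartbeatAt = now;
      return "heartbeat";
    }
    return "ok";
  }
}
```

In the download's existing heartbeat timer, a `"heartbeat"` verdict would print the still-running message. A `"stalled"` verdict would kill the process and resolve through the existing `done()` guard. `SIGKILL` sent to the `docker` CLI cannot be forwarded to the container, so an explicit `docker rm -f` on the one-shot container's name is probably also needed to tear it down.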

🧹 Nitpick comments (1)

test/detect-vllm-profile.test.ts (1)

75-75: 💤 Low value

Consider adding type: "nvidia" for consistency.

Other tests that retrieve the Spark profile explicitly include type: "nvidia" alongside spark: true (lines 24, 47, 52). For consistency and clarity, consider:

```diff
-    const spark = detectVllmProfile({ spark: true });
+    const spark = detectVllmProfile({ spark: true, type: "nvidia" });
```

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate.

In `@test/detect-vllm-profile.test.ts` at line 75, The test calling detectVllmProfile currently uses detectVllmProfile({ spark: true }) which is inconsistent with other tests; update the call to pass the explicit GPU type by using detectVllmProfile({ spark: true, type: "nvidia" }) so the Spark profile assertion matches the other cases and clarifies the intended GPU backend (refer to detectVllmProfile and the spark variable in this test).

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/lib/inference/vllm.ts`:
- Around line 524-525: The hard-coded ETA message emitted by emit("Launch can take 5-20 minutes; this is normal") under vllm startup should be derived from the profile's loadTimeoutSec (or made open-ended); update the code that calls emit (the emit invocations in this vllm launch flow) to compute a minutes value from profile.loadTimeoutSec (e.g., Math.ceil(profile.loadTimeoutSec / 60)) and emit a message that reflects that value (or emit a generic "Launch can take several minutes to up to X minutes" or "Launch can take several minutes; this can take longer for large models"), ensuring you reference profile.loadTimeoutSec and the existing emit call/site when making the change.
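A minimal sketch of the ETA suggestion, following the `Math.ceil(loadTimeoutSec / 60)` idea. `launchEtaMessage` is a hypothetical name, and the exact wording merged in commit 505dad7 may differ.

```typescript
// Hypothetical sketch: derive the launch ETA line from a load timeout
// instead of hard-coding "5-20 minutes".
function launchEtaMessage(loadTimeoutSec: number): string {
  if (!Number.isFinite(loadTimeoutSec) || loadTimeoutSec <= 0) {
    // No usable budget: fall back to an open-ended message.
    return "Launch can take several minutes; larger models can take longer.";
  }
  const maxMinutes = Math.ceil(loadTimeoutSec / 60);
  const unit = maxMinutes === 1 ? "minute" : "minutes";
  return `Launch can take up to ${maxMinutes} ${unit}; this is normal.`;
}
```

Rounding up with `Math.ceil` turns a budget such as 61 seconds into 2 minutes, so the message never understates how long the wait may last before the timeout fires.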



ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 07d2389c-03a9-4970-aa4b-b949ffa6beed

📥 Commits

Reviewing files that changed from the base of the PR and between 5002ff8 and ee05b87.

📒 Files selected for processing (2)

src/lib/inference/vllm.ts
test/detect-vllm-profile.test.ts


fix(inference): improve managed vLLM download and launch progress

ee05b87

Signed-off-by: zyang-dev <267119621+zyang-dev@users.noreply.github.com>

coderabbitai Bot reviewed Jun 2, 2026


Comment thread src/lib/inference/vllm.ts Outdated

fix(inference): derive vLLM launch ETA from load timeout

505dad7

Signed-off-by: zyang-dev <267119621+zyang-dev@users.noreply.github.com>

zyang-dev added the provider: vllm (vLLM local or hosted provider behavior), Platform: DGX Spark, and v0.0.57 (Release target) labels Jun 2, 2026

cv approved these changes Jun 2, 2026


cv merged commit 6240732 into main Jun 2, 2026
31 checks passed

cv deleted the fix/vllm-progress-report-improve branch June 2, 2026 23:26

wscurran added the platform: dgx-spark (Affects DGX Spark hardware or workflows) and platform: dgx-station (Affects DGX Station hardware or workflows) labels Jun 3, 2026

wangericnv mentioned this pull request Jun 3, 2026

[DGX Spark][Inference] PR #4619 vLLM/NVFP4 path crashes on container start with "CUDA unknown error" #4658

Closed

wscurran removed the Platform: DGX Spark label Jun 3, 2026

miyoungc mentioned this pull request Jun 3, 2026

docs: refresh 0.0.57 release docs #4716

Merged

coderabbitai Bot mentioned this pull request Jun 3, 2026

docs: document remaining 0.0.57 behavior changes #4717

Merged

10 tasks

wscurran added the bug-fix label (PR fixes a bug or regression) Jun 8, 2026

coderabbitai Bot mentioned this pull request Jun 9, 2026

feat(inference): add interactive managed-vLLM model picker #5038

Merged

12 tasks


3 participants
