feat: vllm engine tensor parallel and pipeline parallel by grahamking · Pull Request #16 · ai-dynamo/dynamo

grahamking · 2025-03-04T21:19:42Z

Needs more testing but good enough for now. I get the same results with this as with `vllm serve`.

github-actions · 2025-03-04T21:49:37Z

Test Results

2 files 2 suites 52s ⏱️
77 tests 77 ✅ 0 💤 0 ❌
99 runs 98 ✅ 1 💤 0 ❌

Results for commit 04bca65.

♻️ This comment has been updated with latest results.

biswapanda

Lgtm

rmccorm4

Nice 🚀

So if I had 16 GPUs on 2x8GPU nodes, I could specify TP 16 on head node, omit TP on worker node, and the 8+8 split would be figured out internally?

grahamking · 2025-03-04T22:47:12Z

> Nice 🚀
>
> So if I had 16 GPUs on 2x8GPU nodes, I could specify TP 16 on head node, omit TP on worker node, and the 8+8 split would be figured out internally?

With vllm you do TP 8, PP 2 for 2x8. World size is TP × PP (8 × 2 = 16).

https://github.com/vllm-project/vllm/blob/ae122b1cbde96c871fb74611363e04eecfbcce03/docs/source/serving/distributed_serving.md#running-vllm-on-multiple-nodes

With sglang you do what you suggested, TP 16, nodes 2 and it divides TP by num_nodes. Those two projects are both very similar and very different at the same time.
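The two conventions described above can be sketched in a few lines of arithmetic. This is an illustrative sketch only: the function names `vllm_layout` and `sglang_layout` are hypothetical and are not part of vLLM, sglang, or this PR. It assumes the vLLM world size is the product of the tensor-parallel and pipeline-parallel sizes, and that sglang divides the global TP size evenly across nodes, as the comment says.

```python
def vllm_layout(tensor_parallel: int, pipeline_parallel: int) -> dict:
    """vLLM-style: TP covers the GPUs within a node, PP covers the nodes.

    Assumption: total ranks (world size) = TP * PP.
    """
    return {
        "tensor_parallel": tensor_parallel,
        "pipeline_parallel": pipeline_parallel,
        "world_size": tensor_parallel * pipeline_parallel,
    }


def sglang_layout(tensor_parallel: int, num_nodes: int) -> dict:
    """sglang-style: TP is the global GPU count, split evenly across nodes."""
    if tensor_parallel % num_nodes != 0:
        raise ValueError("TP size must be divisible by the number of nodes")
    return {
        "tensor_parallel": tensor_parallel,
        "num_nodes": num_nodes,
        "gpus_per_node": tensor_parallel // num_nodes,
    }


# The 2x8 GPU case from the question, expressed both ways.
vllm_2x8 = vllm_layout(tensor_parallel=8, pipeline_parallel=2)
sglang_2x8 = sglang_layout(tensor_parallel=16, num_nodes=2)
print(vllm_2x8["world_size"], sglang_2x8["gpus_per_node"])
```

Both layouts use all 16 GPUs; the difference is whether the cross-node split is expressed as pipeline stages (vLLM) or derived from a single global TP value (sglang).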

…ers) LaTeX book covering transformer inference, KV caching, distributed serving, Dynamo architecture, KV-aware routing, token blocks, network codecs, fuzzing methodology, and 16 bugs found across 36 fuzz targets. All bug references use ch10 canonical numbering (ai-dynamo#1-ai-dynamo#16), all cross-references resolve, all arithmetic verified. Includes TikZ diagrams, whynotbox pedagogical environments, and marginnotes throughout.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…iteups for 14 open bugs

- Reformat all 16 issue markdowns to match GitHub bug template (Describe the Bug, Steps to Reproduce, Expected/Actual Behavior, Environment, Additional Context)
- Add upstream/ directory with per-bug subfolders containing issue.md, pr.md, and discovery.md for each of the 16 bugs found via fuzzing
- Add fixes/ directory with annotated patch files and regression tests for all 14 open bugs
- Create 14 minimal fix branches (fix/*-{bug#}) off upstream/main, each with a single commit touching only the affected file
- Verified all 14 bugs still present in upstream/main as of 50af343
- Confirmed 2 bugs already fixed upstream (ai-dynamo#15 RadixTree scoring, ai-dynamo#16 TwoPartCodec overflow)

…(R-B Slice 3)

New `ConditionalDisaggCoordinator` in `disagg/coordinator/coordinator.rs` holds the unified per-request state map (`DashMap<String, Arc<CdRequest>>`) and ports the prefill flow off `PrefillCoordinatorImpl` against the canonical types defined in Slice 2.

Public API mirrors `PrefillCoordinatorImpl` so tests swap by constructor only:

- `new` / `new_with_watchdog`
- `observer` / `observer_callback`
- `active_count` / `status_for` / `has_active_request`
- `commit_output_blocks`
- `cleanup_failed_request` (single canonical sink; uses `CdRequest::failed_g1_block_ids()` from Slice 2)
- `PrefillCoordinator` trait impl (`ensure_started`, `on_usaa`, `observe_forward`, `on_request_finished`)

Lifecycle watcher uses `spawn_lifecycle_watcher` (the helper from Slice 1). `on_request_finished`'s observer-residual finalize-deferral is preserved verbatim, including the 10s watchdog that prevents the `commit_output_blocks` ↔ `session.finalize()` race (learning #16). Decode-role bits are present on `CdRequest` but unreachable through this slice's code paths; Slice 4 wires them.

Coexistence (Slice 3 only): `PrefillCoordinatorImpl` stays alive because production wiring (`prefill_leader.rs`, `init.rs`, `unified.rs`) still references it; `cd_loopback`, `cd_decode_e2e`, `cd_bidirectional_e2e`, and the unified suites also still construct it. Slice 5 cuts production over; Slice 7 deletes it.

Deviation: `ConditionalDecodeG2Observer::new()` widened from private to `pub` so the new coordinator (sibling module) can construct one. No behavior change.

Tests:

- cd_prefill_e2e: harness swapped to ConditionalDisaggCoordinator; all 11 tests pass against the new coord.
- lib unit: 197 → 198 (new `size_of_is_nonzero` constructor probe).
- All other suites unchanged.
- Total CD-scope: 229 active tests passing.

/ai-dynamo#8/ai-dynamo#15/ai-dynamo#16)

Test-only additions for the seams the review flagged as untested.

ai-dynamo#7 _project_scale_to with a real apply outcome (4 cases): both components changed → full ScalingDecision; both equal current → None (PSM-equivalent no-change); single-component proposal → other count stays None; non-apply execute_action → None. Previously every adapter.tick test hit only the None/empty path, so a regression in the projection / no-change detection would have shipped silently.

ai-dynamo#8 _tick_input_to_context + FPM encoding: build a TickInput with traffic (incl kv_hit_rate), worker counts (incl scaling-in-progress flags), and a real ForwardPassMetrics; assert the PipelineContext.observations mapping and that the FPM bytes decode back (key format "<worker_id>/<dp_rank>", canonical encoder). This is the ingress glue where the add_observations P1 + the projection live.

ai-dynamo#15 registry mutation during an in-flight tick: suspend a PROPOSE plugin mid-gather (asyncio.Event), register a new plugin while suspended, release, and assert the late plugin did NOT join the in-flight stage (pre-tick snapshot) and the tick completed cleanly; then a fresh tick picks it up. Exercises the no-locks invariant that scheduler.py/server.py document but no test covered.

ai-dynamo#16 test_tick_diagnostics_extended scope note: clarify in the module docstring that plugin_overrides / reconcile_reasons / held_over_plugins have no production populator in this PR; these tests lock the dataclass contract (defaults / no shared-mutable aliasing), not live behavior.

835 planner tests pass (+6).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

grahamking requested review from GuanLuo, biswapanda, paulhendricks and rmccorm4 as code owners March 4, 2025 21:19

grahamking temporarily deployed to GITLAB March 4, 2025 21:19 — with GitHub Actions Inactive

grahamking temporarily deployed to GITLAB March 4, 2025 21:20 — with GitHub Actions Inactive

grahamking force-pushed the gk-vllm-multi branch from 481975a to c1e6e3b Compare March 4, 2025 21:22

grahamking temporarily deployed to GITLAB March 4, 2025 21:22 — with GitHub Actions Inactive

grahamking temporarily deployed to GITLAB March 4, 2025 21:25 — with GitHub Actions Inactive

grahamking added the enhancement New feature or request label Mar 4, 2025

grahamking self-assigned this Mar 4, 2025

grahamking force-pushed the gk-vllm-multi branch from c1e6e3b to c823645 Compare March 4, 2025 21:50

grahamking temporarily deployed to GITLAB March 4, 2025 21:50 — with GitHub Actions Inactive

rmccorm4 reviewed Mar 4, 2025

Comment thread launch/tio/README.md Outdated

grahamking temporarily deployed to GITLAB March 4, 2025 21:58 — with GitHub Actions Inactive

grahamking added 2 commits March 4, 2025 17:24

feat: vllm engine tensor parallel and pipeline parallel

8e0f3f7

Needs more testing but good enough for now. I get the same results with this as with `vllm serve`.

Update vllm version in README

221a61e

grahamking force-pushed the gk-vllm-multi branch from c823645 to 221a61e Compare March 4, 2025 22:25

grahamking temporarily deployed to GITLAB March 4, 2025 22:25 — with GitHub Actions Inactive

biswapanda approved these changes Mar 4, 2025

grahamking temporarily deployed to GITLAB March 4, 2025 22:38 — with GitHub Actions Inactive

rmccorm4 approved these changes Mar 4, 2025

Merge branch 'main' into gk-vllm-multi

04bca65

grahamking temporarily deployed to GITLAB March 4, 2025 22:52 — with GitHub Actions Inactive

grahamking temporarily deployed to GITLAB March 4, 2025 22:57 — with GitHub Actions Inactive

grahamking temporarily deployed to GITLAB March 4, 2025 23:11 — with GitHub Actions Inactive

grahamking enabled auto-merge (squash) March 4, 2025 23:15

grahamking merged commit e64935b into main Mar 4, 2025

grahamking deleted the gk-vllm-multi branch March 4, 2025 23:47

kylehh pushed a commit to kylehh/dynamo that referenced this pull request Apr 11, 2025

feat: vllm engine tensor parallel and pipeline parallel (ai-dynamo#16)

a657ec6

Needs more testing but good enough for now. I get the same results with this as with `vllm serve`.

grahamking merged 3 commits into main from gk-vllm-multi

3 participants
