feat: vllm engine tensor parallel and pipeline parallel#16
Merged
481975a to
c1e6e3b
Contributor
Test Results: 2 files, 2 suites, 52s ⏱️. Results for commit 04bca65. ♻️ This comment has been updated with latest results.
c1e6e3b to
c823645
rmccorm4
reviewed
Mar 4, 2025
Needs more testing but good enough for now. I get the same results with this as with `vllm serve`.
c823645 to
221a61e
rmccorm4
approved these changes
Mar 4, 2025
rmccorm4
left a comment
Contributor
Nice 🚀
So if I had 16 GPUs on 2x8GPU nodes, I could specify TP 16 on head node, omit TP on worker node, and the 8+8 split would be figured out internally?
Contributor
Author
With vllm you do TP 8, PP 2 for 2x8. World size is tp × pp (8 × 2 = 16 here). With sglang you do what you suggested: TP 16, nodes 2, and it divides TP by num_nodes. Those two projects are both very similar and very different at the same time.
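The arithmetic behind the two conventions in this reply can be sketched as follows. This is a minimal illustration, not either project's API: `ParallelLayout`, `vllm_style`, and `sglang_style` are hypothetical names, and the sketch assumes every node has the same number of GPUs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelLayout:
    """Hypothetical helper; not a vllm or sglang type."""

    tensor_parallel: int
    pipeline_parallel: int
    num_nodes: int

    @property
    def world_size(self) -> int:
        # Total ranks is the product of the two parallel degrees.
        return self.tensor_parallel * self.pipeline_parallel

    @property
    def gpus_per_node(self) -> int:
        if self.world_size % self.num_nodes != 0:
            raise ValueError("world size must divide evenly across nodes")
        return self.world_size // self.num_nodes


def vllm_style(gpus_per_node: int, num_nodes: int) -> ParallelLayout:
    # Convention described above for vllm: TP spans one node, PP spans nodes.
    return ParallelLayout(gpus_per_node, num_nodes, num_nodes)


def sglang_style(gpus_per_node: int, num_nodes: int) -> ParallelLayout:
    # Convention described above for sglang: TP covers all GPUs and is
    # divided by the node count internally.
    return ParallelLayout(gpus_per_node * num_nodes, 1, num_nodes)
```

For the 2x8 case both layouts cover 16 GPUs with 8 per node; only the way the flags are expressed differs.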
kylehh
pushed a commit
to kylehh/dynamo
that referenced
this pull request
Apr 11, 2025
Needs more testing but good enough for now. I get the same results with this as with `vllm serve`.
ShounakRay
added a commit
to ShounakRay/fuzzy-dynamo
that referenced
this pull request
Mar 20, 2026
…ers) LaTeX book covering transformer inference, KV caching, distributed serving, Dynamo architecture, KV-aware routing, token blocks, network codecs, fuzzing methodology, and 16 bugs found across 36 fuzz targets. All bug references use ch10 canonical numbering (ai-dynamo#1-ai-dynamo#16), all cross-references resolve, all arithmetic verified. Includes TikZ diagrams, whynotbox pedagogical environments, and marginnotes throughout. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ShounakRay
added a commit
to ShounakRay/fuzzy-dynamo
that referenced
this pull request
Mar 20, 2026
…iteups for 14 open bugs
- Reformat all 16 issue markdowns to match GitHub bug template (Describe the Bug,
Steps to Reproduce, Expected/Actual Behavior, Environment, Additional Context)
- Add upstream/ directory with per-bug subfolders containing issue.md, pr.md, and
discovery.md for each of the 16 bugs found via fuzzing
- Add fixes/ directory with annotated patch files and regression tests for all 14
open bugs
- Create 14 minimal fix branches (fix/*-{bug#}) off upstream/main, each with a
single commit touching only the affected file
- Verified all 14 bugs still present in upstream/main as of 50af343
- Confirmed 2 bugs already fixed upstream (ai-dynamo#15 RadixTree scoring, ai-dynamo#16 TwoPartCodec overflow)
ryanolson
added a commit
that referenced
this pull request
May 2, 2026
…(R-B Slice 3)
New `ConditionalDisaggCoordinator` in `disagg/coordinator/coordinator.rs` holds the unified per-request state map (`DashMap<String, Arc<CdRequest>>`) and ports the prefill flow off `PrefillCoordinatorImpl` against the canonical types defined in Slice 2.
Public API mirrors `PrefillCoordinatorImpl` so tests swap by constructor only:
- `new` / `new_with_watchdog`
- `observer` / `observer_callback`
- `active_count` / `status_for` / `has_active_request`
- `commit_output_blocks`
- `cleanup_failed_request` (single canonical sink; uses `CdRequest::failed_g1_block_ids()` from Slice 2)
- `PrefillCoordinator` trait impl (`ensure_started`, `on_usaa`, `observe_forward`, `on_request_finished`)
Lifecycle watcher uses `spawn_lifecycle_watcher` (the helper from Slice 1). `on_request_finished`'s observer-residual finalize-deferral is preserved verbatim, including the 10s watchdog that prevents the `commit_output_blocks` ↔ `session.finalize()` race (learning #16).
Decode-role bits are present on `CdRequest` but unreachable through this slice's code paths; Slice 4 wires them.
Coexistence (Slice 3 only): `PrefillCoordinatorImpl` stays alive because production wiring (`prefill_leader.rs`, `init.rs`, `unified.rs`) still references it; `cd_loopback`, `cd_decode_e2e`, `cd_bidirectional_e2e`, and the unified suites also still construct it. Slice 5 cuts production over; Slice 7 deletes it.
Deviation: `ConditionalDecodeG2Observer::new()` widened from private to `pub` so the new coordinator (sibling module) can construct one. No behavior change.
Tests:
- cd_prefill_e2e: harness swapped to ConditionalDisaggCoordinator; all 11 tests pass against the new coord.
- lib unit: 197 → 198 (new `size_of_is_nonzero` constructor probe).
- All other suites unchanged.
- Total CD-scope: 229 active tests passing.
kangclzjc
added a commit
to kangclzjc/dynamo
that referenced
this pull request
Jun 4, 2026
/ai-dynamo#8/ai-dynamo#15/ai-dynamo#16)
Test-only additions for the seams the review flagged as untested.
ai-dynamo#7 _project_scale_to with a real apply outcome (4 cases): both components changed → full ScalingDecision; both equal current → None (PSM-equivalent no-change); single-component proposal → other count stays None; non-apply execute_action → None. Previously every adapter.tick test hit only the None/empty path, so a regression in the projection / no-change detection would have shipped silently.
ai-dynamo#8 _tick_input_to_context + FPM encoding: build a TickInput with traffic (incl kv_hit_rate), worker counts (incl scaling-in-progress flags), and a real ForwardPassMetrics; assert the PipelineContext.observations mapping and that the FPM bytes decode back (key format "<worker_id>/<dp_rank>", canonical encoder). This is the ingress glue where the add_observations P1 + the projection live.
ai-dynamo#15 registry mutation during an in-flight tick: suspend a PROPOSE plugin mid-gather (asyncio.Event), register a new plugin while suspended, release, and assert the late plugin did NOT join the in-flight stage (pre-tick snapshot) and the tick completed cleanly; then a fresh tick picks it up. Exercises the no-locks invariant that scheduler.py/server.py document but no test covered.
ai-dynamo#16 test_tick_diagnostics_extended scope note: clarify in the module docstring that plugin_overrides / reconcile_reasons / held_over_plugins have no production populator in this PR; these tests lock the dataclass contract (defaults / no shared-mutable aliasing), not live behavior.
835 planner tests pass (+6).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
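The registry-mutation test described in this commit message relies on a pre-tick snapshot pattern, which might look roughly like the sketch below. Everything here is an assumption for illustration: `Registry`, `tick`, and the plugin coroutines are hypothetical stand-ins, not the planner's actual classes or test code.

```python
import asyncio


class Registry:
    """Hypothetical plugin registry; not the planner's real class."""

    def __init__(self):
        self._plugins = []

    def register(self, plugin):
        self._plugins.append(plugin)

    def snapshot(self):
        # Immutable copy: later registrations cannot affect a running tick.
        return tuple(self._plugins)


async def tick(registry):
    plugins = registry.snapshot()  # taken once, before any plugin runs
    return list(await asyncio.gather(*(p() for p in plugins)))


async def demo():
    registry = Registry()
    started = asyncio.Event()
    release = asyncio.Event()

    async def slow_plugin():
        started.set()
        await release.wait()  # suspended mid-gather until released
        return "slow"

    async def late_plugin():
        return "late"

    registry.register(slow_plugin)
    in_flight = asyncio.create_task(tick(registry))
    await started.wait()            # the tick is now in flight
    registry.register(late_plugin)  # mutate the registry while suspended
    release.set()
    first = await in_flight         # late plugin did not join this tick
    second = await tick(registry)   # a fresh tick picks it up
    return first, second
```

Because the snapshot is a tuple taken before the gather starts, no lock is needed in this single-threaded asyncio setting.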