Skip to content

feat: expose KV routing components for easier router customization#15

Merged
nnshah1 merged 13 commits into
mainfrom
gluo/kv_breaker
Mar 6, 2025
Merged

feat: expose KV routing components for easier router customization#15
nnshah1 merged 13 commits into
mainfrom
gluo/kv_breaker

Conversation

@GuanLuo

@GuanLuo GuanLuo commented Mar 4, 2025

Copy link
Copy Markdown
Contributor

What does the PR do?

Checklist

  • PR title reflects the change and is of format <commit_type>: <Title>
  • Changes are described in the pull request.
  • Related issues are referenced.
  • Populated github labels field
  • Added test plan and verified test passes.
  • Verified that the PR passes existing CI.
  • Verified copyright is correct on all changed files.
  • Added succinct git squash message before merging ref.
  • All template sections are filled out.
  • Optional: Additional screenshots for behavior/output changes with before/after.

Commit Type:

Check the conventional commit type
box here and add the label to the github PR.

  • build
  • ci
  • docs
  • feat
  • fix
  • perf
  • refactor
  • revert
  • style
  • test

Related PRs:

Where should the reviewer start?

Test plan:

  • CI Pipeline ID:

Caveats:

Background

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

@github-actions

github-actions Bot commented Mar 4, 2025

Copy link
Copy Markdown
Contributor

Test Results

  2 files    2 suites   55s ⏱️
 80 tests  80 ✅ 0 💤 0 ❌
102 runs  101 ✅ 1 💤 0 ❌

Results for commit 51e3267.

♻️ This comment has been updated with latest results.

Comment thread lib/llm/src/kv_router/metrics_aggregator.rs Outdated
Comment thread examples/python_rs/llm/vllm/kv_router/router.py Outdated
Comment thread lib/bindings/python/rust/llm/kv.rs Outdated
Comment thread lib/bindings/python/rust/llm/kv.rs Outdated
Comment thread lib/bindings/python/rust/llm/kv.rs Outdated
Comment thread examples/python_rs/llm/vllm/kv_router/router.py Outdated
Comment thread lib/bindings/python/src/triton_distributed/_core.pyi Outdated
Comment thread lib/bindings/python/src/triton_distributed/_core.pyi Outdated
Comment thread lib/bindings/python/rust/llm/kv.rs Outdated
Comment thread lib/llm/src/kv_router/metrics_aggregator.rs Outdated
Comment thread lib/llm/src/kv_router/metrics_aggregator.rs Outdated
Comment thread lib/llm/src/kv_router/metrics_aggregator.rs Outdated
Comment thread lib/llm/src/kv_router/metrics_aggregator.rs Outdated

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be a warning?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tracing doesn't have warning level

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment thread lib/llm/src/kv_router/metrics_aggregator.rs Outdated
Comment thread lib/llm/src/kv_router/metrics_aggregator.rs Outdated
Comment thread lib/llm/src/kv_router/metrics_aggregator.rs Outdated

@alec-flowers alec-flowers left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a few comments but otherwise looks good!

@GuanLuo GuanLuo enabled auto-merge (squash) March 5, 2025 23:49
@nnshah1 nnshah1 disabled auto-merge March 6, 2025 00:06
tokio::spawn(async move {
while let Some(event) = kv_events_rx.next().await {
let event: llm_rs::kv_router::indexer::RouterEvent =
serde_json::from_slice(&event.payload).unwrap();

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should avoid unwrap() - this will panic and crash if deserialization of the event fails.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@GuanLuo , @alec-flowers - can you fix?

@nnshah1 nnshah1 merged commit dd3fb77 into main Mar 6, 2025
@nnshah1 nnshah1 deleted the gluo/kv_breaker branch March 6, 2025 00:06
.map(|s| Endpoint {
name: s.name,
subject: s.subject,
data: s.data.unwrap(),

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same with unwrap() here, though I don't know the likelihood these things failing without further context

ShounakRay added a commit to ShounakRay/fuzzy-dynamo that referenced this pull request Mar 20, 2026
…iteups for 14 open bugs

- Reformat all 16 issue markdowns to match GitHub bug template (Describe the Bug,
  Steps to Reproduce, Expected/Actual Behavior, Environment, Additional Context)
- Add upstream/ directory with per-bug subfolders containing issue.md, pr.md, and
  discovery.md for each of the 16 bugs found via fuzzing
- Add fixes/ directory with annotated patch files and regression tests for all 14
  open bugs
- Create 14 minimal fix branches (fix/*-{bug#}) off upstream/main, each with a
  single commit touching only the affected file
- Verified all 14 bugs still present in upstream/main as of 50af343
- Confirmed 2 bugs already fixed upstream (ai-dynamo#15 RadixTree scoring, ai-dynamo#16 TwoPartCodec overflow)
galletas1712 added a commit that referenced this pull request Apr 23, 2026
The fork's devel stage in docker/Dockerfile.multi runs
scripts/generate_container_oss_attribution.sh and treats TRT_LLM_VER as a
required TAG argument. The docker/Makefile %_build rule derives
TRT_LLM_VERSION from tensorrt_llm/version.py::__version__ and forwards it
via --build-arg TRT_LLM_VER=..., but our direct buildx invocation was
missing that, so the devel stage failed with:

  #15 0.092 ERROR: Missing required parameter TAG
  failed to solve: process "... /tmp/gen_attribution.sh 'devel' ${TRT_LLM_VER} ${TARGETARCH}"
  did not complete successfully: exit code: 1

Mirror the Makefile: parse __version__ out of tensorrt_llm/version.py
after we append the commit suffix, and pass it as --build-arg TRT_LLM_VER.

Signed-off-by: Schwinn Saereesitthipitak <schwinns@nvidia.com>
kangclzjc added a commit to kangclzjc/dynamo that referenced this pull request Jun 4, 2026
/ai-dynamo#8/ai-dynamo#15/ai-dynamo#16)

Test-only additions for the seams the review flagged as untested.

ai-dynamo#7 _project_scale_to with a real apply outcome (4 cases):
  both components changed → full ScalingDecision; both equal current →
  None (PSM-equivalent no-change); single-component proposal → other count
  stays None; non-apply execute_action → None. Previously every adapter.tick
  test hit only the None/empty path, so a regression in the projection /
  no-change detection would have shipped silently.

ai-dynamo#8 _tick_input_to_context + FPM encoding:
  build a TickInput with traffic (incl kv_hit_rate), worker counts (incl
  scaling-in-progress flags), and a real ForwardPassMetrics; assert the
  PipelineContext.observations mapping and that the FPM bytes decode back
  (key format "<worker_id>/<dp_rank>", canonical encoder). This is the
  ingress glue where the add_observations P1 + the projection live.

ai-dynamo#15 registry mutation during an in-flight tick:
  suspend a PROPOSE plugin mid-gather (asyncio.Event), register a new
  plugin while suspended, release, and assert the late plugin did NOT join
  the in-flight stage (pre-tick snapshot) and the tick completed cleanly —
  then a fresh tick picks it up. Exercises the no-locks invariant that
  scheduler.py/server.py document but no test covered.

ai-dynamo#16 test_tick_diagnostics_extended scope note:
  clarify in the module docstring that plugin_overrides / reconcile_reasons
  / held_over_plugins have no production populator in this PR; these tests
  lock the dataclass contract (defaults / no shared-mutable aliasing), not
  live behavior.

835 planner tests pass (+6).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants