feat(sglang): MM-aware KV routing via pad_value substitution by krishung5 · Pull Request #9561 · ai-dynamo/dynamo

krishung5 · 2026-05-14T17:21:41Z

Summary

Adds MM-aware KV routing support on the SGLang backend, mirroring the existing vLLM path. The Rust frontend substitutes per-image pad_values into the routing-side token stream so the routing hash matches what SGLang publishes in BlockStored events byte-for-byte. Backend identity is auto-detected from the worker's ModelDeploymentCard — no deployer-side env var required.

Closes DIS-1933

Companion PRs

Upstream SGLang: sgl-project/sglang#25300 adds the optional mm_hashes field on GenerateReqInput that this PR's pad_value substitution relies on. Without that upstream PR, MM-aware routing silently degrades to text-prefix overlap.
PR ci(mm-router): parallel worker boot + pre_merge gater for MM routing #9542 (vLLM-side strong-gating) — overlaps with one commit in this PR (test infra in tests/utils/payloads.py and tests/utils/multimodal.py). Whichever lands first rebases the other cleanly.

What this PR adds

| Commit | Purpose |
| --- | --- |
| MM-routing rewrite for BPE-shatterable numbered placeholders | Lightseek init: handle tokenizers that BPE-shatter `<img_1>` etc. |
| Qwen2-VL / Qwen2.5-VL fix | Resolve dual-token placeholder via tokenizer-loader |
| rustfmt | Style only |
| pad_value mode in Rust frontend | Routing-token substitution switches to pad-value-based for sglang workers, matching sglang's `BlockStored` token_ids |
| SGLang MM-routing glue + test scaffolding | `decode_handler.py` signature-probes `mm_hashes` kwarg; new sglang `agg_multimodal_router.sh` (parallel worker boot); sglang test profile matrix covering 6 VLMs |
| Strong-gate sglang test profiles | Opt all 6 profiles into `require_lightseek_init` + `min_routing_total_blocks=10`; fails closed when MM-aware routing silently degrades |
| 16-char hex `mm_hashes` for sglang | vLLM's 64-char-padded form would collapse sglang's per-block `pad_value` to a constant; sglang gets the bare 16-hex u64 |
| Auto-detect backend via MDC | Workers advertise `backend_framework: "sglang" \| "vllm"` in `ModelRuntimeConfig` at registration; frontend reads it instead of an env var. Drops `DYN_MM_ROUTING_BACKEND` from the SGLang launcher + docs |

Test coverage

pre_merge: Qwen3-VL-2B-Instruct on agg_router
post_merge: Qwen2.5-VL-3B, Qwen2-VL-2B, Phi-3-vision-128k, LLaVA-1.5-7b, LLaVA-NeXT-mistral-7b (all on agg_router)

All 6 profiles use strong-gating (lightseek init log + 10-block routing threshold) so a silent regression to text-prefix-only routing fails the test rather than passing on text-prefix luck.

Test plan

- pytest tests/serve/test_sglang.py -k agg_router post-merge green
- Local 1-GPU Qwen3-VL-2B smoke on mm-sglang-dev container
- Upstream sglang companion PR merged before this lands (or at least the sglang image built from that branch; we control our image)

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

New Features
- Added multimodal-aware routing support for SGLang backend clusters, enabling efficient processing of multimodal requests.
Tests
- Added comprehensive test profiles for multimodal vision-language models with SGLang backend (Qwen3-VL-2B, Qwen2.5-VL-3B, Qwen2-VL-2B).
- Enhanced test infrastructure for multimodal request routing and KV-cache optimization validation.

When DYN_MM_ROUTING_BACKEND=sglang, the Rust frontend substitutes per-image pad_value = MM_PAD_SHIFT_VALUE + (mm_hash % 2^30) at image positions in routing_token_ids and emits no block_mm_infos. This makes the routing-side hash byte-identical to sglang's BlockStored token_ids (which inline pad_value and carry no extra_keys), so external KV-aware routing works without any sglang event-protocol change; only the minimal C1 mm_hashes interop hook is needed.

Default behavior (env unset) is unchanged: canonical image_token_id with mm_hashes appended via block_mm_infos (vLLM-compatible protocol).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
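A minimal Python sketch of the substitution described in this commit message, for illustration only (the real implementation lives in the Rust preprocessor). The numeric value of `MM_PAD_SHIFT_VALUE`, the helper names, and the rule that each contiguous run of the image placeholder token corresponds to one image are all assumptions, not taken from the PR.

```python
from typing import List

# Assumption: stands in for SGLang's constant of the same name; the numeric
# value here is illustrative and should be checked against the SGLang build.
MM_PAD_SHIFT_VALUE = 1_000_000


def mm_pad_value(mm_hash: int) -> int:
    """pad_value = MM_PAD_SHIFT_VALUE + (mm_hash % 2^30), as stated in the commit."""
    return MM_PAD_SHIFT_VALUE + (mm_hash % (1 << 30))


def substitute_pad_values(
    token_ids: List[int], image_token_id: int, mm_hashes: List[int]
) -> List[int]:
    """Replace each run of image placeholder tokens with that image's pad_value.

    Hypothetical helper: assumes the i-th contiguous run of `image_token_id`
    belongs to the i-th entry of `mm_hashes`.
    """
    out: List[int] = []
    image_idx = -1
    in_run = False
    for tok in token_ids:
        if tok == image_token_id:
            if not in_run:
                image_idx += 1
                in_run = True
            if image_idx >= len(mm_hashes):
                raise ValueError("more image placeholder runs than mm_hashes")
            out.append(mm_pad_value(mm_hashes[image_idx]))
        else:
            in_run = False
            out.append(tok)
    return out
```

Because the pad value depends only on the image hash, two requests that carry the same image produce identical routing tokens at the image positions, while different images diverge there.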

Pieces required for end-to-end MM-aware routing with SGLang:

- decode_handler.py: signature-probe SGLang's async_generate for the mm_hashes kwarg and forward extra_args["mm_hashes"] from the Rust frontend when supported. Older SGLang builds without the interop hook drop the kwarg and fall back to text-prefix MM routing transparently.
- agg_multimodal_router.sh: 2-worker aggregated MM router launch script with xdist-aware port allocation.
- multimodal_profiles/sglang.py: profile registry covering Qwen3-VL-2B, Qwen2.5-VL-3B, Qwen2-VL-2B, Phi-3-vision, LLaVA-1.5-7B, LLaVA-NeXT-mistral-7B. Each profile uses make_image_payload_cached_tokens to validate the second request hits the cache populated by the first.
- test_sglang.py + tests/utils/multimodal.py: wire SGLang into the shared multimodal test harness (image_server fixture, profile expansion).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
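The signature probe mentioned for decode_handler.py can be sketched roughly as follows. This is not the handler's actual code: `supports_kwarg` and `build_generate_kwargs` are hypothetical helper names, and the sketch treats only an explicitly named parameter as support.

```python
import inspect
from typing import Any, Callable, Dict, Optional


def supports_kwarg(fn: Callable[..., Any], name: str) -> bool:
    """Return True if `fn` declares a parameter called `name`."""
    try:
        params = inspect.signature(fn).parameters
    except (TypeError, ValueError):
        # Some callables expose no introspectable signature; treat as unsupported.
        return False
    return name in params


def build_generate_kwargs(
    extra_args: Optional[Dict[str, Any]], mm_hashes_supported: bool
) -> Dict[str, Any]:
    """Forward mm_hashes only when the engine accepts it and the list is non-empty."""
    kwargs: Dict[str, Any] = {}
    mm_hashes = (extra_args or {}).get("mm_hashes")
    if mm_hashes_supported and isinstance(mm_hashes, list) and mm_hashes:
        kwargs["mm_hashes"] = mm_hashes
    return kwargs


# Stand-ins for an engine with and without the interop hook (hypothetical).
async def generate_with_hook(input_ids=None, mm_hashes=None):
    return input_ids


async def generate_without_hook(input_ids=None):
    return input_ids
```

Probing once at handler init and caching the boolean keeps the per-request path to a dictionary lookup, which matches the "probes at init" behavior described in the review walkthrough.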

…ting-blocks

The base CachedTokensChatPayload (cached_tokens >= 1 on repeat) can pass when MM-aware routing has silently degraded to text-prefix overlap: with two workers and identical text bodies, the fallback router will frequently land both requests on the same warm worker by luck, and prompt-prefix reuse alone produces cached_tokens > 0. Real silent regressions in the lightseek path were caught only by reading the frontend logs by hand.

Extend CachedTokensChatPayload with two opt-in flags that auto-populate the existing expected_log regex contract:

* require_lightseek_init=True: assert the per-model "MM-aware KV routing enabled (lightseek)" INFO log fired, proving init succeeded for this model+placeholder spec.
* min_routing_total_blocks=N: assert at least one "[ROUTING] ... with X/M blocks overlap" log line has M >= N. Real MM-aware routing inflates the block-count by 30-150x over text-only (image placeholders -> ~14 tokens/block each); N=10 reliably distinguishes the two modes for all profiled models without false positives.

All 6 sglang agg_router profiles (Qwen3-VL-2B pre_merge + Qwen2.5-VL-3B / Qwen2-VL-2B / Phi-3-vision / LLaVA-1.5 / LLaVA-NeXT post_merge) opt into both flags.

Note: this overlaps verbatim with PR #9542 (vLLM-side same strong-gating). Whichever lands first rebases the other cleanly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
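The two gates can be illustrated with a small log check. This is a sketch under assumptions: the commit elides the middle of the `[ROUTING]` line, so the regex below only anchors on the parts quoted above, and `strong_gate_passes` is a hypothetical helper rather than the harness's real API (which expresses the gates as `expected_log` regexes).

```python
import re

# Anchors only on the fragments quoted in the commit message; the text between
# "[ROUTING]" and "with" is elided in the source and matched loosely here.
ROUTING_RE = re.compile(r"\[ROUTING\].*?with (\d+)/(\d+) blocks overlap")
LIGHTSEEK_INIT = "MM-aware KV routing enabled (lightseek)"


def strong_gate_passes(log_text: str, min_routing_total_blocks: int = 10) -> bool:
    """True only if lightseek init was logged and some routing line has M >= N."""
    if LIGHTSEEK_INIT not in log_text:
        return False
    totals = [int(m.group(2)) for m in ROUTING_RE.finditer(log_text)]
    return any(total >= min_routing_total_blocks for total in totals)
```

A text-prefix-only run would typically log a small total block count, so it fails the second gate even when `cached_tokens` happens to be positive.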

Fixture bypasses __init__ via __new__(), so the new attribute set in DecodeWorkerHandler.__init__ must be set manually. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

SGLang's supported-models docs list only Qwen-family VLMs for multimodal; phi-3 fails upstream with a rope_scaling shape error and llava variants need GPU 2x for sizing. Keep Qwen3-VL-2B (pre_merge) + Qwen2.5-VL-3B / Qwen2-VL-2B (post_merge) only. Also drops unused make_image_payload import (ruff F401). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai · 2026-05-15T01:41:57Z

Walkthrough

This PR enables multimodal hash-aware routing for SGLang by adding MM hash forwarding in the decode handler, using backend-specific token padding in preprocessing, extending test validation infrastructure, and registering multimodal profiles with a complete launcher script.

Changes

Multimodal Hash-Aware Routing for SGLang

| Layer / File(s) | Summary |
| --- | --- |
| Decode handler MM hash detection and forwarding `components/src/dynamo/sglang/request_handlers/llm/decode_handler.py`, `components/src/dynamo/sglang/tests/test_sglang_frontend_decoding.py` | DecodeWorkerHandler probes `engine.async_generate` for `mm_hashes` kwarg support at init, extracts MM hashes from per-request `extra_args`, and conditionally forwards them during aggregated generation. Test helper initializes the probe flag for frontend-decoding tests. |
| Preprocessor SGLang MM routing mode `lib/llm/src/preprocessor.rs` | When `DYN_MM_ROUTING_BACKEND` is `sglang`, preprocessor replaces repeated image tokens with computed pad values that encode MM hash bits, and skips block-level MM object derivation since routing identity is carried in the pad value. |
| Test payload MM routing assertions `tests/utils/payloads.py`, `tests/utils/multimodal.py` | CachedTokensChatPayload accepts `require_lightseek_init` and `min_routing_total_blocks` flags; test factory functions thread these parameters and TopologyConfig gains `requested_sglang_kv_tokens` for pytest mark injection. |
| Multimodal SGLang profile registry `tests/serve/multimodal_profiles/sglang.py` | Registers three Qwen VLM models (Qwen3-VL-2B, Qwen2.5-VL-3B, Qwen2-VL-2B) with agg_router topology, per-model timeout/VRAM limits, KV token constraints, single-GPU packing, and cached-token payloads configured for lightseek initialization and routing overlap validation. |
| Test suite multimodal config wiring `tests/serve/test_sglang.py` | Imports multimodal profiles and topology scripts; generates SGLangConfig instances from SGLANG_MULTIMODAL_PROFILES via `make_multimodal_configs`, merges into sglang_configs, and adds `image_server` fixture to test_sglang_deployment. |
| MM-aware aggregated router launcher `examples/backends/sglang/launch/agg_multimodal_router.sh` | New bash script orchestrates multi-worker SGLang cluster: parses CLI args, launches workers with per-worker system ports and KV-events TCP endpoints, waits for `/health` readiness, starts frontend router with HTTP routing and KV mode, polls chat endpoint until ready, and blocks on service termination. |

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately captures the main feature: MM-aware KV routing support on SGLang via pad_value substitution. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description check | ✅ Passed | The PR description is comprehensive and well-structured, covering overview, changes, related issues, companion PRs, and test coverage with clear details. |

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (3)

lib/llm/src/preprocessor.rs (1)
1020-1022: ⚡ Quick win

Move environment variable read to startup.

The DYN_MM_ROUTING_BACKEND environment variable is read and parsed on every call to gather_mm_exact_routing_info (i.e., once per multimodal request). For high-throughput workloads, this is wasteful. Consider reading the env var once in OpenAIPreprocessor::new_with_parts and storing the resolved mode (e.g., mm_routing_backend: MmRoutingBackend enum or bool flag) as a field on OpenAIPreprocessor. This would reduce per-request overhead to a single field read.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@lib/llm/src/preprocessor.rs` around lines 1020 - 1022, The code currently
reads DYN_MM_ROUTING_BACKEND inside gather_mm_exact_routing_info on every
request (via sglang_pad_value_mode), causing unnecessary per-request env access;
move this logic into OpenAIPreprocessor::new_with_parts by resolving the env
once (e.g., parse into a MmRoutingBackend enum or a bool field named
mm_routing_backend or similar) and store it as a field on OpenAIPreprocessor;
update gather_mm_exact_routing_info to read that field instead of calling
std::env::var, and ensure constructors set the new field when instantiating
OpenAIPreprocessor.
components/src/dynamo/sglang/request_handlers/llm/decode_handler.py (1)
236-253: ⚡ Quick win

Consider validating list element types.

The method returns Optional[List[str]] but doesn't verify that mm_hashes list elements are actually strings. If the list contains non-string items (e.g., [1, 2, 3]), they will be forwarded to SGLang and may cause a runtime error downstream. Per Python guidelines, prefer failing fast.
🛡️ Proposed element-type guard
     mm_hashes = extra_args.get("mm_hashes")
     if not mm_hashes:
         return None
     if not isinstance(mm_hashes, list):
         return None
+    if not all(isinstance(h, str) for h in mm_hashes):
+        return None
     return mm_hashes
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@components/src/dynamo/sglang/request_handlers/llm/decode_handler.py` around
lines 236 - 253, The _extract_mm_hashes function currently returns a list
without checking element types; update _extract_mm_hashes to validate that
mm_hashes is a list of str by using an element-type guard (e.g.,
all(isinstance(h, str) for h in mm_hashes)) and if any element is not a string
fail fast by raising a TypeError (include a clear message referencing mm_hashes
and the offending values) so downstream SGLang callers never receive non-string
hashes.
examples/backends/sglang/launch/agg_multimodal_router.sh (1)
65-68: 💤 Low value

Consider using print_launch_banner for consistency.

The script prints a custom banner, but the bash launch guidelines recommend calling print_launch_banner (from launch_utils.sh) with script description, model, and port to ensure consistent, debuggable startup logs across all launch scripts.
📋 Example usage of print_launch_banner
+print_launch_banner \
+    "Lightseek MM Exact Routing (SGLang)" \
+    "${MODEL}" \
+    "${HTTP_PORT}"
+echo "NUM_WORKERS=${NUM_WORKERS}, BLOCK_SIZE=${BLOCK_SIZE}"
+echo "NAMESPACE=${NAMESPACE}"
-echo "=== Lightseek MM Exact Routing Launch (SGLang) ==="
-echo "MODEL=${MODEL}"
-echo "NUM_WORKERS=${NUM_WORKERS}, BLOCK_SIZE=${BLOCK_SIZE}"
-echo "HTTP_PORT=${HTTP_PORT}, NAMESPACE=${NAMESPACE}"
As per coding guidelines: "Call print_launch_banner with script description, model, and port to ensure consistent, debuggable startup logs."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@examples/backends/sglang/launch/agg_multimodal_router.sh` around lines 65 -
68, Replace the custom echo banner with a call to print_launch_banner to follow
launch guidelines: call print_launch_banner with a concise script description
(e.g., "Lightseek MM Exact Routing (SGLang)"), pass MODEL and HTTP_PORT as the
required model and port arguments, and keep printing any extra runtime vars like
NUM_WORKERS and BLOCK_SIZE separately if needed; update the section containing
the current echo lines to invoke print_launch_banner (referencing the
print_launch_banner function) and remove the redundant echo banner to maintain
consistent startup logs.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/utils/payloads.py`:
- Around line 390-395: The code builds a regex using min_routing_total_blocks
but only documents that it must be a power of 10; add an explicit validation
before constructing the regex (in the same block where min_routing_total_blocks
and log_patterns are used) to ensure min_routing_total_blocks is a power of 10
and > 0, and raise a clear ValueError/AssertionError if not; reference the
symbol min_routing_total_blocks and the list log_patterns so the check sits
immediately before the regex construction and prevents generating an
under-enforcing pattern for invalid inputs.
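To make the concern in this inline comment concrete, here is a sketch of a digit-count regex with the requested validation. The actual pattern in tests/utils/payloads.py is not shown on this page, so `min_total_blocks_pattern` and `is_power_of_ten` are hypothetical names and the pattern shape is an assumption.

```python
import re


def is_power_of_ten(n: int) -> bool:
    """True for 1, 10, 100, ...; False for zero, negatives, and everything else."""
    if n <= 0:
        return False
    while n % 10 == 0:
        n //= 10
    return n == 1


def min_total_blocks_pattern(min_routing_total_blocks: int) -> str:
    """Build a regex matching "[ROUTING] ... with X/M blocks overlap" with M >= N.

    A digit-count regex can only express thresholds of the form 10^k, so any
    other N is rejected instead of silently producing a weaker pattern.
    """
    if not is_power_of_ten(min_routing_total_blocks):
        raise ValueError(
            f"min_routing_total_blocks must be a positive power of 10, "
            f"got {min_routing_total_blocks}"
        )
    extra_digits = len(str(min_routing_total_blocks)) - 1
    # M has no leading zero and at least (extra_digits + 1) digits.
    return rf"\[ROUTING\].*with \d+/[1-9]\d{{{extra_digits},}} blocks overlap"
```

Without the check, a value such as 25 would yield a two-digit pattern that accepts totals from 10 to 24, which is the under-enforcement the comment warns about.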

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 320446bd-dfb5-4e69-9660-a4b7cfc6fef2

📥 Commits

Reviewing files that changed from the base of the PR and between e77222c and b7c53ff.

📒 Files selected for processing (8)

components/src/dynamo/sglang/request_handlers/llm/decode_handler.py
components/src/dynamo/sglang/tests/test_sglang_frontend_decoding.py
examples/backends/sglang/launch/agg_multimodal_router.sh
lib/llm/src/preprocessor.rs
tests/serve/multimodal_profiles/sglang.py
tests/serve/test_sglang.py
tests/utils/multimodal.py
tests/utils/payloads.py

Adds the SGLang row to the support matrix, a How-It-Works subsection covering the pad_value substitution path, and a Launching block with env vars + verification log signals. Cross-link from multimodal-sglang.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Collapse three call-site branches on MmRoutingProtocol::Sglang into methods on the enum:

- format_mm_hash_hex: 16-char (sglang) vs 64-char (vLLM)
- image_fill_token: pad_value(mm_hash) (sglang) vs find_token_id (vLLM)
- emits_block_mm_infos: false (sglang) vs true (vLLM)

Wire format split stays load-bearing: parse_mm_hash_from_extra_key in lib/kv-router/src/zmq_wire/extra_keys.rs uses 64-char length to filter MM hashes from LoRA/cache_salt/prompt-embed extra_keys in vLLM BlockStored events.

Verified:

- cargo check + clippy clean (lightseek-mm feature)
- mm_pad_value_matches_sglang_protocol unit test still pins formula
- qwen3-vl-2b sglang MM-routing e2e PASSED with kv_hit_rate=0.944 (>= 0.9 strong gate)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
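The shape of this refactor can be mirrored in a short Python sketch (the real enum is Rust). The method names follow the commit message; the `MM_PAD_SHIFT_VALUE` value, the trailing-zero padding of the 64-char form, and passing `image_token_id` as a plain argument in place of `find_token_id` are assumptions made for illustration.

```python
from enum import Enum

# Assumption: illustrative stand-in for SGLang's constant of the same name.
MM_PAD_SHIFT_VALUE = 1_000_000


class MmRoutingProtocol(Enum):
    VLLM = "vllm"
    SGLANG = "sglang"

    def format_mm_hash_hex(self, mm_hash: int) -> str:
        """16-char hex for sglang, 64-char padded hex for vLLM."""
        bare = f"{mm_hash:016x}"
        if self is MmRoutingProtocol.SGLANG:
            return bare
        # Assumption: padding with trailing zeros; the PR does not state the direction.
        return bare.ljust(64, "0")

    def image_fill_token(self, mm_hash: int, image_token_id: int) -> int:
        """Token written at image positions in the routing token stream."""
        if self is MmRoutingProtocol.SGLANG:
            return MM_PAD_SHIFT_VALUE + (mm_hash % (1 << 30))
        return image_token_id

    def emits_block_mm_infos(self) -> bool:
        """Only the vLLM-compatible protocol carries hashes via block_mm_infos."""
        return self is not MmRoutingProtocol.SGLANG
```

Keeping the three decisions on the enum means call sites ask the protocol what to do instead of branching on the backend, which is the consolidation the commit describes.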

The vendored sglang patches dir (added in c733963) wasn't matched by any glob in .github/filters.yaml, so the changed-files coverage gate failed closed and cascaded into skipping all backend jobs on PR #9561. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The Report pytest markers gate in pre-commit requires every test to declare a Test Type and Hardware marker. Both tests in this file are pure unit (import sglang module, assert on values; no GPU, no engine startup), so unit + gpu_0 is the appropriate pair. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The dynamo-runtime sequential job runs on an image without sglang installed and filters by `not sglang`. Without the marker, the test file matches the filter, gets collected, fails on `import sglang.srt`. Adding pytest.mark.sglang gates it to the sglang container image where the module is available.

Tighten per-method docs to 1-2 lines and add a TODO on the enum pointing at the kv-router 64-char type-tag dependency that blocks full wire-format unification.

… + comment Strip user-visible "lightseek" mentions from the sglang multimodal-router launch script (header, default NAMESPACE, banner echos) and one stray comment in preprocessor.rs. The cargo feature name `lightseek-mm` and the `lightseek_mm` module/type names stay since they're real identifiers.

Patches are version-pinned under v${SGLANG_VER}/; the dir-existence guard already handles upstream-absorbed bumps, so the --forward / rc-tolerance layer was redundant. Switch to strict: any non-zero `patch` exit aborts the build. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

nv-anants

reviewed container/ changes

pull-request-size Bot added the size/XXL label May 14, 2026

github-actions Bot added the backend::sglang, frontend, and feat labels May 14, 2026

krishung5 force-pushed the krish/sglang-mm-routing branch from 3791c43 to f492144 Compare May 14, 2026 17:46

pull-request-size Bot added size/L and removed size/XXL labels May 14, 2026

krishung5 and others added 3 commits May 14, 2026 10:56

krishung5 force-pushed the krish/sglang-mm-routing branch from f492144 to a6d7031 Compare May 14, 2026 17:57

krishung5 and others added 2 commits May 14, 2026 18:34

test(sglang): set _mm_hashes_supported in decode_handler fixture

90a69ec

krishung5 marked this pull request as ready for review May 15, 2026 01:36

krishung5 requested review from a team as code owners May 15, 2026 01:36

coderabbitai Bot reviewed May 15, 2026

Comment thread tests/utils/payloads.py

pull-request-size Bot added size/XL and removed size/L labels May 15, 2026

github-actions Bot added the documentation Improvements or additions to documentation label May 15, 2026

krishung5 and others added 2 commits June 2, 2026 21:19

Merge remote-tracking branch 'origin/main' into krish/sglang-mm-routing

7b0fa68

pull-request-size Bot added size/XXL and removed size/XL labels Jun 3, 2026

github-actions Bot added the container label Jun 3, 2026

github-actions Bot added the actions label Jun 3, 2026

refactor(preprocessor): trim MmRoutingProtocol doc comments

292e7c8

krishung5 requested a review from GuanLuo June 3, 2026 16:55

pull-request-size Bot added size/XL and removed size/XXL labels Jun 3, 2026

GuanLuo approved these changes Jun 3, 2026

nv-anants reviewed Jun 3, 2026

Comment thread container/templates/sglang_runtime.Dockerfile Outdated

nv-anants approved these changes Jun 3, 2026

krishung5 mentioned this pull request Jun 4, 2026

refactor(mm-routing): unify on canonical pad_value, drop backend flags #10328

Merged

feat(sglang): MM-aware KV routing via pad_value substitution #9561
krishung5 merged 39 commits into main from krish/sglang-mm-routing