feat: optional caller-supplied mm_hashes on GenerateReqInput #25300
Merged
ishandhanani merged 6 commits on Jun 1, 2026
External KV routers (dynamo, custom orchestrators) sometimes compute their own per-image hash for routing decisions and need sglang's prefix-cache key to align. Today sglang always recomputes `MultimodalDataItem.hash` via `hash_feature()` inside `set_pad_value`, so the caller's hash and sglang's derived `pad_value` are decoupled.

This change adds an optional `mm_hashes: List[str] | None` field on `GenerateReqInput` (and matching kwargs on `Engine.generate`/`async_generate`). When supplied, each `MultimodalDataItem.hash` is initialised from the list and `set_pad_value()` skips the internal recompute, so `pad_value` is deterministic from the caller's hash. A length mismatch or per-item parse error falls back to the existing `hash_feature()` path, so a bad `mm_hashes` never blocks a request.

Defaults to `None`; behavior is unchanged for current callers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
stage-b-test-1-gpu-large isn't a valid CUDA suite name; CUDA suites use the base-a/b/c prefix. Switch to the stage="base-b" / runner_config="1-gpu-small" pattern other unit tests in this directory use.
Caught by sglang CI's psf/black 26.1.0 lint hook on PR sgl-project#25300. Pure whitespace; no behavior change.
Re-sort imports so `Modality` precedes `MultimodalDataItem` per isort alphabetical convention. Fixes the CI lint failure that fast-failed the rest of the test stages on PR sgl-project#25300. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Collaborator: All relevant CI has passed.
xjpang pushed a commit to xjpang/sglang that referenced this pull request on Jun 2, 2026
…ject#25300) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>
mqhc2020 pushed a commit to mqhc2020/sglang that referenced this pull request on Jun 2, 2026
…ject#25300) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>
krishung5 added a commit to ai-dynamo/dynamo that referenced this pull request on Jun 2, 2026
Adds the concrete curl + filter + `patch -p2` recipe to apply sgl-project/sglang#25300 (the `mm_hashes` interop hook) to a stock upstream sglang install. The dynamo sglang container ships upstream sglang without this patch, so MM-aware routing silently degrades to text-prefix fallback unless the patch is applied.
For pytest, mirror the same recipe in `pytest_collection_modifyitems`, gated on sglang MM-routing test collection. The recipe is idempotent: the grep short-circuits when sglang already exposes `mm_hashes`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
hanming-lu pushed a commit that referenced this pull request on Jun 3, 2026
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>
alphabetc1 pushed a commit to alphabetc1/sglang that referenced this pull request on Jun 4, 2026
…ject#25300) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>
jeynmann pushed a commit to jeynmann/sglang that referenced this pull request on Jun 4, 2026
…ject#25300) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>
edwingao28 pushed a commit to edwingao28/sglang that referenced this pull request on Jun 7, 2026
…ject#25300) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>
monkeyLoveding pushed a commit to monkeyLoveding/sglang_open that referenced this pull request on Jun 9, 2026
…ject#25300) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>
Motivation
External KV routers compute per-image hashes upstream and need sglang's `MultimodalDataItem.hash` to align byte-for-byte, so that `pad_value` is deterministic from the caller's hash. With this, two requests carrying the same image get the same image-token block in sglang's RadixAttention cache, and the upstream router can land both on the cache-warm worker.

Today sglang always recomputes `hash = hash_feature(feature)` inside `set_pad_value()`, so the caller's hash and sglang's derived `pad_value` are decoupled. Routing-side prefix-cache hits become a coincidence rather than a contract.
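To make the decoupling concrete, here is a toy model of the two behaviors. It is a sketch and not sglang's code: `ToyItem`, the SHA-256 based `hash_feature` stand-in, and the `PAD_VALUE_SPACE` modulus are all assumptions made for illustration, since the real `pad_value` derivation is not shown in this PR text.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional

# Hypothetical modulus; the real pad_value derivation is not shown here.
PAD_VALUE_SPACE = 1 << 30


def hash_feature(feature: bytes) -> int:
    """Toy stand-in for sglang's hash_feature(): a u64 derived from content."""
    return int.from_bytes(hashlib.sha256(feature).digest()[:8], "big")


@dataclass
class ToyItem:
    """Toy stand-in for MultimodalDataItem, keeping only the relevant fields."""
    feature: bytes
    hash: Optional[int] = None
    pad_value: Optional[int] = None


def set_pad_value_always_recompute(item: ToyItem) -> None:
    # Behavior described as "today": any pre-set hash is overwritten.
    item.hash = hash_feature(item.feature)
    item.pad_value = item.hash % PAD_VALUE_SPACE


def set_pad_value_honor_preset(item: ToyItem) -> None:
    # Behavior the PR describes: recompute only when no hash was seeded.
    if item.hash is None:
        item.hash = hash_feature(item.feature)
    item.pad_value = item.hash % PAD_VALUE_SPACE


router_hash = 0x1234_5678_9ABC_DEF0  # hash computed upstream by the router

seeded = ToyItem(feature=b"image-bytes", hash=router_hash)
set_pad_value_honor_preset(seeded)

recomputed = ToyItem(feature=b"image-bytes", hash=router_hash)
set_pad_value_always_recompute(recomputed)
```

In the first call the item's `pad_value` is a pure function of `router_hash`; in the second the router's hash is discarded and `pad_value` depends only on what `hash_feature` returns for the feature.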
What this PR does
Adds an optional `mm_hashes: List[str] | None` field on `GenerateReqInput` (and a matching kwarg on `Engine.generate` / `Engine.async_generate`). When supplied:
- `tokenizer_manager` parses each hex string into a u64 (first 16 chars) and seeds the corresponding `MultimodalDataItem.hash`.
- `set_pad_value()` skips the internal `hash_feature` recompute when `hash` is already set.
Backward compatibility
- Default is `None`: no behavior change for any existing caller.
- Length mismatch or per-item parse error falls back to the existing `hash_feature` path, so a malformed `mm_hashes` never blocks a request.
Tests
`test/registered/unit/managers/test_mm_hashes.py` pins:
- The `GenerateReqInput.mm_hashes` field shape (optional list of hex strings) and that it defaults to `None`.
- `set_pad_value()` honors a pre-set `hash` without calling `hash_feature` (patched to raise if invoked).
- `pad_value` is deterministic across items with identical preset hashes, and distinct preset hashes produce distinct `pad_value`s.
Why hex strings, not ints?
Wire formats for upstream routers tend to be JSON-friendly hex strings (this matches the vLLM-compatible mm_hash encoding). Strings are also forward-compatible with hashes wider than u64 if sglang's `pad_value` width grows.
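The parsing and fallback rules described above (first 16 hex chars into a u64, fallback on length mismatch or parse error) could look roughly like the sketch below. The helper names `parse_hex_u64` and `seed_hashes` are hypothetical and not part of sglang. Two choices are assumptions, since the PR text does not specify them: a parse error falls back per item rather than for the whole request, and hex strings shorter than 16 chars are accepted.

```python
import string
from typing import List, Optional

U64_HEX_CHARS = 16  # 16 hex digits encode exactly 64 bits
_HEX_DIGITS = frozenset(string.hexdigits)


def parse_hex_u64(value: object) -> Optional[int]:
    """Parse the first 16 hex chars of value as a u64; None if malformed."""
    if not isinstance(value, str):
        return None
    head = value[:U64_HEX_CHARS]
    # Validate explicitly: int(x, 16) alone would also accept "0x", "_" and signs.
    if not head or any(ch not in _HEX_DIGITS for ch in head):
        return None
    return int(head, 16)


def seed_hashes(
    mm_hashes: Optional[List[str]], num_items: int
) -> List[Optional[int]]:
    """Return one entry per item; None means: fall back to hash_feature()."""
    if mm_hashes is None or len(mm_hashes) != num_items:
        # Missing list or length mismatch: every item takes the fallback path.
        return [None] * num_items
    return [parse_hex_u64(h) for h in mm_hashes]
```

Because only the leading 16 characters are read, a caller can send a wider digest (for example a 64-character hex string) unchanged. The extra characters are ignored in this sketch, which is the forward-compatibility property the hex-string format is meant to preserve.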
CI States
Latest PR Test (Base): ❌ Run #26652245227
Latest PR Test (Extra): ❌ Run #26652244980