feat: optional caller-supplied mm_hashes on GenerateReqInput#25300

Merged
ishandhanani merged 6 commits into
sgl-project:mainfrom
krishung5:feat/mm-hash-interop-v0.5.11
Jun 1, 2026
@krishung5 krishung5 commented May 14, 2026

Contributor

Motivation

External KV routers compute per-image hashes upstream and need sglang's
MultimodalDataItem.hash to align byte-for-byte so that:

pad_value = MM_PAD_SHIFT_VALUE + (item.hash % 2^30)

is deterministic from the caller's hash. With this, two requests
carrying the same image get the same image-token block in sglang's
RadixAttention cache, and the upstream router can land both on the
cache-warm worker.
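
As a rough illustration of the formula above, the following sketch derives a pad value from a caller-supplied hash. The numeric value of `MM_PAD_SHIFT_VALUE` and the helper name `pad_value_from_hash` are assumptions for illustration only; the real constant is defined inside sglang.

```python
# Placeholder value: the real MM_PAD_SHIFT_VALUE is defined in sglang and
# may differ. It is hard-coded here only so the sketch runs on its own.
MM_PAD_SHIFT_VALUE = 1_000_000


def pad_value_from_hash(item_hash: int) -> int:
    """Hypothetical helper mirroring the formula in the PR description."""
    # Only the low 30 bits of the hash contribute to the pad value.
    return MM_PAD_SHIFT_VALUE + (item_hash % (1 << 30))


# The same hash always maps to the same pad value, which is what lets two
# requests carrying the same image share one image-token block.
same_a = pad_value_from_hash(0x1234567890ABCDEF)
same_b = pad_value_from_hash(0x1234567890ABCDEF)
```

Because of the modulo, two hashes that differ only above bit 30 would map to the same pad value under this formula.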

Today sglang always recomputes hash = hash_feature(feature) inside
set_pad_value(), so the caller's hash and sglang's derived
pad_value are decoupled. Routing-side prefix-cache hits become a
coincidence rather than a contract.

What this PR does

Adds an optional mm_hashes: List[str] | None field on
GenerateReqInput (and matching kwarg on Engine.generate /
Engine.async_generate). When supplied:

  1. tokenizer_manager parses each hex string into a u64 (first 16
    chars) and seeds the corresponding MultimodalDataItem.hash.
  2. set_pad_value() skips the internal hash_feature recompute
    when hash is already set.
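
The two steps above can be sketched roughly as follows. This is a minimal stand-in, not the PR's actual code: `Item`, `parse_mm_hash`, the SHA-256 based `hash_feature`, and the constant value are all hypothetical and only mimic the behavior described.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional

MM_PAD_SHIFT_VALUE = 1_000_000  # placeholder; the real constant lives in sglang


@dataclass
class Item:
    """Hypothetical stand-in for MultimodalDataItem."""
    feature: bytes
    hash: Optional[int] = None
    pad_value: Optional[int] = None


def parse_mm_hash(hex_str: str) -> int:
    """Step 1: read the first 16 hex chars as an unsigned 64-bit integer."""
    return int(hex_str[:16], 16)


def hash_feature(feature: bytes) -> int:
    """Stand-in for sglang's internal hash; the real algorithm may differ."""
    return int.from_bytes(hashlib.sha256(feature).digest()[:8], "big")


def set_pad_value(item: Item) -> None:
    """Step 2: recompute the hash only when the caller did not seed one."""
    if item.hash is None:
        item.hash = hash_feature(item.feature)
    item.pad_value = MM_PAD_SHIFT_VALUE + (item.hash % (1 << 30))


# Seeded item: only the first 16 hex chars are used, the rest is ignored.
seeded = Item(feature=b"pixels", hash=parse_mm_hash("00000000000000ff" + "deadbeef"))
set_pad_value(seeded)

# Unseeded item: falls through to the internal hash.
unseeded = Item(feature=b"pixels")
set_pad_value(unseeded)
```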

Backward compatibility

Default is None — no behavior change for any existing caller.
Length mismatch or per-item parse error falls back to the existing
hash_feature path so a malformed mm_hashes never blocks a request.
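
A minimal sketch of this fallback behavior is shown below. The helper name `resolve_preset_hashes` is hypothetical, and the sketch assumes a parse error falls back only for the offending item; the PR may instead discard the whole list. A `None` entry means "let `hash_feature` compute it".

```python
from typing import List, Optional


def resolve_preset_hashes(
    mm_hashes: Optional[List[str]], num_items: int
) -> List[Optional[int]]:
    """Hypothetical helper: return one preset hash (or None) per item."""
    # Default path, or a length mismatch: ignore the caller's list entirely.
    if mm_hashes is None or len(mm_hashes) != num_items:
        return [None] * num_items

    resolved: List[Optional[int]] = []
    for hex_str in mm_hashes:
        try:
            # First 16 hex chars interpreted as a u64.
            resolved.append(int(hex_str[:16], 16))
        except (ValueError, TypeError):
            # Malformed entry: never raise, just fall back for this item.
            resolved.append(None)
    return resolved
```

In every failure case the request still proceeds; the only cost is that the router's hash and sglang's `pad_value` are no longer aligned for that request.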

Tests

test/registered/unit/managers/test_mm_hashes.py pins:

  • GenerateReqInput.mm_hashes field shape (optional list of hex
    strings) and that it defaults to None.
  • set_pad_value() honors a pre-set hash without calling
    hash_feature (patched to raise if invoked).
  • pad_value is deterministic across items with identical preset
    hashes; and distinct preset hashes produce distinct pad_values.
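
The "patched to raise" pattern from the second bullet can be sketched like this. It is not the content of `test_mm_hashes.py`; `FakeItem`, the `hashing` namespace, and the constant value are hypothetical stand-ins so the example runs without sglang installed.

```python
from types import SimpleNamespace
from unittest.mock import patch

MM_PAD_SHIFT_VALUE = 1_000_000  # placeholder; the real constant lives in sglang

# Hypothetical stand-in for the module that owns hash_feature.
hashing = SimpleNamespace(hash_feature=lambda feature: len(feature))


class FakeItem:
    """Hypothetical stand-in for MultimodalDataItem."""

    def __init__(self, feature, hash=None):
        self.feature = feature
        self.hash = hash
        self.pad_value = None

    def set_pad_value(self):
        # Skip the recompute when a hash was seeded by the caller.
        if self.hash is None:
            self.hash = hashing.hash_feature(self.feature)
        self.pad_value = MM_PAD_SHIFT_VALUE + (self.hash % (1 << 30))


def pad_value_with_preset(preset_hash):
    """Compute pad_value while hash_feature is patched to raise if called."""
    boom = AssertionError("hash_feature must not run when hash is preset")
    with patch.object(hashing, "hash_feature", side_effect=boom):
        item = FakeItem(b"pixels", hash=preset_hash)
        item.set_pad_value()
    return item.pad_value
```

If `set_pad_value()` ever called `hash_feature` for a seeded item, the patched function would raise and the check would fail loudly instead of silently recomputing.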

Why hex strings, not ints?

Wire formats for upstream routers tend to be JSON-friendly hex
strings (matches the vLLM-compatible mm_hash encoding). Strings are
also forward-compatible with hashes wider than u64 if sglang's
pad_value width grows.
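
As a small illustration of this point, the sketch below round-trips a hash wider than 64 bits through JSON as a hex string and then takes the first 16 characters as a u64, as described earlier. The use of SHA-256 is an assumption; the PR does not say which hash an upstream router uses.

```python
import hashlib
import json

# Assumption: the router hashes raw image bytes with SHA-256 (64 hex chars).
image_bytes = b"example image bytes"
digest_hex = hashlib.sha256(image_bytes).hexdigest()

# A hex string survives JSON encoding unchanged, regardless of its width.
payload = json.dumps({"mm_hashes": [digest_hex]})
restored = json.loads(payload)["mm_hashes"][0]

# The receiver can still derive a u64 from the first 16 hex chars.
u64_hash = int(restored[:16], 16)
```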


CI States

Latest PR Test (Base): ❌ Run #26652245227
Latest PR Test (Extra): ❌ Run #26652244980

@krishung5 krishung5 force-pushed the feat/mm-hash-interop-v0.5.11 branch 2 times, most recently from 3552cb7 to 0db5652 Compare May 15, 2026 01:42
@ishandhanani

Collaborator

/tag-and-rerun-ci

External KV routers (dynamo, custom orchestrators) sometimes compute
their own per-image hash for routing decisions and need sglang's
prefix-cache key to align. Today sglang always recomputes
MultimodalDataItem.hash via hash_feature() inside set_pad_value, so
the caller's hash and sglang's derived pad_value are decoupled.

This change adds an optional `mm_hashes: List[str] | None` field on
GenerateReqInput (and matching kwargs on Engine.generate/async_generate).
When supplied, each MultimodalDataItem.hash is initialised from the
list and set_pad_value() skips the internal recompute, so pad_value
is deterministic from the caller's hash. Length mismatch or per-item
parse error falls back to the existing hash_feature() path so a bad
mm_hashes never blocks a request.

Defaults to None; behavior is unchanged for current callers.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@krishung5 krishung5 force-pushed the feat/mm-hash-interop-v0.5.11 branch from 0db5652 to 1c01579 Compare May 27, 2026 21:39
krishung5 and others added 3 commits May 27, 2026 14:46
stage-b-test-1-gpu-large isn't a valid CUDA suite name; CUDA suites
use the base-a/b/c prefix. Switch to the stage="base-b" /
runner_config="1-gpu-small" pattern other unit tests in this
directory use.

Caught by sglang CI's psf/black 26.1.0 lint hook on PR sgl-project#25300.
Pure whitespace; no behavior change.

Re-sort imports so `Modality` precedes `MultimodalDataItem` per isort
alphabetical convention. Fixes the CI lint failure that fast-failed the
rest of the test stages on PR sgl-project#25300.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@mattteochen mattteochen mentioned this pull request May 28, 2026
5 tasks
@ishandhanani

Collaborator

All relevant CI has passed

@ishandhanani ishandhanani merged commit 86afa21 into sgl-project:main Jun 1, 2026
132 of 151 checks passed
xjpang pushed a commit to xjpang/sglang that referenced this pull request Jun 2, 2026
…ject#25300)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>
mqhc2020 pushed a commit to mqhc2020/sglang that referenced this pull request Jun 2, 2026
…ject#25300)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>
krishung5 added a commit to ai-dynamo/dynamo that referenced this pull request Jun 2, 2026
Adds the concrete curl + filter + patch -p2 recipe to apply
sgl-project/sglang#25300 (the mm_hashes interop hook) to a stock
upstream sglang install. The dynamo sglang container ships upstream
sglang without this patch, so MM-aware routing silently degrades to
text-prefix fallback unless the patch is applied.

For pytest, mirror the same recipe in pytest_collection_modifyitems
gated on sglang MM-routing test collection. Idempotent — the grep
short-circuits when sglang already exposes mm_hashes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
hanming-lu pushed a commit that referenced this pull request Jun 3, 2026
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>
alphabetc1 pushed a commit to alphabetc1/sglang that referenced this pull request Jun 4, 2026
…ject#25300)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>
jeynmann pushed a commit to jeynmann/sglang that referenced this pull request Jun 4, 2026
…ject#25300)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>
edwingao28 pushed a commit to edwingao28/sglang that referenced this pull request Jun 7, 2026
…ject#25300)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>
monkeyLoveding pushed a commit to monkeyLoveding/sglang_open that referenced this pull request Jun 9, 2026
…ject#25300)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: ishandhanani <82981111+ishandhanani@users.noreply.github.com>