Add kv_cache_dtype consistency check between prefill and decode #19398

Closed
YAMY1234 wants to merge 1 commit into sgl-project:main from YAMY1234:kv_cache_dtype

Conversation

@YAMY1234
Collaborator

Motivation

When prefill and decode are started with different kv_cache_dtype values (e.g. prefill bf16, decode fp8_e4m3), the per-page item_len differs (32768 vs 16384). The transfer layer uses prefill's item_len to compute both the src and dst addresses, so at high concurrency the dst address overflows into unregistered memory and the transfer crashes with "Requested address not found" (Mooncake) or "NIXL_ERR_NOT_FOUND" (NIXL).
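
To make the arithmetic concrete, here is a minimal sketch of the overflow. The two item_len values come from the description above; the page count, the base address, and the helper names are hypothetical and are not taken from the transfer code.

```python
# Illustrative only: models how a dst address computed with the wrong
# item_len runs past the end of the decode-side registered buffer.

PREFILL_ITEM_LEN = 32768  # per-page item_len on prefill (from the PR text)
DECODE_ITEM_LEN = 16384   # per-page item_len on decode (from the PR text)

NUM_PAGES = 8             # hypothetical number of pages registered by decode
DST_BASE = 0              # hypothetical base address of the decode buffer

# Decode registered NUM_PAGES pages of its own (smaller) item_len.
DST_BUFFER_END = DST_BASE + NUM_PAGES * DECODE_ITEM_LEN


def dst_range(page_index: int, item_len: int) -> tuple[int, int]:
    """Return the [start, end) byte range written for a given page."""
    start = DST_BASE + page_index * item_len
    return start, start + item_len


def overflows(page_index: int) -> bool:
    """True if writing this page with prefill's item_len leaves the buffer."""
    _, end = dst_range(page_index, PREFILL_ITEM_LEN)
    return end > DST_BUFFER_END


# First page index whose write no longer fits in the registered region.
first_bad_page = next(i for i in range(NUM_PAGES) if overflows(i))
```

With these illustrative numbers, writes for pages 0 to 3 still fall inside the registered region, though not where decode expects them, and page 4 is the first write to run past its end.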

There is no dtype conversion in the transfer path, so a dtype mismatch silently produces corrupted data at low concurrency and crashes at high concurrency. This PR adds an item_len check when prefill receives decode's KV registration; on mismatch, prefill logs an error and refuses the peer instead of crashing the prefill process.
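
A check of this kind could look roughly like the following. This is a sketch under stated assumptions, not the code in this PR: the function name, its parameters, and the peer label are hypothetical, and it only assumes that each side can report a list of per-buffer item_len values.

```python
import logging
from typing import Sequence

logger = logging.getLogger(__name__)


def kv_item_lens_match(
    prefill_item_lens: Sequence[int],
    decode_item_lens: Sequence[int],
    peer: str = "<unknown>",
) -> bool:
    """Hypothetical helper: compare per-buffer item_len on both sides.

    Returns True when the layouts agree. On mismatch it logs an error and
    returns False so the caller can refuse the peer instead of starting a
    transfer that would write to wrong or unregistered addresses.
    """
    if list(prefill_item_lens) != list(decode_item_lens):
        logger.error(
            "KV item_len mismatch with decode peer %s: prefill=%s decode=%s. "
            "Check that both sides use the same kv_cache_dtype.",
            peer,
            list(prefill_item_lens),
            list(decode_item_lens),
        )
        return False
    return True
```

Returning a boolean rather than raising keeps the decision with the caller, which matches the stated goal of rejecting one peer while the prefill process keeps running.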

Modifications

Accuracy Tests

Benchmarking and Profiling

Checklist

Review Process

  1. Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
  4. After green CI and required approvals, ask Merge Oncalls to merge.

@gemini-code-assist
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@ShangmingCai left a comment
Collaborator

LGTM

@ShangmingCai
Collaborator

/rerun-stage stage-c-test-8-gpu-h20

@github-actions
Contributor

✅ Triggered stage-c-test-8-gpu-h20 to run independently (skipping dependencies).

@github-actions
Contributor

🔗 View workflow run

2 participants