[Bug Fix] missing index/KV transfer for MTP layer in NSA disaggregation #23539

Merged
ShangmingCai merged 8 commits into sgl-project:main from zRzRzRzRzRzRzR:glm-pd
Apr 30, 2026
@zRzRzRzRzRzRzR
Contributor

Motivation

In PD disaggregation with NSA + MTP, only the target model's NSA state buffers are registered for transfer. The draft model's NSATokenToKVPool buffers are never appended to kv_args, so the MTP layer's index/KV state is not sent from prefill to decode, causing wrong speculative decoding results.

Modifications

In DecodePreallocQueue and PrefillBootstrapQueue, when the main pool is NSA, also append draft_token_to_kv_pool.get_state_buf_infos() to kv_args if the draft pool is also NSA.
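
The registration logic described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from the PR: the `NSATokenToKVPool` class here is a stand-in, the three-list return shape of `get_state_buf_infos()` and the `state_data_ptrs` / `state_data_lens` / `state_item_lens` field names on `KVArgs` are assumptions, and `register_state_buffers` is a hypothetical helper name.

```python
from dataclasses import dataclass, field


class NSATokenToKVPool:
    """Stand-in for the real pool; only the method named in the PR is mocked."""

    def __init__(self, ptrs, lens, item_lens):
        self._infos = (list(ptrs), list(lens), list(item_lens))

    def get_state_buf_infos(self):
        # Assumed return shape: (data_ptrs, data_lens, item_lens).
        return self._infos


@dataclass
class KVArgs:
    # Hypothetical field names for the registered state buffers.
    state_data_ptrs: list = field(default_factory=list)
    state_data_lens: list = field(default_factory=list)
    state_item_lens: list = field(default_factory=list)


def register_state_buffers(kv_args, token_to_kv_pool, draft_token_to_kv_pool=None):
    """Register target-model state buffers, then draft ones if both pools are NSA."""
    if not isinstance(token_to_kv_pool, NSATokenToKVPool):
        return kv_args  # non-NSA main pool: no state buffers to register

    pools = [token_to_kv_pool]
    # Same guard as in the PR: only an NSA draft pool contributes extra buffers.
    if draft_token_to_kv_pool is not None and isinstance(
        draft_token_to_kv_pool, NSATokenToKVPool
    ):
        pools.append(draft_token_to_kv_pool)

    for pool in pools:
        ptrs, lens, item_lens = pool.get_state_buf_infos()
        kv_args.state_data_ptrs += ptrs
        kv_args.state_data_lens += lens
        kv_args.state_item_lens += item_lens
    return kv_args


target = NSATokenToKVPool([0x1000, 0x2000], [4096, 4096], [128, 128])
draft = NSATokenToKVPool([0x3000], [4096], [128])
with_draft = register_state_buffers(KVArgs(), target, draft)
without_draft = register_state_buffers(KVArgs(), target, None)
```

In this sketch the same helper would run on both the prefill and the decode side, so the two ends register the draft buffers in the same order after the target buffers.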

Collaborator

@ShangmingCai ShangmingCai left a comment

Nice catch. Thx for the fix.

@ShangmingCai
Collaborator

/tag-and-rerun-ci

@ShangmingCai
Collaborator

/rerun-stage stage-c-test-8-gpu-h20

@github-actions
Contributor

✅ Triggered stage-c-test-8-gpu-h20 to run independently (skipping dependencies). View workflow run

@ShangmingCai
Collaborator

/rerun-stage stage-c-test-8-gpu-h20

@github-actions
Contributor

✅ Triggered stage-c-test-8-gpu-h20 to run independently (skipping dependencies). View workflow run

@ShangmingCai
Collaborator

/rerun-stage stage-c-test-8-gpu-h20

@github-actions
Contributor

✅ Triggered stage-c-test-8-gpu-h20 to run independently (skipping dependencies). View workflow run

@kpham-sgl kpham-sgl self-assigned this Apr 24, 2026
Comment on lines +372 to +374
    if self.draft_token_to_kv_pool is not None and isinstance(
        self.draft_token_to_kv_pool, NSATokenToKVPool
    ):
Collaborator

Two small questions here:

  1. Does draft_token_to_kv_pool also need to be an NSATokenToKVPool?
  2. Do we need to check hasattr(self.draft_token_to_kv_pool, "get_state_buf_infos")?

Collaborator

@ShangmingCai ShangmingCai Apr 24, 2026

Makes sense. Do we need to consider the non-MTP spec decode cases? I'm not sure whether this is a common use case. @zRzRzRzRzRzRzR

Contributor Author

The isinstance(self.draft_token_to_kv_pool, NSATokenToKVPool) guard makes this PR a no-op for non-NSA draft pools, so non-MTP spec decode cases aren't affected. The fix kicks in only when the draft pool is also NSA, which today is the MTP-on-NSA path. If a future non-MTP spec decode also runs an NSA draft, the same logic applies and would Just Work.
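
To make the no-op behavior concrete, here is a small sketch of how the guard behaves for the three cases discussed in this thread: no draft pool, a non-NSA draft pool, and an NSA draft pool. The classes are stand-ins, `PlainKVPool` and `draft_state_buf_infos` are hypothetical names, and the three-list return shape of `get_state_buf_infos()` is an assumption.

```python
class NSATokenToKVPool:
    """Stand-in for the real NSA pool; only get_state_buf_infos is mocked."""

    def get_state_buf_infos(self):
        # Assumed return shape: (data_ptrs, data_lens, item_lens).
        return [0x1000], [4096], [128]


class PlainKVPool:
    """Hypothetical non-NSA draft pool with no extra state buffers."""


def draft_state_buf_infos(draft_pool):
    """Return the extra buffers a draft pool contributes, or nothing at all."""
    if draft_pool is not None and isinstance(draft_pool, NSATokenToKVPool):
        return draft_pool.get_state_buf_infos()
    # No draft pool, or a non-NSA draft pool: nothing is appended.
    return [], [], []


no_draft = draft_state_buf_infos(None)
plain_draft = draft_state_buf_infos(PlainKVPool())
nsa_draft = draft_state_buf_infos(NSATokenToKVPool())
```

Only the last call yields buffers here; the first two return empty lists, which is the "no-op for non-NSA draft pools" behavior described above.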

@JustinTong0323
Collaborator

/rerun-failed-ci

@JustinTong0323
Collaborator

/rerun-failed-ci

@ShangmingCai
Collaborator

PD-related CI has passed. Let's merge.

@ShangmingCai ShangmingCai merged commit d040333 into sgl-project:main Apr 30, 2026
223 of 254 checks passed
@zRzRzRzRzRzRzR zRzRzRzRzRzRzR deleted the glm-pd branch April 30, 2026 05:25
5 participants