[bugfix] correct local_chunk_len for DCP in reorg_kvcache with long context#28526
[bugfix] correct local_chunk_len for DCP in reorg_kvcache with long context#28526vllm-bot merged 4 commits intovllm-project:mainfrom
Conversation
…l-size is enabled Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com>
There was a problem hiding this comment.
Code Review
This pull request introduces a bugfix for Distributed Context Parallelism (DCP) in the context of chunked prefill for long sequences. The core of the change is in the reorg_kvcache function, where the calculation of local_chunk_len is corrected. Previously, it incorrectly used the total local_context_len, which failed for contexts split across multiple chunks. The new implementation correctly calculates the length of the context for the current rank within the current chunk by using the newly introduced chunk_size and chunk_idx parameters. The changes are well-contained, logical, and include appropriate assertions. The fix appears correct and addresses the described issue effectively.
|
Thanks for the quick fix, it looks promising. I'll rerun the test and come back with the result. |
LucasWilkinson
left a comment
There was a problem hiding this comment.
LGTM, good catch; thanks for ascii art, helps alot!
DSV3 now runs smooth on >120k inputs 🙌 |
|
Need to merge this soon, to land in v0.11.1. |
…ontext (vllm-project#28526) Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> Signed-off-by: George D. Torres <gdavtor@gmail.com>
…ontext (vllm-project#28526) Signed-off-by: QiuChunshuo <qiuchunshuo@huawei.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Fix the issues #28476 #28411
The previous DCP implementation did not account for cases where long contexts were split to multi-chunks. This has now been addressed by updating the correct chunked local_chunk_len based on the chunk_size and local_context_len.
CC @LucasWilkinson @cjackal @Nemo-G