Fix DeepSeek PD decode DP + MTP failed by vectorized gather kernel in…#15457
llc-kc wants to merge 2 commits into sgl-project:main
…dex out of bounds Signed-off-by: liluchang <liluchang@kingsoft.com>
|
Possibly different fixes for the same issue. |
|
@yudian0504 Thanks. If your PR fixes this problem and gets merged, I will close this PR. |
This is not my PR😂, but we are also experiencing this bug, so we are following it as well. |
|
Additionally, it seems this issue is NOT exclusive to PD disaggregation. We've encountered it in our non-PD setups as well, although the probability is quite low, which makes it difficult to reproduce. |
|
it works for me |
|
Agree with @yudian0504: this fix doesn't pin down the root cause. It is more of a workaround. We are still encountering this issue in our online service. |
|
/tag-and-rerun-ci |
|
@hnyls2002 Hi, is there any progress on this issue? Have any other PRs already fixed it? |
Motivation
Fix the DeepSeek PD TP+DP+MTP decode failure caused by a vectorized gather kernel index-out-of-bounds error,
as described in
#15143
#15399
Modifications
Use clone so that output tensors held in the CUDA graph output_buffers are not used elsewhere at the same time.
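The aliasing hazard behind this change can be illustrated with a small, pure-Python sketch. It is not the PR's code: `StaticBufferRunner`, `replay`, and `copy_output` are hypothetical names, and a plain list stands in for a tensor in the graph's output buffers. In real PyTorch code the defensive copy would be `tensor.clone()`. The sketch only shows why a value returned from a reused buffer can change under its consumer; it does not claim to reproduce the root cause discussed in the thread.

```python
class StaticBufferRunner:
    """Toy stand-in (hypothetical) for a graph runner that reuses one output buffer."""

    def __init__(self, size):
        # Allocated once and overwritten in place on every replay,
        # similar in spirit to a captured graph's static output buffer.
        self.output_buffer = [0] * size

    def replay(self, values, copy_output):
        # Write results in place, as a graph replay would.
        for i, v in enumerate(values):
            self.output_buffer[i] = v
        if copy_output:
            # Analogue of tensor.clone(): the caller gets private storage.
            return list(self.output_buffer)
        # The caller gets an alias of the shared buffer.
        return self.output_buffer


runner = StaticBufferRunner(3)

# Without a copy, the result is the shared buffer itself.
aliased = runner.replay([1, 2, 3], copy_output=False)
runner.replay([7, 8, 9], copy_output=False)
aliased_after = list(aliased)  # contents were overwritten by the second replay

# With a copy, the result survives later replays.
cloned = runner.replay([1, 2, 3], copy_output=True)
runner.replay([7, 8, 9], copy_output=False)
cloned_after = list(cloned)  # still the values from its own replay
```

If the aliased values were later consumed as indices, they could point at positions the consumer never expected, which is one plausible way a gather could read out of bounds. The copy costs an extra allocation per step in exchange for isolating the consumer from subsequent replays.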
Accuracy Tests
None
Benchmarking and Profiling
None