[PD] Fix the infinite loop in decode resolve_pending_reqs #20371
Signed-off-by: Shangming Cai <csmthu@gmail.com>
Actually, it is a bug introduced by me; I only considered one prefill instance. Your fix is right: we should group all the requests by the bootstrap address.
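For readers following along, here is a minimal sketch of what grouping pending requests by bootstrap address could look like. It is not the code from this PR: `PendingReq`, `group_by_bootstrap_addr`, `resolve_pending`, and the `is_alive` callback are hypothetical names, and how each group is handled (resolved or failed) is an assumption made only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class PendingReq:
    # Hypothetical stand-in for a pending decode request.
    rid: str
    bootstrap_addr: str


def group_by_bootstrap_addr(pending_reqs):
    """Bucket requests by the prefill bootstrap address they point to."""
    groups = defaultdict(list)
    for req in pending_reqs:
        groups[req.bootstrap_addr].append(req)
    return dict(groups)


def resolve_pending(pending_reqs, is_alive):
    """Make one pass over the pending requests, one group per address.

    `is_alive` is an assumed callback that reports whether the prefill
    instance behind an address is reachable. Every request ends up in
    exactly one of the two output lists, so a single dead prefill
    instance cannot keep the whole queue pending.
    """
    resolved, failed = [], []
    for addr, reqs in group_by_bootstrap_addr(pending_reqs).items():
        target = resolved if is_alive(addr) else failed
        target.extend(reqs)
    return resolved, failed
```

With more than one prefill instance, each address is checked on its own, so requests bound to a healthy instance are not held back by requests bound to a dead one.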
BTW, can we remove the
Motivation
Fix #20252
The bug can only be reproduced when using sglang-router. mini-lb rejects the request immediately if the prefill instance is dead, so the request never enters pending_reqs.
Fix plan: