Checklist
Describe the bug
When I conducted a stability stress test with 400 concurrent requests of varying lengths, the unbootstrap queue in the prefill stage kept piling up. Additionally, the token_usage on the decode side approached 1. Due to the timeout mechanism in our link (e.g., automatically disconnecting when TTFT exceeds 60s), the business layer actively terminates the connection with the sglang HTTP server. However, it appears that the decode side fails to release resources properly, leading to the accumulation of unbootstrap tasks in the prefill stage and eventually causing the decode side to hang.
Decode
Prefill
Reproduction
Look Above
Environment
H20
Prefill 2-node
Decode 4-node
Checklist
Describe the bug
When I conducted a stability stress test with 400 concurrent requests of varying lengths, the unbootstrap queue in the prefill stage kept piling up. Additionally, the token_usage on the decode side approached 1. Due to the timeout mechanism in our link (e.g., automatically disconnecting when TTFT exceeds 60s), the business layer actively terminates the connection with the sglang HTTP server. However, it appears that the decode side fails to release resources properly, leading to the accumulation of unbootstrap tasks in the prefill stage and eventually causing the decode side to hang.
Decode
Prefill
Reproduction
Look Above
Environment
H20
Prefill 2-node
Decode 4-node