Checklist
Describe the bug
When benchmarking an SGLang PD deployment with high concurrency, if some requests fail, the logs sometimes fill with Abort request messages. For example, on the prefill side:
Abort bootstrap queue request. req.rid='46e7f65cacc149f1b8324548e3640ac8'
Abort queued request. req.rid='e70c35086c82432aa946efab6507b018'
Abort bootstrap queue request. req.rid='988ce7ffe69645708e0f545c3aff27ba'
Abort bootstrap queue request. req.rid='a4126825397541328fcfaff001b2fff2'
On the decode side:
Abort transfer queue request. decode_req.req.rid='3d088e464d8a4a9889f43f92746e76c3'
Abort transfer queue request. decode_req.req.rid='ec3c4d7bccb14bdf852e986ad7462e94'
Abort transfer queue request. decode_req.req.rid='25013b592b254378b3633e378fa29a87'
However, these requests are not actually aborted.
This looks like a bug in Scheduler.abort_request():
    # Delete requests not in the waiting queue when PD disaggregation is enabled
    if self.disaggregation_mode == DisaggregationMode.PREFILL:
        # Abort requests that have not yet been bootstrapped
        for i, req in enumerate(self.disagg_prefill_bootstrap_queue.queue):
            logger.debug(f"Abort bootstrap queue request. {req.rid=}")
            if recv_req.abort_all or req.rid.startswith(recv_req.rid):
                if hasattr(req.disagg_kv_sender, "abort"):
                    req.disagg_kv_sender.abort()
        # Abort in-flight requests
        for i, req in enumerate(self.disagg_prefill_inflight_queue):
            logger.debug(f"Abort inflight queue request. {req.rid=}")
            if recv_req.abort_all or req.rid.startswith(recv_req.rid):
                if hasattr(req.disagg_kv_sender, "abort"):
                    req.disagg_kv_sender.abort()
    elif self.disaggregation_mode == DisaggregationMode.DECODE:
        # Abort requests that have not yet finished preallocation
        for i, decode_req in enumerate(self.disagg_decode_prealloc_queue.queue):
            logger.debug(f"Abort prealloc queue request. {decode_req.req.rid=}")
            if recv_req.abort_all or decode_req.req.rid.startswith(recv_req.rid):
                if hasattr(decode_req.kv_receiver, "abort"):
                    decode_req.kv_receiver.abort()
        # Abort requests waiting for kvcache to release tree cache
        for i, decode_req in enumerate(self.disagg_decode_transfer_queue.queue):
            logger.debug(f"Abort transfer queue request. {decode_req.req.rid=}")
            if recv_req.abort_all or decode_req.req.rid.startswith(recv_req.rid):
                if hasattr(decode_req.kv_receiver, "abort"):
                    decode_req.kv_receiver.abort()
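Each logger.debug call runs for every request in the queue, before the match check, so a single abort produces an Abort message for every queued request whether or not it matches. The logging calls should be moved inside the if statements.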
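A minimal sketch of the proposed change, using stand-in classes rather than the real Scheduler. FakeSender, FakeReq, and abort_matching are hypothetical names introduced only for illustration; the log message and the rid prefix check mirror the snippet above. The same move would apply to all four loops.

```python
import logging
from dataclasses import dataclass, field

logger = logging.getLogger("abort_sketch")


@dataclass
class FakeSender:
    """Hypothetical stand-in for req.disagg_kv_sender."""
    aborted: bool = False

    def abort(self) -> None:
        self.aborted = True


@dataclass
class FakeReq:
    """Hypothetical stand-in for a queued request."""
    rid: str
    disagg_kv_sender: FakeSender = field(default_factory=FakeSender)


def abort_matching(queue, target_rid, abort_all=False):
    """Abort matching requests only and return the rids that were aborted."""
    aborted = []
    for req in queue:
        if abort_all or req.rid.startswith(target_rid):
            # Log inside the match check, so unrelated requests are not reported.
            logger.debug(f"Abort bootstrap queue request. {req.rid=}")
            if hasattr(req.disagg_kv_sender, "abort"):
                req.disagg_kv_sender.abort()
            aborted.append(req.rid)
    return aborted


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    queue = [FakeReq("aaa1"), FakeReq("bbb2"), FakeReq("aaa3")]
    abort_matching(queue, "aaa")  # logs only the two requests whose rid starts with "aaa"
```

With the log inside the condition, the number of Abort messages equals the number of requests that matched the abort, instead of the queue length.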
Reproduction
Benchmark an SGLang PD deployment with high concurrency and a large number of requests.
Environment
0.5.2