[KV Connector] Skip stale KV xfer completion notifications in scheduler#43265
[KV Connector] Skip stale KV xfer completion notifications in scheduler#43265zhewenl wants to merge 1 commit into
Conversation
…duler KV transfer completion notifications from the worker-side connector are asynchronous. In P/D setups, when a request's lifecycle ends before the underlying KV write completes (observed under load with the Mooncake connector when request lifecycle < KV write latency), the scheduler has already removed the request from `self.requests` by the time the late `finished_recving` / `finished_sending` callback arrives. The `assert req_id in self.requests` in `_update_from_kv_xfer_finished` then aborts the engine. Skip such stale notifications instead of asserting, with a debug log so drops remain observable. Same underlying issue as vllm-project#37837. Co-authored-by: Claude <noreply@anthropic.com> Signed-off-by: Zhewen Li <zhewenli@inferact.ai>
There was a problem hiding this comment.
Code Review
This pull request updates the scheduler to handle asynchronous KV transfer completions by replacing assertions with checks that skip stale notifications for requests already cleaned up. A reviewer suggested an improvement in the finished_sending loop to verify that a request is in a finished state before calling _free_blocks, preventing potential assertion failures for active requests.
| if req_id not in self.requests: | ||
| logger.debug("Dropping stale finished_sending for request %s", req_id) | ||
| continue | ||
| self._free_blocks(self.requests[req_id]) |
There was a problem hiding this comment.
Similar to the finished_recving loop, calling _free_blocks directly here will trigger an assertion failure inside that method if the request is not in a finished state. If a stale finished_sending notification arrives for an active request, it is safer to log and skip it rather than crashing the engine.
if req_id not in self.requests:
logger.debug("Dropping stale finished_sending for request %s", req_id)
continue
req = self.requests[req_id]
if RequestStatus.is_finished(req.status):
self._free_blocks(req)
else:
logger.debug("Dropping stale finished_sending for request %s with status %s", req_id, req.status)|
I'm trying to understand how this actually occurs. In the async save/load case, the req_id should by design still be in scheduler |
let me clarify, I think this happened when requests are aborted |
That shouldn't cause this in theory, they should remain in scheduler self.requests if there's a transfer in progress. |
|
better fix in #43371 |
Purpose
In P/D (prefill/decode) disaggregated setups, KV transfer completion notifications from the worker-side connector are asynchronous. When a request's lifecycle ends before the underlying KV write completes — observed under load with the Mooncake connector when request lifecycle < KV write latency — the scheduler has already removed the request from
self.requestsby the time the latefinished_recving/finished_sendingcallback arrives. Theassert req_id in self.requestsin_update_from_kv_xfer_finishedthen aborts the engine process.Same underlying issue as #37837.
This change drops late notifications for unknown request ids instead of asserting, with a
logger.debugline so the drops remain observable.