refactor(mp): replace job_id with request_id in query_prefetch_status #2996
Conversation
Code Review
This pull request refactors the multiprocess lookup protocol to use client-provided request_ids instead of server-generated job_ids for tracking prefetch jobs. It removes the QUERY_PREFETCH_LOOKUP_HITS request type and updates the LOOKUP and QUERY_PREFETCH_STATUS protocols to use the request_id. A critical issue was identified in query_prefetch_status where returning None for an unknown request_id could lead to infinite polling; returning 0 is suggested instead to allow the client to terminate. Additionally, a private method should be moved to the end of the class to comply with the project's style guide.
The server-internal integer job_id was unnecessarily leaked to the vLLM adapter via _lookup_job_ids / _finished_lookup_jobs, adding a redundant layer of indirection. The external request_id (str) is the natural key and is already stored in every _PrefetchJob.

Changes:
- _prefetch_jobs is now keyed by request_id (str) instead of job_id (int)
- _next_prefetch_job_id removed; _register_prefetch_job() stores by request_id
- lookup() now returns None; job tracking is entirely server-side
- query_prefetch_status(request_id: str) replaces the int-based version
- query_prefetch_lookup_hits removed: it was only called internally by sync_lookup (feature branch) and had no external callers on dev; the QUERY_PREFETCH_LOOKUP_HITS RPC endpoint is removed accordingly
- Adapter simplified: _lookup_job_ids replaced by _pending_lookups: set[str] and _finished_lookup_results: dict[str, int] (cache keyed by request_id); polls QUERY_PREFETCH_STATUS by request_id; cached result handles repeated calls to check_lookup_result after the server has popped the job
- Protocol updated: LOOKUP response_class=None, QUERY_PREFETCH_STATUS payload_classes=[str], QUERY_PREFETCH_LOOKUP_HITS removed
- tests/v1/multiprocess/test_query_lookup_hits.py deleted
- All other affected tests updated

No changes to storage_manager.py or prefetch_controller.py; the internal prefetch_request_id (int) inside PrefetchHandle is unaffected.

Signed-off-by: crclq2018@gmail.com
Signed-off-by: rigginschen <rigginschen@tencent.com>
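The adapter-side bookkeeping described above can be sketched as follows. This is a minimal illustration, not the actual LMCache code: the attribute and method names (_pending_lookups, _finished_lookup_results, check_lookup_result) follow the PR description, while the transport client and its call() method are hypothetical stand-ins.

```python
from typing import Optional


class SchedulerAdapter:
    """Sketch of the simplified adapter-side lookup bookkeeping."""

    def __init__(self, client) -> None:
        self._client = client  # hypothetical RPC transport
        self._pending_lookups: set[str] = set()
        self._finished_lookup_results: dict[str, int] = {}

    def start_lookup(self, request_id: str, keys: list) -> None:
        # LOOKUP no longer returns a job_id; we only remember the request_id.
        self._client.call("LOOKUP", request_id, keys)
        self._pending_lookups.add(request_id)

    def check_lookup_result(self, request_id: str) -> Optional[int]:
        # Serve repeated calls from the cache after the server has
        # popped the job on its side.
        if request_id in self._finished_lookup_results:
            return self._finished_lookup_results[request_id]
        hits = self._client.call("QUERY_PREFETCH_STATUS", request_id)
        if hits is None:
            return None  # lookup still in progress
        self._pending_lookups.discard(request_id)
        self._finished_lookup_results[request_id] = hits
        return hits

    def finish_request(self, request_id: str) -> None:
        # Cleanup when the request leaves the scheduler.
        self._pending_lookups.discard(request_id)
        self._finished_lookup_results.pop(request_id, None)
```

The cache is what makes repeated check_lookup_result calls safe after the server has already popped the finished job: only the first successful poll hits the wire.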
ApostaC left a comment:
One potential problem with query_prefetch_status's return value. Please see the details below.
> The number of hits for the prefetched keys if the lookup phase is
> done. None if the lookup phase is still in progress, or the prefetch
> is already completed and consumed by query_prefetch_status, or the
> request_id is invalid.
One potential problem here:
Previously, since we had the job id, the caller could avoid calling this function with an unknown job id.
However, now that we use an external request id, the caller could call this function with an unknown request id. If we still return None here, the caller may spin on this request forever, interpreting the returned None as "lookup is in progress".
Therefore, I suggest we return 0 if the request ID is not found. WDYT?
Agreed. We should never return None in this case. Let's return 0 and emit a warning.
When query_prefetch_status or query_prefetch_lookup_hits is called with an unknown request_id, returning None would cause the caller to interpret it as "still in progress" and spin forever. Return 0 instead to signal no hits and allow the caller to terminate.

Signed-off-by: crclq2018@gmail.com
Signed-off-by: rigginschen <rigginschen@tencent.com>
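The failure mode being fixed is easiest to see from the caller's side. A hypothetical polling loop (not LMCache code; wait_for_lookup and its parameters are made up for illustration) treats None as "keep waiting", so a server that answered None for an unknown request_id would keep this loop spinning until its timeout, while 0 lets it exit immediately with no hits.

```python
import time
from typing import Callable, Optional


def wait_for_lookup(
    query_status: Callable[[str], Optional[int]],
    request_id: str,
    poll_interval: float = 0.01,
    timeout: float = 5.0,
) -> int:
    """Poll query_status until the lookup finishes, returning the hit count.

    None means "lookup still in progress", so the loop keeps polling;
    any integer (including 0 for an unknown request_id) terminates it.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        hits = query_status(request_id)
        if hits is not None:
            return hits
        time.sleep(poll_interval)
    raise TimeoutError(f"lookup for {request_id!r} did not finish")
```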
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit e0535d8.
…LMCache#2996)

* refactor(mp): replace job_id with request_id in query_prefetch_status

The server-internal integer job_id was unnecessarily leaked to the vLLM adapter via _lookup_job_ids / _finished_lookup_jobs, adding a redundant layer of indirection. The external request_id (str) is the natural key and is already stored in every _PrefetchJob.

Changes:
- _prefetch_jobs is now keyed by request_id (str) instead of job_id (int)
- _next_prefetch_job_id removed; _register_prefetch_job() stores by request_id
- lookup() now returns None; job tracking is entirely server-side
- query_prefetch_status(request_id: str) replaces the int-based version
- query_prefetch_lookup_hits removed: it was only called internally by sync_lookup (feature branch) and had no external callers on dev; the QUERY_PREFETCH_LOOKUP_HITS RPC endpoint is removed accordingly
- Adapter simplified: _lookup_job_ids replaced by _pending_lookups: set[str] and _finished_lookup_results: dict[str, int] (cache keyed by request_id); polls QUERY_PREFETCH_STATUS by request_id; cached result handles repeated calls to check_lookup_result after the server has popped the job
- Protocol updated: LOOKUP response_class=None, QUERY_PREFETCH_STATUS payload_classes=[str], QUERY_PREFETCH_LOOKUP_HITS removed
- tests/v1/multiprocess/test_query_lookup_hits.py deleted
- All other affected tests updated

No changes to storage_manager.py or prefetch_controller.py; the internal prefetch_request_id (int) inside PrefetchHandle is unaffected.

Signed-off-by: crclq2018@gmail.com
Signed-off-by: rigginschen <rigginschen@tencent.com>

* fix(mp): return 0 for unknown request_id to prevent infinite polling

When query_prefetch_status or query_prefetch_lookup_hits is called with an unknown request_id, returning None would cause the caller to interpret it as "still in progress" and spin forever. Return 0 instead to signal no hits and allow the caller to terminate.

Signed-off-by: crclq2018@gmail.com
Signed-off-by: rigginschen <rigginschen@tencent.com>

---------

Signed-off-by: crclq2018@gmail.com
Signed-off-by: rigginschen <rigginschen@tencent.com>
Co-authored-by: rigginschen <rigginschen@tencent.com>

The server-internal integer job_id was unnecessarily leaked to the vLLM adapter via _lookup_job_ids / _finished_lookup_jobs, adding a redundant layer of indirection. The external request_id (str) is the natural key and is already stored in every _PrefetchJob.
No changes to storage_manager.py or prefetch_controller.py; the internal prefetch_request_id (int) inside PrefetchHandle is unaffected.
Signed-off-by: crclq2018@gmail.com
Note (Medium Risk): Changes the multiprocess wire protocol for LOOKUP / QUERY_PREFETCH_STATUS (and related client/server state tracking) to use request_id, which could break older clients/servers and affects lookup/prefetch correctness under retries.

Overview

Refactors the multiprocess lookup/prefetch flow to stop leaking server-internal job_ids and instead track/poll prefetch jobs purely by external request_id. LOOKUP now returns None and the server stores _prefetch_jobs keyed by request_id; QUERY_PREFETCH_STATUS / QUERY_PREFETCH_LOOKUP_HITS payloads switch from int job IDs to str request IDs, with unknown requests returning 0 to avoid infinite polling. The vLLM scheduler adapter updates its lookup bookkeeping to cache results by request_id (supporting repeated checks after server pop) and adjusts cleanup accordingly, with tests updated to match the new protocol.

Reviewed by Cursor Bugbot for commit c3c13a3.
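The protocol-table change mentioned above (LOOKUP response_class=None, QUERY_PREFETCH_STATUS payload_classes=[str]) can be pictured with a hypothetical registry. The RpcSpec dataclass and PROTOCOL dict below are illustrative only; the real declaration in the codebase may differ in shape and naming.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RpcSpec:
    # Types expected in the request payload, and the response type
    # (None meaning the call carries no response body).
    payload_classes: tuple
    response_class: Optional[type]


# Illustrative registry reflecting the updated protocol.
PROTOCOL: dict[str, RpcSpec] = {
    # LOOKUP no longer returns a job_id; tracking is entirely server-side.
    "LOOKUP": RpcSpec(payload_classes=(str, list), response_class=None),
    # Status is queried by the client-provided request_id (str),
    # not an int job_id.
    "QUERY_PREFETCH_STATUS": RpcSpec(payload_classes=(str,), response_class=int),
}
# QUERY_PREFETCH_LOOKUP_HITS is absent: the endpoint was removed because
# it had no external callers.
```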