Refactor: sequential key polling for distributed key synchronization#2478
hlin99 wants to merge 14 commits into LMCache:dev from
Conversation
Modified the cache lookup logic to handle synchronization delays in distributed environments. By introducing a polling mechanism, slow devices or lagging storage nodes are given a grace period to synchronize missing keys.

Key changes:
- Introduced `lookup_poll_key_timeout_ms` and `lookup_poll_key_intervals_ms` to configure retry behavior (both default to 0, maintaining the legacy one-shot behavior).
- Implemented a nested polling loop that ensures all layers of a specific chunk are synchronized (`hit_chunks == self.num_layers`) before advancing.
- Integrated `time.monotonic()` for timeout tracking that is robust against distributed clock shifts.

This ensures data consistency and prefix-matching integrity in scenarios where storage layers exhibit eventual consistency.

Signed-off-by: Tony Lin <tony.lin@intel.com>
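The polling loop described in the PR could be sketched roughly as follows. This is an illustrative reconstruction, not the actual LMCache code: `backend.contains`, `InMemoryBackend`, and the function name are assumptions; only `lookup_poll_key_timeout_ms`, `lookup_poll_key_intervals_ms`, `num_layers`, and the use of `time.monotonic()` come from the PR description.

```python
import time


def lookup_with_polling(backend, chunk_keys, num_layers,
                        timeout_ms=0, interval_ms=10):
    """Count leading chunks whose layers are all present.

    timeout_ms == 0 reproduces the legacy one-shot behavior:
    each chunk is checked once, with no grace period.
    """
    matched = 0
    for key in chunk_keys:
        deadline = time.monotonic() + timeout_ms / 1000
        while True:
            # A chunk counts as a hit only when every layer has synced.
            hit_layers = sum(
                1 for layer in range(num_layers)
                if backend.contains((key, layer))
            )
            if hit_layers == num_layers:
                matched += 1
                break
            if time.monotonic() >= deadline:
                # Prefix matching: stop at the first incomplete chunk.
                return matched
            time.sleep(interval_ms / 1000)
    return matched


class InMemoryBackend:
    """Toy stand-in for a remote store, for demonstration only."""

    def __init__(self, present=()):
        self.present = set(present)

    def contains(self, key):
        return key in self.present
```

With `timeout_ms=0` the deadline is already in the past after the first check, so each chunk is probed exactly once; a nonzero timeout gives a lagging store time to surface the remaining layers before the prefix match is cut short.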
Summary of Changes

Hello @hlin99, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request refactors the cache lookup mechanism to enhance robustness in distributed environments where storage layers might exhibit eventual consistency. By introducing a configurable polling strategy, the system can now gracefully handle temporary synchronization delays, ensuring that all necessary cache layers are available before a lookup completes, thereby improving data consistency and prefix-matching integrity.
Code Review
The pull request introduces a sequential key polling mechanism to improve data consistency in distributed environments, particularly for slow or lagging storage nodes. It adds lookup_poll_key_intervals_ms and lookup_poll_key_timeout_ms configuration parameters, defaulting to 0 to maintain existing one-shot lookup behavior. The implementation integrates time.monotonic() for robust timeout tracking. The changes are well-aligned with the stated goals of enhancing data consistency and prefix-matching integrity.
hi @maobaolong @sammshen, would you like to review and comment on whether this change can help increase the robustness and scalability of remote backends? I found such issues on the mooncake backend, where the receiver may have some latency before it can successfully poll the key. That's why adding a polling timeout may help. Many thanks!
maobaolong
left a comment
@hlin99 Would you like to add an introduction for the new configuration options to docs/source/api_reference/configurations.rst as well?
Sure. Will update. Thanks for the comments!
Signed-off-by: Tony Lin <tony.lin@intel.com>
The doc has been updated. Please review and let me know if anything else is needed. Thanks, @maobaolong.
Do we want this, or do we want lookup to fail fast?
hi @sammshen, in some cases a remote backend is combined with a proxy server as a PD (prefill/decode) system. The proxy server is not aware of when prefill has finished putting KV, so it cannot notify decode; it relies on decode polling KV from the remote. The latency is not high, but the current implementation without a poll timeout can fail the get-KV call. I encountered such an issue with mooncake + the vLLM proxy server, and this PR offers a mechanism in LMCache to overcome the issue if others encounter the same.
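The race described above can be reproduced in miniature: decode issues its lookup before prefill's put has landed in the remote store, so a one-shot check misses a key that a short poll would have found. Everything here is illustrative (a dict standing in for the remote store); none of these names are LMCache APIs.

```python
import threading
import time

store = {}
lock = threading.Lock()


def delayed_put(key, value, delay_s):
    """Simulate prefill's put arriving after some network latency."""
    time.sleep(delay_s)
    with lock:
        store[key] = value


def one_shot_lookup(key):
    """Legacy behavior: check once and give up."""
    with lock:
        return key in store


def polling_lookup(key, timeout_ms, interval_ms):
    """Proposed behavior: retry until the key appears or we time out."""
    deadline = time.monotonic() + timeout_ms / 1000
    while True:
        with lock:
            if key in store:
                return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_ms / 1000)


# Prefill's put lands ~50 ms from now; decode looks up immediately.
putter = threading.Thread(target=delayed_put, args=("kv", b"...", 0.05))
putter.start()
missed = one_shot_lookup("kv")   # put is still in flight
found = polling_lookup("kv", timeout_ms=500, interval_ms=10)
putter.join()
```

Here the one-shot lookup fails while the polling lookup succeeds well within its timeout, which is the behavior gap the PR's configuration is meant to close.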
Quoted diff context:

```python
else:
    break
# ...
time.sleep(self.lookup_poll_key_intervals_ms / 1000)
```
lookup is synchronous so this could delay the scheduler
Can you show a concrete example? Sorry, I don't fully understand.
hi @sammshen, whether the issue exists depends on how the PD proxy server is implemented; a diagram is shown below. If the proxy server is notified at the same time the KV store begins, the get-KV from decode may happen earlier than the KV is ready in the backend. Hopefully this clarifies the issue.
The issue is not caused by LMCache, and since the proxy server in the LMCache example is notified after the KV store completes, the issue won't appear there. But proxy servers can be implemented in different ways, so this PR offers a configuration that can fit any implementation by tuning the configs.
sammshen
left a comment
I'm hesitant to approve this PR due to the while loop on the critical lookup path. I see the motivation, but I don't see any issues directly motivating the need for this change. @maobaolong WDYT?
