
Refactor: sequential key polling for distributed key synchronization#2478

Closed
hlin99 wants to merge 14 commits into LMCache:dev from hlin99:key_polling

Conversation

@hlin99
Contributor

@hlin99 hlin99 commented Jan 23, 2026

Modified the cache lookup logic to handle synchronization delays in distributed environments: a new polling mechanism gives slow devices or lagging storage nodes a grace period to synchronize missing keys.

Key changes:

  • Introduced lookup_poll_key_timeout_ms and lookup_poll_key_intervals_ms to configure retry behavior (defaults to 0, maintaining legacy one-shot behavior).
  • Implemented a nested polling loop that ensures all layers of a specific chunk are synchronized (hit_chunks == self.num_layers) before advancing.
  • Adopted time.monotonic() for timeout tracking that is immune to wall-clock adjustments, for robust behavior in distributed deployments.

This ensures data consistency and prefix-matching integrity in scenarios where storage layers exhibit eventual consistency.
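The polling behavior described above can be sketched as follows. This is a minimal, self-contained sketch with illustrative names (`poll_key`, `check_key`); the actual implementation lives in `lmcache/v1/cache_engine.py` and additionally tracks per-layer hits per chunk (`hit_chunks == self.num_layers`).

```python
import time

def poll_key(check_key, timeout_ms=0, interval_ms=10):
    """Poll until check_key() returns True or timeout_ms elapses.

    With the default timeout_ms=0 this degrades to the legacy
    one-shot check. check_key stands in for the real per-chunk
    layer-completeness test; all names here are illustrative.
    """
    # time.monotonic() is unaffected by wall-clock adjustments,
    # which keeps the timeout reliable on any single node.
    deadline = time.monotonic() + timeout_ms / 1000
    while True:
        if check_key():
            return True  # key (all layers) is present
        if time.monotonic() >= deadline:
            return False  # grace period exhausted; treat as a miss
        time.sleep(interval_ms / 1000)
```

With `timeout_ms=0` the loop runs exactly one check and returns, so existing deployments see no behavior change unless they opt in.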


Signed-off-by: Tony Lin <tony.lin@intel.com>
@gemini-code-assist
Contributor

Summary of Changes

Hello @hlin99, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refactors the cache lookup mechanism to enhance robustness in distributed environments where storage layers might exhibit eventual consistency. By introducing a configurable polling strategy, the system can now gracefully handle temporary synchronization delays, ensuring that all necessary cache layers are available before a lookup completes, thereby improving data consistency and prefix-matching integrity.

Highlights

  • Polling Mechanism: Implemented a polling mechanism in the cache lookup logic to account for synchronization delays in distributed environments, ensuring data consistency.
  • Configurable Parameters: Introduced configurable lookup_poll_key_timeout_ms and lookup_poll_key_intervals_ms parameters to control the polling behavior, with defaults maintaining legacy one-shot behavior.
  • Layer Synchronization: Ensured complete synchronization of all cache layers for a given chunk by introducing a nested polling loop that waits until all layers are present.
  • Robust Timeout Tracking: Utilized time.monotonic() for accurate, drift-resistant timeout tracking, crucial for reliable operation in distributed systems.


Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

The pull request introduces a sequential key polling mechanism to improve data consistency in distributed environments, particularly for slow or lagging storage nodes. It adds lookup_poll_key_intervals_ms and lookup_poll_key_timeout_ms configuration parameters, defaulting to 0 to maintain existing one-shot lookup behavior. The implementation integrates time.monotonic() for robust timeout tracking. The changes are well-aligned with the stated goals of enhancing data consistency and prefix-matching integrity.

Comment thread lmcache/v1/cache_engine.py
Comment thread lmcache/v1/cache_engine.py
@hlin99
Contributor Author

hlin99 commented Feb 2, 2026

Hi @maobaolong @sammshen, would you like to review and comment on whether this change can help increase the robustness and scalability of remote backends? I found such issues on Mooncake backends, where the receiver may have some latency before it can successfully poll the key; that's why adding a polling timeout may help. Many thanks!

Collaborator

@maobaolong maobaolong left a comment


@hlin99 Would you like to also add an introduction to the new configuration options in docs/source/api_reference/configurations.rst?

@hlin99
Contributor Author

hlin99 commented Feb 17, 2026

@hlin99 Would you like to also add an introduction to the new configuration options in docs/source/api_reference/configurations.rst?

Sure. Will update. Thanks for the comments!

Signed-off-by: Tony Lin <tony.lin@intel.com>
@hlin99
Contributor Author

hlin99 commented Feb 23, 2026

@hlin99 Would you like to also add an introduction to the new configuration options in docs/source/api_reference/configurations.rst?

Sure. Will update. Thanks for the comments!

The doc has been updated. Please help review whether anything else is needed. Thanks! @maobaolong

@sammshen
Contributor

do we want this or do we want lookup to fail fast?

@hlin99
Contributor Author

hlin99 commented Feb 24, 2026

do we want this or do we want lookup to fail fast?

Hi @sammshen, in some PD systems built from remote backends plus a proxy server, the proxy server is not aware of when the prefill's KV put has completed before it notifies decode; it relies on decode to poll the KV from the remote backend. The latency is not high, but the current implementation without a poll timeout can fail the get-KV call. I encountered such an issue with Mooncake + the vLLM proxy server, and this PR offers a mechanism in LMCache to overcome the issue if others encounter the same.
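In such a setup, a decode instance could opt into a short grace period via the two new knobs. A hypothetical config fragment (values are illustrative, not recommendations):

```yaml
# LMCache config fragment (illustrative values)
lookup_poll_key_timeout_ms: 200    # give a lagging remote node up to 200 ms
lookup_poll_key_intervals_ms: 10   # re-check the key every 10 ms
```

Both default to 0, which preserves the legacy one-shot lookup.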

        else:
            break

    time.sleep(self.lookup_poll_key_intervals_ms / 1000)
Contributor


lookup is synchronous so this could delay the scheduler
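The concern can be made concrete with a small sketch: on a full miss, a synchronous polled lookup blocks its caller for roughly the configured timeout, and that caller here is the scheduler thread. Names below are illustrative, not the actual LMCache API.

```python
import time

def lookup_with_poll(check, timeout_ms, interval_ms):
    # Stand-in for the polled lookup; it blocks the caller while waiting.
    deadline = time.monotonic() + timeout_ms / 1000
    while not check():
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_ms / 1000)
    return True

start = time.monotonic()
# Worst case: the key never appears, so we wait out the full timeout.
lookup_with_poll(lambda: False, timeout_ms=50, interval_ms=5)
blocked_ms = (time.monotonic() - start) * 1000
# blocked_ms is on the order of timeout_ms; a synchronous scheduler
# would be stalled for that long on every full miss.
```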

@sammshen
Contributor

Can you show a concrete example? Sorry, I don't fully understand.

@hlin99
Contributor Author

hlin99 commented Feb 25, 2026

Can you show a concrete example? Sorry, I don't fully understand.

Hi @sammshen, whether the issue exists depends on how the PD proxy server is implemented; a diagram is shown below. If the proxy server gets notified at the same time as the store-KV, the get-KV from decode may happen before the KV is ready in the backend. Hopefully this clarifies the issue.


[Screenshot 2026-02-25 20:18:21 — timing diagram of the proxy-server notification vs. store-KV race]

@hlin99
Contributor Author

hlin99 commented Feb 25, 2026

Can you show a concrete example? Sorry, I don't fully understand.

Hi @sammshen, whether the issue exists depends on how the PD proxy server is implemented; a diagram is shown below. If the proxy server gets notified at the same time as the store-KV, the get-KV from decode may happen before the KV is ready in the backend. Hopefully this clarifies the issue.

[Screenshot 2026-02-25 20:18:21 — timing diagram of the proxy-server notification vs. store-KV race]

The issue is not caused by LMCache, and the proxy server in the LMCache example appears to be notified after the store-KV completes, so the issue won't occur there. But proxy servers can be implemented in different ways, so this PR offers a configuration that can fit any implementation by tuning these settings.

@sammshen sammshen added the full Run comprehensive tests on this PR label Mar 4, 2026
Contributor

@sammshen sammshen left a comment


I'm hesitant to approve this PR due to the while loop on the critical-path lookup. I see the motivation, but I don't see any issues directly motivating the need for this change. @maobaolong WDYT?

Labels

full Run comprehensive tests on this PR
