[MP] fix: add thread safety to Session for concurrent TP worker access#2807
[MP] fix: add thread safety to Session for concurrent TP worker access#2807chunxiaozheng merged 1 commit intoLMCache:devfrom
Conversation
Multiple TP worker threads concurrently call set_tokens() and get_hashes() on the same Session object (same request_id), causing race conditions in _compute_hash() that corrupt chunk_hashes (duplicates, wrong rolling state). This leads to ObjectKey mismatch between sync_lookup/store and retrieve, resulting in KEY_NOT_EXIST errors during unsafe_read. Add threading.Lock to Session protecting set_tokens and get_hashes.
|
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits. |
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the robustness of the Session object by resolving critical race conditions that previously led to data corruption and operational failures during concurrent access from multiple worker threads. The implementation of thread-safe mechanisms ensures the integrity of token and hash management, thereby improving the overall reliability and stability of the system when processing requests in parallel. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request correctly identifies and addresses a race condition in the Session class by introducing a threading.Lock to protect concurrent access to set_tokens and get_hashes. This is a crucial fix for ensuring thread safety when multiple worker threads access the same session. The added multithreading tests in test_session.py are a valuable addition for verifying this fix. However, my review identifies a related logical issue in set_tokens where the hashing state is not correctly reset when the token sequence is modified in a non-append-only manner. I have provided a suggestion to make the state management more robust, which will prevent potential data corruption.
|
ping @Oasis-Git |
LMCache#2807) fix: add thread safety to Session for concurrent TP worker access Multiple TP worker threads concurrently call set_tokens() and get_hashes() on the same Session object (same request_id), causing race conditions in _compute_hash() that corrupt chunk_hashes (duplicates, wrong rolling state). This leads to ObjectKey mismatch between sync_lookup/store and retrieve, resulting in KEY_NOT_EXIST errors during unsafe_read. Add threading.Lock to Session protecting set_tokens and get_hashes.
LMCache#2807) fix: add thread safety to Session for concurrent TP worker access Multiple TP worker threads concurrently call set_tokens() and get_hashes() on the same Session object (same request_id), causing race conditions in _compute_hash() that corrupt chunk_hashes (duplicates, wrong rolling state). This leads to ObjectKey mismatch between sync_lookup/store and retrieve, resulting in KEY_NOT_EXIST errors during unsafe_read. Add threading.Lock to Session protecting set_tokens and get_hashes. Signed-off-by: Aaron Wu <aaron.wu@dell.com>
LMCache#2807) fix: add thread safety to Session for concurrent TP worker access Multiple TP worker threads concurrently call set_tokens() and get_hashes() on the same Session object (same request_id), causing race conditions in _compute_hash() that corrupt chunk_hashes (duplicates, wrong rolling state). This leads to ObjectKey mismatch between sync_lookup/store and retrieve, resulting in KEY_NOT_EXIST errors during unsafe_read. Add threading.Lock to Session protecting set_tokens and get_hashes.
LMCache#2807) fix: add thread safety to Session for concurrent TP worker access Multiple TP worker threads concurrently call set_tokens() and get_hashes() on the same Session object (same request_id), causing race conditions in _compute_hash() that corrupt chunk_hashes (duplicates, wrong rolling state). This leads to ObjectKey mismatch between sync_lookup/store and retrieve, resulting in KEY_NOT_EXIST errors during unsafe_read. Add threading.Lock to Session protecting set_tokens and get_hashes.
LMCache#2807) fix: add thread safety to Session for concurrent TP worker access Multiple TP worker threads concurrently call set_tokens() and get_hashes() on the same Session object (same request_id), causing race conditions in _compute_hash() that corrupt chunk_hashes (duplicates, wrong rolling state). This leads to ObjectKey mismatch between sync_lookup/store and retrieve, resulting in KEY_NOT_EXIST errors during unsafe_read. Add threading.Lock to Session protecting set_tokens and get_hashes.
LMCache#2807) fix: add thread safety to Session for concurrent TP worker access Multiple TP worker threads concurrently call set_tokens() and get_hashes() on the same Session object (same request_id), causing race conditions in _compute_hash() that corrupt chunk_hashes (duplicates, wrong rolling state). This leads to ObjectKey mismatch between sync_lookup/store and retrieve, resulting in KEY_NOT_EXIST errors during unsafe_read. Add threading.Lock to Session protecting set_tokens and get_hashes.
LMCache#2807) fix: add thread safety to Session for concurrent TP worker access Multiple TP worker threads concurrently call set_tokens() and get_hashes() on the same Session object (same request_id), causing race conditions in _compute_hash() that corrupt chunk_hashes (duplicates, wrong rolling state). This leads to ObjectKey mismatch between sync_lookup/store and retrieve, resulting in KEY_NOT_EXIST errors during unsafe_read. Add threading.Lock to Session protecting set_tokens and get_hashes.
Multiple TP worker threads concurrently call set_tokens() and get_hashes() on the same Session object (same request_id), causing race conditions in _compute_hash() that corrupt chunk_hashes (duplicates, wrong rolling state). This leads to ObjectKey mismatch between sync_lookup/store and retrieve, resulting in KEY_NOT_EXIST errors during unsafe_read.
Add threading.Lock to Session protecting set_tokens and get_hashes.
What this PR does / why we need it:
Special notes for your reviewers:
If applicable: