feat: improve ValkeyConnector with cluster mode, TLS, and GLIDE optimizations #2790
Conversation
Code Review
This is an excellent pull request that significantly enhances the ValkeyConnector by adding cluster mode, TLS support, and a more performant, thread-pool-based architecture using the synchronous GLIDE client. The move to single-key storage is a great optimization. The code is well-structured, and the inclusion of detailed documentation, benchmarks, and comprehensive unit tests is highly appreciated. I have a couple of minor suggestions to improve the documentation's clarity and the code's robustness.
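To make the new options concrete, here is a hypothetical configuration sketch. The key names `tls_enable` and `valkey_num_workers` come from this PR; the URL and surrounding structure are illustrative assumptions, not the connector's documented schema.

```yaml
# Hypothetical config sketch -- tls_enable and valkey_num_workers are
# named in this PR; the endpoint and layout are illustrative only.
remote_url: "valkey://my-cache.serverless.use1.cache.amazonaws.com:6379"
extra_config:
  tls_enable: true        # needed for ElastiCache Serverless endpoints
  valkey_num_workers: 8   # worker threads, each with its own GLIDE client
```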
Force-pushed from 253f563 to f0b53a8.
sammshen left a comment:

LGTM! Thanks for the great work.
Force-pushed from a383866 to f0cb3e8.
/gemini review
Code Review
This pull request introduces a significant architectural overhaul of the ValkeyConnector, replacing the previous async implementation with a high-performance synchronous GLIDE client backed by a thread pool. The changes add crucial features like cluster mode and TLS support, along with performance optimizations like single-key storage and priority scheduling, which are well-supported by the provided benchmarks. The code is well-structured, thoroughly tested, and documented. My feedback includes a minor suggestion to improve type hinting for better code clarity and maintainability.
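The per-thread client pattern described above can be sketched as follows. This is a minimal illustration of the general technique (each worker thread lazily creates and reuses its own client via thread-local storage), not LMCache's actual connector code; `PerThreadClientPool` and its methods are hypothetical names.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class PerThreadClientPool:
    """Sketch: each worker thread owns one client, created lazily.

    client_factory is any zero-arg callable returning a client object;
    in the real connector this would construct a synchronous GLIDE client.
    """

    def __init__(self, client_factory, num_workers=8):
        self._local = threading.local()
        self._factory = client_factory
        self._executor = ThreadPoolExecutor(max_workers=num_workers)

    def _client(self):
        # First call on a given thread creates the client; later calls
        # on the same thread reuse it, so no cross-thread sharing occurs.
        if not hasattr(self._local, "client"):
            self._local.client = self._factory()
        return self._local.client

    def submit(self, op, *args):
        # Run op(client, *args) on the pool, binding the calling
        # thread's own client inside the worker.
        return self._executor.submit(lambda: op(self._client(), *args))

    def shutdown(self):
        self._executor.shutdown(wait=True)
```

With `num_workers` clients at most, this avoids both a global client lock and per-request connection setup, which is the core of the throughput win claimed here.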
feat: improve ValkeyConnector with cluster mode, TLS, and optimized data transfer

- Add TLS support for ElastiCache Serverless (tls_enable config)
- Leverage GLIDE SET with memoryview/bytearray support (valkey-glide#5492) and buffer GET (valkey-glide#5493) to reduce copies on large chunks
- Per-thread GLIDE client pool with configurable worker count (valkey_num_workers, default 8)
- Single-key storage (1 GET per chunk vs RedisClusterConnector's 2)
- Priority scheduling via AsyncPQExecutor (PEEK > PREFETCH > GET > PUT)
- Update valkey.rst docs with config reference, TLS, and tuning sections
- Add benchmark_l2.py for reliable L2 cache eviction testing
- Add benchmarking report with full methodology and results

Benchmarked on 70B TP=8 (p4de.24xlarge, ElastiCache Valkey cluster):
- 70B 64k: 3,216ms (4.8x) vs RedisClusterConnector 5,794ms (2.7x)
- 70B 8k: 505ms (4.4x) vs 796ms (3.0x)
- Aggregate throughput: ~7.5 GB/s vs ~4.0 GB/s
- TLS overhead: 7-8% at 64k context

Signed-off-by: Omer Rubinstein <omerrubi@amazon.com>
Head branch was pushed to by a user without write access
eaa1b49 to
a03f977
Compare
feat: improve ValkeyConnector with cluster mode, TLS, and GLIDE optimizations (LMCache#2790)

* feat: improve ValkeyConnector with cluster mode, TLS, and optimized data transfer

  - Add TLS support for ElastiCache Serverless (tls_enable config)
  - Leverage GLIDE SET with memoryview/bytearray support (valkey-glide#5492) and buffer GET (valkey-glide#5493) to reduce copies on large chunks
  - Per-thread GLIDE client pool with configurable worker count (valkey_num_workers, default 8)
  - Single-key storage (1 GET per chunk vs RedisClusterConnector's 2)
  - Priority scheduling via AsyncPQExecutor (PEEK > PREFETCH > GET > PUT)
  - Update valkey.rst docs with config reference, TLS, and tuning sections
  - Add benchmark_l2.py for reliable L2 cache eviction testing
  - Add benchmarking report with full methodology and results

  Benchmarked on 70B TP=8 (p4de.24xlarge, ElastiCache Valkey cluster):
  - 70B 64k: 3,216ms (4.8x) vs RedisClusterConnector 5,794ms (2.7x)
  - 70B 8k: 505ms (4.4x) vs 796ms (3.0x)
  - Aggregate throughput: ~7.5 GB/s vs ~4.0 GB/s
  - TLS overhead: 7-8% at 64k context

* fix: apply isort and ruff-format fixes for CI

Signed-off-by: Omer Rubinstein <omerrubi@amazon.com>
What this PR does / why we need it:
This PR improves the ValkeyConnector with cluster mode, TLS support, and optimized large-value data transfer using valkey-glide.

Key changes:

- Cluster mode support, matching RedisClusterConnector.
- `valkey_num_workers` controls the number of worker threads (default 8), each with its own GLIDE client for parallel I/O.
- Single-key storage: one GET per chunk vs RedisClusterConnector's two GETs (metadata + kv_bytes), halving round-trips.
- Priority scheduling via `AsyncPQExecutor` (PEEK > PREFETCH > GET > PUT) so lookups aren't delayed behind bulk writes.

Benchmarked on 70B TP=8 (p4de.24xlarge, ElastiCache Valkey cluster), ValkeyConnector delivers 1.6–1.8× faster L2 retrieval than RedisClusterConnector. For full benchmarking methodology and results, refer to VALKEY_CONNECTOR_BENCHMARKING.md.

Special notes for your reviewers:
- Requires a valkey-glide release (2.3+) containing #5492 (SET with `memoryview`/`bytearray` support) and #5493 (buffer GET). Falls back to standard GET + copy if buffer GET is unavailable.
- `pq_executor.py` change: `_shutdown_async` → `shutdown_async` (private → public) so the connector can call it directly during teardown.
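The PEEK > PREFETCH > GET > PUT ordering can be illustrated with a minimal priority-queue executor built on `heapq`. This is a sketch of the scheduling idea only; the real `AsyncPQExecutor` is asynchronous and more involved, and the class and level names below are illustrative.

```python
import heapq
import itertools
import threading

# Priority levels mirroring the ordering described in the PR
# (lower number = served first).
PEEK, PREFETCH, GET, PUT = 0, 1, 2, 3


class PriorityExecutor:
    """Sketch: tasks are popped by priority, FIFO within a priority."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-break keeps FIFO order
        self._lock = threading.Lock()

    def submit(self, priority, fn, *args):
        with self._lock:
            heapq.heappush(self._heap, (priority, next(self._counter), fn, args))

    def drain(self):
        # Pop tasks in priority order: all PEEKs run before PREFETCHes,
        # which run before GETs, which run before PUTs.
        results = []
        with self._lock:
            while self._heap:
                _, _, fn, args = heapq.heappop(self._heap)
                results.append(fn(*args))
        return results
```

Under this scheme a cheap existence check (PEEK) submitted after a burst of bulk writes (PUT) is still served first, which is why lookups are not delayed behind writes.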