
feat: improve ValkeyConnector with cluster mode, TLS, and GLIDE optimizations#2790

Merged
sammshen merged 2 commits into LMCache:dev from omerrubi-amzn:feat/valkey-connector-improvements
Mar 24, 2026

Conversation

@omerrubi-amzn
Contributor

What this PR does / why we need it:

This PR improves the ValkeyConnector with cluster mode support, TLS, and optimized large-value data transfer using valkey-glide.

Key changes:

Benchmarked on 70B TP=8 (p4de.24xlarge, ElastiCache Valkey cluster), ValkeyConnector delivers 1.6–1.8× faster L2 retrieval than RedisClusterConnector:

| | ValkeyConnector | RedisClusterConnector |
|---|---|---|
| 70B 64k L2 TTFT | 3,216 ms (4.8×) | 5,794 ms (2.7×) |
| 70B 8k L2 TTFT | 505 ms (4.4×) | 796 ms (3.0×) |
| 8B 64k L2 TTFT | 2,527 ms (4.5×) | 15,600 ms (0.8×) |
| Aggregate throughput (70B 64k) | ~7.5 GB/s | ~4.0 GB/s |
| TLS / Serverless ElastiCache | ✅ Supported | ❌ Not supported |

For full benchmarking methodology and results, refer to VALKEY_CONNECTOR_BENCHMARKING.md.

Special notes for your reviewers:

  • Requires valkey-glide release 2.3+ containing #5492 (SET with memoryview/bytearray support) and #5493 (buffer GET). Falls back to standard GET + copy if buffer GET is unavailable.
  • All benchmarks used the same Valkey cluster backends for both connectors — the performance difference is purely connector-side.
  • pq_executor.py change: _shutdown_async → shutdown_async (private → public) so the connector can call it directly during teardown.
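The buffer-GET fallback described above can be sketched as follows. Note this is an illustration only: `get_into` is a hypothetical stand-in for valkey-glide's buffer GET (#5493), and `_StubClient` is an in-memory mock, not the real GLIDE client.

```python
def read_chunk(client, key: bytes, dest: memoryview) -> int:
    """Read a value into a pre-allocated buffer, copying only when required."""
    # `get_into` is a hypothetical name for the buffer-GET API (valkey-glide #5493);
    # the real method name/signature may differ.
    get_into = getattr(client, "get_into", None)
    if get_into is not None:
        # Zero-copy path: the client writes directly into `dest`.
        return get_into(key, dest)
    # Fallback for older GLIDE releases: standard GET returns bytes, then one copy.
    data = client.get(key)
    if data is None:
        return 0
    n = len(data)
    dest[:n] = data
    return n


class _StubClient:
    """In-memory stand-in for a GLIDE sync client without buffer GET."""
    def __init__(self):
        self._store = {b"k": b"hello"}

    def get(self, key):
        return self._store.get(key)


buf = bytearray(16)
n = read_chunk(_StubClient(), b"k", memoryview(buf))
print(bytes(buf[:n]))  # b'hello'
```

Probing with `getattr` keeps a single read path regardless of the installed GLIDE version; only the non-buffer branch pays the extra copy.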

If applicable:

  • this PR contains user-facing changes (docs added)
  • this PR contains unit tests
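The new connector options are passed through `extra_config`. A minimal sketch of how that parsing might look, using the key names documented in this PR (`valkey_num_workers`, `valkey_mode`, `tls_enable`, `valkey_username`, `valkey_password`, `valkey_database`); defaults and coercions here are illustrative assumptions, not the adapter's actual code:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ValkeySettings:
    num_workers: int = 8               # default per the commit message
    valkey_mode: str = "standalone"
    tls_enable: bool = False
    username: Optional[str] = None
    password: Optional[str] = None
    database_id: int = 0


def parse_extra_config(extra: dict) -> ValkeySettings:
    # Real parsing may coerce string values ("true"/"false") differently.
    settings = ValkeySettings(
        num_workers=int(extra.get("valkey_num_workers", 8)),
        valkey_mode=str(extra.get("valkey_mode", "standalone")),
        tls_enable=bool(extra.get("tls_enable", False)),
        username=extra.get("valkey_username"),
        password=extra.get("valkey_password"),
        database_id=int(extra.get("valkey_database", 0)),
    )
    if settings.valkey_mode == "cluster" and "valkey_database" in extra:
        # Cluster mode has no SELECT-able databases; the adapter logs a warning.
        print("warning: valkey_database is ignored in cluster mode")
    return settings


cfg = parse_extra_config({"valkey_mode": "cluster", "tls_enable": True, "valkey_database": 2})
print(cfg.tls_enable, cfg.valkey_mode)  # True cluster
```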

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the ValkeyConnector by integrating advanced features like TLS, cluster mode, and optimized data transfer mechanisms. The changes aim to boost performance, particularly for large-value KV cache operations, and expand compatibility with modern Valkey deployments like ElastiCache Serverless. The refactoring consolidates previous connector logic and introduces a robust worker pool architecture, backed by comprehensive benchmarking and new unit tests.

Highlights

  • Performance Improvement: The ValkeyConnector now delivers 1.6–1.8x faster L2 retrieval compared to RedisClusterConnector, achieving up to 4.8x speedup over cold compute at 64k context. This is primarily due to single-key storage, parallel worker threads, and optimized large-value handling.
  • TLS Support: Added full TLS support, enabling connections to TLS-enabled clusters, including ElastiCache Serverless, which was previously unsupported by RedisClusterConnector. TLS overhead is minimal (7-8% at 64k context).
  • Optimized Large-Value Handling: Leverages valkey-glide PRs #5492 (zero-copy SET via bytearray/memoryview) and #5493 (buffer GET into pre-allocated memory) to significantly reduce memory copies for large KV cache chunks.
  • Configurable Per-Thread Client Pool: Introduced valkey_num_workers to control the number of worker threads, each with its own GLIDE client for parallel I/O, improving aggregate throughput.
  • Single-Key Storage: Switched to a single-key storage model, reducing round-trips to Valkey by half (1 GET per chunk vs. RedisClusterConnector's 2 GETs for metadata + kv_bytes).
  • Priority Scheduling: Operations are now dispatched via AsyncPQExecutor with priority scheduling (PEEK > PREFETCH > GET > PUT) to ensure latency-sensitive lookups are not delayed by bulk writes.
  • Changelog Visibility: The _shutdown_async method in pq_executor.py was made public as shutdown_async to allow direct calls during connector teardown.
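The priority ordering in the highlights above (PEEK > PREFETCH > GET > PUT) can be illustrated with a small heap-based dispatcher. This is not AsyncPQExecutor's actual interface, just a sketch of the ordering it enforces; lower numbers are served first, and a sequence counter keeps FIFO order within a priority level.

```python
import heapq
from enum import IntEnum


class Op(IntEnum):
    PEEK = 0       # latency-sensitive lookups served first
    PREFETCH = 1
    GET = 2
    PUT = 3        # bulk writes served last


class PriorityDispatcher:
    def __init__(self):
        self._heap = []
        self._seq = 0  # tie-breaker: FIFO within the same priority

    def submit(self, op: Op, payload):
        heapq.heappush(self._heap, (int(op), self._seq, payload))
        self._seq += 1

    def next(self):
        _, _, payload = heapq.heappop(self._heap)
        return payload


d = PriorityDispatcher()
d.submit(Op.PUT, "bulk write")
d.submit(Op.GET, "read chunk")
d.submit(Op.PEEK, "lookup")
print(d.next())  # lookup
```

Even though the PUT was submitted first, the PEEK is dequeued ahead of it, which is exactly how latency-sensitive lookups avoid queuing behind bulk writes.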


Changelog
  • docs/source/kv_cache/storage_backends/valkey.rst
    • Updated prerequisites to remove old pipelining text
    • Added a new 'Configuration Reference' table detailing valkey_num_workers, valkey_mode, tls_enable, valkey_username, valkey_password, and valkey_database
    • Included new example configurations for TLS/ElastiCache Serverless and performance tuning
  • examples/kv_cache_reuse/remote_backends/valkey/VALKEY_CONNECTOR_BENCHMARKING.md
    • Added a new markdown document detailing the benchmarking methodology, setup, and results for the improved ValkeyConnector
  • examples/kv_cache_reuse/remote_backends/valkey/benchmark_l2.py
    • Added a new Python script for end-to-end L2 benchmarking, including prompt generation and execution logic to ensure full L1 eviction
  • examples/kv_cache_reuse/remote_backends/valkey/valkey.yaml
    • Added a new example YAML configuration file for the ValkeyConnector
  • lmcache/v1/storage_backend/connector/valkey_adapter.py
    • Updated imports, removing List and Tuple and adding Optional
    • Refactored create_connector to use a single ValkeyConnector class, consolidating logic for standalone and cluster modes
    • Implemented parsing of extra_config for num_workers, username, password, tls_enable, valkey_mode, and database_id
    • Added a warning log for valkey_database being ignored in cluster mode
  • lmcache/v1/storage_backend/connector/valkey_connector.py
    • Completely refactored ValkeyConnector to use a _ThreadWorkerPool for managing per-thread GLIDE sync clients and a ThreadPoolExecutor
    • Implemented single-key storage, eliminating the need for separate metadata and kv_bytes keys
    • Added support for TLS, cluster mode, and configurable worker threads (num_workers)
    • Integrated zero-copy buffer GET and memoryview for optimized data transfer, with fallback for older GLIDE versions
    • Removed the deprecated ValkeyClusterConnector class
    • Updated exists, get, put, batched_put, batched_get, batched_contains, batched_async_contains, and batched_get_non_blocking methods to leverage the new worker pool and AsyncPQExecutor
    • Introduced BATCH_TIMEOUT_SECS and OP_TIMEOUT_SECS constants for operation timeouts
  • lmcache/v1/storage_backend/job_executor/pq_executor.py
    • Renamed _shutdown_async to shutdown_async and updated its docstring, making it a public method
  • tests/conftest.py
    • Added MockSyncGlideClient for in-memory mocking of synchronous Glide client behavior in tests
    • Included a reset_store class method for MockSyncGlideClient to clear test data
  • tests/v1/storage_backend/test_valkey_connector.py
    • Added a new comprehensive test file for ValkeyConnector
    • Introduced MockThreadWorkerPool to simulate the worker pool and avoid external dependencies in tests
    • Included tests for basic operations (exists, get, put), batch operations (batched_put, batched_get, batched_async_contains), handling of non-existent keys, sequential and concurrent operations, synchronous exists, batched contains prefix logic, different chunk sizes, pipelined batching exceeding worker count, worker scaling, and configuration passthrough for standalone/cluster/TLS modes
    • Added tests for batched_get with partial misses and batched_get_non_blocking for prefix truncation and missing first keys
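The single-key storage model in the changelog (metadata and kv_bytes retrieved with one GET instead of two) implies packing both into one value. A hypothetical length-prefixed encoding sketches the idea; the connector's actual wire format may differ:

```python
import struct


def pack_value(metadata: bytes, kv_bytes: bytes) -> bytes:
    # 4-byte big-endian metadata length, then metadata, then the raw KV payload,
    # so a single GET recovers both parts.
    return struct.pack("!I", len(metadata)) + metadata + kv_bytes


def unpack_value(blob: bytes) -> tuple:
    (mlen,) = struct.unpack_from("!I", blob, 0)
    return blob[4:4 + mlen], blob[4 + mlen:]


blob = pack_value(b'{"dtype":"fp16"}', b"\x00\x01\x02")
meta, kv = unpack_value(blob)
print(meta, len(kv))
```

Collapsing two round-trips into one halves per-chunk latency on the retrieval path, which is consistent with the TTFT gains reported in the benchmark table.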
Activity
  • The ValkeyConnector has been significantly refactored to improve performance and add new capabilities.
  • New documentation has been added to explain the configuration options and provide usage examples.
  • A detailed benchmarking report has been included, demonstrating the performance gains of the new connector.
  • A new L2 benchmarking script has been introduced to facilitate performance validation.
  • Comprehensive unit tests have been added to cover the new functionality and ensure stability.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This is an excellent pull request that significantly enhances the ValkeyConnector by adding cluster mode, TLS support, and a more performant, thread-pool-based architecture using the synchronous GLIDE client. The move to single-key storage is a great optimization. The code is well-structured, and the inclusion of detailed documentation, benchmarks, and comprehensive unit tests is highly appreciated. I have a couple of minor suggestions to improve the documentation's clarity and the code's robustness.

Comment thread lmcache/v1/storage_backend/connector/valkey_connector.py Outdated
@omerrubi-amzn force-pushed the feat/valkey-connector-improvements branch 5 times, most recently from 253f563 to f0b53a8 on March 19, 2026 10:19
Contributor

@sammshen sammshen left a comment


LGTM! thanks for the great work

@sammshen sammshen requested a review from deng451e March 22, 2026 17:53
Collaborator

@deng451e deng451e left a comment


LGTM!

@omerrubi-amzn changed the title from feat: improve ValkeyConnector with cluster mode, TLS, and optimized d… to feat: improve ValkeyConnector with cluster mode, TLS, and GLIDE optimizations on Mar 23, 2026
@omerrubi-amzn force-pushed the feat/valkey-connector-improvements branch 2 times, most recently from a383866 to f0cb3e8 on March 23, 2026 07:56
@omerrubi-amzn
Contributor Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a significant architectural overhaul of the ValkeyConnector, replacing the previous async implementation with a high-performance synchronous GLIDE client backed by a thread pool. The changes add crucial features like cluster mode and TLS support, along with performance optimizations like single-key storage and priority scheduling, which are well-supported by the provided benchmarks. The code is well-structured, thoroughly tested, and documented. My feedback includes a minor suggestion to improve type hinting for better code clarity and maintainability.

Comment thread lmcache/v1/storage_backend/connector/valkey_connector.py
@sammshen sammshen enabled auto-merge (squash) March 24, 2026 09:16
@github-actions github-actions Bot added the full Run comprehensive tests on this PR label Mar 24, 2026
feat: improve ValkeyConnector with cluster mode, TLS, and optimized data transfer

- Add TLS support for ElastiCache Serverless (tls_enable config)
- Leverage GLIDE SET with memoryview/bytearray support (valkey-glide#5492)
  and buffer GET (valkey-glide#5493) to reduce copies on large chunks
- Per-thread GLIDE client pool with configurable worker count
  (valkey_num_workers, default 8)
- Single-key storage (1 GET per chunk vs RedisClusterConnector's 2)
- Priority scheduling via AsyncPQExecutor (PEEK > PREFETCH > GET > PUT)
- Update valkey.rst docs with config reference, TLS, and tuning sections
- Add benchmark_l2.py for reliable L2 cache eviction testing
- Add benchmarking report with full methodology and results

Benchmarked on 70B TP=8 (p4de.24xlarge, ElastiCache Valkey cluster):
  - 70B 64k: 3,216ms (4.8x) vs RedisClusterConnector 5,794ms (2.7x)
  - 70B 8k:  505ms (4.4x)  vs 796ms (3.0x)
  - Aggregate throughput: ~7.5 GB/s vs ~4.0 GB/s
  - TLS overhead: 7-8% at 64k context

Signed-off-by: Omer Rubinstein <omerrubi@amazon.com>
Signed-off-by: Omer Rubinstein <omerrubi@amazon.com>
auto-merge was automatically disabled March 24, 2026 11:28

Head branch was pushed to by a user without write access

@omerrubi-amzn force-pushed the feat/valkey-connector-improvements branch from eaa1b49 to a03f977 on March 24, 2026 11:28
@github-actions github-actions Bot removed the full Run comprehensive tests on this PR label Mar 24, 2026
@sammshen sammshen enabled auto-merge (squash) March 24, 2026 19:10
@github-actions github-actions Bot added the full Run comprehensive tests on this PR label Mar 24, 2026
@sammshen sammshen merged commit f3abfcf into LMCache:dev Mar 24, 2026
35 of 36 checks passed
@omerrubi-amzn omerrubi-amzn deleted the feat/valkey-connector-improvements branch March 24, 2026 20:26
realAaronWu pushed a commit to realAaronWu/LMCache that referenced this pull request Mar 26, 2026
deng451e pushed a commit to deng451e/LMCache that referenced this pull request Mar 27, 2026
jooho-XCENA pushed a commit to xcena-dev/LMCache that referenced this pull request Apr 2, 2026