
feat: improve ValkeyConnector with cluster mode, TLS, and GLIDE optimizations#2790

Merged
sammshen merged 2 commits into LMCache:dev from omerrubi-amzn:feat/valkey-connector-improvements
Mar 24, 2026

Conversation

@omerrubi-amzn
Contributor

What this PR does / why we need it:

This PR improves the ValkeyConnector with cluster mode support, TLS, and optimized large-value data transfer using valkey-glide.

Key changes:

Benchmarked on 70B TP=8 (p4de.24xlarge, ElastiCache Valkey cluster), ValkeyConnector delivers 1.6–1.8× faster L2 retrieval than RedisClusterConnector:

| | ValkeyConnector | RedisClusterConnector |
|---|---|---|
| 70B 64k L2 TTFT | 3,216 ms (4.8×) | 5,794 ms (2.7×) |
| 70B 8k L2 TTFT | 505 ms (4.4×) | 796 ms (3.0×) |
| 8B 64k L2 TTFT | 2,527 ms (4.5×) | 15,600 ms (0.8×) |
| Aggregate throughput (70B 64k) | ~7.5 GB/s | ~4.0 GB/s |
| TLS / Serverless ElastiCache | ✅ Supported | ❌ Not supported |

For full benchmarking methodology and results, refer to VALKEY_CONNECTOR_BENCHMARKING.md.

Special notes for your reviewers:

  • Requires valkey-glide release 2.3+ containing #5492 (SET with memoryview/bytearray support) and #5493 (buffer GET). Falls back to standard GET + copy if buffer GET is unavailable.
  • All benchmarks used the same Valkey cluster backends for both connectors — the performance difference is purely connector-side.
  • pq_executor.py change: _shutdown_async → shutdown_async (private → public) so the connector can call it directly during teardown.
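The buffer-GET fallback described above can be sketched as follows. Note this is an illustration only: `get_into` is a hypothetical stand-in for valkey-glide's buffer GET (#5493), and `_StubClient` is an in-memory mock, not the real GLIDE client.

```python
def read_chunk(client, key: bytes, dest: memoryview) -> int:
    """Read a value into a pre-allocated buffer, copying only when required."""
    # `get_into` is a hypothetical name for the buffer-GET API (valkey-glide #5493);
    # the real method name/signature may differ.
    get_into = getattr(client, "get_into", None)
    if get_into is not None:
        # Zero-copy path: the client writes directly into `dest`.
        return get_into(key, dest)
    # Fallback for older GLIDE releases: standard GET returns bytes, then one copy.
    data = client.get(key)
    if data is None:
        return 0
    n = len(data)
    dest[:n] = data
    return n


class _StubClient:
    """In-memory stand-in for a GLIDE sync client without buffer GET."""
    def __init__(self):
        self._store = {b"k": b"hello"}

    def get(self, key):
        return self._store.get(key)


buf = bytearray(16)
n = read_chunk(_StubClient(), b"k", memoryview(buf))
print(bytes(buf[:n]))  # b'hello'
```

Probing with `getattr` keeps a single read path regardless of the installed GLIDE version; only the non-buffer branch pays the extra copy.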

If applicable:

  • this PR contains user-facing changes (docs added)
  • this PR contains unit tests
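The new connector options are passed through `extra_config`. A minimal sketch of how that parsing might look, using the key names documented in this PR (`valkey_num_workers`, `valkey_mode`, `tls_enable`, `valkey_username`, `valkey_password`, `valkey_database`); defaults and coercions here are illustrative assumptions, not the adapter's actual code:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ValkeySettings:
    num_workers: int = 8               # default per the commit message
    valkey_mode: str = "standalone"
    tls_enable: bool = False
    username: Optional[str] = None
    password: Optional[str] = None
    database_id: int = 0


def parse_extra_config(extra: dict) -> ValkeySettings:
    # Real parsing may coerce string values ("true"/"false") differently.
    settings = ValkeySettings(
        num_workers=int(extra.get("valkey_num_workers", 8)),
        valkey_mode=str(extra.get("valkey_mode", "standalone")),
        tls_enable=bool(extra.get("tls_enable", False)),
        username=extra.get("valkey_username"),
        password=extra.get("valkey_password"),
        database_id=int(extra.get("valkey_database", 0)),
    )
    if settings.valkey_mode == "cluster" and "valkey_database" in extra:
        # Cluster mode has no SELECT-able databases; the adapter logs a warning.
        print("warning: valkey_database is ignored in cluster mode")
    return settings


cfg = parse_extra_config({"valkey_mode": "cluster", "tls_enable": True, "valkey_database": 2})
print(cfg.tls_enable, cfg.valkey_mode)  # True cluster
```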

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the ValkeyConnector by integrating advanced features like TLS, cluster mode, and optimized data transfer mechanisms. The changes aim to boost performance, particularly for large-value KV cache operations, and expand compatibility with modern Valkey deployments like ElastiCache Serverless. The refactoring consolidates previous connector logic and introduces a robust worker pool architecture, backed by comprehensive benchmarking and new unit tests.

Highlights

  • Performance Improvement: The ValkeyConnector now delivers 1.6–1.8x faster L2 retrieval compared to RedisClusterConnector, achieving up to 4.8x speedup over cold compute at 64k context. This is primarily due to single-key storage, parallel worker threads, and optimized large-value handling.
  • TLS Support: Added full TLS support, enabling connections to TLS-enabled clusters, including ElastiCache Serverless, which was previously unsupported by RedisClusterConnector. TLS overhead is minimal (7-8% at 64k context).
  • Optimized Large-Value Handling: Leverages valkey-glide PRs #5492 (zero-copy SET via bytearray/memoryview) and #5493 (buffer GET into pre-allocated memory) to significantly reduce memory copies for large KV cache chunks.
  • Configurable Per-Thread Client Pool: Introduced valkey_num_workers to control the number of worker threads, each with its own GLIDE client for parallel I/O, improving aggregate throughput.
  • Single-Key Storage: Switched to a single-key storage model, reducing round-trips to Valkey by half (1 GET per chunk vs. RedisClusterConnector's 2 GETs for metadata + kv_bytes).
  • Priority Scheduling: Operations are now dispatched via AsyncPQExecutor with priority scheduling (PEEK > PREFETCH > GET > PUT) to ensure latency-sensitive lookups are not delayed by bulk writes.
  • Changelog Visibility: The _shutdown_async method in pq_executor.py was made public as shutdown_async to allow direct calls during connector teardown.
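The priority ordering in the highlights above (PEEK > PREFETCH > GET > PUT) can be illustrated with a small heap-based dispatcher. This is not AsyncPQExecutor's actual interface, just a sketch of the ordering it enforces; lower numbers are served first, and a sequence counter keeps FIFO order within a priority level.

```python
import heapq
from enum import IntEnum


class Op(IntEnum):
    PEEK = 0       # latency-sensitive lookups served first
    PREFETCH = 1
    GET = 2
    PUT = 3        # bulk writes served last


class PriorityDispatcher:
    def __init__(self):
        self._heap = []
        self._seq = 0  # tie-breaker: FIFO within the same priority

    def submit(self, op: Op, payload):
        heapq.heappush(self._heap, (int(op), self._seq, payload))
        self._seq += 1

    def next(self):
        _, _, payload = heapq.heappop(self._heap)
        return payload


d = PriorityDispatcher()
d.submit(Op.PUT, "bulk write")
d.submit(Op.GET, "read chunk")
d.submit(Op.PEEK, "lookup")
print(d.next())  # lookup
```

Even though the PUT was submitted first, the PEEK is dequeued ahead of it, which is exactly how latency-sensitive lookups avoid queuing behind bulk writes.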


Changelog
  • docs/source/kv_cache/storage_backends/valkey.rst
    • Updated prerequisites to remove old pipelining text
    • Added a new 'Configuration Reference' table detailing valkey_num_workers, valkey_mode, tls_enable, valkey_username, valkey_password, and valkey_database
    • Included new example configurations for TLS/ElastiCache Serverless and performance tuning
  • examples/kv_cache_reuse/remote_backends/valkey/VALKEY_CONNECTOR_BENCHMARKING.md
    • Added a new markdown document detailing the benchmarking methodology, setup, and results for the improved ValkeyConnector
  • examples/kv_cache_reuse/remote_backends/valkey/benchmark_l2.py
    • Added a new Python script for end-to-end L2 benchmarking, including prompt generation and execution logic to ensure full L1 eviction
  • examples/kv_cache_reuse/remote_backends/valkey/valkey.yaml
    • Added a new example YAML configuration file for the ValkeyConnector
  • lmcache/v1/storage_backend/connector/valkey_adapter.py
    • Updated imports, removing List and Tuple and adding Optional
    • Refactored create_connector to use a single ValkeyConnector class, consolidating logic for standalone and cluster modes
    • Implemented parsing of extra_config for num_workers, username, password, tls_enable, valkey_mode, and database_id
    • Added a warning log for valkey_database being ignored in cluster mode
  • lmcache/v1/storage_backend/connector/valkey_connector.py
    • Completely refactored ValkeyConnector to use a _ThreadWorkerPool for managing per-thread GLIDE sync clients and a ThreadPoolExecutor
    • Implemented single-key storage, eliminating the need for separate metadata and kv_bytes keys
    • Added support for TLS, cluster mode, and configurable worker threads (num_workers)
    • Integrated zero-copy buffer GET and memoryview for optimized data transfer, with fallback for older GLIDE versions
    • Removed the deprecated ValkeyClusterConnector class
    • Updated exists, get, put, batched_put, batched_get, batched_contains, batched_async_contains, and batched_get_non_blocking methods to leverage the new worker pool and AsyncPQExecutor
    • Introduced BATCH_TIMEOUT_SECS and OP_TIMEOUT_SECS constants for operation timeouts
  • lmcache/v1/storage_backend/job_executor/pq_executor.py
    • Renamed _shutdown_async to shutdown_async and updated its docstring, making it a public method
  • tests/conftest.py
    • Added MockSyncGlideClient for in-memory mocking of synchronous Glide client behavior in tests
    • Included a reset_store class method for MockSyncGlideClient to clear test data
  • tests/v1/storage_backend/test_valkey_connector.py
    • Added a new comprehensive test file for ValkeyConnector
    • Introduced MockThreadWorkerPool to simulate the worker pool and avoid external dependencies in tests
    • Included tests for basic operations (exists, get, put), batch operations (batched_put, batched_get, batched_async_contains), handling of non-existent keys, sequential and concurrent operations, synchronous exists, batched contains prefix logic, different chunk sizes, pipelined batching exceeding worker count, worker scaling, and configuration passthrough for standalone/cluster/TLS modes
    • Added tests for batched_get with partial misses and batched_get_non_blocking for prefix truncation and missing first keys
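The single-key storage model in the changelog (metadata and kv_bytes retrieved with one GET instead of two) implies packing both into one value. A hypothetical length-prefixed encoding sketches the idea; the connector's actual wire format may differ:

```python
import struct


def pack_value(metadata: bytes, kv_bytes: bytes) -> bytes:
    # 4-byte big-endian metadata length, then metadata, then the raw KV payload,
    # so a single GET recovers both parts.
    return struct.pack("!I", len(metadata)) + metadata + kv_bytes


def unpack_value(blob: bytes) -> tuple:
    (mlen,) = struct.unpack_from("!I", blob, 0)
    return blob[4:4 + mlen], blob[4 + mlen:]


blob = pack_value(b'{"dtype":"fp16"}', b"\x00\x01\x02")
meta, kv = unpack_value(blob)
print(meta, len(kv))
```

Collapsing two round-trips into one halves per-chunk latency on the retrieval path, which is consistent with the TTFT gains reported in the benchmark table.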
Activity
  • The ValkeyConnector has been significantly refactored to improve performance and add new capabilities.
  • New documentation has been added to explain the configuration options and provide usage examples.
  • A detailed benchmarking report has been included, demonstrating the performance gains of the new connector.
  • A new L2 benchmarking script has been introduced to facilitate performance validation.
  • Comprehensive unit tests have been added to cover the new functionality and ensure stability.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This is an excellent pull request that significantly enhances the ValkeyConnector by adding cluster mode, TLS support, and a more performant, thread-pool-based architecture using the synchronous GLIDE client. The move to single-key storage is a great optimization. The code is well-structured, and the inclusion of detailed documentation, benchmarks, and comprehensive unit tests is highly appreciated. I have a couple of minor suggestions to improve the documentation's clarity and the code's robustness.

Comment thread lmcache/v1/storage_backend/connector/valkey_connector.py Outdated
@omerrubi-amzn force-pushed the feat/valkey-connector-improvements branch 5 times, most recently from 253f563 to f0b53a8 on March 19, 2026 10:19
Contributor

@sammshen sammshen left a comment


LGTM! thanks for the great work

@sammshen sammshen requested a review from deng451e March 22, 2026 17:53
Collaborator

@deng451e deng451e left a comment


LGTM!

@omerrubi-amzn changed the title from feat: improve ValkeyConnector with cluster mode, TLS, and optimized d… to feat: improve ValkeyConnector with cluster mode, TLS, and GLIDE optimizations on Mar 23, 2026
@omerrubi-amzn force-pushed the feat/valkey-connector-improvements branch 2 times, most recently from a383866 to f0cb3e8 on March 23, 2026 07:56
@omerrubi-amzn
Contributor Author

/gemini review

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces a significant architectural overhaul of the ValkeyConnector, replacing the previous async implementation with a high-performance synchronous GLIDE client backed by a thread pool. The changes add crucial features like cluster mode and TLS support, along with performance optimizations like single-key storage and priority scheduling, which are well-supported by the provided benchmarks. The code is well-structured, thoroughly tested, and documented. My feedback includes a minor suggestion to improve type hinting for better code clarity and maintainability.

Comment thread lmcache/v1/storage_backend/connector/valkey_connector.py
@sammshen sammshen enabled auto-merge (squash) March 24, 2026 09:16
@github-actions github-actions Bot added the full Run comprehensive tests on this PR label Mar 24, 2026
feat: improve ValkeyConnector with cluster mode, TLS, and optimized data transfer

- Add TLS support for ElastiCache Serverless (tls_enable config)
- Leverage GLIDE SET with memoryview/bytearray support (valkey-glide#5492)
  and buffer GET (valkey-glide#5493) to reduce copies on large chunks
- Per-thread GLIDE client pool with configurable worker count
  (valkey_num_workers, default 8)
- Single-key storage (1 GET per chunk vs RedisClusterConnector's 2)
- Priority scheduling via AsyncPQExecutor (PEEK > PREFETCH > GET > PUT)
- Update valkey.rst docs with config reference, TLS, and tuning sections
- Add benchmark_l2.py for reliable L2 cache eviction testing
- Add benchmarking report with full methodology and results

Benchmarked on 70B TP=8 (p4de.24xlarge, ElastiCache Valkey cluster):
  - 70B 64k: 3,216ms (4.8x) vs RedisClusterConnector 5,794ms (2.7x)
  - 70B 8k:  505ms (4.4x)  vs 796ms (3.0x)
  - Aggregate throughput: ~7.5 GB/s vs ~4.0 GB/s
  - TLS overhead: 7-8% at 64k context

Signed-off-by: Omer Rubinstein <omerrubi@amazon.com>
Signed-off-by: Omer Rubinstein <omerrubi@amazon.com>
auto-merge was automatically disabled March 24, 2026 11:28

Head branch was pushed to by a user without write access

@omerrubi-amzn force-pushed the feat/valkey-connector-improvements branch from eaa1b49 to a03f977 on March 24, 2026 11:28
@github-actions github-actions Bot removed the full Run comprehensive tests on this PR label Mar 24, 2026
@sammshen sammshen enabled auto-merge (squash) March 24, 2026 19:10
@github-actions github-actions Bot added the full Run comprehensive tests on this PR label Mar 24, 2026
@sammshen sammshen merged commit f3abfcf into LMCache:dev Mar 24, 2026
35 of 36 checks passed
@omerrubi-amzn omerrubi-amzn deleted the feat/valkey-connector-improvements branch March 24, 2026 20:26
realAaronWu pushed a commit to realAaronWu/LMCache that referenced this pull request Mar 26, 2026
deng451e pushed a commit to deng451e/LMCache that referenced this pull request Mar 27, 2026
jooho-XCENA pushed a commit to xcena-dev/LMCache that referenced this pull request Apr 2, 2026