[Operator] Add L2 RESP (Redis/Valkey) adapter support#2967
[Operator] Add L2 RESP (Redis/Valkey) adapter support#2967royyhuang merged 9 commits intoLMCache:devfrom
Conversation
Add first-class support for Redis/Valkey as an L2 storage backend in the LMCache operator, with cross-namespace secret management and Buildkite CI skip logic for operator-only PRs. L2 Backend: - Replace generic l2Backends list with typed l2Backend (single adapter) - Add RESPL2AdapterSpec with host, port, numWorkers, maxCapacityGB, authSecretRef - Add RawL2AdapterSpec as escape hatch for other adapter types (nixl_store, fs, mock) - Add storePolicy, prefetchPolicy, prefetchMaxInFlight to L2BackendSpec - CRD validation for RESP fields and exactly-one-of constraint Auth Secret Management: - Cross-namespace SecretReference (name + optional namespace) - Controller copies source secret to engine namespace as managed secret with ownerRef - DaemonSet references local copy via secretKeyRef (credentials never in pod args) - Username is optional (Optional: true) for password-only Redis Enterprise auth Server: - Switch entrypoint from python3 -m to lmcache server CLI - Add httpPort to ServerSpec (default 8080) for HTTP frontend - Expose both ZMQ and HTTP ports in lookup Service and DaemonSet Reconciler Fixes: - Re-fetch engine before status update to avoid resourceVersion conflicts - Return done=true after finalizer addition to prevent stale object usage Buildkite CI: - Add is-operator-only.sh script to skip Python/CUDA tests for operator-only PRs - Steps still trigger and report green so GitHub required checks pass - Add operator/ as safe path in should-run-comprehensive.sh Docs: - Update README with L2 Redis examples, raw escape hatch, single-adapter caveat - Add /dev/shm warning for both LMCache and vLLM pods - Add full reference sample CR with all fields commented out - Update DESIGN.md L2 section Signed-off-by: royyhuang <roy.y.huang@gmail.com>
There was a problem hiding this comment.
Code Review
This pull request introduces a test-skipping optimization for operator-only changes and refactors the LMCache operator to support a structured L2 backend configuration, specifically adding native Redis/Valkey (RESP) support with secure credential management. Key changes include the introduction of cross-namespace secret reconciliation, an updated container entrypoint, and improved controller stability through resource re-fetching. Review feedback recommends hardening the secret reconciliation logic by validating required keys and ensuring controller references are correctly applied during patches, alongside better error handling for JSON serialization.
- Validate source secret has required 'password' key before copying
- Set ownerRef on existing secret during patch (not just on desired)
- Handle json.Marshal errors in L2 adapter JSON builders
- Add Owns(&corev1.Secret{}) watch so secret changes trigger reconcile
- Remove unused NeedsCrossNamespaceSecret function
Signed-off-by: royyhuang <roy.y.huang@gmail.com>
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 64c6f17. Configure here.
| patch := client.MergeFrom(existing.DeepCopy()) | ||
| existing.Data = desired.Data | ||
| existing.Labels = desired.Labels | ||
| return r.Patch(ctx, existing, patch) |
There was a problem hiding this comment.
OwnerRef change lost due to wrong DeepCopy ordering
Medium Severity
In reconcileRESPAuthSecret, SetControllerReference is called on existing before client.MergeFrom(existing.DeepCopy()). The DeepCopy captures the already-mutated state (with ownerRef), so the merge patch only includes Data and Labels changes — the ownerRef mutation is invisible to the diff and never persisted. All other reconcile methods in this file avoid this by setting the ownerRef on desired instead of existing. The DeepCopy call needs to happen before SetControllerReference for the ownerRef to be included in the patch.
Reviewed by Cursor Bugbot for commit 64c6f17. Configure here.
Signed-off-by: royyhuang <roy.y.huang@gmail.com>
|
@ruizhang0101 PTAL. |
There was a problem hiding this comment.
Can this support multiple L2 adapters?
EDIT: It is documented that this PR only supports one single L2 adapter, but I am wondering can we support multiple adapters for this PR as well or it will be added as a following PR soon since this will be really useful for multi-tenant.
PS: Maybe not relevant to this PR, but if there are multiple l2 adapters, it would be great to have ID for each adapters in the lmcache and also operator to differentiate them.
I feel it would be better to support in another PR since I am not sure if multi-adapter has been thoroughly tested and validated. There could be some questions like you brought up regarding adding adapter ID need to be addressed. So that definitely worth another PR. |
* [Operator] Add L2 RESP (Redis/Valkey) adapter support
Add first-class support for Redis/Valkey as an L2 storage backend in the
LMCache operator, with cross-namespace secret management and Buildkite CI
skip logic for operator-only PRs.
L2 Backend:
- Replace generic l2Backends list with typed l2Backend (single adapter)
- Add RESPL2AdapterSpec with host, port, numWorkers, maxCapacityGB, authSecretRef
- Add RawL2AdapterSpec as escape hatch for other adapter types (nixl_store, fs, mock)
- Add storePolicy, prefetchPolicy, prefetchMaxInFlight to L2BackendSpec
- CRD validation for RESP fields and exactly-one-of constraint
Auth Secret Management:
- Cross-namespace SecretReference (name + optional namespace)
- Controller copies source secret to engine namespace as managed secret with ownerRef
- DaemonSet references local copy via secretKeyRef (credentials never in pod args)
- Username is optional (Optional: true) for password-only Redis Enterprise auth
Server:
- Switch entrypoint from python3 -m to lmcache server CLI
- Add httpPort to ServerSpec (default 8080) for HTTP frontend
- Expose both ZMQ and HTTP ports in lookup Service and DaemonSet
Reconciler Fixes:
- Re-fetch engine before status update to avoid resourceVersion conflicts
- Return done=true after finalizer addition to prevent stale object usage
Buildkite CI:
- Add is-operator-only.sh script to skip Python/CUDA tests for operator-only PRs
- Steps still trigger and report green so GitHub required checks pass
- Add operator/ as safe path in should-run-comprehensive.sh
Docs:
- Update README with L2 Redis examples, raw escape hatch, single-adapter caveat
- Add /dev/shm warning for both LMCache and vLLM pods
- Add full reference sample CR with all fields commented out
- Update DESIGN.md L2 section
Signed-off-by: royyhuang <roy.y.huang@gmail.com>
* [Operator] Address review feedback from Gemini and Cursor bots
- Validate source secret has required 'password' key before copying
- Set ownerRef on existing secret during patch (not just on desired)
- Handle json.Marshal errors in L2 adapter JSON builders
- Add Owns(&corev1.Secret{}) watch so secret changes trigger reconcile
- Remove unused NeedsCrossNamespaceSecret function
Signed-off-by: royyhuang <roy.y.huang@gmail.com>
* [Operator] Revert Buildkite CI changes (will be in separate PR)
Signed-off-by: royyhuang <roy.y.huang@gmail.com>
---------
Signed-off-by: royyhuang <roy.y.huang@gmail.com>
* [Operator] Add L2 RESP (Redis/Valkey) adapter support
Add first-class support for Redis/Valkey as an L2 storage backend in the
LMCache operator, with cross-namespace secret management and Buildkite CI
skip logic for operator-only PRs.
L2 Backend:
- Replace generic l2Backends list with typed l2Backend (single adapter)
- Add RESPL2AdapterSpec with host, port, numWorkers, maxCapacityGB, authSecretRef
- Add RawL2AdapterSpec as escape hatch for other adapter types (nixl_store, fs, mock)
- Add storePolicy, prefetchPolicy, prefetchMaxInFlight to L2BackendSpec
- CRD validation for RESP fields and exactly-one-of constraint
Auth Secret Management:
- Cross-namespace SecretReference (name + optional namespace)
- Controller copies source secret to engine namespace as managed secret with ownerRef
- DaemonSet references local copy via secretKeyRef (credentials never in pod args)
- Username is optional (Optional: true) for password-only Redis Enterprise auth
Server:
- Switch entrypoint from python3 -m to lmcache server CLI
- Add httpPort to ServerSpec (default 8080) for HTTP frontend
- Expose both ZMQ and HTTP ports in lookup Service and DaemonSet
Reconciler Fixes:
- Re-fetch engine before status update to avoid resourceVersion conflicts
- Return done=true after finalizer addition to prevent stale object usage
Buildkite CI:
- Add is-operator-only.sh script to skip Python/CUDA tests for operator-only PRs
- Steps still trigger and report green so GitHub required checks pass
- Add operator/ as safe path in should-run-comprehensive.sh
Docs:
- Update README with L2 Redis examples, raw escape hatch, single-adapter caveat
- Add /dev/shm warning for both LMCache and vLLM pods
- Add full reference sample CR with all fields commented out
- Update DESIGN.md L2 section
Signed-off-by: royyhuang <roy.y.huang@gmail.com>
* [Operator] Address review feedback from Gemini and Cursor bots
- Validate source secret has required 'password' key before copying
- Set ownerRef on existing secret during patch (not just on desired)
- Handle json.Marshal errors in L2 adapter JSON builders
- Add Owns(&corev1.Secret{}) watch so secret changes trigger reconcile
- Remove unused NeedsCrossNamespaceSecret function
Signed-off-by: royyhuang <roy.y.huang@gmail.com>
* [Operator] Revert Buildkite CI changes (will be in separate PR)
Signed-off-by: royyhuang <roy.y.huang@gmail.com>
---------
Signed-off-by: royyhuang <roy.y.huang@gmail.com>


Summary
python3 -mtolmcache serverCLI, expose both ZMQ and HTTP ports in ServiceDetails
L2 Backend
l2Backendfield (singular, single adapter for now) replaces genericl2BackendslistRESPL2AdapterSpec:host,port,numWorkers,maxCapacityGB,authSecretRefRawL2AdapterSpec: escape hatch fornixl_store,fs,mock, etc.storePolicy/prefetchPolicy/prefetchMaxInFlightfor L2 pipeline controlCross-Namespace Auth Secrets
SecretReferencewith optionalnamespacefieldsecretKeyRefon local copy (credentials never in pod args orkubectl describe)Optional: true) for password-only Redis Enterprise authServer & Service
/opt/venv/bin/lmcache serverinstead ofpython3 -m lmcache.v1.multiprocess.serverhttpPortfield (default 8080) for HTTP frontend (health checks, admin API)server(ZMQ) andhttpportsReconciler Fixes
Status().Update()to avoid conflicts from Owns watch eventsdone=trueafter finalizer addition to prevent stale resourceVersionBuildkite CI
is-operator-only.sh: detects operator-only PRs, steps report green without running testsoperator/*added as safe path inshould-run-comprehensive.shTest plan
make test— all unit tests pass (api, controller, resources)make lint— 0 issuesmake runagainst live cluster with Redis Enterprise--l2-adapterargs, env var injection, cross-namespace secret copyNote
Medium Risk
Introduces a CRD shape change (
l2Backends→l2Backend) and new secret-handling logic (cross-namespace copy + env injection), which can break existing manifests and affects credential management. Also changes the LMCache DaemonSet entrypoint and exposed Service ports, so rollout should be validated against running deployments.Overview
Adds a new singular
spec.l2Backendwith typed RESP (Redis/Valkey) configuration, arawescape hatch for other adapters, and L2 store/prefetch policy flags; updates validation/tests and regenerates CRD/deepcopy accordingly.The controller now manages RESP auth via Secrets (including cross-namespace source → managed local copy) and injects credentials as env vars; RBAC and controller ownership are updated to reconcile
Secretresources.Switches the LMCache DaemonSet to run
/opt/venv/bin/lmcache server, addsserver.httpPort(default8080), and exposes both ZMQ and HTTP ports on the lookup Service; docs and sample manifests are updated, and status/finalizer handling is adjusted to avoidresourceVersionconflicts.Reviewed by Cursor Bugbot for commit 3882f1e. Bugbot is set up for automated code reviews on this repo. Configure here.