feat(provider): pause redis-operator reconciliation during StatefulSet scale-to-zero #963
Merged
acouvreur merged 3 commits on Jun 5, 2026
…Set scale-to-zero

The OT-CONTAINER-KIT redis-operator continuously reconciles its managed StatefulSets back to the desired replica count. When Sablier scales a redis-operator-owned StatefulSet to zero the operator immediately restores the replica count, making scale-to-zero ineffective.

Before scaling a StatefulSet to zero, check whether it is controlled by a Redis CR (via ownerReferences). If so, set the redis.opstreelabs.in/skip-reconcile annotation on the owning Redis CR to pause the operator's reconciliation loop. After scaling back up, remove the annotation to restore normal operation.

The annotation patch is best-effort: if it fails (e.g. the CRD is absent or the dynamic client is unconfigured) a warning is logged and the scale proceeds unchanged, preserving existing behaviour for non-redis-operator StatefulSets.

fix(kubernetes): address code review findings on redis-operator skip-reconcile

Correctness:
- Annotation no longer leaks when scale fails: a cleanup defer is registered immediately after setting skip-reconcile=true, and clears it if p.scale() returns an error.
- Annotation is only set on the scale-to-zero path, not on scale-mode stops (sc.Idle.Replicas >= 1), so the operator is not paused while the StatefulSet remains non-zero.
- Deferred annotation removal in InstanceStart now uses context.WithoutCancel so a cancelled request context cannot leave skip-reconcile set after a successful scale-up.
- Both InstanceStop and InstanceStart log a warning when the StatefulSet fetch fails rather than silently skipping annotation management.

Quality:
- redisOperatorOwner matches on API group only (not the full version string), so the fix continues to work if the operator promotes from v1beta2 to v1. The now-redundant redisOperatorAPIVersion constant is removed.
- The redundant StatefulSet GET on the stop path is eliminated by moving the annotation block to after getWorkloadLabels (which already fetches the StatefulSet), placing it immediately before p.scale(ctx, parsed, 0).
- New unit tests in statefulset_redis_operator_test.go cover redisOperatorOwner (including forward-compat with v1), apiVersionGroup, setRedisOperatorSkipReconcile set/clear, no-op for plain StatefulSets, stop annotation set/cleanup-on-failure, and start annotation removal after successful scale.
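The group-only owner match described in the commit message above could look roughly like the sketch below. It is an illustration, not the merged code: the OwnerReference struct is a local stand-in for the Kubernetes metav1.OwnerReference type, the function bodies are guesses that only reuse the names redisOperatorOwner and apiVersionGroup from the commit message, and requiring Controller == true and Kind == "Redis" are assumptions about what "controlled by a Redis CR" means.

```go
package main

import (
	"fmt"
	"strings"
)

// OwnerReference is a local stand-in for the few metav1.OwnerReference
// fields this sketch needs (assumption: the real code uses the k8s type).
type OwnerReference struct {
	APIVersion string
	Kind       string
	Name       string
	Controller *bool
}

// redisOperatorGroup is the API group named in the PR description.
const redisOperatorGroup = "redis.redis.opstreelabs.in"

// apiVersionGroup returns the group part of a "group/version" string.
// Core-group versions such as "v1" have no slash and yield "".
func apiVersionGroup(apiVersion string) string {
	group, _, found := strings.Cut(apiVersion, "/")
	if !found {
		return ""
	}
	return group
}

// redisOperatorOwner returns the name of the controlling Redis CR, if any.
// It compares the API group only, so v1beta2 and a future v1 both match.
func redisOperatorOwner(owners []OwnerReference) (string, bool) {
	for _, o := range owners {
		if o.Controller == nil || !*o.Controller {
			continue // assumption: only the controlling owner counts
		}
		if o.Kind == "Redis" && apiVersionGroup(o.APIVersion) == redisOperatorGroup {
			return o.Name, true
		}
	}
	return "", false
}

func main() {
	isController := true
	owners := []OwnerReference{{
		APIVersion: "redis.redis.opstreelabs.in/v1beta2",
		Kind:       "Redis",
		Name:       "cache",
		Controller: &isController,
	}}
	name, ok := redisOperatorOwner(owners)
	fmt.Println("v1beta2 owner:", name, ok)

	// A promoted API version still matches because only the group is compared.
	owners[0].APIVersion = "redis.redis.opstreelabs.in/v1"
	name, ok = redisOperatorOwner(owners)
	fmt.Println("v1 owner:", name, ok)
}
```

Comparing on the group rather than the full apiVersion string is what keeps the check valid across a version promotion; a plain StatefulSet with no owner references simply falls through to the false branch.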
0a93b92 to 72e5a43
acouvreur
reviewed
Jun 4, 2026
Installs a minimal Redis CRD into the shared k3s cluster (following the same pattern as the CNPG integration test) and creates a Redis CR plus a companion StatefulSet whose ownerReference points to that CR, simulating what the redis-operator would produce.

The three sub-tests verify Sablier's behavior against a real API server:
- stop: InstanceStop sets skip-reconcile on the Redis CR and scales the StatefulSet to 0
- inspect: InstanceInspect correctly reports the stopped state
- start: InstanceStart scales back to 1 and clears the annotation

No operator binary is run; the tests validate Sablier's own API interactions with real Kubernetes objects, not operator behavior.
646fae1 to 57bd7c4
Contributor
Author
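@acouvreur I added … The three sub-tests verify Sablier's behavior against a real API server:
- stop: InstanceStop sets skip-reconcile on the Redis CR and scales the StatefulSet to 0
- inspect: InstanceInspect correctly reports the stopped state
- start: InstanceStart scales back to 1 and clears the annotation

No operator binary is deployed; the tests validate Sablier's own API interactions with real Kubernetes objects, not operator behavior. Let me know if there are additional changes you'd like.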
acouvreur
approved these changes
Jun 5, 2026
acouvreur
left a comment
Member
LGTM
Thank you for the pull request
Closes #962
What this does
I run a homelab where several applications sit behind Traefik with Sablier managing scale-to-zero. After the recent CloudNativePG support landed I started enabling it across my stack, and noticed Redis instances managed by the OT-CONTAINER-KIT redis-operator were the odd one out — Sablier would scale the StatefulSet to zero on session expiry, but the operator would immediately reconcile replicas back to 1.
The operator already ships a pause mechanism: redis.opstreelabs.in/skip-reconcile: "true" on the Redis CR. Setting this before scaling to zero keeps the operator out of the way, and clearing it after scale-up hands control back.

- … skip-reconcile: "true" on the owning Redis CR, then scale the StatefulSet to 0

Design choices
- … redis kind. The operator propagates labels (including sablier.enable / sablier.group) from the Redis CR to the StatefulSet it creates, so existing label-based opt-in works without any change to how users configure Sablier.
- … sablier.idle.replicas >= 1), where the StatefulSet stays non-zero and the operator should continue reconciling normally.
- … redis.redis.opstreelabs.in), not on the specific version string, so it stays correct if the operator promotes to v1.

Testing done
- … statefulset_redis_operator_test.go) covering owner detection (including forward-compatibility with a future v1 APIVersion), annotation set/clear, no-op for plain StatefulSets, stop annotation set with cleanup-on-failure, and start annotation removal.
- make fmt, golangci-lint run, and the full pkg/provider/kubernetes test suite pass.
- … skip-reconcile: "true" set; the operator logs the annotation and stands down. A new request scales back to 1 and the annotation is removed. The redis-operator resumes normally.