Update grove-operator to use label-filtered cache for the pod informer #530

@joeltg

What happened?

We run a large Kubernetes cluster with many tens of thousands of pods at all times. Only a very small number of these belong to Dynamo/Grove PodCliques. Currently, the Grove operator creates an unfiltered, cluster-wide pod informer that caches all pods in memory, so it either needs an excessively large memory reservation or risks being OOMKilled.

What did you expect to happen?

The PodClique controller watches v1.Pod via .Owns(&corev1.Pod{}) (internal/controller/podclique/register.go:66). While event predicates correctly filter reconciliation to only grove-managed pods, the underlying controller-runtime cache still lists/watches all pods cluster-wide because no Cache config is set in createManagerOptions() (internal/controller/manager.go:87).

What grove could do instead is add a cache configuration with a label selector for the Pod GVK in createManagerOptions():

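  // Imports needed by this snippet:
  //   corev1 "k8s.io/api/core/v1"
  //   "k8s.io/apimachinery/pkg/labels"
  //   "sigs.k8s.io/controller-runtime/pkg/cache"
  //   "sigs.k8s.io/controller-runtime/pkg/client"
  opts.Cache = cache.Options{
      ByObject: map[client.Object]cache.ByObject{
          // Restrict the Pod informer's list/watch to grove-managed pods.
          &corev1.Pod{}: {
              Label: labels.SelectorFromSet(labels.Set{
                  "app.kubernetes.io/managed-by": "grove-operator",
              }),
          },
      },
  }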

This limits the Pod informer to pods carrying the app.kubernetes.io/managed-by=grove-operator label. In a cluster like ours, that would shrink the cached set from tens of thousands of pods to the small number that Grove actually manages.
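To make the difference between predicate filtering and cache-level filtering concrete, here is a toy model in plain Go. It is only a sketch and not Grove or controller-runtime code: the Pod type, matches, fillCache, and makeCluster are hypothetical stand-ins, and the cluster sizes are made up. It shows that with predicates alone the store still holds every pod, while a selector applied when the cache is filled keeps only the labeled ones.

```go
package main

import "fmt"

// Pod is a hypothetical stand-in for corev1.Pod, reduced to what this sketch needs.
type Pod struct {
	Name   string
	Labels map[string]string
}

// matches reports whether the pod carries every key/value pair in selector
// (equality-based matching only). A nil selector matches every pod.
func matches(p Pod, selector map[string]string) bool {
	for k, v := range selector {
		if got, ok := p.Labels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// fillCache models an informer store: it keeps every listed pod that matches
// selector. Passing nil models the unfiltered, cluster-wide informer.
func fillCache(listed []Pod, selector map[string]string) []Pod {
	var store []Pod
	for _, p := range listed {
		if matches(p, selector) {
			store = append(store, p)
		}
	}
	return store
}

// makeCluster builds total pods, of which the first managed carry the grove label.
func makeCluster(total, managed int) []Pod {
	pods := make([]Pod, 0, total)
	for i := 0; i < total; i++ {
		labels := map[string]string{"app": "other"}
		if i < managed {
			labels = map[string]string{"app.kubernetes.io/managed-by": "grove-operator"}
		}
		pods = append(pods, Pod{Name: fmt.Sprintf("pod-%d", i), Labels: labels})
	}
	return pods
}

func main() {
	selector := map[string]string{"app.kubernetes.io/managed-by": "grove-operator"}
	cluster := makeCluster(50000, 40) // made-up sizes for illustration

	// Predicate-only: every pod is stored; filtering happens per event afterwards.
	unfiltered := fillCache(cluster, nil)
	reconciled := 0
	for _, p := range unfiltered {
		if matches(p, selector) {
			reconciled++
		}
	}

	// Cache-level selector: non-matching pods are never stored at all.
	filtered := fillCache(cluster, selector)

	fmt.Println(len(unfiltered), reconciled, len(filtered))
}
```

Both approaches reconcile the same pods; only the size of the store differs. One trade-off to keep in mind, as far as I understand controller-runtime: reads through the manager's cached client would then only see pods that match the selector, so any code path that looks up unlabeled pods would need a different client.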

Environment

  • Kubernetes version
  • Grove version: v0.1.0-alpha.7
  • Scheduler details
  • Cloud provider or hardware configuration
  • Tools that you are using Grove together with
  • Anything else that is relevant
