Skip to content

Pods should opt-in to webhook mutation directly #601

@enj

Description

@enj

Is your feature request related to a problem? Please describe.

Today, the default configuration of the webhook is to intercept all pod mutations:

failurePolicy: {{ .Values.mutatingWebhookFailurePolicy }}
matchPolicy: Equivalent
name: mutation.azure-workload-identity.io
objectSelector: {{- toYaml .Values.mutatingWebhookObjectSelector | nindent 4 }}
rules:
- apiGroups:
- ""
apiVersions:
- v1
operations:
- CREATE
- UPDATE
resources:
- pods

and the default mutatingWebhookObjectSelector is empty:

| mutatingWebhookObjectSelector | The label selector to further refine which namespaced resources will be selected by the webhook. | `` |

The webhook uses the service account associated with the pod to filter the mutation logic:

// check if the service account has the annotation
if !isServiceAccountAnnotated(serviceAccount) {
logger.Info("service account not annotated")
return admission.Allowed("service account not annotated")
}

While a user could specify a custom mutatingWebhookObjectSelector and mutatingWebhookFailurePolicy, I believe we can have better defaults around this flow. The webhook should only be invoked if we actually expect to mutate the pod. This will reduce the latency impact of the webhook and allow us to fail closed (not as a security boundary, but instead to prevent pods from starting in an unexpected state).

Describe the solution you'd like

The defaults should be changed so that:

objectSelector -> azure.workload.identity/use

UseWorkloadIdentityLabel = "azure.workload.identity/use"

failurePolicy -> Fail

The webhook would be updated to mutate the pod if either pod or the service account has the label (eventually these checks will be removed altogether and the webhook would always mutate the pod).

To limit the impact of this breaking change, we could do a few things:

  • Attempt to issue warnings and audit annotations to help users know that this change is coming
  • Template the mutatingWebhookObjectSelector YAML so that it can be set to the empty string as a way to restore the old behavior for a period of time
  • Have the change behind a feature gate for some period of time (and the gate's default state would change from off to on over time)

Describe alternatives you've considered

Not sure what other options there are.

Additional context

xref #540

Metadata

Metadata

Assignees

Labels

Type

No type

Projects

No projects

Relationships

None yet

Development

No branches or pull requests

Issue actions