152 changes: 152 additions & 0 deletions Documentation/proposals/accepted/202405-agent-daemonset.md

* Owners:
* [haanhvu](https://github.com/haanhvu)
* [slashexx](https://github.com/slashexx)

* Status:
* `Accepted`
* Related Tickets:

We will add a new `mode` field that accepts either `StatefulSet` or `DaemonSet`, with `StatefulSet` being the default. If the DaemonSet mode is activated (`mode: DaemonSet`), none of the unrelated fields listed above will be accepted. In the MVP, we will simply fail the reconciliation if any of those fields are set. We will prevent users from directly switching a live StatefulSet setup to DaemonSet, because that might break their workload if they forget to unset the unsupported fields. As a follow-up, we will leverage validation rules written in [Kubernetes' Common Expression Language (CEL)](https://kubernetes.io/docs/reference/using-api/cel/). Only then will we allow switching from a live StatefulSet setup to DaemonSet. We already have an issue for CEL [here](https://github.com/prometheus-operator/prometheus-operator/issues/5079).
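
For illustration, a minimal DaemonSet-mode manifest could look like the following sketch (the resource name and the PodMonitor selector labels are placeholders, not prescribed values):

```yaml
apiVersion: monitoring.coreos.com/v1alpha1
kind: PrometheusAgent
metadata:
  name: agent
spec:
  mode: DaemonSet
  # Targets are selected via PodMonitor only; StatefulSet-only fields
  # such as replicas, shards, and storage must stay unset in this mode.
  podMonitorSelector:
    matchLabels:
      team: example
```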

#### 6.1.1 CEL Validation rules

When `mode: DaemonSet` is set, the following CEL rules will be applied to reject these fields:

- `replicas`
- `storage`
- `shards`
- `persistentVolumeClaimRetentionPolicy`
- `scrapeConfigSelector`
- `scrapeConfigNamespaceSelector`
- `probeSelector`
- `probeNamespaceSelector`
- `serviceMonitorSelector`
- `serviceMonitorNamespaceSelector`
- `additionalScrapeConfigs`

This is implemented by adding `x-kubernetes-validations` rules such as:

```yaml
x-kubernetes-validations:
- rule: "!(has(self.mode) && self.mode == 'DaemonSet' && has(self.replicas))"
  message: "replicas cannot be set when mode is DaemonSet"
- rule: "!(has(self.mode) && self.mode == 'DaemonSet' && has(self.storage))"
  message: "storage cannot be set when mode is DaemonSet"
- rule: "!(has(self.mode) && self.mode == 'DaemonSet' && has(self.shards))"
  message: "shards cannot be set when mode is DaemonSet"
- rule: "!(has(self.mode) && self.mode == 'DaemonSet' && has(self.persistentVolumeClaimRetentionPolicy))"
  message: "persistentVolumeClaimRetentionPolicy cannot be set when mode is DaemonSet"
- rule: "!(has(self.mode) && self.mode == 'DaemonSet' && has(self.scrapeConfigSelector))"
  message: "scrapeConfigSelector cannot be set when mode is DaemonSet"
- rule: "!(has(self.mode) && self.mode == 'DaemonSet' && has(self.scrapeConfigNamespaceSelector))"
  message: "scrapeConfigNamespaceSelector cannot be set when mode is DaemonSet"
- rule: "!(has(self.mode) && self.mode == 'DaemonSet' && has(self.probeSelector))"
  message: "probeSelector cannot be set when mode is DaemonSet"
- rule: "!(has(self.mode) && self.mode == 'DaemonSet' && has(self.probeNamespaceSelector))"
  message: "probeNamespaceSelector cannot be set when mode is DaemonSet"
- rule: "!(has(self.mode) && self.mode == 'DaemonSet' && has(self.serviceMonitorSelector))"
  message: "serviceMonitorSelector cannot be set when mode is DaemonSet"
- rule: "!(has(self.mode) && self.mode == 'DaemonSet' && has(self.serviceMonitorNamespaceSelector))"
  message: "serviceMonitorNamespaceSelector cannot be set when mode is DaemonSet"
- rule: "!(has(self.mode) && self.mode == 'DaemonSet' && has(self.additionalScrapeConfigs))"
  message: "additionalScrapeConfigs cannot be set when mode is DaemonSet"
```
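
As a usage sketch, applying a manifest like the following would then be rejected at admission time by the first rule (the resource name is illustrative):

```yaml
apiVersion: monitoring.coreos.com/v1alpha1
kind: PrometheusAgent
metadata:
  name: invalid-agent
spec:
  mode: DaemonSet
  replicas: 3   # rejected: "replicas cannot be set when mode is DaemonSet"
```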

#### 6.1.2 Runtime Validation Logic as Fallback

CEL validation will provide immediate feedback during `kubectl apply`, but we will also need runtime validation logic in the controller as a fallback mechanism. This fallback will be integrated directly into the `PrometheusAgent` reconciler loop.

This is mainly because:
1. CEL validation requires Kubernetes 1.25+, so not all users might have CEL-capable clusters.

   > **simonpasquier (Contributor):** CEL has been there for quite some time now (1.25 is 3 years old), so we can also update our minimum requirements (no cloud provider should be offering 1.25 by now).
   >
   > **slashexx (Member, Author, Aug 31, 2025):** @simonpasquier then should we drop the runtime validations entirely? We already have a PR in place. Or should we specify that while runtime validations exist, most validations will be handled by CEL? Exceptions include bare-metal environments.
   >
   > **simonpasquier:** Even when running self-managed Kubernetes you should be upgrading to a reasonably recent version :) I don't have a strong opinion, but I wouldn't mind if we need to require at least 1.25. Maybe a discussion item for the next office hours meeting.


2. It provides a defense-in-depth mechanism against misconfigurations.

3. It yields a more detailed error response in case the first layer of defense fails.

```go
if spec.Mode == "DaemonSet" {
	if spec.Replicas != nil {
		return fmt.Errorf("cannot configure replicas when using DaemonSet mode")
	}
	if spec.Storage != nil {
		return fmt.Errorf("cannot configure storage when using DaemonSet mode")
	}
	// Shards is a pointer field; reject any explicit value, matching the CEL rule.
	if spec.Shards != nil {
		return fmt.Errorf("cannot configure shards when using DaemonSet mode")
	}
	if spec.PersistentVolumeClaimRetentionPolicy != nil {
		return fmt.Errorf("cannot configure persistentVolumeClaimRetentionPolicy when using DaemonSet mode")
	}
	if spec.ScrapeConfigSelector != nil {
		return fmt.Errorf("cannot configure scrapeConfigSelector when using DaemonSet mode")
	}
	if spec.ProbeSelector != nil {
		return fmt.Errorf("cannot configure probeSelector when using DaemonSet mode")
	}
	if spec.ProbeNamespaceSelector != nil {
		return fmt.Errorf("cannot configure probeNamespaceSelector when using DaemonSet mode")
	}
	if spec.ScrapeConfigNamespaceSelector != nil {
		return fmt.Errorf("cannot configure scrapeConfigNamespaceSelector when using DaemonSet mode")
	}
	if spec.ServiceMonitorSelector != nil {
		return fmt.Errorf("cannot configure serviceMonitorSelector when using DaemonSet mode")
	}
	if spec.ServiceMonitorNamespaceSelector != nil {
		return fmt.Errorf("cannot configure serviceMonitorNamespaceSelector when using DaemonSet mode")
	}
	if spec.AdditionalScrapeConfigs != nil {
		return fmt.Errorf("cannot configure additionalScrapeConfigs when using DaemonSet mode")
	}
}
```
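
To make the fallback concrete, here is a small self-contained sketch; `agentSpec` and `validateDaemonSetMode` are illustrative stand-ins for the real `PrometheusAgentSpec` type and the reconciler check, not the operator's actual API:

```go
package main

import (
	"errors"
	"fmt"
)

// agentSpec is a hypothetical, trimmed-down stand-in for the real
// PrometheusAgentSpec; only two of the forbidden fields are modeled.
type agentSpec struct {
	Mode     string
	Replicas *int32
	Storage  *string
}

// validateDaemonSetMode mirrors the reconciler fallback: it rejects
// StatefulSet-only fields whenever DaemonSet mode is requested.
func validateDaemonSetMode(spec agentSpec) error {
	if spec.Mode != "DaemonSet" {
		return nil
	}
	if spec.Replicas != nil {
		return errors.New("cannot configure replicas when using DaemonSet mode")
	}
	if spec.Storage != nil {
		return errors.New("cannot configure storage when using DaemonSet mode")
	}
	return nil
}

func main() {
	replicas := int32(3)
	// Setting replicas in DaemonSet mode fails validation.
	fmt.Println(validateDaemonSetMode(agentSpec{Mode: "DaemonSet", Replicas: &replicas}))
	// The same field is fine in StatefulSet mode.
	fmt.Println(validateDaemonSetMode(agentSpec{Mode: "StatefulSet", Replicas: &replicas}))
}
```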

### 6.2. Node detection

As pointed out in [Danny from GMP’s talk](https://www.youtube.com/watch?v=yk2aaAyxgKw), to let the Prometheus Agent DaemonSet know which node it’s on, we can use [Kubernetes’ downward API](https://kubernetes.io/docs/tasks/inject-data-application/environment-variable-expose-pod-information/). In the `config-reloader` container, we can mount the node name as an environment variable like this:
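
A minimal sketch of that mount (the container and variable names below are illustrative; the standard downward-API `fieldRef` on `spec.nodeName` is the load-bearing part):

```yaml
containers:
- name: config-reloader
  env:
  - name: NODE_NAME
    valueFrom:
      fieldRef:
        fieldPath: spec.nodeName
```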

We’ve also considered using a relabel config that filters pods by the `__meta_kubernetes_pod_node_name` label. However, we didn’t choose this option because it filters pods only after discovering all the pods matched by the PodMonitor, which increases the load on the Kubernetes API server.
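
For reference, the rejected alternative would have looked roughly like this (a sketch, assuming the node name is substituted into the configuration, e.g. by the config reloader; job name and substitution mechanism are illustrative):

```yaml
scrape_configs:
- job_name: podMonitor/example
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  # Keep only targets on this agent's node. Every pod is still discovered
  # first, which is why this approach stresses the API server.
  - source_labels: [__meta_kubernetes_pod_node_name]
    action: keep
    regex: $NODE_NAME
```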

## Secondary/Extended goal (new feature gate)

> **Note:** We are exploring the integration of ServiceMonitor support for DaemonSet mode using EndpointSlice as an experimental feature. This exploration will determine feasibility and performance, and if viable, it may be introduced behind a separate feature gate. This approach allows the main DaemonSet mode to reach GA independently of this feature.

### ServiceMonitor Support with EndpointSlice

To enable ServiceMonitor support for DaemonSet mode while addressing the performance concerns mentioned in section 5, we will implement EndpointSlice-based service discovery:

#### EndpointSlice Discovery Implementation

The PrometheusAgent CRD already supports a `serviceDiscoveryRole` field that can be set to `EndpointSlice`:

```yaml
apiVersion: monitoring.coreos.com/v1alpha1
kind: PrometheusAgent
spec:
mode: DaemonSet
serviceDiscoveryRole: EndpointSlice # Use EndpointSlice instead of Endpoints
serviceMonitorSelector:
matchLabels:
team: platform
```

When `serviceDiscoveryRole: EndpointSlice` is specified, the generated Prometheus configuration will use:

```yaml
scrape_configs:
- job_name: serviceMonitor/default/my-service/0
kubernetes_sd_configs:
- role: endpointslice # Instead of "endpoints"
namespaces:
names: [default]
```

#### Performance Benefits

EndpointSlice provides significant performance improvements over classic Endpoints:
* **Scalability**: an EndpointSlice object holds at most 100 endpoints by default (configurable up to 1000 via the kube-controller-manager’s `--max-endpoints-per-slice` flag), preventing massive objects
* **Efficiency**: Multiple smaller objects reduce memory usage and network traffic
* **Parallel Processing**: Multiple EndpointSlice objects can be processed in parallel
* **Reduced API Server Load**: Less stress on Kubernetes API server with distributed endpoint information

#### Implementation Details

The implementation properly handles EndpointSlice support by checking both the user's `serviceDiscoveryRole` setting and cluster compatibility. The logic involves:

```go
// Check if THIS PrometheusAgent wants EndpointSlice discovery.
cpf := p.GetCommonPrometheusFields()
if ptr.Deref(cpf.ServiceDiscoveryRole, monitoringv1.EndpointsRole) == monitoringv1.EndpointSliceRole {
	if c.endpointSliceSupported {
		opts = append(opts, prompkg.WithEndpointSliceSupport())
	} else {
		// Warn that EndpointSlice was requested but the cluster doesn't
		// support it, then fall back to classic Endpoints discovery.
		c.logger.Warn("EndpointSlice requested but not supported by Kubernetes cluster")
	}
}
```

## 7. Action Plan

For the implementation, we’ll do what we detailed in the How section. The logic common to the StatefulSet and DaemonSet modes will be extracted into one place. We will have a separate `daemonset.go` for the DaemonSet-specific logic.