Skip to content

smon status update after reconcillation#7767

Merged
simonpasquier merged 35 commits intoprometheus-operator:mainfrom
yp969803:issue7685new
Aug 7, 2025
Merged

smon status update after reconcillation#7767
simonpasquier merged 35 commits intoprometheus-operator:mainfrom
yp969803:issue7685new

Conversation

@yp969803
Copy link
Contributor

@yp969803 yp969803 commented Aug 1, 2025

Description

smon status update after reconcillation

Closes: #7685

Type of change

What type of changes does your code introduce to the Prometheus operator? Put an x in the box that apply.

  • CHANGE (fix or feature that would cause existing functionality to not work as expected)
  • FEATURE (non-breaking change which adds functionality)
  • BUGFIX (non-breaking change which fixes an issue)
  • ENHANCEMENT (non-breaking change which improves existing functionality)
  • NONE (if none of the other choices apply. Example, tooling, build system, CI, docs, etc.)

Verification

Please check the Prometheus-Operator testing guidelines for recommendations about automated tests.

Changelog entry

Please put a one-line changelog entry below. This will be copied to the changelog file during the release process.


@yp969803
Copy link
Contributor Author

yp969803 commented Aug 1, 2025

@simonpasquier PTAL

// The map's key is the full resource name in the "<namespace>/<name>" form.
type ResourcesSelection[T configurationResource] map[string]struct {
// Resource is a generic type that holds a configuration resource with its validation status.
type Resource[T configurationResource] struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resource -> ConfigurationResource?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and configurationResource -> TypedConfigurationResource?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

reason string // Reason for rejection; empty if accepted.
}

func NewResource[T configurationResource](resource T, err error, reason string) Resource[T] {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't seem to be used?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Group: ru.gvr.Group,
Conditions: []monitoringv1.ConfigResourceCondition{condition},
}
switch r := any(res.resource).(type) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels a bit awkward to have a generic type and then fallback to type-casting.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done the changes

// resources selected by the Prometheus or PrometheusAgent.
func (ru *ConfigResourceSyncer[T]) AddStatus(ctx context.Context, p metav1.Object, resources ResourcesSelection[T]) error {
for key, res := range resources {
condition := monitoringv1.ConfigResourceCondition{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(suggestion) can we have a receiver function of Resource which would return a slice of monitoringv1.ConfigResourceCondition? E.g.

func (r *Resource) Conditions() []monitoringv1.ConfigResourceCondition {
  ...
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

binding.Conditions = []monitoringv1.ConfigResourceCondition{condition}
r.Status.Bindings = append(r.Status.Bindings, binding)
}
_, err := ru.mclient.MonitoringV1().ServiceMonitors(r.Namespace).ApplyStatus(ctx, ApplyConfigurationFromServiceMonitor(r), metav1.ApplyOptions{FieldManager: operator.PrometheusOperatorFieldManager, Force: true})
Copy link
Contributor

@simonpasquier simonpasquier Aug 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eventually we should try not updating the status if nothing has changed. But in the scope of this PR, it's ok to blindly apply the new status.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

}
_, err := ru.mclient.MonitoringV1().ServiceMonitors(r.Namespace).ApplyStatus(ctx, ApplyConfigurationFromServiceMonitor(r), metav1.ApplyOptions{FieldManager: operator.PrometheusOperatorFieldManager, Force: true})
if err != nil {
ru.logger.Debug("Failed to update serviceMonitor status", "error", err, "key", key)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we'll discuss further what's the best approach to deal with such errors (only log or retry?) but at the minimum it should be a warning.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok done

if binding.Namespace == p.GetNamespace() &&
binding.Name == p.GetName() &&
binding.Resource == monitoringv1.PrometheusName {
if binding.Conditions[0].ObservedGeneration != condition.ObservedGeneration {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this test isn't needed for now: the resource's generation might have not changed but still the resource's status has changed (e.g. a secret reference which was missing is now valid).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

_, err := ru.mclient.MonitoringV1().ServiceMonitors(r.Namespace).ApplyStatus(ctx, ApplyConfigurationFromServiceMonitor(r), metav1.ApplyOptions{FieldManager: operator.PrometheusOperatorFieldManager, Force: true})
if err != nil {
ru.logger.Debug("Failed to update serviceMonitor status", "error", err, "key", key)
return err
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, we should either log the error or return it but not doing both.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

return &pStatus, nil
}

func (r *ConfigurationResource[T]) condition(observedGeneration int64) []monitoringv1.ConfigResourceCondition {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we move this next to the declaration of ConfigurationResource?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

return &pStatus, nil
}

func (r *ConfigurationResource[T]) condition(observedGeneration int64) []monitoringv1.ConfigResourceCondition {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(nit)

Suggested change
func (r *ConfigurationResource[T]) condition(observedGeneration int64) []monitoringv1.ConfigResourceCondition {
func (r *ConfigurationResource[T]) conditions(observedGeneration int64) []monitoringv1.ConfigResourceCondition {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

}

// AddServiceMonitorStatus add the latest status in serviceMonitors resources selected by the Prometheus or PrometheusAgent.
func AddServiceMonitorStatus(ctx context.Context, p metav1.Object, c *ConfigResourceSyncer, resources ResourcesSelection[*monitoringv1.ServiceMonitor]) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(suggestion) can we make the function work on a single ServiceMonitor?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

// The map's key is the full resource name in the "<namespace>/<name>" form.
type ResourcesSelection[T configurationResource] map[string]struct {
// ConfigurationResource is a generic type that holds a configuration resource with its validation status.
type ConfigurationResource[T TypedConfigurationResource] struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
type ConfigurationResource[T TypedConfigurationResource] struct {
type TypedConfigurationResource[T ConfigurationResource] struct {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

// TypedConfigurationResource is a type constraint that permits only the specific pointer types for configuration resources
// selectable by Prometheus or PrometheusAgent.
type configurationResource interface {
type TypedConfigurationResource interface {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest that we use the Typed prefix for types which use type parameters (see below).

Suggested change
type TypedConfigurationResource interface {
type ConfigurationResource interface {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

}

// ResourcesSelection represents a map of configuration resources selected by Prometheus or PrometheusAgent.
type ResourcesSelection[T TypedConfigurationResource] map[string]ConfigurationResource[T]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
type ResourcesSelection[T TypedConfigurationResource] map[string]ConfigurationResource[T]
type TypedResourcesSelection[T ConfigurationResource] map[string]TypedConfigurationResource[T]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

}

configResourceSyncer := prompkg.NewConfigResourceSyncer(monitoringv1.SchemeGroupVersion.WithResource(monitoringv1.PrometheusName), c.mclient, logger)
prompkg.AddServiceMonitorStatus(ctx, p, configResourceSyncer, resources.sMons)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding to my comment on prompkg.AddServiceMonitorStatus I'd recommend that we iterate over the config resources in this function

Suggested change
prompkg.AddServiceMonitorStatus(ctx, p, configResourceSyncer, resources.sMons)
for _, sMon := range resources.sMons {
err:= prompkg.UpdateServiceMonitorStatus(ctx, p, configResourceSyncer, sMon)
...
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@yp969803
Copy link
Contributor Author

yp969803 commented Aug 6, 2025

@simonpasquier done the changes

}

// UpdateServiceMonitorStatus add the latest status in serviceMonitor selected by the Prometheus or PrometheusAgent.
func UpdateServiceMonitorStatus(ctx context.Context, p metav1.Object, c *ConfigResourceSyncer, res TypedConfigurationResource[*monitoringv1.ServiceMonitor]) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(nit) I would put the resource syncer as the second argument

Suggested change
func UpdateServiceMonitorStatus(ctx context.Context, p metav1.Object, c *ConfigResourceSyncer, res TypedConfigurationResource[*monitoringv1.ServiceMonitor]) error {
func UpdateServiceMonitorStatus(
ctx context.Context,
c *ConfigResourceSyncer,
p metav1.Object,
res TypedConfigurationResource[*monitoringv1.ServiceMonitor]) error {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

binding := &smon.Status.Bindings[i]
if binding.Namespace == p.GetNamespace() &&
binding.Name == p.GetName() &&
binding.Resource == monitoringv1.PrometheusName {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we also check the group?

Suggested change
binding.Resource == monitoringv1.PrometheusName {
binding.Resource == c.gvr.Resource {

Copy link
Contributor Author

@yp969803 yp969803 Aug 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nope, not needed i think

Conditions: conditions,
})
}
_, err := c.mclient.MonitoringV1().ServiceMonitors(smon.Namespace).ApplyStatus(ctx, ApplyConfigurationFromServiceMonitor(smon), metav1.ApplyOptions{FieldManager: operator.PrometheusOperatorFieldManager, Force: true})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect that if we set the proper markers on , we could apply the updated binding instead of passing the full list of bindings.

type ConfigResourceStatus struct {
    // The list of workload resources (Prometheus or PrometheusAgent) which select the configuration resource.
    // +listType=map
    // +listMapKey=group
    // +listMapKey=resource
    // +listMapKey=name
    // +listMapKey=namespace
    // +optional
    Bindings []WorkloadBinding `json:"bindings,omitempty"` 
}

return nil
}

// updateConfigResourcesStatus updates the status of the selected configuration resources (serviceMonitor, podMonitor, scrapeClass and podMonitor).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// updateConfigResourcesStatus updates the status of the selected configuration resources (serviceMonitor, podMonitor, scrapeClass and podMonitor).
// updateConfigResourcesStatus updates the status of the selected configuration resources (ServiceMonitor, PodMonitor, ScrapeConfig and PodMonitor).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

configResourceSyncer := prompkg.NewConfigResourceSyncer(monitoringv1.SchemeGroupVersion.WithResource(monitoringv1.PrometheusName), c.mclient)
for key, sm := range resources.sMons {
if err := prompkg.UpdateServiceMonitorStatus(ctx, p, configResourceSyncer, sm); err != nil {
logger.Warn("Failed to update serviceMonitor status", "error", err, "key", key)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
logger.Warn("Failed to update serviceMonitor status", "error", err, "key", key)
logger.Warn("Failed to update ServiceMonitor status", "error", err, "key", key)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

mclient monitoringclient.Interface
}

func NewConfigResourceSyncer(gvr schema.GroupVersionResource, mclient monitoringclient.Interface) *ConfigResourceSyncer {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(suggestion) the workload object could be an input for the constructor since it doesn't change when we call UpdateServiceMonitorStatus().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

didnt get what you are trying to say

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant something like:

Suggested change
func NewConfigResourceSyncer(gvr schema.GroupVersionResource, mclient monitoringclient.Interface) *ConfigResourceSyncer {
func NewConfigResourceSyncer(gvr schema.GroupVersionResource, mclient monitoringclient.Interface, workload metav1.Object) *ConfigResourceSyncer {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines +1041 to +1044
if len(resources.sMons) == 0 {
return
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(nit) we don't need to check if the slice is empty.

Suggested change
if len(resources.sMons) == 0 {
return
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@simonpasquier
Copy link
Contributor

Tested the PR quickly on my kind cluster and it's so cool, well done!

image image

return &pStatus, nil
}

// UpdateServiceMonitorStatus add the latest status in serviceMonitor selected by the Prometheus or PrometheusAgent.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// UpdateServiceMonitorStatus add the latest status in serviceMonitor selected by the Prometheus or PrometheusAgent.
// UpdateServiceMonitorStatus updates the status binding of the serviceMonitor for the given workload.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

}
}

func (r *TypedConfigurationResource[T]) conditions(observedGeneration int64) []monitoringv1.ConfigResourceCondition {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(nit) can we move this close to the TypedConfigurationResource definition?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

_, err = framework.MonClientV1.ServiceMonitors(ns).Create(ctx, smon, v1.CreateOptions{})
require.NoError(t, err)

time.Sleep(10 * time.Second)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to use a poll-retry construct, sleeping for a fixed duration is too brittle (and also costly). See examples with wait.PollUntilContextTimeout (for instance framework.WaitForResourceAvailable()).
I suggest to add a WaitForConfigurationResourceCondtionmethod to the test/framework package.

require.NoError(t, err)
require.Equal(t, 1, len(sm.Status.Bindings))
require.Equal(t, name, sm.Status.Bindings[0].Name)
require.Equal(t, ns, sm.Status.Bindings[0].Namespace)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'd need to check the Accepted condition.
Can we also add a negative test (e.g. configure an invalid secret reference and check that Accepted=false)?

@yp969803
Copy link
Contributor Author

yp969803 commented Aug 7, 2025

@simonpasquier ready to merge this

@simonpasquier simonpasquier merged commit e3ebfc5 into prometheus-operator:main Aug 7, 2025
23 checks passed
miinsun pushed a commit to miinsun/prometheus-operator that referenced this pull request Aug 15, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Development

Successfully merging this pull request may close these issues.

Update serviceMonitor status subresource after reconcile

2 participants