Ignore reconcile errors that occur because a pod is being terminated#1233
Conversation
Signed-off-by: Kevin Earls <kearls@redhat.com>
Signed-off-by: Kevin Earls <kearls@redhat.com>
| func (r *OpenTelemetryCollectorReconciler) RunTasks(ctx context.Context, params reconcile.Params) error { | ||
| for _, task := range r.tasks { | ||
| if err := task.Do(ctx, params); err != nil { | ||
| r.log.Error(err, fmt.Sprintf("failed to reconcile %s", task.Name)) |
There was a problem hiding this comment.
How often does this logline occur when a pod is terminated?
There was a problem hiding this comment.
Many. :-). Like for everything the reconciler touches. To be honest, it's not that big a deal, and probably something that will only be seen when running kuttl tests. But I found it very annoying, so I submitted the PR.
There was a problem hiding this comment.
Would it make sense to end the reconciling with by returing nil if the reason is a terminating pod? That way the user gets informed that something did not work and we would get rid of all the unnecessary log lines?
Signed-off-by: Kevin Earls <kearls@redhat.com>
| for _, task := range r.tasks { | ||
| if err := task.Do(ctx, params); err != nil { | ||
| // If we get an error that occur because a pod is being terminated then exit this loop | ||
| if apierrors.IsForbidden(err) && strings.Contains(err.Error(), "because it is being terminated") { |
There was a problem hiding this comment.
one last thing, should we print the termination logline before? And maybe an extra hint, that we cancel the reconciliation?
There was a problem hiding this comment.
@frzifus what log level should I print that at?
There was a problem hiding this comment.
i thought maybe:
r.log.Error(err, fmt.Sprintf("failed to reconcile %s", task.Name))And maybe an additional line that says that the reconciliation will be canceled?
There was a problem hiding this comment.
I am a bit skeptical about checking the string because it is being terminated". This could easily change in the future
There was a problem hiding this comment.
You might be able to check for v1.NamespaceTerminatingCause in the error: https://github.com/kubernetes/apiserver/blob/v0.26.0-rc.0/pkg/admission/plugin/namespace/lifecycle/admission.go#L177
Signed-off-by: Kevin Earls <kearls@redhat.com>
|
@pavolloffay Can you merge this? |
Signed-off-by: Kevin Earls <kearls@redhat.com>
Signed-off-by: Kevin Earls <kearls@redhat.com>
| if err := task.Do(ctx, params); err != nil { | ||
| // If we get an error that occurs because a pod is being terminated, then exit this loop | ||
| if apierrors.IsForbidden(err) && apierrors.HasStatusCause(err, corev1.NamespaceTerminatingCause) { | ||
| r.log.V(2).Info("Exiting reconcile loop because namespace", params.Instance.Namespace, "is being terminated") |
There was a problem hiding this comment.
| r.log.V(2).Info("Exiting reconcile loop because namespace", params.Instance.Namespace, "is being terminated") | |
| r.log.V(2).Info("Exiting reconcile loop because namespace is being terminated", "namespace", params.Instance.Namespace) |
(see https://pkg.go.dev/github.com/go-logr/logr#Logger.Info)
Signed-off-by: Kevin Earls <kearls@redhat.com>
Signed-off-by: Kevin Earls <kearls@redhat.com>
…pen-telemetry#1233) * Ignore reconcile errors that occur because a pod is being terminated Signed-off-by: Kevin Earls <kearls@redhat.com> * Appease the all powerfull linter Signed-off-by: Kevin Earls <kearls@redhat.com> * Change behavior to end reconcile loop if pod has been terminated Signed-off-by: Kevin Earls <kearls@redhat.com> * Print a log message if we exit the reconciler loop Signed-off-by: Kevin Earls <kearls@redhat.com> * Look for NamespaceTerminatingCause Signed-off-by: Kevin Earls <kearls@redhat.com> * Appease the almighty linter Signed-off-by: Kevin Earls <kearls@redhat.com> * Fix log message Signed-off-by: Kevin Earls <kearls@redhat.com> * Skip flaky test Signed-off-by: Kevin Earls <kearls@redhat.com> Signed-off-by: Kevin Earls <kearls@redhat.com> Co-authored-by: Ben B <bongartz@klimlive.de>
…pen-telemetry#1233) * Ignore reconcile errors that occur because a pod is being terminated Signed-off-by: Kevin Earls <kearls@redhat.com> * Appease the all powerfull linter Signed-off-by: Kevin Earls <kearls@redhat.com> * Change behavior to end reconcile loop if pod has been terminated Signed-off-by: Kevin Earls <kearls@redhat.com> * Print a log message if we exit the reconciler loop Signed-off-by: Kevin Earls <kearls@redhat.com> * Look for NamespaceTerminatingCause Signed-off-by: Kevin Earls <kearls@redhat.com> * Appease the almighty linter Signed-off-by: Kevin Earls <kearls@redhat.com> * Fix log message Signed-off-by: Kevin Earls <kearls@redhat.com> * Skip flaky test Signed-off-by: Kevin Earls <kearls@redhat.com> Signed-off-by: Kevin Earls <kearls@redhat.com> Co-authored-by: Ben B <bongartz@klimlive.de>
Signed-off-by: Kevin Earls kearls@redhat.com
This fixes #1219 by not logging errors that occur because we are trying to reconcile something in a namespace that is being terminated.