Skip to content

Only remove the cleanup finalizer if the cleanup succeeds#7286

Merged
cert-manager-prow[bot] merged 1 commit into
cert-manager:masterfrom
inteon:finalizer_on_failure
Feb 5, 2026
Merged

Only remove the cleanup finalizer if the cleanup succeeds#7286
cert-manager-prow[bot] merged 1 commit into
cert-manager:masterfrom
inteon:finalizer_on_failure

Conversation

@inteon

@inteon inteon commented Sep 18, 2024

Copy link
Copy Markdown
Member

This addresses two problems:

  • The finalizer was being removed unconditionally, regardless of whether solver.Cleanup succeeds
  • The finalizer was assumed to be the first in the list of finalizers, potentially resulting in the removal of foreign finalizers and resulting in the cleanup finalizer never being removed.
  • solver.Cleanup was invoked inconsistently from two different places, in handleFinalizer and in an acme.IsFinalState block, but in the former, the Status.Processing and Status.Presented fields were not being reset and in the latter case, the finalizer wasn't being removed if the cleanup succeeded.

Part of:

supersedes #5126

Kind

/kind cleanup

Release Note

NONE

@cert-manager-prow cert-manager-prow Bot added kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. release-note-none Denotes a PR that doesn't merit a release note. dco-signoff: yes Indicates that all commits in the pull request have the valid DCO sign-off message. area/acme Indicates a PR directly modifies the ACME Issuer code area/acme/dns01 Indicates a PR modifies ACME DNS01 provider code area/acme/http01 Indicates a PR modifies ACME HTTP01 provider code size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Sep 18, 2024
@inteon inteon force-pushed the finalizer_on_failure branch 2 times, most recently from 0470fa7 to 41f5ebc Compare September 18, 2024 18:58
@inteon inteon force-pushed the finalizer_on_failure branch from 41f5ebc to 95eeab9 Compare September 19, 2024 10:05

@SgtCoDFish SgtCoDFish left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of this looks pretty self explanatory and makes sense. Just one question from me before I'd feel comfortable merging!

solver, err := c.solverFor(ch.Spec.Type)
if err != nil {
log.Error(err, "error getting solver for challenge")
return nil

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Previously we swallowed the error and returned nil. Same in the if err := solver.CleanUp(ctx, ch); err != nil block below.

What's the motivation for the change to now start returning the error here (and below)? What effect will it have?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two situations in which a cleanup is performed:

  • acme.IsFinalState(ch.Status.State)
  • !ch.DeletionTimestamp.IsZero()

The old code swallowed errors when !ch.DeletionTimestamp.IsZero() but returned errors when acme.IsFinalState(ch.Status.State). So this change will only affect the first case.
In that case, the deletion of the challenge resource will be blocked until the handleFinalizer is successful. An error will be returned if something went wrong and the CleanUp function will be retried.
The previous behavior for the first case was that the CleanUp function was called once and the error was only logged and the finalizer deleted unconditionally.

@inteon inteon Sep 19, 2024

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, the new behavior will block deletion until the cleanup succeeds & keep retrying (which was what is described here: #3640 and what the original PR tried to solve #5126).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 I just found there is a counter-issue: #7234
Challenges that are stuck in their cleanup phase: cleanup is failing and the resource is not being deleted...

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@erikgb convinced me that we should still move forward. I updated the PR.

@inteon

inteon commented Sep 19, 2024

Copy link
Copy Markdown
Member Author

/hold
There might be an unforeseen issue: #7286 (comment)

@cert-manager-prow cert-manager-prow Bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Sep 19, 2024
@cert-manager-bot

Copy link
Copy Markdown
Contributor

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
/lifecycle stale

@cert-manager-prow cert-manager-prow Bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 18, 2024
@cert-manager-bot

Copy link
Copy Markdown
Contributor

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
/lifecycle rotten
/remove-lifecycle stale

@cert-manager-prow cert-manager-prow Bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 17, 2025
@inteon inteon added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. labels Jan 20, 2025
@cert-manager-prow cert-manager-prow Bot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label May 9, 2025
@inteon inteon force-pushed the finalizer_on_failure branch from 95eeab9 to 9755045 Compare February 4, 2026 09:38
@cert-manager-prow cert-manager-prow Bot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Feb 4, 2026
@inteon

inteon commented Feb 4, 2026

Copy link
Copy Markdown
Member Author

Looks like issue #7234 will be solved in a different manner: #7234 (comment)

I think we are able to move forward with this PR.

/unhold

@cert-manager-prow cert-manager-prow Bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 4, 2026
@inteon inteon force-pushed the finalizer_on_failure branch from 9755045 to 3aca768 Compare February 4, 2026 09:48
@inteon inteon force-pushed the finalizer_on_failure branch from 3aca768 to 2c7db23 Compare February 4, 2026 09:52
@inteon inteon requested a review from erikgb February 4, 2026 10:21
Comment thread pkg/controller/acmechallenges/finalizer_test.go Outdated
Comment thread pkg/controller/acmechallenges/finalizer.go Outdated
Comment thread pkg/controller/acmechallenges/finalizer.go Outdated
Signed-off-by: Tim Ramlot <42113979+inteon@users.noreply.github.com>
@inteon inteon force-pushed the finalizer_on_failure branch from 2c7db23 to 0a5bab8 Compare February 5, 2026 08:19
Delete(cmacme.ACMEDomainQualifiedFinalizer, cmacme.ACMELegacyFinalizer)
return len(finalizers) > 0
finalizers := sets.New(ch.Finalizers...)
return !finalizers.Has(cmacme.ACMELegacyFinalizer) && !finalizers.Has(cmacme.ACMEDomainQualifiedFinalizer)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return !finalizers.Has(cmacme.ACMELegacyFinalizer) && !finalizers.Has(cmacme.ACMEDomainQualifiedFinalizer)
return !finalizers.Has(cmacme.ACMEDomainQualifiedFinalizer)

I think this is clearer, as we require all challenges to have the new finalizer.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is called in two places:

So, I do think that is makes sense to check for both to make sure we cleanup in the second case.

@erikgb erikgb Feb 5, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True, but the naming is confusing here. Have to look at the "surroundings" to understand how this works. But we can leave this for an eventual follow-up.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, let's leave the name for now. I already reverted the name change I wanted to make.

gen.DefaultTestNamespace,
gen.ChallengeFrom(deletedChallenge,
gen.SetChallengeProcessing(true),
gen.SetChallengeProcessing(false),

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change expected? Processing is part of the status subresource. This action should only assert that the finalizers are removed.

Comment thread pkg/controller/acmeorders/util.go

@erikgb erikgb left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

Thanks @inteon. I would expect a bit of "noise" when unleashing this bugfix, but I am convinced that moving forward is correct and worth it.

@cert-manager-prow cert-manager-prow Bot added the lgtm Indicates that a PR is ready to be merged. label Feb 5, 2026
@inteon

inteon commented Feb 5, 2026

Copy link
Copy Markdown
Member Author

/approve

@cert-manager-prow

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: inteon

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@cert-manager-prow cert-manager-prow Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 5, 2026
@inteon

inteon commented Feb 5, 2026

Copy link
Copy Markdown
Member Author

/retest

@cert-manager-prow

cert-manager-prow Bot commented Feb 5, 2026

Copy link
Copy Markdown
Contributor

@inteon: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-cert-manager-master-e2e-v1-32-upgrade 95eeab9 link true /test pull-cert-manager-master-e2e-v1-32-upgrade
pull-cert-manager-master-e2e-v1-32 95eeab9 link true /test pull-cert-manager-master-e2e-v1-32
pull-cert-manager-master-e2e-v1-33-upgrade 95eeab9 link true /test pull-cert-manager-master-e2e-v1-33-upgrade
pull-cert-manager-master-e2e-v1-34-upgrade 95eeab9 link true /test pull-cert-manager-master-e2e-v1-34-upgrade

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@inteon

inteon commented Feb 5, 2026

Copy link
Copy Markdown
Member Author

/retest

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/acme/dns01 Indicates a PR modifies ACME DNS01 provider code area/acme/http01 Indicates a PR modifies ACME HTTP01 provider code area/acme Indicates a PR directly modifies the ACME Issuer code dco-signoff: yes Indicates that all commits in the pull request have the valid DCO sign-off message. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm Indicates that a PR is ready to be merged. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants