feat: add reconciliation retries for CRs #423

Merged
mjnagel merged 6 commits into main from reconcile-retry-addition
May 22, 2024
@mjnagel mjnagel commented May 22, 2024

Description

Adds retries to the Package CR status, plus logic to increment and handle them. Currently, a Package reconcile will be attempted 5x before failing.
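As a rough illustration of the counter described above, here is a minimal sketch of how a retry count stored in status could drive the next phase. The type names, the `retryAttempt` field, the phase values, and `nextStatus` are all hypothetical and are not taken from this PR's diff. It also assumes "5x" means five attempts in total.

```typescript
// Hypothetical shapes: the real Package CR status fields may differ.
type Phase = "Pending" | "Retrying" | "Ready" | "Failed";

interface PackageStatus {
  phase: Phase;
  retryAttempt: number; // number of failed reconcile attempts so far
}

// Assumption: "5x before failing" is read as five attempts in total.
const MAX_ATTEMPTS = 5;

// Returns the status to write after one reconcile attempt finishes.
function nextStatus(current: PackageStatus, succeeded: boolean): PackageStatus {
  if (succeeded) {
    // A successful reconcile resets the counter.
    return { phase: "Ready", retryAttempt: 0 };
  }
  const retryAttempt = current.retryAttempt + 1;
  // Give up once the attempt budget is used; otherwise mark for another try.
  return retryAttempt >= MAX_ATTEMPTS
    ? { phase: "Failed", retryAttempt }
    : { phase: "Retrying", retryAttempt };
}
```

Under these assumptions, four consecutive failures leave the Package in `Retrying` and the fifth moves it to `Failed`. Keeping the counter in status rather than in memory means it would survive a watcher pod restart.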

Type of change

  • [x] Bug fix (non-breaking change which fixes an issue)
  • [x] New feature (non-breaking change which adds functionality)
  • [ ] Other (security config, docs update, etc)

Checklist before merging

@mjnagel mjnagel self-assigned this May 22, 2024
@mjnagel mjnagel marked this pull request as ready for review May 22, 2024 21:50
@mjnagel mjnagel merged commit 424b57b into main May 22, 2024
@mjnagel mjnagel deleted the reconcile-retry-addition branch May 22, 2024 21:58
mjnagel pushed a commit that referenced this pull request May 23, 2024
🤖 I have created a release *beep* *boop*
---


## [0.22.0](v0.21.1...v0.22.0) (2024-05-22)


### Features

* add `expose` service entry for internal cluster traffic
([#356](#356))
([1bde4cc](1bde4cc))
* add reconciliation retries for CRs
([#423](#423))
([424b57b](424b57b))
* uds common renovate config
([#391](#391))
([035786c](035786c))
* uds core docs
([#414](#414))
([a35ca7b](a35ca7b))


### Bug Fixes

* mismatched exemption/policy for DropAllCapabilities
([#384](#384))
([d8ec278](d8ec278))
* pepr mutation annotation overwrite
([#385](#385))
([6e56b2a](6e56b2a))
* renovate config grouping, test-infra
([#411](#411))
([05fd407](05fd407))
* renovate pepr comment
([#410](#410))
([a825388](a825388))


### Miscellaneous

* **deps:** update keycloak
([#390](#390))
([3e82c4e](3e82c4e))
* **deps:** update keycloak to v24.0.4
([#397](#397))
([c0420ea](c0420ea))
* **deps:** update keycloak to v24.0.4
([#402](#402))
([e454576](e454576))
* **deps:** update neuvector to v9.4
([#381](#381))
([20d4170](20d4170))
* **deps:** update pepr to 0.31.0
([#360](#360))
([fbd61ea](fbd61ea))
* **deps:** update prometheus-stack
([#348](#348))
([49cb11a](49cb11a))
* **deps:** update prometheus-stack
([#392](#392))
([2e656f5](2e656f5))
* **deps:** update uds to v0.10.4
([#228](#228))
([1750b23](1750b23))
* **deps:** update uds-k3d to v0.6.0
([#398](#398))
([288f009](288f009))
* **deps:** update velero
([#350](#350))
([e7cb33e](e7cb33e))
* **deps:** update zarf to v0.33.2
([#394](#394))
([201a37b](201a37b))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
mjnagel commented Jul 1, 2024

Posting additional context on this shift retroactively since this change was rather significant and resulted in a few issues.

Retries were introduced here to account for a specific error we ran into during Pepr upgrades and pod cycling. With the introduction of ServiceMonitor generation in the operator, we have a flow where the watcher pod generates a ServiceMonitor that the admission pods then mutate. Across upgrades we encountered intermittent failures due to webhook timeouts: the watcher would fail to apply the ServiceMonitors, erroring out reconciliation of a Package on something that should be retryable (in a normal Helm/Zarf flow, multiple apply attempts would be made).
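To make the "should be retryable" point concrete, here is a minimal sketch of a retry scoped to a single apply step that repeats only on transient errors such as a webhook timeout. The `ApplyError` shape, the `transient` flag, and `applyWithRetries` are hypothetical and do not come from Pepr or any Kubernetes client. The sketch is synchronous and has no backoff purely for brevity; a real apply call would be asynchronous.

```typescript
// Hypothetical error shape: a real Kubernetes client reports failures differently.
interface ApplyError {
  message: string;
  transient: boolean; // e.g. a webhook timeout while admission pods restart
}

type ApplyResult = { ok: true } | { ok: false; error: ApplyError };

// Calls `apply` until it succeeds, hits a non-transient error,
// or uses up `maxAttempts`. Always makes at least one attempt.
function applyWithRetries(
  apply: () => ApplyResult,
  maxAttempts: number,
): { result: ApplyResult; attempts: number } {
  let result: ApplyResult = apply();
  let attempts = 1;
  while (!result.ok && result.error.transient && attempts < maxAttempts) {
    result = apply();
    attempts++;
  }
  return { result, attempts };
}
```

In this sketch a non-transient error (for example, an invalid spec) fails on the first attempt, while a timeout is retried up to the limit.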

Rather than introduce a targeted retry for just the ServiceMonitor behavior, we decided that a generic 5x retry on all Packages could potentially solve more intermittent issues (e.g., intermittent networking problems). This was reviewed synchronously and tested against a few scenarios where retries did resolve issues. For history's sake, linking bugs introduced here:

rjferguson21 pushed a commit that referenced this pull request Jul 11, 2024
## Description

Adds retries to the Package CR status, plus logic to increment and handle
them. Currently, a Package reconcile will be attempted 5x before failing.

## Type of change

- [x] Bug fix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
- [ ] Other (security config, docs update, etc)

## Checklist before merging

- [x] Test, docs, adr added or updated as needed
- [x] [Contributor Guide
Steps](https://github.com/defenseunicorns/uds-template-capability/blob/main/CONTRIBUTING.md#submitting-a-pull-request)
followed
rjferguson21 pushed a commit that referenced this pull request Jul 11, 2024
mjnagel added a commit to BagelLab/uds-core that referenced this pull request Nov 14, 2025
mjnagel pushed a commit to BagelLab/uds-core that referenced this pull request Nov 14, 2025