feat: add reconciliation retries for CRs #423
🤖 I have created a release *beep* *boop*

---

## [0.22.0](v0.21.1...v0.22.0) (2024-05-22)

### Features

* add `expose` service entry for internal cluster traffic ([#356](#356)) ([1bde4cc](1bde4cc))
* add reconciliation retries for CRs ([#423](#423)) ([424b57b](424b57b))
* uds common renovate config ([#391](#391)) ([035786c](035786c))
* uds core docs ([#414](#414)) ([a35ca7b](a35ca7b))

### Bug Fixes

* mismatched exemption/policy for DropAllCapabilities ([#384](#384)) ([d8ec278](d8ec278))
* pepr mutation annotation overwrite ([#385](#385)) ([6e56b2a](6e56b2a))
* renovate config grouping, test-infra ([#411](#411)) ([05fd407](05fd407))
* renovate pepr comment ([#410](#410)) ([a825388](a825388))

### Miscellaneous

* **deps:** update keycloak ([#390](#390)) ([3e82c4e](3e82c4e))
* **deps:** update keycloak to v24.0.4 ([#397](#397)) ([c0420ea](c0420ea))
* **deps:** update keycloak to v24.0.4 ([#402](#402)) ([e454576](e454576))
* **deps:** update neuvector to v9.4 ([#381](#381)) ([20d4170](20d4170))
* **deps:** update pepr to 0.31.0 ([#360](#360)) ([fbd61ea](fbd61ea))
* **deps:** update prometheus-stack ([#348](#348)) ([49cb11a](49cb11a))
* **deps:** update prometheus-stack ([#392](#392)) ([2e656f5](2e656f5))
* **deps:** update uds to v0.10.4 ([#228](#228)) ([1750b23](1750b23))
* **deps:** update uds-k3d to v0.6.0 ([#398](#398)) ([288f009](288f009))
* **deps:** update velero ([#350](#350)) ([e7cb33e](e7cb33e))
* **deps:** update zarf to v0.33.2 ([#394](#394)) ([201a37b](201a37b))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Posting additional context on this change retroactively, since it was fairly significant and resulted in a few issues.

Retries were introduced here to account for a specific error we ran into during pepr upgrades and pod cycling. With the introduction of service monitor generation in the operator, we have a flow where the watcher pod generates a service monitor that the admission pods then mutate. Across upgrades we encountered intermittent failures due to webhook timeouts: the watcher would fail to apply the service monitors, which errored out reconciliation of a Package on something that should be retryable (in a normal helm/zarf flow, multiple apply attempts would be made).

Rather than introduce a targeted retry for just the service monitor behavior, we decided that a generic 5x retry on all Packages could potentially resolve more intermittent issues (for example, intermittent networking problems). This was reviewed synchronously and tested against a few scenarios where retries did resolve issues.

For history's sake, linking bugs introduced here:
## Description

Adds retries to the Package CR status, plus logic to increment and handle retries. Currently attempts to reconcile a package 5x before failing.

## Type of change

- [x] Bug fix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
- [ ] Other (security config, docs update, etc)

## Checklist before merging

- [x] Test, docs, adr added or updated as needed
- [x] [Contributor Guide Steps](https://github.com/defenseunicorns/uds-template-capability/blob/main/CONTRIBUTING.md#submitting-a-pull-request) followed
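For readers unfamiliar with the pattern, here is a minimal sketch of what "a retry counter on the CR status, capped at 5 attempts" could look like. It is not the code from this PR: the `PackageStatus` shape, the phase names, `MAX_RETRIES`, and both function names are hypothetical, and the in-process loop is a simplification (an operator would more likely persist the counter on the CR and let the next reconcile pick it up).

```typescript
// Hypothetical types for illustration only; not the actual UDS Core definitions.
type Phase = "Pending" | "Retrying" | "Ready" | "Failed";

interface PackageStatus {
  phase: Phase;
  retryAttempt: number; // number of failed reconcile attempts so far
}

// Assumed cap, taken from the "5x before failing" wording in the description.
const MAX_RETRIES = 5;

// Record one failed reconcile: bump the counter, then either keep retrying
// or give up once the cap is reached.
function recordFailure(status: PackageStatus): PackageStatus {
  const retryAttempt = status.retryAttempt + 1;
  const phase: Phase = retryAttempt >= MAX_RETRIES ? "Failed" : "Retrying";
  return { phase, retryAttempt };
}

// Call `reconcile` until it succeeds or the retry budget is exhausted.
// A successful run resets the counter so later changes start fresh.
function reconcileWithRetries(
  reconcile: () => void,
  status: PackageStatus = { phase: "Pending", retryAttempt: 0 },
): PackageStatus {
  let current = status;
  while (current.phase !== "Failed") {
    try {
      reconcile();
      return { phase: "Ready", retryAttempt: 0 };
    } catch (_err) {
      // Treat any error (e.g. a webhook timeout) as retryable in this sketch.
      current = recordFailure(current);
    }
  }
  return current;
}
```

Because the sketch retries on every error, a permanent failure (such as an invalid spec) is also attempted five times before the Package is marked failed; that is the trade-off of a generic retry over a targeted one.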