feat: add status for removing / removalfailed #1334
Merged
mjnagel
commented
Mar 4, 2025
chance-coleman
previously approved these changes
Mar 5, 2025
Contributor
chance-coleman
left a comment
looks good to me (LGTM), ran through tests, everything is working as expected
chance-coleman
approved these changes
Mar 7, 2025
noahpb
approved these changes
Mar 7, 2025
chance-coleman
pushed a commit
that referenced
this pull request
Mar 19, 2025
🤖 I have created a release *beep* *boop*

---

## [0.38.0](v0.37.0...v0.38.0) (2025-03-19)

### Features

* add status for removing / removalfailed ([#1334](#1334)) ([a99b408](a99b408))
* document workaround for Keycloak and Apple M4 Macs ([#1337](#1337)) ([ae51155](ae51155))
* root domain templating ([#1343](#1343)) ([f64974c](f64974c))
* sso doc restructure ([#1293](#1293)) ([3c934a0](3c934a0))

### Bug Fixes

* renovate not checking test directory versions ([#1357](#1357)) ([9e78362](9e78362))

### Miscellaneous

* **ci:** disable compliance checks ([#1347](#1347)) ([e984131](e984131))
* **ci:** rm `create_bucket_lifecycle` input to s3 module calls ([#1348](#1348)) ([c93aa7b](c93aa7b))
* **ci:** swap to govcloud for aws ci tests ([#1342](#1342)) ([d51db55](d51db55))
* **ci:** swap to new aws account for rke/eks tests ([#1339](#1339)) ([3b6fb50](3b6fb50))
* **ci:** switch to local modules ([#1369](#1369)) ([9f8536d](9f8536d))
* **deps:** update grafana ([#1346](#1346)) ([d869ca7](d869ca7))
* **deps:** update pepr to v0.46.1 ([#1336](#1336)) ([5e9c119](5e9c119))
* **deps:** update pepr to v15.5.0 ([#1353](#1353)) ([8d7b44b](8d7b44b))
* **deps:** update prometheus-stack ([#1324](#1324)) ([d6840be](d6840be))
* **deps:** update support dependencies to v0.24.0 ([#1360](#1360)) ([bf23651](bf23651))
* **deps:** update support dependencies to v4.1.5 ([#1340](#1340)) ([0714b05](0714b05))
* **deps:** update support dependencies to v4.23.0 ([#1358](#1358)) ([e6a986e](e6a986e))
* **deps:** update support-deps ([#1332](#1332)) ([e37d062](e37d062))
* **deps:** update support-deps ([#1345](#1345)) ([e390899](e390899))
* **deps:** update support-deps ([#1351](#1351)) ([551a865](551a865))
* **deps:** update support-deps ([#1354](#1354)) ([dd36d03](dd36d03))
* **deps:** update velero ([#1299](#1299)) ([59ce747](59ce747))
* **docs:** keycloak session timeout doc ([#1315](#1315)) ([9509ac7](9509ac7))

### Documentation

* add developer doc on ci testing ([#1344](#1344)) ([0e011a4](0e011a4))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
mjnagel
added a commit
to BagelLab/uds-core
that referenced
this pull request
Nov 14, 2025
## Description

This PR utilizes the "new" ability in a Pepr finalizer to not remove the `finalizer`. This enables us to update the status while finalizing, and catch errors if cleanup does not work as expected. Changes:

- Skip finalizer if it's already running (based on status)
- Skip finalizer if Package isn't ready/failed yet (for defenseunicorns#963)
- Patch `Removing` status on the CR
- Catch errors on finalization and patch `RemovalFailed` status and create a failure event
- Retry each cleanup/purge function using `retryWithDelay`

I also updated the diagram to support these changes, as well as adding test cases for the finalizer function. Diagram update can be previewed on the docs by using [this link](https://raw.githubusercontent.com/defenseunicorns/uds-core/c41964d426b8bb9780c26d41c631dbe6f50e854a/docs/.images/diagrams/uds-core-operator-uds-package.svg) on `docs/reference/configuration/UDS operator/package.md`, specific changes:

- Moved finalizer section to the right of reconciler
- Simplified flow of validator (to make more space in the diagram)
- Added new pieces of finalizer flow (failure, status patching, etc)

## Related Issue

Fixes defenseunicorns#963
Fixes defenseunicorns#1159

## Type of change

- [x] Bug fix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
- [ ] Other (security config, docs update, etc)

## Steps to Validate

<details><summary>Testing Steps</summary>

Test setup:

```console
# Install slim-dev (unicorn flavor to avoid pull rate limiting)
uds run slim-dev --set flavor=unicorn

# Create the test packages
zarf p create src/test --skip-sbom

# Deploy the test packages
zarf p deploy build/zarf-package-uds-core-test-apps-*.tar.zst --confirm

# Validate all package CRs go to Ready status
kubectl get pkg -A # should all show ready
```

Test that normal deletion works and makes events:

```console
# Delete a package CR
kubectl delete pkg -n test-tenant-app test-tenant-app

# Validate success and events
kubectl get pkg -n test-tenant-app # should show no resources
kubectl get events -n test-tenant-app | grep package # should show 3 removal events
```

Test that finalizer doesn't run until CR is ready:

```console
# This forces a re-reconcile of the package and then deletes immediately
# If you watch while this happens (k9s, etc) you should see it go to Pending before Removing
kubectl patch pkg httpbin-other -n authservice-test-app --subresource=status --type=json -p='[{"op": "remove", "path": "/status"}]' && kubectl delete pkg httpbin-other -n authservice-test-app

# Validate that the watcher waited to finalize
kubectl logs -n pepr-system -l app=pepr-uds-core-watcher --tail=-1 | grep "Waiting"
kubectl get events -n authservice-test-app | grep package # should show 3 removal events
```

Test that finalizer places CR in RemovalFailed state on failed cleanup:

```console
# Deploy the test apps again (we need the sso client)
zarf p deploy build/zarf-package-uds-core-test-apps-*.tar.zst --confirm

# Edit the peprstore
kubectl edit peprstore -n pepr-system pepr-uds-core-store
# Delete the line with `uds-core-operator-v2-sso-client-uds-core-httpbin`, this is the client token and will make Pepr unable to cleanup the client
# Save the peprstore

# Delete the package CR
kubectl delete pkg httpbin-other -n authservice-test-app

# Make sure that status is marked as RemovalFailed (after ~15 seconds)
kubectl get pkg httpbin-other -n authservice-test-app

# Make sure events show up that client failed to be removed
kubectl describe pkg httpbin-other -n authservice-test-app

# Make sure that the SSO client removal was retried 4 times before final failure
kubectl logs -n pepr-system -l app=pepr-uds-core-watcher --tail=-1 | grep "cleanupSSOClients"
```

</details>

Also note the automated jest unit tests and validate those.

## Checklist before merging

- [x] Test, docs, adr added or updated as needed
- [x] [Contributor Guide](https://github.com/defenseunicorns/uds-template-capability/blob/main/CONTRIBUTING.md) followed
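
For readers who want a feel for the finalizer flow listed under "Changes", here is a rough TypeScript sketch. It is not the code from this PR. The only names taken from the description are `retryWithDelay`, `Removing`, and `RemovalFailed`. Everything else is hypothetical: the `PackageLike` shape, the `finalize` signature, the `patchStatus` callback, and the convention that returning `false` means "keep the finalizer". The real `retryWithDelay` in uds-core may take different parameters, and the sketch also assumes a CR already in `RemovalFailed` is not retried.

```typescript
// Hypothetical sketch only: shapes and signatures below are assumptions, not the uds-core API.
type Phase = "Pending" | "Ready" | "Failed" | "Removing" | "RemovalFailed";

interface PackageLike {
  name: string;
  phase?: Phase;
}

type Cleanup = () => Promise<void>;
type PatchStatus = (phase: Phase) => Promise<void>;

// Call fn up to `retries` times, sleeping `delayMs` between failed attempts.
// Rethrows the last error once all attempts are used up.
async function retryWithDelay<T>(fn: () => Promise<T>, retries: number, delayMs: number): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= retries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < retries) {
        await new Promise<void>(resolve => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError;
}

// Returns true when the finalizer may be removed, false when it must stay in place.
async function finalize(
  pkg: PackageLike,
  cleanups: Cleanup[],
  patchStatus: PatchStatus,
  retries: number,
  delayMs: number,
): Promise<boolean> {
  // Skip if a finalizer run is already in progress (based on status).
  if (pkg.phase === "Removing") {
    return false;
  }
  // Skip until the package has been reconciled to Ready or Failed.
  // Assumption: this also leaves a RemovalFailed package untouched.
  if (pkg.phase !== "Ready" && pkg.phase !== "Failed") {
    return false;
  }

  await patchStatus("Removing");
  try {
    // Retry each cleanup step on its own before giving up.
    for (const cleanup of cleanups) {
      await retryWithDelay(cleanup, retries, delayMs);
    }
  } catch {
    // Cleanup failed after all retries: record it and keep the finalizer.
    await patchStatus("RemovalFailed");
    return false;
  }
  return true;
}
```

In this sketch the caller supplies the retry count and delay explicitly, so nothing is implied about the defaults used in the operator. A real implementation would also create the failure event mentioned above, which is omitted here.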
mjnagel
pushed a commit
to BagelLab/uds-core
that referenced
this pull request
Nov 14, 2025