feat: add status for removing / removalfailed #1334

Merged: mjnagel merged 19 commits into main from finalizer-rework on Mar 7, 2025

@mjnagel (Contributor) commented Mar 4, 2025

Description

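This PR utilizes the "new" ability in a Pepr finalizer to not remove the finalizer. This enables us to update the status while finalizing, and catch errors if cleanup does not work as expected. Changes:

  • Skip finalizer if it's already running (based on status)
  • Skip finalizer if Package isn't ready/failed yet (for #963)
  • Patch Removing status on the CR
  • Catch errors on finalization and patch RemovalFailed status and create a failure event
  • Retry each cleanup/purge function using retryWithDelay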

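I also updated the diagram to reflect these changes and added test cases for the finalizer function. The diagram appears in the docs on docs/reference/configuration/UDS operator/package.md, and the update can be previewed using this link (https://raw.githubusercontent.com/defenseunicorns/uds-core/c41964d426b8bb9780c26d41c631dbe6f50e854a/docs/.images/diagrams/uds-core-operator-uds-package.svg). Specific changes: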

  • Moved finalizer section to the right of reconciler
  • Simplified flow of validator (to make more space in the diagram)
  • Added new pieces of finalizer flow (failure, status patching, etc)
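
For anyone reviewing the flow without opening the diagram, the status handling described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in this PR: `Phase`, `Pkg`, `Deps`, and `finalizePackage` are hypothetical names, the side effects are injected so the sketch runs without a cluster, and treating `RemovalFailed` like any other not-ready phase is a guess on my part.

```typescript
// Hypothetical phase names modeled on the statuses mentioned in this PR;
// the real Package CR types may differ.
type Phase = "Pending" | "Ready" | "Failed" | "Removing" | "RemovalFailed";

interface Pkg {
  name: string;
  phase?: Phase;
}

// Injected side effects (assumed shapes) so the flow can be exercised in isolation.
interface Deps {
  patchPhase: (pkg: Pkg, phase: Phase) => Promise<void>;
  cleanup: (pkg: Pkg) => Promise<void>;
  writeFailureEvent: (pkg: Pkg, message: string) => Promise<void>;
}

// Returns true when the finalizer may be removed, false to leave it in place.
async function finalizePackage(pkg: Pkg, deps: Deps): Promise<boolean> {
  // Finalization is already in progress: do nothing and keep the finalizer.
  if (pkg.phase === "Removing") return false;

  // Reconciliation has not settled yet: wait rather than clean up
  // resources that may still be in the middle of being created.
  if (pkg.phase !== "Ready" && pkg.phase !== "Failed") return false;

  await deps.patchPhase(pkg, "Removing");
  try {
    await deps.cleanup(pkg);
  } catch (err) {
    // Surface the failure on the CR and as an event, and keep the finalizer.
    const message = err instanceof Error ? err.message : String(err);
    await deps.patchPhase(pkg, "RemovalFailed");
    await deps.writeFailureEvent(pkg, message);
    return false;
  }
  return true;
}
```

The boolean return value stands in for the "do not remove the finalizer" ability mentioned above: only a clean run returns `true`.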

Related Issue

Fixes #963

Fixes #1159

Type of change

  • [x] Bug fix (non-breaking change which fixes an issue)
  • [x] New feature (non-breaking change which adds functionality)
  • [ ] Other (security config, docs update, etc)

Steps to Validate

Testing Steps

Test setup:

# Install slim-dev (unicorn flavor to avoid pull rate limiting)
uds run slim-dev --set flavor=unicorn
# Create the test packages
zarf p create src/test --skip-sbom
# Deploy the test packages
zarf p deploy build/zarf-package-uds-core-test-apps-*.tar.zst --confirm
# Validate all package CRs go to Ready status
kubectl get pkg -A # should all show ready

Test that normal deletion works and makes events:

# Delete a package CR
kubectl delete pkg -n test-tenant-app test-tenant-app
# Validate success and events
kubectl get pkg -n test-tenant-app # should show no resources
kubectl get events -n test-tenant-app | grep package # should show 3 removal events

Test that finalizer doesn't run until CR is ready:

# This forces a re-reconcile of the package and then deletes immediately
# If you watch while this happens (k9s, etc) you should see it go to Pending before Removing
kubectl patch pkg httpbin-other -n authservice-test-app --subresource=status --type=json  -p='[{"op": "remove", "path": "/status"}]' && kubectl delete pkg httpbin-other -n authservice-test-app
# Validate that the watcher waited to finalize
kubectl logs -n pepr-system -l app=pepr-uds-core-watcher --tail=-1 | grep "Waiting"
kubectl get events -n authservice-test-app | grep package # should show 3 removal events

Test that finalizer places CR in RemovalFailed state on failed cleanup:

# Deploy the test apps again (we need the sso client)
zarf p deploy build/zarf-package-uds-core-test-apps-*.tar.zst --confirm
# Edit the peprstore
kubectl edit peprstore -n pepr-system pepr-uds-core-store
# Delete the line with `uds-core-operator-v2-sso-client-uds-core-httpbin`; this is the client token, and removing it makes Pepr unable to clean up the client
# Save the peprstore
# Delete the package CR
kubectl delete pkg httpbin-other -n authservice-test-app
# Make sure that status is marked as RemovalFailed (after ~15 seconds)
kubectl get pkg httpbin-other -n authservice-test-app
# Make sure events show up that client failed to be removed
kubectl describe pkg httpbin-other -n authservice-test-app
# Make sure that the SSO client removal was retried 4 times before final failure
kubectl logs -n pepr-system -l app=pepr-uds-core-watcher --tail=-1 | grep "cleanupSSOClients"

Also review the automated Jest unit tests and confirm that they pass.

Checklist before merging

  • [x] Test, docs, adr added or updated as needed
  • [x] Contributor Guide followed

@mjnagel self-assigned this Mar 4, 2025
@mjnagel changed the title from "feat: add status for removal / removalfailed" to "feat: add status for removing / removalfailed" Mar 4, 2025
@mjnagel marked this pull request as ready for review March 4, 2025 18:19
@mjnagel requested a review from a team as a code owner March 4, 2025 18:19
Comment thread src/pepr/operator/controllers/keycloak/client-sync.ts
Comment thread src/pepr/operator/reconcilers/package-reconciler.spec.ts
@chance-coleman previously approved these changes Mar 5, 2025

@chance-coleman (Contributor) left a comment:

Looks good to me (LGTM). I ran through the tests and everything is working as expected.

Comment thread src/pepr/operator/reconcilers/package-reconciler.ts
@mjnagel merged commit a99b408 into main Mar 7, 2025
@mjnagel deleted the finalizer-rework branch March 7, 2025 17:42
chance-coleman pushed a commit that referenced this pull request Mar 19, 2025
🤖 I have created a release *beep* *boop*
---


## [0.38.0](v0.37.0...v0.38.0) (2025-03-19)


### Features

* add status for removing / removalfailed
([#1334](#1334))
([a99b408](a99b408))
* document workaround for Keycloak and Apple M4 Macs
([#1337](#1337))
([ae51155](ae51155))
* root domain templating
([#1343](#1343))
([f64974c](f64974c))
* sso doc restructure
([#1293](#1293))
([3c934a0](3c934a0))


### Bug Fixes

* renovate not checking test directory versions
([#1357](#1357))
([9e78362](9e78362))


### Miscellaneous

* **ci:** disable compliance checks
([#1347](#1347))
([e984131](e984131))
* **ci:** rm `create_bucket_lifecycle` input to s3 module calls
([#1348](#1348))
([c93aa7b](c93aa7b))
* **ci:** swap to govcloud for aws ci tests
([#1342](#1342))
([d51db55](d51db55))
* **ci:** swap to new aws account for rke/eks tests
([#1339](#1339))
([3b6fb50](3b6fb50))
* **ci:** switch to local modules
([#1369](#1369))
([9f8536d](9f8536d))
* **deps:** update grafana
([#1346](#1346))
([d869ca7](d869ca7))
* **deps:** update pepr to v0.46.1
([#1336](#1336))
([5e9c119](5e9c119))
* **deps:** update pepr to v15.5.0
([#1353](#1353))
([8d7b44b](8d7b44b))
* **deps:** update prometheus-stack
([#1324](#1324))
([d6840be](d6840be))
* **deps:** update support dependencies to v0.24.0
([#1360](#1360))
([bf23651](bf23651))
* **deps:** update support dependencies to v4.1.5
([#1340](#1340))
([0714b05](0714b05))
* **deps:** update support dependencies to v4.23.0
([#1358](#1358))
([e6a986e](e6a986e))
* **deps:** update support-deps
([#1332](#1332))
([e37d062](e37d062))
* **deps:** update support-deps
([#1345](#1345))
([e390899](e390899))
* **deps:** update support-deps
([#1351](#1351))
([551a865](551a865))
* **deps:** update support-deps
([#1354](#1354))
([dd36d03](dd36d03))
* **deps:** update velero
([#1299](#1299))
([59ce747](59ce747))
* **docs:** keycloak session timeout doc
([#1315](#1315))
([9509ac7](9509ac7))


### Documentation

* add developer doc on ci testing
([#1344](#1344))
([0e011a4](0e011a4))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
mjnagel added a commit to BagelLab/uds-core that referenced this pull request Nov 14, 2025
## Description

This PR utilizes the "new" ability in a Pepr finalizer to not remove the
`finalizer`. This enables us to update the status while finalizing, and
catch errors if cleanup does not work as expected. Changes:
- Skip finalizer if it's already running (based on status) 
- Skip finalizer if Package isn't ready/failed yet (for
defenseunicorns#963)
- Patch `Removing` status on the CR
- Catch errors on finalization and patch `RemovalFailed` status and
create a failure event
- Retry each cleanup/purge function using `retryWithDelay`
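
The retry behavior in the last bullet can be pictured with a small sketch like the one below. It is an assumption-laden illustration rather than the actual `retryWithDelay` helper: the name `retryWithDelaySketch`, its parameters, and the default of 4 attempts (chosen only to match the "retried 4 times" note in the testing steps) are all hypothetical.

```typescript
// Hypothetical stand-in for a retry helper; the real retryWithDelay
// signature and defaults may differ.
async function retryWithDelaySketch<T>(
  fn: () => Promise<T>,
  attempts = 4, // assumed total number of tries
  delayMs = 1000, // assumed fixed pause between tries
): Promise<T> {
  if (attempts < 1) throw new RangeError("attempts must be at least 1");

  let lastError: unknown;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // Only pause when another attempt will follow.
      if (attempt < attempts) {
        await new Promise<void>(resolve => setTimeout(resolve, delayMs));
      }
    }
  }
  // Every attempt failed: rethrow the last error so the caller can
  // patch the failure status and create the failure event.
  throw lastError;
}
```

Rethrowing after the final attempt is what lets the surrounding finalizer catch the error once, instead of each cleanup function handling its own failure reporting.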

I also updated the diagram to support these changes, as well as adding
test cases for the finalizer function. Diagram update can be previewed
on the docs by using [this
link](https://raw.githubusercontent.com/defenseunicorns/uds-core/c41964d426b8bb9780c26d41c631dbe6f50e854a/docs/.images/diagrams/uds-core-operator-uds-package.svg)
on `docs/reference/configuration/UDS operator/package.md`, specific
changes:
- Moved finalizer section to the right of reconciler
- Simplified flow of validator (to make more space in the diagram)
- Added new pieces of finalizer flow (failure, status patching, etc)

## Related Issue

Fixes defenseunicorns#963

Fixes defenseunicorns#1159

## Type of change

- [x] Bug fix (non-breaking change which fixes an issue)
- [x] New feature (non-breaking change which adds functionality)
- [ ] Other (security config, docs update, etc)

## Steps to Validate

<details><summary>Testing Steps</summary>

Test setup:
```console
# Install slim-dev (unicorn flavor to avoid pull rate limiting)
uds run slim-dev --set flavor=unicorn
# Create the test packages
zarf p create src/test --skip-sbom
# Deploy the test packages
zarf p deploy build/zarf-package-uds-core-test-apps-*.tar.zst --confirm
# Validate all package CRs go to Ready status
kubectl get pkg -A # should all show ready
```

Test that normal deletion works and makes events:
```console
# Delete a package CR
kubectl delete pkg -n test-tenant-app test-tenant-app
# Validate success and events
kubectl get pkg -n test-tenant-app # should show no resources
kubectl get events -n test-tenant-app | grep package # should show 3 removal events
```

Test that finalizer doesn't run until CR is ready:
```console
# This forces a re-reconcile of the package and then deletes immediately
# If you watch while this happens (k9s, etc) you should see it go to Pending before Removing
kubectl patch pkg httpbin-other -n authservice-test-app --subresource=status --type=json  -p='[{"op": "remove", "path": "/status"}]' && kubectl delete pkg httpbin-other -n authservice-test-app
# Validate that the watcher waited to finalize
kubectl logs -n pepr-system -l app=pepr-uds-core-watcher --tail=-1 | grep "Waiting"
kubectl get events -n authservice-test-app | grep package # should show 3 removal events
```

Test that finalizer places CR in RemovalFailed state on failed cleanup:
```console
# Deploy the test apps again (we need the sso client)
zarf p deploy build/zarf-package-uds-core-test-apps-*.tar.zst --confirm
# Edit the peprstore
kubectl edit peprstore -n pepr-system pepr-uds-core-store
# Delete the line with `uds-core-operator-v2-sso-client-uds-core-httpbin`; this is the client token, and removing it makes Pepr unable to clean up the client
# Save the peprstore
# Delete the package CR
kubectl delete pkg httpbin-other -n authservice-test-app
# Make sure that status is marked as RemovalFailed (after ~15 seconds)
kubectl get pkg httpbin-other -n authservice-test-app
# Make sure events show up that client failed to be removed
kubectl describe pkg httpbin-other -n authservice-test-app
# Make sure that the SSO client removal was retried 4 times before final failure
kubectl logs -n pepr-system -l app=pepr-uds-core-watcher --tail=-1 | grep "cleanupSSOClients"
```

</details>

Also review the automated Jest unit tests and confirm that they pass.

## Checklist before merging

- [x] Test, docs, adr added or updated as needed
- [x] [Contributor
Guide](https://github.com/defenseunicorns/uds-template-capability/blob/main/CONTRIBUTING.md)
followed
mjnagel pushed a commit to BagelLab/uds-core that referenced this pull request Nov 14, 2025
Development

Successfully merging this pull request may close these issues.

  • Update Pepr finalizer code to reflect deletion status and failure
  • Deleting Package before status is Ready does not properly clean up clients

3 participants