fix: error checking and prune bool check upon HasPruned and removed some dead code#1897
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughReconciler error handling switched from errors.Is to errors.As for NotReadyError/DeletionError; Deployer controller APIs Untrack and pruneWithApplySet now return (bool, error) to signal async completion; resource controller adds a nil-check for RepositorySpec and returns not-ready when absent. Changes
Sequence Diagram(s)sequenceDiagram
participant Reconciler as Reconciler
participant Deployer as DeployerController
participant WatchMgr as WatchManager
participant K8sAPI as KubernetesAPI
Reconciler->>Deployer: reconcile deletion (call pruneWithApplySet)
Deployer->>WatchMgr: stop watches / send stop events
WatchMgr->>WatchMgr: attempt async teardown
WatchMgr-->>Deployer: return in-progress (done=false) or done=true
Deployer-->>Reconciler: return (done, error)
alt done=false
Reconciler->>K8sAPI: requeue after 1s to verify completion
else done=true
Reconciler->>K8sAPI: remove finalizers / continue cleanup
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@kubernetes/controller/internal/controller/resource/resource_controller.go`:
- Around line 266-270: Fix the typo in the status message passed to
status.MarkNotReady: change "repository spec in component status most not be
nil" to "repository spec in component status must not be nil" where
component.Status.Component.RepositorySpec is checked and
status.MarkNotReady(resource, v1alpha1.ResourceIsNotAvailable, ...) is called
(in the same block that returns fmt.Errorf about repository spec being nil for
component.Name).
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 3b0bac26-a2ca-4d2b-aeab-e1af395211d1
📒 Files selected for processing (3)
kubernetes/controller/internal/controller/component/component_controller.gokubernetes/controller/internal/controller/deployer/deployer_controller.gokubernetes/controller/internal/controller/resource/resource_controller.go
…ome dead code On-behalf-of: Gergely Brautigam <gergely.brautigam@sap.com> Signed-off-by: Gergely Brautigam <182850+Skarlso@users.noreply.github.com>
On-behalf-of: Gergely Brautigam <gergely.brautigam@sap.com> Signed-off-by: Gergely Brautigam <182850+Skarlso@users.noreply.github.com>
On-behalf-of: Gergely Brautigam <gergely.brautigam@sap.com> Signed-off-by: Gergely Brautigam <182850+Skarlso@users.noreply.github.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (3)
kubernetes/controller/internal/controller/deployer/deployer_controller_test.go (2)
353-362: Same concerns as the earlier cleanup block.This cleanup has the same issues: cluster-wide deployer listing and silently ignored update errors. Apply the same namespace scoping and error logging fix as recommended for lines 64-73.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@kubernetes/controller/internal/controller/deployer/deployer_controller_test.go` around lines 353 - 362, The cleanup is listing all Deployer resources cluster-wide and ignoring update errors; change the List call to scope to the test namespace (use client.ListOptions{Namespace: <test namespace>} or client.InNamespace(ctxNamespace) equivalent) so v1alpha1.DeployerList named deployers only returns namespace resources, and check/handle the error returned by k8sClient.Update(ctx, d) instead of discarding it—log or fail the test on error; keep using controllerutil.RemoveFinalizer for applySetPruneFinalizer and resourceWatchFinalizer and continue calling test.DeleteObject(ctx, k8sClient, d) after ensuring the update succeeded.
64-73: Scope cleanup to test namespace and handle update errors.Two concerns with this cleanup logic:
k8sClient.List(ctx, deployers)lists all deployers cluster-wide, which could interfere with parallel test runs by cleaning up deployers from other tests.
_ = k8sClient.Update(ctx, d)silently ignores update errors. If the update fails (e.g., conflict), the finalizers remain and the subsequent delete may hang indefinitely.♻️ Proposed fix to scope listing and log update errors
By("cleaning up deployers") deployers := &v1alpha1.DeployerList{} -Expect(k8sClient.List(ctx, deployers)).To(Succeed()) +Expect(k8sClient.List(ctx, deployers, client.InNamespace(namespace.GetName()))).To(Succeed()) for i := range deployers.Items { d := &deployers.Items[i] controllerutil.RemoveFinalizer(d, applySetPruneFinalizer) controllerutil.RemoveFinalizer(d, resourceWatchFinalizer) - _ = k8sClient.Update(ctx, d) + if err := k8sClient.Update(ctx, d); err != nil { + GinkgoLogr.Error(err, "failed to remove finalizers", "deployer", d.Name) + } test.DeleteObject(ctx, k8sClient, d) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@kubernetes/controller/internal/controller/deployer/deployer_controller_test.go` around lines 64 - 73, Scope the Deployer cleanup to the test namespace and stop ignoring update errors: when listing deployers, pass the namespace option (e.g., client.InNamespace(testNamespace) or the existing namespace variable) to k8sClient.List(ctx, deployers, ...); when removing finalizers and calling k8sClient.Update(ctx, d) capture the error and handle retries/conflicts instead of silencing it — for example assert Expect(err).To(Succeed()) or on apierrors.IsConflict fetch the latest object, remove the finalizers again and retry the Update before calling test.DeleteObject; keep references to deployers, controllerutil.RemoveFinalizer, applySetPruneFinalizer, resourceWatchFinalizer, k8sClient.Update, and test.DeleteObject to locate the code.kubernetes/controller/internal/controller/deployer/deployer_controller_rgd_test.go (1)
188-190: Scope teardown to test-owned objects only.At Line 189 and Line 201, the test lists all Deployers/RGDs and then deletes everything found. This can cause cross-test interference in shared environments.
Suggested direction
- deployers := &v1alpha1.DeployerList{} - Expect(k8sClient.List(ctx, deployers)).To(Succeed()) - for i := range deployers.Items { ... } + d := &v1alpha1.Deployer{} + err := k8sClient.Get(ctx, client.ObjectKey{Name: deployerObjName}, d) + if err == nil { + // remove finalizers + update + delete + } else { + Expect(errors.IsNotFound(err)).To(BeTrue()) + } - RGDs := &unstructured.UnstructuredList{} - RGDs.SetGroupVersionKind(listGVK) - Expect(k8sClient.List(ctx, RGDs)).To(Succeed()) - for i := range RGDs.Items { - test.DeleteObject(ctx, k8sClient, &RGDs.Items[i]) - } + rgdToDelete := &unstructured.Unstructured{} + rgdToDelete.SetGroupVersionKind(gvk) + rgdToDelete.SetName(rgdObj.GetName()) + err = k8sClient.Get(ctx, client.ObjectKeyFromObject(rgdToDelete), rgdToDelete) + if err == nil { + test.DeleteObject(ctx, k8sClient, rgdToDelete) + } else { + Expect(errors.IsNotFound(err)).To(BeTrue()) + }Also applies to: 199-204
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@kubernetes/controller/internal/controller/deployer/deployer_controller_rgd_test.go` around lines 188 - 190, The test currently lists and deletes all Deployers/RGDs via k8sClient.List into the deployers/rgds slices which can remove resources from other tests; change the List calls to restrict results to test-owned objects only by passing appropriate client.ListOptions (e.g., client.InNamespace(...) if tests use a dedicated namespace, or client.MatchingLabels{"test-run": testID} if using a per-test label) and/or filter the returned items by checking a specific label/ownerReference before deleting; update the code around the deployers variable and the rgds list handling (the k8sClient.List calls and subsequent deletion loop) to only operate on items that have the test label or the expected ownerReference.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@kubernetes/controller/internal/controller/deployer/deployer_controller_rgd_test.go`:
- Around line 190-195: The loop over deployers.Items currently removes
finalizers with controllerutil.RemoveFinalizer and swallows k8sClient.Update
errors; change it to detect whether RemoveFinalizer actually removed a finalizer
(e.g., by checking controllerutil.ContainsFinalizer before/after or comparing
returned bool), only call k8sClient.Update(ctx, d) when a finalizer was removed,
and replace the silent assignment `_ = k8sClient.Update(...)` with a Ginkgo
assertion (Expect(err).ToNot(HaveOccurred()) or similar) to ensure the update
succeeded before calling test.DeleteObject(ctx, k8sClient, d); keep references
to deployers.Items, controllerutil.RemoveFinalizer, k8sClient.Update, and
test.DeleteObject to locate the change.
---
Nitpick comments:
In
`@kubernetes/controller/internal/controller/deployer/deployer_controller_rgd_test.go`:
- Around line 188-190: The test currently lists and deletes all Deployers/RGDs
via k8sClient.List into the deployers/rgds slices which can remove resources
from other tests; change the List calls to restrict results to test-owned
objects only by passing appropriate client.ListOptions (e.g.,
client.InNamespace(...) if tests use a dedicated namespace, or
client.MatchingLabels{"test-run": testID} if using a per-test label) and/or
filter the returned items by checking a specific label/ownerReference before
deleting; update the code around the deployers variable and the rgds list
handling (the k8sClient.List calls and subsequent deletion loop) to only operate
on items that have the test label or the expected ownerReference.
In
`@kubernetes/controller/internal/controller/deployer/deployer_controller_test.go`:
- Around line 353-362: The cleanup is listing all Deployer resources
cluster-wide and ignoring update errors; change the List call to scope to the
test namespace (use client.ListOptions{Namespace: <test namespace>} or
client.InNamespace(ctxNamespace) equivalent) so v1alpha1.DeployerList named
deployers only returns namespace resources, and check/handle the error returned
by k8sClient.Update(ctx, d) instead of discarding it—log or fail the test on
error; keep using controllerutil.RemoveFinalizer for applySetPruneFinalizer and
resourceWatchFinalizer and continue calling test.DeleteObject(ctx, k8sClient, d)
after ensuring the update succeeded.
- Around line 64-73: Scope the Deployer cleanup to the test namespace and stop
ignoring update errors: when listing deployers, pass the namespace option (e.g.,
client.InNamespace(testNamespace) or the existing namespace variable) to
k8sClient.List(ctx, deployers, ...); when removing finalizers and calling
k8sClient.Update(ctx, d) capture the error and handle retries/conflicts instead
of silencing it — for example assert Expect(err).To(Succeed()) or on
apierrors.IsConflict fetch the latest object, remove the finalizers again and
retry the Update before calling test.DeleteObject; keep references to deployers,
controllerutil.RemoveFinalizer, applySetPruneFinalizer, resourceWatchFinalizer,
k8sClient.Update, and test.DeleteObject to locate the code.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 5fb6f938-3e44-4980-998d-3eaba2ab0e06
📒 Files selected for processing (2)
kubernetes/controller/internal/controller/deployer/deployer_controller_rgd_test.gokubernetes/controller/internal/controller/deployer/deployer_controller_test.go
…ardown and we requeue if there are still finalizers to process On-behalf-of: Gergely Brautigam <gergely.brautigam@sap.com> Signed-off-by: Gergely Brautigam <182850+Skarlso@users.noreply.github.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@kubernetes/controller/internal/controller/deployer/deployer_controller.go`:
- Around line 370-379: The finalizer removal happens even when cleanup returns
an error; change the control flow in the reconciler so that
controllerutil.RemoveFinalizer is only called when the cleanup call succeeded
(no error) and, for pruneWithApplySet, when done==true; for the apply-set branch
(pruneWithApplySet) check err==nil and done==true before logging success and
removing applySetPruneFinalizer, and for the resource watch branch call Untrack
and only remove resourceWatchFinalizer when Untrack returns nil; if cleanup
returned an error append to errs and return/requeue instead of falling through
to RemoveFinalizer.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 3cd7301f-471a-4974-953b-e2185dd185e2
📒 Files selected for processing (1)
kubernetes/controller/internal/controller/deployer/deployer_controller.go
On-behalf-of: Gergely Brautigam <gergely.brautigam@sap.com> Signed-off-by: Gergely Brautigam <182850+Skarlso@users.noreply.github.com>
On-behalf-of: Gergely Brautigam <gergely.brautigam@sap.com> Signed-off-by: Gergely Brautigam <182850+Skarlso@users.noreply.github.com>
jakobmoellerdev
left a comment
There was a problem hiding this comment.
how are we not adjusting any tests here?
This particular segment wasn't covered. I can add tests ofc.
I don't understand the question, sorry. :D Followed up by what...? |
|
This was an original comment on #1897 (comment) |
|
@jakobmoellerdev Ah, okay... Sorry, so what's my action here? Should I improve upon this section some more here? |
|
Talked about this. Going to see what's up with Kro and adjust accordingly. |
|
Okay, so I read through the entire path section and our stuff. So, Jakob you know this better probably. :D But as far as I understand, Kro does a two step approach where it Loops on Delete until finalizers can be removed, then prunes orphaned resources which is only a single loop because it doesn't care about finalizers anymore. As far as I understand OUR code, we do the same, but a bit differently, because we don't have topological ordering on delete flow. Adopting that would mean even more code. :) Our prune with it's looptyloop is actually fine, because it mimics Kro's delete loop prune cycle just all in one I guess. Which works. The only fix is really just that the logic was inverted. The rest, I believe, is okay. :) |
jakobmoellerdev
left a comment
There was a problem hiding this comment.
discussed again that this needs a test
On-behalf-of: Gergely Brautigam <gergely.brautigam@sap.com> Signed-off-by: Gergely Brautigam <182850+Skarlso@users.noreply.github.com>
…el into fix-stuck-rgd
matthiasbruns
left a comment
There was a problem hiding this comment.
duration looks really odd with the ns display of 10 * 60_000_000_000
On-behalf-of: Gergely Brautigam <gergely.brautigam@sap.com> Signed-off-by: Gergely Brautigam <182850+Skarlso@users.noreply.github.com>
|
@jakobmoellerdev I tried every iteration of using https://pkg.go.dev/errors#AsType and it's ugly as HELL! :D It's only useful if we use the actual value, otherwise it's several if statements because it returns the value and the The only way that was "slightly" better was something like this: _, notReadyOk := errors.AsType[*util.NotReadyError](err)
_, deletionOk := errors.AsType[*util.DeletionError](err)
if notReadyOk || deletionOk {...But even this is harder to read than the two errors.As's IMHO. |
I see, sad to hear. but at least you bumped go for ocm, thanks <3 :D |
…ome dead code (open-component-model#1897) <!-- markdownlint-disable MD041 --> #### What this PR does / why we need it When all objects were deleted at the same time, there was an error check to see if the specific error was a deletion error or not. This was checked with errors.Is. `errors.Is` however, is also checking for the message being there and it was comparing with an empty error. Another options would be to implement `Is` on the error, but that's literally just reimplementing As. Here, we want to make sure that whatever is coming back is the Type of the error and not Is that error. Further, the HasPruned's check was actually inverted. We wanted to do the thing when it actually HasPruned. And not when it hasn't. Lastely, removed some dead code and added ```go if component.Status.Component.RepositorySpec == nil { ``` which is actually a fix for a masked NIL panic that happened further below. #### Which issue(s) this PR fixes <!-- Usage: `Fixes #<issue number>`, or `Fixes (paste link of issue)`. --> #### Testing ##### How to test the changes <!-- Required files to test the changes: .ocmconfig ```yaml type: generic.config.ocm.software/v1 configurations: - type: credentials.config.ocm.software repositories: - repository: type: DockerConfig/v1 dockerConfigFile: "~/.docker/config.json" ``` Commands that test the change: ```bash ocm get cv xxx ocm transfer xxx ``` --> ##### Verification - [ ] I have tested the changes locally by running `ocm` <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Bug Fixes** * Improved detection and messaging for unavailable resources. * Added nil-safety checks to prevent failures when resource specifications are missing. * **Improvements** * Better reconciliation progress signaling to reduce unnecessary requeues during cleanup. * More precise error handling to distinguish transient vs. final unavailability, improving stability and observability. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Signed-off-by: Gergely Brautigam <182850+Skarlso@users.noreply.github.com> Co-authored-by: Matthias Bruns <github@matthiasbruns.com> Co-authored-by: Jakob Möller <jakob.moeller@sap.com>
…ome dead code (open-component-model#1897) <!-- markdownlint-disable MD041 --> #### What this PR does / why we need it When all objects were deleted at the same time, there was an error check to see if the specific error was a deletion error or not. This was checked with errors.Is. `errors.Is` however, is also checking for the message being there and it was comparing with an empty error. Another options would be to implement `Is` on the error, but that's literally just reimplementing As. Here, we want to make sure that whatever is coming back is the Type of the error and not Is that error. Further, the HasPruned's check was actually inverted. We wanted to do the thing when it actually HasPruned. And not when it hasn't. Lastely, removed some dead code and added ```go if component.Status.Component.RepositorySpec == nil { ``` which is actually a fix for a masked NIL panic that happened further below. #### Which issue(s) this PR fixes <!-- Usage: `Fixes #<issue number>`, or `Fixes (paste link of issue)`. --> #### Testing ##### How to test the changes <!-- Required files to test the changes: .ocmconfig ```yaml type: generic.config.ocm.software/v1 configurations: - type: credentials.config.ocm.software repositories: - repository: type: DockerConfig/v1 dockerConfigFile: "~/.docker/config.json" ``` Commands that test the change: ```bash ocm get cv xxx ocm transfer xxx ``` --> ##### Verification - [ ] I have tested the changes locally by running `ocm` <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Bug Fixes** * Improved detection and messaging for unavailable resources. * Added nil-safety checks to prevent failures when resource specifications are missing. * **Improvements** * Better reconciliation progress signaling to reduce unnecessary requeues during cleanup. * More precise error handling to distinguish transient vs. final unavailability, improving stability and observability. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Signed-off-by: Gergely Brautigam <182850+Skarlso@users.noreply.github.com> Co-authored-by: Matthias Bruns <github@matthiasbruns.com> Co-authored-by: Jakob Möller <jakob.moeller@sap.com> Signed-off-by: Gerald Morrison (SAP) <gerald.morrison@sap.com>
What this PR does / why we need it
When all objects were deleted at the same time, there was an error check to see if the specific error was a deletion error or not.
This was checked with errors.Is.
errors.Ishowever, is also checking for the message being there and it was comparing with an empty error. Another options would be to implementIson the error, but that's literally just reimplementing As.Here, we want to make sure that whatever is coming back is the Type of the error and not Is that error.
Further, the HasPruned's check was actually inverted. We wanted to do the thing when it actually HasPruned. And not when it hasn't.
Lastely, removed some dead code and added
which is actually a fix for a masked NIL panic that happened further below.
Which issue(s) this PR fixes
Testing
How to test the changes
Verification
ocmSummary by CodeRabbit
Bug Fixes
Improvements