🐛 Fix fake client's SSA status patch resource version check #3443
Conversation
Welcome @josvazg!

Hi @josvazg. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test. Once the patch is verified, the new status will be reflected by the ok-to-test label. I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
pkg/client/fake/client_test.go (outdated)
Expect(updatedPod.ResourceVersion).NotTo(Equal("1"))
originalRV := updatedPod.ResourceVersion

// Now try to do an SSA apply on the status with an explicitly mismatched resourceVersion.
I cannot confirm the behavior described here against a real kube-apiserver: if you pass an incorrect RV, the request will fail (you can test this with kubectl apply --server-side). Only if you pass no RV (which I think is the common case for SSA) will it succeed regardless of the server's RV.
I see, makes sense. Still, the code where we hit this was failing even after we wiped the resource version clean right before the SSA, whereas in version 0.22 it worked fine without changes.
Let me double check.
I need to dig deeper.
When testing this again with the fix removed and no rv value, I get it to pass, but our original test still fails.
Comparing accessor.GetResourceVersion() != oldAccessor.GetResourceVersion() from my controller-runtime checkout (using this modified test), the comparison is rv = "1" == old rv = "1", so it passes. But from our code I saw rv = "" == old rv = "1000".
I am not sure what the difference between the two tests is. I am also not sure how the passing test compares "1" against "1" even though we wiped the rv before calling SSA.
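The comparison in question can be sketched in isolation. This is a hypothetical, stdlib-only stand-in for the tracker's resource-version check, not the real controller-runtime code; the function and error names are made up for illustration:

```go
package main

import (
	"errors"
	"fmt"
)

// errConflict stands in for the conflict error the real versioned tracker
// returns when resource versions diverge ("object was modified").
var errConflict = errors.New("the object has been modified; please apply your changes to the latest version and try again")

// checkResourceVersion mirrors the shape of the tracker's
// accessor.GetResourceVersion() != oldAccessor.GetResourceVersion() check.
func checkResourceVersion(newRV, oldRV string) error {
	if newRV != oldRV {
		return errConflict
	}
	return nil
}

func main() {
	fmt.Println(checkResourceVersion("1", "1"))   // the passing reproducer: rv "1" vs old rv "1"
	fmt.Println(checkResourceVersion("", "1000")) // the failing case: rv "" vs old rv "1000"
}
```

This makes the observed difference concrete: the passing test compared equal RVs, while the failing code path compared an empty RV against "1000".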
We think we found it:
if accessor.GetResourceVersion() == "" {
switch {
case allowsUnconditionalUpdate(gvk):
accessor.SetResourceVersion(oldAccessor.GetResourceVersion())
// This is needed because if the patch explicitly sets the RV to null, the client-go reaction we use
// to apply it and whose output we process here will have it unset. It is not clear why the Kubernetes
// apiserver accepts such a patch, but it does so we just copy that behavior.
// Kubernetes apiserver behavior can be checked like this:
// `kubectl patch configmap foo --patch '{"metadata":{"annotations":{"foo":"bar"},"resourceVersion":null}}' -v=9`
case bytes.Contains(debug.Stack(), []byte("sigs.k8s.io/controller-runtime/pkg/client/fake.(*fakeClient).Patch")):
// We apply patches using a client-go reaction that ends up calling the trackers Update. As we can't change
// that reaction, we use the callstack to figure out if this originated from the "fakeClient.Patch" func.
accessor.SetResourceVersion(oldAccessor.GetResourceVersion())
case bytes.Contains(debug.Stack(), []byte("sigs.k8s.io/controller-runtime/pkg/client/fake.(*fakeClient).Apply")):
// We apply patches using a client-go reaction that ends up calling the trackers Update. As we can't change
// that reaction, we use the callstack to figure out if this originated from the "fakeClient.Apply" func.
accessor.SetResourceVersion(oldAccessor.GetResourceVersion())
}
}

This Pod test cannot reproduce the issue we hit because a Pod is a core resource and follows the first case, case allowsUnconditionalUpdate(gvk), which fixes the RV to match.
Our resource is a CRD, and we are setting a subresource on it. So we are missing this case:
case bytes.Contains(debug.Stack(), []byte("sigs.k8s.io/controller-runtime/pkg/client/fake.(*fakeSubResourceClient).Apply")):
// We apply patches using a client-go reaction that ends up calling the trackers Update. As we can't change
// that reaction, we use the callstack to figure out if this originated from the "fakeSubResourceClient.Apply" func.
accessor.SetResourceVersion(oldAccessor.GetResourceVersion())

I shall revamp this PR:
- Test must reproduce the issue using a CRD, not a core resource.
- The old code fix goes away.
- The new fix is just the new case entry that was missing in that switch.
Not all in-tree resources allow unconditional update, so you might find one there you can use for testing
In the end I used an ad hoc custom object. It took longer to verify because if the test registers the custom object as unstructured, the issue is not reproduced: in that case, the oldAccessor is set with an empty resourceVersion, which matches the one in the accessor.
Only when the custom object is registered with a proper type, as in our case, do you hit the issue: then the oldAccessor is set with a non-empty resourceVersion, which does not match the one in the accessor.
This was tricky to figure out; I had to debug our failing test and the reproducer until I spotted the difference between the two.
This PR is split into 2 commits, one with the test and one with the fix. If you run without the fix commit you can see the error we hit in our case.
@josvazg @alvaroaleman just got to the same conclusion 😅 with #3444. Closing it now as this PR was here before; let me know otherwise.
Signed-off-by: jose.vazquez <jose.vazquez@mongodb.com>
Force-pushed bd7b274 to 1fa48c2
Expect(cl.Status().Apply(ctx, node, client.FieldOwner("test-owner"))).To(Succeed())
})

It("should allow SSA apply on status without object has changed issues", func(ctx SpecContext) {
Can we also add a test that validates that status apply requests with an invalid RV fail?
@alvaroaleman I am confused. Maybe I misunderstood some details here.
I would assume that the RV value sent on an SSA is ignored. In fact, wouldn't the applied config, which is what we pass to the SSA operation, have it removed already?
No, if you send an SSA request with an RV in it and it doesn't match, the apiserver fails the request: #3443 (comment)
Ah! OK, makes sense.
Test added now.
/ok-to-test
Force-pushed 025d487 to 31564ac
/assign (I'll review once Alvaro is fine with it)
Force-pushed 31564ac to d80043e
alvaroaleman
left a comment
Thanks for debugging this! Two small comments
pkg/client/fake/versioned_tracker.go (outdated)
// We apply patches using a client-go reaction that ends up calling the trackers Update. As we can't change
// that reaction, we use the callstack to figure out if this originated from the "fakeClient.Apply" func.
accessor.SetResourceVersion(oldAccessor.GetResourceVersion())
case bytes.Contains(debug.Stack(), []byte("sigs.k8s.io/controller-runtime/pkg/client/fake.(*fakeSubResourceClient).Patch")):
Would you mind using a single case statement for Apply, subresource.Patch and subresource.Apply that ORs them? We don't really need the same comment three times
yeah it was getting too verbose now
// This is expected to fail with the wrong rv value passed in the applied config
err := cl.Status().Apply(ctx, resourceAC, client.FieldOwner("test-owner"), client.ForceOwnership)
Expect(err).To(HaveOccurred(), "SSA apply on status should not succeed when resourceVersion is wrongly set")
Please check that we got the correct error here as well and not just any error:
Expect(apierrors.IsConflict(err)).To(BeTrue())
/cherrypick release-0.23
@alvaroaleman: once the present PR merges, I will cherry-pick it on top of release-0.23 in a new PR and assign it to you.

In response to this: /cherrypick release-0.23

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Signed-off-by: jose.vazquez <jose.vazquez@mongodb.com>
Force-pushed d80043e to b0fee5c
alvaroaleman
left a comment
/hold
so @sbueringer can have a look
LGTM label has been added.

Git tree hash: d973602ef2870ada24f9fa483171303afa1c07f9
sbueringer
left a comment
Thank you very much. Makes sense, just two nits
Co-authored-by: Stefan Büringer <4662360+sbueringer@users.noreply.github.com>
Co-authored-by: Stefan Büringer <4662360+sbueringer@users.noreply.github.com>
Thx again!
/lgtm
LGTM label has been added.

Git tree hash: 471359de1349cdea8c889622043f98f79ae1e0ba
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alvaroaleman, josvazg, sbueringer

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment.
@alvaroaleman: new pull request created: #3446

In response to this: /cherrypick release-0.23

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
This change includes a test showing a sample SSA on a Pod status which, before the fix, failed with an "object was modified" error.
It also includes the fix in the fake client, which is to copy the resource version from the old accessor, as was already done for non-status patch and apply operations.
Advice on simplifying the reproducer is welcome.
Fixes #3442