Skip to content

Sending existing object to the webhook for the DELETE verb#76346

Merged
k8s-ci-robot merged 4 commits intokubernetes:masterfrom
caesarxuchao:delete-admission-objects
May 18, 2019
Merged

Sending existing object to the webhook for the DELETE verb#76346
k8s-ci-robot merged 4 commits intokubernetes:masterfrom
caesarxuchao:delete-admission-objects

Conversation

@caesarxuchao
Copy link
Copy Markdown
Contributor

@caesarxuchao caesarxuchao commented Apr 9, 2019

Based on #66535, with these major changes:

  • The deleteValidation is plumbed to the GuaranteedUpdate and etcd3#Delete, so that it's always called when the update and delete retry on conflicts.

/assign
/sig api-machinery
/kind bug

For admission webhooks registered for DELETE operations on k8s built APIs or CRDs, the apiserver now sends the existing object as admissionRequest.Request.OldObject to the webhook. 
For custom apiservers they uses the generic registry in the apiserver library, they get this behavior automatically.

@k8s-ci-robot k8s-ci-robot added do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. kind/bug Categorizes issue or PR as related to a bug. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Apr 9, 2019
@caesarxuchao caesarxuchao force-pushed the delete-admission-objects branch from 390f260 to 744bf2a Compare April 10, 2019 21:06
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. area/test sig/testing Categorizes an issue or PR as relevant to SIG Testing. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Apr 10, 2019
@caesarxuchao caesarxuchao force-pushed the delete-admission-objects branch from 744bf2a to 8e93456 Compare April 10, 2019 21:09
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a duplicate of line 183.

@caesarxuchao caesarxuchao force-pushed the delete-admission-objects branch 3 times, most recently from 8a3bd0f to 35c38f7 Compare April 11, 2019 00:11
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be controversial.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is interesting. wouldn't malformed data in etcd also prevent us from listing and finding out what the problematic key was? I'm trying to understand if this special handling actually enables us to resolve the issue, or if it's just a step in that direction.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. The error message send by the LIST handler does include the key:

return storage.NewInternalErrorf("unable to transform key %q: %v", kv.Key, err)

So this special handling enables us to resolve the issue.

@caesarxuchao caesarxuchao force-pushed the delete-admission-objects branch 4 times, most recently from 6bafd02 to f6b94b4 Compare April 11, 2019 17:46
@caesarxuchao
Copy link
Copy Markdown
Contributor Author

/retest

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Apr 11, 2019
@caesarxuchao
Copy link
Copy Markdown
Contributor Author

/unassign
/assign @liggitt @mbohlool @roycaihw @yue9944882

@caesarxuchao
Copy link
Copy Markdown
Contributor Author

CHAO: finalizer for roles.rbac.authorization.k8s.io/role1 is [test/k8s.io], uid is 01f1dcc1-65e4-4836-b85f-ccf0b3be4f56, rv is 1312
CHAO: again, finalizer for roles.rbac.authorization.k8s.io/role1 is [test/k8s.io], uid is 01f1dcc1-65e4-4836-b85f-ccf0b3be4f56, rv is 1312
CHAO: in updateForGracefulDeletionAndFinalizers 1, finalizer for role1 is [], uid is 01f1dcc1-65e4-4836-b85f-ccf0b3be4f56, rv is 1311
CHAO: in updateForGracefulDeletionAndFinalizers 2, finalizer for role1 is [], uid is 01f1dcc1-65e4-4836-b85f-ccf0b3be4f56, rv is 1311
CHAO: in updateForGracefulDeletionAndFinalizers 3, finalizer for role1 is [], uid is 01f1dcc1-65e4-4836-b85f-ccf0b3be4f56, rv is 1311

Somehow the rv goes backwards in this code block: https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apiserver/pkg/registry/generic/registry/store.go#L873-L909.

@sttts
Copy link
Copy Markdown
Contributor

sttts commented May 10, 2019

Any chance to squash this into logical pieces for easier review?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to add a specific delete test for CRD, because like namespaces, apiserver does not delete CRD on an update that removes the last pending finalizer.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should it? that seems like the same bug as the one you fixed in storage

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CRD storage doesn't override Update, and only changes Delete behavior on first delete... I don't think it should require a custom test method (I'd expect removing finalizers on update of a CRD with a deletiontimestamp set to actually delete)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the commit that modified shouldDeleteDuringUpdate. The function returns false unless the DeletionGracePeriodSeconds was set on the first Delete. CRD storage doesn't set the DeletionGracePeriodSeconds.

I removed my previous fix to shouldDeleteDuringUpdate because that would make Namespaces misbehave.

I tracked the issue in #77944. We can solve it in a followup.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This matches the existing behavior:

// Delete succeeds but reports an error because we cannot access the body
if err := store.Delete(ctx, preset[1].key, &example.Pod{}, nil); !storage.IsInternalError(err) {
t.Errorf("Unexpected error: %v", err)
}
if err := store.Get(ctx, preset[1].key, "", &example.Pod{}, false); !storage.IsNotFound(err) {
t.Errorf("Unexpected error: %v", err)
}

@liggitt
Copy link
Copy Markdown
Member

liggitt commented May 15, 2019

/retest

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it better to skip validateDeletion or call it with a nil object? Currently, delete admission is always called, regardless of transform error, with a nil oldObject. would it be preferable to continue calling delete admission with a nil oldObject in the case of a transform error over skipping it entirely?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't decide which bothers me more, giving an inconsistent API to admission consumers (sometimes getting a nil oldObject in a delete admission call), skipping delete admission entirely, or requiring talking to etcd to delete a malformed value.

Will CRD webhook conversion errors and KMS transform errors exercise this path? I'm not sure I'd want to abandon calling admission if an external system like that flakes.

Copy link
Copy Markdown
Contributor Author

@caesarxuchao caesarxuchao May 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed the KMS transform is part of the transformer stack:

transformer, err = getEnvelopePrefixTransformer(provider.KMS, envelopeService, kmsTransformerPrefixV1)
.

I'll trace the CRD webhook path later.

[update] CRD webhook conversion is plumbed to the storage codec, not as a transformer. See code.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Among all the options, "giving an inconsistent API to admission consumers" seems to be the lesser of the evils.

I added the last commit to send nil objects to validateDelete.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed offline. since we already have to look up the existing object in the REST handler to verify there are no remaining finalizers, a persistent transform error is already blocking, and we'll never make it here. Continuing to simply return if we encounter a transform error (rather than skipping admission or passing a nil object) is no worse than the current state, and lets us give a consistent API to admission authors.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's drop the transformation error changes from this PR to keep it focused on passing existing objects to delete admission

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also return if origState is nil, since we don't have the rev info to use later

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a return after the error checks.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CRD storage doesn't override Update, and only changes Delete behavior on first delete... I don't think it should require a custom test method (I'd expect removing finalizers on update of a CRD with a deletiontimestamp set to actually delete)

@caesarxuchao
Copy link
Copy Markdown
Contributor Author

/retest

@caesarxuchao
Copy link
Copy Markdown
Contributor Author

@liggitt I cherry-picked your #77952 to get the integration tests working, I would need a rebase after #77952 merged. The PR is otherwise ready for another round of review. Thank you for the meticulous reviews.

@liggitt
Copy link
Copy Markdown
Member

liggitt commented May 17, 2019

/lgtm
/approve

needs a rebase since #77952 merged

yue9944882 and others added 4 commits May 17, 2019 09:50
validate deletion admission object

backward compatibility: add validation for direct storage delete calls

apply nil validation to existing tests

revert behavior changes in deleteCollection call

fixes validation on wiring graceful deletion

remove nil validation check

continue admission check on not found error
conflict.

Adding unit test verify that deleteValidation is retried.

adding e2e test verifying the webhook can intercept configmap and custom
resource deletion, and the existing object is sent via the
admissionreview.OldObject.

update the admission integration test to verify that the existing object
is passed to the deletion admission webhook as oldObject, in case of an
immediate deletion and in case of an update-on-delete.
@caesarxuchao
Copy link
Copy Markdown
Contributor Author

/retest

1 similar comment
@caesarxuchao
Copy link
Copy Markdown
Contributor Author

/retest

@liggitt
Copy link
Copy Markdown
Member

liggitt commented May 17, 2019

/lgtm
/approve
/priority important-soon

@k8s-ci-robot
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: caesarxuchao, liggitt

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@yue9944882
Copy link
Copy Markdown
Member

awesome! 🎉

@liggitt
Copy link
Copy Markdown
Member

liggitt commented May 31, 2019

included in docs PR for 1.15 at kubernetes/website#14671

@roycaihw
Copy link
Copy Markdown
Member

/area admission-control

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/admission-control area/apiserver area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/testing Categorizes an issue or PR as relevant to SIG Testing. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants