Skip to content

Fix TerminationGracePeriodSeconds is negative (part 1)#98866

Merged
k8s-ci-robot merged 3 commits into
kubernetes:masterfrom
wzshiming:fix/termination_grace_period_seconds_is_negative
Jun 28, 2021
Merged

Fix TerminationGracePeriodSeconds is negative (part 1)#98866
k8s-ci-robot merged 3 commits into
kubernetes:masterfrom
wzshiming:fix/termination_grace_period_seconds_is_negative

Conversation

@wzshiming

@wzshiming wzshiming commented Feb 8, 2021

Copy link
Copy Markdown
Member

What type of PR is this?

/kind bug
/kind feature

What this PR does / why we need it:

Which issue(s) this PR fixes:

Fixes #98506
xref #98507
xref #103476

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

ACTION REQUIRED: TerminationGracePeriodSeconds on pod specs and container probes should not be negative.
Negative values of TerminationGracePeriodSeconds will be treated as the value `1s` on the delete path.
Immutable field validation will be relaxed in order to update negative values. 
In a future release, negative values will not be permitted.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

NONE

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Feb 8, 2021
@k8s-ci-robot k8s-ci-robot added area/kubelet kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/node Categorizes an issue or PR as relevant to SIG Node. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Feb 8, 2021
@wzshiming

Copy link
Copy Markdown
Member Author

/retest

@fejta-bot

Copy link
Copy Markdown

This PR may require API review.

If so, when the changes are ready, complete the pre-review checklist and request an API review.

Status of requested reviews is tracked in the API Review project.

@ehashman

ehashman commented Feb 8, 2021

Copy link
Copy Markdown
Member

/triage accepted
/priority important-longterm

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Feb 8, 2021
Comment thread pkg/kubelet/kuberuntime/kuberuntime_container.go Outdated
Comment thread pkg/kubelet/nodeshutdown/nodeshutdown_manager_linux.go Outdated
Comment thread pkg/apis/core/validation/validation.go Outdated
Comment thread pkg/registry/core/pod/strategy.go Outdated
Comment thread staging/src/k8s.io/apimachinery/pkg/apis/meta/v1/validation/validation.go Outdated
@liggitt

liggitt commented Feb 9, 2021

Copy link
Copy Markdown
Member

the BeforeDelete implementation currently recalculates and updates DeletionGracePeriodSeconds:

// a resource was previously left in a state that was non-recoverable. We
// check if the existing stored resource has a grace period as 0 and if so
// attempt to delete immediately in order to recover from this scenario.
if objectMeta.GetDeletionGracePeriodSeconds() == nil || *objectMeta.GetDeletionGracePeriodSeconds() == 0 {
return false, false, nil
}
// only a shorter grace period may be provided by a user
if options.GracePeriodSeconds != nil {
period := int64(*options.GracePeriodSeconds)
if period >= *objectMeta.GetDeletionGracePeriodSeconds() {
return false, true, nil
}
newDeletionTimestamp := metav1.NewTime(
objectMeta.GetDeletionTimestamp().Add(-time.Second * time.Duration(*objectMeta.GetDeletionGracePeriodSeconds())).
Add(time.Second * time.Duration(*options.GracePeriodSeconds)))
objectMeta.SetDeletionTimestamp(&newDeletionTimestamp)
objectMeta.SetDeletionGracePeriodSeconds(&period)
return true, false, nil
}

There are two things that need changing here (with unit tests that exercise each change):

  • if the existing grace period is already less than zero we should delete immediately
  • if the new period is less than zero we should clip to zero
diff --git a/staging/src/k8s.io/apiserver/pkg/registry/rest/delete.go b/staging/src/k8s.io/apiserver/pkg/registry/rest/delete.go
index 3e7ca85b761..7f90e5e18be 100644
--- a/staging/src/k8s.io/apiserver/pkg/registry/rest/delete.go
+++ b/staging/src/k8s.io/apiserver/pkg/registry/rest/delete.go
@@ -103,9 +103,9 @@ func BeforeDelete(strategy RESTDeleteStrategy, ctx context.Context, obj runtime.
 		// 2. Delete the object from storage.
 		// If the update succeeds, but the delete fails (network error, internal storage error, etc.),
 		// a resource was previously left in a state that was non-recoverable.  We
-		// check if the existing stored resource has a grace period as 0 and if so
+		// check if the existing stored resource has a grace period <= 0 and if so
 		// attempt to delete immediately in order to recover from this scenario.
-		if objectMeta.GetDeletionGracePeriodSeconds() == nil || *objectMeta.GetDeletionGracePeriodSeconds() == 0 {
+		if objectMeta.GetDeletionGracePeriodSeconds() == nil || *objectMeta.GetDeletionGracePeriodSeconds() <= 0 {
 			return false, false, nil
 		}
 		// only a shorter grace period may be provided by a user
@@ -113,10 +113,13 @@ func BeforeDelete(strategy RESTDeleteStrategy, ctx context.Context, obj runtime.
 			period := int64(*options.GracePeriodSeconds)
 			if period >= *objectMeta.GetDeletionGracePeriodSeconds() {
 				return false, true, nil
+			} else if period < 0 {
+				// clip to zero
+				period = 0
 			}
 			newDeletionTimestamp := metav1.NewTime(
 				objectMeta.GetDeletionTimestamp().Add(-time.Second * time.Duration(*objectMeta.GetDeletionGracePeriodSeconds())).
-					Add(time.Second * time.Duration(*options.GracePeriodSeconds)))
+					Add(time.Second * time.Duration(period)))
 			objectMeta.SetDeletionTimestamp(&newDeletionTimestamp)
 			objectMeta.SetDeletionGracePeriodSeconds(&period)
 			return true, false, nil

@liggitt liggitt self-assigned this Feb 9, 2021
@wzshiming wzshiming force-pushed the fix/termination_grace_period_seconds_is_negative branch from c07858f to c6d3696 Compare February 9, 2021 14:24
Comment thread staging/src/k8s.io/apiserver/pkg/registry/rest/delete_test.go Outdated
Comment thread staging/src/k8s.io/apiserver/pkg/registry/rest/delete_test.go Outdated
Comment thread staging/src/k8s.io/apiserver/pkg/registry/rest/delete_test.go Outdated
Comment thread staging/src/k8s.io/apiserver/pkg/registry/rest/delete_test.go Outdated
Comment thread staging/src/k8s.io/apiserver/pkg/registry/rest/delete_test.go Outdated
Comment thread staging/src/k8s.io/apiserver/pkg/registry/rest/delete_test.go Outdated
Comment thread staging/src/k8s.io/apiserver/pkg/registry/rest/delete_test.go Outdated
Comment thread staging/src/k8s.io/apiserver/pkg/registry/rest/delete_test.go Outdated
Comment thread staging/src/k8s.io/apiserver/pkg/registry/rest/delete_test.go Outdated
Comment thread staging/src/k8s.io/apiserver/pkg/registry/rest/delete_test.go Outdated
Comment thread staging/src/k8s.io/apiserver/pkg/registry/rest/delete_test.go Outdated
Comment thread staging/src/k8s.io/apiserver/pkg/registry/rest/delete_test.go Outdated
Comment thread staging/src/k8s.io/apiserver/pkg/registry/rest/delete_test.go Outdated
Comment thread staging/src/k8s.io/apiserver/pkg/registry/rest/delete_test.go Outdated
Comment thread staging/src/k8s.io/apiserver/pkg/registry/rest/delete_test.go Outdated
@ehashman

Copy link
Copy Markdown
Member

/priority important-soon
/remove-priority important-longterm

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adding another options mutation looks like it will exacerbate the issue being fixed in #100101

@deads2k, is that PR close? is there something this PR should do differently to avoid that issue?

Comment thread pkg/apis/core/validation/validation.go Outdated

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if probe.TerminationGracePeriodSeconds is still alpha, can we add the validation to require non-negative values before promotion to beta, rather than making the update validation more complicated/expensive?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Update and I sent a new patch to follow this #103245

Comment thread pkg/apis/core/validation/validation.go Outdated

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this lgtm

@liggitt

liggitt commented Jun 28, 2021

Copy link
Copy Markdown
Member

/lgtm
/approve

@k8s-ci-robot

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: liggitt, wzshiming

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@wzshiming

Copy link
Copy Markdown
Member Author

/kind feature

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. area/apiserver area/kubelet cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/bug Categorizes issue or PR as related to a bug. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. release-note-action-required Denotes a PR that introduces potentially breaking changes that require user action. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/node Categorizes an issue or PR as relevant to SIG Node. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Projects

Status: API review completed, 1.22

Development

Successfully merging this pull request may close these issues.

Force deleting pods not working after deletionGracePeriodSeconds set to a negative value