[KEP-3521] Part 3: Bug fixes, integration & E2E Test by Huang-Wei · Pull Request #113442 · kubernetes/kubernetes

Huang-Wei · 2022-10-29T05:45:02Z

What type of PR is this?

/kind feature
/sig scheduling

What this PR does / why we need it:

This PR is rebased atop Part 1 (#113274) & Part 2 (#113275), covering the following logic:

- Bug fixes ([KEP-3521] Part 2: Core scheduling implementation #113275 (comment))
- Integration tests
- E2E tests

Meanwhile, I'm creating a draft PR in test-infra kubernetes/test-infra#27862 to enable it in CI.

Which issue(s) this PR fixes:

Part of #113269

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

- [KEP]: https://github.com/kubernetes/enhancements/tree/master/keps/sig-scheduling/3521-pod-scheduling-readiness

Huang-Wei · 2022-10-29T05:45:20Z

~~E2E test is WIP.~~ Completed.

k8s-triage-robot · 2022-10-29T05:57:00Z

This PR may require stable metrics review.

Stable metrics are guaranteed to not change. Please review the documentation for the requirements and lifecycle of stable metrics and ensure that your metrics meet these guidelines.

k8s-triage-robot · 2022-10-29T06:13:00Z

This PR may require API review.

If so, when the changes are ready, complete the pre-review checklist and request an API review.

Status of requested reviews is tracked in the API Review project.

alculquicondor · 2022-11-01T20:04:09Z

/assign @ahg-g

fedebongio · 2022-11-01T21:23:13Z

/remove-sig api-machinery

Huang-Wei · 2022-11-01T21:59:54Z

For now, the E2E test has only been run locally:

Create a local cluster with the feature enabled:

FEATURE_GATES=PodSchedulingReadiness=true hack/local-up-cluster.sh

Build the e2e test binary:
```
make WHAT=test/e2e/e2e.test
```

Run the e2e test:

_output/bin/e2e.test --provider=local --ginkgo.focus='PodSchedulingReadiness'

Test log: e2e.log

ahg-g · 2022-11-08T17:37:28Z

This is good to have for sure, but I assume that podInfo for a gated pod should never have its Timestamp set beyond the first time we observed the pod (on Add(...)) because we should have never attempted to schedule the pod (i.e., Attempts should always be zero), and so Timestamp shouldn't be reset.

Yes, attempts is always zero, so duration := p.calculateBackoffDuration(podInfo) would always return podInitialBackoffDuration. In other words, if the time between the pod being added and being updated is less than podInitialBackoffDuration, isPodBackingOff would return true.

This bug can be observed by running hack/local-up-cluster.sh with the following diff:

    diff --git a/hack/local-up-cluster.sh b/hack/local-up-cluster.sh
    index 6ec7f1ded28..e2c67dd0e34 100755
    --- a/hack/local-up-cluster.sh
    +++ b/hack/local-up-cluster.sh
    @@ -869,6 +869,8 @@ clientConnection:
       kubeconfig: ${CERT_DIR}/scheduler.kubeconfig
     leaderElection:
       leaderElect: false
    +podInitialBackoffSeconds: 120
    +podMaxBackoffSeconds: 200
     EOF
         ${CONTROLPLANE_SUDO} "${GO_OUT}/kube-scheduler" \
           --v="${LOG_LEVEL}" \

ahg-g · 2022-11-08T18:03:44Z

looks good to me, pls squash

- test generic integration in plugins_test.go
- test integration with SchedulingGates plugin in queue_test.go

Huang-Wei · 2022-11-08T18:17:33Z

Oh I need one test owner to approve the test/* changes. ~~@dims do you have some cycles to review and approve this?~~ (Dims is out today)

ahg-g · 2022-11-08T18:23:12Z

/lgtm
/approve

for the scheduler

Huang-Wei · 2022-11-08T20:07:51Z

/retest

aojea · 2022-11-08T20:14:47Z

+		},
+		{
+			name:         "pod is not admitted to enqueue",
+			pod:          st.MakePod().Name("p").Namespace(testCtx.NS.Name).SchedulingGates([]string{"foo"}).Container("pause").Obj(),


is it a problem that both testcases use a pod with the same name?

It's fine: in an integration test, each sub-test's context/env is destroyed afterwards, so every sub-test is supposed to run statelessly.

testutils.CleanupPods(testCtx.ClientSet, t, []*v1.Pod{pod})

aojea · 2022-11-08T20:23:36Z

+			for _, idx := range tt.rmPodsSchedulingGates {
+				patch := `{"spec": {"schedulingGates": null}}`
+				podName := tt.pods[idx].Name
+				if _, err := cs.CoreV1().Pods(ns).Patch(ctx, podName, types.StrategicMergePatchType, []byte(patch), metav1.PatchOptions{}); err != nil {


not an expert on these things, but I can see that the strategy is `merge`

kubernetes/staging/src/k8s.io/api/core/v1/types.go

Lines 3340 to 3343 in 7b6293b

	// +patchStrategy=merge
	// +listType=map
	// +listMapKey=name
	SchedulingGates []PodSchedulingGate `json:"schedulingGates,omitempty" patchStrategy:"merge" patchMergeKey:"name" protobuf:"bytes,38,opt,name=schedulingGates"`

is this patch removing the gates?

yes, it's removing all scheduling gates.

yeah, for strategic merge patch:

- `foo: null` deletes the whole field
- `foo: [{"name":"bar"}]` adds or overwrites the existing entry with name=bar
- `foo: [{"$patch":"delete", "name":"bar"}]` deletes the existing entry with name=bar
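The three patch forms listed above can be modeled with a small, self-contained sketch. This is a toy model for illustration only, not the apimachinery strategic merge patch implementation: `gate`, `patchEntry`, and `applyGatesPatch` are hypothetical names, entries carry only the `name` merge key, and ordering rules of the real implementation are ignored.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// gate is a hypothetical stand-in for PodSchedulingGate (merge key: name).
type gate struct {
	Name string `json:"name"`
}

// patchEntry is one list element of the patch, optionally carrying a
// "$patch" directive such as "delete".
type patchEntry struct {
	Name      string `json:"name"`
	Directive string `json:"$patch,omitempty"`
}

// applyGatesPatch applies a simplified strategic-merge-style patch to the
// schedulingGates list and returns the new list; current is not modified.
func applyGatesPatch(current []gate, patchJSON string) ([]gate, error) {
	var patch struct {
		Spec map[string]json.RawMessage `json:"spec"`
	}
	if err := json.Unmarshal([]byte(patchJSON), &patch); err != nil {
		return nil, err
	}
	raw, ok := patch.Spec["schedulingGates"]
	if !ok {
		return current, nil // field not mentioned: left untouched
	}
	if bytes.Equal(bytes.TrimSpace(raw), []byte("null")) {
		return nil, nil // null deletes the whole field
	}
	var entries []patchEntry
	if err := json.Unmarshal(raw, &entries); err != nil {
		return nil, err
	}
	result := append([]gate(nil), current...)
	for _, e := range entries {
		idx := -1
		for i, g := range result {
			if g.Name == e.Name {
				idx = i
				break
			}
		}
		switch {
		case e.Directive == "delete":
			if idx >= 0 {
				result = append(result[:idx], result[idx+1:]...)
			}
		case idx >= 0:
			result[idx] = gate{Name: e.Name} // overwrite the matching entry
		default:
			result = append(result, gate{Name: e.Name}) // add a new entry
		}
	}
	return result, nil
}

func main() {
	current := []gate{{Name: "foo"}, {Name: "bar"}}
	patches := []string{
		`{"spec": {"schedulingGates": null}}`,
		`{"spec": {"schedulingGates": [{"name": "baz"}]}}`,
		`{"spec": {"schedulingGates": [{"$patch": "delete", "name": "bar"}]}}`,
	}
	for _, p := range patches {
		got, err := applyGatesPatch(current, p)
		fmt.Println(got, err)
	}
}
```

Starting from gates foo and bar, the first patch in this model removes every gate, the second appends baz, and the third removes only bar.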

aojea · 2022-11-08T20:32:59Z

The function comment says "evaluate if a certain amount of pods in given ns are running", but the function doesn't yet check that there are num pods running.
Maybe you can exit fast if len(pods) < num after the Pods().List; this guarantees that nonMatchingPods == 0 => runningPods == num

oh, thanks for the catch. I should have rewritten it to the style in WaitForPodsSchedulingGated() but somehow missed it. Let me update.
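The early-exit suggestion could look roughly like the sketch below. It is a simplified illustration, not the helper from this PR: `pod` and `enoughPodsRunning` are hypothetical names, and the pod list is passed in directly instead of being fetched with Pods().List.

```go
package main

import "fmt"

// pod is a hypothetical stand-in for v1.Pod with only the phase kept.
type pod struct {
	Name  string
	Phase string // e.g. "Pending" or "Running"
}

// enoughPodsRunning is the body of a polling condition: it returns true only
// when at least num pods are listed and every listed pod is running.
func enoughPodsRunning(pods []pod, num int) bool {
	// Exit fast: with fewer than num pods listed, the condition cannot hold
	// yet, even if every listed pod is already running.
	if len(pods) < num {
		return false
	}
	nonMatching := 0
	for _, p := range pods {
		if p.Phase != "Running" {
			nonMatching++
		}
	}
	// The length check above makes "no non-matching pods" imply that at
	// least num pods are running.
	return nonMatching == 0
}

func main() {
	one := []pod{{Name: "p1", Phase: "Running"}}
	two := []pod{{Name: "p1", Phase: "Running"}, {Name: "p2", Phase: "Pending"}}

	fmt.Println(enoughPodsRunning(one, 2)) // false: only one pod exists so far
	fmt.Println(enoughPodsRunning(two, 2)) // false: p2 is still pending
	two[1].Phase = "Running"
	fmt.Println(enoughPodsRunning(two, 2)) // true
}
```

Without the length check, the first call would report success as soon as the single existing pod was running, which is the gap the review comment points out.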

fedebongio · 2022-11-08T21:11:17Z

/triage accepted

Huang-Wei · 2022-11-08T21:20:45Z

/retest

aojea · 2022-11-08T21:35:14Z

+1 on the mechanics, I don't have enough knowledge to review the feature and its logic
Thanks for working on tests

Huang-Wei · 2022-11-08T21:37:53Z

@aojea really appreciate your time! Logic-wise, @ahg-g has signed off reviewing, so it'd be good if you can /approve the test changes, then we're in good shape to 🚢.

aojea · 2022-11-08T22:04:25Z

/approve

for tests

k8s-ci-robot · 2022-11-08T22:04:53Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahg-g, aojea, Huang-Wei, logicalhan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~pkg/scheduler/OWNERS~~ [Huang-Wei,ahg-g]
~~test/OWNERS~~ [aojea]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

aojea · 2022-11-08T22:05:31Z

/lgtm /approve

for the scheduler

carrying over the lgtm since the diff was only about the comments on the tests and is close to code freeze

/lgtm

k8s-ci-robot requested review from alculquicondor and andrewsykim October 29, 2022 05:45

Huang-Wei mentioned this pull request Oct 31, 2022

Umbrella issue tracking Alpha impl. of KEP 3521 (Pod Scheduling Readiness) #113269

Closed

13 tasks

marosset mentioned this pull request Oct 31, 2022

Pod Scheduling Readiness kubernetes/enhancements#3521

Closed

k8s-ci-robot assigned ahg-g Nov 1, 2022

k8s-ci-robot removed the sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. label Nov 1, 2022

Huang-Wei force-pushed the kep-3521-C branch from b72420d to 3828773 Compare November 1, 2022 21:48

k8s-ci-robot added the sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. label Nov 1, 2022

ahg-g reviewed Nov 8, 2022

Huang-Wei added 2 commits November 8, 2022 10:05

Fix an issue that pod may be added to backoffQ

0f66366

Integration tests for KEP Pod Scheduling Readiness

ae5d430

- test generic integration in plugins_test.go
- test integration with SchedulingGates plugin in queue_test.go

Huang-Wei force-pushed the kep-3521-C branch from be87293 to 1722765 Compare November 8, 2022 18:06

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 8, 2022

aojea reviewed Nov 8, 2022

E2E test for KEP Scheduling Readiness Gates

abe0c5d

Huang-Wei force-pushed the kep-3521-C branch from 1722765 to abe0c5d Compare November 8, 2022 20:38

k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 8, 2022

k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Nov 8, 2022

k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 8, 2022

k8s-ci-robot assigned aojea Nov 8, 2022

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 8, 2022

k8s-ci-robot merged commit d619f60 into kubernetes:master Nov 8, 2022

Huang-Wei deleted the kep-3521-C branch November 8, 2022 23:14

Huang-Wei mentioned this pull request Mar 2, 2023

feature(scheduler): implement matchLabelKeys/mismatchLabelKeys in PodAffinity and PodAntiAffinity #116065

Merged
