Skip to content

fix(kep-3633): redesign matchLabelKeys as matchLabelSelectors to introduce Operator field#4001

Merged
k8s-ci-robot merged 7 commits intokubernetes:masterfrom
sanposhiho:matchlabelselector
Jun 7, 2023
Merged

fix(kep-3633): redesign matchLabelKeys as matchLabelSelectors to introduce Operator field#4001
k8s-ci-robot merged 7 commits intokubernetes:masterfrom
sanposhiho:matchlabelselector

Conversation

@sanposhiho
Copy link
Copy Markdown
Member

  • One-line PR description: redesign matchLabelKeys as matchLabelSelectors to introduce Operator field
  • Other comments:

@sanposhiho sanposhiho force-pushed the matchlabelselector branch from 8b7bee8 to 5bb547b Compare May 14, 2023 06:17
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label May 14, 2023
@k8s-ci-robot k8s-ci-robot requested a review from ahg-g May 14, 2023 06:17
@k8s-ci-robot k8s-ci-robot added the kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory label May 14, 2023
@k8s-ci-robot k8s-ci-robot added sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels May 14, 2023
- These e2e tests will be added.
- `MatchLabelKeys: ["pod-template-hash"]` in `PodAffinity` (both in Filter and Score) works as expected with a rolling upgrade scenario.
- `MatchLabelKeys: ["pod-template-hash"]` in `PodAntiAffinity` (both in Filter and Score) works as expected with a rolling upgrade scenario.
N/A
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not related to matchLabelSelectors, but we decided not to have e2e test in kubernetes/kubernetes#116065.

@sanposhiho sanposhiho force-pushed the matchlabelselector branch from 5bb547b to 15c1f05 Compare May 14, 2023 06:19
@sanposhiho
Copy link
Copy Markdown
Member Author

/cc @ahg-g @Huang-Wei @alculquicondor


#### Story 2

When users want to achieve exclusive 1:1 job to domain placement in [kubernetes-sigs/jobset](https://github.com/kubernetes-sigs/jobset).
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this use case be generalized? I'd like this case to be explained as JobSet-like, but not restricted to JobSet only.

For example: when M Pods and N Pods share the same PodTemplate. The orchestrator spins up the two set of Pods at runtime, distinguish at their common key with different value: one can be foo=abc, the other is foo=xyz. If we want to place M pods in a topology, while keep N pods away from that topology, how can we achieve that?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I'll try to have another example here to explain the same usecase.

Copy link
Copy Markdown
Member Author

@sanposhiho sanposhiho May 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you suggested, I changed this example to be more general, not using jobset. PTAL.

I didn't follow your example because I feel like users can define LabelSelector of foo=abc / foo=xyz directly in Pod affinity / anti affinity. I think the point of the usecase of matchLabelKey is that "users want to use it in PodAffinity, but they cannot know the label value because, for example, the label value is applied from another system automatically" (like jobset controller for job-name, the deployment controller for pod-template-hash, generated in the manifest manager, etc)

Comment thread keps/sig-scheduling/3633-matchlabelselectors-to-podaffinity/README.md Outdated
Comment on lines +301 to +303
- matchLabelSelectors:
- matchLabelKey: job-name
operator: NotIn
Copy link
Copy Markdown
Member

@Huang-Wei Huang-Wei May 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I like this design. Look at this yaml - it specifies labelSelector and matchLabelSelector in a podAntiAffinity term, so can a user really understand its intents from first glance?

As an alternative, can we implement the semantics in the operator of podAffinityTerm? so we don't need matchLabels or matchLabelSelectors - all semantics are described using podAffinityTerm.

  • (new) ExistsWithSameValue - means "key" Exists and the match value is identical
  • (new) ExistsWithDifferentValue- means "key" Exists and the matched value is different

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you provide how your proposal looks via example yaml or something?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would love to see another proposal for the API, I agree that it duplicates labelSelector, but I didn't have better suggestions other than perhaps changing LabelSelector API itself to allow for implicitly selecting the label from the object, which is not applicable in all cases and can be confusing in the cases where it isn't.

@Huang-Wei I am also not following, which operator are you referring to? a new one we add to podAffinityTerm? an example would be great indeed.

Copy link
Copy Markdown
Member

@Huang-Wei Huang-Wei May 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are some examples:

a set of Pods A doesn't want to co-exist with other set of Pods, but want the set of Pods A co-located

    spec:
      affinity:
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: pod-set
                operator: ExistsWithSameValue
            topologyKey: kubernetes.io/hostname
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: pod-set
                operator: ExistsWithDifferentValue
            topologyKey: kubernetes.io/hostname

smooth rolling upgrade for PodAntiAffinity:

    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - pause
            topologyKey: kubernetes.io/hostname
          - labelSelector:
              matchExpressions:
              - key: pod-template-hash
                operator: ExistsWithSameValue
            topologyKey: kubernetes.io/hostname

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Huang-Wei I am also not following, which operator are you referring to?

(The term "operator" is misleading...) I meant LabelSelectorOperator. And see ^^ for examples.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally adding a new enum to a set is problematic in the API. The readers in this case are the scheduler and any external management tools (descheduler, etc). Does anyone else read this field for enforcement? The kubelet?

Copy link
Copy Markdown
Member Author

@sanposhiho sanposhiho May 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does anyone else read this field for enforcement? The kubelet?

no one else in k/k will read it. (As you said, some external tools like descheduler or cluster autoscaler will)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but also, this is generally consumed via the labels pkg: https://github.com/kubernetes/kubernetes/tree/master/staging/src/k8s.io/apimachinery/pkg/labels; so if the new enums are handled there, most readers will just get them.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you list the enum alternative in the "Alternatives" section of the KEP?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, added this idea as an alternative for now.
Who can give the final decision on whether the enum addition is allowed or not? (API reviewer?)

Comment thread keps/sig-scheduling/3633-matchlabelselectors-to-podaffinity/README.md Outdated
Comment thread keps/sig-scheduling/3633-matchlabelselectors-to-podaffinity/README.md Outdated
@alculquicondor
Copy link
Copy Markdown
Member

Just a nit regarding the API.

Leaving the rest to @ahg-g

@sanposhiho
Copy link
Copy Markdown
Member Author

sanposhiho commented May 20, 2023

(just push the change in the wording, keep the API as it is for now until we conclude something on this thread)

@ahg-g
Copy link
Copy Markdown
Member

ahg-g commented May 25, 2023

/label api-review

@k8s-ci-robot k8s-ci-robot added the api-review Categorizes an issue or PR as actively needing an API review. label May 25, 2023
@sanposhiho sanposhiho force-pushed the matchlabelselector branch from 844206e to 6bd1c2b Compare May 26, 2023 02:09
Comment on lines +285 to +287
- matchLabelSelectors:
- Key: tenant
operator: In
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About the previous discussion of adding a new enum to label selector, I don't think we should do that, as existing clients are not prepared to handle that. Additionally, that new enum would make label selectors contextual, which would mean all code paths taking label selectors and returning matchers would then need to take label selectors AND a set of the labels of the containing object... that's a super invasive change I don't think we would ever successfully ripple out to all label selector consumers.

Having a separate spot in the API with key/operator fields that explicitly means "this label selector but with the value pulled from the pod's label" seems better to me than changing semantics of the existing label selector field.

I do wonder if it would make sense to have the API server actually populate the peer label selector field with the pod's label values on create. For example, creating this pod:

metadata:
  labels:
    tenant: foo
affinity:
  podAffinity:      # ensures the pods of this tenant land on the same node pool
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: otherlabel
            operator: In
            values: ["othervalue"]
        matchLabelSelectors:
          - key: tenant
            operator: In

would result in this pod being persisted:

 metadata:
   labels:
     tenant: foo 
 affinity:
   podAffinity:      # ensures the pods of this tenant land on the same node pool
     requiredDuringSchedulingIgnoredDuringExecution:
       - labelSelector:
           matchExpressions:
           - key: otherlabel
             operator: In
             values: ["othervalue"]
+          - key: tenant
+            operator: In
+            values: ["foo"]
         matchLabelSelectors:
           - key: tenant
             operator: In

That would mean anything reading pods and understanding how they are scheduled wouldn't have to learn about this new field, and the podAffinity / podAntiAffinity fields would continue to contain all the scheduling info

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(note that populating/merging with user specified fields on write in the server is usually not a great idea, but could be ok here specifically on pods since the field is immutable after create… also need to consider what would happen with an old client writing pods that didn't know about the new field and maybe copy the old value to the new pod in prepareforupdate)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would simplify the implementation, but wouldn't users be surprised to see that duplicated in the labelSelector when reading back the spec?

Copy link
Copy Markdown
Member

@liggitt liggitt May 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would need to game it out for sure, but users would also be surprised to see their affinity/antiAffinity rules get ignored, which is what all existing API clients (alternate schedulers, autoscalers, etc) making scheduling decisions on pods would do (until updated to understand this new field, which might never happen)

also, those clients could be very confused (or maybe even broken) if they encounter apparently incomplete/invalid affinity/antiAffinity stanzas in pods

Copy link
Copy Markdown
Member Author

@sanposhiho sanposhiho Jun 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, missed this comment.
But, I prefer the idea of keeping matchLabelKeys + adding notMatchLabelKeys to matchLabelSelectors because

the API field is similar with what we introduced in PodTopologySpread.

makes both easier for users to understand instead of making a difference (matchLabelKeys vs matchLabelSelectors) between them to specify a similar thing.

@alculquicondor @ahg-g What do you think?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll defer to @ahg-g and @msau42

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ahg-g @msau42 kindly ping. 🙏

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we can find a more intuitive name for notMatchLabelKeys, then I am onboard. Otherwise I would stick with the key+operator approach.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • distinctLabelKeys
  • ignoreSameValueLabelKeys
  • ...? (My English skill can put out only them 😓

@liggitt liggitt removed their assignment May 26, 2023
@sanposhiho
Copy link
Copy Markdown
Member Author

cc @wojtek-t
since you are the PRR reviewer on this enhancement. 🙏

@wojtek-t wojtek-t self-assigned this May 29, 2023
@sanposhiho
Copy link
Copy Markdown
Member Author

@ahg-g Could you have a look again? (Sorry to rush, a bit mind about the enhancement freeze

Comment on lines +378 to +379
- If Operator is `In`, `key in (value)` is merged with LabelSelector.
- If Operator is `NotIn`, `key notin (value)` is merged with LabelSelector.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we validate that only in and notin are supported? are there other relevant operators?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it'd be nice to add concrete examples here.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we validate that only in and notin are supported?

+1. I don't think we will have supports for other fields. I added the mention that in and notin are valid.

Comment on lines +285 to +288
- matchLabelSelectors:
- key: tenant
operator: In
topologyKey: node-pool
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- matchLabelSelectors:
- key: tenant
operator: In
topologyKey: node-pool
- matchLabelSelectors:
- key: tenant
operator: In
topologyKey: node-pool

Comment on lines +291 to +298
- matchLabelSelectors:
- key: tenant
operator: NotIn
- labelSelector:
matchExpressions:
- key: tenant
operator: Exists
topologyKey: node-pool
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- matchLabelSelectors:
- key: tenant
operator: NotIn
- labelSelector:
matchExpressions:
- key: tenant
operator: Exists
topologyKey: node-pool
- matchLabelSelectors:
- key: tenant
operator: NotIn
- labelSelector:
matchExpressions:
- key: tenant
operator: Exists
topologyKey: node-pool

#### Story 2

Let's say all Pods on each tenant get `tenant` label via a controller or a manifest management tool like Helm.
And, the cluster admin now wants to achieve exclusive 1:1 tenant to domain placement.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe emphasize a bit that the value of tenant is unknown when composing the yaml.

Suggested change
And, the cluster admin now wants to achieve exclusive 1:1 tenant to domain placement.
Although the value of `tenant` label is unknown when composing the workload's manifest, the cluster admin still wants to achieve exclusive 1:1 tenant to domain placement.

type LabelSelectorOperator string

type MatchLabelSelector struct {
// Key is used to lookup value from the incoming pod labels,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make the indents consistent? (you used tab below)

Comment on lines +368 to +370
// MatchLabelSelectors is a set of pod label keys to select the group of existing pods
// which pods will be taken into consideration for the incoming pod's pod (anti) affinity.
// The default value is empty.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Format the comments (to have same empty spaces)

Comment on lines +378 to +379
- If Operator is `In`, `key in (value)` is merged with LabelSelector.
- If Operator is `NotIn`, `key notin (value)` is merged with LabelSelector.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it'd be nice to add concrete examples here.

@sanposhiho
Copy link
Copy Markdown
Member Author

@ahg-g @Huang-Wei @msau42 Thanks all for the reviews! I fixed based on your suggestions, let me know if I missed something. PTAL again 🙏

@sanposhiho sanposhiho force-pushed the matchlabelselector branch from 5f62b7d to d5d098d Compare June 2, 2023 15:17
@ahg-g
Copy link
Copy Markdown
Member

ahg-g commented Jun 2, 2023

/lgtm
/approve
/hold

holding just in case someone else has some last comments.

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 2, 2023
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 2, 2023
@k8s-ci-robot
Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahg-g, sanposhiho

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 2, 2023
@sanposhiho
Copy link
Copy Markdown
Member Author

Does this change require a review from PRR @wojtek-t?

@alculquicondor
Copy link
Copy Markdown
Member

You have updates to the PRR questionnaire, so yes.

Also, I don't see an update for the size increase estimate.

@Huang-Wei
Copy link
Copy Markdown
Member

Just want to call out #4001 (comment) given we only support In and NotIn and it's not likely to support other operators in the future.

@msau42
Copy link
Copy Markdown
Member

msau42 commented Jun 2, 2023

API lgtm

@wojtek-t
Copy link
Copy Markdown
Member

wojtek-t commented Jun 5, 2023

I would like to read it - queued, will try to get to it later today or tomorrow.

Copy link
Copy Markdown
Member

@wojtek-t wojtek-t left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One question (apparently missed by me before) and w nits :)

A new optional field `MatchLabelSelectors` is introduced to `PodAffinityTerm`.

```go
type LabelSelectorOperator string
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we define the allowed values? Are you planning only In and NotIn or sth else too?

Copy link
Copy Markdown
Member Author

@sanposhiho sanposhiho Jun 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.
Yes, only In and NotIn are expected.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in #4069

**Downgrade**

kube-apiserver will ignore `MatchLabelKeys` in PodAffinity/PodAntiAffinity,
kube-apiserver will ignore `MatchLabelSelectors` in PodAffinity/PodAntiAffinity,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually - it will ignore (drop) setting this field for newly added objects.
But existing objects will continue to exist.

So I'm assuming that kube-scheduler will also be performing the actual computations under feature-gate, right?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also touching on
"How can this feature be enabled / disabled in a live cluster?" in PRR.

Copy link
Copy Markdown
Member Author

@sanposhiho sanposhiho Jun 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

existing objects will continue to exist.

Ah, yes correct. This PR changes MatchLabelSelectors to be handled in kube-apiserver when Pod is created.
So, existing objects will continue to have the labelselector made by MatchLabelSelectors even after the feature gate is turned off.

What is the ideal action here? OK to leave the existing objects when turning off the feature gate (as long as we state in KEP), or we somehow need to ignore the labelselector made by MatchLabelSelectors (gonna be tricky though..)?

@ahg-g
Copy link
Copy Markdown
Member

ahg-g commented Jun 7, 2023

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jun 7, 2023
@k8s-ci-robot k8s-ci-robot merged commit 68f741d into kubernetes:master Jun 7, 2023
@k8s-ci-robot k8s-ci-robot added this to the v1.28 milestone Jun 7, 2023
@ahg-g
Copy link
Copy Markdown
Member

ahg-g commented Jun 7, 2023

ah, sorry, merged by mistake

@sanposhiho
Copy link
Copy Markdown
Member Author

OK, I'll create another PR to follow up remaining discussion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api-review Categorizes an issue or PR as actively needing an API review. approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

Projects

Status: API review completed, 1.28

Development

Successfully merging this pull request may close these issues.

9 participants