fix(kep-3633): redesign matchLabelKeys as matchLabelSelectors to introduce Operator field #4001
Conversation
Force-pushed from 8b7bee8 to 5bb547b
| - These e2e tests will be added. |
| - `MatchLabelKeys: ["pod-template-hash"]` in `PodAffinity` (both in Filter and Score) works as expected with a rolling upgrade scenario. |
| - `MatchLabelKeys: ["pod-template-hash"]` in `PodAntiAffinity` (both in Filter and Score) works as expected with a rolling upgrade scenario. |
| N/A |
Not related to matchLabelSelectors, but we decided not to have e2e tests in kubernetes/kubernetes#116065.
Force-pushed from 5bb547b to 15c1f05
| #### Story 2 |
| |
| When users want to achieve exclusive 1:1 job to domain placement in [kubernetes-sigs/jobset](https://github.com/kubernetes-sigs/jobset). |
Can this use case be generalized? I'd like this case to be explained as JobSet-like, but not restricted to JobSet only.
For example: when M Pods and N Pods share the same PodTemplate, the orchestrator spins up the two sets of Pods at runtime and distinguishes them by a common key with different values: one can be foo=abc, the other foo=xyz. If we want to place the M Pods in a topology while keeping the N Pods away from that topology, how can we achieve that?
OK, I'll try to add another example here to explain the same use case.
As you suggested, I changed this example to be more general, not using jobset. PTAL.
I didn't follow your example because I feel like users can define a LabelSelector of foo=abc / foo=xyz directly in Pod affinity / anti-affinity. I think the point of the use case of matchLabelKeys is that "users want to use it in PodAffinity, but they cannot know the label value because, for example, the label value is applied by another system automatically" (like the jobset controller for job-name, the Deployment controller for pod-template-hash, a value generated in the manifest manager, etc.).
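(For illustration, a rough sketch of that scenario using the matchLabelKeys form quoted earlier in this PR — the hash value `67cc8cb567` and the `app: myapp` selector are assumed here; in practice the Deployment controller fills in `pod-template-hash`, so the author cannot know it when writing the manifest.)

```yaml
# What the user writes (only the key is known at authoring time):
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values: ["myapp"]
        matchLabelKeys:
        - pod-template-hash
        topologyKey: kubernetes.io/hostname
# What the scheduler effectively evaluates, once the controller has set
# pod-template-hash=67cc8cb567 on this pod:
#   app in (myapp) AND pod-template-hash in (67cc8cb567)
```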
| - matchLabelSelectors: |
|   - matchLabelKey: job-name |
|     operator: NotIn |
I'm not sure I like this design. Look at this yaml - it specifies labelSelector and matchLabelSelectors in a podAntiAffinity term, so can a user really understand its intent at first glance?
As an alternative, can we implement the semantics in the operator of podAffinityTerm, so that we don't need matchLabels or matchLabelSelectors - all semantics are described using podAffinityTerm?
- (new) `ExistsWithSameValue` - means "key" `Exists` and the match value is identical
- (new) `ExistsWithDifferentValue` - means "key" `Exists` and the matched value is different
Could you provide how your proposal looks via example yaml or something?
I would love to see another proposal for the API, I agree that it duplicates labelSelector, but I didn't have better suggestions other than perhaps changing LabelSelector API itself to allow for implicitly selecting the label from the object, which is not applicable in all cases and can be confusing in the cases where it isn't.
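(To make that alternative concrete — a purely hypothetical shape, not proposed anywhere in this thread; the `valuesFromPodLabel` field name is invented solely for illustration.)

```yaml
# Hypothetical: the LabelSelector itself learns to take the value from the
# incoming pod's own label, instead of listing explicit values.
labelSelector:
  matchExpressions:
  - key: tenant
    operator: In
    valuesFromPodLabel: true   # invented field: resolve "tenant" from this pod's labels
```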
@Huang-Wei I am also not following, which operator are you referring to? a new one we add to podAffinityTerm? an example would be great indeed.
Here are some examples:
A set of Pods A doesn't want to co-exist with other sets of Pods, but wants the Pods in set A to be co-located:
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: pod-set
            operator: ExistsWithSameValue
        topologyKey: kubernetes.io/hostname
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: pod-set
            operator: ExistsWithDifferentValue
        topologyKey: kubernetes.io/hostname

Smooth rolling upgrade for PodAntiAffinity:
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - pause
        topologyKey: kubernetes.io/hostname
      - labelSelector:
          matchExpressions:
          - key: pod-template-hash
            operator: ExistsWithSameValue
        topologyKey: kubernetes.io/hostname
@Huang-Wei I am also not following, which operator are you referring to?
(The term "operator" is misleading...) I meant LabelSelectorOperator. And see ^^ for examples.
Generally adding a new enum to a set is problematic in the API. The readers in this case are the scheduler and any external management tools (descheduler, etc). Does anyone else read this field for enforcement? The kubelet?
Does anyone else read this field for enforcement? The kubelet?
no one else in k/k will read it. (As you said, some external tools like descheduler or cluster autoscaler will)
but also, this is generally consumed via the labels pkg: https://github.com/kubernetes/kubernetes/tree/master/staging/src/k8s.io/apimachinery/pkg/labels; so if the new enums are handled there, most readers will just get them.
Can you list the enum alternative in the "Alternatives" section of the KEP?
Sure, added this idea as an alternative for now.
Who can give the final decision on whether the enum addition is allowed or not? (API reviewer?)
Just a nit regarding the API. Leaving the rest to @ahg-g

(just pushed the change in the wording, keeping the API as it is for now until we conclude something on this thread)

/label api-review
Force-pushed from 844206e to 6bd1c2b
| - matchLabelSelectors: |
|   - Key: tenant |
|     operator: In |
About the previous discussion of adding a new enum to label selector, I don't think we should do that, as existing clients are not prepared to handle that. Additionally, that new enum would make label selectors contextual, which would mean all code paths taking label selectors and returning matchers would then need to take label selectors AND a set of the labels of the containing object... that's a super invasive change I don't think we would ever successfully ripple out to all label selector consumers.
Having a separate spot in the API with key/operator fields that explicitly means "this label selector but with the value pulled from the pod's label" seems better to me than changing semantics of the existing label selector field.
I do wonder if it would make sense to have the API server actually populate the peer label selector field with the pod's label values on create. For example, creating this pod:
metadata:
  labels:
    tenant: foo
affinity:
  podAffinity: # ensures the pods of this tenant land on the same node pool
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: otherlabel
          operator: In
          values: ["othervalue"]
      matchLabelSelectors:
      - key: tenant
        operator: In

would result in this pod being persisted:
metadata:
  labels:
    tenant: foo
affinity:
  podAffinity: # ensures the pods of this tenant land on the same node pool
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchExpressions:
        - key: otherlabel
          operator: In
          values: ["othervalue"]
+       - key: tenant
+         operator: In
+         values: ["foo"]
      matchLabelSelectors:
      - key: tenant
        operator: In

That would mean anything reading pods and understanding how they are scheduled wouldn't have to learn about this new field, and the podAffinity / podAntiAffinity fields would continue to contain all the scheduling info.
+1 on doing this in apiserver.
But then we should do the same in https://github.com/kubernetes/enhancements/tree/master/keps/sig-scheduling/3243-respect-pod-topology-spread-after-rolling-upgrades
(note that populating/merging with user specified fields on write in the server is usually not a great idea, but could be ok here specifically on pods since the field is immutable after create… also need to consider what would happen with an old client writing pods that didn't know about the new field and maybe copy the old value to the new pod in prepareforupdate)
That would simplify the implementation, but wouldn't users be surprised to see that duplicated in the labelSelector when reading back the spec?
would need to game it out for sure, but users would also be surprised to see their affinity/antiAffinity rules get ignored, which is what all existing API clients (alternate schedulers, autoscalers, etc) making scheduling decisions on pods would do (until updated to understand this new field, which might never happen)
also, those clients could be very confused (or maybe even broken) if they encounter apparently incomplete/invalid affinity/antiAffinity stanzas in pods
Sorry, missed this comment.
But I prefer the idea of keeping matchLabelKeys + adding notMatchLabelKeys to matchLabelSelectors because:
- the API field is similar to what we introduced in PodTopologySpread.
- it makes both easier for users to understand, instead of making a difference (matchLabelKeys vs matchLabelSelectors) between them to specify a similar thing.
@alculquicondor @ahg-g What do you think?
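(A rough side-by-side of the two shapes being discussed — field names as used in this thread, `tenant` is an assumed label key, and neither shape is final.)

```yaml
# Shape A: key + operator, as currently written in this KEP revision
- matchLabelSelectors:
  - key: tenant
    operator: In          # or NotIn
  topologyKey: node-pool

# Shape B: matchLabelKeys / notMatchLabelKeys, mirroring PodTopologySpread
- matchLabelKeys:
  - tenant                # merged as `tenant in (<value from this pod>)`
  topologyKey: node-pool
- notMatchLabelKeys:
  - tenant                # merged as `tenant notin (<value from this pod>)`
  topologyKey: node-pool
```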
If we can find a more intuitive name for notMatchLabelKeys, then I am onboard. Otherwise I would stick with the key+operator approach.
- distinctLabelKeys
- ignoreSameValueLabelKeys
- ...? (my English skill can only come up with these 😓)
cc @wojtek-t

@ahg-g Could you have a look again? (Sorry to rush; I'm a bit mindful of the enhancement freeze.)
| - If Operator is `In`, `key in (value)` is merged with LabelSelector. |
| - If Operator is `NotIn`, `key notin (value)` is merged with LabelSelector. |
should we validate that only in and notin are supported? are there other relevant operators?
it'd be nice to add concrete examples here.
should we validate that only in and notin are supported?
+1. I don't think we will have support for other operators. I added a mention that only in and notin are valid.
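(A concrete example of the merge, as requested above — a sketch assuming the incoming pod is labeled `tenant: a`.)

```yaml
# Term as written by the user:
- matchLabelSelectors:
  - key: tenant
    operator: In
  topologyKey: node-pool
# Because this pod carries tenant=a, the term is evaluated as if written:
# - labelSelector:
#     matchExpressions:
#     - key: tenant
#       operator: In
#       values: ["a"]
#   topologyKey: node-pool
# With operator NotIn, the merged expression would be `tenant notin (a)` instead.
```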
| - matchLabelSelectors: |
|   - key: tenant |
|     operator: In |
|   topologyKey: node-pool |
| - matchLabelSelectors: |
|   - key: tenant |
|     operator: NotIn |
| - labelSelector: |
|     matchExpressions: |
|     - key: tenant |
|       operator: Exists |
|   topologyKey: node-pool |
| #### Story 2 |
| |
| Let's say all Pods on each tenant get `tenant` label via a controller or a manifest management tool like Helm. |
| And, the cluster admin now wants to achieve exclusive 1:1 tenant to domain placement. |
Maybe emphasize a bit that the value of tenant is unknown when composing the yaml.
| And, the cluster admin now wants to achieve exclusive 1:1 tenant to domain placement. | |
| Although the value of `tenant` label is unknown when composing the workload's manifest, the cluster admin still wants to achieve exclusive 1:1 tenant to domain placement. |
| type LabelSelectorOperator string |
| |
| type MatchLabelSelector struct { |
| // Key is used to lookup value from the incoming pod labels, |
make the indents consistent? (you used tab below)
| // MatchLabelSelectors is a set of pod label keys to select the group of existing pods |
| // which pods will be taken into consideration for the incoming pod's pod (anti) affinity. |
| // The default value is empty. |
Format the comments (to have same empty spaces)
| - If Operator is `In`, `key in (value)` is merged with LabelSelector. |
| - If Operator is `NotIn`, `key notin (value)` is merged with LabelSelector. |
it'd be nice to add concrete examples here.
@ahg-g @Huang-Wei @msau42 Thanks all for the reviews! I fixed based on your suggestions, let me know if I missed something. PTAL again 🙏
Force-pushed from 5f62b7d to d5d098d
/lgtm

holding just in case someone else has some last comments.
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ahg-g, sanposhiho
Does this change require a review from PRR @wojtek-t?

You have updates to the PRR questionnaire, so yes. Also, I don't see an update for the size increase estimate.

Just want to call out #4001 (comment) given we only support In and NotIn and it's not likely to support other operators in the future.

API lgtm

I would like to read it - queued, will try to get to it later today or tomorrow.
wojtek-t left a comment:
One question (apparently missed by me before) and w nits :)
| A new optional field `MatchLabelSelectors` is introduced to `PodAffinityTerm`. |
| |
| ```go |
| type LabelSelectorOperator string |
Can we define the allowed values? Are you planning only In and NotIn or sth else too?
Sure.
Yes, only In and NotIn are expected.
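(A sketch of what that validation would accept and reject — the restriction is as stated above; the exact error wording is assumed, not settled here.)

```yaml
matchLabelSelectors:
- key: tenant
  operator: In       # accepted
- key: team
  operator: NotIn    # accepted
# - key: tenant
#   operator: Exists # would be rejected by validation: only In and NotIn are allowed
```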
| **Downgrade** |
| |
| kube-apiserver will ignore `MatchLabelKeys` in PodAffinity/PodAntiAffinity, |
| kube-apiserver will ignore `MatchLabelSelectors` in PodAffinity/PodAntiAffinity, |
Actually - it will ignore (drop) setting this field for newly added objects.
But existing objects will continue to exist.
So I'm assuming that kube-scheduler will also be performing the actual computations under feature-gate, right?
This is also touching on
"How can this feature be enabled / disabled in a live cluster?" in PRR.
existing objects will continue to exist.
Ah, yes, correct. This PR changes MatchLabelSelectors to be handled in kube-apiserver when a Pod is created.
So, existing objects will continue to have the labelSelector made from MatchLabelSelectors even after the feature gate is turned off.
What is the ideal action here? Is it OK to leave the existing objects as they are when turning off the feature gate (as long as we state it in the KEP), or do we somehow need to ignore the labelSelector made from MatchLabelSelectors (which is going to be tricky)?
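(To make the downgrade concern concrete — a sketch assuming the apiserver-merge design discussed above and a pod created with `tenant: foo` while the gate was enabled; field placement follows the examples earlier in this thread.)

```yaml
# Pod persisted while the feature gate was enabled; it survives a downgrade as-is:
metadata:
  labels:
    tenant: foo
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: tenant        # expression merged by kube-apiserver at create time
            operator: In
            values: ["foo"]
        matchLabelSelectors:   # the field itself also remains on existing objects
        - key: tenant
          operator: In
        topologyKey: node-pool
# After the gate is turned off, kube-apiserver only drops matchLabelSelectors from
# newly created pods; it does not rewrite already-persisted objects like this one.
```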
/hold cancel

ah, sorry, merged by mistake

OK, I'll create another PR to follow up on the remaining discussion.
MatchLabelKeys to Pod Affinity and Pod Anti Affinity #3633