Populate NodeAffinity on top of labels for cloud based PersistentVolumes#64226
k8s-github-robot merged 1 commit into kubernetes:master from ddebroy:ddebroy-affinity1
Conversation
/ok-to-test
|
/assign @verult
So on GCE, we support Regional Disks, which are replicated in two zones. To represent that, we actually stuffed both zones in a single label.
You'll have to treat the zone label specially. And you'll need to use volumeutil.LabelZonesToSet in order to parse this label format. And then, each zone should be included as a separate element in the Values. See the example pv node affinity here
Drawing parallels with the find*Labels() methods, we could have something like translate*Labels() that implement plugin-specific translation logic like this.
You can use append on requirements instead of keeping track of i
We should only process/set NodeAffinity if VolumeScheduling feature is enabled.
How are we verifying this doesn't conflict with user-provided labels and overwrite them?
We should make sure these labels passed in are the ones given by the cloud provider.
You also bring up a good point. If the user already specified a NodeAffinity in the PV, we should probably not override it.
Good point indeed. I did think about potential conflicts between user-provided labels and those applied by the system, but based on this unit test I thought overriding is (or was) the preferred behavior: https://github.com/kubernetes/kubernetes/blob/master/plugin/pkg/admission/storage/persistentvolume/label/admission_test.go#L161
I discussed this briefly with @saad-ali and @verult. We think it's better that, if there's any conflict between user-provided and cloud-provider-provided values, we error instead of blindly overwriting. Users should know that they made a configuration mistake, because it could indicate that they are accidentally using a different volume than intended.
Sounds good. Thinking about how to detect conflicts and "merge" with an arbitrary pre-specified affinity in a safe way, the following scheme should work. Thoughts?
Scan through each NodeSelectorTerm in NodeAffinity.Required:
1. Create a map `<NodeSelectorRequirement, bool>` populated with the NodeSelectorRequirements corresponding to cloud-specific labels, to record whether a matching NodeSelectorRequirement has been found in the current NodeSelectorTerm.
2. Scan through each NodeSelectorRequirement in the NodeSelectorTerm:
   - If NodeSelectorRequirement.Key matches but the operator or values do not match, return an error due to the conflict.
   - If the key, operator, and values of the NodeSelectorRequirement match an entry in the map, mark it in the map.
3. Append the NodeSelectorRequirement entries from the map that have not yet been marked as matched in the current NodeSelectorTerm.

This should result in the NodeSelectorRequirement entries corresponding to cloud-specific labels being "and"ed to each NodeSelectorTerm entry of an arbitrarily specified NodeAffinity.Required block.
This could work; alternatively maybe we just want to leave the NodeAffinity alone if the user already specified something?
verult
left a comment
Is there a need for a way to convert from PV NodeAffinity back to zone labels? For example, when a new volume is created with VolumeScheduling enabled, AFAIU the PV only has NodeAffinity, no labels. If the feature is then turned off, the cluster still needs labels to schedule correctly.
We should still keep the old behavior with labels for backwards compatibility, exactly for that reason.
@verult I will update the code with the above logic - it is certainly simple. I did want to confirm a couple of things around this:
Thanks for the great points.
Right, an argument could be made that if a user is already aware of PV NodeAffinity and is working with it, it's reasonable to expect them to specify it correctly and completely. As for (2), regardless of whether a controller exists today, it'd be nice to be future-proof here and avoid headaches if another controller writing to PV NodeAffinity is created in the future. Now I'm leaning towards your original proposed merging method for this reason. As long as the process is only additive to the field we should be OK. One thing to consider, as @msau42 pointed out offline, is that there are multiple ways to express the same topology using NodeAffinity. See here for an example with CSI, but this issue could come up with Regional PDs.
Check for nil pointers when dereferencing each field. Same for all tests.
You could sort the two lists and do a DeepEqual
I will go ahead and implement the additive/merge logic for NodeSelectorRequirement elements corresponding to the zone and region labels as discussed above. Will update this PR.
I have implemented the merge mechanism for NodeSelectorRequirements corresponding to the cloud labels as discussed earlier.
lgtm, go ahead and squash the commits. @verult could you test this out with repd?
commits squashed
Manually tested using regional PDs for the following cases:
Everything looks good!
/lgtm
/assign @thockin @derekwaynecarr
verult
left a comment
Minor comments, LGTM otherwise
nit: This block can be moved to line 249
nit: `populateAffinity := utilfeature.DefaultFeatureGate.Enabled(features.VolumeScheduling) && len(volLabels) != 0`, so you don't need the `populateAffinity = true` later
nit: What if reqs is empty?
added case to return false for empty reqs
nit:

```go
if tc.affinityMerged {
    if obj.Spec.NodeAffinity == nil {
        ....
    }
    if obj.Spec.NodeAffinity.Required == nil {
        ...
```
block removed as no longer necessary
I'd avoid this logic here - it's part of the same logic we want to test. Instead of affinityMerged, what if we had expectedNodeAffinity? We should be able to directly verify the result without having to analyze implementation details in the test like merging.
Hardcoded the expectations and matched against the target using DeepEqual.
Addressed CR comments from @verult, rebased and squashed commits.
Signed-off-by: Deep Debroy <ddebroy@docker.com>
/lgtm Thanks Deep!
/assign @thockin
/assign @erictune @smarterclayton @luxas @wlan0 @deads2k
Adding some more folks for review/approval based on the OWNERS files mentioned above.
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: ddebroy, msau42, thockin, verult. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Details: Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Automatic merge from submit-queue (batch tested with PRs 64226, 65880). If you want to cherry-pick this change to another branch, please follow the instructions here. |
What this PR does / why we need it:
This PR populates the NodeAffinity field (on top of the existing labels) for PVs backed by cloud providers like EC2 EBS and GCE PD.
Special notes for your reviewer:
Related to #63232
Sample `describe pv` output for EBS with the node affinity field populated:

/sig storage
/assign @msau42
Release note:
NONE