Add DynamicProvisioningScheduling support for EBS #65730
k8s-github-robot merged 1 commit into kubernetes:master from ddebroy:ebs-affinity1
Conversation
msau42
left a comment
Can you also add an e2e test for this? Under test/e2e/storage. You can probably reuse parts of the volume_provisioning.go tests, but you'll need to modify the StorageClass and create a pod too.
New multi-zonal e2es can be considered as a separate PR too.
pkg/volume/aws_ebs/aws_util.go
Maybe you can suggest using allowedTopologies instead,
or zone/zones cannot be specified with VolumeBindingMode = WaitForFirstConsumer.
@msau42 , from what I understand, with DynamicProvisioningScheduling enabled, AllowedTopologies will only be relevant for scenarios like GCE where multi-zonal volumes are supported, correct? In case of EBS, I am ignoring AllowedTopologies. I will update the error msg to what Jan suggested above.
Should we return error in case of EBS if AllowedTopologies is specified and the selected node is not in list of zones specified in AllowedTopologies?
No, AllowedTopologies is relevant to single-zone volumes too. It basically replaces the "zone" and "zones" parameters, and allows an admin to restrict the zones where volumes can be provisioned in. The difference is the scheduler will respect it when choosing a node.
Here's what I think the provision implementation of single-zone volumes should do:
- node not set, allowed topologies not set: old behavior
- node not set, allowed topologies set: choose from allowed topologies
- node set, zone/zones set: error always?
- node set, allowed topologies set or not set: use node's zone
- zone/zones set + allowedtopologies set: error
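The decision table above can be sketched as a small Go helper. The function name, signature, and error strings below are illustrative assumptions for this sketch, not the actual plugin code:

```go
package main

import (
	"errors"
	"fmt"
)

// selectZones is a hypothetical sketch of the decision table above.
func selectZones(nodeZone string, zoneParam string, allowedZones []string) ([]string, error) {
	hasNode := nodeZone != ""
	hasZoneParam := zoneParam != ""
	hasTopology := len(allowedZones) > 0

	switch {
	case hasZoneParam && hasTopology:
		// zone/zones set + allowedTopologies set: error
		return nil, errors.New("zone(s) cannot be combined with allowedTopologies")
	case hasNode && hasZoneParam:
		// node set, zone/zones set: error
		return nil, errors.New("zone(s) cannot be specified with VolumeBindingMode = WaitForFirstConsumer")
	case hasNode:
		// node set, allowed topologies set or not set: use the node's zone
		return []string{nodeZone}, nil
	case hasTopology:
		// node not set, allowed topologies set: choose from allowed topologies
		return allowedZones, nil
	case hasZoneParam:
		return []string{zoneParam}, nil
	default:
		// node not set, allowed topologies not set: old behavior,
		// the provisioner picks a zone itself
		return nil, nil
	}
}

func main() {
	zones, _ := selectZones("us-east-1a", "", nil)
	fmt.Println(zones) // the node's zone wins
}
```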
/ok-to-test
pkg/volume/aws_ebs/aws_util.go
This could be a volume util that gce pd could use too
Actually @verult may have some comments here since he is looking at doing a similar allowedTopologies conversion for CSI
I think I will refactor this area a bit to create a common populateVolumeZones that we can share between GCE (single zone) PD, EBS (and maybe Azure Disk when it gets zone support). The provider-specific volume options like kmskeyid / encryption should probably stay in the above provider-specific files' CreateVolume.
For Multi-zonal PDs, the logic here is going to be quite different.
I'm not sure if multi-zonal PDs will be different. This function is just creating a list of allowed zones. The only difference is that for multi-zonal PDs, it will pick 2 zones from the list instead of 1.
I don't think we'll be able to reuse the logic for CSI conversion as it'll be more complicated, for example there isn't a one-to-one mapping between allowedTopology and CSI topology representation.
I don't think we'll be able to reuse the logic for CSI conversion as it'll be more complicated, for example there isn't a one-to-one mapping between allowedTopology and CSI topology representation.
Can you give an example of what you mean? If there is no one-to-one mapping, then how will you be able to convert it?
This could be a volume util that gce pd could use too
Should we pull it out into volume util when we get to the GCE plugin? It shouldn't be too hard to do it later on. It's easy to make the wrong assumptions about commonalities if this is done prematurely. I think this is more of a problem when creating abstraction boundaries so I don't have a strong preference either way for this case.
- Flatten allowed topologies to a list of maps (csi topology format)
I don't think we should complicate the logic in in-tree volume plugins for the sake of aligning with CSI implementation details.
Can you give an example of what you mean? If there is no one-to-one mapping, then how will you be able to convert it?
AllowedTopologies can have ORs of ANDs of ORs, so there are multiple possible representations of the same topology constraint. CSI only has ORs of ANDs. To convert from CSI representation to AllowedTopologies, we could pick a canonical representation, similar to how we deal with the same issue with volume node affinity.
Ok, let's consider refactoring this when we get to the CSI implementation. But I feel the in-tree plugins use of allowed topologies is a subset of the CSI functionality, so it should be possible to get the list of zones out of the CSI representation. Plus, here we're only talking about converting from AllowedTopologies -> CSI, not the other way around.
So for now, I think a common util that only handles zones should be fine
CSI only has ORs of ANDs
@verult Why does CSI not support ORs of ANDs of ORs? For multi-zonal implementations like GCE PD, the "last" ORs bit corresponding to the values is where the possible set of zones are specified:
```yaml
allowedTopologies:
- matchLabelExpressions:
  - key: failure-domain.beta.kubernetes.io/zone
    values:
    - us-central1-a
    - us-central1-b
    - us-central1-c
```
Will the above convention change for multi-zonal volumes in CSI?
AFAIU we could similarly have
```yaml
allowedTopologies:
- matchLabelExpressions:
  - key: failure-domain.beta.kubernetes.io/zone
    values:
    - us-central1-a
- matchLabelExpressions:
  - key: failure-domain.beta.kubernetes.io/zone
    values:
    - us-central1-b
- matchLabelExpressions:
  - key: failure-domain.beta.kubernetes.io/zone
    values:
    - us-central1-c
```
to represent the same thing (in both cases we are saying, these are the zones the volume plugin is free to choose from), and an equivalent form of this is how CSI represents it
pkg/volume/aws_ebs/aws_util.go
Should this be a set? The same zone could have mistakenly been specified multiple times.
I think API validation should enforce the following semantics, if it doesn't already:
- There are no repeats in requirement Values
- Each TopologySelectorTerm is unique (regardless of the ordering of any array fields)
- Map semantics for MatchLabelExpressions
In other words, the type of AllowedTopologies is essentially set( set ( map(string, set[string]))), and the API should preserve these set and map semantics.
If the above is satisfied then we don't need a set here, given that zones are unique per region.
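Even without that validation guarantee, collecting the zones into a set makes repeated values harmless. A minimal sketch, where the `term` type and `zonesFromTerms` helper are illustrative stand-ins for the real TopologySelectorTerm handling:

```go
package main

import (
	"fmt"
	"sort"
)

// term mirrors the shape of a single MatchLabelExpression, for illustration only.
type term struct {
	key    string
	values []string
}

// zonesFromTerms collects zone values into a set, so a zone mistakenly
// repeated across terms or values is only counted once.
func zonesFromTerms(terms []term, zoneKey string) []string {
	seen := map[string]struct{}{}
	for _, t := range terms {
		if t.key != zoneKey {
			continue
		}
		for _, v := range t.values {
			seen[v] = struct{}{}
		}
	}
	zones := make([]string, 0, len(seen))
	for z := range seen {
		zones = append(zones, z)
	}
	sort.Strings(zones) // deterministic order for provisioning
	return zones
}

func main() {
	terms := []term{
		{key: "failure-domain.beta.kubernetes.io/zone",
			values: []string{"us-east-1a", "us-east-1a", "us-east-1b"}},
	}
	fmt.Println(zonesFromTerms(terms, "failure-domain.beta.kubernetes.io/zone"))
	// [us-east-1a us-east-1b]
}
```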
We can test with some examples. But if it doesn't validate, we'll have to think about whether we can strengthen it; the validation reuses much of the existing selector validation, and we can't tighten that.
the validation reuses much of the existing selector validation, and we can't tighten that.
If it does turn out that current validation isn't sufficient, then I'd argue either the existing selector validation is somehow not strong enough, or maybe we shouldn't use these selectors
Can you open a bug for that? We can look into fixing that separately
Looked at the current validation code briefly and it doesn't seem like we have any structural validation today.
I'm happy to take this task if folks agree this is a good idea: #66184
pkg/volume/aws_ebs/aws_util.go
I wonder if this should error if anything other than zone is specified. Because that is information the user expects us to use, but we didn't. cc @verult
I guess they could be arbitrary labels. The example in https://github.com/kubernetes/community/blob/master/contributors/design-proposals/storage/volume-topology-scheduling.md#multi-label-volume refers to key: foo.io/rack which should be okay?
Please ignore the above; that example is not relevant. The one here does refer to key: rack in addition to key: zone. So should the user be allowed to specify arbitrary keys, in case some other code elsewhere down the line decides to act on other keys?
Topology keys are a little different than label keys in Kubernetes, in that the topology keys are defined by the storage plugin and not the user, and the topology labels define the inherent restrictions of the storage provider. So I think we shouldn't mix plugin and user-defined keys in allowedTopologies. I think the safest course of action now is to error if there's a key we don't recognize.
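The "error on unrecognized keys" suggestion could look like the following sketch; the `topologyTerm` type, the helper name, and the error text are assumptions for illustration, not the actual plugin code:

```go
package main

import "fmt"

// topologyTerm mirrors a single MatchLabelExpression, for illustration only.
type topologyTerm struct {
	key    string
	values []string
}

// checkTopologyKeys rejects any topology key the plugin does not recognize,
// so plugin-defined and user-defined keys are never mixed in allowedTopologies.
func checkTopologyKeys(terms []topologyTerm, supported map[string]bool) error {
	for _, t := range terms {
		if !supported[t.key] {
			return fmt.Errorf("unsupported topology key %q in allowedTopologies", t.key)
		}
	}
	return nil
}

func main() {
	supported := map[string]bool{"failure-domain.beta.kubernetes.io/zone": true}
	err := checkTopologyKeys([]topologyTerm{{key: "foo.io/rack"}}, supported)
	fmt.Println(err) // the rack key is rejected with an error
}
```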
pkg/volume/aws_ebs/aws_util_test.go
one more test case for single zone param + allowed topologies
What about nodeWithLabels + some allowedTopologies?
pkg/volume/aws_ebs/aws_util_test.go
you probably need to disable the feature gate in the else case, to handle the times when you call continue
pkg/volume/aws_ebs/aws_util.go
Is this check necessary? the !ok case below should be sufficient
pkg/volume/aws_ebs/aws_util.go
Shouldn't this be in API validation instead, if it's not already there? @msau42
pkg/volume/aws_ebs/aws_util_test.go
nit: Comment on what "reject" means
pkg/volume/aws_ebs/aws_util_test.go
"failed since zones parameter and multiple zones specified...
pkg/volume/aws_ebs/aws_ebs.go
I don't think that this PR should touch block volume code. Is it a result of an unsuccessful merge? VolumeMode is already set in `VolumeMode: volumeMode` ~35 lines above.
Thanks for catching this - will fix. Indeed a case of a bad merge.
pkg/volume/util/util.go
We should document somewhere that only LabelZoneFailureDomain is supported on AWS (in a separate PR).
pkg/volume/util/util.go
Should it return an error if allowedZones.Len() == 0? Such allowedTopologies are not very useful.
I'm not sure if API validation could handle this case. You can have allowed topologies > 0 but not actually specify the zone label.
pkg/volume/util/util.go
field names start in lowercase
pkg/volume/util/util.go
field / object names start with lowercase
pkg/volume/util/util.go
The number of replicas is hardcoded to 1, so a lot of the code below is unused. I'd suggest focusing on topology in this PR and then sending a subsequent one with multi-zonal volumes.
pkg/volume/aws_ebs/aws_util.go
The aws cloud provider interface abstraction seems a bit off to me. I feel like the aws cloud provider layer should not be processing the zone and zones arguments, which are StorageClass parameters. It seems better suited for the Kubernetes plugin layer to process, and pass in the selected zone to the aws cloud provider layer. Right now, both layers have to handle all the permutations of zone and zones.
If the aws cloud layer is refactored to push the zone argument processing up to the aws plugin, then I think this PopulateZoneConfig could be greatly simplified, and only needs to return a list of zones, either from allowedTopologies or from old zone/zones. @jsafrane wdyt?
With an extra function in the CloudProvider interface to expose something like GetAllCurrentZones/getCandidateZonesForDynamicVolume (that PopulateZoneConfig can invoke for the no zone/zones specified scenarios), we should be able to refactor all the zone/zones processing in CreateDisk here into PopulateZoneConfig.
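The proposed split of responsibilities could be sketched like this; the interface name, method name, and `populateZoneConfig` shape are assumptions taken from the discussion above, not the merged code:

```go
package main

import "fmt"

// zoneLister sketches the extra cloud-provider method proposed above,
// along the lines of GetCandidateZonesForDynamicVolume.
type zoneLister interface {
	GetCandidateZonesForDynamicVolume() ([]string, error)
}

// fakeCloud stands in for the AWS cloud layer in this sketch.
type fakeCloud struct{ zones []string }

func (f *fakeCloud) GetCandidateZonesForDynamicVolume() ([]string, error) {
	return f.zones, nil
}

// populateZoneConfig illustrates the refactor: the plugin layer resolves
// zone/zones/allowedTopologies, and only falls back to asking the cloud
// layer for candidate zones when nothing was specified.
func populateZoneConfig(cloud zoneLister, zones []string) ([]string, error) {
	if len(zones) > 0 {
		return zones, nil
	}
	return cloud.GetCandidateZonesForDynamicVolume()
}

func main() {
	cloud := &fakeCloud{zones: []string{"us-east-1a", "us-east-1b"}}
	zones, _ := populateZoneConfig(cloud, nil)
	fmt.Println(zones)
}
```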
Addressed CR comments so far with the following major changes:
The Azure zone awareness PRs are also coming in now, and they can use the common util in this PR for zonal volume creation.
pkg/volume/util/util_test.go
nit: add a comment when test args start, and when test expected results start
/test pull-kubernetes-e2e-gce
/lgtm
AWS code is fine. Please squash the commits before it gets merged.
/approve
/approve no-issue
Signed-off-by: Deep Debroy <ddebroy@docker.com>
/test pull-kubernetes-e2e-gce-100-performance
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: ddebroy, jsafrane, msau42
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/test all
[submit-queue is verifying that this PR is safe to merge]
/test pull-kubernetes-e2e-kops-aws
Automatic merge from submit-queue (batch tested with PRs 65730, 66615, 66684, 66519, 66510). If you want to cherry-pick this change to another branch, please follow the instructions here.
What this PR does / why we need it:
This PR adds support for the DynamicProvisioningScheduling feature in EBS. With this in place, if VolumeBindingMode: WaitForFirstConsumer is specified in an EBS StorageClass and DynamicProvisioningScheduling is enabled, the EBS provisioner will use the selected node's LabelZoneFailureDomain as the zone to provision the EBS volume in.
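A StorageClass that opts into this behavior could look like the following illustrative fragment (the name and `type` parameter are placeholders, not from the PR):

```yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: ebs-wait-for-consumer
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
# Delay provisioning until a pod using the claim is scheduled, so the
# provisioner can read the selected node's zone label.
volumeBindingMode: WaitForFirstConsumer
```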
Which issue(s) this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged):
Fixes #
Special notes for your reviewer:
Related to #63232
Sample `describe pv` output with NodeAffinity populated:
Release note:
/sig storage
/assign @msau42 @jsafrane