[Federation] Hpa controller controls target objects by irfanurrehman · Pull Request #49950 · kubernetes/kubernetes

irfanurrehman · 2017-08-01T14:05:10Z

This is in the series of PRs over #45993.
The last commit is the one to review. Probably the last PR in this chain, with e2e tests for the relevant scenarios (including the scenario created by this PR), will follow soon.

Special notes for your reviewer:
@kubernetes/sig-federation-pr-reviews
@quinton-hoole

Release note:

irfanurrehman · 2017-08-01T18:00:57Z

/assign @quinton-hoole

ghost · 2017-08-07T23:38:57Z

/retest

ghost

More review to come...

ghost · 2017-08-07T23:48:01Z

federation/pkg/federatedtypes/hpa.go

+	}
+}
+
+func updateRuntimeObjectForKind(c federationclientset.Interface, kind schema.GroupKind, ns string, obj pkgruntime.Object) (pkgruntime.Object, error) {


This and the above function seem rather brittle? Ideally you should use the list of object types supported by the cluster HPAs (i.e. not hardcode the names of the types here, and have to update them here every time the cluster HPAs get updated). I realize that this is perhaps non-trivial, and am happy to defer it to a later PR provided that we open a known issue for it now.

Actually, I did look into doing something similar to what you suggest, but to my surprise found that almost everywhere the kind or the resource name is initialised, it is used directly as a string.
For example, the federation-apiserver default resource config (which is done much the same way in kube-apiserver), or where a kind is specified, as in the deployment controller ref kind or the replicaset controller kind itself.
So I went ahead and used the strings directly, as in the code under review.

Additionally, in the k8s API, the hpa targetref object is not interpreted (as is done here) within the hpa controller's purview. The scale subresource is used instead: whatever target name and kind appear in the hpa spec, the scale API is called with them. Only particular types support this subresource, so the API call fails if it is made on a wrong type or kind (which was probably specified wrongly in the hpa spec by the user), ref.
I used only "Deployment" and "ReplicaSet" in the kind check here because those are the only two kinds available through the federation API.
I am not sure how else I can handle it. On the other hand, I have a limited understanding of the API machinery; @nikhiljindal, would you kindly be able to help a bit on this? Thanks!

OK, fair enough. I'm still concerned that in this code we explicitly only support Replicasets and Deployments, while the cluster HPA also supported ReplicationControllers, and in future perhaps other types. Please make sure that this is clearly documented in the user guide.

Yes, I completely agree, but as mentioned, I did not find a better way to do this.
Also, I have used only Deployment and ReplicaSet as target objects because those are the two objects supported in the control plane as of now. I have added comments on top of the functions for now, with a TODO to cover this in the documentation when I update it.
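For readers following the thread, here is a minimal sketch of the kind check being discussed. It is not the PR's code: `GroupKind` is a local stand-in for `schema.GroupKind`, and `supportedTargetKinds`, `errUnsupportedKind` and `checkTargetKind` are hypothetical names. It only illustrates that keeping the hardcoded kinds in one table gives a single place to update, and that the match is case-sensitive.

```go
package main

import (
	"errors"
	"fmt"
)

// GroupKind is a local stand-in for schema.GroupKind so that this sketch
// needs no Kubernetes imports.
type GroupKind struct {
	Group string
	Kind  string
}

// errUnsupportedKind is a hypothetical sentinel error for this sketch.
var errUnsupportedKind = errors.New("unsupported hpa target kind")

// supportedTargetKinds is a hypothetical table of the kinds the thread says
// are available through the federation API.
var supportedTargetKinds = map[string]bool{
	"Deployment": true,
	"ReplicaSet": true,
}

// checkTargetKind returns an error for any kind that is not in the table.
// The lookup is case-sensitive, so "replicaset" is rejected.
func checkTargetKind(gk GroupKind) error {
	if !supportedTargetKinds[gk.Kind] {
		return fmt.Errorf("%w: %q", errUnsupportedKind, gk.Kind)
	}
	return nil
}

func main() {
	for _, kind := range []string{"Deployment", "ReplicaSet", "ReplicationController"} {
		err := checkTargetKind(GroupKind{Group: "extensions", Kind: kind})
		fmt.Printf("%s: %v\n", kind, err)
	}
}
```

With a table like this, adding support for another kind would be a one-line change, although the get and update functions would still need a matching case for the new type.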

ghost · 2017-08-07T23:53:17Z

federation/pkg/federatedtypes/scheduling.go

 	return true
 }

+func isSelected(names []string, name string) bool {


Rather replace with a simple set inclusion operation? That would be less code, and more efficient. I think we have a function in the utils library to do that. StringSet or similar.

If you are talking about this, I found it less efficient than the simple slice I have used here, simply because it uses maps (I have used it in the hpa controller, though, because set operations are needed there).
I could have used this string set to store the cluster list in the first place, which would have avoided introducing the function this comment is about. But a slice suited the need better because marshalling is super easy, as here.
Please let me know if you have a better suggestion.

OK, fair enough.
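To make the trade-off concrete, here is a small sketch of the two approaches. Only the signature of `isSelected` comes from the diff; its body here is a guess at a plain linear scan, and `toSet` is a hypothetical helper standing in for a string-set utility.

```go
package main

import "fmt"

// isSelected reports whether name appears in names, using a linear scan.
// The signature matches the diff; the body is an assumption.
func isSelected(names []string, name string) bool {
	for _, n := range names {
		if n == name {
			return true
		}
	}
	return false
}

// toSet is a hypothetical helper that builds a map-based set from a slice.
// Lookups are constant time, but building the set allocates a map.
func toSet(names []string) map[string]struct{} {
	set := make(map[string]struct{}, len(names))
	for _, n := range names {
		set[n] = struct{}{}
	}
	return set
}

func main() {
	clusters := []string{"cluster-a", "cluster-b"}

	// Slice scan: no extra allocation, cost grows with the list length.
	fmt.Println(isSelected(clusters, "cluster-b"))

	// Set lookup: pays for the map up front, then each lookup is cheap.
	_, ok := toSet(clusters)["cluster-c"]
	fmt.Println(ok)
}
```

For a short cluster list that is already stored as a slice, the scan avoids building a map on every call, which is consistent with the author's reasoning; a set tends to pay off when the same list is queried many times.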

ghost

First pass review complete. You seem to be missing unit tests on some of this code?

ghost · 2017-08-11T19:11:53Z

federation/pkg/federatedtypes/hpa.go

 		Spec: autoscalingv1.HorizontalPodAutoscalerSpec{
 			ScaleTargetRef: autoscalingv1.CrossVersionObjectReference{
-				Kind: "replicaset",
+				Kind: "ReplicaSet",


I think that you will need tests for Deployments also.

This is actually used only for the crud test written with the sync controller (which tests crud of the hpa object alone). Meanwhile, I will add tests for both types in e2e.

ghost · 2017-08-11T20:55:52Z

federation/pkg/federatedtypes/hpa.go

+func (a *HpaAdapter) updateClusterListOnTargetObject(fedHpa *autoscalingv1.HorizontalPodAutoscaler, scheduleStatus map[string]*replicaNums) error {
+	if len(fedHpa.Spec.ScaleTargetRef.Kind) <= 0 || len(fedHpa.Spec.ScaleTargetRef.Name) <= 0 {
+		// nothing to do
+		return nil


I think you need to log what's happening here, for debugging purposes?

ghost · 2017-08-11T20:57:22Z

federation/pkg/federatedtypes/hpa.go

+	targetObj, err := getRuntimeObjectForKind(a.client, qualifiedKind, fedHpa.Namespace, fedHpa.Spec.ScaleTargetRef.Name)
+	if errors.IsNotFound(err) {
+		// Nothing to do; the target object does not exist in federation.
+		return nil


As above. A log message here?

ghost · 2017-08-11T20:59:22Z

federation/pkg/federatedtypes/hpa.go

 	}
+
+	if err := a.updateClusterListOnTargetObject(fedHpa, schedulingInfo.(*hpaSchedulingInfo).scheduleState); err != nil {
+		return err


In the case above you augment the error, but here not. Any particular reason?

I omitted that because I was already formatting an error string in the functions getRuntimeObjectForKind() and updateRuntimeObjectForKind(). On second thought, and following your suggestion, I have augmented the string here as well.
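As an illustration of what augmenting the error means here, the following sketch wraps a lower-level error with caller context. All names are hypothetical stand-ins (`updateClusterList`, `reconcile`, `errConflict`), and it uses `%w`, which requires Go 1.13 or later and so postdates this PR; at the time the same effect would have been written with `%v`.

```go
package main

import (
	"errors"
	"fmt"
)

// errConflict is a hypothetical low-level error for this sketch.
var errConflict = errors.New("conflict")

// updateClusterList stands in for a helper that already returns a
// formatted error of its own.
func updateClusterList(hpaName string) error {
	return errConflict
}

// reconcile adds context about what the caller was doing, so the log line
// identifies both the failing step and the underlying cause.
func reconcile(hpaName string) error {
	if err := updateClusterList(hpaName); err != nil {
		return fmt.Errorf("failed to update cluster list on target object of hpa %q: %w", hpaName, err)
	}
	return nil
}

func main() {
	fmt.Println(reconcile("web"))
}
```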

ghost · 2017-08-11T21:02:39Z

federation/pkg/federatedtypes/hpa.go

+	}
+}
+
+func updateRuntimeObjectForKind(c federationclientset.Interface, kind schema.GroupKind, ns string, obj pkgruntime.Object) (pkgruntime.Object, error) {


OK, fair enough. I'm still concerned that in this code we explicitly only support Replicasets and Deployments, while the cluster HPA also supported ReplicationControllers, and in future perhaps other types. Please make sure that this is clearly documented in the user guide.

ghost · 2017-08-11T21:42:28Z

federation/pkg/federatedtypes/scheduling.go

+		return nil, hpaControlled, error
+	}
+
+	if hpaSelectedClusters != nil {


This seems to imply that if the federated HPA has no cluster selector (and hence targets all clusters), and the target object has a cluster selector, then the two can get out of sync? Should the lack of a cluster selector on the HPA not transfer to the target object?

Please see my comment above.
Yes this is a choice, to implement (option 1 above). Please let me know if you think this is a better choice.

I think you've done the right thing (option 2).

ghost · 2017-08-11T21:51:27Z

federation/pkg/federatedtypes/scheduling.go

+		}
+		if hpaControlled {
+			if isSelected(hpaSelectedClusters.Names, clusterName) {
+				replicaState.isSelected = true


It's not clear to me what the meaning of having replicas set to zero here is. Or more generally, what the purpose of this field is. To the uninformed (like me :-) it could mean the number of replicas in the underlying target object (i.e. status), in which case it should currently be undefined, rather than zero? Or does it mean something else?

ReplicaScheduleState stores the result of the adapter-specific schedule() function (which cluster gets how many replicas). 0 means the particular cluster does not get any replicas, and is the initialised state too. I thought it better to use 0 as that marker to avoid null checks. I have used an additional bool field to indicate whether this cluster is selected in this reconcile pass, because we use the same field(s) to map either the selection percolated from the hpa or the result of this adapter's actual schedule function.

OK, that makes sense.
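The state described in this exchange can be sketched as follows. The type name `replicaNums` and the field `isSelected` appear in the diff; the `replicas` field name and the helpers `contains` and `initializeState` are assumptions made for illustration, not the PR's code.

```go
package main

import "fmt"

// replicaNums holds the per-cluster scheduling result. The replicas field
// name is an assumption; isSelected appears in the diff.
type replicaNums struct {
	isSelected bool  // true if the cluster is selected in this reconcile pass
	replicas   int64 // 0 means the cluster gets no replicas (also the initial state)
}

// contains is a hypothetical helper: a linear membership test on a slice.
func contains(names []string, name string) bool {
	for _, n := range names {
		if n == name {
			return true
		}
	}
	return false
}

// initializeState creates an entry for every known cluster, so callers never
// need a nil check, and marks the clusters selected by the hpa.
func initializeState(clusterNames, hpaSelected []string) map[string]*replicaNums {
	state := make(map[string]*replicaNums, len(clusterNames))
	for _, name := range clusterNames {
		state[name] = &replicaNums{isSelected: contains(hpaSelected, name)}
	}
	return state
}

func main() {
	clusters := []string{"c1", "c2"}
	state := initializeState(clusters, []string{"c2"})
	for _, name := range clusters {
		fmt.Printf("%s selected=%v replicas=%d\n", name, state[name].isSelected, state[name].replicas)
	}
}
```

Keeping a separate bool means that a zero replica count on a selected cluster can be told apart from a cluster that was never selected.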

ghost · 2017-08-11T21:53:39Z

federation/pkg/federatedtypes/scheduling.go

+		}
+		return &ReplicaSchedulingInfo{
+			ScheduleState: state,
+			Status:        ReplicaSchedulingStatus{},


Similar comment to above. Is having the fields of Status zero to indicate undefined, ambiguous with having the actual status from the underlying cluster be zero?

I think this was confusing partly because of the naming of the two structures. I have updated the names and added comments explaining the difference to avoid confusion, as also explained in this comment. As for the use of zero values in status fields to indicate unassigned, that comes from the original replicaset/deployment code (and is duplicated here), and I haven't really given it much thought.

OK, I think that's fine. If Kubernetes represents unknown status as zero, we should probably follow that for now.

ghost · 2017-08-11T21:55:13Z

federation/pkg/federatedtypes/scheduling.go

-		Schedule: schedule(plnr, obj, key, clusterNames, currentReplicasPerCluster, estimatedCapacity),
-		Status:   ReplicaSchedulingStatus{},
+		ScheduleState: schedule(plnr, obj, key, clusterNames, currentReplicasPerCluster, estimatedCapacity, initializedState),
+		Status:        ReplicaSchedulingStatus{},


Same comment as above.

ghost · 2017-08-11T22:06:00Z

federation/pkg/federation-controller/util/hpa/hpa.go

+const (
+	// FederatedAnnotationOnHpaTargetObj as key, is used by hpa controller to
+	// set selected cluster name list as annotation on the target object.
+	FederatedAnnotationOnHpaTargetObj = "federation.kubernetes.io/hpa-target-cluster-list"


OK, so rather than overwrite the user-specified cluster list on the target object, you add another list that overrides the former, right?

It does not override the former; the intersection of the two is used, as elaborated here.

OK, thanks, makes sense.
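A minimal sketch of the intersection described above, assuming both cluster lists are plain string slices. The function name `intersect` is hypothetical, and the sketch deliberately does not model what happens when either side has no selector at all.

```go
package main

import "fmt"

// intersect returns the clusters present in both lists, preserving the
// order of userSelected. It always returns a non-nil slice.
func intersect(userSelected, hpaSelected []string) []string {
	allowed := make(map[string]struct{}, len(hpaSelected))
	for _, name := range hpaSelected {
		allowed[name] = struct{}{}
	}
	result := []string{}
	for _, name := range userSelected {
		if _, ok := allowed[name]; ok {
			result = append(result, name)
		}
	}
	return result
}

func main() {
	user := []string{"cluster-a", "cluster-b", "cluster-c"}
	hpa := []string{"cluster-c", "cluster-a"}
	fmt.Println(intersect(user, hpa))
}
```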

irfanurrehman · 2017-08-15T16:16:15Z

@quinton-hoole updated this based on your comments PTAL!

After the hpa controller determines the number of replicas needed per cluster, it also controls the distribution of target objects (rs or deployment) into the correct clusters by telling the corresponding controllers which clusters they should put the objects into (passed as a list of selected clusters in annotations).

ghost

Thanks @irfanurrehman . The code looks good to me. I would like someone else to also do a code review before I approve for merging. I'll solicit help on slack.

ghost · 2017-08-28T21:32:23Z

No help offered on slack. I've added it to the SIG meeting notes as a PR requiring some additional love.

ghost · 2017-09-15T20:10:01Z

/approve

irfanurrehman · 2017-09-20T17:58:23Z

@quinton-hoole this needs lgtm as well, thanks a lot for handling this!

jdumars · 2017-09-20T18:04:28Z

@quinton-hoole @irfanurrehman can you follow the exception process to get this in 1.8? https://github.com/kubernetes/features/blob/master/EXCEPTIONS.md Also, please make sure this feature is mentioned in the https://github.com/kubernetes/features/blob/master/release-1.8/release_notes_draft.md Thanks!

irfanurrehman · 2017-09-21T10:34:38Z

@jdumars I did send in an exception request yesterday; please let me know if anything else might be needed. thanks!

jdumars · 2017-09-21T14:35:29Z

@quinton-hoole if you (and SIG Federation) LGTM/Approve this, it will go in 1.8 assuming you get it done today. Tomorrow is likely the hard cutoff for 1.8 except extraordinary fixes that need to be cherry picked in.

ghost · 2017-09-21T22:07:39Z

/lgtm no-issue

k8s-github-robot · 2017-09-21T22:07:56Z

/test all

Tests are more than 96 hours old. Re-running tests.

ghost · 2017-09-21T22:08:29Z

/approve no-issue

k8s-github-robot · 2017-09-21T22:08:54Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: irfanurrehman, quinton-hoole

Associated issue requirement bypassed by: quinton-hoole

The full list of commands accepted by this bot can be found here.

Details

Needs approval from an approver in each of these OWNERS Files:

~~federation/OWNERS~~ [irfanurrehman,quinton-hoole]

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

k8s-github-robot · 2017-09-21T23:07:30Z

/test all [submit-queue is verifying that this PR is safe to merge]

k8s-github-robot · 2017-09-22T00:01:10Z

Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here.

k8s-ci-robot added sig/federation cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Aug 1, 2017

k8s-github-robot assigned madhusudancs and nikhiljindal Aug 1, 2017

k8s-github-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. release-note-none Denotes a PR that doesn't merit a release note. labels Aug 1, 2017

irfanurrehman force-pushed the fed-hpa-targetObj branch from 9b53f43 to 43b7b26 Compare August 1, 2017 17:42

k8s-ci-robot assigned ghost Aug 1, 2017

This was referenced Aug 4, 2017

[Federation] hpa e2e tests #50168

Closed

[Federation] HPA controller #45993

Merged

irfanurrehman force-pushed the fed-hpa-targetObj branch from 43b7b26 to 41b6440 Compare August 5, 2017 19:41

ghost suggested changes Aug 8, 2017

irfanurrehman force-pushed the fed-hpa-targetObj branch from 41b6440 to 4994e64 Compare August 10, 2017 15:36

k8s-github-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Aug 10, 2017

ghost suggested changes Aug 11, 2017

irfanurrehman force-pushed the fed-hpa-targetObj branch from 4994e64 to 1f9de1d Compare August 15, 2017 16:14

Irfan Ur Rehman added 2 commits August 24, 2017 15:34

[Federation]build files for hpa controller controlling target objects

da2db33

irfanurrehman force-pushed the fed-hpa-targetObj branch from 1f9de1d to da2db33 Compare August 24, 2017 10:07

ghost approved these changes Aug 24, 2017

irfanurrehman mentioned this pull request Sep 5, 2017

Federated Pod AutoScaler feature kubernetes/enhancements#257

Closed

irfanurrehman mentioned this pull request Sep 18, 2017

Federated Hpa feature doc kubernetes/website#5487

Merged

ghost added this to the v1.8 milestone Sep 19, 2017

jdumars added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Sep 20, 2017

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 21, 2017

k8s-github-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 21, 2017

k8s-github-robot merged commit 8657a74 into kubernetes:master Sep 22, 2017
