
[Federation] Hpa controller controls target objects#49950

Merged
k8s-github-robot merged 2 commits into kubernetes:master from irfanurrehman:fed-hpa-targetObj
Sep 22, 2017

Conversation


@irfanurrehman irfanurrehman commented Aug 1, 2017

This is in the series of PRs over #45993.
Only the last commit is reviewable. This is probably the last PR in this chain; e2e tests for the relevant scenarios, including the one created by this PR, will follow soon.

Special notes for your reviewer:
@kubernetes/sig-federation-pr-reviews
@quinton-hoole

Release note:

@k8s-ci-robot k8s-ci-robot added sig/federation cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Aug 1, 2017
@k8s-github-robot k8s-github-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. release-note-none Denotes a PR that doesn't merit a release note. labels Aug 1, 2017
@irfanurrehman (Author)

/assign @quinton-hoole


ghost commented Aug 7, 2017

/retest

@ghost ghost left a comment

More review to come...

}
}

func updateRuntimeObjectForKind(c federationclientset.Interface, kind schema.GroupKind, ns string, obj pkgruntime.Object) (pkgruntime.Object, error) {

This and the above function seem rather brittle. Ideally you should use the list of object types supported by the cluster HPAs (i.e. not hardcode the type names here, and have to update them every time the cluster HPAs get updated). I realize that this is perhaps non-trivial, and am happy to defer it to a later PR provided that we open a known issue for it now.

@irfanurrehman (Author)

Actually, I did look at doing something similar to what you suggest, but to my surprise found that at almost all places where the kind or the resource name is initialised, it is used as a string directly.
For example, the federation-apiserver default resource config (which is done quite similarly in kube-apiserver), or specifying a kind, as in the deployment controller ref kind or the replicaset controller kind itself.
I therefore went ahead and used the string, as in the code reviewed.

Additionally, in the k8s API, the HPA targetRef object is not interpreted (as is done here) within the HPA controller's purview; the scale subresource is used instead. So whatever target name and kind comes up in the HPA spec, the scale API is called using the same. Only particular types support this subresource, so the API call will fail if called on a wrong type or kind (which was probably wrong in the HPA spec specified by the user), ref.
I used only "Deployment" and "ReplicaSet" in the kind check here, because those are the only two kinds available in the federation API.
I am not sure how else I can handle it. On the other hand, I have limited understanding of the API machinery; @nikhiljindal, would you kindly be able to help a bit on this? Thanks!
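The hardcoded kind check under discussion can be sketched roughly as below; the function name is illustrative, not the actual helper from the PR:

```go
package main

import "fmt"

// supportedTargetKind mirrors the hardcoded kind dispatch discussed above:
// the federated HPA controller only recognises target kinds that the
// federation API actually serves. (Hypothetical simplified sketch.)
func supportedTargetKind(kind string) bool {
	switch kind {
	case "Deployment", "ReplicaSet":
		return true
	default:
		return false
	}
}

func main() {
	for _, k := range []string{"Deployment", "ReplicaSet", "ReplicationController"} {
		fmt.Printf("%s supported: %v\n", k, supportedTargetKind(k))
	}
}
```

A wrong kind in the HPA spec would simply fail this check, analogous to how a scale-subresource call fails in the cluster HPA path.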


OK, fair enough. I'm still concerned that in this code we explicitly support only ReplicaSets and Deployments, while the cluster HPA also supports ReplicationControllers, and in future perhaps other types. Please make sure that this is clearly documented in the user guide.

@irfanurrehman (Author)

Yes, I completely agree, but as mentioned I did not find a better way to do this.
Also, I have used Deployment and ReplicaSet only as the target objects, because those are the two objects supported in the control plane as of now. I have added comments on top of the functions for now, with a TODO to update the documentation when I do that.

return true
}

func isSelected(names []string, name string) bool {

Rather replace this with a simple set inclusion operation? That would be less code, and more efficient. I think we have a function in the utils library to do that, StringSet or similar.

@irfanurrehman (Author) Aug 10, 2017

If you are talking about this, I found it less efficient than the simple slice I have used here, simply because it uses maps (I have used it in the hpa controller though, because of the need for set operations there).
I could have used this string set to store the cluster list in the first place, to ensure I don't introduce a function like the one this comment is on, but a slice suited the need better because marshalling is super easy, as here.
Please let me know if you have a better suggestion.
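The slice-based membership check being defended is essentially a linear scan; a minimal sketch (matching the `isSelected` signature shown in the review context):

```go
package main

import "fmt"

// isSelected is a sketch of the slice-based membership check under
// discussion: a plain linear scan over the cluster-name slice, which
// also marshals trivially to and from the annotation payload.
func isSelected(names []string, name string) bool {
	for _, n := range names {
		if n == name {
			return true
		}
	}
	return false
}

func main() {
	clusters := []string{"cluster-a", "cluster-b"}
	fmt.Println(isSelected(clusters, "cluster-a")) // true
	fmt.Println(isSelected(clusters, "cluster-c")) // false
}
```

For the short cluster lists involved here, a linear scan over a slice avoids the map allocation a set type would incur, which is the trade-off the author describes.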


OK, fair enough.

@irfanurrehman (Author)

Ack.

@k8s-github-robot k8s-github-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Aug 10, 2017
@ghost ghost left a comment

First pass review complete. You seem to be missing unit tests on some of this code?

Spec: autoscalingv1.HorizontalPodAutoscalerSpec{
ScaleTargetRef: autoscalingv1.CrossVersionObjectReference{
-Kind: "replicaset",
+Kind: "ReplicaSet",

I think that you will need tests for Deployments also.

@irfanurrehman (Author)

This is actually used only for the CRUD test written with the sync controller (which tests CRUD of the hpa object alone). Meanwhile, I will add tests for both types in e2e.

func (a *HpaAdapter) updateClusterListOnTargetObject(fedHpa *autoscalingv1.HorizontalPodAutoscaler, scheduleStatus map[string]*replicaNums) error {
if len(fedHpa.Spec.ScaleTargetRef.Kind) <= 0 || len(fedHpa.Spec.ScaleTargetRef.Name) <= 0 {
// nothing to do
return nil

I think you need to log what's happening here, for debugging purposes?

@irfanurrehman (Author)

Updated!

targetObj, err := getRuntimeObjectForKind(a.client, qualifiedKind, fedHpa.Namespace, fedHpa.Spec.ScaleTargetRef.Name)
if errors.IsNotFound(err) {
// Nothing to do; the target object does not exist in federation.
return nil

As above. A log message here?

@irfanurrehman (Author)

Updated!

}

if err := a.updateClusterListOnTargetObject(fedHpa, schedulingInfo.(*hpaSchedulingInfo).scheduleState); err != nil {
return err

In the case above you augment the error, but not here. Any particular reason?

@irfanurrehman (Author)

I omitted that because I was already formatting an error string in the functions getRuntimeObjectForKind() and updateRuntimeObjectForKind(). On second thought, and on your suggestion, I have augmented the string here also.


return nil, hpaControlled, err
}

if hpaSelectedClusters != nil {

This seems to imply that if the federated HPA has no cluster selector (and hence targets all clusters), and the target object has a cluster selector, then the two can get out of sync? Should the lack of a cluster selector on the HPA not transfer to the target object?

@irfanurrehman (Author) Aug 15, 2017

Please see my comment above.
Yes, this is a choice to implement (option 1 above). Please let me know if you think this is a better choice.


I think you've done the right thing (option 2).

}
if hpaControlled {
if isSelected(hpaSelectedClusters.Names, clusterName) {
replicaState.isSelected = true

It's not clear to me what the meaning of having replicas set to zero here is? Or more generally, what the purpose of this field is. To the uninformed (like me :-) it could mean the number of replicas in the underlying target object (i.e. status), in which case it should currently be undefined rather than zero? Or does it mean something else?

@irfanurrehman (Author)

ReplicaScheduleState stores the result of the schedule() function specific to the adapter (which cluster gets how many replicas). 0 means the particular cluster does not get any replicas, and is the initialised state too. I thought it better to use 0 as that marker to avoid nil checks. I have used an additional bool field to indicate whether this cluster is selected in this reconcile pass, because we use the same field(s) to map both the selection percolated from the hpa and the actual schedule function of this adapter.
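The per-cluster state described here can be sketched as below; the field names are assumed from the review context, not taken from the PR verbatim:

```go
package main

import "fmt"

// replicaNums is a hypothetical sketch of the per-cluster schedule state:
// a zero replica count is also the initialised state, so a separate bool
// records whether the cluster was selected in this reconcile pass.
type replicaNums struct {
	isSelected bool
	replicas   int64
}

func main() {
	state := map[string]*replicaNums{
		"cluster-a": {isSelected: true, replicas: 3},
		"cluster-b": {isSelected: true, replicas: 0}, // selected, but scheduled zero replicas
		"cluster-c": {},                              // initialised, not selected
	}
	for name, s := range state {
		fmt.Printf("%s selected=%v replicas=%d\n", name, s.isSelected, s.replicas)
	}
}
```

The bool disambiguates "selected with zero replicas" from "not selected at all", which is exactly the ambiguity the reviewer raised.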


OK, that makes sense.

}
return &ReplicaSchedulingInfo{
ScheduleState: state,
Status: ReplicaSchedulingStatus{},

Similar comment to above. Is having the fields of Status zero to indicate undefined, ambiguous with having the actual status from the underlying cluster be zero?

@irfanurrehman (Author) Aug 15, 2017

I think this was confusing because of the naming of the two structures as well. I have updated the names, with comments explaining the difference, to avoid confusion, as also explained in this comment. The use of zero values in status fields to indicate unassigned comes from the original replicaset/deployment code (and is duplicated here); I haven't really given it much thought.


OK, I think that's fine. If Kubernetes represents unknown status as zero, we should probably follow that for now.

-Schedule: schedule(plnr, obj, key, clusterNames, currentReplicasPerCluster, estimatedCapacity),
-Status: ReplicaSchedulingStatus{},
+ScheduleState: schedule(plnr, obj, key, clusterNames, currentReplicasPerCluster, estimatedCapacity, initializedState),
+Status: ReplicaSchedulingStatus{},

Same comment as above.

@irfanurrehman (Author)

As above.

const (
// FederatedAnnotationOnHpaTargetObj as key, is used by hpa controller to
// set selected cluster name list as annotation on the target object.
FederatedAnnotationOnHpaTargetObj = "federation.kubernetes.io/hpa-target-cluster-list"

OK, so rather than overwrite the user-specified cluster list on the target object, you add another list that overrides the former, right?

@irfanurrehman (Author)

It does not override the former; the intersection of the two is used, as elaborated here.
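The intersection semantics can be sketched as follows; the helper name is illustrative, not the actual code from the PR:

```go
package main

import "fmt"

// intersect sketches the semantics described above: the clusters that
// actually receive the target object are the intersection of the user's
// cluster selection on the target object and the list the HPA controller
// writes into the annotation.
func intersect(userSelected, hpaSelected []string) []string {
	inHpa := make(map[string]bool, len(hpaSelected))
	for _, c := range hpaSelected {
		inHpa[c] = true
	}
	var out []string
	for _, c := range userSelected {
		if inHpa[c] {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	fmt.Println(intersect([]string{"a", "b", "c"}, []string{"b", "c", "d"})) // [b c]
}
```

This preserves the user's intent on the target object while still letting the HPA controller narrow placement.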


OK, thanks, makes sense.

@irfanurrehman (Author)

@quinton-hoole, updated this based on your comments. PTAL!

Irfan Ur Rehman added 2 commits August 24, 2017 15:34
After the hpa controller determines the replica nums needed per cluster, it also
controls the distribution of target objects (rs or deployment) into the correct
clusters by telling the corresponding controllers which clusters they should
put the objects into (passed as a list of selected clusters in annotations).
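The annotation hand-off described in the commit message can be sketched as below, using the annotation key defined in the PR and an assumed `ClusterNames` wrapper shaped like a JSON list of names:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Annotation key from the PR; the wrapper type and helper are assumptions
// for illustration, not the PR's exact API.
const federatedAnnotationOnHpaTargetObj = "federation.kubernetes.io/hpa-target-cluster-list"

type ClusterNames struct {
	Names []string `json:"names"`
}

// setSelectedClusters marshals the HPA-selected cluster list into the
// target object's annotations, where the rs/deployment controllers can
// read it back when deciding placement.
func setSelectedClusters(annotations map[string]string, clusters []string) error {
	data, err := json.Marshal(ClusterNames{Names: clusters})
	if err != nil {
		return err
	}
	annotations[federatedAnnotationOnHpaTargetObj] = string(data)
	return nil
}

func main() {
	ann := map[string]string{}
	if err := setSelectedClusters(ann, []string{"cluster-a", "cluster-b"}); err != nil {
		panic(err)
	}
	fmt.Println(ann[federatedAnnotationOnHpaTargetObj])
}
```

Marshalling a plain slice this way is the "super easy" property the author cited when preferring a slice over a set type.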
@ghost ghost left a comment

Thanks @irfanurrehman . The code looks good to me. I would like someone else to also do a code review before I approve for merging. I'll solicit help on slack.


ghost commented Aug 28, 2017

No help offered on slack. I've added it to the SIG meeting notes as a PR requiring some additional love.


ghost commented Sep 15, 2017

/approve

@ghost ghost added this to the v1.8 milestone Sep 19, 2017
@irfanurrehman (Author)

@quinton-hoole this needs an lgtm as well; thanks a lot for handling this!

@jdumars (Contributor)

jdumars commented Sep 20, 2017

@quinton-hoole @irfanurrehman can you follow the exception process to get this in 1.8? https://github.com/kubernetes/features/blob/master/EXCEPTIONS.md Also, please make sure this feature is mentioned in the release notes draft: https://github.com/kubernetes/features/blob/master/release-1.8/release_notes_draft.md Thanks!

@jdumars jdumars added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Sep 20, 2017
@irfanurrehman (Author)

> @quinton-hoole @irfanurrehman can you follow the exception process to get this in 1.8? https://github.com/kubernetes/features/blob/master/EXCEPTIONS.md Also, please make sure this feature is mentioned in the https://github.com/kubernetes/features/blob/master/release-1.8/release_notes_draft.md Thanks!

@jdumars I did send in an exception request yesterday; please let me know if anything else might be needed. Thanks!

@jdumars (Contributor)

jdumars commented Sep 21, 2017

@quinton-hoole if you (and SIG Federation) LGTM/Approve this, it will go in 1.8, assuming you get it done today. Tomorrow is likely the hard cutoff for 1.8, except for extraordinary fixes that need to be cherry-picked in.


ghost commented Sep 21, 2017

/lgtm no-issue

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 21, 2017
@k8s-github-robot k8s-github-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 21, 2017
@k8s-github-robot

/test all

Tests are more than 96 hours old. Re-running tests.


ghost commented Sep 21, 2017

/approve no-issue

@k8s-github-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: irfanurrehman, quinton-hoole

Associated issue requirement bypassed by: quinton-hoole

The full list of commands accepted by this bot can be found here.

Details: Needs approval from an approver in each of these OWNERS files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-github-robot

/test all [submit-queue is verifying that this PR is safe to merge]

@k8s-github-robot

Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions here.

@k8s-github-robot k8s-github-robot merged commit 8657a74 into kubernetes:master Sep 22, 2017

Labels

approved: Indicates a PR has been approved by an approver from all required OWNERS files.
cncf-cla: yes: Indicates the PR's author has signed the CNCF CLA.
lgtm: "Looks good to me", indicates that a PR is ready to be merged.
priority/important-soon: Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
release-note-none: Denotes a PR that doesn't merit a release note.
size/L: Denotes a PR that changes 100-499 lines, ignoring generated files.


6 participants