
Set default requests cpu resource to all deployments#6229

Merged
istio-testing merged 14 commits into istio:master from ymesika:defaultResourcesRequests
Jul 2, 2018

Conversation

@ymesika
Member

@ymesika ymesika commented Jun 13, 2018

Adding the minimal resources requests values to all components:

resources:
  requests:  
    cpu: 100m

This is the minimal set that enables the Horizontal Pod Autoscaler to function correctly.

Each component can overwrite those default resources by declaring the custom values within the chart's section in values.yaml.

Fixes #5976
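A sketch of what the override mentioned above could look like in values.yaml (the chart and key names here are illustrative, not taken from this PR's diff):

```yaml
# Global default applied to every deployment that does not override it.
global:
  defaultResources:
    requests:
      cpu: 100m

# A chart section can declare its own values to replace the default,
# e.g. for a hypothetical "pilot" component:
pilot:
  resources:
    requests:
      cpu: 500m
      memory: 2048Mi
```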

@istio-testing istio-testing added the do-not-merge/work-in-progress Block merging of a PR because it isn't ready yet. label Jun 13, 2018
@istio-testing istio-testing requested a review from linsun June 13, 2018 08:45
@codecov

codecov bot commented Jun 13, 2018

Codecov Report

Merging #6229 into master will decrease coverage by 1%.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master   #6229    +/-   ##
=======================================
- Coverage      71%     70%   -<1%     
=======================================
  Files         353     352     -1     
  Lines       30757   30099   -658     
=======================================
- Hits        21595   21016   -579     
+ Misses       8316    8249    -67     
+ Partials      846     834    -12
Impacted Files Coverage Δ
istioctl/cmd/istioctl/proxystatus.go 16% <0%> (-9%) ⬇️
pilot/pkg/networking/plugin/authz/rbac.go 73% <0%> (-6%) ⬇️
mixer/adapter/stackdriver/helper/common.go 79% <0%> (-4%) ⬇️
pilot/pkg/serviceregistry/kube/controller.go 66% <0%> (-2%) ⬇️
mixer/adapter/redisquota/redisquota.go 88% <0%> (-2%) ⬇️
mixer/adapter/fluentd/fluentd.go 74% <0%> (-1%) ⬇️
mixer/adapter/stackdriver/trace/trace.go 89% <0%> (ø) ⬇️
pilot/pkg/networking/plugin/authz/util.go 98% <0%> (ø) ⬇️
pilot/pkg/networking/util/util.go 30% <0%> (ø) ⬇️
galley/pkg/runtime/resource/schema.go 100% <0%> (ø) ⬆️
... and 49 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d6a0aa7...7785cd1. Read the comment docs.

@ymesika
Member Author

ymesika commented Jun 13, 2018

@linsun @ldemailly @costinm

I will probably need to update a lot of golden testdata, so can you please let me know whether putting those in all components is OK, or whether they should only go in the "pure" control plane components.

Contributor

@costinm costinm left a comment


/lgtm
/approve

@ymesika
Member Author

ymesika commented Jun 13, 2018

The last commit reduces the requested CPU from 100m to 50m.
The reason is that the CircleCI test node has 2 CPUs, which means that at 100m only 20 containers with such a request could be scheduled.
Some tests actually run more containers than that, leaving some pods in the Pending state. 50m seems to be sufficient for the tests, but I wonder whether we should request something like 10m, just to enable the HPA with minimal side effects.

@ymesika
Member Author

ymesika commented Jun 13, 2018

@kyessenov What do you think?

@ymesika ymesika changed the title [WIP] Set default requests cpu resource to all deployments Set default requests cpu resource to all deployments Jun 14, 2018
@istio-testing istio-testing removed the do-not-merge/work-in-progress Block merging of a PR because it isn't ready yet. label Jun 14, 2018
@ymesika
Member Author

ymesika commented Jun 18, 2018

@frankbu What's your opinion re: #6229 (comment)
Thanks.

@ldemailly
Member

Shouldn't CircleCI pass a lower value explicitly instead? Also, how do we end up with 20 containers?

@ymesika
Member Author

ymesika commented Jun 18, 2018

@ldemailly In this test, for instance, with 100m there was one pod waiting for available resources:

NAME                                          READY     STATUS    RESTARTS   AGE
a-7646dd46c5-cvjzs                            2/2       Running   0          4m
echosrv-deployment-1-7b4d6b7c84-cmkj9         2/2       Running   0          5m
echosrv-deployment-2-6c789cc7d5-g6948         2/2       Running   0          5m
istio-citadel-56df7c9cf-9v2zc                 1/1       Running   0          6m
istio-egressgateway-764f45594f-xqnzq          1/1       Running   0          6m
istio-galley-59cf69b6b-sfvw8                  1/1       Running   0          6m
istio-ingressgateway-9bf96d645-2cpq8          1/1       Running   0          6m
istio-pilot-57f798757d-md7z7                  2/2       Running   0          6m
istio-policy-6c57449875-6xm4l                 2/2       Running   0          6m
istio-sidecar-injector-66cc9fb84d-v6dw7       1/1       Running   0          6m
istio-statsd-prom-bridge-6978574dc4-n9jrm     1/1       Running   0          6m
istio-telemetry-6888d65655-s6pvb              2/2       Running   0          6m
nginx-default-http-backend-6d5b847486-9r8cd   2/2       Running   4          4m
nginx-ingress-controller-7854b56c8-t9smw      0/2       Pending   0          4m
prometheus-54b7c766d6-knr2t                   1/1       Running   0          6m
raw-cli-deployement-5cd88fb45d-qgghs          1/1       Running   0          5m
t-6bfc4495d-jd4k2                             1/1       Running   0          4m

What's the downside of having a low requested CPU? Like 10m?

@ldemailly
Member

The downside is that if it's too low, the scheduler would cram too many pods onto the same node at first, only to have to move them when traffic starts and CPU usage goes up.

I guess it goes back to: test values and production (or even demo) defaults aren't the same.

@ymesika
Member Author

ymesika commented Jun 18, 2018

Which might not be too different from what we have today, when scheduling without any resources stated.

Fun facts - default requests CPU*:

GKE - 0m
IBM - 0m
Azure - 0m

* Based on a brief Google search

@ymesika ymesika changed the title Set default requests cpu resource to all deployments [WIP] Set default requests cpu resource to all deployments Jun 18, 2018
@istio-testing istio-testing added the do-not-merge/work-in-progress Block merging of a PR because it isn't ready yet. label Jun 18, 2018
@frankbu
Contributor

frankbu commented Jun 18, 2018

I'm wondering if there is a single default value that is good for all the control plane services. Aren't some much more CPU intensive than others? How do we test the HPA to see if it's working properly (optimally?) with any given service / CPU request value?

ymesika added 2 commits June 20, 2018 06:46
# Conflicts:
#	install/kubernetes/helm/istio/charts/egressgateway/templates/deployment.yaml
#	install/kubernetes/helm/istio/charts/gateways/templates/deployment.yaml
#	install/kubernetes/helm/istio/values.yaml
@ymesika
Member Author

ymesika commented Jun 20, 2018

This is just the requested CPU (effectively a minimum).
Assuming that most k8s providers default it to 0m, I don't think there is any harm in setting this value to 10m, just to enable the HPA.
We can argue about whether we should apply it to all Istio components or just a subset, i.e. the core control plane components, and not apply it to Kiali, Servicegraph, etc.

@ymesika ymesika changed the title [WIP] Set default requests cpu resource to all deployments Set default requests cpu resource to all deployments Jun 20, 2018
@istio-testing istio-testing removed the do-not-merge/work-in-progress Block merging of a PR because it isn't ready yet. label Jun 20, 2018
 requests:
-  cpu: 100m
-  memory: 128Mi
+  cpu: 10m
Member


Will it impact the performance of the proxy if the CPU is set to 10m?

Member Author


It shouldn't. It just notifies the scheduler that the proxy container needs at least 10m.
The 100m/128Mi were also arbitrarily set, for the same purpose of enabling HPA on pods that contain the proxy (in a nutshell: because all containers must declare at least the requested CPU, the proxy container used to block customers from enabling HPA on their injected pods).
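For background, the HPA computes CPU utilization as a percentage of the containers' CPU requests, so a container with no CPU request blocks CPU-based autoscaling for its whole pod. A minimal sketch (resource names are assumed) of the kind of HPA this change enables:

```yaml
# Hypothetical HPA for a deployment whose pods include the injected proxy;
# it only works once every container in the pod declares a CPU request.
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 1
  maxReplicas: 5
  targetCPUUtilizationPercentage: 80  # measured relative to the CPU requests
```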

Member


Ah, I see, it is a best effort pod.

@gyliu513
Member

/lgtm

@ymesika
Member Author

ymesika commented Jun 20, 2018

/hold

The golden injected testdata should be updated with the new proxy values. I will hold for now, to make sure all (or a majority) of us agree on this before updating all the golden testdata files.

@istio-testing istio-testing added the do-not-merge/hold Block automatic merging of a PR. label Jun 20, 2018
# Conflicts:
#	install/kubernetes/helm/istio/values.yaml
@ymesika
Member Author

ymesika commented Jun 28, 2018

The PR is ready to be merged and is on hold, waiting for the participants' agreement to add a requested CPU resource of 10m, which is slightly above the default of 0m (in most providers) but enables HPA.

/hold cancel

@istio-testing istio-testing added needs-rebase Indicates a PR needs to be rebased before being merged and removed do-not-merge/hold Block automatic merging of a PR. labels Jun 28, 2018
@istio-testing istio-testing removed the needs-rebase Indicates a PR needs to be rebased before being merged label Jun 29, 2018
# Conflicts:
#	install/kubernetes/helm/istio/values.yaml
@ldemailly
Member

sgtm but conflicts need to be resolved

@istio-testing
Collaborator

istio-testing commented Jun 30, 2018

@ymesika: The following test failed, say /retest to rerun them all:

Test name: prow/istio-pilot-e2e.sh | Commit: f4c8d5c | Rerun command: /test istio-pilot-e2e

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@rshriram
Member

rshriram commented Jul 2, 2018

/lgtm

@istio-testing
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: costinm, gyliu513, rshriram, ymesika

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@istio-testing istio-testing merged commit e5a09e9 into istio:master Jul 2, 2018
istio-testing pushed a commit that referenced this pull request Jul 7, 2018
* Introducing mcpc, a command-line mcp client tool. (#6715)

* Set default requests cpu resource to all deployments (#6229)

* Set default requests cpu resource to all deployments

* Description added

* Requested CPU decreased to meet test env node capacity

* Setting requested cpu to 10m as most k8s platforms are defaulting to 0m

* Merge master remains

* Apply the global resources to the new gateways

* Updated goldens to cpu 10m

* Clean up and rename the global default resources

* Redundant Redis servers were removed + defer in a loop is not recommended. (#6768)

* Fix upgrade. (#6792)

* Fix upgrade.

* Format

* Move back the fail close - not clear if we have time to get mixer client in this build, we can remove in next

* Add configuration to allow transformation of durations to integers. (#5416)

The fluentd handler currently transforms durations to their string
representation for log messages. This commit adds a configuration to
override this default behaviour and transform durations to integers of
unit millisecond.

* Add workload and service dashboard (#6789)

* Add workload and service dashboard

* fix typo

* Skip service dash e2e test for now

* Add rushed, missed workload dash

* Touch up

* Commenting out test until can replicate locally and fix

* simplify fmt script (#6798)

* simplify fmt script

- goimports is a superset of gofmt
- shouldn't need to remove spaces
- now utilizing -l flag to tell us if there is a diff

* always update gofmt to latest

* spacing that is now considered incorrect

* support hash based routing for soft session affinity (#6742)

* support sticky sessions

* fmt :(

* update servicediscovery fakes

* bump istio/api

* use updated api for ring hash

* support session affinity for sidecar

* update destination rule yaml to reflect new api

* ttl is a string

* unset sdsUdsPath in configmap (#6799)

* fix bug so that destination.service.** attributes are collected (#6801)

* remove unnecessary generated attributes finding. (#6785)

* modify docker template files for proxyv2 (#6790)

* Long-running testing improvements (#6800)

* Add values for config map settings, including access log.
More docs.

* Updates and improvements for the stress-testing configs.

* Add values for config map settings, including access log. (#6797)

* Add values for config map settings, including access log.
More docs.

* Updates and improvements for the stress-testing configs.

* Address review comments

* Merged wrong files

* Add the setup helm file - this change now depend on the previous PR.

* Sync with remote, remove accidentally added files.

* Another accidental file

* SNI routing from sidecar to gateway with virtual services (#6402)

* quick sni matching 1st pass with no refactoring of existing code

* use shriram's api sha

* quick pass at using tls block

* add some validation

* copyright

* fix lint + remove deadcode

* rename protocol tcp_tls -> tls

* update back to istio/api master

* remove accidentally added test file

* add tls block to gateway logic

* add todos

* basic sni wildcard implementation

* add tcp, fix problems with rbac, matching

* better tcp + tls validation

* address code review comments

* remove out of date comment

* update comments

* fix compile error

* use tcp proxy in tcp routing

* add tcp routing e2e test

* add forgotten vs config file + update description of test

* Comments, bug fixes

Signed-off-by: Shriram Rajagopalan <shriramr@vmware.com>

* cleanup gateway tcp test

Signed-off-by: Shriram Rajagopalan <shriramr@vmware.com>

* moving networking test yamls

Signed-off-by: Shriram Rajagopalan <shriramr@vmware.com>

* tcp/tls tests

Signed-off-by: Shriram Rajagopalan <shriramr@vmware.com>

* yaml fixes

Signed-off-by: Shriram Rajagopalan <shriramr@vmware.com>

* fix file switcheroo

* port matches

Signed-off-by: Shriram Rajagopalan <shriramr@vmware.com>

* fix authN plugin overwriting TLS context

Signed-off-by: Shriram Rajagopalan <shriramr@vmware.com>

* more tests - route via egress gateway

Signed-off-by: Shriram Rajagopalan <shriramr@vmware.com>

* yaml fixes

Signed-off-by: Shriram Rajagopalan <shriramr@vmware.com>

* initialize prom variables

Signed-off-by: Shriram Rajagopalan <shriramr@vmware.com>

* split tests

Signed-off-by: Shriram Rajagopalan <shriramr@vmware.com>

* final test fix hopefully

Signed-off-by: Shriram Rajagopalan <shriramr@vmware.com>

* revert gateway tweaks

Signed-off-by: Shriram Rajagopalan <shriramr@vmware.com>

* Add myself back to OWNERS, sort the list alphabetically. (#6830)

* Upload cloudfoundry test result to GCS (#6829)

* Fix typos in command-line output.
@ymesika ymesika deleted the defaultResourcesRequests branch July 9, 2018 06:05