Set default requests cpu resource to all deployments #6229
istio-testing merged 14 commits into istio:master from ymesika:defaultResourcesRequests
Conversation
Codecov Report
@@            Coverage Diff            @@
##           master   #6229    +/-   ##
========================================
- Coverage      71%      70%    -<1%
========================================
  Files         353      352      -1
  Lines       30757    30099    -658
========================================
- Hits        21595    21016    -579
+ Misses       8316     8249     -67
+ Partials      846      834     -12
Continue to review full report at Codecov.
Last commit reduces the requested CPU from 100m to 50m.

@kyessenov What do you think?

@frankbu What's your opinion re: #6229 (comment)

Shouldn't circle pass a lower value explicitly instead? Also, how do we end up with 20 containers?
@ldemailly In this test, for instance, with 100m there was one pod waiting for available resources:

NAME                                          READY   STATUS    RESTARTS   AGE
a-7646dd46c5-cvjzs                            2/2     Running   0          4m
echosrv-deployment-1-7b4d6b7c84-cmkj9         2/2     Running   0          5m
echosrv-deployment-2-6c789cc7d5-g6948         2/2     Running   0          5m
istio-citadel-56df7c9cf-9v2zc                 1/1     Running   0          6m
istio-egressgateway-764f45594f-xqnzq          1/1     Running   0          6m
istio-galley-59cf69b6b-sfvw8                  1/1     Running   0          6m
istio-ingressgateway-9bf96d645-2cpq8          1/1     Running   0          6m
istio-pilot-57f798757d-md7z7                  2/2     Running   0          6m
istio-policy-6c57449875-6xm4l                 2/2     Running   0          6m
istio-sidecar-injector-66cc9fb84d-v6dw7       1/1     Running   0          6m
istio-statsd-prom-bridge-6978574dc4-n9jrm     1/1     Running   0          6m
istio-telemetry-6888d65655-s6pvb              2/2     Running   0          6m
nginx-default-http-backend-6d5b847486-9r8cd   2/2     Running   4          4m
nginx-ingress-controller-7854b56c8-t9smw      0/2     Pending   0          4m
prometheus-54b7c766d6-knr2t                   1/1     Running   0          6m
raw-cli-deployement-5cd88fb45d-qgghs          1/1     Running   0          5m
t-6bfc4495d-jd4k2                             1/1     Running   0          4m

What's the downside of having a low requested CPU? Like 10m?
The downside is that if it's too low, the scheduler would cram too many pods onto the same node at first, only to have to move them when traffic starts / CPU usage goes up. I guess it goes back to: test values and production (or even demo) defaults aren't the same.
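For context, the CPU request is what the scheduler subtracts from a node's allocatable capacity at placement time; with no limit set, the container can still burst above it. A minimal sketch of the spec shape being discussed (pod and image names are hypothetical):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cpu-request-demo          # hypothetical name
spec:
  containers:
  - name: app
    image: nginx                  # placeholder image
    resources:
      requests:
        cpu: 10m                  # scheduler reserves 10 millicores on the chosen node
        memory: 128Mi
      # no "limits" block: the container may use spare CPU beyond its request
```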
Which might not be too different from what we have today when scheduling without any resources stated. Fun facts - default requests CPU*:
I'm wondering if there is a single default value that is good to use for all the control plane services. Aren't some much more CPU intensive than others? How do we test the HPA to see if it's working properly (optimally?) with any given service / CPU request value?
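Worth noting for the HPA question: the autoscaler computes utilization as a percentage of the containers' CPU *request*, which is why a non-zero request is required at all. A hedged sketch of an HPA targeting one of the deployments (the 80% target and replica bounds are illustrative, not from this PR):

```yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: istio-pilot
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: istio-pilot
  minReplicas: 1
  maxReplicas: 5
  # scale out when average CPU usage exceeds 80% of the pods' requested CPU
  targetCPUUtilizationPercentage: 80
```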
# Conflicts:
#	install/kubernetes/helm/istio/charts/egressgateway/templates/deployment.yaml
#	install/kubernetes/helm/istio/charts/gateways/templates/deployment.yaml
#	install/kubernetes/helm/istio/values.yaml
This is just the requested CPU (effectively a minimum reservation).
  requests:
-   cpu: 100m
    memory: 128Mi
+   cpu: 10m
Will it impact the performance of the proxy if the CPU request is set to 10m?
It shouldn't. It just notifies the scheduler that the proxy container needs at least 10m.
The 100m/128Mi were also arbitrarily set, for the same purpose of enabling HPA on pods that contain the proxy (in a nutshell: because all containers must declare a CPU request for HPA to work, the proxy container used to block customers from having HPA on their injected pods).
Ah, I see, it is a best effort pod.
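Strictly speaking, a pod whose containers set a request but no limit lands in the Burstable QoS class; BestEffort applies only when no container declares any requests or limits at all. A sketch of the distinction (values illustrative):

```yaml
# Burstable: a request is set but no limit - what this PR produces
resources:
  requests:
    cpu: 10m

# BestEffort: no requests and no limits anywhere in the pod
resources: {}
```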
/lgtm
/hold Golden injected testdata should be updated with the new proxy values. I will hold for now to make sure all (or a majority) of us have an agreement about this before updating all the golden testdata files.
# Conflicts:
#	install/kubernetes/helm/istio/values.yaml
The PR is ready to be merged and is on hold, waiting for the participants' agreement to add a requested CPU resource of 10m, which is slightly above the default (on most providers) of 0m but enables HPA.
/hold cancel
# Conflicts:
#	install/kubernetes/helm/istio/values.yaml
sgtm, but conflicts need to be resolved
@ymesika: The following test failed.
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED

This pull request has been approved by: costinm, gyliu513, rshriram, ymesika

The full list of commands accepted by this bot can be found here. The pull request process is described here.
Commits:
* Set default requests cpu resource to all deployments
* Description added
* Requested CPU decreased to meet test env node capacity
* Setting requested cpu to 10m as most k8s platforms are defaulting to 0m
* Merge master remains
* Apply the global resources to the new gateways
* Updated goldens to cpu 10m
* Clean up and rename the global default resources
Adding the minimal resources requests values to all components.

This is the minimal set that enables the Horizontal Pod Autoscaler to function correctly. Each component can override these default resources by declaring custom values within the chart's section in values.yaml.

Fixes #5976
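As described above, each component can override the shared default; a hypothetical values.yaml override (the `pilot` section name and the values shown are illustrative, not the PR's defaults):

```yaml
# values.yaml - override the global default requests for one component
pilot:
  resources:
    requests:
      cpu: 500m
      memory: 2048Mi
```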