
Set default requests cpu resource to all deployments#6229

Merged
istio-testing merged 14 commits into istio:master from ymesika:defaultResourcesRequests
Jul 2, 2018

Conversation

@ymesika
Member

@ymesika ymesika commented Jun 13, 2018

Adding the minimal resources requests values to all components:

resources:
  requests:  
    cpu: 100m

This is the minimal set that enables the Horizontal Pod Autoscaler to function correctly.

Each component can overwrite those default resources by declaring the custom values within the chart's section in values.yaml.

Fixes #5976
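A sketch of what the override mentioned above could look like in values.yaml (the chart and key names here are illustrative, not taken from this PR's diff):

```yaml
# Global default applied to every deployment that does not override it.
global:
  defaultResources:
    requests:
      cpu: 100m

# A chart section can declare its own values to replace the default,
# e.g. for a hypothetical "pilot" component:
pilot:
  resources:
    requests:
      cpu: 500m
      memory: 2048Mi
```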

@istio-testing istio-testing added the do-not-merge/work-in-progress Block merging of a PR because it isn't ready yet. label Jun 13, 2018
@istio-testing istio-testing requested a review from linsun June 13, 2018 08:45
@codecov

codecov bot commented Jun 13, 2018

Codecov Report

Merging #6229 into master will decrease coverage by 1%.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master   #6229    +/-   ##
=======================================
- Coverage      71%     70%   -<1%     
=======================================
  Files         353     352     -1     
  Lines       30757   30099   -658     
=======================================
- Hits        21595   21016   -579     
+ Misses       8316    8249    -67     
+ Partials      846     834    -12
Impacted Files Coverage Δ
istioctl/cmd/istioctl/proxystatus.go 16% <0%> (-9%) ⬇️
pilot/pkg/networking/plugin/authz/rbac.go 73% <0%> (-6%) ⬇️
mixer/adapter/stackdriver/helper/common.go 79% <0%> (-4%) ⬇️
pilot/pkg/serviceregistry/kube/controller.go 66% <0%> (-2%) ⬇️
mixer/adapter/redisquota/redisquota.go 88% <0%> (-2%) ⬇️
mixer/adapter/fluentd/fluentd.go 74% <0%> (-1%) ⬇️
mixer/adapter/stackdriver/trace/trace.go 89% <0%> (ø) ⬇️
pilot/pkg/networking/plugin/authz/util.go 98% <0%> (ø) ⬇️
pilot/pkg/networking/util/util.go 30% <0%> (ø) ⬇️
galley/pkg/runtime/resource/schema.go 100% <0%> (ø) ⬆️
... and 49 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d6a0aa7...7785cd1. Read the comment docs.

@ymesika
Member Author

ymesika commented Jun 13, 2018

@linsun @ldemailly @costinm

I will probably need to update a lot of golden testdata, so can you please let me know whether putting those in all components is OK, or whether they should only go in the "pure" control plane components.

Contributor

@costinm costinm left a comment


/lgtm
/approve

@ymesika
Member Author

ymesika commented Jun 13, 2018

The last commit reduces the requested CPU from 100m to 50m.
The reason is that the CircleCI test node has 2 CPUs, which means that at 100m only 20 containers with such a request could be scheduled.
Some tests actually run more containers than that, leaving some pods in the Pending state. 50m seems to be sufficient for the tests, but I wonder whether we should request something like 10m, just to enable the HPA with minimal side effects.

@ymesika
Member Author

ymesika commented Jun 13, 2018

@kyessenov What do you think?

@ymesika ymesika changed the title [WIP] Set default requests cpu resource to all deployments Set default requests cpu resource to all deployments Jun 14, 2018
@istio-testing istio-testing removed the do-not-merge/work-in-progress Block merging of a PR because it isn't ready yet. label Jun 14, 2018
@ymesika
Member Author

ymesika commented Jun 18, 2018

@frankbu What's your opinion re: #6229 (comment)
Thanks.

@ldemailly
Member

Shouldn't CircleCI pass a lower value explicitly instead? Also, how do we end up with 20 containers?

@ymesika
Member Author

ymesika commented Jun 18, 2018

@ldemailly In this test, for instance, with 100m there was one pod waiting for available resources:

NAME                                          READY     STATUS    RESTARTS   AGE
a-7646dd46c5-cvjzs                            2/2       Running   0          4m
echosrv-deployment-1-7b4d6b7c84-cmkj9         2/2       Running   0          5m
echosrv-deployment-2-6c789cc7d5-g6948         2/2       Running   0          5m
istio-citadel-56df7c9cf-9v2zc                 1/1       Running   0          6m
istio-egressgateway-764f45594f-xqnzq          1/1       Running   0          6m
istio-galley-59cf69b6b-sfvw8                  1/1       Running   0          6m
istio-ingressgateway-9bf96d645-2cpq8          1/1       Running   0          6m
istio-pilot-57f798757d-md7z7                  2/2       Running   0          6m
istio-policy-6c57449875-6xm4l                 2/2       Running   0          6m
istio-sidecar-injector-66cc9fb84d-v6dw7       1/1       Running   0          6m
istio-statsd-prom-bridge-6978574dc4-n9jrm     1/1       Running   0          6m
istio-telemetry-6888d65655-s6pvb              2/2       Running   0          6m
nginx-default-http-backend-6d5b847486-9r8cd   2/2       Running   4          4m
nginx-ingress-controller-7854b56c8-t9smw      0/2       Pending   0          4m
prometheus-54b7c766d6-knr2t                   1/1       Running   0          6m
raw-cli-deployement-5cd88fb45d-qgghs          1/1       Running   0          5m
t-6bfc4495d-jd4k2                             1/1       Running   0          4m

What's the downside of having a low requested CPU? Like 10m?

@ldemailly
Member

The downside is that if it's too low, the scheduler would cram too many pods onto the same node at first, only to have to move them when traffic starts and CPU usage goes up.

I guess it goes back to: test values and production (or even demo) defaults aren't the same.

@ymesika
Member Author

ymesika commented Jun 18, 2018

Which might not be too different from what we have today, when scheduling without any resources stated.

Fun facts - default requests CPU*:

GKE - 0m
IBM - 0m
Azure - 0m

* Based on a brief Google search

@ymesika ymesika changed the title Set default requests cpu resource to all deployments [WIP] Set default requests cpu resource to all deployments Jun 18, 2018
@istio-testing istio-testing added the do-not-merge/work-in-progress Block merging of a PR because it isn't ready yet. label Jun 18, 2018
@frankbu
Contributor

frankbu commented Jun 18, 2018

I'm wondering if there is a single default value that is good for all the control plane services. Aren't some much more CPU intensive than others? How do we test the HPA to see if it's working properly (optimally?) with any given service / CPU request value?

ymesika added 2 commits June 20, 2018 06:46
# Conflicts:
#	install/kubernetes/helm/istio/charts/egressgateway/templates/deployment.yaml
#	install/kubernetes/helm/istio/charts/gateways/templates/deployment.yaml
#	install/kubernetes/helm/istio/values.yaml
@ymesika
Member Author

ymesika commented Jun 20, 2018

This is just the requested CPU (effectively a minimum).
Assuming that most k8s providers default it to 0m, I don't think there is any harm in setting this value to 10m, just to enable the HPA.
We can argue about whether we should apply it to all Istio components or just a subset, i.e. the core control plane components, and not apply it to Kiali, Servicegraph, etc.

@ymesika ymesika changed the title [WIP] Set default requests cpu resource to all deployments Set default requests cpu resource to all deployments Jun 20, 2018
@istio-testing istio-testing removed the do-not-merge/work-in-progress Block merging of a PR because it isn't ready yet. label Jun 20, 2018
 requests:
-  cpu: 100m
-  memory: 128Mi
+  cpu: 10m
Member


Will it impact the performance of the proxy if the CPU is set to 10m?

Member Author


It shouldn't. It just notifies the scheduler that the proxy container needs at least 10m.
The 100m/128Mi were also arbitrarily set, for the same purpose of enabling HPA on pods that contain the proxy (in a nutshell: because all containers must declare at least the requested CPU, the proxy container used to block customers from enabling HPA on their injected pods).
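For background, the HPA computes CPU utilization as a percentage of the containers' CPU requests, so a container with no CPU request blocks CPU-based autoscaling for its whole pod. A minimal sketch (resource names are assumed) of the kind of HPA this change enables:

```yaml
# Hypothetical HPA for a deployment whose pods include the injected proxy;
# it only works once every container in the pod declares a CPU request.
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 1
  maxReplicas: 5
  targetCPUUtilizationPercentage: 80  # measured relative to the CPU requests
```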

Member


Ah, I see, it is a best effort pod.

@gyliu513
Member

/lgtm

@ymesika
Member Author

ymesika commented Jun 20, 2018

/hold

The golden injected testdata should be updated with the new proxy values. I will hold for now, to make sure all (or a majority) of us agree on this before updating all the golden testdata files.

@istio-testing istio-testing added the do-not-merge/hold Block automatic merging of a PR. label Jun 20, 2018
# Conflicts:
#	install/kubernetes/helm/istio/values.yaml
@ymesika
Member Author

ymesika commented Jun 28, 2018

The PR is ready to be merged and is on hold, waiting for the participants' agreement to add a requested CPU resource of 10m, which is slightly above the default of 0m (in most providers) but enables HPA.

/hold cancel

@istio-testing istio-testing added needs-rebase Indicates a PR needs to be rebased before being merged and removed do-not-merge/hold Block automatic merging of a PR. labels Jun 28, 2018
@istio-testing istio-testing removed the needs-rebase Indicates a PR needs to be rebased before being merged label Jun 29, 2018
# Conflicts:
#	install/kubernetes/helm/istio/values.yaml
@ldemailly
Member

sgtm but conflicts need to be resolved

@istio-testing
Collaborator

istio-testing commented Jun 30, 2018

@ymesika: The following test failed, say /retest to rerun them all:

Test name: prow/istio-pilot-e2e.sh | Commit: f4c8d5c | Rerun command: /test istio-pilot-e2e

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@rshriram
Member

rshriram commented Jul 2, 2018

/lgtm

@istio-testing
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: costinm, gyliu513, rshriram, ymesika

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@istio-testing istio-testing merged commit e5a09e9 into istio:master Jul 2, 2018
istio-testing pushed a commit that referenced this pull request Jul 7, 2018
* Introducing mcpc, a command-line mcp client tool. (#6715)

* Set default requests cpu resource to all deployments (#6229)

* Set default requests cpu resource to all deployments

* Description added

* Requested CPU decreased to meet test env node capacity

* Setting requested cpu to 10m as most k8s platforms are defaulting to 0m

* Merge master remains

* Apply the global resources to the new gateways

* Updated goldens to cpu 10m

* Clean up and rename the global default resources

* Redundant Redis servers were removed + defer in a loop is not recommended. (#6768)

* Fix upgrade. (#6792)

* Fix upgrade.

* Format

* Move back the fail close - not clear if we have time to get mixer client in this build, we can remove in next

* Add configuration to allow transformation of durations to integers. (#5416)

The fluentd handler currently transforms durations to their string
representation for log messages. This commit adds a configuration to
override this default behaviour and transform durations to integers of
unit millisecond.

* Add workload and service dashboard (#6789)

* Add workload and service dashboard

* fix typo

* Skip service dash e2e test for now

* Add rushed, missed workload dash

* Touch up

* Commenting out test until can replicate locally and fix

* simplify fmt script (#6798)

* simplify fmt script

- goimports is a superset of gofmt
- shouldn't need to remove spaces
- now utilizing -l flag to tell us if there is a diff

* always update gofmt to latest

* spacing that is now considered incorrect

* support hash based routing for soft session affinity (#6742)

* support sticky sessions

* fmt :(

* update servicediscovery fakes

* bump istio/api

* use updated api for ring hash

* support session affinity for sidecar

* update destination rule yaml to reflect new api

* ttl is a string

* unset sdsUdsPath in configmap (#6799)

* fix bug so that destination.service.** attributes are collected (#6801)

* remove unnecessary generated attributes finding. (#6785)

* modify docker template files for proxyv2 (#6790)

* Long-running testing improvements (#6800)

* Add values for config map settings, including access log.
More docs.

* Updates and improvements for the stress-testing configs.

* Add values for config map settings, including access log. (#6797)

* Add values for config map settings, including access log.
More docs.

* Updates and improvements for the stress-testing configs.

* Address review comments

* Merged wrong files

* Add the setup helm file - this change now depend on the previous PR.

* Sync with remote, remove accidentally added files.

* Another accidental file

* SNI routing from sidecar to gateway with virtual services (#6402)

* quick sni matching 1st pass with no refactoring of existing code

* use shriram's api sha

* quick pass at using tls block

* add some validation

* copyright

* fix lint + remove deadcode

* rename protocol tcp_tls -> tls

* update back to istio/api master

* remove accidentally added test file

* add tls block to gateway logic

* add todos

* basic sni wildcard implementation

* add tcp, fix problems with rbac, matching

* better tcp + tls validation

* address code review comments

* remove out of date comment

* update comments

* fix compile error

* use tcp proxy in tcp routing

* add tcp routing e2e test

* add forgotten vs config file + update description of test

* Comments, bug fixes

Signed-off-by: Shriram Rajagopalan <shriramr@vmware.com>

* cleanup gateway tcp test

Signed-off-by: Shriram Rajagopalan <shriramr@vmware.com>

* moving networking test yamls

Signed-off-by: Shriram Rajagopalan <shriramr@vmware.com>

* tcp/tls tests

Signed-off-by: Shriram Rajagopalan <shriramr@vmware.com>

* yaml fixes

Signed-off-by: Shriram Rajagopalan <shriramr@vmware.com>

* fix file switcheroo

* port matches

Signed-off-by: Shriram Rajagopalan <shriramr@vmware.com>

* fix authN plugin overwriting TLS context

Signed-off-by: Shriram Rajagopalan <shriramr@vmware.com>

* more tests - route via egress gateway

Signed-off-by: Shriram Rajagopalan <shriramr@vmware.com>

* yaml fixes

Signed-off-by: Shriram Rajagopalan <shriramr@vmware.com>

* initialize prom variables

Signed-off-by: Shriram Rajagopalan <shriramr@vmware.com>

* split tests

Signed-off-by: Shriram Rajagopalan <shriramr@vmware.com>

* final test fix hopefully

Signed-off-by: Shriram Rajagopalan <shriramr@vmware.com>

* revert gateway tweaks

Signed-off-by: Shriram Rajagopalan <shriramr@vmware.com>

* Add myself back to OWNERS, sort the list alphabetically. (#6830)

* Upload cloudfoundry test result to GCS (#6829)

* Fix typos in command-line output.
@ymesika ymesika deleted the defaultResourcesRequests branch July 9, 2018 06:05