Skip to content

Improve config validation health checks#7605

Merged
rshriram merged 1 commit intoistio:release-1.0from
ayj:release-1.0-improve-config-validation-health-checks
Aug 14, 2018
Merged

Improve config validation health checks#7605
rshriram merged 1 commit intoistio:release-1.0from
ayj:release-1.0-improve-config-validation-health-checks

Conversation

@ayj
Copy link
Copy Markdown
Contributor

@ayj ayj commented Aug 2, 2018

Readiness and liveliness probes were failing on some providers because
GET requests were blocking for multiple seconds. Mitigate the issue
by decreasing the frequency of GET (to avoid possible throttling) and
increase the acceptable health check interval from 4s to 10s.

Short term fix for #7586. Longer
term fix requires switching to use SharedInformer.

Readiness and liveliness probes were failing on some providers because
GET requests were blocking for multiple seconds. Mitigate the issue
by decreasing the frequency of GET (to avoid possible throttling) and
increase the acceptable health check interval from 4s to 10s.

Shorm term fix for istio#7586. Longer
term fix requires switching to proper controller-style
reconciliation. That work should be aligned with the sidecar injector.
@ayj ayj requested review from ostromart and ozevren August 2, 2018 19:26
@ayj ayj removed the request for review from geeknoid August 2, 2018 19:29
@codecov
Copy link
Copy Markdown

codecov bot commented Aug 2, 2018

Codecov Report

Merging #7605 into release-1.0 will increase coverage by 1%.
The diff coverage is 100%.

Impacted file tree graph

@@             Coverage Diff              @@
##           release-1.0   #7605    +/-   ##
============================================
+ Coverage           72%     72%    +1%     
============================================
  Files              358     358            
  Lines            30990   31091   +101     
============================================
+ Hits             22123   22242   +119     
+ Misses            7923    7901    -22     
- Partials           944     948     +4
Impacted Files Coverage Δ
galley/pkg/crd/validation/webhook.go 79% <100%> (-1%) ⬇️
mixer/adapter/solarwinds/log_handler.go 58% <0%> (-22%) ⬇️
mixer/adapter/cloudwatch/cloudwatch.go 58% <0%> (-14%) ⬇️
mixer/adapter/solarwinds/metrics_handler.go 70% <0%> (-13%) ⬇️
pilot/pkg/config/memory/monitor.go 82% <0%> (-9%) ⬇️
pilot/pkg/model/trace.go 16% <0%> (-3%) ⬇️
galley/pkg/kube/listener.go 93% <0%> (-2%) ⬇️
mixer/adapter/signalfx/metrics.go 42% <0%> (-2%) ⬇️
mixer/adapter/prometheus/server.go 95% <0%> (-1%) ⬇️
... and 17 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fcf7750...68bff06. Read the comment docs.

@ozevren
Copy link
Copy Markdown
Contributor

ozevren commented Aug 2, 2018

@irisdingbj

@ozevren
Copy link
Copy Markdown
Contributor

ozevren commented Aug 2, 2018

/lgtm

Though I don't think I understand your comment about long term fix being reconciliation. Can you elaborate?

@istio-testing
Copy link
Copy Markdown
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ayj, ozevren

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ayj
Copy link
Copy Markdown
Contributor Author

ayj commented Aug 2, 2018

We should use a SharedInformer instead of polling. I updated the original comment message to reflect this.

@ayj ayj added this to the 1.0 milestone Aug 3, 2018
@krmayankk
Copy link
Copy Markdown

@ayj i would like to help with switching to shared informers, whats the process ? can i create an issue and start working on it. Can you point to which components you specifically meant ?

@rshriram rshriram merged commit d3fb10a into istio:release-1.0 Aug 14, 2018
ayj added a commit to ayj/istio that referenced this pull request Aug 21, 2018
Readiness and liveliness probes were failing on some providers because
GET requests were blocking for multiple seconds. Mitigate the issue
by decreasing the frequency of GET (to avoid possible throttling) and
increase the acceptable health check interval from 4s to 10s.

Shorm term fix for istio#7586. Longer
term fix requires switching to proper controller-style
reconciliation. That work should be aligned with the sidecar injector.
rshriram pushed a commit that referenced this pull request Aug 22, 2018
* Improve config validation health checks (#7605)

Readiness and liveliness probes were failing on some providers because
GET requests were blocking for multiple seconds. Mitigate the issue
by decreasing the frequency of GET (to avoid possible throttling) and
increase the acceptable health check interval from 4s to 10s.

Shorm term fix for #7586. Longer
term fix requires switching to proper controller-style
reconciliation. That work should be aligned with the sidecar injector.

* decouple validation webhook's health checking and reconciliation loop (#7986)

The validation webhook's health file updates and configuration
reconciliation were invoked from the same goroutine. Delays in
checking and updating the k8s configuration could result in the health
file not being updated in time to pass the health checks. Use
istio.io/istio/pkg/probe to decouple the two code paths.
@ayj ayj deleted the release-1.0-improve-config-validation-health-checks branch October 17, 2018 16:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants