Improve config validation health checks #7605
Readiness and liveness probes were failing on some providers because GET requests were blocking for multiple seconds. Mitigate the issue by decreasing the frequency of GETs (to avoid possible throttling) and increasing the acceptable health check interval from 4s to 10s. Short-term fix for istio#7586. The longer-term fix requires switching to proper controller-style reconciliation. That work should be aligned with the sidecar injector.
Codecov Report
@@ Coverage Diff @@
## release-1.0 #7605 +/- ##
============================================
+ Coverage 72% 72% +1%
============================================
Files 358 358
Lines 30990 31091 +101
============================================
+ Hits 22123 22242 +119
+ Misses 7923 7901 -22
- Partials 944 948 +4
/lgtm Though I don't think I understand your comment about long term fix being reconciliation. Can you elaborate?
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: ayj, ozevren
We should use a SharedInformer instead of polling. I updated the original comment message to reflect this.
@ayj I would like to help with switching to shared informers. What's the process? Can I create an issue and start working on it? Can you point to which components you specifically meant?
* Improve config validation health checks (#7605)

Readiness and liveness probes were failing on some providers because GET requests were blocking for multiple seconds. Mitigate the issue by decreasing the frequency of GETs (to avoid possible throttling) and increasing the acceptable health check interval from 4s to 10s. Short-term fix for #7586. The longer-term fix requires switching to proper controller-style reconciliation. That work should be aligned with the sidecar injector.

* decouple validation webhook's health checking and reconciliation loop (#7986)

The validation webhook's health file updates and configuration reconciliation were invoked from the same goroutine. Delays in checking and updating the k8s configuration could result in the health file not being updated in time to pass the health checks. Use istio.io/istio/pkg/probe to decouple the two code paths.
Readiness and liveness probes were failing on some providers because
GET requests were blocking for multiple seconds. Mitigate the issue
by decreasing the frequency of GETs (to avoid possible throttling) and
increasing the acceptable health check interval from 4s to 10s.
Short term fix for #7586. Longer
term fix requires switching to use SharedInformer.