Skip to content

fix readiness probe regression#18679

Merged
istio-testing merged 2 commits intoistio:masterfrom
ramaraochavali:fix/readiness_reg
Nov 6, 2019
Merged

fix readiness probe regression#18679
istio-testing merged 2 commits intoistio:masterfrom
ramaraochavali:fix/readiness_reg

Conversation

@ramaraochavali
Copy link
Copy Markdown
Contributor

@ramaraochavali ramaraochavali commented Nov 6, 2019

Fixes #18665

Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
@ramaraochavali ramaraochavali requested a review from a team as a code owner November 6, 2019 03:45
@googlebot googlebot added the cla: yes Set by the Google CLA bot to indicate the author of a PR has signed the Google CLA. label Nov 6, 2019
@istio-testing istio-testing added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Nov 6, 2019
if p.lastKnownState.versionStats.CDSVersion > 0 && p.lastKnownState.versionStats.LDSVersion > 0 {
// Envoy seems to not updatig cds version (need to confirm) when partial rejection happens.
// Till that is fixed, we should treat LDS success as readiness success as LDS happens last.
if p.lastKnownState.versionStats.LDSVersion > 0 {
Copy link
Copy Markdown
Contributor Author

@ramaraochavali ramaraochavali Nov 6, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@howardjohn for partial cluster rejections (that is what is happening in tests), Envoy seems to be not updating this (need to verify though), that is why we are seeing cds as Not Received and lds as Received. For now, I changed it like this - I will verify in Envoy and follow-up

Copy link
Copy Markdown
Member

@howardjohn howardjohn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this fully solves the regression, we are seeing constant timeouts on the readiness probe..

Will approve since this fixes it a bit, but if we can't properly fix we should revert to old behavior most likely?

@ramaraochavali
Copy link
Copy Markdown
Contributor Author

Will approve since this fixes it a bit, but if we can't properly fix we should revert to old behavior most likely?

Sure let us see how this goes. But timeout is very surprising to me. Let us watch and if it continues I can revert even though it is not fully functionally correct.

Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
@istio-testing istio-testing merged commit 37edd86 into istio:master Nov 6, 2019
@ramaraochavali ramaraochavali deleted the fix/readiness_reg branch November 6, 2019 05:40
@ramaraochavali
Copy link
Copy Markdown
Contributor Author

@howardjohn If this continues to have issues, I have prepared a PR #18685 to revert back to update status stats and cache them. - Please merge that

sdake pushed a commit to sdake/istio that referenced this pull request Dec 1, 2019
* fix readiness probe regression

Signed-off-by: Rama Chavali <rama.rao@salesforce.com>

* fix unit test

Signed-off-by: Rama Chavali <rama.rao@salesforce.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/networking cla: yes Set by the Google CLA bot to indicate the author of a PR has signed the Google CLA. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Regression in proxy readiness probe, leading to pods failing to start up

5 participants