fix: fail fast when unrecoverable discovery errors happens on checking optional CRDs #7872
Merged
zhaohuabing merged 20 commits into envoyproxy:main on Jan 15, 2026
Codecov Report
Patch status failed: the patch coverage (48.57%) is below the target coverage (60.00%). You can increase the patch coverage or adjust the target coverage.
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7872      +/-   ##
==========================================
- Coverage   72.80%   72.74%   -0.07%
==========================================
  Files         235      235
  Lines       35313    35380      +67
==========================================
+ Hits        25709    25736      +27
- Misses       7781     7806      +25
- Partials     1823     1838      +15
arkodg reviewed on Jan 7, 2026
zirain previously approved these changes on Jan 14, 2026
nareddyt approved these changes on Jan 14, 2026
zirain approved these changes on Jan 15, 2026
andreik-n2 pushed a commit to andreik-n2/gateway that referenced this pull request on Jan 15, 2026
…g optional CRDs (envoyproxy#7872)
* fail fast when unrecoverable discovery errors happens
* only retry transient errors
* fix potenial dead lock
* address comments
* minor wording
* create discovery client once
* fix lint
* address comments
* remove redundant logging
* add e2e test
* fix test
* fix test
Signed-off-by: Huabing (Robin) Zhao <zhaohuabing@gmail.com>
FYI, while testing #7964, it took more than 60s before the runner returned an error for a discovery failure.
zirain pushed a commit to zirain/gateway that referenced this pull request on Jan 26, 2026
…g optional CRDs (envoyproxy#7872)
Signed-off-by: Huabing (Robin) Zhao <zhaohuabing@gmail.com>
rudrakhp pushed a commit to rudrakhp/gateway that referenced this pull request on Jan 26, 2026
…g optional CRDs (envoyproxy#7872)
Signed-off-by: Huabing (Robin) Zhao <zhaohuabing@gmail.com>
Signed-off-by: Rudrakh Panigrahi <rudrakh97@gmail.com>
zirain added a commit that referenced this pull request on Jan 26, 2026
* fix: fail fast when unrecoverable discovery errors happens on checking optional CRDs (#7872)
  * fail fast when unrecoverable discovery errors happens
  * only retry transient errors
  * fix potenial dead lock
  * address comments
  * minor wording
  * create discovery client once
  * fix lint
  * address comments
  * remove redundant logging
  * add e2e test
  * fix test
  * fix test
* fix: extproc is discarded with failOpen is enabled for wasm (#7956)
  * fix: extproc is discarded with failOpen is enabled for wasm
  * add test
  * polish code
  * add test
* fix: sanitize control plane config dump (#7901)
  * mask secrets
  * address comments
* fix: server run race (#7964)
  * add test
  * fix race
  * fix lint
  * fix
  * fix
  * fix lint
  * use Semaphore instead of WaitGroup
  * comments
  * lint
  * fix
  * fix lint
  * callback
  * fix lint
  * run hook sequentially
  * fix lint
  * rename to cfgMux
* fix: wrong cluster type with mixed FQDN backend and service backend refs (#7994)
  * fix: wrong cluster type with mixed FQDN backend and service backend refs
  * fix mirror cluster endpoint type
  * simplify the test
  * update comment
* fix: merge route match rule with match all route (#8011)
* fix gen
* fix lint
* fix for golang 11.24
* fix lint
* fix watch CRD version
Signed-off-by: Huabing (Robin) Zhao <zhaohuabing@gmail.com>
Signed-off-by: zirain <zirain2009@gmail.com>
Co-authored-by: Huabing (Robin) Zhao <zhaohuabing@gmail.com>
rudrakhp added a commit that referenced this pull request on Jan 26, 2026
* fix: extproc is discarded with failOpen is enabled for wasm (#7956)
  * fix: extproc is discarded with failOpen is enabled for wasm
  * add test
  * polish code
  * add test
* fix: sanitize control plane config dump (#7901)
  * mask secrets
  * address comments
* fix: server run race (#7964)
  * add test
  * fix race
  * fix lint
  * fix
  * fix
  * fix lint
  * use Semaphore instead of WaitGroup
  * comments
  * lint
  * fix
  * fix lint
  * callback
  * fix lint
  * run hook sequentially
  * fix lint
  * rename to cfgMux
* fix: wrong cluster type with mixed FQDN backend and service backend refs (#7994)
  * fix: wrong cluster type with mixed FQDN backend and service backend refs
  * fix mirror cluster endpoint type
  * simplify the test
  * update comment
* fix: fail fast when unrecoverable discovery errors happens on checking optional CRDs (#7872)
  * fail fast when unrecoverable discovery errors happens
  * only retry transient errors
  * fix potenial dead lock
  * address comments
  * minor wording
  * create discovery client once
  * fix lint
  * address comments
  * remove redundant logging
  * add e2e test
  * fix test
  * fix test
* fix: merge route match rule with match all route (#8011)
* fix: do not set autoHTTPConfig when used mixed(HTTP + HTTPS) backends (#7950)
  * fix: do not set autoHTTPConfig when used mixed backend
  * release notes
  * fix
  * add e2e
* fix: backend tls default namespace (#7987)
* fix: race in gatewaapi runner (#8037)
  * add testcase
  * fix
  * simply
* [release/v1.6] v1.6.3 release notes (#8054)
* v1.6.3 version
* fix gen-check
* fix lint
Signed-off-by: Huabing (Robin) Zhao <zhaohuabing@gmail.com>
Signed-off-by: Rudrakh Panigrahi <rudrakh97@gmail.com>
Signed-off-by: zirain <zirain2009@gmail.com>
Co-authored-by: Huabing (Robin) Zhao <zhaohuabing@gmail.com>
Co-authored-by: zirain <zirain2009@gmail.com>
SadmiB pushed a commit to SadmiB/gateway that referenced this pull request on Jan 30, 2026
…g optional CRDs (envoyproxy#7872)
Signed-off-by: Huabing (Robin) Zhao <zhaohuabing@gmail.com>
Signed-off-by: Sadmi Bouhafs <sadmibouhafs@gmail.com>
What type of PR is this?
This PR adds retries to the controller when it fails to discover optional CRDs from the API server. If all retries fail, the error is propagated and causes the EG pod to restart. This prevents the EG pod from reconciling incomplete resources and serving partial xDS configuration to Envoy.
It also propagates runner startup errors to the server, so the Envoy Gateway process can exit and restart cleanly. Previously, runner startup failures were only logged, and Envoy Gateway continued running even with failed runners.
Fixes #7871
Release Notes: Yes