
fix: race in gatewaapi runner #8037

Merged
zirain merged 3 commits into envoyproxy:main from zirain:runner/data-race
Jan 26, 2026

zirain (Member) commented Jan 24, 2026

fixes: #8035

zirain requested a review from a team as a code owner January 24, 2026 12:48
zirain changed the title from "fix: fix race in gatewaapi runner" to "fix: race in gatewaapi runner" Jan 24, 2026
netlify bot commented Jan 24, 2026

Deploy Preview for cerulean-figolla-1f9435 canceled.

🔨 Latest commit: 2d84080
🔍 Latest deploy log: https://app.netlify.com/projects/cerulean-figolla-1f9435/deploys/6976fe1b35703f00085cca3b

zirain (Member, Author) commented Jan 24, 2026

codecov bot commented Jan 24, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.75%. Comparing base (424d039) to head (2d84080).
⚠️ Report is 4 commits behind head on main.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main    #8037      +/-   ##
==========================================
+ Coverage   73.70%   73.75%   +0.04%
==========================================
  Files         237      237
  Lines       35703    35709       +6
==========================================
+ Hits        26316    26338      +22
+ Misses       7529     7515      -14
+ Partials     1858     1856       -2
```


```go
r.Logger = r.Logger.WithName(r.Name()).WithValues("runner", r.Name())

go r.startWasmCache(ctx)
r.done.Add(2)
```
Contributor commented:

Can you add a comment here outlining why 2 is needed?

arkodg previously approved these changes Jan 24, 2026
arkodg requested review from a team January 24, 2026 21:50
zirain requested a review from arkodg January 25, 2026 04:06
zirain force-pushed the runner/data-race branch 3 times, most recently from 0314fcc to 9535661, January 25, 2026 06:36
```go
keyCache *KeyCache

// Goroutine synchronization
done sync.WaitGroup
```
Member commented:

nit: name it as wg? done.Done() was confusing at first glance.

```go
go r.startWasmCache(ctx)
// Add 2 to the WaitGroup: one for the WASM cache server goroutine and one for the
// subscribeAndTranslate goroutine that handles resource translation
r.done.Add(2)
```
rudrakhp (Member) commented Jan 25, 2026:

Should we Add(2) right away, or Add(1) just before starting each goroutine? We wouldn't want to wait on goroutines that might not have started for some reason. It would also give any new goroutine added here later the same pattern to follow.

```go
// Increment by 1 specifically for the WASM cache
r.done.Add(1)
go func() {
	defer r.done.Done()
	r.startWasmCache(ctx)
}()

// Subscribe is called before the next Add, so if it fails, the WaitGroup
// won't be stuck waiting for a goroutine that never started.
c := r.ProviderResources.GatewayAPIResources.Subscribe(ctx)

// Increment by 1 specifically for the translation handler
r.done.Add(1)
go func() {
	defer r.done.Done()
	r.subscribeAndTranslate(c)
}()
```

zirain (Member, Author) commented:

I have no strong opinion on this.

Contributor commented:

+1 to keeping the Add close to the go func()

zirain (Member, Author) commented:

We could simplify it with r.done.Go().

```go
// t.Output() while goroutines are still active.
//
// Run with: go test -race -run TestRunnerGoroutineRace -count=100 ./internal/cmd/
func TestRunnerGoroutineRace(t *testing.T) {
```
Member commented:

I see another test in runner_race_test; do we need this one as well? Which one would detect a race if someone spawns another goroutine without adding it to the WaitGroup?

zirain (Member, Author) commented:

This is in case there is another, still unknown race; to be honest, it's hard to reproduce here.

zirain (Member, Author) commented Jan 26, 2026

/retest

zirain requested a review from rudrakhp January 26, 2026 03:31
Signed-off-by: zirain <zirain2009@gmail.com>
Signed-off-by: zirain <zirain2009@gmail.com>
Signed-off-by: zirain <zirain2009@gmail.com>
zirain merged commit 1f9c321 into envoyproxy:main Jan 26, 2026
57 of 59 checks passed
zirain deleted the runner/data-race branch January 26, 2026 07:31
zirain added a commit to zirain/gateway that referenced this pull request Jan 26, 2026
* add testcase

Signed-off-by: zirain <zirain2009@gmail.com>

* fix

Signed-off-by: zirain <zirain2009@gmail.com>

* simply

Signed-off-by: zirain <zirain2009@gmail.com>

---------

Signed-off-by: zirain <zirain2009@gmail.com>
rudrakhp pushed a commit to rudrakhp/gateway that referenced this pull request Jan 26, 2026
* add testcase

Signed-off-by: zirain <zirain2009@gmail.com>

* fix

Signed-off-by: zirain <zirain2009@gmail.com>

* simply

Signed-off-by: zirain <zirain2009@gmail.com>

---------

Signed-off-by: zirain <zirain2009@gmail.com>
Signed-off-by: Rudrakh Panigrahi <rudrakh97@gmail.com>
rudrakhp added a commit that referenced this pull request Jan 26, 2026
* fix: extproc is discarded with failOpen is enabled for wasm (#7956)

* fix: extproc is discarded with failOpen is enabled for wasm

Signed-off-by: Huabing Zhao <zhaohuabing@gmail.com>

* add test

Signed-off-by: Huabing (Robin) Zhao <zhaohuabing@gmail.com>

* polish code

Signed-off-by: Huabing (Robin) Zhao <zhaohuabing@gmail.com>

* add test

Signed-off-by: Huabing (Robin) Zhao <zhaohuabing@gmail.com>

---------

Signed-off-by: Huabing (Robin) Zhao <zhaohuabing@gmail.com>
Signed-off-by: Rudrakh Panigrahi <rudrakh97@gmail.com>

* fix: sanitize control plane config dump (#7901)

* mask secrets

Signed-off-by: Huabing Zhao <zhaohuabing@gmail.com>

* address comments

Signed-off-by: Huabing (Robin) Zhao <zhaohuabing@gmail.com>

---------

Signed-off-by: Huabing (Robin) Zhao <zhaohuabing@gmail.com>
Signed-off-by: Rudrakh Panigrahi <rudrakh97@gmail.com>

* fix: server run race (#7964)

* add test

Signed-off-by: zirain <zirain2009@gmail.com>

* fix race

Signed-off-by: zirain <zirain2009@gmail.com>

* fix lint

Signed-off-by: zirain <zirain2009@gmail.com>

* fix

Signed-off-by: zirain <zirain2009@gmail.com>

* fix

Signed-off-by: zirain <zirain2009@gmail.com>

* fix lint

Signed-off-by: zirain <zirain2009@gmail.com>

* use Semaphore instead of WaitGroup

Signed-off-by: zirain <zirain2009@gmail.com>

* comments

Signed-off-by: zirain <zirain2009@gmail.com>

* lint

Signed-off-by: zirain <zirain2009@gmail.com>

* fix

Signed-off-by: zirain <zirain2009@gmail.com>

* fix lint

Signed-off-by: zirain <zirain2009@gmail.com>

* callback

Signed-off-by: zirain <zirain2009@gmail.com>

* fix lint

Signed-off-by: zirain <zirain2009@gmail.com>

* run hook sequentially

Signed-off-by: zirain <zirain2009@gmail.com>

* fix lint

Signed-off-by: zirain <zirain2009@gmail.com>

* rename to cfgMux

Signed-off-by: zirain <zirain2009@gmail.com>

---------

Signed-off-by: zirain <zirain2009@gmail.com>
Signed-off-by: Rudrakh Panigrahi <rudrakh97@gmail.com>

* fix: wrong cluster type with mixed FQDN backend and service backend refs (#7994)

* fix: wrong cluster type with mixed FQDN backend and service backend refs

Signed-off-by: Huabing (Robin) Zhao <zhaohuabing@gmail.com>

* fix mirror cluster endpoint type

Signed-off-by: Huabing (Robin) Zhao <zhaohuabing@gmail.com>

* simplify the test

Signed-off-by: Huabing (Robin) Zhao <zhaohuabing@gmail.com>

* update comment

Signed-off-by: Huabing (Robin) Zhao <zhaohuabing@gmail.com>

---------

Signed-off-by: Huabing (Robin) Zhao <zhaohuabing@gmail.com>
Signed-off-by: Rudrakh Panigrahi <rudrakh97@gmail.com>

* fix: fail fast when unrecoverable discovery errors happens on checking optional CRDs (#7872)

* fail fast when unrecoverable discovery errors happens

Signed-off-by: Huabing Zhao <zhaohuabing@gmail.com>

* only retry transient errors

Signed-off-by: Huabing Zhao <zhaohuabing@gmail.com>

* fix potenial dead lock

Signed-off-by: Huabing Zhao <zhaohuabing@gmail.com>

* address comments

Signed-off-by: Huabing Zhao <zhaohuabing@gmail.com>

* minor wording

Signed-off-by: Huabing Zhao <zhaohuabing@gmail.com>

* create discovery client once

Signed-off-by: Huabing Zhao <zhaohuabing@gmail.com>

* fix lint

Signed-off-by: Huabing Zhao <zhaohuabing@gmail.com>

* address comments

Signed-off-by: Huabing Zhao <zhaohuabing@gmail.com>

* remove redundant logging

Signed-off-by: Huabing Zhao <zhaohuabing@gmail.com>

* add e2e test

Signed-off-by: Huabing Zhao <zhaohuabing@gmail.com>

* fix test

Signed-off-by: Huabing(Robin) Zhao <zhaohuabing@gmail.com>

* fix test

Signed-off-by: Huabing(Robin) Zhao <zhaohuabing@gmail.com>

---------

Signed-off-by: Huabing (Robin) Zhao <zhaohuabing@gmail.com>
Signed-off-by: Rudrakh Panigrahi <rudrakh97@gmail.com>

* fix: merge route match rule with match all route (#8011)

Signed-off-by: zirain <zirain2009@gmail.com>
Signed-off-by: Rudrakh Panigrahi <rudrakh97@gmail.com>

* fix: do not set autoHTTPConfig when used mixed(HTTP + HTTPS) backends (#7950)

* fix: do not set autoHTTPConfig when used mixed backend

Signed-off-by: zirain <zirain2009@gmail.com>

* release notes

Signed-off-by: zirain <zirain2009@gmail.com>

* fix

Signed-off-by: zirain <zirain2009@gmail.com>

* add e2e

Signed-off-by: zirain <zirain2009@gmail.com>

---------

Signed-off-by: zirain <zirain2009@gmail.com>
Signed-off-by: Rudrakh Panigrahi <rudrakh97@gmail.com>

* fix: backend tls default namespace (#7987)

Signed-off-by: Huabing (Robin) Zhao <zhaohuabing@gmail.com>
Signed-off-by: Rudrakh Panigrahi <rudrakh97@gmail.com>

* fix: race in gatewaapi runner (#8037)

* add testcase

Signed-off-by: zirain <zirain2009@gmail.com>

* fix

Signed-off-by: zirain <zirain2009@gmail.com>

* simply

Signed-off-by: zirain <zirain2009@gmail.com>

---------

Signed-off-by: zirain <zirain2009@gmail.com>
Signed-off-by: Rudrakh Panigrahi <rudrakh97@gmail.com>

* [release/v1.6] v1.6.3 release notes (#8054)

Signed-off-by: Rudrakh Panigrahi <rudrakh97@gmail.com>

* v1.6.3 version

Signed-off-by: Rudrakh Panigrahi <rudrakh97@gmail.com>

* fix gen-check

Signed-off-by: Rudrakh Panigrahi <rudrakh97@gmail.com>

* fix lint

Signed-off-by: Rudrakh Panigrahi <rudrakh97@gmail.com>

---------

Signed-off-by: Huabing (Robin) Zhao <zhaohuabing@gmail.com>
Signed-off-by: Rudrakh Panigrahi <rudrakh97@gmail.com>
Signed-off-by: zirain <zirain2009@gmail.com>
Co-authored-by: Huabing (Robin) Zhao <zhaohuabing@gmail.com>
Co-authored-by: zirain <zirain2009@gmail.com>
SadmiB pushed a commit to SadmiB/gateway that referenced this pull request Jan 30, 2026
* add testcase

Signed-off-by: zirain <zirain2009@gmail.com>

* fix

Signed-off-by: zirain <zirain2009@gmail.com>

* simply

Signed-off-by: zirain <zirain2009@gmail.com>

---------

Signed-off-by: zirain <zirain2009@gmail.com>
Signed-off-by: Sadmi Bouhafs <sadmibouhafs@gmail.com>
Development

Successfully merging this pull request may close these issues.

Data race while running tests that fixes with retry

3 participants