Skip to content

fix: new subscribe pattern for snapshots to prevent race conditions#6412

Closed
rudrakhp wants to merge 1 commit intoenvoyproxy:mainfrom
rudrakhp:fix_subscribe_close_race
Closed

fix: new subscribe pattern for snapshots to prevent race conditions#6412
rudrakhp wants to merge 1 commit intoenvoyproxy:mainfrom
rudrakhp:fix_subscribe_close_race

Conversation

@rudrakhp
Copy link
Copy Markdown
Member

@rudrakhp rudrakhp commented Jun 26, 2025

What type of PR is this?
fix: implement new subscribe pattern to prevent race condition with close

What this PR does / why we need it:
Propose a new pre-subscribe pattern for watchable Map snapshots to ensure that all subscriptions occur exclusively within the main routine. This guarantees that subscriptions are initialized in a single, centralized location and can be cleanly closed within the same context.

  • With this pattern no usage of watchable.Map or its Subscribe() calls should exist outside message/presubscribe.go, enforcing strict control over subscription points. This approach makes it easier to reason about subscription ownership and cleanup. Added a linter check for this.
  • Note that the number of subscriptions to pre-subscribe to are decided in the resource constructors defined in message/types.go. Update the count used for initializing each resource if more subscriptions are required after a code change.

Which issue(s) this PR fixes:

Related #5505

Release Notes: Yes

@codecov
Copy link
Copy Markdown

codecov bot commented Jun 26, 2025

Codecov Report

Attention: Patch coverage is 72.64957% with 32 lines in your changes missing coverage. Please review.

Project coverage is 70.83%. Comparing base (64f3576) to head (df5a5e9).
Report is 35 commits behind head on main.

Files with missing lines Patch % Lines
internal/gatewayapi/runner/runner.go 21.05% 15 Missing ⚠️
internal/cmd/server.go 0.00% 4 Missing ⚠️
internal/infrastructure/runner/runner.go 0.00% 4 Missing ⚠️
internal/xds/server/runner/runner.go 0.00% 4 Missing ⚠️
internal/message/presubscribe.go 93.33% 1 Missing and 1 partial ⚠️
internal/provider/kubernetes/controller.go 33.33% 2 Missing ⚠️
internal/provider/kubernetes/status.go 93.33% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6412      +/-   ##
==========================================
+ Coverage   70.75%   70.83%   +0.08%     
==========================================
  Files         220      221       +1     
  Lines       37757    37815      +58     
==========================================
+ Hits        26715    26788      +73     
+ Misses       9482     9467      -15     
  Partials     1560     1560              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@rudrakhp rudrakhp marked this pull request as ready for review June 26, 2025 20:34
@rudrakhp rudrakhp requested a review from a team as a code owner June 26, 2025 20:34
@rudrakhp rudrakhp changed the title fix: implement new subscribe pattern to prevent race condition with close fix: new subscribe pattern for snapshots to prevent race conditions Jun 27, 2025
@rudrakhp rudrakhp requested a review from arkodg June 27, 2025 05:42
@rudrakhp rudrakhp added this to the v1.5.0-rc.1 Release milestone Jun 27, 2025
@rudrakhp
Copy link
Copy Markdown
Member Author

rudrakhp commented Jul 1, 2025

/retest

@mathetake
Copy link
Copy Markdown
Member

sorry i got caught up in lots of other things atm... hopefully i can give a quick review this week but cannot guarantee (Friday is also a us holiday)

Copy link
Copy Markdown
Member

@mathetake mathetake left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am struggling to understand why do you want to have a slice and keep having the reference to it.

No Subscribe() calls should exist outside subscription.go, enforcing strict control over subscription points.

At the end of the day, this is only thing we want to achieve right? Why do you want the list to keep track the channels as well as I am not sure how the fixed size on the slice helps.

@mathetake
Copy link
Copy Markdown
Member

my comments are mostly the question rather than blocking -- feel free to ignore them. I would appreciate you taking a shot at this bug anyways, thank you @rudrakhp !

@rudrakhp
Copy link
Copy Markdown
Member Author

rudrakhp commented Jul 8, 2025

@mathetake Thanks for the comments 🙌
One major comment I ack is that the new structs/methods must be private to restrict subscription points outside. I am working on that. This was a draft proposal for getting some initial buy in on this approach.

Why do you want the list to keep track the channels as well as I am not sure how the fixed size on the slice helps.

We will need to fix the size of the list and know how many subscriptions we need to pre-subscribe. Dynamically increasing the list would basically mean not pre-subscribing that we were doing already today. I don't see major advantages of subscribing to snapshots in runtime as we should know required subscriptions as part of our code change/compile time.

maybe put code comment on why this code path is not unexpected for the future reference?

Like you correctly mentioned we should never hit this path if we have correctly identified the required number of subscriptions in the controllers/runners.
Do let me know what you think ...

Copy link
Copy Markdown
Member

@mathetake mathetake left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think what you explained made sense to me

Copy link
Copy Markdown
Member

@mathetake mathetake left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool!

@zirain
Copy link
Copy Markdown
Member

zirain commented Jul 11, 2025

/retest

Copy link
Copy Markdown
Member

@zirain zirain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lg, defer to @arkodg

Signed-off-by: Rudrakh Panigrahi <rudrakh97@gmail.com>
@arkodg
Copy link
Copy Markdown
Contributor

arkodg commented Jul 11, 2025

hey @rudrakhp, if the goal is to move subscription to the main goroutine, can we instead move the logic into runner creation / New

runners := []struct {

from runner.Start

@rudrakhp
Copy link
Copy Markdown
Member Author

Simpler fix in #6566

@rudrakhp rudrakhp closed this Jul 22, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants