
fix: add mutex lock to snapshot cache OnStreamResponse/OnStreamDeltaResponse#8277

Merged
jukie merged 3 commits into envoyproxy:main from jaffarkeikei:fix/snapshot-cache-data-race
Feb 19, 2026

Conversation

@jaffarkeikei

OnStreamResponse and OnStreamDeltaResponse read from s.streamIDNodeInfo without the mutex, causing a data race with concurrent writes from OnStreamOpen/OnStreamClosed/etc. This can crash the control plane under concurrent xDS traffic.

Adds s.mu.Lock()/s.mu.Unlock() to both functions.

Fixes #8276
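
The locking pattern described above can be sketched as follows. This is a minimal illustration, not the actual Envoy Gateway code: the `streamCache` and `nodeInfo` types, the simplified method signatures, and the `nodeID` field are all hypothetical stand-ins. Only the names `mu` and `streamIDNodeInfo` come from the PR description, and the sketch assumes `mu` is a plain `sync.Mutex`.

```go
package main

import (
	"fmt"
	"sync"
)

// nodeInfo is a hypothetical stand-in for the per-stream node metadata.
type nodeInfo struct {
	nodeID string
}

// streamCache is a simplified, hypothetical model of the snapshot cache:
// a single mutex guarding a map keyed by stream ID.
type streamCache struct {
	mu               sync.Mutex
	streamIDNodeInfo map[int64]*nodeInfo
}

func newStreamCache() *streamCache {
	return &streamCache{streamIDNodeInfo: make(map[int64]*nodeInfo)}
}

// OnStreamOpen writes to the map while holding the lock.
func (s *streamCache) OnStreamOpen(streamID int64, nodeID string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.streamIDNodeInfo[streamID] = &nodeInfo{nodeID: nodeID}
}

// OnStreamClosed deletes from the map while holding the lock.
func (s *streamCache) OnStreamClosed(streamID int64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.streamIDNodeInfo, streamID)
}

// OnStreamResponse only reads, but it still takes the lock: Go maps are
// not safe for a read that runs concurrently with a write.
func (s *streamCache) OnStreamResponse(streamID int64) (string, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	info, ok := s.streamIDNodeInfo[streamID]
	if !ok {
		return "", false
	}
	return info.nodeID, true
}

func main() {
	c := newStreamCache()
	c.OnStreamOpen(1, "node-a")
	id, ok := c.OnStreamResponse(1)
	fmt.Println(id, ok)
	c.OnStreamClosed(1)
	_, ok = c.OnStreamResponse(1)
	fmt.Println(ok)
}
```

The point of the sketch is that the read path acquires the same mutex as the write paths, so a lookup can never overlap with an insert or delete on the same map.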

…esponse

OnStreamResponse and OnStreamDeltaResponse read from s.streamIDNodeInfo
without holding the mutex, while other callbacks write to this map under
the lock. This is a data race that can crash the control plane with
"fatal error: concurrent map read and map write".

Fixes envoyproxy#8276

Signed-off-by: jaffar <keikei.jaffar@mail.utoronto.ca>
@jaffarkeikei jaffarkeikei requested a review from a team as a code owner February 16, 2026 03:53

codecov bot commented Feb 16, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.66%. Comparing base (fa3ff1d) to head (6ca2543).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8277      +/-   ##
==========================================
+ Coverage   73.58%   73.66%   +0.07%     
==========================================
  Files         242      242              
  Lines       37003    37007       +4     
==========================================
+ Hits        27228    27260      +32     
+ Misses       7856     7826      -30     
- Partials     1919     1921       +2     


func (s *snapshotCache) OnStreamResponse(_ context.Context, streamID int64, _ *discoveryv3.DiscoveryRequest, _ *discoveryv3.DiscoveryResponse) {
// No mutex lock required here because no writing to the cache.
s.mu.Lock()

Could you elaborate more on why this is necessary?

@jaffarkeikei (Contributor Author), Feb 16, 2026

OnStreamOpen, OnStreamClosed, OnStreamRequest, OnDeltaStreamOpen, and OnDeltaStreamClosed all write to s.streamIDNodeInfo while holding s.mu. But OnStreamResponse and OnStreamDeltaResponse were reading from the same map without any lock.


Thanks!

Adds tests that exercise OnStreamResponse and OnStreamDeltaResponse
concurrently with OnStreamOpen/OnDeltaStreamOpen to verify the mutex
fix prevents data races on the streamIDNodeInfo map.

Signed-off-by: jaffar <keikei.jaffar@mail.utoronto.ca>
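
A concurrent exercise of this kind might look roughly like the sketch below. It is not the test added in this PR: the `cache` type, its `open` and `response` methods, and the `exercise` helper are hypothetical and only model a mutex-guarded map. Running it with Go's race detector (`go run -race`, or `go test -race` for a real test) is what would flag an unguarded read.

```go
package main

import (
	"fmt"
	"sync"
)

// cache is a hypothetical, simplified model of a mutex-guarded map
// keyed by stream ID.
type cache struct {
	mu    sync.Mutex
	nodes map[int64]string
}

func newCache() *cache {
	return &cache{nodes: make(map[int64]string)}
}

// open models a writer callback such as OnStreamOpen.
func (c *cache) open(id int64, node string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.nodes[id] = node
}

// response models a reader callback such as OnStreamResponse.
func (c *cache) response(id int64) (string, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	node, ok := c.nodes[id]
	return node, ok
}

// exercise starts one writer and one reader goroutine per stream and
// returns how many streams are in the map once all of them finish.
func exercise(c *cache, streams int) int {
	var wg sync.WaitGroup
	for i := 0; i < streams; i++ {
		id := int64(i)
		wg.Add(2)
		go func() {
			defer wg.Done()
			c.open(id, "node")
		}()
		go func() {
			defer wg.Done()
			c.response(id) // may or may not find the entry; both are fine
		}()
	}
	wg.Wait()
	c.mu.Lock()
	defer c.mu.Unlock()
	return len(c.nodes)
}

func main() {
	fmt.Println("streams tracked:", exercise(newCache(), 100))
}
```

If the lock were dropped from `response`, the race detector would be expected to report a data race here, and without the detector the runtime could abort with the "concurrent map read and map write" fatal error quoted in the first commit message.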
@jaffarkeikei jaffarkeikei force-pushed the fix/snapshot-cache-data-race branch from 6290ce9 to 3797393 Compare February 16, 2026 07:05
@jukie jukie requested a review from a team February 19, 2026 06:25

jukie commented Feb 19, 2026

/retest

@jukie jukie merged commit 0423613 into envoyproxy:main Feb 19, 2026
76 of 82 checks passed
Inode1 pushed a commit to Inode1/gateway that referenced this pull request Feb 23, 2026
…esponse (envoyproxy#8277)

* fix: add mutex lock to snapshot cache OnStreamResponse/OnStreamDeltaResponse

OnStreamResponse and OnStreamDeltaResponse read from s.streamIDNodeInfo
without holding the mutex, while other callbacks write to this map under
the lock. This is a data race that can crash the control plane with
"fatal error: concurrent map read and map write".

Fixes envoyproxy#8276

Signed-off-by: jaffar <keikei.jaffar@mail.utoronto.ca>

* test: add concurrent access tests for snapshot cache callbacks

Adds tests that exercise OnStreamResponse and OnStreamDeltaResponse
concurrently with OnStreamOpen/OnDeltaStreamOpen to verify the mutex
fix prevents data races on the streamIDNodeInfo map.

Signed-off-by: jaffar <keikei.jaffar@mail.utoronto.ca>

---------

Signed-off-by: jaffar <keikei.jaffar@mail.utoronto.ca>
Co-authored-by: jaffar <keikei.jaffar@mail.utoronto.ca>
Co-authored-by: Isaac Wilson <isaac.wilson514@gmail.com>
antonio-mazzini pushed a commit to antonio-mazzini/gateway that referenced this pull request Mar 5, 2026