Skip to content

fix(pool): prevent FIFO ordering violation in notifyWaiters by tracking state locally#3777

Merged
ofekshenawa merged 3 commits into
redis:masterfrom
0x48core:fix/conn-state-fifo-ordering
Apr 15, 2026
Merged

fix(pool): prevent FIFO ordering violation in notifyWaiters by tracking state locally#3777
ofekshenawa merged 3 commits into
redis:masterfrom
0x48core:fix/conn-state-fifo-ordering

Conversation

@0x48core

@0x48core 0x48core commented Apr 15, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Fix a race condition in ConnStateMachine.notifyWaiters() that breaks FIFO execution ordering of queued waiters

Problem

  • notifyWaiters() re-reads the atomic state (sm.GetState()) on every iteration of its outer loop. When a woken goroutine runs fast enough to call Transition(StateIdle) before the loop iterates again, the function sees the new state and wakes the next waiter within the same mutex hold. This cascades — potentially notifying all queued waiters in a single call. Since they all become runnable simultaneously, the Go scheduler executes them in arbitrary order, violating the FIFO guarantee.

  • Example: With waiters queued as [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], the observed execution order was [0, 1, 9, 2, 3, 4, 5, 6, 7, 8].

  • This caused flaky failures in TestConnStateMachine_FIFOOrdering across multiple CI matrix entries (8.0.x stable, 8.2.x 1.25.x, 8.4.x stable).

Fix

Read the state once at entry, and after a successful CAS, update a local state variable to w.targetState instead of re-reading the atomic. This ensures notifyWaiters only considers transitions made within its own call, not concurrent transitions from woken goroutines. The CAS failure path still re-reads the atomic (correctly — an external change requires the real value).


Note

Medium Risk
Touches concurrency/state-transition logic in notifyWaiters, where subtle ordering or CAS behavior changes could impact pool fairness or cause missed wakeups under load.

Overview
Fixes a race in ConnStateMachine.notifyWaiters that could wake multiple queued waiters in a single mutex hold and let the scheduler run them out of FIFO order.

notifyWaiters now snapshots state once per call and updates a local currentState only on successful CAS transitions, only re-reading the atomic state on CAS failure so concurrent transitions by woken goroutines don’t cascade additional wakeups.

Reviewed by Cursor Bugbot for commit 4381942. Bugbot is set up for automated code reviews on this repo. Configure here.

@ofekshenawa ofekshenawa left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @0x48core , LGTM!

@ofekshenawa ofekshenawa merged commit 4b8d027 into redis:master Apr 15, 2026
55 of 57 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants