fix: add read lock to WAL segment Flush to prevent data race with Close#893

Merged
mattisonchao merged 1 commit into main from fix/wal-flush-data-race
Feb 23, 2026
@mattisonchao
Member

Motivation

TestCoordinatorE2E is flaky due to a data race between the WAL runSync goroutine calling segment.Flush() (which reads the mmap) and wal.Close() calling segment.Close(), which calls MMap.Unmap() (which writes/invalidates the mmap). The race was latent in the WAL code and surfaced after the follower controller lifecycle refactor in #887 changed shutdown timing.

CI failure: https://github.com/oxia-db/oxia/actions/runs/22312993235/job/64549930899

Modification

  • Add RLock/RUnlock to readWriteSegment.Flush() to synchronize against the write lock already held by Close(). This ensures either Flush() completes before Unmap() begins, or Unmap() completes first and Flush() sees txnMappedFile == nil and returns safely.
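For readers unfamiliar with the pattern, here is a minimal, self-contained sketch of the locking scheme described above. It is not the actual Oxia code: `segment` and `fakeMMap` are hypothetical stand-ins for `readWriteSegment` and the real mmap wrapper, and having `Flush()` return nil when the file is already unmapped is an assumption about what "returns safely" means.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// fakeMMap is a hypothetical stand-in for the memory-mapped file.
type fakeMMap struct {
	unmapped bool
}

// Flush fails if the mapping has already been torn down.
func (m *fakeMMap) Flush() error {
	if m.unmapped {
		return errors.New("flush on unmapped region")
	}
	return nil
}

// Unmap invalidates the mapping.
func (m *fakeMMap) Unmap() error {
	m.unmapped = true
	return nil
}

// segment is a simplified, hypothetical model of readWriteSegment.
type segment struct {
	sync.RWMutex
	txnMappedFile *fakeMMap
}

// Flush takes the read lock so it cannot overlap with Close.
func (s *segment) Flush() error {
	s.RLock()
	defer s.RUnlock()
	if s.txnMappedFile == nil {
		// Close already ran; nothing left to flush (assumed behavior).
		return nil
	}
	return s.txnMappedFile.Flush()
}

// Close takes the write lock, unmaps, and clears the pointer.
func (s *segment) Close() error {
	s.Lock()
	defer s.Unlock()
	if s.txnMappedFile == nil {
		return nil
	}
	err := s.txnMappedFile.Unmap()
	s.txnMappedFile = nil
	return err
}

func main() {
	s := &segment{txnMappedFile: &fakeMMap{}}

	var wg sync.WaitGroup
	wg.Add(1)
	// Plays the role of the runSync goroutine.
	go func() {
		defer wg.Done()
		for i := 0; i < 1000; i++ {
			if err := s.Flush(); err != nil {
				fmt.Println("unexpected flush error:", err)
				return
			}
		}
	}()

	if err := s.Close(); err != nil {
		fmt.Println("close error:", err)
	}
	wg.Wait()
	fmt.Println("done")
}
```

Running the sketch with `go run -race` should report no race. Removing the `RLock`/`RUnlock` pair from `Flush()` should reintroduce the kind of report seen in CI, although whether the detector catches it on a given run depends on scheduling.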

The runSync goroutine calls segment.Flush() without holding any lock,
while Close() concurrently calls Unmap() under the write lock. This
adds RLock/RUnlock to Flush() so the two operations are properly
serialized, and guards against a nil txnMappedFile after Unmap.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@mattisonchao mattisonchao self-assigned this Feb 23, 2026
@mattisonchao mattisonchao merged commit dab0daa into main Feb 23, 2026
11 of 12 checks passed
@mattisonchao mattisonchao deleted the fix/wal-flush-data-race branch February 23, 2026 16:44