Skip to content

fix: eliminate data races on controller logger and term fields#907

Merged
mattisonchao merged 4 commits intomainfrom
fix/data-race-leader-controller-log
Feb 24, 2026
Merged

fix: eliminate data races on controller logger and term fields#907
mattisonchao merged 4 commits intomainfrom
fix/data-race-leader-controller-log

Conversation

@mattisonchao
Copy link
Copy Markdown
Member

@mattisonchao mattisonchao commented Feb 24, 2026

Summary

  • fix: data race on leaderController.log in async closures — Closures passed to goroutines captured lc and read lc.log without holding the lock, while setLogger() called from NewTerm() under the write lock could reassign lc.log. Fixed by capturing lc.log into a local variable before spawning goroutines.
  • fix: make leaderController.term atomic and remove setLogger() — Replaced the mutable term field and logger recreation pattern with atomic.Int64, so the logger is created once and never reassigned. This eliminates the data race on lc.log without needing local captures in closures.
  • *fix: remove atomic.Int64 term from followerController base logger — Applied the same pattern to followerController. The previous slog.Any("term", term) with *atomic.Int64 did not format correctly as an integer. Now passes slog.Int64("term", fc.term.Load()) explicitly in each log call.

Closes #902

Test plan

  • go build ./oxiad/dataserver/controller/follow/... compiles
  • go test ./oxiad/dataserver/controller/follow/... passes
  • Run with -race flag to verify no data races remain on leader/follower controller log and term fields

mattisonchao and others added 3 commits February 24, 2026 16:40
Closures passed to goroutines and async callbacks captured `lc` (the
struct pointer) and later read `lc.log` without holding the lock.
Meanwhile, `setLogger()` called from `NewTerm()` under the write lock
could reassign `lc.log`, causing a data race detected by `-race`.

Fix by capturing `lc.log` into a local variable before spawning the
goroutine or creating the callback, following the existing pattern
used in `propose()` for `lc.wal`, `lc.quorumAckTracker`, `lc.term`.

Affected methods: Read, list, RangeScan, GetNotifications,
proposeFeaturesEnable, ProposeRecordChecksum.

Closes #902

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace the mutable term field and logger recreation pattern with an
atomic.Int64 term, so the logger is created once and never reassigned.
This eliminates the data race on lc.log without needing local log
captures in closures.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Follow the same pattern as leaderController: create the logger once
without embedding the term, and pass slog.Int64("term", fc.term.Load())
explicitly in each log call. The previous slog.Any("term", term) with
*atomic.Int64 did not format correctly as an integer.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@mattisonchao mattisonchao self-assigned this Feb 24, 2026
@mattisonchao mattisonchao merged commit a61fdfa into main Feb 24, 2026
9 checks passed
@mattisonchao mattisonchao deleted the fix/data-race-leader-controller-log branch February 24, 2026 16:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Flaky TestNormalShardBalancer: data race on leaderController.log field

1 participant