Skip to content

fix: eliminate data race on leaderController.log via atomic term#903

Merged
mattisonchao merged 2 commits intomainfrom
fix/data-race-leader-controller-log
Feb 24, 2026
Merged

fix: eliminate data race on leaderController.log via atomic term#903
mattisonchao merged 2 commits intomainfrom
fix/data-race-leader-controller-log

Conversation

@mattisonchao
Copy link
Copy Markdown
Member

@mattisonchao mattisonchao commented Feb 24, 2026

Summary

  • Make leaderController.term an *atomic.Int64 instead of a plain int64
  • Add atomicInt64Value type implementing slog.LogValuer so the logger resolves the term dynamically at log time
  • Remove setLogger() entirely — the logger is now created once in the constructor and never reassigned
  • Revert the log := lc.log local captures (no longer needed since lc.log is immutable after construction)
  • Simplify Term() getter to a lock-free lc.term.Load()

Test plan

  • go build ./... passes
  • go test -race -count=1 -run TestLeaderController ./... passes
  • go test -race -count=1 ./... passes (full package)

Closures passed to goroutines and async callbacks captured `lc` (the
struct pointer) and later read `lc.log` without holding the lock.
Meanwhile, `setLogger()` called from `NewTerm()` under the write lock
could reassign `lc.log`, causing a data race detected by `-race`.

Fix by capturing `lc.log` into a local variable before spawning the
goroutine or creating the callback, following the existing pattern
used in `propose()` for `lc.wal`, `lc.quorumAckTracker`, `lc.term`.

Affected methods: Read, list, RangeScan, GetNotifications,
proposeFeaturesEnable, ProposeRecordChecksum.

Closes #902

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@mattisonchao mattisonchao changed the title fix: data race on leaderController.log in async closures fix: eliminate data race on leaderController.log via atomic term Feb 24, 2026
@mattisonchao mattisonchao force-pushed the fix/data-race-leader-controller-log branch 7 times, most recently from 1cf5145 to 7a81fa5 Compare February 24, 2026 09:50
@mattisonchao mattisonchao self-assigned this Feb 24, 2026
@mattisonchao mattisonchao force-pushed the fix/data-race-leader-controller-log branch 2 times, most recently from 90d599c to ab8495c Compare February 24, 2026 09:55
Replace the mutable term field and logger recreation pattern with an
atomic.Int64 term, so the logger is created once and never reassigned.
This eliminates the data race on lc.log without needing local log
captures in closures.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@mattisonchao mattisonchao force-pushed the fix/data-race-leader-controller-log branch from ab8495c to 00558e2 Compare February 24, 2026 10:10
@mattisonchao
Copy link
Copy Markdown
Member Author

Review Summary

Comprehensive review of the diff against mainno bugs or regressions found.

What changed

  • leaderController.term changed from int64 to *atomic.Int64
  • setLogger() removed — logger is created once in the constructor and never reassigned, eliminating the data race on lc.log
  • Term() getter is now lock-free (lc.term.Load())
  • All lc.log local captures (log := lc.log) reverted since lc.log is immutable after construction
  • term attribute added to all log call sites explicitly (was previously baked into the base logger via setLogger())
  • Methods with multiple lc.term.Load() calls extract a local variable to avoid redundant atomic loads
  • Bonus: fixed a wrong log message in RangeScan ("Received list request""Received range-scan request")

Verified

  • All lc.term reads use .Load(), all writes use .Store() — no bare field access remaining
  • Methods that held the write lock when modifying term still do (NewTerm, constructor)
  • Local term captures in becomeLeader, addFollower, truncateFollowerIfNeeded, applyAllEntriesIntoDB, DeleteShard are all under lock, so they see a consistent value
  • Async callbacks in proposeFeaturesEnable and ProposeRecordChecksum capture term at call time (not callback time) to log the term under which the proposal was made
  • session_manager.go uses controller.term.Load() — eagerly resolved by slog.With, same behavior as the original int64 field

Known limitation

  • slog.With() eagerly resolves values, so term cannot be dynamically included in the base logger without a custom slog.Handler. Instead, term is passed explicitly at each log call site.

@mattisonchao mattisonchao merged commit 39a9faa into main Feb 24, 2026
9 checks passed
@mattisonchao mattisonchao deleted the fix/data-race-leader-controller-log branch February 24, 2026 10:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant