fix(blockassembly): make DiskTxMap.bytesWritten access atomic#1029
Merged
Conversation
bytesWritten was a plain int64 incremented by each disk's writerLoop goroutine and read by Stats() from the block-processing goroutine (reportDiskMapStats runs during moveForwardBlock, before Clear() quiesces the writers). That is an unsynchronized concurrent read/write — a data race the -race detector flags, and a torn read on 32-bit. The field comment even claimed it was single-goroutine-only, which Stats() violates. Access it via sync/atomic (AddInt64 in the writer, LoadInt64 in Stats); the field stays a plain int64 so diskShard remains copy-safe (it is assigned as a struct literal at construction). Add a -race regression test that reads Stats() concurrently with active writers; reverting to plain access trips the detector.
Contributor
|
🤖 Claude Code Review Status: Complete Current Review: No issues found. The data race fix is correct and well-tested. Analysis:
|
|
Contributor
Benchmark Comparison ReportBaseline: Current: Summary
All benchmark results (sec/op)
Threshold: >10% with p < 0.05 | Generated: 2026-06-03 11:08 UTC |
freemans13
approved these changes
Jun 3, 2026
ordishs
approved these changes
Jun 3, 2026
ordishs
left a comment
Collaborator
There was a problem hiding this comment.
Correct, minimal fix for a genuine data race on diskShard.bytesWritten.
- Both access sites (writer
AddInt64,Stats()LoadInt64) now atomic; no stragglers on plain access. - Atomics target the canonical array-element address — correctly paired.
- Keeping the field a plain
int64(rather thanatomic.Int64) is the right call:diskShardis value-assigned at construction, sonoCopywould trip vet copylocks. Matches the existingm.countpattern. TestDiskTxMap_StatsRaceWithWritersis a real-raceregression — verified passing locally under-race.- Misleading "no atomic needed" comment corrected;
Stats()now documents its concurrent-call contract.
Approve.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.



Data race
diskShard.bytesWrittenwas a plainint64:writerLoopgoroutine (d.bytesWritten += …), andStats()(diskBytes += m.disks[i].bytesWritten).Stats()is called in production viareportDiskMapStatsduringmoveForwardBlock— beforeClear()quiesces the writers — so the read races the active increments from a different goroutine. That's a data race (go test -raceflags it) and a torn read on 32-bit. The field's own comment claimed "only touched by the single writer goroutine — no atomic needed", whichStats()violates.Fix
Access
bytesWrittenviasync/atomic(AddInt64in the writer,LoadInt64inStats()). The field stays a plainint64rather thanatomic.Int64on purpose:diskShardis assigned as a struct literal at construction (m.disks[i] = diskShard{…}), so keeping it copy-safe avoids tainting the struct with anoCopyand avetcopylocks failure.Test
TestDiskTxMap_StatsRaceWithWritersdrivesSet(writer-loop increments) concurrently withStats()reads. Reverting to plain access makes it fail under-racewith aDATA RACEreport onbytesWritten; with the fix it's clean.Verification
go test -race: passes with fix; fails (DATA RACE) when reverted to plain access.go build,go vet,gofmt,gci: clean.Third fix from the concurrency/lifecycle audit (after #1027 kafka final-drain and #1028 blockassembly ErrChan deadlock).