Skip to content

Revert "[r3.4] logger: avoid mutex contention"#20652

Merged
AskAlexSharov merged 1 commit into
release/3.4from
revert-20487-alex/slog_34_r34
Apr 18, 2026
Merged

Revert "[r3.4] logger: avoid mutex contention"#20652
AskAlexSharov merged 1 commit into
release/3.4from
revert-20487-alex/slog_34_r34

Conversation

@AskAlexSharov

Copy link
Copy Markdown
Collaborator

Reverts #20487

reason: gnosis has regression


The rotatingWriter.mu does guard the file — it wraps both Write and Close. But a mutex doesn't make the write non-blocking. It just serializes access. If
r.file.Write() blocks in the kernel (pipe full), r.mu stays locked and other goroutines pile up at r.mu.Lock().

But this isn't the current problem. Look at the stuck goroutine stacks — they go through streamHandler directly to os.File.Write(). None of them go through
rotatingWriter. They're writing to stdout or stderr (plain os.File, no rotating writer, no mutex).

The node is set up with log.Root().SetHandler(... log.StderrHandler) — just a streamHandler wrapping os.Stderr. No mutex, no rotation. Multiple goroutines
call os.Stderr.Write() concurrently, serialize at the internal fdMutex, and one of them is stuck in the kernel write() syscall.

So: rotatingWriter.mu is correct but irrelevant to this bug. The bug is the stderr StreamHandler has no Go-level protection and when the stderr pipe
consumer backs up, the kernel write stalls the execution goroutine.


1 @ 0x4bf68e 0x49be32 0x49be09 0x4c1265 0x53bde5 0x53f1a5 0x53f1b6 0x549bce 0x549bc9 0x8dc431 0x8dcc88 0x8df530 0x8dc60e 0x8dd1c3 0x8dda16 0x1fa2c3d 0x1fa2b2e 0x4c7901
#	0x4c1264	internal/poll.runtime_Semacquire+0x24							runtime/sema.go:76
#	0x53bde4	internal/poll.(*fdMutex).rwlock+0xc4							internal/poll/fd_mutex.go:154
#	0x53f1a4	internal/poll.(*FD).writeLock+0x64							internal/poll/fd_mutex.go:239
#	0x53f1b5	internal/poll.(*FD).Write+0x75								internal/poll/fd_unix.go:361
#	0x549bcd	os.(*File).write+0x4d									os/file_posix.go:46
#	0x549bc8	os.(*File).Write+0x48									os/file.go:215
#	0x8dc430	github.com/erigontech/erigon/common/log/v3.streamHandler.Log+0x50			github.com/erigontech/erigon/common/log/v3/handler.go:54
#	0x8dcc87	github.com/erigontech/erigon/common/log/v3.lazyHandler.Log+0x3e7			github.com/erigontech/erigon/common/log/v3/handler.go:487
#	0x8df52f	github.com/erigontech/erigon/common/log/v3.lvlFilterHandler.Log+0x2f			github.com/erigontech/erigon/common/log/v3/handler.go:302
#	0x8dc60d	github.com/erigontech/erigon/common/log/v3.multiHandler.Log+0xcd			github.com/erigontech/erigon/common/log/v3/handler.go:334
#	0x8dd1c2	github.com/erigontech/erigon/common/log/v3.(*swapHandler).Log+0x62			github.com/erigontech/erigon/common/log/v3/handler.go:567
#	0x8dda15	github.com/erigontech/erigon/common/log/v3.(*logger).write+0x175			github.com/erigontech/erigon/common/log/v3/logger.go:145
#	0x1fa2c3c	github.com/erigontech/erigon/common/log/v3.Debug+0x7dc					github.com/erigontech/erigon/common/log/v3/root.go:52
#	0x1fa2b2d	github.com/erigontech/erigon/cl/sentinel.(*Sentinel).proactiveSubnetPeerSearch+0x6cd	github.com/erigontech/erigon/cl/sentinel/discovery.go:261

1 @ 0x4bf68e 0x49be32 0x49be09 0x4c1265 0x53bde5 0x53f1a5 0x53f1b6 0x549bce 0x549bc9 0x8dc431 0x8dcc88 0x8df530 0x8dc60e 0x8dd1c3 0x8dda16 0x2165b33 0x21654a2 0x2143987 0x2172da8 0x213fdce 0x218abb0 0x2189d6c 0x2543a0a 0x2541105 0x4c7901
#	0x4c1264	internal/poll.runtime_Semacquire+0x24								runtime/sema.go:76
#	0x53bde4	internal/poll.(*fdMutex).rwlock+0xc4								internal/poll/fd_mutex.go:154
#	0x53f1a4	internal/poll.(*FD).writeLock+0x64								internal/poll/fd_mutex.go:239
#	0x53f1b5	internal/poll.(*FD).Write+0x75									internal/poll/fd_unix.go:361
#	0x549bcd	os.(*File).write+0x4d										os/file_posix.go:46
#	0x549bc8	os.(*File).Write+0x48										os/file.go:215
#	0x8dc430	github.com/erigontech/erigon/common/log/v3.streamHandler.Log+0x50				github.com/erigontech/erigon/common/log/v3/handler.go:54
#	0x8dcc87	github.com/erigontech/erigon/common/log/v3.lazyHandler.Log+0x3e7				github.com/erigontech/erigon/common/log/v3/handler.go:487
#	0x8df52f	github.com/erigontech/erigon/common/log/v3.lvlFilterHandler.Log+0x2f				github.com/erigontech/erigon/common/log/v3/handler.go:302
#	0x8dc60d	github.com/erigontech/erigon/common/log/v3.multiHandler.Log+0xcd				github.com/erigontech/erigon/common/log/v3/handler.go:334
#	0x8dd1c2	github.com/erigontech/erigon/common/log/v3.(*swapHandler).Log+0x62				github.com/erigontech/erigon/common/log/v3/handler.go:567
#	0x8dda15	github.com/erigontech/erigon/common/log/v3.(*logger).write+0x175				github.com/erigontech/erigon/common/log/v3/logger.go:145
#	0x2165b32	github.com/erigontech/erigon/common/log/v3.Debug+0x1232						github.com/erigontech/erigon/common/log/v3/root.go:52
#	0x21654a1	github.com/erigontech/erigon/execution/stagedsync.(*serialExecutor).exec+0xba1			github.com/erigontech/erigon/execution/stagedsync/exec3_serial.go:168
#	0x2143986	github.com/erigontech/erigon/execution/stagedsync.ExecV3+0x19c6					github.com/erigontech/erigon/execution/stagedsync/exec3.go:289
#	0x2172da7	github.com/erigontech/erigon/execution/stagedsync.SpawnExecuteBlocksStage+0x247			github.com/erigontech/erigon/execution/stagedsync/stage_execute.go:400
#	0x213fdcd	github.com/erigontech/erigon/execution/stagedsync.PipelineStages.func10+0xed			github.com/erigontech/erigon/execution/stagedsync/default_stages.go:222
#	0x218abaf	github.com/erigontech/erigon/execution/stagedsync.(*Sync).runStage+0x12f			github.com/erigontech/erigon/execution/stagedsync/sync.go:500
#	0x2189d6b	github.com/erigontech/erigon/execution/stagedsync.(*Sync).Run+0x28b				github.com/erigontech/erigon/execution/stagedsync/sync.go:419
#	0x2543a09	github.com/erigontech/erigon/execution/execmodule.(*ExecModule).updateForkChoice+0x2569		github.com/erigontech/erigon/execution/execmodule/forkchoice.go:458
#	0x2541104	github.com/erigontech/erigon/execution/execmodule.(*ExecModule).UpdateForkChoice.func1+0x84	github.com/erigontech/erigon/execution/execmodule/forkchoice.go:129

1 @ 0x4bf68e 0x49be32 0x49be09 0x4c1265 0x53bde5 0x53f1a5 0x53f1b6 0x549bce 0x549bc9 0x8dc431 0x8dcc88 0x8df530 0x8dc60e 0x8dd1c3 0x8dda16 0x8de04b 0x18dc0a6 0x2681105 0x93d050 0x4c7901
#	0x4c1264	internal/poll.runtime_Semacquire+0x24					runtime/sema.go:76
#	0x53bde4	internal/poll.(*fdMutex).rwlock+0xc4					internal/poll/fd_mutex.go:154
#	0x53f1a4	internal/poll.(*FD).writeLock+0x64					internal/poll/fd_mutex.go:239
#	0x53f1b5	internal/poll.(*FD).Write+0x75						internal/poll/fd_unix.go:361
#	0x549bcd	os.(*File).write+0x4d							os/file_posix.go:46
#	0x549bc8	os.(*File).Write+0x48							os/file.go:215
#	0x8dc430	github.com/erigontech/erigon/common/log/v3.streamHandler.Log+0x50	github.com/erigontech/erigon/common/log/v3/handler.go:54
#	0x8dcc87	github.com/erigontech/erigon/common/log/v3.lazyHandler.Log+0x3e7	github.com/erigontech/erigon/common/log/v3/handler.go:487
#	0x8df52f	github.com/erigontech/erigon/common/log/v3.lvlFilterHandler.Log+0x2f	github.com/erigontech/erigon/common/log/v3/handler.go:302
#	0x8dc60d	github.com/erigontech/erigon/common/log/v3.multiHandler.Log+0xcd	github.com/erigontech/erigon/common/log/v3/handler.go:334
#	0x8dd1c2	github.com/erigontech/erigon/common/log/v3.(*swapHandler).Log+0x62	github.com/erigontech/erigon/common/log/v3/handler.go:567
#	0x8dda15	github.com/erigontech/erigon/common/log/v3.(*logger).write+0x175	github.com/erigontech/erigon/common/log/v3/logger.go:145
#	0x8de04a	github.com/erigontech/erigon/common/log/v3.(*logger).Debug+0x2a		github.com/erigontech/erigon/common/log/v3/logger.go:177
#	0x18dc0a5	github.com/erigontech/erigon/txnprovider/txpool.(*TxPool).Run+0x765	github.com/erigontech/erigon/txnprovider/txpool/pool.go:2115
#	0x2681104	github.com/erigontech/erigon/node/eth.(*Ethereum).Start.func7+0x84	github.com/erigontech/erigon/node/eth/backend.go:1620
#	0x93d04f	golang.org/x/sync/errgroup.(*Group).Go.func1+0x4f			golang.org/x/sync@v0.20.0/errgroup/errgroup.go:93

@AskAlexSharov AskAlexSharov enabled auto-merge (squash) April 18, 2026 12:42

@erigon-copilot erigon-copilot Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. The revert makes sense on release/3.4. The earlier change optimized away synchronization in StreamHandler/rotatingWriter, but the stuck stacks here point to blocking writes on stderr/os.File.Write and fdMutex serialization in the runtime. Restoring SyncHandler at the StreamHandler boundary and reinstating rotatingWriter locking is the safer behavior for production logging, even if it gives up the no-contention benchmark path.

@AskAlexSharov AskAlexSharov merged commit e36695e into release/3.4 Apr 18, 2026
21 checks passed
@AskAlexSharov AskAlexSharov deleted the revert-20487-alex/slog_34_r34 branch April 18, 2026 12:54
AskAlexSharov added a commit that referenced this pull request Apr 22, 2026
Reverts #20652

Because I did miss-interpreted stacktrace as deadlock/race inside logger
#20661 - but it had another
Root-Cause
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant