System information
Geth version: v1.2.7
OS & Version: Linux
Commit hash : 9ed95d8
Expected behaviour
No deadlock.
Actual behaviour
Deadlock
Steps to reproduce the behaviour
It may not be easy to reproduce. I was running a fast node to sync the latest BSC mainnet blocks. After I upgraded it to v1.2.7, it ran smoothly for ~6 hours, then it got stuck and could not sync anymore.
bsc --tries-verify-mode none --rpc.allow-unprotected-txs --metrics --txlookuplimit 0 --datadir <path> --config ./config.toml
Backtrace
Use this command to dump all goroutines: curl http://localhost:6060/debug/pprof/goroutine?debug=2
first dump: dump_routine_0626.log
after 2 minutes: dump_routine_0626_2.log
I found that the critical path is stuck at this call stack:
goroutine 66558840 [select, 6684 minutes]:
reflect.rselect({0xc0f69ac358, 0x2, 0x3?})
/opt/hostedtoolcache/go/1.19.10/x64/src/runtime/select.go:590 +0x23e
reflect.Select({0xc0297c8000?, 0x2, 0xc053fcc558?})
/opt/hostedtoolcache/go/1.19.10/x64/src/reflect/value.go:2952 +0xd2
github.com/ethereum/go-ethereum/event.(*Feed).Send(0xc000568190, {0x2153ee0?, 0xc35a583800?})
/home/runner/work/bsc/bsc/event/feed.go:170 +0x478
github.com/ethereum/go-ethereum/core.(*BlockChain).insertChain.func1()
/home/runner/work/bsc/bsc/core/blockchain.go:1772 +0x133
github.com/ethereum/go-ethereum/core.(*BlockChain).insertChain(0xc000568000, {0xc3283adbc0?, 0x1, 0x1}, 0x1, 0x1)
/home/runner/work/bsc/bsc/core/blockchain.go:2081 +0x3df9
github.com/ethereum/go-ethereum/core.(*BlockChain).InsertChain(0xc000568000, {0xc3283adbc0?, 0x1, 0x1})
/home/runner/work/bsc/bsc/core/blockchain.go:1744 +0xb51
github.com/ethereum/go-ethereum/eth.newHandler.func3({0xc3283adbc0?, 0x1, 0x1})
/home/runner/work/bsc/bsc/eth/handler.go:319 +0x6c5
github.com/ethereum/go-ethereum/eth/fetcher.(*BlockFetcher).importBlocks.func1()
/home/runner/work/bsc/bsc/eth/fetcher/block_fetcher.go:916 +0x918
created by github.com/ethereum/go-ethereum/eth/fetcher.(*BlockFetcher).importBlocks
/home/runner/work/bsc/bsc/eth/fetcher/block_fetcher.go:885 +0x417
The goroutine above is trying to send a ChainHeadEvent. This could be caused by the following goroutine, which is itself stuck sending another event, NewVoteEvent. Because it is stuck, it no longer consumes its ChainHeadEvent subscription, so that channel's buffer could fill up, which in turn blocks the goroutine above.
goroutine 428 [select, 6692 minutes]:
reflect.rselect({0xc0c79a5bf8, 0x2, 0xc0f7d2f180?})
/opt/hostedtoolcache/go/1.19.10/x64/src/runtime/select.go:590 +0x23e
reflect.Select({0xc03501c000?, 0x2, 0xc0c79a5df8?})
/opt/hostedtoolcache/go/1.19.10/x64/src/reflect/value.go:2952 +0xd2
github.com/ethereum/go-ethereum/event.(*Feed).Send(0xc001e3a4a8, {0x2154160?, 0xc059a6bd90?})
/home/runner/work/bsc/bsc/event/feed.go:170 +0x478
github.com/ethereum/go-ethereum/core/vote.(*VotePool).putIntoVotePool(0xc001e3a480, 0xc059a6bd90)
/home/runner/work/bsc/bsc/core/vote/vote_pool.go:162 +0x2d3
github.com/ethereum/go-ethereum/core/vote.(*VotePool).loop(0xc001e3a480)
/home/runner/work/bsc/bsc/core/vote/vote_pool.go:109 +0x15f
created by github.com/ethereum/go-ethereum/core/vote.NewVotePool
/home/runner/work/bsc/bsc/core/vote/vote_pool.go:89 +0x2c5