Closed
Labels: T:bug (Type Bug (Confirmed)), T:validator (Type: Validator related)
Description
(originally posted by @fkbenjamin in cosmos/cosmos-sdk#3603)
Reproduced by @jackzampolin with sudo systemctl stop gaiad && sudo systemctl start gaiad
Tendermint Version: v0.30.0-rc0
Gaia Version: v0.31.1
Summary of Bug
On GoS6, I changed values in gaiad.toml on two of my nodes (one validating node, one non-validating node). After stopping a node and immediately restarting it, I get the following:
I[2019-02-11|15:53:34.544] Starting ABCI with Tendermint module=main
E[2019-02-11|15:53:35.878] Error dialing peer module=p2p err="dial tcp X:X:X:X:12345: i/o timeout"
E[2019-02-11|15:53:36.032] Corrupted entry. Skipping... module=consensus wal=/home/ubuntu/.gaiad/data/cs.wal/wal err="DataCorruptionError[failed to read data: EOF]"
E[2019-02-11|15:53:36.202] data has been corrupted in last height of consensus WAL module=consensus err="DataCorruptionError[failed to read data: EOF]" height=1944
E[2019-02-11|15:53:36.202] Encountered corrupt WAL file module=consensus err="DataCorruptionError[failed to read data: EOF]"
E[2019-02-11|15:53:36.202] Please repair the WAL file before restarting module=consensus
You can attempt to repair the WAL as follows:
----
WALFILE=~/.tendermint/data/cs.wal/wal
cp $WALFILE ${WALFILE}.bak # backup the file
go run scripts/wal2json/main.go $WALFILE > wal.json # this will panic, but can be ignored
rm $WALFILE # remove the corrupt file
go run scripts/json2wal/main.go wal.json $WALFILE # rebuild the file without corruption
----
E[2019-02-11|15:53:36.202] Error starting conS module=consensus err="DataCorruptionError[failed to read data: EOF]"
E[2019-02-11|15:53:36.669] Error dialing peer module=p2p err="dial tcp X:X:X:X:12345: connect: connection refused"
E[2019-02-11|15:53:37.559] Error dialing peer module=p2p err="dial tcp X:X:X:X:12345: i/o timeout"
E[2019-02-11|15:53:37.813] Connection failed @ recvRoutine (reading byte) module=p2p peer=censoredcensored@X:X:X:X:12345 conn=MConn{X:X:X:X:12345} err=EOF
E[2019-02-11|15:53:37.813] Stopping peer for error module=p2p peer="Peer{MConn{X:X:X:X:12345} censoredcensored out}" err=EOF
E[2019-02-11|15:53:38.146] Connection failed @ recvRoutine (reading byte) module=p2p peer=censoredcensored@X:X:X:X:12345 conn=MConn{X:X:X:X:12345} err=EOF
E[2019-02-11|15:53:38.146] Stopping peer for error module=p2p peer="Peer{MConn{X:X:X:X:12345} censoredcensored out}" err=EOF
E[2019-02-11|15:53:38.146] MConnection flush failed module=p2p peer=censoredcensored@X:X:X:X:12345 err="write tcp X:X:X:X:59324->X:X:X:X:12345: use of closed network connection"
E[2019-02-11|15:53:38.479] Connection failed @ recvRoutine (reading byte) module=p2p peer=censoredcensored@X:X:X:X:12345 conn=MConn{X:X:X:X:12345} err=EOF
This caused me to miss a few blocks on Game of Stakes 6. The issue was solved by stopping the node again, waiting for a few seconds and then restarting it.
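The workaround above (stop, wait a few seconds, then start) can be sketched as a small shell helper. This is only an illustration: the function name `restart_with_delay`, the `SYSTEMCTL` override variable, and the 5-second default are assumptions, not part of gaiad or Tendermint. The issue only says "a few seconds", so the right delay is unknown.

```shell
# Hypothetical helper: restart a systemd unit with a pause between stop and start.
# SYSTEMCTL can be overridden (e.g. for a dry run); the default assumes sudo + systemd.
SYSTEMCTL="${SYSTEMCTL:-sudo systemctl}"

restart_with_delay() {
    unit="$1"
    delay="${2:-5}"   # seconds to wait; 5 is an arbitrary assumed default

    # Stop the unit first; give up if the stop itself fails.
    $SYSTEMCTL stop "$unit" || return 1

    # Pause before starting again, as in the workaround described above.
    sleep "$delay"

    $SYSTEMCTL start "$unit"
}
```

For example, `restart_with_delay gaiad 5` would replace the back-to-back `systemctl stop` / `systemctl start` that reproduced the error.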
Steps to Reproduce
cosmos-sdk: 0.31.1
git commit: b9e523212ec47910a00db00be2f1b7935e201ee7
vendor hash: 85e6c5c7a700e822cccf169c97f4a3974312dfd1
go version go1.11.5 linux/amd64
- Change a value in gaiad.toml (although I don't think it has anything to do with this)
- Stop the node
- Immediately start the node again with gaiad start
- See corrupted WAL errors
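When repeating these steps, it may help to check a captured log for the corruption error rather than reading it by eye. The sketch below assumes the node's output was saved to a file. The helper name `has_wal_corruption` and the log file name in the usage line are hypothetical, and the match string is taken from the log lines quoted in this report.

```shell
# Hypothetical helper: succeed if the given log file contains the
# WAL corruption error seen in this report, fail otherwise.
has_wal_corruption() {
    grep -q 'DataCorruptionError' "$1"
}

# Example usage (gaiad.log is an assumed file name):
#   has_wal_corruption gaiad.log && echo "corrupt WAL reported"
```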
For Admin Use
- Not duplicate issue
- Appropriate labels applied
- Appropriate contributors tagged
- Contributor assigned/self-assigned