inspect: add inspect mode for debugging crashed tendermint node #6785
mergify[bot] merged 48 commits into master
Codecov Report
@@ Coverage Diff @@
## master #6785 +/- ##
==========================================
- Coverage 62.88% 62.66% -0.22%
==========================================
Files 307 309 +2
Lines 40464 40564 +100
==========================================
- Hits 25447 25421 -26
- Misses 13231 13343 +112
- Partials 1786 1800 +14
This pull request introduces 1 alert when merging e2a6522 into 9a2a7d4 - view on LGTM.com new alerts:
I like the approach here and the clean up!!
fbaf7d4 to 03ee71e
cmwaters left a comment
Great Work!
I think only some minor touch-ups and linting are required before this can be merged.
As a high-level question: what would happen if I had a node running and inspected it at the same time? Would it work as expected, or would it error? My guess is that this depends on whether the DB supports additional read-only connections, right?
Yeah, this depends on how the DB manages its storage files. I'm not yet sure users will want to run this at the same time as a node. The node already provides RPC, so this would be somewhat redundant. I tried running a node with each DB backend alongside the corresponding inspect command, one by one.
Results when trying this right now:
Error that is reported by goleveldb:
./build/tendermint inspect
ERROR: failed to initialize database: resource temporarily unavailable
Error that is reported by cleveldb:
./build/tendermint inspect --db-backend cleveldb
ERROR: failed to initialize database: IO error: lock /home/william/.tendermint/data/blockstore.db/LOCK: Resource temporarily unavailable
badgerdb:
./build/tendermint inspect --db-backend badgerdb
ERROR: failed to initialize database: Cannot acquire directory lock on "/home/william/.tendermint/data/blockstore". Another process is using this Badger database.: resource temporarily unavailable
boltdb hangs trying to initialize the db connection
rocksdb:
./build/tendermint inspect --db-backend rocksdb
ERROR: failed to initialize database: IO error: While lock file: /home/william/.tendermint/data/blockstore.db/LOCK: Resource temporarily unavailable
		require.NoError(t, d.Run(ctx))
	}()
	// FIXME: used to induce context switch.
	// Determine more deterministic method for prompting a context switch
FYI I also filed #6858 to track this more generally.
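One possible direction for the FIXME, sketched under assumptions: instead of nudging the scheduler, the code under test could signal when it has started, and the test could wait on that signal before cancelling. The `service` type, its `started` channel, and `runAndStop` are hypothetical stand-ins and are not part of this PR or of the tendermint API.

```go
package main

import (
	"context"
	"fmt"
)

// service is a hypothetical stand-in for the process started in the test.
type service struct {
	started chan struct{} // closed once Run has begun
}

func newService() *service {
	return &service{started: make(chan struct{})}
}

// Run signals that it has started, then blocks until ctx is cancelled.
func (s *service) Run(ctx context.Context) error {
	close(s.started)
	<-ctx.Done()
	return nil
}

// runAndStop starts the service in a goroutine, waits for the start
// signal, cancels the context, and returns whatever Run returned.
func runAndStop(s *service) error {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	done := make(chan error, 1)
	go func() { done <- s.Run(ctx) }()

	<-s.started // wait for the start signal; no sleep or forced yield needed
	cancel()
	return <-done
}

func main() {
	fmt.Println("run returned:", runAndStop(newService()))
}
```

The trade-off is that the type under test has to expose some readiness signal, which may or may not be acceptable for the real inspect code.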
EDIT: Updated, see comment below.

This change adds a sketch of the `Debug` mode.

This change adds a `Debug` struct to the node package. This `Debug` struct is intended to be created and started by a command in the `cmd` directory. The `Debug` struct runs the RPC server on the data directories: both the state store and the block store.

This change required a good deal of refactoring. Namely, a new `rpc.go` file was added to the `node` package. This file encapsulates functions for starting RPC servers used by nodes. A potential additional change is to further factor this code into shared code _in_ the `rpc` package.

Minor API tweaks were also made that seemed appropriate, such as the mechanism for fetching routes from the `rpc/core` package.

Additional work is required to register the `Debug` service as a command in the `cmd` directory, but I am looking for feedback on whether this direction seems appropriate before diving much further.

closes: #5908
resurrect the inspect command from #6785

Co-authored-by: Sam Kleinman <garen@tychoish.com>
Co-authored-by: Thane Thomson <connect@thanethomson.com>
Co-authored-by: Callum Waters <cmwaters19@gmail.com>