Populate legacy block's state from database instead of executing transactions#5792
Populate legacy block's state from database instead of executing transactions#5792halfalicious merged 6 commits intomasterfrom
Conversation
c3d9fbf to
022ce85
Compare
Refactor Block::populateFromChain so that the function doesn't do any unnecessary work (e.g. verify a block, execute transactions).
The reprocess admin RPC method is the only function which depends on the old Block::populateFromChain functionality where it replays each tx in the block. There's no need for the function so we can remove it. Additionally, check for StateDB corruption in Block::populateFromChain before setting the state root so that we can log a user-friendly error message.
022ce85 to
c54a1fd
Compare
Seal must be removed since Client::call executes a tx which can mutate the block's state.
Codecov Report
@@ Coverage Diff @@
## master #5792 +/- ##
==========================================
+ Coverage 64.02% 64.08% +0.05%
==========================================
Files 359 359
Lines 30796 30761 -35
Branches 3417 3417
==========================================
- Hits 19718 19712 -6
+ Misses 9855 9825 -30
- Partials 1223 1224 +1 |
|
I also plan on doing some basic performance profiling to determine if we're seeing any noticeable performance deltas with these changes. |
|
I've performed some basic profiling to get some ballpark numbers of the perf impact of these changes. For context, I performed all profiling on my laptop running private Aleth release builds. I added timer code to compute the time it took for the The results are pretty dramatic - for a block without any txs (genesis), the execution time drops by ~3.5% (presumably primarily due to not validating the block). Once there are any transactions in a block, the execution time drops by 90+% (and in some cases up to ~99%!). Results:
|
gumb0
left a comment
There was a problem hiding this comment.
Would it be not enough to just call sync(_bc, _h, blockHeader)?
Ah no, I think it rather creates a block wich is supposed to be mined on top of requested one...
| // We know there are no transactions, so just populate directly. | ||
| m_state = State(m_state.accountStartNonce(), m_state.db(), BaseState::Empty); // TODO: try with PreExisting. | ||
| sync(_bc, _h, bi); | ||
| m_transactions.push_back(Transaction{txRLP.data(), CheckTransaction::None}); |
There was a problem hiding this comment.
Did it fill m_transactions, m_transactionsSet, m_currentTxs, m_currentUncles before?
I suspect these are also useful only for mining, and RPC methods get them separately from BlockChain
There was a problem hiding this comment.
Good question 😀
m_transactions&m_transactionsSet: Yesm_currentTxs&m_currentUncles: No
So you raise a good question - can we assume that this function will only be called via RPC? I populated everything in the block because populateFromChain is a public function so it can be called by anyone...unless these fields are only usable in very specific situations (e.g. mining), perhaps we should fork the populateFromChain into two functions - 1 which populates the fields which are useful in most cases and another which populates the rest?
There was a problem hiding this comment.
Yeah I think ideally these different cases should be handled somehow by different classes.
But let's not complicate it here, I think I'm fine with leaving like this.
But m_currentTxs really looks to be specific to only commitToSeal and sealBlock methods, and I think it would be fine to not copy it here. m_currentUncles probably too.
Correct - the biggest problem I see with this (assuming we don't care about |
Don't save block bytes in block and log "block not found" as warning rather than error
|
cc @gumb0 |
Also don't populate RLP representation of txs and uncles in Block::populateFromChain since they are only useful during mining.
4c6f03f to
155cd8a
Compare
Fix #5034
Description
These changes optimize the function that the client calls to initialize a
Blockobject with blockchain data (Block::populateFromChain). Note thatBlock::populateFromChainis called by numerous RPC functions e.g.eth_call,eth_getBalance,eth_sendTransaction.For context, the Client calls
Block::populateFromChainto create a newBlockobject that's initialized with data from a blockchain block with a specific hash - this initializedBlockcan then be queried for specific data (e.g.eth_getBalance) or used to execute a transaction (eth_callandeth_sendTransaction).Prior to these changes,
Block::populateFromChainpopulated a block by retrieving the target block's header and parent's block header and block bytes from the blockchain and initializing various block fields using this information (e.g. current block headerBlock::m_currentBlock, parent block headerBlock::m_previousBlock). It would also verify the target block's header and build the target block's state trie and generate the transaction receipts by setting the block being populated's state root to the parent's state and executing all of the transactions (retrieved from the target block's bytes). The verification step and tx execution steps are obviously unnecessary - rather, we can simply retrieve the state root from the state database and retrieve the transactions and receipts from the block bytes (which we get from the block database).My changes retrieve the block headers and block bytes from the database as before - however, I don't verify the block (not necessary since it was already verified during import) and I don't execute the transactions..rather I read the tx receipts from the block bytes and read the state root from the state database.
Validation
I validated my changes by retrieving the genesis block and a block with several transactions (64999) via RPC (web3.eth.getBlock) and comparing the results against a "stock" Aleth and verifying the results were identical. Results:
Genesis Block
Without Changes
With Changes
Block 64999
Without Changes
With Changes