About the "block snapshot ahead of state" error: experience has shown that "just catch up the blocks" is not a very robust or stable solution.
There are two solutions...
allow recovery
we can recover via seg rm-state, with these known issues:
- snapshot release has lots of failures
- the rm chaindata feature becomes fragile
- recovery is not smooth, because the last snapshot might span multiple steps (affects time-to-tip on restart)
- the minimal node + history files scenario
It's possible to live with at least the first 3 by "delaying large merges" until one more step is available. E.g. if files 0-32, 32-48, 48-56, 56-60, 60-62, 62-63, 63-64 are available, don't merge yet; only merge after step 65 is available. This way, recovery via seg rm-state becomes a lot more stable (but not 100%).
We can automate recovery in snapshot release as well. But it's hard to say if we can recover "most of the time".
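The "delay large merge" rule above can be sketched as a small predicate. This is only an illustration under stated assumptions: the function name `can_merge`, the `spare_steps` parameter, and the representation of files as half-open step ranges are hypothetical and are not taken from the actual merge code.

```python
from typing import List, Tuple

StepRange = Tuple[int, int]  # half-open range of steps: [from_step, to_step)


def can_merge(files: List[StepRange], target: StepRange, spare_steps: int = 1) -> bool:
    """Hypothetical delayed-merge check.

    Allow merging the files that tile `target` only if at least
    `spare_steps` newer steps already exist beyond the end of `target`.
    """
    if not files:
        return False
    lo, hi = target
    # Files that would be consumed by the merge.
    pieces = sorted(f for f in files if lo <= f[0] and f[1] <= hi)
    # The pieces must cover the target contiguously, with no gaps.
    cursor = lo
    for start, end in pieces:
        if start != cursor:
            return False
        cursor = end
    if cursor != hi:
        return False
    # Delay rule: require newer step(s) past the merge target.
    newest_step = max(end for _, end in files)
    return newest_step - hi >= spare_steps


files = [(0, 32), (32, 48), (48, 56), (56, 60), (60, 62), (62, 63), (63, 64)]
print(can_merge(files, (0, 64)))               # False: no step beyond 64 yet
print(can_merge(files + [(64, 65)], (0, 64)))  # True: step 65 is available
```

With this rule the newest file on disk stays small when the merge happens, which is the property the seg rm-state recovery relies on in the example above.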
"block collation caps state collation"
We're experimenting a lot with reducing stepSize. We must consider how it interacts with the capping feature...
- "block collation caps state collation" while block collation is itself capped at tip-96, and the block snapshot stepSize=1000
- note: this imposes a limit on "step size reduction" -- state collation will lag behind by about 1000*tx_rate_per_block; for ethereum that is about 400k txs in db (just enough for the current stepSize of 390625)
- bloatnet does 60tx/block -- so the block collation cap can hold about 60000 txs "hostage" in the worst case; the current stepSize there is 15625, so we removed the block collation cap on bloatnet because it forced us to keep about 3-4 state steps in db.
- so we need to revisit whether we want to reduce the stepSize of block snapshots -- it'll "free up" state collation, especially on bloatnet.
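The arithmetic behind these bullets can be checked with a short sketch. The helper names are hypothetical. The 400 txs/block figure for ethereum is only what the "400k txs" estimate above implies, and the block stepSize of 100 in the last line is an arbitrary value chosen to show the effect of a reduction, not a proposal from this discussion.

```python
def worst_case_lag_txs(block_step: int, txs_per_block: int) -> int:
    """Txs that can be held back while waiting for one block snapshot step."""
    return block_step * txs_per_block


def state_steps_held(block_step: int, txs_per_block: int, state_step: int) -> float:
    """Worst-case number of state steps kept in db because of the cap."""
    return worst_case_lag_txs(block_step, txs_per_block) / state_step


# ethereum: block stepSize=1000, ~400 txs/block (implied), state stepSize=390625
print(worst_case_lag_txs(1000, 400), state_steps_held(1000, 400, 390625))  # 400000 1.024

# bloatnet: block stepSize=1000, 60 txs/block, state stepSize=15625
print(worst_case_lag_txs(1000, 60), state_steps_held(1000, 60, 15625))     # 60000 3.84

# hypothetical: the same bloatnet numbers with block stepSize reduced to 100
print(state_steps_held(100, 60, 15625))                                    # 0.384
```

The bloatnet result of 3.84 matches the "about 3-4 state steps in db" above, and the last line shows why a smaller block snapshot stepSize would free up state collation.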
I'm in favor of the capping solution. To work properly, it requires reducing the block snapshot stepSize. If we're considering reducing the state stepSize for more chains (we probably need it for ethereum, because of the work towards higher and higher tps), this is probably the way to go.
Others:
- BlockTransactions table and stepsInDB=5 #21326
- reorgBlockDepth is now temporary -- if "block collation caps state collation" then we don't need it; but on bloatnet we disabled the cap, so we need reorgBlockDepth to make sure we don't freeze the windable txs data. Probably we'll get more clarity after discussing this.