state collation strategy: capping or recovery #21366

about the "block snapshot ahead of state" error: experience has shown that "just catch up the blocks" is not a very robust/stable solution.

There are two solutions...

## allow recovery

we can recover via `seg rm-state`. Downsides:

- snapshot release has lots of failures
- the `rm chaindata` feature becomes fragile
- recovery is not smooth, because the last snapshot might span multiple steps (affects time-to-tip on restart)
- [minimal node + history files scenario](https://github.com/erigontech/erigon/pull/20434#issuecomment-4258335368)

it's possible to live with at least the first 3 -- by "delaying large merges" until 1 more step is available. e.g. if files 0-32, 32-48, 48-56, 56-60, 60-62, 62-63, 63-64 are available, don't merge yet; only merge after step 65 is available. This way, recovery via `seg rm-state` becomes a lot more stable (but not 100%).
We can automate recovery in snapshot release as well. But it's hard to say whether we can recover "most of the time".
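
A minimal sketch of the "delay large merge" rule above, to make the 0-64 example concrete. `FileRange`, `mergeAllowed` and `delaySteps` are hypothetical names for illustration only; they are not existing erigon APIs, and the real merge planner may look different.

```go
package main

// FileRange is a hypothetical half-open step range [From, To)
// covered by one state snapshot file, e.g. {0, 32} for file 0-32.
type FileRange struct{ From, To uint64 }

// mergeAllowed reports whether files ending at mergeEnd may be merged.
// Assumed rule: a merge producing a file that ends at mergeEnd is allowed
// only once at least delaySteps further steps exist as files beyond it.
func mergeAllowed(files []FileRange, mergeEnd, delaySteps uint64) bool {
	// Highest step boundary for which a file already exists.
	var availableEnd uint64
	for _, f := range files {
		if f.To > availableEnd {
			availableEnd = f.To
		}
	}
	return availableEnd >= mergeEnd+delaySteps
}
```

With the files from the example (0-32 ... 63-64) and `delaySteps=1`, `mergeAllowed(files, 64, 1)` is false; it becomes true once file 64-65 exists. The point of the delay is that the most recent file stays a small one, so removing it during recovery costs about one step rather than a large merged range.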


## "block collation caps state collation" 

we're experimenting a lot with reducing stepSize. We must consider how it interacts with the capping feature...

- "block collation caps state collation", while block collation itself is capped by tip-96 and the block snapshot stepSize=1000
- note: this imposes a limit on "step size reduction" -- state collation will lag behind by about `1000*tx_rate_per_block`; for ethereum that is about 400k txs in db (just enough for the current stepSize of 390625)
- bloatnet does 60tx/block -- so the block collation cap can hold 60000 txs "hostage" in the worst case; the current stepSize there is 15625 ...so we removed the block collation cap on bloatnet because it forced us to keep about 3-4 state steps in db.
- so we need to revisit whether we want to **reduce the stepSize of block snapshots** -- it'll "free up" state collation, especially on bloatnet.
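
The arithmetic behind these bullets, as a rough sketch. The function names are hypothetical, the 400 tx/block figure for ethereum is only implied by the "400k" number above, and the extra tip-96 lag is ignored for simplicity.

```go
package main

// txsHeldByBlockCap estimates the worst-case number of txs that state
// collation cannot touch while waiting for the next block snapshot file:
// one full block step times the average number of txs per block.
func txsHeldByBlockCap(blockStepSize, txsPerBlock uint64) uint64 {
	return blockStepSize * txsPerBlock
}

// stateStepsHeld converts that backlog into a number of state steps
// that have to stay in db.
func stateStepsHeld(heldTxs, stateStepSize uint64) float64 {
	return float64(heldTxs) / float64(stateStepSize)
}
```

For ethereum, `txsHeldByBlockCap(1000, 400)` is 400000 txs, which is about 1.02 steps at stepSize 390625. For bloatnet, `txsHeldByBlockCap(1000, 60)` is 60000 txs, which is 3.84 steps at stepSize 15625. As a purely illustrative value, a block snapshot stepSize of 250 on bloatnet would hold 15000 txs, just under one state step.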

---

I'm in favor of the capping solution. It requires reducing the block snapshot stepSize to work properly. If we're considering reducing the state stepSize for more chains (we probably need it for ethereum, because of the work towards higher and higher tps), this is probably the way to go.

---

Others:
- [x] SetMaxTxNumCollation -- probably needs to be removed
- [x] CollateAndPruneIfNeeded refactor - remove the loop, the time.Sleep etc.
- [ ] decision on https://github.com/erigontech/erigon/pull/20895 -- see https://github.com/erigontech/erigon/pull/20895#issuecomment-4515657141 
- [x] https://github.com/erigontech/erigon/issues/21326
- [x] `reorgBlockDepth` is now temporary -- if "block collation caps state collation" then we don't need it; but on bloatnet we disabled the cap, so we need `reorgBlockDepth` to make sure we don't freeze txs data that can still be unwound. We'll probably get more clarity after discussing this.



