Skip to content

optimise FindBlockNum#15398

Merged
sudeepdino008 merged 30 commits into
mainfrom
e3/optimize_findblocknum
Jun 10, 2025
Merged

optimise FindBlockNum#15398
sudeepdino008 merged 30 commits into
mainfrom
e3/optimize_findblocknum

Conversation

@sudeepdino008

@sudeepdino008 sudeepdino008 commented Jun 2, 2025

Copy link
Copy Markdown
Member

ticket: #15209

  • ReadTxNumFunc gets maxtxnum given a blockNumber...it was also used to find blockNumber given maxTxNum..
  • add TxBlockIndex in-place of ReadTxNumFunc -- difference is that there's now a separate function to GetBlockNumber.
    Additionally:
  • earlier, binary search was performed over a range of block numbers, each search step queries MDBX and then fetches block from snapshot.
  • now, binary search is separated into two stages: first stage will binary_search in snapshots; second stage will do binary_search on MDBX.

@AskAlexSharov AskAlexSharov requested a review from JkLondon June 2, 2025 14:38
@sudeepdino008 sudeepdino008 changed the title optimise FindBlockNum wip: optimise FindBlockNum [DO NOT MERGE] Jun 3, 2025
@sudeepdino008 sudeepdino008 marked this pull request as ready for review June 3, 2025 11:22
@sudeepdino008 sudeepdino008 changed the title wip: optimise FindBlockNum [DO NOT MERGE] wip: optimise FindBlockNum [DO NOT MERGE or REVIEW yet] Jun 3, 2025
@sudeepdino008

Copy link
Copy Markdown
Member Author

after the change...GetBlockNumber exec_time is 12% of getLogsV3; compared to around 50% previously - #15209

Screenshot 2025-06-04 at 12 38 13 PM

@sudeepdino008 sudeepdino008 changed the title wip: optimise FindBlockNum [DO NOT MERGE or REVIEW yet] optimise FindBlockNum Jun 4, 2025

@AskAlexSharov AskAlexSharov left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sudeepdino008

  • does it work on --prune.mode=minimal?

  • do we need special DefaultTxBlockIndex object? Maybe move all logic of txnumreader to BlockReader?

  • On bormainnet:

du -hsc /erigon-data/snapshots/*bod*seg
700mb
du -hsc /erigon-data/snapshots/*bod*idx
200mb

du -hsc /erigon-data/snapshots/*trans*seg
1.4T	
du -hsc /erigon-data/snapshots/*trans*idx
1.4T	

i feel it's wrong direction to move from bodies to txs. it's maybe look ok from 1 read perspective - but if think on it from "many parallel RPC" perspective - it can make "all txs files warm".

  • Does Erigon already using bodies.idx field? Mabye we can use it here?

@sudeepdino008

sudeepdino008 commented Jun 4, 2025

Copy link
Copy Markdown
Member Author

i feel it's wrong direction to move from bodies to txs.

interesting...I can't quite formalize this...but makes sense that this will significantly increase warming of txs files...

One idea is: figure out the exact blocks.seg where the block containing query_txnum might be (this needs to mmap touches); and then binary search query_txnum on that block.seg file alone..we should still get good savings from that.

@sudeepdino008

Copy link
Copy Markdown
Member Author

the current issue with this is that entries for smaller files don't get removed when they're merged. Will add a GC method for such entries..

@AskAlexSharov

Copy link
Copy Markdown
Collaborator

14% on FindBlockNumber now.. i guess it's because we don't have handmade rlp decoding for bodies (maybe i'm wrong).


// idea is to use lookup tables to implement uniform binary search
type BlockTxNumLookupCache struct {
cache map[snapshotsync.Range][]uint64

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how this structure better than array of blockNums (1 blockNum per 1M txnums)?

@sudeepdino008 sudeepdino008 Jun 6, 2025

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you mean - not having a per-file map, but rather a "flat" array that stores 1blockNum per 1M txnum?

we have to create a per-file cache structure - it allows the first few layers of the binary search (on that file) to be common among any query txNum in it.
If we don't keep it per-file, and just do this structure from [0-last_snapshot_block_num), the path that the queries take can keep changing (when a new file is introduced)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we can agree on the general idea - maybe it's possible to make it simpler.
I also need to lock to synchronize access to the cache.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a "flat" array that stores 1blockNum per 1M txnum? - yes. then search will look like arr[txNum/1M] (don't need binsearch).
Of-course we can have 1 array per file or 1 array per all files - append-only.

maybe i also not fully understand current solution. but i slowly understanding - it allow store big and small files and get 1 array per file by 1 map get and then do binsearch inside this array? (array of 10 uints - brobably sequential search will be faster than binsearch).

but i don't fully understand how do we fill current array

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't get through abstractions yet:

if !ok && !exhaust {
 			query.SetValue(maxTxNum)
 		}

where/what does it set? cache map[snapshotsync.Range][]uint64 storing txnums or blocknums?

@sudeepdino008 sudeepdino008 Jun 6, 2025

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

map[snapshotsync.Range][]uint64

key: file range (basically file)
value: list of lookup values (let's call this per-file array lookup)

e.g.

Screenshot 2025-06-06 at 12 09 53 PM

file range: [10000-10500)

so the file has 500 blocks..the array lookup size (assuming cadence=64) will be 9.
the binary search in [10000-10500) is mirrored in binary search on the lookup array

lookup[0] -> maxTxNum of block=10000
lookup[8] -> maxTxNum of block = 10500-1=10499

lookup[(0+8)/2 = 4] -> maxTxNum of block = (10000+10499)/2 = 10249
lookup[(0+4)/2 = 2] -> maxTxNum of block = (10000+10249)/2
lookup[(4+8)/2 = 6] -> maxTxNum of block = (10249+10499)/2
etc.

essentially the lookup stores maxTxNum for blk= [10000, 10062, 10124, 10186, 10249,10311,10374,10436,10499]

so the binary search in lookup [0-9) is mirrored in binary search in "block number space" [10000, 10500).

But if we were to mirror all blocks, then lookup size = number of block in file, we can't afford this...so there's a depth beyond which, we don't store maxTxNum for any block (so always query the file)

sort.Search in block_reader.go happens on the "blockNumber" space. While CacheQuery.Left() etc. traversing this tree in it's "lookup array index" space (mirroring).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe it's easier to have lookup as map[block_number][maxTxnum], so the cache will be map[snapshotsync.Range]map[block_number][maxTxnum], but that needs extra space for storing blockNumber too...and max cache size is already 50MB or so.

@sudeepdino008 sudeepdino008 Jun 6, 2025

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it allow store big and small files and get 1 array per file by 1 map get and then do binsearch inside this array? (array of 10 uints - brobably sequential search will be faster than binsearch).

yes. But the lookup size is number of blocks/cadence - approx 30000 for 500-step-blocks-file.

@sudeepdino008

sudeepdino008 commented Jun 6, 2025

Copy link
Copy Markdown
Member Author

14% on FindBlockNumber now.. i guess it's because we don't have handmade rlp decoding for bodies (maybe i'm wrong).

BodyForStorage has handmade decoding.
for cadence=8 (maxtxnum cached for every 8th block), FindBlockNumber disappeared entirely in the profile. (with file touches=2-3) - the max cache size will be 100MB (for bor).

@sudeepdino008 sudeepdino008 marked this pull request as draft June 9, 2025 06:28
@sudeepdino008 sudeepdino008 changed the title optimise FindBlockNum WIP: optimise FindBlockNum Jun 9, 2025
@sudeepdino008 sudeepdino008 changed the title WIP: optimise FindBlockNum optimise FindBlockNum Jun 10, 2025
@sudeepdino008 sudeepdino008 marked this pull request as ready for review June 10, 2025 05:09
@sudeepdino008

sudeepdino008 commented Jun 10, 2025

Copy link
Copy Markdown
Member Author

@AskAlexSharov you were right. Storing one lookup value every x blocks turns out to be much simpler than trying to exactly find the block number in "binary search tree".

In current case, the cache stores cache every 20 blocks, which gives max cache size of 40MB (for 100M blocks).

In separate PR, we can do "GC/removal of smaller file entries when they get merged" - #15512

if err != nil {
return true
}
txNum = atomic.LoadUint64(&lookup[i])

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can use non-pointer atomic.Uint64 - it will not add ram overhead:

type Uint64 struct {
	_ noCopy
	_ align64
	v uint64
}

@AskAlexSharov

Copy link
Copy Markdown
Collaborator

Looks good to me.

Does MapTxNum2BlockNumIter looks ok? Does it need to call FindBlockNum/Min/Max? (maybe it can call less?)

@AskAlexSharov

Copy link
Copy Markdown
Collaborator

FYI: for all such caches - need to keep in mind that likely we will have 2 copies of them on server: in erigon and in external rpcd

@AskAlexSharov

AskAlexSharov commented Jun 10, 2025

Copy link
Copy Markdown
Collaborator

pic from ethmainnet on this branch - looks good:
Screenshot 2025-06-10 at 20 37 45

@AskAlexSharov

Copy link
Copy Markdown
Collaborator

50% of BodyForStorageFromSnapshots is rlp decode of withdrawals. ok, some day we will get there.

@sudeepdino008

sudeepdino008 commented Jun 10, 2025

Copy link
Copy Markdown
Member Author

Looks good to me.

Does MapTxNum2BlockNumIter looks ok? Does it need to call FindBlockNum/Min/Max? (maybe it can call less?)

I think it's fine.
One thing I noticed was that the queries were - consecutive txNums. This means creating a new MapTxNum2BlockNumIter and 5 file touches for each query (to FindBlockNum).

If consecutive txNum queries are common, maybe we also add additional "hottest block" (startTxNum, endTxNum) per snapshot file. This will make answering queries for consecutive txNums faster (atleast invariantsEthGetLogs should fly through).

@sudeepdino008 sudeepdino008 enabled auto-merge (squash) June 10, 2025 14:24
@sudeepdino008 sudeepdino008 merged commit 1760f16 into main Jun 10, 2025
14 checks passed
@sudeepdino008 sudeepdino008 deleted the e3/optimize_findblocknum branch June 10, 2025 15:04
sudeepdino008 added a commit that referenced this pull request Jun 11, 2025
ticket: #15209

- `ReadTxNumFunc` gets maxtxnum given a blockNumber...it was also used
to find blockNumber given maxTxNum..
- add `TxBlockIndex` in-place of `ReadTxNumFunc` -- difference is that
there's now a separate function to GetBlockNumber.
Additionally:
- earlier, binary search was performed over a range of block numbers,
each search step queries MDBX and then fetches block from snapshot.
- now, binary search is separated into two stages: first stage will
binary_search in snapshots; second stage will do binary_search on MDBX.
sudeepdino008 added a commit that referenced this pull request Jun 12, 2025
ticket: #15209

- `ReadTxNumFunc` gets maxtxnum given a blockNumber...it was also used
to find blockNumber given maxTxNum..
- add `TxBlockIndex` in-place of `ReadTxNumFunc` -- difference is that
there's now a separate function to GetBlockNumber. Additionally:
- earlier, binary search was performed over a range of block numbers,
each search step queries MDBX and then fetches block from snapshot.
- now, binary search is separated into two stages: first stage will
binary_search in snapshots; second stage will do binary_search on MDBX.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants