Skip to content

answer RequestManager queries from disk if possible#6109

Merged
etan-status merged 4 commits into
unstablefrom
dev/etan/gp-diskcache
Mar 21, 2024
Merged

answer RequestManager queries from disk if possible#6109
etan-status merged 4 commits into
unstablefrom
dev/etan/gp-diskcache

Conversation

@etan-status

Copy link
Copy Markdown
Contributor

When restarting beacon node, orphaned blocks remain in the database but on startup, only the canonical chain as selected by fork choice loads. When a new block is discovered that builds on top of an orphaned block, the orphaned block is re-downloaded using sync/request manager, despite it already being present on disk. Such queries can be answered locally to improve discovery speed of alternate forks.

When restarting beacon node, orphaned blocks remain in the database but
on startup, only the canonical chain as selected by fork choice loads.
When a new block is discovered that builds on top of an orphaned block,
the orphaned block is re-downloaded using sync/request manager, despite
it already being present on disk. Such queries can be answered locally
to improve discovery speed of alternate forks.
@github-actions

github-actions Bot commented Mar 20, 2024

Copy link
Copy Markdown

Unit Test Results

         9 files  ±0    1 115 suites  ±0   25m 27s ⏱️ - 5m 18s
  4 243 tests ±0    3 896 ✔️ ±0  347 💤 ±0  0 ±0 
16 923 runs  ±0  16 525 ✔️ ±0  398 💤 ±0  0 ±0 

Results for commit a380f43. ± Comparison against base commit eb5acdb.

♻️ This comment has been updated with latest results.

@etan-status etan-status force-pushed the dev/etan/gp-diskcache branch from e274539 to c2fc6de Compare March 20, 2024 17:50
@etan-status

Copy link
Copy Markdown
Contributor Author
{"lvl":"DBG","ts":"2024-03-20 16:52:59.641+00:00","msg":"Block parent unknown or finalized already","topics":"gossip_blocks","parentId":{"oResultPrivate":true,"vResultPrivate":{"slot":7855977,"root":"7aa4631abafc3ed1835e32b819fc1330f511576a14e6c9530a1484383f939d68"}},"blockRoot":"d97d54b1","signature":"8cdb4ac2","blck":{"slot":7856108,"proposer_index":396632,"parent_root":"7aa4631a","state_root":"564f9741","eth1data":{"deposit_root":"22068c6cf375d3cbf39ddd9166835a6a7f7a5d0bde9d3ec384866779ed2c0d65","deposit_count":528461,"block_hash":"7d88904cb1fb1542c333ee2305e28fbff4922b75986017021b76cf814d4bbf4f"},"graffiti":"","proposer_slashings_len":0,"attester_slashings_len":2,"attestations_len":128,"deposits_len":0,"voluntary_exits_len":0,"sync_committee_participants":9,"block_number":10666680,"block_hash":"0xf56e62f10ff276796a5640f1667dbff02739272396c86ebd155ca748a8221049","parent_hash":"0x1cb8ff778de7f135c7e8a89ec4d59e17734189eefa92328a086d76f2df3ca96f","fee_recipient":"0xb3b0c18f1db3226fd466508c32bbb25b4ef1d696","bls_to_execution_changes_len":0,"blob_kzg_commitments_len":0}}

for example, fasttracks these, when data is available tto connect it to the DAG quicker

Comment thread beacon_chain/gossip_processing/eth2_processor.nim Outdated
Comment thread beacon_chain/sync/request_manager.nim Outdated
@etan-status etan-status enabled auto-merge (squash) March 21, 2024 14:45
@etan-status etan-status merged commit 6f46689 into unstable Mar 21, 2024
@etan-status etan-status deleted the dev/etan/gp-diskcache branch March 21, 2024 17:37
@arnetheduck

arnetheduck commented Mar 22, 2024

Copy link
Copy Markdown
Member

I'm not too sure about this one - ie the keeping of forks in the db is a bug more than a feature and we're increasing complexity of the flow quite a bit here, for a fringe case where you both restart and are loading a non-final chain.

It feels as though the better approach would perhaps be to load all heads on startup, which also would involve storing all heads on any head update (this is trivial btw).

Here, we end up with a significant increase in module coupling, creating kind of a spaghetti/pingpong flow just to support a minor use case, without instead considering the underlying source of this discord.

@etan-status

Copy link
Copy Markdown
Contributor Author

Yeah, an alternative to this (request manager ingress) + #6122 (sync manager / gossip ingress) is to extend the DB schema. Has the disadvantage of also loading non-canonical but orphaned branches. But, on the other hand, avoids the two code segments from here + #6122; not sure which one is better at this time.

Given the situation on Goerli, either of the fixes are quite important, because the network is currently split across three major branches that are relatively deep. In such a situation, re-discovering all the blocks from the network takes hours, preventing timely inclusion of a different but eventually canonical branch.

I think starting with the change that does not affect DB schema (this + #6122), seeing how it goes, before optimizing it further, is also valid. The blobless queue suffers from similar problems that the same block takes several extra round trips, btw, even in non-branching scenarios.

@arnetheduck

arnetheduck commented Mar 22, 2024

Copy link
Copy Markdown
Member

I think starting with the change that does not affect DB schema .. Has the disadvantage of also loading non-canonical but orphaned branches.

My point is more that this is a long-standing bug in chaindag - not loading branches has the consequence of them not getting pruned either, which is a fundamental bug (it's not a feature so fixing it is not a disadvantage) - here, we're basically heaping workarounds on top of this flaw which would not be necessary if the chaindag was doing the right thing (remembering what it has in the database).

Fixing this is as easy as writing a new list of non-canonical heads every now and then and reloading it on startup, and then everything will work much better - and we can keep the decoupled, more simple design by relying on chaindag working as it should.

@etan-status

Copy link
Copy Markdown
Contributor Author

Regarding pruning, either of the fixes improves the situation, as the end result is that BlockRef will be available. It is just the discovery mechanism that's different. The database solution is a bit more accurate though, indeed, as it does not depend on non-canonical branches to still be alive to get re-imported.

I suspect the total complexity to be similar ~130 lines, but a DB based solution obviously also needs more thorough testing given its persistence. I think going with the workarounds for now, to unblock other improvements while goerli still is an interesting network to test on, is also alright. Monitoring those nodes quite closely these days and finding small tweaks in different areas. Once the DB based solution exists, can remove the workarounds.

Have annotated the workaround with a TODO highlighting the discussion, in #6122

etan-status added a commit that referenced this pull request May 16, 2026
- #4732 KZG primitives added
- #4785 added gossip validation for BlobSidecar but did not check KZG
- #4844 added req/resp for BlobSidecar but did not validate responses
- #5445 fixed KZG validate_blobs in state transition (Opt[bool] jank)
- ethereum/consensus-specs#3531 new spec hits
- #5565 new spec, + fixed gossip missing inclusion proof + KZG checks
- #6109 goerli hardening, + fixed Req/Resp missing inclusion proof check
- #6741 copies the req/resp gap to Fulu
- #8254 removes the Deneb blob_quarantine
- This one now finally adds the missing KZG checks to req/resp
etan-status added a commit that referenced this pull request May 16, 2026
- #4732 KZG primitives added
- #4785 added gossip validation for BlobSidecar but did not check KZG
- #4844 added req/resp for BlobSidecar but did not validate responses
- #5445 fixed KZG validate_blobs in state transition (Opt[bool] jank)
- ethereum/consensus-specs#3531 new spec hits
- #5565 new spec, + fixed gossip missing inclusion proof + KZG checks
- #6109 goerli hardening, + fixed Req/Resp missing inclusion proof check
- #6741 copies the remaining req/resp missing KZG check gap to Fulu
- #8254 removes the Deneb blob_quarantine
- This one now finally adds the missing KZG checks to req/resp
tersec pushed a commit that referenced this pull request May 17, 2026
- #4732 KZG primitives added
- #4785 added gossip validation for BlobSidecar but did not check KZG
- #4844 added req/resp for BlobSidecar but did not validate responses
- #5445 fixed KZG validate_blobs in state transition (Opt[bool] jank)
- ethereum/consensus-specs#3531 new spec hits
- #5565 new spec, + fixed gossip missing inclusion proof + KZG checks
- #6109 goerli hardening, + fixed Req/Resp missing inclusion proof check
- #6741 copies the remaining req/resp missing KZG check gap to Fulu
- #8254 removes the Deneb blob_quarantine
- This one now finally adds the missing KZG checks to req/resp
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants